Android LMKD (2): Source Code Analysis, Part 2

By adtxl / 2022-03-02

Reposted from https://blog.csdn.net/shift_wwx/article/details/121593698

This article continues from Android LMKD (2): Source Code Analysis, Part 1.

5. mainloop

We will not go into much detail here for now, since it involves the delay handling that different handlers may introduce.

The main loop is built on the epoll mechanism: it blocks in epoll_wait until an event fires:

/* Wait for events until the next polling timeout */
nevents = epoll_wait(epollfd, events, maxevents, delay);

After waking up, it does two main things: check whether any data connections were dropped, then run the handlers:

        /*
         * First pass to see if any data socket connections were dropped.
         * Dropped connection should be handled before any other events
         * to deallocate data connection and correctly handle cases when
         * connection gets dropped and reestablished in the same epoll cycle.
         * In such cases it's essential to handle connection closures first.
         */
        for (i = 0, evt = &events[0]; i < nevents; ++i, evt++) {
            if ((evt->events & EPOLLHUP) && evt->data.ptr) {
                ALOGI("lmkd data connection dropped");
                handler_info = (struct event_handler_info*)evt->data.ptr;
                ctrl_data_close(handler_info->data);
            }
        }

        /* Second pass to handle all other events */
        for (i = 0, evt = &events[0]; i < nevents; ++i, evt++) {
            if (evt->events & EPOLLERR) {
                ALOGD("EPOLLERR on event #%d", i);
            }
            if (evt->events & EPOLLHUP) {
                /* This case was handled in the first pass */
                continue;
            }
            if (evt->data.ptr) {
                handler_info = (struct event_handler_info*)evt->data.ptr;
                call_handler(handler_info, &poll_params, evt->events);
            }
        }

From init we know that epoll monitors 9 events in total, and each event fd is bound to its own handler logic. These handlers fall roughly into:

  • one listener fd for the control socket, /dev/socket/lmkd, added to epoll in init();
  • three client data socket fds for data exchange, added to epoll in ctrl_connect_handler();
  • three memory-pressure monitors, added to epoll in init_psi_monitors() -> init_mp_psi() (or in init_mp_common() under the legacy strategy);
  • one kpoll_fd for LMK events, added to epoll in init(); newer lmkd no longer uses this one;
  • one pidfd for waiting on process death, added to epoll in start_wait_for_proc_kill();

The sections below walk through each of these handlers in detail.

6. ctrl listener fd handling flow: ctrl_connect_handler

First, from init we know that after the lmkd socket starts listening, its fd is added to epoll to watch the socket. From the mainloop section above, we know that when epoll fires, the handler bound to the event is invoked; for lmkd, a successful connect triggers ctrl_connect_handler.

AMS tries to connect to lmkd; if the connect fails, it retries every 1 s until it succeeds.

frameworks/base/services/core/java/com/android/server/am/ProcessList.java

    // lmkd reconnect delay in msecs
    private static final long LMKD_RECONNECT_DELAY_MS = 1000;

    ...

    final class KillHandler extends Handler {
        ...

        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case KILL_PROCESS_GROUP_MSG:
                   ...
                case LMKD_RECONNECT_MSG:
                    if (!sLmkdConnection.connect()) {
                        Slog.i(TAG, "Failed to connect to lmkd, retry after " +
                                LMKD_RECONNECT_DELAY_MS + " ms");
                        // retry after LMKD_RECONNECT_DELAY_MS
                        sKillHandler.sendMessageDelayed(sKillHandler.obtainMessage(
                                KillHandler.LMKD_RECONNECT_MSG), LMKD_RECONNECT_DELAY_MS);
                    }
                    break;
                default:
                    super.handleMessage(msg);
            }
        }
    }

The code shows that AMS attempts the connection through sLmkdConnection.connect() and keeps retrying on failure.

Once LmkdConnection connects to lmkd successfully, it notifies its listeners; lmkd receives the event through epoll and calls ctrl_connect_handler:

static void ctrl_connect_handler(int data __unused, uint32_t events __unused,
                                 struct polling_params *poll_params __unused) {
    struct epoll_event epev;
    int free_dscock_idx = get_free_dsock();

    if (free_dscock_idx < 0) {
        for (int i = 0; i < MAX_DATA_CONN; i++) {
            ctrl_data_close(i);
        }
        free_dscock_idx = 0;
    }

    data_sock[free_dscock_idx].sock = accept(ctrl_sock.sock, NULL, NULL);
    if (data_sock[free_dscock_idx].sock < 0) {
        ALOGE("lmkd control socket accept failed; errno=%d", errno);
        return;
    }

    ALOGI("lmkd data connection established");
    /* use data to store data connection idx */
    data_sock[free_dscock_idx].handler_info.data = free_dscock_idx;
    data_sock[free_dscock_idx].handler_info.handler = ctrl_data_handler;
    data_sock[free_dscock_idx].async_event_mask = 0;
    epev.events = EPOLLIN;
    epev.data.ptr = (void *)&(data_sock[free_dscock_idx].handler_info);
    if (epoll_ctl(epollfd, EPOLL_CTL_ADD, data_sock[free_dscock_idx].sock, &epev) == -1) {
        ALOGE("epoll_ctl for data connection socket failed; errno=%d", errno);
        ctrl_data_close(free_dscock_idx);
        return;
    }
    maxevents++;
}

lmkd supports at most three client connections (MAX_DATA_CONN); if no free slot remains, ctrl_data_close() is called to detach the existing connections from epoll and close their sockets.

Otherwise, accept() creates a new data socket, which is then added to epoll.

The main thing to note is the data-exchange function ctrl_data_handler().

7. ctrl data fd handling flow: ctrl_data_handler

static void ctrl_data_handler(int data, uint32_t events,
                              struct polling_params *poll_params __unused) {
    if (events & EPOLLIN) {
        ctrl_command_handler(data);
    }
}

Since the fd was registered with epoll using EPOLLIN, the handler then calls ctrl_command_handler, which processes the lmk commands issued from ProcessList.java:

enum lmk_cmd {
    LMK_TARGET = 0, /* Associate minfree with oom_adj_score */
    LMK_PROCPRIO,   /* Register a process and set its oom_adj_score */
    LMK_PROCREMOVE, /* Unregister a process */
    LMK_PROCPURGE,  /* Purge all registered processes */
    LMK_GETKILLCNT, /* Get number of kills */
    LMK_SUBSCRIBE,  /* Subscribe for asynchronous events */
    LMK_PROCKILL,   /* Unsolicited msg to subscribed clients on proc kills */
    LMK_UPDATE_PROPS, /* Reinit properties */
};

7.1 cmd_procprio

For example, when a process's oom_adj_score changes, AMS calls setOomAdj to notify lmkd:

frameworks/base/services/core/java/com/android/server/am/ProcessList.java

    public static void setOomAdj(int pid, int uid, int amt) {
        ...

        long start = SystemClock.elapsedRealtime();
        ByteBuffer buf = ByteBuffer.allocate(4 * 4);
        buf.putInt(LMK_PROCPRIO);
        buf.putInt(pid);
        buf.putInt(uid);
        buf.putInt(amt);
        writeLmkd(buf, null);
        long now = SystemClock.elapsedRealtime();
        ...

In lmkd, ctrl_command_handler parses the command, matches LMK_PROCPRIO, and finally calls cmd_procprio():

    case LMK_PROCPRIO:
        /* process type field is optional for backward compatibility */
        if (nargs < 3 || nargs > 4)
            goto wronglen;
        cmd_procprio(packet, nargs, &cred);
        break;

Next, look at what cmd_procprio does:

static void cmd_procprio(LMKD_CTRL_PACKET packet, int field_count, struct ucred *cred) {
    struct proc *procp;
    char path[LINE_MAX];
    char val[20];
    int soft_limit_mult;
    struct lmk_procprio params;
    bool is_system_server;
    struct passwd *pwdrec;
    int tgid;

    lmkd_pack_get_procprio(packet, field_count, &params);

    ...

    snprintf(path, sizeof(path), "/proc/%d/oom_score_adj", params.pid);
    snprintf(val, sizeof(val), "%d", params.oomadj);
    if (!writefilestring(path, val, false)) {
        ALOGW("Failed to open %s; errno=%d: process %d might have been killed",
              path, errno, params.pid);
        /* If this file does not exist the process is dead. */
        return;
    }

    ...

    procp = pid_lookup(params.pid);
    if (!procp) {
        int pidfd = -1;

        if (pidfd_supported) {
            pidfd = TEMP_FAILURE_RETRY(sys_pidfd_open(params.pid, 0));
            ...
        }

        procp = static_cast<struct proc*>(calloc(1, sizeof(struct proc)));
        if (!procp) {
            // Oh, the irony.  May need to rebuild our state.
            return;
        }

        procp->pid = params.pid;
        procp->pidfd = pidfd;
        procp->uid = params.uid;
        procp->reg_pid = cred->pid;
        procp->oomadj = params.oomadj;
        proc_insert(procp);
    } else {
        ...
        proc_unslot(procp);
        procp->oomadj = params.oomadj;
        proc_slot(procp);
    }
}

  • First, the oom_score_adj passed down from AMS is written to /proc/<pid>/oom_score_adj;
  • pid_lookup checks whether the process is already registered;
  • For a new process, sys_pidfd_open obtains a pidfd and proc_insert adds the record to the procadjslot_list bucket array;
  • For an existing process, the oomadj field is updated and the record is re-slotted into procadjslot_list;

7.2 cmd_procremove

As in section 7.1, when an app process stops running, ProcessList.remove() sends LMK_PROCREMOVE to lmkd, which eventually calls cmd_procremove:

static void cmd_procremove(LMKD_CTRL_PACKET packet, struct ucred *cred) {
    ...

    procp = pid_lookup(params.pid);
    if (!procp) {
        return;
    }

    ...

    pid_remove(params.pid);
}

The code is straightforward: if the proc record exists, pid_remove removes it.

7.3 cmd_procpurge

AMS usually connects to lmkd during its construction; once connected, it sends LMK_PROCPURGE so that lmkd first cleans up its state, which ends up in cmd_procpurge:

static void cmd_procpurge(struct ucred *cred) {
    ...

    for (i = 0; i < PIDHASH_SZ; i++) {
        procp = pidhash[i];
        while (procp) {
            next = procp->pidhash_next;
            /* Purge only records created by the requestor */
            if (claim_record(procp, cred->pid)) {
                pid_remove(procp->pid);
            }
            procp = next;
        }
    }
}

The code is simple: it walks the whole pidhash table and removes every record created by the requesting client (see claim_record).

7.4 cmd_subscribe

After AMS connects to lmkd through ProcessList, it sends LMK_SUBSCRIBE:

frameworks/base/services/core/java/com/android/server/am/ProcessList.java

    public boolean onLmkdConnect(OutputStream ostream) {
        try {
            ...
            // Subscribe for kill event notifications
            buf = ByteBuffer.allocate(4 * 2);
            buf.putInt(LMK_SUBSCRIBE);
            buf.putInt(LMK_ASYNC_EVENT_KILL);
            ostream.write(buf.array(), 0, buf.position());
        } catch (IOException ex) {
            return false;
        }
        return true;
    }

This subscribes AMS to notifications sent after lmkd kills a process; lmkd checks whether a client has subscribed before notifying it. When a client sends the subscribe command:

static void cmd_subscribe(int dsock_idx, LMKD_CTRL_PACKET packet) {
    struct lmk_subscribe params;

    lmkd_pack_get_subscribe(packet, &params);
    data_sock[dsock_idx].async_event_mask |= 1 << params.evt_type;
}

cmd_subscribe sets the LMK_ASYNC_EVENT_KILL bit in the async_event_mask of that client's entry in the data_sock array. After lmkd kills a process, it calls:

static void ctrl_data_write_lmk_kill_occurred(pid_t pid, uid_t uid) {
    LMKD_CTRL_PACKET packet;
    size_t len = lmkd_pack_set_prockills(packet, pid, uid);

    for (int i = 0; i < MAX_DATA_CONN; i++) {
        if (data_sock[i].sock >= 0 && data_sock[i].async_event_mask & 1 << LMK_ASYNC_EVENT_KILL) {
            ctrl_data_write(i, (char*)packet, len);
        }
    }
}

which notifies ProcessList in AMS through ctrl_data_write:

frameworks/base/services/core/java/com/android/server/am/ProcessList.java

sLmkdConnection = new LmkdConnection(sKillThread.getLooper().getQueue(),
                    new LmkdConnection.LmkdConnectionListener() {
                        ...

                        @Override
                        public boolean handleUnsolicitedMessage(ByteBuffer dataReceived,
                                int receivedLen) {
                            ...
                        }

7.5 cmd_target

From ProcessList.java we know this is initialized once when ProcessList is constructed; it is also triggered by ATMS.updateConfiguration:

frameworks/base/services/core/java/com/android/server/wm/ActivityTaskManagerService.java

   public boolean updateConfiguration(Configuration values) {
        mAmInternal.enforceCallingPermission(CHANGE_CONFIGURATION, "updateConfiguration()");

        synchronized (mGlobalLock) {
            ...

            mH.sendMessage(PooledLambda.obtainMessage(
                    ActivityManagerInternal::updateOomLevelsForDisplay, mAmInternal,
                    DEFAULT_DISPLAY));

            ...
        }
    }

If you are interested, trace through the code; it eventually reaches ProcessList.updateOomLevels():

frameworks/base/services/core/java/com/android/server/am/ProcessList.java

    private void updateOomLevels(int displayWidth, int displayHeight, boolean write) {
        ...

        if (write) {
            ByteBuffer buf = ByteBuffer.allocate(4 * (2 * mOomAdj.length + 1));
            buf.putInt(LMK_TARGET);
            for (int i = 0; i < mOomAdj.length; i++) {
                buf.putInt((mOomMinFree[i] * 1024)/PAGE_SIZE);
                buf.putInt(mOomAdj[i]);
            }

            writeLmkd(buf, null);
            ...
        }
    }

This function computes the minfree value for each oom adj level and passes every level's minfree and oom_adj_score down to lmkd. The minfree algorithm itself will be covered later; for now, continue with cmd_target in lmkd:

static void cmd_target(int ntargets, LMKD_CTRL_PACKET packet) {
    int i;
    struct lmk_target target;
    char minfree_str[PROPERTY_VALUE_MAX];
    char *pstr = minfree_str;
    char *pend = minfree_str + sizeof(minfree_str);
    static struct timespec last_req_tm;
    struct timespec curr_tm;

    ...

    for (i = 0; i < ntargets; i++) {
        lmkd_pack_get_target(packet, i, &target);
        lowmem_minfree[i] = target.minfree;
        lowmem_adj[i] = target.oom_adj_score;

        pstr += snprintf(pstr, pend - pstr, "%d:%d,", target.minfree,
            target.oom_adj_score);
        if (pstr >= pend) {
            /* if no more space in the buffer then terminate the loop */
            pstr = pend;
            break;
        }
    }

    lowmem_targets_size = ntargets;

    /* Override the last extra comma */
    pstr[-1] = '\0';
    property_set("sys.lmk.minfree_levels", minfree_str);

    ...
}

The code is straightforward: it assembles the minfree and oom_adj_score pairs into a string and stores it in the property sys.lmk.minfree_levels.

This property mainly exists for later debugging; what actually matters are the two arrays:

static int lowmem_adj[MAX_TARGETS];
static int lowmem_minfree[MAX_TARGETS];

They store the oom adj levels configured by AMS. Later, when processes need to be killed, lmkd uses current memory usage and memory pressure to compute the most appropriate min_score_adj, and then kills every process whose score is greater than or equal to that value.

At this point, the ctrl data fd handling flow in ctrl_data_handler has been covered. Together with section 6: when AMS is constructed, it initializes its link to lmkd through ProcessList, including the connect and the listener for kill notifications.

  • When AMS initializes, it connects to lmkd and sends LMK_PROCPURGE to clean up the environment;
  • Right after LMK_PROCPURGE, AMS sends LMK_SUBSCRIBE to receive notifications after lmkd kills a process;
  • When AMS stops a process, it sends LMK_PROCREMOVE;
  • When AMS updates a process's oom_score_adj, it sends LMK_PROCPRIO via setOomAdj;
  • When the oom levels are updated, updateOomLevels sends LMK_TARGET;
