Reposted from https://justinwei.blog.csdn.net/article/details/122268437
10. mp_event_psi
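Like mp_event_common in Section 8, mp_event_psi is the event handler for the new (PSI-based) strategy. The function contains a lot of logic, so it is analyzed step by step below.
10.1 step0: static variables to note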
static int64_t init_ws_refault;
static int64_t prev_workingset_refault;
static int64_t base_file_lru;
static int64_t init_pgscan_kswapd;
static int64_t init_pgscan_direct;
static int64_t swap_low_threshold;
static bool killing;
static int thrashing_limit = thrashing_limit_pct;
static struct zone_watermarks watermarks;
static struct timespec wmark_update_tm;
static struct wakeup_info wi;
static struct timespec thrashing_reset_tm;
static int64_t prev_thrash_growth = 0;
static bool check_filecache = false;
static int max_thrashing = 0;
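lmkd keeps state across events, and these static variables are central to how the PSI strategy works.
init_ws_refault: the baseline workingset refault count. /proc/vmstat is re-read on every event, but this baseline is only re-recorded when the thrashing window is reset (see step 4).
prev_workingset_refault: the workingset refault count seen at the previous event, used to check whether the value is unchanged between two events (see step 3).
base_file_lru: the sum of inactive file and active file pages read from vmstat, recorded when the thrashing window is reset (see step 4).
init_pgscan_kswapd: the last recorded pgscan_kswapd value from vmstat, used at the next event to determine the reclaim state (see step 3).
init_pgscan_direct: the last recorded pgscan_direct value from vmstat, used at the next event together with init_pgscan_kswapd to determine the reclaim state (see step 3).
swap_low_threshold: the amount of swap to keep in reserve. free_swap read from /proc/meminfo is compared against it to decide whether swap is low (see step 2 and step 6).
killing: records that the previous event found a process and that a kill is in progress.
thrashing_limit: a key variable in PSI event handling that holds the thrashing limit. As the code above shows, it normally equals the ro.lmk.thrashing_limit property (see part 1 of the lmkd series), and every thrashing reset sets it back to that value. When memory is tight, however, several events can fire within a short time and thrashing may exceed thrashing_limit. After a process is selected and killed, the limit is decayed by the percentage given in ro.lmk.thrashing_limit_decay (see step 7).
watermarks: the zone watermarks, re-read from /proc/zoneinfo once per minute (see step 5).
wmark_update_tm: the time of the last watermark update (see step 5).
wi: records when the event handler was woken up.
thrashing_reset_tm: the time of the last thrashing reset (see step 4).
prev_thrash_growth: the growth of workingset_refault over the previous window, as a percentage of the file LRU (see step 4).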
10.2 step1: parse vmstat and meminfo
if (vmstat_parse(&vs) < 0) {
ALOGE("Failed to parse vmstat!");
return;
}
/* Starting 5.9 kernel workingset_refault vmstat field was renamed workingset_refault_file */
workingset_refault_file = vs.field.workingset_refault ? : vs.field.workingset_refault_file;
if (meminfo_parse(&mi) < 0) {
ALOGE("Failed to parse meminfo!");
return;
}
10.3 step2: check whether enough swap is free
/* Check free swap levels */
if (swap_free_low_percentage) {
if (!swap_low_threshold) {
swap_low_threshold = mi.field.total_swap * swap_free_low_percentage / 100;
}
swap_is_low = mi.field.free_swap < swap_low_threshold;
}
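swap_free_low_percentage comes from the ro.lmk.swap_free_low_percentage property and gives the minimum percentage of swap that should stay free, from 0 to 100.
If the currently free swap is below that minimum, swap is marked as low.
The threshold itself is computed only once, because swap_low_threshold is static and is only filled in while it is still 0.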
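As a rough illustration of the arithmetic in this step, here is a minimal sketch. The helper names are hypothetical and not part of lmkd, and it assumes the meminfo values are page counts, as the later multiplication by page_k suggests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: swap reserve, in pages, for a given percentage. */
static int64_t swap_low_threshold_pages(int64_t total_swap_pages, int low_pct) {
    return total_swap_pages * low_pct / 100;
}

/* Hypothetical helper: true when free swap has dropped below the reserve.
 * A percentage of 0 disables the check, as in the snippet above. */
static bool swap_is_low(int64_t free_swap_pages, int64_t total_swap_pages, int low_pct) {
    if (low_pct == 0) {
        return false;
    }
    return free_swap_pages < swap_low_threshold_pages(total_swap_pages, low_pct);
}
```

With 524288 pages of swap (2 GB at 4 kB per page) and a 10% reserve, the threshold is 52428 pages, so swap counts as low once fewer than that many pages are free.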
10.4 step3: determine the reclaim state
/* Identify reclaim state */
// vs.field.pgscan_direct is parsed from `/proc/vmstat`; init_pgscan_direct starts at 0
if (vs.field.pgscan_direct > init_pgscan_direct) {
init_pgscan_direct = vs.field.pgscan_direct;
init_pgscan_kswapd = vs.field.pgscan_kswapd;
reclaim = DIRECT_RECLAIM;
} else if (vs.field.pgscan_kswapd > init_pgscan_kswapd) {
init_pgscan_kswapd = vs.field.pgscan_kswapd;
reclaim = KSWAPD_RECLAIM;
} else if (workingset_refault_file == prev_workingset_refault) {
/*
* Device is not thrashing and not reclaiming, bail out early until we see these stats
* changing
*/
goto no_kill;
}
prev_workingset_refault = workingset_refault_file;
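The current pgscan_direct and pgscan_kswapd are compared with the values recorded earlier to decide whether the system is in direct reclaim or in kswapd reclaim. If neither counter has grown and workingset_refault has not changed either, the device is neither reclaiming nor thrashing, and the function bails out through no_kill.
10.5 step4: thrashing calculation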
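The branch order matters here: direct reclaim is checked before kswapd reclaim, and the early exit only happens when nothing has changed. The following sketch restates that logic as a standalone function. The struct, the function name and the idle flag are hypothetical, and the enum values are assumed to match lmkd's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror lmkd's enum; only the names used above are certain. */
enum reclaim_state { NO_RECLAIM = 0, KSWAPD_RECLAIM, DIRECT_RECLAIM };

/* Hypothetical container for the static baselines described in step 0. */
struct reclaim_baseline {
    int64_t pgscan_direct;
    int64_t pgscan_kswapd;
    int64_t ws_refault;
};

/* Classifies the current event and updates the baselines.
 * *idle is set when neither scanning nor refaults advanced,
 * which corresponds to the "goto no_kill" path. */
static enum reclaim_state classify_reclaim(struct reclaim_baseline *b,
                                           int64_t pgscan_direct,
                                           int64_t pgscan_kswapd,
                                           int64_t ws_refault,
                                           bool *idle) {
    enum reclaim_state state = NO_RECLAIM;
    *idle = false;
    if (pgscan_direct > b->pgscan_direct) {
        b->pgscan_direct = pgscan_direct;
        b->pgscan_kswapd = pgscan_kswapd;
        state = DIRECT_RECLAIM;
    } else if (pgscan_kswapd > b->pgscan_kswapd) {
        b->pgscan_kswapd = pgscan_kswapd;
        state = KSWAPD_RECLAIM;
    } else if (ws_refault == b->ws_refault) {
        *idle = true;   /* nothing changed, skip the rest of the handler */
        return state;
    }
    b->ws_refault = ws_refault;
    return state;
}
```

Note that a direct-reclaim event also refreshes the kswapd baseline, so kswapd scanning that happened alongside direct reclaim is not reported again at the next event.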
/*
* It's possible we fail to find an eligible process to kill (ex. no process is
* above oom_adj_min). When this happens, we should retry to find a new process
* for a kill whenever a new eligible process is available. This is especially
* important for a slow growing refault case. While retrying, we should keep
* monitoring new thrashing counter as someone could release the memory to mitigate
* the thrashing. Thus, when thrashing reset window comes, we decay the prev thrashing
* counter by window counts. If the counter is still greater than thrashing limit,
* we preserve the current prev_thrash counter so we will retry kill again. Otherwise,
* we reset the prev_thrash counter so we will stop retrying.
*/
since_thrashing_reset_ms = get_time_diff_ms(&thrashing_reset_tm, &curr_tm);
if (since_thrashing_reset_ms > THRASHING_RESET_INTERVAL_MS) {
long windows_passed;
/* Calculate prev_thrash_growth if we crossed THRASHING_RESET_INTERVAL_MS */
prev_thrash_growth = (workingset_refault_file - init_ws_refault) * 100
/ (base_file_lru + 1);
windows_passed = (since_thrashing_reset_ms / THRASHING_RESET_INTERVAL_MS);
/*
* Decay prev_thrashing unless over-the-limit thrashing was registered in the window we
* just crossed, which means there were no eligible processes to kill. We preserve the
* counter in that case to ensure a kill if a new eligible process appears.
*/
if (windows_passed > 1 || prev_thrash_growth < thrashing_limit) {
prev_thrash_growth >>= windows_passed;
}
/* Record file-backed pagecache size when crossing THRASHING_RESET_INTERVAL_MS */
base_file_lru = vs.field.nr_inactive_file + vs.field.nr_active_file;
init_ws_refault = workingset_refault_file;
thrashing_reset_tm = curr_tm;
thrashing_limit = thrashing_limit_pct;
} else {
/* Calculate what % of the file-backed pagecache refaulted so far */
thrashing = (workingset_refault_file - init_ws_refault) * 100 / (base_file_lru + 1);
}
/* Add previous cycle's decayed thrashing amount */
thrashing += prev_thrash_growth;
if (max_thrashing < thrashing) {
max_thrashing = thrashing;
}
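Overall, this code computes the thrashing value and periodically resets the window it is measured over. If more than THRASHING_RESET_INTERVAL_MS (1000, i.e. 1 s) has passed since the last reset, the thrashing-related values are reset: the growth of the window just crossed is saved in prev_thrash_growth, and base_file_lru, init_ws_refault, thrashing_reset_tm and thrashing_limit are recorded afresh.
The core calculation is the percentage of the file-backed page cache that has refaulted:
thrashing = (workingset_refault_file - init_ws_refault) * 100 / (base_file_lru + 1);
workingset_refault_file is the current refault count (vs.field.workingset_refault, renamed to workingset_refault_file from kernel 5.9 on), init_ws_refault is the refault count recorded at the last reset, and base_file_lru is the number of file pages (inactive plus active) recorded at the last reset.
Sometimes no process is above oom_adj_min, so nothing can be killed. In that case the kill has to be retried as soon as a new eligible process appears, which is why prev_thrash_growth is preserved rather than decayed when over-the-limit thrashing was registered in the window just crossed.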
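To make the numbers concrete, here is a small sketch of the two calculations in this step: the refault percentage and the decay applied to the previous window's growth. The function names are hypothetical, while the arithmetic follows the snippet above.

```c
#include <stdint.h>

/* Percentage of the file-backed page cache that refaulted since the baseline.
 * The +1 avoids a division by zero when the file LRU is empty. */
static int64_t thrashing_pct(int64_t refault_now, int64_t refault_base,
                             int64_t base_file_lru) {
    return (refault_now - refault_base) * 100 / (base_file_lru + 1);
}

/* Halves the previous growth once per window passed, unless exactly one
 * window passed and the growth was already at or over the limit. */
static int64_t decay_prev_growth(int64_t growth, long windows_passed, int64_t limit) {
    if (windows_passed > 1 || growth < limit) {
        growth >>= windows_passed;
    }
    return growth;
}
```

For example, with base_file_lru = 99999 pages and 25000 new refaults, thrashing is 25%. A previous growth of 40% decays to 20% after one window, while a growth of 120% against a limit of 100 is kept as-is after one window and only drops to 15% once three windows have passed.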
10.6 step5: recalculate the watermarks once per minute
/*
* Refresh watermarks once per min in case user updated one of the margins.
* TODO: b/140521024 replace this periodic update with an API for AMS to notify LMKD
* that zone watermarks were changed by the system software.
*/
if (watermarks.high_wmark == 0 || get_time_diff_ms(&wmark_update_tm, &curr_tm) > 60000) {
struct zoneinfo zi;
if (zoneinfo_parse(&zi) < 0) {
ALOGE("Failed to parse zoneinfo!");
return;
}
calc_zone_watermarks(&zi, &watermarks);
wmark_update_tm = curr_tm;
}
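The final watermarks are computed from the min, low and high watermarks and the protection values read from /proc/zoneinfo, and are stored in the static struct watermarks. This happens once per minute (the first time because high_wmark is still 0, after that every minute).
Once the watermarks are known, the code determines which watermark is breached at the time of the event: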
/* Find out which watermark is breached if any */
wmark = get_lowest_watermark(&mi, &watermarks);
/*
* Returns lowest breached watermark or WMARK_NONE.
*/
static enum zone_watermark get_lowest_watermark(union meminfo *mi,
struct zone_watermarks *watermarks)
{
int64_t nr_free_pages = mi->field.nr_free_pages - mi->field.cma_free;
if (nr_free_pages < watermarks->min_wmark) {
return WMARK_MIN;
}
if (nr_free_pages < watermarks->low_wmark) {
return WMARK_LOW;
}
if (nr_free_pages < watermarks->high_wmark) {
return WMARK_HIGH;
}
return WMARK_NONE;
}
nr_free_pages - cma_free, taken from /proc/meminfo, is compared against the watermarks, starting with the lowest one.
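The body of calc_zone_watermarks is not shown in this article. The sketch below is only an assumption about its shape: it adds each populated zone's watermark to that zone's largest protection value and sums the results over all zones. The struct layouts and the function name are hypothetical and simplified.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one zone from /proc/zoneinfo (in pages). */
struct zone_sample {
    int64_t present;         /* 0 means the zone has no pages and is skipped */
    int64_t min, low, high;  /* per-zone watermarks */
    int64_t max_protection;  /* largest entry of the zone's protection array */
};

struct wmarks {
    int64_t min_wmark, low_wmark, high_wmark;
};

/* Assumed aggregation: sum watermark plus protection over populated zones. */
static struct wmarks sum_zone_watermarks(const struct zone_sample *zones, size_t count) {
    struct wmarks w = {0, 0, 0};
    for (size_t i = 0; i < count; i++) {
        if (!zones[i].present) {
            continue;
        }
        w.min_wmark += zones[i].max_protection + zones[i].min;
        w.low_wmark += zones[i].max_protection + zones[i].low;
        w.high_wmark += zones[i].max_protection + zones[i].high;
    }
    return w;
}
```

Whatever the exact aggregation is, the result is three system-wide thresholds in pages, which is what get_lowest_watermark compares the free page count against.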
10.7 step6: determine the kill reason and min_score_adj
/*
* TODO: move this logic into a separate function
* Decide if killing a process is necessary and record the reason
*/
if (cycle_after_kill && wmark < WMARK_LOW) {
/*
* Prevent kills not freeing enough memory which might lead to OOM kill.
* This might happen when a process is consuming memory faster than reclaim can
* free even after a kill. Mostly happens when running memory stress tests.
*/
kill_reason = PRESSURE_AFTER_KILL;
strncpy(kill_desc, "min watermark is breached even after kill", sizeof(kill_desc));
} else if (level == VMPRESS_LEVEL_CRITICAL && events != 0) {
/*
* Device is too busy reclaiming memory which might lead to ANR.
* Critical level is triggered when PSI complete stall (all tasks are blocked because
* of the memory congestion) breaches the configured threshold.
*/
kill_reason = NOT_RESPONDING;
strncpy(kill_desc, "device is not responding", sizeof(kill_desc));
} else if (swap_is_low && thrashing > thrashing_limit_pct) {
/* Page cache is thrashing while swap is low */
kill_reason = LOW_SWAP_AND_THRASHING;
snprintf(kill_desc, sizeof(kill_desc), "device is low on swap (%" PRId64
"kB < %" PRId64 "kB) and thrashing (%" PRId64 "%%)",
mi.field.free_swap * page_k, swap_low_threshold * page_k, thrashing);
/* Do not kill perceptible apps unless below min watermark or heavily thrashing */
if (wmark > WMARK_MIN && thrashing < thrashing_critical_pct) {
min_score_adj = PERCEPTIBLE_APP_ADJ + 1;
}
check_filecache = true;
} else if (swap_is_low && wmark < WMARK_HIGH) {
/* Both free memory and swap are low */
kill_reason = LOW_MEM_AND_SWAP;
snprintf(kill_desc, sizeof(kill_desc), "%s watermark is breached and swap is low (%"
PRId64 "kB < %" PRId64 "kB)", wmark < WMARK_LOW ? "min" : "low",
mi.field.free_swap * page_k, swap_low_threshold * page_k);
/* Do not kill perceptible apps unless below min watermark or heavily thrashing */
if (wmark > WMARK_MIN && thrashing < thrashing_critical_pct) {
min_score_adj = PERCEPTIBLE_APP_ADJ + 1;
}
} else if (wmark < WMARK_HIGH && swap_util_max < 100 &&
(swap_util = calc_swap_utilization(&mi)) > swap_util_max) {
/*
* Too much anon memory is swapped out but swap is not low.
* Non-swappable allocations created memory pressure.
*/
kill_reason = LOW_MEM_AND_SWAP_UTIL;
snprintf(kill_desc, sizeof(kill_desc), "%s watermark is breached and swap utilization"
" is high (%d%% > %d%%)", wmark < WMARK_LOW ? "min" : "low",
swap_util, swap_util_max);
} else if (wmark < WMARK_HIGH && thrashing > thrashing_limit) {
/* Page cache is thrashing while memory is low */
kill_reason = LOW_MEM_AND_THRASHING;
snprintf(kill_desc, sizeof(kill_desc), "%s watermark is breached and thrashing (%"
PRId64 "%%)", wmark < WMARK_LOW ? "min" : "low", thrashing);
cut_thrashing_limit = true;
/* Do not kill perceptible apps unless thrashing at critical levels */
if (thrashing < thrashing_critical_pct) {
min_score_adj = PERCEPTIBLE_APP_ADJ + 1;
}
check_filecache = true;
} else if (reclaim == DIRECT_RECLAIM && thrashing > thrashing_limit) {
/* Page cache is thrashing while in direct reclaim (mostly happens on lowram devices) */
kill_reason = DIRECT_RECL_AND_THRASHING;
snprintf(kill_desc, sizeof(kill_desc), "device is in direct reclaim and thrashing (%"
PRId64 "%%)", thrashing);
cut_thrashing_limit = true;
/* Do not kill perceptible apps unless thrashing at critical levels */
if (thrashing < thrashing_critical_pct) {
min_score_adj = PERCEPTIBLE_APP_ADJ + 1;
}
check_filecache = true;
} else if (check_filecache) {
int64_t file_lru_kb = (vs.field.nr_inactive_file + vs.field.nr_active_file) * page_k;
if (file_lru_kb < filecache_min_kb) {
/* File cache is too low after thrashing, keep killing background processes */
kill_reason = LOW_FILECACHE_AFTER_THRASHING;
snprintf(kill_desc, sizeof(kill_desc),
"filecache is low (%" PRId64 "kB < %" PRId64 "kB) after thrashing",
file_lru_kb, filecache_min_kb);
min_score_adj = PERCEPTIBLE_APP_ADJ + 1;
} else {
/* File cache is big enough, stop checking */
check_filecache = false;
}
}
The kill reasons are:
PRESSURE_AFTER_KILL
NOT_RESPONDING
LOW_SWAP_AND_THRASHING
LOW_MEM_AND_SWAP
LOW_MEM_AND_SWAP_UTIL
LOW_MEM_AND_THRASHING
DIRECT_RECL_AND_THRASHING
LOW_FILECACHE_AFTER_THRASHING
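(1) PRESSURE_AFTER_KILL
Condition: cycle_after_kill && wmark < WMARK_LOW
cycle_after_kill being true means this cycle follows a kill, and wmark < WMARK_LOW means free memory is still below the min watermark (hence the description "min watermark is breached even after kill"). This usually happens during memory stress tests.
wmark is the value returned by get_lowest_watermark in step 5, i.e. the lowest watermark that nr_free_pages - cma_free has fallen below. It is a watermark level, not a page count.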
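(2) NOT_RESPONDING
Condition: level == VMPRESS_LEVEL_CRITICAL && events != 0
Memory pressure has exceeded the threshold configured for PSI complete stall, i.e. the "full" state in which all tasks are blocked on memory. The device is too busy reclaiming memory, which may lead to ANRs.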
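(3) LOW_SWAP_AND_THRASHING
Condition: swap_is_low && thrashing > thrashing_limit_pct
swap_is_low means free swap has dropped below the reserve threshold described in step 2.
thrashing is the workingset refault count as a percentage of the file-backed page cache, see step 4. thrashing_limit_pct comes from the ro.lmk.thrashing_limit property; it is 30 on low-RAM devices and 100 otherwise.
However, if the min watermark has not been breached and thrashing is below thrashing_critical_pct (set by ro.lmk.thrashing_limit_critical; if that property is not defined, it defaults to twice thrashing_limit_pct), perceptible apps and anything more important are not killed: min_score_adj is raised to PERCEPTIBLE_APP_ADJ + 1.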
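(4) LOW_MEM_AND_SWAP
Condition: swap_is_low && wmark < WMARK_HIGH
Swap is below its reserve threshold and free pages are below the low watermark (possibly already below the min watermark).
However, if the min watermark has not been breached and thrashing is below thrashing_critical_pct (set by ro.lmk.thrashing_limit_critical; if that property is not defined, it defaults to twice thrashing_limit_pct), perceptible apps and anything more important are not killed.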
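(5) LOW_MEM_AND_SWAP_UTIL
Condition: wmark < WMARK_HIGH && swap_util_max < 100 && (swap_util = calc_swap_utilization(&mi)) > swap_util_max
Free memory is below the low watermark. swap_util_max is set by the ro.lmk.swap_util_max property and defaults to 100 (percent); it is the maximum allowed swap utilization, and with the default of 100 this branch is never taken.
The swap utilization computed from meminfo is above that maximum.
In other words, a lot of anon memory has already been swapped out, yet swap itself is not low (otherwise an earlier branch would have matched). The memory pressure is therefore coming from non-swappable allocations.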
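(6) LOW_MEM_AND_THRASHING
Condition: wmark < WMARK_HIGH && thrashing > thrashing_limit
Free memory is below the low watermark (possibly below the min watermark) and thrashing has exceeded thrashing_limit, so the device is both low on memory and thrashing.
If thrashing has not reached the value set by ro.lmk.thrashing_limit_critical (by default twice ro.lmk.thrashing_limit), perceptible apps and anything more important are not killed.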
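(7) DIRECT_RECL_AND_THRASHING
Condition: reclaim == DIRECT_RECLAIM && thrashing > thrashing_limit
Thrashing is above the limit while the device is in direct reclaim (not kswapd reclaim), which mostly happens on low-RAM devices. Apps are killed in this case.
By default min_score_adj starts at 0. When the situation is less severe, it starts at PERCEPTIBLE_APP_ADJ + 1 instead. The resulting min_score_adj is passed to find_and_kill_process, which picks a suitable process to kill.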
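(8) LOW_FILECACHE_AFTER_THRASHING
Condition: check_filecache is set (by an earlier thrashing kill) and file_lru_kb < filecache_min_kb
where:
file_lru_kb = (vs.field.nr_inactive_file + vs.field.nr_active_file) * page_k;
filecache_min_kb is set by the ro.lmk.filecache_min_kb property and defaults to 0.
Here thrashing has left the file cache too small, so lmkd keeps killing background processes (min_score_adj = PERCEPTIBLE_APP_ADJ + 1) until the file cache is large enough again.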
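Several branches above share the same pattern for protecting perceptible apps. The sketch below isolates that pattern. The two helper names are hypothetical, PERCEPTIBLE_APP_ADJ is assumed to be 200, and the enum order is assumed to run from WMARK_MIN up to WMARK_NONE, as the comparisons in the code imply.

```c
#include <stdint.h>

#define PERCEPTIBLE_APP_ADJ 200  /* assumed value, not shown in this article */

/* Assumed ordering: a smaller value means a more severe breach. */
enum zone_watermark { WMARK_MIN = 0, WMARK_LOW, WMARK_HIGH, WMARK_NONE };

/* Swap-low branches, (3) and (4): spare perceptible apps unless the min
 * watermark is breached or thrashing is at critical levels. */
static int min_adj_for_swap_low(enum zone_watermark wmark, int64_t thrashing,
                                int64_t thrashing_critical_pct) {
    if (wmark > WMARK_MIN && thrashing < thrashing_critical_pct) {
        return PERCEPTIBLE_APP_ADJ + 1;
    }
    return 0;
}

/* Thrashing branches, (6) and (7): only the thrashing level decides. */
static int min_adj_for_thrashing(int64_t thrashing, int64_t thrashing_critical_pct) {
    if (thrashing < thrashing_critical_pct) {
        return PERCEPTIBLE_APP_ADJ + 1;
    }
    return 0;
}
```

A return value of 0 means any process with oom_score_adj of 0 or higher may be chosen, while PERCEPTIBLE_APP_ADJ + 1 restricts the search to processes less important than perceptible apps.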
10.8 step7: kill a process
/* Kill a process if necessary */
if (kill_reason != NONE) {
struct kill_info ki = {
.kill_reason = kill_reason,
.kill_desc = kill_desc,
.thrashing = (int)thrashing,
.max_thrashing = max_thrashing,
};
int pages_freed = find_and_kill_process(min_score_adj, &ki, &mi, &wi, &curr_tm);
if (pages_freed > 0) {
killing = true;
max_thrashing = 0;
if (cut_thrashing_limit) {
/*
* Cut thrasing limit by thrashing_limit_decay_pct percentage of the current
* thrashing limit until the system stops thrashing.
*/
thrashing_limit = (thrashing_limit * (100 - thrashing_limit_decay_pct)) / 100;
}
}
}
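See Section 9 for details.
Note that thrashing_limit may be decayed here, because the thrashing value had already exceeded thrashing_limit. This typically happens when the device thrashes repeatedly within a short time.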
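The decay uses integer arithmetic, so repeated kills lower the limit geometrically with rounding down. A minimal sketch, with a hypothetical function name and the formula taken from the code above:

```c
/* One decay step: cut the limit by decay_pct percent, rounding down. */
static int decay_thrashing_limit(int limit, int decay_pct) {
    return (limit * (100 - decay_pct)) / 100;
}
```

Starting from a limit of 100 with a decay of 10%, successive kills give 90, 81 and then 72. The next thrashing reset in step 4 restores the limit to thrashing_limit_pct.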