kernel version: Android common kernel 4.19.176
1. The problem
During monkey testing of app switching, page allocation failure messages appeared in the log, as shown below. The failure is an order-4 GFP_ATOMIC allocation: the Mem-Info dump shows that every remaining free block of order 4 and above in the system is of type CMA, which is why the request could not be satisfied. In other words, the system still has free memory, but it is all CMA or MOVABLE, and that fragmentation caused the order-4 allocation to fail.
This article analyzes how GFP_ATOMIC pages are allocated and how their reserve is created, and asks whether increasing the number of pages usable by GFP_ATOMIC requests can reduce or eliminate these allocation failures.
[16694.233623] .(1)[2966:WifiHandlerThre]binder: 2567:2966 BC_REQUEST_DEATH_NOTIFICATION death notification already set
[16694.496553] .(0)[2318:wifi@1.0-servic][wlan]Set ALL DBG module log level to [0x2f]
[16694.506604] .(0)[2318:wifi@1.0-servic][wlan]Reset ALL DBG module log level to DEFAULT!
[16694.522205] .(1)[2318:wifi@1.0-servic]wifi@1.0-servic: page allocation failure: order:4, mode:0x484020(GFP_ATOMIC|__GFP_COMP), nodemask=(null)
[16694.535169] -(0)[2318:wifi@1.0-servic]CPU: 0 PID: 2318 Comm: wifi@1.0-servic Tainted: P O 4.19.176 #54
[16694.545833] -(0)[2318:wifi@1.0-servic]Hardware name: UniPhier LD20 Global Board v4 (REF_LD20_GP_V4) (DT)
[16694.555338] -(0)[2318:wifi@1.0-servic]Call trace:
[16694.560060] -(0)[2318:wifi@1.0-servic] dump_backtrace+0x0/0x1b0
[16694.565999] -(0)[2318:wifi@1.0-servic] show_stack+0x24/0x30
[16694.571590] -(0)[2318:wifi@1.0-servic] dump_stack+0xb4/0xec
[16694.577176] -(0)[2318:wifi@1.0-servic] warn_alloc+0xf0/0x158
[16694.582849] -(0)[2318:wifi@1.0-servic] __alloc_pages_nodemask+0xb5c/0xd68
[16694.589657] -(0)[2318:wifi@1.0-servic] kmalloc_order+0x38/0x78
[16694.595504] -(0)[2318:wifi@1.0-servic] kmalloc_order_trace+0x3c/0x110
[16694.602053] -(0)[2318:wifi@1.0-servic] glSetHifInfo+0x590/0x610 [wlan_7961_usb]
[16694.609441] -(0)[2318:wifi@1.0-servic] wlanGetConfig+0x428/0xc98 [wlan_7961_usb]
[16694.616926] -(0)[2318:wifi@1.0-servic] kalP2pIndicateChnlSwitch+0x61c/0x658 [wlan_7961_usb]
[16694.625305] -(0)[2318:wifi@1.0-servic] usb_probe_interface+0x190/0x2e8
[16694.631855] -(0)[2318:wifi@1.0-servic] really_probe+0x3c4/0x420
[16694.637789] -(0)[2318:wifi@1.0-servic] driver_probe_device+0x9c/0x148
[16694.644248] -(0)[2318:wifi@1.0-servic] __driver_attach+0x154/0x158
[16694.650444] -(0)[2318:wifi@1.0-servic] bus_for_each_dev+0x78/0xe0
[16694.656555] -(0)[2318:wifi@1.0-servic] driver_attach+0x30/0x40
[16694.662403] -(0)[2318:wifi@1.0-servic] bus_add_driver+0x1f0/0x288
[16694.668514] -(0)[2318:wifi@1.0-servic] driver_register+0x68/0x118
[16694.674624] -(0)[2318:wifi@1.0-servic] usb_register_driver+0x7c/0x170
[16694.681151] -(0)[2318:wifi@1.0-servic] glRegisterBus+0x88/0xa0 [wlan_7961_usb]
[16694.688456] -(0)[2318:wifi@1.0-servic] init_module+0x2b8/0x2d8 [wlan_7961_usb]
[16694.695700] -(0)[2318:wifi@1.0-servic] do_one_initcall+0x5c/0x260
[16694.701808] -(0)[2318:wifi@1.0-servic] do_init_module+0x64/0x1ec
[16694.707833] -(0)[2318:wifi@1.0-servic] load_module+0x1c7c/0x1ec0
[16694.713854] -(0)[2318:wifi@1.0-servic] __se_sys_finit_module+0xa0/0x100
[16694.720486] -(0)[2318:wifi@1.0-servic] __arm64_sys_finit_module+0x24/0x30
[16694.727295] -(0)[2318:wifi@1.0-servic] el0_svc_common.constprop.0+0x7c/0x198
[16694.734362] -(0)[2318:wifi@1.0-servic] el0_svc_compat_handler+0x2c/0x38
[16694.740993] -(0)[2318:wifi@1.0-servic] el0_svc_compat+0x8/0x34
[16694.748520] .(2)[2318:wifi@1.0-servic]Mem-Info:
[16694.753421] .(2)[2318:wifi@1.0-servic]active_anon:33088 inactive_anon:33107 isolated_anon:0
[16694.753421] active_file:24010 inactive_file:24005 isolated_file:10
[16694.753421] unevictable:1063 dirty:29 writeback:0 unstable:0
[16694.753421] slab_reclaimable:10351 slab_unreclaimable:22174
[16694.753421] mapped:39560 shmem:2208 pagetables:8811 bounce:0
[16694.753421] free:79870 free_pcp:926 free_cma:65727
[16694.796994] .(2)[2318:wifi@1.0-servic]Node 0 active_anon:132352kB inactive_anon:132428kB active_file:96040kB inactive_file:96020kB unevictable:4252kB isolated(anon):0kB isolated(file):40kB mapped:158488kB dirty:116kB writeback:0kB shmem:8832kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[16694.831470] DMA32 free:319728kB min:4556kB low:29180kB high:30484kB active_anon:132384kB inactive_anon:132068kB active_file:96092kB inactive_file:96044kB unevictable:4252kB writepending:116kB present:1382784kB managed:1306504kB mlocked:4252kB kernel_stack:28576kB pagetables:35244kB bounce:0kB free_pcp:3508kB local_pcp:720kB free_cma:263032kB
[16694.863014] .(2)[2318:wifi@1.0-servic]lowmem_reserve[]: 0 0 0
[16694.869695] DMA32: 392*4kB (UMECH) 2000*8kB (UMECH) 1361*16kB (UMECH) 726*32kB (UECH) 20*64kB (C) 11*128kB (C) 1*256kB (C) 1*512kB (C) 2*1024kB (C) 3*2048kB (C) 60*4096kB (C) = 319984kB
[16694.889046] .(2)[2318:wifi@1.0-servic]Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[16694.900476] .(2)[2318:wifi@1.0-servic]53422 total pagecache pages
[16694.906729] .(2)[2318:wifi@1.0-servic]2580 pages in swap cache
[16694.912702] .(2)[2318:wifi@1.0-servic]Swap cache stats: add 3007676, delete 3005100, find 446225/2585766
[16694.922722] .(2)[2318:wifi@1.0-servic]Free swap = 48216kB
[16694.928316] .(2)[2318:wifi@1.0-servic]Total swap = 393212kB
[16694.934441] .(2)[2318:wifi@1.0-servic]345696 pages RAM
[16694.940486] .(2)[2318:wifi@1.0-servic]0 pages HighMem/MovableOnly
[16694.946758] .(2)[2318:wifi@1.0-servic]19070 pages reserved
[16694.952821] .(2)[2318:wifi@1.0-servic]95232 pages cma reserved
[16694.958823] .(2)[2318:wifi@1.0-servic]0 pages hwpoisoned
[16694.973764] .(2)[2318:wifi@1.0-servic]wlan: probe of 1-3:1.3 failed with error -1
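The mode field in the warning line can be decoded with the ___GFP_* bit values from include/linux/gfp.h in 4.19. Below is a minimal user-space sketch of that decoding, not kernel code; the flag values are copied from the 4.19 header:

// decode_gfp.c: decode mode:0x484020 from the failure log (v4.19 bit values)
#include <stdio.h>

#define ___GFP_HIGH           0x20u
#define ___GFP_COMP           0x4000u
#define ___GFP_ATOMIC         0x80000u
#define ___GFP_KSWAPD_RECLAIM 0x400000u

/* in 4.19, GFP_ATOMIC = (__GFP_HIGH | __GFP_ATOMIC | __GFP_KSWAPD_RECLAIM) */
#define GFP_ATOMIC (___GFP_HIGH | ___GFP_ATOMIC | ___GFP_KSWAPD_RECLAIM)

int main(void)
{
	unsigned int mode = 0x484020u;	/* from the failure log */

	printf("GFP_ATOMIC|__GFP_COMP = 0x%x\n", GFP_ATOMIC | ___GFP_COMP);
	printf("matches logged mode:    %s\n",
	       mode == (GFP_ATOMIC | ___GFP_COMP) ? "yes" : "no");
	return 0;
}

Because __GFP_ATOMIC is set and __GFP_DIRECT_RECLAIM is not, gfp_to_alloc_flags() gives this request ALLOC_HARDER, which is exactly the condition the highatomic reservation path below checks for. An order-4 request needs a contiguous 64kB block, and the buddy dump shows that every free block of 64kB and above is marked (C), i.e. CMA, unusable for this unmovable kernel allocation.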
2. Code implementation
For background on the introduction of the MIGRATE_HIGHATOMIC type, see the LWN article "Some kernel memory-allocation improvements":
There is still value in reserving blocks of memory for high-order allocations, though; fragmentation is still a concern in current kernels. So another part of Mel's patch set creates a new MIGRATE_HIGHATOMIC reserve that serves this purpose, but in a different way. Initially, this reserve contains no page blocks at all. If a high-order allocation cannot be satisfied without breaking up a previously whole page block, that block will be marked as being part of the high-order atomic reserve; thereafter, only higher-order allocations (and only high-priority ones at that) can be satisfied from that page block.
As this introduction notes, fragmentation caused by high-order page allocations has long been a concern, so Mel Gorman created a new page type, MIGRATE_HIGHATOMIC, to curb excessive fragmentation: only high-order (and high-priority) allocations can be satisfied from such a pageblock.
2.1 Reserving highatomic pages
By its definition, MIGRATE_HIGHATOMIC has the same value as MIGRATE_PCPTYPES:
// include/linux/mmzone.h
enum migratetype {
	MIGRATE_UNMOVABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RECLAIMABLE,
#ifdef CONFIG_CMA
	/*
	 * MIGRATE_CMA migration type is designed to mimic the way
	 * ZONE_MOVABLE works. Only movable pages can be allocated
	 * from MIGRATE_CMA pageblocks and page allocator never
	 * implicitly change migration type of MIGRATE_CMA pageblock.
	 *
	 * The way to use it is to change migratetype of a range of
	 * pageblocks to MIGRATE_CMA which can be done by
	 * __free_pageblock_cma() function. What is important though
	 * is that a range of pageblocks must be aligned to
	 * MAX_ORDER_NR_PAGES should biggest page be bigger then
	 * a single pageblock.
	 */
	MIGRATE_CMA,
#endif
	MIGRATE_PCPTYPES, /* the number of types on the pcp lists */
	MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
#ifdef CONFIG_MEMORY_ISOLATION
	MIGRATE_ISOLATE, /* can't allocate from here */
#endif
	MIGRATE_TYPES
};
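Since the per-cpu (pcp) free lists only hold migratetypes numerically below MIGRATE_PCPTYPES, making MIGRATE_HIGHATOMIC equal to MIGRATE_PCPTYPES means highatomic pages never land on the pcp lists; they live only on the zone's free_area lists. A small sketch replicating the enum above (assuming CONFIG_CMA and CONFIG_MEMORY_ISOLATION are both enabled) shows the resulting values:

// migratetype_demo.c: enumerator values with CONFIG_CMA and
// CONFIG_MEMORY_ISOLATION enabled, mirroring the enum quoted above
#include <stdio.h>

enum migratetype {
	MIGRATE_UNMOVABLE,	/* 0 */
	MIGRATE_MOVABLE,	/* 1 */
	MIGRATE_RECLAIMABLE,	/* 2 */
	MIGRATE_CMA,		/* 3 */
	MIGRATE_PCPTYPES,	/* 4: the number of types on the pcp lists */
	MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,	/* 4: same value */
	MIGRATE_ISOLATE,	/* 5 */
	MIGRATE_TYPES		/* 6 */
};

int main(void)
{
	printf("MIGRATE_PCPTYPES   = %d\n", MIGRATE_PCPTYPES);
	printf("MIGRATE_HIGHATOMIC = %d\n", MIGRATE_HIGHATOMIC);
	printf("MIGRATE_TYPES      = %d\n", MIGRATE_TYPES);
	return 0;
}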
Pageblocks of this type are reserved by the following function.
// mm/page_alloc.c
/*
 * Reserve a pageblock for exclusive use of high-order atomic allocations if
 * there are no empty page blocks that contain a page with a suitable order
 */
static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
				unsigned int alloc_order)
{
	int mt;
	unsigned long max_managed, flags;

	/*
	 * Limit the number reserved to 1 pageblock or roughly 1% of a zone.
	 * Check is race-prone but harmless.
	 */
	max_managed = (zone->managed_pages / 100) + pageblock_nr_pages;
	if (zone->nr_reserved_highatomic >= max_managed)
		return;

	spin_lock_irqsave(&zone->lock, flags);

	/* Recheck the nr_reserved_highatomic limit under the lock */
	if (zone->nr_reserved_highatomic >= max_managed)
		goto out_unlock;

	/* Yoink! */
	mt = get_pageblock_migratetype(page);
	if (!is_migrate_highatomic(mt) && !is_migrate_isolate(mt)
	    && !is_migrate_cma(mt)) {
		zone->nr_reserved_highatomic += pageblock_nr_pages;
		set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
	}

out_unlock:
	spin_unlock_irqrestore(&zone->lock, flags);
}
The call chain is get_page_from_freelist() --> reserve_highatomic_pageblock():
// mm/page_alloc.c
/*
 * get_page_from_freelist goes through the zonelist trying to allocate
 * a page.
 */
static struct page *
get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
						const struct alloc_context *ac)
{
	......
try_this_zone:
		page = rmqueue(ac->preferred_zoneref->zone, zone, order,
				gfp_mask, alloc_flags, ac->migratetype);
		if (page) {
			prep_new_page(page, order, gfp_mask, alloc_flags);

			/*
			 * If this is a high-order atomic allocation then check
			 * if the pageblock should be reserved for the future
			 */
			if (unlikely(order && (alloc_flags & ALLOC_HARDER)))
				reserve_highatomic_pageblock(page, zone, order);

			return page;
		} else {
	......
As the code shows, after a page is successfully allocated, the allocator inspects the request's flags. If order > 0 and ALLOC_HARDER is set, the request is a high-priority atomic high-order allocation, so the pageblock the page came from is reserved as highatomic for future allocations of this kind. The reserve must not grow without bound, however, or ordinary allocations would be starved when memory gets tight; it is therefore capped at roughly 1% of the zone's managed pages plus one pageblock:
max_managed = (zone->managed_pages / 100) + pageblock_nr_pages;
Finally, move_freepages_block() moves the block's free pages onto the MIGRATE_HIGHATOMIC free list.
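Plugging in the numbers from the Mem-Info dump above gives a feel for this cap: DMA32 reports managed:1306504kB, and the hugepages_size=2048kB line implies 2MB pageblocks, i.e. pageblock_nr_pages = 512 with 4kB pages (both values assumed from that particular log). A sketch of the arithmetic only, not kernel code:

// highatomic_cap.c: evaluate the reserve_highatomic_pageblock() cap
// for the DMA32 zone in the log above (values assumed from that log)
#include <stdio.h>

int main(void)
{
	unsigned long managed_pages = 1306504 / 4;	/* managed:1306504kB, 4kB pages */
	unsigned long pageblock_nr_pages = 512;		/* 2MB pageblock / 4kB page */

	/* same formula as in reserve_highatomic_pageblock() */
	unsigned long max_managed = managed_pages / 100 + pageblock_nr_pages;

	printf("max_managed = %lu pages (~%lu kB, ~%lu pageblocks)\n",
	       max_managed, max_managed * 4,
	       max_managed / pageblock_nr_pages);
	return 0;
}

For this board that works out to 3778 pages, i.e. at most about 7 pageblocks (~15MB) can be reserved as highatomic in the zone.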
2.2 Releasing MIGRATE_HIGHATOMIC pageblocks
Kernel code often exhibits symmetry: reserve_highatomic_pageblock() moves a pageblock into the highatomic reserve, and unreserve_highatomic_pageblock() takes it back out. The call chain is __alloc_pages_nodemask() --> __alloc_pages_slowpath() --> __alloc_pages_direct_reclaim() --> unreserve_highatomic_pageblock(). In short, when get_page_from_freelist() fails to get memory directly, the allocation enters the slowpath, which performs direct reclaim; if the retry after reclaim still fails, the highatomic reserve is released back to the regular free lists so the pending allocation can use it.
// mm/page_alloc.c
/* The really slow allocator path where we enter direct reclaim */
static inline struct page *
__alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
		unsigned int alloc_flags, const struct alloc_context *ac,
		unsigned long *did_some_progress)
{
	struct page *page = NULL;
	bool drained = false;

	*did_some_progress = __perform_reclaim(gfp_mask, order, ac);
	if (unlikely(!(*did_some_progress)))
		return NULL;

retry:
	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);

	/*
	 * If an allocation failed after direct reclaim, it could be because
	 * pages are pinned on the per-cpu lists or in high alloc reserves.
	 * Shrink them them and try again
	 */
	if (!page && !drained) {
		unreserve_highatomic_pageblock(ac, false);
		drain_all_pages(NULL);
		drained = true;
		goto retry;
	}

	return page;
}
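Note that here unreserve_highatomic_pageblock() is called with force=false, so each zone keeps at least one reserved pageblock. In 4.19 there is a second call site in should_reclaim_retry(): once reclaim has made no progress for MAX_RECLAIM_RETRIES rounds, the reserve is drained completely with force=true as a last resort before declaring OOM. The relevant fragment of should_reclaim_retry() looks roughly like this (verify against your exact tree):

	/*
	 * Make sure we converge to OOM if we cannot make any progress
	 * several times in the row.
	 */
	if (*no_progress_loops > MAX_RECLAIM_RETRIES) {
		/* Before OOM, exhaust highatomic_reserve */
		return unreserve_highatomic_pageblock(ac, true);
	}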
Below is the definition of unreserve_highatomic_pageblock():
// mm/page_alloc.c
/*
 * Used when an allocation is about to fail under memory pressure. This
 * potentially hurts the reliability of high-order allocations when under
 * intense memory pressure but failed atomic allocations should be easier
 * to recover from than an OOM.
 *
 * If @force is true, try to unreserve a pageblock even though highatomic
 * pageblock is exhausted.
 */
static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
						bool force)
{
	struct zonelist *zonelist = ac->zonelist;
	unsigned long flags;
	struct zoneref *z;
	struct zone *zone;
	struct page *page;
	int order;
	bool ret;

	for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx,
								ac->nodemask) {
		/*
		 * Preserve at least one pageblock unless memory pressure
		 * is really high.
		 */
		if (!force && zone->nr_reserved_highatomic <=
					pageblock_nr_pages)
			continue;

		spin_lock_irqsave(&zone->lock, flags);
		for (order = 0; order < MAX_ORDER; order++) {
			struct free_area *area = &(zone->free_area[order]);

			page = list_first_entry_or_null(
					&area->free_list[MIGRATE_HIGHATOMIC],
					struct page, lru);
			if (!page)
				continue;

			/*
			 * In page freeing path, migratetype change is racy so
			 * we can counter several free pages in a pageblock
			 * in this loop althoug we changed the pageblock type
			 * from highatomic to ac->migratetype. So we should
			 * adjust the count once.
			 */
			if (is_migrate_highatomic_page(page)) {
				/*
				 * It should never happen but changes to
				 * locking could inadvertently allow a per-cpu
				 * drain to add pages to MIGRATE_HIGHATOMIC
				 * while unreserving so be safe and watch for
				 * underflows.
				 */
				zone->nr_reserved_highatomic -= min(
						pageblock_nr_pages,
						zone->nr_reserved_highatomic);
			}

			/*
			 * Convert to ac->migratetype and avoid the normal
			 * pageblock stealing heuristics. Minimally, the caller
			 * is doing the work and needs the pages. More
			 * importantly, if the block was always converted to
			 * MIGRATE_UNMOVABLE or another type then the number
			 * of pageblocks that cannot be completely freed
			 * may increase.
			 */
			set_pageblock_migratetype(page, ac->migratetype);
			ret = move_freepages_block(zone, page, ac->migratetype,
									NULL);
			if (ret) {
				spin_unlock_irqrestore(&zone->lock, flags);
				return ret;
			}
		}
		spin_unlock_irqrestore(&zone->lock, flags);
	}

	return false;
}
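On a live system, the highatomic reserve can be observed through /proc/pagetypeinfo, which breaks the free lists down per migratetype and includes a HighAtomic row per zone. A small user-space sketch that filters those rows (illustrative only):

// show_highatomic.c: print the HighAtomic rows of /proc/pagetypeinfo
#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[512];
	FILE *fp = fopen("/proc/pagetypeinfo", "r");

	if (!fp) {
		perror("/proc/pagetypeinfo");
		return 1;
	}
	/* keep only the per-order free counts for HighAtomic pageblocks */
	while (fgets(line, sizeof(line), fp)) {
		if (strstr(line, "HighAtomic"))
			fputs(line, stdout);
	}
	fclose(fp);
	return 0;
}

This makes it easy to check whether the reserve ever gets populated on the failing system, which bears directly on the question raised at the beginning.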
References:
Some kernel memory-allocation improvements, LWN.net.