About GFP_ATOMIC Memory

By adtxl / 2022-07-27

Kernel version: Android common kernel 4.19.176

1. Introducing the problem

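During monkey testing of app switching, page allocation failure messages showed up in the log, as shown below. The request that failed was an order-4 GFP_ATOMIC allocation. The memory information printed with it shows that every free block of order 4 or higher left in the system is of CMA type, and that is why the allocation failed. In other words, the system still has free memory, but what is left is CMA or MOVABLE memory. Memory fragmentation caused the order-4 allocation to fail.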

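This article looks at how pages for GFP_ATOMIC allocations are allocated and reserved, to see whether increasing the number of pages reserved for GFP_ATOMIC could reduce or eliminate these allocation failures.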

[16694.233623] .(1)[2966:WifiHandlerThre]binder: 2567:2966 BC_REQUEST_DEATH_NOTIFICATION death notification already set
[16694.496553] .(0)[2318:wifi@1.0-servic][wlan]Set ALL DBG module log level to [0x2f]
[16694.506604] .(0)[2318:wifi@1.0-servic][wlan]Reset ALL DBG module log level to DEFAULT!
[16694.522205] .(1)[2318:wifi@1.0-servic]wifi@1.0-servic: page allocation failure: order:4, mode:0x484020(GFP_ATOMIC|__GFP_COMP), nodemask=(null)
[16694.535169] -(0)[2318:wifi@1.0-servic]CPU: 0 PID: 2318 Comm: wifi@1.0-servic Tainted: P           O      4.19.176 #54
[16694.545833] -(0)[2318:wifi@1.0-servic]Hardware name: UniPhier LD20 Global Board v4 (REF_LD20_GP_V4) (DT)
[16694.555338] -(0)[2318:wifi@1.0-servic]Call trace:
[16694.560060] -(0)[2318:wifi@1.0-servic] dump_backtrace+0x0/0x1b0
[16694.565999] -(0)[2318:wifi@1.0-servic] show_stack+0x24/0x30
[16694.571590] -(0)[2318:wifi@1.0-servic] dump_stack+0xb4/0xec
[16694.577176] -(0)[2318:wifi@1.0-servic] warn_alloc+0xf0/0x158
[16694.582849] -(0)[2318:wifi@1.0-servic] __alloc_pages_nodemask+0xb5c/0xd68
[16694.589657] -(0)[2318:wifi@1.0-servic] kmalloc_order+0x38/0x78
[16694.595504] -(0)[2318:wifi@1.0-servic] kmalloc_order_trace+0x3c/0x110
[16694.602053] -(0)[2318:wifi@1.0-servic] glSetHifInfo+0x590/0x610 [wlan_7961_usb]
[16694.609441] -(0)[2318:wifi@1.0-servic] wlanGetConfig+0x428/0xc98 [wlan_7961_usb]
[16694.616926] -(0)[2318:wifi@1.0-servic] kalP2pIndicateChnlSwitch+0x61c/0x658 [wlan_7961_usb]
[16694.625305] -(0)[2318:wifi@1.0-servic] usb_probe_interface+0x190/0x2e8
[16694.631855] -(0)[2318:wifi@1.0-servic] really_probe+0x3c4/0x420
[16694.637789] -(0)[2318:wifi@1.0-servic] driver_probe_device+0x9c/0x148
[16694.644248] -(0)[2318:wifi@1.0-servic] __driver_attach+0x154/0x158
[16694.650444] -(0)[2318:wifi@1.0-servic] bus_for_each_dev+0x78/0xe0
[16694.656555] -(0)[2318:wifi@1.0-servic] driver_attach+0x30/0x40
[16694.662403] -(0)[2318:wifi@1.0-servic] bus_add_driver+0x1f0/0x288
[16694.668514] -(0)[2318:wifi@1.0-servic] driver_register+0x68/0x118
[16694.674624] -(0)[2318:wifi@1.0-servic] usb_register_driver+0x7c/0x170
[16694.681151] -(0)[2318:wifi@1.0-servic] glRegisterBus+0x88/0xa0 [wlan_7961_usb]
[16694.688456] -(0)[2318:wifi@1.0-servic] init_module+0x2b8/0x2d8 [wlan_7961_usb]
[16694.695700] -(0)[2318:wifi@1.0-servic] do_one_initcall+0x5c/0x260
[16694.701808] -(0)[2318:wifi@1.0-servic] do_init_module+0x64/0x1ec
[16694.707833] -(0)[2318:wifi@1.0-servic] load_module+0x1c7c/0x1ec0
[16694.713854] -(0)[2318:wifi@1.0-servic] __se_sys_finit_module+0xa0/0x100
[16694.720486] -(0)[2318:wifi@1.0-servic] __arm64_sys_finit_module+0x24/0x30
[16694.727295] -(0)[2318:wifi@1.0-servic] el0_svc_common.constprop.0+0x7c/0x198
[16694.734362] -(0)[2318:wifi@1.0-servic] el0_svc_compat_handler+0x2c/0x38
[16694.740993] -(0)[2318:wifi@1.0-servic] el0_svc_compat+0x8/0x34
[16694.748520] .(2)[2318:wifi@1.0-servic]Mem-Info:
[16694.753421] .(2)[2318:wifi@1.0-servic]active_anon:33088 inactive_anon:33107 isolated_anon:0
[16694.753421]  active_file:24010 inactive_file:24005 isolated_file:10
[16694.753421]  unevictable:1063 dirty:29 writeback:0 unstable:0
[16694.753421]  slab_reclaimable:10351 slab_unreclaimable:22174
[16694.753421]  mapped:39560 shmem:2208 pagetables:8811 bounce:0
[16694.753421]  free:79870 free_pcp:926 free_cma:65727
[16694.796994] .(2)[2318:wifi@1.0-servic]Node 0 active_anon:132352kB inactive_anon:132428kB active_file:96040kB inactive_file:96020kB unevictable:4252kB isolated(anon):0kB isolated(file):40kB mapped:158488kB dirty:116kB writeback:0kB shmem:8832kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[16694.831470] DMA32 free:319728kB min:4556kB low:29180kB high:30484kB active_anon:132384kB inactive_anon:132068kB active_file:96092kB inactive_file:96044kB unevictable:4252kB writepending:116kB present:1382784kB managed:1306504kB mlocked:4252kB kernel_stack:28576kB pagetables:35244kB bounce:0kB free_pcp:3508kB local_pcp:720kB free_cma:263032kB
[16694.863014] .(2)[2318:wifi@1.0-servic]lowmem_reserve[]: 0 0 0
[16694.869695] DMA32: 392*4kB (UMECH) 2000*8kB (UMECH) 1361*16kB (UMECH) 726*32kB (UECH) 20*64kB (C) 11*128kB (C) 1*256kB (C) 1*512kB (C) 2*1024kB (C) 3*2048kB (C) 60*4096kB (C) = 319984kB
[16694.889046] .(2)[2318:wifi@1.0-servic]Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[16694.900476] .(2)[2318:wifi@1.0-servic]53422 total pagecache pages
[16694.906729] .(2)[2318:wifi@1.0-servic]2580 pages in swap cache
[16694.912702] .(2)[2318:wifi@1.0-servic]Swap cache stats: add 3007676, delete 3005100, find 446225/2585766
[16694.922722] .(2)[2318:wifi@1.0-servic]Free swap  = 48216kB
[16694.928316] .(2)[2318:wifi@1.0-servic]Total swap = 393212kB
[16694.934441] .(2)[2318:wifi@1.0-servic]345696 pages RAM
[16694.940486] .(2)[2318:wifi@1.0-servic]0 pages HighMem/MovableOnly
[16694.946758] .(2)[2318:wifi@1.0-servic]19070 pages reserved
[16694.952821] .(2)[2318:wifi@1.0-servic]95232 pages cma reserved
[16694.958823] .(2)[2318:wifi@1.0-servic]0 pages hwpoisoned
[16694.973764] .(2)[2318:wifi@1.0-servic]wlan: probe of 1-3:1.3 failed with error -1

2. Code implementation

For background on why the MIGRATE_HIGHATOMIC type was introduced, see "Some kernel memory-allocation improvements".

There is still value in reserving blocks of memory for high-order allocations, though; fragmentation is still a concern in current kernels. So another part of Mel's patch set creates a new MIGRATE_HIGHATOMIC reserve that serves this purpose, but in a different way. Initially, this reserve contains no page blocks at all. If a high-order allocation cannot be satisfied without breaking up a previously whole page block, that block will be marked as being part of the high-order atomic reserve; thereafter, only higher-order allocations (and only high-priority ones at that) can be satisfied from that page block.

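As the excerpt explains, fragmentation caused by high-order page allocations has long been a concern, so Mel created a new migrate type, MIGRATE_HIGHATOMIC, to keep fragmentation from defeating such requests. Only allocations that are both high-order and high-priority can be satisfied from a pageblock of this type.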

2.1 Reserving highatomic pages

In the definition, MIGRATE_HIGHATOMIC has the same value as MIGRATE_PCPTYPES:

// include/linux/mmzone.h
enum migratetype {
    MIGRATE_UNMOVABLE,
    MIGRATE_MOVABLE,
    MIGRATE_RECLAIMABLE,
#ifdef CONFIG_CMA
    /*
     * MIGRATE_CMA migration type is designed to mimic the way
     * ZONE_MOVABLE works.  Only movable pages can be allocated
     * from MIGRATE_CMA pageblocks and page allocator never
     * implicitly change migration type of MIGRATE_CMA pageblock.
     *
     * The way to use it is to change migratetype of a range of
     * pageblocks to MIGRATE_CMA which can be done by
     * __free_pageblock_cma() function.  What is important though
     * is that a range of pageblocks must be aligned to
     * MAX_ORDER_NR_PAGES should biggest page be bigger then
     * a single pageblock.
     */
    MIGRATE_CMA,
#endif
    MIGRATE_PCPTYPES, /* the number of types on the pcp lists */
    MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
#ifdef CONFIG_MEMORY_ISOLATION
    MIGRATE_ISOLATE,    /* can't allocate from here */
#endif
    MIGRATE_TYPES
};

Pages of this type are reserved by the following function.

// mm/page_alloc.c
/*
 * Reserve a pageblock for exclusive use of high-order atomic allocations if
 * there are no empty page blocks that contain a page with a suitable order
 */
static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
                unsigned int alloc_order)
{
    int mt;
    unsigned long max_managed, flags;

    /*
     * Limit the number reserved to 1 pageblock or roughly 1% of a zone.
     * Check is race-prone but harmless.
     */
    max_managed = (zone->managed_pages / 100) + pageblock_nr_pages;
    if (zone->nr_reserved_highatomic >= max_managed)
        return;

    spin_lock_irqsave(&zone->lock, flags);

    /* Recheck the nr_reserved_highatomic limit under the lock */
    if (zone->nr_reserved_highatomic >= max_managed)
        goto out_unlock;

    /* Yoink! */
    mt = get_pageblock_migratetype(page);
    if (!is_migrate_highatomic(mt) && !is_migrate_isolate(mt)
        && !is_migrate_cma(mt)) {
        zone->nr_reserved_highatomic += pageblock_nr_pages;
        set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
        move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
    }

out_unlock:
    spin_unlock_irqrestore(&zone->lock, flags);
}

The call chain is get_page_from_freelist() --> reserve_highatomic_pageblock():

// mm/page_alloc.c
/*
 * get_page_from_freelist goes through the zonelist trying to allocate
 * a page.
 */
static struct page *
get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
                        const struct alloc_context *ac)
{
......
try_this_zone:
        page = rmqueue(ac->preferred_zoneref->zone, zone, order,
                gfp_mask, alloc_flags, ac->migratetype);
        if (page) {
            prep_new_page(page, order, gfp_mask, alloc_flags);

            /*
             * If this is a high-order atomic allocation then check
             * if the pageblock should be reserved for the future
             */
            if (unlikely(order && (alloc_flags & ALLOC_HARDER)))
                reserve_highatomic_pageblock(page, zone, order);

            return page;
        } else {
......

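As the code above shows, once a page has been allocated the allocator looks at how it was requested: if the order is non-zero and alloc_flags contains ALLOC_HARDER (the flag the allocator sets for atomic requests), this was a high-order atomic allocation, and the pageblock containing the page is added to the highatomic reserve. The number of such pageblocks cannot grow without bound, though, or other allocations would have nothing left to use when memory gets tight.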

So the reserve is capped at roughly 1/100 of the pages managed by the zone, plus one pageblock:

max_managed = (zone->managed_pages / 100) + pageblock_nr_pages;

Finally, move_freepages_block() is called to move the free pages of that pageblock onto the MIGRATE_HIGHATOMIC free lists.

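2.2 Releasing MIGRATE_HIGHATOMIC

Kernel functions tend to come in symmetric pairs: reserve_highatomic_pageblock() adds a pageblock to the highatomic reserve, and unreserve_highatomic_pageblock() removes one from it.

The call chain is __alloc_pages_nodemask() --> __alloc_pages_slowpath() --> __alloc_pages_direct_reclaim() --> unreserve_highatomic_pageblock(). Put simply, when get_page_from_freelist() cannot get memory directly, the allocation goes through the slow path, which performs memory reclaim and also takes back highatomic pageblocks.
Without going into more detail: when the slow-path allocation still fails after direct reclaim, highatomic pageblocks are converted back to the requested migrate type and the allocation is retried using those pages.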

/* The really slow allocator path where we enter direct reclaim */
static inline struct page *
__alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
        unsigned int alloc_flags, const struct alloc_context *ac,
        unsigned long *did_some_progress)
{
    struct page *page = NULL;
    bool drained = false;

    *did_some_progress = __perform_reclaim(gfp_mask, order, ac);
    if (unlikely(!(*did_some_progress)))
        return NULL;

retry:
    page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);

    /*
     * If an allocation failed after direct reclaim, it could be because
     * pages are pinned on the per-cpu lists or in high alloc reserves.
     * Shrink them them and try again
     */
    if (!page && !drained) {
        unreserve_highatomic_pageblock(ac, false);
        drain_all_pages(NULL);
        drained = true;
        goto retry;
    }

    return page;
}

The definition of unreserve_highatomic_pageblock() is as follows:

/*
 * Used when an allocation is about to fail under memory pressure. This
 * potentially hurts the reliability of high-order allocations when under
 * intense memory pressure but failed atomic allocations should be easier
 * to recover from than an OOM.
 *
 * If @force is true, try to unreserve a pageblock even though highatomic
 * pageblock is exhausted.
 */
static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
                        bool force)
{
    struct zonelist *zonelist = ac->zonelist;
    unsigned long flags;
    struct zoneref *z;
    struct zone *zone;
    struct page *page;
    int order;
    bool ret;

    for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx,
                                ac->nodemask) {
        /*
         * Preserve at least one pageblock unless memory pressure
         * is really high.
         */
        if (!force && zone->nr_reserved_highatomic <=
                    pageblock_nr_pages)
            continue;

        spin_lock_irqsave(&zone->lock, flags);
        for (order = 0; order < MAX_ORDER; order++) {
            struct free_area *area = &(zone->free_area[order]);

            page = list_first_entry_or_null(
                    &area->free_list[MIGRATE_HIGHATOMIC],
                    struct page, lru);
            if (!page)
                continue;

            /*
             * In page freeing path, migratetype change is racy so
             * we can counter several free pages in a pageblock
             * in this loop althoug we changed the pageblock type
             * from highatomic to ac->migratetype. So we should
             * adjust the count once.
             */
            if (is_migrate_highatomic_page(page)) {
                /*
                 * It should never happen but changes to
                 * locking could inadvertently allow a per-cpu
                 * drain to add pages to MIGRATE_HIGHATOMIC
                 * while unreserving so be safe and watch for
                 * underflows.
                 */
                zone->nr_reserved_highatomic -= min(
                        pageblock_nr_pages,
                        zone->nr_reserved_highatomic);
            }

            /*
             * Convert to ac->migratetype and avoid the normal
             * pageblock stealing heuristics. Minimally, the caller
             * is doing the work and needs the pages. More
             * importantly, if the block was always converted to
             * MIGRATE_UNMOVABLE or another type then the number
             * of pageblocks that cannot be completely freed
             * may increase.
             */
            set_pageblock_migratetype(page, ac->migratetype);
            ret = move_freepages_block(zone, page, ac->migratetype,
                                    NULL);
            if (ret) {
                spin_unlock_irqrestore(&zone->lock, flags);
                return ret;
            }
        }
        spin_unlock_irqrestore(&zone->lock, flags);
    }

    return false;
}

References:

  1. 为什么会有MIGRATE_PCPTYPES (Why MIGRATE_PCPTYPES exists)
