Bug: alloc_contig_range: [c1dab, c1dac) PFNs busy

By adtxl / 2022-02-16

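1. Problem description

The problem shows up during continuous camera recording: audio data is lost. When recording video at HD resolution (1280*720), the sound disappears after recording for a while.

A colleague's root-cause analysis was that a large amount of PCM data piles up in the puller of MediaCodecSource, while the audio encoder receives no data or receives it very slowly. Once data does arrive, the encoding itself finishes quickly.

One suspicion was that this is related to the following log: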

[  358.217486] .(1)[3765:win.aac.encoder]alloc_contig_range: [c1dab, c1dac) PFNs busy
[  365.715042] .(1)[3765:win.aac.encoder]alloc_contig_range: [c1db7, c1db8) PFNs busy
[  374.078024] .(1)[3765:win.aac.encoder]alloc_contig_range: [c1d99, c1d9a) PFNs busy
[  384.576737] .(1)[3765:win.aac.encoder]alloc_contig_range: [c1da1, c1da2) PFNs busy
[  385.807956] .(3)[3765:win.aac.encoder]alloc_contig_range: [c1db3, c1db4) PFNs busy
[  393.270187] .(0)[3765:win.aac.encoder]alloc_contig_range: [c1d9c, c1d9d) PFNs busy
...

2. Problem analysis

This section looks at why "PFNs busy" is reported. The code is in page_alloc.c.

Call flow:

alloc_contig_range()-->test_pages_isolated()

// kernel/common/mm/page_alloc.c


/**
 * alloc_contig_range() -- tries to allocate given range of pages
 * @start:  start PFN to allocate
 * @end:    one-past-the-last PFN to allocate
 * @migratetype:    migratetype of the underlaying pageblocks (either
 *          #MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
 *          in range must have the same migratetype and it must
 *          be either of the two.
 * @gfp_mask:   GFP mask to use during compaction
 *
 * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
 * aligned.  The PFN range must belong to a single zone.
 *
 * The first thing this routine does is attempt to MIGRATE_ISOLATE all
 * pageblocks in the range.  Once isolated, the pageblocks should not
 * be modified by others.
 *
 * Returns zero on success or negative error code.  On success all
 * pages which PFN is in [start, end) are allocated for the caller and
 * need to be freed with free_contig_range().
 */
int alloc_contig_range(unsigned long start, unsigned long end,
               unsigned migratetype, gfp_t gfp_mask)
{
......
    /* Make sure the range is really isolated. */
    if (test_pages_isolated(outer_start, end, false)) {
        pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
            __func__, outer_start, end);
        ret = -EBUSY;
        goto done;
    }
......
}

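Here, test_pages_isolated() (defined in mm/page_isolation.c) checks whether the memory range to be allocated is isolated. It returns 0 if the check succeeds and -EBUSY otherwise: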

/* Caller should ensure that requested range is in a single zone */
int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
            bool skip_hwpoisoned_pages)
{
    unsigned long pfn, flags;
    struct page *page;
    struct zone *zone;

    /*
     * Note: pageblock_nr_pages != MAX_ORDER. Then, chunks of free pages
     * are not aligned to pageblock_nr_pages.
     * Then we just check migratetype first.
     */
    for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
        page = __first_valid_page(pfn, pageblock_nr_pages);
        if (page && !is_migrate_isolate_page(page))
            break;
    }
    page = __first_valid_page(start_pfn, end_pfn - start_pfn);
    if ((pfn < end_pfn) || !page)
        return -EBUSY;
    /* Check all pages are free or marked as ISOLATED */
    zone = page_zone(page);
    spin_lock_irqsave(&zone->lock, flags);
    pfn = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
                        skip_hwpoisoned_pages);
    spin_unlock_irqrestore(&zone->lock, flags);

    trace_test_pages_isolated(start_pfn, end_pfn, pfn);

    return pfn < end_pfn ? -EBUSY : 0;
}
