Interpreting the ESR information printed on a kernel exception

adtxl
2022-08-03

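1. Problem description

When the kernel hits an exception, it prints state information that is useful for debugging. Below is the output of one such kernel exception.

Environment: Linux kernel 4.19; arm64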

[10758.661444] .(1)[14489:kworker/1:0H]Unhandled fault at 0xffffffc07f8f0000
[10758.668671] .(1)[14489:kworker/1:0H]Mem abort info:
[10758.674891] .(1)[14489:kworker/1:0H]  ESR = 0x96000170
[10758.680107] .(1)[14489:kworker/1:0H]  Exception class = DABT (current EL), IL = 32 bits
[10758.688829] .(1)[14489:kworker/1:0H]  SET = 0, FnV = 0
[10758.695334] .(1)[14489:kworker/1:0H]  EA = 0, S1PTW = 0
[10758.700682] .(1)[14489:kworker/1:0H]Data abort info:
[10758.706239] .(1)[14489:kworker/1:0H]  ISV = 0, ISS = 0x00000170
[10758.713516] .(1)[14489:kworker/1:0H]  CM = 1, WnR = 1
[10758.718823] .(1)[14489:kworker/1:0H]swapper pgtable: 4k pages, 39-bit VAs, pgdp = 000000007bcc0f8b
[10758.728199] .(1)[14489:kworker/1:0H][ffffffc07f8f0000] pgd=00000000fff99803, pud=00000000fff99803, pmd=00000000fff36803, pte=00780000ff8f0f13
[10758.743744] -(1)[14489:kworker/1:0H]Internal error: TLB conflict abort: 96000170 [#1] PREEMPT SMP
[10758.752661] -(1)[14489:kworker/1:0H]Modules linked in: wlan_7961_usb(O) mtk7921_btusb(O) snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_hwdep sni_ave combined_init dsp_uniphier_ld20(O) dsp_uniphier(O) hantrodec(O) hx170dec(O) acodec_uniphier(O) uniphier_spdif_tx(O) uniphier_thru_io(O) avtv_com mss tsd avout uniphier_thru_in(O) tvadjust uniphier_thru_out(O) uniphier_evea(O) tvp snd_soc_uniphier_aio2013(O) power snd_soc_bd28623(O) pincon uniphier_ld20_mn884434_helene hdmirx ttuner uniphier_dvb sc1501a helene stream tsport demux dwc3_uniphier dwc3 audio video eeprom mali_kbase vbi vpedrv_ld20(PO) vacancy tuner_pwr svl vocdrv_ld20(O) psi msc led_drv iccard hsc_udl(O) counter vio_rm(O) cec exivdrv_ld20(PO) amixer cpcl_ld20(PO) aesdescramble fbmem fa_uniphier(O) stream_com descramble set ion_uniphier(O) piedrv_sc devglue(O) pow(O)
[10758.825944]  map_reg(O) [last unloaded: wlan_7961_usb]
[10758.831096] -(1)[14489:kworker/1:0H]Process kworker/1:0H (pid: 14489, stack limit = 0x00000000f24a4da3)
[10758.840520] -(1)[14489:kworker/1:0H]CPU: 1 PID: 14489 Comm: kworker/1:0H Tainted: P           O      4.19.176 #56
[10758.850812] -(1)[14489:kworker/1:0H]Hardware name: UniPhier LD20 Global Board v4 (REF_LD20_GP_V4) (DT)
[10758.860160] -(1)[14489:kworker/1:0H]Workqueue: kblockd blk_mq_run_work_fn
[10758.866966] -(1)[14489:kworker/1:0H]pstate: 40400005 (nZcv daif +PAN -UAO)
[10758.873864] -(1)[14489:kworker/1:0H]pc : __dma_inv_area+0x40/0x58
[10758.879970] -(1)[14489:kworker/1:0H]lr : __swiotlb_map_sg_attrs+0xa4/0xd0
[10758.886773] -(1)[14489:kworker/1:0H]sp : ffffff801780ba80
[10758.892185] -(1)[14489:kworker/1:0H]x29: ffffff801780ba80 x28: ffffffc0799f9940 
[10758.899603] -(1)[14489:kworker/1:0H]x27: 0000000000000002 x26: ffffffc07998a618 
[10758.907019] -(1)[14489:kworker/1:0H]x25: ffffffc07998a600 x24: ffffff8009058000 
[10758.914435] -(1)[14489:kworker/1:0H]x23: 0000000000000002 x22: 0000000000000076 
[10758.921849] -(1)[14489:kworker/1:0H]x21: ffffffc07b1b8810 x20: 0000000000000027 
[10758.929265] -(1)[14489:kworker/1:0H]x19: ffffffc079a334c0 x18: ffffff8008f4a580 
[10758.936682] -(1)[14489:kworker/1:0H]x17: 0000000000000000 x16: 0000000000000000 
[10758.944098] -(1)[14489:kworker/1:0H]x15: 0000000000000000 x14: 0000000000000000 
[10758.951513] -(1)[14489:kworker/1:0H]x13: 0000000000000000 x12: 00000000ca898000 
[10758.958928] -(1)[14489:kworker/1:0H]x11: ffffffc078dd4400 x10: 0000000000000ac0 
[10758.966345] -(1)[14489:kworker/1:0H]x9 : ffffffbf010d4180 x8 : ffffffc0799f9bb0 
[10758.973760] -(1)[14489:kworker/1:0H]x7 : 0000000000000000 x6 : 000000000000003f 
[10758.981175] -(1)[14489:kworker/1:0H]x5 : ffffffffffffffff x4 : 0000000080000000 
[10758.988592] -(1)[14489:kworker/1:0H]x3 : 000000000000003f x2 : 0000000000000040 
[10758.996008] -(1)[14489:kworker/1:0H]x1 : ffffffc07f8f1000 x0 : ffffffc07f8f0000 
[10759.003427] -(1)[14489:kworker/1:0H]Call trace:
[10759.007972] -(1)[14489:kworker/1:0H] __dma_inv_area+0x40/0x58
[10759.013733] -(1)[14489:kworker/1:0H] sdhci_pre_dma_transfer+0x154/0x1a8
[10759.020364] -(1)[14489:kworker/1:0H] sdhci_pre_req+0x58/0x60
[10759.026038] -(1)[14489:kworker/1:0H] mmc_blk_mq_issue_rq+0x354/0x778
[10759.032410] -(1)[14489:kworker/1:0H] mmc_mq_queue_rq+0x148/0x260
[10759.038435] -(1)[14489:kworker/1:0H] blk_mq_dispatch_rq_list+0xb8/0x5a8
[10759.045066] -(1)[14489:kworker/1:0H] blk_mq_do_dispatch_sched+0xa4/0x138
[10759.051786] -(1)[14489:kworker/1:0H] blk_mq_sched_dispatch_requests+0x104/0x198
[10759.059116] -(1)[14489:kworker/1:0H] __blk_mq_run_hw_queue+0xb4/0x120
[10759.065572] -(1)[14489:kworker/1:0H] blk_mq_run_work_fn+0x28/0x38
[10759.071684] -(1)[14489:kworker/1:0H] process_one_work+0x1a8/0x420
[10759.077791] -(1)[14489:kworker/1:0H] worker_thread+0x54/0x3e8
[10759.083553] -(1)[14489:kworker/1:0H] kthread+0x120/0x160
[10759.088880] -(1)[14489:kworker/1:0H] ret_from_fork+0x10/0x1c
[10759.094560] -(1)[14489:kworker/1:0H]Code: 8a230000 54000060 d50b7e20 14000002 (d5087620) 
[10759.102760] -(1)[14489:kworker/1:0H]---[ end trace b0ded5eee466978f ]---
[10759.109476] -(1)[14489:kworker/1:0H]Kernel panic - not syncing: Fatal exception
[10759.116808] -(1)[14489:kworker/1:0H]SMP: stopping secondary CPUs
[10759.122831] -(1)[14489:kworker/1:0H]Kernel Offset: 0x80000 from 0xffffff8008000000
[10759.130418] -(1)[14489:kworker/1:0H]CPU features: 0x00000000,2180600c
[10759.136873] -(1)[14489:kworker/1:0H]Memory Limit: none
[10759.142025] -(1)[14489:kworker/1:0H]Rebooting in 5 seconds..

2. ESR-related information

[10758.661444] .(1)[14489:kworker/1:0H]Unhandled fault at 0xffffffc07f8f0000
[10758.668671] .(1)[14489:kworker/1:0H]Mem abort info:
[10758.674891] .(1)[14489:kworker/1:0H]  ESR = 0x96000170
[10758.680107] .(1)[14489:kworker/1:0H]  Exception class = DABT (current EL), IL = 32 bits
[10758.688829] .(1)[14489:kworker/1:0H]  SET = 0, FnV = 0
[10758.695334] .(1)[14489:kworker/1:0H]  EA = 0, S1PTW = 0
[10758.700682] .(1)[14489:kworker/1:0H]Data abort info:
[10758.706239] .(1)[14489:kworker/1:0H]  ISV = 0, ISS = 0x00000170
[10758.713516] .(1)[14489:kworker/1:0H]  CM = 1, WnR = 1
[10758.718823] .(1)[14489:kworker/1:0H]swapper pgtable: 4k pages, 39-bit VAs, pgdp = 000000007bcc0f8b
[10758.728199] .(1)[14489:kworker/1:0H][ffffffc07f8f0000] pgd=00000000fff99803, pud=00000000fff99803, pmd=00000000fff36803, pte=00780000ff8f0f13
[10758.743744] -(1)[14489:kworker/1:0H]Internal error: TLB conflict abort: 96000170 [#1] PREEMPT SMP

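The first line states briefly why the kernel panicked: Unhandled fault at 0xffffffc07f8f0000, meaning that an access to this virtual address raised a fault the kernel could not handle.
Next comes the Mem abort info block, which shows that the ESR register (ESR_EL1, the Exception Syndrome Register) holds 0x96000170. This value can be decoded with the following website: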

https://esr.arm64.dev/

The result is shown below:

[Image: decoded fields of ESR 0x96000170 as displayed by esr.arm64.dev]

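The log lines that follow (Exception class, IL, SET, FnV, EA, S1PTW, ISV, ISS, CM, WnR) are decoded from this same register, and the page-table entries for the faulting address are printed after them.
Finally the reason for the error is printed: Internal error: TLB conflict abort: 96000170 [#1] PREEMPT SMP. Architecturally, a TLB conflict abort is reported when an address lookup matches more than one TLB entry. I am not sure what exactly caused it in this case; my guess is that something went wrong with TLB synchronization. Note that CM = 1 means the abort was raised by a cache maintenance instruction, which is consistent with pc pointing into __dma_inv_area.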
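The decoding done by the website can also be reproduced by hand. The following is a minimal sketch, assuming the ESR_EL1 data-abort field layout from the Arm architecture manual; the function name and the two lookup tables are hypothetical and deliberately cover only the values that appear in this log.

```python
# Partial lookup tables: only the codes needed for this log are listed.
EC_NAMES = {0x24: "DABT (lower EL)", 0x25: "DABT (current EL)"}
DFSC_NAMES = {0x30: "TLB conflict abort"}


def decode_data_abort_esr(esr):
    """Split an ESR_EL1 value for a data abort into its named fields."""
    ec = (esr >> 26) & 0x3F          # exception class, bits [31:26]
    iss = esr & 0x1FFFFFF            # instruction specific syndrome, bits [24:0]
    dfsc = iss & 0x3F                # data fault status code, bits [5:0]
    return {
        "EC": EC_NAMES.get(ec, hex(ec)),
        "IL": 32 if (esr >> 25) & 1 else 16,
        "ISS": iss,
        "ISV": (iss >> 24) & 1,
        "SET": (iss >> 11) & 3,
        "FnV": (iss >> 10) & 1,
        "EA": (iss >> 9) & 1,
        "CM": (iss >> 8) & 1,
        "S1PTW": (iss >> 7) & 1,
        "WnR": (iss >> 6) & 1,
        "DFSC": DFSC_NAMES.get(dfsc, hex(dfsc)),
    }


fields = decode_data_abort_esr(0x96000170)
for name, value in fields.items():
    print(f"{name:6}= {value}")
```

Each field can be compared one by one with the Mem abort info and Data abort info lines in the log.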

Reference: https://developer.arm.com/documentation/ddi0406/c/System-Level-Architecture/Virtual-Memory-System-Architecture--VMSA-/Translation-Lookaside-Buffers--TLBs-/TLB-conflict-aborts

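3. Driver information

Next the kernel lists the modules (drivers) that are currently loaded. The letters in parentheses after a module name are that module's taint flags, for example (O) for an out-of-tree module and (P) for a proprietary one. [last unloaded: wlan_7961_usb] names the module that was unloaded most recently.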

[10758.752661] -(1)[14489:kworker/1:0H]Modules linked in: wlan_7961_usb(O) mtk7921_btusb(O) snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_hwdep sni_ave combined_init dsp_uniphier_ld20(O) dsp_uniphier(O) hantrodec(O) hx170dec(O) acodec_uniphier(O) uniphier_spdif_tx(O) uniphier_thru_io(O) avtv_com mss tsd avout uniphier_thru_in(O) tvadjust uniphier_thru_out(O) uniphier_evea(O) tvp snd_soc_uniphier_aio2013(O) power snd_soc_bd28623(O) pincon uniphier_ld20_mn884434_helene hdmirx ttuner uniphier_dvb sc1501a helene stream tsport demux dwc3_uniphier dwc3 audio video eeprom mali_kbase vbi vpedrv_ld20(PO) vacancy tuner_pwr svl vocdrv_ld20(O) psi msc led_drv iccard hsc_udl(O) counter vio_rm(O) cec exivdrv_ld20(PO) amixer cpcl_ld20(PO) aesdescramble fbmem fa_uniphier(O) stream_com descramble set ion_uniphier(O) piedrv_sc devglue(O) pow(O)
[10758.825944]  map_reg(O) [last unloaded: wlan_7961_usb]

4. CPU and register information

[10758.831096] -(1)[14489:kworker/1:0H]Process kworker/1:0H (pid: 14489, stack limit = 0x00000000f24a4da3)
[10758.840520] -(1)[14489:kworker/1:0H]CPU: 1 PID: 14489 Comm: kworker/1:0H Tainted: P           O      4.19.176 #56
[10758.850812] -(1)[14489:kworker/1:0H]Hardware name: UniPhier LD20 Global Board v4 (REF_LD20_GP_V4) (DT)
[10758.860160] -(1)[14489:kworker/1:0H]Workqueue: kblockd blk_mq_run_work_fn
[10758.866966] -(1)[14489:kworker/1:0H]pstate: 40400005 (nZcv daif +PAN -UAO)
[10758.873864] -(1)[14489:kworker/1:0H]pc : __dma_inv_area+0x40/0x58
[10758.879970] -(1)[14489:kworker/1:0H]lr : __swiotlb_map_sg_attrs+0xa4/0xd0
[10758.886773] -(1)[14489:kworker/1:0H]sp : ffffff801780ba80
[10758.892185] -(1)[14489:kworker/1:0H]x29: ffffff801780ba80 x28: ffffffc0799f9940 
[10758.899603] -(1)[14489:kworker/1:0H]x27: 0000000000000002 x26: ffffffc07998a618 
[10758.907019] -(1)[14489:kworker/1:0H]x25: ffffffc07998a600 x24: ffffff8009058000 
[10758.914435] -(1)[14489:kworker/1:0H]x23: 0000000000000002 x22: 0000000000000076 
[10758.921849] -(1)[14489:kworker/1:0H]x21: ffffffc07b1b8810 x20: 0000000000000027 
[10758.929265] -(1)[14489:kworker/1:0H]x19: ffffffc079a334c0 x18: ffffff8008f4a580 
[10758.936682] -(1)[14489:kworker/1:0H]x17: 0000000000000000 x16: 0000000000000000 
[10758.944098] -(1)[14489:kworker/1:0H]x15: 0000000000000000 x14: 0000000000000000 
[10758.951513] -(1)[14489:kworker/1:0H]x13: 0000000000000000 x12: 00000000ca898000 
[10758.958928] -(1)[14489:kworker/1:0H]x11: ffffffc078dd4400 x10: 0000000000000ac0 
[10758.966345] -(1)[14489:kworker/1:0H]x9 : ffffffbf010d4180 x8 : ffffffc0799f9bb0 
[10758.973760] -(1)[14489:kworker/1:0H]x7 : 0000000000000000 x6 : 000000000000003f 
[10758.981175] -(1)[14489:kworker/1:0H]x5 : ffffffffffffffff x4 : 0000000080000000 
[10758.988592] -(1)[14489:kworker/1:0H]x3 : 000000000000003f x2 : 0000000000000040 
[10758.996008] -(1)[14489:kworker/1:0H]x1 : ffffffc07f8f1000 x0 : ffffffc07f8f0000 

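The number after CPU shows that the error occurred on CPU 1; PID is the process ID of the faulting task, 14489; the kernel taint flags are P and O; and the kernel version is 4.19.176.

Taint reasons include a proprietary module being loaded (P), a module being force-loaded (F), a machine check exception having occurred (M), a bad page having been detected (B), and so on.

Each flag that applies is shown as a letter after Tainted:, as in Tainted: P           O in this log. If no flag is set, the line shows Not tainted instead.

The meaning of each Tainted flag can be found in kernel/panic.c in the kernel source: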

Flag  Description
'G'   if all modules loaded have a GPL or compatible license.
'P'   if any proprietary module has been loaded. Modules without a MODULE_LICENSE or with a MODULE_LICENSE that is not recognised by insmod as GPL compatible are assumed to be proprietary.
'F'   if any module was force loaded by "insmod -f".
'S'   if the Oops occurred on an SMP kernel running on hardware that hasn't been certified as safe to run multiprocessor. Currently this occurs only on various Athlons that are not SMP capable.
'R'   if a module was force unloaded by "rmmod -f".
'M'   if any processor has reported a Machine Check Exception.
'B'   if a page-release function has found a bad page reference or some unexpected page flags.
'U'   if a user or user application specifically requested that the Tainted flag be set.
'D'   if the kernel has died recently, i.e. there was an OOPS or BUG.
'W'   if a warning has previously been issued by the kernel.
'C'   if a staging module / driver has been loaded.
'I'   if the kernel is working around a severe bug in the platform's firmware (BIOS or similar).
'O'   if an externally-built ("out-of-tree") module has been loaded.
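As an illustration of how the flag field can be read mechanically, here is a small sketch that pulls the taint letters out of the CPU: line of this log. The helper name parse_taint and the shortened descriptions are hypothetical, and the rule used to find the end of the flag field (the first token that is not purely uppercase letters) is an assumption that happens to fit this log format.

```python
# Shortened descriptions of some of the flags in the table above.
TAINT_FLAGS = {
    "G": "all loaded modules are GPL compatible",
    "P": "proprietary module loaded",
    "F": "module force loaded",
    "R": "module force unloaded",
    "M": "machine check exception reported",
    "B": "bad page reference found",
    "D": "kernel has died recently",
    "W": "warning issued earlier",
    "O": "out-of-tree module loaded",
}


def parse_taint(cpu_line):
    """Return the taint letters found on a 'CPU: ... Tainted: ...' line."""
    marker = "Tainted: "
    if marker not in cpu_line:
        return []  # for example, the kernel printed "Not tainted"
    letters = []
    for token in cpu_line.split(marker, 1)[1].split():
        # Assumption: flag tokens are purely uppercase letters, and the
        # kernel release (which contains digits) ends the flag field.
        if not (token.isalpha() and token.isupper()):
            break
        letters.extend(token)
    return letters


line = "CPU: 1 PID: 14489 Comm: kworker/1:0H Tainted: P           O      4.19.176 #56"
flags = parse_taint(line)
for flag in flags:
    print(flag, "-", TAINT_FLAGS.get(flag, "unknown flag"))
```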

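Hardware name is the name of the hardware platform.
stack limit is the lowest address of the task's kernel stack. The kernel prints it with %p, so the value shown in the log is a hashed pointer rather than the real address.

After that come the register values at the time of the exception: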

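PC: the program counter. In this dump it holds the address of the instruction that raised the exception. On AArch64 the PC is not a general-purpose register and cannot be written with mov; a subroutine returns with ret, which branches to the address held in LR.

LR: the link register. It holds the subroutine's return address, that is, the address of the instruction at which the caller resumes after the subroutine returns. The bl instruction writes the return address to LR when it branches to the subroutine.

SP: the stack pointer. It holds the address of the current top of the stack.

X0~X7: pass arguments to a subroutine and return its results. They are caller-saved, so a subroutine may use them without preserving them. Additional arguments are passed on the stack, and a 64-bit return value is placed in X0.

X8: the indirect result location register. It holds the address of the memory in which a large structure is returned. A subroutine does not need to preserve it.

X9~X15: temporary (caller-saved) registers. A subroutine may use them without preserving them.

X16~X17: the intra-procedure-call scratch registers (IP0 and IP1). Linker-inserted veneers may modify them between a call and the first instruction of the callee, so a subroutine does not need to preserve them, and values should not be kept in them across a call.

X18: the platform register. Its use is platform-specific, so avoid it where possible.

X19~X28: callee-saved registers. A subroutine that uses them must save and restore them.

X29: the frame pointer (FP). It links stack frames together and must be preserved.

X30: the link register (LR), which holds the subroutine's return address.

X31: there is no general-purpose register X31. Register number 31 encodes either the stack pointer (SP) or the zero register (XZR), depending on the instruction.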
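When comparing register values against the fault address, it can help to load the dump into a dictionary. The sketch below is one way to do that, assuming the log format shown in this post; the regular expression and the helper name parse_registers are hypothetical, and only three lines of the dump are included to keep the example short.

```python
import re

# Matches "x29: ffffff801780ba80", "x1 : ffffffc07f8f1000" and "sp : ...".
# pc and lr are printed as symbol+offset in this log, so they are skipped.
REG_RE = re.compile(r"\b(x\d+|sp)\s*:\s*([0-9a-f]{16})\b")


def parse_registers(log_text):
    """Return {register name: value} for every register found in the text."""
    return {name: int(value, 16) for name, value in REG_RE.findall(log_text)}


log = """\
[10758.886773] -(1)[14489:kworker/1:0H]sp : ffffff801780ba80
[10758.892185] -(1)[14489:kworker/1:0H]x29: ffffff801780ba80 x28: ffffffc0799f9940
[10758.996008] -(1)[14489:kworker/1:0H]x1 : ffffffc07f8f1000 x0 : ffffffc07f8f0000
"""

regs = parse_registers(log)
fault_address = 0xffffffc07f8f0000  # from "Unhandled fault at ..."

print("x0 is the fault address:", regs["x0"] == fault_address)
print("x1 - x0 =", hex(regs["x1"] - regs["x0"]))
```

In this log x0 equals the address reported in the Unhandled fault line, and x1 lies exactly 0x1000 bytes above it.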

After the registers comes the stack backtrace (Call trace), from which the chain of function calls can be read.
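Each backtrace entry has the form symbol+offset/size, where offset is the position inside the function and size is the function's total length in bytes. A minimal sketch for splitting such an entry follows; the helper name parse_frame and the regular expression are hypothetical.

```python
import re

# symbol, then "+0x<offset>/0x<size>", both in hexadecimal.
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frame(log_line):
    """Return (symbol, offset, size) for a call-trace line, or None."""
    match = FRAME_RE.search(log_line)
    if match is None:
        return None
    symbol, offset, size = match.groups()
    return symbol, int(offset, 16), int(size, 16)


frame = parse_frame("[10759.007972] -(1)[14489:kworker/1:0H] __dma_inv_area+0x40/0x58")
print(frame)
```

With a vmlinux that contains debug information, the symbol and offset can then be looked up in gdb, for example with list *(__dma_inv_area+0x40).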

[10759.094560] -(1)[14489:kworker/1:0H]Code: 8a230000 54000060 d50b7e20 14000002 (d5087620) 
[10759.102760] -(1)[14489:kworker/1:0H]---[ end trace b0ded5eee466978f ]---
[10759.109476] -(1)[14489:kworker/1:0H]Kernel panic - not syncing: Fatal exception
[10759.116808] -(1)[14489:kworker/1:0H]SMP: stopping secondary CPUs
[10759.122831] -(1)[14489:kworker/1:0H]Kernel Offset: 0x80000 from 0xffffff8008000000
[10759.130418] -(1)[14489:kworker/1:0H]CPU features: 0x00000000,2180600c
[10759.136873] -(1)[14489:kworker/1:0H]Memory Limit: none
[10759.142025] -(1)[14489:kworker/1:0H]Rebooting in 5 seconds..

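Code shows the machine code around the PC at the time of the error: 20 bytes in total, made up of the four 32-bit instructions before the PC followed by the instruction at the PC. The instruction in parentheses is the one that faulted.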
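The faulting word can be decoded by hand to confirm what the CPU was executing. The sketch below assumes the A64 SYS instruction encoding (op1 in bits [18:16], CRn in [15:12], CRm in [11:8], op2 in [7:5], Rt in [4:0]); the helper names are hypothetical and the alias table lists only the two cache maintenance operations that occur in this Code line.

```python
# Partial alias table: (op1, CRn, CRm, op2) -> mnemonic.
SYS_NAMES = {(0, 7, 6, 1): "dc ivac", (3, 7, 14, 1): "dc civac"}


def parse_code_line(line):
    """Return (instruction words, index of the word printed in parentheses)."""
    words, fault_index = [], None
    for token in line.split("Code:", 1)[1].split():
        if token.startswith("("):
            fault_index = len(words)
        words.append(int(token.strip("()"), 16))
    return words, fault_index


def decode_sys(insn):
    """Decode an A64 SYS instruction; return None for anything else."""
    if (insn & 0xFFF80000) != 0xD5080000:
        return None
    op1, crn = (insn >> 16) & 0x7, (insn >> 12) & 0xF
    crm, op2, rt = (insn >> 8) & 0xF, (insn >> 5) & 0x7, insn & 0x1F
    name = SYS_NAMES.get((op1, crn, crm, op2),
                         f"sys #{op1}, c{crn}, c{crm}, #{op2}")
    return f"{name}, x{rt}"  # Rt = 31 (xzr) is not special-cased here


line = ("[10759.094560] -(1)[14489:kworker/1:0H]Code: "
        "8a230000 54000060 d50b7e20 14000002 (d5087620) ")
words, fault_index = parse_code_line(line)
for i, word in enumerate(words):
    offset = 4 * (i - fault_index)  # byte offset relative to the PC
    print(f"pc{offset:+d}: {word:08x}  {decode_sys(word) or ''}")
```

The word in parentheses decodes to a data cache invalidate by virtual address on x0, which matches both CM = 1 in the ESR and the fault address held in x0.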

