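Exploring BPF LSM support on aarch64 with ftrace

Linux閱碼場 · Source: Linux閱碼場 · 2024-01-25 09:30

Translator's note

While developing eBPF programs in a Linux virtual machine on a MacBook M2, I found that some LSM-type eBPF programs would not run, even on Ubuntu Server with a 5.15 kernel. Clearly the aarch64 and x86_64 kernels differ in the features they support. While trying to pin down those differences I came across this article, which gives a more direct view of how LSM eBPF support differs between the two CPU architectures.

Original article

This blog post is a summary of our internal research on `BPF LSM` support on `aarch64` in Linux. If you are not familiar with the kernel codebase, starting to read the kernel sources is very hard, so we decided to publish this post and show our approach, since it may help anyone who wants to explore kernel internals.

Introduction

On x86_64 we already use BPF LSM, while on aarch64 we rely on Kprobes, so we wanted to know which kernel features are missing for BPF LSM to work on aarch64.

We have dug into the kernel sources many times, but usually we search for something that already exists in order to understand how it works. In this case we were looking for something that does not exist: code paths that return an error because they are not implemented.

Recalling Steven Rostedt's talk on how to start learning the Linux kernel, we started from ftrace (and the tools built on top of the tracing infrastructure) to understand what happens when we load an unsupported BPF program into the kernel.

The problem

This is the output of our software pulsar[2] when we try to load a BPF LSM program on an aarch64 Linux 5.15 kernel: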

    root@pine64-1:/home/exein# ./pulsar-enterprise-exec pulsard
    [2023-02-16T1445Z INFO pulsar::daemon] Starting module process-monitor
    [2023-02-16T1445Z INFO pulsar::daemon] Starting module file-system-monitor
    [2023-02-16T1446Z INFO pulsar::daemon] Starting module network-monitor
    [2023-02-16T1446Z INFO pulsar::daemon] Starting module logger
    [2023-02-16T1446Z INFO pulsar::daemon] Starting module rules-engine
    [2023-02-16T1446Z INFO pulsar::daemon] Starting module desktop-notifier
    [2023-02-16T1446Z ERROR pulsar::module_manager] Module error in file-system-monitor: failed program attach lsm path_mknod

    Caused by:
        0: `bpf_raw_tracepoint_open` failed
        1: No error information (os error 524)
    [2023-02-16T1446Z INFO pulsar::daemon] Starting module anomaly-detection
    [2023-02-16T1446Z INFO pulsar::daemon] Starting module malware-detection
    [2023-02-16T1446Z ERROR pulsar::module_manager] Module error in malware-detection: /var/lib/pulsar/malware_detection/models/parameters.json not found
    [2023-02-16T1446Z INFO pulsar::daemon] Starting module platform-connector
    [2023-02-16T1446Z INFO platform_connector::client] Connected to https://platform-dev-instance.exein.io:8001/
    [2023-02-16T1446Z INFO pulsar::daemon] Starting module threat-response
    [2023-02-16T1446Z ERROR pulsar::module_manager] Module error in network-monitor: failed program attach lsm socket_bind

    Caused by:
        0: `bpf_raw_tracepoint_open` failed
        1: No error information (os error 524)

When we try to load the BPF program for the path_mknod LSM hook, pulsar fails with error 524, that is, ENOTSUPP. Let's dig into the problem.
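The log says "No error information" because 524 is a kernel-internal errno (`ENOTSUPP` in `include/linux/errno.h`) that userspace `<errno.h>` does not define, so the C library has no message for it. A minimal sketch of how a loader could special-case it; `describe_bpf_errno` and `KERNEL_ENOTSUPP` are hypothetical names introduced here, not part of pulsar or libbpf:

```c
#include <errno.h>
#include <string.h>

/* Kernel-internal errno value; userspace headers do not export it. */
#define KERNEL_ENOTSUPP 524

/* Hypothetical helper: returns a readable description for the errno
 * reported by a failed bpf() call. */
const char *describe_bpf_errno(int err)
{
    if (err == KERNEL_ENOTSUPP)
        return "ENOTSUPP: operation not supported by this kernel or architecture";
    /* Everything else is a regular errno the C library knows about. */
    return strerror(err);
}
```

With such a helper, the two failures in the log above would be reported as "not supported" instead of as an unknown error.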

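Note: at the time of this research we could not find a prebuilt aarch64 kernel with BPF and BTF enabled, so we had to compile a custom kernel. We also enabled the tracing options and the function_graph plugin in order to use the tools below.
All experiments were run on a Pine A64 with a custom Armbian[3] image.
These images ship a custom kernel with a standard Ubuntu 22.04 LTS Jammy userspace.

Tools

To investigate the problem we used the following tools:

bpftrace[4]: a BPF-based tool that attaches probes dynamically using a custom C-like language.

trace-cmd[5]: a wrapper around the tracefs filesystem that interacts with the ftrace infrastructure.

To use these tools you need some options enabled in the Linux kernel; see the official documentation for the full requirements.

Note: other tools can do the same job, for example funcgraph and kprobe from perf-tools[6].

Linux 5.15

Now let's use these tools to see what happens when we try to load our BPF program on kernel 5.15.

From this point to the end of the article we use the probe binary instead of pulsar, because it is simpler. As a quick summary of how it works, here is its command-line help: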

    exein@pine64-1:~$ ./probe
    Test runner for eBPF programs

    Usage: probe [OPTIONS]

    Commands:
      file-system-monitor  Watch file creations
      process-monitor      Watch process events (fork/exec/exit)
      network-monitor      Watch network events
      help                 Print this message or the help of the given subcommand(s)

    Options:
      -v, --verbose
      -h, --help     Print help
      -V, --version  Print version

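In these examples we will try to load the file-system-monitor probe.

By running the following commands we can see the function graph of __sys_bpf, the entry point of the BPF system call:

    trace-cmd record -p function_graph -g __sys_bpf ./probe file-system-monitor
    trace-cmd report

The output is a very large function graph, too large to paste here. Since we hit an error, we are interested in the last functions called before the program stopped. Here are the last lines of the trace-cmd report output: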

```shell
...
tokio-runtime-w-1666 [003] 1318.058019: funcgraph_entry:              |       bpf_trampoline_link_prog() {
tokio-runtime-w-1666 [003] 1318.058020: funcgraph_entry:     2.292 us |         bpf_attach_type_to_tramp();
tokio-runtime-w-1666 [003] 1318.058024: funcgraph_entry:     1.250 us |         mutex_lock();
tokio-runtime-w-1666 [003] 1318.058028: funcgraph_entry:              |         bpf_trampoline_update() {
tokio-runtime-w-1666 [003] 1318.058030: funcgraph_entry:              |           kmem_cache_alloc_trace() {
tokio-runtime-w-1666 [003] 1318.058031: funcgraph_entry:     1.167 us |             should_failslab();
tokio-runtime-w-1666 [003] 1318.058036: funcgraph_exit:      6.792 us |           }
tokio-runtime-w-1666 [003] 1318.058039: funcgraph_entry:              |           kmem_cache_alloc_trace() {
tokio-runtime-w-1666 [003] 1318.058042: funcgraph_entry:     2.750 us |             should_failslab();
tokio-runtime-w-1666 [003] 1318.058046: funcgraph_exit:      6.417 us |           }
tokio-runtime-w-1666 [003] 1318.058048: funcgraph_entry:     2.708 us |           bpf_jit_charge_modmem();
tokio-runtime-w-1666 [003] 1318.058053: funcgraph_entry:              |           bpf_jit_alloc_exec_page() {
tokio-runtime-w-1666 [003] 1318.058055: funcgraph_entry:              |             bpf_jit_alloc_exec() {
tokio-runtime-w-1666 [003] 1318.058057: funcgraph_entry:              |               vmalloc() {
tokio-runtime-w-1666 [003] 1318.058059: funcgraph_entry:              |                 __vmalloc_node() {
tokio-runtime-w-1666 [003] 1318.058061: funcgraph_entry:              |                   __vmalloc_node_range() {
tokio-runtime-w-1666 [003] 1318.058064: funcgraph_entry:              |                     __get_vm_area_node.constprop.64() {
tokio-runtime-w-1666 [003] 1318.058067: funcgraph_entry:              |                       kmem_cache_alloc_node_trace() {
tokio-runtime-w-1666 [003] 1318.058069: funcgraph_entry:     1.459 us |                         should_failslab();
tokio-runtime-w-1666 [003] 1318.058073: funcgraph_exit:      6.292 us |                       }
tokio-runtime-w-1666 [003] 1318.058075: funcgraph_entry:              |                       alloc_vmap_area() {
tokio-runtime-w-1666 [003] 1318.058077: funcgraph_entry:              |                         kmem_cache_alloc_node() {
tokio-runtime-w-1666 [003] 1318.058079: funcgraph_entry:     1.167 us |                           should_failslab();
tokio-runtime-w-1666 [003] 1318.058085: funcgraph_exit:      7.625 us |                         }
tokio-runtime-w-1666 [003] 1318.058088: funcgraph_entry:              |                         kmem_cache_alloc_node() {
tokio-runtime-w-1666 [003] 1318.058089: funcgraph_entry:     1.208 us |                           should_failslab();
tokio-runtime-w-1666 [003] 1318.058092: funcgraph_exit:      4.584 us |                         }
tokio-runtime-w-1666 [003] 1318.058104: funcgraph_entry:              |                         kmem_cache_free() {
tokio-runtime-w-1666 [003] 1318.058107: funcgraph_entry:     2.084 us |                           __slab_free();
tokio-runtime-w-1666 [003] 1318.058110: funcgraph_exit:      5.667 us |                         }
tokio-runtime-w-1666 [003] 1318.058112: funcgraph_entry:     6.375 us |                         insert_vmap_area.constprop.74();
tokio-runtime-w-1666 [003] 1318.058119: funcgraph_exit:    +44.667 us |                       }
tokio-runtime-w-1666 [003] 1318.058122: funcgraph_exit:    +58.250 us |                     }
tokio-runtime-w-1666 [003] 1318.058124: funcgraph_entry:              |                     __kmalloc_node() {
tokio-runtime-w-1666 [003] 1318.058125: funcgraph_entry:     1.625 us |                       kmalloc_slab();
tokio-runtime-w-1666 [003] 1318.058128: funcgraph_entry:     1.167 us |                       should_failslab();
tokio-runtime-w-1666 [003] 1318.058131: funcgraph_exit:      7.208 us |                     }
tokio-runtime-w-1666 [003] 1318.058133: funcgraph_entry:              |                     alloc_pages() {
tokio-runtime-w-1666 [003] 1318.058135: funcgraph_entry:     1.583 us |                       get_task_policy.part.48();
tokio-runtime-w-1666 [003] 1318.058138: funcgraph_entry:     1.500 us |                       policy_node();
tokio-runtime-w-1666 [003] 1318.058141: funcgraph_entry:     1.209 us |                       policy_nodemask();
tokio-runtime-w-1666 [003] 1318.058143: funcgraph_entry:              |                       __alloc_pages() {
tokio-runtime-w-1666 [003] 1318.058145: funcgraph_entry:     1.458 us |                         should_fail_alloc_page();
tokio-runtime-w-1666 [003] 1318.058147: funcgraph_entry:              |                         get_page_from_freelist() {
tokio-runtime-w-1666 [003] 1318.058150: funcgraph_entry:     1.583 us |                           prep_new_page();
tokio-runtime-w-1666 [003] 1318.058153: funcgraph_exit:      5.459 us |                         }
tokio-runtime-w-1666 [003] 1318.058154: funcgraph_exit:    +10.542 us |                       }
tokio-runtime-w-1666 [003] 1318.058155: funcgraph_exit:    +22.083 us |                     }
tokio-runtime-w-1666 [003] 1318.058157: funcgraph_entry:              |                     __cond_resched() {
tokio-runtime-w-1666 [003] 1318.058158: funcgraph_entry:     1.833 us |                       rcu_all_qs();
tokio-runtime-w-1666 [003] 1318.058161: funcgraph_exit:      4.167 us |                     }
tokio-runtime-w-1666 [003] 1318.058166: funcgraph_entry:     5.542 us |                     vmap_pages_range_noflush();
tokio-runtime-w-1666 [003] 1318.058173: funcgraph_exit:   !112.375 us |                   }
tokio-runtime-w-1666 [003] 1318.058175: funcgraph_exit:   !116.000 us |                 }
tokio-runtime-w-1666 [003] 1318.058176: funcgraph_exit:   !119.292 us |               }
tokio-runtime-w-1666 [003] 1318.058177: funcgraph_exit:   !122.542 us |             }
tokio-runtime-w-1666 [003] 1318.058179: funcgraph_entry:              |             find_vm_area() {
tokio-runtime-w-1666 [003] 1318.058180: funcgraph_entry:     1.375 us |               find_vmap_area();
tokio-runtime-w-1666 [003] 1318.058183: funcgraph_exit:      4.333 us |             }
tokio-runtime-w-1666 [003] 1318.058185: funcgraph_entry:              |             set_memory_x() {
tokio-runtime-w-1666 [003] 1318.058186: funcgraph_entry:              |               change_memory_common() {
tokio-runtime-w-1666 [003] 1318.058188: funcgraph_entry:              |                 find_vm_area() {
tokio-runtime-w-1666 [003] 1318.058189: funcgraph_entry:     1.333 us |                   find_vmap_area();
tokio-runtime-w-1666 [003] 1318.058192: funcgraph_exit:      3.875 us |                 }
tokio-runtime-w-1666 [003] 1318.058193: funcgraph_entry:              |                 vm_unmap_aliases() {
tokio-runtime-w-1666 [003] 1318.058194: funcgraph_entry:              |                   _vm_unmap_aliases.part.58() {
tokio-runtime-w-1666 [003] 1318.058196: funcgraph_entry:     1.542 us |                     rcu_read_unlock_strict();
tokio-runtime-w-1666 [003] 1318.058199: funcgraph_entry:     1.208 us |                     rcu_read_unlock_strict();
tokio-runtime-w-1666 [003] 1318.058202: funcgraph_entry:     1.166 us |                     rcu_read_unlock_strict();
tokio-runtime-w-1666 [003] 1318.058205: funcgraph_entry:     1.208 us |                     rcu_read_unlock_strict();
tokio-runtime-w-1666 [003] 1318.058207: funcgraph_entry:     1.208 us |                     mutex_lock();
tokio-runtime-w-1666 [003] 1318.058210: funcgraph_entry:              |                     purge_fragmented_blocks_allcpus() {
tokio-runtime-w-1666 [003] 1318.058212: funcgraph_entry:     1.500 us |                       rcu_read_unlock_strict();
tokio-runtime-w-1666 [003] 1318.058214: funcgraph_entry:     1.500 us |                       rcu_read_unlock_strict();
tokio-runtime-w-1666 [003] 1318.058217: funcgraph_entry:     1.500 us |                       rcu_read_unlock_strict();
tokio-runtime-w-1666 [003] 1318.058220: funcgraph_entry:     1.167 us |                       rcu_read_unlock_strict();
tokio-runtime-w-1666 [003] 1318.058222: funcgraph_exit:    +11.917 us |                     }
tokio-runtime-w-1666 [003] 1318.058224: funcgraph_entry:              |                     __purge_vmap_area_lazy() {
tokio-runtime-w-1666 [003] 1318.058232: funcgraph_entry:              |                       kmem_cache_free() {
tokio-runtime-w-1666 [003] 1318.058234: funcgraph_entry:     1.250 us |                         __slab_free();
tokio-runtime-w-1666 [003] 1318.058237: funcgraph_exit:      4.791 us |                       }
tokio-runtime-w-1666 [003] 1318.058241: funcgraph_entry:     1.209 us |                       __cond_resched_lock();
tokio-runtime-w-1666 [003] 1318.058244: funcgraph_exit:    +19.625 us |                     }
tokio-runtime-w-1666 [003] 1318.058245: funcgraph_entry:     1.167 us |                     mutex_unlock();
tokio-runtime-w-1666 [003] 1318.058247: funcgraph_exit:    +53.042 us |                   }
tokio-runtime-w-1666 [003] 1318.058248: funcgraph_exit:    +55.625 us |                 }
tokio-runtime-w-1666 [003] 1318.058250: funcgraph_entry:              |                 __change_memory_common() {
tokio-runtime-w-1666 [003] 1318.058251: funcgraph_entry:              |                   apply_to_page_range() {
tokio-runtime-w-1666 [003] 1318.058253: funcgraph_entry:              |                     __apply_to_page_range() {
tokio-runtime-w-1666 [003] 1318.058255: funcgraph_entry:     1.250 us |                       pud_huge();
tokio-runtime-w-1666 [003] 1318.058258: funcgraph_entry:     1.166 us |                       pmd_huge();
tokio-runtime-w-1666 [003] 1318.058260: funcgraph_entry:     1.208 us |                       change_page_range();
tokio-runtime-w-1666 [003] 1318.058263: funcgraph_exit:      9.834 us |                     }
tokio-runtime-w-1666 [003] 1318.058264: funcgraph_exit:    +12.709 us |                   }
tokio-runtime-w-1666 [003] 1318.058266: funcgraph_exit:    +15.459 us |                 }
tokio-runtime-w-1666 [003] 1318.058268: funcgraph_exit:    +80.791 us |               }
tokio-runtime-w-1666 [003] 1318.058270: funcgraph_exit:    +84.834 us |             }
tokio-runtime-w-1666 [003] 1318.058272: funcgraph_exit:   !218.500 us |           }
tokio-runtime-w-1666 [003] 1318.058274: funcgraph_entry:              |           __alloc_percpu_gfp() {
tokio-runtime-w-1666 [003] 1318.058276: funcgraph_entry:              |             pcpu_alloc() {
tokio-runtime-w-1666 [003] 1318.058281: funcgraph_entry:     2.250 us |               mutex_lock_killable();
tokio-runtime-w-1666 [003] 1318.058290: funcgraph_entry:              |               pcpu_find_block_fit() {
tokio-runtime-w-1666 [003] 1318.058293: funcgraph_entry:     2.833 us |                 pcpu_next_fit_region.constprop.38();
tokio-runtime-w-1666 [003] 1318.058299: funcgraph_exit:      9.084 us |               }
tokio-runtime-w-1666 [003] 1318.058301: funcgraph_entry:              |               pcpu_alloc_area() {
tokio-runtime-w-1666 [003] 1318.058315: funcgraph_entry:     4.000 us |                 pcpu_block_update_hint_alloc();
tokio-runtime-w-1666 [003] 1318.058320: funcgraph_entry:     2.208 us |                 pcpu_chunk_relocate();
tokio-runtime-w-1666 [003] 1318.058324: funcgraph_exit:    +22.625 us |               }
tokio-runtime-w-1666 [003] 1318.058327: funcgraph_entry:     1.208 us |               mutex_unlock();
tokio-runtime-w-1666 [003] 1318.058332: funcgraph_entry:     1.584 us |               pcpu_memcg_post_alloc_hook();
tokio-runtime-w-1666 [003] 1318.058335: funcgraph_exit:    +58.833 us |             }
tokio-runtime-w-1666 [003] 1318.058336: funcgraph_exit:    +61.834 us |           }
tokio-runtime-w-1666 [003] 1318.058338: funcgraph_entry:              |           kmem_cache_alloc_trace() {
tokio-runtime-w-1666 [003] 1318.058339: funcgraph_entry:     1.167 us |             should_failslab();
tokio-runtime-w-1666 [003] 1318.058342: funcgraph_exit:      4.458 us |           }
tokio-runtime-w-1666 [003] 1318.058359: funcgraph_entry:              |           bpf_image_ksym_add() {
tokio-runtime-w-1666 [003] 1318.058360: funcgraph_entry:              |             bpf_ksym_add() {
tokio-runtime-w-1666 [003] 1318.058363: funcgraph_entry:     1.583 us |               __local_bh_enable_ip();
tokio-runtime-w-1666 [003] 1318.058366: funcgraph_exit:      5.750 us |             }
tokio-runtime-w-1666 [003] 1318.058369: funcgraph_exit:      9.834 us |           }
tokio-runtime-w-1666 [003] 1318.058371: funcgraph_entry:     1.250 us |           arch_prepare_bpf_trampoline();
tokio-runtime-w-1666 [003] 1318.058373: funcgraph_entry:     2.292 us |           kfree();
tokio-runtime-w-1666 [003] 1318.058377: funcgraph_exit:   !348.625 us |         }
tokio-runtime-w-1666 [003] 1318.058379: funcgraph_entry:     1.250 us |         mutex_unlock();
tokio-runtime-w-1666 [003] 1318.058382: funcgraph_exit:   !363.167 us |       }
tokio-runtime-w-1666 [003] 1318.058384: funcgraph_entry:              |       bpf_link_cleanup() {
tokio-runtime-w-1666 [003] 1318.058386: funcgraph_entry:              |         bpf_link_free_id.part.30() {
tokio-runtime-w-1666 [003] 1318.058392: funcgraph_entry:              |           call_rcu() {
tokio-runtime-w-1666 [003] 1318.058396: funcgraph_entry:     1.834 us |             rcu_segcblist_enqueue();
tokio-runtime-w-1666 [003] 1318.058401: funcgraph_exit:      9.333 us |           }
tokio-runtime-w-1666 [003] 1318.058403: funcgraph_entry:     1.542 us |           __local_bh_enable_ip();
tokio-runtime-w-1666 [003] 1318.058406: funcgraph_exit:    +19.542 us |         }
tokio-runtime-w-1666 [003] 1318.058408: funcgraph_entry:              |         fput() {
tokio-runtime-w-1666 [003] 1318.058409: funcgraph_entry:              |           fput_many() {
tokio-runtime-w-1666 [003] 1318.058411: funcgraph_entry:              |             task_work_add() {
tokio-runtime-w-1666 [003] 1318.058414: funcgraph_entry:     1.625 us |               kick_process();
tokio-runtime-w-1666 [003] 1318.058418: funcgraph_exit:      6.750 us |             }
tokio-runtime-w-1666 [003] 1318.058419: funcgraph_exit:    +10.333 us |           }
tokio-runtime-w-1666 [003] 1318.058420: funcgraph_exit:    +12.708 us |         }
tokio-runtime-w-1666 [003] 1318.058422: funcgraph_entry:     2.250 us |         put_unused_fd();
tokio-runtime-w-1666 [003] 1318.058426: funcgraph_exit:    +41.416 us |       }
tokio-runtime-w-1666 [003] 1318.058428: funcgraph_entry:     1.292 us |       mutex_unlock();
tokio-runtime-w-1666 [003] 1318.058430: funcgraph_entry:     1.250 us |       kfree();
tokio-runtime-w-1666 [003] 1318.058433: funcgraph_exit:   !567.458 us |     }
tokio-runtime-w-1666 [003] 1318.058435: funcgraph_entry:     2.125 us |     __bpf_prog_put.isra.47();
tokio-runtime-w-1666 [003] 1318.058438: funcgraph_exit:   !602.291 us |   }
tokio-runtime-w-1666 [003] 1318.058439: funcgraph_exit:   !631.791 us | }
```
This is the source code in `kernel/bpf/trampoline.c` for `bpf_trampoline_update`, the last function executed:
```c
static int bpf_trampoline_update(struct bpf_trampoline *tr)
{
    struct bpf_tramp_image *im;
    struct bpf_tramp_progs *tprogs;
    u32 flags = BPF_TRAMP_F_RESTORE_REGS;
    bool ip_arg = false;
    int err, total;

    tprogs = bpf_trampoline_get_progs(tr, &total, &ip_arg);
    if (IS_ERR(tprogs))
        return PTR_ERR(tprogs);

    if (total == 0) {
        err = unregister_fentry(tr, tr->cur_image->image);
        bpf_tramp_image_put(tr->cur_image);
        tr->cur_image = NULL;
        tr->selector = 0;
        goto out;
    }

    im = bpf_tramp_image_alloc(tr->key, tr->selector);
    if (IS_ERR(im)) {
        err = PTR_ERR(im);
        goto out;
    }

    if (tprogs[BPF_TRAMP_FEXIT].nr_progs ||
        tprogs[BPF_TRAMP_MODIFY_RETURN].nr_progs)
        flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;

    if (ip_arg)
        flags |= BPF_TRAMP_F_IP_ARG;

    err = arch_prepare_bpf_trampoline(im, im->image, im->image + PAGE_SIZE,
                                      &tr->func.model, flags, tprogs,
                                      tr->func.addr);
    if (err < 0)
        goto out;

    WARN_ON(tr->cur_image && tr->selector == 0);
    WARN_ON(!tr->cur_image && tr->selector);
    if (tr->cur_image)
        /* progs already running at this address */
        err = modify_fentry(tr, tr->cur_image->image, im->image);
    else
        /* first time registering */
        err = register_fentry(tr, im->image);
    if (err)
        goto out;
    if (tr->cur_image)
        bpf_tramp_image_put(tr->cur_image);
    tr->cur_image = im;
    tr->selector++;
out:
    kfree(tprogs);
    return err;
}
```

From the previous output we can see:

    tokio-runtime-w-1666 [003] 1318.058371: funcgraph_entry:     1.250 us |           arch_prepare_bpf_trampoline();
    tokio-runtime-w-1666 [003] 1318.058373: funcgraph_entry:     2.292 us |           kfree();

There are no other function calls between arch_prepare_bpf_trampoline and kfree, so it is very likely that the former returned an error code in the err variable. Let's verify!

By launching bpftrace in a shell as follows, we can capture the return value of arch_prepare_bpf_trampoline and print it to the console:

    bpftrace -e 'kretprobe:arch_prepare_bpf_trampoline { printf("retvallink:%d\n", retval); }'

After starting probe in another terminal, bpftrace printed the following:

    root@pine64-1:/home/exein# bpftrace -e 'kretprobe:arch_prepare_bpf_trampoline { printf("retvallink:%d\n", retval); }'
    Attaching 1 probe...
    retvallink:-524

This is because kernel 5.15 lacks an aarch64 implementation of arch_prepare_bpf_trampoline and falls back to the default placeholder implementation:

    int __weak
    arch_prepare_bpf_trampoline(struct bpf_tramp_image *tr, void *image, void *image_end,
                                const struct btf_func_model *m, u32 flags,
                                struct bpf_tramp_links *tlinks,
                                void *orig_call)
    {
        return -ENOTSUPP;
    }

So this feature is not supported on this kernel version. The good news is that, thanks to this patch[7], it has been implemented in the 6.x kernels.
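The `__weak` placeholder mechanism can be reproduced in userspace. The sketch below is only an illustration: it uses the GCC/Clang `weak` attribute directly, and the names `arch_prepare_trampoline_demo`, `update_trampoline_demo` and `KERNEL_ENOTSUPP` are made up for the example.

```c
/* Kernel-internal errno value, redefined here for the sketch. */
#define KERNEL_ENOTSUPP 524

/* Generic fallback, analogous to the kernel's __weak default. The linker
 * keeps it only when no other object file provides a strong definition
 * of the same symbol, which is what an architecture port would do. */
__attribute__((weak)) int arch_prepare_trampoline_demo(void)
{
    return -KERNEL_ENOTSUPP;
}

/* Caller mirroring the "if (err < 0) goto out;" pattern: the error from
 * the placeholder is passed straight back to whoever asked to attach. */
int update_trampoline_demo(void)
{
    int err = arch_prepare_trampoline_demo();

    if (err < 0)
        return err;
    return 0;
}
```

Built on its own, `update_trampoline_demo()` returns -524, the same value bpftrace reported. Linking in a second file with a non-weak `arch_prepare_trampoline_demo` would replace the fallback without touching the caller.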

Let's move on to a 6.x kernel.

Linux 6.1

If we try to run probe on kernel 6.1, we get the following output:

    root@pine64:/home/exein# ./probe file-system-monitor
    thread 'main' panicked at 'initialization failed: ProgramAttachError { program: "lsm path_mknod", program_error: SyscallError { call: "bpf_raw_tracepoint_open", io_error: Os { code: 524, kind: Uncategorized, message: "No error information" } } }', src/bin/probe.rs43
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

On kernel 6.1 we still get the same error as on 5.15! Let's find out why.

Running bpftrace on arch_prepare_bpf_trampoline this time gives the following output:

    root@pine64:/home/exein# bpftrace -e 'kretprobe:arch_prepare_bpf_trampoline { printf("retvaltplink:%d\n", retval); }'
    Attaching 1 probe...
    retvaltplink:284

So the problem is not here: the function now returns a positive value instead of a negative error code. Let's go back to the function graph.
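When reading function graphs like the ones in this article, note that the duration column carries a one-character prefix for slow calls. As far as we understand the ftrace documentation, the thresholds are the ones encoded in the sketch below; `duration_marker` is a hypothetical helper written for illustration, not a kernel or trace-cmd function.

```c
/* Returns the prefix that function_graph prints in front of a duration
 * given in microseconds (thresholds as described in the kernel's
 * Documentation/trace/ftrace.rst). */
char duration_marker(double usecs)
{
    if (usecs > 1000000.0)
        return '$';   /* more than 1 second */
    if (usecs > 100000.0)
        return '@';   /* more than 100 milliseconds */
    if (usecs > 10000.0)
        return '*';   /* more than 10 milliseconds */
    if (usecs > 1000.0)
        return '#';   /* more than 1000 microseconds */
    if (usecs > 100.0)
        return '!';   /* more than 100 microseconds */
    if (usecs > 10.0)
        return '+';   /* more than 10 microseconds */
    return ' ';       /* 10 microseconds or less: no marker */
}
```

This matches the reports in this article: a 9.834 us call has no prefix, 10.542 us is shown with `+`, 112.375 us with `!`, and 2644.708 us with `#`.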

This time we launch trace-cmd and skip some functions to get cleaner output:

    trace-cmd record \
        -p function_graph \
        -g bpf_trampoline_link_prog \
        -n bpf_jit_alloc_exec \
        -n kmalloc_trace \
        -n arch_prepare_bpf_trampoline \
        -n generic_handle_domain_irq \
        -n do_interrupt_handler \
        -n irq_exit_rcu \
        ./probe file-system-monitor

From trace-cmd report we get the following output:

    root@pine64:/home/exein# trace-cmd report
    CPU 0 is empty
    CPU 1 is empty
    CPU 3 is empty
    cpus=4
    tokio-runtime-w-11886 [002] 193385.056283: funcgraph_entry:              | bpf_trampoline_link_prog() {
    tokio-runtime-w-11886 [002] 193385.056321: funcgraph_entry:   +15.042 us |   mutex_lock();
    tokio-runtime-w-11886 [002] 193385.056373: funcgraph_entry:              |   __bpf_trampoline_link_prog() {
    tokio-runtime-w-11886 [002] 193385.056395: funcgraph_entry:   +14.833 us |     bpf_attach_type_to_tramp();
    tokio-runtime-w-11886 [002] 193385.056428: funcgraph_entry:              |     bpf_trampoline_update.isra.23() {
    tokio-runtime-w-11886 [002] 193385.056459: funcgraph_entry:     2.917 us |       bpf_jit_charge_modmem();
    tokio-runtime-w-11886 [002] 193385.056531: funcgraph_entry:              |       find_vm_area() {
    tokio-runtime-w-11886 [002] 193385.056540: funcgraph_entry:     3.000 us |         find_vmap_area();
    tokio-runtime-w-11886 [002] 193385.056547: funcgraph_exit:    +16.208 us |       }
    tokio-runtime-w-11886 [002] 193385.056554: funcgraph_entry:              |       __alloc_percpu_gfp() {
    tokio-runtime-w-11886 [002] 193385.056563: funcgraph_entry:              |         pcpu_alloc() {
    tokio-runtime-w-11886 [002] 193385.056568: funcgraph_entry:     4.875 us |           mutex_lock_killable();
    tokio-runtime-w-11886 [002] 193385.056591: funcgraph_entry:              |           pcpu_find_block_fit() {
    tokio-runtime-w-11886 [002] 193385.056599: funcgraph_entry:     8.625 us |             pcpu_next_fit_region.constprop.38();
    tokio-runtime-w-11886 [002] 193385.056608: funcgraph_exit:    +17.166 us |           }
    tokio-runtime-w-11886 [002] 193385.056610: funcgraph_entry:              |           pcpu_alloc_area() {
    tokio-runtime-w-11886 [002] 193385.056639: funcgraph_entry:     9.167 us |             pcpu_block_update();
    tokio-runtime-w-11886 [002] 193385.056656: funcgraph_entry:     7.667 us |             pcpu_block_update_hint_alloc();
    tokio-runtime-w-11886 [002] 193385.056671: funcgraph_entry:     7.750 us |             pcpu_chunk_relocate();
    tokio-runtime-w-11886 [002] 193385.056679: funcgraph_exit:    +69.667 us |           }
    tokio-runtime-w-11886 [002] 193385.056682: funcgraph_entry:     7.042 us |           mutex_unlock();
    tokio-runtime-w-11886 [002] 193385.056703: funcgraph_entry:     2.792 us |           pcpu_memcg_post_alloc_hook();
    tokio-runtime-w-11886 [002] 193385.056712: funcgraph_exit:   !148.709 us |         }
    tokio-runtime-w-11886 [002] 193385.056719: funcgraph_exit:   !165.250 us |       }
    tokio-runtime-w-11886 [002] 193385.056866: funcgraph_entry:              |       bpf_image_ksym_add() {
    tokio-runtime-w-11886 [002] 193385.056873: funcgraph_entry:              |         bpf_ksym_add() {
    tokio-runtime-w-11886 [002] 193385.056882: funcgraph_entry:     2.750 us |           __local_bh_disable_ip();
    tokio-runtime-w-11886 [002] 193385.056897: funcgraph_entry:     4.625 us |           __local_bh_enable_ip();
    tokio-runtime-w-11886 [002] 193385.056905: funcgraph_exit:    +32.459 us |         }
    tokio-runtime-w-11886 [002] 193385.056922: funcgraph_entry:     7.584 us |         perf_event_ksymbol();
    tokio-runtime-w-11886 [002] 193385.056944: funcgraph_exit:    +78.417 us |       }
    tokio-runtime-w-11886 [002] 193385.057492: funcgraph_entry:              |       set_memory_ro() {
    tokio-runtime-w-11886 [002] 193385.057501: funcgraph_entry:              |         change_memory_common() {
    tokio-runtime-w-11886 [002] 193385.057504: funcgraph_entry:              |           find_vm_area() {
    tokio-runtime-w-11886 [002] 193385.057506: funcgraph_entry:     8.875 us |             find_vmap_area();
    tokio-runtime-w-11886 [002] 193385.057518: funcgraph_exit:    +14.250 us |           }
    tokio-runtime-w-11886 [002] 193385.057522: funcgraph_entry:              |           __change_memory_common() {
    tokio-runtime-w-11886 [002] 193385.057531: funcgraph_entry:              |             apply_to_page_range() {
    tokio-runtime-w-11886 [002] 193385.057538: funcgraph_entry:              |               __apply_to_page_range() {
    tokio-runtime-w-11886 [002] 193385.057544: funcgraph_entry:   +12.791 us |                 pud_huge();
    tokio-runtime-w-11886 [002] 193385.057559: funcgraph_entry:     2.708 us |                 pmd_huge();
    tokio-runtime-w-11886 [002] 193385.057574: funcgraph_entry:   +15.125 us |                 change_page_range();
    tokio-runtime-w-11886 [002] 193385.057591: funcgraph_exit:    +53.792 us |               }
    tokio-runtime-w-11886 [002] 193385.057597: funcgraph_exit:    +66.083 us |             }
    tokio-runtime-w-11886 [002] 193385.057610: funcgraph_exit:    +88.125 us |           }
    tokio-runtime-w-11886 [002] 193385.057619: funcgraph_entry:              |           vm_unmap_aliases() {
    tokio-runtime-w-11886 [002] 193385.057622: funcgraph_entry:              |             _vm_unmap_aliases.part.77() {
    tokio-runtime-w-11886 [002] 193385.057625: funcgraph_entry:     9.125 us |               mutex_lock();
    tokio-runtime-w-11886 [002] 193385.057637: funcgraph_entry:     3.084 us |               purge_fragmented_blocks_allcpus();
    tokio-runtime-w-11886 [002] 193385.057643: funcgraph_entry:              |               __purge_vmap_area_lazy() {
    tokio-runtime-w-11886 [002] 193385.057687: funcgraph_entry:              |                 kmem_cache_free() {
    tokio-runtime-w-11886 [002] 193385.057693: funcgraph_entry:   +13.250 us |                   __slab_free();
    tokio-runtime-w-11886 [002] 193385.057705: funcgraph_exit:    +18.750 us |                 }
    tokio-runtime-w-11886 [002] 193385.057718: funcgraph_entry:     7.416 us |                 __cond_resched_lock();
    tokio-runtime-w-11886 [002] 193385.057733: funcgraph_exit:    +90.042 us |               }
    tokio-runtime-w-11886 [002] 193385.057741: funcgraph_entry:     2.792 us |               mutex_unlock();
    tokio-runtime-w-11886 [002] 193385.057747: funcgraph_exit:   !124.666 us |             }
    tokio-runtime-w-11886 [002] 193385.057749: funcgraph_exit:   !130.291 us |           }
    tokio-runtime-w-11886 [002] 193385.057756: funcgraph_entry:              |           __change_memory_common() {
    tokio-runtime-w-11886 [002] 193385.057759: funcgraph_entry:              |             apply_to_page_range() {
    tokio-runtime-w-11886 [002] 193385.057765: funcgraph_entry:              |               __apply_to_page_range() {
    tokio-runtime-w-11886 [002] 193385.057768: funcgraph_entry:     4.125 us |                 pud_huge();
    tokio-runtime-w-11886 [002] 193385.057778: funcgraph_entry:     8.750 us |                 pmd_huge();
    tokio-runtime-w-11886 [002] 193385.057790: funcgraph_entry:     4.625 us |                 change_page_range();
    tokio-runtime-w-11886 [002] 193385.057797: funcgraph_exit:    +31.958 us |               }
    tokio-runtime-w-11886 [002] 193385.057803: funcgraph_exit:    +44.375 us |             }
    tokio-runtime-w-11886 [002] 193385.057817: funcgraph_exit:    +61.208 us |           }
    tokio-runtime-w-11886 [002] 193385.057820: funcgraph_exit:   !319.292 us |         }
    tokio-runtime-w-11886 [002] 193385.057826: funcgraph_exit:   !333.667 us |       }
    tokio-runtime-w-11886 [002] 193385.057840: funcgraph_entry:              |       set_memory_x() {
    tokio-runtime-w-11886 [002] 193385.057847: funcgraph_entry:              |         change_memory_common() {
    tokio-runtime-w-11886 [002] 193385.057855: funcgraph_entry:              |           find_vm_area() {
    tokio-runtime-w-11886 [002] 193385.057858: funcgraph_entry:     2.917 us |             find_vmap_area();
    tokio-runtime-w-11886 [002] 193385.057870: funcgraph_exit:    +14.375 us |           }
    tokio-runtime-w-11886 [002] 193385.057876: funcgraph_entry:              |           vm_unmap_aliases() {
    tokio-runtime-w-11886 [002] 193385.057879: funcgraph_entry:              |             _vm_unmap_aliases.part.77() {
    tokio-runtime-w-11886 [002] 193385.057882: funcgraph_entry:     3.959 us |               mutex_lock();
    tokio-runtime-w-11886 [002] 193385.057893: funcgraph_entry:     3.000 us |               purge_fragmented_blocks_allcpus();
    tokio-runtime-w-11886 [002] 193385.057900: funcgraph_entry:     2.791 us |               __purge_vmap_area_lazy();
    tokio-runtime-w-11886 [002] 193385.057907: funcgraph_entry:     2.709 us |               mutex_unlock();
    tokio-runtime-w-11886 [002] 193385.057913: funcgraph_exit:    +33.708 us |             }
    tokio-runtime-w-11886 [002] 193385.057915: funcgraph_exit:    +43.000 us |           }
    tokio-runtime-w-11886 [002] 193385.057922: funcgraph_entry:              |           __change_memory_common() {
    tokio-runtime-w-11886 [002] 193385.057925: funcgraph_entry:              |             apply_to_page_range() {
    tokio-runtime-w-11886 [002] 193385.057930: funcgraph_entry:              |               __apply_to_page_range() {
    tokio-runtime-w-11886 [002] 193385.057933: funcgraph_entry:     4.292 us |                 pud_huge();
    tokio-runtime-w-11886 [002] 193385.057945: funcgraph_entry:     8.750 us |                 pmd_huge();
    tokio-runtime-w-11886 [002] 193385.057956: funcgraph_entry:     3.958 us |                 change_page_range();
    tokio-runtime-w-11886 [002] 193385.058037: funcgraph_exit:    +32.083 us |               }
    tokio-runtime-w-11886 [002] 193385.058089: funcgraph_entry:     7.667 us |               irq_enter_rcu();
    tokio-runtime-w-11886 [002] 193385.058233: funcgraph_exit:   !308.041 us |             }
    tokio-runtime-w-11886 [002] 193385.058239: funcgraph_exit:   !316.709 us |           }
    tokio-runtime-w-11886 [002] 193385.058247: funcgraph_exit:   !400.417 us |         }
    tokio-runtime-w-11886 [002] 193385.058255: funcgraph_exit:   !415.000 us |       }
    tokio-runtime-w-11886 [002] 193385.058555: funcgraph_entry:     8.250 us |       irq_enter_rcu();
    tokio-runtime-w-11886 [002] 193385.058958: funcgraph_entry:              |       kallsyms_lookup_size_offset() {
    tokio-runtime-w-11886 [002] 193385.058974: funcgraph_entry:   +36.333 us |         get_symbol_pos();
    tokio-runtime-w-11886 [002] 193385.059017: funcgraph_exit:    +59.750 us |       }
    tokio-runtime-w-11886 [002] 193385.059043: funcgraph_entry:              |       kfree() {
    tokio-runtime-w-11886 [002] 193385.059057: funcgraph_entry:     3.000 us |         __kmem_cache_free();
    tokio-runtime-w-11886 [002] 193385.059065: funcgraph_exit:    +22.833 us |       }
    tokio-runtime-w-11886 [002] 193385.059073: funcgraph_exit:  #2644.708 us |     }
    tokio-runtime-w-11886 [002] 193385.059079: funcgraph_exit:  #2706.292 us |   }
    tokio-runtime-w-11886 [002] 193385.059095: funcgraph_entry:     2.792 us |   mutex_unlock();
    tokio-runtime-w-11886 [002] 193385.059101: funcgraph_exit:  #2870.416 us | }

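This time the program got past arch_prepare_bpf_trampoline, set_memory_ro and set_memory_x; the last function we see before the cleanup path (kfree) is kallsyms_lookup_size_offset.

As we can see in the bpf_trampoline_update function in kernel/bpf/trampoline.c, there is no explicit call to kallsyms_lookup_size_offset: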

```c
static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex)
{

    // ... OTHER CODE ...

#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
again:
    if ((tr->flags & BPF_TRAMP_F_SHARE_IPMODIFY) &&
        (tr->flags & BPF_TRAMP_F_CALL_ORIG))
        tr->flags |= BPF_TRAMP_F_ORIG_STACK;
#endif

    err = arch_prepare_bpf_trampoline(im, im->image, im->image + PAGE_SIZE,
                                      &tr->func.model, tr->flags, tlinks,
                                      tr->func.addr);
    if (err < 0)
        goto out;

    set_memory_ro((long)im->image, 1);
    set_memory_x((long)im->image, 1);

    WARN_ON(tr->cur_image && tr->selector == 0);
    WARN_ON(!tr->cur_image && tr->selector);
    if (tr->cur_image)
        /* progs already running at this address */
        err = modify_fentry(tr, tr->cur_image->image, im->image, lock_direct_mutex);
    else
        /* first time registering */
        err = register_fentry(tr, im->image);

#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
    if (err == -EAGAIN) {
        /* -EAGAIN from bpf_tramp_ftrace_ops_func. Now
         * BPF_TRAMP_F_SHARE_IPMODIFY is set, we can generate the
         * trampoline again, and retry register.
         */
        /* reset fops->func and fops->trampoline for re-register */
        tr->fops->func = NULL;
        tr->fops->trampoline = 0;

        /* reset im->image memory attr for arch_prepare_bpf_trampoline */
        set_memory_nx((long)im->image, 1);
        set_memory_rw((long)im->image, 1);
        goto again;
    }
#endif
    if (err)
        goto out;

    if (tr->cur_image)
        bpf_tramp_image_put(tr->cur_image);
    tr->cur_image = im;
    tr->selector++;
out:
    /* If any error happens, restore previous flags */
    if (err)
        tr->flags = orig_flags;
    kfree(tlinks);
    return err;
}
```

> **Note:** the implementation of `bpf_trampoline_update` differs slightly from the one in kernel 5.15.

The call to `kallsyms_lookup_size_offset` is hidden inside another function. We cannot see that function in the graph because the compiler inlined it.

It looks like `kallsyms_lookup_size_offset` is called by `ftrace_location`:
```c
unsigned long ftrace_location(unsigned long ip)
{
    struct dyn_ftrace *rec;
    unsigned long offset;
    unsigned long size;

    rec = lookup_rec(ip, ip);
    if (!rec) {
        if (!kallsyms_lookup_size_offset(ip, &size, &offset))
            goto out;

        /* map sym+0 to __fentry__ */
        if (!offset)
            rec = lookup_rec(ip, ip + size - 1);
    }

    if (rec)
        return rec->ip;

out:
    return 0;
}
```
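To see why kallsyms_lookup_size_offset shows up in the trace, the control flow of ftrace_location can be modelled in userspace. This is a simplified sketch under our own assumptions: the `demo_` functions are hypothetical, a plain array of addresses stands in for the kernel's dyn_ftrace records, and the symbol start and size are passed in instead of being looked up through kallsyms.

```c
#include <stddef.h>

/* Returns the first recorded call site within [start, end], or 0. */
static unsigned long demo_lookup_rec(const unsigned long *recs, size_t n,
                                     unsigned long start, unsigned long end)
{
    for (size_t i = 0; i < n; i++)
        if (recs[i] >= start && recs[i] <= end)
            return recs[i];
    return 0;
}

/* Same shape as ftrace_location(): exact match first; if that fails and
 * ip is the first byte of its symbol, search the whole symbol range. */
unsigned long demo_ftrace_location(const unsigned long *recs, size_t n,
                                   unsigned long ip,
                                   unsigned long sym_start,
                                   unsigned long sym_size)
{
    unsigned long rec = demo_lookup_rec(recs, n, ip, ip);

    if (!rec) {
        unsigned long offset = ip - sym_start;

        if (!sym_size)      /* models a failed symbol lookup */
            return 0;
        if (!offset)        /* map sym+0 to the patchable call site */
            rec = demo_lookup_rec(recs, n, ip, ip + sym_size - 1);
    }
    return rec;
}
```

For example, with a single record at 0x1004 inside a 0x40-byte symbol starting at 0x1000, a lookup for 0x1000 misses the exact match, takes the symbol-lookup branch and returns 0x1004. A non-zero return is what makes register_fentry treat the function as ftrace-managed.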

ftrace_location is called by register_fentry, which, right after that call, checks the fops field of struct bpf_trampoline *tr.

    /* first time registering */
    static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
    {
        void *ip = tr->func.addr;
        unsigned long faddr;
        int ret;

        faddr = ftrace_location((unsigned long)ip);
        if (faddr) {
            if (!tr->fops)
                return -ENOTSUPP;
            tr->func.ftrace_managed = true;
        }

        if (bpf_trampoline_module_get(tr))
            return -ENOENT;

        if (tr->func.ftrace_managed) {
            ftrace_set_filter_ip(tr->fops, (unsigned long)ip, 0, 1);
            ret = register_ftrace_direct_multi(tr->fops, (long)new_addr);
        } else {
            ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, new_addr);
        }

        if (ret)
            bpf_trampoline_module_put(tr);
        return ret;
    }

Indeed, when ftrace_location returns a non-zero address and tr->fops is NULL, the function returns -ENOTSUPP.
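The three possible outcomes of that first check can be captured in a small userspace model. This is only a sketch: `struct demo_trampoline` and `demo_register_fentry` are hypothetical stand-ins that keep just the two fields involved, and `faddr` plays the role of the value returned by ftrace_location.

```c
#include <stdbool.h>
#include <stddef.h>

/* Kernel-internal errno value, redefined here for the sketch. */
#define KERNEL_ENOTSUPP 524

struct demo_trampoline {
    void *fops;           /* stands in for struct ftrace_ops * */
    bool ftrace_managed;  /* stands in for tr->func.ftrace_managed */
};

/* Models only the first check of register_fentry(). */
int demo_register_fentry(struct demo_trampoline *tr, unsigned long faddr)
{
    if (faddr) {
        /* The target is an ftrace location but there is no ftrace_ops
         * to attach through: give up with "not supported". */
        if (!tr->fops)
            return -KERNEL_ENOTSUPP;
        tr->ftrace_managed = true;
    }
    return 0;
}
```

With `fops` left NULL and a non-zero `faddr`, the model returns -524, which is the case we hit on aarch64. A non-NULL `fops` marks the trampoline as ftrace-managed, and a zero `faddr` skips the check altogether.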

Let's find out where tr->fops is initialized.

If we are right, the trampoline should be created inside the bpf_trampoline_lookup function.

    static struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
    {
        struct bpf_trampoline *tr;
        struct hlist_head *head;
        int i;

        mutex_lock(&trampoline_mutex);
        head = &trampoline_table[hash_64(key, TRAMPOLINE_HASH_BITS)];
        hlist_for_each_entry(tr, head, hlist) {
            if (tr->key == key) {
                refcount_inc(&tr->refcnt);
                goto out;
            }
        }
        tr = kzalloc(sizeof(*tr), GFP_KERNEL);
        if (!tr)
            goto out;
    #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
        tr->fops = kzalloc(sizeof(struct ftrace_ops), GFP_KERNEL);
        if (!tr->fops) {
            kfree(tr);
            tr = NULL;
            goto out;
        }
        tr->fops->private = tr;
        tr->fops->ops_func = bpf_tramp_ftrace_ops_func;
    #endif

        tr->key = key;
        INIT_HLIST_NODE(&tr->hlist);
        hlist_add_head(&tr->hlist, head);
        refcount_set(&tr->refcnt, 1);
        mutex_init(&tr->mutex);
        for (i = 0; i < BPF_TRAMP_MAX; i++)
            INIT_HLIST_HEAD(&tr->progs_hlist[i]);
    out:
        mutex_unlock(&trampoline_mutex);
        return tr;
    }

After allocation, the fops field of the trampoline is populated only when the CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS option is set. That option depends on HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS, which is not available on aarch64.
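A quick way to tell whether a given kernel was built with this option is to look it up in the kernel's build configuration, which many distributions install as /boot/config-$(uname -r) or expose as /proc/config.gz. The sketch below assumes the configuration text has already been read into memory; `kconfig_enabled` is a hypothetical helper, not an existing API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true when config_text contains a line that is exactly
 * "<option>=y". Lines such as "# CONFIG_FOO is not set" do not match. */
bool kconfig_enabled(const char *config_text, const char *option)
{
    size_t opt_len = strlen(option);
    const char *line = config_text;

    while (*line) {
        size_t len = strcspn(line, "\n");   /* length of the current line */

        if (len == opt_len + 2 &&
            strncmp(line, option, opt_len) == 0 &&
            line[opt_len] == '=' && line[opt_len + 1] == 'y')
            return true;

        line += len;
        if (*line == '\n')
            line++;                          /* step over the newline */
    }
    return false;
}
```

On a kernel like the 6.1 build used here, we would expect `kconfig_enabled(text, "CONFIG_BPF_LSM")` to be true while `kconfig_enabled(text, "CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS")` is false, which matches the failure described above.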

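Conclusion

At present, BPF LSM cannot be used on aarch64 because the _ftrace direct calls_ feature is missing. Fortunately, a patch[8] that enables LSM programs (and other features) on aarch64 has already been merged into the current mainline branch.

These changes are expected to ship in the upcoming Linux 6.4 release.

Reviewing editor: 湯梓紅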

Original title: Exploring BPF LSM support on aarch64 with ftrace

Article source: WeChat official account Linux閱碼場 (WeChat ID: LinuxDev). Please credit the source when reposting.
