Porting the Xuantie RISC-V IOMMU driver across a kernel upgrade

First: the probe failure

An iommu capable callback has to be added first, reporting IOMMU_CAP_CACHE_COHERENCY as true; otherwise probe fails.

/* Defined before xt_iommu_ops so the initializer below can reference it. */
static bool xt_iommu_capable(struct device *dev, enum iommu_cap cap)
{
	switch (cap) {
	case IOMMU_CAP_CACHE_COHERENCY:
	case IOMMU_CAP_NOEXEC:
		return true;
	default:
		return false;
	}
}

static struct iommu_ops xt_iommu_ops = {
	.capable = xt_iommu_capable,
	/* other callbacks not shown */
};

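Enable the option allow_unsafe_interrupts=1
When vfio_iommu_type1 is built as a loadable module, the option can be placed in /etc/modprobe.d/iommu_unsafe_interrupts.conf, which sets allow_unsafe_interrupts to 1 when the module is loaded. Note that files under /etc/modprobe.d are read by modprobe; a plain insmod does not read them, so with insmod the parameter has to be given on the insmod command line.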

options vfio_iommu_type1 allow_unsafe_interrupts=1

If it is not a loadable module but is built into the kernel Image, the default has to be forced on in the source:

static bool allow_unsafe_interrupts = 1;

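Second, some interfaces that xt_iommu used on the base version changed in the upgraded version:
iommu_device_set_ops, iommu_device_set_fwnode, and bus_set_iommu are gone; they were consolidated into a single interface, iommu_device_register, whose parameters also changed.
In addition, the number and meaning of some callback parameters changed:
for xt_iommu_map and xt_iommu_unmap, size became pgsize and pgcount.

The driver needs to be adapted accordingly.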
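
The size to pgsize/pgcount change can be illustrated with a small user-space model. This is only a sketch of the calling convention, not the xt_iommu driver code: model_map_pages, model_map_one_page, and pages_installed are hypothetical names, and the real callback also receives the domain, protection flags, and gfp flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's single-page mapping routine;
 * in this model it only counts how many pages it was asked to map. */
static size_t pages_installed;

static int model_map_one_page(uint64_t iova, uint64_t paddr, size_t pgsize)
{
	(void)iova;
	(void)paddr;
	(void)pgsize;
	pages_installed++;
	return 0;
}

/* Modelled on the newer callback shape: map pgcount pages of pgsize
 * bytes each and report the number of bytes mapped through *mapped. */
static int model_map_pages(uint64_t iova, uint64_t paddr,
			   size_t pgsize, size_t pgcount, size_t *mapped)
{
	/* pgsize must be a non-zero power of two */
	assert(pgsize != 0 && (pgsize & (pgsize - 1)) == 0);

	*mapped = 0;
	for (size_t i = 0; i < pgcount; i++) {
		int ret = model_map_one_page(iova, paddr, pgsize);

		if (ret)
			return ret; /* *mapped tells the caller how far we got */
		iova += pgsize;
		paddr += pgsize;
		*mapped += pgsize;
	}
	return 0;
}
```

Where the old callback received one size, a driver adapted to the new form covers pgsize * pgcount bytes and reports progress through mapped, so a partial failure can be unwound by the caller.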

The relevant kernel config options need to be enabled:

CONFIG_IOMMU_IOVA=y
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y
CONFIG_IOMMU_DMA=y
# CONFIG_IOMMUFD is not set
CONFIG_VFIO=y
CONFIG_VFIO_CONTAINER=y
CONFIG_VFIO_IOMMU_TYPE1=y
CONFIG_VFIO_VIRQFD=y
CONFIG_VFIO_PCI_CORE=y
CONFIG_VFIO_PCI_MMAP=y
CONFIG_VFIO_PCI_INTX=y
CONFIG_VFIO_PCI=y
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCI_DOMAINS_GENERIC=y
# VFIO support in kvm-mode
CONFIG_KVM_VFIO=y

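After porting the IOMMU, the e1000e NIC passed through to the guest reported errors; the log is shown below.
Analysis showed that the NIC's DMA transmit path was failing, and that ndo_tx_timeout eventually called e1000e's e1000_tx_timeout, which reset the NIC.

The transmit watchdog fires here: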

WARN_ONCE(1, "NETDEV WATCHDOG: %s (%s): transmit queue %u timed out %u ms\n",
dev->name, netdev_drivername(dev), i, timedout_ms);
dev->netdev_ops->ndo_tx_timeout(dev, i);

It printed the following stack:

[   42.096669] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out 9960 ms
[ 42.248201] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x224/0x228
[ 42.250411] Modules linked in:
[ 42.250692] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.4.0-rc1-00026-g80e62bc8487b-dirty #91
[ 42.250818] Hardware name: riscv-virtio,qemu (DT)
[ 42.250926] epc : dev_watchdog+0x224/0x228
[ 42.251014] ra : dev_watchdog+0x224/0x228
[ 42.251042] epc : ffffffff806e48dc ra : ffffffff806e48dc sp : ff20000000693b90
[ 42.251058] gp : ffffffff814a2b00 tp : ff600000018cd240 t0 : ffffffff8082fba0
[ 42.251074] t1 : 0720072007200720 t2 : 4157205645445445 s0 : ff20000000693c00
[ 42.251089] s1 : ff600000024bc498 a0 : 0000000000000042 a1 : ffffffff8147e108
[ 42.251103] a2 : 00000000ffffefff a3 : fffffffffffffffe a4 : b6182cbfc7d9c400
[ 42.251117] a5 : b6182cbfc7d9c400 a6 : 0000000000000050 a7 : ffffffff8048d172
[ 42.251131] s2 : ff600000024bc000 s3 : ff600000018e3a00 s4 : ff600000024bc3e0
[ 42.251145] s5 : 0000000000000000 s6 : ffffffff8140a980 s7 : 00000000000026e8
[ 42.251159] s8 : ffffffff80c15470 s9 : ffffffff81408088 s10: 0000000000000101
[ 42.251173] s11: 0000000000000282 t3 : ff60000001818f00 t4 : ff60000001818f00
[ 42.251186] t5 : ff60000001818000 t6 : ff20000000693978
[ 42.251199] status: 0000000200000120 badaddr: 0000000000000000 cause: 0000000000000003
[ 42.251303] [<ffffffff806e48dc>] dev_watchdog+0x224/0x228
[ 42.251402] [<ffffffff800940e4>] call_timer_fn.constprop.0+0x14/0x5e
[ 42.252827] [<ffffffff800941b8>] expire_timers+0x8a/0xbc
[ 42.252840] [<ffffffff80094782>] run_timer_softirq+0xe0/0x202
[ 42.252850] [<ffffffff80844460>] __do_softirq+0x100/0x27e
[ 42.256472] [<ffffffff80024a9c>] __irq_exit_rcu+0xa8/0xde
[ 42.258214] [<ffffffff80024bb6>] irq_exit_rcu+0xc/0x14
[ 42.258389] [<ffffffff8083c238>] do_irq+0x6c/0x86
[ 42.258543] [<ffffffff800035ac>] ret_from_exception+0x0/0x64
[ 42.258646] [<ffffffff8083cb2a>] default_idle_call+0x26/0x34
[ 42.259488] [<ffffffff8005a7ee>] do_idle+0x206/0x226
[ 42.259983] [<ffffffff8005a970>] cpu_startup_entry+0x1a/0x1c
[ 42.259997] [<ffffffff8000618a>] handle_IPI+0x0/0xfc
[ 45.988551] e1000e 0000:00:01.0 eth0: Reset adapter unexpectedly
[ 46.743848] e1000e 0000:00:01.0 eth0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
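
The check that leads to this warning can be modelled in user space as follows. This is a simplified sketch rather than the kernel's dev_watchdog: tick_after, txq_model, and find_timed_out_queue are hypothetical names, and plain 32-bit tick counters stand in for jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is later than b", the same idea as time_after(). */
static bool tick_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

struct txq_model {
	bool stopped;         /* the driver stopped the queue */
	uint32_t trans_start; /* tick of the last transmit on this queue */
};

/* Returns the index of the first timed-out queue, or -1 if none. */
static int find_timed_out_queue(const struct txq_model *q, int nq,
				uint32_t now, uint32_t timeo)
{
	assert(nq >= 0);

	for (int i = 0; i < nq; i++) {
		if (q[i].stopped && tick_after(now, q[i].trans_start + timeo))
			return i;
	}
	return -1;
}
```

A queue that is stopped and has not transmitted within the timeout is the one reported in the warning; the driver's ndo_tx_timeout callback is then invoked for it, which for e1000e ends in the adapter reset seen in the last two log lines.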

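First, in terms of state: interrupts work, but DMA remapping does not.
On the basic RISC-V platform (AIA not enabled), only INTx line interrupts are supported,
and the e1000_intr interrupt handler fires normally.

Comparing versions showed that qemu (host), qemu (guest), and kernel (guest) are constants; the only variable is kernel (host), which was upgraded from 5.10 to 6.4.
With the kernel version change, the VFIO framework changed as well. After confirming that the iommu / vfio related config options are identical between 5.10 and 6.4, there were no other obviously abnormal log messages.
With no obvious lead, the only way forward was to trace the code's behavior from the top down.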

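Behavior of QEMU's xmmuv1 emulation

In the normal log, even without virtualization enabled, the host's handling of the e1000e NIC goes through DMA remapping (see the backtrace below); after upgrading the kernel to 6.4, the corresponding xmmuv1_translate is never triggered.
This means the base version had IOMMU support enabled for the e1000e NIC, while the upgraded version ended up with it disabled.
Without virtualization, VFIO is not involved yet, so the investigation should focus on IOMMU-related features.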

#0  0x000055555591fa12 in xmmuv1_translate (mr=<optimized out>, addr=4294963200, flag=<optimized out>, iommu_idx=<optimized out>) at ../hw/riscv/xmmuv1.c:186
#1 0x00005555559fe0b6 in address_space_translate_iommu (iommu_mr=0x555557025af0, xlat=xlat@entry=0x7fffe9c61d40, plen_out=plen_out@entry=0x7fffe9c61d38, page_mask_out=page_mask_out@entry=0x0, is_write=is_write@entry=false, is_mmio=true, target_as=0x7fffe9c61cc8, attrs=...) at ../softmmu/physmem.c:435
#2 0x00005555559fe340 in flatview_do_translate (fv=fv@entry=0x7ffee830ab40, addr=addr@entry=4294963200, xlat=xlat@entry=0x7fffe9c61d40, plen_out=plen_out@entry=0x7fffe9c61d38, page_mask_out=page_mask_out@entry=0x0, is_write=false, is_mmio=true, target_as=0x7fffe9c61cc8, attrs=...) at ../softmmu/physmem.c:508
#3 0x00005555559feed1 in flatview_translate (fv=fv@entry=0x7ffee830ab40, addr=addr@entry=4294963200, xlat=xlat@entry=0x7fffe9c61d40, plen=plen@entry=0x7fffe9c61d38, is_write=is_write@entry=false, attrs=...) at ../softmmu/physmem.c:568
#4 0x0000555555a02b91 in flatview_read (fv=0x7ffee830ab40, addr=addr@entry=4294963200, attrs=attrs@entry=..., buf=buf@entry=0x7fffe9c61e10, len=len@entry=16) at ../softmmu/physmem.c:2753
#5 0x0000555555a02cf8 in address_space_read_full (as=0x7fffe8416250, addr=addr@entry=4294963200, attrs=..., buf=buf@entry=0x7fffe9c61e10, len=len@entry=16) at ../softmmu/physmem.c:2770
#6 0x0000555555a02e31 in address_space_rw (as=<optimized out>, addr=addr@entry=4294963200, attrs=..., attrs@entry=..., buf=buf@entry=0x7fffe9c61e10, len=len@entry=16, is_write=is_write@entry=false) at ../softmmu/physmem.c:2798
#7 0x00005555557bd7e9 in dma_memory_rw_relaxed (attrs=..., dir=DMA_DIRECTION_TO_DEVICE, len=16, buf=0x7fffe9c61e10, addr=4294963200, as=<optimized out>) at /home/liguang/program/riscv-lab/qemu/include/sysemu/dma.h:87
#8 0x00005555557bd7e9 in dma_memory_rw (attrs=..., dir=DMA_DIRECTION_TO_DEVICE, len=16, buf=0x7fffe9c61e10, addr=4294963200, as=<optimized out>) at /home/liguang/program/riscv-lab/qemu/include/sysemu/dma.h:130
#9 0x00005555557bd7e9 in pci_dma_rw (attrs=..., dir=DMA_DIRECTION_TO_DEVICE, len=16, buf=0x7fffe9c61e10, addr=<optimized out>, dev=<optimized out>) at /home/liguang/program/riscv-lab/qemu/include/hw/pci/pci_device.h:233
#10 0x00005555557bd7e9 in pci_dma_read (len=16, buf=0x7fffe9c61e10, addr=<optimized out>, dev=<optimized out>) at /home/liguang/program/riscv-lab/qemu/include/hw/pci/pci_device.h:252
#11 0x00005555557bd7e9 in e1000e_start_xmit (core=0x7fffe8418eb0, txr=txr@entry=0x7fffe9c61e80) at ../hw/net/e1000e_core.c:944
#12 0x00005555557bdda6 in e1000e_set_tdt (core=<optimized out>, index=<optimized out>, val=<optimized out>) at ../hw/net/e1000e_core.c:2456
#13 0x00005555557c1130 in e1000e_core_write (core=0x7fffe8418eb0, addr=<optimized out>, val=1, size=4) at ../hw/net/e1000e_core.c:3280
#14 0x00005555557b7b66 in e1000e_mmio_write (opaque=<optimized out>, addr=<optimized out>, val=<optimized out>, size=<optimized out>) at ../hw/net/e1000e.c:112
#15 0x00005555559f9060 in memory_region_write_accessor (mr=0x7fffe8418a60, addr=14360, value=<optimized out>, size=4, shift=<optimized out>, mask=<optimized out>, attrs=...) at ../softmmu/memory.c:493
#16 0x00005555559f8a8b in access_with_adjusted_size (addr=addr@entry=14360, value=value@entry=0x7fffe9c62038, size=size@entry=4, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=access_fn@entry=0x5555559f8ffe <memory_region_write_accessor>, mr=0x7fffe8418a60, attrs=...) at ../softmmu/memory.c:569
#17 0x00005555559f8d58 in memory_region_dispatch_write (mr=mr@entry=0x7fffe8418a60, addr=addr@entry=14360, data=<optimized out>, data@entry=1, op=op@entry=MO_32, attrs=...) at ../softmmu/memory.c:1533
#18 0x0000555555a3d245 in io_writex (env=env@entry=0x55555669e430, full=0x7ffef0d6cf38, mmu_idx=1, val=val@entry=1, addr=18446743867826518040, retaddr=retaddr@entry=140736773646269, op=MO_32) at ../accel/tcg/cputlb.c:1435
#19 0x0000555555a40cf1 in do_st_4 (ra=140736773646269, memop=<optimized out>, mmu_idx=<optimized out>, val=1, p=0x7fffe9c62140, env=0x55555669e430) at ../accel/tcg/cputlb.c:2772
#20 0x0000555555a40cf1 in do_st4_mmu (env=0x55555669e430, addr=<optimized out>, val=1, oi=<optimized out>, ra=140736773646269) at ../accel/tcg/cputlb.c:2850
#21 0x0000555555a42727 in helper_stl_mmu (env=<optimized out>, addr=<optimized out>, val=<optimized out>, oi=<optimized out>, retaddr=<optimized out>) at ../accel/tcg/cputlb.c:2866
  1. Suspect: CONFIG_IOMMU_DMA. This option is enabled on both the base and the upgraded version.

Boot log of the upgraded version:
dmesg | grep -E "DMAR|IOMMU"

[    1.156855] Failed to set up IOMMU for device Fixed MDIO bus.0; retaining platform DMA ops
[ 1.102567] Failed to set up IOMMU for device riscv-pmu; retaining platform DMA ops

The boot log of the base version does not contain these messages.

Look at this failure first:

struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
{
	struct iommu_domain *domain;
	struct iommu_group *group;

	group = iommu_group_get(dev);
	if (!group)
		return NULL;

	/* error: domain is NULL here, dev->group->domain was not found */
	domain = group->domain;

	iommu_group_put(group);

	return domain;
}

This domain should come from the default_domain set up by the IOMMU driver.
Tracing the stack shows that the xt_iommu driver failed while allocating the default_domain.

#0  xt_iommu_domain_alloc (type=<optimized out>) at ../drivers/iommu/xuantie-iommu.c:365
#1 0xffffffff804f7444 in __iommu_domain_alloc (bus=0xffffffff8150c0a0 <platform_bus_type>, type=<optimized out>) at ../drivers/iommu/iommu.c:1987
#2 0xffffffff804f754c in iommu_group_alloc_default_domain (bus=bus@entry=0xffffffff8150c0a0 <platform_bus_type>, group=group@entry=0xff60000080015e00, type=<optimized out>) at ../drivers/iommu/iommu.c:1667
#3 0xffffffff804f7d46 in probe_alloc_default_domain (group=0xff60000080015e00, bus=0xffffffff8150c0a0 <platform_bus_type>) at ../drivers/iommu/iommu.c:1819
#4 bus_iommu_probe (bus=bus@entry=0xffffffff8150c0a0 <platform_bus_type>) at ../drivers/iommu/iommu.c:1882
#5 0xffffffff804f7e24 in iommu_device_register (iommu=0xff600000801f3d20, ops=ops@entry=0xffffffff8150b120 <xt_iommu_ops>, hwdev=hwdev@entry=0xff600000802c1010) at ../drivers/iommu/iommu.c:245
#6 0xffffffff80837748 in xt_iommu_device_probe (pdev=0xff600000802c1000) at ../drivers/iommu/xuantie-iommu.c:718

__iommu_domain_alloc bails out at this point; the base version has no such logic.

if (iommu_is_dma_domain(domain) && iommu_get_dma_cookie(domain)) {
	iommu_domain_free(domain);
	domain = NULL;
}

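After commenting this block out, iommu_ops then has to be registered for pci_bus_type only. Other bus types (platform_bus_type) must not be registered, or errors result: xtiommu on the base version only supports pci_bus.
But the upgraded version no longer has bus_set_iommu or anything like it, so the code had to be customized.

After these changes the NIC uses xtiommu for its DMA operations, but it still fails.
The symptom is again in address translation: the IOVAs being translated look wrong.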

Normal log:
xmmuv1_translate addr 4294963200
xmmuv1_translate addr 4294963201
xmmuv1_translate addr 4294963202
xmmuv1_translate addr 4294963203

Abnormal log:
xmmuv1_translate addr 4294963200
xmmuv1_translate addr 4294963204
xmmuv1_translate addr 4294963208
xmmuv1_translate addr 4294963212
(translation stops here)

Stack:

#0  0x000055555591f75b in xmmuv1_translate (mr=0x555557025af0, addr=4294963201, flag=IOMMU_RO, iommu_idx=0) at ../hw/riscv/xmmuv1.c:104
#1 0x00005555559fe09a in address_space_translate_iommu (iommu_mr=0x555557025af0, xlat=xlat@entry=0x7ffef59fdcc8, plen_out=plen_out@entry=0x7ffef59fdd20, page_mask_out=page_mask_out@entry=0x0, is_write=is_write@entry=false, is_mmio=true, target_as=0x7ffef59fdc38, attrs=...) at ../softmmu/physmem.c:435
#2 0x00005555559fe324 in flatview_do_translate (fv=fv@entry=0x7ffef0252180, addr=addr@entry=4294963201, xlat=xlat@entry=0x7ffef59fdcc8, plen_out=plen_out@entry=0x7ffef59fdd20, page_mask_out=page_mask_out@entry=0x0, is_write=false, is_mmio=true, target_as=0x7ffef59fdc38, attrs=...) at ../softmmu/physmem.c:508
#3 0x00005555559feeb5 in flatview_translate (fv=fv@entry=0x7ffef0252180, addr=addr@entry=4294963201, xlat=xlat@entry=0x7ffef59fdcc8, plen=plen@entry=0x7ffef59fdd20, is_write=is_write@entry=false, attrs=..., attrs@entry=...) at ../softmmu/physmem.c:568
#4 0x0000555555a02aa9 in flatview_read_continue (fv=fv@entry=0x7ffef0252180, addr=4294963201, addr@entry=4294963200, attrs=..., ptr=ptr@entry=0x7ffef59fde10, len=15, len@entry=16, addr1=<optimized out>, l=<optimized out>, mr=0x55555656aee0) at ../softmmu/physmem.c:2738
#5 0x0000555555a02baf in flatview_read (fv=0x7ffef0252180, addr=addr@entry=4294963200, attrs=attrs@entry=..., buf=buf@entry=0x7ffef59fde10, len=len@entry=16) at ../softmmu/physmem.c:2757
#6 0x0000555555a02cdc in address_space_read_full (as=0x7fffe8416250, addr=addr@entry=4294963200, attrs=..., buf=buf@entry=0x7ffef59fde10, len=len@entry=16) at ../softmmu/physmem.c:2770
#7 0x0000555555a02e15 in address_space_rw (as=<optimized out>, addr=addr@entry=4294963200, attrs=..., attrs@entry=..., buf=buf@entry=0x7ffef59fde10, len=len@entry=16, is_write=is_write@entry=false) at ../softmmu/physmem.c:2798
#8 0x00005555557bd7e9 in dma_memory_rw_relaxed (attrs=..., dir=DMA_DIRECTION_TO_DEVICE, len=16, buf=0x7ffef59fde10, addr=4294963200, as=<optimized out>) at /home/liguang/program/riscv-lab/qemu/include/sysemu/dma.h:87
#9 0x00005555557bd7e9 in dma_memory_rw (attrs=..., dir=DMA_DIRECTION_TO_DEVICE, len=16, buf=0x7ffef59fde10, addr=4294963200, as=<optimized out>) at /home/liguang/program/riscv-lab/qemu/include/sysemu/dma.h:130
#10 0x00005555557bd7e9 in pci_dma_rw (attrs=..., dir=DMA_DIRECTION_TO_DEVICE, len=16, buf=0x7ffef59fde10, addr=<optimized out>, dev=<optimized out>) at /home/liguang/program/riscv-lab/qemu/include/hw/pci/pci_device.h:233
#11 0x00005555557bd7e9 in pci_dma_read (len=16, buf=0x7ffef59fde10, addr=<optimized out>, dev=<optimized out>) at /home/liguang/program/riscv-lab/qemu/include/hw/pci/pci_device.h:252
#12 0x00005555557bd7e9 in e1000e_start_xmit (core=0x7fffe8418eb0, txr=txr@entry=0x7ffef59fde80) at ../hw/net/e1000e_core.c:944
#13 0x00005555557bdda6 in e1000e_set_tdt (core=<optimized out>, index=<optimized out>, val=<optimized out>) at ../hw/net/e1000e_core.c:2456
#14 0x00005555557c1130 in e1000e_core_write (core=0x7fffe8418eb0, addr=<optimized out>, val=1, size=4) at ../hw/net/e1000e_core.c:3280
#15 0x00005555557b7b66 in e1000e_mmio_write (opaque=<optimized out>, addr=<optimized out>, val=<optimized out>, size=<optimized out>) at ../hw/net/e1000e.c:112

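Tracing at flatview_read_continue shows that the step size comes from the translation result. Tracing further shows that
xt iommu goes wrong in the first-stage address translation, riscv_one_stage: the address is translated incorrectly.

The xt iommu in QEMU is unmodified and identical for the base and upgraded versions, so the problem should be on the map side.

Tracing xt_iommu_map in the kernel: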

#0  xt_iommu_map (domain=0xff6000008023a6b0, iova=4294963200, paddr=4397146112, pgsize=4096, pgcount=1, iommu_prot=7, gfp=3520, mapped=0xff20000000d13218) at ../drivers/iommu/xuantie-iommu.c:70
#1 0xffffffff804f56bc in __iommu_map_pages (mapped=0xff20000000d13218, gfp=3520, prot=7, size=4096, paddr=<optimized out>, iova=4294963200, domain=0xff6000008023a6b0) at ../drivers/iommu/iommu.c:2359
#2 __iommu_map (domain=domain@entry=0xff6000008023a6b0, iova=iova@entry=4294963200, paddr=<optimized out>, paddr@entry=4397146112, size=size@entry=4096, prot=prot@entry=7, gfp=gfp@entry=3520) at ../drivers/iommu/iommu.c:2405
#3 0xffffffff804f58f4 in iommu_map_sg (domain=domain@entry=0xff6000008023a6b0, iova=iova@entry=4294963200, sg=sg@entry=0xff600000822a9ac0, nents=<optimized out>, prot=prot@entry=7, gfp=gfp@entry=3520) at ../drivers/iommu/iommu.c:2560
#4 0xffffffff804f9644 in __iommu_dma_alloc_noncontiguous (dev=dev@entry=0xff600000802890c8, size=size@entry=4096, sgt=sgt@entry=0xff20000000d133f8, gfp=3520, attrs=attrs@entry=0, prot=...) at ../drivers/iommu/dma-iommu.c:846
#5 0xffffffff804fa042 in iommu_dma_alloc_remap (attrs=0, prot=..., gfp=<optimized out>, dma_handle=0xff600000808ce990, size=4096, dev=0xff600000802890c8) at ../drivers/iommu/dma-iommu.c:872
#6 iommu_dma_alloc (dev=0xff600000802890c8, size=4096, handle=0xff600000808ce990, gfp=<optimized out>, attrs=0) at ../drivers/iommu/dma-iommu.c:1462
#7 0xffffffff80088072 in dma_alloc_attrs (dev=0xff600000802890c8, size=4096, dma_handle=dma_handle@entry=0xff600000808ce990, flag=flag@entry=3264, attrs=attrs@entry=0) at ../kernel/dma/mapping.c:522
#8 0xffffffff805aea92 in dma_alloc_coherent (gfp=3264, dma_handle=0xff600000808ce990, size=<optimized out>, dev=<optimized out>) at ../include/linux/dma-mapping.h:423
#9 e1000_alloc_ring_dma (adapter=0xff600000811a4980, adapter=0xff600000811a4980, ring=0xff600000808ce980) at ../drivers/net/ethernet/intel/e1000e/netdev.c:2317
#10 e1000e_setup_tx_resources (tx_ring=0xff600000808ce980) at ../drivers/net/ethernet/intel/e1000e/netdev.c:2345
#11 0xffffffff805b044a in e1000e_open (netdev=0xff600000811a4000) at ../drivers/net/ethernet/intel/e1000e/netdev.c:4630

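Tracing the map path shows that when the xt iommu driver builds the MMU mapping it uses sv39 mode, but for the first-level pgd shift it mistakenly uses the system-defined
PGDIR_SHIFT

This value follows the system's page-table mode. The system currently runs in sv57 mode with 5-level page tables, so the value is 48, whereas sv39 mode needs 30.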

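for (i = 0; i < loop; ++i) {
	/* BUG: PGDIR_SHIFT follows the CPU page-table mode (48 under sv57); sv39 needs 30 */
	vpn2_idx = (iova >> PGDIR_SHIFT) & PAGE_TABLE_LEVEL_MASK;
	vpn1_idx = (iova >> PMD_SHIFT) & PAGE_TABLE_LEVEL_MASK;
	vpn0_idx = (iova >> PAGE_SHIFT) & PAGE_TABLE_LEVEL_MASK;
	/* ... rest of the loop body not shown ... */
}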

After changing PGDIR_SHIFT to PGDIR39_SHIFT (30)
and testing again, DMA addressing through xt iommu on the host finally works.
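
To see why the wrong shift breaks the mapping, the index computation can be reproduced in user space for the IOVA from the logs (4294963200, i.e. 0xFFFFF000). This is a minimal sketch: the MODEL_* macros and level_index are hypothetical names, and the 9-bit mask is an assumption about what PAGE_TABLE_LEVEL_MASK holds in the driver.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT     12
#define MODEL_LEVEL_BITS     9
#define MODEL_LEVEL_MASK     ((1u << MODEL_LEVEL_BITS) - 1) /* 0x1ff, assumed */
#define MODEL_SV39_PGD_SHIFT 30 /* 12 + 2 * 9: top level of a 3-level table */
#define MODEL_SV57_PGD_SHIFT 48 /* 12 + 4 * 9: top level of a 5-level table */

/* Extract the 9-bit page-table index that starts at bit position shift. */
static unsigned int level_index(uint64_t iova, unsigned int shift)
{
	assert(shift < 64);
	return (unsigned int)((iova >> shift) & MODEL_LEVEL_MASK);
}
```

For 0xFFFFF000, a shift of 30 selects top-level entry 3, while a shift of 48 selects entry 0; the two lower levels (shifts 21 and 12) both give 511. With the system's sv57 value, the driver therefore fills a top-level slot that an sv39 walk does not consult for this address.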

With the host working, the guest NIC passthrough test was run next:
the e1000e NIC passed through to the guest in kvm-mode now works as well.