GPUDirect Storage (GDS) Installation and Usage
1 GDS Installation
1.1 Configuration on the Host
For environment setup and installation on the host, refer to the following documents:
- Deployment and enablement requirements for GPUDirect Storage
- GPUDirect Storage Installation and Troubleshooting Guide
- GPUDirect Storage Release Notes
- Best Practices Guide
After installation, verify that GDS was installed successfully:
/usr/local/cuda-<x>.<y>/gds/tools/gdscheck.py -p
Sample output:
xjhui@gpu-2:/usr/local/cuda-12.8/gds/tools$ ./gdscheck -p
warn: error opening log file: Permission denied, logging will be disabled
GDS release version: 1.13.0.11
nvidia_fs version: 2.17 libcufile version: 2.12
Platform: x86_64
============
ENVIRONMENT:
============
=====================
DRIVER CONFIGURATION:
=====================
NVMe P2PDMA : Unsupported
NVMe : Supported
NVMeOF : Unsupported
SCSI : Unsupported
ScaleFlux CSD : Unsupported
NVMesh : Unsupported
DDN EXAScaler : Unsupported
IBM Spectrum Scale : Unsupported
NFS : Supported
BeeGFS : Unsupported
WekaFS : Unsupported
Userspace RDMA : Unsupported
--Mellanox PeerDirect : Enabled
--rdma library : Not Loaded (libcufile_rdma.so)
--rdma devices : Not configured
--rdma_device_status : Up: 0 Down: 0
=====================
CUFILE CONFIGURATION:
=====================
properties.use_pci_p2pdma : false
properties.use_compat_mode : true
properties.force_compat_mode : false
properties.gds_rdma_write_support : true
properties.use_poll_mode : false
properties.poll_mode_max_size_kb : 4
properties.max_batch_io_size : 128
properties.max_batch_io_timeout_msecs : 5
properties.max_direct_io_size_kb : 1024
properties.max_device_cache_size_kb : 131072
properties.max_device_pinned_mem_size_kb : 18014398509481980
properties.posix_pool_slab_size_kb : 4 1024 16384
properties.posix_pool_slab_count : 128 64 64
properties.rdma_peer_affinity_policy : RoundRobin
properties.rdma_dynamic_routing : 0
fs.generic.posix_unaligned_writes : false
fs.lustre.posix_gds_min_kb: 0
fs.beegfs.posix_gds_min_kb: 0
fs.weka.rdma_write_support: false
fs.gpfs.gds_write_support: false
fs.gpfs.gds_async_support: true
profile.nvtx : false
profile.cufile_stats : 0
miscellaneous.api_check_aggressive : false
execution.max_io_threads : 0
execution.max_io_queue_depth : 128
execution.parallel_io : false
execution.min_io_threshold_size_kb : 1024
execution.max_request_parallelism : 0
properties.force_odirect_mode : false
properties.prefer_iouring : false
=========
GPU INFO:
=========
GPU index 0 NVIDIA H100 80GB HBM3 bar:1 bar size (MiB):131072 supports GDS, IOMMU State: Disabled
GPU index 1 NVIDIA H100 80GB HBM3 bar:1 bar size (MiB):131072 supports GDS, IOMMU State: Disabled
GPU index 2 NVIDIA H100 80GB HBM3 bar:1 bar size (MiB):131072 supports GDS, IOMMU State: Disabled
GPU index 3 NVIDIA H100 80GB HBM3 bar:1 bar size (MiB):131072 supports GDS, IOMMU State: Disabled
GPU index 4 NVIDIA H100 80GB HBM3 bar:1 bar size (MiB):131072 supports GDS, IOMMU State: Disabled
GPU index 5 NVIDIA H100 80GB HBM3 bar:1 bar size (MiB):131072 supports GDS, IOMMU State: Disabled
GPU index 6 NVIDIA H100 80GB HBM3 bar:1 bar size (MiB):131072 supports GDS, IOMMU State: Disabled
GPU index 7 NVIDIA H100 80GB HBM3 bar:1 bar size (MiB):131072 supports GDS, IOMMU State: Disabled
==============
PLATFORM INFO:
==============
IOMMU: disabled
Nvidia Driver Info Status: Supported(Nvidia Open Driver Installed)
Cuda Driver Version Installed: 12080
Platform: R8868 G13, Arch: x86_64(Linux 5.15.0-105-generic)
Platform verification succeeded
Lines marked Supported indicate that GDS is configured correctly and that at least one supported filesystem is available. Note that the IOMMU must be disabled.
Next, mount the filesystem, for example:
# Mount the filesystem
mount -t nfs -o vers=3,nolock,proto=rdma,nconnect=8,port=20049 10.10.10.119:/test0908 /mnt/gds_rdma
# Check that the mount succeeded (rdma/nvme/nfs/...)
mount | grep rdma
1.2 Configuration in a Container
After GDS is installed on the host, it is typically used from inside a container, which means docker run needs some extra flags. A good reference is NVIDIA's gds-run-container script: https://github.com/NVIDIA/MagnumIO/blob/main/gds/docker/gds-run-container
# $1: image name, $2: container name
docker run \
--gpus all \
--ipc=host \
--network=host \
--ulimit memlock=-1 \
--privileged \
--cap-add SYS_ADMIN \
--cap-add=IPC_LOCK \
--device /dev/infiniband/rdma_cm \
--device /dev/infiniband/uverbs* \
--device /dev/nvidia-fs* \
--device /dev/infiniband/umad* \
-v /usr/lib/x86_64-linux-gnu/libmlx5.so.1:/usr/lib/x86_64-linux-gnu/libmlx5.so.1:ro \
-v /usr/lib/x86_64-linux-gnu/libibverbs.so.1:/usr/lib/x86_64-linux-gnu/libibverbs.so.1:ro \
-v /usr/lib/x86_64-linux-gnu/librdmacm.so.1:/usr/lib/x86_64-linux-gnu/librdmacm.so.1:ro \
-v /usr/lib/x86_64-linux-gnu/rsocket/librspreload.so.1:/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so.1:ro \
-v /sys/class/infiniband:/sys/class/infiniband:ro \
-v /sys/class/net:/sys/class/net:ro \
-v /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcufile.so.0:/usr/local/cuda/targets/x86_64-linux/lib/libcufile.so.0:ro \
-v /usr/local/cuda-12.8/targets/x86_64-linux/lib/libcufile_rdma.so.1:/usr/local/cuda/targets/x86_64-linux/lib/libcufile_rdma.so.1:ro \
-v /var/lib/nfsd-ro:/host/nfsd:ro \
-v /home/models:/home/models \
-v /home:/home \
-v /home/nfs:/home/nfs:rw,rshared \
-w /workspace \
--entrypoint="/bin/bash" \
--name $2 \
-itd $1
2 Using GDS
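Applications use GDS through the cuFile API in libcufile (the library mounted into the container above). Below is a minimal sketch, not a production implementation, of reading a file directly into GPU memory; it assumes a CUDA toolkit with GDS installed, and the file path is a placeholder for a file on a GDS-capable mount such as the NFS-over-RDMA mount from section 1.1:

#define _GNU_SOURCE             /* for O_DIRECT */
#include <cuda_runtime.h>
#include <cufile.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    const char *path = "/mnt/gds_rdma/testfile";  /* placeholder: a file on a GDS-capable mount */
    const size_t size = 1 << 20;                  /* read 1 MiB */

    /* O_DIRECT bypasses the page cache, which the true GDS path requires. */
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    CUfileError_t st = cuFileDriverOpen();
    if (st.err != CU_FILE_SUCCESS) { fprintf(stderr, "cuFileDriverOpen failed\n"); return 1; }

    /* Wrap the POSIX fd in a cuFile handle. */
    CUfileDescr_t descr;
    memset(&descr, 0, sizeof(descr));
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    st = cuFileHandleRegister(&handle, &descr);
    if (st.err != CU_FILE_SUCCESS) { fprintf(stderr, "cuFileHandleRegister failed\n"); return 1; }

    /* Allocate GPU memory; registering it is optional but helps repeated I/O. */
    void *devbuf = NULL;
    cudaMalloc(&devbuf, size);
    cuFileBufRegister(devbuf, size, 0);

    /* DMA from storage directly into GPU memory, with no bounce buffer in host RAM. */
    ssize_t n = cuFileRead(handle, devbuf, size, 0 /* file offset */, 0 /* buffer offset */);
    printf("cuFileRead returned %zd bytes\n", n);

    cuFileBufDeregister(devbuf);
    cudaFree(devbuf);
    cuFileHandleDeregister(handle);
    close(fd);
    cuFileDriverClose();
    return 0;
}

Compile with something like nvcc gds_read.c -o gds_read -lcufile. For quick throughput tests without writing any code, the gdsio benchmark tool that ships next to gdscheck in /usr/local/cuda-<x>.<y>/gds/tools can also drive reads and writes over the GDS path.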
3 How to Verify That GDS Is Really Being Used?
3.1 Use gdscheck to Prove the Mount Point/Path Is GDS-capable
# Check which filesystems on this system support GDS:
sudo /usr/local/cuda-*/gds/tools/gdscheck -p
# Check the actual GDS path
sudo /usr/local/cuda-*/gds/tools/gdscheck -p <YOUR_GDS_PATH>
3.2 Inspect the libcufile Log and Confirm There Is No Fallback
Because libcufile may silently fall back internally, the usual approach is to set the cuFile logging options and set compat_mode to false. If, after the run, the log clearly shows messages like "using nvidia-fs / gds path / GDS enabled" and contains nothing like "fallback to POSIX", that is strong evidence. If the log does show a POSIX fallback, the data plane is not pure GDS.
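The logging level and compat mode are controlled by cufile.json (system-wide at /etc/cufile.json; a per-run copy can be selected via the CUFILE_ENV_PATH_JSON environment variable). A minimal sketch, assuming the key names used by the cufile.json shipped with recent GDS releases:

{
    "logging": {
        "dir": "/tmp",
        "level": "TRACE"
    },
    "properties": {
        "allow_compat_mode": false
    }
}

Then run the workload with CUFILE_ENV_PATH_JSON pointing at this file and inspect the cufile log written under logging.dir.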
3.3 Check Whether the nvidia-fs Kernel Statistics Counters Grow
I/O that really goes through GDS passes through the nvidia-fs kernel module. Read the nvidia-fs statistics counters once before and once after the inference run, and confirm that the read/write request counters are increasing. The path may differ between versions; common candidates:
ls /proc/driver | grep -i nvidia
ls /proc/driver/nvidia-fs 2>/dev/null
find /proc/driver -maxdepth 2 -type f | grep -i 'nvidia.*fs'
If entries like the following exist:
/proc/driver/nvidia-fs/stats
/proc/driver/nvidia-fs/metrics
/proc/driver/nvidia-fs/*
then run:
cat /proc/driver/nvidia-fs/stats
# run the workload once
cat /proc/driver/nvidia-fs/stats
If the read/write-related counters in stats increase noticeably across the workload, the I/O went through the nvidia-fs path (the GDS data plane).