CUDA API support: cudaHostAlloc and cudaFreeHost #1304

Open
chengchen666 opened this issue May 31, 2024 · 8 comments

@chengchen666
Collaborator

We need to implement cudaHostAlloc and cudaFreeHost to support vLLM.
The test case is in commit 16bf3d2.

To build:
nvcc -cudart shared test_cudahostalloc.cpp -o test_cudahostalloc -lcuda

To run:
./test_cudahostalloc 1024 1024

@chengchen666
Collaborator Author

chengchen666 commented Jun 1, 2024

This is not a high priority. The API is most likely called by NCCL rather than by vLLM directly: I could not find it in the vLLM source code, but I did find it in the NCCL source code. So once NCCL support is finished, we may not need to support this API separately for now.

@mehryar72
Collaborator

Branch Merge Issue with mab_hostalloc

The mab_hostalloc branch merges hostalloc with the multithread and nccl branches. The cudaHostAlloc throughput test (test_cudahostalloc) runs successfully the first time. On a second run, however, the container crashes.

Log output right after the crash:

[INFO] [0/60932323] unmap ptr is 4000000000, len is 1000
[INFO] [0/60932402] unmap ptr is 4000001000, len is 28f000
[INFO] [0/60932563] unmap ptr is 4000290000, len is 1a000
[INFO] [0/60932585] unmap ptr is 40002aa000, len is 2000
[INFO] [0/60932594] unmap ptr is 40002ac000, len is 3000
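The addresses and lengths in these unmap records are hexadecimal. A quick check of the arithmetic shows whether the unmaps tear down one contiguous region or scattered ones. It is written as a small standalone C++ sketch; the struct and function names are illustrative, not Quark code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One "unmap ptr is P, len is L" record from the Quark log (values are hex).
struct UnmapRange {
    std::uint64_t ptr;
    std::uint64_t len;
};

// True if every range begins exactly where the previous one ends.
bool is_contiguous(const std::vector<UnmapRange>& ranges) {
    for (std::size_t i = 1; i < ranges.size(); ++i) {
        if (ranges[i - 1].ptr + ranges[i - 1].len != ranges[i].ptr) return false;
    }
    return true;
}

// Bytes from the start of the first range to the end of the last one.
std::uint64_t covered_bytes(const std::vector<UnmapRange>& ranges) {
    assert(is_contiguous(ranges));
    if (ranges.empty()) return 0;
    return ranges.back().ptr + ranges.back().len - ranges.front().ptr;
}

// The five unmap records printed right after the crash.
const std::vector<UnmapRange> kLoggedUnmaps = {
    {0x4000000000, 0x1000}, {0x4000001000, 0x28f000}, {0x4000290000, 0x1a000},
    {0x40002aa000, 0x2000}, {0x40002ac000, 0x3000},
};
```

For these five records, the ranges chain end to end. Together they cover 0x2af000 bytes (687 pages of 4 KiB, about 2.7 MiB) starting at 0x4000000000. This does not by itself explain the crash; it only shows that a single contiguous region was being unmapped.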

@QuarkContainer
Owner

@mehryar72 Thank you! Could you please provide more detailed repro steps? It would also be great to attach the whole Quark log.

@mehryar72
Collaborator

mehryar72 commented Jun 5, 2024

@QuarkContainer
How to reproduce:
Build Quark from the mab_hostalloc branch.
Inside a container using the Quark runtime, run the cudaHostAlloc throughput test:
LD_PRELOAD=/path_to_libcudaproxy/libcudaproxy.so ./test_cudahostalloc 1024 1024
The first run succeeds. On the second run, the container gets stuck.
The Quark log is attached:
quark_log.txt

@QuarkContainer
Copy link
Owner

@mehryar72
I tried to build the mab_hostalloc branch, but the build failed with the following error. It looks like I need to install the NCCL library (the linker cannot find -lnccl). Could you please update the build steps to cover that?

Compiling containerd-shim v0.3.0 (https://github.com/QuarkContainer/rust-extensions.git#b3ac82d9)
Compiling quark v0.6.0 (/home/brad/rust/Quark/qvisor)
error: linking with cc failed: exit status: 1
|
= note: LC_ALL="C" PATH="/home/brad/.rustup/toolchains/nightly-2023-12-11-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/bin:/home/brad/.pyenv/shims:/home/brad/.pyenv/bin:/home/brad/.cargo/bin:/home/brad/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin:/usr/local/go/bin" VSLANG="1033" "cc" "-m64" "/tmp/rustc7pziLu/symbols.o" "/home/brad/rust/Quark/qvisor/../target/release/deps/quark-15c31bd88d58b28c.quark.ad02c6ded2946f8b-cgu.0.rcgu.o" "-Wl,--as-needed" "-L" "/home/brad/rust/Quark/qvisor/../target/release/deps" "-L" "/usr/local/cuda/lib64" "-L" "/usr/local/cuda/lib64/stubs" "-L" "/usr/local/cuda/targets/x86_64-linux/lib" "-L" "/usr/local/cuda/targets/x86_64-linux/lib/stubs" "-L" "/usr/local/cuda-12/lib64" "-L" "/usr/local/cuda-12/lib64/stubs" "-L" "/usr/local/cuda-12/targets/x86_64-linux/lib" "-L" "/usr/local/cuda-12/targets/x86_64-linux/lib/stubs" "-L" "/usr/local/cuda-12.3/lib64" "-L" "/usr/local/cuda-12.3/lib64/stubs" "-L" "/usr/local/cuda-12.3/targets/x86_64-linux/lib" "-L" "/usr/local/cuda-12.3/targets/x86_64-linux/lib/stubs" "-L" "/usr/local/cuda/lib64" "-L" "/usr/local/cuda/lib64/stubs" "-L" "/usr/local/cuda/targets/x86_64-linux/lib" "-L" "/usr/local/cuda/targets/x86_64-linux/lib/stubs" "-L" "/usr/local/cuda-12/lib64" "-L" "/usr/local/cuda-12/lib64/stubs" "-L" "/usr/local/cuda-12/targets/x86_64-linux/lib" "-L" "/usr/local/cuda-12/targets/x86_64-linux/lib/stubs" "-L" "/usr/local/cuda-12.3/lib64" "-L" "/usr/local/cuda-12.3/lib64/stubs" "-L" "/usr/local/cuda-12.3/targets/x86_64-linux/lib" "-L" "/usr/local/cuda-12.3/targets/x86_64-linux/lib/stubs" "-L" "/usr/local/cuda/lib64" "-L" "/usr/lib/x86_64-linux-gnu" "-L" "/usr/lib/x86_64-linux-gnu/stubs" "-L" "/home/brad/.rustup/toolchains/nightly-2023-12-11-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-Wl,-Bdynamic" "-lnccl" "-lcuda" "-lcudart" "-lnvidia-ml" "-lcublas" "-lcublasLt" "-Wl,-Bstatic" 
"/tmp/rustc7pziLu/libcompiler_builtins-8ebeba8f78436673.rlib" "-Wl,-Bdynamic" "-lcuda" "-lcublas" "-lcuda" "-lcublasLt" "-lelf" "-lcudart" "-lc" "-lcap" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-Wl,--eh-frame-hdr" "-Wl,-z,noexecstack" "-L" "/home/brad/.rustup/toolchains/nightly-2023-12-11-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-o" "/home/brad/rust/Quark/qvisor/../target/release/deps/quark-15c31bd88d58b28c" "-Wl,--gc-sections" "-pie" "-Wl,-z,relro,-z,now" "-Wl,-O1" "-nodefaultlibs"
= note: /usr/bin/ld: cannot find -lnccl: No such file or directory
collect2: error: ld returned 1 exit status

My test on the hostalloc branch passes, as shown below:

root@brad-MS-7D46:/var/log/quark# rm quark.log; docker run --net=host --cpus=0.8 -P --runtime=quark_d --mount type=bind,source="/home/brad/rust/Quark",target=/Quark --rm -it nvidia/cuda:12.1.0-devel-ubuntu22.04 /bin/bash

==========
== CUDA ==

CUDA Version 12.1.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .


** DEPRECATION NOTICE! **


THIS IMAGE IS DEPRECATED and is scheduled for DELETION.
https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md

root@brad-MS-7D46:/# LD_PRELOAD=/Quark/target/release/libcudaproxy.so /Quark/test/c/test_cudahostalloc 1024 1024
Average throughput from host to device (cudaHostAlloc): 22.3543 GB/s
Average throughput from device to host (cudaHostAlloc): 24.2204 GB/s
root@brad-MS-7D46:/# LD_PRELOAD=/Quark/target/release/libcudaproxy.so /Quark/test/c/test_cudahostalloc 1024 1024
Average throughput from host to device (cudaHostAlloc): 22.2447 GB/s
Average throughput from device to host (cudaHostAlloc): 24.2431 GB/s

@chengchen666
Collaborator Author

Maybe we should make NCCL an optional dependency when building Quark, because not all CUDA users need NCCL.

@QuarkContainer
Owner

When testing with the latest GPUVirtNew branch, the test code fails at an unexpected place:

root@brad-MS-7D46:/Quark/target/release# LD_PRELOAD=/Quark/target/release/libcudaproxy.so /Quark/test/c/test_cudahostalloc 1024 1024
failed to replaced dlopen call to libcudaproxy.so
CUDA error at test_cuda.cpp:104 - �ViY

@QuarkContainer
Owner

@mehryar72 @chengchen666 With PR #1315, cudaHostAlloc works, as shown below:

root@brad-MS-7D46:/var/log/quark# rm quark.log; docker run --net=host --cpus=0.8 -P --runtime=quark_d --mount type=bind,source="/home/brad/rust/Quark",target=/Quark --rm -it nvidia/cuda:12.1.0-devel-ubuntu22.04 /bin/bash

==========
== CUDA ==

CUDA Version 12.1.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .


** DEPRECATION NOTICE! **


THIS IMAGE IS DEPRECATED and is scheduled for DELETION.
https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md

root@brad-MS-7D46:/# LD_PRELOAD=/Quark/target/release/libcudaproxy.so /Quark/test/c/test_cudahostalloc 1024 1024
Average throughput from host to device (cudaHostAlloc): 22.3902 GB/s
Average throughput from device to host (cudaHostAlloc): 23.9117 GB/s
root@brad-MS-7D46:/# LD_PRELOAD=/Quark/target/release/libcudaproxy.so /Quark/test/c/test_cudahostalloc 1024 1024
Average throughput from host to device (cudaHostAlloc): 22.31 GB/s
Average throughput from device to host (cudaHostAlloc): 23.875 GB/s


3 participants