[breaking][CI] Use CTK 12.4 #10697

Merged: 17 commits merged into dmlc:master on Aug 22, 2024

hcho3 (Collaborator) commented Aug 12, 2024

Closes #10696
Closes #10370

Marking as breaking, as we are dropping support for CUDA 11.8.

hcho3 changed the title from "[CI] Use CTK 12.4" to "[breaking][CI] Use CTK 12.4" on Aug 12, 2024
trivialfis (Member) commented Aug 13, 2024

Thank you for working on this! Moving to 12.4 will help the external memory work significantly. Out of curiosity, which kernel driver version does the AMI use? Is it using the open kernel module (kernel-open)? For example, the Ubuntu package I have is:

$ apt search nvidia-driver
...
nvidia-headless-555-open/jammy 555.58.02-0ubuntu0~gpu22.04.1 amd64
  NVIDIA headless metapackage (open kernel module)
...

The open kernel module is required for HMM (Heterogeneous Memory Management) addressing, which you can check with:

$ nvidia-smi -q | grep Addressing

    Addressing Mode                       : HMM
    Addressing Mode                       : HMM
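A minimal sketch of how the same check could be scripted (Python, standard library only). `all_gpus_use_hmm` is a hypothetical helper, not part of XGBoost; it assumes the `Addressing Mode : <value>` line format shown above and treats any value other than `HMM`, or a missing field, as not HMM.

```python
import re


def all_gpus_use_hmm(smi_query_output: str) -> bool:
    """Return True only if every "Addressing Mode" line in `nvidia-smi -q` output reports HMM.

    Assumes the "Addressing Mode : <value>" layout printed by nvidia-smi; if no such
    line is found (e.g. a driver that omits the field), the function returns False.
    """
    # One match per GPU section; capture the first token after the colon.
    modes = re.findall(r"^\s*Addressing Mode\s*:\s*(\S+)", smi_query_output, re.MULTILINE)
    return bool(modes) and all(mode == "HMM" for mode in modes)
```

To run it against a live machine, pass in `subprocess.run(["nvidia-smi", "-q"], capture_output=True, text=True).stdout`.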

Currently, the test that relies on this is skipped on the CI by the following guard:

if (common::SupportsPageableMem()) {

It's OK if we don't want to update the AMI for now; we are still experimenting with the feature and might not support it in the end. I'm only bringing this up to share the need for newer CUDA versions.

As for gRPC, I use:

>>> import grpc
>>> grpc.__version__
'1.62.2'

from conda-forge; it seems to work well with CUDA 12.4 and 12.5, if that helps.

hcho3 (Collaborator, Author) commented Aug 13, 2024

@trivialfis Here is the kernel driver version:

$ nvidia-smi
Tue Aug 13 19:57:56 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       On  |   00000000:00:1E.0 Off |                    0 |
| N/A   32C    P8             14W /   70W |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

The AMI is based on Amazon Linux 2, so it probably doesn't use kernel-open.
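To confirm this rather than infer it from the distribution, one option is to inspect the driver's version file. The sketch below is hypothetical and rests on an assumption worth verifying on the AMI itself: that the open kernel module identifies itself with the phrase "Open Kernel Module" in `/proc/driver/nvidia/version`, while the proprietary module does not.

```python
from pathlib import Path


def uses_open_kernel_module(version_text: str) -> bool:
    """Heuristic check for the open GPU kernel module.

    Assumption: the open module's /proc/driver/nvidia/version banner contains
    "Open Kernel Module"; the proprietary module's banner does not.
    """
    return "Open Kernel Module" in version_text


def read_driver_version_file(path: str = "/proc/driver/nvidia/version") -> str:
    # Only present on Linux hosts with the NVIDIA kernel driver loaded.
    return Path(path).read_text()
```

On the CI host, the two would be combined as `uses_open_kernel_module(read_driver_version_file())`.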

terrytangyuan (Member) left a comment

What release is this targeting?

hcho3 (Collaborator, Author) commented Aug 15, 2024

The next release, which will be 3.0.0 (because of the rewrite of the JVM packages).

hcho3 (Collaborator, Author) commented Aug 15, 2024

@terrytangyuan BTW, we would love to get your input on #10639

terrytangyuan (Member) commented

Sounds great!

trivialfis (Member) commented

Hmm, RMM is bringing in a different version of CCCL. We might have to patch it by backporting the one-line fix.

hcho3 merged commit cd83fe6 into dmlc:master on Aug 22, 2024
28 of 30 checks passed
hcho3 deleted the upgrade_cuda124 branch on August 22, 2024 at 02:59
Development: successfully merging this pull request may close these issues: "Update the default CTK to 12.4."
3 participants