Extreme CPU load ~3100% during variable read from netcdf #264
Comments
@durack1 Thanks for confirming what I've been seeing. I haven't tested 3.0.1 yet, but I'll try to confirm what you have. In the meantime, @dnadeau4 and @doutriaux1 we need a test for this. It should query
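One way such a test could be sketched, assuming it is enough to compare CPU time against wall-clock time around the read: a ratio well above 1.0 means more than one core was busy, which is roughly what top reports as %CPU divided by 100. The helper name `cpu_to_wall_ratio` and the `busy_work` stand-in are hypothetical; in a real test the callable would be the netCDF variable read. Note that `time.process_time()` covers all threads of the current process but not child processes.

```python
import time


def cpu_to_wall_ratio(func, *args, **kwargs):
    """Run func and return (result, cpu_seconds / wall_seconds).

    A ratio near 1.0 suggests one busy core; a ratio near 31 would
    correspond to the ~3100% load reported in this issue.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time, all threads
    result = func(*args, **kwargs)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    ratio = cpu_used / wall_used if wall_used > 0 else 0.0
    return result, ratio


def busy_work(n=200_000):
    # Hypothetical stand-in for the variable read being tested.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, ratio = cpu_to_wall_ratio(busy_work)
    print(f"approximate CPU load: {ratio * 100:.0f}%")
```

An automated check could then assert that the ratio stays below a chosen threshold for a read that is expected to be single-threaded; the threshold itself would need tuning per machine.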
@zshaheen no problem. I've seen ~1200% and ~1500%, but today's new record of 3100% was too much not to report. Let me know if you want me to dig a little deeper into the libraries that comprise the
@zshaheen thanks for doing this.
@durack1 @doutriaux1 @dnadeau4 I have horrible news. I just tested 3.0.1 of cdms and I'm getting the issue. I followed the instructions I had in #248. With cdms like this, even running e3sm_diags with 4 processes spawns so many child processes that it can't even complete. The 12+ users of our software (@chengzhuzhang's and mine) are having similar issues.
@zshaheen same hash as the build I am using above?
@durack1 Yes:
Maybe a rebuild with careful compiler and library selection will fix things? I believe most machines where I’ve seen issues run RHEL 6.10.
I haven’t checked Mac.
@zshaheen do you have a test?
@durack1 this file does not exist: '/work/durack1/Shared/obs_data/Argo/UCSD/180719_UCSD_monthly_TAndS_200401-201806_2p5-1975db.nc'
@dnadeau4 not an automated test, but you can manually follow these instructions. This should only take 1 min or so.
@dnadeau4 this path was on ocean; it's now on crunch.
Set these environment variables.
@dnadeau4 great, this fixes the issue above for me, and likely another problem where I was maxing out thread counts in some other cron jobs. @pochedls, this will likely also affect you with the xml cron/spawning: my test case for a single-thread operation created ~20 suspended threads; ×40 for the xml scan (along with other processes), this will start getting close to the 1024 hard limit imposed for each user.
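The arithmetic in the comment above (~20 threads per process, ×40 processes, against a 1024 per-user limit) can be checked with a small sketch. It assumes a Linux host where `/proc/self/status` exposes a `Threads:` field; elsewhere it falls back to `threading.active_count()`, which only sees Python-level threads and so may undercount library threads. The function names are hypothetical helpers, not part of cdms.

```python
import threading

PER_USER_LIMIT = 1024  # hard limit quoted in the comment above


def current_thread_count():
    """Return the number of threads in this process.

    On Linux, read the "Threads:" field of /proc/self/status, which
    includes threads created by native libraries. Otherwise fall back
    to threading.active_count() (Python threads only).
    """
    try:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("Threads:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return threading.active_count()


def projected_threads(threads_per_process, n_processes):
    """Estimate total threads if every process behaves like the test case."""
    return threads_per_process * n_processes


if __name__ == "__main__":
    total = projected_threads(20, 40)  # figures from the comment above
    print(f"this process: {current_thread_count()} thread(s)")
    print(f"projected: {total} of {PER_USER_LIMIT} (headroom {PER_USER_LIMIT - total})")
```

With the quoted figures the projection is 800 threads, leaving 224 for every other process the same user is running.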
@gleckler1 @lee1043 @pochedls the env variables above (#264 (comment)) should be set any time you're going to start heavy (multithreaded) computing. I've been hitting problems for a number of months now, and in my test this contains the issue.
@dnadeau4 thanks again for this. I've been testing all afternoon and the highest CPU usage I've seen is ~130%, and the scripts are running MUCH MUCH faster. Thanks again.
@dnadeau4 bad news, this problem is not fixed. I have set the
And it seems this is not controlling threads: a new thread is added every 3 seconds while the script runs:
I have noticed extreme CPU usage while reading a variable from a netCDF file; an example is below:
And from top:
Conda env info:
@dnadeau4 @zshaheen @doutriaux1