DART-MPI: call into MPI every once in a while for local put/get #712

Open — wants to merge 1 commit into base: development
Conversation

devreal (Member) commented Jun 25, 2020

We need to call into MPI every once in a while as otherwise polling on a local variable may not trigger progress if that is needed.
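The mechanism, as in the diff below, is to count local memcpys and, after NUM_CONSECUTIVE_MEMCPY of them, enter the MPI library once so its progress engine can run. The following is only a minimal sketch of that idea; it assumes MPI_Iprobe as the cheap, non-blocking call used to poke progress (the call actually used by the PR is not shown in this excerpt, and the real control flow may differ):

#include <mpi.h>
#include <string.h>

/* educated guess, see review discussion below */
#define NUM_CONSECUTIVE_MEMCPY 16

/* local memcpys performed since the last call into MPI */
static _Thread_local int num_local_memcpy = 0;

/* Copy locally, but enter MPI every NUM_CONSECUTIVE_MEMCPY copies so
 * that pending communication can make progress. */
static void local_copy_with_progress(void *dst, const void *src, size_t nbytes)
{
  memcpy(dst, src, nbytes);
  if (++num_local_memcpy >= NUM_CONSECUTIVE_MEMCPY) {
    int flag;
    /* any call that enters the MPI library triggers progress;
     * MPI_Iprobe is a cheap, non-blocking choice */
    MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
               &flag, MPI_STATUS_IGNORE);
    num_local_memcpy = 0;
  }
}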

codecov bot commented Jun 25, 2020

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.19%. Comparing base (425857a) to head (151d79d).
Report is 124 commits behind head on development.

Additional details and impacted files
@@               Coverage Diff               @@
##           development     #712      +/-   ##
===============================================
+ Coverage        84.06%   85.19%   +1.12%     
===============================================
  Files              336      336              
  Lines            24954    24945       -9     
  Branches         11349    11540     +191     
===============================================
+ Hits             20977    21251     +274     
  Misses            3693     3693              
+ Partials           284        1     -283     
Files with missing lines                  Coverage Δ
dart-impl/mpi/src/dart_communication.c    69.44% <ø> (-0.33%) ⬇️

... and 41 files with indirect coverage changes

@@ -30,6 +30,11 @@
#include <math.h>
#include <alloca.h>

/* the number of consecutive memcpy for local put/get before calling into MPI */
#define NUM_CONSECUTIVE_MEMCPY 16
Member:

why 16? educated guess?

Member Author:

Yes, I'm open to other suggestions :D

Member:

At least document that 16 is just an educated guess.

#define NUM_CONSECUTIVE_MEMCPY 16

/* number of performed local memcpy between calling into MPI */
static _Thread_local int num_local_memcpy = 0;
Member:

This needs C11, is this already ensured by CMake? I also think it's better to use <threads.h> and thread_local here.

Member:

And no = 0 needed (objects with static or thread storage duration are zero-initialized anyway).

Member Author:

You're right, we don't officially require a C11 compiler. I'm not going to open the discussion on whether we should abandon support for >20-year-old compilers, though...
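For reference, a minimal sketch of the two C11 spellings discussed above. Note that <threads.h> is an optional header: implementations may define __STDC_NO_THREADS__ and omit it, whereas the _Thread_local keyword is available in any C11 compiler:

#include <threads.h>          /* optional in C11; absent if __STDC_NO_THREADS__ */

static _Thread_local int a;   /* C11 keyword, works without any header */
static thread_local int b;    /* convenience macro from <threads.h>,
                                 expands to _Thread_local */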

    // use direct memcpy if we are on the same unit
    memcpy(dest, seginfo->selfbaseptr + offset,
           nelem * dart__mpi__datatype_sizeof(dtype));
    DART_LOG_DEBUG("dart_get: memcpy nelem:%zu "
                   "source (coll.): offset:%lu -> dest: %p",
                   nelem, offset, dest);
    num_local_memcpy = 0;
    return DART_OK;
  }

Member:

get_shared_mem and put_shared_mem also call memcpy, are they also affected by this?

Member Author:

They are not, as that path doesn't depend on progress being triggered locally.

devreal (Member Author) commented Jun 26, 2020

After giving this some more thought, I modified the PR to always call into MPI for local memory accesses. If fast local access without progress is required, the application should simply dereference the native pointer provided by DASH. Once we are in DART we have lost the latency race anyway; everything else just adds complexity.
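To illustrate the trade-off in plain MPI terms (a sketch, not DART code; it assumes a passive-target epoch, e.g. opened via MPI_Win_lock_all, is active on win):

#include <mpi.h>

/* Reading a value that is local to this rank, two ways. */
void local_read_example(MPI_Win win, const int *baseptr, int myrank)
{
  int value;

  /* Through the library: always triggers MPI progress, but pays
   * MPI-call latency even though the data is in local memory. */
  MPI_Get(&value, 1, MPI_INT, myrank, 0 /* target_disp */, 1, MPI_INT, win);
  MPI_Win_flush(myrank, win);

  /* Native-pointer fast path: a direct load with minimal latency,
   * but no MPI progress is triggered. */
  value = baseptr[0];
  (void)value;
}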

bertwesarg (Member):

+1
