Check that the snapshot directory is writeable before starting training #3049
This should be helpful. Thanks @seanbell!
if (!param_.has_snapshot_prefix()) {
  LOG(FATAL) << "In solver params, snapshot is specified "
             << "but snapshot_prefix is not";
}
A minor issue: better to use CHECK macros here and below to make the code more succinct.
a522cea to 865b6bf
Thanks for the review! I addressed your comments and squashed into a single commit. I left the second FATAL because there is cleanup required if it passes.
void Solver<Dtype>::CheckSnapshotWritePermissions() {
  if (Caffe::root_solver() && param_.snapshot()) {
    CHECK(param_.has_snapshot_prefix())
      << "In solver params, snapshot is specified but snapshot_prefix is not";
Better to indent 4 spaces in continuing lines here and below, to be consistent with our code style.
OK -- updated to 4 spaces (and below) and rebased again.
865b6bf to ab554cb
Check that the snapshot directory is writeable before starting training. PR BVLC#3049
The snapshot directory cannot be written to when I train LeNet. What should I do?
When training, if a snapshot cannot be written (directory does not exist, or insufficient permissions, or invalid snapshot prefix), a lot of time can potentially be lost waiting for the error. This tries to open an empty test file (and delete it) before training starts, so that failure happens as soon as possible.
This does not check for disk space, but usually that's a problem for later in training.
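The early check described above can be sketched roughly as follows. This is a minimal standalone illustration and not the code merged in this PR: the helper name `CanWriteSnapshot` and the `.tempfile` suffix are assumptions made for the example, and it uses only the C++ standard library rather than Caffe's `CHECK`/`LOG(FATAL)` macros.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper (not part of Caffe): tries to create an empty probe
// file next to where snapshots would go, then deletes it again.
// Returns true if the probe file could be opened for writing.
bool CanWriteSnapshot(const std::string& snapshot_prefix) {
  // Assumed probe name; any name sharing the snapshot prefix would do.
  const std::string probe_path = snapshot_prefix + ".tempfile";

  std::ofstream probe_file(probe_path.c_str());
  if (!probe_file.good()) {
    // Missing directory, insufficient permissions, or an invalid prefix.
    return false;
  }

  // Clean up so the check leaves no trace on disk.
  probe_file.close();
  std::remove(probe_path.c_str());
  return true;
}
```

A caller would run this once before the training loop and abort with a clear message if it returns false. As noted above, a check like this says nothing about free disk space.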