Check that the snapshot directory is writeable before starting training #3049

Merged
merged 1 commit into from
Sep 14, 2015

@seanbell seanbell commented Sep 9, 2015

When training, if a snapshot cannot be written (directory does not exist, or insufficient permissions, or invalid snapshot prefix), a lot of time can potentially be lost waiting for the error. This tries to open an empty test file (and delete it) before training starts, so that failure happens as soon as possible.

This does not check for disk space, but usually that's a problem for later in training.
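A minimal sketch of the probe described above, not the code merged in this PR. The helper name `CanWriteSnapshot` and the `.tempfile` suffix are assumptions made for illustration. It tries to create an empty file under the snapshot prefix, deletes it again, and reports whether that worked:

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper (name and probe suffix are assumptions): returns true
// if a file can be created under the given snapshot prefix.
bool CanWriteSnapshot(const std::string& snapshot_prefix) {
  const std::string probe = snapshot_prefix + ".tempfile";
  // Opening for writing fails if the directory is missing, the prefix is
  // invalid, or the process lacks write permission.
  std::ofstream out(probe.c_str());
  if (!out.is_open()) {
    return false;
  }
  out.close();
  // Clean up so the check leaves no stray file behind.
  std::remove(probe.c_str());
  return true;
}
```

A caller would run this once before the first training iteration and abort with a clear message if it returns false. As noted above, it says nothing about free disk space.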

@ronghanghu
Member

This should be helpful. Thanks @seanbell !

if (!param_.has_snapshot_prefix()) {
  LOG(FATAL) << "In solver params, snapshot is specified "
             << "but snapshot_prefix is not";
}
Member

A minor issue: better to use CHECK macros here and below to make the code more succinct.

@seanbell
Author

Thanks for the review! I addressed your comments and squashed into a single commit. I left the second FATAL because there is cleanup required if it passes.

void Solver<Dtype>::CheckSnapshotWritePermissions() {
  if (Caffe::root_solver() && param_.snapshot()) {
    CHECK(param_.has_snapshot_prefix())
      << "In solver params, snapshot is specified but snapshot_prefix is not";
Member

Better to indent 4 spaces in continuing lines here and below, to be consistent with our code style.

Author

OK -- updated to 4 spaces (and below) and rebased again.

ronghanghu added a commit that referenced this pull request Sep 14, 2015
Check that the snapshot directory is writeable before starting training
@ronghanghu ronghanghu merged commit e4baef2 into BVLC:master Sep 14, 2015
ctrevino added a commit to Robotertechnik/caffe that referenced this pull request Sep 15, 2015
Check that the snapshot directory is writeable before starting training.

PR BVLC#3049
ctrevino added a commit to Robotertechnik/caffe that referenced this pull request Sep 15, 2015
hjc1028 commented Oct 16, 2016

The snapshot directory cannot be written to when I train LeNet. What should I do?
