Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter layer rebased #2054

Closed
wants to merge 1 commit into from

Conversation

mtamburrano
Copy link
Contributor

This is the new PR based on old #1482 which was not mergeable anymore
The blobs are now treated as N dimensional entities.
To correctly deal with 0 sized batches (it could be happen when all the data is filtered) this PR needs #2053
To allow discriminating backprop this layer needs #2052

Possible use case:
HD-CNN: Hierarchical Deep Convolutional Neural Network for Image Classification

Old description:

Filter_layer developed as discussed in #1448, implemented following the suggestions by @sguada and @longjon (paragraph 2(i)).

This layer accepts 2+ bottom blobs, the first blob acts as a selector, so it should be a vector s={0,1,2, .. n} in [0,1], and the remaining blobs > are blobs to be filtered. Each value of 1 in the selector vector means that the items at the corresponding indices int the bottom_blobs[1+] will be kept, on the contrary a values of 0 means the items with corresponding symbols will be filtered out.
So only non-filtered items will be forwarded and afterwards will receive backpropagation.

As param the filter_layer needs a need_back_prop vector. This is needed because some blob could contains labels, and therefore will not > need backpropagation. The need_back_prop vector as the same size as the top vector, and accepts 1s and 0s depending on whether it should be backpropagated or not.

@mtamburrano mtamburrano mentioned this pull request Mar 6, 2015
@ducha-aiki
Copy link
Contributor

Possible use case:
HD-CNN: Hierarchical Deep Convolutional Neural Network for Image Classification

BTW, I have asked guys who wrote this paper about their plans releasing their caffe code. The answer was no, until paper got`s accepted.

@bhack
Copy link
Contributor

bhack commented Mar 6, 2015

@jeffdonahue
Copy link
Contributor

As param the filter_layer needs a need_back_prop vector. This is needed because some blob could contains labels, and therefore will not > need backpropagation. The need_back_prop vector as the same size as the top vector, and accepts 1s and 0s depending on whether it should be backpropagated or not.

This still doesn't make sense to me. If your labels are not a function of the network weights, Caffe will already figure out that they do not need backpropagation. This is how all of our current loss layers work -- Caffe figures out that the labels are a net "input" (i.e., unaffected by the weights) and hence we have d(labels)/dw = 0 and don't need to compute anything explicitly, thereby setting propagate_down = false for the labels. On the other hand, if the learned weights of the net DO contribute to the labels that your layer takes as input, Caffe will correctly determine that we need to know how these labels change as we change the weights, hence propagate_down will be set to true for the labels in this case.

Maybe there is a use case for adding a proto field skip_propagate_down for certain bottoms when you don't care about backpropagation being mathematically correct, but it certainly seems to me that it shouldn't be called need_back_prop when the mathematical "need" for backprop is already inferred by Caffe.

@shelhamer shelhamer added the JD label Mar 7, 2015
@ducha-aiki
Copy link
Contributor

@jeffdonahue
Let me make an example, even not considering this filter layer. Imagine, Id like to train three networks in one (they branch from, i.e. conv4 in alexnet). Their average accuracy is little bit better, that accuracy of each single network. So I use probabilities of the ensemble as additional optimization goal (with L2 loss). But I don`t want to network to move ensemmble prediction close to each individual. I want only otherwise.
Or - similar case - pseudolabel on unsupervised dataset. I use labels from argmax on ensebmle as true labels. Caffe determines that they depends on weights with layers and then tell me "Caanot propagate to label inputs" and dies. So this proto field "skip_back_prop" would solve this issues.

@jeffdonahue
Copy link
Contributor

Thanks for the example @ducha-aiki -- it does seem useful to have a way of indicating that backprop should be truncated for particular bottoms of particular layers, or more generally a way to scale the bottom diff by a particular scale factor other than 1 (e.g., 0 if you want to truncate). It would also be a convenient way to freeze all layers of the net up to layer K without having to set lr_mult = 0 everywhere, which has been suggested before.

I am happy to review a separate PR which does the following:

  • Add a new proto field to the top level of message LayerParameter that is either repeated bool skip_propagate_down or repeated float bottom_diff_scale, which may be specified either 0 times or N times, where N is the number of bottoms.
  • If specified 0 times, skip_propagate_down would default to all false and bottom_diff_scale would default to all 1.0, and everything behaves exactly as it currently does.
  • In the case where skip_propagate_down == true or bottom_diff_scale == 0 for some bottom(s), Net::Init would modify its propagate_down settings accordingly (potentially resulting in more than just the layer where skip_propagate_down=true was specified skipping backprop)
  • Adds appropriate tests for the new behavior to TestNet (src/caffe/test/test_net.cpp)

For this PR, I can review it again when the need_back_prop-related stuff is removed.

@mtamburrano
Copy link
Contributor Author

thank you for your example @ducha-aiki.
To return on the specific case of filter_layer, the issue occurs when the labels are filtered according to the selector: because the filter_layer takes labels as input, Caffe thinks that the labels
contribute to weights computation and sets propagate_down to true, but they are just filtered and not influenced by successive computations, so they don't need backpropagation.
By the way, @jeffdonahue when you open the new PR with skip_propagate_down, I'll remove the redundant need_back_prop param

@bhack
Copy link
Contributor

bhack commented Mar 9, 2015

I think that @jeffdonahue told that he want only review a separte PR with skip_propagate_down and not to write himself one.
Actually we cannot remove need_back_prop param here without this new PR. So i want know from him if we are stalled.

@bhack
Copy link
Contributor

bhack commented Mar 9, 2015

@longjon Do you remember our discussion on this?

@jeffdonahue
Copy link
Contributor

To return on the specific case of filter_layer, the issue occurs when the labels are filtered according to the selector: because the filter_layer takes labels as input, Caffe thinks that the labels contribute to weights computation and sets propagate_down to true, but they are just filtered and not influenced by successive computations, so they don't need backpropagation.

You are misunderstanding -- Caffe will not think your labels need backpropagation if they aren't the output (direct or indirect) of a layer with weights. For example, if you have a data layer that directly produces the FilterLayer labels, FilterLayer will work fine without any need_back_prop or skip_back_prop hackery -- Caffe's Net::Init will correctly set propagate_down = false for the label input. On the other hand, if for example you take an input and multiply it by learnable weights then take an argmax to produce your FilterLayer labels, and FilterLayer's output influences a loss, then Caffe will correctly set propagate_down = true for the labels as the weights influence the loss through your FilterLayer.

It is important that if you put a layer in a network and it needs to be able to backpropagate (i.e., the loss is influenced by the weights via the layer), but can't, that Caffe will correctly complain and fail to train in this case. However, I agree that sometimes it could be desirable to train the net in spite of that, as in @ducha-aiki's example and other cases, but such an option (1) shouldn't be called need_back_prop as that to me implies a lack of mathematical need for backprop when backprop may in fact be needed, and (2) shouldn't have an ad-hoc implementation for one specific layer type as it creates redundancy (now have to listen to both propagate_down and need_back_prop) and it's equally useful for other layers.

So yes, I will wait to review this PR once the need_back_prop option is removed, regardless of whether that immediately satisfies your use case, and you or anyone else are welcome to also create the skip_propagate_down PR I described.

@mtamburrano
Copy link
Contributor Author

I agree with point 2, maybe a more general param could be useful in cases beyond this filter layer and to avoid redundancy the need_back_prop vector should be removed, but I still think that filter_layer needs either this vector or the new parameter you proposed.
Let's make an example:
We can have a net of this type, where the selector is obtained applying a sigmoid_layer followed by a threshold_layer to obtain the selctor for the filter layer:

filter layer example

In this case the filter_layer receives the output from pooling_layer and the labels directly from the data_layer. Both of these input needs to be filtered according to the third input, the selector, and the filter_layer outputs two filtered top_blobs (pooling and labels) for further computations.
Caffe correctly detects that filter_layer needs backpropagation and sets propagate_down to true for all of his bottom_blobs, but the label values don't need to be updated since are only used to calculate the loss and to be filtered and provided to further loss layers.

By the way me and @bhack are thinking about implementing this skip_propagate_down param, so we would like to clarify some doubt:

  • we should override the propagate_down values at the end of Net::Init?
  • can you suggest some test cases to be covered?

@sguada
Copy link
Contributor

sguada commented Mar 9, 2015

Meanwhile you could have two filter layers, one that filter the
pooling_layer and another one that filter the labels. In that case one of
the filter layers will propagate_down and the other will not.

Sergio

2015-03-09 12:12 GMT-07:00 Manuele notifications@github.com:

I agree with point 2, maybe a more general param could be useful in cases
beyond this filter layer and to avoid redundancy the need_back_prop
vector should be removed, but I still think that filter_layer needs
either this vector or the new parameter you proposed.
Let's make an example:
We can have a net of this type, where the selector is obtained applying a
sigmoid_layer followed by a threshold_layer to obtain the selctor for the filter
layer:

[image: filter layer example]
https://camo.githubusercontent.com/4677795349393be6627c9040b62bb8ca25c8db67/687474703a2f2f69313336372e70686f746f6275636b65742e636f6d2f616c62756d732f723739352f6a6f6b756c6861757073736b792f556e7469746c65642532304469616772616d5f7a70736a39627876737a732e6a7067

In this case the filter_layer receives the output from pooling_layer and
the labels directly from the data_layer. Both of these input needs to be
filtered according to the third input, the selector, and the filter_layer
outputs two filtered top_blobs (pooling and labels) for further
computations.
Caffe correctly detects that filter_layer needs backpropagation and sets
propagate_down to true for all of his bottom_blobs, but the label values
don't need to be updated since are only used to calculate the loss and to
be filtered and provided to further loss layers.

By the way me and @bhack https://github.com/bhack are thinking about
implementing this skip_propagate_down param, so we would like to clarify
some doubt:

  • we should override the propagate_down values at the end of Net::Init?
  • can you suggest some test cases to be covered?


Reply to this email directly or view it on GitHub
#2054 (comment).

@bhack
Copy link
Contributor

bhack commented Mar 9, 2015

@sguada Is your comment the same of #1482 (comment) and that @mtamburrano replied with #1482 (comment)? Is there something different?

@sguada
Copy link
Contributor

sguada commented Mar 9, 2015

@bhack yeah my comment is the same, for the label part of filter one doesn't need to back-propagate while for the data part of filter it will back-propagate as needed.

I think you should try to implement the simplest filter layer possible and address problems in future PRs, otherwise this will never reach a merge-able point.

@jeffdonahue
Copy link
Contributor

@mtamburrano right, I understand the use case, and your example makes sense. Note that in your example you would add skip_propagate_down: false skip_propagate_down: true to the SigmoidCrossEntropyLoss layer definition to avoid the problem; FilterLayer itself should not need to be changed as Caffe should infer propagate_down = false for the labels if skip_propagate_down: true is set in the loss layer (though also adding skip_propagate_down: true for the labels in the FilterLayer would not affect anything).

we should override the propagate_down values at the end of Net::Init?

It is probably easier/better to weave in the logic where Net is already setting propagate_down; you can likely use similar logic to the logic used to handle the lr_mult: 0 cases not needing backprop.

can you suggest some test cases to be covered?

How about the example you drew? You can turn it into a prototxt string, and add a new test case to TestNet (src/caffe/test/test_net.cpp) that checks propagate_down == false for FilterLayer's label input with skip_propagate_down set in SoftmaxWithLossLayer, and checks propagate_down == true without skip_propagate_down. (May have to add a public accessor to Net for propagate_down if there isn't one already.)

@mtamburrano
Copy link
Contributor Author

How about the example you drew? You can turn it into a prototxt string, and add a new test case to TestNet (src/caffe/test/test_net.cpp) that checks propagate_down == false for FilterLayer's label input with skip_propagate_down set in SoftmaxWithLossLayer, and checks propagate_down == true without skip_propagate_down. (May have to add a public accessor to Net for propagate_down if there Isn't one already.)

I'd like to use my example for tests, but if I have to open a separate PR for the new parameter, I can't use this unmerged layer for running tests. Or rather, I can write the code, but the travis build will not succeed...

@jeffdonahue
Copy link
Contributor

The test doesn't need to use a FilterLayer specifically; the result should be the same regardless of layer type (so any layer that takes two inputs should work). But if it's really an issue you can base the other PR on this one (thereby requiring that this one be merged first).

@mtamburrano
Copy link
Contributor Author

@jeffdonahue the new PR is live at #2095
tomorrow I'll remove the need_back_prop vector from this PR and probably #2052 is no more needed and can be closed

@mtamburrano
Copy link
Contributor Author

need_back_prop vector and all related stuff removed

@bhack
Copy link
Contributor

bhack commented Mar 13, 2015

@jeffdonahue I see that somebody is tagging PR as "ready for review". Can you tag also this one?

@bhack
Copy link
Contributor

bhack commented Mar 31, 2015

@jeffdonahue Can you pass here?

@bhack
Copy link
Contributor

bhack commented May 6, 2015

@mtamburrano Seems that It needs a rebase again.

@bhack
Copy link
Contributor

bhack commented May 14, 2015

@jeffdonahue We want to program our activity to reserve a little bit of time for this. Do you need other changes or is it mergeable?

@bhack
Copy link
Contributor

bhack commented May 15, 2015

@jeffdonahue Can we squash also this? Do you think is it ready?

// Message that stores parameters used by FilterLayer
message FilterParameter {
// specify which blobs need to be backpropagated and which not
repeated uint32 need_back_prop = 1 [packed = true];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be needed after #2095 right?

@mtamburrano
Copy link
Contributor Author

removed need_back_prop param from Caffe.proto (I had already removed from the other files, but forgot that one) and merged again with Master.

@jeffdonahue I would like to remember that this PR now works, but to correctly treats with zero-sized batches, it still needs #2053 (although seems it is not only filter_layer to need it, but it applies to other cases, too)

@bhack
Copy link
Contributor

bhack commented May 21, 2015

@jeffdonahue Can we try to merge this before CVPR? We can allocate a little bit more time if you need some other little fixes.

@mtamburrano
Copy link
Contributor Author

rebased on master again.
@jeffdonahue, have you any idea when this PR could be merged? (and #2053 too)

@bhack
Copy link
Contributor

bhack commented May 27, 2015

@jeffdonahue (@sguada amarcord) Since 18 Nov 2014 in #1448. This PR is becoming barrel aged edition!


template <typename Dtype>
void FilterLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have something like CHECK(!propagate_down[bottom.size() - 1]) << this->type() << "Layer cannot backpropagate to filter index inputs" (like SoftmaxWithLossLayer for labels, etc.). Same comment in Backward_gpu. (This will make the gradient check fail, but you give it bottom ID 0 as an additional argument to only check that input.)

@jeffdonahue
Copy link
Contributor

Okay, please add a check as I commented above, squash, and I'll merge.

@mtamburrano
Copy link
Contributor Author

Check added, test updated and squashed all commits.

I would to point up again that without #2053, if a Filter_Layer filters all the input data for a given batch, it outputs a 0-sized batch that will cause a crash because net.cpp doesn't allow that.
This was discussed with @longjon here(in particular see Point 5) and in the old PR #1484

@mtamburrano mtamburrano reopened this May 28, 2015
@mtamburrano
Copy link
Contributor Author

seems like Travis fails first 2 builds because not every tests pass.
I don't know why, that tests should not regard my changes, I also tried to build the same branch on my repositories and everything works fine: https://travis-ci.org/mtamburrano/caffe/builds/64395042

@jeffdonahue
Copy link
Contributor

Not sure what happened but your diff is now huge. Did you rebase on the latest master?

@bhack
Copy link
Contributor

bhack commented May 29, 2015

@mtamburrano The problem seems that now it is rebased on mtamburrano@596add7 but caffe master is at b12c171

@mtamburrano
Copy link
Contributor Author

I think I did something wrong when I squashed commits, everything should be fine now.
I'm sorry for the closed-reopen PR spam ;)

@bhack
Copy link
Contributor

bhack commented Jun 2, 2015

@jeffdonahue Seems rebased correctly now. Can you pass here?

@jeffdonahue
Copy link
Contributor

Thanks @mtamburrano! Merged @ 21b6bf8. I fixed a bug in the GPU backward implementation (gradient test was segfaulting) and did some minor comment/style cleanup @ 0ccf336.

@jeffdonahue jeffdonahue closed this Jun 3, 2015
@bhack bhack mentioned this pull request Aug 26, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants