Filter layer rebased #2054

mtamburrano · 2015-03-06T12:21:26Z

This is the new PR based on old #1482 which was not mergeable anymore
The blobs are now treated as N dimensional entities.
To correctly deal with 0 sized batches (it could be happen when all the data is filtered) this PR needs #2053
To allow discriminating backprop this layer needs #2052

Possible use case:
HD-CNN: Hierarchical Deep Convolutional Neural Network for Image Classification

Old description:

Filter_layer developed as discussed in #1448, implemented following the suggestions by @sguada and @longjon (paragraph 2(i)).

This layer accepts 2+ bottom blobs, the first blob acts as a selector, so it should be a vector s={0,1,2, .. n} in [0,1], and the remaining blobs > are blobs to be filtered. Each value of 1 in the selector vector means that the items at the corresponding indices int the bottom_blobs[1+] will be kept, on the contrary a values of 0 means the items with corresponding symbols will be filtered out.
So only non-filtered items will be forwarded and afterwards will receive backpropagation.

As param the filter_layer needs a need_back_prop vector. This is needed because some blob could contains labels, and therefore will not > need backpropagation. The need_back_prop vector as the same size as the top vector, and accepts 1s and 0s depending on whether it should be backpropagated or not.

ducha-aiki · 2015-03-06T15:51:33Z

Possible use case:
HD-CNN: Hierarchical Deep Convolutional Neural Network for Image Classification

BTW, I have asked guys who wrote this paper about their plans releasing their caffe code. The answer was no, until paper got`s accepted.

bhack · 2015-03-06T15:58:32Z

Another one with some possible relation to this PR: Error-Driven Incremental Learning in Deep Convolutional Neural Network for Large-Scale Image Classification

jeffdonahue · 2015-03-06T23:59:59Z

As param the filter_layer needs a need_back_prop vector. This is needed because some blob could contains labels, and therefore will not > need backpropagation. The need_back_prop vector as the same size as the top vector, and accepts 1s and 0s depending on whether it should be backpropagated or not.

This still doesn't make sense to me. If your labels are not a function of the network weights, Caffe will already figure out that they do not need backpropagation. This is how all of our current loss layers work -- Caffe figures out that the labels are a net "input" (i.e., unaffected by the weights) and hence we have d(labels)/dw = 0 and don't need to compute anything explicitly, thereby setting propagate_down = false for the labels. On the other hand, if the learned weights of the net DO contribute to the labels that your layer takes as input, Caffe will correctly determine that we need to know how these labels change as we change the weights, hence propagate_down will be set to true for the labels in this case.

Maybe there is a use case for adding a proto field skip_propagate_down for certain bottoms when you don't care about backpropagation being mathematically correct, but it certainly seems to me that it shouldn't be called need_back_prop when the mathematical "need" for backprop is already inferred by Caffe.

ducha-aiki · 2015-03-07T08:58:26Z

@jeffdonahue
Let me make an example, even not considering this filter layer. Imagine, Id like to train three networks in one (they branch from, i.e. conv4 in alexnet). Their average accuracy is little bit better, that accuracy of each single network. So I use probabilities of the ensemble as additional optimization goal (with L2 loss). But I don`t want to network to move ensemmble prediction close to each individual. I want only otherwise.
Or - similar case - pseudolabel on unsupervised dataset. I use labels from argmax on ensebmle as true labels. Caffe determines that they depends on weights with layers and then tell me "Caanot propagate to label inputs" and dies. So this proto field "skip_back_prop" would solve this issues.

jeffdonahue · 2015-03-09T01:50:04Z

Thanks for the example @ducha-aiki -- it does seem useful to have a way of indicating that backprop should be truncated for particular bottoms of particular layers, or more generally a way to scale the bottom diff by a particular scale factor other than 1 (e.g., 0 if you want to truncate). It would also be a convenient way to freeze all layers of the net up to layer K without having to set lr_mult = 0 everywhere, which has been suggested before.

I am happy to review a separate PR which does the following:

Add a new proto field to the top level of message LayerParameter that is either repeated bool skip_propagate_down or repeated float bottom_diff_scale, which may be specified either 0 times or N times, where N is the number of bottoms.
If specified 0 times, skip_propagate_down would default to all false and bottom_diff_scale would default to all 1.0, and everything behaves exactly as it currently does.
In the case where skip_propagate_down == true or bottom_diff_scale == 0 for some bottom(s), Net::Init would modify its propagate_down settings accordingly (potentially resulting in more than just the layer where skip_propagate_down=true was specified skipping backprop)
Adds appropriate tests for the new behavior to TestNet (src/caffe/test/test_net.cpp)

For this PR, I can review it again when the need_back_prop-related stuff is removed.

mtamburrano · 2015-03-09T13:44:45Z

thank you for your example @ducha-aiki.
To return on the specific case of filter_layer, the issue occurs when the labels are filtered according to the selector: because the filter_layer takes labels as input, Caffe thinks that the labels
contribute to weights computation and sets propagate_down to true, but they are just filtered and not influenced by successive computations, so they don't need backpropagation.
By the way, @jeffdonahue when you open the new PR with skip_propagate_down, I'll remove the redundant need_back_prop param

bhack · 2015-03-09T15:01:39Z

I think that @jeffdonahue told that he want only review a separte PR with skip_propagate_down and not to write himself one.
Actually we cannot remove need_back_prop param here without this new PR. So i want know from him if we are stalled.

bhack · 2015-03-09T15:07:56Z

@longjon Do you remember our discussion on this?

jeffdonahue · 2015-03-09T18:17:40Z

To return on the specific case of filter_layer, the issue occurs when the labels are filtered according to the selector: because the filter_layer takes labels as input, Caffe thinks that the labels contribute to weights computation and sets propagate_down to true, but they are just filtered and not influenced by successive computations, so they don't need backpropagation.

You are misunderstanding -- Caffe will not think your labels need backpropagation if they aren't the output (direct or indirect) of a layer with weights. For example, if you have a data layer that directly produces the FilterLayer labels, FilterLayer will work fine without any need_back_prop or skip_back_prop hackery -- Caffe's Net::Init will correctly set propagate_down = false for the label input. On the other hand, if for example you take an input and multiply it by learnable weights then take an argmax to produce your FilterLayer labels, and FilterLayer's output influences a loss, then Caffe will correctly set propagate_down = true for the labels as the weights influence the loss through your FilterLayer.

It is important that if you put a layer in a network and it needs to be able to backpropagate (i.e., the loss is influenced by the weights via the layer), but can't, that Caffe will correctly complain and fail to train in this case. However, I agree that sometimes it could be desirable to train the net in spite of that, as in @ducha-aiki's example and other cases, but such an option (1) shouldn't be called need_back_prop as that to me implies a lack of mathematical need for backprop when backprop may in fact be needed, and (2) shouldn't have an ad-hoc implementation for one specific layer type as it creates redundancy (now have to listen to both propagate_down and need_back_prop) and it's equally useful for other layers.

So yes, I will wait to review this PR once the need_back_prop option is removed, regardless of whether that immediately satisfies your use case, and you or anyone else are welcome to also create the skip_propagate_down PR I described.

mtamburrano · 2015-03-09T19:12:18Z

I agree with point 2, maybe a more general param could be useful in cases beyond this filter layer and to avoid redundancy the need_back_prop vector should be removed, but I still think that filter_layer needs either this vector or the new parameter you proposed.
Let's make an example:
We can have a net of this type, where the selector is obtained applying a sigmoid_layer followed by a threshold_layer to obtain the selctor for the filter layer:

In this case the filter_layer receives the output from pooling_layer and the labels directly from the data_layer. Both of these input needs to be filtered according to the third input, the selector, and the filter_layer outputs two filtered top_blobs (pooling and labels) for further computations.
Caffe correctly detects that filter_layer needs backpropagation and sets propagate_down to true for all of his bottom_blobs, but the label values don't need to be updated since are only used to calculate the loss and to be filtered and provided to further loss layers.

By the way me and @bhack are thinking about implementing this skip_propagate_down param, so we would like to clarify some doubt:

we should override the propagate_down values at the end of Net::Init?
can you suggest some test cases to be covered?

sguada · 2015-03-09T19:17:03Z

Meanwhile you could have two filter layers, one that filter the
pooling_layer and another one that filter the labels. In that case one of
the filter layers will propagate_down and the other will not.

Sergio

2015-03-09 12:12 GMT-07:00 Manuele notifications@github.com:

I agree with point 2, maybe a more general param could be useful in cases
beyond this filter layer and to avoid redundancy the need_back_prop
vector should be removed, but I still think that filter_layer needs
either this vector or the new parameter you proposed.
Let's make an example:
We can have a net of this type, where the selector is obtained applying a
sigmoid_layer followed by a threshold_layer to obtain the selctor for the filter
layer:

[image: filter layer example]
https://camo.githubusercontent.com/4677795349393be6627c9040b62bb8ca25c8db67/687474703a2f2f69313336372e70686f746f6275636b65742e636f6d2f616c62756d732f723739352f6a6f6b756c6861757073736b792f556e7469746c65642532304469616772616d5f7a70736a39627876737a732e6a7067

In this case the filter_layer receives the output from pooling_layer and
the labels directly from the data_layer. Both of these input needs to be
filtered according to the third input, the selector, and the filter_layer
outputs two filtered top_blobs (pooling and labels) for further
computations.
Caffe correctly detects that filter_layer needs backpropagation and sets
propagate_down to true for all of his bottom_blobs, but the label values
don't need to be updated since are only used to calculate the loss and to
be filtered and provided to further loss layers.

By the way me and @bhack https://github.com/bhack are thinking about
implementing this skip_propagate_down param, so we would like to clarify
some doubt:

we should override the propagate_down values at the end of Net::Init?

can you suggest some test cases to be covered?

—
Reply to this email directly or view it on GitHub
#2054 (comment).

bhack · 2015-03-09T19:42:11Z

@sguada Is your comment the same of #1482 (comment) and that @mtamburrano replied with #1482 (comment)? Is there something different?

sguada · 2015-03-09T19:49:44Z

@bhack yeah my comment is the same, for the label part of filter one doesn't need to back-propagate while for the data part of filter it will back-propagate as needed.

I think you should try to implement the simplest filter layer possible and address problems in future PRs, otherwise this will never reach a merge-able point.

jeffdonahue · 2015-03-09T20:00:06Z

@mtamburrano right, I understand the use case, and your example makes sense. Note that in your example you would add skip_propagate_down: false skip_propagate_down: true to the SigmoidCrossEntropyLoss layer definition to avoid the problem; FilterLayer itself should not need to be changed as Caffe should infer propagate_down = false for the labels if skip_propagate_down: true is set in the loss layer (though also adding skip_propagate_down: true for the labels in the FilterLayer would not affect anything).

we should override the propagate_down values at the end of Net::Init?

It is probably easier/better to weave in the logic where Net is already setting propagate_down; you can likely use similar logic to the logic used to handle the lr_mult: 0 cases not needing backprop.

can you suggest some test cases to be covered?

How about the example you drew? You can turn it into a prototxt string, and add a new test case to TestNet (src/caffe/test/test_net.cpp) that checks propagate_down == false for FilterLayer's label input with skip_propagate_down set in SoftmaxWithLossLayer, and checks propagate_down == true without skip_propagate_down. (May have to add a public accessor to Net for propagate_down if there isn't one already.)

mtamburrano · 2015-03-10T00:06:53Z

How about the example you drew? You can turn it into a prototxt string, and add a new test case to TestNet (src/caffe/test/test_net.cpp) that checks propagate_down == false for FilterLayer's label input with skip_propagate_down set in SoftmaxWithLossLayer, and checks propagate_down == true without skip_propagate_down. (May have to add a public accessor to Net for propagate_down if there Isn't one already.)

I'd like to use my example for tests, but if I have to open a separate PR for the new parameter, I can't use this unmerged layer for running tests. Or rather, I can write the code, but the travis build will not succeed...

jeffdonahue · 2015-03-10T00:22:12Z

The test doesn't need to use a FilterLayer specifically; the result should be the same regardless of layer type (so any layer that takes two inputs should work). But if it's really an issue you can base the other PR on this one (thereby requiring that this one be merged first).

mtamburrano · 2015-03-10T20:12:40Z

@jeffdonahue the new PR is live at #2095
tomorrow I'll remove the need_back_prop vector from this PR and probably #2052 is no more needed and can be closed

mtamburrano · 2015-03-11T11:55:24Z

need_back_prop vector and all related stuff removed

bhack · 2015-03-13T18:29:07Z

@jeffdonahue I see that somebody is tagging PR as "ready for review". Can you tag also this one?

bhack · 2015-03-31T21:37:19Z

@jeffdonahue Can you pass here?

bhack · 2015-05-06T08:02:49Z

@mtamburrano Seems that It needs a rebase again.

bhack · 2015-05-14T11:11:31Z

@jeffdonahue We want to program our activity to reserve a little bit of time for this. Do you need other changes or is it mergeable?

bhack · 2015-05-15T07:22:36Z

@jeffdonahue Can we squash also this? Do you think is it ready?

jeffdonahue · 2015-05-15T19:54:53Z

src/caffe/proto/caffe.proto

+// Message that stores parameters used by FilterLayer
+message FilterParameter {
+  // specify which blobs need to be backpropagated and which not
+  repeated uint32 need_back_prop = 1 [packed = true];


This shouldn't be needed after #2095 right?

mtamburrano · 2015-05-18T11:59:07Z

removed need_back_prop param from Caffe.proto (I had already removed from the other files, but forgot that one) and merged again with Master.

@jeffdonahue I would like to remember that this PR now works, but to correctly treats with zero-sized batches, it still needs #2053 (although seems it is not only filter_layer to need it, but it applies to other cases, too)

bhack · 2015-05-21T21:49:02Z

@jeffdonahue Can we try to merge this before CVPR? We can allocate a little bit more time if you need some other little fixes.

mtamburrano · 2015-05-27T15:25:52Z

rebased on master again.
@jeffdonahue, have you any idea when this PR could be merged? (and #2053 too)

bhack · 2015-05-27T21:21:34Z

@jeffdonahue (@sguada amarcord) Since 18 Nov 2014 in #1448. This PR is becoming barrel aged edition!

jeffdonahue · 2015-05-28T04:53:41Z

src/caffe/layers/filter_layer.cpp

+
+template <typename Dtype>
+void FilterLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
+      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {


This should have something like CHECK(!propagate_down[bottom.size() - 1]) << this->type() << "Layer cannot backpropagate to filter index inputs" (like SoftmaxWithLossLayer for labels, etc.). Same comment in Backward_gpu. (This will make the gradient check fail, but you give it bottom ID 0 as an additional argument to only check that input.)

jeffdonahue · 2015-05-28T04:59:17Z

Okay, please add a check as I commented above, squash, and I'll merge.

mtamburrano · 2015-05-28T10:11:50Z

Check added, test updated and squashed all commits.

I would to point up again that without #2053, if a Filter_Layer filters all the input data for a given batch, it outputs a 0-sized batch that will cause a crash because net.cpp doesn't allow that.
This was discussed with @longjon here(in particular see Point 5) and in the old PR #1484

mtamburrano · 2015-05-28T11:31:26Z

seems like Travis fails first 2 builds because not every tests pass.
I don't know why, that tests should not regard my changes, I also tried to build the same branch on my repositories and everything works fine: https://travis-ci.org/mtamburrano/caffe/builds/64395042

jeffdonahue · 2015-05-28T23:06:17Z

Not sure what happened but your diff is now huge. Did you rebase on the latest master?

bhack · 2015-05-29T06:20:29Z

@mtamburrano The problem seems that now it is rebased on mtamburrano@596add7 but caffe master is at b12c171

mtamburrano · 2015-05-29T15:17:57Z

I think I did something wrong when I squashed commits, everything should be fine now.
I'm sorry for the closed-reopen PR spam ;)

bhack · 2015-06-02T18:28:25Z

@jeffdonahue Seems rebased correctly now. Can you pass here?

jeffdonahue · 2015-06-03T01:37:43Z

Thanks @mtamburrano! Merged @ 21b6bf8. I fixed a bug in the GPU backward implementation (gradient test was segfaulting) and did some minor comment/style cleanup @ 0ccf336.

mtamburrano mentioned this pull request Mar 6, 2015

Filter layer #1482

Closed

shelhamer added the JD label Mar 7, 2015

mtamburrano mentioned this pull request Mar 10, 2015

Added param skip_propagate_down to LayerParameter #2095

Merged

jeffdonahue added the ready for review label Mar 31, 2015

jeffdonahue reviewed May 15, 2015
View reviewed changes

jeffdonahue reviewed May 28, 2015
View reviewed changes

mtamburrano force-pushed the filter_layer_rebased branch from e32982b to c24d184 Compare May 28, 2015 09:55

mtamburrano closed this May 28, 2015

mtamburrano reopened this May 28, 2015

mtamburrano closed this May 29, 2015

mtamburrano force-pushed the filter_layer_rebased branch from c24d184 to 596add7 Compare May 29, 2015 13:44

mtamburrano reopened this May 29, 2015

mtamburrano closed this May 29, 2015

mtamburrano force-pushed the filter_layer_rebased branch from b28e03d to ae4a5b1 Compare May 29, 2015 13:53

mtamburrano reopened this May 29, 2015

Filter Layer implemented

8a2b4e8

mtamburrano force-pushed the filter_layer_rebased branch from d234166 to 8a2b4e8 Compare May 29, 2015 14:01

jeffdonahue added a commit that referenced this pull request Jun 3, 2015

Merge pull request #2054 from mtamburrano/filter_layer_rebased

21b6bf8

jeffdonahue closed this Jun 3, 2015

bhack mentioned this pull request Aug 26, 2015

Switch layer #1496

Closed

Filter layer rebased #2054

Filter layer rebased #2054

Conversation

mtamburrano commented Mar 6, 2015

ducha-aiki commented Mar 6, 2015

bhack commented Mar 6, 2015

jeffdonahue commented Mar 6, 2015

ducha-aiki commented Mar 7, 2015

jeffdonahue commented Mar 9, 2015

mtamburrano commented Mar 9, 2015

bhack commented Mar 9, 2015

bhack commented Mar 9, 2015

jeffdonahue commented Mar 9, 2015

mtamburrano commented Mar 9, 2015

sguada commented Mar 9, 2015

bhack commented Mar 9, 2015

sguada commented Mar 9, 2015

jeffdonahue commented Mar 9, 2015

mtamburrano commented Mar 10, 2015

jeffdonahue commented Mar 10, 2015

mtamburrano commented Mar 10, 2015

mtamburrano commented Mar 11, 2015

bhack commented Mar 13, 2015

bhack commented Mar 31, 2015

bhack commented May 6, 2015

bhack commented May 14, 2015

bhack commented May 15, 2015

jeffdonahue May 15, 2015

Choose a reason for hiding this comment

mtamburrano commented May 18, 2015

bhack commented May 21, 2015

mtamburrano commented May 27, 2015

bhack commented May 27, 2015

jeffdonahue May 28, 2015

Choose a reason for hiding this comment

jeffdonahue commented May 28, 2015

mtamburrano commented May 28, 2015

mtamburrano commented May 28, 2015

jeffdonahue commented May 28, 2015

bhack commented May 29, 2015

mtamburrano commented May 29, 2015

bhack commented Jun 2, 2015

jeffdonahue commented Jun 3, 2015