Filter layer #1482

Closed
wants to merge 11 commits into from

Conversation

mtamburrano
Contributor

Filter_layer developed as discussed in #1448 and implemented following the suggestions by @sguada and @longjon (paragraph 2(i)).

This layer accepts 2+ bottom blobs. The first blob acts as a selector, so it should be a vector s = {s_0, s_1, ..., s_n} with values in {0, 1}; the remaining blobs are the ones to be filtered. Each value of 1 in the selector vector means that the items at the corresponding indices in the bottom_blobs[1+] will be kept; conversely, a value of 0 means the items at the corresponding indices will be filtered out.
So only non-filtered items will be forwarded and will afterwards receive backpropagation.

As a param the filter_layer needs a need_back_prop vector. This is needed because some blobs could contain labels and therefore will not need backpropagation. The need_back_prop vector has the same size as the top vector and accepts 1s and 0s, depending on whether each top blob should be backpropagated or not.
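
To make the forwarding behaviour concrete, here is a minimal standalone sketch of the filtering logic described above (illustrative only, not the PR's actual Filter_layer code; the function name filter_forward and the flat float buffers are assumptions for the example):

#include <cstring>

// Illustrative sketch: copy only the items whose selector value is 1.
// `dim` is the per-item size (channels * height * width) of the blob
// being filtered; `top_data` must have room for the selected items.
void filter_forward(const float* bottom_data, const float* selector,
                    int num_items, int dim, float* top_data) {
  int top_index = 0;
  for (int n = 0; n < num_items; ++n) {
    if (selector[n] > 0.5f) {  // selector value 1: keep this item
      std::memcpy(top_data + top_index * dim,
                  bottom_data + n * dim,
                  dim * sizeof(float));
      ++top_index;
    }
    // selector value 0: item is filtered out and never copied
  }
}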

This PR needs #1483 and #1484

@@ -190,6 +190,71 @@ class EltwiseLayer : public Layer<Dtype> {
};

/**
* @brief Takes two Blob%s, computes the argmax of the IF bottom blob,
Contributor

Is this out-of-date?

Contributor Author

yes it is, totally missed it, thanks

@longjon
Contributor

longjon commented Dec 1, 2014

I took a first pass... it looks like the core implementation is there, but it needs some fixes as noted.

I'm confused about whether need_back_prop_ does anything post-#1483?

@longjon mentioned this pull request Dec 1, 2014
protected:
/**
* @param bottom input Blob vector (length 2+)
* -# @f$ (N \times C \times H \times W) @f$
Contributor

The selector should be (N \times 1 \times 1 \times 1), shouldn't it?
I'm not sure if the selector should be the first bottom blob or the last one.

Contributor Author

Yes, it should be; I'll edit this.
I don't see any difference between the first and last position, just tell me which you prefer ;)

@mtamburrano
Contributor Author

Thank you for your detailed review; I'll fix the code while waiting for your further replies where needed.


for (size_t n = 0; n < new_tops_num; n++) {
int offset = indices_to_forward_[n];
int data_offset_top = dim*n;
Contributor

Try to use predefined methods for offset
int data_offset_top = top[b-1]->offset(n);

Contributor Author

ok

@sguada
Contributor

sguada commented Dec 1, 2014

I don't know how the test passed given the problems with lint, and the error with indices_to_forward in the backward pass.


TYPED_TEST(FilterLayerTest, TestGradient) {
typedef typename TypeParam::Dtype Dtype;
if (Caffe::mode() == Caffe::CPU || sizeof(Dtype) == 4) {
Contributor

It should work independently of the sizeof(Dtype) and also for GPU

Contributor Author

I'm not sure about the lint errors, because make lint doesn't show anything (everything seems OK). About the indices_to_forward error: I think it accesses random data when n > indices_to_forward_.size() and then always enters the (n != offset) case; I'll fix this.

@mtamburrano
Contributor Author

Pushed various fixes as discussed.
Now the GPU calls fail because caffe_set doesn't work properly in GPU mode; I opened #1511 to fix this.
The filter is still applied in Reshape instead of Forward, and the need_back_prop vector is still a param, because I think we need further discussion about that.

"Selector blob (bottom[0]) must have height == 1";
int num_items = bottom[0]->num();
for (int i = 1; i < bottom.size(); i++) {
CHECK_EQ(num_items, bottom[i]->num()) <<
Contributor

I think CHECK_EQ(bottom[0]->num(), bottom[i]->num()) is more clear

for (int i = 1; i < propagate_down.size(); i++) {
// bottom[0] is the selector and never needs backpropagation
// so we can start from i = 1 and index each top with i-1
if (propagate_down[i] && need_back_prop_[i-1]) {
Contributor

It is weird that propagate_down and need_back_prop don't align.

Contributor Author

They don't align because need_back_prop's size equals bottom.size() - 1 (i.e. top.size()), whereas propagate_down's size equals bottom.size().
This happens because the bottom selector never needs backprop and doesn't have a corresponding top, so it is useless to force the user to specify '0' for the first element in the prototxt params.

Contributor

Once the selector is the last bottom they will align.

@sguada
Contributor

sguada commented Dec 3, 2014

@mtamburrano Have you tried what happens if you don't use need_prop_down?

int data_offset_top = top[b-1]->offset(n);
int data_offset_bottom = bottom[b]->offset(indices_to_forward_[n]);

caffe_copy(dim, bottom_data+data_offset_bottom,
Contributor

add spaces around '+' and try to fit it in one line

@mtamburrano
Contributor Author

Have you tried what happens if you don't use need_prop_down?

What happens is that blobs that don't need backprop (e.g. labels) receive it.
Until the way propagate_down is computed changes (as you proposed earlier here and as discussed in #1483), I don't think the need_back_prop vector can be removed.

This is very complicated logic to do the backward propagation. I think it could be done much more easily using bottom_data_selector

I'm not sure what you are proposing... We need to fill the diff matrix with zeros at the indices of the filtered items, and these indices are stored in indices_to_forward_. I could pick them from bottom_selector instead, but I don't think that is what you are suggesting.
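
For reference, a minimal standalone sketch of the backward behaviour being discussed (illustrative only, not the PR's code; the function name filter_backward and the flat float buffers are assumptions): rows that were forwarded get the diff of the corresponding top row copied back, rows that were filtered out get a zero diff, and a separate counter into the forwarded-indices list avoids the out-of-range access mentioned earlier.

#include <cstddef>
#include <cstring>
#include <vector>

// Illustrative sketch of the backward fill: `indices_to_forward` lists the
// bottom indices that were kept in the forward pass, in increasing order.
void filter_backward(const float* top_diff,
                     const std::vector<int>& indices_to_forward,
                     int num_items, int dim, float* bottom_diff) {
  std::size_t next = 0;  // position of the next forwarded index
  for (int n = 0; n < num_items; ++n) {
    if (next < indices_to_forward.size() && indices_to_forward[next] == n) {
      // forwarded item: copy the gradient from the corresponding top row
      std::memcpy(bottom_diff + n * dim, top_diff + next * dim,
                  dim * sizeof(float));
      ++next;
    } else {
      // filtered item: it contributed nothing to the top, so zero its diff
      std::memset(bottom_diff + n * dim, 0, dim * sizeof(float));
    }
  }
}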

@mtamburrano
Contributor Author

Moved the selector blob to the last bottom position, so now the top and bottom blob indices are aligned.

@bhack
Contributor

bhack commented Dec 15, 2014

@sguada @longjon @shelhamer We need to allocate time for this in our weekly working plan. Can you give us some feedback on the review plan or on what kind of work is still needed?

@sguada
Contributor

sguada commented Dec 16, 2014

@bhack @mtamburrano the introduction of need_back_prop makes things messier. The backward code is also complicated in its current form.

Could you try using two different filter layers, one for data and another for labels? That should behave properly, propagating down for the data blobs but not for the label blobs.

@mtamburrano
Contributor Author

@sguada, I don't think this solution works.
Both the data and the label filter layers would be attached to a selector obtained from a layer that needs back_prop, so both filter layers will need back_prop too.
In case this isn't clear: let's say a branch of our net ends with an inner_product layer (remember the cat/not_cat example?), from which the selector is computed using a sigmoid followed by a threshold layer (as you suggested). Now, the inner_product layer is also attached to a sigmoid_cross_entropy_loss layer and needs back_prop, so all the successive layers will have the need_backward flag raised, including the filter layers.
To avoid backprop, the label filter layer would have to be attached directly to data layers only, but that is not possible because it needs the selector.

@bhack
Contributor

bhack commented Dec 18, 2014

I have seen very low activity from the core developers (BVLC members), even after the CVPR 2014 deadline.
What is happening? @shelhamer Can you give some general feedback to the community (if we can consider there to be one)? We have been quite responsive on every PR we have opened and we are investing working hours in this, but at this rate it is very hard to reserve activity for it in our planning. I really hope the "final solution" will not be that you change the infrastructure so that everyone maintains their own layer with Python prototypes.

@bhack
Contributor

bhack commented Feb 7, 2015

@shelhamer Can you take a look here?

@mtamburrano
Contributor Author

Closed; the new PR is #2054.
