
Make bias optional #873

Merged (41 commits) on May 1, 2020
Conversation

DhairyaLGandhi (Member)

Addresses #868

pshashk (Contributor) commented Sep 27, 2019

Thank you for your prompt PR. For completeness' sake, this option could also be added to the dense, recurrent, and normalization layers (to disable the affine transformation after normalization).
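A minimal sketch of the normalization case (an editor's illustration rather than anything in this PR; toy_norm, affine, and ϵ are made-up names): the learned scale/shift after normalization sits behind a flag and is simply skipped when disabled.

function toy_norm(x, γ, β; affine = true, ϵ = 1f-5)
    μ  = sum(x) / length(x)
    σ² = sum(abs2, x .- μ) / length(x)
    x̂  = (x .- μ) ./ sqrt(σ² + ϵ)
    affine ? γ .* x̂ .+ β : x̂        # affine = false drops the learned scale/shift
end

toy_norm(randn(Float32, 8), 1f0, 0f0; affine = false)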

end

function Conv(w::AbstractArray{T,N}, b::AbstractVector{T}, σ = identity;
-             stride = 1, pad = 0, dilation = 1) where {T,N}
+             stride = 1, pad = 0, dilation = 1, use_bias = true) where {T,N}
Member:

It is weird though that when calling this constructor with use_bias=false one has to pass a vector b as well.
I would suggest the following non-breaking change instead of the use_bias flag:

  • relax the signature to
Conv(w::AbstractArray{T,N}, b::Union{Nothing,AbstractVector{T}}, σ = identity; ...)

and have a call to

Conv(w, nothing)

construct a Conv layer with no bias
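To make the suggestion concrete, here is a rough stand-in (MiniConv is an illustrative toy, not Flux's Conv): relaxing the positional b to Union{Nothing,AbstractVector{T}} keeps existing calls working while letting nothing mean "no bias".

struct MiniConv{W,B,F}
    weight::W
    bias::B      # an AbstractVector, or nothing for a bias-free layer
    σ::F
end

MiniConv(w::AbstractArray{T,N}, b::Union{Nothing,AbstractVector{T}}, σ = identity) where {T,N} =
    MiniConv{typeof(w),typeof(b),typeof(σ)}(w, b, σ)

MiniConv(randn(Float32, 3, 3, 1, 4), nothing)    # constructs a layer with no bias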

DhairyaLGandhi (Member, Author):

Should be able to support Conv(w, nothing) and make it a bit more extensible now, I think

MikeInnes (Member)

How about if we did this by setting the bias to nothing?

We could even use e.g. some kind of Zero type that does the right thing when you broadcast over it; then we can automatically use that in a bunch of layers, rather than modifying each one.
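For concreteness, one way such a type could behave under broadcasting (an editor's sketch of the idea, not code from this PR; the name Zero is illustrative only):

struct Zero end

Base.broadcastable(::Zero) = Ref(Zero())   # behave like a scalar inside broadcasts
Base.:+(a::Number, ::Zero) = a
Base.:+(::Zero, a::Number) = a

[1.0f0, 2.0f0] .+ Zero()                   # returns the left operand unchanged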

pshashk (Contributor) commented Sep 27, 2019

Passing nothing as weight initializer (stuff like initW, initb, initβ, initγ) seems like the most flexible solution to me.

DhairyaLGandhi (Member, Author)

Yeah, holding the bias felt awkward, to say the least. I had wanted to keep the signature more or less the same to avoid breaking code that depended on that behavior. I had also previously tried just setting the bias to zero(eltype(T)), but I felt the extra compute could be avoided.

I'll modify this to use nothing instead; that looks clean.

mcabbott (Member) commented Sep 27, 2019

I was going to suggest setting the bias to false, as this ought to drop out of σ.(conv(x, c.weight, cdims) .+ b) when applying the layer, without needing further checking. Constructing this as Conv(…, bias=false) would then be both nice and accurate.

Would there be much extra compute, given that you are already broadcasting σ = identity? Or perhaps, if we're going to add logic here, skipping the broadcast when σ === identity && b === false would be the more important case.

Some commit of #856 had bias=false for Dense, but I didn’t get around to actually testing that on GPU.
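The false trick is easy to check in plain Julia (nothing Flux-specific assumed): adding false is exact and never widens the element type, so the bias term really does drop out of the broadcast.

x = randn(Float32, 4)
x .+ false == x        # true: adding `false` leaves every element unchanged
eltype(x .+ false)     # Float32: no promotion to a wider type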

DhairyaLGandhi (Member, Author)

We could introduce something like

import Base: +, reshape

struct ZeroType <: Number end
+(a::Number, ::ZeroType) = a
reshape(::ZeroType, args...) = ZeroType()

And use this whenever we need it for similar use cases. The constructor can just accept this type or a Nothing and we can get the bias (or any switch) that way
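Assuming the ZeroType sketch above, the existing forward-pass pattern would go through unchanged; a quick illustrative check:

y = randn(Float32, 5, 5, 3, 1)                # stand-in for a convolution output
y .+ reshape(ZeroType(), 1, 1, 3, 1) == y     # true: the reshape and the add are both no-ops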

DhairyaLGandhi (Member, Author)

With the false approach, how do we handle the call to reshape in the forward pass?

mcabbott (Member)

Good point about reshape, and [false] is ugly, so I change my vote to branching on b === nothing.
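A sketch of that branching in the forward pass (an editor's illustration with a made-up helper, assuming a 4-D WHCN convolution output y and a channel bias vector b):

function apply_bias(y, b, σ)
    b === nothing && return σ.(y)               # no bias: skip the reshape and the add
    σ.(y .+ reshape(b, 1, 1, length(b), 1))     # otherwise add the bias channel-wise
end

apply_bias(randn(Float32, 5, 5, 3, 1), nothing, tanh)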

DhairyaLGandhi (Member, Author)

Now we should be able to construct Conv(w, nothing), Conv(w, ZeroType(...)) and Conv((2,2), 1=>3, use_bias = false)

MikeInnes (Member)

The Conv(weight, bias) should be fine as is; how about the following setup for the convenience constructor:

Conv((k, k), in=>out; weight = convweight((k, k), in=>out), bias = convbias(out)) = ...

Then we can do Conv((2, 2), 3=>6, bias = zero). And as a bonus it's easy to generate the weight matrix outside of the conv layer, which is a nice utility to have.
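A sketch of that keyword pattern (editor's illustration; convweight and convbias are the hypothetical helper names from the comment, and the returned named tuple stands in for building the real layer):

convweight((k1, k2), ch) = 0.1f0 .* randn(Float32, k1, k2, ch.first, ch.second)
convbias(out) = zeros(Float32, out)

toy_conv(k, ch; weight = convweight(k, ch), bias = convbias(ch.second), σ = identity) =
    (weight = weight, bias = bias, σ = σ)

toy_conv((2, 2), 3 => 6)                   # default weight and bias
toy_conv((2, 2), 3 => 6, bias = nothing)   # opt out of the bias entirely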

DhairyaLGandhi (Member, Author)

For people who want to provide their own weight matrices, Conv(weight, bias) is the more intuitive constructor, and I like having convweight to help that constructor more than the convenience one, as it makes the API a bit symmetric. Conv(convweight((k,k), in=>out), convbias(out))

A weight kwarg might make it harder for folks to find Conv(weight, bias), so I will document that constructor explicitly.

I also worry that if provided, it effectively makes the first few args to the convenience constructor redundant.

janEbert (Contributor) commented Oct 7, 2019

This ought to be documented in the "convenience" constructor as well! Great change, too; I always wrote silly extra constructors until now.

MikeInnes (Member) left a review comment:

Looking good!

op = bias(ip)
@test sum(op) == prod(size(op))

bias = Conv((2,2), 1=>3, bias = zero(3))
Member:

Why use the zero function here? Also, are we still interested in a zero type?

DhairyaLGandhi (Member, Author):

Ah, was passing bias = zero (from #873 (comment)) meant to point to the zero type?

MikeInnes (Member) commented on Oct 7, 2019:

yeah, sorry, I was imagining a definition like zero = Zero(). That name might not work given that zero exists as a function, but I think we'll need a new type to guarantee that it won't be updated by optimisers.

DhairyaLGandhi (Member, Author):

Right, in that case I'll bring the definition of ZeroType back and we can have the guarantee baked into that.

src/utils.jl
res .= zero(S)
end

function *(a::Zeros{T,2}, b::AbstractArray{S,2}) where {T,S}
Member:

What's the motivation for this method?

DhairyaLGandhi (Member, Author):

Largely to define matmul here to avoid scalar operations, iirc

DhairyaLGandhi (Member, Author):

Flux.Zeros(3000,3000) * rand(3000,3000)

would hit generic_matmul! here, which is slow too:

julia> @btime $(zeros(3000,3000)) * $(rand(3000,3000));
  564.237 ms (2 allocations: 68.66 MiB)

julia> @btime $(Flux.Zeros(3000,3000)) * $(rand(3000,3000));
  7.953 ms (6 allocations: 68.66 MiB)
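For readers without the diff open, the gist of the specialised method can be reproduced with a stand-in type (MyZeros is illustrative, not the PR's Flux.Zeros): the product of a zero matrix with anything is known up front, so the result can be filled directly instead of going through generic_matmul!.

struct MyZeros{T,N} <: AbstractArray{T,N}
    dims::NTuple{N,Int}
end
MyZeros(dims::Int...) = MyZeros{Float32,length(dims)}(dims)

Base.size(z::MyZeros) = z.dims
Base.getindex(z::MyZeros{T}, ::Int...) where {T} = zero(T)

# Zero matrix times anything is zero: allocate the zero result and skip the multiply.
Base.:*(a::MyZeros{T,2}, b::AbstractMatrix{S}) where {T,S} =
    zeros(promote_type(T, S), size(a, 1), size(b, 2))

MyZeros(3000, 3000) * rand(3000, 3000)     # avoids the O(n^3) fallback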

Member:

That makes sense, I was just surprised that this method came up at all since Zeros() usually gets used as a bias. Did this come up from trying to support Dense layers with Zeros() for the weight?

If so it'd be good to discuss whether that's something we want to support anyway; it seems like an odd thing to want to do.

DhairyaLGandhi (Member, Author):

Partly to cover our ground in case it ends up being used outside of the context of a bias, and to cover the basic ops so it doesn't end up being accidentally slow

Member:

There's an unlimited number of methods we could special case like this though, right? Given that we can't add everything I think we need to decide what we're supporting and why. For broadcasting + and - that's obvious enough (it covers the bias case) but I don't know that we want to implement every matmul method without a good use case for it.

DhairyaLGandhi (Member, Author):

Fair point

DhairyaLGandhi (Member, Author)

@MikeInnes I added the kwarg-only constructors, and made some docs corrections too. Perhaps good to get this in for now?

DhairyaLGandhi (Member, Author) commented Feb 26, 2020

The error happens with or without the bias machinery; perhaps it's Zygote-related.

src/utils.jl

function broadcasted(::typeof(*), a::AbstractArray, b::Zeros)
  sz = similar(a, Broadcast.broadcast_shape(size(a), size(b)))
  sz .= zero(a)
Member:

I have the same concern with this as with the methods above, i.e. if we're going to allocate anyway we should just be able to use the built-in fallback.

DhairyaLGandhi (Member, Author):

We could also potentially just have it return a correctly shaped Zeros object to avoid allocating.

DhairyaLGandhi (Member, Author):

But I understand the concern, and I agree that that shouldn't be necessary

Member:

Yeah, that seems like the best route for this particular method.
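Building on the stand-in MyZeros type sketched earlier in this thread, the non-allocating route agreed on here could look roughly like this (editor's illustration, not the merged code):

import Base.Broadcast: broadcasted, broadcast_shape

function broadcasted(::typeof(*), a::AbstractArray{T}, b::MyZeros) where {T}
    sz = broadcast_shape(size(a), size(b))    # shape the broadcast would produce
    MyZeros{T,length(sz)}(sz)                 # return a lazy zero of that shape, no allocation
end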


CarloLucibello (Member)

Why do we need to define our own types when we can use FillArrays.Zeros? FillArrays is already used by Zygote.

It's very hard to review this PR, since it comes with a lot of unrelated doc additions.

MikeInnes (Member)

FillArrays is already used by Zygote.

I think the reasoning is that we want to represent not just an array of zeros, but an array which is held fixed with respect to gradient descent. There are a couple of ways to do this; either Zygote or FillArrays could decide that the gradient of a Fill is zero, but it's a bit hard to justify that outside of Flux (and probably breaks other uses of Fill). Alternatively Flux/Optimisers.jl could decide that Fill is a special case and doesn't get updated regardless of its gradient. This could again be a bit surprising and it's not clear how we'd document it (the type itself is the obvious choice, but we don't own that).

import Flux: Zeros sends a clear signal that this is a zero type that Flux knows about, which hopefully makes its behaviour feel more intuitively obvious, if not quite self-documenting.
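To make "held fixed with respect to gradient descent" concrete, here is a toy illustration only (not how this PR or Flux's optimisers actually implement it): a dedicated zero type gives a natural place to attach a do-nothing update rule.

import Flux   # assumes the Flux.Zeros type introduced by this PR

sgd_step!(x::AbstractArray, x̄; η = 0.1f0) = (x .-= η .* x̄; x)   # toy gradient-descent step
sgd_step!(x::Flux.Zeros, x̄; η = 0.1f0) = x                      # a Zeros bias is never updated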

MikeInnes (Member)

bors r+

bors bot (Contributor) commented May 1, 2020

Build succeeded:
