
Config for the mic scene #29

Closed · krrish94 opened this issue Apr 16, 2020 · 12 comments

@krrish94

Hi @bmild,

Do you happen to have a separate config for the mic scene, or are the defaults for the blender sequences expected to work?

@bmild (Owner) commented Apr 16, 2020

The defaults should work. What problems are you running into?

@krrish94 (Author)

The defaults work for most of the other blender sequences, but in the case of mic, the loss curves seem to keep fluctuating near their initial values.

(I'll try a couple other random seeds and keep you posted)

@bmild (Owner) commented Apr 16, 2020

We did find that for the blender scenes with a lot of white space (Ficus and Mic), training would sometimes diverge right at the beginning. One hack that helps with initial training stability for these scenes, which have a lot of background and a small object of interest, is to restrict training to a central crop of each image for just the first 1000 iters or so, so the network isn't supervised on almost entirely white pixels at the start.
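A minimal sketch of what that center-crop sampling can look like, assuming rays are drawn from random pixel coordinates each step (function and argument names here are illustrative, not the repo's actual code):

```python
import numpy as np

def sample_pixels(H, W, n_rays, step, precrop_iters=1000, precrop_frac=0.5):
    """Pick pixel coordinates for one training batch.

    For the first `precrop_iters` steps, sample only from a central crop
    spanning `precrop_frac` of each image dimension, so batches are not
    dominated by white background pixels; afterwards, sample anywhere.
    """
    if step < precrop_iters:
        dh = int(H // 2 * precrop_frac)
        dw = int(W // 2 * precrop_frac)
        ys = np.random.randint(H // 2 - dh, H // 2 + dh, size=n_rays)
        xs = np.random.randint(W // 2 - dw, W // 2 + dw, size=n_rays)
    else:
        ys = np.random.randint(0, H, size=n_rays)
        xs = np.random.randint(0, W, size=n_rays)
    return ys, xs
```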

@krrish94 (Author)

Ah, thanks for the note. I had briefly considered cropping myself, so this is reassuring!

@bmild (Owner) commented Apr 16, 2020

For future use, just added this functionality: fd624c0

Something like --precrop_iters 500 --precrop_frac .5 should hopefully fix the divergence problem.
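For completeness, the corresponding training invocation would look something like this (assuming the repo's run_nerf.py entry point; the config filename is a placeholder):

```
python run_nerf.py --config <blender_scene_config>.txt --precrop_iters 500 --precrop_frac 0.5
```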

@kwea123 commented Apr 19, 2020

Another question on the white background: are your PSNR scores (and other metrics) calculated on the whole image? The white background is actually quite large (about 60-70% of the pixels); the pretrained model gives 40-50 PSNR on the white regions and ~30 on the object region.

Also, won't training on such a large fraction of white background make the model optimize more for the background, i.e., predict the white part perfectly but the object slightly worse? That's not what we want, right? How about training on the object region only?
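For concreteness, a small sketch of whole-image vs. region-wise PSNR; the object `mask` here is hypothetical, not part of the released evaluation code:

```python
import numpy as np

def psnr(pred, gt):
    """PSNR in dB for images with values in [0, 1]."""
    mse = np.mean((pred - gt) ** 2)
    return -10.0 * np.log10(mse)

def region_psnr(pred, gt, mask):
    """Whole-image, object-only, and background-only PSNR.

    `mask` is a boolean array that is True on object pixels; with 60-70%
    near-perfect white background, the whole-image number can sit well
    above the object-only number.
    """
    return psnr(pred, gt), psnr(pred[mask], gt[mask]), psnr(pred[~mask], gt[~mask])
```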

@krrish94 (Author)

@kwea123 While training on the "object region only" is a fair thing to do for synthetic objects (like the Blender dataset), it's seldom applicable to real scenes. E.g., if you look at the LLFF data, you need to reason over the entire image, and it's hard to ignore pixels unless you bake in additional prior knowledge about the scene.

This is my perspective, though; I'd be keen to hear whether others feel the same way.

@bmild (Owner) commented Apr 19, 2020

@kwea123 We calculate all metrics on the whole image. It is true that training more on the object region than the background improves the overall metrics; however, you still need to send some fraction of the samples (10-20%) through background pixels, or else you will end up with floater artifacts when you render whole images at test time (see the sketch after this comment).

In general I tend to agree with @krrish94 that it's best to avoid using object masks with image-based supervision, since it limits applicability to real scenes.
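A sketch of that object/background batch mix (helper and argument names are hypothetical, not the repo's actual sampler):

```python
import numpy as np

def mixed_batch(obj_idx, bg_idx, n_rays, bg_frac=0.15):
    """Draw a ray batch mostly from object pixels, keeping ~10-20%
    background rays so empty space is still supervised and floater
    artifacts don't accumulate at test time."""
    n_bg = int(n_rays * bg_frac)
    bg = np.random.choice(bg_idx, size=n_bg, replace=False)
    obj = np.random.choice(obj_idx, size=n_rays - n_bg, replace=False)
    return np.concatenate([obj, bg])
```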

@kwea123 commented Jan 22, 2021

A late reply with something I found recently (for my PyTorch implementation): sometimes the network initialization makes the sigma output always zero after the ReLU activation (i.e., the ray's color is predicted as white), and since there is a lot of white space, training never escapes this local minimum and the result stays all white.

Therefore, I tried replacing ReLU with softplus (they are similar functions, except that softplus is always > 0), and training seems more stable now, without needing luck to converge initially. That said, this is just a personal observation without rigorous experiments, but it should be worth a try for anyone who still encounters this convergence problem.
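A minimal PyTorch sketch of the swap (illustrative only):

```python
import torch
import torch.nn.functional as F

# Density activation: raw MLP output -> nonnegative sigma.
# relu can get pinned at exactly 0 (zero gradient everywhere), so a bad
# init may never recover; softplus = log(1 + exp(x)) is strictly positive
# and always has a nonzero gradient.
def sigma_from_raw(raw, use_softplus=True):
    return F.softplus(raw) if use_softplus else F.relu(raw)

raw = torch.randn(1024, 1)   # e.g. one batch of raw density outputs
sigma = sigma_from_raw(raw)  # strictly positive with softplus
```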

@kwea123 commented Apr 5, 2021

The softplus activation is validated in the recent mip-NeRF paper: https://arxiv.org/pdf/2103.13415.pdf

> In the original NeRF paper, the activation functions used by the MLP to construct the predicted density τ and color c are a ReLU and a sigmoid, respectively. Instead of a ReLU as the activation function to produce τ, we use a shifted softplus: log(1 + exp(x − 1)). We found that using a softplus yielded a smoother optimization problem that is less prone to catastrophic failure modes in which the MLP emits negative values everywhere (in which case all gradients from τ are zero and optimization will fail).
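In PyTorch, the quoted shifted softplus is a one-liner (sketch):

```python
import torch
import torch.nn.functional as F

def shifted_softplus(x):
    """log(1 + exp(x - 1)), the density activation quoted above."""
    return F.softplus(x - 1.0)

# A zero raw output maps to softplus(-1) ≈ 0.31: a small but strictly
# positive density with a usable gradient.
print(shifted_softplus(torch.zeros(1)))  # tensor([0.3133])
```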

@Gatsby23

> @kwea123: Sometimes the network initialization makes the sigma output always zero after the relu activation ... I tried replacing relu with softplus ... and the training seems more stable now.

Thank you for your wonderful work re-implementing NeRF in the Wild. However, the original NeRF-W paper and the new SIGGRAPH paper "ReLU Fields: The Little Non-linearity That Could" both suggest that the ReLU function is important in NeRF training. I think some other mistake may be causing this problem.

@kjchalup commented Aug 2, 2022

> @bmild: We did find that for the blender scenes with a lot of white space (Ficus and Mic), training would sometimes diverge right at the beginning ... restrict training to a central crop of each image for just the first 1000 iters or so ...

I too am trying to reproduce vanilla NeRF and am finding it hard on mic & ficus. Could you please confirm whether you used this trick for the original NeRF paper? Or did you eventually get it to train without cropping, just by trying enough times?
