
ground truth depth is actually distance #9

Closed
sniklaus opened this issue Jan 5, 2021 · 5 comments

@sniklaus
Contributor

sniklaus commented Jan 5, 2021

The following line doesn't compute depth; it computes the Euclidean distance from each point to the camera. As such, the provided depth_meters files are actually distance_meters. This is not a big deal as long as you are aware of it, since one can convert between the two using the focal length. But if you aren't aware of it, you may get severely wrong results, as shown in the screenshots below.

depth = linalg.norm(position - camera_position[newaxis,newaxis,:], axis=2)

If you use the provided depth (which is actually distance) to render the image as a point cloud, you will get distortions:

[screenshot: point cloud rendered from the raw distance values, showing warped geometry]

If you instead convert the provided distance to actual planar depth and render the point cloud from that, the geometry is correct:

[screenshot: point cloud rendered from the converted planar depth]
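For context, here is a minimal sketch (not from the original post) of the back-projection used for renderings like these, assuming a pinhole camera with the principal point at the image center; the variable names follow the conversion snippet later in this thread:

import numpy

fltFocal = 886.81                # focal length in pixels, quoted later in this thread
intWidth, intHeight = 1024, 768
npyDepth = numpy.ones([intHeight, intWidth], numpy.float32)  # stand-in planar depth map

# pixel coordinates relative to the principal point
npyX, npyY = numpy.meshgrid(
    numpy.arange(intWidth, dtype=numpy.float32) - (0.5 * intWidth) + 0.5,
    numpy.arange(intHeight, dtype=numpy.float32) - (0.5 * intHeight) + 0.5)

# a pixel (x, y) at planar depth z back-projects to (x * z / f, y * z / f, z);
# feeding in the raw Euclidean distance as z inflates points away from the image center
npyPoints = numpy.stack([npyX * npyDepth / fltFocal, npyY * npyDepth / fltFocal, npyDepth], 2)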

@mikeroberts3000
Collaborator

mikeroberts3000 commented Jan 5, 2021

Hi @sniklaus, thanks for this great visualization. You're absolutely correct that depth_meters should really be called distance_meters. I apologize for this unclear naming convention 😅 In our defense, we document the correct interpretation of this data in our README.

frame.IIII.depth_meters.hdf5    # Euclidean distances in meters to the optical center of the camera

But this is a great reminder to be careful when interpreting our depth_meters images. We also provide position images, where the value at each pixel is a position in world-space, which can be projected into image-space using whatever convention you prefer. Note that our position images are given in asset units, i.e., not meters.
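As a hedged illustration (the variable names and calling convention below are assumptions, not part of the dataset's API), planar depth can be recovered from a position image by transforming each world-space point into camera space and keeping its coordinate along the viewing axis:

import numpy

def planar_depth_from_position(npyPosition, npyCameraPosition, npyR, fltMetersPerAssetUnit):
    # npyPosition: H x W x 3 world-space positions in asset units
    # npyCameraPosition: world-space camera position in asset units
    # npyR: hypothetical 3x3 world-from-camera rotation matrix
    # fltMetersPerAssetUnit: hypothetical per-scene asset-unit-to-meter scale
    npyCameraspace = (npyPosition - npyCameraPosition[None, None, :]) @ npyR
    # the sign of the z-coordinate depends on whether the camera looks down
    # the positive or negative z-axis, hence the abs
    return numpy.abs(npyCameraspace[:, :, 2]) * fltMetersPerAssetUnit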

@sniklaus, if you have a self-contained code snippet to convert our depth_meters or position images into planar depth images that are more useful in your downstream application, please feel free to post it here 😀

@sniklaus
Contributor Author

sniklaus commented Jan 5, 2021

Thanks for chiming in and for the clarifications! I used the following to convert the distance to depth; it expects the following variables to be defined: intWidth (1024), intHeight (768), fltFocal (886.81), and npyDistance (loaded from a depth_meters.hdf5 file).

import numpy

# per-pixel (x, y) coordinates on the image plane, relative to the principal point
npyImageplaneX = numpy.linspace((-0.5 * intWidth) + 0.5, (0.5 * intWidth) - 0.5, intWidth).reshape(1, intWidth).repeat(intHeight, 0).astype(numpy.float32)[:, :, None]
npyImageplaneY = numpy.linspace((-0.5 * intHeight) + 0.5, (0.5 * intHeight) - 0.5, intHeight).reshape(intHeight, 1).repeat(intWidth, 1).astype(numpy.float32)[:, :, None]
npyImageplaneZ = numpy.full([intHeight, intWidth, 1], fltFocal, numpy.float32)
npyImageplane = numpy.concatenate([npyImageplaneX, npyImageplaneY, npyImageplaneZ], 2)

# planar depth = distance * cos(angle between the pixel's ray and the optical axis)
#              = distance * fltFocal / ||(x, y, fltFocal)||
npyDepth = npyDistance / numpy.linalg.norm(npyImageplane, 2, 2) * fltFocal
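For completeness, a hypothetical usage example (not from the thread) that loads npyDistance with h5py; the 'dataset' key is an assumption about the HDF5 layout, and the filename is one concrete instance of the frame.IIII.depth_meters.hdf5 pattern from the README:

import h5py
import numpy

with h5py.File('frame.0000.depth_meters.hdf5', 'r') as hdf5File:
    npyDistance = hdf5File['dataset'][:].astype(numpy.float32)  # assumed dataset key

intHeight, intWidth = npyDistance.shape  # 768 x 1024
fltFocal = 886.81                        # focal length in pixels, from the comment above

Running the snippet above on this npyDistance then yields the planar depth image npyDepth.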

@mikeroberts3000
Collaborator

Sweet! 😀

@Tord-Zhang

@sniklaus Hi, is the focal length the same for all images in this dataset?

@mikeroberts3000
Collaborator

mikeroberts3000 commented Apr 24, 2022

@Tord-Zhang When computing planar depth images for typical downstream learning applications, it is a reasonable approximation to assume that all images have the same focal length.

However, if you want perfectly accurate planar depth data, you need to account for the fact that our camera intrinsics can vary slightly per scene. More specifically, due to minor tilt-shift photography effects that can vary from scene to scene, the image plane is not guaranteed to be exactly orthogonal to the camera-space z-axis. So what does it mean to compute a "planar" depth image in these cases? What is the exact quantity that you want to store at each pixel of your "planar" depth image?

In these cases, the solution that makes the most sense to me is to warp the scene geometry in a way that exactly inverts the tilt-shift photography effects. If you do this correctly, the warped scene geometry viewed through a typical pinhole camera will produce an image identical to the non-warped scene geometry viewed through a tilt-shift camera. At that point, you can compute the planar depth image as usual using the warped scene geometry.
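As an alternative to warping the geometry, here is a speculative sketch (not the approach from the linked discussion): if you can compute a per-pixel ray direction in camera space, e.g. via a hypothetical 3x3 matrix npyM mapping homogeneous pixel coordinates to rays, then one reasonable choice of "planar" depth is the distance scaled by the cosine of each ray's angle to the optical axis, which reduces to the snippet earlier in this thread when the image plane is orthogonal to the z-axis:

import numpy

def planar_depth_from_distance(npyDistance, npyM):
    # npyM: hypothetical 3x3 matrix mapping (u, v, 1) pixel coordinates
    # to camera-space ray directions; it absorbs any tilt-shift intrinsics
    intHeight, intWidth = npyDistance.shape
    npyU, npyV = numpy.meshgrid(numpy.arange(intWidth, dtype=numpy.float32),
                                numpy.arange(intHeight, dtype=numpy.float32))
    npyRays = numpy.stack([npyU, npyV, numpy.ones_like(npyU)], 2) @ npyM.T
    # depth along the optical axis = distance * cos(angle between ray and z-axis)
    return npyDistance * numpy.abs(npyRays[:, :, 2]) / numpy.linalg.norm(npyRays, axis=2)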

See here and here for a more detailed discussion.
