
ground truth depth is actually distance #9

Closed
sniklaus opened this issue Jan 5, 2021 · 5 comments

@sniklaus
Contributor

sniklaus commented Jan 5, 2021

The following line doesn't compute depth; it computes the Euclidean distance from each point to the camera. As such, the provided depth_meters files are actually distance_meters. This is not a big deal as long as you are aware of it, since one can convert between the two using the focal length. But if you aren't aware of it, you may get severely wrong results, as shown in the screenshots below.

depth = linalg.norm(position - camera_position[newaxis,newaxis,:], axis=2)

If you use the provided depth (which is actually distance) to render the image as a point cloud, you will get distortions:

[screenshot: point cloud rendered from the raw distance values, showing warped geometry]

If you instead convert the provided distance to actual planar depth and render the point cloud from that, the geometry is correct:

[screenshot: point cloud rendered from the converted planar depth]
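For context, here is a minimal sketch (not from the original post) of the back-projection used for renderings like these, assuming a pinhole camera with the principal point at the image center; the variable names follow the conversion snippet later in this thread:

import numpy

fltFocal = 886.81                # focal length in pixels, quoted later in this thread
intWidth, intHeight = 1024, 768
npyDepth = numpy.ones([intHeight, intWidth], numpy.float32)  # stand-in planar depth map

# pixel coordinates relative to the principal point
npyX, npyY = numpy.meshgrid(
    numpy.arange(intWidth, dtype=numpy.float32) - (0.5 * intWidth) + 0.5,
    numpy.arange(intHeight, dtype=numpy.float32) - (0.5 * intHeight) + 0.5)

# a pixel (x, y) at planar depth z back-projects to (x * z / f, y * z / f, z);
# feeding in the raw Euclidean distance as z inflates points away from the image center
npyPoints = numpy.stack([npyX * npyDepth / fltFocal, npyY * npyDepth / fltFocal, npyDepth], 2)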

@mikeroberts3000
Collaborator

mikeroberts3000 commented Jan 5, 2021

Hi @sniklaus, thanks for this great visualization. You're absolutely correct that depth_meters should really be called distance_meters. I apologize for this unclear naming convention 😅 In our defense, we document the correct interpretation of this data in our README.

frame.IIII.depth_meters.hdf5    # Euclidean distances in meters to the optical center of the camera

But this is a great reminder to be careful when interpreting our depth_meters images. We also provide position images, where the value at each pixel is a position in world-space, which can be projected into image-space using whatever convention you prefer. Note that our position images are given in asset units, i.e., not meters.
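As a hedged illustration (the variable names and calling convention below are assumptions, not part of the dataset's API), planar depth can be recovered from a position image by transforming each world-space point into camera space and keeping its coordinate along the viewing axis:

import numpy

def planar_depth_from_position(npyPosition, npyCameraPosition, npyR, fltMetersPerAssetUnit):
    # npyPosition: H x W x 3 world-space positions in asset units
    # npyCameraPosition: world-space camera position in asset units
    # npyR: hypothetical 3x3 world-from-camera rotation matrix
    # fltMetersPerAssetUnit: hypothetical per-scene asset-unit-to-meter scale
    npyCameraspace = (npyPosition - npyCameraPosition[None, None, :]) @ npyR
    # the sign of the z-coordinate depends on whether the camera looks down
    # the positive or negative z-axis, hence the abs
    return numpy.abs(npyCameraspace[:, :, 2]) * fltMetersPerAssetUnit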

@sniklaus, if you have a self-contained code snippet to convert our depth_meters or position images into planar depth images that are more useful in your downstream application, please feel free to post it here 😀

@sniklaus
Contributor Author

sniklaus commented Jan 5, 2021

Thanks for chiming in and for the clarifications! I used the following to convert the distance to depth; it expects the following variables to be defined: intWidth (1024), intHeight (768), fltFocal (886.81), and npyDistance (loaded from a depth_meters.hdf5 file).

import numpy

# per-pixel (x, y) coordinates on the image plane, relative to the principal point
npyImageplaneX = numpy.linspace((-0.5 * intWidth) + 0.5, (0.5 * intWidth) - 0.5, intWidth).reshape(1, intWidth).repeat(intHeight, 0).astype(numpy.float32)[:, :, None]
npyImageplaneY = numpy.linspace((-0.5 * intHeight) + 0.5, (0.5 * intHeight) - 0.5, intHeight).reshape(intHeight, 1).repeat(intWidth, 1).astype(numpy.float32)[:, :, None]
npyImageplaneZ = numpy.full([intHeight, intWidth, 1], fltFocal, numpy.float32)
npyImageplane = numpy.concatenate([npyImageplaneX, npyImageplaneY, npyImageplaneZ], 2)

# planar depth = distance * cos(angle between the pixel's ray and the optical axis)
#              = distance * fltFocal / ||(x, y, fltFocal)||
npyDepth = npyDistance / numpy.linalg.norm(npyImageplane, 2, 2) * fltFocal
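For completeness, a hypothetical usage example (not from the thread) that loads npyDistance with h5py; the 'dataset' key is an assumption about the HDF5 layout, and the filename is one concrete instance of the frame.IIII.depth_meters.hdf5 pattern from the README:

import h5py
import numpy

with h5py.File('frame.0000.depth_meters.hdf5', 'r') as hdf5File:
    npyDistance = hdf5File['dataset'][:].astype(numpy.float32)  # assumed dataset key

intHeight, intWidth = npyDistance.shape  # 768 x 1024
fltFocal = 886.81                        # focal length in pixels, from the comment above

Running the snippet above on this npyDistance then yields the planar depth image npyDepth.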

@mikeroberts3000
Collaborator

Sweet! 😀

@Tord-Zhang

@sniklaus Hi, is the focal length the same for all images in this dataset?

@mikeroberts3000
Collaborator

mikeroberts3000 commented Apr 24, 2022

@Tord-Zhang When computing planar depth images for typical downstream learning applications, it is a reasonable approximation to assume that all images have the same focal length.

However, if you want perfectly accurate planar depth data, you need to account for the fact that our camera intrinsics can vary slightly per scene. More specifically, due to minor tilt-shift photography effects that can vary from scene to scene, the image plane is not guaranteed to be exactly orthogonal to the camera-space z-axis. So what does it mean to compute a "planar" depth image in these cases? What is the exact quantity that you want to store at each pixel of your "planar" depth image?

In these cases, the solution that makes the most sense to me is to warp the scene geometry in a way that exactly inverts the tilt-shift photography effects. If you do this correctly, the warped scene geometry viewed through a typical pinhole camera will produce an image identical to the non-warped scene geometry viewed through a tilt-shift camera. At that point, you can compute the planar depth image as usual using the warped scene geometry.
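As an alternative to warping the geometry, here is a speculative sketch (not the approach from the linked discussion): if you can compute a per-pixel ray direction in camera space, e.g. via a hypothetical 3x3 matrix npyM mapping homogeneous pixel coordinates to rays, then one reasonable choice of "planar" depth is the distance scaled by the cosine of each ray's angle to the optical axis, which reduces to the snippet earlier in this thread when the image plane is orthogonal to the z-axis:

import numpy

def planar_depth_from_distance(npyDistance, npyM):
    # npyM: hypothetical 3x3 matrix mapping (u, v, 1) pixel coordinates
    # to camera-space ray directions; it absorbs any tilt-shift intrinsics
    intHeight, intWidth = npyDistance.shape
    npyU, npyV = numpy.meshgrid(numpy.arange(intWidth, dtype=numpy.float32),
                                numpy.arange(intHeight, dtype=numpy.float32))
    npyRays = numpy.stack([npyU, npyV, numpy.ones_like(npyU)], 2) @ npyM.T
    # depth along the optical axis = distance * cos(angle between ray and z-axis)
    return npyDistance * numpy.abs(npyRays[:, :, 2]) / numpy.linalg.norm(npyRays, axis=2)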

See here and here for a more detailed discussion.
