
Misplaced 2D and 3D bboxes in DAIR-V2X-C infrastructure-side dataset in monocular 3D detection #2921

Open

ramajoballester opened this issue Mar 6, 2024 · 3 comments

ramajoballester commented Mar 6, 2024

Task

I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmdetection3d

Environment
sys.platform: linux
Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 4060 Laptop GPU
CUDA_HOME: /home/breaststroker/miniconda3/envs/sensus
NVCC: Cuda compilation tools, release 11.7, V11.7.99
GCC: gcc (conda-forge gcc 13.2.0-2) 13.2.0
PyTorch: 1.13.1
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.7
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.5
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

TorchVision: 0.14.1
OpenCV: 4.9.0
MMEngine: 0.10.3
MMDetection: 3.3.0
MMDetection3D: 1.4.0+5bae4db
spconv2.0: False

Reproduces the problem - code sample

python tools/misc/browse_dataset.py configs/_base_/datasets/dair-infrastructure-mono3d.py --task mono_det --show-interval -1

Reproduces the problem - command or script

python tools/misc/browse_dataset.py configs/_base_/datasets/dair-infrastructure-mono3d.py --task mono_det --show-interval -1

Reproduces the problem - error message

I have converted the DAIR-V2X infrastructure-side data into the KITTI dataset format, as explained in the DAIR-V2X repo. For a bit of context, I attach an example image, calib file, and label file:

image_2/000009.jpg


calib/000009.txt
P2: 2186.359688 0.0 968.712906 0.0 0.0 2332.160319 542.356703 0.0 0.0 0.0 1.0 0.0
R0_rect: 1 0 0 0 1 0 0 0 1
Tr_velo_to_cam: -0.0638033225610772 -0.9910914864003576 -0.04429948490729328 -5.779144404715124 -0.2102873406178483 0.043997692433495696 -0.7987692871343754 6.037615758600886 0.97575114561348 -0.06031492538699515 -0.17158543199893228 1.0636424034755758
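
For completeness, the full LiDAR-to-image projection implied by these entries can be assembled in the usual KITTI way (lidar2img = P2 @ R0_rect @ Tr_velo_to_cam); a minimal numpy sketch with the values above:

import numpy as np

# Values copied from calib/000009.txt
P2 = np.array([2186.359688, 0.0, 968.712906, 0.0,
               0.0, 2332.160319, 542.356703, 0.0,
               0.0, 0.0, 1.0, 0.0]).reshape(3, 4)
R0 = np.eye(4)  # R0_rect is the identity here
Tr = np.vstack([np.array([
    -0.0638033225610772, -0.9910914864003576, -0.04429948490729328, -5.779144404715124,
    -0.2102873406178483, 0.043997692433495696, -0.7987692871343754, 6.037615758600886,
    0.97575114561348, -0.06031492538699515, -0.17158543199893228, 1.0636424034755758,
]).reshape(3, 4), [0.0, 0.0, 0.0, 1.0]])

lidar2img = P2 @ R0 @ Tr                                # maps homogeneous LiDAR points to pixels
u, v, z = lidar2img @ np.array([10.0, 0.0, 0.0, 1.0])   # a made-up sample LiDAR point
print(u / z, v / z)                                     # pixel coords, valid when z > 0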
label_2/000009.txt
Car 0 0 4.6669410114388885 768.12085 747.97583 1002.4884040000001 1021.312134 1.688849 4.234148 1.804866 -0.8819495516233096 2.8953843521790024 21.46221178229285 0.02090180743721963
Car 0 2 1.7132049465649037 537.003235 129.776245 566.351257 157.738037 1.528294 4.310337 2.020647 -33.328591873556334 -28.412114486707786 173.37066552019755 3.124081467350504
Van 0 0 -1.5653797595952301 840.910278 325.771698 939.458007 443.844879 1.984082 4.571132 1.939208 -1.4597153684581072 -2.6631098031012668 46.65140313581139 -0.0388277025095219
Car 0 0 -1.5514922842707104 742.326355 145.181198 776.1932370000001 180.11795 1.785853 4.377283 1.911703 -12.682089755150056 -20.945567572722865 134.50046827972872 0.009269784349501603
Motorcyclist 0 0 4.674842098522565 887.250793 167.996002 905.1947630000001 203.949646 1.672734 1.754379 0.529791 -3.44209662571358 -16.470410166202925 111.61942889944046 0.0030322111751362717
Car 0 0 -1.5623483740764919 894.659302 192.236069 948.35376 239.52072199999998 1.651805 4.245745 1.823661 -2.0744662700321657 -12.30245666249644 91.58862887154407 -0.05031164125617026
Trafficcone 0 0 3.5891132314464436 1331.449951 294.689697 1340.6091310000002 310.10672 0.44962 0.222651 0.219005 10.35971527224232 -6.069461457834561 61.40834019294974 0.8861011511218699
Car 0 0 1.8616468166392441 211.584747 254.128204 301.793823 317.809997 1.58761 4.57856 2.002337 -22.56602826435825 -6.342838921004163 67.53999353490286 3.1084151885785167
Trafficcone 0 0 4.1024536776262055 1241.356323 287.517853 1249.868042 301.202942 0.393784 0.258348 0.18615 8.047659840988533 -6.5939597659038 64.07250234460521 0.41476853495445914
Car 0 1 4.709589673079833 777.143066 271.808533 843.879516 337.971649 1.503798 4.2154 1.874583 -4.390414891235553 -6.07770137919595 62.974634550515304 0.0070001806763275555
Car 0 1 1.7731603509315175 462.032532 163.720047 502.709412 197.77861 1.422646 4.379472 1.880285 -26.70736057312631 -17.936692775729945 122.30703894923853 3.089732051524637
Trafficcone 0 0 3.400897366048148 1484.218994 301.777435 1493.920288 320.284851 0.365403 0.278439 0.215041 13.92612343182919 -5.631366622481028 58.858360021719406 1.0102553142580815
Trafficcone 0 0 4.368320226626967 1173.750732 272.711395 1182.2017819999999 290.742523 0.448666 0.206246 0.209143 6.598687863648057 -7.383986054362884 67.63837057883407 0.1789692557829332
Van 0 0 -1.492439335436994 352.213623 709.656189 622.050659 1059.0161130000001 2.041044 4.559657 1.877645 -4.056301335660896 3.0519587280884264 21.037630293620072 0.04612096574272841
Car 0 0 -1.39907314861604 730.152344 187.649612 774.882447 230.827285 1.679892 4.413034 1.875433 -9.312170958672386 -13.828439367188246 99.56277064046868 -0.14151448995393795
Car 0 0 -1.523657868931706 821.516785 209.794785 874.700928 257.488007 1.553817 4.597573 1.977554 -4.7396739368956124 -10.854466403844034 85.0099652285325 -0.05586356249819718
Car 0 0 -1.3457988660114772 98.036835 696.638428 355.456085 918.447144 1.446576 4.359478 1.970789 -7.049763254276547 2.961105807417854 21.61757908202802 0.0245684483504741
Car 0 0 1.7979275645395674 305.414154 198.309235 363.55011 246.641312 1.537114 4.284989 2.058519 -27.695496271911075 -12.195997315729105 94.58783424463807 3.134120934236738

When I browse the dataset with python tools/misc/browse_dataset.py configs/_base_/datasets/dair-infrastructure-3d-3class.py --task lidar_det --show-interval -1,

dair-infrastructure-3d-3class.py
# dataset settings
dataset_type = 'KittiDataset'
data_root = 'data/DAIR-V2X/cooperative-vehicle-infrastructure-kittiformat/infrastructure-side/'
class_names = ['Pedestrian', 'Cyclist', 'Car']
point_cloud_range = [0, -60, -2, 200, 60, 1]
input_modality = dict(use_lidar=True, use_camera=False)
metainfo = dict(classes=class_names)

# Example to use different file client
# Method 1: simply set the data root and let the file I/O module
# automatically infer from prefix (not support LMDB and Memcache yet)

# data_root = 's3://openmmlab/datasets/detection3d/kitti/'

# Method 2: Use backend_args, file_client_args in versions before 1.1.0
# backend_args = dict(
#     backend='petrel',
#     path_mapping=dict({
#         './data/': 's3://openmmlab/datasets/detection3d/',
#          'data/': 's3://openmmlab/datasets/detection3d/'
#      }))
backend_args = None

db_sampler = dict(
    data_root=data_root,
    info_path=data_root + 'dair_dbinfos_train.pkl',
    rate=1.0,
    prepare=dict(
        filter_by_difficulty=[-1],
        filter_by_min_points=dict(Car=5, Pedestrian=10, Cyclist=10)),
    classes=class_names,
    sample_groups=dict(Car=12, Pedestrian=6, Cyclist=6),
    points_loader=dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        backend_args=backend_args),
    backend_args=backend_args)

train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,  # x, y, z, intensity
        use_dim=4,
        backend_args=backend_args),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    dict(type='ObjectSample', db_sampler=db_sampler),
    dict(
        type='ObjectNoise',
        num_try=100,
        translation_std=[1.0, 1.0, 0.5],
        global_rot_range=[0.0, 0.0],
        rot_range=[-0.78539816, 0.78539816]),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.78539816, 0.78539816],
        scale_ratio_range=[0.95, 1.05]),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='PointShuffle'),
    dict(
        type='Pack3DDetInputs',
        keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        backend_args=backend_args),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1333, 800),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='GlobalRotScaleTrans',
                rot_range=[0, 0],
                scale_ratio_range=[1., 1.],
                translation_std=[0, 0, 0]),
            dict(type='RandomFlip3D'),
            dict(
                type='PointsRangeFilter', point_cloud_range=point_cloud_range)
        ]),
    dict(type='Pack3DDetInputs', keys=['points'])
]
# construct a pipeline for data and gt loading in show function
# please keep its loading function consistent with test_pipeline (e.g. client)
eval_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        backend_args=backend_args),
    dict(type='Pack3DDetInputs', keys=['points'])
]
train_dataloader = dict(
    batch_size=16,
    num_workers=2,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type='RepeatDataset',
        times=2,
        dataset=dict(
            type=dataset_type,
            data_root=data_root,
            ann_file='dair_infos_train.pkl',
            # data_prefix=dict(pts='training/velodyne_reduced'),
            data_prefix=dict(pts='training/velodyne'),
            pipeline=train_pipeline,
            modality=input_modality,
            test_mode=False,
            metainfo=metainfo,
            # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
            # and box_type_3d='Depth' in sunrgbd and scannet dataset.
            box_type_3d='LiDAR',
            backend_args=backend_args)))
val_dataloader = dict(
    batch_size=1,
    num_workers=1,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        # data_prefix=dict(pts='training/velodyne_reduced'),
        data_prefix=dict(pts='training/velodyne'),
        ann_file='dair_infos_val.pkl',
        pipeline=test_pipeline,
        modality=input_modality,
        test_mode=True,
        metainfo=metainfo,
        box_type_3d='LiDAR',
        backend_args=backend_args))
test_dataloader = dict(
    batch_size=1,
    num_workers=1,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        # data_prefix=dict(pts='training/velodyne_reduced'),
        data_prefix=dict(pts='training/velodyne'),
        ann_file='dair_infos_val.pkl',
        pipeline=test_pipeline,
        modality=input_modality,
        test_mode=True,
        metainfo=metainfo,
        box_type_3d='LiDAR',
        backend_args=backend_args))
val_evaluator = dict(
    type='KittiMetric',
    ann_file=data_root + 'dair_infos_val.pkl',
    metric='bbox',
    backend_args=backend_args)
test_evaluator = val_evaluator

vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
    type='Det3DLocalVisualizer', vis_backends=vis_backends, name='visualizer')

I get correctly placed 3D bboxes in the point cloud.

Results with pointcloud (perfect)


BUT when I browse the dataset with python tools/misc/browse_dataset.py configs/_base_/datasets/dair-infrastructure-mono3d.py --task mono_det --show-interval -1, I get misplaced 2D and 3D bboxes:

dair-infrastructure-mono3d.py
dataset_type = 'KittiDataset'
data_root = 'data/DAIR-V2X/cooperative-vehicle-infrastructure-kittiformat/infrastructure-side/'
class_names = ['Pedestrian', 'Cyclist', 'Car']
input_modality = dict(use_lidar=False, use_camera=True)
metainfo = dict(classes=class_names)

# Example to use different file client
# Method 1: simply set the data root and let the file I/O module
# automatically infer from prefix (not support LMDB and Memcache yet)

# data_root = 's3://openmmlab/datasets/detection3d/kitti/'

# Method 2: Use backend_args, file_client_args in versions before 1.1.0
# backend_args = dict(
#     backend='petrel',
#     path_mapping=dict({
#         './data/': 's3://openmmlab/datasets/detection3d/',
#          'data/': 's3://openmmlab/datasets/detection3d/'
#      }))
backend_args = None

train_pipeline = [
    dict(type='LoadImageFromFileMono3D', backend_args=backend_args),
    dict(
        type='LoadAnnotations3D',
        with_bbox=True,
        with_label=True,
        with_attr_label=False,
        with_bbox_3d=True,
        with_label_3d=True,
        with_bbox_depth=True),
    dict(type='Resize', scale=(1920, 1080), keep_ratio=True),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(
        type='Pack3DDetInputs',
        keys=[
            'img', 'gt_bboxes', 'gt_bboxes_labels', 'gt_bboxes_3d',
            'gt_labels_3d', 'centers_2d', 'depths'
        ]),
]
test_pipeline = [
    dict(type='LoadImageFromFileMono3D', backend_args=backend_args),
    dict(type='Resize', scale=(1920, 1080), keep_ratio=True),
    dict(type='Pack3DDetInputs', keys=['img'])
]
eval_pipeline = [
    dict(type='LoadImageFromFileMono3D', backend_args=backend_args),
    dict(type='Pack3DDetInputs', keys=['img'])
]

train_dataloader = dict(
    batch_size=2,
    num_workers=2,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='dair_infos_train.pkl',
        data_prefix=dict(img='training/image_2'),
        pipeline=train_pipeline,
        modality=input_modality,
        load_type='fov_image_based',
        test_mode=False,
        metainfo=metainfo,
        # we use box_type_3d='Camera' in monocular 3d
        # detection task
        box_type_3d='Camera',
        # box_type_3d='LiDAR',
        backend_args=backend_args))
val_dataloader = dict(
    batch_size=1,
    num_workers=2,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(img='training/image_2'),
        ann_file='dair_infos_val.pkl',
        pipeline=test_pipeline,
        modality=input_modality,
        load_type='fov_image_based',
        metainfo=metainfo,
        test_mode=True,
        box_type_3d='Camera',
        backend_args=backend_args))
test_dataloader = val_dataloader

val_evaluator = dict(
    type='KittiMetric',
    ann_file=data_root + 'dair_infos_val.pkl',
    metric='bbox',
    backend_args=backend_args)

test_evaluator = val_evaluator

vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
    type='Det3DLocalVisualizer', vis_backends=vis_backends, name='visualizer')
Misplaced 2D bboxes

Misplaced 3D bboxes

It seems the 3D bbox orientation is drawn as if this were the original KITTI dataset (the vanishing point of the boxes lies at the center of the image), and the 2D box coordinates appear to be derived from the projected 3D bbox boundaries.

Troubleshooting

I have already checked the lidar2cam and cam2img transformation matrices and the labels, since I can successfully load and draw the 2D and 3D bboxes manually. Moreover, if the label coordinates were incorrect, I would not have seen correctly placed 3D bboxes in LiDAR detection.
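
A minimal numpy sketch of such a manual projection, assuming the standard KITTI camera-frame convention (the location is the bottom-center of the box; note that the dimension order in these labels appears to be h, l, w, since the second value of a Car is about 4.2 m):

import numpy as np

# P2 from calib/000009.txt
P2 = np.array([[2186.359688, 0.0, 968.712906, 0.0],
               [0.0, 2332.160319, 542.356703, 0.0],
               [0.0, 0.0, 1.0, 0.0]])

def project_box3d(dims, loc, ry):
    """Project a KITTI-style 3D box (camera frame) into the image via P2."""
    h, l, w = dims  # these labels appear to store h, l, w
    # 8 corners in the object frame; KITTI puts the location at the bottom-center
    x = np.array([l, l, -l, -l, l, l, -l, -l]) / 2
    y = np.array([0, 0, 0, 0, -h, -h, -h, -h])
    z = np.array([w, -w, -w, w, w, -w, -w, w]) / 2
    corners = np.vstack([x, y, z])
    # rotate about the camera y axis by rotation_y, then translate to loc
    c, s = np.cos(ry), np.sin(ry)
    R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    corners = R @ corners + np.asarray(loc).reshape(3, 1)
    pts = P2 @ np.vstack([corners, np.ones((1, 8))])
    return (pts[:2] / pts[2]).T  # (8, 2) pixel coordinates

# first Car line of label_2/000009.txt
uv = project_box3d((1.688849, 4.234148, 1.804866),
                   (-0.8819495516233096, 2.8953843521790024, 21.46221178229285),
                   0.02090180743721963)
print(uv.round(1))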

Correct 2D bboxes

Correct 3D bboxes
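
Since the manual projection lines up with the image, another thing worth checking is whether the generated info files carry the same matrices as the calib files. A hedged sketch, assuming the mmdetection3d v1.x info layout (data_list[i]['images']['CAM2']); adjust the keys if your infos differ:

import pickle
import numpy as np

info_path = ('data/DAIR-V2X/cooperative-vehicle-infrastructure-kittiformat/'
             'infrastructure-side/dair_infos_train.pkl')
with open(info_path, 'rb') as f:
    infos = pickle.load(f)

cam = infos['data_list'][0]['images']['CAM2']
print(cam['img_path'])
print(np.array(cam['cam2img']))    # should match P2 in the calib file
print(np.array(cam['lidar2cam']))  # should match Tr_velo_to_cam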

Additional information

No response

ramajoballester changed the title from "Misplaced 3D bboxes in DAIR-V2X-C infrastructure-side dataset in monocular 3D detection" to "Misplaced 2D and 3D bboxes in DAIR-V2X-C infrastructure-side dataset in monocular 3D detection" on Mar 7, 2024
yeetypete commented Mar 8, 2024

I am having a similar issue with a custom dataset in KITTI format. Only the projected 3D bounding box in the image is rendered incorrectly; it looks correct in the point cloud when using the multi-modality_det task. I am on the dev-1.x branch.

yeetypete commented
I fixed this in #2923. It turns out the images were being resized, but the projected 2D bounding box points were not being scaled down accordingly...
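
The gist of the fix is that pixel coordinates projected into the original image have to follow the resize; a minimal illustration of the idea (not the literal patch in #2923):

import numpy as np

def rescale_projected_points(points_2d, scale_w, scale_h):
    """Scale (N, 2) pixel coords from the original image so they
    line up with the resized image (e.g. after a Resize transform)."""
    return points_2d * np.array([scale_w, scale_h])

# e.g. a 1920x1080 image resized with keep_ratio to fit (1333, 800)
# shrinks by min(1333/1920, 800/1080) ~ 0.694 on both axes
uv_resized = rescale_projected_points(np.array([[878.0, 857.0]]),
                                      1333 / 1920, 1333 / 1920)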

ramajoballester (Author) commented

Hi @yeetypete!
Thank you very much for your response. Unfortunately, your proposed fix does not solve this issue. Let me know if you need any data samples to work with.
