Since the mmdet source is already included in this project, do I need to run pip install mmdet==2.25.3? #23

Closed
RicoJYang opened this issue Jul 31, 2023 · 8 comments

@RicoJYang

If I don't run pip install mmdet==2.25.3, the project fails with: No module named mmdet. But when I do run pip install mmdet==2.25.3 and then try to train on my dataset, I get: KeyError: 'CoDETR is not in the models registry'.

@TempleX98
Collaborator

We have included the mmdet source code in this repo, so you don't need to install it via pip.
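A quick way to confirm which copy of mmdet is actually being imported is a diagnostic sketch like the one below, run from the Co-DETR repository root:

    # Check which mmdet Python resolves to. Run from the repo root:
    # mmdet.__file__ should point into the Co-DETR checkout, not site-packages.
    import mmdet
    print(mmdet.__version__)
    print(mmdet.__file__)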

@RicoJYang
Author

> We have included the mmdet source code in this repo, so you don't need to install it via pip.

Thank you for your reply. How can I solve the error KeyError: 'CoDETR is not in the models registry' when I run 'sh tools/dist_train.sh projects/configs/co_deformable_detr/co_deformable_detr_r50_1x_coco.py 2 results/'?

@TempleX98
Collaborator

  1. If you have installed mmdet, you should move the projects folder to your mmdet directory.
  2. Add the line `from projects import *` to tools/train.py, tools/test.py, mmdet/apis/train.py and mmdet/apis/inference.py (a sketch follows below).
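For reference, a minimal sketch of what the top of tools/train.py could look like after step 2 (assuming the projects folder is importable from the working directory; the other listed files would get the same line):

    # Top of tools/train.py after the suggested edit (sketch).
    # Importing the projects package runs its register_module decorators,
    # so CoDETR is added to mmdet's model registry before the config is built.
    from projects import *  # noqa: F401,F403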

@RicoJYang
Author

> 1. If you have installed mmdet, you should move the projects folder to your mmdet directory.
> 2. Add the line `from projects import *` to tools/train.py, tools/test.py, mmdet/apis/train.py and mmdet/apis/inference.py.

Sorry to keep bothering you: I did exactly what you suggested, and the files and folders in mmdet are now apis, core, datasets, __init__.py, models, projects, __pycache__, utils and version.py, but I still haven't solved the problem:

Traceback (most recent call last):
  File "tools/train.py", line 245, in <module>
    main()
  File "tools/train.py", line 216, in main
    test_cfg=cfg.get('test_cfg'))
  File "/mnt/lustre/GPU7/home/yangbo/anaconda3/envs/co-dert/lib/python3.7/site-packages/mmdet-2.25.3-py3.7.egg/mmdet/models/builder.py", line 59, in build_detector
    cfg, default_args=dict(train_cfg=train_cfg, test_cfg=test_cfg))
  File "/mnt/lustre/GPU7/home/yangbo/anaconda3/envs/co-dert/lib/python3.7/site-packages/mmcv/utils/registry.py", line 217, in build
    return self.build_func(*args, **kwargs, registry=self)
  File "/mnt/lustre/GPU7/home/yangbo/anaconda3/envs/co-dert/lib/python3.7/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/mnt/lustre/GPU7/home/yangbo/anaconda3/envs/co-dert/lib/python3.7/site-packages/mmcv/utils/registry.py", line 47, in build_from_cfg
    f'{obj_type} is not in the {registry.name} registry')
KeyError: 'CoDETR is not in the models registry'

(The second training process prints the identical traceback.)

@TempleX98
Collaborator

  1. The best solution is to uninstall your mmdet (pip uninstall mmdet) and directly use the mmdet in our repo (a quick check follows below).
  2. Another solution can be viewed here.
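One possible sanity check for option 1 (an illustrative sketch, run from the Co-DETR repository root after uninstalling the pip-installed mmdet):

    # After `pip uninstall mmdet`, run this from the repo root so the bundled mmdet is used.
    # Importing the projects package should register CoDETR into the DETECTORS registry.
    from mmdet.models import DETECTORS
    from projects import *  # noqa: F401,F403

    print(DETECTORS.get('CoDETR') is not None)  # expected: True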

@RicoJYang
Author

> 1. The best solution is to uninstall your mmdet (pip uninstall mmdet) and directly use the mmdet in our repo.
> 2. Another solution can be viewed here.

Thank you for your reply. I have solved that problem, but when I try to train my own model, there is another error:

Traceback (most recent call last):
  File "tools/train.py", line 245, in <module>
    main()
  File "tools/train.py", line 241, in main
    meta=meta)
  File "/mnt/lustre/GPU7/home/yangbo/workspace/codes/Co-DETR-main/mmdet/apis/train.py", line 245, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/mnt/lustre/GPU7/home/yangbo/anaconda3/envs/co-dert/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/mnt/lustre/GPU7/home/yangbo/anaconda3/envs/co-dert/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/mnt/lustre/GPU7/home/yangbo/anaconda3/envs/co-dert/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/mnt/lustre/GPU7/home/yangbo/anaconda3/envs/co-dert/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 59, in train_step
    output = self.module.train_step(*inputs[0], **kwargs[0])
  File "/mnt/lustre/GPU7/home/yangbo/workspace/codes/Co-DETR-main/mmdet/models/detectors/base.py", line 248, in train_step
    losses = self(**data)
  File "/mnt/lustre/GPU7/home/yangbo/anaconda3/envs/co-dert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/lustre/GPU7/home/yangbo/anaconda3/envs/co-dert/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 110, in new_func
    return old_func(*args, **kwargs)
  File "/mnt/lustre/GPU7/home/yangbo/workspace/codes/Co-DETR-main/mmdet/models/detectors/base.py", line 172, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/mnt/lustre/GPU7/home/yangbo/workspace/codes/Co-DETR-main/mmdet/models/detectors/co_detr.py", line 180, in forward_train
    gt_labels, gt_bboxes_ignore)
  File "/mnt/lustre/GPU7/home/yangbo/workspace/codes/Co-DETR-main/mmdet/models/dense_heads/co_deformable_detr_head.py", line 629, in forward_train
    losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
  File "/mnt/lustre/GPU7/home/yangbo/anaconda3/envs/co-dert/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 198, in new_func
    return old_func(*args, **kwargs)
  File "/mnt/lustre/GPU7/home/yangbo/workspace/codes/Co-DETR-main/mmdet/models/dense_heads/co_deformable_detr_head.py", line 700, in loss
    img_metas, gt_bboxes_ignore)
ValueError: not enough values to unpack (expected 4, got 3)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 107352) of binary: /mnt/lustre/GPU7/home/yangbo/anaconda3/envs/co-dert/bin/python
Traceback (most recent call last):
  File "/mnt/lustre/GPU7/home/yangbo/anaconda3/envs/co-dert/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/mnt/lustre/GPU7/home/yangbo/anaconda3/envs/co-dert/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/mnt/lustre/GPU7/home/yangbo/anaconda3/envs/co-dert/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/mnt/lustre/GPU7/home/yangbo/anaconda3/envs/co-dert/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/mnt/lustre/GPU7/home/yangbo/anaconda3/envs/co-dert/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/mnt/lustre/GPU7/home/yangbo/anaconda3/envs/co-dert/lib/python3.7/site-packages/torch/distributed/run.py", line 718, in run
    )(*cmd_args)
  File "/mnt/lustre/GPU7/home/yangbo/anaconda3/envs/co-dert/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/mnt/lustre/GPU7/home/yangbo/anaconda3/envs/co-dert/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 247, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

I changed the code to `enc_loss_cls, enc_losses_bbox, enc_losses_iou = self.loss_single(enc_cls_scores, enc_bbox_preds, gt_bboxes_list, binary_labels_list, img_metas, gt_bboxes_ignore)`, and the error went away. Will this change affect my training?
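For context, that ValueError means the left-hand side of the assignment expects one more value than loss_single returns at this call site. A minimal standalone illustration (hypothetical values, not the repo's actual implementation):

    # Illustration of the unpacking mismatch behind the error message above.
    def loss_single():
        return 1.0, 2.0, 3.0  # three values

    try:
        a, b, c, d = loss_single()  # asks for four values
    except ValueError as e:
        print(e)  # not enough values to unpack (expected 4, got 3)

    a, b, c = loss_single()  # matches the return arity, so it succeeds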

@TempleX98
Collaborator

This is a bug; I will fix it.

TempleX98 added a commit that referenced this issue Aug 3, 2023
@minh132

minh132 commented Mar 18, 2024

@TempleX98 @RicoJYang When I train the model, I still get the error above:

Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  warnings.warn(
Traceback (most recent call last):
  File "tools/train.py", line 18, in <module>
    from mmdet.apis import init_random_seed, set_random_seed, train_detector
  File "/root/code/exp/Lab/Co-DETR/mmdet/apis/__init__.py", line 2, in <module>
    from .inference import (async_inference_detector, inference_detector,
  File "/root/code/exp/Lab/Co-DETR/mmdet/apis/inference.py", line 13, in <module>
    from mmdet.datasets import replace_ImageToTensor
  File "/root/code/exp/Lab/Co-DETR/mmdet/datasets/__init__.py", line 2, in <module>
    from .builder import DATASETS, PIPELINES, build_dataloader, build_dataset
  File "/root/code/exp/Lab/Co-DETR/mmdet/datasets/builder.py", line 26, in <module>
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard_limit))
ValueError: not allowed to raise maximum limit
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 752631) of binary: /root/miniconda3/envs/lab/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/envs/lab/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/lab/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/miniconda3/envs/lab/lib/python3.8/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/root/miniconda3/envs/lab/lib/python3.8/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/root/miniconda3/envs/lab/lib/python3.8/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/root/miniconda3/envs/lab/lib/python3.8/site-packages/torch/distributed/run.py", line 715, in run
    elastic_launch(
  File "/root/miniconda3/envs/lab/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/envs/lab/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
tools/train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-03-18_07:35:16
  host      : 02d76903f0b2
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 752631)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
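The failure in this last log is different from the registry issue above: mmdet/datasets/builder.py tries to raise the open-file limit with resource.setrlimit, and some container environments refuse that call. A hedged local workaround, offered only as an assumption rather than an upstream fix, is to clamp the requested soft limit and tolerate environments that reject the call, along these lines:

    # Hedged workaround sketch for the RLIMIT_NOFILE ValueError in the log above
    # (an assumption, not an official fix): keep the hard limit unchanged,
    # clamp the requested soft limit, and ignore environments that forbid the call.
    import resource

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    target_soft = max(soft, 4096)
    if hard != resource.RLIM_INFINITY:
        target_soft = min(target_soft, hard)
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target_soft, hard))
    except ValueError:
        pass  # some containers do not allow changing the limit at all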
