Official code and checkpoint release for mobile robot foundation models: GNM, ViNT, and NoMaD.
Theia: Distilling Diverse Vision Foundation Models for Robot Learning
Language- or click-grounded SAM combined with VOS for real-time video object tracking
An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary algorithms utilized include the Segment Anything Model (SAM) fo…
This is the official code for the MobileSAM project, which makes SAM lightweight for mobile applications and beyond!
A distilled Segment Anything (SAM) model capable of running real-time with NVIDIA TensorRT
RepViT: Revisiting Mobile CNN From ViT Perspective [CVPR 2024] and RepViT-SAM: Towards Real-Time Segmenting Anything
[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection" (see the detection sketch after this list)
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model (a point-prompt sketch follows this list)
Implementation of a configuration-space visualizer for a planar 2-DoF manipulator, together with gradient-descent and wavefront planners for it.
Leveraging Large Language Models for Visual Target Navigation
PyTorch code for the NeurIPS 2020 paper "Object Goal Navigation using Goal-Oriented Semantic Exploration"
Reading list for research topics in embodied vision
A curated list of research papers in Vision-Language Navigation (VLN)
[TPAMI 2024] Official repo of "ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments"
Vision-and-Language Navigation in Continuous Environments using Habitat
AI Research Platform for Reinforcement Learning from Real Panoramic Images.
Official implementation of Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation (CVPR'22 Oral).
DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments
Massively Parallel Deep Reinforcement Learning. 🔥
An MBTI Exploration of Large Language Models
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models!
[ICLR 2024 🔥] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
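
Several of the segmentation entries above (SAM, MobileSAM, RepViT-SAM, SAM 2) expose the same promptable-predictor pattern: embed an image once, then query it with points or boxes. Below is a minimal point-prompt sketch against the SAM 2 repository listed above; the checkpoint, config, and image paths are assumptions, so substitute the files you actually downloaded.

```python
# Minimal SAM 2 point-prompt sketch. All file paths are assumptions;
# point them at the checkpoint/config you downloaded from the SAM 2 repo.
import numpy as np
import torch
from PIL import Image

from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

CHECKPOINT = "./checkpoints/sam2.1_hiera_large.pt"  # assumed local path
MODEL_CFG = "configs/sam2.1/sam2.1_hiera_l.yaml"    # assumed config name

predictor = SAM2ImagePredictor(build_sam2(MODEL_CFG, CHECKPOINT))
image = np.array(Image.open("example.jpg").convert("RGB"))  # any RGB image

with torch.inference_mode():
    predictor.set_image(image)          # embed the image once
    # One foreground click at pixel (x=500, y=375); label 1 = foreground.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
        multimask_output=True,          # return several candidate masks
    )

best_mask = masks[np.argmax(scores)]    # keep the highest-scoring candidate
```

Because `set_image` caches the image embedding, repeated prompts against the same frame are cheap, which is what makes the interactive video-tracking repos above practical.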
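
The Grounded SAM entry above chains two of these components: Grounding DINO produces text-prompted boxes, which then serve as box prompts for SAM. Here is a sketch of the detection half using Grounding DINO's inference helpers; the config/weight paths and thresholds are assumptions, not fixed values.

```python
# Text-prompted open-set detection with Grounding DINO. Paths and
# thresholds are assumptions; adjust them to your local setup.
import cv2

from groundingdino.util.inference import load_model, load_image, predict, annotate

model = load_model(
    "groundingdino/config/GroundingDINO_SwinT_OGC.py",  # assumed config path
    "weights/groundingdino_swint_ogc.pth",              # assumed checkpoint
)

image_source, image = load_image("example.jpg")  # (numpy frame, model tensor)

# Category names are separated by " . " in the caption.
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="chair . person . dog .",
    box_threshold=0.35,   # assumed detection threshold
    text_threshold=0.25,  # assumed phrase-matching threshold
)

# In the Grounded SAM pipeline, `boxes` would be handed to a SAM predictor
# as box prompts; here we just visualize the detections.
annotated = annotate(image_source=image_source, boxes=boxes,
                     logits=logits, phrases=phrases)
cv2.imwrite("annotated.jpg", annotated)
```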