- The Chinese University of Hong Kong
- Hong Kong
- yilunchen.com/about
Stars
The repository provides code associated with the paper VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation (ICRA 2024)
(NeurIPS 2022) Self-Supervised Visual Representation Learning with Semantic Grouping
[ECCV 2024] Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
MichalZawalski / embodied-CoT
Forked from openvla/openvla
Embodied Chain of Thought: A robotic policy that reasons to solve the task.
Official codebase for "Any-point Trajectory Modeling for Policy Learning"
OVExp: Open Vocabulary Exploration for Object-Oriented Navigation
GRUtopia: Dream General Robots in a City at Scale
(NeurIPS 2024) Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights
Official code for "FeatUp: A Model-Agnostic Framework for Features at Any Resolution" ICLR 2024
Controllable video and image generation, SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA
Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
Unified framework for robot learning built on NVIDIA Isaac Sim
[CVPR'24 Best Student Paper] Mip-Splatting: Alias-free 3D Gaussian Splatting
[CoRL 2024] HumanPlus: Humanoid Shadowing and Imitation from Humans
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
CleanDiffuser: An Easy-to-use Modularized Library for Diffusion Models in Decision Making
A generative and self-guided robotic agent that endlessly proposes and masters new skills.
Imitation learning algorithms with Co-training for Mobile ALOHA: ACT, Diffusion Policy, VINN
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
[ICRA2023] Grounding Language with Visual Affordances over Unstructured Data
Code & Data for Grounded 3D-LLM with Referent Tokens
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge)
The official source code for "X-Ray: A Sequential 3D Representation for Generation".
Distributed Robot Interaction Dataset.
The repository for the largest and most comprehensive empirical study of visual foundation models for Embodied AI (EAI).
An official code repository for the CVPR 2023 paper SGTAPose: Robot Structure Prior Guided Temporal Attention for Camera-to-Robot Pose Estimation from Image Sequence