
KAIST-Edlab/Study_Of_VL


CXR VL research group

We're a group of doctoral students from KAIST's AI Graduate School, and we're all about multi-modal (vision-language) research in the medical field. Our aim is to continuously expand our knowledge and experience beyond traditional boundaries by deeply analyzing the essence of AI and the unique characteristics of the medical domain.

Every Thursday, we get together to review papers on the multi-modal research conducted in both general and medical fields, actively exploring the endless possibilities of AI through analysis and discussion. If you're interested in our study, especially if you have a background in medical or AI fields, we'd love for you to join us and grow together. (contact: jhak.moon@kaist.ac.kr)

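Our group, made up of doctoral students at the KAIST AI Graduate School, is dedicated to multi-modal (vision-language) research in the medical field. By studying the essence of AI and the characteristics of the medical domain in depth, we aim to keep expanding our knowledge and experience beyond existing boundaries.

Each week we select and review papers on multi-modal research from both the general and medical fields, actively exploring the possibilities of AI through analysis and discussion. Anyone who would like to join our group and grow with us is always welcome! (contact: jhak.moon@kaist.ac.kr)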

We upload recorded videos to a personal YouTube channel. Please check the links below.

Objective:

Paper reading and discussion on VL models, not limited to the medical domain (md): we alternate with the general domain (gd), i.e. md -> gd -> md -> gd ...

Time:

Thu 16:00 - 17:30

Participants and presentation order:

(KAIST-Edlab, 2023-04-06 Joined) Jonghak, Hyungyung, Seongsu

(KAIST-MLIlab, 2023-07-27 Joined) Hangyul

(KAIST-Edlab, 2024-06-08 Joined) Daeun

Presentation order

Jonghak -> Hyungyung -> Seongsu -> Hangyul -> Daeun

Paper-Review:

| Date | Week | Presenter | Topic | Paper | Material | Link |
|---|---|---|---|---|---|---|
| 2023.04.06 | Week01 | Jonghak | Parametric model | BioViL-T | Slides | |
| 2023.04.13 | Week02 | Hyungyung | Consistency based MLM | EPIC | Slides | |
| 2023.04.20 | Week03 | Seongsu | Textual inversion on medical domain | Medical diffusion on a budget: textual inversion for medical image generation | Paper | - |
| 2023.04.27 | Week04 | Jonghak | Zero convolution | ControlNet | None | |
| 2023.05.04 | Week05 | Hyungyung | CXR Generation | Cheff | None | |
| 2023.05.11 | Week06 | Seongsu | PEFT, multi-modal | LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention, LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model | Paper1 Paper2 | - |
| 2023.05.18 | Week07 | Jonghak | Region-guided generation | (CVPR23) RGRG | Slides | |
| 2023.05.25 | Week08 | Hyungyung | Compositionality | MosaiCLIP | None | |
| 2023.06.01 | None | None | None | None | None | |
| 2023.06.08 | None | None | None | None | None | |
| 2023.06.15 | Week09 | Seongsu | Benchmark and evaluation | VisualGPTScore: Visio-Linguistic Reasoning with Multimodal Generative Pre-Training Scores | Paper | - |
| 2023.06.22 | Week10 | Jonghak | Open-set detection in General & Medical | Recent 6 papers (ViLD, GLIP/GLIP-v2, ...) | Slides | |
| 2023.06.29 | Week11 | Hyungyung | Machine Word Learning Benchmark | MEWL: Few-shot multimodal word learning with referential uncertainty | None | |
| 2023.07.06 | Week12 | Seongsu | Evaluation on RRG | Evaluating Progress in Automatic Chest X-Ray Radiology Report Generation | Paper | - |
| 2023.07.13 | Week13 | Jonghak | Open-set detection with LLM | GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest | None | |
| 2023.07.20 | Week14 | Hyungyung | Attention & Retrieval based RRG | Reading Radiology Imaging Like The Radiologists | None | |
| 2023.07.27 | Week15 | Seongsu | RAG for RRG | Retrieval Augmented Chest X-Ray Report Generation using OpenAI GPT models | Paper | - |
| 2023.08.03 | None | None | None | None | None | |
| 2023.08.10 | Week16 | Jonghak | In-context learning in medical | MedFlamingo | None | |
| 2023.08.16 | Week17 | Hyungyung | Reasoning Segmentation with Large Multimodal Model | (CVPR 24) LISA & (ICCV 23) SAM | Slide | |
| 2023.08.24 | Week18 | Hangyul | Graph Construction for Ophthalmologic Report Generation | (CVPR 22) Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation | Slides | Video |
| 2023.08.31 | Week19 | Seongsu | IE benchmark on radiology reports | RadGraph2: Modeling Disease Progression in Radiology Reports via Hierarchical Information Extraction | Paper | - |
| 2023.09.08 | Week20 | Jonghak | | | | |
| 2023.09.15 | Week21 | Hyungyung | Anomaly detection + LLM | AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models | Slides | |
| 2023.09.22 | Week22 | Hangyul | Image Paragraph Captioning | (NeurIPS 22) Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning | Slides | |
| 2023.10.05 | Week23 | Seongsu | Exploiting LLMs as visual explainers | Learning Concise and Descriptive Attributes for Visual Recognition | Paper | - |
| 2023.10.12 | Week24 | Jonghak | Zero-shot VQA & GPT4 in radiograph | 1. Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language 2. Exploring the Boundaries of GPT-4 in Radiology | paper1 paper2 | Video |
| 2023.10.19 | Week25 | Hyungyung | Refinement strategy for VLLM | Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models | Slides | |
| 2023.10.26 | Week26 | Hangyul | Segmentation w/o annotation using vision-language model | (CVPR 22) GroupViT: Semantic Segmentation Emerges from Text Supervision | Slides | Video |
| 2023.11.02 | Week27 | Seongsu | InstructPix2Pix adaptable for sequential CXR exams | BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys | Paper | - |
| 2023.11.09 | Week28 | Jonghak | | World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models | | |
| 2023.11.16 | Week29 | Hyungyung | Benchmark for VLLM | HALLUSIONBENCH: You See What You Think? Or You Think What You See? | | |
| 2023.11.23 | Week30 | Hangyul | Model Customization w/ retrieval | (CVPR 23) Learning Customized Visual Models with Retrieval-Augmented Knowledge | Slides | Video |
| 2023.11.30 | Week31 | Seongsu | Benchmark integration, multi-task & multi-modal learning | Learning A Multi-Task Transformer Via Unified And Customized Instruction Tuning For Chest Radiograph Interpretation | Paper | - |
| 2023.12.21 | Week32 | Jonghak | | Image Captioners Are Scalable Vision Learners Too | | |
| 2023.12.28 | Week33 | Hyungyung | | See, Say, and Segment: Teaching LMMs to Overcome False Premises | | |
| 2024.01.02 | Week34 | Hangyul | Masked Representation Learning in medical VL | (ICLR 23) Advancing Radiograph Representation Learning with Masked Record Modeling | Slides | Video |
| 2024.01.11 | Week35 | Seongsu | Identifying and resolving artifact phenomena in feature maps of ViTs | Vision Transformers Need Registers | Paper | - |
| 2024.01.18 | Week36 | Jonghak | | A Vision Check-up for Language Models | | - |
| 2024.01.25 | Week37 | Hyungyung | | Incorporating Visual Experts to Resolve the Information Loss in Multimodal Large Language Models | | - |
| 2024.02.02 | Week38 | Hangyul | Multimodal CoT | Multimodal Chain-of-Thought Reasoning in Language Models | Slides | Video |
| 2024.02.08 | Week39 | Seongsu | Vision Backbones for the Radiology Domain | RAD-DINO: Exploring Scalable Medical Image Encoders Beyond Text Supervision | Paper | - |
| 2024.02.15 | Week40 | Jonghak | | | | - |
| 2024.02.22 | Week41 | Hyungyung | Chain-of-Reasoning with Question Generation | Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation | Slide | - |
| 2024.03.07 | Week42 | Seongsu | Benchmark and Toolkit for Evaluating Medical Vision-Language Models | MultiMedEval: A Benchmark and a Toolkit for Evaluating Medical Vision-Language Models | Paper | - |
| 2024.03.22 | Week43 | Hangyul | CLIP-Based Zero-Shot Anomaly Detection | (ICLR 24) AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection | Slides | Video |
| 2024.03.29 | Week44 | Jonghak | | MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training | | |
| 2024.04.04 | Week45 | Hyungyung | | Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters | | |
| 2024.04.11 | Week46 | Seongsu | LLM-as-Judge in Radiology Report Generation | LLM-RadJudge: Achieving Radiologist-Level Evaluation for X-Ray Report Generation | Paper | - |
| 2024.04.18 | Week47 | Hangyul | LLM for Multimodal Learning of CXR | (ICLR 24) LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation | Slides | Video |
| 2024.04.25 | Week48 | Jonghak | | Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine | | |
| 2024.05.02 | Week49 | Hyungyung | | BLINK: Multimodal Large Language Models Can See but Not Perceive | | |
| 2024.05.09 | Week50 | Seongsu | LLM-as-Judge in Radiology Report Generation | GREEN: Generative Radiology Report Evaluation and Error Notation | Paper | - |
| 2024.05.23 | Week51 | Hangyul | MiniGPT4 for CXR | (AAAI 24) Bootstrapping Large Language Models for Radiology Report Generation | Slides | Video |
| 2024.05.30 | Week52 | Jonghak | Dense captioning | (CVPR 24) Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | Slides | |
| 2024.06.13 | Week53 | Hyungyung | | Why are Visually-Grounded Language Models Bad at Image Classification? | | |
| 2024.06.20 | Week54 | Seongsu | Generation of Digitally Reconstructed Radiographs from CT images | Shadow and Light: Digitally Reconstructed Radiographs for Disease Classification | Paper | - |
| 2024.06.27 | Week55 | Hangyul | Chatting for CXR | WoLF: Wide-scope Large Language Model Framework for CXR Understanding | Slides | |
| 2024.07.05 | Week56 | Daeun | Doctor LLM evaluation | Towards Automatic Evaluation for LLMs’ Clinical Capabilities: Metric, Data, and Algorithm | Slides | |
| 2024.07.11 | Week57 | Jonghak | Symbolic representation (RL) | Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding | Slides | |
| 2024.07.18 | Week58 | Hyungyung | | | | |
| 2024.08.01 | Week59 | Seongsu | Encoder-free Vision-Language Model | Unveiling Encoder-Free Vision-Language Models | Paper | - |
| 2024.08.08 | Week60 | Hangyul | Chatting-based image retrieval | (NeurIPS 23) Chatting Makes Perfect: Chat-based Image Retrieval | Slides | |
| 2024.08.22 | Week61 | Daeun | MLLMs | Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs | Slide | |
| 2024.08.29 | Week62 | Jonghak | Knowledge Graph for CXR | Uncovering Knowledge Gaps in Radiology Report Generation Models through Knowledge Graphs | slides | |
| 2024.09.12 | Week63 | Hyungyung | | | | |
