
Embracing Massive Medical Data

Paper

Embracing Massive Medical Data
Yu-Cheng Chou, Zongwei Zhou, and Alan L. Yuille
Johns Hopkins University
MICCAI 2024
paper | code


Figure 1: Different Training Methods. Linear memory stores only a few recent samples, causing significant forgetting. Dynamic memory adapts to varying data distributions by retaining unique samples, while selective memory further identifies and selects challenging samples, including those that might be duplicated, ensuring they are not missed by dynamic memory.
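
The memory strategies above are described only at a high level here. As a rough illustration, the sketch below ranks streaming samples by predictive entropy and keeps the most uncertain ones, in the spirit of selective memory; the class name, capacity, and entropy score are our assumptions for illustration, not the repository's actual API (see train.py for that).

import heapq
import torch.nn.functional as F

def predictive_entropy(logits):
    # Voxel-wise entropy of the softmax output, averaged over the volume.
    probs = F.softmax(logits, dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
    return entropy.mean().item()

class SelectiveMemory:
    # Keep the K most uncertain samples seen so far (min-heap by entropy).
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.heap = []      # entries: (uncertainty, counter, sample_id)
        self.counter = 0    # tie-breaker so heapq never compares sample ids

    def update(self, sample_id, logits):
        item = (predictive_entropy(logits), self.counter, sample_id)
        self.counter += 1
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, item)
        elif item[0] > self.heap[0][0]:
            heapq.heapreplace(self.heap, item)  # evict the least uncertain

    def sample_ids(self):
        return [sid for _, _, sid in self.heap]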

0. Installation

conda create -n massive python=3.9
source activate massive
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install monai[all]==0.9.0
pip install -r requirements.txt

wget https://www.dropbox.com/s/lh5kuyjxwjsxjpl/Genesis_Chest_CT.pt
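Genesis_Chest_CT.pt is the pretrained Models Genesis checkpoint used to initialize the network. A minimal loading sketch, assuming a standard PyTorch checkpoint that stores the weights under a 'state_dict' key (that key name and the DataParallel 'module.' prefix handling are assumptions; train.py shows how the repository actually restores it):

import torch

checkpoint = torch.load('Genesis_Chest_CT.pt', map_location='cpu')
state_dict = checkpoint.get('state_dict', checkpoint)  # tolerate either layout
# Strip a possible 'module.' prefix left over from DataParallel training.
state_dict = {k.replace('module.', '', 1): v for k, v in state_dict.items()}
model.load_state_dict(state_dict, strict=False)  # model: your segmentation network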

1. Dataset

We adopt two large-scale CT datasets in our experiments: a single-site private dataset and a sequential-site dataset. To download the sequential-site dataset, please see the Datasets and Dataset Pre-Process section to create the training and testing data.

2. Train the model

CUDA_VISIBLE_DEVICES=0,1,2,3 python -W ignore -m torch.distributed.launch --nproc_per_node=4 --master_port=1234 train.py
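
Note that torch.distributed.launch matches the pinned torch==1.11.0 but is deprecated in later PyTorch releases. If you upgrade, the equivalent torchrun invocation is the line below; whether it works out of the box depends on whether train.py reads its rank from the --local_rank argument (passed by launch) or from the LOCAL_RANK environment variable (set by torchrun).

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=1234 train.py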

3. Test the model

CUDA_VISIBLE_DEVICES=0 python -W ignore test.py

4. Results


Table 1: Data Efficiency. The results demonstrate that linear memory trained on a continual data stream achieves performance comparable to the prevalent paradigm of training models repeatedly for 100 epochs. Linear memory enables training without revisiting old data, thereby improving data efficiency.


Table 2: Dynamic Adaptation. Under the varying distributions of the streaming source, Dynamic Memory (DM) and Selective Memory (SM) identify the significant samples and thereby improve segmentation performance.


Figure 2: Catastrophic Forgetting. To evaluate forgetting, we compute the relative Dice drop after training on each incoming sub-dataset. Both Dynamic Memory (DM) and Selective Memory (SM) store samples from previous sub-datasets, thereby alleviating the forgetting observed with Linear Memory (LM).
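
The README does not spell out how the drop is computed; a natural reading, which we assume here, is the relative loss $\Delta = (\mathrm{Dice}_{\mathrm{old}} - \mathrm{Dice}_{\mathrm{new}}) / \mathrm{Dice}_{\mathrm{old}}$, i.e. the fraction of the previously attained Dice score lost after training on the incoming sub-dataset.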


Figure 3: Diverse Memory. We visualize the memory to demonstrate the diversity of stored samples from previous sub-datasets $D_d$. Both Dynamic Memory (DM) and Selective Memory (SM) retain samples from previous sub-datasets; Selective Memory further identifies samples with higher uncertainty.

Acknowledgement

This work was supported by the Lustgarten Foundation for Pancreatic Cancer Research and partially by the Patrick J. McGovern Foundation Award.