Cold shard eviction #219

Merged on May 25, 2023
128 commits
5de5d30
Refactor shard states array/lock usage.
knighton Apr 6, 2023
fae0611
Rename: _PartitionState -> _IterState.
knighton Apr 6, 2023
a1c1f86
Move state_dict(), load_state_dict() up.
knighton Apr 7, 2023
8374c8a
Redo shard states, implement evict_shard(), redo download_shard().
knighton Apr 7, 2023
b2841bf
Initialize shard_states, make shared barrier take changeable num_procs.
knighton Apr 8, 2023
75a9723
Fix (lint).
knighton Apr 8, 2023
c663f21
Merge branch 'main' into james/evict
knighton Apr 20, 2023
9decddc
init_local_dir -> are_shards_present -> _shard_states.
knighton Apr 20, 2023
d8ea1dd
Fix typo in ticks.
knighton Apr 20, 2023
c788634
Update StreamingDataset arguments: drop keep_raw and add cache_limit.
knighton Apr 21, 2023
c326deb
worker_barrier -> shared_barrier
knighton Apr 21, 2023
a6bdd07
Split shared.py into three: prefix, memory, and barrier.
knighton Apr 22, 2023
c1ed176
Rename CreateSharedMemory (a class) -> SharedMemory.
knighton Apr 22, 2023
e9d2e8f
Add and use SharedArray.
knighton Apr 22, 2023
5f0d08b
Create and populate _shard_raw_sizes, _shard_zip_sizes shm arrays.
knighton Apr 22, 2023
5b32b0e
Merge branch 'main' into james/evict
knighton Apr 22, 2023
8cd0531
Stream.validate_weights
knighton Apr 23, 2023
9f250ef
Purportedly do the rest of cold shard eviction.
knighton Apr 23, 2023
d8d5617
Merge branch 'james/evict' of github.com:mosaicml/streaming into jame…
knighton Apr 23, 2023
54add55
Fix (docstring).
knighton Apr 23, 2023
c7b5f37
SharedScalar.
knighton Apr 23, 2023
a9e08fb
Add comments.
knighton Apr 24, 2023
4d91193
Pull weight derivations out of StreamingDataset to Stream.apply_weights
knighton Apr 24, 2023
afb78a9
Rename _sample_ids -> _work in line with StreamingDataset._get_work
knighton Apr 24, 2023
7dc25df
Fix (docstrings -- cache usage in bytes).
knighton Apr 24, 2023
c0d3871
Update streaming/base/format/base/reader.py
knighton Apr 24, 2023
8c5aaf8
Refactor shard init_local_dir, move eviction from stream to shard.
knighton Apr 24, 2023
ec545ce
Merge branch 'james/evict' of github.com:mosaicml/streaming into jame…
knighton Apr 24, 2023
c7e16a1
Rename variable: ls -> filenames_present.
knighton Apr 24, 2023
7f3cfc0
Docstring improvement.
knighton Apr 24, 2023
24542e0
Improve docstrings.
knighton Apr 24, 2023
53f2191
Merge branch 'main' into james/evict
knighton Apr 24, 2023
a7ce3e9
Include the index(es) in the calculation of cache usage.
knighton Apr 25, 2023
cbbf52e
Merge branch 'james/evict' of github.com:mosaicml/streaming into jame…
knighton Apr 25, 2023
cdd35f7
Forgetting something?
knighton Apr 25, 2023
3077885
Merge branch 'main' into james/evict
knighton Apr 25, 2023
1d6511a
Harden _IterState shutdown, improve docstrings.
knighton Apr 25, 2023
05fad12
Prototype test_eviction.
knighton Apr 25, 2023
54afedb
Merge branch 'james/evict' of github.com:mosaicml/streaming into jame…
knighton Apr 25, 2023
2bfbfaf
Merge branch 'main' into james/evict
knighton Apr 25, 2023
257893a
Fix (docstring).
knighton Apr 25, 2023
1e6def8
Merge branch 'james/evict' of github.com:mosaicml/streaming into jame…
knighton Apr 25, 2023
41a7eba
Merge branch 'main' into james/evict
knighton Apr 25, 2023
80d671b
Merge branch 'main' into james/evict
knighton Apr 25, 2023
959f6d8
Merge branch 'main' into james/evict
knighton Apr 25, 2023
74ee2ff
Unfuck the merge.
knighton Apr 26, 2023
72ad378
Fix (lint).
knighton Apr 26, 2023
e8dd0c4
Merge branch 'main' into james/evict
knighton Apr 26, 2023
c226171
Merge branch 'main' into james/evict
knighton Apr 26, 2023
adec1e5
Merge branch 'main' into james/evict
knighton Apr 26, 2023
b63e2fa
Unfuck the merge.
knighton Apr 26, 2023
ef7d968
Fix (missing paren).
knighton Apr 26, 2023
d5936a9
Expose mt/mp-safe download_shard(), evict_shard().
knighton Apr 27, 2023
a729839
Merge branch 'james/evict' of github.com:mosaicml/streaming into jame…
knighton Apr 27, 2023
33cf070
Improve docstring.
knighton Apr 27, 2023
738d1b6
Several small fixes.
knighton Apr 28, 2023
5afca91
Change tick back.
knighton Apr 28, 2023
4ac2cf2
Sleepy __del__.
knighton Apr 28, 2023
44ef7bf
10x.
knighton Apr 28, 2023
80be2fb
Break parts of tests into their own methods.
knighton Apr 28, 2023
2b12d13
Add a grace period for the old when registering the new.
knighton Apr 28, 2023
a562ae4
Fix stray utf-8.
knighton Apr 28, 2023
aad7c62
ValueError -> RuntimeError.
knighton Apr 28, 2023
0cee886
Docstring.
knighton Apr 28, 2023
bbcd066
Improve eviction tests.
knighton Apr 28, 2023
b8fe9c5
Merge branch 'james/evict' of github.com:mosaicml/streaming into jame…
knighton Apr 28, 2023
5975be4
Also call exit() at the end of SD.__iter__
knighton Apr 28, 2023
742f22d
Update usage.
knighton Apr 28, 2023
9e00574
Get specific about exception types.
knighton Apr 28, 2023
9921c09
Fix (docstrings).
knighton Apr 28, 2023
dd0b7f3
Fix (docstrings).
knighton Apr 28, 2023
93474cc
Merge branch 'main' into james/evict
knighton Apr 28, 2023
39e6f71
Merge branch 'james/evict' of github.com:mosaicml/streaming into jame…
knighton Apr 28, 2023
a74530a
zip_nokeep, zip_keep.
knighton Apr 28, 2023
57053d0
Fix (lint).
knighton Apr 28, 2023
70781cc
cache_limit check in SD init.
knighton Apr 28, 2023
378d176
Tweak logic.
knighton May 8, 2023
895b5c4
Merge branch 'james/evict' of github.com:mosaicml/streaming into jame…
knighton May 8, 2023
16b148b
Up get_prefix retries to account for tightened TICK.
knighton May 8, 2023
eff7a40
f64 time() -> u64 time_ns() (better properties).
knighton May 8, 2023
4876758
Merge branch 'james/evict' of github.com:mosaicml/streaming into jame…
knighton May 8, 2023
28232e4
Merge branch 'main' into james/evict
knighton May 8, 2023
7ad7de1
Merge branch 'james/evict' of github.com:mosaicml/streaming into jame…
knighton May 8, 2023
f7836a3
Replace while with for.
knighton May 8, 2023
50d45f0
While -> for.
knighton May 8, 2023
65d640e
Merge branch 'james/evict' of github.com:mosaicml/streaming into jame…
knighton May 8, 2023
f8bb998
Update streaming/base/shared/memory.py
knighton May 8, 2023
298b2ba
SharedMemory.buf property
knighton May 8, 2023
f865b67
Merge branch 'james/evict' of github.com:mosaicml/streaming into jame…
knighton May 8, 2023
c9b3cf3
Remove spurious imports.
knighton May 8, 2023
e4678e1
Merge branch 'james/evict' of github.com:mosaicml/streaming into jame…
knighton May 9, 2023
77e0231
Harden concurrent download/evict logic.
knighton May 9, 2023
ca48983
Switch shared barrier usage in SD init to torch dist barrier.
knighton May 10, 2023
ab85bd1
SD _cache_filelock -> SD torch multiprocessing _cache_lock.
knighton May 10, 2023
173d480
Do the same to SharedBarrier.
knighton May 10, 2023
dc146f6
Fancy exception handling in __iter__ threads.
knighton May 10, 2023
bde5ea2
Merge branch 'main' into james/evict
knighton May 10, 2023
bc04cab
Fix (get_shards call).
knighton May 11, 2023
e41a269
Merge branch 'james/evict' of github.com:mosaicml/streaming into jame…
knighton May 11, 2023
08cfead
Merge branch 'main' into james/evict
knighton May 11, 2023
3fea380
Merge branch 'james/evict' of github.com:mosaicml/streaming into jame…
knighton May 11, 2023
b44ea24
Switch torch.multiprocessing.Manager().Lock() _cache_lock to FileLock…
knighton May 11, 2023
5b1f80b
Switch SharedBarrier torch mp lock to FileLock.
knighton May 11, 2023
f768883
Merge branch 'james/evict' of github.com:mosaicml/streaming into jame…
knighton May 11, 2023
6a3e818
Fix (yapf).
knighton May 11, 2023
91a576f
Speed up test_eviction.
knighton May 11, 2023
1bca9f1
Fix (typo).
knighton May 11, 2023
27e855e
Switch dist barrier -> wait_for_file_to_exist.
knighton May 12, 2023
29c9f64
Merge branch 'main' into james/evict
knighton May 14, 2023
b37c9cc
Fixed boto3 interpreter shutdown issue and fixed waiting for shard in…
karan6181 May 18, 2023
cb419a1
Add support for human-readable size format
karan6181 May 18, 2023
966237e
Check for a correct number of thread exits
karan6181 May 18, 2023
2de1628
Removed extra eviction test
karan6181 May 19, 2023
cede2ca
Merge remote-tracking branch 'origin/main' into cold_shard_eviction
karan6181 May 19, 2023
1a46fc2
Draft shuffling docs.
knighton May 24, 2023
96422e1
Add missing dirs.
knighton May 24, 2023
49bf2f8
Attempt to harden logic.
knighton May 24, 2023
8a31ae6
EOF line.
knighton May 24, 2023
7412e14
Tweak docs.
knighton May 24, 2023
7749595
Lower the predownload value and updated the default shuffle_algo
karan6181 May 24, 2023
5db6ef6
Update readme with cold shard eviction details and update predownload…
karan6181 May 24, 2023
fc0b555
Lingo.
knighton May 25, 2023
58334e0
NCN x64.
knighton May 25, 2023
e162ab6
Merge branch 'main' into james/evict
knighton May 25, 2023
7c5cbca
Tweak lingo.
knighton May 25, 2023
990f093
Merge branch 'james/evict' of github.com:mosaicml/streaming into jame…
knighton May 25, 2023
5e719c2
"Dynamically"
knighton May 25, 2023
f8498ba
revert NCN 64 changes
karan6181 May 25, 2023
ValueError -> RuntimeError.
knighton committed Apr 28, 2023
commit aad7c62900f624e8c4bc7a8655c9a536beb2f555
2 changes: 1 addition & 1 deletion streaming/base/dataset.py
@@ -87,7 +87,7 @@ def exit(self) -> None:
         # Signal threads to exit.
         with self._lock:
             if self._is_exiting:
-                raise ValueError('Called exit() on an _IterState that is already exiting.')
+                raise RuntimeError('Called exit() on an _IterState that is already exiting.')
             self._is_exiting = True

         # Block until they have all exited, returning _is_exiting to False.
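
For context, the pattern this hunk touches can be sketched as follows. This is a minimal illustration only, not the code in streaming/base/dataset.py: the class name `IterStateSketch`, the counters, the `should_exit()` and `thread_exited()` helpers, and the `tick` parameter are all hypothetical. Only the lock, the `_is_exiting` flag, the two comments, and the error message come from the diff. Calling exit() twice is a problem with the object's state rather than with an argument value, which is the case Python's `RuntimeError` fits better than `ValueError`.

```python
import threading
import time


class IterStateSketch:
    """Hypothetical stand-in for _IterState; not the library's implementation."""

    def __init__(self, num_threads: int) -> None:
        self._lock = threading.Lock()
        self._num_threads = num_threads  # Threads expected to report back (assumed).
        self._num_exited = 0             # Threads that have reported back (assumed).
        self._is_exiting = False

    def should_exit(self) -> bool:
        """Polled by worker threads to learn whether they should stop."""
        with self._lock:
            return self._is_exiting

    def thread_exited(self) -> None:
        """Called by each worker thread just before it returns."""
        with self._lock:
            self._num_exited += 1

    def exit(self, tick: float = 0.001) -> None:
        # Signal threads to exit.
        with self._lock:
            if self._is_exiting:
                raise RuntimeError('Called exit() on an _IterState that is already exiting.')
            self._is_exiting = True

        # Block until they have all exited, returning _is_exiting to False.
        while True:
            with self._lock:
                if self._num_exited >= self._num_threads:
                    self._is_exiting = False
                    break
            time.sleep(tick)


def worker(state: IterStateSketch) -> None:
    """Toy worker: spin until asked to exit, then report back."""
    while not state.should_exit():
        time.sleep(0.001)
    state.thread_exited()
```

A caller would start the worker threads, call `state.exit()` once, and then join them. A second call made while the first is still blocking raises `RuntimeError`.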