Releases: databricks/megablocks

v0.6.1

31 Aug 14:49

What's New

Patch release that replaces dependencies specified via GitHub with released versions from PyPI (specifically, stanford-stk and grouped-gemm). This allows MegaBlocks itself to be released on PyPI.
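
With the package published on PyPI, installation should now be as simple as:

    pip install megablocks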

What's Changed

  • Remove direct dependencies, allowing for megablocks pypi release by @snarayan21 in #149

Full Changelog: v0.6.0...v0.6.1

v0.6.0

30 Aug 18:55

What's New

1. Torch 2.4 Compatibility (#145)

MegaBlocks now supports Torch 2.4!

2. New CI/CD

MegaBlocks has new GitHub Actions workflows for better CI/CD! On every PR, MegaBlocks now automatically lints and formats code (#131) and runs tests on a GPU (#127).

3. Remove Weight Parallelism (#137)

Weight parallelism was unused, so it has been removed.

4. Shared Experts (#109)

Implements shared experts, based on the DeepSeekMoE paper.
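
For intuition, a minimal PyTorch sketch of the shared-experts idea (illustrative only; the class and argument names below are invented for the example, not the MegaBlocks API): shared experts run on every token, and their output is summed with the output of the routed MoE.

    import torch.nn as nn

    class SharedExpertsMoE(nn.Module):
        # Illustrative sketch: always-active shared experts alongside a
        # routed MoE layer, with the two outputs summed as in DeepSeekMoE.
        def __init__(self, hidden_size, ffn_size, routed_moe):
            super().__init__()
            # Dense feed-forward acting as the shared expert(s); it sees
            # every token regardless of routing decisions.
            self.shared = nn.Sequential(
                nn.Linear(hidden_size, ffn_size),
                nn.GELU(),
                nn.Linear(ffn_size, hidden_size),
            )
            # Any routed MoE layer (e.g., a MegaBlocks dMoE) plugs in here.
            self.routed = routed_moe

        def forward(self, x):
            return self.shared(x) + self.routed(x)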

Bug Fixes

  1. Better handling of incompatible FFN sizes (#108)
  2. Fix AMP for memory-optimized options (#111)
  3. Don't save MoE load-balancing loss tensors (#119)

Full Changelog: v0.5.1...v0.6.0

v0.5.1

11 Jan 22:14 · f05609c

Full Changelog: v0.5.0...v0.5.1

v0.5.0

08 Dec 16:51 · 0460181

What's New

Several improvements to avoid CPU <-> GPU device synchronizations, GLU support, and support for some new models 👀
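
For context on the GLU support: a gated linear unit feed-forward gates one linear projection with another, elementwise. A minimal PyTorch sketch (illustrative only; the names and the SiLU activation are assumptions here, not MegaBlocks' exact implementation):

    import torch.nn as nn
    import torch.nn.functional as F

    class GLUFeedForward(nn.Module):
        # Illustrative SwiGLU-style block: act(w1(x)) gates v1(x) elementwise.
        def __init__(self, hidden_size, ffn_size):
            super().__init__()
            self.w1 = nn.Linear(hidden_size, ffn_size, bias=False)  # gate branch
            self.v1 = nn.Linear(hidden_size, ffn_size, bias=False)  # value branch
            self.w2 = nn.Linear(ffn_size, hidden_size, bias=False)  # output projection

        def forward(self, x):
            return self.w2(F.silu(self.w1(x)) * self.v1(x))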

Full Changelog: v0.4.0...v0.5.0

v0.4.0

24 Oct 22:44 · 6a71b18

Full Changelog: v0.3.3...v0.4.0

v0.3.3

17 Oct 21:58 · 52aa1b2

What's Changed

  • Enable running MegaBlocks MoE without bias by @vchiley in #31

Full Changelog: v0.3.2...v0.3.3

v0.3.2

10 Oct 22:32

What's Changed

  • Support for bfloat16
  • Optimizations for top_k > 1 (see the routing sketch after this list)
  • Support for fully-sharded data parallelism
  • Support tensor model parallelism when expert_parallel_world_size > num_experts
  • Optimizations for activation memory
  • Support activation quantization (thanks @dblalock!)
  • Optimizations for SM90 (Hopper)
  • Lots of bug fixes, cleanup and small optimizations
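
As a sketch of what routing with top_k > 1 involves (illustrative only; MegaBlocks' actual router and block-sparse kernels differ), each token selects its top_k experts and the selected gate weights are renormalized:

    import torch

    def topk_route(logits, top_k):
        # logits: [num_tokens, num_experts] raw router scores.
        probs = logits.softmax(dim=-1)
        weights, expert_indices = torch.topk(probs, top_k, dim=-1)
        # Renormalize so each token's selected gate weights sum to 1;
        # the token's output is the weighted sum of its top_k experts.
        weights = weights / weights.sum(dim=-1, keepdim=True)
        return weights, expert_indices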

Full Changelog: v0.1...v0.3.2

Version 0.1 (Pre-release)

01 May 15:14

Initial release documenting the repository state prior to the MLSys'23 camera-ready publication.