Extending GPT-2 Context Length via RoPE Scaling

Note: I chose GPT-2 because it was the only small model I found that I could fine-tune easily without running out of memory, and that was old enough to not have RoPE pre-implemented. (QLoRA kept throwing errors and adding complexity, so I skipped it given the time constraints.)

Training Runs

Demo

  • Try the model here: GPT-2 Long Demo
  • Try giving it an input of more than 1k or 2k tokens: Demo
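
If you want to test locally instead of the hosted demo, here is a rough sketch. The repo id below is a placeholder rather than the actual model name, and `trust_remote_code` is only relevant if the uploaded checkpoint ships custom modeling code:

```python
# pip install torch transformers
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "<username>/gpt2-long"  # placeholder: substitute the model linked below

tokenizer = AutoTokenizer.from_pretrained(repo_id)
# trust_remote_code only matters if the repo includes custom modeling code
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# Build a prompt longer than stock GPT-2's 1024-token window
prompt = "Summarize the following:\n" + "The quick brown fox jumps over the lazy dog. " * 200
inputs = tokenizer(prompt, return_tensors="pt")
print(inputs.input_ids.shape[1], "input tokens")

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```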

Evaluation

Approach

  • Use the rotary positional embedding implementation by lucidrains here
  • Change the model to use RoPE positional embeddings (see the sketch after this list)
  • Save and upload to Hugging Face (to avoid OOMing); the model can be found here
  • Load and fine-tune separately on LongAlpaca-12k
  • These steps can be seen in notebook and notebook
  • For logs and other findings or docs, check logs and this
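
A minimal sketch of the patching idea, assuming lucidrains' `rotary-embedding-torch` package. The attention function below is a simplified stand-in for GPT-2's attention, not the exact code used in the notebooks:

```python
# pip install torch rotary-embedding-torch
import torch
import torch.nn.functional as F
from rotary_embedding_torch import RotaryEmbedding

# GPT-2 small uses 12 heads with head_dim = 64; rotating half of the head
# dimension (dim=32) follows the package's README convention.
rotary = RotaryEmbedding(dim=32)

def rope_attention(q, k, v):
    """Simplified causal attention with rotary position information.

    q, k, v: (batch, heads, seq_len, head_dim). Positions are injected by
    rotating q and k, so no learned absolute position embeddings are needed.
    """
    q = rotary.rotate_queries_or_keys(q)
    k = rotary.rotate_queries_or_keys(k)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

q = k = v = torch.randn(1, 12, 2048, 64)  # 2048 tokens, beyond GPT-2's 1024
out = rope_attention(q, k, v)
print(out.shape)  # torch.Size([1, 12, 2048, 64])
```

The point is that position information comes from rotating queries and keys inside each attention block rather than from GPT-2's learned absolute position embeddings, which is what allows the fine-tuned model to accept sequences past the original 1024-token limit.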

Note:

  • I understand that the ideal way to apply patches to models would be something like this kaiokendev impl, but this was my first time doing this and time was short, so I used whatever I could.
