Who, When, Why?

👨🏾‍💻 Original Author: Nick Renotte, updates from Todd as I bumbled through making it work for me
📅 Version: 1.1 Sept 24, 2023
📜 License: This project is licensed under the MIT license. Feel free to use it, just don't do bad things with it.

Environment notes

Runpod using 1 A100-80GB GPU with 300GB of storage
I tried running across multiple GPUs by wrapping the model in DataParallel but was not successful. I've left the code in to try running in parallel, but it would need more work to break the workload into batches or find some other solution
300GB of storage is required for the LLama 70B chat model. The LLama2 download requires ~140GB (70B parameters x 2 bytes each). The safetensors stored in the /model directory are another 140-150GB
If you want to access the Streamlit app from an external port later, you also need to customize the deployment to expose TCP port 8501
The latest PyTorch environments run CUDA 11.8, so the notebook was modified accordingly
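The storage figure above is simple arithmetic, sketched here for anyone sizing a different pod. The 2 bytes per parameter and the 140-150GB safetensors range come from the notes above; the function name and the choice of decimal gigabytes are assumptions made for illustration.

```python
# Rough storage estimate for the LLama2 70B chat model, using the numbers from the notes.
PARAMS = 70e9          # 70B parameters
BYTES_PER_PARAM = 2    # 2 bytes per parameter, as stated in the notes

def weights_gb(params: float, bytes_per_param: int) -> float:
    """Size of the raw weights in GB (decimal: 1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

meta_download_gb = weights_gb(PARAMS, BYTES_PER_PARAM)
safetensors_gb = 150   # upper end of the 140-150GB range quoted in the notes
total_gb = meta_download_gb + safetensors_gb

print(f"Meta download: ~{meta_download_gb:.0f} GB")
print(f"Total needed:  ~{total_gb:.0f} GB of the 300 GB volume")
```

The total leaves only a small margin on a 300GB volume, so a smaller volume is unlikely to hold both copies of the weights.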

Jupyter notebook changes

You would need to compare against Nick's original code to see all the changes, but numerous small changes were made to get the code working
VectorStoreIndex struggled with the format of the original vectors created by PyMuPDFReader, so all metadata had to be converted to str data
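A minimal sketch of the metadata fix described above, assuming the metadata is a plain dict. The helper name and the example values are hypothetical, and this is not necessarily how the notebook does it; the attribute that holds the metadata on each document depends on your llama_index version.

```python
from typing import Any, Dict

def stringify_metadata(metadata: Dict[str, Any]) -> Dict[str, str]:
    """Return a copy of the metadata with every key and value converted to str."""
    return {str(key): str(value) for key, value in metadata.items()}

# Hypothetical metadata, loosely shaped like what a PDF reader might attach to a page.
example = {"total_pages": 120, "file_path": "data/report.pdf", "source": 7}
print(stringify_metadata(example))

# In the notebook this would be applied to each loaded document before indexing,
# for example (attribute name is an assumption):
#   for doc in documents:
#       doc.metadata = stringify_metadata(doc.metadata)
```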

More granular steps to setup

LLama2 download request - https://ai.meta.com/resources/models-and-libraries/llama-downloads/ . The turnaround on this is generally pretty quick and works well. You need to do this before you can run download.sh
The LLama models are gated models on huggingface, so you must get access to the models before you can run the notebook. This can take multiple days, so do it early. Here's the page for the 70B Chat model access - https://huggingface.co/meta-llama/Llama-2-70b-chat-hf . It seems that once you have access to one LLama model, you will have access to all of the LLama2 models
Set up your Runpod environment - 1 A100 80GB GPU and 300GB of storage
Clone the repository in the new Runpod environment
Run download.sh. First check whether the file has execute permissions using "ls -l download.sh". If there is no x in the permissions, you need to run "chmod +x download.sh". Once the file has execute permissions, run "./download.sh" from the terminal window. You will be prompted for the URL in the email from Meta (simply copying and pasting it should work). You will also be prompted for the model you want to use. Download the 70B-chat model. This may take 20-30 minutes
Get your huggingface access token at https://huggingface.co/settings/tokens. This is immediate. Once you have it you need to add it to the appropriate place in the notebook
You should now be able to run the llama2 notebook and see the magic happen. Note that the first time this is run, you will need to wait for the sharded safetensors to download from huggingface. This could take 60 minutes so grab a drink and find something else to do for a while. It can be run in parallel to the download of the LLama2 model.
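If you would rather do the permission check for download.sh from a notebook cell than from the terminal, something like the following should work. It is a rough equivalent of "chmod +x download.sh", not a copy of it: the helper name is hypothetical, and it sets the execute bits for user, group and others without consulting the umask.

```python
import os
import stat

def ensure_executable(path: str) -> bool:
    """Add execute bits if the owner lacks them; return True if the mode was changed."""
    mode = os.stat(path).st_mode
    if mode & stat.S_IXUSR:
        return False  # already executable by the owner, nothing to do
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return True

# Only touch the script if it is present in the current directory.
if os.path.exists("download.sh"):
    changed = ensure_executable("download.sh")
    print("added execute permission" if changed else "already executable")
```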

And, boom! that should work

Note: I repeatedly ran into out-of-memory errors at the model-building steps when the GPU would not release memory. The easiest fix was to restart the pod and then rerun the notebook. Not great, but my other attempts at clearing the memory didn't work.
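For reference, here is one commonly tried way to release GPU memory from inside a notebook before resorting to a pod restart. It is only a sketch: it may well be one of the attempts that did not help here, and it cannot free memory that is still referenced by a live Python object. The helper name is hypothetical; the torch import is optional so the cell also runs where PyTorch is not installed.

```python
import gc

def free_gpu_memory() -> bool:
    """Collect garbage, then empty PyTorch's CUDA cache.

    Returns True if the CUDA cache was emptied, False if PyTorch or CUDA is unavailable.
    """
    gc.collect()  # drop unreachable Python objects that may still hold tensors
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    torch.cuda.empty_cache()  # hand cached, unused blocks back to the driver
    return True

print("CUDA cache emptied:", free_gpu_memory())
```

Delete your own references to the model first (for example "del model") and then call the function; if the memory is still not released, restarting the pod remains the reliable fix.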

Streamlit version notes

From the terminal prompt run: "pip install streamlit"
Next run "streamlit run app.py" from the terminal prompt
If you've configured your environment correctly that should work (but I never managed to get it to work)
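When the app does not come up, it can help to check whether anything is actually listening on the Streamlit port (TCP 8501, as mentioned in the environment notes). The sketch below is a generic TCP reachability check, not a Streamlit API; the function name is hypothetical, and the host you pass in from outside the pod is whatever address your Runpod deployment exposes.

```python
import socket

def port_is_open(host: str, port: int = 8501, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# From inside the pod: is the Streamlit server listening locally?
print("8501 reachable on localhost:", port_is_open("127.0.0.1"))
```

If the check succeeds inside the pod but fails from your own machine, the port is probably not exposed in the deployment settings.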

--------------- original readme from Nick ------------------------------

Building LLama Banker

Doing RAG for Finance using LLama2. Highly recommend you run this in a GPU-accelerated environment. I used an A100-80GB GPU on Runpod for the video!

See it live and in action 📺

Startup 🚀

Clone this repo: "git clone https://github.com/nicknochnack/Llama2RAG"
Go into the directory: "cd Llama2RAG"
Start up jupyter by running "jupyter lab" in a terminal or command prompt
Update the auth_token variable in the notebook.
Hit Ctrl + Enter to run through the notebook!
Go back to my YouTube channel and like and subscribe 😉...no seriously...please! lol
If you want to start up the streamlit app, run "streamlit run app.py" (make sure you update your auth token in there as well!)

Other References 🔗

- Llama 2 70b Chat Model Card: Hugging Face model card for the model used in the video.

- Llama Index Doco: sick library used for RAG.

Who, When, Why?

👨🏾‍💻 Author: Nick Renotte
📅 Version: 1.x
📜 License: This project is licensed under the MIT license. Feel free to use it, just don't do bad things with it.
