llasm is a novel framework that fuses encoder-only and decoder-only LLMs, combining their respective strengths to better comprehend assembly language and to generalize better at function naming.
- [2024/3/31] The base model of llasm-encoder is now available on Hugging Face Model Hub (https://huggingface.co/sandspeare/llasm-encoder).
- [2024/3/31] The base model of llasm-decoder is now available on Hugging Face Model Hub (https://huggingface.co/sandspeare/llasm-decoder).
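Assuming the released checkpoints follow the standard Hugging Face `transformers` repository layout, they can be fetched with `from_pretrained`. The helper below is an illustrative sketch, not part of the released codebase; the exact model class for these checkpoints may differ.

```python
def load_llasm(model_id: str = "sandspeare/llasm-encoder"):
    """Illustrative sketch: fetch an llasm checkpoint from the Hugging Face Hub.

    Assumes the repo follows the standard `transformers` layout; the concrete
    model class used by llasm may differ from AutoModel.
    """
    # Lazy import so the sketch can be read without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model


if __name__ == "__main__":
    # Downloads the checkpoint on first use (the decoder is large).
    tokenizer, model = load_llasm("sandspeare/llasm-encoder")
```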
- Install package

```shell
conda create -n llasm python=3.10 -y
conda activate llasm
pip install --upgrade pip
pip install -e .
```
- Install additional packages for training

```shell
pip install ninja
pip install flash-attn==1.0.2
```
We use a set of finetuning hyperparameters similar to LLaVA's. The hyperparameters used in pretraining and finetuning are provided below.
- Pretraining
Model | Global Batch Size | Learning rate | Epochs | Max length | Weight decay |
---|---|---|---|---|---|
LLasm-13B | 128 | 2e-3 | 1 | 2048 | 0 |
- Finetuning
Model | Global Batch Size | Learning rate | Epochs | Max length | Weight decay |
---|---|---|---|---|---|
LLasm-13B | 32 | 2e-5 | 3 | 2048 | 0 |
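For reference, the two table rows above can be captured as plain config dictionaries. The key names here are illustrative only and are not the actual arguments consumed by the training scripts.

```python
# Hyperparameters from the tables above; key names are illustrative only.
PRETRAIN_CONFIG = {
    "global_batch_size": 128,
    "learning_rate": 2e-3,
    "epochs": 1,
    "max_length": 2048,
    "weight_decay": 0.0,
}

FINETUNE_CONFIG = {
    "global_batch_size": 32,
    "learning_rate": 2e-5,
    "epochs": 3,
    "max_length": 2048,
    "weight_decay": 0.0,
}

# Finetuning uses a smaller batch and a 100x lower learning rate over more epochs.
assert PRETRAIN_CONFIG["learning_rate"] / FINETUNE_CONFIG["learning_rate"] == 100.0
```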
Pretraining takes around 24 hours for LLasm-13B on 4x A100 (80G):

```shell
./scripts/train.sh
```

Finetuning takes around 24 hours for LLasm-13B on 4x A100 (80G):

```shell
./scripts/test.sh
```
```shell
python ./eval/eval_binary.py
python ./eval/performance.py
```
We will release all evaluation datasets after publication.
- Performance across different optimization levels: `./llasm/eval/save/dataset`
- Performance on Mirai malware: `./llasm/eval/save/mirai`
- Vicuna: the base model we build upon; Vicuna-13B provides the amazing language capabilities!