llasm: Naming Functions in Binaries by Fusing Encoder-only and Decoder-only LLMs

About

llasm is a novel framework that fuses encoder-only and decoder-only LLMs, leveraging the capabilities of both to better comprehend assembly language and to generalize better at function naming.

News

Install

  1. Install the package
conda create -n llasm python=3.10 -y
conda activate llasm
pip install --upgrade pip
pip install -e .
  2. Install additional packages for training
pip install ninja
pip install flash-attn==1.0.2
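
A minimal sanity check of the environment before training can look like the following (illustrative only; llasm's exact runtime requirements are not listed above, and flash-attn is only needed for training):

# Minimal environment sanity check (illustrative; adjust to your setup).
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

try:
    import flash_attn  # from `pip install flash-attn==1.0.2`; training only
    print("flash-attn import OK")
except ImportError:
    print("flash-attn not installed (only needed for training)")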

Train

Hyperparameters

We use a set of hyperparameters similar to LLaVA's for finetuning. The hyperparameters used in pretraining and finetuning are provided below.

  1. Pretraining

Hyperparameter | Global Batch Size | Learning rate | Epochs | Max length | Weight decay
LLasm-13B | 128 | 2e-3 | 1 | 2048 | 0

  2. Finetuning

Hyperparameter | Global Batch Size | Learning rate | Epochs | Max length | Weight decay
LLasm-13B | 32 | 2e-5 | 3 | 2048 | 0
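
The global batch size is the product of the per-device batch size, the gradient accumulation steps, and the number of GPUs (4x A100 in the runs below). A minimal sketch of that arithmetic follows; the per-device/accumulation split is an assumed example, and only the resulting global batch sizes (128 and 32) come from the tables above.

# Global batch size = per-device batch * gradient accumulation * number of GPUs.
# The per-device/accumulation split below is an assumed example; only the
# resulting global batch sizes (128 and 32) come from the tables above.
num_gpus = 4

configs = {
    "pretrain": {"per_device_batch": 16, "grad_accum": 2},  # 16 * 2 * 4 = 128
    "finetune": {"per_device_batch": 4, "grad_accum": 2},   # 4 * 2 * 4 = 32
}

for name, cfg in configs.items():
    global_batch = cfg["per_device_batch"] * cfg["grad_accum"] * num_gpus
    print(f"{name}: global batch size = {global_batch}")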

Pretrain

Pretraining takes around 24 hours for LLasm-13B on 4x A100 (80G) GPUs.

./scripts/train.sh

Instruction Tuning

Instruction tuning takes around 24 hours for LLasm-13B on 4x A100 (80G) GPUs.

./scripts/test.sh

QuickStart

Inference

python ./eval/eval_binary.py

Evaluation

python ./eval/performance.py

We will release all evaluation datasets after publication.
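
The metric computed by ./eval/performance.py is not documented here. As an illustration only, function naming is commonly scored with token-level precision, recall, and F1 between predicted and ground-truth names; the tokenization and metric below are assumptions, not necessarily what the script computes.

import re

def name_tokens(name):
    # Split a function name such as "parse_http_header" into lowercase tokens.
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", name) if t]

def token_f1(predicted, ground_truth):
    # Token-level precision/recall/F1, a common metric in function-naming work.
    pred, gold = set(name_tokens(predicted)), set(name_tokens(ground_truth))
    overlap = len(pred & gold)
    if not pred or not gold or overlap == 0:
        return 0.0, 0.0, 0.0
    precision, recall = overlap / len(pred), overlap / len(gold)
    return precision, recall, 2 * precision * recall / (precision + recall)

print(token_f1("parse_http_header", "http_header_parse"))  # (1.0, 1.0, 1.0)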

Data

Performance across different optimization levels

./llasm/eval/save/dataset

Performance on Mirai malware

./llasm/eval/save/mirai

Acknowledgement

  • Vicuna: the model we built upon; our base model Vicuna-13B has amazing language capabilities!
