AlpacaEval with Fine-grained pairwise evaluation

Template for GPT-as-Judge

See eval_template_pairwise.md.

Run evaluation in parallel

bash evaluate/eval.sh urial-llama-70b
bash evaluate/eval.sh urial-llama-7b
bash evaluate/eval.sh mistral-urial
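
Typed one after another, the three commands above run sequentially at the shell level; the README does not say whether `eval.sh` parallelizes its own work internally. If the goal is to evaluate several models at once, a minimal sketch is to background each invocation and wait for all of them. The `run_all` helper and the `EVAL_CMD` variable are hypothetical names introduced here for illustration, and the sketch assumes the runs are independent (no shared output files, and API rate limits that tolerate concurrent requests).

```shell
# Hypothetical helper, not part of the repository.
# EVAL_CMD is the command prefix run for each model; it is left unquoted
# below on purpose so that it splits into "bash" and "evaluate/eval.sh".
EVAL_CMD=${EVAL_CMD:-"bash evaluate/eval.sh"}

# The body is a subshell "( ... )" so pids/status do not leak into the caller.
run_all() (
  pids=""
  for model in "$@"; do
    ${EVAL_CMD} "$model" &      # start one evaluation in the background
    pids="$pids $!"             # remember its process ID
  done

  status=0
  for pid in $pids; do
    wait "$pid" || status=1     # record a failure if any run exits non-zero
  done
  exit "$status"
)
```

Under these assumptions, `run_all urial-llama-70b urial-llama-7b mistral-urial` would start all three evaluations together and return non-zero if any of them fails.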