Insights: EleutherAI/lm-evaluation-harness
September 19, 2024 – September 26, 2024
Overview
11 Pull requests merged by 5 people
-
openai: better error messages; fix greedy matching
#2327 merged
Sep 26, 2024 -
add mmlu readme
#2282 merged
Sep 26, 2024 -
Added TurkishMMLU to LM Evaluation Harness
#2283 merged
Sep 26, 2024 -
mmlu-pro: add newlines to task descriptions (not leaderboard)
#2334 merged
Sep 26, 2024 -
change glianorex to test split
#2332 merged
Sep 26, 2024 -
change group to tags in task `eus_exams` task configs
#2320 merged
Sep 26, 2024 -
Treat tags in python tasks the same as yaml tasks
#2288 merged
Sep 26, 2024 -
fix writeout script
#2350 merged
Sep 26, 2024 -
squad v2: load metric with `evaluate`
#2351 merged
Sep 26, 2024 -
Add a note for missing dependencies
#2336 merged
Sep 24, 2024 -
Fixed dummy model
#2339 merged
Sep 24, 2024
5 Pull requests opened by 4 people
-
Added metric aggregation for leaderboard tasks.
#2340 opened
Sep 24, 2024 -
Support pipeline parallel with OpenVINO models
#2349 opened
Sep 25, 2024 -
HF: switch conditional checks to `self.backend` from `AUTO_MODEL_CLASS`
#2353 opened
Sep 25, 2024 -
Add metabench task to LM Evaluation Harness
#2357 opened
Sep 26, 2024 -
fix `cost_estimate` script
#2359 opened
Sep 26, 2024
17 Issues closed by 5 people
-
Support for Using Multiple Choice Datasets with GPT-4o Model via OpenAI API
#2326 closed
Sep 26, 2024 -
Evaluation of MMLU tasks using the OpenAI API
#2318 closed
Sep 26, 2024 -
Issue with openai completions API - related to logprobs
#2287 closed
Sep 26, 2024 -
`glianorex_en` task does not work
#2329 closed
Sep 26, 2024 -
Tasks of type `python_task` are not listed in `lm-eval --tasks list`
#2268 closed
Sep 26, 2024 -
--tasks mmlu
#2355 closed
Sep 26, 2024 -
AttributeError: 'dict' object has no attribute 'has_test_docs'
#2342 closed
Sep 26, 2024 -
squadv2 task occurred "AttributeError: module 'datasets' has no attribute 'load_metric'"
#2348 closed
Sep 26, 2024 -
Question about IFEval on LeaderBoard
#2200 closed
Sep 24, 2024 -
Feature request: `4.4.0` Pypi release with `leaderboard`
#2195 closed
Sep 23, 2024 -
mmlu_pro fewshot_config
#2196 closed
Sep 23, 2024 -
Error of `continuation_logprobs_dicts` is `None` when running with `vllm` on multi-choice tasks
#2205 closed
Sep 23, 2024 -
Metrics that require probability scores (y_scores)
#2272 closed
Sep 23, 2024 -
the log is end,the gpu is not calculate,but is storing,the result is not getting,is it normal?
#2295 closed
Sep 23, 2024 -
Multi-node MMLU support ?
#2281 closed
Sep 23, 2024 -
External API - same results different models
#2284 closed
Sep 23, 2024 -
Using multi-GPU with accelerate is not working
#2292 closed
Sep 23, 2024
15 Issues opened by 13 people
-
[multimodal] llava-1.5-7b-hf doesn't work on `mmmu_val`
#2360 opened
Sep 26, 2024 -
Add a test for `scripts/write_out.py` and other `scripts/` utils
#2356 opened
Sep 26, 2024 -
Evaluation of MMLU tasks using a fined tuned Gemma 2 model
#2354 opened
Sep 26, 2024 -
Setting limit_mm_per_prompt for vllm_vlm fails argument parser
#2352 opened
Sep 25, 2024 -
Unexpected space character
#2346 opened
Sep 25, 2024 -
tasks RACE only high not "middle"
#2345 opened
Sep 25, 2024 -
Reproduce QWen 2.5-14B-Instruct and LLaMa-3.1-8B-Instruct Results
#2344 opened
Sep 25, 2024 -
gpt2 evaluation
#2343 opened
Sep 24, 2024 -
Locally reproducible HF-Leaderboard evals
#2338 opened
Sep 24, 2024 -
Dynamical prompt with extremely promising results #RIPrompt
#2335 opened
Sep 23, 2024 -
Confusion over the model outputs
#2331 opened
Sep 23, 2024 -
Failed to add a new metric
#2330 opened
Sep 23, 2024 -
Hashing error when setting random seed for vllm model
#2328 opened
Sep 22, 2024
18 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Fix float limit override
#2325 commented on
Sep 24, 2024 • 1 new comment -
Add new benchmark: Basque bench
#2153 commented on
Sep 26, 2024 • 1 new comment -
Mathvista
#2321 commented on
Sep 24, 2024 • 0 new comments -
Draft: mmlu translated professionally by OpenAI
#2312 commented on
Sep 24, 2024 • 0 new comments -
avoid timeout errors with high concurrency in api_model
#2307 commented on
Sep 26, 2024 • 0 new comments -
Ifeval: Dowload `punkt_tab` on rank 0
#2267 commented on
Sep 23, 2024 • 0 new comments -
[Draft] llm-as-judge
#2251 commented on
Sep 25, 2024 • 0 new comments -
HellaSwag with UnicodeDecodeError
#1757 commented on
Sep 26, 2024 • 0 new comments -
Bug in the float limit handling
#2324 commented on
Sep 26, 2024 • 0 new comments -
zero accuracy on `mmlu_generative`
#2279 commented on
Sep 26, 2024 • 0 new comments -
Fail to reproduce the perplexity of Llama-2 7B on wikitext
#2301 commented on
Sep 26, 2024 • 0 new comments -
New Task: `openai_mmmlu` professionaly translated by OpenAI as part of o1 release
#2305 commented on
Sep 24, 2024 • 0 new comments -
Configuring Azure OPENAI
#2302 commented on
Sep 23, 2024 • 0 new comments -
Error for AGIEval when using fewshot
#2323 commented on
Sep 23, 2024 • 0 new comments -
Which version to use
#2322 commented on
Sep 23, 2024 • 0 new comments -
Add long context evaluation benchmarks such as LongBench and LEval.
#2180 commented on
Sep 23, 2024 • 0 new comments -
eval gsm8k from local dataset folder with the bug info "ValueError: BuilderConfig 'main' not found."
#1829 commented on
Sep 23, 2024 • 0 new comments -
IFEval fails when multiple gpus are used (for DDP)
#2266 commented on
Sep 22, 2024 • 0 new comments