New Task: openai_mmmlu professionally translated by OpenAI as part of o1 release #2305

Open
giuliolovisotto opened this issue Sep 16, 2024 · 1 comment
Labels
feature request A feature that isn't implemented yet.

Comments

@giuliolovisotto
Contributor

giuliolovisotto commented Sep 16, 2024

From the OpenAI o1 System Card:

"we translated MMLU’s[39] test set into 14 languages using professional human translators. This approach differs from the GPT-4 Paper where MMLU was machine translated with Azure Translate [14]. Relying on human translators for this evaluation increases confidence in the accuracy of the translations"

The datasets are included in their simple-evals library: https://github.com/openai/simple-evals

Is anyone working on this? I'd be interested in adding these to lm-evaluation-harness. What's a good way to structure this new task so that it co-exists with the MMLU variants already in the harness (kmmlu, cmmlu, arabicmmlu, ...)?

Tagging @baberabb 😄

@baberabb
Contributor

baberabb commented Sep 17, 2024

Hi! This would be great! We should be able to use the same nomenclature as they use here, maybe prefixing the task names with openai?

This script for generating the task boilerplate should come in handy. Let me know if I can help!
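As a rough, unofficial sketch of what such boilerplate could look like with an `openai` prefix, the Python below writes one small YAML config per language. The `openai_mmmlu_<locale>` task names, the locale codes, the `_openai_mmmlu_template_yaml` base file, and using the locale as `dataset_name` are all hypothetical choices for illustration, not values confirmed by simple-evals or the harness. `include`, `task`, and `dataset_name` are standard keys in lm-evaluation-harness task YAML.

```python
# Hypothetical sketch: generate one lm-evaluation-harness task config per language,
# using the "openai_mmmlu_<locale>" naming floated in this thread.
# ASSUMPTIONS: the locale codes, the base template name, and dataset_name == locale
# are illustrative only; verify them against the actual dataset before use.
from pathlib import Path

# Assumed locale suffixes for the 14 translated languages.
LOCALES = [
    "AR_XY", "BN_BD", "DE_DE", "ES_LA", "FR_FR", "HI_IN", "ID_ID",
    "IT_IT", "JA_JP", "KO_KR", "PT_BR", "SW_KE", "YO_NG", "ZH_CN",
]

# Hypothetical shared template holding prompt format, splits, and metrics.
BASE_CONFIG = "_openai_mmmlu_template_yaml"


def task_name(locale: str) -> str:
    """Build a task name such as 'openai_mmmlu_de_de'."""
    return f"openai_mmmlu_{locale.lower()}"


def render_task_yaml(locale: str) -> str:
    """Render a minimal per-language YAML config as plain text."""
    return (
        f"include: {BASE_CONFIG}\n"
        f"task: {task_name(locale)}\n"
        f"dataset_name: {locale}\n"
    )


def write_configs(out_dir: Path) -> list[Path]:
    """Write one YAML file per locale into out_dir and return the written paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for locale in LOCALES:
        path = out_dir / f"{task_name(locale)}.yaml"
        path.write_text(render_task_yaml(locale), encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    # Preview a single generated config instead of writing files.
    print(render_task_yaml("DE_DE"))
```

Keeping the shared settings in one template means each language file differs only in its task name and data subset, which also keeps the new tasks clearly separated from kmmlu, cmmlu, arabicmmlu, and the other existing variants.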

@baberabb baberabb added the feature request label Sep 17, 2024
@giuliolovisotto giuliolovisotto changed the title New Task: mmlu professionally translated by OpenAI as part of o1 release New Task: openai_mmmlu professionally translated by OpenAI as part of o1 release Sep 24, 2024