[SWE-bench] Util: Compare files modified between gold patches and OpenDevin patches #2934

Merged


li-boxuan
Collaborator

https://aider.chat/2024/05/22/swe-bench-lite.html says that Aider finds the correct files to edit in 70.3% of benchmark tests. This PR adds a utility script that computes the same number for OpenDevin.

Usage Example:

poetry run python evaluation/swe_bench/scripts/setup/compare_patch_filename.py --od_output_file /Users/liboxuan/workspace/OpenDevin/evaluation/evaluation_outputs/outputs/swe_bench_lite/CodeActAgent/claude-3-5-sonnet@20240620_maxiter_30_N_v1.8-no-hint/output.jsonl

The claude-3-5-sonnet@20240620_maxiter_30_N_v1.8-no-hint result says OpenDevin found the correct files to edit in 64% of benchmark tests. The real number might be even lower, because I count an instance as a success as long as OpenDevin modifies the correct file, even if it also modified other files erroneously and caused the test to fail.
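
For readers who want the idea without opening the script, here is a minimal sketch of the comparison described above. It is not the code merged in this PR: the function names are hypothetical, it assumes both patches are git-style unified diffs, and it assumes "correct files" means every file in the gold patch also appears in the generated patch (extra files are ignored, which is the lenient criterion mentioned above).

```python
import re

# Header line that git writes for each file in a unified diff.
# Assumption: paths contain no whitespace.
DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$", re.MULTILINE)


def modified_files(patch):
    """Return the set of file paths touched by a git-style unified diff."""
    return {match.group(2) for match in DIFF_HEADER.finditer(patch)}


def found_correct_files(gold_patch, model_patch):
    """True if every file in the gold patch is also edited by the model patch.

    Extra files edited by the model do not count against it.
    """
    gold = modified_files(gold_patch)
    return bool(gold) and gold <= modified_files(model_patch)


def hit_rate(pairs):
    """Fraction of (gold_patch, model_patch) pairs where the files match."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(found_correct_files(g, m) for g, m in pairs) / len(pairs)


if __name__ == "__main__":
    gold = "diff --git a/pkg/core.py b/pkg/core.py\n--- a/pkg/core.py\n+++ b/pkg/core.py\n"
    extra = gold + "diff --git a/tests/test_core.py b/tests/test_core.py\n"
    miss = "diff --git a/pkg/other.py b/pkg/other.py\n"
    print(hit_rate([(gold, extra), (gold, miss)]))
```

A stricter variant could require the two file sets to be equal, which would penalize the erroneous extra edits.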

@li-boxuan
Collaborator Author

li-boxuan commented Jul 15, 2024

I didn't add a README file coz I feel this script is not very useful for general audience. I created this PR mainly to convey my findings. Feel free to merge or close this PR.

@li-boxuan li-boxuan requested a review from xingyaoww July 15, 2024 04:45
@tobitege
Collaborator

I didn't add a README file coz I feel this script is not very useful for general audience. I created this PR mainly to convey my findings. Feel free to merge or close this PR.

I don't mind having this added, in case anyone else might find it useful.
How about adding your usage example to the docstring in this file itself, though?

Contributor

@xingyaoww xingyaoww left a comment

Thanks for this script! LGTM!

@xingyaoww xingyaoww merged commit b834b35 into All-Hands-AI:main Jul 15, 2024
2 checks passed