(fix) Aider bench: logname fix; improve test calling instruction #3666

Merged
tobitege merged 2 commits into main from tobitege/aider-bench-logfix on Aug 30, 2024

Conversation

tobitege
Collaborator

Short description of the problem this fixes or functionality that this introduces. This may be used for the CHANGELOG

Fixes a potential logfile naming problem when the model name contains a : character. Improves the instruction on how to call the (optional) test case to prevent timeouts.


Give a summary of what the PR does, explaining any non-trivial design decisions

  • Any colon : in a model name, like nousresearch/hermes-3-llama-3.1-405b:extended, prevents logging to file because the log folder won't be created.
  • If use of the unit test is enabled (env var option), the LLMs all have different ideas about how to execute that test, which often resulted in timeouts. The instruction to the LLM has been improved to use the same format that the bench itself would use,
    e.g. python -m unittest {instance.instance_name}_test.py
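
The two changes above can be illustrated with a minimal sketch. It is not the code from this PR: the helper names `safe_log_name` and `unittest_command` are hypothetical, and replacing the colon with an underscore is an assumption, since the description does not say which replacement character is used.

```python
def safe_log_name(model_name: str) -> str:
    """Return the model name with ':' replaced so it can be used in a log path.

    Assumption: '_' is an acceptable substitute; the PR may use something else.
    """
    return model_name.replace(":", "_")


def unittest_command(instance_name: str) -> str:
    """Build the test command in the same format the bench itself would use."""
    return f"python -m unittest {instance_name}_test.py"


if __name__ == "__main__":
    model = "nousresearch/hermes-3-llama-3.1-405b:extended"
    print(safe_log_name(model))
    # "two_fer" is a made-up instance name used only for illustration.
    print(unittest_command("two_fer"))
```

Giving the LLM one fixed command string, rather than letting it choose how to run the test, is what the description credits with reducing timeouts.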

@tobitege added the enhancement (New feature or request) and evaluation (Related to running evaluations with OpenHands) labels on Aug 30, 2024
Contributor

@xingyaoww left a comment

LGTM!

@tobitege merged commit dbb671a into main on Aug 30, 2024
@tobitege deleted the tobitege/aider-bench-logfix branch on August 30, 2024 15:15
RajWorking pushed a commit to RajWorking/OpenHands that referenced this pull request Sep 2, 2024