Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Agent] Improve edits by adding back append_file #2722

Merged
merged 61 commits into from
Jul 11, 2024

Conversation

xingyaoww
Copy link
Contributor

@xingyaoww xingyaoww commented Jul 1, 2024

What is the problem that this fixes or functionality that this introduces? Does it fix any open issues?

Dependency: #2805

This is a follow-up of #2685.

Give a brief summary of what the PR does, explaining any non-trivial design decisions

Per discussion with @tobitege in #2685, this PR:

TODO:

Other references

@xingyaoww
Copy link
Contributor Author

xingyaoww commented Jul 7, 2024

@neubig Got the claude-3.5 results out.. There's slight degradation on claude-3.5 results (69/300=23%) compared to 1.7 (73/300=24.3%). I suspect it might be caused by the addition of edit by line number in addition to edit by replace.

Do you think we should remove the edit by line and run eval again, or we can just merge as is, and then try to remove it further in the next PR?

image

EDIT: I plan to try removing edit_file_by_line and see if it improves things - if so, we can just do it with this PR.

@neubig
Copy link
Contributor

neubig commented Jul 7, 2024

I agree that if accuracy has gone down we should probably re-try the eval.

@xingyaoww
Copy link
Contributor Author

xingyaoww commented Jul 10, 2024

Improved results on this PR - after merging #2846 that removes edit_file_by_line (this does hurt!).

74 resolved out of 300-27=273 inference instances (~27% solve rate - will be 24.6% if we consider 300 instances).
Those 27 instances are unable to run on claude-3.5-sonnet due to the following error:

litellm.BadRequestError: litellm.ContentPolicyViolationError: VertexAIException ContentPolicyViolationError - 
{"type":"error","error":{"type":"invalid_request_error","message":"Output blocked by content filtering policy"}}

Anthropic updated their article last week (https://support.anthropic.com/en/articles/9205721-why-am-i-receiving-an-output-blocked-by-content-filtering-policy-error), suggesting that '.... they generally arise from Anthropic’s efforts to prevent Claude from being used to replicate or regurgitate pre-existing materials...'

Maybe claude is trained on some repository in the SWE-Bench test set, and managed to reproduce them exactly during inference, which is blocked by anthropic's latest update feature?

Nevertheless, 74 resolved, compared to the previous run, 73 resolved, which is still a minor improvement, which I think worth merge this in.

image

@xingyaoww
Copy link
Contributor Author

Let me know if people have more concerns here (cc @tobitege) -- if not, i plan to merge this sometime tomorrow (in the next 12-18 hours) so we can go ahead test #2489 with the updated "edit" function.

Copy link
Collaborator

@yufansong yufansong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -230,17 +235,17 @@ def _cur_file_header(current_file, total_lines) -> str:

@update_pwd_decorator
def open_file(
path: str, line_number: int | None = 1, context_lines: int | None = 100
path: str, line_number: int | None = 1, context_lines: int | None = WINDOW
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WINDOW should be 300?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! It is actually 100 - i experimented earlier with 300 and it didn't go well so i reverted back to 100, but forgot to revert the doc back :(

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 more locations of 300, I think

@tobitege
Copy link
Collaborator

Let me know if people have more concerns here (cc @tobitege) -- if not, i plan to merge this sometime tomorrow (in the next 12-18 hours) so we can go ahead test #2489 with the updated "edit" function.

Of course, go ahead. 😃

@xingyaoww xingyaoww enabled auto-merge (squash) July 11, 2024 14:26
@xingyaoww xingyaoww merged commit 1b54800 into All-Hands-AI:main Jul 11, 2024
2 checks passed
@xingyaoww xingyaoww changed the title [Agent] Improve edits by adding back edit_file_by_line [Agent] Improve edits by adding back append_file Jul 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants