-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Agent] Improve edits by adding back append_file
#2722
[Agent] Improve edits by adding back append_file
#2722
Conversation
…inter errors (best attempt)
25577be
to
cf9eef9
Compare
@neubig Got the claude-3.5 results out.. There's slight degradation on claude-3.5 results (69/300=23%) compared to 1.7 (73/300=24.3%). I suspect it might be caused by the addition of edit by line number in addition to edit by replace. Do you think we should remove the edit by line and run eval again, or we can just merge as is, and then try to remove it further in the next PR? EDIT: I plan to try removing |
I agree that if accuracy has gone down we should probably re-try the eval. |
This reverts commit 06fd2c5.
Improved results on this PR - after merging #2846 that removes 74 resolved out of 300-27=273 inference instances (~27% solve rate - will be 24.6% if we consider 300 instances).
Anthropic updated their article last week (https://support.anthropic.com/en/articles/9205721-why-am-i-receiving-an-output-blocked-by-content-filtering-policy-error), suggesting that '.... they generally arise from Anthropic’s efforts to prevent Claude from being used to replicate or regurgitate pre-existing materials...' Maybe Nevertheless, 74 resolved, compared to the previous run, 73 resolved, which is still a minor improvement, which I think worth merge this in. |
04b4acf
to
8df6970
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -230,17 +235,17 @@ def _cur_file_header(current_file, total_lines) -> str: | |||
|
|||
@update_pwd_decorator | |||
def open_file( | |||
path: str, line_number: int | None = 1, context_lines: int | None = 100 | |||
path: str, line_number: int | None = 1, context_lines: int | None = WINDOW |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WINDOW should be 300?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! It is actually 100 - i experimented earlier with 300 and it didn't go well so i reverted back to 100, but forgot to revert the doc back :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 2 more locations of 300, I think
f9c1785
to
f02c145
Compare
edit_file_by_line
append_file
What is the problem that this fixes or functionality that this introduces? Does it fix any open issues?
Dependency: #2805
This is a follow-up of #2685.
Give a brief summary of what the PR does, explaining any non-trivial design decisions
Per discussion with @tobitege in #2685, this PR:
add backit shows degraded performance ([Agent] Improve edits by adding backedit_file_by_line
and make it co-exists withedit_file_by_replace
append_file
#2722 (comment))append_file
so that agent can easily append content to the end due to concern mentioned here.insert_content_at_line
introduced in [Agent] (Potentially) improve Editing usingdiff
#2685TODO:
diff
#2685 to be merged firstOther references