
Create Aider Agent #120

Closed
thiswillbeyourgithub opened this issue Mar 24, 2024 · 18 comments
Labels
agent framework Strategies for prompting, agent, etc enhancement New feature or request severity:low Minor issues, affecting single user

Comments

@thiswillbeyourgithub

Summary
There's an open source AI pair programming tool called aider that implements something interesting to you: a bunch of Python classes and functions that ask the LLM to output only the diff to apply instead of rewriting the whole file. This both reduces the chances of errors and greatly reduces the number of tokens written (importantly: completion tokens are far more expensive than prompt tokens).
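To illustrate the idea (this is a hedged sketch, not aider's actual implementation): instead of having the model rewrite a whole file, you ask it for an exact snippet to find and its replacement, then apply that edit locally. The `apply_edit` helper below is a hypothetical name for illustration.

```python
# Minimal sketch of diff-style editing: apply an LLM-produced edit by
# replacing an exact snippet rather than regenerating the whole file.
# (Hypothetical helper; aider's real edit formats are more elaborate.)

def apply_edit(source: str, search: str, replace: str) -> str:
    """Replace exactly one occurrence of `search` with `replace`.

    Fails loudly if the snippet is missing or ambiguous, since silently
    patching the wrong location is worse than an error.
    """
    count = source.count(search)
    if count != 1:
        raise ValueError(f"expected exactly 1 match, found {count}")
    return source.replace(search, replace, 1)

original = "def greet():\n    print('hello')\n"
patched = apply_edit(original, "print('hello')", "print('hello, world')")
```

The token savings come from the model emitting only `search`/`replace` pairs instead of the full file contents.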

Motivation
Reduce token cost and errors.

Technical Design
A report showcasing their work can be found here. Most of the code is here and the prompts are here.
As you can see, a lot of thought went into this, because the LLM otherwise has trouble with line numbers etc.

Alternatives to Consider
None that I know of.

Additional context
For a personal project I inquired about using only the functions of aider; you can read the issue here.
Also, hearing about OpenDevin made me aware of devika too, so I'll be posting this exact same issue on their repo as well.

@rbren
Collaborator

rbren commented Mar 24, 2024

This is really interesting--thanks for bringing it up!

The improvement from using diffs is impressive. But I imagine the logic for applying them is...messy. This would be really interesting for an agent to try out.

@rbren
Collaborator

rbren commented Mar 24, 2024

I also really like the idea of the user telling the agent which files to focus on!

@thiswillbeyourgithub
Author

Glad you like it!

messy

To me, on the contrary, it seems cleaner, especially for long files and projects.

Also, tools like symbex for Python seem very promising for getting a bird's-eye view of a project by seeing only the signature of each function, like a human would. I'm sure other general parsers exist for multiple languages.
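A rough sketch of that signature-only view (our own illustration, not symbex's implementation) can be done with Python's standard `ast` module: parse a module and emit only the `def` lines.

```python
import ast

# Sketch of a symbex-style "bird's-eye view": list only the function
# signatures in a Python source file, the way a human skims a module.

def signatures(source: str) -> list[str]:
    """Return 'def name(args)' strings for every function in `source`."""
    tree = ast.parse(source)
    sigs = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            sigs.append(f"def {node.name}({args})")
    return sigs

code = "def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return a - b\n"
```

For other languages, a tree-sitter grammar could play the same role as `ast` does here.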

But the diff thing is a priority in my mind.

@rbren rbren changed the title HeadsUp: Aider is a project implementing robust diff writing for code Create Aider Agent Mar 25, 2024
@rbren rbren added the agent framework Strategies for prompting, agent, etc label Mar 25, 2024
@cloudbow

cloudbow commented Mar 28, 2024

As a user of aider chat, I love it the most out of all the tools available. One big area where aider adds value is working with an existing codebase; can we prioritize this in the opendevin project as well? I guess you already mentioned this. But aider.chat is a tool that can be used on an existing repository, and it understands all the symbols in the repository: it creates a repository map containing all the symbols, which is extremely good at pinpointing changes.

@0xdevalias

This blog about how they use tree-sitter to build a graph of the repo/code is also really interesting/useful:

@rbren rbren added the severity:low Minor issues, affecting single user label Apr 9, 2024
@neubig
Contributor

neubig commented May 11, 2024

Our new OpenDevin CodeAct agent implements some of the tools from SWE-Agent that make it possible to do many of the things that aider supports. If there is interest in implementing an aider agent we'd be happy to have contributions, but I'm going to close the issue as unplanned for now unless someone is interested in doing this!

@neubig neubig closed this as not planned Won't fix, can't repro, duplicate, stale May 11, 2024
@rawwerks

rawwerks commented May 24, 2024

watching the llamaindex webinar with @rbren now - i think an aider "microagent" would be insanely powerful.

specifically - i think it could help with some of the context window mgmt challenges. paul has put an insane amount of work into refining the diff structure within aider. and if aider is just a tool or a micro-agent, then the parent agent can just see if things work and doesn't necessarily need to be bothered with the details of what the aider tool / microagent did.

@neubig neubig reopened this May 25, 2024
@neubig neubig mentioned this issue May 28, 2024
@0xdevalias

0xdevalias commented May 30, 2024

  • https://aider.chat/2024/05/22/swe-bench-lite.html
    • Aider scored 26.3% on the SWE Bench Lite benchmark, achieving a state-of-the-art result. The current top leaderboard entry is 20.3% from Amazon Q Developer Agent. The best result reported elsewhere seems to be 25% from OpenDevin.

  • https://www.swebench.com/
  • https://github.com/paul-gauthier/aider-swe-bench
    • Harness used to benchmark aider against SWE Bench benchmarks

    • https://github.com/paul-gauthier/aider-swe-bench#the-aider-agent
      • The "aider agent"
        The "aider agent" is dead simple. It simply invokes aider on a fresh copy of the problem's git repo over and over, iterating through the models it's been told to use. Aider is invoked repeatedly until it reports that it successfully edited the repo without any outstanding edit, lint or test errors. That is a plausible solution, so the agent is done.

        Aider is configured with a test command to run all the pre-existing tests in the problem's repo. Aider is also configured to proceed with all its suggested actions without any user approval.
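The loop described in that README can be sketched as follows. This is a hedged paraphrase of the harness's logic; `run_aider`, the result dictionary keys, and the model list are hypothetical stand-ins, not the harness's real API.

```python
# Sketch of the "aider agent" retry loop: invoke aider on a fresh copy of
# the repo until it reports a clean edit, cycling through the model list.
# `run_aider` is a hypothetical callable standing in for one aider run.

MAX_ATTEMPTS = 6  # the harness reportedly retries up to 6 times

def solve(problem, models, run_aider, max_attempts=MAX_ATTEMPTS):
    """Return the first result with no outstanding edit/lint/test errors."""
    for attempt in range(max_attempts):
        model = models[attempt % len(models)]  # iterate through the models
        result = run_aider(problem, model)     # fresh checkout per call
        if result["edited"] and not (result["lint_errors"] or result["test_errors"]):
            return result  # plausible solution: the agent is done
    return None
```

The key design choice is that the parent loop only inspects pass/fail signals (edit, lint, test), never the diff contents, which matches the "microagent" framing above.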

See also:

@li-boxuan
Collaborator

Yeah aider's benchmark score is insanely high! We should definitely incorporate aider (in some form).

@neubig
Contributor

neubig commented May 30, 2024

@rbren @deniz-birlikci and I were talking about the logistics of doing this on slack, here are some details:

Regarding benchmark scores, aider retries over and over, up to 6 times, if it doesn't come up with a solution that passes tests and lint. So the scores are actually a bit lower (~20%) if it only tries once. But I think aider definitely has some good ideas incorporated, so it's worth trying.

From @rbren:

We might want to pull from Aider in a piecemeal way, rather than importing them as a dependency (we actually can't add it rn due to a conflict in playwright versions anyways--I guess they're working on browsing?)
Some ideas:

  • Add RepoMap to the State object, maybe using the Aider class. Then other agents can take advantage of it
  • Implement an EditBlockCoder agent, which
    • takes in a task that describes the edits to be made
    • reads the necessary file
    • prompts the LLM in EditBlock format
    • translates the response into bash for SEARCH/REPLACE or similar
  • Pull in the linting functionality
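The EditBlock idea above can be sketched as a parser for SEARCH/REPLACE blocks in a model response. The marker strings follow aider's conventions, but this parser is our own minimal illustration, not aider's implementation (which handles partial matches, retries, and more).

```python
import re

# Sketch: extract aider-style SEARCH/REPLACE blocks from an LLM response.

BLOCK_RE = re.compile(
    r"<<<<<<< SEARCH\n(.*?)=======\n(.*?)>>>>>>> REPLACE",
    re.DOTALL,  # let the captures span multiple lines
)

def parse_edit_blocks(response: str) -> list[tuple[str, str]]:
    """Return (search, replace) text pairs found in the model's response."""
    return BLOCK_RE.findall(response)

reply = (
    "Here is the change:\n"
    "<<<<<<< SEARCH\n"
    "x = 1\n"
    "=======\n"
    "x = 2\n"
    ">>>>>>> REPLACE\n"
)
blocks = parse_edit_blocks(reply)
```

Each pair could then be applied to the target file (e.g. via exact string replacement) or translated into a shell command, as the bullet list suggests.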

@PierrunoYT
Contributor

We need this after it passes more than 40% on SWE Bench

@0xdevalias

We need this after it passes more than 40 % on SWE Bench

@PierrunoYT What is significant about 40%?

@assertion
Contributor

We need this after it passes more than 40 % on SWE Bench

Its name is Aide, seems not the same as Aider? @PierrunoYT
[image attachment]

@0xdevalias

0xdevalias commented Jul 1, 2024

Its name is Aide, seems not the same as Aider?

Definitely seems to be different to aider.

Context:

  • https://github.com/codestoryai/swe_bench_traces
  • https://codestory.ai/
    • We believe, we now have the opportunity and necessity, to fundamentally re-imagine the editor to be a place where both humans and AI can work together.

      Our attempt at this mighty goal, is Aide. We're building an editor that bridges the present and the future — equipped to help developers effectively leverage AI in their workflows today, while paving the way for how we imagine programming with AI will look in the future.

  • https://aide.dev/
    • Aide lets you pick an infra provider and model of choice, add your API key and just start coding. All queries made to the model are available to you in a SQLite DB locally, and our prompts are Open Source.

    • https://github.com/codestoryai/prompts
      • Contains the prompts we use to talk to various LLMs for different utilities inside the editor

I couldn't see a PR submission for aide's results here though:

And since I hadn't seen that MentatBot on the SWE-Bench leaderboard either, here's the blog link + results submission PR for it:

@0xdevalias

0xdevalias commented Jul 1, 2024

Also things like symbex for python seem very promising to allow a bird's eye view of a project by seeing only the signature of each functions. Like a human would. I'm sure there exist other general parser for multiple languages.

This blog about how they use tree-sitter to build a graph of the repo/code is also really interesting/useful:

Stack graphs may also help in the 'code search/context' space of things (similar to aider's repo map/etc); it's what powers GitHub's smart code navigation features:

@neubig
Contributor

neubig commented Jul 1, 2024

Here is a link to a twitter thread explaining it: https://x.com/skcd42/status/1806640696662675469

@PierrunoYT
Contributor

PierrunoYT commented Jul 1, 2024

@assertion @0xdevalias Yeah I wrote it wrong and forgot to edit it.

@neubig
Contributor

neubig commented Jul 3, 2024

I think that actually we can probably close this issue in favor of the more concrete #2185, #2220, #2221

@neubig neubig closed this as not planned Won't fix, can't repro, duplicate, stale Jul 3, 2024