Qdrant keyword path filter in proc #1190

Merged
merged 4 commits into from
Dec 21, 2023
Merged

@ggordonhall (Contributor) commented Dec 19, 2023

We want to use a Qdrant keyword filter on paths when running proc, to ensure that we only receive code snippets from the specified files. In all other cases, we want to use a text filter so that path filters can match partially (e.g. so that we can search within the directory `server/`).
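A minimal sketch of the two filter shapes, written as the JSON bodies that Qdrant's REST API accepts. The payload key `relative_path`, the `in_proc` flag and the example paths are assumptions for illustration, not the actual bloop code. A keyword condition (`match.value`) only matches the exact stored path, while a full-text condition (`match.text`) can match part of a path, with the precise behaviour depending on how the field is indexed.

```python
import json
from typing import List


def path_condition(path: str, exact: bool) -> dict:
    """One Qdrant field condition on the (assumed) `relative_path` payload key."""
    # Keyword match: the stored value must equal `path` exactly.
    # Full-text match: `path` only needs to match part of the stored value;
    # how "part" is decided depends on the field's index/tokenizer settings.
    match = {"value": path} if exact else {"text": path}
    return {"key": "relative_path", "match": match}


def build_path_filter(paths: List[str], in_proc: bool) -> dict:
    """Filter accepting snippets from any of `paths` (`in_proc` is a hypothetical flag)."""
    conditions = [path_condition(p, exact=in_proc) for p in paths]
    # `should` = at least one condition must hold, i.e. an OR across paths.
    return {"should": conditions}


if __name__ == "__main__":
    # Outside proc: partial match, e.g. files under server/.
    print(json.dumps(build_path_filter(["server/"], in_proc=False)))
    # Inside proc: only the exact file that was requested.
    print(json.dumps(build_path_filter(["src/main.rs"], in_proc=True)))
```

Grouping the conditions under `should` accepts a snippet from any of the listed paths; grouping them under `must` would require all of them at once, which can never hold for two different exact paths on a single-valued field.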

@ggordonhall ggordonhall merged commit 6560200 into main Dec 21, 2023
2 checks passed
@ggordonhall ggordonhall deleted the gabriel/proc-exact-path-match branch December 21, 2023 09:01
calyptobai added a commit that referenced this pull request Jan 17, 2024
We were making semantic queries with the full stringified repo ref.
Instead, we should have been constructing a semantic query using the
repository display name.

It seems that this was fixed coincidentally in #1190 via a condition
(which might now be possible to remove).
ggordonhall pushed a commit that referenced this pull request Jan 19, 2024
* Fix code search on local repos

We were making semantic queries with the full stringified repo ref.
Instead, we should have been constructing a semantic query using the
repository display name.

It seems that this was fixed coincidentally in #1190 via a condition
(which might now be possible to remove).

* Use indexed name in semantic query construction
calyptobai added a commit that referenced this pull request Jan 26, 2024
calyptobai added a commit that referenced this pull request Jan 30, 2024
* wip: multi-repo

wip: multirepo

wip: multirepo

Use RepoPath in more places

Minor things

Clippy & some small things

Resurrect explain endpoint

Add Project to link references

Clean up call sites to semantic search

Always search on display name

We can propagate multiple repos, so this shouldn't be needed

Scope fuzzy queries to a project

Use RepoRef in RepoPath

Use RepoPath for relative_path refs

Add back sqlx data

Add back branch filters to fuzzy matching

This is just awful

Eh, need correct json here

---------

Co-authored-by: rsdy <p@symmetree.dev>

* update answer prompt for multi-repo

* WIP: projects

* WIP: projects

Back-end API changes include:

- Addition of `projects` table
  - Studios now live *inside* a project
  - Ownership was moved from studios to projects
- New routes:
  - `GET /api/projects`: returns a list of:

    `[ { id: number, name: string, modified_at: date string } ]`

  - `POST /api/projects/`: takes in a body like:

    `{ name: string | null }`

    Note: if no name is provided, a default name, "New Project", is generated.

    This route returns the ID of the new project as a string body.

  - `GET /api/projects/:id`: if the project exists, returns:

    `{ name: string }`

  - `POST /api/projects/:id`: updates a project, takes a body of:

    `{ name: string }`

    Returns nothing

- Additionally, all `/api/studio/...` routes have been moved to
  `/projects/:id/studios/...`
  - Note: `studio` was changed to `studios`
  - Note: all routes remain otherwise unchanged
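The route semantics above can be modelled with a small in-memory store. This is a hypothetical sketch (the `ProjectStore` class and its method names are invented here), useful mainly for pinning down the defaulting and return-value rules: a null or missing `name` becomes "New Project", and creation returns the new ID as a string.

```python
import itertools
from datetime import datetime, timezone
from typing import Dict, List, Optional


class ProjectStore:
    """Hypothetical in-memory model of the /api/projects routes."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._projects: Dict[int, dict] = {}

    def create_project(self, body: dict) -> str:
        # POST /api/projects: `name` may be null or absent; default to "New Project".
        name = body.get("name")
        if name is None:
            name = "New Project"
        project_id = next(self._ids)
        self._projects[project_id] = {
            "id": project_id,
            "name": name,
            "modified_at": datetime.now(timezone.utc).isoformat(),
        }
        # The route responds with the new ID as a plain string body.
        return str(project_id)

    def list_projects(self) -> List[dict]:
        # GET /api/projects: [{id, name, modified_at}, ...]
        return [dict(p) for p in self._projects.values()]

    def get_project(self, project_id: int) -> Optional[dict]:
        # GET /api/projects/:id: only the name, or None if the project is missing.
        project = self._projects.get(project_id)
        return {"name": project["name"]} if project else None

    def update_project(self, project_id: int, body: dict) -> None:
        # POST /api/projects/:id: rename the project; the route returns nothing.
        project = self._projects[project_id]
        project["name"] = body["name"]
        project["modified_at"] = datetime.now(timezone.utc).isoformat()
```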

* WIP: Projects

* WIP: Projects

* WIP: Projects

* WIP: Projects - refactoring conversation threads for multi-project context

* WIP: Projects

As part of this change, we add some new routes for project repo
associations.

- `GET /projects/:id/repos` returns a list of:

  `[ { ref: string } ]`

- `POST /projects/:id/repos`: takes in a body like:

  `{ ref: string }`

  This adds the repo, identified by its repo ref, to the project's list of repos

- `DELETE /projects/:id/repos/:repo_ref` deletes the repo from the
  project repo list

* WIP: Projects

As part of this change, we now return a complete `Repo` object when
retrieving `GET /projects/:id/repos`.

Tooling for the agent was also adjusted.

* WIP: Projects

Here, we add `most_common_langs` to `GET /projects` and `GET /projects/:id`.

We also add routes for project to doc associations.

- `GET /projects/:id/docs` returns a list of:

  ```
  [
    {
      id: number,
      url: string,
      index_status: string,
      name: null | string,
      favicon: null | string,
      description: null | string,
      modified_at: date string,
    }
  ]
  ```

- `POST /projects/:id/docs`: takes in a body like:

  `{ doc_id: number }`

  This adds the doc, identified by its ID, to the project's list of docs

- `DELETE /projects/:id/docs/:doc_id` deletes the doc from the project doc
  list

* WIP: Projects

Add branches to `project_repo` associations

* WIP: Projects

Add constraints and foreign keys on new project models

* WIP: Projects

Amongst other patches, we introduce some API changes here.

- We move `/q` to `/projects/:id/q`:
  - This no longer takes a `repo_ref` argument. Now, this route will
    infer the related repositories based on repos associated with the
    requested project.
- We move `/search/path` to `/projects/:id/search/path`
- We add a new `GET /folder` route, which is like `/file` but retrieves
  directory data. Internally, this route makes an `open:`-style query.

* fix repo tantivy search (#1174)

* Fix conversation store/load

* Add `PUT /projects/:id/repos`, change `DELETE /projects/:id/repos/:id`

We add `PUT /projects/:id/repos`, which accepts an object:

```
{ "ref": repo ref, "branch": branch name or NULL }
```

Additionally, `DELETE /projects/:id/repos/:repo_ref` was changed to just
`DELETE /projects/:id/repos/:id`, where the `repo_ref` value was moved
to a JSON object in the request body:

```
{ "ref": repo ref }
```

* Avoid JSON body in `DELETE /projects/:id/repos`

Now, we use a query parameter to indicate the repo ref: `?ref=...`
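Repo refs contain `/` (the ref `github.com/BloopAI/bloop` below is an assumed example format), so a client has to percent-encode the ref before placing it in `?ref=...`. A minimal sketch using Python's standard `urllib.parse.urlencode`; `delete_repo_url` is a hypothetical helper, not part of the codebase:

```python
from urllib.parse import urlencode


def delete_repo_url(project_id: int, repo_ref: str) -> str:
    """Build the target of `DELETE /projects/:id/repos?ref=...` (hypothetical helper)."""
    # urlencode percent-encodes everything except letters, digits and `_.-~`
    # (spaces become `+`), so each '/' in the ref becomes %2F.
    query = urlencode({"ref": repo_ref})
    return f"/projects/{project_id}/repos?{query}"


print(delete_repo_url(7, "github.com/BloopAI/bloop"))
# /projects/7/repos?ref=github.com%2FBloopAI%2Fbloop
```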

* Filter repos in semantic search by query repos, if present

* Sync docs in the background, whether or not the stream still exists

* Return thread_id in conversation routes

- `GET /projects/:id/conversations/:id` now returns an object like:

  ```
  { thread_id: string, exchanges: [...] }
  ```

  Note that previously, this just returned the list of exchanges.

- `GET /projects/:id/conversations` now returns an additional field in
  each item:

  ```
  { thread_id: string, ...previous fields }
  ```
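Because `GET /projects/:id/conversations/:id` changed from returning a bare list to returning an object, a client that may talk to both older and newer servers could normalise the two shapes. A hypothetical sketch; `parse_conversation` is not part of the codebase:

```python
import json
from typing import Any, List, Optional, Tuple


def parse_conversation(body: str) -> Tuple[Optional[str], List[Any]]:
    """Return (thread_id, exchanges) from a conversation response body.

    Accepts the new `{thread_id, exchanges}` object and, as a hypothetical
    compatibility measure, the older bare list of exchanges.
    """
    data = json.loads(body)
    if isinstance(data, list):
        # Old shape: only exchanges, no thread ID available.
        return None, data
    return data["thread_id"], data["exchanges"]
```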

* Fix handling of `conversation_id` with `/answer`

* Fix routing for `DELETE /projects/:id/conversations/:id`

* Use `conversation_id` instead of `thread_id` in `GET /answer`

Rather than returning an initial JSON object, we introduce a new
`ChatEvent` type and return the conversation ID at the end of the
stream, once the conversation has been stored successfully.

* Return errors with debug formatting

* Fix more rebase errors

* Indexing status reporting improvements (#1192)

* repo index status reporting fixes

* report whether it is a resync in the index progress

* rework sync logic for docs (#1186)

* rework sync logic for docs

- replace `/sync` with `/enqueue`, a non-streaming replacement that adds
  items to the doc-sync queue
- introduce `/status` and `/cancel`, to stream updates for a syncing
  document or to cancel a sync job
- convert `/resync` from SSE to HTTP
- internal updates to `/list` to work with the new queue system
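The queue-based flow above (enqueue without streaming, check status, cancel) can be illustrated with an in-memory model. The status strings (`queued`, `indexing`, `done`, `cancelled`) and the `DocSyncQueue` class are assumptions; the real `/status` endpoint streams updates rather than returning a single value, and indexing runs in the background rather than in `run_next`.

```python
from collections import deque
from typing import Deque, Dict, Optional

PENDING = ("queued", "indexing")


class DocSyncQueue:
    """Hypothetical in-memory model of the doc-sync queue."""

    def __init__(self) -> None:
        self._queue: Deque[int] = deque()
        self._status: Dict[int, str] = {}

    def enqueue(self, doc_id: int) -> None:
        # /enqueue: non-streaming; record the job and return immediately.
        if self._status.get(doc_id) in PENDING:
            return  # already pending; don't queue it twice
        self._queue.append(doc_id)
        self._status[doc_id] = "queued"

    def cancel(self, doc_id: int) -> bool:
        # /cancel: only jobs that have not finished can be cancelled.
        if self._status.get(doc_id) not in PENDING:
            return False
        if doc_id in self._queue:
            self._queue.remove(doc_id)
        self._status[doc_id] = "cancelled"
        return True

    def status(self, doc_id: int) -> Optional[str]:
        # /status would stream these values; here we return the latest one.
        return self._status.get(doc_id)

    def run_next(self) -> Optional[int]:
        # Worker side: take the next queued job and mark it done.
        if not self._queue:
            return None
        doc_id = self._queue.popleft()
        self._status[doc_id] = "indexing"
        # ... fetch and index the document here ...
        self._status[doc_id] = "done"
        return doc_id
```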

* track metadata update in progress stream

* handle possible error state

* Fix tests

* Run cargo fmt

* Path search edits (#1200)

* add repo name to path tool answer, use `skim_fuzzy_path_match` instead of `fuzzy_path_match`, and use only repos from the project

* filter fuzzy path search by language and remove unused code paths

---------

Co-authored-by: rafael <22560219+rmuller-ml@users.noreply.github.com>

* Add repos to answer action prompt and step prompt (#1198)

* Add repos to answer action prompt and step prompt

* limit number of tokens for symbol classification

* tweak prompt text

---------

Co-authored-by: Gabriel Gordon-Hall <ggordonhall@gmail.com>

* Restrict queries on `/q` to only return results valid for a project (#1203)

We rewrite the parsed set of queries to restrict them such that they only return valid results.

* Restrict queries on `/q` to only return results valid for a project
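One way to read "restrict the parsed queries": an unscoped query is fanned out to every repo in the project, a query scoped to a project repo is kept, and a query scoped to any other repo is dropped. A minimal sketch under that assumption; the dict-based `Query` shape is hypothetical and much simpler than the real parsed query type.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical parsed-query shape, e.g. {"repo": "a", "target": "foo"}.
Query = Dict[str, Optional[str]]


def restrict_queries(queries: Iterable[Query], project_repos: List[str]) -> List[Query]:
    allowed = set(project_repos)
    out: List[Query] = []
    for q in queries:
        repo = q.get("repo")
        if repo is None:
            # Unscoped query: fan out to every repo in the project.
            out.extend({**q, "repo": r} for r in project_repos)
        elif repo in allowed:
            out.append(q)
        # A query scoped to a repo outside the project is dropped entirely.
    return out
```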

* Add project ID to autocomplete

* Fix repo autocomplete

* fix default for repo deduplication in semantic search

* Fix autocomplete for path and lang queries

* Send back context data in studio list route

* Fix code search on local repos (#1204)

* Fix code search on local repos

We were making semantic queries with the full stringified repo ref.
Instead, we should have been constructing a semantic query using the
repository display name.

It seems that this was fixed coincidentally in #1190 via a condition
(which might now be possible to remove).

* Use indexed name in semantic query construction

* Add fields to list studios (projects) (#1205)

* return token counter, doc_context and full context in list studios route

* make studio prompt multi-repo (#1208)

* save onboarding status on user profile (#1210)

* Anastasiia/autocomplete page size (#1211)

don't override `page_size` from `api_params`

* return token counts for studio snapshots (#1212)

* fix clippy (#1213)

* return None if parent commit does not exist (#1214)

* fix SQL query that retrieves a list of docs for project (#1216)

* Fix autocomplete repo match (#1218)

* make repo_name ; make autocomplete and folder queries case insensitive; remove repo.display_name()

* use stringified repo_ref in prompt

* bump version to 0.6.0

* fix blocking status endpoint (#1217)

the status reporting endpoint can drop the lock over the tantivy index
once it has a handle to the progress stream.
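The pattern behind this fix, sketched in Python rather than the Rust server code: hold the lock only long enough to obtain a handle to the progress channel, then release it before streaming, so that writers to the index are not blocked while a client listens. The `Indexes` class and the `"done"` sentinel are hypothetical.

```python
import queue
import threading
from typing import Iterator


class Indexes:
    """Hypothetical stand-in for an index guarded by a lock."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.progress: "queue.Queue[str]" = queue.Queue()

    def status_stream(self) -> Iterator[str]:
        # Hold the lock only while grabbing a handle to the progress channel...
        with self.lock:
            progress = self.progress
        # ...then stream without the lock, so other work is not blocked.
        while True:
            event = progress.get()
            if event == "done":
                return
            yield event
```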

* Migrate existing databases to new project schema

* Fix lints

* app redesign

* disable right click in prod mode

* add files to code studio

* project-dependent autocomplete

* some fixes

* add selection hint

* studio conversation

* studio conversation design fixes

* studio navigation using left sidebar

* indexing docs, add docs to projects

* some fixes

* add docs to studios

* polishing studios

* usage popover

* add tutorial cards

* search docs

* fixes

* quick fix

* minor fixes

* fix apply diff

* feedback fixes

* feedback fixes 2

* feedback fixes 3

* dedupe studios, pass page_size to autocomplete

* studio response loading state

* studio history

* add repo to project when it finishes indexing

* ensure there are no duplicates in file search, fix error handling in chat

* testing fixes

* add missing translations, clean up locales files

* doc fixes

* improve mouse handling in arrow navigation in command bar

* testing fixes

* rework arrow navigation

* rework arrow navigation, use context

* add arrow navigation to all dropdowns

* minor fixes

* improve header wrapping when adding to studio

---------

Co-authored-by: Gabriel Gordon-Hall <ggordonhall@gmail.com>
Co-authored-by: rsdy <p@symmetree.dev>
Co-authored-by: calyptobai <calyptobai@gmail.com>
Co-authored-by: akshay <nerdy@peppe.rs>
Co-authored-by: rafael <22560219+rmuller-ml@users.noreply.github.com>
Co-authored-by: calyptobai <111788964+calyptobai@users.noreply.github.com>