[Arch Refactor for Agent Runtime]: Deprecating SSH-based communication and use EventStream #2404

xingyaoww · 2024-06-12T05:17:35Z

What problem or use case are you trying to solve?

Right now, the backend mainly relies on SSH to communicate with the sandbox, which does not fit well with our current Event Stream-based communication. Requiring the backend to know about SSH makes it much harder to support different runtimes/docker images (#1387; e.g., we need to automatically install sshd if users bring their own docker images without it, and this gets tricky quickly since sshd may need to be installed differently across Linux distributions) and to create a hosted version (#1086).

Describe the UX of the solution you'd like

I imagine the next step in our architecture is to support arbitrary docker images as the sandbox/runtime by creating one piece of software called od-runtime-client and automatically installing it into the user-provided sandbox (if it isn't installed already; this is partially done in #2101).

Then, at the entry point of each docker sandbox, od-runtime-client will be started:

Do you have thoughts on the technical implementation?

Due to the diversity of user-provided docker images we might need to support, I propose we use a package manager like miniforge, which is already multi-platform, to maintain the environment and dependencies of od-runtime-client (miniforge can install different Python versions, and even maintains its own glibc version to circumvent some system-level restrictions, e.g., a glibc version that is too old). So the workflow for a user-provided docker sandbox would be (essentially what we implemented in #2101):

1. Detect whether the user-provided image already comes with od-runtime-client installed.
2. If not, create a temporary Dockerfile with FROM user-provided-docker, then build that image with the suffix _od.
3. Use ${SANDBOX_CONTAINER_IMAGE}_od to start the sandbox, and assume every dependency is met.
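The three steps above can be sketched roughly as follows. This is only an illustration, not the code from #2101: the function names, the `has_client` callback, and the `install_steps` list are all hypothetical, and the actual detection and install commands are left to the caller.

```python
from typing import Callable, List, Optional, Tuple


def build_dockerfile(base_image: str, install_steps: List[str]) -> str:
    """Render a temporary Dockerfile that layers install steps on the user image."""
    lines = [f"FROM {base_image}"]
    # Each install step (hypothetical, supplied by the caller) becomes one RUN layer.
    lines += [f"RUN {step}" for step in install_steps]
    return "\n".join(lines) + "\n"


def resolve_sandbox_image(
    base_image: str,
    has_client: Callable[[str], bool],
    install_steps: List[str],
) -> Tuple[str, Optional[str]]:
    """Return (image to start, Dockerfile to build first or None).

    has_client is an assumed callback that reports whether od-runtime-client
    is already present in the image; how to detect that is not specified here.
    """
    if has_client(base_image):
        # Step 1: the image is usable as-is, nothing to build.
        return base_image, None
    # Steps 2-3: build a derived image tagged with the _od suffix and use it.
    return f"{base_image}_od", build_dockerfile(base_image, install_steps)
```

The caller would pass the returned Dockerfile text to whatever build mechanism it uses (for example `docker build`), then start the sandbox from the returned image name.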

Describe alternatives you've considered

Additional context


iFurySt · 2024-06-12T06:10:30Z

The design is great.

If I understand you correctly, the od-runtime-client will install all needed packages/plugins to the target sandbox to smoothly execute all tasks received from the OD server through the WebSocket connection.

I have a few worries about the dependencies, package installation, and plugin management.

we might also consider merging all the existing plugins into od-runtime-client

Will the od-runtime-client grow significantly over time, or does it only consist of basic plugins and it can pull others from the remote?

BTW, how do we define the plugins? Is it possible to let the users define their own plugins?

xingyaoww · 2024-06-12T07:20:47Z

Will the od-runtime-client grow significantly over time, or does it only consist of basic plugins and it can pull others from the remote?

This is a very good question! For simplicity, we can just pack everything in at the beginning for now (and maybe keep the setup.sh-based workflow for our plugins).

BTW, how do we define the plugins? Is it possible to let the users define their own plugins?

Users can already define their own plugins via setup.sh.

But in the long term, we should consider making plugin management very easy, so that users can easily define their own plugins. I have a preliminary thought about using a Python package manager like pip for plugin management: users bring their own pip package to the system, and we basically just need a pip install to install it.

Besides this, I'm also debating the difference between plugins and agentskills:

I view plugins more as heavy OD-specific dependencies that are required to support certain things (e.g., if we have a VSCode frontend, we may have a vscode plugin that needs to be activated inside od-runtime-client so it can talk directly to vscode for code editing actions). This should/can be transparent to end users who just want to use OD to solve tasks.
agentskills is more like a custom library of tools to which users can easily add new capabilities, whether to support a new modality (e.g., transcribe_speech_to_text) or a customized workflow (e.g., lookup_my_calendar). Users should be aware of this, and can add new actions to it when needed. I'm thinking maybe we should focus more on lowering the entry barrier for users here?
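To make the "custom library of tools" idea concrete, here is a minimal sketch of how a user-extensible skill registry could look. It is not the existing agentskills API: `AGENT_SKILLS`, `agent_skill`, and `call_skill` are hypothetical names, and the body of `lookup_my_calendar` is a placeholder.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping a skill name to the function implementing it.
AGENT_SKILLS: Dict[str, Callable[..., Any]] = {}


def agent_skill(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register a plain Python function as a skill under its own name."""
    AGENT_SKILLS[func.__name__] = func
    return func


@agent_skill
def lookup_my_calendar(date: str) -> str:
    """Placeholder skill; a real one would query the user's calendar."""
    return f"No events found for {date}"


def call_skill(name: str, *args: Any, **kwargs: Any) -> Any:
    """Look up a registered skill by name and invoke it."""
    if name not in AGENT_SKILLS:
        raise KeyError(f"unknown skill: {name}")
    return AGENT_SKILLS[name](*args, **kwargs)
```

With something like this, adding a new capability would only require writing one decorated function, which is the kind of low entry barrier discussed above.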

yufansong · 2024-06-17T16:19:20Z

I've been working on this recently and will open a draft PR once the basic framework is finished.

rbren · 2024-06-18T13:55:16Z

Very excited for this! I have two basic concerns here though:

How will we manage state without an SSH session (e.g. environment variables and cwd)?
The browser installation is very tricky, and will be hard to do with a BYOImage. My guess is we'll have the most luck using a separate browser container

Re: plugins, IMO we should see if we can remove this entirely. It will be hard to create plugins that support every possible container OS, and create maintenance headaches. What are we currently using plugins for at this point?

yufansong · 2024-06-18T14:11:15Z

Hi @rbren , my basic plan:

1. Implement an od-runtime-client (runs inside the container)
1.1 initialization work (including plugins and agentskills)
1.2 open a websocket to receive commands to execute
1.3 use pexpect to maintain a bash session, so it can be stateful (this solves the environment variables and cwd problem)
2. Implement a websocket sandbox to communicate with the websocket in od-runtime-client

Do you have any suggestions?

How will we manage state without an SSH session (e.g. environment variables and cwd)?

It should be solved by pexpect.
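As a rough illustration of why one long-lived bash process keeps state, here is a sketch that uses only the standard library's `subprocess` as a stand-in for pexpect (so it is not the pexpect-based implementation itself). The `StatefulShell` class and the marker protocol are assumptions made for this example.

```python
import subprocess
import uuid
from typing import Tuple


class StatefulShell:
    """Keep one bash process alive so cwd and env vars persist across commands."""

    def __init__(self) -> None:
        self.proc = subprocess.Popen(
            ["bash", "--noprofile", "--norc"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            bufsize=1,
        )

    def run(self, command: str) -> Tuple[int, str]:
        """Run a command in the shared session; return (exit code, output)."""
        # A unique marker line tells us where this command's output ends
        # and carries the exit status of the command.
        marker = f"__DONE_{uuid.uuid4().hex}__"
        self.proc.stdin.write(f"{command}\nprintf '\\n%s %d\\n' {marker} $?\n")
        self.proc.stdin.flush()
        lines = []
        while True:
            line = self.proc.stdout.readline()
            if line == "":
                raise RuntimeError("shell exited unexpectedly")
            if line.startswith(marker):
                exit_code = int(line.split()[1])
                break
            lines.append(line)
        return exit_code, "".join(lines).strip()

    def close(self) -> None:
        """Close stdin so bash exits, then wait for it."""
        self.proc.stdin.close()
        self.proc.wait()
```

Because every command goes to the same process, a `cd` or `export` in one call is still in effect for the next one. This sketch does not handle commands that read from stdin or that never terminate; pexpect's pattern matching and timeouts are better suited to those cases.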

The browser installation is very tricky, and will be hard to do with a BYOImage

I have not investigated the technical details yet. My basic idea is to try to do it during od-runtime-client initialization. I still need more time to check our current browsing code.

What are we currently using plugins for at this point?

In the code, only AgentSkillsRequirement() and JupyterRequirement() are defined.

rbren · 2024-06-18T14:28:16Z

👍 SGTM! Sounds like we can get a lot out of pexpect

The Jupyter requirement is probably the hardest thing to figure out. I don't know that there's a better way--we probably do need python installed in every sandbox container.

xingyaoww · 2024-06-18T16:33:48Z

@rbren

It will be hard to create plugins that support every possible container OS, and create maintenance headaches.

I think that's exactly the benefit of using miniforge3 to implement a stable environment across different sandboxes: https://github.com/conda-forge/miniforge has already spent a ton of effort on creating isolated Python environments across different OSes. od-runtime-client can rely on miniforge3 to do all the heavy lifting, and we can rely on its stable interface for environment and package management.

The browser installation is very tricky, and will be hard to do with a BYOImage.

I think it is possible to install playwright using conda / miniforge3: https://anaconda.org/microsoft/playwright

So I'm not too worried?

tobitege · 2024-06-18T17:01:44Z

Re: plugins, IMO we should see if we can remove this entirely. It will be hard to create plugins that support every possible container OS, and create maintenance headaches. What are we currently using plugins for at this point?

Why remove plugins as such? Do you have specific alternative ways in mind?
Technically, I like the physical separation of their implementation from the core, even though the actual integration has potential for several enhancements.

xingyaoww · 2024-08-08T02:13:44Z

The last PR is merged! Farewell ServerRuntime!

xingyaoww added enhancement New feature or request architecture Related to architecture, including frontend and backend labels Jun 12, 2024

li-boxuan mentioned this issue Jun 12, 2024

CodeActAgent: Only delegate to BrowsingAgent as last resort #2326

Closed

xingyaoww added this to the 2024-07 milestone Jun 12, 2024

SmartManoj changed the title ~~[Arch Refractor for Agent Runtime]: Deprecating SSH-based communication and use EventStream~~ [Arch Refactor for Agent Runtime]: Deprecating SSH-based communication and use EventStream Jun 17, 2024

neubig modified the milestones: 2024-07, 2024-06 Jun 17, 2024

neubig assigned xingyaoww Jun 17, 2024

xingyaoww mentioned this issue Jun 18, 2024

Add Aider-inspired RepoMap #2248

Closed

5 tasks

xingyaoww mentioned this issue Jun 20, 2024

Streamline Logging Events #2532

Merged

yufansong mentioned this issue Jun 23, 2024

Add websocket runtime and od-client-runtime #2603

Merged

20 tasks

yufansong self-assigned this Jun 24, 2024

This was referenced Jul 4, 2024

Refactor SSH logic #2186

Closed

[Arch] Removing docker exec box #2802

Merged

[Arch] Remove supports for Background Commands #2803

Merged

[Bug]: Gets stuck while trying to activate virtual environment #2799

Open

neubig modified the milestones: 2024-06, 2024-07 Jul 5, 2024

mamoodi added the large effort Estimated large effort label Jul 6, 2024

xingyaoww mentioned this issue Jul 8, 2024

[WIP] Refractor: Combine Sandbox with Runtime #2856

Closed

yufansong mentioned this issue Jul 11, 2024

Feat: Add fast boot #2808

Closed

SmartManoj mentioned this issue Jul 11, 2024

Process interactive commands and stream output in logs #2059

Closed

This was referenced Jul 11, 2024

[Arch] EventStreamRuntime supports browser #2899

Merged

[Arch] Add tests for EventStreamRuntime and fix bash parsing #2933

Merged

This was referenced Jul 19, 2024

[Runtime] Mega-issue to track all issues related to bash Interactive terminal #3031

Open

Migrate multi-line-bash-related sandbox tests into runtime tests and fix multi-line issue #3128

Merged

xingyaoww closed this as completed Aug 8, 2024
