[Arch Refactor for Agent Runtime]: Deprecating SSH-based communication and use EventStream #2404

Closed
11 of 12 tasks
xingyaoww opened this issue Jun 12, 2024 · 9 comments
Labels
architecture Related to architecture, including frontend and backend enhancement New feature or request large effort Estimated large effort

@xingyaoww
Contributor

xingyaoww commented Jun 12, 2024

What problem or use case are you trying to solve?

Right now, the backend mainly relies on SSH to communicate with the sandbox, which does not fit well with our current Event Stream-based communication. Requiring the backend to know about SSH makes it much harder to support different runtimes/Docker images (#1387; e.g., we need to automatically install sshd if users bring their own Docker images without it, and this gets tricky quickly since sshd may need to be installed differently across Linux distributions) and to create a hosted version (#1086).

Describe the UX of the solution you'd like

I imagine the next step in our architecture is to support arbitrary Docker images as the sandbox/runtime by creating a piece of software called od-runtime-client and automatically installing it into the user-provided sandbox (if it isn't installed already; this is partially done in #2101).

Then, at the entry point of each docker sandbox, od-runtime-client will be started:

Do you have thoughts on the technical implementation?

Due to the diversity of user-provided Docker images we might need to support, I propose we use a package manager like miniforge, which is already multi-platform, to maintain the environment and dependencies of od-runtime-client (miniforge can install different Python versions, and even maintains its own glibc version to circumvent some system-level restrictions, e.g., a glibc version that is too old). So the workflow for a user-provided Docker sandbox would be (essentially what we implemented in #2101):

  1. Detect whether the user-provided image already comes with od-runtime-client installed.
  2. If not, create a temporary Dockerfile with FROM user-provided-docker, then build that image with the suffix _od.
  3. Then use ${SANDBOX_CONTAINER_IMAGE}_od to start the sandbox, and assume every dependency is met.
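
The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from #2101: the function names, the placeholder install step in the Dockerfile, and the way detection is injected as a callable (`has_client`) are all hypothetical, since the issue does not say how detection or installation is actually done.

```python
import subprocess
from typing import Callable

OD_SUFFIX = "_od"  # suffix named in step 2


def od_image_name(base_image: str) -> str:
    """Name of the derived image, i.e. ${SANDBOX_CONTAINER_IMAGE}_od."""
    return f"{base_image}{OD_SUFFIX}"


def render_dockerfile(base_image: str) -> str:
    """Temporary Dockerfile that starts FROM the user-provided image.

    The RUN line is a placeholder (assumption): the real steps would
    install miniforge and od-runtime-client.
    """
    return (
        f"FROM {base_image}\n"
        "RUN echo 'install od-runtime-client here (placeholder)'\n"
    )


def docker_build(tag: str, dockerfile: str) -> None:
    """Build an image from a Dockerfile passed on stdin (no build context)."""
    subprocess.run(
        ["docker", "build", "-t", tag, "-"],
        input=dockerfile,
        text=True,
        check=True,
    )


def resolve_sandbox_image(
    base_image: str,
    has_client: Callable[[str], bool],
    build: Callable[[str, str], None] = docker_build,
) -> str:
    """Return the image to start the sandbox with, building it if needed."""
    if has_client(base_image):  # step 1: client already installed
        return base_image
    target = od_image_name(base_image)
    build(target, render_dockerfile(base_image))  # step 2: build <image>_od
    return target  # step 3: start the sandbox from this image
```

Passing `has_client` and `build` as parameters keeps the decision logic testable without a Docker daemon; a real caller would supply whatever detection check the project settles on.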

Describe alternatives you've considered

Additional context

@xingyaoww xingyaoww added enhancement New feature or request architecture Related to architecture, including frontend and backend labels Jun 12, 2024
@xingyaoww xingyaoww added this to the 2024-07 milestone Jun 12, 2024
@iFurySt
Collaborator

iFurySt commented Jun 12, 2024

The design is great.

If I understand you correctly, the od-runtime-client will install all needed packages/plugins to the target sandbox to smoothly execute all tasks received from the OD server through the WebSocket connection.

I have a few worries about the dependencies, package installation, and plugin management.

we might also consider merging all the existing plugins into od-runtime-client

Will the od-runtime-client grow significantly over time, or does it only consist of basic plugins and it can pull others from the remote?

BTW, how do we define the plugins? Is it possible to let the users define their own plugins?

@xingyaoww
Contributor Author

Will the od-runtime-client grow significantly over time, or does it only consist of basic plugins and it can pull others from the remote?

This is a very good question! For simplicity now, we can just say we want to pack everything at the beginning (maybe keep the "setup.sh" based workflow for our plugins now).

BTW, how do we define the plugins? Is it possible to let the users define their own plugins?

Users can already define their own plugins via setup.sh.

But in the long term, we should consider making plugin management very easy, so that users can easily define their own plugins. I have a preliminary thought about using a Python package manager like pip for plugin management -- users can bring their own pip packages to the system, and we basically just need a pip install to install them.

Besides this, I'm also debating the difference between plugins and agentskills:

  • I view plugins more as heavy, OD-specific dependencies that are required to support certain things (e.g., if we have a VSCode frontend, we may have a vscode plugin that needs to be activated inside od-runtime-client so it can talk directly to VSCode for code editing actions). This should/can be transparent to end users who just want to use OD to solve tasks.

  • agentskills is more like a custom library of tools to which users can easily add new capabilities, whether to support a new modality (e.g., transcribe_speech_to_text) or a customized workflow (e.g., lookup_my_calendar). Users should be aware of this, and can add new actions to it when needed. I'm thinking maybe we should focus more on lowering the entry barrier for users here?

@SmartManoj SmartManoj changed the title [Arch Refractor for Agent Runtime]: Deprecating SSH-based communication and use EventStream [Arch Refactor for Agent Runtime]: Deprecating SSH-based communication and use EventStream Jun 17, 2024
@neubig neubig modified the milestones: 2024-07, 2024-06 Jun 17, 2024
@yufansong
Collaborator

I've been working on this recently and will open a draft PR once the basic framework is finished.

@rbren
Collaborator

rbren commented Jun 18, 2024

Very excited for this! I have two basic concerns here though:

  • How will we manage state without an SSH session (e.g. environment variables and cwd)?
  • The browser installation is very tricky, and will be hard to do with a BYOImage. My guess is we'll have the most luck using a separate browser container

Re: plugins, IMO we should see if we can remove this entirely. It will be hard to create plugins that support every possible container OS, and create maintenance headaches. What are we currently using plugins for at this point?

@yufansong
Collaborator

yufansong commented Jun 18, 2024

Hi @rbren , my basic plan:

  1. Implement an od-runtime-client (runs inside the container)
    1.1 initialization work (including plugins and agentskills)
    1.2 open a websocket to receive commands to execute
    1.3 use pexpect to maintain a bash session so it can be stateful (solves the environment variables and cwd problem)
  2. Implement a websocket sandbox to communicate with the websocket in od-runtime-client

Do you have any suggestions?

How will we manage state without an SSH session (e.g. environment variables and cwd)?

It should be solved by pexpect.
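
The reason a long-lived bash session solves the state problem can be shown with a small sketch. Note the substitution: the plan above names pexpect, while this example uses only the standard-library `subprocess` module to stay dependency-free. The idea is the same, one bash process kept alive across commands so that cwd and environment variables persist. The class name `BashSession` and the marker protocol are assumptions made for illustration.

```python
import subprocess
import uuid
from typing import Tuple


class BashSession:
    """One long-lived bash process; state (cwd, env vars) persists across run()."""

    def __init__(self) -> None:
        self._proc = subprocess.Popen(
            ["bash"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            text=True,
            bufsize=1,
        )

    def run(self, command: str) -> Tuple[int, str]:
        """Run a single-line command; return (exit_code, combined output)."""
        marker = f"__DONE_{uuid.uuid4().hex}__"
        # After the command, print a newline, the unique marker, and $?.
        self._proc.stdin.write(f"{command}\nprintf '\\n{marker} %s\\n' \"$?\"\n")
        self._proc.stdin.flush()
        lines = []
        while True:
            line = self._proc.stdout.readline()
            if line == "":
                raise RuntimeError("bash exited before the command finished")
            if line.startswith(marker):
                exit_code = int(line.split()[1])
                break
            lines.append(line)
        output = "".join(lines)
        # Drop the single newline that printf added in front of the marker.
        if output.endswith("\n"):
            output = output[:-1]
        return exit_code, output

    def close(self) -> None:
        self._proc.stdin.close()
        self._proc.wait()
```

Usage: `session.run("cd / && export OD_DEMO=1")` followed by `session.run('echo "$PWD $OD_DEMO"')` sees both the new directory and the variable, because both commands ran in the same shell. pexpect would add pseudo-terminal handling and timeouts on top of this basic pattern.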

The browser installation is very tricky, and will be hard to do with a BYOImage

I have not investigated the technical details yet. My basic idea is to try to do it during od-runtime-client initialization. I still need more time to check our current browsing code.

What are we currently using plugins for at this point?

In the code definition, only AgentSkillsRequirement() and JupyterRequirement()

@rbren
Collaborator

rbren commented Jun 18, 2024

👍 SGTM! Sounds like we can get a lot out of pexpect

The Jupyter requirement is probably the hardest thing to figure out. I don't know that there's a better way--we probably do need python installed in every sandbox container.

@xingyaoww
Contributor Author

xingyaoww commented Jun 18, 2024

@rbren

It will be hard to create plugins that support every possible container OS, and create maintenance headaches.

I think that's exactly the benefit of using miniforge3 to implement a stable environment across different sandboxes: https://github.com/conda-forge/miniforge has already spent a ton of effort on creating isolated Python environments across different OSes, so od-runtime-client can rely on miniforge3 to do all the heavy lifting, and we can rely on the stable interface of miniforge3 for environment & package management.

The browser installation is very tricky, and will be hard to do with a BYOImage.

I think it is possible to install playwright using conda / miniforge3: https://anaconda.org/microsoft/playwright

So I'm not too worried?

@tobitege
Collaborator

tobitege commented Jun 18, 2024

Re: plugins, IMO we should see if we can remove this entirely. It will be hard to create plugins that support every possible container OS, and create maintenance headaches. What are we currently using plugins for at this point?

Why remove plugins as such? Do you have specific alternative ways in mind?
Technically, I like the physical separation of their implementation from the core, even though the actual integration has potential for several enhancements.

@xingyaoww
Contributor Author

The last PR is merged! Farewell ServerRuntime!


7 participants