
[Bug]: Need to reduce parallel requests for health check based on model provider (i.e. ollama) #5816

Open
shuther opened this issue Sep 21, 2024 · 0 comments
Labels: bug Something isn't working

shuther commented Sep 21, 2024

What happened?

When using ollama, the health check returns an error even though the endpoint is fine. This is because the checks are run in parallel, and with too many models they time out (ollama loads models serially).
The proposal is to limit the number of parallel requests, ideally per provider (for ollama 1 or 2 should be the maximum; for openai maybe 3 or 4, so health checks don't eat into production rate limits and the proxy is a better corporate citizen). I propose changing the code below, but you may want to add an env var somewhere to capture the limits (see the sketch after the snippet below).
A Semaphore or a Queue could be used to bound the work, e.g.:
file: litellm/proxy/health_check.py

import asyncio
from typing import Optional

import litellm


async def _perform_health_check(model_list: list, details: Optional[bool] = True, max_concurrent_tasks: int = 2):
    """
    Perform a health check for each model in the list.
    Limit the concurrency of the async calls.
    """
    # A bounded semaphore caps how many health checks run at the same time.
    semaphore = asyncio.BoundedSemaphore(max_concurrent_tasks)

    async def sem_task(model):
        async with semaphore:
            litellm_params = model["litellm_params"]
            model_info = model.get("model_info", {})
            litellm_params["messages"] = _get_random_llm_message()
            mode = model_info.get("mode", None)
            return await litellm.ahealth_check(
                litellm_params,
                mode=mode,
                prompt="test from litellm",
                input=["test from litellm"],
            )

    # Launch all checks; the semaphore throttles how many run concurrently.
    tasks = [sem_task(model) for model in model_list]
    results = await asyncio.gather(*tasks)

    healthy_endpoints = []
    unhealthy_endpoints = []
...
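
For the per-provider limits, something along these lines could work. This is only a rough sketch: the env var names (e.g. HEALTH_CHECK_MAX_PARALLEL_OLLAMA), the default values, and the way the provider is derived from the model name are all hypothetical, not existing litellm settings.

import asyncio
import os

# Hypothetical defaults; conservative values per provider.
DEFAULT_HEALTH_CHECK_CONCURRENCY = {
    "ollama": 1,   # ollama loads models serially, so keep this very low
    "openai": 4,   # stay well under production rate limits
}


def _get_health_check_concurrency(provider: str) -> int:
    # Hypothetical env var, e.g. HEALTH_CHECK_MAX_PARALLEL_OLLAMA=2
    env_key = f"HEALTH_CHECK_MAX_PARALLEL_{provider.upper()}"
    default = DEFAULT_HEALTH_CHECK_CONCURRENCY.get(provider, 2)
    return int(os.getenv(env_key, default))


def _build_semaphores(model_list: list) -> dict:
    # One bounded semaphore per provider, so a slow provider (ollama)
    # does not hold back checks against faster ones.
    semaphores: dict = {}
    for model in model_list:
        # Assumes the provider can be read off the model prefix, e.g. "ollama/llama3".
        model_name = model.get("litellm_params", {}).get("model", "")
        provider = model_name.split("/")[0] if "/" in model_name else "default"
        if provider not in semaphores:
            semaphores[provider] = asyncio.BoundedSemaphore(
                _get_health_check_concurrency(provider)
            )
    return semaphores

With this, sem_task would acquire the semaphore for its model's provider instead of a single global one.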

Relevant log output

Error from the health checks (it only starts appearing after a few models have been tested).
Note that the error text gets truncated; we may also want to return only the exception type (e.g. Timeout), not the full traceback, in the /health endpoint (a small sketch follows the log output):

"error": "error:litellm.APIConnectionError: \nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.11/site-packages/litellm/main.py\", line 427, in acompletion\n    response = await init_response\n               ^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.11/site-packages/litellm/llms/ollama.py\", line 495, in ollama_acompletion\n    raise e  # don't use verbose_logger.exception, if exception is raised\n    ^^^^^^^\n  File \"/usr/local/lib/python3.11/site-packages/litellm/llms/ollama.py\", line 436, in ollama_acompletion\n    resp = await session.post(url, json=data)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.11/site-packages/aiohttp/client.py\", line 684, in _request\n    await resp.start(conn)\n  File \"/usr/local/lib/python3.11/site-packages/aiohttp/client_reqrep.py\", line 994, in start\n    with self._timer:\n  File \"/usr/local/lib/python3.11/site-packages/aiohttp/helpers.py\", line 713, in __exit__\n    raise asyncio.TimeoutError from None\nTimeoutError\n. Missing `mode`. Set the `mode` for the model - https://docs.litellm.ai/docs/proxy/health#embedding-models  \nstacktrace: Traceback (most recent call last):\n  File \"/usr/local/lib/python3.11/site-packages/litellm/main.py\", line 427, in acompletion\n    response = await init_response\n               ^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.11/site-packages/litellm/llms/ollama.py\", line 495, in ollama_acompletion\n    raise e  # don't use verbose_logger.exception, if exception is raised\n    ^^^^^^^\n  File \"/usr/local/lib/python3.11/site-packages/litellm/llms/ollama.py\", line 436, in ollama_acompletion\n    resp = await session.post(url, json=data)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.11/site-packages/aiohttp/client.py\", line 684, in _request\n    await resp.start(conn)\n  File \"/usr/local/lib/python3.11/site-packages/aiohttp/client_reqrep.py\", line 994, in start\n    with self._timer:\n  File \"/usr/local/lib/python3.11/site-packages/aiohttp/helpers.py\", line 713, in __exit__\n    raise asyncio.TimeoutError from None\nTimeoutError\n\nDuring handling of the above exceptio"
    },
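
For returning only the exception type instead of the full traceback, a minimal illustration (not the existing /health response code, and _short_error is a hypothetical helper) could look like this:

import asyncio


def _short_error(exc: Exception) -> str:
    # Report just the exception class name, e.g. "TimeoutError",
    # instead of the full stack trace.
    return type(exc).__name__


try:
    raise asyncio.TimeoutError()
except Exception as e:
    print(_short_error(e))  # -> "TimeoutError"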

Twitter / LinkedIn details

No response

shuther added the bug (Something isn't working) label on Sep 21, 2024