You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I’ve been using vLLM to successfully run my local language model. Currently, I’m using AutoGen to connect to my vLLM server, define tools, run them, and everything works smoothly in offline mode, including calling the appropriate functions.
When using the standard vLLM, serving was straightforward with the vllm serve command. However, with AutoGen, since it’s just a basic script, I can’t serve it the same way as before. I now have to implement an additional layer using FastAPI or something similar on top of the vLLM server. This is a complex task and involves a lot of effort. Handling load balancing will also be a pain.
In short, is there a way to use AutoGen in an online setting? Perhaps a smart method to serve it, or a way to connect it back to the vLLM endpoint and continue interacting as before?
I’m curious to know what the standard approach is.
Many thanks.
Describe the solution you'd like
Maybe an autogen server?
The text was updated successfully, but these errors were encountered:
Hello,
I’ve been using vLLM to successfully run my local language model. Currently, I’m using AutoGen to connect to my vLLM server, define tools, run them, and everything works smoothly in offline mode, including calling the appropriate functions.
When using the standard vLLM, serving was straightforward with the vllm serve command. However, with AutoGen, since it’s just a basic script, I can’t serve it the same way as before. I now have to implement an additional layer using FastAPI or something similar on top of the vLLM server. This is a complex task and involves a lot of effort. Handling load balancing will also be a pain.
In short, is there a way to use AutoGen in an online setting? Perhaps a smart method to serve it, or a way to connect it back to the vLLM endpoint and continue interacting as before?
I’m curious to know what the standard approach is.
Many thanks.
Describe the solution you'd like
Maybe an autogen server?
The text was updated successfully, but these errors were encountered: