Setting up an open-source language model on Azure with API access

I’ve been running open-source language models like Gemma 3 4B locally with LM Studio and Ollama. Now I want to move one of these models to Azure and set up an API endpoint for it. I looked in Azure Machine Learning Studio but couldn’t find the model I want.

Does anyone know how to do this? I’m hoping for a serverless setup if possible. What’s the best way to get an open-source LLM running on Azure with an API I can use?

Any tips or step-by-step guides would be super helpful. I’m new to cloud deployment so I’m not sure where to start. Thanks!

Hey, I’ve done this before! You can try Azure Container Apps. It’s kind of like serverless, but for containers: package your model in a Docker container, push it to Azure Container Registry, then deploy it to Container Apps. You can set up autoscaling too. For the API, put Azure API Management in front of it to create a managed endpoint. It’s not too hard once you get the hang of it. Good luck!
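For reference, a rough sketch of that flow with the Azure CLI might look like this. All resource names (`myregistry`, `my-rg`, `my-llm-app`, etc.) are placeholders, and it assumes you already have a Docker image that serves the model over HTTP (for example an Ollama-based image listening on port 11434):

```shell
# Log in to your Azure Container Registry, then build and push the image.
# (Assumes Azure CLI + Docker are installed and you are logged in with `az login`.)
az acr login --name myregistry
docker build -t myregistry.azurecr.io/gemma-llm:v1 .
docker push myregistry.azurecr.io/gemma-llm:v1

# Create a Container Apps environment, then deploy with external ingress.
az containerapp env create --name my-llm-env --resource-group my-rg --location eastus

az containerapp create \
  --name my-llm-app \
  --resource-group my-rg \
  --environment my-llm-env \
  --image myregistry.azurecr.io/gemma-llm:v1 \
  --target-port 11434 \
  --ingress external \
  --cpu 2 --memory 4Gi \
  --min-replicas 0 --max-replicas 2
```

`--min-replicas 0` lets the app scale to zero when idle, which is what makes this feel serverless cost-wise; the CPU/memory figures are a guess for a small model and will likely need tuning.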

Deploying an open-source language model on Azure is certainly achievable, though it requires more manual configuration than the pre-built services. In my experience, containerizing the model with Docker and running it on Azure Container Instances or Kubernetes gives you the necessary flexibility. Once the container is running, Azure API Management can expose it as an API endpoint. You could also use Azure Functions for a serverless experience, but be mindful of cold-start delays and memory limits with larger models. Overall, this approach gives you full control over the deployment.
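Once API Management fronts the container, calling the endpoint is a plain HTTP request. A sketch with curl, where the URL and route are hypothetical and the backend is assumed to expose an Ollama-style `/api/generate` route:

```shell
# $APIM_KEY holds the API Management subscription key for your product/API.
# Ocp-Apim-Subscription-Key is the standard APIM header for passing that key.
curl -s https://my-apim.azure-api.net/llm/api/generate \
  -H "Content-Type: application/json" \
  -H "Ocp-Apim-Subscription-Key: $APIM_KEY" \
  -d '{"model": "gemma3:4b", "prompt": "Hello", "stream": false}'
```

Requiring a subscription key like this is also your first layer of access control if the endpoint is public.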

Having gone through a similar process recently, I can share some insights. Azure Machine Learning Studio is great for pre-built models, but for open-source LLMs, you’ll need a different approach. I found success using Azure Kubernetes Service (AKS) for hosting the model. It provides the scalability and flexibility needed for LLMs.

First, I containerized the model using Docker, then deployed it to AKS. This step requires some familiarity with containerization and Kubernetes, but there are plenty of tutorials available. Once deployed, I set up an Azure API Management instance to create a RESTful API endpoint for the model.
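A minimal sketch of that AKS path with the Azure CLI and kubectl, assuming the image has already been pushed to an Azure Container Registry. All names are placeholders, and the GPU node size is just one example (pick per your model and region availability):

```shell
# Create a small AKS cluster and wire it up to pull from your registry.
az aks create \
  --resource-group my-rg \
  --name my-llm-aks \
  --node-count 1 \
  --node-vm-size Standard_NC8as_T4_v3 \
  --attach-acr myregistry \
  --generate-ssh-keys

# Fetch kubeconfig credentials so kubectl talks to the new cluster.
az aks get-credentials --resource-group my-rg --name my-llm-aks

# Run the model container and expose it behind a load balancer.
# (Port 11434 assumes an Ollama-style server inside the image.)
kubectl create deployment gemma-llm --image=myregistry.azurecr.io/gemma-llm:v1
kubectl expose deployment gemma-llm \
  --port=80 --target-port=11434 --type=LoadBalancer
```

In practice you would replace the `kubectl create deployment` shortcut with a proper manifest (resource requests/limits, probes, GPU scheduling), and point API Management at the load balancer’s address rather than exposing it directly.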

For a more serverless-like experience, you could explore Azure Container Instances, though its CPU and memory caps (and limited GPU availability) can be restrictive for larger models. The key is to balance performance with cost-effectiveness. Don’t forget to implement proper security measures, such as authentication and rate limiting, especially if you’re exposing the API publicly.
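For the Container Instances route, a single CLI call is roughly all it takes. This is a sketch with placeholder names; the CPU/memory figures are assumptions sized for a small (~4B-parameter) model, and port 11434 again assumes an Ollama-style server in the image:

```shell
# Run the model image as a standalone container group with a public IP.
az container create \
  --resource-group my-rg \
  --name gemma-llm-aci \
  --image myregistry.azurecr.io/gemma-llm:v1 \
  --cpu 4 --memory 16 \
  --ports 11434 \
  --ip-address Public
```

Note there is no autoscaling here: one container group, billed while it runs, which is why it suits experiments better than production traffic.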

It’s a bit of a learning curve, but the end result is a flexible, scalable solution that gives you full control over your LLM deployment.