Despite the advantages of leveraging the Azure AI Marketplace for some use cases, there are many scenarios in which performing fine-tuning, deep learning and MLOps on your local hardware is preferable to doing the same in the cloud. In my experience, this often includes one or more of the following:
- You or your business unit invested in a local hardware setup to achieve your AI & ML goals (good move).
- Cost constraints or budget caps make cloud-based training prohibitive for your AI & ML goals.
- Your project requires maximum customization, with low-level control over training loops and frameworks such as MLX, custom PyTorch builds and others.
- You rely on iterative experimentation and debugging, running many quick tests on small models or datasets. In this scenario, local hardware can help you avoid long cloud spin-up times and remote debugging overhead.
- Data privacy & governance requirements, in scenarios where data cannot leave your organization’s local environment.
Now, my prediction is that while the above five points may not be a constraint for all companies today, they will become so (to varying degrees) in the near future, as organizations become more dependent on LLMs, ANNs and ML for everyday business tasks. Regardless, many of my friends and colleagues are in this position today, seeking to establish standards for deploying locally trained AI assets to Azure cloud-based infrastructure.
In this article, we’ll explore the three primary infrastructure options to deploy AI models within your Azure environment, along with the advantages and disadvantages of each configuration. We’ll close with some one-line, scenario-based recommendations on which direction might be best for your organization.
Disclaimers
There are two types of assets which are generally referred to as Artificial Intelligence ("AI"), and you'll see me use the term loosely in this article: predictive modeling (or more broadly, "traditional machine learning") and Generative AI (aka "LLMs"). The former are non-generative ML solutions that operate in the background without direct human interaction, often powering decision-making systems, automations and analytics. This includes tasks such as making predictions from structured input data, predictive analysis using historical data, discriminative modeling and embedded ML/Edge ML solutions.
On the other hand, Generative AI assets (such as my very own Indigo LLM) often require heavier GPU processing capabilities in production environments. As such, this article will center on those heavier compute workload requirements, though it's worth noting that the recommendations covered can be scaled for both GenAI and ML requirements.
Further, as the title suggests, this article will focus on the infrastructure required to deploy models to an Azure environment – follow-up articles will likely focus on Google Cloud Platform (GCP), due to its ability to scale downward to homelab and startup budgets. I will not be covering Role-Based Access Control (RBAC), Identity and Access Management (IAM) or the other standard security controls which have recently been masquerading around LinkedIn as "AI Security & Governance".
I’ve disabled comments for this post, but hope that you will feel free to connect and reach me directly via my LinkedIn with any questions, comments or things that I’ve left out.
The Primary Infrastructure Options:
When it comes to deploying a locally trained Machine Learning (ML) or Generative AI model to the cloud, Azure offers several flexible infrastructure options tailored to different use cases and levels of complexity. Whether you need full control over your environment, containerized scalability, or a fully managed MLOps platform, Azure has you well covered with three primary paths: Virtual Machines (VMs), Azure Container Instances or Azure Kubernetes Service (ACI/AKS), and Azure Machine Learning (Azure ML). Each option enables you to bring your locally trained model into production, with trade-offs in customization, scalability, automation, and operational overhead. In the sections below, we'll explore how each approach works and when it's best suited for your deployment needs.
Azure Virtual Machines (VM)
When you would like more control over the computing environment on which you deploy your AI solution, Azure Virtual Machines grant on-demand availability and highly customizable configuration. Depending on your use case for the model (i.e. number of simultaneous users or data throughput requirements), you're likely to provision GPU-enabled Linux VMs from the NC-family or ND-family, which are built for compute-intensive and memory/graphics-intensive workloads.
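For those who prefer code over portal clicks, here's a minimal sketch of provisioning such a VM with the azure-mgmt-compute Python SDK. This is my own illustration, not a prescribed workflow: it assumes the resource group, virtual network and NIC already exist, and every name, region and key below is a placeholder.

```python
# provision_vm.py - a sketch of provisioning a GPU-enabled Linux VM.
# Assumes an existing resource group and NIC; all names are placeholders.
# Requires: pip install azure-identity azure-mgmt-compute
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

compute_client = ComputeManagementClient(
    DefaultAzureCredential(), "<subscription-id>"
)

poller = compute_client.virtual_machines.begin_create_or_update(
    "my-resource-group",
    "llm-vm",
    {
        "location": "eastus",
        # NC-family SKU (one NVIDIA V100) - swap for an ND SKU as workloads grow
        "hardware_profile": {"vm_size": "Standard_NC6s_v3"},
        "storage_profile": {
            "image_reference": {
                "publisher": "Canonical",
                "offer": "0001-com-ubuntu-server-jammy",
                "sku": "22_04-lts-gen2",
                "version": "latest",
            }
        },
        "os_profile": {
            "computer_name": "llm-vm",
            "admin_username": "azureuser",
            "linux_configuration": {
                "disable_password_authentication": True,
                "ssh": {
                    "public_keys": [{
                        "path": "/home/azureuser/.ssh/authorized_keys",
                        "key_data": "<your-ssh-public-key>",
                    }]
                },
            },
        },
        "network_profile": {
            "network_interfaces": [{"id": "<existing-nic-resource-id>"}]
        },
    },
)
vm = poller.result()  # blocks until the VM is provisioned
```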
Once you've selected and provisioned your model's new home, you can simply SSH into your Azure VM instance, install all dependencies needed for operations, then upload your model from your local environment via scp, AzCopy or Git. You will also need to upload your custom scripts and chosen API for running the model – a minimal serving sketch follows below.
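To make that last step concrete, here's a minimal sketch of what the "custom scripts and chosen API" might look like, assuming a Hugging Face-format model and FastAPI as the serving layer (both are illustrative choices on my part; the model path is hypothetical):

```python
# serve.py - a minimal sketch of a self-hosted inference API.
# Requires: pip install fastapi uvicorn torch transformers accelerate
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path: wherever scp/AzCopy landed your weights on the VM
MODEL_PATH = "/opt/models/my-model"

app = FastAPI()
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto")

class Prompt(BaseModel):
    text: str
    max_new_tokens: int = 128

@app.post("/generate")
def generate(prompt: Prompt):
    # Tokenize the request, generate on the GPU, and decode the completion
    inputs = tokenizer(prompt.text, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=prompt.max_new_tokens)
    return {"completion": tokenizer.decode(outputs[0], skip_special_tokens=True)}
```

From there, something like `uvicorn serve:app --host 0.0.0.0 --port 8000` exposes the model to whatever sits in front of the VM.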
Disadvantages:
The main complaint I hear against the use of Azure VMs for LLM deployments is that scaling is manual, which requires some foresight and imposes hard limits on capacity. You also have to manage absolutely everything yourself: OS updates, security, CI/CD integration, networking and logging.
Azure VMs likewise suffer in the scalability department, making them less useful for supporting models which may be required to handle many hundreds of user queries per minute.
Best Use Case:
If you can get past the above disadvantages, there are some use cases where running your model in a virtual machine has advantages over Azure containers (ACI/AKS) or Azure ML Studio, including:
- Rapid prototyping
- Full environmental control
- Running smaller, quantized models or custom runtimes
- Pay-as-you-go works best for your budget
With the above in mind, I would typically recommend Azure VMs to organizations, practices and business units with a lower volume of users per AI asset (15 – 30 daily users). Azure VMs are also a great option for organizations that seek to train and host one LLM per business unit – such as DevOps, Compliance, Legal and Sales.
Azure Container Instances (ACI) & Azure Kubernetes Service (AKS)

As I was walking through the scalability issues with Azure VMs, a lot of my readers were probably thinking to themselves: "Why not just throw it in a container image or Kubernetes?" Well, if this was you, then you are correct – but with a few caveats to all the advantages in automatic deployment, orchestration and autoscaling.
Deployment of your model through this method is fairly straightforward (for the initiated): you build a Docker image on your local resources containing your LLM, along with the code required to load the model and expose an API. From here, you push the image to Azure Container Registry (ACR), then decide whether to deploy to Azure Container Instances (ACI) for lightweight tasks, or to Azure Kubernetes Service (AKS) for full-scale orchestration and enterprise-level scalability. A sketch of the ACI path follows below.
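For illustration, here's a sketch of the ACI half of that decision using the azure-mgmt-containerinstance Python SDK, assuming the image has already been pushed to ACR (resource names, region, image tag and sizing are all placeholders):

```python
# deploy_aci.py - a sketch of deploying a pushed image to Azure Container
# Instances. Assumes the image already lives in ACR; names are placeholders.
# Requires: pip install azure-identity azure-mgmt-containerinstance
from azure.identity import DefaultAzureCredential
from azure.mgmt.containerinstance import ContainerInstanceManagementClient
from azure.mgmt.containerinstance.models import (
    Container, ContainerGroup, ContainerPort, IpAddress, Port,
    ResourceRequests, ResourceRequirements,
)

client = ContainerInstanceManagementClient(
    DefaultAzureCredential(), "<subscription-id>"
)

container = Container(
    name="llm-api",
    image="myregistry.azurecr.io/llm-api:v1",  # the image pushed to ACR above
    resources=ResourceRequirements(
        requests=ResourceRequests(cpu=2.0, memory_in_gb=8.0)
    ),
    ports=[ContainerPort(port=8000)],
)

group = ContainerGroup(
    location="eastus",
    os_type="Linux",
    containers=[container],
    # Public IP for demo purposes; production belongs behind a VNet/gateway.
    # Private registries also need image_registry_credentials here (omitted).
    ip_address=IpAddress(ports=[Port(protocol="TCP", port=8000)], type="Public"),
)

client.container_groups.begin_create_or_update(
    "my-resource-group", "llm-aci-demo", group
).result()
```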
Disadvantages:
As straightforward as the initial setup seems, deploying an LLM to ACI/AKS can be more complex than deploying to a single Azure VM, and generally requires prior Docker, Kubernetes and DevOps knowledge. Further, GPU access in containers can be tricky to navigate and manage, as it is only supported by specific SKUs (at the time of this writing).
The last disadvantage that comes to mind is that scalability is somewhat limited through Azure Container Instances (ACI) – but it is still automated, which is a huge step up from Azure VMs. For projects requiring high scalability, AKS or Azure Machine Learning (AML) are better options.
Best Use Case:
Though they might lack the flexibility of Azure VMs, and are less than ideal for prototyping workflows, ACI/AKS stand out from both Azure VMs and Azure Machine Learning (AML) in the following categories:
- Deploying REST APIs in production
- Rapidly scalable inference workloads
- Great isolation and reproducibility
- Lighter administrative and initial setup overhead compared to Azure VMs
- Easy to manage team-based deployment pipelines with DevOps and CI/CD
Personally, I believe this is a great "happy medium" option for organizations of almost any size, assuming that you have team members with container and orchestration experience. AKS might also be a favorable option for some startups looking to expose the trial version of their AI-enabled app to the public – though costs can add up quickly if you're not in the Microsoft for Startups Founders Hub.
Azure Machine Learning (Azure ML/AML)

If you're looking for an end-to-end, general-purpose platform which provides tools to train, deploy, and manage models on Azure infrastructure, Azure Machine Learning (Azure ML/AML) is a great option. At the time of this writing, AML is one of the go-to Azure-native services for teams implementing MLOps pipelines within their organization.
Though it takes some getting used to, and deployment is much different from the previous two options, deploying your model with AML shares many of the core components you'll see in other fully managed MLOps & AI pipeline platforms. This starts with uploading and registering your model via MLflow, then selecting an inference configuration which includes entry scripts and model dependencies. From here, you define a compute target and deploy your model to a managed endpoint, which allows you to monitor, scale and update the model through AML's integrated suite of tools. A sketch of this flow follows below.
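Here's a rough sketch of that register-and-deploy flow using the Azure ML Python SDK v2 (azure-ai-ml); workspace details, names, paths and instance sizes are placeholders:

```python
# deploy_aml.py - a sketch of registering a locally trained model and
# deploying it to a managed online endpoint. All identifiers are placeholders.
# Requires: pip install azure-ai-ml azure-identity
from azure.ai.ml import MLClient
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import (
    ManagedOnlineDeployment, ManagedOnlineEndpoint, Model,
)
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# 1. Register the locally trained model (hypothetical local MLflow path)
model = ml_client.models.create_or_update(
    Model(
        path="./mlruns/0/<run-id>/artifacts/model",
        name="my-local-model",
        type=AssetTypes.MLFLOW_MODEL,
    )
)

# 2. Create a managed online endpoint, then attach a deployment to it
endpoint = ManagedOnlineEndpoint(name="my-model-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="my-model-endpoint",
    model=model,
    instance_type="Standard_DS3_v2",  # pick a GPU SKU for LLM workloads
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```

Registering in MLflow format lets AML infer the scoring logic, which is why no entry script appears in the sketch; a custom-format model would add an environment and a CodeConfiguration (entry script plus dependencies) to the deployment instead.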
Disadvantages:
As a fully managed platform with semi-autonomous features, Azure Machine Learning has a much steeper administrative learning curve than the previous deployment options, and requires fairly high configuration overhead for each model deployed. Additionally, as is a common trade-off for Azure's fully integrated, managed platforms, AML is the culmination of a series of other services and capabilities which, when broken out, are much cheaper as standalone services. Bundling them together is costly, rendering AML a platform almost exclusive to enterprise-grade MLOps projects and budgets.
A major sacrifice I've experienced on numerous occasions is that AML is also a lot less flexible (than raw VMs or Kubernetes clusters) for the use of unconventional & custom toolchains (such as llama.cpp), as well as the use of off-SLA vector database solutions. On the projects where this became an issue, we were forced to explore other hosting options.
Best Use Case:
For all its disadvantages in cost and flexibility, Azure Machine Learning is a highly capable MLOps and deployment solution, and (at the time of this writing) is preferable to Azure AI Foundry for MLOps – as well as to the other two deployment types mentioned in this post – for many use cases where the following are desired:
- Enterprise-grade deployments
- Batch or real-time inference
- Collaboration with data engineers and data scientists
- Direct integration with Azure Data Factory, ML pipelines and AutoML
- Fully managed infrastructure
- Model versioning, performance comparison & change rollbacks
A key advantage that continuously sets AML apart in my projects is the ability to specify the types of resources you would like to use for training, known collectively as compute targets. What's interesting about compute targets is that they get companies about as close to 'having your cake and eating it too' as you ever will with Azure: you can merge fully managed, cloud-based MLOps services with local compute clusters and resources, as well as cloud-based compute resources. In short: you can use Azure ML to train models on your local hardware.
Though this is not without some limitations and may cause concern for those working on more sensitive AI capabilities, it is an interesting option for those of us working to develop capabilities which are either semi-common or less sensitive in nature.
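For the curious, here's a minimal sketch of how that local-hardware option can be wired up, under my assumption that the route is an Azure Arc-enabled Kubernetes cluster attached as a compute target (names and resource IDs are placeholders):

```python
# attach_compute.py - a sketch of registering on-premises hardware as an AML
# compute target via an Azure Arc-enabled Kubernetes cluster. This is one way
# (my assumption) to realize the local-training scenario described above.
# Requires: pip install azure-ai-ml azure-identity
from azure.ai.ml import MLClient
from azure.ai.ml.entities import KubernetesCompute
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# The on-prem cluster must already be connected to Azure Arc and have the
# AML extension installed; this step only registers it as a compute target.
compute = KubernetesCompute(
    name="onprem-gpu-cluster",
    resource_id=(
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.Kubernetes/connectedClusters/<arc-cluster-name>"
    ),
    namespace="azureml",
)
ml_client.compute.begin_create_or_update(compute).result()

# Subsequent training jobs can now target compute="onprem-gpu-cluster" and run
# on local hardware while AML handles tracking, versioning and pipelines.
```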
Recommendations (TLDR)
At this point, some of you may be thinking: "OK, I understand the differences, but does any of this have a direct impact on my use case – and which deployment option fits it best?" Considering the three options discussed above, here are some bullet points for quick reference:
- Azure VM: If you’re a Start-up in Stealth Mode – and you’re out of local resources.
- Azure VM: If you’re performing rapid prototyping or running quantized & custom runtimes from your Azure environment.
- Azure VM or ACI/AKS: If you're a start-up that's funded and planning to scale a model deployment.
- Azure VM or ACI/AKS: If you're planning a few one-off deployments for specialized, internal use.
- ACI/AKS: If you require rapid scalability, up to (but just below) the enterprise scale.
- Azure Machine Learning: If you’re building a wide range of LLM & ML assets, for both internal and external use.
- Azure Machine Learning: If your solutions require managed autoscaling and endpoints, versioning and model performance comparison.
Closing Thoughts:
I hope this post adds some value to the planning phases of your project or AI/ML goals, and that I've provided some food for thought on training your models locally. Thanks for reading, and again, I hope you'll feel free to reach me via LinkedIn or through the contact form on this website.