Time to Complete: 2-3 hours
This guide walks you through setting up your Azure Databricks environment for LLM development and deployment, integrating with Azure AI Foundry, and configuring GitHub for seamless CI/CD.
- Prerequisites
- Azure Databricks Setup
- Azure AI Foundry Integration
- GitHub Configuration
- Environment Management
- Secrets Management
- Validating Your Setup
## Prerequisites

Before starting, ensure you have:
- An Azure subscription with Owner or Contributor access
- A GitHub account with repository creation permissions
- Terraform installed locally (if using Infrastructure as Code)
- Azure CLI installed and configured
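Before continuing, it can be worth confirming the prerequisite CLIs are actually on your PATH. A minimal sketch (drop `terraform` from the list if you are not using Infrastructure as Code):

```python
import shutil

# Quick sanity check that the prerequisite CLIs are installed and on PATH.
required = ["az", "git", "terraform"]
missing = [tool for tool in required if shutil.which(tool) is None]
if missing:
    print("Missing tools: " + ", ".join(missing))
else:
    print("All prerequisite tools found")
```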
## Azure Databricks Setup

- Log in to the Azure Portal
- Navigate to "Create a resource" > Search for "Azure Databricks"
- Fill in the basics:
  - Subscription: Your Azure subscription
  - Resource Group: Create new or select existing
  - Workspace Name: `llm-mlops-workspace`
  - Region: Choose a region supporting the ML runtime (e.g., East US)
  - Pricing Tier: Premium (recommended for MLOps features)
- Review + Create > Create
⚠️ Note: The Premium tier is required for advanced security features and access controls.
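The same workspace can also be created from the Azure CLI (requires the Databricks extension: `az extension add --name databricks`). A hedged sketch; the resource group name below is an illustrative assumption, not something defined earlier in this guide:

```python
import subprocess  # only needed if you uncomment the run() call below

def az_databricks_create_cmd(workspace="llm-mlops-workspace",
                             resource_group="llm-mlops-rg",
                             location="eastus",
                             sku="premium"):
    # Build the `az databricks workspace create` command as an argument list
    return [
        "az", "databricks", "workspace", "create",
        "--name", workspace,
        "--resource-group", resource_group,
        "--location", location,
        "--sku", sku,
    ]

# subprocess.run(az_databricks_create_cmd(), check=True)  # uncomment to execute
print(" ".join(az_databricks_create_cmd()))
```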
- Launch your Databricks workspace
- Navigate to Compute > Create Cluster
- Configure your development cluster:
  - Cluster Name: `llm-dev-cluster`
  - Cluster Mode: Single Node or Standard
  - Databricks Runtime Version: `13.3 LTS ML` or newer
  - Node Type: Select an appropriate node type, with GPUs if needed for LLM training
  - Auto-termination: 120 minutes (recommended for cost savings)

An equivalent JSON cluster definition:

```json
{
  "cluster_name": "llm-dev-cluster",
  "spark_version": "13.3.x-gpu-ml-scala2.12",
  "node_type_id": "Standard_NC6s_v3",
  "driver_node_type_id": "Standard_NC6s_v3",
  "autotermination_minutes": 120,
  "spark_conf": {
    "spark.databricks.cluster.profile": "singleNode",
    "spark.master": "local[*]"
  },
  "custom_tags": {
    "Environment": "Development"
  }
}
```

- For production workloads, create a separate cluster:
  - Cluster Name: `llm-prod-cluster`
  - Cluster Mode: Standard (multi-node)
  - Autoscaling: Enabled, min: 2, max: 8 workers (adjust based on your needs)
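If you script cluster creation (for example via the Databricks REST API or SDK), the dev and prod configurations can be generated from one helper. A sketch; `make_cluster_spec` is a hypothetical helper, not part of any Databricks SDK:

```python
import json

def make_cluster_spec(name, node_type, single_node=True,
                      autotermination_minutes=120, environment="Development"):
    """Build a cluster spec dict mirroring the JSON definition above."""
    spec = {
        "cluster_name": name,
        "spark_version": "13.3.x-gpu-ml-scala2.12",
        "node_type_id": node_type,
        "driver_node_type_id": node_type,
        "autotermination_minutes": autotermination_minutes,
        "custom_tags": {"Environment": environment},
    }
    if single_node:
        # Single-node clusters need these Spark settings
        spec["spark_conf"] = {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        }
    else:
        # Multi-node production cluster with autoscaling
        spec["autoscale"] = {"min_workers": 2, "max_workers": 8}
    return spec

dev = make_cluster_spec("llm-dev-cluster", "Standard_NC6s_v3")
prod = make_cluster_spec("llm-prod-cluster", "Standard_NC6s_v3",
                         single_node=False, environment="Production")
print(json.dumps(dev, indent=2))
```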
- Create a new notebook to validate the ML runtime
- Run the following code to validate the environment:

```python
# Verify installed versions
import mlflow
import torch
import transformers

print(f"MLflow version: {mlflow.__version__}")
print(f"PyTorch version: {torch.__version__}")
print(f"Transformers version: {transformers.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU count: {torch.cuda.device_count()}")
    print(f"GPU name: {torch.cuda.get_device_name(0)}")
```

## Azure AI Foundry Integration

- In the Azure Portal, search for "Azure OpenAI"
- Create a new Azure OpenAI resource:
  - Subscription: Same as your Databricks workspace
  - Resource Group: Same as your Databricks workspace
  - Region: Choose an available region (e.g., East US)
  - Name: `llm-openai-service`
  - Pricing Tier: Standard S0
- Go to your Azure OpenAI resource
- Select "Model Deployments"
- Deploy the following models:
  - Text Embedding Model: `text-embedding-ada-002` (for embeddings)
  - LLM Model: `gpt-4` or `gpt-35-turbo` (for completions)
- In your Azure OpenAI resource, go to "Keys and Endpoint"
- Copy the endpoint and key for later use in Databricks
## GitHub Configuration

- Create a new GitHub repository for your LLM project
- Clone the repository locally
- Set up the basic project structure (follow the structure in the main README)
- Navigate to your GitHub account settings
- Go to "Copilot" in the sidebar
- Enable GitHub Copilot for your account
- Install the Copilot extension in your IDE (VS Code, JetBrains IDEs, etc.)
- In your repository, go to "Settings" > "Security & analysis"
- Enable:
- Dependency graph
- Dependabot alerts
- Dependabot security updates
- Code scanning
## Environment Management

For each environment (development, staging, production):
- Create separate Databricks workspaces or use a single workspace with separate folders
- Set up environment-specific clusters with appropriate security and scaling configurations
- Use Databricks Repos to link code from GitHub to each environment
- Configure environment-specific variables in each workspace
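Environment-specific variables (the last step above) can be sketched as a simple lookup keyed on the environment name. The `ENVIRONMENT` variable and the config values below are illustrative assumptions, not Databricks conventions:

```python
import os

# Per-environment settings; extend with whatever your pipelines need
CONFIGS = {
    "Development": {"workspace_dir": "/Development", "cluster": "llm-dev-cluster"},
    "Staging":     {"workspace_dir": "/Staging",     "cluster": "llm-dev-cluster"},
    "Production":  {"workspace_dir": "/Production",  "cluster": "llm-prod-cluster"},
}

def get_config(env=None):
    """Return the config for `env`, defaulting to the ENVIRONMENT variable."""
    env = env or os.environ.get("ENVIRONMENT", "Development")
    return CONFIGS[env]

print(get_config("Production"))
```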
```bash
# Example script to set up workspace folders
databricks workspace mkdirs /Development
databricks workspace mkdirs /Staging
databricks workspace mkdirs /Production
```

## Secrets Management

- Create a secret scope in Databricks:

```bash
databricks secrets create-scope --scope llm-secrets
```

- Add your Azure OpenAI and other service credentials:

```bash
databricks secrets put --scope llm-secrets --key azure-openai-key --string-value "YOUR_OPENAI_KEY"
databricks secrets put --scope llm-secrets --key azure-openai-endpoint --string-value "YOUR_OPENAI_ENDPOINT"
```

- Access secrets in notebooks:

```python
openai_key = dbutils.secrets.get(scope="llm-secrets", key="azure-openai-key")
openai_endpoint = dbutils.secrets.get(scope="llm-secrets", key="azure-openai-endpoint")
```

Alternatively, use an Azure Key Vault-backed approach:

- Create an Azure Key Vault in the same resource group
- Add your secrets to Key Vault
- Set up Databricks to access Key Vault using a Managed Identity
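For code that runs both inside Databricks and locally, a thin wrapper around `dbutils.secrets.get` with an environment-variable fallback can help. A sketch; the `SCOPE_KEY` env-var naming convention is an assumption, not a Databricks feature:

```python
import os

def get_secret(scope, key, dbutils=None):
    """Read a secret from a Databricks scope, falling back to an
    environment variable (scope_key upper-cased, dashes -> underscores)
    for local development outside Databricks."""
    if dbutils is not None:
        return dbutils.secrets.get(scope=scope, key=key)
    env_name = f"{scope}_{key}".upper().replace("-", "_")
    value = os.environ.get(env_name)
    if value is None:
        raise KeyError(f"Secret {key!r} not found in scope {scope!r} or ${env_name}")
    return value
```

For example, `get_secret("llm-secrets", "azure-openai-key")` reads `LLM_SECRETS_AZURE_OPENAI_KEY` when no `dbutils` is available.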
## Validating Your Setup

Run the following tests to verify your setup:
- Test the Azure OpenAI connection:

```python
import openai

# Configure the Azure OpenAI API (pre-1.0 openai SDK style)
openai.api_type = "azure"
openai.api_key = dbutils.secrets.get(scope="llm-secrets", key="azure-openai-key")
openai.api_base = dbutils.secrets.get(scope="llm-secrets", key="azure-openai-endpoint")
openai.api_version = "2023-05-15"

# Test the connection against your chat model deployment
response = openai.ChatCompletion.create(
    engine="gpt-35-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, are you working correctly?"}
    ]
)
print(response.choices[0].message.content)
```

- Test the GitHub integration (if using Databricks Repos):

```
%sh
git status
```

Create a simple notebook that:
- Loads data from storage
- Processes it using a simple model
- Logs the results using MLflow
- Serves predictions
This will validate that all components are working together properly.
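The shape of that end-to-end notebook can be sketched in pure Python before wiring in real data and MLflow; here a dict stands in for the MLflow tracking store, and the "model" is a trivial mean predictor (all names and data are illustrative):

```python
# Pure-Python stand-in for the end-to-end notebook:
# load -> train -> log -> serve.
def load_data():
    # Stand-in for reading from storage (e.g., a Delta table or blob)
    return [1.0, 2.0, 3.0, 4.0]

def train(data):
    # Trivial "model": always predict the mean of the training data
    mean = sum(data) / len(data)
    return lambda _x: mean

def main():
    run_log = {}                         # stand-in for mlflow.log_metric/log_param
    data = load_data()
    model = train(data)
    run_log["n_rows"] = len(data)
    run_log["prediction"] = model(10.0)  # "serve" one prediction
    return run_log

print(main())  # {'n_rows': 4, 'prediction': 2.5}
```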
Once your environment is set up, you can proceed to: