Time to Complete: 2-3 hours
This guide walks you through setting up your Azure Databricks environment for LLM development and deployment, integrating with Azure AI Foundry, and configuring GitHub for seamless CI/CD.
- Prerequisites
- Azure Databricks Setup
- Azure AI Foundry Integration
- GitHub Configuration
- Environment Management
- Secrets Management
- Validating Your Setup
## Prerequisites

Before starting, ensure you have:
- An Azure subscription with Owner or Contributor access
- A GitHub account with repository creation permissions
- Terraform installed locally (if using Infrastructure as Code)
- Azure CLI installed and configured
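Before continuing, it can be worth confirming the prerequisite CLIs are actually on your PATH. A minimal sketch (drop `terraform` from the list if you are not using Infrastructure as Code):

```python
import shutil

# Quick sanity check that the prerequisite CLIs are installed and on PATH.
required = ["az", "git", "terraform"]
missing = [tool for tool in required if shutil.which(tool) is None]
if missing:
    print("Missing tools: " + ", ".join(missing))
else:
    print("All prerequisite tools found")
```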
## Azure Databricks Setup

- Log in to the Azure Portal
- Navigate to "Create a resource" > Search for "Azure Databricks"
- Fill in the basics:
  - Subscription: Your Azure subscription
  - Resource Group: Create new or select existing
  - Workspace Name: `llm-mlops-workspace`
  - Region: Choose a region supporting the ML runtime (e.g., East US)
  - Pricing Tier: Premium (recommended for MLOps features)
- Review + Create > Create
⚠️ Note: The Premium tier is required for advanced security features and access controls.
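The same workspace can also be created from the Azure CLI (requires the Databricks extension: `az extension add --name databricks`). A hedged sketch; the resource group name below is an illustrative assumption, not something defined earlier in this guide:

```python
import subprocess  # only needed if you uncomment the run() call below

def az_databricks_create_cmd(workspace="llm-mlops-workspace",
                             resource_group="llm-mlops-rg",
                             location="eastus",
                             sku="premium"):
    # Build the `az databricks workspace create` command as an argument list
    return [
        "az", "databricks", "workspace", "create",
        "--name", workspace,
        "--resource-group", resource_group,
        "--location", location,
        "--sku", sku,
    ]

# subprocess.run(az_databricks_create_cmd(), check=True)  # uncomment to execute
print(" ".join(az_databricks_create_cmd()))
```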
- Launch your Databricks workspace
- Navigate to Compute > Create Cluster
- Configure your development cluster:
  - Cluster Name: `llm-dev-cluster`
  - Cluster Mode: Single Node or Standard
  - Databricks Runtime Version: `13.3 LTS ML` or newer
  - Node Type: Select an appropriate node type, with GPUs if needed for LLM training
  - Auto-termination: 120 minutes (recommended for cost savings)

An equivalent JSON cluster definition:

```json
{
  "cluster_name": "llm-dev-cluster",
  "spark_version": "13.3.x-gpu-ml-scala2.12",
  "node_type_id": "Standard_NC6s_v3",
  "driver_node_type_id": "Standard_NC6s_v3",
  "autotermination_minutes": 120,
  "spark_conf": {
    "spark.databricks.cluster.profile": "singleNode",
    "spark.master": "local[*]"
  },
  "custom_tags": {
    "Environment": "Development"
  }
}
```

- For production workloads, create a separate cluster:
  - Cluster Name: `llm-prod-cluster`
  - Cluster Mode: Standard (multi-node)
  - Autoscaling: Enabled, min: 2, max: 8 workers (adjust based on your needs)
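If you script cluster creation (for example via the Databricks REST API or SDK), the dev and prod configurations can be generated from one helper. A sketch; `make_cluster_spec` is a hypothetical helper, not part of any Databricks SDK:

```python
import json

def make_cluster_spec(name, node_type, single_node=True,
                      autotermination_minutes=120, environment="Development"):
    """Build a cluster spec dict mirroring the JSON definition above."""
    spec = {
        "cluster_name": name,
        "spark_version": "13.3.x-gpu-ml-scala2.12",
        "node_type_id": node_type,
        "driver_node_type_id": node_type,
        "autotermination_minutes": autotermination_minutes,
        "custom_tags": {"Environment": environment},
    }
    if single_node:
        # Single-node clusters need these Spark settings
        spec["spark_conf"] = {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        }
    else:
        # Multi-node production cluster with autoscaling
        spec["autoscale"] = {"min_workers": 2, "max_workers": 8}
    return spec

dev = make_cluster_spec("llm-dev-cluster", "Standard_NC6s_v3")
prod = make_cluster_spec("llm-prod-cluster", "Standard_NC6s_v3",
                         single_node=False, environment="Production")
print(json.dumps(dev, indent=2))
```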
- Create a new notebook to validate the ML runtime
- Run the following code to validate the environment:

```python
# Verify installed versions
import mlflow
import torch
import transformers

print(f"MLflow version: {mlflow.__version__}")
print(f"PyTorch version: {torch.__version__}")
print(f"Transformers version: {transformers.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU count: {torch.cuda.device_count()}")
    print(f"GPU name: {torch.cuda.get_device_name(0)}")
```

## Azure AI Foundry Integration

- In the Azure Portal, search for "Azure OpenAI"
- Create a new Azure OpenAI resource:
  - Subscription: Same as your Databricks workspace
  - Resource Group: Same as your Databricks workspace
  - Region: Choose an available region (e.g., East US)
  - Name: `llm-openai-service`
  - Pricing Tier: Standard S0
- Go to your Azure OpenAI resource
- Select "Model Deployments"
- Deploy the following models:
  - Text Embedding Model: `text-embedding-ada-002` (for embeddings)
  - LLM Model: `gpt-4` or `gpt-35-turbo` (for completions)
- In your Azure OpenAI resource, go to "Keys and Endpoint"
- Copy the endpoint and key for later use in Databricks
## GitHub Configuration

- Create a new GitHub repository for your LLM project
- Clone the repository locally
- Set up the basic project structure (follow the structure in the main README)
- Navigate to your GitHub account settings
- Go to "Copilot" in the sidebar
- Enable GitHub Copilot for your account
- Install the Copilot extension in your IDE (VS Code, JetBrains IDEs, etc.)
- In your repository, go to "Settings" > "Security & analysis"
- Enable:
- Dependency graph
- Dependabot alerts
- Dependabot security updates
- Code scanning
## Environment Management

For each environment (development, staging, production):
- Create separate Databricks workspaces or use a single workspace with separate folders
- Set up environment-specific clusters with appropriate security and scaling configurations
- Use Databricks Repos to link code from GitHub to each environment
- Configure environment-specific variables in each workspace
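Environment-specific variables (the last step above) can be sketched as a simple lookup keyed on the environment name. The `ENVIRONMENT` variable and the config values below are illustrative assumptions, not Databricks conventions:

```python
import os

# Per-environment settings; extend with whatever your pipelines need
CONFIGS = {
    "Development": {"workspace_dir": "/Development", "cluster": "llm-dev-cluster"},
    "Staging":     {"workspace_dir": "/Staging",     "cluster": "llm-dev-cluster"},
    "Production":  {"workspace_dir": "/Production",  "cluster": "llm-prod-cluster"},
}

def get_config(env=None):
    """Return the config for `env`, defaulting to the ENVIRONMENT variable."""
    env = env or os.environ.get("ENVIRONMENT", "Development")
    return CONFIGS[env]

print(get_config("Production"))
```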
```bash
# Example script to set up workspace folders
databricks workspace mkdirs /Development
databricks workspace mkdirs /Staging
databricks workspace mkdirs /Production
```

## Secrets Management

- Create a secret scope in Databricks:

```bash
databricks secrets create-scope --scope llm-secrets
```

- Add your Azure OpenAI and other service credentials:

```bash
databricks secrets put --scope llm-secrets --key azure-openai-key --string-value "YOUR_OPENAI_KEY"
databricks secrets put --scope llm-secrets --key azure-openai-endpoint --string-value "YOUR_OPENAI_ENDPOINT"
```

- Access secrets in notebooks:

```python
openai_key = dbutils.secrets.get(scope="llm-secrets", key="azure-openai-key")
openai_endpoint = dbutils.secrets.get(scope="llm-secrets", key="azure-openai-endpoint")
```

Alternatively, use an Azure Key Vault-backed approach:

- Create an Azure Key Vault in the same resource group
- Add your secrets to Key Vault
- Set up Databricks to access Key Vault using a Managed Identity
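For code that runs both inside Databricks and locally, a thin wrapper around `dbutils.secrets.get` with an environment-variable fallback can help. A sketch; the `SCOPE_KEY` env-var naming convention is an assumption, not a Databricks feature:

```python
import os

def get_secret(scope, key, dbutils=None):
    """Read a secret from a Databricks scope, falling back to an
    environment variable (scope_key upper-cased, dashes -> underscores)
    for local development outside Databricks."""
    if dbutils is not None:
        return dbutils.secrets.get(scope=scope, key=key)
    env_name = f"{scope}_{key}".upper().replace("-", "_")
    value = os.environ.get(env_name)
    if value is None:
        raise KeyError(f"Secret {key!r} not found in scope {scope!r} or ${env_name}")
    return value
```

For example, `get_secret("llm-secrets", "azure-openai-key")` reads `LLM_SECRETS_AZURE_OPENAI_KEY` when no `dbutils` is available.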
## Validating Your Setup

Run the following tests to verify your setup:
- Test the Azure OpenAI connection:

```python
import openai

# Configure the Azure OpenAI API (pre-1.0 openai SDK style)
openai.api_type = "azure"
openai.api_key = dbutils.secrets.get(scope="llm-secrets", key="azure-openai-key")
openai.api_base = dbutils.secrets.get(scope="llm-secrets", key="azure-openai-endpoint")
openai.api_version = "2023-05-15"

# Test the connection against your chat model deployment
response = openai.ChatCompletion.create(
    engine="gpt-35-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, are you working correctly?"}
    ]
)
print(response.choices[0].message.content)
```

- Test the GitHub integration (if using Databricks Repos):

```
%sh
git status
```

Create a simple notebook that:
- Loads data from storage
- Processes it using a simple model
- Logs the results using MLflow
- Serves predictions
This will validate that all components are working together properly.
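The shape of that end-to-end notebook can be sketched in pure Python before wiring in real data and MLflow; here a dict stands in for the MLflow tracking store, and the "model" is a trivial mean predictor (all names and data are illustrative):

```python
# Pure-Python stand-in for the end-to-end notebook:
# load -> train -> log -> serve.
def load_data():
    # Stand-in for reading from storage (e.g., a Delta table or blob)
    return [1.0, 2.0, 3.0, 4.0]

def train(data):
    # Trivial "model": always predict the mean of the training data
    mean = sum(data) / len(data)
    return lambda _x: mean

def main():
    run_log = {}                         # stand-in for mlflow.log_metric/log_param
    data = load_data()
    model = train(data)
    run_log["n_rows"] = len(data)
    run_log["prediction"] = model(10.0)  # "serve" one prediction
    return run_log

print(main())  # {'n_rows': 4, 'prediction': 2.5}
```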
Once your environment is set up, you can proceed to: