
OpenMeteo Python - Docker Airflow Setup

This Docker setup provides a local Apache Airflow environment for testing weather data DAGs generated by openmeteo-python.

Prerequisites

  • Docker and Docker Compose installed
  • At least 4 GB of RAM allocated to Docker (see Resource Requirements below)

Quick Start

1. Navigate to the docker directory

cd docker

2. Set the Airflow user ID (Linux/Mac only)

echo "AIRFLOW_UID=$(id -u)" > .env

On Windows, create a .env file with:

AIRFLOW_UID=50000

3. Build and start the services

# Build the custom image with openmeteo-python
docker-compose build

# Start all services
docker-compose up -d

4. Wait for initialization

The first startup takes a few minutes. Check the status:

docker-compose ps

Wait until all services show "healthy" status.

5. Access Airflow UI

Open your browser and navigate to:

URL: http://localhost:8080

Username: airflow
Password: airflow

Service Startup Order

Services start in the following sequence (automatically managed):

1. postgres         → Starts first, waits until healthy
2. airflow-init     → Runs DB migrations, creates admin user
3. airflow-scheduler → Starts after init completes
4. airflow-webserver → Starts after init completes
5. airflow-triggerer → Starts after init completes

Services

Service           | Port | Description
==================+======+========================
airflow-webserver | 8080 | Airflow Web UI
airflow-scheduler | -    | DAG scheduler
airflow-triggerer | -    | Async trigger handling
postgres          | 5432 | Metadata database

Directory Structure

docker/
├── Dockerfile              # Custom Airflow + openmeteo image
├── docker-compose.yml      # Service definitions
├── README.md               # This file
├── dags/                   # Place your DAGs here
│   └── example_weather_dag.py  # Sample weather DAG
├── logs/                   # Airflow logs (auto-created)
├── plugins/                # Custom Airflow plugins
└── output/                 # DAG output files

DAG Management Examples

List All DAGs

docker-compose exec airflow-webserver airflow dags list

Expected output:

dag_id                     | filepath               | owner     | paused
===========================+========================+===========+=======
openmeteo_weather_pipeline | example_weather_dag.py | openmeteo | False

Check DAG Status (Paused/Unpaused)

docker-compose exec airflow-webserver airflow dags list | grep openmeteo

Pause a DAG

# Pause DAG (stop scheduled runs)
docker-compose exec airflow-webserver airflow dags pause openmeteo_weather_pipeline

Output:

Dag: openmeteo_weather_pipeline, paused: True

Unpause a DAG

# Unpause DAG (enable scheduled runs)
docker-compose exec airflow-webserver airflow dags unpause openmeteo_weather_pipeline

Output:

Dag: openmeteo_weather_pipeline, paused: False

Trigger a DAG Manually

# Trigger immediate execution
docker-compose exec airflow-webserver airflow dags trigger openmeteo_weather_pipeline

Output:

Created <DagRun openmeteo_weather_pipeline @ 2025-11-27T20:41:25+00:00: manual__2025-11-27T20:41:25+00:00, state:queued>

Trigger DAG with Custom Configuration

# Trigger with configuration parameters
docker-compose exec airflow-webserver airflow dags trigger \
  --conf '{"locations": ["Paris", "Madrid"]}' \
  openmeteo_weather_pipeline
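
When a script builds the --conf payload, serializing it with json.dumps (and quoting it for the shell) avoids escaping mistakes. A minimal sketch; the command string itself is the same one shown above:

```python
import json
import shlex

# Build the --conf payload programmatically to avoid shell-quoting mistakes.
conf = {"locations": ["Paris", "Madrid"]}
conf_json = json.dumps(conf)

cmd = (
    "docker-compose exec airflow-webserver airflow dags trigger "
    f"--conf {shlex.quote(conf_json)} openmeteo_weather_pipeline"
)
print(cmd)
```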

Check DAG Run Status

# List all runs for a DAG
docker-compose exec airflow-webserver airflow dags list-runs -d openmeteo_weather_pipeline

Output:

dag_id                     | run_id                            | state   | execution_date
===========================+===================================+=========+==========================
openmeteo_weather_pipeline | manual__2025-11-27T20:41:25+00:00 | success | 2025-11-27T20:41:25+00:00

Check Task Status

# List tasks in a DAG
docker-compose exec airflow-webserver airflow tasks list openmeteo_weather_pipeline

# Check state of tasks in a specific run
docker-compose exec airflow-webserver airflow tasks states-for-dag-run \
  openmeteo_weather_pipeline "manual__2025-11-27T20:41:25+00:00"

Test a Specific Task (Without Recording)

# Test fetch_forecast task for a specific date
docker-compose exec airflow-webserver airflow tasks test \
  openmeteo_weather_pipeline fetch_forecast 2025-11-27

# Test the air quality task
docker-compose exec airflow-webserver airflow tasks test \
  openmeteo_weather_pipeline fetch_air_quality 2025-11-27

# Test the report generation task
docker-compose exec airflow-webserver airflow tasks test \
  openmeteo_weather_pipeline generate_report 2025-11-27

Run Full DAG Test (Dry Run)

# Test entire DAG without recording to database
docker-compose exec airflow-webserver airflow dags test openmeteo_weather_pipeline 2025-11-27

View Task Logs

# Airflow has no CLI command for reading task logs; inspect the log files directly
docker-compose exec airflow-webserver ls \
  "/opt/airflow/logs/dag_id=openmeteo_weather_pipeline"

# View a specific attempt's log (Airflow 2.3+ directory layout)
docker-compose exec airflow-webserver cat \
  "/opt/airflow/logs/dag_id=openmeteo_weather_pipeline/run_id=manual__2025-11-27T20:41:25+00:00/task_id=fetch_forecast/attempt=1.log"

Delete DAG Runs (Reset)

# Delete all metadata for a DAG, including run history (use with caution;
# the command prompts for confirmation)
docker-compose exec airflow-webserver airflow dags delete openmeteo_weather_pipeline

# Then re-import the DAG
docker-compose exec airflow-webserver airflow dags reserialize

Complete Workflow Example

Here's a complete example of managing a DAG from start to finish:

# 1. Check if DAG is loaded
docker-compose exec airflow-webserver airflow dags list

# 2. Check if DAG is paused
docker-compose exec airflow-webserver airflow dags list | grep openmeteo

# 3. Unpause the DAG (if paused)
docker-compose exec airflow-webserver airflow dags unpause openmeteo_weather_pipeline

# 4. Trigger the DAG
docker-compose exec airflow-webserver airflow dags trigger openmeteo_weather_pipeline

# 5. Wait a moment, then check the run status
sleep 10
docker-compose exec airflow-webserver airflow dags list-runs -d openmeteo_weather_pipeline

# 6. Check output files
ls -la output/

# 7. View the generated data
head -20 output/forecast_hourly_*.csv

Generating Custom DAGs

Use the openmeteo CLI to generate DAGs:

# Generate a DAG for multiple APIs
openmeteo generate-dag \
  --apis forecast,air_quality \
  --location "Berlin:52.52,13.41" \
  --location "London:51.51,-0.13" \
  --output docker/dags/my_weather_dag.py

# Generate a standalone script
openmeteo generate-script \
  --api forecast \
  --location "Tokyo:35.68,139.69" \
  --output docker/dags/tokyo_forecast.py
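
The --location value packs a name and coordinates into one string. If you generate these specs yourself, a small parser keeps the format honest (parse_location is a hypothetical helper for illustration, not part of the openmeteo CLI):

```python
def parse_location(spec: str) -> tuple[str, float, float]:
    """Split a 'Berlin:52.52,13.41' spec into (name, latitude, longitude)."""
    name, _, coords = spec.partition(":")
    lat_str, _, lon_str = coords.partition(",")
    return name, float(lat_str), float(lon_str)

print(parse_location("Berlin:52.52,13.41"))  # ('Berlin', 52.52, 13.41)
```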

Testing Generated DAGs

After generating a DAG:

  1. Copy it to docker/dags/ directory
  2. Wait ~30 seconds for Airflow to detect it
  3. Verify it's loaded: docker-compose exec airflow-webserver airflow dags list
  4. Unpause and trigger the DAG

Viewing Logs

# View all service logs
docker-compose logs -f

# View specific service logs
docker-compose logs -f airflow-scheduler

# View only errors
docker-compose logs airflow-scheduler 2>&1 | grep -i error

# View DAG task logs in Airflow UI
# Navigate to: DAG > Task > Log

Accessing Output Files

Output files are saved to docker/output/:

# List output files
ls -la output/

# View CSV header and first rows
head -10 output/forecast_hourly_2025-11-27.csv

# Count rows in output
wc -l output/*.csv

# View specific location data
grep "Berlin" output/forecast_daily_2025-11-27.csv
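
For anything beyond grep, the csv module gives proper column access. A sketch against made-up sample data; the real column names depend on the variables your DAG requested:

```python
import csv
import io

# Assumed layout for illustration only.
sample = """location,date,temperature_2m_max
Berlin,2025-11-27,8.4
London,2025-11-27,10.1
Berlin,2025-11-28,7.9
"""

rows = list(csv.DictReader(io.StringIO(sample)))
berlin = [r for r in rows if r["location"] == "Berlin"]
print(len(berlin))  # 2
```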

Service Management

# Start services
docker-compose up -d

# Stop services
docker-compose down

# Stop and remove volumes (clean reset)
docker-compose down -v

# Restart services
docker-compose restart

# Restart specific service
docker-compose restart airflow-scheduler

# View service status
docker-compose ps

# View resource usage
docker stats

Running Python Commands

# Test openmeteo installation
docker-compose exec airflow-webserver python -c "from openmeteo import OpenMeteo; print('OK')"

# Run a quick forecast test
docker-compose exec airflow-webserver python -c "
from openmeteo import OpenMeteo
client = OpenMeteo()
response = client.forecast.get(latitude=52.52, longitude=13.41, hourly=['temperature_2m'])
print(response.to_dataframe().head())
"

# Test air quality API
docker-compose exec airflow-webserver python -c "
from openmeteo import OpenMeteo
client = OpenMeteo()
response = client.air_quality.get(latitude=52.52, longitude=13.41, hourly=['pm10', 'pm2_5'])
print(response.to_dataframe().head())
"

Configuration

Environment Variables

Create a .env file in the docker directory:

# Required for Linux/Mac
AIRFLOW_UID=50000

# Airflow credentials
_AIRFLOW_WWW_USER_USERNAME=airflow
_AIRFLOW_WWW_USER_PASSWORD=airflow

# OpenMeteo API key (optional, for commercial use)
OPENMETEO_API_KEY=your-api-key
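
docker-compose reads .env itself, but if a helper script also needs these values, a few lines of stdlib Python parse the same KEY=value format (load_env is illustrative, not part of any tool here):

```python
def load_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = "# comment\nAIRFLOW_UID=50000\nOPENMETEO_API_KEY=your-api-key\n"
print(load_env(sample))
```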

Customizing Airflow Settings

Edit docker-compose.yml and modify the environment section:

environment:
  AIRFLOW__CORE__EXECUTOR: LocalExecutor
  AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
  AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: 30  # DAG refresh interval
  # Add your custom settings here

Troubleshooting

Issue: Services won't start

Symptoms: Containers keep restarting or fail to start

Solutions:

# 1. Check detailed logs
docker-compose logs

# 2. Ensure ports are available
netstat -an | grep 8080
netstat -an | grep 5432

# 3. Check if another Airflow is running
docker ps -a | grep airflow

# 4. Reset everything
docker-compose down -v
docker-compose up -d
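
The netstat check in step 2 can also be done from Python, which behaves the same on Linux, macOS, and Windows. A small sketch:

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1)
        return sock.connect_ex((host, port)) == 0

for port in (8080, 5432):
    status = "in use" if port_in_use(port) else "free"
    print(f"port {port}: {status}")
```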

Issue: DAG not appearing in UI

Symptoms: DAG file exists but not visible in Airflow UI

Solutions:

# 1. Check for Python syntax errors
docker-compose exec airflow-webserver python /opt/airflow/dags/your_dag.py

# 2. Check scheduler logs for import errors
docker-compose logs airflow-scheduler | grep -i "error\|import\|syntax"

# 3. Force DAG reserialize
docker-compose exec airflow-webserver airflow dags reserialize

# 4. Wait for DAG refresh (default: 30 seconds)
sleep 30
docker-compose exec airflow-webserver airflow dags list

Issue: DAG stuck in "queued" or "running" state

Symptoms: Tasks don't progress, stay queued forever

Solutions:

# 1. Check scheduler is healthy
docker-compose ps airflow-scheduler

# 2. Restart scheduler
docker-compose restart airflow-scheduler

# 3. Check for deadlocks in database
docker-compose exec airflow-webserver airflow dags list-runs -d openmeteo_weather_pipeline

# 4. Clear stuck task instances
docker-compose exec airflow-webserver airflow tasks clear \
  openmeteo_weather_pipeline -s 2025-11-27 -e 2025-11-27

Issue: Task fails with "No module named 'openmeteo'"

Symptoms: Import error when task runs

Solutions:

# 1. Verify openmeteo is installed in container
docker-compose exec airflow-webserver pip list | grep openmeteo

# 2. Rebuild the image
docker-compose build --no-cache
docker-compose up -d

Issue: Permission errors (Linux)

Symptoms: Cannot write to logs or output directory

Solutions:

# Set correct ownership
sudo chown -R $(id -u):$(id -g) dags logs output plugins

# Or run with root (not recommended for production)
echo "AIRFLOW_UID=0" > .env
docker-compose down
docker-compose up -d

Issue: Out of memory

Symptoms: Containers killed, "OOMKilled" in docker inspect

Solutions:

  1. Increase Docker memory limit to at least 4GB in Docker Desktop settings
  2. Reduce parallelism in Airflow config:
    AIRFLOW__CORE__PARALLELISM: 4
    AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG: 2

Issue: Database connection errors

Symptoms: "Connection refused" or "database does not exist"

Solutions:

# 1. Check postgres is healthy
docker-compose ps postgres

# 2. Check postgres logs
docker-compose logs postgres

# 3. Wait for postgres to be ready (auto-handled by depends_on)
docker-compose down
docker-compose up -d

# 4. Full reset of database
docker-compose down -v
docker-compose up -d

Issue: API rate limiting

Symptoms: Tasks fail with 429 errors or API timeouts

Solutions:

  1. Add delays between API calls in your DAG
  2. Use a commercial API key
  3. Reduce the number of locations or variables
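
Delays from step 1 are most robust as retries with exponential backoff around the API call. A generic sketch; fetch_fn stands in for whatever call your task makes, and RuntimeError is a placeholder for a real HTTP 429 exception:

```python
import time

def fetch_with_backoff(fetch_fn, retries: int = 4, base_delay: float = 1.0):
    """Retry fetch_fn with exponential backoff: 1s, 2s, 4s, ..."""
    for attempt in range(retries):
        try:
            return fetch_fn()
        except RuntimeError:  # placeholder for an HTTP 429 / rate-limit error
            if attempt == retries - 1:
                raise  # out of retries, surface the error to Airflow
            time.sleep(base_delay * (2 ** attempt))
```

In a real task you would catch the client's specific rate-limit exception rather than RuntimeError.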

Issue: Stale DAG definition

Symptoms: Changes to DAG file not reflected in UI

Solutions:

# 1. Force reimport
docker-compose exec airflow-webserver airflow dags reserialize

# 2. Restart scheduler
docker-compose restart airflow-scheduler

# 3. Check file was actually updated
docker-compose exec airflow-webserver cat /opt/airflow/dags/example_weather_dag.py | head -20

Resource Requirements

Resource | Minimum | Recommended
=========+=========+============
RAM      | 4 GB    | 8 GB
CPU      | 2 cores | 4 cores
Disk     | 5 GB    | 10 GB

Security Notes

  • This setup is for local development only
  • Default credentials should be changed for any non-local deployment
  • The PostgreSQL database is exposed on port 5432
  • No SSL/TLS is configured

Cleanup

To completely remove all Docker resources:

# Stop services and remove volumes
docker-compose down -v

# Remove the custom image
docker rmi $(docker images | grep openmeteo | awk '{print $3}')

# Remove all unused Docker resources (optional)
docker system prune -a

Support

For issues and questions: