Draft
31 commits
fc08d1f Async Simulation endpoint (Yannicked, May 12, 2026)
1eb037c Add migration (Yannicked, May 12, 2026)
aba96f5 Fix ty (Yannicked, May 12, 2026)
55d8e53 Remove automatic table creation (Yannicked, May 12, 2026)
795de15 Add Celery (Yannicked, May 18, 2026)
e6989d9 Copy files (Yannicked, May 19, 2026)
d6c654d Cleanup tasks (Yannicked, May 19, 2026)
7f9678f Add tests (Yannicked, May 19, 2026)
22f35ee Handle exceptions (Yannicked, May 20, 2026)
1fbe188 Sanitize path (Yannicked, May 20, 2026)
dc05598 Cleanup simulations post endpoint (Yannicked, May 21, 2026)
3fbfea6 Merge branch 'develop' into feature/celery-tasks (Yannicked, May 21, 2026)
d4fbb52 Add celery documentation (Yannicked, May 21, 2026)
2fbd598 Update ingestion status migration (Yannicked, May 21, 2026)
d4e6ac4 Fix typing (Yannicked, May 21, 2026)
68938a1 Cleanup tests (Yannicked, May 28, 2026)
03d2abb Fix issues (Yannicked, Jun 2, 2026)
af44d84 Test script (Yannicked, Jun 2, 2026)
a9ed16c Fix issues with imas data (Yannicked, Jun 2, 2026)
b60af62 Update test script for IMAS (Yannicked, Jun 2, 2026)
36b4bfa Update tests (Yannicked, Jun 2, 2026)
a7c992e Merge branch 'develop' into feature/celery-tasks (Yannicked, Jun 3, 2026)
287d0e1 Update docker files (Yannicked, Jun 4, 2026)
015242c Check for master.h5 in hdf5 imas (Yannicked, Jun 5, 2026)
14967aa Fix imas backend detection (Yannicked, Jun 8, 2026)
c21c06c Fix test (Yannicked, Jun 8, 2026)
05cf791 Temporarily comment celery worker and beat (Yannicked, Jun 8, 2026)
fc7c3ef Add uv lockfile (Yannicked, Jun 8, 2026)
95c100b Update revision string (Yannicked, Jun 9, 2026)
5b5eb9f Make mdsplus check superset (Yannicked, Jun 9, 2026)
c748208 Remove unused model (Yannicked, Jun 9, 2026)
71 changes: 71 additions & 0 deletions alembic/versions/b2c52ee8ff12_add_ingestion_status.py
@@ -0,0 +1,71 @@
"""Add ingestion status

Revision ID: b2c52ee8ff12
Revises: 28bee3aa2429
Create Date: 2026-05-11 16:16:03.768893

"""

from typing import Sequence, Union

import sqlalchemy as sa

from alembic import op

# revision identifiers, used by Alembic.
revision: str = "b2c52ee8ff12"
down_revision: Union[str, Sequence[str], None] = "28bee3aa2429"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    """Upgrade schema."""
    conn = op.get_bind()
    dialect = conn.dialect.name
    if dialect == "postgresql":
        op.execute(
            "CREATE TYPE ingestionstatus AS ENUM ('QUEUED', 'COPYING', 'COPIED', "
            "'VALIDATING', 'VALIDATED', 'COMPLETED', 'COPY_FAILED', "
            "'VALIDATION_FAILED')"
        )
    with op.batch_alter_table("simulations", schema=None) as batch_op:
        batch_op.add_column(
            sa.Column(
                "ingestion_status",
                sa.Enum(
                    "QUEUED",
                    "COPYING",
                    "COPIED",
                    "VALIDATING",
                    "VALIDATED",
                    "COMPLETED",
                    "COPY_FAILED",
                    "VALIDATION_FAILED",
                    name="ingestionstatus",
                ),
                nullable=True,
            )
        )
        batch_op.add_column(sa.Column("ingestion_version", sa.Integer(), nullable=True))
    op.execute(
        "UPDATE simulations SET ingestion_status = 'COMPLETED' WHERE ingestion_status "
        "IS NULL"
    )
    op.execute(
        "UPDATE simulations SET ingestion_version = 0 WHERE ingestion_version IS NULL"
    )
    with op.batch_alter_table("simulations", schema=None) as batch_op:
        batch_op.alter_column("ingestion_status", nullable=False)
        batch_op.alter_column("ingestion_version", nullable=False)

# Review comment (Collaborator): What is this ingestion version for?



def downgrade() -> None:
    """Downgrade schema."""
    with op.batch_alter_table("simulations", schema=None) as batch_op:
        batch_op.drop_column("ingestion_version")
        batch_op.drop_column("ingestion_status")
    conn = op.get_bind()
    dialect = conn.dialect.name
    if dialect == "postgresql":
        op.execute("DROP TYPE ingestionstatus")
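
The three phases of `upgrade()` (add nullable columns, backfill existing rows, then enforce `NOT NULL`) can be tried out in isolation. The following is a minimal sketch using Python's built-in `sqlite3` module rather than Alembic; the in-memory database, the `alias` column and the sample rows are assumptions made for illustration and are not part of this PR. The final `NOT NULL` step is replaced by a check that no NULLs remain, since SQLite cannot change a column's nullability in place (presumably the reason the migration goes through `batch_alter_table`).

```python
import sqlite3

# Assumption: a toy stand-in for the real `simulations` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE simulations (id INTEGER PRIMARY KEY, alias TEXT)")
conn.executemany(
    "INSERT INTO simulations (alias) VALUES (?)", [("sim-a",), ("sim-b",)]
)

# Phase 1: add the new columns as nullable so existing rows remain valid.
conn.execute("ALTER TABLE simulations ADD COLUMN ingestion_status TEXT")
conn.execute("ALTER TABLE simulations ADD COLUMN ingestion_version INTEGER")

# Phase 2: backfill pre-existing rows, using the same statements as the migration.
conn.execute(
    "UPDATE simulations SET ingestion_status = 'COMPLETED' "
    "WHERE ingestion_status IS NULL"
)
conn.execute(
    "UPDATE simulations SET ingestion_version = 0 WHERE ingestion_version IS NULL"
)

# Phase 3 (stand-in): verify that a NOT NULL constraint could now be applied.
remaining_nulls = conn.execute(
    "SELECT COUNT(*) FROM simulations "
    "WHERE ingestion_status IS NULL OR ingestion_version IS NULL"
).fetchone()[0]
rows = conn.execute(
    "SELECT alias, ingestion_status, ingestion_version FROM simulations ORDER BY id"
).fetchall()
conn.close()
```

If `remaining_nulls` is zero after the backfill, tightening the columns to `NOT NULL` cannot fail on legacy rows.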
77 changes: 77 additions & 0 deletions docs/celery.md
@@ -0,0 +1,77 @@
# Celery async task processing

SimDB uses [Celery](https://docs.celeryproject.org/) to run asynchronous background
tasks such as copying simulation files and completing the ingestion pipeline.

## Overview

When simulations are uploaded via the REST API, the server offloads heavy operations
to Celery workers instead of blocking the HTTP request. Tasks are defined in
`src/simdb/workers/tasks.py`:

- `copy_files_task` — copies input/output files from source locations to the server's
upload folder and updates the simulation's ingestion status.
- `complete_ingestion_task` — marks a simulation as fully ingested.
- `validate_imas_task` — runs validation checks on IMAS data (placeholder).
- `send_email_task` — sends email notifications.

Tasks can be chained in the API endpoint:

```python
copy_files = copy_files_task.si(simulation.uuid, ...)
complete = complete_ingestion_task.si(simulation.uuid)
_ = (copy_files | complete).apply_async()
```

## Configuration

Celery is configured via `app.cfg`:

| Section | Option | Required | Description |
|---------|----------------|----------|--------------------------------------------------|
| celery | broker_url | no | Redis URL for the message broker. Defaults to `redis://localhost:6379/0` |
| celery | result_backend | no | Redis URL for results storage. Defaults to `redis://localhost:6379/0` |

Example:

```ini
[celery]
broker_url = redis://localhost:6379/0
result_backend = redis://localhost:6379/0
```
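
The defaulting behaviour described in the table can be sketched with the standard library's `configparser`. This is only an illustration under assumptions: the helper `load_celery_settings` and the constant `DEFAULT_REDIS_URL` are hypothetical names, and SimDB's actual configuration loader is not shown in this PR.

```python
from configparser import ConfigParser

# Default taken from the table above.
DEFAULT_REDIS_URL = "redis://localhost:6379/0"


def load_celery_settings(cfg_text: str) -> dict[str, str]:
    """Hypothetical helper: read the [celery] section, falling back to defaults."""
    parser = ConfigParser()
    parser.read_string(cfg_text)
    # `fallback` also covers the case where the whole [celery] section is missing.
    return {
        "broker_url": parser.get(
            "celery", "broker_url", fallback=DEFAULT_REDIS_URL
        ),
        "result_backend": parser.get(
            "celery", "result_backend", fallback=DEFAULT_REDIS_URL
        ),
    }


# Only the broker is overridden; the result backend keeps its default.
settings = load_celery_settings("[celery]\nbroker_url = redis://broker:6379/1\n")
```

Because both options are optional, an `app.cfg` without a `[celery]` section would resolve to the local Redis defaults in this sketch.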

## Running workers

### Standalone worker

Start a Celery worker using the built-in CLI:

```bash
simdb_celery worker

# Review comment (Collaborator): should be simdb_worker. I tried "simdb_celery worker" but it didn't work; simdb_worker works, and it is registered under that name in pyproject.toml.

```

### Worker with beat scheduler

For periodic tasks (e.g. cleanup, reports), run both the worker and beat:

```bash
# Terminal 1: worker
simdb_celery worker

# Review comment (Collaborator): simdb_worker


# Terminal 2: beat scheduler
simdb_celery beat

# Review comment (Collaborator): simdb_beat

```

### Flower monitoring

[Flower](https://flower.readthedocs.io/) provides a web UI for monitoring Celery
workers and tasks:

```bash
celery -A simdb.workers.celery flower --port=5555
```

## Testing with eager mode

In tests, set `task_always_eager = True` to run tasks synchronously without a
broker.
16 changes: 16 additions & 0 deletions docs/developer_guide.md
@@ -45,6 +45,22 @@ simdb_server

This will start a server on port 5000. You can test this server is running by opening http://localhost:5000 in a browser.

## Running Celery workers

For development, you typically want to run Celery tasks synchronously. This is
enabled by setting `task_always_eager = True` in tests (see `tests/remote/api/v1.3/test_simulations3.py`).

To run actual background workers during development:

```bash
# Worker
simdb_celery worker

# Review comment (Collaborator): simdb_worker


# Beat scheduler (if needed)
simdb_celery beat

# Review comment (Collaborator): simdb_beat

```

See the [Celery documentation](celery.md) for full details.
## Swagger API documentation

SimDB provides interactive Swagger API documentation for each API version. The documentation is automatically generated and accessible at different endpoints depending on the API version you want to explore.
6 changes: 6 additions & 0 deletions docs/maintenance_guide.md
@@ -332,6 +332,12 @@ service nginx restart

You should now be able to check the simdb server is running by going to the http address defined in your nginx site (localhost:80 in the example above).

## Celery background workers

SimDB uses Celery to run asynchronous background tasks such as copying simulation
files. See the [Celery documentation](celery.md) for details on configuration and
running workers.

#### Nginx Request Entity Size

You may need to increase the size of uploaded files that Nginx will accept. For SimDB this should be at least 100MB.
10 changes: 8 additions & 2 deletions pyproject.toml
@@ -92,13 +92,19 @@ build-docs = [
 postgres = [
     "psycopg2-binary>=2.8.0",
 ]
+celery = [
+    "celery>=5.3.0",
+    "redis>=5.0.0",
+]
 all = [
-    "imas-simdb[server, imas-validator, postgres]"
+    "imas-simdb[server, imas-validator, postgres, celery]",
 ]

 [project.scripts]
 simdb = "simdb.cli.simdb:main"
 simdb_server = "simdb.remote.wsgi:run"
+simdb_worker = "simdb.workers.cli:worker"
+simdb_beat = "simdb.workers.cli:beat"

 [project.urls]
 Homepage = "https://simdb.iter.org/dashboard/"
@@ -168,5 +174,5 @@ dev = [
     "pytest-cov>=5.0.0",
     "ruff~=0.15.0",
     "ty==0.0.34",
-    "imas-simdb[server, imas-validator, postgres, auth]"
+    "imas-simdb[server, imas-validator, postgres, auth, celery]"
 ]