Peer-to-peer system for distributed IoT sensor simulation and replicated state convergence. Each node runs sensors, state merge, membership/liveness, TCP protocol handling, and a read-only HTTP API.
Course: Distributed Systems - MSc in Computer Science and Engineering
Institution: University of Bologna (UNIBO)
Academic Year: 2025/2026
Course Report: Distributed Sensor Hub Final Report (PDF)
- Docs index
- Docker CD
- Course report (PDF)
- Course report repository (LaTeX source)
- Architecture
- Node services inventory
- Testing
- Introspection API
- Observability UI
| Module | Responsibility | README |
|---|---|---|
| `runtime/` | Startup, wiring, lifecycle | runtime/README.md |
| `protocol/` | Message contracts, codec, dispatcher, handlers | protocol/README.md |
| `networking/` | TCP client/server and framing | networking/README.md |
| `membership/` | Peer table, liveness metadata, membership merge | membership/README.md |
| `fd/` | Phi-accrual failure detection | fd/README.md |
| `gossip/` | Membership dissemination (`GOSSIP_STATE`) | gossip/README.md |
| `state/` | Local authoritative state and LWW merge | state/README.md |
| `sensors/` | Sensor providers and ingestion boundary | sensors/README.md |
| `topology/` | Topology policy and peer selection | topology/README.md |
| `webapi/` | Read-only HTTP observation API | webapi/README.md |
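TCP delivers a byte stream, not messages, so the `networking/` layer has to frame messages on top of it. The sketch below shows one common way to do that; the wire format (a 4-byte big-endian length prefix followed by UTF-8 JSON), the size cap, and the `type`/`sender` fields in the example are all assumptions for illustration, not the project's documented protocol.

```python
import json
import struct

# Assumed wire format: 4-byte big-endian unsigned length, then UTF-8 JSON.
HEADER = struct.Struct(">I")
MAX_FRAME_BYTES = 1 << 20  # illustrative cap on a single frame, not a project setting


def encode_frame(message: dict) -> bytes:
    """Serialize one message and prepend its length header."""
    payload = json.dumps(message, separators=(",", ":")).encode("utf-8")
    return HEADER.pack(len(payload)) + payload


class FrameDecoder:
    """Buffers raw TCP chunks and returns each complete frame once it has arrived."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> list:
        self._buffer.extend(chunk)
        messages = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer, 0)
            if length > MAX_FRAME_BYTES:
                raise ValueError(f"frame of {length} bytes exceeds limit")
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # header seen, body still incomplete
            payload = bytes(self._buffer[HEADER.size:end])
            del self._buffer[:end]
            messages.append(json.loads(payload))
        return messages
```

Feeding `encode_frame({"type": "PING", "sender": "node-1"})` to a decoder in two pieces returns `[]` for the first piece and the decoded message for the second, which is the partial-read case a socket reader has to handle.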
- Sensors emit readings into the local event queue.
- The state worker applies last-writer-wins (LWW) merges ordered by `(ts_ms, origin)`.
- The runtime runs periodic push/pull replication (`SENSOR_UPDATE`, `GET_DELTA`).
- Protocol handlers merge inbound deltas and snapshots.
- Membership is updated via `JOIN_REQUEST`, `PEER_LIST`, `PING`/`PONG`, and `GOSSIP_STATE`.
- The web API exposes snapshots (`/api/state`, `/api/updates`, `/api/membership`, `/api/introspection`).
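LWW on `(ts_ms, origin)` means a later timestamp wins and the origin node ID breaks ties, so every replica picks the same winner regardless of delivery order. A minimal sketch, assuming each reading is keyed by a hypothetical sensor identifier; the `Reading` fields other than `ts_ms` and `origin` are illustrative and the `state/` module's real data model may differ.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Reading:
    key: str      # hypothetical sensor identifier, e.g. "room-1/temp"
    value: float
    ts_ms: int    # write timestamp in milliseconds
    origin: str   # ID of the node that produced the reading


def wins(incoming: Reading, current: Optional[Reading]) -> bool:
    """LWW ordering on (ts_ms, origin): later timestamp wins, origin breaks ties."""
    if current is None:
        return True
    return (incoming.ts_ms, incoming.origin) > (current.ts_ms, current.origin)


def merge(state: Dict[str, Reading], incoming: Reading) -> bool:
    """Apply one update in place; return True only if local state changed."""
    if wins(incoming, state.get(incoming.key)):
        state[incoming.key] = incoming
        return True
    return False
```

Applying two concurrent writes with equal `ts_ms` in either order leaves both replicas holding the write from the lexicographically larger origin; re-delivering a stale or duplicate update returns `False` and changes nothing.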
Prerequisites:
- Python 3.14+
- pip
- Docker + Docker Compose
Install:
git clone https://github.com/AlexSantini10/distributed-sensor-hub.git
cd distributed-sensor-hub
pip install -r requirements.txt

Base configuration:
- Start from `.env.example`.
- Required identifiers/bindings: `NODE_ID`, `HOST`, `PORT`, `WEB_API_PORT`.
- Cluster bootstrap: `BOOTSTRAP_PEERS`.
- Inbound TCP resilience:
  - `SOCKET_TIMEOUT` (socket read/accept timeout in seconds)
  - `ACCEPT_QUEUE_SIZE` (listen backlog)
  - `MAX_CONNECTIONS` (max concurrent active inbound connections)
  - `MAX_WORKERS` (max concurrent inbound handler workers)
- Replication cadence/fanout: `GOSSIP_SYNC_INTERVAL_MS`, `GOSSIP_PUSH_*`, `GOSSIP_PULL_*`.
- Failure detection thresholds: `PHI_THRESHOLD_SUSPECT`, `PHI_THRESHOLD_DEAD`.
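A phi-accrual detector outputs a continuous suspicion level phi instead of a binary alive/dead verdict, and the two thresholds above cut that scale into states. The sketch assumes exponentially distributed heartbeat inter-arrival times, so phi = -log10(exp(-dt / mean)) = dt / (mean * ln 10); the `fd/` module may use a different distribution, window size, or heartbeat source, and the state labels here are hypothetical.

```python
import math
from collections import deque
from typing import Optional


class PhiAccrualDetector:
    """Tracks heartbeat arrivals from one peer and reports a suspicion level phi.

    Assumption: inter-arrival times are exponential with the observed mean,
    so P_later(dt) = exp(-dt / mean) and phi = dt / (mean * ln 10).
    """

    def __init__(self, window: int = 100, min_samples: int = 3) -> None:
        self._intervals = deque(maxlen=window)  # recent inter-arrival times (ms)
        self._last_ms: Optional[float] = None
        self._min_samples = min_samples

    def heartbeat(self, now_ms: float) -> None:
        """Record any message counted as proof of life from the peer."""
        if self._last_ms is not None:
            self._intervals.append(now_ms - self._last_ms)
        self._last_ms = now_ms

    def phi(self, now_ms: float) -> float:
        if self._last_ms is None or len(self._intervals) < self._min_samples:
            return 0.0  # too little history to suspect anyone
        mean = max(sum(self._intervals) / len(self._intervals), 1.0)  # floor at 1 ms
        return (now_ms - self._last_ms) / (mean * math.log(10))


def classify(phi: float, suspect: float, dead: float) -> str:
    """Map phi to a liveness label using PHI_THRESHOLD_SUSPECT / PHI_THRESHOLD_DEAD."""
    if phi >= dead:
        return "DEAD"
    if phi >= suspect:
        return "SUSPECT"
    return "ALIVE"
```

With heartbeats every 1000 ms, phi grows by about 0.43 per second of silence, so a threshold of 8 corresponds to roughly 18.4 s without news under this model. Raising a threshold trades slower detection for fewer false suspicions.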
Single node (PowerShell):
$env:NODE_ID="node-1"
$env:HOST="0.0.0.0"
$env:PORT="9000"
$env:BOOTSTRAP_PEERS=""
$env:WEB_API_PORT="10000"
python node.py

Docker topologies:
docker compose -f docker/docker-compose-base.yml up --build -d
docker compose -f docker/docker-compose-6-nodes.yml up --build -d
docker compose -f docker/docker-compose-12-nodes.yml up --build -d

Observability UI:
cd web
python -m http.server 8080

Then open http://localhost:8080 and set the API base URL to a node endpoint (for example, http://localhost:10000).
When many nodes and sensors are active, stale data in the UI is usually caused by pull storms, a delta history that is too small, or both.
Recommended baseline for docker-compose-6-nodes.yml, docker-compose-12-nodes.yml, and docker-compose-base.yml:
GOSSIP_SYNC_INTERVAL_MS: 1000
GOSSIP_PUSH_RATIO: 0.35
GOSSIP_PUSH_MIN_PEERS: 1
GOSSIP_PULL_RATIO: 0.05
GOSSIP_PULL_MIN_PEERS: 1
GOSSIP_PULL_EVERY_ROUNDS: 6
REPLICATION_DELTA_MAXLEN: 4096
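To see what these values mean per round, the sketch below turns a ratio and a minimum into a peer count. How the runtime actually combines `*_RATIO` and `*_MIN_PEERS` is an assumption here: `ceil(ratio * peers)`, raised to the minimum and capped at the number of known peers, with pull running once every `GOSSIP_PULL_EVERY_ROUNDS` rounds.

```python
import math


def fanout(n_peers: int, ratio: float, min_peers: int) -> int:
    """Peers contacted per round: a fraction of known peers, with a floor.

    Assumption: ceil() rounding, capped at the number of known peers.
    """
    if n_peers <= 0:
        return 0
    return min(n_peers, max(min_peers, math.ceil(ratio * n_peers)))


def pull_due(round_index: int, every_rounds: int) -> bool:
    """Assumption: pull runs once every `every_rounds` sync rounds."""
    return round_index % every_rounds == 0
```

Under these assumptions, each node in the 12-node topology (11 peers) pushes to `fanout(11, 0.35, 1) == 4` peers every round and pulls from `fanout(11, 0.05, 1) == 1` peer every 6 rounds, about one pull every 6 s at a 1000 ms interval. In the 6-node topology (5 peers) the push fanout drops to 2. Keeping pull this sparse is what prevents the pull storms mentioned above.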
Sensor cadence for demos (mixed load, clearer UI behavior):
- Keep a few "fast" sensors at 1200-2500 ms.
- Keep non-critical sensors at 10000-15000 ms.
- Avoid making all sensors fast at the same time.
Practical rule of thumb:
- If you add more fast sensors, increase `REPLICATION_DELTA_MAXLEN` first.
- Increase pull pressure only if freshness is low and push alone is not enough.
- Keep pull less aggressive than push in stable networks.
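The first rule follows from simple arithmetic: a bounded delta history holds `REPLICATION_DELTA_MAXLEN` entries, so the faster updates arrive, the shorter the time window it covers. The sketch assumes every update a node applies, local or replicated, occupies one slot, and uses a hypothetical sensor mix; the real buffer may count entries differently.

```python
def update_rate(sensor_periods_ms: list) -> float:
    """Updates per second produced by sensors with the given periods."""
    return sum(1000.0 / period for period in sensor_periods_ms)


def retention_window_s(maxlen: int, rate_per_s: float) -> float:
    """Seconds of history a bounded buffer of `maxlen` entries can hold."""
    return maxlen / rate_per_s


# Hypothetical cluster: 12 nodes, each with 3 fast (2000 ms) and 5 slow (12000 ms) sensors.
cluster = 12 * update_rate([2000] * 3 + [12000] * 5)
print(f"{cluster:.1f} updates/s -> {retention_window_s(4096, cluster):.0f} s of history")
# 23.0 updates/s -> 178 s of history

# Same cluster with all 8 sensors fast.
all_fast = 12 * update_rate([2000] * 8)
print(f"{all_fast:.1f} updates/s -> {retention_window_s(4096, all_fast):.0f} s of history")
# 48.0 updates/s -> 85 s of history
```

Making every sensor fast roughly halves how far back a lagging peer can catch up with a delta, which is why the buffer should grow before the number of fast sensors does.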
Symptoms and actions:
- Many `sensor_update_received` events with `applied:false` and `source:"pull"`: reduce `GOSSIP_PULL_RATIO` and/or increase `GOSSIP_PULL_EVERY_ROUNDS`.
- Frequent `DELTA_UNAVAILABLE` or very old UI timestamps: increase `REPLICATION_DELTA_MAXLEN`.
- CPU/network pressure too high: increase `GOSSIP_SYNC_INTERVAL_MS` (for example, to 1200-1500) and slow down non-critical sensors.
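The `DELTA_UNAVAILABLE` symptom is what a bounded history produces when a peer asks for entries that were already evicted. The sketch below models that with a sequence-numbered `collections.deque`; the sequence scheme, the `DeltaLog` class, and the response dictionaries are hypothetical, and only the `DELTA_UNAVAILABLE` name comes from this README.

```python
from collections import deque
from typing import Any


class DeltaLog:
    """Bounded replication history with sequence numbers (hypothetical design).

    deque(maxlen=...) silently drops the oldest entry once full, which is what
    makes old deltas unavailable to peers that fall too far behind.
    """

    def __init__(self, maxlen: int = 4096) -> None:
        self.entries = deque(maxlen=maxlen)
        self.next_seq = 0

    def append(self, update: Any) -> int:
        seq = self.next_seq
        self.entries.append((seq, update))
        self.next_seq += 1
        return seq

    def is_full(self) -> bool:
        return len(self.entries) == self.entries.maxlen

    def since(self, last_seen: int) -> dict:
        """Answer a delta pull from a peer whose newest known entry is `last_seen`."""
        oldest = self.entries[0][0] if self.entries else self.next_seq
        if last_seen + 1 < oldest:
            # The entries the peer still needs were already evicted.
            return {"status": "DELTA_UNAVAILABLE", "oldest_seq": oldest}
        return {"status": "OK", "updates": [u for s, u in self.entries if s > last_seen]}
```

With `maxlen=3` and five appended updates, a peer that last saw sequence 1 still gets the three retained entries, while a peer that last saw sequence 0 gets `DELTA_UNAVAILABLE`. A larger `maxlen` widens that gap, and `is_full()` staying true at all times is the "constantly full" condition from the validation checklist.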
Suggested safe ranges:
GOSSIP_SYNC_INTERVAL_MS: 800-1500
GOSSIP_PUSH_RATIO: 0.25-0.5
GOSSIP_PULL_RATIO: 0.05-0.25
GOSSIP_PULL_EVERY_ROUNDS: 3-8
REPLICATION_DELTA_MAXLEN: 2048-8192
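A small pre-flight check can catch values outside these ranges before restarting a topology. This is a hypothetical helper, not part of the repository; it treats the bounds as inclusive, skips unset variables, and compares the two ratios only as a rough proxy for "keep pull less aggressive than push", since pull also runs less often.

```python
import os
from typing import List, Mapping

# Suggested safe ranges from this README, treated as inclusive bounds.
SAFE_RANGES = {
    "GOSSIP_SYNC_INTERVAL_MS": (800, 1500),
    "GOSSIP_PUSH_RATIO": (0.25, 0.5),
    "GOSSIP_PULL_RATIO": (0.05, 0.25),
    "GOSSIP_PULL_EVERY_ROUNDS": (3, 8),
    "REPLICATION_DELTA_MAXLEN": (2048, 8192),
}


def check_tuning(env: Mapping[str, str]) -> List[str]:
    """Return one warning per setting outside its suggested range."""
    warnings = []
    for name, (low, high) in SAFE_RANGES.items():
        raw = env.get(name)
        if raw is None:
            continue  # unset variables are not checked here
        value = float(raw)
        if not low <= value <= high:
            warnings.append(f"{name}={value:g} is outside {low}-{high}")
    push, pull = env.get("GOSSIP_PUSH_RATIO"), env.get("GOSSIP_PULL_RATIO")
    if push is not None and pull is not None and float(pull) > float(push):
        warnings.append("GOSSIP_PULL_RATIO is higher than GOSSIP_PUSH_RATIO")
    return warnings


if __name__ == "__main__":
    for warning in check_tuning(os.environ):
        print("warning:", warning)
```

The recommended baseline passes with no warnings; setting `GOSSIP_PULL_RATIO=0.4` triggers both the range warning and the pull-versus-push warning.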
Validation checklist after tuning:
- Restart the Compose stack and wait at least 1-2 minutes.
- Check the introspection counters: `get_delta_unavailable_total` should stay near `0`.
- Verify that the retained delta buffer is not constantly full.
- In the UI, timestamps for "fast" sensors should mostly be recent (seconds, not minutes).

`applied:false` events can occur, but they should not dominate traffic for long periods.
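The counter check can be scripted against a node's web API. This sketch uses only the standard library; because the JSON layout of `/api/introspection` is not described here, it assumes nothing about nesting and sums every numeric `get_delta_unavailable_total` value it finds anywhere in the response.

```python
import json
import urllib.request
from typing import Any, Iterator


def find_numbers(payload: Any, name: str) -> Iterator[float]:
    """Yield every numeric value stored under key `name`, at any nesting depth."""
    if isinstance(payload, dict):
        for key, value in payload.items():
            if key == name and isinstance(value, (int, float)):
                yield value
            else:
                yield from find_numbers(value, name)
    elif isinstance(payload, list):
        for item in payload:
            yield from find_numbers(item, name)


def fetch_introspection(base_url: str, timeout: float = 5.0) -> Any:
    """GET <base_url>/api/introspection and decode the JSON body."""
    url = base_url.rstrip("/") + "/api/introspection"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def delta_unavailable_total(payload: Any) -> float:
    return sum(find_numbers(payload, "get_delta_unavailable_total"))
```

For example, `delta_unavailable_total(fetch_introspection("http://localhost:10000"))` reads the counter from the single node started earlier. Since it is a running total, sampling it twice about a minute apart and comparing the values shows whether it is still growing.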
- Unit + integration overview: docs/testing.md
- Quick local run:
pytest --maxfail=1
GitHub Actions builds the node container image from docker/Dockerfile.base on every push to main and publishes rolling tags to ghcr.io/<owner>/<repo>.
When you push a version tag such as v1.0.0, the workflow also creates the GitHub Release for that commit and attaches a small usage bundle with instructions, a .env example, and a minimal Compose file pinned to the released image digest. Pull requests to main still validate that the image builds, but they do not publish artifacts. Full usage details are documented in docs/docker-cd.md.
- Log output goes to `LOG_FILE`, which is mounted to `logs/` in Docker setups.
- Protocol `ERROR`/`ACK` handlers are currently placeholders in the runtime setup.
- License: LICENSE