Note: Mark vague entries that lack a measurable target, interface specification, or test strategy with `<!-- TODO: add measurable target, interface spec, test strategy -->`.
Centralized database maintenance orchestration (`database_maintenance_orchestrator.cpp`) and default schedule bundles (`maintenance_registry.cpp`). The orchestrator provides schedule CRUD, cron-based dispatch via `TaskScheduler`, sequential DAG execution of 19 task types, maintenance window enforcement, audit logging, observability metrics, a per-module health probe registry, and the `MaintenanceApiHandler` (11 HTTP REST endpoints). Enhancements focus on persistence, explicit DAG dependencies, module task wiring, and distributed coordination.
- [x] Schedules must survive server restarts — implemented via `MaintenanceScheduleStore` (RocksDB, v1.1.0).
- [x] `schedules_mutex_` was held exclusively for all read operations (`listSchedules`, `getSchedule`) — upgraded to `shared_mutex` in v1.2.0; read operations now use `std::shared_lock`.
- [ ] `halt_on_task_failure` semantics must be preserved: a failed task stops execution of subsequent tasks in the same run; parallel task execution must not be introduced without preserving this contract.
- [ ] All admin operations (DELETE, PATCH, POST `/run`) must be atomic with respect to the running cron job; no partial state may be visible to concurrent readers.
- [x] Module-delegated tasks (`STORAGE_COMPACTION`, `REPLICA_VALIDATION`, `MVCC_CLEANUP`, `METRICS_COLLECTION`) must dispatch through a registered `IMaintenanceTaskHandler` interface — direct module coupling in `executeTask()` is forbidden. Implemented in v1.2.0.
| Interface | Consumer | Notes |
|---|---|---|
| `DatabaseMaintenanceOrchestrator::registerTaskHandler(type, handler)` | Storage, sharding, replication modules | Registers a real implementation for a delegated task type |
| `IMaintenanceTaskHandler::execute(schedule_id, task_type, params) → Result` | `executeTask()` in orchestrator | Called when the cron job fires for this task type |
| `MaintenanceScheduleStore::save(entry)` / `load(id)` / `loadAll()` | `DatabaseMaintenanceOrchestrator` | RocksDB-backed persistence; replaces the in-memory `schedules_` map |
| `MaintenanceApiHandler` REST API | HTTP server, admin CLI | 11 endpoints; RBAC scopes `maintenance:read/write/admin` |
| `DistributedLock::tryAcquire(key, ttl)` | Orchestrator cron dispatch path | Prevents two nodes running the same schedule simultaneously |
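A minimal sketch of the handler contract from the table above. The method signatures follow the interface rows; the fields of `Result`, the `params` type, and the enum excerpt are assumptions for illustration, not the exact header.

```cpp
#include <memory>
#include <string>

// Excerpt of the 19 orchestrated task types.
enum class MaintenanceTaskType {
    STORAGE_COMPACTION, REPLICA_VALIDATION, MVCC_CLEANUP, METRICS_COLLECTION
    // ... remaining task types
};

// Outcome of a single task run; field names are illustrative.
struct Result {
    bool ok = false;
    std::string message;
};

// Contract every module-delegated task implements.
class IMaintenanceTaskHandler {
public:
    virtual ~IMaintenanceTaskHandler() = default;
    // Invoked by executeTask() when the cron job fires for this task type.
    virtual Result execute(const std::string& schedule_id,
                           MaintenanceTaskType task_type,
                           const std::string& params) = 0;
};

class DatabaseMaintenanceOrchestrator {
public:
    // Modules call this at startup to wire a real implementation
    // for a delegated task type.
    void registerTaskHandler(MaintenanceTaskType type,
                             std::shared_ptr<IMaintenanceTaskHandler> handler);
};
```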
Priority: High Target Version: v1.1.0
Schedules are currently in-memory (`std::unordered_map<std::string, MaintenanceScheduleEntry> schedules_`). They are lost on every server restart. Operators must re-create all schedules after each deployment.
Implementation Notes:
- [x] Add a `MaintenanceScheduleStore` class wrapping the existing `StorageEngine` API; key format: `maint_sched::{id}` (UTF-8 JSON value).
- [x] In `DatabaseMaintenanceOrchestrator::start()`, call `MaintenanceScheduleStore::loadAll()` and populate `schedules_` before registering cron jobs.
- [x] In `createSchedule`, `updateSchedule`, `patchSchedule`, `deleteSchedule` — persist the change to RocksDB inside the `schedules_mutex_` critical section (write-through).
- [x] Corrupt schedule JSON on load: log WARN and skip that entry; all valid entries must be loaded (see the sketch after the performance targets).
- [x] Add a restart-persistence integration test: create 3 schedules, restart the orchestrator, verify all 3 are present.
Performance Targets:
- `loadAll()` at startup: ≤ 100 ms for 10 000 stored schedules.
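A sketch of the startup load path, assuming a prefix scan on the `StorageEngine` and a JSON decoder; `scanPrefix()`, `parseScheduleJson()`, and the logging macro are illustrative names, not confirmed APIs.

```cpp
#include <string>
#include <utility>
#include <vector>

std::vector<MaintenanceScheduleEntry> MaintenanceScheduleStore::loadAll() {
    std::vector<MaintenanceScheduleEntry> entries;
    // Keys follow the maint_sched::{id} format; values are UTF-8 JSON.
    for (const auto& [key, value] : storage_->scanPrefix("maint_sched::")) {
        MaintenanceScheduleEntry entry;
        if (!parseScheduleJson(value, &entry)) {
            // Corrupt JSON: WARN and skip, so one bad record never blocks startup.
            LOG_WARN("skipping corrupt schedule record: key=" + key);
            continue;
        }
        entries.push_back(std::move(entry));
    }
    return entries;  // caller populates schedules_ before registering cron jobs
}
```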
Priority: High Target Version: v1.1.0
There is no way to trigger a schedule outside its maintenance window without editing the window configuration. Operators need an emergency override for urgent maintenance.
Implementation Notes:
- [x] Add `POST /api/v1/maintenance/schedules/{id}/run` with optional body `{"force": true}`.
- [x] When `force: true`, bypass the UTC window check in `executeSchedule()`; set `forced: true` in the audit log entry (see the sketch below).
- [x] Require `maintenance:admin` scope for the force flag; `maintenance:write` allows manual trigger within the window only.
- [x] Unit test: schedule with a window that excludes the current hour; force-run triggers execution; regular run is skipped.
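A sketch of the authorization split on the force-run path. The scope names and bypass semantics come from the notes above; the request/response helpers (`jsonBody()`, `auth().hasScope()`, `triggerNow()`) are illustrative.

```cpp
#include <string>

HttpResponse MaintenanceApiHandler::handleRunSchedule(const HttpRequest& req,
                                                      const std::string& schedule_id) {
    const bool force = req.jsonBody().value("force", false);
    // maintenance:write may trigger inside the window; force requires maintenance:admin.
    const std::string required_scope = force ? "maintenance:admin" : "maintenance:write";
    if (!req.auth().hasScope(required_scope)) {
        return HttpResponse::forbidden("missing scope: " + required_scope);
    }
    // force=true bypasses the UTC window check; the audit entry records forced=true.
    auto result = orchestrator_->triggerNow(schedule_id, /*bypass_window=*/force);
    return result.ok ? HttpResponse::accepted() : HttpResponse::conflict(result.message);
}
```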
Priority: Medium Target Version: v1.2.0 ✅ Implemented
Task execution order was previously determined solely by list order in `MaintenanceScheduleEntry::tasks`. There were no explicit dependency declarations, making it impossible to express "run WAL rotation before compaction" without relying on position.
Implementation Notes:
- [x] Add `MaintenanceTaskDependency` struct: `{ task_type: MaintenanceTaskType, depends_on: vector<MaintenanceTaskType> }`.
- [x] Add `MaintenanceScheduleEntry::task_dependencies` field (optional; defaults to sequential list order).
- [x] Implement topological sort of the dependency graph using Kahn's algorithm in `DatabaseMaintenanceOrchestrator::resolveTaskExecutionOrder()` (sketched after the performance targets).
- [x] Cycle detection: reject schedule creation / update with a cycle; return `ERR_UTIL_INVALID_ARGUMENT`.
- [x] Tests: DAG ordering correctness, cycle rejection, cascading failure with `halt_on_task_failure`.
Performance Targets:
- Topological sort: O(V+E); V=19 max task types — negligible overhead.
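A sketch of `resolveTaskExecutionOrder()` using Kahn's algorithm, as named above. The exact member signature is an assumption; here an empty result signals a cycle, which the caller maps to `ERR_UTIL_INVALID_ARGUMENT`.

```cpp
#include <queue>
#include <unordered_map>
#include <vector>

std::vector<MaintenanceTaskType> resolveTaskExecutionOrder(
        const std::vector<MaintenanceTaskType>& tasks,
        const std::vector<MaintenanceTaskDependency>& deps) {
    std::unordered_map<MaintenanceTaskType, std::vector<MaintenanceTaskType>> dependents;
    std::unordered_map<MaintenanceTaskType, int> in_degree;
    for (auto t : tasks) in_degree[t];  // every task starts with in-degree 0
    for (const auto& d : deps) {
        for (auto prereq : d.depends_on) {
            dependents[prereq].push_back(d.task_type);  // edge: prerequisite -> dependent
            ++in_degree[d.task_type];
        }
    }
    std::queue<MaintenanceTaskType> ready;
    for (const auto& [t, deg] : in_degree)
        if (deg == 0) ready.push(t);
    std::vector<MaintenanceTaskType> order;
    while (!ready.empty()) {
        auto t = ready.front(); ready.pop();
        order.push_back(t);
        for (auto next : dependents[t])
            if (--in_degree[next] == 0) ready.push(next);
    }
    if (order.size() != in_degree.size()) return {};  // cycle detected
    return order;
}
```

With at most 19 task types, V and E stay tiny, which is why the O(V+E) cost above is negligible.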
Priority: Medium Target Version: v1.2.0 ✅ Implemented
`executeTask()` in `database_maintenance_orchestrator.cpp` previously succeeded immediately for all delegated task types (`STORAGE_COMPACTION`, `REPLICA_VALIDATION`, `MVCC_CLEANUP`, etc.) without calling any real module code. This was documented in ROADMAP.md as a known limitation.
Implementation Notes:
- [x] Add `registerTaskHandler(MaintenanceTaskType, std::shared_ptr<IMaintenanceTaskHandler>)` to the orchestrator public API.
- [x] `StorageModule` registers a handler for `STORAGE_COMPACTION` that calls `CompactionManager::compactAll()`. (`StorageCompactionHandler` impl in `maintenance_task_handler_impls.h`; wired in `http_server.cpp`, Issue #4587.)
- [~] `ShardingModule` registers a handler for `REPLICA_VALIDATION` that calls the consistency checker. (`ReplicaValidationHandler` impl provided; startup wiring call site pending — Issue: REPLICA_VALIDATION wiring.)
- [x] `StorageEngine` registers a handler for `MVCC_CLEANUP` that triggers MVCC tombstone GC. (`MvccCleanupHandler` impl provided; wired in `http_server.cpp`, Issue #4586.)
- [x] For unregistered task types, `executeTask()` returns a `SKIPPED` result with a structured log message indicating no handler is registered (see the dispatch sketch below).
- [x] Add a `GET /api/v1/maintenance/task-handlers` endpoint listing registered handlers per task type.
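A sketch of the delegated dispatch inside `executeTask()`: resolve the registered handler or return `SKIPPED`. `TaskResult`, `handlers_mutex_`, `task_handlers_`, `toString()`, and the logging macro are illustrative names.

```cpp
#include <memory>
#include <shared_mutex>
#include <string>

TaskResult DatabaseMaintenanceOrchestrator::executeTask(const std::string& schedule_id,
                                                        MaintenanceTaskType type,
                                                        const std::string& params) {
    std::shared_ptr<IMaintenanceTaskHandler> handler;
    {
        std::shared_lock lock(handlers_mutex_);  // registry is read-mostly
        if (auto it = task_handlers_.find(type); it != task_handlers_.end())
            handler = it->second;
    }
    if (!handler) {
        // No handler registered: SKIPPED (never a fake success), with a log entry.
        LOG_INFO("maintenance task skipped, no handler registered: type=" +
                 toString(type) + " schedule=" + schedule_id);
        return TaskResult::skipped();
    }
    return handler->execute(schedule_id, type, params);
}
```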
Priority: Medium Target Version: v1.2.0 ✅ Implemented
`database_maintenance_orchestrator.cpp` used `std::lock_guard<std::mutex>` (exclusive) for all read operations (`listSchedules`, `getSchedule`, `listJobs`, `getJob`). Under concurrent admin API load, all readers serialized unnecessarily.
Implementation Notes:
- [x] Replace `std::mutex schedules_mutex_` and `std::mutex jobs_mutex_` with `std::shared_mutex`; upgrade `listSchedules`, `getSchedule`, `listJobs`, `getJob` to `std::shared_lock` (see the sketch below).
- [x] All write operations (`createSchedule`, `updateSchedule`, `patchSchedule`, `deleteSchedule`, `pruneOldJobs`) use `std::unique_lock`.
- [ ] Add a TSAN-enabled test with 8 concurrent `listSchedules` threads + 1 `createSchedule` thread.
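A sketch of the v1.2.0 read/write split: `shared_lock` for readers, `unique_lock` for writers. The member bodies are illustrative; note `schedules_mutex_` must be declared `mutable` for const read methods.

```cpp
#include <shared_mutex>
#include <string>
#include <vector>

std::vector<MaintenanceScheduleEntry>
DatabaseMaintenanceOrchestrator::listSchedules() const {
    std::shared_lock lock(schedules_mutex_);  // concurrent readers no longer serialize
    std::vector<MaintenanceScheduleEntry> out;
    out.reserve(schedules_.size());
    for (const auto& [id, entry] : schedules_) out.push_back(entry);
    return out;
}

bool DatabaseMaintenanceOrchestrator::deleteSchedule(const std::string& id) {
    std::unique_lock lock(schedules_mutex_);  // writers keep exclusive access
    return schedules_.erase(id) > 0;          // RocksDB write-through omitted here
}
```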
Priority: High (production multi-node) Target Version: v2.1.0 (interface implemented v2.0.0; Raft backend pending)
In a multi-node cluster, each node independently schedules and fires maintenance jobs. Two nodes may run the same schedule concurrently, causing compaction storms or double maintenance.
Implementation Notes:
- [x] `IDistributedLock` interface + `InProcessDistributedLock` implementation (`include/maintenance/i_distributed_lock.h`).
- [x] `setDistributedLock(shared_ptr<IDistributedLock>)` DI injection; RAII lock guard in `executeSchedule()` (sketched below).
- [x] `tryAcquire(schedule_id, ttl=window_duration_ms + 30s)`; non-leader nodes log SKIPPED at DEBUG level.
- [x] Lock TTL ≥ estimated task duration + 30 s; configurable via `MaintenanceScheduleEntry::lock_ttl_ms`.
- [ ] Integrate a Raft-backed implementation that forwards acquire/release to `src/replication/raft_v2.cpp` or a dedicated distributed lock service.
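A sketch of the RAII guard on the cron dispatch path, assuming `tryAcquire()` returns a falsy-on-failure guard object that releases the lock on scope exit; that return shape, `window_duration_ms`, and `runTasks()` are assumptions about the interface.

```cpp
#include <cstdint>

void DatabaseMaintenanceOrchestrator::executeSchedule(const MaintenanceScheduleEntry& s) {
    const uint64_t ttl_ms = s.lock_ttl_ms > 0
        ? s.lock_ttl_ms
        : s.window_duration_ms + 30'000;  // TTL >= estimated duration + 30 s
    auto guard = distributed_lock_->tryAcquire(s.id, ttl_ms);
    if (!guard) {
        // Another node won the lock: skip quietly at DEBUG, no duplicate run.
        LOG_DEBUG("schedule skipped, lock held elsewhere: " + s.id);
        return;
    }
    runTasks(s);  // guard releases the lock on scope exit
}
```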
Scientific References:
- [1] Chandra, T. D., & Toueg, S. (1996). Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 43(2), 225–267. DOI: 10.1145/226643.226647
- [2] Ongaro, D., & Ousterhout, J. (2014). In search of an understandable consensus algorithm. USENIX Annual Technical Conference (ATC '14), 305–319. URL: https://raft.github.io/raft.pdf
Priority: Low Target Version: v2.0.0 ✅ Implemented
All schedules currently share a single global namespace and window. In a SaaS deployment, different tenants need independent maintenance windows and quotas.
Implementation Notes:
- [x] Add `MaintenanceScheduleEntry::tenant_id` (optional; empty = global/system schedule).
- [x] Per-tenant window enforcement: a tenant's schedule fires only when the current hour is within that tenant's configured maintenance window, loaded from the tenant config.
- [x] Per-tenant quota: max N concurrent running maintenance jobs per tenant; enforced in `executeSchedule()` (see the gating sketch after the implementation details).
- [x] Admin API: `GET /api/v1/maintenance/schedules?tenant_id={id}` filters by tenant.
Implementation Details:
- `TenantMaintenanceConfig` struct added to `database_maintenance_orchestrator.h`: `enforce_window`, `window_start_hour`, `window_end_hour`, `max_concurrent_jobs`.
- `DatabaseMaintenanceOrchestrator::setTenantMaintenanceConfig(tenant_id, config)` / `getTenantMaintenanceConfig(tenant_id)` — thread-safe via `tenant_configs_mutex_` (`shared_mutex`).
- `listSchedules(tenant_id_filter = "")` — empty filter returns all, non-empty returns only the matching tenant.
- `MaintenanceApiHandler::listSchedules(tenant_id = "")` — API handler passes the filter to the orchestrator.
- `OrchestratorJob::tenant_id` populated from the parent schedule in `triggerNow()` and `registerWithScheduler()`.
- 15 unit tests (MT-01..MT-15) in `test_database_maintenance_orchestrator.cpp`.
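A sketch of the per-tenant gate evaluated in `executeSchedule()`. The config fields match `TenantMaintenanceConfig` above; `tenantAllowsRun()`, `currentUtcHour()`, and `runningJobCountForTenant()` are illustrative helper names.

```cpp
#include <shared_mutex>

bool DatabaseMaintenanceOrchestrator::tenantAllowsRun(const MaintenanceScheduleEntry& s) {
    if (s.tenant_id.empty()) return true;  // global/system schedule: no tenant gate
    TenantMaintenanceConfig cfg;
    {
        std::shared_lock lock(tenant_configs_mutex_);
        auto it = tenant_configs_.find(s.tenant_id);
        if (it == tenant_configs_.end()) return true;  // no config registered
        cfg = it->second;
    }
    if (cfg.enforce_window) {
        const int hour = currentUtcHour();
        const bool in_window = cfg.window_start_hour <= cfg.window_end_hour
            ? hour >= cfg.window_start_hour && hour < cfg.window_end_hour
            : hour >= cfg.window_start_hour || hour < cfg.window_end_hour;  // wraps midnight
        if (!in_window) return false;
    }
    // Quota: at most max_concurrent_jobs running jobs per tenant.
    return runningJobCountForTenant(s.tenant_id) < cfg.max_concurrent_jobs;
}
```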
Priority: Low Target Version: v3.0.0
Before executing a maintenance job, predict the CPU/memory impact using an ML model trained on historical job telemetry. Allow operators to defer scheduling if predicted impact exceeds thresholds.
Implementation Notes:
- [ ] Collect job telemetry (task type, duration, CPU %, memory delta) via `MetricsCollector` — basis for training data.
- [ ] Lightweight inference model (decision tree or linear regression) embedded in the orchestrator; no external service dependency.
- [ ] `MaintenanceScheduleEntry::max_predicted_cpu_pct` and `max_predicted_mem_mb` — defer when predicted cost exceeds thresholds (see the sketch below).
- [ ] Impact estimate surfaced in `getStatus()` JSON response.
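Since this item is unimplemented, here is only a minimal sketch of what the embedded inference step could look like, assuming a per-task linear model over two features; the coefficients, feature choice, and every name below are placeholders, with real weights coming from offline training on `MetricsCollector` telemetry.

```cpp
struct ImpactPrediction { double cpu_pct; double mem_mb; };

ImpactPrediction predictImpact(double avg_duration_ms, double data_size_gb) {
    // cpu% ~ base + a * duration + b * data size (placeholder coefficients)
    const double cpu_pct = 5.0 + 0.002 * avg_duration_ms + 1.5 * data_size_gb;
    const double mem_mb  = 64.0 + 12.0 * data_size_gb;
    return {cpu_pct, mem_mb};
}

bool shouldDefer(const MaintenanceScheduleEntry& s, const ImpactPrediction& p) {
    // Defer when the predicted cost exceeds the schedule's configured thresholds.
    return p.cpu_pct > s.max_predicted_cpu_pct || p.mem_mb > s.max_predicted_mem_mb;
}
```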
Scientific References:
- [3] Pavlo, A., Angulo, G., Arulraj, J., Lin, H., Lin, J., Ma, L., Menon, P., Mowry, T., Perron, M., Quah, I., Santurkar, S., Tomasic, A., Touw, W., Van Aken, D., Wang, Z., White, L., Zhang, G., Zhong, R., & Zhang, T. (2017). Self-driving database management systems. CIDR 2017. URL: https://db.cs.cmu.edu/papers/2017/p42-pavlo-cidr17.pdf
- [4] Van Aken, D., Pavlo, A., Gordon, G. J., & Zhang, B. (2017). Automatic database management system tuning through large-scale machine learning. SIGMOD 2017, 1009–1024. DOI: 10.1145/3035918.3064029
Priority: Medium Target Version: v2.1.0
REPLICA_VALIDATION tasks are currently unhandled: no handler is registered at startup. The sharding/replica module needs to register a `ReplicaValidationHandler` during its startup sequence.
Implementation Notes:
- [~] `ReplicaValidationHandler` class already provided in `include/maintenance/maintenance_task_handler_impls.h`.
- [ ] Sharding module startup: call `orchestrator.registerTaskHandler(REPLICA_VALIDATION, make_shared<ReplicaValidationHandler>(replica_manager))` (see the wiring sketch below).
- [ ] `ReplicaValidationHandler::execute()` calls the consistency checker in `src/replication/` and returns a structured `Result<void>`.
- [ ] Unit test: register handler, trigger REPLICA_VALIDATION schedule, verify handler invoked.
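A sketch of the pending startup wiring. The handler class exists per the notes above; the call site below is the open item, and `ShardingModule::start` plus `replica_manager_` are illustrative names for wherever the sharding module initializes.

```cpp
#include <memory>

void ShardingModule::start(DatabaseMaintenanceOrchestrator& orchestrator) {
    // Wire the existing handler so REPLICA_VALIDATION stops being SKIPPED.
    orchestrator.registerTaskHandler(
        MaintenanceTaskType::REPLICA_VALIDATION,
        std::make_shared<ReplicaValidationHandler>(replica_manager_));
}
```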
- Unit tests (≥55, including MT-01..MT-15): extend with TSAN concurrent-read stress and REPLICA_VALIDATION handler registration.
- Integration tests: restart-persistence (RocksDB round-trips); distributed lock with mock Raft; concurrent admin API stress (TSAN, 8 readers + 1 writer).
- Performance benchmarks: `loadAll()` with 10 K schedules; `listSchedules()` under 8 concurrent readers.
  - `loadAll()` at startup with 10 K schedules: ≤ 100 ms.
  - `listSchedules()` read path under 8 concurrent admin API requests: ≤ 2 ms p99.
  - Topological sort of 19-node task DAG: ≤ 1 µs.
- All schedule mutations are audit-logged via `AuditLogger::logEvent()` with caller identity and HLC timestamp.
- Force-run requires `maintenance:admin` JWT scope.
- Distributed lock prevents concurrent execution of the same schedule across cluster nodes.
- `halt_on_task_failure` ensures a single failed task stops cascading damage.
[1] T. D. Chandra and S. Toueg, "Unreliable failure detectors for reliable distributed systems," Journal of the ACM, vol. 43, no. 2, pp. 225–267, Mar. 1996. DOI: 10.1145/226643.226647
[2] D. Ongaro and J. Ousterhout, "In search of an understandable consensus algorithm," in Proc. USENIX Annual Technical Conference (ATC '14), Philadelphia, PA, USA, Jun. 2014, pp. 305–319. URL: https://raft.github.io/raft.pdf
[3] A. Pavlo et al., "Self-driving database management systems," in Proc. 8th Biennial Conference on Innovative Data Systems Research (CIDR 2017), Chaminade, CA, USA, Jan. 2017. URL: https://db.cs.cmu.edu/papers/2017/p42-pavlo-cidr17.pdf
[4] D. Van Aken, A. Pavlo, G. J. Gordon, and B. Zhang, "Automatic database management system tuning through large-scale machine learning," in Proc. ACM SIGMOD 2017, Chicago, IL, USA, May 2017, pp. 1009–1024. DOI: 10.1145/3035918.3064029
Last Updated: 2026-04-15 Module Version: v2.0.0