 # eventkit
 
-Event ingestion and processing primitives for Python.
+Event ingestion and processing kit for Python.
 
 ## Overview
 
-`eventkit` is a high-performance, type-safe library for building event collection pipelines. It provides the core infrastructure for customer data platforms, product analytics, and event-driven architectures.
+`eventkit` is a production-ready **kit** for building event collection pipelines. Clone it, customize it, make it yours.
+
+**Philosophy**: Provide a solid starting point with battle-tested patterns, then get out of your way. Customize for your specific needs.
 
 ### Key Features
 
@@ -27,13 +29,15 @@ Event ingestion and processing primitives for Python.
 
 ## Quick Start
 
-Install from PyPI:
+Clone and customize:
 
 ```bash
-pip install eventkit
+git clone https://github.com/prosdevlab/eventkit.git my-event-pipeline
+cd my-event-pipeline
+uv sync
 ```
 
-Add to your FastAPI application:
+Customize for your needs:
 
 ```python
 from fastapi import FastAPI
@@ -181,25 +185,31 @@ Inspired by open-source CDP architectures:
 - [PostHog](https://github.com/PostHog/posthog) - Modern Python stack (FastAPI, async)
 - [Snowplow](https://github.com/snowplow/snowplow) - Schema-first validation (optional)
 
-## Installation
+## Getting Started
 
-**Basic:**
-```bash
-pip install eventkit
-```
+**EventKit is a kit**, not a library. Clone and make it your own:
 
-**With ClickHouse support:**
 ```bash
-pip install eventkit[clickhouse]
-```
+# 1. Clone the repo
+git clone https://github.com/prosdevlab/eventkit.git my-event-pipeline
+cd my-event-pipeline
 
-**Development:**
-```bash
-git clone https://github.com/prosdev/eventkit.git
-cd eventkit
-pip install -e ".[dev]"
+# 2. Install dependencies
+uv sync
+
+# 3. Start local dev
+docker-compose up -d # GCS + PubSub emulators
+uv run uvicorn eventkit.api.app:app --reload
+
+# 4. Customize for your needs
+# - Modify validation rules in src/eventkit/adapters/
+# - Add custom storage backends in src/eventkit/stores/
+# - Adjust queue behavior in src/eventkit/queues/
+# - Make it yours!
 ```
 
+See [LOCAL_DEV.md](LOCAL_DEV.md) for detailed setup.
+
 ## API Endpoints
 
 ### Collection Endpoints
@@ -340,62 +350,6 @@ python -m scripts.run_bigquery_loader
 
 See `scripts/bigquery/README.md` and `specs/gcs-bigquery-storage/` for full details.
 
-### Error Store (Dead Letter Queue)
-
-All failed events are stored in a GCS-based dead letter queue for debugging and retry:
-
-**Two Error Types:**
-- **Validation Errors**: Missing required fields, invalid schema
-- **Processing Errors**: Storage failures, unexpected exceptions
-
-**Storage Structure:**
-```
-gs://bucket/errors/
-  date=2026-01-15/
-    error_type=validation/
-      error-20260115-100000-abc123.parquet
-    error_type=processing/
-      error-20260115-100500-def456.parquet
-```
-
-**Create BigQuery Errors Table:**
-```bash
-cd scripts/bigquery
-export PROJECT_ID=my-project DATASET=events
-cat create_errors_table.sql | sed "s/{PROJECT_ID}/$PROJECT_ID/g" | sed "s/{DATASET}/$DATASET/g" | bq query --use_legacy_sql=false
-```
-
-**Query Errors:**
-```sql
--- Find validation errors in last 24 hours
-SELECT
-  error_message,
-  stream,
-  COUNT(*) as count
-FROM `project.dataset.errors`
-WHERE date >= CURRENT_DATE() - 1
-  AND error_type = 'validation_error'
-GROUP BY error_message, stream
-ORDER BY count DESC;
-
--- Get processing errors with stack traces
-SELECT
-  timestamp,
-  error_message,
-  JSON_EXTRACT_SCALAR(error_details, '$.exception_type') as exception,
-  JSON_EXTRACT_SCALAR(error_details, '$.stack_trace') as stack_trace
-FROM `project.dataset.errors`
-WHERE error_type = 'processing_error'
-ORDER BY timestamp DESC
-LIMIT 10;
-```
-
-**Key Features:**
-- Never loses events - all failures stored for debugging
-- Automatic 30-day retention (GCS lifecycle rules)
-- Full event context (payload, error, timestamp, stream)
-- Queryable via BigQuery for pattern analysis
-
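For anyone who clones the kit and wants to restore the dead letter queue documented in the removed section, the partitioned object layout shown there could be generated along these lines. This is a minimal sketch and not code from the repository: `error_object_path` and its parameters are hypothetical names, and the `validation` / `processing` partition values follow the directory listing rather than the `validation_error` / `processing_error` strings used in the SQL filters.

```python
from datetime import datetime, timezone
from typing import Optional
from uuid import uuid4

# Partition values taken from the storage layout in the removed section.
ERROR_TYPES = ("validation", "processing")


def error_object_path(
    error_type: str,
    when: datetime,
    suffix: Optional[str] = None,
) -> str:
    """Build an object path relative to the bucket root (hypothetical helper).

    Produces errors/date=YYYY-MM-DD/error_type=<type>/error-<stamp>-<suffix>.parquet
    """
    if error_type not in ERROR_TYPES:
        raise ValueError(f"unknown error_type: {error_type!r}")
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")

    # Normalise to UTC so the date partition does not depend on the host timezone.
    when = when.astimezone(timezone.utc)
    # Assumption: any short unique token is acceptable as the file suffix.
    suffix = suffix or uuid4().hex[:6]

    return (
        f"errors/date={when:%Y-%m-%d}/"
        f"error_type={error_type}/"
        f"error-{when:%Y%m%d-%H%M%S}-{suffix}.parquet"
    )


if __name__ == "__main__":
    ts = datetime(2026, 1, 15, 10, 0, 0, tzinfo=timezone.utc)
    print(error_object_path("validation", ts, "abc123"))
```

Keeping `date=` and `error_type=` as Hive-style path segments is what would let a BigQuery external table treat them as partition columns, which matches the `WHERE date >= ...` and `error_type = ...` filters in the removed queries.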
 ### Custom Storage
 
 Implement the `EventStore` protocol for any backend:
@@ -531,7 +485,7 @@ uv run ruff format src/
 
 ## Roadmap
 
-### Core (v0.x)
+### Core Kit (v0.x) ✅
 - [x] Composable validators (required fields, types, timestamps)
 - [x] Segment-compatible adapter with ValidationPipeline
 - [x] Collection API with stream routing
@@ -542,24 +496,25 @@ uv run ruff format src/
 - [x] Prometheus metrics
 - [x] EventSubscriptionCoordinator (dual-path architecture)
 - [x] Hash-based sequencer for consistent ordering
-- [x] Error store with dead letter queue (GCS-based)
-- [ ] Performance benchmarks (10k+ events/sec)
+- [x] Performance benchmarks (10k+ events/sec validated)
+- [ ] Error handling and dead letter queue (ErrorStore protocol exists, needs implementation)
 
-### v1.0
-- [ ] OpenAPI spec and generated clients
-- [ ] Comprehensive examples and documentation
+### v1.0 - Production Ready
+- [ ] Comprehensive examples and use cases
 - [ ] Production deployment guides (Cloud Run, GKE, ECS)
 - [ ] S3 + Snowflake/Redshift storage adapters
+- [ ] Nextra documentation site
+
+### Future: Extract Focused Libraries
 
-### Future Ecosystem
+As patterns stabilize, we may extract reusable components:
 
-These capabilities are intentionally scoped as separate packages to keep the core focused:
+- **eventkit-ring-buffer** - SQLite WAL durability layer (could be used standalone)
+- **eventkit-queues** - Queue abstractions (AsyncQueue, PubSub patterns)
+- **eventkit-validators** - Composable validation framework
+- **eventkit-storage** - Storage backend protocols and implementations
 
-- **eventkit-profiles** - Profile building and field-level merge strategies
-- **eventkit-identity** - Graph-based identity resolution across devices
-- **eventkit-enrichment** - IP geolocation, user agent parsing, company enrichment
-- **eventkit-destinations** - Activate data to marketing and analytics tools
-- **eventkit-privacy** - GDPR/CCPA compliance utilities (deletion, anonymization)
+These would be pip-installable libraries while the kit remains a starting point.
 
 ## Contributing
 