- Overview
- Architectural Patterns
- System Architecture
- Module Architecture
- Data Flow
- Design Decisions
- Scalability & Performance
- Security
- Error Handling & Resilience
The Firefly Framework Webhooks Library is built using a microservices-oriented architecture with clear separation between webhook ingestion (producer) and webhook processing (consumer). The platform follows reactive programming principles using Spring WebFlux and Project Reactor for non-blocking, high-throughput operations.
- Separation of Concerns: Ingestion logic is completely decoupled from business logic
- Reactive Programming: Non-blocking I/O throughout the stack
- Event-Driven Architecture: Asynchronous communication via message queues
- Hexagonal Architecture: Ports and adapters pattern in the processor module
- Idempotency: Exactly-once processing semantics
- Scalability: Horizontal scaling for both ingestion and processing
The platform implements a classic producer-consumer pattern:
┌──────────────────┐ ┌─────────────┐ ┌──────────────────┐
│ HTTP Webhooks │────────>│ Kafka │────────>│ Worker Apps │
│ (Producers) │ │ Topics │ │ (Consumers) │
└──────────────────┘ └─────────────┘ └──────────────────┘
Benefits:
- Decoupling: Producers and consumers evolve independently
- Buffering: Kafka acts as a buffer during traffic spikes
- Scalability: Scale producers and consumers independently
- Reliability: Messages are persisted in Kafka
The processor module follows hexagonal architecture:
┌─────────────────────────────────────────────────────┐
│ Application Core │
│ │
│ ┌─────────────────────────────────────────────┐ │
│ │ WebhookProcessor (Port) │ │
│ │ - Business logic interface │ │
│ └─────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────┘
▲ ▲
│ │
┌────────┴────────┐ ┌────────┴────────┐
│ Input Adapter │ │ Output Adapter │
│ │ │ │
│ Kafka Consumer │ │ Redis Cache │
│ (EventListener)│ │ (Idempotency) │
└─────────────────┘ └─────────────────┘
Benefits:
- Testability: Core logic can be tested without infrastructure
- Flexibility: Swap adapters without changing core logic
- Maintainability: Clear boundaries between layers
All communication between components is event-driven:
HTTP Request → Domain Event → Kafka Event → Processing Event → Business Logic
Benefits:
- Asynchronous Processing: Non-blocking operations
- Loose Coupling: Components communicate via events
- Audit Trail: All events are logged and traceable
- Replay Capability: Events can be replayed for recovery
The platform separates write operations (commands) from read operations (queries):
- Commands: Webhook ingestion (write to Kafka)
- Queries: Health checks, status queries (read from cache/state)
Benefits:
- Performance: Optimize reads and writes independently
- Scalability: Scale read and write paths separately
- Simplicity: Clear separation of concerns
┌──────────────────────────────────────────────────────────────────────────┐
│ External Webhook Providers │
│ (Stripe, PayPal, GitHub, Custom, etc.) │
└────────────────────────────────┬─────────────────────────────────────────┘
│ HTTPS POST
▼
┌──────────────────────────────────────────────────────────────────────────┐
│ Load Balancer / API Gateway │
└────────────────────────────────┬─────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────┐
│ Webhook Platform (Multiple Instances) │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ WebhookController │ │
│ │ - Receives HTTP POST │ │
│ │ - Extracts headers, payload, metadata │ │
│ │ - Returns 200 OK immediately │ │
│ └────────────────────────┬───────────────────────────────────────────┘ │
│ │ │
│ ┌────────────────────────▼───────────────────────────────────────────┐ │
│ │ WebhookProcessingService │ │
│ │ - Maps DTO to Domain Event │ │
│ │ - Determines destination topic │ │
│ └────────────────────────┬───────────────────────────────────────────┘ │
│ │ │
│ ┌────────────────────────▼───────────────────────────────────────────┐ │
│ │ EventPublisherFactory (lib-common-eda) │ │
│ │ - Publishes to Kafka/RabbitMQ │ │
│ │ - Sets message headers │ │
│ └────────────────────────┬───────────────────────────────────────────┘ │
└───────────────────────────┼──────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────┐
│ Kafka Cluster (3+ brokers) │
│ Topics: stripe, paypal, github, custom-provider, etc. │
└────────────────────────────┬─────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────┐
│ Worker Applications (Multiple Instances) │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ AbstractWebhookEventListener │ │
│ │ - Consumes from Kafka │ │
│ │ - Checks idempotency (Redis) │ │
│ │ - Validates signature │ │
│ └────────────────────────┬───────────────────────────────────────────┘ │
│ │ │
│ ┌────────────────────────▼───────────────────────────────────────────┐ │
│ │ WebhookProcessor (Implementation) │ │
│ │ - Executes business logic │ │
│ │ - Calls external APIs │ │
│ │ - Updates database │ │
│ └────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────┐
│ Redis Cluster (Idempotency) │
│ Keys: webhook:processing:{eventId}, webhook:processed:{eventId} │
└──────────────────────────────────────────────────────────────────────────┘
- Responsibility: Receive webhooks and publish to Kafka
- Scaling: Horizontal (stateless)
- Technology: Spring Boot WebFlux, Netty
- Performance: ~10,000 requests/second per instance
- Responsibility: Message broker and event log
- Scaling: Horizontal (add brokers and partitions)
- Technology: Apache Kafka 3.6.1
- Performance: Millions of messages/second
- Responsibility: Process webhooks with business logic
- Scaling: Horizontal (consumer groups)
- Technology: Spring Boot, Spring Kafka
- Performance: Depends on business logic complexity
- Responsibility: Distributed idempotency and caching
- Scaling: Horizontal (Redis Cluster mode)
- Technology: Redis 7+
- Performance: Sub-millisecond latency
┌──────────────────────────────────────────────────────────────┐
│ interfaces (DTOs) │
│ - WebhookEventDTO │
│ - WebhookResponseDTO │
└────────────────────────┬─────────────────────────────────────┘
│
│ depends on
▼
┌──────────────────────────────────────────────────────────────┐
│ core (Business Logic) │
│ - WebhookProcessingService │
│ - WebhookProcessingServiceImpl │
│ - WebhookReceivedEvent │
│ - WebhookEventMapper │
└────────────────────────┬─────────────────────────────────────┘
│
│ depends on
▼
┌──────────────────────────────────────────────────────────────┐
│ web (REST API) │
│ - WebhookManagementApplication │
│ - WebhookController │
│ - HealthCheckController │
│ - application.yml │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ processor (Worker Framework) │
│ - WebhookProcessor (Port) │
│ - AbstractWebhookEventListener (Adapter) │
│ - WebhookIdempotencyService (Port) │
│ - CacheBasedWebhookIdempotencyService (Adapter) │
│ - WebhookSignatureValidator (Port) │
│ - WebhookProcessingContext │
│ - WebhookProcessingResult │
└──────────────────────────────────────────────────────────────┘
Purpose: Shared contracts and DTOs
Design Pattern: Data Transfer Object (DTO)
Key Classes:
WebhookEventDTO: Immutable DTO with all webhook data (eventId, providerName, payload, headers, etc.)WebhookResponseDTO: Enhanced HTTP response DTO with:- Event tracking (eventId, status, message)
- Timestamps (receivedAt, processedAt)
- Payload echo (receivedPayload)
- Processing metadata (destination, sourceIp, payloadSize, headerCount)
Dependencies: None (pure POJOs with Jackson annotations)
Design Decisions:
- Immutable objects for thread safety
- Jackson annotations for JSON serialization
- Lombok for boilerplate reduction
- Validation annotations for input validation
Purpose: Business logic, services, configuration, and infrastructure
Design Patterns:
- Domain-Driven Design (DDD)
- Decorator Pattern (Resilience)
- Strategy Pattern (Validation)
Package Structure:
org.fireflyframework.webhooks.core/
├── config/ # Configuration classes
│ ├── ResilienceConfig # Resilience4j configuration
│ └── WebhookSecurityProperties # Security properties
├── domain/events/ # Domain events
│ └── WebhookReceivedEvent # Webhook received event
├── filter/ # Web filters
│ └── TracingWebFilter # Distributed tracing
├── health/ # Health indicators
│ └── WebhookCircuitBreakerHealthIndicator
├── idempotency/ # (Removed - now using lib-common-web)
│ # HTTP-level idempotency moved to lib-common-web IdempotencyWebFilter
├── mappers/ # MapStruct mappers
│ └── WebhookEventMapper # DTO ↔ Event mapping
├── metrics/ # Metrics services
│ └── WebhookMetricsService # Custom metrics
├── ratelimit/ # Rate limiting
│ └── WebhookRateLimitService
├── resilience/ # Resilience patterns
│ └── ResilientWebhookProcessingService
├── services/ # Business services
│ ├── WebhookProcessingService
│ └── impl/WebhookProcessingServiceImpl
└── validation/ # Validation services
└── WebhookValidator # Request validation
Key Features:
- Resilience Patterns: Circuit breaker, rate limiter, timeout, bulkhead
- Security: Payload validation, rate limiting, IP whitelisting
- Observability: Custom metrics, distributed tracing, health indicators
- HTTP Idempotency: Handled by lib-common-web IdempotencyWebFilter (transparent)
- Event Idempotency: Worker-level via CacheBasedWebhookIdempotencyService (processor module)
Dependencies:
lib-common-eda- Event publishinglib-common-cache- Redis/Caffeine cachinglib-common-core- Core utilities- Resilience4j - Resilience patterns
- Micrometer - Metrics
- MapStruct - DTO mapping
- Apache Commons Net - CIDR notation IP matching
Design Decisions:
- Decorator Pattern:
ResilientWebhookProcessingServicedecoratesWebhookProcessingServiceImpl - Environment Variables: All properties support environment variable configuration
- Reactive Programming: All services return Mono/Flux for non-blocking operations
- Separation of Concerns: Each package has a single responsibility
Purpose: Main Spring Boot application and REST controllers ONLY
Design Pattern: MVC (Model-View-Controller)
Key Classes:
WebhookManagementApplication: Main application classcontrollers/WebhookController: REST controller for webhook ingestion
Note: All business logic, services, configuration, and infrastructure code has been moved to the -core module. This module contains only the application entry point and REST controllers.
Dependencies:
fireflyframework-webhooks-core- Core business logic- Spring Boot WebFlux
- Spring Boot Actuator
- SpringDoc OpenAPI
lib-common-weblib-common-edalib-common-cache
Design Decisions:
- Reactive controllers for non-blocking I/O
- OpenAPI for API documentation
- Actuator for health checks and metrics
- Minimal business logic (delegated to core)
Purpose: Worker framework for webhook processing
Design Pattern: Hexagonal Architecture (Ports & Adapters)
Key Classes:
-
Ports (Interfaces):
WebhookProcessor- Business logic interfaceWebhookIdempotencyService- Idempotency interfaceWebhookSignatureValidator- Signature validation interface
-
Adapters (Implementations):
AbstractWebhookEventListener- Kafka consumer adapterCacheBasedWebhookIdempotencyService- Redis adapter
-
Models:
WebhookProcessingContext- Context objectWebhookProcessingResult- Result object
Dependencies:
lib-common-eda- Event consumptionlib-common-cache- Caching- Spring Kafka
Design Decisions:
- Abstract base class for common logic
- Template method pattern for lifecycle hooks
- Strategy pattern for signature validation
- Reactive processing with Mono/Flux
1. HTTP POST /api/v1/webhook/{providerName}
│
├─> WebhookController.receiveWebhook()
│ │
│ ├─> Extract headers, payload, metadata
│ ├─> Create WebhookEventDTO
│ │
│ └─> WebhookProcessingService.processWebhook()
│ │
│ ├─> WebhookEventMapper.toDomainEvent()
│ │ └─> Create WebhookReceivedEvent
│ │
│ ├─> Determine destination topic
│ │ └─> Apply routing strategy
│ │
│ └─> EventPublisherFactory.publish()
│ │
│ ├─> Serialize to JSON
│ ├─> Set Kafka headers
│ └─> Send to Kafka topic
│
└─> Return 202 ACCEPTED with WebhookResponseDTO
- eventId, status, message
- receivedAt, processedAt timestamps
- receivedPayload (echo for verification)
- metadata (destination, sourceIp, payloadSize, etc.)
1. Kafka Consumer polls topic
│
├─> AbstractWebhookEventListener.onEvent()
│ │
│ ├─> Deserialize JSON to WebhookReceivedEvent
│ ├─> Create WebhookProcessingContext
│ │
│ ├─> beforeProcess() hook
│ │
│ ├─> WebhookIdempotencyService.tryAcquireProcessingLock()
│ │ │
│ │ ├─> Check Redis: webhook:processing:{eventId}
│ │ ├─> If exists → Skip (already processing)
│ │ └─> If not exists → Acquire lock (SET NX EX)
│ │
│ ├─> WebhookIdempotencyService.isAlreadyProcessed()
│ │ │
│ │ └─> Check Redis: webhook:processed:{eventId}
│ │ ├─> If exists → Skip (already processed)
│ │ └─> If not exists → Continue
│ │
│ ├─> WebhookSignatureValidator.validate()
│ │ │
│ │ ├─> Extract signature from headers
│ │ ├─> Compute expected signature
│ │ └─> Compare signatures
│ │ ├─> If invalid → Reject
│ │ └─> If valid → Continue
│ │
│ ├─> WebhookProcessor.process()
│ │ │
│ │ └─> Execute business logic
│ │ ├─> Parse payload
│ │ ├─> Call external APIs
│ │ ├─> Update database
│ │ └─> Return WebhookProcessingResult
│ │
│ ├─> WebhookIdempotencyService.markAsProcessed()
│ │ │
│ │ └─> Set Redis: webhook:processed:{eventId} (TTL: 24h)
│ │
│ ├─> WebhookIdempotencyService.releaseProcessingLock()
│ │ │
│ │ └─> Delete Redis: webhook:processing:{eventId}
│ │
│ ├─> afterProcess() hook
│ │
│ └─> Commit Kafka offset
│
└─> If error → onError() hook
│
├─> WebhookIdempotencyService.recordProcessingFailure()
│ │
│ └─> Increment Redis: webhook:failures:{eventId}
│
└─> Retry or DLQ (Dead Letter Queue)
## Design Decisions
### 1. Why Reactive Programming?
**Decision**: Use Spring WebFlux and Project Reactor instead of traditional blocking I/O
**Rationale**:
- **High Concurrency**: Handle thousands of concurrent webhook requests with fewer threads
- **Non-Blocking I/O**: Kafka publishing is non-blocking, improving throughput
- **Backpressure**: Reactor provides built-in backpressure handling
- **Resource Efficiency**: Lower memory footprint compared to thread-per-request model
**Trade-offs**:
- **Complexity**: Reactive code is harder to debug and understand
- **Learning Curve**: Team needs to learn reactive programming
- **Ecosystem**: Some libraries don't support reactive patterns
**Outcome**: The benefits outweigh the costs for a high-throughput webhook platform
### 2. Why Separate Ingestion from Processing?
**Decision**: Split webhook ingestion (producer) and processing (consumer) into separate applications
**Rationale**:
- **Scalability**: Scale ingestion and processing independently based on load
- **Reliability**: Kafka provides buffering during processing delays
- **Flexibility**: Multiple consumers can process the same webhook differently
- **Fault Isolation**: Processing failures don't affect ingestion
**Trade-offs**:
- **Latency**: Additional hop through Kafka adds latency (~10-50ms)
- **Complexity**: More moving parts to deploy and monitor
- **Eventual Consistency**: Processing is asynchronous
**Outcome**: The architecture is more scalable and resilient
### 3. Why Dynamic Provider Support?
**Decision**: Use path parameter `{providerName}` instead of enum or hardcoded providers
**Rationale**:
- **Zero Deployment**: Add new providers without code changes or deployments
- **Flexibility**: Support custom/internal webhooks easily
- **Simplicity**: No need to maintain provider registry
- **Scalability**: Handle unlimited providers
**Trade-offs**:
- **Validation**: No compile-time validation of provider names
- **Documentation**: Harder to document all supported providers
- **Type Safety**: Lose type safety for provider-specific logic
**Outcome**: Flexibility and ease of use outweigh type safety concerns
### 4. Why AS-IS Payload Preservation?
**Decision**: Store webhook payloads exactly as received without transformation
**Rationale**:
- **Signature Validation**: Headers needed for downstream signature verification
- **No Data Loss**: Complete information preserved for consumers
- **Provider Flexibility**: Each provider has unique payload structures
- **Simplicity**: No complex transformations at ingestion
**Trade-offs**:
- **Storage**: Larger payloads consume more Kafka storage
- **Parsing**: Consumers must parse JSON payloads
- **Validation**: No schema validation at ingestion
**Outcome**: Preserving original data is more valuable than transformation
### 5. Why Redis for Idempotency?
**Decision**: Use Redis instead of database for idempotency tracking
**Rationale**:
- **Performance**: Sub-millisecond latency for idempotency checks
- **TTL Support**: Automatic expiration of old idempotency keys
- **Atomic Operations**: SET NX EX for distributed locking
- **Scalability**: Redis Cluster for horizontal scaling
**Trade-offs**:
- **Persistence**: Redis is not as durable as a database
- **Cost**: Additional infrastructure component
- **Complexity**: Need to manage Redis cluster
**Outcome**: Performance benefits justify the additional complexity
### 6. Why Hexagonal Architecture for Processor?
**Decision**: Use ports and adapters pattern in processor module
**Rationale**:
- **Testability**: Core logic can be tested without infrastructure
- **Flexibility**: Swap Kafka for RabbitMQ without changing core logic
- **Maintainability**: Clear boundaries between layers
- **Reusability**: Core logic can be reused across different adapters
**Trade-offs**:
- **Boilerplate**: More interfaces and abstractions
- **Complexity**: Harder for junior developers to understand
- **Indirection**: More layers to navigate
**Outcome**: Long-term maintainability justifies the upfront complexity
## Scalability & Performance
### Horizontal Scaling
#### Webhook Platform (Producer)
- **Stateless**: No session state, can scale horizontally
- **Load Balancing**: Use round-robin or least-connections
- **Auto-Scaling**: Scale based on CPU, memory, or request rate
- **Recommended**: 3+ instances for high availability
┌─────────────┐ │Load Balancer│ └──────┬──────┘ │ ┌───┴───┬───────┬───────┐ │ │ │ │ ▼ ▼ ▼ ▼ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ │ Pod1│ │ Pod2│ │ Pod3│ │ PodN│ └─────┘ └─────┘ └─────┘ └─────┘
#### Worker Applications (Consumer)
- **Consumer Groups**: Kafka distributes partitions across consumers
- **Partition-Based**: Each partition processed by one consumer
- **Auto-Scaling**: Scale based on consumer lag
- **Recommended**: Number of consumers ≤ number of partitions
Topic: stripe (6 partitions) ┌────┬────┬────┬────┬────┬────┐ │ P0 │ P1 │ P2 │ P3 │ P4 │ P5 │ └─┬──┴─┬──┴─┬──┴─┬──┴─┬──┴─┬──┘ │ │ │ │ │ │ ▼ ▼ ▼ ▼ ▼ ▼ ┌───┐┌───┐┌───┐┌───┐┌───┐┌───┐ │C1 ││C2 ││C3 ││C4 ││C5 ││C6 │ └───┘└───┘└───┘└───┘└───┘└───┘ Consumer Group: webhook-worker
#### Kafka Cluster
- **Brokers**: 3+ brokers for high availability
- **Partitions**: More partitions = more parallelism
- **Replication**: Replication factor of 3 for durability
- **Recommended**: 6-12 partitions per topic
#### Redis Cluster
- **Cluster Mode**: 3+ master nodes with replicas
- **Sharding**: Data distributed across nodes
- **Replication**: Each master has 1+ replicas
- **Recommended**: 3 masters + 3 replicas
### Performance Characteristics
#### Webhook Platform
- **Throughput**: ~10,000 requests/second per instance
- **Latency**: P50: 5ms, P95: 15ms, P99: 50ms
- **Memory**: ~512MB per instance
- **CPU**: ~1 core per instance
#### Worker Applications
- **Throughput**: Depends on business logic complexity
- **Latency**: P50: 50ms, P95: 200ms, P99: 500ms
- **Memory**: ~1GB per instance
- **CPU**: ~2 cores per instance
#### Kafka
- **Throughput**: Millions of messages/second
- **Latency**: P50: 2ms, P95: 10ms, P99: 50ms
- **Storage**: Depends on retention policy (default: 7 days)
#### Redis
- **Throughput**: 100,000+ operations/second per node
- **Latency**: P50: <1ms, P95: 2ms, P99: 5ms
- **Memory**: Depends on idempotency key count
### Performance Optimization
#### Webhook Platform
1. **Connection Pooling**: Reuse Kafka producer connections
2. **Batch Publishing**: Batch multiple events (if applicable)
3. **Compression**: Use Snappy compression for Kafka messages
4. **Async Processing**: Use reactive chains, avoid blocking
#### Worker Applications
1. **Parallel Processing**: Process multiple events concurrently
2. **Batch Operations**: Batch database operations
3. **Caching**: Cache frequently accessed data
4. **Circuit Breakers**: Prevent cascading failures
#### Kafka
1. **Partitioning**: More partitions for higher parallelism
2. **Compression**: Enable compression (Snappy or LZ4)
3. **Batching**: Increase batch size for higher throughput
4. **Replication**: Balance replication factor vs. performance
#### Redis
1. **Pipelining**: Batch multiple commands
2. **Connection Pooling**: Reuse connections
3. **Cluster Mode**: Distribute load across nodes
4. **TTL**: Set appropriate TTL for idempotency keys
## Security
### Security Features
The platform implements multiple layers of security:
#### 1. Payload Size Validation
**Purpose**: Prevent DoS attacks via large payloads
**Configuration**:
```yaml
firefly:
webhooks:
security:
max-payload-size: 1048576 # 1MB (default)
validate-payload-size: true
Environment Variable:
FIREFLY_WEBHOOKS_SECURITY_MAX_PAYLOAD_SIZE=1048576
FIREFLY_WEBHOOKS_SECURITY_VALIDATE_PAYLOAD_SIZE=trueBehavior: Returns HTTP 413 Payload Too Large if exceeded
Purpose: Prevent injection attacks via malicious provider names
Configuration:
firefly:
webhooks:
security:
validate-provider-name: true
provider-name-pattern: "^[a-z0-9-]+$" # Alphanumeric and hyphens onlyEnvironment Variable:
FIREFLY_WEBHOOKS_SECURITY_VALIDATE_PROVIDER_NAME=true
FIREFLY_WEBHOOKS_SECURITY_PROVIDER_NAME_PATTERN="^[a-z0-9-]+$"Behavior: Returns HTTP 400 Bad Request if invalid
Purpose: Restrict access to known provider IPs
Supported Formats:
- Exact IP addresses:
54.187.174.169 - CIDR notation:
192.30.252.0/22(matches entire IP ranges)
Implementation:
- Uses Apache Commons Net library (
SubnetUtils) for CIDR matching - Validates both exact IP matches and CIDR range matches
- Logs warnings for invalid IP addresses or CIDR notations
- Configured per provider for granular control
Configuration:
firefly:
webhooks:
security:
enable-ip-whitelist: true
ip-whitelist:
stripe:
- "54.187.174.169" # Exact IP
- "54.187.205.235" # Exact IP
github:
- "192.30.252.0/22" # CIDR notation (192.30.252.0 - 192.30.255.255)
- "185.199.108.0/22" # CIDR notation (185.199.108.0 - 185.199.111.255)
paypal:
- "173.0.82.0/24" # CIDR notation (173.0.82.0 - 173.0.82.255)Environment Variable:
FIREFLY_WEBHOOKS_SECURITY_ENABLE_IP_WHITELIST=true
# Supports both exact IPs and CIDR notation
FIREFLY_WEBHOOKS_SECURITY_IP_WHITELIST='{"stripe":["54.187.174.169","54.187.205.235"],"github":["192.30.252.0/22","185.199.108.0/22"]}'Behavior: Returns HTTP 403 Forbidden if IP not whitelisted
Example CIDR Ranges:
/32- Single IP (e.g.,192.168.1.1/32= only192.168.1.1)/24- 256 IPs (e.g.,192.168.1.0/24=192.168.1.0-192.168.1.255)/22- 1024 IPs (e.g.,192.30.252.0/22=192.30.252.0-192.30.255.255)/16- 65,536 IPs (e.g.,10.0.0.0/16=10.0.0.0-10.0.255.255)
Purpose: Prevent abuse and DoS attacks
Per-Provider Rate Limiting:
- Default: 100 requests/second per provider
- Configurable per provider via Resilience4j
Per-IP Rate Limiting:
- Default: 100 requests/second per IP address
- Prevents single IP from overwhelming the system
Behavior: Returns HTTP 429 Too Many Requests if exceeded
Purpose: Prevent duplicate processing of the same webhook
Configuration:
firefly:
webhooks:
security:
enable-http-idempotency: true
http-idempotency-ttl-seconds: 86400 # 24 hoursEnvironment Variable:
FIREFLY_WEBHOOKS_SECURITY_ENABLE_HTTP_IDEMPOTENCY=true
FIREFLY_WEBHOOKS_SECURITY_HTTP_IDEMPOTENCY_TTL_SECONDS=86400Usage:
curl -X POST http://localhost:8080/api/v1/webhook/stripe \
-H "Content-Type: application/json" \
-H "X-Idempotency-Key: unique-key-123" \
-d '{"event": "payment.succeeded"}'Behavior:
- First request: Processes webhook and caches response
- Duplicate requests (within 24h): Returns cached response (HTTP 200)
Purpose: Ensure proper content type headers
Configuration:
firefly:
webhooks:
security:
enable-request-validation: true
require-content-type: trueEnvironment Variable:
FIREFLY_WEBHOOKS_SECURITY_ENABLE_REQUEST_VALIDATION=true
FIREFLY_WEBHOOKS_SECURITY_REQUIRE_CONTENT_TYPE=trueBehavior: Returns HTTP 400 Bad Request if Content-Type header missing
- No Authentication: Webhook providers don't support OAuth/JWT
- IP Whitelisting: Restrict access to known provider IPs (optional, configurable)
- Rate Limiting: Prevent abuse (per-provider and per-IP)
- Payload Validation: Validate payload size, provider name, content-type
- Signature Validation: Verify webhook signatures (provider-specific)
- HMAC SHA256: Most providers use HMAC SHA256
- Timestamp Validation: Reject old webhooks (prevent replay attacks)
- Idempotency: Prevent duplicate processing using Redis locks (7 days TTL)
Each provider has a unique signature algorithm:
String signature = headers.get("Stripe-Signature");
String payload = rawBody;
String secret = "whsec_...";
// Stripe uses HMAC SHA256
String expectedSignature = HmacUtils.hmacSha256Hex(secret, timestamp + "." + payload);
boolean valid = signature.contains("v1=" + expectedSignature);String signature = headers.get("X-Hub-Signature-256");
String payload = rawBody;
String secret = "secret";
// GitHub uses HMAC SHA256
String expectedSignature = "sha256=" + HmacUtils.hmacSha256Hex(secret, payload);
boolean valid = signature.equals(expectedSignature);- TLS 1.2+: All HTTP traffic uses TLS
- Kafka TLS: Enable TLS for Kafka (optional)
- Redis TLS: Enable TLS for Redis (optional)
- Kafka Encryption: Enable encryption at rest (optional)
- Redis Encryption: Enable encryption at rest (optional)
- Secrets Management: Use Kubernetes Secrets or Vault
- GDPR: Webhook payloads may contain PII
- PCI DSS: Payment webhooks may contain sensitive data
- Data Retention: Configure Kafka retention policy
- Audit Logging: All webhooks are logged with correlation IDs
The platform implements comprehensive resilience patterns using Resilience4j:
Purpose: Prevents cascading failures when Kafka is down
Configuration:
resilience4j:
circuitbreaker:
instances:
webhookKafkaPublisher:
failure-rate-threshold: 50 # Open circuit if 50% of calls fail
slow-call-rate-threshold: 50 # Open circuit if 50% of calls are slow
slow-call-duration-threshold: 5s # Call is slow if > 5 seconds
wait-duration-in-open-state: 30s # Wait 30s before trying half-open
permitted-number-of-calls-in-half-open-state: 5
sliding-window-type: COUNT_BASED
sliding-window-size: 10
minimum-number-of-calls: 5Implementation:
@Service
@Primary
public class ResilientWebhookProcessingService implements WebhookProcessingService {
private final WebhookProcessingService delegate;
private final CircuitBreaker circuitBreaker;
private final TimeLimiter timeLimiter;
@Override
public Mono<WebhookResponseDTO> processWebhook(WebhookEventDTO webhookEvent) {
return Mono.fromCallable(() ->
circuitBreaker.decorateSupplier(() ->
timeLimiter.executeFutureSupplier(() ->
delegate.processWebhook(webhookEvent).toFuture()
)
).get()
).flatMap(Mono::fromFuture);
}
}Behavior:
- CLOSED: Normal operation, all requests pass through
- OPEN: Circuit is open, requests fail immediately (no fallback, relies on lib-common-eda)
- HALF_OPEN: Testing if service recovered, limited requests allowed
Health Check:
{
"webhookCircuitBreaker": {
"status": "UP",
"details": {
"circuitBreakerName": "webhookKafkaPublisher",
"state": "CLOSED",
"failureRate": "0.0%",
"slowCallRate": "0.0%"
}
}
}Purpose: Protects against abuse and DoS attacks
Per-Provider Rate Limiting:
resilience4j:
ratelimiter:
instances:
webhook-provider-default:
limit-for-period: 100 # 100 requests
limit-refresh-period: 1s # per second
timeout-duration: 0s # Fail immediately if limit exceededPer-IP Rate Limiting:
resilience4j:
ratelimiter:
instances:
webhook-ip-default:
limit-for-period: 100 # 100 requests
limit-refresh-period: 1s # per second
timeout-duration: 0sImplementation:
@Service
public class WebhookRateLimitService {
public Mono<Void> checkRateLimit(String providerName, String ipAddress) {
return Mono.fromRunnable(() -> {
// Check provider rate limit
RateLimiter providerLimiter = getRateLimiter("webhook-provider-" + providerName);
if (!providerLimiter.acquirePermission()) {
throw new RateLimitExceededException("Provider rate limit exceeded");
}
// Check IP rate limit
RateLimiter ipLimiter = getRateLimiter("webhook-ip-" + ipAddress);
if (!ipLimiter.acquirePermission()) {
throw new RateLimitExceededException("IP rate limit exceeded");
}
});
}
}Response: HTTP 429 Too Many Requests
Purpose: Prevents hanging operations
Configuration:
resilience4j:
timelimiter:
instances:
webhookKafkaPublisher:
timeout-duration: 10s # Timeout after 10 seconds
cancel-running-future: true # Cancel the future on timeoutBehavior:
- If Kafka publishing takes > 10 seconds, the operation times out
- Returns HTTP 500 to webhook sender
- Webhook sender should retry
Purpose: Resource isolation to prevent thread pool exhaustion
Configuration (configured but not yet fully implemented):
resilience4j:
bulkhead:
instances:
webhookProcessing:
max-concurrent-calls: 100 # Max 100 concurrent webhook processing calls
max-wait-duration: 0s # Don't wait if limit reached- Circuit Breaker: Fails fast if Kafka is unavailable (no fallback)
- Timeout: Returns 500 if publishing takes > 10 seconds
- Rate Limiting: Returns 429 if rate limit exceeded
- Validation Errors: Returns 400 for invalid payloads
- Logging: All errors logged with correlation IDs (traceId, spanId, transactionId)
- Retry Logic: Retry transient failures with exponential backoff
- Dead Letter Queue: Move failed messages to DLQ after max retries
- Circuit Breaker: Stop processing if downstream service is down
- Graceful Degradation: Continue processing other events
- Idempotency: Prevent duplicate processing using Redis locks
public Mono<WebhookProcessingResult> process(WebhookProcessingContext context) {
return executeBusinessLogic(context)
.retryWhen(Retry.backoff(3, Duration.ofSeconds(1))
.filter(throwable -> isTransientError(throwable))
.onRetryExhaustedThrow((retryBackoffSpec, retrySignal) ->
new MaxRetriesExceededException("Max retries exceeded")));
}Idempotency ensures exactly-once processing:
- Acquire Lock: Try to acquire processing lock in Redis
- Check Processed: Check if event already processed
- Process: Execute business logic
- Mark Processed: Mark event as processed in Redis
- Release Lock: Release processing lock
public Mono<WebhookProcessingResult> processWithIdempotency(WebhookProcessingContext context) {
return idempotencyService.tryAcquireProcessingLock(context.getEventId())
.flatMap(acquired -> {
if (!acquired) {
return Mono.just(WebhookProcessingResult.skipped("Already processing"));
}
return idempotencyService.isAlreadyProcessed(context.getEventId())
.flatMap(processed -> {
if (processed) {
return Mono.just(WebhookProcessingResult.skipped("Already processed"));
}
return processor.process(context)
.flatMap(result ->
idempotencyService.markAsProcessed(context.getEventId())
.thenReturn(result));
})
.doFinally(signal ->
idempotencyService.releaseProcessingLock(context.getEventId()).subscribe());
});
}- Ingestion Rate: Webhooks received per second
- Processing Rate: Webhooks processed per second
- Error Rate: Failed webhooks per second
- Consumer Lag: Kafka consumer lag
- Idempotency Hit Rate: Duplicate webhook rate
- High Error Rate: Alert if error rate > 5%
- High Consumer Lag: Alert if lag > 10,000 messages
- Kafka Down: Alert if Kafka is unavailable
- Redis Down: Alert if Redis is unavailable
- High Latency: Alert if P99 latency > 1 second
The Firefly Framework Webhooks Library is designed for high throughput, scalability, and reliability. The architecture follows industry best practices including reactive programming, event-driven architecture, hexagonal architecture, and idempotency. The platform can handle millions of webhooks per day while maintaining low latency and high availability.