Overview
Multi-node deployments enable:
- Cluster-wide visibility - Monitor AI activity across all servers
- Centralized analysis - Aggregate events in one place
- Scalable architecture - Add nodes without reconfiguration
- Production observability - Track AI usage across your infrastructure
Architecture Patterns
Pattern 1: OTLP → OpenTelemetry Collector
Best for: Modern observability stacks, cloud-native environments
- Standard observability protocol
- Rich ecosystem of exporters (Prometheus, Datadog, etc.)
- Built-in batching, retry, backpressure
- Vendor-neutral
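To make "batching, retry, backpressure" concrete, here is a minimal Python sketch of two of those ideas: an exponential backoff schedule for retries and a bounded buffer that rejects events when full. It only illustrates the concepts and is not how the OpenTelemetry Collector implements them; the names `backoff_delays` and `BoundedBuffer` and all default values are assumptions made for this example.

```python
import queue


def backoff_delays(base=0.5, factor=2.0, max_delay=30.0, attempts=5):
    """Return the wait in seconds before each retry, capped at max_delay."""
    delays = []
    delay = base
    for _ in range(attempts):
        delays.append(min(delay, max_delay))
        delay *= factor
    return delays


class BoundedBuffer:
    """Hypothetical bounded event buffer: refuses new events when full."""

    def __init__(self, capacity):
        self._q = queue.Queue(maxsize=capacity)
        self.rejected = 0  # events refused because the buffer was full

    def offer(self, event):
        """Try to enqueue an event; return False instead of blocking."""
        try:
            self._q.put_nowait(event)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def drain(self, max_items):
        """Remove and return up to max_items events as one batch."""
        batch = []
        while len(batch) < max_items:
            try:
                batch.append(self._q.get_nowait())
            except queue.Empty:
                break
        return batch
```

A bounded buffer keeps memory use predictable when the downstream is slow, at the cost of rejecting or delaying events; the backoff schedule spaces out retries so a struggling endpoint is not hammered.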
Pattern 2: Kafka Stream
Best for: High-volume environments, stream processing pipelines
- Durability (events survive crashes)
- Multiple consumers
- Replay capability
- High throughput (millions of events/sec)
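The "multiple consumers" and "replay" properties both follow from the log model: records stay in an append-only log and each consumer tracks its own read position. The toy in-memory model below sketches that idea in Python. It is not a Kafka client and provides no durability; `EventLog` and its methods are hypothetical names for illustration only.

```python
class EventLog:
    """Toy append-only log with an independent offset per consumer."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer name -> next offset to read

    def append(self, event):
        """Append an event and return its offset in the log."""
        self._records.append(event)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Return the next records for this consumer and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch

    def seek(self, consumer, offset):
        """Move a consumer's position, for example back to 0 to replay."""
        self._offsets[consumer] = offset


log = EventLog()
for name in ("e1", "e2", "e3"):
    log.append(name)

first_read = log.poll("dashboard")   # one consumer reads everything
second_read = log.poll("archiver")   # another consumer sees the same events
log.seek("dashboard", 0)             # rewind to replay from the start
replayed = log.poll("dashboard")
```

Because reading does not remove records, adding a consumer or replaying history never affects what other consumers see.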
Pattern 3: File-Based with Log Aggregation
Best for: Existing log infrastructure, simple setups
- Works with existing log pipelines
- Simple sensor configuration
- File-based buffering
- Easy debugging (cat events.jsonl)
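Beyond `cat`, a JSON Lines file is easy to inspect with a few lines of Python. The sketch below counts events by the value of one field. The field name `event_type` is an assumption made for this example; check a real line of your `events.jsonl` for the actual field names.

```python
import json
from collections import Counter


def count_by_field(lines, field):
    """Count JSON Lines records by the value of one top-level field."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            counts["<unparseable>"] += 1
            continue
        if not isinstance(event, dict):
            counts["<not-an-object>"] += 1
            continue
        counts[str(event.get(field, "<missing>"))] += 1
    return counts


if __name__ == "__main__":
    # "event_type" is a hypothetical field name used for illustration.
    with open("events.jsonl", encoding="utf-8") as f:
        for value, n in count_by_field(f, "event_type").most_common():
            print(f"{n:8d}  {value}")
```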
Deployment: OTLP with OpenTelemetry Collector
Step 1: Deploy OpenTelemetry Collector
Install on the central aggregation server:
/etc/otel/config.yaml:
Step 2: Configure OISP Sensor on Each Node
On each application server, create /etc/oisp/config.toml:
Use a unique sensor.name for each node.
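One simple way to get distinct sensor.name values is to derive them from each machine's hostname, then check the full list for collisions before rolling out. The Python sketch below does both. The `oisp-` prefix and the helper names are assumptions for illustration; whether the sensor can derive a name on its own is not covered here.

```python
import socket
from collections import Counter


def default_sensor_name(prefix="oisp"):
    """Build a candidate sensor.name from this machine's hostname."""
    return f"{prefix}-{socket.gethostname()}"


def find_duplicates(names):
    """Return the sorted names that appear more than once."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    print(default_sensor_name())
    # Hypothetical inventory of names collected from all nodes.
    fleet = ["oisp-web-01", "oisp-web-02", "oisp-web-01"]
    print(find_duplicates(fleet))
```

Hostname-based names only stay unique if hostnames are unique, so the duplicate check is worth running against your inventory whenever nodes are cloned from an image.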
Step 3: Verify Event Flow
On the collector server:
Deployment: Kafka
Step 1: Deploy Kafka Cluster
Option A: Managed Kafka (Recommended for production)
- AWS MSK
- Confluent Cloud
- Azure Event Hubs
Step 2: Configure OISP Sensor
On each server, in /etc/oisp/config.toml:
Step 3: Consume Events
Console consumer (testing):
Deployment: File-Based with Fluent Bit
Step 1: Configure OISP Sensor to Write Files
On each server:
Step 2: Install Fluent Bit on Each Server
/etc/fluent-bit/fluent-bit.conf:
Node Identification
Ensure each sensor has a unique identifier:
Health Monitoring
Sensor Health Checks
On each sensor node:
Collector Health Checks
For OTel Collector:
Security Considerations
Network Security
1. TLS encryption for OTLP:
Data Security
1. Redaction mode:
Scaling Considerations
Horizontal Scaling
OISP Sensor:
- Scales linearly with servers
- Each sensor is independent
- No coordination required
OTel Collector:
- Deploy multiple collectors behind a load balancer
- Use sticky sessions for trace correlation
Kafka:
- Add partitions to scale throughput
- Add brokers to scale storage
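Sticky routing for trace correlation comes down to sending every event with the same trace ID to the same backend. A minimal Python sketch of that idea, hashing the trace ID to pick a collector, is below. The function name and the collector addresses are hypothetical, and real load balancers use their own mechanisms; this only shows the principle.

```python
import hashlib


def pick_backend(trace_id, backends):
    """Map a trace ID to one backend so a trace always lands in one place."""
    if not backends:
        raise ValueError("backends must not be empty")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a stable integer.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Hypothetical collector addresses, for illustration only.
collectors = ["collector-a:4317", "collector-b:4317", "collector-c:4317"]
chosen = pick_backend("trace-1234", collectors)
```

Plain modulo hashing remaps most trace IDs whenever the backend list changes size; consistent hashing is the usual refinement when collectors are added or removed often. Keyed partitioning in a message log follows the same principle: records with the same key go to the same partition.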
Performance Tuning
Sensor batching:
Example Topologies
Small Deployment (5-10 servers)
Medium Deployment (10-50 servers)
Large Deployment (50+ servers)
Troubleshooting
Events not reaching collector
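A quick first check for this symptom is whether the collector's port is reachable from the sensor node at all. The Python sketch below attempts a plain TCP connection. The hostname `collector.example.com` is a placeholder; 4317 is the conventional OTLP gRPC port (4318 for OTLP over HTTP), so adjust it if your collector listens elsewhere. A successful connection only proves network reachability, not that TLS or the OTLP exchange works.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    # Placeholder host; replace with your collector's address.
    host, port = "collector.example.com", 4317
    state = "reachable" if can_connect(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```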
1. Check sensor logs:
High latency
1. Check batch sizes:
- Increase batch size to reduce network calls
- Decrease batch timeout for lower latency
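The trade-off between these two settings can be estimated with simple arithmetic. The Python sketch below assumes a steady event rate and a batcher that flushes when either the batch is full or the timeout expires, whichever comes first. That flush rule and the function name are assumptions for this example, not a description of any specific batcher.

```python
def estimate_batching(events_per_sec, batch_size, batch_timeout_s):
    """Estimate flush interval and send rate for a size-or-timeout batcher."""
    if events_per_sec <= 0 or batch_size <= 0 or batch_timeout_s <= 0:
        raise ValueError("all inputs must be positive")
    fill_time_s = batch_size / events_per_sec        # time to fill one batch
    flush_interval_s = min(fill_time_s, batch_timeout_s)
    return {
        "flush_interval_s": flush_interval_s,
        "sends_per_sec": 1.0 / flush_interval_s,
        # An event waits at most one flush interval before being sent.
        "max_added_latency_s": flush_interval_s,
    }


busy = estimate_batching(events_per_sec=1000, batch_size=100, batch_timeout_s=5.0)
quiet = estimate_batching(events_per_sec=10, batch_size=100, batch_timeout_s=5.0)
```

On the busy node the batch fills in 0.1 s, so batch size determines the send rate; on the quiet node the batch never fills within 5 s, so the timeout determines how long events wait. That is why batch size is the lever for reducing network calls and batch timeout is the lever for latency.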
Data loss
1. Enable durable queue: For Kafka, ensure replication factor ≥ 3. For OTel Collector, enable persistent queue:
Next Steps
- Kubernetes Deployment - Deploy in Kubernetes
- Production Guide - Production best practices
- Configuration Reference - Full config options