Deploy OISP Sensor across multiple servers with centralized event aggregation and analysis.

Overview

Multi-node deployments enable:
  • Cluster-wide visibility - Monitor AI activity across all servers
  • Centralized analysis - Aggregate events in one place
  • Scalable architecture - Add nodes without reconfiguring the central pipeline
  • Production observability - Track AI usage across your infrastructure

Architecture Patterns

Pattern 1: OTLP → OpenTelemetry Collector

Best for: Modern observability stacks, cloud-native environments
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Server 1     │────▶│              │     │              │
│ oisp-sensor  │ OTLP│ OpenTelemetry│────▶│ Backend      │
└──────────────┘     │ Collector    │     │ (Grafana,    │
┌──────────────┐     │              │     │  Datadog,    │
│ Server 2     │────▶│ (Aggregates  │     │  Elastic)    │
│ oisp-sensor  │ OTLP│  & Routes)   │     │              │
└──────────────┘     └──────────────┘     └──────────────┘
┌──────────────┐
│ Server N     │────▶
│ oisp-sensor  │ OTLP
└──────────────┘
Advantages:
  • Standard observability protocol
  • Rich ecosystem of exporters (Prometheus, Datadog, etc.)
  • Built-in batching, retry, backpressure
  • Vendor-neutral

Pattern 2: Kafka Stream

Best for: High-volume environments, stream processing pipelines
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Server 1     │────▶│              │     │ Consumer 1   │
│ oisp-sensor  │     │ Kafka        │────▶│ (Analytics)  │
└──────────────┘     │ Cluster      │     └──────────────┘
┌──────────────┐     │              │     ┌──────────────┐
│ Server 2     │────▶│ (Durable     │────▶│ Consumer 2   │
│ oisp-sensor  │     │  Queue)      │     │ (Storage)    │
└──────────────┘     └──────────────┘     └──────────────┘
┌──────────────┐
│ Server N     │────▶
│ oisp-sensor  │
└──────────────┘
Advantages:
  • Durability (events survive crashes)
  • Multiple consumers
  • Replay capability
  • High throughput (millions of events/sec)

Pattern 3: File-Based with Log Aggregation

Best for: Existing log infrastructure, simple setups
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Server 1     │     │ Fluent Bit / │     │ S3 /         │
│ oisp-sensor  │────▶│ Vector /     │────▶│ Elasticsearch│
│ → JSONL      │ tail│ Filebeat     │     │ / Loki       │
└──────────────┘     └──────────────┘     └──────────────┘
┌──────────────┐
│ Server 2     │────▶
│ oisp-sensor  │
│ → JSONL      │
└──────────────┘
Advantages:
  • Works with existing log pipelines
  • Simple sensor configuration
  • File-based buffering
  • Easy debugging (cat events.jsonl)

Deployment: OTLP with OpenTelemetry Collector

Step 1: Deploy OpenTelemetry Collector

Install on central aggregation server:
# Download the OTel Collector (contrib distribution, which bundles the file exporter used below)
wget https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.91.0/otelcol-contrib_0.91.0_linux_amd64.tar.gz
tar -xvf otelcol-contrib_0.91.0_linux_amd64.tar.gz
sudo mv otelcol-contrib /usr/local/bin/

# Create config
sudo mkdir -p /etc/otel
Create /etc/otel/config.yaml:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 10s
    send_batch_size: 100

  attributes:
    actions:
      - key: deployment
        value: production
        action: insert

exporters:
  logging:
    loglevel: info

  file:
    path: /var/log/otel/events.jsonl

  # Optional: forward to an observability backend's OTLP intake
  # (check your vendor's docs for the endpoint and auth header, and add
  # this exporter to the pipeline's exporter list below to use it)
  otlp/backend:
    endpoint: backend.example.com:443
    headers:
      api-key: ${BACKEND_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, attributes]
      exporters: [logging, file]
Create systemd service:
sudo tee /etc/systemd/system/otel-collector.service <<EOF
[Unit]
Description=OpenTelemetry Collector
After=network.target

[Service]
Type=simple
User=otel
ExecStart=/usr/local/bin/otelcol-contrib --config=/etc/otel/config.yaml
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

# Create user and directories
sudo useradd -r -s /bin/false otel
sudo mkdir -p /var/log/otel
sudo chown otel:otel /var/log/otel

# Start service
sudo systemctl daemon-reload
sudo systemctl enable otel-collector
sudo systemctl start otel-collector
Verify collector is running:
sudo systemctl status otel-collector
curl http://localhost:8888/metrics  # Collector self-telemetry (Prometheus format)
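You can also confirm the OTLP receiver ports are listening:
sudo ss -tlnp | grep -E '4317|4318'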

Step 2: Configure OISP Sensor on Each Node

On each application server, create /etc/oisp/config.toml:
[sensor]
name = "prod-server-01"  # Unique name per server

[capture]
ssl = true
process = true

[redaction]
mode = "safe"  # Redact sensitive data

[export.otlp]
enabled = true
endpoint = "http://collector.example.com:4317"  # Your OTel collector

[export.otlp.headers]
"x-server-name" = "prod-server-01"
"x-datacenter" = "us-east-1"
Install and start sensor:
# Install
curl -fsSL https://github.com/oximyhq/sensor/releases/latest/download/oisp-sensor-x86_64-unknown-linux-gnu.tar.gz | tar xz
sudo mv oisp-sensor /usr/local/bin/
sudo setcap cap_sys_admin,cap_bpf,cap_perfmon,cap_net_admin+ep /usr/local/bin/oisp-sensor

# Start as service
sudo systemctl enable oisp-sensor
sudo systemctl start oisp-sensor
sudo systemctl status oisp-sensor
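The systemctl commands above assume a systemd unit for the sensor already exists. If your install method did not create one, here is a minimal sketch (it assumes the sensor picks up /etc/oisp/config.toml on its own; adjust ExecStart if your build takes a config flag):
sudo tee /etc/systemd/system/oisp-sensor.service <<EOF
[Unit]
Description=OISP Sensor
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/oisp-sensor
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload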
Repeat for all servers, changing sensor.name for each.
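One way to keep names unique without hand-editing every file is to derive sensor.name from the hostname when writing the config (a sketch, assuming the same collector endpoint as above):
sudo tee /etc/oisp/config.toml >/dev/null <<EOF
[sensor]
name = "$(hostname -s)"

[capture]
ssl = true
process = true

[redaction]
mode = "safe"

[export.otlp]
enabled = true
endpoint = "http://collector.example.com:4317"
EOF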

Step 3: Verify Event Flow

On collector server:
# Check OTLP receiver is getting events
sudo journalctl -u otel-collector -f

# Check file output
tail -f /var/log/otel/events.jsonl
On sensor servers:
# Check sensor is exporting
sudo journalctl -u oisp-sensor | grep OTLP

Deployment: Kafka

Step 1: Deploy Kafka Cluster

Option A: Managed Kafka (Recommended for production)
  • AWS MSK
  • Confluent Cloud
  • Azure Event Hubs
Option B: Self-hosted with Docker Compose
version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka.example.com:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
Create topic:
docker compose exec kafka kafka-topics \
  --create \
  --topic oisp-events \
  --bootstrap-server localhost:9092 \
  --partitions 12 \
  --replication-factor 3  # use 1 for the single-broker Compose setup above
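To confirm the topic exists and see its partition layout (run inside the broker container or wherever the Kafka CLI tools are installed):
kafka-topics --describe --topic oisp-events --bootstrap-server localhost:9092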

Step 2: Configure OISP Sensor

On each server, /etc/oisp/config.toml:
[sensor]
name = "prod-server-01"

[capture]
ssl = true
process = true

[redaction]
mode = "safe"

[export.kafka]
enabled = true
brokers = [
    "kafka1.example.com:9092",
    "kafka2.example.com:9092",
    "kafka3.example.com:9092"
]
topic = "oisp-events"
compression = "snappy"
batch_size = 100
Start sensor:
sudo systemctl restart oisp-sensor

Step 3: Consume Events

Console consumer (testing):
kafka-console-consumer \
  --bootstrap-server kafka.example.com:9092 \
  --topic oisp-events \
  --from-beginning
Production consumer (Python example, using the kafka-python package):
from kafka import KafkaConsumer
import json

consumer = KafkaConsumer(
    'oisp-events',
    bootstrap_servers=['kafka.example.com:9092'],
    group_id='oisp-consumers',  # consumer group (checked when monitoring lag later)
    value_deserializer=lambda m: json.loads(m.decode('utf-8'))
)

for message in consumer:
    event = message.value
    print(f"[{event['sensor_name']}] {event['event_type']}: {event['data']['provider']}")

Deployment: File-Based with Fluent Bit

Step 1: Configure OISP Sensor to Write Files

On each server:
[sensor]
name = "prod-server-01"

[capture]
ssl = true
process = true

[redaction]
mode = "safe"

[export.jsonl]
enabled = true
path = "/var/log/oisp/events.jsonl"

Step 2: Install Fluent Bit on Each Server

curl https://raw.githubusercontent.com/fluent/fluent-bit/master/install.sh | sh
Create /etc/fluent-bit/fluent-bit.conf:
[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    info
    Parsers_File parsers.conf

[INPUT]
    Name         tail
    Path         /var/log/oisp/events.jsonl
    Parser       json
    Tag          oisp.events

[OUTPUT]
    Name         s3
    Match        *
    bucket       my-oisp-events
    region       us-east-1
    store_dir    /tmp/fluent-bit
    total_file_size 100M
    upload_timeout  1m
    s3_key_format /year=%Y/month=%m/day=%d/hour=%H/$TAG[1].$UUID.json
Start Fluent Bit:
sudo systemctl enable fluent-bit
sudo systemctl start fluent-bit
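Confirm events are being picked up and shipped (exact log wording varies by version):
sudo systemctl status fluent-bit
sudo journalctl -u fluent-bit -f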

Node Identification

Ensure each sensor has a unique identifier:
[sensor]
name = "prod-server-01"  # Unique per node

[export.otlp.headers]
"x-hostname" = "server01.example.com"
"x-datacenter" = "us-east-1"
"x-environment" = "production"
Events will include:
{
  "sensor_name": "prod-server-01",
  "sensor_version": "0.2.0",
  "host": {
    "hostname": "server01.example.com",
    "datacenter": "us-east-1"
  }
}
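With unique names in place, a single node's events can be pulled out of any aggregated JSONL stream, for example with jq (assuming the flat event shape shown above):
jq 'select(.sensor_name == "prod-server-01")' events.jsonl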

Health Monitoring

Sensor Health Checks

On each sensor node:
# Check service status
sudo systemctl status oisp-sensor

# Check recent logs
sudo journalctl -u oisp-sensor -n 50

# Check event rate
sudo journalctl -u oisp-sensor --since "1 hour ago" | grep "Exported"
Automated health check script:
#!/bin/bash
# /usr/local/bin/oisp-healthcheck.sh

# Check if service is running
systemctl is-active --quiet oisp-sensor || exit 1

# Check if events were exported in last 5 minutes
if ! journalctl -u oisp-sensor --since "5 minutes ago" | grep -q "Exported"; then
  echo "No events exported in last 5 minutes"
  exit 1
fi

exit 0
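Make the script executable and run it periodically, for example from cron (failures are sent to syslog via logger):
sudo chmod +x /usr/local/bin/oisp-healthcheck.sh
echo '*/5 * * * * root /usr/local/bin/oisp-healthcheck.sh || logger -t oisp "sensor healthcheck failed"' | sudo tee /etc/cron.d/oisp-healthcheck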

Collector Health Checks

For OTel Collector:
# Check the Collector's own telemetry metrics (served on :8888 by default)
curl http://localhost:8888/metrics | grep otelcol

# Key metrics:
# - otelcol_receiver_accepted_spans - Events received
# - otelcol_exporter_sent_spans - Events exported
# - otelcol_processor_batch_batch_send_size - Batch sizes
For Kafka:
# Check consumer lag
kafka-consumer-groups \
  --bootstrap-server kafka:9092 \
  --describe \
  --group oisp-consumers

Security Considerations

Network Security

1. TLS encryption for OTLP:
[export.otlp]
endpoint = "https://collector.example.com:4317"
tls_cert = "/etc/oisp/certs/client.crt"
tls_key = "/etc/oisp/certs/client.key"
tls_ca = "/etc/oisp/certs/ca.crt"
2. Authentication headers:
[export.otlp.headers]
"Authorization" = "Bearer ${OISP_TOKEN}"
3. Firewall rules:
# Allow outbound OTLP traffic to the collector only
# (ufw takes IP addresses, not hostnames - substitute your collector's IP)
sudo ufw allow out to 203.0.113.10 port 4317 proto tcp

Data Security

1. Redaction mode:
[redaction]
mode = "safe"  # Redacts PII, API keys, etc.
2. Field-level redaction:
[redaction]
mode = "custom"
custom_patterns = [
    { pattern = "email", replacement = "[EMAIL]" },
    { pattern = "ssn", replacement = "[SSN]" }
]
3. Encryption at rest: Use encrypted storage backends (S3 with KMS, encrypted Kafka topics).

Scaling Considerations

Horizontal Scaling

OISP Sensor:
  • Scales linearly with servers
  • Each sensor is independent
  • No coordination required
OpenTelemetry Collector:
  • Deploy multiple collectors behind a load balancer (see the sketch after this list)
  • Use sticky sessions for trace correlation
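A minimal nginx sketch for passing OTLP/gRPC through to two collectors (hostnames are placeholders):
# /etc/nginx/conf.d/otel-lb.conf
upstream otel_collectors {
    server collector1.example.com:4317;
    server collector2.example.com:4317;
}

server {
    listen 4317 http2;

    location / {
        grpc_pass grpc://otel_collectors;
    }
}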
Kafka:
  • Add partitions to scale throughput (example below)
  • Add brokers to scale storage
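Partitions on an existing topic can be increased in place (they cannot be reduced):
kafka-topics \
  --alter \
  --topic oisp-events \
  --partitions 24 \
  --bootstrap-server kafka.example.com:9092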

Performance Tuning

Sensor batching:
[export.otlp]
batch_size = 100  # Export every 100 events
batch_timeout = "10s"  # Or every 10 seconds
Collector batching:
processors:
  batch:
    timeout: 10s
    send_batch_size: 1000
Network tuning:
[export.otlp]
max_retries = 3
retry_backoff = "5s"
timeout = "30s"

Example Topologies

Small Deployment (5-10 servers)

Servers (5-10) → Single OTel Collector → File/S3
Simple, reliable, cost-effective.

Medium Deployment (10-50 servers)

Servers (10-50) → OTel Collector Cluster (3) → Kafka → Consumers
Adds durability, allows multiple consumers.

Large Deployment (50+ servers)

Servers (50+) → Regional OTel Collectors → Kafka Cluster → Stream Processing → Data Lake
Multi-region, high availability, complex analytics.

Troubleshooting

Events not reaching collector

1. Check sensor logs:
sudo journalctl -u oisp-sensor | grep -i error
2. Test collector connectivity:
# 4317 is the gRPC receiver; check that the port is reachable
nc -zv collector.example.com 4317

# Or probe the HTTP receiver
curl -v http://collector.example.com:4318
3. Check collector logs:
sudo journalctl -u otel-collector | grep -i error

High latency

1. Check batch sizes:
  • Increase batch size to reduce network calls
  • Decrease batch timeout for lower latency
2. Check network:
ping collector.example.com
traceroute collector.example.com

Data loss

1. Enable durable queue: For Kafka, ensure replication factor ≥ 3. For OTel Collector, enable persistent queue:
exporters:
  otlp:
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
      storage: file_storage

extensions:
  file_storage:
    directory: /var/lib/otelcol/queue

service:
  extensions: [file_storage]  # the extension must be enabled here before the queue can use it
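Create the queue directory and make it writable by the collector user:
sudo mkdir -p /var/lib/otelcol/queue
sudo chown otel:otel /var/lib/otelcol/queue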

Next Steps