Scheduler

Rockfish Detect can run as a daemon with automated scheduling for continuous anomaly detection.

Running as Daemon

# Start scheduler
rockfish_detect -c config.yaml run

# Run the jobs immediately instead of waiting for the scheduled time
rockfish_detect -c config.yaml run --run-now

The scheduler runs two daily jobs:

  1. Sample job - Sample new flow data
  2. Train job - Retrain models with new samples

Schedule Configuration

sampling:
  sample_hour: 0          # UTC hour (0-23)
  sample_minute: 30       # Optional; random if not set

training:
  train_hour: 1           # UTC hour (0-23)
  train_minute: 0         # Optional; random if not set

Random Minutes

If sample_minute or train_minute is not set, a random minute (0-59) is selected at startup. This staggers start times, making it less likely that multiple instances run their jobs at the same moment.
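
The fallback described above can be sketched in Python as follows. This is an illustration only: the helper name `resolve_minute` and the dict shape used for the parsed `sampling:` section are assumptions, not Rockfish Detect's actual implementation.

```python
import random
from typing import Optional


def resolve_minute(configured: Optional[int],
                   rng: Optional[random.Random] = None) -> int:
    """Return the configured minute, or pick a random one (0-59) if unset.

    Hypothetical helper that mirrors the documented fallback only.
    """
    if configured is not None:
        if not 0 <= configured <= 59:
            raise ValueError(f"minute must be in 0-59, got {configured}")
        return configured
    rng = rng or random.Random()
    return rng.randint(0, 59)  # randint is inclusive on both ends


# Parsed `sampling:` section with sample_minute omitted (assumed dict shape).
sampling = {"sample_hour": 0}
minute = resolve_minute(sampling.get("sample_minute"))
print(f"sample job will run at {sampling['sample_hour']:02d}:{minute:02d} UTC")
```

Because the minute is chosen once at startup, restarting an instance may move its jobs to a different minute; set the value explicitly if a fixed time matters.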

Example Schedule

# Sample at 00:30 UTC, train at 01:00 UTC
sampling:
  sample_hour: 0
  sample_minute: 30

training:
  train_hour: 1
  train_minute: 0

Timeline:

00:30 UTC - Sample yesterday's flow data
01:00 UTC - Retrain models with updated samples

Systemd Service

Create /etc/systemd/system/rockfish-detect.service:

[Unit]
Description=Rockfish Detect ML Service
After=network.target

[Service]
Type=simple
User=rockfish
ExecStart=/usr/local/bin/rockfish_detect -c /etc/rockfish/detect.yaml run
Restart=on-failure
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

Enable and start:

sudo systemctl daemon-reload
sudo systemctl enable rockfish-detect
sudo systemctl start rockfish-detect

# Check status
sudo systemctl status rockfish-detect

# View logs
sudo journalctl -u rockfish-detect -f

Docker Deployment

# Pull the image
docker pull rockfishnetworks/toolkit:latest

# Run the scheduler
docker run -d \
  --name rockfish-detect \
  -v /path/to/config.yaml:/etc/rockfish/config.yaml \
  -v /path/to/license.json:/etc/rockfish/license.json \
  -e AWS_ACCESS_KEY_ID=xxx \
  -e AWS_SECRET_ACCESS_KEY=xxx \
  rockfishnetworks/toolkit:latest \
  rockfish_detect -c /etc/rockfish/config.yaml run

Graceful Shutdown

The scheduler handles SIGTERM/SIGINT for graceful shutdown:

  1. Stops accepting new jobs
  2. Waits for running jobs to complete
  3. Saves state
  4. Exits cleanly

# Graceful stop
sudo systemctl stop rockfish-detect

# Or with kill
kill -TERM $(pgrep rockfish_detect)
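
The four shutdown steps follow a common pattern: the signal handler only records the request, and the main loop acts on it between jobs. A minimal Python sketch of that pattern is shown below. The names `request_stop`, `run_scheduler`, `jobs`, and `save_state` are hypothetical stand-ins, not the scheduler's real internals.

```python
import signal
import threading

stop_requested = threading.Event()


def request_stop(signum, frame):
    """Signal handler: record the request; the main loop does the shutdown."""
    stop_requested.set()


def run_scheduler(jobs, save_state):
    """Run jobs in order until all are done or a stop is requested.

    `jobs` is a list of callables and `save_state` is a callable; both are
    hypothetical placeholders for illustration.
    """
    completed = 0
    for job in jobs:
        if stop_requested.is_set():
            break          # 1. stop accepting new jobs
        job()              # 2. a job that has started runs to completion
        completed += 1
    save_state()           # 3. save state
    return completed       # 4. return normally so the process exits cleanly


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, request_stop)
    signal.signal(signal.SIGINT, request_stop)
    done = run_scheduler(
        [lambda: print("sample"), lambda: print("train")],
        save_state=lambda: print("state saved"),
    )
    print(f"completed {done} job(s)")
```

Keeping the handler this small avoids doing real work inside a signal context; the trade-off is that shutdown takes as long as the job currently running.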

State Management

The scheduler maintains state to avoid redundant work:

Sample State

Tracks which dates have been sampled:

s3://<bucket>/<observation>/sample/.state.json

On restart, the scheduler skips dates that have already been sampled.
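
The skip logic can be illustrated with a short Python sketch. The layout of `.state.json` is not documented here, so the `{"sampled_dates": [...]}` structure and the function name `dates_to_sample` are assumptions made purely for illustration.

```python
import json
from datetime import date, timedelta
from typing import List


def dates_to_sample(state_json: str, start: date, end: date) -> List[date]:
    """Return the dates in [start, end] that are not recorded as sampled.

    Assumes a hypothetical state layout:
    {"sampled_dates": ["YYYY-MM-DD", ...]}
    """
    state = json.loads(state_json) if state_json else {}
    done = {date.fromisoformat(d) for d in state.get("sampled_dates", [])}
    days = (end - start).days + 1
    candidates = [start + timedelta(days=i) for i in range(days)]
    return [d for d in candidates if d not in done]


state = '{"sampled_dates": ["2025-01-01", "2025-01-02"]}'
pending = dates_to_sample(state, date(2025, 1, 1), date(2025, 1, 4))
print([d.isoformat() for d in pending])  # ['2025-01-03', '2025-01-04']
```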

Score State

Tracks last scored timestamp:

s3://<bucket>/<observation>/score/.state.json

On restart, scoring resumes from the last checkpoint.
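
Resuming from a timestamp checkpoint amounts to keeping only the records newer than the stored value. The Python sketch below shows the idea; the `{"last_scored": ...}` layout and the helper names `parse_utc` and `unscored` are assumptions, since the real state format is not documented here.

```python
import json
from datetime import datetime, timezone
from typing import List


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)


def unscored(state_json: str, flow_times: List[str]) -> List[str]:
    """Return the timestamps strictly after the checkpoint, oldest first.

    Assumes a hypothetical state layout:
    {"last_scored": "<ISO 8601 timestamp>"}
    """
    state = json.loads(state_json) if state_json else {}
    checkpoint = state.get("last_scored")
    if checkpoint is None:
        return sorted(flow_times, key=parse_utc)  # no checkpoint: score everything
    cutoff = parse_utc(checkpoint)
    return sorted((t for t in flow_times if parse_utc(t) > cutoff), key=parse_utc)


state = '{"last_scored": "2025-01-01T00:00:00Z"}'
flows = ["2024-12-31T23:59:00Z", "2025-01-01T00:00:00Z", "2025-01-01T00:05:00Z"]
print(unscored(state, flows))  # ['2025-01-01T00:05:00Z']
```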

Reset State

# Clear sample state
rockfish_detect -c config.yaml sample --clear

# Force a rescore starting from the given timestamp
rockfish_detect -c config.yaml score --since 2025-01-01T00:00:00Z

Monitoring

Log Output

logging:
  level: info
  file: /var/log/rockfish/detect.log

Log levels:

  • error - Errors only
  • warn - Warnings and errors
  • info - Normal operation (default)
  • debug - Detailed operation
  • trace - Very verbose

Health Check

# Validate configuration
rockfish_detect -c config.yaml validate

# Test S3 connectivity
rockfish_detect -c config.yaml test-s3

# Check license
rockfish_detect -c config.yaml license
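
For automated monitoring, the three checks above can be wrapped in a small script. The Python sketch below assumes that each subcommand exits with a non-zero status on failure, which this page does not confirm; the names `run_cli` and `health_check` are hypothetical.

```python
import subprocess
from typing import Callable, Dict, List

CHECKS = ["validate", "test-s3", "license"]  # subcommands listed above


def run_cli(args: List[str]) -> int:
    """Run a command and return its exit code (127 if the binary is missing)."""
    try:
        return subprocess.run(args, check=False).returncode
    except FileNotFoundError:
        return 127


def health_check(config: str,
                 runner: Callable[[List[str]], int] = run_cli) -> Dict[str, bool]:
    """Run each check and map its name to True when the exit code is 0.

    Assumption: a non-zero exit code means the check failed.
    """
    return {
        check: runner(["rockfish_detect", "-c", config, check]) == 0
        for check in CHECKS
    }


# Demo with a stand-in runner so the example runs without the binary installed.
fake_runner = lambda args: 1 if args[-1] == "test-s3" else 0
print(health_check("config.yaml", runner=fake_runner))
# {'validate': True, 'test-s3': False, 'license': True}
```

In real use, call `health_check("config.yaml")` without the `runner` argument so that the actual CLI is invoked.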

Metrics to Monitor

Metric               Description
Sample job duration  Time to complete sampling
Train job duration   Time to complete training
Flows sampled        Number of flows per sample run
Anomalies detected   High-severity anomalies per day
S3 errors            Failed S3 operations

Multi-Instance Deployment

For high availability or distributed processing:

Separate Responsibilities

# Instance 1: Sampling and training
rockfish_detect -c config-train.yaml run

# Instance 2: Scoring only
rockfish_detect -c config-score.yaml score --continuous

Shared State

All instances read from and write to the same S3 bucket. The state files described under State Management keep instances from repeating work that has already been done.

Protocol Distribution

# Instance 1: TCP
rockfish_detect -c config.yaml run -p tcp

# Instance 2: UDP
rockfish_detect -c config.yaml run -p udp

Troubleshooting

Job Not Running

  1. Check that the system clock is correct (schedule hours are in UTC)
  2. Verify schedule configuration
  3. Check logs for errors

Job Failing

# Run manually with verbose output
rockfish_detect -c config.yaml -vv auto

High Memory Usage

  • Reduce sample_percent
  • Process protocols sequentially
  • Limit sample_days

Slow Jobs

  • Enable parallel_protocols: true
  • Use faster S3 storage
  • Increase hardware resources