Scheduler

Rockfish Detect can run as a daemon with automated scheduling for continuous anomaly detection.

Running as Daemon

# Start scheduler
rockfish_detect -c config.yaml run

# Run immediately without waiting
rockfish_detect -c config.yaml run --run-now

The scheduler runs two daily jobs:

Sample job - Sample new flow data
Train job - Retrain models with new samples

Schedule Configuration

sampling:
  sample_hour: 0          # UTC hour (0-23)
  sample_minute: 30       # Optional; random if not set

training:
  train_hour: 1           # UTC hour (0-23)
  train_minute: 0         # Optional; random if not set

Random Minutes

If sample_minute or train_minute is not set, a random minute (0-59) is selected at startup. This staggers start times, so multiple instances are less likely to run their jobs at the same moment.
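The fallback can be sketched in a few lines of Python. This is an illustration of the documented behaviour, not Rockfish Detect's own code; the helper name resolve_minute is hypothetical.

```python
import random
from typing import Optional


def resolve_minute(configured: Optional[int], rng: Optional[random.Random] = None) -> int:
    """Return the configured minute, or a random one in 0-59 if unset.

    Hypothetical helper mirroring the documented behaviour of
    sample_minute / train_minute; not the real implementation.
    """
    if configured is not None:
        if not 0 <= configured <= 59:
            raise ValueError(f"minute must be in 0-59, got {configured}")
        return configured
    if rng is None:
        rng = random.Random()
    # randint is inclusive on both ends, so this covers 0..59.
    return rng.randint(0, 59)


# Resolved once at startup, then reused for every daily run.
sample_minute = resolve_minute(30)    # explicit value is kept
train_minute = resolve_minute(None)   # unset, so a random minute is picked
```

Because each instance picks its minute independently, two instances can still land on the same minute (a 1-in-60 chance). Set explicit minutes when the jobs must not overlap.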

Example Schedule

# Sample at 00:30 UTC, train at 01:00 UTC
sampling:
  sample_hour: 0
  sample_minute: 30

training:
  train_hour: 1
  train_minute: 0

Timeline:

00:30 UTC - Sample yesterday's flow data
01:00 UTC - Retrain models with updated samples
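To check when a job is next due, the daily UTC schedule can be reproduced with the standard library. This is a minimal sketch that assumes a job fires at the next occurrence of hour:minute UTC strictly after the current time; next_run is a hypothetical helper, not part of rockfish_detect.

```python
from datetime import datetime, timedelta, timezone


def next_run(now: datetime, hour: int, minute: int) -> datetime:
    """Return the next daily occurrence of hour:minute UTC strictly after `now`."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the job runs tomorrow.
        candidate += timedelta(days=1)
    return candidate


now = datetime(2025, 1, 1, 0, 45, tzinfo=timezone.utc)
print(next_run(now, 0, 30))  # 2025-01-02 00:30:00+00:00 (sample job, already past today)
print(next_run(now, 1, 0))   # 2025-01-01 01:00:00+00:00 (train job, later today)
```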

Systemd Service

Create /etc/systemd/system/rockfish-detect.service:

[Unit]
Description=Rockfish Detect ML Service
After=network.target

[Service]
Type=simple
User=rockfish
ExecStart=/usr/local/bin/rockfish_detect -c /etc/rockfish/detect.yaml run
Restart=on-failure
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

Enable and start:

sudo systemctl daemon-reload
sudo systemctl enable rockfish-detect
sudo systemctl start rockfish-detect

# Check status
sudo systemctl status rockfish-detect

# View logs
sudo journalctl -u rockfish-detect -f

Docker Deployment

# Pull the image
docker pull rockfishnetworks/toolkit:latest

# Run the scheduler
docker run -d \
  --name rockfish-detect \
  -v /path/to/config.yaml:/etc/rockfish/config.yaml \
  -v /path/to/license.json:/etc/rockfish/license.json \
  -e AWS_ACCESS_KEY_ID=xxx \
  -e AWS_SECRET_ACCESS_KEY=xxx \
  rockfishnetworks/toolkit:latest \
  rockfish_detect -c /etc/rockfish/config.yaml run

Graceful Shutdown

The scheduler handles SIGTERM/SIGINT for graceful shutdown:

Stops accepting new jobs
Waits for running jobs to complete
Saves state
Exits cleanly

# Graceful stop
sudo systemctl stop rockfish-detect

# Or with kill
kill -TERM $(pgrep rockfish_detect)
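The four shutdown steps follow a common pattern: the signal handler only sets a flag, and the main loop finishes the current job before saving state and exiting. The Python sketch below illustrates that pattern with a toy scheduler. The class and method names are hypothetical and do not describe Rockfish Detect's internals.

```python
import signal
import threading


class Scheduler:
    """Toy scheduler illustrating the four shutdown steps (hypothetical)."""

    def __init__(self, jobs):
        self.pending = list(jobs)        # callables waiting to run
        self.completed = []              # names of finished jobs
        self.stop_requested = threading.Event()
        self.state_saved = False

    def handle_signal(self, signum, frame):
        # Only set a flag here; the main loop does the actual cleanup.
        self.stop_requested.set()

    def install_handlers(self):
        # Must be called from the main thread of a real process.
        signal.signal(signal.SIGTERM, self.handle_signal)
        signal.signal(signal.SIGINT, self.handle_signal)

    def save_state(self):
        # Placeholder: a real service would persist its progress here.
        self.state_saved = True

    def run(self):
        # 1. Stop accepting new jobs once a stop has been requested.
        while self.pending and not self.stop_requested.is_set():
            job = self.pending.pop(0)
            job()                        # 2. The running job always completes.
            self.completed.append(job.__name__)
        self.save_state()                # 3. Save state.
        return 0                         # 4. Exit cleanly.


def sample_job():
    # Simulate SIGTERM arriving while this job is running.
    scheduler.handle_signal(signal.SIGTERM, None)


def train_job():
    pass


scheduler = Scheduler([sample_job, train_job])
exit_code = scheduler.run()
print(scheduler.completed, [j.__name__ for j in scheduler.pending], exit_code)
# ['sample_job'] ['train_job'] 0
```

The demo calls the handler directly instead of sending a real signal, so it can run anywhere. In a real process, install_handlers() would be called once before run().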

State Management

The scheduler maintains state to avoid redundant work:

Sample State

Tracks which dates have been sampled:

s3://<bucket>/<observation>/sample/.state.json

On restart, dates that have already been sampled are skipped.
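The skip logic amounts to a set-membership check against the recorded dates. The sketch below assumes a hypothetical state layout, {"sampled_dates": ["YYYY-MM-DD", ...]}; the real .state.json format is not documented here and may differ.

```python
import json
from datetime import date, timedelta
from typing import List


def pending_dates(state_json: str, start: date, end: date) -> List[date]:
    """Return dates in [start, end] that the state does not list as sampled.

    Assumes a hypothetical layout: {"sampled_dates": ["YYYY-MM-DD", ...]}.
    """
    recorded = json.loads(state_json).get("sampled_dates", [])
    done = {date.fromisoformat(d) for d in recorded}
    span = (end - start).days + 1
    candidates = [start + timedelta(days=i) for i in range(span)]
    return [d for d in candidates if d not in done]


state = '{"sampled_dates": ["2025-01-01", "2025-01-02"]}'
todo = pending_dates(state, date(2025, 1, 1), date(2025, 1, 4))
print([d.isoformat() for d in todo])  # ['2025-01-03', '2025-01-04']
```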

Score State

Tracks last scored timestamp:

s3://<bucket>/<observation>/score/.state.json

On restart, scoring resumes from the last checkpoint.
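Checkpoint-based resumption can be pictured as a high-water mark: only records newer than the last scored timestamp are processed, and the mark then advances. The following Python sketch shows the idea under that assumption; the function unscored and its inputs are hypothetical, not the tool's actual data model.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Tuple


def unscored(
    flow_times: Iterable[datetime], last_scored: Optional[datetime]
) -> Tuple[List[datetime], Optional[datetime]]:
    """Return timestamps newer than the checkpoint, plus the advanced checkpoint."""
    todo = sorted(t for t in flow_times if last_scored is None or t > last_scored)
    # With nothing new to score, the checkpoint stays where it was.
    new_checkpoint = todo[-1] if todo else last_scored
    return todo, new_checkpoint


checkpoint = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
flows = [
    datetime(2025, 1, 1, 11, 0, tzinfo=timezone.utc),   # at or before checkpoint: skipped
    datetime(2025, 1, 1, 13, 0, tzinfo=timezone.utc),
    datetime(2025, 1, 1, 12, 30, tzinfo=timezone.utc),
]
todo, checkpoint_after = unscored(flows, checkpoint)
print(len(todo), checkpoint_after)  # 2 2025-01-01 13:00:00+00:00
```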

Reset State

# Clear sample state
rockfish_detect -c config.yaml sample --clear

# Force rescore
rockfish_detect -c config.yaml score --since 2025-01-01T00:00:00Z

Monitoring

Log Output

logging:
  level: info
  file: /var/log/rockfish/detect.log

Log levels:

error - Errors only
warn - Warnings and errors
info - Normal operation (default)
debug - Detailed operation
trace - Very verbose
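The levels are listed from least to most verbose. Assuming they are cumulative, as the description of warn suggests, each level also emits everything from the levels above it. That ordering can be expressed as a short Python sketch; is_emitted is a hypothetical helper for illustration.

```python
# Ordered from least to most verbose, as listed above.
LEVELS = ["error", "warn", "info", "debug", "trace"]


def is_emitted(configured: str, message_level: str) -> bool:
    """Return True if a message at message_level is logged at the configured level.

    Assumes cumulative levels: a level includes every less verbose one.
    """
    return LEVELS.index(message_level) <= LEVELS.index(configured)


print(is_emitted("info", "warn"))   # True: info also shows warnings
print(is_emitted("info", "debug"))  # False: debug is more verbose than info
```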

Health Check

# Validate configuration
rockfish_detect -c config.yaml validate

# Test S3 connectivity
rockfish_detect -c config.yaml test-s3

# Check license
rockfish_detect -c config.yaml license
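For automated monitoring, the three checks can be wrapped in a small script. The sketch below assumes that each subcommand exits with a non-zero status on failure, which is conventional for CLIs but not confirmed here. The function names are hypothetical, and the demo uses a fake runner so that it works without the binary installed.

```python
import subprocess
from typing import Callable, Dict, List

# Subcommands taken from the health-check commands above.
CHECKS = ["validate", "test-s3", "license"]


def run_checks(config: str, runner: Callable[[List[str]], int]) -> Dict[str, bool]:
    """Run each health-check subcommand; map its name to True if it exited with 0."""
    return {
        name: runner(["rockfish_detect", "-c", config, name]) == 0
        for name in CHECKS
    }


def subprocess_runner(argv: List[str]) -> int:
    """Execute a command and return its exit code (127 if the binary is missing)."""
    try:
        return subprocess.run(argv, check=False).returncode
    except FileNotFoundError:
        return 127


def fake_runner(argv: List[str]) -> int:
    # Stand-in for the real binary: pretend only the S3 test fails.
    return 1 if argv[-1] == "test-s3" else 0


results = run_checks("config.yaml", fake_runner)
print(results)  # {'validate': True, 'test-s3': False, 'license': True}
```

To run the checks against a real installation, pass subprocess_runner instead of fake_runner.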

Metrics to Monitor

Metric	Description
Sample job duration	Time to complete sampling
Train job duration	Time to complete training
Flows sampled	Number of flows per sample run
Anomalies detected	High-severity anomalies per day
S3 errors	Failed S3 operations

Multi-Instance Deployment

For high availability or distributed processing:

Separate Responsibilities

# Instance 1: Sampling and training
rockfish_detect -c config-train.yaml run

# Instance 2: Scoring only
rockfish_detect -c config-score.yaml score --continuous

Shared State

All instances read from and write to the same S3 bucket. The state files let each instance skip work that has already been recorded as done.

Protocol Distribution

# Instance 1: TCP
rockfish_detect -c config.yaml run -p tcp

# Instance 2: UDP
rockfish_detect -c config.yaml run -p udp

Troubleshooting

Job Not Running

Check that the system time is correct (schedules use UTC)
Verify schedule configuration
Check logs for errors

Job Failing

# Run manually with verbose output
rockfish_detect -c config.yaml -vv auto

High Memory Usage

Reduce sample_percent
Process protocols sequentially
Limit sample_days

Slow Jobs

Enable parallel_protocols: true
Use faster S3 storage
Increase hardware resources
