Scheduler
Rockfish Detect can run as a daemon with automated scheduling for continuous anomaly detection.
Running as Daemon
# Start scheduler
rockfish_detect -c config.yaml run
# Run immediately without waiting
rockfish_detect -c config.yaml run --run-now
The scheduler runs two daily jobs:
- Sample job - Sample new flow data
- Train job - Retrain models with new samples
Schedule Configuration
sampling:
sample_hour: 0 # UTC hour (0-23)
sample_minute: 30 # Optional; random if not set
training:
train_hour: 1 # UTC hour (0-23)
train_minute: 0 # Optional; random if not set
Random Minutes
If sample_minute or train_minute is not set, a random minute (0-59) is selected at startup. This staggers start times, which reduces the chance that multiple instances run their jobs at the same moment.
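The startup behaviour described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Rockfish Detect's actual implementation; the function name `resolve_minute` is hypothetical.

```python
import random
from typing import Optional


def resolve_minute(configured: Optional[int],
                   rng: Optional[random.Random] = None) -> int:
    """Return the configured minute, or pick a random one (0-59) if unset."""
    if configured is not None:
        if not 0 <= configured <= 59:
            raise ValueError(f"minute must be in 0-59, got {configured}")
        return configured
    if rng is None:
        rng = random.Random()
    return rng.randrange(60)  # uniform over 0..59


# sample_minute is set in the config, train_minute is left unset
sample_minute = resolve_minute(30)
train_minute = resolve_minute(None)
print(sample_minute, train_minute)
```

Because the value is chosen once at startup, a restart may move the job to a different minute unless the minute is pinned in the configuration.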
Example Schedule
# Sample at 00:30 UTC, train at 01:00 UTC
sampling:
sample_hour: 0
sample_minute: 30
training:
train_hour: 1
train_minute: 0
Timeline:
00:30 UTC - Sample yesterday's flow data
01:00 UTC - Retrain models with updated samples
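To see how a daily UTC hour and minute translate into a concrete run time, the following sketch computes the next occurrence of a schedule slot. It is a hypothetical helper (`next_run` is not part of Rockfish Detect) and assumes `now` is a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def next_run(now: datetime, hour: int, minute: int) -> datetime:
    """Return the next UTC datetime at hour:minute strictly after `now`."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate


now = datetime(2025, 1, 1, 0, 45, tzinfo=timezone.utc)
print(next_run(now, 0, 30))  # sample job: slot already passed today
print(next_run(now, 1, 0))   # train job: slot still ahead today
```

Starting the scheduler at 00:45 UTC with the example schedule would therefore train at 01:00 the same day, while the next sampling run falls on the following day.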
Systemd Service
Create /etc/systemd/system/rockfish-detect.service:
[Unit]
Description=Rockfish Detect ML Service
After=network.target
[Service]
Type=simple
User=rockfish
ExecStart=/usr/local/bin/rockfish_detect -c /etc/rockfish/detect.yaml run
Restart=on-failure
RestartSec=10
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl daemon-reload
sudo systemctl enable rockfish-detect
sudo systemctl start rockfish-detect
# Check status
sudo systemctl status rockfish-detect
# View logs
sudo journalctl -u rockfish-detect -f
Docker Deployment
# Pull the image
docker pull rockfishnetworks/toolkit:latest
# Run the scheduler
docker run -d \
--name rockfish-detect \
-v /path/to/config.yaml:/etc/rockfish/config.yaml \
-v /path/to/license.json:/etc/rockfish/license.json \
-e AWS_ACCESS_KEY_ID=xxx \
-e AWS_SECRET_ACCESS_KEY=xxx \
rockfishnetworks/toolkit:latest \
rockfish_detect -c /etc/rockfish/config.yaml run
Graceful Shutdown
The scheduler handles SIGTERM/SIGINT for graceful shutdown:
- Stops accepting new jobs
- Waits for running jobs to complete
- Saves state
- Exits cleanly
# Graceful stop
sudo systemctl stop rockfish-detect
# Or with kill
kill -TERM $(pgrep rockfish_detect)
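The shutdown sequence listed above follows a common pattern: the signal handler only records the request, and the main loop finishes the current job, saves state, and returns. The sketch below illustrates that pattern in general terms; it is not Rockfish Detect's code, and `run_scheduler`, `request_stop`, and `save_state` are hypothetical names.

```python
import signal
import threading

stop_requested = threading.Event()


def request_stop(signum, frame):
    """Signal handler: only record the request; the main loop does the cleanup."""
    stop_requested.set()


def run_scheduler(jobs, save_state):
    """Run queued jobs until a stop is requested, then save state and return."""
    completed = []
    for job in jobs:
        if stop_requested.is_set():
            break              # stop accepting new jobs
        job()                  # a job that has started is allowed to finish
        completed.append(job)
    save_state(completed)      # persist progress before exiting
    return completed


if __name__ == "__main__":
    # Handlers must be installed from the main thread.
    signal.signal(signal.SIGTERM, request_stop)
    signal.signal(signal.SIGINT, request_stop)
```

Keeping the handler this small avoids doing I/O inside a signal context; all cleanup happens in ordinary code after the running job returns.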
State Management
The scheduler maintains state to avoid redundant work:
Sample State
Tracks which dates have been sampled:
s3://<bucket>/<observation>/sample/.state.json
On restart, dates that have already been sampled are skipped.
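As a rough illustration of how a state file lets a restart skip finished work, the sketch below filters a date range against a list of sampled dates. The real layout of `.state.json` is not documented here, so the `sampled_dates` key and the `pending_dates` helper are assumptions made purely for the example.

```python
import json
from datetime import date, timedelta
from typing import List


def pending_dates(state_json: str, start: date, end: date) -> List[date]:
    """Return dates in [start, end] that the state does not list as sampled."""
    state = json.loads(state_json) if state_json else {}
    done = set(state.get("sampled_dates", []))  # hypothetical key
    span = (end - start).days
    candidates = [start + timedelta(days=i) for i in range(span + 1)]
    return [d for d in candidates if d.isoformat() not in done]


# Hypothetical state contents: two dates already sampled
state = '{"sampled_dates": ["2025-01-01", "2025-01-02"]}'
print(pending_dates(state, date(2025, 1, 1), date(2025, 1, 4)))
```

An empty or missing state yields the full date range, which matches the effect one would expect after clearing the sample state.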
Score State
Tracks last scored timestamp:
s3://<bucket>/<observation>/score/.state.json
On restart, scoring resumes from the last checkpoint.
Reset State
# Clear sample state
rockfish_detect -c config.yaml sample --clear
# Force rescore
rockfish_detect -c config.yaml score --since 2025-01-01T00:00:00Z
Monitoring
Log Output
logging:
level: info
file: /var/log/rockfish/detect.log
Log levels:
- error - Errors only
- warn - Warnings and errors
- info - Normal operation (default)
- debug - Detailed operation
- trace - Very verbose
Health Check
# Validate configuration
rockfish_detect -c config.yaml validate
# Test S3 connectivity
rockfish_detect -c config.yaml test-s3
# Check license
rockfish_detect -c config.yaml license
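The three commands above can be wrapped in a small script for use by an external monitor. The sketch below is one possible approach; it assumes that each subcommand exits with a non-zero status on failure, which this page does not state, and `run_health_checks` is a hypothetical helper name.

```python
import subprocess
from typing import Callable, Dict, List

# Subcommands taken from the health check commands listed above.
CHECKS = ["validate", "test-s3", "license"]


def _run(cmd: List[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(cmd).returncode


def run_health_checks(config_path: str,
                      run: Callable[[List[str]], int] = _run) -> Dict[str, bool]:
    """Run each check and map its name to True when the exit code is 0."""
    results = {}
    for check in CHECKS:
        cmd = ["rockfish_detect", "-c", config_path, check]
        results[check] = run(cmd) == 0  # assumption: non-zero exit means failure
    return results
```

Calling `run_health_checks("config.yaml")` returns a dictionary such as one entry per check name with a boolean result; the `run` parameter exists so the command runner can be replaced in tests.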
Metrics to Monitor
| Metric | Description |
|---|---|
| Sample job duration | Time to complete sampling |
| Train job duration | Time to complete training |
| Flows sampled | Number of flows per sample run |
| Anomalies detected | High-severity anomalies per day |
| S3 errors | Failed S3 operations |
Multi-Instance Deployment
For high availability or distributed processing:
Separate Responsibilities
# Instance 1: Sampling and training
rockfish_detect -c config-train.yaml run
# Instance 2: Scoring only
rockfish_detect -c config-score.yaml score --continuous
Shared State
All instances read/write to the same S3 bucket. State files prevent duplicate work.
Protocol Distribution
# Instance 1: TCP
rockfish_detect -c config.yaml run -p tcp
# Instance 2: UDP
rockfish_detect -c config.yaml run -p udp
Troubleshooting
Job Not Running
- Check that the system clock is correct (schedule hours are in UTC)
- Verify schedule configuration
- Check logs for errors
Job Failing
# Run manually with verbose output
rockfish_detect -c config.yaml -vv auto
High Memory Usage
- Reduce sample_percent
- Process protocols sequentially
- Limit sample_days
Slow Jobs
- Enable parallel_protocols: true
- Use faster S3 storage
- Increase hardware resources