Rockfish Detect Overview
Rockfish Detect is the ML training and anomaly detection service for the Rockfish platform. It provides a complete pipeline for building models from network flow data and scoring flows for anomalies.
Note: Rockfish Detect requires an Enterprise tier license.
Features
- Data Sampling - Random sampling from S3-stored Parquet files
- Feature Engineering - Build normalization tables for ML training
- Feature Ranking - Identify most significant fields for detection
- Model Training - Train anomaly detection models (HBOS, Hybrid)
- Flow Scoring - Score flows using trained models
- Device Fingerprinting - Passive OS/device detection via nDPI fingerprints
- Automated Scheduling - Run as daemon with daily training cycles
Architecture
Network Traffic
|
v
Parquet Files in S3 (from rockfish_probe)
|
v
+------------------------------------------+
| rockfish_detect |
+------------------------------------------+
| Sampler |
| - Queries S3 with DuckDB |
| - Random sampling |
| - Output: sample/*.parquet |
+------------------------------------------+
| Feature Engineer |
| - Build normalization tables |
| - Histogram binning + frequency |
| - Output: extract/*.parquet |
+------------------------------------------+
| Feature Ranker |
| - Importance scoring |
| - Output: rockfish_rank.parquet |
+------------------------------------------+
| Model Trainer (HBOS/Hybrid) |
| - Train on sampled data |
| - Output: models/*.json |
+------------------------------------------+
| Flow Scorer |
| - Score flows using trained models |
| - Output: score/*.parquet |
+------------------------------------------+
|
v
Anomaly Scores --> rockfish_mcp --> Alerts
Algorithms
| Algorithm | Type | Description |
|---|---|---|
| HBOS | Unsupervised | Histogram-Based Outlier Score - fast, interpretable |
| Hybrid | Combined | HBOS + fingerprint correlation + threat intelligence |
| Random Forest | Supervised | Classification-based (framework) |
| Autoencoder | Neural Network | Reconstruction error-based (framework) |
Use Cases
- Unsupervised Anomaly Detection - HBOS identifies statistical outliers
- Behavioral Change Detection - Hybrid mode detects unusual fingerprint combinations
- Device Profiling - Fingerprinting detects lateral movement
- Threat Prioritization - Score-based reporting prioritizes investigations
- Network Baselining - Feature ranking identifies important characteristics
Quick Start
# Validate configuration
rockfish_detect -c config.yaml validate
# Run full pipeline for specific date
rockfish_detect -c config.yaml auto --date 2025-01-28
# Start as scheduler daemon
rockfish_detect -c config.yaml run
# Run immediately (don't wait for schedule)
rockfish_detect -c config.yaml run --run-now
Requirements
- Enterprise tier license
- S3-compatible storage with flow data from rockfish_probe
- Multi-core system recommended (uses half available cores)
Next Steps
- Configuration - Set up rockfish_detect
- Data Pipeline - Understand the processing stages
- Anomaly Detection - Configure detection models