Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Rockfish Detect Overview

Rockfish Detect is the ML training and anomaly detection service for the Rockfish platform. It provides a complete pipeline for building models from network flow data and scoring flows for anomalies.

Note: Rockfish Detect requires an Enterprise tier license.

Features

  • Data Sampling - Random sampling from S3-stored Parquet files
  • Feature Engineering - Build normalization tables for ML training
  • Feature Ranking - Identify most significant fields for detection
  • Model Training - Train anomaly detection models (HBOS, Hybrid)
  • Flow Scoring - Score flows using trained models
  • Device Fingerprinting - Passive OS/device detection via nDPI fingerprints
  • Automated Scheduling - Run as daemon with daily training cycles

Architecture

Network Traffic
    |
    v
Parquet Files in S3 (from rockfish_probe)
    |
    v
+------------------------------------------+
|   rockfish_detect                        |
+------------------------------------------+
| Sampler                                  |
|   - Queries S3 with DuckDB               |
|   - Random sampling                      |
|   - Output: sample/*.parquet             |
+------------------------------------------+
| Feature Engineer                         |
|   - Build normalization tables           |
|   - Histogram binning + frequency        |
|   - Output: extract/*.parquet            |
+------------------------------------------+
| Feature Ranker                           |
|   - Importance scoring                   |
|   - Output: rockfish_rank.parquet        |
+------------------------------------------+
| Model Trainer (HBOS/Hybrid)              |
|   - Train on sampled data                |
|   - Output: models/*.json                |
+------------------------------------------+
| Flow Scorer                              |
|   - Score flows using trained models     |
|   - Output: score/*.parquet              |
+------------------------------------------+
    |
    v
Anomaly Scores --> rockfish_mcp --> Alerts

Algorithms

AlgorithmTypeDescription
HBOSUnsupervisedHistogram-Based Outlier Score - fast, interpretable
HybridCombinedHBOS + fingerprint correlation + threat intelligence
Random ForestSupervisedClassification-based (framework)
AutoencoderNeural NetworkReconstruction error-based (framework)

Use Cases

  1. Unsupervised Anomaly Detection - HBOS identifies statistical outliers
  2. Behavioral Change Detection - Hybrid mode detects unusual fingerprint combinations
  3. Device Profiling - Fingerprinting detects lateral movement
  4. Threat Prioritization - Score-based reporting prioritizes investigations
  5. Network Baselining - Feature ranking identifies important characteristics

Quick Start

# Validate configuration
rockfish_detect -c config.yaml validate

# Run full pipeline for specific date
rockfish_detect -c config.yaml auto --date 2025-01-28

# Start as scheduler daemon
rockfish_detect -c config.yaml run

# Run immediately (don't wait for schedule)
rockfish_detect -c config.yaml run --run-now

Requirements

  • Enterprise tier license
  • S3-compatible storage with flow data from rockfish_probe
  • Multi-core system recommended (uses half available cores)

Next Steps