Rockfish Detect uses YAML configuration files.
rockfish_detect -c /path/to/config.yaml [command]
license:
  path: /etc/rockfish/license.json
  observation: flows
| Option | Type | Required | Description |
|--------|------|----------|-------------|
| path | string | No | License file path (auto-searches if not set) |
| observation | string | Yes | S3 prefix / observation domain |
s3:
  bucket: my-flow-bucket
  region: us-east-1
  endpoint: https://s3.example.com
  hive_partitioning: true
  http_retries: 10
  http_retry_wait_ms: 2000
  http_retry_backoff: 2.0
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| bucket | string | (required) | S3 bucket name |
| region | string | (required) | AWS region |
| endpoint | string | - | Custom endpoint (MinIO, etc.) |
| hive_partitioning | bool | true | Match rockfish_probe structure |
| http_retries | int | 10 | Retry count for S3 operations |
| http_retry_wait_ms | int | 2000 | Base wait between retries (milliseconds) |
| http_retry_backoff | float | 2.0 | Exponential backoff multiplier |
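
To get a feel for how the three retry options interact, the sketch below computes a wait schedule. It assumes each wait is the base wait multiplied by the backoff factor raised to the attempt number; the actual implementation may add jitter or cap the wait, and `retry_waits_ms` is a hypothetical helper, not part of Rockfish Detect.

```python
def retry_waits_ms(http_retries=10, http_retry_wait_ms=2000, http_retry_backoff=2.0):
    """Return the assumed wait before each retry, in milliseconds.

    Assumption: wait_n = http_retry_wait_ms * http_retry_backoff ** n,
    with n starting at 0 for the first retry.
    """
    return [http_retry_wait_ms * http_retry_backoff ** attempt
            for attempt in range(http_retries)]


waits = retry_waits_ms()
print(waits[:4])            # waits before the first four retries
print(sum(waits) / 60_000)  # worst-case total wait, in minutes
```

Under this assumption the defaults add up to roughly 34 minutes of waiting before an operation finally fails, which is worth keeping in mind when raising `http_retries`.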
Expected path structure (from rockfish_probe):
s3://<bucket>/<observation>/v2/year=YYYY/month=MM/day=DD/*.parquet
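
As an illustration of the layout above, this sketch builds the prefix for a single day. The helper name `partition_prefix` is hypothetical, and zero-padded month and day values are an assumption read off the `MM`/`DD` placeholders.

```python
from datetime import date


def partition_prefix(bucket, observation, day):
    """Build the Hive-style S3 prefix for one day of flow data.

    Assumption: month and day are zero-padded to two digits.
    """
    return (
        f"s3://{bucket}/{observation}/v2/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
    )


print(partition_prefix("my-flow-bucket", "flows", date(2025, 1, 5)))
```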
sampling:
  sample_percent: 10.0
  retention_days: 7
  sample_hour: 0
  sample_minute: 30
  output_prefix: flows/sample
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| sample_percent | float | 10.0 | Percentage of rows to sample (0-100) |
| retention_days | int | 7 | Rolling window retention (days) |
| sample_hour | int | 0 | UTC hour for scheduled sampling |
| sample_minute | int | random | Minute for scheduled sampling |
| output_prefix | string | `<obs>/sample/` | S3 output prefix |
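
The scheduling options can be read as "run once a day at `sample_hour`:`sample_minute` UTC". The sketch below computes the next run time under that reading. `next_run` is a hypothetical helper, and picking a random minute on each call when none is configured is only a stand-in for whatever the tool does when the default is `random`.

```python
import random
from datetime import datetime, timedelta, timezone


def next_run(now, hour, minute=None):
    """Return the next daily run time at hour:minute UTC after `now`.

    Assumption: `now` is a timezone-aware UTC datetime; an unset minute
    is replaced by a random one (0-59).
    """
    if minute is None:
        minute = random.randrange(60)
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so schedule for tomorrow.
        candidate += timedelta(days=1)
    return candidate


now = datetime(2025, 1, 15, 0, 45, tzinfo=timezone.utc)
print(next_run(now, hour=0, minute=30))
```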
Configure feature engineering (normalization tables).
features:
  num_bins: 10
  histogram_type: quantile
  ip_hash_modulus: 65536
  sample_days: 7
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| num_bins | int | 10 | Histogram bins for numeric features |
| histogram_type | string | quantile | quantile or equal_width |
| ip_hash_modulus | int | 65536 | Dimensionality reduction for IPs |
| sample_days | int | 7 | Days of samples to process |
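
To illustrate what `ip_hash_modulus` does conceptually, the sketch below maps an address to one of a fixed number of buckets. The hash function Rockfish Detect uses is not documented here; CRC32 over the packed address is an arbitrary stand-in, and `ip_bucket` is a hypothetical name.

```python
import ipaddress
import zlib


def ip_bucket(ip, ip_hash_modulus=65536):
    """Map an IPv4/IPv6 address to a bucket in [0, ip_hash_modulus).

    Assumption: CRC32 is used purely for illustration; the real hash
    may differ.
    """
    packed = ipaddress.ip_address(ip).packed
    return zlib.crc32(packed) % ip_hash_modulus


print(ip_bucket("192.0.2.10"))
print(ip_bucket("2001:db8::1"))
```

A smaller modulus yields fewer distinct feature values at the cost of more collisions between unrelated addresses.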
| Type | Description | Best For |
|------|-------------|----------|
| quantile | Equal sample count per bin | Skewed distributions |
| equal_width | Equal value range per bin | Uniform distributions |
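
The difference between the two histogram types shows up clearly on skewed data. The following sketch uses only the Python standard library; the function names are hypothetical and the edge calculation is a generic illustration, not necessarily how Rockfish Detect computes its bins.

```python
import statistics


def equal_width_edges(values, num_bins):
    """Bin edges that split [min, max] into ranges of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / num_bins
    return [lo + i * step for i in range(num_bins + 1)]


def quantile_edges(values, num_bins):
    """Bin edges chosen so each bin holds roughly the same number of samples."""
    cuts = statistics.quantiles(values, n=num_bins, method="inclusive")
    return [min(values)] + cuts + [max(values)]


def bin_counts(values, edges):
    """Count samples per bin; the last bin also includes the maximum."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = len(counts) - 1
        for i in range(len(counts)):
            if v < edges[i + 1]:
                idx = i
                break
        counts[idx] += 1
    return counts


# One large outlier makes the distribution heavily skewed.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
print(bin_counts(data, equal_width_edges(data, 5)))
print(bin_counts(data, quantile_edges(data, 5)))
```

With equal-width bins nearly every sample lands in the first bin, while quantile bins spread the samples evenly, which is why `quantile` suits skewed features such as byte counts.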
training:
  enabled: true
  train_hour: 1
  train_minute: 0
  algorithm: hbos
  model_output_dir: /var/lib/rockfish/models
  min_importance_score: 0.7
  hbos:
    num_bins: 10
    fields:
      - dur
      - rtt
      - pcr
      - spkts
      - dpkts
      - sbytes
      - dbytes
      - sentropy
      - dentropy
  hybrid:
    hbos_weight: 0.5
    correlation_weight: 0.3
    threat_intel_weight: 0.2
    hbos_filter_percentile: 90.0
    min_observations: 3
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| enabled | bool | true | Enable training |
| train_hour | int | 1 | UTC hour for scheduled training |
| train_minute | int | random | Minute for scheduled training |
| algorithm | string | hbos | hbos, hybrid, random_forest, autoencoder |
| model_output_dir | string | - | Directory for trained models |
| min_importance_score | float | 0.7 | Threshold for ranked features |
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| hbos.num_bins | int | 10 | Histogram bins |
| hbos.fields | list | - | Fields to include in model |
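
For readers unfamiliar with HBOS (histogram-based outlier score), the sketch below shows the general idea: build one histogram per field, then score a flow by summing the log of the inverse bin height across fields. It is a minimal textbook-style version with equal-width bins and a density floor; the function names and the floor value are assumptions, and Rockfish Detect's implementation may differ.

```python
import math


def fit_histogram(values, num_bins=10):
    """Build an equal-width histogram with heights normalized to a peak of 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    if width == 0:
        width = 1.0  # all values identical; avoid division by zero
    counts = [0] * num_bins
    for v in values:
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    peak = max(counts)
    return lo, width, [c / peak for c in counts]


def hbos_score(row, histograms, floor=1e-3):
    """Sum log(1 / height) over fields; rarer values give higher scores.

    Values outside the training range are clamped to the edge bins, and
    empty bins use `floor` so the logarithm stays finite.
    """
    score = 0.0
    for field, value in row.items():
        lo, width, heights = histograms[field]
        idx = max(0, min(int((value - lo) / width), len(heights) - 1))
        score += math.log(1.0 / max(heights[idx], floor))
    return score


training = {"dur": [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 10.0]}
histograms = {f: fit_histogram(v, num_bins=3) for f, v in training.items()}
print(hbos_score({"dur": 2.0}, histograms))   # common value, low score
print(hbos_score({"dur": 10.0}, histograms))  # rare value, higher score
```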
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| hybrid.hbos_weight | float | 0.5 | Weight for HBOS score |
| hybrid.correlation_weight | float | 0.3 | Weight for fingerprint correlation |
| hybrid.threat_intel_weight | float | 0.2 | Weight for threat intel score |
| hybrid.hbos_filter_percentile | float | 90.0 | Pre-filter percentile |
| hybrid.min_observations | int | 3 | Min observations for correlation |
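
One plausible reading of the three weights is a linear combination of component scores. The sketch below assumes each component score is already normalized to the range 0 to 1 and that the weights are meant to sum to 1 (as the defaults do); `hybrid_score` is a hypothetical helper and the real combination rule may differ.

```python
def hybrid_score(hbos, correlation, threat_intel,
                 hbos_weight=0.5, correlation_weight=0.3, threat_intel_weight=0.2):
    """Weighted sum of three component scores.

    Assumption: inputs are normalized to [0, 1]; with weights summing
    to 1 the result also stays in [0, 1].
    """
    return (hbos_weight * hbos
            + correlation_weight * correlation
            + threat_intel_weight * threat_intel)


# A flow that looks unusual to HBOS, with a moderate fingerprint
# correlation and no threat intel hit.
print(hybrid_score(hbos=0.8, correlation=0.5, threat_intel=0.0))
```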
Device/OS fingerprinting via nDPI signatures.
fingerprint:
  enabled: false
  history_days: 7
  client_field: ndpi_ja4
  server_field: ndpi_ja3s
  min_observations: 10
  anomaly_threshold: 0.7
  max_fingerprints_per_host: 5
  detect_suspicious: true
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| enabled | bool | false | Enable fingerprinting |
| history_days | int | 7 | Days of history to analyze |
| client_field | string | ndpi_ja4 | Field for client fingerprint (JA4 via nDPI) |
| server_field | string | ndpi_ja3s | Field for server fingerprint (JA3S via nDPI) |
| min_observations | int | 10 | Minimum observations for baseline |
| anomaly_threshold | float | 0.7 | Threshold for anomaly detection |
| max_fingerprints_per_host | int | 5 | Max expected fingerprints |
| detect_suspicious | bool | true | Detect fingerprint changes |
Note: Requires nDPI fingerprint fields in flow data (Professional+ license for probe).
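
To make `min_observations` and `max_fingerprints_per_host` concrete, the sketch below flags hosts that have enough observations to form a baseline and show more distinct client fingerprints than expected. This is only one interpretation of the options; the helper name, the input shape, and the fingerprint strings are hypothetical placeholders.

```python
from collections import defaultdict


def hosts_exceeding_limit(observations, min_observations=10,
                          max_fingerprints_per_host=5):
    """Return {host: distinct fingerprint count} for hosts over the limit.

    `observations` is an iterable of (host, fingerprint) pairs. Hosts
    with fewer than `min_observations` records are skipped because they
    have no baseline yet (an assumption about how the option is used).
    """
    seen = defaultdict(list)
    for host, fingerprint in observations:
        seen[host].append(fingerprint)

    flagged = {}
    for host, fingerprints in seen.items():
        distinct = set(fingerprints)
        if (len(fingerprints) >= min_observations
                and len(distinct) > max_fingerprints_per_host):
            flagged[host] = len(distinct)
    return flagged


# Placeholder fingerprints: one host cycles through six values,
# another always presents the same one.
obs = [("10.0.0.1", f"fp{i % 6}") for i in range(12)]
obs += [("10.0.0.2", "fpA")] * 12
print(hosts_exceeding_limit(obs))
```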
logging:
  level: info
  file: /var/log/rockfish/detect.log
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| level | string | info | Log level: error, warn, info, debug, trace |
| file | string | - | Log file path (optional) |
parallel_protocols: true
protocols:
  - tcp
  - udp
  - icmp
duckdb:
  autoload_extensions: false
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| parallel_protocols | bool | true | Process protocols in parallel |
| protocols | list | tcp, udp, icmp | Protocols to process |
| duckdb.autoload_extensions | bool | false | DuckDB extension autoload |
license:
  path: /opt/rockfish/etc/license.json
  observation: sensor-01
s3:
  bucket: flow-data
  region: us-east-1
  hive_partitioning: true
sampling:
  sample_percent: 10.0
  retention_days: 7
  sample_hour: 0
features:
  num_bins: 10
  histogram_type: quantile
  sample_days: 7
training:
  enabled: true
  train_hour: 1
  algorithm: hybrid
  model_output_dir: /var/lib/rockfish/models
  hbos:
    num_bins: 10
    fields:
      - dur
      - rtt
      - pcr
      - spkts
      - dpkts
      - sbytes
      - dbytes
  hybrid:
    hbos_weight: 0.5
    correlation_weight: 0.3
    threat_intel_weight: 0.2
fingerprint:
  enabled: true
  history_days: 7
  min_observations: 10
logging:
  level: info
  file: /var/log/rockfish/detect.log