Configuration Reference

Rockfish Detect reads its settings from a YAML configuration file. Pass the file path with -c:

rockfish_detect -c /path/to/config.yaml [command]

Configuration Sections


License

license:
  path: /etc/rockfish/license.json
  observation: flows

| Option | Type | Required | Description |
|---|---|---|---|
| path | string | No | License file path (auto-searches if not set) |
| observation | string | Yes | S3 prefix / observation domain |

S3

s3:
  bucket: my-flow-bucket
  region: us-east-1
  endpoint: https://s3.example.com
  hive_partitioning: true
  http_retries: 10
  http_retry_wait_ms: 2000
  http_retry_backoff: 2.0

| Option | Type | Default | Description |
|---|---|---|---|
| bucket | string | (required) | S3 bucket name |
| region | string | (required) | AWS region |
| endpoint | string | - | Custom endpoint (MinIO, etc.) |
| hive_partitioning | bool | true | Match rockfish_probe structure |
| http_retries | int | 10 | Retry count for S3 operations |
| http_retry_wait_ms | int | 2000 | Base wait between retries (milliseconds) |
| http_retry_backoff | float | 2.0 | Exponential backoff multiplier |
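
To get a feel for how the three retry options interact, the sketch below computes a wait schedule. It assumes the wait before retry n is http_retry_wait_ms * http_retry_backoff^n with no cap and no jitter; the reference does not state the exact formula, so treat this as an illustration rather than the tool's actual behavior. The function name retry_waits_ms is hypothetical.

```python
def retry_waits_ms(retries: int = 10,
                   base_wait_ms: int = 2000,
                   backoff: float = 2.0) -> list[float]:
    """Return the assumed wait (in ms) before each retry.

    Assumption: wait_n = base_wait_ms * backoff ** n, with n starting at 0.
    The real implementation may cap the wait or add jitter.
    """
    return [base_wait_ms * backoff ** attempt for attempt in range(retries)]


waits = retry_waits_ms()
print("first waits (ms):", waits[:4])
print("worst-case total (s):", sum(waits) / 1000)
```

Under this assumption the default values add up to about 34 minutes of waiting if every retry fails, which may be worth lowering for interactive runs.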

S3 Data Structure

Expected path structure (from rockfish_probe):

s3://<bucket>/<observation>/v2/year=YYYY/month=MM/day=DD/*.parquet
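
The layout above can be sketched as a small helper that builds the prefix for one day of data. The function name partition_prefix is hypothetical, and the zero-padded month and day are an assumption based on the MM and DD placeholders.

```python
from datetime import date


def partition_prefix(bucket: str, observation: str, day: date) -> str:
    """Build the Hive-style S3 prefix for one day of flow data.

    Assumption: month and day are zero-padded to two digits.
    """
    return (
        f"s3://{bucket}/{observation}/v2/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
    )


# Bucket and observation names taken from the example configuration.
print(partition_prefix("my-flow-bucket", "flows", date(2025, 1, 9)))
# s3://my-flow-bucket/flows/v2/year=2025/month=01/day=09/
```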

Sampling

sampling:
  sample_percent: 10.0
  retention_days: 7
  sample_hour: 0
  sample_minute: 30
  output_prefix: flows/sample

| Option | Type | Default | Description |
|---|---|---|---|
| sample_percent | float | 10.0 | Percentage of rows to sample (0-100) |
| retention_days | int | 7 | Rolling window retention |
| sample_hour | int | 0 | UTC hour for scheduled sampling |
| sample_minute | int | random | Minute for scheduled sampling |
| output_prefix | string | <obs>/sample/ | S3 output prefix |
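
As an illustration of what sample_percent means, the sketch below keeps each row independently with probability sample_percent / 100. This is only one possible sampling scheme; the reference does not say how Rockfish Detect selects rows, and the function name sample_rows is hypothetical.

```python
import random


def sample_rows(rows, sample_percent: float = 10.0, seed=None) -> list:
    """Keep each row with probability sample_percent / 100.

    Assumption: independent per-row (Bernoulli) sampling. The real tool
    may use a different method.
    """
    if not 0.0 <= sample_percent <= 100.0:
        raise ValueError("sample_percent must be between 0 and 100")
    rng = random.Random(seed)
    keep_probability = sample_percent / 100.0
    return [row for row in rows if rng.random() < keep_probability]


sampled = sample_rows(range(100_000), sample_percent=10.0, seed=42)
print(len(sampled))  # close to 10,000, but not exactly
```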

Features

Configure feature engineering (normalization tables).

features:
  num_bins: 10
  histogram_type: quantile
  ip_hash_modulus: 65536
  sample_days: 7

| Option | Type | Default | Description |
|---|---|---|---|
| num_bins | int | 10 | Histogram bins for numeric features |
| histogram_type | string | quantile | quantile or equal_width |
| ip_hash_modulus | int | 65536 | Dimensionality reduction for IPs |
| sample_days | int | 7 | Days of samples to process |
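
The idea behind ip_hash_modulus can be sketched as mapping every address into a fixed number of buckets. The example below uses the integer value of the address as a stand-in hash; the actual hash function used by Rockfish Detect is not documented here, so ip_bucket is a hypothetical helper that only shows the effect of the modulus.

```python
import ipaddress


def ip_bucket(ip: str, modulus: int = 65536) -> int:
    """Map an IPv4 or IPv6 address to a bucket in [0, modulus).

    Assumption: the address's integer value stands in for the real hash.
    """
    return int(ipaddress.ip_address(ip)) % modulus


print(ip_bucket("192.168.1.10"))   # 266
print(ip_bucket("10.0.1.10"))      # 266, a collision with the line above
print(ip_bucket("2001:db8::1"))    # 1
```

A smaller modulus gives a more compact feature space at the cost of more collisions between unrelated addresses.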

Histogram Types

| Type | Description | Best For |
|---|---|---|
| quantile | Equal sample count per bin | Skewed distributions |
| equal_width | Equal value range per bin | Uniform distributions |
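
The difference between the two histogram types can be sketched by computing bin edges both ways on the same data. This is a minimal illustration using Python's standard library, not the code Rockfish Detect runs; the function names are hypothetical.

```python
import statistics


def equal_width_edges(values, num_bins: int) -> list[float]:
    """Bin edges that split the value range into equally wide bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    return [lo + i * width for i in range(num_bins + 1)]


def quantile_edges(values, num_bins: int) -> list[float]:
    """Bin edges chosen so each bin holds roughly the same number of samples."""
    cuts = statistics.quantiles(values, n=num_bins, method="inclusive")
    return [min(values)] + cuts + [max(values)]


# Skewed example data: mostly small values with a few large outliers.
byte_counts = [40, 52, 60, 64, 70, 80, 90, 120, 1500, 90000]
print("equal_width:", equal_width_edges(byte_counts, 5))
print("quantile:   ", quantile_edges(byte_counts, 5))
```

On skewed data like this, equal-width bins put nearly every sample in the first bin, while quantile edges stay close together where the data is dense.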

Training

training:
  enabled: true
  train_hour: 1
  train_minute: 0
  algorithm: hbos
  model_output_dir: /var/lib/rockfish/models
  min_importance_score: 0.7

  hbos:
    num_bins: 10
    fields:
      - dur
      - rtt
      - pcr
      - spkts
      - dpkts
      - sbytes
      - dbytes
      - sentropy
      - dentropy

  hybrid:
    hbos_weight: 0.5
    correlation_weight: 0.3
    threat_intel_weight: 0.2
    hbos_filter_percentile: 90.0
    min_observations: 3

| Option | Type | Default | Description |
|---|---|---|---|
| enabled | bool | true | Enable training |
| train_hour | int | 1 | UTC hour for scheduled training |
| train_minute | int | random | Minute for scheduled training |
| algorithm | string | hbos | hbos, hybrid, random_forest, autoencoder |
| model_output_dir | string | - | Directory for trained models |
| min_importance_score | float | 0.7 | Threshold for ranked features |

HBOS Options

| Option | Type | Default | Description |
|---|---|---|---|
| num_bins | int | 10 | Histogram bins |
| fields | list | - | Fields to include in model |
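
For readers unfamiliar with HBOS (Histogram-Based Outlier Score), the sketch below shows the general technique: build one histogram per field, then score a row by summing log(1 / bin height) across fields, so values in sparse bins score higher. It uses equal-width bins and clamps out-of-range values to the edge bins for brevity. It is not the Rockfish Detect implementation, and the function names and the floor parameter are hypothetical.

```python
import math


def fit_histogram(values, num_bins: int = 10):
    """Return (low edge, bin width, normalized bin heights) for one field."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # avoid zero width for constant data
    counts = [0] * num_bins
    for v in values:
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    peak = max(counts)
    heights = [c / peak for c in counts]  # tallest bin has height 1.0
    return lo, width, heights


def hbos_score(row: dict, histograms: dict, floor: float = 1e-6) -> float:
    """Sum log(1 / height) over the fields in row; higher means more unusual."""
    score = 0.0
    for field, value in row.items():
        lo, width, heights = histograms[field]
        idx = max(0, min(int((value - lo) / width), len(heights) - 1))
        score += math.log(1.0 / max(heights[idx], floor))
    return score


# Toy training data for one field from the example config ("dur").
histograms = {"dur": fit_histogram([1.0] * 90 + [10.0] * 10, num_bins=10)}
print(hbos_score({"dur": 1.0}, histograms))   # common value, low score
print(hbos_score({"dur": 10.0}, histograms))  # rare value, higher score
print(hbos_score({"dur": 5.0}, histograms))   # empty bin, highest score
```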

Hybrid Options

| Option | Type | Default | Description |
|---|---|---|---|
| hbos_weight | float | 0.5 | Weight for HBOS score |
| correlation_weight | float | 0.3 | Weight for fingerprint correlation |
| threat_intel_weight | float | 0.2 | Weight for threat intel score |
| hbos_filter_percentile | float | 90.0 | Pre-filter percentile |
| min_observations | int | 3 | Min observations for correlation |
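
One way to read the three weights is as a weighted sum of component scores. The sketch below assumes each component is already normalized to the range 0 to 1 and that the weights are combined linearly; the reference lists only the weights, so the combination rule and the function name hybrid_score are assumptions.

```python
def hybrid_score(hbos: float,
                 correlation: float,
                 threat_intel: float,
                 hbos_weight: float = 0.5,
                 correlation_weight: float = 0.3,
                 threat_intel_weight: float = 0.2) -> float:
    """Combine three component scores into one value.

    Assumption: a plain weighted sum of scores normalized to [0, 1].
    """
    return (hbos_weight * hbos
            + correlation_weight * correlation
            + threat_intel_weight * threat_intel)


# A flow that looks unusual to HBOS but has no threat intel match.
print(round(hybrid_score(hbos=0.8, correlation=0.5, threat_intel=0.0), 2))  # 0.55
```

The default weights sum to 1.0, so under this reading the combined score stays in the same 0 to 1 range as its inputs.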

Fingerprint

Device/OS fingerprinting via nDPI signatures.

fingerprint:
  enabled: false
  history_days: 7
  client_field: ndpi_ja4
  server_field: ndpi_ja3s
  min_observations: 10
  anomaly_threshold: 0.7
  max_fingerprints_per_host: 5
  detect_suspicious: true

| Option | Type | Default | Description |
|---|---|---|---|
| enabled | bool | false | Enable fingerprinting |
| history_days | int | 7 | Days of history to analyze |
| client_field | string | ndpi_ja4 | Field for client fingerprint (JA4 via nDPI) |
| server_field | string | ndpi_ja3s | Field for server fingerprint (JA3S via nDPI) |
| min_observations | int | 10 | Minimum observations for baseline |
| anomaly_threshold | float | 0.7 | Threshold for anomaly detection |
| max_fingerprints_per_host | int | 5 | Max expected fingerprints |
| detect_suspicious | bool | true | Detect fingerprint changes |

Note: Requires nDPI fingerprint fields in flow data (Professional+ license for probe).


Logging

logging:
  level: info
  file: /var/log/rockfish/detect.log

| Option | Type | Default | Description |
|---|---|---|---|
| level | string | info | Log level: error, warn, info, debug, trace |
| file | string | - | Log file path (optional) |

Other Options

parallel_protocols: true
protocols:
  - tcp
  - udp
  - icmp

duckdb:
  autoload_extensions: false

| Option | Type | Default | Description |
|---|---|---|---|
| parallel_protocols | bool | true | Process protocols in parallel |
| protocols | list | tcp, udp, icmp | Protocols to process |
| duckdb.autoload_extensions | bool | false | DuckDB extension autoload |

Complete Example

license:
  path: /opt/rockfish/etc/license.json
  observation: sensor-01

s3:
  bucket: flow-data
  region: us-east-1
  hive_partitioning: true

sampling:
  sample_percent: 10.0
  retention_days: 7
  sample_hour: 0

features:
  num_bins: 10
  histogram_type: quantile
  sample_days: 7

training:
  enabled: true
  train_hour: 1
  algorithm: hybrid
  model_output_dir: /var/lib/rockfish/models

  hbos:
    num_bins: 10
    fields:
      - dur
      - rtt
      - pcr
      - spkts
      - dpkts
      - sbytes
      - dbytes

  hybrid:
    hbos_weight: 0.5
    correlation_weight: 0.3
    threat_intel_weight: 0.2

fingerprint:
  enabled: true
  history_days: 7
  min_observations: 10

logging:
  level: info
  file: /var/log/rockfish/detect.log