Quick Start

This guide walks you through capturing network flows and querying them.

1. Capture Flows

From a PCAP File

# Basic capture to Parquet
rockfish_probe -i capture.pcap --parquet-dir ./flows

# With nDPI application labeling
rockfish_probe -i capture.pcap --ndpi --parquet-dir ./flows
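
If you want to spot-check the input file before running the probe, tcpdump can read the same PCAP (standard tooling, not part of rockfish):

# Preview the first few packets of the capture
tcpdump -r capture.pcap -c 5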

Live Capture

# Standard libpcap capture (requires root)
sudo rockfish_probe -i eth0 --live pcap --parquet-dir ./flows

# High-performance AF_PACKET capture (Linux)
sudo rockfish_probe -i eth0 --live afpacket --parquet-dir ./flows
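
If you are not sure which interface name to use, list the available interfaces first (plain Linux tooling, independent of rockfish_probe):

# Show interfaces and their link state
ip -br link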

With a Configuration File

# Create config.yaml (see Configuration docs)
rockfish_probe -c config.yaml
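
For orientation, a minimal config.yaml sketch is shown below; it reuses the output block from the S3 example later in this guide, and the Configuration docs describe the full set of options (capture interface, nDPI, S3, and so on).

# config.yaml — minimal sketch, reusing keys shown later in this guide
output:
  parquet_dir: ./flows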

2. Verify Output

# Check generated files
ls -la flows/

# Inspect the flow schema with DuckDB
duckdb -c "DESCRIBE SELECT * FROM 'flows/*.parquet'"

3. Query with MCP

Set up the MCP server to query your flows:

# mcp-config.yaml
sources:
  flow:
    path: ./flows/
    description: Network flow data

output:
  default_format: table
  max_rows: 100

# Start MCP server
ROCKFISH_CONFIG=mcp-config.yaml rockfish_mcp

Example Queries

Using the MCP tools:

# Count total flows
count:
  source: flow

# Top talkers by bytes
query:
  source: flow
  sql: |
    SELECT saddr, SUM(sbytes + dbytes) as total_bytes
    FROM {source}
    GROUP BY saddr
    ORDER BY total_bytes DESC
    LIMIT 10

# Filter by protocol
query:
  source: flow
  filter: "proto = 'TCP'"
  limit: 50
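
The same query tool handles aggregations; as a sketch, here is a per-protocol flow count reusing the proto column from the filter example above:

# Flow count per protocol
query:
  source: flow
  sql: |
    SELECT proto, COUNT(*) AS flows
    FROM {source}
    GROUP BY proto
    ORDER BY flows DESC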

4. Upload to S3 (Optional)

Configure S3 upload in your probe config:

output:
  parquet_dir: /var/lib/rockfish/flows

s3:
  bucket: my-flow-data
  region: us-east-1
  hive_partitioning: true
  delete_after_upload: true

Files are automatically uploaded and organized by date:

s3://my-flow-data/year=2025/month=01/day=28/rockfish-*.parquet
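
To confirm the uploads, list the bucket with the AWS CLI (assuming it is installed and has credentials for my-flow-data):

# List uploaded Parquet files under the date partitions
aws s3 ls s3://my-flow-data/ --recursive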

Next Steps