Rockfish Networks

Introduction

Network Flow Telemetry. Simple. Affordable. AI-Ready.

Rockfish Toolkit captures network flows and writes them directly to your S3 bucket in Apache Parquet format. That’s it. No intermediate databases, no proprietary formats, no vendor lock-in.

Your data. Your privacy. Your control.

Your data is immediately ready for analysis by DuckDB, Spark, Pandas, Python, R, or any tool that reads Parquet - which is virtually every modern data platform.

Simple: One binary. Capture traffic. Write to S3. Done.
Affordable: Enterprise-grade network visibility for less than the price of a grande latte per day.
AI-Ready: Structured, queryable data that ML pipelines and AI assistants can consume immediately.

A Bolt-On Toolkit for SOC AI Readiness

The question “Is your SOC AI-ready?” has become central to modern security operations. Industry consensus is clear: AI readiness starts with SOC Data Foundations - structured, queryable security data that AI systems can actually consume.

The challenge? Traditional security tools generate logs in proprietary formats, scattered across siloed systems. Ripping and replacing your entire security stack isn’t practical.

Rockfish Toolkit is different. Deploy alongside your existing infrastructure to create an AI-ready data layer:

  • No replacement required - Add Rockfish to your network without changing existing tools
  • Deploy in minutes - Single binary or Docker container, no complex dependencies
  • Immediate AI compatibility - Output flows directly to any ML pipeline, SIEM, or AI assistant
  • Open data format - Apache Parquet works with DuckDB, Spark, Pandas, and every major analytics platform
  • S3-native - Scalable, cost-effective cloud storage

Why Parquet for Network Data?

Rockfish Toolkit captures network flows and exports them as Apache Parquet files - the same columnar format used by data science platforms, ML pipelines, and modern SIEM architectures:

Benefit | Description
Columnar storage | Fast analytical queries on specific fields
Schema enforcement | Consistent, typed data for ML models
70-90% compression | Reduced storage costs vs. raw logs
Universal compatibility | Works with DuckDB, Spark, Pandas, and AI frameworks
S3-native | Scalable, cost-effective cloud storage

This architecture enables security teams to add AI capabilities without rebuilding their entire SOC.

Why S3 Changes Everything

S3—and object storage generally—fundamentally changes what’s possible in cybersecurity by decoupling data collection from data analysis.

Traditional architectures force a painful tradeoff: either store everything and pay for expensive hot storage, or age out logs and lose forensic depth. S3 eliminates this with virtually unlimited, cheap, durable storage that can hold years of netflow, DNS logs, endpoint telemetry, and packet captures in columnar formats like Parquet.

This unlocks data science at scale:

  • Train anomaly detection models on months of baseline behavior
  • Run retrospective threat hunts when new IOCs emerge
  • Feed AI-driven SOC tools with the volume of data they need to learn patterns rather than just match signatures
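
As an illustration of the retrospective hunt mentioned above, the following is a minimal sketch that checks stored flow records against a newly published list of IOC addresses. It assumes the records have already been loaded as dictionaries keyed by the `saddr` and `daddr` fields described later in this chapter; the function name `hunt_iocs` and the sample rows are hypothetical and not part of Rockfish.

```python
import ipaddress

def hunt_iocs(flows, ioc_addresses):
    """Return flows whose source or destination IP is in the IOC set."""
    # Parse addresses so equivalent spellings of the same IP compare equal.
    iocs = {ipaddress.ip_address(a) for a in ioc_addresses}
    hits = []
    for flow in flows:
        src = ipaddress.ip_address(flow["saddr"])
        dst = ipaddress.ip_address(flow["daddr"])
        if src in iocs or dst in iocs:
            hits.append(flow)
    return hits

# Hypothetical sample rows using documentation-reserved address ranges.
flows = [
    {"saddr": "192.0.2.10", "daddr": "198.51.100.7", "dport": 443},
    {"saddr": "192.0.2.11", "daddr": "203.0.113.5", "dport": 53},
]
matches = hunt_iocs(flows, ["203.0.113.5"])
```

Because the flows are retained in S3, the same check can be re-run over older data whenever a new indicator list arrives.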

You own your data:

The Hive-partitioned, schema-on-read model means you’re not locked into a SIEM vendor’s data model. Your data lives in open formats, queryable by any tool: Athena, Spark, DuckDB, Pandas, or a custom Rust binary polling for new files.
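
To make the Hive-partitioned layout concrete, here is a small sketch of how an object key in that style can be built from a timestamp. The partition keys (`year`, `month`, `day`), the `flows` prefix, and the file name are assumptions chosen for illustration; the layout Rockfish actually writes is not specified in this chapter.

```python
from datetime import datetime, timezone

def partition_key(prefix, flow_end, filename):
    """Build a Hive-style object key: key=value directories, then the file."""
    ts = flow_end.astimezone(timezone.utc)  # partition on UTC to avoid DST gaps
    return (
        f"{prefix}/year={ts.year:04d}/month={ts.month:02d}/"
        f"day={ts.day:02d}/{filename}"
    )

key = partition_key(
    "flows",  # hypothetical prefix
    datetime(2025, 1, 15, 13, 45, tzinfo=timezone.utc),
    "part-0001.parquet",  # hypothetical file name
)
print(key)  # flows/year=2025/month=01/day=15/part-0001.parquet
```

Query engines that understand this convention can skip whole directories when a filter names a partition value, which is what keeps scans over long retention periods affordable.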

When storage is cheap and permanent, detection becomes a software problem rather than a retention policy negotiation—and that shifts the advantage back to defenders.

What Rockfish Provides

Capability | Description
Network Flow Capture | High-performance packet capture with flow generation
Protocol Detection | Application-level protocol identification via nDPI
Device Fingerprinting | TLS/TCP fingerprints via nDPI for device identification
Threat Intelligence | IP reputation and risk scoring
Anomaly Detection | ML-based detection for enterprise deployments
MCP Integration | Query flows directly from AI assistants via Model Context Protocol

Use Cases

Rockfish Toolkit provides network visibility and AI-ready telemetry across diverse environments:

Environment | Use Case
Security Operations (SOC) | Threat detection, incident response, network forensics, AI-assisted investigation
IoT Networks | Device inventory, behavioral baselining, anomaly detection for connected devices
Industrial / Manufacturing | OT network monitoring, detecting unauthorized communications, compliance auditing
Robotics & Automation | Fleet communication analysis, identifying misconfigurations, performance monitoring
Healthcare | Medical device tracking, HIPAA compliance, detecting data exfiltration
SMB / Branch Offices | Affordable network visibility without enterprise SIEM costs
MSPs / MSSPs | Multi-tenant flow collection, centralized threat analysis across customers
Research & Education | Network traffic analysis, security research, ML model development

Components

Component | Description
rockfish_probe | Flow meter - captures packets and generates flow records
rockfish_mcp | MCP query server - SQL queries on Parquet files via DuckDB (Coming March 2025)
rockfish_detect | ML training and anomaly detection (Enterprise)
rockfish_intel | Threat intelligence caching server

Data Pipeline

Network Traffic
      |
      v
rockfish_probe  -->  Parquet Files  -->  S3
                           |
                           v
                    rockfish_mcp (DuckDB queries)
                           |
                           v
                    AI Assistants / SIEM / Analytics
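
The query stage of this pipeline can be sketched as a DuckDB SQL statement over the Parquet files. The helper below only builds the SQL text; it is an illustrative sketch, not the query that rockfish_mcp issues. The bucket name, prefix, and function name are hypothetical, the column names come from the Key Fields list in this chapter, and running the statement against S3 would additionally require DuckDB's httpfs extension and credentials.

```python
def top_talkers_sql(bucket, prefix, limit=10):
    """Build a DuckDB query ranking destinations by total bytes.

    bucket and prefix are interpolated into the SQL text, so they must
    come from trusted configuration, never from user input.
    """
    source = f"s3://{bucket}/{prefix}/**/*.parquet"
    return (
        "SELECT daddr, dport, SUM(sbytes + dbytes) AS total_bytes\n"
        f"FROM read_parquet('{source}', hive_partitioning = true)\n"
        "GROUP BY daddr, dport\n"
        "ORDER BY total_bytes DESC\n"
        f"LIMIT {int(limit)}"
    )

# Hypothetical bucket and prefix, for illustration only.
sql = top_talkers_sql("example-bucket", "flows", limit=5)
print(sql)
```

The resulting string can be passed to any DuckDB client; the same statement works against a local directory if the `s3://` path is swapped for a filesystem path.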

Parquet Schema by Tier

Rockfish outputs flow data in Apache Parquet format. The schema varies by license tier:

Tier | Fields | Key Data
Community | 44 | 5-tuple, timing, traffic volumes, TCP flags, payload entropy
Basic | 54 | + nDPI application detection, GeoIP (country, city, ASN)
Professional | 60 | + GeoIP AS org, nDPI fingerprints
Enterprise | 63+ | + Anomaly scores, severity classification

Key Fields

All tiers include:

  • saddr, daddr - Source/destination IP addresses
  • sport, dport - Source/destination ports
  • proto - Protocol (TCP, UDP, ICMP)
  • spkts, dpkts, sbytes, dbytes - Traffic volumes
  • dur, rtt - Duration and round-trip time
  • sentropy, dentropy - Payload entropy (encrypted traffic detection)
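
To show why payload entropy helps flag encrypted traffic, the sketch below computes Shannon entropy in bits per byte. This is an assumption about the general technique: this chapter does not state how `sentropy` and `dentropy` are calculated or scaled, and the function name is hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(payload: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not payload:
        return 0.0
    total = len(payload)
    counts = Counter(payload)  # occurrences of each byte value
    # Sum p * log2(1/p) over observed byte values.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

repetitive = shannon_entropy(b"aaaa")            # a single repeated byte
uniform = shannon_entropy(bytes(range(256)))     # every byte value once
print(repetitive, uniform)
```

Encrypted or compressed payloads tend to sit close to the 8-bit maximum, while plaintext protocols usually score noticeably lower, which is why entropy is a useful per-flow feature.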

Basic+ adds:

  • scountry, dcountry - Geographic country codes
  • scity, dcity - Geographic city names
  • sasn, dasn - Autonomous System Numbers
  • ndpi_appid - Application identifier (e.g., “TLS.YouTube”)
  • ndpi_risk_score - Risk scoring

Professional+ adds:

  • sasnorg, dasnorg - AS organization names
  • ndpi_ja4, ndpi_ja3s - TLS fingerprints for device identification
  • ndpi_tcp_fp - TCP fingerprint with OS detection hint
  • ndpi_fp - nDPI composite fingerprint

Enterprise adds:

  • anomaly_score - ML-derived anomaly score (0.0-1.0)
  • anomaly_severity - Classification (LOW, MEDIUM, HIGH, CRITICAL)
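
As a sketch of how the two Enterprise fields might be used together, the snippet below keeps flows at or above a chosen severity and orders them by score. It assumes the four labels are ranked LOW < MEDIUM < HIGH < CRITICAL as their names suggest; the function name and sample rows are hypothetical.

```python
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]  # assumed ranking

def triage(flows, minimum="HIGH"):
    """Return flows at or above `minimum` severity, highest score first."""
    floor = SEVERITY_ORDER.index(minimum)
    selected = [
        f for f in flows
        if SEVERITY_ORDER.index(f["anomaly_severity"]) >= floor
    ]
    return sorted(selected, key=lambda f: f["anomaly_score"], reverse=True)

# Hypothetical sample rows; scores and labels are made up for illustration.
sample = [
    {"daddr": "198.51.100.7", "anomaly_score": 0.12, "anomaly_severity": "LOW"},
    {"daddr": "203.0.113.5", "anomaly_score": 0.74, "anomaly_severity": "HIGH"},
    {"daddr": "203.0.113.9", "anomaly_score": 0.91, "anomaly_severity": "CRITICAL"},
]
queue = triage(sample, minimum="HIGH")
```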

See Parquet Schema for complete field reference.

License Tiers

Tier | Features
Community | Basic schema (44 fields), S3 upload
Basic | + nDPI labels, GeoIP (country, city, ASN), custom observation name (54 fields)
Professional | + GeoIP AS org, nDPI fingerprints (60 fields)
Enterprise | + ML models, anomaly detection
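
Because each tier adds columns to the one below it, a consumer can guess which tier produced a file by looking for a marker column. The sketch below uses one field per tier taken from the Key Fields list in this chapter; the choice of marker fields and the function name are assumptions, and the complete schema reference remains the authority.

```python
# One marker column per tier, checked from the highest tier down.
TIER_MARKERS = [
    ("Enterprise", "anomaly_score"),
    ("Professional", "ndpi_ja4"),
    ("Basic", "ndpi_appid"),
]

def infer_tier(columns):
    """Guess the license tier from the column names present in a file."""
    present = set(columns)
    for tier, marker in TIER_MARKERS:
        if marker in present:
            return tier
    return "Community"  # no tier-specific columns found

tier = infer_tier(["saddr", "daddr", "sport", "dport", "ndpi_appid", "scountry"])
print(tier)  # Basic
```

A check like this lets downstream code degrade gracefully, for example skipping fingerprint-based enrichment when the Professional columns are absent.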

See License Tiers for detailed comparison.

Getting Started

  1. Installation - Install from the download portal
  2. Quick Start - Capture your first flows
  3. Licensing - Activate your license

Support