Introduction
Network Flow Telemetry. Simple. Affordable. AI-Ready.
Rockfish Toolkit captures network flows and writes them directly to your S3 bucket in Apache Parquet format. That’s it. No intermediate databases, no proprietary formats, no vendor lock-in.
Your data. Your privacy. Your control.
Your data is immediately ready for analysis by DuckDB, Spark, Pandas, Python, R, or any tool that reads Parquet - which is virtually every modern data platform.
| Principle | Description |
|---|---|
| Simple | One binary. Capture traffic. Write to S3. Done. |
| Affordable | Enterprise-grade network visibility for less than the price of a grande latte per day. |
| AI-Ready | Structured, queryable data that ML pipelines and AI assistants can consume immediately. |
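As a minimal sketch of what "immediately ready for analysis" can look like, the snippet below builds a DuckDB query that ranks source addresses by total bytes. The column names saddr, sbytes, and dbytes come from the schema described later in this document; the file glob flows/*.parquet, the helper names, and the choice of the duckdb Python package are assumptions made for illustration and are not part of Rockfish.

```python
def top_talkers_sql(parquet_glob: str, limit: int = 10) -> str:
    """Build a DuckDB query ranking source IPs by total bytes transferred."""
    # parquet_glob is a hypothetical location; doubling single quotes keeps
    # the path a single SQL string literal.
    safe_glob = parquet_glob.replace("'", "''")
    return (
        "SELECT saddr, SUM(sbytes + dbytes) AS total_bytes "
        f"FROM read_parquet('{safe_glob}') "
        "GROUP BY saddr "
        "ORDER BY total_bytes DESC "
        f"LIMIT {int(limit)}"
    )


def run_top_talkers(parquet_glob: str, limit: int = 10):
    """Execute the query. Needs the duckdb package, imported lazily here."""
    import duckdb

    return duckdb.sql(top_talkers_sql(parquet_glob, limit)).fetchall()


if __name__ == "__main__":
    # Hypothetical local path; point it at wherever your Parquet files live.
    print(top_talkers_sql("flows/*.parquet", limit=5))
```

Reading straight from S3 with DuckDB additionally requires its httpfs extension and configured credentials; a local copy of the files works with the query unchanged.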
A Bolt-On Toolkit for SOC AI Readiness
The question “Is your SOC AI-ready?” has become central to modern security operations. Industry consensus is clear: AI readiness starts with SOC Data Foundations - structured, queryable security data that AI systems can actually consume.
The challenge? Traditional security tools generate logs in proprietary formats, scattered across siloed systems. Ripping and replacing your entire security stack isn’t practical.
Rockfish Toolkit is different. Deploy it alongside your existing infrastructure to create an AI-ready data layer:
- No replacement required - Add Rockfish to your network without changing existing tools
- Deploy in minutes - Single binary or Docker container, no complex dependencies
- Immediate AI compatibility - Output flows directly to any ML pipeline, SIEM, or AI assistant
- Open data format - Apache Parquet works with DuckDB, Spark, Pandas, and every major analytics platform
- S3-native - Scalable, cost-effective cloud storage
Why Parquet for Network Data?
Rockfish Toolkit captures network flows and exports them as Apache Parquet files - the same columnar format used by data science platforms, ML pipelines, and modern SIEM architectures:
| Benefit | Description |
|---|---|
| Columnar storage | Fast analytical queries on specific fields |
| Schema enforcement | Consistent, typed data for ML models |
| 70-90% compression | Reduced storage costs vs. raw logs |
| Universal compatibility | Works with DuckDB, Spark, Pandas, and AI frameworks |
| S3-native | Scalable, cost-effective cloud storage |
This architecture enables security teams to add AI capabilities without rebuilding their entire SOC.
Why S3 Changes Everything
S3—and object storage generally—fundamentally changes what’s possible in cybersecurity by decoupling data collection from data analysis.
Traditional architectures force a painful tradeoff: either store everything and pay for expensive hot storage, or age out logs and lose forensic depth. S3 eliminates this with virtually unlimited, cheap, durable storage that can hold years of netflow, DNS logs, endpoint telemetry, and packet captures in columnar formats like Parquet.
This unlocks data science at scale:
- Train anomaly detection models on months of baseline behavior
- Run retrospective threat hunts when new IOCs emerge
- Feed AI-driven SOC tools with the volume of data they need to learn patterns rather than just match signatures
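A retrospective hunt of the kind listed above might look like the following sketch. It assumes flow rows have already been loaded from Parquet into plain dictionaries; saddr and daddr are field names from the Rockfish schema, while hunt_iocs and the sample addresses (taken from the reserved documentation ranges) are hypothetical.

```python
import ipaddress
from typing import Iterable, Iterator


def hunt_iocs(flows: Iterable[dict], iocs: Iterable[str]) -> Iterator[dict]:
    """Yield flows whose source or destination address matches an IOC.

    `iocs` may mix single IPs ("203.0.113.7") and CIDR ranges
    ("198.51.100.0/24"); a single IP is treated as a one-address network.
    """
    networks = [ipaddress.ip_network(ioc, strict=False) for ioc in iocs]
    for flow in flows:
        src = ipaddress.ip_address(flow["saddr"])
        dst = ipaddress.ip_address(flow["daddr"])
        if any(src in net or dst in net for net in networks):
            yield flow


if __name__ == "__main__":
    # Stand-in for months of stored flows; values are illustrative only.
    historical_flows = [
        {"saddr": "10.0.0.5", "daddr": "203.0.113.7", "dport": 443},
        {"saddr": "10.0.0.8", "daddr": "192.0.2.10", "dport": 53},
        {"saddr": "198.51.100.23", "daddr": "10.0.0.5", "dport": 22},
    ]
    newly_published_iocs = ["203.0.113.7", "198.51.100.0/24"]
    for hit in hunt_iocs(historical_flows, newly_published_iocs):
        print(hit)
```

Because the history stays in storage, the same function can be rerun over older data whenever a new indicator list is published.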
You own your data:
The hive-partitioned, schema-on-read model means you’re not locked into a SIEM vendor’s data model. Your data lives in open formats, queryable by any tool—Athena, Spark, DuckDB, Pandas, or a custom Rust binary polling for new files.
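To illustrate what a Hive-partitioned layout means in practice, the sketch below builds and parses key=value path segments. The partition keys (year, month, day), the bucket name, and both helper functions are assumptions chosen for illustration; this document does not specify the layout Rockfish actually writes.

```python
from datetime import date


def partition_prefix(root: str, day: date) -> str:
    """Return a Hive-style prefix, e.g. root/year=2025/month=01/day=15/."""
    return (
        f"{root.rstrip('/')}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
    )


def parse_partitions(key: str) -> dict:
    """Extract key=value path segments from an object key."""
    parts = {}
    for segment in key.split("/"):
        name, sep, value = segment.partition("=")
        if sep and name:
            parts[name] = value
    return parts


if __name__ == "__main__":
    # Hypothetical bucket and file name.
    prefix = partition_prefix("s3://example-bucket/flows", date(2025, 1, 15))
    print(prefix)
    print(parse_partitions(prefix + "part-0.parquet"))
```

Query engines can use these path segments to skip irrelevant files. DuckDB, for example, exposes them as columns when read_parquet is called with hive_partitioning = true.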
When storage is cheap and permanent, detection becomes a software problem rather than a retention policy negotiation—and that shifts the advantage back to defenders.
What Rockfish Provides
| Capability | Description |
|---|---|
| Network Flow Capture | High-performance packet capture with flow generation |
| Protocol Detection | Application-level protocol identification via nDPI |
| Device Fingerprinting | TLS/TCP fingerprints via nDPI for device identification |
| Threat Intelligence | IP reputation and risk scoring |
| Anomaly Detection | ML-based detection for enterprise deployments |
| MCP Integration | Query flows directly from AI assistants via Model Context Protocol |
Use Cases
Rockfish Toolkit provides network visibility and AI-ready telemetry across diverse environments:
| Environment | Use Case |
|---|---|
| Security Operations (SOC) | Threat detection, incident response, network forensics, AI-assisted investigation |
| IoT Networks | Device inventory, behavioral baselining, anomaly detection for connected devices |
| Industrial / Manufacturing | OT network monitoring, detecting unauthorized communications, compliance auditing |
| Robotics & Automation | Fleet communication analysis, identifying misconfigurations, performance monitoring |
| Healthcare | Medical device tracking, HIPAA compliance, detecting data exfiltration |
| SMB / Branch Offices | Affordable network visibility without enterprise SIEM costs |
| MSPs / MSSPs | Multi-tenant flow collection, centralized threat analysis across customers |
| Research & Education | Network traffic analysis, security research, ML model development |
Components
| Component | Description |
|---|---|
| rockfish_probe | Flow meter - captures packets and generates flow records |
| rockfish_mcp | MCP query server - SQL queries on Parquet files via DuckDB (Coming March 2025) |
| rockfish_detect | ML training and anomaly detection (Enterprise) |
| rockfish_intel | Threat intelligence caching server |
Data Pipeline
Network Traffic --> rockfish_probe --> Parquet Files --> S3 --> rockfish_mcp (DuckDB queries) --> AI Assistants / SIEM / Analytics
Parquet Schema by Tier
Rockfish outputs flow data in Apache Parquet format. The schema varies by license tier:
| Tier | Fields | Key Data |
|---|---|---|
| Community | 44 | 5-tuple, timing, traffic volumes, TCP flags, payload entropy |
| Basic | 54 | + nDPI application detection, GeoIP (country, city, ASN) |
| Professional | 60 | + GeoIP AS org, nDPI fingerprints |
| Enterprise | 63+ | + Anomaly scores, severity classification |
Key Fields
All tiers include:
- saddr, daddr - Source/destination IP addresses
- sport, dport - Source/destination ports
- proto - Protocol (TCP, UDP, ICMP)
- spkts, dpkts, sbytes, dbytes - Traffic volumes
- dur, rtt - Duration and round-trip time
- sentropy, dentropy - Payload entropy (encrypted traffic detection)
Basic+ adds:
- scountry, dcountry - Geographic country codes
- scity, dcity - Geographic city names
- sasn, dasn - Autonomous System Numbers
- ndpi_appid - Application identifier (e.g., “TLS.YouTube”)
- ndpi_risk_score - Risk scoring
Professional+ adds:
- sasnorg, dasnorg - AS organization names
- ndpi_ja4, ndpi_ja3s - TLS fingerprints for device identification
- ndpi_tcp_fp - TCP fingerprint with OS detection hint
- ndpi_fp - nDPI composite fingerprint
Enterprise adds:
- anomaly_score - ML-derived anomaly score (0.0-1.0)
- anomaly_severity - Classification (LOW, MEDIUM, HIGH, CRITICAL)
See Parquet Schema for complete field reference.
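The link between payload entropy and encrypted traffic can be seen with a small calculation. The sketch below computes Shannon entropy in bits per byte; whether sentropy and dentropy use this exact formula or scale is not stated in this document, so treat the function as an assumption that illustrates the idea rather than the probe's implementation.

```python
import math
from collections import Counter


def shannon_entropy(payload: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not payload:
        return 0.0
    total = len(payload)
    counts = Counter(payload)
    # Sum p * log2(1/p) over the observed byte values.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    # Repetitive plaintext scores low; uniformly distributed bytes, which
    # encrypted or compressed payloads resemble, approach the 8.0 maximum.
    print(shannon_entropy(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"))
    print(shannon_entropy(bytes(range(256))))
```

A flow with entropy close to the maximum on a port that normally carries plaintext is one example of a signal these fields can support.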
License Tiers
| Tier | Features |
|---|---|
| Community | Basic schema (44 fields), S3 upload |
| Basic | + nDPI labels, GeoIP (country, city, ASN), custom observation name (54 fields) |
| Professional | + GeoIP AS org, nDPI fingerprints (60 fields) |
| Enterprise | + ML models, anomaly detection |
See License Tiers for detailed comparison.
Getting Started
- Installation - Install from download portal
- Quick Start - Capture your first flows
- Licensing - Activate your license
Support
- Email: [email protected]
- Download Portal: download.rockfishnetworks.com