Parquet Schema
Rockfish exports flow data in Apache Parquet format with IPFIX-compliant field naming. The schema varies by license tier.
Schema by Tier
| Tier | Schema Version | Fields | Key Features |
|---|---|---|---|
| Community | v1 | 44 | Basic flow fields |
| Basic | v1 | 54 | + nDPI detection, GeoIP (country, city, ASN) |
| Professional | v2 | 60 | + GeoIP AS org, nDPI fingerprints |
| Enterprise | v2 | 63+ | + Anomaly scores, ML predictions |
Community Schema (44 Fields)
Basic flow capture with core network fields.
| # | Field | Type | Description |
|---|---|---|---|
| 1 | version | UInt16 | Schema version (1) |
| 2 | flowid | String | Unique flow UUID |
| 3 | obname | String | Observation domain name |
| 4 | stime | Timestamp | Flow start time (UTC) |
| 5 | etime | Timestamp | Flow end time (UTC) |
| 6 | dur | UInt32 | Duration (milliseconds) |
| 7 | rtt | UInt32 | Round-trip time (microseconds) |
| 8 | pcr | Int32 | Producer-consumer ratio |
| 9 | proto | String | Protocol (TCP, UDP, ICMP) |
| 10 | saddr | String | Source IP address |
| 11 | daddr | String | Destination IP address |
| 12 | sport | UInt16 | Source port |
| 13 | dport | UInt16 | Destination port |
| 14 | iflags | String | Initial TCP flags |
| 15 | uflags | String | Union of all TCP flags |
| 16 | stcpseq | UInt32 | Source initial TCP sequence |
| 17 | dtcpseq | UInt32 | Dest initial TCP sequence |
| 18 | svlan | UInt16 | Source VLAN ID |
| 19 | dvlan | UInt16 | Destination VLAN ID |
| 20 | spkts | UInt64 | Source packet count |
| 21 | dpkts | UInt64 | Destination packet count |
| 22 | sbytes | UInt64 | Source byte count |
| 23 | dbytes | UInt64 | Destination byte count |
| 24 | sentropy | UInt8 | Source payload entropy (0-255) |
| 25 | dentropy | UInt8 | Destination payload entropy |
| 26 | ssmallpktcnt | UInt32 | Source small packets (<60 bytes) |
| 27 | dsmallpktcnt | UInt32 | Dest small packets |
| 28 | slargepktcnt | UInt32 | Source large packets (>225 bytes) |
| 29 | dlargepktcnt | UInt32 | Dest large packets |
| 30 | snonemptypktcnt | UInt32 | Source non-empty packets |
| 31 | dnonemptypktcnt | UInt32 | Dest non-empty packets |
| 32 | sfirstnonemptycnt | UInt16 | Source first N non-empty sizes |
| 33 | dfirstnonemptycnt | UInt16 | Dest first N non-empty sizes |
| 34 | smaxpktsize | UInt16 | Source max packet size |
| 35 | dmaxpktsize | UInt16 | Dest max packet size |
| 36 | savgpayload | UInt16 | Source avg payload size |
| 37 | davgpayload | UInt16 | Dest avg payload size |
| 38 | sstdevpayload | UInt16 | Source payload std deviation |
| 39 | dstdevpayload | UInt16 | Dest payload std deviation |
| 40 | spd | String | Small packet direction flags |
| 41 | spdt | String | Small packet direction timing |
| 42 | reason | String | Flow termination reason |
| 43 | smac | String | Source MAC address |
| 44 | dmac | String | Destination MAC address |
Basic Schema (54 Fields)
Community schema + nDPI application detection + GeoIP (country, city, ASN).
GeoIP fields:
| # | Field | Type | Description |
|---|---|---|---|
| 45 | scountry | String | Source country (ISO 3166-1 alpha-2) |
| 46 | dcountry | String | Destination country |
| 47 | scity | String | Source city |
| 48 | dcity | String | Destination city |
| 49 | sasn | UInt32 | Source ASN |
| 50 | dasn | UInt32 | Destination ASN |
nDPI fields:
| # | Field | Type | Description |
|---|---|---|---|
| 51 | ndpi_appid | String | nDPI application ID (e.g., “TLS.YouTube”) |
| 52 | ndpi_category | String | nDPI category (e.g., “Streaming”) |
| 53 | ndpi_risk_score | UInt32 | nDPI cumulative risk score |
| 54 | ndpi_risk_severity | UInt8 | Risk severity (0=none, 1=low, 2=medium, 3=high) |
Professional Schema (60 Fields)
Basic schema + GeoIP AS organization names and nDPI fingerprinting.
Additional GeoIP fields (AS organization):
| # | Field | Type | Description |
|---|---|---|---|
| 55 | sasnorg | String | Source ASN organization |
| 56 | dasnorg | String | Destination ASN organization |
nDPI fingerprint fields:
| # | Field | Type | Description |
|---|---|---|---|
| 57 | ndpi_ja4 | String | JA4 TLS client fingerprint (via nDPI) |
| 58 | ndpi_ja3s | String | JA3S TLS server fingerprint (via nDPI) |
| 59 | ndpi_tcp_fp | String | TCP fingerprint with OS hint (via nDPI) |
| 60 | ndpi_fp | String | nDPI composite fingerprint |
Enterprise Schema (63+ Fields)
Professional schema + anomaly detection and ML predictions.
Anomaly detection fields:
| # | Field | Type | Description |
|---|---|---|---|
| 61 | anomaly_score | Float32 | Anomaly score (0.0 - 1.0) |
| 62 | anomaly_severity | String | Severity (LOW, MEDIUM, HIGH, CRITICAL) |
| 63 | anomaly_factors | String | Contributing factors |
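Because each tier only adds columns to the one below it, the tier that produced a file can be guessed from its column names. The following is a minimal sketch, not part of Rockfish: the helper name `infer_tier` and the choice of one marker column per tier are assumptions based on the tables above.

```python
# Hypothetical helper: one marker column per tier, taken from the schema tables.
# Checked from the highest tier down so the first hit wins.
TIER_MARKERS = [
    ("Enterprise", "anomaly_score"),
    ("Professional", "sasnorg"),
    ("Basic", "ndpi_appid"),
]


def infer_tier(columns):
    """Return the tier implied by the columns present in a file."""
    present = set(columns)
    for tier, marker in TIER_MARKERS:
        if marker in present:
            return tier
    # No tier-specific column found: only the 44 core fields are present.
    return "Community"
```

For example, `infer_tier(["flowid", "ndpi_appid", "sasnorg"])` returns `"Professional"`.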
File Naming
| Tier | File Pattern |
|---|---|
| Community | rockfish-v1-YYYYMMDD-HHMMSS.parquet |
| Basic | rockfish-v1-YYYYMMDD-HHMMSS.parquet |
| Professional | rockfish-<observation>-v2-YYYYMMDD-HHMMSS.parquet |
| Enterprise | rockfish-<observation>-v2-YYYYMMDD-HHMMSS.parquet |
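The patterns above can be parsed with a single regular expression. This is a sketch under assumptions: the function name `parse_export_name` is hypothetical, the observation name is assumed to be any non-empty text before `-vN-`, and the timestamp is returned without a timezone because the naming scheme does not state one.

```python
import re
from datetime import datetime

# Optional "<observation>-" segment, then schema version, date, and time.
FILE_RE = re.compile(
    r"^rockfish-(?:(?P<observation>.+)-)?v(?P<version>\d+)-"
    r"(?P<date>\d{8})-(?P<time>\d{6})\.parquet$"
)


def parse_export_name(name):
    """Split an export file name into its parts, or return None if it does not match."""
    m = FILE_RE.match(name)
    if m is None:
        return None
    return {
        "observation": m["observation"],  # None for the v1 pattern
        "schema_version": int(m["version"]),
        "timestamp": datetime.strptime(m["date"] + m["time"], "%Y%m%d%H%M%S"),
    }
```

The sample name `rockfish-edge-01-v2-20250128-134500.parquet` (an invented observation name) would yield observation `edge-01` and schema version 2.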
S3 Path Structure
With Hive partitioning enabled:
s3://<bucket>/<prefix>/v1/year=YYYY/month=MM/day=DD/*.parquet
s3://<bucket>/<prefix>/v2/year=YYYY/month=MM/day=DD/*.parquet
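To target a single day, the partition path can be built from a date. A small sketch, assuming the layout shown above; `partition_glob` and its parameter names are illustrative, not a Rockfish API.

```python
from datetime import date


def partition_glob(bucket, prefix, schema_version, day):
    """Build the glob for one day's Hive partition, zero-padding month and day."""
    return (
        f"s3://{bucket}/{prefix}/v{schema_version}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/*.parquet"
    )
```

The resulting string can be passed to `read_parquet` in the queries under Example Queries.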
Field Descriptions
Flow Identification
- flowid: Unique UUID for deduplication and correlation
- obname: Observation domain name (sensor identifier)
Timing
- stime/etime: Timestamps with microsecond precision, UTC
- dur: Duration in milliseconds
- rtt: Estimated TCP round-trip time in microseconds
Network Addresses
- saddr/daddr: IPv4 or IPv6 addresses as strings
- sport/dport: Port numbers (0 for non-TCP/UDP)
- smac/dmac: MAC addresses in standard notation
Traffic Volumes
- spkts/dpkts: Packet counts per direction
- sbytes/dbytes: Byte counts per direction
- pcr: Producer-consumer ratio, (sent - recv) / (sent + recv)
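The ratio ranges from -1 (pure consumer) to +1 (pure producer). The sketch below computes the unscaled value. It leaves open two things the text does not specify: whether Rockfish uses byte or packet counts, and how the result is encoded in the Int32 `pcr` column. The zero-traffic fallback is also an assumption.

```python
def producer_consumer_ratio(sent, recv):
    """Return (sent - recv) / (sent + recv) as a float in [-1.0, 1.0]."""
    total = sent + recv
    if total == 0:
        # Assumption: a flow with no traffic in either direction is treated as balanced.
        return 0.0
    return (sent - recv) / total
```

With `sent=300` and `recv=100` the result is 0.5, meaning the source sent most of the traffic.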
TCP Flags
- iflags: Initial TCP flags (SYN, ACK, etc.)
- uflags: Union of all flags seen in flow
Payload Analysis
- sentropy/dentropy: Shannon entropy (0-255)
  - 230: Likely encrypted/compressed
  - ~140: English text
  - Low: Sparse or zero-padded
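The 0-255 range suggests byte-level Shannon entropy (0 to 8 bits per byte) rescaled to fit a single byte. The sketch below works under that assumption. The 255/8 scaling factor and the function name are guesses, not confirmed Rockfish behavior, although they agree with the reference values above (English text at roughly 4.4 bits per byte maps to about 140).

```python
import math
from collections import Counter


def scaled_entropy(payload: bytes) -> int:
    """Shannon entropy of the payload bytes, rescaled from 0-8 bits to 0-255."""
    if not payload:
        return 0
    n = len(payload)
    # Sum of -p * log2(p) over the observed byte values.
    bits = -sum((c / n) * math.log2(c / n) for c in Counter(payload).values())
    # Assumed scaling: 8 bits per byte maps to 255.
    return round(bits * 255 / 8)
```

A payload of identical bytes scores 0, and one containing every byte value equally often scores 255.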
Flow Termination
- reason: Why the flow ended
  - idle: Idle timeout
  - active: Active timeout
  - eof: End of capture
  - end: FIN exchange
  - rst: TCP reset
GeoIP (Basic+)
- scountry/dcountry: ISO 3166-1 alpha-2 codes
- scity/dcity: City names
- sasn/dasn: Autonomous System Numbers
- sasnorg/dasnorg: AS organization names (Professional+)
nDPI Detection (Basic+)
- ndpi_appid: Application identifier (e.g., “TLS.YouTube”)
- ndpi_category: Category (e.g., “Streaming”)
- ndpi_risk_score: Cumulative risk score
- ndpi_risk_severity: 0=none, 1=low, 2=medium, 3=high
nDPI Fingerprints (Professional+)
- ndpi_ja4: JA4 TLS client fingerprint
- ndpi_ja3s: JA3S TLS server fingerprint
- ndpi_tcp_fp: TCP fingerprint with OS detection hint (format: “fingerprint/os”)
- ndpi_fp: nDPI composite fingerprint for device correlation
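Since `ndpi_tcp_fp` packs two values into one string, it may help to split it before grouping by OS. This sketch assumes the first `/` separates the fingerprint from the OS hint and that the hint may be absent. The function name and the sample values are hypothetical.

```python
def split_tcp_fp(value):
    """Split a "fingerprint/os" string into (fingerprint, os_hint or None)."""
    # Assumption: the first "/" is the separator; the hint may be missing.
    fingerprint, _sep, os_hint = value.partition("/")
    return fingerprint, (os_hint or None)
```

For an invented value such as `"abc123/Linux"` this returns `("abc123", "Linux")`; a value without a slash returns `None` for the hint.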
Anomaly Detection (Enterprise)
- anomaly_score: 0.0-1.0 indicating how unusual the flow is
- anomaly_severity: Classification based on score percentile
- anomaly_factors: Fields contributing most to the score
Parquet File Metadata
Each file includes custom metadata:
| Key | Description |
|---|---|
| rockfish.license_id | License identifier |
| rockfish.tier | License tier |
| rockfish.company | Company name |
| rockfish.observation | Observation domain |
| rockfish.schema_version | Schema version |
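Parquet readers generally expose file-level key-value metadata as a mapping of bytes to bytes. With pyarrow, for instance, `pyarrow.parquet.read_metadata(path).metadata` returns such a mapping. The sketch below decodes only the `rockfish.*` entries from a mapping of that shape. The helper name is hypothetical, and the sample values in the usage note are invented, since the value formats are not documented here.

```python
PREFIX = "rockfish."


def rockfish_metadata(raw):
    """Extract rockfish.* entries from a key-value metadata mapping as plain strings."""
    out = {}
    for key, value in raw.items():
        # Keys and values usually arrive as bytes; accept str as well.
        k = key.decode() if isinstance(key, bytes) else key
        if k.startswith(PREFIX):
            v = value.decode() if isinstance(value, bytes) else value
            out[k[len(PREFIX):]] = v
    return out
```

Given `{b"rockfish.tier": b"professional", b"ARROW:schema": b"..."}`, the helper returns `{"tier": "professional"}` and ignores the unrelated key.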
Example Queries
DuckDB - Read from S3
SELECT * FROM read_parquet(
's3://bucket/v2/year=2025/month=01/day=28/*.parquet',
hive_partitioning=true
);
Count by Protocol
SELECT proto, COUNT(*) as count
FROM read_parquet('flows/*.parquet')
GROUP BY proto
ORDER BY count DESC;
Filter by Country (Basic+)
SELECT saddr, daddr, scountry, dcountry, ndpi_appid
FROM read_parquet('flows/*.parquet')
WHERE scountry = 'US' AND dcountry != 'US';
High-Risk Flows (Basic+)
SELECT stime, saddr, daddr, ndpi_appid, ndpi_risk_score
FROM read_parquet('flows/*.parquet')
WHERE ndpi_risk_score > 100
ORDER BY ndpi_risk_score DESC;
Anomalous Flows (Enterprise)
SELECT stime, saddr, daddr, anomaly_score, anomaly_severity
FROM read_parquet('flows/*.parquet')
WHERE anomaly_severity IN ('HIGH', 'CRITICAL')
ORDER BY anomaly_score DESC
LIMIT 100;