Parquet Schema

Rockfish exports flow data in Apache Parquet format with IPFIX-compliant field naming. The schema varies by license tier.

Schema by Tier

| Tier | Schema Version | Fields | Key Features |
| --- | --- | --- | --- |
| Community | v1 | 44 | Basic flow fields |
| Basic | v1 | 54 | + nDPI detection, GeoIP (country, city, ASN) |
| Professional | v2 | 60 | + GeoIP AS org, nDPI fingerprints |
| Enterprise | v2 | 63+ | + Anomaly scores, ML predictions |

Community Schema (44 Fields)

Basic flow capture with core network fields.

| # | Field | Type | Description |
| --- | --- | --- | --- |
| 1 | version | UInt16 | Schema version (1) |
| 2 | flowid | String | Unique flow UUID |
| 3 | obname | String | Observation domain name |
| 4 | stime | Timestamp | Flow start time (UTC) |
| 5 | etime | Timestamp | Flow end time (UTC) |
| 6 | dur | UInt32 | Duration (milliseconds) |
| 7 | rtt | UInt32 | Round-trip time (microseconds) |
| 8 | pcr | Int32 | Producer-consumer ratio |
| 9 | proto | String | Protocol (TCP, UDP, ICMP) |
| 10 | saddr | String | Source IP address |
| 11 | daddr | String | Destination IP address |
| 12 | sport | UInt16 | Source port |
| 13 | dport | UInt16 | Destination port |
| 14 | iflags | String | Initial TCP flags |
| 15 | uflags | String | Union of all TCP flags |
| 16 | stcpseq | UInt32 | Source initial TCP sequence |
| 17 | dtcpseq | UInt32 | Dest initial TCP sequence |
| 18 | svlan | UInt16 | Source VLAN ID |
| 19 | dvlan | UInt16 | Destination VLAN ID |
| 20 | spkts | UInt64 | Source packet count |
| 21 | dpkts | UInt64 | Destination packet count |
| 22 | sbytes | UInt64 | Source byte count |
| 23 | dbytes | UInt64 | Destination byte count |
| 24 | sentropy | UInt8 | Source payload entropy (0-255) |
| 25 | dentropy | UInt8 | Destination payload entropy |
| 26 | ssmallpktcnt | UInt32 | Source small packets (<60 bytes) |
| 27 | dsmallpktcnt | UInt32 | Dest small packets |
| 28 | slargepktcnt | UInt32 | Source large packets (>225 bytes) |
| 29 | dlargepktcnt | UInt32 | Dest large packets |
| 30 | snonemptypktcnt | UInt32 | Source non-empty packets |
| 31 | dnonemptypktcnt | UInt32 | Dest non-empty packets |
| 32 | sfirstnonemptycnt | UInt16 | Source first N non-empty sizes |
| 33 | dfirstnonemptycnt | UInt16 | Dest first N non-empty sizes |
| 34 | smaxpktsize | UInt16 | Source max packet size |
| 35 | dmaxpktsize | UInt16 | Dest max packet size |
| 36 | savgpayload | UInt16 | Source avg payload size |
| 37 | davgpayload | UInt16 | Dest avg payload size |
| 38 | sstdevpayload | UInt16 | Source payload std deviation |
| 39 | dstdevpayload | UInt16 | Dest payload std deviation |
| 40 | spd | String | Small packet direction flags |
| 41 | spdt | String | Small packet direction timing |
| 42 | reason | String | Flow termination reason |
| 43 | smac | String | Source MAC address |
| 44 | dmac | String | Destination MAC address |

Basic Schema (54 Fields)

Community schema + nDPI application detection + GeoIP (country, city, ASN).

GeoIP fields:

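| # | Field | Type | Description |
| --- | --- | --- | --- |
| 45 | scountry | String | Source country (ISO 3166-1 alpha-2) |
| 46 | dcountry | String | Destination country |
| 47 | scity | String | Source city |
| 48 | dcity | String | Destination city |
| 49 | sasn | UInt32 | Source ASN |
| 50 | dasn | UInt32 | Destination ASN |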

nDPI fields:

| # | Field | Type | Description |
| --- | --- | --- | --- |
| 51 | ndpi_appid | String | nDPI application ID (e.g., “TLS.YouTube”) |
| 52 | ndpi_category | String | nDPI category (e.g., “Streaming”) |
| 53 | ndpi_risk_score | UInt32 | nDPI cumulative risk score |
| 54 | ndpi_risk_severity | UInt8 | Risk severity (0=none, 1=low, 2=medium, 3=high) |

Professional Schema (60 Fields)

Basic schema + GeoIP AS organization names and nDPI fingerprinting.

Additional GeoIP fields (AS organization):

| # | Field | Type | Description |
| --- | --- | --- | --- |
| 55 | sasnorg | String | Source ASN organization |
| 56 | dasnorg | String | Destination ASN organization |

nDPI fingerprint fields:

| # | Field | Type | Description |
| --- | --- | --- | --- |
| 57 | ndpi_ja4 | String | JA4 TLS client fingerprint (via nDPI) |
| 58 | ndpi_ja3s | String | JA3 TLS server fingerprint (via nDPI) |
| 59 | ndpi_tcp_fp | String | TCP fingerprint with OS hint (via nDPI) |
| 60 | ndpi_fp | String | nDPI composite fingerprint |

Enterprise Schema (63+ Fields)

Professional schema + anomaly detection and ML predictions.

Anomaly detection fields:

| # | Field | Type | Description |
| --- | --- | --- | --- |
| 61 | anomaly_score | Float32 | Anomaly score (0.0 - 1.0) |
| 62 | anomaly_severity | String | Severity (LOW, MEDIUM, HIGH, CRITICAL) |
| 63 | anomaly_factors | String | Contributing factors |
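
Because each tier adds columns on top of the previous one, the tier that produced a file can usually be guessed from its column names alone. The sketch below is one way to do that in Python. The function name `infer_tier` and the marker sets are illustrative, not part of Rockfish; the column names are the ones listed in the schema sections above.

```python
# Columns introduced by each tier, taken from the schema sections above.
# Checked from the highest tier down; the first tier with a match wins.
TIER_MARKERS = [
    ("Enterprise", {"anomaly_score", "anomaly_severity", "anomaly_factors"}),
    ("Professional", {"sasnorg", "dasnorg", "ndpi_ja4", "ndpi_ja3s",
                      "ndpi_tcp_fp", "ndpi_fp"}),
    ("Basic", {"scountry", "dcountry", "scity", "dcity", "sasn", "dasn",
               "ndpi_appid", "ndpi_category", "ndpi_risk_score",
               "ndpi_risk_severity"}),
]


def infer_tier(columns):
    """Guess the lowest tier that could have produced these columns."""
    present = set(columns)
    for tier, markers in TIER_MARKERS:
        if markers & present:
            return tier
    # No tier-specific column found: only the 44 core fields are present.
    return "Community"
```

A guess of this kind is only a fallback; the `rockfish.tier` metadata key, where available, states the tier directly.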

File Naming

| Tier | File Pattern |
| --- | --- |
| Community | rockfish-v1-YYYYMMDD-HHMMSS.parquet |
| Basic | rockfish-v1-YYYYMMDD-HHMMSS.parquet |
| Professional | rockfish-<observation>-v2-YYYYMMDD-HHMMSS.parquet |
| Enterprise | rockfish-<observation>-v2-YYYYMMDD-HHMMSS.parquet |
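
The file patterns above are regular enough to parse mechanically. The following Python sketch extracts the observation name, schema version, and timestamp from a file name. The helper `parse_rockfish_filename` is hypothetical, and treating the timestamp as UTC is an assumption: the naming table does not state a time zone.

```python
import re
from datetime import datetime, timezone

# Regex derived from the File Naming table. The <observation> part is
# optional because v1 file names do not include it.
FILENAME_RE = re.compile(
    r"^rockfish-(?:(?P<observation>.+)-)?v(?P<version>\d+)-"
    r"(?P<date>\d{8})-(?P<time>\d{6})\.parquet$"
)


def parse_rockfish_filename(name):
    """Return a dict of filename components, or None if the name does not match."""
    m = FILENAME_RE.match(name)
    if m is None:
        return None
    ts = datetime.strptime(m["date"] + m["time"], "%Y%m%d%H%M%S")
    return {
        "observation": m["observation"],         # None for v1 files
        "schema_version": int(m["version"]),
        "timestamp": ts.replace(tzinfo=timezone.utc),  # UTC is assumed
    }
```

For example, `parse_rockfish_filename("rockfish-edge-01-v2-20250128-235959.parquet")` yields the observation name `edge-01` (an invented sensor name) and schema version 2.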

S3 Path Structure

With Hive partitioning enabled:

s3://<bucket>/<prefix>/v1/year=YYYY/month=MM/day=DD/*.parquet
s3://<bucket>/<prefix>/v2/year=YYYY/month=MM/day=DD/*.parquet
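
When scripting against this layout, it can help to build the partition path for a given day rather than hand-typing it. A minimal Python sketch follows; the function name `hive_partition_glob` and the bucket and prefix values in the usage note are placeholders, not Rockfish settings.

```python
from datetime import date


def hive_partition_glob(bucket, prefix, schema_version, day):
    """Return the glob matching one day's Parquet files under Hive partitioning."""
    parts = [bucket]
    # The prefix is optional; skip it when empty to avoid a double slash.
    if prefix.strip("/"):
        parts.append(prefix.strip("/"))
    parts += [
        f"v{schema_version}",
        f"year={day:%Y}",
        f"month={day:%m}",   # zero-padded, matching month=MM
        f"day={day:%d}",     # zero-padded, matching day=DD
        "*.parquet",
    ]
    return "s3://" + "/".join(parts)
```

Calling `hive_partition_glob("my-bucket", "flows", 2, date(2025, 1, 28))` produces a string that can be passed to a Parquet reader that understands globs.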

Field Descriptions

Flow Identification

  • flowid: UUID that uniquely identifies the flow, used for deduplication and correlation
  • obname: Observation domain name (sensor identifier)

Timing

  • stime/etime: Timestamps with microsecond precision, UTC
  • dur: Duration in milliseconds
  • rtt: Estimated TCP round-trip time in microseconds

Network Addresses

  • saddr/daddr: IPv4 or IPv6 addresses as strings
  • sport/dport: Port numbers (0 for non-TCP/UDP)
  • smac/dmac: MAC addresses in standard notation

Traffic Volumes

  • spkts/dpkts: Packet counts per direction
  • sbytes/dbytes: Byte counts per direction
  • pcr: Producer-consumer ratio: (sent-recv)/(sent+recv)
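
The producer-consumer ratio formula can be written out as a short Python function. This is a sketch under two assumptions: that "sent" and "recv" correspond to the source and destination byte counts (`sbytes` and `dbytes`), and that an empty flow maps to 0. The schema stores `pcr` as Int32, and how the exporter scales the ratio into that integer is not described here, so the function returns the raw ratio.

```python
def producer_consumer_ratio(sent, recv):
    """Compute (sent - recv) / (sent + recv), in the range -1.0 to 1.0.

    sent and recv are assumed to be sbytes and dbytes for the flow.
    """
    total = sent + recv
    if total == 0:
        # No traffic in either direction; 0.0 is an assumed convention.
        return 0.0
    return (sent - recv) / total
```

A value near 1.0 means the source mostly sent data (a producer), and a value near -1.0 means it mostly received data (a consumer).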

TCP Flags

  • iflags: Initial TCP flags (SYN, ACK, etc.)
  • uflags: Union of all flags seen in flow

Payload Analysis

  • sentropy/dentropy: Shannon entropy (0-255)
    • > 230: Likely encrypted/compressed
    • ~140: English text
    • Low: Sparse or zero-padded
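
To build intuition for these values, the Python sketch below computes byte-level Shannon entropy and maps it onto 0-255. The linear scaling from 0-8 bits per byte to 0-255 is an assumption made for illustration; the exact scaling Rockfish uses is not stated here. The names `scaled_entropy` and `looks_encrypted` are hypothetical.

```python
import math
from collections import Counter


def scaled_entropy(payload):
    """Shannon entropy of a bytes object, scaled linearly from 0-8 bits to 0-255."""
    if not payload:
        return 0
    n = len(payload)
    # Entropy in bits per byte: -sum(p * log2(p)) over observed byte values.
    bits = -sum((c / n) * math.log2(c / n) for c in Counter(payload).values())
    return round(bits / 8 * 255)


def looks_encrypted(value, threshold=230):
    """Apply the 230 mark from the list above; the strict comparison is assumed."""
    return value > threshold
```

Under this scaling, a payload in which every byte value is equally common scores 255, and a payload consisting of a single repeated byte scores 0.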

Flow Termination

  • reason: Why the flow ended
    • idle: Idle timeout
    • active: Active timeout
    • eof: End of capture
    • end: FIN exchange
    • rst: TCP reset

GeoIP (Basic+; AS organization names Professional+)

  • scountry/dcountry: ISO 3166-1 alpha-2 codes
  • sasn/dasn: Autonomous System Numbers
  • sasnorg/dasnorg: AS organization names

nDPI Detection (Basic+)

  • ndpi_appid: Application identifier (e.g., “TLS.YouTube”)
  • ndpi_category: Category (e.g., “Streaming”)
  • ndpi_risk_score: Cumulative risk score
  • ndpi_risk_severity: 0=none, 1=low, 2=medium, 3=high

nDPI Fingerprints (Professional+)

  • ndpi_ja4: JA4 TLS client fingerprint
  • ndpi_ja3s: JA3 TLS server fingerprint
  • ndpi_tcp_fp: TCP fingerprint with OS detection hint (format: “fingerprint/os”)
  • ndpi_fp: nDPI composite fingerprint for device correlation
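
Since `ndpi_tcp_fp` combines two values in the “fingerprint/os” format, it is often convenient to split it before grouping or joining. A small Python sketch, assuming the OS hint follows the last slash and contains no slash itself; `split_tcp_fingerprint` and the sample values in the usage note are made up for illustration.

```python
def split_tcp_fingerprint(value):
    """Split a 'fingerprint/os' string into (fingerprint, os_hint).

    os_hint is None when no slash is present or nothing follows it.
    """
    fingerprint, sep, os_hint = value.rpartition("/")
    if not sep:
        # No slash at all: treat the whole value as the fingerprint.
        return value, None
    return fingerprint, os_hint or None
```

For instance, `split_tcp_fingerprint("abc123/Linux")` returns `("abc123", "Linux")`.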

Anomaly Detection (Enterprise)

  • anomaly_score: 0.0-1.0 indicating how unusual the flow is
  • anomaly_severity: Classification based on score percentile
  • anomaly_factors: Fields contributing most to the score

Parquet File Metadata

Each file includes custom metadata:

| Key | Description |
| --- | --- |
| rockfish.license_id | License identifier |
| rockfish.tier | License tier |
| rockfish.company | Company name |
| rockfish.observation | Observation domain |
| rockfish.schema_version | Schema version |

Example Queries

DuckDB - Read from S3

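-- Reading s3:// paths requires DuckDB's httpfs extension and S3 credentials.
-- hive_partitioning exposes year, month, and day as queryable columns.
SELECT * FROM read_parquet(
    's3://bucket/v2/year=2025/month=01/day=28/*.parquet',
    hive_partitioning=true
);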

Count by Protocol

SELECT proto, COUNT(*) as count
FROM read_parquet('flows/*.parquet')
GROUP BY proto
ORDER BY count DESC;

Filter by Country (Basic+)

SELECT saddr, daddr, scountry, dcountry, ndpi_appid
FROM read_parquet('flows/*.parquet')
WHERE scountry = 'US' AND dcountry != 'US';

High-Risk Flows (Basic+)

SELECT stime, saddr, daddr, ndpi_appid, ndpi_risk_score
FROM read_parquet('flows/*.parquet')
WHERE ndpi_risk_score > 100
ORDER BY ndpi_risk_score DESC;

Anomalous Flows (Enterprise)

SELECT stime, saddr, daddr, anomaly_score, anomaly_severity
FROM read_parquet('flows/*.parquet')
WHERE anomaly_severity IN ('HIGH', 'CRITICAL')
ORDER BY anomaly_score DESC
LIMIT 100;