Data Loading

Module: equser.data
Dependencies: base (numpy, pyarrow)

Load CPOW and PMon Parquet files with automatic scaling and timestamp parsing.

CPOW data

load_cpow_scaled(file_path) -> dict

Load a CPOW Parquet file and return scaled voltage/current arrays.

Handles both formats automatically:

  • v3 int32 (current): raw ADC counts scaled by vscale/iscale from Parquet schema metadata. Topology and neutral-connection status are read from metadata to indicate which channels carry live data.
  • float (legacy): values already in V/A; scaling factors are 1.0.

Returns a dict with:

| Key | Type | Description |
| --- | --- | --- |
| table | pa.Table | Raw PyArrow Table |
| VA, VB, VC | np.ndarray | Scaled voltage arrays (float64) |
| IA, IB, IC, IN | np.ndarray | Scaled current arrays (float64) |
| vscale | float | Voltage scaling factor applied |
| iscale | float | Current scaling factor applied |
| start_time | datetime or None | Parsed from metadata |
| sample_rate | int | Always 32000 |
| schema_version | int or None | Schema version (3 for current files) |
| topology | str or None | "three_phase", "split_phase", or "single_phase" |
| neutral_connected | bool or None | True if IN channel carries live current data |
| cycle_start_a | np.ndarray (int64) | Phase A cycle boundaries (ns epoch); present only if file has this column |
| cycle_start_b | np.ndarray (int64) | Phase B cycle boundaries (ns epoch); present only if file has this column |
| cycle_start_c | np.ndarray (int64) | Phase C cycle boundaries (ns epoch); present only if file has this column |

topology indicates which voltage channels carry real waveform data:

| Value | Active channels |
| --- | --- |
| "three_phase" | VA, VB, VC all live |
| "split_phase" | VA live; VB = −VA (reconstructed); VC zero-filled |
| "single_phase" | VA live; VB and VC zero-filled |

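As a rough illustration of the topology values, the helper below maps each one to the voltage channels that carry measured data. The names `LIVE_VOLTAGE_CHANNELS` and `live_voltage_channels` are hypothetical and not part of equser; only the mapping itself comes from the documentation.

```python
# Hypothetical helper, not part of equser: maps the documented topology
# values to the voltage channels that carry measured waveform data.
LIVE_VOLTAGE_CHANNELS = {
    "three_phase": ["VA", "VB", "VC"],
    "split_phase": ["VA"],   # VB is reconstructed as -VA, VC is zero-filled
    "single_phase": ["VA"],  # VB and VC are zero-filled
}


def live_voltage_channels(topology):
    """Return the measured voltage channels, or None if topology is None."""
    if topology is None:
        return None  # the topology key is documented as "str or None"
    return LIVE_VOLTAGE_CHANNELS[topology]


print(live_voltage_channels("split_phase"))  # ['VA']
```

A check like this can help avoid computing statistics on zero-filled or reconstructed channels.
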
cycle_start_a/b/c arrays are the same length as the waveform arrays. A non-zero value at index i means sample i is a phase-locked cycle boundary; zero means no boundary. Non-zero values are nanoseconds since epoch — the same time base as the ts column.

from equser.data import load_cpow_scaled
import numpy as np

result = load_cpow_scaled('20250623_075056.parquet')
print(f"Peak voltage A: {result['VA'].max():.1f} V")
print(f"Topology: {result['topology']}")
print(f"Neutral connected: {result['neutral_connected']}")

# Find cycle start indices for phase A (v3 files only)
if 'cycle_start_a' in result:
    cycle_idx = np.nonzero(result['cycle_start_a'])[0]
    print(f"Phase A cycle boundaries: {len(cycle_idx)}")
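
Because the non-zero entries are timestamps, the difference between consecutive boundaries gives the period of each cycle. The sketch below estimates per-cycle frequency this way on synthetic data; `cycle_frequencies` is a hypothetical helper, and the timestamp values are made up for illustration.

```python
import numpy as np


def cycle_frequencies(cycle_start):
    """Per-cycle frequency in Hz from a cycle_start_* style array.

    Hypothetical helper: non-zero entries are taken as nanosecond epoch
    timestamps of cycle boundaries, zeros as "no boundary".
    """
    boundaries_ns = cycle_start[cycle_start != 0]
    periods_ns = np.diff(boundaries_ns)
    return 1e9 / periods_ns


# Synthetic 50 Hz example: a boundary every 640 samples at 32 kHz (20 ms apart).
cycle_start = np.zeros(2000, dtype=np.int64)
t0_ns = 1_750_665_056_000_000_000  # arbitrary epoch timestamp in nanoseconds
for k, idx in enumerate((0, 640, 1280, 1920)):
    cycle_start[idx] = t0_ns + k * 20_000_000

print(cycle_frequencies(cycle_start))  # [50. 50. 50.]
```

With fewer than two boundaries the function returns an empty array, so callers may want to check the length before averaging.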

load_cpow(file_path) -> pa.Table

Load a CPOW Parquet file as a raw PyArrow Table with no scaling applied. Use this when you need the raw integer ADC values or want to handle scaling yourself. v3 files include optional cycle_start_a/b/c columns as nullable INT64.
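
If you handle scaling yourself, the conversion is presumably a multiplication of the raw counts by the relevant scale factor. The sketch below shows that step on synthetic counts. `scale_counts` is a hypothetical helper, the factor value is invented, and how the factor is stored in the file metadata is not specified here.

```python
import numpy as np


def scale_counts(raw_counts, scale):
    """Convert raw integer ADC counts to physical units (assumed: counts * scale)."""
    # Cast to float64 so the result matches the float64 arrays that
    # load_cpow_scaled is documented to return.
    return np.asarray(raw_counts, dtype=np.float64) * scale


# Illustrative values only; a real factor would come from the file's metadata.
raw_va = np.array([-12000, 0, 12000], dtype=np.int32)
vscale = 0.01  # hypothetical volts per count
print(scale_counts(raw_va, vscale))
```

With a table from load_cpow, the raw counts could likely be pulled out with `table.column(...).to_numpy()`, as in the PMon example, assuming the column names match the channel names.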

Constants

| Constant | Value | Description |
| --- | --- | --- |
| SAMPLE_RATE_HZ | 32000 | CPOW sample rate |
| CHANNELS | ['VA', 'VB', 'VC', 'IA', 'IB', 'IC', 'IN'] | Channel names |
| CYCLE_START_CHANNELS | ['cycle_start_a', 'cycle_start_b', 'cycle_start_c'] | Optional v3 cycle-boundary columns |
| NEUTRAL_CT_RATIO | 30 | Neutral CT sensitivity ratio vs. phase CTs |
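
The fixed sample rate makes it straightforward to convert between sample indices and time. The sketch below is illustrative only: `samples_per_cycle` and `sample_time` are hypothetical helpers, and it assumes uniform sampling with `start_time` corresponding to sample 0.

```python
from datetime import datetime, timedelta

SAMPLE_RATE_HZ = 32000  # value from the constants table


def samples_per_cycle(nominal_hz):
    """Number of samples in one nominal line cycle."""
    return SAMPLE_RATE_HZ / nominal_hz


def sample_time(start_time, index):
    """Approximate time of sample `index`, assuming start_time is sample 0."""
    return start_time + timedelta(seconds=index / SAMPLE_RATE_HZ)


print(samples_per_cycle(50))  # 640.0
start = datetime(2025, 6, 23, 7, 50, 56)
print(sample_time(start, 16000))  # 2025-06-23 07:50:56.500000
```

Note that a 60 Hz cycle does not span a whole number of samples (about 533.3), which is one reason the cycle_start columns are useful for locating exact boundaries.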

PMon data

load_pmon(file_path) -> pa.Table

Load a PMon Parquet file as a PyArrow Table.

PMon files contain 10/12-cycle RMS measurements: 10 cycles on 50 Hz grids and 12 cycles on 60 Hz grids, so each measurement nominally spans 200 ms. Common columns include:

| Column | Description |
| --- | --- |
| time_us | Timestamp in microseconds |
| FREQ | Line frequency (Hz) |
| AVRMS, BVRMS, CVRMS | Phase RMS voltage |
| AIRMS, BIRMS, CIRMS | Phase RMS current |
| NIRMS | Neutral RMS current |
| AWATT, BWATT, CWATT | Phase active power |

from equser.data import load_pmon

table = load_pmon('20250623_0750.parquet')
freq = table.column('FREQ').to_numpy()
print(f"Mean frequency: {freq.mean():.3f} Hz")
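
The time_us column can be turned into datetime objects for plotting or alignment with CPOW data. The sketch below assumes the values are microseconds since the Unix epoch in UTC, which the column description does not state explicitly; `time_us_to_datetime` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_us_to_datetime(time_us):
    """Convert microseconds since the Unix epoch (assumed) to an aware UTC datetime."""
    # timedelta keeps integer microseconds exact, unlike dividing by 1e6.
    return UNIX_EPOCH + timedelta(microseconds=int(time_us))


print(time_us_to_datetime(1_750_665_000_000_000))  # 2025-06-23 07:50:00+00:00
```

The `int()` call lets the function accept NumPy integer scalars as well as plain Python integers.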

Timestamp parsing

parse_start_time(s) -> datetime

Parse an ISO 8601 timestamp string from CPOW metadata. Handles nanosecond precision by truncating to microseconds (Python datetime limit).

from equser.data import parse_start_time

dt = parse_start_time("2025-06-23T07:50:56.123456789Z")
print(dt)  # 2025-06-23 07:50:56.123456

Note: The returned datetime is naive (no timezone info). The Z suffix is stripped and UTC is assumed.
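
For readers curious how this kind of truncation can be done, here is a minimal standard-library sketch. `parse_iso_ns` is an illustrative stand-in, not the equser implementation, and it only handles the plain `YYYY-MM-DDTHH:MM:SS[.fraction][Z]` shape.

```python
import re
from datetime import datetime


def parse_iso_ns(s):
    """Parse an ISO 8601 string, truncating fractional seconds to microseconds.

    Illustrative stand-in, not the equser implementation.
    """
    s = s.rstrip("Z")  # drop the UTC designator; the result stays naive
    # Keep at most 6 fractional digits, discarding any that follow.
    s = re.sub(r"(\.\d{6})\d+$", r"\1", s)
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in s else "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(s, fmt)


print(parse_iso_ns("2025-06-23T07:50:56.123456789Z"))  # 2025-06-23 07:50:56.123456
```

The extra digits are dropped rather than rounded, which matches the truncation behavior described above.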

parse_filename_timestamp(filename) -> datetime | None

Extract the timestamp embedded in an EQ data filename.

Supports:

  • YYYYMMDD_HHMM (PMon files)
  • YYYYMMDD_HHMMSS (CPOW files)

from equser.data import parse_filename_timestamp

dt = parse_filename_timestamp("20250623_075056.parquet")
print(dt)  # 2025-06-23 07:50:56
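
The two filename patterns differ only in the optional seconds field, so a single regular expression can cover both. The sketch below is an illustrative stand-in, not the equser implementation; `filename_timestamp` is a hypothetical name, and returning None for non-matching names is an assumption based on the `datetime | None` return type.

```python
import re
from datetime import datetime

# Date, underscore, HHMM, and optional SS.
_STAMP = re.compile(r"(\d{8})_(\d{4})(\d{2})?")


def filename_timestamp(filename):
    """Return the timestamp embedded in a filename, or None if none is found.

    Illustrative stand-in, not the equser implementation.
    """
    m = _STAMP.search(filename)
    if m is None:
        return None
    date, hhmm, ss = m.groups()
    try:
        return datetime.strptime(date + hhmm + (ss or "00"), "%Y%m%d%H%M%S")
    except ValueError:
        return None  # digits matched but do not form a valid date/time


print(filename_timestamp("20250623_0750.parquet"))    # 2025-06-23 07:50:00
print(filename_timestamp("20250623_075056.parquet"))  # 2025-06-23 07:50:56
print(filename_timestamp("notes.txt"))                # None
```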


© 2026 EQ Systems Inc. · MIT License
Last updated: May 2, 2026