dqflow¶
Lightweight, contract-first data quality engine for modern data pipelines.
What is dqflow?¶
dqflow lets you define explicit expectations for your data (schema, validity, freshness) and fail fast when data breaks — before bad data reaches downstream systems.
from dqflow import Contract, Column
contract = Contract(
name="orders",
columns={
"order_id": Column(str, not_null=True),
"amount": Column(float, min=0),
"currency": Column(str, allowed=["USD", "EUR"]),
},
)
result = contract.validate(df)
if not result.ok:
raise Exception(result.summary())
Why dqflow?¶
Data quality issues are inevitable — silent failures are not.
Most teams rely on ad-hoc checks, fragile assertions, or heavyweight frameworks that are hard to maintain. dqflow takes a different approach:
- Contracts over checks — expectations are explicit and versionable
- Pipeline-first — designed for ETL, ELT, and streaming workflows
- Lightweight & Pythonic — minimal API, easy to embed
- Fail fast — break pipelines intentionally, not silently
Features¶
- Contract-as-code (Python & YAML)
- Column-level validations (not null, min/max, allowed values, freshness)
- Table-level rules (row count, null rate, custom expressions)
- Structured validation results (JSON-friendly)
- CLI support
- Pandas engine (PySpark coming soon)