Column
The Column class defines expectations for a single column.
Class Definition
from dqflow import Column

column = Column(
    dtype: type | str,
    not_null: bool = False,
    min: float | None = None,
    max: float | None = None,
    allowed: Sequence[Any] | None = None,
    freshness_minutes: int | None = None,
    description: str = "",
    metadata: dict[str, Any] = {},
    custom: Callable[[Any], bool] | None = None,
)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `dtype` | `type \| str` | required | Expected data type |
| `not_null` | `bool` | `False` | Reject null values |
| `min` | `float \| None` | `None` | Minimum value (numeric) |
| `max` | `float \| None` | `None` | Maximum value (numeric) |
| `allowed` | `Sequence[Any] \| None` | `None` | Allowed values |
| `freshness_minutes` | `int \| None` | `None` | Max age for timestamps |
| `description` | `str` | `""` | Human-readable description |
| `metadata` | `dict[str, Any]` | `{}` | Custom metadata |
| `custom` | `Callable[[Any], bool] \| None` | `None` | Custom validation function |
Supported Types
| Type | Description |
|---|---|
| `str` | String/text |
| `int` | Integer |
| `float` | Floating point |
| `bool` | Boolean |
| `"timestamp"` | Datetime |
Examples
Basic Column
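A basic column sets only `dtype`, for example `Column(str)` or `Column("timestamp")`. This page does not describe how dqflow compares values against `dtype`, so the sketch below is only one plausible reading. The helper `matches_dtype` and the `NAMED_TYPES` mapping are hypothetical, and it is an assumption that `"timestamp"` corresponds to `datetime.datetime`.

```python
from datetime import datetime

# Hypothetical mapping from string dtype names to Python types
# (an assumption for illustration, not dqflow's implementation).
NAMED_TYPES = {"timestamp": datetime}


def matches_dtype(value, dtype):
    """Return True if value is an instance of the expected dtype."""
    expected = NAMED_TYPES[dtype] if isinstance(dtype, str) else dtype
    # bool is a subclass of int in Python, so reject booleans
    # unless bool itself was requested.
    if isinstance(value, bool) and expected is not bool:
        return False
    return isinstance(value, expected)


print(matches_dtype("abc", str))
print(matches_dtype(True, int))
print(matches_dtype(datetime(2024, 1, 1), "timestamp"))
```

The explicit `bool` guard is a design choice worth noting: without it, `True` and `False` would pass as integers.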
Not Null
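`Column(str, not_null=True)` declares a column in which every value must be present. Per the Validation Behavior table, the check fails if any value is null or NaN. The plain-Python sketch below illustrates that rule; `has_nulls` is a hypothetical helper, not part of dqflow.

```python
import math


def has_nulls(values):
    """Return True if any value is None or a floating-point NaN."""
    return any(
        v is None or (isinstance(v, float) and math.isnan(v))
        for v in values
    )


print(has_nulls(["a", "b"]))           # False
print(has_nulls(["a", None]))          # True
print(has_nulls([1.0, float("nan")]))  # True
```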
Numeric Bounds
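A bounded column looks like `Column(float, min=0, max=100000)`. The Validation Behavior table says a check fails for values below `min` or above `max`, which implies that both bounds are inclusive. The sketch below illustrates that reading; `out_of_bounds` is a hypothetical helper, not a dqflow function.

```python
def out_of_bounds(values, minimum=None, maximum=None):
    """Return the values that fall outside the inclusive [minimum, maximum] range."""
    bad = []
    for v in values:
        if minimum is not None and v < minimum:
            bad.append(v)
        elif maximum is not None and v > maximum:
            bad.append(v)
    return bad


# The boundary values 0 and 100000 are accepted; only -1 and 100000.01 are flagged.
print(out_of_bounds([0, 50.5, 100000, -1, 100000.01], minimum=0, maximum=100000))
# [-1, 100000.01]
```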
Allowed Values
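An enumerated column looks like `Column(str, allowed=["USD", "EUR", "GBP"])`. The sketch below shows the membership rule in plain Python. `disallowed` is a hypothetical helper, and it assumes plain equality (so matching is case-sensitive) and hashable, sortable values; this page does not state how dqflow compares values.

```python
def disallowed(values, allowed):
    """Return the distinct values that are not in the allowed collection, sorted."""
    allowed_set = set(allowed)
    return sorted({v for v in values if v not in allowed_set})


print(disallowed(["USD", "EUR", "usd", "JPY", "JPY"], ["USD", "EUR", "GBP"]))
# ['JPY', 'usd']
```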
Freshness Check
Column("timestamp", freshness_minutes=60) # Within 1 hour
Column("timestamp", freshness_minutes=1440) # Within 24 hours
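According to the Validation Behavior table, the freshness check fails when the most recent timestamp is more than the given number of minutes old. The sketch below illustrates that rule with the standard library. `is_fresh` is a hypothetical helper, and timezone-aware UTC timestamps are an assumption; this page does not say how dqflow handles time zones.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(timestamps, freshness_minutes, now=None):
    """Return True if the newest timestamp is at most freshness_minutes old."""
    now = now if now is not None else datetime.now(timezone.utc)
    age = now - max(timestamps)
    return age <= timedelta(minutes=freshness_minutes)


# Fixed reference time so the example is reproducible.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
stamps = [now - timedelta(minutes=90), now - timedelta(minutes=45)]

print(is_fresh(stamps, 60, now=now))  # True: the newest value is 45 minutes old
print(is_fresh(stamps, 30, now=now))  # False
```

Only the newest value matters under this rule, so a column containing old rows still passes as long as one recent row exists.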
With Metadata
Column(
    dtype=str,
    not_null=True,
    description="Unique customer identifier",
    metadata={"pii": True, "source": "crm"},
)
Custom Validation
Define custom validation logic with a function that takes a value and returns True if valid:
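def is_email(value: str) -> bool:
    """Loosely check that value looks like an email address."""
    text = str(value)
    # Deliberately simple: requires only an "@" and a "." somewhere in the text
    return "@" in text and "." in text

Column(str, custom=is_email)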
Using lambda functions:
Column(int, custom=lambda x: x > 0) # Positive numbers only
Column(int, custom=lambda x: x % 2 == 0) # Even numbers only
Column(float, custom=lambda x: 0 <= x <= 1) # Between 0 and 1
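This page does not say whether null values are passed to the `custom` function. If they are, a lambda such as `lambda x: x > 0` would raise `TypeError` on `None`. One defensive option, sketched here with a hypothetical `skip_nulls` wrapper, is to treat nulls as valid inside the custom check and leave their rejection to `not_null`.

```python
def skip_nulls(check):
    """Wrap a predicate so that None is treated as valid instead of raising."""
    def wrapped(value):
        return value is None or check(value)
    return wrapped


is_positive = skip_nulls(lambda x: x > 0)

print(is_positive(5))     # True
print(is_positive(-1))    # False
print(is_positive(None))  # True
```

The wrapped predicate can then be passed in the same way as any other function, for example `Column(int, custom=skip_nulls(lambda x: x > 0))`.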
More complex validation:
import re

def is_valid_phone(value: str) -> bool:
    """Check if value is a valid phone number."""
    # Optional leading "+", optional country code 1, then 9 to 15 digits
    return bool(re.match(r'^\+?1?\d{9,15}$', str(value)))

Column(str, custom=is_valid_phone)
Combining with other constraints:
Column(
    dtype=float,
    not_null=True,
    min=0,
    max=100,
    custom=lambda x: x % 5 == 0,  # Must be divisible by 5
    description="Score (0-100, increments of 5)",
)
Full Example
from dqflow import Column

columns = {
    "order_id": Column(str, not_null=True),
    "customer_id": Column(str, not_null=True),
    "amount": Column(float, min=0, max=100000),
    "currency": Column(str, allowed=["USD", "EUR", "GBP"]),
    "status": Column(str, allowed=["pending", "shipped", "delivered"]),
    "created_at": Column("timestamp", freshness_minutes=1440),
}
Validation Behavior
| Constraint | Check |
|---|---|
| `not_null=True` | Fails if any value is null or NaN |
| `min=X` | Fails if any value < X |
| `max=X` | Fails if any value > X |
| `allowed=[...]` | Fails if any value is not in the list |
| `freshness_minutes=X` | Fails if the most recent timestamp is more than X minutes old |
| `custom=func` | Fails if `func(value)` returns `False` for any value |