dataset_validator
TemporalScope/src/temporalscope/datasets/dataset_validator.py.
This module provides backend-agnostic dataset validation utilities based on research-backed heuristics. Using Narwhals operations, it enables consistent validation across different DataFrame backends while supporting domain-specific requirements through customizable thresholds.
| CLASS | DESCRIPTION |
|---|---|
| `DatasetValidator` | A validator for ensuring dataset quality using research-backed heuristics. |
| `ValidationResult` | Container for dataset validation results. |
DatasetValidator
DatasetValidator(
time_col: str,
target_col: str,
min_samples: int = 3000,
max_samples: int = 50000,
min_features: int = 4,
max_features: int = 500,
max_feature_ratio: float = 0.1,
min_unique_values: int = 10,
max_categorical_values: int = 20,
class_imbalance_threshold: float = 1.5,
checks_to_run: Optional[List[str]] = None,
enable_warnings: bool = True,
)
A validator for ensuring dataset quality using research-backed heuristics.
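The constructor thresholds act as numeric bounds on properties of the dataset. The pure-Python sketch below shows how three of them might be applied. The helper names are hypothetical, and the exact rules are assumptions, not TemporalScope's implementation: `max_feature_ratio` is treated as features divided by rows, and class imbalance is measured as the ratio of the largest class count to the smallest.

```python
from collections import Counter
from typing import List

# Hypothetical constants mirroring the DatasetValidator constructor defaults.
MIN_SAMPLES = 3000
MAX_SAMPLES = 50000
MAX_FEATURE_RATIO = 0.1
CLASS_IMBALANCE_THRESHOLD = 1.5


def check_sample_size(n_rows: int) -> bool:
    """Assumed rule: the row count must lie within [MIN_SAMPLES, MAX_SAMPLES]."""
    return MIN_SAMPLES <= n_rows <= MAX_SAMPLES


def check_feature_ratio(n_features: int, n_rows: int) -> bool:
    """Assumed rule: features per row must not exceed MAX_FEATURE_RATIO."""
    return n_rows > 0 and n_features / n_rows <= MAX_FEATURE_RATIO


def check_class_balance(labels: List[str]) -> bool:
    """Assumed rule: the majority/minority class count ratio stays within the threshold."""
    counts = Counter(labels)
    if len(counts) < 2:
        return True  # assumption: a single class has nothing to compare against
    return max(counts.values()) / min(counts.values()) <= CLASS_IMBALANCE_THRESHOLD
```

For example, 600 features over 5,000 rows gives a ratio of 0.12, which this sketch would reject against the 0.1 default.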
| METHOD | DESCRIPTION |
|---|---|
| `fit` | Validate input DataFrame and prepare for validation checks. |
| `fit_transform` | Fit the validator and run validation checks in one step. |
| `print_report` | Print validation results in a tabular format. |
| `transform` | Run configured validation checks on the DataFrame. |
Source code in src/temporalscope/datasets/dataset_validator.py (lines 87-122)
AVAILABLE_CHECKS
AVAILABLE_CHECKS = {
"sample_size",
"feature_count",
"feature_ratio",
"feature_variability",
"categorical_cardinality",
"class_balance",
"binary_features",
}
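The `checks_to_run` constructor argument presumably selects a subset of these names. The hypothetical helper `resolve_checks` below (not part of TemporalScope) illustrates one plausible policy: `None` selects every check, and unknown names raise `ValueError`. The source does not say whether the library raises, warns, or silently ignores unknown names.

```python
from typing import List, Optional, Set

# Mirrors the AVAILABLE_CHECKS set documented for DatasetValidator.
AVAILABLE_CHECKS = {
    "sample_size",
    "feature_count",
    "feature_ratio",
    "feature_variability",
    "categorical_cardinality",
    "class_balance",
    "binary_features",
}


def resolve_checks(checks_to_run: Optional[List[str]] = None) -> Set[str]:
    """Hypothetical helper: None selects every check; unknown names are rejected."""
    if checks_to_run is None:
        return set(AVAILABLE_CHECKS)
    unknown = set(checks_to_run) - AVAILABLE_CHECKS
    if unknown:
        raise ValueError(f"Unknown checks: {sorted(unknown)}")
    return set(checks_to_run)
```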
fit
fit(df: Union[Any, FrameT]) -> DatasetValidator
Validate input DataFrame and prepare for validation checks.
Source code in src/temporalscope/datasets/dataset_validator.py (lines 368-401)
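This reference does not list the specific checks `fit` performs on its input. Because the constructor takes `time_col` and `target_col`, one plausible minimum is confirming that both columns exist. The hypothetical helper below sketches that check. It uses a plain sequence of column names in place of a Narwhals frame; in a Narwhals-based implementation the names would typically come from the frame's `columns` attribute.

```python
from typing import Sequence


def check_required_columns(columns: Sequence[str], time_col: str, target_col: str) -> None:
    """Hypothetical pre-check: raise if the configured time or target column is absent."""
    missing = [col for col in (time_col, target_col) if col not in columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
```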
fit_transform
fit_transform(
df: Union[Any, FrameT], target_col: Optional[str] = None
) -> Dict[str, ValidationResult]
Fit the validator and run validation checks in one step.
Source code in src/temporalscope/datasets/dataset_validator.py (lines 433-435)
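Since `fit` is documented as returning the `DatasetValidator` itself, `fit_transform` can be read as `fit(df)` followed by `transform(df)`. The toy class below is a hypothetical stand-in, not the library class, that mimics this contract using a list of row dicts in place of a DataFrame. It also omits the optional `target_col` argument.

```python
from typing import Any, Dict, List


class ToyValidator:
    """Hypothetical stand-in that mimics the fit/transform/fit_transform contract."""

    def __init__(self, min_samples: int = 3000) -> None:
        self.min_samples = min_samples
        self.n_rows: int = 0

    def fit(self, rows: List[Dict[str, Any]]) -> "ToyValidator":
        # Record the row count (a stand-in for whatever state fit prepares),
        # then return self so calls can be chained.
        self.n_rows = len(rows)
        return self

    def transform(self, rows: List[Dict[str, Any]]) -> Dict[str, bool]:
        # Run one illustrative check; results are keyed by check name.
        return {"sample_size": len(rows) >= self.min_samples}

    def fit_transform(self, rows: List[Dict[str, Any]]) -> Dict[str, bool]:
        # Equivalent to fit(rows).transform(rows).
        return self.fit(rows).transform(rows)
```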
print_report
print_report(results: Dict[str, ValidationResult]) -> None
Print validation results in a tabular format.
Source code in src/temporalscope/datasets/dataset_validator.py (lines 437-448)
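The exact table layout `print_report` produces is not shown here. The sketch below renders one fixed-width row per check using a minimal stand-in for `ValidationResult`. The column set and the severity string used in the usage line are assumptions, not the library's output format.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Result:
    """Minimal stand-in for ValidationResult with the fields the report uses."""
    passed: bool
    message: Optional[str] = None
    severity: Optional[str] = None


def format_report(results: Dict[str, Result]) -> str:
    """Render one row per check, sorted by name, as fixed-width columns (hypothetical layout)."""
    header = f"{'Check':<24}{'Status':<8}{'Severity':<10}Message"
    lines = [header, "-" * len(header)]
    for name, res in sorted(results.items()):
        status = "PASS" if res.passed else "FAIL"
        lines.append(f"{name:<24}{status:<8}{res.severity or '':<10}{res.message or ''}")
    return "\n".join(lines)


def print_report(results: Dict[str, Result]) -> None:
    print(format_report(results))
```

Usage: `print_report({"class_balance": Result(False, "Imbalanced", "WARNING")})`.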
transform
transform(
df: FrameT, target_col: Optional[str] = None
) -> Dict[str, ValidationResult]
Run configured validation checks on the DataFrame.
Source code in src/temporalscope/datasets/dataset_validator.py (lines 403-431)
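`transform` returns a dictionary keyed by check name. The dispatch can be sketched as a registry of check functions, assuming each check is a function of the data. The registry, the row-dict stand-in for a DataFrame, the hard-coded `"time"`/`"target"` column names, and the rules for the two checks shown are all hypothetical. The real method returns `ValidationResult` objects rather than booleans.

```python
from typing import Any, Callable, Dict, List, Sequence

Rows = List[Dict[str, Any]]


def _sample_size(rows: Rows) -> bool:
    # Assumed rule, using the constructor defaults min_samples=3000, max_samples=50000.
    return 3000 <= len(rows) <= 50000


def _feature_count(rows: Rows) -> bool:
    # Assumed rule: columns other than "time" and "target" count as features (4..500).
    n_features = len(set(rows[0]) - {"time", "target"}) if rows else 0
    return 4 <= n_features <= 500


CHECKS: Dict[str, Callable[[Rows], bool]] = {
    "sample_size": _sample_size,
    "feature_count": _feature_count,
}


def run_checks(rows: Rows, checks_to_run: Sequence[str]) -> Dict[str, bool]:
    """Run each requested check and key its outcome by check name."""
    return {name: CHECKS[name](rows) for name in checks_to_run}
```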
ValidationResult
ValidationResult(
passed: bool,
message: Optional[str] = None,
details: Optional[Dict[str, Any]] = None,
severity: Optional[str] = None,
)
Container for dataset validation results.
| METHOD | DESCRIPTION |
|---|---|
| `get_failed_checks` | Get all failed validation checks. |
| `get_validation_summary` | Get summary statistics for a set of validation results. |
| `to_dict` | Convert result to dictionary for serialization. |
| `to_log_entry` | Format result as a structured log entry. |
| ATTRIBUTE | TYPE |
|---|---|
| `details` | `Optional[Dict[str, Any]]` |
| `message` | `Optional[str]` |
| `passed` | `bool` |
| `severity` | `Optional[str]` |
details
details: Optional[Dict[str, Any]] = None
message
message: Optional[str] = None
passed
passed: bool
severity
severity: Optional[str] = None
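The signature and defaults above match the shape of a Python dataclass. The mirror below is hypothetical and is not confirmed as the library's definition. It shows how such a container behaves: only `passed` is required, and the other fields default to `None`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ValidationResultSketch:
    """Hypothetical mirror of ValidationResult's documented fields and defaults."""
    passed: bool
    message: Optional[str] = None
    details: Optional[Dict[str, Any]] = None
    severity: Optional[str] = None
```

Usage: `ValidationResultSketch(passed=False, message="Too few rows", details={"rows": 120})`.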
get_failed_checks
get_failed_checks(
results: Dict[str, ValidationResult]
) -> Dict[str, ValidationResult]
Get all failed validation checks.
Source code in src/temporalscope/datasets/dataset_validator.py (lines 57-60)
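Because `get_failed_checks` receives the full results mapping as an argument, it is presumably a static or class method. Assuming "failed" means `passed` is `False`, the filter reduces to a dictionary comprehension. The `Result` class below is a minimal stand-in for `ValidationResult`.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Result:
    """Minimal stand-in for ValidationResult; only `passed` matters here."""
    passed: bool


def get_failed_checks(results: Dict[str, Result]) -> Dict[str, Result]:
    """Keep only the entries whose check did not pass, preserving their order."""
    return {name: res for name, res in results.items() if not res.passed}
```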
get_validation_summary
get_validation_summary(
results: Dict[str, ValidationResult]
) -> Dict[str, Any]
Get summary statistics for a set of validation results.
Source code in src/temporalscope/datasets/dataset_validator.py (lines 62-70)
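The statistics `get_validation_summary` reports are not listed here. The sketch below counts total, passed, and failed checks. The key names are illustrative assumptions and may not match the dictionary the library returns.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Result:
    """Minimal stand-in for ValidationResult; only `passed` is used."""
    passed: bool


def summarize(results: Dict[str, Result]) -> Dict[str, Any]:
    """Count totals and failures; the key names are hypothetical."""
    failed = [name for name, res in results.items() if not res.passed]
    return {
        "total_checks": len(results),
        "passed_checks": len(results) - len(failed),
        "failed_checks": len(failed),
        "failed_check_names": sorted(failed),
    }
```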
to_dict
to_dict() -> Dict[str, Any]
Convert result to dictionary for serialization.
Source code in src/temporalscope/datasets/dataset_validator.py (lines 44-46)
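For a dataclass-style container, serialization to a plain dictionary can be as short as `dataclasses.asdict`. The sketch below assumes `ValidationResult` is a dataclass, which its signature suggests but this page does not confirm. The resulting dictionary contains only built-in types, so it can be passed directly to `json.dumps` when `details` holds JSON-compatible values.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class Result:
    """Stand-in with the same fields and defaults as ValidationResult."""
    passed: bool
    message: Optional[str] = None
    details: Optional[Dict[str, Any]] = None
    severity: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        # asdict recursively copies the fields into plain dicts, lists, and values.
        return asdict(self)
```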
to_log_entry
to_log_entry() -> Dict[str, Any]
Format result as a structured log entry.
Source code in src/temporalscope/datasets/dataset_validator.py (lines 48-55)
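The fields of the log entry are not documented here. The sketch below assumes a simple schema with a status string plus the `message` and `severity` fields, and emits it as JSON through the standard `logging` module. The schema, the logger name, and the `"WARNING"` severity value are hypothetical.

```python
import json
import logging
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Result:
    """Minimal stand-in for ValidationResult with the fields the entry uses."""
    passed: bool
    message: Optional[str] = None
    severity: Optional[str] = None

    def to_log_entry(self) -> Dict[str, Any]:
        """Hypothetical log schema: status plus the message and severity fields."""
        return {
            "status": "passed" if self.passed else "failed",
            "message": self.message,
            "severity": self.severity,
        }


logger = logging.getLogger("dataset_validation")
entry = Result(False, "Class imbalance above threshold", "WARNING").to_log_entry()
logger.warning(json.dumps(entry))  # one JSON object per log line
```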