temporal_data_loader
TemporalScope/src/temporalscope/core/temporal_data_loader.py.
This module provides TimeFrame, a universal data loader for time series forecasting that can store metadata
for conversions between DataFrame and PyTorch/TensorFlow types. It supports state-of-the-art models including
multi-modal and mixed-frequency workflows, with integration for explainability tools (SHAP, LIME, Boruta-SHAP).
TimeFrame is designed to support different modeling approaches by allowing users to add columns that generalize temporal patterns. For example, a market regime column lets you analyze feature importance across different market conditions, and a treatment phase column shows how feature effects change during patient care.
TimeFrame enforces minimal restrictions to maintain flexibility:
- The time column must be numeric or timestamp-like
- Non-time columns must be numeric (preprocess categorical features)
- Data can have mixed frequencies and asynchronous records
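Because non-time columns must be numeric, categorical features need to be encoded before the data reaches TimeFrame. The following is a minimal plain-Python sketch of ordinal encoding; the `ordinal_encode` helper and the regime labels are hypothetical and not part of TemporalScope.

```python
from typing import Dict, List, Tuple


def ordinal_encode(values: List[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct label to an integer code, assigned in sorted label order."""
    mapping = {label: code for code, label in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


# Hypothetical market-regime labels (illustrative data only).
regimes = ["bull", "bear", "bull", "sideways"]
codes, mapping = ordinal_encode(regimes)
print(codes)    # [1, 0, 1, 2]
print(mapping)  # {'bear': 0, 'bull': 1, 'sideways': 2}
```

Ordinal codes impose an artificial ordering on the labels; one-hot encoding may suit unordered categories better.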
Supported Modeling Approaches:
| Approach | Description |
|---|---|
| Standard Regression | Basic ML models where Temporal SHAP reveals how feature importance evolves naturally over time without enforced constraints. |
| Time Series Regression | Group-aware models (e.g., by stock_id) where Temporal SHAP shows how features impact predictions differently across groups and their unique temporal patterns. |
| Bayesian Regression | Probabilistic models where Temporal SHAP explains how features drive both predictions and uncertainty estimates through time. |
Supported Modes:
| Mode | Description & Data Structure |
|---|---|
| single_target | General machine learning tasks with scalar targets. Each row is a single time step, and the target is scalar. Single DataFrame: each row is an observation. |
| multi_target | Sequential time series tasks (e.g., seq2seq) for deep learning. The data is split into sequences (input X, target Y). Two DataFrames: X for input sequences, Y for targets. Frameworks: TensorFlow, PyTorch, Keras. |
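To illustrate the X/Y structure described for multi_target mode, here is a small sketch of splitting one series into input and target sequences with a sliding window. The `make_windows` helper and its parameters are assumptions for illustration; the source does not say how TemporalScope builds its sequences.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], input_len: int, target_len: int
) -> Tuple[List[List[float]], List[List[float]]]:
    """Slide a window over the series: input_len steps go to X, the next target_len to Y."""
    X: List[List[float]] = []
    Y: List[List[float]] = []
    last_start = len(series) - input_len - target_len
    for start in range(last_start + 1):
        split = start + input_len
        X.append(list(series[start:split]))
        Y.append(list(series[split:split + target_len]))
    return X, Y


X, Y = make_windows([1, 2, 3, 4, 5, 6], input_len=3, target_len=2)
print(X)  # [[1, 2, 3], [2, 3, 4]]
print(Y)  # [[4, 5], [5, 6]]
```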
References
- Van Ness, M., et al. (2023). Cross-Frequency Time Series Meta-Forecasting. arXiv:2302.02077.
- Woo, G., et al. (2024). Unified training of universal time series forecasting transformers. arXiv:2402.02592.
- Trirat, P., et al. (2024). Universal time-series representation learning: A survey. arXiv:2401.03717.
- Xu, Q., et al. (2019). An artificial neural network for mixed frequency data. Expert Systems with Applications, 118, pp.127-139.
- Filho, L.L., et al. (2024). A multi-modal approach for mixed-frequency time series forecasting. Neural Computing and Applications, pp.1-25.
| CLASS | DESCRIPTION |
|---|---|
| TimeFrame | Central class for the TemporalScope package. |

| ATTRIBUTE | DESCRIPTION |
|---|---|
| MODE_MULTI_TARGET | |
| MODE_SINGLE_TARGET | |
| VALID_MODES | |
MODE_MULTI_TARGET
MODE_MULTI_TARGET = 'multi_target'
MODE_SINGLE_TARGET
MODE_SINGLE_TARGET = 'single_target'
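As a rough sketch of how these constants could be used to reject an unsupported mode: the value of `VALID_MODES` is not shown in this page, so the list below is an assumption, and `InvalidModeError` is a hypothetical stand-in for the library's ModeValidationError.

```python
MODE_SINGLE_TARGET = "single_target"
MODE_MULTI_TARGET = "multi_target"
# Assumption: VALID_MODES simply lists the two modes above.
VALID_MODES = [MODE_SINGLE_TARGET, MODE_MULTI_TARGET]


class InvalidModeError(ValueError):
    """Hypothetical stand-in for the library's ModeValidationError."""


def check_mode(mode: str) -> str:
    """Return the mode unchanged if it is supported, otherwise raise."""
    if mode not in VALID_MODES:
        raise InvalidModeError(f"Invalid mode {mode!r}; expected one of {VALID_MODES}")
    return mode


print(check_mode(MODE_SINGLE_TARGET))  # single_target
try:
    check_mode("sequence")
except InvalidModeError as err:
    print(err)
```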
TimeFrame
TimeFrame(
    df: FrameT,
    time_col: str,
    target_col: str,
    time_col_conversion: Optional[str] = None,
    sort: bool = True,
    ascending: bool = True,
    mode: str = MODE_SINGLE_TARGET,
    enforce_temporal_uniqueness: bool = False,
    id_col: Optional[str] = None,
    verbose: bool = False,
)
Central class for the TemporalScope package.
The TimeFrame class is designed to handle time series data across various backends, including Polars, Pandas,
and Modin. It facilitates workflows for machine learning, deep learning, and explainability methods, while abstracting
away backend-specific implementation details.
This class automatically infers the appropriate backend, validates the data, and sorts it by time. It ensures compatibility with temporal XAI techniques (SHAP, Boruta-SHAP, LIME, etc.) and supports larger data workflows in production.
Engineering Design Assumptions:
- Universal Models: This class is designed assuming the user has pre-processed their data for compatibility with deep learning models. Across the TemporalScope utilities (e.g., target shifter, padding, partitioning algorithms), it is assumed that preprocessing tasks, such as categorical feature encoding, will be managed by the user or upstream modules. Thus the model will learn global weights and will not group by categorical variables.
- Mixed Time Frequency supported: Given the flexibility of deep learning models to handle various time frequencies, this class allows time_col to contain mixed frequency data, assuming the user will manage any necessary preprocessing or alignment outside of this class.
- The time_col should be either numeric or timestamp-like for proper temporal ordering. Any mixed or invalid data types will raise validation errors.
- All non-time columns are expected to be numeric. Users are responsible for handling non-numeric features (e.g., encoding categorical features).
Examples:
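from datetime import date
import polars as pl
from temporalscope.core.temporal_data_loader import TimeFrame
# 100 consecutive daily timestamps: 2021-01-01 through 2021-04-10
times = pl.date_range(date(2021, 1, 1), date(2021, 4, 10), interval="1d", eager=True)
data = pl.DataFrame({"time": times, "value": list(range(100))})
tf = TimeFrame(data, time_col="time", target_col="value")
print(tf.df.head())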
This constructor initializes the TimeFrame object, validates the input DataFrame,
and performs optional sorting based on the specified time_col. It also allows for
validation and conversion of the time_col to numeric for compatibility with downstream
processing. Designed for universal workflows supporting state-of-the-art AI models,
this class accommodates mixed-frequency time series data.
There are two common use cases for TimeFrame:
- Implicit & Static Time Series: For workflows where time_col is treated as a feature, such as in static modeling for ML/DL applications, enforce_temporal_uniqueness can remain False (default). This mode emphasizes a universal design, accommodating mixed-frequency data.
- Strict Time Series: For workflows requiring strict temporal ordering and uniqueness (e.g., forecasting), set enforce_temporal_uniqueness=True. Additionally, specify id_col for grouped or segmented validation.
| PARAMETER | DESCRIPTION |
|---|---|
| df | The input DataFrame, which can be any TemporalScope-supported backend (e.g., Pandas, Modin, Polars). TYPE: FrameT |
| time_col | The name of the column representing time. Must be numeric or timestamp-like for sorting. TYPE: str |
| target_col | The name of the column representing the target variable. TYPE: str |
| time_col_conversion | Specify the conversion type for the time_col: "numeric" or "datetime". Default is None. TYPE: Optional[str] |
| sort | If True, the data will be sorted by time_col. Default is True. TYPE: bool |
| ascending | If sorting, whether to sort in ascending order. Default is True. TYPE: bool |
| mode | The operation mode, either "single_target" or "multi_target". Default is "single_target". TYPE: str |
| enforce_temporal_uniqueness | If True, ensures that timestamps in time_col are unique (within each id_col group when id_col is given). Default is False. TYPE: bool |
| id_col | Optional column for grouped or segmented strict temporal validation. Default is None. TYPE: Optional[str] |
| verbose | If True, enables logging for validation and setup stages. Default is False. TYPE: bool |

| RAISES | DESCRIPTION |
|---|---|
| ModeValidationError | If the specified mode is invalid. |
| UnsupportedBackendError | If the specified or inferred backend is not supported. |
| ValueError | If required columns are missing, invalid, or if the time column conversion fails. |

| ATTRIBUTE | DESCRIPTION |
|---|---|
| _metadata | A private metadata dictionary to allow end-users flexibility in extending the TimeFrame object. This provides storage for any additional attributes or information during runtime. TYPE: Dict[str, Any] |
Examples:
import pandas as pd
from temporalscope.core.temporal_data_loader import TimeFrame, MODE_SINGLE_TARGET
# Example DataFrame
df = pd.DataFrame({"time": pd.date_range(start="2023-01-01", periods=10, freq="D"), "value": range(10)})
# Initialize TimeFrame with automatic time column conversion to numeric
tf = TimeFrame(df, time_col="time", target_col="value", time_col_conversion="numeric", mode=MODE_SINGLE_TARGET)
print(tf.df.head())
Warnings
- The mode parameter must be one of:
  - "single_target": For scalar target predictions (e.g., regression).
  - "multi_target": For sequence forecasting tasks (e.g., seq2seq models).
- The time_col_conversion parameter allows for automatic conversion of the time_col to either numeric or datetime during initialization.
- The _metadata container follows design patterns similar to Stable-Baselines3 (SB3), enabling users to manage custom attributes and extend functionality for advanced workflows, such as future conversion to TensorFlow or PyTorch types in multi-target explainable AI workflows.
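To make the numeric/datetime conversion concrete, the sketch below converts timestamps to numbers and back using only the standard library. It assumes "numeric" means seconds since the Unix epoch; this page does not state the unit or dtype TemporalScope actually uses, and the helper names are hypothetical.

```python
from datetime import datetime, timezone


def to_numeric(ts: datetime) -> float:
    """Convert a timezone-aware datetime to seconds since the Unix epoch."""
    return ts.timestamp()


def to_datetime(seconds: float) -> datetime:
    """Convert epoch seconds back to a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


stamps = [
    datetime(2023, 1, 1, tzinfo=timezone.utc),
    datetime(2023, 1, 2, tzinfo=timezone.utc),
]
numeric = [to_numeric(t) for t in stamps]
restored = [to_datetime(s) for s in numeric]
print(numeric)             # [1672531200.0, 1672617600.0]
print(restored == stamps)  # True
```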
| METHOD | DESCRIPTION |
|---|---|
| setup | Initialize and validate a TimeFrame's DataFrame with proper sorting and validation. |
| sort_dataframe_time | Sort DataFrame by time column using backend-agnostic Narwhals operations. |
| update_dataframe | Update TimeFrame's internal DataFrame with new data. |
| validate_dataframe | Validate DataFrame structure and data types. |

| ATTRIBUTE | DESCRIPTION |
|---|---|
| ascending | Get the TimeFrame's sort order setting. TYPE: bool |
| df | Access the internal DataFrame. TYPE: FrameT |
| metadata | Container for storing additional metadata associated with the TimeFrame. TYPE: Dict[str, Any] |
| mode | Get the TimeFrame's operation mode. TYPE: str |
ascending
ascending: bool
Get the TimeFrame's sort order setting.
This property indicates whether time-based sorting is performed in ascending or descending order, affecting how data is organized for analysis and modeling.
| RETURNS | DESCRIPTION |
|---|---|
| bool | True if sorting is ascending (earlier to later), False if descending (later to earlier). |
Examples:
import pandas as pd
from temporalscope.core.temporal_data_loader import TimeFrame
# Example data (illustrative values)
df = pd.DataFrame({"time": [1, 2, 3], "target": [10, 20, 30]})
# Create TimeFrame with descending sort
tf = TimeFrame(df, time_col="time", target_col="target", ascending=False)
# Check sort order
if not tf.ascending:
    print("Data sorted from latest to earliest")
See Also
- sort_dataframe_time : Method that uses this setting
- setup : Where sorting is applied
Notes
- Affects all sorting operations
- Set during initialization
- Used by sort_dataframe_time method
df
df: FrameT
Access the internal DataFrame.
This property provides read-only access to the TimeFrame's internal DataFrame. The DataFrame maintains all validations, conversions, and sorting settings applied during initialization or updates.
| RETURNS | DESCRIPTION |
|---|---|
| FrameT | The current state of the DataFrame, with all validations and transformations applied. |
Examples:
import pandas as pd
from temporalscope.core.temporal_data_loader import TimeFrame
# Create TimeFrame
tf = TimeFrame(pd.DataFrame({"time": [1, 2, 3], "target": [10, 20, 30]}), time_col="time", target_col="target")
# Access DataFrame
current_df = tf.df
print(current_df) # Shows current state
See Also
- update_dataframe : Method to update the internal DataFrame
- setup : Method that prepares the DataFrame
Notes
- Returns a reference to the internal DataFrame
- Any modifications should be done through update_dataframe
- Maintains all TimeFrame settings and validations
metadata
metadata: Dict[str, Any]
Container for storing additional metadata associated with the TimeFrame.
This property provides a flexible storage mechanism for arbitrary metadata related to the TimeFrame, such as configuration details, additional annotations, or external data structures. It is designed to support future extensions, including multi-target workflows and integration with deep learning libraries like TensorFlow or PyTorch.
| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary for storing metadata related to the TimeFrame. |
Examples:
import pandas as pd
from temporalscope.core.temporal_data_loader import TimeFrame
# Example data (illustrative values)
df = pd.DataFrame({"time": [1, 2, 3], "value": [10, 20, 30]})
# Initialize a TimeFrame
tf = TimeFrame(df, time_col="time", target_col="value")
# Add custom metadata
tf.metadata["description"] = "This dataset is for monthly sales forecasting"
tf.metadata["model_details"] = {"type": "LSTM", "framework": "TensorFlow"}
# Access metadata
print(tf.metadata["description"])  # Output: "This dataset is for monthly sales forecasting"
Notes
This metadata container is designed following patterns seen in deep reinforcement learning (DRL) libraries like Stable-Baselines3, where additional metadata is stored alongside primary data structures for extensibility.
Future Support
In future releases, this will support multi-target workflows, enabling the storage of processed tensor data for deep learning explainability (e.g., SHAP, LIME).
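The pattern described above, a private dictionary exposed through a property, can be sketched in a few lines. `FrameWithMetadata` is a hypothetical stand-in used only to show the pattern; it is not TemporalScope's TimeFrame.

```python
from typing import Any, Dict


class FrameWithMetadata:
    """Hypothetical stand-in showing a private dict exposed via a property."""

    def __init__(self) -> None:
        self._metadata: Dict[str, Any] = {}

    @property
    def metadata(self) -> Dict[str, Any]:
        # Returns the dict itself, so callers can add keys in place.
        return self._metadata


frame = FrameWithMetadata()
frame.metadata["description"] = "monthly sales"
frame.metadata["model_details"] = {"type": "LSTM"}
print(frame.metadata)
```

Because the property returns the underlying dictionary rather than a copy, item assignment works without a setter.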
mode
mode: str
Get the TimeFrame's operation mode.
This property indicates whether the TimeFrame is configured for single-target or multi-target operations, affecting how data is processed and validated.
| RETURNS | DESCRIPTION |
|---|---|
| str | The current operation mode: MODE_SINGLE_TARGET for scalar target predictions, or MODE_MULTI_TARGET for sequence forecasting tasks. |
Examples:
import pandas as pd
from temporalscope.core.temporal_data_loader import TimeFrame, MODE_MULTI_TARGET
# Example data (illustrative values)
df = pd.DataFrame({"time": [1, 2, 3], "target": [10, 20, 30]})
# Create TimeFrame in multi-target mode
tf = TimeFrame(df, time_col="time", target_col="target", mode=MODE_MULTI_TARGET)
# Check mode
if tf.mode == MODE_MULTI_TARGET:
    print("Configured for sequence forecasting")
See Also
TimeFrame.__init__ : Where mode is set during initialization
Notes
- Mode affects validation and processing behavior
- Cannot be changed after initialization
- Determines compatibility with different model types
setup
setup(
    df: FrameT,
    sort: bool = True,
    ascending: bool = True,
    time_col_conversion: Optional[str] = None,
    enforce_temporal_uniqueness: bool = False,
    id_col: Optional[str] = None,
) -> FrameT
Initialize and validate a TimeFrame's DataFrame with proper sorting and validation.
This method performs the necessary validation, conversion, and sorting operations to prepare the input DataFrame for use in TemporalScope workflows. The method is idempotent.
Steps:
1. Validate the input DataFrame using the validate_dataframe method.
2. Optionally convert the time_col to the specified type (numeric or datetime).
3. Perform temporal uniqueness validation within groups if enabled.
4. Optionally sort the DataFrame by time_col in the specified order.

| PARAMETER | DESCRIPTION |
|---|---|
| df | Input DataFrame to set up and validate. TYPE: FrameT |
| sort | Whether to sort the DataFrame by time_col. Defaults to True. TYPE: bool |
| ascending | Sort order if sorting is enabled. Defaults to True. TYPE: bool |
| time_col_conversion | Optional. Specify the conversion type for the time_col (numeric or datetime). Defaults to None. TYPE: Optional[str] |
| enforce_temporal_uniqueness | If True, validates that timestamps in the time_col are unique within each group defined by id_col, or across the entire DataFrame if id_col is None. Defaults to False. TYPE: bool |
| id_col | An optional column name to define groups for temporal uniqueness validation. If None, validation is performed across the entire DataFrame. Default is None. TYPE: Optional[str] |
| RETURNS | DESCRIPTION |
|---|---|
| SupportedTemporalDataFrame | Validated, converted, and optionally sorted DataFrame. |
Example usage:
import pandas as pd
from temporalscope.core.temporal_data_loader import TimeFrame
df = pd.DataFrame(
    {
        "patient_id": [1, 1, 2, 2],
        # Parse the dates up front so the time column is timestamp-like when validated
        "time": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-01", "2023-01-03"]),
        "value": [10, 20, 30, 40],
    }
)
tf = TimeFrame(
    df,
    time_col="time",
    target_col="value",
)
# Timestamps repeat across patients but are unique within each patient_id group
sorted_df = tf.setup(df, time_col_conversion="datetime", enforce_temporal_uniqueness=True, id_col="patient_id")
print(sorted_df)
Notes
- This method is designed to be idempotent, ensuring safe revalidation or reinitialization.
- The time_col_conversion parameter allows you to convert the time_col to a numeric or datetime type.
- Sorting is performed only when the sort parameter is True (the default).
- While this method validates, converts, and sorts the DataFrame, it does not modify the TimeFrame's internal state unless explicitly used within another method (e.g., update_dataframe).
- The enforce_temporal_uniqueness parameter can be set dynamically in this method, allowing validation of temporal uniqueness to be turned on/off as needed.
- The id_col parameter can also be set dynamically, defining the scope of the temporal uniqueness validation.
- The id_col parameter enables validation of temporal uniqueness within each group's records, ensuring no duplicate timestamps exist per group while allowing different groups to have events on the same dates. This is particularly useful for multi-entity time series datasets (e.g., patient data, stock prices).
Note: Users must check the Apache License for the complete terms of use. This software is distributed "AS-IS" and may require adjustments for specific use cases.
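The difference between global and per-group temporal uniqueness can be sketched in plain Python. The `find_duplicate_timestamps` helper is hypothetical and only illustrates the idea; it makes no claim about how TemporalScope implements the check.

```python
from collections import Counter
from typing import Any, Dict, List, Optional, Tuple


def find_duplicate_timestamps(
    rows: List[Dict[str, Any]], time_col: str, id_col: Optional[str] = None
) -> List[Tuple[Any, Any]]:
    """Return (group, timestamp) pairs that occur more than once.

    With id_col=None every row falls into a single group (None).
    """
    keys = [(row[id_col] if id_col is not None else None, row[time_col]) for row in rows]
    counts = Counter(keys)
    return [key for key, n in counts.items() if n > 1]


rows = [
    {"patient_id": 1, "time": "2023-01-01"},
    {"patient_id": 1, "time": "2023-01-02"},
    {"patient_id": 2, "time": "2023-01-01"},
    {"patient_id": 2, "time": "2023-01-03"},
]
print(find_duplicate_timestamps(rows, "time", id_col="patient_id"))  # []
print(find_duplicate_timestamps(rows, "time"))  # [(None, '2023-01-01')]
```

The same data passes the per-group check but fails the global one, because two patients share a date.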
sort_dataframe_time
sort_dataframe_time(
df: FrameT, ascending: bool = True
) -> FrameT
Sort DataFrame by time column using backend-agnostic Narwhals operations.
This method provides a consistent way to sort DataFrames by their time column across all supported backends (Pandas, Polars, etc.). It delegates to core_utils.sort_dataframe_time for the actual sorting operation.
| PARAMETER | DESCRIPTION |
|---|---|
| df | DataFrame to sort. Can be any backend supported by Narwhals (Pandas, Polars, etc.). TYPE: FrameT |
| ascending | Sort direction. True for ascending (default), False for descending. TYPE: bool |

| RETURNS | DESCRIPTION |
|---|---|
| FrameT | A new DataFrame sorted by the time column. |
Examples:
import polars as pl
from temporalscope.core.temporal_data_loader import TimeFrame
# Create TimeFrame with unsorted data
data = pl.DataFrame({"time": [3, 1, 4, 2, 5], "target": range(5)})
tf = TimeFrame(data, time_col="time", target_col="target", sort=False)
# Sort ascending
sorted_asc = tf.sort_dataframe_time(tf.df, ascending=True)
print(sorted_asc) # Shows: 1, 2, 3, 4, 5
# Sort descending
sorted_desc = tf.sort_dataframe_time(tf.df, ascending=False)
print(sorted_desc) # Shows: 5, 4, 3, 2, 1
See Also
temporalscope.core.core_utils.sort_dataframe_time : The underlying sorting function
Notes
- Uses core_utils.sort_dataframe_time for consistent sorting across the codebase
- Preserves DataFrame schema and column types
- Returns a new DataFrame; does not modify the input DataFrame
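The "returns a new DataFrame, does not modify the input" behavior noted above is the same contract as Python's built-in `sorted`. The sketch below shows it on a list of row dictionaries; `sort_by_time` is a hypothetical helper, not the library function.

```python
from typing import Any, Dict, List


def sort_by_time(
    rows: List[Dict[str, Any]], time_col: str, ascending: bool = True
) -> List[Dict[str, Any]]:
    """Return a new list ordered by time_col; the input list is left untouched."""
    return sorted(rows, key=lambda row: row[time_col], reverse=not ascending)


rows = [{"time": t, "target": i} for i, t in enumerate([3, 1, 4, 2, 5])]
sorted_desc = sort_by_time(rows, "time", ascending=False)
print([r["time"] for r in sorted_desc])  # [5, 4, 3, 2, 1]
print([r["time"] for r in rows])         # [3, 1, 4, 2, 5]
```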
update_dataframe
update_dataframe(df: FrameT) -> None
Update TimeFrame's internal DataFrame with new data.
This method updates the internal DataFrame with new data, performing all necessary validations and conversions to maintain consistency. Uses eager evaluation since it modifies internal state.
The update process includes:
1. Converting input to Narwhals DataFrame
2. Running full setup validation and conversion pipeline
3. Replacing internal DataFrame with validated result
| PARAMETER | DESCRIPTION |
|---|---|
| df | The new DataFrame to update with. Must contain the required time and target columns with appropriate data types. TYPE: FrameT |

| RETURNS | DESCRIPTION |
|---|---|
| None | Method updates internal state but returns nothing. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the new DataFrame fails validation (e.g., a required column is missing). |
Examples:
import pandas as pd
from temporalscope.core.temporal_data_loader import TimeFrame
# Initialize TimeFrame
initial_df = pd.DataFrame({"time": [1, 2, 3], "target": [10, 20, 30]})
tf = TimeFrame(initial_df, time_col="time", target_col="target")
# Update with new data
new_df = pd.DataFrame({"time": [4, 5, 6], "target": [40, 50, 60]})
tf.update_dataframe(new_df) # Updates internal DataFrame
# Invalid update (missing column)
invalid_df = pd.DataFrame({"wrong_col": [1, 2, 3]})
try:
    tf.update_dataframe(invalid_df)  # Raises ValueError
except ValueError as err:
    print(f"Update rejected: {err}")
See Also
- setup : The underlying validation and setup method
- validate_dataframe : The validation method used
Notes
- Uses eager evaluation to ensure immediate state update
- Maintains all TimeFrame settings (sorting, conversion, etc.)
- Performs full validation to ensure data consistency
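The validate-then-replace order listed in the update process implies that a failed update leaves the previous data in place. A minimal sketch of that pattern, using a hypothetical `RowHolder` class rather than TimeFrame itself:

```python
from typing import Any, Dict, List


class RowHolder:
    """Hypothetical container that swaps in new rows only after they validate."""

    def __init__(self, rows: List[Dict[str, Any]], required: List[str]) -> None:
        self._required = required
        self._rows = self._validate(rows)

    def _validate(self, rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        for row in rows:
            missing = [col for col in self._required if col not in row]
            if missing:
                raise ValueError(f"Missing required columns: {missing}")
        return list(rows)

    def update(self, new_rows: List[Dict[str, Any]]) -> None:
        # Validation runs first; if it raises, self._rows is never reassigned.
        validated = self._validate(new_rows)
        self._rows = validated


holder = RowHolder([{"time": 1, "target": 10}], required=["time", "target"])
try:
    holder.update([{"wrong_col": 1}])
except ValueError as err:
    print(f"Update rejected: {err}")
print(holder._rows)  # still the original rows
```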
validate_dataframe
validate_dataframe(df: FrameT) -> None
Validate DataFrame structure and data types.
This method performs comprehensive validation of the input DataFrame:
1. Converts to Narwhals DataFrame for backend-agnostic operations
2. Checks for empty DataFrame
3. Validates required columns exist (time_col and target_col)
4. Validates time column is numeric or datetime
5. Validates all non-time columns are numeric
| PARAMETER | DESCRIPTION |
|---|---|
| df | DataFrame to validate. Can be any backend supported by Narwhals (Pandas, Polars, etc.). TYPE: FrameT |

| RETURNS | DESCRIPTION |
|---|---|
| None | Method returns None if validation passes. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If validation fails (e.g., a required column is missing). |
Examples:
import pandas as pd
from temporalscope.core.temporal_data_loader import TimeFrame
# Create TimeFrame with valid data
df = pd.DataFrame({"time": [1, 2, 3], "target": [10, 20, 30]})
tf = TimeFrame(df, time_col="time", target_col="target")
# Validate new data
new_df = pd.DataFrame({"time": [4, 5, 6], "target": [40, 50, 60]})
tf.validate_dataframe(new_df) # Passes validation
# Invalid data (missing column)
invalid_df = pd.DataFrame({"wrong_col": [1, 2, 3]})
try:
    tf.validate_dataframe(invalid_df)  # Raises ValueError
except ValueError as err:
    print(f"Validation failed: {err}")
See Also
- temporalscope.core.core_utils.validate_column_numeric_or_datetime
- temporalscope.core.core_utils.validate_feature_columns_numeric
Notes
- Uses core_utils functions for consistent validation across the codebase
- Performs validation without modifying the input DataFrame
- Supports all DataFrame backends through Narwhals abstraction
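The checks listed for this method (empty data, required columns, time column type, numeric features) can be approximated in plain Python over a dictionary of columns. This is an illustrative sketch with a hypothetical `validate_columns` helper; it skips the Narwhals conversion step and is not the library's implementation.

```python
from datetime import date
from typing import Any, Dict, List


def validate_columns(columns: Dict[str, List[Any]], time_col: str, target_col: str) -> None:
    """Raise ValueError on the first failed check; return None otherwise."""
    # Empty check
    if not columns or all(len(values) == 0 for values in columns.values()):
        raise ValueError("DataFrame is empty")
    # Required columns
    missing = [col for col in (time_col, target_col) if col not in columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    # Time column: all numeric or all date-like (datetime is a subclass of date)
    time_values = columns[time_col]
    all_numeric = all(isinstance(v, (int, float)) for v in time_values)
    all_dates = all(isinstance(v, date) for v in time_values)
    if not (all_numeric or all_dates):
        raise ValueError(f"Column {time_col!r} must be numeric or datetime")
    # Every non-time column must be numeric
    for name, values in columns.items():
        if name != time_col and not all(isinstance(v, (int, float)) for v in values):
            raise ValueError(f"Column {name!r} must be numeric")


validate_columns({"time": [1, 2, 3], "target": [10, 20, 30]}, "time", "target")
try:
    validate_columns({"time": [1, 2], "target": [1, 2], "city": ["a", "b"]}, "time", "target")
except ValueError as err:
    print(err)  # Column 'city' must be numeric
```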