TemporalScope Tutorial: Synthetic Health Monitoring Analysis
Overview
This tutorial demonstrates using TemporalScope with synthetic health data. While we plan to integrate standard academic healthcare datasets in future releases, this synthetic example illustrates the core functionality of TimeFrame and SingleStepTargetShifter.
Current Features Demonstrated
TimeFrame:
- Backend-agnostic data loading
- Data validation for XAI workflows
- Support for temporal data structures
SingleStepTargetShifter:
- One-step-ahead target preparation
- Clean separation of validation/transformation
- Backend-agnostic operations
Future Enhancements
- Integration with standard healthcare datasets
- Multi-step sequence prediction (planned MultiStepTargetShifter)
- Advanced temporal partitioning strategies
Engineering Design
This tutorial follows TemporalScope's core engineering principles:
Data Quality:
- Assumes clean, preprocessed data
- Requires a properly formatted time column
- Requires numeric features
Backend Agnostic:
- Works with pandas, polars, modin
- Pure Narwhals operations
- Consistent behavior across backends
XAI Ready:
- Prepared for MASV computations
- Compatible with temporal feature importance
- Supports model-agnostic explainability
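The data-quality assumptions above can be checked by hand before a frame is passed to TimeFrame. The sketch below uses plain pandas; `check_frame_assumptions` is a hypothetical helper written for this illustration, not part of TemporalScope, and TimeFrame's own validation may differ.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def check_frame_assumptions(df: pd.DataFrame, time_col: str) -> list:
    """Return a list of human-readable problems; an empty list means none found.

    Hypothetical helper for illustration only (not a TemporalScope function).
    """
    if time_col not in df.columns:
        return [f"missing time column '{time_col}'"]
    problems = []
    # The time column should already be parsed as datetimes
    if not is_datetime64_any_dtype(df[time_col]):
        problems.append(f"'{time_col}' is not a datetime column")
    # Every other column is expected to be numeric
    for col in df.columns.drop(time_col):
        if not is_numeric_dtype(df[col]):
            problems.append(f"'{col}' is not numeric")
    # Missing values should be handled before loading
    null_cols = [col for col in df.columns if df[col].isna().any()]
    if null_cols:
        problems.append(f"missing values in {null_cols}")
    return problems


good = pd.DataFrame({
    "ds": pd.date_range("2023-01-01", periods=3, freq="D"),
    "systolic": [120.0, 119.5, 121.2],
})
bad = pd.DataFrame({
    "ds": ["2023-01-01", "2023-01-02"],  # strings, not datetimes
    "systolic": [120.0, None],           # contains a missing value
    "note": ["a", "b"],                  # non-numeric feature
})

print(check_frame_assumptions(good, "ds"))
print(check_frame_assumptions(bad, "ds"))
```

The first call reports no problems, while the second flags the unparsed time column, the non-numeric `note` column, and the missing `systolic` value.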
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from temporalscope.core.temporal_data_loader import TimeFrame
from temporalscope.target_shifters.single_step import SingleStepTargetShifter
from temporalscope.core.core_utils import print_divider
def generate_health_data(start_date: str = "2023-01-01", days: int = 365):
    """Generate synthetic health monitoring data.

    This synthetic data includes realistic patterns:
    - Seasonal effects (yearly cycles)
    - Weekly patterns (work stress)
    - Daily variations

    :param start_date: Starting date for the data
    :type start_date: str
    :param days: Number of days to generate
    :type days: int
    :return: DataFrame with synthetic health data
    :rtype: pd.DataFrame
    """
    # Create date range
    dates = pd.date_range(start=start_date, periods=days, freq="D")
    t = np.arange(days)

    # Generate patterns
    seasonal = 5 * np.sin(2 * np.pi * t / 365)  # Yearly cycle
    weekly = 3 * np.sin(2 * np.pi * t / 7)  # Weekly cycle

    # Generate metrics
    systolic = 120 + seasonal + weekly + np.random.normal(0, 3, days)
    heart_rate = 70 + weekly + np.random.normal(0, 3, days)

    return pd.DataFrame({"ds": dates, "systolic": systolic, "heart_rate": heart_rate})

# Generate synthetic data
print("Generating synthetic health data...")
health_df = generate_health_data()
print("Preview of generated health data:")
print(health_df.head())
print_divider()
Generating synthetic health data...
Preview of generated health data:
ds systolic heart_rate
0 2023-01-01 120.076743 67.202520
1 2023-01-02 119.704010 74.038603
2 2023-01-03 116.535820 78.983519
3 2023-01-04 123.188708 69.461099
4 2023-01-05 115.938817 66.350265
======================================================================
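Because generate_health_data draws its noise from NumPy's global random state without setting a seed, the preview values above will differ from run to run. A minimal sketch of a reproducible variant follows; the `seed` parameter and the name `generate_health_data_seeded` are assumptions added for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd


def generate_health_data_seeded(
    start_date: str = "2023-01-01", days: int = 365, seed: int = 42
) -> pd.DataFrame:
    """Same signal model as generate_health_data, but with a seeded generator."""
    rng = np.random.default_rng(seed)  # local generator; global state is untouched
    dates = pd.date_range(start=start_date, periods=days, freq="D")
    t = np.arange(days)
    seasonal = 5 * np.sin(2 * np.pi * t / 365)  # yearly cycle
    weekly = 3 * np.sin(2 * np.pi * t / 7)  # weekly cycle
    systolic = 120 + seasonal + weekly + rng.normal(0, 3, days)
    heart_rate = 70 + weekly + rng.normal(0, 3, days)
    return pd.DataFrame({"ds": dates, "systolic": systolic, "heart_rate": heart_rate})


a = generate_health_data_seeded(seed=0)
b = generate_health_data_seeded(seed=0)
print(a.equals(b))  # identical seeds give identical frames
```

A seeded frame will not reproduce the exact numbers printed in this tutorial, since those came from an unseeded run.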
# Explore the synthetic data
print("Data Overview:")
print(f"Shape: {health_df.shape}")
print("\nColumn Information:")
health_df.info()  # info() prints its report directly and returns None
print("\nSummary Statistics:")
health_df.describe()
Data Overview:
Shape: (365, 3)

Column Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 365 entries, 0 to 364
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   ds          365 non-null    datetime64[ns]
 1   systolic    365 non-null    float64
 2   heart_rate  365 non-null    float64
dtypes: datetime64[ns](1), float64(2)
memory usage: 8.7 KB

Summary Statistics:
| | ds | systolic | heart_rate |
|---|---|---|---|
| count | 365 | 365.000000 | 365.000000 |
| mean | 2023-07-02 00:00:00 | 120.078210 | 69.896828 |
| min | 2023-01-01 00:00:00 | 107.137569 | 59.929979 |
| 25% | 2023-04-02 00:00:00 | 116.360957 | 67.394663 |
| 50% | 2023-07-02 00:00:00 | 119.988853 | 69.833409 |
| 75% | 2023-10-01 00:00:00 | 123.328977 | 72.521727 |
| max | 2023-12-31 00:00:00 | 134.699601 | 79.206850 |
| std | NaN | 4.921702 | 3.814593 |
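The standard deviations in the table can be sanity-checked against the generator. Treating the components as uncorrelated, a sinusoid of amplitude A contributes a variance of roughly A²/2 over whole cycles, and the Gaussian noise contributes its own variance of 3² = 9. The short calculation below is a worked check added for illustration, not part of the original notebook.

```python
import math

# Variance contributions, assuming the components are uncorrelated
seasonal_var = 5**2 / 2  # yearly sinusoid, amplitude 5
weekly_var = 3**2 / 2    # weekly sinusoid, amplitude 3
noise_var = 3**2         # Gaussian noise, standard deviation 3

systolic_std = math.sqrt(seasonal_var + weekly_var + noise_var)
heart_rate_std = math.sqrt(weekly_var + noise_var)

print(f"expected systolic std:   {systolic_std:.2f}")
print(f"expected heart_rate std: {heart_rate_std:.2f}")
```

The expected values come to about 5.10 and 3.67, reasonably close to the observed 4.92 and 3.81; a gap of this size is plausibly sampling variation in a single 365-point draw.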
# Initialize TimeFrame for systolic blood pressure
systolic_tf = TimeFrame(df=health_df, time_col="ds", target_col="systolic")
print("Original TimeFrame:")
print(systolic_tf.df.head())
print_divider()
# Initialize SingleStepTargetShifter
shifter = SingleStepTargetShifter(n_lags=1, verbose=True)
# Transform data for one-step-ahead prediction
transformed_tf = shifter.fit_transform(systolic_tf)
print("\nTransformed TimeFrame:")
print(transformed_tf.df.head())
print_divider()
Original TimeFrame:
ds systolic heart_rate
0 2023-01-01 120.076743 67.202520
1 2023-01-02 119.704010 74.038603
2 2023-01-03 116.535820 78.983519
3 2023-01-04 123.188708 69.461099
4 2023-01-05 115.938817 66.350265
======================================================================
Initialized SingleStepTargetShifter with target_col=None, n_lags=1
Rows before: 365; Rows after: 364; Dropped: 1
Transformed TimeFrame:
ds heart_rate systolic_shift_1
0 2023-01-01 67.202520 119.704010
1 2023-01-02 74.038603 116.535820
2 2023-01-03 78.983519 123.188708
3 2023-01-04 69.461099 115.938817
4 2023-01-05 66.350265 115.791724
======================================================================
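Comparing the two previews shows what the shifter did: each row's `systolic_shift_1` holds the next day's `systolic` value, the original `systolic` column is gone, and the final row (which has no next-day value) is dropped. A rough plain-pandas equivalent, inferred from the output above rather than taken from TemporalScope's implementation, looks like this:

```python
import pandas as pd

# First five rows of the original preview
df = pd.DataFrame({
    "ds": pd.date_range("2023-01-01", periods=5, freq="D"),
    "systolic": [120.076743, 119.704010, 116.535820, 123.188708, 115.938817],
    "heart_rate": [67.202520, 74.038603, 78.983519, 69.461099, 66.350265],
})

shifted = (
    df.assign(systolic_shift_1=df["systolic"].shift(-1))  # next day's value
    .drop(columns="systolic")                             # keep only the shifted target
    .dropna(subset=["systolic_shift_1"])                  # last row has no target
    .reset_index(drop=True)
)
print(shifted)
```

With only five input rows the sketch yields four. The fifth target in the tutorial output (115.791724) comes from the sixth original row, which the preview does not show.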
# Explore the transformed data
print("Original vs Transformed Shape:")
print(f"Original: {systolic_tf.df.shape}")
print(f"Transformed: {transformed_tf.df.shape}")
print("\nNote: One row less due to target shifting")
print("\nTransformed Data Preview:")
transformed_tf.df.head()
Original vs Transformed Shape:
Original: (365, 3)
Transformed: (364, 3)

Note: One row less due to target shifting

Transformed Data Preview:
| | ds | heart_rate | systolic_shift_1 |
|---|---|---|---|
| 0 | 2023-01-01 | 67.202520 | 119.704010 |
| 1 | 2023-01-02 | 74.038603 | 116.535820 |
| 2 | 2023-01-03 | 78.983519 | 123.188708 |
| 3 | 2023-01-04 | 69.461099 | 115.938817 |
| 4 | 2023-01-05 | 66.350265 | 115.791724 |
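Once the target is shifted, each row pairs today's features with tomorrow's systolic value, so the frame can be split into features and target for a model. For time series the split is usually chronological rather than shuffled, so that no future rows leak into training. The sketch below uses a small stand-in frame with the same column names; the values are made up, and the 80/20 ratio and variable names are illustrative choices, not TemporalScope defaults.

```python
import numpy as np
import pandas as pd

# Stand-in for transformed_tf.df: same columns, 10 rows of made-up values
rng = np.random.default_rng(0)
frame = pd.DataFrame({
    "ds": pd.date_range("2023-01-01", periods=10, freq="D"),
    "heart_rate": 70 + rng.normal(0, 3, 10),
    "systolic_shift_1": 120 + rng.normal(0, 3, 10),
})

frame = frame.sort_values("ds").reset_index(drop=True)  # enforce time order
cutoff = int(len(frame) * 0.8)                          # first 80% for training

train, test = frame.iloc[:cutoff], frame.iloc[cutoff:]
X_train, y_train = train[["heart_rate"]], train["systolic_shift_1"]
X_test, y_test = test[["heart_rate"]], test["systolic_shift_1"]

print(len(train), len(test))
print(train["ds"].max() < test["ds"].min())  # training ends before testing begins
```

The time column is kept out of the feature matrix here and used only to order and split the rows.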
Implementation Notes
Current Limitations
Synthetic Data:
- This tutorial uses synthetic data for demonstration only
- Integration with standard academic healthcare datasets is planned for future releases
Single-Step Prediction:
- Current focus on one-step-ahead forecasting
- Multi-step sequence prediction planned (MultiStepTargetShifter)
- Deep learning support in development
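Until a multi-step shifter is available, the idea can be illustrated directly in pandas by creating one shifted column per horizon step. This is a hypothetical sketch only: `make_multi_step_targets` is not a TemporalScope function, and the planned MultiStepTargetShifter may use a different design and output layout.

```python
import pandas as pd


def make_multi_step_targets(df: pd.DataFrame, target_col: str, horizon: int) -> pd.DataFrame:
    """Add target_col shifted 1..horizon steps ahead and drop incomplete rows.

    Hypothetical helper for illustration (not part of TemporalScope).
    """
    out = df.copy()
    step_cols = []
    for step in range(1, horizon + 1):
        name = f"{target_col}_shift_{step}"
        out[name] = out[target_col].shift(-step)  # value `step` rows ahead
        step_cols.append(name)
    # The last `horizon` rows lack at least one future value
    out = out.dropna(subset=step_cols).drop(columns=target_col)
    return out.reset_index(drop=True)


df = pd.DataFrame({
    "ds": pd.date_range("2023-01-01", periods=6, freq="D"),
    "systolic": [120.0, 121.0, 119.0, 122.0, 118.0, 123.0],
})
result = make_multi_step_targets(df, "systolic", horizon=2)
print(result)
```

With six input rows and a horizon of two, four rows remain, each carrying the next two systolic readings as separate target columns.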
Best Practices
Data Preparation:
- Ensure clean, preprocessed data
- Proper datetime formatting
- Handle missing values before using TemporalScope
Backend Selection:
- Choose based on data size and compute resources
- pandas: Small to medium datasets
- polars/modin: Larger datasets
XAI Workflows:
- TimeFrame ensures data quality for MASV
- SingleStepTargetShifter preserves temporal structure
- Ready for temporal feature importance analysis
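As a concrete instance of the data-preparation advice above, missing readings can be located and filled in pandas before the frame is handed to TimeFrame. The sketch below uses linear interpolation on a tiny made-up frame; that strategy is an assumption for illustration, and the right choice depends on the data.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "ds": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"]),
    "systolic": [120.0, np.nan, 124.0, 123.0],
    "heart_rate": [70.0, 72.0, np.nan, 71.0],
})

print(raw.isna().sum())  # count missing values per column

clean = raw.sort_values("ds").reset_index(drop=True)
value_cols = ["systolic", "heart_rate"]
# Fill interior gaps linearly between neighbouring observations
clean[value_cols] = clean[value_cols].interpolate(method="linear", limit_area="inside")
clean = clean.dropna(subset=value_cols)  # drop rows that could not be filled

print(clean)
```

Restricting interpolation to interior gaps avoids extrapolating past the first or last observation; any rows still incomplete are dropped instead.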
Info
This tutorial was auto-generated from the TemporalScope repository.
If you would like to suggest enhancements or report issues, please submit a Pull Request following the contribution guidelines.
Source notebook: synthetic_health_monitoring_analysis.ipynb
Disclaimer & Copyright
THIS SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT, OR OTHERWISE, ARISING FROM, OUT OF, OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
THIS SOFTWARE IS INTENDED FOR ACADEMIC AND INFORMATIONAL PURPOSES ONLY. IT SHOULD NOT BE USED IN PRODUCTION ENVIRONMENTS OR FOR CRITICAL DECISION-MAKING WITHOUT PROPER VALIDATION. ANY USE OF THIS SOFTWARE IS AT THE USER'S OWN RISK.
© 2024 Philip Ndikum