Independence Testing

Understanding Independence

Linear regression assumes that residuals are independent of one another. This assumption is most often violated by:

Time-series data
Spatially correlated observations
Clustered data (e.g., students within schools)

The Durbin-Watson Test

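The check_independence() function uses the Durbin-Watson test to detect first-order autocorrelation in the residuals of a fitted linear model. Because the test compares each residual with the one before it, the rows must be in their natural order (for example, chronological) for the result to be meaningful.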
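
For reference, the textbook definition of the Durbin-Watson statistic for ordered residuals e_1, ..., e_n is given below. This is the standard formula, not something taken from the lrassume source, so treat it as background rather than a description of the library's internals.

```latex
d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2},
\qquad
d \approx 2\,(1 - \hat{\rho}_1)
```

Here \hat{\rho}_1 is the lag-1 sample autocorrelation of the residuals. The approximation shows why uncorrelated residuals give a value near 2, while strongly positive or negative correlation pushes the statistic toward 0 or 4.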

Interpreting the Statistic

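The statistic ranges from 0 to 4, and a value near 2 indicates no first-order autocorrelation. As a rule of thumb:

1.5 to 2.5: No significant autocorrelation ✓
< 1.5: Positive autocorrelation (successive residuals are similar)
> 2.5: Negative autocorrelation (successive residuals alternate in sign)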
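
To make these ranges concrete, here is a minimal pure-Python sketch that computes the statistic from a list of residuals and applies the rule-of-thumb bounds listed above. The helper names durbin_watson and interpret_dw are hypothetical and are not part of lrassume; the residual values are made up for illustration.

```python
def durbin_watson(residuals):
    """Return the Durbin-Watson statistic for an ordered sequence of residuals."""
    residuals = [float(e) for e in residuals]
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    denom = sum(e * e for e in residuals)
    if denom == 0:
        raise ValueError("residuals are all zero; the statistic is undefined")
    # Sum of squared differences between each residual and its predecessor.
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    return num / denom


def interpret_dw(dw, lower=1.5, upper=2.5):
    """Classify a DW value using the rule-of-thumb bounds (assumed defaults)."""
    if dw < lower:
        return "positive autocorrelation"
    if dw > upper:
        return "negative autocorrelation"
    return "no significant autocorrelation"


# Slowly drifting residuals: neighbours are similar.
trending = [-3, -2, -1, 0, 1, 2, 3]
# Sign-flipping residuals: neighbours alternate.
alternating = [1, -1, 1, -1, 1, -1]

for name, res in [("trending", trending), ("alternating", alternating)]:
    dw = durbin_watson(res)
    print(f"{name}: DW = {dw:.3f} -> {interpret_dw(dw)}")
# trending: DW = 0.214 -> positive autocorrelation
# alternating: DW = 3.333 -> negative autocorrelation
```

The drifting series lands well below 1.5 and the alternating series well above 2.5, matching the two failure modes described in the list.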

Example: Time Series Data

import pandas as pd
from lrassume import check_independence

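# Monthly sales data (illustrative values). Sales is deliberately not an exact
# multiple of advertising: a perfect fit leaves residuals of ~0, which makes
# the Durbin-Watson statistic meaningless.
df = pd.DataFrame({
    "advertising": [10, 15, 12, 18, 20, 25, 22, 28, 30, 35],
    "month": range(1, 11),
    "sales": [104, 146, 125, 177, 205, 243, 226, 275, 306, 344]
})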

result = check_independence(df, target="sales")

print(f"DW Statistic: {result['dw_statistic']:.3f}")
print(f"Independent: {result['is_independent']}")
print(result['message'])

What to Do if Independence Fails

If the test detects autocorrelation, choose a remedy that matches the structure of the data:

Time-series models: Consider ARIMA, VAR, or other time-series methods
Add lagged variables: Include previous values of the target or predictors as features
Cluster-robust standard errors: Adjust standard errors when observations are grouped into clusters
Mixed-effects models: Account for hierarchical structure (e.g., students within schools)
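
As an illustration of the lagged-variable remedy, the sketch below adds the previous month's sales as a predictor using pandas. The data values and the column name sales_lag1 are made up for this example.

```python
import pandas as pd

# Illustrative monthly data (values invented for this sketch).
df = pd.DataFrame({
    "advertising": [10, 15, 12, 18, 20, 25, 22, 28, 30, 35],
    "month": range(1, 11),
    "sales": [104, 146, 125, 177, 205, 243, 226, 275, 306, 344],
})

# Lags are only meaningful when rows are in chronological order.
df = df.sort_values("month").reset_index(drop=True)

# Previous month's sales; the first row has no predecessor and becomes NaN.
df["sales_lag1"] = df["sales"].shift(1)

# Drop the incomplete first row before refitting.
df_lagged = df.dropna(subset=["sales_lag1"]).reset_index(drop=True)
print(df_lagged.head())

# df_lagged could then be re-checked, e.g.
# check_independence(df_lagged, target="sales")
```

Each lag costs one observation, so with short series it is usually worth starting with a single lag and re-running the independence check before adding more.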

Key Points

The function automatically fits a linear model using all numeric features (in the example above, both advertising and month)
It handles the intercept term internally, so no constant column needs to be added
Only the target column needs to be specified
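
The behaviour described in these points can be approximated by hand. The sketch below is an assumption about what such a check involves, not lrassume's actual implementation: manual_dw is a hypothetical helper that selects the numeric columns, adds an intercept, fits ordinary least squares with NumPy, and computes the Durbin-Watson statistic from the residuals. The data values are invented for illustration.

```python
import numpy as np
import pandas as pd


def manual_dw(df, target):
    """Fit OLS on all numeric non-target columns and return the DW statistic."""
    features = df.select_dtypes(include="number").drop(columns=[target])
    y = df[target].to_numpy(dtype=float)
    # Prepend a column of ones so the model includes an intercept.
    X = np.column_stack([np.ones(len(df)), features.to_numpy(dtype=float)])
    # Ordinary least squares fit and residuals.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    # Squared successive differences over squared residuals.
    return float(np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2))


df = pd.DataFrame({
    "advertising": [10, 15, 12, 18, 20, 25, 22, 28, 30, 35],
    "month": range(1, 11),
    "sales": [104, 146, 125, 177, 205, 243, 226, 275, 306, 344],
})

dw = manual_dw(df, target="sales")
print(f"DW statistic: {dw:.3f}")
```

A hand computation like this can serve as a sanity check on the library's output, though small differences are possible if the library preprocesses the data differently.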