Homoscedasticity Testing
import pandas as pd
import numpy as np
from lrassume import check_homoscedasticity

np.random.seed(42)
X = pd.DataFrame({
    'x1': np.linspace(1, 100, 100),
    'x2': np.random.randn(100)
})
y = pd.Series(2 * X['x1'] + 3 * X['x2'] + np.random.randn(100))

# Run Breusch-Pagan test
test_results, summary = check_homoscedasticity(
    X, y,
    method="breusch_pagan",
    alpha=0.05
)
print(f"Conclusion: {summary['overall_conclusion']}")
print("\nTest Results:")
print(test_results)
Understanding Homoscedasticity
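Homoscedasticity means the residuals have constant variance across all values of the predictors. Its violation, heteroscedasticity, means the residual spread changes with the predictors or the fitted values.
Why It Matters
When residuals are heteroscedastic, ordinary least squares (OLS) runs into three problems:
- Biased standard errors: confidence intervals and p-values become unreliable
- Inefficient estimates: coefficients remain unbiased but no longer have the smallest possible variance
- Hypothesis testing issues: t-tests and F-tests may be invalid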
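A small simulation can make the first two points concrete. The sketch below is an illustration under assumed settings (synthetic data, error standard deviation proportional to x squared, 2,000 replications) and is not part of lrassume. It compares the standard error that OLS reports with the actual spread of the slope estimate across repeated samples.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_sims = 100, 2000
x = np.linspace(1, 10, n)
X = np.column_stack([np.ones(n), x])   # design matrix with intercept
XtX_inv = np.linalg.inv(X.T @ X)

slopes = np.empty(n_sims)
naive_se = np.empty(n_sims)
for i in range(n_sims):
    # Heteroscedastic errors: the standard deviation grows with x
    eps = rng.normal(scale=0.1 * x**2)
    y = 1.0 + 2.0 * x + eps
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (n - 2)       # classical residual variance estimate
    slopes[i] = beta[1]
    naive_se[i] = np.sqrt(s2 * XtX_inv[1, 1])

print(f"Mean slope estimate:       {slopes.mean():.3f}")
print(f"Empirical SD of the slope: {slopes.std(ddof=1):.4f}")
print(f"Mean reported OLS SE:      {naive_se.mean():.4f}")
```

In this setup the reported standard error should come out noticeably smaller than the empirical spread, while the average slope stays close to the true value of 2. That is the pattern the list above describes: the coefficients remain unbiased, but the uncertainty around them is misstated.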
Available Tests
| Test | Best For |
|---|---|
| Breusch-Pagan | General heteroscedasticity detection |
| White | Complex forms of heteroscedasticity (nonlinear terms and predictor interactions) |
| Goldfeld-Quandt | Variance that increases or decreases monotonically |
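To show what a test of this kind computes, here is a minimal NumPy sketch of the studentized (Koenker) form of the Breusch-Pagan statistic: regress the squared OLS residuals on the predictors and take n times the R-squared of that auxiliary regression. The function name `breusch_pagan_lm` and the synthetic data are assumptions for illustration, and whether lrassume uses this exact variant internally is not stated in this document.

```python
import numpy as np

def breusch_pagan_lm(X, y):
    """Studentized Breusch-Pagan LM statistic: n * R^2 of the auxiliary regression."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])            # add an intercept column
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid_sq = (y - Xc @ beta) ** 2                  # squared OLS residuals
    gamma, *_ = np.linalg.lstsq(Xc, resid_sq, rcond=None)
    aux_fitted = Xc @ gamma
    ss_res = np.sum((resid_sq - aux_fitted) ** 2)
    ss_tot = np.sum((resid_sq - resid_sq.mean()) ** 2)
    return n * (1.0 - ss_res / ss_tot)

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 500)
y_homo = 2 * x + rng.normal(scale=1.0, size=x.size)  # constant error variance
y_hetero = 2 * x + rng.normal(scale=x)               # error SD grows with x

stat_homo = breusch_pagan_lm(x, y_homo)
stat_hetero = breusch_pagan_lm(x, y_hetero)

# With one predictor the statistic is compared against a chi-square
# distribution with 1 degree of freedom; its 5% critical value is 3.841.
CRITICAL_5PCT_DF1 = 3.841
print(f"constant variance: LM = {stat_homo:.2f}")
print(f"growing variance:  LM = {stat_hetero:.2f}")
print("reject homoscedasticity (growing variance):", stat_hetero > CRITICAL_5PCT_DF1)
```

A large statistic means the squared residuals are predictable from the predictors, which is what heteroscedasticity looks like to this test.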
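Example: Single Test
The Breusch-Pagan script at the top of this page is the single-test example: it passes method="breusch_pagan" and reads the verdict from summary['overall_conclusion'].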
Example: All Tests
# Run all three tests
test_results, summary = check_homoscedasticity(X, y, method="all")
print(f"Overall: {summary['overall_conclusion']}")
print(f"Tests agreeing: {summary['tests_in_agreement']}")
print("\nDetailed Results:")
print(test_results)
Expected Output:
Overall: homoscedastic
Tests agreeing: 3

Detailed Results:
              test  statistic  p_value     conclusion  significant
0    breusch_pagan      1.234   0.5391  homoscedastic        False
1            white      2.156   0.3407  homoscedastic        False
2  goldfeld_quandt      0.987   0.4123  homoscedastic        False
Using Pre-fitted Models
from sklearn.linear_model import LinearRegression

# Fit your model
model = LinearRegression().fit(X, y)

# Test with fitted model
test_results, summary = check_homoscedasticity(
    X, y,
    fitted_model=model,
    method="breusch_pagan"
)
Custom Significance Level
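# alpha=0.01 (99% confidence): stricter, needs stronger evidence to flag heteroscedasticity
test_results, summary = check_homoscedasticity(
    X, y,
    alpha=0.01
)
# alpha=0.10 (90% confidence): more lenient, flags heteroscedasticity on weaker evidence
test_results, summary = check_homoscedasticity(
    X, y,
    alpha=0.10
)
Solutions for Heteroscedasticity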
- Transform the target variable
# Log transformation (log1p computes log(1 + y); requires y > -1)
y_log = np.log1p(y)
# Square root transformation (requires y >= 0)
y_sqrt = np.sqrt(y)
- Weighted Least Squares (WLS)
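import statsmodels.api as sm
X_const = sm.add_constant(X)  # statsmodels does not add an intercept by default
residuals = sm.OLS(y, X_const).fit().resid
# Weight by inverse estimated variance (log squared residuals regressed on X)
log_var = sm.OLS(np.log(residuals**2), X_const).fit().fittedvalues
weights = 1 / np.exp(log_var)
model_wls = sm.WLS(y, X_const, weights=weights).fit()
- Robust Standard Errors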
import statsmodels.api as sm
# Add an intercept column; sm.OLS does not include one by default
model = sm.OLS(y, sm.add_constant(X)).fit()
robust_results = model.get_robustcov_results(cov_type='HC3')
- Use a different model
  - Generalized Linear Models (GLM)
  - Quantile regression
  - Tree-based models
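To make the robust-standard-error option less of a black box, the following NumPy-only sketch computes an HC3 covariance matrix by hand and compares it with the classical one. The data are synthetic and the variable names are illustrative assumptions; in practice the statsmodels call shown above does this for you.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = np.linspace(1, 10, n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1 * x**2)   # error SD grows with x

X = np.column_stack([np.ones(n), x])               # design matrix with intercept
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Classical covariance: assumes one common error variance for all rows
s2 = resid @ resid / (n - X.shape[1])
cov_classical = s2 * XtX_inv

# HC3 covariance: a sandwich estimator in which each squared residual is
# scaled by 1 / (1 - h_ii)^2, where h_ii is the leverage of observation i
leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
omega = resid**2 / (1.0 - leverage) ** 2
cov_hc3 = XtX_inv @ (X.T * omega) @ X @ XtX_inv

se_classical = np.sqrt(np.diag(cov_classical))
se_hc3 = np.sqrt(np.diag(cov_hc3))
print("classical SE (intercept, slope):", se_classical)
print("HC3 SE       (intercept, slope):", se_hc3)
```

Robust standard errors leave the coefficient estimates unchanged and only correct the uncertainty attached to them, so they are a reasonable first step when the point estimates are acceptable but the inference is in doubt.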
Visualizing Heteroscedasticity
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit(X, y)
fitted = model.predict(X)
residuals = y - fitted

plt.scatter(fitted, residuals)
plt.axhline(y=0, color='r', linestyle='--')
plt.xlabel('Fitted Values')
plt.ylabel('Residuals')
plt.title('Residual Plot')
plt.show()
Good pattern: random scatter around zero with roughly constant spread
Bad pattern: funnel shape (spread widens or narrows as fitted values increase)
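A funnel can also be summarized with a number. As a rough companion to the plot, the sketch below compares the residual spread in the top third of fitted values with the spread in the bottom third. The helper names `spread_ratio` and `fit_and_ratio` are hypothetical, the data are synthetic, and the ratio is an informal diagnostic rather than a formal test.

```python
import numpy as np

def spread_ratio(fitted, residuals):
    """Residual SD in the top third of fitted values over the bottom third."""
    order = np.argsort(fitted)
    third = len(fitted) // 3
    low = residuals[order[:third]]
    high = residuals[order[-third:]]
    return high.std(ddof=1) / low.std(ddof=1)

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 600)
X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept

def fit_and_ratio(y):
    """Fit OLS with NumPy and return the spread ratio of its residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return spread_ratio(fitted, y - fitted)

ratio_flat = fit_and_ratio(2 * x + rng.normal(scale=1.0, size=x.size))
ratio_funnel = fit_and_ratio(2 * x + rng.normal(scale=0.5 * x))

print(f"constant spread: ratio = {ratio_flat:.2f}")
print(f"funnel shape:    ratio = {ratio_funnel:.2f}")
```

A ratio near 1 matches the good pattern; a ratio well above or below 1 matches the funnel.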