English: The figure shows the change in p-values computed from a t-test as the sample size increases, and how early stopping can allow for p-hacking.
Data are drawn from two identical normal distributions with mean 0 and standard deviation 10. For each sample size <math>n</math>, ranging from 5 to <math>10^4</math>, a t-test is performed on the first <math>n</math> samples from each distribution, and the resulting p-value is plotted. The red dashed line indicates the commonly used significance level of 0.05.
If the data collection or analysis were to stop at a point where the p-value happened to fall below the significance level, a spurious statistically significant difference could be reported.
Illustration based on
Wagenmakers, Eric-Jan. "A practical solution to the pervasive problems of p values." Psychonomic Bulletin & Review 14.5 (2007): 779-804.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# Set random seed for reproducibility
np.random.seed(42)

# Function to perform t-test and return p-value
def perform_t_test(sample1, sample2):
    _, p_value = stats.ttest_ind(sample1, sample2)
    return p_value

# Initialize parameters
max_samples = 10**4
start_samples = 5
p_values = []
sample_sizes = range(start_samples, max_samples + 1)

# Generate data and perform t-tests
population1 = stats.norm(loc=0, scale=10)
population2 = stats.norm(loc=0, scale=10)
samples1 = population1.rvs(max_samples)
samples2 = population2.rvs(max_samples)
for n in sample_sizes:
    # Test only the first n observations from each sample
    p_value = perform_t_test(samples1[:n], samples2[:n])
    p_values.append(p_value)

# Create the plot
plt.figure(figsize=(12, 6))
plt.semilogx(sample_sizes, p_values, 'b-')
plt.axhline(y=0.05, color='r', linestyle='--', label='p = 0.05')
plt.xlabel('Sample Size (log scale)')
plt.ylabel('p-value')
plt.title('Variability of p-value as Sample Size Increases')
plt.grid(True, which="both", ls="-", alpha=0.2)
plt.legend()
plt.ylim(0, 1)
plt.tight_layout()
plt.savefig('p-hacking.svg')
plt.show()
```
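
The early-stopping risk described above can also be estimated numerically. The following is a minimal sketch, not part of the original figure script: it simulates many experiments in which both groups come from the same normal distribution, checks the p-value after every few observations, and compares how often the threshold is crossed at any look with how often it is crossed at the final sample size alone. The function name `optional_stopping_rates` and its parameters (`n_experiments`, `max_n`, `step`) are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def optional_stopping_rates(n_experiments=2000, max_n=200, start_n=5,
                            step=5, alpha=0.05, seed=0):
    """Estimate false positive rates with and without repeated looks.

    Hypothetical helper for illustration. Returns a tuple
    (rate with peeking, rate at the fixed final sample size).
    """
    rng = np.random.default_rng(seed)
    # Both groups share the same distribution (mean 0, SD 10), so any
    # "significant" difference is a false positive.
    a = rng.normal(loc=0, scale=10, size=(n_experiments, max_n))
    b = rng.normal(loc=0, scale=10, size=(n_experiments, max_n))

    # Test once at the full sample size, one t-test per row (experiment).
    final_significant = stats.ttest_ind(a, b, axis=1).pvalue < alpha

    # Test repeatedly on growing prefixes and record whether p ever
    # dropped below alpha; the final look is included as well.
    ever_significant = final_significant.copy()
    for n in range(start_n, max_n, step):
        p = stats.ttest_ind(a[:, :n], b[:, :n], axis=1).pvalue
        ever_significant |= p < alpha

    return ever_significant.mean(), final_significant.mean()


peeking_rate, fixed_rate = optional_stopping_rates()
print(f"False positive rate with repeated looks: {peeking_rate:.3f}")
print(f"False positive rate at fixed sample size: {fixed_rate:.3f}")
```

Under these assumptions, the rate at the fixed sample size should stay close to the nominal 0.05, while the rate with repeated looks should be noticeably higher, since each additional look is another chance for the p-value to dip below the threshold.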