Results

For each subreddit-specific dataset of processed data, we computed the Wilcoxon rank-sum statistic (also referred to as the Mann-Whitney-Wilcoxon rank-sum statistic) to compare the median number of references to substance abuse per reddit-post in the ‘pre-COVID’ and ‘post-COVID’ datasets. For this test we use the ranksums function from SciPy's stats package [VGO+20].

According to the SciPy documentation:

We can test the hypothesis that two independent unequal-sized samples are drawn from the same distribution with computing the Wilcoxon rank-sum statistic.

And according to this iSixSigma article:

The Mann-Whitney test compares the medians from two populations and works when the Y variable is continuous, discrete-ordinal or discrete-count, and the X variable is discrete with two attributes.

This test may not fit our use case exactly, and we cannot say with confidence where it falls short. However, we currently have no better guidance on choosing a more suitable test for detecting a statistically significant difference in medians between these two datasets.

Using the Wilcoxon rank-sum test, we can set up our hypotheses for this test as follows:

\(H_0:\) median number of references to substance abuse per reddit-post is the same in subreddit-specific ‘pre-COVID’ and ‘post-COVID’ datasets.

\(H_a:\) median number of references to substance abuse per reddit-post is not the same in subreddit-specific ‘pre-COVID’ and ‘post-COVID’ datasets.

	subreddit_topic	test_statistic	p_value
0	bipolarreddit	-0.692095	4.888776e-01
1	EDAnonymous	1.922717	5.451560e-02
2	socialanxiety	0.990075	3.221376e-01
3	alcoholism	-1.407757	1.592032e-01
4	lonely	-2.041229	4.122808e-02
5	healthanxiety	0.184029	8.539905e-01
6	ptsd	-0.551019	5.816208e-01
7	suicidewatch	1.643746	1.002287e-01
8	addiction	0.481856	6.299085e-01
9	bpd	-0.454428	6.495209e-01
10	autism	1.358547	1.742903e-01
11	schizophrenia	0.246796	8.050664e-01
12	adhd	2.955800	3.118593e-03
13	depression	1.693505	9.035944e-02
14	anxiety	6.492918	8.418971e-11
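A table of this shape, with one row per subreddit, could be produced along the following lines. This is only a minimal sketch: the helper `compare_periods`, the subreddit names, and the Poisson-distributed counts are hypothetical stand-ins for the real per-post reference counts, and the sign of the statistic depends on the order in which the two samples are passed to `ranksums`, which the text above does not specify.

```python
import numpy as np
from scipy.stats import ranksums


def compare_periods(pre_counts, post_counts):
    """Two-sided Wilcoxon rank-sum test; returns (statistic, p_value).

    With this argument order, a positive statistic means the 'pre'
    sample tends to hold larger values than the 'post' sample.
    """
    result = ranksums(pre_counts, post_counts)
    return float(result.statistic), float(result.pvalue)


# Synthetic, illustrative data only (NOT the study's data):
# per-post counts of substance-abuse references for two made-up subreddits.
rng = np.random.default_rng(0)
synthetic = {
    "example_subreddit_a": (rng.poisson(1.0, size=400), rng.poisson(1.0, size=250)),
    "example_subreddit_b": (rng.poisson(1.0, size=400), rng.poisson(2.0, size=250)),
}

rows = []
for name, (pre, post) in synthetic.items():
    statistic, p_value = compare_periods(pre, post)
    rows.append(
        {"subreddit_topic": name, "test_statistic": statistic, "p_value": p_value}
    )

for row in rows:
    print(f"{row['subreddit_topic']}\t{row['test_statistic']:.6f}\t{row['p_value']:.6e}")
```

The two samples may have different sizes, which matches the "unequal-sized samples" wording quoted from the SciPy documentation.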

Given we set a standard threshold \(\alpha = 0.05\) for statistical significance, the conclusions we may be able to draw from these results are:

r/adhd, r/anxiety and r/lonely saw a statistically significant difference in the median number of references to substance abuse per reddit-post when comparing the ‘pre-COVID’ and ‘post-COVID’ datasets (p < 0.05 in each case).
the remaining subreddits tested showed no statistically significant difference in the median number of references to substance abuse per reddit-post when comparing the ‘pre-COVID’ and ‘post-COVID’ datasets.
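The decision rule at \(\alpha = 0.05\) can be checked mechanically against the p-values in the results table. The sketch below copies those p-values verbatim; the variable names are our own, and no correction for multiple comparisons is applied, since none is described in the text.

```python
ALPHA = 0.05  # significance threshold stated in the text

# p-values copied from the results table (one entry per subreddit)
p_values = {
    "bipolarreddit": 4.888776e-01,
    "EDAnonymous": 5.451560e-02,
    "socialanxiety": 3.221376e-01,
    "alcoholism": 1.592032e-01,
    "lonely": 4.122808e-02,
    "healthanxiety": 8.539905e-01,
    "ptsd": 5.816208e-01,
    "suicidewatch": 1.002287e-01,
    "addiction": 6.299085e-01,
    "bpd": 6.495209e-01,
    "autism": 1.742903e-01,
    "schizophrenia": 8.050664e-01,
    "adhd": 3.118593e-03,
    "depression": 9.035944e-02,
    "anxiety": 8.418971e-11,
}

# Reject H_0 when the p-value falls below the threshold
significant = sorted(name for name, p in p_values.items() if p < ALPHA)
not_significant = sorted(name for name, p in p_values.items() if p >= ALPHA)

print("Reject H_0:", significant)
print("Fail to reject H_0:", not_significant)
```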
