Statistics 101
for Cyberpsychologists
What Statistics Actually Is
One sentence: Statistics is the art of drawing defensible conclusions from imperfect data.
You cannot directly measure 'moral disengagement' or 'cyber-aggression'. You can only measure proxies — questionnaire responses, behavioural traces, self-reports. Statistics is the formal machinery for reasoning about the gap between your proxies and the real thing.
Think of it like a compiler. You write code in a high-level language (your theory). The compiler translates it into something the machine can execute (your data). Statistics is that compiler. It tells you whether your high-level ideas actually run — and how well.
Describing Your Data
Before you can analyse anything, you need to describe what you have. These are the fundamental descriptive statistics — the vocabulary of any results section.
1.1 The Three Averages
| Mean | The arithmetic average. Sum all values, divide by n. Sensitive to outliers — one extreme score drags it. |
| Median | The middle value when sorted. Robust to outliers. If your data is skewed, the median tells a more honest story. |
| Mode | The most frequent value. Mainly useful for categorical data. ('Most participants reported Strongly Agree.') |
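All three averages ship with Python's standard library; a quick sketch on made-up scores (note how one outlier drags the mean but leaves the median alone):

```python
import statistics

scores = [1, 2, 2, 3, 3, 3, 4, 50]  # hypothetical data; 50 is an outlier

print(statistics.mean(scores))    # 8.5  (dragged up by the outlier)
print(statistics.median(scores))  # 3.0  (robust to the outlier)
print(statistics.mode(scores))    # 3    (most frequent value)
```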
1.2 Spread: Standard Deviation and Variance
Knowing the average is half the story. You also need to know how spread out the scores are.
| Variance (s²) | The average squared distance from the mean. Squaring makes all distances positive and penalises large deviations more. |
| Std Dev (s) | The square root of variance. Back in the original units. 'On average, scores deviate from the mean by this much.' |
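In code (stdlib, hypothetical scores): `pvariance`/`pstdev` divide by n, matching the 'average squared distance' definition above, while the sample versions `variance`/`stdev` divide by n − 1, which is what SPSS and R report by default:

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

print(statistics.pvariance(scores))  # 4.0  (population: divide by n)
print(statistics.pstdev(scores))     # 2.0  (back in the original units)
print(statistics.variance(scores))   # ~4.57 (sample: divide by n - 1)
```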
1.3 The Normal Distribution
Many psychological variables, when sampled sufficiently, produce a bell-shaped curve: lots of people in the middle, fewer at the extremes. This is the normal (Gaussian) distribution. Most statistical tests assume or approximate normality.
| 68-95-99.7 Rule | 68% of scores fall within 1 SD of the mean. 95% within 2 SD. 99.7% within 3 SD. |
| Z-score | How many SDs above or below the mean a score is. Converts any scale to a common currency. |
If moral disengagement has mean 2.8 and SD 0.6, a score of 4.0 is 2 SDs above the mean — roughly the top 2.5% of respondents. That's your z-score interpretation.
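The conversion is one line of arithmetic; a minimal sketch using the numbers from the example above:

```python
def z_score(x, mean, sd):
    """How many SDs a raw score sits above (+) or below (-) the mean."""
    return (x - mean) / sd

# Moral disengagement: mean 2.8, SD 0.6, observed score 4.0
z = z_score(4.0, mean=2.8, sd=0.6)
print(round(z, 2))  # 2.0 -> roughly the top 2.5% of respondents
```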
The Core Engine: Inference
2.1 The Sampling Problem
Your 167 participants are one possible sample from a vast population. If you ran the study again with 167 different people, you'd get slightly different numbers. The question is: is the pattern you found real, or just noise from this particular sample?
| H₀ (Null) | There is no effect in the real population. What you observed happened by chance. |
| H₁ (Alternative) | There IS a real effect. Your sample is giving you a genuine signal. |
| p-value | The probability of getting your results (or more extreme) if H₀ is true. p < .05 is the conventional threshold. It is not the probability that H₀ is true. |
2.2 Effect Size: The Number That Actually Matters
p-values are contaminated by sample size. With 10,000 participants, a trivially tiny effect becomes statistically significant. Effect size is the clean measure of 'how big is this, really?'
| Cohen's d | For comparing two means. d = 0.2 small, 0.5 medium, 0.8 large. Standardised distance between two group averages. |
| r (Pearson) | −1 to +1. How closely two variables move together. r² = proportion of shared variance. |
| f² (Cohen's) | Effect size for regression. f² = R² / (1 − R²). Small .02, medium .15, large .35. |
| η² / ω² | Effect size for ANOVA. How much variance does group membership explain? |
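As an illustration, Cohen's d for two independent groups falls out of the pooled SD; the group scores below are made-up numbers, not dissertation data:

```python
import statistics

def cohens_d(group1, group2):
    """Standardised mean difference using the pooled sample SD."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = (((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

high_trust = [3.1, 3.4, 2.9, 3.6, 3.2]  # hypothetical group scores
low_trust  = [2.8, 2.6, 3.0, 2.7, 2.9]
d = cohens_d(high_trust, low_trust)     # well above the 0.8 'large' cutoff
```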
2.3 Confidence Intervals
A 95% CI tells you: if you repeated this study many times, about 95% of the intervals constructed this way would contain the true population parameter. In practice: it's a range of plausible values for your effect.
Always more informative than a p-value alone. A CI of [0.32, 0.48] for a correlation tells you something very different from [0.01, 0.79], even if both are 'significant'.
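A rough normal-approximation sketch for a 95% CI around a mean, on hypothetical scores (for small samples you would swap 1.96 for the appropriate t critical value):

```python
import math
import statistics

def ci95(sample):
    """Approximate 95% CI for the mean: mean +/- 1.96 * standard error."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - 1.96 * se, m + 1.96 * se

lo, hi = ci95([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical scores, mean = 5
```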
The t-test
The t-test is the simplest inferential workhorse. It answers one question: are these two means significantly different from each other?
3.1 The Mechanics
The t-statistic is a signal-to-noise ratio. The numerator is the signal (how different are the means?). The denominator is the noise (how much sampling variability is there?).
The Welch-Satterthwaite equation adjusts degrees of freedom for unequal group variances:

df ≈ (s₁²/n₁ + s₂²/n₂)² / [ (s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1) ]
3.2 Three Flavours
| One-sample t | Is my sample mean different from a known value? 'Is the average cyber-aggression score different from the population average of 2.5?' |
| Independent t | Are two separate groups different? 'Do high AI-trust scorers have higher cyber-aggression than low AI-trust scorers?' |
| Paired t | Are the same people different at two time points? 'Did cyber-aggression scores change after an intervention?' Each person is their own control. |
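A from-scratch sketch of Welch's independent t-test (the unequal-variances version most software defaults to), returning t and the adjusted degrees of freedom; the two groups are hypothetical:

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t-statistic and adjusted df for two independent samples."""
    n1, n2 = len(a), len(b)
    v1 = statistics.variance(a) / n1   # variance of the mean of a
    v2 = statistics.variance(b) / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

t, df = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])  # hypothetical groups
```

With equal sizes and equal variances, the adjusted df lands back on the classic n₁ + n₂ − 2.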
Correlation and Regression
4.1 Correlation: Do These Variables Move Together?
Pearson's r measures the linear relationship between two continuous variables. It ranges from −1 (perfect negative) to +1 (perfect positive). Zero means no linear relationship.
4.2 Simple Linear Regression: Prediction
Regression fits a line through your data to predict one variable from another.
| a (intercept) | Predicted Y when X = 0. Often not directly interpretable if X = 0 is outside your data range. |
| b (slope) | How much Y changes for each one-unit increase in X. |
| R² | Proportion of variance in Y explained by your predictor(s). R² = .40 → 40% of variation in cyber-aggression explained. |
4.3 Multiple Regression: Several Predictors
Multiple regression estimates the unique contribution of each predictor, controlling for all the others.
| β (beta) | Standardised coefficient. Same units as a z-score. Directly comparable across predictors on different scales. |
| b (unstandardised) | Raw slope in original units. 'Each 1-point increase in moral disengagement predicts a 0.79-point increase in cyber-aggression.' |
4.4 Hierarchical Regression: Your Method
Hierarchical regression enters predictors in theoretically-motivated blocks. The key statistic is ΔR² — how much additional variance the new block explains above and beyond what came before.
Block 1: moral disengagement → R² = .38
Block 2: + AI trust, AI use → R² = .39 (ΔR² = .01, p = .43)
Interpretation: AI factors add essentially nothing. Moral disengagement is doing the work. ΔR² ≈ 0 is the finding — a precise partitioning of variance.
Cronbach's Alpha
Before you can trust your regression results, you need to know your scales are reliable. Cronbach's alpha (α) measures internal consistency: how correlated are the items within a scale?
The underlying logic: if five items all measure 'moral disengagement', they should all correlate positively. A person high on disengagement should score high on all five. If items don't correlate, they're measuring different things.
α = (k / (k − 1)) × (1 − Σσᵢ² / σₜ²)

Where k = number of items, Σσᵢ² = sum of item variances, σₜ² = variance of the total score.
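The formula translates directly to code; `items` below holds one list of participant scores per scale item (a hypothetical layout):

```python
import statistics

def cronbach_alpha(items):
    """items: list of k columns, one list of participant scores per item."""
    k = len(items)
    sum_item_var = sum(statistics.variance(col) for col in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-person scale total
    return (k / (k - 1)) * (1 - sum_item_var / statistics.variance(totals))

# Two perfectly correlated items -> alpha = 1.0
alpha = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]])
```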
Factor Analysis
Factor analysis is the conceptual parent of PLS-SEM. Imagine you gave 20 questions to 200 people. Questions 1–5 all correlate with each other, 6–12 all correlate with each other, 13–20 all correlate — but the three groups don't correlate across groups. Factor analysis formalises this: there are probably three underlying latent variables driving each cluster.
| Latent variable | A construct you can't measure directly (moral disengagement, trust, aggression). Inferred from observable indicators. |
| Manifest variable | The actual measured item (question responses, behaviours). Your observables. |
| Factor loading | How strongly an item loads onto a factor. Like a correlation. Loadings > .70 are considered good; > .40 meaningful. |
| EFA | Exploratory Factor Analysis. 'I don't know how many factors there are — show me the structure.' Used in scale development. Let the data reveal which items cluster. |
| CFA | Confirmatory Factor Analysis. 'I have a theoretical model — does the data fit it?' Tests established scales. More rigorous. Used inside SEM. |
PLS-SEM
7.1 What It Is
Structural Equation Modelling (SEM) is regression on steroids. It lets you test a whole theoretical model simultaneously: multiple predictors, multiple outcomes, indirect effects (mediation), and latent variables — all in one analysis. A SEM has two layers:
| Measurement model | The CFA part. Confirms that your latent variables are well-measured by their items. Cronbach's α, factor loadings, convergent/discriminant validity. |
| Structural model | The regression part. Tests the paths between latent constructs. 'Moral disengagement → cyber-aggression.' Estimates β and R². |
7.2 CB-SEM vs PLS-SEM
| CB-SEM | e.g. AMOS, lavaan. Fits the model by minimising the difference between the observed covariance matrix and the model-implied one. Gold standard for confirmatory theory testing. Needs larger samples, multivariate normality, established scales. |
| PLS-SEM | e.g. SmartPLS. Maximises explained variance in outcomes (prediction-oriented). Better for exploratory/complex models, smaller samples, non-normal data. Your dissertation choice. |
7.3 Key PLS-SEM Statistics
| AVE | Average Variance Extracted. Convergent validity. AVE > .50 means the construct explains more than half of item variance. Good. |
| CR | Composite Reliability. Like α but weighted by loadings. CR > .70 acceptable, > .80 good. |
| HTMT | Heterotrait-Monotrait Ratio. Discriminant validity. HTMT < .85 = constructs are sufficiently distinct. |
| R² (endogenous) | Variance in outcome variable explained by structural model predictors. Same interpretation as regression R². |
| β (path coeff.) | Standardised structural paths between constructs. Direction and magnitude of relationships. |
| f² | Effect size for each path. How much does removing a predictor reduce R²? <.02 negligible, .02–.15 small, .15–.35 medium, >.35 large. |
| Q² | Stone-Geisser test. Q² > 0 = model has better-than-chance predictive power. Estimated via blindfolding. |
Bootstrapping is the standard approach for significance testing in PLS-SEM — it resamples your data thousands of times to estimate standard errors empirically, without assuming normality.
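The idea in miniature: a percentile bootstrap CI for a mean using only the standard library (PLS-SEM software bootstraps path coefficients by the same resample-and-recompute logic):

```python
import random
import statistics

def bootstrap_ci95(sample, stat=statistics.mean, reps=5000, seed=42):
    """Percentile bootstrap: resample with replacement, recompute the
    statistic, then take the 2.5th and 97.5th percentiles."""
    rng = random.Random(seed)
    boots = sorted(stat(rng.choices(sample, k=len(sample)))
                   for _ in range(reps))
    return boots[int(0.025 * reps)], boots[int(0.975 * reps)]

lo, hi = bootstrap_ci95([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical scores
```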
K-Means Clustering
Clustering is fundamentally different from everything above. Regression and SEM are about relationships between variables. Clustering is about finding natural groups of participants.
The algorithm: Place K random centroids in the data space. Assign each participant to their nearest centroid. Recalculate each centroid as the mean of assigned points. Repeat until assignments stop changing.
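Those four steps fit in a dozen lines; a toy one-dimensional sketch (real analyses run on multivariate standardised scores, but the loop is identical):

```python
import random
import statistics

def kmeans_1d(points, k, iters=100, seed=0):
    """Minimal 1-D k-means: assign to nearest centroid, recompute, repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)               # random initial centroids
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                            # assignment step
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        new = [statistics.mean(c) if c else centroids[i]
               for i, c in enumerate(clusters)]     # update step
        if new == centroids:                        # assignments stopped changing
            break
        centroids = new
    return centroids, clusters

cents, _ = kmeans_1d([1.0, 1.1, 0.9, 10.0, 10.1, 9.9], k=2)
```

On this toy data the centroids converge to roughly 1.0 and 10.0, whatever the initial placement.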
Choosing K
| Elbow method | Plot within-cluster variance for K = 1, 2, 3, 4… The 'elbow' is where adding more clusters stops dramatically reducing variance. |
| Silhouette score | For each point: how similar is it to its own cluster vs. the nearest other cluster? Ranges −1 to +1. Average width near 1 = well-separated clusters; near 0 = ambiguous; negative = points likely misassigned. |
Limitations
K-means assumes clusters are spherical and similarly sized. Results are sensitive to initial random centroid placement — run multiple times. Outliers distort centroids. Cluster labels are your interpretation — the algorithm gives you groups; theory gives them names.
Multiple Imputation
Your 167 responses had some missing values. How you handle them matters enormously.
| Listwise deletion | Drop any participant with any missing value. Simple but wasteful and potentially biased — what if people who skipped items are systematically different? |
| Mean imputation | Replace missing values with the variable mean. Artificially reduces variance and distorts correlations. Generally bad practice. |
| Multiple Imputation | Create m complete datasets by estimating plausible missing values from other variables, run your analysis on each, then pool the results using Rubin's Rules. What you used. Best practice. |
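Rubin's Rules themselves are short; a sketch that pools one coefficient across m imputed datasets (the inputs are illustrative, not dissertation numbers):

```python
import statistics

def rubin_pool(estimates, variances):
    """Pool a parameter across m imputations (Rubin's Rules).

    estimates: the coefficient from each imputed dataset
    variances: its squared standard error in each dataset
    """
    m = len(estimates)
    q_bar = statistics.mean(estimates)      # pooled point estimate
    w = statistics.mean(variances)          # within-imputation variance
    b = statistics.variance(estimates)      # between-imputation variance
    total = w + (1 + 1 / m) * b             # total variance
    return q_bar, total ** 0.5              # pooled estimate and SE

est, se = rubin_pool([0.50, 0.60, 0.70], [0.01, 0.01, 0.01])
```

The between-imputation term is what honest missing-data handling buys you: the pooled SE reflects uncertainty about the missing values themselves, not just sampling noise.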
The Missing Data Taxonomy
| MCAR | Missing Completely At Random. No relationship to any variable. Rare. Listwise deletion is valid here. |
| MAR | Missing At Random. Missingness depends on other observed variables but not the missing value itself. MI handles this correctly. Your dissertation scenario. |
| MNAR | Missing Not At Random. Missingness depends on the unobserved value (e.g. high aggressors don't disclose aggression). Most dangerous. MI doesn't fully solve this — requires sensitivity analysis. |
Quick Reference
What technique for what question?
| Question | Technique | Key Output |
|---|---|---|
| Are two group means different? | Independent t-test | t, p, Cohen's d |
| Do these items measure the same thing? | Cronbach's alpha / CR | α, CR (>.70 good) |
| How are two variables related? | Pearson correlation | r, r², p-value |
| Predict Y from one X | Simple regression | β, R², F, p |
| Predict Y from multiple X, control covariates | Multiple regression | β, R², ΔR² |
| Test blocks of predictors sequentially | Hierarchical regression | ΔR², F-change, p |
| Test whole theoretical model with latent variables | PLS-SEM | β, R², AVE, HTMT, f² |
| Confirm scale factor structure | CFA | Factor loadings, fit indices |
| Find natural groups of participants | K-means clustering | Cluster membership, silhouette |
| Handle missing data properly | Multiple imputation | Pooled estimates, Rubin's Rules |
Statistics is not mathematics. It's an argument structure. Every number — an R², a p-value, a path coefficient — is a move in a rhetorical game you're playing with your reader: convincing them that your conclusions follow defensibly from your data.
The mastery isn't in computing the numbers. It's in knowing which argument you're trying to make, choosing the technique that makes that argument honestly, and being clear about what you can and cannot conclude.
You already have that understanding. This document just gives you the vocabulary to express it precisely.