UCA · Research Methods

Using the Dataset

A synthetic dataset and R analysis script to accompany every worked example in the course and textbook.

📋

Overview

The dataset is synthetic: it was generated to match the descriptive statistics, correlation structure, and missing data pattern of the original UCA dissertation (Todd McCaffrey, ATU Letterkenny, 2025). It is not the real survey data, but it is built to behave the same way statistically, so the results you reproduce here should closely mirror the dissertation findings.

Participants: 167
Variables: 7 (4 scale + 3 demo)
Missing: ~8% per scale variable
MD–CA r: .61 (key correlation)
Key Finding
Moral disengagement is the dominant predictor of cyber-aggression (β = .61, p < .001, ΔR² = .34). AI trust and AI use add negligible incremental variance (ΔR² = .01, p = .43). This null finding is theoretically meaningful.

🗂

Variables

| Column name | Type | Range | Description |
|---|---|---|---|
| participant_id | integer | 1 – 167 | Unique participant identifier |
| age | integer | 17 – 65 | Participant age in years. Mean ≈ 24.3, SD ≈ 6.1 |
| gender | integer | 1, 2, 3 | 1 = Female (58%), 2 = Male (39%), 3 = Non-binary / Other (3%) |
| moral_disengagement | numeric | 1.0 – 5.0 | Composite scale score. Bandura's Mechanisms of Moral Disengagement Scale (8 items). Higher = greater disengagement. M = 2.81, SD = 0.60 |
| cyber_aggression | numeric | 1.0 – 5.0 | Composite scale score. Cyber-Aggression Typology Questionnaire. Higher = greater aggression. M = 3.08, SD = 0.65 |
| ai_trust | numeric | 1.0 – 5.0 | Composite scale score. AI Trust Scale (5 items). Higher = greater trust in AI systems. M = 3.37, SD = 0.68 |
| ai_use | numeric | 1.0 – 5.0 | Composite scale score. AI Use Frequency Scale (4 items). Higher = more frequent AI use. M = 3.19, SD = 0.71 |
Missing Data
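Approximately 8–10% of values are missing per scale variable. Little's MCAR test on the original data was non-significant (χ² = 14.3, p = .21), so there is no evidence against the data being missing completely at random; the weaker Missing At Random (MAR) assumption that multiple imputation relies on is therefore plausible. Use multiple imputation, not listwise deletion.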
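To see why listwise deletion is costly here, a back-of-envelope calculation helps. The sketch below assumes, purely for illustration, that missingness is independent across the four scale variables; the real pattern may overlap more or less than that. The names n_total, n_scales and p_missing are made up for the example.

```r
# Back-of-envelope: how many complete cases survive listwise deletion?
# ASSUMPTION (illustrative only): each of the 4 scale variables is missing
# independently at the same rate. Real missingness patterns may differ.
n_total   <- 167            # participants in the dataset
n_scales  <- 4              # scale variables with missing values
p_missing <- c(0.08, 0.10)  # lower and upper missingness rates quoted above

p_complete <- (1 - p_missing)^n_scales   # P(a row has no missing scale score)
n_complete <- n_total * p_complete       # expected number of complete cases

data.frame(p_missing,
           p_complete = round(p_complete, 3),
           n_complete = round(n_complete))
```

Under that assumption roughly 28–34% of participants would be dropped from any analysis that uses all four scales, which is the loss multiple imputation is meant to avoid.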

Download

Two files. Put them in the same folder on your machine.

📄
uca_synthetic.csv
167 rows × 7 columns
📊
uca_analysis.R
Fully annotated R script

Place both in the same working directory in R, or set your working directory to wherever you saved the CSV.


⚙️

R Setup

If you haven't used R before, install R and RStudio first. Then install the required packages — you only need to do this once.

R
# Install required packages (run once)
install.packages(c(
  "mice",      # multiple imputation
  "psych",     # descriptive stats, alpha
  "ggplot2",   # visualisation
  "dplyr"      # data wrangling
))
R
# Load packages at the start of every session
library(mice)
library(psych)
library(ggplot2)
library(dplyr)
Set Your Working Directory
In RStudio: Session → Set Working Directory → To Source File Location. This tells R where to find the CSV file. Alternatively use setwd("/path/to/your/folder").
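If R later complains that it cannot open the file, the working directory is the usual cause. A minimal check, assuming the CSV keeps its downloaded name (data_file is just an illustrative variable name):

```r
# Confirm R can see the data file before trying to read it (sketch).
data_file <- "uca_synthetic.csv"   # assumes the file was not renamed

getwd()   # the folder R is currently looking in

if (file.exists(data_file)) {
  message("Found ", data_file)
} else {
  message("Cannot find ", data_file, " in ", getwd(),
          ". Move the file here or change the working directory.")
}
```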

🔍

Load & Explore

First steps: load the data and get a feel for it.

R
# Load the dataset
df <- read.csv("uca_synthetic.csv")

# First look
head(df)          # first 6 rows
str(df)           # structure: variable types
dim(df)           # rows × columns: should be 167 × 7
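
str(df) will report gender as an integer, but the codes 1, 2 and 3 are category labels, not quantities. A small sketch of attaching the documented labels follows; it uses a toy vector (gender_codes, hypothetical values) rather than the course data so that it runs on its own.

```r
# Toy stand-in for df$gender (hypothetical values, NOT the course data)
gender_codes <- c(1, 2, 2, 1, 3, NA, 1)

# Attach the labels documented in the Variables table
gender_f <- factor(gender_codes,
                   levels = c(1, 2, 3),
                   labels = c("Female", "Male", "Non-binary / Other"))

table(gender_f, useNA = "ifany")   # counts per category, including missing
levels(gender_f)                   # the first level is the reference group in lm()
```

With the real data the same call would take df$gender as its first argument.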

Descriptive Statistics

R
# Full descriptives: mean, SD, median, skew, kurtosis
describe(df[, c("moral_disengagement", "cyber_aggression",
               "ai_trust", "ai_use")])

# Check missing values
colSums(is.na(df))

# Visualise missing data pattern
md.pattern(df[, c("moral_disengagement", "cyber_aggression",
                  "ai_trust", "ai_use")])

You should see means close to: MD = 2.81, CA = 3.08, AIT = 3.37, AIU = 3.19. The missing data pattern will show which combinations of variables have missing values simultaneously.


🔧

Multiple Imputation

Before running any analysis, handle the missing data properly using multiple imputation with the mice package. This creates m = 5 completed datasets; each analysis is then run on all five and the results are pooled using Rubin's Rules.

  1. Create imputed datasets. The mice() function runs the imputation. method = "pmm" is predictive mean matching: it replaces missing values with plausible observed values from similar participants.
  2. Run your analysis on each dataset. Use with(imp, ...) to apply a model to all 5 imputed datasets automatically.
  3. Pool the results. pool() combines the 5 sets of estimates using Rubin's Rules, producing a single set of coefficients with correctly inflated standard errors.
R
# Step 1: Create 5 imputed datasets
set.seed(42)   # for reproducibility

imp <- mice(
  df[, c("moral_disengagement", "cyber_aggression",
         "ai_trust", "ai_use", "age", "gender")],
  m          = 5,        # number of imputed datasets
  method     = "pmm",    # predictive mean matching
  printFlag  = FALSE    # suppress iteration output
)

# Check imputation looks reasonable
densityplot(imp)   # imputed values should overlap observed

# Get one complete dataset for exploration
df_complete <- complete(imp, 1)
What to Check
The densityplot() shows the distribution of imputed values (magenta) overlaid on observed values (blue) for each variable. They should look similar — if imputed values are in a completely different range, something is wrong with the imputation model.
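
To make the idea behind predictive mean matching concrete, here is a deliberately simplified base-R sketch for a single variable. It is not the algorithm mice runs internally (mice's pmm is more elaborate, for instance in how it handles uncertainty in the regression parameters); the simulated data, the variable names and the choice of 5 donors are all assumptions made for illustration.

```r
# Simplified predictive mean matching for ONE variable (teaching sketch).
# ASSUMPTIONS: toy simulated data; a plain lm() fit with no parameter
# uncertainty; 5 donors per missing case.
set.seed(1)
n <- 100
x <- rnorm(n, mean = 3, sd = 0.6)        # fully observed predictor
y <- 1 + 0.6 * x + rnorm(n, sd = 0.5)    # variable to be imputed
y[sample(n, 10)] <- NA                   # knock out 10 values

obs  <- !is.na(y)
fit  <- lm(y ~ x, subset = obs)                     # model fitted on observed cases
pred <- predict(fit, newdata = data.frame(x = x))   # predicted mean for every case

y_imp <- y
for (i in which(!obs)) {
  # the 5 observed cases whose predicted value is closest to case i's
  dist   <- abs(pred[obs] - pred[i])
  donors <- which(obs)[order(dist)[1:5]]
  # borrow the OBSERVED y of one randomly chosen donor
  y_imp[i] <- y[sample(donors, 1)]
}

sum(is.na(y_imp))               # 0: every gap is filled
all(y_imp[!obs] %in% y[obs])    # TRUE: imputations are real observed values
```

Because every imputed value is borrowed from an actual participant, imputations cannot fall outside the observed range, which is why the imputed and observed densities are expected to overlap.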

🧱

Hierarchical Regression

The key analysis. Three blocks entered sequentially. The critical question is whether Block 3 (AI factors) adds significant incremental variance above Block 2.

R
# Run each block on all 5 imputed datasets
# gender is a nominal code (1/2/3), so it enters as a factor, not a number

# Block 1: Demographics only
fit1 <- with(imp,
  lm(cyber_aggression ~ age + factor(gender)))

# Block 2: + Moral Disengagement
fit2 <- with(imp,
  lm(cyber_aggression ~ age + factor(gender) + moral_disengagement))

# Block 3: + AI Trust and AI Use
fit3 <- with(imp,
  lm(cyber_aggression ~ age + factor(gender) + moral_disengagement +
       ai_trust + ai_use))

# Pool results using Rubin's Rules
summary(pool(fit1))
summary(pool(fit2))
summary(pool(fit3))
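
What pool() does for each coefficient can be reproduced by hand. The sketch below applies Rubin's Rules to one coefficient; the five estimates and standard errors are made-up illustrative numbers, not output from the course dataset.

```r
# Rubin's Rules by hand for a single coefficient (teaching sketch).
# The five estimates and standard errors are MADE-UP illustrative numbers.
est <- c(0.60, 0.62, 0.59, 0.63, 0.61)        # coefficient in each imputed dataset
se  <- c(0.065, 0.066, 0.064, 0.067, 0.065)   # its standard error in each

m     <- length(est)
q_bar <- mean(est)                 # pooled estimate: average of the estimates
u_bar <- mean(se^2)                # within-imputation variance
b     <- var(est)                  # between-imputation variance
t_var <- u_bar + (1 + 1 / m) * b   # total variance

pooled_se <- sqrt(t_var)
c(estimate = q_bar, se = pooled_se)
```

The pooled standard error exceeds the square root of the average within-imputation variance because the between-imputation term b is added on top; that extra term is how uncertainty about the missing values reaches the final result.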
R
# R² and ΔR² using one complete dataset (a shortcut: these values come
# from imputation 1 only and are not pooled across the five datasets;
# mice::pool.r.squared() can give a pooled R² from a with() fit)
# gender is a nominal code (1/2/3), so it enters as a factor
m1 <- lm(cyber_aggression ~ age + factor(gender),
        data = df_complete)
m2 <- lm(cyber_aggression ~ age + factor(gender) + moral_disengagement,
        data = df_complete)
m3 <- lm(cyber_aggression ~ age + factor(gender) + moral_disengagement +
           ai_trust + ai_use, data = df_complete)

cat("Block 1 R²:",  round(summary(m1)$r.squared, 3), "\n")
cat("Block 2 R²:",  round(summary(m2)$r.squared, 3),
    " ΔR²:", round(summary(m2)$r.squared -
                    summary(m1)$r.squared, 3), "\n")
cat("Block 3 R²:",  round(summary(m3)$r.squared, 3),
    " ΔR²:", round(summary(m3)$r.squared -
                    summary(m2)$r.squared, 3), "\n")

# F-change test: does Block 3 add significantly?
anova(m2, m3)
Expected Output
Block 1 R² ≈ .04 · Block 2 R² ≈ .38 (ΔR² ≈ .34, p < .001) · Block 3 R² ≈ .39 (ΔR² ≈ .01, p ≈ .43). The F-change for Block 3 will be non-significant — that is the finding.
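
The F-change statistic that anova() reports can also be worked out from the R² values alone. The helper below, f_change(), is a hypothetical function written for this illustration; it is applied to the rounded Block 2 figures quoted above, assuming n = 167 and counting gender as a single predictor.

```r
# F-change statistic from R-squared values (teaching sketch).
# delta_r2 : increase in R² when the block is added
# r2_full  : R² of the larger model
# n        : sample size
# p_full   : number of predictors in the larger model
# k        : number of predictors added in this block
f_change <- function(delta_r2, r2_full, n, p_full, k) {
  df1 <- k
  df2 <- n - p_full - 1
  f   <- (delta_r2 / df1) / ((1 - r2_full) / df2)
  p   <- pf(f, df1, df2, lower.tail = FALSE)
  c(F = f, df1 = df1, df2 = df2, p = p)
}

# Block 2, using the rounded values quoted above.
# ASSUMPTION: gender counts as one predictor; use p_full = 4 if it is
# dummy-coded into two indicators.
block2 <- f_change(delta_r2 = 0.34, r2_full = 0.38, n = 167, p_full = 3, k = 1)
round(block2, 3)
```

Because the inputs are rounded to two decimals, the result only approximates what anova() prints. For Block 3 in particular, any ΔR² between .005 and .015 rounds to .01, so a hand calculation from the rounded value cannot recover the exact p-value.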

🔬

t-test

Compare two groups on cyber-aggression. Here: male vs. female participants.

R
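# Independent samples t-test: gender × cyber-aggression
# Uses one imputed dataset (df_complete), so results are not pooled
# Filter to male (2) and female (1) only
df_gender <- df_complete[df_complete$gender %in% c(1, 2), ]

# t.test() defaults to Welch's test (equal variances not assumed)
t_result <- t.test(cyber_aggression ~ gender, data = df_gender)
print(t_result)

# Cohen's d (effect size)
# Note: t.test() compares group 1 - group 2 (female - male), while d below
# is male - female, so the two statistics carry opposite signs
male   <- df_gender$cyber_aggression[df_gender$gender == 2]
female <- df_gender$cyber_aggression[df_gender$gender == 1]

cohens_d <- (mean(male) - mean(female)) /
  sqrt((sd(male)^2 + sd(female)^2) / 2)

cat("Cohen's d =", round(cohens_d, 3), "\n")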
Remember
Always report t, df, p, and Cohen's d. The p-value tells you how surprising a difference this large would be if the groups did not truly differ. Cohen's d tells you how big the difference actually is. You need both.
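
The Cohen's d formula in the code above averages the two group variances with equal weight, which matches the classic pooled standard deviation only when the groups are the same size. Since the gender groups here are unequal (58% vs 39%), a common alternative weights each variance by its degrees of freedom. The function name cohens_d_pooled and the toy scores below are assumptions for illustration, not part of the course script.

```r
# Cohen's d with an n-weighted pooled SD (teaching sketch).
cohens_d_pooled <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  nx <- length(x)
  ny <- length(y)
  # each group's variance is weighted by its degrees of freedom
  sd_pooled <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / sd_pooled
}

# Toy check with made-up scores (NOT the course data)
group_a <- c(3.2, 3.5, 2.9, 3.8, 3.1, 3.4)
group_b <- c(2.8, 3.0, 2.6, 3.1)
cohens_d_pooled(group_a, group_b)
```

With the course data the call would be cohens_d_pooled(male, female). The two versions usually differ only slightly, but it is worth stating in a write-up which one was used.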

📈

Correlation

R
# Correlation matrix with p-values
corr.test(df_complete[, c("moral_disengagement",
                          "cyber_aggression",
                          "ai_trust",
                          "ai_use")])

# Scatter plot: MD vs CA with regression line
ggplot(df_complete,
       aes(x = moral_disengagement, y = cyber_aggression)) +
  geom_point(alpha = 0.5, colour = "#0dcfb2") +
  geom_smooth(method = "lm", colour = "#f59e0b") +
  labs(x = "Moral Disengagement",
       y = "Cyber-Aggression",
       title = "Moral Disengagement → Cyber-Aggression") +
  theme_minimal()

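Expected: r(MD, CA) ≈ .61, a large positive correlation. r(AIT, CA) ≈ .15 and r(AIU, CA) ≈ .19 are small. At n = 167 the two-tailed .05 cutoff is roughly |r| = .15, so as zero-order correlations these sit at or just above the threshold, but neither AI variable remains a significant predictor once moral disengagement is in the regression model.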
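
How large a correlation has to be before it reaches significance depends only on n, and this can be checked in base R. The two helper functions below (r_critical and r_p_value) are hypothetical names written for this sketch. They assume the full n = 167 and apply no correction for multiple tests, whereas corr.test() by default also reports Holm-adjusted p-values.

```r
# Smallest |r| that reaches two-tailed p < alpha at a given n (teaching sketch).
r_critical <- function(n, alpha = 0.05) {
  df     <- n - 2
  t_crit <- qt(1 - alpha / 2, df)
  t_crit / sqrt(t_crit^2 + df)
}

# Unadjusted two-tailed p-value for an observed zero-order correlation
r_p_value <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)
}

round(r_critical(167), 3)                        # about 0.152
round(r_p_value(c(0.15, 0.19, 0.61), 167), 3)    # p for each expected r
```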


📜

Full Script

The complete annotated R script runs all of the above in sequence. Download it and open it in RStudio — it's designed to be read top-to-bottom alongside the textbook.

📊
uca_analysis.R
Descriptives · Imputation · Hierarchical Regression · t-test · Correlation · ggplot
Tip
In RStudio, use Ctrl+Enter (Cmd+Enter on Mac) to run one line at a time. Work through the script section by section alongside the relevant module in the interactive course — the numbers will match.