Data Analysis & Quality

The TABS project applies a multi-stage data quality pipeline to every survey response. This page documents how responses flow from collection through validation to analysis, the quality checks applied at each stage, the edge cases we discovered and resolved, and the sensitivity analysis that demonstrates our findings are robust to inclusion criteria.

All statistics on this page are generated automatically by the daily analysis pipeline and updated every time the pipeline runs. The approved-participant counts shown here match the Prolific platform exactly.

Data Flow

Every survey response passes through this pipeline before appearing in any analysis:

  1. Qualtrics Export — Raw survey responses exported via API (3-header-row CSV format with question text and import IDs)
  2. Prolific Enrichment — Each response is cross-referenced with Prolific submission data (approval status, auth check scores) using the participant ID as the join key
  3. Deduplication — When a participant retakes the survey, only one response is kept; completed responses are preferred over incomplete retakes (see Edge Cases below)
  4. Disposition Waterfall — An 11-step quality classification assigns each response to exactly one disposition category
  5. Sample Definition — Five nested samples are computed, from most restrictive (Conservative Clean) to least (All V2)
  6. Statistical Analysis — Every metric is computed independently across all five samples

Disposition Waterfall (Steps 0–10)

Each response is evaluated through this 11-step waterfall (steps 0–10). The first matching step determines the disposition — a response is never counted in multiple categories.

| Step | Disposition | Criteria |
| --- | --- | --- |
| 0 | INCOMPLETE | Survey not finished (Qualtrics Finished != TRUE) |
| 1 | FLAG-AUTH-FAIL | Prolific authenticity check: LLM or Bots score = "Low" |
| 2 | FLAG-AUTH-MIXED | Prolific authenticity check: LLM or Bots score = "Mixed" |
| 3 | AUTO-EXCLUDE | 2+ IRI failures, OR speed flag (<5 min) + any IRI failure |
| 4 | FLAG-SPEED | Duration < 5 min but all 3 IRIs correct |
| 5 | FLAG-SINGLE-IRI | 1 IRI failure at normal speed (>= 5 min) |
| 6 | FLAG-SMEAL | Duration 5–9 min (below Smeal eDBA benchmark of 9 min) |
| 7 | FLAG-RECAPTCHA | reCAPTCHA score < 0.5 |
| 8 | FLAG-STRAIGHTLINING | Qualtrics Q_StraightliningCount > 0 (same answer for entire block) |
| 9 | FLAG-PARTIAL-STRAIGHTLINING | Within-person SD < 0.5 in any question block (Meade & Craig 2012) |
| 10 | CLEAN | All checks passed: finished, all 3 IRIs, duration >= 9 min, reCAPTCHA >= 0.5, no straightlining, auth checks pass |
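The first-match-wins logic above can be sketched as a single function. This is an illustrative sketch, not the pipeline's actual code; the field names (`finished`, `iri_passes`, `auth_llm`, and so on) are assumptions about the enriched row format.

```python
def classify_disposition(r: dict) -> str:
    """Assign exactly one disposition; the first matching step wins.

    Field names are illustrative assumptions, not the pipeline's real schema.
    """
    if not r["finished"]:
        return "INCOMPLETE"                        # step 0
    if r["auth_llm"] == "Low" or r["auth_bots"] == "Low":
        return "FLAG-AUTH-FAIL"                    # step 1
    if r["auth_llm"] == "Mixed" or r["auth_bots"] == "Mixed":
        return "FLAG-AUTH-MIXED"                   # step 2
    iri_failures = 3 - r["iri_passes"]
    speeder = r["duration_s"] < 300                # < 5 minutes
    if iri_failures >= 2 or (speeder and iri_failures >= 1):
        return "AUTO-EXCLUDE"                      # step 3
    if speeder:
        return "FLAG-SPEED"                        # step 4 (all IRIs correct)
    if iri_failures == 1:
        return "FLAG-SINGLE-IRI"                   # step 5
    if r["duration_s"] < 540:
        return "FLAG-SMEAL"                        # step 6 (5-9 minutes)
    if r["recaptcha"] < 0.5:
        return "FLAG-RECAPTCHA"                    # step 7
    if r["straightlining_count"] > 0:
        return "FLAG-STRAIGHTLINING"               # step 8
    if r["partial_straightlining"]:
        return "FLAG-PARTIAL-STRAIGHTLINING"       # step 9
    return "CLEAN"                                 # step 10
```

Because each branch returns immediately, a response can never land in two categories, which is what makes the disposition counts sum to the total.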

Instructed Response Items (IRIs)

Three attention check items are embedded within the survey, one per construct. Each instructs the respondent to select a specific answer. Exact string match is required — any other value (including “Don’t Know”) is scored as a failure.

| Construct | Column | Expected Answer |
| --- | --- | --- |
| Barriers (19 items) | Q10-28_Barriers_19 | “Major Barrier” |
| Readiness (18 items) | Q47-64_Readiness_18 | “Low Readiness/Capability” |
| Maturity (9 items) | Q65-73_Maturity_9 | “Level 2: Developing/Repeatable” |
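The exact-match scoring can be sketched as follows, using the column names and expected answers from the table above (the function name and row representation are hypothetical):

```python
# Expected answer per IRI column; an exact string match is required.
IRI_EXPECTED = {
    "Q10-28_Barriers_19": "Major Barrier",
    "Q47-64_Readiness_18": "Low Readiness/Capability",
    "Q65-73_Maturity_9": "Level 2: Developing/Repeatable",
}

def count_iri_passes(row: dict) -> int:
    """Count passed IRIs; any other value (including "Don't Know") fails."""
    return sum(1 for col, expected in IRI_EXPECTED.items()
               if row.get(col) == expected)
```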

Sample Definitions

Five nested sample definitions are used, from most to least restrictive. The Prolific Accepted count matches the Prolific platform’s “Approved” tab exactly. The clean samples apply additional quality filters on top of Prolific approval.

| Sample | Definition | N |
| --- | --- | --- |
| Conservative Clean | Prolific APPROVED + all quality checks (IRI, duration >= 540s, reCAPTCHA, straightlining, auth) | 75 |
| Flexible Clean | Prolific APPROVED + basic quality (all 3 IRIs + duration >= 480s) | 116 |
| Prolific Accepted | All deduplicated V2 rows with Prolific APPROVED status | 206 |
| All V2 Finished | Finished + duration >= 120s (extreme speeders excluded) | 331 |
| All V2 | All V2 responses including incomplete | 388 |

Constraints: Conservative Clean ⊆ Flexible Clean ⊆ Prolific Accepted ⊆ All V2, and All V2 Finished ⊆ All V2. Prolific Accepted and All V2 Finished overlap but neither is guaranteed to be a subset of the other (Prolific Accepted includes INCOMPLETE+APPROVED responses; All V2 Finished includes non-APPROVED responses).
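The five definitions and their nesting can be sketched as filters over the deduplicated rows. Field names are illustrative assumptions; the real pipeline's column names may differ.

```python
def define_samples(rows: list[dict]) -> dict[str, list[dict]]:
    """Five nested samples over deduplicated V2 rows (field names assumed)."""
    def approved(r):
        return r["prolific_status"] == "APPROVED"

    def basic_quality(r):   # Flexible Clean filters
        return r["finished"] and r["iri_passes"] == 3 and r["duration_s"] >= 480

    def full_quality(r):    # Conservative Clean adds the stricter checks
        return (basic_quality(r) and r["duration_s"] >= 540
                and r["recaptcha"] >= 0.5
                and not r["straightlining"] and r["auth_pass"])

    return {
        "all_v2": rows,
        "all_v2_finished": [r for r in rows
                            if r["finished"] and r["duration_s"] >= 120],
        "prolific_accepted": [r for r in rows if approved(r)],
        "flexible_clean": [r for r in rows if approved(r) and basic_quality(r)],
        "conservative_clean": [r for r in rows if approved(r) and full_quality(r)],
    }
```

Note how the subset constraints fall out of the predicates: `full_quality` implies `basic_quality`, and both clean samples also require approval, so Conservative Clean ⊆ Flexible Clean ⊆ Prolific Accepted by construction.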

Sensitivity Analysis

Every key statistic is computed across all five sample definitions. A finding is treated as robust to inclusion criteria only if it holds across both Conservative Clean (N=75) and Flexible Clean (N=116).

| Metric | Conservative Clean (N=75) | Flexible Clean (N=116) | Prolific Accepted (N=206) | All V2 Finished (N=331) | All V2 (N=388) |
| --- | --- | --- | --- | --- | --- |
| Barrier Grand Mean | 2.7995 | 2.7854 | 2.7627 | 2.7307 | 2.7371 |
| Barrier SD | 0.633 | 0.694 | 0.7069 | 0.7694 | 0.7737 |
| Readiness Grand Mean | 3.0674 | 3.1096 | 3.1598 | 3.258 | 3.258 |
| Readiness SD | 0.5864 | 0.6457 | 0.6674 | 0.7246 | 0.7235 |
| Maturity Grand Mean | 3.0252 | 3.0594 | 3.1555 | 3.2721 | 3.2721 |
| Maturity SD | 0.7242 | 0.7936 | 0.8098 | 0.8039 | 0.8039 |
| B-R Correlation | -0.5154 | -0.4392 | -0.3473 | -0.3213 | -0.321 |
| B-M Correlation | -0.2689 | -0.2891 | -0.2756 | -0.2852 | -0.2852 |
| R-M Correlation | 0.6141 | 0.6881 | 0.7 | 0.7414 | 0.7414 |
| Alpha Barriers | 0.8612 | 0.8709 | 0.8758 | 0.9014 | 0.903 |
| Alpha Readiness | 0.8772 | 0.9141 | 0.9204 | 0.934 | 0.934 |
| Alpha Maturity | 0.8436 | 0.8828 | 0.8907 | 0.8916 | 0.8916 |
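The alpha rows are Cronbach's alpha, recomputed independently for each sample. For reference, the standard formula is sketched below; whether the pipeline uses population or sample variances is an assumption here (population variance shown).

```python
from statistics import pvariance

def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """Cronbach's alpha: item_scores[i][j] is respondent i's score on item j.

    alpha = k/(k-1) * (1 - sum of item variances / variance of total score).
    Complete cases only; population variance is an assumption.
    """
    k = len(item_scores[0])
    item_vars = [pvariance([r[j] for r in item_scores]) for j in range(k)]
    total_var = pvariance([sum(r) for r in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Perfectly parallel items (every respondent answering identically across items) yield alpha = 1; as items decouple, the summed item variances approach the total-score variance and alpha falls.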

Edge Cases & Data Quality Decisions

During pipeline development, several edge cases were discovered and resolved. Each decision is documented here for transparency and reproducibility.

Retake Deduplication: Prefer Completed Response

Some participants completed the survey, received Prolific approval, then started a retake but did not finish it. The Qualtrics export contains both rows for the same Prolific PID. The Python analysis pipeline’s deduplication logic prefers the completed response (Finished=TRUE) over the incomplete retake, regardless of chronological order. This ensures the approved, completed response is used for analysis rather than being overwritten by an abandoned retake.

Note: The TypeScript disposition triage (used by the operations pipeline) still uses “latest row wins” dedup, which can keep an incomplete retake over a completed original. This is being addressed in issue #687 (TS → Python migration). The Python analysis pipeline already applies the correct logic.
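The completed-response-wins rule can be sketched as follows. This is a sketch of the intended behavior, not the pipeline's actual implementation, and the field names are assumptions.

```python
def deduplicate_retakes(rows: list[dict]) -> list[dict]:
    """Keep one row per Prolific PID, preferring a completed response.

    Unlike "latest row wins", a completed response is never overwritten
    by a later incomplete retake, regardless of chronological order.
    """
    best: dict[str, dict] = {}
    for row in rows:  # rows in export order
        pid = row["PROLIFIC_PID"]
        kept = best.get(pid)
        if kept is None or (row["finished"] and not kept["finished"]):
            best[pid] = row
    return list(best.values())
```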

Prolific Accepted Must Match Prolific UI

The “Prolific Accepted” sample count must match the Prolific platform’s “Approved” tab exactly. This is validated by cross-referencing the Prolific API submission statuses with the Qualtrics export. Any discrepancy indicates a pipeline bug, not a data issue.

The Prolific API is queried with limit=1000 per page to ensure all submissions are fetched. The enrichment step matches Prolific participant IDs to Qualtrics PROLIFIC_PID embedded data fields.
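The pagination loop can be sketched generically. Here `fetch_page` is a hypothetical wrapper around the Prolific submissions endpoint, injected so the loop itself is independent of HTTP details; the offset/limit paging style is an assumption about the API.

```python
def fetch_all_submissions(fetch_page, limit: int = 1000) -> list[dict]:
    """Page through an offset/limit API until a short (or empty) page.

    fetch_page(limit, offset) -> list of submission dicts; a hypothetical
    wrapper around the Prolific submissions endpoint.
    """
    results: list[dict] = []
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        results.extend(page)
        if len(page) < limit:  # last page reached
            break
        offset += limit
    return results
```

Stopping only when a page comes back short guarantees no submissions are silently dropped when a study has more than `limit` participants.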

IRI Pass Rate Denominator

IRI (attention check) pass rates are computed using finished responses only as the denominator, not all responses. Incomplete responses cannot have valid IRI answers, so including them would artificially deflate pass rates.
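As a minimal sketch of the denominator rule (field names assumed):

```python
def iri_pass_rate(rows: list[dict]) -> float:
    """Pass rate over finished responses only; incompletes are excluded
    from the denominator because they cannot have valid IRI answers."""
    finished = [r for r in rows if r["finished"]]
    passed = [r for r in finished if r["iri_passes"] == 3]
    return len(passed) / len(finished)
```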

Partial Straightlining Detection

Beyond Qualtrics’ built-in straightlining count, the pipeline computes within-person standard deviation per question block. If a respondent selected nearly identical answers for all items in a block (SD < 0.5), the response is flagged. The threshold follows Meade & Craig (2012), Psychological Methods, 17(3), 437-455.

The minimum response threshold for evaluation is ceil(block_count / 2) items answered, matching the TypeScript disposition pipeline exactly.

Qualtrics Export Format

Qualtrics CSV exports include 3 header rows: column names (row 0), question text (row 1), and import IDs (row 2). Data starts at row 3. The pipeline handles UTF-8 BOM markers (common in Qualtrics exports), embedded newlines in quoted feedback fields, and both label mode (“TRUE”/“FALSE”) and numeric mode (“1”/“0”) for the Finished column.
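A minimal parser for this layout, using only the standard library (function names are hypothetical; `csv.reader` handles embedded newlines in quoted fields natively):

```python
import csv
import io

def load_qualtrics_export(text: str) -> list[dict]:
    """Parse a Qualtrics 3-header-row CSV export into row dicts."""
    text = text.lstrip("\ufeff")  # strip UTF-8 BOM if present
    reader = csv.reader(io.StringIO(text))
    columns = next(reader)        # row 0: column names
    next(reader)                  # row 1: question text (discard)
    next(reader)                  # row 2: import IDs (discard)
    return [dict(zip(columns, row)) for row in reader]

def is_finished(value: str) -> bool:
    """Accept label mode ("TRUE"/"FALSE") and numeric mode ("1"/"0")."""
    return value.strip().upper() in {"TRUE", "1"}
```

When reading from disk, opening the file with `encoding="utf-8-sig"` handles the BOM equivalently.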

Don’t Know Responses (Readiness & Maturity)

The Readiness and Maturity constructs allow “Don’t Know” as a response option. These are treated as missing data (excluded from person-level means), not mapped to a numeric value. This prevents artificial deflation of construct scores. The Barriers construct does not include a Don’t Know option.

Reproducibility

All analysis code is open source and can be run independently against the public dataset. The sensitivity analysis shown above is generated automatically by the daily analysis pipeline and committed to the repository as JSON data.