Data Analysis & Quality
The TABS project applies a multi-stage data quality pipeline to every survey response. This page documents how responses flow from collection through validation to analysis, the quality checks applied at each stage, the edge cases we discovered and resolved, and the sensitivity analysis that demonstrates our findings are robust to inclusion criteria.
All statistics on this page are generated automatically by the daily analysis pipeline and refreshed on every run. The Prolific Accepted count matches the Prolific platform’s “Approved” tab exactly.
Data Flow
Every survey response passes through this pipeline before appearing in any analysis:
- Qualtrics Export — Raw survey responses exported via API (3-header-row CSV format with question text and import IDs)
- Prolific Enrichment — Each response is cross-referenced with Prolific submission data (approval status, auth check scores) using the participant ID as the join key
- Deduplication — When a participant retakes the survey, only one response is kept. Completed responses are preferred over incomplete retakes (see Edge Cases below)
- Disposition Waterfall — An 11-step quality classification assigns each response to exactly one disposition category
- Sample Definition — Five nested samples are computed, from most restrictive (Conservative Clean) to least restrictive (All V2)
- Statistical Analysis — Every metric is computed independently across all five samples
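The enrichment join in step 2 can be sketched as follows. The field names (`PROLIFIC_PID`, `participant_id`, `status`) mirror this page's descriptions, but the data structures and function name are illustrative, not the pipeline's actual code.

```python
def enrich(qualtrics_rows, prolific_submissions):
    """Attach Prolific submission fields to each Qualtrics row by participant ID."""
    by_pid = {s["participant_id"]: s for s in prolific_submissions}
    enriched = []
    for row in qualtrics_rows:
        sub = by_pid.get(row.get("PROLIFIC_PID"))  # join key: participant ID
        merged = dict(row)
        merged["prolific_status"] = sub["status"] if sub else None
        enriched.append(merged)
    return enriched

rows = enrich(
    [{"PROLIFIC_PID": "p1", "Finished": "TRUE"}],
    [{"participant_id": "p1", "status": "APPROVED"}],
)
```

Rows with no matching Prolific submission keep a `None` status rather than being dropped, so later stages can still classify them.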
Disposition Waterfall (Steps 0–10)
Each response is evaluated through this 11-step waterfall (steps 0–10). The first matching step determines the disposition — a response is never counted in multiple categories.
| Step | Disposition | Criteria |
|---|---|---|
| 0 | INCOMPLETE | Survey not finished (Qualtrics Finished != TRUE) |
| 1 | FLAG-AUTH-FAIL | Prolific authenticity check: LLM or Bots score = "Low" |
| 2 | FLAG-AUTH-MIXED | Prolific authenticity check: LLM or Bots score = "Mixed" |
| 3 | AUTO-EXCLUDE | 2+ IRI failures, OR speed flag (<5 min) + any IRI failure |
| 4 | FLAG-SPEED | Duration < 5 min but all 3 IRIs correct |
| 5 | FLAG-SINGLE-IRI | 1 IRI failure at normal speed (>= 5 min) |
| 6 | FLAG-SMEAL | Duration 5-9 min (below Smeal eDBA benchmark of 9 min) |
| 7 | FLAG-RECAPTCHA | reCAPTCHA score < 0.5 |
| 8 | FLAG-STRAIGHTLINING | Qualtrics Q_StraightliningCount > 0 (same answer for entire block) |
| 9 | FLAG-PARTIAL-STRAIGHTLINING | Within-person SD < 0.5 in any question block (Meade & Craig 2012) |
| 10 | CLEAN | All checks passed: finished, all 3 IRIs, duration >= 9 min, reCAPTCHA >= 0.5, no straightlining, auth checks pass |
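A first-match waterfall like this reduces naturally to a chain of early returns. The thresholds below are taken from the table; the record fields and function name are assumptions for illustration, not the pipeline's actual code.

```python
def classify(r):
    """Return the first matching disposition (steps 0-10); first match wins."""
    if r["finished"] is not True:
        return "INCOMPLETE"                       # step 0
    if "Low" in (r["auth_llm"], r["auth_bots"]):
        return "FLAG-AUTH-FAIL"                   # step 1
    if "Mixed" in (r["auth_llm"], r["auth_bots"]):
        return "FLAG-AUTH-MIXED"                  # step 2
    iri_fails = 3 - r["iri_correct"]
    speeder = r["duration_s"] < 300               # < 5 min
    if iri_fails >= 2 or (speeder and iri_fails >= 1):
        return "AUTO-EXCLUDE"                     # step 3
    if speeder:
        return "FLAG-SPEED"                       # step 4 (all 3 IRIs correct)
    if iri_fails == 1:
        return "FLAG-SINGLE-IRI"                  # step 5 (normal speed)
    if r["duration_s"] < 540:
        return "FLAG-SMEAL"                       # step 6 (below 9-min benchmark)
    if r["recaptcha"] < 0.5:
        return "FLAG-RECAPTCHA"                   # step 7
    if r["straightlining_count"] > 0:
        return "FLAG-STRAIGHTLINING"              # step 8
    if r["partial_straightlining"]:
        return "FLAG-PARTIAL-STRAIGHTLINING"      # step 9
    return "CLEAN"                                # step 10

clean = {"finished": True, "auth_llm": "High", "auth_bots": "High",
         "iri_correct": 3, "duration_s": 600, "recaptcha": 0.9,
         "straightlining_count": 0, "partial_straightlining": False}
```

Because every branch returns, a response lands in exactly one category, matching the "never counted twice" guarantee above.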
Instructed Response Items (IRIs)
Three attention check items are embedded within the survey, one per construct. Each instructs the respondent to select a specific answer. Exact string match is required — any other value (including “Don’t Know”) is scored as a failure.
| Construct | Column | Expected Answer |
|---|---|---|
| Barriers (19 items) | Q10-28_Barriers_19 | “Major Barrier” |
| Readiness (18 items) | Q47-64_Readiness_18 | “Low Readiness/Capability” |
| Maturity (9 items) | Q65-73_Maturity_9 | “Level 2: Developing/Repeatable” |
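The exact-match scoring follows directly from the table. Column names and expected answers below come from the table; the row structure is an assumption.

```python
IRI_EXPECTED = {
    "Q10-28_Barriers_19": "Major Barrier",
    "Q47-64_Readiness_18": "Low Readiness/Capability",
    "Q65-73_Maturity_9": "Level 2: Developing/Repeatable",
}

def iri_correct_count(row):
    """Count IRIs passed via exact string match (0-3); anything else fails."""
    return sum(row.get(col) == expected for col, expected in IRI_EXPECTED.items())

row = {
    "Q10-28_Barriers_19": "Major Barrier",
    "Q47-64_Readiness_18": "Low Readiness/Capability",
    "Q65-73_Maturity_9": "Level 2: Developing/Repeatable",
}
```

Note that a "Don't Know" selection on an IRI column scores as a failure, exactly like any other non-matching value.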
Sample Definitions
Five nested sample definitions are used, from most to least restrictive. The Prolific Accepted count matches the Prolific platform’s “Approved” tab exactly. The clean samples apply additional quality filters on top of Prolific approval.
| Sample | Definition | N |
|---|---|---|
| Conservative Clean | Prolific APPROVED + all quality checks (IRI, duration >= 540s, reCAPTCHA, straightlining, auth) | 75 |
| Flexible Clean | Prolific APPROVED + basic quality (all 3 IRIs + duration >= 480s) | 116 |
| Prolific Accepted | All deduplicated V2 rows with Prolific APPROVED status | 206 |
| All V2 Finished | Finished + duration >= 120s (extreme speeders excluded) | 331 |
| All V2 | All V2 responses including incomplete | 388 |
Constraints: Conservative Clean ⊆ Flexible Clean ⊆ Prolific Accepted ⊆ All V2, and All V2 Finished ⊆ All V2. Prolific Accepted and All V2 Finished overlap but neither is guaranteed to be a subset of the other (Prolific Accepted includes INCOMPLETE+APPROVED responses; All V2 Finished includes non-APPROVED responses).
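Representing each sample as a set of response IDs makes these constraints directly checkable. The sample keys below are illustrative; the real pipeline's naming may differ.

```python
def check_nesting(samples):
    """Return True iff the subset constraints above hold (samples are sets of IDs)."""
    return (
        samples["conservative_clean"] <= samples["flexible_clean"]
        <= samples["prolific_accepted"] <= samples["all_v2"]
        and samples["all_v2_finished"] <= samples["all_v2"]
    )

samples = {
    "conservative_clean": {1},
    "flexible_clean": {1, 2},
    "prolific_accepted": {1, 2, 3},
    "all_v2_finished": {2, 3, 4},   # overlaps Prolific Accepted, neither contains the other
    "all_v2": {1, 2, 3, 4},
}
```

An assertion like this in the pipeline turns a silent sample-definition bug into an immediate failure.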
Sensitivity Analysis
Every key statistic is computed across all five sample definitions. A finding is treated as robust to inclusion criteria only when it holds in both the Conservative Clean (N=75) and Flexible Clean (N=116) samples.
| Metric | Conservative Clean N=75 | Flexible Clean N=116 | Prolific Accepted N=206 | All V2 Finished N=331 | All V2 N=388 |
|---|---|---|---|---|---|
| Barrier Grand Mean | 2.7995 | 2.7854 | 2.7627 | 2.7307 | 2.7371 |
| Barrier SD | 0.633 | 0.694 | 0.7069 | 0.7694 | 0.7737 |
| Readiness Grand Mean | 3.0674 | 3.1096 | 3.1598 | 3.258 | 3.258 |
| Readiness SD | 0.5864 | 0.6457 | 0.6674 | 0.7246 | 0.7235 |
| Maturity Grand Mean | 3.0252 | 3.0594 | 3.1555 | 3.2721 | 3.2721 |
| Maturity SD | 0.7242 | 0.7936 | 0.8098 | 0.8039 | 0.8039 |
| B-R Correlation | -0.5154 | -0.4392 | -0.3473 | -0.3213 | -0.321 |
| B-M Correlation | -0.2689 | -0.2891 | -0.2756 | -0.2852 | -0.2852 |
| R-M Correlation | 0.6141 | 0.6881 | 0.7 | 0.7414 | 0.7414 |
| Alpha Barriers | 0.8612 | 0.8709 | 0.8758 | 0.9014 | 0.903 |
| Alpha Readiness | 0.8772 | 0.9141 | 0.9204 | 0.934 | 0.934 |
| Alpha Maturity | 0.8436 | 0.8828 | 0.8907 | 0.8916 | 0.8916 |
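Each row of the table comes from computing the same metric once per sample. A grand-mean sketch, with illustrative sample and score structures:

```python
from statistics import fmean

def metric_by_sample(samples, scores):
    """Grand mean of per-person scores (keyed by response ID), one value per sample."""
    return {
        name: round(fmean(scores[rid] for rid in ids), 4)
        for name, ids in samples.items()
    }

result = metric_by_sample(
    {"conservative_clean": {"a"}, "all_v2": {"a", "b"}},
    {"a": 2.0, "b": 4.0},
)
```

Running the identical metric function over every sample, rather than special-casing one, is what makes the table directly comparable column to column.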
Edge Cases & Data Quality Decisions
During pipeline development, several edge cases were discovered and resolved. Each decision is documented here for transparency and reproducibility.
Retake Deduplication: Prefer Completed Response
Some participants completed the survey, received Prolific approval, then started a retake but did not finish it. The Qualtrics export contains both rows for the same Prolific PID. The Python analysis pipeline’s deduplication logic prefers the completed response (Finished=TRUE) over the incomplete retake, regardless of chronological order. This ensures the approved, completed response is used for analysis rather than being overwritten by an abandoned retake.
Note: The TypeScript disposition triage (used by the operations pipeline) still uses “latest row wins” dedup, which can keep an incomplete retake over a completed original. This is being addressed in issue #687 (TS → Python migration). The Python analysis pipeline already applies the correct logic.
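The Python rule can be sketched as a single pass in which a completed row may displace an incomplete one, but never the reverse. Field names mirror the Qualtrics export; the rest is illustrative.

```python
def deduplicate(rows):
    """Keep one row per PROLIFIC_PID, preferring Finished=TRUE over incomplete retakes."""
    kept = {}
    for row in rows:
        pid = row["PROLIFIC_PID"]
        current = kept.get(pid)
        if current is None:
            kept[pid] = row
        elif row["Finished"] == "TRUE" and current["Finished"] != "TRUE":
            kept[pid] = row   # completed response wins, regardless of row order
        # two completed rows: the first encountered is kept
    return list(kept.values())
```

This is order-independent for the completed-vs-incomplete case, which is exactly the property the "latest row wins" TypeScript dedup lacks.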
Prolific Accepted Must Match Prolific UI
The “Prolific Accepted” sample count must match the Prolific platform’s “Approved” tab exactly. This is validated by cross-referencing the Prolific API submission statuses with the Qualtrics export. Any discrepancy indicates a pipeline bug, not a data issue.
The Prolific API is queried with limit=1000 per page to ensure all submissions are fetched. The enrichment step matches Prolific participant IDs to Qualtrics PROLIFIC_PID embedded data fields.
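The paging loop can be sketched generically. Here `fetch_page` stands in for the actual Prolific API call (limit/offset paging with limit=1000, as noted above); its signature is an assumption.

```python
def fetch_all_submissions(fetch_page, limit=1000):
    """Collect every submission by paging until a short page signals the end."""
    submissions, offset = [], 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        submissions.extend(page)
        if len(page) < limit:   # last page reached
            break
        offset += limit
    return submissions
```

Paging until a short page arrives, rather than trusting a single request, is what guarantees the enrichment step sees every submission.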
IRI Pass Rate Denominator
IRI (attention check) pass rates are computed using finished responses only as the denominator, not all responses. Incomplete responses cannot have valid IRI answers, so including them would artificially deflate pass rates.
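The denominator rule in code form; the record fields are illustrative.

```python
def iri_pass_rate(rows):
    """Share of FINISHED responses with all 3 IRIs correct (finished-only denominator)."""
    finished = [r for r in rows if r["finished"]]
    if not finished:
        return None
    passed = sum(r["iri_correct"] == 3 for r in finished)
    return passed / len(finished)
```

Counting incompletes in the denominator would score every abandoned response as a failed attention check, which is the deflation the rule avoids.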
Partial Straightlining Detection
Beyond Qualtrics’ built-in straightlining count, the pipeline computes within-person standard deviation per question block. If a respondent selected nearly identical answers for all items in a block (SD < 0.5), the response is flagged. The threshold follows Meade & Craig (2012), Psychological Methods, 17(3), 437-455.
The minimum response threshold for evaluation is ceil(block_count / 2) items answered, matching the TypeScript disposition pipeline exactly.
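A per-block sketch combining the SD threshold with the ceil(block_count / 2) minimum. Whether the pipeline uses population or sample SD is an assumption here; the sketch uses population SD (`pstdev`).

```python
import math
from statistics import pstdev

def flag_partial_straightlining(block_answers, block_count, threshold=0.5):
    """Flag a block if enough items were answered and their SD falls below threshold."""
    answered = [a for a in block_answers if a is not None]
    if len(answered) < math.ceil(block_count / 2):
        return False   # too few answers to evaluate this block
    return pstdev(answered) < threshold
```

A response is flagged if any block trips this check; blocks with too many missing answers are simply skipped rather than flagged.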
Qualtrics Export Format
Qualtrics CSV exports include 3 header rows: column names (row 0), question text (row 1), and import IDs (row 2). Data starts at row 3. The pipeline handles UTF-8 BOM markers (common in Qualtrics exports), embedded newlines in quoted feedback fields, and both label mode (“TRUE”/“FALSE”) and numeric mode (“1”/“0”) for the Finished column.
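A minimal parse of that layout with Python's csv module, as a sketch: files would be opened with encoding="utf-8-sig" to absorb the BOM; here a leading BOM is stripped from the string explicitly. The function name is illustrative.

```python
import csv
import io

def read_qualtrics_csv(text):
    """Return (column_names, data_rows) from a 3-header-row Qualtrics export."""
    reader = csv.reader(io.StringIO(text.lstrip("\ufeff")))  # drop UTF-8 BOM if present
    columns = next(reader)   # row 0: column names
    next(reader)             # row 1: question text (skipped)
    next(reader)             # row 2: import IDs (skipped)
    return columns, [dict(zip(columns, row)) for row in reader]
```

`csv.reader` already handles embedded newlines inside quoted fields, so free-text feedback survives intact; the Finished label-vs-numeric handling would sit downstream of this parse.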
Don’t Know Responses (Readiness & Maturity)
The Readiness and Maturity constructs allow “Don’t Know” as a response option. These are treated as missing data (excluded from person-level means), not mapped to a numeric value. This prevents artificial deflation of construct scores. The Barriers construct does not include a Don’t Know option.
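The missing-data treatment in code form; the answer representation (numeric values plus a literal "Don't Know" string) is an assumption for illustration.

```python
from statistics import fmean

def person_mean(answers, dont_know="Don't Know"):
    """Person-level mean over numeric answers, excluding Don't Know as missing."""
    numeric = [a for a in answers if a != dont_know]
    return fmean(numeric) if numeric else None
```

Excluding, rather than recoding, Don't Know means a respondent's construct score reflects only the items they actually rated.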
Reproducibility
All analysis code is open source and can be run independently against the public dataset. The sensitivity analysis shown above is generated automatically by the daily analysis pipeline and committed to the repository as JSON data.
