Data Quality Pipeline
Last updated: Apr 17, 2026, 6:46 AM EDT
The TABS project applies a multi-stage data quality pipeline to every survey response. This page documents how responses flow from collection through validation to analysis, the quality checks applied at each stage, the edge cases we discovered and resolved, and the sensitivity analysis that demonstrates our findings are robust to inclusion criteria.
All statistics on this page are generated automatically by the daily analysis pipeline and updated on every run. The Prolific Accepted count shown here matches the Prolific platform's "Approved" tab exactly.
Data Flow
Every survey response passes through this pipeline before appearing in any analysis:
- Qualtrics Export - Raw survey responses exported via API (3-header-row CSV format with question text and import IDs)
- Prolific Enrichment - Each response is cross-referenced with Prolific submission data (approval status, auth check scores) using the participant ID as join key
- Deduplication - When a participant retakes the survey, only one response is kept. Completed responses are preferred over incomplete retakes (see Edge Cases below)
- Disposition Waterfall - An 11-step quality classification assigns each response to exactly one disposition category
- Sample Definition - Five nested samples are computed, from most restrictive (Conservative Clean) to least (All V2)
- Statistical Analysis - Every metric is computed independently across all five samples
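The enrichment step (stage 2) joins the two datasets on the participant ID. A minimal sketch of that join with simplified row dicts and hypothetical field names, not the real tabs_v2_analysis.py code:

```python
def enrich_with_prolific(rows, submissions):
    # Stage 2: attach the Prolific submission status to each Qualtrics row,
    # using the participant ID as the join key.
    status = {s["participant_id"]: s["status"] for s in submissions}
    return [dict(r, Prolific_Status=status.get(r["PROLIFIC_PID"], "UNKNOWN"))
            for r in rows]

qualtrics_rows = [{"PROLIFIC_PID": "p1", "Finished": True},
                  {"PROLIFIC_PID": "p2", "Finished": True}]
prolific_subs = [{"participant_id": "p1", "status": "APPROVED"}]
enriched = enrich_with_prolific(qualtrics_rows, prolific_subs)
```

A row with no matching Prolific submission keeps a sentinel status rather than failing, so later stages can flag the mismatch.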
Demographic Data Sources
Participant demographics are collected from two independent sources that capture fundamentally different information:
| Aspect | Survey Demographics (Qualtrics) | Platform Demographics (Prolific) |
|---|---|---|
| Source | Self-reported in the TABS survey instrument (questions Q1-Q9) | Prolific participant profile database (archived at submission completion) |
| Type | Organizational/role-based characteristics | Personal/sociodemographic + professional characteristics |
| Base Fields | Executive Role (Q1), Decision Authority (Q2), Industry (Q3), Org Size (Q4), Profit Model (Q5), Revenue/Budget (Q6-Q7), Geography (Q8-Q9) | Age, Sex, Ethnicity, Language, Country of Residence, Nationality, Country of Birth, Student Status, Employment Status |
| Prescreener Fields | N/A - all fields are part of the survey instrument | Employment Sector, Industry, Company Size, Occupation, Education Level, Household Income, Fluent Languages, and hundreds more via GET /api/v1/filters/ (up to 15 per export) |
| Collection Method | Qualtrics CSV export, processed by tabs_v2_analysis.py | Prolific API POST /studies/{id}/demographic-export/ |
| Used in Analysis | Yes - all per-group statistics, effect sizes, cross-tabulations | Available via API for cross-validation and sample balancing; base fields not published on results pages (privacy protection) |
| Cross-Validation | Overlapping dimensions allow independent verification: Prolific industry ↔ Qualtrics Q3_Industry, Prolific company_size ↔ Qualtrics Q4_OrgSize, Prolific employment_sector ↔ Qualtrics Q5_ProfitModel, Prolific occupation ↔ Qualtrics Q1_Role | |
| Join Key | Embedded in Qualtrics response row (same CSV) | Matched via Prolific Participant ID (PID) |
Key distinction: Survey Demographics (Qualtrics) and Platform Demographics (Prolific) are separate datasets that capture different types of information. Survey demographics document what role participants hold and what kind of organization they work in. Prolific demographics document who the participants are personally, plus professional characteristics (industry, company size, occupation) when prescreener filters are configured. Where fields overlap (industry, company size, sector, role), they provide an independent cross-validation opportunity to verify self-reported data and balance samples. The two are joined via Prolific Participant ID (PID).
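The cross-validation idea can be sketched as a per-field comparison that emits only aggregate match rates, consistent with the privacy constraint below that individual Prolific data never leaves memory. Field names and the exact-match comparison are illustrative (the real check may map between the two platforms' category taxonomies):

```python
def cross_validate(joined_rows, pairs):
    """pairs maps a Prolific field to its Qualtrics counterpart.
    Returns aggregate agreement rates only - no row-level PII is emitted."""
    flags = {}
    for prolific_field, qualtrics_field in pairs.items():
        checked = [r for r in joined_rows
                   if r.get(prolific_field) and r.get(qualtrics_field)]
        agree = sum(r[prolific_field] == r[qualtrics_field] for r in checked)
        flags[prolific_field] = agree / len(checked) if checked else None
    return flags

rows = [
    {"industry": "Healthcare", "Q3_Industry": "Healthcare"},
    {"industry": "Finance", "Q3_Industry": "Technology"},
]
rates = cross_validate(rows, {"industry": "Q3_Industry"})
```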
Privacy-First Enrichment Architecture
Prolific demographic data (both base fields and prescreener responses) contains personally identifiable information (PII). The pipeline processes this data ephemerally:
- Demographics are fetched to runner.temp during pipeline execution - never committed to the repository
- Cross-validation checks run in-memory; only aggregate pass/fail flags are emitted
- Published results pages display only category-level aggregates from Qualtrics survey data (Q1-Q9), never individual Prolific profile data
- The PROLIFIC_STUDY_SCREENERS constant in tabs_api.py and prolific-api.ts documents the exact eligibility criteria and filter_ids used for the enrichment export
Enrichment Filter Budget (7 of 15 max)
| Category | Filter IDs | Count |
|---|---|---|
| Study screeners (prescreener) | employment_sector, company_size, occupation | 3 |
| Cross-validation | industry | 1 |
| Augmentation | education_level, household_income, fluent_languages | 3 |
| Total | | 7 / 15 |
Note: "Current Country of Residence" and "Employment Status" are base fields (always included in every export) and do not count against the 15-filter limit. See the Prolific Demographic Export API for export limits (15 filters, 2 configuration changes before lock).
Disposition Waterfall (Steps 0-10)
Each response is evaluated through this 11-step waterfall (steps 0-10). The first matching step determines the disposition - a response is never counted in multiple categories.
| Step | Disposition | Criteria |
|---|---|---|
| 0 | INCOMPLETE | Survey not finished (Qualtrics Finished != TRUE) |
| 1 | FLAG-AUTH-FAIL | Prolific authenticity check: LLM or Bots score = "Low" |
| 2 | FLAG-AUTH-MIXED | Prolific authenticity check: LLM or Bots score = "Mixed" |
| 3 | AUTO-EXCLUDE | 2+ IRI failures, OR speed flag (<5 min) + any IRI failure |
| 4 | FLAG-SPEED | Duration < 5 min but all 3 IRIs correct |
| 5 | FLAG-SINGLE-IRI | 1 IRI failure at normal speed (>= 5 min) |
| 6 | FLAG-SMEAL | Duration 5-9 min (below Smeal eDBA benchmark of 9 min) |
| 7 | FLAG-RECAPTCHA | reCAPTCHA score < 0.5 |
| 8 | FLAG-STRAIGHTLINING | Qualtrics Q_StraightliningCount > 0 (same answer for entire block) |
| 9 | FLAG-PARTIAL-STRAIGHTLINING | Within-person SD < 0.5 in any question block (Meade & Craig 2012) |
| 10 | CLEAN | All checks passed: finished, all 3 IRIs, duration >= 9 min, reCAPTCHA >= 0.5, no straightlining, auth checks pass |
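The first-match semantics of the waterfall can be sketched as an ordered list of (disposition, predicate) pairs: the first predicate that fires wins, so each response lands in exactly one category. Predicates and field names below are simplified stand-ins for the table above (5 min = 300 s, 9 min = 540 s); note that FLAG-SPEED needs no IRI condition of its own, because step 3 has already captured every speed + IRI-failure combination:

```python
WATERFALL = [
    ("INCOMPLETE",        lambda r: not r["finished"]),
    ("FLAG-AUTH-FAIL",    lambda r: "Low" in (r["auth_llm"], r["auth_bots"])),
    ("FLAG-AUTH-MIXED",   lambda r: "Mixed" in (r["auth_llm"], r["auth_bots"])),
    ("AUTO-EXCLUDE",      lambda r: r["iri_failures"] >= 2
                                    or (r["duration"] < 300 and r["iri_failures"] >= 1)),
    ("FLAG-SPEED",        lambda r: r["duration"] < 300),   # IRIs all correct here
    ("FLAG-SINGLE-IRI",   lambda r: r["iri_failures"] == 1),
    ("FLAG-SMEAL",        lambda r: r["duration"] < 540),
    ("FLAG-RECAPTCHA",    lambda r: r["recaptcha"] < 0.5),
    ("FLAG-STRAIGHTLINING",         lambda r: r["straightlining_count"] > 0),
    ("FLAG-PARTIAL-STRAIGHTLINING", lambda r: r["min_block_sd"] < 0.5),
]

def disposition(row):
    # First matching step determines the disposition (steps 0-9).
    for name, predicate in WATERFALL:
        if predicate(row):
            return name
    return "CLEAN"  # step 10: every check passed

row = {"finished": True, "auth_llm": "High", "auth_bots": "High",
       "iri_failures": 0, "duration": 600, "recaptcha": 0.9,
       "straightlining_count": 0, "min_block_sd": 0.8}
```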
Instructed Response Items (IRIs)
Three attention check items are embedded within the survey, one per construct. Each instructs the respondent to select a specific answer. Exact string match is required - any other value (including "Don't Know") is scored as a failure.
| Construct | Column | Expected Answer |
|---|---|---|
| Barriers (19 items) | Q10-28_Barriers_19 | "Major Barrier" |
| Readiness (18 items) | Q47-64_Readiness_18 | "Low Readiness/Capability" |
| Maturity (9 items) | Q65-73_Maturity_9 | "Level 2: Developing/Repeatable" |
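Exact-string IRI scoring can be sketched as below; any other value, including "Don't Know" or a blank, counts as a failure. Column names are simplified from the table above:

```python
IRI_EXPECTED = {
    "IRI_Barriers": "Major Barrier",
    "IRI_Readiness": "Low Readiness/Capability",
    "IRI_Maturity": "Level 2: Developing/Repeatable",
}

def iri_failures(row):
    # Exact string comparison: a missing column also counts as a failure.
    return sum(row.get(col) != expected
               for col, expected in IRI_EXPECTED.items())

row = {"IRI_Barriers": "Major Barrier",
       "IRI_Readiness": "Don't Know",          # failure: not the exact string
       "IRI_Maturity": "Level 2: Developing/Repeatable"}
```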
Sample Definitions
Five nested sample definitions are used, from most to least restrictive. The Prolific Accepted count matches the Prolific platform's "Approved" tab exactly. The clean samples apply additional quality filters on top of Prolific approval.
| Sample | Definition | N |
|---|---|---|
| Conservative Clean | Prolific APPROVED + all quality checks (IRI, duration >= 540s, reCAPTCHA, straightlining, auth) | 89 |
| Flexible Clean | Prolific APPROVED + basic quality (all 3 IRIs + duration >= 480s) | 140 |
| Prolific Accepted | All deduplicated V2 rows with Prolific APPROVED status | 261 |
| All V2 Finished | Finished + duration >= 120s (extreme speeders excluded) | 410 |
| All V2 | All V2 responses including incomplete | 485 |
Constraints: Conservative Clean ⊆ Flexible Clean ⊆ Prolific Accepted ⊆ All V2, and All V2 Finished ⊆ All V2. Prolific Accepted and All V2 Finished overlap, but neither is guaranteed to be a subset of the other (Prolific Accepted includes INCOMPLETE+APPROVED responses; All V2 Finished includes non-APPROVED responses).
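Once samples are materialized as sets of participant IDs, these constraints can be asserted mechanically. A small sketch with illustrative data (the sample names and IDs here are made up):

```python
def check_nesting(samples):
    # The four nested samples form a chain; Finished is only nested in All V2.
    assert samples["conservative"] <= samples["flexible"] \
           <= samples["accepted"] <= samples["all_v2"]
    assert samples["finished"] <= samples["all_v2"]
    # Accepted and Finished may overlap without either containing the other.

samples = {
    "conservative": {"p1"},
    "flexible":     {"p1", "p2"},
    "accepted":     {"p1", "p2", "p3"},   # p3: incomplete but APPROVED
    "finished":     {"p1", "p2", "p4"},   # p4: finished but not APPROVED
    "all_v2":       {"p1", "p2", "p3", "p4", "p5"},
}
check_nesting(samples)
```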
Exact Filter Chains (Authoritative Definitions)
Each sample definition is produced by applying filters in order. These are the canonical definitions used by the analysis pipeline (tabs_v2_analysis.py). Every metric on the Results pages is computed against these exact filters.
1. Conservative Clean (Primary Analysis Sample)
The most restrictive sample. Used for all primary reporting. Requires Prolific approval plus passing every quality gate.
- Prolific_Status == "APPROVED"
- Qualtrics Finished == TRUE (survey completed)
- Duration ≥ 480 seconds (8 minutes)
- All 3 IRI attention checks correct (exact string match)
- Duration ≥ 540 seconds (9 min Smeal eDBA benchmark)
- reCAPTCHA score ≥ 0.5
- Q_StraightliningCount == 0 (no full-block straightlining)
- Within-person SD ≥ 0.5 in all blocks (no partial straightlining)
- Auth_LLM and Auth_Bots not LOW or MIXED
Source: filter_samples() in tabs_v2_analysis.py
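The chain above is a conjunction, so it can be summarized as a single predicate. A sketch with illustrative field names, not the actual filter_samples() implementation (the 540 s gate subsumes the earlier 480 s gate in the chain):

```python
def is_conservative_clean(r):
    return (r["Prolific_Status"] == "APPROVED"
            and r["Finished"]
            and r["duration"] >= 540            # implies the 480 s gate too
            and r["iri_failures"] == 0
            and r["recaptcha"] >= 0.5
            and r["straightlining_count"] == 0
            and r["min_block_sd"] >= 0.5
            and r["auth_llm"] not in ("Low", "Mixed")
            and r["auth_bots"] not in ("Low", "Mixed"))

r = {"Prolific_Status": "APPROVED", "Finished": True, "duration": 600,
     "iri_failures": 0, "recaptcha": 0.9, "straightlining_count": 0,
     "min_block_sd": 0.8, "auth_llm": "High", "auth_bots": "High"}
```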
2. Flexible Clean (Expanded Quality Sample)
Includes manually-reviewed FLAG responses that were approved on Prolific. Uses a lower duration threshold and only checks IRI attention.
- Prolific_Status == "APPROVED"
- Qualtrics Finished == TRUE
- Duration ≥ 480 seconds (8 minutes)
- All 3 IRI attention checks correct
Does NOT check: reCAPTCHA, straightlining, partial straightlining, or auth flags.
3. Prolific Accepted (Platform-Verified Sample)
All deduplicated V2 responses where the participant has been approved on Prolific. This count must match the Prolific UI "Approved" tab exactly. Any discrepancy indicates a pipeline bug.
- Prolific_Status == "APPROVED"
- Deduplicated by PROLIFIC_PID (prefer completed response)
No quality filters. Includes incomplete/short responses if Prolific approved them.
4. All V2 Finished (Completed Responses)
All finished responses above a minimum duration threshold. Not filtered by Prolific status - includes returned, timed-out, and awaiting-review participants.
- Qualtrics Finished == TRUE
- Duration ≥ 120 seconds (extreme speeders excluded)
5. All V2 (Complete Dataset)
Every V2 response including incomplete, deduplicated by PROLIFIC_PID. This is the universe from which all other samples are drawn.
- StartDate on or after V2 launch (2026-03-23)
- Deduplicated by PROLIFIC_PID (prefer completed response)
Disposition CLEAN vs. Conservative Clean
These are related but distinct concepts that serve different purposes:
- Disposition CLEAN (from the waterfall above): A response that passes all 10 quality checks without being flagged. Used by the operations pipeline to auto-approve participants on Prolific. Does not check Prolific_Status.
- Conservative Clean (sample definition): Requires Prolific_Status == "APPROVED" plus all quality checks. Used for statistical analysis and reporting.
Expected relationship: After the daily auto-approve workflow runs, all Disposition CLEAN participants should have Prolific_Status == APPROVED, making the counts equal. Any persistent gap indicates a pipeline issue. The disposition dashboard cross-references these counts automatically.
Sensitivity Analysis
Every key statistic is computed across all five sample definitions. If a finding holds across Conservative Clean (N=89) and Flexible Clean (N=140), it is robust to inclusion criteria.
| Metric | Conservative Clean N=89 | Flexible Clean N=140 | Prolific Accepted N=261 | All V2 Finished N=410 | All V2 N=485 |
|---|---|---|---|---|---|
| Barrier Grand Mean | 2.8354 | 2.8135 | 2.7944 | 2.7591 | 2.764 |
| Barrier SD | 0.6252 | 0.7115 | 0.7092 | 0.7658 | 0.7692 |
| Readiness Grand Mean | 3.052 | 3.0862 | 3.126 | 3.2284 | 3.2285 |
| Readiness SD | 0.5643 | 0.6573 | 0.6701 | 0.7194 | 0.7185 |
| Maturity Grand Mean | 3.0526 | 3.0656 | 3.153 | 3.2593 | 3.2593 |
| Maturity SD | 0.6988 | 0.8064 | 0.8072 | 0.8074 | 0.8074 |
| B-R Correlation | -0.4265 | -0.4485 | -0.3457 | -0.3042 | -0.3039 |
| B-M Correlation | -0.1783 | -0.3141 | -0.2815 | -0.3189 | -0.3189 |
| R-M Correlation | 0.5783 | 0.7065 | 0.7208 | 0.7235 | 0.7235 |
| Alpha Barriers | 0.8535 | 0.8757 | 0.8764 | 0.8997 | 0.901 |
| Alpha Readiness | 0.8677 | 0.9171 | 0.9183 | 0.9317 | 0.9317 |
| Alpha Maturity | 0.8291 | 0.8871 | 0.8899 | 0.8909 | 0.8909 |
Edge Cases & Data Quality Decisions
During pipeline development, several edge cases were discovered and resolved. Each decision is documented here for transparency and reproducibility.
Retake Deduplication: Prefer Completed Response
Some participants completed the survey, received Prolific approval, then started a retake but did not finish it. The Qualtrics export contains both rows for the same Prolific PID. The Python analysis pipelineâs deduplication logic prefers the completed response (Finished=TRUE) over the incomplete retake, regardless of chronological order. This ensures the approved, completed response is used for analysis rather than being overwritten by an abandoned retake.
Note: The TypeScript disposition triage (used by the operations pipeline) still uses "latest row wins" dedup, which can keep an incomplete retake over a completed original. This is being addressed in issue #687 (TS → Python migration). The Python analysis pipeline already applies the correct logic.
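The prefer-completed rule can be sketched as follows: a completed row always beats an incomplete retake for the same PID, regardless of chronological order. Field names are simplified stand-ins:

```python
def deduplicate(rows):
    # Keep one row per PID; a Finished row replaces an unfinished one,
    # but an unfinished retake never replaces a Finished original.
    kept = {}
    for row in rows:
        pid = row["PROLIFIC_PID"]
        current = kept.get(pid)
        if current is None or (row["Finished"] and not current["Finished"]):
            kept[pid] = row
    return list(kept.values())

rows = [
    {"PROLIFIC_PID": "p1", "Finished": True,  "StartDate": "2026-03-24"},
    {"PROLIFIC_PID": "p1", "Finished": False, "StartDate": "2026-03-30"},  # later, abandoned retake
    {"PROLIFIC_PID": "p2", "Finished": True,  "StartDate": "2026-03-25"},
]
deduped = deduplicate(rows)
```

Under "latest row wins" dedup, p1's abandoned retake would survive instead; here the completed 2026-03-24 response is kept.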
Prolific Accepted Must Match Prolific UI
The âProlific Acceptedâ sample count must match the Prolific platformâs âApprovedâ tab exactly. This is validated by cross-referencing the Prolific API submission statuses with the Qualtrics export. Any discrepancy indicates a pipeline bug, not a data issue.
The Prolific API is queried with limit=1000 per page to ensure all submissions are fetched. The enrichment step matches Prolific participant IDs to Qualtrics PROLIFIC_PID embedded data fields.
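The pagination loop can be sketched with an injected fetch function, so the logic is testable offline. The response shape below (a results list plus a next-page URL) is an assumption for illustration, not the documented Prolific schema:

```python
def fetch_all_submissions(fetch_page, first_url):
    # Follow next-page links until exhausted, accumulating all submissions.
    submissions, url = [], first_url
    while url:
        page = fetch_page(url)      # e.g. an HTTP GET returning parsed JSON
        submissions.extend(page["results"])
        url = page.get("next")      # None once the last page is reached
    return submissions

# Offline stand-in for the API: a dict of canned pages.
pages = {
    "/submissions?limit=1000": {"results": [{"id": 1}, {"id": 2}],
                                "next": "/page2"},
    "/page2":                  {"results": [{"id": 3}], "next": None},
}
subs = fetch_all_submissions(pages.__getitem__, "/submissions?limit=1000")
```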
IRI Pass Rate Denominator
IRI (attention check) pass rates are computed using finished responses only as the denominator, not all responses. Incomplete responses cannot have valid IRI answers, so including them would artificially deflate pass rates.
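A minimal sketch of the denominator rule, with illustrative field names:

```python
def iri_pass_rate(rows):
    # Denominator: finished responses only; incompletes have no valid IRIs.
    finished = [r for r in rows if r["Finished"]]
    passed = sum(r["iri_failures"] == 0 for r in finished)
    return passed / len(finished) if finished else None

rows = [
    {"Finished": True,  "iri_failures": 0},
    {"Finished": True,  "iri_failures": 1},
    {"Finished": False, "iri_failures": 3},  # excluded from the denominator
]
```

Including the incomplete row would report 1/3 instead of 1/2, understating the true pass rate.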
Partial Straightlining Detection
Beyond Qualtricsâ built-in straightlining count, the pipeline computes within-person standard deviation per question block. If a respondent selected nearly identical answers for all items in a block (SD < 0.5), the response is flagged. The threshold follows Meade & Craig (2012), Psychological Methods, 17(3), 437-455.
IRI items are excluded from the SD calculation. IRI attention checks have predetermined correct answers (e.g., "Major Barrier") that differ from typical straightline responses. Including them would artificially inflate within-person variance and mask genuine straightlining. Only substantive scale items are used: 18 Barrier items, 17 Readiness items, and 8 Maturity items.
The minimum response threshold for evaluation is ceil(block_count / 2) items answered, matching the TypeScript disposition pipeline exactly.
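A sketch of the per-block check: missing answers are skipped, a block is only evaluated once at least ceil(block_size / 2) items were answered, and SD < 0.5 flags the block. The use of population SD here is an assumption; the real pipeline's estimator choice may differ:

```python
import math
from statistics import pstdev

def block_flagged(answers, block_size, threshold=0.5):
    # answers: numeric responses for one block, IRI items already removed,
    # None for missing / "Don't Know".
    values = [a for a in answers if a is not None]
    if len(values) < math.ceil(block_size / 2):
        return False                    # too sparse to evaluate
    return pstdev(values) < threshold   # low variance => partial straightlining

flat   = [3, 3, 3, 3, 3, 3, 3, 3, 3]   # 9 substantive items, no variance
varied = [1, 4, 2, 5, 3, 2, 4, 1, 5]
```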
Qualtrics Export Format
Qualtrics CSV exports include 3 header rows: column names (row 0), question text (row 1), and import IDs (row 2). Data starts at row 3. The pipeline handles UTF-8 BOM markers (common in Qualtrics exports), embedded newlines in quoted feedback fields, and both label mode ("TRUE"/"FALSE") and numeric mode ("1"/"0") for the Finished column.
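A parsing sketch for this format: strip a BOM if present (equivalently, decode the file with utf-8-sig), skip the two extra header rows, let the csv module handle embedded newlines in quoted fields, and normalize Finished across label and numeric modes. Column contents are illustrative:

```python
import csv
import io

def read_qualtrics(text):
    text = text.lstrip("\ufeff")             # tolerate a UTF-8 BOM
    reader = csv.reader(io.StringIO(text))
    header = next(reader)                    # row 0: column names
    next(reader)                             # row 1: question text
    next(reader)                             # row 2: import IDs
    rows = [dict(zip(header, r)) for r in reader]
    for r in rows:                           # label ("TRUE") or numeric ("1") mode
        r["Finished"] = r["Finished"] in ("TRUE", "True", "1")
    return rows

raw = "\ufeffPROLIFIC_PID,Finished\nQ text,Q text\nQID1,QID2\np1,TRUE\np2,0\n"
rows = read_qualtrics(raw)
```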
Don't Know Responses (Readiness & Maturity)
The Readiness and Maturity constructs allow "Don't Know" as a response option. These are treated as missing data (excluded from person-level means), not mapped to a numeric value. This prevents artificial deflation of construct scores. The Barriers construct does not include a Don't Know option.
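The handling can be sketched as a mapping that deliberately omits "Don't Know", so the person-level mean is computed over answered items only. Value codes are illustrative:

```python
SCALE = {"1": 1, "2": 2, "3": 3, "4": 4, "5": 5}  # "Don't Know" intentionally absent

def person_mean(raw_answers):
    # Unmapped answers ("Don't Know", blanks) drop out of both numerator
    # and denominator instead of being coerced to a number.
    values = [SCALE[a] for a in raw_answers if a in SCALE]
    return sum(values) / len(values) if values else None

answers = ["4", "5", "Don't Know", "3"]
```

Mapping "Don't Know" to 0 or 1 instead would drag the mean down for respondents who simply lack visibility into a question.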
Reproducibility
All analysis code is open source and can be run independently against the public dataset. The sensitivity analysis shown above is generated automatically by the daily analysis pipeline and committed to the repository as JSON data.
See What This Pipeline Produces
- Descriptive Statistics - grand means, standard deviations, correlations
- Scale Reliability - Cronbach's alpha across all five samples
- Sensitivity Analysis - every metric across all sample definitions
- Sample & Demographics - who participated in the survey
- ← Back to Results Overview