Data Quality Pipeline
Last updated: Apr 17, 2026, 6:46 AM EDT
The TABS project applies a multi-stage data quality pipeline to every survey response. This page documents how responses flow from collection through validation to analysis, the quality checks applied at each stage, the edge cases we discovered and resolved, and the sensitivity analysis that demonstrates our findings are robust to inclusion criteria.
All statistics on this page are generated automatically by the daily analysis pipeline and updated on every run. The Prolific Accepted count shown here matches the Prolific platform's "Approved" tab exactly.
Data Flow
Every survey response passes through this pipeline before appearing in any analysis:
- Qualtrics Export - Raw survey responses exported via API (3-header-row CSV format with question text and import IDs)
- Prolific Enrichment - Each response is cross-referenced with Prolific submission data (approval status, auth check scores) using the participant ID as join key
- Deduplication - When a participant retakes the survey, only one response is kept. Completed responses are preferred over incomplete retakes (see Edge Cases below)
- Disposition Waterfall - An 11-step quality classification assigns each response to exactly one disposition category
- Sample Definition - Five nested samples are computed, from most restrictive (Conservative Clean) to least (All V2)
- Statistical Analysis - Every metric is computed independently across all five samples
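The enrichment step (stage 2) joins the two datasets on the participant ID. A minimal sketch of that join with simplified row dicts and hypothetical field names, not the real tabs_v2_analysis.py code:

```python
def enrich_with_prolific(rows, submissions):
    # Stage 2: attach the Prolific submission status to each Qualtrics row,
    # using the participant ID as the join key.
    status = {s["participant_id"]: s["status"] for s in submissions}
    return [dict(r, Prolific_Status=status.get(r["PROLIFIC_PID"], "UNKNOWN"))
            for r in rows]

qualtrics_rows = [{"PROLIFIC_PID": "p1", "Finished": True},
                  {"PROLIFIC_PID": "p2", "Finished": True}]
prolific_subs = [{"participant_id": "p1", "status": "APPROVED"}]
enriched = enrich_with_prolific(qualtrics_rows, prolific_subs)
```

A row with no matching Prolific submission keeps a sentinel status rather than failing, so later stages can flag the mismatch.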
Demographic Data Sources
Participant demographics are collected from two independent sources that capture fundamentally different information:
| Aspect | Survey Demographics (Qualtrics) | Platform Demographics (Prolific) |
|---|---|---|
| Source | Self-reported in the TABS survey instrument (questions Q1-Q9) | Prolific participant profile database (archived at submission completion) |
| Type | Organizational/role-based characteristics | Personal/sociodemographic + professional characteristics |
| Base Fields | Executive Role (Q1), Decision Authority (Q2), Industry (Q3), Org Size (Q4), Profit Model (Q5), Revenue/Budget (Q6-Q7), Geography (Q8-Q9) | Age, Sex, Ethnicity, Language, Country of Residence, Nationality, Country of Birth, Student Status, Employment Status |
| Prescreener Fields | N/A - all fields are part of the survey instrument | Employment Sector, Industry, Company Size, Occupation, Education Level, Household Income, Fluent Languages, and hundreds more via GET /api/v1/filters/ (up to 15 per export) |
| Collection Method | Qualtrics CSV export, processed by tabs_v2_analysis.py | Prolific API POST /studies/{id}/demographic-export/ |
| Used in Analysis | Yes - all per-group statistics, effect sizes, cross-tabulations | Available via API for cross-validation and sample balancing; base fields not published on results pages (privacy protection) |
| Cross-Validation | Overlapping dimensions allow independent verification: Prolific industry ↔ Qualtrics Q3_Industry, Prolific company_size ↔ Qualtrics Q4_OrgSize, Prolific employment_sector ↔ Qualtrics Q5_ProfitModel, Prolific occupation ↔ Qualtrics Q1_Role | |
| Join Key | Embedded in Qualtrics response row (same CSV) | Matched via Prolific Participant ID (PID) |
Key distinction: Survey Demographics (Qualtrics) and Platform Demographics (Prolific) are separate datasets that capture different types of information. Survey demographics document what role participants hold and what kind of organization they work in. Prolific demographics document who the participants are personally, plus professional characteristics (industry, company size, occupation) when prescreener filters are configured. Where fields overlap (industry, company size, sector, role), they provide an independent cross-validation opportunity to verify self-reported data and balance samples. The two are joined via Prolific Participant ID (PID).
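The cross-validation idea can be sketched as a per-field comparison that emits only aggregate match rates, consistent with the privacy constraint below that individual Prolific data never leaves memory. Field names and the exact-match comparison are illustrative (the real check may map between the two platforms' category taxonomies):

```python
def cross_validate(joined_rows, pairs):
    """pairs maps a Prolific field to its Qualtrics counterpart.
    Returns aggregate agreement rates only - no row-level PII is emitted."""
    flags = {}
    for prolific_field, qualtrics_field in pairs.items():
        checked = [r for r in joined_rows
                   if r.get(prolific_field) and r.get(qualtrics_field)]
        agree = sum(r[prolific_field] == r[qualtrics_field] for r in checked)
        flags[prolific_field] = agree / len(checked) if checked else None
    return flags

rows = [
    {"industry": "Healthcare", "Q3_Industry": "Healthcare"},
    {"industry": "Finance", "Q3_Industry": "Technology"},
]
rates = cross_validate(rows, {"industry": "Q3_Industry"})
```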
Privacy-First Enrichment Architecture
Prolific demographic data (both base fields and prescreener responses) contains personally identifiable information (PII). The pipeline processes this data ephemerally:
- Demographics are fetched to runner.temp during pipeline execution - never committed to the repository
- Cross-validation checks run in-memory; only aggregate pass/fail flags are emitted
- Published results pages display only category-level aggregates from Qualtrics survey data (Q1-Q9), never individual Prolific profile data
- The PROLIFIC_STUDY_SCREENERS constant in tabs_api.py and prolific-api.ts documents the exact eligibility criteria and filter_ids used for the enrichment export
Enrichment Filter Budget (7 of 15 max)
| Category | Filter IDs | Count |
|---|---|---|
| Study screeners (prescreener) | employment_sector, company_size, occupation | 3 |
| Cross-validation | industry | 1 |
| Augmentation | education_level, household_income, fluent_languages | 3 |
| Total | | 7 / 15 |
Note: "Current Country of Residence" and "Employment Status" are base fields (always included in every export) and do not count against the 15-filter limit. See the Prolific Demographic Export API for export limits (15 filters, 2 configuration changes before lock).
Disposition Waterfall (Steps 0-10)
Each response is evaluated through this 11-step waterfall (steps 0-10). The first matching step determines the disposition - a response is never counted in multiple categories.
| Step | Disposition | Criteria |
|---|---|---|
| 0 | INCOMPLETE | Survey not finished (Qualtrics Finished != TRUE) |
| 1 | FLAG-AUTH-FAIL | Prolific authenticity check: LLM or Bots score = "Low" |
| 2 | FLAG-AUTH-MIXED | Prolific authenticity check: LLM or Bots score = "Mixed" |
| 3 | AUTO-EXCLUDE | 2+ IRI failures, OR speed flag (<5 min) + any IRI failure |
| 4 | FLAG-SPEED | Duration < 5 min but all 3 IRIs correct |
| 5 | FLAG-SINGLE-IRI | 1 IRI failure at normal speed (>= 5 min) |
| 6 | FLAG-SMEAL | Duration 5-9 min (below Smeal eDBA benchmark of 9 min) |
| 7 | FLAG-RECAPTCHA | reCAPTCHA score < 0.5 |
| 8 | FLAG-STRAIGHTLINING | Qualtrics Q_StraightliningCount > 0 (same answer for entire block) |
| 9 | FLAG-PARTIAL-STRAIGHTLINING | Within-person SD < 0.5 in any question block (Meade & Craig 2012) |
| 10 | CLEAN | All checks passed: finished, all 3 IRIs, duration >= 9 min, reCAPTCHA >= 0.5, no straightlining, auth checks pass |
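The first-match semantics of the waterfall can be sketched as an ordered list of (disposition, predicate) pairs: the first predicate that fires wins, so each response lands in exactly one category. Predicates and field names below are simplified stand-ins for the table above (5 min = 300 s, 9 min = 540 s); note that FLAG-SPEED needs no IRI condition of its own, because step 3 has already captured every speed + IRI-failure combination:

```python
WATERFALL = [
    ("INCOMPLETE",        lambda r: not r["finished"]),
    ("FLAG-AUTH-FAIL",    lambda r: "Low" in (r["auth_llm"], r["auth_bots"])),
    ("FLAG-AUTH-MIXED",   lambda r: "Mixed" in (r["auth_llm"], r["auth_bots"])),
    ("AUTO-EXCLUDE",      lambda r: r["iri_failures"] >= 2
                                    or (r["duration"] < 300 and r["iri_failures"] >= 1)),
    ("FLAG-SPEED",        lambda r: r["duration"] < 300),   # IRIs all correct here
    ("FLAG-SINGLE-IRI",   lambda r: r["iri_failures"] == 1),
    ("FLAG-SMEAL",        lambda r: r["duration"] < 540),
    ("FLAG-RECAPTCHA",    lambda r: r["recaptcha"] < 0.5),
    ("FLAG-STRAIGHTLINING",         lambda r: r["straightlining_count"] > 0),
    ("FLAG-PARTIAL-STRAIGHTLINING", lambda r: r["min_block_sd"] < 0.5),
]

def disposition(row):
    # First matching step determines the disposition (steps 0-9).
    for name, predicate in WATERFALL:
        if predicate(row):
            return name
    return "CLEAN"  # step 10: every check passed

row = {"finished": True, "auth_llm": "High", "auth_bots": "High",
       "iri_failures": 0, "duration": 600, "recaptcha": 0.9,
       "straightlining_count": 0, "min_block_sd": 0.8}
```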
Instructed Response Items (IRIs)
Three attention check items are embedded within the survey, one per construct. Each instructs the respondent to select a specific answer. Exact string match is required - any other value (including "Don't Know") is scored as a failure.
| Construct | Column | Expected Answer |
|---|---|---|
| Barriers (19 items) | Q10-28_Barriers_19 | "Major Barrier" |
| Readiness (18 items) | Q47-64_Readiness_18 | "Low Readiness/Capability" |
| Maturity (9 items) | Q65-73_Maturity_9 | "Level 2: Developing/Repeatable" |
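Exact-string IRI scoring can be sketched as below; any other value, including "Don't Know" or a blank, counts as a failure. Column names are simplified from the table above:

```python
IRI_EXPECTED = {
    "IRI_Barriers": "Major Barrier",
    "IRI_Readiness": "Low Readiness/Capability",
    "IRI_Maturity": "Level 2: Developing/Repeatable",
}

def iri_failures(row):
    # Exact string comparison: a missing column also counts as a failure.
    return sum(row.get(col) != expected
               for col, expected in IRI_EXPECTED.items())

row = {"IRI_Barriers": "Major Barrier",
       "IRI_Readiness": "Don't Know",          # failure: not the exact string
       "IRI_Maturity": "Level 2: Developing/Repeatable"}
```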
Sample Definitions
Five nested sample definitions are used, from most to least restrictive. The Prolific Accepted count matches the Prolific platform's "Approved" tab exactly. The clean samples apply additional quality filters on top of Prolific approval.
| Sample | Definition | N |
|---|---|---|
| Conservative Clean | Prolific APPROVED + all quality checks (IRI, duration >= 540s, reCAPTCHA, straightlining, auth) | 89 |
| Flexible Clean | Prolific APPROVED + basic quality (all 3 IRIs + duration >= 480s) | 140 |
| Prolific Accepted | All deduplicated V2 rows with Prolific APPROVED status | 261 |
| All V2 Finished | Finished + duration >= 120s (extreme speeders excluded) | 410 |
| All V2 | All V2 responses including incomplete | 485 |
Constraints: Conservative Clean ⊆ Flexible Clean ⊆ Prolific Accepted ⊆ All V2, and All V2 Finished ⊆ All V2. Prolific Accepted and All V2 Finished overlap, but neither is guaranteed to be a subset of the other (Prolific Accepted includes INCOMPLETE+APPROVED responses; All V2 Finished includes non-APPROVED responses).
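Once samples are materialized as sets of participant IDs, these constraints can be asserted mechanically. A small sketch with illustrative data (the sample names and IDs here are made up):

```python
def check_nesting(samples):
    # The four nested samples form a chain; Finished is only nested in All V2.
    assert samples["conservative"] <= samples["flexible"] \
           <= samples["accepted"] <= samples["all_v2"]
    assert samples["finished"] <= samples["all_v2"]
    # Accepted and Finished may overlap without either containing the other.

samples = {
    "conservative": {"p1"},
    "flexible":     {"p1", "p2"},
    "accepted":     {"p1", "p2", "p3"},   # p3: incomplete but APPROVED
    "finished":     {"p1", "p2", "p4"},   # p4: finished but not APPROVED
    "all_v2":       {"p1", "p2", "p3", "p4", "p5"},
}
check_nesting(samples)
```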
Exact Filter Chains (Authoritative Definitions)
Each sample definition is produced by applying filters in order. These are the canonical definitions used by the analysis pipeline (tabs_v2_analysis.py). Every metric on the Results pages is computed against these exact filters.
1. Conservative Clean (Primary Analysis Sample)
The most restrictive sample. Used for all primary reporting. Requires Prolific approval plus passing every quality gate.
- Prolific_Status == "APPROVED"
- Qualtrics Finished == TRUE (survey completed)
- Duration ≥ 480 seconds (8 minutes)
- All 3 IRI attention checks correct (exact string match)
- Duration ≥ 540 seconds (9 min Smeal eDBA benchmark)
- reCAPTCHA score ≥ 0.5
- Q_StraightliningCount == 0 (no full-block straightlining)
- Within-person SD ≥ 0.5 in all blocks (no partial straightlining)
- Auth_LLM and Auth_Bots not LOW or MIXED
Source: filter_samples() in tabs_v2_analysis.py
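The chain above is a conjunction, so it can be summarized as a single predicate. A sketch with illustrative field names, not the actual filter_samples() implementation (the 540 s gate subsumes the earlier 480 s gate in the chain):

```python
def is_conservative_clean(r):
    return (r["Prolific_Status"] == "APPROVED"
            and r["Finished"]
            and r["duration"] >= 540            # implies the 480 s gate too
            and r["iri_failures"] == 0
            and r["recaptcha"] >= 0.5
            and r["straightlining_count"] == 0
            and r["min_block_sd"] >= 0.5
            and r["auth_llm"] not in ("Low", "Mixed")
            and r["auth_bots"] not in ("Low", "Mixed"))

r = {"Prolific_Status": "APPROVED", "Finished": True, "duration": 600,
     "iri_failures": 0, "recaptcha": 0.9, "straightlining_count": 0,
     "min_block_sd": 0.8, "auth_llm": "High", "auth_bots": "High"}
```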
2. Flexible Clean (Expanded Quality Sample)
Includes manually-reviewed FLAG responses that were approved on Prolific. Uses a lower duration threshold and only checks IRI attention.
- Prolific_Status == "APPROVED"
- Qualtrics Finished == TRUE
- Duration ≥ 480 seconds (8 minutes)
- All 3 IRI attention checks correct
Does NOT check: reCAPTCHA, straightlining, partial straightlining, or auth flags.
3. Prolific Accepted (Platform-Verified Sample)
All deduplicated V2 responses where the participant has been approved on Prolific. This count must match the Prolific UI "Approved" tab exactly. Any discrepancy indicates a pipeline bug.
- Prolific_Status == "APPROVED"
- Deduplicated by PROLIFIC_PID (prefer completed response)
No quality filters. Includes incomplete/short responses if Prolific approved them.
4. All V2 Finished (Completed Responses)
All finished responses above a minimum duration threshold. Not filtered by Prolific status - includes returned, timed-out, and awaiting-review participants.
- Qualtrics Finished == TRUE
- Duration ≥ 120 seconds (extreme speeders excluded)
5. All V2 (Complete Dataset)
Every V2 response including incomplete, deduplicated by PROLIFIC_PID. This is the universe from which all other samples are drawn.
- StartDate on or after V2 launch (2026-03-23)
- Deduplicated by PROLIFIC_PID (prefer completed response)
Disposition CLEAN vs. Conservative Clean
These are related but distinct concepts that serve different purposes:
- Disposition CLEAN (from the waterfall above): A response that passes all 10 quality checks without being flagged. Used by the operations pipeline to auto-approve participants on Prolific. Does not check Prolific_Status.
- Conservative Clean (sample definition): Requires Prolific_Status == "APPROVED" plus all quality checks. Used for statistical analysis and reporting.
Expected relationship: After the daily auto-approve workflow runs, all Disposition CLEAN participants should have Prolific_Status == APPROVED, making the counts equal. Any persistent gap indicates a pipeline issue. The disposition dashboard cross-references these counts automatically.
Sensitivity Analysis
Every key statistic is computed across all five sample definitions. If a finding holds across Conservative Clean (N=89) and Flexible Clean (N=140), it is robust to inclusion criteria.
| Metric | Conservative Clean N=89 | Flexible Clean N=140 | Prolific Accepted N=261 | All V2 Finished N=410 | All V2 N=485 |
|---|---|---|---|---|---|
| Barrier Grand Mean | 2.8354 | 2.8135 | 2.7944 | 2.7591 | 2.764 |
| Barrier SD | 0.6252 | 0.7115 | 0.7092 | 0.7658 | 0.7692 |
| Readiness Grand Mean | 3.052 | 3.0862 | 3.126 | 3.2284 | 3.2285 |
| Readiness SD | 0.5643 | 0.6573 | 0.6701 | 0.7194 | 0.7185 |
| Maturity Grand Mean | 3.0526 | 3.0656 | 3.153 | 3.2593 | 3.2593 |
| Maturity SD | 0.6988 | 0.8064 | 0.8072 | 0.8074 | 0.8074 |
| B-R Correlation | -0.4265 | -0.4485 | -0.3457 | -0.3042 | -0.3039 |
| B-M Correlation | -0.1783 | -0.3141 | -0.2815 | -0.3189 | -0.3189 |
| R-M Correlation | 0.5783 | 0.7065 | 0.7208 | 0.7235 | 0.7235 |
| Alpha Barriers | 0.8535 | 0.8757 | 0.8764 | 0.8997 | 0.901 |
| Alpha Readiness | 0.8677 | 0.9171 | 0.9183 | 0.9317 | 0.9317 |
| Alpha Maturity | 0.8291 | 0.8871 | 0.8899 | 0.8909 | 0.8909 |
Edge Cases & Data Quality Decisions
During pipeline development, several edge cases were discovered and resolved. Each decision is documented here for transparency and reproducibility.
Retake Deduplication: Prefer Completed Response
Some participants completed the survey, received Prolific approval, then started a retake but did not finish it. The Qualtrics export contains both rows for the same Prolific PID. The Python analysis pipelineâs deduplication logic prefers the completed response (Finished=TRUE) over the incomplete retake, regardless of chronological order. This ensures the approved, completed response is used for analysis rather than being overwritten by an abandoned retake.
Note: The TypeScript disposition triage (used by the operations pipeline) still uses "latest row wins" dedup, which can keep an incomplete retake over a completed original. This is being addressed in issue #687 (TS → Python migration). The Python analysis pipeline already applies the correct logic.
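The prefer-completed rule can be sketched as follows: a completed row always beats an incomplete retake for the same PID, regardless of chronological order. Field names are simplified stand-ins:

```python
def deduplicate(rows):
    # Keep one row per PID; a Finished row replaces an unfinished one,
    # but an unfinished retake never replaces a Finished original.
    kept = {}
    for row in rows:
        pid = row["PROLIFIC_PID"]
        current = kept.get(pid)
        if current is None or (row["Finished"] and not current["Finished"]):
            kept[pid] = row
    return list(kept.values())

rows = [
    {"PROLIFIC_PID": "p1", "Finished": True,  "StartDate": "2026-03-24"},
    {"PROLIFIC_PID": "p1", "Finished": False, "StartDate": "2026-03-30"},  # later, abandoned retake
    {"PROLIFIC_PID": "p2", "Finished": True,  "StartDate": "2026-03-25"},
]
deduped = deduplicate(rows)
```

Under "latest row wins" dedup, p1's abandoned retake would survive instead; here the completed 2026-03-24 response is kept.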
Prolific Accepted Must Match Prolific UI
The âProlific Acceptedâ sample count must match the Prolific platformâs âApprovedâ tab exactly. This is validated by cross-referencing the Prolific API submission statuses with the Qualtrics export. Any discrepancy indicates a pipeline bug, not a data issue.
The Prolific API is queried with limit=1000 per page to ensure all submissions are fetched. The enrichment step matches Prolific participant IDs to Qualtrics PROLIFIC_PID embedded data fields.
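The pagination loop can be sketched with an injected fetch function, so the logic is testable offline. The response shape below (a results list plus a next-page URL) is an assumption for illustration, not the documented Prolific schema:

```python
def fetch_all_submissions(fetch_page, first_url):
    # Follow next-page links until exhausted, accumulating all submissions.
    submissions, url = [], first_url
    while url:
        page = fetch_page(url)      # e.g. an HTTP GET returning parsed JSON
        submissions.extend(page["results"])
        url = page.get("next")      # None once the last page is reached
    return submissions

# Offline stand-in for the API: a dict of canned pages.
pages = {
    "/submissions?limit=1000": {"results": [{"id": 1}, {"id": 2}],
                                "next": "/page2"},
    "/page2":                  {"results": [{"id": 3}], "next": None},
}
subs = fetch_all_submissions(pages.__getitem__, "/submissions?limit=1000")
```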
IRI Pass Rate Denominator
IRI (attention check) pass rates are computed using finished responses only as the denominator, not all responses. Incomplete responses cannot have valid IRI answers, so including them would artificially deflate pass rates.
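A minimal sketch of the denominator rule, with illustrative field names:

```python
def iri_pass_rate(rows):
    # Denominator: finished responses only; incompletes have no valid IRIs.
    finished = [r for r in rows if r["Finished"]]
    passed = sum(r["iri_failures"] == 0 for r in finished)
    return passed / len(finished) if finished else None

rows = [
    {"Finished": True,  "iri_failures": 0},
    {"Finished": True,  "iri_failures": 1},
    {"Finished": False, "iri_failures": 3},  # excluded from the denominator
]
```

Including the incomplete row would report 1/3 instead of 1/2, understating the true pass rate.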
Partial Straightlining Detection
Beyond Qualtricsâ built-in straightlining count, the pipeline computes within-person standard deviation per question block. If a respondent selected nearly identical answers for all items in a block (SD < 0.5), the response is flagged. The threshold follows Meade & Craig (2012), Psychological Methods, 17(3), 437-455.
IRI items are excluded from the SD calculation. IRI attention checks have predetermined correct answers (e.g., "Major Barrier") that differ from typical straightline responses. Including them would artificially inflate within-person variance and mask genuine straightlining. Only substantive scale items are used: 18 Barrier items, 17 Readiness items, and 8 Maturity items.
The minimum response threshold for evaluation is ceil(block_count / 2) items answered, matching the TypeScript disposition pipeline exactly.
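A sketch of the per-block check: missing answers are skipped, a block is only evaluated once at least ceil(block_size / 2) items were answered, and SD < 0.5 flags the block. The use of population SD here is an assumption; the real pipeline's estimator choice may differ:

```python
import math
from statistics import pstdev

def block_flagged(answers, block_size, threshold=0.5):
    # answers: numeric responses for one block, IRI items already removed,
    # None for missing / "Don't Know".
    values = [a for a in answers if a is not None]
    if len(values) < math.ceil(block_size / 2):
        return False                    # too sparse to evaluate
    return pstdev(values) < threshold   # low variance => partial straightlining

flat   = [3, 3, 3, 3, 3, 3, 3, 3, 3]   # 9 substantive items, no variance
varied = [1, 4, 2, 5, 3, 2, 4, 1, 5]
```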
Qualtrics Export Format
Qualtrics CSV exports include 3 header rows: column names (row 0), question text (row 1), and import IDs (row 2). Data starts at row 3. The pipeline handles UTF-8 BOM markers (common in Qualtrics exports), embedded newlines in quoted feedback fields, and both label mode ("TRUE"/"FALSE") and numeric mode ("1"/"0") for the Finished column.
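A parsing sketch for this format: strip a BOM if present (equivalently, decode the file with utf-8-sig), skip the two extra header rows, let the csv module handle embedded newlines in quoted fields, and normalize Finished across label and numeric modes. Column contents are illustrative:

```python
import csv
import io

def read_qualtrics(text):
    text = text.lstrip("\ufeff")             # tolerate a UTF-8 BOM
    reader = csv.reader(io.StringIO(text))
    header = next(reader)                    # row 0: column names
    next(reader)                             # row 1: question text
    next(reader)                             # row 2: import IDs
    rows = [dict(zip(header, r)) for r in reader]
    for r in rows:                           # label ("TRUE") or numeric ("1") mode
        r["Finished"] = r["Finished"] in ("TRUE", "True", "1")
    return rows

raw = "\ufeffPROLIFIC_PID,Finished\nQ text,Q text\nQID1,QID2\np1,TRUE\np2,0\n"
rows = read_qualtrics(raw)
```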
Don't Know Responses (Readiness & Maturity)
The Readiness and Maturity constructs allow "Don't Know" as a response option. These are treated as missing data (excluded from person-level means), not mapped to a numeric value. This prevents artificial deflation of construct scores. The Barriers construct does not include a Don't Know option.
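The handling can be sketched as a mapping that deliberately omits "Don't Know", so the person-level mean is computed over answered items only. Value codes are illustrative:

```python
SCALE = {"1": 1, "2": 2, "3": 3, "4": 4, "5": 5}  # "Don't Know" intentionally absent

def person_mean(raw_answers):
    # Unmapped answers ("Don't Know", blanks) drop out of both numerator
    # and denominator instead of being coerced to a number.
    values = [SCALE[a] for a in raw_answers if a in SCALE]
    return sum(values) / len(values) if values else None

answers = ["4", "5", "Don't Know", "3"]
```

Mapping "Don't Know" to 0 or 1 instead would drag the mean down for respondents who simply lack visibility into a question.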
Reproducibility
All analysis code is open source and can be run independently against the public dataset. The sensitivity analysis shown above is generated automatically by the daily analysis pipeline and committed to the repository as JSON data.
See What This Pipeline Produces
- Descriptive Statistics - grand means, standard deviations, correlations
- Scale Reliability - Cronbach's alpha across all five samples
- Sensitivity Analysis - every metric across all sample definitions
- Sample & Demographics - who participated in the survey
- ← Back to Results Overview