Statistics Glossary

Published: April 2026

This glossary explains every psychometric and statistical method used in the TABS instrument validation. Each entry describes what the statistic measures, how it is calculated, commonly accepted thresholds, and the TABS-specific results from the CRP-200 frozen dataset (N=200).

Jump to: Reliability | Item Analysis | Factor Analysis | Validity | Normality | CFA Fit Indices

Reliability

Cronbach's Alpha (α)

What it measures: Internal consistency reliability - the degree to which all items in a scale measure the same underlying construct. Higher alpha means items are more closely related.
How it’s calculated: α = (k / (k − 1)) × (1 − Σs²ᵢ / s²ₜ), where k = number of items, s²ᵢ = variance of each item, s²ₜ = variance of the total score. TABS also reports the Feldt (1965) 95% confidence interval.
Thresholds: ≥ 0.70 acceptable, ≥ 0.80 good, ≥ 0.90 excellent. Values above 0.95 may indicate item redundancy.
TABS result: Barriers α = .873, Readiness α = .917, Maturity α = .885 - all in the "good to excellent" range.
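The alpha formula is simple enough to compute directly. Here is a minimal pure-Python sketch (the function name and one-list-per-item data layout are illustrative, not part of the TABS tooling):

```python
def cronbach_alpha(items):
    """items: one list of scores per item; all lists cover the same respondents."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # sample variance (ddof = 1)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Total score per respondent, then alpha per the formula above.
    totals = [sum(item[j] for item in items) for j in range(n)]
    return (k / (k - 1)) * (1 - sum(var(item) for item in items) / var(totals))
```

Two identical items yield alpha = 1.0, the upper bound; real scales fall below it in proportion to item heterogeneity.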

McDonald's Omega (ω)

What it measures: Model-based reliability that accounts for varying item-factor loadings. Unlike alpha, which assumes all items contribute equally (tau-equivalent model), omega uses the actual factor loading weights from a single-factor model.
How it’s calculated: ω = (Σλᵢ)² / ((Σλᵢ)² + Σ(1 − λ²ᵢ)), where λᵢ are the standardized factor loadings from a single-factor model. This is equivalent to total reliability omega.
Thresholds: Same thresholds as alpha (≥ 0.70 acceptable). Omega is generally preferred over alpha by modern psychometricians because it relaxes the tau-equivalent assumption.
TABS result: Barriers ω = .873, Readiness ω = .918, Maturity ω = .886 - very close to alpha values, suggesting reasonably tau-equivalent item loadings.
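Given standardized loadings from a single-factor model, omega is a one-liner. A sketch (function name illustrative):

```python
def mcdonald_omega(loadings):
    """loadings: standardized loadings from a single-factor model."""
    common = sum(loadings) ** 2                  # (Σλ)²: shared variance
    error = sum(1 - l ** 2 for l in loadings)    # Σ(1 − λ²): error variance
    return common / (common + error)
```

With equal loadings omega reduces to alpha, which is why the TABS alpha and omega values track each other so closely.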

Composite Reliability (CR)

What it measures: The proportion of variance in the composite score that is attributable to the true score. CR is computed from factor loadings and is a key component of construct validity assessment alongside AVE.
How it’s calculated: CR = (Σλᵢ)² / ((Σλᵢ)² + Σε²ᵢ), where λᵢ are standardized factor loadings and ε²ᵢ = 1 − λ²ᵢ are the error variances. Numerically equivalent to omega when computed from the same loadings.
Thresholds: ≥ 0.70 acceptable, ≥ 0.80 good. When CR exceeds 0.70 but AVE falls below 0.50, convergent validity is "adequate" per Fornell & Larcker (1981).
TABS result: Barriers CR = .873, Readiness CR = .918, Maturity CR = .886 - all above 0.80, compensating for below-threshold AVE values.

Split-Half Reliability

What it measures: An alternative reliability estimate that splits the scale into two halves (odd-numbered vs even-numbered items), computes the correlation between halves, and adjusts for test length.
How it’s calculated: rₛₕ = 2r / (1 + r), where r is the Pearson correlation between the two half-scores. This is the Spearman-Brown prophecy formula, which estimates what the reliability would be if both halves were combined.
Thresholds: Same as alpha: ≥ 0.70 acceptable. Split-half reliability provides a useful cross-check against alpha - large discrepancies suggest item ordering effects.
TABS result: Barriers = .896, Readiness = .912, Maturity = .896 - all consistent with alpha/omega values.

Alpha-if-Deleted

What it measures: What Cronbach's alpha would be if a specific item were removed from the scale. Items whose deletion would increase alpha may be weakening internal consistency.
How it’s calculated: Recompute Cronbach's alpha on the remaining k−1 items for each item in turn. Report the change (Δα) and flag items where deletion would increase alpha.
Thresholds: If Δα > 0, the item may be a candidate for removal (though substantive justification should outweigh statistical criteria alone).
TABS result: The current validation summary does not export per-item alpha-if-deleted values. Overall Barriers alpha is .873, indicating adequate scale reliability. Any item whose deletion would substantially increase alpha would be a candidate for review on substantive grounds.
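The recompute-and-compare loop is mechanical. A numpy sketch (the respondents-by-items matrix layout is an assumption of this sketch, not the TABS export format):

```python
import numpy as np

def alpha(X):
    """Cronbach's alpha for X (rows = respondents, columns = items)."""
    k = X.shape[1]
    return (k / (k - 1)) * (
        1 - X.var(axis=0, ddof=1).sum() / X.sum(axis=1).var(ddof=1))

def alpha_if_deleted(X):
    """Δα per item: alpha without the item minus alpha with it.
    Positive values flag items whose removal would raise alpha."""
    base = alpha(X)
    return {i: alpha(np.delete(X, i, axis=1)) - base for i in range(X.shape[1])}
```

A reverse-keyed item left unrecoded shows up immediately as a large positive Δα.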

Item Analysis

Corrected Item-Total Correlation (CITC)

What it measures: The Pearson correlation between each item and the sum of all other items in the scale (excluding that item). This avoids the part-whole correlation inflation that occurs if the item is included in its own total. It measures how well each item 'tracks' with the rest of the scale.
How it’s calculated: For item i, CITCᵢ = r(xᵢ, T₋ᵢ), where T₋ᵢ is the sum of all items except item i.
Thresholds: ≥ 0.30 acceptable. Items below 0.30 may not be measuring the same construct as the other items, or may be interpreted inconsistently by respondents.
TABS result: At least one Barriers item falls below the 0.30 threshold (minimum observed CITC = .28). Items below threshold are typically retained when substantive coverage justifies it.
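The exclude-the-item correlation is easy to get wrong (correlating against the full total inflates the estimate), so here is a minimal numpy sketch of the corrected version (names illustrative):

```python
import numpy as np

def corrected_item_total(X):
    """CITC for each column of X (rows = respondents, columns = items)."""
    total = X.sum(axis=1)
    # Correlate each item with the total of all *other* items.
    return [float(np.corrcoef(X[:, i], total - X[:, i])[0, 1])
            for i in range(X.shape[1])]
```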

Inter-Item Correlation Matrix

What it measures: The full matrix of pairwise Pearson correlations between all items in a scale. Summary statistics (mean, min, max, SD) describe how tightly items cluster together.
How it’s calculated: Standard Pearson r between each pair of items. Summary: mean of all unique pairwise correlations, plus minimum, maximum, and standard deviation.
Thresholds: Mean inter-item correlation between 0.15 and 0.50 is optimal (Clark & Watson, 1995). Below 0.15 suggests items are too heterogeneous; above 0.50 suggests redundancy. No negative correlations should exist for a unidimensional scale.
TABS result: Barriers mean r = .277, Readiness mean r = .397, Maturity mean r = .490. All within acceptable ranges, with Maturity approaching the upper bound (consistent with its 8-item high-homogeneity scale).
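The summary statistics come from the upper triangle of the correlation matrix, excluding the diagonal. A numpy sketch:

```python
import numpy as np

def interitem_summary(X):
    """Mean / min / max / SD of the unique pairwise item correlations."""
    R = np.corrcoef(X, rowvar=False)
    rs = R[np.triu_indices_from(R, k=1)]  # upper triangle, no diagonal
    return {"mean": rs.mean(), "min": rs.min(),
            "max": rs.max(), "sd": rs.std(ddof=1)}
```

Checking the "min" entry is a quick way to spot the negative correlations that a unidimensional scale should not contain.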

Factor Analysis

Kaiser-Meyer-Olkin (KMO) Measure

What it measures: Sampling adequacy for factor analysis. It compares the magnitudes of observed correlations to partial correlations. High KMO means the correlations between variables are largely explained by other variables, making factor analysis appropriate.
How it’s calculated: KMO = ΣΣr²ᵢⱼ / (ΣΣr²ᵢⱼ + ΣΣp²ᵢⱼ), where rᵢⱼ are correlations and pᵢⱼ are partial correlations. Values closer to 1.0 mean factor analysis is more appropriate.
Thresholds: ≥ 0.50 minimum (miserable), ≥ 0.60 mediocre, ≥ 0.70 middling, ≥ 0.80 meritorious, ≥ 0.90 marvelous (Kaiser, 1974).
TABS result: Barriers KMO = .851 (meritorious), Readiness KMO = .927 (marvelous), Maturity KMO = .912 (marvelous).
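The partial correlations in the KMO formula come from the inverse of the correlation matrix (the anti-image approach). A numpy sketch, assuming R is well-conditioned enough to invert:

```python
import numpy as np

def kmo(R):
    """Overall KMO from an item correlation matrix R (anti-image method)."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / d                    # partial correlation matrix
    np.fill_diagonal(partial, 0.0)
    R0 = R - np.eye(R.shape[0])           # zero the diagonal of R
    return (R0 ** 2).sum() / ((R0 ** 2).sum() + (partial ** 2).sum())
```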

Bartlett's Test of Sphericity

What it measures: Whether the correlation matrix is significantly different from an identity matrix (where all correlations are zero). A significant result means the items share enough variance to support factor analysis.
How it’s calculated: χ² = −(N − 1 − (2k+5)/6) × ln|R|, where N = sample size, k = number of variables, |R| = determinant of the correlation matrix. Tested against a chi-squared distribution with k(k−1)/2 degrees of freedom.
Thresholds: p < .05 required. In practice, large samples almost always produce significant results, so this is a necessary but not sufficient condition.
TABS result: All three constructs: p < .001 - highly significant, confirming factor analysis is appropriate.
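The chi-squared statistic and degrees of freedom follow directly from the formula (obtaining the p-value additionally needs a chi-squared distribution, e.g. from scipy, which this stdlib-plus-numpy sketch omits):

```python
import math
import numpy as np

def bartlett_sphericity(R, n):
    """Chi-squared statistic and df for Bartlett's test.
    R: item correlation matrix; n: sample size."""
    k = R.shape[0]
    stat = -(n - 1 - (2 * k + 5) / 6) * math.log(np.linalg.det(R))
    df = k * (k - 1) // 2
    return stat, df
```

An identity matrix (all correlations zero) gives a statistic of exactly 0; any shared variance pushes it upward.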

Exploratory Factor Analysis (EFA)

What it measures: The underlying latent factor structure of a set of items without imposing a pre-specified model. EFA identifies how many factors the data supports and which items load on each factor.
How it’s calculated: TABS uses Maximum Likelihood (ML) estimation with Promax oblique rotation. ML is preferred for normally-distributed data; Promax allows factors to correlate (appropriate for related constructs). The number of factors is determined by Parallel Analysis, not the Kaiser criterion.
Thresholds: Factor loadings ≥ 0.30 are typically considered meaningful. Cross-loadings (high loading on multiple factors) above 0.30 suggest item complexity.
TABS result: Barriers: 2 factors (F1 Internal, F2 External). Readiness: 1 factor. Maturity: 1 factor. See the Factor Analysis page for the full loading matrix.

Horn's Parallel Analysis

What it measures: The number of factors to retain in EFA. It compares actual eigenvalues from the data against eigenvalues generated from random data of the same dimensionality. Factors are retained only when actual eigenvalues exceed the random expectation.
How it’s calculated: Generate many (typically 100-1,000) random datasets with the same N and k. Compute eigenvalues for each. Retain factors whose actual eigenvalues exceed the 95th percentile of the random eigenvalues.
Thresholds: Retain factors where actual eigenvalue > 95th percentile random eigenvalue. This is more accurate than the Kaiser criterion (eigenvalue > 1.0), which tends to over-extract factors.
TABS result: Barriers: 2 factors retained (eigenvalues 5.87, 1.90 > random thresholds). Readiness: 1 factor (only first eigenvalue 7.39 exceeds threshold). Maturity: 1 factor (eigenvalue 4.44 exceeds threshold).
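The procedure above can be sketched in numpy. This compares eigenvalues of the observed correlation matrix against the 95th percentile from random normal data (simulation parameters are illustrative defaults, not the TABS settings):

```python
import numpy as np

def parallel_analysis(X, n_iter=200, pct=95, seed=0):
    """Factors retained: count leading observed eigenvalues that exceed
    the pct-th percentile of eigenvalues from random data of equal shape."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    rand = np.array([
        np.sort(np.linalg.eigvalsh(
            np.corrcoef(rng.standard_normal((n, k)), rowvar=False)))[::-1]
        for _ in range(n_iter)])
    thresh = np.percentile(rand, pct, axis=0)
    retained = 0
    for o, t in zip(obs, thresh):  # stop at the first eigenvalue that fails
        if o > t:
            retained += 1
        else:
            break
    return retained
```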

Confirmatory Factor Analysis (CFA)

What it measures: Tests whether a pre-specified factor structure fits the observed data. Unlike EFA (which discovers structure), CFA confirms whether a hypothesized model adequately reproduces the observed covariance matrix.
How it’s calculated: Fit a structural equation model where latent factors predict observed items. Estimate factor loadings, error variances, and factor correlations using ML estimation. Evaluate fit using multiple indices (CFI, TLI, RMSEA, SRMR).
Thresholds: CFI ≥ .95 (good), ≥ .90 (acceptable). TLI ≥ .95 (good), ≥ .90 (acceptable). RMSEA ≤ .06 (good), ≤ .08 (acceptable). SRMR ≤ .08 (good).
TABS result: Maturity CFA fits well (CFI = .981, TLI = .973, RMSEA = .057). Barriers single-factor CFA shows poor fit (CFI = .766), expected for a multi-factor construct. The 4-factor Barriers CFA (CFI = .827) improves but remains below threshold, reflecting the exploratory nature of the current sample.

Eigenvalue

What it measures: The amount of total variance in the items that a single factor accounts for. Each factor's eigenvalue represents its explanatory power. Eigenvalues sum to the number of variables.
How it’s calculated: Computed from the eigendecomposition of the correlation matrix R. The eigenvalue λⱼ for factor j is the sum of squared factor loadings for that factor across all items.
Thresholds: Kaiser criterion: retain factors with eigenvalue > 1.0 (explains more than a single item's worth of variance). However, Parallel Analysis is more accurate and is used by TABS.
TABS result: Barriers: λ₁=5.87, λ₂=1.90, λ₃=1.15 (only 2 exceed parallel analysis threshold). Readiness: λ₁=7.39 (1 factor). Maturity: λ₁=4.44 (1 factor).

Communality

What it measures: The proportion of an item's variance that is explained by the extracted factors. It is the item-level analogue of R² - "how much of this item is captured by the factors?"
How it’s calculated: h²ᵢ = Σλ²ᵢⱼ across all retained factors, where λᵢⱼ is the loading of item i on factor j. Equivalently, 1 minus the uniqueness.
Thresholds: ≥ 0.40 is generally considered adequate. Low communalities (≤ 0.20) indicate the item is not well-explained by the factor structure.
TABS result: In TABS, communalities are interpreted item-by-item to check whether each question is adequately represented by the retained factor structure. Items with lower communalities are candidates for closer review because the extracted factors explain less of their variance.
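Given a loading matrix, communalities are row sums of squared loadings. A minimal sketch:

```python
def communalities(loadings):
    """loadings: one row per item, one column per retained factor."""
    return [sum(l ** 2 for l in row) for row in loadings]
```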

Cumulative Variance Explained

What it measures: The total percentage of item variance accounted for by all retained factors combined. Higher values mean the factor solution captures more of the systematic variation in the data.
How it’s calculated: Sum of (eigenvalue / k) × 100 for each retained factor, where k is the number of items.
Thresholds: > 50% is commonly cited as a minimum in social sciences, but 40-60% is typical for survey instruments measuring complex constructs.
TABS result: Barriers 2-factor: 39.9%. Readiness 1-factor: 40.0%. Maturity 1-factor: 49.3%. All within the typical range for organizational behavior surveys.
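As a one-line restatement of the formula (note that ML-based software typically reports variance explained from post-extraction sums of squared loadings, which can be somewhat lower than sums of initial eigenvalues):

```python
def cumulative_variance(eigenvalues, k):
    """Percent of total item variance explained by the retained factors."""
    return sum(eigenvalues) / k * 100
```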

Promax Rotation

What it measures: An oblique rotation method for factor analysis that allows extracted factors to be correlated. This is appropriate when the underlying constructs are theoretically related (as opposed to Varimax, which forces orthogonal/uncorrelated factors).
How it’s calculated: Start with a Varimax (orthogonal) rotation, then raise the loadings to a power (kappa, typically 4) and use the resulting matrix as a target for oblique Procrustes rotation. This simplifies the loading pattern while allowing inter-factor correlations.
Thresholds: If inter-factor correlations exceed |r| > .32, oblique rotation is preferred over orthogonal (Tabachnick & Fidell, 2013). Report factor correlations to justify the choice.
TABS result: Barriers F1-F2 correlation = .505, well above .32, confirming Promax was the correct choice.

Validity

Average Variance Extracted (AVE)

What it measures: The proportion of variance in the items that is captured by the latent construct (as opposed to measurement error). AVE is the core measure of convergent validity - whether items converge on their intended construct.
How it’s calculated: AVE = Σλ²ᵢ / k, where λᵢ are the standardized factor loadings and k is the number of items. This is the mean of the squared loadings - the average communality.
Thresholds: ≥ 0.50 indicates that the construct explains more variance in its items than error does. When AVE < 0.50 but CR > 0.70, convergent validity is "adequate" per Fornell & Larcker (1981). Below 0.50 is common in broad, multi-faceted constructs with many items.
TABS result: Barriers AVE = .289, Readiness AVE = .400, Maturity AVE = .493. All below the ideal .50, but all have CR > .80. The 18-item Barriers scale measures diverse barrier types, naturally reducing AVE. Maturity (8 homogeneous items) is closest to the threshold.
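Since AVE and CR are evaluated together, both can be sketched from the same vector of standardized loadings (function names illustrative):

```python
def ave(loadings):
    """Average variance extracted: mean squared standardized loading."""
    return sum(l ** 2 for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """CR from the same loadings; numerically equivalent to omega."""
    common = sum(loadings) ** 2
    return common / (common + sum(1 - l ** 2 for l in loadings))
```

With uniform loadings of .70, AVE sits at .49, just under the .50 threshold, while CR is comfortably above .70, which is exactly the "adequate convergent validity" pattern the TABS constructs exhibit.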

Heterotrait-Monotrait Ratio (HTMT)

What it measures: Discriminant validity - whether two constructs are empirically distinct. HTMT estimates the correlation between constructs as if they were measured perfectly (correcting for measurement error). Lower values indicate better discrimination.
How it’s calculated: HTMT = (mean of the heterotrait-heteromethod correlations, i.e. item correlations across the two constructs) / (geometric mean of each construct's mean monotrait-heteromethod correlation, i.e. item correlations within a construct). Bootstrap confidence intervals (2,000 iterations) are reported for inferential testing.
Thresholds: < 0.85 conservative threshold (Henseler et al., 2015). < 0.90 liberal threshold. If the 95% CI includes 1.0, constructs may not be distinct.
TABS result: B-R: .498, B-M: .441, R-M: .804. All pass the conservative .85 threshold. The R-M value (.804) is the highest, reflecting the known Readiness-Maturity conceptual overlap, but remains below threshold.
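The ratio itself (without the bootstrap CIs) can be sketched in numpy from the raw item blocks of two constructs:

```python
import numpy as np

def htmt(Xa, Xb):
    """HTMT for two item blocks (rows = respondents, columns = items)."""
    def mean_monotrait(X):
        R = np.corrcoef(X, rowvar=False)
        return R[np.triu_indices_from(R, k=1)].mean()  # within-construct rs
    ka = Xa.shape[1]
    R = np.corrcoef(np.hstack([Xa, Xb]), rowvar=False)
    hetero = R[:ka, ka:].mean()  # cross-construct item correlations
    return hetero / np.sqrt(mean_monotrait(Xa) * mean_monotrait(Xb))
```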

Fornell-Larcker Criterion

What it measures: Discriminant validity by comparing the square root of each construct's AVE against its correlations with other constructs. If a construct shares more variance with its own items than with another construct, they are discriminant.
How it’s calculated: For each construct pair (A, B): check whether √AVE(A) > |r(A,B)| and √AVE(B) > |r(A,B)|. Both conditions must hold for discriminant validity.
Thresholds: Pass: √AVE > inter-construct correlation. Fail: the constructs may overlap too much. Fornell-Larcker has been criticized as overly lenient; HTMT is now preferred. When they disagree, HTMT takes precedence.
TABS result: B-R: passes (√AVE .537/.632 > |r| .381). B-M: passes (.537/.702 > .316). R-M: fails (√AVE .632 < r .719). The R-M failure is expected given their shared "organizational capability" dimension and is well-documented in the CRP.
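The pairwise check is a two-condition comparison; a sketch, exercised below with the AVE and correlation values reported for TABS:

```python
from math import sqrt

def fornell_larcker(ave_a, ave_b, r_ab):
    """True when both constructs' sqrt(AVE) exceed their correlation."""
    return sqrt(ave_a) > abs(r_ab) and sqrt(ave_b) > abs(r_ab)
```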

Normality

Shapiro-Wilk Test

What it measures: Whether a variable follows a normal distribution. A significant result (p < .05) means the distribution departs significantly from normality.
How it’s calculated: W = (Σaᵢx₍ᵢ₎)² / Σ(xᵢ − x̄)², where x₍ᵢ₎ are the ordered values and aᵢ are tabulated coefficients. W ranges from 0 to 1, with 1 indicating perfect normality.
Thresholds: p > .05 suggests normality. In large samples (N > 100), even trivial departures from normality produce significant results, so practical significance (skewness, kurtosis) matters more than p-values.
TABS result: For Likert-type items at N=200, significant non-normality by Shapiro-Wilk is typical. ML estimation is robust to moderate non-normality at this sample size, so practical indicators (skewness, kurtosis) are more informative than p-values. Item-level Shapiro-Wilk results are not exported in the current validation summary.

Skewness and Kurtosis

What it measures: Skewness measures distributional asymmetry (0 = symmetric). Kurtosis measures tail heaviness relative to a normal distribution (0 = normal, positive = heavy tails, negative = light tails).
How it’s calculated: Skewness = E[(X−μ)³] / σ³. Kurtosis (excess) = E[(X−μ)⁴] / σ⁴ − 3. Both are standardized moments of the distribution.
Thresholds: |Skewness| < 2.0 and |Kurtosis| < 7.0 are considered acceptable for ML estimation (Curran et al., 1996).
TABS result: Item-level skewness and kurtosis distributions for the CRP-200 dataset are not exported in the current validation summary.
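Both moments are direct translations of the formulas above into pure Python (population moments; sample-adjusted versions used by some packages differ slightly):

```python
def skew_kurtosis(xs):
    """Skewness and excess kurtosis: third and fourth standardized moments."""
    n = len(xs)
    m = sum(xs) / n
    sd = (sum((x - m) ** 2 for x in xs) / n) ** 0.5
    skew = sum((x - m) ** 3 for x in xs) / n / sd ** 3
    kurt = sum((x - m) ** 4 for x in xs) / n / sd ** 4 - 3  # excess kurtosis
    return skew, kurt
```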

CFA Fit Indices

Comparative Fit Index (CFI)

What it measures: How much better the hypothesized model fits compared to a null (independence) model where all items are uncorrelated. Ranges from 0 to 1.
How it’s calculated: CFI = 1 − max((χ²ₘ − dfₘ), 0) / max((χ²₀ − df₀), (χ²ₘ − dfₘ), 0), where m = proposed model, 0 = null model.
Thresholds: ≥ .95 good fit, ≥ .90 acceptable fit (Hu & Bentler, 1999).
TABS result: Maturity CFI = .981 (excellent). Readiness CFI = .930 (acceptable). Barriers single-factor CFI = .766 (poor, expected for multi-dimensional construct).

Tucker-Lewis Index (TLI)

What it measures: Similar to CFI but penalizes model complexity. Values can exceed 1.0 or fall below 0. More conservative than CFI for models with many parameters.
How it’s calculated: TLI = ((χ²₀/df₀) − (χ²ₘ/dfₘ)) / ((χ²₀/df₀) − 1).
Thresholds: ≥ .95 good fit, ≥ .90 acceptable (same as CFI).
TABS result: Maturity TLI = .973 (excellent). Readiness TLI = .920 (acceptable). Barriers TLI = .735 (poor).

Root Mean Square Error of Approximation (RMSEA)

What it measures: The average amount of misfit per degree of freedom. It estimates how well the model fits the population covariance matrix (not just the sample). Lower is better.
How it’s calculated: RMSEA = √(max((χ²ₘ − dfₘ) / (dfₘ × (N−1)), 0)).
Thresholds: ≤ .06 good, ≤ .08 acceptable, > .10 poor (Browne & Cudeck, 1993).
TABS result: Maturity RMSEA = .057 (good). Readiness RMSEA = .063 (acceptable). Barriers RMSEA = .097 (borderline poor).
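CFI, TLI, and RMSEA all derive from the model and null-model chi-squared values, so they can be computed together. A sketch from the formulas above (the chi-squared inputs in the test are hypothetical illustration values, not TABS output):

```python
from math import sqrt

def fit_indices(chi2_m, df_m, chi2_0, df_0, n):
    """CFI, TLI, RMSEA from model (m) and null (0) chi-squared fits."""
    cfi = 1 - max(chi2_m - df_m, 0) / max(chi2_0 - df_0, chi2_m - df_m, 0)
    tli = ((chi2_0 / df_0) - (chi2_m / df_m)) / ((chi2_0 / df_0) - 1)
    rmsea = sqrt(max((chi2_m - df_m) / (df_m * (n - 1)), 0))
    return cfi, tli, rmsea
```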

Standardized Root Mean Square Residual (SRMR)

What it measures: The average discrepancy between the observed and model-implied correlation matrices, standardized to a 0-1 scale. Lower values indicate the model reproduces the correlations more faithfully.
How it’s calculated: SRMR = √(ΣΣ(sᵢⱼ − σ̂ᵢⱼ)² / (k(k+1)/2)), where s and σ̂ are observed and predicted correlations.
Thresholds: ≤ .08 good fit (Hu & Bentler, 1999). SRMR is less sensitive to sample size than chi-squared tests.
TABS result: SRMR is not always available from all estimation methods. When computed, all TABS constructs fall within acceptable ranges.

Key References

Browne, M.W. & Cudeck, R. (1993). Alternative ways of assessing model fit. In K.A. Bollen & J.S. Long (Eds.), Testing Structural Equation Models (pp. 136-162). Sage.

Clark, L.A. & Watson, D. (1995). Constructing validity: Basic issues in objective scale development. Psychological Assessment, 7(3), 309-319.

Curran, P.J., West, S.G. & Finch, J.F. (1996). The robustness of test statistics to nonnormality and specification error. Psychological Methods, 1(1), 16-29.

Feldt, L.S. (1965). The approximate sampling distribution of Kuder-Richardson reliability coefficient twenty. Psychometrika, 30, 357-370.

Fornell, C. & Larcker, D.F. (1981). Evaluating structural equation models with unobservable variables and measurement error. Journal of Marketing Research, 18(1), 39-50.

Henseler, J., Ringle, C.M. & Sarstedt, M. (2015). A new criterion for assessing discriminant validity in variance-based SEM. Journal of the Academy of Marketing Science, 43(1), 115-135.

Hu, L. & Bentler, P.M. (1999). Cutoff criteria for fit indexes in covariance structure analysis. Structural Equation Modeling, 6(1), 1-55.

Kaiser, H.F. (1974). An index of factorial simplicity. Psychometrika, 39, 31-36.

Tabachnick, B.G. & Fidell, L.S. (2013). Using Multivariate Statistics (6th ed.). Pearson.