PART 1: DATA ANALYTICS FOUNDATION

Inferential Statistics

Inferential Statistics covers parametric and non-parametric methods for hypothesis testing and data-driven decision-making. It addresses estimation, t-tests, ANOVA, and their non-parametric counterparts, so that analyses remain valid when data deviate from normality assumptions. All techniques are implemented in Python, with emphasis on reliable analysis and clear communication of statistical results.

Introduction to Statistical Inference

Learning Objectives

Explain foundational statistical inference terminology (parameter, estimator, estimate) and differentiate point vs. interval estimation to build core inferential skills.

Indicative Content

Definition and importance of statistical inference
Key terms: Variable, Population, Sample, Statistical Distribution, Factor, Descriptive Statistics
Parameter vs. estimator vs. estimate
Confidence intervals with scipy.stats.t.interval()
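
As an illustration of point versus interval estimation, the following minimal sketch computes a sample mean (point estimate) and a 95% confidence interval with scipy.stats.t.interval(). The sample values are made up for illustration and are not course data.

```python
import numpy as np
from scipy import stats

# Hypothetical sample: made-up measurements for illustration only.
sample = np.array([4.8, 5.1, 5.3, 4.9, 5.0, 5.4, 4.7, 5.2])

n = sample.size
mean = sample.mean()       # point estimate of the population mean
sem = stats.sem(sample)    # standard error of the mean (ddof=1 by default)

# 95% confidence interval from the t-distribution with n - 1 degrees of freedom.
lower, upper = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)

print(f"point estimate: {mean:.3f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

The estimator here is the sample-mean formula, the estimate is the number it produces for this sample, and the interval expresses the uncertainty around that estimate.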

Hypothesis Testing

Learning Objectives

Construct null and alternative hypotheses and evaluate test statistics, errors, and p-values to guide data-driven decisions.

Indicative Content

Null Hypothesis (H₀) vs. Alternative Hypothesis (H₁)
Test statistic and rejection region
Errors in hypothesis testing (Type I and II)
p-value concept and interpretation
scipy.stats.ttest_1samp() for hypothesis testing
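
The sketch below ties the listed concepts together for a two-sided test: it computes the test statistic, the rejection region, and the p-value, and shows that the two decision rules agree. The sample, the hypothesized mean mu0, and the significance level alpha are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical sample (made-up values) and hypothesized mean, for illustration only.
sample = np.array([5.4, 5.1, 5.6, 5.3, 5.8, 5.2, 5.5, 5.7])
mu0 = 5.0      # H0: population mean equals mu0; H1: it differs
alpha = 0.05   # accepted Type I error rate

n = sample.size
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu0)

# Same test statistic computed by hand: (sample mean - mu0) / standard error.
t_manual = (sample.mean() - mu0) / stats.sem(sample)

# Two-sided rejection region: |t| greater than the critical value.
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)

reject_by_p = bool(p_value < alpha)
reject_by_region = bool(abs(t_stat) > t_crit)

print(f"t = {t_stat:.3f}, critical value = {t_crit:.3f}, p = {p_value:.4f}")
print(f"reject H0: {reject_by_p}")
```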

Normality Assessment

Learning Objectives

Apply graphical (Q-Q plots) and numerical tests (Shapiro-Wilk, Kolmogorov-Smirnov) to determine if data meet normality assumptions for valid parametric testing.

Indicative Content

Q-Q Plot: scipy.stats.probplot()
Shapiro-Wilk Test: scipy.stats.shapiro()
Kolmogorov-Smirnov Test: scipy.stats.kstest()
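
A minimal sketch of the three checks on simulated data follows. The data are generated with a fixed seed purely for illustration. Note one assumption in the Kolmogorov-Smirnov call: the normal parameters are estimated from the same data, which tends to make the reported p-value conservative.

```python
import numpy as np
from scipy import stats

# Simulated, illustrative data: 200 draws from a normal distribution.
rng = np.random.default_rng(42)
data = rng.normal(loc=10.0, scale=2.0, size=200)

# Q-Q plot coordinates: theoretical quantiles (osm) vs. ordered sample values (osr).
# r is the correlation of the points with the fitted straight line.
(osm, osr), (slope, intercept, r) = stats.probplot(data, dist="norm")

# Shapiro-Wilk test: H0 is that the data come from a normal distribution.
shapiro_stat, shapiro_p = stats.shapiro(data)

# Kolmogorov-Smirnov test against a normal with parameters estimated from the data.
ks_stat, ks_p = stats.kstest(data, "norm", args=(data.mean(), data.std(ddof=1)))

print(f"Q-Q line correlation r = {r:.4f}")
print(f"Shapiro-Wilk p = {shapiro_p:.4f}")
print(f"Kolmogorov-Smirnov p = {ks_p:.4f}")
```

To draw the Q-Q plot itself, probplot() also accepts a plot argument such as matplotlib.pyplot.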

T-Distribution and Degrees of Freedom

Learning Objectives

Recognize the properties of the t-distribution, compare it to the normal distribution, and articulate how degrees of freedom impact parametric tests.

Indicative Content

Fundamentals of the t-distribution
Degrees of freedom (df)
Evaluating densities with scipy.stats.t.pdf() and computing probabilities with scipy.stats.t.cdf()
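
The following sketch shows how degrees of freedom shape the t-distribution. It evaluates the density and the upper-tail probability at one point for several df values and compares them with the standard normal. The point x = 2.0 and the df values are arbitrary choices for illustration.

```python
from scipy import stats

x = 2.0
tails = {}
for df in (2, 10, 30):
    density = stats.t.pdf(x, df)     # height of the density curve at x
    tails[df] = stats.t.sf(x, df)    # upper-tail probability P(T > x) = 1 - cdf
    print(f"df={df:>2}: pdf={density:.4f}, P(T > {x}) = {tails[df]:.4f}")

# Standard normal tail for comparison; the t tail approaches it as df grows.
normal_tail = stats.norm.sf(x)
print(f"normal: P(Z > {x}) = {normal_tail:.4f}")
```

Note that pdf() returns a density, not a probability; probabilities come from cdf() or sf().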

One-Sample T-Test

Learning Objectives

Conduct and interpret a one-sample t-test to compare a sample mean with a hypothesized population mean.

Indicative Content

Practical usage of one-sample t-tests
scipy.stats.ttest_1samp()
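
In practice the research question is often directional. The sketch below, using a made-up delivery-time scenario, compares the default two-sided test with a one-sided test via the alternative keyword (available in SciPy 1.6 and later).

```python
import numpy as np
from scipy import stats

# Hypothetical scenario with made-up data: do delivery times exceed a 50-minute target?
times = np.array([52, 55, 49, 58, 54, 56, 53, 57], dtype=float)
target = 50.0

# Two-sided test: H1 is that the mean differs from the target.
two_sided = stats.ttest_1samp(times, popmean=target)

# One-sided test: H1 is that the mean is greater than the target.
one_sided = stats.ttest_1samp(times, popmean=target, alternative="greater")

print(f"t = {two_sided.statistic:.3f}")
print(f"two-sided p = {two_sided.pvalue:.4f}")
print(f"one-sided p = {one_sided.pvalue:.4f}")
```

Because the sample mean lies above the target, the one-sided p-value is half the two-sided one.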

Independent Samples T-Test

Learning Objectives

Evaluate mean differences between two groups using independent samples t-tests, after verifying the normality and equal-variance assumptions.

Indicative Content

Key assumptions (random sampling, normality, equal variance)
scipy.stats.ttest_ind() for mean comparisons
Checking variance equality with scipy.stats.levene()
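
One possible workflow, sketched here on made-up data, checks variance equality with Levene's test and then passes the outcome to ttest_ind(). Using the 0.05 threshold to switch between Student's and Welch's t-test is an assumption of this sketch, not a fixed rule.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for two independent groups (made-up values).
group_a = np.array([23.1, 25.4, 22.8, 26.0, 24.3, 23.7, 25.1, 24.8])
group_b = np.array([26.9, 28.2, 25.7, 29.4, 27.5, 28.8, 26.3, 27.9])

# Levene's test: H0 is that the groups have equal variances.
levene_stat, levene_p = stats.levene(group_a, group_b)
equal_var = bool(levene_p >= 0.05)

# equal_var=True gives Student's t-test; equal_var=False gives Welch's t-test.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=equal_var)

print(f"Levene p = {levene_p:.4f}, equal variances assumed: {equal_var}")
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```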

Paired T-Test

Learning Objectives

Analyze mean differences for paired data (same subjects, two conditions/time points) using paired t-tests.

Indicative Content

Considerations for paired data
scipy.stats.ttest_rel()
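
A paired t-test is equivalent to a one-sample t-test on the within-subject differences. The sketch below demonstrates this on made-up before/after scores for the same eight subjects.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for the same subjects under two conditions (made-up values).
before = np.array([72, 68, 75, 80, 66, 77, 71, 69], dtype=float)
after = np.array([70, 65, 74, 76, 66, 73, 70, 68], dtype=float)

# Paired t-test on the two related samples.
t_rel, p_rel = stats.ttest_rel(before, after)

# Equivalent formulation: one-sample t-test of the differences against zero.
differences = before - after
t_diff, p_diff = stats.ttest_1samp(differences, popmean=0.0)

print(f"paired:      t = {t_rel:.3f}, p = {p_rel:.4f}")
print(f"differences: t = {t_diff:.3f}, p = {p_diff:.4f}")
```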

T-Test for Correlation

Learning Objectives

Assess the statistical significance of correlation between two continuous variables with a t-test approach.

Indicative Content

Pearson correlation analysis
scipy.stats.pearsonr()
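
The t-test for correlation uses the statistic t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The sketch below, on made-up data, computes it by hand and compares the resulting p-value with the one reported by pearsonr().

```python
import numpy as np
from scipy import stats

# Hypothetical paired observations of two continuous variables (made-up values).
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([3.1, 2.4, 4.0, 3.6, 5.2, 4.1, 5.9, 4.8, 6.5, 5.5])

n = x.size
r, p_value = stats.pearsonr(x, y)

# t statistic for H0: population correlation is zero, with n - 2 df.
t_stat = r * np.sqrt((n - 2) / (1 - r**2))
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print(f"r = {r:.3f}, t = {t_stat:.3f}")
print(f"p (pearsonr) = {p_value:.5f}, p (t-test) = {p_manual:.5f}")
```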

F-Test for Equality of Variances

Learning Objectives

Determine variance equality between two populations, a prerequisite for many parametric methods.

Indicative Content

Concept of F-tests
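Variance-ratio F statistic evaluated against scipy.stats.f (f.cdf(), f.sf()); note that scipy.stats.f_oneway() is the ANOVA F-test of means, not a test of variances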
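
To our knowledge SciPy does not provide a dedicated two-sample variance-ratio F-test, so a minimal sketch computes the ratio of sample variances and evaluates it against the F distribution. The data are made up, and doubling the smaller tail to get a two-sided p-value is one common convention, assumed here.

```python
import numpy as np
from scipy import stats

# Hypothetical samples from two populations (made-up values).
sample_1 = np.array([10.2, 9.8, 11.5, 10.9, 9.4, 12.1, 10.0, 11.2])
sample_2 = np.array([10.1, 10.4, 9.9, 10.6, 10.2, 10.0, 10.5, 10.3])

# Unbiased sample variances (ddof=1).
var_1 = sample_1.var(ddof=1)
var_2 = sample_2.var(ddof=1)

# F statistic: ratio of the two sample variances.
f_stat = var_1 / var_2
df1, df2 = sample_1.size - 1, sample_2.size - 1

# Two-sided p-value: twice the smaller tail probability, capped at 1.
p_one_tail = min(stats.f.cdf(f_stat, df1, df2), stats.f.sf(f_stat, df1, df2))
p_value = min(1.0, 2 * p_one_tail)

print(f"F = {f_stat:.3f}, df = ({df1}, {df2}), p = {p_value:.4f}")
```

This F-test is sensitive to departures from normality; scipy.stats.levene() is a more robust alternative.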

One-Way ANOVA

Learning Objectives

Compare three or more group means under normality and homogeneity of variances using one-way ANOVA.

Indicative Content

Concept and assumptions of one-way ANOVA
scipy.stats.f_oneway()
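
The sketch below runs a one-way ANOVA on three made-up groups and reproduces the F statistic by hand as the ratio of between-group to within-group mean squares.

```python
import numpy as np
from scipy import stats

# Hypothetical observations for three independent groups (made-up values).
groups = [
    np.array([18.2, 20.1, 19.5, 21.0, 18.8]),
    np.array([22.4, 23.1, 21.7, 24.0, 22.9]),
    np.array([19.9, 21.2, 20.4, 22.1, 20.8]),
]

f_stat, p_value = stats.f_oneway(*groups)

# Manual computation of the same F statistic.
all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k, n_total = len(groups), all_values.size

ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))

print(f"F (scipy) = {f_stat:.3f}, F (manual) = {f_manual:.3f}, p = {p_value:.5f}")
```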

Two-Way ANOVA

Learning Objectives

Incorporate two independent variables into ANOVA, analyzing main and interaction effects on group means.

Indicative Content

Main vs. interaction effects
statsmodels.formula.api.ols() and statsmodels.api.stats.anova_lm()

Multi-Way ANOVA

Learning Objectives

Extend ANOVA to three or more independent variables, exploring multifaceted factor interactions.

Indicative Content

Multi-factor study designs
statsmodels.formula.api.ols() with statsmodels.api.stats.anova_lm() for multi-way ANOVA

Introduction to Non-Parametric Tests

Learning Objectives

Explain the rationale for non-parametric tests when parametric assumptions fail and weigh the trade-offs between these approaches.

Indicative Content

Definition of distribution-free methods
Conditions under which parametric assumptions break down
Power considerations in parametric vs. non-parametric tests
Python equivalents of the corresponding R functions

Mann-Whitney U Test (Two Independent Groups)

Learning Objectives

Apply rank-based methods to compare two independent groups under non-normal or ordinal data conditions.

Indicative Content

Ranking combined samples and calculating sum of ranks
Computing the U test statistic
scipy.stats.mannwhitneyu() for implementation in Python
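
The ranking steps above can be sketched as follows on made-up data. The manual U statistic for the first group is its rank sum minus n1(n1 + 1)/2; recent SciPy versions report the statistic for the first sample, which is assumed here.

```python
import numpy as np
from scipy import stats

# Hypothetical observations for two independent groups (made-up values, no ties).
group_a = np.array([12, 15, 11, 18, 14, 20], dtype=float)
group_b = np.array([22, 25, 17, 24, 19, 28], dtype=float)
n1 = group_a.size

# Rank the combined sample, then sum the ranks belonging to group_a.
ranks = stats.rankdata(np.concatenate([group_a, group_b]))
rank_sum_a = ranks[:n1].sum()

# U statistic for group_a.
u_manual = rank_sum_a - n1 * (n1 + 1) / 2

u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"rank sum (group_a) = {rank_sum_a}, U (manual) = {u_manual}")
print(f"U (scipy) = {u_stat}, p = {p_value:.4f}")
```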

Wilcoxon Signed Rank Test (Paired Samples)

Learning Objectives

Evaluate differences in paired observations using rank-based methods for non-normal data.

Indicative Content

Computing differences between paired observations
Ranking absolute differences and deriving the W statistic
scipy.stats.wilcoxon() for performing the test in Python
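
The steps above can be sketched on made-up paired data: compute the differences, rank their absolute values, and sum the ranks by sign. For the default two-sided test, scipy.stats.wilcoxon() reports the smaller of the two rank sums.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements on the same subjects (made-up values).
before = np.array([125, 130, 118, 140, 135, 128, 132, 121], dtype=float)
after = np.array([120, 126, 119, 131, 129, 130, 125, 113], dtype=float)

# Step 1: differences, discarding zeros.
diff = before - after
diff = diff[diff != 0]

# Step 2: rank the absolute differences.
ranks = stats.rankdata(np.abs(diff))

# Step 3: sum ranks of positive and negative differences; W is the smaller sum.
w_plus = ranks[diff > 0].sum()
w_minus = ranks[diff < 0].sum()
w_manual = min(w_plus, w_minus)

w_stat, p_value = stats.wilcoxon(before, after)

print(f"W+ = {w_plus}, W- = {w_minus}, W (manual) = {w_manual}")
print(f"W (scipy) = {w_stat}, p = {p_value:.4f}")
```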

Kruskal-Wallis Test (Multiple Independent Groups)

Learning Objectives

Compare three or more independent groups using a rank-based approach to detect distribution differences without normality assumptions.

Indicative Content

Ranking combined groups and deriving sum of ranks
Calculating the H test statistic
scipy.stats.kruskal() for multi-group comparisons
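
A minimal sketch of the H statistic on made-up data follows. With no tied values, H = 12 / (N(N + 1)) * sum(R_i^2 / n_i) - 3(N + 1), where R_i is the rank sum of group i; the tie correction that SciPy applies is then equal to 1, so the two results should agree.

```python
import numpy as np
from scipy import stats

# Hypothetical observations for three independent groups (made-up values, no ties).
groups = [
    np.array([27, 31, 29, 35, 33], dtype=float),
    np.array([40, 38, 45, 42, 37], dtype=float),
    np.array([30, 34, 28, 36, 32], dtype=float),
]

# Rank all observations together.
all_values = np.concatenate(groups)
ranks = stats.rankdata(all_values)
n_total = all_values.size

# Accumulate R_i^2 / n_i over the groups.
weighted_sum = 0.0
start = 0
for g in groups:
    rank_sum = ranks[start:start + g.size].sum()
    weighted_sum += rank_sum ** 2 / g.size
    start += g.size

h_manual = 12 / (n_total * (n_total + 1)) * weighted_sum - 3 * (n_total + 1)

h_stat, p_value = stats.kruskal(*groups)

print(f"H (manual) = {h_manual:.3f}, H (scipy) = {h_stat:.3f}, p = {p_value:.4f}")
```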

Chi-Square Test for Independence (Categorical Data)

Learning Objectives

Determine the relationship between two categorical variables by comparing observed vs. expected frequencies and testing for independence.

Indicative Content

Constructing a contingency table
Computing expected frequencies and the χ² statistic
pandas.crosstab() for contingency tables
scipy.stats.chi2_contingency() for Chi-Square Test in Python
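
The sketch below builds a contingency table with pandas.crosstab(), runs the test, and recomputes the expected frequencies by hand as (row total * column total) / grand total. The region and product categories and their counts are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical cell counts (made-up), expanded into one record per observation.
counts = {
    ("North", "A"): 20, ("North", "B"): 15, ("North", "C"): 15,
    ("South", "A"): 10, ("South", "B"): 25, ("South", "C"): 15,
}
records = [key for key, k in counts.items() for _ in range(k)]
df = pd.DataFrame(records, columns=["region", "product"])

# Contingency table of observed frequencies.
table = pd.crosstab(df["region"], df["product"])

# Chi-square test of independence.
chi2, p_value, dof, expected = stats.chi2_contingency(table)

# Expected frequencies under independence, computed by hand.
row_totals = table.sum(axis=1).to_numpy()
col_totals = table.sum(axis=0).to_numpy()
expected_manual = np.outer(row_totals, col_totals) / table.to_numpy().sum()

print(table)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.4f}")
```

For 2x2 tables, chi2_contingency() applies Yates' continuity correction by default; this 2x3 example is not affected.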

Tools and Methodologies

Python Data Environment
- pandas for data manipulation and constructing contingency tables (e.g., pandas.crosstab())
- numpy for foundational numeric operations (array management, mathematical functions)
Statistical Testing and Analysis
- scipy.stats for parametric tests (t-tests, F-tests, ANOVA, normality checks) and non-parametric tests (Mann-Whitney U, Wilcoxon, Kruskal-Wallis, Chi-Square)
- statsmodels.formula.api.ols() and statsmodels.api.stats.anova_lm() for one-way, two-way, and multi-way ANOVA analyses