PART 1: DATA ANALYTICS FOUNDATION

Inferential Statistics

Inferential Statistics covers parametric and non-parametric methods for hypothesis testing and data-driven decision-making. It addresses estimation, t-tests, ANOVA, and their non-parametric counterparts, including methods suited to data that violate normality assumptions. All techniques are implemented in Python (pandas, numpy, scipy.stats, statsmodels), supporting reproducible analyses and clear reporting of statistical results.

Introduction to Statistical Inference

Learning Objectives

Explain foundational statistical inference terminology (parameter, estimator, estimate) and differentiate point vs. interval estimation to build core inferential skills.

Indicative Content

  • Definition and importance of statistical inference

  • Key terms: Variable, Population, Sample, Statistical Distribution, Factor, Descriptive Statistics

  • Parameter vs. estimator vs. estimate

  • Confidence intervals with scipy.stats.t.interval()
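
A minimal sketch of point vs. interval estimation, assuming a small hypothetical sample (the numbers are illustrative only). The estimator x̄ applied to the sample gives the estimate of the parameter μ; scipy.stats.t.interval() then turns it into a 95% confidence interval using the standard error and n − 1 degrees of freedom.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of 10 measurements (illustrative data only)
sample = np.array([12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.0])

# Point estimate: the sample mean estimates the population mean (the parameter)
point_estimate = sample.mean()

# Interval estimate: 95% t-based confidence interval for the mean
n = len(sample)
df = n - 1                 # degrees of freedom
sem = stats.sem(sample)    # standard error of the mean (ddof=1 by default)
# The confidence level is passed positionally because its keyword name
# was changed from `alpha` to `confidence` in recent SciPy releases
ci_low, ci_high = stats.t.interval(0.95, df, loc=point_estimate, scale=sem)

print(f"Point estimate: {point_estimate:.3f}")
print(f"95% CI: ({ci_low:.3f}, {ci_high:.3f})")
```

The interval is symmetric around the point estimate, with half-width t₀.₉₇₅,ₙ₋₁ × SE.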

Hypothesis Testing

Learning Objectives

Construct null and alternative hypotheses and evaluate test statistics, errors, and p-values to guide data-driven decisions.

Indicative Content

  • Null Hypothesis (H₀) vs. Alternative Hypothesis (H₁)

  • Test statistic and rejection region

  • Errors in hypothesis testing (Type I and II)

  • p-value concept and interpretation

  • scipy.stats.ttest_1samp() for hypothesis testing
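
The sketch below, using hypothetical data, shows that the rejection-region rule (compare the test statistic with a critical value) and the p-value rule (compare p with α) reach the same decision. The one-sided alternative H₁: μ > 5.0 is an illustrative choice, and the `alternative` argument assumes SciPy 1.6 or later.

```python
import numpy as np
from scipy import stats

# Hypothetical processing times (seconds)
# H0: mu = 5.0   vs.   H1: mu > 5.0  (one-sided)
times = np.array([5.3, 5.1, 4.9, 5.6, 5.4, 5.2, 5.8, 5.0, 5.5, 5.3])
mu0 = 5.0
alpha = 0.05   # significance level = tolerated Type I error rate

# Test statistic and one-sided p-value
result = stats.ttest_1samp(times, popmean=mu0, alternative="greater")

# Rejection region for the upper-tailed test: reject H0 if t > t_crit
df = len(times) - 1
t_crit = stats.t.ppf(1 - alpha, df)

# The two decision rules are equivalent
reject_by_region = bool(result.statistic > t_crit)
reject_by_pvalue = bool(result.pvalue < alpha)
print(f"t = {result.statistic:.3f}, t_crit = {t_crit:.3f}, p = {result.pvalue:.4f}")
print("Reject H0" if reject_by_pvalue else "Fail to reject H0")
```

Lowering α shrinks the rejection region, which reduces Type I errors at the cost of more Type II errors.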

Normality Assessment

Learning Objectives

Apply graphical (Q-Q plots) and numerical tests (Shapiro-Wilk, Kolmogorov-Smirnov) to determine if data meet normality assumptions for valid parametric testing.

Indicative Content

  • Q-Q Plot: scipy.stats.probplot()

  • Shapiro-Wilk Test: scipy.stats.shapiro()

  • Kolmogorov-Smirnov Test: scipy.stats.kstest()
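
A minimal sketch applying all three checks to simulated normal data (the data and seed are illustrative assumptions). probplot() returns the Q-Q coordinates and a fitted line instead of drawing, so the correlation r can be inspected numerically; drawing the plot would additionally need matplotlib.

```python
import numpy as np
from scipy import stats

# Simulated, illustrative data drawn from a normal distribution
rng = np.random.default_rng(42)
data = rng.normal(loc=50, scale=5, size=200)

# Q-Q plot data: theoretical quantiles (osm) vs. ordered sample values (osr),
# plus a least-squares line; r near 1 means the points lie close to the line.
# Passing plot=matplotlib.pyplot (if installed) would draw the plot itself.
(osm, osr), (slope, intercept, r) = stats.probplot(data, dist="norm")

# Shapiro-Wilk test, H0: the data come from a normal distribution
shapiro_result = stats.shapiro(data)

# Kolmogorov-Smirnov test against the standard normal after standardizing.
# Caveat: estimating the mean and SD from the same data makes this p-value
# conservative; the Lilliefors variant corrects for that.
z = (data - data.mean()) / data.std(ddof=1)
ks_result = stats.kstest(z, "norm")

print(f"Q-Q fit r = {r:.4f}")
print(f"Shapiro-Wilk p = {shapiro_result.pvalue:.3f}")
print(f"K-S p = {ks_result.pvalue:.3f}")
```

A small p-value (e.g., below 0.05) is evidence against normality; a large one does not prove normality, so the Q-Q plot remains a useful complement.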

T-Distribution and Degrees of Freedom

Learning Objectives

Recognize the properties of the t-distribution, compare it to the normal distribution, and articulate how degrees of freedom impact parametric tests.

Indicative Content

  • Fundamentals of the t-distribution

  • Degrees of freedom (df)

  • Computing densities and probabilities with scipy.stats.t.pdf() and scipy.stats.t.cdf()
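
The sketch below contrasts the t-distribution with the standard normal for a few illustrative degrees of freedom. Note that t.pdf() returns a density (curve height), not a probability; probabilities come from t.cdf() or its complement t.sf().

```python
from scipy import stats

dfs = [2, 10, 30, 1000]

# pdf gives the density (curve height) at a point, not a probability
peak_density = {df: stats.t.pdf(0.0, df) for df in dfs}

# Probabilities come from the CDF / survival function, e.g. P(T > 2)
upper_tail = {df: stats.t.sf(2.0, df) for df in dfs}

# Probability of an interval: P(-2 < T < 2) = F(2) - F(-2)
central_prob = {df: stats.t.cdf(2.0, df) - stats.t.cdf(-2.0, df) for df in dfs}

# Standard normal for comparison
normal_peak = stats.norm.pdf(0.0)
normal_tail = stats.norm.sf(2.0)

for df in dfs:
    print(f"df={df:>4}: pdf(0)={peak_density[df]:.4f}  "
          f"P(T>2)={upper_tail[df]:.4f}  P(-2<T<2)={central_prob[df]:.4f}")
print(f"normal : pdf(0)={normal_peak:.4f}  P(Z>2)={normal_tail:.4f}")
```

With few degrees of freedom the t-distribution has a lower peak and heavier tails; as df grows it converges to the standard normal.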

One-Sample T-Test

Learning Objectives

Conduct and interpret a one-sample t-test to compare a sample mean with a hypothesized population mean.

Indicative Content

  • Practical usage of one-sample t-tests

  • scipy.stats.ttest_1samp()
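
As a check on what ttest_1samp() computes, the sketch below evaluates t = (x̄ − μ₀) / (s / √n) by hand on hypothetical fill-volume data (the 500 ml target is an illustrative assumption) and compares it with SciPy's two-sided result.

```python
import numpy as np
from scipy import stats

# Hypothetical fill volumes (ml); the label claims 500 ml
volumes = np.array([498.2, 501.1, 497.5, 499.0, 500.4, 496.8, 498.9, 499.6])
mu0 = 500.0

# Manual computation: t = (mean - mu0) / (s / sqrt(n)), df = n - 1
n = len(volumes)
t_manual = (volumes.mean() - mu0) / (volumes.std(ddof=1) / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)   # two-sided p-value

# Same test via SciPy (two-sided by default)
result = stats.ttest_1samp(volumes, popmean=mu0)

print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy : t = {result.statistic:.4f}, p = {result.pvalue:.4f}")
```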

Independent Samples T-Test

Learning Objectives

Evaluate mean differences between two independent groups using independent samples t-tests, after checking the normality and equal-variance assumptions.

Indicative Content

  • Key assumptions (random sampling, normality, equal variance)

  • scipy.stats.ttest_ind() for mean comparisons

  • Checking variance equality with scipy.stats.levene()
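
A minimal sketch of the workflow described above, using hypothetical scores: Levene's test informs the `equal_var` argument of ttest_ind(), selecting Student's t-test (equal variances) or Welch's t-test (unequal variances). The 0.05 cut-off is an illustrative choice.

```python
import numpy as np
from scipy import stats

# Hypothetical exam scores from two independent groups
group_a = np.array([78, 82, 85, 88, 75, 80, 84, 79, 81, 86])
group_b = np.array([72, 75, 78, 70, 74, 77, 73, 76, 71, 79])

# Levene's test, H0: equal variances (SciPy's default center='median'
# is the Brown-Forsythe variant)
levene_result = stats.levene(group_a, group_b)

# Student's t-test if equal variances are plausible, otherwise Welch's t-test
equal_var = bool(levene_result.pvalue > 0.05)
t_result = stats.ttest_ind(group_a, group_b, equal_var=equal_var)

print(f"Levene p = {levene_result.pvalue:.3f}, equal_var = {equal_var}")
print(f"t = {t_result.statistic:.3f}, p = {t_result.pvalue:.4f}")
```

Many analysts skip the pre-test and use Welch's test (equal_var=False) by default, since it remains valid when variances are equal.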

Paired T-Test

Learning Objectives

Analyze mean differences for paired data (same subjects, two conditions/time points) using paired t-tests.

Indicative Content

  • Considerations for paired data

  • scipy.stats.ttest_rel()
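
The key consideration for paired data is that the analysis works on within-subject differences. The sketch below, with hypothetical before/after measurements, shows that ttest_rel() gives the same result as a one-sample t-test of the differences against zero.

```python
import numpy as np
from scipy import stats

# Hypothetical blood-pressure readings for the same 8 subjects
before = np.array([140, 152, 138, 145, 150, 148, 142, 155])
after = np.array([135, 147, 136, 140, 146, 145, 139, 149])

# Paired t-test
paired = stats.ttest_rel(before, after)

# Equivalent: one-sample t-test on the paired differences with H0: mean diff = 0
diff_test = stats.ttest_1samp(before - after, popmean=0.0)

print(f"paired   : t = {paired.statistic:.4f}, p = {paired.pvalue:.5f}")
print(f"one-samp : t = {diff_test.statistic:.4f}, p = {diff_test.pvalue:.5f}")
```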

T-Test for Correlation

Learning Objectives

Assess the statistical significance of correlation between two continuous variables with a t-test approach.

Indicative Content

  • Pearson correlation analysis

  • scipy.stats.pearsonr()
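
The t-test for correlation uses t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The sketch below, on hypothetical data, computes this by hand and compares the p-value with the one reported by pearsonr().

```python
import numpy as np
from scipy import stats

# Hypothetical paired observations of two continuous variables
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([2.3, 2.9, 3.1, 4.8, 4.2, 5.9, 5.1, 7.2, 6.4, 7.9])

# Pearson correlation coefficient and two-sided p-value
r, p = stats.pearsonr(x, y)

# t-test for H0: rho = 0
n = len(x)
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print(f"r = {r:.4f}, t = {t_stat:.3f}, p (pearsonr) = {p:.2e}, p (manual) = {p_manual:.2e}")
```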

F-Test for Equality of Variances

Learning Objectives

Test whether two populations have equal variances, an assumption underlying many parametric methods.

Indicative Content

  • Concept of F-tests

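  • Computing the variance ratio F = s₁²/s₂² and its p-value with scipy.stats.f.cdf() / scipy.stats.f.sf() (scipy.stats.f_oneway() performs one-way ANOVA on means, not a test of variances)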
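
SciPy does not appear to provide a dedicated two-sample variance F-test, so the sketch below computes it directly from the F distribution. The helper name `variance_f_test` and the machine data are hypothetical. The test assumes normally distributed populations and is sensitive to departures from normality; scipy.stats.levene() and scipy.stats.bartlett() are common alternatives.

```python
import numpy as np
from scipy import stats


def variance_f_test(a, b):
    """Two-sided F-test for equal variances (hypothetical helper, not a SciPy function).

    Assumes both samples are independent and drawn from normal populations.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    f_stat = a.var(ddof=1) / b.var(ddof=1)      # ratio of sample variances
    df1, df2 = len(a) - 1, len(b) - 1
    # Two-sided p-value: twice the smaller tail area of F(df1, df2)
    p_value = 2 * min(stats.f.cdf(f_stat, df1, df2), stats.f.sf(f_stat, df1, df2))
    return f_stat, p_value


# Hypothetical measurements from two machines
machine_1 = [10.2, 9.8, 10.5, 10.1, 9.7, 10.4, 10.0, 9.9]
machine_2 = [10.8, 9.1, 10.9, 9.4, 11.2, 8.9, 10.6, 9.3]

f_stat, p_value = variance_f_test(machine_1, machine_2)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```

Swapping the two samples inverts F but leaves the two-sided p-value unchanged.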

One-Way ANOVA

Learning Objectives

Compare three or more group means under normality and homogeneity of variances using one-way ANOVA.

Indicative Content

  • Concept and assumptions of one-way ANOVA

  • scipy.stats.f_oneway()
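
To make the F statistic of one-way ANOVA concrete, the sketch below computes F = MS_between / MS_within by hand for three hypothetical groups and checks it against scipy.stats.f_oneway().

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for three independent groups
g1 = np.array([23.1, 25.4, 24.8, 22.9, 26.0])
g2 = np.array([27.2, 28.5, 26.9, 29.1, 27.8])
g3 = np.array([24.5, 23.8, 25.9, 24.1, 25.2])
groups = [g1, g2, g3]

result = stats.f_oneway(g1, g2, g3)

# Manual computation of the ANOVA F statistic
all_data = np.concatenate(groups)
grand_mean = all_data.mean()
k, N = len(groups), len(all_data)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
F_manual = (ss_between / (k - 1)) / (ss_within / (N - k))
p_manual = stats.f.sf(F_manual, k - 1, N - k)

print(f"F = {result.statistic:.3f} (manual {F_manual:.3f}), p = {result.pvalue:.2e}")
```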

Two-Way ANOVA

Learning Objectives

Incorporate two independent variables into ANOVA, analyzing main and interaction effects on group means.

Indicative Content

  • Main vs. interaction effects

  • statsmodels.formula.api.ols() and statsmodels.api.stats.anova_lm()

Multi-Way ANOVA

Learning Objectives

Extend ANOVA to three or more independent variables, exploring multifaceted factor interactions.

Indicative Content

  • Multi-factor study designs

  • statsmodels.formula.api.ols() and statsmodels.api.stats.anova_lm() for multi-way ANOVA

Introduction to Non-Parametric Tests

Learning Objectives

Explain the rationale for non-parametric tests when parametric assumptions fail and weigh the trade-offs between these approaches.

Indicative Content

  • Definition of distribution-free methods

  • Conditions under which parametric assumptions break down

  • Power considerations in parametric vs. non-parametric tests

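  • Python equivalents of common R functions (e.g., wilcox.test() → scipy.stats.mannwhitneyu() / scipy.stats.wilcoxon(), kruskal.test() → scipy.stats.kruskal(), chisq.test() → scipy.stats.chi2_contingency())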
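
One way to see what "distribution-free" means in practice: a rank-based test depends only on the ordering of the observations, so any strictly increasing transformation (here a log) leaves its result unchanged, whereas the t-test result changes. The skewed log-normal samples below are simulated for illustration.

```python
import numpy as np
from scipy import stats

# Simulated skewed (log-normal) samples where normality is doubtful
rng = np.random.default_rng(7)
x = rng.lognormal(mean=0.0, sigma=1.0, size=15)
y = rng.lognormal(mean=0.5, sigma=1.0, size=15)

# Rank-based test: depends only on the ordering of the values
mw_raw = stats.mannwhitneyu(x, y, alternative="two-sided")
mw_log = stats.mannwhitneyu(np.log(x), np.log(y), alternative="two-sided")

# Parametric t-test: depends on the actual values, so a transform changes it
t_raw = stats.ttest_ind(x, y)
t_log = stats.ttest_ind(np.log(x), np.log(y))

print(f"Mann-Whitney p: raw = {mw_raw.pvalue:.4f}, log = {mw_log.pvalue:.4f}")
print(f"t-test p      : raw = {t_raw.pvalue:.4f}, log = {t_log.pvalue:.4f}")
```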

Mann-Whitney U Test (Two Independent Groups)

Learning Objectives

Apply rank-based methods to compare two independent groups under non-normal or ordinal data conditions.

Indicative Content

  • Ranking combined samples and calculating sum of ranks

  • Computing the U test statistic

  • scipy.stats.mannwhitneyu() for implementation in Python
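
The three steps above can be sketched on two small hypothetical samples: rank the combined data, sum the ranks of the first group (R₁), and compute U₁ = R₁ − n₁(n₁ + 1)/2. The comparison assumes SciPy 1.7 or later, where mannwhitneyu() reports the U statistic for the first sample.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements from two independent groups
x = np.array([3.1, 4.5, 2.8, 5.2, 3.9, 4.1])
y = np.array([5.8, 6.2, 4.9, 7.1, 5.5, 6.6, 5.0])

# 1. Rank the combined sample (rankdata assigns average ranks to ties)
combined = np.concatenate([x, y])
ranks = stats.rankdata(combined)

# 2. Sum of ranks for the first group
n1, n2 = len(x), len(y)
R1 = ranks[:n1].sum()

# 3. U statistics for both groups (U1 + U2 = n1 * n2)
U1 = R1 - n1 * (n1 + 1) / 2
U2 = n1 * n2 - U1

result = stats.mannwhitneyu(x, y, alternative="two-sided")
print(f"R1 = {R1}, U1 = {U1}, U2 = {U2}, scipy U = {result.statistic}, p = {result.pvalue:.4f}")
```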

Wilcoxon Signed Rank Test (Paired Samples)

Learning Objectives

Evaluate differences in paired observations using rank-based methods for non-normal data.

Indicative Content

  • Computing differences between paired observations

  • Ranking absolute differences and deriving the W statistic

  • scipy.stats.wilcoxon() for performing the test in Python
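
The sketch below follows the listed steps on hypothetical before/after data chosen to avoid zero or tied differences: compute differences, rank their absolute values, and sum the ranks of positive (W⁺) and negative (W⁻) differences. For a two-sided test, scipy.stats.wilcoxon() reports the smaller of the two sums.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements (no zero or tied differences)
before = np.array([72, 68, 75, 80, 66, 70, 74, 69])
after = np.array([75, 67, 79, 86, 71, 72, 82, 62])

# 1. Differences between paired observations (zeros would be dropped by
#    SciPy's default zero_method='wilcox')
d = after - before
d = d[d != 0]

# 2. Rank the absolute differences
ranks = stats.rankdata(np.abs(d))

# 3. Signed rank sums
W_plus = ranks[d > 0].sum()
W_minus = ranks[d < 0].sum()

# wilcoxon(x, y) works on x - y; two-sided statistic = min(W+, W-)
result = stats.wilcoxon(after, before)
print(f"W+ = {W_plus}, W- = {W_minus}, scipy W = {result.statistic}, p = {result.pvalue:.4f}")
```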

Kruskal-Wallis Test (Multiple Independent Groups)

Learning Objectives

Compare three or more independent groups using a rank-based approach to detect distribution differences without normality assumptions.

Indicative Content

  • Ranking combined groups and deriving sum of ranks

  • Calculating the H test statistic

  • scipy.stats.kruskal() for multi-group comparisons
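
For data without ties, the Kruskal-Wallis statistic is H = 12/(N(N + 1)) Σ Rᵢ²/nᵢ − 3(N + 1), where Rᵢ is the rank sum of group i. The sketch below computes it for three hypothetical groups and compares it with scipy.stats.kruskal(), which additionally applies a tie correction (a no-op here since all values are distinct).

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for three independent groups (all values distinct)
g1 = np.array([12.1, 14.3, 11.8, 13.5, 12.9])
g2 = np.array([15.2, 16.8, 14.9, 17.1, 15.7])
g3 = np.array([13.1, 12.4, 14.0, 13.8, 12.2])
groups = [g1, g2, g3]

# Rank all observations together, then split the ranks back by group
combined = np.concatenate(groups)
ranks = stats.rankdata(combined)
sizes = [len(g) for g in groups]
rank_groups = np.split(ranks, np.cumsum(sizes)[:-1])

# H statistic from the rank sums
N = len(combined)
H = 12 / (N * (N + 1)) * sum(r.sum() ** 2 / len(r) for r in rank_groups) - 3 * (N + 1)

result = stats.kruskal(g1, g2, g3)
print(f"H (manual) = {H:.4f}, H (scipy) = {result.statistic:.4f}, p = {result.pvalue:.4f}")
```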

Chi-Square Test for Independence (Categorical Data)

Learning Objectives

Determine whether two categorical variables are associated by comparing observed with expected frequencies in a test of independence.

Indicative Content

  • Constructing a contingency table

  • Computing expected frequencies and the χ² statistic

  • pandas.crosstab() for contingency tables

  • scipy.stats.chi2_contingency() for Chi-Square Test in Python
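
A minimal end-to-end sketch with hypothetical survey data: build the contingency table with pandas.crosstab(), compute expected counts as (row total × column total) / grand total, and compare the resulting χ² with scipy.stats.chi2_contingency(). A 2 × 3 table is used because SciPy applies Yates' continuity correction only when there is a single degree of freedom.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical survey responses: region vs. preferred product
data = pd.DataFrame({
    "region": ["North"] * 30 + ["South"] * 30,
    "product": ["A"] * 12 + ["B"] * 10 + ["C"] * 8      # North
             + ["A"] * 6 + ["B"] * 9 + ["C"] * 15,       # South
})

# Contingency table of observed counts
observed = pd.crosstab(data["region"], data["product"])

# Expected counts under independence: row total * column total / grand total
obs = observed.to_numpy()
expected_manual = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()
chi2_manual = ((obs - expected_manual) ** 2 / expected_manual).sum()

# SciPy: statistic, p-value, degrees of freedom, expected counts
chi2, p, dof, expected = stats.chi2_contingency(observed)

print(observed)
print(f"chi2 = {chi2:.3f} (manual {chi2_manual:.3f}), dof = {dof}, p = {p:.4f}")
```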

Tools and Methodologies

  • Python Data Environment

    • pandas for data manipulation and constructing contingency tables (e.g., pandas.crosstab())

    • numpy for foundational numeric operations (array management, mathematical functions)

  • Statistical Testing and Analysis

    • scipy.stats for parametric tests (t-tests, F-tests, ANOVA, normality checks) and non-parametric tests (Mann-Whitney U, Wilcoxon, Kruskal-Wallis, Chi-Square)

    • statsmodels.formula.api.ols() and statsmodels.api.stats.anova_lm() for one-way, two-way, and multi-way ANOVA