cmu202

2021-02-20
2 min read

Premise

During my time as a teaching assistant for 36-202, the course moved away from conducting analysis in Minitab toward rudimentary analysis and visualizations in R. 36-202 is an introductory statistics course, typically completed in a student’s second semester, that builds upon the EDA and simple linear regression techniques explored in 36-200.

Process

Given the intended users, second-semester freshmen with little to no programming experience, I prioritized usability and simplicity. At the time I wrote cmu202, introductory Carnegie Mellon computer science courses were typically taught in Python. I therefore assumed that students who entered 36-202 with programming knowledge would most likely be familiar with Pythonic syntax, a stark contrast to the more rigid nature of R.

In consultation with Gordon Weinberg, Kayla Frisoli and Rebecca Nugent, we examined existing R functionality, identified what we felt introductory students could understand, and documented the functionality that would be needlessly difficult. The functionality implemented was as follows:

  • Data processing and loading: Loading data into R requires a rudimentary knowledge of file storage formats, relative and absolute paths, and read.XYZ functions. With cmu202, all needed datasets can be loaded with a call of data("DATASET_NAME"). As the course progressed, students gradually moved to read.csv and similar functions. Simple data-loading procedures reduce the cognitive load of data processing and let students focus on other skills within R.
  • Outlier Calculation: A secondary goal of the course is to familiarize students with basic R functionality. This function allows for easy $k \times \text{IQR}(\text{array})$ rule calculations. It provides a quick way to calculate a commonly used metric and can be combined with basic data manipulation techniques to enhance EDA displays.
  • Cross Means: This functionality implements a simple way to calculate a $2 \times 2$ table of means for 2-way ANOVA. Students struggled with aov functionality and attempted to minimize its use.
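The three pieces above can be sketched in base R. This is a minimal illustration rather than the package's actual code: the function names iqr_outliers and cross_means, their arguments, and the use of the built-in mtcars dataset are all assumptions made for the example.

```r
# Illustrative sketch only: these names are hypothetical, not the cmu202 API.

# Data loading: data() pulls a packaged dataset into the workspace by name,
# so no file paths or read.XYZ calls are needed. mtcars ships with base R.
data("mtcars")

# Outlier calculation: return the values outside the k * IQR fences.
iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]              # interquartile range
  lower <- q[1] - k * spread         # lower fence
  upper <- q[2] + k * spread         # upper fence
  x[!is.na(x) & (x < lower | x > upper)]
}

# Cross means: table of cell means for two grouping variables,
# computed without calling aov().
cross_means <- function(response, factor_a, factor_b) {
  tapply(response, list(factor_a, factor_b), mean)
}

scores <- c(10, 12, 11, 13, 12, 40)
iqr_outliers(scores)                 # 40 is the only value outside the fences [9, 15]

# 2 x 2 table of mean mpg by transmission (am) and engine shape (vs)
cross_means(mtcars$mpg, mtcars$am, mtcars$vs)
```

Returning the flagged values themselves, rather than a logical mask, keeps the output readable for students who have not yet learned subsetting; a mask would be the more flexible choice for filtering a data frame.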

Skills Used

  • R Package Development
  • Namespace Management
  • Function Documentation
  • Repository Management
  • Remote Installation: I installed updated versions of the package on a custom CMU Statistics R server for students to use
  • Pedagogy
  • Elementary statistical functionality

Resources
