Premise
During my time as a teaching assistant for 36-202, the course moved away from conducting analysis in Minitab to rudimentary analysis and visualization in R. 36-202 is an introductory statistics course, typically completed in a student’s second semester, that builds upon the EDA and simple linear regression techniques explored in 36-200.
Process
Given the intended users, second-semester freshmen with little to no programming experience, I prioritized usability and simplicity. At the time I wrote cmu202, introductory Carnegie Mellon computer science courses were typically taught in Python. As a result, I assumed that students who entered 36-202 with programming knowledge would most likely be familiar with Pythonic syntax, which contrasts sharply with the more rigid nature of R.
In consultation with Gordon Weinberg, Kayla Frisoli, and Rebecca Nugent, we examined the existing R functionality. We identified what we felt introductory students could understand and documented the functionality that would be needlessly difficult for them. The functionality implemented was as follows:
- Data processing and loading: Loading data into R requires a rudimentary knowledge of file storage formats, relative and absolute paths, and `read.XYZ` functions. All needed datasets can be loaded with a call of `data("DATASET_NAME")`. As the course progressed, students gradually moved to `read.csv` and similar functions. Simple data-loading procedures reduce the cognitive load of data processing and allow students to focus on other skills within R.
- Outlier Calculation: A secondary goal of the course is to familiarize students with basic R functionality. This function allows for easy $k \times \text{IQR}(x)$ rule calculations, which flag values below $Q_1 - k \times \text{IQR}$ or above $Q_3 + k \times \text{IQR}$. The functionality provides a quick way to calculate a commonly used metric and can be combined with basic data manipulation techniques to enhance EDA displays.
- Cross Means: This functionality implements a simple way to calculate a $2 \times 2$ table of means for 2-way ANOVA. Students struggled with `aov` functionality and attempted to minimize its use.
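The $k \times \text{IQR}$ rule can be sketched in base R as follows. This is a minimal illustration, not the cmu202 source. The names `iqr_outlier_bounds` and `is_outlier` are hypothetical, and the default $k = 1.5$ is the conventional choice, assumed here. Quartiles come from `quantile()` with R's default type 7, which is also what `IQR()` uses.

```r
# Hypothetical helpers illustrating the k x IQR rule; not the actual cmu202 API.
iqr_outlier_bounds <- function(x, k = 1.5) {
  # First and third quartiles (R's default quantile type 7, as used by IQR()).
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  c(lower = q[1] - k * spread, upper = q[2] + k * spread)
}

is_outlier <- function(x, k = 1.5) {
  b <- iqr_outlier_bounds(x, k)
  # TRUE where a value falls outside [Q1 - k * IQR, Q3 + k * IQR]
  x < b[["lower"]] | x > b[["upper"]]
}

x <- c(1, 2, 3, 4, 5, 100)
print(iqr_outlier_bounds(x))  # lower -1.5, upper 8.5
print(x[is_outlier(x)])       # 100
```

The logical vector returned by `is_outlier()` can be used directly for subsetting, for example `x[!is_outlier(x)]`. That is the kind of basic data manipulation that can feed into an EDA display.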
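A table of cell means like the one described for Cross Means can be produced in base R with `tapply()`. The sketch below is an illustration under stated assumptions rather than the package's code. `cross_means` is a hypothetical name. The built-in `mtcars` data stands in for course data, loaded through the same `data()` mechanism used for course datasets. Its binary variables `vs` and `am` serve as the two factors.

```r
# Hypothetical helper: a 2-way table of group means, not the actual cmu202 code.
cross_means <- function(response, factor1, factor2) {
  # tapply() splits `response` by every combination of the two grouping
  # variables and takes the mean in each cell; rows are factor1 levels,
  # columns are factor2 levels.
  tapply(response, list(factor1, factor2), mean)
}

data("mtcars")  # built-in dataset shipped with R's datasets package
tab <- cross_means(mtcars$mpg, mtcars$vs, mtcars$am)
print(round(tab, 2))

# The corresponding two-way ANOVA fit with interaction, for comparison.
fit <- aov(mpg ~ factor(vs) * factor(am), data = mtcars)
print(summary(fit))
```

Each cell is the mean of `mpg` within one `vs`/`am` combination. Comparing how the gap between columns changes across rows gives an informal look at interaction before reading the `aov` output.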
Skills Used
- R Package Development
- Namespace Management
- Function Documentation
- Repository Management
- Remote Installation: I installed updated versions of the package on a custom CMU Statistics R server for students to use
- Pedagogy
- Elementary statistical functionality