We use R, formally The R Project for Statistical Computing, for the statistical analysis of our data.
R is not just a statistics application but a full-fledged programming language, and users can add functionality by defining new functions. It is best thought of as an environment within which statistical techniques are implemented. R can also be extended via packages: about eight packages are supplied with the R distribution, and many more are available through the CRAN family of Internet sites, covering a very wide range of modern statistics.
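As a minimal sketch of both extension routes, the snippet below defines a new function and attaches a package that ships with R. The function name `coef_var` is hypothetical and chosen only for illustration; `splines` is used simply because it is one of the packages distributed with R, so no download is needed. Installing from CRAN is shown commented out because it requires network access.

```r
# Extend R by defining a new function: coefficient of variation (sd / mean).
# 'coef_var' is a hypothetical, illustrative name, not a built-in function.
coef_var <- function(x, na.rm = FALSE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

cv <- coef_var(c(2, 4, 4, 4, 5, 5, 7, 9))
print(cv)

# Extend R with a package: 'splines' is distributed with R itself.
library(splines)

# bs() from 'splines' builds a cubic B-spline basis; with df = 4 it has 4 columns.
basis <- bs(seq(0, 1, length.out = 20), df = 4)
print(dim(basis))

# Packages from CRAN are installed once, then attached with library():
# install.packages("ggplot2")
# library(ggplot2)
```

Once defined, `coef_var()` behaves like any other R function, which is the sense in which R is a programming environment rather than a fixed menu of procedures.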
RStudio is a free, open-source integrated development environment (IDE) for R. (You must install R before you can install RStudio.) Its interface is organized so that the user can view graphs, data tables, R code, and output at the same time. It also lets users import CSV, Excel, SAS (.sas7bdat), SPSS (.sav), and Stata (.dta) files into R without writing the import code themselves.
This introductory textbook does not assume any prior knowledge of programming or statistical analysis. It offers a comprehensive entry point to R programming both for novices and for more experienced analysts migrating from other computational software.
In this book, students gain experience with the whole data analysis pipeline. In particular, ModernDive is one of the few introductory statistics textbooks that teaches students how to wrangle data. It carefully walks students through each new function it presents and provides frequent reinforcement through the many Learning checks dispersed throughout the chapters.
Donovan notes that much of the content and structure of these tutorials is based on Hadley Wickham’s excellent book R for Data Science. For those who want more detail and some exercises for the techniques covered here, he recommends working through Wickham’s book (see link in “Advanced Texts” below).
This book assumes no knowledge of R. It is structured as a series of walk-through lessons that cover both the core ideas of data science and the concrete software skills that help translate those ideas into practice.
This is an R handbook for ESM (Evaluation, Statistics, and Methodology) students, prepared by faculty at the University of Tennessee, Knoxville. It takes students from installation and setup through data cleaning, analysis, visualization, and reporting. The guide uses survey data from the RStudio Learning R Survey, along with built-in R data sets and simulated data.
In early 2011, Danielle Navarro started teaching an introductory statistics class for psychology students at the University of Adelaide, using the R statistical package as the primary tool. The lecture notes for the class were expanded into a book that is freely available, and as of version 0.6 it is released under a Creative Commons licence (CC BY-SA 4.0).
Advanced R Solutions is a website that provides solutions to the exercises in Hadley Wickham’s Advanced R, 2nd edition.
The R Graphics Cookbook is a practical guide that provides more than 150 recipes to help you generate high-quality graphs quickly, without having to comb through all the details of R’s graphing systems. Each recipe tackles a specific problem with a solution you can apply to your own project, and includes a discussion of how and why the recipe works.
Principal component analysis, or PCA, is a dimensionality-reduction method often applied to large data sets: it transforms a large set of variables into a smaller set that still contains most of the information in the original. The new variables, called principal components, are uncorrelated linear combinations of the original variables, ordered so that the first captures the most variance.
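A minimal sketch of PCA in base R, assuming the built-in `iris` data set and the `prcomp()` function from the `stats` package. The variables are standardized first (`scale. = TRUE`) because the measurements are on different scales; keeping two components is an illustrative choice, not a general rule.

```r
# Run PCA on the four numeric iris measurements, centering and scaling each variable
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Cumulative proportion, used to decide how many components to keep
print(round(cumsum(var_explained), 3))

# Reduced data set: the scores on the first two components (150 rows x 2 columns)
reduced <- pca$x[, 1:2]
print(dim(reduced))

# The component scores are uncorrelated with each other
print(round(cor(pca$x[, 1], pca$x[, 2]), 10))
```

Replacing four correlated measurements with two uncorrelated component scores is the "smaller set of variables" described above; `summary(pca)` reports the same variance proportions in tabular form.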