Gaining familiarity with R #

Getting started #

One of the best resources for getting started with data science is the book R for Data Science by Hadley Wickham and co-authors. It’s available for free online and provides a great intro to the key concepts of a data science workflow, as well as pragmatic instructions for loading up RStudio and getting started.

One of the great things about the approach outlined in this book is that it gets you using some of the tools that make R great very early on. Specifically, it leverages the tidyverse, a collection of R packages that make data science a breeze. (A package is essentially a bundle of code that aims to accomplish a particular task or set of related tasks.)

Prerequisites #

If you think R is just the letter that comes after Q in the alphabet, you will clearly need to install R on your computer. This is outlined in the Intro of the R for Data Science book described above, but here’s what you have to do:

Install R
Install RStudio (an integrated development environment or IDE)
Install the R package tidyverse (this makes life enjoyable)

R and RStudio are available here. Just be sure to pick a version that works with your computer.
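
Once R and RStudio are installed, the third step can be run from the R console. This is a minimal sketch, assuming an internet connection and a reachable CRAN mirror (RStudio normally configures one for you); the `requireNamespace()` guard is just a convenience to avoid reinstalling the package on every run.

```r
# Install the tidyverse once per machine (downloads from CRAN)
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Load it at the start of each new R session
library(tidyverse)
```

Installing happens once, but `library(tidyverse)` needs to be called in every new session before the tidyverse functions are available.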

Data science lifecycle #

I’d recommend getting started with the “Whole game” section of the R for Data Science book. It works through a number of key points that apply to any data analysis workflow. The genomics analyses that we are striving to execute will differ from these workflows, but many of the principles are the same.
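
To give a flavor of what such a workflow looks like, here is a small sketch of a transform-then-visualize step. It is not taken from the book: it uses `mtcars`, a dataset that ships with R, as a stand-in for real data, and it assumes the dplyr and ggplot2 packages (both part of the tidyverse) are installed and that you are running R 4.1 or later for the `|>` pipe.

```r
library(dplyr)
library(ggplot2)

# Transform: summarise fuel economy by number of cylinders
summary_by_cyl <- mtcars |>
  group_by(cyl) |>
  summarise(
    mean_mpg = mean(mpg),  # average miles per gallon in each group
    n_cars   = n()         # number of cars in each group
  )

print(summary_by_cyl)

# Visualize: car weight against fuel economy, colored by cylinder count
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", color = "Cylinders")

print(p)
```

The same pattern of grouping, summarising, and plotting carries over to other tabular data; only the column names change.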

Complementary resources #

If you are completely new to programming, you might find Hands-On Programming with R to be useful for understanding some coding concepts.
If you are a self-proclaimed nerd and want to know more about the history of R as a programming language, its design philosophy, and its limitations, check out R Programming for Data Science, specifically the History and Overview of R section.