diff --git a/.gitignore b/.gitignore
index 6be4156b..2991a260 100644
--- a/.gitignore
+++ b/.gitignore
@@ -5,6 +5,7 @@
 # .Rprofile
 .DS_Store
 *.html
+*.swp
 /.quarto/
 _site/
diff --git a/content/bootcamp/r/class-01.qmd b/content/bootcamp/r/class-01.qmd
deleted file mode 100644
index ef983040..00000000
--- a/content/bootcamp/r/class-01.qmd
+++ /dev/null
@@ -1,328 +0,0 @@
----
-title: "class-01"
-author: "Sujatha Jagannathan"
-date: "8/24/2020"
----
-
-```{r include=FALSE}
-library(tidyverse)
-library(knitr)
-```
-
-### Contact Info
-Suja Jagannathan [sujatha.jagannathan@cuanschutz.edu](mailto:sujatha.jagannathan@cuanschutz.edu)
-
-### Office Hours
-Use https://calendly.com/molb7950 to schedule a time with a TA.
-
-
-
-### Learning Objectives for the R Bootcamp
-
-* Follow best coding practices (*class 1*)
-* Know the fundamentals of R programming (*class 1*)
-* Become familiar with "tidyverse" suite of packages
-  * tidyr: "Tidy" a messy dataset (*class 2*)
-  * dplyr: Transform data to derive new information (*class 3*)
-  * ggplot2: Visualize and communicate results (*class 4*)
-* Practice reproducible analysis using Rmarkdown (Rigor & Reproducibility) (*classes 1-5*)
-
-### Today's class outline - *class 1*
-
-* Coding best practices
-* Review R basics
-  * R vs Rstudio (Exercises #1-2)
-  * Functions & Arguments (Exercises #3-4)
-  * Data types (Exercise #5)
-  * Data structures (Exercises #6-7)
-  * R Packages (Exercise #8)
-* Review Rmarkdown (Exercise #9)
-* Rstudio cheatsheets (Exercise #10)
-
-### Coding best practices ###
-
-> "Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread."
-> --- Hadley Wickham
-
-### File Names
-
-* File names should be meaningful and end in `.R`, `.Rmd`, etc.
-* Avoid using special characters in file names - stick with numbers, letters, `-`, and `_`.
-* *Never* include spaces in file names!
-
-```{show-code}
-  ###### Good
-  fit_models.R
-  utility_functions.Rmd
-
-  ###### Bad
-  fit models.R
-  tmp.r
-  stuff.r
-```
-
-* If files should be run in a particular order, prefix them with numbers.
-* If it seems likely you'll have more than 10 files, left pad with zero.
-* It looks nice (constant width) and sorts nicely.
-
-```{show-code}
-  00_download.R
-  01_explore.R
-  ...
-  09_model.R
-  10_visualize.R
-```
-
-* Avoid capitalizing when not necessary.
-* If you want to include dates in your file name, use the ISO 8601 standard: `YYYY-MM-DD`
-* Use delimiters intentionally! (helps you to recover metadata easily from file names)
-* For example, "_" to delimit fields; "-" to delimit words
-
-```{show-code}
-2019-02-15_class1_data-wrangling.Rmd
-```
-
-* Avoid hard coding file names and instead use relative paths.
-* `~` represents the current working directory.
-* Use `getwd()` to figure out what your working directory is.
-
-```{show-code}
-###### Good
-"~/class1/code/test.R"
-
-###### Bad
-"/Users/sjaganna/Desktop/CU-onedrive/08-teaching/molb7910/class1/data.csv"
-```
-
-### Organisation
-
-* Try to give a file a concise name that evokes its contents
-* One way to organize your files is by grouping them into `data`, `code`, `plots`, etc.
-* For example, in **this class** we often use the following structure:
-
-```{show-code}
-  exercises
-    - exercises-01.Rmd
-  data
-  img
-  setup
-  ...
-```
-
-### Internal structure of code
-
-Use commented lines of `-` and `=` to break up your code chunk into easily readable
-segments. Or better yet, make each "action" it's own chunk and give it a name.
-
-```{show-code}
-# Load data ---------------------------
-
-# Plot data ---------------------------
-```
-
-### R Basics - Overview ###
-
-* R, Rstudio (Exercise #1)
-* R as a calculator (Exercise #2)
-* Functions and arguments (Exercises #3-4)
-* Data types: numeric, character, logical (& more) (Exercise #5)
-* Data structures: vector, list, matrix, data frame, tibbles (Exercises #6-7)
-* Package system, Rstudio, and Rmarkdown (Exercises #8-9)
-
-### R vs Rstudio - Exercise 1
-
-What is R? What is Rstudio?
-
-* R is a programming language used for statistical computing
-* RStudio is an integrated development environment (IDE) for R. It includes a console, terminal, syntax-highlighting editor that supports direct code execution, tools for plotting, history, workspace management, and much more.
-* You can use R without RStudio, but not the other way around.
-
-Let's do the following to explore Rstudio:
-
-* Look at Rstudio panels one at a time
-* Environment, History, Console, Terminal, Files, Plots, Packages, Help, etc.
-
-### R as a calculator - Exercise 2
-
-* R can function like an advanced calculator
-
-- try simple math
-```{r}
-2 + 3 * 5 # Note the order of operations.
-3 - 7 # value of 3-7
-3 / 2 # Division
-5^2 # 5 raised to the second power
-# This is a comment line
-```
-
-- assign a numeric value to an object
-```{r}
-num <- 5^2 # we just created an "object" num
-```
-
-- print the object to check
-```{r}
-num
-```
-
-- do a computation on the object
-```{r}
-num + 100
-```
-Note: Objects can be over-written. So be careful if you reuse names.
-
-### Functions and arguments - Exercise 3
-
-* Functions are fundamental building blocks of R
-* Most functions take one or more arguments and transform an input object in a specific way.
-* Tab completion is your friend!
-
-```{r}
-log
-?log
-log(4)
-log(4, base = 2)
-```
-
-### Writing a simple function - Exercise 4
-
-```{r}
-addtwo <- function(x) {
-  num <- x + 2
-  return(num)
-}
-
-addtwo(4)
-```
-
-```{r}
-f <- function(x, y) {
-  z <- 3 * x + 4 * y
-  return(z)
-}
-
-f(2, 3)
-```
-
-### Data types ###
-
-* There are many data types in R.
-* For this class, the most commonly used ones are **numeric**, **character**, and **logical**.
-* All these data types can be used to create vectors natively.
-
-### Data types - Exercise 5
-```{r}
-typeof(4) # numeric data time
-typeof("suja") # character data type
-typeof(TRUE) # logical data type
-typeof(as.character(TRUE)) # coercing one data type to another
-```
-
-### Data structures ###
-
-* R has multiple data structures.
-* Most of the time you will deal with tabular data sets, you will manipulate them, take sub-sections of them.
-* It is essential to know what are the common data structures in R and how they can be used.
-* R deals with named data structures, this means you can give names to data structures and manipulate or operate on them using those names.
-
-```{r, echo = FALSE, out.width= '100%'}
-knitr::include_graphics("img/data-structures.png")
-```
-Source: Devopedia
-
-### Tibbles
-* A __tibble__, or `tbl_df`, is a modern reimagining of the data.frame, keeping what time has proven to be effective, and throwing out what is not.
-* Tibbles are data.frames that are lazy and surly: they do less (i.e. they don't change variable names or types, and don't do partial matching) and complain more (e.g. when a variable does not exist).
-* This forces you to confront problems earlier, typically leading to cleaner, more expressive code. Tibbles also have an enhanced `print()` method which makes them easier to use with large datasets containing complex objects.
-* `tibble()` does much less than `data.frame()`:
-  - it never changes the type of the inputs
-  - it never changes the names of variables
-  - it never creates `row.names()`
-
-Source: [tibbles chapter](http://r4ds.had.co.nz/tibbles.html) in *R for data science*.
-
-### Vectors - Exercise 6
-
-- Vectors are one of the core R data structures.
-- It is basically a list of elements of the same type (numeric,character or logical).
-- Later you will see that every column of a table will be represented as a vector.
-- R handles vectors easily and intuitively.
-- The operations on vectors will propagate to all the elements of the vectors.
-
-Create the following vectors
-```{r}
-x <- c(1, 3, 2, 10, 5) # create a vector named x with 5 components
-# `c` is for combine
-# you could use '=' but I don't recommend it.
-y <- 1:5 # create a vector of consecutive integers y
-y + 2 # scalar addition
-2 * y # scalar multiplication
-y^2 # raise each component to the second power
-2^y # raise 2 to the first through fifth power
-y # y itself has not been unchanged
-y <- y * 2 # here, y is changed
-```
-
-### Data frames - Exercise 7
-
-- A data frame is more general than a matrix, in that different columns can have different modes (numeric, character, factor, etc.).
-- A data frame can be constructed by data.frame() function.
-- For example, we illustrate how to construct a data frame from genomic intervals or coordinates.
-
-Create a dataframe `mydata`
-```{r}
-chr <- c("chr1", "chr1", "chr2", "chr2")
-strand <- c("-", "-", "+", "+")
-start <- c(200, 4000, 100, 400)
-end <- c(250, 410, 200, 450)
-
-mydata.df <- data.frame(chr, strand, start, end) # creating dataframe
-mydata.df
-
-mydata.tbl <- tibble(chr, strand, start, end) # creating a tibble
-mydata.tbl
-```
-
-### R packages - Exercise 8
-
-* An R package is a collection of code, data, documentation, and tests that is easily sharable
-* A package often has a collection of custom functions that enable you to carry out a workflow. eg. DESeq for RNA-seq analysis
-* The most popular places to get R packages from are CRAN, Bioconductor, and Github.
-* Once a package is installed, one still has to "load" them into the environment using a `library()` call.
-
-Let's do the following to explore R packages
-* Look at the "Environment" panel in Rstudio
-* Explore Global Environment
-* Explore the contents of a package
-
-### Rmarkdown Exercise - Exercise 9
-
-* Rmarkdown is a fully reproducible authoring framework to create, collaborate, and communicate your work.
-* Rmarkdown supports a number of output formats including pdfs, word documents, slide shows, html, etc.
-* An Rmarkdown document is a plain text file with the extension `.Rmd` and contains the following basic components:
-  - An (optional) YAML header surrounded by ---s.
-  - Chunks of R code surrounded by ```.
-  - Text mixed with simple text formatting like # heading and _italics_.
-
-Let's do the following to explore Rmarkdown documents
-* Create a new .Rmd document
-* `knit` the document to see the output
-
-### Homework instructions
-
-* Today's homework is:
-  1) To go over everything we covered today and make sure you understand it. (Use office hours if you have questions)
-  Expected time spent: 30 min - 1 hour
-  2) Go over Rstudio and Rmarkdown cheatsheets (Finding cheatsheets: Exercise 10)
-
-  Expected time spent: 30 min on each cheatsheet
-
-### Acknowledgements
-
-The material for this class was heavily borrowed from:
-* Introduction to R by Altuna Akalin: http://compgenomr.github.io/book/introduction-to-r.html
-* R for data science by Hadley Wickham: https://r4ds.had.co.nz/index.html
-
-### Further Reading & Resources
-
-* R for data science https://r4ds.had.co.nz/index.html
-* Advanced R by Hadley Wickam https://adv-r.hadley.nz/
-* Installing R: https://cran.r-project.org/
-* Installing RStudio: https://rstudio.com/products/rstudio/download/
diff --git a/content/bootcamp/r/class-02.qmd b/content/bootcamp/r/class-02.qmd
index 29d3d6a4..55acfc95 100644
--- a/content/bootcamp/r/class-02.qmd
+++ b/content/bootcamp/r/class-02.qmd
@@ -7,6 +7,7 @@ date: "8/25/2020"
 ```{r include=FALSE}
 library(tidyverse)
 library(knitr)
+library(here)
 ```
 
 ### Contact Info
@@ -49,7 +50,7 @@ Use https://calendly.com/molb7950 to schedule a time with a TA.
 * 25 packages, total (as of today) - we will focus mainly on tidyr, dplyr, and ggplot2
 
 ```{r, echo = FALSE, out.width= '60%'}
-knitr::include_graphics("img/tidy1.png")
+knitr::include_graphics(here("img/tidy1.png"))
 ```
 Source: R for Data Science by Hadley Wickham
@@ -57,15 +58,15 @@ Source: R for Data Science by Hadley Wickham
 ### Data import - readr - Exercise 1
 
 ```{r, echo = FALSE, out.width= '70%'}
-knitr::include_graphics("img/readr.png")
+knitr::include_graphics(here("img/readr.png"))
 ```
 
-```{r, echo = FALSE, out.width= '80%'}
-knitr::include_graphics("img/readr-args.png")
+```{r, echo = FALSE, out.width= '80%', fig.cap='Source: RStudio cheatsheets'}
+knitr::include_graphics(here("img/readr-args.png"))
 ```
-Source: Rstudio cheatsheets
-
 Let's try importing a small dataset - Exercise # 1
+
 ```{r}
 getwd() # good to know which folder you are on since the path to file is relative
 # same as `pwd` in bash
@@ -84,15 +85,16 @@ __Note__: All of these functions can also be used in an interactive manner via `
 >
 > --- Hadley Wickham
 
-```{r, echo = FALSE, out.width= '100%'}
-knitr::include_graphics("img/tidydata.png")
+```{r, echo = FALSE, out.width= '100%', fig.cap='Source: Rstudio cheatsheets'}
+knitr::include_graphics(here("img/tidydata.png"))
 ```
-Source: Rstudio cheatsheets
 
 ### Datasets for today's class - Exercise 2
 
-* In this class, we will use the datasets that come with the tidyr package to explore all the functions provided by tidyr.
+* In this class, we will use the datasets that come with the tidyr package to explore all the functions provided by tidyr. 
+
 * Explore the contents of _tidyr_ package (Exercise #2)
+
 * `table1`, `table2`, `table3`, `table4a`, `table4b`, and `table5` all display the number of TB cases documented by the World Health Organization in Afghanistan, Brazil, and China between 1999 and 2000.
 
 ### Getting familiar with the data - Exercise 3
@@ -106,7 +108,8 @@ R provides many functions to examine features of a data object
 - `str()` - what is the structure of the object?
 - `attributes()` - does it have any metadata?
-* Let's explore table1
+* Let's explore `table1`
+
 ```{r}
 # View(table1) # to look at the table in Viewer
 table1 # to print the table to console
@@ -183,10 +186,10 @@ There are other verbs as well - as always, look at the `tidyr` cheatsheet!
 pivot_wider() "widens" data, increasing the number of columns and decreasing the number of rows.
 
 ```{r, echo = FALSE, out.width= '50%'}
-knitr::include_graphics("img/pivot_wider.png")
+knitr::include_graphics(here("img/pivot_wider.png"))
 ```
 
-```{show-code}
+```{r, eval=FALSE}
 pivot_wider(
   data,
   names_from = name,
@@ -222,10 +225,10 @@ table2_tidy
 pivot_longer() "lengthens" data, increasing the number of rows and decreasing the number of columns.
 
 ```{r, echo = FALSE, out.width= '50%'}
-knitr::include_graphics("img/pivot_longer.png")
+knitr::include_graphics(here("img/pivot_longer.png"))
 ```
 
-``` r
+```{r eval=FALSE}
 pivot_longer(
   data,
   cols,
@@ -250,10 +253,10 @@ table4_tidy
 Given either a regular expression or a vector of character positions, separate() turns a single character column into multiple columns.
 
 ```{r, echo = FALSE, out.width= '60%'}
-knitr::include_graphics("img/separate.png")
+knitr::include_graphics(here("img/separate.png"))
 ```
 
-```{show-code}
+```{r eval=FALSE}
 separate(
   data,
   col,
@@ -285,7 +288,7 @@ table3_tidy_1
 Given either a regular expression or a vector of character positions, separate() turns a single character column into multiple rows.
 
 ```{r, echo = FALSE, out.width= '50%'}
-knitr::include_graphics("img/separate_rows.png")
+knitr::include_graphics(here("img/separate_rows.png"))
 ```
 
 ```{show-code}
@@ -311,7 +314,7 @@ This is not a great example because in creating two rows, the case and populatio
 unite() combines multiple columns into a single column.
 ```{r, echo = FALSE, out.width= '50%'}
-knitr::include_graphics("img/unite.png")
+knitr::include_graphics(here("img/unite.png"))
 ```
 
 ```{show-code}
@@ -334,14 +337,14 @@ table6_tidy
 ### Handling missing values
 
 ```{r, echo = FALSE, out.width= '100%'}
-knitr::include_graphics("img/missing-values.png")
+knitr::include_graphics(here("img/missing-values.png"))
 ```
 Source: Rstudio cheatsheets
 
 ## Regular expressions
 
 ```{r, echo = FALSE, out.width= '80%'}
-knitr::include_graphics("img/regex.png")
+knitr::include_graphics(here("img/regex.png"))
 ```
 Source: Rstudio cheatsheets
diff --git a/content/bootcamp/r/class-03.qmd b/content/bootcamp/r/class-03.qmd
index bcd531d6..300cf5c9 100644
--- a/content/bootcamp/r/class-03.qmd
+++ b/content/bootcamp/r/class-03.qmd
@@ -7,6 +7,7 @@ date: "8/26/2020"
 ```{r include=FALSE}
 library(tidyverse)
 library(knitr)
+library(here)
 ```
 
 
@@ -166,7 +167,7 @@ select_if(starwars, is.numeric) # select_if to return all columns with numeric v
 Mutate has a LOT of variants.
 
 ```{r, echo = FALSE, out.width= '60%'}
-knitr::include_graphics("img/mutate.png")
+knitr::include_graphics(here("img/mutate.png"))
 ```
 Source: Rstudio cheatsheets
 
@@ -244,7 +245,7 @@ More information here: https://dplyr.tidyverse.org/articles/rowwise.html
 
 
 ```{r, echo = FALSE, out.width= '60%'}
-knitr::include_graphics("img/summarise.png")
+knitr::include_graphics(here("img/summarise.png"))
 ```
 
 
@@ -310,7 +311,7 @@ starwars %>%
 + `union()`
 
 ```{r, echo = FALSE, out.width= '100%'}
-knitr::include_graphics("img/combining-tables.png")
+knitr::include_graphics(here("img/combining-tables.png"))
 ```
 Source: Rstudio cheatsheets
 
diff --git a/content/bootcamp/r/class-04.qmd b/content/bootcamp/r/class-04.qmd
index 04b60547..db8fe0cb 100644
--- a/content/bootcamp/r/class-04.qmd
+++ b/content/bootcamp/r/class-04.qmd
@@ -12,6 +12,7 @@ library(cowplot) # to make panels of plots
 library(viridis) # nice colors!
 library(ggridges) # ridge plots
 library(hexbin) # hexplots
+library(here)
 ```
 
 ### Contact Info
@@ -81,14 +82,14 @@ _coordinate-system_: specify x and y variables
 _geometry_: specify type of plots - histogram, boxplot, line, density, dotplot, bar, etc.
 
 ```{r, echo = FALSE, out.width= '60%'}
-knitr::include_graphics("img/ggplot-syntax.png")
+knitr::include_graphics(here("img/ggplot-syntax.png"))
 ```
 
 __aesthetics__ can map variables in the data to visual properties of the geom (aesthetics) like size, color, and x and y locations to make the plot more information rich.
 ```{r, echo = FALSE, out.width= '60%'}
-knitr::include_graphics("img/ggplot-aesthetics.png")
+knitr::include_graphics(here("img/ggplot-aesthetics.png"))
 ```
 
 ### Making a plot step-by-step (Exercise 2)
@@ -187,7 +188,7 @@ You can. But the advantage of ggplot is that it is equally "simple" to make basi
 ## *Create more complex plots*
 
 ```{r, echo = FALSE, out.width= '50%'}
-knitr::include_graphics("img/ggplot-layers.png")
+knitr::include_graphics(here("img/ggplot-layers.png"))
 ```
 
 ### Geom function
@@ -199,7 +200,7 @@ knitr::include_graphics("img/ggplot-layers.png")
 ### Geom functions for one variable - Exercise 4
 
 ```{r, echo = FALSE, out.width= '60%'}
-knitr::include_graphics("img/ggplot-1variable.png")
+knitr::include_graphics(here("img/ggplot-1variable.png"))
 ```
 
 ```{r fig.height=3, fig.width=5}
@@ -258,7 +259,7 @@ With two variables, depending on the nature of the data, you can have different
 ### discrete x, continuous y - Exercise 5
 
 ```{r, echo = FALSE, out.width= '60%'}
-knitr::include_graphics("img/ggplot-geom-dx-cy.png")
+knitr::include_graphics(here("img/ggplot-geom-dx-cy.png"))
 ```
 
 ```{r fig.height=3, fig.width=5}
@@ -303,7 +304,7 @@ ggplot(
 ### continuous x, continuous y - Exercise 6
 
 ```{r, echo = FALSE, out.width= '60%'}
-knitr::include_graphics("img/ggplot-geom-cx-cy.png")
+knitr::include_graphics(here("img/ggplot-geom-cx-cy.png"))
 ```
 
 ```{r fig.height=3, fig.width=4}
@@ -344,7 +345,7 @@ ggplot(
 ### continuous bivariate - Exercise 7
 
 ```{r, echo = FALSE, out.width= '50%'}
-knitr::include_graphics("img/ggplot-geom-cont-bivar.png")
+knitr::include_graphics(here("img/ggplot-geom-cont-bivar.png"))
 ```
 
 ```{r fig.height=3, fig.width=3}
@@ -365,7 +366,7 @@ ggplot(
 ### Geom functions for three variables - Exercise 8
 
 ```{r, echo = FALSE, out.width= '100%'}
-knitr::include_graphics("img/ggplot-geom-3variables.png")
+knitr::include_graphics(here("img/ggplot-geom-3variables.png"))
 ```
 
 One example with geom_tile()
@@ -383,7 +384,7 @@ ggplot(
 R has 25 built in shapes that are identified by numbers. There are some seeming duplicates: for example, 0, 15, and 22 are all squares. The difference comes from the interaction of the colour and fill aesthetics. The hollow shapes (0–14) have a border determined by colour; the solid shapes (15–18) are filled with colour; the filled shapes (21–24) have a border of colour and are filled with fill.
 
 ```{r, echo = FALSE, out.width= '80%'}
-knitr::include_graphics("img/ggplot-shapes.png")
+knitr::include_graphics(here("img/ggplot-shapes.png"))
 ```
 
 ```{r fig.height=3, fig.width=4}
@@ -488,7 +489,7 @@ Facets divide a plot into subplots based on the values of one or more discrete v
 
 
 ```{r, echo = FALSE, out.width= '60%'}
-knitr::include_graphics("img/ggplot-facet.png")
+knitr::include_graphics(here("img/ggplot-facet.png"))
 ```
 
 ```{r fig.height=3, fig.width=7}
@@ -523,7 +524,7 @@ ggplot(
 Themes can significantly affect the appearance of your plot. Thanksfully, there are a lot to choose from.
 
 ```{r, echo = FALSE, out.width= '60%'}
-knitr::include_graphics("img/ggplot-themes.png")
+knitr::include_graphics(here("img/ggplot-themes.png"))
 ```
 
 ```{r fig.height=3, fig.width=4}
@@ -593,7 +594,7 @@ ggplot(
 ### Labels & Legends - Exercise 15
 
 ```{r, echo = FALSE, out.width= '50%'}
-knitr::include_graphics("img/ggplot-labels-legends.png")
+knitr::include_graphics(here("img/ggplot-labels-legends.png"))
 ```
 
 ```{r fig.height=3, fig.width=5}
@@ -658,7 +659,7 @@ More information on using plot_grid (from package `cowplot`) is [here](https://w
 ### Saving plots (Exercise 18)
 
 ```{r}
-ggsave("img/plot_final.png", width = 5, height = 5)
+ggsave(here("img/plot_final.png"), width = 5, height = 5)
 # Saves last plot as 5’ x 5’ file named "plot_final.png" in working directory. Matches file type to file extension
 ```