class: center, middle, inverse, title-slide .title[ # Introduction to Statistics: R intro ] .subtitle[ ## Neuroscience 2022-2023 ] .author[ ### Prof. Dr. Tim Friede
Dr. Andreas Leha
Renato Valladares Panaro, MSc ] .institute[ ###
University Medical Center Göttingen
Department of Medical Statistics
] .date[ ###
October 2022 ]

---

# Download R and RStudio

--

## R Installation

- Windows: go to <https://cran.r-project.org/bin/windows/base/>, download the .exe file, and install it on your machine.

- macOS: go to <https://cran.r-project.org/bin/macosx/> and download the file that matches the specifications of your machine.

- Linux: go to <https://cran.r-project.org/bin/linux/>, select your distribution, and follow the instructions to install R and its dependencies.

--

## The manual

- The R language is defined in [https://cran.r-project.org/doc/manuals/r-release/R-lang.html](https://cran.r-project.org/doc/manuals/r-release/R-lang.html).

---

# Install RStudio

Go to <https://www.rstudio.com/products/rstudio/download/#download> and download the installation file for your operating system.

--

<center>
<img src="./img/rstudioIDE.png" style="width: 95%" />
</center>

---

# RStudio panels

--

<center>
<img src="https://bookdown.org/ndphillips/YaRrr/images/RStudio_Screenshot_Labels.png" alt = "source: https://bookdown.org/ndphillips/YaRrr/images/RStudio_Screenshot_Labels.png" style="width: 95%" />
</center>

[https://bookdown.org/ndphillips/YaRrr/images/RStudio_Screenshot_Labels.png](https://bookdown.org/ndphillips/YaRrr/images/RStudio_Screenshot_Labels.png)

---

## Write an R script

.pull-left[

- Open the *File* menu and choose *New File* > *R Script*.

<center>
<img src="./img/Screenshot from 2022-09-20 23-33-30.png" style="width: 100%" />
</center>

- The **Untitled1** R script will appear in the source panel.

<center>
<img src="./img/Screenshot from 2022-10-03 17-11-27.png" style="width: 80%" />
</center>

]

.pull-right[

- Run a line of code using the `ctrl+ENTER` shortcut.

- The output appears in the console panel.

<center>
<img src="./img/Screenshot from 2022-10-03 17-11-35.png" style="width: 80%" />
</center>

- The source code is written and then saved locally using the .R extension (e.g. **mycode.R**).

]

---

# Ask R!

Use the *help()* command with an object name inside the parentheses.
```r
help(sum)
```

<center>
<img src="./img/Screenshot from 2022-09-20 15-13-19.png" style="width: 80%" />
</center>

---

## Schedule

Cover the basics of sections 2 and 3 of the R-lang [manual](https://cran.r-project.org/doc/manuals/r-release/R-lang.html).

### Section 2 - Objects:

--

- 1) Basic types
  - Vectors
  - Lists
  - Function objects
  - Environments
  - **NULL**

--

- 2) Attributes
  - Names
  - Dimensions
  - Dimnames
  - Classes

---

## Schedule

--

- 3) Special compound objects
  - Factors
  - Data frame objects

--

### Section 3 - Evaluation of expressions:

- 4) Simple evaluation
  - Function calls
  - Operators

--

- 5) Control structures
  - if
  - for
  - while

--

- 6) Indexing
  - Indexing by vectors
  - Indexing other structures

---

class: left, middle
background-color: #f0f0f0
background-size: cover

# 1) Basic types

---

## Vectors

> R has six basic (‘atomic’) vector types: logical, integer, real, complex, string (or character) and raw.

| typeof    | mode      | storage.mode | example      |
|-----------|-----------|--------------|--------------|
| logical   | logical   | logical      | x = TRUE     |
| double    | numeric   | double       | x = 1        |
| character | character | character    | x = "1"      |

```r
c(1,3,9)
```

```
## [1] 1 3 9
```

```r
typeof(c(1,3,9))
```

```
## [1] "double"
```

---

## Lists

> Lists (‘generic vectors’) have elements, each of which can contain any type of R object, i.e. the elements of a list do not have to be of the same type.

> Lists are vectors, and the basic vector types are referred to as atomic vectors where it is necessary to exclude lists.

--

.pull-left[

```r
list(TRUE, 1L, 1, "1")
```

```
## [[1]]
## [1] TRUE
## 
## [[2]]
## [1] 1
## 
## [[3]]
## [1] 1
## 
## [[4]]
## [1] "1"
```

]

.pull-right[

<center>
<img src="./img/list.png" style="width: 90%" />
</center>

]

---

class: left, middle
background-color: #f0f0f0
background-size: cover

# 2) Attributes

---

## Names

> A names attribute, when present, labels the individual elements of a vector or list.
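
As a small illustration of the quoted definition, names can also be set for all elements at once and then used to select elements. This is only a sketch: the vector `w` and its labels are made up here and are not part of the course material.

```r
# Hypothetical example vector; the names are assigned when the vector is created
w <- c(low = 1, mid = 2, high = 3)

names(w)   # the names attribute is a character vector
w["mid"]   # elements can be selected by name

# Replacing the whole names attribute relabels every element
names(w) <- c("a", "b", "c")
w
```
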
--

```r
z <- 1:3
names(z)
```

```
## NULL
```

```r
names(z)[2] <- "b"  ## assign just one name
z
```

```
## <NA>    b <NA> 
##    1    2    3
```

```r
class(names(z))
```

```
## [1] "character"
```

---

## Dimensions

> The dim attribute is used to implement arrays.

--

```r
x <- 1:12  ## atomic vector
dim(x) <- c(3, 4)
x
```

```
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12
```

--

Alternatively,

```r
x <- matrix(1:12, ncol = 4, nrow = 3)
x
```

```
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12
```

---

## Dimnames

> Arrays may name each dimension separately using the dimnames attribute which is a list of character vectors.

--

```r
x <- matrix(1:12, ncol = 4, nrow = 4)
dimnames(x) <- list(c("A", "B", "C", "D"), as.character(1:4))
x
```

```
##   1 2  3 4
*## A 1 5  9 1
## B 2 6 10 2
## C 3 7 11 3
## D 4 8 12 4
```

```r
x["A", ]
```

```
## 1 2 3 4
*## 1 5 9 1
```

---

## Classes

> R has an elaborate class system, principally controlled via the class attribute.

--

```r
(x <- 10)
```

```
## [1] 10
```

```r
(y <- list(obj1 = x))
```

```
## $obj1
## [1] 10
```

--

```r
class(x)
```

```
## [1] "numeric"
```

```r
class(y)
```

```
## [1] "list"
```

---

class: left, middle
background-color: #f0f0f0
background-size: cover

# 3) Special compound objects

---

## Factors

> Factors are used to describe items that can have a finite number of values (gender, social class, etc.).

> A factor may be purely nominal or may have ordered categories.

--

```r
(x <- factor(x = c(3, 1, 2), labels = c("large", "small", "medium")))
```

```
## [1] medium large  small
## Levels: large small medium
```

```r
levels(x) <- c("small", "medium", "large")
```

--

```r
x
```

```
## [1] large  small  medium
## Levels: small medium large
```

---

## Data frame objects

> A data frame is a list of vectors, factors, and/or matrices all having the same length (number of rows in the case of matrices).
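
The definition above can be checked directly in the console: a data frame is stored as a list whose components all have the same length. The following sketch uses a made-up data frame `df`; its column names and values are assumptions chosen only for illustration.

```r
# Hypothetical data frame with two columns of three values each
df <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))

is.list(df)          # TRUE: a data frame is stored as a list
length(df)           # number of components (columns)
sapply(df, length)   # every component has the same length
nrow(df)             # that common length is the number of rows
```
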
```r
L3 <- LETTERS[1:3]
fac <- sample(L3, 4, replace = TRUE)
d <- data.frame(x = 1L, y = 1:4, fac = fac)
```

```r
d
```

```
##   x y fac
*## 1 1 1   A
*## 2 1 2   A
## 3 1 3   A
## 4 1 4   A
```

--

```r
d[1:2, ]
```

```
##   x y fac
*## 1 1 1   A
*## 2 1 2   A
```

---

class: left, middle
background-color: #f0f0f0
background-size: cover

## 4) Simple evaluation

---

### Function calls

> Most of the computations carried out in R involve the evaluation of functions.

> Functions are invoked by name with a list of arguments separated by commas.

.pull-left[

```r
Speed <- cars$speed
Distance <- cars$dist

plot(
  x = Speed, y = Distance,
  pch = 0, cex = 1.2,
  col = "blue", bty = "n"
)
```

]

.pull-right[

<img src="index_files/figure-html/unnamed-chunk-18-1.png" width="504" />

]

---

### Operators

```text
-    Minus, can be unary or binary
+    Plus, can be unary or binary
!    Unary not
~    Tilde, used for model formulae, can be either unary or binary
?    Help
:    Sequence, binary (in model formulae: interaction)
`*`  Multiplication, binary
/    Division, binary
^    Exponentiation, binary
<    Less than, binary
>    Greater than, binary
==   Equal to, binary
>=   Greater than or equal to, binary
<=   Less than or equal to, binary
[...]
```

```r
?operators
```

---

class: left, middle
background-color: #f0f0f0
background-size: cover

## 5) Control structures

---

### if

> The if/else statement conditionally evaluates two statements. There is a condition which is evaluated and if the value is TRUE then the first statement is evaluated; otherwise the second statement will be evaluated.

```r
x <- -1
y <- -1
if (x^2 + y^2 < 1) {
  print(TRUE)
*} else {
* print(FALSE)
*}
```

```
## [1] FALSE
```

---

### for

> For each element in vector the variable *name* is set to the value of that element and *statement1* is evaluated.
```r
for (name in 1:3) {
* print(name + 1) # statement1
}
```

```
## [1] 2
## [1] 3
## [1] 4
```

> A side effect is that the variable name still exists after the loop has concluded and it has the value of the last element of vector that the loop was evaluated for.

---

class: left, middle
background-color: #f0f0f0
background-size: cover

## 6) Indexing

---

.pull-left[

### Indexing matrices and arrays

> R allows some powerful constructions using vectors as indices. We shall discuss indexing of simple vectors first.

```r
(m <- matrix(1:6, nrow = 2))
```

```
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
```

```r
m[, 1:2]
```

```
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
```

```r
m[2, ]
```

```
## [1] 2 4 6
```

]

.pull-right[

### Indexing other structures

```r
my_ls <- list(
  obj1 = m,
  obj2 = "Hallo"
)
my_ls["obj1"]
```

```
## $obj1
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
```

```r
my_ls[["obj1"]]
```

```
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
```

```r
my_ls$obj2
```

```
## [1] "Hallo"
```

]

---

### Useful links

- CRAN manuals: [https://cran.r-project.org/manuals.html](https://cran.r-project.org/manuals.html)

- RStudio manuals: [https://rstudio.github.io/r-manuals/r-intro/](https://rstudio.github.io/r-manuals/r-intro/)

- The Base R Cheat Sheet: [https://iqss.github.io/dss-workshops/R/Rintro/base-r-cheat-sheet.pdf](https://iqss.github.io/dss-workshops/R/Rintro/base-r-cheat-sheet.pdf)

### Help function

Search the help files for a word or phrase:

```r
help.search("weighted mean")
```

Find help for a package:

```r
help(package = "dplyr")
```

---

class: left, middle
background-color: #f0f0f0
background-size: cover

# Exercise 1 - Basic R

## (20 min)

# Exercise 2 - Descriptive Statistics

## (25 min)

---

# Exercise 1 - R basics

## Simple functions

Generate a vector of length 10 containing some numbers. See what happens when you apply the *sort()* function. Check its help pages to see what additional options you can specify.
What do the related *rank()* and *order()* functions do?

--

```r
# generate some numbers:
x <- c(7, 3, 12, 0:4, 100, -3)
print(x)
length(x)
```

--

```
##  [1]   7   3  12   0   1   2   3   4 100  -3
```

```
## [1] 10
```

---

```r
# sort():
?sort
sort(x)
sort(x, decreasing = TRUE)
```

--

```
##  [1]  -3   0   1   2   3   3   4   7  12 100
```

```
##  [1] 100  12   7   4   3   3   2   1   0  -3
```

--

```r
?NA
sort(c(5, 7, 1, NA, 3))
sort(c(5, 7, 1, NA, 3), na.last = TRUE)
sort(c(5, 7, 1, NA, 3), na.last = FALSE)
```

--

```
## [1] 1 3 5 7
```

```
## [1]  1  3  5  7 NA
```

```
## [1] NA  1  3  5  7
```

---

```r
# rank()
?rank
rank(x)
cbind(x, "rank()" = rank(x))
x
```

--

```
##  [1]  8.0  5.5  9.0  2.0  3.0  4.0  5.5  7.0 10.0  1.0
```

```
##         x rank()
##  [1,]   7    8.0
##  [2,]   3    5.5
##  [3,]  12    9.0
##  [4,]   0    2.0
##  [5,]   1    3.0
##  [6,]   2    4.0
##  [7,]   3    5.5
##  [8,]   4    7.0
##  [9,] 100   10.0
## [10,]  -3    1.0
```

```
##  [1]   7   3  12   0   1   2   3   4 100  -3
```

---

## Trigonometric functions

Use the *seq()* function to generate a vector *x* of 100 evenly spaced numbers between 0 and `\(2\pi\)`. Draw a plot (using *plot()*) of *x* vs. *sin(x)*. Use the *plot()* function's type argument to draw a connecting line. Use the *lines()* function to also add the cosine to the plot.

--

.pull-left[

```r
# trigonometric functions:
x <- seq(
  from = 0,
  to = 2 * pi,
  length.out = 100
)
plot(x = x, y = sin(x))
```

]

--

.pull-right[

<img src="index_files/figure-html/unnamed-chunk-31-1.png" width="504" />

]

---

.pull-left[

```r
plot(x, sin(x), type = "l")
?lines
# plot() would start a new figure; lines() adds to the current one
# plot(x, cos(x))
lines(x, cos(x))
```

]

--

.pull-right[

<img src="index_files/figure-html/unnamed-chunk-32-1.png" width="504" />

]

--

.pull-left[

```r
plot(x, sin(x),
  type = "l", col = "red",
  xlab = "x", ylab = "y",
  main = "Trigonometric functions"
)
lines(x, cos(x), col = "blue")
```

]

--

.pull-right[

<img src="index_files/figure-html/unnamed-chunk-33-1.png" width="504" />

]

---

# Exercise 2 - Descriptive statistics

CASE STUDY: In a clinical trial 12 patients are randomly assigned to two different treatments.
Blood measurements are taken before and after the treatment.

## Loading data

Read the data [here](https://raw.githubusercontent.com/rvpanaro/neuroscience_2022_2023/main/data/study1.csv) into the R workspace.

--

```r
library(RCurl)

# the file uses semicolons as field separators
dt <- read.csv(text = getURL("https://raw.githubusercontent.com/rvpanaro/neuroscience_2022_2023/main/data/study1.csv"), sep=";")
str(dt)
```

--

```
## 'data.frame':	12 obs. of  4 variables:
##  $ patient: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ group  : chr  "a" "a" "a" "a" ...
##  $ before : num  0.372 0.442 0.382 0.569 0.571 ...
##  $ after  : num  0.221 0.44 0.331 0.591 0.668 ...
```

---

## Summary statistics

Calculate the mean, median, variance, standard deviation, quartiles, and sum of the measurements before treatment (*mean()*).

--

```r
c(sum = sum(dt$before), mean = mean(dt$before), median = median(dt$before))
```

--

```
##       sum      mean    median 
## 5.2126380 0.4343865 0.4119027
```

--

```r
c(var = var(dt$before), sd = sd(dt$before))
```

--

```
##        var         sd 
## 0.01246266 0.11163628
```

--

```r
quantile(dt$before, probs=c(0.25, 0.50, 0.75))
```

--

```
##       25%       50%       75% 
## 0.3638939 0.4119027 0.5452815
```

---

## Box Plot

Draw a box plot and a histogram of the measurements before treatment. Give the axes correct titles and choose an appropriate axis range (*boxplot()*).
--

.pull-left[

```r
?boxplot
boxplot(dt$before)
```

<img src="index_files/figure-html/unnamed-chunk-38-1.png" width="288" />

]

--

.pull-right[

```r
boxplot(dt$before, dt$after, horizontal = TRUE)
```

<img src="index_files/figure-html/unnamed-chunk-39-1.png" width="288" />

]

---

.pull-left[

```r
boxplot(dt$before,
  xlab = "Before treatment",
  ylab = "Blood measurements",
  ylim = c(0.2, 0.7),
  main = "Box plot",
  horizontal = TRUE)

points(dt$before,
  y = rep(0.6, 12),
  pch=1, col="orange", cex=0.75)
```

]

--

.pull-right[

<img src="index_files/figure-html/unnamed-chunk-40-1.png" width="504" />

]

--

.pull-left[

```r
# Histogram
hist(dt$before,
  xlim = c(0.2, 0.7),
  xlab = "Before treatment",
  main = "Histogram",
  breaks = 12)

points(dt$before, rep(0, 12),
  pch=15, col="orange", cex=0.75)
```

]

--

.pull-right[

<img src="index_files/figure-html/unnamed-chunk-41-1.png" width="504" />

]

---

## Bar Chart

Determine the absolute and relative frequencies of patients in the study groups. Plot a pie chart and a bar chart of the variable "group" (*pie()* and *barplot()*).

--

.pull-left[

```r
## absolute
table(dt$group)

## relative
table(dt$group) / length(dt$group)
```

]

--

.pull-right[

```
## 
## a b 
## 6 6
```

```
## 
##   a   b 
## 0.5 0.5
```

]

--

.pull-left[

```r
pie(table(dt$group))
```

<img src="index_files/figure-html/unnamed-chunk-43-1.png" width="216" />

]

--

.pull-right[

```r
barplot(table(dt$group))
```

<img src="index_files/figure-html/unnamed-chunk-44-1.png" width="216" />

]
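
---

The relative frequencies from the bar chart exercise can also be obtained with base R's *prop.table()*, which divides a table by its total. The sketch below is only an illustration: so that it runs without downloading the data, it uses a made-up vector `group` as a stand-in for `dt$group` (6 patients per treatment, as in the frequency table above). The axis labels are likewise assumptions.

```r
# Hypothetical stand-in for dt$group: 12 patients, 6 per treatment group
group <- rep(c("a", "b"), each = 6)

counts <- table(group)        # absolute frequencies
props <- prop.table(counts)   # relative frequencies, i.e. counts / sum(counts)

counts
props

# Bar chart of the relative frequencies with labelled axes
barplot(props,
  xlab = "Treatment group",
  ylab = "Relative frequency",
  ylim = c(0, 1))
```

With the course data, `group` would be replaced by `dt$group`.
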