class:inverse

<br><br><br>

## DSC365: Intro to Data Science

### Introduction to R

#### August 21, 2025

---

## Agenda

- R Studio Set-Up
- R as a Calculator
- Basic Functions
- Installing Packages
- Loading Data

<br> <br> <br> <br> <br> <br>

**Note**: Please follow along and run the code in the file *IntroR.R* found in BlueLine

---

### R Studio Set-Up

<img src="../images/rstudio.png" width="100%" style="display: block; margin: auto;" />

To create a script, click the following: File -> New File -> R Script

[R Studio IDE Cheatsheet](cheatsheets/rstudio-ide.pdf)

---

### R Studio Set-Up - Editor (Top Left)

<img src="../images/editor.jpeg" width="100%" style="display: block; margin: auto;" />

<br>

**Shortcuts for running code**:

To run code from your script:

+ Windows: Ctrl + Enter
+ Mac: Cmd + Enter

---

### R Studio Set-Up - Console (Bottom Left)

<img src="../images/console.jpeg" width="100%" style="display: block; margin: auto;" />

---

### R Studio Set-Up - Environment (Top Right)

<img src="../images/environment.jpeg" width="100%" style="display: block; margin: auto;" />

---

### R Studio Set-Up - Files, Packages, and Help (Bottom Right)

<!-- Trigger the Modal -->
<img id='imgfiles' src='../images/files.png' alt='Files' width='30%'>
<!-- Trigger the Modal -->
<img id='imgpackages' src='../images/packages.png' alt='Packages' width='30%'>
<!-- Trigger the Modal -->
<img id='imghelp' src='../images/help.png' alt='Help' width='30%'>

<!-- The Modal -->
<div id='modalfiles' class='modal'>
<!-- Modal Content (The Image) -->
<img class='modal-content' id='imgmodalfiles'>
<!-- Modal Caption (Image Text) -->
<div id='captionfiles' class='modal-caption'></div>
</div>

<!-- The Modal -->
<div id='modalpackages' class='modal'>
<!-- Modal Content (The Image) -->
<img class='modal-content' id='imgmodalpackages'>
<!-- Modal Caption (Image Text) -->
<div id='captionpackages' class='modal-caption'></div>
</div>

<!-- The Modal -->
<div id='modalhelp' class='modal'>
<!-- Modal Content (The Image) -->
<img
class='modal-content' id='imgmodalhelp'>
<!-- Modal Caption (Image Text) -->
<div id='captionhelp' class='modal-caption'></div>
</div>

---

### Writing Code for People

- A **comment** is a part of computer code which is intended only for people to read. It is not evaluated or run by the computing language.

- In `R` we use `#`

``` r
# This is a comment
```

- **Literate Programming**: interspersing text and code in the same document
  - We will learn more about this next week.

---

### Giant Calculator!

.pull-left[

``` r
3+4 #Addition
```

```
## [1] 7
```

``` r
5-2 #Subtraction
```

```
## [1] 3
```

``` r
3*2 #Multiplication
```

```
## [1] 6
```

``` r
9/3 #Division
```

```
## [1] 3
```

]
.pull-right[

``` r
sqrt(4) #square root
```

```
## [1] 2
```

``` r
2^3 #exponent
```

```
## [1] 8
```

``` r
exp(3) #exponential function
```

```
## [1] 20.08554
```

``` r
log(10) #natural log
```

```
## [1] 2.302585
```

]

---

### Giant Calculator!

.pull-left[

``` r
log(10, base = 10)
```

```
## [1] 1
```

``` r
abs(-1) #absolute value
```

```
## [1] 1
```

``` r
floor(3.7) #round down
```

```
## [1] 3
```

``` r
ceiling(3.2) #round up
```

```
## [1] 4
```

]
.pull-right[

``` r
round(3.2)
```

```
## [1] 3
```

``` r
pi
```

```
## [1] 3.141593
```

``` r
sin(pi/2)
```

```
## [1] 1
```

``` r
5%%3 #division remainder
```

```
## [1] 2
```

]

---

### Giant Calculator!

**Note**: In R, the following will not work!

``` r
10(2+4)
```

*R* will assume the "10()" is indicating some function, NOT multiplication as we do in math and stats. Thus, you MUST include the multiplication symbol.
``` r
10*(2+4)
```

```
## [1] 60
```

*R* follows the standard order of operations, so be careful with `()`

---

### Creating Variables

We can create variables using the assignment operator `<-`

``` r
x <- 5
```

You should now see `x` in the `Environment` window

We can then apply any of the functions to the variable:

``` r
log(x)
```

```
## [1] 1.609438
```

<br>

Shortcut to make arrow:

- Mac: `Option` + `-`
- PC: `Alt` + `-`

---

### Naming Variables ... It's hard

Rules of Naming Variables:

+ Variables can't start with a number
+ Can't use some special symbols (`^`, `!`, `$`, `@`, `+`, `-`, `/`, `*`)
+ Case-sensitive
+ There are reserved words that *R* won't let you use for variable names
  - for, in, while, if, else, repeat, break, next

<br>

My general tips:

- Avoid meaningless names
- Be consistent with conventions
- Don't make names too long or difficult to spell

---

### Vectors

A variable does not need to be a single value. We can create a **vector** using the `c` function (short for "combine": it combines several objects into one)

``` r
y <- c(1, 5, 3, 2)
```

Operations can then be done element-wise

``` r
y/2
```

```
## [1] 0.5 2.5 1.5 1.0
```

Determining the number of objects in a vector

``` r
length(y)
```

```
## [1] 4
```

---

### Vectors

Can also have vectors of characters

``` r
bulldogs <- c("american", "english", "french")
bulldogs
```

```
## [1] "american" "english"  "french"
```

``` r
length(bulldogs)
```

```
## [1] 3
```

``` r
str(bulldogs)
```

```
##  chr [1:3] "american" "english" "french"
```

---

### R Packages

- **Packages**: include reusable R functions, the documentation that describes how to use them, and sample data.
- A list of all installed packages appears in the **Packages** tab

- *R* packages containing more specialized *R* functions can be installed freely from CRAN servers using the function `install.packages()`

- After packages are installed, their functions can be loaded into the current *R* session using the function `library()`

---

### Step 1: Installing Packages

We will be installing the [tidyverse](https://www.tidyverse.org/) package.

- Manipulates data structures (includes the dplyr, tidyr, purrr, tibble, etc. packages)

**Method 1**

``` r
install.packages("tidyverse")
```

**Method 2** - Point and Click

+ Tools -> Install Packages

Only have to do this once!!

---

### Step 2: Load Package

Let's load the package we just installed!

``` r
library(tidyverse)
```

Will have to run this every time we are in a new session and want to use this package

**Note**: Packages do update

---

### Functions

Think of a function like a verb:

``` r
do_this(to_this)
```

**Functions** are sets of instructions that take **arguments** and **return** values

*R* has pre-made functions found in packages, but later in the semester we will talk about how to create your own functions

---

### Load Data

- **Course focus**: .csv file (download from Chrome)
- **Recommendation**: Save the data set to the same file folder as your R file

**Method 1** (Manual):

``` r
getwd() # What is your working directory?
setwd("path/to/your/folder") # Change working directory (placeholder path: replace with your own folder)
```

**Method 2**:

1). Session -> Set Working Directory -> Choose Directory -> Select the folder where the data set is located -> Apply

2).
``` r
data <- read.csv("movie-2018.csv")
```

---

### Load Data

**Method 3**: Import Data set in Environment Tab

- Works well for .csv/.xlsx files
- Environment Tab -> Import Dataset -> From Text -> Find Dataset -> "Open" -> "Import"
- Runs the code for you in your console (would recommend pasting this into your R Script)

---

### Check Data

Need to make sure data is loaded in correctly

- Should see data in environment panel (upper right)

``` r
head(data, n = 3) #make sure data loads in correctly
```

```
##                Film           Script.Type Average.critics Average.audience Primary.Genre Opening.Weekend
## 1         12 Strong            adaptation              54               64         drama        15815025
## 2 7 Days in Entebbe based on a true story              38               49      thriller         1592645
## 3     A Quiet Place   original screenplay              89               79        horror        50203562
##   Domestic.Gross Foreign.Gross Worldwide.Gross Budget...million. Release.Date..US.
## 1       45500164      21533253        67033417                35         19-Jan-18
## 2        3189220       5465548         8654768                NA         16-Mar-18
## 3      188024361     144559086       332583447                17          6-Apr-18
```

---

### Check Variable Types

``` r
glimpse(data)
```

```
## Rows: 160
## Columns: 11
## $ Film              <chr> "12 Strong", "7 Days in Entebbe", "A Quiet Place", "A Simple Favor", "A Star is Born"…
## $ Script.Type       <chr> "adaptation", "based on a true story", "original screenplay", "adaptation", "remake",…
## $ Average.critics   <chr> "54", "38", "89", "76", "89", "48", "32", "28", "65", "73", "76", "83", "79", "60", "…
## $ Average.audience  <chr> "64", "49", "79", "74", "84", "29", "57", "33", "65", "74", "83", "69", "77", "77", "…
## $ Primary.Genre     <chr> "drama", "thriller", "horror", "thriller", "drama", "adventure", "drama", "comedy", "…
## $ Opening.Weekend   <chr> "15815025", "1592645", "50203562", "16011689", "42908051", "33123609", "2798229", "23…
## $ Domestic.Gross    <int> 45500164, 3189220, 188024361, 53548586, 202110867, 100478608, 3000943, 5059608, 31445…
## $ Foreign.Gross     <chr> "21533253", "5465548", "144559086", "41285014", "188000000", "32197256", "-", "-", "2…
## $ Worldwide.Gross   <int>
67033417, 8654768, 332583447, 94833600, 390110867, 132675864, 3000943, 5059608, 53345…
## $ Budget...million. <dbl> 35.00, NA, 17.00, 20.00, 36.00, 125.00, 10.00, 19.00, 35.00, 51.00, 3.00, 40.00, 195.…
## $ Release.Date..US. <chr> "19-Jan-18", "16-Mar-18", "6-Apr-18", "14-Sep-18", "5-Oct-18", "9-Mar-18", "24-Aug-18…
```

<br>

Note: The purpose of checking variable types is to make sure R understands what each column represents.

---

## Change Variable Types

*R* does not always read in the variables correctly. For example, `Average.audience` is being read in as a character, when it should be a number. We can change this!

``` r
#To change this column to a number:
data$Average.audience <- as.numeric(data$Average.audience)
```

```
## Warning: NAs introduced by coercion
```

The warning means that entries which could not be converted to a number were replaced with `NA`.

``` r
#Now Average.audience is a number!
class(data$Average.audience)
```

```
## [1] "numeric"
```

[More about Data Types](https://www.geeksforgeeks.org/r-data-types/)

---

### Selecting Specific Row/Column

``` r
data[1,] #first row
```

```
##        Film Script.Type Average.critics Average.audience Primary.Genre Opening.Weekend Domestic.Gross
## 1 12 Strong  adaptation              54               64         drama        15815025       45500164
##   Foreign.Gross Worldwide.Gross Budget...million. Release.Date..US.
## 1      21533253        67033417                35         19-Jan-18
```

``` r
data[,1] #first column
```

```
## [1] "12 Strong" "7 Days in Entebbe"
## [3] "A Quiet Place" "A Simple Favor"
## [5] "A Star is Born" "A Wrinkle in Time"
## [7] "A.X.L."
"Action Point" ## [9] "Adrift" "Alpha" ## [11] "American Animals" "Annihilation" ## [13] "Ant-Man and the Wasp" "Aquaman " ## [15] "Avengers: Infinity War" "Baaghi 2" ## [17] "Bad Samaritan" "Bad Times at the El Royale" ## [19] "Beirut (aka The Negotiator)" "Black Panther" ## [21] "BlacKkKlansman" "Blindspotting" ## [23] "Blockers" "Bohemian Rhapsody" ## [25] "Book Club" "Breaking In" ## [27] "Bumblebee" "Call Me By Your Name" ## [29] "Chappaquiddick" "Christopher Robin" ## [31] "Crazy Rich Asians" "Creed II" ## [33] "Deadpool 2" "Death Wish" ## [35] "Den of Thieves" "Detective Chinatown 2" ## [37] "Disobedience" "Dog Days" ## [39] "Dr Seuss' The Grinch" "Early Man" ## [41] "Eighth Grade" "Every Day" ## [43] "Fantastic Beasts: The Crimes of Grindelwald" "Fifty Shades Freed" ## [45] "Finding Your Feet" "First Man" ## [47] "First Reformed" "Forever My Girl" ## [49] "Game Night" "God's Not Dead: A Light in Darkness" ## [51] "Goosebumps 2: Haunted Halloween" "Gotti" ## [53] "Green Book" "Gringo" ## [55] "Halloween " "Hearts Beat Loud" ## [57] "Hereditary" "Holmes and Watson" ## [59] "Hostiles" "Hotel Artemis" ## [61] "Hotel Transylvania 3: Summer Vacation" "Hunter Killer" ## [63] "I Can Only Imagine" "I Feel Pretty" ## [65] "Incredibles 2" "Insidious: The Last Key" ## [67] "Instant Family" "Isle of Dogs" ## [69] "Jurassic World: Fallen Kingdom" "La Boda de Valentina" ## [71] "Lean on Pete" "Leave No Trace" ## [73] "Life of the Party" "Love, Simon" ## [75] "Mamma Mia: Here We Go Again!" 
"Mary Poppins Returns" ## [77] "Maze Runner: The Death Cure" "Midnight Sun" ## [79] "Mile 22" "Mission: Impossible - Fallout" ## [81] "Molly's Game" "Mortal Engines" ## [83] "Night School" "Nobody's Fool" ## [85] "Ocean’s 8" "Operation Finale" ## [87] "Operation Red Sea" "Overboard" ## [89] "Overlord" "Pacific Rim: Uprising" ## [91] "Paddington 2" "Papillon" ## [93] "Paul, Apostle of Christ" "Peppermint" ## [95] "Peter Rabbit" "Proud Mary" ## [97] "Puzzle" "Raid" ## [99] "Ralph Breaks the Internet" "Rampage" ## [101] "Ready Player One" "Red Sparrow" ## [103] "Robin Hood" "Samson" ## [105] "Searching" "Second Act" ## [107] "Sgt. Stubby: An American Hero" "Sherlock Gnomes" ## [109] "Show Dogs" "Sicario: Day of the Soldado" ## [111] "Skyscraper" "Slender Man" ## [113] "Smallfoot" "Solo: A Star Wars Story" ## [115] "Sorry to Bother You" "Spider-man: Into the Spider-verse" ## [117] "Super Troopers 2" "Superfly" ## [119] "Tag" "Teen Titans Go! To The Movies" ## [121] "The 15:17 to Paris" "The Commuter" ## [123] "The Darkest Minds" "The Death of Stalin" ## [125] "The Equalizer 2" "The Favourite" ## [127] "The First Purge" "The Girl in the Spider's Web" ## [129] "The Happytime Murders" "The Hate U Give" ## [131] "The House with a Clock in its Walls" "The Hurricane Heist" ## [133] "The Leisure Seeker" "The Meg" ## [135] "The Miracle Season" "The Mule" ## [137] "The Nun" "The Nutcracker and the Four Realms" ## [139] "The Post" "The Predator" ## [141] "The Rider" "The Seagull" ## [143] "The Spy Who Dumped Me" "The Strangers: Prey at Night" ## [145] "Thoroughbreds" "Tomb Raider" ## [147] "Traffik" "Truth or Dare" ## [149] "Tully" "Tyler Perry's Acrimony" ## [151] "Uncle Drew" "Unfriended: Dark Web" ## [153] "Unsane" "Upgrade" ## [155] "Venom" "Vice" ## [157] "White Boy Rick" "Widows" ## [159] "Winchester" "You Were Never Really Here" ``` --- ### Selecting Specific Row/Column ``` r data[1:2,c(2,5,7)] #row: 1-2, column: 2,5,7 ``` ``` ## Script.Type Primary.Genre Domestic.Gross 
## 1            adaptation         drama       45500164
## 2 based on a true story      thriller        3189220
```

We can use `$` to specify columns as well

``` r
head(data$Film)
```

```
## [1] "12 Strong"         "7 Days in Entebbe" "A Quiet Place"     "A Simple Favor"    "A Star is Born"
## [6] "A Wrinkle in Time"
```

---

### Some Basic Functions

``` r
summary(data[,7:10])
```

```
##  Domestic.Gross      Foreign.Gross      Worldwide.Gross     Budget...million.
##  Min.   :  1010385   Length:160         Min.   :1.010e+06   Min.   :  1.00
##  1st Qu.:  9477580   Class :character   1st Qu.:1.622e+07   1st Qu.: 10.50
##  Median : 32354347   Mode  :character   Median :5.294e+07   Median : 33.00
##  Mean   : 65923100                      Mean   :1.624e+08   Mean   : 50.99
##  3rd Qu.: 68929494                      3rd Qu.:1.623e+08   3rd Qu.: 62.00
##  Max.   :700059566                      Max.   :2.046e+09   Max.   :321.00
##                                                             NA's   :18
```

``` r
nrow(data)
```

```
## [1] 160
```

``` r
ncol(data)
```

```
## [1] 11
```

---

### Some Basic Functions

``` r
table(data$Primary.Genre)
```

```
##
##       action    adventure    animation black comedy       comedy        crime        drama      fantasy
##           30           14            4            2           22            5           40            2
##       horror      musical      romance       sci-fi     thriller
##           10            2            4            2           23
```

---

## Built-In Statistical Functions

``` r
mean(data$Domestic.Gross, na.rm = TRUE) #na.rm removes missing values
```

```
## [1] 65923100
```

``` r
sd(data$Domestic.Gross, na.rm = TRUE)
```

```
## [1] 105900308
```

``` r
summary(data$Domestic.Gross)
```

```
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
##   1010385   9477580  32354347  65923100  68929494 700059566
```

``` r
quantile(data$Domestic.Gross, c(0.025, 0.975), na.rm = TRUE)
```

```
##      2.5%     97.5%
##   1251400 320769577
```

---

### Getting Help

Let's talk about a couple ways to get help. The primary function to use is the `help` function. Just pass in the name of the function you need help with:

``` r
help(head)
```

The `?` also works

``` r
?head
```

This returns the help documentation for this function

Googling for help can be difficult at first.
You might need to search for "R" + your topic to get good results

Stay up-to-date with R: [R Community Blog](https://rweekly.org)

---

### Warnings vs. Errors

+ Beginners to R routinely panic when they see a red message, even one as innocuous as a confirmation that a library loaded
  - Not all red text means that there is an error!

+ A *warning* is a message that does not disturb the program flow but is displayed along with the output
  - Not always a cause for concern

+ An *error* stops the program from running

+ Google is a beautiful thing

<br> <br>

Introduction points from: https://ourcodingclub.github.io/tutorials/intro-to-r/

---

### Learning to code can be challenging

<img src="../images/debug.png" width="1175" />

Artwork by @allison_horst

---

### Some Debugging Tips

Here are a few strategies if your code does not work:

- Google!
  + Whenever you see an error message, start by googling it. If you’re lucky, you’ll discover that it’s a common error with a known solution. When googling, improve your chances of a good match by removing any variable names or values that are specific to your problem.

- Make sure the problem does exist.
  + Sometimes the error is not caused by the lines you are currently working on. It is possible that you changed something earlier. Try re-running the file from the beginning.

- Run the functions part by part until you find the problem.

- Print out the output to see whether it is what you expect. Sometimes looking at the data type also helps you understand what's going on.

---

### Some Debugging Tips

- Take a break! You won't effectively debug something if you're stressed.

- Check your spelling

<img src="images/spelling-error.png" width="70%" style="display: block; margin: auto;" />

- Rubber ducking [(Source)](https://duckly.com/blog/improve-how-to-code-with-rubber-duck-debugging/)

---

### Two Common Errors

1). Typos:

``` r
maen(c(1, 7, 13))
```

```
## Error in maen(c(1, 7, 13)): could not find function "maen"
```

2).
Unloaded Package

``` r
favstats(data$Domestic.Gross)
```

```
## Error in favstats(data$Domestic.Gross): could not find function "favstats"
```
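---

### Two Common Errors: Possible Fixes

A minimal sketch of how the two errors above might be resolved. The spelling fix uses base R's `mean()`. For the second error, the assumption (not stated in these slides) is that `favstats()` comes from the `mosaic` package; substitute whichever package actually provides the function you need. The vector `scores` and the variable `fixed_mean` are made-up names for illustration.

```r
# Hypothetical example values (the same numbers used on the typo slide)
scores <- c(1, 7, 13)

# 1) Typo: the function is spelled mean(), not maen()
fixed_mean <- mean(scores)
fixed_mean

# 2) Unloaded package
# Assumption: favstats() is provided by the mosaic package.
# requireNamespace() checks whether a package is installed without attaching it.
if (requireNamespace("mosaic", quietly = TRUE)) {
  library(mosaic)   # load the package for this session
  favstats(scores)  # R can now find the function
} else {
  message('mosaic is not installed; run install.packages("mosaic") once, then library(mosaic)')
}
```

If the same "could not find function" error still appears after loading a package, check the spelling of the function name first, then confirm which package the function belongs to with `help.search("favstats")` or `??favstats`.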