class:inverse

<br><br><br>

## DSC365: Intro to Data Science
### Welcome to DSC365
#### January 13, 2026

---
## Homework

- Software downloaded by the beginning of class on **Thursday, January 15**
    + We should hopefully have time for this at the end of class

---
## Alison Kleffner, PhD

- BS in Applied Mathematics and Economics at Rockhurst University
- Master's and PhD in Statistics at the University of Nebraska at Lincoln
- Thesis: *Visualization and Modeling of Multivariate Data in Environmental Applications*

???

**Now introduce yourselves! Get into groups of 2-3**

- Name
- Something your group has in common (besides things like major ...)

---
class:inverse

<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>

.center[
## What is Data Science?
]

---
### Data Science

Data science is an exciting discipline that allows you to turn raw data into understanding, insight, and knowledge.

- Data science is a vast field, and there’s no way you can master it all by taking a single course. But we are going to lay a solid foundation.

.center[
<img src="../images/data-science.png" alt="Venn diagram depicting the skills that a good data scientist must have. Data scientists need skills from math and stats, computer science, and a domain area." width="60%" />
]

---
### Data Science

.center[
<img src="../images/data-flowchart.png" alt="The Data Science Flow Chart. First you must import the data into your programming language of preference. Next you must clean the data. Then you have a cycle of transforming, visualizing, and modeling your data. Finally, once you have answered your research question, you must communicate your results to a non-statistical audience." width="1477" />
]

---
class:inverse

<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>

.center[
## Syllabus and Course Specifics
]

---
## What will be in this course?
- This is a programming course using *R*, the most commonly used *statistical* language
- We will briefly cover each part of the Data Science Cycle
- In this course you will complete in-class labs, projects, and presentations.
- Since this course involves using *R*, please bring your laptop to class with you

---
## Syllabus

---
class:inverse

<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>

.center[
## Software Set-Up
]

---
### How to Learn a Programming Language

If you have never used *R* (or even heard of it) before, it is totally OK! We will be starting with the basics.

.pull-left[
- Breathe
- Mistakes are ok!
- Ask for help!
- Translate code into plain English
- Don't reinvent the wheel
- Walk away
]
.pull-right[
<img src="../images/begin-R.png" alt="Cartoon depicting that R can be frustrating to learn at first, but hopefully after some time it becomes more fun to use." width="1565" />

.center[<font size = "0.75">Artwork by Allison Horst</font>]
]

---
### What is R?

*R* is a free software environment and programming language for statistical computing and graphics.

- Created by Ross Ihaka and Robert Gentleman in 1993; formally released by the [**R Core Group**](https://www.r-project.org/contributors.html) in 1997.
- Open-source
- **Interpreted language**: we interact with R through a “command line” interpreter, which translates our “code” to machine code
- CRAN (Comprehensive R Archive Network) is R's central software repository; it contains contributed packages
- New major release once a year

<img src="../images/base-r.png" alt="Visual of the R Console. This is the place where you would type R code." width="50%" style="display: block; margin: auto;" />

---
### What is RStudio?

Every R installation comes with the *R* Console, so we don’t actually need an additional program to interface with R. BUT most people who use *R* ALSO use RStudio to interact with it.

- History
    + **RStudio** was released in 2011 by J.J. Allaire.
    + The company makes money off the IDE and other helper software.
    + In 2020, RStudio became a PBC (*Public Benefit Corp*), meaning they are legally obligated to support education and open-source development.
- “Integrated Development Environment”
    + Clean user interface

---
### Eventually... Move to Positron

From its [website](https://positron.posit.co): "Positron unifies exploration and production work in one free, AI-assisted environment, empowering the full spectrum of data science in Python and R"

- Positron is under much more active development, with a rapidly expanding set of features. RStudio’s development, on the other hand, is much more focused on bug fixes and product stability.
- First stable release in July 2025
    + I have been told there are still some things it needs to work through.

---
### Implication of Open-Source Software

Because *R* is open source...

- No one company owns *R* (similar to Python)
- This means nobody can sell their **R** code!
    + But you can sell "helpers" like **RStudio**.
- Users can contribute *R* packages to add additional functions and capabilities (more than 19,000 as of March 2023).
    + A complete list can be found [here](https://cran.r-project.org/web/packages/available_packages_by_name.html)
    + Pro: New statistical/data science techniques are added to CRAN, Bioconductor (another package repository), GitHub, etc. daily
    + Con: No standard syntax! Not all packages are well documented

---
### Reproducible Research

Excerpt from the Simply Statistics blog: “The Real Reason Reproducible Research is Important” [(Source)](https://simplystatistics.org/posts/2014-06-06-the-real-reason-reproducible-research-is-important/)

- **Reproducible**: the original data (and original computer code) can be analyzed (by an independent investigator) to obtain the same results as the original study.
- Reproducibility is important because it is the only thing that an investigator can guarantee about a study.
- It does not necessarily ensure the results are correct, but it does ensure transparency.

More reasons to do your programming in a reproducible way:

1. **Time saved**: especially important in the professional world
2. **Time elapsed**: even the best programmers eventually forget how their code runs

---
### Reproducible Research with Quarto

Quarto provides “an authoring framework for data science”. In a Quarto document, you can:

- Save and execute code
- Generate written reports that can be shared with someone else
- Render to many output formats: Word, PDF, HTML, slide shows, handouts, dashboards, PowerPoint

<br>
<br>

**More on this next week**

---
### Let's Install It Together

1. Download and run the R installer for your operating system from CRAN
    - Windows: https://cran.rstudio.com/bin/windows/base/
    - Mac: https://cran.rstudio.com/bin/macosx/
    - Linux: https://cran.rstudio.com/bin/linux/
2. Now download RStudio (an IDE for R) from the Posit website
    - https://posit.co/download/rstudio-desktop/
3. If you are using a Mac, you may need to download XQuartz: https://www.xquartz.org/
    - If you run into issues, try downloading this to see if it helps. Go with all default options.

<br>
<br>

**Other Option**: If you would prefer not to download it to your computer: https://posit.cloud

---
### RStudio Set-Up

<img src="./images/rstudio.png" alt="Picture of the RStudio interface. The top left is called the script and it is where you can write your code. You can save your script to access the code later. The bottom left is the Console, which looks identical to base R. This is where you will see your outputs. The top right is the Environment tab, where you will see all the data you import and any created objects. The bottom right is where plots will appear and you can find a list of downloaded packages."
width="100%" style="display: block; margin: auto;" />

To create a script, click: File -> New File -> R Script

---
### RStudio Set-Up - Editor (Top Left)

<img src="./images/editor.jpeg" alt="Zoomed-in view of the Editor window, where the user can write and save code. Highlighted in the top right of the picture is the Run button, which runs the line of code your cursor is placed on." width="90%" style="display: block; margin: auto;" />

<br>

**Shortcuts for running code**: To run code from your script:

+ Windows: Ctrl + Enter
+ Mac: Cmd + Enter

---
### RStudio Set-Up - Console (Bottom Left)

<img src="./images/console.jpeg" alt="Zoomed-in view of the Console window. Code output will appear at the bottom of this window." width="100%" style="display: block; margin: auto;" />

---
### RStudio Set-Up - Environment (Top Right)

<img src="./images/environment.jpeg" alt="Zoomed-in view of the Environment tab, where loaded datasets will appear ready for use. Highlighted are the buttons that are helpful. The Import Dataset button offers a point-and-click method to load a dataset. Additionally, the broom button will clear the environment."
width="100%" style="display: block; margin: auto;" />

---
### RStudio Set-Up - Files, Packages, Help (Bottom Right)

<!-- Trigger the Modal -->
<img id='imgfiles' src='./images/files.png' alt='Files' width='30%'>

<!-- Trigger the Modal -->
<img id='imgpackages' src='./images/packages.png' alt='Packages' width='30%'>

<!-- Trigger the Modal -->
<img id='imghelp' src='./images/help.png' alt='Help' width='30%'>

<!-- The Modal -->
<div id='modalfiles' class='modal'>
<!-- Modal Content (The Image) -->
<img class='modal-content' id='imgmodalfiles'>
<!-- Modal Caption (Image Text) -->
<div id='captionfiles' class='modal-caption'></div>
</div>

<!-- The Modal -->
<div id='modalpackages' class='modal'>
<!-- Modal Content (The Image) -->
<img class='modal-content' id='imgmodalpackages'>
<!-- Modal Caption (Image Text) -->
<div id='captionpackages' class='modal-caption'></div>
</div>

<!-- The Modal -->
<div id='modalhelp' class='modal'>
<!-- Modal Content (The Image) -->
<img class='modal-content' id='imgmodalhelp'>
<!-- Modal Caption (Image Text) -->
<div id='captionhelp' class='modal-caption'></div>
</div>