February 22, 2020
R is a free software environment for statistics and graphics. Alongside Python, it is one of the most popular tools used for statistical analysis. R was released in 1995 and the current version is 3.1.3. As of 2018, Python is still an extremely popular programming language, but R has been rising in popularity of late, according to the Tiobe Index (https://www.tiobe.com/tiobe-index/r/). A word of caution, however: the popularity of programming languages can change quite rapidly within the space of a year.
R vs Python
One of the main differences between R and Python is that R is not a general purpose tool: its main focus is on statistics. As a result, it offers powerful data visualisation features that can take more effort to reproduce elsewhere. Another difference is that Python focuses on code readability, whereas R comes with support for user friendly data analysis.
Why Learn R?
There are numerous advantages to learning R. It is easier for beginners who want to explore, as you can set up a model within a few lines of code. In other programming languages and tools, you would have to carry out a number of tasks in order to get something up and running. By seeing results from your code early on, you can understand what is possible sooner rather than later.
Part I: Getting Started with R
This first tutorial will guide you through the basics of R, up to a working example. We will run through downloading, setting up and running the software. In Part II, we will then look at using a CSV file with R in order to understand how to work with real-world data.
There are two ways to install and use R. One is with RStudio, available at https://www.rstudio.com/; the other is the standard R console that comes with R itself. RStudio is an Integrated Development Environment (IDE), which provides a source code editor, build automation tools and a way to debug your code.
Why use the console version then? The main reason for using the console is that the IDE is normally used by developers for making programs, but in this tutorial we only want to run some lines of code to see what is possible – so we don’t need all the extra add-ons. If involved in a large project, it would then make sense to have the extras of the IDE.
For this tutorial, you can use any text editor (such as Notepad or TextEdit), or even R’s command-line interface, to edit and run an R script.
So first off, visit R’s official project homepage at https://www.r-project.org/. You should see this screen:
There are a number of different installation packages available depending on your computer setup. To select the correct software from the R website, visit the page https://cloud.r-project.org/ where you will be presented with the following screen:
Choose the software download that you need by clicking on the “Download R” link for your computer’s operating system at the top of the page. For demonstration purposes, this tutorial will look at the Windows installation.
Once you have picked your own operating system, you will be brought to a page that should show your operating system as the heading, with the download link for R underneath. As can be seen in the example below, the file we are going to download is about 60 MB in size, so we can select the first link at the top of the page:
Once you click on this link this software is downloaded, and you can install it. Run the setup and once you have finished, you should be presented with the following window:
Double click on the newly installed software’s icon on your desktop or in the program menu, and the Graphical User Interface for R appears. You can see a menu at the top with a sub-menu, then a “console” which has some copyright and licensing information, followed by some system information and then a cursor with some options. This is where you can communicate with the software.
R looks a bit daunting at first, because there are no handy buttons to open data files with, and when you click on the “open” button, the only files that can be opened are “R” files or those of its predecessor, “S”. The reason for this is that you open scripts written in R in order to carry out programming operations on your data, whereas the data files themselves have to be imported.
To communicate with the software, click inside the console window and type the command
print("Hello World")
The program should respond like this:
[1] "Hello World"
Digging Into Data – The Activity of “Old Faithful”
Now that we are getting a response from the program, we will want to do more. To clear the console, simply use the keyboard shortcut Ctrl+L or, on a Mac, Option+Command+L.
You can bring up a list of datasets by simply typing data(). This will bring up a window that shows the list of datasets which are already built in (real-world examples which have been downloaded with the software) and some information on them:
To load one of these datasets, simply type the command data(name of dataset), which adds that dataset to your workspace. It doesn’t show it, because data(name of dataset) is simply loading the data. To view a dataset, you only have to type its name once it is loaded.
In this example we loaded the eruption data for Old Faithful by typing
data(faithful)
and then just
faithful
You can see the data on the left hand side, inside the console, laid out like entries in a database: the row number of each eruption is in the first column, followed by the duration of the eruption and the waiting time. The documentation describes the dataset as “Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA.” The “Format” of this dataset is then described as “A data frame with 272 observations on 2 variables.” For more information on what these datasets contain, visit the R documentation website at: https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/
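As a quick check on that description, a short sketch like the following confirms the size and the column names of the dataset. It uses only functions that ship with base R; nothing here comes from the original tutorial beyond the faithful dataset itself.

```r
# Load the built-in Old Faithful dataset into the workspace
data(faithful)

# Number of rows (observations) and columns (variables)
dim(faithful)

# Column names: eruption duration and waiting time, both in minutes
names(faithful)

# First six rows, with the row number shown on the left
head(faithful)

# Compact overview of the structure of the data frame
str(faithful)
```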
These datasets can now be traversed with R commands and visualised in seconds! For example, in the case of Old Faithful, visit the webpage https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/faithful.html and copy all of the code at the bottom of that webpage. It starts off like this:
f.tit <- "faithful data: Eruptions of Old Faithful"
Paste this code into your console window, and observe what it does with the data. The code loads in some graphics and tells R what needs to be drawn. R then plots this data with the graphics and under instruction it produces a visualisation:
This simple example shows what can be done using R. Think about your own ideas and what it might be useful for. Try the other examples and see what they show and how the plotting is done. Continue to get an idea for what R is useful for, and then we will try out our own data.
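Before moving on, it can help to draw a plot by hand rather than pasting one in. The sketch below is not the code from the documentation page; it is a minimal example, assuming only base R's plot() and hist() functions, with titles, labels and colours chosen here purely for illustration.

```r
data(faithful)

# Scatter plot: eruption duration against waiting time
plot(faithful$eruptions, faithful$waiting,
     main = "Old Faithful eruptions",
     xlab = "Eruption duration (minutes)",
     ylab = "Waiting time (minutes)",
     col  = "steelblue", pch = 19)

# Histogram of the waiting times, asking for roughly 20 bins
hist(faithful$waiting,
     breaks = 20,
     main = "Waiting time between eruptions",
     xlab = "Waiting time (minutes)",
     col  = "lightgrey")
```

Changing the column names inside plot() or hist() is usually all it takes to try the same idea on another built-in dataset.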
More R Commands
Return to the console and type ‘summary(faithful)’ to see what this does. Below we can see the minimum, the maximum and the average wait times overall.
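The figures that summary() reports can also be pulled out one at a time, which is handy when you only need a single number. This is a small sketch using base R functions; the variable names wait_min, wait_max and wait_mean are made up for this example.

```r
data(faithful)

# Summary statistics for both columns at once
summary(faithful)

# The same kind of figures for the waiting times, one at a time
wait_min  <- min(faithful$waiting)
wait_max  <- max(faithful$waiting)
wait_mean <- mean(faithful$waiting)

# Print them together as a named vector
c(minimum = wait_min, maximum = wait_max, mean = wait_mean)
```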
Interpreting Our Code, Working with Variables (advanced)
In order to understand the above code for our graph, we need to go through it line by line (this will help us later when we create our own graphs).
f.tit <- "faithful data: Eruptions of Old Faithful"
This means that the graph title on the right hand side will be stored as f.tit.
Think of f.tit simply as shorthand for that sentence.
Now look at the other code, such as
ne60 <- round(e60 <- 60 * faithful$eruptions)
faithful$better.eruptions <- ne60 / 60
te <- table(ne60)
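These lines of code prepare the values that the graph will draw. The first converts each eruption duration from minutes to seconds (stored as e60) and rounds it to the nearest whole second (stored as ne60). The second converts the rounded values back into minutes and adds them to the dataset as a new column, better.eruptions. The third counts how many eruptions share each rounded duration. In the end, they are plotted with x and y axes, title, colours and values.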
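To see what these assignments produce, the same steps can be run on a copy of the data and inspected one by one. This is a sketch rather than the documentation's own code: the nested assignment is split into two lines for readability, and the copy name geyser is an assumption made for this example.

```r
data(faithful)

# Work on a copy so the built-in dataset stays untouched
geyser <- faithful

# Convert minutes to seconds, then round to the nearest whole second
e60  <- 60 * geyser$eruptions
ne60 <- round(e60)

# Convert the rounded seconds back into minutes
geyser$better.eruptions <- ne60 / 60

# Count how many eruptions share each rounded duration (in seconds)
te <- table(ne60)

# Compare the original and the rounded durations for the first rows
head(geyser[, c("eruptions", "better.eruptions")])

# The most common rounded durations, largest counts first
head(sort(te, decreasing = TRUE))
```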
So any time you see <- in a line of code, it means that the value on the right hand side is being stored under the name on the left hand side. Once it is stored, it is called a “variable”. You can list all of your variables by typing
ls()
To remove any of these variables, simply type ‘rm(name of variable)’ and then type ‘ls()’ again.
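The whole cycle of creating, listing and removing variables can be tried in a few lines. The names greeting and answer below are invented for this sketch.

```r
# Store a piece of text and a number under two names
greeting <- "Hello World"
answer   <- 6 * 7

# Typing a name on its own prints the stored value
greeting
answer

# List the variables currently in the workspace
ls()

# Remove one variable, then list again to confirm it is gone
rm(greeting)
ls()
```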
Part II: Creating Your Own Projects
Now that you have gone through a setup of R, opened datasets that are built-in and understood what the code is about, we’ll look at a real-world example. Once you get the hang of this second part, you can really start to see how practical examples can be carried out. For this second part, we will download a sample CSV dataset from the World Wide Web, and see what we can do with it. The dataset chosen for this tutorial is “Angling Stands” around Roscommon, a practical (and small) example to work with. It can be found here: https://data.gov.ie/dataset/roscommon-angling-stands1
From this webpage, download the CSV file Roscommon_Angling_Stands.csv, and rename it to file.csv for ease of use.
What we need to do now is feed the CSV file into R from outside of the program, which means telling R where to find the file on your computer.
Before we can access the file, we need to set up a directory. In order to import your data files, you need to set the directory where R is going to look for them. In some cases it is best to create a folder on your C: drive. Go onto your start menu and open “My Computer”. When Windows Explorer opens up, double-click on C: and create a new folder in there called “RData”:
Now return to the R program, clear the screen (Ctrl+L, or Option+Command+L on a Mac) and in the command section type:
setwd("C:/RData") – this sets the current working directory to RData (where you will store file.csv)
getwd() – shows the current directory that R is pointing towards
list.files() – shows the current list of files within that directory
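These three commands can be practised safely before touching the C: drive. The sketch below assumes a practice folder inside R's temporary directory instead of C:/RData, and the file name example.txt is a placeholder invented for this example.

```r
# Remember where R is currently pointing so it can be restored later
old_dir <- getwd()

# Create a practice folder inside R's temporary directory
# (an assumption for this sketch; the tutorial itself uses C:/RData)
practice_dir <- file.path(tempdir(), "RData")
dir.create(practice_dir, showWarnings = FALSE)

# Point R at the new folder and confirm the change
setwd(practice_dir)
getwd()

# List the files in the folder (none yet on a first run)
list.files()

# Add a small placeholder file and list the contents again
writeLines("placeholder", "example.txt")
list.files()

# Return to the original working directory
setwd(old_dir)
```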
The setwd has now set the current working directory of the R software to the new folder created on the C: drive. Put the Roscommon Angling file (which you renamed to file.csv earlier in this tutorial) into the RData directory on the C: drive. You can now read that file into the console by typing:
myData <- read.csv("C:/RData/file.csv", header=TRUE)
This will add your data to the software’s memory, in shorthand (or variable) as myData. It can now be read by the software in different ways. See if you can access it – here are some examples:
myData – view all of your data
head(myData) – shows the column headers along with the first few rows of data
myData[1:20, ] – all of the rows between 1 and 20, with all of the columns
names(myData) – shows the column names only
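If the download is not to hand, the same commands can be tried on a stand-in file. The sketch below writes a tiny CSV to a temporary folder and reads it back; the three rows are hypothetical values, not the real Roscommon data, and only the column names are taken from the code used later in this tutorial.

```r
# Build a tiny stand-in dataset (hypothetical values, not the real file)
sample_rows <- data.frame(
  Name           = c("Stand A", "Stand B", "Stand C"),
  Type           = c("Single", "Double", "Single"),
  WGS84Latitude  = c(53.63, 53.65, 53.70),
  WGS84Longitude = c(-8.18, -8.13, -8.10)
)

# Write it to a temporary CSV file, then read it back as in the tutorial
csv_path <- file.path(tempdir(), "file.csv")
write.csv(sample_rows, csv_path, row.names = FALSE)
myData <- read.csv(csv_path, header = TRUE)

# Column names only
names(myData)

# First rows, with the column headers
head(myData)

# Rows 1 to 2 and every column (the part after the comma is left empty)
myData[1:2, ]

# A single column, selected by name
myData$Name

# Number of rows and columns
dim(myData)
```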
Just as in Excel where you can view and edit data based on columns and rows, you can run a line of code in R to see a similar type of layout. Here is that line:
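One way to get such a spreadsheet-style layout, assuming the standard View() function is what is meant here, is sketched below with the built-in faithful data. View() only displays the data; the related edit() function is the usual choice when changes are needed.

```r
data(faithful)

# View() opens a spreadsheet-style window showing rows and columns.
# It needs a graphical session, so it is skipped when R runs non-interactively.
if (interactive()) {
  View(faithful)
}
```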
Mapping with Leaflet
Our file.csv contains a number of different latitudinal and longitudinal co-ordinates. The next thing we are going to do is plot these points in R in combination with LeafletJS, an open-source mapping tool. R brings the processed data, Leaflet brings the mapping software.
Our file.csv is already loaded into memory and waiting to be processed, so we can now add the leaflet package.
In the R GUI, go to Packages>Install Package(s)
In the next window, select “UK”, and then in the following window, scroll down and double click on “leaflet”
It will begin loading up the package. Once this has installed, return to the console window and type:
library(leaflet)
There should be no response, only that the cursor moves onto a new line.
If so, now we are ready to combine file.csv with leafletJS.
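The package can also be installed from the console instead of the menus. The sketch below uses the standard install.packages() and library() functions and assumes an internet connection; the mirror address is the cloud.r-project.org site mentioned earlier, used here in place of the “UK” mirror.

```r
# Install leaflet only if it is not already available
if (!requireNamespace("leaflet", quietly = TRUE)) {
  install.packages("leaflet", repos = "https://cloud.r-project.org")
}

# Load the package; no output means it loaded without problems
library(leaflet)
```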
To create a new map simply type the following:
newmap = leaflet()
newmap = addProviderTiles(newmap, provider = "CartoDB.Positron")
newmap=setView(newmap, lng =-8.18333, lat=53.63333, zoom=15)
To view this new map simply type
newmap
Congratulations – You have created your first map with R and Leaflet! You can change the BaseMap simply by editing the “CartoDB.Positron” above, and running that same line of code. Have a look at your options on the right hand side over at this page: http://leaflet-extras.github.io/leaflet-providers/preview/
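As an illustration of swapping the base map, the sketch below rebuilds the map from scratch with a different provider. It assumes the leaflet package is installed, and "OpenStreetMap.Mapnik" is just one provider name picked for this example.

```r
library(leaflet)

# Start from an empty map widget
newmap <- leaflet()

# Same steps as before, but with a different base map
newmap <- addProviderTiles(newmap, provider = "OpenStreetMap.Mapnik")
newmap <- setView(newmap, lng = -8.18333, lat = 53.63333, zoom = 15)

# Typing the name displays the map in an interactive session
newmap
```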
So now that this is working, we want to read in the markers in order to populate this map. First we will create one marker to test. Copy and paste the code below into your console window and hit enter:
newmap=addMarkers(newmap, lng =-8.18333, lat=53.63333, popup="Roscommon")
If that runs ok, that’s one marker completed. Go ahead and test your map with the “newmap” command to see if it shows up. To fill everything in from the csv file, just copy and paste the following code in the same way:
newmap=addMarkers(newmap, lng = myData$WGS84Longitude, lat =myData$WGS84Latitude, popup = paste("Name:", myData$Name, "<br>", "Type:", myData$Type))
Summary and Full code for part II:
Download the CSV (from https://data.gov.ie/dataset/roscommon-angling-stands8b404)
Rename it file.csv
Create a folder on the C: drive called RData
Copy and paste the file.csv into RData folder
library(leaflet)
myData <- read.csv("C:/RData/file.csv", header=TRUE)
newmap = leaflet()
newmap = addProviderTiles(newmap, provider = "CartoDB.Positron")
newmap=setView(newmap, lng=-8.1333, lat=53.65333, zoom=10)
newmap=addMarkers(newmap, lng = myData$WGS84Longitude, lat =myData$WGS84Latitude, popup = paste("Name:", myData$Name, "<br>", "Type:", myData$Type))
newmap
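The marker step can also be tried without the CSV file. The sketch below assumes the leaflet package is installed and uses a small hypothetical data frame in place of myData; the names stands, popup_text and markermap are invented for this example, and the "<br>" in the popup is an HTML line break. It also drops rows with missing co-ordinates first, a precaution the tutorial itself does not take.

```r
library(leaflet)

# Hypothetical stand-in rows; the real values come from file.csv
stands <- data.frame(
  Name           = c("Stand A", "Stand B", "Stand C"),
  Type           = c("Single", "Double", "Single"),
  WGS84Latitude  = c(53.63, 53.65, NA),
  WGS84Longitude = c(-8.18, -8.13, -8.10)
)

# Keep only rows where both co-ordinates are present
has_coords <- complete.cases(stands[, c("WGS84Latitude", "WGS84Longitude")])
stands <- stands[has_coords, ]

# Build one popup text per row; "<br>" starts a new line inside the popup
popup_text <- paste("Name:", stands$Name, "<br>", "Type:", stands$Type)

# Create the map, add a base layer, then one marker per remaining row
markermap <- leaflet()
markermap <- addProviderTiles(markermap, provider = "CartoDB.Positron")
markermap <- addMarkers(markermap,
                        lng   = stands$WGS84Longitude,
                        lat   = stands$WGS84Latitude,
                        popup = popup_text)

# Typing the name displays the map in an interactive session
markermap
```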
This is just one of the many ways you can work with R. Have a look at the other tutorials online, such as at The Programming Historian which shows how to work with Tabular Data: https://programminghistorian.org/lessons/r-basics-with-tabular-data. What other ways can we get R to help us with datasets? [...]