
How to Read a CSV File in R

How to Work With Data Frames and CSV Files in R — A Detailed Introduction with Examples

Welcome! If you want to start diving into data science and statistics, then data frames, CSV files, and R will be essential tools for you. Let's see how you can use their amazing capabilities.

In this article, you will learn:

  • What CSV files are and what they are used for.
  • How to create CSV files using Google Sheets.
  • How to read CSV files in R.
  • What data frames are and what they are used for.
  • How to access the elements of a data frame.
  • How to modify a data frame.
  • How to add and delete rows and columns.

We will use RStudio, an open-source IDE (Integrated Development Environment), to run the examples.

Let's begin! ✨

🔹 Introduction to CSV Files

CSV (Comma-Separated Values) files can be considered one of the building blocks of data analysis because they are used to store data represented in the form of a table.

In this file, values are separated by commas to represent the different columns of the table, like in this example:

image-153
CSV File

We will generate this file using Google Sheets.

🔸 How to Create a CSV File Using Google Sheets

Let's create your first CSV file using Google Sheets.

Step 1: Go to the Google Sheets website and click on "Go to Google Sheets":

image-227

💡 Tip: You can access Google Sheets by clicking on the button located at the top-right corner of Google's home page:

image-228

If we zoom in, we see the "Sheets" button:

image-156

💡 Tip: To use Google Sheets, you need to have a Gmail account. Alternatively, you can create a CSV file using MS Excel or another spreadsheet editor.

You will see this panel:

image-157

Step 2: Create a blank spreadsheet by clicking on the "+" button.

image-158

Now you have a new empty spreadsheet:

image-159

Step 3: Change the name of the spreadsheet to students_data. We will need to use the name of the file to work with data frames. Write the new name and press Enter to confirm the change.

image-162

Step 4: In the first row of the spreadsheet, write the titles of the columns.

image-160

When you import a CSV file in R, the titles of the columns are called variables. We will define six variables: first_name, last_name, age, num_siblings, num_pets, and eye_color, as you can see below:

image-163

💡 Tip: Notice that the names are written in lowercase and words are separated with an underscore. This is not mandatory, but since you will need to access these names in R, it's very common to use this format.

Step 5: Enter the data for each one of the columns.

When you read the file in R, each row is called an observation, and it corresponds to data taken from an individual, animal, object, or entity that we collected data from.

In this example, each row corresponds to the data of a student:

image-164

Step 6: Download the CSV file by clicking on File -> Download -> Comma-separated values, as you can see below:

image-165

Step 7: Rename the CSV file. You will need to remove "Sheet1" from the default name because Google Sheets automatically adds it to the name of the file.

image-169

Great work! Now you have your CSV file and it's time to start working with it in R.
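If you prefer to skip the spreadsheet, the same file can also be produced from R itself. The sketch below is only an illustration under a few assumptions: the values are the five students used throughout this article, the file is written to a temporary folder (csv_path is a hypothetical location), and quote = FALSE is chosen so the raw text looks like a plain comma-separated table.

```r
# Build the students table directly in R (values taken from this article).
students <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# Hypothetical location: a temporary folder, so no existing file is overwritten.
csv_path <- file.path(tempdir(), "students_data.csv")

# row.names = FALSE avoids an extra column of row numbers in the file.
write.csv(students, csv_path, row.names = FALSE, quote = FALSE)

# Print the raw text: one header line plus one line per student.
csv_text <- readLines(csv_path)
cat(csv_text, sep = "\n")
```

Each printed line is one row of the table, with commas marking the column boundaries.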

🔹 How to Read a CSV file in R

In RStudio, the first step before reading a CSV file is making sure that your current working directory is the directory where the CSV file is located.

💡 Tip: If this is not the case, you will need to use the full path to the file.

Change Current Working Directory

You can change your current working directory in this panel:

image-172

If we zoom in, you can see the current path (1) and select the new one by clicking on the ellipsis (...) button to the right (2):

image-171

💡 Tip: You can also check your current working directory with getwd() in the interactive console.

Then, click "More" and "Set As Working Directory".

image-175
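The same change can be made from code with base R's getwd() and setwd(). This is a minimal sketch, not the only way to do it: data_dir is a hypothetical folder (a temporary directory is used here as a stand-in), so replace it with the folder where your CSV file actually lives.

```r
# Remember where we started so we can go back afterwards.
old_dir <- getwd()

# Hypothetical folder holding the CSV file; replace with your own path.
data_dir <- tempdir()

# Change the working directory (same effect as "Set As Working Directory").
setwd(data_dir)
new_dir <- getwd()
print(new_dir)

# List the CSV files that R can now see without a full path.
csv_files <- list.files(pattern = "\\.csv$")
print(csv_files)

# Restore the original working directory.
setwd(old_dir)
```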

Read the CSV File

Once you have your current working directory set up, you can read the CSV file with this command:

image-176

In R code, we have this:

    > students_data <- read.csv("students_data.csv")

💡 Tip: We assign it to the variable students_data to access the data of the CSV file with this variable. In R, we can separate words in names using dots ., underscores _, UpperCamelCase, or lowerCamelCase.
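When the file is not in the working directory, read.csv() also accepts a full path. The following sketch assumes the file sits in a temporary folder (csv_path is hypothetical) and writes it first, so the example can run on its own.

```r
# Recreate the CSV file from this article in a temporary folder
# (hypothetical location, used only to keep the example self-contained).
csv_lines <- c(
  "first_name,last_name,age,num_siblings,num_pets,eye_color",
  "Emily,Dawson,15,2,5,BLUE",
  "Rose,Patterson,14,5,0,GREEN",
  "Alexander,Smith,16,0,2,BROWN",
  "Nora,Navona,16,4,10,GREEN",
  "Gino,Sand,17,3,8,BLUE"
)
csv_path <- file.path(tempdir(), "students_data.csv")
writeLines(csv_lines, csv_path)

# Read the file using its full path instead of relying on the working directory.
students_data <- read.csv(csv_path)

# The first line of the file becomes the column names.
print(names(students_data))

# Show the first three observations.
print(head(students_data, 3))
```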

After running this command, you will see this in the top-right panel:

image-177

Now you have a variable defined in the environment! Let's see what data frames are and how they are closely related to CSV files.

🔸 Introduction to Data Frames

Data frames are the standard digital format used to store statistical data in the form of a table. When you read a CSV file in R, a data frame is generated.

We can confirm this by checking the type of the variable with the class function:

    > class(students_data)
    [1] "data.frame"

It makes sense, right? CSV files contain data represented in the form of a table and data frames represent that tabular data in your code, so they are deeply connected.

If you enter this variable in the interactive console, you will see the content of the CSV file:

    > students_data
      first_name last_name age num_siblings num_pets eye_color
    1      Emily    Dawson  15            2        5      BLUE
    2       Rose Patterson  14            5        0     GREEN
    3  Alexander     Smith  16            0        2     BROWN
    4       Nora    Navona  16            4       10     GREEN
    5       Gino      Sand  17            3        8      BLUE

More Information About the Data Frame

You have several alternatives to see the number of variables and observations of the data frame:

  • Your first option is to look at the top-right panel that shows the variables that are currently defined in the environment. This data frame has 5 observations (rows) and 6 variables (columns):
image-178
  • Another alternative is to use the functions nrow and ncol in the interactive console or in your program, passing the data frame as the argument. We get the same results: 5 rows and 6 columns.
    > nrow(students_data)
    [1] 5
    > ncol(students_data)
    [1] 6
  • You can also see more information about the data frame using the str function:
    > str(students_data)
    'data.frame':	5 obs. of  6 variables:
     $ first_name  : Factor w/ 5 levels "Alexander","Emily",..: 2 5 1 4 3
     $ last_name   : Factor w/ 5 levels "Dawson","Navona",..: 1 3 5 2 4
     $ age         : int  15 14 16 16 17
     $ num_siblings: int  2 5 0 4 3
     $ num_pets    : int  5 0 2 10 8
     $ eye_color   : Factor w/ 3 levels "BLUE","BROWN",..: 1 3 2 3 1

This function (applied to a data frame) tells you:

  • The number of observations (rows).
  • The number of variables (columns).
  • The names of the variables.
  • The data types of the variables.
  • More information about the variables.

You can see that this function is really useful when you want to know more about the data that you are working with.
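A few other base R helpers answer the same questions. The sketch below rebuilds the article's table inline (an assumption made only so the example runs without the CSV file) and uses dim(), names(), and summary().

```r
# Inline copy of the article's table, so the example runs without the CSV file.
students_data <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# dim() returns the number of rows followed by the number of columns.
size <- dim(students_data)
print(size)

# names() returns the variable (column) names.
variable_names <- names(students_data)
print(variable_names)

# summary() gives basic statistics for a numeric column.
print(summary(students_data$age))
```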

💡 Tip: In R, a "Factor" is a qualitative variable, which is a variable whose values represent categories. For example, eye_color has the values "BLUE", "BROWN", and "GREEN", which are categories, so as you can see in the output of str above, this variable is automatically defined as a "factor" when the CSV file is read in R. Note that this applies to R versions before 4.0.0. From R 4.0.0 onward, read.csv() keeps text columns as character vectors unless you pass stringsAsFactors = TRUE.
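To see what a factor stores, here is a small sketch that builds the eye_color values by hand with factor() instead of reading them from the file (an assumption made to keep the example self-contained).

```r
# The eye colors of the five students, in row order.
eye_color <- factor(c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"))

# The levels are the distinct categories, sorted alphabetically.
color_levels <- levels(eye_color)
print(color_levels)

# Internally, each value is stored as the position of its level.
color_codes <- as.integer(eye_color)
print(color_codes)

# table() counts how many observations fall in each category.
print(table(eye_color))
```

The integer codes match the numbers listed for eye_color in the str output: each one points to a position in the list of levels.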

🔹 Data Frames: Key Operations and Functions

Now you know how to see more information about the data frame. But the real strength of data frames lies in the capabilities and functionality that they offer, so let's see this in more detail.

How to Access A Value of a Data Frame

Data frames are like matrices, so you can access individual values using two indices surrounded by square brackets and separated by a comma to indicate which rows and which columns you would like to include in the result, like this:

image-181

For example, if we want to access the value of eye_color (column 6) of the fourth student in the data (row 4):

image-182

We need to use this command:

    > students_data[4, 6]

💡 Tip: In R, indices start at 1 and the first row with the names of the variables is not counted.

This is the output:

    [1] GREEN
    Levels: BLUE BROWN GREEN

You can see that the value is "GREEN". Variables of type "factor" have "levels" that represent the different categories or values that they can take. This output tells us the levels of the variable eye_color.
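The column can also be selected by name instead of by position, which tends to be easier to read. The sketch below builds the table inline with stringsAsFactors = TRUE, an assumption made so that the output includes levels as in the example above.

```r
# Inline copy of the article's table; text columns are stored as factors here
# so that the output mirrors the "Levels" line shown in the article.
students_data <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = TRUE
)

# Row 4, column 6, selected by position.
value_by_index <- students_data[4, 6]

# The same cell, selecting the column by its name.
value_by_name <- students_data[4, "eye_color"]

print(value_by_index)

# Convert the factor value to plain text if the levels are not needed.
value_as_text <- as.character(value_by_index)
print(value_as_text)
```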

How to Access Rows and Columns of a Data Frame

We can also use this syntax to access a range of rows and columns to get a portion of the original data frame, like this:

image-179

For example, if we want to get the age and number of siblings of the third, fourth, and fifth student in the list, we would use:

    > students_data[3:5, 3:4]
      age num_siblings
    3  16            0
    4  16            4
    5  17            3

💡 Tip: The basic syntax to define an interval in R is <start>:<end>. Note that these indices are inclusive, so the third and fifth elements are included in the example above when we write 3:5.

If we want to get all the rows or columns, we just omit the interval and include the comma, like this:

    > students_data[3:5,]
      first_name last_name age num_siblings num_pets eye_color
    3  Alexander     Smith  16            0        2     BROWN
    4       Nora    Navona  16            4       10     GREEN
    5       Gino      Sand  17            3        8      BLUE

We did not include an interval for the columns after the comma in students_data[3:5,], so we get all the columns of the data frame for the three rows that we specified.

Similarly, we can get all the rows for a specific range of columns if we omit the rows:

    > students_data[, 1:3]
      first_name last_name age
    1      Emily    Dawson  15
    2       Rose Patterson  14
    3  Alexander     Smith  16
    4       Nora    Navona  16
    5       Gino      Sand  17

💡 Tip: Notice that you still need to include the comma in both cases.
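Ranges are only one way to pick rows and columns. As a sketch (with the table rebuilt inline as an assumption, so the example runs on its own), the indices can also be arbitrary vectors created with c(), and columns can be chosen by name.

```r
# Inline copy of the article's table.
students_data <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# Rows 3 to 5, columns 3 to 4 (same as the article's example).
range_subset <- students_data[3:5, 3:4]
print(range_subset)

# Non-consecutive rows (1 and 5) and columns selected by name.
picked_subset <- students_data[c(1, 5), c("first_name", "age")]
print(picked_subset)
```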

How to Access a Column

There are three ways to access an entire column:

  • Option #1: to access a column and return it as a data frame, you can use this syntax:
image-184

For example:

    > students_data["first_name"]
      first_name
    1      Emily
    2       Rose
    3  Alexander
    4       Nora
    5       Gino
  • Option #2: to get a column as a vector (sequence), you can use this syntax:
image-185

💡 Tip: Notice the use of the $ symbol.

For example:

    > students_data$first_name
    [1] Emily     Rose      Alexander Nora      Gino
    Levels: Alexander Emily Gino Nora Rose
  • Option #3: You can also use this syntax to get the column as a vector (see below). This is equivalent to the previous syntax:
    > students_data[["first_name"]]
    [1] Emily     Rose      Alexander Nora      Gino
    Levels: Alexander Emily Gino Nora Rose
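The difference between the three options is easiest to see by checking what each one returns. This sketch uses a shortened inline copy of the table with character columns (an assumption), so the vectors print without levels.

```r
# Shortened inline copy of the article's table (three of the six columns).
students_data <- data.frame(
  first_name = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name  = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age        = c(15L, 14L, 16L, 16L, 17L),
  stringsAsFactors = FALSE
)

# Option 1: single brackets keep the result as a one-column data frame.
as_data_frame <- students_data["first_name"]

# Option 2: the $ operator returns the column as a vector.
as_vector <- students_data$first_name

# Option 3: double brackets also return the column as a vector.
as_vector_again <- students_data[["first_name"]]

print(class(as_data_frame))
print(class(as_vector))
print(identical(as_vector, as_vector_again))
```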

How to Filter Rows of a Data Frame

You can filter the rows of a data frame to get a portion of the data frame that meets certain conditions.

To do this, we use this syntax: pass the condition as the first element inside the square brackets, then a comma, and leave the second element empty.

image-190

For example, to get all rows for which students_data$age > 16, we would use:

    > students_data[students_data$age > 16,]
      first_name last_name age num_siblings num_pets eye_color
    5       Gino      Sand  17            3        8      BLUE

We get a data frame with the rows that meet this condition.

Filter Rows and Choose Columns

You can combine this condition with a range of columns:

    > students_data[students_data$age > 16, 3:6]
      age num_siblings num_pets eye_color
    5  17            3        8      BLUE

We get the rows that meet the condition and the columns in the range 3:6.
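Filtering works because the condition produces a vector of TRUE and FALSE values, one per row. The sketch below makes that intermediate step visible and also combines two conditions with &. The table is rebuilt inline and the second condition (more than five pets) is a hypothetical example, not taken from the article.

```r
# Inline copy of the article's table.
students_data <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# The condition is evaluated for every row and yields TRUE or FALSE.
is_older <- students_data$age > 16
print(is_older)

# Rows where the condition is TRUE are kept; all columns are returned.
older_students <- students_data[is_older, ]
print(older_students)

# Two conditions combined with & (both must hold), plus named columns.
selected <- students_data[
  students_data$age >= 16 & students_data$num_pets > 5,
  c("first_name", "age", "num_pets")
]
print(selected)
```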

🔸 How to Change Data Frames

You can modify individual values of a data frame, add columns, add rows, and remove them. Let's see how you can do this!

How to Modify A Value

To change an individual value of the data frame, you need to use this syntax:

image-191

For example, if we want to change the value that is currently at row 4 and column 6, highlighted in blue below:

image-182

We need to use this line of code:

    students_data[4, 6] <- "BROWN"

💡 Tip: You can also use = as the assignment operator.

This is the output. The value was changed successfully.

image-193

💡 Tip: Remember that the first row of the CSV file is not counted as the first row because it has the names of the variables.
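Here is a runnable sketch of the same assignment, with the table rebuilt inline using character columns (an assumption). It also shows a variant that locates the row by a condition instead of a fixed row number, which is a design choice you may prefer when row positions can change.

```r
# Inline copy of the article's table.
students_data <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# Value before the change.
before <- students_data[4, 6]

# Replace the value at row 4, column 6.
students_data[4, 6] <- "BROWN"
after <- students_data[4, 6]
print(c(before = before, after = after))

# Variant: find the row with a condition and the column by name.
students_data[students_data$first_name == "Gino", "num_pets"] <- 9L
print(students_data)
```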

How to Add Rows to a Data Frame

To add a row to a data frame, you need to use the rbind function:

image-194

This function takes two arguments:

  • The data frame that you want to modify.
  • A list with the data of the new row. To create the list, you can use the list() function with each value separated by a comma.

This is an example:

    > rbind(students_data, list("William", "Smith", 14, 7, 3, "BROWN"))

The output is:

      first_name last_name age num_siblings num_pets eye_color
    1      Emily    Dawson  15            2        5      BLUE
    2       Rose Patterson  14            5        0     GREEN
    3  Alexander     Smith  16            0        2     BROWN
    4       Nora    Navona  16            4       10     BROWN
    5       Gino      Sand  17            3        8      BLUE
    6       <NA>     Smith  14            7        3     BROWN

But wait! A warning message was displayed:

    Warning message:
    In `[<-.factor`(`*tmp*`, ri, value = "William") :
      invalid factor level, NA generated

And notice the first value of the sixth row: it is <NA>:

    6       <NA>     Smith  14            7        3     BROWN

This occurred because the variable first_name was defined automatically as a factor when we read the CSV file, and factors have fixed "categories" (levels).

You cannot add a new level (the value "William") to this variable unless you read the CSV file with the value FALSE for the parameter stringsAsFactors, as shown below. In R 4.0.0 and later, FALSE is already the default for this parameter.

    > students_data <- read.csv("students_data.csv", stringsAsFactors = FALSE)
image-196

Now, if we try to add this row, the data frame is modified successfully.

    > students_data <- rbind(students_data, list("William", "Smith", 14, 7, 3, "BROWN"))
    > students_data
      first_name last_name age num_siblings num_pets eye_color
    1      Emily    Dawson  15            2        5      BLUE
    2       Rose Patterson  14            5        0     GREEN
    3  Alexander     Smith  16            0        2     BROWN
    4       Nora    Navona  16            4       10     GREEN
    5       Gino      Sand  17            3        8      BLUE
    6    William     Smith  14            7        3     BROWN

💡 Tip: Note that if you read the CSV file again and assign it to the same variable, all the changes made previously will be removed and you will see the original data frame. You need to add this argument to the first line of code that reads the CSV file and then make changes to it.
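As an alternative sketch (not the approach used above), the new row can be written as a one-row data frame with named columns, so that rbind matches values to columns by name rather than by position. The table is rebuilt inline with character columns, and new_row is a hypothetical name introduced for this example.

```r
# Inline copy of the article's table, with text stored as character vectors.
students_data <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# The new observation as a one-row data frame with the same column names.
new_row <- data.frame(
  first_name = "William", last_name = "Smith", age = 14L,
  num_siblings = 7L, num_pets = 3L, eye_color = "BROWN",
  stringsAsFactors = FALSE
)

# rbind returns a new data frame; assign it back to keep the change.
students_data <- rbind(students_data, new_row)
print(students_data)
```

Naming the columns makes the code a little longer, but it guards against putting a value in the wrong column.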

How to Add Columns to a Data Frame

Adding columns to a data frame is much simpler. You need to use this syntax:

image-197

For instance:

    > students_data$GPA <- c(4.0, 3.5, 3.2, 3.15, 2.9, 3.0)

💡 Tip: The number of elements should be equal to the number of rows of the data frame. R only recycles a shorter vector when its length divides the number of rows evenly; otherwise it raises an error.

The output shows the data frame with the new GPA column:

    > students_data
      first_name last_name age num_siblings num_pets eye_color  GPA
    1      Emily    Dawson  15            2        5      BLUE 4.00
    2       Rose Patterson  14            5        0     GREEN 3.50
    3  Alexander     Smith  16            0        2     BROWN 3.20
    4       Nora    Navona  16            4       10     GREEN 3.15
    5       Gino      Sand  17            3        8      BLUE 2.90
    6    William     Smith  14            7        3     BROWN 3.00
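A new column can also be computed from existing ones. In this sketch the table is rebuilt inline with the original five students (so only the first five GPA values are used), has_pets is a hypothetical column added for illustration, and tryCatch is used to show what happens when the number of values does not fit the number of rows.

```r
# Shortened inline copy of the article's table (five students).
students_data <- data.frame(
  first_name = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  age        = c(15L, 14L, 16L, 16L, 17L),
  num_pets   = c(5L, 0L, 2L, 10L, 8L),
  stringsAsFactors = FALSE
)

# Add a column from a vector with one value per row.
students_data$GPA <- c(4.0, 3.5, 3.2, 3.15, 2.9)

# Hypothetical derived column: TRUE when the student has at least one pet.
students_data$has_pets <- students_data$num_pets > 0
print(students_data)

# Three values cannot fill five rows, so this assignment fails with an error.
outcome <- tryCatch({
  students_data$too_short <- c(1, 2, 3)
  "column added"
}, error = function(e) "error: length does not match the number of rows")
print(outcome)
```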

How to Remove Columns

To remove columns from a data frame, you need to use this syntax:

image-198

When you assign the value NULL to a column, that column is removed from the data frame automatically.

For example, to remove the age column, we use:

    > students_data$age <- NULL

The output is:

    > students_data
      first_name last_name num_siblings num_pets eye_color  GPA
    1      Emily    Dawson            2        5      BLUE 4.00
    2       Rose Patterson            5        0     GREEN 3.50
    3  Alexander     Smith            0        2     BROWN 3.20
    4       Nora    Navona            4       10     GREEN 3.15
    5       Gino      Sand            3        8      BLUE 2.90
    6    William     Smith            7        3     BROWN 3.00
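The sketch below repeats the NULL assignment on an inline copy of the table (an assumption, so it runs on its own) and adds one alternative for dropping several columns at once by name. The names keep and trimmed are hypothetical and introduced only for this example.

```r
# Inline copy of the article's table.
students_data <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# Assigning NULL removes the age column from the data frame.
students_data$age <- NULL
print(names(students_data))

# Alternative: keep every column whose name is not in the list to drop.
keep <- !(names(students_data) %in% c("num_siblings", "num_pets"))
trimmed <- students_data[, keep]
print(trimmed)
```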

How to Remove Rows

To remove rows from a data frame, you can use indices and ranges. For example, to remove the first row of a data frame:

image-200

The [-1,] takes a portion of the data frame that doesn't include the first row. Then, this portion is assigned to the same variable.

If we have this data frame and we want to delete the first row:

image-230

The output is a data frame that doesn't include the first row:

image-231

In general, to remove a specific row, you need to use this syntax where <row_num> is the row that you want to remove:

image-229

💡 Tip: Notice the - sign before the row number.

For example, if we want to remove row 4 from this data frame:

image-232

The output is:

image-233

As you can see, row 4 was successfully removed.
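Since the commands in this section were shown as images, here is a runnable sketch of the negative-index syntax on a shortened inline copy of the table (an assumption). The results are stored in separate, hypothetical variables so that each step starts from the original five rows.

```r
# Shortened inline copy of the article's table.
students_data <- data.frame(
  first_name = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name  = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age        = c(15L, 14L, 16L, 16L, 17L),
  stringsAsFactors = FALSE
)

# Remove the first row: keep everything except row 1.
without_first <- students_data[-1, ]
print(without_first)

# Remove row 4.
without_fourth <- students_data[-4, ]
print(without_fourth)

# Remove several rows at once with a vector of row numbers.
without_two_three <- students_data[-c(2, 3), ]
print(without_two_three)

# The remaining rows keep their original labels; reset them if desired.
rownames(without_first) <- NULL
print(without_first)
```

To make a removal permanent, assign the result back to students_data, as described above.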

🔹 In Summary

  • CSV files are Comma-Separated Values files used to represent data in the form of a table. These files can be read using R and RStudio.
  • Data frames are used in R to represent tabular data. When you read a CSV file, a data frame is created to store the data.
  • You can access and modify the values, rows, and columns of a data frame.

I really hope that you liked my article and found it helpful. Now you can work with data frames and CSV files in R.

If you liked this article, consider enrolling in my new online course "Introduction to Statistics in R - A Practical Approach".


Source: https://www.freecodecamp.org/news/how-to-work-with-data-frames-and-csv-files-in-r/