R is an open-source programming language and software environment for statistical computing, data analysis, and data visualization.

In this piece, we will explain the basic data types and data structures that R uses to store data.

Primary data types: The three main data types in R are:

· Numerical data

· Logical or Boolean data

· Character data

Numeric (numerical) data are numbers, either integers or decimals. They can be discrete or continuous, such as the number of COVID patients in India, the weight of an animal, or the average marks of a class 9 student. Logical (Boolean) data take the values TRUE or FALSE, which R treats as 1 and 0 in arithmetic. Logical values usually result from comparisons. Character data consist of text, such as the name of something. To store a character value in R, enclose it in double quotes ("") or single quotes ('').
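Here is a minimal sketch of the three data types in the R console. The variable names (patients, weight, count, is_heavy, city, state) are illustrative only. class() reports each type. A whole number such as 120 is stored as numeric (double) unless it carries the L suffix, which makes it an integer.

```r
# Illustrative variables, not taken from any real dataset
patients <- 120            # whole number, but stored as numeric (double)
weight <- 4.75             # decimal number
count <- 5L                # the L suffix creates an integer
class(patients)            # "numeric"
class(count)               # "integer"

# Logical values usually come from comparisons
is_heavy <- weight > 4
class(is_heavy)            # "logical"
sum(c(TRUE, FALSE, TRUE))  # 2, because TRUE counts as 1 and FALSE as 0

# Character values accept double or single quotes
city <- "Delhi"
state <- 'Punjab'
class(city)                # "character"
```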

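Variable: A variable is a name that holds a value, such as a number, a piece of text, or a whole collection of values. To assign a value to a variable, write the variable name on the left, followed by <- or =, and then the value. Variable names may contain letters, digits, dots (.), and underscores (_). They must not start with a number, an underscore, or any other special character; a name may start with a dot only if the dot is not followed by a number.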

Code:

> X<-5

> X=5
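The sketch below shows both assignment operators and the naming rules in practice. The names scores, avg_marks, and avg.marks are hypothetical. make.names() is a base R function that converts arbitrary strings into syntactically valid names, so it is a convenient way to check the rules.

```r
# Both operators assign 5 to X; <- is the more common style
X <- 5
X = 5

# A variable can also hold several values at once
scores <- c(72, 85, 91)

# Valid names: letters, digits, dots and underscores, starting with a letter
avg_marks <- mean(scores)
avg.marks <- avg_marks

# A name such as 9th_class is a syntax error unless wrapped in backticks
`9th_class` <- 40

# make.names() shows how R would repair invalid names
make.names(c("9th_class", "avg marks"))   # "X9th_class" "avg.marks"
```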

Data Structures: The main data structures in R are:

· Vectors

· Matrices

· Arrays

· Data Frame

· Lists

· Factors

Vectors: Vectors store one or more values of the same data type in a single variable and can be thought of as one-dimensional arrays. The variable X defined above is in fact a vector with a single element. To store several values in a vector, use the combine function c() and separate the values with commas.

> X=c(2,9,8,7,3,9)

We can also combine numeric and character values in a single vector. However, a vector holds only one type, so R converts (coerces) all the values to character.

> Y=c(45,4.98,5.36,47,"Xyz","abc")

We can check the data type (class) of the data stored in any variable with class(variable_name).

> class(X)

Output: [1] "numeric"

> class(Y)

Output: [1] "character"

So, X is numeric, while Y is character. Because Y mixes numbers and text, R stored every element, including 45 and 4.98, as text.
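Beyond class(), a few base R operations are useful with vectors: length(), indexing with [ ] (positions start at 1), and logical filtering. The sketch reuses the values of X and a shortened version of Y to show character coercion in action.

```r
X <- c(2, 9, 8, 7, 3, 9)
length(X)        # 6
X[2]             # 9, because R positions start at 1
X[c(1, 3)]       # 2 8
X[X > 5]         # 9 8 7 9, the elements that pass the comparison

# Mixing numbers and text turns every element into text
Y <- c(45, 4.98, "Xyz")
Y[1]                    # "45", a character string
as.numeric(Y[1]) + 1    # 46, after converting back to a number
```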

We can also label the values in a vector by giving each value a name. For example, suppose we want to store urban CPI inflation rates for several Indian states, labelled with the state names.

> CPI_inf=c(Bihar=1.62,Delhi=5.64,Odisha=1.31,Punjab=3.09)
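A named vector attaches a label to every value. The sketch below builds the same labelled vector in two ways: directly, with name = value pairs inside c(), and afterwards, with names(). It then looks values up by name. The inflation figures are the ones listed above.

```r
# Labels given directly inside c()
CPI_inf <- c(Bihar = 1.62, Delhi = 5.64, Odisha = 1.31, Punjab = 3.09)

# The same vector, with labels attached afterwards
Urban <- c(1.62, 5.64, 1.31, 3.09)
names(Urban) <- c("Bihar", "Delhi", "Odisha", "Punjab")

identical(CPI_inf, Urban)    # TRUE
CPI_inf["Delhi"]             # look up a value by its label
names(which.max(CPI_inf))    # "Delhi", the state with the highest rate
```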

Matrix: A matrix is a two-dimensional array of elements with a fixed number of rows and columns. Like a vector, a matrix holds elements of a single type only. Suppose we have the prices of three products for April, May, and June, and we want to store all products for all months in a single variable. A single vector cannot represent this row-and-column layout, but a matrix can.

> April_price=c(25,26,47)

> May_price=c(30,35,34)

> June_price=c(31,35,39)

We can combine the three vectors with c() and pass the result to matrix(). The argument nrow = 3 sets the number of rows, one per product. Because matrix() fills values column by column by default, each column holds the prices of all three products in one month, and each row holds the prices of one product across the three months.

> All_price=matrix(c(April_price,May_price,June_price),nrow=3)

The All_price matrix looks like this in the R console:

Output:

> All_price

[,1] [,2] [,3]

[1,] 25 30 31

[2,] 26 35 35

[3,] 47 34 39
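You can also name the rows and columns through the dimnames argument of matrix(), which makes the layout self-describing. In this sketch, the product labels Product1 to Product3 are hypothetical, since the text does not name the products. cbind() is shown as an equivalent way to place each vector in its own column.

```r
April_price <- c(25, 26, 47)
May_price <- c(30, 35, 34)
June_price <- c(31, 35, 39)

# Default column-wise filling: one column per month, one row per product
All_price <- matrix(c(April_price, May_price, June_price), nrow = 3,
                    dimnames = list(c("Product1", "Product2", "Product3"),
                                    c("April", "May", "June")))

dim(All_price)                  # 3 3
All_price["Product3", "May"]    # 34
All_price[, "June"]             # all prices in June
colMeans(All_price)             # average price per month

# cbind() places each vector in its own column, giving the same values
all(cbind(April_price, May_price, June_price) == All_price)   # TRUE
```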

Suppose we want the opposite layout, where each row holds one month's prices and each column holds one product. The first row then contains the April prices of all three products, and the first column contains the first product's prices in April, May, and June. We can do this by adding byrow = TRUE to the code above, which tells matrix() to fill values row by row.

> All_price_1=matrix(c(April_price,May_price,June_price),nrow=3,byrow = TRUE)

Output:

> All_price_1

[,1] [,2] [,3]

[1,] 25 26 47

[2,] 30 35 34

[3,] 31 35 39
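Because these matrices are square, filling by row gives exactly the transpose of the default column-wise layout. The sketch below checks this with t(). It also shows that rbind(), which stacks vectors as rows, produces the same values as byrow = TRUE.

```r
April_price <- c(25, 26, 47)
May_price <- c(30, 35, 34)
June_price <- c(31, 35, 39)

All_price   <- matrix(c(April_price, May_price, June_price), nrow = 3)
All_price_1 <- matrix(c(April_price, May_price, June_price), nrow = 3,
                      byrow = TRUE)

# For a square matrix, row-wise filling equals the transposed default
identical(All_price_1, t(All_price))     # TRUE

# rbind() makes each vector a row (and uses the vector names as row names)
all(rbind(April_price, May_price, June_price) == All_price_1)   # TRUE
```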

Array: Arrays are similar to matrices but can have more than two dimensions. The matrices above hold the prices of three items for April, May, and June 2022. Suppose we also want to record the 2021 prices. In this situation, we want to store two 3x3 matrices together: the first for 2022 and the second for 2021. We can do this with array(). In array(data, dim = c(m, n, p)), m and n give the number of rows and columns of each matrix, and p gives the number of matrices.

Here we define six vectors, one for each of the three months in each of the two years, and then combine them into an array using array() and c().

> April_2022=c(22,36,58)

> May_2022= c(21,34,18)

> June_2022= c(25,36,37)

> April_2021= c(14,36,58)

> May_2021= c(22,31,41)

> June_2021= c(22,36,28)

Combine the above vectors into an array:

> Combined_Price=array(c(April_2022,May_2022,June_2022,April_2021,May_2021,June_2021),dim=c(3,3,2))

Output:

> Combined_Price

, , 1

[,1] [,2] [,3]

[1,] 22 21 25

[2,] 36 34 36

[3,] 58 18 37

, , 2

[,1] [,2] [,3]

[1,] 14 22 22

[2,] 36 31 36

[3,] 58 41 28
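An element of a three-dimensional array is addressed as [row, column, matrix]. The sketch rebuilds Combined_Price with the same numbers and adds dimnames so that each position can be referred to by name. The item labels Item1 to Item3 are hypothetical. Subtracting the two layers gives the change in each price from 2021 to 2022.

```r
# Same values as above, with names for rows, columns and layers
Combined_Price <- array(
  c(22, 36, 58, 21, 34, 18, 25, 36, 37,   # 2022: April, May, June
    14, 36, 58, 22, 31, 41, 22, 36, 28),  # 2021: April, May, June
  dim = c(3, 3, 2),
  dimnames = list(c("Item1", "Item2", "Item3"),
                  c("April", "May", "June"),
                  c("2022", "2021")))

dim(Combined_Price)                        # 3 3 2
Combined_Price[1, 3, 2]                    # 22: row 1, column 3, matrix 2
Combined_Price["Item1", "June", "2021"]    # the same element, by name
Combined_Price[, , "2022"]                 # the whole 2022 matrix

# Price change from 2021 to 2022 for every item and month
Combined_Price[, , "2022"] - Combined_Price[, , "2021"]
```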

Data frame: A data frame is similar to a matrix, but each column can hold a different type of element. For example, you can store a character column with the names of different food items alongside numeric columns with their monthly prices.

First, create a variable holding the names of three vegetables:

> goods=c("potato","tomato","onion")

Now combine the goods and their monthly prices (the April_price, May_price, and June_price vectors defined earlier) into a data frame with data.frame():

> Veg_price=data.frame(goods,April_price,May_price,June_price)

Output:

> Veg_price

goods April_price May_price June_price

1 potato 25 30 31

2 tomato 26 35 35

3 onion 47 34 39

To access a single column of a data frame, use either $ or [[ ]]. For example, to see the prices of all goods in May, we use:

> Veg_price$May_price

[1] 30 35 34

Or, equivalently:

> Veg_price[["May_price"]]

[1] 30 35 34

We can also access a single element by its position with [row, column]. For example, onion is in row 3 and May_price is column 3, so the price of onion in May is:

> Veg_price[3,3]

[1] 34
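You can also select rows with a logical condition and refer to columns by name instead of position. The sketch below rebuilds Veg_price with the same values. The Avg_price column is a hypothetical addition that shows how to create a computed column. In R 4.0 and later, data.frame() keeps text columns such as goods as character rather than converting them to factors.

```r
goods <- c("potato", "tomato", "onion")
Veg_price <- data.frame(goods,
                        April_price = c(25, 26, 47),
                        May_price = c(30, 35, 34),
                        June_price = c(31, 35, 39))

str(Veg_price)                                  # column names and types
Veg_price[3, "May_price"]                       # 34, same as Veg_price[3, 3]
Veg_price[Veg_price$goods == "onion", ]         # the complete onion row
Veg_price[Veg_price$June_price > 32, "goods"]   # "tomato" "onion"

# Add a computed column: the average price over the three months
Veg_price$Avg_price <- rowMeans(Veg_price[, c("April_price", "May_price",
                                              "June_price")])
```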

Lists: Lists can hold elements of different types and, unlike the columns of a data frame, elements of different lengths. Suppose the April and May price vectors (April_price1 and May_price1 below) have four elements each, while June_price has only three. Because every column of a data frame must have the same length, you cannot store all these values in a single data frame. You can use a list instead.

> April_price1=c(25,26,47,21)

> May_price1=c(30,35,34,35)

> All_list=list(goods,April_price1,May_price1,June_price)

Output:

> All_list

[[1]]

[1] "potato" "tomato" "onion"

[[2]]

[1] 25 26 47 21

[[3]]

[1] 30 35 34 35

[[4]]

[1] 31 35 39
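The sketch below shows why a list is needed here: data.frame() stops with an error when columns of length 4 and 3 are combined, while list() accepts them. Naming the list elements (goods, April, May, and June, chosen here for illustration) lets you access them with $ or [[ ]], just like data frame columns.

```r
goods <- c("potato", "tomato", "onion")
April_price1 <- c(25, 26, 47, 21)
May_price1 <- c(30, 35, 34, 35)
June_price <- c(31, 35, 39)

# data.frame() rejects columns whose lengths (4 and 3) do not match
res <- tryCatch(data.frame(April_price1, June_price),
                error = function(e) conditionMessage(e))
res                       # the error message about differing numbers of rows

# A named list accepts elements of any length
All_list <- list(goods = goods, April = April_price1,
                 May = May_price1, June = June_price)
lengths(All_list)         # 3 4 4 3
All_list$June             # 31 35 39
All_list[["April"]][4]    # 21
```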

Factor: Factors are data objects used to store categorical data as a set of levels. They can be created from character strings or integers. Factors are useful for columns that have a limited number of unique values. For example, in a dataset, a variable called gender may contain "Male", "Female", or "Transgender", and another variable may contain only "true" or "false".

For example, consider two vectors containing height and gender:

> Height=c(133,152,187,172,151,139)

> Gender=c("Female","Female","Male","Male","Female","Female")

Create a data frame from these vectors:

> data=data.frame(Height,Gender)

Output:

> data

Height Gender

1 133 Female

2 152 Female

3 187 Male

4 172 Male

5 151 Female

6 139 Female

Now convert the Gender column to a factor:

> factor_data=factor(data$Gender)

Output:

> factor_data

[1] Female Female Male Male Female Female

Levels: Female Male
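Internally, a factor stores each value as an integer code that points to one of its levels, which are sorted alphabetically by default. The sketch below:

- shows those codes;
- counts each level with table();
- fixes the levels explicitly, so that a category with no observations (Transgender here) still appears;
- uses tapply() to compute the mean height per gender.

```r
Height <- c(133, 152, 187, 172, 151, 139)
Gender <- c("Female", "Female", "Male", "Male", "Female", "Female")
factor_data <- factor(Gender)

levels(factor_data)       # "Female" "Male"
nlevels(factor_data)      # 2
as.integer(factor_data)   # 1 1 2 2 1 1, the internal codes
table(factor_data)        # Female 4, Male 2

# Declare the levels explicitly, including one with no observations
factor_full <- factor(Gender, levels = c("Female", "Male", "Transgender"))
table(factor_full)        # Transgender is listed with a count of 0

# Mean height for each level of the factor
tapply(Height, factor_data, mean)   # Female 143.75, Male 179.5
```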

This concludes the basics of data types and data structures in R. In the next piece, we will explain basic operations, functions, and loops in R.
