The data munging process in Python: An overview-0.1

Photo by Claudio Schwarz / Unsplash

In my previous article, we learned the first part of the data munging process: what a DataFrame is and how to create one. Refer to that article if you need a refresher. This article continues from there and gives a brief overview of the next part of the process, inspecting data.

Step 1 (continued): Inspect data

  • Checking DataFrame attributes:

Let's look at some ways to access the data in a DataFrame. We already covered a little of this in an earlier lesson, in the broad overview of pandas data structures, so let's go over it quickly. DataFrame attributes are useful when we want to fetch information about a particular DataFrame. The important ones are index, columns, dtypes, shape, size, and T (transpose), together with the methods info(), count(), and astype(). The table below summarizes them.

| Attribute | Description |
| --- | --- |
| DataFrame.index | Returns the index (row labels) of the DataFrame. |
| DataFrame.columns | Returns the column labels of the DataFrame. |
| DataFrame.shape | Returns a tuple (rows, columns) that represents the dimensionality of the DataFrame. |
| DataFrame.dtypes | Returns the dtypes in the DataFrame as a Series, with the column names as the index labels and the corresponding data types as the values. |
| DataFrame.info() | A method that prints a summary of the DataFrame: the index dtype, the column dtypes, the non-null counts, and the memory usage. |
| DataFrame.size | Returns the number of elements. For a Series this is the number of rows; for a DataFrame it is the number of rows multiplied by the number of columns. |
| DataFrame.count() | A method that counts the non-NA cells for each column or row. |
| DataFrame.astype() | A method that casts the data to a given dtype and returns the converted copy; the original is not changed in place. |
| DataFrame.T | Returns the transpose of the DataFrame (rows and columns swapped). |

Read more about attributes in the pandas DataFrame documentation.
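To make the attributes above concrete, here is a minimal sketch. The DataFrame, its column names, and its values are invented for illustration and do not come from the earlier articles.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame(
    {
        "name": ["Asha", "Ben", "Chen", "Dina"],
        "age": [31, 25, np.nan, 40],  # one missing value on purpose
        "visits": [3, 1, 7, 2],
    },
    index=["a", "b", "c", "d"],
)

print(df.index)    # row labels
print(df.columns)  # column labels
print(df.shape)    # (4, 3): 4 rows, 3 columns
print(df.size)     # 12: rows multiplied by columns
print(df.dtypes)   # a Series mapping each column name to its dtype

df.info()          # prints a summary; it does not return the data

non_null_counts = df.count()  # non-NA cells per column
print(non_null_counts)

# astype returns a converted copy; df itself keeps its original dtypes.
converted = df.astype({"visits": "float64"})
print(converted.dtypes)

transposed = df.T  # rows become columns and vice versa
print(transposed.shape)  # (3, 4)
```

Note that index, columns, shape, size, dtypes, and T are accessed without parentheses, while info, count, and astype are methods and need to be called.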

  • Checking for values and missing values:

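To check whether an item exists, use the `in` keyword. Keep in mind that `in` tests labels, not cell values: on a DataFrame it checks the column labels, and on a Series it checks the index labels. To test the values themselves, use something like isin().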
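The difference between checking labels and checking values is easy to miss, so here is a small sketch. The sample data and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame(
    {"name": ["Asha", "Ben"], "age": [31, 25]},
    index=["a", "b"],
)

# `in` on a DataFrame checks the column labels.
has_age_column = "age" in df

# `in` on the index checks the row labels.
has_row_a = "a" in df.index

# `in` on a Series checks its index labels, not its values,
# so this is False even though "Ben" is stored in the column.
ben_in_labels = "Ben" in df["name"]

# To test the values, use isin() or convert to a NumPy array first.
ben_in_values = df["name"].isin(["Ben"]).any()
ben_in_array = "Ben" in df["name"].to_numpy()

print(has_age_column, has_row_a, ben_in_labels, ben_in_values, ben_in_array)
```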

To detect missing data: missing data in pandas appears as NaN (Not a Number). To find it, use the isnull() and notnull() methods, which return a Boolean mask of the same shape as the data. isna() and notna() are aliases for the same methods.
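A short sketch of how isnull() and notnull() might be used in practice follows. The sample data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with two missing cells.
df = pd.DataFrame(
    {
        "name": ["Asha", "Ben", None],
        "age": [31, np.nan, 40],
    }
)

missing_mask = df.isnull()   # True where a cell is missing
present_mask = df.notnull()  # the exact opposite mask

# Summing a Boolean mask counts the True values per column.
missing_per_column = df.isnull().sum()

# Is anything missing anywhere in the DataFrame?
any_missing = df.isnull().values.any()

# Keep only the rows that contain at least one missing cell.
rows_with_missing = df[df.isnull().any(axis=1)]

print(missing_mask)
print(missing_per_column)
print(any_missing)
print(rows_with_missing)
```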

In the next lesson, we'll talk about pandas descriptive statistics on numerical and categorical data, and then move on to step two: data cleaning and manipulation.