Data Science

The data munging process in Python: An overview-0.1

Satyabrata Sahoo

Mar 2, 2022 — 3 min read

In my previous article, we covered the first part of the data munging process: what a DataFrame is and how to create one. Refer to that article for details. This article gives a brief overview of the next part of the munging process.

Step 1 (continued): Inspect data

Checking DataFrame attributes:

Let's look at some ways to access the data in a DataFrame. We already did a little of this in an earlier lesson, during the broad overview of Pandas data structures, so let's go over it quickly. DataFrame attributes are helpful when we want to fetch information related to a particular DataFrame. The important ones are index, columns, dtypes, shape, size and T (transpose), along with the closely related methods info(), count() and astype(). The table below summarizes them.

Attribute / method	Description
DataFrame.index	The index (row labels) of the DataFrame.
DataFrame.columns	The column labels of the DataFrame.
DataFrame.shape	A tuple (rows, columns) that represents the dimensionality of the DataFrame.
DataFrame.dtypes	The dtypes in the DataFrame, returned as a Series with the column names as the index labels and the corresponding data types as the values.
DataFrame.info()	A method that prints a summary of the DataFrame: index dtype, columns, non-null counts, and memory usage.
DataFrame.size	The number of elements. For a Series, this is the number of rows; for a DataFrame, it is the number of rows multiplied by the number of columns.
DataFrame.count()	A method that counts the non-NA cells for each column or row.
DataFrame.astype()	A method that returns a copy of the DataFrame cast to the given data type(s); the original is left unchanged.
DataFrame.T	The transpose of the DataFrame (rows and columns swapped).
More attributes are listed in the Pandas DataFrame documentation.
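As a rough illustration of the attributes and methods in the table, here is a minimal sketch. The sample data (names, ages, cities) is invented purely for this example and is not from the original article.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only.
df = pd.DataFrame(
    {
        "name": ["Asha", "Ben", "Chen", "Dana"],
        "age": [31, 25, 42, 37],
        "city": ["Pune", "Oslo", None, "Lima"],  # one missing value
    }
)

print(df.index)    # row labels
print(df.columns)  # column labels
print(df.shape)    # (4, 3): 4 rows, 3 columns
print(df.dtypes)   # Series mapping each column name to its dtype
print(df.size)     # 12: rows multiplied by columns

df.info()          # prints a summary; it does not return the summary

non_null_counts = df.count()                   # non-NA cells per column
as_float = df.astype({"age": "float64"})       # returns a new, cast DataFrame
transposed = df.T                              # rows and columns swapped
print(non_null_counts)
print(transposed.shape)  # (3, 4)
```

Note that index, columns, shape, dtypes, size and T are accessed without parentheses, while info(), count() and astype() are called as methods.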

Checking for values and missing values:

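To check whether an item exists, use the 'in' keyword. Keep in mind that on a DataFrame 'in' tests the column labels, and on a Series it tests the index labels rather than the values.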
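The behaviour of 'in' can be surprising, so here is a small sketch of the different membership checks. The DataFrame and its contents are hypothetical, and using to_numpy() or isin() to test the values is one common approach, not necessarily the one the author had in mind.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only.
df = pd.DataFrame({"name": ["Asha", "Ben"], "age": [31, 25]})

# On a DataFrame, 'in' checks the column labels.
has_age_column = "age" in df

# On a Series, 'in' checks the index labels (0 and 1 here), not the values.
ben_in_labels = "Ben" in df["name"]

# To test the values themselves, check against the underlying array ...
ben_in_values = "Ben" in df["name"].to_numpy()

# ... or use isin(), which returns a boolean Series.
ben_via_isin = bool(df["name"].isin(["Ben"]).any())

print(has_age_column, ben_in_labels, ben_in_values, ben_via_isin)
```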

To detect missing data: missing data in Pandas appears as NaN (Not a Number), and to detect it we use the isnull() and notnull() functions.
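A minimal sketch of detecting missing values follows. The data is hypothetical, and the sum() and any() steps are additions that show how the boolean masks are typically summarized. In Pandas, isna() and notna() are equivalent names for isnull() and notnull().

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value per column.
df = pd.DataFrame(
    {
        "age": [31, np.nan, 42],
        "city": ["Pune", "Oslo", None],
    }
)

missing_mask = df.isnull()    # True where a value is missing
present_mask = df.notnull()   # True where a value is present

# Summing a boolean mask counts the True entries per column.
missing_per_column = missing_mask.sum()

# Single yes/no answer: is anything missing anywhere in the DataFrame?
any_missing = bool(missing_mask.to_numpy().any())

print(missing_mask)
print(missing_per_column)
print(any_missing)
```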

In the next lesson, we'll talk about Pandas descriptive statistics on numerical and categorical data, and then move on to step two: cleaning and data manipulation.
