The data munging process in Python: An overview-0.1
In my previous article, we learned the first part of the data munging process: what a DataFrame is and how to create one. Refer to that article for details. This article gives a brief overview of the next part of the munging process.
Step 1 (continued): Inspect data
Checking DataFrame attributes:
Let's look at some ways to access the data in a DataFrame. We already did a little of this in an earlier lesson, during the broad overview of Pandas data structures, so let's go over it quickly. DataFrame attributes are helpful when we want to fetch information about a particular DataFrame. The important ones are index, columns, dtypes, shape, size, and T (transpose), along with the methods info(), count(), and astype(). The following table summarizes them.
| Attribute | Description |
| --- | --- |
| DataFrame.index | Returns the index (row labels) of the DataFrame. |
| DataFrame.columns | Returns the column labels of the DataFrame. |
| DataFrame.shape | Returns a tuple that represents the dimensionality of the DataFrame: (rows, columns). |
| DataFrame.dtypes | Returns the dtypes in the DataFrame as a Series, with the column names as the index labels and the corresponding data types as the values. |
| DataFrame.info() | A method that prints a summary of the DataFrame, including the index dtype, column dtypes, non-null counts, and memory usage. |
| DataFrame.size | For a Series, returns the number of elements. For a DataFrame, returns the number of rows multiplied by the number of columns. |
| DataFrame.count() | A method that counts non-NA cells for each column or row. |
| DataFrame.astype() | A method that casts the data to a specified data type and returns the converted object. |
| DataFrame.T | Transposes the DataFrame, swapping rows and columns. |

Read more attributes in the Pandas DataFrame documentation.
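To make the attributes above concrete, here is a minimal sketch that reads each of them from a small DataFrame. The sample data and the variable names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "name": ["Asha", "Ben", "Chen"],
        "age": [28.0, 35.0, float("nan")],  # one missing value
        "city": ["Pune", "Leeds", "Oslo"],
        "visits": [3, 5, 2],
    },
    index=["a", "b", "c"],
)

index_labels = list(df.index)       # row labels
column_labels = list(df.columns)    # column labels
shape = df.shape                    # tuple of (rows, columns)
size = df.size                      # rows multiplied by columns
dtypes = df.dtypes                  # Series: column name -> data type
non_null_counts = df.count()        # non-NA cells per column
visits_as_float = df["visits"].astype("float64")  # new Series; df is unchanged
transposed = df.T                   # rows and columns swapped

print(index_labels, column_labels, shape, size)
print(dtypes)
print(non_null_counts)

# info() is a method: it prints a summary and returns None.
df.info()
```

Note that astype() does not modify the DataFrame in place; assign the result back if the converted column should be kept.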
Checking for values and missing values:
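To check whether an item exists, use the 'in' keyword. Keep in mind that 'in' tests labels, not values: on a DataFrame it checks the column labels, and on a Series it checks the index labels. To test the values themselves, check against the underlying values or use isin().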
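The membership check can be sketched as follows. The DataFrame and variable names are hypothetical; the point is to show what 'in' actually tests on a DataFrame, an index, and a Series.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["Asha", "Ben"], "age": [28, 35]}, index=["a", "b"])

has_age_column = "age" in df                 # checks column labels
has_row_a = "a" in df.index                  # checks row labels
name_in_labels = "Ben" in df["name"]         # checks the Series index, not its values
name_in_values = "Ben" in df["name"].values  # checks the underlying values
name_via_isin = bool(df["name"].isin(["Ben"]).any())  # value check with isin()

print(has_age_column, has_row_a, name_in_labels, name_in_values, name_via_isin)
```

Because 'in' on a Series looks at the index, "Ben" is not found there even though it is one of the values; the last two lines show ways to test the values instead.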
To detect missing data: missing data in Pandas appears as NaN (Not a Number). To detect it, use the isnull() and notnull() functions.
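As a minimal sketch of missing-value detection, the example below applies isnull() and notnull() to a small DataFrame. The sample data and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with one missing value in each column.
df = pd.DataFrame(
    {
        "name": ["Asha", "Ben", None],
        "age": [28.0, float("nan"), 41.0],
    }
)

missing_mask = df.isnull()               # True where a value is missing
present_mask = df.notnull()              # True where a value is present
missing_per_column = df.isnull().sum()   # number of missing values per column
any_missing = bool(df.isnull().any().any())  # is anything missing at all?

print(missing_mask)
print(missing_per_column)
print(any_missing)
```

Both functions return a boolean object of the same shape as the input, so the result can be summed per column or used to filter rows.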
All right, in the next lesson we'll talk about Pandas descriptive statistics on numerical and categorical data, and then move on to step two: cleaning and data manipulation.