Cleaning Empty Cells

Empty Cells

Empty (missing) cells can lead to inaccurate results when you analyze data.

Remove Rows

One way to handle empty cells is to remove the rows that contain them. This is usually acceptable when the dataset is large and only a few rows are affected, because removing those rows then has little impact on the result.
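Before dropping rows, it can help to check how many rows would actually be removed. The following is a minimal sketch that uses a small hypothetical DataFrame in place of data.csv. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv; np.nan marks an empty cell
df = pd.DataFrame({
    "Duration": [60, 60, 45, 45, 60],
    "Calories": [409.1, np.nan, 340.0, 282.4, np.nan],
})

# True for every row that has at least one empty cell
has_empty = df.isna().any(axis=1)

rows_affected = int(has_empty.sum())
share = rows_affected / len(df)
print(f"{rows_affected} of {len(df)} rows ({share:.0%}) contain empty cells")
```

If the share is small, dropping those rows is usually harmless. If it is large, replacing the empty values may be the better choice.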

Example

Generate a new DataFrame excluding rows with empty cells:

import pandas as pd

df = pd.read_csv('data.csv')

new_df = df.dropna()

print(new_df.to_string())
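To try dropna() without data.csv, you can run the same call on a small in-memory DataFrame. This sketch uses hypothetical column names and values:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv; None and np.nan are both treated as empty
df = pd.DataFrame({
    "Duration": [60, 60, 45, 45],
    "Pulse": [110, 117, None, 109],
    "Calories": [409.1, np.nan, 282.4, 479.0],
})

new_df = df.dropna()

# Rows 1 and 2 each contain an empty cell, so only rows 0 and 3 remain
print(new_df.to_string())

# The original DataFrame still has all 4 rows
print(len(df), len(new_df))
```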

Note: By default, the dropna() method creates a new DataFrame without modifying the original.

To modify the original DataFrame, use the inplace=True argument.

Example

Remove all rows with NULL values from the original DataFrame:

import pandas as pd

df = pd.read_csv('data.csv')

df.dropna(inplace=True)

print(df.to_string())

Note: With dropna(inplace=True), no new DataFrame is returned; instead, rows with NULL values are removed from the original DataFrame.
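The difference between the two calls is visible in what they return. Here is a small sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Calories": [409.1, np.nan, 340.0]})

# Default: a new DataFrame is returned and df is left unchanged
cleaned = df.dropna()
print(len(cleaned), len(df))  # 2 3

# inplace=True: df itself is modified and the method returns None
result = df.dropna(inplace=True)
print(result, len(df))  # None 2
```

A common mistake is writing `df = df.dropna(inplace=True)`. This replaces `df` with `None`.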

Replace Empty Values

Another way to handle empty cells is to replace them with a new value. This way, you do not have to delete entire rows just because of a few empty cells. The fillna() method replaces empty cells with a value you specify.

Example

Replace NULL values with the value 130:

import pandas as pd

df = pd.read_csv('data.csv')

df.fillna(130, inplace=True)
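The following self-contained sketch shows which cells fillna() fills. It uses a small hypothetical DataFrame, and the column names and values are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical data with empty cells in two columns
df = pd.DataFrame({
    "Pulse": [110, np.nan, 103],
    "Calories": [409.1, np.nan, np.nan],
})

df.fillna(130, inplace=True)

# Every empty cell, in every column, now holds 130
print(df.to_string())
print(int(df.isna().sum().sum()))  # 0
```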

Replace Only For Specified Columns

The example above replaces empty cells throughout the entire DataFrame. To replace empty values in only one column, select that column by its name.

Example

Replace NULL values in the “Calories” column with the value 130:

import pandas as pd

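df = pd.read_csv('data.csv')

# Assign the filled column back. Calling fillna(..., inplace=True) on df["Calories"]
# acts on a chained selection and may not update df in recent pandas versions.
df["Calories"] = df["Calories"].fillna(130)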
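There are two equivalent ways to limit the replacement to one column. You can assign the filled column back, or you can pass fillna() a dictionary that maps column names to fill values. This sketch uses hypothetical data. In both cases, the other columns keep their empty cells.

```python
import numpy as np
import pandas as pd

# Hypothetical data with empty cells in two columns
df = pd.DataFrame({
    "Pulse": [110, np.nan, 103],
    "Calories": [409.1, np.nan, np.nan],
})

# Option 1: fill one column and assign the result back
df1 = df.copy()
df1["Calories"] = df1["Calories"].fillna(130)

# Option 2: a dict limits fillna() to the listed columns
df2 = df.fillna({"Calories": 130})

# In both results, "Pulse" still has its empty cell
print(df1.to_string())
print(df1.equals(df2))  # True
```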