Keep columns

Tags: #pandas #snippet #datacleaning #operations
Author: Florent Ravenel
Last update: 2023-06-03 (Created: 2023-06-03)
Description: This notebook shows how to create a new DataFrame that keeps only the columns listed in the Input section.
References:

Input

Import libraries

import pandas as pd

Setup Variables

  • to_keep: list of columns to keep in dataframe
to_keep = ["team", "points", "blocks"]

Model

Create DataFrame

# create DataFrame
df = pd.DataFrame(
    {
        "team": ["A", "A", "A", "B", "B", "B"],
        "points": [11, 7, 8, 10, 13, 13],
        "assists": [5, 7, 7, 9, 12, 9],
        "rebounds": [11, 8, 10, 6, 6, 5],
    }
)
df

Create new DataFrame that only keeps defined columns

Selecting a column that does not exist in the DataFrame raises a KeyError, so the function below removes missing names from the list before selecting.
def keep_columns(df, to_keep):
    # Check if all columns exist in dataframe
    # Iterate over a copy so that removing an item does not skip the next one
    for x in list(to_keep):
        if x not in df.columns:
            print(
                f"🚨 Column '{x}' does not exist in DataFrame -> removed from your list!"
            )
            to_keep.remove(x)
    df1 = df[to_keep]
    return df1
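
The function above edits the list it receives. If the caller's list should stay unchanged, the same filtering can be sketched with a list comprehension. The name `keep_existing_columns` and the small `demo` DataFrame are hypothetical and used only for illustration; they are not part of the original notebook.

```python
import pandas as pd


def keep_existing_columns(df, to_keep):
    """Hypothetical variant: return the selected columns and leave `to_keep` untouched."""
    # Split the requested names into those present and those missing
    existing = [col for col in to_keep if col in df.columns]
    missing = [col for col in to_keep if col not in df.columns]
    if missing:
        print(f"Columns not found and skipped: {missing}")
    # Select only the columns that exist, in the requested order
    return df[existing]


# Illustrative data (assumed, not from the notebook)
demo = pd.DataFrame({"team": ["A", "B"], "points": [11, 10], "assists": [5, 9]})
requested = ["team", "blocks", "steals", "points"]

result = keep_existing_columns(demo, requested)
print(list(result.columns))  # ['team', 'points']
print(requested)             # ['team', 'blocks', 'steals', 'points']
```

Two consecutive missing names ("blocks", "steals") are included on purpose: this is the case where removing items from a list while iterating over it by index would skip an element.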

Output

Display new DataFrame

df1 = keep_columns(df, to_keep)
df1
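
For comparison, pandas also provides `DataFrame.filter`, whose `items` argument keeps only the labels that are present and ignores the others without printing a warning. The sketch below rebuilds the notebook's DataFrame so that it runs on its own; the variable name `df_filtered` is an assumption for this example.

```python
import pandas as pd

# Same data as in the Model section
df = pd.DataFrame(
    {
        "team": ["A", "A", "A", "B", "B", "B"],
        "points": [11, 7, 8, 10, 13, 13],
        "assists": [5, 7, 7, 9, 12, 9],
        "rebounds": [11, 8, 10, 6, 6, 5],
    }
)
to_keep = ["team", "points", "blocks"]

# "blocks" is not a column of df, so filter() leaves it out
df_filtered = df.filter(items=to_keep)
print(list(df_filtered.columns))  # ['team', 'points']
```

This is shorter, but it gives no feedback about which names were missing, so the explicit function may be preferable when that message matters.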