Groupby and Aggregate
Tags: #pandas #snippet #datamining #dataaggregation #datacleaning #operations
Description: This notebook groups the rows of a DataFrame by one or more columns and performs aggregations on the other columns.
References:
- https://towardsdatascience.com/5-pandas-group-by-tricks-you-should-know-in-python-f53246c92c94
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
- https://www.w3resource.com/pandas/dataframe/dataframe-agg.php
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html
import pandas as pd
# create DataFrame
df = pd.DataFrame(
{
"team": ["A", "A", "A", "B", "B", "B"],
"points": [11, 7, 8, 10, 21, 13],
"assists": [5, 7, 7, 9, 12, 0],
"rebounds": [5, 8, 10, 6, 6, 22],
}
)
df
# list of columns to group
to_group = ["team"]
The `func` argument of `agg()` accepts:
- a function
- a string function name, e.g. "sum", "count", "mean"
- a list of functions and/or function names, e.g. [np.sum, 'mean']
- a dict mapping axis labels (column names) to functions, function names, or lists of such
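A minimal sketch of the first three forms on the same sample data, assuming pandas 1.x or 2.x. The lambda (max minus min per group) is an illustrative choice, not part of the original notebook. String names are used in the list form because recent pandas versions (2.1+) emit a FutureWarning when NumPy callables such as `np.sum` are passed to groupby aggregations.

```python
import pandas as pd

# Same sample data as in the notebook
df = pd.DataFrame(
    {
        "team": ["A", "A", "A", "B", "B", "B"],
        "points": [11, 7, 8, 10, 21, 13],
        "assists": [5, 7, 7, 9, 12, 0],
        "rebounds": [5, 8, 10, 6, 6, 22],
    }
)
grouped = df.groupby("team")

# 1. A function: applied to each column within each group (here, the range)
by_func = grouped.agg(lambda s: s.max() - s.min())

# 2. A string function name: one value per column per group
by_name = grouped.agg("sum")

# 3. A list of names: columns become a MultiIndex of (column, function)
by_list = grouped.agg(["sum", "mean"])

print(by_list)
```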
Here, we use a dict that maps column labels to function names.
# dict of columns to aggregate
to_agg = {
"points": "sum",
"assists": "sum",
"rebounds": "sum",
}
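The dict form also accepts a list of names per column. A minimal sketch, where `to_agg_multi` is a hypothetical variant of `to_agg` (not from the original notebook): when any dict value is a list, pandas produces MultiIndex columns of (column, function) for every entry, which are then flattened here into `column_function` names, an illustrative naming convention.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "team": ["A", "A", "A", "B", "B", "B"],
        "points": [11, 7, 8, 10, 21, 13],
        "assists": [5, 7, 7, 9, 12, 0],
        "rebounds": [5, 8, 10, 6, 6, 22],
    }
)

# Hypothetical variant of to_agg: a list of names for one column, a single name for another
to_agg_multi = {
    "points": ["sum", "max"],
    "assists": "mean",
}
result = df.groupby("team").agg(to_agg_multi)

# Flatten the (column, function) MultiIndex into single-level names
result.columns = ["_".join(col) for col in result.columns]
print(result)
```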
- Calling `.groupby()` on a DataFrame with one or more key columns (they do not have to be categorical) returns a GroupBy object (`DataFrameGroupBy`). You can then call methods on this object, for example to aggregate the other columns into a per-group summary of the dataset.
- The `agg()` method applies a function, a function name, a list of these, or a dict of them. On a plain DataFrame it works along one of the axes, by default axis 0 (the index, or row, axis), which aggregates each column over its rows. On a GroupBy object it aggregates each column within each group.
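A short sketch of what the GroupBy object offers before any aggregation, using the notebook's sample data. Grouping by the single string `"team"` (rather than the list `["team"]`) is a choice made here so that iteration yields plain labels instead of one-element tuples in recent pandas.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "team": ["A", "A", "A", "B", "B", "B"],
        "points": [11, 7, 8, 10, 21, 13],
        "assists": [5, 7, 7, 9, 12, 0],
        "rebounds": [5, 8, 10, 6, 6, 22],
    }
)

grouped = df.groupby("team")
kind = type(grouped).__name__  # name of the GroupBy class

# Iterating yields (group label, sub-DataFrame) pairs
labels = [label for label, _ in grouped]

# Methods on the GroupBy object return one entry per group
sizes = grouped.size()                 # number of rows in each group
max_points = grouped["points"].max()   # aggregate a single selected column
print(kind, labels, sizes.to_dict(), max_points.to_dict())
```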
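To make the axis behaviour of `DataFrame.agg` concrete, here is a small sketch on a hypothetical three-row frame (team A's rows from the sample data): axis 0 yields one value per column, axis 1 one value per row.

```python
import pandas as pd

# Hypothetical subset: team A's points and assists
stats = pd.DataFrame({"points": [11, 7, 8], "assists": [5, 7, 7]})

# axis=0 (default): aggregate down the rows, one result per column
col_sums = stats.agg("sum")

# axis=1: aggregate across the columns, one result per row
row_sums = stats.agg("sum", axis=1)

# A list of names returns a DataFrame with one row per function
summary = stats.agg(["sum", "max"])
print(col_sums, row_sums, summary, sep="\n\n")
```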
# as_index=False keeps the group labels as a regular column instead of the index,
# i.e. “SQL-style” grouped output. It only applies to DataFrame input.
df = df.groupby(by=to_group, as_index=False).agg(to_agg)
df
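For comparison, a sketch of the same aggregation with the default `as_index=True`, where `team` becomes the index of the result. Calling `reset_index()` on that result should give the same table as `as_index=False`; the variable names `indexed` and `flat` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "team": ["A", "A", "A", "B", "B", "B"],
        "points": [11, 7, 8, 10, 21, 13],
        "assists": [5, 7, 7, 9, 12, 0],
        "rebounds": [5, 8, 10, 6, 6, 22],
    }
)
to_agg = {"points": "sum", "assists": "sum", "rebounds": "sum"}

# Default (as_index=True): group labels become the row index
indexed = df.groupby(by=["team"]).agg(to_agg)

# as_index=False: group labels stay a regular column, with a fresh 0..n-1 index
flat = df.groupby(by=["team"], as_index=False).agg(to_agg)

print(indexed, flat, sep="\n\n")
```

Keeping `as_index=False` is convenient when the result is merged or exported afterwards, since `team` is then an ordinary column.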