Creating a DataFrame
```python
import pandas as pd
```
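The import on its own does not build anything yet. As a minimal sketch, a DataFrame can be constructed from a dictionary of columns; the column names and values below are made-up sample data, chosen only so that the later `"Year":"Party"` slice has something to act on.

```python
import pandas as pd

# Hypothetical sample data: each dictionary key becomes a column.
df = pd.DataFrame({
    "Year": [2008, 2012, 2016],
    "Candidate": ["A", "B", "C"],
    "Party": ["X", "Y", "X"],
})

print(df.shape)  # (number of rows, number of columns)
```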
Location
```python
df.loc[0:4, "Year":"Party"]
```
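`loc` selects by label, and unlike ordinary Python slicing, a label slice includes both endpoints. The sketch below illustrates this on hypothetical sample data (the `Candidate` and `Votes` columns are assumptions added for the illustration) and contrasts it with the position-based `iloc`.

```python
import pandas as pd

# Hypothetical sample data with a default integer index (0..5).
df = pd.DataFrame({
    "Year": [2000, 2004, 2008, 2012, 2016, 2020],
    "Candidate": list("ABCDEF"),
    "Party": list("XYXYXY"),
    "Votes": [10, 20, 30, 40, 50, 60],
})

# Label-based: rows labelled 0 through 4 and columns Year through Party,
# with both ends of each slice included.
by_label = df.loc[0:4, "Year":"Party"]

# Position-based: the end position is excluded, as in normal Python slicing.
by_position = df.iloc[0:4, 0:3]

print(by_label.shape, by_position.shape)
```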
Type conversion
```python
df['column_name'] = df['column_name'].astype(int)
```
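`astype(int)` raises an error when the column contains missing values or strings that cannot be parsed as integers. One possible way around this, sketched here on placeholder data, is to coerce with `pd.to_numeric` first and then convert to the nullable `Int64` dtype.

```python
import pandas as pd

# Placeholder data: two valid numbers, one bad string, one missing value.
df = pd.DataFrame({"column_name": ["1", "2", "oops", None]})

# errors="coerce" turns unparseable entries into missing values
# instead of raising.
numeric = pd.to_numeric(df["column_name"], errors="coerce")

# The nullable "Int64" dtype can hold integers alongside missing values.
df["column_name"] = numeric.astype("Int64")

print(df["column_name"].dtype)
```

Whether missing values should be kept, dropped, or filled depends on the data; this sketch simply keeps them.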
Groupby and agg
```python
df.groupby('column_name').agg(['sum', 'mean', 'max', 'min'])
```
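Applying a list of functions to every remaining column can raise an error in recent pandas versions if some of those columns are not numeric, so it may help to select the columns to aggregate first. The sketch below uses made-up data (the `value` column is an assumption) and also shows named aggregation, which gives the output columns explicit names.

```python
import pandas as pd

# Made-up data: one grouping column and one numeric column.
df = pd.DataFrame({
    "column_name": ["a", "a", "b"],
    "value": [1, 3, 10],
})

# Select the numeric column before aggregating.
stats = df.groupby("column_name")["value"].agg(["sum", "mean", "max", "min"])

# Named aggregation: output_name=(input_column, function).
named = df.groupby("column_name").agg(
    total=("value", "sum"),
    average=("value", "mean"),
)

print(stats)
print(named)
```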
Example
Example 1
Extracting the top 20 categories from a DataFrame using groupby:

1. Use the `groupby` and `size` methods to calculate the size of each group:
   ```python
   group_sizes = df.groupby('category').size()
   ```
2. Use the `sort_values` method to sort the group sizes in descending order:
   ```python
   sorted_groups = group_sizes.sort_values(ascending=False)
   ```
3. Use the `head` method to extract the top 20 categories:
   ```python
   top_20 = sorted_groups.head(20)
   ```
4. Use the `isin` method to filter the original DataFrame to only include rows with the top 20 categories:
   ```python
   df_filtered = df[df['category'].isin(top_20.index)]
   ```
This finds the 20 most frequent values in the `category` column and creates a new DataFrame that contains only the rows belonging to those categories.
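The whole procedure can be sketched end to end on a small made-up DataFrame. The sample data and the variable `n` are assumptions for illustration (`n = 2` here so the result is easy to check by eye; the text uses 20). The last line shows `value_counts`, which already sorts by frequency in descending order and may serve as a shorter alternative to `groupby` plus `size` plus `sort_values`.

```python
import pandas as pd

# Made-up sample data: "a" appears 3 times, "b" twice, "c" and "d" once.
df = pd.DataFrame({"category": ["a", "b", "a", "c", "a", "b", "d"]})
n = 2  # the walkthrough above uses 20

# Count rows per category, sort descending, keep the n largest groups.
top_n = (
    df.groupby("category")
      .size()
      .sort_values(ascending=False)
      .head(n)
)

# Keep only the rows whose category is among the top n.
df_filtered = df[df["category"].isin(top_n.index)]

# Alternative: value_counts sorts by frequency (descending) by default.
top_n_alt = df["category"].value_counts().head(n)

print(top_n)
print(len(df_filtered))
```

When several categories tie at the cutoff, which of them `head(n)` keeps depends on the sort order, so ties may need separate handling.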