Creating a DataFrame

```python
import pandas as pd

df = pd.read_csv("a.csv")    # from a CSV file
df = pd.DataFrame(dict_t)    # from a dict mapping column names to column values
df = Series_t.to_frame()     # from a Series
```
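As a quick illustration of the three constructors above, here is a minimal sketch. The names `dict_t` and `series_t` and their contents are made-up sample data, and the CSV is read from an in-memory string (an assumption made so the example runs without a file on disk).

```python
import io
import pandas as pd

# Hypothetical sample data, used only for illustration.
dict_t = {"Year": [2012, 2016, 2020], "Party": ["A", "B", "A"]}
series_t = pd.Series([1.5, 2.5, 3.5], name="score")

# Dict of lists -> one column per key.
df_from_dict = pd.DataFrame(dict_t)

# Series -> single-column DataFrame; the Series name becomes the column name.
df_from_series = series_t.to_frame()

# read_csv accepts a file path or any file-like object.
csv_text = "Year,Party\n2012,A\n2016,B\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_dict.shape)
print(list(df_from_series.columns))
print(df_from_csv.dtypes)
```

Passing the dict itself, rather than wrapping it in braces, matters: `{dict_t}` would be a set literal containing a dict, which Python rejects because dicts are not hashable.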
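Location

```python
df.loc[0:4, "Year":"Party"]                   # rows labeled 0 to 4 (inclusive), columns "Year" to "Party"
df.loc[[1, 2, 5], ["Year", "Candidate"]]      # selected rows and columns
df.loc[:, ["Year", "Candidate"]]              # all rows, two columns
df.loc[:, "Year"]                             # all rows, one column (returns a Series)
df.loc[1, "Year":"Party"]                     # one row, a range of columns (returns a Series)
df.shape[0]                                   # number of rows
df[["Year", "Candidate"]][0:5]                # two columns, first five rows by position
```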
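The difference between label-based and positional slicing is easy to miss, so here is a small sketch on made-up data. The column values are hypothetical, and the frame is assumed to have the default integer index.

```python
import pandas as pd

# Hypothetical sample data with a default integer index 0..5.
df = pd.DataFrame({
    "Year": [2000, 2004, 2008, 2012, 2016, 2020],
    "Candidate": ["a", "b", "c", "d", "e", "f"],
    "Party": ["X", "Y", "X", "Y", "X", "Y"],
})

# .loc slices by label and includes both endpoints: labels 0, 1, 2, 3, 4.
by_label = df.loc[0:4, "Year":"Party"]

# Plain [] slicing is positional and excludes the stop: positions 0..4.
by_position = df[["Year", "Candidate"]][0:5]

# .iloc is the explicit positional counterpart of .loc.
by_iloc = df.iloc[0:5, 0:2]

# A single column label returns a Series, and so does a single row label.
year_column = df.loc[:, "Year"]
row_one = df.loc[1, "Year":"Party"]

print(by_label.shape, by_position.shape)
```

With the default index, `df.loc[0:4]` and `df[0:5]` happen to select the same five rows. Once the index is not 0, 1, 2, ... (for example after filtering or sorting), the two no longer agree.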
```python
df['column_name'] = df['column_name'].astype(int)
df['column_name'] = df['column_name'].astype(float)
df['column_name'] = df['column_name'].astype(str)
```
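A short sketch of these conversions on made-up data follows. The column contents are hypothetical. The `pd.to_numeric` call at the end is not part of the snippet above; it is shown only as one possible way to handle values that `astype` cannot parse.

```python
import pandas as pd

# Hypothetical column of numeric strings.
df = pd.DataFrame({"column_name": ["1", "2", "3"]})

as_int = df["column_name"].astype(int)
as_float = df["column_name"].astype(float)
as_str = as_int.astype(str)

# astype raises a ValueError on values it cannot parse.
# pd.to_numeric with errors="coerce" turns such values into NaN instead.
messy = pd.Series(["1", "2", "n/a"])
coerced = pd.to_numeric(messy, errors="coerce")

print(as_int.tolist(), as_float.tolist(), as_str.tolist())
print(coerced.isna().tolist())
```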
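Groupby and agg

```python
# Apply several aggregations to every remaining column (works when those columns are numeric)
df.groupby('column_name').agg(['sum', 'mean', 'max', 'min'])
# Group by two columns and choose the aggregations per column
df.groupby(['column_name_1', 'column_name_2']).agg({'column_name_3': ['sum', 'mean'], 'column_name_4': ['max', 'min']})
# Group by two columns and apply the same aggregations to every remaining column
df.groupby(['column_name_1', 'column_name_2']).agg(['sum', 'mean', 'max', 'min'])
```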
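To show what the result of such a call looks like, here is a minimal sketch of the per-column dictionary form. The data values are made up; the column names follow the placeholders used above.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "column_name_1": ["a", "a", "b", "b"],
    "column_name_2": ["x", "x", "y", "y"],
    "column_name_3": [1, 2, 3, 4],
    "column_name_4": [10, 20, 30, 40],
})

# Different aggregations per column, passed as a dict.
result = df.groupby(["column_name_1", "column_name_2"]).agg(
    {"column_name_3": ["sum", "mean"], "column_name_4": ["max", "min"]}
)

# The columns form a two-level MultiIndex of (column, aggregation) pairs.
print(result.columns.tolist())

# A single cell is addressed with a row-label tuple and a column tuple.
total = result.loc[("a", "x"), ("column_name_3", "sum")]
print(total)
```

Because the group keys become the row index and the aggregations become a second column level, later code usually needs tuple-based access like the last line, or a call to `reset_index()` to turn the keys back into ordinary columns.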
Example

Example 1: Extracting the top 20 categories from a DataFrame using `groupby`

1. Use the `groupby` and `size` methods to calculate the size of each group:

   ```python
   group_sizes = df.groupby('category').size()
   ```

2. Use the `sort_values` method to sort the group sizes in descending order:

   ```python
   sorted_groups = group_sizes.sort_values(ascending=False)
   ```

3. Use the `head` method to extract the top 20 categories:

   ```python
   top_20 = sorted_groups.head(20)
   ```

4. Use the `isin` method to filter the original DataFrame to only include rows with the top 20 categories:

   ```python
   df_filtered = df[df['category'].isin(top_20.index)]
   ```
The result, `df_filtered`, is a new DataFrame that contains only the rows whose `category` value is among the 20 most frequent categories in the original DataFrame.
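The steps above can be put together and tried on a small frame. This is only a sketch: the data is made up, and `top_n` is a hypothetical parameter set to 2 so that the effect is visible on six rows (the example above uses 20). The last line shows `value_counts`, which may be a shorter route to the same counts.

```python
import pandas as pd

# Hypothetical data: category "c" is the rarest.
df = pd.DataFrame({
    "category": ["a", "a", "a", "b", "b", "c"],
    "value": [1, 2, 3, 4, 5, 6],
})

top_n = 2  # hypothetical; the example above uses 20

group_sizes = df.groupby("category").size()
sorted_groups = group_sizes.sort_values(ascending=False)
top = sorted_groups.head(top_n)
df_filtered = df[df["category"].isin(top.index)]

# value_counts() returns counts already sorted in descending order,
# so it can stand in for the groupby / size / sort_values steps.
top_alt = df["category"].value_counts().head(top_n)

print(df_filtered)
```

When several categories tie at the cutoff, which of them make it into the top group depends on the sort order, so the selection is not guaranteed to be stable across the two approaches.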