Creating a DataFrame

import pandas as pd
df = pd.read_csv("a.csv") # read a CSV file into a DataFrame
df = pd.DataFrame(dict_t) # convert a dict to a DataFrame
df = Series_t.to_frame() # convert a Series to a one-column DataFrame
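
As a minimal sketch of the constructors above, the snippet below builds a DataFrame from a dict and from a Series. The names `dict_t` and `series_t` and their values are made up for illustration. `read_csv` is left out because it needs a file on disk.

```python
import pandas as pd

# Hypothetical sample data: each dict key becomes a column.
dict_t = {"Year": [2012, 2016, 2020], "Party": ["A", "B", "A"]}
df_from_dict = pd.DataFrame(dict_t)

# A named Series becomes a one-column DataFrame;
# the Series name is used as the column name.
series_t = pd.Series([10, 20, 30], name="Votes")
df_from_series = series_t.to_frame()

print(df_from_dict.shape)             # (number of rows, number of columns)
print(list(df_from_series.columns))   # column labels
```

Note that the dict is passed directly, not wrapped in extra braces: `{dict_t}` would try to build a set containing a dict, which Python rejects because dicts are not hashable.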

Location

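df.loc[0:4,"Year":"Party"] # rows 0 to 4 and columns "Year" to "Party", both ends included
df.loc[[1,2,5],["Year","Candidate"]] # selected rows and columns
df.loc[:,["Year","Candidate"]] # all rows, two columns
df.loc[:,"Year"] # returns a Series
df.loc[1,"Year":"Party"] # returns a Series
df.shape[0] # number of rows
df[["Year","Candidate"]][0:5] # first five rows of two columns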
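
The difference between label-based `.loc` slices and plain positional slices is easy to miss, so here is a small sketch. The data is hypothetical and only reuses the column names from the lines above. It assumes the default integer index and the column order `Year`, `Candidate`, `Party`.

```python
import pandas as pd

# Hypothetical data with the column names used above.
df = pd.DataFrame({
    "Year": [2000, 2004, 2008, 2012, 2016, 2020],
    "Candidate": ["a", "b", "c", "d", "e", "f"],
    "Party": ["X", "Y", "X", "Y", "X", "Y"],
})

# .loc slices are label-based and include both endpoints:
# labels 0 through 4 give five rows, "Year" through "Party" give three columns.
block = df.loc[0:4, "Year":"Party"]

# A single column label returns a Series; a list of labels returns a DataFrame.
col = df.loc[:, "Year"]
sub = df.loc[:, ["Year", "Candidate"]]

# Plain [] slicing is positional and excludes the end: positions 0 through 4.
head5 = df[["Year", "Candidate"]][0:5]

print(block.shape, type(col).__name__, type(sub).__name__, head5.shape)
```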

Type conversion

df['column_name'] = df['column_name'].astype(int)
df['column_name'] = df['column_name'].astype(float)
df['column_name'] = df['column_name'].astype(str)
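
A short sketch of these conversions on a hypothetical column of numeric strings. The sample values and the helper names `as_float` and `as_str` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical column holding numbers stored as strings.
df = pd.DataFrame({"column_name": ["1", "2", "3"]})

# Strings to integers, so arithmetic works on the column.
df["column_name"] = df["column_name"].astype(int)

# Integers to floats, and integers back to strings.
as_float = df["column_name"].astype(float)
as_str = df["column_name"].astype(str)

print(df["column_name"].sum())
print(as_float.tolist())
print(as_str.tolist())
```

`astype` raises an error when a value cannot be converted (for example the string "abc" to `int`), so it works best on columns that are already clean.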

Groupby and agg

df.groupby('column_name').agg(['sum', 'mean', 'max', 'min'])
df.groupby(['column_name_1', 'column_name_2']).agg({'column_name_3': ['sum', 'mean'], 'column_name_4': ['max', 'min']})
df.groupby(['column_name_1', 'column_name_2']).agg(['sum', 'mean', 'max', 'min'])
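
To show what the dict form of `agg` returns, here is a minimal sketch on made-up data. The column names follow the placeholders above; the values are hypothetical.

```python
import pandas as pd

# Hypothetical data: two grouping columns and two numeric columns.
df = pd.DataFrame({
    "column_name_1": ["a", "a", "b", "b"],
    "column_name_2": ["x", "x", "y", "y"],
    "column_name_3": [1, 2, 3, 4],
    "column_name_4": [10, 20, 30, 40],
})

# Different aggregations per column.
result = df.groupby(["column_name_1", "column_name_2"]).agg(
    {"column_name_3": ["sum", "mean"], "column_name_4": ["max", "min"]}
)

# The result has a MultiIndex on rows (the group keys) and on columns
# (source column, aggregation name), so a cell is addressed with two tuples.
total_a = result.loc[("a", "x"), ("column_name_3", "sum")]

print(result)
print(total_a)
```

The list form, `agg(['sum', 'mean', 'max', 'min'])`, applies every aggregation to every non-grouping column, so it is best suited to frames whose remaining columns are all numeric.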

Example

Example 1

Extracting the top 20 categories from a DataFrame using groupby:

1.  Use the `groupby` and `size` methods to calculate the size of each group:

```python
group_sizes = df.groupby('category').size()
```

2.  Use the `sort_values` method to sort the group sizes in descending order:

```python
sorted_groups = group_sizes.sort_values(ascending=False)
```

3.  Use the `head` method to extract the top 20 categories:

```python
top_20 = sorted_groups.head(20)
```

4.  Use the `isin` method to filter the original DataFrame to only include rows with the top 20 categories:

```python
df_filtered = df[df['category'].isin(top_20.index)]
```

The result, `df_filtered`, is a new DataFrame that keeps only the rows whose `category` value is one of the 20 most frequent categories.
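
Put together, the procedure can be sketched end to end on a tiny made-up DataFrame. The data and the variable `n` are assumptions for illustration. `n` is set to 2 here so the effect is visible on six rows, where the text uses 20.

```python
import pandas as pd

# Hypothetical data: "a" appears 3 times, "b" twice, "c" once.
df = pd.DataFrame({"category": ["a", "a", "a", "b", "b", "c"]})

n = 2  # number of categories to keep (20 in the text above)

# Count rows per category, sort by count, and keep the n largest groups.
group_sizes = df.groupby("category").size()
top_n = group_sizes.sort_values(ascending=False).head(n)

# Keep only rows whose category is among the top n.
df_filtered = df[df["category"].isin(top_n.index)]

print(list(top_n.index))
print(len(df_filtered))
```

A shorter route to the same counts is `df["category"].value_counts().head(n)`, since `value_counts` already sorts in descending order by default.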
