- The data table is a Pandas dataframe
- The data column is a Pandas series
- We can obtain information of the dataset:
- 10 top rows:
df.head(10)
- Number of rows & variables:
df.shape
- Names of variables:
df.columns
- Types of variables:
df.info()
- Stats of numerical variables:
df.describe()
- Stats of numerical variables:
df.describe(include=['object', 'bool'])
- 10 top rows:
- Information of numerical variables:
- Minimum
df.num_var.min()
- Maximun
df.num_var.max()
- Mean
df.num_var.mean()
- Median
df.num_var.median()
- Std
df.variable.std()
- Minimum
- Information of categorial variables:
- Unique values (count):
df.cat_var.value_counts()
- Unique values (percen):
df.cat_var.value_counts(normalize=True)
- 2 categorial vars (count):
pd.crosstab(df.cat_var1, df.cat_var2, margins=True)
- 2 categorial vars (percen):
pd.crosstab(df.cat_var1, df.cat_var2, margins=True, normalize=True)
- Unique values (count):
- Information of numerical & categorial variables:
- Pivot table:
df.pivot_table([num_var1, num_var2, ...], [cat_var1, cat_var2], aggfunc='mean')
- Pivot table:
- Fitering rows:
- One filter:
df[ condition ]
- Multiple filters:
df[ (condition1) & (condition2)]
- One filter:
01. Python y Pandas
Folders and files
Name | Name | Last commit date | ||
---|---|---|---|---|
parent directory.. | ||||