Univariate Exploratory Data Analysis (EDA)

Univariate EDA involves the analysis of a single variable at a time. It helps to understand the distribution, central tendency, spread, and potential outliers within a single variable. Here are some common techniques used in univariate EDA:

  1. Histograms: Histograms display the distribution of a single numerical variable by dividing it into bins and plotting the frequency of observations in each bin.

  2. Boxplots: Boxplots summarize numerical data through its median, quartiles, and whiskers, which together show central tendency, spread, and skewness. Points that fall beyond the whiskers are flagged as potential outliers.

  3. Descriptive Statistics: Descriptive statistics such as mean, median, mode, variance, and standard deviation give a summary of the central tendency and variability of the data.

  4. Countplots: Countplots are useful for visualizing the frequency of different categories within a categorical variable.

  5. Kernel Density Estimation (KDE) Plots: KDE plots estimate the probability density function of a continuous variable, providing insights into the shape of its distribution.
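As a rough illustration of the techniques above, the following sketch computes the numbers behind a histogram, a boxplot's outlier rule, descriptive statistics, and a countplot. It uses only the Python standard library so that it stays self-contained; plotting libraries would normally draw these for you. The helper names and the `ages` sample are hypothetical and chosen only for this example.

```python
import statistics
from collections import Counter


def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "stdev": statistics.stdev(values),        # sample standard deviation
    }


def histogram_counts(values, bins):
    """Count observations in `bins` equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    if width == 0:
        # All values are identical, so everything lands in the first bin.
        counts[0] = len(values)
        return counts
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def iqr_outliers(values):
    """Flag points more than 1.5 * IQR beyond the quartiles (a common boxplot rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


def category_counts(labels):
    """Frequency of each category, i.e. the bar heights of a countplot."""
    return Counter(labels)


# Hypothetical sample data, for illustration only.
ages = [22, 25, 25, 27, 30, 31, 33, 35, 38, 90]
summary = describe(ages)
bin_counts = histogram_counts(ages, bins=4)
outliers = iqr_outliers(ages)
```

In this sample the single large value pulls the mean well above the median, which is the kind of skew a histogram or boxplot makes visible at a glance.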

Bivariate Exploratory Data Analysis (EDA)

Bivariate EDA involves analyzing the relationship between two variables at once. It helps to understand how one variable behaves in relation to another. Here are some common techniques used in bivariate EDA:

  1. Scatterplots: Scatterplots are useful for visualizing the relationship between two numerical variables. They help identify patterns, trends, and potential correlations.

  2. Pairplots: Pairplots display scatterplots for every pair of numerical variables in a dataset, with the distribution of each variable (typically a histogram or KDE) along the diagonal. They provide a quick overview of the relationships between multiple variables.

  3. Correlation Analysis: Correlation analysis measures the strength and direction of the relationship between two numerical variables. The Pearson correlation coefficient captures linear relationships, while the Spearman rank correlation coefficient and Kendall's tau are rank-based and capture monotonic relationships, including non-linear ones.

  4. Heatmaps: Heatmaps visualize the correlation matrix between multiple variables using colors to represent the strength and direction of correlations. They are particularly useful when dealing with a large number of variables.

  5. Categorical Plots: Boxplots, violin plots, and bar plots can be used to visualize the relationship between a numerical and a categorical variable. For two categorical variables, grouped bar plots or grouped countplots are the usual choice.
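To make the correlation techniques above more concrete, here is a minimal sketch that computes Pearson and Spearman coefficients by hand and assembles the correlation matrix that a heatmap would color. It relies only on the Python standard library; the function names and the `data` columns are hypothetical and exist only for this illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation: strength and direction of a linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_matrix(columns):
    """Pairwise Pearson correlations, keyed by column name."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical data: score grows with the square of hours, errors fall linearly.
data = {
    "hours": [1, 2, 3, 4, 5],
    "score": [1, 4, 9, 16, 25],
    "errors": [10, 8, 6, 4, 2],
}
matrix = correlation_matrix(data)
r_linear = pearson(data["hours"], data["score"])
r_rank = spearman(data["hours"], data["score"])
```

Here `score` rises with `hours` in a monotonic but non-linear way, so the Spearman coefficient reaches 1 while the Pearson coefficient stays slightly below it. Comparing the two is a quick check for non-linearity before reading a heatmap.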

By performing both univariate and bivariate EDA, you can gain valuable insights into the characteristics of individual variables as well as the relationships between variables in your dataset.