The EDA for this project consisted of:
- Checking missing values
- Looking at the distribution of the target variable (churn)
- Looking at numerical and categorical variables
Functions and methods:
- `df.isnull().sum()` - returns the number of null values in each column of the dataframe.
- `df.x.value_counts()` - returns the number of values for each category in the `x` series. The `normalize=True` argument returns the proportion of each category instead of the raw count. In this project, the mean of churn is equal to the churn rate obtained with the `value_counts` method, which holds when churn is encoded as 0/1.
- `round(x, y)` - rounds the number `x` to `y` decimal places.
- `df[x].nunique()` - returns the number of unique values in the `x` series.
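
The calls listed above can be tried on a small example. The sketch below is only illustrative: the dataframe and its column names (`contract`, `tenure`, `churn`) are made up for the demonstration and are not the project's dataset, and it assumes churn is already encoded as 0/1.

```python
import pandas as pd

# Hypothetical toy data standing in for the project's dataset (assumption).
df = pd.DataFrame({
    "contract": ["monthly", "yearly", "monthly", None, "monthly"],
    "tenure": [1, 24, 5, 12, 3],
    "churn": [1, 0, 1, 0, 0],  # assumed to be encoded as 0/1
})

# Number of missing values in each column.
missing_per_column = df.isnull().sum()

# Count of each churn value, then the same as proportions.
churn_counts = df.churn.value_counts()
churn_shares = df.churn.value_counts(normalize=True)

# With 0/1 encoding, the mean of churn is the share of churned customers.
churn_rate = round(df.churn.mean(), 2)

# Number of distinct values in a column (missing values are not counted).
n_contract_types = df["contract"].nunique()

print(missing_per_column)
print(churn_counts)
print(churn_shares)
print(churn_rate, n_contract_types)
```

Comparing `churn_rate` with the entry for `1` in `churn_shares` shows why the mean and the `value_counts(normalize=True)` result agree for a binary target.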
The entire code of this project is available in this Jupyter notebook.
The notes are written by the community. If you see an error here, please create a PR with a fix.