EDA OF EMPLOYEE ATTRITION

EDA of an employee attrition dataset to reveal demographic, compensation, and departmental patterns driving turnover.

Category

People Analytics • EDA

Tech Stack

Python · Pandas · NumPy · Matplotlib · Seaborn · Skimpy

Exploratory Data Analysis of Employee Attrition

🔑 Key Insights

  • Attrition is concentrated among younger employees: the mean age of employees who left is ~32, and the attrition rate falls as age increases.

  • Lower-paid employees are far more likely to leave: those who left earned ~$3,000/month on average vs ~$6,000 for those who stayed, and monthly income shows the strongest negative correlation with attrition.

  • R&D has a higher attrition rate than Sales even though Sales has the most leavers in absolute terms, revealing risk in a smaller but critical population.

These patterns show that attrition is driven by demographics and compensation, and that raw leaver counts alone can hide where the attrition rate is highest.
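The rate-versus-count distinction in the third insight can be illustrated with a minimal pandas sketch. The rows below are toy values made up for illustration, not figures from the dataset, and the column names `department` and `attrition` (with "Yes"/"No" values) are assumptions about the cleaned data.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT taken from the Kaggle dataset.
# Column names ("department", "attrition") and the "Yes"/"No" coding are assumed.
df = pd.DataFrame({
    "department": ["Sales"] * 10 + ["R&D"] * 4,
    "attrition": ["Yes"] * 3 + ["No"] * 7 + ["Yes"] * 2 + ["No"] * 2,
})

# Boolean flag: True for employees who left.
left = df["attrition"].eq("Yes")

# Per department: absolute leavers, headcount, and the share who left.
summary = (
    df.assign(left=left)
      .groupby("department")["left"]
      .agg(leavers="sum", headcount="count", attrition_rate="mean")
)
print(summary)
```

In this toy frame Sales has more leavers (3 vs 2) while R&D has the higher rate (50% vs 30%), which is the kind of gap a count-only view would miss.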

🏷️ Overview

This project uses Python-based Exploratory Data Analysis (EDA) to understand why employees leave a fictional company. Using the Employee Attrition dataset from Kaggle, the analysis explores relationships between attrition and factors such as age, tenure, income, department, education, and satisfaction levels.

The goal is to surface clear, data-backed patterns that HR and leadership can use to design better retention strategies and reduce unwanted turnover.

❗ Problem

  • The company had no structured view of which groups were most at risk of leaving.

  • Leadership lacked clarity on whether attrition was driven more by demographics, pay, satisfaction, or function.

  • Attrition conversations were based on assumptions (e.g. “Sales always churns more”) rather than measurable patterns.

🧩 Solution

  • Performed systematic EDA on the employee attrition dataset using Python.

  • Analyzed attrition against:

    • Age and tenure

    • Monthly income

    • Department and education field

    • Environment and job satisfaction

  • Built a series of visualizations (univariate + bivariate) to answer concrete questions:

    • Which groups have the highest attrition?

    • How do age, tenure, and income differ between stayers and leavers?

    • Are certain education fields or departments structurally at risk?

  • Translated the patterns into actionable observations around compensation, environment, and early-tenure risk.

Together, these steps turned a raw CSV into clear retention signals HR could act on.
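A question such as "how do age, tenure, and income differ between stayers and leavers?" can be answered with a grouped summary. The following is a rough sketch rather than the notebook's actual code: the column names (`attrition`, `age`, `years_at_company`, `monthly_income`) are assumed, and the six rows are invented purely to make the example runnable.

```python
import pandas as pd

# Toy rows for illustration only; values are invented, column names are assumed.
df = pd.DataFrame({
    "attrition": ["Yes", "No", "No", "Yes", "No", "No"],
    "age": [28, 41, 36, 31, 45, 39],
    "years_at_company": [1, 9, 6, 2, 12, 7],
    "monthly_income": [2800, 6500, 5200, 3100, 7400, 5900],
})

# Compare leavers ("Yes") and stayers ("No") on each numeric factor.
comparison = (
    df.groupby("attrition")[["age", "years_at_company", "monthly_income"]]
      .agg(["mean", "median"])
)
print(comparison.round(1))
```

The same split can be drawn rather than tabulated, for example with Seaborn's `sns.boxplot(data=df, x="attrition", y="monthly_income")`, which puts the two groups side by side on one axis.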

🛠️ Methodology

  • Tools: Python, Pandas, NumPy, Matplotlib, Seaborn, Skimpy (for cleaning column names).

  • Steps:

    • Loaded the Kaggle dataset and cleaned column names.

    • Checked data integrity: structure, missing values, duplicates, types.

    • Conducted univariate analysis (distributions, counts, summary statistics).

    • Ran bivariate analysis to explore relationships between attrition and:

      • Age, tenure, income

      • Department and education

      • Satisfaction and work-life balance

    • Built correlation views to understand which variables move together (e.g. age–income, income–attrition).
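The steps above can be sketched end to end in a few lines. This is an illustrative outline under several assumptions: the inline CSV is a toy stand-in for the Kaggle file, the CamelCase column names are assumed, and the hypothetical `to_snake` helper replaces Skimpy's column cleaning with a plain regex so the example has no extra dependency.

```python
import io
import re

import pandas as pd

# Toy stand-in for the Kaggle CSV; rows and column names are assumptions.
raw = io.StringIO(
    "Age,MonthlyIncome,YearsAtCompany,Attrition\n"
    "28,2800,1,Yes\n"
    "41,6500,9,No\n"
    "36,5200,6,No\n"
    "31,3100,2,Yes\n"
    "45,7400,12,No\n"
    "45,7400,12,No\n"  # deliberate duplicate so the integrity check has something to find
)
df = pd.read_csv(raw)


def to_snake(name: str) -> str:
    """Hypothetical helper: 'MonthlyIncome' -> 'monthly_income'."""
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    return re.sub(r"[^0-9a-zA-Z]+", "_", name).strip("_").lower()


# Step 1: clean column names (stand-in for Skimpy).
df = df.rename(columns=to_snake)

# Step 2: integrity checks on structure, missing values, duplicates, and types.
report = {
    "shape": df.shape,
    "missing_values": int(df.isna().sum().sum()),
    "duplicate_rows": int(df.duplicated().sum()),
    "dtypes": df.dtypes.astype(str).to_dict(),
}
print(report)

# Step 3: encode attrition as 0/1 and correlate it with the numeric columns.
df["attrition_flag"] = df["attrition"].eq("Yes").astype(int)
corr_with_attrition = (
    df.select_dtypes("number")
      .corr()["attrition_flag"]
      .drop("attrition_flag")
      .sort_values()
)
print(corr_with_attrition)
```

Encoding attrition as a 0/1 flag is one simple way to place it in the same correlation matrix as age and income; a negative coefficient then means higher values of that variable go with fewer leavers.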

The notebook and visuals are fully reproducible and focus on clean, readable analysis rather than model building.

© 2026 • ALFIE DANISH