31 Dec 2025

Understanding Decision Trees: The White Box of Machine Learning

Mateo Lafalce - Blog

In the world of Machine Learning, the Decision Tree stands out for one specific reason: clarity. While neural networks often act as "black boxes" where the logic is hidden, a Decision Tree is a "white box" model. You can see exactly how it thinks.

At its core, a Decision Tree is a flowchart-like structure used for both classification (predicting a category) and regression (predicting a value). Imagine playing a game of 20 Questions. You ask a series of Yes/No questions to narrow down the possibilities until you arrive at an answer. That is exactly how this algorithm functions.

How It Works: The Gini Impurity

A Decision Tree doesn't just guess which questions to ask; it calculates them using specific metrics. The most common metric in classification trees is the Gini Impurity.

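Gini measures how mixed the classes in a group of data are. A score of 0 means the group is perfectly pure (every sample belongs to the same class), and the score rises as the classes become more evenly mixed.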

When the tree considers a split, it calculates the Gini score of each resulting group and averages them, weighted by group size. The algorithm picks the question with the lowest weighted score, so the data becomes more organized with every step.
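
To make the split selection concrete, here is a minimal sketch in plain Python. It assumes the standard definition of Gini impurity (one minus the sum of squared class proportions) and a single numeric feature. The function names `gini_impurity`, `weighted_gini`, and `best_threshold`, as well as the toy credit scores, are made up for illustration and are not taken from any library.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of one group: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def weighted_gini(left, right):
    """Size-weighted average impurity of the two groups produced by a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def best_threshold(values, labels):
    """Try the midpoint between each pair of neighbouring distinct values
    and return (threshold, weighted_gini) for the lowest-scoring split."""
    distinct = sorted(set(values))
    best = (None, float("inf"))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x < threshold]
        right = [y for x, y in zip(values, labels) if x >= threshold]
        score = weighted_gini(left, right)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: four applicants, one feature (credit score).
scores = [580, 620, 700, 750]
outcomes = ["denied", "denied", "approved", "approved"]

print(gini_impurity(outcomes))           # evenly mixed group
print(best_threshold(scores, outcomes))  # threshold that separates the classes
```

A real implementation repeats this search over every feature at every node, which is why each question in the tree is the locally best one rather than a guess.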

graph TD
    %% Root Node
    Start(("Start Loan Application")) --> A{"Is FICO Score >= 670?"}

    %% Decision 1
    A -- "No (Score < 670)" --> B("DENIED: High Credit Risk"):::denied
    A -- "Yes" --> C{"Is DTI Ratio < 43%?"}

    %% Decision 2
    C -- "No (DTI Too High)" --> D("DENIED: Excessive Debt"):::denied
    C -- "Yes" --> E{"Job History > 2 Years?"}

    %% Decision 3
    E -- "No (Unstable)" --> F("DENIED: Unstable Employment"):::denied
    E -- "Yes" --> G("LOAN APPROVED"):::approved

    %% Styling
    classDef denied fill:#ffdddd,stroke:#cc0000,stroke-width:2px,color:#990000;
    classDef approved fill:#ddffdd,stroke:#00cc00,stroke-width:2px,color:#005500;
    
    style A fill:#f9f9f9,stroke:#333,stroke-width:2px
    style C fill:#f9f9f9,stroke:#333,stroke-width:2px
    style E fill:#f9f9f9,stroke:#333,stroke-width:2px
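
Because the model is a white box, the flowchart above translates directly into ordinary conditionals. The sketch below mirrors the three questions in the diagram; the function name `decide_loan` and its parameter names are hypothetical, and the DTI ratio is assumed to be given as a percentage.

```python
def decide_loan(fico_score, dti_percent, job_years):
    """Walk the loan flowchart from the root to a leaf and return its label."""
    # Decision 1: credit score
    if fico_score < 670:
        return "DENIED: High Credit Risk"
    # Decision 2: debt-to-income ratio (assumed to be a percentage)
    if dti_percent >= 43:
        return "DENIED: Excessive Debt"
    # Decision 3: job history must exceed two years
    if job_years <= 2:
        return "DENIED: Unstable Employment"
    return "LOAN APPROVED"


print(decide_loan(fico_score=700, dti_percent=30, job_years=5))
print(decide_loan(fico_score=640, dti_percent=30, job_years=5))
```

Every prediction can be traced to the exact question that produced it, which is what makes the model easy to audit.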
    

When to Use

When Not to Use

Python Implementation

Here is an example using the classic Iris dataset.
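
A minimal sketch of such an example, assuming scikit-learn is installed. The 70/30 split, `max_depth=3`, and `random_state=42` are illustrative choices rather than tuned values.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Load the Iris dataset: 150 flowers, 4 numeric features, 3 species.
iris = load_iris()

# Hold out 30% of the samples for testing (illustrative split).
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)

# Fit a shallow tree that scores candidate splits with Gini impurity.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
clf.fit(X_train, y_train)

# Evaluate on the held-out samples.
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")

# Print the learned rules as readable text: the "white box" in action.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

The text printed by `export_text` lists each threshold the tree learned, so the fitted model can be read the same way as the loan flowchart.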

