DoorDash Delivery Duration Prediction

Project Goal: Build a predictive model that estimates total delivery time based on order-level data, marketplace conditions, and store behavior, while identifying key factors affecting delivery delays.

Tech Stack: Python (pandas, numpy, seaborn, matplotlib), scikit-learn, xgboost

Why Delivery Time Prediction Matters

DoorDash promises timely food delivery. Every delayed order risks customer churn, lower ratings, and operational friction. Predicting how long a delivery will take, before the order is placed, is therefore a critical business function.

This project analyzes over 170,000 historical DoorDash delivery logs to improve delivery duration estimates using machine learning. The goal was not just to build a predictive model, but to uncover what actually drives delivery delays.

Data and Method

Key Analytical Steps

1. Preprocessing

2. Feature Engineering

Created new variables like:

3. Feature Selection

4. Modeling

Visual Summary

Correlation Matrix: Detecting Multicollinearity

Correlation heatmap showing relationships between features
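As a rough illustration of how a correlation matrix can flag multicollinearity, the sketch below lists feature pairs whose absolute Pearson correlation exceeds a threshold. The column names, the 0.8 threshold, and the synthetic data are assumptions made for the example; they are not taken from the project's dataset or code.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df, threshold=0.8):
    """Return (feature_a, feature_b, |r|) for pairs above the threshold."""
    corr = df.select_dtypes("number").corr().abs()
    # Keep only the strict upper triangle so each pair is reported once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    pairs = [
        (row, col, float(upper.loc[row, col]))
        for row in upper.index
        for col in upper.columns
        if pd.notna(upper.loc[row, col]) and upper.loc[row, col] > threshold
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Synthetic stand-in data (hypothetical column names).
rng = np.random.default_rng(0)
n = 500
total_dashers = rng.integers(5, 60, size=n).astype(float)
demo = pd.DataFrame({
    "total_dashers": total_dashers,
    # Built to track total_dashers closely, so the pair should be flagged.
    "busy_dashers": total_dashers * 0.9 + rng.normal(0, 1.0, size=n),
    # Independent of the other two columns.
    "subtotal": rng.normal(2500, 800, size=n),
})

pairs = highly_correlated_pairs(demo)
for a, b, r in pairs:
    print(f"{a} ~ {b}: |r| = {r:.2f}")
```

One common follow-up is to drop or combine one feature from each flagged pair before fitting a linear model. The same absolute-correlation matrix can be passed to seaborn's `heatmap` to produce a plot like the one above.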

Feature Importance: What Drives Delivery Time?

Feature importance plot showing key predictors of delivery time
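A feature importance ranking of this kind can be obtained in several ways. The sketch below uses the impurity-based importances of a scikit-learn random forest as one possible approach. It is not necessarily the method used in this project, and the feature names and synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def rank_feature_importance(X, y, seed=0):
    """Fit a random forest and return importances sorted high to low."""
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    model.fit(X, y)
    importances = pd.Series(model.feature_importances_, index=X.columns)
    return importances.sort_values(ascending=False)


# Synthetic stand-in data (hypothetical feature names).
rng = np.random.default_rng(1)
n = 600
X_demo = pd.DataFrame({
    "driving_duration": rng.uniform(200, 1200, size=n),
    "total_items": rng.integers(1, 7, size=n).astype(float),
    "subtotal": rng.normal(2500, 800, size=n),
})
# The target is dominated by driving_duration by construction.
y_demo = (
    X_demo["driving_duration"]
    + 20 * X_demo["total_items"]
    + rng.normal(0, 30, size=n)
)

ranking = rank_feature_importance(X_demo, y_demo)
print(ranking)
```

Impurity-based importances sum to 1 and can be biased toward high-cardinality features, so permutation importance (`sklearn.inspection.permutation_importance`) is a reasonable cross-check.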

Model Comparison: Simpler Models Performed Better

RMSE comparison chart across different models
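A comparison like this can be produced by fitting each candidate model on the same train/test split and computing RMSE on the held-out set. The following is a minimal sketch on synthetic data; the model list, split size, and data are assumptions, and the numbers it prints say nothing about the project's actual results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def compare_rmse(models, X, y, test_size=0.2, seed=0):
    """Fit each model on one shared split and return held-out RMSE by name."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        scores[name] = float(np.sqrt(mean_squared_error(y_test, preds)))
    return scores


# Synthetic stand-in data: a linear signal plus unit-variance noise.
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(400, 3))
y_demo = X_demo @ np.array([3.0, -2.0, 0.5]) + rng.normal(0, 1.0, size=400)

candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
scores = compare_rmse(candidates, X_demo, y_demo)
for name, rmse in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {rmse:.3f}")
```

Because xgboost's `XGBRegressor` follows the scikit-learn estimator interface, it could be added to the `candidates` dictionary in the same way if xgboost is installed.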

Project Presentation


Key Findings

Business Implications
