Project Goal: Build a predictive model that estimates total delivery time based on order-level data, marketplace conditions, and store behavior, while identifying key factors affecting delivery delays.
Tech Stack: Python (pandas, numpy, seaborn, matplotlib), scikit-learn, xgboost
DoorDash promises timely food delivery. Every delayed order risks customer churn, lower ratings, and operational friction. That's why predicting how long a delivery will take—before the order is placed—is a critical business function.
This project analyzes over 170,000 historical DoorDash delivery logs to improve delivery duration estimates using machine learning. The goal was not just to build a predictive model, but to uncover what actually drives delivery delays.
Created new variables like:
estimated_total_duration (DoorDash's system ETA)
busy_ratio (active dashers marked busy)
avg_price_per_item, item_diversity_ratio, and price_range_of_items
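A minimal pandas sketch of how features like these could be derived. The raw column names (estimated_order_place_duration, estimated_store_to_consumer_driving_duration, total_busy_dashers, total_onshift_dashers, subtotal, total_items, num_distinct_items, min_item_price, max_item_price) and the formulas themselves are assumptions inferred from the feature names, not the project's confirmed definitions.

```python
import numpy as np
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the engineered columns added.

    Every input column name here is assumed for illustration.
    """
    out = df.copy()

    # System ETA: assumed to be the sum of the two estimated legs (seconds).
    out["estimated_total_duration"] = (
        out["estimated_order_place_duration"]
        + out["estimated_store_to_consumer_driving_duration"]
    )

    # Share of on-shift dashers marked busy; zero on-shift dashers yields NaN
    # instead of a division-by-zero infinity.
    onshift = out["total_onshift_dashers"].replace(0, np.nan)
    out["busy_ratio"] = out["total_busy_dashers"] / onshift

    # Basket-shape features; zero-item orders also yield NaN.
    items = out["total_items"].replace(0, np.nan)
    out["avg_price_per_item"] = out["subtotal"] / items
    out["item_diversity_ratio"] = out["num_distinct_items"] / items
    out["price_range_of_items"] = out["max_item_price"] - out["min_item_price"]
    return out


# Toy rows invented for illustration; they are not from the real delivery logs.
orders = pd.DataFrame(
    {
        "estimated_order_place_duration": [300, 450],
        "estimated_store_to_consumer_driving_duration": [600, 700],
        "total_busy_dashers": [8, 0],
        "total_onshift_dashers": [10, 0],
        "subtotal": [3000, 1500],
        "total_items": [4, 3],
        "num_distinct_items": [2, 3],
        "min_item_price": [400, 500],
        "max_item_price": [1200, 500],
    }
)
features = add_engineered_features(orders)
```

Mapping zero denominators to NaN keeps the ratios well defined; tree-based learners such as xgboost can handle missing values directly, while most scikit-learn estimators would need an imputation step first.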
DoorDash's system ETA (estimated_total_duration) was closely aligned with actual outcomes. This validated the engineering team's domain logic and data quality.
The busy_ratio variable (how many dashers were marked busy during the order window) strongly influenced final delivery time.
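One way findings like these could be checked is to compare the system ETA against the observed duration and to measure how each feature correlates with it. The sketch below is an assumption about method, not the analysis the project actually ran; the target column name actual_total_duration and the helper summarize_drivers are hypothetical.

```python
import pandas as pd


def summarize_drivers(df: pd.DataFrame, target: str = "actual_total_duration") -> dict:
    """Summarize ETA accuracy and feature/target correlation.

    The column names are assumed for illustration.
    """
    # Mean absolute error of the system ETA against the observed duration.
    eta_mae = (df["estimated_total_duration"] - df[target]).abs().mean()

    # Pearson correlation of each candidate driver with the observed duration.
    corr = df[["estimated_total_duration", "busy_ratio"]].corrwith(df[target])

    return {
        "eta_mae": float(eta_mae),
        "eta_corr": float(corr["estimated_total_duration"]),
        "busy_ratio_corr": float(corr["busy_ratio"]),
    }


# Toy rows invented for illustration; durations are in seconds.
sample = pd.DataFrame(
    {
        "estimated_total_duration": [900, 1200, 1500, 1800],
        "busy_ratio": [0.2, 0.5, 0.7, 0.9],
        "actual_total_duration": [1000, 1300, 1700, 2100],
    }
)
summary = summarize_drivers(sample)
```

Correlation only captures linear association, so a model-based importance measure (for example from a fitted xgboost model) would be a reasonable complement on the real data.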