A Tree-Based Model for Predicting Travel Mode

Travel Models Support Informed Decision Making

A travel model can help answer these questions, and many more!

The Utility Travel Model

Idea: A commuter will choose the travel mode that maximizes their welfare

Assumption: The commuter is fully informed of each mode’s attributes and considers all modes equally

Other concerns: The model can become very complex, take a long time to run, and produce results that are difficult to explain
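To make the utility idea concrete, here is a minimal sketch that assumes a multinomial logit form, a common choice for utility-based mode choice models. The source does not say which form the utility model takes, and the coefficients, mode names, and trip attributes below are made up for illustration.

```python
import math

# Hypothetical coefficients: disutility per minute of travel and per dollar of cost.
BETA_TIME = -0.05
BETA_COST = -0.30

def utility(time_min, cost_usd):
    """Linear utility of one mode (illustrative form, not the mentors' model)."""
    return BETA_TIME * time_min + BETA_COST * cost_usd

def choice_probabilities(modes):
    """Multinomial logit: P(mode) = exp(U_mode) / sum of exp(U) over all modes."""
    utilities = {name: utility(*attrs) for name, attrs in modes.items()}
    max_u = max(utilities.values())  # subtract the max for numerical stability
    exp_u = {name: math.exp(u - max_u) for name, u in utilities.items()}
    total = sum(exp_u.values())
    return {name: e / total for name, e in exp_u.items()}

# Made-up attributes: (travel time in minutes, out-of-pocket cost in dollars).
modes = {
    "drive_alone": (25, 4.0),
    "transit": (45, 2.5),
    "walk": (90, 0.0),
}
probs = choice_probabilities(modes)
best_mode = max(probs, key=probs.get)  # the welfare-maximizing mode
```

Every mode enters the comparison on equal terms here, which mirrors the assumption that the commuter considers all modes equally.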

Our Approach: Decision Tree Travel Model

Idea: Based on the travel modes that other commuters chose, we predict that a commuter under similar conditions will choose the same mode.
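The idea can be sketched with the smallest possible tree, a single split. The feature name `distance_km`, the threshold, and the trips below are hypothetical and are not taken from our data; a real tree learns many splits and chooses the thresholds itself.

```python
from collections import Counter

def majority_mode(trips):
    """Most frequent mode among a group of observed trips."""
    return Counter(t["mode"] for t in trips).most_common(1)[0][0]

def fit_stump(trips, feature, threshold):
    """One-split tree: learn the majority mode on each side of the threshold.

    Assumes at least one observed trip falls on each side of the split.
    """
    left = [t for t in trips if t[feature] <= threshold]
    right = [t for t in trips if t[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": majority_mode(left),
        "right": majority_mode(right),
    }

def predict(stump, trip):
    """Predict the mode chosen by commuters on the same side of the split."""
    side = "left" if trip[stump["feature"]] <= stump["threshold"] else "right"
    return stump[side]

# Made-up observed trips.
observed = [
    {"distance_km": 1.0, "mode": "walk"},
    {"distance_km": 1.5, "mode": "walk"},
    {"distance_km": 2.0, "mode": "drive"},
    {"distance_km": 8.0, "mode": "transit"},
    {"distance_km": 12.0, "mode": "drive"},
    {"distance_km": 20.0, "mode": "drive"},
]
stump = fit_stump(observed, "distance_km", threshold=3.0)
short_trip_mode = predict(stump, {"distance_km": 0.8})
long_trip_mode = predict(stump, {"distance_km": 15.0})
```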

XGBoost (eXtreme Gradient Boosting): An Optimized Ensemble of Decision Trees

Features:

Data

Our data was provided by our industry mentors, Deloitte Consulting LLP.

The data contains over 14 million trips, each described by 29 attributes. We plotted some of these attributes to get insight into their distributions and the activities represented in the data.

Trip Modes

We have 12 trip modes to predict; their distribution is shown in the figure below. The most common modes are 1, 3, and 5 (Drive Alone Free, HOV2 Free, and HOV3 Free), meaning most trips were made by car without paying tolls or fees.

Age

Most commuters were in age groups 4, 6, and 7, which correspond to 25-40, 41-55, and 56+ years old. The gap between groups 4 and 6 appears because our age data has no category 5.

Income

Most commuters fall in Group 3, with an average income of $60k to $100k. The least represented group is Group 4, with an average income of $100k to $150k. Most households in our dataset have an income around the average or below it. Because income can affect the choice of travel mode, our model can also be used to examine income subpopulations.

Results

Evaluation

After we finished the model, we created a validation pipeline to estimate its performance and to look for potential problems such as overfitting or imbalanced samples.

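Below are the AUC curves for the training and testing sets over 100 epochs. We observe that the training curve improves steadily while the testing curve plateaus at an AUC of around 0.91. AUC measures how well the model ranks trips rather than the share of trips it classifies correctly: roughly, a value of 0.91 means that when the model compares a randomly chosen trip of a given mode with a randomly chosen trip of another mode, it gives the higher score to the right one about 91% of the time.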
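AUC is a ranking measure, so it can differ from the share of correct predictions. The sketch below computes a binary AUC as the fraction of correctly ranked positive/negative pairs and compares it with accuracy at a 0.5 threshold. The labels and scores are made up.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Share of examples classified correctly at the given score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Made-up example: every positive outranks every negative,
# but all scores fall below the 0.5 threshold.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.45, 0.40, 0.35, 0.30, 0.20, 0.10]

ranking_quality = auc(labels, scores)
share_correct = accuracy(labels, scores)
```

In this example the ranking is perfect while only half of the examples are classified correctly, which is why an AUC value should not be read as an accuracy.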

We also plotted ROC curves for each class using the One vs All technique, which treats one class as the positive label and groups all other classes together as the negative label. This way, we can see which classes our model performs better on. According to the AUC scores, our model does best at predicting school bus and drive alone paid trips, and less well on 2-person HOV free trips.
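The One vs All computation can be sketched as follows: for each class, the true labels are binarized and scored with that class's predicted probability. The class names and probabilities are made up and are not our results.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(y_true, probs, classes):
    """AUC per class: that class is the positive label, all others are negative."""
    result = {}
    for k, cls in enumerate(classes):
        binary = [1 if y == cls else 0 for y in y_true]
        class_scores = [row[k] for row in probs]
        result[cls] = auc(binary, class_scores)
    return result

# Made-up predictions: one row of class probabilities per trip.
classes = ["drive", "transit", "walk"]
y_true = ["drive", "drive", "transit", "walk", "transit", "walk"]
probs = [
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.60, 0.25, 0.15],  # a transit trip that the model scores as drive
    [0.10, 0.20, 0.70],
    [0.20, 0.60, 0.20],
    [0.10, 0.30, 0.60],
]
per_class_auc = one_vs_rest_auc(y_true, probs, classes)
```

Comparing the per-class values shows which modes the model separates well and which it confuses with others.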

Feature Fine-tuning

To simplify our model, we want to use only a handful of features. We narrowed down our choices by evaluating the feature importances, which indicate how much each feature contributed to building the model. Based on the graph below, we observed that employment density at the destination and origin regions (dempden, oempden) were the two most important features in determining trip mode. We intend to use every feature that is at least as important as parkingcost in our optimized model.
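The selection rule, keeping every feature at least as important as parkingcost, can be sketched as below. Only the names dempden, oempden, and parkingcost come from our data; the other feature names are placeholders and all importance values are made up.

```python
# Hypothetical importance scores (made-up values, placeholder names for
# everything except dempden, oempden, and parkingcost).
importances = {
    "dempden": 0.21,
    "oempden": 0.18,
    "feature_a": 0.12,
    "parkingcost": 0.09,
    "feature_b": 0.05,
    "feature_c": 0.02,
}

def select_features(importances, reference):
    """Keep every feature whose importance is at least that of the reference."""
    cutoff = importances[reference]
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, score in ranked if score >= cutoff]

selected = select_features(importances, "parkingcost")
```

With a trained XGBoost model, the scores would typically come from the model's feature importance output rather than a hand-written dictionary.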

We compared the evaluation metrics of our model with the results that Deloitte Consulting LLP provided us from their utility model. To measure correctness, we used accuracy, sensitivity, precision, and F1-score.

Accuracy:

Sensitivity:

Precision:

F1-score:
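For reference, the four metrics listed above can be computed from true and predicted modes as in the sketch below. Macro-averaging over classes is an assumption here, since the averaging used for the comparison is not stated, and the labels are made up using mode codes 1, 3, and 5.

```python
def per_class_counts(y_true, y_pred, cls):
    """True positives, false positives, and false negatives for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    return tp, fp, fn

def safe_div(num, den):
    """Return 0.0 instead of failing when a class has no relevant examples."""
    return num / den if den else 0.0

def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged sensitivity (recall), precision, and F1."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for cls in classes:
        tp, fp, fn = per_class_counts(y_true, y_pred, cls)
        precision = safe_div(tp, tp + fp)
        recall = safe_div(tp, tp + fn)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(safe_div(2 * precision * recall, precision + recall))
    n = len(classes)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "sensitivity": sum(recalls) / n,
        "precision": sum(precisions) / n,
        "f1": sum(f1s) / n,
    }

# Made-up labels: one mode-1 trip is misclassified as mode 3.
y_true = [1, 1, 1, 3, 3, 5]
y_pred = [1, 1, 3, 3, 3, 5]
metrics = evaluate(y_true, y_pred)
```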

Conclusion

The results show that the XGBoost decision tree model performed better than the utility model across all evaluation metrics. We hope that with further fine-tuning we can improve these numbers and provide urban planners and policy makers with accurate simulations that help them make more informed decisions.