Septa Next Bus
Technical Details

This project uses machine learning for its predictions. The main app is written in C++, the database is MySQL, and the front-end API is served through PHP.

Machine Learning Details

The model is a simple linear regression whose parameters are fit with fmincg, a conjugate-gradient minimizer. Features are normalized before training. Everything needed to use this approach is covered in the first 4 "weeks" of this introductory online course: https://www.coursera.org/learn/machine-learning.
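As a rough illustration of linear regression with feature normalization, here is a minimal C++ sketch. It is not the project's code: all names (Scaler, fitScaler, normalizeRow, predict, train) are hypothetical, and plain batch gradient descent stands in for fmincg to keep the example short.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is not the project's actual code.
using Row = std::vector<double>;

struct Scaler {
    Row mean;    // per-feature mean of the training set
    Row stddev;  // per-feature standard deviation of the training set
};

// Compute the per-feature mean and standard deviation.
Scaler fitScaler(const std::vector<Row>& X) {
    const std::size_t m = X.size(), n = X.front().size();
    Scaler s{Row(n, 0.0), Row(n, 0.0)};
    for (const Row& r : X)
        for (std::size_t j = 0; j < n; ++j) s.mean[j] += r[j] / m;
    for (const Row& r : X)
        for (std::size_t j = 0; j < n; ++j)
            s.stddev[j] += (r[j] - s.mean[j]) * (r[j] - s.mean[j]) / m;
    // Turn variances into standard deviations; a constant feature gets 1.0
    // so that normalizing it does not divide by zero.
    for (double& v : s.stddev) v = v > 0.0 ? std::sqrt(v) : 1.0;
    return s;
}

// Scale one row to zero mean and unit variance using the fitted scaler.
Row normalizeRow(const Row& r, const Scaler& s) {
    Row out(r.size());
    for (std::size_t j = 0; j < r.size(); ++j)
        out[j] = (r[j] - s.mean[j]) / s.stddev[j];
    return out;
}

// Linear hypothesis: theta[0] + sum over j of theta[j + 1] * x[j].
double predict(const Row& theta, const Row& x) {
    double h = theta[0];
    for (std::size_t j = 0; j < x.size(); ++j) h += theta[j + 1] * x[j];
    return h;
}

// Fit theta by batch gradient descent on the mean squared error.
// Assumption: gradient descent is swapped in here in place of fmincg.
Row train(const std::vector<Row>& X, const Row& y, double alpha, int iterations) {
    const std::size_t m = X.size(), n = X.front().size();
    Row theta(n + 1, 0.0);
    for (int it = 0; it < iterations; ++it) {
        Row grad(n + 1, 0.0);
        for (std::size_t i = 0; i < m; ++i) {
            const double err = predict(theta, X[i]) - y[i];
            grad[0] += err / m;
            for (std::size_t j = 0; j < n; ++j) grad[j + 1] += err * X[i][j] / m;
        }
        for (std::size_t j = 0; j <= n; ++j) theta[j] -= alpha * grad[j];
    }
    return theta;
}
```

To use it, fit the scaler on the training rows, normalize them, call train, and then normalize any new row with the same scaler before passing it to predict.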

Training features currently used:

  • The bus latitude and longitude
  • Distance from the bus to the stop
  • Manhattan distance from the bus to the stop
  • The day of the week
  • Current hour and what quarter of the day it is
  • Direction in the route the bus is going
  • Which Septa destination is claimed
  • The block_id claimed
  • The current nearby stop_id
  • Number of stops until destination
  • Number of buses currently on the route
  • Weather information
  • Time until next scheduled stop
  • Traffic data (through the Google Directions API)
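A few of the features above can be derived directly from a position and a timestamp. The C++ sketch below shows one possible way to compute the two distance features and the quarter of the day. It is an illustration under assumptions, not the project's code: the function names are hypothetical, the straight-line distance is assumed to be a great-circle (haversine) distance in meters, and a "quarter of the day" is assumed to mean a 6-hour block.

```cpp
#include <cmath>

// Hypothetical helpers; the project's real feature code may differ.
constexpr double kEarthRadiusMeters = 6371000.0;
constexpr double kPi = 3.14159265358979323846;

double toRadians(double degrees) { return degrees * kPi / 180.0; }

// Great-circle (haversine) distance in meters between two lat/lng points.
double haversineMeters(double lat1, double lng1, double lat2, double lng2) {
    const double dLat = toRadians(lat2 - lat1);
    const double dLng = toRadians(lng2 - lng1);
    const double a = std::sin(dLat / 2) * std::sin(dLat / 2) +
                     std::cos(toRadians(lat1)) * std::cos(toRadians(lat2)) *
                         std::sin(dLng / 2) * std::sin(dLng / 2);
    return 2.0 * kEarthRadiusMeters * std::asin(std::sqrt(a));
}

// Manhattan distance in meters: the north-south leg plus the east-west leg,
// which roughly matches travel along a street grid.
double manhattanMeters(double lat1, double lng1, double lat2, double lng2) {
    return haversineMeters(lat1, lng1, lat2, lng1) +
           haversineMeters(lat2, lng1, lat2, lng2);
}

// Quarter of the day for an hour in 0..23 (assumed 6-hour blocks):
// 0 for 00:00-05:59, 1 for 06:00-11:59, 2 for 12:00-17:59, 3 for 18:00-23:59.
int quarterOfDay(int hour) { return hour / 6; }
```

Because the Manhattan distance adds two legs, it is never shorter than the straight-line distance, so the pair gives the model a rough sense of how indirect the remaining trip may be.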

The Source

The source is available at http://sourceforge.net/projects/septanextbus/. You can also clone the Mercurial-based repository with "hg clone http://hg.code.sf.net/p/septanextbus/code septanextbus-code". A good Windows / Mac / Linux Mercurial GUI can be found at http://tortoisehg.bitbucket.org/. The code was originally developed for Linux, so it may need minor tweaks to compile on other platforms. The C++ code depends on the libraries curl, jsoncpp, and mysqlclient.

Replicating the Project

  • First, install MySQL, create a database called septa_next_bus_db, and grant full permissions on it to the user snb_user with the password snb_password.
  • Second, compile the C++ code. Install the development libraries for curl, jsoncpp, and mysqlclient beforehand. On Linux, a makefile is provided, so you can just type make or make clean.
  • Next, install an HTTP server such as Apache with PHP, MySQL, and curl support, then put the PHP files on this server.
  • After this, begin collecting bus data to train the machine learning algorithm on. You will probably need at least a week's worth of data to begin making decent predictions. Collect the data with the C++ app by running "./next_bus -b". Also see "./next_bus --help".
  • Once your MySQL database has been populated with sufficient training data, train the model by running "./next_bus -c". After this, the PHP scripts should be able to read the results from MySQL to make predictions.

If you are not proficient with compiling C++ programs or with installing web servers and MySQL, it is recommended that you ask someone for help, as these can be hard to learn from scratch.

Some other notes:

  • Run "./next_bus -o" to keep precomputed predictions up to date in the database so the PHP scripts don't have to make them on their own.
  • A lot depends on Septa's "gtfs" data, which is frequently updated. You may want to search for the latest version and download it to the gtfs_public folder. Once that is done, run "./next_bus --reload-gtfs".