Technical Details
This project uses machine learning for its prediction approach. The main app is written in C++, the database is MySQL, and the front-end API is served through PHP.
Machine Learning Details
The model is a simple linear regression whose parameters are fit with fmincg, a conjugate-gradient minimizer. Features are normalized before training. Everything needed to use this approach is covered in the first four "weeks" of this introductory online course: https://www.coursera.org/learn/machine-learning.
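As an illustration of the feature normalization step, here is a minimal sketch of mean / standard-deviation scaling. The names Matrix, NormalizationStats, computeStats, and normalize are hypothetical and are not taken from the project source. The population form of the standard deviation is also an assumption; the project may scale differently.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container: one row per training example, one column per feature.
using Matrix = std::vector<std::vector<double>>;

struct NormalizationStats {
    std::vector<double> mean;    // per-feature mean
    std::vector<double> stddev;  // per-feature standard deviation
};

// Computes the per-column mean and (population) standard deviation.
NormalizationStats computeStats(const Matrix& X) {
    NormalizationStats s;
    if (X.empty()) return s;
    const std::size_t n = X[0].size();
    const double m = static_cast<double>(X.size());
    s.mean.assign(n, 0.0);
    s.stddev.assign(n, 0.0);
    for (const auto& row : X)
        for (std::size_t j = 0; j < n; ++j) s.mean[j] += row[j];
    for (std::size_t j = 0; j < n; ++j) s.mean[j] /= m;
    for (const auto& row : X)
        for (std::size_t j = 0; j < n; ++j) {
            const double d = row[j] - s.mean[j];
            s.stddev[j] += d * d;
        }
    for (std::size_t j = 0; j < n; ++j) {
        s.stddev[j] = std::sqrt(s.stddev[j] / m);
        // A constant column has zero spread; use 1.0 to avoid dividing by zero.
        if (s.stddev[j] == 0.0) s.stddev[j] = 1.0;
    }
    return s;
}

// Scales one feature vector with previously computed statistics.
std::vector<double> normalize(const std::vector<double>& x,
                              const NormalizationStats& s) {
    std::vector<double> out(x.size());
    for (std::size_t j = 0; j < x.size(); ++j)
        out[j] = (x[j] - s.mean[j]) / s.stddev[j];
    return out;
}
```

The statistics computed from the training set would need to be stored and reused when normalizing a new observation at prediction time, so that training and prediction see features on the same scale.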
Training features currently used:
The bus latitude and longitude
Distance from the bus to the stop
Manhattan distance from the bus to the stop
The day of the week
Current hour and what quarter of the day it is
Direction in the route the bus is going
Which SEPTA destination is claimed
The block_id claimed
The current nearby stop_id
Number of stops until destination
Number of buses currently on the route
Weather information
Time until next scheduled stop
Traffic data (through the Google Directions API)
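A few of the features above can be derived directly from coordinates and the clock. The sketch below shows one possible way to compute the two distance features and the quarter of the day. It assumes a haversine great-circle distance, a Manhattan distance built from a north-south leg plus an east-west leg, and quarters defined as six-hour blocks. The function names are hypothetical and the project may compute these values differently.

```cpp
#include <cmath>

constexpr double kEarthRadiusMeters = 6371000.0;  // mean Earth radius
constexpr double kPi = 3.14159265358979323846;

double toRadians(double degrees) { return degrees * kPi / 180.0; }

// Great-circle distance in meters between two lat/lng points (haversine).
double haversineMeters(double lat1, double lng1, double lat2, double lng2) {
    const double dLat = toRadians(lat2 - lat1);
    const double dLng = toRadians(lng2 - lng1);
    const double sLat = std::sin(dLat / 2.0);
    const double sLng = std::sin(dLng / 2.0);
    const double a = sLat * sLat +
                     std::cos(toRadians(lat1)) * std::cos(toRadians(lat2)) *
                         sLng * sLng;
    return 2.0 * kEarthRadiusMeters * std::asin(std::sqrt(a));
}

// Manhattan-style distance: travel north-south first, then east-west.
double manhattanMeters(double lat1, double lng1, double lat2, double lng2) {
    return haversineMeters(lat1, lng1, lat2, lng1) +
           haversineMeters(lat2, lng1, lat2, lng2);
}

// Quarter of the day for an hour in 0..23, assuming six-hour blocks:
// 0 = 00:00-05:59, 1 = 06:00-11:59, 2 = 12:00-17:59, 3 = 18:00-23:59.
int quarterOfDay(int hour) { return hour / 6; }
```

For example, manhattanMeters(busLat, busLng, stopLat, stopLng) is never smaller than haversineMeters for the same pair of points, which is why the two values can carry different information on a street grid.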
The Source
The source is available at http://sourceforge.net/projects/septanextbus/. You can also clone the Mercurial repository with "hg clone http://hg.code.sf.net/p/septanextbus/code septanextbus-code". A good Mercurial GUI for Windows, Mac, and Linux can be found at http://tortoisehg.bitbucket.org/. The code was originally developed on Linux, so it may need minor tweaks to compile on other platforms. The C++ code depends on the curl, jsoncpp, and mysqlclient libraries.
Replicating the Project
First, install MySQL, create a database called septa_next_bus_db, and grant full permissions on it to the user snb_user with the password snb_password.
Second, compile the C++ code. Before compiling, install the development libraries for curl, jsoncpp, and mysqlclient. On Linux, a makefile is provided, so you can just type make or make clean.
Next, install an HTTP server such as Apache with PHP, MySQL, and curl support, then put the PHP files on this server.
After this, begin collecting bus data to train the machine learning algorithm against. This is done with the C++ app by running "./next_bus -b" (also see "./next_bus --help"). You will probably need at least a week's worth of data before the predictions become decent.
Once your MySQL database has been populated with sufficient training data, train the model by running "./next_bus -c". After this, the PHP scripts should be able to read the results from MySQL to make predictions.
If you are not proficient with compiling C++ programs or installing web servers and MySQL, it is recommended that you ask someone for help, as these can be hard to learn from scratch.
Some other notes:
Run "./next_bus -o" to keep precomputed predictions up to date in the database so the PHP scripts don't have to make them on their own.
A lot depends on SEPTA's GTFS data, which is frequently updated. You may want to search for the latest version and download it to the gtfs_public folder. Once that is done, run "./next_bus --reload-gtfs".