Data Sets


So far, most of the data sets that I have are randomly generated. I would love to get more real data sets: a complete schedule for an airline or a city bus network would be great. To see the best reported tours, check the Speedy Tourist Page.

Railway Schedules

This data is from the Indian Bradshaw Railway guide (January 1996). My current effort has been to include the full set of routes between "major" cities in South India. A city was considered major if I had heard of it previously, or if it was a railway junction. The data was entered by hand, so it probably has a few bugs. The schedule is presented as a daily schedule, with the incorrect assumption that trains run every day. (A more accurate schedule would be a weekly schedule). Another minor nit is that I have not distinguished between different railway stations in the same city, so you are allowed to depart from Madras Central one minute after your arrival from Madras Egmore.

Data from real schedules has different properties from the artificial data. The vertex degree is generally small (except for a few junctions), but the number of parallel edges can be very high. There can be a strong correlation between arrival and departure times, so (for good connections), the waiting time in a city can be very small.

This data set is now complete, with the 75 city example covering essentially all the rail routes in southern India.

Random Instance Generators

Several different data sets have been generated. For some of the data sets, I have used real cities, and then randomly generated bus routes between pairs of cities. The source code for the generators is available and a sketch of the generation methods is given below. The graphs are checked for strong connectivity before being included in the official data set.

Data Sets

Each data set consists of a pair of files, a schedule file, and a city file. The schedule file has all of the information about the routes. The city file gives names to the cities, and display coordinates. (The city files aren't really necessary for the problem).

There are several families of data sets. Within each family, a common generation algorithm was used. Each problem instance specificies a starting time, and a starting city. (The times and cities are given as integers - the minutes since midnight, and the city number.)

The instances are specified: schedule_file, city_file, start_city, start_time