GTFS analytics for transit planners: what a schedule feed can tell you
The General Transit Feed Specification (GTFS) started as a way to put bus times into a trip planner. But a GTFS Schedule feed is a complete, structured description of an agency's planned service — and that makes it one of the richest analytical datasets a planner has, available for nearly every transit operator in the country at no cost. The trick is knowing what is actually in the feed and, just as importantly, what you can compute from it directly versus what you still need to source elsewhere.
What a GTFS Schedule feed actually contains
A GTFS feed is a ZIP archive of CSV-style text files, each describing one part of the service. The core files you will work with are:
- agency.txt — the operating agency or agencies, time zone, and URL.
- routes.txt — every route, its short and long name, and its mode (bus, light rail, ferry, and so on via route_type).
- trips.txt — each individual scheduled run of a route, tied to a direction, a service period, and optionally a shape.
- stop_times.txt — the heart of the feed: the arrival and departure time of every trip at every stop, in sequence. This is by far the largest file and the source of most analytics.
- stops.txt — the location (latitude/longitude), name, and hierarchy of each stop or station.
- calendar.txt and calendar_dates.txt — which days each service runs. calendar.txt defines weekly patterns with a start/end date; calendar_dates.txt layers on exceptions (added or removed dates, e.g. holidays).
- shapes.txt — the physical path a trip follows as an ordered list of points, used to draw the route and measure its length.
Optional files (frequencies, transfers, fares, pathways, feed_info) add detail, but the core files above are enough to support a deep analysis.
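As a rough illustration of how calendar.txt and calendar_dates.txt combine, the sketch below resolves which service_ids run on a given date. The column names follow the GTFS Schedule reference; the function names and the two-service sample data are made up for the example, and a real feed would be read from the files inside the ZIP rather than from inline strings.

```python
import csv
import io
from datetime import date, datetime

WEEKDAY_COLUMNS = ["monday", "tuesday", "wednesday", "thursday",
                   "friday", "saturday", "sunday"]


def parse_gtfs_date(value: str) -> date:
    """GTFS dates are YYYYMMDD strings."""
    return datetime.strptime(value, "%Y%m%d").date()


def active_service_ids(calendar_rows, calendar_date_rows, day: date) -> set:
    """Return the service_ids that run on `day` (hypothetical helper)."""
    active = set()
    weekday_col = WEEKDAY_COLUMNS[day.weekday()]  # Monday is 0

    # Step 1: weekly pattern from calendar.txt, limited to its date window.
    for row in calendar_rows:
        start = parse_gtfs_date(row["start_date"])
        end = parse_gtfs_date(row["end_date"])
        if start <= day <= end and row[weekday_col] == "1":
            active.add(row["service_id"])

    # Step 2: exceptions from calendar_dates.txt override the pattern.
    for row in calendar_date_rows:
        if parse_gtfs_date(row["date"]) != day:
            continue
        if row["exception_type"] == "1":    # service added on this date
            active.add(row["service_id"])
        elif row["exception_type"] == "2":  # service removed on this date
            active.discard(row["service_id"])
    return active


# Made-up sample data: weekday service replaced by Sunday service on July 4.
CALENDAR = """service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
WKD,1,1,1,1,1,0,0,20240101,20241231
SUN,0,0,0,0,0,0,1,20240101,20241231
"""
CALENDAR_DATES = """service_id,date,exception_type
WKD,20240704,2
SUN,20240704,1
"""
calendar_rows = list(csv.DictReader(io.StringIO(CALENDAR)))
calendar_date_rows = list(csv.DictReader(io.StringIO(CALENDAR_DATES)))

print(active_service_ids(calendar_rows, calendar_date_rows, date(2024, 7, 3)))
print(active_service_ids(calendar_rows, calendar_date_rows, date(2024, 7, 4)))
```

With this sample, an ordinary Wednesday resolves to the weekday service, while the Thursday holiday resolves to the Sunday service because the exception rows remove one and add the other.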
What you can derive — no extra data required
Because the feed is structured rather than a flat timetable, you can compute meaningful planning metrics straight from it.
Service supply and frequency
This is where GTFS earns its keep. By joining trips to calendar/calendar_dates, you can count trips per day on any route for any service day — a typical weekday, Saturday, or Sunday. The mechanics are straightforward: pick a representative stop on the route (often a high-ridership or terminal stop), pull every departure at that stop from stop_times for trips active on the chosen service day, and sort them in time order. The gaps between consecutive departures are the headways; the median or mean gap within a window is the headway you would publish.
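A minimal sketch of that stop-level headway calculation follows. It assumes the rows have already been loaded as dictionaries keyed by the standard GTFS column names and that the set of active service_ids for the chosen day is known; the helper names and sample rows are illustrative only. GTFS times can run past 24:00:00 for trips that continue after midnight, so the sketch converts times to seconds instead of using clock types.

```python
from statistics import median


def to_seconds(hms: str) -> int:
    """Convert a GTFS HH:MM:SS time (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def departures_at_stop(stop_times, trips, stop_id, route_id,
                       direction_id, service_ids):
    """Sorted departure times (seconds) at one stop for one route/direction."""
    wanted_trips = {
        t["trip_id"] for t in trips
        if t["route_id"] == route_id
        and t["direction_id"] == direction_id
        and t["service_id"] in service_ids
    }
    return sorted(
        to_seconds(st["departure_time"]) for st in stop_times
        if st["stop_id"] == stop_id and st["trip_id"] in wanted_trips
    )


def headways_minutes(departures):
    """Gaps between consecutive departures, in minutes."""
    return [(later - earlier) / 60
            for earlier, later in zip(departures, departures[1:])]


# Made-up sample rows: four weekday trips and one Saturday trip on route 10.
trips = [
    {"trip_id": "T1", "route_id": "10", "direction_id": "0", "service_id": "WKD"},
    {"trip_id": "T2", "route_id": "10", "direction_id": "0", "service_id": "WKD"},
    {"trip_id": "T3", "route_id": "10", "direction_id": "0", "service_id": "WKD"},
    {"trip_id": "T4", "route_id": "10", "direction_id": "0", "service_id": "WKD"},
    {"trip_id": "T5", "route_id": "10", "direction_id": "0", "service_id": "SAT"},
]
stop_times = [
    {"trip_id": "T1", "stop_id": "S1", "departure_time": "07:00:00"},
    {"trip_id": "T3", "stop_id": "S1", "departure_time": "07:24:00"},
    {"trip_id": "T2", "stop_id": "S1", "departure_time": "07:12:00"},
    {"trip_id": "T4", "stop_id": "S1", "departure_time": "07:40:00"},
    {"trip_id": "T5", "stop_id": "S1", "departure_time": "07:05:00"},
]

deps = departures_at_stop(stop_times, trips, "S1", "10", "0", {"WKD"})
gaps = headways_minutes(deps)
print(gaps, median(gaps))  # [12.0, 12.0, 16.0] 12.0
```

The Saturday trip is excluded by the service filter, which is why the 07:05 departure does not split the first gap.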
Bucket those departures into time periods — early AM, AM peak, midday, PM peak, evening, night — and you get headways by period, which is how planners actually describe service ("every 12 minutes peak, every 30 midday"). The first and last departures of the day give the span of service. Do this for each direction and each day type and you have a full picture of how much service each route supplies and when. Aggregate across the network and you can rank routes by frequency, flag routes that fall below a frequency-standard threshold, or total up daily revenue trips — all from the schedule alone.
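The period bucketing can be sketched as below. The period boundaries are assumptions chosen for the example (agencies define their own), and departures are taken to be seconds since midnight of the service day, with values past 24 hours allowed for after-midnight trips.

```python
from collections import defaultdict
from statistics import mean

HOUR = 3600

# Hypothetical period boundaries as (name, start, end) in seconds, end exclusive.
PERIODS = [
    ("early AM", 4 * HOUR, 6 * HOUR),
    ("AM peak", 6 * HOUR, 9 * HOUR),
    ("midday", 9 * HOUR, 15 * HOUR),
    ("PM peak", 15 * HOUR, 18 * HOUR),
    ("evening", 18 * HOUR, 22 * HOUR),
    ("night", 22 * HOUR, 28 * HOUR),
]


def period_of(seconds):
    """Name of the period containing a departure, or None if outside all."""
    for name, start, end in PERIODS:
        if start <= seconds < end:
            return name
    return None


def headway_by_period(departures):
    """Mean gap in minutes between consecutive departures within each period."""
    buckets = defaultdict(list)
    for dep in sorted(departures):
        name = period_of(dep)
        if name is not None:
            buckets[name].append(dep)

    result = {}
    for name, deps in buckets.items():
        gaps = [(b - a) / 60 for a, b in zip(deps, deps[1:])]
        if gaps:  # a period with a single departure has no headway
            result[name] = mean(gaps)
    return result


def span_of_service(departures):
    """First and last departure of the day, in seconds."""
    return min(departures), max(departures)


# Made-up departures: every 12 minutes in the AM peak, every 30 at midday.
sample = [h * HOUR + m * 60 for h, m in
          [(6, 0), (6, 12), (6, 24), (6, 36), (9, 0), (9, 30), (10, 0)]]

print(headway_by_period(sample))  # {'AM peak': 12.0, 'midday': 30.0}
print(span_of_service(sample))    # (21600, 36000)
```

One design choice worth noting: gaps are measured only between departures inside the same period, so the long gap across the 9:00 boundary in the sample does not inflate either period's headway.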
Network structure
The spatial files describe the network itself. Distances between consecutive stops give stop spacing, a lever for speed and access. Comparing a route's shapes.txt path length against the straight-line distance between its endpoints yields a directness (circuity) ratio. Stops served by multiple routes within a short walk reveal transfer opportunities, and buffering all stops shows the coverage footprint of the network. None of this requires data beyond the feed.
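Stop spacing and directness both reduce to distance sums over ordered points. The sketch below uses the haversine great-circle formula on (lat, lon) pairs. It assumes the stop points are already ordered by stop_sequence and the shape points by shape_pt_sequence, and it measures stop spacing as straight-line distance between consecutive stops, not distance along the shape. Function names and coordinates are illustrative.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def path_length_m(points):
    """Length of an ordered list of (lat, lon) points, in metres."""
    return sum(haversine_m(*a, *b) for a, b in zip(points, points[1:]))


def stop_spacings_m(stop_points):
    """Straight-line distance between each pair of consecutive stops."""
    return [haversine_m(*a, *b) for a, b in zip(stop_points, stop_points[1:])]


def directness_ratio(shape_points):
    """Path length divided by endpoint-to-endpoint distance (1.0 is direct)."""
    straight = haversine_m(*shape_points[0], *shape_points[-1])
    if straight == 0:
        # Loop routes start and end at the same point; the ratio is undefined.
        return None
    return path_length_m(shape_points) / straight


# Made-up L-shaped route: north about 1.1 km, then east about 0.85 km.
shape = [(40.00, -75.00), (40.01, -75.00), (40.01, -74.99)]

print([round(d) for d in stop_spacings_m(shape)])
print(round(directness_ratio(shape), 2))
```

A ratio near 1.0 indicates a route that runs almost straight between its endpoints; the L-shaped sample comes out around 1.4.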
Where GTFS stops
A schedule feed is exactly that — a schedule. It contains no ridership, no on-time performance, and no demographic or land-use context. To answer the questions planners are usually asked, pair it with complementary sources:
- NTD (National Transit Database) for ridership, revenue hours and cost, so you can relate supply to demand and productivity.
- GTFS-Realtime for delivered service — actual vehicle positions, arrival predictions, and the gap between planned and operated trips.
- Census/ACS and LODES for population, demographics, and jobs, which underpin equity and access analysis.
GTFS is the connective tissue: it geolocates and time-stamps the service that all of these other datasets describe from a different angle.
Practical gotchas
- Frequency-based vs. stop-time trips. Some feeds describe high-frequency service in frequencies.txt ("every 10 minutes from 6am to 9am") instead of enumerating every trip in stop_times. Your analysis has to expand those frequencies, or it will undercount service.
- Calendar exceptions. Never assume calendar.txt is the whole story. A holiday or a school-break service change lives in calendar_dates.txt, and ignoring it can put you on the wrong service day entirely.
- Feed freshness and versioning. Agencies publish new feeds as service changes; an old feed describes service that no longer runs. Check feed_info.txt dates and the active calendar window before trusting any number.
- Validate first. Garbage in, garbage out. Run the feed through validation before computing anything: a malformed stop_times file or a broken shape will quietly corrupt your headways and distances. See our guide to feed validation and the live data-coverage view for how we track feed health nationwide.
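To make the first gotcha concrete, here is a minimal sketch of frequency expansion. It assumes rows from frequencies.txt with the standard trip_id, start_time, end_time, and headway_secs columns, and it treats end_time as exclusive, so a run starts at start_time and then every headway_secs until the window closes. The helper names and the sample row are made up for illustration.

```python
def to_seconds(hms: str) -> int:
    """Convert a GTFS HH:MM:SS time (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def expand_frequencies(frequency_rows):
    """Yield (trip_id, first-stop departure in seconds) for each implied run."""
    for row in frequency_rows:
        start = to_seconds(row["start_time"])
        end = to_seconds(row["end_time"])
        headway = int(row["headway_secs"])
        if headway <= 0:
            raise ValueError(f"invalid headway_secs for trip {row['trip_id']}")
        # One run at start, then one every `headway` seconds before `end`.
        for departure in range(start, end, headway):
            yield row["trip_id"], departure


# Made-up row: "every 10 minutes from 6am to 9am".
frequency_rows = [
    {"trip_id": "F1", "start_time": "06:00:00",
     "end_time": "09:00:00", "headway_secs": "600"},
]

runs = list(expand_frequencies(frequency_rows))
print(len(runs), runs[0], runs[-1])  # 18 ('F1', 21600) ('F1', 31800)
```

The single template trip in this sample stands for 18 runs, which is the size of the undercount if the file is ignored. To get times at later stops, each expanded run would reuse the template trip's stop_times offsets relative to its first stop.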
How HeadwayForge operationalizes this
Doing this once for one agency is a weekend of scripting. Doing it consistently, for any agency, on demand, is a platform problem. HeadwayForge normalizes roughly 1,150 US GTFS feeds — handling frequency expansion, calendar exceptions, and validation — and turns them into ready-made service-supply and network-structure views: trips per day, headways by period and day type, span, stop spacing, and route directness, without you writing a line of code. From there it joins NTD, Census/LODES, and GTFS-Realtime so the schedule sits in context. See the product overview for how the pieces fit, or jump to service planning to see these metrics applied to a real workflow.