NTD peer benchmarking: how to choose peers and read the numbers

"How are we doing compared to similar agencies?" is one of the most common questions a transit board, a city council, or a grant reviewer will ask — and one of the easiest to answer badly. The data exists: the FTA's National Transit Database (NTD) collects standardized operating and financial statistics from agencies across the country every year. The hard part is using it honestly: picking a peer group that actually resembles your agency, choosing metrics that mean something, and reading the result without drawing a conclusion the data can't support. This guide walks through all three.

What the NTD is

The National Transit Database is the FTA's repository of financial and operating data reported by recipients of federal transit funding. Agencies submit detailed annual reports, and many submit monthly data as well, covering service supplied, service consumed, costs, assets, and safety. Because every agency reports against the same set of definitions, the NTD is the closest thing the U.S. transit industry has to apples-to-apples national statistics. It is the backbone of most peer comparisons, board dashboards, and the quantitative sections of grant narratives.

The headline metrics

A handful of measures do most of the work. They cover how much service was used, how much was supplied, what it cost to provide, and the fleet required to run it.

Unlinked Passenger Trips (UPT) — boardings; the standard ridership count. A rider who transfers once is counted twice.
Passenger Miles Traveled (PMT) — the cumulative distance riders travel, which captures trip length in a way UPT does not.
Vehicle Revenue Miles (VRM) and Vehicle Revenue Hours (VRH) — the miles and hours vehicles operate while in revenue service (i.e., available to carry passengers, excluding deadhead). These are the core "supply" measures.
Operating expense — the cost of running service, reported by mode.
Fleet size / VOMS — the count of vehicles, including Vehicles Operated in Maximum Service, the peak-pullout number that describes how much equipment the system actually puts on the street.

Raw volume vs. derived ratios

The single biggest mistake in benchmarking is comparing raw volume directly. Of course a large metropolitan operator carries more trips than a mid-size bus system — that tells you the city is bigger, not that the agency is more effective. The useful comparisons come from derived ratios that normalize for scale:

Trips per revenue hour (UPT ÷ VRH) — a productivity measure: how many boardings each hour of service generates.
Cost per trip (operating expense ÷ UPT) — what it costs to deliver one boarding.
Cost per passenger mile (operating expense ÷ PMT) — cost normalized by distance carried, which is fairer across systems with very different average trip lengths.
Average trip length (PMT ÷ UPT) — short hops vs. long commutes, which reshapes how every other ratio should be read.

Volume tells you how big a system is; ratios tell you how it performs. A board context slide needs the volume to set the scale, but the conversation about whether service is productive or efficient lives entirely in the ratios.
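The four ratios above are simple divisions, and a minimal sketch makes the arithmetic explicit. The class name, field names, and figures below are illustrative assumptions, not NTD column names or real agency data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnualTotals:
    """One agency-mode-year of totals. Field names are hypothetical."""
    upt: float                # Unlinked Passenger Trips (boardings)
    pmt: float                # Passenger Miles Traveled
    vrh: float                # Vehicle Revenue Hours
    operating_expense: float  # dollars


def derived_ratios(t: AnnualTotals) -> dict[str, float]:
    """Normalize raw volumes into the four ratios described in the text."""
    return {
        "trips_per_revenue_hour": t.upt / t.vrh,
        "cost_per_trip": t.operating_expense / t.upt,
        "cost_per_passenger_mile": t.operating_expense / t.pmt,
        "avg_trip_length_miles": t.pmt / t.upt,
    }


# Made-up totals for a hypothetical mid-size bus system.
example = AnnualTotals(
    upt=2_000_000,
    pmt=8_000_000,
    vrh=100_000,
    operating_expense=12_000_000,
)
ratios = derived_ratios(example)
print(ratios)
```

With these invented totals the system generates 20 boardings per revenue hour at $6.00 per trip and $1.50 per passenger mile, with a 4-mile average trip. None of those figures depend on how large the system is, which is the point of normalizing.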

The crux: choosing a valid peer set

A benchmark is only as good as the peer group behind it. Bad peer selection is the fastest way to produce a misleading chart. The goal is a set of agencies that are genuinely comparable on the dimensions that drive cost and ridership:

Urbanized area (UZA) size. Service operates in a context. An agency in a million-person urbanized area faces different density, congestion, and demand than one in a 150,000-person area. Matching UZA population is usually the first filter.
Primary mode. Compare bus to bus, light rail to light rail. The economics of a heavy-rail metro and a fixed-route bus network are not the same currency.
Service type and region. Demand-response, commuter, and all-day local service have different cost structures; regional labor markets and operating environments matter too.

The classic failure: benchmarking a small bus system against a heavy-rail metro. The metro will look "cheaper per passenger mile" and "more productive per revenue hour" because rail moves dense corridors at scale — but the comparison says nothing about how the bus system is run. Mismatched peers don't just add noise; they point you at the wrong conclusion.
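As a rough sketch, the first two filters (primary mode and UZA population) can be expressed as a simple selection rule. The record layout, the mode labels, and the plus-or-minus 50% population band are assumptions chosen for illustration. The right band width is a judgment call, and service type and region would need additional fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agency:
    name: str
    mode: str            # illustrative labels: "bus", "light_rail", "heavy_rail"
    uza_population: int  # population of the urbanized area served


def select_peers(target, candidates, pop_band=0.5):
    """Keep same-mode candidates whose UZA population is within
    +/- pop_band (as a fraction) of the target's UZA population."""
    low = target.uza_population * (1 - pop_band)
    high = target.uza_population * (1 + pop_band)
    return [
        c for c in candidates
        if c.name != target.name           # an agency is not its own peer
        and c.mode == target.mode          # compare bus to bus, rail to rail
        and low <= c.uza_population <= high
    ]


# Hypothetical agencies; names and populations are made up.
target = Agency("Target", "bus", 200_000)
candidates = [
    target,
    Agency("A", "bus", 150_000),
    Agency("B", "bus", 280_000),
    Agency("C", "heavy_rail", 250_000),  # right size, wrong mode
    Agency("D", "bus", 1_200_000),       # right mode, wrong size
]
peers = select_peers(target, candidates)
print([p.name for p in peers])
```

Agencies C and D are each a plausible-looking peer on one dimension and are excluded on the other, which is the kind of mismatch the classic failure describes.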

Reading the numbers: percentiles, medians, and time

Once you have a defensible peer set, read it carefully. A peer comparison is usually expressed two ways: against the median (the middle of the peer set — a single reference point), and as a percentile (where your agency sits within the full distribution). Percentiles are more informative because they show whether you are near the middle of a tight cluster or out at the tail. Being "below the median on cost per trip" sounds clean, but if the peer set is widely spread, the median alone hides how far above or below you really are.
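A small sketch shows how the two readings differ. The peer values are made up, and the percentile convention used here (share of peers at or below the agency's value) is only one of several in common use, so treat it as an assumption rather than a standard.

```python
from statistics import median


def percentile_rank(value, peer_values):
    """Percentage of peers (0 to 100) whose value is at or below `value`."""
    at_or_below = sum(1 for v in peer_values if v <= value)
    return 100.0 * at_or_below / len(peer_values)


# Hypothetical cost per trip (dollars) for a widely spread peer set.
peer_cost_per_trip = [4.0, 5.0, 5.5, 6.0, 9.0, 12.0, 15.0, 20.0]
agency_cost_per_trip = 5.8

peer_median = median(peer_cost_per_trip)
rank = percentile_rank(agency_cost_per_trip, peer_cost_per_trip)

print(f"peer median: {peer_median}")
print(f"agency percentile: {rank}")
```

Here the agency is below the 7.50 median on cost per trip, but the percentile (37.5) shows it sits in the lower-middle of a wide distribution, not at the low-cost extreme. For cost metrics a lower percentile is better; for productivity metrics the reading is reversed.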

Two cautions matter most. First, do not over-interpret a single year. One reporting year can be distorted by a service disruption, a one-time cost, a reporting change, or, as recent years have shown industry-wide, a demand shock. Trends across several years are far more trustworthy than a single snapshot. Second, remember that even good peers differ; a number far from the median is a prompt to ask why, not a verdict on its own.
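To illustrate the single-year caution, the sketch below compares an agency's trips per revenue hour with the peer median across several years. The year labels and all values are invented, and dividing by the peer median is just one simple way to express relative position.

```python
from statistics import mean


def relative_to_peer_median(agency_by_year, peer_median_by_year):
    """Agency value divided by the peer median, for years present in both series."""
    years = sorted(agency_by_year.keys() & peer_median_by_year.keys())
    return {y: agency_by_year[y] / peer_median_by_year[y] for y in years}


# Hypothetical trips per revenue hour; FY2 stands in for a disrupted year.
agency = {"FY1": 20.0, "FY2": 9.0, "FY3": 12.0, "FY4": 16.0}
peer_median = {"FY1": 20.0, "FY2": 12.0, "FY3": 12.0, "FY4": 16.0}

relative = relative_to_peer_median(agency, peer_median)
multi_year_average = mean(relative.values())

print(relative)
print(multi_year_average)
```

Read alone, FY2 says the agency ran 25% below its peers. Across all four years it matches the median in every other year and averages about 94% of it, which suggests a one-year distortion and not a structural productivity gap.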

Where benchmarking earns its keep

Leadership context. Boards and executives need to know where the agency stands among comparable systems before they weigh trade-offs.
Identifying productivity gaps. A low trips-per-revenue-hour figure relative to peers is a starting point for a service-design conversation — which routes, which spans, which markets.
Grant narratives. Competitive applications are stronger when they situate the agency against a credible peer set rather than asserting need in a vacuum.

How HeadwayForge supports this work

HeadwayForge benchmarks an agency against an NTD peer set assembled on the right basis — same urbanized area and mode — rather than an arbitrary list. It compares the agency on the metrics that matter: UPT, passenger miles, VRM and VRH, fleet, operating expense, cost per trip, and a productivity percentile so you can see where the agency sits in the distribution, not just against a median. The full comparison exports as CSV so it can drop straight into a board deck or a grant narrative. To see how this fits the funding workflow, read the grants & capital use case; for the broader analytics platform behind it, see the product overview; and browse sample outputs to see how a peer comparison is laid out. The selection criteria and the metrics stay visible, so the comparison is one your team can defend when someone asks how the peers were chosen.
