How To Analyze Price Transparency Data

Healthcare is undergoing a fundamental shift toward transparency. Under federal regulations, hospitals and insurers are required to publish detailed machine-readable files (MRFs) of negotiated rates for thousands of procedures, services, and providers.

On paper, this data promises a revolution in how patients, providers, payers, and self-funded employers understand healthcare costs. In practice, it’s overwhelming: terabytes of messy JSON files that are nearly impossible to analyze with traditional spreadsheets or databases.

So how do you make sense of it?

First it's important to understand what's in these files (and perhaps more importantly, what isn't).

What’s Inside Payer / Transparency in Coverage (TiC) Files

Under the Transparency in Coverage (TiC) rule, applicable to commercial insurers and group health plans, the required files typically include:

  1. In‑network negotiated rates for all covered services and providers
  2. Out-of-network allowed amounts and billed charges
  3. For prescription drugs: negotiated rates and historical net prices

Additionally, files typically include:

  • Reporting entity information (legal name, type)
  • Plan identifiers like EIN or HIOS ID, market type, network type, and network name
  • Provider details (NPIs and TINs)
  • Billing codes with types (CPT, HCPCS, DRG, NDC) linked to allowed amounts

These payer MRFs are updated monthly and aim to give consumers, regulators, and analysts insight into pricing across a wide range of services. The raw files are massive and often lack standardized formatting, making them difficult to work with out of the box.
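To make the nesting concrete, here is a minimal sketch of flattening one simplified in-network record into tabular rows. The field names follow the CMS TiC schema (`billing_code`, `negotiated_rates`, `provider_groups`, `negotiated_prices`); the values are invented, and real files nest thousands of these records under a top-level `in_network` array.

```python
import json

# One simplified in-network record, shaped like the CMS TiC schema.
# All values are illustrative.
record_json = """
{
  "billing_code": "99213",
  "billing_code_type": "CPT",
  "negotiated_rates": [
    {
      "provider_groups": [
        {"npi": [1234567890], "tin": {"type": "ein", "value": "12-3456789"}}
      ],
      "negotiated_prices": [
        {"negotiated_type": "negotiated", "negotiated_rate": 84.50,
         "billing_class": "professional", "expiration_date": "9999-12-31"}
      ]
    }
  ]
}
"""

record = json.loads(record_json)

# Flatten to (code, npi, rate) rows for tabular analysis.
rows = []
for group in record["negotiated_rates"]:
    npis = [npi for pg in group["provider_groups"] for npi in pg["npi"]]
    for price in group["negotiated_prices"]:
        for npi in npis:
            rows.append((record["billing_code"], npi, price["negotiated_rate"]))

print(rows)  # [('99213', 1234567890, 84.5)]
```

At scale you would apply the same flattening to a streaming JSON parser rather than loading whole files into memory.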

One of the biggest pitfalls in working with transparency files is the presence of zombie rates, which are rates tied to inactive contracts, placeholder values, or outdated plan designs that still appear in the files. These ghost entries inflate the dataset and can skew benchmarking if not carefully identified and filtered out.
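A simple filter catches many zombie rates before they reach a benchmark. The sketch below assumes flattened rows with an expiration date and uses two hypothetical heuristics: drop rates whose contract has expired, and drop obvious placeholder values below a dollar floor. The column layout, cutoff date, and floor are all assumptions to adapt to your data.

```python
from datetime import date

# Hypothetical flattened rate rows: (code, payer, rate, expiration_date).
rates = [
    ("99213", "PlanA", 84.50, "9999-12-31"),   # evergreen contract
    ("99213", "PlanB", 0.01,  "9999-12-31"),   # placeholder value
    ("99213", "PlanC", 92.00, "2022-01-01"),   # expired ("zombie") contract
]

def is_zombie(rate, expiration, today=date(2025, 1, 1), floor=1.00):
    """Flag rates tied to expired contracts or obvious placeholder values."""
    expired = date.fromisoformat(expiration) < today
    placeholder = rate < floor
    return expired or placeholder

clean = [r for r in rates if not is_zombie(r[2], r[3])]
print(clean)  # only the PlanA row survives
```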

What’s Inside Hospital Machine-Readable Files (MRFs)

Under CMS rules, hospital MRFs must include five categories of standard charges for each item or service:

  • Gross charges (chargemaster prices)
  • Discounted cash prices (for self‑pay patients)
  • Payer‑specific negotiated rates
  • De‑identified minimum negotiated rates
  • De‑identified maximum negotiated rates

As of recent regulatory updates (Aug 2025), additional data elements are now required:

  • Estimated allowed amount (an average received dollar amount when contracts are algorithmic or percent‑based)
  • Drug unit and unit of measurement (for detailed prescription pricing)
  • Modifiers that could change the standard charge—for example, for different billing scenarios

Hospital MRFs must also include:

  • Payer and plan names, or broad categories like "all PPO plans", linked to payer‑specific negotiated charges
  • A footer link and .txt manifest with hosting/site metadata and a contact for questions
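Hospital MRFs published as CSV flatten roughly to one row per service, payer, and plan. A stdlib sketch of reading negotiated rates out of a simplified file (the column names here are assumptions; real files use the CMS-defined layout, which also carries hospital metadata in leading rows):

```python
import csv, io

# Simplified hospital MRF rows with illustrative values; real CMS CSV
# files use CMS-defined column names and include header metadata rows.
mrf_csv = """description,code,code_type,gross_charge,cash_price,payer_name,negotiated_rate
MRI brain w/o contrast,70551,CPT,3200.00,1100.00,ExamplePayer PPO,950.00
MRI brain w/o contrast,70551,CPT,3200.00,1100.00,ExamplePayer HMO,875.00
"""

rates = {}
for row in csv.DictReader(io.StringIO(mrf_csv)):
    rates[row["payer_name"]] = float(row["negotiated_rate"])

# The spread between chargemaster prices and negotiated rates is often large.
print(rates)  # {'ExamplePayer PPO': 950.0, 'ExamplePayer HMO': 875.0}
```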

Key challenges with hospital data include format variance, inconsistent schemas, and frequent mismatches between hospital and payer data. In our opinion, hospital data, while useful at times, tends to be less reliable than payer data as a whole.

Analysis: The Intelligence Cycle

At Gigasheet we're big proponents of the proven Intelligence Cycle methodology used by national security intelligence agencies and experts around the world. The same process works well for gaining intelligence in any domain, and healthcare markets are no exception.

[Intelligence Cycle diagram. Source: recordedfuture.com]

Below is an abbreviated explanation of the core elements of the cycle as they apply to price transparency data.

Collection

The first step is collecting the hospital and payer machine-readable files. Hospitals are required to publish files that include gross charges, discounted cash prices, and negotiated rates. Payers publish in-network negotiated rates, out-of-network allowed amounts, and drug pricing data. These files are updated regularly and can be massive, often hundreds of gigabytes. Most are delivered in nested JSON formats that require parsing before they can be queried. File structures vary by payer, which adds further complexity when trying to build a unified dataset.

Cleaning and Normalizing

Once collected, the raw files need significant cleaning. Provider names, NPIs, and payer identifiers are often inconsistent or duplicated. Zombie rates tied to inactive or placeholder contracts add noise and can distort analysis. Cleaning also involves standardizing billing codes and normalizing file formats so that services align correctly across sources. Without this step, comparisons between providers or across payers are unreliable.
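Two common cleaning steps can be done with the standard library alone: validating NPIs (the NPI standard uses a Luhn check over the number with an "80840" prefix) and normalizing provider name strings so duplicates match. A sketch:

```python
import re

def luhn_ok(digits: str) -> bool:
    """Standard Luhn checksum over a digit string."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:        # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def valid_npi(npi: str) -> bool:
    """NPIs are 10 digits, Luhn-validated with an '80840' prefix."""
    return bool(re.fullmatch(r"\d{10}", npi)) and luhn_ok("80840" + npi)

def normalize_name(name: str) -> str:
    """Collapse whitespace and casing so duplicate providers match."""
    return re.sub(r"\s+", " ", name).strip().upper()

print(valid_npi("1234567893"))               # True (a commonly used test NPI)
print(valid_npi("1234567890"))               # False (bad check digit)
print(normalize_name(" Smith   Clinic "))    # SMITH CLINIC
```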

Enriching

Transparency data becomes much more valuable when enriched with external context. Medicare reimbursement benchmarks, provider quality ratings, geographic crosswalks, and network attributes all add critical perspective. Enrichment transforms raw pricing data into a framework where costs can be tied to quality, outcomes, and geography. It also enables analysts to connect the dots between price and value rather than just producing static rate comparisons.
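A common enrichment is expressing negotiated rates as a percentage of Medicare, which puts payers and markets on a comparable scale. A minimal sketch with invented numbers (the benchmark would come from a locality-adjusted Medicare fee schedule):

```python
# Hypothetical negotiated rates and a Medicare benchmark (illustrative numbers).
negotiated = {("99213", "PlanA"): 84.50, ("99213", "PlanB"): 123.00}
medicare_benchmark = {"99213": 68.00}  # assumed locality-adjusted Medicare rate

# Express each negotiated rate as a percentage of Medicare.
pct_of_medicare = {
    (code, payer): round(rate / medicare_benchmark[code] * 100, 1)
    for (code, payer), rate in negotiated.items()
}
print(pct_of_medicare)  # {('99213', 'PlanA'): 124.3, ('99213', 'PlanB'): 180.9}
```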

Analysis

The final step is turning the prepared data into actionable insights. Effective analysis depends on the objective. For example, comparing rates for outpatient versus inpatient services requires factoring in the place of service. Removing statistical outliers can prevent extreme but rare values from skewing results. Recognizing that a single provider may work across multiple organizations with different contracted rates is also critical. Analysts may also want to evaluate contract terms at the plan level rather than lumping all rates under one payer.
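The outlier-trimming step above can be sketched with a standard interquartile-range filter, one of several reasonable approaches, shown here on invented rates for a single code:

```python
import statistics

# Hypothetical negotiated rates for one CPT code across providers.
rates = [92.0, 88.5, 95.0, 90.0, 940.0, 89.0, 91.5]  # 940.0 is a likely data error

def trim_iqr_outliers(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]

trimmed = trim_iqr_outliers(rates)
print(sorted(trimmed))                        # [88.5, 89.0, 90.0, 91.5, 92.0, 95.0]
print(round(statistics.median(trimmed), 2))   # 90.75
```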

AI is rapidly lowering the barriers to this type of work. Machine learning models can flag zombie rates and outliers automatically, cluster providers with similar pricing patterns, and even suggest benchmarks by payer and geography. Natural language interfaces, like those we use at Gigasheet, are making it easier for business teams to query datasets without writing code or learning complex interfaces. Instead of only expert data engineers being able to navigate terabytes of nested JSON, AI-assisted tools now allow anyone to quickly surface the insights that matter for negotiations, network design, and cost management.

Platforms like Gigasheet combine this analytical flexibility with AI-powered parsing and cleaning at big-data scale, making it possible to work with massive, messy files and still deliver clean, trustworthy insights.
