BulkUP - the Bulk Data User Partnership

The SMART team previously surveyed bulk FHIR performance across five sites and three software implementations. The results showed that performance varied significantly across implementations. In practice, slow bulk exports can make it impractical to investigate your population-scale data, despite bulk FHIR’s promise of enabling exactly that.

But implementations are always changing, and there are certainly more of them to test. That’s where you come in - measure the performance and data quality of your bulk FHIR interface using the open source CumulusQ tools and share your results with the community!


How to submit your logs to SMART

Please use this form to request that your bulk FHIR API experience be shared with the community. Feel free to describe both interoperability issues and success stories. Someone from the community will reach out to the email provided to verify your submission and obtain more information. Verified issues and stories will be listed publicly and can be updated based on vendor engagement.

Submit Your Logs


CumulusQ Performance

CumulusQ Performance is an effort to gather bulk data performance logs from as many different sites and vendors as possible. This will help us understand the challenges and limitations of the current crop of bulk export implementations, in a variety of environments.

SMART bulk export clients write logs in a standard format that captures details about the timing of the export and how many resources were exported. (The logs hold no sensitive or personal data.)

How to generate export logs

All you have to do is use one of the SMART bulk data tools (read more about them on the Bulk Data Sandboxes page). When the export finishes, a log.ndjson file (or several files with names like log*.ndjson) should be sitting in the data download folder, next to your other NDJSON files, as in the example below.
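
For example, a finished export folder might look something like this (the resource file names are only an illustration and will vary by server and client):

# List the contents of the download folder after an export
ls ./downloads
# Condition.000.ndjson
# Observation.000.ndjson
# Patient.000.ndjson
# log.ndjson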

CumulusQ USCDI

CumulusQ USCDI is a collection of data metrics that make it easier to surface interesting trends in your FHIR data, from conformance issues to coverage of standardized codes to demographic information.

SMART is interested in collecting data metrics from as many different sites as possible. This will help us understand the FHIR conformance challenges experienced by different vendors, how widespread certain standardized code systems are, and the general shape of FHIR data in the wild.

How to generate the data metrics

These metrics are designed as a Cumulus Library study. They can operate on small amounts of local data, but if you set up the full Cumulus pipeline in AWS, you can run metrics for your entire population. For now, we’ll focus on the local data approach.

Assuming you have some exported bulk NDJSON sitting in the folder ./downloads:

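# Install the Cumulus Library CLI and add the data metrics study to it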
pipx install cumulus-library
pipx inject cumulus-library cumulus-library-data-metrics

# Compile the metrics
cumulus-library build \
  --db-type duckdb \
  --database metrics.db \
  --target data_metrics \
  --load-ndjson-dir ./downloads

# Export summarized metric reports
cumulus-library export \
  --db-type duckdb \
  --database metrics.db \
  --target data_metrics \
  ./reports

This will create a local metrics.db file that holds all the metrics calculated from your data, plus a reports/ directory with summaries of the results in both CSV and Parquet formats.
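
If you want to poke around the underlying tables before opening the reports, one option is the DuckDB command line client (assumed here to be installed separately; the exact table names depend on the data metrics package version):

# Optional: list the metric tables stored in the DuckDB database
duckdb metrics.db "SHOW TABLES;"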