Exploring the fragility index and learning about clinical trials

By: Bryan Bunning (bjb2178) | Yuanzhi Yu (yy3019) | Zongchao Liu (zl2860) | Gavin Ko (wk2343) | Kevin S.W. (ksw2137)

Introduction

Fragility index (FI) measures the “fragility” of a clinical trial by iteratively switching the outcomes of a single trial arm and evaluating whether or not the trial becomes non-signifcant. It therefore measures the number of patient outcomes required for a statistical test to change. FI has been suggested to be an easy-to-understand metric of a clinical trial’s robustness, which may pair well with other frequently discussed metrics like p-values and we are interested in how FI may differ based on different clinical trial characteristics. This project investigates the fragility index using real world data in addition to baseline information about clinical trials. More details regarding our motivation can be found here.

Dataset

Obtaining our dataset is the hardest portion of this project. After various trial and error, we decided to use clinicaltrials.gov as our data source. This is a website provided by U.S. National Library of Medicine that stores information on studies being conducted around the world.

At first, we set the main target to autoimmune disorders related clinical trials. However, the resulting dataset wasn’t big enough for us to conduct further analysis. Therefore, we broadened our criteria to US-based phase III clinical trials.

This project ended up utilizing two manually created datasets:

A dataset created through html scraping clinicaltrials.gov which gave us trials eligible to calculate the fragility index.
A separate and distinct dataset created by selecting information by XML to explore geographical and temporal trends in clinical trials from all available clinical trial data

Findings

A quick summary of our findings include:

Data scraping of clinicaltrials.gov is difficult
Out of 10,000 trials prior to 2017, only 39 met our criteria for calculating the fragility index.
California has by far the most number of clinical trials out of available data

For detailed results, analysis and output, please refer to our report.

Fun Facts about the project

This project utilized 5 custom functions made by members of the team.
The XML for all publicly available clinical trial data is over 800 MB large.
To date, there have been 158 clinical trials based in Hawaii, and only 10 in Wyoming.
Alabama has the 2nd most industry-backed clinical trials in the US.