Study merges clinical research investigator data from three government databases

Complete Picture

The three primary U.S. government sources of data about clinical research investigators -- National Library of Medicine, FDA Bioresearch Monitoring Information System (BMIS) and CMS Open Payments (Sunshine Act) -- provide incomplete and inconsistent data not linked to data in the other databases. While the FDA is aware of the problems, researchers have used the databases to understand investigator demographics and activities.

Two researchers, Ronald Ranauro and Romiya Barry, have conducted a study to determine the “extent it is possible to combine data from all three databases into a single record for each investigator.” In the study, they “matched and merged data across the three databases to form a more complete picture of investigators and their activities.” To synchronize the analysis, they limited the data to certain parameters, as reported in the Journal of Clinical Research Best Practices.

They determined that in the three databases, there were 65,890 investigators in 2017, not counting overlaps. They also found that investigators overlap incompletely across the three databases. In the process of merging and matching data from the three databases, they identified 50,414 “apparently unique investigators.” Only 7,936 (15.7%) investigators could be found in all three databases for 2017, 12,564 (24.9%) could be found in two databases, and 29,940 (60.0%) could be found in only one database.
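The overlap breakdown above amounts to counting, for each matched investigator, how many sources contain a record. A minimal sketch of that tally follows, using made-up toy keys rather than the study's data. The set names and the `overlap_counts` helper are hypothetical, and the sketch assumes records have already been reduced to one comparable key per investigator, which the article does not describe in detail.

```python
from collections import Counter

# Hypothetical toy data: each set holds already-matched investigator keys
# found in one source. These are NOT figures or records from the study.
nlm = {"smith_j", "lee_k", "patel_r"}
bmis = {"smith_j", "lee_k", "garcia_m"}
open_payments = {"smith_j", "nguyen_t"}


def overlap_counts(*sources):
    """Return a Counter mapping number-of-sources -> number of investigators."""
    appearances = Counter()
    for source in sources:
        # Each source is a set, so a key is counted at most once per source.
        appearances.update(source)
    return Counter(appearances.values())


counts = overlap_counts(nlm, bmis, open_payments)
total = sum(counts.values())  # number of unique investigators

for n in sorted(counts, reverse=True):
    print(f"found in {n} source(s): {counts[n]} ({counts[n] / total:.1%})")
```

With real data, the hard part is producing the matched keys in the first place; the tally itself is straightforward once that is done.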

According to the researchers, not all investigators can be matched with clinical trial identifiers. Among the merged investigator records, some carry sponsor, payment and therapeutic specialty data; 38,383 (76.1%) can be matched with one or more clinical trials; and 5,363 (10.6%) have only address and date of FDA filing data.

Of the investigators with 2017 trials, 29% first appear in that year. Some of the investigators who first appeared in 2017 then became very active. The researchers showed that 15,119 (29.4%) of the investigators with one or more attached 2017 interventional trial identifiers appear in either the National Library of Medicine database or Sunshine for the first time in 2017. The average investigator with a presence in 2017 has participated in a mean of 4.2 trials and a median of 2.0 trials of any type.

For this analysis, the researchers included trials of all types with start dates as far back as 2000 and as far forward as 2018. Nearly 80% of trials of all types counted have start dates between 2008 and 2018. Separately, they analyzed the 5,363 BMIS investigator records without a match to the two other databases and found 1,674 (31%) that appear for the first time in BMIS in 2017.

The researchers concluded that, given the present limitations of the three databases taken individually, it is impossible to accurately count the number of investigators active in a given year, much less to determine with any accuracy attributes like their therapeutic specialties. Nonetheless, by matching and merging records across the three databases, they could assemble a more comprehensive record for many investigators. Still, there are gaps. Not only can multiple investigators share the same name, but investigator records sometimes use different versions of an investigator’s name. For investigator records lacking clinical trial identifiers, there is no way to determine the details of the investigators’ research activity. With coordination among the data sources, additional uses for the data emerge, such as accurately estimating longitudinal trends. The researchers caution that such analyses, too, are prone to distortion due to exogenous developments like changes in international regulatory policies.
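The name problem described above can be illustrated with a small sketch. It is not the researchers' matching method, which the article does not spell out; the `name_key` function, the `CREDENTIALS` set and the sample names are all hypothetical. It reduces each name to a crude "surname|first initial" key, which groups different spellings of one person's name but also shows how distinct people can collide on the same key.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical list of tokens to ignore; a real list would need care,
# since short credentials can also be legitimate surnames.
CREDENTIALS = {"md", "phd", "dr", "jr", "sr"}


def name_key(raw):
    """Reduce a free-text name to a lossy 'surname|first-initial' key."""
    # Strip accents (e.g. "José" -> "Jose") and lowercase.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    if "," in text:
        # Treat "Last, First ..." as "First ... Last".
        last, _, rest = text.partition(",")
        text = f"{rest} {last}"
    tokens = [t for t in re.findall(r"[a-z]+", text) if t not in CREDENTIALS]
    if not tokens:
        return ""
    return f"{tokens[-1]}|{tokens[0][0]}"


# Made-up sample names, not records from any of the three databases.
records = ["Smith, John, MD", "Dr. John A. Smith", "J. Smith",
           "Jane Smith", "Maria Garcia"]

groups = defaultdict(list)
for record in records:
    groups[name_key(record)].append(record)

for key, names in groups.items():
    print(key, names)
```

In this toy run the three spellings of John Smith land in one group, but so does Jane Smith, a different person. Name-based keys alone cannot separate them, which is why additional fields or a unique investigator identifier would be needed.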

The researchers conclude that it would be much easier to understand investigator demographics and activity if the following changes were made in the databases: require study sponsors to submit Form FDA 1572s to BMIS with unique clinical trial identifiers; have BMIS and Open Payments, at minimum, assign a unique identification number to each investigator and use that number consistently; and assign a unique number to each clinical research site, because names are often entered inconsistently.