Clinical Trial Fraud Detection

Clinical trials are the primary way for researchers to find out whether a new treatment, such as a new drug, diet, or medical device (for example, a pacemaker), is safe and effective in people. In the past few years, however, numerous cases of data fabrication in clinical trials have raised concern worldwide. This article discusses the tools and methods that can help clinical trial statisticians detect or prevent such fraud.

About the Client

A Fortune 100 pharmaceutical manufacturer.

Business Problem

Honesty in performing clinical trials and in reporting their data and results is key to the advancement of medical knowledge. In clinical trials, volunteers are given a treatment whose effectiveness and risks are to be ascertained. This requires carefully controlled experimentation, close monitoring of the volunteers, and diligent recording of measurements. The whole process depends heavily on the honesty and trustworthiness of the individuals and organisations tasked with performing these trials.

The article "Data fraud in clinical trials" describes some publicised cases of fraud in clinical trials involving falsification of data, ranging from minor alterations to complete fabrication.


To identify potential falsification, one has to rely on the regular patterns present in genuine data. Various types of patterns exist: they concern the distribution of each variable, correlations among variables, the multivariate structure of the data, and patterns across time.
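As an illustration of the first pattern type (the distribution of a single variable), the sketch below compares the spread of one measurement at each centre with the spread at all other centres combined. A ratio far below 1 suggests values that are suspiciously uniform. This is a simplified, hypothetical check, not the model used in this project; the function name, the input layout, and any cut-off applied to the ratio are assumptions made for the example.

```python
from collections import defaultdict
from statistics import stdev


def spread_ratio_by_centre(measurements):
    """Compare each centre's spread with that of all other centres.

    `measurements` is an iterable of (centre, value) pairs (assumed layout).
    Returns {centre: sd_of_centre / sd_of_all_other_centres}.
    Centres with too little data to compute a ratio are skipped.
    """
    by_centre = defaultdict(list)
    for centre, value in measurements:
        by_centre[centre].append(value)

    ratios = {}
    for centre, values in by_centre.items():
        # Pool the values from every other centre as the reference group.
        others = [v for c, vs in by_centre.items() if c != centre for v in vs]
        if len(values) < 2 or len(others) < 2:
            continue
        sd_others = stdev(others)
        if sd_others == 0:
            continue
        ratios[centre] = stdev(values) / sd_others
    return ratios
```

A low ratio is only a prompt for closer review; a centre may have a legitimately homogeneous patient population.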

Typically, a clinical trial is run across multiple centres with identical protocols. When data is falsified at one centre, some of those natural patterns are likely to break down unless the data is very carefully falsified. Sometimes, the data may not exhibit the natural randomness one would expect to see in genuine data. For example, in one well-publicized case mentioned in the article referred to above, a Norwegian physician and researcher was found to have completely fabricated data: 250 out of the 908 subjects in his data had the same date of birth!
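The date-of-birth example suggests a very simple check: for each centre, find the most frequent value of a field and the share of subjects that carry it. The sketch below is a minimal illustration under assumed inputs (a list of dictionaries with hypothetical `centre` and `date_of_birth` keys, and an arbitrary threshold); it is not the alert system described in this article.

```python
from collections import Counter, defaultdict


def most_common_value_share(records, centre_key="centre", field="date_of_birth"):
    """Return {centre: (most_common_value, share_of_subjects_with_it)}."""
    by_centre = defaultdict(list)
    for rec in records:
        by_centre[rec[centre_key]].append(rec[field])

    result = {}
    for centre, values in by_centre.items():
        value, count = Counter(values).most_common(1)[0]
        result[centre] = (value, count / len(values))
    return result


def flag_centres(records, threshold, centre_key="centre", field="date_of_birth"):
    """List centres where one value's share exceeds `threshold`.

    The threshold is an assumption to be tuned per field and per trial.
    """
    shares = most_common_value_share(records, centre_key, field)
    return sorted(c for c, (_, share) in shares.items() if share > threshold)
```

In the case cited above, 250 of 908 subjects (about 28%) shared one date of birth, so even a generous threshold would have raised a flag.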

The client desired an improved alert system for a specific type of fraud that can happen during clinical trials. As the existing model was detecting only a fraction of the fraud, they wanted a more comprehensive system.

Our Approach

We performed an initial study, which found that the model and evaluation metrics then used by the firm were most likely inadequate and could not reveal a substantial portion of the fraud.

Work was done to:

  1. Define more effective, custom evaluation metrics.
  2. Prototype and test an ML model.
  3. Build the entire production system.
  4. Develop a full-fledged visualization and reporting system for decision makers.

The models had to comply with legal regulations that excluded the use of certain historical data and certain types of analysis. In our case, we could not make use of patterns over time, which would have disclosed whether the subject was in the placebo group or in the test group.


We delivered and deployed an end-to-end (E2E) system that pulled data from the database, ran ML models to generate alerts, and pushed the alerts back to the database. It also included a visualization layer that gave users (including auditors) the ability to perform interactive analysis of the data in the context of an alert.

The complete project lasted a little over 6 months, with the client rolling it out successfully across the entire organisation.


  • A complete production system was delivered and implemented on the client’s assets.
  • The IP and knowledge were fully transferred.
