Q&A Report: Cracking the Code: When and How to Validate ICD Algorithms for RWE

Experts answer top questions from their recent panel discussion, where they delivered a high-level overview of validating code algorithms for real-world evidence (RWE).

Is there a way to validate the ICD algorithms when we only have claims data?

Not really. You can check the face validity of an algorithm by comparing the patient population identified in claims with one identified from another source, for example a randomized controlled trial (RCT). This confirms whether the patients identified with the algorithm have similar ages, comorbidities, etc. to research study participants, but it is not a true validation.

In the panel's experience, how does the algorithm vary by source of real-world data, i.e., claims versus electronic medical records (EMR) versus registry?

Most of the algorithms that we have tried to validate have come largely from claims-type data (e.g., data that includes only ICD, CPT, and NDC codes).

What alternative solutions would you suggest if one does not have access to EHRs?

Unfortunately, there are no great solutions without access to a medical record. One thing we have done is to ask several clinicians to identify which of their patients are “true positives” and to send us a list of the ICD codes associated with those patients. This doesn’t result in a publishable validation study, but it is a type of validation and can be described in a manuscript.

Thoughts on the need for validation studies for public data sources such as MEPS, which generally mask diagnosis codes?

Certain sources, including MEPS and SEER, conduct validation studies themselves. We would suggest searching the data source website for these studies.

Why not use a combination of factors to validate a disease (meds, visits, etc.)?

The claims algorithms we talked about in the session often do include a combination of codes for visits, procedures, and medications. If an algorithm includes all these components, it can’t be validated with those same components. If the algorithm includes, for example, only ICD codes, one can check its face validity by comparing the list of medications used by the patients identified with that algorithm with the list of medications used by patients identified by a clinician or in a clinical trial.
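The medication comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the drug names, the two toy cohorts, and the helper `medication_shares` are hypothetical, and a real comparison would use actual NDC or drug-class groupings.

```python
from collections import Counter

def medication_shares(patient_meds):
    """Return the fraction of patients in a cohort who used each medication.

    patient_meds is a list with one entry per patient; each entry is the list
    of medications seen for that patient. A drug counts once per patient.
    """
    counts = Counter(med for meds in patient_meds for med in set(meds))
    n_patients = len(patient_meds)
    return {med: counts[med] / n_patients for med in counts}

# Hypothetical cohorts (placeholder drug names, not real data).
algorithm_cohort = [["drug_a", "drug_b"], ["drug_a"], ["drug_c"], ["drug_a", "drug_c"]]
reference_cohort = [["drug_a"], ["drug_a", "drug_b"], ["drug_a"], ["drug_b"]]

alg = medication_shares(algorithm_cohort)
ref = medication_shares(reference_cohort)

# Difference in usage share (algorithm cohort minus reference cohort) per drug.
all_meds = sorted(set(alg) | set(ref))
differences = {med: alg.get(med, 0.0) - ref.get(med, 0.0) for med in all_meds}

for med in all_meds:
    print(f"{med}: algorithm={alg.get(med, 0.0):.2f} "
          f"reference={ref.get(med, 0.0):.2f} diff={differences[med]:+.2f}")
```

Large differences for a given drug would be a prompt to look more closely at who the algorithm is picking up; as with any face-validity check, similar distributions do not amount to a validation.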

Are there any guidelines for what is "good enough" regarding sensitivity and specificity?

There are not. The validation statistics themselves serve only as guides, and what counts as adequate performance depends on the study type and objective.
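For reference, the usual validation statistics can be computed from a two-by-two table that compares the algorithm with a reference standard such as chart review. The sketch below uses the standard definitions; the counts and the names `ValidationCounts` and `validation_stats` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class ValidationCounts:
    tp: int  # algorithm positive, reference positive
    fp: int  # algorithm positive, reference negative
    fn: int  # algorithm negative, reference positive
    tn: int  # algorithm negative, reference negative

def validation_stats(c):
    """Standard two-by-two validation statistics as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of true cases the algorithm finds
        "specificity": c.tn / (c.tn + c.fp),  # share of non-cases the algorithm excludes
        "ppv": c.tp / (c.tp + c.fp),          # share of flagged patients who are true cases
        "npv": c.tn / (c.tn + c.fn),          # share of unflagged patients who are non-cases
    }

# Hypothetical counts, not taken from any real validation study.
counts = ValidationCounts(tp=80, fp=20, fn=40, tn=860)
stats = validation_stats(counts)

for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

In this made-up example the algorithm has a PPV of 0.80 but a sensitivity of about 0.67. Whether that is acceptable depends, as noted above, on the study type and objective.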

What do you do if your validation study reveals poor performance?

This will eventually happen to everyone who validates enough algorithms. Depending on how poor the performance is, you can add elements to or remove elements from the algorithm and see if performance improves. Sometimes, it is just not possible to use claims to identify a population with enough accuracy to allow secondary data studies to be done.
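One way to picture adding or removing elements is to re-score each variant of the algorithm against the same reference standard. The sketch below is a toy illustration, not a description of any specific study: the cohort, the element labels `dx` (a diagnosis code) and `rx` (a medication claim), and the function `evaluate` are all hypothetical.

```python
# Hypothetical toy cohort: each patient has a set of claim elements and a
# reference-standard label from chart review (True = confirmed case).
patients = [
    ({"dx"}, True),
    ({"dx", "rx"}, True),
    ({"dx", "rx"}, True),
    ({"dx"}, False),
    ({"dx"}, False),
    ({"rx"}, False),
    (set(), True),
    (set(), False),
]

def evaluate(required, cohort):
    """Return (PPV, sensitivity) for an algorithm requiring all given elements."""
    tp = fp = fn = 0
    for elements, is_case in cohort:
        flagged = required <= elements  # patient has every required element
        if flagged and is_case:
            tp += 1
        elif flagged and not is_case:
            fp += 1
        elif not flagged and is_case:
            fn += 1
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity

dx_only = evaluate({"dx"}, patients)          # diagnosis code alone
dx_and_rx = evaluate({"dx", "rx"}, patients)  # diagnosis code plus medication

print(f"dx only:   PPV={dx_only[0]:.2f}, sensitivity={dx_only[1]:.2f}")
print(f"dx and rx: PPV={dx_and_rx[0]:.2f}, sensitivity={dx_and_rx[1]:.2f}")
```

In this toy cohort, requiring a medication claim in addition to the diagnosis code raises PPV from 0.60 to 1.00 but lowers sensitivity from 0.75 to 0.50, which is the kind of trade-off to weigh when revising an algorithm.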

What do you think of the idea of using data driven methods like machine learning to develop algorithms?

None of us have successfully used machine learning or artificial intelligence (AI) to validate algorithms. The FDA Sentinel program is studying such methods, but nothing is available currently.