Identifying people with cancer
As part of this program, cancer algorithms based on the National Health Data System (SNDS) were developed to identify a number of cancer sites associated with occupational exposures.
These algorithms were then validated in a study using French cancer registries.
Three algorithms developed
Three generic algorithms were developed to identify, in the National Health Data System (SNDS) of the health insurance program, ESPrI retirees with cancer at specific sites. The objective of this study was to validate the algorithms using data from the ESPrI cohort and the French cancer registries. Seven general cancer registries and one specialized registry, covering 10 departments shared with ESPrI (Somme, Gironde, Lille and its region, Poitou-Charentes, Manche, Calvados, Haute-Vienne), were approached, and all agreed to participate.
Results
The ESPrI cohort consists of retired artisans with an average age of 73 in 2020. Between 2011 and 2016, data from 7,544 participants recorded in a self-administered questionnaire were linked to the SNDS. Data from the French cancer registries were used as the gold standard.
Three algorithms were evaluated. Algorithms A and B were based on the ICD-10 code and the earliest year found in Long-Term Conditions (Affections de Longue Durée, ALD) data (A) and in PMSI (Programme de Médicalisation des Systèmes d’Information) data (B). Algorithm B, named “PMSI,” was derived from MCO (medicine, surgery, obstetrics) hospitalization data in the PMSI. It was constructed using the admission date, the ICD-10 codes for the patient’s conditions, and the medical procedures performed (chemotherapy, radiation therapy, etc.) related to the principal, related, or associated diagnoses (DP, DR, DAS, respectively) of the hospital stays. The third algorithm (C) resulted from the combination of ALD and PMSI.
Performance indicators (sensitivity, specificity, and positive and negative predictive values, PPV and NPV) were estimated for the three algorithms. Furthermore, false positives and false negatives were analyzed to determine whether better algorithms could be developed.
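The combination of ALD and PMSI in algorithm C can be illustrated with a minimal Python sketch. It assumes that "combination" means a participant is flagged when either source records the cancer site, and that the earliest year across the two sources is kept; the source does not spell out the exact rule. The function name, the record layout, and the participant data are hypothetical and are not taken from the study.

```python
from typing import Dict, Optional


def combine_sources(ald_year: Optional[int], pmsi_year: Optional[int]) -> Optional[int]:
    """Return the earliest year in which either source flags the cancer site.

    None means that neither source flags the participant.
    Assumption: a flag in either source is enough (union of ALD and PMSI).
    """
    years = [y for y in (ald_year, pmsi_year) if y is not None]
    return min(years) if years else None


# Hypothetical participants: first year flagged by each source (None = not flagged).
records: Dict[str, Dict[str, Optional[int]]] = {
    "p1": {"ald": 2012, "pmsi": 2013},  # flagged by both sources
    "p2": {"ald": None, "pmsi": 2014},  # missing from ALD, found in PMSI
    "p3": {"ald": None, "pmsi": None},  # flagged by neither source
}

# Result of the combined rule for each participant.
algorithm_c = {
    pid: combine_sources(rec["ald"], rec["pmsi"]) for pid, rec in records.items()
}
```

Under this assumed rule, a case missing from one source is still identified if the other source records it, so a combined algorithm cannot flag fewer participants than either source alone.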
Sensitivities ranged from 50% to 93% with algorithm A, from 71% to 100% with algorithm B, and from 92% to 100% with algorithm C. The best algorithm was algorithm C, which combines two data sources (ALD and PMSI) and showed the highest sensitivities and PPVs for certain cancer sites. Most false positives were in fact in situ cancers or cancers of adjacent organs (such as the esophagus and stomach). Most false negatives were likely due to underreporting of ALD.
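For readers unfamiliar with these indicators, the following Python sketch shows how sensitivity, specificity, PPV, and NPV can be computed by comparing the participants flagged by an algorithm with the cases known to a registry used as the gold standard. The function names and the ten-participant example are hypothetical and purely illustrative; they are not the study's data or code.

```python
from typing import Dict, Set, Tuple


def confusion_counts(flagged: Set[int], registry_cases: Set[int],
                     cohort: Set[int]) -> Tuple[int, int, int, int]:
    """Count true/false positives and negatives against the gold standard."""
    tp = len(flagged & registry_cases)            # flagged and confirmed by the registry
    fp = len(flagged - registry_cases)            # flagged but not in the registry
    fn = len(registry_cases - flagged)            # in the registry but not flagged
    tn = len(cohort - flagged - registry_cases)   # neither flagged nor in the registry
    return tp, fp, fn, tn


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute the four standard validation indicators from the counts."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of registry cases that were flagged
        "specificity": ratio(tn, tn + fp),  # share of non-cases that were not flagged
        "ppv": ratio(tp, tp + fp),          # share of flagged participants who are cases
        "npv": ratio(tn, tn + fn),          # share of unflagged participants who are non-cases
    }


# Hypothetical toy cohort of ten participants, for illustration only.
cohort = set(range(1, 11))
registry_cases = {1, 2, 3, 4}
flagged = {1, 2, 3, 5}

counts = confusion_counts(flagged, registry_cases, cohort)
metrics = diagnostic_metrics(*counts)
```

In this toy example, participant 5 is a false positive and participant 4 is a false negative, which are the two kinds of discordant cases examined in the study.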
The algorithms performed well. The SNDS and the validated algorithms can be used for passive epidemiological surveillance of specific cancer sites in the ESPrI cohort. A paper has been submitted.
1 The principal diagnosis (DP) is defined as the condition primarily responsible for the patient’s hospitalization. The associated diagnoses required for classification (DAS) are the other diagnoses, symptoms, and significant reasons for seeking care or using resources.