Postmarketing drug surveillance is a crucial aspect of clinical research in pharmacovigilance and pharmacoepidemiology. Successful utilization of available Electronic Health Record (EHR) data can complement and strengthen postmarketing safety studies. For the secondary use of EHRs, access to and analysis of patient data across different domains is a critical factor; in this paper, we address this data interoperability problem between EHR systems and clinical research systems. We demonstrate that the problem can be solved at a higher level by using common data elements in a standardized fashion, so that clinical researchers can work with different EHR systems independently of the underlying information model. The Postmarketing Safety Study Tool lets clinical researchers extract data from different EHR systems by designing data collection set schemas through common data elements. The tool interacts with a semantic metadata registry through the IHE Data Element Exchange (DEX) profile. The Postmarketing Safety Study Tool and its supporting components have been implemented and deployed on the central data warehouse of the Lombardy region, Italy, which contains anonymized records of about 16 million patients with over 10 years of longitudinal data on average. Clinical researchers at Roche validate the tool with real-life use cases.
It is a well-accepted fact that, due to the limited scope and duration of clinical trials, drugs may still have serious side effects, adverse drug reactions (ADRs), after they are marketed. Postmarketing drug surveillance systems have been in place in order to analyze additional information about a drug’s safety, efficacy, and optimal use to capture such ADRs. During the last decades, postmarketing activities in pharmacovigilance have largely depended on spontaneous case reports, which is still the case unfortunately. There are certain limitations on surveillance activities with spontaneous report data [
At present, postmarketing drug surveillance is largely being carried out with traditional methods for both pharmacovigilance and pharmacoepidemiology. In pharmacovigilance, there is active research on data mining algorithms [
Successful utilization of available EHRs for clinical research in terms of access, management, and analysis of patient data within and across different functional domains is a critical factor in terms of secondary reuse [
The objective of the aforementioned initiatives is to use the available EHR data held by multiple different systems for clinical research purposes (mainly for postmarketing surveillance, comparative effectiveness research, and evidence development). In addition to the distributed architecture of the Sentinel Initiative, recent research projects like SALUS (Scalable, Standard based Interoperability Framework for Sustainable Proactive Post Market Safety Studies) [
Current research on postmarketing surveillance for pharmacovigilance and pharmacoepidemiology tries to unify the available EHR data on a common information model. Most of the time, this forces the EHR systems to implement adapters that transform data into the defined common model and persist it in a separate database. Whether distributed or not, analyses on longitudinal EHR data require clinical researchers to implement their algorithms and build their methods according to the predefined data model of the database they are working on. Some approaches, on the other hand, transform the query to the native data model at each transaction. Experience shows that the data and processing requirements of different areas of clinical research change over time, while the quality, quantity, and availability of EHR data on the patient care side increase. In parallel, new initiatives propose new common data models into which collaborating EHR sources have to transform and transfer data, regardless of the system’s nature. The literature exemplifies this situation clearly.
Vaccine Safety Datalink [
In this paper, we address the heterogeneity problem among common data models for clinical researchers who work on EHR data for postmarketing surveillance studies. We show that this problem of interoperability can be solved in an upper level with the use of common data element (CDE) phenomenon [
In the light of the common data element based interoperability approach, we design and implement the Postmarketing Safety Study Tool (PMSST) which can extract any needed information from a patient record after it is retrieved as a result of an eligibility query or it is directly accessed from the EHR database within a data mining routine. Our design is built upon the notion of CDEs and makes use of a Semantic Metadata Registry (MDR) to retrieve data element definitions and use their extraction specifications to access data [
The tool that we introduce in this paper has been built within the SALUS interoperability framework. Hence, first of all, SALUS project and its incorporated common data element based interoperability framework are introduced in Section
SALUS aims to create a semantic interoperability layer in order to enable the secondary use of EHR data for clinical research activities. SALUS follows a common data element based interoperability approach and uses the semantic MDR to maintain its common data elements (CDEs). Built upon its abstract CDE definitions, SALUS exposes a semantic RDF [
Several organizations are publishing common data element dictionaries and common models in order to solve the interoperability problem within and between clinical research and patient care borders. The objective is to provide a dictionary like the collection of the abstract definitions of common data elements. Most of the time, these definitions are published as unstructured text files. Rarely, semistructured spreadsheets are used to publish the data element specifications. Health Information Technology Standards Panel (HITSP) is one of such organizations publishing a library of common data elements, called HITSP C154: Data Dictionary [
Data retrieval mechanism of the SALUS enabled clinical research tools has been built on top of the idea of data interoperability through federated semantic metadata registries [
The abstract SALUS common data element (CDE) [
As illustrated in Figure
CDE based data interoperability framework through federated semantic metadata registries.
PMSST is one of the safety analysis tools developed within the scope of the SALUS project. Within this CDE based data interoperability framework, PMSST retrieves the CDE definitions from a semantic MDR where any common data element model can be maintained according to ISO/IEC 11179 metamodel [
Using PMSST, a clinical researcher designs a data schema (a template) by using SDTM variables on which she writes scripts (i.e., SAS [
Postmarketing Safety Study Tool is a web based tool enabling clinical researchers to extract data from different EHR systems by designing data collection sets through common data elements. After patient record is retrieved as a result of an eligibility query, any needed information can be extracted from the patient record to populate the data collection set with the help of abstract CDE definitions used to annotate data collection set definitions. By means of the underlying interoperability framework [
Roche is conducting clinical trials both in acute coronary syndrome (ACS) patients and in ACS patients with diabetes. Whilst the trials are blinded, it is important to compare the observed overall incidence rate of an important adverse event such as congestive heart failure (CHF) in the trials with that in similar background populations. Such a comparison provides context for the observed incidence and enables us to identify potential safety concerns earlier (e.g., if the observed incidence in the trial is greater than the background).
Table
Data collection set schema details for the PMSST use case.
Schema item description | Data elements of the schema item | Corresponding SDTM data element name | MedDRA code for MH.MHPTCD |
---|---|---|---|
Sex | Sex | DM.SEX | |
Date of acute coronary syndrome (ACS) event | (i) ACS event | (i) MH.MHPTCD | 10051592 |
Date of acute myocardial infarction | (i) Acute myocardial infarction | (i) MH.MHPTCD | 10000891 |
Date of unstable angina | (i) Unstable angina pectoris | (i) MH.MHPTCD | 10002388 |
Had a congestive heart failure (CHF) before start of ACS (Y/N) | (i) Congestive heart failure | (i) MH.MHPTCD | 10007559 |
Had a CHF after start of ACS (Y/N) | (i) Congestive heart failure | (i) MH.MHPTCD | 10007559 |
Patient selection phase is the execution of the eligibility criteria for retrieving the data of the defined cohort. For PMSST, this execution is handled through the semantic interoperability layer of SALUS. However, this could be any other system like Sentinel or Query Health [
Analyzing the use case, we elicited the key requirements for PMSST and based the design of the key functionalities of PMSST on these requirements. During the data collection set schema definition process, values of particular schema items might be used in defining other schema items. Therefore, PMSST provides a flexible variable definition mechanism. PMSST keeps track of the variable definitions and generates the queries to be applied on the EHR data and organizes their execution order.
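The dependency tracking described above can be pictured as a topological sort over schema item definitions. The following is only a minimal sketch under stated assumptions, not the PMSST implementation: the item names and the dependency dictionary are hypothetical, and Python's standard `graphlib` module stands in for whatever ordering logic the tool actually uses.

```python
from graphlib import TopologicalSorter

# Hypothetical schema items; each maps to the items whose values it needs.
# The names loosely follow the use case table but are illustrative only.
schema_items = {
    "Sex": [],
    "Date_ACS": [],
    "CHF_Before_ACS": ["Date_ACS"],
    "CHF_After_ACS": ["Date_ACS"],
}


def execution_order(items):
    """Return item names so that every item comes after its dependencies."""
    # TopologicalSorter expects {node: predecessors} and raises CycleError
    # when two definitions depend on each other.
    return list(TopologicalSorter(items).static_order())


order = execution_order(schema_items)
print(order)
```

Queries for the schema items would then be generated and executed in this order, so that a value such as the ACS date is available before any item that refers to it.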
As it can be seen in Table
The value domains of the CDEs in use may refer to different terminology or coding systems. For example, when asking whether a patient has type 2 diabetes (T2D), a researcher at Roche uses the MHPTCD common data element from the MH domain of SDTM. Since this data element requires a coded value from MedDRA, the researcher should be able to assign values to such data elements easily during schema design. For this purpose, PMSST has been integrated with a terminology server that recommends possible values for a schema item through a type-ahead search mechanism.
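A type-ahead lookup of this kind can be sketched as a prefix search over a sorted term index. The sketch below is an assumption about how such a search might work, not the interface of the terminology server used by PMSST: the function names are hypothetical, and the four terms and MedDRA codes are the ones listed in the use case table.

```python
from bisect import bisect_left

# Labels and MedDRA codes as given in the use case table of this paper.
# A real terminology server would hold the full vocabulary.
TERMS = {
    "ACS event": "10051592",
    "Acute myocardial infarction": "10000891",
    "Unstable angina pectoris": "10002388",
    "Congestive heart failure": "10007559",
}


def build_index(terms):
    """Sort lowercase labels once so prefix lookups can use binary search."""
    return sorted((label.lower(), label, code) for label, code in terms.items())


def suggest(index, prefix, limit=5):
    """Return up to `limit` (label, code) pairs whose label starts with prefix."""
    key = prefix.lower()
    start = bisect_left(index, (key,))
    matches = []
    for lowered, label, code in index[start:]:
        if not lowered.startswith(key):
            break  # sorted order: no later entry can match
        matches.append((label, code))
        if len(matches) == limit:
            break
    return matches


index = build_index(TERMS)
print(suggest(index, "ac"))
```

The coding system to search would be chosen from the value domain of the selected CDE; here a single in-memory dictionary plays that role.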
PMSST is a web based tool which can be used via modern web browsers. It has been implemented with the latest high performance web technologies incorporating HTML5 design principles and RESTful client-server communication. The tool is composed of an eligibility query execution and a data selection part. Details of the former are out of the scope of this paper. Upon the execution of an eligibility query, a cohort of patient data is retrieved in the form of a content model adopted by the EHR sources. We claim that the CDE based interoperability implementation of PMSST can make use of any content model as long as the appropriate extraction specifications are available for the abstract CDE definitions within the semantic MDR framework.
Figure
A snapshot of PMSST while the researcher defines a data collection set schema. On the right hand side, domains of SDTM form a circle; if selected, then CDEs of that domain form the circle. On the left hand side, a schema item “Date_HbA1C_Average1YBeforeACS” is created out of 4 SDTM elements. Below that, a list of other schema items is shown.
PMSST is composed of several components connected by a number of integration mechanisms. The figure showing the step-by-step data flow illustrates how they interact. The researcher uses a web browser to define the data collection set schema by using the CDEs; in our deployment, Roche researchers use the SDTM variables identified in the data collection set schema table. CDEs are maintained in the semantic MDR and retrieved through the IHE DEX profile. The user browses the CDEs in a top-down fashion, starting from the object classes. If the user wants to restrict the value of a selected data element (e.g., set the MHPTCD element to acute myocardial infarction), possible values can be searched automatically from the terminology server. PMSST determines which coding system to search by analyzing the value domain of the CDE definition. After the user completes the schema definition, identifying each schema item through abstract CDE definitions, the schema definition is sent to the PMSST engine on the server side. The eligibility query is sent to the SALUS system, and the EHR data of the eligible patients is retrieved in the form of the SALUS common information model. For each schema item definition, the PMSST engine extracts information from the EHR data and performs the necessary calculations to place the result into the appropriate location according to the schema definition.
The schema is defined by SDTM elements. The semantic MDR keeps the mappings between SDTM and SALUS CDEs, as presented in the mappings table. If a schema item definition includes a value in one of its defining CDEs, value matching must be performed. In our deployment, however, EHR data is coded with the ICD-9-CM terminology system for patient conditions, while SDTM elements refer to MedDRA or NCI terms. The terminology server includes mappings between these coding systems, and PMSST performs value matching with its help. As a result of these data extraction operations, the data collection set is populated in conformance with the schema defined by the researcher. The user can write analysis methods on top of this schema independently of the underlying EHR source model. In our deployment, Roche implements SAS scripts for further analysis. Finally, the analysis results are presented to the researcher.
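The value matching step can be illustrated with a small sketch that populates two schema items from the use case table. Everything here is an assumption made for illustration: the ICD-9-CM to MedDRA pairs are hand-written stand-ins for what the terminology server would return, the patient record is invented, and the helper name is hypothetical.

```python
from datetime import date

# Illustrative ICD-9-CM -> MedDRA pairs (assumed for this sketch; a real
# deployment would obtain mappings from the terminology server).
ICD9CM_TO_MEDDRA = {
    "428.0": "10007559",   # congestive heart failure
    "410.91": "10000891",  # acute myocardial infarction
    "411.1": "10002388",   # unstable angina
}


def has_condition(conditions, meddra_code, before=None, after=None):
    """Return 'Y' if a condition maps to the MedDRA code inside the window."""
    for icd_code, onset in conditions:
        if ICD9CM_TO_MEDDRA.get(icd_code) != meddra_code:
            continue
        if before is not None and not onset < before:
            continue
        if after is not None and not onset > after:
            continue
        return "Y"
    return "N"


# Invented patient: (ICD-9-CM code, onset date) pairs.
conditions = [("410.91", date(2010, 3, 1)), ("428.0", date(2011, 6, 15))]
acs_start = date(2010, 3, 1)

row = {
    "CHF_Before_ACS": has_condition(conditions, "10007559", before=acs_start),
    "CHF_After_ACS": has_condition(conditions, "10007559", after=acs_start),
}
print(row)
```

The researcher's schema speaks MedDRA, the source data speaks ICD-9-CM, and the mapping table is the only place where the two meet.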
Mappings of the common data elements: SDTM, SALUS CDE set, and HITSP C154 Data Dictionary.
SDTM | SALUS CDE | HITSP C154 |
---|---|---|
DM | Patient | Personal Information |
DM.DMSEX | Patient.Gender.CD | 1.06 Personal Information Gender |
MH | Patient.Condition.Condition | Conditions |
MH.MHPTCD | Condition.ProblemCode.CD | 7.04 Conditions Problem Code |
MH.MHSTDTC | Condition.TimeInterval.IVLTS | 7.01 Conditions Problem Date |
Step-by-step representation of the data flow between different components. A clinical researcher uses PMSST in order to define a data collection set schema so that when patient data is retrieved from the underlying EHR source(s), data will be automatically transformed to that schema.
The CDE based data interoperability approach lets the PMSST interact with the semantic MDR through IHE DEX profile and retrieve abstract CDE definitions. The researcher interacting with the PMSST uses the data elements that she is used to in her research domain. The underlying architecture of the PMSST does not make any message translation between different content models (i.e., from SALUS CIM based patient summaries to SDTM conformant instances). Instead, the abstract CDE definitions and their semantic links maintained by the semantic MDR are processed in order to find an extraction specification to be executed on the content model to which the EHR data conforms. This clear distinction between the abstract and implementation dependent parts of the CDEs enables integrating the CDEs with the semantic web technologies and linked data principles by using the semantic MDR.
In our semantic MDR, links between different sets of abstract CDE definitions can be established through well-known knowledge organization systems such as SKOS (e.g., skos:exactMatch), and other properties can be expressed with SKOS (e.g., skos:notation) or with other ontological constructs.
PMSST makes use of the abstract CDE definitions of the SDTM variables retrieved from the semantic MDR. To enable the retrieval of extraction specifications for given SDTM variables, we mapped the SDTM elements to the SALUS CDEs. We implemented an automatic content model importer on top of the open API of the semantic MDR for importing the SDTM variables and their mappings to SALUS CDEs. In this way, although the user defines the data collection set via SDTM variables, she can collect the requested data from any EHR source that provides the EHR data of the eligible patients through the SALUS common information model.
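How such links lead from a researcher's data element to an extraction specification can be sketched as a graph walk. This is a simplified illustration under stated assumptions, not the semantic MDR's API: triples are plain Python tuples, the prefixes and the `ex:extractionSpecification` property are hypothetical, and the specification strings are placeholders. The element names come from the mappings table in this paper.

```python
from collections import deque

SKOS_EXACT_MATCH = "skos:exactMatch"
EXTRACTION_SPEC = "ex:extractionSpecification"  # hypothetical property name

# Mappings taken from the mappings table; specification values are placeholders.
triples = [
    ("sdtm:MH.MHPTCD", SKOS_EXACT_MATCH, "salus:Condition.ProblemCode.CD"),
    ("salus:Condition.ProblemCode.CD", SKOS_EXACT_MATCH, "hitsp:7.04"),
    ("sdtm:MH.MHSTDTC", SKOS_EXACT_MATCH, "salus:Condition.TimeInterval.IVLTS"),
    ("salus:Condition.ProblemCode.CD", EXTRACTION_SPEC, "SPARQL: <placeholder>"),
    ("hitsp:7.04", EXTRACTION_SPEC, "XPath: <placeholder>"),
]


def find_extraction_specs(start, graph):
    """Follow exactMatch links in both directions and collect specifications."""
    seen = {start}
    queue = deque([start])
    specs = []
    while queue:
        node = queue.popleft()
        for subject, predicate, obj in graph:
            if predicate == EXTRACTION_SPEC and subject == node:
                specs.append(obj)
            elif predicate == SKOS_EXACT_MATCH and node in (subject, obj):
                other = obj if subject == node else subject
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
    return specs


print(find_extraction_specs("sdtm:MH.MHPTCD", triples))
```

The walk treats skos:exactMatch as symmetric and transitive, which matches its definition in SKOS, so a specification attached to any equivalent element becomes reachable from the SDTM variable.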
In the semantic MDR, SALUS CDEs have also mappings to HITSP C154 Data Dictionary [
Although the usage of the tool starts with defining eligibility criteria and retrieving EHR data according to that query, our implementation is independent of the content model that shapes the EHR data. For example, if the underlying EHR system can provide HL7/ASTM CCD based patient summaries, PMSST can process the data by using the corresponding extraction specifications retrieved from the semantic MDR. HITSP C154 defines XPath expressions from its CDE definitions to HL7/ASTM CCD based documents, and PMSST can retrieve extraction specifications through the HITSP C154 mappings; in this case, the extraction specifications are XPath expressions, and the clinical researcher does not need to be aware of this. PMSST can therefore communicate with any EHR system capable of exporting HL7/ASTM CCD based patient summaries and make the data available for clinical research automatically.
In the context of the SALUS project, PMSST and all related components have been implemented and deployed on top of the SALUS interoperability framework integrated with the central data warehouse of the Lombardy region, Italy. This regional database includes anonymized data of ~16 million patients with over 10 years of longitudinal data on average. Clinical researchers at Roche are validating PMSST with real-life use cases, one of which is presented in this paper. Until the actual deployment within SALUS, we worked with simulated data to collect further requirements from clinical researchers and improve the capabilities of PMSST.
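The idea of executing an XPath extraction specification against a document can be sketched with Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. The XML below is a deliberately simplified, invented stand-in and not a valid HL7/ASTM CCD document (real CCD instances are namespaced and far richer), and the path expressions are hypothetical rather than the ones defined by HITSP C154.

```python
import xml.etree.ElementTree as ET

# Invented, simplified patient summary used only for this sketch.
DOCUMENT = """
<patientSummary>
  <gender code="F"/>
  <problems>
    <problem><code value="428.0" system="ICD-9-CM"/></problem>
    <problem><code value="410.91" system="ICD-9-CM"/></problem>
  </problems>
</patientSummary>
"""

# Hypothetical extraction specifications keyed by CDE name: (path, attribute).
# In the architecture described above they would come from the semantic MDR.
EXTRACTION_SPECS = {
    "Condition.ProblemCode.CD": (".//problem/code", "value"),
    "Patient.Gender.CD": ("./gender", "code"),
}


def extract(root, cde):
    """Run the path registered for a CDE and return the matched attribute values."""
    path, attribute = EXTRACTION_SPECS[cde]
    return [node.get(attribute) for node in root.findall(path)]


root = ET.fromstring(DOCUMENT)
print(extract(root, "Condition.ProblemCode.CD"))
print(extract(root, "Patient.Gender.CD"))
```

Swapping the content model then amounts to swapping the entries of the specification table, while the calling code and the researcher's schema stay the same.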
For the eligibility criteria defined in the use case introduced in this paper, PMSST retrieved anonymized data of ~8000 acute coronary syndrome (ACS) patients from a population of ~16 million patients. The definition of the data collection set schema starts after retrieving the cohort based on the eligibility criteria. PMSST transforms the retrieved cohort data to the model defined by the data collection set schema. The eligibility query execution is not in scope of this paper.
The researchers at Roche cannot directly see the resultant data because of the privacy rules of the SALUS project. Instead, their analysis methods are executed on the returned dataset, and the results of this execution are presented through the graphical user interface of PMSST. Researchers from Roche have implemented SAS scripts assuming that the data will ultimately be represented with SDTM variables, which they already use in their daily work. However, the data warehouse of the Lombardy region has a custom schema. The SALUS technical and semantic interoperability solutions retrieve data from this custom database and transform it into instances of the SALUS common information model (CIM). The CDE based data interoperability approach has enabled the mappings between the SDTM variables and the SALUS CDEs, where the SALUS CDEs have their extraction specifications (SPARQL scripts in this case). With the help of this architecture, PMSST can extract data from the SALUS CIM based patient data by using the SDTM variables. Afterwards, the analysis routines of the clinical researchers (e.g., SAS scripts) run on the SDTM conformant data.
To assess the validity of the data collection set calculated by PMSST, we conducted a comparative analysis. By issuing SQL queries to the data warehouse of the Lombardy region, we obtained several statistics regarding the items in the data collection set and compared them with the set populated by PMSST. For instance, demographic analysis on the Lombardy region data warehouse shows that 38.22% of ACS patients are female, which equals the percentage of “Female” in the “Sex” column of the resultant PMSST data collection set. Similarly, PMSST calculates the incidence of “Patient died any time after start of ACS” as 34.63%, which aligns with the values calculated directly in the LISPA data warehouse. These analyses show that the dataset created by PMSST is consistent with the original data and give us confidence in the reliability of the results.
One problem we observed during the analysis is that many items of the data collection set defined in the Roche use case were actually empty. For instance, no patient had any data on systolic and diastolic blood pressure measurements or history of smoking. To find the cause, we investigated the LISPA data warehouse and found that the available data is not fully structured: the columns that were empty in the resultant data collection set were also missing in the data warehouse. This has prevented Roche researchers from fully utilizing PMSST according to the planned use case. On the other hand, the records that do exist in the data warehouse, such as those related to congestive heart failure (CHF), have been processed by the tool accurately. Thus, we conclude that PMSST can be fully exploited once the underlying EHR source becomes more structured and includes more data about the patients.
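A comparison of this kind can be sketched as computing a column percentage over the populated data collection set and checking it against a reference statistic obtained independently. The rows and the reference value below are toy data invented for the sketch, not figures from the Lombardy data warehouse, and the function names and tolerance are assumptions.

```python
import math


def percentage(rows, column, value):
    """Share of rows, in percent, whose `column` equals `value`."""
    if not rows:
        raise ValueError("empty data collection set")
    hits = sum(1 for row in rows if row.get(column) == value)
    return 100.0 * hits / len(rows)


def agrees(computed, reference, tolerance=0.01):
    """True when two percentages differ by at most `tolerance` points."""
    return math.isclose(computed, reference, abs_tol=tolerance)


# Toy data collection set and toy reference value.
rows = [
    {"Sex": "Female"},
    {"Sex": "Male"},
    {"Sex": "Male"},
    {"Sex": "Female"},
    {"Sex": "Male"},
]
reference_female_pct = 40.0  # stand-in for a figure from a direct SQL query

computed = percentage(rows, "Sex", "Female")
print(computed, agrees(computed, reference_female_pct))
```

Running the same check for each schema item gives a quick indication of whether the extraction pipeline has altered the distribution of any field.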
In order to assess whether PMSST fulfills the intended use from an end-user point of view or not, it has been tested and evaluated by real end-users from Roche in the scope of the SALUS project. An evaluation and validation framework based on the ISO/IEC 25040 Software engineering, Software Product Quality Requirements and Evaluation (SQuaRE) standard has been developed. According to the developed framework, a total of 6 users including a data analyst and an epidemiologist have taken part in the evaluation in order to assess the feasibility of conducting a study over a particular EHR system by using PMSST.
We have built a PMSST specific questionnaire which is based on the Health IT Usability Evaluation Scale [ PMSST has been a positive addition to postmarketing safety studies. Using PMSST makes it easier to define data fields (the Data Selection tab) to be extracted from the retrieved patient summaries. Using PMSST enables defining data collection fields and performing data selection more quickly.
Apart from the questionnaire results, the evaluators provided specific comments on the benefits of PMSST for their studies. Currently, Roche conducts safety analysis studies based on the sample EHR datasets it holds. The evaluators concluded that a tool like PMSST, which enables extraction of selected data collection sets for a specified cohort from different EHR systems, would be very beneficial for pharmaceutical companies, as it will increase the size and variability of patient data pools. The data analyst and the epidemiologist from Roche agree that PMSST has achieved its intended objective and that the data provided by the tool is suitable for a wider range of observational clinical studies. On the other hand, they also report that the efficiency of the tool and the completeness of the data vary depending on the status of the selected EHR data source.
The deployment in the Lombardy region, Italy, can only serve the SALUS project partners (e.g., Roche) behind high security firewalls because of privacy concerns. For interested readers, we prepared a dataset of 50 simulated patients that reflects what Roche has observed in previous studies and in ongoing work on the Lombardy region database. A package of the deployable software including this simulated dataset can be requested from the corresponding author.
The PMSST, introduced in this paper, enables clinical researchers to define data collection set schemas on which postmarketing safety studies will be conducted, without being concerned about the structure of the underlying data sources. The main benefit of the CDE based interoperability architecture is the ability to develop surveillance methods that are not restricted to the data model of the EHR source: cohort selection and data collection set definition can be done in the researchers’ own language (such as SDTM). Moreover, such an interoperability architecture allows the data collection operation to be run on distributed EHR resources, which might use different content models to expose patient data.
The authors declare that there is no conflict of interests regarding the publication of this paper.
The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007–2013) under Grant Agreement no. ICT-287800, SALUS project (Scalable, Standard Based Interoperability Framework for Sustainable Proactive Post Market Safety Studies).