The proper exploration of patient-level data will pave the way towards personalised medicine. To assess the state of the art in this field, we identify challenges and uncover opportunities by reviewing well-known initiatives and projects that focus on the exploration of patient-level data. These cover a broad array of topics, from genomics and patient registries to rare diseases research, among others. For each, we identified basic goals, involved partners, defined strategies, and key technological and scientific outcomes, establishing the foundation for our analysis framework with four pillars: control, sustainability, technology, and science. Substantial research outcomes have been produced towards the exploration of patient-level data. The potential behind these data will be essential to realise the promise of personalised medicine in the upcoming years. Hence, relevant stakeholders continually push forward new developments in this domain, bringing novel opportunities that are ripe for exploration. Despite the last decade’s advances in translational research, personalised medicine is still far from being a reality. The underlying potential of patients’ data goes beyond daily clinical practice. Various challenges and opportunities remain open for the exploration of these data by academia and business stakeholders.
The widespread collection of patient-level data represents a critical step towards the realization of personalised medicine [
Yet, along with the miscellaneous opportunities to explore patient-level data, this unparalleled growth of patients’ digital metadata brings several challenges [
Although patient-level data from public institutions, such as hospitals or regional/national administration centres, should be easier to access, it is generally locked under primitive technological implementations. This results in closed data silos that hinder scientific and technological evolution. Several large-scale projects already try to commoditize access to these data, whether through policies or through technical standards for data exchanges [
Pharmaceutical companies are also responsible for a large share of patient-level data [
It is important to distinguish private companies’ data, which is the basis for internal research and development of new drugs and treatments, from public research datasets, which are fundamental to advancing general scientific research. Although pharmaceutical companies are entitled to keep their results private, policies should be put in place to foster the sharing of clinically relevant results into the public domain.
Dealing with this heterogeneous mixture of private and public patient-level data, tools, standards, and projects is in itself a complex research and development challenge [
To this end, we established an evaluation framework to analyse the outcomes of existing initiatives, identifying current challenges and uncovering new opportunities. This framework is based on four key pillars: control, sustainability, technology, and science. We assess several components in each of these areas to produce a comprehensive study: the control section focuses on data ownership and access; the sustainability topics cover the long-term perspectives for each asset; on technology, we assess the technical outcomes of each project, where they exist; and at the science level we identify the projects’ research areas and their key scientific outcomes.
We present this review with three key objectives: (1) to identify the best initiatives dealing with patient-level data, (2) to inspect and study their different features, and (3) to evaluate the challenges tackled and the opportunities still open. Furthermore, we shed some light on the current status of public investment in research, where the lack of strict evaluation guidelines gives too much liberty to funded project partners. This research work adds value to multiple fields in the scientific domain, from the performance analysis of hospital care [
This review covers past and on-going large-scale projects. The evaluation of the selected projects is based on an assessment framework with four key components: control, sustainability, technology, and science. This design allows us to better understand the distribution of the projects’ outcomes and to define an initial categorization for each project. We chose topics for matching criteria in each area based on mappings with existing ontologies, namely, Simple Knowledge Organization System (SKOS) [
At the control level we assess several topics, detailed next.
In this review we also assess the selected projects’ sustainability, covering the following areas:
At the technology level we identified the technological outcomes from the studied projects, where available.
Lastly, we inspected the key scientific outcomes for each project, evaluating their areas of impact.
We searched for large-scale international projects in the literature and in general listings. To be included in this review, a project had to meet all of the following criteria: be on-going or have finished after January 1st, 2011; be sponsored mainly by the NIH, IMI, or the European Commission; include partners from both academia and the business sector; focus on rare diseases or pharmacy, or have direct patient involvement; and have publicly available published results.
For all identified projects, we reviewed titles, funding information, references, and available publications to assess whether each project appeared to meet all inclusion criteria. If insufficient information was available to make a confident decision, we contacted key project partners to obtain further details.
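The inclusion criteria above can be read as a conjunction of simple checks. The following is a minimal sketch of that reading, not the procedure actually used in this review: the `Project` record, its field names, and the focus-area labels are hypothetical and were introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Funders named in the inclusion criteria.
ACCEPTED_SPONSORS = {"NIH", "IMI", "EC"}
# Focus areas named in the inclusion criteria (labels are illustrative shorthand).
ACCEPTED_FOCUS = {"rare diseases", "pharmacy", "patient involvement"}
# Projects must be on-going or have finished after this date.
CUTOFF = date(2011, 1, 1)


@dataclass
class Project:
    """Hypothetical record describing one candidate project."""
    name: str
    main_sponsor: str
    end_date: Optional[date]  # None means the project is still on-going
    has_academic_partners: bool
    has_business_partners: bool
    focus_areas: frozenset
    has_public_results: bool


def meets_inclusion_criteria(p: Project) -> bool:
    """Return True only if every inclusion criterion holds."""
    ongoing_or_recent = p.end_date is None or p.end_date > CUTOFF
    return (
        ongoing_or_recent
        and p.main_sponsor in ACCEPTED_SPONSORS
        and p.has_academic_partners
        and p.has_business_partners
        and bool(p.focus_areas & ACCEPTED_FOCUS)
        and p.has_public_results
    )
```

In practice, several of these fields (for example, whether results are public) required manual review of project documentation, so a filter like this would only be applied after that information had been collected.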
This review provides an overview of the different attempts at improving the exploration of patient-level data. This section details the projects’ evaluation according to our framework, including a tabular and visual comparison of their distinct features. From this evaluation we identify the main challenges and opportunities for future research endeavours.
Our initial dataset was extracted from the online project databases of three major funding agencies: the USA’s National Institutes of Health (NIH), the European Commission (EC), and the Innovative Medicines Initiative (IMI) [
List of evaluated projects.
Project | Start | End | Description |
---|---|---|---|
BBMRI | 2008 | 2011 | BBMRI connects researchers, biobankers, patient advocacy groups, and pharmaceutical research companies to foster a quicker discovery of new treatments [ |
BioMedBridges | 2012 | 2015 | BioMedBridges’ goal is to launch a shared e-infrastructure for biological and biomedical data. |
BioSHaRe-EU | 2010 | 2015 | BioSHaRe-EU partners are working to ensure the development of harmonized measures and standardized computing infrastructures. |
BRIDGEtoData | 2011 | - | BRIDGEtoData aims to be an online reference platform describing population healthcare databases for use in epidemiology and health outcomes research. |
DDMoRe | 2011 | 2016 | The Drug Disease Model Resources (DDMoRe) project aims to establish a universal standard framework for modelling drugs and diseases [ |
EHR4CR | 2011 | 2014 | EHR4CR partners built, validated, and deployed a Europe-wide innovative technological platform to reuse EHR data for clinical research purposes [ |
ELIXIR | 2010 | 2018 | The ELIXIR project’s goal is to coordinate the collection, quality control, and archiving of large amounts of biological data [ |
EMIF | 2012 | 2018 | EMIF’s goal involves the creation of an innovative and connected patient registry catalogue that will enable researchers and pharmaceutical companies to search for patient-level data based on the databases’ digital fingerprints [ |
ESGI | 2011 | 2015 | ESGI’s goal is to integrate and standardise current and emerging technologies, providing access to infrastructures so that a broad group of European researchers can use the new technologies. |
eTRIKS | 2012 | 2017 | eTRIKS’ objective is to address knowledge management gaps by building a sustainable translational research informatics/knowledge management platform and to provide additional sustainable services. |
EU-ADR | 2008 | 2012 | The EU-ADR project aimed to develop a unique computerized system to detect adverse drug reactions (ADRs), supplementing spontaneous reporting systems [ |
EURenOmics | 2012 | 2018 | EURenOmics focuses on rare kidney diseases, seeking to establish more accurate diagnostic strategies and improve clinical care. |
Euro-BioImaging | 2010 | 2014 | Euro-BioImaging’s main work covered the improvement of existing research infrastructures on a large scale. |
GEN2PHEN | 2008 | 2013 | GEN2PHEN was created to unify human and model organism genetic variation databases towards increasingly holistic views into Genotype-to-Phenotype (G2P) data and to link this system into other biomedical knowledge sources via genome browser functionality [ |
NeurOmics | 2012 | 2018 | NeurOmics’ research objectives feature the study of neurodegenerative and neuromuscular diseases in an attempt to explore Omics technologies to improve diagnosis, treatments, and general patient care. |
OMOP | 2008 | 2013 | OMOP’s goal was to design experiments testing a variety of analytical methodologies in a range of data types to look for drug impacts, going towards a complete database analysis standard [ |
Oncotrack | 2011 | 2016 | Oncotrack deploys several methods for systematic next generation oncology biomarker development [ |
OpenPHACTS | 2011 | 2014 | OpenPHACTS works with the integration of a relevant and continuously expanding subset of distributed heterogeneous data sources into one “virtual resource,” via the creation of a semantic interoperability layer [ |
RD-Connect | 2012 | 2018 | RD-Connect will launch an integrated platform connecting databases, registries, biobanks, and clinical bioinformatics for rare diseases research [ |
Sentinel | 2008 | - | Sentinel is a USA-based electronic system that will transform FDA’s ability to track the safety of drugs, biologics, and medical devices [ |
At first glance, we can see that the selected projects’ domains and goals are heterogeneous, with the access to or use of patient-level data being one of the few common threads. There is also an obvious bias towards European projects, as the European Commission continues to be a strong proponent of research, namely in the life sciences and medical areas.
In this section we explore the projects’ evaluation results according to the several pillars of our evaluation framework.
From Figure
Data control evaluation breakdown charts. Charts summarizing evaluation results for the control section of the proposed evaluation framework. (a) Data ownership; (b) data access; (c) data storage; (d) patient involvement; (e) security, privacy, and auditing.
Our sustainability review entails better prospects for future data exploration. As Figure
Data sustainability evaluation breakdown charts. These two charts feature the tracked sustainability topics in the proposed evaluation framework. (a) Business model and (b) data maintenance.
At the technological level, all evaluated projects have already produced public results. As expected from the heterogeneous project goals, the technical outcomes are varied. Figure
Technology outcomes’ evaluation evolution breakdown chart. This chart features the key technological outcomes across the various projects, as assessed according to the proposed evaluation framework. To better understand the results’ evolution over time, project evaluation results are divided between projects started before the year 2011 (A) and after the year 2011 (B).
As shown in Figure
Science outcomes’ evaluation evolution breakdown charts. Charts summarizing the various scientific research topics covered across the various projects assessed with the proposed evaluation framework. (a) Field of research; (b) area of interest. To better understand the results’ evolution over time, project evaluation results are divided between projects started before the year 2011 (A) and after the year 2011 (B).
Figure
With this evaluation we identified several challenges and opportunities. Challenges relate to data discovery, access, acquisition, and ownership. This brings several opportunities to deploy future solutions that fully explore the enormous amounts of patient-level data, using technological paradigms that projects are already supporting.
There is a clear dichotomy regarding data. Patient-level data is a very specific use case for exploration. Although large amounts of data are scattered across multiple stakeholders, they are extremely difficult to obtain. As a result, there is, in the end, not enough data to draw statistically meaningful conclusions. Hence, we cannot discover or infer new knowledge because there is no access to a minimal amount of patient data. Along with distribution, data heterogeneity arises as a key challenge for exploring patient-level data. As shown in Figure
In the same vein, data translation also arises as a complex challenge for researchers. In addition to the obvious sense (translating data between multiple languages [
Data discovery, access, and acquisition are typical problems that can be solved by improving existing technologies and by focusing on their widespread adoption. Data ownership, by contrast, is a much more complex issue. Dealing with it involves tackling issues related to government policies, stakeholders’ interests, and projects’ internal guidelines. In an ideal scenario, all patient-level data should be available for research purposes. This should be particularly enforced in publicly funded projects. Yet, this does not happen. As seen in Figure
Great challenges bring great opportunities. From our review, we believe there is room for improving how we explore patient-level data and how we can use it to further improve research and development towards personalised medicine. As Figure
There is huge potential behind the combination of data available worldwide. Yet, we need to develop and disseminate new technologies that improve how relevant entities collect, store, and share patient-level data.
As data integration is already commonplace, to obtain real advances in this domain we must see worldwide patient-level data as a whole, and not as single detached data silos. Although we already have the technology to accomplish this, stakeholders must unite efforts to make this holistic view a reality.
At the technical level, opportunities arise that demand the creation of new software and new standards. Likewise, at a policy level, we must improve existing guidelines and policies to better cover data sharing, ownership, and ethics issues.
New data management standards should promote better (and easier) ways to access and share data. This will promote knowledge discovery and enable integration and interoperability among patient-level data silos throughout the world. Likewise, going from patient-level data to summary-level data, and vice-versa, should be a simple, straightforward process with the latest text-mining and semantic web tools.
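To make the distinction between patient-level and summary-level data concrete, the sketch below aggregates a few invented patient records into per-diagnosis summary rows. The records, field names, and the `summarise` helper are hypothetical; the sketch covers only the aggregation direction, since going back from a summary to individual records requires access to the underlying data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical patient-level records; field names and values are invented.
patients = [
    {"id": "p1", "diagnosis": "D1", "age": 34},
    {"id": "p2", "diagnosis": "D1", "age": 46},
    {"id": "p3", "diagnosis": "D2", "age": 60},
]


def summarise(records):
    """Aggregate patient-level records into per-diagnosis summary rows."""
    ages_by_diagnosis = defaultdict(list)
    for record in records:
        ages_by_diagnosis[record["diagnosis"]].append(record["age"])
    # Each summary row keeps only counts and averages, not individual patients.
    return {
        diagnosis: {"patients": len(ages), "mean_age": mean(ages)}
        for diagnosis, ages in ages_by_diagnosis.items()
    }


summary = summarise(patients)
```

The summary rows no longer identify individual patients, which is one reason summary-level data is typically easier to share than the records it was derived from.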
Ideally, new software will empower collaboration and sharing among patients and clinicians. Such tools should promote ease of access to patient information and enhance the communication process among clinicians. Furthermore, new tools are required to enhance data ownership controls, facilitating how patients, clinicians, or researchers express who has access to relevant personal data. More importantly, a combination of policies and guidelines should be put in place to foster the active involvement of patients in clinical care.
Despite the great opportunity for creating new standards and software, these assets alone are not enough to change the current scenario. New policies and guidelines, stemming directly from key worldwide stakeholders, must be disseminated to all interested parties. Moreover, with adequate support from governmental agencies (regional, national, and international), projects and their internal partners will proactively work towards implementing these new guidelines.
As this review reveals, there is room for change in the exploration of patient-level data. However, we must take into account that these results are biased and limited in scope. This is an ever-expanding field with many partners, projects, and companies working on this subject.
While we tried to be comprehensive, this review has obvious limitations. Namely, identifying each project’s features and technical/scientific outcomes was a complex task. Once projects finish, little to no effort is put into maintaining an accurate dissemination summary, and project results are rarely assessed a couple of years after each project’s conclusion.
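One way to picture such ownership controls is a registry in which each patient lists who may read their record. The sketch below is purely illustrative: the `ConsentRegistry` class, its methods, and the identifiers are hypothetical and do not describe any tool produced by the reviewed projects.

```python
class ConsentRegistry:
    """Hypothetical in-memory registry of who may read which patient's data."""

    def __init__(self):
        # Maps a patient identifier to the set of identifiers granted access.
        self._grants = {}

    def grant(self, patient_id, grantee):
        """Record that the patient allows the grantee to read their data."""
        self._grants.setdefault(patient_id, set()).add(grantee)

    def revoke(self, patient_id, grantee):
        """Withdraw a previously given grant; a missing grant is ignored."""
        self._grants.get(patient_id, set()).discard(grantee)

    def can_access(self, patient_id, requester):
        """Patients can always read their own record; others need a grant."""
        if requester == patient_id:
            return True
        return requester in self._grants.get(patient_id, set())


registry = ConsentRegistry()
registry.grant("patient-1", "clinician-7")
```

A real system would also need authentication, auditing, and persistent storage, which this sketch deliberately leaves out.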
The core focus of this review revolves around projects dealing with patient-level data stemming from electronic patient records. However, as shown in Figure
In a sense, patient sequencing data are patient-level data. Projects, such as 1000 Genomes [
In the long term, these data will be included in clinical patient registries. They may even be part of the electronic patient record. At this stage, clinicians will require new tools to adequately exploit the true value behind these data. In summary, this is a whole new field of exploration for personalised medicine and patient-level data research that cannot be ignored [
As detailed in previous sections, the various opportunities highlight the room for improvement in this domain. Assessing how the projects evolved over time, we find that the focus on sharing, dissemination, and patient control is of growing relevance in the field.
The creation of new technical standards and data sharing policies will be fundamental for future research. Moreover, these topics are emerging in current project calls. Thus, they are becoming a stepping-stone for future research and infrastructure initiatives.
Despite the scale of on-going projects, they will not cover every possible topic. Technological developments in analytics tools, text-mining, ontologies, semantic web, data visualisation, integration, and interoperability, originating from distinct areas, must be brought to patient-level exploration.
The semantic web arises as a groundbreaking paradigm to foster the intelligent integration of structured information. Sustained by state-of-the-art standards such as RDF, OWL, SPARQL, and Linked Data, the semantic web promotes better strategies to express, infer, and make knowledge interoperable.
The latest advances in the area cover the research and development of new algorithms to further improve how we collect data, transform data into meaningful knowledge assertions, and publish connected knowledge. To further improve this, we must rely on the latest text-mining technologies. Elevating clinical text data to abstract knowledge or mapping the best matching ontologies to patient datasets requires advanced text-mining solutions.
The combination of these strategies (semantic web, text-mining, and ontologies) will pave the way towards interoperable scientific knowledge. These technologies will foster data integration and interoperability, enabling an effortless connection between heterogeneous distributed knowledge obtained from patient-level data. Hence, the foundation of translational research, where multiple technical research areas collide, will be even more meaningful in the future.
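The core idea behind RDF is that knowledge is expressed as subject-predicate-object statements that can be queried by pattern. The sketch below imitates this in plain Python to show how two patterns can be joined; it is not an RDF library or a SPARQL engine, and every identifier in it (the `ex:` names, `match`, `patients_with_rare_disease`) is invented for illustration.

```python
# Each statement is a (subject, predicate, object) triple, as in RDF.
# All identifiers below are invented for illustration.
triples = {
    ("ex:patient1", "ex:hasDiagnosis", "ex:diseaseA"),
    ("ex:patient2", "ex:hasDiagnosis", "ex:diseaseA"),
    ("ex:patient2", "ex:enrolledIn", "ex:registryX"),
    ("ex:patient3", "ex:hasDiagnosis", "ex:diseaseB"),
    ("ex:diseaseA", "ex:isA", "ex:rareDisease"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def patients_with_rare_disease(store):
    """Join two patterns: ?p hasDiagnosis ?d, and ?d isA rareDisease."""
    rare = {s for s, _, _ in match(store, predicate="ex:isA", obj="ex:rareDisease")}
    return sorted(
        s for s, _, o in match(store, predicate="ex:hasDiagnosis") if o in rare
    )
```

Because statements from different sources share the same triple shape, data from separate registries could in principle be merged by taking the union of their triple sets, provided the sources agree on identifiers or supply mappings between them.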
Although this review had the main goal of covering the scientific results, we cannot ignore additional fundamental questions surrounding large-scale projects.
Hence, we must discuss the privacy policies applied to research-oriented datasets, the creation of businesses sustained by public funding, or the lack of publicly visible project evaluation outcomes.
The general community perceives that a huge amount of public funds is being poured into research projects in all areas. Still, the outcomes of these projects are not as public as desired. There is an underlying sense of fulfilment in investing in research, especially in fields related to the life sciences, such as rare disease treatments, pharmaceutical research, or any other relevant omics field: IMI, EC, and NIH are funding science.
Figure
Likewise, Figure
Lastly, it is very difficult to find project details and their respective evaluation results. It is as if the IMI, EC, and NIH project lists were difficult to access and lacked essential project details on purpose. The general audience cannot find out how projects are evaluated, their assessment results and, more importantly, their visible outcomes. Even though we concluded that most project results are private, the projects’ evaluation should be public. Furthermore, it should be supported by a clear long-term plan that assesses the proper use of public funds to actually advance research. Finished projects should be evaluated at multiple points in time, not just when the deadline is reached. Evaluating projects 2, 5, or 10 years after their finish date would improve the understanding of how well the large sum of invested money was spent.
The reality is that IMI, EC, and NIH are funding projects that have the liberty to create for-profit businesses and, more importantly, the liberty to apply public funds to the most diverse research tasks, whether or not they are directly related to the expected project results.
This review provides an overview of different initiatives that try to properly explore patient data. We limited our study to research and development projects in the recent past. We established base criteria to evaluate on-going initiatives. This resulted in the identification of several opportunities for future developments, namely, (1) bringing distributed data together by putting more advanced sharing and integration at clinicians’ fingertips; (2) focusing on text-mining and semantic web technologies to create real knowledge from distributed and heterogeneous data; and (3) pressuring stakeholders for stricter project evaluations that will foster a quicker pace of evolution. The lack of well-established and widely adopted solutions covering these areas represents a major roadblock for the adequate exploration of patient-level data. However, if future projects consistently adopt these overarching goals, personalised medicine will be one step closer.
More importantly, in addition to the research-specific evaluation outcomes, we must highlight the strange patterns behind large-scale project funding. Although IMI, NIH, and EC provide intensive financial support for research, what we witness is that the money is being used to create for-profit businesses and closed research datasets. Furthermore, funding agencies lack clear evaluation frameworks that properly assess the success of public investment into large-scale research.
The authors declare that there is no conflict of interests regarding the publication of this paper.
The research leading to these results has received funding from the European Community (FP7/2007–2013) under Reference no. 305444, the RD-Connect project, from EU/EFPIA Innovative Medicines Initiative Joint Undertaking (EMIF Grant no. 115372), and from the QREN “MaisCentro” program, Reference CENTRO-07-ST24-FEDER-00203, the Cloud Thinking project.