A Domain-Specific Terminology for Retinopathy of Prematurity and Its Applications in Clinical Settings

A terminology (or coding system) is a formal set of controlled vocabulary in a specific domain. With a well-defined terminology, each concept in the target domain is assigned a unique code, which can be identified and processed unambiguously across different medical systems. Although there are many well-known biomedical terminologies, there is currently no domain-specific terminology for ROP (retinopathy of prematurity). Based on a collection of historical ROP patient data in the electronic medical record system, we extracted the most frequent terms in the domain and organized them into a hierarchical coding system, the ROP Minimal Standard Terminology (ROP_MST), which contains 62 core concepts in 4 categories. This terminology has been successfully used to provide highly structured and semantically rich clinical data in several ROP-related applications.


Introduction
Retinopathy of prematurity (ROP) is a vaso-proliferative retinal disease affecting premature and low birth-weight infants. It is one of the main causes of childhood blindness worldwide. As the quality of perinatal care advances, the survival rate of premature infants increases steadily, making ROP a problem that cannot be neglected in either developed or developing countries. In China alone, about two million premature babies are born annually. With an ROP incidence of about 10% among premature babies [1], a conservative estimate is 200,000 infants with ROP each year. Timely screening and intervention have become a major challenge worldwide.
To address this problem, four years ago, we initiated the CMS-R (Case Management System for ROP) project. This system is designed to support effective clinical data management and provide cross-regional telemedicine for ROP screening. One prerequisite of CMS-R is a well-defined domain-specific terminology. Such a terminology is essential for achieving SDE (structured data entry) and generating highly structured clinical data. It can also be used for future data exchange with external health information systems. This paper introduces an ROP-specific terminology developed for CMS-R.

Related Work
Terminology, a.k.a. controlled vocabulary, is a collection of terms with explicitly defined meanings and unique codes in a specific domain. In the medical domain, there are hundreds of openly published terminologies. Readers may refer to https://www.nlm.nih.gov/research/umls/sourcereleasedocs/index.html for a list of medical terminologies. The following are some of the most widely used biomedical terminologies.
ICD (International Classification of Diseases) [2] organizes disease terms hierarchically according to their semantic relations. It is widely used in EMRS (Electronic Medical Record System) and HIS (Hospital Information System) as diagnostic codes. LOINC (Logical Observation Identifiers Names and Codes) [3] is a terminology of tests, measurements, and observations, which is widely used in LIS (Laboratory Information System). CPT (Current  Most biomedical terminologies are focused on a specific domain or developed for a special purpose. Within a specific domain, such specialized terminologies have several advantages over general-purpose ones: (1) Expressiveness: some fine-grained concepts in a specific domain may not be directly available in general-purpose terminologies. For example, "Type 1 ROP" is a special concept in the ROP domain for which it is difficult to find an off-the-shelf item in existing terminologies. (2) Efficiency: a specially tailored terminology can be more coherent and efficient in expressing certain domain concepts, whereas general-purpose medical terminologies may have to use complex postcoordinated expressions or combinations of multiple terms. (3) Reasoning and inference: specialized terminologies can use hierarchical coding systems to facilitate reasoning and semantic query. For example, H35.0 (background retinopathy and retinal vascular changes) and H35.1 (retinopathy of prematurity) in ICD-10 are sibling concepts under the common parent concept H35 (other retinal disorders).
Currently, there is no specially tailored terminology for ROP, which has hindered the effective application of ROP-related systems. In this manuscript, we introduce a domain-specific terminology for ROP and demonstrate several use cases in ROP-related applications.

Clinical Settings and Materials.
This study is conducted at Shenzhen Eye Hospital (SEH), a 200-bed class III  From the analysis, a total of 37,070 valid text strings are extracted, corresponding to 752 distinct narrative terms. We then sort the terms by frequency in descending order to determine which are used most often. As the number of distinct terms is not large (752), the ophthalmologists manually consolidated these terms (e.g., merging multiple free-text narrations of the same concept) and reorganized them into a hierarchical concept tree.
The ROP_MST Terminology.
Based on the above analysis, we built a hierarchical terminology, ROP_MST (ROP Minimal Standard Terminology), which contains 62 ROP-related core concepts in 4 primary categories (i.e., diagnosis, treatment, examination, and laterality). Each concept has a unique code and multiple aliases (equivalent narratives in different languages). The encoding rule is similar to that of ICD: the code of a subordinate concept is prefixed by the code of its superior concept. For example, intravitreal injection (T004) is a parent concept of Ranibizumab intravitreal injection (T004.M001). This encoding rule facilitates concept-level information retrieval and semantic reasoning. Users may refer to Tables 1-5 for the terminology.
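The prefix rule makes concept-level retrieval a simple string test. The following is a minimal sketch, not the CMS-R implementation: the function names and the sample records are hypothetical, and it assumes that sibling codes have a fixed width, so that one sibling's code is never a prefix of another's.

```python
def is_under(code: str, concept: str) -> bool:
    """True if `code` is `concept` itself or one of its subordinate concepts.

    Assumption: a subordinate code always starts with its superior's code,
    and sibling codes never prefix one another.
    """
    return code.startswith(concept)


def find_records(records: dict[str, list[str]], concept: str) -> list[str]:
    """Return the ids of records that carry `concept` or any concept below it."""
    return [
        record_id
        for record_id, codes in records.items()
        if any(is_under(code, concept) for code in codes)
    ]


# Hypothetical records; the codes themselves are taken from the text.
records = {
    "record-1": ["T004.M001"],       # Ranibizumab intravitreal injection
    "record-2": ["D002.A001.Z002"],  # a diagnostic code
    "record-3": ["T004"],            # intravitreal injection
}

# Querying the parent concept T004 also finds the more specific T004.M001.
print(find_records(records, "T004"))
```

A query for the parent concept thus returns records coded with any of its children, which a flat code list could only do with an explicit lookup table.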

Structured Data Entry.
A basic use of ROP_MST is SDE, which ensures highly structured and semantically rich clinical data for ROP-related information systems. In CMS-R (demo version: http://ropd.brahma.top), SDE is widely used. As shown in Figure 1, the diagnostic tree is arranged by the conceptual hierarchy of the terms. Users can click the triangle icon to expand or collapse branches. When a user clicks a child node, all parent nodes along its path are also selected. Users can express complex conditions by selecting multiple nodes. For example, "ROP Zone II Stage 4A ++" can be expressed by D002.A001, D002.A001.Z002, D002.A001.S004A, and D002.A001.P002. When a user saves patient data, the codes of the selected terms are persisted in the server-side database. As each concept/term is explicitly assigned a unique code, the ambiguity and inconsistency that arise from free-text input can be prevented.
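The selection behavior of the diagnostic tree can be sketched as follows. This is an illustration under stated assumptions rather than the CMS-R code: the function names are hypothetical, and every dot-delimited prefix of a code is assumed to be a node on its path. Under that assumption the root D002 is selected as well, although the example above lists only the four more specific codes.

```python
def path_to_root(code: str) -> list[str]:
    """Return the code and its dot-delimited ancestors, root first.

    Assumption: each dot-delimited prefix of a code is a node in the tree.
    """
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def select(selected: set[str], clicked: str) -> set[str]:
    """Selecting a node also selects every parent node along its path."""
    return selected | set(path_to_root(clicked))


# Clicking the zone, stage, and plus-disease nodes from the example above.
selection: set[str] = set()
for clicked in ("D002.A001.Z002", "D002.A001.S004A", "D002.A001.P002"):
    selection = select(selection, clicked)

# These codes are what would be persisted when the record is saved.
print(sorted(selection))
```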

Advanced Search.
Information retrieval is a common task for clinical information systems, for example, searching   learning techniques to train a classifier to identify whether a fundus image shows ROP. One prerequisite is a training set with high-quality class labels, for which the LabelR system (Labeling Tool for ROP, http://label.brahma.top) was developed. LabelR allows users to assign multiple unambiguous, fine-grained diagnostic labels from ROP_MST to each fundus image (Figure 2).

Conclusions and Discussions
The first version of ROP_MST was designed in 2013 and has been evolving since to better suit pediatric ophthalmologists' needs. Compared to other coding systems, the unique strength of ROP_MST is its specialty and domain orientation. All terms in ROP_MST are systematically organized by a hierarchical coding mechanism and are therefore easier to use in ROP-related applications. During the research, we also encountered several issues that warrant attention or future research.

Using Clustering Algorithms to Aggregate Terms.
In building ROP_MST, the disambiguation of multiple literal strings for the same concept was performed manually by pediatric ophthalmologists. For future ophthalmology terminologies, however, the total number of literal strings could be much larger (say, tens of thousands), in which case manual processing would become unrealistic. A feasible solution would be to combine a string similarity function (e.g., Levenshtein distance) with a text clustering algorithm (e.g., k-means) to group strings that denote the same concept.
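As a rough sketch of this idea, the code below computes the Levenshtein distance and groups strings whose normalized similarity to a cluster's first member exceeds a threshold. Note that this uses a simple greedy threshold grouping rather than k-means, since k-means needs vector representations and edit distance does not supply them directly. The function names, the 0.8 threshold, and the sample strings are illustrative assumptions, not part of ROP_MST.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; 1.0 means identical strings."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def cluster_terms(terms: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Greedily add each term to the first cluster whose first member is similar enough."""
    clusters: list[list[str]] = []
    for term in terms:
        for cluster in clusters:
            if similarity(term, cluster[0]) >= threshold:
                cluster.append(term)
                break
        else:
            clusters.append([term])
    return clusters


# Hypothetical free-text variants, for illustration only.
for group in cluster_terms(["ROP stage 3", "ROP Stage 3", "ROP stage3",
                            "laser photocoagulation"]):
    print(group)
```

The resulting groups would still need review by ophthalmologists, since strings that look alike can denote different concepts (e.g., different stage numbers).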

Mapping with Existing Coding Systems.
To integrate existing biomedical data encoded by traditional coding systems, it is essential to implement a terminology translation service. This service aims to map codes from existing coding systems to ROP_MST, which could be a rather complicated task due to the heterogeneity between terminologies. Although several concepts can be mapped directly (e.g., "retinopathy of prematurity" (H35.1, ICD-10) ↔ "ROP" (D002) and "stage of retinopathy in retinopathy of prematurity" (422746009, SNOMED CT) ↔ "ROP stage" (D002.A001.S)), others may require mapping combinations of multiple concepts between terminologies.
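For the directly mappable concepts, such a translation service could start as a lookup table. The sketch below is hypothetical: the names `MAPPINGS` and `to_rop_mst` are illustrative, and only the two mappings given above are included. Each target is a tuple of codes, so that a source code that corresponds to a combination of ROP_MST concepts could later be represented in the same table.

```python
from typing import Optional

# (source coding system, source code) -> ROP_MST codes.
# Only the two direct mappings stated in the text are listed.
MAPPINGS: dict[tuple[str, str], tuple[str, ...]] = {
    ("ICD-10", "H35.1"): ("D002",),                   # retinopathy of prematurity
    ("SNOMED CT", "422746009"): ("D002.A001.S",),     # ROP stage
}


def to_rop_mst(system: str, code: str) -> Optional[tuple[str, ...]]:
    """Return the mapped ROP_MST codes, or None if no mapping is known."""
    return MAPPINGS.get((system, code))


print(to_rop_mst("ICD-10", "H35.1"))
print(to_rop_mst("ICD-10", "H35.0"))  # unmapped codes are reported as None
```

Returning None for unmapped codes, instead of guessing a nearest concept, keeps gaps in the mapping visible for manual review.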

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.