Diagnostic Criteria of Postoperative Cognitive Dysfunction: A Focused Systematic Review

Postoperative Cognitive Dysfunction (POCD) is characterized by a deterioration in cognitive performance after surgery and is increasingly addressed in research studies. However, a uniform definition of POCD seems to be lacking, which is a major threat to clinical research in this area. We performed a focused systematic review to determine the current degree of heterogeneity in how POCD is defined across studies and to identify those diagnostic criteria that are used most commonly. The search identified 173 records, of which 30 were included. Neurocognitive testing was most commonly performed shortly before surgery and at 7 days postoperatively. A variety of neurocognitive tests were used to test a range of cognitive domains, including complex attention, language, executive functioning, perceptual-motor function, and learning and memory. The tests that were used most commonly were the Mini-Mental State Examination, the digit span test, the trail making test part A, and the digit symbol substitution test, but consensus on which test result would be considered “positive” for POCD was sparse. The results of this systematic review suggest the lack of a consistent approach towards defining POCD. However, commonalities were identified which may serve as a common denominator for deriving consensus-based diagnostic guidelines for POCD.


Introduction
Postoperative Cognitive Dysfunction (POCD) is characterized by a deterioration in cognitive performance after surgery and is particularly prevalent in elderly patients [1]. The cognitive decline is usually transient, but can be a substantial threat to the quality of life as it occasionally persists for months to years after surgery. The pathophysiology and risk factors are incompletely understood, and effective strategies for prevention and treatment have yet to be established. Thus, POCD is the target of an increasing number of research studies and reviews published in leading anesthesia journals [2][3][4].
Despite the clinical relevance and the need for standardized research, a uniform definition of POCD seems to be lacking. A systematic review in 2010 revealed a major inconsistency in diagnostic criteria, neurocognitive tests, and timing of the assessment of POCD across studies in cardiac surgery [5]. The need for uniform terminology and diagnostic criteria in POCD research has in the meantime been widely recognized, and recommendations on uniform nomenclature for perioperative neurocognitive disorders have been published [6]. Yet, it is unclear whether POCD research has recently adopted a more uniform approach to defining POCD and which criteria and tests are currently most commonly used to define POCD.
We thus performed a focused systematic review, limited to literature published in the last two years, to determine the current degree of heterogeneity in how POCD is defined and diagnosed across research studies. The results of this review may identify those tests and criteria that are currently used most often and can serve as a common denominator to derive a broader consensus on which tests and criteria should actually be used to diagnose POCD.

Methods
Titles and abstracts were screened by 2 authors, and the full text of potentially relevant articles was retrieved. Original research studies on human subjects, published in English, that explicitly reported which diagnostic criteria had been used to define POCD were eligible for inclusion. Data on study and patient characteristics, POCD criteria, and tests used to assess POCD, as well as the time points at which the tests were performed, were extracted using a standardized data extraction form.
Because the research question does not lend itself to quantitative pooling of data across studies, and due to substantial heterogeneity across studies, a meta-analysis of the collected data was not performed, and all results are presented descriptively.

Results
The PubMed search identified 173 records, and 61 full-text articles were assessed for eligibility (Figure 1). Of these, 30 studies reporting POCD criteria were included in this systematic review. Seven studies reported data from randomized controlled trials, while the rest had observational designs (cohort, case-control, and longitudinal studies). The study characteristics, POCD criteria, and POCD assessment tests and time points are summarized in Table 1.
In the majority of studies (n = 25), baseline neurocognitive performance was measured either on the day of surgery or on the day before surgery, while the other studies used varying preoperative time intervals. The most frequently used follow-up time was 7 days after surgery (n = 23). Ten of the 30 studies followed patients over a longer period, up to one year after surgery.
A variety of neurocognitive tests (Table 2) were used to test a range of cognitive domains, including complex attention, language, executive functioning, perceptual-motor function, and learning and memory. For a detailed description of the different cognitive domains, we refer to Sachdev et al. [37]; see also Evered and Silbert [3].
Anesthesiology Research and Practice
The tests used most commonly were the Mini-Mental State Examination (MMSE), the digit span test, which measures short-term memory, the trail making test part A (n = 13), which tests processing speed and mental flexibility by connecting numbered dots in sequence, and the digit symbol substitution test (n = 11), which measures visuoperceptual functions and motor speed. Not only the tests used and the timing of testing differed considerably between studies, but also the criteria used to define a "positive" test result. While some authors use a deterioration compared to baseline testing in terms of the absolute test score (e.g., >2-point deterioration in the MMSE), other authors define a positive test result as a deterioration of ≥1 standard deviation from the baseline measurement. Other authors use z-scores, in which the difference in the observed change from baseline between surgical patients and control patients is scaled by the standard deviation of control patients [38]. The "reliable change index" (RCI), a measure of change from baseline in units of the standard error of the change [39], has occasionally been used in addition to, or instead of, z-scores.
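The four kinds of criteria described above can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the scoring procedure of any included study. All function names and default thresholds are hypothetical, and higher scores are assumed to indicate better performance. The z-score divides by the standard deviation of the control group's change scores, which is one possible reading of [38]. The RCI uses a widely cited formulation based on test-retest reliability, which the text above does not specify.

```python
from math import sqrt
from statistics import mean, stdev


def absolute_decline(baseline, followup, threshold=2.0):
    """Positive if the raw score dropped by more than `threshold` points."""
    return (baseline - followup) > threshold


def sd_decline(baseline, followup, baseline_sd, n_sd=1.0):
    """Positive if the drop is at least `n_sd` baseline standard deviations.

    Assumption: `baseline_sd` is the SD of baseline scores in the cohort.
    """
    return (baseline - followup) >= n_sd * baseline_sd


def z_score(patient_change, control_changes):
    """Patient change minus mean control change, scaled by control SD.

    Assumption: the SD is taken over the controls' change scores
    (sample SD, n - 1 denominator).
    """
    return (patient_change - mean(control_changes)) / stdev(control_changes)


def reliable_change_index(baseline, followup, baseline_sd, test_retest_r):
    """Change from baseline in units of the standard error of the change.

    Assumption: standard error of the difference computed as
    sqrt(2) * SD * sqrt(1 - r), with r the test-retest reliability.
    """
    sem = baseline_sd * sqrt(1.0 - test_retest_r)  # standard error of measurement
    se_diff = sqrt(2.0) * sem                      # standard error of the change
    return (followup - baseline) / se_diff


# Illustrative numbers only, not data from any study.
controls = [1, 0, -1, 0, 1, -1]   # change scores (follow-up minus baseline)
print(absolute_decline(28, 25))
print(sd_decline(28, 25, baseline_sd=2.0))
print(round(z_score(25 - 28, controls), 2))
print(round(reliable_change_index(28, 25, baseline_sd=2.0, test_retest_r=0.75), 2))
```

The same three-point drop can be classified differently depending on the rule and on the cutoff applied to the z-score or RCI, which illustrates why the choice of criterion affects reported POCD incidence.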

Discussion
Symptoms of cognitive dysfunction are estimated to occur postoperatively in about 12% of patients without apparent preoperative cognitive dysfunction undergoing noncardiac surgery [4], and the incidence may be as high as 50-70% after cardiac surgery [40]. While these symptoms are often transient, cognitive impairment has been observed in up to 10% of elderly patients at 3 months after surgery [4], and it has been estimated that about one-half of elderly patients with POCD suffer permanent dysfunction [41]. POCD is thus evidently a major postoperative threat, particularly to elderly patients, and every effort is needed to prevent, diagnose, and treat POCD in this vulnerable patient population.
Unfortunately, the pathophysiology and risk factors are still poorly understood, and evidence-based treatment options are scarce. It is thus not surprising that POCD is the subject of an increasing number of research papers in the anesthesiologic, surgical, and neurologic literature. Using the search term "postoperative cognitive dysfunction" in PubMed yields almost 4,000 results at the time of writing of this manuscript, with a steady increase over the years: while there were 50 publications in the year 2000, this increased to 168 in 2010 and to 426 in 2019. The literature, however, has in the past been very heterogeneous in how POCD was defined, and this is a major threat to clinical research in this area: when the condition being studied is not well characterized and well defined, study results may not be readily comparable and applicable to clinical practice. Moreover, there are several types of cognitive impairment that can be observed after an operation, including delirium, POCD, and dementia. These terms are sometimes used interchangeably or with a broad overlap in the literature, and a clear distinction is often lacking. In 2010, Rudolph and colleagues demonstrated a substantial inconsistency in diagnostic criteria, neurocognitive tests, and timing of the assessment of POCD across studies in cardiac surgery [5]. Since then, multiple authors have emphasized the need for uniform terminology and diagnostic criteria in POCD research. However, while it is clear that a uniform approach to defining POCD was previously lacking, it is unclear whether this issue has been improved or even resolved in recent years.
In this focused systematic review, we therefore aimed to study how POCD is defined and measured in the recent literature. The results suggest that even in the most recent POCD literature, a consistent approach towards defining and measuring POCD is still lacking. As previously noted by other authors [4,42,43], uniform diagnostic criteria are needed, and this is clearly reinforced by our data. While our systematic review identified the lack of a consistent approach towards defining POCD, the data provide more than a mere description of heterogeneity across recent studies. We observed some patterns that may serve as a common denominator in the process of reaching standardized diagnostic criteria. With respect to timing, a follow-up measurement at 7 days postoperatively seems to be broadly accepted. With respect to the tests being used, the MMSE, digit span test, trail making test part A, and the digit symbol substitution test were used most commonly, suggesting a broad consensus that these tests are particularly useful for the diagnosis of postoperative neurocognitive disorders. However, it is still unclear which test result would be considered "positive" for POCD. Moreover, it should be noted that these tests have not been developed for POCD testing and have also not been rigorously validated for this purpose. For example, the MMSE has been shown to be less sensitive than the MoCA for a variety of neurocognitive disorders [44], and this could also be true for POCD. This implies that more research is required not only on the pathophysiology, prevention, and treatment of POCD, but also on the diagnostic accuracy [45] of the tests that are commonly used to diagnose POCD.
Our data suggest that strong efforts are necessary to define precise and applicable diagnostic criteria for POCD, involving key opinion leaders and researchers in the field. This would, in our opinion, be a strong and necessary step towards a precise characterization of the disease, its underlying pathophysiology, risk factors, and treatment options. Notably, however, the term POCD itself is perhaps too broad and vague and actually represents several distinct neuropathologic conditions, rather than one underlying common pathophysiology. If so, this may have important implications for the identification of risk factors as well as prevention and treatment strategies and may explain the heterogeneity in findings across the literature. In this context, Evered et al. have previously suggested that the overarching term POCD should be changed to "delayed neurocognitive recovery" for symptoms expected to have been resolved before 30 days and "postoperative mild neurocognitive disorder" or "postoperative major neurocognitive disorder" for an expected recovery between 30 days and 12 months, depending on the severity of symptoms [6]. Therefore, future efforts should perhaps focus less on defining diagnostic criteria for the overarching concept of "POCD" itself and more on the distinct postoperative neurocognitive disorders comprised within this umbrella term. Moreover, it is quite possible that some cognitive domains are more vulnerable after surgery and anesthesia than others, or that a cognitive decline in some domains affects patients more severely than a decline in other domains. We believe that identification of those domains that are particularly relevant in the context of POCD is an important topic for future research because broad testing for a decline in all cognitive domains may be neither necessary nor ideal. Focusing on those domains which are most vulnerable and/or most severely affect the patients' wellbeing would reduce the patients' burden of undergoing a whole battery of neurocognitive tests, facilitate clinical testing, and allow for a more efficient allocation of clinical and research resources.
The results of this systematic review must be viewed in the context of its limitations. First, this was a focused review, deliberately limited to the literature published in the last two years, and it thus does not claim to be an exhaustive review of all published literature on the topic. However, we specifically attempted to assess how POCD is currently diagnosed, and older literature would merely have been of historic interest. Second, we searched only one database, albeit one with comprehensive coverage of reputable medical literature. It was neither our intention to identify gray literature on the topic, nor to identify literature published in predatory or otherwise untrustworthy journals. We are also aware that the mere fact that a majority of studies used a certain definition to diagnose POCD does not necessarily mean that this is the optimal way to diagnose POCD. It does, however, suggest that there is a rather broad consensus among researchers that certain tests at certain time points may play a useful role and can serve as a starting point towards a more uniform definition and assessment of POCD in clinical practice and in research settings.
In conclusion, this systematic review identified the lack of a consistent approach towards defining POCD. Commonalities were identified which may serve as a common denominator for deriving consensus-based diagnostic guidelines for POCD. However, more research is necessary to characterize the diagnostic accuracy of the tests used to identify a postoperative cognitive decline and to determine what would constitute a "positive" test result. Finally, future efforts should perhaps focus less on finding a uniform definition of POCD itself and more on characterizing and defining the distinct postoperative neurocognitive disorders comprised within this overarching term.

Data Availability
All data were extracted from the published literature, which is available from the respective publishers upon request.

Conflicts of Interest
Kim van Sinderen, Lothar A. Schwarte, and Patrick Schober report no conflicts of interest regarding the publication of this paper.