Automation of Legal Precedents Retrieval: Findings from a Literature Review

Judges frequently base their reasoning on precedents. Courts must preserve uniformity in their decisions and, depending on the legal system, previous cases may compel rulings. The search for methods to accurately identify similar previous cases is not new and has been a vital input, for example, to case-based reasoning (CBR) methodologies. This literature review offers a comprehensive analysis of the advancements in automating the identification of legal precedents, primarily focusing on the paradigm shift from manual knowledge engineering to the incorporation of Artificial Intelligence (AI) technologies such as natural language processing (NLP) and machine learning (ML). While multiple approaches harnessing NLP and ML show promise, none has emerged as definitively superior, and further validation through statistically significant samples and expert-provided ground truth is imperative. Additionally, this review employs text-mining techniques to streamline the survey process, providing an accurate and holistic view of the current research landscape. By delineating extant research gaps and suggesting avenues for future exploration, this review serves as both a summation and a call for more targeted, empirical investigations.


Introduction
Civil, criminal, and administrative courts are challenged by the increasing need for justice systems' intervention in private and public affairs. Their actions must result in prompt and consistent judgments (Gomez, 2021; Rhode, 2004). Although the decision-making process of courts affects many facets of citizens' lives, these institutions have limited resources and strive to keep up with the rising caseload (Popova et al., 2021; Susskind, 2020).
Precedent is the basis of judges' reasoning in national legal systems. Judges often follow precedent closely to ensure legal certainty; otherwise, their rulings could be appealed to higher courts (Guillaume, 2011).
Likewise, the Common Law system treats similar past cases as precedents, implying that the outcome of a current case is compelled by past decisions (Rigoni, 2014). Even in Civil Law jurisdictions, courts are required to consider former rulings when case law is sufficiently uniform. Typically, once consistent jurisprudence is formed, precedents become "soft" law, and courts consider them when making decisions (Fon & Parisi, 2006).
Precedents are also fundamental to case-based reasoning (CBR). CBR draws on similar previous cases to apply prior knowledge to new questions, and it can clarify new situations by reasoning from precedents (Kolodner, 1992). Artificial Intelligence (AI) and Law, a branch of AI, makes extensive use of CBR to explore legal case-based reasoning (Roth, 2003).
Although CBR research has been applied to legal practice since the 1980s, techniques for identifying precedents are relatively young and understudied in AI and Law. While methods for mining textual data and natural language processing (NLP) have evolved and offer promising opportunities, few papers have studied strategies for detecting similarity and recognizing such past cases.
To our knowledge, no prior work has described the methodologies used to retrieve legal precedents [1]. Likewise, the state of research on this subject needs clarification so that researchers and courts can consider such AI-based assistants. This paper identifies the most promising findings and the knowledge gaps regarding how legal practitioners can employ AI to retrieve similar cases. Moreover, we investigate the effectiveness of text mining (TM) and NLP in this semi-automated systematic review of the literature. An earlier version of this paper was presented as a preprint (Mentzingen et al., 2022).
We concentrate mainly on the following research questions:

Theoretical Background
Systematic Literature Reviews
We intend to present an unbiased assessment of the literature on automating legal precedent identification. To this end, we conducted a systematic literature review (SLR). An SLR is a method of synthesizing scientific data from explicitly defined research questions. It follows rigorous procedures to locate, select, and assess relevant scientific research, and it collects and critically analyzes data from the selected studies (Moher et al., 2009).
The systematic review process is based on predefined criteria and a protocol (Jahan et al., 2016), producing high-value evidence syntheses that inform decisions. However, completing the process with the methodological rigor that makes SLR evidence reliable frequently takes one to two years (Bero et al., 2012; Ganann et al., 2010; Khangura et al., 2012; Tsafnat et al., 2014). Garritty et al. (2021) note that this duration reduces the usefulness of SLRs, since it does not accommodate stakeholders' time constraints.

Rapid Literature Reviews
Various strategies exist to make reviews more time-efficient, and they can be employed separately or together. Review shortcuts are among these mechanisms, through which one or more steps of a systematic review are simplified or omitted. Moreover, automating review steps accelerates the typical systematic review approach (Tricco et al., 2017). In this sense, Rapid Reviews (RRs) have become an alternative way to save time and resources on literature reviews while preserving the core principles of knowledge synthesis (Tricco et al., 2017).
Despite a surge in RR production, their methodology is still underdeveloped, and there is no universally recognized definition of an RR (Khangura et al., 2012). An extension of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) for RRs is in progress (Stevens et al., 2018). However, until it is formally finalized, authors are encouraged to follow the generic PRISMA criteria as closely as possible and adapt them appropriately (Garritty et al., 2021).
PRISMA's four stages (identification, screening, eligibility, and inclusion) are used in this study to guide the critical components of an SLR.

Automating Systematic Literature Reviews
Numerous researchers have studied methods for automating the SLR procedure (van Dinter et al., 2021). Conducting an SLR to the best standards and with the required level of rigor is complex, involves multiple stages, and is considerably time-consuming (Feng et al., 2018). Text analytics and machine learning techniques could address this scaling issue while preserving the rigor expected of SLRs (Zimmerman et al., 2021). For this reason, automated literature reviews have been applied to numerous subjects, including tourism and hospitality (António et al., 2019), climate change (Schweizer et al., 2022), and healthcare (Golinelli et al., 2022).
A recent review screened 41 articles focused on automating SLRs and concluded that the selection of primary studies was the most automated step. Although various studies have proposed automation techniques for SLRs, none had automated the planning and reporting phases (van Dinter et al., 2021).
Based on the SLR automation solutions proposed in 32 studies, four groups of applications were identified: automated document/text categorization; text mining based on visual methods (VTM), such as word clouds; federated search strategies, i.e., searching multiple data sources at once; and document summarization (Feng et al., 2018). In this study, we used Python (Van Rossum & Drake, 2009) and a combination of TM methods to identify primary studies for human screening.
VTM, using keywords and word clouds built from the most frequent terms, enabled us to determine the most pertinent terms and authors. We also used the vector representation of the studies to automate their classification. Documents were deduplicated based on their similarity and filtered based on keywords.
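Similarity-based deduplication of records merged from several databases could be approximated along the following lines. This is a minimal sketch, not the procedure used in this review (whose preprocessing is described in Appendix A): it assumes Jaccard similarity over lower-cased word sets and a hypothetical threshold of 0.9, and it keeps the first record of each near-duplicate group.

```python
import re


def word_set(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens of a record (no stemming, for simplicity)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(records: list[str], threshold: float = 0.9) -> list[str]:
    """Keep a record only if it is not a near-duplicate of one already kept.

    The 0.9 threshold is a hypothetical value chosen for illustration.
    """
    kept: list[str] = []
    kept_sets: list[set[str]] = []
    for text in records:
        tokens = word_set(text)
        if all(jaccard(tokens, other) < threshold for other in kept_sets):
            kept.append(text)
            kept_sets.append(tokens)
    return kept


abstracts = [
    "Retrieval of legal precedents using case-based reasoning.",
    "Retrieval of Legal Precedents Using Case-Based Reasoning",  # same record, other database
    "Topic modelling of court decisions.",
]
print(deduplicate(abstracts))  # keeps the first and third abstracts
```

Comparing each record with every kept record is quadratic, which is unproblematic for a few hundred search results but would call for blocking or hashing on much larger corpora.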
Lastly, Latent Dirichlet Allocation (LDA) was used to determine document eligibility. This was done by associating each paper with a distribution over Topics, each Topic having a corresponding probability. In this way, documents were softly clustered and assigned to the Topic with the highest probability. Fig. 1 shows the number of records remaining after each phase.

Keyword Identification
Precedents are influential in resolving any legal question (Fon & Parisi, 2006). According to the Legal Information Institute, based at Cornell Law School, a precedent is a decision that serves as a starting point for resolving later cases involving similar facts or legal matters. However, a case cannot serve as a precedent if its facts or issues differ from those of the current case (Cornell University Law School, 2020).
In this sense, identifying legal precedents is preceded by finding previous similar cases. Accordingly, we derived central expressions from the research questions that refer to detecting similar cases. We also used 'precedent' as the corresponding legal term and consulted the study by Mandal et al. (2021) as a seminal reference on identifying precedents in courts of justice. The resulting set was 'precedents identification', 'precedent retrieval', and 'case similarity'. Since CBR models are used in many disciplines, including engineering and medicine, we added the term 'case-based reasoning' as a potential alternative. We made the keyword 'legal' a required component to focus the search on the legal domain.

Electronic Databases and Eligibility Criteria
We used Scopus, owned by Elsevier, and Web of Science (WoS), managed by Clarivate (formerly part of Thomson Reuters), as sources because of their extensive coverage of publications. Moreover, these are the most widely used electronic databases for bibliometric analyses (Mongeon & Paul-Hus, 2016).
Given that most relevant research is published in English, the search was limited to English-language publications to reduce text-mining obstacles (António et al., 2019). Additionally, to appraise the most recent methodologies in line with advances in TM and NLP, we restricted the search to research published from 2000 onward ('Year' > 1999). We did not restrict the publication types returned by the search because precedent identification is still a relatively unexplored topic. Fig. 2 and Fig. 3 show the queries on Scopus and WoS, respectively.
The results of the search queries were combined into a single dataset that underwent preprocessing, including deduplication and feature engineering, before being used for further screening. Details about the extraction and preprocessing of the source list are presented in Appendix A.

Screening Using Keyword Frequency
The queries used to search the electronic databases could retrieve any study related to legal precedents, even those not involving computational methods. We therefore applied a screening process so that publications not mentioning computational techniques for identifying similar cases could be disregarded, and we chose to semi-automate the screening of the remaining 160 documents. This process started with tokenizing and stemming the abstracts, which yielded 3256 unique unigrams and 14675 unique bigrams. The numbers of occurrences of these tokens are shown in Fig. 4 and Fig. 5.
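Counting unique unigrams and bigrams after tokenization and stemming can be sketched as follows. The suffix-stripping function is a hypothetical, deliberately crude stand-in for an established stemmer such as Porter's, so its stems will not match those reported in this section's figures; joining hyphenated compounds (so that 'case-based' becomes 'casebased') is an assumption suggested by the token 'casebased' discussed here.

```python
import re
from collections import Counter

# Hypothetical suffix list; a real pipeline would use an established stemmer.
SUFFIXES = ("ations", "ation", "ings", "ing", "ies", "es", "s")


def crude_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text: str) -> list[str]:
    """Lower-case, join hyphenated compounds, split on non-letters, then stem."""
    text = re.sub(r"(?<=[a-z])-(?=[a-z])", "", text.lower())
    return [crude_stem(token) for token in re.findall(r"[a-z]+", text)]


def ngram_counts(abstracts: list[str]) -> tuple[Counter, Counter]:
    """Corpus-wide occurrence counts of unigrams and within-abstract bigrams."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for abstract in abstracts:
        tokens = tokenize(abstract)
        unigrams.update(tokens)
        bigrams.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return unigrams, bigrams


uni, bi = ngram_counts([
    "Case-based reasoning retrieves similar cases.",
    "Legal cases and case-based reasoning.",
])
print(len(uni), len(bi))  # prints: 7 7
```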
Stemmed words (tokens) may be indicative of computer-aided precedent retrieval applications, so we used word clouds to inspect them. In this visual representation, the more frequently a word occurs in the corpus, the more prominently it appears. The tokens shown in Fig. 6 were removed to prevent non-discriminatory words from crowding the word clouds.
When building the unigram word cloud, the tokens 'reasoning', 'cas', and 'casebased' were discarded because they appeared in over half of the texts. Similarly, 'casebased reasoning' appeared in more than 30% of the samples and was removed when the word cloud was built from bigrams. The word clouds were created using the WordCloud library (Oesper et al., 2011) and are shown in Fig. 7 and Fig. 8.
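Discarding tokens that appear in more than a given share of the texts (half for unigrams, 30% for bigrams) is a document-frequency cutoff. A minimal sketch follows, assuming each abstract is already a list of stemmed tokens; the example tokens are hypothetical. scikit-learn's `CountVectorizer` offers the same cutoff through its `max_df` parameter, which likewise ignores terms whose document frequency is strictly higher than the threshold.

```python
from collections import Counter


def frequent_terms(docs: list[list[str]], max_df: float) -> set[str]:
    """Terms whose document frequency, as a fraction of all documents, exceeds max_df."""
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term for term, df in doc_freq.items() if df / len(docs) > max_df}


def drop_terms(docs: list[list[str]], stop: set[str]) -> list[list[str]]:
    """Remove the given terms from every document."""
    return [[term for term in doc if term not in stop] for doc in docs]


docs = [["cas", "reason", "legal"], ["cas", "reason", "retriev"], ["cas", "ontolog"]]
too_common = frequent_terms(docs, max_df=0.5)  # the unigram cutoff used in the text
print(sorted(too_common))            # ['cas', 'reason']
print(drop_terms(docs, too_common))  # [['legal'], ['retriev'], ['ontolog']]
```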
The terms 'syst' (including 'support syst'), 'artific intelligenc', 'machine learning', 'natur language', and 'computat model' pointed to the most promising segment of the literature. Accordingly, we kept only the records containing such tokens in the title, abstract, or keywords, yielding 101 samples.
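The keyword filter can be sketched as a whole-token phrase match over the title, abstract, and keywords. The sketch assumes these fields have already been lower-cased and stemmed in the same way as the listed tokens; the two example records are hypothetical.

```python
import re

# Tokens listed in the text, matched as whole-token phrases.
PROMISING_TOKENS = ("syst", "artific intelligenc", "machine learning",
                    "natur language", "computat model")


def contains_phrase(text: str, phrase: str) -> bool:
    """True if `phrase` occurs in `text` delimited by word boundaries."""
    return re.search(rf"\b{re.escape(phrase)}\b", text) is not None


def keep_record(record: dict[str, str]) -> bool:
    """Keep a record if any promising token appears in its title, abstract, or keywords."""
    # The separator prevents a phrase from matching across two different fields.
    fields = " | ".join(record.get(key, "") for key in ("title", "abstract", "keywords"))
    return any(contains_phrase(fields, phrase) for phrase in PROMISING_TOKENS)


records = [
    {"title": "a decis support syst for court", "abstract": "", "keywords": ""},
    {"title": "histori of common law", "abstract": "doctrin of preced", "keywords": "stare decisi"},
]
print([keep_record(r) for r in records])  # [True, False]
```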

Eligibility Screening based on Topic Modelling
During the eligibility assessment stage of an SLR, publications are assessed using pre-specified methods, so the review team must define the criteria for including and excluding publications. This stage reduces the set of studies under evaluation, keeping the evidence that can answer the research questions (Frampton et al., 2017).
We used topic modeling to cluster the papers and select the studies most likely to answer the research questions. Although topic modeling is not a recent concept, remarkably few publications use it to cluster research papers (Asmussen & Møller, 2019). We chose LDA (Blei et al., 2003), which is usually the preferred approach to topic modeling and is considered state-of-the-art (Asmussen & Møller, 2019), implemented with the Gensim library (Rehurek & Sojka, 2011). LDA is a generative statistical topic model that uncovers semantic Topics in large text collections and classifies documents into these Topics according to their distance from a given Topic (António et al., 2019; Hu, 2009).
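To make the generative model concrete, a minimal LDA can be fitted with collapsed Gibbs sampling. This is a different inference method from the one in Gensim's `LdaModel`, which uses online variational Bayes; the sketch only illustrates how each document ends up with a probability distribution over Topics. The hyperparameters (`alpha`, `beta`, iteration count, seed) and the toy corpus are assumptions.

```python
import random
from collections import defaultdict


def lda_gibbs(docs: list[list[str]], n_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iterations: int = 200,
              seed: int = 0) -> list[list[float]]:
    """Fit LDA by collapsed Gibbs sampling; return each document's topic distribution."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * n_topics for _ in docs]                 # n[d][k]
    topic_word = [defaultdict(int) for _ in range(n_topics)]  # n[k][w]
    topic_total = [0] * n_topics                              # n[k]
    assignments = []                                          # z[d][i]

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for word in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][word] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1

    # Posterior mean estimate of each document's topic proportions (theta).
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha)
         for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


docs = [
    ["preced", "retriev", "court", "similar"],
    ["court", "preced", "similar", "judgment"],
    ["hotel", "tourism", "review", "guest"],
    ["tourism", "guest", "hotel", "book"],
]
theta = lda_gibbs(docs, n_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```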
We classified the documents into four Topics. The Topic most explicitly linked to the research subject was chosen as the eligibility criterion for including a document in the literature review, which resulted in forty eligible documents. The procedure for obtaining the optimal number of Topics and the description of the Topics are presented in Appendix B.
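Assigning each document to its most probable Topic and keeping those whose dominant Topic is the one chosen as the eligibility criterion amounts to an argmax filter over the document-topic distributions. A minimal sketch; the distributions, document IDs, and chosen Topic index (0-based here) are hypothetical.

```python
def dominant_topic(distribution: list[float]) -> int:
    """Index of the most probable topic (ties go to the lowest index)."""
    return max(range(len(distribution)), key=distribution.__getitem__)


def eligible(doc_ids: list[str], theta: list[list[float]], chosen_topic: int) -> list[str]:
    """IDs of documents whose dominant topic is the chosen one."""
    return [doc_id for doc_id, dist in zip(doc_ids, theta)
            if dominant_topic(dist) == chosen_topic]


theta = [
    [0.10, 0.20, 0.60, 0.10],
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.10, 0.80, 0.05],
]
print(eligible(["doc-a", "doc-b", "doc-c"], theta, chosen_topic=2))  # ['doc-a', 'doc-c']
```

Because only the dominant Topic is used, a document split almost evenly between two Topics is assigned outright to one of them; adding a minimum-probability threshold is one possible variation.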

Full-text Screening for Inclusion
After clustering the documents according to their most relevant Topics and selecting the publications mainly related to Topic 2, we examined the full texts of the remaining forty studies to eliminate those irrelevant to automating similar-case identification or precedent retrieval. The nineteen publications focusing on subjects unrelated to the research questions are summarized in Appendix C, which lists the titles of the excluded documents and their respective research topics. Appendix D describes the method used to validate topic-model clustering as an effective way to assess document eligibility.
In the end, the final analysis included the twenty-one papers listed in Table 1.

Results
Descriptive Analytics
Fig. 9 shows that the studies were published in numerous sources, including conference proceedings. The peer-reviewed articles appeared in six different journals, one article per journal.
Examining the number of publications per year and by geographical origin (Fig. 10), we observe a few studies, approximately one per year, in the first decade of the 2000s. These initial studies were followed by a long period with virtually no publications in the field. From 2016 onward, growing interest in the field is observable, with Indian researchers contributing the most. This growth possibly results from developments in TM and NLP in the latter half of the 2010s: word embeddings and neural network (NN) applications to NLP (Mikolov et al., 2013), recurrent neural networks (RNN) (Cho et al., 2014) and Long Short-Term Memory networks (LSTM) (Yao et al., 2019), attention mechanisms and Transformers (Vaswani et al., 2017), and language models pre-trained through transfer learning (Howard & Ruder, 2018; Qiu et al., 2020).

Content Analysis
The first two relevant studies (Elhadi, 2000, 2001) presented a simplified model to store and retrieve information in the legal context. The model approximated the human cognitive process and combined keywords with related scenarios to facilitate understanding. The scheme matched individual cases to story patterns, clustering cases with some similarity. However, the patterns, keywords, and their importance were manually crafted for the specific domain, e.g., bankruptcy legislation.
The following paper proposed a model using content vectors to retrieve principles and previous cases, assessing similarities in each case's actions and events (McLaren, 2003). Content vectors summarize the information contained in intricate relational structures. A case description's content vector identifies the functors[2] used in that description and how frequently they occur, including connectives, relations, object properties, and functions (Forbus et al., 1994).
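The idea of a content vector can be illustrated with facts written as nested tuples `(functor, arg, ...)`. The sketch counts functor occurrences, including functors in nested expressions, and scores two descriptions by the dot product of their vectors, in the spirit of the cheap first-stage filter of Forbus et al. (1994). The functors and entities in the example are hypothetical and are not McLaren's (2003) representation.

```python
from collections import Counter


def content_vector(facts: list[tuple]) -> Counter:
    """Count how often each functor occurs, including functors of nested expressions."""
    counts: Counter = Counter()

    def visit(expr) -> None:
        if isinstance(expr, tuple):   # (functor, arg1, arg2, ...)
            counts[expr[0]] += 1
            for arg in expr[1:]:
                visit(arg)            # plain strings are entities, not functors

    for fact in facts:
        visit(fact)
    return counts


def content_score(a: Counter, b: Counter) -> int:
    """Dot product of two content vectors: a cheap proxy for structural overlap."""
    return sum(count * b[functor] for functor, count in a.items())


case_a = [("cause", ("fail-to-disclose", "engineer", "client"), ("harm", "client")),
          ("employs", "client", "engineer")]
case_b = [("cause", ("fail-to-disclose", "engineer", "public"), ("harm", "public"))]
print(content_score(content_vector(case_a), content_vector(case_b)))  # 3
```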
A 2004 paper presented an algorithm for obtaining valuable legal information that constructed case instances from prior lawsuits, recognized comparable instances, and refined them by combining cases and deleting irrelevant data. It required encoding the texts as ordered sets of keywords and evaluating the similarity between pairs of case scenarios with word-count-based metrics. Using predetermined crime types, the nearest neighbors formed clusters (C.-L. Liu et al., 2004).
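Nearest-neighbor labeling over keyword-encoded cases could look like the sketch below, which assigns a new case the majority crime type among its k most similar prior cases. The word-count-based similarity used here (the size of the multiset intersection of keywords), the value of k, and the example cases are assumptions, not the exact metric of C.-L. Liu et al. (2004).

```python
from collections import Counter


def overlap(a: list[str], b: list[str]) -> int:
    """Shared keyword occurrences between two cases (size of the multiset intersection)."""
    return sum((Counter(a) & Counter(b)).values())


def knn_label(query: list[str], cases: list[tuple[list[str], str]], k: int = 3) -> str:
    """Majority label among the k prior cases with the largest keyword overlap."""
    ranked = sorted(cases, key=lambda case: overlap(query, case[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


cases = [
    (["theft", "night", "vehicle"], "theft"),
    (["theft", "store", "cash"], "theft"),
    (["assault", "knife", "night"], "assault"),
    (["fraud", "contract", "cash"], "fraud"),
]
print(knn_label(["theft", "vehicle", "cash"], cases))  # theft
```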
The subsequent research introduced a function based on Nonlinear Nearest-Neighbor (NNN) for finding similar accident compensation cases. This method compared cases along "dimensions", which can be interpreted as factors for representing, analogizing, and differentiating legal cases. Based on data observation, groupings of variables were defined to represent clusters of cases; the authors employed four such groupings (Wang & Zeng, 2005).
Nouaouria et al. (2006) presented a prototype tool applying interpretative case-based reasoning to verdicts involving alcohol consumption and smoking under Islamic legislation. Cases were represented by attributes inferred as relevant to historical decisions, such as the type of fact and the product used, and similarity was addressed by grouping cases with similar attributes.
Between 2008 and 2009, a CBR-inspired methodology for retrieving legal precedents represented lawsuits as pairs of attributes and their respective values, drawn from expert knowledge and critical legal circumstances. The process then computed case similarity, and cases with strong similarities were selected according to the value distribution of each attribute (Raman & Palanissamy, 2008, 2009).
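A weighted attribute-matching score is one simple way to compare cases represented as attribute-value pairs. In the sketch below, similarity is the share of the query's attribute weight whose values the candidate case matches; the attribute names, weights, and threshold are hypothetical and are not taken from Raman and Palanissamy (2008, 2009).

```python
def case_similarity(query: dict[str, str], case: dict[str, str],
                    weights: dict[str, float]) -> float:
    """Weighted fraction of the query's attributes whose values the case matches."""
    total = sum(weights[attr] for attr in query)
    matched = sum(weights[attr] for attr, value in query.items() if case.get(attr) == value)
    return matched / total if total else 0.0


def retrieve(query: dict[str, str], cases: dict[str, dict[str, str]],
             weights: dict[str, float], threshold: float = 0.6) -> list[tuple[str, float]]:
    """Cases scoring at or above the threshold, most similar first."""
    scored = [(name, case_similarity(query, case, weights)) for name, case in cases.items()]
    return sorted([s for s in scored if s[1] >= threshold], key=lambda s: s[1], reverse=True)


weights = {"duty_of_care": 3.0, "breach": 2.0, "damage_type": 1.0}
query = {"duty_of_care": "yes", "breach": "yes", "damage_type": "physical"}
cases = {
    "case-1": {"duty_of_care": "yes", "breach": "no", "damage_type": "physical"},
    "case-2": {"duty_of_care": "no", "breach": "no", "damage_type": "economic"},
}
print(retrieve(query, cases, weights))  # only case-1 passes the threshold
```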
In all the studies analyzed so far, cases were represented using a predetermined set of dimensions corresponding to the attributes of each sample. Consequently, the factors or dimensions describing each case were subject-specific. Indeed, there were two broad approaches to information retrieval (IR): methods built on manual knowledge engineering (KE) and methods based on NLP. The former were limited by the available technology and scientific knowledge. Up to this point, the studies emphasized KE-based retrieval, even though such methods were not viable in the long run (Maxwell & Schafer, 2008).
McLaren and Ashley (2011) evaluated the influence of the temporal ordering of facts in distinguishing among cases involving ethical issues. Each case was represented by a predefined set of actors, objects, actions, and events appearing in the narrative. Temporal knowledge was expressed through a time qualifier: an association between a time and how a fact relates to other facts. The study did not detail the logic used to obtain the time qualifier for each case, and the authors could not confirm the hypothesis that introducing temporal knowledge into a computational model would increase the accuracy of its predictions.
Eyorokon et al. (2016) presented Kyudo, a system that used conversational CBR to support knowledge discovery. In Kyudo, cases were represented as sequences of questions or knowledge goals encoded as TF-IDF vectors. The similarity of answers or new goals to the existing knowledge base was calculated through a dialogue between the user and the system. By expressing knowledge goals as multidimensional vectors of semantic attributes, the system could recognize similarities with other knowledge goals and alert the user to other pertinent goals as the number of examples increased.
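Representing texts as TF-IDF vectors and comparing them by cosine similarity can be sketched with the standard library alone. The sketch uses the unsmoothed weighting tf · log(N/df), one of several common TF-IDF variants (scikit-learn, for instance, smooths the IDF by default); Kyudo's exact weighting is not specified here, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors with weight = term count * log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / doc_freq[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    ["precedent", "retrieval", "court"],
    ["precedent", "retrieval", "ranking"],
    ["hotel", "review", "tourism"],
]
vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Note that a term present in every document receives zero weight under this variant, which is one reason smoothed IDF formulas are often preferred in practice.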
Later, Oconitrillo and De La Ossa Osegueda (2017) proposed a Business Intelligence (BI) solution to support judges' decision-making. The authors advocated a new formal representation of legal cases based on facts and attributes. It considered how the law is applied during a judge's reasoning process to decide each case and the relations a judge establishes among such regulations. Nevertheless, the authors did not provide a solution for automating the retrieval of attributes and their values.
With Kulkarni et al. (2017) proposing the detection of precedents using Regular Expression rules, cosine similarity between Doc2Vec embeddings (Le & Mikolov, 2014), and topic modeling (Blei et al., 2003), we notice a move toward TM and NLP. Another study published in 2017 detected precedents by integrating genetic algorithms (GAs) with k-nearest neighbors (KNN) (Zhang et al., 2017).
The usefulness of legal catchphrases for precedent retrieval was also studied. Thuma and Motlogelwa (2017) extracted legal catchphrases from new cases, primarily bigrams and trigrams represented by TF-IDF vectors, and compared them with gold-standard catchphrases extracted from previous cases. The results indicated that the technique needed improvement.
A 2018 article used the citations in a document to perform association-rule mining as an alternative way of identifying similarities; here, cases with matching citations were considered comparable (Nair & Wagh, 2018).
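Support and confidence over citation sets can be computed as in the sketch below, where each case is treated as a transaction whose items are the statutes and cases it cites. For brevity it enumerates only item pairs rather than running a full frequent-itemset algorithm such as Apriori; the citation labels and thresholds are hypothetical.

```python
from itertools import combinations


def support(itemset: frozenset, transactions: list[frozenset]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(transactions: list[frozenset], min_support: float = 0.5,
               min_confidence: float = 0.6) -> list[tuple[str, str, float, float]]:
    """Rules lhs -> rhs between single citations, as (lhs, rhs, support, confidence)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(frozenset((a, b)), transactions)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support(frozenset((lhs,)), transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Each case is the set of statutes/cases it cites (hypothetical labels).
transactions = [
    frozenset({"S1", "S2", "C1"}),
    frozenset({"S1", "S2"}),
    frozenset({"S1", "C2"}),
    frozenset({"S3", "C2"}),
]
for rule in pair_rules(transactions):
    print(rule)
```

A high-confidence rule such as S2 → S1 signals citations that tend to co-occur; cases sharing such citation patterns can then be treated as comparable.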
To identify similar cases from unstructured text, Amin et al. (2019) used an unsupervised Autoencoder (Baldi, 2012) together with LSTM as a substitute for neural-network-based word embeddings such as Word2Vec (Mikolov et al., 2013). This approach reportedly led to faster training and more accurate results (Amin et al., 2019).
Using Named Entity Recognition (NER) (Mansouri et al., 2008) to preprocess the documents and the input query, More et al. (2019) reported extracting data from legal texts. Texts were vectorized with TF-IDF, and documents were compared with BM25 (Robertson & Zaragoza, 2009). This approach won the Artificial Intelligence for Legal Assistance (AILA) track at the 2019 FIRE Conference (Bhattacharya et al., 2019), in which the task involved identifying legal precedents. In the subsequent edition of the FIRE Conference, Di Nunzio (2020) explored reducing the dimensionality of vectorized texts through lemmatization and stemming, comparing techniques for retrieving precedents without outstanding results for any method.
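BM25 scores a document by summing, over the query terms, an IDF weight multiplied by a saturated, length-normalized term frequency. A compact Okapi BM25 sketch follows. The values k1 = 1.2 and b = 0.75 are common defaults, the IDF uses the non-negative "+ 1" variant found in Lucene rather than Robertson's original form, and the toy corpus is hypothetical; this is not the implementation used by More et al. (2019).

```python
import math
from collections import Counter


class BM25:
    """Okapi BM25 over a corpus of pre-tokenized documents."""

    def __init__(self, corpus: list[list[str]], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.term_freqs = [Counter(doc) for doc in corpus]
        self.lengths = [len(doc) for doc in corpus]
        self.avgdl = sum(self.lengths) / len(corpus)
        n_docs = len(corpus)
        doc_freq = Counter(term for doc in corpus for term in set(doc))
        # Non-negative IDF variant (as in Lucene); Robertson's original omits the "+ 1".
        self.idf = {term: math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
                    for term, df in doc_freq.items()}

    def score(self, query: list[str], index: int) -> float:
        tf, length = self.term_freqs[index], self.lengths[index]
        total = 0.0
        for term in query:
            f = tf[term]
            if f == 0:
                continue  # a term absent from this document contributes nothing
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = f + self.k1 * (1 - self.b + self.b * length / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def rank(self, query: list[str]) -> list[int]:
        """Document indices ordered from highest to lowest score."""
        return sorted(range(len(self.term_freqs)),
                      key=lambda i: self.score(query, i), reverse=True)


corpus = [
    ["dowry", "death", "husband", "cruelty"],
    ["land", "acquisition", "compensation"],
    ["dowry", "cruelty", "evidence"],
]
model = BM25(corpus)
print(model.rank(["dowry", "cruelty", "death"]))  # [0, 2, 1]
```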
More recently, Arora et al. (2020) used the novel text embedding technique Top2Vec (Angelov, 2020) in combination with BM25 to retrieve precedents. The authors reported that combining Top2Vec-based similarity with BM25 outperformed BM25-only measures.
The most recent publication in this rapid literature review was released in 2021. Mandal et al. (2021) presented a comprehensive study of fifty-six combinations of document representation techniques (eight) and similarity measures (seven) for identifying similarities between case report texts. Their methods included author-designed approaches, BERT, and Law2Vec (Chalkidis, 2018), a set of Word2Vec embeddings trained on legal corpora.
Comparing the similarity measurement methods, the authors observed similar performance between neural-network-based embeddings (Word2Vec, Doc2Vec, and Law2Vec) and conventional representations, whereas BERT produced unsatisfactory results. They also noted that conventional bag-of-words vectorization techniques, such as TF-IDF, outperformed more sophisticated context-aware methods (such as Law2Vec and BERT).
The Era of Manual Knowledge Engineering (2000-2009)
As identified in the content analysis, the eight papers published in 2009 or earlier represented cases using a predetermined set of dimensions corresponding to the attributes of each sample; these can be grouped as methods built on manual knowledge engineering (KE). Early computational models, such as Elhadi's (2000, 2001) studies, aimed to store and retrieve information by manually matching cases to story patterns. The efforts of this period relied primarily on predetermined sets of attributes derived from keywords and factual aspects, crafted for the specifics of the domain, such as bankruptcy legislation. Such dimensions were handcrafted from intricate cognitive structures such as content vectors (McLaren, 2003) or from text encoded as ordered sets of keywords (C.-L. Liu et al., 2004).
What all these methods had in common was their reliance on domain expertise and manual screening of cases to create such attributes. This made scalability a persistent challenge, rendering them unfeasible for large numbers of legal documents. Even as computational power grew, the emphasis remained on KE-based retrieval methods (Raman & Palanissamy, 2008, 2009).
The Wave of Artificial Intelligence (2016 Onwards)
The next significant evolution in the domain came almost a decade later. Renewed interest in legal precedent retrieval was marked by a notable shift toward using NLP and ML techniques to identify precedents through textual similarity. Studies such as Eyorokon et al. (2016) and Kulkarni et al. (2017) marked the beginning of this transition. The former used TF-IDF to represent documents as vectors of important words, whereas Kulkarni et al. (2017) combined Regular Expression rules with more sophisticated techniques such as Doc2Vec embeddings (Le & Mikolov, 2014) and topic modeling (Blei et al., 2003).
As research evolved, more contemporary methods were employed, such as an unsupervised Autoencoder paired with LSTM (Amin et al., 2019) and Named Entity Recognition (More et al., 2019). In the latter part of this era, researchers evaluated many algorithms and representations to optimize precedent retrieval. Techniques such as Top2Vec (Angelov, 2020) combined with the BM25 ranking function began outperforming legacy methods. A comprehensive assessment by Mandal et al. (2021) covered fifty-six combinations of document representation techniques and similarity measures. Their observations attest to the potential of conventional vectorization techniques for identifying legal precedents, while also indicating room for improvement in methods that rely heavily on context.

The Pipeline of Legal Precedents Retrieval
In the literature on automating the identification of legal precedents, a recognizable structure has emerged, as depicted in Fig. 11. The initial phase, here termed "representation", encodes each legal case according to a predefined set of attributes, preparing it for the subsequent "similarity assessment". After this second step, most studies include "evaluation" as the final stage, which assesses the effectiveness of the proposed techniques for legal precedent retrieval.
Regarding the representation of legal cases, the existing research can be categorized by the specific attributes used to characterize each document; a single study may combine methods from several categories. The groups are as follows:
Keywords: Legal cases were characterized through specific keywords or sets of keywords.
Facts: This category concentrated on representing cases based on factual elements, encompassing scenarios, actions, events, and the content of judicial decisions.
Time: This group of studies considered the temporal sequencing of the cases or the facts within each case.
Text-based vectors: Vector representations of text are employed. These may be generated through various means, such as regular expressions or vectorization techniques, and may or may not incorporate semantic information.
References: This category pertains to representations that account for the statutes cited in each case or cross-references to other legal cases.
The similarity assessment phase is a critical component of the legal precedent retrieval pipeline, connecting the representation of legal cases with the subsequent evaluation of a method's efficacy. Once cases are encoded by specific attributes, from keywords and factual elements to text-based vectors and references, this phase applies computational techniques to determine how closely a target case resembles one or more source cases, thereby enabling the retrieval of relevant legal precedents. The similarity assessment methods identified in the studies are diverse, encompassing clustering, pairwise-distance calculations, attribute matching, association rules, and ranking functions. The efficacy of these methods therefore plays a central role in the precision and utility of automated legal precedent retrieval systems. By type of similarity assessment, the studies can be classified as follows:
Clustering: Similar cases are identified by retrieving sets of documents based on document clustering.
Pairwise distance: Potential precedents were identified by computing a distance or similarity measure, mainly cosine similarity, between the target case and all other cases in the dataset.
Attribute matching: In such studies, cases sharing one or more attributes or facts were considered similar.
Association rules: Similar cases were identified by mining frequent itemsets and calculating association rule metrics such as Support and Confidence.
Ranking function: These studies used a ranking function to compare documents with a given query or with another document. Most studies in this category used BM25 to score documents, although Divergence from Randomness (DFR)[3] (Amati & Van Rijsbergen, 2002) was also used.
We designate the final phase of the legal precedent retrieval pipeline as the "evaluation" stage, which substantiates the utility and precision of the methods employed by gauging how effectively the techniques in each study identify relevant similar cases. A well-constructed evaluation not only confirms the validity of the similarity assessment techniques but also serves as a benchmark for future studies in this growing field. The evaluation methods used in this domain fall broadly into three categories:
Document citation: The results produced by the computational model are compared with the cases cited in the target document. This comparison empirically assesses the model's ability to identify precedents already recognized and cited in the legal literature.
Expert evaluation: Some studies take a more comprehensive approach, comparing the model's results with cases that legal experts judged similar. This method adds professional scrutiny, offering insight into how well the computational methods align with human expertise. The target document is compared with all documents in the corpus, not only with those it cites.
Authors' appraisal: A subset of studies adopts a somewhat subjective method in which the authors manually compare the results generated by their model on a small set of case pairs, typically one to three. Although less rigorous, this form of evaluation serves as an initial test of the model's performance.
Finally, a handful of studies forgo the evaluation phase entirely, either because of the exploratory nature of the research or because of other constraints. In these instances, the model's effectiveness remains untested, limiting the generalizability of the findings.

The Taxonomy of Legal Precedents Retrieval
Based on our review of the existing literature and the findings detailed in Sections 4.2.1 to 4.2.3, we propose a taxonomy to systematize the field of legal precedents retrieval (Fig. 12). The taxonomy categorizes the studies according to the technological context and the techniques employed, providing a structured framework for understanding the field's evolution and current trends.
We also classified the existing studies under this taxonomy according to the characteristics described in Table 2.
Study | Technological context | Representation | Similarity assessment | Evaluation
(Elhadi, 2001) | Manual KE | Keywords, Facts | Clustering | Authors' appraisal
(Elhadi, 2000) | Manual KE | Keywords, Facts | Clustering | Authors' appraisal

Data Sources
Academic research on legal precedent retrieval has drawn on a diverse range of data sources from varied geographical and legal contexts, lending the field a global perspective. Early studies, such as those by Elhadi (2000, 2001) and McLaren (2003), mainly used U.S.-based data on bankruptcy law and professional ethics cases. Later research expanded the geographical scope to include criminal summary judgments from Taiwan (C.-L. Liu et al., 2004) and Islamic legislation (Nouaouria et al., 2006). There were also more narrowly focused datasets, such as those involving the Law of Negligence (Raman & Palanissamy, 2008) and judicial declarations of abandonment in Costa Rican juvenile courts (Oconitrillo & De La Ossa Osegueda, 2017).
Notably, recent studies have made prominent use of Indian court data. Beginning with Thuma & Motlogelwa (2017), who used case documents from the Indian Supreme Court, subsequent research such as that by Kulkarni et al. (2017) and Nair & Wagh (2018) examined various facets of the Indian justice system. Kulkarni et al. (2017) analyzed a broad dataset comprising both court cases and statutes, while Nair & Wagh (2018) used cases under the Information Technology Act, 2000, heard in different High Courts of India. The trend continued with work by More et al. (2019), Bhattacharya et al. (2019), Di Nunzio (2020), J. Arora et al. (2020), and Mandal et al. (2021), all of whom worked with case documents adjudicated by the Supreme Court of India. The emphasis on Indian court data enriches the field by incorporating the complexities and idiosyncrasies of a legal system influenced by a rich tapestry of cultural, historical, and social factors.
Adding another layer of complexity are studies such as Amin et al. (2019), which employed a mixed-language dataset of customer support tickets from an automotive company in Germany, and Zhang et al. (2017), who investigated Chinese statutes and judicial cases. These data types, though unusual, open the way to exploring how well legal precedent retrieval algorithms adapt to varied data formats and languages. The diversity of data sources and geographical settings shows that multiple justice systems and jurisdictions need legal precedent retrieval methods. It also poses a challenge for the adaptability of such models and raises questions about their universal applicability, making it a compelling avenue for future research.
[2] A Functor is defined as a function that converts items of one set into those of another set.
[3] The DFR framework aims to rank documents based on the idea that terms that diverge significantly from their expected random distribution in a corpus are informative and thus useful for determining the relevance of a document to a query. Essentially, the more a term's distribution in the document set deviates from a random distribution, the more "useful" or "informative" that term is for distinguishing relevant from non-relevant documents.
[4] This study describes the Artificial Intelligence for Legal Assistance (AILA) track at the 2019 FIRE Conference and did not include representation or similarity assessment stages.
[5] The authors did not describe the similarity assessment method in detail.

Discussion
RQ1: How did researchers address the challenge of automatically identifying prior relevant cases, and what methods have been used in the screened studies?
Artificial Intelligence Wave (2016 onwards): The subsequent wave saw the use of ML and NLP techniques to handle the task. Methods such as TF-IDF vectors (Eyorokon et al., 2016), regular expressions combined with Doc2Vec embeddings (Kulkarni et al., 2017), Named Entity Recognition (More et al., 2019), and autoencoders (Amin et al., 2019) were employed. These methods significantly improved scalability and accuracy.
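As an illustration of the TF-IDF approach mentioned above, the sketch below builds sparse TF-IDF vectors and compares cases with cosine similarity, using only the Python standard library. The weighting (raw term count times log(N/df)) is one common variant and an assumption here; Eyorokon et al. (2016) and other studies may have used different weighting, tokenization, and normalization. The case snippets are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Terms that occur in every document get weight log(1) = 0.
        vectors.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    cases = [
        "landlord failed to return the security deposit",
        "tenant sued landlord over unreturned security deposit",
        "driver convicted of reckless driving",
    ]
    vecs = tfidf_vectors(cases)
    print(round(cosine(vecs[0], vecs[1]), 3))  # shared terms: landlord, security, deposit
    print(round(cosine(vecs[0], vecs[2]), 3))  # no shared terms, so 0.0
```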
RQ2: What are the most promising methods for the automated search of legal precedents, and what research gaps exist?
Based on the review, the most promising techniques for automated legal precedent retrieval appear to lie within NLP and ML. While early efforts were built primarily on manual knowledge engineering, the shift towards NLP and ML has brought notable advances. Document embeddings such as Doc2Vec, as well as topic modeling, have represented textual data effectively (Kulkarni et al., 2017; Mandal et al., 2021).
Additionally, methods such as Top2Vec combined with BM25 ranking functions have been highlighted as top-performing approaches in a recent precedent retrieval challenge (J. Arora et al., 2020; Bhattacharya et al., 2019).
Autoencoders, especially when coupled with LSTMs, have been claimed to be effective for training on unstructured text, offering both speed and accuracy (Amin et al., 2019). NER has also been employed as a preprocessing step to extract meaningful data from legal documents (More et al., 2019). Conventional vectorization techniques, such as TF-IDF, still perform robustly in similarity measurement, as evidenced by the findings of Mandal et al. (2021).
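More et al. (2019) used NER, and Kulkarni et al. (2017) combined regular expressions with Doc2Vec, to extract structured information before comparing documents. The sketch below reproduces neither method; it shows a much simpler rule-based stand-in that pulls statutory references (e.g. "Section 302 of the Indian Penal Code") out of a judgment with a regular expression. The pattern and the sample sentence are hypothetical, and the pattern would miss many real citation styles.

```python
import re

# Hypothetical pattern: "Section <number>[<letter>] of the <Act name>", where the
# Act name is a run of capitalized words ending in "Act" or "Code", optionally
# followed by a year (e.g. "Information Technology Act, 2000").
STATUTE_PATTERN = re.compile(
    r"[Ss]ection\s+(\d+[A-Z]?)\s+of\s+the\s+"
    r"((?:[A-Z][a-z]+\s+)+(?:Act|Code)(?:,?\s+\d{4})?)"
)


def extract_statute_refs(text):
    """Return (section, act) pairs for each statutory reference found in text."""
    return [(m.group(1), m.group(2)) for m in STATUTE_PATTERN.finditer(text)]


if __name__ == "__main__":
    judgment = (
        "The accused was charged under Section 302 of the Indian Penal Code "
        "and section 66A of the Information Technology Act, 2000."
    )
    print(extract_statute_refs(judgment))
    # [('302', 'Indian Penal Code'), ('66A', 'Information Technology Act, 2000')]
```

The extracted pairs could then serve as structured features alongside a text representation, for example by counting how many statutory references two cases share.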
Despite this progress, there are some glaring research gaps in the field. One significant gap is the scarcity of studies that continue and build on AI-based retrieval of similar cases. The lack of consensus about the most effective methodology is evident: no technique stands out as the best for all types of legal documents and jurisdictions, which leaves room for more comparative studies. Additionally, no work has analyzed the effect of text preprocessing on the results, which may prove decisive given that, so far, no methodology shows superior performance.
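The open question about text preprocessing can be illustrated with a toy example: the same pair of sentences yields very different similarity scores depending on whether the text is lowercased, stripped of punctuation, and filtered for stopwords. The sketch uses Jaccard similarity over token sets and a tiny hypothetical stopword list; it demonstrates sensitivity only and says nothing about which pipeline performs best on real legal corpora.

```python
import re

# Tiny, hypothetical stopword list used only for this illustration.
STOPWORDS = {"the", "of", "a", "an", "and", "to", "in", "was", "by"}


def tokens(text, preprocess):
    """Token set of text, with or without basic preprocessing."""
    if not preprocess:
        return set(text.split())  # raw tokens: case and punctuation kept
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b) if a | b else 0.0


doc_a = "The Appellant was convicted by the Sessions Court."
doc_b = "the sessions court convicted the appellant, and the appeal was dismissed"

print("raw:", round(jaccard(tokens(doc_a, False), tokens(doc_b, False)), 3))  # raw: 0.214
print("preprocessed:", round(jaccard(tokens(doc_a, True), tokens(doc_b, True)), 3))  # preprocessed: 0.667
```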
Moreover, there are no uniform benchmarks for comparing different methodologies. While Mandal et al. (2021) conducted a comprehensive study of various document representation techniques and similarity measures, such exhaustive evaluations are uncommon in the existing literature, and some studies did not evaluate the proposed methods at all. It is important to note that these works were conducted on very small corpora: the most comprehensive study comparing similarity measurement methods used only 50 pairs of documents as the gold standard. New studies should therefore incorporate larger corpora and expert-supplied ground truth, preferably in new legal contexts.
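When the gold standard consists of document pairs with expert-assigned similarity scores, as in the 50-pair benchmark mentioned above, one common way to evaluate a similarity measure is to correlate its scores with the experts' scores. The sketch computes Pearson's correlation coefficient from scratch. The five score pairs are invented for illustration, and whether a given study used Pearson, Spearman, or another statistic should be checked in the original paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: expert similarity ratings (0 to 1) for five case pairs
# and the similarity scores a model assigned to the same pairs.
expert_scores = [0.9, 0.2, 0.6, 0.4, 0.8]
model_scores = [0.85, 0.30, 0.55, 0.35, 0.70]
print(round(pearson(expert_scores, model_scores), 3))
```

With only a handful of pairs, a high correlation can arise by chance, which is one reason larger expert-annotated gold standards matter.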
Temporal elements in cases, although explored, have not been comprehensively studied or incorporated into existing models, leaving a gap in our understanding of how temporal relationships between events might affect the relevance of legal precedents. Another aspect not yet thoroughly evaluated is the applicability of models across different legal domains and jurisdictions, since each study focused on a single corpus.
Furthermore, although the field has moved toward sophisticated NLP and ML techniques, conventional methods such as TF-IDF remain highly effective. This raises an open and still underexplored question about the actual incremental benefits of more complex methods.
Finally, there is ample room to assess the role of contextual understanding in this task. While techniques that consider semantics have been effective, a gap remains in capturing the deep context of legal language, which contextual embeddings from BERT and other Transformer models may help fill.
In summary, the field has seen a promising shift toward automation using AI techniques, but many questions and gaps remain, suggesting avenues for further research.
RQ3: What is the taxonomy of existing methods, and what is their mainstream?
The taxonomy of methods for automating the identification of legal precedents has evolved significantly over the years, with two distinct eras emerging, as described in Sections 4.2.1 and 4.2.2. In terms of the operational pipeline, the taxonomy can be broadly divided into three main phases: Representation, which involves encoding legal cases based on predefined attributes; Similarity Assessment, the techniques employed to measure the resemblance between cases; and Evaluation, the final stage of measuring the effectiveness of the retrieval techniques.
According to the existing literature, the mainstream has shifted towards AI-based methods in recent years, with particular emphasis on vectorial text representations and advanced similarity assessment techniques such as ranking functions and pairwise distance measures. The comprehensive assessment by Mandal et al. (2021) stands out as a seminal work: it spanned fifty-six unique combinations of document representation techniques and similarity measures, validating the efficacy of conventional vectorization techniques while indicating areas for improvement.
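The pairwise distance measures in the taxonomy compare two document vectors directly. The sketch below applies three common measures (cosine similarity, Euclidean distance, and Manhattan distance) to the same pair of cases under two representations and evaluates every representation-measure combination, loosely mirroring the kind of grid Mandal et al. (2021) explored. The representation names and vectors are hypothetical; Mandal et al.'s actual representations and measures are not reproduced here.

```python
import itertools
import math


def cosine_similarity(u, v):
    """Higher means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def euclidean_distance(u, v):
    """Lower means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    """Lower means more similar."""
    return sum(abs(a - b) for a, b in zip(u, v))


MEASURES = {
    "cosine": cosine_similarity,
    "euclidean": euclidean_distance,
    "manhattan": manhattan_distance,
}

# Hypothetical vectors for the same two cases under two representations.
REPRESENTATIONS = {
    "tfidf": ([0.0, 1.2, 0.4, 0.0], [0.3, 1.0, 0.0, 0.0]),
    "embedding": ([0.1, -0.2, 0.7], [0.2, -0.1, 0.6]),
}


def evaluate_grid():
    """Score the case pair under every (representation, measure) combination."""
    results = {}
    for (rep, (u, v)), (name, measure) in itertools.product(
        REPRESENTATIONS.items(), MEASURES.items()
    ):
        results[(rep, name)] = measure(u, v)
    return results


if __name__ == "__main__":
    for (rep, name), value in evaluate_grid().items():
        print(f"{rep:<10} {name:<10} {value:.3f}")
```

Note that cosine is a similarity while the other two are distances, so any comparison across measures has to account for their opposite orientations.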
In terms of evaluation, the most rigorous methods involve either document citation or expert evaluation, while a less rigorous yet prevalent approach involves the authors' appraisal of a handful of case pairs. However, some studies have forgone the evaluation phase entirely, possibly owing to the lack of an annotated corpus or other constraints, which leaves the effectiveness of the proposed methods unvalidated.
China: One study has also investigated Chinese statutes and judicial cases (Zhang et al., 2017).
Germany: One unusual dataset employed was a mixed-language customer support tickets dataset from an automotive company in Germany (Amin et al., 2019).
Finally, multiple types of documents have been employed by researchers.Most studies used case documents from courts, while some research incorporated statutes (Kulkarni et al., 2017;Zhang et al., 2017) or even customer support tickets (Amin et al., 2019), a non-traditional source for this eld.
Using data from multiple justice systems and jurisdictions poses a challenge for the adaptability of legal precedent retrieval models and raises questions about their universal applicability. This diversity is a compelling avenue for future research.
RQ6: Are there real-world applications of this topic?
In synthesizing the range of research approaches discussed in Section 4.2, it is notable that none of the studies in this literature review reports a real-world implementation of models designed for legal precedents retrieval. This gap points to a central direction for future research. While various models have been proposed, tested, and refined to tackle different aspects of precedent retrieval (from early KE systems to more recent systems based on neural network embeddings), their applications have remained mainly theoretical. The absence of real-world case studies applying these models to actual legal systems raises pertinent questions: Why have these models not yet been adopted in judicial settings? Are there inherent limitations in the current models that deter their application, or are there external factors such as ethical, legal, or operational constraints?
This situation underscores the need for the next phase of research to focus not only on refining algorithms and methods but also on moving from theoretical frameworks to applied solutions. Implementing these models in real-world scenarios could provide insights into their efficacy, scalability, and limitations. Similarly, in-depth investigations into the challenges that prevent implementation could yield valuable lessons. Both avenues would contribute significantly to the field's maturity, helping ensure that future advancements are both theoretically robust and practically applicable.

Conclusions
This literature review aimed to synthesize the state of research on automating the identification of legal precedents, focusing on the techniques employed and the influential journals and authors, while assessing the effectiveness of these techniques and identifying existing research gaps. Using text-mining methods to semi-automate the review process, this study identified 70% of relevant publications, thereby reducing the number of studies that required in-depth analysis by 82.5%. This highlights the transformative potential

Figures
The search query used on Scopus.
The search query used on Web of Science.
The number of publications per source and type.
Figure 10 Publications per year and country.
The pipeline for automating the identification of legal precedents.
Figure 12 Taxonomy of the legal precedents retrieval field.
Figure 13 A comparison between overlap and coherence for different numbers of topics.

RQ1: How did researchers address the challenge of automatically identifying prior relevant cases, and what methods have been used in the screened studies?
RQ2: What are the most promising methods for the automated search of legal precedents, and what research gaps exist?
RQ3: What is the taxonomy of existing methods, and what is their mainstream?
RQ4: What are the research domain's most influential journals and authors?
RQ5: What data has been used in existing research?
RQ6: Are there real-world applications of this topic?
[1] The present paper is an extended and updated version of our paper [omitted for anonymity reasons], presented at [omitted for anonymity reasons][reference omitted]. This paper incorporates the feedback received, extends the search period, includes a new research question, and offers significantly more detailed results and discussion sections.

Table 1. Documents contained in the final selection (A = Article, CP = Conference Paper, PP = Proceedings Paper).

Table 2. Studies classified according to the proposed taxonomy.