Current Approaches for Image Fusion of Histological Data with Computed Tomography and Magnetic Resonance Imaging

Classical analysis of biological samples requires destroying the tissue's integrity by cutting or grinding it into thin slices for (immuno)histochemical staining and microscopic analysis. Although the stained 2D sections offer high specificity, they provide only limited structural information, especially regarding the 3D organization of the whole tissue. Computed tomography (CT) or magnetic resonance imaging (MRI) scans performed prior to sectioning, combined with image registration algorithms, provide an opportunity to regain access to morphological characteristics and to relate histological findings to the 3D structure of the local tissue environment. This review summarizes the prevalent literature addressing the multimodal coregistration of hard and soft tissue in microscopy and tomography. Grouped according to the dimensionality of the problem, namely image-to-volume (2D ⟶ 3D), image-to-image (2D ⟶ 2D), and volume-to-volume (3D ⟶ 3D), selected current approaches are examined by comparing their accuracy with respect to the limiting resolution of the tomography. Correlative multimodal imaging could establish itself as a useful tool for precise histological diagnostics and enable the a priori planning of tissue extraction such as biopsies.


Introduction
Examination of pathological alterations in human tissue by histology is an integral part of clinical routine. Countless staining protocols and immunohistochemistry applications have been developed for histology, enabling the identification of specific cell types, subcellular structures, substrates, and disease biomarkers, which makes this approach extremely versatile.
However, histological evaluation requires access to tissue specimens. Such specimens are usually obtained by biopsy, with the associated discomfort and risks. Based on the specimen type, histology can be divided into (i) soft-tissue histology, in which the specimen is typically embedded in paraffin and cut with a microtome using a static blade, and (ii) hard-tissue histology, in which the samples are embedded in resin, cut with a diamond saw, and then ground down to a slice thin enough to be stained and imaged under a microscope. In both scenarios, cutting is typically performed without any a priori knowledge of the position of the region of interest (ROI), which limits the prognostic value of the histological analysis and/or leads to very time-consuming serial cutting of the specimen. In hard-tissue histology, this situation is further complicated by the fact that the specimens are typically opaque and that cut-grind techniques result in the loss of a substantial fraction of the material. Since classical histology is based on the microscopic evaluation of micrometer-thin tissue slices, biopsies are typically only sparsely sampled, bearing the risk of missing key aspects. In addition, intrinsically three-dimensional features such as metastatic volumes or fiber orientations cannot be effectively assessed in planar slices. In theory, this can be solved with serial sectioning and 3D reconstruction, but this is extremely labor-intensive.
A solution to some of these shortcomings could be the combination of histology with high-resolution 3D imaging techniques such as micro-CT and micro-MRI performed prior to sectioning. This would allow histology to be supplemented with measures of 3D features and the histological findings to be localized within the local 3D tissue environment, thereby raising the prognostic value of the analysis. Furthermore, this combination would enable "guided sectioning," as demonstrated by Albers et al. [1], who used micro-CT scans of lung tissue to plan the subsequent sectioning and isolation of the ROI. Rau et al. [2] embedded artificial markers together with their tissue specimens in order to accurately reconstruct the extracted human temporal bone based on landmark registration. They argue that the presence of fiducial markers and an additional planning phase before cutting can aid image-guided sectioning and computer-aided surgery.
One of the main problems of image fusion between histology and 3D imaging is that the sectioning process (especially in soft tissue) can introduce nonuniform deformations, which need to be compensated by the registration pipeline. To account for this problem, a wide variety of combinations of imaging techniques and registration strategies have been proposed. In this review, we assessed these strategies and grouped them according to their dimensionality into three chapters: (i) extracting a corresponding 2D cut section from the 3D data set for subsequent 2D-2D registration with the histological slice, (ii) placing the 2D histological slice into a 3D data set, and (iii) fusing 3D serial sectioning with the 3D imaging data. There is also a large variety of algorithms used in each scenario, ranging from simple manual registration of the data, as performed by Mourad et al. [3] for the fusion of histology and micro-CT, to elastic image registration, as employed by, for instance, Albers et al. [1,4,5]. Thus, the three chapters are subdivided into applications of elastic and nonelastic methods. Figure 1 illustrates the typical pipeline of combined 3D imaging and histological tissue analysis as well as the structure of the review.
The reported relative registration accuracies are compared based on the resolution of the data to facilitate finding a suitable strategy for the task at hand.
We started our literature search with the intention of finding approaches concerning the terms "CT," "MRI," and "histology" in combination with the topic of "multimodal registration." To keep the number of publications tractable, we excluded publications printed before 2010. The search was conducted through Google Scholar and Web of Science. This resulted in a preselection of forty-three papers, which we investigated for solutions to the problem of registering hard or soft tissue with CT or MRI scans. Of these forty-three papers, we identified 19 that addressed multimodal image registration between MRI or CT and histology and thus met our criteria for this review. Research addressing monomodal registration was not excluded from our scope. The described process is visualized in the literature inclusion flowchart depicted in Figure 2.

Multimodal Image Registration
The typical procedure of image registration is illustrated in Figure 3: one data set (in our case, the 3D data of CT or MRI, or a virtual slice thereof) is considered to be the ground truth and is not modified during the entire process; it is usually referred to as the "fixed image" [6][7][8]. The second data set (here, the histology slice or a set of histology slices) is deformed during the process to "optimally" match the underlying fixed image; this image is usually called the "floating image." Registration as described here has been implemented in prominent software libraries dedicated to image processing, such as elastix [9]. The presented algorithms differ mainly in two key aspects: (i) which type of deformation they allow for the floating image and (ii) which criterion (metric) they use to assess matching quality.
An important aspect of image registration is how the two images are compared while searching for the ideal transformation of the floating image. Two main strategies are commonly employed: (a) using the entire image information, for instance, via cross-correlation or mutual information, and (b) using landmarks (either intrinsic ones, for instance, if implants are present in the data, or ones generated by algorithms such as speeded-up robust features (SURF) or the scale-invariant feature transform (SIFT), to name only a few). In the first case, the registration process is typically robust and tolerates partially missing data or the presence of artifacts. However, both data sets must share a large degree of similarity, as is the case, for instance, for CT and chemically stained histology. The second case can also deal with vastly different image content, such as CT and immunohistochemistry, but the identification of good landmarks may be challenging.
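To make strategy (a) concrete, here is a minimal NumPy sketch of three intensity-based similarity measures: normalized cross-correlation, histogram-based mutual information (MI), and normalized MI. The function names and the default of 32 histogram bins are our own assumptions; toolkits such as elastix or ITK use more refined estimators (for example, Parzen-window histograms in Mattes MI).

```python
import numpy as np

def normalized_cross_correlation(a, b):
    """Pearson correlation of two equally sized images: +1 identical, -1 inverted contrast."""
    a = a.astype(float).ravel()
    b = b.astype(float).ravel()
    a -= a.mean()
    b -= b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def _entropies(a, b, bins):
    """Marginal and joint Shannon entropies (in nats) from a joint intensity histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()

    def entropy(p):
        p = p[p > 0]                      # 0 * log(0) is treated as 0
        return float(-np.sum(p * np.log(p)))

    return entropy(p_ab.sum(axis=1)), entropy(p_ab.sum(axis=0)), entropy(p_ab)

def mutual_information(a, b, bins=32):
    """MI = H(A) + H(B) - H(A, B): high when one image's intensities predict the other's."""
    h_a, h_b, h_ab = _entropies(a, b, bins)
    return h_a + h_b - h_ab

def normalized_mutual_information(a, b, bins=32):
    """NMI = (H(A) + H(B)) / H(A, B), between 1 (independent) and 2 (identical)."""
    h_a, h_b, h_ab = _entropies(a, b, bins)
    return (h_a + h_b) / h_ab
```

For an image `img` and its contrast-inverted copy `1 - img`, `normalized_cross_correlation` returns -1, whereas the mutual information remains high (up to binning effects), which illustrates why MI-type metrics are preferred when the intensity relationship between the two modalities is unknown.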
One obstacle to comparing the performance of different registration algorithms, or to picking the optimal procedure for a given task, is the lack of standardized quality measures. Typically, the same metric used for the registration process is also used for quality assurance. Especially for landmark-based approaches, an ideal match of the landmarks does not necessarily imply optimal registration of the entire data. A large variety of measures can be used to assess the quality of a given registration approach, the simplest being the calculation of translational and rotational errors in plain units of distance and tilt. Other approaches estimate the L1 distance between corresponding points in the target and moving images, as implemented by [10,11]. Where of interest, statistical metrics such as the Dice index [12] (also known as the F1 score) are used.
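The simplest quality measures mentioned above fit in a few lines. The sketch below (hypothetical helper names) computes a landmark-based target registration error with either the L1 or the Euclidean norm, optionally scaled by the pixel size to obtain physical units, and the Dice index of two binary segmentation masks.

```python
import numpy as np

def target_registration_error(fixed_pts, moved_pts, spacing=1.0, norm="l2"):
    """Mean distance between corresponding landmarks after registration.

    fixed_pts, moved_pts: (N, D) landmark coordinates in pixels/voxels.
    spacing: physical size of one pixel (e.g., in micrometers), scalar or per axis.
    norm: "l2" (Euclidean) or "l1" (sum of absolute coordinate differences).
    """
    diff = (np.asarray(fixed_pts, float) - np.asarray(moved_pts, float)) * spacing
    if norm == "l1":
        d = np.abs(diff).sum(axis=1)
    else:
        d = np.linalg.norm(diff, axis=1)
    return float(d.mean())

def dice(mask_a, mask_b):
    """Dice index 2|A∩B| / (|A| + |B|) of two binary masks (identical to the F1 score)."""
    a = np.asarray(mask_a, bool)
    b = np.asarray(mask_b, bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
```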
In order to provide some means of comparison between the presented algorithms, we calculated relative accuracies based on the spatial resolution of the applied imaging techniques (although not all publications reported these parameters). In a later section, we compiled these values into three tables according to the dimensionality of the registration approach.
In the following, we discuss registration strategies loosely grouped according to the dimensionality of the input data. Slice-to-volume registration (2D ⟶ 3D) may seem the most straightforward way of combining histology with CT or MRI. The aim is to place the histological section into the 3D context of CT or MRI, as depicted in Figure 4 for an application combining a section stained with Sanderson's Rapid Bone Stain and Van Gieson with a micro-CT scan of a resin-embedded rat vertebra. Figure 4(a) shows the whole scan, which was searched for the best-fitting cutting plane (Figure 4(b)). Finally, the histological section and the cut CT volume are fused in Figure 4(c).

Slice-to-Volume Registration
Following the CT scan, the embedded vertebra was sectioned using a combination of a diamond-cut grinder and a laser microtome. The resulting section was stained with Sanderson's Rapid Bone Stain and Van Gieson, imaged with a microscope (Axiovert 200 inverted microscope, Zeiss), and then manually positioned in the 3D data set visualized in VGStudioMax (Volume Graphics), a 3D rendering and analysis software. Figure 4(c) indicates a nearly perfect match, which demonstrates two aspects: first, the cutting plane corresponds to a "strict" plane in the 3D data set, so the problem can be reduced to a 2D-2D registration once the correct virtual plane in the 3D data set has been identified; and second, hard tissue embedded in resin is not subjected to relevant nonuniform deformations during the cutting process, which eases registration because only a rigid-body transformation needs to be found. In paraffin-embedded soft tissue, by contrast, sectioning results in local deformations, especially in porous tissue such as the lung, as reported by Albers et al. [1]. However, even in lung tissue, the deformation is mainly restricted to the cutting plane due to the nature of the cutting system. Thus, in all cases, the approach can be split into the identification of the in silico cutting plane in the 3D data set and the subsequent registration with the histological slice. For the latter, Albers et al. used elastic image registration, treating the a priori acquired micro-CT as ground truth [5]. In order to reduce deformations between the individual sectioned planes, the integration of 3D-printed slicers or cutting boxes has been proposed [13], in which the sectioning process is optimized through the inclusion of a 3D segmented model of the tissue. This model is used as a reference for designing a specimen cutting box in a 3D modelling or computer-aided design program, which is subsequently printed.
Depending on how strongly the histological image is distorted compared with the in silico plane, 2D registration algorithms with different degrees of freedom need to be applied. These can be loosely grouped into nonelastic, i.e., rigid or affine, and elastic registration algorithms, as pointed out by [6] and later confirmed to be still valid by the same authors in [7]. Thus, the main problem is identifying the in silico plane. For this purpose, multiple strategies have been proposed, ranging from manual identification of the plane by Albers et al. [1] to fully automatic detection. In many cases, deformation or loss of tissue in the histological slide complicates the search for the corresponding plane in the 3D volume. Owing to possible shifts introduced during slicing, the problem therefore cannot always be simplified to a 2D ⟶ 2D transformation. Such shifts are common in soft-tissue sections as a result of the physical cut of the microtome. Hence, a two-step approach with a preliminary coarse alignment and a subsequent fine alignment was proposed [11,14,15]: the plane in the 3D volume is first coarsely aligned by matching against a group of candidate slices, followed by a refined correction of plane shifts and tilts. The initial alignment of the two modalities may be performed feature-free, by aligning corresponding extrinsic markers and matching pixel/voxel intensities.
Figure 1: An overview of the presented review concerning multimodal registration of 3D scans and 2D sections. The methods described here extend the classical histological workflow with 3D imaging before sectioning. Thus, image representations of the tissue are available in two modalities. The research considered is grouped according to the dimensionality of the addressed problem into slice-to-volume (2D ⟶ 3D), slice-to-slice (2D ⟶ 2D), and volume-to-volume (3D ⟶ 3D) registration. Each category is further subdivided into elastic and nonelastic algorithms. This results in a specialized selection of algorithms that match and remedy the corresponding severity of deformations introduced during the sectioning process.
Figure 2: Literature inclusion flowchart. Papers that include the keywords: n ≈ 44000; papers published after 2010: n ≈ 16300; papers preselected that include mono- or multimodal registration of MRI, CT, or histology: n = 43; papers found that addressed multimodal image registration between MRI or CT and histology: n = 19.
Figure 3: The reference or fixed image is used as ground truth, while the floating image is iteratively altered by transformation until the optimal overlay is achieved. The registration process is completed when the optimal value of the similarity (based on a specific metric) between the two images is found.
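The coarse step of the two-step strategy can be sketched as follows, under simplifying assumptions: the histological section has already been resampled to the voxel grid, the candidate planes are the axis-aligned slices `volume[z]`, and similarity is measured by normalized cross-correlation. The hypothetical helper `sample_plane` shows how a tilted candidate plane could be extracted by nearest-neighbor lookup for the subsequent refinement of shifts and tilts; published pipelines use proper interpolation and their own metrics.

```python
import numpy as np

def sample_plane(volume, center, u, v, shape):
    """Nearest-neighbor sampling of an arbitrary plane through a 3D volume.

    center: (z, y, x) voxel coordinate of the plane center.
    u, v: orthonormal in-plane direction vectors (z, y, x), one per image axis.
    shape: (rows, cols) of the extracted 2D image.
    """
    rows, cols = shape
    r = np.arange(rows) - (rows - 1) / 2.0
    c = np.arange(cols) - (cols - 1) / 2.0
    rr, cc = np.meshgrid(r, c, indexing="ij")
    pts = (np.asarray(center, float)[None, None, :]
           + rr[..., None] * np.asarray(u, float)
           + cc[..., None] * np.asarray(v, float))
    idx = np.rint(pts).astype(int)
    # Clamp to the volume so that out-of-bounds samples do not raise an IndexError.
    for axis in range(3):
        idx[..., axis] = np.clip(idx[..., axis], 0, volume.shape[axis] - 1)
    return volume[idx[..., 0], idx[..., 1], idx[..., 2]]

def ncc(a, b):
    """Normalized cross-correlation of two equally sized images."""
    a = a.astype(float).ravel()
    b = b.astype(float).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def coarse_candidates(volume, section, top_k=3):
    """Coarse step: rank the axis-aligned slices volume[z] by NCC with the 2D section."""
    scores = np.array([ncc(volume[z], section) for z in range(volume.shape[0])])
    order = np.argsort(scores)[::-1]
    return order[:top_k], scores[order[:top_k]]
```

The returned top-k slice indices would then seed the fine step, for example by evaluating slightly shifted and tilted planes obtained with `sample_plane`.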

Nonelastic Approaches for Slice-to-Volume Registration (2D ⟶ 3D)
Lundin et al. [14] sampled a group of candidate planes, each with its specific orientation parameters, from micro-CT scans of porcine vertebral trabecular bone. Candidate planes were determined by searching for the maximum number of Harris corner key points identified in the histological image. From each key point, a descriptor vector was generated with a simplified version of the histogram of oriented gradients (HoG) algorithm [16] and matched to the key points of a given CT plane using a nearest-neighbor algorithm. The binarized histological slice was then rigidly aligned by its center, at low resolution, with all candidate planes based on the matched key points, and the alignment was optimized through RANSAC. The rotation was estimated from pixel intensities using the Radon transform [17,18]. The Radon transform integrates the image intensities along straight lines, each described by its perpendicular distance from the origin and its angle with respect to the y-axis; the latter was used to estimate the rotation. By calculating the sum of the edge distances [17] between a CT plane and the histological slice in an inverted order, a cost function was established to optimize the Radon-based estimate.
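The idea of reading a rotation off the angular coordinate of the Radon transform can be illustrated with NumPy alone. The sketch below approximates the sinogram by distributing each pixel into bins of its distance ρ from the image center, collapses it over ρ into an angular signature (the sum of squared projections), and finds the circular shift between the signatures of two images. This is a generic Radon-based rotation estimate, not the edge-distance cost function of Lundin et al.; it recovers the rotation only modulo 180°, its sign depends on the coordinate convention, and `step_deg` is assumed to divide 180 evenly.

```python
import numpy as np

def radon_projections(image, angles_deg):
    """Pixel-driven approximation of the Radon transform (sinogram).

    For each angle theta, every pixel's intensity is added to the bin of its signed
    distance rho = x*cos(theta) + y*sin(theta) from the image center, i.e., the image
    is integrated along lines perpendicular to direction theta. Linear interpolation
    between neighboring rho bins keeps the projections smooth.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    xs -= (w - 1) / 2.0
    ys -= (h - 1) / 2.0
    offset = int(np.ceil(np.hypot(h, w) / 2.0)) + 1   # keeps all bin indices >= 0
    n_bins = 2 * offset + 2
    weights = image.astype(float).ravel()
    sinogram = np.zeros((len(angles_deg), n_bins))
    for i, theta in enumerate(np.deg2rad(angles_deg)):
        pos = (xs * np.cos(theta) + ys * np.sin(theta)).ravel() + offset
        lo = np.floor(pos).astype(int)
        frac = pos - lo
        sinogram[i] = (np.bincount(lo, weights=weights * (1 - frac), minlength=n_bins)
                       + np.bincount(lo + 1, weights=weights * frac, minlength=n_bins))
    return sinogram

def angular_signature(sinogram):
    # Summing squared projections over rho leaves a 1D function of theta that is
    # circularly shifted (with period 180 degrees) when the image is rotated.
    return (sinogram ** 2).sum(axis=1)

def estimate_rotation(fixed, moving, step_deg=1.0):
    """Rotation in degrees (modulo 180) between two images from their Radon signatures."""
    angles = np.arange(0.0, 180.0, step_deg)
    f = angular_signature(radon_projections(fixed, angles))
    m = angular_signature(radon_projections(moving, angles))
    f -= f.mean()
    m -= m.mean()
    # Circular cross-correlation along the angular axis; the best shift is the rotation.
    scores = [float(np.dot(f, np.roll(m, -s))) for s in range(len(angles))]
    return float(angles[int(np.argmax(scores))])
```

For example, with the default 1° grid, comparing an elongated structure with `np.rot90` of the same image yields 90.0.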
If extrinsic markers such as implants are present, segmentation-based approaches to estimate the coarse position can be considered [11,15]. Building on the work of Sarve et al. [15], Becker et al. [11] used Chamfer matching [17] to preliminarily align images of specimens containing dental implants on the basis of thresholding. Owing to the rigid nature of the implant, initialization was approached by aligning the corresponding implant axis, which was approximated through principal component analysis (PCA) [19]. The edge vectors of the histology and µCT images were centered and stored in a matrix. From this matrix, the covariance matrix and its eigendecomposition were calculated, where the eigenvector with the largest eigenvalue yielded an approximation of the implant axis. Using the implant axis as an initialization, adjacent slices were extracted. Candidate slices were then determined by the optimal Chamfer distance [17], a measure that identifies the nearest edges between two planes, and the smallest root-mean-squared error (RMSE) was approximated. To further refine the initial alignment, candidate slices were identified through rotations in 5-degree steps orthogonal to the identified implant axis. A subsequent 10-degree rotation in 1-degree steps was performed at the positions that resulted in the highest similarity. The similarity of adjacent slices was quantified using an alignment score (L-score) composed of the averaged L1 norm between two aligned pixels. Plane parameters were extracted based on the optimal L-score, which was considered to correspond to a coarse alignment of the histological image in the CT volume.
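Two building blocks of this workflow, the PCA estimate of the implant axis and the Chamfer distance between edge point sets, are sketched below with NumPy. The brute-force nearest-neighbor search and the choice between mean and RMS aggregation are our simplifications; classical Chamfer matching precomputes a distance transform of one edge image instead.

```python
import numpy as np

def principal_axis(points):
    """Unit direction of largest variance of an (N, D) point cloud (PCA)."""
    pts = np.asarray(points, float)
    centered = pts - pts.mean(axis=0)
    cov = centered.T @ centered / (len(pts) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return eigvecs[:, -1]                     # eigenvector of the largest eigenvalue

def chamfer_distance(edges_a, edges_b, rms=False):
    """Directed Chamfer distance: mean (or RMS) distance from each point of A to its nearest point of B."""
    a = np.asarray(edges_a, float)
    b = np.asarray(edges_b, float)
    # Brute-force pairwise distances, O(N*M) memory; a distance transform scales better.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2).min(axis=1)
    return float(np.sqrt(np.mean(d ** 2))) if rms else float(d.mean())
```

The sign of the returned axis is arbitrary, so comparisons between axes should use the absolute value of their dot product.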
For registration methods based on features extracted from both modalities, a preliminary alignment step can be omitted in favor of grouping mechanisms performed in a higher-dimensional space [14,20-22]. Feature points can be extracted from each image and represented as vectors. Based on the proximity of these vectors in descriptor space, correspondences and, subsequently, geometric relationships can be detected. These descriptors can, for example, be obtained using a Harris corner detector [23] and HoG [16], as presented by [14], or the SURF [24] and SIFT [25] algorithms, as shown by [20], who omitted the initial alignment. Corresponding feature clouds were matched either by calculating the Euclidean distance or by using a variant of the nearest-neighbor algorithm [25,26], in which data points are grouped based on their proximity to one another. The validity of the detected matches was verified by a variant of the random sample consensus (RANSAC) optimization scheme [27,28], which resulted in transformations with six degrees of freedom. RANSAC repeatedly draws a random minimal subset of data points sufficient to describe the target shape, with three points being the minimum representation of a plane, and fits the model to this subset. The points lying within a given distance of the model, the so-called inliers, are counted, whereas points that do not fit the model are considered outliers. This process is repeated with different random subsets, and the model supported by the largest number of inliers is retained. An example of a feature-based approach is depicted in Figure 5.
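The RANSAC procedure can be sketched for the plane-fitting case: random three-point samples define candidate planes, the candidate with the most points within a distance threshold wins, and the plane is finally refit to its inliers by singular value decomposition. The threshold, iteration count, and seed are illustrative assumptions, not values from the cited works.

```python
import numpy as np

def fit_plane_svd(points):
    """Least-squares plane through points: returns (unit normal n, offset d) with n·x + d = 0."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]                       # direction of least variance
    return normal, -float(normal @ centroid)

def ransac_plane(points, threshold=0.5, iterations=500, seed=0):
    """RANSAC: fit planes to random 3-point samples and keep the one with most inliers."""
    pts = np.asarray(points, float)
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(pts), bool)
    for _ in range(iterations):
        p0, p1, p2 = pts[rng.choice(len(pts), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:                  # collinear sample, no unique plane
            continue
        normal /= norm
        inliers = np.abs((pts - p0) @ normal) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refine the winning model on its consensus set.
    normal, d = fit_plane_svd(pts[best_inliers])
    return normal, d, best_inliers
```

In a feature-based pipeline of this kind, the input would be, as we understand it, the 3D positions of matched CT key points (in-plane coordinates plus slice index); the function itself accepts any (N, 3) array.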
Once both images have been preliminarily matched with respect to translation and rotation, fine alignment can be achieved through intensity-threshold-based approaches such as simulated annealing [30], which was used by [11] for the fine affine alignment of the histological image onto a predetermined slice in the volume. To refine the coarse alignment estimated by the Chamfer distance, the authors searched for the transformation parameters that yielded the optimal alignment of both modalities. Inspired by annealing in metallurgy, the global optimum is sought in a slow iterative process instead of a fast convergence to a local minimum [30-33]. This was achieved by predicting whether a higher alignment score, calculated through a modification of the L1 norm between two pixels (L-score, see above), could be achieved between the fixed histological slide and two adjacent CT planes predicted by the initial placement. Simulated annealing operated on six degrees of freedom in translation and rotation, with the goal of maximizing the L-score and thus minimizing the offset between two given sections of both modalities. While this approach produces acceptable results (median L-score: 91 out of 100; CT isotropic nominal resolution: 8.6 µm), the authors argue that their results can hardly be improved owing to limited segmentation quality caused by artifacts introduced during tissue preprocessing or by irregular staining. This workflow by Becker et al. [11] is highly dependent on the presence of distinguishable structures such as implants and can be categorized as a landmark-based registration approach. Image segmentation and the manual determination of a suitable threshold are the method's bottlenecks; its applicability to other problems is therefore not guaranteed.
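Simulated annealing itself is independent of the registration metric. The minimal sketch below minimizes an arbitrary cost function over a parameter vector using Gaussian proposals, Metropolis acceptance, and a geometric cooling schedule. In a setting like that of Becker et al., the parameters would be the six rigid degrees of freedom and the cost the negative L-score; the step size, temperature schedule, and toy cost used here are our assumptions.

```python
import numpy as np

def simulated_annealing(cost, x0, step=0.5, t0=1.0, cooling=0.995, iterations=5000, seed=0):
    """Minimize cost(x) with Metropolis acceptance and a geometric cooling schedule."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, float)
    fx = cost(x)
    best_x, best_f = x.copy(), fx
    t = t0
    for _ in range(iterations):
        candidate = x + rng.normal(scale=step, size=x.shape)
        fc = cost(candidate)
        # Always accept improvements; accept worse moves with probability exp(-delta / T),
        # which lets the search escape local minima while the temperature is still high.
        if fc < fx or rng.random() < np.exp(-(fc - fx) / t):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x.copy(), fx
        t *= cooling
    return best_x, best_f
```

A slower cooling factor (closer to 1) explores more before settling, at the price of more cost evaluations, which is the trade-off behind the "slow iterative process" described above.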
Using the initial placement of the histological slide in the volume, Lundin et al. [14] iteratively registered several adjacent parallel planes affinely with the image, relying on feature points detected by a Harris corner detector [23]. Each key point was subsequently used to extract a descriptor vector, and the descriptors were matched with a nearest-neighbor matching algorithm. After the validity of the matches had been determined by means of optimal RANSAC [28], an affine registration was estimated according to the greatest number of valid matches. The authors state that the presented approach differs from previous research owing to its fully automated plane estimation and its applicability to highly structured objects. On simulated and real 2D data, an average orientation error of 0.6° was found. The target registration error, i.e., the distance between manually annotated corresponding points in the target and moving images, was computed from manually defined landmarks and determined to be 106.3 µm (corresponding to a difference of about 10 pixels between target and moving landmarks). The validation was performed without any prior knowledge of the geometrical shape of the volume. Two versions of the CT scans were used: one with a lower resolution of 85.6 µm for coarse alignment and one with a resolution of 21.4 µm for finer alignment (nominal CT resolution: 10.7 µm), while the pixel size of the histological image was 2.55 µm. Clearly reduced performance was observed for nonartificial data due to deformations introduced to the specimen during preprocessing, which could be addressed by introducing countermeasures for local deformations.
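Once key-point matches have been validated, an affine transform can be estimated from them by linear least squares. The sketch below (hypothetical helper names) fits a 2×3 affine matrix to corresponding 2D points; in practice it would be applied to the RANSAC inliers only, and the residual distances at annotated landmarks give a target registration error in pixels that can be converted to micrometers with the pixel size.

```python
import numpy as np

def estimate_affine_2d(src, dst):
    """Least-squares 2D affine transform mapping src -> dst (needs >= 3 non-collinear pairs).

    Returns a 2x3 matrix A such that dst ≈ A @ [x, y, 1].
    """
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    design = np.hstack([src, np.ones((len(src), 1))])      # (N, 3) rows [x, y, 1]
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)   # (3, 2) solution
    return params.T

def apply_affine_2d(A, pts):
    """Apply a 2x3 affine matrix to (N, 2) points."""
    pts = np.asarray(pts, float)
    return pts @ A[:, :2].T + A[:, 2]
```

Per-landmark errors then follow as `np.linalg.norm(apply_affine_2d(A, src) - dst, axis=1)`.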
As described before, thresholding as a preprocessing step for alignment may not apply to all data sets and may even limit the overall superimposition performance [11]. Based on their earlier work [29], Chicherova et al. [20] proposed a feature-based approach to rigidly align a histological slide to an arbitrary plane in the CT volume of jawbones automatically and without any a priori knowledge. Using a SURF detector [24], a subset of feature points and the respective descriptor vectors were determined from both the histological images and predetermined slices of the micro-CT volume. Corresponding feature points were then selected by calculating the Euclidean distance, applying a second-nearest-neighbor criterion and a given threshold to validate a candidate match. This process was repeated for all slices of the micro-CT data set. To correctly define the fitting plane, a RANSAC optimization scheme [27] was used to obtain a four-dimensional parameter vector describing the plane (normal vector and offset).
6 Radiology Research and Practice
Chicherova et al. [20] found an average error distance of 0.25 mm for correctly matched slices, with a success rate of 75%, which leaves room for improvement. They analyzed specimens smaller than a tube of 3 mm in diameter and 12 mm in length, which were scanned at a resolution of about 4 µm (depending on the specimen) [29]. The authors state that they plan to develop a more suitable feature detector with an elastic matching approach. In subsequent research, the described workflow was adapted by Khimchenko et al. [21] with the additional use of the Demons registration tool [34]. By affinely deforming the tomographic image, they aligned both modalities and verified their results through comparison with expert 2D ⟶ 3D registration. They show that they improved their former workflow for their specific test cases, yielding radial and longitudinal stretches of 6% and 15%, respectively.
Overall, they state that the resolution of the micro-CT was limited by the size of the focal spot (around 0.9 µm), thus limiting the overall achievable registration accuracy. Through comparison of their results with the expert-based ground truth, a mean difference of 4 µm between the characteristic landmarks of the automatic and manual registration planes was observed. Chicherova et al. improved their work in a later publication [22] through the addition of an elastic registration step and normalized mutual information. Their work is presented in the following section.
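The second-nearest-neighbor criterion used by Chicherova et al. to validate candidate matches is commonly implemented as a ratio test: a match is accepted only if the closest descriptor is clearly closer than the second closest. The NumPy sketch below uses brute-force Euclidean distances and a ratio of 0.8, the value suggested in Lowe's SIFT work; the threshold actually used by Chicherova et al. is not stated here, so this value is an assumption.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbor matching with the second-nearest-neighbor (ratio) test.

    desc_a: (N, K) and desc_b: (M, K) feature descriptors, with M >= 2.
    Returns (i, j) index pairs for which the closest descriptor in B is clearly
    closer than the second closest, i.e., d1 < ratio * d2.
    """
    a = np.asarray(desc_a, float)
    b = np.asarray(desc_b, float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)   # (N, M) Euclidean distances
    nearest_two = np.argsort(d, axis=1)[:, :2]
    rows = np.arange(len(a))
    d1 = d[rows, nearest_two[:, 0]]
    d2 = d[rows, nearest_two[:, 1]]
    keep = d1 < ratio * d2
    return list(zip(rows[keep].tolist(), nearest_two[keep, 0].tolist()))
```

Ambiguous descriptors, for example from repeated texture, are discarded because their two nearest distances are nearly equal.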

Elastic Approaches for Slice-to-Volume Registration
With an elastic optimization scheme based on normalized mutual information, Chicherova et al. extended their previous workflow in [22]. Instead of modelling the elastic deformation with B-splines, they opted for Legendre polynomials, which consider the entire space of possible deformations instead of a piecewise deformation model. In an iterative step, the plane obtained by the RANSAC scheme is subsequently deformed with the goal of optimizing the alignment. The polynomial coefficients are calculated as the least-squares solution of a system of linear equations. Through an optimization framework, the coefficients are then further refined with the aim of maximizing the normalized mutual information.
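The least-squares step can be illustrated by fitting a two-dimensional Legendre polynomial surface, for example an out-of-plane displacement z(x, y) of the RANSAC plane, to sample points. The sketch below uses NumPy's `legendre` module; the degree (2, 2), the scaling of the coordinates to [-1, 1], and the interpretation of z as plane displacement are our assumptions, and the subsequent NMI-driven refinement of the coefficients is not shown.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def fit_legendre_surface(x, y, z, deg=(2, 2)):
    """Least-squares fit of z ≈ sum_ij c_ij P_i(x) P_j(y) with Legendre polynomials P.

    x, y should be scaled to [-1, 1], where Legendre polynomials are orthogonal.
    Returns the coefficient matrix c of shape (deg[0] + 1, deg[1] + 1).
    """
    V = leg.legvander2d(x, y, list(deg))          # (N, (deg[0]+1)*(deg[1]+1)) design matrix
    coef, *_ = np.linalg.lstsq(V, z, rcond=None)  # least-squares solution of V @ c = z
    return coef.reshape(deg[0] + 1, deg[1] + 1)
```

The fitted surface can be evaluated at arbitrary in-plane positions with `leg.legval2d(x, y, c)`, which is how a deformed (curved) sampling surface would be generated from the coefficients.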
To investigate the improvement offered by the new optimization scheme, the authors tested the algorithm for both rigid and deformable cases using the established jawbone (CT resolution: 4.57 µm) [29] and cerebellum specimen data sets (CT resolution: 3.5 µm, resampled to 7 µm in silico) [21]. For the rigid jawbone images, there was a clear improvement, with a median error of 8.4 µm. For the deformable cerebellum specimens, the overall median distance between the landmarks was 21.6 µm. While this value is considerably higher, the authors argue that the overall registration improved because the dispersion of the distances was reduced. Museyko et al. [35] matched micro-CT scans (15 µm isotropic resolution) with histological images of vertebrae and tibiae obtained from different wild-type mice through segmentation-based registration (SegReg) and a conventional intensity-based approach and compared the results (Figure 6). For SegReg, both modalities were first binarized and preliminarily aligned based on visual inspection of corresponding landmarks. A secondary automated registration was implemented for finer alignment. The intrinsic, i.e., intensity-based, registration approach was not preceded by a segmentation step. The authors chose different transformation complexities for the vertebra (affine) and tibia (elastic) specimens. Both registration methods were implemented using the Insight Segmentation and Registration Toolkit (ITK) [36]. A B-spline algorithm was used for finer alignment. The mean squared difference (MSD) and Mattes mutual information (Mattes MI) [37] were applied as metrics for SegReg and intensity-based registration, respectively. Both approaches were compared using the Jaccard distance, which can be computed from the exclusive disjunction (XOR) of the two binarized images.
Quantification of the positional error of the slice in the volume was achieved by varying the position of the section, translating it in all three dimensions and tilting it around the orthogonal axes, based on six rotations. The resulting eight XOR values were then computed, from which the precision was calculated as the root mean square (RMS) and standard deviation (SD). In this study, the accuracy of the alignment could not be improved significantly by SegReg for affine registration, but better results were obtained for elastic registration, as indicated by a decrease of the overall registration error from 43% to 23%. Figure 6 compares the two registration methods investigated by [35] for the histological and micro-CT images of one mouse tibia. The comparison of SegReg and the intensity-based approach was quantified using the standard deviation over eight specimens and the exclusive disjunction, which was computed from the deviations in translation and rotation at eight positions. For translational errors in elastic registration, the authors found that the error, estimated as the RMS of the standard deviation, was lower for SegReg (0.0039) than for intensity-based registration (0.1227).
Figure 5: Exemplary workflow of a feature-based approach by Chicherova et al. [20] based on data originating from extracted human jawbones [29]. First, SURF feature points (blue dots) were detected in all 700 planes of the CT scan and in the histological image (left). These descriptors were then compared in a higher-dimensional space (middle). Matching descriptors of both modalities form a plane that was used as a basis to estimate the position of the histological section in the CT volume (right). The plane was then further […]
The corresponding rotational errors (RMS of the standard deviation) were 0.0471 and 0.1189 for the two approaches, respectively (micro-CT voxel size: 15 µm; histology pixel size: 7.25 µm). Furthermore, the authors computed the offset of the affine and elastic registrations relative to the best rigid transformation, which was obtained by neglecting the nonrigid components. The overall mean offset was 52 µm.
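The Jaccard distance and the XOR count are closely related: since |A ⊕ B| = |A ∪ B| − |A ∩ B|, the number of XOR pixels divided by the size of the union equals the Jaccard distance. The NumPy sketch below checks this on binary masks; normalizing the XOR count by the union is our assumption, as the normalization used by Museyko et al. is not given here.

```python
import numpy as np

def jaccard_distance(mask_a, mask_b):
    """Jaccard distance 1 - |A∩B| / |A∪B| of two binary masks."""
    a = np.asarray(mask_a, bool)
    b = np.asarray(mask_b, bool)
    return 1.0 - np.logical_and(a, b).sum() / np.logical_or(a, b).sum()

def xor_fraction(mask_a, mask_b):
    """Pixels set in exactly one mask (exclusive or), relative to the size of the union."""
    a = np.asarray(mask_a, bool)
    b = np.asarray(mask_b, bool)
    return np.logical_xor(a, b).sum() / np.logical_or(a, b).sum()
```

For two 4×4 squares overlapping in a 2×2 corner, both functions return 24/28 ≈ 0.857.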
A flexible framework for the nonrigid registration of individual two-dimensional histological images obtained from human brains to a three-dimensional MRI volume based on intensity-based criteria was proposed by Osechinskiy and Kruggel [38]. Their approach was designed specifically for cases in which only a sparse number of slides is available but an MRI scan was performed beforehand. The framework is built in a modular manner, starting with a geometric transformation using a deformation model. First, a preliminary alignment of the slice within the bounds of the MRI volume was performed by translating and rotating the image via a Procrustes or rigid transformation, resulting in nine and six degrees of freedom, respectively, under the assumption that no scaling was needed. Next, several deformation models can be chosen, namely, thin-plate splines (TPS) [39], Gaussian elastic body spline deformation fields [40], and B-spline free-form deformation fields [41,42]. Similarly, diverse options for the similarity measure and the optimization procedure are offered. The framework was assessed on MRI scans with a resolution of 0.35 mm × 0.35 mm × 0.7 mm and histological scans at a resolution of 12.7 µm/pixel. The authors demonstrated that the best performance was achieved by TPS in combination with the NEW Unconstrained Optimization Algorithm (NEWUOA) [43] and a correlation-coefficient-based cost function. The similarity between histological sections and MRI planes of human brains was measured by calculating the sum of seven coefficients [38].
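Of the deformation models offered in this framework, thin-plate splines are the easiest to write down. The sketch below fits a 2D TPS to landmark pairs by solving the standard linear system with kernel U(r) = r² log r. Osechinskiy and Kruggel optimized their deformation parameters against an intensity-based cost (e.g., with NEWUOA) rather than fitting landmarks, so this only illustrates the deformation model; the helper names and the optional `regularization` parameter are our assumptions.

```python
import numpy as np

def _tps_kernel(r):
    """U(r) = r^2 log r, with U(0) = 0 by continuity."""
    with np.errstate(divide="ignore", invalid="ignore"):
        u = r ** 2 * np.log(r)
    return np.nan_to_num(u)          # replaces the 0 * log(0) = nan entries by 0

def fit_tps(src, dst, regularization=0.0):
    """Thin-plate spline mapping 2D landmarks src -> dst.

    Solves [[K + λI, P], [P^T, 0]] [w; a] = [dst; 0] with K_ij = U(|src_i - src_j|)
    and P = [1, x, y]; returns (src, w, a) for use in tps_transform.
    """
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    n = len(src)
    K = _tps_kernel(np.linalg.norm(src[:, None] - src[None, :], axis=2))
    P = np.hstack([np.ones((n, 1)), src])
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K + regularization * np.eye(n)
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    params = np.linalg.solve(A, b)
    return src, params[:n], params[n:]

def tps_transform(model, pts):
    """Map (M, 2) points with a fitted TPS: affine part plus weighted radial kernels."""
    src, w, a = model
    pts = np.asarray(pts, float)
    U = _tps_kernel(np.linalg.norm(pts[:, None] - src[None, :], axis=2))
    return a[0] + pts @ a[1:] + U @ w
```

With `regularization=0`, the spline passes exactly through the landmarks and reproduces any purely affine mapping exactly; larger values trade landmark fidelity for smoother deformations.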

2.2. Image-to-Image Registration (2D ⟶ 2D)
Image-to-image registration in the present context refers to the alignment of the histology slice to the optimal plane inside the volume. Thus, concerns about geometrical integrity and optimal in-plane fitting of the three-dimensional bodies are set aside. However, once both modalities have been accurately registered, efforts may be put into the reconstruction of a virtual histological model.
The optimal alignment of two images acquired with different modalities is a complex task that requires a defined means of comparison. Furthermore, if two stacks of images are present, an individual correspondence needs to be established. This correspondence can be defined manually, but doing so is labor-intensive and highly dependent on expert knowledge [10,15,21]. To overcome this problem and match histological slices and MRI slices of the human prostate, Xiao et al. [10] proposed an iterative group-alignment scheme. It estimates corresponding images and then compares the pixel value distributions of both modalities, instead of relying on the established pairwise comparison of two images. Their approach consists of three modules. First, all histological and MRI images were separated into two groups according to the imaging modality (MRI or histology). The goal was to estimate a selection of MRI slices that resulted in an optimal match based on the computed mutual information (MI) [44,45]. The top-K ranked matches were then averaged to distinguish between probable and less likely matches. This step yielded an educated guess for the correspondence of both modalities. However, it did not account for distortions introduced during tissue preprocessing or for the organ deformation experienced during in vivo MRI scanning with an endorectal coil.
Therefore, the final alignment was conducted in an affine registration process in which the histological slice was transformed to maximize the MI of the two images using a simple optimization method. Xiao et al. [10] demonstrated that the proposed group-wise alignment results in a higher degree of similarity than pairwise comparison (e.g., a brute-force approach). This was evidenced by a lower value of their own error norm, which is based on the L1 distance. According to their calculations, based on an expert-generated ground truth, the pairwise alignment performs only half as well as their group alignment. Both approaches were assessed on the same data set with an MRI resolution of 0.27 mm/pixel. Figure 7 shows a practical example of applying the group-wise alignment scheme to 5 µm thick slices of human prostate tissue.
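The MI-based ranking at the core of this group-wise scheme can be sketched as follows. The code is a simplified assumption, not Xiao et al.'s implementation. Images are flat lists of 8-bit intensities of equal size, and MI is estimated from a joint histogram with a hypothetical default of 8 bins. Only the ranking of candidate MRI slices for a single histological image is shown; the averaging of the top-K matches and the subsequent affine refinement are omitted.

```python
import math
from collections import Counter


def mutual_information(a, b, bins=8, max_value=256):
    """MI (in nats) between two equally sized images given as flat lists of ints in [0, max_value)."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    qa = [v * bins // max_value for v in a]  # quantize intensities into histogram bins
    qb = [v * bins // max_value for v in b]
    n = len(a)
    joint = Counter(zip(qa, qb))
    pa, pb = Counter(qa), Counter(qb)
    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        mi += p_ij * math.log(p_ij / ((pa[i] / n) * (pb[j] / n)))
    return mi


def top_k_candidates(histology, mri_slices, k=3, bins=8):
    """Rank MRI slice indices by their MI with one histology image and return the best k."""
    scores = [(mutual_information(histology, s, bins), idx) for idx, s in enumerate(mri_slices)]
    scores.sort(reverse=True)
    return [idx for _, idx in scores[:k]]
```

Because MI depends only on the co-occurrence of intensity bins, an intensity-inverted copy of an image scores as highly as the image itself. This is why MI is a common choice for comparing modalities with different contrast.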

Nonelastic Approaches for Image-to-Image Registration
Quantified through their error norm, Xiao et al. measured values ranging from 0 to 2.7 for the group scheme and from 1.7 to 5.2 for the pairwise scheme. The pairwise matching put the individual slices in order. However, generating a complete, authentic three-dimensional model still posed a challenge because of overlaps and offsets introduced to the slices during preparation. The authors addressed this problem with a 3D ⟶ 3D affine registration [10], which we present in a later section concerning higher-dimensional registration.
The correlation of histological slices with the scanned volume in an image-to-image registration approach relies on a priori knowledge of the section-to-plane correspondence, i.e., the position of the extracted histological image in the 3D stack. If the order of sections is conserved during the preparation process, this a priori information may be used to establish such a correspondence. It may even allow for more sophisticated 3D ⟶ 3D registration approaches, as presented by [46]. However, changes in orientation may need to be accounted for. The sectioning process can be further optimized by including 3D-printed slicers created from an a priori scan of the tissue [13]. For example, Absinta et al. [47] showed qualitative improvements in the creation of serial sections of human brain tissue, which enhanced the subsequent superimposition of 3D scan and histology. Turkbey et al. [48] segmented prostate tissue and generated 3D models using the ANALYZE software (Mayo Clinic, AnalyzeDirect, Inc., Overland Park, KS, USA). Based on the surfaces of these models, slicer molds were created and 3D-printed to allow sectioning without distortion.

Elastic Approaches for Image-to-Image Registration
Matching an individual plane not only to its adjacent neighbors but to a group of peers has been proposed [10,49] to avoid error propagation. One approach, used by [49], involved the feature point detection and description algorithm AKAZE [50]. Here, the feature points of the N preceding slices were matched symmetrically, and an affine transformation matrix was robustly estimated using RANSAC [27]. In addition, the planned cutting plane was marked in the micro-CT volume beforehand, allowing for the allocation of corresponding images. Other approaches reconstruct the histological image stack through an initial alignment of each image to its en face or block face representation, i.e., an image of the front side of the sectioned tissue (see, for example, [46,51]). Groen et al. [51] used manually placed landmarks in both histological and en face images of soft-tissue specimens of arteries with plaques to register pre- and postprocessing recordings. The annotated landmarks were assigned through a B-spline control point displacement [42], calculated by means of MI and optimized using a gradient-descent algorithm. Next, the MRI and CT slices were registered to the en face images to (i) determine the orientation of the slices and (ii) compensate for scaling errors that might have occurred. The former registration was performed as an automated rigid transformation, also using MI as the similarity measure. Registration of CT and en face images was based on manually edited landmarks using a rigid transformation with isotropic scaling. The authors achieved optimal rotation errors of 5 and 7 degrees and corresponding mean translational errors of 0.6 mm and 1.2 mm (CT resolution: 18 µm; histology resolution: 1.82 µm).
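The robust affine estimation step mentioned for Nagara et al. can be illustrated with a generic RANSAC loop. The sketch below assumes that point correspondences are already given as coordinate lists; in practice they might come from AKAZE descriptor matching, which is not shown. The iteration count, inlier threshold, and function names are hypothetical choices. The final least-squares refit on the consensus set is a common RANSAC refinement, not a detail reported in [49].

```python
import math
import random


def _solve3(m, v):
    """Solve a 3x3 linear system with Cramer's rule; return None if it is (nearly) singular."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    if abs(d) < 1e-12:
        return None
    out = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]
        out.append(det(mc) / d)
    return out


def fit_affine(src, dst):
    """Least-squares 2D affine fit via the normal equations (exact for 3 non-collinear points)."""
    ata = [[0.0] * 3 for _ in range(3)]
    atx, aty = [0.0] * 3, [0.0] * 3
    for (x, y), (u, w) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atx[i] += row[i] * u
            aty[i] += row[i] * w
    px, py = _solve3(ata, atx), _solve3(ata, aty)
    if px is None or py is None:
        return None
    return px, py  # u = px . (x, y, 1), v = py . (x, y, 1)


def apply_affine(model, p):
    px, py = model
    x, y = p
    return (px[0] * x + px[1] * y + px[2], py[0] * x + py[1] * y + py[2])


def ransac_affine(src, dst, iterations=200, threshold=1.0, seed=0):
    """Return (model, inlier indices); the model is refit on the largest consensus set found."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        idx = rng.sample(range(len(src)), 3)  # minimal sample for a 2D affine transform
        model = fit_affine([src[i] for i in idx], [dst[i] for i in idx])
        if model is None:
            continue  # degenerate (collinear) sample
        inliers = [i for i in range(len(src))
                   if math.dist(apply_affine(model, src[i]), dst[i]) < threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    model = fit_affine([src[i] for i in best_inliers], [dst[i] for i in best_inliers])
    return model, best_inliers
```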
In a two-step approach, Seise et al. [52] first affinely aligned pairs of binarized images of pig livers based on a relative overlap metric (the Kappa statistic in ITK, https://www.itk.org) [53]. The centers of the vessels depicted in the matched segmented images, as well as interactively defined locations, were then used as the initial set of corresponding points for TPS. For the registration of histology to micro-CT alone, an average accuracy of about 0.5 mm (CT resolution: 0.4 mm) was determined. The entire framework achieved a mean deviation of 2 mm; it includes a 3D ⟶ 3D similarity transform between three-phase contrast-enhanced CT and micro-CT and a 2D ⟶ 3D registration of CT and histology. Given that the task at hand was the registration of vessels, the presented approach produced satisfactory results compared with its peers [54,55], but its applicability to other tasks is likely limited. Furthermore, the chosen implementation was rather labor-intensive.
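For binary images, ITK's Kappa statistic metric is, to our understanding, the overlap ratio 2|A ∩ B| / (|A| + |B|), which coincides with the Dice coefficient. The sketch below computes this overlap for masks stored as sets of foreground pixel coordinates. As a deliberately simplified stand-in for the affine optimization used by Seise et al., it searches exhaustively over integer translations. The search range and function names are hypothetical.

```python
def kappa_overlap(a, b):
    """2|A ∩ B| / (|A| + |B|) for binary masks given as sets of foreground (row, col) pixels."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def shift(mask, dr, dc):
    """Translate a mask by (dr, dc) pixels."""
    return {(r + dr, c + dc) for r, c in mask}


def best_translation(fixed, moving, search=5):
    """Exhaustive integer-translation search maximizing the overlap (not a full affine optimizer)."""
    best = (-1.0, (0, 0))
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            score = kappa_overlap(fixed, shift(moving, dr, dc))
            if score > best[0]:
                best = (score, (dr, dc))
    return best
```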
Based on their previous work [1,4], Albers et al. [5] used an inverted representation of the individual or overall color channels of the histological image to match the slide with the corresponding CT plane. Following the aforementioned coarse matching using the Fourier-Mellin algorithm [56,57], the fine alignment was achieved by means of a B-spline deformation model implemented in elastix [9] and optimized through MI. The results were quantified using a displacement index [1] that computes the displacement and MI based on block matching. Ideally, this value should be 0. For the case at hand, however, an overall value of 6.9 ± 2.0 was achieved for the transformed dataset, owing to the different image content shown in histology and micro-CT. The manual sectioning of the CT volume may limit reproducibility and applicability to analogous problems. Nevertheless, the two-step approach produced a good overall outcome, considering that the displacement was unlikely to equal 0 given the different modalities involved.
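The B-spline deformation model used for the fine alignment can be made concrete by evaluating a uniform cubic B-spline free-form deformation at a single point. The following sketch is written from scratch in Python and does not use elastix. The control-grid layout, in which control point k is located at (k - 1) × spacing, is an assumption of this example. The MI-driven optimization of the control-point displacements is not shown.

```python
import math


def bspline_basis(u):
    """The four uniform cubic B-spline basis functions evaluated at u in [0, 1)."""
    return (
        (1 - u) ** 3 / 6.0,
        (3 * u ** 3 - 6 * u ** 2 + 4) / 6.0,
        (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) / 6.0,
        u ** 3 / 6.0,
    )


def ffd_displacement(x, y, control, spacing):
    """Displacement (dx, dy) at (x, y) from a grid of control-point displacements.

    control[i][j] is the (dx, dy) of the control point located at
    ((i - 1) * spacing, (j - 1) * spacing). The caller must pad the grid so
    that indices floor(x / spacing) .. floor(x / spacing) + 3 (and likewise
    for y) exist.
    """
    gx, gy = x / spacing, y / spacing
    i, j = int(math.floor(gx)), int(math.floor(gy))
    bu, bv = bspline_basis(gx - i), bspline_basis(gy - j)
    dx = dy = 0.0
    for l in range(4):  # 4 x 4 neighbourhood of control points
        for m in range(4):
            w = bu[l] * bv[m]
            cx, cy = control[i + l][j + m]
            dx += w * cx
            dy += w * cy
    return dx, dy
```

The cubic B-spline basis functions sum to one and reproduce linear functions, so a uniform control-point displacement yields the same displacement everywhere. Because each basis function has local support, moving one control point only deforms a 4 × 4 cell neighborhood.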
Magee et al. [58] extended the intensity-matching approach by representing each pixel of each image with a feature vector. The vector was constructed from the results of Gaussian filtering of the color and grayscale channels, together with texture features, building upon their previous work [59]. This resulted in a common visual representation of images acquired with different modalities. These feature descriptors were labeled using prototypes and clustered. Using preexisting mapping functions, the clusters were assigned to so-called tissue classes, which provide a multichannel representation of a given tissue. Next, a tissue-class co-occurrence matrix of two mapping functions was generated. To quantify the similarity of these matrices, the MI was calculated and maximized using a greedy search algorithm. This process was repeated until each image was present in the same representation. Finally, the actual registration of the converted images was performed. The idea was to realize a nonrigid registration through a set of rigid registrations on subimages. The images were first padded to the same size, rigidly aligned using phase correlation, and then divided into overlapping image pairs. These patches were superimposed by determining their rotation and translation offsets through phase correlation [60]. For each local registration, five transform vectors were created, overlapped with their peers using a least-squares minimization, and subtracted from each other. The resulting vector set was then approximated by a B-spline using a robust least-squares minimization method [61]. These steps were repeated at different resolutions to achieve the best result. The authors employed their methodology to register MRI and histological images. For specimens with low collagen quantification, this method outperformed iterative methods, with an absolute error of 5.7 ± 5.8% in 100 µm thick sections imaged at an isotropic MRI resolution of 50 µm.
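Phase correlation, which Magee et al. used for the rigid alignment of images and subimages, estimates a translation from the phase of the normalized cross-power spectrum. The sketch below recovers an integer, circular shift between two small grayscale images using a naive DFT built from Python's standard library. It assumes periodic boundaries. It does not estimate the rotation offsets that the original work also determined; a Fourier-Mellin scheme would handle rotation and scale through log-polar resampling.

```python
import cmath


def dft2(img, inverse=False):
    """Naive 2D DFT (O(N^4)); adequate only for tiny illustrative images."""
    rows, cols = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            s = 0j
            for r in range(rows):
                for c in range(cols):
                    s += img[r][c] * cmath.exp(sign * 2j * cmath.pi * (u * r / rows + v * c / cols))
            out[u][v] = s / (rows * cols) if inverse else s
    return out


def phase_correlation(fixed, moving):
    """Return the circular shift (dr, dc) such that moving[r][c] ≈ fixed[r - dr][c - dc]."""
    rows, cols = len(fixed), len(fixed[0])
    f, m = dft2(fixed), dft2(moving)
    cross = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            prod = m[u][v] * f[u][v].conjugate()
            mag = abs(prod)
            # keep only the phase difference; drop frequencies without energy
            cross[u][v] = prod / mag if mag > 1e-12 else 0j
    corr = dft2(cross, inverse=True)
    return max(((r, c) for r in range(rows) for c in range(cols)),
               key=lambda rc: corr[rc[0]][rc[1]].real)
```

Shifts are returned modulo the image size, so a peak at row `rows - 1` corresponds to a shift of -1. For realistic image sizes, a fast Fourier transform would replace the naive loop.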
Magee et al. further reported that sections thicker than 100 µm resulted in worse performance, with errors starting at 50% [59]. In their work, the authors state that the achieved quality of registration was within 200 µm.
Approaches that already use feature points in a previous step naturally tend to reuse them for a more advanced matching procedure. To complete the feature extraction for the CT dataset, Nagara et al. [49] employed AKAZE, as they had done for the histological images. Treating the features of two corresponding slides as nodes of a Markov random field [62,63], they proposed an elastic registration based on normalized cross-correlation (NCC). The experiments showed promising results, with the best match quantified by a mean Dice index of 0.744, a mean Jaccard index of 0.595, and an NCC of 0.608 (CT resolution: 49 µm and 52 µm; histological image resolution: 22 µm). Figure 8 illustrates the qualitative results of their approach by visually linking matched features.
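The overlap and intensity measures reported by Nagara et al. are straightforward to compute. The sketch below assumes segmentations stored as sets of foreground pixel coordinates and intensities given as flat lists. It does not handle empty masks or constant images, for which the measures are undefined.

```python
import math


def dice(a, b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for sets of foreground pixels."""
    return 2.0 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| (intersection over union)."""
    return len(a & b) / len(a | b)


def ncc(x, y):
    """Zero-mean normalized cross-correlation of two equally long intensity lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    num = sum(p * q for p, q in zip(dx, dy))
    den = math.sqrt(sum(p * p for p in dx) * sum(q * q for q in dy))
    return num / den
```

For a single pair of masks, the Jaccard index J and the Dice index D are linked by J = D / (2 - D). Applying this to the reported mean Dice index of 0.744 gives about 0.592, close to the reported mean Jaccard index of 0.595. Exact agreement is not expected, because the means were taken over several image pairs.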
In a preliminary step, Katsamenis et al. [64] matched CT planes and histological slides according to the best visual correspondence, based on manually annotated landmarks. To achieve a more accurate registration, they used the elastic vector-spline registration of the UnwarpJ plugin [66] for Fiji [65] (available at https://bigwww.epfl.ch/thevenaz/UnwarpJ/). The proposed method was realized in an almost exclusively manual workflow, which complicates its applicability to other problems. Furthermore, no similarity measure or accuracy quantification was provided.

Volume-to-Volume Registration (3D ⟶ 3D)
A full registration of two three-dimensional models is the most challenging approach to multimodal image fusion. It commonly consists of a multistep workflow in which lower-level matching methods are applied iteratively. Preserving the geometrical integrity of the resulting model is a frequently addressed problem. Its loss is visually noticeable as layer shifts along the z-axis, which, in the case of curved objects, are also referred to as the "banana effect" [67,68]. This can result from a sparse number of histological slices as well as from distortions or lesions introduced during the slicing process. Furthermore, changes in the intensity of the aligned planes or sections need to be accounted for. A visualization of the problem is presented in Figure 9.

Figure 9: 3D-volume reconstruction based on consecutive image-to-image registration makes it challenging to preserve the geometric consistency of the sectioned tissue (panels: original object; sections to be registered; geometric integrity; intensity integrity). To register a histological image with a scanned volume, the individual sections first need to be aligned with each other, i.e., the volume needs to be reconstructed while geometrical consistency is maintained. Consecutive registration of individual slices needs to be performed until the original shape of the sectioned tissue is reconstructed in silico. Subsequently, inconsistencies in intensity can be accounted for.
The following research tackles the problems of (i) reconstructing the specimen from the slices and (ii) matching this 3D representation with a scanned model. It is worth mentioning that 3D virtual histology has also been addressed by other authors [69][70][71]. These authors did not rely on a scan of the specimen as a ground truth and therefore did not implement multimodal registration.
Full 3D ⟶ 3D registration could complement and expand existing approaches through an almost exact correlation of 3D scan and histology. For example, adding a registration of the reconstructed histology volume to an MRI scan to the workflow of Turkbey et al. [72] might increase the degree of correlation between both modalities, thus further aiding the diagnostic process.

Nonelastic Approaches for Volume-to-Volume Registration (3D ⟶ 3D)
In addition to the group-wise alignment scheme explained in the previous section, Xiao et al. [10] used the plane and slice correspondence to create a three-dimensional representation of the histological image stack and registered it to the volume. Retaining the interslice distance of the MRI scan as well as the order of the individual slices, they created a pseudo-volume of the stained tissue. Missing values were compensated for by zero-padding, essentially leaving absent parts blank. To align both volumes, the MRI was used as the transformed volume in an affine registration process. The authors deliberately chose a nonelastic approach to avoid overfitting. As in their 2D ⟶ 2D registration algorithm, MI and a dedicated simplex optimization approach were applied. The authors did not state the achieved accuracy or similarity of the implemented 3D registration. A qualitative example of their results is portrayed in Figure 10.
After histological processing and imaging of the tissue specimen, a three-dimensional virtual model must be reconstructed from the individual slices before it can be matched with the volume. This can be achieved by registering the histological images to their respective block face or en face representations, based on an MI implementation [37,44,45,73], as a basis for 3D registration. The 2D ⟶ 2D approaches described in the image-to-image registration section of this review can serve the same purpose. In the case of Mancini et al. [74], whole-brain specimens were divided into smaller blocks for histological sectioning after an MRI scan. To preserve the overall anatomical structure of the brain, these blocks were matched to whole-slice photographs using SURF [24] and RANSAC [27]. After the full virtual histological model was obtained, the 3D registration began by resampling the MRI of the brain according to the block orientation. Then, each slice pair was coarsely matched by means of NiftyReg [75] using a nonelastic transformation. A subsequent fine alignment was implemented through stationary velocity fields [76].
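Stationary velocity fields parameterize a diffeomorphic deformation as the exponential of a time-constant velocity field, which is commonly computed by scaling and squaring. The following 1D sketch assumes a velocity field sampled on a unit-spaced grid, linear interpolation, and clamping at the borders. It only illustrates the principle and is not the implementation used by Mancini et al. or by NiftyReg; the function names are hypothetical.

```python
def _interp(values, x):
    """Linear interpolation of samples on a unit-spaced grid, clamped at the borders."""
    if x <= 0:
        return values[0]
    if x >= len(values) - 1:
        return values[-1]
    i = int(x)
    t = x - i
    return (1 - t) * values[i] + t * values[i + 1]


def exp_velocity_field(velocity, steps=6):
    """Displacement u of phi = exp(v) for a stationary 1D velocity field (scaling and squaring)."""
    # Scaling: a small initial step phi_0(x) = x + v(x) / 2^steps
    u = [v / 2 ** steps for v in velocity]
    for _ in range(steps):
        # Squaring: phi_{k+1} = phi_k o phi_k, i.e. u_{k+1}(x) = u_k(x) + u_k(x + u_k(x))
        u = [ui + _interp(u, x + ui) for x, ui in enumerate(u)]
    return u
```

Provided that the initial step v / 2^N is small relative to the grid spacing, each intermediate map is monotonic. Composing monotonic maps keeps the final deformation invertible (free of folding), which is the property that motivates this parameterization.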

Elastic Approaches for Volume-to-Volume Registration
Alegro et al. [46] realized their 3D registration through asymmetric diffeomorphic registration [77]. The tissue geometry is altered during processing, which makes the reconstruction ill-posed. To preserve the geometrical integrity of the virtual histological image stack, they proposed registering the sections in 2D to an a priori ground truth, namely, block face images of the whole postmortem human brain acquired during sectioning. The authors claimed that their methodology prevents shifts in the geometrical integrity, i.e., the z-effect. The registration was implemented using Advanced Normalization Tools (ANTs) [78] and an affine registration algorithm based on Mattes MI [37]. Following intensity correction and resampling, both 3D stacks were registered. Diffeomorphic nonlinear registration [77,79] was implemented to compensate for artifacts present in the histological samples, with Mattes MI [37] as the matching criterion. Their results were quantified using the Dice coefficient [12], yielding experimental values of 0.59, 0.65, and 0.75 at a relatively coarse isotropic MRI voxel size of 1 mm³. A visual representation of the reconstruction of a three-dimensional histological model of a brain specimen is displayed in Figure 11.
Rusu et al. [80] fused histological images and MRI scans of lung specimens to detect features of pulmonary inflammation based on three-dimensional registration. In a scheme of increasing complexity, the authors first rigidly aligned neighboring histological slices based on MI. Using elastix [9], a virtual histology model was reconstructed through registration with an ex vivo scan of the lung, thus limiting the spatial deformations introduced during sectioning. These steps were realized with a three-level pyramid affine registration using MI as the scoring function. Next, the histological volume was registered to an in vivo scan by affine and deformable transformations.
The alignment was further optimized with regard to the individual lobular units extracted from the specimen. Using the full range of information held by the histological images, both volumes were then merged, allowing pulmonary inflammation to be mapped onto the in vivo scan. The histological model was warped onto the in vivo MRI scan using a B-spline-based elastic registration within a three-level scheme that optimized MI. With a final grid spacing of 4 mm, a final alignment error of 0.85 ± 0.44 mm (root-mean-square deviation over 17 landmarks) was achieved (in vivo MRI resolution: 250 µm; histological resolution: 0.75 µm).
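Landmark-based errors such as the 0.85 ± 0.44 mm reported by Rusu et al. are derived from distances between corresponding landmarks after registration. The exact combination of statistics used in [80] is not detailed here. The sketch below therefore simply computes the per-landmark Euclidean distances together with their mean, sample standard deviation, and root-mean-square. It assumes both landmark sets are given in the same physical units (e.g., mm), and the helper names are hypothetical.

```python
import math
import statistics


def landmark_errors(fixed, warped):
    """Euclidean distances between corresponding landmarks (same units as the input)."""
    return [math.dist(p, q) for p, q in zip(fixed, warped)]


def summarize(errors):
    """Mean, sample standard deviation, and root-mean-square of landmark distances."""
    mean = statistics.mean(errors)
    sd = statistics.stdev(errors) if len(errors) > 1 else 0.0
    rms = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mean, sd, rms
```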
In an approach that differs markedly from the prior research with respect to matching criteria and implemented metrics, Lee et al. [81] reconstructed the surface of a cochlea from histological images and CT scans using an iterative closest point (ICP) algorithm [82]. Both modalities were processed to generate a wireframe representation of the surface. Substructures of these wireframes, constructed of triangular faces, were used to identify corresponding markers, which were subsequently matched using the ICP algorithm. In an iterative process, the optimal deformation function mapping these points onto one another was determined by minimizing the sum of the distances between corresponding points. The process was optimized by monitoring the RMS value between successive slides. After partial alignment, the entire surface was matched by connecting the individual wireframes. This final registration was done using an affine transformation. The mean distance between two points in the reconstructed surface model was calculated for several RMS values and averaged about 0.0805 mm (micro-CT resolution: 30 µm; histology images: 300 pixels/in, i.e., about 12 pixels/mm).
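The ICP principle used by Lee et al. alternates between nearest-neighbor matching and a closed-form transform fit. The following sketch is a simplified 2D, rigid, point-to-point variant that works on plain coordinate lists with brute-force matching. The original work operated on 3D wireframe surfaces and used an affine transformation for the final registration, so this example illustrates the iteration scheme rather than their method. The function names and the convergence tolerance are hypothetical.

```python
import math


def best_rigid_2d(src, dst):
    """Closed-form least-squares rotation and translation mapping src onto dst (2D Procrustes)."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    s_cos = s_sin = 0.0
    for (x, y), (u, v) in zip(src, dst):
        x, y, u, v = x - sx, y - sy, u - dx, v - dy
        s_cos += x * u + y * v
        s_sin += x * v - y * u
    theta = math.atan2(s_sin, s_cos)  # angle maximizing the correlation of centred points
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def transform(points, theta, tx, ty):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def icp(moving, fixed, iterations=30, tol=1e-9):
    """Point-to-point ICP: match nearest neighbours, refit a rigid transform, repeat until stable."""
    current = list(moving)
    prev_rms = float("inf")
    for _ in range(iterations):
        matches = [min(fixed, key=lambda q: math.dist(p, q)) for p in current]
        theta, tx, ty = best_rigid_2d(current, matches)
        current = transform(current, theta, tx, ty)
        rms = math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(current, matches)) / len(current))
        if abs(prev_rms - rms) < tol:
            break
        prev_rms = rms
    return current, rms
```

ICP only converges to the correct solution when the initial misalignment is small compared with the spacing of the points. This is why it is usually preceded by a coarse alignment.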

Comparison of the Presented Registration Approaches
This summary presents a representative selection of registration strategies for computing an optimal alignment transformation between a 2D histological image and the corresponding three-dimensional plane representation. Because all of these works pursue the same objective, the multimodal registration of complementary imaging techniques, we believe that a broader scope can help in implementing a similar solution. For the purposes at hand, accuracy in terms of the quality of alignment is paramount. The authors' objectives differ in the dimensionality of the registration method, the targeted tissue types, and the desired outcome, and they accordingly chose a variety of metrics to estimate performance. Given the different equipment and software used for imaging and preprocessing, identifying a universal solution through direct inspection is not straightforward. However, a quantitative comparison of the selected research reveals a trend for a subset of algorithms. Given the variety of similarity measures and stated accuracies, we quantified the performance of the individual approaches, where possible, by weighting them with the stated resolution of the three-dimensional imaging technique; we call this measure relative accuracy. Since histology imaging has a higher native resolution, the a priori MRI or CT scan can be regarded as the limiting factor. Tables 1-3 summarize the presented research by dimensionality category, listing the achieved similarity, the scan resolution, and the relative accuracy. This accuracy is computed from the 3D isotropic resolution and the reported similarity measure. It therefore allows a direct comparison of the precision of the approaches for the multimodal registration of histology and scan, provided that measurements with equal units are available. Possible results range from 0 to 1, with 1 being the optimal relative accuracy.
Tables 1-3 were compiled from this comparison to give an overview of the achieved performance. Starting with image-to-volume registration approaches, the presented literature is compared quantitatively by the reported performance of each approach, which is put into context with the limitations of the three-dimensional scan. Table 1 documents this effort through a relative accuracy, computed as the ratio of the reported similarity (depending on the values stated by the authors) to the 3D isotropic resolution.
Because the authors quantified their results in different ways, a clear comparison through the computed relative accuracy is not always feasible. However, among the three papers that provided sufficient measurements, a distinct difference in performance is observed. Since these papers report their accuracy as plain distances in µm, the relative accuracy can be calculated. Table 2 applies the same quantitative comparison to the image-to-image registration cases presented in this paper, listing the reported similarity, the isotropic resolutions, and, where possible, the relative accuracy.
For the problem of 2D ⟶ 2D registration, a comprehensive comparison is rather difficult to establish because of the coarse isotropic resolution of the scans. Scan resolution is therefore a second factor to consider when choosing a registration approach, and readers should take the stated resolution into account when selecting an approach for their own problems. Finally, Table 3 lists the performance of the volume-to-volume registration approaches as reported in the individual publications.
A comparison of the provided data was not possible for the 3D ⟶ 3D case, because the data were either not provided or not suitable for calculating the relative accuracy. Furthermore, less research is being conducted in this domain, possibly because of the complexity of the task.

Conclusion
This review aims to provide a broad overview of techniques that can be used for the registration of histological slides with 3D imaging modalities such as CT and MRI. Since there is strong interest and intensive research in this field, we focused on reports published in the last ten years. The publications were sorted by the complexity of the transformation, the allowed degrees of freedom, and the dimensionality of the problem at hand. Here, a clear discrepancy in the amount of published research was observed, with 2D ⟶ 2D and 2D ⟶ 3D being significantly more prominent than 3D ⟶ 3D applications. The latter field seems less well understood, given the limited information about the algorithms used and the focus on very specialized use cases. Furthermore, microscopic images of processed hard tissue typically show fewer deformations than those observed in soft-tissue histology; nevertheless, dedicated algorithms for this specific task are less prominent in the literature. Instead, elastic solutions are mostly presented for precise superposition, whereas nonelastic methods are primarily used for the preliminary alignment of both modalities. Most of the presented publications that deal with complex slice-to-volume or volume-to-volume registration strategies divide the process into distinct substages. (i) Three-dimensional registration is initiated by a priori matching of the corresponding planes, which transforms the original 3D problem into a 2D ⟶ 2D registration problem. (ii) The position of the histological section in the scanned volume is then refined, typically iteratively, to increase precision. (iii) Finally, for 3D ⟶ 3D registration, an additional arrangement step is used to match both structural and geometrical properties.
To roughly compare the performance of the proposed strategies, we calculated the relative accuracy from the stated matching error relative to the lowest spatial resolution of the image datasets used. This proved practical for most nonelastic approaches. Methods employing more sophisticated similarity measures or novel quantification strategies, however, can hardly be compared in this way because of their heterogeneous nature and complex alignment schemes. Likewise, our simple comparison metric cannot be applied to elastic registration. Nevertheless, we found large variations in the achievable relative accuracy and hope that this information will help readers pick the ideal technique for their application.
With the above limitation in mind, we nevertheless observed for the literature considered in this review that intensity-based approaches generally tend to perform better than their feature- or landmark-based counterparts. This might change, however, if feature-based image processing methods are incorporated into registration approaches, which are predominantly realized by intensity matching. The visible deformation introduced to the specimen during histological sectioning will continue to be a major hindrance for feature extraction algorithms.
Overall, we observed that relative accuracies or measures dominate over transparent distance quantification (e.g., in micrometers). A set of standardized methods to quantify the alignment of two images after registration may hold the key to a commonly recognized means of evaluating registration quality, which would allow for a direct comparison of the individual algorithms' performances.
(i) Vectorized Norm. If individual landmarks and points of interest are known a priori or expert knowledge is provided, a vectorized norm may be used to (i) determine the overall performance in slice correspondence [10] or (ii) normalize the deviation between two markers present in both images [11]. Plain distance measurements are also feasible. However, the norm approach should also be considered if the registration methodology is based on feature descriptors or if the image is transformed.
(ii) Set Theory Approaches. If no prior knowledge of the plane correspondence is available, methods based on logical operations may be used. Quantifying the alignment of intensity values through the Intersection over Union, i.e., the Jaccard index [34], provides a performance statement based on the resulting overlap. This overlap can be expressed as a percentage or as a value between 0 and 1. Therefore, a universal comparison could be achieved by estimating the similarity and difference of pixel-based intensity values after the two images are superimposed.
(iii) Benchmarking. In image processing and machine learning, benchmark data sets have been the gold standard to verify and validate the performance of dedicated algorithms, e.g., [83][84][85][86]. Typically, these images stem from real objects or are artificially generated. Future work is needed to establish such a multimodal dataset, with constant resolutions and unspecific staining protocols, an expert-based ground truth, and a defined set of metrics. Such benchmark data could then be used to verify and validate results of other researchers who considered expert-based matching as ground truth, e.g., [21], in an attempt to reduce inductive bias. A ground truth-based evaluation set for benchmarking the reconstruction of 3D volumes by 2D ⟶ 2D registration has already been proposed by Lobachev et al. [87].
Furthermore, different staining protocols need to be accounted for. This problem is currently being tackled by the participants of the Automatic Nonrigid Histological Image Registration (ANHIR) Challenge [85,86,88,89].
This review shows that the registration of hard- and soft-tissue histology to a previously acquired 3D scan of the specimen is of broad interest. However, the presented approaches differ not only in the pursued goal but also in the registration method. Thus, a comprehensive comparison of performance and accuracy can only be achieved with great difficulty. This underlines the need for a general quantification method and an agnostic procedure to compare and evaluate each workflow objectively. With this review, we hope to provide researchers new to the field of image registration with a simple decision tree for picking the optimal strategy for their registration problem. Following the structure of this review, one should first determine the dimensionality of the problem, then assess how severe the alterations introduced to the tissue are, and finally choose the metric that offers the best prospects for optimization.

Disclosure
These funding institutions had no role in the study design, collection, analysis, and interpretation of the data, in the writing of the manuscript, or in the decision to submit the manuscript for publication.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.