An Automatic Image Processing Method Based on Artificial Intelligence for Locating the Key Boundary Points in the Central Serous Chorioretinopathy Lesion Area

Accurately and rapidly measuring the diameter of the central serous chorioretinopathy (CSCR) lesion area is the key to judging the severity of CSCR and evaluating the efficacy of the corresponding treatments. Currently, the manual measurement scheme based on a single or a small number of optical coherence tomography (OCT) B-scan images suffers from low credibility. Although manually measuring the diameters on all OCT B-scan images of a single patient can alleviate this issue, doing so is highly inefficient. Additionally, manual operation is subject to the subjective factors of ophthalmologists, resulting in unrepeatable measurement results. Therefore, an automatic image processing method (i.e., a joint framework) based on artificial intelligence (AI) is innovatively proposed for locating the key boundary points of the CSCR lesion area to assist the diameter measurement. Firstly, the initial location module (ILM) benefiting from multitask learning is properly adjusted and tentatively achieves the preliminary location of key boundary points. Secondly, the location task is formulated as a Markov decision process, aiming at further improving the location accuracy by utilizing the single agent reinforcement learning module (SARLM). Finally, the joint framework based on the ILM and SARLM is skillfully established, in which ILM provides an initial starting point for SARLM to narrow the active region of the agent, and SARLM makes up for the defect of low generalization of ILM by virtue of the independent exploration ability of the agent.
Experiments reveal that the AI-based method, which joins the multitask learning and single agent reinforcement learning paradigms, enables the agent to work in a local region, alleviating the time-consuming problem of SARLM, while performing the location task in a global scope, improving the location accuracy of ILM, thus reflecting its effectiveness and clinical application value in rapidly and accurately measuring the diameter of CSCR lesions.


Introduction
CSCR is a common fundus macular disease, which causes perceived objects to appear deformed, darkened, or smaller and is one of the factors affecting human visual health. However, its pathogenesis is still unknown in ophthalmology. In recent years, some scholars have put forward new theories on the pathogenesis of CSCR, such as the theory of choroidal dysfunction and the theory of retinal pigment epithelium dysfunction, which have explained the pathogenesis of CSCR to a certain extent and appropriately promoted human cognition of this fundus disease. This macular disease is mostly seen in young men aged 30 to 50 and is typically characterized by neurosensory retinal detachment (NRD, as shown in Figure 1) with or without pigment epithelium detachment (PED) [1,2]. Although the vision of some patients may recover spontaneously within a few months without any intervention, it is still difficult for some patients to recover to normal vision without surgery or drugs in a short time. In general, the main active interventions for the treatment of CSCR are laser surgery and drugs. No matter which method is adopted, it is essential and critical to carry out effective quantitative monitoring of the CSCR lesion area, which lays a foundation for timely obtaining the disease information and then assisting ophthalmologists to more objectively evaluate the severity of this disease and the efficacy of the corresponding treatment plan and also provides a basis for better formulating the follow-up treatment scheme.
At the moment, the monitoring parameters of CSCR mainly include the central macular thickness (CMT), best corrected visual acuity (BCVA), maximum height, and diameter of the CSCR lesion area. In addition, the CSCR lesion area itself is also an important parameter, and its direct segmentation and indirect detection have been studied by many researchers [3][4][5][6][7][8][9]. A fully convolutional neural network was built for the automatic segmentation of subretinal fluid, and with the help of a shrinking and expanding network structure, an average dice rate of 0.91 was obtained [3]. To deal with the large variations of the locations and shapes of CSCR lesions and the low contrast of Bruch membrane areas, Xue et al. [4] proposed a deep ensemble neural-like P system that integrated the strengths of deep convolutional neural networks and the spiking neural P system and achieved a maximum average dice rate of 0.97, which showed great potential in actual application. Wu et al. [5] presented a two-stage scheme consisting of detecting fluid-associated abnormalities by using a thickness map prior and segmenting the subretinal fluid by using a fuzzy level set with spatial smoothness, which was beneficial for the automatic quantification of the lesion area. Similar to [3], an end-to-end pipeline [6] inspired by the SegNet neural network was adopted for the identification and segmentation of CSCR fluid regions, which facilitated a more complete analysis of CSCR. Based on loosely coupled level sets, Novosel et al. [7] raised a locally-adaptive approach for the segmentation of the fluid and the interfaces between retinal layers, and a dice coefficient for fluid segmentation of 0.96 was acquired, which revealed great potential in quantifying the CSCR lesion area. Moreover, Zhen et al. [8] tried to detect CSCR based on a deep learning architecture and color fundus images. However, this method cannot describe CSCR lesions in detail, so it is not conducive to the monitoring of the disease.
A commendable segmentation model combining the U-Net and a generative adversarial network was ingeniously constructed by Yoo et al. [9]. To the best of our knowledge, this framework was the first to achieve the segmentation of CSCR lesions in color fundus images by developing a cascaded network, which is of great significance for quantitative monitoring of CSCR by virtue of conventional fundus image examination. In addition to the previous direct segmentation schemes, there are also some indirect detection methods [10,11]. Syed et al. [10] constructed a support vector machine (SVM) classifier-based model for the automated diagnosis of CSCR. Specifically, they established a feature vector with a length of 8 based on retinal thickness and cyst space cavity to guide the classifier to learn proper weights for judging the disease category. A similar idea was also designed by Khalid et al. [11], where the difference was that 9 extracted features and more testing samples were adopted to train the classifier for making more accurate judgments on the type of retinal diseases. In these schemes, the feature descriptors of the CSCR lesion area are firstly established by applying the feature engineering technique, and then the classifier is trained by using the feature vectors to construct the lesion detection model. Since such schemes require a detailed digital description of the lesions, professional cognition of the characteristics of the lesions is crucial.
Besides, the fluid segmentation of other fundus diseases also provides a reference for the area quantification of CSCR lesions [12][13][14][15][16][17]. To detect three-dimensional retinal fluid (i.e., symptomatic exudate-associated derangements), Xu et al. [12] developed a novel voxel classification-based approach using a layer-dependent stratified sampling strategy, and this approach performed well in dealing with the class imbalance issue. By combining the squeeze-and-excitation blocks and the U-shape network, Chen et al. [13] put forward a structure called SEUNet to segment fluid regions in age-related macular degeneration (AMD) and supplied an effective method for fluid segmentation. Based on graph shortest paths and neutrosophic transformation, a fully-automated segmentation method was designed for the accurate segmentation of diabetic macular edema (DME) biomarkers so as to provide a quantitative measure for DME diagnosis [14]. Alsaih et al. [15] employed four wide-spread deep learning models for the segmentation of three retinal fluids in AMD and explored how the patch-based technique pushes the performance of deep learning-based models, which was conducive to the improvement of such schemes. Lu et al. [16] presented a deep learning-based method for segmenting multiclass retinal fluids. Different from the common deep learning schemes, this method introduced the random forest classifier in postprocessing to reduce the over-segmentation problem of the independent network model. Hassan et al. [17] also constructed a deep learning-based segmentation network integrating the atrous spatial pyramid pooling module, residual module, and inception module to segment multiclass retinal fluids and brought a considerable gain in efficiency.
The previous direct segmentation pipelines and indirect detection methods undoubtedly enrich the research ideas for the automatic quantification of the CSCR lesion area, which is of great significance for the precise treatment of this eye disease. Nevertheless, the tediousness of the pixel-level annotation task in deep learning-based segmentation methods and their potential defect of insufficient generalization ability, the strong dependence of feature engineering on professional experience in classical machine learning-based detection, and the low accuracy and weak adaptability of traditional image processing-based segmentation may restrict the wide application of the previous methods in the quantification of the CSCR lesion area to a certain extent. It has to be said that the lesion diameter measurement scheme [18] based on locating key boundary points does appropriately alleviate the previous situations, but the gradient-based correction module (GBCM) in this scheme relies on setting appropriate threshold parameters and is sensitive to the position of the starting point provided by ILM.
Considering the limitations and advantages of the previous methods, as well as the challenges of diameter measurement caused by the diameter differences of CSCR lesion areas in different frames (as shown in Figure 2), this paper constructs an automatic image processing method (i.e., a joint framework) based on artificial intelligence for rapidly and accurately measuring the diameter of the CSCR lesion area from the perspective of locating key boundary points in the CSCR lesion area. The details are as follows: (1) In the first step, the multitask learning-based ILM is appropriately adjusted and used for rapid location of key boundary points in the CSCR lesion area, laying the foundation for subsequent accuracy improvement. (2) In the second step, the location task is described as a Markov decision process (MDP), in which the single agent aims to explore and lock the key boundary points in the CSCR lesion area through continuous interaction with the image environment. (3) In the third step, the joint framework based on ILM and SARLM is skillfully designed to make up for the defect of low generalization of ILM by employing the unique exploration ability of the agent in SARLM and to narrow the active region of the single agent by providing the initial starting point for SARLM through the ILM. (4) In the fourth step, extensive and in-depth experiments are carefully carried out to prove and analyze the effectiveness and feasibility of the joint framework in the key boundary point location task of the CSCR lesion area and its application effect in lesion diameter measurement.
The structure of the remaining part is as follows: Section 2 describes the related works of multitask learning and single agent reinforcement learning. Section 3 explains the implementation details of our proposed method. Section 4 shows the results and discussions. Section 5 concludes the research work.

Materials.
The CSCR source images used in the experiments are provided by the cooperative eye hospitals, and the patients' privacy information has been carefully desensitized. The annotation task of all the CSCR B-scan images is jointly completed and reviewed by professional ophthalmologists and relevant academic personnel. After the conventional data augmentation operations, the number of image and annotation pairs in the dataset used for training reaches 3240. Additionally, to evaluate the effect of the joint framework on the testing dataset, a total of 25 patient-level data, including 912 OCT B-scan images, are introduced into this process.

Multi-Task Learning. As one of the artificial intelligence technologies, multitask learning [19] is a learning paradigm that improves the generalization ability of a convolutional neural network model by using the domain information contained in the training signals of related tasks as an inductive bias, and it has been extensively applied in downstream tasks such as object detection, target classification, and semantic segmentation [20][21][22]. Meanwhile, this paradigm also shines brightly in various medical image processing tasks [18,[23][24][25]. For assisting the diameter measurement of the CSCR lesion area, the multitask learning paradigm was introduced into the key boundary point location task for the first time [18], enabling the rapid locking of the relevant coordinates. To obtain a robust retinal disease grading model, Ju et al. [23] extracted additional monitoring signals from various sources by using multitask learning and achieved a significant improvement. A new canonical correlation analysis model [24] combining biologically meaningful structures with the multitask learning framework was designed to mine the shared representations in multimodal data, which experimentally demonstrated the potential of multitask learning. Additionally, this paradigm also performed well in improving the accuracy of glaucoma diagnosis [25]. By sharing most of the parameters of the segmentation layers and classification layers, the feature representation ability of the model for a given task is enhanced, and then a win-win situation is achieved.

Reinforcement Learning. As a unique machine learning method to realize artificial intelligence, the reinforcement learning (RL) model has emerged in various scenes with its unique operating principle, in which the artificial agent obtains rewards and punishments through continuous interaction with the environment [26] and then learns the optimal strategy for a given task. In particular, RL has shown satisfactory performance in various tasks in the field of medical image processing, such as registration, classification, and segmentation. In the registration task [27,28], instead of directly optimizing an image matching metric, the goal of the artificial agent was to find the best sequence of motion actions to achieve the best alignment between images. In the classification task [29], the agent cropped the appropriate patch on the original image through a hard attention mechanism and updated the cropping strategy with the feedback of the classification network, so as to achieve better classification accuracy for breast cancer. In the segmentation task [30], the process of lymph node segmentation was completed by the interaction of two networks, where the decision network provided the target bounding box for the segmentation network, and the output of the segmentation network guided the policy network to make better strategies. Moreover, RL has also been applied and performed well in landmark detection [31][32][33][34]. Different from the traditional machine learning schemes, in this kind of application, the object appearance and parameter search strategy are unified into one framework, in which the behaviour strategy of the agent and the effective object feature representation are jointly learned to better achieve the given task. The previous research explored and confirmed the feasibility and effectiveness of applying multitask learning and RL in the corresponding scenes and also inspired our research ideas in this paper.
The specific details will be shown in the following sections.

The Proposed Method
Figure 3 shows the overall flow chart of the scheme proposed in this paper, including the image preprocessing module (IPM), ILM, and SARLM. Firstly, IPM is employed to provide datasets for the subsequent training and testing steps of ILM and SARLM. Secondly, ILM and SARLM are independently trained on the training dataset. Then, the testing images are input into the trained ILM model to obtain the preliminary results of the key boundary points. Finally, the testing images and the corresponding location results are sent to SARLM to get the final results.

Motivation.
Through the previous brief analysis, it can be clearly found that both multitask learning and reinforcement learning have achieved a wide layout in various visual tasks and obtained gratifying results. In the previous applications, the multitask learning paradigm does improve the adaptability of the deep learning model to a certain extent. Nevertheless, the paradigm usually works independently in the downstream tasks, so the generalization of the multitask model is still affected by factors such as the volume of data and the network structure. Although the RL model performs well in different scenarios, agents usually regard the global region of the input image as the interactive environment, which is bound to lead to a significant increase in the time cost and computing power required to complete the task. This prompts a question for our research in this paper: can these two learning paradigms be integrated to alleviate the above issues? On the one hand, ILM can be used to realize the preliminary and rapid location of key boundary points to provide the initial starting point for the RL model, which in turn reduces the active region of the artificial agent. On the other hand, based on the unique exploration ability of the agent in the RL model, the position of the key boundary points can be further adjusted in a local range on the basis of the initial location results of ILM. The specific implementation route and experimental results will be detailed in the following sections. It is the successful application and surprising achievements of these two learning paradigms in various visual tasks that encourage us to make further attempts in this key boundary point location task.

The Preprocessing Step. Due to equipment and human factors, the quality and size of medical images initially obtained from the clinic are often unable to directly adapt to the downstream tasks, so the image preprocessing operation is particularly critical. In this paper, each source image acquired from the clinic is a composite of a scanning laser ophthalmoscope (SLO) part and an OCT B-scan part, which cannot be directly applied to the key boundary point location task. In view of this, we use the separation operation designed in our previous work [18] to separate these two parts. In addition, considering that the image size and speckle noise may interfere with the performance of both ILM and SARLM, a clipping operation and a BM3D (block-matching and 3D filtering [35])-based denoising operation are then applied to the OCT B-scan images. The overall process of IPM is shown in Figure 4. After the image preprocessing, the size and quality of the OCT B-scan images have been improved, followed by the image annotating step, which is completed by professional ophthalmologists. Finally, the OCT B-scan image dataset used to locate the key boundary points in the CSCR lesion area is established, which paves the way for the follow-up work.
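As a rough illustration, the preprocessing chain above (separation, clipping, denoising) can be sketched as follows. The function names, the fixed center crop, and the simple mean filter standing in for BM3D are all assumptions for illustration; a real pipeline would substitute an actual BM3D implementation (e.g., an external `bm3d` package) in the denoising step.

```python
# Hypothetical sketch of the image preprocessing module (IPM).
# Images are represented as plain lists of pixel rows for simplicity.

def separate_parts(composite, slo_width):
    """Split a composite clinical image into the SLO part (left columns)
    and the OCT B-scan part (remaining right columns)."""
    slo = [row[:slo_width] for row in composite]
    oct_bscan = [row[slo_width:] for row in composite]
    return slo, oct_bscan

def crop_center(image, out_h, out_w):
    """Clip the OCT B-scan to a fixed size around its center."""
    h, w = len(image), len(image[0])
    top, left = (h - out_h) // 2, (w - out_w) // 2
    return [row[left:left + out_w] for row in image[top:top + out_h]]

def denoise_stub(image):
    """Placeholder for BM3D [35]: a simple 3-tap horizontal mean filter.
    The actual pipeline would call a real BM3D routine here."""
    out = []
    for row in image:
        n = len(row)
        out.append([sum(row[max(0, j - 1):min(n, j + 2)]) /
                    len(row[max(0, j - 1):min(n, j + 2)]) for j in range(n)])
    return out

def preprocess(composite, slo_width, out_h, out_w):
    """Separation -> clipping -> denoising, as in Figure 4."""
    _, oct_bscan = separate_parts(composite, slo_width)
    return denoise_stub(crop_center(oct_bscan, out_h, out_w))
```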

The Joint Framework
3.3.1. ILM. Inspired by the excellent performance of the multitask model in face key point detection [36], we adjusted the architecture appropriately for the first time and introduced it into the key boundary point location scene in the CSCR lesion area [18], realizing another application test of this paradigm. In this paper, ILM continues to serve the task of initial location of key boundary points, and its specific composition is shown in Figure 5. The residual network [37] and the MobileNet [38,39] network are employed here as the CNN backbones to mine the background and nonbackground information contained in the OCT B-scan images, enabling the feature representation of CSCR lesion

area. The two MobileNet variants [38, 39] are regarded as two kinds of backbone networks. Taking into account the network parameters and experimental conditions, resnet18 (i.e., R18), resnet34 (i.e., R34), and resnet50 (i.e., R50) are selected as another three kinds of backbone networks. The previous five backbone network structures are shown in Table 1. In addition, considering the capacity limitation of this paper, the FPN module [40], context module, and multitask loss module [41] will not be described further.

SARLM. As previously analyzed, RL has been popularized in various visual tasks; in particular, its successful application in landmark detection promotes the proposal of our scheme. It should be noted that, considering the distribution characteristics of key boundary points in this task and the time cost of agent interaction with the environment, this paper establishes SARLM to deal with the location task based on a single agent. The overall framework of SARLM is shown in Figure 6(a). Since the key boundary points are located on both sides of the CSCR lesion area, the SARLMs based on the left agent and the right agent are designed, respectively. Although the structure of the two SARLMs is the same, the training process is carried out separately. Unlike the traditional machine learning scheme, the training samples required by SARLM are obtained through the continuous interaction between the agent and the environment and are stored in the experience memory. The terms involved in the process are as follows:

(i) State: This term describes the surrounding environment including the location of the agent, which is mainly divided into the current state and the next state. In this task, in order to improve the operation efficiency of the agent, based on the initial location point provided by ILM, we first limit the active region of the agent to the purple square box (as shown in Figure 6(a)) with a size of 80. Then, with the location of the agent as the center, a square region with a size of 32 is cropped on the B-scan image as the state.
(ii) Action: This term refers to the moving direction of the agent in the environment, which is used to realize the interaction between the agent and the environment. In this paper, we set up four discrete actions, namely, up, down, left, and right, to control the agent to move in the corresponding direction with a step of one pixel, so as to achieve its exploration of the environment.

(iii) Reward: This term denotes the feedback of an agent after taking an action, aiming at evaluating whether the current action is conducive to the agent achieving the given task. In the task of locating the key boundary points in the CSCR lesion area, the difference of the Euclidean distances between the agent and the target point before and after the action is regarded as the reward. In addition, in order to avoid an excessive Q value and obtain a good conditional gradient, the reward is clipped between −1 and 1 according to common practice. The reward function is defined as follows:

r = clip(D(p_t, p_g) − D(p_{t+1}, p_g), −1, 1),

where D(·, ·) denotes the Euclidean distance, p_t and p_{t+1} are the positions of the agent before and after the action, and p_g is the target point.

(iv) Policy: This term is a mechanism to determine the behavior of an agent. It is a mapping from the current state of the agent to the corresponding behavior taken by the agent. It defines the possible behaviors and corresponding probabilities of the agent in each state. In the key boundary point location task, the policy is the behavior selection mechanism that enables the agent to reach the key boundary point of the CSCR lesion by a series of optimal actions. In the process of taking the optimal actions, the agent obtains the maximum cumulative reward.

(v) Termination: This term is used to define the stopping rules of agents in the training or testing stages, so as to prevent the agents from exploring and exploiting in the environment indefinitely. In this paper, in the training stage, we define the termination flag as true when the Euclidean distance between the agent and the target point is less than or equal to one pixel.
Together with the iteration cap, the training-stage termination rule can be written as

Flag_Train = True, if D(p_t, p_g) ≤ 1 or n ≥ N_Train; False, else,

where N_Train denotes the maximum iteration value to limit the number of times the agent implements the target point location operation in the environment during the training stage, and its value is empirically set to 100. In the testing stage, because the ground truth of the target point cannot be provided, we design the following termination rule according to the experimental observation:

Flag_Test = True, if (δ_Q ≤ T_r and q ≥ T_q) or n ≥ N_Test; False, else,

where δ_Q represents the difference between the average value of the first eight elements in the last 16 Q values and the average value of the last eight elements in the last 16 Q values. T_r is the threshold, which is set to 0.3. q is used to confirm whether the agent has converged to the target point, and its corresponding threshold T_q is set to 2 in this paper according to the time cost and experimental observation. N_Test denotes the maximum iteration value to limit the number of times the agent implements the target point location operation in the environment during the testing stage, and its value is empirically set to 60. After the previous brief introduction and analysis, the typical reinforcement learning paradigm (i.e., Q-learning) [42] is followed. In various RL scenarios, appropriate actions are the core of achieving effective and continuous interaction between the agent and the environment, and the optimization process can be completed based on the state-action value function Q(s, a) [43]. By solving the following formula, the corresponding Q value can be obtained after implementing the corresponding action in each state, and then the best action can be selected depending on the highest long-term Q value:

Q_{i+1}(s, a) = r + γ max_{a′} Q_i(s′, a′),
where Q_{i+1} and Q_i represent the Q values at steps i + 1 and i, respectively. s and s′ denote the current state and the next state, respectively. Correspondingly, a and a′ are the current action and the next action, respectively. γ is the discount factor and is set to 0.95 in this paper. However, when there are too many state-action pairs in Q-learning, the Q-table becomes impractical to store and update, so the deep Q-network (DQN) is adopted to approximate the state-action value function, and its loss function is

L(θ) = E[(r + γ max_{a′} Q(s′, a′; θ′) − Q(s, a; θ))²],

where θ′ and θ are the parameters of the target Q-network and the current Q-network, respectively. In this paper, the DQN structure is designed as shown in Figure 6(b). The input size of the network is 32 × 32 × 4. During the training or testing stage, the input image is composed of four cropped single-channel patches with the size of 32 × 32. After a series of convolutional layers, pooling layers, fully-connected layers, and various activation functions, a four-dimensional Q-value vector can be obtained, and then the corresponding action can be selected for the agent to interact with the environment.
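To make the interaction loop concrete, the following minimal sketch implements the one-pixel actions, the clipped distance-based reward, the training-stage stopping distance, and the bootstrapped DQN target. All function and variable names here are our own assumptions, and the trained Q-network is abstracted away as a plain list of Q values for the next state.

```python
import math

# Illustrative SARLM interaction step; positions are (x, y) pixel tuples.
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def distance(p, q):
    """Euclidean distance between two pixel positions."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def step(agent_pos, action, target):
    """Move one pixel, compute the clipped reward
    r = clip(d_before - d_after, -1, 1), and check the
    training-stage termination condition d <= 1."""
    dx, dy = ACTIONS[action]
    new_pos = (agent_pos[0] + dx, agent_pos[1] + dy)
    reward = distance(agent_pos, target) - distance(new_pos, target)
    reward = max(-1.0, min(1.0, reward))  # clip to [-1, 1]
    done = distance(new_pos, target) <= 1.0
    return new_pos, reward, done

def dqn_target(reward, next_q_values, gamma=0.95, done=False):
    """Bootstrapped target y = r + gamma * max_a' Q(s', a'; theta'),
    where next_q_values come from the frozen target network."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

With the agent at (10, 10) and the target at (10, 8), the action "up" moves it to (10, 9), yields reward 1.0, and triggers the one-pixel stopping condition.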

ILM-SARLM. On the basis of the previous research, the joint framework shown in Figure 7 is finally established, mainly consisting of three parts, namely, the ILM part, the SARLM part, and the joint location part. The execution contents of each part are as follows: ① ILM is firstly trained based on all the OCT B-scan images in the training dataset. Since five CNN backbone networks are applied in the ILM framework, the training process needs to be repeated five times. Then, offline testing models corresponding to the various backbone networks can be acquired. ② It should be pointed out that the location models of the key boundary points on the two sides of the CSCR lesion area are trained independently. Although the network structure on both sides is the same, the weight parameters are not shared between them. Under the previous premise, the SARLMs are trained based on some OCT B-scan images in the training dataset to obtain independent DQN models for the action selection of the left and right agents, respectively. ③ On the basis of ① and ②, the location task of the key boundary points is achieved through a cascade operation. Specifically, the testing image is firstly sent to the offline testing model to obtain the initial coordinates of the key boundary points on both sides. Then, the testing image and its corresponding initial location results are sent to the DQN models, and the agents further optimize the initial points of ILM in the purple square active regions (as shown in Figure 7), which are delimited according to the initial points on both sides.
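The cascade in ③ can be sketched as below, with stubs standing in for the trained ILM and DQN models; all names and the toy policy are illustrative assumptions, while the 80-pixel active box and the 60-step testing cap follow the text.

```python
# Minimal sketch of the ILM-SARLM cascade for one key boundary point.

def clamp_to_region(pos, center, half_size=40):
    """Keep the agent inside the 80x80 active box centred on the ILM point."""
    x = min(max(pos[0], center[0] - half_size), center[0] + half_size)
    y = min(max(pos[1], center[1] - half_size), center[1] + half_size)
    return (x, y)

def refine(initial_point, select_action, apply_action, max_steps=60):
    """Start from the ILM point; let the agent move until its policy
    signals termination or the N_Test iteration cap (60) is reached."""
    pos = initial_point
    for _ in range(max_steps):
        action = select_action(pos)
        if action is None:  # termination rule satisfied
            break
        pos = clamp_to_region(apply_action(pos, action), initial_point)
    return pos

# Toy stand-ins for the trained DQN policy: walk right until x reaches 105.
moves = {"right": (1, 0)}
policy = lambda p: "right" if p[0] < 105 else None
apply_a = lambda p, a: (p[0] + moves[a][0], p[1] + moves[a][1])
final = refine((100, 100), policy, apply_a)  # ILM point is (100, 100)
```

The clamping step is what realizes the narrowing of the agent's active region by the ILM starting point.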

The Backbone Networks and Algorithm Steps. This section shows the structures of the five CNN backbone networks used in ILM, and the specific details are shown in Table 1. In addition, the algorithm steps of ILM are detailed in [18], and the algorithm steps of the SARLM and joint location parts are shown in Table 2.

Experimental Settings
In order to verify the feasibility and effectiveness of the proposed scheme, this paper has conducted in-depth and extensive experiments. The training dataset, testing dataset, parameter settings and equipment conditions, and evaluation metric involved in the experiments will be briefly described.

Parameter Settings and Equipment Conditions. Both ILM and SARLM are developed on the TensorFlow framework, and the computing power involved in the training and testing of these models is mainly supplied by an NVIDIA 3080ti GPU. For ILM, the epoch, learning rate, batch size, and optimizer are set to 40, 0.001, 20, and Adam, respectively; [18] can be referred to for other settings. For SARLM, the epoch, learning rate, batch size, max episode, update frequency, sample step, and optimizer are set to 80, 0.0001, 32, 25, 50, 5, and Adam, respectively.
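For reference, the reported settings can be collected into plain configuration dictionaries of the kind typically used to drive training scripts; the key names are our own, and only the values come from the text.

```python
# Hyperparameters reported above, gathered into dictionaries.
ILM_CONFIG = {
    "epochs": 40,
    "learning_rate": 1e-3,
    "batch_size": 20,
    "optimizer": "Adam",
}

SARLM_CONFIG = {
    "epochs": 80,
    "learning_rate": 1e-4,
    "batch_size": 32,
    "max_episode": 25,
    "update_frequency": 50,   # steps between target-network syncs (assumed meaning)
    "sample_step": 5,
    "optimizer": "Adam",
    "discount_factor": 0.95,  # gamma from the Q-learning update
}
```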

Evaluation Metric.
In order to quantify the location accuracy of the key boundary points in the CSCR lesion area and thereby evaluate the performance of the proposed scheme, this paper adopts an average Euclidean distance (AED) evaluation metric, which is expressed as follows:

AED_L = (1/N_T) Σ_{i=1}^{N_T} ‖LG_i‖_2 for the left key point,

AED_R = (1/N_T) Σ_{i=1}^{N_T} ‖RG_i‖_2 for the right key point,

where LG and RG are the vectors determined by the coordinate of the left key point (i.e., (x_L^new, y_L^new)) and its corresponding ground truth value (i.e., (x_L^truth, y_L^truth)), and by the coordinate of the right key point (i.e., (x_R^new, y_R^new)) and its corresponding ground truth value (i.e., (x_R^truth, y_R^truth)), respectively. ‖·‖_2 represents the 2-norm used to perform the calculation of the Euclidean distance, and N_T denotes the number of OCT B-scan images included in a single patient's data. The diameter measurement of the CSCR lesion area is also based on the 2-norm, and the specific equation is as follows:

Diameter = ‖LR‖_2,

where LR is the vector determined by the left and right key boundary points.
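A direct implementation of the AED metric and the diameter measurement might look as follows; the function and variable names are assumptions.

```python
import math

def aed(predicted, ground_truth):
    """Average Euclidean distance over the N_T B-scans of one patient.
    `predicted` and `ground_truth` are equal-length lists of (x, y) points."""
    n_t = len(predicted)
    total = sum(math.hypot(px - gx, py - gy)
                for (px, py), (gx, gy) in zip(predicted, ground_truth))
    return total / n_t

def lesion_diameter(left_point, right_point):
    """Diameter of the CSCR lesion: the 2-norm of the left-right vector LR."""
    return math.hypot(left_point[0] - right_point[0],
                      left_point[1] - right_point[1])
```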

Results and Discussions
This section analyzes and discusses the experiments. Before that, we explain the terms involved in this process. It should be pointed out that, in order to verify the effectiveness and feasibility of the joint framework proposed in this paper, not only is the pure DQN-based SARLM combined with each kind of ILM, but the DDQN-based [44,45], Duel DQN (DuelDQN)-based [46], and Duel DDQN (DuelDDQN)-based SARLMs are also introduced into the joint framework. As mentioned earlier, the SARLM models for locating the key boundary points on the two sides are independent of each other, and their training processes are carried out separately. In this way, the left key point location model based on DQN is named DQN_LP, and the corresponding location model for the right key point is named DQN_RP. According to the same rule, DDQN_LP, DDQN_RP, DuelDQN_LP, DuelDQN_RP, DuelDDQN_LP, and DuelDDQN_RP can be obtained, respectively. Moreover, the different ILMs are named according to the names of the CNN backbone networks in Section 3.3.1. Taking the R18 backbone network and DQN as an example, the pure ILM is named R18-Base, and the corresponding joint framework is called R18-DQN. The names of the other joint frameworks also comply with this rule.

Convergence Observation of ILMs and SARLMs on the Training Dataset.
Properly judging whether a model has converged during training is an indispensable step preceding performance testing and actual deployment. As shown in Figure 8(a), each ILM converges from a large loss value to a small one after 6480 iterations, revealing that the weight parameters of the model can fit a nonlinear function well enough to handle the task given in this paper. In addition, the loss curves of the individual tasks in ILM progress along almost the same trend, and the differences between the losses are very small, which provides an important reference for setting the proportion parameter of each task term in the total loss. For SARLM, each model was trained for 80 epochs on the training dataset of 30 OCT B-scan images, and the total training time of the eight models was about 240 hours, that is, 10 days. In order to boost the adaptability of SARLM to the initial position during the testing stage, the initial coordinate of the agent is randomized in each training episode, based on the coordinate ground truth of each key boundary point and a margin randomly selected from [−4, −6, −8, −10, −12, −14, 4, 6, 8, 10, 12, 14]. It can be clearly observed from Figure 8(b) that, compared with the initial period, the reward value of each SARLM finally stabilizes within a certain range in the later training period, which implies that, through continuous exploration and exploitation, the agent gradually learns how to formulate appropriate behaviour strategies according to its environment to accomplish the location task of the key boundary points in the CSCR lesion area. These observations preliminarily verify the feasibility of ILMs and SARLMs on the training dataset, laying a foundation for the subsequent analysis.
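The episode-wise randomization of the agent's starting coordinate can be sketched as follows; drawing a margin independently for each axis is our assumption, and the function name is illustrative.

```python
import random

# Margin set quoted in the text (in pixels).
MARGINS = [-4, -6, -8, -10, -12, -14, 4, 6, 8, 10, 12, 14]

def random_start(gt_x, gt_y, rng=random):
    """Offset the ground-truth key-point coordinate by a randomly
    chosen margin so that each training episode starts the agent
    from a different position near the target."""
    return gt_x + rng.choice(MARGINS), gt_y + rng.choice(MARGINS)
```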

Performance Analysis of SARLMs on the Validation Dataset.
In order to select the appropriate SARLM for later application on the testing dataset, we conducted relevant experiments on the validation dataset consisting of 10 OCT B-scan images. Specifically, in order to check the location performance of SARLM under a random starting point in the local region of the CSCR lesion, we designed 16 random initialization starting points for both the left and right agents, based on the coordinate ground truth of the key boundary points and their corresponding margins in the X-axis and Y-axis directions (i.e., [−4, −14, 4, 14] for the X-axis direction and [−4, −14, 4, 14] for the Y-axis direction). Under these settings, the AED curves of the various SARLMs over 80 epochs are obtained, as shown in Figure 9. On the whole, in the later stage of training, the AED value of each SARLM on the validation dataset decreases compared with the initial stage, which indicates that the ability of the agent to locate the key point from a random initial position improves appropriately as the number of learning iterations increases, corresponding to the hint given by the reward convergence curves of the SARLMs on the training dataset. Moreover, the overall trends of the AED curves of the SARLMs for the left and right key points in the same environment are similar, indicating that even though the local regions of the key points on the two sides differ, the strategy of training the two SARLMs independently allows the agents to adapt properly to this change.
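The 16 validation starting points can be enumerated as the Cartesian product of the X-axis and Y-axis margin sets; the function name in this small sketch is our own. Note that the farthest start lies $\sqrt{14^2 + 14^2} \approx 19.8$ pixels from the ground truth, consistent with the "about 19 pixels" maximum distance mentioned later in this section.

```python
from itertools import product

X_MARGINS = [-4, -14, 4, 14]
Y_MARGINS = [-4, -14, 4, 14]

def validation_starts(gt_x, gt_y):
    """All 16 start positions for one key point: each combination of
    one X-axis margin and one Y-axis margin around the ground truth."""
    return [(gt_x + dx, gt_y + dy) for dx, dy in product(X_MARGINS, Y_MARGINS)]
```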
The Q value is an important indicator of the closeness of an agent to its target point and is the key basis for terminating the interaction between the agent and the environment. As pointed out in [29], when the agent approaches the target point, the Q value is relatively small; otherwise, it is relatively large. This view is further confirmed by the experimental results in the task of this paper. As shown in Figure 10, each best SARLM (selected based on the minimum AED) can almost always converge from a large Q value at the initial stage to a small Q value at the final stage, which means that the agent performs well on the validation dataset after continuous learning. In some cases, when the initial position of the agent is set close to the target point, the Q value changes little before and after convergence, suggesting that the agent judges it is close to the target point and falls into a local optimum in advance. This situation is one we strive to overcome in the future.
Nevertheless, it is gratifying that, when the initial position of the agent is far away from the target point, each kind of SARLM can stabilize at a small Q value after the iteration stops, which shows that even if the maximum distance between the agent and the target point in the local region is about 19 pixels, the agent can finally converge near the target point. In fact, the initial location error of ILM is usually less than this value (as shown in Table 3), which proves that it is feasible to use ILM to narrow the active region of the agent for SARLM. Corresponding to Figure 10, Figure 11 shows the visual performance of each SARLM in the key boundary point location task under the 16 initial positions, in which the silver dotted line represents the final position of the agent corresponding to each initial position. What can be clearly captured is that the agent finally converges near the target point from the different initial positions. Although the initial positions differ greatly, the final positions of the agent are very close, and in some cases the agent even locks onto the same final point (such as DQN_RP, DuelDDQN_LP, and DuelDDQN_RP in Figure 11(a)). The previous analysis not only shows the good location ability of each SARLM on the validation dataset but also implies its low sensitivity to the initial position in the local region, which plays an important role in promoting the proposal of the joint framework.

Performance of ILMs and ILM-SARLMs on the Testing Dataset.
Based on the previous results and analysis, this section formally investigates the performance of the joint framework constructed from multitask learning and single-agent reinforcement learning on the testing dataset, including qualitative analysis, quantitative analysis, and efficiency analysis, so as to explore the value and potential of the framework in practical application scenarios and the corresponding details still to be improved.

Qualitative Analysis.
Figures 12 and 13 show, respectively, the Q value convergence and the practical application effect of the joint frameworks composed of different ILMs and SARLMs in the key boundary point location task at the image level. The legend on the right in Figure 12 indicates the scanning numbers of the B-scan images in the patient-level data. In general, each joint framework achieves convergence within a certain number of iterations, that is, it obtains a small Q value in the later stage. However, similar to the observations on the validation dataset, the difference between the Q values of the agent before and after the key point location task is small on some B-scan images of the testing dataset. This reveals that the initial starting point provided by ILM for SARLM is already relatively close to the target point, making the agent wander around the target point all the time; the external manifestation is the periodic small-amplitude fluctuation of the Q value. As shown in Figure 13, the reason behind this phenomenon is further demonstrated by the actual location effect. It can be clearly captured that the initial starting point is sometimes indeed in the region near the target point, so the agent naturally interacts with the environment in a small range and obtains a small Q value, which echoes the observation in Figure 12.

In addition, the cyan line in the figure represents the distance between the final location point of the joint framework and the target point, while the silver dotted line denotes the distance between the starting point provided by ILM and the target point. It can be seen that, in most cases, the cyan line is shorter than the silver dotted line, which indicates that the joint framework has the ability to further optimize the positions of the initial key boundary points. Meanwhile, Figure 13 also visually shows the performance differences between the joint frameworks and conveys the level of the various ILMs in the initial location task, which is conducive to the selection and deployment of the joint framework in practical applications.
In addition to the previous qualitative analysis at the image level, we also verify the abilities of the different joint frameworks and their corresponding ILMs on patient-level data, so a patient-level discussion is also introduced here. Figure 14(a) clearly reveals the AED of each ILM and its corresponding joint framework on each patient-level dataset. It is obvious that, compared with the pure ILMs, all kinds of joint frameworks basically obtain lower AED values, further enriching the evidence for the effectiveness of the proposed model. Furthermore, the robustness of the various joint frameworks and their ILMs at the patient level has also been carefully considered. As shown in Figure 14(b), compared with each ILM, the box height of its corresponding joint framework is relatively small, showing that the performance of the joint framework fluctuates little across patient-level data and implying that the joint framework has good robustness. The cyan triangle in the figure denotes the average AED value of the ILM or joint framework; the lower values also prove that the joint frameworks possess a more prominent key point location ability as a whole. It can also be seen that there are significant differences in robustness between the joint frameworks, which, together with the information provided in Figure 13, provides a reference for model selection.
At the end of this section, the DQN-based SARLM is taken as an example to show the actual results of the joint frameworks under different ILMs in locating the key boundary points of the CSCR lesion. As shown in Figure 15, with the ground truth as a reference, it is easy to judge that the various joint frameworks achieve an effective re-optimization of the positions of the key boundary points on the basis of the initial results of ILM, which is consistent with the previous qualitative analysis based on the AED metric. Through the actual location results, the effectiveness and feasibility of the joint framework are demonstrated once again.

Quantitative Analysis.
This section further compares the performance of the joint framework and the corresponding ILM from a quantitative perspective. When the AED between the key boundary points provided by the joint framework and the corresponding target points is less than that between the key boundary points provided by the ILM and the same target points, the joint framework is considered to have successfully corrected the key boundary points on that B-scan image. As shown in Figure 16, at the image level, the successful correction rate of the key boundary points of each joint framework exceeds half of the total number of testing images. Besides, there are some differences in the successful correction rate between the joint frameworks based on the five kinds of ILMs. However, in view of the weak generalization ability of ILM itself and the performance differences between the ILMs, it cannot be directly concluded that a smaller value corresponds to a joint framework with weak correction ability, or that a larger value represents a strong one. The successful correction rate here mainly reflects that the joint framework can further optimize the locations of the key boundary points on the basis of the corresponding ILM; the comparison of performance differences between models can instead be based on the qualitative analysis in the previous section and the patient-level AED metric shown in Table 3. As shown in Table 3, the patient-level AEDs are counted. It can be clearly observed that each joint framework is significantly superior to the corresponding ILM in terms of this metric, obtaining an AED value as low as 3.61 pixels, with a maximum difference of 5.68 pixels between the two types of models, which quantitatively shows the advantages of the joint framework. Additionally, due to the introduction of SARLM, the size of the joint framework is larger than that of the corresponding ILM, but this margin has little impact on actual deployment.
In general, the previous quantitative analysis also confirms the positive role of the joint framework in the given task.
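The image-level successful correction rate described above can be sketched as follows; the per-image AED arrays and the function name are our own illustration.

```python
import numpy as np

def successful_correction_rate(aed_joint, aed_ilm):
    """Fraction of testing B-scans on which the joint framework's AED
    is strictly smaller than the pure ILM's, i.e. the key boundary
    points were successfully corrected."""
    aed_joint = np.asarray(aed_joint, dtype=float)
    aed_ilm = np.asarray(aed_ilm, dtype=float)
    return float(np.mean(aed_joint < aed_ilm))
```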

Efficiency Analysis.
This section focuses on the efficiency of the proposed joint framework and ILM. Time cost is the key reference for judging the efficiency with which a model completes the given task. In view of this, we recorded the time cost of the ILMs and the corresponding joint frameworks in the key boundary point location task in the CSCR lesion. Figure 17(a) shows the time consumption at the patient level, in which the time of ILM is significantly lower than that of the joint framework, with a minimum time consumption of only about 6.05 seconds. Furthermore, the image-level time consumption reflected in Figure 17(b) also displays the efficiency advantage of the various ILMs. This situation is mainly due to the fact that SARLM in the joint framework relies on continuous interaction between the agent and the environment, and such an iterative process naturally increases the time cost. This paper aims to compress the time expenditure of the joint framework from two perspectives. On the one hand, the active region of the agent is limited to a local scope through the initial key boundary point location function of ILM, so as to avoid the excessive time consumption caused by global activities of the agent. On the other hand, the termination rule is properly designed to stop the repeated interaction behaviour of the agent as soon as possible once it has approached the target point. The former undoubtedly improves the efficiency of the agent. As for the latter, although the termination rule based on the Q value and the maximum number of iterations designed in this paper can appropriately stop the repeated wandering behaviour of the agent in time, better termination hints still need to be explored, which is also an aspect we need to improve in the future.
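The two-part termination rule (a Q-value check plus an iteration budget) can be sketched on a toy one-dimensional environment as follows; the threshold, the budget, and the distance-like Q values of the toy agent are hypothetical illustrations, not the paper's actual settings.

```python
class ToyEnv:
    """Hypothetical 1-D environment with the target at x = 0.
    Action 0 moves the agent +1 pixel; action 1 moves it -1 pixel."""
    def __init__(self, start):
        self._start = start
    def start_position(self):
        return self._start
    def step(self, pos, action):
        return pos + (1 if action == 0 else -1)

class ToyAgent:
    """Toy agent whose Q value for each action is the distance to the
    target after taking that action (small near the target, as in the
    paper's observation)."""
    def q(self, pos):
        return [abs(pos + 1), abs(pos - 1)]

def locate(env, agent, max_iters=100, q_threshold=0.5):
    """Run the agent until its smallest Q value signals that it is
    close to the target, or until the iteration budget is spent."""
    pos = env.start_position()
    for _ in range(max_iters):
        q_values = agent.q(pos)
        if min(q_values) < q_threshold:   # Q-value-based stop
            break
        # Toy greedy policy: pick the action whose Q value (here, the
        # distance to the target after the move) is smallest.
        pos = env.step(pos, q_values.index(min(q_values)))
    return pos
```

In this toy setup an agent started 10 pixels from the target stops within one pixel of it; the iteration budget guards against the wandering behaviour discussed above when the Q-value check never fires.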

Discussion on the Effectiveness of SARLMs for ILM-GBCMs on the Testing Dataset.
The analysis and discussion in the previous sections show the superiority of the joint framework proposed in this paper over ILM. This section explores whether SARLM can be equally effective in further improving the performance of ILM-GBCM [18] through qualitative and quantitative experiments. As shown in Figure 18(a), the AED values of the joint frameworks at the patient level are lower than those of the corresponding ILM-GBCMs most of the time, indicating that the joint framework can further optimize the positions of the key boundary points after the GBCM correction. It can also be found from Figure 18(b) that, whether considering the average AED value (the cyan triangle in the figure) or the box height, all kinds of joint frameworks are significantly better than their corresponding ILM-GBCMs, not only confirming the role of the joint framework in improving the location accuracy of the key boundary points of ILM-GBCM but also revealing its good robustness in this task. Furthermore, quantitative analysis experiments were carried out accordingly. As shown in Figure 19, the maximum successful correction rate of the joint framework is 83.00%, which shows that it has the ability to further improve the location accuracy on the basis of the correction results. However, comparing Figures 16 and 19, it can be found that, relative to the successful correction rate on the location results of the pure ILM, the improvement brought by SARLM to this metric for ILM-GBCM is relatively low. This stems from the fact that GBCM has already corrected the initial location results of ILM to a certain extent, leaving SARLM less room to further optimize the location accuracy of the key boundary points. Concurrently, the patient-level AEDs of the two models are also recorded, as shown in Table 4.
Compared with ILM-GBCM, the AED values of the corresponding ILM-GBCM-SARLM are reduced, with a maximum difference of 3.64 pixels between them, which, together with the image-level successful correction rate, shows that SARLM also has a certain effect in boosting the ability of ILM-GBCM to locate the key boundary points in the CSCR lesion area.

Preliminary Application.
Extensive experiments and analysis have proved the superiority of the proposed joint framework over the corresponding ILM in the key boundary point location task in the CSCR lesion area. This superiority is embodied in the fact that the framework can make up for the weak generalization of ILM in this scene through the unique autonomous learning ability of the agent in the lesion environment, which paves the way for its preliminary application to the actual measurement of the diameter of CSCR lesions in this section. Based on formulas (9) and (10) and the coordinates of the key boundary points output by the joint framework, the diameters of the CSCR lesions at all scanning angles can be measured quickly, as shown in Figure 20. On this basis, it is convenient for ophthalmologists to review the diameter of the lesion at all scanning angles as well as to check the diameter at a certain scanning angle. Moreover, based on the diameter measurement results of the lesions in all B-scan images, the maximum, minimum, and average values can also be obtained, providing a quantitative reference for ophthalmologists to judge the severity of the CSCR and evaluate the efficacy of the corresponding treatment scheme.
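A minimal sketch of the per-patient diameter summary described above; the array shapes and names are our own illustration, and the real input would be the key-point coordinates output by the joint framework for each scanning angle.

```python
import numpy as np

def diameter_report(left_points, right_points):
    """Lesion diameter on every B-scan plus the maximum, minimum, and
    average values offered to the ophthalmologist as a reference.

    left_points, right_points: (N, 2) arrays of key-point coordinates,
    one row per B-scan (i.e. per scanning angle)."""
    left = np.asarray(left_points, dtype=float)
    right = np.asarray(right_points, dtype=float)
    diameters = np.linalg.norm(right - left, axis=1)   # 2-norm per scan
    return {
        "per_scan": diameters,
        "max": float(diameters.max()),
        "min": float(diameters.min()),
        "mean": float(diameters.mean()),
    }
```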

The Clinical Advantages of This Study in Ophthalmology.
By and large, in the previous sections, both the qualitative-quantitative analysis oriented to the evaluation of the location effect and the time-cost consideration oriented to the evaluation of efficiency provide strong support for the feasibility, effectiveness, and potential clinical application value of the proposed joint framework based on ILM and SARLM in the task of locating the key boundary points in the CSCR lesion area. In addition, the qualitative-quantitative experiments on the joint framework composed of SARLM and ILM-GBCM reveal that the introduction of SARLM can also contribute to the further optimization of the initial key boundary points provided by ILM-GBCM. On this basis, the preliminary application to the diameter measurement of the CSCR lesion area affirms the potential deployment value of the proposed joint framework from a practical point of view, further consolidating the significance of this study. It should be noted that, although the scheme proposed in this paper is aimed at the diameter measurement of the CSCR lesion area, the method is also instructive for the design of automatic diameter measurement schemes for the lesion areas in OCT B-scan images of other fundus diseases (such as diabetic macular edema, macular hole, and retinal angiomatous proliferation), and thus has potential guiding significance for more comprehensive monitoring of lesion morphology and for assisting ophthalmologists in a more objective assessment of patients' eye conditions.

Limitation Analysis and Future Work.
Although this study reveals good clinical application value, the inadequacies of the proposed scheme still need to be faced properly. Specifically, the joint framework may encounter the phenomenon of the agent wandering repeatedly around the target point while processing a given task, which is also the factor leading to its slightly lower efficiency. We hold that this phenomenon may be due to the fact that the surrounding environment becomes too similar as the agent approaches the target point during its interaction with the environment, so that the agent is unable to learn more effective behavior strategies and becomes trapped in a local area from which it cannot extricate itself. For this problem, we plan in the future to continue mining solutions from the design of the termination rule and the adjustment of the DQN framework structure, so as to further improve the performance of the joint framework.

Conclusions
An automatic image processing method (i.e., the joint framework) based on the multitask learning and single-agent reinforcement learning paradigms is constructed to achieve rapid and accurate location of the key boundary points in the CSCR lesion area, so as to facilitate the automatic diameter measurement of CSCR lesions. On the one hand, the adjustment and introduction of ILM initially realize the rapid locking of the key boundary points and effectively narrow the activity range of the agents, which helps to improve the location efficiency of SARLM. On the other hand, the unique exploration ability of the agent enables it to independently learn task-oriented behavior strategies, so that SARLM can better adapt to the differences of the CSCR lesion areas in different scanning frames and appropriately make up for the defect of low generalization of ILM. Extensive experiments have been carried out carefully, demonstrating the effectiveness and feasibility of the joint framework in improving the location performance of ILM. The preliminary test on the diameter measurement of the CSCR lesion further reveals the potential clinical application value of the proposed joint framework, which also has a certain reference significance for the design of diameter measurement schemes for lesions in other fundus diseases (diabetic macular edema, macular hole, and retinal angiomatous proliferation). Generally speaking, the method proposed in this paper is a further innovation based on our previous work from the perspective of the algorithm, and in the future, we will address the inadequacies of this scheme and improve it.

Data Availability
The CSCR dataset used and analyzed in our research is available from the corresponding authors upon reasonable request.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding this paper.