Interpretable and Expandable Deep Learning Diagnosis System for Multiple Ocular Diseases: Elaborately Simulating Doctors' Working

Kai Zhang1,2, Fan Liu3, Lin He1, Lei Zhang1, Yahan Yang2, Wangting Li2, Shuai Wang3, Lin Liu1, Zhenzhen Liu2, Xiaohang Wu2, Xiyang Liu1,3,4, Haotian Lin2§

1 School of Computer Science and Technology, Xidian University, Xi'an 710071, China
2 State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou 510060, China
3 School of Software, Xidian University, Xi'an 710071, China
4 Institute of Software Engineering, Xidian University, Xi'an 710071, China

§ Corresponding author:
Prof. Haotian Lin
Email: [email protected]
Address: Zhongshan Ophthalmic Center, Xian Lie South Road 54#, Guangzhou, China, 510060.
Telephone: +86-020-87330493; Fax: +86-020-87333271.

ABSTRACT


Background: Although artificial intelligence performs promisingly in medicine, there are few automatic disease diagnosis platforms that can clearly explain why a specific medical decision is made.

Objective: The goal of this study is to devise and develop an interpretable and expandable diagnosis framework that automatically diagnoses multiple ocular diseases and provides treatment recommendations tailored to the particular illness of a specific patient. The system simulates the complete diagnostic process of experienced doctors so that the diagnostic process can be tracked and understood.

Methods: The diagnosis of ocular diseases is highly dependent on observing medical images; therefore, this research chose ophthalmic images as its material. All medical images were labelled as one of four types of disease or as normal (five classes in total); each image was then decomposed into different parts according to anatomical knowledge and annotated. This process yields the positions of, and primary information on, the different anatomical parts and foci observed in the medical images, so the gap between a medical image and the diagnostic process can be bridged. The diagnosis of further diseases can then rely on combinations of different anatomical parts and foci. We applied the images and the information produced during the annotation process to implement an interpretable and expandable automatic diagnosis framework with deep learning.

Results: The diagnosis framework consists of four stages. The first stage identifies the type of disease with an accuracy of 93%. The second stage localizes the anatomical parts and foci of the eye; the localization accuracy reaches 82% for images under natural light without fluorescein sodium eye drops and 90% for images under cobalt blue light or natural light with fluorescein sodium eye drops. The third stage classifies the specific condition of each anatomical part or focus using the results of the second stage; the average accuracy for the multiple classification problems in this stage reaches 79%-98%. The last stage provides treatment advice according to medical experience and artificial intelligence; it currently covers only pterygium, and its accuracy surpasses 95%. On this basis, we developed a telemedical system that can show doctors and patients the detailed reasons for a particular diagnosis and thereby support medical decision making. This system can carefully analyze medical images and provide treatment advice according to the analysis results and the consultation between doctor and patient.

Conclusions: The interpretable and expandable medical artificial intelligence platform was successfully built. The system can identify the disease, distinguish different anatomical parts and foci, discern the diagnostic information that is relevant to the diagnosis of diseases and provide treatment suggestions. During this process, the whole diagnostic flow becomes clear and understandable to both doctors and their patients. Moreover, other diseases can be seamlessly integrated into the system without any influence on existing modules or diseases. Furthermore, this framework can assist in the clinical training of junior doctors. Because high-grade medical resources are rare, not everyone can receive high-quality professional diagnosis and treatment. This framework can be applied in hospitals with insufficient medical resources to decrease the pressure on experienced doctors, and can be deployed in remote areas to help doctors diagnose common ocular diseases.

Keywords: deep learning; object localization; multiple ocular diseases; interpretable and expandable diagnosis framework; making medical decisions

INTRODUCTION


Although there are many artificial intelligence-based automatic diagnostic platforms, the diagnostic results produced by such computer systems cannot be easily understood. Artificial intelligence that obtains a diagnostic result from a purely computational perspective cannot provide a reason, expressed in terms of clinical practice, for a given diagnosis. Some studies have attempted to make the conclusions obtained from artificial intelligence methods explainable: Raccuglia et al. used a decision tree to understand the classification results of a support vector machine; Hazlett et al. used a deep belief network, a reverse-trackable neural network, to find diagnostic evidence of autism; Bolei Zhou et al. used the output of the last fully connected layer of a convolutional neural network to infer which part of an image causes the final classification result, which also provides evidence for the classification; and Zeiler et al. used an occlusion test to study which parts of an image produce a given classification result. These studies made great achievements in explainable artificial intelligence, but readily explainable automatic diagnosis systems are still rare. The primary cause is that these explainable methods do not explain their results according to human thought patterns. Therefore, this research aims to make additional progress based on the previous studies.

There are many works on the automatic diagnosis of different types of diseases from medical imaging, but these works are isolated. They do not regard all diseases visible in a specific format of medical image from a unified perspective, which is common in natural image processing and in practical medical scenes. On the other hand, once all diseases are regarded in a unified way, it becomes easy to extend the system to other types of medical imaging or other diseases. The diagnosis of ophthalmic diseases is highly dependent on observing medical images, so the present work selected ophthalmic images representing multiple ocular diseases as its material and treated them from a consistent viewpoint. The unified automatic diagnostic procedure is a simulation of the working flow of doctors. An explainable artificial intelligence-based automatic diagnosis platform has many advantages. First, it can increase confidence in the diagnostic results. Second, it helps doctors refine their diagnostic thinking. Third, it helps medical students deepen their medical knowledge. Finally, it clears a path toward diagnosing a greater number of diseases from a unified perspective.

Moreover, doctors diagnose diseases by observing medical images, but doctors from the many specialties and subspecialties cannot individually tackle all diseases. If a patient suffers from more than one type of disease, the proposed system can tackle these diseases simultaneously. This work therefore aims to integrate the experience of doctors from many subspecialties to construct an all-round ophthalmologist.

To create an explainable automatic diagnosis system with artificial intelligence, we simulated the workflow of doctors so that the artificial intelligence follows the patterns of human thought. The current research aims to apply artificial intelligence techniques to fully simulate the diagnostic process of doctors so that the reasons for a given diagnosis can be illustrated directly to doctors and patients.

In the current research, we designed an interpretable and expandable framework for multiple ocular diseases. There are four stages in this diagnostic framework: primary classification of the disease, detection of the anatomical parts and foci, judging the conditions of the anatomical parts and foci, and providing treatment recommendations. The accuracies of these stages surpass 93%, 82%-87%, 79%-98% and 95%, respectively. This system is not only an interpretable diagnostic tool for doctors and patients, but it also facilitates the accumulation of medical knowledge by medical students. Moreover, the system can be enriched to cover more ophthalmic diseases, or diseases of other specialties, to provide more services following the workflow of doctors. Telemedicine can connect medical experts and patients at considerably low cost. This research develops an interpretable and expandable telemedical artificial intelligence diagnostic system, which can also effectively mitigate the undesirable situation in which high-quality medical resources are insufficient and unevenly distributed. Finally, the health level of people all over the world and the medical conditions of underdeveloped countries can be improved with the help of computer networks.

MATERIALS

Data are important for data-driven research. The dataset was examined by all the members of our team, and we developed several programs to facilitate the examination of the data. All images were collected at Zhongshan Ophthalmic Center, Sun Yat-sen University, which is the leading ophthalmic hospital in China. In order to simulate the experience and diagnostic process of doctors, all images were segmented into several parts according to anatomical knowledge or diagnostic experience and then annotated. Next, multiple attributes of each part were classified according to the actual state of that part (including foci). All of the relevant aspects of the data (the images, the coordinates of each part and the attribute information) were used to train the artificial intelligence system. This data preparation process not only helps simulate the diagnostic process of doctors, but also facilitates many follow-up studies, such as medical image segmentation, clinical experience mining and the integration of refined diagnosis of multiple diseases.
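As an illustration only, the annotation produced in this way can be thought of as one record per image that combines the class label, the bounding boxes of anatomical parts and foci, and the diagnostic attributes. The sketch below is a minimal, hypothetical Python representation of such a record; the field names are our own and are not taken from the actual annotation tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class PartBox:
    """Bounding box of one anatomical part or focus, in pixel coordinates."""
    name: str          # e.g. "pterygium", "cornea and iris zone with keratitis"
    x_min: int
    y_min: int
    x_max: int
    y_max: int

@dataclass
class AnnotatedImage:
    """One training record: image path, disease label, part boxes, attributes."""
    image_path: str
    disease: str                          # one of the five classes, e.g. "pterygium"
    parts: List[PartBox] = field(default_factory=list)
    attributes: Dict[str, str] = field(default_factory=dict)  # e.g. {"body_hypertrophic": "yes"}

# Hypothetical example record
record = AnnotatedImage(
    image_path="images/0001.jpg",
    disease="pterygium",
    parts=[PartBox("pterygium", 120, 80, 340, 260)],
    attributes={"body_hypertrophic": "yes", "progressive_period": "no"},
)
```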

Table 1. Detailed diagnostic information regarding the dataset. Each numbered item is one classification problem in stage 3; where known, the possible values of the diagnostic information are given in parentheses, and the type of image used is given for each disease.

Pterygium (images under natural light without fluorescein sodium eye drops):
1. Whether the body of the pterygium is hypertrophic (yes or no)
2. Whether it is a pseudopterygium (yes or no)
4. Whether the head of the pterygium is uplifted (yes or no)
6. Whether the head and body of the pterygium are hyperemic (yes or no)
7. Whether the pterygium is in the progressive period (yes or no)

Keratitis (images under cobalt blue light or natural light with fluorescein sodium eye drops):
8. Turbidity degree of the cornea
9. Whether the pupil zone is invaded by turbidity
Stage of keratitis (infiltration stage, ulcer stage, perforation stage or convalescence)
10. Corneal neovascularization (yes or no)
11. Whether the edge of the focus is clear
12. Condition of the illness based on dyeing (no dyeing, dot staining, sheet dyeing or dyeing with coloboma)

We collected 1513 images, which can be classified into 5 classes (normal, pterygium, keratitis, subconjunctival hemorrhage and cataract). The number of images in each class is listed in Fig. 1(a), and examples of the objects to be detected in the images are shown in Fig. 1(b)-(k). For fundus images [the last row, (i), (j) and (k)], the localized objects include the artery (blue), vein (green), macula (black), optic disc (light purple), hard exudate (yellow) and so on. For the other types of images, the objects to be localized include the eyelid (red), eyelash (green), keratitis focus (yellow), cornea and iris zone with keratitis (pink), pupil zone (blue), conjunctiva and sclera zone with hyperaemia (orange), conjunctiva and sclera zone with edema (light blue), conjunctiva and sclera zone with hemorrhage (brown), pupil zone with cataracts (white), slit arc of the cornea (black), cornea and iris zone (dark green), conjunctiva and sclera zone (purple), pterygium (gray), slit arc of the keratitis focus (dark red) and slit arc of the iris (light brown). The detailed diagnostic attributes to be classified are listed in Table 1, and each piece of diagnostic information corresponds to one classification problem. The diagnostic information in Table 1 corresponds to stage 3 (see METHODS); this information is essential and fundamental for diagnosing and providing treatment advice and is determined in stage 3 of the interpretable artificial intelligence system. All information (object annotations and diagnostic information) was double-blind marked by an annotation team consisting of five experienced ophthalmic doctors and 20 medical students. The annotation of the fundus images has been completed; however, the experiments on fundus images are not yet finished, because, owing to the intrinsic characteristics of fundus images, the output of the annotation method for fundus images is suited to semantic segmentation.

METHODS

The framework consists of four functional stages: (a) judging the class of disease, a preliminary diagnosis performed on the original image without any processing; (b) detecting each part of the image, that is, localizing the anatomical parts and foci so that regions with different appearances can be distinguished and checked more carefully; (c) classifying the attributes of each part, an assessment of severity and illness that is closely connected to stage (b) and is used to determine the condition of the illness; and (d) providing treatment advice according to the results of the first, second and third stages, where the treatment advice for pterygium comes from artificial intelligence and the treatment advice for the other diseases comes from the experience of doctors. First, the disease is preliminarily identified in stage 1. Second, all anatomical parts and foci are localized in stage 2, and the important parts (the cornea and iris zone with keratitis, and the pterygium) are segmented for the analysis in stage 3. Then, the attributes of all anatomical parts and foci are determined in stage 3, and treatment advice is provided in stage 4. The whole process imitates the diagnostic procedure of doctors, so the reasons for a given diagnosis can be tracked and used to construct an evidence-based diagnostic report. Finally, treatment advice is provided according to the full workflow presented above. The flowchart of the system is shown in Fig. 2; a sketch of the corresponding control flow is given below. The analysis of fundus images is forthcoming and will be integrated into this system following the same idea used for the existing images. The first, second and third stages are fully based on artificial intelligence trained on the dataset, whereas the fourth stage depends on both artificial intelligence and the experience of doctors.
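As an illustration of how the four stages chain together, the following is a minimal, hypothetical Python sketch of the orchestration logic; the callables (classify_disease, detect_parts, classify_attributes, recommend_treatment) stand in for the trained Inception-v4, Faster R-CNN and ResNet models described below and are not part of any published API.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]          # (x_min, y_min, x_max, y_max)

def diagnose(image,
             classify_disease: Callable,     # stage 1 model (e.g. Inception-v4)
             detect_parts: Callable,         # stage 2 model (e.g. Faster R-CNN)
             classify_attributes: Callable,  # stage 3 models (e.g. ResNet-101)
             recommend_treatment: Callable   # stage 4 model / doctors' rules
             ) -> Dict:
    """Run the four-stage interpretable pipeline on one ophthalmic image (sketch)."""
    # Stage 1: preliminary classification of the whole image into one of five classes.
    disease = classify_disease(image)                       # e.g. "pterygium"

    # Stage 2: localize anatomical parts and foci; keep the crops needed later.
    parts: List[Tuple[str, Box]] = detect_parts(image)
    important = ("pterygium", "cornea and iris zone with keratitis")
    crops = {name: image[box[1]:box[3], box[0]:box[2]]      # NumPy-style crop
             for name, box in parts if name in important}

    # Stage 3: classify the diagnostic attributes of the relevant parts (Table 1).
    attributes = {name: classify_attributes(name, region)
                  for name, region in crops.items()}

    # Stage 4: treatment advice; AI-based for pterygium, experience-based otherwise.
    advice = recommend_treatment(disease, attributes)

    # Every intermediate result is returned so the diagnosis can be explained.
    return {"disease": disease, "parts": parts,
            "attributes": attributes, "advice": advice}
```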

Machine learning, and especially the deep learning techniques represented by convolutional neural networks, is becoming an effective computer vision tool for automatically diagnosing diseases from biomedical images. It has been widely applied to medical image classification and the automatic diagnosis of diseases, such as the diagnosis of attention deficit hyperactivity disorder from fMRI (functional magnetic resonance imaging), the grading of brain tumors, breast cancer and lung cancer, and the diagnosis of skin diseases, kidney diseases and ophthalmic diseases. In the present research, Inception-v4 and ResNet (101 layers) were used to carry out stage 1 and stages 3 and 4, respectively. Stage 1 (Inception-v4) gives a general diagnostic conclusion, and stages 3 and 4 (ResNet) provide further information about the disease and treatment recommendations. A cost-sensitive CNN (convolutional neural network) was adopted because imbalanced classification is common in this research. Inception-v4 is a wider and deeper convolutional neural network that is suitable for fine-grained classification, where the differences between classes are easily overlooked. ResNet is a narrower convolutional neural network whose architecture is full of cross-layer connections; its objective is transformed to fit a residual function, which considerably improves performance. ResNet is suitable for coarser classification, where the differences between classes do not need to be analyzed as carefully. We chose the 101-layer ResNet, whose capacity is sufficient for the classification problems in the current research. Stage 1 is a five-class classification in which some classes are very similar in color and shape, so Inception-v4 was chosen for stage 1; the other classification problems are each limited to one specific disease, so ResNet was selected for stages 3 and 4. Stochastic gradient descent, with gradients computed by the chain rule of derivatives (backpropagation), was used to minimize the loss functions.
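The cost-sensitive CNN mentioned above can be realized by weighting the cross-entropy loss so that errors on rare classes cost more. The following is a minimal sketch of such a class-weighted loss in TensorFlow; the weighting scheme (inverse class frequency) is our assumption, not necessarily the exact scheme used in this study.

```python
import numpy as np
import tensorflow as tf

def class_weights_from_counts(counts):
    """Inverse-frequency weights, normalized so the average weight is 1."""
    counts = np.asarray(counts, dtype=np.float32)
    return counts.sum() / (len(counts) * counts)

def cost_sensitive_loss(labels, logits, weights):
    """Weighted softmax cross-entropy: rare classes contribute larger losses.

    labels: int tensor of shape [batch]; logits: float tensor [batch, num_classes];
    weights: float vector [num_classes], e.g. from class_weights_from_counts."""
    per_example = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits)
    per_example_weight = tf.gather(tf.constant(weights, dtype=tf.float32), labels)
    return tf.reduce_mean(per_example * per_example_weight)
```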

Faster R-CNN, an effective and efficient approach, was adopted to localize the anatomical parts and foci (stage 2). Faster R-CNN was developed from R-CNN and Fast R-CNN, which originally applied a superpixel-based segmentation algorithm to produce proposal regions. Faster R-CNN instead uses an anchor mechanism to generate region proposals quickly and then adopts two-stage training to obtain the bounding-box regressor and the classifier. The first stage of Faster R-CNN is the region proposal network, which is responsible for generating region proposals; it judges whether each proposal is an object and performs a first regression of the object coordinates. The second stage judges the class of each object and performs the final regression of its coordinates, as in R-CNN and Fast R-CNN. In the current research, a pretrained ZF network (Zeiler & Fergus network) was used to save training time. A simplified illustration of the anchor mechanism is sketched below.
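As background only, the following minimal Python sketch shows how anchor boxes can be enumerated over a feature map at several scales and aspect ratios, which is the core idea behind the region proposal network's anchor mechanism; the specific scales, ratios and stride here are illustrative defaults, not the exact values used in this study.

```python
import numpy as np

def generate_anchors(feature_h, feature_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Enumerate anchor boxes (x1, y1, x2, y2) centered on each feature-map cell.

    Each cell of the feature map corresponds to a stride x stride patch of the
    input image; at every cell, one anchor is placed per (scale, ratio) pair."""
    anchors = []
    for cy in range(feature_h):
        for cx in range(feature_w):
            # Center of this cell in input-image coordinates.
            x_c = (cx + 0.5) * stride
            y_c = (cy + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # Keep the anchor area near scale**2 while varying the aspect ratio.
                    w = scale * np.sqrt(ratio)
                    h = scale / np.sqrt(ratio)
                    anchors.append((x_c - w / 2, y_c - h / 2,
                                    x_c + w / 2, y_c + h / 2))
    return np.array(anchors, dtype=np.float32)

# Example: a 38 x 50 feature map yields 38 * 50 * 9 = 17100 candidate anchors.
print(generate_anchors(38, 50).shape)   # (17100, 4)
```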

EXPERIMENTAL SETTINGS

The system was implemented with Caffe (the Berkeley Vision and Learning Center (BVLC) deep-learning framework) and TensorFlow, and all models were trained in parallel on four NVIDIA TITAN X GPUs. For the classification problems, the indicators used to evaluate performance are as follows:

$$\mathrm{Accuracy} = \frac{1}{N}\sum_{i=1}^{k} TP_i$$

$$\mathrm{Precision}_i = \frac{TP_i}{TP_i + FP_i}$$

$$\mathrm{Sensitivity}_i\ (\mathrm{TPR},\ \mathrm{Recall}) = \frac{TP_i}{TP_i + FN_i}$$

$$\mathrm{FNR}_i\ (\text{false negative rate}) = \frac{FN_i}{TP_i + FN_i}$$

$$\mathrm{Specificity}_i = \frac{TN_i}{TN_i + FP_i}$$

$$\mathrm{FPR}_i\ (\text{false positive rate}) = \frac{FP_i}{TN_i + FP_i}$$

where $N$ is the total number of samples, $k$ is the number of classes in the specific classification problem, $TP_i$ denotes the number of samples correctly classified as the $i$th class, $FP_i$ is the number of samples wrongly recognized as the $i$th class, $FN_i$ denotes the number of samples of the $i$th class that are classified as some other class $j$ ($j \neq i$), and $TN_i$ is the number of samples of the other classes $j$ ($j \neq i$) that are correctly recognized as not belonging to the $i$th class. All of the above performance indicators can be computed from the confusion matrix. In addition, the ROC (receiver operating characteristic) curve, which plots the true positive rate of the $i$th class against its false positive rate (the fraction of samples of the other classes $j \neq i$ that are classified as the $i$th class), the PR (precision-recall) curve, which plots the precision of the $i$th class against its recall, and the AUC (area under curve), which is the area of the zone under the ROC curve, were also adopted to assess performance. The indicators Precision, Sensitivity, Specificity, the ROC curve with AUC and the PR curve were used only to evaluate the binary classification problems; Accuracy and the confusion matrix were used to evaluate the multi-class classification problems. A sketch of how these indicators can be derived from a confusion matrix is given below.
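The following is a minimal sketch, under our own naming, of computing these per-class indicators from a confusion matrix whose rows are ground-truth labels and whose columns are predicted labels (the convention used in the heat maps of Fig. 3).

```python
import numpy as np

def per_class_metrics(cm):
    """Compute accuracy and per-class precision/sensitivity/specificity/FPR/FNR.

    cm[i, j] = number of samples whose ground truth is class i and whose
    prediction is class j (rows: ground truth, columns: prediction)."""
    cm = np.asarray(cm, dtype=np.float64)
    total = cm.sum()
    tp = np.diag(cm)                    # correctly classified samples per class
    fp = cm.sum(axis=0) - tp            # predicted as class i but actually another class
    fn = cm.sum(axis=1) - tp            # actually class i but predicted as another class
    tn = total - tp - fp - fn
    return {
        "accuracy": tp.sum() / total,
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),  # TPR / recall
        "specificity": tn / (tn + fp),
        "fpr": fp / (tn + fp),
        "fnr": fn / (tp + fn),
    }

# Hypothetical 2-class example (e.g. "needs surgery" vs "no surgery" in stage 4):
cm = np.array([[40, 5],
               [3, 52]])
print(per_class_metrics(cm)["sensitivity"])
```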

For the object localization problem, the interpolated AP (average precision) is commonly used to evaluate performance. The interpolated AP is computed from the PR curve as in Eq. 1:

$$AP = \frac{1}{11} \sum_{r \in \{0,\,0.1,\,\ldots,\,1\}} p_{\mathrm{interp}}(r) \qquad (1)$$

$$p_{\mathrm{interp}}(r) = \max_{\tilde{r}:\,\tilde{r} \ge r} p(\tilde{r})$$

where $p(\tilde{r})$ is the measured precision at recall $\tilde{r}$. In the present research, four-fold cross-validation was used to evaluate the performance of the system reliably for all classification and localization problems.
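For concreteness, a minimal sketch of the 11-point interpolated AP of Eq. 1 is given below; it assumes the precision-recall pairs have already been computed for one object class.

```python
import numpy as np

def interpolated_ap(recalls, precisions):
    """11-point interpolated average precision (Eq. 1).

    recalls, precisions: matched 1-D arrays sampling the PR curve of one class.
    For each reference recall r in {0, 0.1, ..., 1}, the precision is
    interpolated as the maximum precision observed at any recall >= r."""
    recalls = np.asarray(recalls, dtype=np.float64)
    precisions = np.asarray(precisions, dtype=np.float64)
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        mask = recalls >= r
        p_interp = precisions[mask].max() if mask.any() else 0.0
        ap += p_interp / 11.0
    return ap

# Hypothetical PR samples for one anatomical part:
print(interpolated_ap([0.0, 0.3, 0.6, 0.9], [1.0, 0.9, 0.7, 0.4]))
```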


The application of the cost-sensitive CNN depends on the class distribution of the dataset in each classification problem. Except for classification problems 1, 6 and 8, all classification problems in stages 3 and 4 were completed with the cost-sensitive CNN.

RESULTS AND DISCUSSION

All stages and the whole working flow of the system were completed with acceptable performance. The four stages of the framework were trained and validated separately, and all relevant results are shown in Fig. 3 and Fig. 4. The rows and columns of all heat maps stand for the ground-truth labels and the predicted labels, respectively. Fig. 3(a) is the heat map of stage 1, whose accuracy reaches 92%. Fig. 3(b) shows the detection performance of Faster R-CNN in recognizing anatomical parts and foci; the mean average precision over all classes surpasses 82% for images under natural light without fluorescein sodium eye drops and 90% for images under cobalt blue light or natural light with fluorescein sodium eye drops. The left panel of Fig. 3(b) shows the performance for localizing objects in images without fluorescein sodium eye drops in stage 2, where I-XV represent the cornea and iris zone with keratitis, the focus of keratitis, the conjunctiva and sclera zone, the slit arc of the cornea, the slit arc of the keratitis focus, the eyelid, the slit arc of the iris, the conjunctiva and sclera zone with hyperaemia, the conjunctiva and sclera zone with edema, the cornea and iris zone, the pterygium, the eyelash, the pupil zone, the conjunctiva and sclera zone with hemorrhage and the pupil zone with cataracts, respectively. The right panel of Fig. 3(b) shows the performance for localizing objects in images with fluorescein sodium eye drops in stage 2, where I-VII represent the cornea and iris zone with keratitis, the focus of keratitis, the slit arc of the cornea, the slit arc of the keratitis focus, the slit arc of the iris, the eyelid and the eyelash, respectively. The statistical results of stage 2 are shown in Supplementary Tables 1-2 (Supplementary Material).

Stage 3 was decomposed into ten classification problems, and the relevant results are shown in Fig. 3(c), including the box plots of accuracy, specificity and sensitivity, the ROC curves with AUC, the PR curves for all binary classification problems, and the heat maps with accuracy for all multi-class classification problems. Fig. 4(a) shows the classification performance of stage 4, including the box plot of accuracy, sensitivity and specificity, the ROC curve with its AUC value and the PR curve. The only classification problem addressed in stage 4 is whether a patient suffering from pterygium needs surgery. In stage 2, the detection rate of some objects is low because Faster R-CNN cannot effectively detect some small objects; we will address this issue by adjusting the parameters of Faster R-CNN. Stage 3 is nevertheless not affected by this drawback, because the detection rates of the cornea and iris zone with keratitis and of the pterygium (the anatomical parts and foci involved in stage 3) are considerably high. In addition, the detection performance for the pupil zone, which is related to vision, is also satisfactory. In stage 3, the specificity of classification problems 1, 3, 4 and 5 is slightly low, but the intended application scene of this system is the hospital, where doctors pay more attention to sensitivity than to specificity. The results of all classification problems are satisfactory and acceptable. Furthermore, the performance of classification problems 1, 3, 4 and 5 can be improved with more samples under an online learning regime. The statistical results of stages 3 and 4 are shown in Supplementary Tables 3-4 (Supplementary Material).

To study which anatomical parts are essential for automatic diagnosis, stages 3 and 4 were repeated with the original medical images without processing, with all parameters the same as in the system shown in Fig. 2(a). The relevant results are shown in Fig. 4(b). The classification performance is close to that obtained with the segmented anatomical parts and foci; in other words, the important parts, the cornea and iris zone with keratitis and the pterygium, contain the information that is essential for automatic diagnosis. The statistical results of stages 3 and 4 with the original images are shown in Supplementary Tables 5-6 (Supplementary Material).

WEB-BASED AUTOMATIC DIAGNOSTIC SYSTEM

We used the Django framework to develop a telemedical decision-making and automatic diagnosis system to serve doctors and patients. The system analyzes uploaded medical images, presents the diagnostic results in the same way a doctor would work, and provides treatment advice together with an examination report. Its appearance and functions are shown in Fig. 5. The telemedical system can finely analyze medical images (Fig. 5(a), (b), (c), (d)) and provide treatment advice (Fig. 5(e)) with a diagnostic report (a PDF file that includes the treatment suggestion, Fig. 5(f)) according to the analysis results and the consultation between doctor and patient. The format of the diagnostic report is shown in Supplementary Fig. 1 (Supplementary Material). All diagnostic information is shown to the doctor and patient and stored in a database, and administrators and doctors can manage this information and contact patients conveniently. The system can be deployed in multiple hospitals and medical centers to screen for common diseases and to collect more medical data, which can in turn be used to improve the diagnostic performance. The system is freely available at http://114.67.37.252:80.
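Purely as an illustration of the kind of endpoint such a system exposes, the following is a minimal, hypothetical Django view that accepts an uploaded ophthalmic image and returns the staged diagnosis as JSON; the URL name, the run_pipeline helper and the response fields are our own placeholders, not the actual interface of the deployed system.

```python
# urls.py (hypothetical): path("api/diagnose/", diagnose_view)
from django.http import HttpResponseBadRequest, JsonResponse
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST

@csrf_exempt
@require_POST
def diagnose_view(request):
    """Accept one uploaded image and return the four-stage diagnosis as JSON."""
    upload = request.FILES.get("image")
    if upload is None:
        return HttpResponseBadRequest("an 'image' file is required")

    # run_pipeline is a placeholder for the four-stage model pipeline
    # (stage 1 classification, stage 2 detection, stage 3 attributes, stage 4 advice).
    result = run_pipeline(upload.read())

    return JsonResponse({
        "disease": result["disease"],
        "parts": result["parts"],
        "attributes": result["attributes"],
        "advice": result["advice"],
    })
```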

CONCLUSIONS AND FUTURE WORK

This study constructed an explainable artificial intelligence system for the automatic diagnosis of multiple ophthalmic diseases. The system carefully mimics the working flow of doctors, so the reasons for a specific diagnosis can be made clear to doctors and patients while maintaining high performance. The system accelerates the application of telemedicine with the assistance of computer networks and helps to improve health levels and medical conditions. Moreover, the system can easily be expanded to cover more diseases as long as the diagnostic processes of those diseases can be simulated in the same way. In addition, the system can help medical students to understand diagnoses and diseases. In the future, much progress can still be made. In this research, we did not consider multi-label classification for patients who suffer from multiple diseases; in the future, multi-label classification can be adopted to bring the system closer to real clinical circumstances. Moreover, because bounding boxes are not suitable for some anatomical parts, semantic segmentation can be applied in this system to segment medical images more accurately.

REFERENCES

Acknowledgements

We gratefully thank the volunteers of AINIST (the medical artificial intelligence alliance of Zhongshan School of Medicine, Sun Yat-sen University). This study was funded by the NSFC (No. 91546101, No. 61472311, No. 11401454, No. 61502371 and No. 81770967), the National Defense Basic Research Project of China (jcky2016110c006), the Guangdong Provincial Natural Science Foundation (No. YQ2015006, No. 2014A030306030, No. 2014TQ01R573, No. 2013B020400003), the Natural Science Foundation of Guangzhou City (No. 2014J2200060), the Guangdong Provincial Natural Science Foundation for Distinguished Young Scholars of China (2014A030306030), the Science and Technology Planning Projects of Guangdong Province (2017B030314025), the Key Research Plan for the National Natural Science Foundation of China in Cultivation Project (No. 91546101), the Ministry of Science and Technology of China Grants (2015CB964600), and the Fundamental Research Funds for the Central Universities (No. 16ykjc28).

Author contributions


X.Y.L. and H.T.L. designed the research; K.Z. conducted the study; W.T.L., Z.Z.L. and X.H.W. collected the data and prepared the relevant information; K.Z., F.L., L.H., L.Z., L.L. and S.W. were responsible for coding; L.H. and L.Z. developed the web-based system; K.Z. analyzed and completed the experimental results; K.Z., W.T.L., H.T.L. and X.Y.L. co-wrote the manuscript; and H.T.L. critically revised the manuscript. All the authors discussed the results and reviewed the manuscript.

Abbreviations

AI: Artificial Intelligence
fMRI: functional Magnetic Resonance Imaging
CNN: Convolutional Neural Network
ResNet: Residual Network
Caffe: Convolutional Architecture for Fast Feature Embedding
ROC curve: Receiver Operating Characteristic curve
PR curve: Precision-Recall curve
AUC: Area Under the ROC Curve
AP: Average Precision
PDF: Portable Document Format


FIGURES

Figure 1. Information on the image dataset and examples of each object for each type of disease and the normal eye.


Figure 2. Architecture of the overall framework for the interpretable diagnosis of multiple ocular diseases.


Figure 3. Performance of the whole system (stages 1, 2 and 3).


Figure 4. Performance of the whole system (stage 4) and performance of stages 3 and 4 with the original medical images.


Figure 5. Appearance and functions of the automatic diagnostic system.
