A REPORTING GUIDELINE FOR MEDICAL APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS

H.C.E. McGowan, M. Stevenson, M. Frize
Department of Electrical Engineering, University of New Brunswick, PO Box 4400, Fredericton, NB E3B 5A3

ABSTRACT

Given the interdisciplinary nature of projects which employ Artificial Neural Networks (ANNs) to estimate medical outcomes, and the wide spectrum of journals in which the results are published, there is a need to establish a guideline for reporting the details of these experiments. This paper proposes and discusses one such guideline. Once a standard guideline is adopted, comparing the work of different researchers will be greatly facilitated. While such a guideline need not be followed to the letter in order to publish a "good" paper in this field, incorporating many of the suggested details into publications would ensure that the overall performance of each ANN is clearly documented and that the results can be readily reproduced.

INTRODUCTION

The introduction of Artificial Neural Network (ANN) technology into the commercial market, which followed a series of research advances in the late 1980s, has made ANN software packages widely available to scientists and engineers of all disciplines. Specialists in many fields other than neural network research quickly recognized the vast potential of a "new" non-linear modeling technique, and many now incorporate ANNs into their work.

The field of medicine has not been immune to this trend. Some doctors and nurses, whose work involves developing models of disease by performing traditional statistical analyses of vast and complex databases of medical information, have initiated research projects aimed at determining the usefulness of ANNs in estimating medical outcomes. In a flurry of research published in the early 1990s, ANNs were used to identify everything from heart attacks [Baxt, 1990 and Baxt, 1991] to microcalcifications on mammographic x-rays [Wu, 1993], with varying degrees of success.

Since the results of the first experiments of this type were published, there has been little discussion of how to determine whether a particular ANN is useful in the medical domain, and of what techniques (if any) can be used to improve its performance. To date, researchers with various levels of ANN expertise have been working on a wide variety of databases, conducting isolated experiments to determine the best values for a large number of ANN parameters, and rarely reporting anything beyond a series of similar outcomes. Comparing the work in this field is often difficult and usually does little to answer questions about the resiliency of a particular ANN architecture or algorithm when it is applied to a medical database. As a result, progress in this field has been slow. However, if researchers were to report (as a minimum) a standard set of parameters each time the results of an experiment are documented, progress would be accelerated, because this information could be used to pinpoint the types of ANNs and ANN techniques which yield the best results for a particular problem of interest.

MOTIVATION

In some cases, it is difficult to glean critical information from the published details of an ANN which has been used to estimate a particular medical outcome. Some of the more serious omissions include: critical details of the dataset used in the ANN experiments (such as a priori knowledge of the various outcomes in the dataset); details of ANN performance (such as the number of epochs completed and the stopping criteria which were used); and performance comparisons with the results obtained using traditional statistical benchmarks and models (such as an estimated Bayes-type minimum distance classifier and/or regression models). In a few exceptional cases, there was no evidence at all that the ANN model had ever been applied to a set of test data.
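The guideline does not prescribe how such a statistical benchmark should be computed. As a minimal sketch, assuming the simplest form of a Bayes-type minimum distance classifier (each case is assigned to the outcome class whose training-set mean is nearest in Euclidean distance, which coincides with the Bayes decision rule only when the classes are Gaussian with equal priors and identical spherical covariance), the Python below computes the a priori outcome frequencies of a dataset and the test-set accuracy of that baseline. The function names (`class_priors`, `fit_class_means`, `predict`, `accuracy`) and the toy data are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def class_priors(labels):
    """A priori frequency of each outcome class in a dataset."""
    counts = Counter(labels)
    n = len(labels)
    return {c: counts[c] / n for c in sorted(counts)}

def fit_class_means(X, y):
    """Mean feature vector of the training cases in each outcome class."""
    counts = Counter(y)
    sums = {}
    for x, c in zip(X, y):
        if c not in sums:
            sums[c] = [0.0] * len(x)
        sums[c] = [s + v for s, v in zip(sums[c], x)]
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(means, x):
    """Minimum distance rule: pick the class whose mean is closest to x."""
    return min(means, key=lambda c: math.dist(x, means[c]))

def accuracy(means, X, y):
    """Fraction of cases whose predicted class matches the true outcome."""
    correct = sum(predict(means, x) == c for x, c in zip(X, y))
    return correct / len(y)

if __name__ == "__main__":
    # Hypothetical toy data: two features per case, binary outcome (0 or 1).
    train_X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
    train_y = [0, 0, 1, 1]
    test_X = [[0.1, 0.0], [1.0, 0.9], [0.6, 0.6]]
    test_y = [0, 1, 0]

    means = fit_class_means(train_X, train_y)
    print("training priors:", class_priors(train_y))
    print("test priors:", class_priors(test_y))
    print("benchmark test accuracy:", accuracy(means, test_X, test_y))
```

Reporting both the outcome priors and the benchmark accuracy alongside the ANN's own test-set result would let readers judge how much the network improves on the base rates and on a simple linear-boundary classifier.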

To rectify this, we propose that all researchers in this field adopt the following guideline, which sets forth a standard set of parameters to be reported in every paper documenting the use of ANNs to estimate medical outcomes. The proposed guideline consists of three distinct categories and is based largely on questions which arose during an extensive literature survey, although it also incorporates the ideas and suggestions of our own research group. To avoid singling out a particular author (or group of authors), references to the papers which raised these questions have not been included. The guideline is not exhaustive: it does not attempt to cover every possible ANN detail. Nor is it a minimum requirement: a paper need not include every item in order to be considered "complete", since some of the information (such as comparisons with regression models) may not even be available. Rather, it is hoped that this basic guide will motivate researchers to consider carefully which details of their work to report.

THE PROPOSED REPORTING STANDARD

The three suggested reporting categories are: a) The Dataset, b) ANN Details, and c) ANN Results and Statistical Comparisons. The parameters which would be useful to include in each category are described in the sections which follow.

a) The Dataset

To answer the questions:

The documentation should include:

b) ANN Details

To answer the questions:

The documentation should include:

c) ANN Results and Statistical Comparisons

To answer the questions:

The documentation should include:

CONCLUSION

Based on a review of the current literature, there is clearly a need to establish a guideline for reporting the results of experiments in which ANNs have been used to estimate medical outcomes. It is hoped that the model suggested in this paper will act as a catalyst for meaningful discussion on how best to compare and evaluate the results of all such experiments, and will eventually lead to a standard accepted by all researchers in this field.

ACKNOWLEDGEMENTS

This work was completed with the assistance of an NSERC PGS-A Scholarship and MRC Grant CGAA-45088.

REFERENCES

[Baxt, 1990] Baxt, W.G. Use of an Artificial Neural Network for Data Analysis in Clinical Decision-Making: The Diagnosis of Acute Coronary Occlusion. Neural Computation, 2, 480-489: 1990.

[Baxt, 1991] Baxt, W.G. Use of an Artificial Neural Network for the Diagnosis of Myocardial Infarction. Annals of Internal Medicine, 115, 843-848: 1991.

[Duda and Hart, 1973] Duda, R.O. and Hart, P.E. Pattern Classification and Scene Analysis. New York: John Wiley and Sons, Ltd., 1973.

[Wu, 1993] Wu, Y. et al. Artificial Neural Networks in Mammography: Application to Decision Making in the Diagnosis of Breast Cancer. Radiology, 187, 81-87: 1993.