BSPP Presidential Meeting 1996

Parallel session 3A: Support for decision making

PC-Plant Protection - a need-based decision support system for crop protection in Denmark
Bo J M Secher & N S Murali
Danish Institute of Plant and Soil Science, Department of Plant Pathology and Pest Management, Lottenborgvej 2, DK 2800 Lyngby, Denmark.

The decision, in 1986, on an action plan to reduce pesticide use in Denmark by 50% has led to increased research on the potential for reduced dosages. A decision support system (PC-Plant Protection), developed by the Danish Institute of Plant and Soil Science and the Danish Agricultural Advisory Centre, implements these research findings. It makes detailed use of threshold values to support decisions on the need for treatment, the choice of pesticide and the appropriate dosage for the problem at hand.
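As an illustration of how a threshold value can drive both the treatment decision and a reduced dosage, here is a minimal sketch. The threshold, the dose unit and the linear dose-scaling rule are all hypothetical assumptions, not the rules used in PC-Plant Protection: below the threshold no treatment is advised, and above it the dose grows with severity up to the full label dose.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Hypothetical treatment threshold for one pest or disease."""
    problem: str
    threshold: float  # observed level (e.g. % plants attacked) at which treatment is advised; must be > 0
    full_dose: float  # full label dose, e.g. in L/ha (illustrative unit)


def recommend_dose(observed: float, t: Threshold, min_fraction: float = 0.25) -> float:
    """Return a recommended dose, or 0.0 when there is no treatment need.

    Assumption: the dose starts at min_fraction of the label dose at the
    threshold and rises linearly with observed/threshold, capped at the
    full label dose. This scaling rule is hypothetical.
    """
    if observed < t.threshold:
        return 0.0  # below threshold: no treatment need
    severity = observed / t.threshold  # equals 1.0 exactly at the threshold
    fraction = min(1.0, min_fraction * severity)
    return t.full_dose * fraction


example = Threshold("example disease", threshold=10.0, full_dose=1.0)
```

With these made-up numbers, an attack level of 20% gives half the label dose, and anything from 40% upwards gives the full dose.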

The pest and disease model of the system has been commercially available since 1993. In 1996, 2800 licences were in use at agricultural schools and by advisers and farmers. The system was evaluated by agricultural advisers in 1990, 1991 and 1993. The recommendation model was judged to be user-friendly and useful, and it was considered reliable and to meet the requirements of advisers. In 1996, a survey of 488 farmers who had used the system in 1995 showed that it has been well accepted by farmers, not only because of its reliable recommendations but also because of the good profitability resulting from its use.

The recommendation model for pest and disease control has been validated in the field since 1990. Validation in winter wheat (62 trials), spring barley (62 trials) and winter barley (39 trials) has shown that the model controls pests and diseases to a satisfactory level without affecting farmers' gross margins. The model was able to adjust pesticide usage to the large annual variations. On average, pesticide usage in plots treated according to the model was well below the level in reference plots, and below the level of strategies commonly used by Danish farmers.

The models in PC-Plant Protection are undergoing continuous development. A weather module, taking into account detailed weather information, and an environmental module, giving farmers the opportunity to choose pesticides according to their environmental risk, will be implemented in the near future. Further planned developments include integration of the decision support system with site-specific (precision) farming, and multimedia presentation of biological and pesticide information.

Developing a model of expertise for a taxonomic expert system
Marion Edwards
Computing Laboratory, University of Kent, Canterbury, Kent CT2 7NF, UK
School of Sciences, University of Buckingham, Buckingham MK18 1EG, UK

This paper discusses the role of modelling in expert system development with reference to an expert system developed to investigate water mite identification. A model, usually termed a model of expertise, defines the behaviour that the expert system is required to exhibit. The model of expertise is usually, at least partly, based on the behaviour shown by a human expert when solving a problem. In this case the actions of an expert taxonomist when identifying water mite specimens were investigated.

A model of expertise is developed during the knowledge acquisition stage of an expert system project. Knowledge acquisition includes three activities:

  • the elicitation of the domain-specific knowledge (usually from an expert)
  • the interpretation of that knowledge
  • the formal representation of that knowledge.

This paper concentrates on the first two activities: elicitation and interpretation. First, the elicitation technique used, a modified form of the twenty-questions knowledge acquisition technique, is described. Second, the interpretation and analysis of the elicited knowledge are outlined, and the resulting model of identification is described. The model was termed "identification by confirmation" and has four distinguishing characteristics.

  1. A broad search strategy is used so that, once a tentative identification is reached, alternative identifications that could be confused with it are specifically excluded.
  2. Questions are repeated when there is reason to doubt the original answer, either because the question was answered hesitantly or because the answer contradicts existing evidence.
  3. Different types of questions are recognized; the most important are those used to confirm tentatively reached conclusions and those answered hesitantly.
  4. The taxonomic classification has a flexible representation.

This last characteristic showed that the expert neither rigidly followed the recognized classification nor discarded it entirely; rather, it was used as appropriate.
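The first three characteristics of identification by confirmation can be sketched in code. This is a minimal illustration under stated assumptions, not the model from the paper: the taxa and characters are hypothetical, "confusable" is taken to mean differing in at most `max_diff` characters, and a doubtful answer is re-asked once and the second answer kept.

```python
from typing import Callable, Dict, List, Optional, Tuple

Taxa = Dict[str, Dict[str, str]]            # taxon -> {character: state}
Answer = Callable[[str], Tuple[str, bool]]  # character -> (state, answered hesitantly?)

# Hypothetical example taxa; these are not real water mite characters.
TAXA: Taxa = {
    "A": {"colour": "red", "legs": "long", "setae": "many"},
    "B": {"colour": "blue", "legs": "long", "setae": "few"},
    "C": {"colour": "red", "legs": "short", "setae": "few"},
}


def candidates(taxa: Taxa, evidence: Dict[str, str]) -> List[str]:
    """Taxa whose recorded states agree with every answer so far."""
    return [t for t, s in taxa.items()
            if all(s.get(c) == v for c, v in evidence.items())]


def ask(char: str, answer: Answer, evidence: Dict[str, str], taxa: Taxa) -> str:
    """Ask once; re-ask if the answer was hesitant or leaves no taxon
    consistent with existing evidence (characteristic 2)."""
    state, hesitant = answer(char)
    if hesitant or not candidates(taxa, {**evidence, char: state}):
        state, _ = answer(char)
    return state


def identify(taxa: Taxa, answer: Answer, max_diff: int = 2) -> Optional[str]:
    evidence: Dict[str, str] = {}
    chars = sorted({c for s in taxa.values() for c in s})
    # Phase 1: narrow down to a tentative identification.
    for c in chars:
        remaining = candidates(taxa, evidence)
        if len(remaining) <= 1:
            break
        if len({taxa[t][c] for t in remaining}) == 1:
            continue  # character does not discriminate among remaining taxa
        evidence[c] = ask(c, answer, evidence, taxa)
    remaining = candidates(taxa, evidence)
    if len(remaining) != 1:
        return None
    tentative = remaining[0]
    # Phase 2 (characteristics 1 and 3): explicitly exclude confusable
    # alternatives by asking confirmation questions not yet asked.
    for other, states in taxa.items():
        if other == tentative:
            continue
        diffs = [c for c in states if states[c] != taxa[tentative][c]]
        if len(diffs) > max_diff:
            continue  # unlikely to be confused with the tentative taxon
        for c in diffs:
            if c not in evidence:
                evidence[c] = ask(c, answer, evidence, taxa)
            if evidence[c] != taxa[tentative][c]:
                return None  # tentative identification not confirmed
    return tentative
```

In this sketch, "colour" and "legs" alone single out taxon A. The confirmation phase then still asks about "setae" to rule out B explicitly, which mirrors the broad search strategy described above.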

The model of identification is described in an informal, implementation-independent format, and the implications of adopting a modelling approach to the development of taxonomic expert systems are discussed.

Parallel session 3B: Computer-based species identification: Applications I

Computer-based keys for botanical identification
R J Pankhurst
Royal Botanic Garden Edinburgh, Inverleith Row, Edinburgh EH3 5LR, UK. 

Computer-assisted identification first began about 30 years ago, and it is now 20 years since the publication of the first Systematics Association conference volume on identification methods. This paper traces the development over this period of the two principal types of method: elimination and comparison. The DELTA format for morphological descriptive data continues to be of great importance for identification techniques. The relevance of various computing techniques, such as expert systems, hypertext, relational databases and computer graphics, will be discussed. Finally, priorities for new research will be given.
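The two families of method differ in how they treat a disagreement between specimen and description. The sketch below makes this concrete under simple assumptions (hypothetical taxa, single-state characters, and a plain proportion-of-matches similarity). Elimination discards any taxon that disagrees on any character. Comparison instead ranks taxa by overall agreement, so it can tolerate an observation error.

```python
from typing import Dict, List, Tuple

Table = Dict[str, Dict[str, str]]  # taxon -> {character: state}

# Hypothetical descriptive data for illustration only.
TABLE: Table = {
    "Taxon A": {"leaf": "lobed", "fruit": "capsule", "bark": "fissured"},
    "Taxon B": {"leaf": "entire", "fruit": "berry", "bark": "smooth"},
}


def eliminate(table: Table, observed: Dict[str, str]) -> List[str]:
    """Elimination: keep only taxa that agree with every observation."""
    return [t for t, d in table.items()
            if all(d.get(c) == s for c, s in observed.items())]


def compare(table: Table, observed: Dict[str, str]) -> List[Tuple[str, float]]:
    """Comparison: rank taxa by the proportion of observations they match."""
    scores = []
    for t, d in table.items():
        matches = sum(d.get(c) == s for c, s in observed.items())
        scores.append((t, matches / len(observed)))
    return sorted(scores, key=lambda ts: ts[1], reverse=True)


# One character ("bark") recorded wrongly for a specimen of Taxon A.
specimen = {"leaf": "lobed", "fruit": "capsule", "bark": "smooth"}
```

Here `eliminate(TABLE, specimen)` returns an empty list because of the single error. `compare(TABLE, specimen)` still puts Taxon A first, with two of the three characters matching.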

Identification of yeasts through computer-based systems
R W Payne
Statistics Department, IACR-Rothamsted, Harpenden, Herts, AL5 2JQ, UK.

The identification of yeasts presents a worthwhile and challenging application for computer-based identification systems. There are currently nearly 650 species and varieties to consider, and over 100 tests. These involve such attributes as the ability of the yeast to ferment certain types of sugar or to grow in particular media, as well as some morphological characters such as the formation of spores and filaments. The basic data for the identification is a species-by-test table indicating the possible responses that each yeast may give to each test. The response may be fixed: all strains of the yeast always give the same result (positive or negative) to the test. However, there are many variable responses, where different strains produce different results. Some responses are weak or delayed and thus must also be treated as variable, and others are unknown.
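A species-by-test table with fixed, variable and unknown responses can be held as a simple mapping. This sketch assumes its own coding: "+" and "-" for fixed responses, "v" for variable (which also covers weak or delayed), and "?" for unknown. The species and test names are hypothetical. An observed result is treated as compatible with a species unless the table records the opposite fixed response.

```python
from typing import Dict, List

# species -> {test: response}; response codes are an assumption of this sketch.
Table = Dict[str, Dict[str, str]]

TABLE: Table = {  # hypothetical species and tests
    "sp1": {"ferm_glucose": "+", "ferm_galactose": "v", "growth_37C": "-"},
    "sp2": {"ferm_glucose": "+", "ferm_galactose": "-", "growth_37C": "+"},
    "sp3": {"ferm_glucose": "-", "ferm_galactose": "?", "growth_37C": "+"},
}


def compatible(entry: str, result: str) -> bool:
    """A variable ('v') or unknown ('?') entry accepts either observed result;
    a fixed entry ('+' or '-') accepts only the same result."""
    return entry in ("v", "?") or entry == result


def matching_species(table: Table, results: Dict[str, str]) -> List[str]:
    """Species not contradicted by any observed test result ('+' or '-')."""
    return [sp for sp, row in table.items()
            if all(compatible(row[t], r) for t, r in results.items())]
```

For example, a positive glucose and positive galactose fermentation leaves only `sp1`, because `sp2` has a fixed negative galactose response and `sp3` a fixed negative glucose response.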

A further complication is that some tests take up to 14 days to give a result. Consequently, the standard sequential methods, used for example in botanical identification, are inappropriate. Tests are generally observed in batches, and it is more important to ensure that the identification can be completed within a specified time than to minimise the number of tests required.
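Batch selection under a time limit can be illustrated with a simple greedy heuristic. This is a swapped-in technique for illustration, not the method of the Yeast Identification Program. First, discard tests that cannot be read within `max_days`. Then repeatedly choose the test that separates the most still-unseparated pairs of species. A pair counts as separated only when one species is fixed positive and the other fixed negative. The species, tests and durations are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Table = Dict[str, Dict[str, str]]  # species -> {test: '+', '-', 'v' or '?'}

TABLE: Table = {  # hypothetical data
    "sp1": {"ferm_glucose": "+", "ferm_galactose": "v", "growth_37C": "-"},
    "sp2": {"ferm_glucose": "+", "ferm_galactose": "-", "growth_37C": "+"},
    "sp3": {"ferm_glucose": "-", "ferm_galactose": "?", "growth_37C": "+"},
}
DAYS = {"ferm_glucose": 7, "ferm_galactose": 14, "growth_37C": 3}  # hypothetical


def separates(a: str, b: str) -> bool:
    """Only opposite fixed responses reliably separate two species."""
    return {a, b} == {"+", "-"}


def select_batch(table: Table, days: Dict[str, int], max_days: int,
                 k: int) -> Tuple[List[str], Set[Tuple[str, str]]]:
    """Greedily pick up to k tests readable within max_days.

    Returns the chosen tests and the species pairs still unseparated."""
    usable = [t for t, d in days.items() if d <= max_days]
    unseparated = set(combinations(list(table), 2))
    chosen: List[str] = []
    while len(chosen) < k:
        best, best_gain = None, 0
        for t in usable:
            if t in chosen:
                continue
            gain = sum(separates(table[a][t], table[b][t]) for a, b in unseparated)
            if gain > best_gain:  # ties keep the earlier test
                best, best_gain = t, gain
        if best is None:
            break  # no remaining test separates any further pair
        chosen.append(best)
        unseparated = {(a, b) for a, b in unseparated
                       if not separates(table[a][best], table[b][best])}
    return chosen, unseparated
```

Returning the unseparated pairs alongside the batch indicates which species cannot be distinguished within the time limit.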

Methods will be described for selecting the tests to be used during an identification, and for assessing the reliability of the conclusions that they generate. We will also discuss ways of allowing a suggested identification to be checked and of allowing for mistakes during testing. These will be illustrated by examples from the Yeast Identification Program of Barnett, Payne & Yarrow, and the keys and tables in the book Yeasts: Characteristics and identification (Barnett, Payne & Yarrow, 2nd edition, 1990, Cambridge University Press). 

Probabilistic identification systems for bacteria
T N Bryant
Medical Statistics and Computing, University of Southampton, Southampton General Hospital, Southampton SO16 6YD, UK. 

Probabilistic identification methods for bacteria are based on the likelihood that the pattern of test results observed for an unknown bacterium can be attributed to the known results of a taxon within an identification matrix. Typically, the known taxon is a recognized bacterial species, although this need not be the case, and the identification matrix contains species from a small number of related genera. Few attempts have been made to produce a general-purpose matrix covering all genera. A number of approaches to probabilistic identification have been suggested; however, the most commonly used method, based on Bayes' theorem, was proposed in 1973 by Lapage and his colleagues.
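The core Bayesian calculation can be sketched as follows. Each taxon's likelihood is the product, over the tests performed, of p for a positive result and 1 - p for a negative one. These likelihoods are then normalised so they sum to one, which amounts to Bayes' theorem with equal prior probabilities for all taxa. The taxa, test names and the 0.999 acceptance level are assumptions of this sketch and are not taken from the abstract.

```python
from typing import Dict, Optional

Matrix = Dict[str, Dict[str, float]]  # taxon -> {test: P(positive result)}


def identification_scores(matrix: Matrix, results: Dict[str, bool]) -> Dict[str, float]:
    """Normalised likelihoods (posterior probabilities under equal priors)."""
    likelihoods: Dict[str, float] = {}
    for taxon, probs in matrix.items():
        lik = 1.0
        for test, positive in results.items():
            p = probs[test]
            lik *= p if positive else (1.0 - p)
        likelihoods[taxon] = lik
    total = sum(likelihoods.values())
    if total == 0.0:
        raise ValueError("no taxon is compatible with these results")
    return {t: lik / total for t, lik in likelihoods.items()}


def best_match(matrix: Matrix, results: Dict[str, bool],
               accept: float = 0.999) -> Optional[str]:
    """Return the top taxon only if its score reaches the (assumed) level."""
    scores = identification_scores(matrix, results)
    taxon = max(scores, key=scores.get)
    return taxon if scores[taxon] >= accept else None


MATRIX: Matrix = {  # hypothetical two-taxon, two-test matrix
    "Taxon A": {"t1": 0.99, "t2": 0.01},
    "Taxon B": {"t1": 0.01, "t2": 0.99},
}
```

A result pattern that fits Taxon A gives it a score of about 0.9999. A pattern that fits neither taxon better (both tests positive) splits the score 0.5/0.5, and the sketch then declines to identify.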

The probabilistic approach depends on the availability of good-quality phenotypic data in the form of a probability, or identification, matrix. Ideally, the probability that a taxon (species) exhibits an attribute is obtained from testing many isolates of that species. Developments in probabilistic identification can be divided into two kinds. Bacteriological developments involve new tests and techniques that enable the creation of new or better identification matrices. Computational developments involve software to perform the management, manipulation and evaluation of data as well as the identification process itself. The probabilistic approach forms the basis of some commercial identification kits, although the exact algorithm used may not be published. In recent years there has been a lack of new identification matrices. This may be because microbiologists are pursuing new genotypic or phenotypic techniques, neither of which produces data suitable for the probabilistic method. Bacterial taxonomy, including identification, is now based on this polyphasic approach, which combines genotypic and phenotypic data. In most routine microbiology laboratories, however, identification is still based on the conventional tests that form the basis of probabilistic identification.
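Estimating matrix entries from tested isolates can be sketched as taking the proportion of positive strains for each taxon and test. The sketch also clamps values to [0.01, 0.99], so that one aberrant strain cannot drive a taxon's likelihood to zero. Bounds of this kind are a common convention in published matrices, but the exact values here are an assumption. The isolate data are hypothetical, and every strain of a taxon is assumed to have been given the same tests.

```python
from typing import Dict, List

Isolates = Dict[str, List[Dict[str, bool]]]  # taxon -> per-strain test results
Matrix = Dict[str, Dict[str, float]]         # taxon -> {test: P(positive)}


def estimate_matrix(isolates: Isolates, lo: float = 0.01, hi: float = 0.99) -> Matrix:
    """Proportion of positive strains per test, clamped to [lo, hi]."""
    matrix: Matrix = {}
    for taxon, strains in isolates.items():
        tests = strains[0].keys()  # assumes all strains received the same tests
        matrix[taxon] = {}
        for t in tests:
            positives = sum(strain[t] for strain in strains)
            p = positives / len(strains)
            matrix[taxon][t] = min(hi, max(lo, p))
    return matrix


ISOLATES: Isolates = {  # hypothetical results for four strains of one taxon
    "Taxon A": [
        {"t1": True, "t2": False},
        {"t1": True, "t2": True},
        {"t1": True, "t2": False},
        {"t1": True, "t2": False},
    ],
}
```

With these data all four strains are positive for `t1`, which is stored as 0.99 rather than 1.0. One strain in four is positive for `t2`, which gives 0.25.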

A general structure for biological databases
J Diederich*, R Fortuner**, J Milton*
* Department of Mathematics, University of California, Davis CA 95616, USA.
** 4, rue des Jardins, Montendre, 17170 France (Correspondent of the Muséum National d'Histoire Naturelle, Paris).

We argue that morpho-anatomical characters are best stored in a general database as simple data. It is always easier to combine several simple pieces of data into a complex character, when one is needed, than to do the opposite, which may even be impossible in some cases.

A standardized decomposition is discussed, using the classical entity/property/value scheme. For each character, the entity is strictly defined as one of the existing biological structures, from the whole organism to systems, organs and tissues, down to individual cells, organelles and molecules. The property is taken from a short list of "basic properties" that are the same in most biological groups. The value is the traditional qualitative state or quantitative value, with some standardization made possible by the use of basic properties.
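The entity/property/value scheme maps naturally onto a small record type. In this sketch the structural hierarchy fragment, the list of basic properties and the class and function names are all hypothetical. The record refuses entities outside the known structures and properties outside the basic list, which is one way to enforce the standardization described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Hypothetical short list of basic properties (not the authors' list).
BASIC_PROPERTIES = {"presence", "number", "length", "width", "shape", "colour"}

# Hypothetical fragment of a structural hierarchy: child -> parent.
PART_OF: Dict[str, str] = {
    "shoot": "organism",
    "leaf": "shoot",
    "leaf blade": "leaf",
    "petiole": "leaf",
}


@dataclass(frozen=True)
class Character:
    """One simple character stored as an entity/property/value triple."""
    entity: str
    prop: str
    value: Union[str, float]
    unit: str = ""

    def __post_init__(self) -> None:
        if self.entity != "organism" and self.entity not in PART_OF:
            raise ValueError(f"unknown structure: {self.entity!r}")
        if self.prop not in BASIC_PROPERTIES:
            raise ValueError(f"not a basic property: {self.prop!r}")


def lineage(entity: str) -> List[str]:
    """Path from the whole organism down to the given structure."""
    path = [entity]
    while path[-1] in PART_OF:
        path.append(PART_OF[path[-1]])
    return list(reversed(path))


# A compound description, "leaf blade ovate, 6 cm long", decomposed:
blade_shape = Character("leaf blade", "shape", "ovate")
blade_length = Character("leaf blade", "length", 6.0, "cm")
```

Keeping the entity tied to a hierarchy lets the same structure names be reused for non-morphological data, with their own property lists.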

Other, non-morpho-anatomical, types of data could be decomposed in a similar manner, using the same entities with their own sets of basic properties. This would be naturally conducive to the definition of relations between different types of data.

Characters that are decomposed into elementary parts can be used as is by specifically designed applications. In other situations, a database management system (DBMS) should provide sufficient support to use or view the data in more complex or different ways. Some of this support can be provided through generic DBMS capabilities, which are intended for a wide variety of users and are not directed at supporting biological applications per se.
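A relational view is one such generic capability. The sketch below uses Python's built-in `sqlite3` module to store simple entity/property/value records. It defines a view that recombines two of them into a derived character, the length/width ratio of the leaf blade. The table layout, view name and data are hypothetical, and the view assumes both measurements share the same unit.

```python
import sqlite3
from typing import List, Tuple


def build_db() -> sqlite3.Connection:
    """In-memory database of simple characters plus one derived-character view."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE observation (
            taxon TEXT, entity TEXT, property TEXT, value REAL, unit TEXT
        );
        -- Derived character built from two simple ones (assumes equal units).
        CREATE VIEW blade_ratio AS
            SELECT l.taxon, l.value / w.value AS length_width_ratio
            FROM observation AS l
            JOIN observation AS w
              ON l.taxon = w.taxon AND l.entity = w.entity
            WHERE l.entity = 'leaf blade'
              AND l.property = 'length' AND w.property = 'width';
    """)
    con.executemany(
        "INSERT INTO observation VALUES (?, ?, ?, ?, ?)",
        [  # hypothetical data; Taxon C lacks a width, so it has no ratio
            ("Taxon A", "leaf blade", "length", 6.0, "cm"),
            ("Taxon A", "leaf blade", "width", 3.0, "cm"),
            ("Taxon B", "leaf blade", "length", 4.0, "cm"),
            ("Taxon B", "leaf blade", "width", 4.0, "cm"),
            ("Taxon C", "leaf blade", "length", 5.0, "cm"),
        ],
    )
    return con


def ratios(con: sqlite3.Connection) -> List[Tuple[str, float]]:
    """Read the derived character through the view."""
    return con.execute(
        "SELECT taxon, length_width_ratio FROM blade_ratio ORDER BY taxon"
    ).fetchall()
```

Because the derived character is computed by the view rather than stored, correcting a simple length or width value automatically updates the ratio.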

An important requirement for biological databases is the ability to relate simple characters to more complex formulations, or to coded formulations. Here we address some of these questions and look at concepts that we have defined, or that still need to be studied, to support existing applications such as DELTA-based applications and cladistic programs. In particular, we examine the requirements of biological applications in order to determine what a biological database management system, a BioDBMS, should be.