What is our error rate?

The Protea Atlas Project has been accused by a prominent taxonomist of having a 20% error rate. However, this figure is just pie in the sky. What are more realistic estimates of our error rates?

Data capture errors: It must be noted that at this stage the electronic data have not yet been checked against the paper Sight Record Sheets (SRS), so some errors may still exist. However, we suspect these to be very low in the current database - about 1 per 100 SRS for the Protea and Habitat boxes and between 1 and 5 per 100 SRS for the Locality box. This low rate is due to the integrity of the codes, with the only commonly shared code being "N" (for nothing, none). Such errors as do occur will be confined to population, phenology and height codes, and not to species. Similarly, Locality errors in degrees of Latitude or Longitude should be very low (1 per 8 000 SRS), and in minutes fairly low (1 per 1 000 SRS), although the error in decimal minutes may be higher. More details are provided under Co-ordinates below.

Atlasser errors: Although atlassers may make many errors, most of these will be picked up by the data checking programmes. The exceptions are "allocation errors" and computing errors. We cannot detect when atlassers have used the wrong code (an allocation error). By far the most serious of these are identification errors. But we can check on computing errors. Estimates here are less accurate than above, being based on what I remember having processed. If a numerist is interested in tallying up the corrections, their types and their rates, we should be most grateful.

Computing errors - Co-ordinates and Altitude: In addition, the checking programme CAPTURE locates each site to a broad biogeographical zone and reports any new or "out of range" species records. For the Cape Floral Region, it reports new records (to the atlas, and to the atlasser) to a 12 x 12 km grid cell (see species errors, below).
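The grid-cell check described above can be illustrated with a minimal sketch. This is not the CAPTURE programme itself, whose internals are not described here: the function names, the flat-earth conversion from degrees to kilometres, the reference latitude and the use of full species names as keys are all assumptions made for illustration. It also covers only records new to the atlas, not records new to a particular atlasser.

```python
import math
from collections import defaultdict

CELL_KM = 12.0          # grid cell size quoted in the text
KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude (assumption)
REF_LAT = -34.0         # assumed reference latitude used to scale longitude


def grid_cell(lat, lon):
    """Map decimal-degree co-ordinates to an approximate (row, col) 12 km cell."""
    north_km = lat * KM_PER_DEGREE
    east_km = lon * KM_PER_DEGREE * math.cos(math.radians(REF_LAT))
    return (math.floor(north_km / CELL_KM), math.floor(east_km / CELL_KM))


def flag_new_records(known, incoming):
    """Return incoming (species, lat, lon) records whose cell is new for that species.

    `known` is a defaultdict(set) mapping species -> cells already recorded.
    It is updated in place, so a second record from the same new cell
    is not flagged again.
    """
    flagged = []
    for species, lat, lon in incoming:
        cell = grid_cell(lat, lon)
        if cell not in known[species]:
            flagged.append((species, lat, lon, cell))
            known[species].add(cell)
    return flagged


# Hypothetical example: one existing record and two incoming records.
known = defaultdict(set)
known["Protea neriifolia"].add(grid_cell(-33.90, 18.45))
incoming = [
    ("Protea neriifolia", -33.92, 18.46),  # falls in the cell already recorded
    ("Protea neriifolia", -33.00, 19.50),  # falls in a cell with no earlier record
]
for record in flag_new_records(known, incoming):
    print("check:", record)
```

In this example only the second incoming record is reported, since the first falls in a cell that already holds the species. A flag of this kind says nothing about whether the co-ordinates or the identification is at fault; it only marks the record for a closer look.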
New and out-of-range records are thus automatically flagged for attention by the atlasser and, if required, the co-ordinate checking team. Similarly, during routine checks, the generation of species lists for specific areas and nature reserves (at the request of atlassers intending to visit them) and the listing of species localities (at the request of atlassers wanting to see certain species in an area), any odd records are first flagged for co-ordinate checks before possible allocation errors are checked.

Detected (and corrected) locality errors are about:

The current co-ordinate error in the database is estimated at about 1 per 300, most of these in recently received (and thus still to be checked) data.

Allocation errors - Species: In order to resolve some of these possible errors, we "empower" atlassers (who request species lists for areas they intend visiting) with comprehensive lists of species and localities which require checking. We have also sent letters to atlassers with suspect data, asking them for more details. Herbarium specimens are requested for noteworthy localities, gap fillers and range extensions. This is ongoing.

The major problems are:

Subspecies codes:

Planted species:

Substitution codes:

Substitution species - problem species: The following species routinely present identification problems (from worst to not so bad):

Protea neriifolia and hybrids of the White Water Proteas are widely planted within the distribution ranges of related species and require much effort (usually revisits to the sites) to distinguish between them. None of this is helped by the fact that many of the more experienced atlassers routinely identify species outside of flowering and fruiting times.

A further problem is the identification of atypical specimens. Thus Ld uliginosum uliginosum with larger leaves in the Kouga has been identified as Ld loeriense. A previously unrecorded small-flowered form of Pr scorzonerifolia was identified as Pr piscina (PAN 27.7). Pr laurifolia and Pr neriifolia seem to intergrade where they co-occur, although most (99%) of the time there is no problem distinguishing between the two species. The degree to which atlas data will modify species concepts in the genera is at present unclear, but many problems experienced by amateurs are real and not merely careless mistakes.

Furthermore, some identification problems are seasonal. Out-of-range records of Ls conocarpodendron conocarpodendron are only noted during the period of new leaf growth, when subspecies viridum produces silvery-haired leaves.

Some identification problems can be traced to a particular source. Thus Botanical Society members of Professor Jackson's A-team routinely atlassed Se glomerata as Se hirsuta. Neither species is illustrated in Mary Maytham Kidd's Cape Peninsula guide, and presumably the error originated in the A-team. This error now crops up only occasionally, during the period of new growth when flowerheads are absent.

Statistics on incorrect identifications are difficult to obtain. Many problems are resolved before the Sight Record Sheets are submitted, when atlassers bring in "ecoscraps" for identification. Obvious errors are often resolved before data are captured, simply by pointing out differences between confusing species and requesting confirmation.
By carefully vetting the first 20-50 SRS sent in by any atlasser, the major identification problems are easily forestalled. About 1 in 400 species records have been changed since capture, mostly as a result of atlassers collecting additional data or having their specimens verified by professionals. Queries on over 400 species records (1 in 200) have been sent to atlassers, but many of these have been verified or confirmed.

Correcting errors and possible errors, and verifying new records, is an ongoing process. We hope that atlassers are improving all the time, and that this is reflected in the quality of our data.

Tony Rebelo

PAN 30