Use of Satellite and Field Information in a Forest Damage Survey of Eastern Finnish Lapland in 1993

The study area consists of the Finnish part of a Landsat 5 TM image from 1990. Three independent field samples were measured during 1991–93 in the study area. The first sample was used to compile training areas for supervised maximum likelihood classification of the image. Classification accuracy was studied in the second sample. The spectral separability of the forest strata usable in practical forestry was poor. The extent of the damage area was estimated by the principle of stratified sampling. The estimate included considerable bias because the field sample had not been objectively selected from the image classes. The third field sample was measured as part of the National Forest Inventory of Finland. It is wholly objective, and about ten times larger than the two earlier field samples. The poor spectral separability of the forest strata was confirmed by the NFI sample. However, this sample could be used in stratified sampling with little or no bias in the estimation of the damage area estimate. 14 different damage types were separated according to specific damaging agent. A thematic map was produced which presents the spatial distribution of two damage-rich image classes. The study area comprises 18 300 sq.km, of which 38 % were damaged. At first sight it would appear that the proportion of damaged forest has tripled in ten years. However, this is not the case because now special attention was paid to forest health in the field work. Despite this, it is possible that some damage caused by unfavourable climatic phenomena in the ’80s was still perceptible in 1993. No damage caused directly by air pollution has yet been verified in the study area.


Introduction
Unusual outbreaks of forest damage occurred in Finnish Lapland in the 1980s. Special climatic phenomena were shown to have caused some part of the damage (Jalkanen 1990, Ritari 1990. Despite this, uncertainty about the role of airborne pollutants was not dispelled. Increasing awareness of the huge emissions from the nearby Russian copper-nickel smelters in the Kola Peninsula raised suspicions that these emissions could be affecting forest health in eastern Finnish Lapland. A multidisciplinary research project was started in 1989 to clarify the situation. The project has been described in an interim report (Kauhanen and Varmola 1992) and a summary report of the final results has been published (Tikkanen and Niemelä 1995).
Satellite imagery was used in the Lapland Forest Damage Project to survey damage in eastern Lapland and the Kola Peninsula. It soon became clear that the two areas are rather different as regards the potential usability of satellite imagery (Mattila 1992). In the Kola Peninsula there are large, easily detectable damage areas; this is not the case in eastern Lapland. In eastern Lapland it was possible to gather field data for training and verification but not in the Kola Peninsula. The study was therefore divided into two parts: a) Kola Peninsula: spatial and temporal illustration of the worst damage areas b) Eastern Lapland: quantitative description of the actual forest health situation at the beginning of the 1990s and spatial illustration of the damagerich areas Methodological aspects were important in both cases. The actual study areas were delineated on the basis of the usable satellite images available. The study areas did not cover the whole of the Kola Peninsula and eastern Lapland, but are typical examples. The results from the Kola Peninsula have been presented in the papers of Mikkola and Ritari (1992) and Mikkola (1996a,b). This paper is limited to the study carried out in eastern Lapland. A literature review is first given to help understand the final goals set and methods used in this part of the study. Satellite data were used in forest damage sur-veys shortly after the launch of ERTS-1, the first satellite in the Landsat series. Gimbarzevsky (1984) has reviewed remote sensing applications in forest damage surveys up until 1983.The results give the overall impression that the old satellite data were not very suitable for forest damage surveys. Usually only two or three damage classes could be separated, which was probably not enough for most practical purposes. Recent use of Landsat MSS material is often associated with coarse change analyses with multi-date imagery. Landsat 4 (launched in 1982) and SPOT 1 (1986) marked the start of a new era of satellite data. The characteristics of the new spectral information are discussed in the papers of Begni (1982), Markham and Barker (1985) and Chavez and Bowell (1988). A considerable effort has been devoted to studying the use of the new satellite data in forest damage surveys. For example, at least four dissertations on the item have been published in Europe (Förster 1987, Häme 1991, Ekstrand 1993and Olsson 1994. Many studies with the new data were conducted already in the late 1980s (e.g. Defeo et al. 1988, Hopkins et al. 1988, Westman and Price 1988, Franklin 1989and Vogelmann and Rock 1988. The early experiments revealed that the new data did not improve damage detection as much as was expected. Some studies from the '90s are discussed in more detail in the following. Chamignon and Maniere (1990) used SPOT data and GIS to map insect damage on larch in France. Damage could only be detected in homogeneous stands using the normalized difference vegetation index. Godard et al. (1990) were able to map spruce budworm defoliation by Landsat TM data, but not by SPOT data in Canada. Joria et al. (1991) compared SPOT and TM data in mapping three levels of gypsy moth defoliation in Michigan. A GIS was utilized in the analysis and the outcome was improved by using a forest/non-forest mask. TM data gave a better separation between the three defoliation classes. Brockhaus et al. (1992) compared SPOT and TM data in modelling defoliation in North Carolina. The near infrared channel of both satellites correlated significantly with per cent defoliation. However, the authors concluded that satellite information alone is not sufficient. Hudak et al. (1993) separated three defoliation classes of balsam fir in Canada using multidate SPOT imagery. Muchoney and Haack (1994) also used multidate SPOT data to monitor defoliation due to gypsy moth in Virginia. They compared four different change detection methods. Principal components analysis and image differencing were the best techniques in mapping two defoliation categories at two successive dates.
Landsat TM imagery was used in the following studies. Ekstrand (1990) used digital stand data as an ancillary tool to detect moderate defoliation of Norway spruce in Sweden. Vogelmann (1990) compared two vegetation indices in mapping the damage at three sites. Only two damage classes could be mapped in conifer forests, but the normalized difference vegetation index separated three classes in deciduous forests. Ahern et al. (1991) also used vegetation indices to detect reduced growth due to spruce budworm in Canada. The indices incorporated the near-infrared and shortwave infrared reflectances. The correlation between the indices and net annual volume change was more than 0.8. Nakane and Kimura (1992) observed that the simple ratio index (NIR/ Red) decreased with increasing pine blight damage on Japanese red pine. In Canada, Hudak et al. (1993) used discriminant analysis to separate defoliation classes of balsam fir on the basis of the spectral channels. The near and middle infrared channels were best in this context. The mean accuracy of the three defoliation classes was 79 %. The spectral signal of defoliation is usually obscured by other stand characteristics. In Sweden, for example, a hardwood mixture of 20 % cancelled out the effect of 20 % spruce defoliation (Ekstrand 1994). In theory this problem can be mitigated by calibrating the spectral values on the basis of digitized stand data. However, this is probably not an operational approach in practice.
The studies mentioned in the preceding review have some common features. The targets under investigation were distinct homogeneous stands with a fairly closed canopy. Some information about the site, topography and stand structure was often used as ancillary data. Quality of the original satellite imagery was above average. In addition, radiometric corrections were made to remove the noise induced by the atmosphere and topography. These measures are used to refine the spectral information to give better damage detection. However, the separation success has been 3-4 damage levels, at best.
The experimental results cannot be generalized to practice as such. Practical cases often involve heterogeneous forest areas without relevant ancillary data. Due to illumination and clouds the available satellite imagery is rarely optimal for solving the actual problem. The data needed for radiometric corrections are hard to acquire or do not exist. Thus the results will be poorer in operational applications than in experiments. This fact has often been omitted in planning a survey, which has caused disappointments due to the overestimation of the usability of satellite imagery.
The above conclusions are valid concerning detection of 'normal' forest diseases. Land use changes and severe physical damage to forest vegetation have been successfully mapped using satellite imagery. Gillis et al. (1990) mapped forest damage caused by hailstones. Kuntz and Zimmermann (1993) surveyed storm damage. Coal mining areas have been mapped e.g. by Meer Mohr et al. (1992) and Chatterjee et al. (1994). Rathore and Wright (1993) reviewed remote sensing studies on the total environmental effect of surface coal mining. The effects of industrial air pollution on vegetation cover have been mapped for example by Tömmervik et al. (1992) and Mikkola and Ritari (1992). Both studies concern the Russian ore smelters in the Kola Peninsula, NW Russia. Open cut areas have been mapped with success especially in Canada (Rencz 1985, Hall et al. 1989. The general limits of satellite imagery in forest damage survey appeared in the literature review. On the other hand, the actual usability depends on case specific circumstances, too. The forests in Finnish Lapland are difficult to survey on the basis of satellite spectral information (Mattila 1992). Because crown cover is generally fairly low, its spectral signal is confused by the understorey vegetation. Tree stand structure is uneven both horisontally and vertically, which creates disturbing shadows in the canopy and on the ground. The generally low sun angle accentuates the shadow problem. All these factors limit the usability of satellite imagery especially in damage surveys (see Koch et al. 1990). A further limiting factor is that usable images are obtained only once every two or three years in Finnish Lapland.
Pollution-induced deforestation or erosion that is clearly visible in satellite imagery does not occur in Finnish Lapland. Forest diseases caused mainly by insects and fungi are relatively slight and areally dispersed. The spectral signature of a known forest damage area can provide clues about the damage even in the forest conditions of Finnish Lapland. However, the signs are so faint and site specific that using them for detecting damage areas elsewhere does not work in practice. Omission and commission errors will occur to such an extent that the resulting map is of little use. To avoid erroneous conclusions the map must be checked in the field and this is a costly task.
It became clear that an operational survey of forest damage directly from satellite imagery was not possible in Finnish Lapland. The conclusion was also supported by the fact that the material most approriate for the purpose, Landsat 5 TM, was already somewhat degraded (Olsson 1994). Better material was not expected to become available in the near future because the Landsat 6 satellite was lost during launching in autumn 1993.
The main goal of this study is to estimate the area affected by certain types of forest damage in the study area. Forest damage types can only be observed in the field. Area estimates can be obtained by combining the satellite and field information according to the principle of stratified sampling. The precision of the estimate depends on the strength of direct or indirect association between forest damage and the remotely sensed variables used for stratification. So the central task in this study is to find image classes which differ from each other as much as possible with respect to the actual occurrence of damage in them.
One can define damage-prone forest strata by site quality, dominant tree species and age of the tree stand. On the condition that these strata can also be mapped on satellite image, damage-rich image classes can be defined and used efficiently in stratified sampling. Presenting the damagerich classes as a thematic map reveals spatial patterns which may facilitate causal analysis of the damage, too. The secondary goal of this study is to create such a thematic map.

The Satellite Image
The Landsat 5 satellite captured a nearly cloudfree TM-image on 31. 8.1990. The location parameters of the image are 189-13. The image center point is located near the border between Russia and Finland (Fig. 1). The Finnish part of the image covers 1.83 mill. ha in the municipalities of Salla, Savukoski, Pelkosenniemi, Kemijärvi and Kuusamo. The image covers only a small part of Kuusamo. The named coverage is called the 'study area' in this paper.
The timing of the image is not optimal due to the incipient vegetation senescense and the relatively low sun angle in late summer. However, the image was chosen for the study because it was the only cloud-free one available for a couple of years. No corrections for the atmosphere or topography were made because the necessary ancillary data were not available. The whole image was rectified to a UTM coordinate system.
The study area belongs to the boreal coniferous forest zone. Watercourses occupy some 4 % of the total area. Only a few per cent of the land area is used as residential areas, fields, roads etc. The rest is classified on the basis of tree growth potential either as forest land, scrub land, or waste land (see Kuusela et al. 1986, p. 17). In the study area, waste land mainly includes peatland which is naturally treeless or nearly treeless. Forested land (forest land + scrub land) accounts for about 85 % of the land area. This includes 5 percentage units of temporarily open, clear-cut areas. 70 % of the forested land area is located on mineral soil. Forest land occupies 67 % of the land area. Most of the forest land is on mineral soil.
The tree stands are rather heterogeneous, although even-aged management principles have been applied on forest land at least. The primary dominant tree species are Scots pine and Norway spruce, and to a minor extent downy birch (Betula pubescens). Mixed stands are common. All development classes ranging from clear cuts to overmature stands are present in the study area. Development class mixtures often occur within the same stand. The mean volume of the living trees is low, 45 m 3 /ha on forest land and only 12 m 3 /ha on scrub land. The low mean volumes are partly due to the heavy cuttings after World War II.
Several of the features listed above suggest that it is difficult to map the study area for forestry purposes on the basis of satellite spectral information. Most of the resulting spectral classes can probably not be used as the information classes needed in practical forestry. Confusion will probably occur between forest land and scrub land, between mineral soil and peatland, between development classes and between dominant tree species. These qualitative variables are all of vital importance in forest management planning in the study area.

The Field Samples
The study area was covered by a systematic field sample in the fall of 1991. The field work lasted for one and half months from late August to early November. The image date (31.8) of the previous year fell in the first half of the period. The sample consisted of 228 angle count plots at the same locations as in the previous National Forest Inventory in 1983. The old field locations were used in order to facilitate some checks at the stand level. Trees were tallied on the field plots without identification of the old tally trees.
The first field sample was to be used as training data for supervised classification. The reason for the systematic sampling was to catch the actual variation of the forest conditions present in the study area. The data served two other implicit goals, too. The sample gave an objective preliminary idea of the disputed forest health situation in eastern Lapland. No clear deterioration was observed. On the other hand, it was possible to check the consistency of classifications made at two points in time. Some confusion between forest land and scrub land, as well as between mineral soil and peatland, was apparent. The confusion was mainly due to borderline cases. The result was taken into account in compiling the final image classes for stratified sampling.
Another field sample was measured in the fall of 1992 to facilitate independent verification of the classified image. 228 plots were again selected from the sample of the previous National Forest Inventory for re-examination in the field. The following selection criteria were used. The study area was divided into 8 km by 16 km tracts. Four plots were first chosen systematically from each tract. This was done in order to ensure an even spatial distribution of the sample. The old inventory data of the plots were used in the final selection. All forested plots (plots either on forest land or scrub land) were automatically chosen. Non-forested plots were rejected if additional old forested field plots were present within the same tract. Forested plots were preferred in the selection in order to obtain a better sample of the more important forest strata. However, an effort was made to ensure that all non-forested strata were also represented in the verification sample. After selection, the plots were located in the field and those stand variables needed in defining the necessary forest strata were recorded on the plots.
The verification sample was also intended to serve the reconnaissance and checking goals mentioned above. The observations were similar to those obtained from the training sample.
The new National Forest Inventory was carried out in the study area in summer 1993. The systematic field sample consisted of 2700 field plots located independently of the previously mentioned samples. Owing to the large size this sample alone permits rather precise estimation of the forest health situation, and thus serves as a strong control for the other damage estimates.
Five of the stand variables observed in the field concerned damage (see Valtakunnan... 1993, p. 37-39). Three of them viz. appearance, causal agent, and degree of damage were used to define 14 types and two levels of damage in this study (section 5.1). Damage was also recorded in a sub-sample of the tally trees. It was neither necessary nor possible to use the tree-level damage data in this study.

The Basic Choices
Supervised maximum likelihood (ML) classification (Swain and Davis 1978) was chosen as the working tool in this study. This method has a strong theoretical basis and there is abundant experience of its use. However, there are some limitations which must be kept in mind when using this image classification method. The training areas should be homogeneous, include at least 30 pixels each, and the pixel values should be normally distributed. In general practice, all these requirements are not usually met in full, and the potential classification accuracy is not obtained. Proper field data must be available prior to classification, and in many cases this is the main limiting factor. Such data could be acquired and the ML methodology was chosen for this study.
Only half of the satellite image is in Finland. It was not possible to measure sufficient comparable field data on the Russian side for image classification and verification. The study was therefore restricted to the Finnish territory only. The whole image can be classified using the Finnish training areas, but the result cannot be verified in the field. More classification errors would probably occur in Russia owing to the rather different forest conditions there. For example, lichens are much more abundant in the ground vegetation on the Russian side and this may be reflected in satellite imagery.
Synthetic channels can be created by combining the original image channels. Earlier, data compression was the main reason for doing this.
The improved computer facilities nowadays available mean other aspects are also relevant. For example, some ratio transformations can reduce noise. It is a matter of definition whether synthetic channels can offer novel information or only reveal some intrinsic features of the data. Synthetic channels were used in this study.
ML classification allows the use of prior probabilities. They alter the probabilities with which the pixels are assigned to the spectral classes. In order to improve classification accuracy in this way, one must have sufficient and relevant data for calculating the prior probabilities. The field data from the 1993 National Forest Inventory were finally used for this purpose in the study.

Construction of the Synthetic Channels
Principal components analysis was proposed as a data compression method as early as in the 1930s (Hotelling 1933). When applied in multispectral image analysis the first two components are often associated with the brightness and greenness of the scene (for example Franklin 1986, Horler andAhern 1986). The meaning of the higher order components depends on the application. In multi-date Landsat MSS imagery they can reveal changes in the brightness and greenness (Ingebritsen and Lyon 1985). In a singledate Landsat TM image the third principal component is also informative. It can contain some information about the contrast between the visual and shortwave infrared spectral channels (Horler and Ahern 1986). The principal components can be calculated using either a covariance or a correlation matrix. According to Singh and Harrison (1985), the latter method gives better results. However, a covariance matrix is mostly used because of the limitations in image processing software. This was also the case in this study. The matrix was calculated from the center of the satellite image, omitting 2000 pixels on each side. The window was chosen visually so as to include approximately the same variation as the whole image. All seven channels were used in the calculation. The first three principal components included some 95 % of the scene variance. These compo-nents were then calculated for the whole image and added as three layers to the working image. Kauth and Thomas (1976) presented a fixed linear transformation for the compression of Landsat MSS data and called it Tasseled Cap (TC). It is a kind of soil-adjusted vegetation index. It is often confused with the proper principal components because the terms brightness and greenness are also used in this connection. Another similarity to the principal components is that the maximum number of TCs is equal to the number of image channels. Standard coefficients are usually used in the transformation. Crist and Cicone (1984) applied the concept to Landsat TM data. Crist and Kauth (1986) dealt with the topic and called the third TC of Landsat TM data, wetness. Cohen and Spies (1992) observed a clear correlation between the wetness and some stand parameters.
The PCs are scene-dependent, but they probably refer to rather similar features as the TCs when the same channels are used in the transformations. The TCs were therefore calculated without the thermal channel in this study. Standard coefficients from the literature were used in the calculation. The image center window was used only to scale the value range. The third TC was added as one layer to the working image. This choice was made because the thermal channel had a strong effect on the third PC already present in the working image.
Three different vegetation indices were further added to the working image. The indices are certain transformed normalized differences which were calculated using the equations where SQRT is the square root and the B terms refer to the TM channels. 0.5 was added to avoid negative values. The square root operation was carried out to mitigate extreme values and to check that all the values were positive after the addition. The image center window was again used to scale the index values.
The first index above was derived from the proper Normalized Difference Vegetation Index (NDVI). According to Cohen (1991), NDVI was introduced by Rouse et al. in 1973. The second index was derived from the proper Infrared Index (II) used by Hardisky et al. (1983). The ratio of B5 to B7 was successfully used as a vegetation index by Musick and Pelletier (1988). This indicates that these channels include different information. Thus the third index above is justified, too. The first index is mainly sensitive to quantitive vegetation characteristics, while the others are primarily sensitive to moisture conditions.
The final working image included seven synthetic channels. After scaling and stretching, the value range in each channel was 0-255. The scaling parameters were defined in the same image center window. The range to be stretched was chosen as ±2.5-3.0 standard deviations from the mean. Thus only about 1 % of the pixels should be forced to the range setting with a value of either 0 or 255. Each channel was inspected visually as a monochromatic image on the screen. The inspection gave the impression that each synthetic channel in the working image was imformative in its own way.

Compilation of the Training Areas
The exact locations of the plots measured for training purposes were marked on aerial photographs in the field. The locations were then transferred to the satellite image by visual comparison on the screen. It was estimated that the transfer could be done with an accuracy of some 10-20 meters. The mean displacement would have been more than 50 meters without the visual control. The coordinates of the 'correct' locations of the field plots on the satellite image were loaded into a digital file. This file was subsequently used for automatic positioning of the plots for the image processing system (ERDAS).
The training areas were compiled in several phases. First, a small group of spectrally homogeneous and spatially contiguous pixels was formed at each field plot location. The limiting parameters were defined plot by plot so that the group size remained within 3-9 pixels. This was done to ensure that the groups were relatively homogeneous in the field, too. After the proper limits were found each group was named, and its spatial borders and spectral signature were saved in a digital file for subsequent work phases. These pixel groups were called seed areas according to the SEED program used in compiling them.
The seed areas must be combined in a valid way in order to get training areas. The final training areas are therefore not spatially contiguous. The purpose of using this approach was to catch the real variation occurring in the field. Field data were first used as the principal criterion in combining the seed areas. Statistical measures were also used when the number of pixels reached 10-15. The tools chosen were Transformed Divergence and Jeffries-Matusita Divergence (see Thomas et al. 1987) because the image was classified by the ML classifier. The best divergence values were obtained with all seven channels combined. However, the divergence measures could not be used as the only decision rule in any of the phases. Combination frequently required a compromise between the divergence and field data. Because the most important field strata were not allowed to mix, this inevitably increased the spectral variation within the spectral strata.
The 228 seed areas were combined to form 30 training areas. These are from 18 separate field strata, because more than one training area was formed in some of the large field strata. The minimum number of pixels per training area was set at 30; this was not reached in 12 cases. These training areas were supplemented with additional pixels that were located in the immediate vicinity of the original seed areas. A visual inspection was made to ensure that the additional pixels were chosen outside the original seed areas. After the additions, the location parameters and signatures of the training areas were loaded into digital files for later use in the image processing system.

Use of the Prior Probabilities
Prior probabilities, or so-called priors, can be used in ML classification to improve the results (Strahler 1980). The ML classifier is prone to give biased area estimates. This can be mitigated by using priors (Maselli et al. 1990(Maselli et al. , 1992. According to Mather (1985), the potential utility of priors depends on class separability. The gain can be significant when the classes are spectrally close, as is the case in this study. The priors affect the class assignment in borderline cases. The output distribution becomes closer to the prior distribution when the priors are used in the classification.
The field sample of the National Forest Inventory was used to estimate the priors in this study. Three factors facilitated the estimation. The study area could be delineated fairly accurately in the NFI sample. The relevant subsample was still relatively large. However, the most decisive factor was that the field strata of the training areas could also be defined unambiguously in the NFI sample. This was because the same field instructions had been used in both cases. Priors can be modified on the basis of the preceding classification (Strahler 1980). Manipulation of this sort was performed once in this study.

Factual Content of the Image Classes
The classifications were done on the basis of the synthetic channels. The output image classes were first studied using the verification field sample. The field sample of the National Forest Inventory was later used for the same purpose. As with the training sample, the plot coordinates of the verification sample were defined visually on the working image and saved into a digital file for the image processing system. After each classification, the assigned image class and the known true stratum were cross-tabulated at these locations. The process was automated to produce three confusion matrices immediately after classification. The matrices referred to three different levels of class aggregation. The 30 original image classes were aggregated on the basis of their labels into 18-, 11-and 6-class combinations. The three groupings consist of information classes which cover the whole study area with decreasing levels of detail. The checking system enabled efficient testing with different weighting and filtering alternatives, and thus promoted refinement of the final goals of this study.

Initial Classification
No priors were used in the classification and the output image was not filtered. The verification sample included 228 plots. Overall accuracies at the three aggregation levels were only 22 % (18 classes), 35 % (11 classes) and 36 % (6 classes).
The accuracy was so poor that the output image is useless in practical applications. Some explanatory features of the outcome are discussed here in more detail.
The cross-tabulations revealed that mineral soil and peatland strata were confused at all levels. Confusion was usual in two typical cases. Forest land on mineral soil with a sparse growing stock was poorly separated from scrub land peatland forest. Clear cut areas on mineral forest land were readily confused with scrub and waste land peatland. The problem is even more serious because it concerns two important variables of forestry (land class and sub-class) simultaneously. Worse still, the strata in question account for a considerable part of the study area. This confusion problem was no surprise because the strata are spectrally close to each other.
The peatland strata are not damage-prone. An experiment was made in which peatland was masked out, and the remaining area was classified without the peatland signatures. The mask had been digitized from a medium resolution map. It became apparent that the mask did not completely remove peatland, and the removed part included rather a lot of mineral soil. This was due to the resolution of the mask and to the fact that peatland is defined somewhat differently on maps than in the National Forest Inventory. The masking alternative was rejected because a better mask was not available. A terrain model, if it had been available, would probably not have resolved the problem either.
Dominant tree species and development class are also important variables in practical forestry. These variables were used in defining the forest strata on forest land. The cross-tabulations revealed a lot of confusion among the strata. The result was expected for many reasons. Dominant tree species is primarily determined on the basis of stand stem volume. Other tree species usually occur in the same stand and they also affect the canopy reflection. In addition, pine and spruce, which dominate on most of the forested land area, are spectrally fairly close to each other. Development classes form a continuum ranging from clear cuts to closed mature stands and there are often no distinct borders, which tends to increase confusion of this variable in remote sensing.
The margins of the cross-tabulation matrices refer to two distributions in the verification sample: a) into the true field strata, and b) into the image classes. Large discrepancies occurred between the distributions. Classification errors can sometimes cancel each other out, but this did not happen in this sub-population. As is known a priori, the ML classifier is prone to give poor area estimates.

Filtering and the Use of Prior Probabilities
The initial classification output was filtered using the majority rule. The proper window was first sought by experimentation. A square window with a side length of three or five pixels was found to be the optimal or near optimal solution. Increasing the window size did not improve the result. In some cases a side greater than five pixels even gave a worse result. Testing with circles led to similar conclusions. Oblong window forms gave either better or worse results depending on the orientation of the window. This phenomenon reflects the effect of the spatial pattern on the filtering success. A square is a robust choice and a window of three by three pixels was therefore used in this study. The priors used in the classifications were initially calculated from the field sample of the National Forest Inventory. The results are presented in Table 1 in the A columns.
Filtering clearly reduced the confusion. However, it did not remove the basic problems discussed above. Using the prior probabilities gave better results only after filtering. In point of fact, using the priors without filtering possibly impaired the result. Evidently the distribution of the verification sample deviates too much from the prior probabilities. Using the priors plus filtering clearly produced the best agreement. The values of coincidence improved in the best case by almost ten percentage points at all levels of aggregation.
The proportions of the six output image classes in the whole study area were compared with the corresponding priors. The sums of the differences (absolute values) were 23.4, 20.8, 17.9 and 19.7 percentage points in the same case order as in Table 1. Thus filtering increased the similarity of the distributions when the priors were not used, but the effect of filtering was reversed when the priors were used. The result may be an accident due to the small sample.
Another signature set was calculated from the original image channel values using the same location parameters for the training areas as with the synthetic channels. The location correspondence is not perfect because a few original seed areas and all the additional seed areas were omitted owing to software limitations. Classification of the original image showed trends rather similar to those presented above. Using priors and filtering the output image gave overall classification accuracies of 30.8, 40.7 and 41.2 % for the three levels of class aggregation. The synthetic channels were thus better after aggregation of the initial image classes. The distribution difference sum in the six-class aggregation was 22.1 when using the priors without filtering. Again, filtering increased the sum to 24.1. The sums are clearly greater than those obtained using the synthetic channels (17.9 and 19.7). Thus the synthetic channels were better in this context, too. Removing the thermal channel from the classifications did not alter these general trends.

Comparisons with the National Forest Inventory Data
Classification accuracy was also studied with the new NFI data. Visual location of the plots in the satellite image was no longer possible because aerial photographs had not been used in the field. Automatic location was therefore performed on the basis of the calculated plot coordinates. The results obtained using the synthetic channels are presented in Table 1 in the B columns.
The agreement shows the same general trends with the exception that using the priors reduces the confusion without filtering even. The NFI sample is much more representative than the verification sample. The priors cause both correct and incorrect class assignments, the latter evidently occuring less frequently in a representative sample. Due to the phenomenon, case 3 gives better accuracy in the NFI sample than in the verification sample. In the other cases accuracy was usually lower in the NFI sample. This is obviously due to the less accurate location data for the NFI plots in the satellite image.
The priors were manipulated to reach a better agreement between the original priors and the output image class distribution into the six classes. It was assumed that the correlation between the prior values and the proportions of the output image classes is positive and linear. Two new sets of priors were derived from the output image obtained using the synthetic channels and the original priors in the classification. The basis used in deriving the first new prior set was the unfiltered output image (case 3 in Table 1 and the PM1 cases in Table 2), while the filtered output image was used in deriving the second new prior set (case 4 in Table 1 and the PM2 cases in Table 2).
Without manipulation the difference sums were 17.9 before filtering and 19.7 after filtering. As expected, manipulation decreased the difference sums drastically ( Table 2). The effect of filtering now depends on the manipulation starting point. The sum is increased by filtering when the new priors were derived from the unfiltered output image, and vice versa. The coincidences are almost the same or slightly poorer than those obtained using the original priors. Further manipulations revealed that coincidence became the poorer, the more the new priors differed from the original ones. Increasing forcing by priors in classification thus causes more incorrect class asssignments.
The priors used in this study were calculated from the NFI field sample. Due to the large sample size these priors (PR) serve as a good estimate of the true field stratum distribution. After classification one can also estimate the distribution using the stratified sampling estimator (STR). The proportions of the six field strata (p j , j = 1-6) were calculated as where p i refers to the proportion of image class i and p ij refers to the proportion of field stratum j in image class i. The sum of the absolute differences between the two estimates (PR-STR) should be small. The difference sums presented in Table 2 were calculated from the differences between the priors and the output image class distribution (PR-I). In the following the output image class distribution is compared with the other estimate of the true field stratum distribution (STR-I). If both PR and STR are good estimates, the corresponding values of PR-I and STR-I should be fairly close to each other. The three kinds of difference sums are presented in Table 3. The PR and STR distributions are very similar. Both using priors in the classification and filtering the output image increase PR-STR. In absolute terms, however, the increase in the difference sum may be insignificant. PR-I and STR-I differ only slightly, and the differences are inconsistent to at least some extent. The results are as expected which indicates that, despite the poor classification accuracy, one can use stratified sampling in this context without a risk of a great systematic error. Sampling error will be discussed in more detail in the following section.

Estimation of the Total Damage Area
The new National Forest Inventory sample measured in 1993 in the study area is more than ten times larger than the verification sample. Thus, in addition to the priors, it can be used for estimating the occurrence of damage in the forest strata. In point of fact, the other field samples are too small for this purpose. For the sake of comparison the total damage area was estimated in three different ways. First, the NFI sample alone was used to obtain the basic estimate. The other estimates were based on the principle of stratified sampling with six strata. The strata weights were obtained from the output image class distribution after filtering. The classification was carried out with the synthetic channels using the original priors. The stratified approaches differ in the way the factual content of the image classes was estimated as follows: a) The verification sample is used to estimate the field stratum distributions by image class. The damage proportions in the field strata are estimated from the NFI sample. It must be assumed that the damage proportion by field stratum does not depend on the image class. Thus the same damage proportions by field stratum are applied in all image classes. b) The NFI sample is used in estimating both the field stratum distributions by image class and the damage proportions by field stratum. In other words, the damage proportions are estimated directly from the NFI sample by image class.
Expressed in the form of equations, the two latter estimates are calculated as where i and j refer to image class and field stratum, respectively, and A i = area of image class i p ij = proportion of field stratum j in image class i pd j = damage proportion in field stratum j pd i = damage proportion in image class i In principle, all three approaches lead to unbiased estimates of the damage area if the field samples are random. The whole NFI sample and the main part of the verification sample are sys- tematic. In theory, a systematic sample leads to biased estimates if the sampling pattern coincides to any extent with the spatial pattern of the population. In practical applications of systematic sampling, however, significant biases have rarely if ever been verified. Thus the main concern is usually the statistical precision of the estimates. Precision estimates calculated by the random sampling estimators are generally conservative in the context of systematic sampling. There is therefore justification to state that the true accuracy of the result is, very probably, at least the same as or better than the precision measure indicates. One part of the verification sample was chosen subjectively in order to obtain more observations of certain important forest strata. It is namely this sub-sample, and not the systematic part, that can lead to biases in stratified sampling. There is no problem when one asks how certain field strata are classified into the image classes. In stratified sampling, however, the question is reversed.
The total area of the study area is 18 317 sq.km. The estimates of the field stratum distribution and the damage proportions by stratum in the NFI sample are presented in Table 4 in the C columns. The estimated total damage area is 6890 sq.km. The absolute and relative standard errors of the estimated total damage proportion (37.6 %) are 0.94 % and 2.49 %, respectively. Expressed as area the standard error is 171 sq.km.
The results concerning stratified sampling with the verification sample (option a) are given in Table 4 in the A columns. The first column refers to the field stratum distribution estimated by stratified sampling. The second column reveals the damage proportions in the image classes. They are, of course, not the same as those in the NFI sample because each image class consists of several field strata due to the classification errors. The estimated total damage area is now 8600 sq.km which is considerably more than that obtained on the basis of the NFI sample alone. The reason for the deviation is clear: the areas of the field strata with a high damage proportion (3 and 4) have been estimated larger than they are in the NFI sample. The absolute and relative standard errors of the estimated total damage proportion are now 3.4 % and 7.2 % respectively. The areal standard error is 618 sq.km.
The two damage estimates are independent because the field samples are independent. Thus the variance of the difference is calculated as the sum of the variances of the estimates. The stand- Table 4. The output image class distribution, and the stratum and damage proportions estimated by either stratified sampling (A and B) or simple field sampling (C). The verification sample was used in A, and the field sample of the National Forest Inventory was used in B and C. The image class weights used in A and B were obtained from an image classification described at the beginning of chapter 5. ard error of the difference is 641 sq.km. The difference is 1710 sq.km and is therefore statistically very significant. The outcome cannot be explained by the sampling errors alone. Stratified sampling with the NFI sample (option b) gave results which are presented in Table  4 in the B columns. The damage area estimate is now 7000 sq.km. The absolute and relative standard errors of the damage proportion estimate are 0.94 % and 2.47 %. The areal standard error is 173 sq.km. The damage area estimate deviates by 110 sq.km from the estimate calculated from the NFI sample alone. This is well below the corresponding areal standard errors. The precisions of these two estimates are almost the same. There are two reasons why stratified sampling does not give a more precise estimate. First, the NFI sample is so large that the precision obtained with it is hard to overcome. Secondly, the image classes do not differ very much as regards the damage proportion and, this being the case, stratified sampling is not an efficient estimation method.
The use of the verification sample in stratified sampling (option b) leads to a biased estimate of the damage area. The explanation is as follows: The subjective part of the sample was directed at the most important strata, mainly 3 and 4, in order to study classification accuracy more thoroughly in these strata. The poor classification accuracy first led to a clear overestimation of the strata shares. By accident, the damage proportion in these field strata was higher than the average, resulting in a serious overestimation of the total damage area. Without the bias the accuracy would perhaps have been satisfactory, as can be concluded from the estimated relative standard error (7.2 %).

Forest Health
The whole procedure used to obtain the area results was, in short, as follows: The synthetic working image (section 3.2) was classified into 30 classes using the ML classifier. A set of manipulated prior probabilities (PM2, section 4.3) was used in the classification. The output image was filtered using the majority rule and a 3 by 3 pixels square window (section 4.2). The original image classes were aggregated into six image strata. The damage proportions by image stratum were estimated from the National Forest Inventory field sample. Finally, the damage area estimates in the study area were calculated using the stratified sampling estimator (option b, section 4.4).

Area of the Damage
The estimated areas of 14 different types of forest damage are presented in Table 5. Each damage type is divided into two severity classes. Slight damage has not impaired forest quality or changed the stand development class. All dam-age classes worse than slight belong to the class noticeable. The scope of the latter class is large. It includes all stands involving damage ranging from decreased quality to total destruction of the stand.
The damage classification has been made only on forest land in the field. Due to the estimation method, the damage percentages presented in Table 5 refer to the total acreage of the study area, including watercourses. The proportion of forest land in the study area is 64.89 % as estimated from the NFI sample. Thus the presented percentages must be multiplied by 1.54 in order to obtain the corresponding shares on forest land.
It is not the purpose of this study to analyse the result in more detail. However, two features deserve immediate attention. The cause of damage is unknown in almost 30 % of the damage area. The two major causes of named damage are rot fungi and multiple damage. The former may indicate that special attention has been paid to the damage item in the field. The latter indicates that the silvicultural condition is not very good in the study area. There are many possible reasons for this condition, including reduced cutting activities due to conservation.

Thematic Map
There are two image classes where the damage proportion is clearly above the average (classes 3 and 4). These classes are labeled as mature pine-dominated forest and mature spruce-dominated forest in accordance with the training areas. The spatial distributions of the two image classes in the study area can be seen in Fig. 2. The yellow colour refers to the pine class where the damage proportion is 52.1 %. The spruce class is coloured red and the damage proportion there is 58.7 %. The damage percentages in the corresponding field strata were 81.7 and 94.3. The dashed lines in the thematic map are municipality borders. They have been added to the map to make it spatially more informative.
No profound analysis of the produced thematic map is performed here. Let it be noted, however, that the colours form patterns that are in accordance with the known forest characteristics in the study area. One must keep in mind the limitations of the map. First, the factual content of the image classes is not nearly the same as that indicated by the class labels. Second, the colours also cover a lot of healthy forest. The two image classes on the thematic map account for 17 % of the study area and they include only 25 % of the total damage area.
Small-scale thematic maps based on satellite imagery are abstractions and generalizations of spatial phenomena. As such they do not depict the strict details observable in the field or on the original output image. In many cases image details are lost in the printing phase. For example, the thematic map in Fig. 2 includes only about 1 % of the original output image pixels. This kind of reduction may not be harmful because a more synoptic view can reveal new larger patterns which are relevant to the problem under investigation.

Discussion
Timely and synoptic study knowledge of forest health in Finnish Lapland was needed without delay in the late '80s. Several investigations were started spontanieously which were later combined into the Lapland Forest Damage Project. This may not have resulted in the best possible outcome. In one sub-project the damge situation was studied using satellite imagery. The method was chosen mainly on the basis of the large area involved and the poor accessibility in West Kola particularly. Difficulties in using satellite imagery for the purpose in East Lapland were anticipated for several reasons (Mattila 1992).
Earlier experiences gained elsewhere in the use of satellite imagery in forest damage surveys were not very promising. The conditions in East Lapland are unfavourable, as can be concluded from the study results of Koch et al. (1990). The scarcity of usable images also hindered the use of techniques based on multi-date imagery (see eg. Häme 1991). In addition, the date of the only usable image was far from optimum during the growing season. The effect of the late date can be seen in a panorama of four images (Mikkola 1996b, p. 3687). It later appeared that the satellite sensor (TM) condition impaired the quality of the material, too (Olsson 1994).
It became clear that direct mapping of forest damage was not possible in East Lapland. Instead, it was decided to try to map the forest strata where the incidence of damage was high. Correction of the original image due to atmosphere and topography was not possible. Some transformations which combine information from two or more channels can correct to some extent. Thus so-called synthetic channels were constructed and used in the classification. The synthetic channels gave slightly better results than the original ones. The advantage seemed to be the greater, the more the original image classes were aggregated. This result is interesting, and it de-serves more attention in other contexs. Of course, pure chance may have caused the phenomenon in question in this study.
The thermal TM channel is rarely used in vegetation surveys due to its smaller spatial resolution compared to the other channels (120 m versus 30 m). It was used in this study after artificially dividing each pixel of the channel into 16 parts to harmonize the resolution. In comparisons it appeared that the use of the thermal channel gave no clear advantage in East Lapland. In West Kola, however, the thermal channel made the worst damage areas more visible (Mikkola 1996b).
These experiences justify the conclusion that, in normal forest conditions, it is a robust solution to use the original six reflective channels in image classification. Use of synthetic channels and the thermal channel is not efficient because it requires much additional work and the gain is little, at the best. Instead, if relevant data is available, one should consider corrections to diminish the noise due to topography and atmosphere.
A systematic field sample was used for training purposes. It appeared that forming training areas from a systematic field sample is quite a complex task. The original plots had to be combined in many phases to obtain usable training areas. In hindsight, it is apparent that a better procedure would have been to select subjectively certain homogeneous training areas from a few field strata. The problem is that these strata are not known a priori with any certainty. Creation of proper signatures will always be a rather iterative process.
The classification accuracy was first studied using a small, separate field sample gathered for verification purposes. Part of the sample had been chosen subjectively in order to obtain more observations from the important forest strata. It appeared that some sort of objective, unequal inclusion probability sampling could have been a better choice to avoid the problems later caused by the subjective part of the sample.
Classification accuracy was poor, as would be expected a priori. When 18 image classes were used, coincidence between the classes and the true field strata was only 22 %. The poor classification accuracy was confirmed by using the far more representative sample of the National Forest In-ventory (NFI) in the comparisons, too. The main reason for this outcome is that the field strata were defined to meet the needs of practical forestry. These strata are spectrally not well separable.
Proper use of priors in classification improved the coincidence. It became evident that poor priors can even impair the result. The coincidence was also increased by filtering the output image by the majority rule. The effect of filtering depended on the form and size of the filtering window. Joint use of the priors and filtering increased the coincidence drastically. Despite these measures, however, the output image as such was not usable for any practical application.
It was necessary to aggregate the original spectral classes in order to increase the classification accuracy. In testing, the accuracy statistics were calculated simultaneously at three aggregation levels by dividing the output image into 18, 11 or 6 classes. The best results were obtained with the coarsest division. However, the improvement over the medium coarse division was fairly small. Taking into consideration the unfavourable preconditions mentioned above, the achieved coincidence of about 45 % is probably near the potential maximum.
The NFI field sample was used to study the occurrence of damage in the image classes. The damage frequency was clearly above the average in two of the six image classes. The training areas of the two image classes had been measured in mature pine-and spruce-dominated stands. On the basis of the observations it was decided to utilize the output image in two ways. Firstly, damage acreages were estimated by the principle of stratified sampling using the six image classes. Secondly, a thematic map was compiled which shows the spatial distribution of the two damage-richest image classes.
Stratified sampling is an efficient method when classes can be formed which differ clearly from each other as regards the object under investigation. In this case one can reach the same precision with a smaller sample than without stratification. The potential gain is not reached if the sample is not chosen in a proper way. The verification sample used in this study included a subjective component from the most important forest strata. This, together with the poor classification accuracy, led to considerable bias, which could be detected thanks to the objective NFI field sample. Without the bias, the accuracy could have been fairly good by using the verification sample. The lesson in this case is that every point within an image class must have the same probability to be sampled in stratified sampling. The probability can of course be different in different classes.
The damage acreages were finally estimated using stratifid sampling with the six image classes and the NFI field sample. The procedure gave almost the same precision as the NFI sample alone. The two main reasons for this outcome are as follows. Firstly, the precision obtained using the NFI sample alone was high. Secondly, stratified sampling was not very efficient because the image classes differed fairly modestly with respect to the damage frequency in the classes. On the other hand, stratification did at least not impair the precision, which is important considering the thematic map.
The two damage-rich image classes depicted on the thematic map contain a considerable amount of healthy forest, too. On the other hand, only about one quarter of the total damage area is in the two classes. Despite these limitations, the map reveals damage concentrations which are associated with the known forest patterns in the study area. This kind of visualization and generalization is possible with digital satellite data, a facility that has contributed much to the great popularity of the techniques. However, to avoid erroneous conclusions one should always know the exact facts behind the scene, too.
The main results of this study lie in the damage acreage information. The distribution of the estimated total damage area into 14 types and two grades each (Table 5) reveals some interesting features. The total damage acreage accounted for 38 % of the study area. 39 % of the damage acreage was slight damage that does not affect stand quality according to the definition. No damaging agent could be identified in 29 % of the damage acreage. These findings indicate that the field crews may have paid special attention to the damage item. Public interest in forest health can be one reason for this. The damage classification scheme used here was more detailed than before, which may also have increased the proportion of damage observations. The study area is almost the same as the northern part of the Koillis-Suomi Forestry Board District. The damage percentage on forest land there was 22 % in 1983 (Mattila 1990, p. 104). Estimated on the whole acreage, the comparable percentage was 13-14 %. The estimated damage proportion has thus almost tripled in ten years. However, these estimates are not commensurable due to the facts mentioned above. Many findings suggest that the development has not been so drastic. Some increase of damage is probable, however. The effects of some abnormal climatic factors in Lapland in the 1980s on forest health may well have still been visible in 1993 when the new NFI sample was measured.
The study provides a cross-section of the situation at the beginning of the 1990s. As such, it can serve as one link in the time series started in 1983. However, commensurability must be taken into account in all time series analyses in order to avoid erroneous conclusions. The thematic map presented in this study is only an example of the visualization potential of digital image processing techniques. All too often a clear distinction is not drawn between image class and its factual content. Unbiased estimation of the true content is not possible without objective field sampling. Erroneous emphasis on pictorial results only has caused much disappointment. This may be one reason why practical applications of satellite remote sensing are nowadays fewer than have earlier been predicted.