Predictions of Forest Inventory Cover Type Proportions Using Landsat TM

The feasibility of using Landsat TM data to generate current estimates of cover type proportions for areas lacking this information in the national forest inventory was explored in a case study in New Brunswick. A recent forest management inventory covering 4196 km² in south-eastern New Brunswick (the test area) and a coregistered Landsat TM scene were used to develop predictive models of 12 cover type proportions in an adjacent 4525 km² region (the validation area). Four prediction models were considered, one using a maximum likelihood classifier (MLC), and three using the proportions of 30 TM clusters as predictors. The MLC was superior for non-vegetated cover types while a neural net or a prorating of cluster proportions was chosen for predicting vegetated cover types. Most predictions generated for national inventory photo-plots of 2 × 2 km were closer to the most recent inventory results than estimates extrapolated from the test area. Agreement between predictions and current inventory results varied considerably among cover types, with model-based predictions outperforming, on average, the simple spatial extensions by about 14 %. In this region, an 11-year-old forest inventory for the validation area provided estimates that in half the cases were closer to current inventory estimates than predictions using the optimal Landsat TM model. A strong temporal correlation of photo-plot-level cover type proportions made the old values more consistent than predictions using the optimal Landsat TM model in all but three cases. Prorating of cluster proportions holds promise for large-scale multi-sensor predictions of forest inventory cover types.


Introduction
Statistics on the current extent of land-cover types are needed for a plethora of natural resource planning and policy purposes (Apps et al. 1995, Dale and Robinson 1996, Noss et al. 1997, Odum 1996, Scott et al. 1996). Increased demands for reporting the state of the environment have accentuated the need for statistics stating the extent of land cover types (White et al. 1992). Estimates of cover-type proportions are pivotal statistics suited to quantify strategic issues and trends. Detailed spatial knowledge about cover type distributions in the landscape is needed for efficient implementation and monitoring of specific objectives and plans (Andersen 1988, Costanza et al. 1992). Compilation of current land-cover statistics is a challenge for large countries like Canada (Leckie and Gillis 1995). Although parts of the country are inventoried each year, many regions are without a current land-cover inventory (Gray and Power 1997). Estimates of cover type proportions for these regions are lacking, incomplete or dated. Remote sensing has the potential to fill inventory gaps and provide approximations for the missing estimates (Ahern et al. 1996, He et al. 1998). Yet developing a remote sensing classifier using verified ground data and implementing an independent validation/calibration process are both costly and time-consuming activities (Congalton and Biging 1992), even within a sampling context (Bauer et al. 1994, Moody 1998). However, if the missing data are primarily cover type proportions for landscape segments several hectares in size, cheaper and more expedient shortcuts should be possible.
The objective of this study is to explore, within the framework of a national inventory, an expedient use of Landsat TM imagery and existing regional forest inventories to make predictions of forest cover type proportions for areas for which there are no or only dated estimates. We shall evaluate our predictions against current inventory estimates and against estimates from an 11-year-old inventory. Predictions are assessed on how well they match estimates that would otherwise have been forthcoming from a local inventory to the national inventory (Lowe et al. 1994). Predictions of the proposed nature would greatly expand the domain with up-to-date land-cover information. The tested methods are both fast and cheap as they rely on existing inventories and readily available software algorithms.
Our approach differs from the now classic combination of remotely sensed data and ground inventory in a double sampling framework (Cochran 1977, Czaplewski and Catts 1992, Tenenbein 1972) in that the reference (training) data come from an area outside the area for which estimates are needed. Furthermore, estimates of cover type proportions are produced for 2 × 2 km area units with no attempt to produce a map. Unbiasedness is only assumed and not given by design (Gregoire 1998).

Study Site and Photo-plots
An 8721-km² region in the southeastern part of New Brunswick was used as the study site (Fig. 1). Approximately one half of the area (TEST) was used to train and test various predictive models while the other half (VALID) was used for comparing final model predictions against forest inventory results. The test area was again split into two non-overlapping halves (TEST1 and TEST2) in a checkerboard fashion, one half for model fitting (TEST1) and the other (TEST2) for model comparisons. Models chosen based on their ability to predict cover type proportions in TEST2 were then refitted using all data in the test area (i.e. TEST1 + TEST2) and used to make predictions for the validation area. From general considerations of climate, soils and ecological conditions, it was assumed that the cover types found in the test area would be representative of the cover types in the validation area. This assumption proved warranted; only one rare cover type (bryoids) in the test area (0.003 %) was not documented for the validation area.
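The checkerboard split can be illustrated with a minimal sketch. The text says only that the test area was split "in a checkerboard fashion", so the row/column indexing of photo-plots and the function name `checkerboard_split` are assumptions made for illustration.

```python
def checkerboard_split(n_rows, n_cols):
    """Assign grid cells alternately to TEST1 and TEST2 (hypothetical indexing)."""
    test1, test2 = [], []
    for r in range(n_rows):
        for c in range(n_cols):
            # Cells with an even row + column index go to TEST1, the rest to TEST2.
            (test1 if (r + c) % 2 == 0 else test2).append((r, c))
    return test1, test2


test1, test2 = checkerboard_split(4, 5)
print(len(test1), len(test2))
```

A split of this kind keeps the fitting and comparison halves spatially interleaved, so both halves sample the same range of local conditions.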
Cover type proportion estimates are desired for units of area called "photo-plots". The recommended size for these photo-plots is 400 ha in a square 2 × 2 km area ("A Plot-based National Forest Inventory Design For Canada". Natural Resources Canada. 1999. Canadian Forest Service, Pacific For. Centre, Victoria, BC. 70 p, unpubl.). The study area was therefore subdivided into a grid of 2 × 2 km photo-plots, 1193 photo-plots inside the validation area and 1149 photo-plots in the test area. These photo-plots are the units for predicting cover types.

Forest Inventory Cover Types
Forest cover type maps in a GIS format (New Brunswick Department of Natural Resources and Energy, Forest Management Branch) for the entire study area were used as benchmark data for training and fitting purposes. The cover type maps were based on photo interpretation of 1:12 500 air-photos taken during the months of July, August and early September of 1993 (test area) and 1994 (validation area). The forest inventory was completed in 1993 for the test area and in 1993-1994 for the validation area. Inventory cover types, as listed for specific polygons on the forest cover maps, were grouped into a hierarchical system compatible, as far as possible, with the classification system currently under development for the National Forest Inventory. Assignment of inventory labels to the national scheme was guided by over 30 inventory attributes associated with each polygon. Relabeling existing provincial inventory cover types to a unified national scheme is not without problems (Carpenter et al. 1999, Foody 1999). Differences in definitions prevented a perfect conversion. Discussions with provincial inventory experts helped decide a 'most likely' classification in many cases, and past conversion rules used in Canada's National Forest Inventory (Gray and Power 1997) were applied as the default in the remaining unresolved cases. Only the aggregate cover types are used in the analyses. The first three levels of the aggregated cover types are listed in Table 1 along with the complement of forest inventory cover types. Cover type proportions are listed in Table 2.
Table notes: 1. … polygons with a combined area of maximum 183 ha were labeled as partial cuts or thinning (PC, TI) with 0 % crown closure. The inventory information indicated 'low shrub' as the most likely cover type for these polygons. 2. Polygons disturbed between 1983 and 1990 and currently dominated (area) by a tree/shrub vegetation > 2 m tall. A few (see note 1 above) polygons with incomplete records may have erroneously been placed in this category.
To check whether 'old' inventories could provide estimates of current cover type proportions that were in some sense superior to what could be predicted with our models (see below) we added an inventory from 1983/84 to our data sets. Due to minor changes in inventory procedures we discarded about 15 % of the 'old' polygons which could not be classified with any confidence. We used simple prorating of areas to compensate for excluded areas (Little and Rubin 1987).

Landsat Image Data
A Landsat 5 TM scene (Track 9, Frame 28) from July 31, 1995 covered the entire study area. This scene was deemed the better choice among alternatives from August 1993, July/August 1994, and July 1995/96 as these had either more cloud cover or were less temporally synchronized with the ground inventories. Spatial resolution was 30 × 30 m for channels 1-5 and 7, as opposed to 120 × 120 m for channel 6 (thermal).
The Landsat TM scene was geometrically corrected using the PCI utility "GCP Works". Vector data from New Brunswick Forest Inventory Maps (scale 1:12 500) were used to provide ground control points. A total of 37 control points were gathered throughout the image utilizing features such as road intersections, bridges, etc. An overall root mean square error of 0.46 pixels was attained using a first order nearest neighbor sampling technique. Radiance values were given in an 8-bit format. No radiometric correction was done. First, accurate correction factors were not available, and secondly, we do not expect such values to be readily available for the type of areas that we envision for the application domain of our methods. Further, it was also deemed desirable to maintain the full dynamic range of the radiance values.
Forest cover type maps were then co-registered with the Landsat TM scene and rasterized to a grid size compatible with the Landsat TM resolution. Vector data within the image were used to ensure that the co-registration was correct to within a pixel. Further details are given in Wulder (1998). Two streaks of coastal fog covering about 144 km² (18 photo-plots in the test area, and 16 photo-plots in the validation area) and clearly visible in the southern part of the image (Fig. 1) were left 'as is' in the data.

Principal Component Scores
To reduce redundancy and potential collinearity between Landsat TM channels we extracted the first four principal components accounting for 94.5 % of the total variation in the 7 channels (Rencher 1993), and computed four principal component scores for each of the 9.7 · 10⁶ pixels in the study area (4.7 · 10⁶ in the test area and 5.0 · 10⁶ in the validation area). Scores of each retained principal component were multiplied by the square root of the variance explained by the component. Preliminary analyses with the six channel TM data plus various vegetation indices (Rouse et al. 1974) treated as additional channel data failed to isolate significant information in the vegetation indices not already accounted for by a linear combination of the six channels.
Table 2. Cover type distribution in the test area and model dependent mean absolute difference (MAD %) between Landsat TM model-based predictions and inventoried cover type proportions in 2 × 2 km photo plots within TEST2. Models were fitted to inventory land cover proportions in area TEST1. The row wise minimum of MAD % is typed in Bold. ∆MAD( %) is the relative reduction in MAD % achieved by using the best model instead of SPEX.

Unsupervised Clustering
A pre-stratification of principal component scores into disjoint clusters (Lillesand 1996) was expected to enhance our ability to predict cover types from Landsat TM data. The clusters formed are assumed to be associated with one or a few inventory cover type classes. If true, it would be possible to exploit this association in a regression framework as outlined in section 2.4 where we quantify the association between clusters and cover types.
In an unsupervised clustering (Hartigan 1975) of PC-scores, with a predetermined number of clusters $n_{clu}$ = {3, 6, 9, ..., 33, 40, 50, 60, 70} and random initial centroids, the assignment of pixels to clusters was completed when the maximum change in cluster centroids between two iterations fell below 0.02. About 140 iterations were needed to meet this criterion for $n_{clu}$ = 70. Three scaled (to the 0-1 range) and penalized clustering criteria assisted in deciding how many clusters to maintain: i) the cubic clustering criterion (Milligan and Cooper 1985), ii) R² (Hartigan 1975), and iii) a within-cluster relative diversity index of cover types. Ideal clusters would represent only one cover type (a diversity index of 0). For a particular $n_{clu}$ value the mean diversity index (D) was estimated from an expression in which E stands for an expectation (average) and S is Simpson's diversity index (Patil 1982). Level refers to the three classification levels (Table 1). Penalty factors (n_plots n_covertypes x n_clusters - 1) / (n_plots n_covertypes - 1) were levied to account for the decline in degrees of freedom of the residual terms of any prediction model as a result of adding more clusters as predictors. Fig. 2 illustrates the trends in the three indicators. Fifteen to 35 clusters appear to perform equally well. Our final choice was 30, coincident with the maximum of the cubic clustering criterion. Fig. 3 presents an example of the clustered TM-image (30 gray tones) and the corresponding forest inventory cover type map.
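The within-cluster diversity criterion can be sketched as follows. The exact estimator of D is not reproduced here, so the code assumes the common form of Simpson's index, 1 - Σ p², and an unweighted average over clusters at a single classification level. It omits the scaling and penalty described above. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def simpson_index(labels):
    """Simpson's diversity 1 - sum(p_k^2); 0 when only one cover type is present."""
    counts = Counter(labels)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def mean_cluster_diversity(cluster_ids, cover_types):
    """Unweighted mean of the within-cluster Simpson index (an assumed form of D)."""
    by_cluster = defaultdict(list)
    for k, t in zip(cluster_ids, cover_types):
        by_cluster[k].append(t)
    return sum(simpson_index(v) for v in by_cluster.values()) / len(by_cluster)


# Toy pixels: cluster 0 is pure water, cluster 1 is an even softwood/hardwood mix.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
covers = ["water"] * 4 + ["softwood", "softwood", "hardwood", "hardwood"]
print(mean_cluster_diversity(clusters, covers))
```

In this toy case the pure cluster contributes 0 and the mixed cluster 0.5, so the mean is 0.25. Lower values indicate clusters that map more cleanly onto single cover types.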

Comparing TM Data from the Test and Validation Areas
Statistical tests of the null hypotheses of no differences between PC scores and cluster proportions in the test and the validation areas were done with either Hotelling's T² tests or chi-square goodness of fit tests (Rencher 1993).

Predicting Cover Type Proportions from Cluster Proportions
Clusters generated by the unsupervised clustering are assumed to be associated with one or a few inventory cover types. If so, we would also expect proportions of clusters in an area (here 2 × 2 km) to be associated with the proportions of cover types in this area (Metternicht and Fermont 1998). We do not expect this association to be perfect nor simple. To explore what could be a very complex relationship we tested two flexible non-linear models: neural net (NNET), and alternating conditional expectations (ACE) as the models most likely to succeed (Breiman and Friedman 1985, Carpenter et al. 1997, Lek et al. 1996, Moody et al. 1996, Warner and Misra 1996). Should the relationship between cluster proportions and cover type proportions, however, turn out to be linear, then linear models should perform well. Two linear models were tested, one that simply exploits conditional probabilities of a cover type given the cluster (PRO), and a spatial extension model (SPEX) which simply extrapolates results from one area to adjacent areas.
The generic model applicable to all tested models is:
$$ g(\vec{P}_{li}) = f(\vec{C}_i), \qquad \vec{P}_{li} J_l = 1, \qquad \vec{C}_i J_{clu} = 1 \qquad (2) $$
where $\vec{P}_{li}$ is a 1 × $n_l$ vector of proportions of the $n_l$ cover types within classification level l (l = I, II, III, see Table 1) in the ith photo-plot, $\vec{C}_i$ is a 1 × $n_{clu}$ vector of cluster proportions ($n_{clu}$ is the number of defined clusters), $J_{clu}$ is an $n_{clu}$ × 1 vector of ones, $J_l$ an $n_l$ × 1 vector of ones, and g and f are two monotone functions of proportions. The last two equations in (2) are sum-to-one restrictions.
In the NNET model f is replaced by a feedforward neural net (Lek et al. 1996) with one hidden layer taking $\ln(\vec{C}_i + 10^{-9})$ as input and with $\ln(\vec{P}_{li} + 10^{-9})$ as the target output (during training only). The decay rate for back-propagated residuals through the network was a constant 0.05 (Venables and Ripley 1994). NNET predictions were back-transformed to proportions. Trials with several other network configurations (number of layers, weights, and decay rates) on TEST1 data yielded inferior results.
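The log transform applied to network inputs and targets, and the back-transformation of predictions, can be sketched as below. Only the transform is shown, not the network itself. Clipping at zero and renormalising in `from_log` is an assumption about how back-transformed values were turned into proportions, and the function names are hypothetical.

```python
import math

EPS = 1e-9  # offset that keeps ln() finite when a proportion is exactly zero


def to_log(proportions):
    """Transform proportions to the log scale used for inputs and targets."""
    return [math.log(p + EPS) for p in proportions]


def from_log(values):
    """Back-transform, clip at zero, and rescale to sum to one (assumed step)."""
    raw = [max(math.exp(v) - EPS, 0.0) for v in values]
    total = sum(raw)
    return [x / total for x in raw]


cluster_props = [0.0, 0.25, 0.75]
print(from_log(to_log(cluster_props)))
```

The offset matters because many photo-plots contain no pixels of a given cluster or cover type, and ln(0) is undefined.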
In the ACE model f and g are monotone transformations of cluster and cover type proportions maximizing the correlation between the transformed proportions (Breiman and Friedman 1985).
In the PRO model f is a matrix of conditional probabilities P(cover type | cluster) obtained from the training area and pre-multiplied to the cluster proportions in a given photo-plot, and g is an identity matrix pre-multiplication of $\vec{P}$. P(cover type | cluster) is thus the proportion of pixels of a given cover type within the training area classified to a specific cluster. In other words a PRO prediction is the probability P(cover type | photo-plot) = Σ P(cover type | cluster) · P(cluster | photo-plot), with the sum taken over clusters. Thus f is essentially a 'confusion' matrix (Steele et al. 1998) and predictions are compatible with the classical 'inverse' method (Magnussen 1997a, Walsh and Burk 1993). P(cover type | cluster) is obtained from the training area and P(cluster | photo-plot) is merely the 'observed' cluster composition of a photo-plot in the area for which predictions are to be made.
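The prorating step can be sketched in a few lines. The pixel-level inputs (a cluster id and an inventory cover type per training pixel) and the function names are assumptions for illustration. The arithmetic follows the definition of P(cover type | cluster) and P(cluster | photo-plot) given above.

```python
from collections import defaultdict


def fit_conditional(cluster_ids, cover_types):
    """Estimate P(cover type | cluster) from co-registered training pixels."""
    counts = defaultdict(lambda: defaultdict(int))
    for k, t in zip(cluster_ids, cover_types):
        counts[k][t] += 1
    return {k: {t: n / sum(row.values()) for t, n in row.items()}
            for k, row in counts.items()}


def predict_pro(cond, cluster_props):
    """Sum over clusters of P(cover | cluster) * P(cluster | photo-plot)."""
    pred = defaultdict(float)
    for k, share in cluster_props.items():
        for t, p in cond[k].items():
            pred[t] += p * share
    return dict(pred)


# Toy training pixels: cluster 0 is all water, cluster 1 is 3/4 softwood.
train_clusters = [0, 0, 0, 0, 1, 1, 1, 1]
train_covers = ["water"] * 4 + ["softwood", "softwood", "softwood", "hardwood"]
cond = fit_conditional(train_clusters, train_covers)
plot = {0: 0.2, 1: 0.8}  # observed cluster composition of one photo-plot
print(predict_pro(cond, plot))
```

Because each row of conditional probabilities sums to one, the predicted proportions sum to one whenever the cluster proportions do. A cluster that never occurs in the training area would raise a `KeyError` in this sketch.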
The SPEX model used the mean cover-type proportions of the training area as the predictions for the testing area; specifically, $\vec{P}_{li} = E(\vec{P}_l \mid \text{training area})$. In other words, the area for which predictions are needed is assumed to be identical to the training area.
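A minimal sketch of the spatial extension follows. It assumes that the training-area mean is taken as the unweighted mean over equally sized photo-plots. The dictionary layout and function names are hypothetical.

```python
def fit_spex(training_plots):
    """Mean cover-type proportions over the training photo-plots."""
    types = sorted({t for plot in training_plots for t in plot})
    n = len(training_plots)
    return {t: sum(plot.get(t, 0.0) for plot in training_plots) / n for t in types}


def predict_spex(mean_props, n_plots):
    """Every target photo-plot receives the same training-area mean."""
    return [dict(mean_props) for _ in range(n_plots)]


training = [{"treed": 0.8, "water": 0.2}, {"treed": 0.4, "water": 0.6}]
means = fit_spex(training)
print(predict_spex(means, 3))
```

Since SPEX ignores the image data altogether, it serves as the baseline against which the gain of the image-based models is measured.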
Linear combinations of NNET, ACE, and PRO predictions were also tested as a viable hybrid prediction model (Leblanc and Tibshirani 1996, Peña 1997). However, no tangible improvements could be obtained by any weighted linear combination of individual model predictions (weights assigned to each model prediction were proportional to the median absolute residuals obtained in the training area with NNET, ACE and PRO).
All predictions from NNET, ACE, and PRO were proportionally adjusted and calibrated to satisfy a sum-to-one constraint and to ensure that all predictions were between 0 and 1. Predictions at a specific classification level were harmonized by iterative proportional fitting (Bishop et al. 1975) to match predictions at the next higher level.
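The two adjustment steps can be sketched as follows. The harmonisation shown is a single proportional scaling of each group of child cover types to its parent's prediction. This is a simplified stand-in for the iterative proportional fitting cited above, which it matches only when every child belongs to exactly one parent. All names are hypothetical, and the sketch does not handle groups whose predictions sum to zero.

```python
def normalise(pred):
    """Set negative values to zero and rescale so the proportions sum to one."""
    clipped = {t: max(p, 0.0) for t, p in pred.items()}
    total = sum(clipped.values())
    return {t: p / total for t, p in clipped.items()}


def harmonise(child_pred, parent_pred, parent_of):
    """Scale child proportions so each group sums to its parent's prediction."""
    group_sum = {}
    for t, p in child_pred.items():
        group_sum[parent_of[t]] = group_sum.get(parent_of[t], 0.0) + p
    return {t: p * parent_pred[parent_of[t]] / group_sum[parent_of[t]]
            for t, p in child_pred.items()}


raw = {"softwood": 0.5, "hardwood": -0.1, "lake": 0.75}
parent_of = {"softwood": "vegetated", "hardwood": "vegetated", "lake": "non-vegetated"}
level_one = {"vegetated": 0.9, "non-vegetated": 0.1}
print(harmonise(normalise(raw), level_one, parent_of))
```

After normalisation every value lies between 0 and 1, and after harmonisation the detailed proportions add up to the coarser prediction they belong to.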

Maximum Likelihood Classification (MLC)
MLC is generally considered a standard method for classification (McLachlan 1991) and we used MLC as an alternative to the above cluster-based models. Specifically we applied a quadratic Gaussian maximum likelihood classifier of principal component scores (see 2.3.1). The MLC used separate within-cover-type covariance matrices because a hypothesis of homogeneous covariance matrices was rejected (P = 0.001, χ²-test, Rencher 1993, pp. 272-273). The MLC classifier was trained on principal component scores of the pixels in the TEST1 area and tested on the pixels in TEST2. TEST2 predictions were then summarized for the 588 photo-plots in this area and compared to test results from four alternatives (see below). Given the objective of predicting cover type proportions for a validation area assumed similar to the training area we used the appropriate cover type proportions from the TEST area(s) as priors for the classifier (Frigessi and Stander 1994). Incidentally, the assumption of normality was violated for the first principal component (P = 0.001) due to excess kurtosis and right-skewed distributions of principal component scores (D'Agostino et al. 1990). No simple transformation achieved normality; consequently we accepted that classifications may be less than optimal (McLachlan 1991). Classification accuracy of pixels in the TEST2 area varied from a high of 97 % for separating vegetated from non-vegetated to 92 % for the second classification level (Table 1) and 51 % at the third level.
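A quadratic Gaussian classifier with class-specific covariance matrices and priors can be sketched as below. For brevity the sketch uses two features instead of the four principal component scores, so the 2 × 2 covariance inverse can be written in closed form. The toy data and all names are hypothetical.

```python
import math


def fit_class(points):
    """Mean vector and 2 x 2 covariance (sxx, sxy, syy) for one cover type."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def log_score(x, mean, cov, prior):
    """Log of prior times bivariate normal density, dropping the shared constant."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Mahalanobis distance from the closed-form inverse of a 2 x 2 matrix.
    maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.log(prior) - 0.5 * math.log(det) - 0.5 * maha


def classify(x, models, priors):
    """Pick the cover type with the highest prior-weighted likelihood."""
    return max(models, key=lambda c: log_score(x, *models[c], priors[c]))


training = {
    "water": [(0.0, 0.0), (1.0, 0.2), (0.2, 1.0), (1.0, 1.0)],
    "forest": [(5.0, 5.0), (6.0, 5.3), (5.2, 6.0), (6.0, 6.2)],
}
models = {c: fit_class(pts) for c, pts in training.items()}
priors = {"water": 0.1, "forest": 0.9}  # cover type proportions of the training area
print(classify((0.5, 0.5), models, priors), classify((5.5, 5.5), models, priors))
```

Photo-plot proportions would then follow from tallying the classified pixels within each 2 × 2 km unit.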

Temporal Extension of Old Inventory Results (OLD)
One of our subsidiary objectives was to compare TM-based predictions with estimates from dated inventories. In areas of little change an old inventory might still produce estimates of broad cover type proportions that are as good as TM-based model predictions. Cover type proportions as per the old inventory were simply used as estimates of current proportions. Estimates are, in all cases, only applied to the area (photo-plot) from which they were derived.

Evaluating Cover Type Specific Prediction Models
In practical applications the choice of model to predict cover type proportions rests with its ability to predict consistently and with minimum divergence from established benchmarks. To establish which of the tested models is likely to produce the most consistent and least divergent predictions for a specific cover type, we first fitted all models to data from TEST1 and then compared their predictions for the TEST2 area to corresponding inventory estimates. Under practical circumstances this would complete the model evaluation and a decision on preference. For each cover type, the best model would then be refitted using data from both TEST1 and TEST2 and used to make predictions for an area with no or only dated inventory information. Everything else being equal, the model producing the lowest mean absolute deviation for a given cover type is also expected to produce the minimum absolute deviation when used to generate predictions. Note, however, that this selection does not imply that the chosen model is statistically superior. From the estimated mean absolute differences it is possible to conclude that NNET and MLC, on average, produce predictions of equal quality.
Predictions of cover type proportions in TEST2 photo plots from MLC, NNET, ACE, PRO, and SPEX were compared against current inventory results. Differences simply mean a lack of agreement as both the prediction as well as the inventory could be in error. A lack of agreement simply quantifies how much a Landsat-based summary of cover types would deviate from a summary based on a provincial inventory. The model producing the minimum average absolute deviation (Tauer 1983) for a given cover type was chosen to predict cover type proportions for photo-plots in the validation area.
Predictions made for the photo-plots in the validation area with the cover-type-specific chosen (best) models were compared to current inventory estimates by computing the mean absolute deviation.
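The selection rule can be sketched for a single cover type as follows. The numbers are invented for illustration only and are not results from this study. The relative gain is computed as described for ∆MAD, the reduction in mean absolute difference achieved by the best model relative to SPEX.

```python
def mad(predicted, observed):
    """Mean absolute difference between predicted and inventoried proportions."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def select_model(predictions, observed):
    """Return the model with the smallest MAD and its relative gain over SPEX."""
    scores = {name: mad(pred, observed) for name, pred in predictions.items()}
    best = min(scores, key=scores.get)
    gain = 100.0 * (scores["SPEX"] - scores[best]) / scores["SPEX"]
    return best, scores, gain


# Hypothetical proportions of one cover type in three photo-plots.
observed = [0.10, 0.30, 0.50]
predictions = {
    "SPEX": [0.30, 0.30, 0.30],
    "NNET": [0.15, 0.25, 0.45],
    "MLC": [0.20, 0.30, 0.40],
}
best, scores, gain = select_model(predictions, observed)
print(best, round(gain, 1))
```

Repeating the selection for each cover type gives a cover-type-specific set of preferred models.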

Model Comparisons Based on TEST2 Predictions
Results from the comparison of model predictions with inventory estimates for TEST2 photo-plots are shown in Table 2. NNET produced the lowest mean absolute deviations for all but two of the vegetated cover types. MLC, on the other hand, was superior for predicting non-vegetated types. MLC was in all but two cases among the best two models. ACE, despite strong correlation (>0.8) between transformed cluster and cover-type proportions, resulted in five times more divergence compared to a simple extension of the test results (SPEX). PRO was, on average, on par with MLC. The SPEX model produced, in general, the largest discrepancies between predictions and inventory estimates. From this we expected that relative to SPEX, the best model for each cover type would lower the mean absolute difference between a predicted cover type proportion and the actual cover type proportion of an inventory by 38 % for level I cover types, 64 % for level II, and 36 % for level III cover types. Table 2 shows the cover type specific models chosen to best predict cover type proportions for the validation area. NNET was chosen ten times, MLC seven times, and PRO and ACE were each chosen once. The SPEX model was not chosen.

Model Predictions for the Validation Area
With the cover-type-specific preferred models, predictions of inventory cover type proportions for photo-plots in the validation area were, on average, significantly closer to the inventory estimates than predictions obtained by spatially extending results from the TEST area (SPEX). Predictions were also, on average, closer than the mean cover type proportions of the validation area (P = 0.001, test: Chi-square goodness of fit, Miller 1980). Predictions improved, in general, with an increased aggregation of cover types as one moves from level III to level I in the classification system. At the top level the reduction of differences amounted to 2/3 while at the most detailed level the reduction was no more than about 1/6. Similarly, correlation coefficients between predicted and observed cover type proportions declined from 0.82 (P = 0.001) at the top level to an average of 0.49 (P = 0.001) at the third level. Non-vegetated predictions were exclusively based on the maximum likelihood classifier while the vegetated predictions, with two exceptions, were based on neural net fitting involving unsupervised clustering. A simulation study indicated that the quality of our predictions was commensurate with that expected from an actual inventory of 6-8 % of the validation area. Table 3 details the predicted and actual cover type proportions and the mean absolute difference of the predictions (PRED). Results obtained by spatial extension of mean proportions from the test area (SPEX) and a temporal extension of the 'old' inventory (OLD) are included for comparison. Predictions using the optimal Landsat TM model of vegetated area were, on average, within 2 percentage points of the actual proportion while the individual photo-plot predictions were, on average, within about 9 percentage points of the comparable inventory figures. At this coarse level the model predictions were about 50 % closer to the target than spatially extended averages.
As the shifts from vegetated to non-vegetated were rather limited during the last decade, the 11-year-old inventory is closer to the current state than our predictions. Had we used the 'old' inventory for our predictions, the mean absolute deviations would have been about 1/4 as large. A strong temporal correlation of photo-plot proportions (~0.98) explains this observation.
At the next more detailed level (level II; Table 1) the benefit of predicting via Landsat TM images is striking. Average predictions fall within 1-2 percentage points of the inventory values while SPEX and OLD mostly miss them by 5-11 and 3-8 percentage points, respectively (Table 3). The TM-based predictions appear especially promising for the treed, land, and water portions. Photo-plot level disagreements in predicted cover type proportions were generally smallest for the 'old' inventory, followed by PRED, with SPEX a close third. Correlation coefficients between old and new inventory attributes at this level averaged 0.92.
Predictions of hardwood, mixedwood, and softwood proportions in the treed group were prone to consistent and substantial deviations from the inventory results (Table 3). Only mixedwood predictions matched the inventory estimates. SPEX and OLD also had apparent problems in predicting softwood and mixedwood proportions. Agreement between 'old' and new inventory estimates of softwood, mixedwood, and hardwood, as epitomized by a correlation coefficient, was 0.78, 0.71, and 0.72, respectively.
In the non-treed group it was clearly difficult to discriminate between low-shrubs and crop/pasture. Spatial extension was only marginally better while the 'old' inventory was disadvantaged due to the aforementioned large shift in the non-treed group between the two inventories; changes that are a mix of real changes and method-dependent artifacts. Disagreement between predictions and inventory estimates for this group is, in general, about the same magnitude as the average predicted proportion. The exception is again the 'old' inventory that benefits from the strong temporal correlation of cover types within photo-plots (avg. 0.72).
Predictions of third-level non-vegetated cover types were superior to both spatial and temporal extensions. The maximum-likelihood-based pixel-level predictions were significantly (P = 0.01) closer to estimates from the current inventory than the two alternatives. Yet, individual photo-plot predictions were associated with relatively large departures from the inventory.
All predictions for the 16 photo-plots covered by at least 30 % coastal fog were significantly (P = 0.01) worse than other predictions. Predictions for partially obscured photo-plots were, on average, about four times further away from the inventory estimates than unobscured photo-plots.
The distribution of differences between model-based and inventory estimates of cover type proportions is an important aspect of model performance. Fig. 4 highlights eight typical examples. Long tails of 'gross' differences and lack of symmetry were evident in most distributions. Also, the split between variance and bias differs between PRED, SPEX and OLD. If the concern is to minimize excessive disagreement then it is almost always 'safer' to use the decade-old inventory as it limits the disagreement to the amount of actual change in a photo-plot plus any artifacts caused by changes in inventory procedures. With both PRED and SPEX the chance of an excessive photo-plot-level disagreement (defined as a disagreement that is twice the predicted value) is between 30 and 40 %.

Comparing Results from Testing and Validation
Discrepancies between predicted and inventory estimates of cover type proportions in the validation area were substantially higher than those reported when the best model was tried on the TEST2 area. In the vegetated parts the deviations were 2-4 times as high as comparable figures from the test area. In non-vegetated parts the deviations were even higher (6-8 times). Several factors contribute to this phenomenon. Spectral feature differences between the test area and the validation area were apparent. For example, identically labeled cover types in the two areas had different average spectral reflectance values. Fig. 5 captures the essence of this by plotting and connecting the means of the first two principal components in the TEST area and the validation area for the most important cover types. Hotelling's T²-tests of equal area means of the four principal components rejected the null hypothesis of no differences for all cover types (P = 0.001). The same test for a single photo plot failed to identify significant differences (P = 0.52). The shifts in mean principal component values were for the most part large enough to modify the association between cover types and clusters. Classification of most non-vegetated cover types by maximum likelihood is very sensitive to such shifts in the centroids of the principal components. As well, the variances of principal component scores for pixels in the cover types rock, river/stream, lake/pond, water and ocean changed significantly (P = 0.001) between the test and the validation sites and compromised predictions. The tight clustering of cluster centroids in the upper part of the illustrated feature space (Fig. 5) generates disproportionately large shifts in the association between cluster and cover-type proportions for even modest shifts in the cover type centroids between the test and validation areas. This is illustrated in Fig. 6, where the composition of each cover type in terms of image clusters is given for the test and the validation area. Only a few cover types (water, lake/pond, ocean, and rock) are strongly associated with one or two clusters. Most cover types are represented more or less equally by 4-12 clusters. Shifts between the two areas are legion. A global test of difference in proportions between the test and validation area led to a rejection (P = 0.001) of the null hypothesis while only non-vegetated, [...]
Fig. 6. Each inventory cover type is represented by a maximum of 30 clusters. The relative cluster composition of a cover type is illustrated by a series of 30 dots along each row. The size of a dot representing one cluster is proportional to the fraction of pixels within a given inventory cover type that were assigned by unsupervised classification to the cluster. Gray dots: TEST area. Black dots: VALID area. Relative overall cluster sizes for the test and validation area are displayed in the last row.
Inflation of discrepancies between model predictions and inventory estimates when going from a test to a validation area is also attributable to statistical problems with the models. For the neural-net-based predictions, overfitting and collinearity of cluster proportions were deemed to be the most significant factors. For example, the mean lack-of-fit obtained within the TEST1 area was increased by about 35 % when predictions were generated for the other half of the test area (TEST2). Clearly, making predictions for areas outside the spatial domain of the model development results in a significant deterioration of performance over and above what can be deduced from testing the model inside the same area that generated the data for model development.
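For reference, the comparison of mean principal component scores between the test and validation areas can be written in the textbook two-sample form of Hotelling's T² test. The notation below is an assumption introduced for illustration: the text does not state which variant of the test was used, and the pooled-covariance form shown here presumes equal covariance matrices in the two areas. Here $\bar{\mathbf{x}}_1$ and $\bar{\mathbf{x}}_2$ are the mean score vectors of a cover type in the two areas, $\mathbf{S}_1$ and $\mathbf{S}_2$ the sample covariance matrices, $n_1$ and $n_2$ the sample sizes, and $p = 4$ the number of principal components.

```latex
\begin{aligned}
\mathbf{S}_p &= \frac{(n_1 - 1)\,\mathbf{S}_1 + (n_2 - 1)\,\mathbf{S}_2}{n_1 + n_2 - 2}, \\
T^2 &= \frac{n_1 n_2}{n_1 + n_2}\,
       (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^{\top}\,
       \mathbf{S}_p^{-1}\,
       (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2), \\
F &= \frac{n_1 + n_2 - p - 1}{p\,(n_1 + n_2 - 2)}\,T^2
   \;\sim\; F_{p,\; n_1 + n_2 - p - 1} \quad \text{under } H_0 .
\end{aligned}
```

With very large pixel counts per area even small mean shifts give a large $T^2$, whereas a single photo plot contributes far fewer observations, which is consistent with the contrast between the two P-values reported above.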
Neural net and maximum likelihood models were preferred for all but two cover types based on their expected performance in the validation area. While the very simple prorating scheme (PRO) only qualified for the best low shrub predictions in the testing phase, it would, in hindsight, have made an appealing alternative due to its simplicity and its potential for fitting into a multi-phase prediction system. Prorating did significantly better in predicting the hardwood:mixedwood:softwood split and crop/pasture. Conversely, it performed no better than the spatial extension when it came to vegetated/non-vegetated and water. Mean predictions of a prorating scheme would have been, on average, about half a percentage point closer to the inventory estimates than predictions obtained with a mixture of neural nets and maximum likelihood classifiers.
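The idea behind prorating cluster proportions can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: it assumes that each cluster is split among cover types according to the shares observed in the test area, and that a plot's predicted cover type proportions are the cluster proportions weighted by those shares. The function names `fit_prorating_table` and `prorate` and the toy data are hypothetical.

```python
def fit_prorating_table(pixels):
    """Estimate, for each cluster, the share of its pixels in each cover type.

    pixels: iterable of (cluster_id, cover_type) pairs from the test area.
    Returns a nested dict {cluster_id: {cover_type: share}}.
    """
    counts = {}
    for cluster, cover in pixels:
        by_cover = counts.setdefault(cluster, {})
        by_cover[cover] = by_cover.get(cover, 0) + 1
    table = {}
    for cluster, by_cover in counts.items():
        total = sum(by_cover.values())
        table[cluster] = {cover: n / total for cover, n in by_cover.items()}
    return table


def prorate(cluster_proportions, table):
    """Predict cover type proportions for one plot.

    cluster_proportions: {cluster_id: proportion of the plot's pixels}.
    Clusters never seen in the test area contribute nothing (assumption).
    """
    predicted = {}
    for cluster, share in cluster_proportions.items():
        for cover, weight in table.get(cluster, {}).items():
            predicted[cover] = predicted.get(cover, 0.0) + share * weight
    return predicted


# Toy test-area data: cluster 1 is mostly softwood, cluster 2 is all water.
test_pixels = [(1, "softwood")] * 3 + [(1, "hardwood")] + [(2, "water")] * 2
table = fit_prorating_table(test_pixels)
plot_prediction = prorate({1: 0.5, 2: 0.5}, table)
```

Because the table is a simple lookup, it could be refitted for each new test area without retraining a model, which is in line with the simplicity argument made above.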

Discussion and Conclusions
The demonstrated method for using Landsat TM images to obtain estimates of forest cover type proportions for sampling units in a national forest inventory is fast, simple, and low-cost. Although the use of a single-date Landsat TM image naturally limits the number of distinguishable cover types well below the numbers possible with a photointerpretation of cover types or multi-date images (Aldrich 1979, Drieman 1993, Lachowski and Bowlin 1988, Lillesand 1996, Wolter et al. 1995), the attraction remains for areas where no reliable cover type information exists. Enriching the predictions by provision of expected associations with more detailed inventory information remains an option (He et al. 1998).
In areas where the rate of net change in broadly defined cover type proportions is deemed low (0.5 % · yr⁻¹), a cover type mapping completed within the last 20 years may still give estimates of the current cover type distribution as good as possible with the methods tested in this study. Although roughly 45 % of all pixels changed their level III cover type class between 1983 and 1994, the median net change in a cover type proportion of a 2 × 2 km photo plot was close to 0.2 %. Unless prediction differences can be reduced by an order of magnitude, an update of inventories less than about 20 years old by use of current Landsat imagery appears, in situations similar to that of the case study, problematic (Sachs et al. 1998, Smiatek 1995, Van Deusen 1994), at least when the benchmark is a forest cover inventory based on interpreted aerial photos. Only a combined analysis of misclassification rates in the Landsat-based models and the forest inventory (Czaplewski 1999) can resolve the absolute error rates of our models. However, this is beyond the scope of this study. For an evaluation of how closely we can predict estimates that would otherwise enter the national forest inventory, our analyses remain valid.
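The contrast between a large share of pixels changing class and a small net change in plot-level proportions arises because changes in opposite directions cancel. The sketch below illustrates the arithmetic with toy data; the function name `gross_and_net_change` and the class labels are hypothetical, and the data are not taken from the study.

```python
from collections import Counter


def gross_and_net_change(before, after):
    """Compare two per-pixel classifications of the same plot.

    before, after: equal-length sequences of class labels, one per pixel.
    Returns (gross, net), where gross is the fraction of pixels whose label
    changed and net maps each class to its change in plot-level proportion.
    """
    if len(before) != len(after) or len(before) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(before)
    gross = sum(b != a for b, a in zip(before, after)) / n
    count_before = Counter(before)
    count_after = Counter(after)
    classes = set(count_before) | set(count_after)
    net = {c: (count_after[c] - count_before[c]) / n for c in classes}
    return gross, net


# Two of four pixels swap labels: half the pixels change class,
# yet the proportion of each class in the plot is unchanged.
gross, net = gross_and_net_change(
    ["softwood", "softwood", "hardwood", "hardwood"],
    ["hardwood", "softwood", "softwood", "hardwood"],
)
```

In this toy case the gross change is 0.5 while the net change of both classes is zero, which mirrors in exaggerated form the 45 % versus 0.2 % contrast described above.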
Expedient use of existing forest inventory cover type information to develop allocation and classification rules for image data guarantees, everything else being equal, suboptimal results (Cohen et al. 1995, Dobbertin and Biging 1996, Franklin et al. 1997, Wulder et al. 1996). Cover-type polygons delineated and classified by photo-interpretation into a single type usually contain a considerable amount of variation, both within and among the polygons of the class. A closer inspection of, say, a mixed-wood polygon would show a mosaic of pure hardwood and pure softwood pixels interspersed with bona fide mixes. As well, photo-interpretation is not without errors. Ground-based and photo-based interpretations of cover types can diverge considerably (Biging et al. 1995, Eid and Naesset 1998, Hall and Fent 1996, Holmgren et al. 1997). Translation of provincial polygon classes into a set of national classes possibly compounds the issue further. The price for this expediency is an increased frequency of disagreement and possibly biases. The significant deterioration of model performance experienced during the validation process confirmed that identically labeled polygons may indeed mask substantial differences in spectral reflectance. Subtle shifts in bedrock, landscape features, agricultural practices, and species composition between the test area and the extension area could trigger the effect (Lillesand 1996, Slaymaker et al. 1996). Differences between Landsat-based model predictions and inventory estimates of cover type proportions reported in this study appear to be only slightly higher than results obtained with a two-phase sampling and verified training pixels (Bauer et al. 1994). Direct comparisons have been hampered by a scarcity of published records on the performance of Landsat TM-based predictions in regions outside the area used to develop the classifier. Studies by Efron (1986) indicate that an appreciable deterioration is common. 
Spatial heterogeneity in the performance of a classifier also increases the chance of mistakes when a classifier is used outside the domain from which it was derived (Steele et al. 1998). Our reported differences compare favorably with results obtained with AVHRR (Moody and Woodcock 1996, Turner et al. 1993).
The modeling approach that we adopted was inspired by recent work in linear 'de-mixing' models (Foody et al. 1996, Moody and Woodcock 1996, Turner et al. 1989), where progress towards removing resolution bias has been steady. For large spatial units, such as the 2 × 2 km photo plots, it appears that the mixing patterns of landscape elements within a fairly homogeneous region are stable enough to make useful predictions of cover type proportions. Neural nets appear particularly apt at finding patterns that can be generalized (Carpenter et al. 1997, Lippmann 1987, Moody et al. 1996). The best number of image clusters to form remains contentious (Hartigan 1975, Milligan and Cooper 1985). Our approach of letting the number be guided by a combination of three indicators appears reasonable. Preliminary analyses with 13 and 18 clusters gave poorer results. We conjecture from our experience and from that of others (Beaubien 1994, Benjamin et al. 1996) that it is a better strategy to include many rather than few clusters. Furthermore, the employed models lend themselves well to a multistage inventory design (Köhl and Kushwaha 1994).
Enhancement of remote sensing classification by inclusion of auxiliary enduring landscape features has been exploited with positive results (Binaghi et al. 1997, Frigessi and Stander 1994, Sader et al. 1995). Notably, the Finnish multiresource inventory makes extensive use of such techniques. Adding soil type (43 types), elevation class (0-25 m, 26-50 m, ... , 426-450 m), and eco-site (26 classes, "Ecological land classification for New Brunswick: Ecoregion, Ecodistrict and Ecosite levels", New Brunswick Department of Natural Resources and Energy, Forest Management Branch, 1998) to the 30 Landsat image clusters as auxiliary predictors of cover-type proportions, however, did little to improve our predictions. The main reasons for this were the coarse resolution of soil maps (~300 m) and elevation models (100 m) and the absence of ecotones at the photo plot level (Cox et al. 1997). Apart from a few significant, gratuitous correlations stating the obvious, the ratios of posterior likelihood of class membership actually declined in most cases when these auxiliary variables were added (see also McLachan (1991) for an exposé on the 'curse of dimensionality').
Attempts to improve predictions for single photo plots in the validation area by means of a composite average based on a set of k "most similar" photo plots in the test area did not succeed, despite the intuitive appeal of the approach (Moeur et al. 1995, Stroup and Mulitze 1991). Similarities were indexed by Chi-square statistics of differences in pc-score cluster proportions. A recent testing of the Finnish multi-source k-nearest-neighbor methodology in Norway points in the same direction (Gjertsen et al. 1999).
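A composite prediction from the k most similar photo plots can be sketched as below. The exact Chi-square statistic used to index similarity is not given in the text, so the symmetric form sum((a - b)² / (a + b)) is an assumption, as are the function names `chi_square_distance` and `knn_composite`, the unweighted averaging, and the toy data.

```python
def chi_square_distance(p, q):
    """Symmetric chi-square distance between two proportion vectors.

    Components where both proportions are zero are skipped.
    """
    total = 0.0
    for a, b in zip(p, q):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return total


def knn_composite(target_clusters, reference_plots, k):
    """Average the cover type proportions of the k most similar plots.

    target_clusters: cluster proportions of the plot to predict.
    reference_plots: list of (cluster_proportions, cover_proportions)
        pairs from the test area, all vectors of consistent length.
    """
    if k < 1 or not reference_plots:
        raise ValueError("need k >= 1 and at least one reference plot")
    ranked = sorted(
        reference_plots,
        key=lambda plot: chi_square_distance(target_clusters, plot[0]),
    )
    nearest = ranked[:k]
    n_types = len(nearest[0][1])
    return [
        sum(plot[1][j] for plot in nearest) / len(nearest)
        for j in range(n_types)
    ]


# Toy reference set with two clusters and two cover types.
reference = [
    ([1.0, 0.0], [1.0, 0.0]),
    ([0.0, 1.0], [0.0, 1.0]),
    ([0.9, 0.1], [0.8, 0.2]),
]
composite = knn_composite([1.0, 0.0], reference, k=2)
```

The composite inherits whatever cover type to cluster associations hold in the test area, so it is exposed to the same spectral shifts between areas that affected the other models.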
An application of the presented methodology to the large fraction of Canada with dated (>20 yr.) inventory information (Gray and Power 1997) has immediate appeal. Recent inventories in nearby regions provide for a fast and low-cost updating of key resource cover types. Parallel arguments apply to the dispersed small-scale (<20 000 ha) gaps in regional inventories. Success will, to a large degree, depend on the quality and availability of suitable forest inventories. Remote and unmanaged areas may lack current inventories within a radius deemed safe for model-based extrapolation.