Stratiﬁ ed Estimation of Forest Inventory Variables Using Spatially Summarized Stratiﬁ cations

using spatially summarized stratiﬁ cations. Large area natural resource inventory programs typically report estimates for selected geographic areas such as states or provinces, counties, and municipalities. To increase the precision of estimates, inventory programs may use stratiﬁ ed estimation, with classiﬁ ed satellite imagery having been found to be an efﬁ cient and effective basis for stratiﬁ cation. For the beneﬁ t of users who desire additional analyses, the inventory programs often make data and estimation procedures available via the Internet. For their own analyses, users frequently request access to stratiﬁ cations used by the inventory programs. When data analysis is via the Internet and stratiﬁ cations are based on classiﬁ cations of even medium resolution satellite imagery, the memory requirements for storing the stratiﬁ cations and the online time for processing them may be excessive. One solution is to summarize the stratiﬁ cations at coarser spatial scales, thus reducing both storage requirements and processing time. If the bias and loss of precision resulting from using summaries of stratiﬁ cations is acceptably small, then this approach is viable. Methods were investigated for using summaries of stratiﬁ cations that do not require storing and processing the entire pixel-level stratiﬁ cations. Methods that summarized satellite image-based 30 m × 30 m pixel stratiﬁ cations at spatial scales up to 2400 ha produced stratiﬁ ed estimates of the mean that were generally within 5-percent of estimates for the same areas obtained using the pixel stratiﬁ cations. In addition, stratiﬁ ed estimates of variances using summarized stratiﬁ cations realized nearly all the gain in precision that was obtained with the underlying pixel stratiﬁ cations.


Introduction
National and large area natural resource inventory programs report design-based estimates of attributes based on measurements at sampling locations for selected geographic areas such as states or provinces, counties, and municipalities. Because users frequently desire additional analyses using the same data, the programs may provide public access to data via the Internet, an increasingly popular and useful medium for doing so. For example, in 2003, the Internet data access site of the Forest Inventory and Analysis (FIA) program of the Forest Service, U.S. Department of Agriculture (USDA) received approximately 15 000 requests for data retrievals and analyses. Other inventory programs that make data available via the Internet include the National Resource Inventory (NRI) of the USDA Natural Resource Conservation Service, the national forest inventory of Canada, and the national forest inventories of several European countries (e.g., Finland, France).
When data requests do not include exact sampling locations, then there are few constraints on data access. However, if exact locations are required for a user's area of interest (AOI), then several policy issues must be considered. First, revealing exact locations may entice users to visit the locations to obtain additional information, thus artifi cially disturbing vegetation on the sampling location. If permanent inventory plots are used, this disturbance contributes to inventory estimation bias. Second, sampling locations may be located on private land, and the land owners, while permitting access by inventory crews, generally prohibit additional access. In these situations, user visits to sampling locations may jeopardize future access by the inventory crews. Third, revealing the exact sampling locations may violate constraints on the release of proprietary information. Thus, if exact sampling locations are required for a user's data request, policy constraints may require that the analysis be completed online in a manner that does not disclose the locations.
For many inventory programs, the combination of budgetary constraints and the natural variability among plots prohibits sample sizes suffi cient to satisfy precision standards unless the estimation process is enhanced using ancillary data. One approach to enhancing the estimation process is to use stratifi ed estimation with stratifi cations based on classifi ed satellite imagery (McRoberts et al. 2002). Stratifi ed estimation requires that two tasks be accomplished: each sampling location must be assigned to a stratum, and stratum weights must be calculated as the proportions of the AOI in strata. When stratifi cations are derived from classifi ed satellite imagery, sampling locations are assigned to strata on the basis of the stratum assignments of their associated satellite image pixels, and stratum weights are calculated as the proportions of image pixels assigned to strata. Users who desire the precision gain associated with stratifi ed estimation but do not have access to exact sampling locations have two options: they may use their own stratifi cations and assign sampling locations to strata using approximate coordinates provided by the inventory program, or they may accept online analyses using the inventory program's stratifi cation and its assignment of sampling locations to strata using exact coordinates. Users usually prefer the second option.
If storage space and processing time were not constraints for online analyses, the inventory programs could provide access to their entire pixellevel stratifi cations. However, because storage space and online processing time are constraints, this approach is not feasible. Therefore, methods for making the advantages of the stratifi cations available to users via online analyses without requiring inordinate amounts of storage space and intolerable processing delays merit investigation. The objective of this study was to compare stratifi ed estimates of forest attributes obtained using pixel-level stratifi cations with stratifi ed estimates obtained using summaries of those stratifi cations. Spatial summarizations of pixel classifi cations and predictions are also useful for facilitating online, map-based estimation of forest attribute estimates obtained by aggregating individual pixel classifi cations or predictions for selected AOIs.

Data
Comparisons of stratifi ed estimates obtained with pixel stratifi cations and summaries of those strati-fi cations used 1998 FIA data for the State of Indiana, USA. The FIA sampling design for Indiana is based on an array of regular hexagons covering the conterminous USA ( Fig. 1) (McRoberts 1999). Each hexagon includes 2402.8 ha and contains at least one permanent sampling location. This array of sampling locations provides complete, consistent coverage of all lands in the USA. To facilitate intensifi cation of the sample for some states, the hexagons have been subdivided into three parallelograms denoted sub-hexagons. Sampling locations for the national FIA program are denoted plots and consist of four 7.31-m radius circular subplots confi gured as a central subplot and three peripheral subplots located at distances of 36.58 m and azimuths of 0°, 120°, and 240° from the centre of the central subplot. Among the observations FIA fi eld crews obtain are individual tree diameters and heights and the proportions of subplots that satisfy specifi c ground land use conditions. The tree diameter and height observations are used as predictor variables for calculating model predictions of individual tree volumes. Volume per unit area is calculated for each plot by adding the volume predictions over all trees on a plot and scaling the sum to a per unit area basis. Forest land proportion is obtained for each plot by aggregating the ground land use conditions into forest and non-forest categories. Analyses for this study focused on stratifi ed estimates of means and standard errors for tree volume per unit area and proportion forest land.
The National Land Cover Data (NLCD) is a satellite image-based classifi cation that has been used by the FIA program as a basis for stratifi cations. The NLCD, a digital product of the Multi-Resolution Land Characterization (MRLC) Consortium (Loveland and Shaw 1996), is a land cover map of the conterminous USA consisting of the assignment of each 30 m × 30 m pixel to one of 21 land cover classes. The classifi cation was produced by the U.S. Geological Survey and was based on nominal 1992 Landsat 5 Thematic Mapper (TM) satellite imagery and a variety of ancillary data (Vogelmann et al. 2001).

Stratifi ed Estimation
The regional FIA program of the North Central Research Station, USDA Forest Service, derives stratifi cations from the NLCD (McRoberts et al. 2002). First, all the NLCD forest classes are aggregated into a forest stratum, and the remain- ing classes are aggregated into a non-forest stratum. Second, two additional strata are created by subdividing the forest stratum into forest and forest edge strata and by subdividing the nonforest stratum into non-forest and non-forest edge strata. The edge strata are created by assigning pixels in the original forest stratum within two pixels of the forest/non-forest boundary to the forest edge stratum and pixels in the original non-forest stratum within two pixels of the forest/non-forest boundary to the non-forest edge stratum. The rationale for this approach to stratifi cation is discussed in detail by McRoberts et al. (2002). A minimum of fi ve plots per stratum is considered necessary to obtain reliable stratifi ed estimates. If fewer than fi ve plots are assigned to a stratum, then the forest and forest edge strata are collapsed into a single forest stratum and/or the non-forest and non-forest edge strata are collapsed into a single non-forest stratum. If either collapsed stratum has fewer than fi ve plots, then stratifi ed estimation is deemed inappropriate for the AOI. This approach to stratifi cation yields stratifi ed variance estimates for proportion forest land that are smaller by factors as great as 4.0 than corresponding estimates calculated under the assumption of simple random sampling (SRS) (McRoberts et al. 2002).
The national FIA program assigns plots rather than subplots to strata to avoid the mathematical complexities necessary to accommodate the spatial correlations among subplot observations when calculating variances. Nevertheless, assigning FIA plots to strata is not trivial, because each plot is covered by multiple 30 m × 30 m pixels. For this study, each plot was assigned to the stratum of the pixel corresponding to the plot centre. Stratum weights were then calculated as the proportions by strata of pixels with centres in the AOI. This stratifi cation method was designated the pixel method and was the standard for com-pixel method and was the standard for com-pixel parison for other methods (Fig. 2).
Stratifi ed estimates of means and variances of means were obtained using formulae from Cochran (1977): These formulae ignore fi nite population correction factors.

Stratifi ed Estimation with Estimated Stratum Weights
Stratum weights obtained from classifi ed satellite imagery are only estimates of the true weights because imagery at all resolutions aggregates forest attribute information at a coarser spatial scale than it naturally occurs. Use of estimated rather than known weights contributes to bias in the stratifi ed estimates of the mean and to increases in the stratifi ed variance estimates. Cochran (1977) provides a formula for the bias which, when substituting sample stratum means, where {w h } are the estimated stratum weights. The effects of estimated stratum weights on stratifi ed variances may be evaluated by considering the problem as one of double sampling for stratifi cation with the satellite image pixels as the fi rst phase sample and the plots as the second phase sample. Cochran (1977) provides the following formula for the stratifi ed variance when using estimated stratum weights, where N is the population size and N is the population size and N nʹ is the size of the fi rst sample. When using classifi ed satellite imagery as the basis for stratifi cations, nʹ is the number of pixels with centres in the AOI. For a circle of radius 15 km, nʹ = 785 000; for the smallest Indiana county, nʹ = 473 000; and for a user AOI with two strata, fi ve plots per stratum, one plot per hexagon, and 26 000 pixels per hexagon, nʹ > 250 000. Thus, , and for an infi nite population, When compared to [2], [5] reveals that for nʹ large, the differences between known and estimated stratum weights account for all but negligible differences in variance estimates. For this study, stratum weights obtained using the classifi ed satellite imagery are the standard for comparison and are assumed to be the true stratum weights, {W h W h W }, and stratifi ed estimates using these weights are assumed to be unbiased.

Stratifi ed Estimation with Stratifi cation Summaries
Stratifi cations were summarized by counting the number of pixels by stratum for summary units of a variety of spatial scales. Beginning with an arbitrarily selected starting point, the state of Indiana was tesselated by a set of non-overlapping squares with 1-km sidelengths. The numbers of pixels with centres in each square kilometre were counted by stratum and retained. Stratifi cations were summarized at the sub-hexagon and hexagon level in the same manner as for square kilometres.
To obtain stratifi ed estimates with stratifi cation summaries, the same two stratifi cation tasks must be accomplished: plots must be assigned to strata, and stratum weights must be calculated. The fi rst task is easily accomplished by using the same stratum assignments as for the pixel stratifi cations. Several methods were used to accomplish the second task. One method was to select the summary units with centres in the AOI, add the pixel counts by stratum over these summary units, and calculate stratum weights as the proportions of pixels assigned to strata. This method, based on including or excluding a summary unit's pixel counts in the overall total on the basis of whether the centre of the summary unit was inside the AOI, was designated the centre estimation method and was used in combination with the square kilometre (100 ha), sub-hexagon (~8000 ha), and hexagon (~2400 ha) summary units (Fig. 3).
Pixel counts by stratum obtained using the centre method with stratifi cation summaries will not be the same as those obtained using the pixel method, because the exterior boundaries of the spatially aggregated summary units will not coincide exactly with the boundaries of the AOI. Two categories of summary units are distinguished: 1 0 ′ ≈ n Fig. 3. a) Square kilometre summary unit with centre estimation method for Scott County, Indiana, USA. All pixels are counted for square kilometres with centres in the county. b) Sub-hexagon summary unit with centre estimation method for Scott County, Indiana, USA. All pixels are counted for sub-hexagons with centres in the county. c) Hexagon summary unit with centre estimation method for Scott County, Indiana, USA. All pixels are counted for hexagons with centres in the county.
interior summary units are wholly within the AOI, while boundary summary units are only partially within the AOI. Boundary units cause two kinds of errors in pixel counts for an AOI. Boundary units with centres inside the AOI cause pixel counts to include some pixels outside the AOI, and boundary units with centres outside the AOI cause pixel counts to exclude some pixels inside the AOI. However, stratifi ed estimation does not require pixel counts by stratum but rather stratum weights that are proportions of pixel counts by stratum. Thus, even though some pixels are erroneously included and some are erroneously excluded in the pixel counts by stratum over summary units, the stratum weights may still be approximately correct. Two additional methods for summarizing stratifi cations were investigated. The fi rst included pixel counts for boundary summary units on the basis of whether any part of a unit was inside the AOI. This method was designated the exterior estimation method, because the boundaries of the spatially aggregated summary units coincided with or were exterior to the boundaries of the AOI. The exterior method counted all pixels in a boundary unit, regardless of whether the unit centre was inside or outside the AOI and was used only in combination with hexagon summary units (Fig. 4). The corresponding interior estimation method was not investigated because of the risk of excluding so many pixels that the stratum weights would not adequately represent the AOI. This risk is particularly acute for AOIs with small, fragmented components for which the ratio of interior to boundary summary units is small.
The second additional summary method was designed to compensate for pixel inclusion and exclusion errors. With this method, the total pixel counts by stratum for boundary summary units were adjusted by multiplying them by the proportion of the unit in the AOI. The additional computation necessary to determine the proportion for each boundary unit may produce more accurate pixel counts, more accurate stratum weights, and hence, more accurate stratifi ed estimates. This method was designated the proportional estimation method and was used only in combination with hexagon summary units (Fig. 5). The proportional estimation method should be distinguished from the method that fi rst determines the portion of the unit in the AOI and then counts only those pixels in the selected portion. Although this is exactly the method that would be used under ideal conditions, it is also exactly the storage-and processing-intensive method for which this study sought alternatives.

Interpreting Estimates Obtained with Stratifi cation Summaries
Estimates obtained with stratifi cation summaries for an AOI may be interpreted in two ways: 1) as estimates obtained with an approximate stratifi cation for the exact AOI, or 2) as estimates obtained with an exact stratifi cation for the approximate AOI defi ned by the boundaries of the selected summary units. If bias in stratifi ed estimates due to using an approximate stratifi cation is small and the gain in precision realized from stratifi ed estimation is acceptable, then the fi rst interpretation is preferable because the estimates are for the exact AOI. The specifi c objectives of the study were to estimate the bias and the precision loss for estimates obtained using stratifi cation summaries. If the bias is too great, users may accept the second interpretation, or if there is too much loss in precision, they may revert to estimation under the SRS assumption.

Analyses
Two sets of analyses were used to compare stratifi ed estimates obtained using the pixel method to estimates obtained using the stratifi cation summary methods. For both sets of analyses, the comparisons focused on estimates of means and standard errors of volume per unit area and proportion forest land for selected areas in the State of Indiana, USA (Fig. 1). The fi rst set of analyses evaluated differences between estimates obtained using the pixel stratifi cations and estimates obtained using stratifications summarized at the spatial scales of square kilometre, sub-hexagon, and hexagon for Indiana counties. For each Indiana county for which there were at least fi ve plots in at least each of the two collapsed forest and non-forest strata, stratifi ed estimates of the mean were calculated for the pixel method using [1] and for each summarization method using [1], except that estimated stratum weights obtained from stratifi cation summaries were used. Estimates of the variances were calculated for each county for the pixel method using [2] and for each summarization method using [5].
In addition, estimates of the means and standard errors were calculated under the SRS assumption, and bias was estimated as the difference between the stratifi ed estimates of the mean obtained using the pixel method and the summarization methods. Finally, for each stratifi cation summary method, four comparative measures were calculated: 1) the ratio, RB mn , of the bias and the stratifi ed mean calculated using [1]; 2) the ratio, RB SE , of the bias and the estimate of the stratifi ed standard error for the pixel method; 3) the ratio, RV, of the stratifi ed variance estimate obtained with the pixel method and the estimate obtained with the stratifi cation summary method; and 4) relative effi ciency, RE, calculated as the ratio of the variance of the mean obtained under the SRS assumption and the stratifi ed variance. The second set of analyses compared methods for areas enclosed in two sets of concentric circles of radii 16.09 km (10 mi), 32.19 km (20 mi), 48.28 km (30 mi), 64.37 km (40 mi), and 80.47 km (50 mi) (Fig. 6). One set of circular areas had its centre in an area of northern Indiana with sparse, fragmented forest, while the other set had its centre in a more heavily forested area in southern Indiana that includes the Hoosier National Forest (HNF). The purpose of these analyses was to compare estimates of the means and standard errors for areas that mimic user-defi ned AOIs. Circular AOIs of radius 16.09 km include 81 332 ha, and with a sampling intensity of one plot for every 2402.8 ha, were expected to include approximately 34 plots. Assuming moderate variability in the number of plots per stratum, circular areas of radius 16.09 km were about the smallest areas that could be expected to have fi ve plots assigned to each of the four strata. The same estimates and measures were calculated for the circular areas as were calculated for the counties.
The assumption when using stratifi cation summaries is that stratum weights obtained from the summaries will be similar to stratum weights obtained from the pixel stratifi cations. Similarity in stratum weights was expected to be related to similarity between an AOI and the analytical area represented by the spatial aggregation of the AOI's stratifi cation summary units. For the hexagon-exterior method, the boundary of an analytical area was the extension of the AOI boundary to the exterior boundaries of any hexagons through which the boundary of the AOI passed. For the hexagon-centre method, the boundary of the analytical area was the extension of the AOI boundary to the exterior boundaries of hexagons with centres in the AOI and the contraction of the AOI boundary to the interior boundaries of hexagons with centres outside the AOI. Pixel stratifi cations and stratifi cation summaries are exactly the same for interior summarization units and differ only for boundary summary units. Thus, one measure of the potential for differences in stratum weights for an AOI and corresponding analytical area is the ratio of interior to boundary summary units for the AOI. Large ratios indicate that stratifi ed estimates obtained with the pixel stratifi cations and the stratifi cation summaries would be expected to be similar; small ratios indicate greater potential for the two sets of stratifi ed estimates to be dissimilar. Small ratios indicate only a potential for dissimilarity, because if the proportions of pixels by stratum in the boundary unit within the AOI are similar to the proportions for the entire boundary unit, then the stratum weights and stratifi ed estimates would still be expected to be similar.
The relationship between the ratio of interior to boundary hexagons and the bias in stratifi ed estimates of the mean was analysed by graphing absolute values of RB mn obtained using the hexagon-centre method versus ratios of interior to boundary hexagons. The graphs included one data point for each Indiana county, one point for each of the northern and southern circular areas, and for one point for HNF. HNF consists of the aggregation of many relatively small, non-contiguous fragments of heavily forested areas in south central Indiana (Fig. 7) and is the kind of area for which stratifi cation summaries may not produce acceptable estimates for two reasons. First, due to land management practices, the portions of hexagons within HNF are expected to be heavily forested, while the portions outside HNF may or may not be forested. Second, because the contiguous fragments of HNF are relatively small, the ratio of interior to boundary hexagons was also expected to be small. Thus, the ratios of interior to boundary hexagons were expected to range from very small for HNF to quite large for the 80.47-km circular areas.

County-level Analyses
Seven Indiana counties were eliminated from the analyses as a result of the requirement that at least fi ve plots be assigned to at least each of the two collapsed forest and non-forest strata; 85 counties remained. The analyses indicated that stratifi ed estimates of both means and variances obtained using the stratifi cations summarized at the spatial scale of square kilometre, sub-hexagon, and hexagon differed little from the estimates obtained using the pixel method (Tables 1a and 1b). As expected, differences were less for smaller summary units. Among the three methods using hexagon summary units, results for the hexagon-exterior method were inferior, while results for the hexagon-centre and hexagon-proportional methods were similar to each other. RE values indicated that the stratifi cations produced substantial gains in precision, averaging nearly 2.0 for volume per unit area and more than 3.0 for proportion forest land. The average RE for proportion forest area was slightly better than the RE obtained by McRoberts et al. (2002). Counties for which RE < 1.0 were characterized by small estimates of proportion forest land and were relatively few in number, 16 for volume per unit area and six for proportion forest land. Calculation of estimates under the SRS assumption is an easy task and permits these counties to be readily identifi ed. Little precision was lost with the stratifi cation summaries compared to the pixel stratifi cations. Excluding the inferior hexagon-exterior method, the variances were always within ±10 percent of the variance obtained with the pixel method. Because of its inferiority relative to the other two hexagon methods, the hexagon-exterior method was not evaluated further. The hexagon-centre and the hexagon-proportional methods produced similar results, but the hexagon-centre method was preferable because of its less intense processing requirements. Therefore, only the square kilometre, sub-hexagon, and hexagon summary units with the centre method were evaluated further.

Analyses for Circular AOIs
The second set of analyses was based on simulated user-defi ned AOIs. Results for the square kilometre, sub-hexagon, and hexagon summary units, all of which used the centre method, were similar to each other and to results for the pixel method which was the standard for comparison (Tables 2a and 2b). Results for smaller summary units were slightly more similar to the results for the pixel method. These similarities held for estimates of both means and standard errors.

Interior/ Boundary Hexagon Analyses
For the hexagon-centre method, graphs of RB mn versus the ratio of interior to boundary hexagons (Fig. 8) indicated that as the ratio increased, RB mn decreased. However, the graphs also indicated considerable variability in the relationship. For volume per unit area the ratio of interior to boundary hexagons for the circular areas ranged from slightly less than 0.89 to approximately 6.63, while RB mn ranged from 0.00 to 0.02. For the Indiana counties, the ratio ranged from 0.20 to approximately 1.24, with corresponding range in RB mn of 0.00 to 0.08, although the majority of RB mn observations were less than 0.03. The wide range of the RB mn observations illustrated that when the ratio of interior to exterior hexagons was in the 0.20 to 1.24 range, RB mn-vol could be as great as 0.08 but that this maximum was not always realized. The two circular areas with radius 16.09 km had ratios of 0.89 and 1.00, well within the upper range of the county ratios, and the corresponding RB mn observations were less than 0.01 and 0.03, well within the lower range of RB mn for the counties. These results suggested continuity in the relationship between the ratio of interior to boundary hexagons and RB mn , despite differences in the county and circular areas that produced the data points. The ratio of interior to boundary hexagons for HNF was zero; there were no interior hexagons for HNF. For HNF, RB mn was 0.06, well within the upper range of values of RB mn for the counties, again suggesting continuity. Although considerable variability in RB mn would be expected for other areas similar to HNF, this single point illustrated that even with a ratio of zero, RB mn may be relatively small. However, a wise precaution would be to evaluate such situations on a case-by-case basis. Nevertheless, ratios of interior to boundary hexagons that were greater than 0.25, nearly always resulted in RB mn < 0.05. Slightly better results were obtained for proportion forest land.

Conclusions
Three conclusions may be drawn from this study. First, on average, results obtained with the square kilometre, sub-hexagon, and hexagon summary units with the centre method were all acceptable approximations to the results obtained with the pixel stratifi cations which were the standard for comparison. Second, the hexagon-centre method was preferred overall: it produced satisfactory results relative to the standard for comparison, the pixel method; it has direct linkages to the FIA national sampling design; and it requires the least storage and processing time of all the stratifi cation summary methods considered. The quality of results for the hexagon-centre method suggests that it should be investigated in other areas with different topographies, tree species, and forest management practices. Third, ratios of interior to boundary hexagons greater than 0.25 nearly always resulted in differences in estimates relative to the mean of volume per unit area and proportion forest land that were smaller in absolute value than 0.05. Differences this small will generally be considered an acceptable price to pay to realize virtually all the gain in precision that the underlying pixel stratifi cation provides. The necessity of summarizing stratifi cations may be alleviated as the costs of computer storage and processing decrease. However, the tendency toward using fi ner resolution stratifi cations as they become available may exacerbate the problem. Regardless of whether the problem is alleviated or exacerbated, it is worth noting the difference in storage requirements for the underlying stratifi cation and a summary of the stratifi cation at the hexagon level. For hexagons of 2402.8 ha, the storage requirement for the underlying stratifi cation is one cell for each of the 26 696 30 m × 30 m pixels. When summarizing a stratifi cation at the hexagon level, four cells are required, one for each of the four stratum pixel counts. The magnitude of this ratio, 26 696:4, which is also an approximation of the factor by which computer processing requirements may be reduced, more than justifi es proportional differences in estimates of 0.05 or less when the benefi ts of stratifi cation are also realized.