Incorporating tree- and stand-level information on crown base height into multivariate forest management inventories based on airborne laser scanning

This study examines alternatives for including crown base height (CBH) predictions in operational forest inventories based on airborne laser scanning (ALS) data. We studied 265 field sample plots in a strongly pine-dominated area in northeastern Finland. The CBH prediction alternatives used area-based metrics of sparse ALS data to produce this attribute by means of: 1) Tree-level imputation based on the k-nearest neighbor (k-nn) method and full field-measured tree lists including CBH observations as reference data; 2) Tree-level mixed-effects model (LME) prediction based on tree diameter (DBH) and height and ALS metrics as predictors of the models; 3) Plot-level prediction based on analyzing the computational geometry and topology of the ALS point clouds; and 4) Plot-level regression analysis using average CBH observations of the plots for model fitting. The results showed that all of the methods predicted CBH with an accuracy of 1–1.5 m. The plot-level regression model was the most accurate alternative, although alternatives producing tree-level information may be more interesting for inventories aiming at forest management planning. For this purpose, the k-nn approach is promising, and it only requires that field measurements of CBH are added to the tree lists used as reference data. Alternatively, the LME approach produced good results, especially in the case of dominant trees.


Introduction
The size of a tree crown is an indicator of tree health, vigor, and growth and yield potential (Smith 1986; Salminen et al. 2005). Crown size is usually defined in terms of crown length, crown ratio (i.e., crown length divided by the total height of a tree), crown surface area or crown volume. Of those measures, crown length and crown ratio can be obtained most simply, because tree height is typically measured in forest inventories and only crown base height (CBH) needs to be measured in addition. The usability of CBH is not limited to crown size calculation: tree-level CBHs have been used to determine the quality classes and value of sawn logs (Uusitalo 1995; Verkasalo et al. 2004), and stand-level mean CBH can also be used to indicate wood quality (Wall et al. 2004). CBH has also been used as a tree-level predictor variable in tree growth (Salminen et al. 1995) and biomass component models (Repola 2009) and as a stand-level predictor variable in fire behavior models (Andersen et al. 2005). CBH varies considerably between tree species, being usually at a higher relative height for dominant species, such as Scots pine (Pinus sylvestris L.), and lower for shade-tolerant species, such as Norway spruce (Picea abies [L.] H. Karst.). CBH is also related to the silvicultural history, age and social status of a tree in a stand.
Information obtained by airborne laser scanning (ALS) is strongly related to some forest inventory attributes, such as tree height, different plot-level mean height characteristics, canopy cover and canopy gaps, and these attributes can be derived almost directly from the data (Naesset 2002; Thomas et al. 2008;Vepakomma et al. 2008;Korhonen et al. 2010). On the other hand, characteristics such as tree diameter distribution cannot be directly observed with ALS data and statistical relationships must be utilized to predict those (Gobakken and Naesset 2004).
Based on ALS data, the uppermost surface of the tree crowns is typically described most accurately, since the first ALS echoes mainly reflect back from the top of the canopy. The description becomes less accurate towards the lower end of the canopy, i.e., around the height of the crown base, because most of the first echoes have already reflected back and the subsequent echoes may suffer from transmission losses due to penetrating the upper canopy. The challenges related to predicting CBH by ALS are further reviewed by Maguya (2015; Chapter 5), and at least three different prediction approaches can be distinguished based on earlier literature. First, some researchers have attempted to derive the CBH directly from the data, applying polygon, alpha shape or voxel based crown or canopy approximations (Pyysalo and Hyyppä 2002; Holmgren et al. 2008; Popescu and Zhao 2008; Vauhkonen 2010; Maltamo et al. 2010). Second, it may be possible to predict CBH according to descriptive statistics of the ALS-based height value distribution without any statistical model (Solberg et al. 2006; Dean et al. 2009; Maltamo et al. 2010), especially when combined with expert opinion or calibration field data to indicate what statistics are relevant for the CBH. Third, using field data more integrally for model fitting, the relationships between ALS features and crown base height can be estimated using regression analysis or corresponding techniques (Naesset and Okland 2001; Andersen et al. 2005; Maltamo et al. 2006, 2010). In Finland, ALS-based approaches have been established for the data acquisition of operational forest management inventories. Due to the requirements of the planning system, the inventory system must produce data at the tree-level, but practical accuracy and cost reasons prohibit the use of individual tree detection approaches.
Instead, the practical forest inventories in Finland use area-based ALS data and k-nearest neighbor (k-NN) imputation to produce stand attributes such as mean diameter, dominant height, basal area and volume (for details, see Maltamo and Packalen 2014). Although the stand description is predicted at an area-level, the use of tree size distribution models transforms the description back to the tree-level, meaning that information of both tree and stand levels is available for the subsequent management or wood procurement planning. As reviewed above, it would benefit the aforementioned purposes if CBH predictions were also available at either the tree or stand level. However, even though earlier studies have confirmed that such predictions are feasible based on available ALS data, there are no studies that recommend how this information should be linked to multivariate forest inventories, such as those described by Maltamo and Packalen (2014).
The aim of this study is to examine the possibilities to include crown base height prediction into multivariate forest inventory information using ALS. The study compares four different methods to predict the CBH, which can all be implemented with sparse ALS data and area-based metrics. The methods require different levels of field reference data and produce either tree- or plot-level predictions. However, all methods are cross-validated at the plot-level, which corresponds to the scale at which the aforementioned inventory systems are operated.

Study area and field data
The study area was located in Kuhmo, eastern Finland, and it was a combination of two separate laser scanning areas (Fig. 1). The Kuhmo region is very homogenous in terms of species, being strongly dominated by Scots pine. Also Norway spruces and deciduous trees, mainly birches (Betula spp.) and aspen (Populus tremula L.), are typically found, but especially the deciduous species usually form only minor proportions of suppressed trees.
In this study, 265 field sample plots with co-located ALS and field data were available. The plots had been placed in L-shaped clusters of 5 plots according to old inventory data on development stages of forests in the area. The distance between plots in a cluster was 150 m. Every tree with a diameter at breast height (DBH) greater than 5 cm had been measured from the circular sample plots with a radius of 9 meters. The measured attributes were DBH, CBH, and species. Species-specific median trees of each plot, selected according to tree basal area, were measured for height. Tree-level basal areas were first calculated by the formula π × (DBH/2)² and the plot basal area (G) was summed from these measurements. The missing tree heights were predicted using Näslund's (1936) height curve. On each plot, these curves were first calibrated using the height and DBH of the species-specific median trees (Siipilehto 1999). Tree stem volumes were estimated using Laasasenaho's models (1982), which include the DBH, height and tree species as predictors. Pine and spruce had their own separate models and the model for birch was used in the case of all deciduous trees. The most essential characteristics of the studied plots are shown in Table 1.
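The field computations described above can be illustrated with a minimal sketch. The function names, the scaling of the plot basal area to a per-hectare value, and the curve parameters a and b are assumptions made for illustration; the height curve is written in the commonly cited Näslund form with exponent 2, which may differ from the species-specific forms actually calibrated.

```python
import math

PLOT_RADIUS_M = 9.0  # circular plot radius stated in the text


def tree_basal_area_m2(dbh_cm):
    """Cross-sectional area at breast height, pi * (DBH/2)^2, in square meters."""
    radius_m = dbh_cm / 100.0 / 2.0
    return math.pi * radius_m ** 2


def plot_basal_area_m2_per_ha(dbh_list_cm, plot_radius_m=PLOT_RADIUS_M):
    """Sum the tree basal areas and scale to one hectare (the scaling is an assumption)."""
    plot_area_ha = math.pi * plot_radius_m ** 2 / 10000.0
    return sum(tree_basal_area_m2(d) for d in dbh_list_cm) / plot_area_ha


def naslund_height_m(dbh_cm, a, b):
    """Naslund-type height curve h = 1.3 + d^2 / (a + b*d)^2; a and b are hypothetical."""
    return 1.3 + dbh_cm ** 2 / (a + b * dbh_cm) ** 2
```

In practice, a and b would be calibrated plot by plot from the measured median trees, as the text describes; that calibration step is not sketched here.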
The final study material consisted of 6527 trees, of which 3245 were Scots pines, 1661 Norway spruces and 1621 deciduous trees, the latter treated as a combined species group in this study. The pine proportion of stem number was therefore 49.7% and, on average, pines were in a dominant position in the stands and considerably larger than other tree species (Table 1).
Categorical forest characteristics measured from the plots included soil type, site type, development stage and dominant species (Table 2). According to this information, the plots represented widely different types of forests. Most of the plots had a mean DBH between 8 and 16 cm and could therefore be categorized as young stands, for which reason the mean CBHs were fairly low in the entire data (Table 1).

ALS data
The study area was laser scanned from the air between 4th and 7th of September, 2011, when the deciduous trees still had their leaves on. The scanning was performed with a Leica ALS50-II sensor from a flying altitude of 2000 m using a scanning angle of 30 degrees, scanning rate of 52 Hz and pulse frequency of 58.9 Hz. These parameters yielded a nominal data density of 0.52 pulses m⁻². The data were acquired and pre-processed, including height normalization, by Arbonaut, Ltd.
Height-normalized first echoes ("single" or "first-of-many") of each pulse were extracted from the region of each plot. The ground threshold of 2 m was employed when computing the ALS features. Altogether 33 different plot-level ALS features were computed from the point data, including, e.g., percentiles and densities at relative heights 0.05, 0.1, 0.2, …, 0.9, 0.95 and central statistics of both the height and intensity values (Table 3).

Table 3. The ALS metrics used in this study. Suffixes single and first refer to using only single or first-of-many echoes, respectively; otherwise, the metrics were computed using pooled single and first-of-many echoes.

Metric          Description
CBHas           CBH prediction based on alpha shapes (see Section 2.3.3)
Hmax            Maximum height
Hmean           Mean height
Hstd            Standard deviation of height
Vegeratio       Ratio of the number of echoes above 2 m to the total number of echoes
H05…H95         ith percentile of height
D05…D95         Relative density at height i
Imean           Mean intensity
Imean single    Mean intensity (single echoes)
Imean first     Mean intensity (first-of-many echoes)
Istd            Standard deviation of intensity
Istd single     Standard deviation of intensity (single echoes)
Istd first      Standard deviation of intensity (first-of-many echoes)
Prop_first      Proportion of first-of-many echoes
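As a rough illustration of how area-based height metrics of this kind can be derived from the echo heights of one plot, consider the following sketch. It is not the processing chain used in the study: the percentile interpolation rule, the exclusion of echoes at or below the threshold from the height statistics, and in particular the definition of relative density (here the share of vegetation echoes at or below the fraction i of the maximum height) are assumptions, and the function names are hypothetical.

```python
import math
import statistics

LEVELS = (0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95)


def percentile(sorted_values, p):
    """Percentile by linear interpolation between order statistics (one convention of several)."""
    pos = p * (len(sorted_values) - 1)
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac


def plot_metrics(heights, ground_threshold=2.0):
    """Height metrics for one plot from height-normalized echo heights (meters)."""
    # Assumption: only echoes above the ground threshold enter the height statistics.
    veg = sorted(h for h in heights if h > ground_threshold)
    if len(veg) < 2:
        raise ValueError("need at least two echoes above the threshold")
    hmax = veg[-1]
    metrics = {
        "Hmax": hmax,
        "Hmean": statistics.mean(veg),
        "Hstd": statistics.stdev(veg),
        "Vegeratio": len(veg) / len(heights),
    }
    for p in LEVELS:
        label = f"{round(p * 100):02d}"
        metrics["H" + label] = percentile(veg, p)
        # Assumed density definition: share of vegetation echoes at or below p * Hmax.
        metrics["D" + label] = sum(h <= p * hmax for h in veg) / len(veg)
    return metrics
```

Intensity statistics and the echo-type split (single versus first-of-many) would follow the same pattern on the corresponding subsets of echoes.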

Methods to predict crown base height
This study compared altogether four different methods that can be implemented with sparse, area-based ALS data. The methods, however, require different degrees of reference data in order to be run and therefore result in CBH predictions at different levels (tree or plot). The compared methods can be summarized according to their reference data requirements as follows:
1. Tree-level imputation requires that full field-measured tree lists including CBH observations are available as reference data.
2. The tree-level mixed-effects modelling approach requires similar reference data as alternative 1 for model fitting.
3. Plot-level prediction based on alpha shapes examines the properties of the point clouds for prediction and can be applied without field information.
4. Plot-level regression analysis requires plot-level CBH observations for model fitting.

Tree-level imputation
The tree list imputation based on k-NN is fundamentally similar to the method of Packalen and Maltamo (2008) and is therefore referred to as PMk-NN. The idea is to search for the k most similar plots in terms of chosen independent variables and to predict tree lists as weighted unions of trees measured from those plots, using the selected distance metric as the weights. The predicted tree lists, therefore, include all tree attributes available for each tree, being the species, DBH, height, and CBH in our case. The use of an imputation method requires choices on dependent and independent variables, distance metric, and a value of k within available training plot data (Maltamo et al. 2009), each of which affects the predictions (Latifi et al. 2010). These factors can be optimized with respect to desired accuracies in the training data (Packalen et al. 2012). As we focused on CBH, our PMk-NN was tuned for these predictions, as explained below and further discussed in the Discussion section.
PMk-NN is based on the most similar neighbor (MSN) method, which employs canonical correlation analysis to produce a weighting matrix for selecting the NNs from the training data. Both the variable selection for the correlation analysis and the NN search were carried out in a leave-one-plot-out fashion using the yaImpute package (Crookston and Finley 2008) of the R statistical computing environment. The dependent (Y) and independent (X) variables were selected according to the Root Mean Squared Error (RMSE) of basal-area-weighted CBH with both k = 1 and k = 5, testing different variable combinations in two steps. First, the plot volume, basal area, the mean, minimum and maximum diameter and height, and 10th, 20th, ..., 90th percentiles of diameter and height distributions were tested one by one as the Y-variable, using all available ALS features to form the vector of X-variables. The Y-variable that minimized the RMSEs was retained and new Y-variables were appended until the RMSE accuracy could not be improved. Then, the vector of Y-variables was fixed to that obtained above and the selection was repeated similarly for the X-variables. The sensitivity of both variable combinations was finally tested by replacing each selected X and Y variable with all possible non-selected candidates.
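The stepwise logic of this variable selection can be sketched as a generic greedy forward search. This is a simplified illustration rather than the procedure actually run with yaImpute: the `score` callable is a hypothetical stand-in for the cross-validated RMSE of basal-area-weighted CBH, and the sketch uses a single score, whereas the study considered the RMSE with both k = 1 and k = 5.

```python
def forward_select(candidates, score, tolerance=0.0):
    """Greedy forward selection.

    candidates: iterable of variable names.
    score: hypothetical callable mapping a list of names to an error value
           (lower is better), e.g. a cross-validated RMSE.
    Stops when no remaining candidate lowers the score by more than `tolerance`.
    """
    selected = []
    best = float("inf")
    remaining = list(candidates)
    while remaining:
        # Candidate that gives the lowest score when appended to the current set.
        trial = min(remaining, key=lambda c: score(selected + [c]))
        trial_score = score(selected + [trial])
        if trial_score >= best - tolerance:
            break
        selected.append(trial)
        remaining.remove(trial)
        best = trial_score
    return selected, best
```

Running the search once over the Y-candidates with all X-variables fixed, and then over the X-candidates with the chosen Y-variables fixed, would mirror the two-step order described above.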
When predicting, the k parameter was set to 5, meaning that the predicted tree list of each plot was composed of the trees from the 5 most similar plots. The CBH prediction of each plot was computed as the distance-weighted mean of the CBH measurements of the 5 plots. The relative weight w of individual plot i was calculated as: where D_i is the distance between the target and the plot i and n is the number of plots, in this case 5.
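A distance-weighted mean over the nearest plots can be sketched as follows. The weighting formula is not reproduced in the text above, so the inverse-distance form used here (each weight proportional to 1/D_i and normalized to sum to one) is an assumption, as are the function names; other forms, such as weights proportional to 1/(1 + D_i), are equally conceivable.

```python
def inverse_distance_weights(distances):
    """Normalized weights proportional to 1/D_i (assumed form, not taken from the source)."""
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be positive for inverse-distance weighting")
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


def distance_weighted_mean(values, distances):
    """Weighted mean of neighbor values, e.g. plot-level CBHs of the k nearest plots."""
    weights = inverse_distance_weights(distances)
    return sum(w * v for w, v in zip(weights, values))
```

With k = 5, `values` would hold the CBH values of the five most similar plots and `distances` their MSN distances to the target plot.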

Tree-level mixed-effects model
In the tree-level modelling, the linear mixed-effects model (LME) was developed for predicting CBH of pine using tree DBH and height as the main predictors. In ALS-based forest inventory applications, the DBH and height are not known but they must first be predicted. Therefore, the constructed model was applied to the tree-lists produced by the PMk-NN method described above.
Only the plots with pine as the dominant species were used as modeling data, since mainly the dominant trees affect the plot-level ALS metrics and our inventory area is strongly dominated by pine. The model included both the tree-level variables measured in the field and ALS-derived plot-level variables. Different ALS-derived variables, such as height percentiles (H05, H10, H20, …, H95), the density values at each quantile (D05, D10, D20, …, D95), ALS-based mean height and crown height, were tested as plot-level independent variables (Table 3). To account for the hierarchical data structure and obtain unbiased tests for variables at each level, the model was formulated as a linear mixed model (Searle 1987) with both fixed and random effects as follows:
$$\mathrm{CBH}_{ij} = a_0 + \mathbf{a}^{\top}\,\mathrm{TREE}_{ij} + \mathbf{b}^{\top}\,\mathrm{PLOT}_{i} + u_i + e_{ij}$$
where CBH is the response variable, a_0 is the intercept, a and b are vectors of fixed regression coefficients, TREE is a vector of field-measured tree characteristics for tree j in plot i, PLOT is a vector of ALS-based stand characteristics in plot i, u_i is the random effect of plot i and e_ij is the residual error for tree j in plot i.

Plot-level prediction based on alpha shapes
The implementation of the alpha shape method (α-shape) corresponds precisely with the "Prediction alternative A" of Maltamo et al. (2010). It is based on examining geometrical and topological properties of the 3D point data to detect vertical gaps that distinguish canopy from the terrain and possible understorey shrubs. If such gaps exist and appear larger than within-canopy gaps in the ALS data, this method can detect the CBH directly from the point data and without any field work. According to the experiences of Maltamo et al. (2010), however, these predictions can be expected to include 20-40% more error compared to best ones obtained with field calibration data.
To implement this method, the plot-level ALS point clouds were intersected by a vector grid with a cell size of 4 × 4 m (Maltamo et al. 2010) to derive three-dimensional Delaunay triangulation and alpha shape (Edelsbrunner and Mücke 1994) based on the point data of each cell. CBHs were first predicted separately for each cell by filtering the alpha shapes according to parameter alpha, which can be considered as a size-criterion to determine the level of detail in the obtained triangulation (Vauhkonen 2010). The filtering started from such an alpha value that delimited the point cloud into one connected component representing canopy. The alpha values were traversed in descending order until a new component was split. The vertical position of each split component was examined relative to the initial component. The traversal of the alpha values was continued until new components located completely below the highest (canopy) component and above the ground threshold of 1 m could not be extracted. Cell-specific CBHs were defined as the heights of the lowest vertices in the filtered alpha shapes and plot-level predictions were obtained as their weighted averages, using the joint areas of the plot and cells as the weights.
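The final aggregation step, from cell-specific CBHs to a plot-level value, is a plain area-weighted average and can be sketched as below. The alpha shape filtering itself is not reproduced here. The function name and input layout are hypothetical, and skipping cells for which no CBH could be extracted is an assumption.

```python
def plot_cbh_from_cells(cells):
    """Area-weighted mean of cell-level CBHs.

    cells: iterable of (cell_cbh_m, overlap_area_m2) pairs, where the overlap is
    the joint area of the grid cell and the plot. Cells whose CBH is None
    (no prediction available) are skipped, which is an assumption.
    """
    usable = [(cbh, area) for cbh, area in cells if cbh is not None and area > 0]
    total_area = sum(area for _, area in usable)
    if total_area == 0:
        raise ValueError("no cells with a CBH prediction overlap the plot")
    return sum(cbh * area for cbh, area in usable) / total_area
```

A 4 × 4 m cell lying fully inside the plot contributes an overlap of 16 m², while cells cut by the plot boundary contribute proportionally less.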

Plot-level regression analysis
Linear regression (LR) analysis was used to construct a plot-level model separately for the arithmetic mean CBH and basal-area-weighted mean CBH. All the available ALS variables were used as candidate predictors in the model. The correlation between different predictors was tested with the variance inflation factor (VIF).

Validation
Plot-level validation was done separately for the whole data and for subsets of plots in which the proportion of a tree species group was ≥ 50% of the total volume. Altogether 201, 38 and 15 plots were dominated by Scots pine, Norway spruce and deciduous tree species, respectively. The remaining 11 plots without a clear dominant species were classified as mixed stands. We wanted to evaluate the predictions of these plots separately, because CBH is known to be a tree species dependent attribute. We considered both arithmetic mean CBH and basal-area-weighted mean CBH (cf. Eqs. 3-4): using the PMk-NN and LME methods, the predicted plot-level CBH was likewise either an arithmetic mean or weighted by the basal area obtained from the tree lists, and the LR models were fitted separately for both means. When validating the predictions, a leave-one-plot-out cross-validation (LOOCV) was applied to all predictions except those produced by alpha shapes, which did not require model fitting or calibration. The validation was based on the root mean squared error (RMSE, Eq. 3) and mean difference (BIAS, Eq. 4) where y is the observed plot-level arithmetic or basal-area-weighted mean CBH, x is the predicted plot-level mean CBH and n is the number of plots. The relative RMSEs and biases were calculated by multiplying the absolute values by the factor 100/ȳ, where ȳ is the average CBH in the population studied.
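The validation statistics can be written out as a short sketch using the symbols defined above (y observed, x predicted, n plots). The RMSE follows its standard definition. The sign convention of the bias, observed minus predicted, is an assumption, since the equations are not reproduced here; under this convention an overestimate yields a negative bias. Function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error over n plots."""
    n = len(observed)
    return math.sqrt(sum((y - x) ** 2 for y, x in zip(observed, predicted)) / n)


def bias(observed, predicted):
    """Mean difference; the sign convention (observed minus predicted) is assumed."""
    n = len(observed)
    return sum(y - x for y, x in zip(observed, predicted)) / n


def relative(value, observed):
    """Express an absolute RMSE or bias as a percentage of the mean observed value."""
    return 100.0 * value / (sum(observed) / len(observed))
```

In a leave-one-plot-out setting, each element of `predicted` would come from a model or neighbor search that excluded the corresponding plot.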

Tree-level imputation
Altogether three X- and three Y-variables were used for the PMk-NN imputation (Table 4). In the first step of the feature selection, the RMSE of both k = 1 and k = 5 was improved by including the minimum and 70th percentile of tree heights and the 80th percentile of tree diameters as Y-variables. Two candidate Y-variables would still have improved the result with k = 1, but by less than 1% in terms of the RMSE improvement compared to using three Y-variables. All these candidates also increased the RMSE with k = 5, for which reason the first step of the feature selection was terminated. In the second step, Hmean, D90, and H10 were selected as X-variables. With k = 5, altogether 12 additional X-variables would have provided similar decimal-level improvements to the RMSE and one of the candidate X-variables as much as 5%, compared to the last variable added. All of these variables, however, increased the RMSE with k = 1, for which reason the second step of the feature selection was terminated. Adding or removing Y-features did not improve the RMSE compared to the combination presented in Table 4.

Tree-level mixed-effects model
The constructed mixed-effect model for CBH was based on field-measured tree diameter at breast height (DBH) and tree height (h), and ALS-derived variables such as the 5th and 80th percentiles of height (H05, H80) and the relative density at the relative height of 0.05 (D05). In addition to the main effect of these variables, DBH was also used as an interaction term with h, D05 and H80. All the independent variables were highly significant, i.e., the p-value was < 0.0001, except ln(DBH) with a p-value of 0.0011. The final mixed-effect model was specified as follows: The error variance was 0.843 m between plots and 1.328 m between trees, meaning that a greater part of the unexplained residual variance occurred between trees than between plots. This further means that the model was actually more accurate in predicting the plot-level mean CBH than the CBH of a single tree.

Plot-level linear regression model
The significance (p-value) of each predictor was examined and the least significant candidate was removed from the model. This procedure was repeated until the p-value of each predictor was smaller than 0.001. The final combination of predictors was manually chosen from this group by minimizing the RMSE. Additionally, to avoid multicollinearity, a VIF value greater than 5 between two predictors was not allowed. Different transformations were also tested, but they improved neither the model accuracy nor the plotted residuals.
The final form of the linear regression model for predicting the arithmetic mean crown base height (CBH_ARI) was the following (Eq. 6): where Hstd is the standard deviation of the height, H10 is the 10th percentile of height, D50 is the relative density at the relative height 0.5 and Imean_all is the mean of intensities. The error variance and adjusted R-square of the model were 1.1 m and 0.72, respectively, in the entire data. However, using the LOOCV approach in the validation calculations changed the variance and coefficient values above between the iterations. Correspondingly, the final form of the linear regression model for predicting the basal-area-weighted crown base height (CBH_BAW) was the following (Eq. 7): where D20 is the relative density at the relative height 0.2. The error variance and adjusted R-square were 1.2 m and 0.78, respectively, in the entire data.
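The multicollinearity screen can be illustrated for a pair of predictors. For two variables, the variance inflation factor reduces to 1 / (1 − r²), where r is their Pearson correlation. The sketch below uses this pairwise form as a simplification; a general VIF regresses each predictor on all the others, which is not shown. Function names and the helper that applies the threshold are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_vif(x, y):
    """VIF for a pair of predictors: 1 / (1 - r^2); infinite for perfect correlation."""
    denominator = 1.0 - pearson_r(x, y) ** 2
    return math.inf if denominator <= 0 else 1.0 / denominator


def too_collinear(x, y, threshold=5.0):
    """True if the pair exceeds the VIF threshold (5 in the text)."""
    return pairwise_vif(x, y) > threshold
```

A pairwise VIF of 5 corresponds to |r| of roughly 0.89, so the rule excludes only strongly correlated predictor pairs.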

Plot-level reliability figures
The RMSEs and biases observed for the different methods are shown in Table 5 for the whole study data. The relative RMSEs were always lower for the basal-area-weighted mean crown base height (CBH_BAW) than for the arithmetic mean crown base height (CBH_ARI); the improvement arises from the increase in the reference CBH value due to the basal-area weighting. The linear regression model was the most accurate alternative for both CBH_ARI and CBH_BAW. This is not surprising, since it modelled the plot-level mean CBH using straightforward regression analysis. In that case, predictions matching closely with the reference values could be obtained without bias. The RMSEs were 0.11-0.12 m.
The PMk-NN method showed the second best accuracy figures. The imputation of tree-level CBH values also turned out to be more accurate than applying the LME-model to the PMk-NN-produced tree lists. The LME-model in particular led to considerable overestimates.
The alpha shape method was the least accurate alternative to predict CBH in the case of the arithmetic mean, although it did not perform remarkably poorly compared to the other methods. In the case of the basal-area-weighted mean CBH, the accuracy of the alpha shape method was better than that of the LME-model and almost as good as for the PMk-NN-method. The alpha shape method also overestimated the CBH even more than the LME-model in the case of the arithmetic mean, but was almost unbiased toward the basal-area-weighted values.
The reliability figures of both the arithmetic and basal-area-weighted CBH are additionally shown for plots dominated by the different species (Tables 6 and 7). The reliability figures were in general better in the pine-dominated plots and especially for the LME-model. Also the bias values were lower. This is an obvious result, since the LME-model was originally constructed at the tree-level using observations of pine plots only. In the case of basal-area-weighted mean CBH, the LME-model was even more accurate than the PMk-NN approach in pine plots (Table 7). As above, the reliability figures of the alpha shape method were better for basal-area-weighted mean CBH values. The alpha shape method was interestingly more accurate than PMk-NN in the spruce plots. Additionally in the plots dominated by spruce, the reliability figures varied depending on the method and whether the evaluation was done with respect to the arithmetic or basal-area-weighted mean CBH. Only for deciduous plots, the CBH predictions were always poor (considerable overestimates), which may be due to the fact that the reference CBH values were generally lower for these plots.
Finally, the residuals of predicting the arithmetic mean CBH and basal-area-weighted mean CBH using the compared methods are shown in Figs. 2 and 3, respectively. In general, the residuals were evenly distributed. However, especially the LME-model and alpha shape method generally overestimated the CBH values of the plots not dominated by pine. Figs. 2 and 3 also provide more background for the biased reliability figures of those methods.

Discussion
This study considered the prediction of CBH using sparse density ALS data. The plot-level validation showed that CBH can be predicted in most cases at an accuracy of about 1-1.5 meters for plots dominated by different species. Only for plots with deciduous trees were the error values considerably higher. The results above can be regarded as promising in that all of the tested methods can be feasibly applied in ALS-based large-scale forest inventory. The observed relative RMSEs were between 21% and 34% in the whole data and between 18% and 30% in the pine-dominated plots (i.e., the most frequent and the most important species of the inventory area). These values are somewhat high compared to RMSEs reported earlier (Maltamo et al. 2010). However, the reason may be related to generally low reference CBH values, in which case even a minor absolute error may result in high relative figures. Another reason for the low relative accuracy may be the aforementioned wide distribution of development classes from saplings to mature stands on the plots studied (Table 2). For comparison, Maltamo et al. (2010) focused on mature Norway spruce stands and reported RMSEs of approximately 13-20% on independent validation plots.
The use of PMk-NN based tree lists is an appropriate alternative for generating tree-level CBHs in ALS-based forest inventory. Tree-level estimates can be directly used in different models (biomass, tree growth) and these estimates can be flexibly aggregated to desired units such as stands. The reliability of this approach was almost always better than that of the LME alternative and the estimates were also almost unbiased in the whole data and pine-dominated plots. Since k-nn is already used in ALS-based forest inventories (Hudak et al. 2008; Maltamo and Packalen 2014), the inclusion of crown base height as a target variable only requires that it is measured from field plots in each inventory project in the same way as DBH and tree height are currently measured.
The development of field measurement devices, especially different altimeters, may allow taking measurements for CBH together with those for the tree heights. However, the feasibility of measuring tree-level CBHs in the field depends, on one hand, on the costs related to the measurements and, on the other hand, on the benefits of including this attribute in the description of the tree stock.
It should be noted that the current tree lists of the PMk-NN method were produced using independent and dependent variables selected to predict CBH values as well as possible. Many other possibilities for training the nearest neighbor search exist and different combinations of features representing both the tree and plot levels could be used (Packalen et al. 2012; Vauhkonen et al. 2014). The tree lists used in our study may therefore represent best-case accuracies for CBH predictions. However, earlier studies also optimized the composition of the tree lists specifically for the variables used in evaluation. Additionally, in k-nn analyses the weighting of different tree attributes must be re-solved if new attributes are added as dependent variables. In practice, however, we acknowledge that less accurate CBH predictions are possibly obtained based on tree lists optimized for other purposes such as predicting tree diameter and height distributions accurately.
The use of the mixed-effects model with both the tree and stand level predictors led to almost as good RMSE values as the PMk-NN imputation in the whole data. However, the estimates were biased at the plot-level. There are two possible sources for the bias. First, the DBH and height used with the mixed-effects model were those obtained from the PMk-NN tree lists. Since this approach was optimized for predicting CBH, the other tree attribute estimates might not be as accurate as possible (see above discussion). We examined the accuracy and precision of the LME approach in more detail from this point of view by examining the residuals of the CBHs against the mean DBH and height predicted for the plots (detailed results not shown). The trend between the residuals of these attributes was, indeed, stronger in the case of the LME model predictions than those obtained from the PMk-NN tree lists. However, although the plots with high residual errors based on the PMk-NN method had inaccurate DBH and height, the LME model was not found to magnify the CBH errors in these plots.
More importantly, the LME-model was constructed by using only pine-dominated plots but applied to plots including all species. This choice is reflected from the better plot-level reliability figures for the pine-dominated plots, but poorer estimates for the plots dominated by the other species. Using the models fundamentally assumes that the predictions are always made for target stands having pine trees in the dominant canopy layer, which was reasoned in the present case due to the strong dominance of Scots pine in the inventory area. If applied in another study area with different species composition, the plots dominated by different species should be distinguished from each other based on the ALS data. Even though certain plot-level intensity metrics derived from sparse ALS point clouds were found to differ between dominant species in another study area dominated by Scots pine (Vauhkonen et al. 2014), the use of similar metrics to distinguish between dominant species was only moderately successful in the presently available data (Räty et al. 2016). As discussed in more detail by Räty et al. (2016), the reason could be related to the fact that the intensity recordings were not range-corrected and no flying trajectory data to perform the calibration were available. We cannot thus rule out the possibility that if range-corrected, some intensity metrics could possibly have been selected more often also to the presently formulated models. Yet, it would be difficult to envisage considerable accuracy improvements due to such a calibration. We also experimented a model version which included all trees but the model reliability was worse due to the high number of suppressed trees (the bias increased 0.1 m in the modelling data). Overall, our mixed-effects model is more useful for predicting tree-level CBHs as an indicator of stand-level saw log quality than producing tree crown information for the whole tree stock.
Direct plot-level estimates of canopy base height were also included. Although the accuracy of these alpha shape estimates was the worst among the compared methods in the case of arithmetic mean CBH values, it should be noted that this estimate can be obtained directly from the ALS point cloud without field data. Moreover, these predictions were almost unbiased for the whole study area and for the pine and spruce plots when evaluated against the basal-area-weighted mean values, indicating a better quantification of the properties related to the largest trees of the stands. However, the considerable overestimation of the arithmetic CBH could also indicate either that the direct detection of vertical gaps is overly sensitive to within-canopy gaps in the upper canopy, or that the ALS point cloud is not sufficiently dense in the lower parts of the canopy (Korhonen et al. 2013). Notably, the approach cannot even theoretically produce the lowest reference CBHs because of the ground threshold applied. In the studied data, there were two plots with a mean CBH below this threshold (1 m). Interestingly, the alpha shape approach showed better accuracies for spruce-dominated plots, and the approach overall performed better in the spruce-dominated study area of Maltamo et al. (2010), which may jointly imply a need for a data-specific calibration. Finally, the alpha shape estimate was never included in the models or the PMk-NN neighbor search, even though it was available as a candidate predictor.
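The difference between the two reference values discussed above can be illustrated with a minimal sketch. It assumes a tree list given as (DBH in cm, CBH in m) pairs and uses the basal area of each tree as the weight. The function name and the example plot are hypothetical and are not taken from the study data.

```python
import math


def mean_cbh(trees):
    """Return (arithmetic, basal-area-weighted) mean CBH for one plot.

    trees: list of (dbh_cm, cbh_m) tuples; hypothetical input format.
    """
    if not trees:
        raise ValueError("the tree list must not be empty")
    # Basal area of each tree in m^2: pi * (DBH / 2)^2, with DBH converted from cm to m.
    basal_areas = [math.pi * (dbh / 200.0) ** 2 for dbh, _ in trees]
    arithmetic = sum(cbh for _, cbh in trees) / len(trees)
    weighted = sum(g * cbh for g, (_, cbh) in zip(basal_areas, trees)) / sum(basal_areas)
    return arithmetic, weighted


# Hypothetical plot: one large dominant tree and two small suppressed trees.
plot = [(30.0, 12.0), (10.0, 4.0), (10.0, 4.0)]
arith, weighted = mean_cbh(plot)
```

In this example the weighted mean lies much closer to the CBH of the large tree than the arithmetic mean does, which is consistent with the interpretation that a method tracking the basal-area-weighted mean mainly describes the largest trees of a stand.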
The plot-level regression estimate was the most accurate alternative, also showing the best-case accuracy that can be obtained for plot-level mean CBH predictions in this data set using ALS. However, a separate model must be constructed for each reference value, as in this study when using either the arithmetic or the basal-area-weighted mean CBH. Including the plot-level estimates of CBH in multivariate forest inventories may improve the possibilities to calibrate the estimates using techniques similar to those in Kotivuori et al. (2018). On the other hand, it is also possible to transfer mixed-effects models effectively by means of local calibration using a low number of measurements either within an inventory area or between inventory areas, even at the national scale (Kotivuori et al. 2016).
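As a hedged illustration of what local calibration with a few measurements can mean, the sketch below predicts a plot-level random intercept from local CBH observations using the standard shrinkage estimator for a random-intercept model. It assumes a model with a single plot-level random intercept and known variance components. The function name, the variance values, and the measurements are hypothetical, and the sketch is not the model fitted in this study.

```python
def calibrate_random_intercept(observed, fixed_pred, var_plot, var_resid):
    """Predict the plot-level random intercept from local measurements.

    observed:   locally measured CBHs (m)
    fixed_pred: fixed-part predictions of the model for the same trees (m)
    var_plot:   variance of the plot-level random intercept (assumed known)
    var_resid:  residual variance (assumed known)
    """
    n = len(observed)
    if n == 0:
        # Without local measurements the best prediction of the random effect is zero.
        return 0.0
    mean_resid = sum(o - p for o, p in zip(observed, fixed_pred)) / n
    # Shrinkage factor: approaches 1 as the number of measurements grows.
    shrinkage = n * var_plot / (n * var_plot + var_resid)
    return shrinkage * mean_resid


# Hypothetical example: three calibration trees, each observed 1 m above the fixed-part prediction.
observed = [9.0, 10.0, 11.0]
fixed_pred = [8.0, 9.0, 10.0]
b = calibrate_random_intercept(observed, fixed_pred, var_plot=1.0, var_resid=1.0)
calibrated = [p + b for p in fixed_pred]
```

With equal variance components and three trees, the mean residual of 1 m is shrunk by a factor of 0.75, so the calibrated predictions move most, but not all, of the way toward the local observations. This is why even a low number of measurements can be useful.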
In this study, the prediction of CBH was examined at the plot level. However, if CBH estimation is included in an ALS-based multivariate forest inventory, the interest and benefit of the approach lie especially in species-specific estimates. Our study area was strongly dominated by one species, the number of plots (n = 265) was rather low for a multivariate forest inventory, and we only had ALS data; these are not feasible conditions for species-level analyses. The results presented here for pine-dominated plots, however, show that this metric can be predicted adequately for the dominant species. Therefore, further work is proposed to examine the reliability of species-level CBH estimates.

Conclusions
The study showed that tree-level crown base height prediction can be included with an accuracy of 1–1.5 m in forest management inventory applications based on sparse ALS data. The most obvious alternative is to add tree-level CBH information to the training data of k-nn imputation in addition to the typically used species, DBH and height. As a result, wall-to-wall estimates of these attributes can be obtained. This study showed that the k-nn approach is the most accurate tree-level alternative for CBH modelling, and it may also be preferred because of its potential to provide tree-level predictions, the quality of which should, however, be further verified. Alternatively, mixed-effects models can be applied to predict CBH using tree attributes and plot-level ALS metrics. The benefit of the mixed-effects modelling approach is that once the model is formulated, it can be applied in other areas without new training data. Additionally, it can be calibrated with a rather low number of local measurements.