Classifying soil stoniness based on the excavator boom vibration data in mounding operations

The stoniness index of forest soil describes the stone content in the upper soil layer at depths of 20–30 centimeters. This index is not available in any existing map databases, and traditional measurements for the stoniness of the soil have always necessitated laborious soil-penetration methods. Knowledge of the stone content of a forest site could be of use in a variety of forestry operations. This paper presents a novel approach to obtaining automatic measurements of soil stoniness during an excavator-based mounding operation. The excavator was equipped with only a low-cost inertial measurement unit and a satellite navigation receiver. Using the data from these sensors and manually conducted soil stoniness measurements, supervised machine learning methods were utilized to build a model that is capable of predicting the stoniness class of a given mounding location. This study compares different classifiers and feature selection methods to find the most promising solution for this learning problem. The discussion includes a proposition for a meaningful measurement resolution of the soil’s stoniness, and a practical method for evaluating the variability of the stone content of the soil. The results indicate that it is possible to predict the soil stoniness class with 70% accuracy using only the inertial and location measurements.


Introduction
The development of forestry operations using evolving sensor devices and new data sources with modern data analysis methods is a current trend that enables efficiency gains throughout the wood procurement process. This data-driven approach, also referred to as precision forestry, gathers detailed information about the forest in order to improve decision-making and enable more efficient forest operations (Fardusi et al. 2017; Holopainen et al. 2014). Measurement data about either the characteristics of individual trees or the prevailing environmental conditions of the forest can be collected (Melander and Ritala 2018; Salmivaara et al. 2018; Lindroos et al. 2015). In-situ data can be automatically gathered during various phases of the forest management process, in particular during operations where heavy machinery offers a platform for the data collection equipment. One example of a forestry machine collecting such data is the forest harvester used in the wood harvesting stage (Hauglin et al. 2017; Olivera et al. 2016). However, other stages of forest management, such as forest regeneration, utilize other types of heavy machinery, excavators being a case in point. Although these machines are often less automated than harvesters, continuous advances in sensor technology now provide affordable measurement system solutions for such machinery.
One example of an environmental parameter in a forest that is not currently measured automatically is the stone content of the soil. The stone content can be determined with a variety of methods at different depths. The most obvious method, although often unrealistic due to its destructive and laborious nature, is to dig a pit of a given volume and sieve all the coarse fragments in order to calculate the relative stone content of the soil (Stendahl et al. 2009; Eriksson and Holmgren 1996; Alexander 1981). Soil stoniness can also be estimated non-destructively by recording the stones visible on the soil surface using, for example, aerial photographs (Stendahl et al. 2009) or airborne laser-scanning data (Nevalainen et al. 2016). Such visual assessments of soil surface stoniness are employed in many European countries (Baritz et al. 2010), while another non-destructive method, used in Finland and Sweden at least, is the Viro rod method (Viro 1952; Stendahl et al. 2009). In this method, a steel bar is inserted into the soil up to a certain depth and the number of contacts with stones is recorded. After repeated measurements, a stoniness index can be calculated for the measured area by considering the proportion of insertions that hit stones. The stoniness index can be converted to soil stoniness using empirical translation functions (Viro 1952; Stendahl et al. 2009).
The stone content is particularly important when planning forest regeneration processes for an area, since a high stone content can cause difficulties for both the soil preparation and the planting (Saksa et al. 2018; Rantala et al. 2010; Saarinen 2006; Lideskog et al. 2014). Knowing the soil stoniness may also help when tree growth, weathering and hydrological models are developed (Eriksson and Holmgren 1996; Karlsson 2000; Coppola et al. 2013; Panagos et al. 2014). In the context of forest regeneration, the most relevant measure is the stoniness of the soil at depths of 20–30 centimeters. Visual interpretations of the soil's stoniness derived from the top of the soil can be misleading for estimating soil stoniness at this depth (Stendahl et al. 2009), while sieving through many soil pits is infeasible due to the large areas of forest which have to be covered. In Finland at least, soil stoniness is neither available nor derivable from any of the existing map databases. Such a map database, if automatically collected, would require a predetermined spatial resolution of the recorded soil stoniness. One established representation of forest inventory data, utilized, for example, in airborne laser-scanning measurements, is to aggregate the measurements into an exhaustive grid with square cells. According to White et al. (2013), the size of the cells will depend on the size of the ground plot that needs to be measured. In Finland, grids of 16 × 16 m cells are typically used for forest inventory data (Holopainen 2011; White et al. 2013). The resolution of stoniness index measurements using the Viro method covers 10 meters in one direction, so a 16 × 16 m grid would appear to be a reasonable resolution for automated stoniness measurements.
A soil preparation operation is typically carried out after regeneration felling using heavy machinery. Its purpose is to enhance the survival rate and competitiveness of the young trees by manipulating, for instance, the soil conditions, the level of solar radiation and the competing vegetation (Löf et al. 2012). The main preparation methods include harrowing (disk trenching), patch scarification and mounding (Löf et al. 2012). These methods all involve tilling the forest soil in some manner, but at different depths and in different patterns. The method of choice depends on the soil type, the current conditions of the forest site and the tree species being planted (Äijälä et al. 2014; Löf et al. 2012; Luoranen et al. 2007; Londo and Mroz 2001). For instance, in spot mounding, an excavator, or some other similar heavy machine, scoops a patch out of the forest ground and tips the soil upside down next to the excavated patch. The seedlings are later planted on top of the resulting mounds. This paper focuses on spot mounding as the soil preparation method because it can be carried out using an excavator, it is one of the most commonly used forest soil preparation methods in Finland (Äijälä et al. 2014), and it is also widely used in other countries (Sutton 1993; Dzerina et al. 2016; Londo and Mroz 2001). In addition to excavator-based spot mounding operations, there are also forwarder-mounted continuously advancing mounders (Löf et al. 2015). In spot mounding, the soil is agitated to a similar depth as is used for conventional stoniness index measurements. Mounding is also an example of a forest regeneration process that would benefit from knowing the prevailing stoniness of the target site, in particular when the continuous mounding method is considered (Saksa et al. 2018; Rantala et al. 2010).
Excavators provide a convenient platform for installing various sensors, enabling automatic data collection and in-depth analysis of forestry operations. With regard to mounding operations and the stoniness of the soil, the most relevant factors to measure are the movements of the excavator boom and the resulting vibrations and shocks caused by the mounding activity. Such movements can be measured with an inertial measurement unit (IMU), particularly if the unit is installed near the mounding blade of the excavator. These sensors have developed into low-cost devices due to their extensive use in numerous applications, including robotics (Botero Valencia et al. 2017; Menna et al. 2017), vibration detection (Singleton et al. 2017; Sabato et al. 2017), and human activity recognition (Pavey et al. 2017; Zdravevski et al. 2017). As the mounding blade of the excavator usually penetrates the soil surface at the same angle between the mounds, the subsequent vibrations and shocks to the excavator boom may correlate with the stone content of the soil. Therefore, measurements of these vibrations could be transformed into a stoniness classification for the area where the excavator is working. This approach necessitates identification of the excavator's movements, so that only the data resulting from the mounding activity need be extracted as being relevant for the stoniness classification. Consequently, the first task in the stoniness classification process is to be able to automatically distinguish which of the measured vibrations occur when the excavator is performing the mounding motion.
Activity recognition based on inertial measurements is a well-researched topic, in particular for human activities (Pavey et al. 2017; Zdravevski et al. 2017; Hammerla et al. 2016; Trost et al. 2014). Typically, activity recognition is achieved by supervised machine learning, in which a training data set is obtained from the inertial measurement sensors, and then a model is trained based on the observations and the known labels for the activity. Akhavian and Behzadan (2015) have proposed activity recognition for construction equipment using supervised machine learning classifiers and the built-in sensors of a mobile phone. However, to the best of our knowledge, no earlier work has been published on the automatic classification of soil stoniness based on inertial measurement data.
The aim of this paper is to present an approach that enables the automatic classification of a forest area into three stoniness classes while it is being mounded. In this approach, the stoniness classification is based on the vibrations of an excavator boom taken while the excavator is performing spot mounding. The classification models are trained and validated using both the IMU measurements and the stoniness index measurements conducted manually on several soil preparation sites. This study evaluates the selection of different features for supervised machine learning algorithms in order to generate a two-tier classification model, and reports on the best combination of these methods based on the measured data.

Materials and methods
The overall stoniness classification process using recorded IMU data includes the following two classification tasks in series: excavator activity classification, and stoniness classification of the mounded points. These tasks are performed in sequence, since the result of the activity classification is needed in order to select valid input data for the stoniness classification. The same IMU data is used for both classification tasks, but the first classification eliminates the vibration data that does not result from the mounding motion. The performance of several machine learning algorithms and the selection of features were evaluated for both classification problems. Fig. 1 shows the overall classification process from the acquired raw data to the final stoniness class.

Data collection setup
The collected data consists of the vibration measurements from the excavator boom, the GPS location and manual stoniness measurements of the mounded forest ground. The excavator was a Doosan DX140LCR crawler (Doosan 2019), weighing approximately 15 tonnes. The attached mounding blade had a width of 52 centimeters. The automatic measurements were collected from four separate mounding sites in the Valkeakoski region in southern Finland during November and December 2017. These measurements were gathered during a normal mounding operation, and thus represent true forest operation conditions. The logging residues from the clear-cutting had been removed from the site before the measurements were taken, but the stumps had been left in place, which is common practice before spot-mounding. The temperature during the data collection period fluctuated around zero degrees, but the ground was free from snow and frost. The soil types in the area consisted of sandy moraine, fine sandy moraine and silty moraine.
The vibrations of the excavator boom were measured using a special-purpose measurement device that was attached to the boom near the mounding blade (Fig. 2). The measurement device consisted of an IMU, a GPS receiver, a small computer for storing the measurements and a battery. The IMU was an Adafruit BNO055, which is based on a Bosch BNO055 9-axis absolute orientation sensor (Bosch Sensortec 2014). This sensor consists of an accelerometer, a gyroscope and a magnetometer, each having three axes, and it is capable of fusing the measurements together in real time before sending them to the data acquisition computer. The measurement axes of the IMU are marked in Fig. 2. The GPS receiver was a Haicom HI-204III-USB GPS receiver (Haicom 2018). The IMU is capable of producing linear acceleration output by preprocessing the accelerometer data using internal data-fusion algorithms. This study utilizes the fused linear acceleration and raw gyroscope measurements. The IMU measurements were gathered at a sampling frequency of 100 Hz and the GPS locations were updated every second. In addition to the automatic vibration measurements, manual measurements were collected after the mounding work was completed for each site. The manual measurements were performed according to the method suggested by Viro (1952). This is a straightforward method for obtaining a single stoniness index measurement. A sharp steel bar with a diameter of 12 mm was inserted into the ground to a depth of 20 centimeters and the number of stones hit was recorded. In this study, each stoniness index measurement consisted of ten penetrations in a straight line at intervals of one meter. The percentage of penetrations that hit a stone gives the stoniness index at the center of the measurement line. This procedure thus returned stoniness indices from 0 to 100% at intervals of 10%.
The direction of the measurement line was selected according to the assumed route of the excavator, ensuring that the measurements were only taken from the same locations as where the IMU had recorded the excavator boom vibrations. Fig. 3 shows the locations of the measured stoniness indices at the four sites, each produced with ten penetrations. The initial manual measurements were conducted in November 2017, under much the same conditions as for the IMU measurements. The ground was fairly wet as it had rained for several days at the end of October, but the temperature was fluctuating around zero and there was no snow or frost. The manual measurements were divided into three stoniness classes as suggested by Viro (1952) in the following manner: 0 to 30% for low stoniness, 40 to 60% for medium stoniness and 70 to 100% for high stoniness. The convention of measuring the index along a line in a single direction introduces a degree of uncertainty for the manual measurements and consequently to the ground truth of the automatic classification process. In order to evaluate the accuracy of the manual measurements and to validate the predictive models, another set of manual measurements was taken from sites 1 and 2 in August 2018 (Fig. 3). These later manual measurements were performed at the sites where the variability of the stoniness index seemed to be highest in the earlier measurements. The weather conditions were completely different from the initial measurements taken the previous autumn, as the summer of 2018 was dry in Finland, although the weather should not significantly affect the manual stoniness index measurement process.
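The manual index and its three-class division can be illustrated with a short sketch. This is a minimal Python example and not the procedure's official implementation; the function names and the example hit pattern are hypothetical. With ten penetrations the index only takes multiples of 10%, so the handling of values between the class limits (e.g. 35%) is an assumption made here for completeness.

```python
def stoniness_index(hits):
    """Percentage of rod penetrations that hit a stone (hits: iterable of bools)."""
    hits = list(hits)
    if not hits:
        raise ValueError("at least one penetration is required")
    return 100.0 * sum(hits) / len(hits)


def stoniness_class(index_percent):
    """Map a stoniness index to the classes 0-30 % low, 40-60 % medium, 70-100 % high.

    Values falling between the limits (possible only with other than ten
    penetrations) are assigned to the higher class; this is an assumption.
    """
    if index_percent <= 30:
        return "low"
    if index_percent <= 60:
        return "medium"
    return "high"


# Hypothetical measurement line: ten penetrations, four of which hit a stone.
line = [True, False, False, True, False, True, False, False, True, False]
index = stoniness_index(line)
print(index, stoniness_class(index))  # 40.0 medium
```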

Activity recognition
The excavator-activity recognition algorithm was developed with the aid of a video recording of the mounding work. The video covers approximately 10 minutes of mounding work and includes a total of 46 mounding movements. The excavator was halted in order to synchronize the video and sensor times. Based on the video, segments of the IMU data were labeled manually as being either mounding or non-mounding operations. With this training set, the machine-learning classifiers were trained to detect the digging phase in the mounding motion sequence based on the IMU data alone.
The first step in the activity recognition process was to extract the relevant features from the IMU signals. A window of 400 measurements, meaning a duration of 4 seconds, was employed for this purpose. This window length is justified by the fact that the average length of a digging motion in the labeled data was 3.5 seconds. The data windows were overlapped by 50% (200 measurements), as this procedure has been found to be efficient in other activity recognition studies (Akhavian and Behzadan 2015; Gupta and Dallas 2014). A set of features was calculated over each window in each sensor axis. The features were initially selected based on other researchers' experience in handling similar problems in the literature (Akhavian and Behzadan 2015; Gupta and Dallas 2014; Liu et al. 2012). The selected initial features are listed in Table 1. A total of 123 different features were calculated (108 single-signal features and 15 correlations), as there were six measurement axes (three in the accelerometer and three in the gyroscope). The feature space is very large compared to the available training data, so the established practice is to reduce the feature space until it only contains the most relevant features. Two feature selection algorithms in Matlab, ReliefF (MathWorks 2018b) and sequential feature selection (MathWorks 2018c), were used to select the most appropriate features. The sequential feature selection (SFS) algorithm was run five times for each model, and any individual features that appeared in more than one of the selections were chosen as final features. The ReliefF algorithm ranks the features according to their importance, so the final number of features was determined by observing which number of the ranked features minimizes the misclassification rate in the cross-validation process. The algorithms can select very different features for training, so the performance of the classifiers is reported using both approaches.
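The windowing step can be sketched as follows. This is an illustrative Python example under stated assumptions, not the study's Matlab code: the four features computed here are generic stand-ins (the actual feature set is given in Table 1), the synthetic signal is invented, and the function names are hypothetical. The last line also checks the feature count arithmetic, where the 15 correlations correspond to the number of unordered pairs of six axes.

```python
import math
import statistics

WINDOW = 400  # samples per window, i.e. 4 s at 100 Hz
STEP = 200    # hop between windows, giving 50 % overlap


def sliding_windows(signal, window=WINDOW, step=STEP):
    """Yield consecutive overlapping windows of a 1-D sample sequence."""
    for start in range(0, len(signal) - window + 1, step):
        yield signal[start:start + window]


def window_features(win):
    """A few generic per-axis features (illustrative, not the paper's Table 1)."""
    return {
        "mean": statistics.fmean(win),
        "std": statistics.pstdev(win),
        "range": max(win) - min(win),
        "rms": math.sqrt(sum(v * v for v in win) / len(win)),
    }


# Synthetic 10 s recording of a single axis (1000 samples at 100 Hz).
signal = [((i % 50) - 25) / 25.0 for i in range(1000)]
features = [window_features(w) for w in sliding_windows(signal)]

# 108 single-signal features plus one correlation per pair of the six axes.
print(len(features), 108 + math.comb(6, 2))  # 4 123
```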
In this study, the following supervised learning classifiers were trained: Support Vector Machine (SVM), Binary Decision Tree (BDTree), Naïve Bayes, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), K-Nearest Neighbors (KNN) and a neural network model (ANN). Each of the models was trained using uniquely selected features, except for the neural network model, which was trained using all 123 features as a point of comparison. The ANN model was a feed-forward network consisting of one hidden layer with 10 nodes. The accuracy of each trained classifier was evaluated using 10-fold cross-validation, which divides the dataset into 10 folds and uses each fold in turn for testing while the other folds are used for training. Because the result of the activity classification is used to select the data for the stoniness classification, the models can be trained by assigning a higher cost to a false-positive classification. Although this results in lower overall accuracy, it tends to leave more data out of the second phase rather than including data that was not generated by the digging activity. For example, the cost of a false-positive classification can be set twice as high as the cost of a false-negative classification in the cost matrix given to the algorithm. The final activity recognition models were trained using all the available data, and the best-performing model was determined using the averaged accuracy of the cross-validation.
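The effect of an asymmetric cost matrix and the fold structure of the cross-validation can be illustrated with a small sketch. This Python example is a simplification under stated assumptions and does not reproduce the Matlab classifiers used in the study: the fold assignment is deterministic rather than random, the classifier is assumed to output a probability of "digging", and a false positive is assumed to cost twice as much as a false negative. All function names are hypothetical.

```python
def kfold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k interleaved folds (no shuffling)."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def min_cost_label(p_digging, cost_fp=2.0, cost_fn=1.0):
    """Choose the label with the lower expected misclassification cost."""
    # Predicting "digging" is wrong with probability (1 - p): a false positive.
    cost_if_digging = (1.0 - p_digging) * cost_fp
    # Predicting "non-digging" is wrong with probability p: a false negative.
    cost_if_other = p_digging * cost_fn
    return "digging" if cost_if_digging < cost_if_other else "non-digging"


# With a 2:1 cost ratio, a window is accepted as digging only if p > 2/3.
for p in (0.6, 0.7):
    print(p, min_cost_label(p))
```

With equal costs the decision threshold would be 0.5, so the asymmetric costs discard borderline windows instead of passing them on to the stoniness classifier.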

Stoniness classification
The stoniness classification process starts by detecting the digging periods from all the recorded IMU data using the best-performing activity recognition model. Only those parts of the data classified as "digging" are utilized in the stoniness classification, so any vibrational data that did not originate from the mounding blade's contact with the soil is ignored. This process also selects data that has been falsely classified as digging by the first classifier, but the errors can be minimized by applying higher classification costs for the false-positive classifications, as described in the previous section. The resulting dataset for all the measured data consisted of approximately 70 000 data windows.
The stoniness class labels for the supervised learning setting are based on the manual measurements. Since the manual measurements were pointwise and did not cover the whole measured site, the manual measurements that were within a radius of four meters from the mounding locations were used for labeling. Of the 70 000 data windows, approximately 3000 were labeled with manual measurements. Only the labeled parts of the IMU data could be used for training the model and for the final accuracy evaluation of the predicted stoniness classes.
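The radius-based labeling can be sketched as follows. This is a minimal Python illustration, assuming that both the mounding locations and the manual measurements are available as planar coordinates in meters (e.g. after projecting the GPS positions). Choosing the nearest measurement when several lie within the radius is an assumption made here, and the function name and coordinates are hypothetical.

```python
import math

RADIUS_M = 4.0  # labeling radius around each manual measurement


def label_windows(window_xy, manual_points, radius=RADIUS_M):
    """Return one label per data window, or None if no measurement is close enough.

    window_xy:     list of (x, y) mounding locations in meters
    manual_points: list of (x, y, stoniness_class) manual measurements
    """
    labels = []
    for wx, wy in window_xy:
        best_label, best_dist = None, radius
        for mx, my, cls in manual_points:
            dist = math.hypot(wx - mx, wy - my)
            if dist <= best_dist:  # nearest measurement wins (assumption)
                best_label, best_dist = cls, dist
        labels.append(best_label)
    return labels


manual = [(0.0, 0.0, "low"), (10.0, 0.0, "high")]
windows = [(1.0, 1.0), (8.0, 0.0), (5.0, 0.0)]
print(label_windows(windows, manual))  # ['low', 'high', None]
```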
The stoniness classification process was evaluated using the same classifier algorithms as for the activity recognition, but multiclass versions (MathWorks 2018a) of them were needed to predict three classes instead of two. The features for the IMU signals were calculated again for the mounding windows. The initial features were the same as in the activity recognition, but the feature selection algorithms (ReliefF and SFS) utilized the stoniness class labels and selected different features to be calculated. This labeled dataset of approximately 3000 observations was 10-fold cross-validated to evaluate the accuracy of the stoniness classifications. After the accuracy estimation, the classification models were again trained using all the labeled data, and the stoniness classes of the unlabeled dataset were predicted using all the models.
The output of a stoniness classification model is a probability distribution for the three stoniness classes (low, medium, high) for each mounding point, i.e. three probabilities, totaling one in all, for every soil contact period in the data. The stoniness class with the highest probability is chosen as the predicted class for the mounding point.

Grid representation
The classification model of the stoniness index returns the stoniness class predictions, hereafter called point predictions, for each window of data. A single digging motion could produce more than one prediction for the stoniness class due to the windowing of the data. Furthermore, the generated mounds are not evenly distributed over the forest site, so a robust representation of the data is needed to generate useful databases for the stoniness information. In this study, the previously-mentioned grid of 16 × 16 m cells is utilized for aggregating such data points into a map with regular intervals for the stoniness classes. In the collected data, one cell typically contains 10 to 150 predictions of the stoniness class. The most frequent prediction (i.e. the mode) of the single-mound predictions inside the grid cell was selected as the stoniness class of the cell in question (hereafter called grid prediction).
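The step from class probabilities to point predictions and then to grid predictions can be sketched in a few lines. This Python example is illustrative only: it assumes planar coordinates in meters with the grid origin at (0, 0), and it resolves ties between equally frequent classes in favor of the class seen first, which the text does not specify. The function names and sample points are hypothetical.

```python
import math
from collections import Counter, defaultdict

CELL_M = 16.0  # grid cell side length in meters


def predicted_class(probabilities):
    """Point prediction: the class with the highest probability."""
    return max(probabilities, key=probabilities.get)


def cell_of(x, y, cell=CELL_M):
    """Column and row index of the grid cell containing the point (x, y)."""
    return (math.floor(x / cell), math.floor(y / cell))


def grid_predictions(points, cell=CELL_M):
    """Aggregate (x, y, class) point predictions into the modal class per cell."""
    by_cell = defaultdict(list)
    for x, y, cls in points:
        by_cell[cell_of(x, y, cell)].append(cls)
    return {c: Counter(v).most_common(1)[0][0] for c, v in by_cell.items()}


points = [(1, 1, "low"), (5, 3, "low"), (15, 15, "medium"), (17, 2, "high")]
print(predicted_class({"low": 0.2, "medium": 0.5, "high": 0.3}))  # medium
print(grid_predictions(points))  # {(0, 0): 'low', (1, 0): 'high'}
```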
The grid prediction gives coarser information about the soil stoniness in the forest sites than the individual point predictions, but it is also more comparable to the resolution of the manual measurement technique that was used. For this reason, a heuristic accuracy measure was calculated using the grid cell neighborhood of each manual measurement. In other words, each manual measurement was examined together with the four nearest grid cell predictions. A correctly classified prediction was recorded if the nearest cell prediction agreed with the manual measurement, or if at least 50% of the four nearest grid cell predictions agreed with the manual measurement class. The grid prediction was cross-validated in the same manner as the point prediction.
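The neighborhood rule can be written out as a short sketch. This Python example rests on assumptions that the text leaves open: distances are measured from the manual measurement to the cell centers, cells are indexed by (column, row) on a 16 m grid with its origin at (0, 0), and when fewer than four predicted cells exist the 50% rule is applied to those available. The function name and the example cells are hypothetical.

```python
import math


def neighborhood_hit(measure_xy, measured_class, cell_predictions, cell=16.0, k=4):
    """True if the manual measurement agrees with its grid cell neighborhood.

    cell_predictions: {(col, row): predicted_class} for the predicted cells.
    """
    mx, my = measure_xy

    def distance_to_center(c):
        cx, cy = (c[0] + 0.5) * cell, (c[1] + 0.5) * cell
        return math.hypot(mx - cx, my - cy)

    nearest = sorted(cell_predictions, key=distance_to_center)[:k]
    if not nearest:
        return False
    # Rule 1: the nearest cell alone agrees with the manual class.
    if cell_predictions[nearest[0]] == measured_class:
        return True
    # Rule 2: at least half of the k nearest cells agree.
    agreeing = sum(cell_predictions[c] == measured_class for c in nearest)
    return agreeing >= 0.5 * len(nearest)


cells = {(0, 0): "low", (1, 0): "medium", (0, 1): "medium", (1, 1): "high"}
for cls in ("low", "medium", "high"):
    print(cls, neighborhood_hit((6.0, 6.0), cls, cells))
```

In this example the measurement lies in the cell predicted as "low", so "low" is accepted by the first rule, "medium" by the second rule (two of four cells), and "high" is rejected.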
Moreover, an entropy measure was calculated based on the predicted stoniness classes inside each cell. Entropy, according to information theory, describes the average lack of information contained in an observation, i.e. the uncertainty of a discrete-valued random variable (here the cell stoniness class). High entropy means low predictability. The entropy for a single cell k is defined as $H_k = -\sum_{i=1}^{m} \frac{n_i}{N} \log_2 \frac{n_i}{N}$, where m is the number of different predicted classes (here three), $n_i$ is the number of occurrences of the i:th class inside the cell and N is the total number of predictions inside the cell; a class that does not occur in the cell contributes zero. If all classes occur with the same frequency, the entropy is at its highest, $\log_2 3 \approx 1.585$. If, for example, two classes are equally frequent and the third does not occur at all, the entropy is 1. Both the final grid cell prediction and the entropy are dependent on the performance of the classification model. They are reported only for the model that shows the best results when compared to the manual measurements. Single grid cells contain different numbers of predictions in this study, and some grid cells have only a few measurements. Grid cells having fewer than 10 predictions were not regarded as representative samples of the variability of the stoniness inside the cell, so a threshold of 10 predictions was used to exclude these cells from the entropy evaluation.
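The cell entropy and the exclusion threshold can be checked numerically with a short sketch. This Python example is illustrative; the function name is hypothetical and the code uses the equivalent form of the base-2 entropy in which each class contributes (n_i/N) · log2(N/n_i), so that classes absent from the cell are simply skipped.

```python
import math
from collections import Counter

MIN_PREDICTIONS = 10  # cells with fewer predictions are excluded


def cell_entropy(predictions):
    """Base-2 entropy of the predicted classes in one cell, or None if too few."""
    n_total = len(predictions)
    if n_total < MIN_PREDICTIONS:
        return None
    counts = Counter(predictions)
    # (n/N) * log2(N/n) equals -(n/N) * log2(n/N); absent classes never appear.
    return sum((n / n_total) * math.log2(n_total / n) for n in counts.values())


even_three = ["low"] * 4 + ["medium"] * 4 + ["high"] * 4
even_two = ["low"] * 5 + ["high"] * 5
print(round(cell_entropy(even_three), 3))  # 1.585
print(cell_entropy(even_two))              # 1.0
print(cell_entropy(["low"] * 9))           # None
```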

Results

Feature selection
The features ranked most important by the ReliefF algorithm are presented in Table 2. Only the 25 most important features are listed. The features were individually selected for each classification model based on the combination that best minimizes the classification error. Fig. 4 shows the 10-fold cross-validated misclassification rate with different numbers of features taken from the ReliefF algorithm feature-ranking results. Fig. 4a shows the error progression in the first classification task (activity recognition) and Fig. 4b, in the second classification task (stoniness classification). In the activity recognition task, most of the models have only small variations in the misclassification rate after five selected features. However, the SVM and KNN models show significantly worse prediction results when the number of selected features is increased. This same behavior due to overfitting is somewhat visible in the stoniness classification task.
The SFS algorithm selected the features directly using the 10-fold cross-validation and the misclassification rate. The selected features for the activity recognition are listed in Table 3 and for the stoniness classification in Table 4. The number of the selected features is similar to the results presented with the ReliefF algorithm in Fig. 4, but the selected features themselves are actually different.

Activity recognition
The average accuracy calculated using the 10-fold cross-validation for the activity recognition is presented in Fig. 5. The figure also shows the percentage of false-positive predictions, which are harmful for the subsequent stoniness classification. Based on the results, the BDTree classifier with features selected by the SFS algorithm was chosen as the best model for activity recognition, as it had an accuracy of 94.4%. The model in question predicted an average of 2.5% of the labels as false positives. Essentially, all the models gave similar results, with an average accuracy of around 90%.

Stoniness prediction
The cross-validated accuracy of the stoniness prediction is presented in Table 5. The accuracies for the point predictions are rather low, due to the fact that the labels were generalized for a larger group of predicted points from a single manual measurement. Table 5 also shows the grid prediction accuracy, which better describes the success of the classification within the grid cell resolution. The grid prediction accuracy is calculated for the second set of manual measurements.
Figs. 6–9 show the predicted stoniness classes for each measured site individually. The prediction results are accumulated in the grid cells as described in the Materials and Methods section. In the figures, the area formed by the predicted grid cells is the site where the excavator was moving, in other words the locations where the IMU measurements were recorded. The manually measured stoniness classes are laid on top of the predicted grid cells to give a good visualization of the prediction performance.

Entropy of the stoniness prediction
The calculated entropy for the grid cells is presented in Fig. 10 for the first site. Calculation of the cell entropies for the other sites returned similar results. A large entropy value (near the maximum value of 1.585) for a grid cell suggests that all three of the stoniness classes are evenly predicted inside the grid cell. This implies a large variation of the stoniness inside the grid cell, and consequently the predicted class would not have much use for forestry applications.

Discussion
Knowledge of the prevailing forest soil stoniness is an asset when planning forestry operations, but it is not usually available before actually visiting the site. Therefore, we present a solution for automatically classifying soil stoniness during mounding work. This method of classification only needs an IMU and a GPS receiver and thus enables an affordable solution for the problem, as these devices have already been commercially developed for use with heavy machinery, such as excavators. As this data about the stoniness of the soil is not gathered until the mounding work is done, we cannot directly improve the soil preparation process with these measurements. However, the main idea is to gradually build up an exhaustive database that could be utilized in subsequent forest operation phases and could act as a reference for new stoniness models and measurement methods. These are currently needed in Finland as there is an upcoming project for closely mapping the soil properties of the whole country. In this project, novel models are being developed for estimating soil stoniness which, for instance, fuse available but heterogeneous data sources, such as remotely-sensed and manually-collected soil data. In this respect, our automatically collected reference data for soil stoniness would be extremely useful in building generalizable stoniness models for larger geographic areas. In addition, the recorded stoniness information can aid subsequent phases in the forest renewal process, such as planting (Lideskog et al. 2014) and harvesting. Furthermore, as the physical properties of the soil are strongly influenced by its stoniness (Eriksson and Holmgren 1996; Stendahl et al. 2009), hydrology models could be improved if the stoniness is already known. Hydrology models are utilized, for example, to make estimations of the forest's trafficability (Salmivaara et al. 2017), so improved hydrology models could be very useful in planning thinning operations in the future.
As areas for further study, larger training sets are needed to cover the effects of different mounding site conditions, different types of excavator and possible differences in the behavior of different excavator operators. This paper has proposed an automated two-stage procedure for classifying the stoniness of forest soil into three categories. One of the first results of this study is a procedure for selecting suitable features from the IMU signals and for deciding what windowing procedures should be applied. Although we did not report the results obtained with other window lengths here, no significant improvements were observed with window sizes of 100, 200 or even 600 samples, so the selected window length appears to work well. We performed the feature selection using two different algorithms to reduce the risk of drawing false conclusions about the selected features. The results for the activity recognition show that the selected features clearly differ between the selection algorithms. However, both of the methods selected many features from the Y-axis of the accelerometer (Table 2 and Table 3), so we have assumed it to be the most important axis for recognizing any digging motion. This was as expected because the Y-axis is parallel with the direction of the digging motion. Therefore, the mounding blade's contact with the soil is likely to cause vibrations on the excavator boom in the Y direction. We also expected the gyroscope measurements to be important in distinguishing the occasions when the machine is not digging, as the gyroscope easily detects the rotation of the excavator boom. The same reasoning applies to the stoniness classification: the Y- and X-axes of the accelerometer are expected to sense the vibrations caused by different soil stoniness levels.
The classification of the excavator activities into "digging" and "non-digging" is essential for detecting relevant data for the stoniness classification. In this context, it is important to be able to detect the most certain digging motions, as activities incorrectly labelled as "digging" would result in predicting the stoniness class of the soil from data that was not produced from contact with the soil. For this reason, the activity classifiers should be trained so that false-positive results are more costly than false-negative results. Furthermore, the number of predictions per grid cell is generally high, so mistakenly discarding data from the stoniness classification is not an issue. We found the activity recognition results to be consistent using both feature-selection algorithms and with all the supervised learning algorithms. This suggests that the digging motion has been correctly detected, and therefore the first classification task is justified. Nevertheless, the dataset for the activity detection was relatively small, so all the possible mounding conditions have probably not yet been covered.
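One common way to make false positives more costly than false negatives is to raise the decision threshold applied to the predicted probability of "digging". The sketch below shows this under stated assumptions: the classifier outputs a probability per window, and the cost values and function names (`digging_threshold`, `label_digging`) are hypothetical. It is not necessarily how the classifiers in this study were trained.

```python
from typing import List, Sequence


def digging_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability above which labelling a window as "digging" minimizes expected cost.

    Predicting "digging" costs (1 - p) * cost_fp in expectation, and predicting
    "non-digging" costs p * cost_fn, so "digging" is chosen when
    p > cost_fp / (cost_fp + cost_fn).
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def label_digging(
    probabilities: Sequence[float], cost_fp: float = 5.0, cost_fn: float = 1.0
) -> List[bool]:
    """Return True for windows that are confidently "digging".

    The default costs are illustrative assumptions, not values from the paper.
    """
    threshold = digging_threshold(cost_fp, cost_fn)
    return [p > threshold for p in probabilities]


if __name__ == "__main__":
    probs = [0.55, 0.70, 0.90, 0.97]
    print(label_digging(probs))            # stricter threshold (about 0.83)
    print(label_digging(probs, 1.0, 1.0))  # symmetric costs, threshold 0.5
```

With a five-to-one cost ratio, only windows with a predicted digging probability above roughly 0.83 are passed on to the stoniness classifier, and the remaining windows are discarded.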
In the stoniness classification process, the low number of manual measurements did not yield enough data for training the classifier. For this reason, we used a radius of four meters around the center point of the manual measurement to label the detected mounding locations according to their class. This was a coarse generalization, which surely does not represent the true variation in the soil's stoniness. However, the required resolution in the final application is probably in the region of 16 × 16 meter cells, which is the resolution used by many forest operation planning systems in Finland. Consequently, the generalization of the manual measurements is reasonable, but it does result in seemingly poor accuracy if calculated for each single predicted data point. In addition, the point accuracy does not seem to describe the large differences that are evident when visually inspecting the stoniness predictions by the different classifiers. The accuracy we calculated for the grid prediction provides a more meaningful evaluation as it arranges the classifiers in an order that agrees with human reasoning. Nonetheless, due to the measurement procedure of measuring along a line in one direction, repeated manual measurements inside the grid cell can have notable variations. The uncertainty of the manual measurements is seemingly the most significant error source in the model training and evaluation. For practical reasons, the manual and automatic measurements rely on different GPS devices, which may result in misalignment of the measurements when comparing them to each other, although we consider this error source to be negligible.
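The labelling and gridding steps described above can be sketched as follows. The sketch assumes that all coordinates are already in a projected, metre-based coordinate system, that each mounding location takes the class of the nearest manual measurement within the radius, and that a grid cell is assigned the majority class of its point predictions. The function names and the majority-vote rule are illustrative assumptions rather than the exact procedure used in this study.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Reference = Tuple[float, float, str]  # x, y, stoniness class


def label_by_radius(
    points: Sequence[Point], references: Sequence[Reference], radius: float = 4.0
) -> List[Optional[str]]:
    """Give each mounding location the class of the nearest manual measurement
    within `radius` metres, or None if no measurement is close enough."""
    labels: List[Optional[str]] = []
    for px, py in points:
        best_class, best_dist = None, radius
        for rx, ry, cls in references:
            dist = math.hypot(px - rx, py - ry)
            if dist <= best_dist:
                best_class, best_dist = cls, dist
        labels.append(best_class)
    return labels


def grid_majority(
    points: Sequence[Point], classes: Sequence[str], cell: float = 16.0
) -> Dict[Tuple[int, int], str]:
    """Aggregate point predictions into square cells by majority vote.

    Ties are resolved in favour of the class seen first in the cell.
    """
    votes: Dict[Tuple[int, int], Counter] = {}
    for (x, y), cls in zip(points, classes):
        key = (math.floor(x / cell), math.floor(y / cell))
        votes.setdefault(key, Counter())[cls] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}


if __name__ == "__main__":
    manual = [(0.0, 0.0, "low"), (21.0, 0.0, "high")]
    mounds = [(1.0, 1.0), (2.0, 2.0), (20.0, 1.0), (100.0, 100.0)]
    print(label_by_radius(mounds, manual))
    print(grid_majority(mounds[:3], ["low", "low", "high"]))
```

Evaluating accuracy on the cell-level output of `grid_majority` rather than on individual points corresponds to the grid accuracy discussed above.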
We have presented the predicted results for the four different sites for the best-performing classifier. The visual examination of the predicted grids and the manual measurements together give an idea of the classifiers' performances. For example, in the prediction map of the third site (Fig. 8), the few locations where low stoniness was detected in the manual measurements were also estimated as low stoniness in the final prediction. If we compare the sites with generally low or high stoniness, i.e. the second and fourth sites, the classifier predicts the results logically according to the dominant class. These observations suggest that the model is capable of detecting different stoniness classes and has not overfitted to a certain condition. Furthermore, we did not observe any notable differences in the prediction accuracies for single classes, i.e. the reported overall accuracy describes the predictability of all three classes well.
In any final application of this methodology, the excavator would record the data and perform the soil stoniness classification with a pre-trained classification model. As seen from the presented entropy map (Fig. 10), the uncertainty of the classification varies and the number of point predictions per grid cell can be quite different. Therefore, the entropy of a single grid cell could be used to evaluate whether the predicted class is a representative estimate for the whole grid cell. The degree of entropy needs to be taken into account when saving the predicted classes to a database, and some thresholds for separating uncertain predictions from the reliable ones are probably needed. The entropy value can give a misleading picture of the variability of the stoniness in the grid cell if it is calculated using only a few point predictions. In this study, we have excluded some of the grid cells from the entropy analysis using an arbitrarily selected threshold of 10 point predictions. We do not suggest here that this is an optimal threshold value, but such a value could be assessed when more measurement data is available. In addition to the grid cell entropy, the probability distribution of a single point prediction could be used to determine whether the point is taken into account when calculating the grid cell prediction.
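The grid cell entropy discussed above can be sketched as the Shannon entropy of the class frequencies among the point predictions in a cell. The function name `cell_entropy`, the base-2 logarithm and the class labels are assumptions made for illustration; the default minimum of 10 point predictions follows the threshold mentioned in the text.

```python
import math
from collections import Counter
from typing import Optional, Sequence


def cell_entropy(predictions: Sequence[str], min_points: int = 10) -> Optional[float]:
    """Shannon entropy (in bits) of the predicted classes within one grid cell.

    Returns None when the cell holds fewer than `min_points` predictions,
    because the entropy estimate would then be unreliable.
    """
    n = len(predictions)
    if n < min_points:
        return None
    counts = Counter(predictions)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    uniform_cell = ["low"] * 12                                # all predictions agree
    mixed_cell = ["low"] * 4 + ["medium"] * 4 + ["high"] * 4   # maximal disagreement
    sparse_cell = ["high"] * 3                                 # too few predictions
    for cell in (uniform_cell, mixed_cell, sparse_cell):
        print(cell_entropy(cell))
```

With three classes, the entropy ranges from 0 bits, when all point predictions agree, to log2(3), about 1.585 bits, when the three classes are equally frequent, so a threshold for flagging uncertain cells would lie somewhere in that range.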
In conclusion, the results show that the three-class stoniness classification has an accuracy of up to 70% when using a moderate spatial resolution for the predictions, such as the 16 × 16 m cell grid. In addition, with our test data the detection of the digging motion achieved over 90% accuracy. These results suggest that equipping excavators used in soil preparation operations with inexpensive inertial measurement units would facilitate the collection of comprehensive information about the stone content of the upper layers of forest soil. Further research is needed to investigate whether other sensors, such as pressure sensors for the hydraulic cylinders of the excavator, could be utilized to further improve the classification accuracy, and larger datasets are needed to cover different field conditions. However, it has been shown that it is possible to automatically collect information about soil stoniness during forest mounding operations with only minor changes to the mounding machinery. This could eventually lead to more efficient and economical forestry operations and would also produce valuable reference data for developing more extensive soil models.