A flexible geometric model for leaf shape descriptions with high accuracy

Accurate assessment of canopy structure is crucial in studying plant-environment interactions. The advancement of functional-structural plant models (FSPM), which incorporate the 3D structure of individual plants, increases the need for a method for accurate mathematical descriptions of leaf shape. A model was developed as an improvement of an existing leaf shape algorithm to describe a large variety of leaf shapes. Modelling accuracy was evaluated using a spatial segmentation method and shape differences were assessed using principal component analysis (PCA) on the optimised parameters. Furthermore, a method is presented to calculate the mean shape of a dataset, intended for obtaining a representative shape for modelling purposes. The presented model is able to accurately capture a large range of single, entire leaf shapes. PCA illustrated the interpretability of the parameter values and allowed evaluation of shape differences. The model parameters allow straightforward digital reconstruction of leaf shapes for modelling purposes such as FSPMs.


Introduction
The shape of plant leaves plays an important role in various fields of study.Their diagnostic value is important in the field of systematics, where the taxonomy of closely related species, natural hybrids or subspecies is studied (Jensen et al. 2002), and paleobotany, where leaf geometry is used as proxy for paleoclimatic and paleoecological variables (Royer et al. 2005).In agricultural engineering, leaf shape analysis is essential for automated species recognition, which has applications in site-specific weed control and herbicide application using machine vision or remote sensing (Meyer 2011).Similarly, in forest ecosystem monitoring, the ability to capture and model the structural complexity of the forest canopy is one of the main constraints in method development and can lead to inconsistencies when neglected (Côté et al. 2011).In the fields of plant modelling and ecophysiology, accurate descriptions of the leaves' spatial arrangement and shape play a crucial role in the study of a plant's interaction with its environment.The structure of the forest or crop canopy, which is the combination of leaf shape, size and spatial orientation, constitutes the major interface between the plant and its aerial environment, and thus influences important ecophysological factors such as boundary layer resistance and radiation interception, which heavily impact the fluxes of water, heat and carbon (Prusinkiewicz et al. 1994).The prediction of these complex interactions and how morphological traits influence the final production is an important issue for applications such as plant breeding (Bergez et al. 2013).This makes a mathematical description and simulation of realistic leaf shapes indispensable in plant and tree models that take 3D geometry into account, such as functional-structural plant models (FSPMs) (Perttunen et al. 1996;Henke et al. 2014).
The purpose of geometric leaf descriptors is to provide a set of landmarks or parameters for a model, which allow the unambiguous approximation or identification of a leaf shape based on collected data.The level of geometrical detail contained in these descriptors depends on the application.A complex model may consist of large sets of parameters, which allow for a detailed and unbiased description of the leaf shape.Such descriptors can be useful in species determination, where the large set of parameters can be analysed through means of principal component or discriminant analysis (Neto et al. 2006), or for rendering realistic 3D scenes.A less complex model can provide a simplified though accurate representation of the leaf shape suitable for modelling purposes by saving substantial computation time (Fournier and Pradal 2012).
The most straightforward way to describe a leaf shape is by approximating the leaf width as a function of the distance to the leaf base by curve-fitting.This can be achieved using polynomials (Fournier et al. 2003), smoothing splines (Mündermann et al. 2005;Henke et al. 2014) or hermit curves (Henke et al. 2014).These methods are able to accurately describe simple leaf shapes, but have non-functional parameters, and therefore, cannot be directly interpreted as geometric features.Consequently, two morphologically very similar leaves can yield distinctly different parameter sets limiting applications such as studying leaf growth, as a small change in leaf shape during growth will not be captured by the gradual progression of one or a few parameters, but by a change of the entire parameter set.This limitation can be overcome by adapting the polynomial functions to contain only functional parameters such as curvature parameters (Dornbusch et al. 2007;Evers et al. 2007).While this works well for the lanceolate leaves of Gramineae, these functions are unable to capture more complex shapes (e.g.cordate leaves) due to their inherent inability to describe multivalued function shapes.
Another way to obtain geometric descriptors of leaf shape is by contour-fitting techniques, of which elliptic Fourier analysis (Iwata et al. 2002;Neto et al. 2006) allows near-perfect description of the leaf contour.However, the accuracy of this approach depends on the number of harmonics (h) used, resulting in 4h -3 parameters.A typical approximation uses the first 20 (Iwata et al. 2002) or 30 (Neto et al. 2006) harmonics resulting in 77 or 117 (non-functional) parameters, respectively.While this imposes no limitations for multivariate analysis, such a number of parameters are not practically applicable for plant and tree models (Henke et al. 2014).Bézier curves (Chi et al. 2003) allow the description of a multivalued function with a low number of functional parameters.These are ideal for describing the general shape of plant leaves for applications such as species identification and growth measurements, but lack the flexibility to accurately describe more complex shapes.
Ideally, a leaf shape model should be applicable to a wide variety of leaf shapes and strive for an optimum between complexity and accuracy with parameters that are geometrically interpretable (i.e.functional).To this end, Young (2010) presented a simple shape algorithm that is capable of generating a large variety of leaf shapes with a small number of parameters.The starting condition of the algorithm is a line of points, representing the midrib of the leaf.These points are assigned a growth rate and growth direction using the model parameters and their relative position on the midrib.In each step of the algorithm, these points grow outward, following their assigned direction and growth rate, causing an exponential growth pattern.These "growth" parameters apply to the progression of the algorithm, and have no biological interpretation (i.e. they do not represent the growth of the leaf).Young's primary focus was to capture the range of biological diversity in leaf shapes with a general and conceptually simple algorithm rather than accurately describing individual leaves.Therefore, the algorithm was kept relatively simple and has certain limitations in terms of flexibility and geometrical interpretability of the parameters.
This paper presents an adjustment to Young's algorithm to resolve the limitations on parameter interpretability and flexibility.More specifically, the abstract "growth" concept was discarded, and the concepts of direction and length (i.e."growth" rate) were modified into more flexible functions to enable description of a large variety of single, entire leaf shapes.The model was applied to leaf scan data of a variety of leaf shapes (from seven tree species) to illustrate the model accuracy and the diagnostic interpretability of the model parameters.Furthermore, a method is described to allow calculation of the mean leaf shape in a dataset.

Data acquisition
Seven tree species were selected from an existing leaf dataset (Mallah et al. 2013) based on their large shape differences to acquire a wide range of single leaf shapes.While this sample contains only trees, we believe the diversity of shapes to be sufficient to illustrate the applicability of the shape model for other plant species.The selection contained mask images for 16 different leaves derived from Alnus cordata (Loisel.)Duby, Arundinaria simonii (Carrière) Rivière & C. Rivière, Cercis siliquastrum L., Cercis siliquastrum F.Buerger ex Hance var.Chinensis, Argyrocytisus battandieri (Maire) Raynaud, Ginkgo biloba L. and Tilia oliveri Szyszył.These mask images were manually manipulated to remove the leaf petiole (if present) and to align the central vein of each leaf with the x-axis using the image manipulation software GIMP.Afterwards, contour coordinates were extracted by automated processing of the resulting images using the Python component of OpenCV.Finally, the contours were split at the midrib and normalised (i.e.length of midrib and width of the half leaf were set to one).Leaf model implementation, parameter optimisation, principal component analysis (PCA) and mean leaf shape calculation were carried out in RStudio.

Leaf shape model
The model describes the outline of a half leaf by progression of an internal variable j (0 ≤ j ≤ 1) describing the position (j,0) on the leaf midrib.Each value of j corresponds to a coordinate on the leaf edge (x j ,y j ) (Fig. 1).This transformation is represented by a vector, of which the length and the orientation are determined by the value of j.The length of this vector is described by g L (0 ≤ g L ≤ 1), with its progression depending on the complexity of the leaf shape.In any case, the point at the leaf base and leaf tip correspond to the same point on the leaf edge, namely (0,0) and (1,0).Therefore, the length g L of their transformation vector is zero.For the common entire shape of many leaves, a single point j d can be located where the width is maximal (i.e.g L = 1), with g L strictly increasing from the leaf base to j d and strictly decreasing to the leaf tip.A simple power law dependence is well-suited to describe such behaviour (Young 2010): The magnitude of exponents k 1 and k 2 determine the drop-off rate of g L towards the leaf base and tip, respectively (Fig. 2).
The orientation of the transformation vectors is described by a slope value s, which is equal to tangent of the angle between the transformation vector and the midrib normal.Consequently, s = 0 represents a positive orientation parallel to the y-axis, a negative value for s represents a rotation to the left and a positive value a rotation to the right.In the simplest case, a constant value for s can be used (as done in Young (2010)).However, this limits model flexibility and hinders the interpretability of parameter j d , as it will no longer indicate the relative position of maximal width on the leaf midrib.Therefore, we assured the fixed position of the maximal leaf width by assigning a value of zero to the slope at j d .Coordinates closer to the leaf base get assigned a negative slope value and those closer to the leaf tip a positive slope value.Progression of these values in terms Fig. 1.The model describes the leaf outline through a transformation of every point (j,0) on the leaf midrib.The size and orientation of these transformation vectors are dependent on the value of j.At the point of maximal width j d , the size of the this vector is 1 and its slope 0. of j can be described by a variety of functions depending on the complexity of the leaf that is to be described.In the general case of entire shapes, another power law dependence can be used: Contrasting to the functions describing the vector length, these functions achieve their maximal absolute value at the leaf base and tip.The magnitudes of b 1 and b 2 determine the maximal strength of the slope (Fig. 3), while p 1 and p 2 dictate the rate at which the slope increases (Fig. 4).Positive values for b 1 and b 2 cause the leaf to bulge outwards, negative values bulge inwards.As the position of the coordinates changes according to the length and orientation of their transformation vectors, the change in coordinates from the leaf midrib to the leaf outline can be calculated using Pythagoras' theorem: Therefore, the coordinates of the leaf edge are described by the following parametric equation: Concretely, these equations split each leaf side further into quadrants by the position of j d with parameters k 1 , b 1 , p 1 and k 2 , b 2 , p 2 describing the quadrants closest to the leaf base and tip, respectively.

Model performance evaluation
Model performance was evaluated by comparing the segmentation of an area containing the input leaf with the model description using R packages "sp" and "rgeos" (Bivand et al. 2013;Bivand and Rundel 2017).This resulted in four regions (Smochina 2011) (Fig. 5): -True Positive (TP) areas contain leaf surface that are captured by the model description; -False Positive (FP) areas contain no leaf surface, but are described as leaf by the model; -True Negative (TN) areas contain no leaf surface and are not described as leaf by the model; -False Negative (FN) areas contain leaf surface, but are not captured by the model.
As this definition does not specify the size of TN around the leaf, TN was redefined as: which represents the portion of correctly classified area (both leaf and non-leaf) by the model.

Parameter optimisation
A parameter optimisation was conducted to find the optimal parameter sets that best described each individual leaf half according to the accuracy criterion (Eq.6).Due to the multi-dimensionality of this problem with respect to model parameters, optimisation was done by applying a genetic algorithm to find the global minimum using R package "genalg" (Willighagen and Ballings 2015).
A genetic algorithm is a constrained optimisation (i.e. the boundaries of each parameter must be set prior to optimisation), which allows i) the elimination of nonsensical parameter combinations (e.g.negative values for k 1,2 ) and ii) maintaining the geometric interpretability j d as the position on which the leaf reaches its maximal width, while still allowing a broad search within the parameter space.Therefore, k 1,2 was constrained between 0 and 10, b 1,2 between -10 and 10 and p 1,2 between 0 and 20.Parameter j d was set to allow a maximal deviation of 0.1 from the true relative position of maximal width in the data.

Mean leaf shape calculation
The mean leaf shape of each group cannot be calculated by simply averaging each parameter in the set due to the inherent interdependency of the parameters k, b and p in each quadrant of the leaf, i.e. the magnitude of these parameters heavily affects the contribution of the other parameters to the size and shape of each quadrant.Therefore, a set of comparable pseudo-landmarks (Dryden and Mardia 1998) needed to be defined on the outline of each (half) leaf.This was achieved by calculating the intersections of a set of lines originating in the leaf centroid, defined as (j d ,0), at an incrementing angle α with the leaf contour (Fig. 6).An arbitrary value of α = 2° was chosen, resulting in a set of 91 pseudo-landmarks.In Ginkgo, the centroid definition was shifted to (1,0) to ensure a single intersection for each line.The resulting coordinates serve as a set of pseudo-landmarks which can be averaged across the samples, resulting in a description of the mean leaf shape.

Results
The flexibility of the model allows an accurate description of all selected single, entire leaf shapes (Fig. 7).On average, the model overestimated the normalised area of the leaves by 0.14%, and modelled them with an accuracy (as defined by Eq. 6) of 97.88%.The means and confidence intervals of the normalised area, and the accuracy of the model for each leaf group are given in Table 1.Similarly, Cercis is distinguished by high values for k 1 and b 1 , combined with a low value for p 1 , illustrating the presence of an outward leaf bulge oriented away from the leaf midrib.Cornus is mainly distinguished by the high value for p 1 .Combined with moderate values for k 1 and b 1 this can indicate an acute leaf base.The low values of k 2 , b 2, and p 2 may suggest the presence of a leaf tip.It must be noted however, that the PCA plot shows relative rather than absolute parameter values.Therefore, PCA is a good tool to compare relative shape differences but direct interpretation of shape from the model parameters should preferably be done directly on the absolute model parameters.
The calculated mean leaf shapes underestimate the true mean area ranging from 0.13% in Alnus to 3.82% in Ginkgo (Table 2).

Discussion
The model presented in this paper was built on the concepts of the growth-algorithm model presented by Young (2010) with the intention of improving flexibility and parameter interpretability to describe individual leaves with high accuracy.The notion of using an internal variable to describe the leaf outline through transformation vectors proved a useful approach to describe a large variety of leaves with the inclusion of a slope variable to deal with multivalued function shapes.However, Young's algorithm relies on multiple evaluations of the model (i.e. the transformation vectors are applied several times) to generate leaf shapes.Hence, the number of evaluation steps determines the final size of the leaf, but also alters its shape.This method is not suited for dealing with normalised shapes and hinders evaluation of the contribution of each parameter to the leaf's shape.Secondly, Young represents the slope of the transformation vectors as a constant.This results in a low flexibility of the model, but also hinders the interpretation of j d which then no longer corresponds to the point of maximal width.These problems were addressed to create an improved model, suited for mathematically describing the outlines of a large variety of single, entire leaf shapes.
In its current form, the shape of each leaf half is described by seven parameters (i.e.j d , k 1 , k 2 , b 1 , b 2 , p 1 and p 2 ), which results in a required 15 parameters for a full leaf shape description, including a single parameter describing the proportion of each leaf half to the entire normalised leaf.Additionally, the length and width of the leaf are required for a full digital reconstruction of the leaf on the original scale.
In applications where a low amount of parameters is preferred over high accuracy, a number of simplifications to the model can be made.Firstly, a PCA analysis can be used on the opposing leaf halves to assess whether the leaves can be assumed symmetrical (Fig. 8).If this is the case, the parameter set can be reduced from 15 to 7, as a proportionality parameter is then redundant.Secondly, in the case of very similar leaves (e.g. a single species), less flexibility is required.This allows the user to replace one or more parameters with well-chosen constants, effectively finetuning the model for the leaves in question.
In comparison to other methods, this model provides the advantage of being able to describe multivalued function shapes (as opposed to polynomials (Fournier et al. 2003), smoothing splines (Mündermann et al. 2005;Henke et al. 2014) or hermit curves (Henke et al. 2014)) with high flexibility (as opposed to Bézier curves (Chi et al. 2003)) and a relatively low amount of diagnostic parameters (as opposed to elliptic Fourier analysis (Iwata et al. 2002;Neto et al. 2006)).
Aside from the leaf shape model, a method was presented for calculating the mean leaf shape as a simplified version of the Procrustes mean shape (Stegmann and Gomez 2002).Where the presented mean shape is obtained by immediate comparison of the acquired landmarks, the Procrustes mean allows additional rotation and rescaling of the landmarks in order to optimise the shape alignment.These additional operations are redundant in this case as scaling was removed during normalisation, and the orientation is fixed due to the presence of the leaf petiole.The method for acquiring the landmarks was chosen to obtain a large, unambiguous set of comparable landmarks.The resulting mean shapes are good representations of the shapes in each group, demonstrated by their normalised area which closely approximates the true group mean (Table 2).The larger deviation of Ginkgo can be ascribed to a combination of large shape variability in the dataset and the irregular edges at the leaf tips.
Calculation of the mean leaf shape allows for the generation of a single representative shape of a leaf dataset.Combining this with an architectural model, results in a detailed description of the canopy.In the case of crop models, this allows straightforward evaluation of how leaf shape differences between plant genotypes influence crop characteristics such as light interception and photosynthesis for plant ideotyping (Sarlikioti et al. 2011).This presents a significant increase in model complexity to classical crop models, which often represent the crop canopy as a fixed nu mber of horizontal layers (e.g.comparison in Confalonieri et al. 2016).However, it can provide a solution to inconsistencies arising from neglecting canopy complexity such as leaf senescence resulting from self-shading and leaf age (Stella et al. 2014).Furthermore, proper simulation of tree light interception could be of major importance in understanding the microclimate, and thus growth conditions, of understorey plant species (De Frenne et al. 2013).Inclusion of such complexity substantially increases the performance of modelling light interception, but also allows simulation of the effect of individual leaves on tree hydraulics (de Reffye et al. 1997) and detailed descriptions of adaptation to light of individual leaves in the crown (Eschenbach 2000).
The presented model for capturing the planar leaf shape of trees can smoothly be extended for an even more detailed 3D leaf representation by including additional equations, which independently describe the progression of potential z-coordinates (Supplementary file S1: Fig. S.1, available at https://doi.org/10.14214/sf.7740).However, parameterisation of this additional complexity requires 3D digitising using in situ photographs from different perspectives or use of digitisers (Falster and Westoby 2003;Evers et al. 2005;Dornbusch et al. 2007;Dauzat et al. 2008), which was not the focus of our study.
In this paper a highly flexible and customisable model has been presented for the digitalisation of single, entire leaf shapes.The model was demonstrated for seven contrasting leaf shapes, illustrating the model's capability of capturing a large range of shapes with high accuracy.The interpretability of the parameters allows evaluation of (dis)similarities between leaf groups or opposing leaf halves through PCA.The presented method for calculating the mean shape of a group offers perspectives for modelling purposes such as FSPMs.

Fig. 3 .
Fig. 3. Influence of parameters b 1 and b 2 .The magnitude of these parameters determines the size of the formed extrusions.Positive values cause the leaf to push outwards, while negative values cause an inwards movement.j d = 0.3 , k 1 = k 2 = 2.5, p 1 = p 2 = 2.5.

Fig. 2 .
Fig. 2. Influence of parameters k 1 and k 2 on the relative length of the transformation vector across the leaf midrib.j d = 0.3, b 1 = b 2 = 0.

Fig. 4 .
Fig. 4. Influence of parameters p 1 and p 2 , which depend on the value of b. (A) b = 1, (B) b = -1.Lowering the parameter values cause the extrusions to move away from the midrib, while higher values cause them to move towards the midrib.j d = 0.3 , k 1 = k 2 = 2.5.

Fig. 5 .
Fig. 5. Illustrative example of model performance criterion areas when a leaf is classified as a square.

Table 2 .
Comparison of the mean normalised area of the leaves of the seven selected tree species with the normalised area of the calculated mean leaf shape for each group.SilvaFennica vol.52 no. 2 article id 7740 • Coussement et al. • A flexible geometric model for leaf shape…