Genetic determinism of flowering regularity over years in an apple multi-family population

Irregular flowering over years is commonly observed in fruit trees and is assumed to be, at least partly, under genetic control. This study aimed at 
predicting genotype flowering behaviours and at detecting QTL associated to regularity, in a multi-family apple tree population. Successions of vegetative and floral annual shoots (AS) were observed along axes in trees belonging to five apple related full-sib families, observed at two experimental sites. Sequences were analysed using Markovian and linear mixed models including year and site effects. 
Indices of flowering regularity, periodicity and synchronicity were estimated, at tree and axis scales. First indices were derived from the Biennial Bearing Index (BBI). A second index was the auto-regressive correlation coefficient between flowering in consecutive years. A third index quantified the synchronicity of meristems within the trees through the measure of differences in the predictability of flowering over years, from a probabilistic viewpoint. These three types of indices were used to predict tree behaviour and to detect QTL with a Bayesian pedigree-based analysis, using an integrated genetic map containing 6,849 SNPs. The combination of the three indices efficiently predicted and 
classified the genotype behaviours despite few miss-classifications. Four common QTLs for BBIs and auto-regressive coefficient were highlighted (on LG4, 5, 8 and 10) and one for synchronicity (on LG9) in the integrated multi-family map, thus revealing the complex genetic architecture of the considered traits. This study proposes a posteriori sampling of axes within trees as a relevant and time-saving 
method to estimate tree flowering behaviours. Coupled to appropriate statistical indices, it is efficient to evaluate the tree breeding values for flowering regularity. 
In apple tree, biennial bearing appeared to result from high AS synchronism in flowering, i.e. with all axes alternatively flowering or not in a given year, whereas regularity resulted from either asynchronous alternating or regular flowering of AS.


INTRODUCTION
Biennial bearing is defined as irregular fruit or seed production over consecutive years. This trait is commonly observed in perennial crops (Monselise & Goldschmidt, 1982, Samach & Smith, 2013. In fruit trees, yield and fruit quality depend on bearing behaviour, which is in turn strongly dependent on flowering intensity. a E-mail: evelyne.costes@inra.fr Management techniques are usually required for controlling biennial bearing in fruit production. An alternative strategy would be to select cultivars combining high fruit quality, long-term resistance to pests and diseases, tree architecture adapted to modern training systems and regular production. Predicting bearing habit as soon as possible from the beginning of the genotype production is thus of high interest, since breeding programmes are particularly long for trees. This strategy is reinforced by the existence of large differences in bearing behaviour among cultivars (Lauri et al., 1997) and of genetic control of biennial bearing (Guitton et al., 2012, Durand et al., 2013. The Biennial Bearing Index (BBI), which estimates the intensity of deviation in yields during successive years (Wilcox, 1944), has become the accepted standard to describe biennial bearing. However, the measure of the magnitude of irregular bearing by BBI is questionable, especially for trended series (Huff, 2001). A new methodology was thus introduced to characterize the bearing habit of trees as early as the first years of production, when the production is increasing. This methodology was based on a trend model on the yearly number of flowers, combined with a BBI-derived index and an index based on correlations between residuals, denoted . An entropy criterion was proposed to assess synchronicity of flowering in a given year, allowing a connection to be drawn between axis-and tree-scale behaviours. However, the ability of axis-scale indices to predict genotype habits at tree scale was investigated on a single family and the annual shoot sequences were merely used to approximate the total number of flowering AS.
In the present study, we extended previous investigations by further exploring the sequences of flowering shoots and by performing a multi-family QTL detection to enlarge the genetic basis of biennial bearing variation in apple trees. We assumed that the analysis of entire sequences of successive AS, combined to flowering synchronicity in each year, would provide new insights on the genotype behaviours. We thus proposed to use not only the total number of flowering AS, but also the vegetative ones and their succession and to re-examine previous assumptions on the relation between alternation and regularity at tree and axis scales. Regarding genetics, we considered a larger germplasm to allow the comparison of alleles' performance in different genetic backgrounds (Pauly et al., 2012). We assumed that this would increase the number of segregating QTLs, detection power, accuracy of positions, and give more robust estimation of QTL effects . Our aim was to confirm previously found QTLs on the SG family and find new ones, deepening our understanding of the genetic determinisms of biennial bearing in apple tree.

MATERIAL AND METHODS
Five segregating families with known and related pedigrees were used (See Allard et al., 2016 for family and pedigree description). The first family (SG) is derived from a cross between 'Starkrimson'Starkrimson® Red Delicious' (biennial bearer) and 'Granny Smith' (regular). It was composed of 115 genotypes, each replicated twice (Table 1). In the second family (XB) derived from a cross between the hybrid X3263 (regular) and 'Belrène' (biennial), 58 genotypes were considered. These families were planted in 2004 and 2005, respectively, at the Diascope INRA Montpellier experimental unit. The three other families were chosen for their related pedigree. The HIVW family had a parent (X3263) common to the XB family and the other one (X3259) to the N family. Both N and P families had a common parent (X3305) and these last three families derived from 'Golden Delicious'. They were composed of 171, 42 and 45 individuals, respectively, each with a single replicate. They were planted at the INRA Angers experimental station. The trees were trained in vertical axis with an annual manual thinning that was performed at the end of June, and left one fruit per inflorescence. At both sites, pest and disease management was performed consistently with professional practices. On each individual tree, successions of vegetative vs. floral AS were observed over consecutive years (based on the presence/absence of an inflorescence) along different types of axes, classified depending on their length (Table 1). Statistical analysis was based on indices defined in Durand et al. (2013): a BBIderived index denoted BBI_res_norm, an autoregressive coefficient denoted g and entropy, denoted g Ent . They are based on counts of flowering AS at axis and whole-tree scales, whenever possible. The description of the trend and autoregressive models, BBIderived indices and the statistical methodology for classification of genotype habit can be found in Durand et al. (2013 and2017). The indices at axis scale, denoted by BBI_res_norm_ax, and  ax , were computed as those defined at tree scale but using the total yearly counts of flowering AS in axes sampled within each tree replicate. These counts were also used to compute two entropies, denoted g Ent (entropy based on frequencies) and g glmm Ent , (entropy based on a generalized linear mixed model -GLMM). We assumed that the true class of each genotype (regular, irregular or biennial) could be deduced from the tree-scale indices for the SG family, using a clustering method developed in Durand et al. (2013). The classification of the genotypes of all families from axis-scale indices was achieved using as predictors BBI_res_norm_ax, ax  , g Ent and g glmm Ent , , which were available for every genotype, as opposed to whole tree-scale predictors. Note however that some predictors could be missing for some genotypes, or highly correlated and redundant. To handle both issues, a principal component analysis (PCA) for partially missing data was used, with the R package missMDA (Josse & Husson, 2012). Classification was performed using neural networks. The number of principal components (PCs) and the NN regularisation parameter were determined by out-ofsample validation. Because classification prediction provides a rough summary (through three classes only) of the flowering behaviour, quantitative assessment of the bearing behaviour was achieved through the tree-scale indices BBI_res_norm and g  (when measured). Since these indices were known for SG only, approximation was performed by regression, using axis-scale indices BBI_res_norm_ax, ax  , g Ent and g glmm Ent , as predictors. NNs were used as nonlinear regression functions, also using missing data PCA. The NN parameters were estimated by least squares minimization. Since the optimal numbers of PCs to be used in classification and regression NNs may be different, they were both chosen independently by out-of-sample validation. The approximated values of BBI_res_norm and g  are referred to as BBI_res_norm_pred and pred  , respectively. These two values are linearly dependent, due to the nature of NNs.
The five full-sib families and their progenitors were genotyped with the Infinium® 20K SNP array (Bianco et al., 2014). A genetic map composed of 7,100 SNPs was integrated over 27 full-sib families (Di Pierro et al. 2016) and used for QTL mapping. 6,849 SNPs were used after careful checking of their robustness (Van de Weg et al., 2013;Di Guardo et al., 2015), consistency and recombination pattern on the 5 families and the pedigree members (Allard et al., 2016). Then sets of single SNPs were integrated into haploblocks, corresponding to successive 1 cM segments. Haplotypes were composed using the software FlexQTLTM and PediHaplotyper .
QTLs were detected using a linear model that comprised an intercept , the regression coefficients a on the QTL covariates, and a residual e, as: where W is the design matrix for the QTL effects. As QTL genotypes of individuals are a priori unknown, modelling is based on independent assignment of alleles to founders and segregation indicators to trace transmission from parents to offspring (Bink, 2002;Bink et al., 2008). The number of QTLs was assigned a Poisson prior, but results are reported for a prior mean of 5, only. Samples from the joint posterior distribution of the model parameters were obtained by Markov chain Monte Carlo (MCMC) simulation in FlexQTLTM (Bink et al., 2008 and. The number of QTLs was inferred from a pairwise comparison of models differing by one QTL, and considering twice the natural logarithm of the Bayes Factors, denoted 2*lnBF. Values greater than 2, 5 and 10 indicate positive, strong, and decisive evidence for the presence of a QTL, respectively. QTL positions were based on posterior QTL intensities, and QTL contributions on the posterior mean estimates of the QTL effects.

RESULTS
According to BIC values, the best trended model included all fixed (site, memory, year) and the random replication effects (data not shown; see Durand et al., 2017). The trees in Montpellier had lower flowering probability than those in Angers. Flowering AS were more frequent after a vegetative AS preceded by a flowering AS, than directly after a flowering AS, showing frequent biennial alternation in flowering at axis scale. The probability of flowering was the highest in 2008, whereas it was particularly low in 2010 and 2012, whatever the site.
BBI_res_norm and g were regressed with three or four principal components (PCs) using NNs, and the best cross-validated correlations were obtained with three PCs. The optimal correlation between BBI_res_norm and its prediction BBI_res_norm_pred was 0.71 when using BBI_res_norm_ax, ax  and g Ent as predictors, and 0.72 when using the PCs. The optimal correlation between g and pred  was 0.60 using BBI_res_norm_ax, ax  and g Ent , and 0.64 using PCs. The unknown bearing habits at tree-scale of the genotypes of XB, HIVW, N and P were predicted by using the optimal NN model on SG that was re-estimated on the whole data set, using genotypes with known classes (i.e. 122 genotypes in SG) for learning the mapping between local indices and classes. Confusion between classes concerned irregular genotypes that could hardly be discriminated from regular and biennial bearing genotypes (Table 1). This comes from irregular genotypes having intermediate values of their indices, between those for the regular and biennial bearing genotypes. As a result, 15 regular and 9 biennial genotypes were classified as irregular, and 9 irregular genotypes were classified as regular. In contrast, one misclassification only occurred between regular and biennial bearing genotypes, highlighting that discrimination between both behaviours is easy.
Two major QTLs were detected with a strong evidence (2*lnBF ≥ 5) on LG4 and LG5, for the two BBI-derived indices (BBI_res_norm_ax, BBI_res_norm_pred) (Table 2, Figure 1 A and B). The QTL detected on LG4 explained 11.5, and 13.3% of variance, respectively. The QTL on LG5 explained 6.9, and 8.3% of the variance of each index, respectively (Table 2). Two other QTLs were detected on LG8 and LG10 but had a strong evidence for BBI_res_norm_pred only. They explained 11.7% and 10% of variance of this index, respectively. Two QTLs were detected on LG8 and LG10 but had a strong evidence for BBI_res_norm_pred only. They explained 11.7% and 10% of variance of this index, respectively. For the autoregressive coefficient  ax the same regions along the genome were detected. Three QTLs were detected on LG4, LG 5 and LG 10 ( Figure 1C) but none with a strong evidence (Table 1). The QTLs on LG5 and LG10 colocalized with that of the BBI indexes (Figure 1). Four QTLs were mapped for g pred on LG4, LG5, LG8 and LG10 ( Figure  1D) and colocalized with BBI_res_norm_pred. QTLs on LG4 and LG8 explained 12%, and those on LG5 and LG10 explained 10% of the variance (Table 1). Five QTLs were detected for g glmm Ent , (Figure 1F), among which only that on LG9 had a strong evidence and explained 20% of the variance. No interaction between QTLs could be identified for BBI_res_norm_ax and  ax , whereas interactions were identified for BBI_res_norm_pred between QTLs on LG4 and LG8, and for g glmm

Ent
, between QTLs on LG7, LG9 and LG17, with 2-way and 3-way-interactions being significant, respectively (results not shown). Table 2 -Parameters associated with the QTL detected for BBI derived indices and autoregressive coefficients and entropy. The first column indicates the variable concerned, the following columns indicate the linkage group (LG) where the QTL is located, 2ln(BF) value at LG scale, 2ln(BF) value at bin scale, the position of the QTL in cM, the position of the QTL peak, its additive effect, the frequency of positive allele and percentage of variance explained, respectively.

Index
LG 2lnBF_LG

DISCUSSION AND CONCLUSIONS
In conclusion, three indices can be considered as key and complementary descriptors of the bearing behaviour of genotypes at either tree or axis scales: BBI_res_norm, and entropy. The first two are sufficient to classify the genotypes into regular, irregular and biennial classes. However, entropy allows this diagnostic to be refined by providing information on the within-tree strategy of regular genotypes, with potential consequences on tree management and breeding goals. The correlations between tree-and axis-scale descriptors suggest that biennial bearing at tree scale results from the conjunction of synchronism in flowering between AS in a given year and biennial alternation at AS scale between consecutive years. On the contrary, regularity at tree scale results from either asynchronous locally alternating flowering or regular flowering at AS scale. Irregular genotypes exhibit intermediate values for every descriptor, suggesting that these genotypes are characterised by partial biennial alternation at AS scale or strong biennial alternation with partial synchronism. However, more complex within-tree organisation of synchronisms could exist (Couranjou, 1983). We suggest that selecting genotypes with regular desynchronized axes could be an appropriate strategy.
The lower flowering probability at Montpellier than Angers may result from the absence of thinning practices on XB family, which may have hampered tree flowering capacity over years (Dennis & Neilsen, 1999). However, thinning was performed quite lately in Angers, due to a large dispersion of phenological stages among genotypes (Allard et al., 2016). This practice likely had a relatively low impact on floral induction which is assumed to occur mid-June in apical meristems of spurs (Hanke et al., 2007) and therefore on alternation. Even though the mean probability of flowering of all genotypes per site highlighted phase opposition in the last three years (data not shown), no clear characterisation of years as being 'ON' or 'OFF' could be made on the mean values per family. Although critical climatic conditions such as frost (Nagy et al., 2010) can synchronise trees in a given year and site, different genotypes in opposition phase for flowering can be observed in a given year.
The predicted BBI_res_norm and appeared robust based on QTL detection, but the classification error was still of 35%. The miss-classification mainly concerned the irregular genotypes, whereas the regular and biennial behaviours could be predicted with good accuracy. The misclassification of regular genotypes considered as irregular (15 over 36, see Table 1) could lead to discard them during the selection process. However, this type of error is less problematic than the reverse (selecting irregular genotype that would be misclassified as regular) especially if we consider the drastic reduction in the number of individual selected in the early stages of breeding process. Therefore, the simplified phenotyping strategy that consists in sampling axes within the tree structure with an a posteriori observation appears to be relevant. Indeed, this is a time-saving strategy for phenotyping that can be combined to computation of indices for rejecting biennial and irregular genotypes during the assessment of agronomic performance of pre-selected genotypes in breeding programmes.
In a breeding perspective, the three descriptors (i.e. BBI_res_norm_pred, pred and entropy) and five major QTLs on LG 4, 5, 8, 9 and 10 could be considered to ensure regular bearing behaviour with alleles adequately combined in new released materials. As underlined by Samach and Smith (2013), the evolutionary advantage of masting remains questionable. Our results suggest that flowering synchronicity at the tree level could not be associated with regularity, probably because it would lead to over-cropping and major drawbacks in an agronomic context. This study revealed a complex genetic architecture of flowering habit in apple. The overview of all loci involved in trait variation led us to assess promising individuals and progenitors (Durand et al., 2017). The pedigree-based approach used herein appears particularly relevant for plant species in which varieties are tightly related to each other, which is the case in most crops. In addition, the relative importance of loci and the cumulative effects of small loci should not be overlooked. In this perspective, genomic selection models would be complementary to QTL analyses to evaluate the genetic value of individuals.