# Simulation study to assess the performance of EM-AMMI for incomplete G×E data of mango (Mangifera indica) in presence of bienniality

Ram Kumar Choudhary^{1}, Atmakuri Ramakrishna Rao^{2}, Shiv Kumar Choudhary^{3}, Shanti Bhushan^{3}, Ashutosh Prasad Maurya^{4}, Vinay Kumar Choudhary^{1}

^{1}Dr. Rajendra Prasad Central Agricultural University, Pusa, Samastipur, Bihar-848 125, India

^{2}ICAR-Indian Agricultural Statistics Research Institute, Library Avenue, New Delhi – 110 012, India

^{3}VKSCOA, Dumraon (Bihar Agricultural University, Sabour), Buxar, Bihar-802 136, India

^{4}National Informatics Centre (NIC), CGO Complex, Lodi Road, New Delhi -110 008, India

Corresponding Author Email: shiv_iasri@rediffmail.com

##### Abstract

Mango (*Mangifera indica*) is one of the important perennial fruit crops that exhibit bienniality, which results in high economic loss to growers. In Genotype×Environment (G×E) data, a few data points are often missing. Such incompleteness makes the G×E data unbalanced, which complicates the analysis of Genotype × Environment Interaction (GEI). The problem intensifies when the crop exhibits bienniality. Expectation Maximization-Additive Main Effect and Multiplicative Interaction (EM-AMMI) is used for imputing missing observations in incomplete G×E data. EM-AMMI is an iterative procedure that fills in the missing values in such a manner that the starting values do not affect the solution. In the present study, the performance of the EM-AMMI method for imputing missing values in incomplete G×E data was assessed at four rates of missing observations (5%, 10%, 15% and 20%) in the presence of bienniality, using original mango data from four centres as well as simulated data. A computer programme was developed to generate 1000 data sets for the simulation. The number of principal components used for imputation with the AMMI model was selected on the basis of the Root Mean Square Predictive Difference (RMSPD). The lowest RMSPD with low dispersion was found for AMMI0 at both the Rewa and Sangareddy centres; hence, AMMI0 was selected as the parsimonious AMMI model. Box-plots were also generated showing the distribution of the 1000 correlation coefficients between ranks of genotypes obtained from yield, different stability measures and simultaneous selection of genotypes for yield and stability (SISGYS) indices. EM-AMMI was found not to converge at higher rates of missing observations. Thus, EM-AMMI is recommended for imputation of up to 10% of missing observations in G×E data of mango in the presence of bienniality. The results have been confirmed by the simulation study.

**Introduction**

Mango (*Mangifera indica*) is one of the important perennial fruit crops that exhibit bienniality (Choudhary [7]), which badly affects the income of growers. Genotypes are assessed in multi-location trials to select the best-performing varieties and so minimise growers' income losses. The problems of analysing incomplete G×E data intensify when the crop exhibits bienniality. Whenever a genotype-environment table is incomplete, the data are unbalanced and the methods for analysing balanced data are no longer applicable. In such situations compensation is needed so that genotypes are assessed fairly and appropriate GEI effects can be derived; specifically, compensation has to be made in estimating genotype means for those environments in which the genotypes are missing. Patterson [14] used the method of fitting constants but suggested an adjustment to the estimated means of genotypes that were absent in some environments. Digby [9] modified this approach by introducing a sensitivity parameter into Patterson's model. Adjustments to least-squares estimates have also been suggested: for an unbalanced trial with seventy genotypes and ten locations, genotype and location effects were estimated by minimizing the weighted sum of squares of deviations from the expected values, with weights proportional to the number of observations. Gauch and Zobel [11] suggested employing the Expectation-Maximization (EM) algorithm to implement the AMMI model for incomplete two-way GEI data. In many important cases, including the AMMI model, the EM algorithm is remarkably simple, both conceptually and computationally. Laxmi [12] proposed two methods of assigning weights to the observations in two-way incomplete data. Bajpai [4] proposed implementing AMMI for incomplete data by initializing EM-AMMI's additive parameters with Patterson's FITCON technique in the Gauch and Zobel [11] procedure. In all these cases the missing values are imputed and the completed data are then analysed by the methods given by Patterson [14] and Digby [9].

Choudhary [6] and Rao *et al.* [19] developed a simultaneous selection index that can be safely used for simultaneous selection of genotypes for both high yield and stability with incomplete data. The index is more general and approaches the index given by Bajpai and Prabhakaran [4] when the data are complete; it can therefore be used in both complete and incomplete data situations. They also assessed the performance of the index for increasing rates of missing observations and found that it can be safely used up to 10% of missing observations.

Raju and Bhatia [16] estimated sensitivity in incomplete G×E data and found that the estimates are biased towards zero in comparison with the usual method for the single-covariate case. Raju *et al.* [18] studied varietal sensitivity while simultaneously dealing with incomplete data and random environments. Raju *et al.* [17] evaluated the methods cited in the literature for incomplete G×E data, namely FITCON, modified regression and the estimate of stability variance for incomplete two-way tables by Piepho [15]. They suggested that stability variance can be used safely even with incomplete data because it is a robust measure of the stability of a crop variety. Alarcon *et al.* [1] proposed an alternative deterministic algorithm, using a modification of the Gabriel cross-validation method, for imputing missing data in MLTs. Rodrigues *et al.* [20] compared joint regression analysis and AMMI with particular focus on robustness with increasing amounts of randomly selected missing data. They found that the dominant genotypes identified by joint regression analysis were similar to the winners of mega-environments identified by AMMI for the same environments; however, joint regression analysis was more stable as the incidence of missing values increased.

Alarcon *et al.* [3] proposed five new imputation methods for unbalanced multi-location trial data. The methods use cross-validation by eigenvector, based on an iterative scheme with the Singular Value Decomposition (SVD) of a matrix. They were tested by a simulation study using three original data sets (peas, cotton and beans), randomly deleting 10%, 20% and 40% of the observations from each matrix. The quality of imputation was judged by AMMI, using the root mean squared predictive difference (RMSPD) between the genotype and environment parameters of the original data set and those of the set completed by imputation. The proposed methods of imputation do not need any distributional or structural assumptions. The authors considered four patterns of missing observations: one with cells missing completely at random and three with cells not missing at random (block-diagonal pattern, diagonal pattern and block-diagonal pattern with checks). On the basis of RMSPD, they concluded that there is significant interaction between the proportion of missing values, the number of principal components in the AMMI model and the pattern of the missing cells. They also used RMSPD to select the parsimonious AMMI model for imputation. Alarcon *et al.* [3] extended their earlier work [1] by including weights chosen by cross-validation and allowing multiple as well as simple imputation; the three methods were assessed and compared in a simulation study. In another paper, Alarcon *et al.* [2] compared four imputation algorithms with the gold standard EM-AMMI in a simulation study based on three complete real data sets (eucalyptus, sugar cane and beans) for various imputation percentages. The methods were compared using normalized RMSPD and the Spearman correlation coefficient, and the authors concluded that EM-SVD provides results competitive with those obtained with the gold standard. Paderewski [13] proposed an R function for imputing missing values by the EM-AMMI algorithm, which also allows the repeatability of the algorithm and the reliability of the imputation to be checked. In this article, therefore, an attempt has been made to assess the performance of EM-AMMI for incomplete G×E data of mango in the presence of bienniality.

**Material and Methods**

Data on mango varieties tested at four locations, namely Rewa (Madhya Pradesh), Sangareddy (Andhra Pradesh), Vengurla (Maharashtra) and Sabour (Bihar), over different years were collected from the All India Coordinated Research Project on Sub-Tropical Fruits (AICRP-STF), Central Institute for Subtropical Horticulture (CISH), Lucknow. The multi-location trials were conducted in a Randomized Complete Block Design (RCBD) with four replications and two trees per replication. For the present study, fruit yield per tree (kg) was used to assess the performance of EM-AMMI, using a data set of 16 varieties commonly tested at all 4 centres for 9 years, from 1997 to 2005.

The following three cases have been considered for the purpose of simulation:

Case 1: Years are considered as environments (*temporal*) for each centre separately (4 sets, each of 16×9 G×E data), where 16 is the number of varieties and 9 is the number of years. Thus each set consists of 144 data points.

Case 2: Combination of years and centres are considered as environments (16×36 G×E data).

Case 3: Data were averaged over nine years and thus the data set has 16×4 data.

*EM-AMMI for incomplete G×E data*

Implementation of AMMI requires a complete two-way table of interactions, but the interaction for the missing cells of an incomplete genotype × environment table is undefined. Gauch and Zobel [11] suggested employing the Expectation-Maximization (EM) algorithm to implement the AMMI model for an incomplete two-way table, and termed this missing-data version of AMMI "EM-AMMI". In many important cases, including the AMMI model, the EM algorithm is remarkably simple, both conceptually and computationally. In essence, EM involves filling in missing values and iterating in such a manner that the starting values do not affect the solution and hence are arbitrary and inconsequential, apart from some effect on the number of iterations required for convergence.

The computations are as follows:

Step 1: The missing values in the G×E matrix are initialized by the grand mean plus the main effects of genotypes and environments.

Step 2: The parameters of the parsimonious AMMI model are computed.

Step 3: The adjusted means of the AMMI model are re-calculated using the new AMMI parameters.

Step 4: The values of the missing observations are replaced by the new estimates.

Step 5: Steps 2 to 4 are repeated until convergence is achieved.

A suitable implementation of the EM algorithm for EM-AMMI works as follows. First, cell means are computed for every cell with data. EM-AMMI's additive parameters are then initialized by computing the unweighted genotype means, environment means and grand mean. Interaction residuals are initialized as usual for cells with data (namely, the interaction equals the cell mean minus the genotype mean minus the environment mean plus the grand mean), but for missing cells a residual of zero is imputed. The interaction matrix now has no unspecified cells, and ordinary PCA calculations are used to solve for EM-AMMI's multiplicative parameters. Note that missing cells are initialized here by the unweighted additive model (since their interaction residuals are imputed as zero); a still simpler initialization with the grand mean would lead to identical results, although it would require a somewhat larger number of iterations to reach convergence. Each missing cell is then re-estimated and revised with the current EM-AMMI model, and EM-AMMI is fitted again to the revised data, treating imputed values the same as actual data. This process is iterated until convergence, i.e., until the imputed values for the missing cells show acceptably small changes.

Upon convergence, the EM-AMMI model "fits" the imputed cells perfectly, with a residual of zero (within numerical precision), whereas actual data have finite residuals as usual. Hence the EM algorithm fits a model to the actual data while ignoring missing cells, in the sense that they receive imputed values that fit the model perfectly.
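The steps above can be sketched in a few lines of R. This is a minimal illustration written for this section, not the EM.AMMI function of Paderewski [13]; the function name `em_ammi_sketch`, its arguments and the convergence rule (largest absolute change in the imputed cells) are assumptions, and every genotype row and environment column is assumed to have at least one observed cell.

```r
# Minimal EM-AMMI sketch (illustrative only; names and defaults are assumptions).
em_ammi_sketch <- function(X, n_pc = 0, tol = 1e-6, max_iter = 1000) {
  miss <- is.na(X)
  if (!any(miss)) return(list(X = X, iterations = 0, converged = TRUE))
  # Step 1: start missing cells at the additive fit built from the observed cells.
  g <- rowMeans(X, na.rm = TRUE)
  e <- colMeans(X, na.rm = TRUE)
  m <- mean(X, na.rm = TRUE)
  X_fill <- X
  X_fill[miss] <- (outer(g, e, "+") - m)[miss]
  for (iter in seq_len(max_iter)) {
    # Step 2: fit the additive part, then n_pc multiplicative terms by SVD.
    mu <- mean(X_fill)
    fit <- outer(rowMeans(X_fill), colMeans(X_fill), "+") - mu
    if (n_pc > 0) {
      s <- svd(X_fill - fit)
      k <- seq_len(n_pc)
      fit <- fit + s$u[, k, drop = FALSE] %*% diag(s$d[k], n_pc) %*%
        t(s$v[, k, drop = FALSE])
    }
    # Steps 3 and 4: replace the missing cells by their new model estimates.
    change <- max(abs(fit[miss] - X_fill[miss]))
    X_fill[miss] <- fit[miss]
    # Step 5: stop once the imputed values change by less than tol.
    if (change < tol) break
  }
  list(X = X_fill, iterations = iter, converged = change < tol)
}

# Example: an exactly additive 4 x 3 table with one cell removed.
truth <- outer(c(5, 2, -3, -4), c(1, 0, -1), "+") + 20
X <- truth
X[2, 3] <- NA
res <- em_ammi_sketch(X, n_pc = 0)
res$X[2, 3]  # should be close to truth[2, 3]
```

Because the example table has no interaction, EM-AMMI0 is enough to recover the deleted cell; with real yield data the choice of `n_pc` matters and is addressed by cross-validation.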

Because of the missing cells, each EM-AMMI model must be computed from scratch; the results from lower-order models cannot be reused. EM-AMMI models are indicated by the number of principal components employed for the imputation, for example EM-AMMI0, EM-AMMI1, EM-AMMI2 and so forth. EM-AMMI0 does not employ any multiplicative term and hence is simply called EM.
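For reference, the AMMI model with n multiplicative terms is commonly written as below. The article does not state the model explicitly, so the notation is an assumption taken from the general AMMI literature: y_{ij} is the mean of genotype i in environment j, μ the grand mean, g_i and e_j the genotype and environment main effects, λ_k the singular value of the k-th interaction principal component, α_{ik} and γ_{jk} the corresponding genotype and environment scores, and ρ_{ij} the residual.

$$
y_{ij} = \mu + g_i + e_j + \sum_{k=1}^{n} \lambda_k \alpha_{ik} \gamma_{jk} + \rho_{ij}
$$

Setting n = 0 removes the sum and leaves the additive model, which is the model fitted by EM-AMMI0.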

*Simulation*

Model-based simulation was used to generate 1000 data sets in the presence and in the absence of bienniality for each case. Initially, bienniality was estimated from the real data. Estimates were then obtained after eliminating bienniality by taking the moving average of two consecutive years. Since bienniality is removed by taking the moving average of two consecutive years, the data set now has 16×8 (128) data points. The estimates of bienniality and of the different effects obtained above were then used to simulate data of the required size. This simulation procedure was followed to generate data under cases 1, 2 and 3, both in the presence and in the absence of bienniality. Hence, each set of simulated data with bienniality consists of 128 data points and, after eliminating bienniality, of 112 data points. Altogether this constitutes (4+1+1) × 1000 × 2 = 12,000 data sets.

1000 incomplete data sets were generated by simulation for each of four levels of missing data, with 5%, 10%, 15% and 20% of randomly missing observations. Each incomplete data set was imputed by EM-AMMI for two cases: the first is named "with bienniality" (WB) and the second "without bienniality" (WOB). R code was written to perform the imputation by EM-AMMI using the R software [21]. Box-plots showing the correlation between ranks of the genotypes from the original and imputed data sets for yield, stability and SISGYS indices were generated using SAS.
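Random deletion at a fixed rate can be sketched as follows. The helper `make_incomplete` is hypothetical, and the rule of rejecting patterns that leave a genotype or an environment without any observation is an assumption added here so that main effects remain estimable; the article only states that observations were deleted at random.

```r
# Delete a given proportion of cells completely at random (illustrative helper).
make_incomplete <- function(X, rate, max_tries = 1000) {
  n_miss <- round(rate * length(X))
  for (attempt in seq_len(max_tries)) {
    Y <- X
    Y[sample(length(X), n_miss)] <- NA
    # Assumption: reject patterns that empty a whole row or column.
    if (all(rowSums(!is.na(Y)) > 0) && all(colSums(!is.na(Y)) > 0)) {
      return(Y)
    }
  }
  stop("no admissible missing-data pattern found")
}

# Example with an artificial 16 x 9 table (values are placeholders, not mango data).
set.seed(1)
X <- matrix(rnorm(16 * 9, mean = 50, sd = 10), nrow = 16, ncol = 9)
rates <- c(0.05, 0.10, 0.15, 0.20)
incomplete <- lapply(rates, function(r) make_incomplete(X, r))
n_missing <- sapply(incomplete, function(Y) sum(is.na(Y)))
n_missing
```

With 144 cells, the four rates correspond to 7, 14, 22 and 29 deleted cells after rounding.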

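*R code for imputation by EM-AMMI*

```r
# Load the file that defines the EM.AMMI and CV.LOO functions.
source("D:/EM_AMMI_Rcode.txt")

# Read the stacked incomplete data (one column) from the clipboard.
mis5 <- as.matrix(read.table("clipboard", header = FALSE))

# Each data set occupies 72 consecutive rows: rows 1-72 and rows 73-144.
s  <- seq(from = 72, to = 144, by = 72)
s1 <- c(1, s[-2] + 1)  # first row of each set
s2 <- s                # last row of each set

for (i in 1:2) {
  set1 <- mis5[s1[i]:s2[i], 1]
  x <- matrix(set1, nrow = 8, ncol = 9, byrow = TRUE)

  # Impute the missing cells and obtain RMSPD for 0 to 6 principal components.
  result <- EM.AMMI(x, PC.nb = 6, initial.values = NA, precision = 0.01, max.iter = 1000)
  RM <- CV.LOO(x, PC.nb = 0:6, MNO = 4)$RMSPD

  x.imp <- result$X
  rownames(x.imp) <- 1:8
  colnames(x.imp) <- 1:9

  # Long format: i = row index, j = column index, score = imputed value.
  d5_1 <- data.frame(
    i = rep(colnames(t(x.imp)), each = nrow(t(x.imp))),
    j = rep(rownames(t(x.imp)), ncol(t(x.imp))),
    score = as.vector(t(x.imp))
  )

  print(d5_1)
  print(RM)
}
```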

*Leave-one-out cross-validation*

Leave-one-out cross-validation is a procedure for selecting the optimum number of principal components for imputation by the EM-AMMI method, based on the root mean square predictive difference (RMSPD) (Gauch and Zobel [11]; Dias and Krzanowski [8]). The number of principal components with the least RMSPD is chosen as the optimum for imputation by EM-AMMI. In this procedure, a single observation from the original data set is hidden before running EM-AMMI and kept for validation. The imputation is done by the EM-AMMI procedure on the training data set, which consists of the data without this single observation and without the originally missing values. The procedure is repeated in turn for all the available observations in the incomplete data set. The differences between the hidden values and the values imputed by EM-AMMI, called the 'predictive differences', are squared and averaged, and the square root is taken to obtain RMSPD. Gauch and Zobel [11] illustrated this concept on soybean yield trial data.
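The procedure can be sketched as below. Both function names are hypothetical. To keep the example self-contained it uses a simple additive (EM-AMMI0 type) imputer; any other imputation function that takes a matrix with `NA` cells and returns a completed matrix could be passed through the `impute` argument instead.

```r
# Additive (AMMI0-type) EM imputer, included only to make the example runnable.
impute_additive <- function(X, tol = 1e-8, max_iter = 1000) {
  miss <- is.na(X)
  if (!any(miss)) return(X)
  X[miss] <- mean(X, na.rm = TRUE)  # simple start: grand mean of observed cells
  for (iter in seq_len(max_iter)) {
    fit <- outer(rowMeans(X), colMeans(X), "+") - mean(X)
    change <- max(abs(fit[miss] - X[miss]))
    X[miss] <- fit[miss]
    if (change < tol) break
  }
  X
}

# Leave-one-out RMSPD: hide each observed cell in turn, impute it from the
# remaining data, and compare the imputed value with the hidden one.
rmspd_loo <- function(X, impute = impute_additive) {
  observed <- which(!is.na(X))
  pred_diff <- vapply(observed, function(cell) {
    X_train <- X
    X_train[cell] <- NA
    impute(X_train)[cell] - X[cell]
  }, numeric(1))
  sqrt(mean(pred_diff^2))
}

# Example: an additive table plus noise (artificial values).
set.seed(42)
additive <- outer(c(5, 2, -3, -4), c(1, 0, -1), "+") + 20
noisy <- additive + rnorm(length(additive), sd = 1)
rmspd_loo(noisy)
```

Computing this quantity for each candidate number of principal components and choosing the smallest value corresponds to the model selection described above.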

When bienniality is present in MLT data collected over years, the selection of genotypes is questionable, because bienniality may distort the ranking of genotypes based on stability performance as well as on simultaneous selection (SIS) indices. An empirical procedure was therefore followed to show the effect of bienniality on the selection of genotypes. Initially, ignoring the presence of bienniality and following the usual selection strategies, the genotypes were ranked on yield, Shukla's stability variance, ASTAB_{i}, index-1 and index-2. Bienniality was then tested for and removed by taking the moving average of two consecutive years. This refined data set is referred to as MLT data without bienniality (WOB), and it was used to rank the genotypes by the same yield, stability statistics and SIS indices. Rank correlations were estimated between the ranks given to the genotypes in the with-bienniality (WB) and WOB situations. This procedure was repeated over all the simulated data sets under the three cases described in Material and Methods, and the estimates and standard errors of the rank correlations between WB and WOB data sets were computed.

*Incomplete MLTs data situation*

For incomplete MLTs conducted over years, the effect of bienniality on the selection of genotypes in mango was assessed as follows. Initially, a complete WB data set was taken (kept as reference data) and the genotypes were ranked on yield performance, stability measures and SIS indices (situation-1). Randomly missing observations were then created and imputed by the EM-AMMI method, and the imputed data were used to rank the genotypes for selection (situation-2). Bienniality was removed from the complete data by taking the moving average (WOB) and the genotypes were ranked (situation-3). The incomplete data of situation-2 were also subjected to removal of bienniality, which produces incomplete WOB data; EM-AMMI was applied to these data to impute the missing observations (situation-4), and the genotypes were then ranked on the different stability measures and indices. Rank correlations between the genotype rankings under situations 1 and 2, and under situations 3 and 4, were worked out.
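The comparison between a reference data set and its imputed counterpart can be sketched as follows. The function `rank_agreement` is hypothetical, it ranks genotypes on mean yield only (the stability measures and SIS indices are not reproduced here), and the use of Spearman's coefficient is an assumption, since the article refers only to rank correlations.

```r
# Rank genotypes (rows) on mean yield in two data sets and correlate the ranks.
rank_agreement <- function(reference, imputed) {
  # The genotype with the highest mean yield receives rank 1.
  r_ref <- rank(-rowMeans(reference))
  r_imp <- rank(-rowMeans(imputed))
  cor(r_ref, r_imp, method = "spearman")
}

# Example with artificial yields for three genotypes in two environments.
reference <- rbind(c(10, 12), c(20, 22), c(30, 28))
same_order <- reference + 0.5              # ranking unchanged
reversed_order <- reference[3:1, ]         # ranking fully reversed
rank_agreement(reference, same_order)
rank_agreement(reference, reversed_order)
```

A value close to one means that imputation left the ranking of genotypes essentially unchanged.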

*Effect of bienniality on selection of genotypes under incomplete genotype × environment (G×E) data situation*

In real data, some observations are found to be missing in G×E data of multi-location trials conducted over years, which results in incomplete G×E data. Application of the AMMI model for studying GEI, stability analysis and SISGYS needs complete G×E data. To obtain complete G×E data from incomplete G×E data, the following procedures are generally followed: (i) deleting the genotypes for which observations are missing, (ii) deleting the environments for which observations are missing, and (iii) imputing the missing observations. Procedures (i) and (ii) lead to loss of information; thus procedure (iii) was adopted in the present investigation. The performance of EM-AMMI was assessed under four rates of missing observations, simulated through random deletions from the complete data set, in the presence of bienniality.

As explained in Material and Methods, four situations, *viz*., situation-1, situation-2, situation-3 and situation-4, were created and the genotypes were ranked under different rates of missing observations (5%, 10%, 15% and 20%).

In eliminating bienniality from the data by taking the moving average of two consecutive years/environments, any moving average that involves a missing observation is treated as missing. For example, if the response of the 2^{nd} genotype in the 3^{rd} year is missing, then the moving averages of years 2 and 3 and of years 3 and 4 are treated as missing for that genotype. As a result, the number of missing observations approximately doubles compared with the number in the incomplete G×E data, except when the first or last observation, or consecutive observations, are missing in a given row of the two-way table. The correlations were worked out for the simulated data sets. Correlations close to one indicate an efficient method of imputation as far as the performance of stability measures and SISGYS indices is concerned.
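The moving-average step and the way it propagates missing cells can be illustrated with a short R sketch. The function name `remove_bienniality` and the yields are made up for illustration; rows are genotypes and columns are consecutive years.

```r
# Moving average of two consecutive years; any average involving NA stays NA.
remove_bienniality <- function(X) {
  n_env <- ncol(X)
  (X[, -n_env, drop = FALSE] + X[, -1, drop = FALSE]) / 2
}

# Two genotypes over five years with an alternate-bearing pattern;
# the second genotype is missing in the third year.
X <- rbind(c(40, 10, 42, 12, 44),
           c(35, 15, NA, 13, 37))
Y <- remove_bienniality(X)
Y
```

The result has one column fewer than the input, and the single missing cell in year 3 produces two missing moving averages (years 2-3 and years 3-4), which is the approximate doubling described above.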

**Results and Discussion**

Imputation of missing observations in incomplete G×E data was performed by the EM-AMMI algorithm as described in the Material and Methods section. The performance of yield and stability measures (1. yield value, 2. yield rank, 3. ASTAB_{i} value, 4. ASTAB_{i} rank, 5. Shukla's stability value, 6. Shukla's stability rank) and SISGYS indices (7. Index-1 value, 8. Index-1 rank, 9. Index-2 value and 10. Index-2 rank) was assessed from the correlation between the ranks of genotypes obtained from the imputed and complete data sets under situation-1 to situation-4. Box-plots of the RMSPD distribution, used to select the parsimonious AMMI model for EM-AMMI, are presented in Fig. 1 and Fig. 4 for the Rewa and Sangareddy centres, respectively. Since the lowest RMSPD with low dispersion was found for AMMI0 at both centres, AMMI0 was selected as the parsimonious AMMI model. Accordingly, EM-AMMI0 was used to impute the missing observations at both centres under the different rates of incompleteness. The distribution of RMSPD for Rewa and Sangareddy does not follow a similar trend. An increase in RMSPD with an increasing number of principal components in the AMMI model may be due to overfitting, in which noise is fitted along with signal, whereas a decrease in RMSPD suggests that the added components recover signal that a lower-order model underfits. Thus, choosing the optimum number of principal components to include in EM-AMMI is very important for capturing the variation.

Table 1. Correlation coefficients of yield, stability measures, SISGYS and their ranks between before and after imputation by EM-AMMI in original data for three cases

| Rate of missing | Yield | Yield rank | Stability value | Stability rank | Shukla's stability value | Shukla's stability rank | Index value | Index rank | Bajpai index value | Bajpai index rank |
|---|---|---|---|---|---|---|---|---|---|---|
| **Rewa** | | | | | | | | | | |
| 5% | 0.880 | 0.888 | 0.496 | 0.868 | 0.303 | 0.865 | 0.960 | 0.956 | 0.931 | 0.915 |
| 10% | 0.700 | 0.685 | 0.818 | 0.724 | 0.786 | 0.776 | 0.847 | 0.435 | 0.976 | -0.238 |
| 15% | 0.864 | 0.818 | 0.918 | 0.826 | 0.966 | 0.835 | 0.538 | 0.344 | 0.989 | 0.791 |
| 20% | 0.224 | 0.600 | 0.202 | 0.344 | -0.014 | 0.344 | 0.700 | 0.691 | 0.897 | 0.753 |
| **Sangareddy** | | | | | | | | | | |
| 5% | 0.782 | 0.794 | 0.615 | 0.815 | 0.553 | 0.824 | 0.705 | 0.815 | 0.433 | 0.732 |
| 10% | 0.674 | 0.556 | 0.845 | 0.797 | 0.778 | 0.738 | 0.698 | 0.668 | 0.390 | 0.474 |
| 15% | 0.759 | 0.726 | 0.479 | 0.429 | 0.412 | 0.462 | 0.613 | 0.668 | 0.762 | 0.526 |
| 20% | 0.773 | 0.812 | 0.840 | 0.885 | 0.933 | 0.932 | 0.782 | 0.641 | 0.769 | 0.885 |
| **Case 2** | | | | | | | | | | |
| 5% | 0.993 | 0.953 | 0.987 | 0.906 | 0.994 | 0.956 | 0.945 | 0.900 | 0.873 | 0.832 |
| 10% | 0.994 | 0.974 | 0.996 | 0.950 | 0.997 | 0.962 | 0.990 | 0.959 | 0.957 | 0.953 |
| 15% | 0.842 | 0.829 | 0.424 | 0.635 | 0.402 | 0.788 | 0.161 | 0.918 | 0.494 | 0.944 |
| 20% | 0.983 | 0.915 | 0.995 | 0.985 | 0.997 | 0.985 | 0.971 | 0.974 | 0.947 | 0.944 |
| **Case 3** | | | | | | | | | | |
| 5% | 0.989 | 0.991 | 0.992 | 0.962 | 0.983 | 0.882 | 0.973 | 0.971 | 0.783 | 0.924 |
| 10% | 0.982 | 0.953 | 0.986 | 0.897 | 0.990 | 0.888 | 0.859 | 0.832 | 0.426 | 0.918 |
| 15% | 0.962 | 0.897 | 0.956 | 0.909 | 0.963 | 0.900 | 0.839 | 0.921 | 0.849 | 0.815 |
| 20% | 0.944 | 0.891 | 0.956 | 0.844 | 0.966 | 0.750 | 0.729 | 0.847 | 0.172 | 0.576 |

Correlation coefficients of yield, stability measures, SISGYS indices and their ranks before and after imputation by EM-AMMI in the original data are presented in Table 1 for all three cases. The deviation of the correlations from 1 indicates the effect of bienniality on the ranking of genotypes in all cases, for both stability measures and SISGYS indices. The effect is larger in case 1 than in cases 2 and 3. In case 3, averaging the yield over years suppresses the effect of bienniality.

The box-plots of the distribution of correlation coefficients between situation-1 and situation-2 (Fig. 2) and between situation-3 and situation-4 (Fig. 3) show high correlation with little dispersion for 5% and 10% of missing observations. It can therefore be concluded that EM-AMMI can be used safely for imputation up to 10% of missing observations. The correlation values depart from one, which indicates changes in the ranks of genotypes. It is further recommended that missing observations be imputed after eliminating bienniality, so that the effect of bienniality is properly removed. The performance of the stability measures and SISGYS indices was found to be similar, except that ranking based on yield was the least affected. For the Sangareddy centre, the distribution of correlation coefficients between situation-1 and situation-2 is shown in Fig. 5 for 5%, 10% and 15% of missing observations, whereas that between situation-3 and situation-4 is shown in Fig. 6 for 5% and 10% only. At higher rates EM-AMMI did not converge and terminated because of the large number of missing observations, so no distribution of correlation coefficients could be shown.

Fig. 7 and Fig. 8 depict the box-plot distributions of RMSPD and of the correlation coefficients for the different measures of yield, stability and indices for case 2. In case 2, EM-AMMI converged in 100% of runs for up to 7 principal components; the distribution of RMSPD is therefore plotted up to AMMI7 in Fig. 7 for all rates of missing observations. In Fig. 7, AMMI3 has the lowest RMSPD with low dispersion at all four rates of missing observations and was taken as the parsimonious model. Hence, EM-AMMI3 was used for imputing missing observations in case 2.

**Fig. 2** Box plot showing the distribution of correlations between situations 1 and 2 for different measures between data imputed by EM-AMMI WB and original data WB at the Rewa centre (1 to 10 on x-axis); 1: Yield value, 2: Yield rank, 3: ASTABi value, 4: ASTABi rank, 5: Shukla's stability value, 6: Shukla's stability rank, 7: Index 1 value, 8: Index 1 rank, 9: Index 2 value, 10: Index 2 rank

**Fig. 3** Box plot showing the distribution of correlations between situations 3 and 4 for different measures between data imputed by EM-AMMI WOB and original data WOB at the Rewa centre (1 to 10 on x-axis); 1: Yield value, 2: Yield rank, 3: ASTABi value, 4: ASTABi rank, 5: Shukla's stability value, 6: Shukla's stability rank, 7: Index 1 value, 8: Index 1 rank, 9: Index 2 value, 10: Index 2 rank

**Fig. 5** Box plot showing the distribution of correlations between situations 1 and 2 for different measures between data imputed by EM-AMMI WB and original data WB at the Sangareddy centre (1 to 10 on x-axis); 1: Yield value, 2: Yield rank, 3: ASTABi value, 4: ASTABi rank, 5: Shukla's stability value, 6: Shukla's stability rank, 7: Index 1 value, 8: Index 1 rank, 9: Index 2 value, 10: Index 2 rank

**Fig. 6** Box plot showing the distribution of correlations between situations 3 and 4 for different measures between data imputed by EM-AMMI WOB and original data WOB at the Sangareddy centre (1 to 10 on x-axis); 1: Yield value, 2: Yield rank, 3: ASTABi value, 4: ASTABi rank, 5: Shukla's stability value, 6: Shukla's stability rank, 7: Index 1 value, 8: Index 1 rank, 9: Index 2 value, 10: Index 2 rank

**Fig. 7** Box plot showing the distribution of RMSPD for case 2; a, b, c and d represent 5%, 10%, 15% and 20% of missing observations, respectively.

**Fig. 8** Box plot showing the distribution of correlations between situations 1 and 2 for different measures between data imputed by EM-AMMI WB and original data WB for case 2 (1 to 10 on x-axis); 1: Yield value, 2: Yield rank, 3: ASTABi value, 4: ASTABi rank, 5: Shukla's stability value, 6: Shukla's stability rank, 7: Index 1 value, 8: Index 1 rank, 9: Index 2 value, 10: Index 2 rank

**Conclusion**

EM-AMMI0 was found to be the parsimonious imputation model on the basis of RMSPD in the simulation study for both the Rewa and Sangareddy centres. The comparisons of imputed data WB *vs* original data WB and of imputed data WOB *vs* original data WOB show that EM-AMMI0 can safely be used for imputation up to 10% of missing observations at both centres. The presence of bienniality had a significant effect on ranking between imputation WB and imputation WOB; it is therefore recommended that imputation be performed after eliminating bienniality. The other comparisons were not suitable for drawing conclusions. For case 2, EM-AMMI3 was found to be the parsimonious model for imputation. Very low to moderate changes in ranking were observed for almost all the indices, which indicates that EM-AMMI3 imputes well in case 2 up to 10% of missing observations. The RMSPD-based results obtained from the real data set were not the same. EM-AMMI was found to be more robust than FITCON against increasing rates of missing observations.

**Acknowledgement**

We are highly obliged to the ICAR-Indian Agricultural Statistics Research Institute, New Delhi, for providing facilities for carrying out this research work. We are also thankful to RPCAU, Pusa, Bihar, for providing financial support and to the All India Coordinated Research Project on Sub-Tropical Fruits (AICRP-STF), CISH, Lucknow, for providing the data.

**References**

1. Alarcon, S.A., Pena, M.G., Dias, C.T.S. and Krzanowski, W. (2010). An alternative methodology for imputing missing data in trials with genotype-by-environment interaction. *Biometrical Letters*, **47**(1), 1-14.
2. Alarcon, S.A., Garcia-Peña, M., Krzanowski, W. and Dias, C.T.S. (2014). Imputing missing values in multi-environment trials using the singular value decomposition: An empirical comparison. *Communications in Biometry and Crop Science*, **9**(2), 54-70.
3. Alarcon, S.A., Garcia-Pena, M., Krzanowski, W. and Dias, C.T.S. (2013). Deterministic imputation in multienvironment trials. *Hindawi Publishing Corporation, ISRN Agronomy*, Article ID 978780, 17. http://dx.doi.org/10.1155/2013/978780
4. Bajpai, P. K. and Prabhakaran, V. T. (2000). A new procedure of simultaneous selection for high yielding and stable crop genotypes. *Indian Journal of Genetics*, **60**(2), 141-146.
5. Barrit, B.H., Konishi, B. and Dilley, M. (1997). Tree size, yield and biennial bearing relationships with 40 apple rootstocks and three scion cultivars. *Acta Horticulturae*, **451**, 105-112.
6. Choudhary, S. K. (2006). Statistical investigation on simultaneous selection of genotypes for yield and stability under incomplete data of groundnut. M.Sc. thesis, Indian Agricultural Research Institute, New Delhi, 37-42.
7. Choudhary, R.K., Rao, A.R., Wahi, S.D. and Misra, A.K. (2015). Detection of biennial rhythm and estimation of repeatability in mango (*Mangifera indica* L.). *Indian Journal of Genetics*, **76**(1), In press.
8. Dias, C. and Krzanowski, W. J. (2003). Model selection and cross validation in additive main effect and multiplicative interaction models. *Crop Science*, **43**, 865-873.
9. Digby, P. G. N. (1979). Modified joint regression analysis for incomplete variety × environment data. *Journal of Agricultural Sciences*, **93**, 81-86.
10. Finlay, K. W. and Wilkinson, G. N. (1963). The analysis of adaptation in plant breeding program. *Aust. Jour. Agri. Res.*, **14**, 742-54.
11. Gauch, H.G. and Zobel, R.W. (1990). Imputing missing yield trial data. *Theoretical and Applied Genetics*, **79**, 753-761.
12. Laxmi, R. R. (1992). Genotype-Environment Interaction: Its role in stability of crop varieties. Unpublished Ph.D. thesis, P.G. School, IARI, New Delhi.
13. Paderewski, J. (2013). An R function for imputation of missing cells in two-way data sets by EM-AMMI algorithm. *Communications in Biometry and Crop Science*, **8**(2), 60-69.
14. Patterson, H.D. (1978). Routine least squares estimation of variety means in incomplete tables. *Jour. Nat. Inst. Ag. Bot.*, **14**, 401-412.
15. Piepho, H. P. (1994). Best linear unbiased prediction (BLUP) for regional yield trials: a comparison to additive main effects and multiplicative interaction (AMMI) analysis. *Theor. Appl. Genet.*, **89**, 647-54.
16. Raju, B.M.K. and Bhatia, V.K. (2003). Bias in the estimates of sensitivity from incomplete G X E tables. *Jour. Ind. Soc. Agril. Statist.*, **56**(2), 177-189.
17. Raju, B.M.K., Bhatia, V.K. and Bhar, L.M. (2009). Assessing stability of crop varieties with incomplete data. *Jour. Ind. Soc. Agril. Statist.*, **63**(2), 139-150.
18. Raju, B.M.K., Bhatia, V.K. and Kumar, V.V.S. (2006). Assessment of sensitivity with incomplete data. *Jour. Ind. Soc. Agril. Statist.*, **60**(2), 118-125.
19. Rao, A.R., Choudhary, S.K., Wahi, S.D. and Prabhakaran, V.T. (2010). An index for simultaneous selection of genotypes for high yield and stability under incomplete genotype x environment data. *Ind. Jour. Genetics*, **70**(1), 80-84.
20. Rodrigues, P.C., Pereira, D.G.S. and Mexia, J.T. (2011). A comparison between Joint Regression Analysis and the Additive Main and Multiplicative Interaction model: the robustness with increasing amounts of missing data. *Sci. Agric.*, **68**(6), 679-686.
21. R Development Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.