# Simulation study for assessment of FITCON in the presence of bienniality in MLT data of mango (Mangifera indica)


Ram Kumar Choudhary^{1}, Atmakuri Ramakrishna Rao^{2}, Shiv Kumar Choudhary^{3}, Shanti Bhushan^{3}, Ashutosh Prasad Mauryar^{4}, Chandra Shekhar Choudhary^{5}

^{1}Dr. Rajendra Prasad Central Agricultural University, Pusa, Samastipur, Bihar-848 125, India

^{2}ICAR-Indian Agricultural Statistics Research Institute, Library Avenue, New Delhi – 110 012, India

^{3}Bihar Agricultural University, Sabour, Bhagalpur, Bihar-843 210, India

^{4}National Information Centre (NIC), CGO Complex, Lodi Road, New Delhi -110008, India

^{5}Dr. Rajendra Prasad Central Agricultural University, Pusa, Samastipur, Bihar-848 125, India

Corresponding Author Email: shiv_iasri@rediffmail.com

**DOI:** http://dx.doi.org/10.53709/CHE.2019.v0s1is2.0010

##### Abstract

Mango is an important perennial fruit crop of India which exhibits bienniality in fruiting. In the analysis of Multi-Location Trials (MLTs) data, it is noticed that a few entries in the genotype-environment table are missing. Such missing values lead to incomplete data in genotype × environment analysis. Because of this incompleteness, MLT data become unbalanced, which makes the analysis of Genotype × Environment Interaction (GEI) challenging. The challenge is intensified when the crop exhibits bienniality, as mango does. Fitting constant (FITCON) is used for imputing missing observations in incomplete MLT data. In the present study, the performance of the FITCON method has been assessed for different rates of missing observations in the presence of bienniality, on the basis of the ranking of genotypes by different stability parameters as well as simultaneous selection indices in mango. MLT data on mango fruit yield were collected from the All India Coordinated Research Project on Sub-Tropical Fruits, CISH, Lucknow, for 16 genotypes for fourteen years, i.e. from 1990 to 2005. Box plots of the distribution of correlation coefficients for yield, different measures of stability and indices for simultaneous selection for yield and stability (SISGYS) have been used to assess the performance of FITCON on simulated data under four different empirical situations. It has been found that FITCON terminated after a few iterations when imputation was done for more than 10% missing observations after eliminating the bienniality. Thus, FITCON is recommended for imputation of up to 10% missing observations in MLT data of mango.

##### Keywords

**Introduction**

Mango (*Mangifera indica* L.) is one of the important perennial fruit crops grown in many tropical and subtropical countries [1-4]. Mango is also called the 'King of fruits' in India owing to its sweetness, richness of taste, huge variability, large production volume and variety of uses such as pickles, juice and chutney. India is the largest producer of mango in the world, with an annual production of 21.82 million tons from an area of 2.25 million hectares (National Horticulture Board, 2017-18), contributing about 56% of the total world production. A large number of mango varieties exist in India. Mango exhibits a biennial rhythm in fruiting [5-7]. Because of this behaviour, growers face economic loss during the 'off' year, with poor or no yield, and have to sell off the heavy yield at a low price during the 'on' year owing to oversupply in the market.

When a combined analysis of Multi-Location Trials (MLTs) data is done, it is noticed that quite a few entries in the genotype-environment table are missing. Such missing values lead to incomplete data in genotype × environment analysis. Incomplete data arise primarily because a few genotypes are not tested in all the environments owing to constraints such as insufficient seed, failure of planting, non-germination and pest and disease attack. Also, in some MLTs, entries under test may be discontinued and new entries added while the performance of genotypes is being tested. Because of this incompleteness, MLT data become unbalanced, which makes the analysis of GEI challenging. The problem of analysing unbalanced MLT data is intensified when the crop exhibits bienniality. Fitting constant (FITCON) analysis and the 'modified regression' procedure, given by [8] and [9] respectively, are the two commonly used procedures for imputing missing observations in incomplete MLT data. [10-14] applied the above method of modified regression analysis in the form of "augmented FITCON" for estimating the yield sensitivity of wheat varieties and for recommended list trials involving incomplete variety-centre data. The performance of FITCON needs to be assessed for different rates of missing observations in the presence of bienniality. Therefore, in this paper the performance of FITCON has been assessed for incomplete MLT data in the presence of bienniality. The assessment has been carried out on the basis of the distribution of correlation coefficients between the rankings of genotypes, obtained by different stability parameters as well as SISGYS indices, in complete simulated data and in imputed simulated data of mango.

**Material and Methods**

If the yields of some of the genotypes are not available, the orthogonality of the original design disappears and bias is introduced into the varietal means. Comparisons based on the biased means favour the varieties that happen to be exposed to better than average environmental conditions. To avoid this, compensation has to be made in the varietal means for the environments in which particular varieties were not present. This is done by the least squares procedure described below:

$$Y_{ij} = \mu_i + e_j + \varepsilon_{ij} \qquad (1)$$

and by regressing the existing $Y_{ij}$'s on the estimated $e_j$'s, the estimates of $b_i$ ($i = 1, 2, 3, \ldots, t$), the linear sensitivities of the individual genotypes, were obtained. In model (1), $\mu_i$ is the mean of the $i^{th}$ variety, $e_j$ the effect of the $j^{th}$ environment ($j = 1, 2, \ldots, s$) and $\varepsilon_{ij}$ the random error, distributed with mean zero and a constant variance. For estimating the parameters $\mu_i$ and $e_j$, the residual sum of squares

$$\sum_{i=1}^{t}\sum_{j=1}^{s} \delta_{ij}\,(Y_{ij} - \mu_i - e_j)^2$$

was minimized with respect to $\mu_i$ and $e_j$, where the weight $\delta_{ij}$, introduced to represent the incomplete data set-up, is such that

$$\delta_{ij} = \begin{cases} 1 & \text{if } Y_{ij} \text{ is present in the data} \\ 0 & \text{if } Y_{ij} \text{ is missing.} \end{cases}$$

Differentiating the residual sum of squares with respect to $\mu_i$ and $e_j$, the normal equations become

$$\sum_{j=1}^{s} \delta_{ij}\,(Y_{ij} - \hat\mu_i - \hat e_j) = 0, \quad i = 1, \ldots, t \qquad (2)$$

$$\sum_{i=1}^{t} \delta_{ij}\,(Y_{ij} - \hat\mu_i - \hat e_j) = 0, \quad j = 1, \ldots, s \qquad (3)$$

These normal equations were solved subject to the constraint

$$\sum_{j=1}^{s} \hat e_j = 0 \qquad (4)$$

From equations (2), (3) and (4), the estimates of the $\mu_i$'s and $e_j$'s can be obtained as

$$\hat\mu_i = \bar{Y}_{i.} + \frac{1}{n_i}\sum_{j^*} \hat e_{j^*} \qquad (5)$$

$$\hat e_j = \bar{Y}_{.j} - \frac{1}{n_j}\sum_{i=1}^{t} \delta_{ij}\,\hat\mu_i \qquad (6)$$

where $\delta_{ij} = 1$ if $Y_{ij}$ is present and 0 if absent, and $\bar{Y}_{i.}$ and $\bar{Y}_{.j}$ are the means based on the existing $n_i = \sum_j \delta_{ij}$ and $n_j = \sum_i \delta_{ij}$ observations for the $i^{th}$ variety and $j^{th}$ environment respectively. Here the $j^*$'s denote the environments where the $i^{th}$ variety is absent; Eq. (5) follows from (2) because, under (4), the sum of $\hat e_j$ over the environments where variety $i$ is present equals $-\sum_{j^*}\hat e_{j^*}$. The adjustments in these estimators depend on each other's final estimates, so they are obtained iteratively. The iteration procedure starts with trial values of $\hat\mu_i$ in Eq. (6), giving rise to a set of $\hat e_j$ values. Substituting these values in (5), revised estimates of $\hat\mu_i$ are obtained. These are then substituted in (6) to get revised estimates of $\hat e_j$. This procedure is continued until the values of $\hat e_j$ converge; the iterative process usually converges in a small number of iterations. Using the final set of $\hat e_j$ values, the estimate of $\mu_i$ is obtained from

$$\hat\mu_i = \bar{Y}_{i.} + \frac{1}{n_i}\sum_{j^*} \hat e_{j^*} \qquad (7)$$
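As an illustration of Eqs. (5) to (7), the iteration can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation (their R code is given later in this section). It assumes that missing cells are coded as `None` and that every genotype and every environment has at least one observation; the function name `fitcon` and the tolerance `tol` on the change in $\hat e_j$ are illustrative choices. Missing cells are filled with the fitted constant $\hat\mu_i + \hat e_j$.

```python
from typing import List, Optional, Tuple

Table = List[List[Optional[float]]]  # rows = genotypes (i), columns = environments (j)


def fitcon(y: Table, tol: float = 1e-9, max_iter: int = 100) -> Tuple[List[float], List[float], Table]:
    """Fitting-constants estimates of mu_i and e_j, plus the table with missing cells filled."""
    t, s = len(y), len(y[0])
    present = [[v is not None for v in row] for row in y]  # delta_ij
    n_i = [sum(present[i]) for i in range(t)]
    n_j = [sum(present[i][j] for i in range(t)) for j in range(s)]
    ybar_i = [sum(v for v in y[i] if v is not None) / n_i[i] for i in range(t)]
    ybar_j = [sum(y[i][j] for i in range(t) if present[i][j]) / n_j[j] for j in range(s)]

    mu = ybar_i[:]  # trial values of mu_i
    e = [0.0] * s
    for _ in range(max_iter):
        # Eq. (6): e_j = Ybar_.j - (1/n_j) * sum of mu_i over genotypes present in environment j
        new_e = [ybar_j[j] - sum(mu[i] for i in range(t) if present[i][j]) / n_j[j]
                 for j in range(s)]
        # Eq. (5): mu_i = Ybar_i. + (1/n_i) * sum of e_j* over environments j* where genotype i is absent
        mu = [ybar_i[i] + sum(new_e[j] for j in range(s) if not present[i][j]) / n_i[i]
              for i in range(t)]
        change = max(abs(a - b) for a, b in zip(new_e, e))
        e = new_e
        if change < tol:
            break
    # mu was last updated from the final e_j, i.e. Eq. (7); fill each missing cell with mu_i + e_j
    filled = [[y[i][j] if present[i][j] else mu[i] + e[j] for j in range(s)] for i in range(t)]
    return mu, e, filled


# Hypothetical additive table y_ij = a_i + b_j, a = (10, 12, 14, 16), b = (-1, 0, 1); cell (0, 2) missing.
table = [[9.0, 10.0, None],
         [11.0, 12.0, 13.0],
         [13.0, 14.0, 15.0],
         [15.0, 16.0, 17.0]]
mu, e, filled = fitcon(table)
print(mu, e, filled[0][2])  # [10.0, 12.0, 14.0, 16.0] [-1.0, 0.0, 1.0] 11.0
```

For this exactly additive example the generating constants are recovered and the missing cell is filled with 11. The sketch tests convergence on $\hat e_j$ only, as the text describes; the authors' R code also checks the change in the genotype estimates.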

Initially, a complete data set with bienniality (WB) was taken (kept as reference data) and the genotypes were ranked on the basis of yield performance, stability measures and SISGYS indices; this is named situation-1. Missing observations were then created at random and imputed by the FITCON method, and the genotypes were again ranked for selection purposes; this is considered situation-2. In situation-3, bienniality was removed (WOB) from the complete data by the method given by [15] and the genotypes were ranked. The incomplete data of situation-2 were also subjected to removal of bienniality, which gives missing data WOB; FITCON was applied to these data to impute the missing observations (situation-4). Subsequently, the genotypes were evaluated on ten quantities: 1. yield value, 2. yield rank, 3. ASTABi value [16], 4. ASTABi rank, 5. Shukla's stability value [17], 6. Shukla's stability rank, 7. Index 1 value [18], 8. Index 1 rank, 9. Index 2 value [19] and 10. Index 2 rank. The rank correlations between the genotype rankings under situation-1 and situation-2, and under situation-3 and situation-4, have been worked out on simulated data. In all, 1000 data sets were generated using code developed in SQL. R code was written for imputing the 1000 simulated data sets using FITCON, and SAS code was written for plotting box plots of the correlation coefficients between the rankings of genotypes obtained by yield, different stability measures and SISGYS indices under the different situations.
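The comparison between situations rests on correlating two rankings of the same genotypes. A minimal Python sketch, assuming a Spearman-type coefficient, i.e. the Pearson correlation of the ranks with tied values given their average rank. The SAS code later in this section applies `corr` to matrices of ranks, which is the same computation when there are no ties. The helper names and the genotype means in the example are hypothetical.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 (smallest) upwards; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of the ranks pos+1 .. end+1
        for k in order[pos:end + 1]:
            r[k] = avg
        pos = end + 1
    return r


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rank_correlation(first: Sequence[float], second: Sequence[float]) -> float:
    """Spearman-type rank correlation between two evaluations of the same genotypes."""
    return pearson(ranks(first), ranks(second))


complete_means = [10.2, 12.5, 9.8, 14.1]  # hypothetical genotype means, situation-1
imputed_means = [10.0, 12.9, 10.1, 13.7]  # hypothetical genotype means, situation-2
rho = rank_correlation(complete_means, imputed_means)
print(rho)  # 0.8
```

Repeating this for each of the 1000 simulated data sets gives the distribution of correlations that the box plots summarise.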

It is worth mentioning that the performance of the measures, as well as the effect of bienniality, was studied under different levels (percentages) of missing observations in the data. Hence, the performance of FITCON has been assessed under four rates of missing observations (5%, 10%, 15% and 20%), simulated through random deletions from the complete data set in the presence of bienniality.
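A minimal Python sketch of the random deletion step, assuming that a rate $p$ deletes round($p \times 128$) distinct cells of a 16 × 8 genotype × environment table. The rounding rule is an assumption; the R code later in this section deletes 13 of 128 cells, which matches 10%. The function name `delete_at_random`, the toy table and the seed are illustrative.

```python
import random
from typing import List, Optional, Tuple

Table = List[List[Optional[float]]]


def delete_at_random(table: Table, rate: float, rng: random.Random) -> Tuple[Table, int]:
    """Return a copy of `table` with round(rate * cells) randomly chosen cells set to None."""
    t, s = len(table), len(table[0])
    n_missing = round(rate * t * s)
    cells = rng.sample(range(t * s), n_missing)  # distinct cell indices, row-major
    out = [row[:] for row in table]               # leave the complete table untouched
    for c in cells:
        out[c // s][c % s] = None
    return out, n_missing


complete = [[float(10 * i + j) for j in range(8)] for i in range(16)]  # 16 genotypes x 8 environments
rng = random.Random(2019)
for rate in (0.05, 0.10, 0.15, 0.20):
    incomplete, k = delete_at_random(complete, rate, rng)
    print(f"{rate:.0%}: {k} of 128 cells deleted")
```

Under this rounding rule the four rates delete 6, 13, 19 and 26 cells respectively.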

In the process of eliminating bienniality by taking moving averages of two consecutive years (environments), any moving average that involves a missing observation is treated as missing. As a result, the number of missing observations becomes approximately double that in the incomplete MLT data. The exceptions are a missing first or last observation in a row of the two-way table, which enters only one moving average, and consecutive missing observations in a row, which share a moving average.
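The doubling effect can be checked with a short Python sketch of the two-year moving average, assuming missing values are coded as `None` and that any average involving a missing year is itself missing. The helper names and example rows are hypothetical.

```python
from typing import List, Optional

Row = List[Optional[float]]


def two_year_moving_average(row: Row) -> Row:
    """Average of years j and j+1 for each consecutive pair; None if either year is missing."""
    return [None if a is None or b is None else (a + b) / 2 for a, b in zip(row, row[1:])]


def count_missing(row: Row) -> int:
    return sum(v is None for v in row)


examples = {
    "interior gap": [10.0, None, 12.0, 8.0],
    "first year missing": [None, 9.0, 12.0, 8.0],
    "two consecutive gaps": [10.0, None, None, 8.0, 6.0],
}
for name, row in examples.items():
    ma = two_year_moving_average(row)
    print(f"{name}: {count_missing(row)} missing -> {count_missing(ma)} missing in {ma}")
```

An interior gap yields two missing averages, a gap in the first (or last) year only one, and two adjacent gaps three rather than four, which is why the count is only approximately doubled.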

**SQL code for simulation**

**Stored Procedure for creation of random values**

```sql
CREATE PROCEDURE [dbo].[rkc_ijkl_random]
AS
declare @loop_i int
declare @loop_j int
declare @loop_k int
declare @loop_l int
declare @val_ijk numeric(8,2)
declare @val numeric(8,2)
declare @rdm1 numeric(8,2)
declare @count1 int
declare @loop int
declare @final1 numeric(8,2)

SET @loop_i = 1
SET @loop_j = 1
SET @loop_k = 1
set @count1 = 1
set @loop = 1

-- outer loop over the 1000 simulated data sets
WHILE @loop <= 1000
BEGIN
set @count1 = 1
-- inner loop: @count1 = 1 .. 2048
while @count1 <= 2048
begin
-- standard normal deviate via the Box-Muller transform, rounded to 2 decimals
set @rdm1 = CAST(SQRT(-2*LOG(RAND())) * COS(2*PI()*RAND(CHECKSUM(NEWID()))) as decimal(5,2))
-- ……………..
-- …………………..
-- …………………
end
```
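The `@rdm1` expression in the procedure above is the Box-Muller transform, which turns two independent Uniform(0, 1) values into one standard normal deviate. A Python equivalent is sketched below for illustration only; it is not part of the authors' workflow. Using `1 - random()` keeps the argument of the logarithm away from zero, and rounding to two decimals mirrors the `decimal(5,2)` cast. Python's built-in `random.gauss` would serve equally well.

```python
import math
import random


def box_muller(rng: random.Random) -> float:
    """One standard normal deviate: sqrt(-2 ln U1) * cos(2 pi U2), with U1, U2 ~ Uniform(0, 1)."""
    u1 = 1.0 - rng.random()  # lies in (0, 1], so log(u1) is finite
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


rng = random.Random(1)
draws = [round(box_muller(rng), 2) for _ in range(2048)]  # 2 decimals, as with CAST(... AS decimal(5,2))
```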

**Stored Procedure for generation of values with the help of random values**

```sql
CREATE PROCEDURE [dbo].[rkc_ijkl_random_final_val]
AS
declare @loop_i int
declare @loop_j int
declare @loop_k int
declare @loop_l int
declare @val_ijk numeric(8,2)
declare @val numeric(8,2)
declare @rdm1 numeric(8,2)
declare @count1 int
declare @set1 int
declare @final1 numeric(8,2)
declare @temp1 numeric(8,2)
declare @temp2 numeric(8,2)

set @set1 = 1
-- one pass per simulated data set
while @set1 <= 1000
-- ………..
-- ………..
-- ………..
-- sum of the simulated model components stored for this data set and cell
set @temp2 = (select (mu+gi+ej+yk_j+geij+gyikj+rikj+eijkl) from rkc_data_final222 where set1=@set1 and count1=@count1)
set @val = @temp1 + @temp2
insert into final_data2304_1000(ii,jj,kk,ll,final,count1,set1) values(@loop_i,@loop_j,@loop_k,@loop_l,@val,@count1,@set1)
-- ………….
-- ………..
-- ……….
set @loop_i = @loop_i + 1
end
set @set1 = @set1 + 1
end
```
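The second procedure builds each simulated value as a sum of stored components (`mu+gi+ej+...`). Those components and their variances are not given here, so the Python sketch below uses a deliberately simplified, hypothetical model: an overall mean, random genotype and year effects, an alternating 'on'/'off' year term to mimic bienniality, and a residual, with made-up default standard deviations. It only shows how such a table can be composed; it does not reproduce the authors' simulation.

```python
import random
from typing import List


def simulate_table(n_geno: int = 16, n_years: int = 8, mu: float = 100.0,
                   sd_g: float = 10.0, sd_y: float = 5.0, sd_res: float = 3.0,
                   biennial: float = 8.0, seed: int = 0) -> List[List[float]]:
    """Hypothetical model: mu + g_i + y_j + (+/-)biennial + residual_ij, rounded to 2 decimals."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, sd_g) for _ in range(n_geno)]   # genotype effects
    y = [rng.gauss(0.0, sd_y) for _ in range(n_years)]  # year (environment) effects
    table = []
    for i in range(n_geno):
        row = []
        for j in range(n_years):
            on_off = biennial if j % 2 == 0 else -biennial  # alternating 'on'/'off' years
            row.append(round(mu + g[i] + y[j] + on_off + rng.gauss(0.0, sd_res), 2))
        table.append(row)
    return table


sim = simulate_table(seed=1)  # 16 genotypes x 8 years
```

Because the alternating term is $+b$ in one year and $-b$ in the next, a two-year moving average removes it exactly, which is the idea behind the bienniality removal described earlier.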

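**R code for imputation by FITCON**

```r
# Imputation by FITCON (iterative fitting of constants, Eqs. (5)-(7))
setwd("D:/ramfitcon")
data <- as.matrix(read.table("miss_sim128_100.txt", header = TRUE))

# Each simulated data set occupies 128 consecutive rows (16 genotypes x 8 environments)
s  <- seq(from = 128, to = 64000, by = 128)
s1 <- c(1, s[-500] + 1)  # first row of each data set
s2 <- s                  # last row of each data set

for (j in 1:500) {
  set1 <- data[s1[j]:s2[j], 1]
  z <- sample(1:128, 13)  # delete 13 of the 128 cells at random
  set1[z] <- NA
  x <- matrix(set1, nrow = 16, ncol = 8, byrow = TRUE)  # genotypes x environments

  nr <- nrow(x)
  nc <- ncol(x)
  Mx <- rowMeans(x, na.rm = TRUE)  # genotype means of the existing cells
  Ex <- colMeans(x, na.rm = TRUE)  # environment means of the existing cells

  # Eq. (5): adjustment from the environments where the genotype is missing
  NAR <- function(r) sum(E0[is.na(r)]) / sum(!is.na(r))
  # Eq. (6): mean of the current genotype estimates over genotypes present in the environment
  NAC <- function(cl) sum(M1[!is.na(cl)]) / sum(!is.na(cl))

  E0 <- Ex  # trial values of the environment estimates
  E2 <- 0
  M2 <- 0
  for (i in 1:100) {
    M1 <- Mx + apply(x, 1, NAR)
    E1 <- Ex - apply(x, 2, NAC)
    de <- sum(abs(E1 - E2))
    dm <- sum(abs(M1 - M2))
    if (de <= nc * 0.001 && dm <= nr * 0.001) {
      print(i)  # number of iterations needed to converge
      est1 <- cbind(E2, E1, M2, M1)
      break
    }
    E0 <- E1
    E2 <- E1
    M2 <- M1
  }
  E <- E1  # latest environment estimates
  M <- M1  # latest genotype estimates

  # Fill each missing cell with the fitted constant mu_i + e_j
  for (k in 1:nc) {
    miss <- which(is.na(x[, k]))
    x[miss, k] <- E[k] + M[miss]
  }
  x.imp <- x
  rownames(x.imp) <- 1:16
  colnames(x.imp) <- 1:8

  # Long format: i = genotype, j = environment, yld = observed or imputed yield
  imp <- data.frame(i = rep(1:16, each = 8), j = rep(1:8, times = 16),
                    yld = as.vector(t(x.imp)))
  out <- data.frame(set = j, obs = set1, imp)  # obs keeps NA for the deleted cells
  write.table(out, file = "impfitcon15.csv", append = (j > 1), sep = ",",
              row.names = FALSE, col.names = (j == 1))
}
```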

**SAS code for ranking and box plots of simulated correlations**

```sas
data ram1;
proc import datafile = "<path>"
    out = s32 dbms = xlsx;
run;

proc iml;
use s32;
read all into s;
aa = 1;
aa1 = 1;
m = 1000; /* number of iterations required */
/* ……………. */
/* …………….. */
/* …………….. */
y = shape(yld, loc, var);
y = y`;
y = shape(yld, var, loc);
/* …………. */
/* ……………. */
/* ………….. */
a1 = a[+,];
w = diag(x*x`);
msge = a1/((var-1)*(loc-1));
/* Shukla's stability variance for each genotype */
do i = 1 to var;
    sig2 = ((w[i,i]*var)/((loc-1)*(var-2))) - (msge/(var-2));
    sig4 = sig4//sig2;
    sig5 = 1/sig2;
    sig6 = sig6//sig5;
end;
/* ……………… */
/* …………… */
/* ………….. */
do i = 1 to var;
    e = 1/sqrt((ssq(d[i,])));
    f = f//e;
end;
e3 = (1/f);
f4 = rank(e3);
e1 = f[+,]/var;
do
/* ……………. */
/* …………… */
/* …………… */
yldrank = yldrank||rank1;
indexval = indexval||index;
rankindex = rankindex||rank;
stabval = stabval||stability;
stabrank = stabrank||rank2;
bajval = bajval||baj1;
bajrank = bajrank||rank3;
sig4val = sig4val||sig4;
sig4rank = sig4rank||rank4;
free geno;
free f3;
free index;
free stability;
free f1;
free f;
free baj1;
free sig4;
free sig6;
end;
/*—————————————————-*/
yldvalcorr = corr(yldval);
yldcorr = corr(yldrank);
corrindval = corr(indexval);
corrindex = corr(rankindex);
/* …………….. */
/* ……….. */
/* ………… */
plot xx2*xx1 /
    boxstyle = schematic
    nohlabel;
insetgroup min max nhigh nlow nout /
    header = 'Extremes rank correlation';
/* legend1 label = ('Cancellations'); */
label xx2 = 'xx2 rank correlation';
label xx1 = 'xx1 indices and ranks';
run;
```
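The `sig2` line in the SAS code is Shukla's stability variance, $\hat\sigma_i^2 = \frac{t\,W_i}{(s-1)(t-2)} - \frac{SS_{GE}}{(t-1)(t-2)(s-1)}$, where $W_i$ is the sum over environments of the squared genotype × environment interaction residuals of genotype $i$ and $SS_{GE} = \sum_i W_i$. Here `var` corresponds to the $t$ genotypes and `loc` to the $s$ locations, assuming `w[i,i]` holds $W_i$. A Python sketch for a complete (or imputed) table with $t \ge 3$; the function name and example table are hypothetical.

```python
from typing import List


def shukla_stability_variance(y: List[List[float]]) -> List[float]:
    """Shukla's (1972) stability variance sigma^2_i for each genotype of a complete t x s table."""
    t, s = len(y), len(y[0])
    row_m = [sum(row) / s for row in y]
    col_m = [sum(y[i][j] for i in range(t)) / t for j in range(s)]
    grand = sum(row_m) / t
    # W_i: sum over environments of squared GE interaction residuals for genotype i
    w = [sum((y[i][j] - row_m[i] - col_m[j] + grand) ** 2 for j in range(s)) for i in range(t)]
    ms_ge = sum(w) / ((t - 1) * (s - 1))  # corresponds to 'msge' in the SAS code
    return [t * w_i / ((s - 1) * (t - 2)) - ms_ge / (t - 2) for w_i in w]


example = [[1.0, 2.0], [2.0, 3.0], [3.0, 7.0]]  # hypothetical 3 genotypes x 2 environments
print(shukla_stability_variance(example))  # [0.0, 0.0, 4.5]
```

In the example $W = (0.5, 0.5, 2)$, and the three variances average to the interaction mean square, 1.5, which is a general property of Shukla's estimator.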


**Results and Discussion**

Fitting constant (FITCON) has been used to impute four rates of missing observations, i.e. 5%, 10%, 15% and 20%, in simulated data by running the R code. The influence of missing observations on ten quantities (1. yield value, 2. yield rank, 3. ASTABi value, 4. ASTABi rank, 5. Shukla's stability value, 6. Shukla's stability rank, 7. Index-1 value, 8. Index-1 rank, 9. Index-2 value and 10. Index-2 rank) was assessed on the basis of the correlation between situation-1 and situation-2 for the FITCON method. Box plots (Gabriel, 1971) have been drawn for all the rates of missing observations for yield, stability measures and SISGYS indices, and are depicted in Fig. 1 and Fig. 2 for the Sangareddy centre. Fig. 1 depicts, for the four rates of missing observations, the distribution of the correlation between the original simulated data with bienniality (situation-1) and the FITCON-imputed simulated data with bienniality (situation-2) at the Sangareddy centre, for quantities 1 to 10 on the X-axis. It has been found that the mean correlation over the 1000 data sets decreases slightly from about 0.9 as the rate of missing observations increases from 5% to 20%. The correlation is slightly lower for SISGYS indices than for yield and stability measures. Therefore, imputation by FITCON can safely be used for up to 20% missing observations in data with bienniality. The distributions of the correlations between the 1000 data sets without bienniality (WOB) and the corresponding FITCON-imputed data WOB for quantities 1 to 10 are depicted in Fig. 2.
It has been observed that the correlations are slightly lower than in the case with bienniality for almost all of the quantities 1 to 10 at 5% and 10% missing observations. It has also been found that the FITCON iterations terminated when imputation was done for more than 10% missing observations. This is because eliminating bienniality by taking moving averages of two consecutive observations roughly doubles the number of missing observations. As with the data with bienniality, the data without bienniality also show lower correlations for SISGYS indices than for yield and stability measures. The dispersion of the correlations is also larger for SISGYS indices than for yield and stability measures.

Finally, it can be concluded that FITCON can safely be used for imputing up to 20% missing observations in data with bienniality, but only up to 10% missing observations in data without bienniality.

**Fig. 1** Box plots showing the distribution of correlations between situation-1 and situation-2 (original data WB and data imputed by FITCON WB) for different measures at Sangareddy. X-axis: 1: Yield value, 2: Yield rank, 3: ASTABi value, 4: ASTABi rank, 5: Shukla's stability value, 6: Shukla's stability rank, 7: Index 1 value, 8: Index 1 rank, 9: Index 2 value, 10: Index 2 rank.

**Fig. 2** Box plots showing the distribution of correlations between situation-3 and situation-4 (original data WOB and data imputed by FITCON WOB) for different measures at Sangareddy. X-axis: 1: Yield value, 2: Yield rank, 3: ASTABi value, 4: ASTABi rank, 5: Shukla's stability value, 6: Shukla's stability rank, 7: Index 1 value, 8: Index 1 rank, 9: Index 2 value, 10: Index 2 rank.

**ACKNOWLEDGEMENT**

We are highly obliged to the ICAR-Indian Agricultural Statistics Research Institute, New Delhi, for providing facilities for carrying out this research work. We are also thankful to RPCAU, Pusa, Bihar, for providing financial support, and to the All India Coordinated Research Project on Sub-Tropical Fruits (AICRP-STF), CISH, Lucknow, for providing data.

**REFERENCES**

- Bajpai, P. K. and Prabhakaran, V. T. (2000). A new procedure of simultaneous selection for high yielding and stable crop genotypes. *Indian Journal of Genetics*, **60**(2), 141-146.
- Choudhary, R. K., Rao, A. R., Wahi, S. D. and Misra, A. K. (2016). Detection of biennial rhythm and estimation of repeatability in mango (*Mangifera indica* L.). *Indian J. Genet.*, **76**(1), 88-97.
- Digby, P. G. N. (1979). Modified joint regression analysis for incomplete variety × environment data. *Journal of Agricultural Sciences*, **93**, 81-86.
- Gabriel, K. R. (1971). The biplot-graphical display of matrices with applications to principal component analysis. *Biometrika*, **58**, 453-467.
- Mukherjee, S. K. and Litz, R. E. (2009). Introduction: Botany and Importance. In: Litz, R. E. (ed.) *The Mango: Botany, Production and Uses*, 2nd edn. CABI International, Wallingford, 1-18.
- Patterson, H. D. and Silvey, V. (1980). Statutory and recommended list trials of crop varieties in U.K. *Jour. Roy. Stat. Soc., A*, **143**, 291-252.
- Patterson, H. D. (1978). Routine least squares estimation of variety means in incomplete tables. *Jour. Nat. Inst. Ag. Bot.*, **14**, 401-4013.
- Rao, A. R., Choudhary, S. K., Wahi, S. D. and Prabhakaran, V. T. (2010). An index for simultaneous selection of genotypes for high yield and stability under incomplete genotype × environment data. *Indian J. Genet.*, **70**(1), 80-84.
- Rao, A. R. and Prabhakaran, V. T. (2005). Use of AMMI in simultaneous selection of genotypes for yield and stability. *Jour. Ind. Soc. Ag. Statistics*, **59**(1), 76-82.
- Rao, A. R., Prabhakaran, V. T. and Singh, A. K. (2004). Development of statistical procedures for selecting genotypes simultaneously for yield and stability. *IASRI Publication*, ICAR-IASRI, New Delhi.
- Shukla, G. K. (1972). Some statistical aspects of partitioning genotype-environmental components of variability. *Heredity*, **29**, 237-245.
- Wahi, S. D. and Malhotra, P. K. (1993). Estimation of repeatability of fruit yield in presence of biennial rhythm. *IASRI Publication*, New Delhi.
- Kempthorne, O. (1978). A biometrics invited paper: Logical, epistemological and statistical aspects of nature-nurture data interpretation. *Biometrics*, 1-23.
- Simon, R. and Altman, D. G. (1994). Statistical aspects of prognostic factor studies in oncology. *British Journal of Cancer*, **69**(6), 979.
- Veech, J. A. and Crist, T. O. (2010). Toward a unified view of diversity partitioning. *Ecology*, **91**(7), 1988-1992.
- Økland, R. H. and Eilertsen, O. (1994). Canonical correspondence analysis with variation partitioning: some comments and an application. *Journal of Vegetation Science*, **5**(1), 117-126.
- Strobl, C., Malley, J. and Tutz, G. (2009). An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. *Psychological Methods*, **14**(4), 323.
- Ferreira, F. C. (2008). Comments about some species abundance patterns: classic, neutral, and niche partitioning models. *Brazilian Journal of Biology*, **68**, 1003-1012.