Robust Depth-Based Weighted Estimator with Application in Discriminant Analysis

Received: 10/Mar/2018, Revised: 22/Mar/2018, Accepted: 13/Apr/2018, Online: 30/Jun/2018

Abstract— The concept of data depth is used to measure the deepness of a given point in an entire multivariate data cloud. It leads to a center-outward ordering of sample points rather than the usual smallest-to-largest ranking: the ordering starts from the middle and moves outward in all directions. Multivariate location and scatter can be computed using the depth value of each data point. Various depth procedures have been established by many authors. In this paper, a new depth procedure, the Modified Mahalanobis Depth (MMD), is proposed. It calculates depth from a robust distance based on the Minimum Covariance Determinant (MCD) approach, and a weight function is established to determine location and scale. The performance of the proposed depth-based procedure relative to existing depth procedures is studied in a simulated environment using R software, with application to discriminant analysis. The proposed depth procedure performs well compared with the existing procedures, even at higher contamination levels and larger sample sizes.


I. INTRODUCTION
Location and scatter play a very important part in multivariate statistical methods. Data depth is one of the main emerging concepts for determining such measures. Data depth measures the deepness of a given point in the whole data cloud. This concept is essential because it leads to a center-outward ordering rather than the usual smallest-to-largest ranking; that is, the ordering starts from the center and moves outward in all directions. The depth value of every data point can be calculated using various established depth procedures. The data point with the highest depth value is considered the deepest point, and a point with the lowest depth value is considered an outlier. The data point whose depth value is maximal, approaching 1, is considered the best location. Various depth procedures have been developed in the literature [1]-[16]. Comprehensive surveys on data depth are given in [17]-[19].
In this paper, a new depth procedure and a new weight function associated with the study are proposed to estimate location and dispersion. The new procedure uses robust distance with the Minimum Covariance Determinant (MCD) approach instead of Mahalanobis distance.
In Section II, various existing depth procedures are defined. The proposed depth procedure (MMD) is explained in Section III. The new weight function to estimate location and scale is described in Section IV. In Section V, the performance of the proposed depth procedure is compared with that of the existing procedures in a simulated environment, with application to discriminant analysis, in terms of the apparent error rate (AER). The conclusion is presented in the last section.

II. MATERIALS AND METHODS
The existing depth procedures, based on distances, projection pursuit, halfspaces, weighted mean and neighbourhoods are summarized in this section.

A. Mahalanobis Depth
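Given the sample mean vector $\bar{x}$ and the sample covariance matrix $S$ of the data cloud $X$, the Mahalanobis distance of each point can be estimated, and from this distance the Mahalanobis depth (MD) can be computed by

$$MD(x; X) = \left[1 + (x - \bar{x})^{T} S^{-1} (x - \bar{x})\right]^{-1}.$$

The deepest point in the entire data has the largest depth value.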
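As an illustration, the following is a minimal sketch of MD in plain Python, assuming the usual form in which the depth is the reciprocal of one plus the squared Mahalanobis distance from the sample mean. The function names and the five-point toy cloud are hypothetical and not taken from the paper, and the covariance matrix is assumed to be non-singular.

```python
from statistics import fmean


def mean_vector(data):
    """Column-wise sample mean of a list of equal-length points."""
    return [fmean(col) for col in zip(*data)]


def covariance_matrix(data, center):
    """Sample covariance matrix (divisor n - 1) about the given center."""
    n, p = len(data), len(center)
    cov = [[0.0] * p for _ in range(p)]
    for row in data:
        dev = [v - c for v, c in zip(row, center)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += dev[i] * dev[j] / (n - 1)
    return cov


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting.

    Assumes that the matrix a is non-singular.
    """
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[pivot] = m[pivot], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    z = [0.0] * n
    for k in reversed(range(n)):
        s = sum(m[k][c] * z[c] for c in range(k + 1, n))
        z[k] = (m[k][n] - s) / m[k][k]
    return z


def mahalanobis_depth(x, data):
    """MD(x) = 1 / (1 + squared Mahalanobis distance of x from the mean)."""
    center = mean_vector(data)
    cov = covariance_matrix(data, center)
    dev = [v - c for v, c in zip(x, center)]
    d2 = sum(d * z for d, z in zip(dev, solve(cov, dev)))
    return 1.0 / (1.0 + d2)


# Toy cloud (hypothetical): four corners of a square and its center.
cloud = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
depth_center = mahalanobis_depth((1.0, 1.0), cloud)  # deepest point
depth_corner = mahalanobis_depth((2.0, 2.0), cloud)  # shallower point
```

In this toy cloud the center coincides with the sample mean and therefore attains the maximal depth of 1, while each corner receives a smaller value.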

B. Halfspace Depth
The idea of halfspace depth was originally developed by [1].
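In its standard form, the halfspace depth of a point x is the smallest probability mass carried by any closed halfspace that contains x. The sketch below approximates the sample version for bivariate data by scanning a grid of directions. The function name, the grid size and the toy cloud are assumptions made for illustration, and exact algorithms exist that are not shown here.

```python
import math


def halfspace_depth_2d(x, data, n_directions=360):
    """Approximate halfspace depth of x in a bivariate sample.

    For each direction u on an evenly spaced grid of angles, count the
    fraction of points in the closed halfplane {y : u.(y - x) >= 0};
    the depth is the smallest such fraction over the grid.
    """
    n = len(data)
    depth = 1.0
    for k in range(n_directions):
        angle = 2.0 * math.pi * k / n_directions
        u = (math.cos(angle), math.sin(angle))
        count = sum(
            1
            for y in data
            # small tolerance so boundary points are counted despite rounding
            if u[0] * (y[0] - x[0]) + u[1] * (y[1] - x[1]) >= -1e-12
        )
        depth = min(depth, count / n)
    return depth


# Toy cloud (hypothetical): four corners of a square and its center.
cloud = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
hd_center = halfspace_depth_2d((1.0, 1.0), cloud)
hd_corner = halfspace_depth_2d((2.0, 2.0), cloud)
```

Because only a finite grid of directions is examined, the value returned can overestimate the true depth when the minimising direction falls between grid points.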

C. Zonoid Depth
... for all ..., then the depth of x equals zero; if the depth of x equals 1, then x is the expectation.

D. Spatial Depth
The idea of spatial quantiles was introduced by [21], and spatial depth (SD) was formulated by [7] and extended by [22]. Spatial quantiles are formed by generalizing the $L_1$ norm from the univariate case. Spatial depth is also known as $L_1$-depth.
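For a distribution function $F$, with spatial quantile function $Q_F$ and $\|Q_F^{-1}(x)\|$ interpreted as the outlyingness of $x$, SD is given by

$$SD(x; F) = 1 - \left\|Q_F^{-1}(x)\right\| = 1 - \left\|E_F\!\left[\frac{x - X}{\|x - X\|}\right]\right\|.$$

The point corresponding to the maximal depth is considered the spatial median, and the point with the lowest depth value is considered an outlier.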
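A sample version of SD can be sketched as follows, assuming the common form in which the depth is one minus the norm of the average unit vector pointing from the data points to x. The function name and the toy cloud are hypothetical. Data points that coincide with x are taken to contribute the zero vector, which is one convention among several.

```python
import math


def spatial_depth(x, data):
    """Sample spatial depth: 1 - norm of the mean unit vector from data to x."""
    p = len(x)
    total = [0.0] * p
    for y in data:
        diff = [a - b for a, b in zip(x, y)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0.0:  # a point coinciding with x contributes the zero vector
            for j in range(p):
                total[j] += diff[j] / norm
    mean_norm = math.sqrt(sum((t / len(data)) ** 2 for t in total))
    return 1.0 - mean_norm


# Toy cloud (hypothetical): four corners of a square and its center.
cloud = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
sd_center = spatial_depth((1.0, 1.0), cloud)  # unit vectors cancel out
sd_corner = spatial_depth((2.0, 2.0), cloud)  # unit vectors mostly align
```

At the center the unit vectors cancel, so the depth is maximal. At a corner they point in similar directions, so the norm of their mean is large and the depth is small.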

E. Projection Depth
Projection depth (PD) was proposed by [6]. Projection depth and halfspace depth are closely related, and both reflect the projection pursuit methodology. The procedure involves a supremum over infinitely many direction vectors; hence the exact computation of PD appears intractable.
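Let x be any point in the data cloud and let u be any p-dimensional vector of unit norm. M denotes the median and MAD the Median Absolute Deviation, both applied to the projections $u^{T}X$ of the data cloud X. Then PD is defined as

$$PD(x; X) = \left[1 + \sup_{\|u\|=1} \frac{\left|u^{T}x - M(u^{T}X)\right|}{MAD(u^{T}X)}\right]^{-1}.$$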
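Since the supremum cannot be evaluated over all directions, a common workaround is to take the maximum over a finite set of unit vectors. The sketch below follows that idea, assuming the usual form PD = 1 / (1 + outlyingness). The function names, the choice of directions and the toy cloud are hypothetical, and directions with a zero MAD are simply skipped.

```python
import math
from statistics import median


def unit_directions_2d(m):
    """m evenly spaced unit vectors with angles in [0, pi); u and -u are equivalent."""
    return [(math.cos(math.pi * k / m), math.sin(math.pi * k / m)) for k in range(m)]


def projection_depth(x, data, directions):
    """Approximate PD(x) = 1 / (1 + O(x)) over a finite set of unit vectors.

    O(x) is the largest value of |u'x - Med(u'X)| / MAD(u'X) over the
    supplied directions; the exact definition takes a supremum over all u.
    """
    outlyingness = 0.0
    for u in directions:
        proj = [sum(a * b for a, b in zip(u, y)) for y in data]
        med = median(proj)
        mad = median(abs(v - med) for v in proj)
        if mad > 0.0:  # skip directions along which the MAD degenerates
            px = sum(a * b for a, b in zip(u, x))
            outlyingness = max(outlyingness, abs(px - med) / mad)
    return 1.0 / (1.0 + outlyingness)


# Toy cloud (hypothetical) evaluated along the two coordinate axes only.
cloud = [(0.0, 0.0), (1.0, 3.0), (2.0, 1.0), (3.0, 4.0), (4.0, 2.0)]
axes = [(1.0, 0.0), (0.0, 1.0)]
pd_inside = projection_depth((2.0, 2.0), cloud, axes)   # 1.0
pd_outside = projection_depth((5.0, 2.0), cloud, axes)  # 0.25
```

A finer set of directions, for example `unit_directions_2d(180)`, gives a closer approximation to the supremum at a higher computational cost.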

F. Local Depth
The concept of local depth (LD) was proposed by [12]. This procedure is a local extension of depth, constructed by conditioning the distribution on appropriate neighbourhoods. To define the neighbourhood of a point, the depth of the point is calculated after symmetrisation of the distribution. Let x be any point in the entire data cloud X. Instead of the distribution of X, a distribution symmetrised about x is considered. For a given locality parameter, the neighbourhood of x is then defined from this symmetrised distribution. Given a depth function, the LD of x is defined as the depth of x with respect to the distribution conditioned on that neighbourhood.

III. PROPOSED DEPTH PROCEDURE
The proposed procedure, namely MMD, is formulated by using the MCD estimator instead of the conventional estimators of location and scale in the Mahalanobis distance. The robust MCD estimator was proposed by [23] to provide robust measures of location and scatter. The computational procedure for MMD is as follows. Let $X_1, X_2, \ldots, X_p$ be a p-dimensional multivariate data set X, and let x be a numerical vector whose depth is to be calculated.
1) Find the center ($M_X$) and the covariance matrix of X using the MCD estimator.
2) Compute the robust distance of each point from the center found in step 1.
3) Sort the distances given in step 2.
4) Find the median of the sorted distances from step 3.
5) Find the difference between each distance value from step 2 and the median from step 4.
6) Find the absolute value of each difference given in step 5.
7) Finally, compute the MMD depth from the absolute differences obtained in step 6.
Mahalanobis depth provides reliable results when the data are normal. However, it gives unreliable results when the data contain outliers, since it uses the traditional mean vector and covariance matrix, which are very sensitive to outliers.
The proposed MMD procedure computes depth by employing a robust estimator and is also an advancement of MD, since it further calculates the absolute differences of the distances about their median. It gives the highest depth value to the best points and yields reliable results.
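The computational steps of MMD, up to the absolute differences about the median, can be sketched for a small bivariate sample as follows. Several assumptions are made here. The MCD is computed by exhaustive search over all subsets of size h, which is feasible only for tiny samples; in practice an approximate algorithm such as FAST-MCD would be used. The raw estimate is used without consistency correction or reweighting. The robust distance is taken as the square root of the quadratic form. All function names and the toy data are hypothetical. The final mapping from absolute differences to the depth value is not implemented, because its formula is not available in this text.

```python
import math
from itertools import combinations
from statistics import fmean, median


def mean_and_cov_2d(points):
    """Mean and covariance (sxx, sxy, syy) of bivariate points, divisor n - 1."""
    n = len(points)
    mx = fmean(p[0] for p in points)
    my = fmean(p[1] for p in points)
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def raw_mcd_2d(data, h):
    """Exhaustive raw MCD: the h-subset whose covariance determinant is smallest."""
    best = None
    for subset in combinations(data, h):
        center, (sxx, sxy, syy) = mean_and_cov_2d(subset)
        det = sxx * syy - sxy * sxy
        if best is None or det < best[0]:
            best = (det, center, (sxx, sxy, syy))
    return best[1], best[2]


def robust_distances_2d(data, center, cov):
    """Distance of each point from the MCD center under the MCD covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # assumed to be strictly positive
    out = []
    for x, y in data:
        dx, dy = x - center[0], y - center[1]
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        out.append(math.sqrt(d2))
    return out


def abs_deviation_from_median(distances):
    """Sort the distances, take their median, and return |d_i - median|."""
    med = median(sorted(distances))
    return [abs(d - med) for d in distances]


# Toy data (hypothetical): six clean points and one outlier at (10, 10).
data = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0),
        (1.0, 0.0), (1.0, 2.0), (10.0, 10.0)]
center, cov = raw_mcd_2d(data, h=6)
distances = robust_distances_2d(data, center, cov)
abs_dev = abs_deviation_from_median(distances)
```

In this toy example the subset that excludes the outlier has the smallest covariance determinant, so the center and covariance are determined by the clean points only and the outlier receives by far the largest robust distance.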

IV. PROPOSED WEIGHT FUNCTION
The weight function is proposed to estimate location and scale after the depth value of each point in the data cloud has been computed. Given a notion of depth as described in Sections II and III, the depth weight function is computed as given below.
Consider the depth value of each data point, denoted by De. Sort the depth values and find their median, denoted by md(x). The depth weight function is given by ... From the above weight function, assign weights ...
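Once weights have been assigned, weighted estimates of location and scatter can be formed. The sketch below shows one common form, a weighted mean vector and a weighted scatter matrix with the sum of the weights as divisor. This normalisation is an assumption and may differ from the one used in the paper. The weights in the example are hypothetical inputs and are not produced by the proposed weight function.

```python
def weighted_location_scatter(data, weights):
    """Weighted mean vector and weighted scatter matrix.

    Both estimates divide by the sum of the weights (an assumed
    normalisation); the weights are supplied by the caller.
    """
    total = sum(weights)
    p = len(data[0])
    loc = [sum(w * x[j] for w, x in zip(weights, data)) / total for j in range(p)]
    scatter = [[0.0] * p for _ in range(p)]
    for w, x in zip(weights, data):
        dev = [x[j] - loc[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                scatter[i][j] += w * dev[i] * dev[j] / total
    return loc, scatter


# Hypothetical weights: the outlying last point is given weight zero.
data = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (10.0, 10.0)]
weights = [1.0, 1.0, 1.0, 1.0, 0.0]
location, scatter = weighted_location_scatter(data, weights)
```

With a zero weight on the outlying point, the estimates depend on the four remaining points only, which illustrates how down-weighting shallow points protects the location and scatter estimates.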

V. EXPERIMENTAL RESULTS
To study the performance of the proposed depth procedure (MMD), it was compared with the existing depth procedures given in Section II. The experiments were carried out in a simulation environment in the context of discriminant analysis by computing the AER with the help of packages in R software, and the results are summarised in this section.
The data were simulated with dimensions p = 2 and 5; numbers of groups g = 2 and 3; training sample sizes n = 100, 1000 and 5000; and contamination levels of 0%, 10%, 20%, 30% and 40%. In all cases the class distributions are normal; the generated data sets have different mean vectors and the same covariance matrix for dimension 2. The generated data were contaminated using different mean vectors at the various contamination levels. The computed AER is used to assess the efficiency of the proposed and the other depth procedures. The experimental results are summarised in tables given in the Appendix.
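The AER is the proportion of training observations that are misclassified by the discriminant rule fitted on those same observations. The sketch below computes it from true and predicted labels and also shows one way to generate a contaminated normal sample. The function names, the identity covariance, the shifted mean and the seed are assumptions for illustration only. The study itself was carried out in R, and its exact simulation settings are not reproduced here.

```python
import random


def apparent_error_rate(true_labels, predicted_labels):
    """Share of training observations that the fitted rule misclassifies."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    wrong = sum(1 for t, p in zip(true_labels, predicted_labels) if t != p)
    return wrong / len(true_labels)


def simulate_group(n, mean, contamination, shifted_mean, rng):
    """Draw n points with independent unit-variance normal coordinates.

    A `contamination` share of the points is centred at `shifted_mean`
    instead of `mean` (identity covariance is an assumption).
    """
    n_bad = round(contamination * n)
    points = []
    for i in range(n):
        centre = shifted_mean if i < n_bad else mean
        points.append(tuple(rng.gauss(m, 1.0) for m in centre))
    return points


# Hypothetical example: 8 training labels, 2 of which are misclassified.
true_labels = [1, 1, 1, 2, 2, 2, 2, 2]
predicted = [1, 1, 2, 2, 2, 2, 1, 2]
aer = apparent_error_rate(true_labels, predicted)  # 2 / 8 = 0.25

# Hypothetical group: p = 2, n = 100, 10% contamination, fixed seed.
rng = random.Random(1)
group = simulate_group(100, (0.0, 0.0), 0.10, (5.0, 5.0), rng)
```

Because the AER is evaluated on the training data themselves, it tends to be an optimistic estimate of the true error rate; here it serves only as a common yardstick across depth procedures.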
Tables 1 and 2 reveal that all the depth procedures gave the same AER when the data were free of contamination. As the contamination level increased, however, the depth procedures gave different results. Zonoid depth is affected even at the 10% level of contamination, and also as the sample size and dimension increase. Halfspace depth is affected at the 20% level of contamination as both sample size and dimension increase. In some cases SD is affected at the 30% level of contamination, and it is fully affected at 40%. At the 40% level of contamination almost all depth procedures are affected except the proposed MMD procedure, which gives better results than all the other depth procedures. The AER increases rapidly for all depth procedures except MMD as the contamination level, sample size and dimension increase. Projection depth gives almost the same results as MMD, but in some cases its AER is slightly higher. Finally, it is concluded that the proposed MMD procedure performs well compared with the other depth procedures.

VI. CONCLUSION
Location and scatter play a very important role in multivariate statistical methods. Data depth is one of the main emerging concepts for determining such measures. This paper proposed a new depth procedure called MMD, together with a weight function to estimate location and scatter. The performance of the proposed depth procedure was compared with that of the existing depth procedures. The experiments were carried out with application to discriminant analysis by computing the AER. The AER computed under the proposed MMD procedure is lower than that of the other depth procedures as the contamination level increases, across the sample sizes and dimensions considered. It is concluded that the proposed MMD procedure can be applied in multivariate statistical techniques, since it tolerates certain levels of contamination and produces reliable results.