Robust Linear Classifier

The project starts with developing robust sparse hyperplanes to classify molecular profiling data. Bhattacharyya et al. (2004) built classification and relevant-feature-identification algorithms that are resilient to uncertainty in the data. In our approach, the notion of "uncertainty" is made explicit by specifying the allowable values of a data point via an ellipsoidal uncertainty model parameterized by a center (location) and a covariance matrix (shape). We considered both Gaussian uncertainty and arbitrary, unknown uncertainty. The task of learning a robust sparse hyperplane from such models is posed as a robust Linear Program (LP) and as a Second-Order Cone Program (SOCP).
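As a minimal sketch of the ellipsoidal model (not the paper's notation; the function names and the 2-D numbers below are illustrative): over the ellipsoid centered at mu with shape Sigma and radius kappa, the worst-case value of the linear score w·x − b has the closed form w·mu − b − kappa·sqrt(wᵀ Σ w), which is exactly the term that turns the nominal linear constraint into a second-order cone constraint.

```python
import math

def dot(u, v):
    # Inner product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))

def mat_vec(A, v):
    # Matrix-vector product for a matrix given as a list of rows.
    return [dot(row, v) for row in A]

def worst_case_score(w, b, mu, Sigma, kappa):
    """Worst-case value of w.x - b over the ellipsoid
    {x : (x - mu)^T Sigma^{-1} (x - mu) <= kappa^2}.
    The minimum is attained on the boundary and equals
    w.mu - b - kappa * sqrt(w^T Sigma w)."""
    return dot(w, mu) - b - kappa * math.sqrt(dot(w, mat_vec(Sigma, w)))

# Hypothetical 2-D instance: the point is certified as positive only if
# the score stays positive over the *entire* uncertainty ellipsoid,
# not just at its center.
mu = [2.0, 1.0]
Sigma = [[0.5, 0.1], [0.1, 0.3]]
w, b = [1.0, 1.0], 1.0
print(worst_case_score(w, b, mu, Sigma, kappa=1.0))  # ~1.0; center score is 2.0
```

Requiring `worst_case_score(...) >= 0` for every training ellipsoid is what makes the resulting program a cone program rather than a plain LP.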

Bhattacharyya et al. (2004) and Shivaswamy et al. (2006) built robust Support Vector Machines to handle missing values in data. When values are missing, we may be able to estimate them using a secondary estimation procedure, albeit with a certain degree of uncertainty. This work also models uncertainty by its expected value and covariance structure and proposes a robust classification method via a worst-case optimization scheme. The task of learning a robust SVM is posed as an SVM with chance constraints, which, with the help of Chebyshev's inequality, is in turn relaxed to a second-order cone program.
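A sketch of the Chebyshev relaxation, under standard assumptions (not the papers' exact formulation; names and numbers are illustrative): the distribution-free Chebyshev bound turns the chance constraint P(y·(w·X − b) ≥ 1) ≥ 1 − ε into the deterministic cone constraint y·(w·mu − b) ≥ 1 + κ·sqrt(wᵀ Σ w) with κ = sqrt((1 − ε)/ε), valid for every distribution with the given mean and covariance.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(A, v):
    return [dot(row, v) for row in A]

def chebyshev_kappa(eps):
    """Safety multiplier from the distribution-free Chebyshev bound:
    kappa = sqrt((1 - eps) / eps). Smaller eps (stricter guarantee)
    forces a larger margin from the hyperplane."""
    return math.sqrt((1 - eps) / eps)

def satisfies_chance_constraint(w, b, y, mu, Sigma, eps):
    # Deterministic second-order cone constraint replacing the
    # chance constraint P(y*(w.X - b) >= 1) >= 1 - eps.
    lhs = y * (dot(w, mu) - b)
    rhs = 1 + chebyshev_kappa(eps) * math.sqrt(dot(w, mat_vec(Sigma, w)))
    return lhs >= rhs

# Illustrative point: mean well inside the positive side, modest covariance.
mu, Sigma = [4.0, 0.0], [[0.25, 0.0], [0.0, 0.25]]
w, b, y = [1.0, 0.0], 0.0, +1
print(satisfies_chance_constraint(w, b, y, mu, Sigma, eps=0.10))  # → True
print(satisfies_chance_constraint(w, b, y, mu, Sigma, eps=0.01))  # → False
```

Note how tightening ε from 0.10 (κ ≈ 3) to 0.01 (κ ≈ 9.95) makes the same point fail the constraint: the conservativeness of Chebyshev's inequality is exactly what the Bernstein approach below attacks.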

Bhadra et al. (2009) and Ben-Tal et al. (2011) present a novel methodology for constructing maximum-margin classifiers that are robust to interval-valued uncertainty in the examples. The idea is to employ chance constraints which ensure that the uncertain examples are classified correctly with high probability. The key novelty is in employing Bernstein bounding schemes to relax the resulting chance-constrained program into a convex second-order cone program. Bernstein bounds exploit richer partial information, such as the support of the uncertainty in addition to its moments, and hence can be far less conservative than Chebyshev bounds. Owing to this tighter modeling of uncertainty, the resulting classifiers achieve higher classification margins and hence better generalization.
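To illustrate why exploiting support information helps (a generic comparison of the classical tail bounds, not the specific bounding scheme of the papers), the sketch below contrasts the Chebyshev tail bound, which uses only the variance, with a classical Bernstein bound for a zero-mean sum of independent terms bounded by M; the latter decays exponentially in the deviation instead of polynomially.

```python
import math

def chebyshev_tail(sigma2, t):
    """Chebyshev bound P(|S| >= t) <= sigma2 / t^2.
    Uses only the variance sigma2 of the sum S."""
    return min(1.0, sigma2 / t**2)

def bernstein_tail(sigma2, M, t):
    """Classical Bernstein bound for a zero-mean sum S of independent
    terms each bounded by M in absolute value:
    P(S >= t) <= exp(-t^2 / (2 * (sigma2 + M*t/3))).
    The extra support information M gives exponential decay in t."""
    return math.exp(-t**2 / (2 * (sigma2 + M * t / 3)))

# n = 100 bounded terms, each with variance 0.25 and |X_i| <= 1.
sigma2, M = 100 * 0.25, 1.0
for t in (10.0, 20.0, 30.0):
    print(t, chebyshev_tail(sigma2, t), bernstein_tail(sigma2, M, t))
```

For these numbers the Bernstein bound is already an order of magnitude smaller at t = 20 (about 0.002 versus Chebyshev's 0.0625), which is the mechanism behind the less conservative relaxation and the larger certified margins.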