Multicategory Classification by Support Vector Machines: Application to Protein Secondary Structure Prediction

Abstract:

Numerous methods, based on many different algorithmic approaches, have been proposed for predicting protein secondary structure. Some of them use evolutionary information, while others use only the amino acid sequence. In this paper we focus on the second case, because predicting secondary structure from the amino acid sequence alone is still a challenge: many proteins are orphans (their amino acid sequence is not significantly similar to that of any protein with known secondary structure and function). We compare the results obtained with an approach based on kernel machine theory against those obtained with feed-forward Neural Networks (NNs) for protein secondary structure prediction on the same dataset. Support Vector Machines (SVMs) have given strong results in pattern recognition, especially in computational biology. As supervised learning machines, they have been successfully applied to a wide range of classification problems. The performance of SVMs is significantly better than that of NNs, although the latter remain competitive when their parameters are well selected. To improve classification accuracy, we employed Multiclass Support Vector Machines (MSVMs), trained with a specific kernel and five-fold cross-validation on two datasets, RS126 and CB513, which are collections of secondary structure sequences of globular proteins.