A region or segment of an antigen that is recognized by a specific antibody or B-cell is called an antigenic region or B-cell epitope. B-cell epitopes fall into two classes, continuous and discontinuous. A continuous (linear) epitope is a segment of consecutive residues in the primary sequence, whereas a discontinuous (conformational) epitope is a set of residues that are far apart in the primary sequence but are brought into spatial proximity by polypeptide folding. Most B-cell epitopes (~90%) are conformational. Both types of epitope play an important role in peptide-based vaccines and disease diagnosis [1, 2]. The immune system recognizes foreign proteins (antigens) and generates specific antibodies against them, a capability that researchers have exploited to design subunit vaccines [3, 4].
In the post-genomic era, in which a large number of pathogens have been completely sequenced, it is crucial to identify B-cell epitopes, hereafter called antibody interacting residues, in an antigen in order to design subunit vaccines against these pathogens. Several experimental techniques have been developed for mapping antibody interacting residues on an antigen, including the identification of interacting residues from the structures of antibody-antigen complexes [5]. One popular approach is the synthesis of overlapping peptides covering the entire antigen sequence, which identifies mainly sequential epitopes [6]. Mapping of antibody interacting residues has been severely hampered by the costly and time-consuming process of 3D structure determination. Many tools covering the compilation, visualization and prediction of B- and T-cell epitopes have been developed [7]. Although the majority of epitopes are conformational, most computational methods and databases focus on sequential epitopes [8, 9, 10]. Linear epitope prediction methods can be categorized as physico-chemical property based [11], HMM based [12] and ANN based [13]. Many methods are available for identifying antibody interacting residues when the tertiary structure of the antigen or of a homolog is known, a requirement that is itself a major limitation. These methods are based on features such as flexibility, solvent accessibility [14, 15] and amino acid propensity scales [16]. Earlier, researchers created a benchmark dataset from 3D structures in the PDB and evaluated several structure-based protein-protein binding site prediction methods, including the popular CEP [15] and DiscoTope [16], for predicting immunogenic regions [17]. They adopted the definition that an epitope consists of the antigen residues that have any atom within a distance of ≤ 4 Å of any antibody atom. They found that the performance of all methods was mediocre and that no method achieved an area under the curve (AUC) greater than 0.7.
In addition, a number of improved methods have been developed for predicting antibody interacting residues when the tertiary structure of the antigen is known [18, 19, 20, 21, 22, 23]. In summary, one needs to determine the structure of the antigen using crystallography in order to identify its antibody interacting residues. Experimental techniques such as crystallography are expensive and time-consuming, whereas functional assays are not reliable enough [5]. Thus there is a need to develop an alternative technique for predicting antibody interacting residues in a protein.
In this study, an attempt has been made to predict antibody interacting residues in an antigen from its primary sequence. First, we created patterns of different window lengths from the corresponding amino acid sequences, and then computed the standard binary and physico-chemical profiles of these patterns. We introduce, for the first time, the concept of the composition profile of patterns (CPP), generated through a sliding window in which the central residue is antibody interacting. These features were used to develop SVM-based models that predict antibody interacting residues with high accuracy.
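
The pattern generation and two of the profiles mentioned above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the padding character "X" for the sequence termini, the function names, the 21-letter alphabet of the binary profile, and the normalization of the composition by window length are all hypothetical choices made for the example.

```python
# Illustrative sketch only; names and conventions here are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # assumed dummy residue used to pad the sequence termini


def sliding_patterns(sequence, window):
    """Return one pattern of length `window` centred on each residue."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    padded = PAD * half + sequence + PAD * half
    return [padded[i:i + window] for i in range(len(sequence))]


def binary_profile(pattern):
    """One-hot encode a pattern: 21 values (20 residues + pad) per position."""
    alphabet = AMINO_ACIDS + PAD
    vector = []
    for residue in pattern:
        vector.extend(1 if residue == letter else 0 for letter in alphabet)
    return vector


def composition_profile(pattern):
    """Fraction of each of the 20 residues in the pattern (pads count in the length)."""
    length = len(pattern)
    return [pattern.count(letter) / length for letter in AMINO_ACIDS]


def label_patterns(sequence, interacting_positions, window):
    """Pair each pattern with 1 if its central residue is interacting, else 0.

    `interacting_positions` holds 0-based indices into `sequence`.
    """
    patterns = sliding_patterns(sequence, window)
    return [
        (pattern, 1 if index in interacting_positions else 0)
        for index, pattern in enumerate(patterns)
    ]
```

With a window of length 3, for instance, `label_patterns("ACDE", {1}, 3)` yields the patterns `XAC`, `ACD`, `CDE` and `DEX`, of which only `ACD` is labeled as positive because its central residue is at position 1. The resulting feature vectors and labels could then be passed to any SVM implementation.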