The binding of peptides to human leukocyte antigen (HLA) class I molecules is a prerequisite for a CD8+ T cell response. The majority of these peptides are generated in the cytosol by proteasomal cleavage of endogenous proteins [1]. The resulting peptides, preferably 9-18 amino acids in length, are transported into the lumen of the endoplasmic reticulum (ER) by the transporter associated with antigen processing (TAP) for loading onto HLA class I molecules [2, 3]. The peptide-loaded HLA class I complexes then leave the ER and are transported to the cell surface for presentation to T cell receptors [4]. Defects in TAP genes can severely impair peptide transport into the ER and result in reduced surface expression of HLA class I molecules [5].
The substrate specificity of TAP has been examined in several studies. It is now known that hydrophobic aromatic residues are preferred at the C-terminus, position 3 (p3), and p7; hydrophobic or positively charged residues are preferred at p2; aromatic or acidic residues are preferred at p1; and proline is disfavored at p1 and p2 [5, 6, 7]. Different HLA class I alleles exhibit different degrees of TAP dependence. HLA-A2 is reportedly the least TAP-dependent; HLA-B7 can acquire peptides through mechanisms other than TAP transport; and HLA-A3 is predominantly TAP-dependent [8]. As such, an improved understanding of TAP selectivity is important for elucidating its role in regulating the supply of peptides to HLA class I molecules. Such understanding is also crucial for the design of T cell-based vaccines for infectious diseases, autoimmune disorders, transplantation and cancer.
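The positional preferences listed above can be made concrete with a small rule-based check. The following is a minimal sketch only: the residue groupings (for example, treating F, W and Y as the hydrophobic aromatic residues), the unweighted scoring, the minimum length of 8, and the function name `tap_motif_score` are all assumptions made for illustration and are not taken from the cited studies.

```python
# Illustrative residue groups (assumed, not from the cited studies).
AROMATIC = set("FWY")          # hydrophobic aromatic residues
HYDROPHOBIC = set("AVLIMFWY")  # broad hydrophobic group
POSITIVE = set("KR")           # positively charged residues
ACIDIC = set("DE")             # acidic residues


def tap_motif_score(peptide: str) -> int:
    """Count matched positional preferences minus proline penalties.

    Every rule carries the same weight (+1 or -1), which is a
    simplification chosen only to keep the sketch short.
    """
    pep = peptide.upper()
    if len(pep) < 8:
        # Assumed lower bound so that p7 and the C-terminus are distinct.
        raise ValueError("peptide must have at least 8 residues")

    score = 0
    score += pep[-1] in AROMATIC                # C-terminus
    score += pep[2] in AROMATIC                 # p3
    score += pep[6] in AROMATIC                 # p7
    score += pep[1] in (HYDROPHOBIC | POSITIVE) # p2
    score += pep[0] in (AROMATIC | ACIDIC)      # p1
    score -= pep[0] == "P"                      # proline disfavored at p1
    score -= pep[1] == "P"                      # proline disfavored at p2
    return score


if __name__ == "__main__":
    for candidate in ("DRYAAAFAF", "PPGGGGGGG"):
        print(candidate, tap_motif_score(candidate))
```

A hand-written rule set of this kind ignores the relative strength of each preference, which is one reason trained predictors are used in practice.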
To date, a variety of computational methods have been developed to predict TAP-binding peptides. Daniel and coworkers [9] applied artificial neural networks (ANN) to simulate TAP binding experiments. Zhang et al. [10] combined ANN and hidden Markov models to predict peptide binding to human TAP. Doytchinova and colleagues [11] developed an additive quantitative structure-activity relationship (QSAR) model for peptide binding to the TAP molecule. Bhasin and Raghava [12] used a cascade support vector machine (SVM)-based method to predict the binding affinities of TAP ligands, while Peters et al. [13] and Diez-Rivero et al. [14] reported the use of a stabilized matrix method and an SVM-based system, respectively, to predict both nonamer and variable-length TAP ligands. Although numerous studies have shown the importance of sequence locality in TAP transport [12], none of the existing systems has exploited localized amino acid effects to predict the TAP binding affinity of peptides.
Here we report TAP Hunter, a web-based computational system for predicting TAP ligands that uses an SVM as the discrimination engine. A novel data encoding scheme, based on sequence locality and composition effects, allows the system to model essential features of peptides that can bind to the TAP translocator. This simple method allows us to predict TAP ligands with an accuracy that is better than that of existing approaches based on full-length sequences.
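To illustrate what an encoding that combines locality and composition might look like, the sketch below turns a peptide of any length into a fixed-length feature vector that an SVM could consume. It is not the TAP Hunter encoding, whose details are not given in this section. The choice of positions (p1, p2, p3 and the C-terminus), the one-hot representation, and the names `encode_peptide` and `one_hot` are assumptions made purely for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(residue: str) -> list:
    """Return a 20-dimensional indicator vector for one residue."""
    vec = [0.0] * len(AMINO_ACIDS)
    vec[INDEX[residue]] = 1.0
    return vec


def encode_peptide(peptide: str) -> list:
    """Encode a peptide as 100 features (hypothetical scheme).

    Locality: one-hot vectors for p1, p2, p3 and the C-terminal residue
    (4 x 20 features). Composition: fraction of each amino acid over the
    whole peptide (20 features).
    """
    pep = peptide.upper()
    if len(pep) < 4:
        raise ValueError("peptide must have at least 4 residues")
    if any(aa not in INDEX for aa in pep):
        raise ValueError("peptide contains a non-standard residue")

    features = []
    # Local terms: assumed positions of interest.
    for residue in (pep[0], pep[1], pep[2], pep[-1]):
        features.extend(one_hot(residue))
    # Global term: amino acid composition, independent of peptide length.
    for aa in AMINO_ACIDS:
        features.append(pep.count(aa) / len(pep))
    return features


if __name__ == "__main__":
    vector = encode_peptide("DRYAAAFAF")
    print(len(vector))
```

Because the vector length does not depend on peptide length, nonamers and longer peptides can share a single classifier under this kind of scheme.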