Because peptide binding to MHC is a requirement to elicit a T cell response, algorithm-based approaches predicting peptide binding are often utilized as a first screen to identify epitopes derived from large pathogens. In the present study, we have utilized 9-mer positional scanning combinatorial libraries to characterize the peptide binding specificities of several mouse and human class I alleles. When the corresponding positional scanning combinatorial library data were utilized to generate matrices for the prediction of binders derived from vaccinia, it was found that in all cases examined between 58 and 78% of the top 0.5% scoring peptides were high affinity binders, depending on the specific allele considered. The biological relevance of quantitative motifs derived from combinatorial library analyses were validated by identifying several epitopes derived from influenza A virus that were recognized by PBMCs from human donors. This study therefore provides a set of 19 uniformly generated matrices that can be directly applied to predict MHC peptide binding and T cell epitope candidates.
An implicit feature of the approach is that it provides a detailed quantitative motif for each MHC specificity examined. However, it is often useful to summarize MHC binding specificity in the more simple terms of primary anchor motifs. This "minimalist" approach dates back to the earliest studies of MHC binding, where specificities were defined using pool sequencing or single amino acid substitution analyses. These methods were very good at characterizing the most prominent features of allele specific motifs, and the resulting motifs have generally formed the syntax with which MHC binding is described. To extend the utility of the combinatorial approach, we have developed a heuristic approach to translate the matrix data generated by the combinatorial libraries into the more simple motifs that are the idiom of MHC studies. In the majority of cases, generalizable parameters could be defined that allowed the identification of main and secondary anchor positions congruent with those defined by other approaches.
The majority of HLA class I molecules whose binding specificity have been described by crystal structure, pool sequencing or peptide binding studies, the main anchor interactions of the peptide almost invariably involve the residues at position 2 and the C-terminus of the peptide. This pattern also appears to be true for most macaque and chimpanzee class I alleles studied to date. As evidenced by the cases of A*3001 and B*0801, the combinatorial library analysis suggests that the paradigm of position 2/C-terminus anchor spacing for MHC peptide binding is not always true. This has been reported previously in the case of B*0801 , where positions 3 and 5, in addition to the C-terminus, have been identified as primary anchors. Although this exact pattern was not duplicated by the combinatorial analysis, the present data do confirm the importance of positively charged residues in the middle of the peptide for conferring high affinity binding capacity. The ability to pick up unexpected binding patterns of MHC alleles is one of the key advantages of the combinatorial libraries, which have no prior expectations on which positions are likely to be important for MHC:peptide interactions.
In our previous HLA supertype classification study , B*0801 was considered an outlier on the basis of it's somewhat unique, for HLA, use of positions 3 and 5 as main anchor positions. This designation was also made by Lund  and Hertz . Others [74, 75] have classified it with alleles we  and others [72, 73] have assigned as members of the B7-supertype. In the present study, the combinatorial library analysis suggested that positions 5 and 6 are important (in addition to the C-terminus) for peptide binding. As such, the present analysis does not provide sufficient evidence to suggest that B*0801 should be assigned to a specific supertype, which are largely defined by position 2 (and C-terminus) specificity. We can note that proline, the B7-supertype associated position 2 specificity, does appear to be well tolerated in position 2 by B*0801. Also, our own unpublished binding data suggests that there may be some cross-reactivity between B*0702 and B*0801. However, this potential cross-reactivity has not been examined in enough detail at this point to draw any conclusions.
Previously, B*3501, B*5101, B*5301, and B*5401 were assigned to the B7-supertype [71, 72, 73], which describes a set of HLA alleles sharing a preference for proline as the position 2 main anchor. In the present study, the combinatorial library analysis confirmed this preference, but, surprisingly, also indicated that alanine was well tolerated by these alleles in position 2. This would suggest that overlap may exist in the repertoires of at least some B7-supertype alleles with alleles outside the B7-supertype, and in particular those associated with the B58- and or B62-supertypes. While the binding data we have to date suggest that the majority of instances of repertoire overlap for B7-supertype alleles will fall within the B7-supertype, and that proline is the most dominant preference in position 2, evidence for some cross-reactivity is also quite apparent. Indeed, a recent study  has found a high degree of cross-recognition of epitopes between alleles associated with different supertypes. Future studies will hopefully shed additional light on this issue.
In utilizing combinatorial libraries to characterize MHC specificity and identify binders, the approach we have implemented is computationally simple. We have largely utilized relative binding values for each residue/position coordinate. To predict binders, we have assumed the independent binding of peptide side chains, and represented the predicted binding propensity as a product of each coordinate. There are other ways to process the raw data for the purpose of generating prediction matrices, or to define anchor positions. To facilitate further investigation of prediction approaches by the bioinformatic community, we have here provided both the raw and processed data for over a dozen different HLA, and 4 H-2, class I alleles. We believe that this data will be of value to the community for the prediction of binders and epitopes, at least for several alleles not previously characterized in detail.
We compared the prediction performance of the combinatorial library with a set of 16 bioinformatic approaches for the best characterized human MHC allele, HLA A*0201. While several algorithms outperformed the combinatorial library, this has to be taken into perspective, as these algorithms are based on up to ten times more training data. Even more surprising, the combinatorial library nevertheless proved highly competitive, with a better prediction quality than 10 out of 16 algorithms. Taken together, the combinatorial libraries minimally provide a very solid baseline characterization of MHC binding specificity, which can be generated both quickly and with cost effectiveness.
Using combinatorial library based matrices to identify sets of candidate peptides, epitopes were successfully identified in patients typed as A*3001, A*3201 and B*1501. Notably, however, no peptide was recognized in more than one donor. This diversity of responses is similar to what was noted previously for mapping T cell responses to vaccinia-derived peptides in human donors. Because of the small number of A*3001 and A*3201 donors tested, it is possible we have under estimated the number of positive responses. Other factors may also be responsible for the lower response rates observed. The donor pool utilized represents an outbred population, and almost all of the donors were heterozygous at both the A and B loci. Thus, the diverse donor responses may reflect the different influences of other MHC alleles in shaping the overall T cell repertoire. Similarly, that dominant epitopes recognized in multiple donors were not identified may be due to the fact that the set of donors is representative of diverse histories of exposure to different viral strains.
The epitope identification aspect of the study was not pursued to the level and detail of our previous studies (e.g., ). Several factors are responsible for this, including the fact that while the alleles studied are not rare, neither are they prevalent, making it resource intensive to identify a sufficient number of additional donors. As a result, the identified peptides represent potential leads of a preliminary nature. At the same time, the data does help demonstrate that the matrices derived in the study are useful for epitope identification, even if the epitope identification study was not ideal. Furthermore, to the best of our knowledge, each of the epitopes identified in the present study represent novel epitopes, and together are the first set of influenza virus derived epitopes based on predictions for A*3001, A*3201 and B*1501.