Computational vaccinology workflows
A frequently pursued approach for reducing development effort is the reuse of existing software components. For example, pBone and pView rely on Taverna for both workflow design and execution. Each software application is, of course, designed for a specific task and its corresponding specifications; consequently, the trade-offs involved in adapting existing software modules have to be kept in mind. Nevertheless, using frameworks such as Taverna makes the design and implementation of computational vaccinology workflows highly efficient. Similarly, Java offers a substantial set of utility libraries.
Web services are the preferred means of communication for pBone and pView, as they are in widespread use and, backed by industry standards, provide an easy way of accessing computational resources. Their efficiency is limited, however, when transferring larger data volumes, e.g. compared to remoting technologies such as RMI or CORBA. For pBone/pView this efficiency issue can be neglected because such applications do not require real-time responsiveness. Large data volumes do have to be kept in mind for bioinformatics applications, however, as Taverna tends to run into problems when passing more than 10 MB of data between workflow modules. This problem is rooted in the way web services operate: the SOAP protocol uses XML as the basis for its messages, and XML parsers often have problems with large files because building the XML data structure in memory is computationally expensive. For example, the series of PSI-BLAST runs computed for the homology mapping of EBV produced 30 GB of data overall, with PSI-BLAST results of 8 to 10 MB for individual sequences. To address these data size issues, pBone was designed to use file storage, so that file identifiers rather than the files themselves are exchanged.
A technology currently spreading throughout the Java community is object-relational mapping (ORM). Hibernate, a frequently used implementation, significantly simplifies the development of database access code. Using Hibernate shifts the focus to application code rather than data persistence code, in turn substantially decreasing overall development time.
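The file-storage design, in which pBone modules exchange file identifiers rather than the files themselves, can be illustrated with a minimal sketch. It assumes a content-addressed store keyed by SHA-256 digests; the class `FileStore` and the step functions are hypothetical names for illustration and do not describe pBone's actual storage layer.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


class FileStore:
    """Hypothetical content-addressed file store: large results stay on disk
    and workflow modules pass around short string identifiers instead."""

    def __init__(self, root: Optional[Path] = None) -> None:
        # Fall back to a fresh temporary directory when no root is given.
        self.root = Path(root) if root is not None else Path(tempfile.mkdtemp())
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        # The SHA-256 hex digest of the content is the identifier, so storing
        # identical content twice returns the same identifier and one file.
        file_id = hashlib.sha256(data).hexdigest()
        path = self.root / file_id
        if not path.exists():
            path.write_bytes(data)
        return file_id

    def get(self, file_id: str) -> bytes:
        return (self.root / file_id).read_bytes()


def producer_step(store: FileStore, report: bytes) -> str:
    # e.g. a PSI-BLAST module: writes its multi-MB report once and
    # returns only the identifier to the workflow engine.
    return store.put(report)


def consumer_step(store: FileStore, file_id: str) -> int:
    # A downstream module resolves the identifier only when it needs the data;
    # counting lines stands in for real report parsing.
    return store.get(file_id).count(b"\n")


if __name__ == "__main__":
    store = FileStore()
    fid = producer_step(store, b"hit1\nhit2\n")
    print(len(fid), consumer_step(store, fid))
```

Running the script prints `64 2`: only the 64-character identifier travels between modules, which keeps the XML messages small regardless of report size.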
Bioinformatics is a fast-evolving field with typically short software life cycles, often due to major improvements or the development of alternative solutions. When input from multiple sources is required, as is the case in computational vaccinology, designing non-monolithic, flexible and adaptable software applications is therefore a logical step. The practical problem is that designing such applications takes more time than creating applications for exactly one narrowly defined task, which underlines the importance of reasonably balancing flexibility on the one hand against focus on the other. pBone and pView were built for bioinformatics-rich environments with protein analysis in mind. Allowing pBone to make use of an expandable list of bioinformatics tools, however, required creating a wrapper providing this flexibility.
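A tool wrapper of this kind can be sketched as a registry of command templates behind a single calling interface. This is an assumption-laden illustration, not pBone's wrapper: `ToolSpec`, `ToolWrapper` and the placeholder syntax are hypothetical, while `-query`, `-db` and `-num_iterations` are standard BLAST+ `psiblast` options.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToolSpec:
    """Hypothetical description of one external command-line tool."""
    executable: str
    arg_template: List[str]  # "{name}" placeholders are filled in per call


@dataclass
class ToolWrapper:
    """Single entry point for all tools; adding a tool means registering a spec,
    not changing the calling workflow code."""
    tools: Dict[str, ToolSpec] = field(default_factory=dict)

    def register(self, name: str, spec: ToolSpec) -> None:
        self.tools[name] = spec

    def build_command(self, name: str, **params: str) -> List[str]:
        spec = self.tools[name]  # raises KeyError for unregistered tools
        return [spec.executable] + [arg.format(**params) for arg in spec.arg_template]

    def run(self, name: str, **params: str) -> str:
        # check=True raises CalledProcessError if the tool exits non-zero.
        result = subprocess.run(self.build_command(name, **params),
                                capture_output=True, text=True, check=True)
        return result.stdout


if __name__ == "__main__":
    wrapper = ToolWrapper()
    # BLAST+ psiblast options; the database name is a placeholder.
    wrapper.register("psiblast", ToolSpec(
        "psiblast", ["-query", "{query}", "-db", "{db}", "-num_iterations", "3"]))
    print(wrapper.build_command("psiblast", query="gp110.fasta", db="uniref90"))
    # A stand-in "tool" that always exists: the current Python interpreter.
    wrapper.register("echo", ToolSpec(sys.executable, ["-c", "print('{text}')"]))
    print(wrapper.run("echo", text="hello").strip())
```

Keeping tool-specific knowledge in registered templates means the workflow code calling `run` does not change when a predictor is swapped for a newer alternative.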
Multiple sequence alignments and variability are a central part of the visualization concept of pView. At first glance, pView looks similar to the popular sequence alignment viewer Jalview. Indeed, both make use of similar concepts such as an MSA-centered view, web services, and project support features. One of the most significant differences is pView's optional focus on processes with access to a central data repository, which allows pBone/pView to process large data volumes quickly. An additional difference is the ease of adding or editing annotations, including overlapping ones. Annotations are an important collaborative feature and are also necessary for project control. Support for persisting annotations to a central (pBone) repository was therefore included. To support easy extensibility, pView allows new data formats and visualizations to be added on a ‘plugin’ basis.
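Overlapping annotations on alignment columns reduce to closed intervals that are allowed to intersect. The sketch below assumes 0-based inclusive column ranges; `Annotation` and `AnnotationTrack` are hypothetical names, and persistence to a pBone repository is not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation on alignment columns start..end (0-based, inclusive)."""
    start: int
    end: int
    label: str
    author: str


class AnnotationTrack:
    """Stores annotations in a plain list, so overlaps are allowed by design."""

    def __init__(self) -> None:
        self._items: List[Annotation] = []

    def add(self, ann: Annotation) -> None:
        if ann.start > ann.end:
            raise ValueError("start must not exceed end")
        self._items.append(ann)

    def remove(self, ann: Annotation) -> None:
        self._items.remove(ann)

    def at_column(self, col: int) -> List[Annotation]:
        # Every annotation covering this column, in insertion order.
        return [a for a in self._items if a.start <= col <= a.end]

    def overlapping(self, start: int, end: int) -> List[Annotation]:
        # Two closed intervals overlap unless one ends before the other starts.
        return [a for a in self._items if a.start <= end and start <= a.end]
```

A linear scan is adequate for the handful of annotations typical per alignment; an interval tree would only pay off for very dense tracks.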
Identification of vaccine targets
Various strategies and predictors exist for selecting vaccine targets and antigenic determinants, including in pathogens less well studied than EBV. The predictors used in the presented workflow were selected for superior, or at least representative, performance as judged against the state of the art. In many cases, comparable alternative methods exist or larger consensus predictions could be included, as is usually the case in a bioinformatics environment. The applied workflows therefore represent only one possible realization among a larger set of options.
Selection of B-cell targets is closely tied to genome annotation, as the assigned function is usually indicative of relevance, e.g. in metabolism or pathogenesis, and possibly of conservation across isolates. Expression of proteins only during certain phases of the pathogen life cycle, an aspect sometimes criticized for vaccines generated from cultivated pathogens, is of particular relevance. Similarly, antigen abundance can be assessed to a certain degree using transcriptomics or proteomics data (or alternatively sequencing methods) [61, 62, 63]. While the level of expression is highly important for both T- and B-cell epitopes, understanding expression variability over the pathogen life cycle, such as for early and late viral genes, can also be important [64, 65]. It is not yet clear whether mimicry or natural immunity should be the aim, particularly for chronically infecting pathogens, where immune responses in naive hosts are by definition not particularly successful in clearing the infection. Immune diversion or evasion through specific pathogenicity factors may more often be the problem than a lack of effective epitopes or epitope variability.
Homology mapping allows known epitopes on proteins to be used for predicting epitope regions on related (homologous) proteins. Exploiting homology between biological entities is common practice for various tasks such as gene or protein annotation, so using epitope information from homologous proteins to predict epitope regions is a natural extension. Previous work has demonstrated that PSI-BLAST provides a suitable basis, particularly for detecting moderately remote homology [66, 67].
As in all mapping procedures, a few key components with major influence on the final outcome have to be considered. Using the UniRef90 sequence database for generating the PSI-BLAST homology network reduces runtime and overcomes the problem of over-emphasizing groups of closely related sequences. Cascading PSI-BLAST searches may nevertheless, at least in theory, be more susceptible to biases, resulting in less sensitive searches. In our example, the use of only three PSI-BLAST iterations and a stringent cutoff for the final, non-iterative BLAST against the Herpesviridae (e-value ≤ 10⁻²⁰) ensured that only highly homologous sequences were identified. In contrast to other remote homology detection methods, which focus on overall homology at least at the domain level, our method uses this concept only as a first step and then focuses on the identified regions containing experimentally validated epitopes.
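Applying the stringent cutoff to the final search amounts to filtering BLAST hits by e-value. A minimal sketch, assuming the results were written in BLAST+ tabular format with its default 12 columns (`-outfmt 6`); the function names are hypothetical and this is not the pBone implementation.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse BLAST+ tabular output in the default 12-column layout (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], int(f[6]), int(f[7]), int(f[8]), int(f[9]), float(f[10])))
    return hits


def stringent_hits(hits: Iterable[Hit], max_evalue: float = 1e-20) -> List[Hit]:
    # Keep only hits at or below the cutoff (e-value <= 10^-20 in the text).
    return [h for h in hits if h.evalue <= max_evalue]
```

The retained query and subject coordinates are what a subsequent, epitope-focused step needs to decide whether a known epitope actually falls inside the aligned region.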
Naturally, mapping of protective antigens can be of great value in vaccine design per se. We attempt to make use of as much information as possible by mapping regions homologous to those reported to be involved in protective immune responses. Consequently, each significant PSI-BLAST alignment has to be re-evaluated in the context of the individual neutralizing peptides. This approach ensures that the relevant regions lie within aligned regions and that the overall secondary structure corresponds. Although simple one-to-one mappings already allow experimentally verified data on epitope sequences to be carried forward, using multiple mapping steps within a homology network is valuable, since it allows information to propagate among distantly related proteins at the borderline of PSI-BLAST sensitivity. Care has to be taken, however, to avoid false positive mappings. Alternatively, CSI-BLAST, which has been reported to be more sensitive than the widely used NCBI-BLAST, may be applied.
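Re-evaluating an alignment in the context of one neutralizing peptide can be sketched as walking the pairwise alignment and translating epitope coordinates from query to subject. The sketch assumes 1-based inclusive coordinates, gaps written as `-`, and a hypothetical `min_aligned` threshold on the fraction of epitope residues opposite a subject residue; the secondary-structure check mentioned above is not covered.

```python
from typing import List, Optional, Tuple


def map_epitope(q_aln: str, s_aln: str, q_start: int, s_start: int,
                ep_start: int, ep_end: int,
                min_aligned: float = 1.0) -> Optional[Tuple[int, int]]:
    """Map a query epitope (1-based, inclusive) onto the subject of one pairwise
    alignment. Returns the subject (start, end) or None if the mapping is rejected."""
    if len(q_aln) != len(s_aln):
        raise ValueError("aligned strings must have equal length")
    q_pos, s_pos = q_start - 1, s_start - 1
    mapped: List[Optional[int]] = []
    for qc, sc in zip(q_aln, s_aln):
        # Advance residue counters only on non-gap characters.
        if qc != "-":
            q_pos += 1
        if sc != "-":
            s_pos += 1
        if qc != "-" and ep_start <= q_pos <= ep_end:
            # None marks an epitope residue facing a gap in the subject.
            mapped.append(s_pos if sc != "-" else None)
    length = ep_end - ep_start + 1
    if len(mapped) < length:
        return None  # epitope extends beyond the aligned region
    aligned = [p for p in mapped if p is not None]
    if not aligned or len(aligned) / length < min_aligned:
        return None  # too many epitope residues opposite subject gaps
    return min(aligned), max(aligned)
```

Requiring the whole epitope to lie inside the alignment is one simple guard against false positive mappings; chaining such mappings across a homology network would propagate information to more distant relatives.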
Sequence regions supported by mapped epitopes from multiple sources, even from different pathogens, may indicate that a single peptide derived from such a region could target several of these pathogens, provided the homologous regions are at least moderately conserved.
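Finding such multiply supported regions can be sketched as counting, per residue of a target sequence, the distinct source pathogens whose epitopes were mapped there. The input format (source label plus 1-based inclusive range) and the function names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

Mapping = Tuple[str, int, int]  # (source pathogen, start, end) on one target sequence


def position_support(mappings: Iterable[Mapping]) -> Dict[int, Set[str]]:
    # Per residue: the set of distinct pathogens contributing a mapped epitope.
    support: Dict[int, Set[str]] = defaultdict(set)
    for source, start, end in mappings:
        for pos in range(start, end + 1):
            support[pos].add(source)
    return support


def supported_regions(mappings: Iterable[Mapping], min_sources: int = 2) -> List[Tuple[int, int]]:
    """Maximal runs of consecutive residues supported by >= min_sources pathogens."""
    support = position_support(mappings)
    regions: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    prev = 0
    for pos in sorted(p for p, s in support.items() if len(s) >= min_sources):
        if run_start is None:
            run_start = prev = pos
        elif pos == prev + 1:
            prev = pos  # extend the current run
        else:
            regions.append((run_start, prev))
            run_start = prev = pos
    if run_start is not None:
        regions.append((run_start, prev))
    return regions
```

Counting sets rather than raw mappings keeps repeated epitopes from the same pathogen from inflating the support of a region.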
Expectations that Systems Biology will support an understanding of host-pathogen interactions are substantial. While relatively new, this field has already produced encouraging results. In this context, chronically infecting pathogens like EBV are of particular interest due to their extensive mechanisms for interfering with the host’s immune system, aimed at immune evasion or misdirection. For vaccinology, insights into such mechanisms are central to identifying pathogen factors crucial for maintaining the viral life cycle (or even leading to subsequent diseases such as malignancies).
As an example, we focused on the molecular environment of differentially regulated CD9 on B-cells, a factor associated with the primary EBV receptor CD21, to derive some of the effects EBV may have on host-cell network organization and function. While EBV can also infect CD21-negative cells (probably through binding to B-cells first), this complement receptor clearly plays a central role in EBV pathogenesis. According to UniProt, CD21 is expressed on mature B-lymphocytes, on some other immunologically relevant cell types, and on pharyngeal epithelial cells. This is of particular interest because the two groups of malignancies considered most likely to be associated with EBV infection are lymphoid tumours (primarily of B-cell origin) and nasopharyngeal carcinomas.
Several observations can be made from the Systems Biology analysis. Of the 48 human factors linked to EBV in the CD9-centered interaction network, 12 are directly targeted by HHV-4 (EBV) proteins, while none of these direct binders is differentially regulated. Probably most apparent is the strong presence of immunologically relevant players (12 of 48). This may not seem surprising, as the network was built around down-regulated CD9 and thus likely includes factors involved in B-cell regulation, although the data on differential regulation were derived from epithelial cells rather than B-cells. Although CD21 is present in epithelial cells, this specific enrichment of functions centrally assigned to B-cells is an interesting finding for a virus that infects both cell types. It raises the question of how far the pathogenicity mechanisms of EBV are specialized for epithelial cells or B-cells, or work similarly in both. There is one major exception, however: EBV shows a latent (dormant) state only in the B-cell lineage. Concerning interference with homeostatic B-cell regulation, the latency-associated antigen LMP2 plays a major role, as has been shown previously. LMP2 prevents EBV reactivation in latently infected cells through suppression of SYK and LYN activity. On the other hand, this protein bypasses developmental checkpoints, allowing immature B-cells to proliferate, which may be associated with malignancy. Similarly, LMP1, a functional CD40 homologue, exerts major effects on the control of B-cell proliferation. The association with CD209 (DC-SIGN) is not unexpected, as this protein is known to bind several enveloped viruses, including human cytomegalovirus, a relative of EBV. The third observation is the strong enrichment of factors associated with neoplastic disorders: at least 11 of the 48 proteins in the interaction graph have been explicitly associated with cancer.
The direct interaction between the latency-associated protein EBNA2 and the host factor MCP (CD46) is also of particular interest. MCP acts as a receptor for multiple pathogens, including Measles virus and Human Herpesvirus 6 [76, 77], the latter being phylogenetically related to EBV. MCP exhibits several immunity-associated functions, one being its role as a co-stimulatory factor in the development of T-helper cells, which exert at least part of their effect through IL-10 release. Generally, the immune-evasive strategy of EBV appears to rest strongly on IL-10, as the virus itself also encodes an IL-10 homologue (vIL-10; BCRF1), which is expressed during the lytic cycle. The immunosuppressive effects mediated through regulatory T-cells during EBV latency may also have side effects pertaining to autoimmune disease. Hypothetically, this may occur through an inability (or decreased efficiency) to fully resolve certain infections in the presence of EBV.
IMPDH2, a rate-limiting enzyme of guanine nucleotide synthesis, is consistently down-regulated, contrary to what might be expected during oncogenic transformation or with the increased metabolic demands of virus production. While findings for specific tumors cannot safely be generalized, up-regulation of this protein and associated auto-antibodies have been observed at least in colorectal cancer. The observed down-regulation may result from antiviral mechanisms that limit the metabolic rate (or may reflect the epithelial origin of the transcriptomics data used). However, in the ‘meta-B’ dataset presented by Chen et al., IMPDH2 was not reported as consistently differentially regulated in both nasopharyngeal carcinoma and primary effusion lymphoma. Similarly, MAP3K5 up-regulation may be a reaction to viral infection, as this protein is thought to be associated with apoptotic cell death. Interestingly, this is also the case in latently infected cells, suggesting viral mechanisms that overcome this apoptotic stimulus. Also relevant in this context is the down-regulation of the pro-apoptotic factor TRIP12. Down-regulation of APPBP1 (NAE1) is somewhat unexpected, as it may suggest attempted cell-cycle arrest in latently infected cells as well, while at the same time avoiding the apoptotic stimulus. However, this assumption may be tissue-specific, as the apoptosis data were generated in neuronal tissue. The laminin receptor ITGA6 is down-regulated during latent infection but up-regulated during recurrent infection.
A further observation of potential interest derived from the Systems Biology analysis of EBV infection is the interaction of BALF4 (gp110; gB) with human LAMB1 (laminin subunit beta-1) and FN1 (fibronectin). BALF4, and at least its homologue in HHV-1 (UL27; gB), act as essential secondary receptors following the initial activity of the primary adhesin (BLLF1 in the case of EBV). UL27 is also essential for initial attachment to host proteoglycans. gp110 is also of specific interest for homology mapping and epitope prediction, as this protein is highly conserved across herpesviruses and present in fairly diverse viral species. EBV gp350 (BLLF1) binds to host CD21, and in a second step BALF4 is required for host-membrane fusion [47, 82, 83, 84]. As LAMB1 and FN1 are both components of the extracellular matrix, this may suggest a mechanism for enrichment of the virus in certain tissues lacking CD21. An alternative interpretation is the potential enrichment of laminins such as LAMB1 (or alternatively fibronectin) in the viral membrane, where they might serve as primary receptors interfacing with gp110. While this may seem far-fetched, LAMB1 and FN1 bind to the integrin receptors ITGA3 and ITGB1, and LAMB1 additionally binds to ITGA7. The integrins ITGA3 and ITGB1 also directly interact with CD9 and CD19, suggesting that a hypothetical incorporation of laminins or fibronectin into the viral membrane would utilize membrane complexes similar to the classical CD19 assembly. According to UniProt annotation, LAMB1 is thought to interact with other laminins through coiled-coil structures and can be taken up by a high-affinity receptor. The reduced extracellular matrix in cell cultures may thus be part of the reason why the tissue tropism of EBV appears more restricted in vitro than in vivo. Also, an increased amount of BALF4 in mature virions can expand tropism to epithelial cells [84, 85].
To verify this adhesion hypothesis, proteomics of EBV particles would be helpful, as host proteins have been shown to be components of Herpes simplex virus 1 virions [85, 86, 87].
In particular, the presented interaction network suggests including LMP2, EBNA2 and BALF4 (gp110) in the list of vaccine targets, as the Systems Biology analysis shows them interfacing with host cellular processes involved in both immunological responses and neoplastic disorders.
Methods for selecting candidate B-cell antigens and epitopes for protective immune responses have advanced significantly in the recent past [25, 88]. Particularly with respect to protective epitopes, we stress the inclusion of functional considerations in B-cell epitope prediction, essentially treating pure antigenicity as a secondary selection criterion. We consider functional sites that are well accessible to be preferred targets for stimulating an immune response that interferes with relevant pathogen (or other target) functionality. Such considerations primarily pertain to peptide vaccines, although they may well be extended to entire domains and selected recombinant antigens. The presented approach primarily relies on the classification method previously proposed by Söllner et al., as well as on the inclusion of potential binding sites in disordered regions. This concept will have to be significantly expanded as new predictors for functional sites emerge [90, 91, 92] and annotation resources expand their target coverage [93, 94]. From our point of view, B-cell epitope prediction is far less a matter of one specific epitope prediction routine than of a suite of bioinformatics resources for characterizing a candidate protein. Routines applied in this area are naturally in constant flux due to the generally short software life cycles in bioinformatics. Dedicated predictors for continuous as well as discontinuous epitopes are just one element. In the study presented here, semi-manual analysis of protectivity yielded encouraging results. The comparison of precision against baseline at the amino acid level, and particularly the coverage of known neutralizing epitopes, clearly demonstrates the practical benefit of using such tools in vaccinology. The concept of considering functionally conserved sites is naturally also applicable to T-cell epitopes. In this context, it will also be interesting to see whether furin-dependent maturation of EBV gp110 is therapeutically targetable.
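A residue-level comparison of precision against baseline, together with epitope coverage, can be sketched as follows. The definitions are assumptions for illustration: baseline is taken as the fraction of residues lying in known epitopes (the expected precision of random selection), and an epitope counts as covered when at least a hypothetical `min_overlap` fraction of its residues is predicted.

```python
from typing import List, Set, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive residue range


def residues(intervals: List[Interval]) -> Set[int]:
    return {p for start, end in intervals for p in range(start, end + 1)}


def evaluate(predicted: List[Interval], known: List[Interval], seq_len: int,
             min_overlap: float = 0.5) -> Tuple[float, float, float]:
    """Residue-level precision, the random baseline it should beat, and the
    fraction of known epitopes covered by predictions."""
    pred, true = residues(predicted), residues(known)
    # Precision: share of predicted residues that fall in known epitopes.
    precision = len(pred & true) / len(pred) if pred else 0.0
    # Baseline: expected precision of picking residues at random.
    baseline = len(true) / seq_len
    # Coverage: known epitopes with at least min_overlap of their residues predicted.
    covered = sum(1 for s, e in known
                  if len(residues([(s, e)]) & pred) / (e - s + 1) >= min_overlap)
    coverage = covered / len(known) if known else 0.0
    return precision, baseline, coverage
```

For a 100-residue protein with known epitopes at 10–19 and 50–59 and predictions at 12–21 and 70–79, this yields a precision of 0.4 against a baseline of 0.2, with one of two epitopes covered.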
Several successful methods for T-cell epitope prediction have been published. For this study, we selected a number of human HLA alleles relevant to equatorial Africa and integrated predictions from several of the CBS tools (netMHC, netCTL, netMHCpan, netChop, netMHCIIpan) into a format amenable to visual analysis of antigenicity variability. It is worth mentioning in this context that the distinction between the antigenicity and the immunogenicity of peptides, while highly important, is often underappreciated in immunoinformatics practice. Modeling of immunological pathways will likely improve prediction of antigenicity, whereas immunogenicity and immunodominance are decidedly less well understood and less well represented in models. Nevertheless, we consider predicted antigenicity a reasonable estimate of immunogenicity and immunodominance.
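One way to bring per-peptide predictions into an alignment-style view is a residue-by-allele matrix holding the best predicted affinity of any peptide covering each residue. The sketch assumes each tool's output has already been parsed into a hypothetical normalized `Prediction` record; the 50 nM and 500 nM strong/weak binder thresholds are commonly used conventions, not necessarily the settings used in this study.

```python
from typing import Dict, Iterable, List, NamedTuple


class Prediction(NamedTuple):
    """Hypothetical normalized record for one peptide/allele prediction,
    after each tool's own output has been parsed."""
    tool: str
    allele: str
    start: int       # 1-based position of the first residue
    length: int
    ic50_nm: float   # predicted affinity; lower means stronger binding


def allele_matrix(preds: Iterable[Prediction], seq_len: int,
                  alleles: List[str]) -> List[Dict[str, float]]:
    """One row per residue, one column per allele: the best (lowest) IC50
    of any predicted peptide covering that residue."""
    matrix = [{a: float("inf") for a in alleles} for _ in range(seq_len)]
    for p in preds:
        for pos in range(p.start, p.start + p.length):
            row = matrix[pos - 1]
            if p.allele in row:
                row[p.allele] = min(row[p.allele], p.ic50_nm)
    return matrix


def binder_class(ic50: float, strong: float = 50.0, weak: float = 500.0) -> str:
    # Commonly used affinity thresholds; adjust to the tools actually used.
    if ic50 < strong:
        return "S"
    return "W" if ic50 < weak else "."


def render_tracks(matrix: List[Dict[str, float]], alleles: List[str]) -> Dict[str, str]:
    # One character per residue and allele, suitable for an alignment-style view.
    return {a: "".join(binder_class(row[a]) for row in matrix) for a in alleles}
```

Computing such tracks per sequence of an MSA would expose how antigenicity for each allele varies across isolates.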
While manual analysis of single sequences for a reduced set of alleles has its merits, it is often necessary to screen proteins or entire proteomes for peptides covering multiple alleles. Epitopes contained in such clusters are not necessarily immunodominant during natural infection but may still offer the advantage of broad HLA/MHC coverage. Depending on the length of the input epitopes and of the output peptides, this approach tends to identify either supertype binders for several MHC classes or potentially overlapping epitopes. We term this concept ‘Conservancy Constrained T-cell Epitope Cluster (CCTEC)’ and understand it as a complement to, or extension of, previously published methods for optimizing vaccine coverage. For the implementation presented here, we used fixed-width windows, yielding ranked peptides of equal length. While this approach is straightforward, it may be preferable to allow peptides within a certain length range, possibly applying a penalty function for particularly long peptides. This would lead to the most efficient selections in terms of covering the most alleles with as few amino acids as possible. As an additional advantage, C-terminal processing of epitopes would be less relevant, as at least a fraction of the epitopes would end at the cluster C-terminus, potentially enhancing their availability for MHC loading. It may also be desirable to weight specific alleles and allele classes differently, for example to achieve optimal coverage of one supertype before searching for additional supertype specificities. One aspect to consider is that T-cell epitope clusters are probably best suited for DNA vaccines, as the encoded proteins are processed by the proteasome, for which good predictive models such as netChop are available.
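The window-based ranking behind CCTECs can be sketched as follows, under stated assumptions: conservancy is given per residue as the fraction of isolates sharing the reference residue, a window is admissible only if every residue meets a hypothetical `min_conservancy` threshold, and its score is the (optionally weighted) number of alleles with at least one predicted epitope fully inside it. The `widths` and `length_penalty` parameters illustrate the variable-length extension discussed above; this is not the implementation used in the study.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

Epitope = Tuple[str, int, int]  # (allele, start, end), 1-based inclusive
Cluster = Tuple[float, int, int, List[str]]  # (score, start, end, alleles)


def rank_clusters(epitopes: Iterable[Epitope], conservancy: Sequence[float],
                  widths: Sequence[int] = (15,), min_conservancy: float = 0.9,
                  weights: Optional[Dict[str, float]] = None,
                  length_penalty: float = 0.0) -> List[Cluster]:
    """Rank candidate windows by (weighted) number of alleles with at least one
    predicted epitope fully inside the window. conservancy[i] is the fraction of
    isolates sharing the reference residue at position i + 1."""
    epitopes = list(epitopes)
    weights = weights or {}
    shortest = min(widths)
    n = len(conservancy)
    ranked: List[Cluster] = []
    for width in widths:
        for start in range(1, n - width + 2):
            end = start + width - 1
            # Conservancy constraint: skip windows containing variable residues.
            if min(conservancy[start - 1:end]) < min_conservancy:
                continue
            alleles = {a for a, s, e in epitopes if start <= s and e <= end}
            if not alleles:
                continue
            # Allele weights (default 1.0) minus a penalty for extra length.
            score = sum(weights.get(a, 1.0) for a in alleles)
            score -= length_penalty * (width - shortest)
            ranked.append((score, start, end, sorted(alleles)))
    # Highest score first; ties broken by position.
    ranked.sort(key=lambda c: (-c[0], c[1], c[2]))
    return ranked
```

With a single entry in `widths` and no penalty this reduces to the fixed-width ranking; an epitope spanning a poorly conserved residue can never enter a cluster, which is the intended conservancy constraint.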
For peptide and subunit vaccines, the lysosomal pathway of proteolysis should apply, so predicting the dominantly produced peptides available for cross-presentation may be a hurdle for the efficient application of CCTECs in dendritic-cell-based (T-cell-focused) peptide vaccines. In any case, validation of T-cell epitopes, ideally in suitable animal models, is crucial owing to various factors that are difficult to predict.