The marsupial and eutherian (placental) lineages diverged approximately 180 million years ago. Marsupials are chiefly distinguished from other mammals by their reproductive strategy: young are born in an immature state, with only the most rudimentary neurological and immunological systems [1]. At birth, the neonate manoeuvres its way to a teat, where it remains attached until it is mature enough to function independently. Marsupials possess lymphoid tissue and cellular components that are structurally similar to those of other mammals. Key antigen receptor and recognition molecules, including Major Histocompatibility Complex (MHC) Class I, II and III [2], T cell receptors alpha, beta, gamma and delta [3, 4], Toll-like receptors [5] and immunoglobulins [6], have been characterized.
However, conventional experimental strategies using degenerate primers for reverse-transcriptase polymerase chain reaction (RT-PCR) and heterologous probes for screening genetic libraries have only identified the most phylogenetically conserved immune molecules, with cytokines proving particularly difficult to isolate [7]. To date, only eleven cytokines, including one receptor, have been cloned from marsupials. They are tumour necrosis factor alpha (TNF-α) [8, 9], lymphotoxin (LT)-α and -β [10, 11], interleukin (IL)-1β [12], IL-1R2 [7], IL-5 [13], IL-10 [14], leukemia inhibitory factor (LIF), a member of the IL-6 family [15], and three type I interferon (IFN) genes [16]. These cytokines show relatively high levels of identity with their eutherian homologues. Previous attempts to isolate the more divergent T cell derived cytokines that orchestrate adaptive immunity, such as IL-2, IL-4 and IFN-γ, have failed [7, 17].
Identification of divergent marsupial immune genes is important for two reasons. Firstly, unsuccessful attempts to isolate T cell derived cytokines in the laboratory have led some authors to suggest that the marsupial immune system is 'primitive' and does not possess the level of complexity demonstrated by eutherians such as humans and mice. The fact that some T cell driven responses are also aberrant adds to this argument. Marsupials appear to have delayed skin graft rejection [18] and antibody class switching [19], together with an apparent lack of an in vitro Mixed Lymphocyte Response [20]. Elucidation of the genes involved in specific immunity will help to determine whether the apparently 'simple' immune responses generated by marsupials are genetically hardwired.
The second reason for identifying divergent immune genes in the marsupial genome is to develop marsupial-specific immunological reagents. To date, most assay systems employed to characterise cells and their function rely on eutherian reagents or on culture techniques developed in eutherian species. Where cross-reactivity between marsupials and these model species is low, the usefulness of the data generated from such assays is limited. Identification of key cell markers, such as CD4 and CD8, will allow us to generate marsupial-specific reagents, which can then be used to gain a better understanding of the marsupial immune response.
Difficulties associated with identifying rapidly evolving cytokines are not limited to marsupials. The chicken IL-2 gene took seven years of focused effort to identify [21], and was eventually found using expression strategies rather than heterologous cloning techniques. The recent sequencing of the complete genomes of a large number of non-eutherian vertebrates will expedite the isolation and characterization of these immune genes in distantly related species. However, current automated annotation techniques are not sensitive enough to identify many of these molecules outside the eutherian lineage.
The first marsupial genome was recently sequenced by the Broad Institute. The subject of this project, Monodelphis domestica, is a South American opossum. It is a well-recognised biomedical model in the study of comparative physiology, immunogenetics, cancer development and disease susceptibility. Two publicly available annotations of this genome have been generated. Ensembl have produced a gene build with their automatic pipeline [22], which relies principally on GeneWise [23]. The UCSC genome browser provides several annotation tracks with similarity features and gene models, for example chained TBLASTN alignments of human proteins, BLAT alignments of RefSeq mRNAs, and Genscan [24] and N-SCAN [25] predictions. With the exception of the Genscan predictions, which are ab initio gene predictions based on genomic sequence only, the gene builds rely on cross-species homology, as no large-scale opossum EST projects have yet been completed and there are only 425 known opossum protein sequences in GenBank. In most cases, Ensembl and the UCSC genome browser were unable to identify highly divergent cytokine genes such as IL-2, IL-4 and IL-13.
To overcome this shortcoming in the automated annotation of the opossum genome and to start to address uncertainties about immune function in marsupials, we have adopted a manual, expert-curated approach to annotating highly divergent genes. The critical first stage of this is the careful identification of the genomic region containing the gene. This is performed using a sensitive TBLASTN search. HMMER [26] can also be useful at this stage. Frequently, it is necessary to first narrow the search to the syntenic region by identifying conserved flanking genes.
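The region-finding step described above can be illustrated with a minimal sketch. It assumes that TBLASTN results have been saved in the standard 12-column tab-separated layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, E-value, bit score). The function names, the E-value cut-offs and the query labels are hypothetical choices made for illustration and are not taken from the pipeline used in this study.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str      # name of the protein used as the query
    subject: str    # genomic scaffold or chromosome that was hit
    s_start: int    # lower subject coordinate (strand-independent)
    s_end: int      # upper subject coordinate (strand-independent)
    evalue: float
    bitscore: float


def parse_tabular(text):
    """Parse 12-column tab-separated BLAST output into Hit records."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        s1, s2 = int(f[8]), int(f[9])  # reversed on the minus strand
        hits.append(Hit(f[0], f[1], min(s1, s2), max(s1, s2),
                        float(f[10]), float(f[11])))
    return hits


def syntenic_window(flank_hits, left_gene, right_gene, evalue_cutoff=1e-10):
    """Return (scaffold, start, end) spanned by the best hits of two
    conserved flanking genes, or None if they do not share a scaffold."""
    best = {}
    for h in flank_hits:
        if h.query in (left_gene, right_gene) and h.evalue <= evalue_cutoff:
            if h.query not in best or h.bitscore > best[h.query].bitscore:
                best[h.query] = h
    if left_gene not in best or right_gene not in best:
        return None
    a, b = best[left_gene], best[right_gene]
    if a.subject != b.subject:
        return None
    return a.subject, min(a.s_start, b.s_start), max(a.s_end, b.s_end)


def candidate_hits(target_hits, window, evalue_cutoff=10.0):
    """Keep weak hits of the divergent target that fall inside the window.
    The permissive default cut-off is an assumption: restricting the
    search to the syntenic region is what makes weak hits interpretable."""
    scaffold, start, end = window
    keep = [h for h in target_hits
            if h.subject == scaffold
            and h.s_start >= start and h.s_end <= end
            and h.evalue <= evalue_cutoff]
    return sorted(keep, key=lambda h: h.s_start)
```

In use, the flanking-gene hits would be parsed first and passed to `syntenic_window`, and the hits for the divergent target would then be filtered with `candidate_hits`. The coordinates returned define the genomic sequence to extract for gene prediction.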
Having identified similarity features, gene prediction is performed on genomic sequence extracted from the region. The accuracy of gene prediction depends on the prediction method. As with the automated annotations, we favour gene predictors that incorporate information from orthologous sequences into the prediction process. In addition to GeneWise and N-SCAN, several such methods are now available, including Procrustes [27], HMMgene [28], GenomeScan [29] and Augustus+ [30]. Procrustes and the default GeneWise algorithm perform spliced alignment. Augustus+ uses an interesting approach, which constrains predicted genes to incorporate user-supplied hints. However, it is not particularly convenient for manual use or for use by biologists lacking scripting skills. While not the only possible choice, we have found GenomeScan to be both convenient and reasonably accurate (based on comparison with known eutherian sequences). It is worth noting that there is another class of gene prediction methods that obtain homology information from syntenic regions of other genomes. These include TwinScan [31], which is asymmetric and predicts genes in one genome only, and SLAM [32], which simultaneously aligns two genomes and predicts genes in both. These methods were unlikely to be useful in our study, as we were looking for genes that are highly divergent and difficult to align at the genomic level. Finally, predicted genes were compared with known eutherian sequences and the results were manually curated where required. Our success with this strategy suggests that it will be applicable to the identification of rapidly evolving gene families in other distantly related vertebrate species.
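The final comparison of a predicted protein with a known eutherian sequence can be sketched as a global pairwise alignment followed by a percent identity calculation. The example below is a simplified illustration and not the procedure used in this study: it implements a basic Needleman-Wunsch alignment with a flat match/mismatch score and a linear gap penalty, whereas a real comparison would normally use a substitution matrix and affine gap penalties. The function names and scoring values are assumptions.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Returns the two aligned strings, using '-' for gaps."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical residues as a percentage of gap-free alignment columns."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(1 for x, y in pairs if x == y) / len(pairs)
```

A low identity over the full length, or long gaps that coincide with predicted exon boundaries, would flag a prediction for manual curation. Note that identity here is calculated over gap-free columns only, which is one of several common conventions.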