The human Ig variable region gene locus has undergone extensive evolutionary editing. This can be seen in its division into families, where every family probably started from a single gene that was duplicated and mutated to form sets of similar but not identical sequences. The aim of the current study was to answer the following questions. What evolutionary parameters affect the size and structure of gene libraries? Are the numbers of genes in libraries of contemporary species, and the corresponding gene locus structure, a random result of evolutionary history, or have these properties been optimized with respect to individual or population fitness? To help answer these questions, we created a simulation of the evolution of Ig gene libraries in a population of organisms.
Although it is difficult to directly relate rates and other simulation parameters to actual evolutionary rates, we may examine the general characteristics of real Ig gene loci in order to draw conclusions regarding whether the current Ig variable gene loci have been optimized in any way. Our results suggest that the Ig gene locus structure has been optimized by evolution, based on the following observations. Our simulations show that the population of organisms, after a stochastic start, settles into a steady state with the maximal fitness that can be reached within the limiting conditions of the chosen parameters. This is achieved by evolution towards the optimal combination of genome size and diversity for the given set of parameters. This optimization is not observed when the size of the genome is unlimited and the organisms' genomes can grow indefinitely. As the Ig gene loci of contemporary organisms do have limited sizes, it is reasonable to conclude that there is a maintenance cost that limits the number of Ig gene segments. If there were no limitation on gene locus size, we would expect large numbers of genes to confer an evolutionary advantage, and this advantage would manifest itself as the largest possible number of genes appearing in all species. However, there are large differences in the number of genes between species [11]. For example, in the chicken genome, where the main diversity mechanism is gene conversion, only one functional V gene for each of the light and the heavy chains is found in the genome [25, 26], and the fish Raja erinacea has no somatic diversification and therefore has a larger germline Ig gene diversity, with amino acid sequence differences preferentially distributed in complementarity-determining versus framework regions [27]. Additionally, the rat genome has 353 IgV genes in the heavy chain (IgVH), whereas humans and zebrafish have much lower numbers of IgVH genes (104 and 47, respectively).
As the genes are created by gene duplications, if there were no limiting mechanisms, we would also expect a smaller degree of diversity in the genes. However, the data show that more evolutionarily advanced species do not necessarily have more genes, and the genes have large diversity [11]. Preference for diversity is found in the rabbit, where the polymorphism in the dominant VH is highly conserved through [28, 29]. Sharks also show a selection aimed at increasing the VH diversity [30]. Ig locus sizes and arrangement in different species may reflect the different diversity-generating processes they use, for example translocon organization (locus with multiple V and few or one C genes) versus clusters (multiple loci each containing one to three V and one C genes) [31].
Our simulations show that the Phenotype/Genotype ratio has an optimal range. Below this range the fitness is too low, as the benefits of having a large gene library to create a large enough Ab repertoire are counteracted by the overhead of maintaining a large gene library (Figure 3). Above this range the fitness is not significantly increased.
Human IgH variable region gene loci contain up to ten gene families, with a total number of genes of at most 123, the diversity of which is quite high. Together with the diversity of the IgL and with the additional mechanisms of diversity generation not modeled here, such as gene rearrangement and junctional diversity, the Ab repertoire generated in normal individuals, although it covers only a small fraction of potential receptors, is evidently sufficient for survival. That is, the Phenotype/Genotype ratio is probably in the range that allows reasonable diversity and hence fitness, and also a reasonable gene locus size. Comparing our results to the repertoire coverage in those species for which repertoire sequencing for single individuals has been performed, i.e., fish [23] and humans [24], we see that the fraction of V genes used is within the range that our simulations identify as "safe" for the individual.
The binding threshold in the simulation must be more than the random 0.5 match, to prevent antibodies from being so "sticky" as to be potentially autoreactive while minimizing the loss of matching to possible foreign Ag sequences. On the other hand, a too-large threshold results in a too-small coverage of the Ag sequence space. Values should be on the order of 0.55, as for higher values the coverage becomes too small, and the organisms become extinct. This result is in agreement with recent studies showing that low-avidity interactions between an Ag and the B-cell receptor can induce deletion, receptor editing and T-dependent immune responses, suggesting that high-avidity binding of the Ag is not essential [9]. As actual Igs are not bit strings and the matching to the Ag is through amino acids with a 3-dimensional structure, the optimal matching value and length in real organisms are presumably those that are small enough to cover the maximum Ag space and large enough to implement self-tolerance without compromising Ag recognition, as evidenced by the small size of the antibody-binding epitope relative to the total antigen size.
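The role of the threshold can be illustrated with a minimal sketch of bit-string matching. It assumes that binding is decided by the fraction of positions at which the antibody and antigen strings agree; the function names, the string length of 20 and the exact comparison rule are illustrative assumptions, not the simulation's actual implementation. Under this assumption, the probability that a random antibody binds a random antigen is a binomial tail, which shrinks as the threshold rises above 0.5.

```python
from math import comb


def match_fraction(antibody: str, antigen: str) -> float:
    """Fraction of positions at which two equal-length bit strings agree."""
    if len(antibody) != len(antigen):
        raise ValueError("bit strings must have equal length")
    matches = sum(a == b for a, b in zip(antibody, antigen))
    return matches / len(antibody)


def binds(antibody: str, antigen: str, threshold: float = 0.55) -> bool:
    """Assumed binding rule: the match fraction must reach the threshold."""
    return match_fraction(antibody, antigen) >= threshold


def random_binding_probability(length: int, threshold: float) -> float:
    """Probability that two uniformly random bit strings of the given
    length bind, i.e. a binomial tail with per-position match chance 0.5."""
    favourable = sum(
        comb(length, k)
        for k in range(length + 1)
        if k / length >= threshold  # same comparison as in binds()
    )
    return favourable / 2 ** length


if __name__ == "__main__":
    # Illustrative length only; the simulation's gene length may differ.
    for t in (0.50, 0.55, 0.65, 0.75):
        print(t, round(random_binding_probability(20, t), 4))
```

With these assumptions, raising the threshold lowers the chance that any single gene recognizes a given antigen, which is the trade-off between stickiness and coverage described above.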
Our simulation also shows that the gene length should not be too large, as the possible number of Ag genes increases exponentially with gene length. Keeping antibody genes short, as observed in real species genomes (relative to the full antigen size as explained above), together with the relatively low binding threshold, allows the Ab repertoire to cover a large portion of the Ag space and thus aids in the survival of the organisms. As the number of human Ig variable region genes is by no means the largest observed, we may also conclude that the gene duplication and deletion rates were evolutionarily optimized to a range that would not cause the decrease in fitness shown by our simulation.
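The effect of gene length on coverage can be sketched under simplifying assumptions: antigens are uniformly random bit strings, the genes of a library are independent and random, and a gene binds an antigen when the fraction of matching positions reaches the threshold. These assumptions, the function names and the numerical values are illustrative and are not taken from the simulation itself.

```python
from math import comb


def antigen_space_size(length: int) -> int:
    """Number of distinct bit strings of the given length."""
    return 2 ** length


def single_gene_coverage(length: int, threshold: float) -> float:
    """Fraction of the antigen space bound by one gene (binomial tail)."""
    hits = sum(
        comb(length, k)
        for k in range(length + 1)
        if k / length >= threshold
    )
    return hits / 2 ** length


def expected_library_coverage(n_genes: int, length: int, threshold: float) -> float:
    """Expected fraction of antigens bound by at least one of n_genes
    independent random genes (an idealization; real genes within a family
    are similar, so actual coverage would be lower)."""
    p = single_gene_coverage(length, threshold)
    return 1.0 - (1.0 - p) ** n_genes


if __name__ == "__main__":
    # Hypothetical library of 5 genes at a threshold of 0.55.
    for length in (10, 20, 50, 100, 200):
        print(
            length,
            antigen_space_size(length),
            round(expected_library_coverage(5, length, 0.55), 4),
        )
```

In this sketch the antigen space grows as 2 to the power of the length, while the fraction covered by a fixed-size library at a fixed threshold shrinks for longer strings, consistent with the argument for keeping genes short.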
When gene deletion or duplication changes the genome size, the evolutionary process can restore the gene library to its optimal size, and so maintain the fitness values. If the rate of gene deletion or duplication is too high, our simulation shows that evolution cannot restore the library size to the optimal one in a way that will maintain the organism population's fitness. From this we conclude that the range of rates of Ig gene deletion and insertion was also optimized during evolution, and organisms with too-high rates became extinct because of their reduced fitness.
While V(D)J recombination is an important contributor to V region diversity, it is outside the scope of the model presented here, which deals only with the evolution of V segments, for the following reasons. (a) Evolutionary changes in the Ig gene locus, such as duplications and mutations, operate on segments and not on the recombined genes. (b) Most of the length of the variable region gene is due to the length of the V segment, hence the insights gained for the length of the V segment should be applicable to the whole gene. (c) Most of the inheritable variability in Ig genes is contained within the V segments; while much variability is added by recombination, it is not inheritable. Additionally, most of the binding to the Ag is done by the longer V genes and not the D and J genes, so we focused our study on them, but the same principles that govern V gene evolution should also apply to the evolution of the D and J genes. Evolution has reduced the cost of maintaining large segment libraries by generating diversity through recombination. However, the V segment libraries are still large, and their structures (i.e., the composition of the families and the order in which the family members appear in the genome) are extremely diverse in the different vertebrates [10, 11]. Their evolutionary optimization is therefore still an interesting question.
The structure of the families of IgV genes is extremely diverse in the different vertebrates; both the composition of the families and the order in which the family members appear in the genome vary considerably. As our simulation does not include somatic diversity, our results should be more relevant to species that use genomic diversity as the main diversity generator. Indeed, in the simulation results (Figure 4), in most cases the size of families is close to 1, similar to the repertoires found in fish species that rely mostly on genomic diversity and have a small similarity between genes [11].