DNA Viruses Do Not Include Which of the Following Families of Viruses?

J Virol. 2001 Dec; 75(23): 11720–11734.

Common Origin of Four Diverse Families of Large Eukaryotic DNA Viruses

Received 2001 May 29; Accepted 2001 Aug 7.


Comparative analysis of the protein sequences encoded in the genomes of three families of large DNA viruses that replicate, completely or partly, in the cytoplasm of eukaryotic cells (poxviruses, asfarviruses, and iridoviruses) and phycodnaviruses that replicate in the nucleus reveals nine genes that are shared by all of these viruses and 22 more genes that are present in at least three of the four compared viral families. Although orthologous proteins from different viral families typically show weak sequence similarity, because of which some of them have not been identified previously, at least five of the conserved genes appear to be synapomorphies (shared derived characters) that unite these four viral families, to the exclusion of all other known viruses and cellular life forms. Cladistic analysis with the genes shared by at least two viral families as evolutionary characters supports the monophyly of poxviruses, asfarviruses, iridoviruses, and phycodnaviruses. The results of genome comparison allow a tentative reconstruction of the ancestral viral genome and suggest that the common ancestor of all of these viral families was a nucleocytoplasmic virus with an icosahedral capsid, which encoded complex systems for DNA replication and transcription, a redox protein involved in disulfide bond formation in virion membrane proteins, and probably inhibitors of apoptosis. The conservation of the disulfide-oxidoreductase, a major capsid protein, and two virion membrane proteins indicates that the odd-shaped virions of poxviruses have evolved from the more common icosahedral virion seen in asfarviruses, iridoviruses, and phycodnaviruses.

The concept of a virus is biological, not evolutionary. Viruses are intracellular parasites that depend on the host cell for their protein synthesis, some of the reactions of nucleic acid precursor biosynthesis and, to a variable extent, transcription and replication (15). Clearly, viruses are not a monophyletic group. There is little doubt, for example, that small viruses with single-stranded RNA genomes of only 5 to 10 kb, such as poliovirus or tobacco mosaic virus, on the one hand, and large viruses with double-stranded DNA (dsDNA) genomes of 100 to 500 kb, such as herpesviruses, poxviruses, or iridoviruses, on the other hand, have evolved independently. However, comparative analyses of the genomes of many groups of viruses have suggested common origins for large, heterogeneous assemblages. For example, it appears most likely that all reverse-transcribing viruses and mobile elements, in spite of the extreme diversity of their life cycles and the sets of encoded proteins, have evolved from a common ancestor (17, 56, 70). Even more unexpected evolutionary connections are suggested by the involvement of homologous enzymes, such as superfamily III helicases, in genome replication of both RNA and DNA viruses with small genomes (23), and the central role of the conserved rolling circle replication initiator protein in single-stranded DNA (ssDNA) viruses of eukaryotes and bacteria and in bacterial plasmids (26).

Viruses with large dsDNA genomes are generally thought to have evolved by capturing multiple genes from the genomes of cellular organisms, their hosts. Indeed, many genes of these viruses, particularly those involved in virus-host interactions, show high levels of protein sequence similarity to their cellular homologs, which is apparently indicative of relatively recent acquisition by the viral genomes (12, 51, 59). However, viruses belonging to a particular large family, such as the herpesvirus family or the poxvirus family, share between themselves a core set of genes encoding proteins involved in DNA replication, transcription, and virion biogenesis, most of which are only moderately similar to cellular homologs, if such are detectable at all (3, 51). The existence of core sets of up to 40 to 50 conserved viral genes (8, 22) establishes beyond reasonable doubt that the extant members of the families Herpesviridae and Poxviridae have diverged from the respective ancestral viruses that already possessed the principal features of genome replication and expression and of virion structure that are typical of these viral families. In contrast, it remains unclear whether there are any evolutionary connections between different viral families. Poxviruses, African swine fever virus (ASFV, the archetypal member of the family Asfarviridae), and iridoviruses are the three families of eukaryotic viruses with large dsDNA genomes that either undergo their replication cycle entirely in the cytoplasm (poxviruses) or begin their replication in the nucleus and complete it in the cytoplasm (20, 22, 38, 40, 63, 67), as opposed to herpesviruses and baculoviruses, whose DNA replication and transcription occur exclusively in the nucleus (30, 65).
Poxviruses, asfarviruses, and iridoviruses encode their own transcription machinery, which includes, in each case, several RNA polymerase subunits and additional transcription factors, and share several other conserved genes (58, 72). Large DNA viruses isolated from very diverse algae, the Paramecium bursaria chlorella virus (PBCV) and the related Ectocarpus siliculosus virus (ESV), members of the Phycodnaviridae family, also share several genes with nucleocytoplasmic large DNA viruses, although genomes of these viruses are transcribed in the nucleus and, accordingly, they lack genes for RNA polymerase subunits (41, 61). The four families of large eukaryotic DNA viruses, Poxviridae, Asfarviridae, Iridoviridae, and Phycodnaviridae, to which we collectively refer here as nucleocytoplasmic large DNA viruses (NCLDV), have both common and unique features of genomic DNA and virion structure. Poxviruses, ASFV, and PBCV have linear DNA genomes with terminal inverted repeats that form covalently closed hairpins (40, 67, 75), iridoviruses have circularly permuted linear genomes (60), and ESV appears to have a circular genome (41). The virions of ASFV, iridoviruses, and PBCV consist of a DNA-protein core that is surrounded by a lipid bilayer, which in turn is encased in one or more icosahedral capsid shells (58, 63, 66). Poxviruses have a more complex, unique virion structure, with a core surrounded by a "brick-shaped" proteolipid shell (40).

It remains uncertain whether the similarities between the gene repertoires, genome structures, and virion architectures of different families of NCLDV are due to independent recruitment of the same or related host genes, driven by the common functional requirements for the viral replication cycles, or to origin from a common viral ancestor. This crucial dilemma is not readily amenable to conventional phylogenetic analysis because even homologous proteins of viruses from different families show moderate or weak sequence conservation and may be less similar to each other than to the corresponding cellular homologs (51). At face value, these observations appear to favor the polyphyletic origin of different viral families. However, this aspect of the relationships between viruses needs to be interpreted with caution given the realistic possibility of rapid evolution of viral genes (44). Moreover, such rapid divergence potentially might even preclude the very detection of evolutionary relationships between some viral genes. Given these considerations, we were interested in delineating the complete set of conserved genes among NCLDV by applying the most advanced available methods for sequence similarity detection and in assessing the hypothesis of independent recruitment of similar sets of genes from the host as opposed to an origin of several viral families from a single ancestral virus. We expand the list of conserved genes shared by all or a majority of NCLDV families and show that origin from a common viral ancestor is the most parsimonious scenario for the evolution of all of these viruses.


Viral genome and protein sequences.

Nucleotide sequences of the complete genomes of large DNA viruses and the respective predicted protein sequences were extracted from the Genomes division of the Entrez system (National Center for Biotechnology Information, National Institutes of Health, Bethesda, Md. [http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=Genome]). The complete genomes included in this analysis were from the following viruses: poxviruses, including vaccinia virus, strain Copenhagen (VV [21]), variola virus, strain India (VAR [37]), Molluscum contagiosum virus type 1 (MCV [50]), Shope fibroma virus (SFV [66]), Fowlpox virus (FPV [2]), Melanoplus sanguinipes entomopoxvirus (MSV [1]), Amsacta moorei entomopoxvirus (AMV [8]); asfarviruses, including ASFV (72); iridoviruses, including fish lymphocystis disease virus (FLDV [58]), Chilo iridescent virus (CIV [27]); and phycodnaviruses, including PBCV (type 1 [35]) and ESV (type 1; N. Delaroque, G. Bothe, T. Pohl, R. Knippers, D. G. Mueller, and W. Boland [GenBank NC002687]).

Sequence analysis.

Protein sequences were compared to protein sequence databases by using the BLASTP program and to nucleotide sequence databases translated in six frames by using the TBLASTN program (5). Additional searches for detecting subtle similarities were performed by using the PSI-BLAST program with varied cutoffs for including sequences into profiles (4, 5). Multiple alignments of protein sequences were constructed by using the ClustalW (57) and T_Coffee programs (43), with subsequent manual refinement on the basis of the PSI-BLAST search results. Protein secondary structure was predicted by using the PHD program, with a multiple alignment submitted as the query (47). Protein sequence-structure threading was performed by using the hybrid fold recognition method (16).

Identification of clusters of orthologous viral proteins.

In order to identify sets of orthologous viral proteins, single-linkage clustering based on BLASTP search results was performed by using the BLASTCLUST program and an empirically determined alignment score cutoff of 0.2 bits/position (I. Dondoshansky, Y. I. Wolf, and E. V. Koonin, unpublished data; ftp://ftp.ncbi.nlm.nih.gov/blast). For resulting clusters that included representatives of two or more viral families, additional PSI-BLAST searches were performed against the NR database, with all sequences from the original cluster used as queries. Position-specific weight matrices obtained through these searches were saved and used for a second round of searching the NCLDV protein sequences. This was done to detect potential members of the given protein cluster encoded in the genomes from other virus families that could have been missed at the first stage due to low sequence conservation.
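
The single-linkage step described above can be illustrated with a minimal sketch. It is not the BLASTCLUST implementation: the function name, the input tuple layout, and the protein labels and scores in the example are hypothetical, and the score density is assumed here to be the bit score divided by the aligned length. Two proteins end up in the same cluster whenever a chain of pairwise hits, each at or above the cutoff, connects them.

```python
from collections import defaultdict


def single_linkage_clusters(hits, cutoff=0.2):
    """Cluster proteins by single linkage over pairwise hits.

    hits: iterable of (query, subject, bit_score, aligned_length).
    A pair is linked when bit_score / aligned_length >= cutoff
    (assumed definition of "bits/position" for this sketch).
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, bits, length in hits:
        root_q, root_s = find(query), find(subject)
        if length > 0 and bits / length >= cutoff:
            parent[root_q] = root_s

    clusters = defaultdict(set)
    for protein in list(parent):
        clusters[find(protein)].add(protein)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Hypothetical scores, for illustration only.
example_hits = [
    ("VV_D5R", "ASFV_C962R", 60.0, 250),      # 0.24 bits/position: linked
    ("ASFV_C962R", "PBCV_A456L", 75.0, 300),  # 0.25 bits/position: linked
    ("VV_D5R", "VV_A32L", 20.0, 200),         # 0.10 bits/position: not linked
]
print(single_linkage_clusters(example_hits))
```

Note that VV_D5R and PBCV_A456L are grouped even though no direct hit between them is listed; this transitivity is what makes single linkage useful for weakly conserved viral proteins, and also why the resulting clusters need the follow-up profile searches described in the text.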

Cladistic analysis.

Cladistic analysis was performed by using the PAUP* version 4.0 package (55). A maximum of four states, namely, the primitive state (0) and up to three derived states (1, 2, and 3), were considered. The relationship between the derived states was assumed to be unordered, that is, a primitive character could make the transition to any of the derived states if more than one derived state existed for the given character. Gain of a novel protein, domain, or sequence motif was scored as a derived character with respect to its complete absence, which was defined as the primitive state. The size ranges and domain architectures of proteins were also used as characters scored in the matrix. The shortest trees were determined by using the Branch and Bound and the Exhaustive Search algorithms. The consensus of the shortest trees was obtained by using the Consensus Tree routine of PAUP. The character state transitions for each node of the shortest trees were derived by using the Show Apomorphy routine of PAUP, and this was used to determine the synapomorphies supporting a given clade.
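
To make the notion of tree length under unordered character states concrete, the following is a minimal sketch of Fitch parsimony scoring on a fixed binary tree. It is a simplified stand-in, not the PAUP* procedure: it only counts state changes on a given topology and does not search tree space, and the taxon labels and character matrix are hypothetical.

```python
def fitch_length(tree, states):
    """Minimum number of state changes for one unordered character.

    tree: nested 2-tuples of leaf names (a binary tree).
    states: dict mapping leaf name -> observed state.
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:
            return common
        changes += 1  # no shared state: one change is required here
        return left | right

    visit(tree)
    return changes


def tree_length(tree, matrix):
    """Sum of Fitch lengths over all characters in the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_length(tree, {leaf: row[i] for leaf, row in matrix.items()})
        for i in range(n_chars)
    )


# Hypothetical matrix: 0 is the primitive state, 1 and 2 are derived states.
matrix = {
    "Outgroup": [0, 0, 0],
    "Pox":      [1, 1, 0],
    "Asfar":    [1, 1, 0],
    "Irido":    [1, 0, 1],
    "Phyco":    [1, 0, 2],
}
tree_a = ("Outgroup", (("Pox", "Asfar"), ("Irido", "Phyco")))
tree_b = ("Outgroup", (("Pox", "Irido"), ("Asfar", "Phyco")))
print(tree_length(tree_a, matrix), tree_length(tree_b, matrix))
```

In this toy example the first topology needs fewer changes than the second, so it would be preferred; an exhaustive or branch-and-bound search applies the same comparison across all candidate topologies.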


Clusters of orthologous viral proteins.

Viral proteins tend to evolve faster than their cellular counterparts, which makes it difficult to detect homologous relationships for some of them. Therefore, the detection of orthologous sets of viral proteins is not a trivial task and, in some cases, requires application of the most advanced sequence analysis methods. Furthermore, for detecting clusters of viral orthologs, it was important to compare viral proteins among themselves only, to limit the search space and thus increase the sensitivity. Once the clusters were identified, their relationships with non-NCLDV proteins were investigated by additional sequence comparisons; the results of these comparisons were then used for refinement of the NCLDV clusters.
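
Why a smaller search space increases sensitivity can be seen from the standard relation between a normalized bit score S' and the expectation value, E = m · n · 2^(-S'), where m is the query length and n is the total database length. The sketch below applies this relation directly; it ignores the effective-length corrections that BLAST applies in practice, and the query length, score, and database sizes are hypothetical.

```python
def expect_value(bit_score, query_len, db_len):
    """E-value from a normalized bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Same alignment (hypothetical 40-bit hit, 300-residue query),
# scored against a large database and against a viral-only set.
e_large = expect_value(40, 300, 10**8)  # assumed size of a general database
e_small = expect_value(40, 300, 10**6)  # assumed size of a viral protein set
print(f"{e_large:.2e} vs {e_small:.2e}")
```

With these assumed sizes the same hit is 100 times more significant in the restricted search, which can move a weak viral similarity from above to below a typical reporting threshold.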

The present study resulted in the identification of 9 clusters of apparent orthologs that are shared by all NCLDV, 8 clusters that are represented in all families (although missing in one or more species), and 14 clusters that are conserved in all but one family (Table 1). To our knowledge, the conservation of five of these proteins in all viral families has not been described previously. These include the predicted helicase D5R (hereinafter we use the systematic nomenclature of proteins from VV Copenhagen, whenever possible), the packaging ATPase A32L, the transcription factor A1L, the capsid protein D13L, and the myristoylated virion membrane protein L1R/F9L (Table 1). The critical aspect of these clusters of conserved viral proteins is that, although they did not necessarily show a high level of sequence conservation, each of them had distinct features that appeared to be synapomorphies (shared derived characters) of the NCLDV class. Despite systematic searches, we were unable to identify direct counterparts (orthologs) of any of these proteins outside this class of viruses, with the possible exception of D5R orthologs from some bacteriophages. Furthermore, for the two virion proteins, no non-NCLDV homologs at all were detected. We briefly describe each of these signature NCLDV protein families below, with an emphasis on the features that support their status as synapomorphies.


Table 1. Distribution of conserved genes in large, cytoplasmic DNA viruses and Phycodnaviridae

Gene group and protein family^a Distribution of conserved genes in^b:
Chordopoxvirus Entomopoxvirus ASFV Iridovirus PBCV ESV Other viruses and plasmids
 VV D5 ATPase D5R AMV087, MSV089 C962R LDV1-ORF6, CIV-184R A456L ORF109 A member of the superfamily III helicases within the AAA+ superclass ATPases; involved in poxvirus DNA replication, most likely as the main helicase
 DNA polymerase (B family) E9L AMV050, MSV036 G1211R LDV1-ORF5, CIV-037L A185R ORF93 BV, HV, T4, KP Members of the B family of DNA polymerases that also includes archaeoeukaryotic replicative DNA polymerases and polymerases of herpes-, adeno- and baculoviruses and many bacteriophages
 VV A32 ATPase A32L MSV171, AMV150 B354L LDV1-ORF46, CIV-075L A392R ORF26 Distinct family of ATPases required for virion packaging
 VV A18 helicase A18R AMV059, MSV148 QP509L CIV-161L A153R ORF66 T4 Superfamily II helicase required for transcription termination; absent in FLDV
 Capsid protein D13L AMV122, MSV069 B646L LDV1-MCP, CIV-274L A622L ORF116 PBCV encodes six members of this family
 Thiol-oxidoreductase E10R AMV114, MSV093 B119L LDV1-ORF79, CIV-347L A465R ORF161 Required for the formation of cytoplasmic disulfide bonds in poxvirus proteins; homologous to the cellular ERV1/2 family but differs from them in having only two conserved cysteines
 VV D6R/D11L-like helicase D11, D6 AMV192, MSV053, AMV174, MSV113 D1133L, Q706L LDV1-ORF4, CIV-022L A363R ORF23 KP Superfamily II helicase required for transcription in poxviruses
 S/T protein kinase F10L MSV154, AMV153 R298L LDV1-ORF17, CIV-380R A617R ORF156 BV, HV Distinct S/T kinases that show no obvious eukaryotic orthologs
 Transcription factor VLTF2 A1L AMV047, MSV187 B175L LDV-ORF102, CIV-350L A482R ORF96 Small proteins containing an FCS-type Zn-finger; entomopoxviruses have a duplication of the FCS domain
 TFIIS-like Zn-ribbon-containing transcription factor E4L AMV120, MSV082 I243L LDV1-ORF105, CIV-349L A125L The viral TFIIS lacks the α-helical TFIIN domain typical of eukaryotic TFIIS
 Nudix (MutT-like) NTP pyrophosphohydrolase D9R/D10R AMV058, MSV150 D250R LDV1-ORF78, CIV-414L A326L These nucleotidases are typically involved in repair of oxidative damage to DNA; their functions in NCLDV remain unclear, but they might regulate expression via mRNA cap hydrolysis (53)
 Myristoylated virion protein A L1R, F9L AMV217, AMV243, MSV094, MSV183 E248R LDV1-ORF20, CIV-118L, CIV-458R A565R Components of the external lipid membrane; the PBCV form is extremely divergent and lacks the cysteines that are conserved in other members of this family
 PCNA G8R E301R LDV1-ORF45, CIV-436R A193L, A574L ORF132 BV, HV, T4 DNA sliding clamp, essential for DNA replication; viral forms are extremely divergent from the cellular forms; G8R is a late transcription factor in poxviruses; PBCV A193L is most closely related to the single PCNA ortholog in ESV and these in turn group with other viral PCNAs; PBCV A574L groups weakly but specifically with the divergent poxviral PCNAs
 Ribonucleotide reductase, large subunit I4L F778R LDV1-ORF12, CIV-085L A629R ORF180 BV, HV, T4 Absent in FPV and MCV
 Ribonucleotide reductase, small subunit F4L F334L LDV1-ORF26, CIV-376L A476R ORF128 BV, HV, T4 Absent in FPV and MCV
 Thymidylate kinase A48R A240L LDV1-ORF60, CIV-143R, CIV-251L A416R BV, HV Absent in MCV
 dUTPase F2L AMV002, AMV107 E165R CIV-438L A551L BV, HV Absent in MCV, FLDV, and MSV
 Uncharacterized protein B385R LDV1-ORF43, CIV-282R A494R ORF101
 RuvC-like HJR A22R AMV162, MSV106 CIV-170L ORF108 Phage bIL170 Distantly related to fungal mitochondrial RuvC, a possible degenerate version present in PBCV; absent in FLDV
 BV BroA-like N-terminal domain MSV194, AMV057 CIV-201R ORF117 BV phage N15 A DNA-binding domain (BRO) widely distributed in phages and expanded in baculoviruses, entomopoxviruses, and CIV (73)
 Capping enzyme (guanylyltransferase) D1R AMV135, MSV067 NP868R A103R KP ASFV and poxvirus capping enzymes contain RNA triphosphatase, guanylyltransferase, and methyltransferase domains, and the capping enzyme from KP has the same domain architecture; PBCV encodes distinct proteins with RNA triphosphatase and methyltransferase activities
 ATP-dependent DNA ligase A50R NP419L A544R T4, BV Lacks the BRCT domains seen in eukaryotes; absent in MCV
 RNA polymerase, largest subunit J6R AMV221, MSV043 NP1450L LDV1-ORF1, CIV-176R KP CIV-343L is only the C-terminal region of this polymerase
 RNA polymerase, subunit 2 A24R AMV066, MSV155 EP1242L LDV1-ORF3, CIV-428L KP
 Thioredoxin/glutaredoxin G4L AMV079, MSV087 CIV-196R, CIV-453L A427L ORF128 T4
 Dual-specificity serine/tyrosine phosphatase H1L AMV078, AMV246 CIV-123R, CIV-197R A305L BV Dual-specificity phosphatases involved in early transcription in poxviruses; absent in FLDV and MSV
 BIR domains AMV021, MSV242 A224L CIV-193R BV Inhibitor of apoptosis in BV and ASFV; the entomopoxviruses, CIV, and BV have a RING finger fused to the C terminus of the BIR domain; the AmEPV and the BV proteins have a duplication of the BIR domain; absent in FLDV
 Virion-associated membrane proteins J5L, A16L, G9R AMV232, MSV142, AMV035, MSV121, AMV118, MSV090 E199L LDV1-ORF29, CIV-337L
 Topoisomerase II P1192R CIV-045L A583L Probably involved in the resolution of replication intermediates
 SWI/SNF2 family helicase MSV224 CIV-172L A548L BV Superfamily II helicase; MSV244 protein is fused to an ariadne-like Parkin domain
 RNA polymerase, subunit 10 G5.5R CP80R CIV-107L Accessory transcription factor of the helix-turn-helix fold; absent in FLDV
 Phage P1-like KilA N-terminal domain N1R (SFV) AMV100 CIV-313L Phage P1 DNA-binding protein, widely distributed in phages and expanded in AMV and FPV; absent in MSV, MCV, and FLDV; the chordopoxviral proteins are fused to a RING finger; the N1R protein of SFV has been shown to bind DNA and inhibit apoptosis (10)
 VV I8-like helicase I8R AMV081, MSV086 B962L Superfamily II helicase required for early transcription in poxviruses
 RNA polymerase, subunit 5 D205R CIV-455L
 Lambda-type exonuclease D345L A166R ORF64 BV, HV An exonuclease of the restriction endonuclease fold that, in phage lambda, is involved in recombination
 RNase III LDV1-ORF44, CIV-142R A464R Fused to a Staufen-like dsRNA-binding domain
 3β-Hydroxysteroid dehydrogenase, steroid isomerase A44L LDV1-ORF31
 Thymidine kinase J2R AMV016 K196R T4 Absent in MSV
 Ankyrin repeats B17R A238L A672R ORF157 Multiple paralogs in FPV, PBCV, and ESV; ESV ORF142 is fused to a RING finger; absent in MCV
 Smt4/adenovirus-like protease I7L AMV181, MSV189 S273R Adenovirus Thiol protease related to eukaryotic SUMO-deconjugating enzyme (Smt4) and adenovirus protease, which is involved in virion maturation (64)
 Cu-Zn superoxide dismutase A45R AMV255 A245R BV Absent in MSV
 RecB-like nuclease AMV240 A467L A protein with the restriction endonuclease fold, homologous to archaeal proteins containing a stand-alone RecB nuclease domain (7); absent in MSV
 C-type lectin A34R EP153R HV Essential for infectivity of the extracellular enveloped form of chordopoxviruses; multiple paralogs in FPV
 Uncharacterized protein AMV193 DP71L HV Uncharacterized proteins that share a domain with GADD34/MyD116; missing in MSV
 UvrC-like nuclease (URI domain) CIV-146R A134L T4 Related to intron-encoded nucleases (7); CIV-146R is additionally fused to a domain present in CIV-118L (see below); multiple paralogs in PBCV; absent in FLDV
 Uncharacterized protein CIV-136R A521L HV Predicted metal-dependent hydrolase (unpublished results)
 Cathepsin B LDV1-ORF24, CIV-224L, CIV-361L ORF75 BV Cysteine protease
 Thymidylate synthase MSV238 CIV-225R T4, HV Absent in AMV
 Bcl2/Bax FPV039 A179L LDV1-ORF81 HV Apoptosis inhibitor; absent in variola and MCV
 Lipase AMV133, MSV048 ORF185
 Lysophospholipase K5L A271L Absent in variola, MCV, and FPV
 Matrix metalloprotease AMV070, MSV175, MSV176, MSV179 CIV-165R BV Absent in FLDV
 Uncharacterized protein LDV1-ORF70, CIV-067R A324L ORF103
 Ariadne-like Parkin-domain-containing protein MSV224 LDV1-ORF36 A regulatory domain with a potential role in ubiquitin-mediated signaling; MSV224 is fused to a SWI/SNF2-like superfamily II helicase
 NAD-dependent DNA ligase AMV199, MSV162 CIV-205R A distinct DNA ligase family that is distantly related to ATP-dependent DNA ligases and is ubiquitous in bacteria but uncharacteristic of eukaryotes
 Very short patch repair endonuclease MSV229, MSV196, AMV257 CIV-069L A nuclease of the restriction enzyme fold (6); CIV-069L and four of its orthologs in MSV are fused to the baculovirus-like BRO DNA-binding domain
 MACRO domain AMV247, MSV139 CIV-031R, CIV-032R T4 A phosphoesterase domain present in chromatin and splicing associated complexes
 Methyltransferase AMV004 CIV-235L A distinct class of non-purine methyltransferase; absent in MSV
 Uncharacterized protein MSV198, AMV194 CIV-118L Expanded in CIV and entomopoxviruses; several entomopoxvirus genes are fused to a BRO-like DNA-binding domain; CIV-146R is fused to a URI domain nuclease
 Predicted esterase CIV-463L A173L α/β Hydrolase fold protein
 Uncharacterized domain CIV-378R, CIV-232R, CIV-380R, LDV1-ORF14, LDV1-ORF16, LDV1-ORF25 A676R The FLDV proteins and CIV 232R and 280R are fused to an S/T protein kinase domain; the domain in PBCV-A676R is fused to a PBCV-specific domain that is also present in several PBCV S/T kinases

D5 NTPase and helicase.

VV D5R protein is an NTPase that is essential for viral DNA replication (14). The D5R protein and its orthologs in other NCLDV are peripheral members of the AAA+ class of NTPases (42), as demonstrated by the detection of these sequences in iterative database searches started with many AAA+ NTPase sequences. Within the AAA+ class, the D5R family belongs to the so-called helicase superfamily III (SFIII), which consists entirely of viral and plasmid proteins (Fig. 1A). Originally, SFIII was identified as an assemblage of (predicted) helicases encoded by small RNA and DNA viruses (23, 31). We found that, in PSI-BLAST searches seeded with the sequence of the predicted ATPase domains of poxvirus D5R proteins, statistically significant similarity to E1 proteins of papillomaviruses (bona fide members of SFIII) was detected in the fifth iteration. The closest homologs of the predicted NCLDV helicases are encoded by certain bacteriophages, in some cases integrated into bacterial chromosomes (Fig. 1A). The predicted helicases of NCLDV and this subset of bacteriophage helicases share a distinct, conserved region upstream of the ATPase domain that is not found in any other proteins (Fig. 1A). The NCLDV group also has several unique motifs within the predicted ATPase domain (Fig. 1A).


Fig. 1. Multiple alignments of conserved proteins that define the cytoplasmic DNA virus clade. (A) D5R-like helicases. With the PBCV ATPase as the seed, the ESV ortholog and many phage primases were recovered with highly significant expectation (E) values in the first iteration. Proteins from the other NCLDV and the distantly related papillomavirus, parvovirus, and positive-strand RNA viruses were recovered in the second and third iterations with E-values of <10^-3. For example, ASFV C962R was recovered with an E-value of 10^-8 in the third iteration. Further transitive searches identified all of the members of the superfamily III helicases. (B) A32L-like ATPases. With the PBCV ATPase as the seed, iridoviral orthologs were recovered in the first iteration with an E-value of <10^-5. Orthologs from all other NCLDV were recovered by the third iteration with significant E-values such as 3 × 10^-19 for MCV and 2 × 10^-4 for ASFV orthologs. (C) A1L-like transcription factors. A profile made with previously detected FCS domains from the polyhomeotic and FIM families of proteins, when run against the NCLDV protein sets with an inclusion cutoff of 0.01, recovered all members of this family; VV A1L, for example, was recovered with an E-value of 10^-4. (D) D13L-like capsid proteins. With p50 of the Spodoptera exigua ascovirus as the seed, the PBCV and other iridoviral capsid proteins were recovered with E-values of <2 × 10^-8. The ASFV ortholog was detected in the third iteration with an E-value of 3 × 10^-3, and the poxviral D13L-like proteins were recovered at borderline E-values (0.14) in the fourth iteration. When a profile made from the alignment of the PBCV, iridovirus, and ASFV sequences was run against a database of all NCLDV proteins, the poxviral orthologs were detected as top hits, with E-values of <10^-5. The probability of the conserved motifs shown here occurring in these proteins by chance was <10^-15, as computed by using the MACAW program (49). (E) L1R/F9L-like virion membrane proteins. With CIV 048L as the seed, the ASFV and PBCV orthologs were recovered in the second iteration, with E-values of 8 × 10^-4 and 10^-3, respectively. The entomopoxviral orthologs were detected in the third iteration with an E-value of 2 × 10^-4. A transitive search with the entomopoxviral proteins recovered the other poxviral proteins with E-values of <10^-3. Each protein is denoted by the corresponding gene name followed by the species abbreviation and the GenBank Identifier (GI) number. The numbers preceding and following the alignments indicate the positions of the first and last residues of the aligned regions in the corresponding protein sequences. The numbers between aligned blocks indicate the number of inserted residues that were omitted from the figure. The coloring reflects the conservation of amino acid residues at 85% consensus. The coloring scheme and the consensus abbreviations are as follows: hydrophobic residues (LIYFMWACV) are designated "h" in the consensus line, aliphatic (LIAV) residues are also shaded yellow and designated "l," alcohol (S,T) residues are blue and designated "o," charged (KERDH) residues are purple and designated "c," polar (STEDRKHNQ) residues are purple and designated "p," small (SACGDNPVT) residues are green and designated "s," and large (LIFMWYERKQ) residues are shaded gray and designated "b." Conserved cysteines predicted to form a Zn-finger structure (C) or a disulfide bond (E) are indicated by white letters against a red background. Secondary structure elements predicted by using the PHD program are indicated in panels C and D, where "E" indicates extended conformation (β-strand) and "H" indicates the α-helix. The abbreviations for the NCLDV are defined in Materials and Methods. Additional abbreviations: AAV, adeno-associated virus 5; AcNPV, Autographa californica nucleopolyhedrovirus; Bf, Bacteroides fragilis; Ce, Caenorhabditis elegans; Cglu, Corynebacterium glutamicum; Cpf, Clostridium perfringens; Dm, Drosophila melanogaster; DpAV4, Diadromus pulchellus ascovirus; Ec, Escherichia coli; HPV08, human papillomavirus type 8; Hs, Homo sapiens; LcbA2, Lactobacillus casei bacteriophage A2; Mace, Methanosarcina acetivorans; MStV, maize streak virus; phi-105, Bacteriophage phi-105; phiC31, Bacteriophage phiC31; Polio, human poliovirus 1; SacV, Spodoptera exigua ascovirus; Si, Sulfolobus islandicus; SV40, Simian virus 40; Xf, Xylella fastidiosa.
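
The 85% consensus line described in the legend can be approximated with a short sketch. The residue classes and their one-letter symbols are taken from the legend; everything else is an assumption made for illustration: the function names, the rule that an identical residue is reported before any class, the order in which classes are tried (smallest class first, ties in the order listed), and the toy alignment.

```python
# Residue classes from the figure legend, ordered from smallest to largest.
CLASSES = [
    ("o", "ST"),          # alcohol
    ("l", "LIAV"),        # aliphatic
    ("c", "KERDH"),       # charged
    ("s", "SACGDNPVT"),   # small
    ("p", "STEDRKHNQ"),   # polar
    ("h", "LIYFMWACV"),   # hydrophobic
    ("b", "LIFMWYERKQ"),  # large (big)
]


def column_consensus(column, threshold=0.85):
    """Consensus symbol for one alignment column ('.' if none applies)."""
    residues = list(column.upper())
    n = len(residues)
    # A single residue conserved at the threshold is reported as itself.
    for aa in set(residues) - {"-"}:
        if residues.count(aa) / n >= threshold:
            return aa
    # Otherwise report the first (most specific) class that reaches it.
    for symbol, members in CLASSES:
        if sum(r in members for r in residues) / n >= threshold:
            return symbol
    return "."


def consensus(alignment, threshold=0.85):
    """Consensus line for a list of equal-length aligned sequences."""
    return "".join(
        column_consensus("".join(col), threshold) for col in zip(*alignment)
    )


toy_alignment = ["CSLK", "CTIR", "CSVE", "CTAD"]  # hypothetical sequences
print(consensus(toy_alignment))
```

Gaps count against conservation in this sketch because they belong to no class; a different gap policy would change which columns reach the threshold.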

Packaging ATPase A32L.

The A32L gene product has been predicted to possess ATPase activity, primarily on the basis of the conservation of the P-loop and Mg2+-binding motifs (33), and subsequently has been shown to be involved in DNA packaging into virions (13). Comparisons of the NCLDV protein sets and iterative database searches detected apparent orthologs of A32L in all NCLDV (Fig. 1B). Although these predicted ATPases may be distantly related to the AAA+ superclass, they showed no specific relationship with any other ATPase family. In particular, other ATPases do not contain readily detectable counterparts of the C-terminal motifs of A32L, which should be considered a synapomorphy of NCLDV (Fig. 1B).

Transcription factor A1L.

A1L is a small protein that contains a Zn-finger domain that we designated the FCS-finger (so named after a characteristic amino acid signature) and functions as a transcriptional transactivator of late VV genes (28); A1L orthologs were found in all NCLDV. The FCS-finger is a previously undetected Zn-binding domain that we identified in several eukaryotic chromatin proteins such as the Drosophila Sex Combs on Midleg, Polyhomeotic, Lethal (3) Malignant Brain Tumor, and vertebrate FIM. This domain is also found fused to the C termini of recombinases from certain prokaryotic transposons. However, A1L orthologs from NCLDV are a distinct stand-alone class of the FCS domain and thus should be considered an NCLDV synapomorphy (Fig. 1C).

Capsid protein D13L.

The virions of different NCLDV have dramatically different structures. The major capsid proteins of iridoviruses and phycodnaviruses, both of which have icosahedral capsids surrounding an inner lipid membrane, showed a high level of sequence conservation. A more limited, but statistically significant, sequence similarity was observed between these proteins and the major capsid protein (p72) of ASFV, which also has an icosahedral capsid. It was surprising, however, to observe that all of these proteins shared a conserved domain with the poxvirus protein D13L, which is an integral virion component thought to form a scaffold for the formation of viral crescents and immature virions (54). In spite of low sequence similarity, D13L sequences share a common domain with conserved predicted structural elements with the major capsid proteins of the other NCLDV (Fig. 1D). The capsid proteins of iridoviruses, phycodnaviruses, and ASFV have an additional C-terminal domain that is predicted to adopt the jelly roll fold typical of capsid proteins of numerous DNA and RNA viruses (46). In poxvirus D13L proteins, the jelly roll domain is replaced by a distinct β-strand-rich domain that showed no detectable relationship with any known domains. This difference in the C-terminal domains of poxvirus D13L proteins compared to the major capsid proteins of other NCLDV probably reflects the new function of D13L as a scaffold for viral crescents.

Virion membrane protein L1R/F9L.

Paralogous poxvirus genes L1R and F9L encode membrane proteins that have a conserved domain architecture, with a single, C-terminal transmembrane helix, and an N-terminal, multiple-disulfide-bonded domain (51). The L1R protein is myristoylated and has been implicated in virion assembly (45, 68). Homologs of the L1R/F9L family proteins so far have not been detected outside poxviruses. Nonetheless, our comparisons revealed apparent representatives of this family in all NCLDV, with the single exception of ESV (Fig. 1E). With the exception of PBCV, all NCLDV share two of the disulfide-bond-forming cysteine residues and have a transmembrane helix C-terminal to the core domain. The PBCV protein is highly divergent and seems to have lost the disulfide-bonding cysteines; nevertheless, it has an additional cysteine-rich, EGF-like domain that is also found in other PBCV proteins (data not shown). This domain is inserted between the core L1R-like domain and the C-terminal transmembrane helix.

A conserved structural role for this protein is compatible with the existence of a lipid membrane in all NCLDV, in spite of the major differences in virion structure. Furthermore, the conservation of the myristoylated, disulfide-bonded protein in most of the NCLDV correlates with the conservation of the thiol-disulfide oxidoreductase E10R which, in VV, is required for the formation of disulfide bonds in L1R and F9L (52).

Other apparent synapomorphies of NCLDV.

Even when apparent orthologs of a viral protein are present in cellular life forms, the viral version may have unique features. An example is the thiol-disulfide oxidoreductase E10R. The proteins of this family encoded by different NCLDV show limited sequence similarity to each other, and some are more similar to apparent orthologs from eukaryotes, such as the yeast ERV1/2 proteins (52). Nevertheless, all nonviral members of this family share two pairs of conserved cysteines, whereas only one pair is conserved in the proteins from NCLDV.

Another notable ancestral protein family of NCLDV consists of homologs of proliferating cell nuclear antigen (PCNA), a protein that is ubiquitous in cellular life forms and functions as the sliding clamp during DNA replication (11). The members of the PCNA superfamily identified in NCLDV show limited sequence similarity to the cellular homologs; in fact, the poxvirus PCNA homologs (G8R) were identified in this study only through the use of the sequence-structure threading technique. Phylogenetic analyses of the PCNA superfamily indicated that the NCLDV PCNA homologs tend to cluster together, to the exclusion of eukaryotic homologs, but typically form longer branches than any cellular PCNAs, suggesting rapid divergence during NCLDV evolution (unpublished data). Poxvirus G8R is the most divergent member of the PCNA superfamily. The available experimental evidence points to a primary role of this protein in vaccinia virus late gene transcription, rather than replication (69, 74), suggesting a causal connection between rapid sequence divergence and the change of function.

Among the proteins that are conserved in three of the four NCLDV families, the most notable one is the membrane protein that, in poxviruses, is represented by three paralogs, J5L, G9R, and A16L, which are predicted to form multiple disulfide bonds (51). These proteins resemble the virion membrane proteins of the L1R/F9L group in domain architecture, but appear not to be homologous to them or to any other proteins.

Cladistic analysis suggests monophyly of NCLDV.

Phylogenetic tree analysis of those NCLDV proteins that have homologs in other viruses and in cellular life forms, such as DNA polymerase, helicases, and others (Table 1), fails to support monophyly of NCLDV (26; unpublished observations). However, this cannot be considered strong evidence against monophyly because viral genomes tend to evolve rapidly, resulting in distortions of phylogenetic tree topologies. Indeed, as discussed above, even those groups of orthologous NCLDV proteins that contain clear synapomorphies show only limited sequence conservation. Therefore, as an alternative approach for assessing the evolutionary relationships among the NCLDV, we undertook formal cladistic analysis (25) of viral gene sets after identifying likely orthologs in other viruses and cellular organisms (Table 1). All genes that occur in at least two families of NCLDV were scored as described in Materials and Methods to obtain character states for the terminal taxa under examination. The 11 terminal taxa considered in this analysis were chordopoxviruses, entomopoxviruses, asfarviruses (ASFV), iridoviruses (CIV and FLDV), PBCV, ESV, herpesviruses, baculoviruses, bacteriophage T4, and the eukaryotic cell (host cell). A total of 59 characters were scored over these 11 taxa to construct the data matrix used in the cladistic analysis (data not shown [available as supplementary material from the authors]).

Trees that provided the shortest path of character state changes to result in the character configuration observed in the terminal taxa were identified by using the Branch and Bound method and the Exhaustive Search algorithm that evaluates all possible tree topologies for the given terminal taxa. One most parsimonious tree was found that supported the monophyly of the NCLDV by 16 synapomorphies. As expected, the monophyly of the so-called phycodnavirus clade (PBCV plus ESV) and the poxvirus clade (entomopoxviruses plus chordopoxviruses) was strongly supported (Fig. 2). In addition, there was weaker support for the monophyly of the animal viruses (poxviruses plus ASFV plus iridoviruses), to the exclusion of the phycodnaviruses, by six synapomorphies. Furthermore, the tree contained a clade consisting of poxviruses and asfarviruses, to the exclusion of the iridoviruses, which was supported by eight synapomorphies. This tree was used to extract a list of shared derived characters for the NCLDV clade that were used in reconstructing the repertoire of genes present in the hypothetical ancestral NCLDV (see below). The monophyly of the three animal viral families, namely, asfarviruses, iridoviruses, and poxviruses, emerged consistently with different sets of characters, but the relationships among these families were highly sensitive to minor changes in characters used in the analysis (data not shown). Thus, the actual branching pattern within the animal NCLDV clade requires additional data for confident resolution.
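The parsimony criterion described above can be illustrated with a minimal sketch. It is not the program used in this study: it assumes binary presence/absence characters, uses Fitch's counting rule on rooted trees, and compares only three hand-listed candidate topologies instead of running a true Branch and Bound or Exhaustive Search. The taxa A to D, the characters c1 to c3, and the function names are hypothetical.

```python
# Toy illustration only: taxa "A".."D" and characters "c1".."c3" are hypothetical.

def fitch_changes(tree, states):
    """Minimum number of state changes for one character on a rooted binary tree.

    tree   -- a leaf name (str) or a 2-tuple of subtrees
    states -- dict mapping each leaf name to its state (0 = absent, 1 = present)
    """
    def walk(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = walk(node[0])
        right_set, right_cost = walk(node[1])
        shared = left_set & right_set
        if shared:
            # The subtrees agree on at least one state: no extra change needed.
            return shared, left_cost + right_cost
        # The subtrees disagree: one change must occur at this node.
        return left_set | right_set, left_cost + right_cost + 1

    return walk(tree)[1]


def tree_length(tree, matrix):
    """Total number of changes over all characters (the parsimony length)."""
    return sum(fitch_changes(tree, column) for column in matrix.values())


# Character matrix: character -> {taxon: state}; "D" plays the role of an outgroup.
MATRIX = {
    "c1": {"A": 1, "B": 1, "C": 0, "D": 0},
    "c2": {"A": 1, "B": 1, "C": 1, "D": 0},
    "c3": {"A": 1, "B": 0, "C": 0, "D": 0},
}

# The three possible rooted arrangements of A, B, and C with D as outgroup.
CANDIDATES = [
    ((("A", "B"), "C"), "D"),
    ((("A", "C"), "B"), "D"),
    ((("B", "C"), "A"), "D"),
]

lengths = [tree_length(tree, MATRIX) for tree in CANDIDATES]
best_tree = CANDIDATES[lengths.index(min(lengths))]
print(lengths, best_tree)
```

In this toy matrix the first topology is the shortest (three changes versus four), because character c1 groups A with B as a shared derived state. With 11 terminal taxa the number of topologies is far larger, which is why bounding strategies are used in practice.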

Fig. 2. Consensus cladogram of cytoplasmic DNA viruses. The cladistic analysis was performed as described in the text. The proteins that were probably present in the common ancestor of the universally supported NCLDV clade are superimposed on the consensus tree. Also shown on the consensus tree are the state changes in each of the terminal lineages and the strictly supported clades. The plus sign indicates a character that is most parsimoniously explained as an independent gain that was most likely acquired through horizontal transfer between the viral genomes or through transfer from the host genome. The minus sign denotes the loss of an ancestral character in a particular lineage.

Hypothetical ancestral NCLDV.

Given the support for a monophyletic NCLDV clade, the possibility emerges for an approximate reconstruction of the hypothetical ancestral virus. The genes that are shared by all viruses within this clade are obvious candidates for ancestral origin but, additionally, other genes identified as synapomorphies of the NCLDV clade are also, according to the parsimony principle, likely to have been present in their last common ancestor. These typically are genes present in the majority of the NCLDV taxa considered in this analysis. Under this reasoning, the absence of otherwise conserved genes in one lineage is attributed to gene loss which, in the case of essential genes, is accompanied by nonorthologous gene displacement (32). Lineage-specific gene loss obviously occurred also within individual NCLDV families, particularly in ESV, which does not have many genes conserved in all or most NCLDV, including PBCV, and, among poxviruses, in MCV, which has lost all genes involved in nucleotide metabolism (51). A probable example of displacement is the topoisomerase function that is represented by the predicted ancestral form, type II topoisomerase, in asfarviruses, iridoviruses, and phycodnaviruses (except for ESV, which apparently has lost this gene), whereas poxviruses have an unrelated type IB topoisomerase. Some of the genes that are conserved in only two of the NCLDV families also might be part of the legacy of the ancestral virus, but in these cases, it is difficult to rule out alternative scenarios, such as independent acquisition from the host or horizontal gene transfer.

Under these assumptions, we arrive at a conservative list of 31 ancestral viral genes (Table 1); for comparison, all poxviruses share ca. 50 genes (8). Considering that the ancestral virus might have been a simpler entity than its extant descendants, even this conservative reconstruction may be a reasonable approximation of the ancestral set of essential viral genes. Examination of this list suggests that the ancestral NCLDV already had fairly elaborate systems for genome replication and expression, some enzymes of nucleotide metabolism, a packaging mechanism, capsid and membrane virion proteins, an electron-transfer system for disulfide-bond formation in the latter, a mechanism of protein phosphorylation-dephosphorylation probably involved in the regulation of virion morphogenesis, and possibly an apoptosis inhibitor (Table 2).
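As a rough illustration of how such a conservative list can be tallied, the sketch below classifies genes by the number of NCLDV families in which they occur. It is a simplification of the cladistic procedure described above: the threshold of three families and the function name are assumptions, and the gene labels and their distributions are hypothetical placeholders, not entries from Table 1.

```python
FAMILIES = ("poxviruses", "asfarviruses", "iridoviruses", "phycodnaviruses")

# Hypothetical presence table: gene label -> families in which an ortholog is found.
presence = {
    "gene_w": {"poxviruses", "asfarviruses", "iridoviruses", "phycodnaviruses"},
    "gene_x": {"asfarviruses", "iridoviruses", "phycodnaviruses"},
    "gene_y": {"poxviruses", "iridoviruses"},
    "gene_z": {"poxviruses"},
}


def classify(presence, families=FAMILIES, min_families=3):
    """Split genes into those found in every family and those found in most.

    Genes present in all families are treated as core; genes present in at
    least `min_families` (but not all) are treated as probably ancestral,
    with their absence in one family attributed to lineage-specific loss.
    """
    family_set = set(families)
    core, probable = [], []
    for gene, found_in in presence.items():
        count = len(found_in & family_set)
        if count == len(family_set):
            core.append(gene)
        elif count >= min_families:
            probable.append(gene)
    return sorted(core), sorted(probable)


core_genes, probable_genes = classify(presence)
print(core_genes, probable_genes)
```

Genes found in only two families (gene_y here) are left out of both lists, mirroring the caution expressed above about independent acquisition and horizontal transfer.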


Table 2. Predicted functional systems of the ancestral nucleocytoplasmic DNA virus

Function and/or pathway: Proteins
DNA replication: DNA polymerase, D5R-like helicase, RuvC-like Holliday junction resolvase, PCNA (DNA clamp), ATP-dependent DNA ligase, type II topoisomerase, dUTPase
DNA precursor synthesis: Ribonucleotide reductase (2 subunits), thymidylate kinase
Transcription and RNA processing: RNA polymerase (two large subunits and subunit 10), A1L-like and TFIIS-like transcription factors, D6R-like, A18R-like, SWI/SNF2-like helicases, capping enzyme, BRO-like DNA-binding protein, Nudix hydrolase
Virion morphogenesis: A32-like packaging ATPase, E10R-like thiol-oxidoreductase, glutaredoxin-thioredoxin
Regulation of morphogenesis: F10L-like protein kinase, H1L-like phosphatase
Virion structure: D13L-like capsid protein, L1R-family and J5L-family virion membrane proteins
Inhibition of apoptosis: BIR-domain-containing protein

Given the presence of nucleocytoplasmic, purely cytoplasmic, and nuclear life cycles in the monophyletic assemblage of NCLDV, it appears most likely that their last common ancestor had both nuclear and cytoplasmic phases in its life cycle. From this ancestral state, some of the descendant lineages, such as phycodnaviruses, appear to have moved to an entirely nuclear replication. The wholly nuclear replication of vertebrate iridoviruses (22, 36) also appears to be a secondary adaptation because FLDV has lost several enzymes that are essential for viruses that replicate in the cytoplasm, such as DNA ligase, capping enzyme, and topoisomerase.

The ancestral virus can be inferred to have had an icosahedral capsid with an inner membrane layer, a structure most similar to those of iridoviruses and PBCV. This notion is supported by the presence of icosahedral capsids in three of the four NCLDV families, which correlates with the presence of the jelly roll domain in the major capsid protein, and the general consideration of the icosahedron being one of the basic virion structures in numerous, diverse viruses. The more complex organization of poxvirus virions appears to be a derived state. With the previously described conservation of the ERV-family thiol-oxidoreductase and glutaredoxin (with the apparent exception of ASFV) that contribute to the formation of disulfide bonds in virion membrane proteins (51, 52) and the present demonstration of the conservation of three structural proteins of the virion, the evolutionary connection between the poxvirus virions and those of other NCLDV appears certain.

The genes of the ancestral NCLDV that were responsible for virus-host interaction cannot be inferred from the comparison of extant viral genomes because the repertoires of such genes in different NCLDV families are largely different and, based on the existence of highly similar cellular homologs for most of them, must have been acquired independently. The BIR domain-containing apoptosis inhibitor could be an exception to this general pattern (Table 1). We are unlikely to get any insight into this aspect of the ancestral NCLDV until clear indications are obtained as to what kind of host it infected. If the fungal connections mentioned below point to the original host, a relatively simple genome with a small number of host-interaction genes seems a plausible possibility.

Relationships between NCLDV and other genetic elements and origin of NCLDV.

Many NCLDV genes have homologs or even apparent orthologs in other viruses and plasmids (Table 1). In particular, multiple relationships have been previously noticed to exist between NCLDV genes (specifically, those of poxviruses) and genes of T-even bacteriophages (34, 62). However, neither T-even phages nor herpesviruses or baculoviruses possess a significant subset of the core gene set of the NCLDV (Table 1). Furthermore, the genes that are shared do not show observable synapomorphic features. Therefore, direct evolutionary relationships between these classes of viruses manifestly cannot be positively established. The observed overlaps between gene sets can be explained largely by independent acquisition of genes that are generically required for DNA virus replication (for example, DNA polymerase, ribonucleotide reductase, or thymidylate kinase) and, perhaps, some cases of horizontal gene exchange.

A more coherent relationship appears to exist between the NCLDV and linear DNA plasmids from fungal mitochondria, with five shared genes (of the 10 to 12 genes that are typically present on these plasmids [18, 39]) (Table 1). Importantly, these seem to be the principal genes that are required for DNA virus genome expression in the cytoplasm, including two RNA polymerase subunits, a helicase involved in transcription, and a capping enzyme with a conserved domain architecture (Table 1). In at least one case, that of the D6R-type helicase, the NCLDV proteins show high sequence similarity to the plasmid homolog, to the exclusion of other homologous helicases (data not shown). It seems plausible that the fungal plasmids indeed contain a part of the core gene set of the hypothetical ancestral NCLDV. Nonetheless, the fungal plasmid genomes have a terminal protein that functions in replication priming and, in this respect, resemble adenoviruses and protein-priming DNA phages (48), rather than NCLDV; the monophyly of DNA polymerases from protein-priming viruses and plasmids is supported by phylogenetic tree analysis (29). Thus, the data suggest complex evolutionary relationships, with components of the replication and expression systems drawn from different types of genetic elements, rather than a direct link between the NCLDV and fungal plasmids.

A complex evolutionary scenario for the origin of the NCLDV, including multiple gene exchanges between different types of genomes, is suggested by the phyletic provenance of several other genes shared by all or a subset of NCLDV families. These include the replicative helicase D5R, the Holliday junction resolvase (HJR) A22R, and the predicted protease I7L (Table 1). The distribution of the D5R homologs is particularly unusual. As shown above (Fig. 1), true orthologs of the NCLDV replicative helicase were detected only in certain bacteriophages. More distant members of the helicase III superfamily are encoded by diverse small genetic elements, including ssDNA viruses (geminiviruses and parvoviruses), small dsDNA viruses (papovaviruses), positive-strand RNA viruses (for example, picornaviruses), some phages, and plasmids. So far, no members of this superfamily encoded in genomes of cellular life forms (some prophages notwithstanding) have been detected. This distribution pattern of an essential viral gene suggests a long history of dissemination between (relatively) small genomes, perhaps tracing back to the ancient RNA world.

A different evolutionary history appears plausible for the RuvC-like HJR A22R, which is present in poxviruses, at least some iridoviruses, and phycodnaviruses, suggesting that it might have been inherited from the common ancestor of the NCLDV. This enzyme belongs to a family of resolvases that are common in bacteria but not detectable in eukaryotes, except for a nuclease that functions in fungal mitochondria; the latter shows the strongest (albeit limited) sequence similarity to the resolvases of NCLDV (19). This suggests at least two horizontal transfers, from protomitochondria to fungi and from fungi to the ancestral NCLDV (assuming that this resolvase indeed is inherited by NCLDV from their common ancestor). In the lineages which lack the RuvC-like HJR, such as PBCV and ASFV, it might have been displaced by an alternative enzyme, namely, the Lambda-type exonuclease that is present in these viruses (6) (Table 1) or the RecB-like nuclease in PBCV.

The available data are insufficient to reconstruct a complete evolutionary scenario for the origin of the ancestral NCLDV. Genome sequencing of representatives of additional viral families has the potential to shed light on the evolutionary source(s) of NCLDV as suggested, for example, by the recent preliminary analysis of the genome of the archaeal virus SIRV1 (9). This virus has a relatively small genome of 32 kb with covalently closed hairpins at the ends, which resembles the genome structure of poxviruses, asfarviruses, and phycodnaviruses. However, the HJR and dUTPase of SIRV1 show clear archaeal affinities, emphasizing a difference from NCLDV (unpublished data). Taken together, the above observations show that the ancestral viral genome probably assembled via gradual accretion of genes from different genetic sources, including host genomes, plasmids, and other viruses. It appears that a complex history of multiple horizontal gene transfers and gene losses both preceded and succeeded the emergence of the ancestral NCLDV. Thus, it is all the more notable that this evolutionary focal point can be identified and some basic aspects of the replication of the ancestral virus can be reconstructed with reasonable confidence on the basis of a detailed comparison of extant viral genomes.


We thank Bernard Moss for critical reading of the manuscript and useful suggestions and Stewart Shuman for a helpful discussion.

ADDENDUM IN PROOF

While this article was being processed for production, a paper describing the sequence of the ESV1 genome was published (N. Delaroque, D. G. Muller, G. Bothe, T. Pohl, R. Knippers, and W. Boland, Virology 287:112–132, 2001).


