Registered checklist dataset

    Genome Taxonomy Database r214.1

    Parks D • Hugenholtz P

    Description

    The Genome Taxonomy Database (GTDB) is an initiative to establish a standardised microbial taxonomy based on genome phylogeny, primarily funded by the Australian Research Council via a Laureate Fellowship (FL150100038) and Discovery Project (DP220100900), with the welcome assistance of strategic funding from The University of Queensland.

    The genomes used to construct the phylogeny are obtained from RefSeq and GenBank, and GTDB releases are indexed to RefSeq releases, starting with release 76. Importantly, and increasingly, the dataset includes draft genomes of uncultured microorganisms obtained from metagenomes and single cells, improving the genomic representation of the microbial world. All genomes are independently quality-controlled with CheckM before inclusion in GTDB (statistics are available on the GTDB website).

    The GTDB taxonomy is based on genome trees inferred from concatenated alignments of single-copy marker proteins: with FastTree from a set of 120 markers for Bacteria, and with IQ-TREE from a set of 53 markers (starting with R07-RS207) or 122 markers (prior to R07-RS207) for Archaea (see the GTDB download page). Additional marker sets, including concatenated ribosomal proteins and ribosomal RNA genes, are used to cross-validate tree topologies.

    NCBI taxonomy was initially used to decorate the genome tree via tax2tree and was subsequently used as a reference source of new taxonomic opinions, including new names. The 16S rRNA-based Greengenes and SILVA taxonomies were initially used to supplement the taxonomy, particularly in regions of the tree with no cultured representatives; genome assembly identifiers are now used instead to create placeholder names for uncultured taxa. LPSN is used as the primary nomenclatural reference for establishing naming priorities and nomenclatural types.

    All taxonomic ranks except species are normalised using PhyloRank, and the taxonomy is manually curated to remove polyphyletic groups. Polyphyly and rank evenness can be visualised in PhyloRank plots. Species were originally delineated based on phylogeny and rank normalisation, but this was replaced with an ANI-based method (starting with R04-RS89) to enable scalable, automated assignment of genomes to species clusters.
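    ANI-based species delineation can be illustrated with a greedy, representative-based clustering. The sketch below is not GTDB's implementation: it assumes pairwise ANI values (in percent) have already been computed by some tool, that genomes are supplied best-first (for example by assembly quality), and it uses a hypothetical 95% threshold, a widely used prokaryotic species boundary that is not stated in this description. The function name `cluster_species` is illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical species boundary: 95% ANI is a widely used threshold for
# prokaryotes; it is an assumption here, not a value taken from GTDB.
ANI_THRESHOLD = 95.0


def cluster_species(
    genomes: List[str],
    ani: Dict[Tuple[str, str], float],
    threshold: float = ANI_THRESHOLD,
) -> Dict[str, List[str]]:
    """Greedily assign genomes to species clusters using pairwise ANI (%).

    Genomes are processed in list order, which is assumed to rank them
    best-first. Each genome joins the existing representative with the
    highest ANI at or above the threshold; if none qualifies, it becomes
    the representative of a new cluster.
    """
    clusters: Dict[str, List[str]] = {}
    for genome in genomes:
        best_rep: Optional[str] = None
        best_ani = -1.0
        for rep in clusters:
            # ANI is treated as symmetric; a missing pair counts as 0.
            value = ani.get((genome, rep), ani.get((rep, genome), 0.0))
            if value >= threshold and value > best_ani:
                best_rep, best_ani = rep, value
        if best_rep is None:
            clusters[genome] = [genome]  # genome founds a new species cluster
        else:
            clusters[best_rep].append(genome)
    return clusters
```

    Because each genome is compared only against cluster representatives rather than against every other genome, the number of comparisons grows with the number of clusters, which is one reason representative-based schemes lend themselves to automated assignment at scale.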

    The GTDB taxonomy can be queried and downloaded through a number of tools at https://gtdb.ecogenomic.org/.
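    Downloaded GTDB taxonomy tables typically pair a genome accession with a semicolon-delimited lineage of rank-prefixed names (d__, p__, c__, o__, f__, g__, s__), separated by a tab. The following minimal Python sketch parses one such line, assuming that layout; check it against the actual file before relying on it. The function name and the accession and taxon names in the test are hypothetical placeholders.

```python
from typing import Dict, Tuple

# Rank prefixes, assuming GTDB's "x__Name" lineage convention.
RANK_PREFIXES = {
    "d": "domain", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_taxonomy_line(line: str) -> Tuple[str, Dict[str, str]]:
    """Split 'accession<TAB>d__X;p__Y;...' into the accession and a rank -> name map."""
    accession, lineage = line.rstrip("\n").split("\t", 1)
    ranks: Dict[str, str] = {}
    for token in lineage.split(";"):
        prefix, _, name = token.strip().partition("__")
        # An unnamed rank (e.g. "g__") yields an empty string.
        ranks[RANK_PREFIXES[prefix]] = name
    return accession, ranks
```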

    Taxonomic coverage

    Coverage
    Archaea

    Bibliography

    • Parks, D.H., et al. (2020). A complete domain-to-species taxonomy for Bacteria and Archaea. Nature Biotechnology
      Identifier: DOI:10.1038/s41587-020-0501-8
    • Parks, D.H., et al. (2018). A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology, 36: 996-1004
      Identifier: DOI:10.1038/nbt.4229

    Contacts

    • Donovan Parks

      Organization
      Australian Centre for Ecogenomics
      Position
      Dr
      Roles
      Author
      Metadata author
      Administrative point of contact
    • Phil Hugenholtz

      Organization
      Australian Centre for Ecogenomics
      Position
      Professor
      Roles
      Author
      Metadata author
      Administrative point of contact
    • Pierre Chaumeil

      Organization
      Australian Centre for Ecogenomics
      Position
      Software developer
      Roles
      User
      Administrative point of contact

    GBIF registration

    Registration date
    8 January 2021
    Metadata last modified
    5 January 2024
    Publication date
    5 January 2024
    Hosted by
    GBIF Secretariat
    Installation
    GBIF Hosted Datasets
    Endpoints
    Darwin Core Archive
    Preferred identifier
    10.15468/dpzg84
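    This dataset is published through a Darwin Core Archive endpoint. A Darwin Core Archive is a zip file in which meta.xml declares the core data file, its delimiters, and which column holds which Darwin Core term. The minimal Python sketch below reads the core table of a locally downloaded archive using only the standard library. It assumes a single core file and ignores extension files, and it takes file names and columns from meta.xml rather than assuming what this particular archive contains; the function name `read_dwca_core` is hypothetical.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, List

# XML namespace used by meta.xml in the Darwin Core text guide.
NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def _unescape(value: str) -> str:
    # meta.xml stores separators as escape sequences such as "\t".
    return value.encode().decode("unicode_escape")


def read_dwca_core(source) -> List[Dict[str, str]]:
    """Read the core data file of a Darwin Core Archive into a list of dicts.

    `source` is a path or binary file object for the zip archive. Column
    names are the local part of each term URI (e.g. "scientificName").
    """
    with zipfile.ZipFile(source) as archive:
        core = ET.fromstring(archive.read("meta.xml")).find("dwc:core", NS)
        sep = _unescape(core.get("fieldsTerminatedBy", ","))
        quote = _unescape(core.get("fieldsEnclosedBy", '"'))
        skip = int(core.get("ignoreHeaderLines", "0"))
        location = core.find("dwc:files/dwc:location", NS).text

        # Map column index -> column name, including the optional <id> column.
        columns: Dict[int, str] = {}
        id_elem = core.find("dwc:id", NS)
        if id_elem is not None:
            columns[int(id_elem.get("index"))] = "id"
        for field in core.findall("dwc:field", NS):
            if field.get("index") is not None:  # skip constant-value fields
                columns[int(field.get("index"))] = field.get("term").rsplit("/", 1)[-1]

        text = archive.read(location).decode(core.get("encoding", "UTF-8"))

    if quote:
        reader = csv.reader(io.StringIO(text), delimiter=sep, quotechar=quote)
    else:
        reader = csv.reader(io.StringIO(text), delimiter=sep, quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row][skip:]
    return [{name: row[i] for i, name in columns.items() if i < len(row)} for row in rows]
```

    Reading delimiters and column positions from meta.xml, instead of hard-coding them, keeps the reader usable across archives whose core files are laid out differently.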

    Citation

    Parks D, Hugenholtz P (2024). Genome Taxonomy Database r214.1. Version 1.92. The University of Queensland. Checklist dataset https://doi.org/10.15468/dpzg84 accessed via GBIF.org on 2025-04-28.