Genome, repeat, and functional annotation associated with the naked mole-rat genome assembly, mHetGlaV3 (GCA_964261345.1) (Q6883)

From MaRDI portal
Revision as of 15:16, 20 February 2025 by Importer (Created a new Item)
Dataset published at Zenodo repository.

    Statements

    The naked mole-rat (NMR; Heterocephalus glaber) is a eusocial subterranean rodent with a highly unusual set of physiological traits, such as extreme longevity, that has attracted great interest in the scientific community. However, the genetic basis of most of these traits has not been elucidated. To facilitate our understanding of the molecular mechanisms underlying NMR physiology and behaviour, we generated a long-read, chromosome-level genome assembly of the NMR. This genome, mHetGlaV2, was subsequently annotated and incorporated into a multiple whole-genome alignment of 91 eutherian mammals in Ensembl.

    We identified intra-chromosomal misassemblies within mHetGlaV2. We corrected them, and placed centromeres, by comparing syntenic blocks between this assembly and the Canadian porcupine (EreDor) genome assembly (https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_028451465.1/) and by reference to a FISH karyotype of the naked mole-rat completed by Romanenko et al., 2023 (PMID: 380307020). Chromosome numbering was identified from a composite karyogram of karyotypes from over 350 cells. This scaffold-corrected assembly is labelled mHetGlaV3 (https://www.ebi.ac.uk/ena/browser/view/GCA_964261345.1).

    This repository stores the repeat, genome, and epigenome annotations for mHetGlaV3.

    mHetGlaV3.primary.gtf.gz. Gene structures and gene symbols were transferred from the Ensembl annotation of mHetGlaV2 using Liftoff with default parameters. Additional gene symbols were identified using TOGA and manual curation.
    mHetGlaV3.primary.gtf.gz. Simple repetitive regions and transposable elements were annotated using EarlGrey (https://github.com/TobyBaril/EarlGrey) with the "Rodentia" annotations for RepeatMasker.
    mHetGlaV3.primary.genesymbol_table.txt.txt.gz. A tab-delimited file in which rows are gene IDs and columns are the gene symbols generated by each method. The "Consensus" column shows the best-matching gene symbol for each gene ID.
    mHetGlaV3.primary_annotated_blacklist.bed.gz. An assembly "blacklist" for mHetGlaV3: a BED file annotating the assembly breakpoints between mHetGlaV2 and mHetGlaV3. The file contains additional annotation columns (e.g., closest gene, overlapping TE) and should therefore be filtered to the first three columns (chromosome, start, end) before being used in standard genomic pipelines.
    mHetGlaV3.primary_hypothalamus_ABC_enhancer.bedpe.gz. Activity-By-Contact enhancers (https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction) predicted in the female subordinate naked mole-rat hypothalamus using Hi-C, H3K27ac ChIP-seq, ATAC-seq, and RNA-seq data.
    mHetGlaV3.primary_hypothalamus_chromHMM.bed.gz. Chromatin states (called with ChromHMM) for the female subordinate naked mole-rat hypothalamus, based on ChIP-seq data for H3K4me3 (promoter), H3K4me2 (promoter-enhancer), H3K27ac (active enhancer), H3K36me3 (elongation), H3K27me3 (Polycomb repressed), H3K9me3 (heterochromatin), and CTCF (whole brain), as well as ATAC-seq and RNA-seq data.
    mHetGlaV3.primary.fa.gz. Genome assembly FASTA file for the naked mole-rat (V3, primary assembly). This assembly matches the primary assembly stored on ENA; however, the chromosome names match those used in the other files in this repository rather than the names assigned by ENA (e.g. chr 1 instead of "OZ179169.1 Heterocephalus glaber genome assembly, chromosome: 1").

    UPDATES:
    * The 1.2 update changed unscaffolded contig names from those used in-lab to names compatible with ENA.
    * The 1.3 update added small (50-100 kbp) contigs to mHetGlaV3.primary.fa.gz that had been filtered out before the ENA submission.
    * The 1.4 update fixed a small chromosome naming inconsistency spotted in the 1.3 update.
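Because the annotated blacklist carries extra columns, it needs trimming before most tools will accept it as a plain interval file. The following is a minimal sketch of that step, not the authors' own procedure. It assumes the coordinates occupy the first three tab-delimited columns (chromosome, start, end), as in standard BED, and that any header lines begin with "#", "track", or "browser". The function name trim_bed_to_coordinates is hypothetical.

```python
import gzip


def trim_bed_to_coordinates(src, dst, n_cols=3):
    """Copy a BED file (gzipped or plain), keeping only the first n_cols fields.

    Assumption: columns are tab-delimited and the first three are
    chromosome, start, end. Returns the number of intervals written.
    """
    open_in = gzip.open if str(src).endswith(".gz") else open
    open_out = gzip.open if str(dst).endswith(".gz") else open
    kept = 0
    with open_in(src, "rt") as fin, open_out(dst, "wt") as fout:
        for raw in fin:
            line = raw.rstrip("\r\n")
            # Skip blank lines and common header/comment lines.
            if not line or line.startswith(("#", "track", "browser")):
                continue
            fields = line.split("\t")
            if len(fields) < n_cols:
                raise ValueError(f"expected at least {n_cols} columns: {line!r}")
            fout.write("\t".join(fields[:n_cols]) + "\n")
            kept += 1
    return kept
```

For example, calling trim_bed_to_coordinates("mHetGlaV3.primary_annotated_blacklist.bed.gz", "blacklist.trimmed.bed") would write a three-column BED file; the output name here is arbitrary.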
    15 December 2024
    1.4

    Identifiers
