The genomic origin of the unique chaetognath body plan (Q6032)

From MaRDI portal
Dataset published at Zenodo repository.

    Statements

    Supplementary files and code for "The genomic origin of the unique chaetognath body plan".

    ATAC-seq
    Processing of called peaks (BED) is described in classif-atac.ipynb. The resulting called peaks are in peaks_all_re.txt, with a filtered version in peaks_flt_re.txt.

    GeneFamilies
    Code used for gene family analyses is detailed in gene_families 2.ipynb, using as input:
    - the gene families inferred by Broccoli: orthologous_groups_eq.txt
    - the reconciled gene trees calculated by GeneRax in NHX format: Chaeto_rev0124_recon.nhx, and also as XML in xml/Chaeto_rev0124_recon_xml.tgz, with the corresponding code to parse them
    The file Chaeto_rev0124_recon.lab.tre has the same trees in a human-readable format with the gene names for mouse and Drosophila.
    Resulting files include:
    - Orthogroups_GLD_re: the list of gained, lost and duplicated gene families
    - Pgot_DupGO_enrch_r_BP_wn.tsv: GO enrichment for chaetognath duplicates
    - stats4D.py: script used to compute 4DTv from reciprocal gene alignments (PargotALI.out.gz); results are in PargotALI.stats.gz
    - Panther_all.txt: PANTHER annotation for all the proteomes
    - emapper/*.emapper.annotations.gz: eggNOG annotation for selected proteomes
    - GenEra_34758_gene_ages.tsv: result of the GenEra phylostratigraphic analyses
    - loss_gnathi_bflo.txt: amphioxus homologues of genes lost in the gnathiferan lineages
    - proteomes-pgot-sel.tgz: proteomes of selected genes used for gene family reconstruction

    Methylation
    - Script_Chaeto.R: R script to perform data analysis and plotting
    - ChaetoDeepToolsCommands.sh: plots of methylation in genes and TEs
    - EMseq_files.tar.gz: result file from EM-seq
    - Methylated_genes.tsv: list of methylated genes
    - MethylationToolkitGenes.txt: analysis of the methylation toolkit
    - Paraspadella_EMseq.CGmap.gz: EM-seq genome-wide map

    OperonTransSplicing
    - SL_Operon_redux-chim.ipynb: notebook describing the annotation of operons
    - SL_status_eq.txt: SL assigned to genes
    - go-basic.obo: gene ontology file
    - Pgot_lowinput_SLs_counts_eq.tsv: counts of splice leaders detected for transcripts of annotated genes
    - Pgot_operons_filt_eq.txt: list of annotated operons
    - Pgot_OvL0Qm.cro.sizes: list of chromosome and scaffold sizes
    - Pgot_oper_GOe_eq.tsv: GO enrichment in operons

    Ressources
    Main resource files, including:
    - Pgot_OvL0Qm_cn.fa.gz: genome FASTA file
    - Pgot_OvL0Qm_aPe.gtf.gz: GTF file
    - Pgot_genInfo_rr.txt: list of genes with function annotation, gene family, domains, phylostrata, etc.
    - Pgot_OvL0Qm_aP.repeats.cro.bed: BED file with positions of repeats

    Single-cell
    - SAMap_vignette.ipynb: notebook describing how to run SAMap
    - markers.tsv: list of cell-type marker genes inferred using Seurat
    - ChaetoGN_Lau.Rmd: R Markdown file summarising the main analysis steps
    - Ch_v5_chim.RDS: R object containing the analysis datasets
    - maps/*: results of SAMap comparisons

    Hi-C
    - NOTEBOOK_all_hic_analyses.ipynb: details the analyses of Hi-C data using other scripts and files in this folder
    - data: contains the main Hi-C data files, including the multi-resolution contact map chaeto.matrix.final.allres.hic
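    The GeneFamilies description mentions a script that computes 4DTv (the transversion rate at fourfold degenerate third codon positions) from reciprocal gene alignments. As a rough illustration of what that statistic measures, here is a minimal sketch in Python. It is not the dataset's stats4D.py: the function name four_dtv, the input format (two in-frame, gap-free aligned coding sequences), the standard genetic code, and the absence of any multiple-substitution correction are all assumptions made for this example.

```python
# Illustrative sketch only; NOT the stats4D.py shipped with the dataset.
# Assumes the standard genetic code and two in-frame aligned coding sequences.

# First two bases of the eight codon families that are fourfold degenerate
# in the standard code (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN,
# Ala GCN, Arg CGN, Gly GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}
VALID_BASES = {"A", "C", "G", "T"}


def four_dtv(seq_a, seq_b):
    """Return the uncorrected 4DTv for two aligned coding sequences.

    Returns None when the pair shares no comparable fourfold degenerate site.
    (four_dtv is a hypothetical helper name used for this example.)
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    sites = 0
    transversions = 0
    for i in range(0, len(seq_a) - 2, 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        # A site is comparable only if both codons share the same
        # fourfold degenerate prefix.
        if codon_a[:2] != codon_b[:2] or codon_a[:2] not in FOURFOLD_PREFIXES:
            continue
        third_a, third_b = codon_a[2], codon_b[2]
        # Skip gaps and ambiguous bases at the third position.
        if third_a not in VALID_BASES or third_b not in VALID_BASES:
            continue
        sites += 1
        # A transversion swaps a purine for a pyrimidine or vice versa.
        if (third_a in PURINES) != (third_b in PURINES):
            transversions += 1

    return transversions / sites if sites else None


if __name__ == "__main__":
    # Toy sequences invented for this example, not taken from the dataset.
    print(four_dtv("CTAGGCATGACT", "CTCGGCATGACG"))
```

    In the toy call, three codon pairs share a fourfold degenerate prefix (CT, GG, AC) and two of them differ by a transversion at the third position; the ATG pair is skipped because it is not fourfold degenerate.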
    15 October 2024

    Identifiers
