Benchmark Multi-Omics Datasets for Methods Comparison (Q6847)

From MaRDI portal
Revision as of 15:15, 20 February 2025 by Importer (Created a new Item)
Dataset published at Zenodo repository.
Language | Label | Description | Also known as
English | Benchmark Multi-Omics Datasets for Methods Comparison | Dataset published at Zenodo repository. |

    Statements

    Pathway Multi-Omics Simulated Data

    These are synthetic variations of the TCGA COADREAD data set (original data available at http://linkedomics.org/data_download/TCGA-COADREAD/). They serve as a comprehensive benchmark for comparing multi-omics tools in the manuscript "pathwayMultiomics: An R package for efficient integrative analysis of multi-omics datasets with matched or un-matched samples".

    The collection contains 100 sets of random modifications to centred and scaled copy number, gene expression, and proteomics data, saved as compressed data files for the R programming language. The sets are stored as 100 subfolders labelled sim001, sim002, ..., sim100; the first 50 are in pt1 and the second 50 in pt2. Each folder contains:

    (1) indicatorMatricesXXX_ls.RDS: a list of simple triplet matrices showing which genes (in which pathways) and which samples received the synthetic treatment, where XXX is the simulation run label (001, 002, ...).
    (2) CNV_partitionA_deltaB.RDS: the synthetically modified copy number variation data, where A is the partition code for the proportion of genes in each gene set that received the synthetic treatment (partition 1 is 20%, 2 is 40%, 3 is 60% and 4 is 80%) and B is the signal strength in units of standard deviations.
    (3) RNAseq_partitionA_deltaB.RDS: the synthetically modified gene expression data (same parameter legend as CNV).
    (4) Prot_partitionA_deltaB.RDS: the synthetically modified protein expression data (same parameter legend as CNV).

    Supplemental Files

    The file cluster_pathway_collection_20201117.gmt is the collection of gene sets used for the simulation study, in Gene Matrix Transpose (GMT) format. Scripts to create and analyze these data sets are available at: https://github.com/TransBioInfoLab/pathwayMultiomics_manuscript_supplement
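    The folder and file-naming conventions described above are regular enough to decode programmatically. The following is a minimal Python sketch, not part of the authors' published scripts, for locating a simulation run, decoding the partition and delta parameters from a data file name, and reading the GMT gene-set file. It assumes that the pt1 and pt2 parts are extracted into directories of those names under a common root, and that B is written as a plain decimal (e.g. delta0.5); the helper names (sim_folder, indicator_file, parse_data_file, parse_gmt_lines, read_gmt) are hypothetical.

```python
import re
from pathlib import Path

# Partition codes from the dataset notes: proportion of genes in each gene set
# that received the synthetic treatment.
PARTITION_PROPORTION = {1: 0.20, 2: 0.40, 3: 0.60, 4: 0.80}

# ASSUMPTION: modified-data files look like "CNV_partition2_delta0.5.RDS";
# the exact way the delta value is written in real file names is not documented here.
DATA_FILE_RE = re.compile(
    r"^(?P<omic>CNV|RNAseq|Prot)_partition(?P<partition>[1-4])"
    r"_delta(?P<delta>\d+(?:\.\d+)?)\.RDS$"
)


def sim_folder(root: Path, run: int) -> Path:
    """Folder for simulation run 1-100 (ASSUMPTION: pt1/ holds runs 1-50, pt2/ runs 51-100)."""
    if not 1 <= run <= 100:
        raise ValueError("run must be between 1 and 100")
    part = "pt1" if run <= 50 else "pt2"
    return root / part / f"sim{run:03d}"


def indicator_file(root: Path, run: int) -> Path:
    """Path of indicatorMatricesXXX_ls.RDS, where XXX is the zero-padded run label."""
    return sim_folder(root, run) / f"indicatorMatrices{run:03d}_ls.RDS"


def parse_data_file(name: str) -> dict:
    """Decode omic type, partition, treated-gene proportion and delta (SD units) from a file name."""
    m = DATA_FILE_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised file name: {name}")
    partition = int(m["partition"])
    return {
        "omic": m["omic"],
        "partition": partition,
        "proportion": PARTITION_PROPORTION[partition],
        "delta_sd": float(m["delta"]),
    }


def parse_gmt_lines(lines) -> dict:
    """Parse GMT text: each tab-separated line holds a set name, a description, then genes."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]  # drop empty trailing fields
    return gene_sets


def read_gmt(path: Path) -> dict:
    """Read a GMT file such as cluster_pathway_collection_20201117.gmt."""
    with open(path, encoding="utf-8") as fh:
        return parse_gmt_lines(fh)
```

    For example, parse_data_file("CNV_partition2_delta0.5.RDS") reports that 40% of the genes in each gene set were shifted by 0.5 standard deviations. The .RDS contents themselves are most naturally loaded in R with readRDS(); the simple triplet matrices in the indicator files are presumably objects from the slam package.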
    11 November 2021

    Identifiers
