Data downloads

Download processed data from GMrepo

File	Description
Projects	a list of projects and related information such as project description
Projects summary	a list of projects and related information such as total numbers of associated runs, processed runs, valid runs and failed runs.
All runs	information on runs/samples collected in this database, including their associated projects, and other curated meta-data.
Runs to phenotypes	all runs and their corresponding phenotypes (if any); one run/sample can be associated with multiple phenotypes.
Processed runs	a list of processed runs, the tools used for the analysis, and their QC statuses
Relative abundances	relative abundances at species and genus levels in samples/runs that passed our QC criteria
Relative abundance summary	summary statistics on the taxonomic abundance data
Taxon co-occurrence data	taxon co-occurrence in runs/samples of each phenotype, calculated separately for species and genus
Statistics by phenotype	statistics by phenotype, including total number of runs (with meta-data available) associated with each phenotype, the numbers of processed, valid and failed runs for each phenotype, and the total numbers of species and genera associated with each phenotype in our database
MySQL dump of the whole database	MySQL dump of all tables included in our database
MeSH table	Medical Subject Headings (MeSH) data used in this study
NCBI taxonomy table	reformatted NCBI taxonomy information, including the mapping from NCBI taxon ID to scientific name and rank

Note

Please note that two types of taxonomy IDs are used in the NCBI taxonomy database:

taxon_id: the internal unique ID used by NCBI taxonomy;
ncbi_taxon_id: the actual taxonomy ID of a taxonomic entity.
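
When joining the downloaded tables on taxonomy, it matters which of the two IDs is used as the key. The following is a minimal sketch, assuming the NCBI taxonomy table is a tab-delimited file with a header row. The column names scientific_name and rank, and the internal taxon_id values in the sample data, are assumptions made for illustration and not the documented file layout.

```python
import csv
import io


def load_taxonomy(handle):
    """Build a lookup from ncbi_taxon_id to (scientific_name, rank).

    Assumes a tab-delimited table with a header row; the column names
    'scientific_name' and 'rank' are hypothetical.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    lookup = {}
    for row in reader:
        # Key on ncbi_taxon_id (the actual NCBI taxonomy ID),
        # not on the internal taxon_id.
        lookup[row["ncbi_taxon_id"]] = (row["scientific_name"], row["rank"])
    return lookup


# Small in-memory stand-in for the downloaded file. The internal
# taxon_id values (101, 102) are made up for this example.
SAMPLE = (
    "taxon_id\tncbi_taxon_id\tscientific_name\trank\n"
    "101\t816\tBacteroides\tgenus\n"
    "102\t817\tBacteroides fragilis\tspecies\n"
)

taxonomy = load_taxonomy(io.StringIO(SAMPLE))
print(taxonomy["817"])
```

To use this with a real download, pass an open file handle (for example `open(path, newline="")`) in place of the `io.StringIO` object, and adjust the column names to match the actual header.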

Download raw sequence data

Due to limited hardware capacity, we do not offer direct downloads of raw sequence data from our database.

Instead, users should download the raw sequence reads from public databases such as the Sequence Read Archive (SRA) at NCBI (National Center for Biotechnology Information).

To do so, users can either copy and paste the run ID of interest into the "Search" box of the SRA, go to the web page for the run, and use the download links provided there, or use the "linkout" icon usually available for each run ID in our database to go directly to the corresponding SRA web page.

Alternatively, users can use the command line tools in the SRA Toolkit to download raw data in various formats from NCBI SRA. Commonly used tools include:

fastq-dump: download SRA data to a local directory. Usage:

fastq-dump [options] <run_accession_id>

prefetch: download SRA, dbGaP and ADSP data. Usage:

prefetch [options] <run_accession_id>

Please consult the SRA Toolkit documentation for more details.
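
For batch downloads, the two commands above can be driven from a script. The following is a minimal sketch, assuming the SRA Toolkit is installed and on the PATH. The helper names are hypothetical, the accession is a placeholder to be replaced by the run ID of interest, and the options shown (`--output-directory` for prefetch; `--outdir`, `--split-files` and `--gzip` for fastq-dump) are only one reasonable choice.

```python
import shutil
import subprocess


def prefetch_cmd(run_id, out_dir="."):
    """Command line for fetching the .sra file of one run."""
    return ["prefetch", "--output-directory", out_dir, run_id]


def fastq_dump_cmd(run_id, out_dir=".", split_files=True, gzip=True):
    """Command line for writing one run as FASTQ files."""
    cmd = ["fastq-dump", "--outdir", out_dir]
    if split_files:
        cmd.append("--split-files")  # one file per read of a pair
    if gzip:
        cmd.append("--gzip")         # compress the output files
    cmd.append(run_id)
    return cmd


def run_tool(cmd):
    """Run one SRA Toolkit command, failing early if the tool is missing."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found; is the SRA Toolkit installed?")
    subprocess.run(cmd, check=True)


# Placeholder accession; substitute the run ID of interest.
run_id = "SRR000001"
print(" ".join(fastq_dump_cmd(run_id, out_dir="fastq")))
print(" ".join(prefetch_cmd(run_id, out_dir="sra")))

# To actually download, call for example:
#   run_tool(fastq_dump_cmd(run_id, out_dir="fastq"))
```

Building the command as a list and passing it to `subprocess.run` without a shell avoids quoting problems when run IDs are read from a file.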

Programmable access

GMrepo also provides programmable access to most of the database contents through RESTful APIs.

We provide example code for a few languages that we use in our lab; users of other languages can write their own code following these examples, or contact us for support.

See our GitHub page for details.
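
As a generic illustration of calling a RESTful API from a script, the following minimal sketch builds a JSON POST request using only the Python standard library. The base URL, endpoint name and payload keys are placeholders and not actual GMrepo endpoints. That the API accepts a JSON body via POST and returns JSON is also an assumption; the real endpoint names and parameters should be taken from the GitHub page.

```python
import json
import urllib.request


def build_request(base_url, endpoint, payload):
    """Prepare a JSON POST request (assumed request style, see lead-in)."""
    url = base_url.rstrip("/") + "/" + endpoint.lstrip("/")
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(base_url, endpoint, payload, timeout=30):
    """Send the request and decode the JSON response."""
    req = build_request(base_url, endpoint, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Placeholder values only; none of these are real GMrepo names.
req = build_request("https://example.org/api", "someEndpoint", {"some_key": "some_value"})
print(req.get_method(), req.full_url)
```

Separating `build_request` from `call_api` makes it possible to inspect the URL and body before anything is sent over the network.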