Data acquisition

Raw sequencing data

Raw data, including sequences in FASTQ format, were obtained from the following public databases:

Data were downloaded using enaBrowserTools and SRA-Tools, with transfers accelerated by Aspera (a high-speed data transfer tool).
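As an illustration only, the sketch below shows how one run might be fetched and converted to FASTQ with the SRA-Tools commands `prefetch` and `fasterq-dump`. The exact commands and options used for this database are not stated here, so the helper names, the accession and the output directory are all hypothetical placeholders. enaBrowserTools and the Aspera configuration are omitted because they depend on the local installation.

```python
import shlex
import subprocess
from pathlib import Path


def sra_download_commands(accession: str) -> list[list[str]]:
    """Build the SRA-Tools command lines for one run accession."""
    return [
        # Fetch the run archive for the accession.
        ["prefetch", accession],
        # Convert the run to FASTQ; --split-files writes the reads of
        # each spot to separate files (e.g. forward and reverse reads).
        ["fasterq-dump", accession, "--split-files"],
    ]


def download_run(accession: str, outdir: str, dry_run: bool = True) -> list[str]:
    """Run (or, with dry_run=True, only print) the download commands.

    Both commands are executed with `outdir` as the working directory.
    Returns the command lines as shell-quoted strings.
    """
    workdir = Path(outdir)
    rendered = []
    for cmd in sra_download_commands(accession):
        rendered.append(shlex.join(cmd))
        if dry_run:
            print(rendered[-1])
        else:
            workdir.mkdir(parents=True, exist_ok=True)
            # check=True raises CalledProcessError if the tool fails.
            subprocess.run(cmd, cwd=workdir, check=True)
    return rendered


# Placeholder accession and directory; nothing is downloaded in dry-run mode.
commands = download_run("SRR0000000", "raw_data", dry_run=True)
```

Keeping command construction separate from execution makes it possible to review the command lines for a whole batch of accessions before any data is transferred.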

Meta-data

Meta-data were first extracted using in-house Perl/R/Python scripts and then manually curated in at least two rounds to ensure quality. Curation was laborious because such information was often incomplete, misplaced or missing entirely. We frequently had to consult the sample descriptions, the supplementary data of related publications, or even the authors.

Technical meta-data extracted include:

experiment type (16S or Metagenomics),
sequencing devices / instruments, and
number of obtained sequencing reads.

Host-related, biologically relevant meta-data extracted include:

disease or health status of the host (referred to as phenotype in our database),
age,
sex,
BMI (body mass index), and
antibiotic usage.
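The fields listed above can be pictured as one record per sequencing run. The following is a minimal sketch, not the in-house scripts themselves: the class name, field names, the `run_id` identifier and the validity rules are assumptions made for illustration. It shows how incomplete or implausible records could be flagged for manual curation.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SampleMetadata:
    """One record per run; None marks a value that could not be extracted."""
    run_id: str                             # hypothetical run identifier
    experiment_type: Optional[str] = None   # "16S" or "Metagenomics"
    instrument: Optional[str] = None        # sequencing device / instrument
    read_count: Optional[int] = None        # number of sequencing reads
    phenotype: Optional[str] = None         # disease or health status of the host
    age: Optional[float] = None
    sex: Optional[str] = None
    bmi: Optional[float] = None             # body mass index
    antibiotic_usage: Optional[str] = None


def missing_fields(record: SampleMetadata) -> list[str]:
    """Names of fields that are still empty, in declaration order."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


def invalid_fields(record: SampleMetadata) -> list[str]:
    """Names of fields whose values fail simple, assumed sanity checks."""
    problems = []
    if record.experiment_type not in (None, "16S", "Metagenomics"):
        problems.append("experiment_type")
    if record.read_count is not None and record.read_count < 0:
        problems.append("read_count")
    if record.age is not None and record.age < 0:
        problems.append("age")
    if record.bmi is not None and record.bmi <= 0:
        problems.append("bmi")
    return problems


# Made-up example values, used only to exercise the checks.
record = SampleMetadata(
    run_id="RUN_0001",
    experiment_type="16S",
    read_count=52000,
    phenotype="Health",
    age=34,
    bmi=-1.0,
)
needs_curation = missing_fields(record)
needs_review = invalid_fields(record)
```

Here `needs_curation` lists the fields a curator would still have to look up in sample descriptions or publications, and `needs_review` lists values that were extracted but look implausible.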

More meta-data will be added in the future.