Download Help

Data Availability

This page describes the options available for downloading and associating data with genomic data sets generated by the Amboseli Baboon Research Project (ABRP) through the ABRP Genomics Public Portal. These data are freely available for use, but we ask that you cite the studies that generated the original data if you use them in a publication or webpage.

ABRP Genomics Data Search

This section describes the options available to users who want to download functional genomic data. Results for all queries are returned in a zipped folder containing one text file for each data set (individual/sample IDs in columns) and an “additional information” file containing associated data on individuals.
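
As an illustration of working with a downloaded result, the following Python sketch reads every text file in a zipped folder into memory. It assumes that all files are tab-delimited and UTF-8 encoded (this page only states that the additional-data file is tab-delimited); the function name and the file name in the usage comment are hypothetical.

```python
import csv
import io
import zipfile


def read_result_zip(zip_source):
    """Return {member name: list of rows} for every file in the archive.

    Assumes tab-delimited, UTF-8 text files; each row is a list of strings.
    """
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            with archive.open(name) as handle:
                text = io.TextIOWrapper(handle, encoding="utf-8", newline="")
                tables[name] = list(csv.reader(text, delimiter="\t"))
    return tables


# Hypothetical usage:
# tables = read_result_zip("abrp_download.zip")
# for name, rows in tables.items():
#     print(name, len(rows), "rows")
```

Because individual/sample IDs are in columns, the first row of each data set table is expected to hold the IDs, which can then be matched against the additional information file.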

Study Descriptions

This section allows users to choose the ABRP studies they wish to query. The current options are:

  1. Tung, J., Barreiro, L.B., Burns, M., Grenier, J.C., Lynch, J., Grieneisen, L., Altmann, J., Alberts, S.C., Blekhman, R., and Archie, E.A. 2015. Social networks predict gut microbiome composition in wild baboons. eLife 4: e05224.
    • This study uses metagenomic shotgun sequencing to profile the gut microbiota of 48 baboons in 2 social groups. Available data sets for download include:
      • The proportional representation of microbial species in each individual, estimated using the program MetaPhlAn 2.0 (Segata et al. 2012).
      • The proportional representation of KEGG enzyme orthologs in each individual, estimated using the program HUMAnN (Abubucker et al. 2012).
      • An alternative estimate of the proportional representation of microbial species in each individual, estimated based on mapping to de novo assembled genomes.
    • Raw read data for this project are also deposited in the NCBI Sequence Read Archive under PRJNA271618. Note that the sample IDs used in the archive are not identical to the sample IDs returned here.
  2. Tung, J., Zhou, X., Alberts, S.C., Stephens, M., and Gilad, Y. 2015. The genetic architecture of gene expression levels in wild baboons. eLife 4: e04729.
    • This study uses RNA-seq to profile genome-wide gene expression in 63 adult baboons. Available data sets for download include count data for expressed genes by individual, based on mapping raw reads to the Panu2.0 genome assembly.
    • Genotype data from this study are also available from the ABRP VCF download page.
    • Raw read data for this project are also deposited in the NCBI Gene Expression Omnibus under GSE63788. Note that the sample IDs used there are not identical to the sample IDs returned here.
  3. Lea, A.J., Tung, J., and Zhou, X. 2015. A flexible, efficient binomial mixed model for identifying differential DNA methylation in bisulfite sequencing data. PLoS Genetics 11: e1005650.
    • This study presents a new tool for analyzing bisulfite sequencing data. The method is applied to reduced representation bisulfite sequencing generated from baboon whole blood samples (n=50 individuals). Available data sets for download include:
      • Count data representing DNA methylation levels for each individual at 433,871 CpG sites in the genome (Panu 2.0; see Methods in the associated paper for information on site filtering). Count data are represented as the number of methylated reads over the total number of mapped reads (for each individual and genomic location). Count data were generated using the program BSMAP (Xi and Li 2009).
    • Raw read data for this project are also deposited in the NCBI Sequence Read Archive under SRP058411. Note that the sample IDs used in the archive are not identical to the sample IDs returned here.
  4. Lea, A.J., Altmann, J., Alberts, S.C., and Tung, J. 2015 (advance access). Resource base influences genome-wide DNA methylation levels in wild baboons (Papio cynocephalus). Molecular Ecology (advance access): doi: 10.1111/mec.13436
    • This study uses reduced representation bisulfite sequencing generated from baboon whole blood samples (n=69 individuals). The data were used to test the relationship between resource base and DNA methylation levels. Available data sets for download include:
      • Count data representing DNA methylation levels for each individual at 535,996 CpG sites in the genome (Panu 2.0; see Methods and Supplementary Figure 1 in the associated paper for information on site filtering). Count data are represented as the number of methylated reads over the total number of mapped reads (for each individual and genomic location). Count data were generated using the program BSMAP (Xi and Li 2009).
    • Raw read data for this project are also deposited in the NCBI Sequence Read Archive under SRP058411. Note that the sample IDs used in the archive are not identical to the sample IDs returned here.
    • Additional data sets are available in the Dryad database (doi:10.5061/dryad.2d80m): (i) the output files from our main analyses performed in the program MACAU (Lea et al. 2015), (ii) cell type proportion data collected from blood smears.

Data Type Descriptions

This section allows users to choose the data types they wish to obtain. The current options include:

  • Microbiome_metaphlan_species: proportional microbial species representation from microbiome profiling, estimated using MetaPhlAn 2.0 (Segata et al. 2012).
  • Microbiome_humann_kegg: proportional KEGG ortholog representation from microbiome profiling, estimated using HUMAnN (Abubucker et al. 2012).
  • Microbiome_denovo_species: proportional microbial species representation from microbiome profiling, estimated based on mapping raw reads to de novo assembled genomes.
  • MRNA_seq: RNA-seq count data by gene, obtained from poly-A purified RNA.
  • DNA_methylation_RRBS: RRBS count data (the number of methylated reads and the total number of mapped reads at a given chromosome and base-pair position). Counts are estimated from genomic DNA that has been digested with a restriction enzyme, treated with sodium bisulfite, sequenced, and mapped to a reference genome.
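
As an illustration of the DNA_methylation_RRBS count data, a methylation level for one individual at one site can be derived as the number of methylated reads divided by the total number of mapped reads. The Python sketch below shows that calculation only; the function name is hypothetical, and returning None for sites with no mapped reads is an assumption, not a description of how the portal or the associated papers handle such sites.

```python
def methylation_level(methylated_reads, total_reads):
    """Proportion of mapped reads that are methylated at one site.

    Returns None when no reads mapped to the site, because the
    proportion is undefined in that case (an assumption of this sketch).
    """
    if methylated_reads < 0 or total_reads < 0:
        raise ValueError("read counts must be non-negative")
    if methylated_reads > total_reads:
        raise ValueError("methylated reads cannot exceed total mapped reads")
    if total_reads == 0:
        return None
    return methylated_reads / total_reads


# Example: 3 methylated reads out of 12 mapped reads.
# methylation_level(3, 12)
```

Keeping the two counts rather than only the proportion preserves information about coverage, which count-based models such as the one described in study 3 rely on.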

Age-Sex Characteristics Help

This section allows users to filter the data download by sex (male or female) or by age class (infant, juvenile, or adult). N.B. If no values are selected, the data sets will be returned complete, without further filtering by individual.

  • Sex is assigned through direct observation.
  • Age class is assigned using near-daily collection of demographic data and assessment of maturational markers (testicular enlargement in males and menarche in females).
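
The filtering rule described above (an empty selection returns everything) can also be applied locally to a downloaded table. The Python sketch below is an illustration only: the field names sex and age_class, the value codes in the usage comment, and the choice to combine the two filters with AND are all assumptions that this page does not specify.

```python
def filter_individuals(records, sexes=None, age_classes=None):
    """Keep records that match the selected sexes and age classes.

    A missing or empty selection for a field means no filtering on
    that field, so calling with no selections returns every record.
    """
    selected = []
    for record in records:
        if sexes and record["sex"] not in sexes:
            continue
        if age_classes and record["age_class"] not in age_classes:
            continue
        selected.append(record)
    return selected


# Hypothetical usage with made-up value codes:
# adults = filter_individuals(records, age_classes={"adult"})
# adult_females = filter_individuals(records, sexes={"F"}, age_classes={"adult"})
```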

Additional Amboseli Data

This section gives users the option to request additional data on individuals represented in ABRP studies, which are returned in a separate tab-delimited text file. If additional data are not requested, only age class at the time of sampling (adult, juvenile, infant) and sex information are returned. If additional data are requested, age class and sex data are returned along with the following information (note that for non-Amboseli animals, many of these values will be missing):

  • Birth: birthdate (YYYY-MM-DD).
  • Bstatus: an estimate of the accuracy of the birthdate. Bstatus = 0 means the birthdate is known to within a few days (i.e., captured through near-daily monitoring). Other values capture uncertainty around the assigned birthdate, up to a maximum of 4 years. For example, bstatus = 1 corresponds to +/- 1 year of uncertainty around the estimated birthdate. Most cases in which bstatus does not equal 0 reflect birthdates for males who immigrated into our study population as adults. A value of 9 corresponds to an unknown birthdate (for example, if data were generated from a sample collected by other investigators or from a dismembered body part); in these cases, the birthdate will be NULL.
  • Collection_date: collection date for the biological sample from which genomic data were generated. Takes the value NULL if collection date is unknown, which also results in a NULL value for age of the animal at time of sampling (age_sample).
  • Age_sample: age of the individual on the date of sample collection, in years. This value is NULL for individuals with bstatus = 9.
  • Entrydate: date the individual was first observed by the ABRP. Entry date is equal to birthdate for individuals observed since birth as part of long-term monitoring. Entry date is equal to immigration date for males who immigrated into the study population as adults. Entry date is equal to the date that long-term monitoring began in the case of individuals who were adult members of social groups at the time those social groups came under observation (see “entrytype” values).
  • Entrytype: the nature of the first observation of a given animal: “B” for birth, “I” for immigration, or “O” for observation (the first date the individual was observed; only used when new social groups were added to the regular observation schedule).
  • Depart_date: last observation of the individual by the ABRP, due to death, emigration, or censored data.
  • Depart_type: the nature of the last observation of a given animal: 0 = right-censored (the animal is still alive in the population); 1 = individual is deceased; 2 = censored, event-driven (e.g., male dispersal); 3 = censored, observer-driven (e.g., group dropped from regular observation).
  • Mature_date: date the individual reached reproductive maturation (menarche for females, testicular enlargement for males).
  • Mature_date_status: the nature of the mature_date: “O” = maturation was noted through direct observation “on” the individual's mature_date; “B” = the individual was known to be mature “by” the mature_date (usually reserved for immigrant males who were already mature when they were first observed).
  • Mom_id: unique ID for the individual’s mother, if known.
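
To illustrate how the date fields above relate to each other, the Python sketch below recomputes an age at sampling from the Birth and Collection_date values and propagates NULL values in the way described for Age_sample. It is a sketch under stated assumptions: missing values are taken to appear as the literal text NULL, and dividing by 365.25 days per year is an assumption that may not match how the ABRP computes Age_sample. The function names are hypothetical.

```python
from datetime import date


def parse_date(value):
    """Parse a YYYY-MM-DD string; return None for NULL or empty values."""
    if value is None or value.strip() in ("", "NULL"):
        return None
    return date.fromisoformat(value.strip())


def age_at_sample(birth, collection_date):
    """Age in years on the collection date, or None if it cannot be computed.

    A NULL birthdate (the bstatus = 9 case) or a NULL collection date
    both yield None, mirroring the NULL rule for Age_sample.
    """
    birth_date = parse_date(birth)
    sampled = parse_date(collection_date)
    if birth_date is None or sampled is None:
        return None
    # Assumption: years are approximated as 365.25 days.
    return (sampled - birth_date).days / 365.25


# Hypothetical usage:
# age_at_sample("2000-01-01", "2010-01-01")
# age_at_sample("NULL", "2010-01-01")
```

For individuals with bstatus greater than 0, any age computed this way carries the same uncertainty as the birthdate itself.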

ABRP Genomics VCF Downloads

Genotype data sets for Amboseli animals can be obtained as VCF files. Some data sets use Panu2.0 as the reference genome; others are based on Pcyn1.0, our own in-house assembly for yellow baboon.


Note: VCF files downloaded prior to April 19, 2016 may contain sample labeling errors. This bug has been corrected; for accurate cross-referencing to other Amboseli data sets available here, please re-download the appropriate VCF.


Currently available data sets with Panu2.0 as reference are:

  1. Tung, J., Zhou, X., Alberts, S.C., Stephens, M., and Gilad, Y. 2015. The genetic architecture of gene expression levels in wild baboons. eLife 4: e04729. http://elifesciences.org/content/4/e04729
    • Genotype calls for 64,432 SNPs based on RNA-seq data for 63 adult baboons.


Currently available data sets with Pcyn1.0 as reference are:

  1. Snyder-Mackler et al. 2016. Efficient genome-wide sequencing and low coverage pedigree analysis from non-invasively collected samples. Genetics.
    • Two different VCF files are available for this study:
      1. Genotype calls for 54 baboons using capture-based enrichment from fecal-derived DNA. This file also includes genotypes called from high-coverage sequencing of blood-derived DNA from two of these baboons, HAP and LIT.
      2. Low-coverage genotype calls for 52 baboons using capture-based enrichment from fecal-derived DNA. This file includes only sites that are ≥ 10 Kb apart and have minor allele frequency > 5%.
  2. Wall et al. 2016. Genome-wide ancestry and divergence patterns from low-coverage sequencing data reveal a complex history of admixture in wild baboons.
    • Genotype calls for 14 million SNPs in 47 baboons, based on DNA sequencing. This file only includes variants that were called with a GQ ≥ 30 in the three highest-coverage samples, HAP, SWY, and SWA.
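
As an illustration of a genotype-quality criterion such as the GQ ≥ 30 filter mentioned above, the Python sketch below scans VCF text and reports sites where every named sample meets a minimum GQ. It is not the pipeline used in any of the studies listed here. It assumes that the VCF carries a per-sample GQ value in the FORMAT column and that sample names in the header match the names passed in; the function name and the file name in the usage comment are hypothetical.

```python
def sites_passing_gq(vcf_lines, samples, min_gq=30):
    """Yield (chrom, pos) for records where every named sample has GQ >= min_gq."""
    sample_columns = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample names start at the tenth column of the header line.
            sample_columns = {name: i for i, name in enumerate(fields) if i >= 9}
            continue
        keys = fields[8].split(":")  # FORMAT column lists the per-sample keys
        if "GQ" not in keys:
            continue  # record has no genotype quality to test
        gq_index = keys.index("GQ")
        passed = True
        for name in samples:
            values = fields[sample_columns[name]].split(":")
            # Trailing values may be omitted and "." marks a missing value.
            if gq_index >= len(values) or values[gq_index] == ".":
                passed = False
                break
            if float(values[gq_index]) < min_gq:
                passed = False
                break
        if passed:
            yield fields[0], int(fields[1])


# Hypothetical usage with an uncompressed VCF file:
# with open("example.vcf") as handle:
#     for chrom, pos in sites_passing_gq(handle, ["HAP", "SWY", "SWA"]):
#         print(chrom, pos)
```

For large files, an established VCF toolkit would normally be a better choice than hand-written parsing; the sketch is only meant to make the filtering criterion concrete.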
