ANGSD: Analysis of Next Generation Sequencing Data

BMC Bioinformatics

Table 1 Overview of analyses implemented in ANGSD

Analysis	Basis	Reference
Contamination estimates based on the X chromosome	BC	[19] ^b
Type-specific error estimation by simultaneously estimating allele frequencies and genotype likelihoods	GL	[10]
Type-specific error estimation based on an outgroup and a high-quality genome	BC	[20] ^ab
Genotype likelihoods (GL) (diploids)	BC/Seq	[6],[8],[10],[15]
Allele frequencies for a site	BC/GL/GP	[21] ^b, [10]
SNP discovery via a likelihood ratio test (LRT) of whether the allele frequency differs from zero	GL	[10]
Genotype posteriors (GP), which can be used to call genotypes by specifying a cutoff	GL/SAF	[9],[10]
Sample allele frequencies (SAF): the probability of all read data given the sample allele frequency	GL/GP	[9] ^b
Population differentiation statistics (F_ST)	SAF	[14] ^ac
Population structure via principal component analysis (PCA)	GP	[14] ^ac
Admixture analysis of NGS data (NGSadmix)	GL	[22] ^ab
Detection of ancient admixture (ABBA-BABA/D-statistics)	BC	[20] ^b
Estimation of SFS (1D)	SAF	[9] ^ab
Estimation of SFS (2D)	SAF
Selection scans and neutrality tests (e.g. θ estimators and Tajima's D)	SAF	[12] ^ab
Estimation of individual and site-wise inbreeding coefficients; also MAF and GP estimation for inbred individuals	GL	[13] ^abc
Allele-frequency-based association for case/control data	GL	[10]
Association score test in a generalized linear model framework for both quantitative and case/control data while allowing for additional covariates	GL-GP	[11] ^b

Table of the supported analyses in ANGSD. ^a indicates methods that require a secondary program in the ANGSD package, ^b indicates methods for which ANGSD is the de facto implementation, and ^c indicates user-supplied extensions to ANGSD. The basis for each analysis is the sequencing data (Seq), base counts (BC), genotype likelihoods (GL), sample allele frequencies (SAF) or genotype posterior probabilities (GP).
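Most rows in the table build on genotype likelihoods computed from base counts and quality scores, and on genotype posteriors derived from them. The sketch below illustrates that chain for a single diploid individual at one site. It assumes a simple GATK-style model (reads independent, each read drawn from either chromosome with probability 1/2, errors spread evenly over the three other bases) and a Hardy-Weinberg prior from a known minor allele frequency. ANGSD offers several GL models [6],[8],[10],[15], and this is not claimed to be its code. The function names (`genotype_log_likelihoods`, `genotype_posteriors`, `call_genotype`) and the 0.95 cutoff are hypothetical choices made for illustration.

```python
import math

BASES = "ACGT"
# The 10 unordered diploid genotypes, alleles written in ACGT order ("AC", not "CA").
GENOTYPES = [a + b for i, a in enumerate(BASES) for b in BASES[i:]]


def base_prob(observed, allele, qual):
    """P(observed base | true allele) for a phred-scaled base quality.

    Assumption: a sequencing error is equally likely to produce
    any of the three other bases.
    """
    err = 10 ** (-qual / 10)
    return 1.0 - err if observed == allele else err / 3.0


def genotype_log_likelihoods(bases, quals):
    """Return {genotype: log P(reads | genotype)} for one diploid individual."""
    gls = {}
    for g in GENOTYPES:
        ll = 0.0
        for b, q in zip(bases, quals):
            # Each read comes from either chromosome with probability 1/2;
            # reads are treated as independent (a simplifying assumption).
            ll += math.log(0.5 * base_prob(b, g[0], q) + 0.5 * base_prob(b, g[1], q))
        gls[g] = ll
    return gls


def genotype_posteriors(gls, major, minor, freq):
    """Posterior probabilities of the three genotypes at a biallelic site.

    Uses a Hardy-Weinberg prior built from the minor allele frequency `freq`.
    """
    if not 0.0 < freq < 1.0:
        raise ValueError("freq must lie strictly between 0 and 1")
    het = "".join(sorted(major + minor, key=BASES.index))
    genos = [major + major, het, minor + minor]
    priors = [(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2]
    log_post = [gls[g] + math.log(p) for g, p in zip(genos, priors)]
    # Normalise on the log scale to avoid underflow at high depth.
    m = max(log_post)
    weights = [math.exp(x - m) for x in log_post]
    total = sum(weights)
    return {g: w / total for g, w in zip(genos, weights)}


def call_genotype(posteriors, cutoff=0.95):
    """Return the most probable genotype, or None if its posterior is below `cutoff`."""
    best = max(posteriors, key=posteriors.get)
    return best if posteriors[best] >= cutoff else None


if __name__ == "__main__":
    gls = genotype_log_likelihoods("AAAGGG", [30] * 6)
    post = genotype_posteriors(gls, major="A", minor="G", freq=0.5)
    print(call_genotype(post))  # AG
```

Returning `None` below the cutoff mirrors the table's note that posteriors "can be used to call genotypes by specifying a cutoff": at low depth (for example a single read) no genotype may reach the threshold, and downstream analyses based directly on GL or SAF avoid making that hard call at all.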

ISSN: 1471-2105