acdc – Automated Contamination Detection and Confidence estimation for single-cell genome data

Lux, Markus; Krüger, Jan; Rinke, Christian; Maus, Irena; Schlüter, Andreas; Woyke, Tanja; Sczyrba, Alexander; Hammer, Barbara

doi:10.1186/s12859-016-1397-7

BMC Bioinformatics

Table 1 Description of parameters for various techniques used in acdc


Method	Parameter description
Data pre-processing	Given a target of \(n\) data points (by default, \(n = 1000\)), the window width is fixed as \(w = \sum_{i} l_{i} / n\), where \(l_i\) is the length of contig \(i\). Default choices of \(\Delta w = w/2\) and \(k = 4\) (tetramer frequencies) are robust. For contigs with \(l_i < w\), the window width is taken as large as possible (\(w = l_i\)).
BH-SNE	The parameter θ=0.5 is a trade-off between speed and accuracy. We set the perplexity perp(n)=⌊log(n)²⌋. It can be seen as an effective neighborhood size that controls the graininess of clusters. A small number of data points n receives a small perplexity whereas with growing n the perplexity saturates.
DIP	The significance level \(\alpha\) is uncritical, since \(\alpha = 0\) holds in the large majority of significant cases. Furthermore, the DIP split threshold, i.e. the percentage of data points for which multimodality was detected, can be seen as a control of detection precision. We found a default value of \(t_{\text{dip}} = 0.001\) to work well across all tested data sets.
CC	The number of clusters found depends on the underlying graph. In acdc, the graph is constructed by connecting each data point to its \(k_{cc}\) mutual nearest neighbors. The parameter \(k_{cc}\) can be interpreted as the minimum number of data points contained in a separate cluster. To also detect very small contaminations, we use a default value of \(k_{cc} = 9\).
Bootstrapping	We set the number of bootstraps B=10. Setting B to a larger number yields more accurate confidence estimates at the cost of a longer runtime.
Kraken	The only parameter required by Kraken is the database to be used. It can be specified as a parameter to acdc as well.
RNAmmer	16S rRNA gene sequence prediction using RNAmmer does not require any parameters.
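
The default choices in Table 1 can be illustrated with a minimal Python sketch. It is not taken from the acdc source code, and all function names are hypothetical. It assumes that Δw is the shift between consecutive windows, that window sizes are rounded to whole bases, that log denotes the natural logarithm, and that Euclidean distance is used for the mutual nearest-neighbor graph; none of these details are stated in the table.

```python
import math


def window_params(contig_lengths, n_target=1000):
    """Window width w = sum(l_i) / n and shift dw = w / 2 (table defaults)."""
    w = sum(contig_lengths) / n_target
    return w, w / 2


def contig_windows(length, w, dw):
    """(start, end) windows over one contig; rounding to whole bases is an assumption."""
    w_int = max(1, round(w))
    dw_int = max(1, round(dw))
    if length <= w_int:
        # Contig shorter than w: use the largest possible window (w = l_i).
        return [(0, length)]
    return [(s, s + w_int) for s in range(0, length - w_int + 1, dw_int)]


def perplexity(n):
    """perp(n) = floor(log(n)^2); natural logarithm assumed."""
    return math.floor(math.log(n) ** 2)


def mutual_knn_components(points, k):
    """Count connected components of the mutual k-nearest-neighbor graph."""
    n = len(points)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # k nearest neighbors of every point (brute force, squared Euclidean).
    knn = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist2(points[i], points[j]))
        knn.append(set(others[:k]))

    # Union-find over edges that exist in both directions.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in knn[i]:
            if i in knn[j]:
                parent[find(i)] = find(j)

    return len({find(i) for i in range(n)})


if __name__ == "__main__":
    w, dw = window_params([600_000, 300_000, 100_000])
    print(w, dw)                         # window width and shift
    print(contig_windows(2500, w, dw))   # overlapping windows on one contig
    print(perplexity(1000))              # perplexity for n = 1000
```

The brute-force neighbor search is quadratic in the number of points and is only meant to make the role of k_cc concrete: a group with too few points to be mutually connected cannot form a component of its own.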


ISSN: 1471-2105
