A model-based approach to selection of tag SNPs

BMC Bioinformatics

Table 1 Model comparison using code length (average negative cross-log-likelihood). Code lengths are given in bits. For the Chromosome 7 data sets, SNP loci in ENCODE regions were sub-sampled to maintain uniform coverage of the chromosome.

Region	Pop.^(a)	SNPs^(b)	Markov	HMM-2^(c)	HMM-4D^(d)	GR-1^(e)	GR-2^(f)	LS-HOM^(g)	LS-HET^(h)
Chr. 7	CEU	42835	–	13334.9	9737.9	10898.7	8273.4	6441.5	6030.6
Chr. 7	YRI	42790	–	16783.0	13996.9	15252.2	13543.4	9082.5	8705.6
ENr112	CEU	1134	378.6	238.4	152.8	127.2	89.2	53.6	51.4
ENr112	YRI	1082	489.9	348.9	238.9	225.0	157.8	80.7	77.9
ENr131	CEU	1188	454.0	306.7	161.5	151.7	100.2	66.1	60.2
ENr131	YRI	1080	439.7	331.9	244.2	227.4	172.4	101.9	92.7
ENr113	CEU	1375	478.7	287.8	160.4	120.6	88.8	57.9	55.9
ENr113	YRI	1525	597.9	424.9	286.3	228.3	157.5	84.9	81.7
ENm010	CEU	706	261.9	187.2	106.6	106.1	83.1	58.8	56.7
ENm010	YRI	741	325.5	250.5	175.4	177.8	152.1	106.6	101.4
ENm013	CEU	1001	417.5	279.6	132.2	83.7	57.1	38.2	37.6
ENm013	YRI	1111	452.1	336.3	211.0	157.0	108.7	63.2	62.0
ENm014	CEU	1110	442.1	290.1	140.9	104.6	71.5	55.3	50.9
ENm014	YRI	1224	483.2	338.9	237.0	166.5	117.7	71.7	68.4
ENr321	CEU	782	243.1	143.2	90.1	90.4	68.9	48.0	46.0
ENr321	YRI	1123	458.9	325.5	232.1	199.1	145.0	77.1	73.9
ENr232	CEU	627	189.4	117.2	89.9	98.5	82.3	64.1	58.4
ENr232	YRI	833	345.9	268.6	206.2	198.8	161.5	102.2	93.4
ENr123	CEU	1183	453.8	294.2	175.2	114.6	76.3	47.5	46.8
ENr123	YRI	1055	436.6	291.5	206.1	161.3	118.5	67.7	66.7
ENr213	CEU	800	323.9	207.1	87.4	82.3	59.5	41.0	38.5
ENr213	YRI	1085	418.5	319.9	219.8	178.7	131.6	75.4	71.6

^(a): Population. ^(b): Number of SNPs that are actually polymorphic in the population considered. ^(c): Unconstrained two-state HMM. ^(d): HMM of Daly et al. ^(e, f): "Greedy" models with context sizes 1 and 2. ^(g, h): Homogeneous and heterogeneous Li and Stephens models.
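
The code-length measure in the caption can be illustrated with a minimal sketch. It assumes that the average is taken over held-out haplotypes and that each model reports a natural-log likelihood per haplotype; neither detail is stated in the table, and the function names `code_length_bits` and `bits_per_snp` are hypothetical. The second helper normalizes a table entry by the number of SNPs, which may make regions of different sizes easier to compare.

```python
import math


def code_length_bits(test_log_likelihoods):
    """Average negative log-likelihood of held-out haplotypes, in bits.

    test_log_likelihoods: natural-log likelihoods, one per held-out
    haplotype (an assumed input format, not taken from the paper).
    """
    if not test_log_likelihoods:
        raise ValueError("need at least one held-out haplotype")
    mean_nll_nats = -sum(test_log_likelihoods) / len(test_log_likelihoods)
    # Dividing by ln(2) converts nats to bits.
    return mean_nll_nats / math.log(2)


def bits_per_snp(code_length, n_snps):
    """Normalize a code length by the number of polymorphic SNPs."""
    if n_snps <= 0:
        raise ValueError("n_snps must be positive")
    return code_length / n_snps


# Toy input: two haplotypes, each assigned probability 1/8 by the model,
# so the code length is roughly 3 bits.
toy = [math.log(1 / 8), math.log(1 / 8)]
print(code_length_bits(toy))

# Table entry: ENr112, CEU, LS-HET is 51.4 bits over 1134 SNPs.
print(bits_per_snp(51.4, 1134))
```

A lower code length means the model assigns higher probability to unseen data, which is why the smaller values in the LS-HOM and LS-HET columns indicate a better fit.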

ISSN: 1471-2105