Classification of microarrays; synergistic effects between normalization, gene selection and machine learning

Önskog, Jenny; Freyhult, Eva; Landfors, Mattias; Rydén, Patrik; Hvidsten, Torgeir R

doi:10.1186/1471-2105-12-390

BMC Bioinformatics

Table 1 Overview of the data sets and the methods used in this study


Data set (D)	Classes*	No. of genes**
Alizadeh	DLBCL (68), other samples (65)	7806 (7430)
Finak	Epithelial (34), stromal tissue (32)	33491
Galland	Invasive NFPAs (22), non-invasive NFPAs (18)	40475 (40291)
Herschkowitz	High ER expression (58), low ER expression (46)	19718
Jones	Cancerous samples (72), non-cancerous samples (19)	40233 (39746)
Sørlie	High ER expression (55), low ER expression (18)	8033 (7734)
Ye	Metastatic (65), non-metastatic (22)	8911
Normalization (No)	Description
No 0	Raw data
No 1	Print-tip MA-loess, no background correction
No 2	Print-tip MA-loess, background correction
No 3	Global MA-loess, no background correction
No 4	Global MA-loess, background correction
Gene selection (G)	Fixed parameters
T-test	Two-sided
Relief	Threshold = 0, nosample = # obs. in data set
Paired distance	Euclidean distance
Number of genes (N)	2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 100, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1000
Machine learning (M)	Description, Fixed parameters	Optimized parameters
DT Gini	Decision tree, Splitting index = Gini
DT Information	Decision tree, Splitting index = Information
NN One layer	Neural Network, one hidden layer, decay = 0.001, rang = 0.1, maxit = 100	size = [2-5]
NN No layer	Neural Network, no hidden layer, decay = 0.001, rang = 0.1, maxit = 100, skip = TRUE, size = 0
SVM Linear	Support Vector Machine, linear kernel, type = nu-svc, cross = 10, nu = 0.2, scaled = FALSE
SVM Poly2	Support Vector Machine, polynomial kernel, deg 2, type = nu-svc, cross = 10, nu = 0.2, scaled = FALSE
SVM Poly3	Support Vector Machine, polynomial kernel, deg 3, type = nu-svc, cross = 10, nu = 0.2, scaled = FALSE
SVM Rb	Support Vector Machine, radial basis kernel, type = nu-svc, cross = 10, nu = 0.2, scaled = FALSE	sigma = [2^-14, 2^14]

Acronyms defined here are used throughout the paper. "Fixed parameters" were held at the stated values, while "Optimized parameters" were tuned by grid search in the inner cross-validation. *The number of samples belonging to each class is given in parentheses. **Dimensions after background-corrected normalization (No 2 and No 4) are given in parentheses.
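
The distinction between fixed and optimized parameters can be illustrated with a minimal sketch of nested cross-validation, in which a grid over sigma is searched using only the training samples of each outer fold. This is not the authors' code. The toy RBF-weighted voting classifier stands in for the SVM, and the function names, the fold counts (outer_k, inner_k) and the grid spacing (powers of two in steps of 2) are all assumptions made for illustration; only the range 2^-14 to 2^14 comes from the table.

```python
import math
import random
from statistics import mean


def sigma_grid(lo_exp=-14, hi_exp=14, step=2):
    """Candidate sigmas 2^lo_exp .. 2^hi_exp; the step size is an assumption."""
    return [2.0 ** e for e in range(lo_exp, hi_exp + 1, step)]


def k_folds(indices, k, seed):
    """Shuffle the given indices and deal them into k near-equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def fit(X, y, sigma):
    """Toy stand-in for a classifier (not an SVM): memorise the training data."""
    return (X, y, sigma)


def predict(model, x):
    """Sum RBF weights exp(-sigma * ||x - x_i||^2) per class; largest sum wins."""
    X, y, sigma = model
    votes = {}
    for xi, yi in zip(X, y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        votes[yi] = votes.get(yi, 0.0) + math.exp(-sigma * d2)
    return max(sorted(votes), key=votes.get)


def accuracy(X, y, train_idx, test_idx, sigma):
    """Train on train_idx with the given sigma and score on test_idx."""
    model = fit([X[i] for i in train_idx], [y[i] for i in train_idx], sigma)
    hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
    return hits / len(test_idx)


def nested_cv(X, y, grid, outer_k=5, inner_k=3, seed=0):
    """Return (mean outer accuracy, sigma chosen in each outer fold)."""
    outer_scores, chosen = [], []
    for fold_no, test_idx in enumerate(k_folds(range(len(y)), outer_k, seed)):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        # Inner folds are built from the outer training samples only,
        # so the outer test samples never influence the choice of sigma.
        inner_folds = k_folds(train_idx, inner_k, seed + 1 + fold_no)

        def inner_score(sigma):
            scores = []
            for val_idx in inner_folds:
                val = set(val_idx)
                fit_idx = [i for i in train_idx if i not in val]
                scores.append(accuracy(X, y, fit_idx, val_idx, sigma))
            return mean(scores)

        best = max(grid, key=inner_score)  # grid search over sigma
        chosen.append(best)
        outer_scores.append(accuracy(X, y, train_idx, test_idx, best))
    return mean(outer_scores), chosen


if __name__ == "__main__":
    # Two well-separated one-dimensional classes, purely synthetic.
    X = [[0.1 * i] for i in range(10)] + [[5.0 + 0.1 * i] for i in range(10)]
    y = [0] * 10 + [1] * 10
    print(nested_cv(X, y, sigma_grid()))
```

A fixed parameter such as nu = 0.2 would simply be passed unchanged to every fit, whereas an optimized parameter such as sigma may take a different value in each outer fold, which is why the sketch returns the list of chosen values alongside the accuracy estimate.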


ISSN: 1471-2105
