Table 1 Biological datasets used for RMRCM performance assessment

From: Correlated mutations via regularized multinomial regression

| Dataset | Nsets (a) | Nprot (b) | Ncol (b) | Structure (c) | Reference |
|---|---|---|---|---|---|
| PFAM | 1256 | 200/3975 | 10/521 | Several (d) | [31] |
| CASP9 | 28 | 5/501 | 21/227 | Several (e) | www.predictioncenter.org |
| MADS | 12 | 34/339 | 78/218 | 1n6j | [36] |
| Response regulators | 1 | 1433 | 186 | 1xhe | [16] |
| CDD | 36 | 125/1922 | 34/411 | Several (f) | [8] |
| SK-RR | 1 | 4934 | 184 | 2c2a, 1pey, 1f51 | [16] |
| PDZ-peptide | 1 | 2385 | 162 | 1n7f | [33] |

  1. a Nsets, number of separate multiple sequence alignments. Four datasets consist of several multiple sequence alignments, each of which is analyzed separately. For these, the numbers of proteins and columns are given as the minimum/maximum found across the alignments in the set.
  2. b Nprot, number of proteins; Ncol, number of columns in the multiple sequence alignment.
  3. c PDB identifier of structure used to compare predicted residue contacts.
  4. d Obtained via PFAM.
  5. e Obtained via www.predictioncenter.org.
  6. f See ref. [8].