- Research article
- Open Access
Evaluation of 3D-Jury on CASP7 models
BMC Bioinformatics volume 8, Article number: 304 (2007)
3D-Jury, the structure prediction consensus method publicly available in the Meta Server http://meta.bioinfo.pl/, was evaluated using models gathered in the 7th round of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7). 3D-Jury is an automated expert process that generates protein structure meta-predictions from sets of models obtained from partner servers.
The performance of 3D-Jury was analysed in three aspects. First, we examined the correlation between the 3D-Jury score and a model quality measure: the number of correctly predicted residues. The 3D-Jury score was shown to correlate significantly with the number of correctly predicted residues; the correlation is strong enough to be used for prediction. 3D-Jury was also found to improve upon the competing servers' choice of the best structure model in most cases. The value of the 3D-Jury score as a generic reliability measure was also examined. We found that the 3D-Jury score separates bad models from good models better than the reliability score of the original server in 27 cases, and falls short of it in only 5 cases, out of a total of 38. We report the release of a new Meta Server feature: instant 3D-Jury scoring of uploaded user models.
The 3D-Jury score continues to be a good indicator of structural model quality. It also provides a generic reliability score, which is especially important for models that were not assigned one by the original server. Individual structure modellers can also benefit from the 3D-Jury scoring system by testing their models with the new instant scoring feature http://meta.bioinfo.pl/compare_your_model_example.pl available in the Meta Server.
The number of protein structure prediction servers has increased over the past years. The use of many different methods to predict the structure of a protein is now state-of-the-art in protein structure prediction. However, the number of available servers, combined with the number of models each returns, exceeds what a human researcher can realistically inspect. Fortunately, structure prediction meta-servers address this problem: they gather models from various other servers and employ automated processes, successfully applied by human experts, in order to deliver a correct prediction. Since existing structure prediction servers are constantly upgraded while new servers appear, it is necessary to re-evaluate the fitness of the aforementioned expert processes.
The latest, 7th round of the Critical Assessment of Techniques for Protein Structure Prediction has provided us with a fair number of structure prediction server models. With the help of the Structure Prediction Meta Server, we have evaluated the servers returning these models using the same protocols as in previous Livebench experiments; the results are available on-line.
Standard evaluation methods take into account the first (top ranked) model of the prediction servers. The Meta Server assigns a new reliability score to each model using 3D-Jury . This score can be used to re-rank the models and thus affect the evaluation results. The aim of the present work was to verify the continued applicability of this model ranking method, focusing on the version available on-line. We were interested in answering the following three questions: Can we use 3D-Jury to estimate model quality? Does 3D-Jury select a model more accurate than the choice of the generating server? Could the 3D-Jury score be used as a generic model reliability score?
Results and Discussion
3D-Jury score correlates with the number of correctly predicted residues
The correlation of the 3D-Jury score (Jscore) with model quality is of fundamental importance to the operation of the Meta Server. Therefore we first examined the correlation of the 3D-Jury score returned by the default on-line version of 3D-Jury, 3J1,A (see Methods: 3D-Jury operating modes), with the number of correctly predicted residues.
3D-Jury scores correlate with the number of correctly predicted residues: the correlation coefficient is 0.95. A linear model (LM1) is presented in Figure 1. The residual error, 20.15, is low enough to enable meaningful estimation of the number of correctly positioned residues.
A better model (LM2) can be obtained by fitting to the [30, 100) 3D-Jury score range only. This range represents difficult targets. Figure 2 shows the linear model obtained. The residual error is 13.37, offering narrower, better prediction intervals for the number of correctly positioned residues.
As an example of the use of LM2, assume that a model has a 3D-Jury score of 44.5. We can expect 13 to 82 well-positioned residues in this model at the 99% confidence level, and 21 to 74 at the 95% confidence level. For a score of 59, the 99% prediction interval for the number of correct residues is 26–94; the 95% prediction interval is narrower: 34–86.
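The interval arithmetic behind these numbers can be sketched in a few lines of Python, assuming the normal approximation ŷ ± z·s with LM2's residual error of 13.37; the point prediction itself would come from the fitted line in Figure 2 (its coefficients are not reproduced here), so the value 47.5 below is simply the midpoint of the intervals quoted above:

```python
from statistics import NormalDist

# Residual standard error of LM2, as reported in the text.
S_RESID = 13.37

def prediction_interval(point_prediction, confidence=0.95, s=S_RESID):
    """Approximate prediction interval for the number of correctly
    positioned residues, given a point prediction from LM2.

    Uses the normal approximation y_hat +/- z * s; a full interval
    would also include the leverage terms of the fit, omitted here.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * s
    return point_prediction - half_width, point_prediction + half_width

# A Jscore of 44.5 corresponds to a point prediction of about 47.5
# correct residues (hypothetical; midpoint of the quoted intervals).
lo, hi = prediction_interval(47.5, confidence=0.99)  # ~ (13, 82)
```

Reassuringly, the same approximation at the 95% level reproduces the 21–74 interval quoted above.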
A key to which residues are likely to be well-positioned is provided on the model-centred 3D-Jury page, accessible by selecting a model in the Model column of the main 3D-Jury page. Here, residues that are likely to be correctly positioned would have grey background at the corresponding positions of most of the other aligned models, forming a column of grey background.
3D-Jury improves overall server prediction results
We examined whether 3D-Jury could improve overall server performance by selecting a better model when multiple models are returned by a prediction server. We tested four operating modes of 3D-Jury: 3J1,A – uses one model from each of the default servers (the mode typical for on-line predictions); 3Ja,A – all models of the default servers; 3J1,C – one model of all servers; 3Ja,C – all models of all servers. We computed the MaxSub score (MaxS) of 25,215 models for this analysis. Four 3D-Jury scores (Jscore) were also computed for each model, corresponding to the four 3D-Jury operating modes listed above. The servers' choice of the best model was evaluated by summing the MaxS values of the first models returned for each target. The four 3D-Jury variants' choices were evaluated by summing the MaxS values of the models with the highest respective 3D-Jury score for each target. We also summed the highest MaxS score for each target, giving an upper limit to possible improvements. Results for 3J1,A are presented in Table 1, column Q%. The grand totals of MaxS reveal the order of the five model ranking approaches: 3Ja,C (20,006) > 3J1,C (19,983) > 3J1,A (19,690) > 3Ja,A (19,655) > first server model (19,039); the sum of MaxS over the highest scoring models is 20,718. Table 1, column Nj shows the number of targets where 3J1,A made a better choice of the best model than the original server. In the case of pmodeller6 and 3dpro, 3D-Jury 3J1,A predicts more targets better, but its overall performance is slightly worse than the original servers'. The reason is that 3J1,A's more numerous choices of better models were not good enough to counteract its loss of MaxSub score on the bad choices. In the case of inub and BasD the situation is reversed: 3J1,A improved fewer targets, but the net improvement is positive. For many servers the improvement – or worsening – is marginal (e.g. phyre-2 = 0.6%).
Nevertheless we can see that even in these cases there is room for a 4 – 5% improvement (Table 1, column Q%, values in parentheses). Moreover, it appears that for at least 14 targets every server fails to pick the best model.
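The comparison protocol can be summarized with a small sketch (toy data; the scores below are hypothetical, not taken from Table 1): for each target, sum the MaxS of the server's first model, of the model 3D-Jury ranks highest, and of the best model overall.

```python
# Toy illustration of the per-target sums compared in Table 1.
# Each model: (target, server_rank, jscore, maxs) -- hypothetical numbers.
models = [
    ("T1", 1, 55.0, 4.1), ("T1", 2, 61.0, 5.0),
    ("T2", 1, 80.0, 7.2), ("T2", 2, 40.0, 3.0),
]
targets = sorted({m[0] for m in models})

def total(key):
    """Sum, over targets, the MaxS of the model that maximizes `key`."""
    return sum(max((m for m in models if m[0] == t), key=key)[3]
               for t in targets)

sum_server = total(lambda m: -m[1])  # server's first (rank-1) model
sum_jury   = total(lambda m: m[2])   # model with the highest Jscore
sum_best   = total(lambda m: m[3])   # best possible choice (upper bound)
```

By construction sum_server <= sum_best and sum_jury <= sum_best; whether sum_jury exceeds sum_server is exactly what Table 1 measures per server.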
3D-Jury scores as generic model reliability scores
In order to assess the advantage of using 3D-Jury scores as generic reliability scores, we conducted a receiver operating characteristic (ROC) analysis adapted for CASP and Livebench evaluation. The analysis shows how well a reliability score separates good models from bad ones, in terms of the average number of good models seen before encountering 1 to 11 bad models. We compared the 3D-Jury scores returned by the on-line version 3J1,A to the reliability scores of the original servers, when available. Results are shown in Table 2. The 3D-Jury score exceeds the original server score in 27 cases and falls short of it in only 5 cases out of the 38 analysed. The exceptions are pmodeller6, pcons6, ffas03, inub and shub.
The J0 scores listed in Table 2 indicate the lowest 3D-Jury score seen before a bad model was encountered from the indicated server. In other words, no bad model above J0 score was seen in the test model set of the server. J0 scores are of practical value: they can be used as server-specific score thresholds, since a score above J0 is likely to indicate a good model.
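The J0 threshold can be computed with a few lines (a sketch; `scored` is assumed to pair each of a server's models with its good/bad label as defined in Methods):

```python
def j0(scored):
    """Lowest Jscore seen before the first bad model, scanning one
    server's models from highest to lowest score.

    scored: list of (jscore, is_good) pairs.
    Returns None if the highest scoring model is already bad.
    """
    j0_score = None
    for jscore, is_good in sorted(scored, reverse=True):
        if not is_good:
            break          # first bad model encountered; stop here
        j0_score = jscore  # lowest score of an unbroken run of good models
    return j0_score
```

A model scoring above this server-specific threshold is then likely, by the logic above, to be a good one.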
3D-Jury scoring of user models
In order to encourage model selection and refinement using 3D-Jury, we introduced a new feature: instant 3D-Jury scoring of user models. This feature, available for any completed job by selecting the job in the Queue and uploading a model, enables the user to score a set of models and obtain a ranking based on the 3D-Jury score. Pop-up hints and an on-line tutorial, available from the job page, offer help with this new feature.
In this report we present the evaluation of 3D-Jury on models gathered in CASP7. We found good correlation between the 3D-Jury score and a model quality measure: the number of correctly predicted residues. This correlation can be used to predict important model features such as the number of correctly positioned residues. Using Figure 2, 3D-Jury scores can be translated to the estimated number of correctly predicted residues. We plan to upgrade the on-line 3D-Jury to provide the 90%, 95% and 99% prediction intervals for the number of correctly predicted residues automatically.
3D-Jury, in general, also appears to boost server predictions by identifying better models. Our results show that 3D-Jury performs best when all models of all servers are used to calculate the Jscore. This option, however, is not feasible in the Meta Server since many of the servers participating in CASP7 are not currently available on-line. Nevertheless, 3J1,A, the on-line default, presents a reasonable choice. We found that 3D-Jury scores can be used as generic reliability scores, an especially important feature for models that are not provided with such values. We have also extracted server-wise 3D-Jury score thresholds to help identify reliable models. We report the release of a new Meta Server feature: instant 3D-Jury scoring of uploaded user models.
3D-Jury remains a valuable tool in the hands of protein structure modellers. Its ability to pinpoint the best server models is supported by the results of our analysis.
Test model set
In order to assess 3D-Jury we downloaded the complete set of server structure predictions from the Protein Structure Prediction Center. Predictions from our partner servers (BasD, ffas03, inub, mgenthreader, ORFeus-2, pdbblast and 3D-PSSM) were added if missing.
Servers that predicted fewer than two targets and/or returned only one model for each target were excluded from the server model ranking tests (reported in Table 1). The resulting set contains 25,215 models for 85 targets from 59 servers – an average of 5 models per server and target.
Models with Jscore = 0 were excluded from all correlation and regression analyses.
Server reliability scores (Rscore) that anti-correlate with model quality were multiplied by -1.
Model quality measures
The MaxSub score and the number of correctly predicted residues (defined below) were used to measure the quality of models. MaxSub returns a score between 0.0 (incorrect prediction) and 1.0 (perfect prediction). In this study the score was multiplied by 10.0, as is customary on the 3D-Jury web pages. We say that models with MaxS > 0 are good, while models with MaxS = 0 are bad.
The number of correctly predicted residues is the number of Cα atoms that are predicted within 3.5 Å of their respective locations in the solved structure, as reported by the MaxSub tool operating on the Cα atoms of the compared structures.
3D-Jury model scoring
The 3D-Jury score of a model M is calculated by first comparing M to a set of other models available to the system for the same target. The way these other models are selected is a tunable parameter of 3D-Jury. M is compared to each selected model, and a pairwise similarity score (S_M,i, for pair i) is assigned that is equal to the number of respective Cα atoms that are within 3.5 Å of each other after optimal superposition of the structures represented by their Cα atoms. MaxSub is used to carry out this step. In case a pairwise similarity score falls below a certain cutoff value, it is set to zero. The 3D-Jury score (Jscore) of model M is the sum of its pairwise similarity scores divided by the number of these scores (n) plus one: Jscore(M) = (Σi S_M,i) / (n + 1).
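Putting the definition together, a minimal sketch of the scoring step (the pairwise scores themselves are taken as input here; in the real system each S_M,i is MaxSub's count of Cα pairs within 3.5 Å after optimal superposition):

```python
def jscore(pairwise_scores, cutoff=40.0):
    """3D-Jury score of a model M from its pairwise similarity scores
    S_M,i against the other selected models for the same target.

    Scores below the cutoff are set to zero; the result is the sum of
    the thresholded scores divided by n + 1, where n is the number of
    pairwise scores.
    """
    thresholded = [s if s >= cutoff else 0.0 for s in pairwise_scores]
    n = len(thresholded)
    return sum(thresholded) / (n + 1)
```

With the default cutoff of 40 used in this analysis, a model with pairwise scores of 50, 30 and 60 would score (50 + 0 + 60) / 4 = 27.5.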
3D-Jury offers three tunable parameters: the list of servers to draw models from for pairwise score calculation; the method of server model selection (applicable in case of multiple available models; the name of each method is shown in italics): first model, the most similar one (in terms of S_M,i), or all models; and the pairwise similarity score cutoff. In this analysis we used the publicly available BasD, ffas03, inub, mgenthreader, ORFeus-2, pdbblast and 3D-PSSM as default servers and a constant similarity cutoff of 40 in order to simulate regular on-line use of the service.
3D-Jury operating modes
The four operating modes of 3D-Jury used in this report are: 3J1,A – uses one model from each of the default servers (the mode typical for on-line predictions); 3Ja,A – all models of the default servers; 3J1,C – one model of all servers; 3Ja,C – all models of all servers.
Measures for comparing model selection methods
Q% – 3D-Jury vs. original server, computed from:
∑MaxS_j – sum of MaxSub scores of the models selected by 3J1,A
∑MaxS_s – sum of MaxSub scores of the server's first models

The value in parentheses – 'best model' vs. original server – is computed from:
∑max(MaxS) – sum of the server's highest (best) MaxSub scores per target
∑MaxS_s – sum of MaxSub scores of the server's first models
Receiver operating characteristic (ROC) analysis
We performed a ROC analysis, adapted for CASP and Livebench model evaluation, for each server. Server models were ordered by the original reliability score (Rscore, when available) or by the 3D-Jury score (Jscore). The highest scoring models for each target were collected into separate sets, M_R and M_J, corresponding to whether the Rscore or the Jscore was used for ordering. Models in both sets were ordered by their respective scores. Good models (MaxS > 0) were labelled positive; bad models (MaxS = 0) were labelled negative. Using the Rscore or Jscore as the discrimination threshold, we plotted the number of true positives (tp) versus the number of false positives (fp) on the [0 – 10] fp range. This was to take into account the absolute number of targets predicted by the servers, focusing on the hardest targets. We used the number of true positives averaged over the [0 – 10] false positive range as a quality measure for the reliability scores, with higher values indicating better reliability scores.
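The averaged true-positive measure can be sketched as follows (models are assumed pre-sorted by descending reliability score, with a boolean label per model; how the average treats servers with fewer than 11 bad models is not spelled out in the text, so the padding below is an assumption of this sketch):

```python
def avg_tp(labels, max_fp=10):
    """Average number of true positives over the [0, max_fp] false
    positive range, for one server's models sorted by descending score.

    labels: booleans, True for a good model (MaxS > 0).  Records the
    number of good models seen before the 1st, 2nd, ... bad model.
    """
    tp_before_bad = []  # tp counts at fp = 0, 1, ..., max_fp
    tp = 0
    for is_good in labels:
        if is_good:
            tp += 1
        else:
            tp_before_bad.append(tp)
            if len(tp_before_bad) > max_fp:
                break
    # Assumption: if fewer than max_fp + 1 bad models occur, the final
    # true positive count stands in for the remaining fp range.
    while len(tp_before_bad) <= max_fp:
        tp_before_bad.append(tp)
    return sum(tp_before_bad) / (max_fp + 1)
```

Comparing avg_tp over the Rscore-ordered and Jscore-ordered model lists then gives the per-server comparison reported in Table 2.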
Statistics and figures
Reported correlation coefficients are significant at the 95% confidence level.
Statistics and figures were prepared using R .
Availability and requirements
Project name: Meta Server/3D-Jury
Project home page: http://meta.bioinfo.pl/
Operating system: Linux
Programming language: Perl
Other requirements: SQL server, web server, mail server, procmail
Licence: the web service is freely accessible to everybody
Fischer D: Servers for protein structure prediction. Curr Opin Struct Biol 2006, 16(2):178–82.
Wallner B, Elofsson A: Pcons5: combining consensus, structural evaluation and fold recognition scores. Bioinformatics 2005, 21(23):4248–54.
7th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction[http://www.predictioncenter.org/casp7/]
Bujnicki JM, Elofsson A, Fischer D, Rychlewski L: Structure prediction meta server. Bioinformatics 2001, 17(8):750–1.
Rychlewski L, Fischer D: LiveBench-8: the large-scale, continuous assessment of automated protein structure prediction. Protein Sci 2005, 14: 240–5.
Livebench-style evaluation of CASP 7 predictions[http://metav1.bioinfo.pl/results.pl?B=CASP&V=7]
Ginalski K, Elofsson A, Fischer D, Rychlewski L: 3D-Jury: a simple approach to improve protein structure predictions. Bioinformatics 2003, 19(8):1015–8.
Siew N, Elofsson A, Rychlewski L, Fischer D: MaxSub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics 2000, 16(9):776–85.
Wallner B, Fang H, Elofsson A: Automatic consensus-based fold recognition using Pcons, ProQ, and Pmodeller. Proteins 2003, 53(Suppl 6):534–41.
Cheng J, Baldi P: A machine learning information retrieval approach to protein fold recognition. Bioinformatics 2006, 22(12):1456–63.
Fischer D: 3D-SHOTGUN: a novel, cooperative, fold-recognition meta-predictor. Proteins 2003, 51(3):434–41.
Ginalski K, von Grotthuss M, Grishin NV, Rychlewski L: Detecting distant homology with Meta-BASIC. Nucleic Acids Res 2004, (32 Web Server):W576–81.
Jaroszewski L, Rychlewski L, Li Z, Li W, Godzik A: FFAS03: a server for profile-profile sequence alignments. Nucleic Acids Res 2005, (32 Web Server):W284–8.
Guide to the BioInfoBank Meta Server 'Upload and score your model' feature[http://meta.bioinfo.pl/compare_your_model_example.pl]
Protein Structure Prediction Center – CASP7 predictions[http://www.predictioncenter.org/casp7/SERVER_HTML/tarballs/]
Bryson K, McGuffin LJ, Marsden RL, Ward JJ, Sodhi JS, Jones DT: Protein structure prediction servers at University College London. Nucleic Acids Res 2005, (33 Web Server):W36–8.
Ginalski K, Pas J, Wyrwicz LS, von Grotthuss M, Bujnicki JM, Rychlewski L: ORFeus: Detection of distant homology using sequence profiles and predicted secondary structure. Nucleic Acids Res 2003, 31(13):3804–7.
Bujnicki JM, Elofsson A, Fischer D, Rychlewski L: LiveBench-1: continuous benchmarking of protein structure prediction servers. Protein Sci 2001, 10(2):352–61.
Kelley LA, MacCallum RM, Sternberg MJ: Enhanced genome annotation using structural profiles in the program 3D-PSSM. J Mol Biol 2000, 299(2):499–520.
BioInfoBank Meta Server[http://meta.bioinfo.pl/]
The R Project for Statistical Computing[http://www.r-project.org/]
Hung LH, Ngan SC, Liu T, Samudrala R: PROTINFO: new algorithms for enhanced protein structure predictions. Nucleic Acids Res 2005, (33 Web Server):W77–80.
Xu J, Li M, Kim D, Xu Y: RAPTOR: optimal protein threading by linear programming. J Bioinform Comput Biol 2003, 1: 95–117.
Bates PA, Kelley LA, MacCallum RM, Sternberg MJ: Enhancement of protein modeling by human intervention in applying the automatic programs 3D-JIGSAW and 3D-PSSM. Proteins 2001, (Suppl 5):39–46.
Shi J, Blundell TL, Mizuguchi K: FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol 2001, 310: 243–57.
Yamaguchi A, Iwadate M, Suzuki E, Yura K, Kawakita S, Umeyama H, Go M: Enlarged FAMSBASE: protein 3D structure models of genome sequences for 41 species. Nucleic Acids Res 2003, 31: 463–8.
Karplus K, Barrett C, Hughey R: Hidden Markov models for detecting remote protein homologies. Bioinformatics 1998, 14(10):846–56.
Heger A, Holm L: More for less in structural genomics. J Struct Funct Genomics 2003, 4(2–3):57–66.
Tosatto SCE, Albrecht M, Cestaro A, Toppo S, Valle G: Secondary Structure Prediction by Consensus and Homology.[http://www.forcasp.org/modules.php?name=Papers&file=article&sid=1731]
Torda AE, Procter JB, Huber T: Wurst: a protein threading server with a structural scoring function, sequence profiles and optimized substitution matrices. Nucleic Acids Res 2004, (32 Web Server):W532–5.
Liu S, Zhang C, Liang S, Zhou Y: Fold recognition by concurrent use of solvent accessibility and residue depth. Proteins 2007, 68(3):636–645.
Teodorescu O, Galor T, Pillardy J, Elber R: Enriching the sequence substitution matrix by structural information. Proteins 2004, 54: 41–8.
Kurowski MA, Bujnicki JM: GeneSilico protein structure prediction meta-server. Nucleic Acids Res 2003, 31(13):3305–7.
Karplus K, Karchin R, Draper J, Casper J, Mandel-Gutfreund Y, Diekhans M, Hughey R: Combining local-structure, fold-recognition, and new fold methods for protein structure prediction. Proteins 2003, 53(Suppl 6):491–6.
Kalisman N, Keasar C: Protein Structure Prediction with an Ant Lion Town Potential.[http://www.forcasp.org/modules.php?name=Papers&file=article&sid=1785]
Tomii K, Akiyama Y: FORTE: a profile-profile comparison tool for protein fold recognition. Bioinformatics 2004, 20(4):594–5.
Kim DE, Chivian D, Baker D: Protein structure prediction and analysis using the Robetta server. Nucleic Acids Res 2004, (32 Web Server):W526–31.
McGuffin LJ, Jones DT: Improvement of the GenTHREADER method for genomic fold recognition. Bioinformatics 2003, 19(7):874–81.
DeRonne KW, Karypis G: Effective optimization algorithms for fragment-assembly based protein structure prediction. Comput Syst Bioinformatics Conf 2006, 19–29.
Karplus K, Karchin R, Barrett C, Tu S, Cline M, Diekhans M, Grate L, Casper J, Hughey R: What is the value added by human intervention in protein structure prediction? Proteins 2001, (Suppl 5):86–91.
Zhang Y, Arakaki AK, Skolnick J: TASSER: an automated method for the prediction of protein tertiary structures in CASP6. Proteins 2005, 61(Suppl 7):91–8.
Jin W, Furuta T, Park SJ, Koga N, Fujitsuka Y, Chikenji G, Takada S: ROKKY: structure prediction server that integrates PDB-BLAST, 3D-Jury, and the SimFold fragment assembly simulator.[http://www.forcasp.org/modules.php?name=Papers&file=article&sid=2195]
Vullo A, Walsh I, Pollastri G: A two-stage approach for improved prediction of residue contact maps. BMC Bioinformatics 2006, 7: 180.
Fischer D: Hybrid fold recognition: combining sequence derived properties with evolutionary information. Pac Symp Biocomput 2000, 119–30.
Wu S, Skolnick J, Zhang Y: Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biol 2007, 5: 17.
Soding J, Biegert A, Lupas AN: The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res 2005, (33 Web Server):W244–8.
Jaśkowski W, Blazewicz J, Lukasiak P, Milostan M, Krasnogor N: 3D-Judge – A Metaserver Approach to Protein Structure Prediction. Foundations of Computing and Decision Sciences 2007, 31. [http://www.cs.put.poznan.pl/wjaskowski/pub/papers/jaskowski073djudge.pdf]
Lund O, Hansen J, Brunak S, Bohr J: Relationship between protein structure and geometrical constraints. Protein Sci 1996, 5: 2217–25.
Marin A, Pothier J, Zimmermann K, Gibrat JF: FROST: a filter-based fold recognition method. Proteins 2002, 49(4):493–509.
Canutescu AA, Shelenkov AA, Dunbrack RL Jr: A graph-theory algorithm for rapid protein side-chain prediction. Protein Sci 2003, 12: 2001–14.
The authors wish to thank the CASP organisers for their on-going efforts to maintain this important experiment and the developers of public protein structure prediction servers for providing their models for this analysis. This work was supported by the European Commission grants GeneFun (LSHG-CT-2004-503567) and BioSapiens (LSHG-CT-2003-503265).
LK carried out the statistical analysis of the data, programmed the user model scoring feature and prepared the first draft of the manuscript. LR conceived of the study, coordinated it and revised this manuscript.
Cite this article
Kaján, L., Rychlewski, L. Evaluation of 3D-Jury on CASP7 models. BMC Bioinformatics 8, 304 (2007). https://doi.org/10.1186/1471-2105-8-304