
Estimating the optimal linear combination of predictors using spherically constrained optimization

Abstract

Background

In the context of a binary classification problem, the optimal linear combination of continuous predictors can be estimated by maximizing the area under the receiver operating characteristic curve. For ordinal responses, the optimal predictor combination can similarly be obtained by maximization of the hypervolume under the manifold (HUM). Since the empirical HUM is discontinuous, non-differentiable, and possibly multi-modal, solving this maximization problem requires a global optimization technique. Estimation of the optimal coefficient vector using existing global optimization techniques is computationally expensive, becoming prohibitive as the number of predictors and the number of outcome categories increases.

Results

We propose an efficient derivative-free black-box optimization technique based on pattern search to solve this problem, which we refer to as the Spherically Constrained Optimization Routine (SCOR). Through extensive simulation studies, we demonstrate that the proposed method achieves better performance than existing methods including the step-down algorithm. Finally, we illustrate the proposed method by predicting the severity of swallowing difficulty after radiation therapy for oropharyngeal cancer based on radiation dose to various structures in the head and neck.

Conclusions

Our proposed method addresses an important challenge in combining multiple biomarkers to predict an ordinal outcome. This problem is particularly relevant to medical research, where it may be of interest to diagnose a disease with various stages of progression or a toxicity with multiple grades of severity. We provide the implementation of our proposed SCOR method as an R package, available online at https://CRAN.R-project.org/package=SCOR.

Background

In precision medicine, there is great interest in utilizing multiple biomarkers or continuous variables to categorize patients based on disease stage, predict treatment response, or anticipate the potential severity of toxicity. To address this challenge, various classification methods have been proposed. Existing model-based approaches to the classification problem include logistic regression and linear discriminant analysis. As an alternative to model-based methods, [1, 2] focused on classifying different subgroups based on maximization of the area under the receiver operating characteristic (ROC) curve (AUC). In its simplest form, the ROC curve can be interpreted as the probability of detection as a function of the false alarm rate, so higher values of the AUC signify higher average hit rates over different possible values of the false alarm rate. In the field of medical science, the AUC is widely used as a criterion to combine multiple diagnostic test results or other continuous predictor values [3]. However, the ROC curve does not have an exact analytical expression, so estimating the AUC and finding optimal predictor combinations to maximize its value remains challenging.

In the case of a binary outcome, various non-parametric estimates of the AUC have been proposed. For example, [2] estimated the AUC using Mann-Whitney U statistics, while [4] proposed a smooth estimate obtained using a sigmoid function. However, with increasing sample size, these objective functions become challenging to maximize. In order to handle this problem for large datasets, [5] proposed the min-max method, which only considers the linear combination of two extreme biomarkers in estimating the AUC, and therefore maintains the same computation time for any given sample size. For multi-categorical outcomes, the hypervolume under the manifold (HUM), also known as the volume under the ROC surface, has been proposed as an analogue to the AUC [6]. For the three-category outcome scenario, various estimates of the HUM have been proposed [7,8,9]. In the multi-category setting, [10] proposed a method to estimate the optimal combination vector when the outcomes of the biomarkers follow a normal distribution. However, their proposed method was noted to underperform when the biomarker values are non-normal [11, 12]. In the presence of multiple biomarkers, to avoid the computational burden of simultaneous estimation of the combination vector, [2] proposed the step-down algorithm. However, this strategy of maximizing the HUM estimate by estimating one coordinate at a time can work poorly in higher dimensional settings. More recently, [13] proposed an empirical estimate of HUM (EHUM) for the three-category outcome scenario, but because of the lack of concavity of most of the proposed estimates, maximization of the objective functions remained challenging. To overcome this computational burden, under the assumption of normality, [14] proposed a penalized and scaled stochastic distance based method. Finally, [11] proposed an efficient approach which relies on upper and lower bounds of the HUM (the upper and lower bound approach, ULBA).

As discussed above, most of the proposed estimates of the HUM are discontinuous and non-differentiable functions of the combination coefficients, and may have multiple maxima. Therefore, the maximization step is one of the most important determinants of their performance. This problem becomes more challenging with increasing sample size, number of biomarkers, and number of outcome categories. In simple cases, even in the presence of multiple local maxima, most of the optimization algorithms used in practice can still find the best solution. But as the maximization problem gets harder, it becomes more difficult to find the global maximum among the local maxima, so performance becomes highly dependent on the optimization algorithm used for the maximization. When maximizing discontinuous, non-differentiable, multi-modal objective functions, derivative-free non-convex optimization techniques are critical to ensure a good solution [15]. However, to the best of our knowledge, none of the previous articles proposing HUM estimates considered using global optimization techniques to maximize the corresponding objective functions. Our goal in the current work is both to point out the limitations of existing methods for this problem and to propose a global optimization method that offers improved performance in the prediction of ordinal outcomes.

Global optimization using pattern search

We now give a more precise mathematical description of the problem of interest. Suppose \({\beta }\) denotes the d-dimensional linear combination vector to be estimated, and \(f({\beta })\) denotes the empirical value of the HUM. Unfortunately, \(f({\beta })\) is not identifiable since \(f({\beta }) = f(a{\beta })\) for any scalar \(a>0\) [5]. This means that we cannot simply optimize \(f({\beta })\) over the unconstrained space \({\beta } \in \mathrm {R}^d\). In order to address this non-identifiability, instead of maximizing \(f({\beta })\) over \(\mathrm {R}^d\), we add an extra constraint \(||{\beta }|| = 1\) where \(||\cdot ||\) denotes the Euclidean norm. So the problem can be re-defined as

$$\begin{aligned} \text {maximize:} \quad&f({\beta }), \text { where }{\beta } = (\beta _1,\ldots ,\beta _d) \nonumber \\ \text {subject to:} \quad&\sum _{i=1}^d\beta ^2_i = 1, \; {\beta } \in \mathrm {R}^d, \end{aligned}$$
(1)

where \(f({\beta })\) can be discontinuous (as is the case for EHUM), non-differentiable, and multi-modal. Note that here the coefficient vector \({\beta }\) must lie on the surface of the unit sphere. Due to the possibly multi-modal nature of \(f({\beta })\), global optimization tools are preferred over convex optimization methods.

There are a number of existing optimization methods, which we summarize briefly here. For maximizing any multi-modal function, global optimization techniques such as the genetic algorithm [16] and simulated annealing [17] have been shown to yield better results as compared to convex optimization methods such as the interior-point algorithm [18] or sequential quadratic programming [19]. The Nelder-Mead algorithm [20], also known as the simplex method, is a heuristic derivative-free optimization technique that has been widely applied to statistical problems, and serves as the default option for the optim function in R and the basis of fminsearch in MATLAB. To improve on the convergence of these algorithms, [21] proposed pattern search (PS), where possible solution points around the current solution are found using an adaptive step-size vector. Importantly, none of these algorithms were specifically designed to handle global optimization over a spherical parameter space.

To minimize a non-convex function (or, equivalently, maximize a non-concave function) globally over a hyper-rectangular parameter space, [22] proposed a modified version of global pattern search called Recursive Modified Pattern Search (RMPS). This approach, which is discussed in detail in the Methods section, has desirable properties in terms of computational scalability and the fact that many of the operations can be performed in parallel. Subsequently, [23, 24] extended RMPS to problems where the parameter space is given by a collection of simplexes.

In this article we develop an algorithm called ‘Spherically Constrained Optimization Routine’ (SCOR) which utilizes the basic principle of the RMPS algorithm but accommodates the constraint of a spherical parameter space. Based on an extensive set of simulation studies, the computational efficiency of the proposed algorithm is established and its performance is compared with existing optimization techniques. We show that using SCOR to maximize EHUM or ULBA results in noticeable improvement over the Nelder-Mead, step-down and min-max algorithms. Finally, we apply our proposed algorithm to derive a combination score which best predicts the severity of dysphagia, or difficulty swallowing, based on the mean radiation dose delivered to various structures in the head and neck during cancer treatment.

Prior work

In this section, we describe existing methods of estimating the optimal coefficient vector for combining biomarkers. Consider a study with M possible outcome categories. Let \(\mathbf {X}_1, \ldots , \mathbf {X}_M\) denote the d-dimensional observed biomarker values for the M outcome categories. Each coordinate of \(\mathbf {X}_j\) denotes the value of a biomarker, \(j=1,\ldots ,M\), where d is the number of biomarkers. Suppose \(\mathbf {X}_j \sim F_j\) for \(j=1,\ldots , M\), where \(F_j\) denotes a multivariate continuous distribution. The linear combination of the biomarkers corresponding to the j-th outcome category is given by \({\beta }^T\mathbf {X}_j = \sum _{k=1}^d\beta _kX_{jk},\) where \({\beta } = (\beta _1,\ldots ,\beta _d)\) denotes the combining coefficient vector. Without loss of generality, assuming that a higher value of the combination score corresponds to a higher outcome category, the HUM [6], which measures the diagnostic accuracy, is given by \(D({\beta }) = P({\beta }^T\mathbf {X}_M> \cdots> {\beta }^T\mathbf {X}_2 > {\beta }^T\mathbf {X}_1)\). In order to distinguish the outcome categories, our goal is to find the optimal combination coefficient vector \({\beta }_0\) for which \(D({\beta })\) is maximized, that is, \({\beta }_0 = \mathop {\mathrm {arg\,max}}\limits _{||{\beta }|| = 1} D({\beta })\). Note that \(D({\beta })\) is maximized over all coefficient vectors of norm 1 to avoid the issue of non-identifiability. When \(\mathbf {X}_1,\ldots , \mathbf {X}_M\) follow multivariate normal distributions, under a few regularity conditions, the value of \({\beta }_0\) can be derived [25]. However, without any distributional assumption on \(\mathbf {X}_1,\ldots , \mathbf {X}_M\), the value of \({\beta }_0\) cannot be obtained analytically.

Estimates of the HUM

We now review estimates of the HUM proposed in the literature. Essentially, these approaches provide different options for how to formulate the objective function.

Empirical hypervolume under the manifold (EHUM)

In order to estimate the optimal coefficient vector, [13] proposed maximizing the empirical estimate of the HUM from the given sample. Suppose the sample is denoted by \(\{\mathbf {X}_{ji_j}: j = 1,\ldots , M, i_j = 1,\ldots ,n_j\}\), so the total sample size is \(n = \sum _{j=1}^Mn_j\). Then the empirical estimate of HUM is given by

$$\begin{aligned} {D}_{E}({\beta }) = \dfrac{1}{n_{1}n_{2} \cdots n_{M}} \displaystyle \sum _{i_{1}=1}^{n_{1}}\displaystyle \sum _{i_{2}=1}^{n_{2}} \cdots \displaystyle \sum _{i_{M}=1}^{n_{M}}I({\beta }^{T}\mathbf {X}_{M,i_{M}}>\cdots>{\beta }^{T}\mathbf {X}_{2i_{2}}>{\beta }^{T}\mathbf {X}_{1i_{1}}). \end{aligned}$$

Here \(I(\cdot )\) denotes the indicator function. The optimal combination coefficient vector obtained by maximizing \({D}_{E}({\beta })\) is given by \(\hat{{\beta }}_E = \mathop {\mathrm {arg\,max}}\limits _{||{\beta }|| = 1} D_E({\beta })\). Note that \(D_E({\beta })\) can be multi-modal. The discontinuity and non-differentiability of \(D_E({\beta })\) pose additional challenges in maximizing it using existing optimization methods.
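To make the form of this objective concrete, the following R sketch computes \(D_E({\beta })\) by brute force over all M-tuples of observations. It is only an illustration, not the implementation in the SCOR package; the function name ehum and the data layout X_list (a list with one biomarker matrix per ordered outcome category) are hypothetical conventions introduced here.

```r
## Minimal sketch of the empirical HUM D_E(beta); 'X_list' is assumed to be a
## list of length M whose j-th element is an (n_j x d) matrix of biomarker
## values for outcome category j, with categories ordered from lowest to highest.
ehum <- function(beta, X_list) {
  beta <- beta / sqrt(sum(beta^2))                 # enforce ||beta|| = 1
  scores <- lapply(X_list, function(X) as.vector(X %*% beta))
  grid <- expand.grid(scores)                      # one score per category, all combinations
  ordered_ok <- Reduce(`&`, lapply(seq_len(ncol(grid) - 1),
                                   function(j) grid[[j + 1]] > grid[[j]]))
  mean(ordered_ok)                                 # proportion of correctly ordered M-tuples
}

## Small example with M = 3 categories and d = 2 biomarkers
set.seed(1)
X_list <- list(matrix(rnorm(20, mean = 0), 10, 2),
               matrix(rnorm(20, mean = 1), 10, 2),
               matrix(rnorm(20, mean = 2), 10, 2))
ehum(c(1, 1) / sqrt(2), X_list)
```

Because the indicator depends only on the ordering of the scores, this brute-force version makes the discontinuity in \({\beta }\) apparent: \(D_E\) changes only when a perturbation of \({\beta }\) flips the order of some pair of scores.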

Upper and lower bound approach (ULBA)

In order to alleviate the computational burden of maximizing \(D_E({\beta })\), [11] proposed alternative objective functions which can be maximized more easily, showing that

$$\begin{aligned} \max \{0, (M-1)P_{A}({\beta }) - (M-2)\} \le D({\beta }) \le P_{M}({\beta }), \end{aligned}$$

where \(P_{A}({\beta })\) and \(P_{M}({\beta })\) are defined by

$$\begin{aligned} P_{A}({\beta }) =\frac{1}{M-1} \displaystyle \sum _{j=1}^{M-1}P({\beta }^{T}\mathbf {X}_{j+1}>{\beta }^{T}\mathbf {X}_{j}), P_{M}({\beta }) = \min _{1 \le j \le M-1} P({\beta }^{T}\mathbf {X}_{j+1}>{\beta }^{T}\mathbf {X}_{j}). \end{aligned}$$

[11] proposed that instead of maximizing \(D_E({\beta })\), we can maximize either \(P_{A}({\beta })\) or \(P_{M}({\beta })\), as these are simpler objective functions. Since they showed that the solution obtained by maximizing \(P_A({\beta })\) yields better results than that obtained using \(P_M({\beta })\), in this paper we take the optimal combination coefficient vector under ULBA to be \(\hat{{\beta }}_{ULBA} = \mathop {\mathrm {arg\,max}}\limits _{||{\beta }|| = 1} P_A({\beta })\). Despite the simpler form of \(P_A({\beta })\) compared to \(D_E({\beta })\), this function is still discontinuous with possibly multiple modes.
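For comparison with the sketch of \(D_E({\beta })\) above, the empirical version of \(P_A({\beta })\) averages the pairwise empirical probabilities \(P({\beta }^T\mathbf {X}_{j+1} > {\beta }^T\mathbf {X}_j)\) over adjacent categories. The sketch below reuses the hypothetical X_list layout; pa_ulba is an illustrative name, not a function from the SCOR package.

```r
## Minimal sketch of the empirical P_A(beta) used by the ULBA approach
pa_ulba <- function(beta, X_list) {
  beta <- beta / sqrt(sum(beta^2))
  scores <- lapply(X_list, function(X) as.vector(X %*% beta))
  M <- length(scores)
  pairwise <- sapply(seq_len(M - 1), function(j) {
    mean(outer(scores[[j + 1]], scores[[j]], `>`))  # empirical P(beta'X_{j+1} > beta'X_j)
  })
  mean(pairwise)                                     # (1/(M-1)) * sum of adjacent-pair terms
}

pa_ulba(c(1, 1) / sqrt(2), X_list)                   # reuses X_list from the previous sketch
```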

Existing techniques for estimating the optimal value of \({\beta }\)

Due to the multi-modal nature of the objective functions discussed above, it is challenging to optimize the combination coefficient vector \({\beta }\) simultaneously. In Additional file 1 Section A, we provide background on general-purpose global optimization techniques including the genetic algorithm (GA), simulated annealing (SA), and pattern search (PS) which can be applied here. In the current section, we focus on approaches designed specifically for estimation of the combination vector developed in the past few decades.

[2] proposed the step-down algorithm for combining multiple biomarkers. Although the method was first proposed for the binary categorical outcome case, this principle can also be used in scenarios with more than two outcome categories [14]. In the step-down approach, the biomarkers are first ordered based on their individual EHUM values. The coefficient of the biomarker with the highest EHUM value is fixed at 1, and then at each step one additional biomarker is included and its coefficient is estimated. Thus, the coefficients of the biomarkers are estimated one at a time. A detailed description of the step-down algorithm is provided in Additional file 1 Section B.1. [11] used this algorithm to maximize the upper or lower bound of the HUM, namely \(P_{M}\) or \(P_{A}\). [5] proposed the min-max (MM) principle in the context of a binary outcome, where, corresponding to each vector of biomarkers of length d, only the maximum and the minimum of those d values are used to estimate the combination coefficient vector. The estimation procedure using the MM principle is provided in detail in Additional file 1 Section B.2.
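The following R sketch illustrates the step-down principle as described above (rank biomarkers by individual EHUM, fix the coefficient of the best one, then estimate the remaining coefficients one at a time). It is a rough illustration using a crude one-dimensional grid search and the hypothetical ehum helper from the earlier sketch, not the exact algorithm from Additional file 1 Section B.1.

```r
## Rough sketch of the step-down principle (illustration only)
step_down_sketch <- function(X_list, d, grid = seq(-3, 3, by = 0.01)) {
  ## 1. rank biomarkers by their individual EHUM (allowing either sign)
  single <- sapply(seq_len(d), function(k) {
    e <- numeric(d); e[k] <- 1
    max(ehum(e, X_list), ehum(-e, X_list))
  })
  ord <- order(single, decreasing = TRUE)
  ## 2. fix the coefficient of the best biomarker at +1 or -1
  beta <- numeric(d)
  e1 <- numeric(d); e1[ord[1]] <- 1
  beta[ord[1]] <- if (ehum(e1, X_list) >= ehum(-e1, X_list)) 1 else -1
  ## 3. add one biomarker at a time, estimating its coefficient by a 1-D search
  for (k in ord[-1]) {
    vals <- sapply(grid, function(b) { bb <- beta; bb[k] <- b; ehum(bb, X_list) })
    beta[k] <- grid[which.max(vals)]
  }
  beta / sqrt(sum(beta^2))                 # return on the unit sphere
}
```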

Results

We propose a new optimization approach, Spherically Constrained Optimization Routine (SCOR), to address the challenge of identifying the optimal linear combination of input variables to predict an ordinal outcome. For a detailed description of our efficient derivative-free black-box optimization technique, please see the Methods section below.

Simulation study

In this section, we compare the performance of SCOR with that of the Nelder-Mead (NM), step-down, and min-max methods in the context of simulated data. Under different simulation scenarios, we estimate the optimal combination vector by maximizing the EHUM and ULBA objective functions using each of these four approaches. We measure performance in terms of the EHUM objective function value \(D_E(\cdot )\) at the estimated optimal solution, evaluated on a separate test data set. The SCOR algorithm is implemented in MATLAB 2016b. NM is an unconstrained optimization technique, so in applying this algorithm, we set the first parameter value to either 1 or -1, whichever maximizes the individual EHUM when all other parameter values are set to 0. The remaining parameters are then estimated by maximizing EHUM and ULBA using NM, via the fminsearch function in MATLAB 2016b. For the step-down and min-max algorithms, we also use the fminsearch function with default options.

We consider three simulation scenarios, two based on the normal distribution, and one based on the Weibull distribution (to illustrate performance in the context of non-normal data). Among the two normal distribution scenarios considered, in the first one, we simulate covariates that are not correlated, while in the second, the covariates have non-zero correlation. For each simulation scenario, we consider settings with \(M=2\) and \(M=3\) ordinal outcomes. The number of biomarkers is taken to be \(d = 5,10,\) or 20. In each case, the d biomarkers are generated from the corresponding normal or Weibull distribution. For the case of \(M=2\) categories, we set the sample sizes for each category to be 15, 30, and 60. For the \(M=3\) category case, we take the sample sizes of each category to be 15 and 30.

Scenario 1: For the i-th disease category, the values of the biomarkers are simulated from the d-variate normal distribution for \(d=5,10,\) or 15 with mean \({\mu }_{i} = \big (\mu _{i1},\ldots ,\mu _{id}\big )\), and covariance matrix \(\mathbf {\Sigma }_i=\mathbf {I}_d\), where \(\mu _{ij}=(-1)^j*i[1+0.1(j-1)]\), for \(j = 1,\ldots ,d\), \(i=0,1\) (for the two category outcome case) and \(i=0,1,2\) (for the three category outcome case).

Scenario 2: In order to explore the relative performance of all methods in cases with correlated predictors, here we consider the same model as Scenario 1 except we take \(\mathbf {\Sigma }_i=(a_{st})_{d\times d}\) where \(a_{st} = (0.5)^{|s-t|}\) for \(s,t = 1,\ldots ,d\) and \(i=0,1\) (for the two category outcome case) and \(i=0,1,2\) (for the three category outcome case).

Scenario 3: Here the values of the biomarkers are generated from the multivariate Weibull distribution. For the i-th disease category, the j-th biomarker follows a univariate Weibull distribution with scale parameter \(\lambda _i\), shape parameter \(k_j\) and location parameter \(\gamma _j\). We take \(\lambda _i = i+1,k_j=0.5*j,\gamma _j = (-5)^j\) for \(j = 1,\ldots ,d\), \(i=0,1\) (for the two category outcome case) and \(i=0,1,2\) (for the three category outcome case).

For any given scenario and any of the methods considered, we first estimate the biomarker coefficient vector maximizing EHUM and ULBA. We then generate a new sample of the same size under the same simulation design. The reported EHUM values (i.e., the objective function value \(D_E(\cdot )\)) are obtained by applying the coefficient vectors estimated on the training data to this new test set. We adopt this strategy to ensure that the performance estimates do not reflect overfitting to the training data. We report the mean EHUM at the estimated optimal solutions using the SCOR, NM, step-down and min-max algorithms for the different simulation scenarios over 100 replications.
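The sketch below illustrates this train/test protocol for a simplified version of Scenario 1 (M = 2, d = 5, uncorrelated normal biomarkers). A crude random search over the unit sphere is used as a stand-in for the optimizer (SCOR, NM, step-down or min-max) purely to keep the example self-contained; simulate_data and the ehum helper are hypothetical names introduced in the earlier sketches.

```r
## Simplified Scenario 1 generator: category i has mean mu_ij = (-1)^j * i * (1 + 0.1*(j-1))
simulate_data <- function(n_per_class, d, M) {
  lapply(seq_len(M) - 1, function(i) {
    mu <- (-1)^(1:d) * i * (1 + 0.1 * (0:(d - 1)))
    matrix(rnorm(n_per_class * d, mean = rep(mu, each = n_per_class)),
           n_per_class, d)
  })
}

set.seed(2)
train <- simulate_data(n_per_class = 15, d = 5, M = 2)
test  <- simulate_data(n_per_class = 15, d = 5, M = 2)

## stand-in optimizer: best of many random unit vectors (SCOR would be used instead)
cand <- replicate(2000, { v <- rnorm(5); v / sqrt(sum(v^2)) })
beta_hat <- cand[, which.max(apply(cand, 2, ehum, X_list = train))]

ehum(beta_hat, train)   # training-set EHUM (optimistic)
ehum(beta_hat, test)    # test-set EHUM: the quantity averaged over replications
```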

In Table 1, we provide the results for the two category outcome setting for sample sizes 15 and 30 per group and for the three category outcome setting with sample size 15 per group. The results for the two category outcome setting with sample size 60 and for the three category outcome setting with sample size 30 are provided in Additional file 1: Table S3. SCOR outperforms both NM and step-down for all the scenarios considered. SCOR also outperforms the min-max method for all scenarios except a few cases in Scenario 3 with higher sample sizes; in those cases, the SCOR performance is quite close to that obtained using min-max. As the dimension d of the biomarker vector increases, the SCOR estimates tend to improve while the step-down algorithm tends to underperform, so SCOR outperforms step-down by an increasingly large margin. In general, the SCOR estimates have lower standard errors compared to other methods, indicating more stable estimates, which is expected when using a global optimization approach.

Table 1 Performance comparison for two and three ordinal outcomes, where each class has sample size 15 and 30 for the two category case, and 15 for the three category case

In Additional file 1: Table S2, we compare the performance of SCOR with general-purpose global optimization algorithms based on optimization of five benchmark functions on the unit-spherical parameter space with dimensions \(d=5,20,50,100,\) and 500, and show that in general SCOR outperforms existing optimization algorithms, with greatly reduced computation time. Specifically, using SCOR, we obtain an improvement in computation time of up to 67 fold over pattern search, up to 43 fold over simulated annealing, and up to 38 fold over the genetic algorithm.

Theoretical properties of the estimator obtained from SCOR depend on the choice of objective function. When using EHUM as the objective, \(\hat{{\beta }}_E = \mathop {\mathrm {arg\,max}}\limits _{||{\beta }|| = 1} D_E({\beta })\), so obtaining the optimal \({\beta }\) requires finding the set of values that maximize \(D_E({\beta })\). Since this objective may be multi-modal, it is difficult to maximize in practice, creating the need for the proposed SCOR procedure. However, from a theoretical point of view, the properties of the estimator are independent of the choice of algorithm, meaning that existing work on the theoretical properties of EHUM can be applied to our results as well. We can therefore rely on previous theoretical results establishing that the empirical estimate of HUM is consistent in probability for the true HUM and asymptotically normal [6], while \(\hat{{\beta }}_E\) is consistent and asymptotically normal under certain regularity conditions [13]. Similar theoretical results for ULBA do not exist. Although we found that in practice maximizing \(P_A({\beta })\) resulted in good performance, likely because its simpler form is easier to maximize than \(D_E({\beta })\), the resulting estimate is a lower bound on the true HUM, and may be strictly lower under certain conditions. Additionally, to show how the mean squared error (MSE) of the estimated optimal coefficient vector \({\beta }\) obtained using SCOR changes with sample size, we provide the results of a simulation study in Table S4 of Additional file 1 Section F. We note that as the sample size increases, the MSE decreases for both the ULBA and EHUM objective functions.

The variance of the estimates for the coefficients and the HUM itself are challenging to estimate in practice [6, 13]. We therefore recommend the use of bootstrap resampling to obtain empirical confidence intervals for both the coefficients and the HUM. For the empirical HUM estimate, the bootstrap standard error \(\widehat{\text {se}}_B\) can be obtained from the bootstrap samples, and the resulting \(100(1-\alpha )\)% confidence interval can be estimated as \(\widehat{\text {HUM}} \pm z_{\alpha /2}\widehat{\text {se}}_B\), where \(z_{\alpha /2}\) represents the upper \(\alpha /2\) quantile of the standard normal distribution [6]. A similar procedure can be used to obtain confidence intervals for \(\hat{{\beta }}\).
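As a concrete illustration, the sketch below computes a normal-approximation bootstrap interval for the empirical HUM at a fixed estimated coefficient vector, resampling within each outcome category; it relies on the hypothetical ehum helper from the earlier sketch, and in practice one may also re-estimate \({\beta }\) within each bootstrap sample.

```r
## Minimal sketch of a bootstrap confidence interval for the empirical HUM
bootstrap_hum_ci <- function(beta_hat, X_list, B = 1000, alpha = 0.05) {
  boot_hum <- replicate(B, {
    X_boot <- lapply(X_list, function(X)
      X[sample(nrow(X), replace = TRUE), , drop = FALSE])  # resample within each category
    ehum(beta_hat, X_boot)
  })
  hum_hat <- ehum(beta_hat, X_list)
  se_B <- sd(boot_hum)                                      # bootstrap standard error
  c(estimate = hum_hat,
    lower = hum_hat - qnorm(1 - alpha / 2) * se_B,
    upper = hum_hat + qnorm(1 - alpha / 2) * se_B)
}
```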

Application to the prediction of swallowing difficulty following radiation therapy

In this section, we illustrate the SCOR method with an application to data from a study of swallowing difficulty, or dysphagia, in patients treated with radiation therapy for oropharyngeal cancer. Outcomes of observational studies and clinical trials in cancer are commonly defined as categorical or ordinal variables, and standardized systems have been adopted for both response and toxicity assessment in this framework. Specifically, the standard definition of response to cancer therapy in solid tumors includes four categories reflecting the treatment efficacy: complete response, partial response, stable disease, and progressive disease [26]. Toxicities experienced as a consequence of treatment also follow a standard definition for classification and grading, the Common Terminology Criteria for Adverse Events (CTCAE) [27]. The categories on this scale are 0 for no symptoms, 1 for mild, 2 for moderate, 3 for severe, 4 for life-threatening, and 5 for death. Similar scales have been developed for outcomes of interest in specific cancer types: in particular, the DIGEST scale measures swallowing difficulty using a similar grading scale to that of CTCAE [28, 29].

In this case study, we analyze data derived from a study of swallowing difficulties, quantified using the DIGEST scale, as a result of radiation therapy for oropharyngeal cancer [30]. Radiation therapy is commonly used to treat oropharyngeal cancer [31]. Although this treatment is critical to ensure control of the tumor, radiation to structures in the head and neck related to swallowing can lead to loss of function, and ultimately reduced quality of life or serious complications, such as aspiration-related pneumonia [32, 33]. Understanding the impact of radiation dose to specific structures on swallowing outcomes can inform the development of radiation treatment plans designed to minimize dose to critical structures [34].

In our dataset, radiation dose is characterized as the mean dose delivered to 13 critical structures. Here we focus on the 87 subjects with no missing values for dose. The response variable describing swallowing difficulty was quantified using the DIGEST scale; to ensure sufficient sample size in each class, we collapsed grades 2-4 into a single group [28]. The sample sizes per class are then 29 subjects with no dysphagia (0 on the DIGEST scale), 27 with mild dysphagia (1 on the DIGEST scale), and 30 with moderate to severe dysphagia (2-4 on the DIGEST scale). Since some of the predictors were highly correlated, we performed a screening step to filter out the most highly correlated variables using a greedy approach, removing the variable with the highest average correlation from the set until all pairwise correlations were less than 0.8. After screening out 4 variables, we had 9 remaining predictors: mean radiotherapy dose to cricopharyngeal muscle, esophagus, glottic area, inferior pharyngeal constrictor (PC), ipsilateral anterior digastric muscle (ADM), middle PC, mylohyoid muscle, superior PC, and intrinsic tongue muscles.

To illustrate the relative performance of the SCOR, NM, step-down and min-max techniques, we compute the estimated combination coefficients maximizing the ULBA and EHUM criteria using each method. The EHUM values at those solutions are then evaluated. We also compute the optimal cut-points to categorize the combination scores (obtained by multiplying the optimal coefficient vector with the biomarker values) using Youden’s index [35]. Youden’s index is defined as the maximum possible value of \((\text {sensitivity}+ \text {specificity} - 1)\) over all possible decision thresholds, and provides a summary measure combining sensitivity and specificity [36]. Note that before obtaining the combination scores, all the solution combination vectors obtained in the step-down and min-max techniques are divided by their corresponding norms so that each solution vector has norm 1.
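As an illustration of the cut-point step, the sketch below chooses a single threshold by maximizing Youden’s index for a binary split of the combination scores (e.g., category 0 versus categories 1 and above); the exact multi-category procedure used in the paper is not reproduced here, and youden_cutpoint is a hypothetical helper name.

```r
## Sketch: choose a cut-point for a binary split of combination scores via Youden's index
youden_cutpoint <- function(scores_low, scores_high) {
  thresholds <- sort(unique(c(scores_low, scores_high)))
  J <- sapply(thresholds, function(t) {
    sens <- mean(scores_high > t)      # sensitivity: high-category scores above t
    spec <- mean(scores_low <= t)      # specificity: low-category scores at or below t
    sens + spec - 1                    # Youden's index
  })
  thresholds[which.max(J)]
}

## e.g., with scores <- lapply(X_list, function(X) as.vector(X %*% beta_hat)):
## cut1 <- youden_cutpoint(scores[[1]], unlist(scores[-1]))   # category 0 vs. 1 and above
```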

Figure 1 shows the optimal combination vector scores for each class. SCOR results in improved separation of these scores across the classes, which is reflected in the higher values of EHUM and Youden’s index achieved for both versions of the objective function (ULBA and EHUM). In addition, the cut-points (marked by horizontal dotted lines) obtained by SCOR distinguish the three outcome categories more clearly than those of step-down or min-max.

Fig. 1

Boxplots of the optimal combination vector scores are shown across the outcome categories 0, 1, and 2-4. The methods compared are SCOR (a ULBA, e EHUM), NM (b ULBA, f EHUM), step-down (c ULBA, g EHUM) and min-max (d ULBA, h EHUM). The horizontal dotted lines denote the corresponding cut-points for classification obtained by maximizing Youden’s index. Since this example includes 3 outcome categories, values of EHUM or ULBA greater than \(\frac{1}{3\; !} \approx 0.167\) reflect improved accuracy over random guessing

In Table 2 we provide the values of the optimal combination coefficients using SCOR, Nelder-Mead, and step-down. In the familiar setting of AUC estimation, where there are two outcome categories, a value of 0.5 corresponds to the expected value for random guessing, with higher values demonstrating improved classification accuracy. Here, since we have three ordered categories, the expected value of the HUM under random guessing is \(\frac{1}{3\; !} \approx 0.167\). Hence, estimated values of EHUM or ULBA above 0.167 can be considered as an improvement over a random guess. Since we obtain the highest value of \(D_E(\hat{\beta })\) using the estimates obtained by maximizing ULBA with the SCOR algorithm, the coefficients obtained using this approach (i.e., those given in the first column) represent the preferred solution. The esophagus and middle pharyngeal constrictor, which are known to be essential to swallowing function [37], have the largest estimated coefficients, representing a relatively large contribution to the predicted combination scores. Since increasing radiation dose to structures in the head and neck is generally expected to worsen function, the slight negative coefficient for glottic area likely reflects correlation with other predictors, rather than a protective effect. The results from this case study could be used to inform radiation treatment planning, since intensity-modulated radiation therapy plans could be adjusted to minimize predicted dysphagia risk by sparing radiation to key muscles and organs in the head-and-neck region.

Table 2 Optimal coefficients obtained by maximizing the ULBA and EHUM objective functions using the SCOR, NM and step-down algorithms

Discussion

Here the proposed SCOR algorithm is used mainly in the context of a classification problem with a hypervolume under the manifold criterion. However, this algorithm can be used in various other statistical problems, such as directional statistics or single-index models, where fixing the norm of the coefficient vector is needed to avoid the issue of non-identifiability. In the future, the SCOR algorithm could be extended to the variable selection problem for coefficients belonging to the surface of a unit sphere.

Conclusions

In this paper, we propose a novel derivative-free black-box optimization technique to optimize any non-convex function whose parameters are constrained to the surface of a unit sphere. The proposed algorithm is highly efficient, as it allows parallelization using up to 2n parallel threads when optimizing a function whose parameters belong to the surface of an n-dimensional unit sphere. Our simulations demonstrate that SCOR outperforms existing methods for biomarker combination, as well as other black-box optimization techniques, in terms of both solution quality and computation time. The R package SCOR, which implements the method described here, is available at https://CRAN.R-project.org/package=SCOR.

We show that using SCOR, we obtain better estimates of the empirical hypervolume under the manifold (EHUM) compared to the estimates obtained using the Nelder-Mead, step-down and min-max algorithms. Irrespective of the objective function considered, the EHUM value at the solution obtained by SCOR is consistently better than or comparable to that obtained using the alternative algorithms. In our case study, we applied SCOR to find the optimal combination coefficients of radiation dose to various structures in the head-and-neck to distinguish subjects with increasing severity of swallowing difficulty experienced as a result of radiation treatment for cancer. We found that the values of EHUM and Youden’s index obtained were higher for SCOR than for existing alternatives, suggesting that this approach can be applied to more accurately stratify patients based on their risk of developing dysphagia.

Methods

In this section, we provide a detailed description of the algorithm for our proposed SCOR method.

Basic principle

The basic principle of the search for the optimal value is as follows. Within an iteration, at first a step size \(s>0\) is fixed. Then, two new points in the neighborhood are obtained by adding and subtracting s from each coordinate keeping all other coordinates fixed. Thus, at any given iteration, 2d possible new solution points are generated. For example, if \((\beta _1,\beta _2,\beta _3)\) denotes the current solution (taking \(d=3\)), then the new set of possible solutions are \((\beta _1+s,\beta _2,\beta _3), (\beta _1-s,\beta _2,\beta _3), (\beta _1,\beta _2+s,\beta _3), (\beta _1,\beta _2-s,\beta _3), (\beta _1,\beta _2,\beta _3+s)\) and \((\beta _1,\beta _2,\beta _3-s)\). This strategy is related to the approach used in [38] to solve an unconstrained optimization problem. This specific principle has two desirable properties: firstly, at each iteration the size of the search space (i.e., the set of newly generated points) is of O(d) (i.e., 2d), unlike GA, where the search space increases exponentially with the dimension [39]. Secondly, after the step-size s is fixed, the operations for finding the 2d new points and evaluating the objective function at those points are independent, so they can be performed in parallel. However, due to the spherical constraint on \({\beta }\), this simple search strategy cannot be directly applied here.
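The move set described above is easy to state in code. The short R sketch below generates the 2d candidate points for an unconstrained parameter space; the spherical adjustment required by SCOR is derived in the following subsections. The function name is illustrative only.

```r
## 2d candidate points around 'beta' obtained by perturbing one coordinate at a time
neighbors_unconstrained <- function(beta, s) {
  d <- length(beta)
  cand <- vector("list", 2 * d)
  for (i in seq_len(d)) {
    plus <- beta;  plus[i]  <- beta[i] + s
    minus <- beta; minus[i] <- beta[i] - s
    cand[[2 * i - 1]] <- plus
    cand[[2 * i]]     <- minus
  }
  cand
}

neighbors_unconstrained(c(0.2, -0.5, 0.8), s = 0.1)   # returns 6 candidate points
```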

We now summarize the proposed optimization scheme designed to address this issue. The algorithm can be broken into a sequence of runs where within each run, a number of iterations are performed. At the beginning of each iteration, the step-size \(s>0\) is fixed. The location of the new set of points to be evaluated in the iteration depends on this step size s. The exact relationship between the step size and the new set of possible solutions is described in the following subsections. The first iteration in the first run starts from a starting point provided by the user. At each iteration, a new set of 2d points around the current solution is generated. As mentioned above, the location of these new points is directly related to the value of step-size s. In general, larger values of s result in more distant points from the current solution. Then, the objective function is evaluated at all \(2d+1\) points. At the end of an iteration, the point with the maximum objective function value is taken as the updated solution. Thus, at each iteration the objective function value either increases or stays the same. At the beginning of a run, the step size is taken to be large. Depending on the improvement of the objective function across iterations, the step size is either reduced or kept same after each iteration. Once the step size within a run becomes sufficiently small (determined by a threshold \(\phi\) which is a tuning parameter of the algorithm), the run ends, and the current solution is passed to the next run, which uses this solution as its starting point. Once two consecutive runs yield the same solution, the algorithm stops execution and returns the final solution. A brief overview of the SCOR algorithm is given in Fig. 2.

Fig. 2

Flowchart of the SCOR algorithm where s denotes the step size, \(\phi\) denotes the step size threshold, and SOL(k) denotes the solution returned by the k-th run

Derivation of adjustment step-size

The RMPS method [22] incorporates adjustments to the step size to ensure that the proposed points remain within a restricting hyper-rectangle. In that approach, once the step size s is added to (or subtracted from) any of the coordinates, the other coordinates are kept unchanged. However, over a spherically constrained parameter space, adding the step size s to a coordinate of the current solution while keeping the other coordinates fixed would yield a point outside the unit sphere surface, assuming the current solution is on the unit sphere. To handle a spherically constrained parameter space, a critical challenge is devising an adjustment strategy such that when s is added to (or subtracted from) the i-th coordinate, the adjustment of the other coordinates ensures that the proposed solution point is still on the unit sphere. We derive an appropriate adjustment step size \(t_i\), which depends on s and is chosen so that when \(t_i\) is added to the remaining coordinates, the point obtained is still on the unit sphere. Figure 3a provides an illustrative example of why the adjustment step size must be a function of s.

For example, suppose within any given run, at the j-th iteration, the current solution is given by \({\beta }^{(j)} = (\beta _1^{(j)},\ldots ,\beta _d^{(j)})\). Note that \(\sum _{k=1}^d(\beta _k^{(j)})^2 = 1\). Suppose we update the i-th coordinate as \(\beta _i^{(j+1)} = \beta _i^{(j)}+s.\) Then the corresponding adjustment step size \(t_i\) is added to the rest of the coordinates, \(\beta _k^{(j+1)} = \beta _k^{(j)}+t_i, \; k \in \{1,\ldots ,d\} \setminus \{i\}.\) By the definition of the adjustment step size, \({\beta }^{(j+1)}\) is on the unit sphere as long as such a \(t_i\) exists. Since \({\beta }^{(j)}\) is also on the unit-sphere, we have,

$$\begin{aligned} \sum _{k=1}^d (\beta _k^{(j)})^2 = \sum _{k=1}^d (\beta _k^{(j+1)})^2 = \sum _{k=1,k\ne i}^d (\beta _k^{(j)} + t_i)^2 + (\beta _i^{(j)}+s)^2 = 1, \end{aligned}$$
(2)

which implies,

$$\begin{aligned} (d-1)t_i^2 + 2t_i\sum _{k=1,k\ne i}^d\beta _k^{(j)} + (2s\beta _i^{(j)} + s^2) =0. \end{aligned}$$
(3)

The value of \(t_i\) can be obtained by solving (3) to obtain \(t_i = T_1(s)\) or \(T_2(s)\) where

$$\begin{aligned}&T_1(s) = \frac{-2\sum _{k=1,k\ne i}^d\beta _k^{(j)} + \sqrt{D_i(s)}}{2(d-1)}, \quad T_2(s) = \frac{-2\sum _{k=1,k\ne i}^d\beta _k^{(j)} - \sqrt{D_i(s)}}{2(d-1)}, \nonumber \\&D_i(s) = \left (2\sum _{k=1,k\ne i}^d\beta _k^{(j)}\right )^2 - 4(d-1)(2s\beta _i^{(j)} + s^2). \end{aligned}$$
(4)

In order for the proposed algorithm to converge to the true solution, the value of \(t_i\) needs to be chosen such that the distance of the new possible solution \({\beta }^{(j+1)}\) from the current solution \({\beta }^{(j)}\) goes to 0 as s goes to 0, which only holds true for \(t_i = T_1(s)\). So for a given step size s, the adjustment step size is taken to be \(T_1(s)\). It should be noted that \(T_1(s)\) may be complex, implying that for the given step size s and coordinates of the current solution, no appropriate adjustment step size \(t_i\) exists. In those cases, we adopt alternative strategies, as discussed in the section on local step sizes. Fig. 3b shows, starting from a current solution on the surface of the unit sphere, the new sets of explored points corresponding to two different step sizes. In order to make the algorithm more efficient, a minor modification is incorporated which is described in Section C of Additional file 1.
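The following R sketch computes the adjustment step size of Eq. (4) and verifies numerically that the resulting candidate stays on the unit sphere. Among the two roots it returns the one closest to zero, i.e., the root that vanishes as \(s \rightarrow 0\) (denoted \(T_1(s)\) above), and returns NA when the discriminant \(D_i(s)\) is negative; adjustment_step is an illustrative name, not part of the SCOR package.

```r
## Adjustment step size t_i for step s added to coordinate i of the unit-norm vector beta
adjustment_step <- function(beta, i, s) {
  d <- length(beta)
  B <- sum(beta[-i])                                          # sum of the other coordinates
  disc <- (2 * B)^2 - 4 * (d - 1) * (2 * s * beta[i] + s^2)   # discriminant D_i(s) in Eq. (4)
  if (disc < 0) return(NA_real_)                              # no real adjustment for this s
  roots <- (-2 * B + c(1, -1) * sqrt(disc)) / (2 * (d - 1))
  roots[which.min(abs(roots))]                                # root that vanishes as s -> 0
}

## Example: the candidate point stays on the unit sphere
beta <- c(0.289, -0.816, 0.5); beta <- beta / sqrt(sum(beta^2))
t_2 <- adjustment_step(beta, i = 2, s = 0.03)
cand <- beta + t_2; cand[2] <- beta[2] + 0.03
sum(cand^2)                                                   # ~ 1, up to floating point error
```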

Fig. 3

Exploratory moves on the surface of the unit sphere by SCOR. a An initial point \((\beta _1,\beta _2,\beta _3)\) (green) on the surface of the unit sphere and the point \((\beta _1+t_2,\beta _2+s,\beta _3+t_2)\) (red) explored after an exploratory move with step size s on the \(2^{nd}\) coordinate, where \(t_2\) denotes the adjustment step size. b Exploratory moves in a typical iteration of SCOR: starting from the current solution \((0.289,-0.816,0.5)\) (green), the points explored with step sizes \(s=0.03\) (blue) and \(s=0.06\) (red)

Tuning parameters

In SCOR, the tuning parameters and their roles are similar to those considered in RMPS and RMPSS [22, 24]. Since the solution update strategy is consistent across runs, explaining the roles of the tuning parameters within a single run is sufficient to illustrate their effect. Within each run, iterations are initialized with a starting point which is either the initial guess provided by the user (for the first run) or the solution returned by the previous run (for all subsequent runs). At the beginning, the step size is set to \(s_{initial}\), which is preferably taken to be large in order to promote exploration of candidate solutions distant from the current solution. Suppose \({\beta }^{(j)} = (\beta _1^{(j)},\; \ldots , \beta _{d}^{(j)})\) denotes the solution at the end of the j-th iteration. At the beginning of the \((j+1)\)-th iteration, suppose the current step size is \(s = s^{(j+1)} (>0)\). Then the set of d new candidate solutions is given by \(\{{\beta }_{(i,+)}^{(j+1)}\}_{i=1}^d\) where

$$\begin{aligned} {\beta }^{(j+1)}_{(i,+)} = (\beta _1^{(j)}+t_i, \beta _2^{(j)}+t_i, \ldots , \beta _{i-1}^{(j)}+t_i, \beta _{i}^{(j)}+s,\beta _{i+1}^{(j)}+t_i, \ldots , \beta _{d}^{(j)}+t_i). \end{aligned}$$

for \(s>0, i=1,\ldots ,d\). Similarly, taking \(s = -s^{(j+1)}\), another d candidate solutions are generated given by \(\{{\beta }_{(i,-)}^{(j+1)}\}_{i=1}^d\) where

$$\begin{aligned} {\beta }^{(j+1)}_{(i,-)} = (\beta _1^{(j)} + t_i^\prime , \beta _2^{(j)} + t_i^\prime , \ldots , \beta _{i-1}^{(j)}+t_i^\prime , \beta _{i}^{(j)}+s,\beta _{i+1}^{(j)}+t_i^\prime , \ldots , \beta _{d}^{(j)}+t_i^\prime ), \; s<0. \end{aligned}$$

Thus, within any given iteration, based on the step size s, 2d new candidate solutions are generated. The objective function is evaluated at these \(2d+1\) points (the 2d candidate solutions and the current solution \({\beta }^{(j)}\)), and the point with the best objective function value is taken as the updated solution \({\beta }^{(j+1)}\). As discussed in the previous section on the derivation of the adjustment step size, in some scenarios, the adjustment step sizes \(t_i\) or \(t_i^\prime\) might be complex. In those settings, we propose alternative update strategies which are discussed in the following section on local step sizes.
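Putting the pieces together, the sketch below carries out one SCOR-style iteration for a maximization problem: it builds the 2d spherical candidates using the hypothetical adjustment_step helper from the previous sketch, evaluates the objective at the 2d + 1 points, and keeps the best one. Candidates with a complex adjustment are simply dropped here; the full algorithm instead shrinks the corresponding local step size, as described in the following section.

```r
## One illustrative iteration: 'f' is the objective to maximize (e.g. ehum), '...' its data
scor_iteration <- function(beta, s, f, ...) {
  d <- length(beta)
  candidates <- list(beta)                       # keep the current solution in the comparison
  for (i in seq_len(d)) {
    for (step in c(s, -s)) {
      t_i <- adjustment_step(beta, i, step)
      if (is.na(t_i)) next                       # no real adjustment: skip (see local step sizes)
      cand <- beta + t_i
      cand[i] <- beta[i] + step                  # step on coordinate i, adjustment elsewhere
      candidates <- c(candidates, list(cand))
    }
  }
  vals <- vapply(candidates, f, numeric(1), ...)
  candidates[[which.max(vals)]]                  # current solution kept if nothing improves
}
```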

Within a run, at the end of any given iteration, if the improvement of the objective function value is less than a user-specified tolerance tol_fun, the step size is divided by a factor \(\rho >1\) denoted as the step decay rate. So, at each iteration, the step size is either kept the same or reduced by dividing it by the step decay rate. For example, \(s^{(j+1)}\) will be either \(s^{(j)}\) or \(\frac{s^{(j)}}{\rho }\) depending on the improvement to the solution obtained in the \((j+1)\)-th iteration. The step size is reduced to enable finer search close to the current solution if no better solution is found in the set of candidate solutions using a larger value of s. The idea of reducing the step size is a well-known strategy used in existing derivative-free optimization algorithms on unconstrained parameter spaces [38, 40].

We consider the step size as sufficiently close to 0 once its value gets smaller than the step size threshold \(\phi\). Once the step size becomes less than \(\phi\), no further iterations are performed within that particular run. If two consecutive runs yield the same solution, the algorithm terminates after returning the solution obtained in the last run.

To handle cases where the solution is known a priori to be sparse, we consider the sparsity threshold \(\lambda\) as another tuning parameter. Once the step size for the j-th iteration \(s^{(j)}\) is determined, before calculating the adjustment step size \(t_i\), all the coefficients (except the i-th coordinate) with absolute values less than \(\lambda\) are set equal to zero. Suppose at the j-th iteration, while updating the i-th coordinate, h of the remaining \(d-1\) coordinates have absolute values less than \(\lambda\). Then those h coordinates are set equal to 0, and the adjustment step size \(t_i\) is calculated based on the remaining non-zero coordinates. If a sparse solution is not expected, the user can set the value of \(\lambda\) to 0.
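A minimal sketch of this thresholding step, under the same illustrative conventions as before:

```r
## Zero out coordinates (other than the one being updated) whose absolute value is below lambda
apply_sparsity <- function(beta, i, lambda) {
  small <- abs(beta) < lambda
  small[i] <- FALSE            # the coordinate currently being updated is left untouched
  beta[small] <- 0
  beta
}
```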

Local step sizes

At the beginning of the j-th iteration, we initialize 2d local step sizes such that \(s_i^+=s^{(j)}\) and \(s_i^-=-s^{(j)}\) for \(i = 1,\ldots ,d\). When adding \(s_i^+\) (or \(s_i^-\)) to the i-th coordinate, if the corresponding adjustment step size \(t_i\) (or \(t_i^\prime\)) exists, the update move is performed without modifying \(s_i^+\) (or \(s_i^-\)). However, if no real \(t_i\) (or \(t_i^\prime\)) exists for the given local step size \(s_i^+\) (or \(s_i^-\)), the local step size is repeatedly divided by the step decay rate \(\rho\) until the corresponding \(t_i\) (or \(t_i^\prime\)) becomes real. Note that in equation (4), \(D_i(s) \rightarrow \big (2\sum _{k=1,k\ne i}^d \beta _k^{(j)}\big )^2>0\) as \(s \rightarrow 0\). Therefore, a real \(t_i\) (or \(t_i^\prime\)) will exist for a local step size \(s_i^+\) (or \(s_i^-\)) sufficiently close to 0 (see Theorem 1). Since at the beginning of each iteration the values of these local step sizes are set equal to the step size for that iteration, the values of the local step sizes in the current iteration do not depend on the values from the previous iteration.

Overview of algorithm

Pseudocode for the proposed procedure is given in Algorithm 1. There we let \(\hat{{\beta }}^{(R)}\) denote the solution obtained at the end of the R-th run, and \({\beta }^{(j)}\) denote the solution obtained after the j-th iteration within a particular run. We set an upper limit \(max\_runs\) on the number of runs to be executed. Within each run, there is an upper limit \(max\_iter\) on the number of iterations. The roles of the tuning parameters along with their default values are provided in Additional file 1 Table S1.

Algorithm 1: Pseudocode for the SCOR procedure
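To make the run/iteration structure of Fig. 2 and Algorithm 1 concrete, the skeleton below strings together the earlier sketches (scor_iteration and, through it, the hypothetical adjustment_step and ehum helpers). It is a simplified illustration rather than the packaged implementation: within a run the step size is divided by \(\rho\) whenever the improvement falls below tol_fun, a run ends once the step size drops below \(\phi\), and the algorithm stops when two consecutive runs return the same solution.

```r
## Simplified skeleton of the SCOR run/iteration structure (maximization)
scor_sketch <- function(beta0, f, s_init = 1, rho = 2, phi = 1e-6, tol_fun = 1e-6,
                        max_runs = 50, max_iter = 1000, ...) {
  beta_run <- beta0 / sqrt(sum(beta0^2))          # starting point of the current run
  for (run in seq_len(max_runs)) {
    beta <- beta_run
    s <- s_init
    for (iter in seq_len(max_iter)) {
      if (s < phi) break                          # step size below threshold: end this run
      beta_new <- scor_iteration(beta, s, f, ...)
      if (f(beta_new, ...) - f(beta, ...) < tol_fun) s <- s / rho   # decay the step size
      beta <- beta_new
    }
    if (isTRUE(all.equal(beta, beta_run))) return(beta)  # two consecutive runs agree: stop
    beta_run <- beta                              # otherwise start the next run from here
  }
  beta_run
}

## e.g. scor_sketch(rep(1, 5) / sqrt(5), ehum, X_list = train)
```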

Theorem 1

Suppose \(\mathbf {S} = \{(x_1,\ldots ,x_n) \in \mathrm {R}^n \; : \; \sum _{i=1}^n x_i^2 = 1\}\). Consider a sequence of step sizes \(\delta _k = \frac{s}{\rho ^k}\) for \(k\in \mathrm {N}\) and \(s\ne 0, \rho >1\). Then there exists a K such that for \(k \ge K\), all adjustment step sizes \(\{t_i\}_{i=1}^n\) are real.

Theorem 2

Suppose f is convex, continuous and differentiable on \(\mathbf {S}\). Consider a sequence \(\delta _k = \frac{s}{\rho ^k}\) for \(k\in \mathrm {N}\) and \(s>0, \rho >1\). Suppose \(\mathbf {u}\) is a point in \(\mathbf {S}\). Define \(\mathbf {u}_k^{(i+)} = (u_1+t_i(\delta _k),\ldots , u_{i-1}+t_i(\delta _k), u_i+\delta _k,u_{i+1}+t_i(\delta _k), \ldots , u_n+t_i(\delta _k))\) and \(\mathbf {u}_k^{(i-)} = (u_1+t_i(-\delta _k),\ldots , u_{i-1}+t_i(-\delta _k),u_i-\delta _k,u_{i+1}+t_i(-\delta _k), \ldots , u_n+t_i(-\delta _k))\) for \(i=1,\cdots ,n\), where \(t_i(s)\) denotes the adjustment step size corresponding to step size s.

Given conditions on f at the points \(\mathbf {u}_k^{(i+)}\) and \(\mathbf {u}_k^{(i-)}\) detailed in Additional file 1 Section D, the global minimum of f over \(\mathbf {S}\) occurs at \(\mathbf {u}\).

The proofs of Theorems 1 and 2 are provided in Additional file 1 Section D. As is the case for any existing black-box optimization method, SCOR is not theoretically guaranteed to find the global minimum of an arbitrary black-box function. However, it incorporates strategies to avoid getting stuck at local minima. As shown in Section E of Additional file 1, SCOR outperforms competing black-box optimization methods including GA, SA, and PS in terms of identifying better solutions for benchmark functions. Moreover, as discussed in the Results section, in the context of maximizing HUM estimates, SCOR outperforms existing methods proposed for biomarker combination as well.

Availability of data and materials

The R package SCOR is available online at https://CRAN.R-project.org/package=SCOR, and the MATLAB code is available at https://github.com/priyamdas2/SCOR. The case study data were originally described in [30]. An anonymized version of this data set is available at https://doi.org/10.6084/m9.figshare.14753052. This optimization method can also be used to maximize or minimize any black-box function on a spherically constrained parameter space.

Abbreviations

AUC: Area under the ROC curve
CTCAE: Common Terminology Criteria for Adverse Events
EHUM: Empirical estimate of HUM
GA: Genetic algorithm
HUM: Hypervolume under the manifold
MSE: Mean squared error
NM: Nelder-Mead algorithm
PS: Pattern search
RMPS: Recursive modified pattern search
ROC: Receiver operating characteristic
SA: Simulated annealing
SCOR: Spherically constrained optimization routine
ULBA: Upper and lower bound approach

References

  1. Pepe MS, Thompson ML. Combining diagnostic test results to increase accuracy. Biostatistics. 2000;1(2):123–40.

    Article  CAS  PubMed  Google Scholar 

  2. Pepe MS, Cai T, Longton G. Combining predictors for classification using the area under the receiver operating characteristic curve. Biometrics. 2006;62(1):221–9.

    Article  PubMed  Google Scholar 

  3. Zhou X-H, McClish DK, Obuchowski NA. Statistical methods in diagnostic medicine, vol. 464. New York: Wiley; 2002.

    Book  Google Scholar 

  4. Ma S, Huang J. Combining multiple markers for classification using ROC. Biometrics. 2007;63(3):751–7.

    Article  PubMed  Google Scholar 

  5. Liu C, Liu A, Halabi S. A min-max combination of biomarkers to improve diagnostic accuracy. Stati Med. 2011;30(16):2005–14.

    Article  Google Scholar 

  6. Li J, Fine JP. ROC analysis with multiple classes and multiple tests: methodology and its application in microarray studies. Biostatistics. 2008;9(3):566–76.

    Article  PubMed  Google Scholar 

  7. Scurfield BK. Multiple-event forced-choice tasks in the theory of signal detectability. J Math Psychol. 1996;40(3):253–69.

    Article  CAS  PubMed  Google Scholar 

  8. Mossman D. Three-way ROCs. Med Decis Making. 1999;19(1):78–89.

    Article  CAS  PubMed  Google Scholar 

  9. Nakas CT, Yiannoutsos CT. Ordered multiple-class ROC analysis with continuous measurements. Stat Med. 2004;23(22):3437–49.

    Article  PubMed  Google Scholar 

  10. Yanyu Z. ROC analysis in diagnostic medicine (phd thesis). Department of Statistics and Applied Probability, National University of Singapore 2010.

  11. Hsu MJ, Chen YH. Optimal linear combination of biomarkers for multi-category diagnosis. Stat Med. 2016;35(2):202–13.

    Article  PubMed  Google Scholar 

  12. Maiti R, Li J, Das P, Feng L, Hausenloy D, Chakraborty B. A distribution-free smoothed combination method of biomarkers to improve diagnostic accuracy in multi-category classification. 2019 arxiv.org/abs/1904.10046

  13. Zhang Y, Li J. Combining multiple markers for multi-category classification: an ROC surface approach. Aust N Z J Stat. 2011;53(1):63–78.

    Article  Google Scholar 

  14. Kang L, Xiong C, Crane P, Tian L. Linear combinations of biomarkers to improve diagnostic accuracy with three ordinal diagnostic categories. Stat Med. 2013;32(4):631–43.

    Article  PubMed  Google Scholar 

  15. Horst R. Introduction to Global Optimization (Nonconvex Optimization and Its Applications). Berlin: Springer; 2002.

    Google Scholar 

  16. Fraser AS. Simulation of genetic systems by automatic digital computers i. introduction. Austr J Biol Sci. 1957;10:484–91.

    Article  Google Scholar 

  17. Kirkpatrick S, Gelatt CD, Vecchi MP. Optimization by simulated annealing. Austr J Biol Sci. 1983;220(4598):671–80.

    CAS  Google Scholar 

  18. Karmakar N. New polynomial-time algorithm for linear programming. Combinatorica. 1984;4:373–95.

    Article  Google Scholar 

  19. Boggs PT, Tolle JW. Sequential quadratic programmings. Acta Numerica. 1996;1–52.

  20. Nelder JA, Mead R. A simplex method for function minimization. Comput J. 1965;7:308–13.

    Article  Google Scholar 

  21. Torczon VJ. On the convergence of pattern search algorithms. SIAM J Optimiz. 1997;7:1–25.

    Article  Google Scholar 

  22. Das P. Black-box optimization on hyper-rectangle using recursive modified pattern search and application to matrix completion problem with non-convex regularization. arXiv, 2016. arxiv.org/pdf/1604.08616pdf

  23. Das P. Black-box optimization on multiple simplex constrained blocks. arXiv, 2016. arxiv.org/abs/1609.02249

  24. Das P. Recursive modified pattern search on high-dimensional simplex : A blackbox optimization technique. The Indian Journal of Statistics: Series B 2020.

  25. Su JQ, Liu JS. Linear combinations of multiple diagnostic markers. J Am Stat Assoc. 1993;88(424):1350–5.

    Article  Google Scholar 

  26. Therasse P, Arbuck SG, Eisenhauer EA, Wanders J, Kaplan RS, Rubinstein L, et al. New guidelines to evaluate the response to treatment in solid tumors. J Natl Cancer Inst. 2000;92(3):205–16.

    Article  CAS  PubMed  Google Scholar 

  27. Trotti A, Colevas AD, Setser A, Rusch V, Jaques D, Budach V, et al. Ctcae v3.0: development of a comprehensive grading system for the adverse effects of cancer treatment. Semin Radiat Oncol. 2003;13:176–81.

    Article  PubMed  Google Scholar 

  28. Hutcheson, et al. Dynamic imaging grade of swallowing toxicity (digest): scale development and validation. Dysphagia. 2017;23(1):62–70.

    Google Scholar 

  29. Goepfert RP, Lewin JS, Barrow MP, Warneke CL, Fuller CD, Lai SY, et al. Grading dysphagia as a toxicity of head and neck cancer: differences in severity classification based on MBS DIGEST and clinical CTCAE grades. Dysphagia. 2018;33(2):185–91.

    Article  PubMed  Google Scholar 

  30. Kamal M, Mohamed AS, Volpe S, Zaveri J, Barrow MP, Gunn GB, et al. Radiotherapy dose-volume parameters predict videofluoroscopy-detected dysphagia per DIGEST after IMRT for oropharyngeal cancer: results of a prospective registry. Radiother Oncol. 2018;128(3):442–51.

    Article  PubMed  PubMed Central  Google Scholar 

  31. Daly ME, Le QT, Maxim PG, Loo BW Jr, Kaplan MJ, et al. Intensity-modulated radiotherapy in the treatment of oropharyngeal cancer: clinical outcomes and patterns of failure. Int J Radiat Oncol Biol Phys. 2010;76(5):1339–46.

    Article  PubMed  Google Scholar 

  32. Eisbruch, et al. Objective assessment of swallowing dysfunction and aspiration after radiation concurrent with chemotherapy for head-and-neck cancer. Int J Radiat Oncol Biol Phys. 2002;53(1):23–8.

    Article  PubMed  Google Scholar 

  33. Hutcheson, et al. Long-term functional and survival outcomes after induction chemotherapy and risk-based definitive therapy for locally advanced squamous cell carcinoma of the head and neck. Head Neck. 2014;36(4):474–80.

    Article  PubMed  Google Scholar 

  34. Duprez, et al. Systematic review of dose-volume correlates for structures related to late swallowing disturbances after radiotherapy for head and neck cancer. Dysphagia. 2013;28:337–49.

    Article  PubMed  Google Scholar 

  35. Youden W. Index for rating diagnostic tests. Cancer. 1950;3:32–5.

    Article  CAS  PubMed  Google Scholar 

  36. Luo J, Xiong C. Diagtest3grp: an r package for analyzing diagnostic tests with three ordinal groups. J Stat Softw. 2012;51(3):1–24.

    Article  PubMed  PubMed Central  Google Scholar 

  37. Levendag PC, Teguh DN, Voet P, van der Est H, Noever I, de Kruijf WJ, et al. Dysphagia disorders in patients with cancer of the oropharynx are significantly affected by the radiation therapy dose to the superior and middle constrictor muscle: a dose-effect relationship. Radiother Oncol. 2007;85(1):64–73.

  38. Fermi E, Metropolis N. Numerical solution of a minimum problem. Los Alamos Unclassified Report LA-1492. Los Alamos National Laboratory, Los Alamos, USA; 1952.

  39. Geris L. Computational modeling in tissue engineering. Berlin: Springer; 2012.

  40. Kerr C, Dura-Bernal S, Smolinski T, Chadderdon G, Wilson D. Optimization by adaptive stochastic descent. PLoS ONE. 2018;13(3):e0192944.

Acknowledgements

The authors would like to acknowledge the funding sources listed in the Funding section, which supported this work.

About this supplement

This article has been published as part of BMC Bioinformatics Volume 23 Supplement 3, 2022: Selected articles from the International Conference on Intelligent Biology and Medicine (ICIBM 2021): bioinformatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-23-supplement-3.

Funding

We would like to acknowledge the following funding sources: MK is partially funded by the NIH (R01CA205146) and the Charles and Daneen Stiefel MD Anderson Oropharynx Program (00059189). KAH receives funding support from PCORI, NIH/NCI (R21CA226200-02), NIH/NIDCR (R01DE025248-05), the THRIVE/HESI foundation, and Atos Medical, not directly related to this work; the data were collected with direct funding support from the Charles and Daneen Stiefel MD Anderson Oropharynx Program. CDF received/receives relevant funding and salary support during the period of data collection and study execution from: the NCI Early Phase Clinical Trials in Imaging and Image-Guided Interventions Program (1R01CA218148); the NSF/NIH Joint Smart Connected Health Program (R01CA257814); and the NCI Parent RPG mechanism (R01CA258827). CDF received/receives unrelated funding and salary support during the period of study execution from: the NIH NIBIB Research Education Programs for Residents and Clinical Fellows Grant (R25EB025787-01); an NIDCR Academic Industrial Partnership Grant (R01DE028290); an NCI Parent Research Project Grant (R01CA258827); an NIH/NCI Cancer Center Support Grant (CCSG) Pilot Research Program Award from the UT MD Anderson CCSG Radiation Oncology and Cancer Imaging Program (P30CA016672); and an NSF Division of Civil, Mechanical, and Manufacturing Innovation (CMMI) grant (NSF 1933369). CDF has received direct industry grant support, honoraria, and travel funding from Elekta AB unrelated to this project. CDF receives direct infrastructure support from the multidisciplinary Radiation Oncology/Cancer Imaging Program (P30CA016672-44) of the MD Anderson Cancer Center Support Grant (P30CA016672) and the MD Anderson Program in Image-guided Cancer Therapy. BC is partially supported by the Ministry of Education, Singapore (MOE2015-T2-2-056) and a Duke-NUS Medical School start-up grant. CBP is partially funded by the NIH/NCI CCSG (P30CA016672, Biostatistics Shared Resource) and by the Cancer Prevention and Research Institute of Texas (RP160668/RP150521). Publication costs are funded by CBP's MD Anderson Physicians Referral Service (PRS) funding.

Author information

Authors and Affiliations

Authors

Contributions

PD, RM, BC and CBP conceived the study. PD and CBP drafted the manuscript. PD and DD developed the code. PD performed the data analysis. MK, CDF, and KAH contributed to the acquisition of the case study data and interpretation of the results. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Priyam Das.

Ethics declarations

Ethics approval and consent to participate

The case study is a secondary analysis of data collected as a part of the MD Anderson Oropharynx Cancer Registry Patient-Reported Outcomes and Functional (PROF) Core. The study protocol under which the data were collected was approved by The University of Texas MD Anderson Cancer Center Institutional Review Board (PA14-0947).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1

This file contains a brief description of the literature on existing global optimization techniques, existing alternatives to SCOR, theoretical properties and optimization performance evaluation of SCOR, additional simulation results, and an application of SCOR to Alzheimer's disease data. Table S1. Tuning parameters, their roles, and default values in the SCOR algorithm. Table S2. Comparison of the minimum value achieved and the average computation time (in seconds) for solving the transformed d-dimensional negative log-product function, modified Griewank's function, negative sum of squares function, modified Exponential function, and modified Easom function for d = 5, 20, 50, 100 and 500 using SCOR, PS, GA, and SA, starting from 10 randomly generated points in each case. Table S3. Performance comparison for two and three ordinal outcomes, where each class has sample size 60 in the two-category case and 30 in the three-category case. The empirical hypervolume under the manifold (EHUM) and upper and lower bound approach (ULBA) objective functions are maximized by the proposed Spherically Constrained Optimization Routine (SCOR) algorithm and by the existing Nelder-Mead (NM), step-down, and min-max algorithms. The estimated biomarker coefficient vectors are then used to calculate the EHUM value on a new dataset of the same size generated from the corresponding model. The entire procedure is repeated 100 times, resulting in 100 simulated training and test datasets, and the mean EHUM objective function values on the test data are reported, with standard errors in parentheses. The result for the method with the best performance is marked in bold.
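
To make the EHUM criterion summarized in Table S3 concrete, the following minimal R sketch computes the empirical hypervolume under the manifold for a three-category outcome given a coefficient vector constrained to the unit sphere. It is illustrative only: the function name ehum3, the brute-force triple count, and the simulated data are assumptions made here for exposition and do not reflect the interface of the SCOR package.

# Empirical HUM for three ordered classes: the proportion of triples
# (i, j, k) whose combined scores satisfy s1_i < s2_j < s3_k.
ehum3 <- function(beta, x1, x2, x3) {
  # x1, x2, x3: matrices of predictor values for the three ordered classes
  # (rows = subjects, columns = predictors).
  beta <- beta / sqrt(sum(beta^2))      # spherical constraint ||beta|| = 1
  s1 <- as.vector(x1 %*% beta)          # class-1 scores
  s2 <- as.vector(x2 %*% beta)          # class-2 scores
  s3 <- as.vector(x3 %*% beta)          # class-3 scores
  count <- 0
  for (a in s1) for (b in s2) count <- count + (a < b) * sum(b < s3)
  count / (length(s1) * length(s2) * length(s3))
}

# Hypothetical usage: three classes of 30 subjects and 4 predictors,
# evaluating an equal-weight coefficient vector.
set.seed(1)
x1 <- matrix(rnorm(30 * 4, mean = 0.0), ncol = 4)
x2 <- matrix(rnorm(30 * 4, mean = 0.5), ncol = 4)
x3 <- matrix(rnorm(30 * 4, mean = 1.0), ncol = 4)
ehum3(c(1, 1, 1, 1), x1, x2, x3)

In the simulation scheme described above, a maximizer of this objective would be estimated on a training set, and the same function would then be evaluated, at the estimated coefficients, on an independently generated test set.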

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Das, P., De, D., Maiti, R. et al. Estimating the optimal linear combination of predictors using spherically constrained optimization. BMC Bioinformatics 23 (Suppl 3), 436 (2022). https://doi.org/10.1186/s12859-022-04953-y

Keywords