 Research
 Open Access
Nonlinear expression and visualization of nonmetric relationships in genetic diseases and microbiome data
BMC Bioinformatics volume 19, Article number: 505 (2018)
Abstract
Background
The traditional methods of visualizing high-dimensional data objects in low-dimensional metric spaces are subject to the basic limitations of metric space. These limitations result in multidimensional scaling that fails to faithfully represent nonmetric similarity data.
Results
Multiple maps t-SNE (mm-tSNE) has drawn much attention because it constructs multiple maps in low-dimensional space to visualize nonmetric pairwise similarities, which removes the limitations of a single metric map. mm-tSNE regularization additionally incorporates the intrinsic geometry between data points in the high-dimensional space: the weights of the data points on each map serve as the regularization parameter of the manifold, so that similar data points receive weights that are as close as possible on the same map. However, these methods use standard momentum to compute the gradient-based parameter update at each iteration, which may lead to erroneous search directions, so that the target loss function fails to reach a better local minimum. In this article, we use a Nesterov momentum method to optimize the target loss function and correct each gradient update by looking back at the previous gradient in the candidate search direction.
By using indirect second-order information, the algorithm converges faster than the original algorithm. To further evaluate our approach from a comparative perspective, we conducted experiments on several datasets, including social network data, phenotype similarity data, and microbiome data.
Conclusions
The experimental results show that the proposed method achieves better results than several versions of mm-tSNE on three evaluation indicators: the neighborhood preservation ratio (NPR), the error rate, and the time complexity.
Background
A large number of studies have shown that genetic diseases with overlapping phenotypes are closely related to mutations in functionally related genes [1, 2]. From another perspective, different clinical features and genetic diseases share similar pathophysiological mechanisms [3, 4]. In addition, classical methods of dimensionality reduction and data visualization have been applied to the analysis of microbial data [5]. Generally speaking, however, the integration and analysis of microbiome big data is still at a preliminary stage, and there are currently no effective integration techniques or visualization methods for exploiting such data. Some studies have focused on mathematical models that exploit the complicated correlations between phenotypes and genotypes in heterogeneous genomic datasets such as gene expression data, gene ontology annotations [6], and protein-protein interaction networks [7, 8]. In addition, some studies show that nonmetric attributes are important features of microbial data [9]. Researching the associations between diseases not only helps us to discover their shared hereditary basis [10], but also provides new insights into the molecular circadian mechanisms [11] and prospective drug target studies [12]. Each person's gut microbiota has a dominant flora, and gut microbiota can be divided into three different “intestinal types” based on the characteristics of the human intestine. This finding can help us discover the relationship between drugs, diet, microbes and the body in different states of health and disease [13]. The microbes distributed in different parts of the body play a vital role in our health. By lowering the dimensionality of microbiome big data and extracting useful information from it with the help of statistics and pattern recognition, the structure and characteristics of the microbial community can be analyzed, and new biological hypotheses can be proposed and examined.
Before performing computational tasks on a large amount of data, preliminary visualization and exploration help us understand the task intuitively. By visualizing the relationships between disease phenotypes, we may gain new insights into the relationships between genes and disease. Conventional dimensionality reduction methods visualize high-dimensional objects in a two-dimensional or three-dimensional metric space by constructing a single map in the low-dimensional space [14]. However, this kind of visualization suffers from the basic limitations of metric space, chief among them the triangle inequality. For example, from a biological point of view, if phenotype A is associated with phenotype B in the metric space and phenotype B is associated with phenotype C, then phenotype A must also be associated with phenotype C. In practice, this restriction is very likely to be violated by the implicit structure of similarity data: because diseases may be interrelated across different categories, they may have overlapping phenotypes, and a cluster of phenotypes may belong to disparate disease categories. mm-tSNE [15] can properly model nontransitive similarities by assigning an importance weight to each point in each map. For example, suppose we embed three phenotypes A, B, and C into two maps in the low-dimensional space (see Fig. 1(a)). mm-tSNE assigns phenotype A an importance weight of 1 in the first map, phenotype B an importance weight of 1 in the second map, and phenotype C an importance weight of 0.5 in both maps. As a result, the pairwise similarity between phenotypes A and B is 0, while each of them can still be similar to C. The mm-tSNE approach thus breaks the transitivity of similarities imposed by a metric space by visualizing data points in multiple maps [15].
Nevertheless, mm-tSNE has a drawback: the data points with high importance weights in the same map do not necessarily follow the same cluster structure, which makes it difficult to interpret each map. mm-tSNE regularization [16] improves mm-tSNE by introducing a Laplacian penalty term into the target loss function. The Laplacian penalty term has been widely applied in machine learning models [17, 18]. Compared with mm-tSNE, the advantage of mm-tSNE regularization is that it exploits the cluster structure of the variables and offers more sparsity in parameter estimation. These methods use standard momentum updates [19] to evaluate the gradient at each iteration. However, when the previous update was poor, the current update can overshoot, which leads to excessive oscillation. This article is an extended version of an earlier conference publication on mm-tSNE regularization based on Nesterov accelerated gradient (NAG) [20]. In contrast to that publication, this article: (1) contains more detailed technical and experimental descriptions; and (2) includes additional experimental results on microbial datasets. In this article, we use a Nesterov momentum method [21, 22] to optimize the target loss function and correct each gradient update by looking back at the previous gradient in the candidate search direction. The key difference between standard momentum and Nesterov momentum is that standard momentum calculates the gradient before the velocity is applied, while Nesterov momentum calculates the gradient after doing so. Therefore, the gradient can be corrected faster and more accurately. This benign-looking difference seems to allow Nesterov momentum to change velocity in a quicker and more responsive way, letting it behave more stably than standard momentum in many situations, especially for higher values of the momentum coefficient.
By indirectly using second-order information, the Nesterov momentum method achieves a better convergence rate than the momentum method and further reduces the error rate of the loss function. The results of the present study indicate that the proposed method performs at least comparably to the original methods and provides a better data visualization framework.
Methods
t-distributed stochastic neighborhood embedding (t-SNE)
t-Distributed Stochastic Neighborhood Embedding (t-SNE) is a classical multidimensional scaling technique [23]. It is a nonlinear mapping method based on the earlier work on Stochastic Neighbor Embedding [24]. As data points are mapped from the high-dimensional space to the low-dimensional space, the distances between data points are maintained, and local and global information are preserved. This method has been applied to the visualization of data in many fields, such as literature [25], linguistic data [26], and breast cancer CADx imaging data [27]. In t-SNE, the similarities among data points are modeled as probabilities rather than directly as Euclidean distances. The pairwise distances between data points in the high-dimensional space are transformed by a Gaussian distribution into probabilities p_{ij} that represent the similarities between data points; in the formulation of [23]:
\[ {p}_{j\mid i}=\frac{\exp \left(-{\left\Vert {x}_i-{x}_j\right\Vert}^2/2{\sigma}_i^2\right)}{\sum_{k\ne i}\exp \left(-{\left\Vert {x}_i-{x}_k\right\Vert}^2/2{\sigma}_i^2\right)},\kern1em {p}_{ij}=\frac{p_{j\mid i}+{p}_{i\mid j}}{2N} \kern2em (1) \]
The aim of t-SNE is to retain these pairwise probabilities between all points in the low-dimensional space. In t-SNE, similarity in the two- or three-dimensional “metric space” is defined by a heavy-tailed distribution centered at each point, in order to avoid the “crowding problem” [23]. The pairwise distances between data points in the low-dimensional space are transformed by a Student t-distribution into probabilities q_{ij} that represent the similarities between data points:
\[ {q}_{ij}=\frac{{\left(1+{\left\Vert {y}_i-{y}_j\right\Vert}^2\right)}^{-1}}{\sum_k{\sum}_{l\ne k}{\left(1+{\left\Vert {y}_k-{y}_l\right\Vert}^2\right)}^{-1}} \kern2em (2) \]
The difference between the similarity q_{ij} in the low-dimensional space and the similarity p_{ij} in the high-dimensional space is measured by the KL divergence between the joint distributions P and Q:
\[ C(Y)= KL\left(P\Vert Q\right)={\sum}_i{\sum}_{j\ne i}{p}_{ij}\log \frac{p_{ij}}{q_{ij}} \kern2em (3) \]
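The two low-dimensional quantities described above, the Student-t similarities q_{ij} and the KL divergence, can be sketched in a few lines of plain Python. This is only an illustrative sketch on a toy embedding: the function names, the `eps` guard and the example coordinates are assumptions made here and are not taken from the authors' implementation.

```python
import math

def student_t_similarities(Y):
    """Joint probabilities q_ij from a Student-t kernel with one degree of freedom."""
    n = len(Y)
    # Unnormalised kernel values (1 + ||y_i - y_j||^2)^-1 for i != j.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
                w[i][j] = 1.0 / (1.0 + d2)
    z = sum(map(sum, w))  # normalise over all ordered pairs
    return [[v / z for v in row] for row in w]

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) over off-diagonal pairs; eps (an assumption) guards against log(0)."""
    n = len(P)
    return sum(
        P[i][j] * math.log((P[i][j] + eps) / (Q[i][j] + eps))
        for i in range(n) for j in range(n)
        if i != j and P[i][j] > 0.0
    )

# Toy example: three points in a two-dimensional map and a uniform target P.
Y = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
Q = student_t_similarities(Y)
P_uniform = [[0.0 if i == j else 1.0 / 6.0 for j in range(3)] for i in range(3)]
cost = kl_divergence(P_uniform, Q)
```

The cost is zero only when Q matches P on every pair, which is why it can serve as the "error rate" of an embedding.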
Multiple maps t-SNE
mm-tSNE is a variant of the t-SNE method that overcomes the limitations of a single metric map by constructing M maps in a low-dimensional space to visualize pairwise similarities in nonmetric spaces.
Multiple maps t-SNE constructs M maps in the low-dimensional space, where each map contains N data points. In the map with index m, the data point with index i has an importance weight \( {\pi}_i^{(m)} \), which represents the importance of data point i in map m, and the weights of data point i over all maps sum to 1. The pairwise similarity q_{ij} between data points in the low-dimensional space is therefore a weighted sum of the pairwise similarities between data points i and j over all maps. Following [15], it is defined as:
\[ {q}_{ij}=\frac{\sum_m{\pi}_i^{(m)}{\pi}_j^{(m)}{\left(1+{\left\Vert {y}_i^{(m)}-{y}_j^{(m)}\right\Vert}^2\right)}^{-1}}{\sum_k{\sum}_{l\ne k}{\sum}_{m^{\prime }}{\pi}_k^{\left({m}^{\prime}\right)}{\pi}_l^{\left({m}^{\prime}\right)}{\left(1+{\left\Vert {y}_k^{\left({m}^{\prime}\right)}-{y}_l^{\left({m}^{\prime}\right)}\right\Vert}^2\right)}^{-1}} \]
where \( {y}_i^{(m)} \) is the image in map m of data point i of the high-dimensional space. Because it is difficult to optimize the constrained parameter \( {\pi}_i^{(m)} \) directly, the importance weight \( {\pi}_i^{(m)} \) is expressed in terms of an unconstrained weight \( {\omega}_i^{(m)} \):
\[ {\pi}_i^{(m)}=\frac{e^{-{\omega}_i^{(m)}}}{\sum_{m^{\prime }}{e}^{-{\omega}_i^{\left({m}^{\prime}\right)}}} \]
The objective loss function has the same form as Eq. 3, but it is minimized with respect to the locations of the points \( {y}_i^{(m)} \) in all maps and the associated unconstrained weights \( {\omega}_i^{(m)} \).
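A minimal sketch of the multiple-maps similarity may help make the role of the importance weights concrete. It reproduces the three-phenotype example from the Background (A only in the first map, B only in the second, C split evenly), and assumes the sign convention of [15], in which the importance weight is proportional to exp(−ω). The function names and map coordinates are hypothetical and chosen only for illustration.

```python
import math

def softmax_weights(omega):
    """Map unconstrained weights omega[m][i] to importance weights pi[m][i].

    Assumes pi is proportional to exp(-omega); for each point i the
    weights sum to 1 over the maps m.
    """
    n_maps, n = len(omega), len(omega[0])
    pi = [[0.0] * n for _ in range(n_maps)]
    for i in range(n):
        lo = min(omega[m][i] for m in range(n_maps))  # shift for numerical stability
        e = [math.exp(-(omega[m][i] - lo)) for m in range(n_maps)]
        s = sum(e)
        for m in range(n_maps):
            pi[m][i] = e[m] / s
    return pi

def multimap_similarities(maps, pi):
    """q_ij as a weighted sum, over all maps, of Student-t kernels."""
    n_maps, n = len(maps), len(maps[0])
    w = [[0.0] * n for _ in range(n)]
    for m in range(n_maps):
        for i in range(n):
            for j in range(n):
                if i != j:
                    d2 = sum((a - b) ** 2 for a, b in zip(maps[m][i], maps[m][j]))
                    w[i][j] += pi[m][i] * pi[m][j] / (1.0 + d2)
    z = sum(map(sum, w))
    return [[v / z for v in row] for row in w]

# Points 0, 1, 2 play the roles of phenotypes A, B, C.
pi_example = [[1.0, 0.0, 0.5],   # weights in map 1
              [0.0, 1.0, 0.5]]   # weights in map 2
maps_example = [
    [[0.0, 0.0], [5.0, 5.0], [1.0, 0.0]],  # map 1: A and C are close
    [[3.0, 3.0], [0.0, 0.0], [0.0, 1.0]],  # map 2: B and C are close
]
Q_mm = multimap_similarities(maps_example, pi_example)
# Q_mm[0][1] is exactly 0 while Q_mm[0][2] and Q_mm[1][2] are positive:
# A ~ C and B ~ C without forcing A ~ B.
pi_from_omega = softmax_weights([[0.0, 2.0], [0.0, -1.0]])
```

Because A and B never carry weight in the same map, their similarity vanishes regardless of where they are placed, which a single metric map cannot express.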
Multiple maps t-SNE with Laplacian regularization
Multiple maps t-SNE with Laplacian regularization (mm-tSNE regularization) alleviates the problem that the higher-weighted data points in the same map do not follow the same cluster structure by adding a Laplacian penalty to the original mm-tSNE cost function C(Y),
where L = diag(∑_{j}p_{ij}) − P is the graph Laplacian of the similarity matrix P = (p_{ij}).
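To illustrate why a Laplacian term encourages similar points to share similar weights, the sketch below builds L = diag(∑_j p_ij) − P for a small symmetric P and checks the standard identity wᵀLw = ½ ∑_ij p_ij (w_i − w_j)². How this quadratic form is scaled and summed over maps in the regularized cost is not reproduced here; the matrix, the weight vectors and the function names are assumptions for illustration only.

```python
def graph_laplacian(P):
    """L = diag(row sums of P) - P for a symmetric similarity matrix P."""
    n = len(P)
    return [[(sum(P[i]) if i == j else 0.0) - P[i][j] for j in range(n)]
            for i in range(n)]

def quadratic_form(L, w):
    """w^T L w."""
    n = len(w)
    return sum(w[i] * L[i][j] * w[j] for i in range(n) for j in range(n))

def pairwise_penalty(P, w):
    """0.5 * sum_ij p_ij * (w_i - w_j)^2, the same quantity written pairwise."""
    n = len(w)
    return 0.5 * sum(P[i][j] * (w[i] - w[j]) ** 2
                     for i in range(n) for j in range(n))

# Points 0 and 1 are strongly similar; points 1 and 2 are weakly similar.
P_toy = [[0.0, 0.4, 0.0],
         [0.4, 0.0, 0.1],
         [0.0, 0.1, 0.0]]
L_toy = graph_laplacian(P_toy)

w_smooth = [0.9, 0.8, 0.1]  # similar points 0 and 1 get similar weights
w_rough = [0.9, 0.1, 0.8]   # similar points 0 and 1 get very different weights

penalty_smooth = quadratic_form(L_toy, w_smooth)
penalty_rough = quadratic_form(L_toy, w_rough)
```

The penalty is smaller for `w_smooth` than for `w_rough`, so minimizing it pushes strongly similar points toward similar importance weights within a map.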
The gradient with respect to the map point \( {y}_i^{(m)} \) in the low-dimensional space is calculated by the following equation:
where \( {d}_{ij}^{(m)}={\left\Vert {y}_i^{(m)}-{y}_j^{(m)}\right\Vert}^2 \).
The gradient with respect to the weights \( {\omega}_i^{(m)} \) in the low-dimensional space is calculated by the following equation:
where \( Z={\sum}_k{\sum}_{l\ne k}{\sum}_{m^{\prime }}{\pi}_k^{\left({m}^{\prime}\right)}{\pi}_l^{\left({m}^{\prime}\right)}{\left(1+{d}_{kl}^{\left({m}^{\prime}\right)}\right)}^{-1} \).
Mathematically, the gradient update with the momentum term is given by the following equation:
where Y denotes the model parameters, v^{(t)} is the velocity, γ ∈ [0, 1] is the momentum coefficient, η is the learning rate at iteration t, and \( \frac{\partial C(Y)}{\partial Y} \) is the gradient.
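For reference, one common way to write a classical momentum update with these symbols is shown below. The sign convention, and the choice to evaluate the gradient at the previous iterate, are assumptions and may differ from the authors' own equation.

\[ {v}^{(t)}=\gamma {v}^{\left(t-1\right)}-\eta {\left.\frac{\partial C(Y)}{\partial Y}\right|}_{Y={Y}^{\left(t-1\right)}},\kern1em {Y}^{(t)}={Y}^{\left(t-1\right)}+{v}^{(t)} \]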
Simplified Nesterov momentum
Nesterov momentum [21, 22] is a first-order optimization method that improves the stability and convergence of regular gradient descent. The update rules of the algorithm are as follows [28, 29]:
where θ_{t} are the model parameters, v^{(t)} is the velocity, μ^{(t)} ∈ [0, 1] is the momentum coefficient, ε^{(t)} > 0 is the learning rate at iteration t, f(θ) is the objective function, and ∇f(θ^{′}) is shorthand for the gradient \( {\left.\frac{\partial f\left(\theta \right)}{\partial \theta}\right|}_{\theta ={\theta}^{\prime }} \).
The equivalent form is as follows:
Unlike the standard momentum term, Nesterov momentum does not evaluate the gradient at the current parameter position; it evaluates it at a look-ahead position that also depends on the previous momentum update μ^{(t − 1)}v^{(t − 1)}. With Nesterov momentum, the gradient correction to the velocity v^{(t)} is calculated at the point θ^{(t − 1)} + μ^{(t − 1)}v^{(t − 1)}. If μ^{(t − 1)}v^{(t − 1)} is indeed a poor update, ∇f(θ^{(t − 1)} + μ^{(t − 1)}v^{(t − 1)}) will point back towards θ^{(t − 1)} more strongly than the gradient computed at θ^{(t − 1)}, hence providing a larger and more timely correction to v^{(t)}. Fig. 1(b) illustrates the geometric meaning of this behavior. The equivalent form of Nesterov momentum makes the difference from standard momentum visible: the update direction gains an additional term \( {\mu}^{\left(t-1\right)}\left[\nabla f\left({\hat{\theta}}^{\left(t-1\right)}\right)-\nabla f\left({\hat{\theta}}^{\left(t-2\right)}\right)\right] \). This difference of successive gradients is essentially an approximation of second-order information about the objective function. Because Nesterov momentum uses this second-order information indirectly, it is more effective than the standard momentum term at correcting a large and inappropriate velocity at each iteration, which makes it converge faster than the momentum method and can further reduce the error rate of the loss function.
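The difference between the two update rules can be sketched on a toy problem. The code below applies standard momentum and Nesterov momentum to the one-dimensional quadratic f(θ) = ½θ², which is an assumption chosen purely for illustration (it is not the mm-tSNE cost), with a fixed learning rate and momentum coefficient; the function names are hypothetical.

```python
def grad(theta):
    """Gradient of the toy objective f(theta) = 0.5 * theta**2."""
    return theta

def momentum_step(theta, v, lr, mu):
    # Standard momentum: gradient evaluated at the current position.
    v_new = mu * v - lr * grad(theta)
    return theta + v_new, v_new

def nesterov_step(theta, v, lr, mu):
    # Nesterov momentum: gradient evaluated at the look-ahead position
    # theta + mu * v, i.e. after the velocity has been applied.
    v_new = mu * v - lr * grad(theta + mu * v)
    return theta + v_new, v_new

def run(step, n_iter=50, theta0=5.0, lr=0.1, mu=0.9):
    """Return |theta| after each iteration, starting from rest."""
    theta, v = theta0, 0.0
    trace = []
    for _ in range(n_iter):
        theta, v = step(theta, v, lr, mu)
        trace.append(abs(theta))
    return trace

trace_momentum = run(momentum_step)
trace_nesterov = run(nesterov_step)
# Compare the residual oscillation over the last ten iterations.
tail_momentum = max(trace_momentum[-10:])
tail_nesterov = max(trace_nesterov[-10:])
```

On this toy objective both methods oscillate around the minimum, but the look-ahead gradient damps the oscillation faster, so `tail_nesterov` is smaller than `tail_momentum`. Behavior on a non-convex cost such as the mm-tSNE objective is not guaranteed to follow the same pattern.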
Multiple maps t-SNE regularization based on Nesterov momentum
In this article, unlike the original versions of mm-tSNE, we use the Nesterov momentum method to optimize the target loss function, which lets the loss function reach a better value faster and yields a higher neighborhood preservation ratio.
The learning algorithm is as follows:
where Y represents the model parameters to be optimized, ν^{(t)} represents the velocity at iteration t, γ ∈ [0, 1] represents the momentum coefficient, η represents the learning rate at iteration t, and \( \frac{\partial C(Y)}{\partial Y} \) represents the gradient.
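With these symbols, a Nesterov-style update can be written as below. This is a sketch under an assumed sign convention, with the gradient of the cost evaluated at the look-ahead point; it is not necessarily identical to the authors' displayed algorithm.

\[ {\nu}^{(t)}=\gamma {\nu}^{\left(t-1\right)}-\eta {\left.\frac{\partial C(Y)}{\partial Y}\right|}_{Y={Y}^{\left(t-1\right)}+\gamma {\nu}^{\left(t-1\right)}},\kern1em {Y}^{(t)}={Y}^{\left(t-1\right)}+{\nu}^{(t)} \]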
Datasets
To assess the performance of our approach, we apply our method to several datasets, including a phenotypic similarity dataset and a microbial dataset. The microbial dataset consists of 6313 orthologous proteins from 345 individual intestinal microbiomes [30]. After data preprocessing, a similarity matrix of 1299 KOs is obtained. The phenotypic similarities come from the Online Mendelian Inheritance in Man (OMIM) database [31, 32] and cover 1025 phenotypes related to 21 disease categories, according to the disease classification information from the Human Disease Network [8]. Among them, similarity values below 0.5 are filtered out.
Evaluation indicators
Neighborhood preservation ratio
The ideal outcome of dimensionality reduction for visualization is that the neighbors of a sample point x_{i} in the high-dimensional space are exactly the same as the neighbors of its image y_{i} in the low-dimensional space. That is, after the dimensionality reduction method projects the data into a two-dimensional space, the neighbors of y_{i} should coincide with the neighbors of x_{i} in the high-dimensional space. The neighborhood preservation ratio is a measure proposed by Laurens van der Maaten [15] that quantifies how well similarities in the high-dimensional space are preserved in the low-dimensional space by the mm-tSNE method. For each data point i, we choose the points with its k highest p_{ij} values in the high-dimensional space as its k nearest neighbors (N^{i1} for short), and the points with its k highest q_{ij} values in the low-dimensional space as its k nearest neighbors (N^{i2} for short). The intersection of N^{i1} and N^{i2} shows whether the visualization method maintains the distribution of neighboring points found in the high-dimensional space. NPR is therefore the average fraction of neighbors that are preserved,
where N^{i1} ∩ N^{i2} is the number of neighbors common to the high-dimensional space and the low-dimensional space, and n is the total number of visualized data points.
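The neighborhood preservation ratio described above can be sketched as follows. The sketch assumes that the per-point overlap is normalized by k before averaging over the n points, and that ties are broken by index order; both choices, and the function names, are assumptions made for illustration.

```python
def top_k_neighbors(row, i, k):
    """Indices of the k largest entries of `row`, excluding point i itself."""
    candidates = [j for j in range(len(row)) if j != i]
    candidates.sort(key=lambda j: row[j], reverse=True)  # stable: ties keep index order
    return set(candidates[:k])

def neighborhood_preservation_ratio(P, Q, k):
    """Average fraction of each point's k nearest neighbors under P kept under Q."""
    n = len(P)
    total = 0.0
    for i in range(n):
        high = top_k_neighbors(P[i], i, k)  # N^{i1}
        low = top_k_neighbors(Q[i], i, k)   # N^{i2}
        total += len(high.intersection(low)) / k
    return total / n

# Toy similarity matrices for four points.
P_sim = [[0.0, 0.5, 0.3, 0.2],
         [0.5, 0.0, 0.2, 0.3],
         [0.3, 0.2, 0.0, 0.5],
         [0.2, 0.3, 0.5, 0.0]]
# A low-dimensional similarity matrix that reverses the neighbor ranking.
Q_bad = [[0.0, 0.2, 0.3, 0.5],
         [0.2, 0.0, 0.5, 0.3],
         [0.3, 0.5, 0.0, 0.2],
         [0.5, 0.3, 0.2, 0.0]]

npr_perfect = neighborhood_preservation_ratio(P_sim, P_sim, k=2)
npr_reversed_k1 = neighborhood_preservation_ratio(P_sim, Q_bad, k=1)
npr_reversed_k2 = neighborhood_preservation_ratio(P_sim, Q_bad, k=2)
```

An embedding that keeps every neighbor scores 1, and the score falls toward 0 as the low-dimensional neighborhoods diverge from the high-dimensional ones.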
Error rate
The error rate is the value of the KL divergence cost, which measures the difference between the Q distribution and the P distribution.
Time complexity
The time complexity of the algorithm is measured by the number of times the basic operations are repeated.
Results
We compare mm-tSNE regularization based on Nesterov momentum with the original mm-tSNE methods on the phenotype (Fig. 2) and microbiome (Fig. 3) datasets, using the neighborhood preservation ratio, the error rate, and the time complexity as evaluation indicators.
We then apply mm-tSNE regularization based on Nesterov momentum to explore the nonmetric relationships in the phenotype similarity dataset and the microbiome dataset. The model parameters m (the number of maps) and λ (the weight of the penalty term) are selected according to the neighborhood preservation ratio (NPR) (see Methods). Fig. 2 and Fig. 3 show the experimental results on the phenotype similarity dataset and the microbial dataset, respectively. mm-tSNE regularization based on Nesterov momentum has performance comparable with mm-tSNE and mm-tSNE regularization. The green lines in Fig. 2 and Fig. 3 show that the proposed model has an advantage over the original versions of mm-tSNE. Fig. 4 is a heat map of NPR over the parameter space of m and λ when the mm-tSNE regularization based on Nesterov momentum algorithm is applied. The x-axis represents the value of λ in the experiment, and the y-axis represents the number of maps. The color scale in the legend runs from high to low neighborhood preservation ratio. When λ = 0.002 and the number of maps is 27, the neighborhood preservation ratio is maximized. Nevertheless, according to the experimental results, we choose the number of maps as 15 and set λ to 15 as our model parameters, because this is sufficient to model the nonmetric structure of the phenotype similarities and the KO similarities. When mm-tSNE regularization based on Nesterov momentum is applied, the relationship between the NPR and the number of maps is shown in Fig. 5. When λ = 0.005 and m = 15, we obtain the highest neighborhood preservation ratio. Overall, mm-tSNE regularization based on Nesterov momentum obtains better performance than the other methods and improves the convergence rate of the algorithm from Ο(1/k) (after k steps) to Ο(1/k^{2}) [21] (see Fig. 6).
Because the data processed by the proposed algorithm form an N × N matrix, the space complexity of the proposed algorithm does not improve relative to the original algorithms: it is O(N^{2}).
Discussion
From the phenotypic point of view, similar phenotypes tend to converge into the same class. Nevertheless, some of the phenotypes in one disease category may exist in other disease categories as well. In addition, we find that our method models nontransitive similarities between phenotypes more appropriately than mm-tSNE and mm-tSNE regularization. For example, Apert syndrome (AS, OMIM ID: 101200) has importance weights of 0.5967 and 0.3896 in two maps (Maps 9 and 15; see Fig. 7 and Fig. 8). Phenotypes with an importance weight of less than 0.1 are removed from each map to prevent the visualization from becoming too cluttered. In Map 9, Ellis-van Creveld syndrome (EVC, OMIM ID: 225500) is one of the neighbors of AS, with a similarity of 0.5148 (see Table 1), and the two have importance weights of 0.5967 and 0.9474, respectively, in the metric space of Map 9 (see Table 2). In Map 15, AS has a near neighbor, Mowat-Wilson syndrome (MOWS, OMIM ID: 235730), with a similarity of 0.5957. Table 2 shows that MOWS is not displayed in Map 9 and EVC is not displayed in Map 15, even though each is a neighbor of AS in its own map. In other words, a neighbor of AS in Map 9 is not necessarily a neighbor of AS in Map 15. In fact, the similarity between EVC and MOWS is 0 (see Table 1). Although the original aim of mm-tSNE and mm-tSNE regularization is to find nontransitive similarities, we find that both methods place the four phenotypes in Table 1 in a single map (see Fig. 9 and Fig. 10). This result indicates that mm-tSNE regularization based on Nesterov momentum uncovers nontransitive similarities that the original methods do not discover.
Besides MOWS, in Map 15 (see Fig. 7) AS has another near neighbor, Hay-Wells syndrome (HWS, OMIM: 106260), with a similarity of 0.5957. AS, MOWS and HWS are all neighbors in Map 15. Surprisingly, however, the similarity between AS and HWS is 0 (see Table 1). We therefore analyzed these three phenotypes in more depth. Apert syndrome is a congenital disease whose main symptoms include craniosynostosis, midface hypoplasia, and syndactyly of the hands and feet, with a tendency toward fusion of bony structures [33,34,35]. Mowat-Wilson syndrome is an autosomal dominant complex dysplasia, caused by gene mutations and characterized by a variety of clinical symptoms such as mental retardation, motor retardation, epilepsy, vasovagal disease and neuropathy [36,37,38]. HWS is a rare, complex disease characterized by congenital ectodermal dysplasia with a variety of symptoms including thinning hair, mild hypohidrosis, scalp infection, dental hypoplasia, and maxillary dysplasia [39,40,41]. Although these three diseases belong to different disease types (tissue, developmental and multiple, respectively), they share symptoms such as nail and tooth dysplasia and skeletal deformities. The experimental result shows that although the text mining method [42] measures the direct similarity between AS and HWS as 0, our method does deduce their true relationship from the data. This is different from nontransitive similarity modeling, because all three phenotypes lie in the same metric space, Map 15.
The experimental results demonstrate that our proposed method also reveals nontransitive similarities in the microbiome dataset that are not found by the original mm-tSNE methods (see Table 3). K00691 is a maltose phosphorylase involved in glucose metabolism and transcription [43]. Table 3 shows three KOs that are each close to K00691 and that have an importance weight of not less than 0.2 in at least three maps. K05340 is a transporter involved in signal transduction and glucose uptake in cellular activity. K06204 is a DnaK inhibitor that is involved in biofilm formation and prokaryotic cell activities of Escherichia coli and in rRNA transcription [44]. From Table 3 we can see that although these three KOs are similar in Map 7, they are not similar to each other in other maps. For example, K05340 is not similar to K06204 in Map 12. Likewise, K06204 is not similar to K05340 in Map 13. These nontransitive similarities cannot be expressed by traditional data visualization methods.
Conclusions
We propose a new method to optimize the mm-tSNE regularization cost function. Experimental results show that this method outperforms several versions of mm-tSNE when measured by neighborhood preservation ratio and error rate. This study shows that nonmetric properties are ubiquitous in biological and microbiological data and should be considered in future studies. Traditional visualization techniques are effective when applied to small and medium-scale data, but they still face a huge challenge when applied to large biological and microbiological data. In future work, we will propose a method to address the high computational complexity and the visualization problems caused by increasing data volume and high dimensionality.
Abbreviations
AS: Apert syndrome
EVC: Ellis-van Creveld syndrome
HWS: Hay-Wells syndrome
mm-tSNE regularization: Multiple maps t-SNE with Laplacian regularization
mm-tSNE: Multiple maps t-SNE
MOWS: Mowat-Wilson syndrome
NPR: Neighborhood preservation ratio
OMIM: Online Mendelian Inheritance in Man
t-SNE: t-Distributed Stochastic Neighborhood Embedding
References
 1.
Brunner HG, Van Driel MA. From syndrome families to functional genomics. Nat Rev Genet. 2004;5:545–51.
 2.
Lim J, et al. A protein–protein interaction network for human inherited ataxias and disorders of Purkinje cell degeneration. Cell. 2006;125(4):801–14.
 3.
Limviphuvadh V, et al. The commonality of protein interaction networks determined in neurodegenerative disorders (NDDs). Bioinformatics. 2007;23(16):2129–38.
 4.
Oti M, Huynen MA, Brunner HG. Phenome connections. Trends Genet. 2008;24(3):103–6.
 5.
Wooley JC, Godzik A, Friedberg I. A primer on metagenomics. PLoS Comput Biol. 2010;6(2):e1000667.
 6.
Freudenberg J, Propping P. A similarity-based method for genome-wide prediction of disease-relevant human genes. Bioinformatics. 2002;18(suppl 2):S110–5.
 7.
Lage K, et al. A human phenome-interactome network of protein complexes implicated in genetic disorders. Nat Biotechnol. 2007;25(3):309–16.
 8.
Oti M, et al. Predicting disease genes using protein–protein interactions. J Med Genet. 2006;43(8):691–8.
 9.
Xu W, Jiang X, Li G. Nonmetric property of diabetes-related genes in human gut microbiome. In: IEEE International Conference on Bioinformatics and Biomedicine; 2013.
 10.
Loscalzo J, Kohane I, Barabasi AL. Human disease classification in the postgenomic era: a complex systems approach to human pathobiology. Mol Syst Biol. 2007;3:124.
 11.
Wang Q, Jia P, Cuenco KT, Feingold E, Marazita ML, Wang L, et al. Multi-dimensional prioritization of dental caries candidate genes and its enriched dense network modules. PLoS One. 8:e76666. https://doi.org/10.1371/journal.pone.0076666.
 12.
Csermely P, Korcsmáros T, Kiss HJM, London G, Nussinov R. Structure and dynamics of molecular networks: a novel paradigm of drug discovery: a comprehensive review. Pharmacol Ther. 2013;138(3):333–408.
 13.
Arumugam M, et al. Enterotypes of the human gut microbiome. Nature. 2011;473:174–80. [PubMed: 21508958].
 14.
Legendre P, Legendre L. Numerical ecology. Vol. 20. Elsevier; 2012.
 15.
Van der Maaten L, Hinton G. Visualizing nonmetric similarities in multiple maps. Mach Learn. 2012;87(1):33–55.
 16.
Xu W, Jiang X, Hu X, Li G. Visualization of genetic disease-phenotype similarities by multiple maps t-SNE with Laplacian regularization. BMC Med Genet. 2014;7(2):1–9.
 17.
Li C, Li H. Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics. 2008;24(9):1175–82.
 18.
He X, et al. Laplacian regularized Gaussian mixture model for data clustering. Knowledge and data engineering. IEEE Transactions on. 2011;23(9):1406–18.
 19.
Qian N. On the momentum term in gradient descent learning algorithms. Neural networks. 1999;12(1):145–51.
 20.
Shen X, Zhu X, Jiang X, Hu X. Visualization of disease relationships by multiple maps t-SNE regularization based on Nesterov accelerated gradient. In: IEEE International Conference on Bioinformatics and Biomedicine; 2017.
 21.
Nesterov Y. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^{2}). Doklady AN SSSR (translated as Soviet Math. Dokl.). 269:543–7.
 22.
Nesterov Y. Introductory lectures on convex optimization: a basic course. Applied optimization. Kluwer academic Publ. London: Boston, Dordrecht; 2004.
 23.
Van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9(11).
 24.
Hinton GE, Roweis S. Stochastic neighbor embedding. In NIPS’2002; 2003.
 25.
Lacoste-Julien S, Sha F, Jordan MI. DiscLDA: discriminative learning for dimensionality reduction and classification. In NIPS, volume. 2008;22.
 26.
Mao Y, Balasubramanian K, Lebanon G. Dimensionality reduction for text using domain knowledge. In: Proceedings of the 23rd international conference on computational linguistics: posters, COLING '10, Association for Computational Linguistics, Stroudsburg, PA, USA; 2010. p. 801–9.
 27.
Jamieson AR, et al. Exploring nonlinear feature space dimension reduction and data representation in breast CADx with Laplacian eigenmaps and t-SNE. Med Phys. 2010;37:339.
 28.
Sutskever I. Training recurrent neural networks, Ph.D. thesis. Toronto: CS Dept., U; 2012.
 29.
Bengio Y, Boulanger-Lewandowski N, Pascanu R. Advances in optimizing recurrent networks. In Proceedings of the 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013), May; 2013.
 30.
Qin J, et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature. 2012;490(7418):55–60.
 31.
Hamosh A, et al. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic acids research. 2005;33(suppl 1):D514–7.
 32.
Jiang X, et al. Modularity in the genetic disease phenotype network. FEBS Lett. 2008;582(17):2549–54.
 33.
Mantilla-Capacho JM, Arnaud L, Diaz-Rodriguez M, Barros-Nunez PA. Syndrome with preaxial polydactyly showing the typical mutation Ser252Trp in the FGFR2 gene. Genet Counsel. 2005;16:403–6.
 34.
Moloney DM, Slaney SF, Oldridge M, Wall SA, Sahlin P, Stenman G, Wilkie AOM. Exclusive paternal origin of new mutations in Apert syndrome. Nature Genet. 1996;13:48–53.
 35.
Lajeunie E, De Parseval N, Gonzales M, Delezoide AL, Journeau P, Munnich A, Le Merrer M, Renier D. Clinical variability of Apert syndrome. J Neurosurg. 2000;90:443.
 36.
Mowat DR, Wilson MJ, Goossens M. Mowat-Wilson syndrome. J Med Genet. 2003;40:305–10.
 37.
Strenge S, Heinritz W, Zweier C, Rauch A, Rolle U, Merkenschlager A, Froster UG. Pulmonary artery sling and congenital tracheal stenosis in another patient with Mowat-Wilson syndrome. (letter). Am J Med Genet. 2007;143A:1528–30.
 38.
Horn D, Weschke B, Zweier C, Rauch A. Facial phenotype allows diagnosis of Mowat-Wilson syndrome in the absence of Hirschsprung disease. Am J Med Genet A. 2004;124A:102–4.
 39.
Hay RJ, Wells RS. The syndrome of ankyloblepharon, ectodermal defects and cleft lip and palate: an autosomal dominant condition. Brit J Derm. 1976;94:287–9.
 40.
McGrath JA, Duijf PHG, Doetsch V, Irvine AD, de Waal R, Vanmolkot KRJ, Wessagowit V, Kelly A, Atherton DJ, Griffiths WAD, Orlow SJ, Ausems MGEM, Yang A, McKeon F, Bamshad MA, Brunner HG, Hamel BCJ, van Bokhoven H. Hay-Wells syndrome is caused by heterozygous missense mutations in the SAM domain of p63. Hum Mol Genet. 2001;10:221–9.
 41.
Bertola DR, Kim CA, Sugayama SMM, Albano LMJ, Utagawa CY, Gonzalez CH. AEC syndrome and CHAND syndrome: further evidence of clinical overlapping in the ectodermal dysplasias. Pediat Derm. 2000;17:218–21.
 42.
van Driel MA, et al. A text-mining analysis of the human phenome. European journal of human genetics. 2006;14(5):535–42.
 43.
Zhou J, Ashouian N, Delepine M, Mastsuda F, Chevillard C, Rivlet R, Schildkraut CL, Birshtein BK. The origin of a developmentally regulated lgh replicon is located near the border of regulatory domains for lgh replication and expression. PNAS. 2002;99(21):13693–8.
 44.
Adachi Y, Asakura Y, Sato Y, Tajiama T, Nakajima T, Yamamoto T, Fujieda K. Novel SLC12A1 (NKCC2) mutations in two families with Bartter syndrome type1. Endocr J. 12 Nov 2007;54(6):1003–7.
Acknowledgements
Not applicable.
Consent to publication
Not applicable.
Funding
Publication costs are funded by the National Natural Science Foundation of China (61532008) and the National Key Research and Development Program of China (2017YFC0909502).
Availability of data and materials
The social network dataset used in our experiment can be downloaded from https://lvdmaaten.github.io/multiplemaps/Multiple_maps_t-SNE/Multiple_maps_t-SNE.html. This dataset is publicly available and free to use.
The microbial dataset used in our experiment can be downloaded from ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100036/Intermediate_results/. This dataset is publicly available and free to use.
The phenotypic similarity dataset used in our experiment can be downloaded from http://www.cmbi.ru.nl/MimMiner/cgi-bin/main.pl. This dataset is publicly available and free to use.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 19 Supplement 20, 2018: Selected articles from the IEEE BIBM International Conference on Bioinformatics & Biomedicine (BIBM) 2017: bioinformatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-19-supplement-20.
Author information
Contributions
XS and XJ designed the algorithm based on mm-tSNE regularization. XZ implemented the mm-tSNE regularization based on Nesterov momentum algorithm and ran the experiments. KW and YM helped plan the experimental analysis. JL contributed to writing the manuscript. TH and XH supervised and helped conceive the study. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Zhu, X., Shen, X., Jiang, X. et al. Nonlinear expression and visualization of nonmetric relationships in genetic diseases and microbiome data. BMC Bioinformatics 19, 505 (2018). https://doi.org/10.1186/s12859-018-2537-z
DOI: https://doi.org/10.1186/s12859-018-2537-z
Keywords
 Multiple maps t-SNE
 Data visualization
 Nonmetric similarities
 Nesterov momentum