Fig. 1 | BMC Bioinformatics

Fig. 1

From: Constructing a database for the relations between CNV and human genetic diseases via systematic text mining

The workflow of text mining. a The initial abstracts file. b The result of using NLTK to split abstracts into clauses. c The result of using DNorm to recognize disease entities. d The result of using CNV-Rec to recognize CNV entities. e Matching the locations of CNV and disease entities within sentences. f The result of using PKDE4J to extract CNV-disease relations. *NLTK: Natural Language Toolkit, a set of natural language processing tools based on Python. It supports text classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and provides wrappers for industrial-strength natural language processing libraries. *DNorm: a toolkit for disease name normalization with pairwise learning to rank. *PKDE4J: a toolkit for rule-based relation extraction. *CNV-Rec: a regular expression-based method for CNV recognition.
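
As a rough illustration of panels b, d, and e, the following minimal Python sketch splits an abstract into sentences, tags CNV mentions with a regular expression, and records sentences in which a CNV and a disease co-occur. Everything here is an assumption made for illustration: the sentence splitter is a naive regex standing in for NLTK, `CNV_PATTERN` is a hypothetical pattern and not the actual CNV-Rec rule set, and `DISEASE_TERMS` is a toy dictionary standing in for DNorm.

```python
import re

# Naive sentence splitter (assumption: stands in for NLTK's tokenizer).
def split_sentences(text):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

# Hypothetical CNV pattern, e.g. "16p11.2 deletion"; not the real CNV-Rec rules.
CNV_PATTERN = re.compile(
    r"\b\d{1,2}[pq]\d+(?:\.\d+)?\s+(?:micro)?(?:deletion|duplication)s?\b",
    re.IGNORECASE,
)

# Toy disease dictionary (assumption: stands in for DNorm's recognizer).
DISEASE_TERMS = ["autism", "schizophrenia", "epilepsy"]
DISEASE_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, DISEASE_TERMS)) + r")\b",
    re.IGNORECASE,
)

def find_cooccurrences(abstract):
    """Return one record per (CNV, disease) pair found in the same sentence."""
    pairs = []
    for sentence in split_sentences(abstract):
        cnvs = [(m.group(0), m.start()) for m in CNV_PATTERN.finditer(sentence)]
        diseases = [(m.group(0), m.start()) for m in DISEASE_PATTERN.finditer(sentence)]
        for cnv, cnv_pos in cnvs:
            for disease, disease_pos in diseases:
                pairs.append({
                    "sentence": sentence,
                    "cnv": cnv,
                    "cnv_pos": cnv_pos,
                    "disease": disease,
                    "disease_pos": disease_pos,
                })
    return pairs

if __name__ == "__main__":
    text = "The 16p11.2 deletion is associated with autism. We studied 40 patients."
    for pair in find_cooccurrences(text):
        print(pair["cnv"], "|", pair["disease"], "|", pair["sentence"])
```

The character offsets kept in each record correspond to the location matching in panel e. A rule-based relation extractor such as PKDE4J would then operate on these candidate sentences, a step this sketch does not attempt.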
