Table 5 Application of SKiM to a closed search problem

From: Serial KinderMiner (SKiM) discovers and annotates biomedical knowledge using co-occurrence and transformer models

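| A | Cutoff date | B | A–B p value | C | B–C p value | First A–C publication (year) |
|---|---|---|---|---|---|---|
| Tumorigenesis | 2001 | AHR | 9 × 10⁻⁷ | Inflammation | 6 × 10⁻¹³⁸ | 17848686 (2007) |
| Tumorigenesis | 2001 | CD44 | 6 × 10⁻⁴⁵ | Inflammation | 5 × 10⁻⁹³ | 7530464 (1994) |
| Tumorigenesis | 2001 | FAS | 3 × 10⁻²³ | Inflammation | 8 × 10⁻¹⁰⁷ | 10358186 (1999) |
| Tumorigenesis | 2001 | REL | 5 × 10⁻¹⁰ | Inflammation | 7 × 10⁻³⁵ | 15197457 (2004) |
| Tumorigenesis | 2001 | JUN | 4 × 10⁻⁴⁰ | Inflammation | 1 × 10⁻³⁶ | 10657993 (2000) |
| Tumorigenesis | 2001 | FOS | 5 × 10⁻⁴² | Inflammation | 7 × 10⁻³⁸ | 10657993 (2000) |
| Tumorigenesis | 2001 | HGF | 5 × 10⁻³⁴ | Inflammation | 9 × 10⁻²² | 14596869 (2003) |
| Tumorigenesis | 2001 | STAT1 | 2 × 10⁻⁷ | Inflammation | 3 × 10⁻¹³ | 16734720 (2006) |
| Tumorigenesis | 2001 | APC | 3 × 10⁻²⁶³ | Inflammation | 1 × 10⁻²⁶ | 8672984 (1995) |
| Tumorigenesis | 2001 | STAT3 | 4 × 10⁻⁷ | Inflammation | 4 × 10⁻¹² | 12219085 (2002) |
| Tumorigenesis | 2001 | LOX | 7 × 10⁻⁶ | Inflammation | 5 × 10⁻⁹ | N/A |
| Tumorigenesis | 2001 | DES | 6 × 10⁻²⁸ | Inflammation | 4 × 10⁻²³ | N/A |
| Tumorigenesis | 2001 | PCNA | 1 × 10⁻⁵⁹ | Inflammation | 5 × 10⁻¹² | 20975039 (2010) |
| Tumorigenesis | 2001 | EGF | 5 × 10⁻⁷⁸ | Inflammation | 1 × 10⁻⁶ | 7895532 (1995) |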

  1. “Tumorigenesis” was the only A term searched and “inflammation” was the only C term searched; 17,545 genes served as the B terms. Only abstracts published in 2001 or earlier were searched. Six of the genes had already been co-mentioned with both tumorigenesis and inflammation in the same article by 2001. Another six had not been co-mentioned with both in the same article by 2001; five of these six would later be shown to link tumorigenesis and inflammation. The remaining two genes (LOX and DES) were false positives caused by semantic ambiguity.
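
The grouping described in the footnote can be checked against the table with a short tally. The sketch below is only an illustration and is not part of SKiM: the dictionary holds the first A–C publication years copied from the table (with `None` standing in for "N/A"), and the names `FIRST_AC_YEAR`, `CUTOFF_YEAR`, and `tally_by_cutoff` are hypothetical.

```python
# First A-C publication year for each B term, copied from the table.
# None stands in for "N/A" (no A-C publication listed).
FIRST_AC_YEAR = {
    "AHR": 2007, "CD44": 1994, "FAS": 1999, "REL": 2004, "JUN": 2000,
    "FOS": 2000, "HGF": 2003, "STAT1": 2006, "APC": 1995, "STAT3": 2002,
    "LOX": None, "DES": None, "PCNA": 2010, "EGF": 1995,
}
CUTOFF_YEAR = 2001  # only abstracts from 2001 or earlier were searched


def tally_by_cutoff(first_year, cutoff):
    """Group B terms by when their first A-C publication appeared."""
    groups = {
        "known_by_cutoff": [],         # first A-C publication in or before the cutoff year
        "published_after_cutoff": [],  # first A-C publication after the cutoff year
        "no_publication": [],          # listed as N/A in the table
    }
    for gene, year in first_year.items():
        if year is None:
            groups["no_publication"].append(gene)
        elif year <= cutoff:
            groups["known_by_cutoff"].append(gene)
        else:
            groups["published_after_cutoff"].append(gene)
    return groups


groups = tally_by_cutoff(FIRST_AC_YEAR, CUTOFF_YEAR)
for name, genes in groups.items():
    print(f"{name}: {len(genes)} ({', '.join(genes)})")
```

The tally only compares publication years with the cutoff. It does not judge whether a later publication actually establishes the tumorigenesis–inflammation link, so it cannot by itself reproduce the footnote's count of five later-confirmed genes.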