Fig. 1 | BMC Bioinformatics

Fig. 1

From: Improvements in viral gene annotation using large language models and soft alignments

A Similarity matrix D between a query sequence q (red) and a subject sequence s (blue), both eight amino acids in length. Each entry \(D_{ij}\) is the cosine similarity between \(E(q_i)\) and \(E(s_j)\), the embeddings of the amino acids \(q_i\) and \(s_j\), respectively. Green cells represent mutual matches, whereas yellow cells represent secondary matches. B Graph illustration of the mutual and secondary matches between q and s. Each vertex represents an amino acid. A solid arrow links a vertex to its top match, while a dashed arrow links a vertex to its second-best match. A green outline marks a mutual match, where both edges are solid, meaning the two enclosed amino acids are each other's best match. A yellow outline marks a secondary match, where at least one of the edges is dashed.
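The construction in the caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `cosine_similarity_matrix` and `classify_matches` are hypothetical, the random arrays stand in for the language-model embeddings \(E(q_i)\) and \(E(s_j)\), and the reading of a secondary match as "each residue is among the other's top two matches, but the pair is not a mutual best match" is an assumption drawn from the caption's wording.

```python
import numpy as np

def cosine_similarity_matrix(q_emb, s_emb):
    """Return D with D[i, j] = cosine similarity of q_emb[i] and s_emb[j]."""
    q = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
    s = s_emb / np.linalg.norm(s_emb, axis=1, keepdims=True)
    return q @ s.T

def classify_matches(D):
    """Split residue pairs into mutual and secondary matches.

    Assumed reading of the caption: a pair (i, j) is kept when i and j are
    each within the other's top two matches. It is mutual when both are the
    other's top match (two solid arrows) and secondary otherwise (at least
    one dashed arrow). Requires at least two residues per sequence.
    """
    q_top2 = np.argsort(-D, axis=1)[:, :2]    # per query residue: best, second-best subject
    s_top2 = np.argsort(-D, axis=0)[:2, :].T  # per subject residue: best, second-best query
    mutual, secondary = [], []
    for i in range(D.shape[0]):
        for j in q_top2[i]:
            j = int(j)
            if i not in s_top2[j]:
                continue  # no arrow from s_j back to q_i
            if q_top2[i, 0] == j and s_top2[j, 0] == i:
                mutual.append((i, j))
            else:
                secondary.append((i, j))
    return mutual, secondary

# Hypothetical stand-in for per-residue embeddings: 8 residues, 16 dimensions.
rng = np.random.default_rng(0)
q_emb = rng.normal(size=(8, 16))
s_emb = rng.normal(size=(8, 16))
D = cosine_similarity_matrix(q_emb, s_emb)
mutual, secondary = classify_matches(D)
print("mutual:", mutual)
print("secondary:", secondary)
```

In this sketch, the pairs in `mutual` correspond to the green cells of panel A and the pairs in `secondary` to the yellow cells, under the assumptions stated above.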
