Fig. 1 | BMC Bioinformatics

Fig. 1

From: Anomaly detection in genomic catalogues using unsupervised multi-view autoencoders

a Workflow and model description. Once candidate regions (ReMap-identified CRMs) are set, we build tensors of peak presence to represent them. The X axis represents the position along the genome, while the Y and Z axes are dataset and TR identifiers, respectively. The tensor has a value of 1 if a peak for this TR in this dataset (i.e., from this source) is present, and 0 otherwise. The atyPeak model lossily compresses this representation. Because the model learns correlation groups for the rebuilding instead of individual peaks, the compression loses anomalies and other finer details. Finally, each peak is given an anomaly score corresponding to the mean autoencoder reconstruction error, i.e., the difference between the original (grey) and rebuilt (red) representations. Scores are then added to the original BED file. Full source code and documentation are available at <https://github.com/qferre/atypeak>.

b Model structure. During encoding, the model views each CRM through convolutional filters that focus on the correlations between datasets and then between TRs. We use two types of filters (combinations of datasets, then combinations of TRs) successively in a stacked multiview approach. After the subsequent Dense layers, we obtain a smaller encoded representation. This encoded representation is fed to a convolutional decoder with several layers, which tries to rebuild the original CRM representation. In subsequent figure legends, “deep dimension” is the number of neurons in each Dense layer, “filters number” is the number of kernels in each Convolutional layer, and LR is the learning rate of the Adam optimizer. More details about the structure are available in Methods.
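
The peak-presence tensor and the per-peak score described in panel a can be illustrated with a minimal NumPy sketch. This is not the atyPeak implementation: the function names (`build_peak_tensor`, `peak_anomaly_score`), the toy peaks, and the use of the mean absolute difference as the reconstruction error are all assumptions made for illustration, and the "rebuilt" tensor is a hand-made stand-in for the autoencoder output.

```python
import numpy as np


def build_peak_tensor(peaks, region_start, region_length, datasets, trs):
    """Build a (position, dataset, TR) tensor of peak presence for one CRM.

    `peaks` is an iterable of (start, end, dataset, tr) tuples using
    BED-style half-open genomic coordinates (hypothetical input layout).
    """
    ds_index = {name: i for i, name in enumerate(datasets)}
    tr_index = {name: i for i, name in enumerate(trs)}
    tensor = np.zeros((region_length, len(datasets), len(trs)), dtype=np.float32)
    for start, end, dataset, tr in peaks:
        # Clip the peak to the candidate region before marking it.
        lo = max(start - region_start, 0)
        hi = min(end - region_start, region_length)
        if lo < hi:
            tensor[lo:hi, ds_index[dataset], tr_index[tr]] = 1.0
    return tensor, ds_index, tr_index


def peak_anomaly_score(original, rebuilt, peak, region_start, ds_index, tr_index):
    """Mean absolute reconstruction error over the positions of one peak.

    Using the absolute difference is an assumption of this sketch.
    """
    start, end, dataset, tr = peak
    lo = max(start - region_start, 0)
    hi = min(end - region_start, original.shape[0])
    d, t = ds_index[dataset], tr_index[tr]
    diff = np.abs(original[lo:hi, d, t] - rebuilt[lo:hi, d, t])
    return float(diff.mean())


# Toy CRM: a 100 bp region starting at position 100, two datasets, two TRs.
peaks = [(100, 150, "ds1", "TR_A"), (120, 160, "ds1", "TR_B")]
tensor, ds_index, tr_index = build_peak_tensor(
    peaks, region_start=100, region_length=100,
    datasets=["ds1", "ds2"], trs=["TR_A", "TR_B"],
)

# Stand-in for the autoencoder output: TR_A is rebuilt perfectly,
# while TR_B is only rebuilt at half intensity.
rebuilt = tensor.copy()
rebuilt[:, :, tr_index["TR_B"]] *= 0.5

score_a = peak_anomaly_score(tensor, rebuilt, peaks[0], 100, ds_index, tr_index)
score_b = peak_anomaly_score(tensor, rebuilt, peaks[1], 100, ds_index, tr_index)
print(score_a, score_b)
```

In this toy setup the poorly rebuilt peak receives the higher score, which mirrors the idea that peaks the compressed representation cannot recover are the atypical ones.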
