Regularizing transformers with deep probabilistic layers.

Neural Netw

Swiss Data Science Institute (ETHZ/EPFL), Universitatstrasse 25, 8006 Zurich, Switzerland.

Published: April 2023



Citations: 20

Article Abstract

Language models (LMs) have grown non-stop over the last decade, from sequence-to-sequence architectures to attention-based Transformers. However, regularization has not been deeply studied in these architectures. In this work, we use a Gaussian Mixture Variational Autoencoder (GMVAE) as a regularizing layer. We study how its benefits depend on the depth at which it is placed and demonstrate its effectiveness in several scenarios. Experimental results demonstrate that including deep generative models within Transformer-based architectures such as BERT, RoBERTa, or XLM-R yields more versatile models that generalize better, achieve improved scores on tasks such as SST-2 and TREC, and can even impute missing/noisy words with richer text.
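The idea described in the abstract can be illustrated with a rough sketch: a variational bottleneck with mixture-component means that encodes a transformer layer's hidden states, reconstructs them, and contributes a KL-style penalty as a regularizer. This is a minimal illustration under assumptions of ours, not the paper's actual GMVAE (all function names, shapes, and the simplified nearest-component KL surrogate are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def gmvae_bottleneck(h, w_enc, w_dec, comp_means):
    """Sketch of a GMVAE-style regularizing layer over hidden states.

    h: (batch, seq, hidden) hidden states from a transformer layer.
    Returns a reconstruction of h and a scalar regularization penalty.
    """
    mu, logvar = np.split(h @ w_enc, 2, axis=-1)      # encode q(z|h)
    z = mu + rng.standard_normal(mu.shape) * np.exp(0.5 * logvar)  # reparameterize
    recon = z @ w_dec                                  # decode back to hidden size
    # Crude KL surrogate: squared distance to the nearest mixture component,
    # plus the usual unit-variance penalty (assumption, not the paper's loss).
    dist = ((mu[..., None, :] - comp_means) ** 2).sum(axis=-1)
    kl = 0.5 * (dist.min(axis=-1) + (np.exp(logvar) - 1.0 - logvar).sum(axis=-1))
    return recon, kl.mean()

hidden, latent, comps = 16, 4, 3
h = rng.standard_normal((2, 5, hidden))                # fake hidden states
w_enc = rng.standard_normal((hidden, 2 * latent)) * 0.1
w_dec = rng.standard_normal((latent, hidden)) * 0.1
comp_means = rng.standard_normal((comps, latent))
recon, kl = gmvae_bottleneck(h, w_enc, w_dec, comp_means)
print(recon.shape, kl >= 0.0)
```

In training, the reconstruction would replace (or be mixed with) the layer's hidden states, and the penalty would be added to the task loss, so deeper placements regularize more abstract representations.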


Source
http://dx.doi.org/10.1016/j.neunet.2023.01.032

Publication Analysis

Top Keywords

regularizing transformers (4); transformers deep (4); deep probabilistic (4); probabilistic layers (4); layers language (4); language models (4); models grown (4); grown non-stop (4); non-stop decade (4); decade sequence-to-sequence (4)

Similar Publications

Human beings have the ability to continuously analyze a video and immediately extract the motion components. We want to adopt this paradigm to provide a coherent and stable motion segmentation over the video sequence. In this perspective, we propose a novel long-term spatio-temporal model operating in a totally unsupervised way.


Transformer-based ECG classification for early detection of cardiac arrhythmias.

Front Med (Lausanne)

August 2025

Universidad Internacional Iberoamericana, Arecibo, PR, United States.

Electrocardiogram (ECG) classification plays a critical role in the early detection and monitoring of cardiovascular diseases. This study presents a Transformer-based deep learning framework for automated ECG classification, integrating advanced preprocessing, feature selection, and dimensionality reduction techniques to improve model performance. The pipeline begins with signal preprocessing, where raw ECG data are denoised, normalized, and relabeled for compatibility with attention-based architectures.
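The denoise-then-normalize preprocessing step mentioned above can be sketched minimally as follows. This is an illustrative assumption of ours (a moving-average filter plus z-score normalization), not the study's actual pipeline:

```python
import numpy as np

def preprocess_ecg(signal, window=5):
    """Toy preprocessing sketch: smooth high-frequency noise with a
    moving-average filter, then z-score normalize the signal."""
    kernel = np.ones(window) / window
    denoised = np.convolve(signal, kernel, mode="same")   # simple denoising
    mu, sigma = denoised.mean(), denoised.std()
    return (denoised - mu) / (sigma + 1e-8)               # zero mean, unit variance

rng = np.random.default_rng(1)
# Synthetic "ECG": a sinusoid with additive Gaussian noise.
raw = np.sin(np.linspace(0, 8 * np.pi, 500)) + 0.3 * rng.standard_normal(500)
clean = preprocess_ecg(raw)
print(clean.shape)
```

A normalized, fixed-scale signal like this is what attention-based models typically expect before windowing into input tokens.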


Multivariate time series anomaly detection has shown potential in various fields, such as finance, aerospace, and security. The fuzzy definition of data anomalies, the complexity of data patterns, and the scarcity of abnormal data samples pose significant challenges to anomaly detection. Researchers have extensively employed autoencoders (AEs) and generative adversarial networks (GANs) in studying time series anomaly detection methods.


Arterial Spin Labeling (ASL) perfusion MRI is the only non-invasive technique for quantifying regional cerebral blood flow (CBF) visualization, which is an important physiological variable. ASL MRI has a relatively low signal-to-noise-ratio (SNR), making it challenging to achieve high quality CBF images using limited data. Promising ASL CBF denoising results have been shown in recent convolutional neural network (CNN)-based methods.


Transformer-based approaches have recently made significant advancements in 3D human pose estimation from 2D inputs. Existing methods typically either consider the entire 2D skeleton for global features extraction or break it into independent parts for local features learning. However, capturing the spatial dependencies of the entire 2D skeleton does not effectively facilitate learning local spatial features, while partitioning the skeleton into independent segments disrupts the relevance of individual joints to the whole.
