Denoising Autoencoders
A denoising autoencoder is an autoencoder trained to reconstruct the original data from a corrupted version of it. The process can be summarized as follows (a minimal code sketch appears after the list):
- Input Corruption: Part of the input data is intentionally corrupted (e.g., by adding noise, masking some input features, or omitting parts of the input).
- Encoding: The corrupted input is passed through an encoder network, which compresses it into a lower-dimensional latent representation.
- Decoding: The latent representation is then passed through a decoder network, which attempts to reconstruct the original, uncorrupted input.
- Objective: The model is trained to minimize the difference between the original input and its reconstruction, effectively learning to denoise the corrupted input.
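The loop below is a minimal PyTorch sketch of this setup, assuming a small fully connected encoder and decoder and additive Gaussian noise as the corruption; the dimensions, noise level, and architecture are illustrative, not prescriptive.

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: compress the corrupted input into a latent code
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder: reconstruct the clean input from the latent code
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x_corrupted):
        return self.decoder(self.encoder(x_corrupted))

model = DenoisingAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x_clean = torch.rand(64, 784)                            # stand-in for a clean data batch
x_corrupted = x_clean + 0.3 * torch.randn_like(x_clean)  # corruption: additive Gaussian noise

optimizer.zero_grad()
reconstruction = model(x_corrupted)
loss = nn.functional.mse_loss(reconstruction, x_clean)   # compare with the clean input, not the corrupted one
loss.backward()
optimizer.step()
```

The essential point is the loss line: the network sees only the corrupted batch but is penalized against the clean one.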
Masked Language Models
Masked language models, such as BERT (Bidirectional Encoder Representations from Transformers), take a different approach (a loss-computation sketch appears after the list):
- Masking: Some tokens (words) in the input sentence are randomly masked (replaced with a special [MASK] token).
- Encoding: The masked input is passed through a transformer-based encoder, which processes the entire input contextually.
- Prediction: The model is trained to predict only the masked tokens based on the context provided by the surrounding unmasked tokens.
- Objective: The training objective is to minimize the prediction error of the masked tokens, rather than reconstructing the entire input.
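The snippet below sketches that objective, assuming integer token IDs, a small transformer encoder, and a made-up [MASK] id; production BERT also leaves some selected tokens unchanged or swaps in random tokens, which is omitted here for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, hidden = 1000, 64
MASK_ID = 0  # hypothetical id for the [MASK] token

embed = nn.Embedding(vocab_size, hidden)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
    num_layers=2,
)
lm_head = nn.Linear(hidden, vocab_size)            # maps each position back to vocabulary scores

tokens = torch.randint(1, vocab_size, (8, 16))     # batch of token ids (8 sequences, 16 tokens)
mask = torch.rand(tokens.shape) < 0.15             # choose ~15% of positions to mask
inputs = tokens.masked_fill(mask, MASK_ID)         # replace chosen tokens with [MASK]

logits = lm_head(encoder(embed(inputs)))           # contextual prediction for every position

# The loss is computed only at the masked positions; everything else is ignored
labels = tokens.masked_fill(~mask, -100)           # -100 is the ignore index
loss = F.cross_entropy(logits.reshape(-1, vocab_size), labels.reshape(-1), ignore_index=-100)
loss.backward()
```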
Key Difference
- Denoising Autoencoders: Aim to reconstruct the entire input from a corrupted version, learning a robust representation that can handle noise and missing data.
- Masked Language Models: Focus on predicting only the masked tokens, leveraging the context of the surrounding words. The model learns contextual representations by filling in the blanks rather than reconstructing the entire input sequence (the contrast in loss computation is sketched below).
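With toy tensors, the difference boils down to where the loss is applied; the shapes and the masked positions below are purely illustrative.

```python
import torch
import torch.nn.functional as F

x_clean = torch.rand(4, 10)                 # original input
x_recon = torch.rand(4, 10)                 # denoising autoencoder output
logits  = torch.randn(4, 10, 100)           # MLM scores over a 100-word vocabulary
targets = torch.randint(0, 100, (4, 10))    # true token ids
masked  = torch.zeros(4, 10, dtype=torch.bool)
masked[:, [2, 7]] = True                    # pretend positions 2 and 7 were masked

dae_loss = F.mse_loss(x_recon, x_clean)                      # every element of the input contributes
mlm_loss = F.cross_entropy(logits[masked], targets[masked])  # only the masked positions contribute
```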
Example to Illustrate the Difference
Suppose we have the input sentence: “The quick brown fox jumps over the lazy dog.”
Denoising Autoencoder Approach
- Corrupted Input: “The quikc brwn fox jmups over the lzay dog.” (e.g., characters perturbed or words dropped)
- Objective: Reconstruct the entire original sentence: “The quick brown fox jumps over the lazy dog.” (a small corruption helper is sketched below)
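To make the corruption step concrete, the hypothetical helper below randomly drops words from the sentence; a text denoising autoencoder would be trained to recover the complete original string from outputs like this. The function name and drop rate are illustrative.

```python
import random

def corrupt(sentence, drop_prob=0.3):
    """Simulate input corruption by randomly dropping words."""
    words = sentence.split()
    kept = [w for w in words if random.random() > drop_prob]
    return " ".join(kept)

original = "The quick brown fox jumps over the lazy dog."
corrupted = corrupt(original)
print(corrupted)  # a version with some words missing; the training target is still the full sentence
```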
Masked Language Model Approach
- Masked Input: “The quick [MASK] fox jumps over the [MASK] dog.”
- Objective: Predict the masked tokens: [MASK] = “brown”, [MASK] = “lazy” (a quick check with a pretrained model is shown below).
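For a quick sanity check with a real model, the sketch below uses the Hugging Face transformers fill-mask pipeline with the bert-base-uncased checkpoint (assuming the library and model weights are available), passing a single [MASK] at a time for simplicity.

```python
from transformers import pipeline

# Load a pretrained BERT-style masked language model
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# The model predicts the masked token from its bidirectional context
for prediction in unmasker("The quick [MASK] fox jumps over the lazy dog."):
    print(prediction["token_str"], round(prediction["score"], 3))
```

The printed candidates are ranked by probability; fills like “brown” are expected to rank highly given the familiar context.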