LSTM Variational AutoEncoder (LSTM-Sequence-VAE)

A PyTorch Implementation of Generating Sentences from a Continuous Space by Bowman et al. 2015.

Introduction

This is a PyTorch implementation of Generating Sentences from a Continuous Space (Bowman et al., 2015), where an LSTM-based VAE is trained on the Penn Treebank dataset.

Setup

The code uses pipenv as a virtual environment and package manager. To run the code, all you need to do is install the necessary dependencies. Open a terminal and type:

  • git clone https://github.com/Khamies/Sequence-VAE.git
  • cd Sequence-VAE
  • pipenv install

And you should be ready to play with the code and build upon it!

Run the code

  • To train the model, run: python main.py

  • To train the model with specific arguments, run: python main.py --batch_size=64. The following command-line arguments are available:

    • Batch size: --batch_size
    • BPTT length (sequence length): --bptt
    • Learning rate: --lr
    • Embedding size: --embed_size
    • Hidden size: --hidden_size
    • Latent size: --latent_size
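The arguments above can be wired up with a standard argparse parser. The sketch below is illustrative only: the flag names follow the README, but the default values shown are assumptions, not the repository's actual defaults.

```python
import argparse

def build_parser():
    # Flag names match the README; defaults here are illustrative guesses.
    parser = argparse.ArgumentParser(description="Train the LSTM Sequence-VAE")
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--bptt", type=int, default=60)        # sequence length for truncated BPTT
    parser.add_argument("--lr", type=float, default=0.001)
    parser.add_argument("--embed_size", type=int, default=300)
    parser.add_argument("--hidden_size", type=int, default=256)
    parser.add_argument("--latent_size", type=int, default=16)
    return parser

# Example: overriding only the batch size, as in `python main.py --batch_size=64`
args = build_parser().parse_args(["--batch_size=64"])
```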

Training

The model is trained for 30 epochs using the Adam optimizer with a learning rate of 0.001. Here are the results from training the LSTM-VAE model:

  • KL Loss

  • Reconstruction loss

  • KL loss vs Reconstruction loss

  • ELBO loss
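The two loss terms tracked above combine into the (negative) ELBO: the reconstruction loss plus the KL divergence between the approximate posterior N(mu, diag(exp(logvar))) and the standard Gaussian prior. A minimal, torch-free sketch of the closed-form KL term for a diagonal Gaussian (the repository's actual implementation may differ):

```python
import math

def kl_divergence(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mu, logvar))

def neg_elbo(recon_loss, mu, logvar):
    """Negative ELBO = reconstruction loss + KL term."""
    return recon_loss + kl_divergence(mu, logvar)

# When the posterior equals the prior (mu = 0, logvar = 0), the KL term is zero.
```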

Inference

1. Sample Generation

Here are samples generated from the model. We randomly sampled two latent codes z from a standard Gaussian distribution, specified “like” as the start-of-sentence token (sos), and fed them to the decoder. The following sentences were generated:

  • like other countries such as alex powers a former wang marketer

  • like design and artists have been by how many
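Drawing a latent code for generation amounts to sampling z ~ N(0, I); the decoder then conditions on z with “like” as the sos token. A minimal stdlib sketch of the sampling step (the `sample_latent` helper is hypothetical, not part of the repository):

```python
import random

def sample_latent(latent_size, seed=None):
    """Draw a latent code z ~ N(0, I) as a plain list of floats."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(latent_size)]

# Two independent codes, as used for the two samples above.
z1 = sample_latent(16, seed=0)
z2 = sample_latent(16, seed=1)
```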

2. Interpolation

The word “President” was used as the start of each sentence. We randomly generated two sentences and interpolated between them.

  • Sentence 1: President bush veto power changes meant to be a great number
  • Sentence 2: President bush veto power opposed to the president of the house
 *bush veto power opposed to the president of the house
 bush veto power opposed to the president of the house.
 bush veto power opposed to the president of the house.
 bush veto power opposed to the president of the house.
 bush veto power opposed to the president of the house.
 bush veto power opposed to the president of the house.
 bush veto power opposed to the president of the house.
 bush veto power opposed to the president of the house.
 bush veto power opposed to the president ' s council.
 bush veto power opposed to the president ' s council.
 bush veto power opposed to the president ' s council.
 bush veto power opposed to the president ' s council.
 bush veto power opposed to the president ' s council.
 bush veto power that kind of <unk> of natural gas.
 bush veto power changes to keep the <unk> and that.
 bush veto power changes to keep the <unk> and that.
 bush veto power changes that is in a telephone to.
 bush veto power changes that is in a telephone to.
 bush veto power changes meant to be a great number.
 *bush veto power changes meant to be a great number
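The intermediate sentences above come from decoding points along a straight line between the two latent codes. A minimal sketch of that linear interpolation, assuming latent codes represented as plain lists (the repository may interpolate differently, e.g. in torch tensors):

```python
def interpolate(z1, z2, steps):
    """Return `steps` points linearly spaced between z1 and z2, endpoints included."""
    path = []
    for i in range(steps):
        t = i / (steps - 1)
        path.append([(1 - t) * a + t * b for a, b in zip(z1, z2)])
    return path

# Each point in the path would then be decoded into a sentence.
```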

Play with the model

To play with the model, a Jupyter notebook is provided; you can find it here

Citation

 @misc{Khamies2021SequenceVAE,
 author = {Khamies, Waleed},
 title = {PyTorch Implementation of Generating Sentences from a Continuous Space by Bowman et al. 2015},
 year = {2021},
 publisher = {GitHub},
 journal = {GitHub repository},
 howpublished = {\url{https://github.com/Khamies/Sequence-VAE}},
 }

Acknowledgement

  • This work was inspired by Sentence-VAE, whose data preprocessing pipeline is used.

License