Tags: dialogue policy, task-oriented dialog

lava-policy-multiwoz

This is the best-performing LAVA_kl model from the LAVA paper. It can be used as a word-level policy module in a ConvLab-3 pipeline.

Refer to ConvLab-3 for the model description and usage instructions.
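As a word-level policy, the module maps the tracked dialog state directly to a system utterance in words, rather than to dialog acts that a separate NLG module would then realize. A minimal sketch of that interface is below; the class, method, and state keys are purely illustrative and do NOT reflect the actual ConvLab-3 API.

```python
# Hypothetical sketch of a word-level policy interface. Names are
# illustrative only and do not correspond to ConvLab-3 classes.

class WordLevelPolicy:
    """Maps a tracked dialog state directly to a system utterance."""

    def predict(self, state: dict) -> str:
        # A real model such as LAVA would encode the state, sample a
        # latent action, and decode words; this stub returns canned text.
        if state.get("domain") == "hotel":
            return "What area of town would you like to stay in?"
        return "How can I help you?"

# In a pipeline, the policy output goes straight back to the user,
# with no separate NLG step in between.
policy = WordLevelPolicy()
utterance = policy.predict({"domain": "hotel"})
```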

Training procedure

The model was trained on MultiWOZ 2.0 data using the LAVA codebase. Training started with VAE pre-training, followed by supervised fine-tuning with an informative-prior KL loss, and finally corpus-based RL with REINFORCE.
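To make the two loss terms above concrete, here is a toy sketch: the closed-form KL divergence between diagonal Gaussians (the kind of term an informative-prior KL loss uses) and a single REINFORCE update on a softmax policy. This is a hand-rolled illustration, not code from the LAVA codebase; all names and the toy reward are hypothetical.

```python
import math
import random

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(q || p) for diagonal Gaussians, summed over dimensions."""
    kl = 0.0
    for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p):
        kl += 0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
    return kl

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reinforce_step(logits, reward, lr=0.1):
    """One REINFORCE update: sample an action, then move the logits
    along reward * grad log pi(action)."""
    probs = softmax(logits)
    action = random.choices(range(len(probs)), weights=probs)[0]
    # Gradient of log-softmax w.r.t. the logits: one_hot(action) - probs.
    new_logits = [
        l + lr * reward * ((1.0 if i == action else 0.0) - probs[i])
        for i, l in enumerate(logits)
    ]
    return new_logits, action
```

With a positive reward, the update raises the probability of the sampled action; with a negative reward it lowers it, which is the whole mechanism behind corpus-based REINFORCE fine-tuning.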

Training hyperparameters

The following hyperparameters were used during SL training:

The following hyperparameters were used during RL training: