Base model: facebook/opt-2.7b

Fine-tuned for causal language modeling of transcribed spoken dialogue from the TalkBank CABank collection. Training corpora include:

(Corpus descriptions are from TalkBank)

Data input format: Each input first lists the participants (two or more), then gives their spoken dialogue as a sequence of speaker-labeled turns:

Example:

<span style="color:red">&lt;participant&gt;</span> S1 (name: Dave, age: 33, sex: male) <span style="color:red">&lt;participant&gt;</span> S2 (name: unknown, age: unknown, sex: unknown) <span style="color:red">&lt;dialog&gt;</span> <span style="color:orange">S1:</span> Hi! (2.3) are you there? <span style="color:orange">S2:</span> hhh hhh [% background noise] uh yeah (0.8) I can hear you. (1.2) &=cough can you hear me? <span style="color:orange">S1:</span> ...
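The following is a minimal sketch of how a prompt in this format could be assembled from participant metadata and turns. It assumes the colored markers above are display-only, so the raw text uses the literal tokens `<participant>` and `<dialog>` separated by single spaces, and missing metadata fields are written as `unknown`. The helper names (`format_participant`, `build_prompt`) are hypothetical and not part of any published API.

```python
from typing import Dict, List, Optional, Tuple

# Metadata fields shown in the example, in the order they appear there.
FIELDS = ("name", "age", "sex")


def format_participant(speaker_id: str, info: Dict[str, str]) -> str:
    """Render one participant header; absent fields become 'unknown' (assumed convention)."""
    fields = ", ".join(f"{key}: {info.get(key, 'unknown')}" for key in FIELDS)
    return f"<participant> {speaker_id} ({fields})"


def build_prompt(
    participants: Dict[str, Dict[str, str]],
    turns: List[Tuple[str, str]],
    next_speaker: Optional[str] = None,
) -> str:
    """Join participant headers, the <dialog> marker, and speaker-labeled turns.

    If next_speaker is given, a bare "S1:"-style label is appended so the model
    continues the dialogue as that speaker.
    """
    header = " ".join(format_participant(spk, info) for spk, info in participants.items())
    parts = [f"{spk}: {text}" for spk, text in turns]
    if next_speaker is not None:
        parts.append(f"{next_speaker}:")
    return f"{header} <dialog> {' '.join(parts)}"


if __name__ == "__main__":
    participants = {"S1": {"name": "Dave", "age": "33", "sex": "male"}, "S2": {}}
    turns = [
        ("S1", "Hi! (2.3) are you there?"),
        ("S2", "hhh hhh [% background noise] uh yeah (0.8) I can hear you."),
    ]
    print(build_prompt(participants, turns, next_speaker="S1"))
```

Transcription conventions inside each turn (timed pauses such as `(2.3)`, events such as `&=cough`, bracketed comments) are passed through unchanged; the sketch only handles the surrounding structure.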

Usage Info:

As recommended in the OPT documentation, the model was trained with the slow tokenizer (`use_fast=False`), so load the tokenizer with the same setting.
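A minimal offline-generation sketch using Hugging Face `transformers` is shown below. This section does not state the model's Hub ID, so `MODEL_ID` is a hypothetical placeholder to replace with the real repository name or a local path. The sampling settings (`do_sample`, `top_p`, `max_new_tokens`) are illustrative assumptions, not values from the model's training or evaluation. The only detail taken from this card is loading the tokenizer with `use_fast=False`.

```python
# Hypothetical placeholder: replace with this model's actual Hub ID or a local path.
MODEL_ID = "path/to/opt-2.7b-cabank-finetune"


def load_model(model_id: str = MODEL_ID):
    # Imported lazily so the helpers can be defined without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Per this card, training used the slow (non-fast) OPT tokenizer.
    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model


def continue_dialogue(tokenizer, model, prompt: str, max_new_tokens: int = 50) -> str:
    """Sample a continuation of a formatted dialogue prompt and return only the new text."""
    inputs = tokenizer(prompt, return_tensors="pt")
    # Illustrative sampling settings (assumed, not tuned for this model).
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=True, top_p=0.9)
    # Drop the prompt tokens so only the generated continuation is decoded.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens)
```

This performs simple turn-by-turn generation only. It does not handle the streaming, timing, or overlap that a continuous duplex system needs.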

To use this model for real-time inference in a continuous duplex dialogue system, see: https://github.com/AbrahamSanders/realtime-chatbot.