A clone of openai/whisper-large.

In this version, the tokenizer's vocabulary does not contain the predefined timestamp tokens (such as `<|0.00|>`). This leaves room for more flexible customization, for example adding your own timestamp tokens.
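
As an illustration of that customization, here is a minimal sketch of adding your own timestamp tokens. The helper names `make_timestamp_tokens` and `add_timestamp_tokens` are hypothetical and not part of this repository. The defaults (0.02 s steps up to 30 s, in the `<|0.00|>` format) are assumptions modelled on the original Whisper convention; change them to suit your use case.

```python
def make_timestamp_tokens(max_seconds=30.0, step=0.02):
    """Build timestamp token strings such as "<|0.00|>", "<|0.02|>", ...

    The defaults are an assumption modelled on the original Whisper
    convention; they are not prescribed by this repository.
    """
    n_steps = round(max_seconds / step)
    return [f"<|{i * step:.2f}|>" for i in range(n_steps + 1)]


def add_timestamp_tokens(tokenizer, model=None, **kwargs):
    """Add custom timestamp tokens to a Hugging Face tokenizer (sketch).

    `tokenizer` is expected to provide `add_tokens`, as transformers
    tokenizers do. If a `model` is given, its embedding matrix is resized
    so that the new token ids have embeddings.
    """
    tokens = make_timestamp_tokens(**kwargs)
    num_added = tokenizer.add_tokens(tokens)
    if model is not None:
        model.resize_token_embeddings(len(tokenizer))
    return num_added
```

To use it, load the tokenizer (and optionally the model) from this repository with `WhisperTokenizer.from_pretrained(...)` and `WhisperForConditionalGeneration.from_pretrained(...)`, then call `add_timestamp_tokens(tokenizer, model)`. Embeddings for newly added tokens are freshly initialized, so the model would need fine-tuning before it can predict them meaningfully.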