audio automatic-speech-recognition endpoints-template

This repository implements a custom handler for automatic-speech-recognition and speech diarization on 🤗 Inference Endpoints using the WhisperX model.

A notebook showing how to create the handler.py is also included.
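
For reference, a custom handler for Inference Endpoints is a handler.py that exposes an EndpointHandler class with an __init__ method (model loading) and a __call__ method (inference). The sketch below only illustrates that general shape using the WhisperX transcription API; the model size, compute type, temporary-file handling, and the omitted diarization step are assumptions for illustration, so the actual handler.py in this repository (see the notebook) may differ.

# handler.py (simplified sketch, not the exact implementation shipped in this repo)
import tempfile
from typing import Any, Dict

import torch
import whisperx


class EndpointHandler:
    def __init__(self, path: str = ""):
        # assumption: model size and compute type are placeholders
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        compute_type = "float16" if self.device == "cuda" else "int8"
        self.model = whisperx.load_model("large-v2", self.device, compute_type=compute_type)

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # binary requests arrive as raw audio bytes under "inputs"
        audio_bytes = data["inputs"]
        # write the bytes to a temporary file so whisperx can decode them
        with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
            tmp.write(audio_bytes)
            tmp.flush()
            audio = whisperx.load_audio(tmp.name)
        result = self.model.transcribe(audio, batch_size=16)
        # the diarization step (alignment + speaker assignment) is omitted in this sketch
        text = " ".join(segment["text"].strip() for segment in result["segments"])
        return {"text": text}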

Request

The endpoint expects a binary audio file. Below are a cURL example and a Python example using the requests library.

curl


# run request
curl --request POST \
  --url https://{ENDPOINT}/ \
  --header 'Content-Type: audio/x-wav' \
  --header 'Authorization: Bearer {HF_TOKEN}' \
  --data-binary '@sample.wav'

Python

import mimetypes
import requests as r

ENDPOINT_URL = ""
HF_TOKEN = ""

def predict(path_to_audio: str):
    # read the audio file as raw bytes
    with open(path_to_audio, "rb") as f:
        audio_bytes = f.read()
    # guess the content type from the file extension, e.g. audio/x-wav
    content_type = mimetypes.guess_type(path_to_audio)[0]

    headers = {
        "Authorization": f"Bearer {HF_TOKEN}",
        "Content-Type": content_type,
    }
    response = r.post(ENDPOINT_URL, headers=headers, data=audio_bytes)
    return response.json()

prediction = predict(path_to_audio="sample.wav")
print(prediction)

expected output

{"text": " going along slushy country roads and speaking to damp audiences in draughty school rooms day after day for a fortnight. He'll have to put in an appearance at some place of worship on Sunday morning, and he can come to us immediately afterwards."}