Process audio files with OpenAI Whisper
Defer, which supports ffmpeg
natively, can be used to boost the speed of
applications that run inside Bun. Because of its native support, Defer’s
serverless background job processing retains the same unified developer
experience, making the Defer/Bun combination a perfect environment for complex
data processing.
One limitation of
OpenAI Whisper API
is its maximum file size of 25 MB, which is easily reached, for example, on
average meeting recording duration. To overcome this limitation, you can
leverage Defer’s native ffmpeg
support to break the file into chunks of less
than 24MB.
Use case—Audio subtitle generation
Here’s a NextJS app that generates audio subtitles with Web API’s Streams API.
The createTranscript()
background function downloads meeting audios and
forwards them to the OpenAI Whisper API.
With the help of Defer’s defer(createTranscript,..)
), the chunking of the
audio file with ffmepg
will perform in the background:
import { defer } from "@defer/client";
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import ffmpeg from "fluent-ffmpeg";
import s3Config from "../s3Client";
import openAIClient from "../openAIClient";
async function createTranscript(meetingID: string) {
// we fetch the meeting audio file as a stream
const file: ReadableStream = s3
.getObject({
Bucket: "meetings-recordings",
Key: `meeting-recording-${meetingID}`,
})
.createReadStream();
// we leverage ffmpeg to create audio chunks of 10min
ffmpeg(Readable.fromWeb(rs))
.addOutputOptions("-f segment")
.addOutputOptions("-segment_time 10:00")
.addOutputOptions("-reset_timestamps 1")
.output('audio_%02d.m4a')
// retrieve the created audio chunk files
// and forward them to the OpenAI Whisper API
const files = await glob("audio_*.m4a");
files.forEach((file) => {
processings.push(
openAIClient.audio.transcriptions.create({
file: createReadStream(file),
model: "whisper-1",
})
);
});
await Promise.allSettled(files.map((file) =>
openAIClient.audio.transcriptions.create({
file: createReadStream(file),
model: "whisper-1",
})
);
// Finally: save the transcripts...
// ...
}
export default defer(createTranscript, { concurrency: 5, retry: 2 });
As you can see, Defer native supports for ffmepg
enable processing of large
audio files at scale, without timeout limitations. And Bun makes it easy to
deal with large streams of data thanks to its Web API native support and more
efficient memory usage.
Was this page helpful?