Mixdown all audio channels from video for OpenAI Whisper (0:a:0 ?)
I'm prepping audio for transcription in Whisper, but I'm having trouble with multichannel source audio. As far as I understand, when transcoding or resampling, ffmpeg includes only the first listed channel unless told otherwise (right?)
Part of the issue is that I want to downmix ALL the audio channels from the video, whether there are one or eight, and whether some are flagged as stereo and others as mono.
I modified the code to include a map=0:a:0 argument (passed as **{"map": "0:a:0"}), but I don't think it changes anything:
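import ffmpeg  # the ffmpeg-python package

# file: path to the input video; sr: target sample rate
allchannel, _ = (
    ffmpeg.input(file, threads=0)
    # extra output option passed via **; it becomes "-map 0:a:0" on the command line
    .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=sr, **{"map": "0:a:0"})
    .run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
)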
I found this solution in a Stack Overflow answer, but I don't know enough ffmpeg/ffmpeg-python to implement it, or really to understand what it's doing:
ffmpeg -y -vn -i stereo.mp4 -filter_complex "[0:a:0]channelsplit=channel_layout=mono:channels=FC[C0]" -map "[C0]" -acodec pcm_s16le -ac 1 -ar 8k first_channel.wav
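To make the goal concrete, here is a minimal sketch of the kind of command I think I'm after, assuming ffmpeg's amix filter is a suitable way to combine several audio streams into one. build_mixdown_cmd and its n_streams parameter are hypothetical names of my own (the stream count would have to come from something like ffprobe), and I haven't verified the result on real multichannel files. It only builds the argument list; it does not run ffmpeg.

```python
def build_mixdown_cmd(path, n_streams, sr=16000):
    """Build an ffmpeg argument list that mixes every audio stream of
    `path` into one mono s16le stream written to stdout.

    Hypothetical helper: `n_streams` is assumed to be known already.
    """
    if n_streams < 1:
        raise ValueError("need at least one audio stream")

    # One input pad per audio stream: [0:a:0][0:a:1]...
    pads = "".join(f"[0:a:{i}]" for i in range(n_streams))
    # amix combines all listed inputs into a single output labelled [mix]
    graph = f"{pads}amix=inputs={n_streams}[mix]"

    return [
        "ffmpeg", "-nostdin",
        "-i", path,
        "-filter_complex", graph,
        "-map", "[mix]",          # use the mixed output, not a single stream
        "-f", "s16le",
        "-acodec", "pcm_s16le",
        "-ac", "1",               # downmix whatever layout results to mono
        "-ar", str(sr),
        "-",
    ]


if __name__ == "__main__":
    print(" ".join(build_mixdown_cmd("input.mp4", 3)))
```

The list could be handed to subprocess.run(..., capture_output=True) to get the raw PCM bytes. As far as I know, amix scales its inputs down by default, so the output level may need checking.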