How to Run OpenAI Whisper's Latest turbo Model with Docker and NVIDIA CUDA

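A few days ago, OpenAI quietly released turbo, the newest Whisper model. It is a multilingual model and an optimized version of large-v3: it transcribes faster with only a slight drop in accuracy, and its total parameter count is only slightly larger than that of the medium model. I built a Docker image that runs it with NVIDIA CUDA acceleration, so that anyone can easily run the model on their own computer.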


Before reading this post, I recommend reading "How to enable NVIDIA CUDA (GPU) support in Docker Desktop on Windows" first, to confirm that your Docker Desktop supports NVIDIA CUDA acceleration.

I first set up the environment in PowerShell with the following steps:

Pull the image

docker pull pytorch/pytorch:2.4.1-cuda12.1-cudnn9-runtime

Run the container

docker run --rm --gpus=all -it -v ${PWD}:/data --workdir=/data --name whisper pytorch/pytorch:2.4.1-cuda12.1-cudnn9-runtime

Initialize the container environment

apt update && apt install -y ffmpeg
pip install -U openai-whisper

Then test, inside the container, that whisper runs successfully:

whisper \
  --model turbo \
  --device cuda \
  --task transcribe \
  --language zh \
  --output_dir zh \
  --output_format all \
  --verbose True \
  video.webm

Once the test succeeded, I wrote a Dockerfile to build the container image I would use from then on:

FROM pytorch/pytorch:2.4.1-cuda12.1-cudnn9-runtime

ENV PYTHONWARNINGS="ignore::FutureWarning"
ENV CUDA_LAUNCH_BLOCKING=1

WORKDIR /data

RUN apt-get update && apt-get install -y \
  ffmpeg \
  && rm -rf /var/lib/apt/lists/*

RUN pip install -U openai-whisper

VOLUME [ "/data" ]

ENTRYPOINT [ "whisper" ]

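I actually went through many different versions, and there are not many articles online about packaging whisper with Docker. The Dockerfile above is the leanest version I arrived at through experimentation: the pytorch/pytorch:2.4.1-cuda12.1-cudnn9-runtime image alone provides everything whisper needs to run, including CUDA support! 👍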

Chinese transcription issues

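I used to transcribe Chinese audio into subtitles with the medium model. By default whisper produced Traditional Chinese, and if the result came out in Simplified Chinese, I could use the --initial_prompt parameter to nudge whisper toward Traditional Chinese. With the turbo model, however, the prompt I had been using for medium has no effect: when transcribing Chinese speech, turbo outputs only Simplified Chinese. This was a nuisance, because the medium model does not have this problem.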

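After many experiments, I found that passing the following --initial_prompt when running whisper makes it output Traditional Chinese (the prompt means "This is a program presented in Traditional Chinese"):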

這是一段以正體中文講解的節目

The full command is:

whisper \
  --model turbo \
  --device cuda \
  --task transcribe \
  --language zh \
  --output_dir zh \
  --verbose True \
  --initial_prompt "這是一段以正體中文講解的節目" \
  video.webm
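
For readers who prefer to call whisper from Python, the command above can be sketched roughly as follows. This is a minimal sketch, not the author's method: the helper names `build_transcribe_options` and `transcribe_file` are hypothetical, and it assumes the `openai-whisper` package installed in the container exposes `whisper.load_model` and `model.transcribe` with these keyword arguments.

```python
from typing import Any, Dict

# The same prompt string passed to --initial_prompt in the command above.
TRADITIONAL_PROMPT = "這是一段以正體中文講解的節目"


def build_transcribe_options(prompt: str = TRADITIONAL_PROMPT) -> Dict[str, Any]:
    """Keyword arguments that mirror the CLI flags (hypothetical helper)."""
    return {
        "task": "transcribe",
        "language": "zh",
        "verbose": True,
        "initial_prompt": prompt,
    }


def transcribe_file(path: str, model_name: str = "turbo", device: str = "cuda") -> str:
    """Load the model on the given device and return the transcribed text."""
    import whisper  # imported lazily so the helper above works without the package

    model = whisper.load_model(model_name, device=device)
    result = model.transcribe(path, **build_transcribe_options())
    return result["text"]
```

Calling `transcribe_file("video.webm")` inside the container should then behave much like the CLI invocation, except that it returns the text instead of writing subtitle files to an output directory.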

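However, when I tried a recording longer than one hour, the output switched to Simplified Chinese once transcription passed the 46-minute mark. There does not seem to be a good solution to this problem yet.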
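
One untested idea, offered only as an assumption, is to split a long recording into shorter parts so that each part is transcribed with the prompt freshly applied. The sketch below only builds an ffmpeg command that uses the segment muxer. The helper name `build_split_command` and the 30-minute default are hypothetical, and whether this actually avoids the drift to Simplified Chinese has not been verified.

```python
from pathlib import Path
from typing import List


def build_split_command(source: str, chunk_seconds: int = 1800) -> List[str]:
    """Return an ffmpeg command that cuts `source` into fixed-length parts."""
    src = Path(source)
    # input.m4a -> input_part000.m4a, input_part001.m4a, ...
    pattern = str(src.with_name(f"{src.stem}_part%03d{src.suffix}"))
    return [
        "ffmpeg", "-i", source,
        "-f", "segment",                      # ffmpeg's segment muxer
        "-segment_time", str(chunk_seconds),  # target length of each part
        "-reset_timestamps", "1",             # each part starts at time zero
        "-c", "copy",                         # no re-encoding
        pattern,
    ]
```

Because each part starts at zero, the timestamps in the subtitles for later parts would need to be shifted by the part's starting offset before the files are merged.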

Outputting word-level timestamps

If I pass --word_timestamps True when running whisper to get word-level timestamps, the following warnings appear:

/opt/conda/lib/python3.10/site-packages/whisper/timing.py:42: UserWarning: Failed to launch Triton kernels, likely due to missing CUDA toolkit; falling back to a slower median kernel implementation...
warnings.warn(
/opt/conda/lib/python3.10/site-packages/whisper/timing.py:146: UserWarning: Failed to launch Triton kernels, likely due to missing CUDA toolkit; falling back to a slower DTW implementation...
warnings.warn(

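Although I found the discussion "--word_timestamps True lead to Failed to launch Triton kernels warning #1283", I searched for several days without finding a fix. Only when I tried the pytorch/pytorch:2.4.1-cuda12.1-cudnn9-devel image today was the problem completely resolved! 🎉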
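
The warning text blames a missing CUDA toolkit, so a quick way to compare the runtime and devel images is to check which build tools are on the PATH inside each container. This is a minimal sketch: the helper name `find_build_tools` is hypothetical, and the choice of `nvcc` and `gcc` as the tools to look for is my assumption, not something the post verifies.

```python
import shutil
from typing import Dict, Optional, Tuple


def find_build_tools(tools: Tuple[str, ...] = ("nvcc", "gcc")) -> Dict[str, Optional[str]]:
    """Map each tool name to its full path, or None when it is not on the PATH."""
    return {tool: shutil.which(tool) for tool in tools}
```

Running `print(find_build_tools())` in both containers shows whether the devel image provides tools that the runtime image lacks.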

Other errors

Running whisper also displays the following warning:

/opt/conda/lib/python3.11/site-packages/whisper/__init__.py:150: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.

I suppressed this warning by adding the PYTHONWARNINGS environment variable:

export PYTHONWARNINGS="ignore::FutureWarning"
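
When whisper is driven from a Python script rather than the CLI, the same effect can be had with the standard `warnings` module. This is a minimal sketch, and the helper name `silence_future_warnings` is hypothetical. Like the environment variable, it hides every FutureWarning, not only the one from torch.load.

```python
import warnings


def silence_future_warnings() -> None:
    """Ignore FutureWarning in this process, like PYTHONWARNINGS=ignore::FutureWarning."""
    warnings.filterwarnings("ignore", category=FutureWarning)
```

Call it once at the top of the script, before the model is loaded.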

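Finally, I ran into an intermittent RuntimeError: CUDA error: unknown error. It does not happen every time, but I have hit it at least twice in the past few days and still have no clear lead. My NVIDIA GeForce RTX 2070 has only 8 GB of dedicated GPU memory, and while whisper runs I can see GPU memory usage climb above 7.4 GB, so I suspect memory pressure is one of the causes. I therefore suggest using watch together with nvidia-smi to monitor GPU memory usage while whisper is running: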

watch -n 1 nvidia-smi
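
To log memory usage instead of watching it interactively, nvidia-smi can also print just the numbers in CSV form. The sketch below assumes the `--query-gpu` and `--format` options of nvidia-smi, which report values in MiB when `nounits` is given. The helper names `parse_memory_csv` and `read_gpu_memory` are hypothetical.

```python
import subprocess
from typing import List, Tuple

QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(output: str) -> List[Tuple[int, int]]:
    """Parse lines like '7577, 8192' into (used_mib, total_mib), one tuple per GPU."""
    rows = []
    for line in output.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def read_gpu_memory() -> List[Tuple[int, int]]:
    """Run nvidia-smi once and return the memory usage of every visible GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)
```

Calling `read_gpu_memory()` in a loop while a transcription runs gives a record of how close usage gets to the card's limit.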

Final result

The final version of my Dockerfile is:

FROM pytorch/pytorch:2.4.1-cuda12.1-cudnn9-devel

ENV PYTHONWARNINGS="ignore::FutureWarning"
ENV CUDA_LAUNCH_BLOCKING=1

WORKDIR /data

RUN apt-get update && apt-get install -y \
  ffmpeg \
  && rm -rf /var/lib/apt/lists/*

RUN pip install -U openai-whisper

VOLUME [ "/data" ]

ENTRYPOINT [ "whisper" ]

Here is the Docker build command:

docker build -t my-whisper:latest .

Finally, I can run whisper with the following command:

docker run --rm --gpus all -v "G:\data:/data"  -v "whisper-data:/root/.cache/whisper"  -w "/data" --entrypoint "" my-whisper:latest bash -c " whisper --model turbo --device cuda --task transcribe --language zh --output_format all --output_dir zh --initial_prompt '這是一段以正體中文講解的節目。' --word_timestamps True input.m4a "
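
The nested quoting in that one-liner is easy to get wrong, so one option is to build the argument list in a script. Because the image's ENTRYPOINT is already whisper, everything after the image name is passed to whisper directly, which means the `--entrypoint ""` and `bash -c` wrapper are not needed in this form. This is a minimal sketch, and the helper name `build_docker_command` is hypothetical.

```python
from typing import List

# The same prompt string used in the command above.
PROMPT = "這是一段以正體中文講解的節目。"


def build_docker_command(host_dir: str, audio_file: str,
                         image: str = "my-whisper:latest",
                         prompt: str = PROMPT) -> List[str]:
    """Return the docker argv; arguments after the image name go to whisper."""
    return [
        "docker", "run", "--rm", "--gpus", "all",
        "-v", f"{host_dir}:/data",                  # audio files and output
        "-v", "whisper-data:/root/.cache/whisper",  # named volume for the model cache
        "-w", "/data",
        image,
        "--model", "turbo",
        "--device", "cuda",
        "--task", "transcribe",
        "--language", "zh",
        "--output_format", "all",
        "--output_dir", "zh",
        "--initial_prompt", prompt,
        "--word_timestamps", "True",
        audio_file,
    ]
```

The list can be handed to `subprocess.run(build_docker_command(r"G:\data", "input.m4a"), check=True)`, which passes each argument through without any shell quoting.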

The Will Will Web

A record of Will's learning notes and technical sharing from the online world