# Whisper

Whisper is OpenAI's general-purpose automatic speech recognition (ASR) model. You can use it for audio transcription, translation, and language identification. You can run Whisper on the NPU of your Dragonwing development board using Qualcomm's [VoiceAI ASR](https://softwarecenter.qualcomm.com/catalog/item/VoiceAI_ASR_Community), or on the CPU using [whisper.cpp](https://github.com/ggml-org/whisper.cpp).

## Running Whisper on the NPU with VoiceAI ASR

### 1. Installing SDKs

1. Open a terminal on your development board, and set up the base requirements for this example:

   ```bash
   sudo apt install -y cmake pulseaudio-utils
   ```
2. Install the [AI Runtime SDK - Community Edition](https://softwarecenter.qualcomm.com/catalog/item/Qualcomm_AI_Runtime_Community):

   ```bash
   # Install the SDK
   wget -qO- https://cdn.edgeimpulse.com/qc-ai-docs/device-setup/install_ai_runtime_sdk.sh | bash

   # Use the SDK in your current session
   source ~/.bash_profile
   ```
3. Install [VoiceAI ASR - Community Edition](https://softwarecenter.qualcomm.com/catalog/item/VoiceAI_ASR_Community):

   ```bash
   cd ~/

   wget https://softwarecenter.qualcomm.com/api/download/software/sdks/VoiceAI_ASR_Community/All/2.3.0.0/VoiceAI_ASR_Community_v2.3.0.0.zip
   unzip VoiceAI_ASR_Community_v2.3.0.0.zip -d voiceai_asr
   rm VoiceAI_ASR_Community_v2.3.0.0.zip

   cd voiceai_asr/2.3.0.0/

   # Put the path to VoiceAI ASR in your bash_profile (so it's available under VOICEAI_ROOT)
   echo "" >> ~/.bash_profile
   echo "# Begin VoiceAI ASR" >> ~/.bash_profile
   echo "export VOICEAI_ROOT=$PWD" >> ~/.bash_profile
   echo "# End VoiceAI ASR" >> ~/.bash_profile
   echo "" >> ~/.bash_profile

   # Re-load the environment variables
   source ~/.bash_profile

   # Symlink Whisper libraries
   cd $VOICEAI_ROOT/whisper_sdk/libs/npu/rpc_libraries/linux/whisper_all_quantized/
   sudo ln -s $PWD/*.so /usr/lib/
   ```

### 2. Download models from AI Hub

With the SDKs installed, you can download precompiled Whisper models from [AI Hub](https://aihub.qualcomm.com/models?searchTerm=whisper). When downloading a model, select the device that matches your development board:

* RB3 Gen 2 Vision Kit: 'Qualcomm QCS6490 (Proxy)'
* RUBIK Pi 3: 'Qualcomm QCS6490 (Proxy)'
* IQ-9075 EVK: 'Qualcomm QCS9075 (Proxy)'

After downloading, rename the encoder model to `encoder_model_htp.bin` and the decoder model to `decoder_model_htp.bin`.

To download the [Whisper-Small-Quantized](https://aihub.qualcomm.com/models/whisper_small_quantized?searchTerm=whisper\&chipsets=qualcomm-qcs6490-proxy) model directly on your development board:

* RB3 Gen 2 Vision Kit / RUBIK Pi 3:

  ```bash
  mkdir -p ~/whisper_models/ai_hub_small_quantized/
  cd ~/whisper_models/ai_hub_small_quantized/

  # Models from https://aihub.qualcomm.com/models/whisper_small_quantized for QCS6490 (Proxy) target
  wget -O encoder_model_htp.bin https://huggingface.co/qualcomm/Whisper-Small-Quantized/resolve/0e21411/precompiled/qualcomm-qcs6490-proxy/Whisper-Small-Quantized_WhisperSmallEncoderQuantizable_w8a16.bin
  wget -O decoder_model_htp.bin https://huggingface.co/qualcomm/Whisper-Small-Quantized/resolve/0e21411/precompiled/qualcomm-qcs6490-proxy/Whisper-Small-Quantized_WhisperSmallDecoderQuantizable_w8a16.bin

  # Vocab file is not in AI Hub yet, grab from our CDN
  wget -O vocab.bin https://cdn.edgeimpulse.com/qc-ai-docs/models/whisper/vocab.bin
  ```
* IQ-9075 EVK:

  ```bash
  mkdir -p ~/whisper_models/ai_hub_small_quantized/
  cd ~/whisper_models/ai_hub_small_quantized/

  # Models from https://aihub.qualcomm.com/models/whisper_small_quantized for QCS9075 (Proxy) target
  wget -O encoder_model_htp.bin https://huggingface.co/qualcomm/Whisper-Small-Quantized/resolve/0e21411/precompiled/qualcomm-qcs9075-proxy/Whisper-Small-Quantized_WhisperSmallEncoderQuantizable_w8a16.bin
  wget -O decoder_model_htp.bin https://huggingface.co/qualcomm/Whisper-Small-Quantized/resolve/0e21411/precompiled/qualcomm-qcs9075-proxy/Whisper-Small-Quantized_WhisperSmallDecoderQuantizable_w8a16.bin

  # Vocab file is not in AI Hub yet, grab from our CDN
  wget -O vocab.bin https://cdn.edgeimpulse.com/qc-ai-docs/models/whisper/vocab.bin
  ```

### 3. Compiling and running examples

1. Build the `npu_rpc_linux_sample/voice-ai-ref` example:

   ```bash
   cd $VOICEAI_ROOT/whisper_sdk/sampleapp/npu_rpc_linux_sample/voice-ai-ref

   # Overwrite the LogUtil.h header to log to stdout
   wget -O include/LogUtil.h https://cdn.edgeimpulse.com/qc-ai-docs/code/voiceai_asr/2.3.0.0/whisper_sdk/sampleapp/npu_rpc_linux_sample/voice-ai-ref/include/LogUtil.h
   # Overwrite the main.cpp example to add microphone selection
   wget -O src/main.cpp https://cdn.edgeimpulse.com/qc-ai-docs/code/voiceai_asr/2.3.0.0/whisper_sdk/sampleapp/npu_rpc_linux_sample/voice-ai-ref/src/main.cpp

   # Symlink Whisper libraries for build
   mkdir -p libs/arm64-v8a/
   cd libs/arm64-v8a/
   ln -s $VOICEAI_ROOT/whisper_sdk/libs/npu/rpc_libraries/linux/whisper_all_quantized/*.so .
   cd ../../

   mkdir -p build
   cd build
   cmake ..
   make -j`nproc`
   ```
2. You can now transcribe WAV files:

   ```bash
   cd $VOICEAI_ROOT/whisper_sdk/sampleapp/npu_rpc_linux_sample/voice-ai-ref/build

   # Download sample file
   wget -O jfk.wav https://raw.githubusercontent.com/ggml-org/whisper.cpp/refs/heads/master/samples/jfk.wav

   # Transcribe:
   ./voice-ai-ref -f jfk.wav -l en -t transcribe -m ~/whisper_models/ai_hub_small_quantized/

   # ... Expected result:
   # VoiceAIRef final result =  And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country. [language: English]
   ```

{% hint style="warning" %}
**Known issue:** The sample application does not exit automatically once transcription is complete.
{% endhint %}

3. Or even do live transcription:
   1. Connect a microphone to your development board.
   2. Find the name of your microphone:

      ```bash
      pactl list short sources
      # 49	alsa_output.platform-sound.stereo-fallback.monitor	PipeWire	s24-32le 2ch 48000Hz	SUSPENDED
      # 76	alsa_input.usb-046d_C922_Pro_Stream_Webcam_C72F6EDF-02.analog-stereo	PipeWire	s16le 2ch 32000Hz	SUSPENDED

      # To use the USB webcam, use "alsa_input.usb-046d_C922_Pro_Stream_Webcam_C72F6EDF-02.analog-stereo" as the name
      ```
   3. Run live transcription:

      ```bash
      ./voice-ai-ref -r -l en -t transcribe -m ~/whisper_models/ai_hub_small_quantized/ -d "alsa_input.usb-046d_C922_Pro_Stream_Webcam_C72F6EDF-02.analog-stereo"

      # VoiceAIRef final result =  Hi, this is to see if I can do live transcription on my Rubik Pi. [language: English]
      ```

{% hint style="info" %}
**Known issue:** Live transcription is broken in VoiceAI ASR 2.3.0.0: the application errors out immediately when the VAD flags no speech.
{% endhint %}
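
If you want to select the microphone programmatically rather than by eye, the `pactl list short sources` output shown in step 3.2 can be parsed: each line is tab-separated (index, name, driver, sample spec, state), and monitor sources (loopbacks of outputs) should normally be skipped. A minimal sketch, not part of the SDK:

```python
import subprocess

def list_input_sources(pactl_output):
    """Parse `pactl list short sources` output and return candidate microphone
    source names. Monitor sources (loopbacks of outputs) are skipped."""
    names = []
    for line in pactl_output.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        name = fields[1]
        if not name.endswith(".monitor"):
            names.append(name)
    return names

def pactl_sources():
    """Run pactl on the device and return candidate microphone source names."""
    out = subprocess.run(["pactl", "list", "short", "sources"],
                         capture_output=True, text=True, check=True).stdout
    return list_input_sources(out)
```

Any name returned by `pactl_sources()` can be passed to `voice-ai-ref` via `-d`.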

4. 🚀 You now have fully offline transcription of audio on your development board! VoiceAI ASR does not have bindings to higher-level languages (like Python), so if you want to use Whisper in your application, it's easiest to spawn the `voice-ai-ref` binary and read its results from stdout.
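
For example, a minimal Python sketch of this approach. It assumes the `voice-ai-ref` binary and model directory from the steps above, and that final results are printed in the `VoiceAIRef final result = ... [language: ...]` format shown earlier:

```python
import re
import subprocess

# voice-ai-ref prints final results like:
#   VoiceAIRef final result =  Hello world. [language: English]
RESULT_RE = re.compile(r"VoiceAIRef final result =\s*(.*?)\s*\[language:\s*(\w+)\]")

def parse_final_result(line):
    """Extract (text, language) from a voice-ai-ref stdout line, or None."""
    m = RESULT_RE.search(line)
    return (m.group(1), m.group(2)) if m else None

def transcribe(wav_path, model_dir, binary="./voice-ai-ref"):
    """Spawn voice-ai-ref and yield (text, language) tuples from its stdout."""
    proc = subprocess.Popen(
        [binary, "-f", wav_path, "-l", "en", "-t", "transcribe", "-m", model_dir],
        stdout=subprocess.PIPE, text=True)
    for line in proc.stdout:
        result = parse_final_result(line)
        if result is not None:
            yield result

# Usage (requires the binary and models from the steps above):
#   for text, lang in transcribe("jfk.wav", "/path/to/ai_hub_small_quantized/"):
#       print(f"[{lang}] {text}")
```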

## Running Whisper on the CPU with whisper.cpp

Alternatively, you can run Whisper on the CPU (with lower performance) using whisper.cpp, or any of the other popular Whisper libraries.

Here are instructions for [whisper.cpp](https://github.com/ggml-org/whisper.cpp). Open a terminal on your development board, or an SSH session to your development board, and run:

1. Install build dependencies:

   ```bash
   sudo apt update
   sudo apt install -y libsdl2-dev libsdl2-2.0-0 libasound2-dev
   ```

{% hint style="info" %}
**Note:** Installing `libsdl2-dev` caused some trouble during testing; these steps have not yet been verified on a fresh system.
{% endhint %}

2. Build whisper.cpp:

   ```bash
   mkdir -p ~/dev/llm/
   cd ~/dev/llm/

   git clone https://github.com/ggml-org/whisper.cpp.git
   cd whisper.cpp
   git checkout v1.7.6

   # Build (CPU)
   cmake -B build-cpu -DWHISPER_SDL2=ON
   cmake --build build-cpu -j`nproc` --config Release
   ```
3. Add the whisper.cpp paths to your PATH:

   ```bash
   cd ~/dev/llm/whisper.cpp/build-cpu/bin

   echo "" >> ~/.bash_profile
   echo "# Begin whisper.cpp" >> ~/.bash_profile
   echo "export PATH=\$PATH:$PWD" >> ~/.bash_profile
   echo "# End whisper.cpp" >> ~/.bash_profile
   echo "" >> ~/.bash_profile

   # To use the whisper.cpp files in your current session
   source ~/.bash_profile
   ```
4. You can now transcribe audio using whisper.cpp:

   ```bash
   # Download model
   cd ~/dev/llm/whisper.cpp
   sh ./models/download-ggml-model.sh tiny.en-q5_1

   # Transcribe text
   whisper-cli -m models/ggml-tiny.en-q5_1.bin -f samples/jfk.wav

   # [00:00:00.000 --> 00:00:10.480]
   # and so my fellow Americans ask not what your country can do for you ask what you can do for your country
   ```
5. You can also live transcribe audio:
   1. Connect a microphone to your development board.
   2. Find your microphone ID:

      ```bash
      SDL_AUDIODRIVER=alsa whisper-stream -m models/ggml-tiny.en-q5_1.bin
      # init: found 2 capture devices:
      # init:    - Capture device #0: 'qcm6490-rb3-vision-snd-card, '
      # init:    - Capture device #1: 'Yeti Stereo Microphone, USB Audio'

      # If you want "Yeti Stereo Microphone, USB Audio" then the ID is 1
      ```
   3. Start live transcribing:

      ```bash
      SDL_AUDIODRIVER=alsa whisper-stream -m models/ggml-tiny.en-q5_1.bin -c 1

      # main: processing 48000 samples (step = 3.0 sec / len = 10.0 sec / keep = 0.2 sec), 4 threads, lang = en, task = transcribe, timestamps = 0 ...
      # main: n_new_line = 2, no_context = 1
      #
      # [Start speaking]
      # This is a test to see if you can transcribe text live on your Qualcomm device
      ```

### Running on the GPU with OpenCL

You can also build binaries that run on the GPU:

1. First follow the steps in [llama.cpp](https://qc-ai-test.gitbook.io/qc-ai-test-docs/running-building-ai-models/llama-cpp) under "Install the OpenCL headers and ICD loader library".
2. Build a binary with OpenCL:

   ```bash
   cd ~/dev/llm/whisper.cpp

   cmake -B build-gpu -DGGML_OPENCL=ON -DWHISPER_SDL2=ON
   cmake --build build-gpu -j`nproc` --config Release

   # Find the binary in:
   #     build-gpu/bin/whisper-cli
   ```

However, this does not run faster than the CPU build, at least on the QCS6490 (even with Q4\_0 quantized weights).
