Whisper

Whisper is OpenAI's general-purpose automatic speech recognition (ASR) model. You can use it for audio transcription, translation, and language identification. You can run Whisper on the NPU of your Dragonwing development board using Qualcomm's VoiceAI ASR, or on the CPU using whisper.cpp.

Running Whisper on the NPU with VoiceAI ASR

1. Installing SDKs

  1. Open a terminal on your development board, and set up the base requirements for this example:

    sudo apt install -y cmake pulseaudio-utils
  2. Install the AI Runtime SDK - Community Edition:

    # Install the SDK
    wget -qO- https://cdn.edgeimpulse.com/qc-ai-docs/device-setup/install_ai_runtime_sdk.sh | bash
    
    # Use the SDK in your current session
    source ~/.bash_profile
  3. Install VoiceAI ASR - Community Edition:

    cd ~/
    
    wget https://softwarecenter.qualcomm.com/api/download/software/sdks/VoiceAI_ASR_Community/All/2.3.0.0/VoiceAI_ASR_Community_v2.3.0.0.zip
    unzip VoiceAI_ASR_Community_v2.3.0.0.zip -d voiceai_asr
    rm VoiceAI_ASR_Community_v2.3.0.0.zip
    
    cd voiceai_asr/2.3.0.0/
    
    # Put the path to VoiceAI ASR in your bash_profile (so it's available under VOICEAI_ROOT)
    echo "" >> ~/.bash_profile
    echo "# Begin VoiceAI ASR" >> ~/.bash_profile
    echo "export VOICEAI_ROOT=$PWD" >> ~/.bash_profile
    echo "# End VoiceAI ASR" >> ~/.bash_profile
    echo "" >> ~/.bash_profile
    
    # Re-load the environment variables
    source ~/.bash_profile
    
    # Symlink Whisper libraries
    cd $VOICEAI_ROOT/whisper_sdk/libs/npu/rpc_libraries/linux/whisper_all_quantized/
    sudo ln -s $PWD/*.so /usr/lib/

2. Download models from AI Hub

With the SDKs installed, you can download precompiled Whisper models from AI Hub. When downloading a model, select the device that matches your board:

  • RB3 Gen 2 Vision Kit: 'Qualcomm QCS6490 (Proxy)'

  • RUBIK Pi 3: 'Qualcomm QCS6490 (Proxy)'

  • IQ-9075 EVK: 'Qualcomm QCS9075 (Proxy)'

After downloading, rename the encoder model to encoder_model_htp.bin and the decoder model to decoder_model_htp.bin.
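
For example (the downloaded filenames are placeholders; use whatever names AI Hub gave the files):

    mv <downloaded-encoder-file>.bin encoder_model_htp.bin
    mv <downloaded-decoder-file>.bin decoder_model_htp.bin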

To download the Whisper-Small-Quantized model directly on your development board, pick the variant that matches your device (a sketch of the download commands follows this list):

  • RB3 Gen 2 Vision Kit / RUBIK Pi 3: use the 'Qualcomm QCS6490 (Proxy)' variant

  • IQ-9075 EVK: use the 'Qualcomm QCS9075 (Proxy)' variant
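
A sketch of the download commands; the URLs are placeholders, so copy the real links for your device from the AI Hub model page:

    cd ~/

    # QCS6490 build for RB3 Gen 2 Vision Kit / RUBIK Pi 3, QCS9075 build for IQ-9075 EVK
    wget -O encoder_model_htp.bin '<AI Hub encoder download URL for your device>'
    wget -O decoder_model_htp.bin '<AI Hub decoder download URL for your device>'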

3. Compiling and running examples

  1. Build the npu_rpc_linux_sample/voice-ai-ref example:
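
    A minimal sketch; the exact location of the sample inside $VOICEAI_ROOT and its build system may differ per SDK release, so check the README that ships with the sample:

    # Assumed sample path; adjust to wherever npu_rpc_linux_sample lives in your SDK
    cd $VOICEAI_ROOT/whisper_sdk/samples/npu_rpc_linux_sample/voice-ai-ref
    mkdir -p build && cd build
    cmake ..
    make -j$(nproc)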

  2. You can now transcribe WAV files:
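
    A sketch of the invocation; the flag names below are assumptions, so run the binary without arguments (or check the sample's README) to see the real options:

    # Flag names are assumptions; point the binary at the renamed encoder/decoder models and a WAV file
    ./voice-ai-ref --encoder ~/encoder_model_htp.bin --decoder ~/decoder_model_htp.bin --input test.wav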

  3. You can also run live transcription:

    1. Connect a microphone to your development board.

    2. Find the name of your microphone:
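
      One way to list capture devices, using the pulseaudio-utils installed in step 1:

      pactl list sources short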

    3. Run live transcription:

Note: live transcription is currently broken in VoiceAI ASR 2.3.0.0; it errors out immediately when the VAD flags no speech.

  4. 🚀 You now have fully offline transcription of audio on your development board! VoiceAI ASR does not have bindings for higher-level languages (like Python), so the easiest way to use Whisper in your application is to spawn the voice-ai-ref binary and read its output from stdout.
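
    If you are shelling out from your own application, the pattern is roughly as follows (arguments elided, as they depend on your setup):

    # Spawn voice-ai-ref and consume its transcription line by line from stdout
    ./voice-ai-ref <your arguments> | while read -r line; do
        echo "got transcript: $line"
    done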

Running Whisper on the CPU with whisper.cpp

Alternatively, you can run Whisper on the CPU (with lower performance) using whisper.cpp (or any of the other popular Whisper libraries).

Here are instructions for whisper.cpp. Open a terminal on your development board, or an SSH session to it, and run:

  1. Install build dependencies:
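
    The exact package set is an assumption, but whisper.cpp needs a C/C++ toolchain and CMake, plus SDL2 for the live-transcription (stream) example:

    sudo apt update
    sudo apt install -y git build-essential cmake libsdl2-dev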

Note: the libsdl2-dev install can be troublesome on some systems; this has not yet been verified on a fresh system.

  2. Build whisper.cpp:
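
    A sketch of a standard whisper.cpp build; WHISPER_SDL2=ON is only needed for the live-transcription (stream) example:

    cd ~/
    git clone https://github.com/ggerganov/whisper.cpp.git
    cd whisper.cpp

    # Build (SDL2 enabled so the stream example is built too)
    cmake -B build -DWHISPER_SDL2=ON
    cmake --build build -j --config Release

    # Download a ggml model to run inference with
    sh ./models/download-ggml-model.sh base.en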

  3. Add the whisper.cpp paths to your PATH:
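
    A sketch, following the same ~/.bash_profile convention used above; whisper.cpp places its binaries in build/bin:

    echo "" >> ~/.bash_profile
    echo "# Begin whisper.cpp" >> ~/.bash_profile
    echo "export PATH=\$PATH:$HOME/whisper.cpp/build/bin" >> ~/.bash_profile
    echo "# End whisper.cpp" >> ~/.bash_profile
    echo "" >> ~/.bash_profile

    # Re-load the environment variables
    source ~/.bash_profile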

  4. You can now transcribe some audio using whisper.cpp:
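
    For example, using the base.en model downloaded above and the JFK sample that ships with whisper.cpp (older builds name the binary main instead of whisper-cli):

    whisper-cli -m ~/whisper.cpp/models/ggml-base.en.bin -f ~/whisper.cpp/samples/jfk.wav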

  5. You can also transcribe audio live:

    1. Connect a microphone to your development board.

    2. Find your microphone ID:
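
      The stream example takes an SDL capture device index; whisper-stream prints the capture devices it finds at startup, and you can also inspect the available sources with:

      pactl list sources short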

    3. Start live transcribing:
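
      A sketch using the whisper-stream example built earlier; replace the -c value with your capture device index:

      whisper-stream -m ~/whisper.cpp/models/ggml-base.en.bin -t 4 -c 1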

Running Whisper on the GPU with OpenCL

You can also build whisper.cpp binaries that run on the GPU:

  1. First follow the steps in llama.cpp under "Install the OpenCL headers and ICD loader library".

  2. Build a binary with OpenCL:
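
    A sketch, assuming the ggml OpenCL backend flag (GGML_OPENCL); building into a separate directory keeps the CPU build intact:

    cd ~/whisper.cpp

    cmake -B build-opencl -DWHISPER_SDL2=ON -DGGML_OPENCL=ON
    cmake --build build-opencl -j --config Release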

However, this does not run faster than the CPU build, at least on the QCS6490 (even with Q4_0 quantized weights).
