Whisper
Whisper is OpenAI's general-purpose automatic speech recognition (ASR) model. You can use it for audio transcription, translation, and language identification. You can run Whisper on the NPU of your Dragonwing development board using Qualcomm's VoiceAI ASR, or on the CPU using whisper.cpp.
Running Whisper on the NPU with VoiceAI ASR
1. Installing SDKs
Open a terminal on your development board, and set up the base requirements for this example:
```bash
sudo apt install -y cmake pulseaudio-utils
```

Install the AI Runtime SDK - Community Edition:
```bash
# Install the SDK
wget -qO- https://cdn.edgeimpulse.com/qc-ai-docs/device-setup/install_ai_runtime_sdk.sh | bash

# Use the SDK in your current session
source ~/.bash_profile
```

Install VoiceAI ASR - Community Edition:
```bash
cd ~/
wget https://softwarecenter.qualcomm.com/api/download/software/sdks/VoiceAI_ASR_Community/All/2.3.0.0/VoiceAI_ASR_Community_v2.3.0.0.zip
unzip VoiceAI_ASR_Community_v2.3.0.0.zip -d voiceai_asr
rm VoiceAI_ASR_Community_v2.3.0.0.zip
cd voiceai_asr/2.3.0.0/

# Put the path to VoiceAI ASR in your bash_profile (so it's available under VOICEAI_ROOT)
echo "" >> ~/.bash_profile
echo "# Begin VoiceAI ASR" >> ~/.bash_profile
echo "export VOICEAI_ROOT=$PWD" >> ~/.bash_profile
echo "# End VoiceAI ASR" >> ~/.bash_profile
echo "" >> ~/.bash_profile

# Re-load the environment variables
source ~/.bash_profile

# Symlink Whisper libraries
cd $VOICEAI_ROOT/whisper_sdk/libs/npu/rpc_libraries/linux/whisper_all_quantized/
sudo ln -s $PWD/*.so /usr/lib/
```
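As a quick sanity check, verify that VOICEAI_ROOT is set and that the Whisper libraries you just symlinked are present:

```bash
# VOICEAI_ROOT should point at the extracted SDK
echo $VOICEAI_ROOT

# These are the libraries that were symlinked into /usr/lib
ls -l $VOICEAI_ROOT/whisper_sdk/libs/npu/rpc_libraries/linux/whisper_all_quantized/*.so
```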
2. Downloading models from AI Hub
With the SDKs installed, you can download precompiled Whisper models from AI Hub. When downloading a model, select the following device:
RB3 Gen 2 Vision Kit: 'Qualcomm QCS6490 (Proxy)'
RUBIK Pi 3: 'Qualcomm QCS6490 (Proxy)'
IQ-9075 EVK: 'Qualcomm QCS9075 (Proxy)'
After downloading, rename the encoder model to encoder_model_htp.bin and the decoder model to decoder_model_htp.bin.
To download the Whisper-Small-Quantized model directly on your development board:
RB3 Gen 2 Vision Kit / RUBIK Pi 3:
```bash
mkdir -p ~/whisper_models/ai_hub_small_quantized/
cd ~/whisper_models/ai_hub_small_quantized/

# Models from https://aihub.qualcomm.com/models/whisper_small_quantized for QCS6490 (Proxy) target
wget -O encoder_model_htp.bin https://huggingface.co/qualcomm/Whisper-Small-Quantized/resolve/0e21411/precompiled/qualcomm-qcs6490-proxy/Whisper-Small-Quantized_WhisperSmallEncoderQuantizable_w8a16.bin
wget -O decoder_model_htp.bin https://huggingface.co/qualcomm/Whisper-Small-Quantized/resolve/0e21411/precompiled/qualcomm-qcs6490-proxy/Whisper-Small-Quantized_WhisperSmallDecoderQuantizable_w8a16.bin

# Vocab file is not in AI Hub yet, grab from our CDN
wget -O vocab.bin https://cdn.edgeimpulse.com/qc-ai-docs/models/whisper/vocab.bin
```

IQ-9075 EVK:
```bash
mkdir -p ~/whisper_models/ai_hub_small_quantized/
cd ~/whisper_models/ai_hub_small_quantized/

# Models from https://aihub.qualcomm.com/models/whisper_small_quantized for QCS9075 (Proxy) target
wget -O encoder_model_htp.bin https://huggingface.co/qualcomm/Whisper-Small-Quantized/resolve/0e21411/precompiled/qualcomm-qcs9075-proxy/Whisper-Small-Quantized_WhisperSmallEncoderQuantizable_w8a16.bin
wget -O decoder_model_htp.bin https://huggingface.co/qualcomm/Whisper-Small-Quantized/resolve/0e21411/precompiled/qualcomm-qcs9075-proxy/Whisper-Small-Quantized_WhisperSmallDecoderQuantizable_w8a16.bin

# Vocab file is not in AI Hub yet, grab from our CDN
wget -O vocab.bin https://cdn.edgeimpulse.com/qc-ai-docs/models/whisper/vocab.bin
```
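Whichever target you picked, the model directory should now contain the two renamed model files plus the vocab file:

```bash
# Expect encoder_model_htp.bin, decoder_model_htp.bin and vocab.bin
ls -lh ~/whisper_models/ai_hub_small_quantized/
```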
3. Compiling and running examples
Build the npu_rpc_linux_sample/voice-ai-ref example:

```bash
cd $VOICEAI_ROOT/whisper_sdk/sampleapp/npu_rpc_linux_sample/voice-ai-ref

# Overwrite the LogUtil.h function to log to stdout
wget -O include/LogUtil.h https://cdn.edgeimpulse.com/qc-ai-docs/code/voiceai_asr/2.3.0.0/whisper_sdk/sampleapp/npu_rpc_linux_sample/voice-ai-ref/include/LogUtil.h

# Overwrite the main.cpp example to add microphone selection
wget -O src/main.cpp https://cdn.edgeimpulse.com/qc-ai-docs/code/voiceai_asr/2.3.0.0/whisper_sdk/sampleapp/npu_rpc_linux_sample/voice-ai-ref/src/main.cpp

# Symlink Whisper libraries for build
mkdir -p libs/arm64-v8a/
cd libs/arm64-v8a/
ln -s $VOICEAI_ROOT/whisper_sdk/libs/npu/rpc_libraries/linux/whisper_all_quantized/*.so .
cd ../../

mkdir -p build
cd build
cmake ..
make -j`nproc`
```

You can now transcribe WAV files:
```bash
cd $VOICEAI_ROOT/whisper_sdk/sampleapp/npu_rpc_linux_sample/voice-ai-ref/build

# Download sample file
wget -O jfk.wav https://raw.githubusercontent.com/ggml-org/whisper.cpp/refs/heads/master/samples/jfk.wav

# Transcribe:
./voice-ai-ref -f jfk.wav -l en -t transcribe -m ~/whisper_models/ai_hub_small_quantized/

# ... Expected result:
# VoiceAIRef final result = And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country. [language: English]
```
Note: the sample application does not exit automatically once transcription is complete; stop it with Ctrl+C.
Or even do live transcription:
Connect a microphone to your development board.
Find the name of your microphone:
```bash
pactl list short sources
# 49    alsa_output.platform-sound.stereo-fallback.monitor    PipeWire    s24-32le 2ch 48000Hz    SUSPENDED
# 76    alsa_input.usb-046d_C922_Pro_Stream_Webcam_C72F6EDF-02.analog-stereo    PipeWire    s16le 2ch 32000Hz    SUSPENDED

# To use the USB webcam, use "alsa_input.usb-046d_C922_Pro_Stream_Webcam_C72F6EDF-02.analog-stereo" as the name
```

Run live transcription:
```bash
./voice-ai-ref -r -l en -t transcribe -m ~/whisper_models/ai_hub_small_quantized/ -d "alsa_input.usb-046d_C922_Pro_Stream_Webcam_C72F6EDF-02.analog-stereo"

# VoiceAIRef final result = Hi, this is to see if I can do live transcription on my Rubik Pi. [language: English]
```
🚀 You now have fully offline transcription of audio on your development board! VoiceAI ASR does not have bindings to higher-level languages (like Python), so if you want to use Whisper in your application it's easiest to spawn the `voice-ai-ref` binary and read its output from stdout.
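As a minimal sketch of that pattern (assuming the build directory, model path, and jfk.wav sample from the steps above; `timeout` acts as a safety net because the binary does not exit on its own):

```bash
#!/bin/bash
# Minimal sketch: run voice-ai-ref on a WAV file and capture the transcript from stdout.
# Paths assume the build and model download steps above.
BIN_DIR=$VOICEAI_ROOT/whisper_sdk/sampleapp/npu_rpc_linux_sample/voice-ai-ref/build
MODEL_DIR=$HOME/whisper_models/ai_hub_small_quantized/

# voice-ai-ref does not exit by itself, so bound it with `timeout`;
# grep/sed extract the transcript from the "VoiceAIRef final result = ..." log line.
TRANSCRIPT=$(cd "$BIN_DIR" && \
    timeout 60 ./voice-ai-ref -f jfk.wav -l en -t transcribe -m "$MODEL_DIR" \
    | grep -m1 'final result' \
    | sed 's/.*final result = //')
echo "Transcript: $TRANSCRIPT"
```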
Running Whisper on the CPU with whisper.cpp
Alternatively, you can run Whisper on the CPU (with lower performance) using whisper.cpp (or any of the other popular Whisper libraries).
Here are instructions for whisper.cpp. Open a terminal on your development board, or an SSH session to it, and run:
Install build dependencies:
```bash
sudo apt update
sudo apt install -y libsdl2-dev libsdl2-2.0-0 libasound2-dev
```
Build whisper.cpp:
```bash
mkdir -p ~/dev/llm/
cd ~/dev/llm/
git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp
git checkout v1.7.6

# Build (CPU)
cmake -B build-cpu -DWHISPER_SDL2=ON
cmake --build build-cpu -j`nproc` --config Release
```

Add the whisper.cpp paths to your PATH:
```bash
cd ~/dev/llm/whisper.cpp/build-cpu/bin
echo "" >> ~/.bash_profile
echo "# Begin whisper.cpp" >> ~/.bash_profile
echo "export PATH=\$PATH:$PWD" >> ~/.bash_profile
echo "# End whisper.cpp" >> ~/.bash_profile
echo "" >> ~/.bash_profile

# To use the whisper.cpp files in your current session
source ~/.bash_profile
```
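A quick check that the new binaries resolve from your PATH:

```bash
# Should print the paths to the freshly built binaries
command -v whisper-cli whisper-stream
```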
You can now transcribe some audio using whisper.cpp:

```bash
# Download model
cd ~/dev/llm/whisper.cpp
sh ./models/download-ggml-model.sh tiny.en-q5_1

# Transcribe text
whisper-cli -m models/ggml-tiny.en-q5_1.bin -f samples/jfk.wav
# [00:00:00.000 --> 00:00:10.480]
#    and so my fellow Americans ask not what your country can do for you ask what you can do for your country
```
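To transcribe your own recordings instead of the bundled sample, a minimal sketch (assuming `arecord` from alsa-utils is available on your board) is to capture a short 16 kHz mono clip and feed it to whisper-cli:

```bash
# Record 5 seconds of 16 kHz mono audio from the default ALSA capture device,
# then transcribe it (whisper.cpp models expect 16 kHz input)
arecord -f S16_LE -r 16000 -c 1 -d 5 /tmp/clip.wav
whisper-cli -m models/ggml-tiny.en-q5_1.bin -f /tmp/clip.wav
```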
You can also transcribe audio live:

Connect a microphone to your development board.
Find your microphone ID:
```bash
SDL_AUDIODRIVER=alsa whisper-stream -m models/ggml-tiny.en-q5_1.bin
# init: found 2 capture devices:
# init:    - Capture device #0: 'qcm6490-rb3-vision-snd-card, '
# init:    - Capture device #1: 'Yeti Stereo Microphone, USB Audio'

# If you want "Yeti Stereo Microphone, USB Audio" then the ID is 1
```

Start live transcribing:
```bash
SDL_AUDIODRIVER=alsa whisper-stream -m models/ggml-tiny.en-q5_1.bin -c 1
# main: processing 48000 samples (step = 3.0 sec / len = 10.0 sec / keep = 0.2 sec), 4 threads, lang = en, task = transcribe, timestamps = 0 ...
# main: n_new_line = 2, no_context = 1
#
# [Start speaking]
# This is a test to see if you can transcribe text live on your Qualcomm device
```
Running on the GPU with OpenCL
You can also build whisper.cpp binaries that run on the GPU:
First, follow the steps in llama.cpp under "Install the OpenCL headers and ICD loader library".
Build a binary with OpenCL:
```bash
cd ~/dev/llm/whisper.cpp
cmake -B build-gpu -DGGML_OPENCL=ON -DWHISPER_SDL2=ON
cmake --build build-gpu -j`nproc` --config Release

# Find the binary in:
# build-gpu/bin/whisper-cli
```
However, this does not run faster than the CPU build, at least on the QCS6490 (even with Q4_0-quantized weights).
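One way to compare on your own board is whisper.cpp's bench tool, built as whisper-bench in both build trees (using the model downloaded earlier):

```bash
cd ~/dev/llm/whisper.cpp

# Benchmark the CPU build and the OpenCL build against the same model
./build-cpu/bin/whisper-bench -m models/ggml-tiny.en-q5_1.bin
./build-gpu/bin/whisper-bench -m models/ggml-tiny.en-q5_1.bin
```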