IM SDK

The Qualcomm® Intelligent Multimedia SDK (IM SDK) is a set of GStreamer plugins that let you run computer vision operations on the GPU of your Dragonwing development board, and build AI pipelines that run fully on the GPU and NPU without ever having to yield back to the CPU (zero-copy). Together this makes it possible to achieve much higher throughput than when you implement AI CV pipelines yourself in e.g. OpenCV + TFLite.

So... GStreamer pipelines?

The IM SDK is built on top of GStreamer. GStreamer is a multimedia framework that lets you describe a processing pipeline for video or audio, and it takes care of running each step in order. In 'normal Python' you might write OpenCV code that grabs a frame from a webcam, resizes and crops it, calls into an inference function, draws bounding boxes on the result, and then outputs or displays the frame again — with each step running on the CPU unless you explicitly wire up GPU/NPU APIs yourself. With GStreamer + IM SDK, you declare that same sequence once in a pipeline string, and the framework streams frames through the chain for you.

What IM SDK adds on Qualcomm hardware is the ability for those steps to be transparently accelerated: resize/crop and drawing bounding boxes can run on the GPU, inference can run on the NPU, and whole chains of operations (e.g. crop → resize → NN inference) can execute without ever yielding back to the CPU (zero-copy). From your application you only need to configure the pipeline; the underlying framework handles frame-by-frame scheduling, synchronization, and accelerator offload.

The IM SDK provides the special GStreamer plugins that make this possible. For example, qtivtransform offloads color conversion, cropping, and resizing to the GPU, while qtimltflite handles inference on the NPU. This way, the same high-level pipeline you'd write with standard GStreamer can now run almost entirely on dedicated accelerators, giving you real-time throughput with minimal CPU load.
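
To make this concrete: once everything is installed (next section), a pipeline like the one below already runs the color conversion and resize on the GPU. This is just an illustrative one-liner, not part of the tutorial code; it assumes your webcam sits at /dev/video2 (you'll find the right device in the setup steps below), and depending on the webcam you may need a decoder such as jpegdec after the source.

    # Illustrative: GPU-accelerated convert + resize, discarding the frames.
    # Press Ctrl+C to stop; -v prints the negotiated caps for each element.
    gst-launch-1.0 -v v4l2src device=/dev/video2 ! \
      qtivtransform ! "video/x-raw,format=RGB,width=224,height=224" ! \
      fakesink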

Setting up GStreamer and the IM SDK

Alright, let's go build some applications using the IM SDK.

  1. Install GStreamer, the IM SDK, and some extra dependencies we'll need in this example. Open a terminal on your development board (or an ssh session to it) and run:

    if [ ! -f /etc/apt/sources.list.d/ubuntu-qcom-iot-ubuntu-qcom-ppa-noble.list ]; then
        sudo apt-add-repository -y ppa:ubuntu-qcom-iot/qcom-ppa
    fi
    
    # Install GStreamer / IM SDK
    sudo apt update
    sudo apt install -y gstreamer1.0-tools gstreamer1.0-plugins-good gstreamer1.0-plugins-base gstreamer1.0-plugins-base-apps gstreamer1.0-plugins-qcom-good gstreamer1.0-qcom-sample-apps
    
    # Install Python bindings for GStreamer, and some build dependencies
    sudo apt install -y v4l-utils libcairo2-dev pkg-config python3-dev libgirepository1.0-dev gir1.2-gstreamer-1.0
  2. Clone the example repo, create a venv, and install its dependencies:

    # Clone repo
    git clone https://github.com/edgeimpulse/qc-ai-test-docs-examples.git
    cd qc-ai-test-docs-examples/imsdk/tutorial
    
    # Create a new venv
    python3 -m venv .venv
    source .venv/bin/activate
    
    # Install Python dependencies
    pip3 install -r requirements.txt
  3. You'll need a camera: either the built-in camera (on the RB3 Gen 2 Vision Kit) or a USB webcam.

    • If you want to use a USB webcam:

      1. Find out the device ID:

        v4l2-ctl --list-devices
        # msm_vidc_media (platform:aa00000.video-codec):
        #         /dev/media0
        #
        # msm_vidc_decoder (platform:msm_vidc_bus):
        #         /dev/video32
        #         /dev/video33
        #
        # C922 Pro Stream Webcam (usb-0000:01:00.0-2):
        #         /dev/video2     <-- So /dev/video2
        #         /dev/video3
        #         /dev/media3
      2. Set the environment variable (we'll use this in our examples):

        export IMSDK_VIDEO_SOURCE="v4l2src device=/dev/video2"
    • If you're on the RB3 Gen 2 Vision Kit, and want to use the built-in camera:

      export IMSDK_VIDEO_SOURCE="qtiqmmfsrc name=camsrc camera=0"
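
Before moving on, here's a quick (optional) sanity check that the IM SDK plugins are installed and that your video source produces frames. These are just illustrative commands, not part of the tutorial repo:

    # Should print the element's pads and properties; an error here means the
    # qcom GStreamer packages didn't install correctly
    gst-inspect-1.0 qtivtransform

    # Stream frames from your configured source into a fakesink (Ctrl+C to stop)
    gst-launch-1.0 -v $IMSDK_VIDEO_SOURCE ! fakesink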

Ex 1: Resizing and cropping on GPU vs. CPU

Let's show how much faster working on the GPU can be compared to the CPU. If you have a neural network that expects a 224x224 RGB input, you'll need to preprocess your data: first, grab the frame from the webcam (e.g. native resolution is 1920x1080), then crop it to a 1:1 aspect ratio (e.g. crop to 1080x1080), then resize to the desired resolution (224x224), and finally create a NumPy array from the pixels.

  1. Create a new file ex1.py, and add:

  2. Let's run this. This pipeline runs on the CPU (using vanilla GStreamer components):

    Here you see the resize/crop takes 18 ms, for a total of ~20 ms per-frame processing time (measured on the RB3 with the built-in camera).

  3. Now let's make this run on the GPU instead... Replace:

    With:

  4. Run this again:

    🚀 You've now sped up the crop/resize operation by 9 times, with just two lines of code.
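
The repo's ex1.py contains the real code for this exercise; as a rough idea of what such a pipeline looks like from Python, here's a hedged sketch (not the repo's file — the element chain, property values, and appsink handling are illustrative, and depending on your webcam you may need a decoder such as jpegdec after the source):

    # Sketch only: CPU crop/resize pipeline feeding 224x224 RGB frames to Python.
    # Swapping the videoconvert/aspectratiocrop/videoscale chain for a single
    # qtivtransform element is what moves this work onto the GPU (its exact
    # crop-related properties are worth checking with `gst-inspect-1.0 qtivtransform`).
    import os
    import gi
    gi.require_version('Gst', '1.0')
    from gi.repository import Gst, GLib
    import numpy as np

    Gst.init(None)
    video_source = os.environ.get('IMSDK_VIDEO_SOURCE', 'v4l2src device=/dev/video2')

    pipeline_str = (
        f'{video_source} ! '
        'identity name=frame_ready_webcam silent=false ! '
        'videoconvert ! aspectratiocrop aspect-ratio=1/1 ! videoscale ! '  # CPU path
        'video/x-raw,format=RGB,width=224,height=224 ! '
        'appsink name=frame emit-signals=true max-buffers=1 drop=true'
    )

    pipeline = Gst.parse_launch(pipeline_str)

    def on_new_sample(sink):
        sample = sink.emit('pull-sample')
        buf = sample.get_buffer()
        # Copy the raw RGB bytes out of the GStreamer buffer into a NumPy array
        frame = np.frombuffer(buf.extract_dup(0, buf.get_size()), dtype=np.uint8)
        frame = frame.reshape((224, 224, 3))
        print('got frame', frame.shape)
        return Gst.FlowReturn.OK

    pipeline.get_by_name('frame').connect('new-sample', on_new_sample)
    pipeline.set_state(Gst.State.PLAYING)
    GLib.MainLoop().run()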

Ex 2: Tee'ing streams and multiple outputs

So... in the pipeline above you've seen a few elements that will be relevant when interacting with your own code:

  • Identity elements (e.g. identity name=frame_ready_webcam silent=false). These can be used to debug timing in a pipeline. The timestamp at which a frame passes through is saved, and then returned at the end of the pipeline in marks (key/value pairs, where the key is the identity name and the value is the timestamp).

  • Appsink elements (e.g. appsink name=frame). These are used to send data from a GStreamer pipeline to your application. Here the caps just before the appsink are video/x-raw,format=RGB,width=224,height=224, so a 224x224 RGB array is handed to Python. You receive these frames in frames_by_sink (key/value pairs, where the key is the appsink name and the value is the frame data).

You can have multiple appsinks per pipeline. For example, you might want to grab the original 1920x1080 image as well. In that case you can split the pipeline into two parts right after identity name=frame_ready_webcam: send one branch to a new appsink, and the other branch through the resize/crop chain.
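
Structurally, that tee'd pipeline looks roughly like the fragment below (illustrative only, not copied from ex2.py). The queue after each tee branch matters, so that one slow branch doesn't stall the other:

    # Two branches after the tee: the original frame and the 224x224 crop/resize,
    # each delivered to its own appsink
    import os
    video_source = os.environ.get('IMSDK_VIDEO_SOURCE', 'v4l2src device=/dev/video2')

    pipeline_str = (
        f'{video_source} ! identity name=frame_ready_webcam silent=false ! tee name=t '
        't. ! queue ! videoconvert ! video/x-raw,format=RGB ! '
        'appsink name=original emit-signals=true max-buffers=1 drop=true '
        't. ! queue ! qtivtransform ! video/x-raw,format=RGB,width=224,height=224 ! '
        'appsink name=frame emit-signals=true max-buffers=1 drop=true'
    )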

  1. Create a new file ex2.py and add:

  2. Run this:

    (The out/ directory has the last processed frames in both original and resized resolutions)

Alright! That gives you two outputs from a single pipeline. Now you know how to construct more complex applications in a single pipeline.

Ex 3: Run a neural network

Now that we have images streaming from the webcam in the correct resolution, let's add a neural network to the mix.

3.1: Neural network and compositing in Python

  1. First we'll do a 'normal' implementation, where we take the resized frame from the IM SDK pipeline and use LiteRT to run the model (on the NPU). Afterwards we'll draw the top prediction on the image and write it to disk. Create a new file ex3_from_python.py and add:

  2. Now run this application:

    Image classification model with an overlay

    Not bad at all, but let's see if we can do better...
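
As a reference for what happens inside a script like ex3_from_python.py: the LiteRT call itself boils down to something like the sketch below. The delegate library name and its options are assumptions based on Qualcomm's QNN LiteRT delegate, and the model filename is a placeholder; the repo's code is the authoritative version.

    # Hedged sketch: run a classification model with LiteRT, offloaded via the
    # QNN external delegate (library name and option keys are assumptions)
    import numpy as np
    from tflite_runtime.interpreter import Interpreter, load_delegate

    delegate = load_delegate('libQnnTFLiteDelegate.so', {'backend_type': 'htp'})
    interpreter = Interpreter(model_path='squeezenet.tflite',   # placeholder filename
                              experimental_delegates=[delegate])
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    def classify(frame: np.ndarray) -> int:
        """frame: the 224x224x3 RGB array pulled from the appsink."""
        data = np.expand_dims(frame, axis=0)
        if inp['dtype'] == np.float32:          # float models usually want 0..1 inputs
            data = data.astype(np.float32) / 255.0
        interpreter.set_tensor(inp['index'], data.astype(inp['dtype']))
        interpreter.invoke()
        scores = interpreter.get_tensor(out['index'])[0]
        return int(np.argmax(scores))           # index of the top prediction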

3.2: Running the neural network with IM SDK


Let's move the neural network inference to the IM SDK. You do this through three plugins:

  • qtimlvconverter - to convert the frame into an input tensor.

  • qtimltflite - to run a neural network (in LiteRT format). If you send these results over an appsink you'll get the exact same tensor back as earlier (you just didn't need to hit the CPU to invoke the inference engine).

  • An element like qtimlvclassification to interpret the output. This plugin is built for image classification use cases (like the SqueezeNet model we use) with a (1, n) output shape. It either spits out text (with the predictions) or an overlay (to draw onto the original image).

    • This element has a particular labels format (see below).
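
Before creating the file, here's a hedged sketch of how these three plugins typically chain together. Every property name below (model=, delegate=, external-delegate-path=, labels=, threshold=, results=) and both filenames are assumptions borrowed from similar Qualcomm sample pipelines, not taken from this tutorial's code; run gst-inspect-1.0 on each element to see what your IM SDK build actually expects (some builds also want a module= property on qtimlvclassification).

    # Illustrative shape of the ML chain -- verify every property with gst-inspect-1.0
    gst-launch-1.0 -v $IMSDK_VIDEO_SOURCE ! \
      qtivtransform ! "video/x-raw,format=RGB,width=224,height=224" ! \
      qtimlvconverter ! \
      qtimltflite model=squeezenet.tflite delegate=external \
          external-delegate-path=libQnnTFLiteDelegate.so ! \
      qtimlvclassification labels=squeezenet.labels threshold=40.0 results=3 ! \
      fakesink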

  1. Create a new file ex3_nn_imsdk.py and add:

  2. Now run this application:

    OK! The model now runs on the NPU inside the IM SDK pipeline. If you'd rather have the top 5 outputs (like we did in 3.1), you can tee the stream after the qtimltflite element and send the raw output tensor back to the application as well.


Overlay image: If you want to see the overlay image, rather than the text, see tutorial/_ex3_nn_imsdk_show_overlay.py.

3.3: Overlays

To mimic the output in 3.1 we also want to draw an overlay. Let's first demonstrate that with a static overlay image.

  1. Download a semi-transparent image (source):

  2. Create a new file ex3_overlay.py and add:

  3. Run this application:

    Static overlay onto webcam image
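
If you just want to see the compositing idea in isolation, stock GStreamer can blend a PNG on top of the video on the CPU with gdkpixbufoverlay. This is only an illustration (the filenames and offsets are placeholders); the tutorial's ex3_overlay.py does the compositing with IM SDK elements instead:

    # CPU-side illustration of overlay compositing (not the tutorial's approach):
    # blends overlay.png onto each frame and keeps the last few JPEGs in out/.
    # With the built-in qtiqmmfsrc source you may need a qtivtransform in front
    # of videoconvert to get frames into system memory.
    mkdir -p out
    gst-launch-1.0 -v $IMSDK_VIDEO_SOURCE ! \
      videoconvert ! \
      gdkpixbufoverlay location=overlay.png offset-x=20 offset-y=20 ! \
      videoconvert ! jpegenc ! \
      multifilesink location=out/overlay_%05d.jpg max-files=5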

3.4: Combining neural network with overlay

You've now seen how to run a neural network as part of an IM SDK pipeline; and you've seen how to draw overlays. Let's combine these into a single pipeline, where we overlay the prediction onto the image - all without ever touching the CPU.
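
As an orientation before the real example: pipelines like this usually tee the camera stream, run one branch through the ML chain from 3.2 so that qtimlvclassification produces an overlay, and blend both branches back together with a compositor element. The sketch below is an assumption-heavy reconstruction based on similar Qualcomm sample pipelines (the qtivcomposer element, the BGRA caps on the overlay branch, and all property names and filenames are assumptions); treat ex3_from_imsdk.py as the authoritative version.

    # Assumed structure only -- verify every element and property with gst-inspect-1.0
    mkdir -p out
    gst-launch-1.0 -v $IMSDK_VIDEO_SOURCE ! tee name=t \
      t. ! queue ! qtivcomposer name=mix ! qtivtransform ! "video/x-raw,format=RGB" ! \
           jpegenc ! multifilesink location=out/webcam_with_overlay_%05d.jpg max-files=5 \
      t. ! queue ! qtivtransform ! "video/x-raw,format=RGB,width=224,height=224" ! \
           qtimlvconverter ! \
           qtimltflite model=squeezenet.tflite delegate=external \
               external-delegate-path=libQnnTFLiteDelegate.so ! \
           qtimlvclassification labels=squeezenet.labels threshold=40.0 results=1 ! \
           "video/x-raw,format=BGRA,width=224,height=224" ! queue ! mix.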

  1. Create a new file ex3_from_imsdk.py and add:

  2. Run this application:

    Great! This whole pipeline now runs in the IM SDK. You can find the output image in out/webcam_with_overlay_imsdk.png:

    Image classification model with an overlay rendered by IM SDK


Troubleshooting

Pipeline does not yield anything

If you don't see any output, run your script or pipeline with the GST_DEBUG=3 environment variable set to get more detailed debug info.
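
For example (ex1.py here just stands in for whichever script or pipeline you're running):

    # Verbose GStreamer logging for a Python example...
    GST_DEBUG=3 python3 ex1.py

    # ...or for an ad-hoc test pipeline
    GST_DEBUG=3 gst-launch-1.0 -v $IMSDK_VIDEO_SOURCE ! fakesink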

QMMF Recorder StartCamera Failed / Failed to Open Camera

If you see an error like (using the built-in camera on the RB3 Gen 2 Vision Kit):

Run:
