Getting Started with Qualcomm Dragonwing
Welcome to the Qualcomm Dragonwing platform! This guide is for developers who are new to Dragonwing and want to understand where AI models can run on the device, the basics of NPUs, and more.
Overview
Dragonwing is a family of IoT SoCs that offer high performance, advanced connectivity, and efficient power use, paired with fast GPUs and NPUs, and running a variety of operating systems (Ubuntu, Qualcomm Linux, Android, and Windows). This makes them well suited to advanced edge AI workloads and high-performance computer vision applications.
Where can models run?
Your trained machine learning models can be executed on different processing units of the device:
| Unit | Description | When to use |
| --- | --- | --- |
| CPU | General-purpose processor. Easy to debug and great for lightweight models or pre/post-processing steps. | When latency isn't critical, or for testing small models. |
| GPU | Optimized for parallel operations and floating-point workloads. | When you need a balance between performance and precision. |
| NPU | Specialized accelerator for neural networks. Delivers the best performance-per-watt for AI inference. | For production deployment and real-time edge AI workloads. |
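In code, the choice of compute target usually comes down to which runtime and delegate you load the model with. Here is a minimal sketch of the CPU baseline using TensorFlow Lite; the model path and the tflite_runtime package are illustrative assumptions:

```python
# Minimal sketch: running a TensorFlow Lite model on the CPU.
# "model.tflite" is a placeholder path; tflite_runtime must be installed.
import numpy as np
import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter(model_path="model.tflite")  # CPU by default
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed dummy data shaped to the model's expected input.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]["index"]))
```

GPU and NPU execution typically load the same model through a hardware delegate instead; see the NPU delegate sketch later in this guide.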
What’s an NPU?
An NPU (Neural Processing Unit) is a specialized hardware block designed to accelerate neural network inference.
Instead of executing layer-by-layer operations on the CPU, the NPU performs matrix multiplications and convolutional operations in parallel, dramatically improving throughput and efficiency.
On Qualcomm Dragonwing, the NPU is part of the Qualcomm® AI Engine, which supports frameworks like TensorFlow Lite and ONNX.
Key benefits:
- Faster inference — optimized for deep learning layers like Conv2D, Depthwise, and Fully Connected.
- Lower power usage — ideal for battery-powered or always-on devices.
- Optimized toolchain — integrated with Qualcomm’s SDKs and runtime libraries.
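To actually route a model to the NPU, TensorFlow Lite loads a delegate supplied by Qualcomm's SDK. The sketch below assumes a delegate library named libQnnTFLiteDelegate.so and an option key of backend_type; both are illustrative and vary by platform and SDK version, so check your SDK documentation for the exact values:

```python
# Sketch: offloading TFLite inference to the NPU via a Qualcomm delegate.
# The delegate library name and options below are assumptions; they are
# platform/SDK dependent.
import tflite_runtime.interpreter as tflite

qnn_delegate = tflite.load_delegate(
    "libQnnTFLiteDelegate.so",        # assumed library name
    options={"backend_type": "htp"},  # assumed option key/value
)
interpreter = tflite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[qnn_delegate],
)
interpreter.allocate_tensors()
# Layers the delegate supports now run on the NPU; the rest fall back to CPU.
```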
How do NPUs differ from GPUs?
GPUs handle a wide range of parallel tasks, while NPUs are purpose-built for neural network inference. NPUs use dedicated hardware to perform matrix and convolution operations efficiently, delivering faster, lower-power AI performance ideal for edge devices.
What is "quantization"?
Quantization is the process of reducing the numerical precision of a model’s parameters, for example converting 32-bit floating-point values to 8-bit integers. This makes the model smaller and faster to run, with only a small trade-off in accuracy. On hardware like Dragonwing’s NPU, quantized models use less memory and power, enabling efficient, real-time AI inference at the edge.
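As a concrete (hedged) example, here is what post-training 8-bit quantization looks like with the TensorFlow Lite converter. The SavedModel directory and the calibration data are placeholders:

```python
# Sketch: post-training 8-bit quantization with the TFLite converter.
# "saved_model_dir" and calibration_images are placeholders.
import numpy as np
import tensorflow as tf

# Placeholder calibration data; use real input samples in practice.
calibration_images = np.random.rand(100, 224, 224, 3).astype("float32")

def representative_dataset():
    # Yield a few samples so the converter can calibrate the
    # float32 -> int8 value ranges.
    for image in calibration_images:
        yield [image[None, ...]]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8   # fully integer model
converter.inference_output_type = tf.int8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())
```

In practice, calibrate with a few hundred representative samples from your real input data rather than random values, or the quantized ranges will not match what the model sees at runtime.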
What happens when you run a model on the device?
When you run a model on the Dragonwing platform, your input data (like an image, audio clip, or sensor reading) first goes through preprocessing — scaling, filtering, or reshaping to match the model’s expected input. The processed data is then sent to the selected compute target (CPU, GPU, or NPU), where the model performs inference by running its trained neural network layers. The output is a set of predictions or scores, which your application can interpret as a classification, detection, or control signal — turning raw data into actionable insights directly on the device.
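A condensed sketch of that flow, assuming a 224x224 image classifier in TensorFlow Lite format (file names are placeholders):

```python
# Sketch of the on-device flow: preprocess -> inference -> raw scores.
import numpy as np
from PIL import Image
import tflite_runtime.interpreter as tflite

# 1. Preprocess: resize and scale the input to match the model.
image = Image.open("frame.jpg").resize((224, 224))           # placeholder file
input_data = np.asarray(image, dtype=np.float32)[None] / 255.0

# 2. Inference on the selected compute target.
interpreter = tflite.Interpreter(model_path="model.tflite")  # placeholder model
interpreter.allocate_tensors()
interpreter.set_tensor(interpreter.get_input_details()[0]["index"], input_data)
interpreter.invoke()

# 3. Raw output: one score per class, interpreted by the application.
scores = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])[0]
```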
How do you interpret the AI inference results?
The results of an AI model inference are a set of numeric scores that represent how confident the model is in each possible outcome. For example, a classification model might output probabilities for "cat", "dog", or "background." Your application then selects the highest score (or those above a certain threshold) to decide what the model detected. These values can then be displayed, logged, or used to trigger specific actions from within the application code, like turning on an alert when a certain object or sound is recognized.
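Continuing the sketch above, interpretation is often just an argmax plus a confidence threshold; the labels, scores, and threshold here are made up for illustration:

```python
# Sketch: mapping raw scores to a decision (continues the example above).
import numpy as np

labels = ["cat", "dog", "background"]   # placeholder label set
scores = np.array([0.08, 0.87, 0.05])   # example output scores

best = int(np.argmax(scores))
if scores[best] >= 0.6:                 # confidence threshold (tunable)
    print(f"Detected {labels[best]} ({scores[best]:.0%})")  # e.g. trigger an alert
else:
    print("No confident detection")
```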
What is QNN?
QNN, or the Qualcomm® Neural Network SDK, is a toolkit that enables the deployment of AI models on Qualcomm hardware accelerators, such as the NPU (Neural Processing Unit), HTP (Hexagon Tensor Processor), and GPU (Graphics Processing Unit).
It serves as an interface between your model and the underlying hardware, optimizing inference performance while reducing power consumption. In short, QNN allows developers to efficiently run quantized or floating-point models on supported Qualcomm platforms.
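One common way to reach QNN from application code is through ONNX Runtime's QNN execution provider. In this sketch, the provider options (such as backend_path pointing at libQnnHtp.so) and the input shape are assumptions that depend on your platform and SDK:

```python
# Sketch: running an ONNX model on Qualcomm hardware through
# ONNX Runtime's QNN execution provider.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder model path
    providers=[
        # backend_path is platform/SDK dependent (assumed value here).
        ("QNNExecutionProvider", {"backend_path": "libQnnHtp.so"}),
        "CPUExecutionProvider",  # fallback if QNN is unavailable
    ],
)

input_name = session.get_inputs()[0].name
dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)  # input shape assumed
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)
```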
Next steps
- Set up your Dragonwing development environment
- Learn more about the Qualcomm AI Engine SDK
Further resources
TODO: add links for Edge Impulse, AI tutorials, other topics, deep-dives