CUDA® and Android on Apalis TK1 in advanced camera systems

Wednesday, September 14, 2016

This is a guest blog post by Antmicro, a long-standing Toradex partner. Antmicro can support Toradex customers in integrating Toradex modules into their own applications, including product design with customized Toradex carrier boards or custom software. Antmicro has a wide spectrum of software and hardware capabilities; in this post they focus on their expertise in CUDA® and vision processing and on the new Toradex Apalis TK1 SoM, based on NVIDIA’s powerful Tegra® K1 SoC. This is the highest-performance Toradex module yet, offering deep learning and embedded vision capabilities enabled by the built-in, programmable GPU with 192 CUDA-enabled cores, connected to the quad-core Cortex-A15 CPU at 2.2 GHz.


Antmicro has a long history of successful projects with the Toradex modules based on the NVIDIA Tegra 2 and 3, many of which had a video processing and camera focus, involving drivers, GStreamer plugins and processing pipelines, and multi-camera systems. Some of this work was described in Antmicro blog notes such as the ones about the ADV7280 and Epson S2D13P04 chips. Related demos were also jointly exhibited by Toradex and Antmicro at Embedded World in 2014 and 2015.

Thus, by the time the Toradex Apalis TK1 module was released, there was already an established track record of successful vision/video projects with Toradex Tegra CPUs and appetites were high for a Tegra K1 CPU module coming from Toradex.

This is because TK1 is more than ‘just’ a successor to its older brothers. While both T20 and T30 included an embedded GPU, the one in TK1 is not only much bigger, with 192 CUDA cores, but also - more importantly - fully programmable. The TK1 makes general purpose high-performance parallel programming with CUDA (or OpenCL) available for the first time for embedded applications.

This capability is extremely interesting for various drone, defense and industrial video applications where local, low-latency processing of high volumes of data is the key differentiating factor. With local processing using CUDA, tasks such as deep neural network inference, 4K video processing or live object detection in video streams become viable in a small embedded chip for a broad range of possible use cases.

As an official NVIDIA Jetson ecosystem partner with a high interest in high-performance parallel processing for its own and its customers’ projects, Antmicro was deeply interested in pushing forward a powerful yet cost-effective platform like the Apalis TK1.

That is why Antmicro and Toradex worked closely on the new Apalis TK1 SoM even before its release. The work on interfacing various cameras with the TK1 in the run-up to the module’s launch resulted in a blog note about cameras and the Jetson platform. Antmicro also built the demonstrator for the module’s launch at Embedded World this year, the road sign-recognizing MOPED RC car, which is the first application presented in this post. At the same show, Antmicro, as Toradex’s partner for the Android operating system, also demonstrated Android running on the TK1 within days of receiving the first samples; Android on the TK1 is described further on in this article.


Among the ways to demonstrate what the Apalis TK1 module is capable of, autonomous vision systems are probably one of the most exciting, given their broad applications and growing market demand. Autonomy is perhaps best exemplified by the widely discussed automotive use case: self-driving cars have driven much of the development in autonomous machines in recent years, hence Toradex’s and Antmicro’s choice of the TK1 demonstrator for Embedded World.

The MOPED project is an open source research platform originally developed by Antmicro’s partner research institute, SICS, for the automotive industry in the form of a classy RC car. It was rebuilt from the ground up by Antmicro to serve the machine learning/vision use case and runs CUDA for on-board traffic sign recognition on the Apalis TK1 module as its main processing unit. MOPED, which was designed to reflect the typical setup present in many modern cars today, is capable of detecting and recognizing traffic signs thanks to a pre-trained Deep Neural Network using the computing power of the module’s 192 Kepler cores.


The setup includes signs of different shapes and sizes, which are displayed on an external monitor. The type of each sign and its position are randomized. The classification thread has to solve two problems: finding signs in the surrounding environment and recognizing their type. The latter is done in CUDA® using the cuDNN library and the Caffe framework.
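The two-stage structure described here (find candidate signs, then recognize each one) can be sketched in plain Python. This only illustrates the control flow and is not Antmicro's implementation: `find_candidates`, `classify_region`, `toy_scores`, the brightness threshold and the label names are all hypothetical stand-ins, and the toy scoring function takes the place of the Caffe/cuDNN network that performs the actual recognition on the GPU.

```python
import math

# Hypothetical label set; the demo's real sign classes are not listed in the post.
LABELS = ["stop", "yield", "speed_limit"]


def find_candidates(frame, threshold=128):
    """Stage 1 (stand-in detector): return one bounding box (x0, y0, x1, y1)
    around all pixels brighter than `threshold`, or an empty list."""
    hits = [(x, y)
            for y, row in enumerate(frame)
            for x, value in enumerate(row)
            if value > threshold]
    if not hits:
        return []
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return [(min(xs), min(ys), max(xs), max(ys))]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_region(frame, box, score_fn):
    """Stage 2: crop the region, score it, and pick the most likely label.
    `score_fn` stands in for the neural network forward pass."""
    x0, y0, x1, y1 = box
    crop = [row[x0:x1 + 1] for row in frame[y0:y1 + 1]]
    probs = softmax(score_fn(crop))
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


def recognize_signs(frame, score_fn):
    """Run both stages and return (box, label, confidence) per detection."""
    return [(box, *classify_region(frame, box, score_fn))
            for box in find_candidates(frame)]


def toy_scores(crop):
    """Toy scorer (assumption): favours the first class for larger crops."""
    area = len(crop) * len(crop[0])
    return [float(area), 1.0, 0.0]


# A tiny grayscale "frame" with one bright 2x2 region standing in for a sign.
frame = [
    [0, 0, 0, 0],
    [0, 200, 220, 0],
    [0, 210, 230, 0],
    [0, 0, 0, 0],
]
results = recognize_signs(frame, toy_scores)
```

Keeping detection and classification as separate functions mirrors the split in the text: the detector can stay cheap, while the expensive network only runs on the regions it returns.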


Since object detection and classification are computationally demanding, performing them in real time on embedded platforms used to be difficult or impossible. That changed when platforms like the NVIDIA Tegra K1 emerged, raising performance per watt to a level that allows such applications to be packed into a relatively small physical footprint. MOPED’s size is mostly owed to its role as a physical demonstrator: the processing power is provided by Antmicro’s very small TK1 development kit, which includes a custom Apalis TK1 baseboard and a dual camera board, so other, much smaller applications such as drones can easily be imagined (in fact, some autonomous vehicle/drone projects are currently under way for Antmicro’s customers).

MOPED has been on display around the world, most recently at XPONENTIAL (New Orleans, USA), a world-class event for autonomous/unmanned vehicles, where it was co-exhibited with NVIDIA. Further work is under way to bring the pedestrian detection use case to MOPED as well.


Another application that makes great use of the Apalis TK1 is the AXIOM Gamma 4K open camera. The AXIOM Gamma project was recognized by the EU H2020 funding scheme as a breakthrough for offering the world’s first fully modular and open source professional 4K camera. Thanks to its modularity and extensibility, AXIOM can serve as a computer vision platform for countless use cases, from filmmaking to industrial vision systems.


The AXIOM is primarily FPGA-based, but enables extensions by means of a backplane exposing the video data to additional modules. In order to provide a widely available computer vision programming paradigm (CUDA) as well as a good-looking and familiar UI (Android) for the camera, an Apalis TK1 camera add-on board was developed by Antmicro with the accompanying software. Currently the TK1 add-on runs Android 5.1 but it will be upgraded to 6.0 in the coming weeks.

The Apalis TK1 module in the camera can be used to process the video data coming in from the camera over MIPI-CSI2. The Android OS enables the use of standard Android camera applications to show and manipulate the video stream, and the CUDA capability makes it possible to further process the video data in real time.
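Real-time processing of this kind is typically data-parallel: the same small computation runs independently on every pixel, which is what allows a GPU to spread the work across many threads. The sketch below shows that pattern in plain Python on the CPU, using a grayscale conversion as a stand-in for whatever processing an application needs. It is an illustration under assumptions, not code from the AXIOM project: the function names and the frame layout (rows of RGB tuples) are hypothetical, and the luma weights are the common BT.601 coefficients. In a CUDA version, the body of `luma_at` would correspond to the work done by a single thread.

```python
def luma_at(frame, x, y):
    """Per-pixel work: convert one RGB pixel to an 8-bit luma value.
    Depends only on the pixel at (x, y), so every call is independent."""
    r, g, b = frame[y][x]
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def to_grayscale(frame):
    """Apply `luma_at` to every pixel. On a GPU these iterations would run
    concurrently; here they simply run one after another."""
    height = len(frame)
    width = len(frame[0]) if height else 0
    return [[luma_at(frame, x, y) for x in range(width)]
            for y in range(height)]


# A 2x2 test frame: red, green, blue and white pixels.
frame = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray = to_grayscale(frame)
```

Because no pixel depends on any other, the loop order is irrelevant, and that independence is the property that makes such filters a good fit for GPU execution.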


There are more Apalis TK1 projects under way at Antmicro, but we hope we have caught your attention with the above examples. This note is the first in a series of notes by Antmicro concerning interesting topics and work related to Toradex products.

The next post will discuss the differences between Android and Linux on Toradex platforms, what their strengths and weaknesses are, and when it is best to use each for your next product.

As a partner of Toradex, Antmicro provides development services for Toradex customers from designing hardware for custom, small form-factor electronics with specialized I/Os, through software development, to implementing GPGPU processing, CUDA, deep learning and embedded vision algorithms.

To learn more about Antmicro’s development services around the Toradex Apalis TK1 SoM, please enquire at

Author: Catherine Labedzka, Antmicro Ltd.
