In this series, we’ll build a Data-Centric pipeline using è to debug and fix a model trained with the NVIDIA TAO Toolkit.
Part 1.
Part 2.
Part 3.
🚨Spoiler alert! By the end of this series you will learn how è enabled us to increase performance in the worst-performing class from ~0.27 mAP to ~0.81 mAP, as illustrated in Figure 1.
Figure 1. Left (baseline model), Right (model with fixed dataset).
Table of Contents
- Our task and the challenges ahead
- Detour: The NVIDIA Ecosystem 101
- è API
- Defining a Data-Centric Pipeline
- Training with the NVIDIA TAO Toolkit
- What’s next
1. Our task and the challenges ahead
By the time you finish this series, you will be able to build your own Data-Centric pipeline (see Figure 2) using state-of-the-art tools developed by è to debug and fix a computer vision model trained with the NVIDIA TAO framework.

1.1 Task: Detecting cars, traffic lights, crosswalks
We will be working with an autonomous vehicles dataset. Imagine you are tasked with building a system capable of detecting various types of objects on the road (see Figure 3).
Figure 3. Autonomous vehicles dataset
Feel free to download a copy of the dataset in COCO format.
1.2 Challenges
The Data-Centric approach was proposed by ML researcher and entrepreneur Andrew Ng. You can find plenty of resources on it, including formal ones.
- The first challenge arises when translating theory into practice. Imagine you are given an ML system that isn’t performing as expected, and your task is to improve its performance. Where do you start? What do you do first? What are the common pitfalls you should avoid?
- Putting together a bunch of scripts in a Jupyter notebook is one thing; building a repeatable process that can help your team solve similar problems in the future is another.
- Additionally, what if you’re dealing with a model trained with the NVIDIA TAO framework? We love NVIDIA, but sometimes their documentation isn’t easy to follow.
Since we’ll be using the NVIDIA TAO framework, let’s start with the concepts of the NVIDIA ecosystem that matter most when working with the NVIDIA TAO Toolkit.
2. Detour: The NVIDIA Ecosystem 101
2.1 NVIDIA TAO Toolkit

At center stage is the NVIDIA TAO Toolkit (see Figure 4), a CLI and Jupyter Notebook-based solution that abstracts away the complexity of AI and deep learning frameworks.
- What can I do with this? You can fine-tune pretrained models with your own data.
- What’s the output of the NVIDIA TAO Toolkit? A trained model that can be deployed in DeepStream, Triton, or TensorRT (more on these terms below).
- Why might I want to use NVIDIA TAO? 1) Robust workflows for fine-tuning, 2) Optimization of models for edge devices, 3) Scalability across multiple GPUs, and 4) Integration with NVIDIA libraries such as CUDA and cuDNN.
- What system requirements do I need to use the NVIDIA TAO Toolkit? The NVIDIA TAO Toolkit documentation lists all the hardware and software requirements.
- How can I interact with the NVIDIA TAO Toolkit? The easiest way is to use a Jupyter Notebook that will contain all the code you need to train a model. These notebooks are contained in the NGC catalog (see below).
- What are some good resources to learn more about the NVIDIA TAO Toolkit? (1) The NVIDIA TAO Toolkit official documentation, (2) This website contains some of the best tutorials on the NVIDIA ecosystem, and (3) This video contains a clear guide to install the NVIDIA TAO Toolkit.
2.2 The NGC Catalog
The NGC Catalog is a GPU-optimized hub with performance-optimized frameworks, SDKs, and models to build Computer Vision and Speech AI applications.
To run an NVIDIA TAO pipeline, you need to create an account on the NGC Catalog. After that, you will be provided with an API key to download resources (e.g. Jupyter Notebooks) from the NGC Docker registry.
2.3 DeepStream, Triton and TensorRT
DeepStream is a streaming analytics toolkit for building end-to-end AI-powered solutions. It takes streaming data as input and uses AI and computer vision to generate insights from pixels.
NVIDIA TAO can be used to train and adapt models that can be integrated into applications built with DeepStream for real-time video analysis.
Triton Inference Server provides a scalable and efficient way to serve models in production environments. It’s akin to a backend where you can run your models and process HTTP requests with images.
The combination of NVIDIA TAO and Triton Inference Server provides an end-to-end workflow: NVIDIA TAO handles the training and adaptation, while Triton manages the serving and scaling of models in production.
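To make the "HTTP requests with images" idea a little more concrete, here is a minimal sketch that builds a request body in the KServe v2 inference protocol, which Triton's HTTP endpoint implements. The server address, the model name `tao_detector`, the input tensor name `input_1`, and the toy image shape are all assumptions for illustration; a real TAO model defines its own names and shapes in its Triton model configuration.

```python
import json
import urllib.request

# Assumptions for illustration only: server address, model name, and input name.
TRITON_URL = "http://localhost:8000"   # 8000 is Triton's default HTTP port
MODEL_NAME = "tao_detector"            # hypothetical model name
INPUT_NAME = "input_1"                 # hypothetical input tensor name


def build_infer_request(pixels, shape):
    """Build (but do not send) a KServe v2 inference request for one image batch."""
    expected = 1
    for dim in shape:
        expected *= dim
    if len(pixels) != expected:
        raise ValueError(f"expected {expected} values for shape {shape}, got {len(pixels)}")
    body = {
        "inputs": [
            {"name": INPUT_NAME, "shape": list(shape), "datatype": "FP32", "data": pixels}
        ]
    }
    return urllib.request.Request(
        url=f"{TRITON_URL}/v2/models/{MODEL_NAME}/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A toy 1x3x4x4 "image" (batch, channels, height, width) filled with zeros.
toy_shape = (1, 3, 4, 4)
toy_pixels = [0.0] * (1 * 3 * 4 * 4)
infer_request = build_infer_request(toy_pixels, toy_shape)

# Sending the request requires a running Triton server:
# with urllib.request.urlopen(infer_request) as response:
#     outputs = json.loads(response.read())["outputs"]
```

The request is only constructed here, not sent, so the snippet runs without a server. In practice you would likely preprocess a real image to the shape your model expects before filling in `data`.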
TensorRT is a high-performance deep learning inference library with the goal of optimizing models for efficient inference on NVIDIA GPUs. Some optimizations include: precision calibration, layer fusion, kernel auto-tuning, and dynamic tensor memory.
After a model has been trained with NVIDIA TAO, TensorRT can be employed to optimize the model for efficient inference.
2.4 NVIDIA Jetson
Jetson is a platform for AI at the edge. Once you have trained your model with NVIDIA TAO, you can make use of these power-efficient production modules and developer kits that offer an AI software stack for high-performance acceleration to power AI at the edge.
Many folks get confused here, so let’s break down the NVIDIA Jetson platform.
- Jetson Developer Kits are development platforms that include a Jetson module along with additional components such as carrier boards, power supplies, and peripherals. These kits are intended for developers to prototype, test, and develop applications before deploying them on end-user products. There are three types of Jetson Developer Kits, ordered here from most to least performant: one rated at up to 275 TOPS, one at 40 TOPS, and a lower-end third option.
- Jetson modules are compact, power-efficient compute modules that integrate the CPU, GPU, and other essential components into a single package. These modules are designed to be integrated into end-user products or systems to enable AI capabilities.
🤔 What does TOPS stand for? In the context of NVIDIA’s computing performance measurements, TOPS stands for “trillions of operations per second”.
3. è API
3.1 Create automated workflows to debug your data
Data is at the core of what we do at è: 👩‍🔬 we are pioneering the way humans interact with AI. To achieve that goal, we focus on creating optimal tooling for ML teams to debug AI systems at scale using a Data-Centric approach.
We’ve detailed how some of these tools operate in our earlier posts. For instance, you can use them to identify edge cases or swiftly find where your model is failing. Now, we’ll turn our attention to the è API, a set of endpoints that can assist you in building automated workflows to debug computer vision models.
We offer you a sneak peek into the endpoint for creating a dataset. In the upcoming posts of this series, we will take a closer look at the rest of them.
3.2 Create a dataset
To create a dataset on your è account, you make a POST request to the API’s dataset-creation endpoint.
You can specify: i) the type of computer vision task (e.g., object_detection) you are interested in, ii) the location of your AWS bucket where your data will be stored, and iii) the name of your dataset.
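As a rough sketch of what such a request could look like, the snippet below assembles a POST request carrying the three pieces of information listed above. The base URL, the `/datasets` path, the JSON field names, the bearer-token header, and the example values are all hypothetical placeholders, not the actual API; consult the API documentation for the real endpoint and schema.

```python
import json
import urllib.request

# Hypothetical placeholder: the real base URL is not given in this post.
API_BASE_URL = "https://api.example.com"


def build_create_dataset_request(api_key, dataset_name, task_type, bucket_uri):
    """Build (but do not send) a POST request that would create a dataset.

    The endpoint path, field names, and auth header are assumptions.
    """
    payload = {
        "name": dataset_name,           # iii) the name of your dataset
        "task_type": task_type,         # i) the computer vision task
        "images_location": bucket_uri,  # ii) where your data is stored
    }
    return urllib.request.Request(
        url=f"{API_BASE_URL}/datasets",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


create_request = build_create_dataset_request(
    api_key="YOUR_API_KEY",
    dataset_name="autonomous_vehicles",
    task_type="object_detection",
    bucket_uri="s3://your-bucket/autonomous_vehicles/images",
)

# Sending the request requires valid credentials and the real endpoint:
# with urllib.request.urlopen(create_request) as response:
#     print(json.loads(response.read()))
```

Keeping request construction separate from sending makes it easy to inspect the payload before anything reaches your account.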
4. Defining a Data-Centric pipeline

Figure 5 describes the Data-Centric Pipeline we will build.
- Data ingestion. In this stage, we use the è API to ingest our dataset on the è platform.
- mAP analysis. Once your data is on the platform, we conduct a performance analysis to explore the best and worst-performing classes.
- Data imbalance. We identify potential classes with imbalance issues.
- Class selection. Based on the previous analysis, we select the classes with the most room for improvement (i.e., often the worst-performing classes).
- Failure and error analysis. We use è to identify errors, biases and failures in the selected classes.
- Data slice analysis. Once a failure is found, we conduct a data slice analysis to identify the potential root cause of the failure.
- Fixing dataset. We use one of several methods to fix the data failures we found.
- Model performance comparison. We systematically compare how our efforts to build a fixed dataset impact model performance 🏁.
In the rest of the series, we will walk you through all the steps required to build this end-to-end pipeline for your NVIDIA TAO model.
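To give a flavor of the data imbalance and class selection stages, here is a minimal sketch that counts annotations per class in a COCO-style dictionary and then picks the lowest-scoring class. The function names, the tie-breaking rule, and the toy numbers are our own illustrative assumptions; they are not results from this series and not how the platform computes these steps.

```python
from collections import Counter


def class_counts(coco):
    """Count annotated objects per class name in a COCO-style dict."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    return Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])


def select_classes(per_class_ap, counts, max_classes=1):
    """Return the lowest-AP classes, breaking ties by fewer annotations."""
    ranked = sorted(
        per_class_ap,
        key=lambda name: (per_class_ap[name], counts.get(name, 0)),
    )
    return ranked[:max_classes]


# Toy data, invented purely to exercise the functions.
toy_coco = {
    "categories": [{"id": 1, "name": "car"}, {"id": 2, "name": "crosswalk"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1},
        {"id": 2, "image_id": 1, "category_id": 1},
        {"id": 3, "image_id": 2, "category_id": 1},
        {"id": 4, "image_id": 2, "category_id": 2},
    ],
}
toy_ap = {"car": 0.9, "crosswalk": 0.5}  # illustrative per-class AP values

counts = class_counts(toy_coco)
worst = select_classes(toy_ap, counts)
print(counts)
print(worst)
```

A class that is both rare and low-scoring, like `crosswalk` in the toy data, is a natural first candidate for the failure analysis stage.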
5. Training with the NVIDIA TAO Toolkit
This step assumes you already have a trained model with the NVIDIA TAO Toolkit.
Setting up the NVIDIA TAO Toolkit used to be a nightmare! We even wrote about the problems we ran into during the TAO installation (e.g., broken Docker images, incompatible dependencies).
It turns out that since late 2023 you can run the NVIDIA TAO Toolkit in a notebook 😉
If you don’t have a model trained with the NVIDIA TAO Toolkit, please run the notebook linked above. Once you have a trained model, download the following:
- Dataset: images and annotations (in COCO format).
- Model: training set and test set predictions (in COCO format).
💡 Hint: A short primer on the COCO format can help you get up to speed.
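For orientation, here is a minimal sketch of the two COCO structures involved: a ground-truth file (images, annotations, categories) and a flat list of predictions with confidence scores. The file name, sizes, and box values are made up for illustration, and the `find_invalid_predictions` helper is a hypothetical sanity check of our own, not part of any toolkit.

```python
# Ground truth: one dict with "images", "annotations", and "categories".
ground_truth = {
    "images": [{"id": 1, "file_name": "frame_0001.jpg", "width": 1280, "height": 720}],
    "categories": [{"id": 1, "name": "car"}],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [100.0, 200.0, 50.0, 40.0],  # [x, y, width, height] in pixels
            "area": 2000.0,
            "iscrowd": 0,
        }
    ],
}

# Predictions: a flat list, one entry per detected box, each with a score.
predictions = [
    {"image_id": 1, "category_id": 1, "bbox": [98.0, 205.0, 52.0, 38.0], "score": 0.87},
]


def find_invalid_predictions(ground_truth, predictions):
    """Return predictions whose image, category, box, or score is not usable."""
    image_ids = {img["id"] for img in ground_truth["images"]}
    category_ids = {cat["id"] for cat in ground_truth["categories"]}
    invalid = []
    for pred in predictions:
        _, _, width, height = pred["bbox"]
        is_valid = (
            pred["image_id"] in image_ids
            and pred["category_id"] in category_ids
            and width > 0
            and height > 0
            and 0.0 <= pred["score"] <= 1.0
        )
        if not is_valid:
            invalid.append(pred)
    return invalid


invalid = find_invalid_predictions(ground_truth, predictions)
print(f"{len(invalid)} invalid predictions")
```

Running a check like this on both the training-set and test-set predictions before uploading them can catch mismatched IDs early.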
6. What’s next
We have defined an ambitious goal for this series: build a Data-Centric pipeline to debug and fix a model trained with the NVIDIA TAO Toolkit.
In this first post, we laid the ground for our task and discussed the challenges we’ll face. We introduced the dataset we’ll use during the series. We demystified some of the many moving parts of the NVIDIA ecosystem, and clarified how they relate to the NVIDIA TAO Toolkit. At a high level, we established the kind of Data-Centric pipeline we’ll build with the help of the è platform. We used the NVIDIA TAO Toolkit to train a model that we’ll utilize in the following posts of this series.
Stay tuned for Part 2! 💙
Authors: Jose Gabriel Islas Montero, Dmitry Kazhdan.
