Create Conversational AI Applications With NVIDIA Jarvis
I give you an overview of the NVIDIA Jarvis framework for conversational AI and show you how to get started with it.
NVIDIA Jarvis is an end-to-end application framework for multimodal conversational AI services that delivers real-time performance on GPUs.
In this tutorial, I give you an overview of the framework and show you how to get started with it. We also look at how to use the Python API to connect to the different services.
- Official Resources: https://nvda.ws/3afJXJW
What does the framework include?
Jarvis is a fully accelerated application framework for building multimodal conversational AI services that use an end-to-end deep learning pipeline. It is optimized for inference to offer end-to-end real-time services that run in less than 300 milliseconds (ms) and delivers 7x higher throughput on GPUs compared with CPUs.
Additionally, it includes pre-trained conversational AI models and tools to easily fine-tune them, so that they achieve a deeper understanding of a specific context.
Different services
Jarvis offers multiple services that can be combined to build various types of applications, such as:
- Automatic speech recognition (ASR)
- Natural language understanding (NLU)
- Text-to-speech (TTS)
- Domain-specific fulfillment services
With these services, we can fuse speech and vision to offer accurate and natural interactions in virtual assistants, chatbots, and other conversational AI applications. To take full advantage of the computational power of GPUs, Jarvis is based on Triton, which serves the neural networks and ensemble pipelines that run efficiently with TensorRT.
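To make the idea of combining services more concrete, here is a minimal sketch of one assistant turn that chains ASR, NLU, a fulfillment step, and TTS. Everything in it is hypothetical: `run_turn`, `AssistantTurn`, and the `fake_*` functions are toy stand-ins written for illustration and are not part of the Jarvis API. In a real application, each stage would be a call to the corresponding Jarvis service.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AssistantTurn:
    """Result of one request/response turn (hypothetical container)."""
    transcript: str
    intent: str
    reply_text: str
    reply_audio: bytes


def run_turn(
    audio: bytes,
    asr: Callable[[bytes], str],
    nlu: Callable[[str], str],
    fulfill: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> AssistantTurn:
    """Chain the four stages: speech -> text -> intent -> reply -> speech."""
    transcript = asr(audio)
    intent = nlu(transcript)
    reply_text = fulfill(intent)
    reply_audio = tts(reply_text)
    return AssistantTurn(transcript, intent, reply_text, reply_audio)


# Toy stand-ins so the sketch runs without a server (NOT real Jarvis calls).
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are already text


def fake_nlu(text: str) -> str:
    return "get_weather" if "weather" in text.lower() else "unknown"


def fake_fulfill(intent: str) -> str:
    replies = {"get_weather": "It is sunny today."}
    return replies.get(intent, "Sorry, I did not understand that.")


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the text bytes are audio


turn = run_turn(b"What is the weather like?", fake_asr, fake_nlu, fake_fulfill, fake_tts)
print(turn.intent, "->", turn.reply_text)
```

Passing the stages in as plain callables keeps the orchestration logic independent of how each service is reached, so a stand-in can later be swapped for a real gRPC client call.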
The services that Jarvis provides are exposed through API operations that are accessible via gRPC endpoints, which hide all the complexity from application developers. The API server can be run in a Docker container and accessed from the client with simple gRPC calls.
For example, the following code shows a simple Python script that connects to the server and uses the TTS service with a simple request-response mechanism:
import numpy as np
import grpc

import src.jarvis_proto.jarvis_tts_pb2 as jtts
import src.jarvis_proto.jarvis_tts_pb2_grpc as jtts_srv
import src.jarvis_proto.audio_pb2 as ja

# Create a gRPC channel to the Jarvis endpoint:
channel = grpc.insecure_channel('localhost:50051')
jarvis_tts = jtts_srv.JarvisTTSStub(channel)

# Create a TTS request:
req = jtts.SynthesizeSpeechRequest()
req.text = "We know what we are, but not what we may be?"
req.language_code = "en-US"
req.encoding = ja.AudioEncoding.LINEAR_PCM
req.sample_rate_hz = 22050
req.voice_name = "ljspeech"

# Send request to the service and get the response:
resp = jarvis_tts.Synthesize(req)
audio_samples = np.frombuffer(resp.audio, dtype=np.float32)
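The script above ends with the raw samples in memory. As a possible next step, here is a minimal sketch of how such samples could be saved as a WAV file using only the Python standard library. It assumes the samples are mono floats in the range [-1.0, 1.0], which the example above does not confirm, and it uses a generated sine tone as a stand-in for `audio_samples` so it runs without a server. The helper `write_wav` and the output file name are my own illustrative choices.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav(samples, path, sample_rate_hz=22050):
    """Write mono float samples (assumed to lie in [-1.0, 1.0]) as 16-bit PCM WAV."""
    # Clip to the assumed range, then scale to signed 16-bit integers.
    pcm = [int(round(max(-1.0, min(1.0, float(s))) * 32767)) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate_hz)
        # WAV data is little-endian; "<h" packs each value as a little-endian int16.
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Stand-in for `audio_samples`: one second of a 440 Hz sine tone.
demo_samples = [0.5 * math.sin(2 * math.pi * 440.0 * n / 22050) for n in range(22050)]
demo_path = os.path.join(tempfile.gettempdir(), "jarvis_tts_demo.wav")
write_wav(demo_samples, demo_path)
print("Wrote", demo_path)
```

With a running server, you would pass `audio_samples` and the `sample_rate_hz` value from the request instead of the generated tone.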
Create State-of-the-Art Deep Learning Models
The framework offers state-of-the-art pre-trained models that have been trained for more than 100,000 hours on NVIDIA DGX™ systems for speech, language understanding, and vision tasks. The pre-trained models and scripts used in Jarvis are freely available on NGC™.
The models can then be fine-tuned either with the Transfer Learning Toolkit (TLT), a zero-coding approach, or with NeMo, an open-source toolkit built on top of PyTorch.
Easy Deployment
Jarvis offers an end-to-end pipeline that includes easy deployment in the cloud or at the edge. A single command is enough to deploy the entire Jarvis application or individual services through Helm charts on Kubernetes clusters.
Getting Started
To get started, I recommend following the official introductory resources and the quick start guide. Make sure to install all prerequisites first, and check the Support Matrix for the list of supported hardware and software requirements.
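Once the server from the quick start guide is running, it can help to check that its port is reachable before sending requests. The following is a minimal sketch using only the standard library. It assumes the server listens on `localhost:50051`, as in the TTS example above, and the helper name `is_endpoint_reachable` is my own. Note that it only tests whether a TCP connection can be opened; it does not verify that the gRPC services behind the port are healthy.

```python
import socket


def is_endpoint_reachable(host="localhost", port=50051, timeout_s=2.0):
    """Return True if a TCP connection to host:port can be opened within timeout_s."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    if is_endpoint_reachable():
        print("Port 50051 is reachable.")
    else:
        print("Nothing is listening on port 50051. Is the server container running?")
```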