Building an AI Digital Human – Showing How to Capture a Customer’s Attention
The Solution Supermicro Built in 2 Weeks for the 2025 NRF Big Show Event
As we prepared our Supermicro booth design and content for the National Retail Federation’s (NRF) Big Show in January 2025, we knew we wanted to demonstrate a digital human capable of interpreting questions from booth visitors and providing detailed responses. Importantly, we knew the demonstration needed to be hosted locally to ensure that a conversation between a human and a digital human would stay within the normal latency range (50–100 ms) expected when talking to another person.
This article delves into the back story of why we chose to demonstrate a digital human, the challenges we faced and overcame in only two weeks, and the reception we got from people who interacted with our digital human.
The Story We Wanted To Tell
Given our decision to showcase a digital human application at the NRF Big Show, we wanted to pick a compelling story in the context of a retail setting. We also wanted to pick a real-world example that would show the inherent value of a digital human. The use case we decided to show was a digital human that would respond to natural language questions about menu options at a restaurant.
While this use case may appear simple, it gave us the potential to tell a story to three different audiences who could come by our booth. These audiences can be described as follows:
- A restaurant manager/owner. Our goal for this persona is to show them an application that could easily be deployed and provide value for their business. A positive digital human experience would reduce the effort of keeping employees up to date with changes to menu choices and options. This person knows that employee churn is expensive, so we wanted to show that a digital human can enrich their customers’ experience at a price point that saves the business money.
- A technologist. This person would want to know how we created our digital human, what our challenges were, and how we overcame them. We aim to show the reality of creating a digital human and why it is a replicable opportunity.
- A consumer. This would be someone who is a consumer of restaurant services, so virtually everyone who sees the demonstration would meet this definition. This person would judge the digital human on its ease of use, the accuracy of its answers, and ultimately the believability of its interactions. Our goal is to show how simple it would be to use a digital human in their everyday lives.
Not Our First AI Digital Human Experience
Our desire to build a digital human demonstration for the 2025 NRF Big Show was not Supermicro’s first experience with creating a digital human. We previously worked with two ISV partners to create a digital human capable of answering product recommendation questions for Supermicro’s product portfolio. In fact, this was one of two digital human demonstrations we showed at the 2024 Mobile World Congress event in Barcelona, Spain.
While that first experience was a positive one, especially seeing people’s real-time interactions with our digital human, it took a lot of coordination work, especially with three parties involved. This time around, we decided to do all the work ourselves. Looking back at the time and energy the build took, it was definitely more manageable because we had the resources and the ability to make rapid decisions within our own organization.
What Did We Need To Do To Create Our Digital Human
We started by bringing together a team of our in-house AI experts and our retail market experts to decide what use case we wanted to showcase and what messaging we wanted to convey as discussed above.
Next, we documented the requirements and created a proposed timeline to develop, test, and troubleshoot a prototype, planning to repeat one or more of these cycles as needed to finalize our digital human. We also needed to leave enough time to ship our edge servers, which hosted the LLM and the digital human application, to the NRF event and set them up at our booth.
We were off and running to make it happen once we had our requirements and a timeline that we thought we could meet.
What We Used To Build Our Digital Human
One of our earliest decisions was to leverage our strong relationship with NVIDIA, a market leader for AI development and implementation tools. We built our digital human based on NVIDIA’s Digital Human Blueprint. This accelerated our time to value (in our case, a working demonstration) while implementing NVIDIA’s best practices. Furthermore, it allowed us to focus on the customization that differentiates each demonstration, such as the avatar characteristics and LLM customization.
NVIDIA’s development tools made it easy for us to customize the LLM portion of the solution using retrieval-augmented generation (RAG). RAG is an AI technique that combines a retrieval model with a generative model. It retrieves related information from a database or document set and uses it to generate more accurate and contextually relevant responses. In our project, we connected a RAG pipeline to our restaurant’s specific information to have the latest details on their food and drink options, pricing, hours of operations, and other factors, such as identified weekly specials. This ensured that our data was up to date and our digital human was well “trained”.
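To make the idea concrete, here is a hedged sketch of what using a RAG service can look like at the API level: a document with the restaurant’s menu is ingested into the retrieval store, and a question is then answered against it. The host, port, endpoint paths, and JSON fields below are illustrative placeholders only, not the actual interface of the RAG pipeline we deployed.
# Hypothetical example of a RAG workflow over HTTP; the port, paths, and
# payload fields are illustrative placeholders, not a real API reference.
# 1. Ingest the restaurant's menu so its contents can be retrieved later.
curl -X POST http://localhost:8081/documents \
     -F "file=@menu_and_weekly_specials.pdf"
# 2. Ask a question; the service retrieves the relevant menu passages and
#    passes them to the LLM so the answer reflects the latest data.
curl -X POST http://localhost:8081/generate \
     -H "Content-Type: application/json" \
     -d '{"messages": [{"role": "user", "content": "Which entrees are on the weekly specials?"}]}'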
Please read Appendix A for the technical details on building our digital human.
Meeting the Requirements For an Edge Server to Enable a Digital Human
One of the advantages we had in developing our digital human was a pre-existing portfolio of edge servers designed to support the requirements of edge AI applications. We had the following requirements for the server that was going to host/run our digital human demonstration:
- The GPU processing pipeline for the LLM and RAG required a system with two NVIDIA L40S GPUs.
- The front-end system needed two NVIDIA L40S GPUs plus CPU compute to support the user experience: converting speech to text (human to machine), converting text to speech (machine to human), animating and rendering the avatar, and synchronizing the avatar’s lips with its speech.
- Each system needed to store application containers and data.
- The back-end system needed to be able to host the database to support the RAG pipeline.
- A typical retail environment wouldn’t have full-size rack space.
To meet these requirements, we selected the Supermicro SYS-221HE-FTNR system, which is part of our Hyper-E family of servers. We chose this server specifically because it is:
- A short-depth system optimized for edge deployments where data center racks are not available
- A dual-processor system capable of holding the required types and quantities of GPUs for AI acceleration
A Successful Conclusion
Besides the countless visitors who came to our booth, we also welcomed five tour groups hosted by NRF. These groups typically consisted of a dozen or more show attendees looking for an immersive experience. Supermicro was selected to be a stop on this guided tour because of our demonstration of a digital human experience.
The demonstration resonated very well with visitors. Many not only saw the value of the use case we were demonstrating, but also began to brainstorm on how the system could be adapted to their business needs, leading to several great conversations and subsequent post-show meetings.
Appendix A – Technical Details Of Building Our Digital Human
System Setup
The digital human consists of two systems: a front-end system and a back-end system. The front-end system is responsible for rendering the digital human, while the back-end system is responsible for running the RAG pipeline and hosting the LLM models.
Start by obtaining an NVIDIA API key, which is needed to download the models and container images. You can obtain the key from the NVIDIA API Catalog: a Develop with this Blueprint pop-up should appear, and if it doesn’t, click the Develop with this Blueprint button. Then click the Generate API Key button and save the key.
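The deployment scripts later in this appendix read this key from environment variables. If you need to pull container images manually once Docker is installed, the key is typically used as in the sketch below; the $oauthtoken username is literal, and the placeholder value is an assumption you replace with your own key.
# Log in to NVIDIA's NGC container registry with the API key
# ($oauthtoken is a literal username, not a placeholder).
export NGC_CLI_API_KEY="<your-NVIDIA-personal-API-key>"
echo "$NGC_CLI_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin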
Front-end System Setup
- Install Ubuntu 24.04 and ensure all packages are up to date.
- Install OpenSSH server.
- Make sure that the user has sudo privileges and can run sudo without a password.
Back-end System Setup
- Install Ubuntu 24.04 and make sure that all packages are up to date.
- Install OpenSSH server.
- Make sure that the user has sudo privileges and can run sudo without a password.
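On both the front-end and back-end systems, one common way to grant the user passwordless sudo is a drop-in sudoers file, as in this minimal sketch; <user> is a placeholder for the actual username, and you should adjust the approach to your own security policies.
# Allow <user> to run sudo without a password via a sudoers drop-in file.
echo "<user> ALL=(ALL) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/99-<user>
sudo chmod 0440 /etc/sudoers.d/99-<user>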
Generate an SSH key pair for the user on the back-end system and copy the public key to the front-end system. This will allow the back-end system to connect to the front-end system without a password. Replace your_email@example.com with your actual email address:
ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
Copy the public key to the front-end system, replacing <user> with your username and <frontend_ip_address> with the actual IP address of the front-end system:
ssh-copy-id <user>@<frontend_ip_address>
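Before moving on, it is worth confirming that key-based login works. Run this from the back-end system; it should print the front-end system’s hostname without prompting for a password:
# Should return the front-end hostname without a password prompt.
ssh <user>@<frontend_ip_address> hostname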
Provision the Front-end System
On the back-end system, download the deployment script by cloning the ACE GitHub repository.
git clone https://github.com/NVIDIA/ACE.git
Navigate to the baremetal one-click script directory:
cd ACE/workflows/tokkio/4.1/scripts/one-click/baremetal
Configure the deployment script by setting the environment variables. Copy the example my-config.env, then edit it and replace the placeholders with the actual values for this setup:
cp config-template-examples/llm-ov-3d-cotrun-1x-stream/my-config.env my-config.env
nano my-config.env
export OPENAI_API_KEY="<replace-with-openai-api-key>"
export NGC_CLI_API_KEY="<replace-with-your-NVIDIA-personal-API-key>"
export NVIDIA_API_KEY="<replace-with-your-NVIDIA-personal-API-key>"
export APP_HOST_IPV4_ADDR="<replace-with-the-ip-address-of-front-end-system>"
export APP_HOST_SSH_USER="<replace-with-the-username-of-front-end-system>"
export COTURN_HOST_IPV4_ADDR="<replace-with-the-ip-address-of-front-end-system>"
export COTURN_HOST_SSH_USER="<replace-with-the-username-of-front-end-system>"
Copy the config template file and edit the values to match this setup.
cp config-template-examples/llm-ov-3d-cotrun-1x-stream/config-template.yml my-config-template.yml
Run the deployment script to provision the front-end system. This will take a while to complete, so be patient.
source my-config.env
./envbuild.sh install --component all --config-file ./my-config-template.yml
Verify that the front-end system is up and running by running the following command on the front-end system:
kubectl get pods -n app
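The pods in the app namespace should all reach the Running state. If you would rather block until everything is ready, a standard kubectl wait can be used; the 30-minute timeout below is just an example value.
# Optionally wait until every pod in the app namespace reports Ready.
kubectl wait --for=condition=Ready pod --all -n app --timeout=30m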
Provision the RAG Pipeline and LLM Models
On the back-end system, do the following steps to provision the RAG pipeline and LLM models:
- Install Docker and Docker Compose.
- Install the latest NVIDIA drivers.
- Install and configure the NVIDIA Container Toolkit.
- Follow the instructions at https://github.com/NVIDIA-AI-Blueprints/rag for deploying using Docker Compose, replacing the NIM used with the Llama 3.1 8B one.
- On the front-end system, follow the instructions at https://docs.nvidia.com/ace/tokkio/4.1/customization/customize-reference-workflows.html#rag-endpointcustomization to customize the Digital Human’s RAG endpoint.
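As a quick sanity check of the back-end prerequisites above, the following commands are a minimal sketch, assuming the NVIDIA Container Toolkit has already been installed per NVIDIA’s installation guide: they register the NVIDIA runtime with Docker and confirm that a container can see the system’s GPUs.
# Register the NVIDIA runtime with Docker and restart the Docker daemon.
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Verify that a container can see the GPUs.
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi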