What latency means #
Latency is the time that elapses between the end of the Player’s turn and the start of the avatar’s response.
In a real-time voice simulation, latency does not depend on a single element. Before the avatar starts speaking, the system must:
- detect that the Player has finished speaking;
- send the content to the AI model;
- generate the avatar’s response;
- produce the voice audio;
- synchronize the response with the Web or VR experience;
- display or animate the avatar consistently.
For this reason, the avatar’s response is not instantaneous, as it would be in a live human conversation; it is generated in real time through an AI pipeline.
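The steps listed above can be pictured as a chain whose durations add up. The following is a minimal sketch, not the platform’s implementation: the `Stage` class, the stage names and every duration are hypothetical placeholders, and it assumes the stages run strictly one after another, whereas a real pipeline may overlap or stream some of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    seconds: float  # hypothetical duration, not a platform measurement


# Placeholder values chosen only to illustrate the idea.
PIPELINE = [
    Stage("detect end of speech", 0.4),
    Stage("send content to the AI model", 0.1),
    Stage("generate the response", 0.9),
    Stage("produce the voice audio", 0.5),
    Stage("synchronize with Web or VR", 0.2),
    Stage("display or animate the avatar", 0.1),
]


def total_latency(stages):
    """Sum the stage durations, assuming they run in sequence."""
    return sum(stage.seconds for stage in stages)


def slowest_stage(stages):
    """Return the stage that contributes the most to the total."""
    return max(stages, key=lambda stage: stage.seconds)


if __name__ == "__main__":
    print(f"total: {total_latency(PIPELINE):.1f} s")
    print(f"slowest: {slowest_stage(PIPELINE).name}")
```

Seen this way, no single step accounts for the whole wait, and speeding up one stage only shortens the total by that stage’s share.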
Expected response time #
On the current platform, the indicative average time between the moment the Player finishes speaking and the avatar’s first word is:
- Web: approximately 2.8 seconds
- VR: approximately 2 seconds
These values represent a realistic operational baseline of the current experience.
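To make the definition of these averages concrete, here is a small sketch of how a perceived response time could be computed from timestamps. The function names and the sample timestamps are illustrative assumptions; they are not platform APIs or measured data.

```python
from statistics import mean


def perceived_latency(player_end: float, avatar_start: float) -> float:
    """Seconds between the end of the Player's turn and the avatar's first word."""
    if avatar_start < player_end:
        raise ValueError("the avatar cannot start before the Player has finished")
    return avatar_start - player_end


def average_latency(turns) -> float:
    """Average perceived latency over (player_end, avatar_start) pairs."""
    return mean(perceived_latency(end, start) for end, start in turns)


if __name__ == "__main__":
    # Hypothetical timestamps in seconds since the start of a session.
    sample_turns = [(10.0, 12.5), (20.0, 23.0), (30.0, 32.75)]
    print(f"average: {average_latency(sample_turns):.2f} s")
```

Because the figure is an average, individual turns within the same session can fall above or below it.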
Latency can vary based on several factors, including:
- Internet connection quality;
- device used;
- browser;
- Web or VR channel;
- simulation language;
- selected voice;
- system load;
- number of active sessions at the same time;
- any additional components such as avatars, visual synchronization or lipsync.
Why there is a brief wait before the response #
The brief wait before the avatar’s response allows the system to generate a coherent, contextual and vocalized response.
This is not simply pre-recorded audio.
The avatar listens to the Player, interprets the conversation content and produces a response consistent with:
- simulation scenario;
- Avatar Persona;
- configured language;
- access mode;
- training objective;
- conversation state;
- any emotional or behavioral signals.
This processing requires a few seconds.
Latency and simulation quality #
A slightly higher latency can be acceptable when it allows the avatar to maintain greater coherence, quality and safety in the response.
The platform’s goal is not to produce instantaneous responses at any cost, but to offer an interaction that is:
- realistic;
- stable;
- educational;
- consistent with the avatar’s role;
- safe for the Player;
- useful for subsequent analysis.
Latency should therefore be understood as part of the real-time AI experience, not as a technical error.
What simultaneous sessions are #
Simultaneous sessions indicate how many simulations can run at the same time.
This concept is different from the total number of users or the total number of simulations created.
For example, a tenant can have:
- many registered Players;
- many available simulations;
- a certain number of active runtimes;
- but only a defined number of sessions usable simultaneously.
Simultaneous sessions therefore represent the system’s operational capacity in real time.
Practical example #
If a tenant has 10 simultaneous sessions available, up to 10 Players can be inside an active simulation at the same time.
If an eleventh Player tries to access while all sessions are occupied, the system may temporarily prevent access or manage it according to the rules provided by the platform.
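The example above can be sketched as a simple capacity check. This is an illustration under stated assumptions, not the platform’s logic: the `SessionPool` class and its methods are hypothetical, and refusing the extra Player is only one possible policy, since the platform may handle the case according to its own rules.

```python
class SessionPool:
    """Tracks which Players currently hold one of a tenant's simultaneous sessions."""

    def __init__(self, max_sessions: int):
        if max_sessions < 1:
            raise ValueError("max_sessions must be at least 1")
        self.max_sessions = max_sessions
        self._active = set()

    @property
    def active_count(self) -> int:
        return len(self._active)

    def try_start(self, player_id: str) -> bool:
        """Return True if the Player holds a session after the call, False if all are occupied."""
        if player_id in self._active:
            return True  # already inside a simulation
        if len(self._active) >= self.max_sessions:
            return False  # assumed policy: temporarily prevent access
        self._active.add(player_id)
        return True

    def end(self, player_id: str) -> None:
        """Free the session so that another Player can use it."""
        self._active.discard(player_id)


if __name__ == "__main__":
    pool = SessionPool(max_sessions=10)
    for i in range(1, 11):
        pool.try_start(f"player-{i}")
    print(pool.try_start("player-11"))  # all 10 sessions are occupied
    pool.end("player-3")
    print(pool.try_start("player-11"))  # a session has been freed
```

In this sketch the limit applies to sessions in progress, not to the number of Players who exist: as soon as one Player leaves, the eleventh can enter.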
Difference between users, runtimes and simultaneous sessions #
It is important to distinguish three concepts.
Users #
These are the people registered on the platform.
Examples:
- Tenant Admin;
- Manager;
- Player.
Active runtimes #
These are the simulations made available to Players, each with specific settings for language, channel, voice, avatar, attempts and availability.
Simultaneous sessions #
These are the simulations that can be run at the same time.
A tenant can therefore have many users and many active runtimes, but a more limited number of simultaneous sessions available.
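One way to keep the three concepts apart is to model them as separate fields of a tenant. The sketch below is purely illustrative: the `Tenant` and `Runtime` classes, their field names and the sample values are assumptions and do not reflect the platform’s actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Runtime:
    """A simulation made available to Players with specific settings."""
    simulation: str
    language: str
    channel: str  # "web" or "vr"


@dataclass
class Tenant:
    users: list = field(default_factory=list)      # people registered on the platform
    runtimes: list = field(default_factory=list)   # active runtimes
    max_simultaneous_sessions: int = 1             # simulations that can run at once

    def summary(self) -> dict:
        """The three numbers are independent of one another."""
        return {
            "users": len(self.users),
            "runtimes": len(self.runtimes),
            "simultaneous_sessions": self.max_simultaneous_sessions,
        }


if __name__ == "__main__":
    tenant = Tenant(
        users=["admin", "manager", "player-1", "player-2"],
        runtimes=[
            Runtime("Sales call", "en", "web"),
            Runtime("Sales call", "it", "vr"),
        ],
        max_simultaneous_sessions=1,
    )
    print(tenant.summary())
```

Adding users or runtimes in this model leaves `max_simultaneous_sessions` unchanged, which mirrors the distinction made above.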
What to expect during use #
During the experience, the Player may notice a brief pause between the end of their turn and the start of the avatar’s response.
This pause is normal and is part of how the AI voice simulation works.
Under standard operating conditions, the platform aims to maintain a fluid response time compatible with a real-time simulated conversation.
Any variations may depend on the technical context and session configuration.
Final result #
Latency and simultaneous sessions are two fundamental elements of the real-time experience.
Latency describes the time required for the avatar to generate and start the response.
Simultaneous sessions describe how many simulations can be used at the same time.
Understanding these concepts helps Tenant Admins, Providers and Players set accurate expectations about the experience and correctly interpret the platform’s behavior during simulations.
