
50 shades of AI virtual agents: types, differences and potential


Defining the concept of a virtual agent in 2024 can be complicated. Sometimes decision-tree chatbots are referred to as virtual agents, sometimes they are just a few simple prompts on OpenAI, and other times they are fine-tuned models tailored to a specific business reality.

Virtual agents in 2024 represent an opportunity, and there is no one-size-fits-all solution that can cover every need. Any type of agent can work perfectly in the right situation; context makes the difference. Let's look at the different types.

Types of virtual agents

Decision tree

These are the old-fashioned, pre-AI agents. They are based on a decision tree of preset Q&As that guides the user along a predetermined path. These agents are not AI, but they can work well when the user's journey can be outlined in a very linear way. They are usually recognizable because they are unable to answer vague questions and they rely on a button-based interface with preset questions. By entering a question that strays from the predetermined path, it is very easy to run into responses like "I did not understand" or "I'm not able to answer".
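To make the mechanism concrete, here is a minimal sketch of such a decision-tree chatbot: each node offers preset buttons, and any input off the path triggers a canned fallback. The node names and reply texts are illustrative, not taken from a real product.

```python
# Minimal decision-tree chatbot: preset buttons per node, canned fallback
# for anything off the predetermined path. All content is illustrative.

TREE = {
    "start": {
        "text": "How can I help you?",
        "options": {"Billing": "billing", "Shipping": "shipping"},
    },
    "billing": {
        "text": "Your invoices are available in your account area.",
        "options": {},
    },
    "shipping": {
        "text": "Orders ship within 2 business days.",
        "options": {},
    },
}

FALLBACK = "I did not understand. Please pick one of the options."

def reply(node_id: str, user_input: str) -> tuple[str, str]:
    """Return (next_node_id, bot_message) for a button press or free text."""
    node = TREE[node_id]
    if user_input in node["options"]:   # user pressed a preset button
        nxt = node["options"][user_input]
        return nxt, TREE[nxt]["text"]
    return node_id, FALLBACK            # off-path free text: fallback reply
```

Pressing the "Billing" button moves to the billing node, while a free-form question like "Why is the sky blue?" leaves the user stuck at the same node with the fallback message, which is exactly the frustration described above.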

Prompting with “ChatGPT”-like assistants  

This category includes all the agents created through simple prompting directly on OpenAI's ChatGPT or another generalist large language model (LLM) assistant. These agents are much more conversational and human-like than the previous ones, but they still have quite limited capabilities.

Prompting, RAG and functions on “ChatGPT”

This agent makes use of the retrieval-augmented generation (RAG) technique to access web documents or PDFs, and of external functions integrated into the model to query corporate knowledge bases. The agent's abilities grow to match the complexity of the content and of the deployed functions. It is possible to adjust the tone of the responses, change the behavior, and integrate any external service.

Prompting, RAG, functions and other advanced techniques for fine-tuned models

This is the final step in virtual agent development. Starting from an open-source generalist model (e.g., Llama) or a closed-source one (e.g., OpenAI GPT-3.5/4o/4o mini), it is possible to change the LLM's behavior through fine-tuning techniques. In this scenario, you can adjust the tone of the responses, enhance the model's reasoning skills, and shape its behavior across different situations. The model can be 'taught' to interpret highly specific contexts and use cases.

Pros and cons of the different types 

Each type of virtual agent comes with both advantages and disadvantages. As the complexity grows, so do the costs related to development, hosting, and validation. Simultaneously, the agent’s capabilities, precision, and reliability also increase: 

  • Decision tree: The benefit lies in the assurance of a response (when the path is followed) and the much lower cost compared to the other types. However, the drawbacks of this model include its limited range of interaction, lack of a ‘human’ touch, and risk of backfiring if used in the wrong context, potentially failing the user and causing frustration.
  • Prompting with “ChatGPT”-like assistants: the user experience improves compared to the previous case, but the agent's skills remain limited, as does the control over the chatbot's behavior. In this scenario, there is a higher exposure to the problem of hallucinations and the possibility that the agent uses its base training to generate responses. In some contexts, that base training might be at odds with the company’s goals and policies.
  • Prompting, RAG and functions on “ChatGPT”: this step is the first true example of a virtual agent. It offers extensive capabilities and a strong understanding of context and user requests. The range of behaviors it can produce depends on the functions added and the precision of the prompting. The control over response quality is very high and the exposure to hallucinations is much lower. However, not all behaviors of the LLM can be altered, and understanding very specific contexts may still be difficult.
  • Prompting, RAG, functions and other advanced techniques for fine-tuned models:
    The agent is designed to align with the business and specific use cases. Prompting becomes easier because the desired behaviors are embedded directly into the model. The agent’s capabilities can be incredibly extensive and diverse. There is greater control over its behavior compared to the previous approach, as well as an improved ability to understand highly specific contexts. On the downside, a fine-tuned model can prove to be more rigid and less suitable for scenarios where the content requires frequent updates.

As we move from decision tree models to fine-tuned ones, the ways services are used and the related costs can change significantly, sometimes drastically. The last two types present greater challenges and advantages, which will be our focus.

LLM and AI agents

LLM is a frequently misunderstood term, so let's begin by explaining what it means.

A Large Language Model (LLM) is essentially a mathematical (statistical) model. Every LLM is based on mathematical and statistical principles and should be seen through that lens. When first created, the model is empty, much like the brain of a newborn. It has neurons, but the synapses are not yet formed. 

Just like a child, it needs to learn from its surroundings. For LLMs, this learning process involves training on vast amounts of data, with the internet being the main source for generalist models. Training shapes the synapses, building the model’s (statistical) 'reasoning' abilities. This process results in a set of parameters (billions, tens or hundreds of billions, and beyond) that represent the model’s reasoning capabilities. LLMs don’t store specific data, but rather a comprehensive synthesis of all the information used during the training process.

Generalist models have a wide-ranging understanding of various subjects; they can cover topics like history or architecture, compose poetry, or help with troubleshooting engine performance issues. It all depends on the data that were used for their training. 

Prompting, RAG and functions on “ChatGPT”

The idea behind this approach is to use a generalist LLM and create an enhanced version without modifying the model itself. Several methods can be used to accomplish this:

  • Prompting: an LLM can be instructed. Through prompting, you can tell the model what it can or cannot do, which languages it should use, the tools at its disposal, the tone it should maintain during interactions, and many other aspects.
  • RAG: by using Retrieval Augmented Generation, it's possible to provide the LLM with documents that it wouldn't typically have access to. For instance, it can retrieve information from a company's knowledge base, confidential information, or real-time data from a website. 
  • Functions: these enable the expansion of a model's abilities. For example, the agent can search a company's branches, run simulations, calculate estimates, and much more.
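The three methods above can be sketched as a single request to a chat-style LLM API: a system prompt that sets the rules, retrieved documents injected as context (RAG), and a tool schema the model may call. The toy retriever, the document texts, and the `find_branch` function are all hypothetical, and the payload shape follows the common chat-completions convention rather than any one provider's exact API.

```python
# Sketch of combining prompting, RAG, and a function/tool schema into one
# request for a chat-style LLM API. Retriever, docs, and tool are invented.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_request(question: str, docs: list[str]) -> dict:
    context = "\n".join(retrieve(question, docs))
    return {
        "messages": [
            # Prompting: the system message constrains the behavior
            {"role": "system",
             "content": "Answer only from the context below.\nContext:\n" + context},
            {"role": "user", "content": question},
        ],
        # Functions: a tool the model may call instead of answering directly
        "tools": [{
            "type": "function",
            "function": {
                "name": "find_branch",  # hypothetical corporate function
                "description": "Look up the nearest company branch.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }
```

In a real system the retriever would use embeddings rather than word overlap, but the request shape stays the same: retrieved content and tool definitions travel alongside the prompt, which is also why they count toward token usage.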

The pricing system of an LLM is generally token-based. A token is the minimum unit of text sent to the model as input and returned as output. This means there is no hosting fee, but there is an infrastructure cost for the solution built around the model and for the number of 'words' processed by the model. RAG content and functions are also counted as tokens.
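A back-of-the-envelope cost estimate makes the token-based model concrete. The per-million-token rates below are purely illustrative assumptions; real prices vary widely by model and provider.

```python
# Illustrative token cost estimate. Rates are assumptions, not real prices.
PRICE_IN_PER_M = 0.50    # $ per 1M input tokens (assumed)
PRICE_OUT_PER_M = 1.50   # $ per 1M output tokens (assumed)

def monthly_cost(chats: int, tokens_in: int, tokens_out: int) -> float:
    """Cost of `chats` conversations averaging the given token counts.
    RAG context and function definitions count toward tokens_in."""
    total_in = chats * tokens_in
    total_out = chats * tokens_out
    return (total_in / 1e6) * PRICE_IN_PER_M + (total_out / 1e6) * PRICE_OUT_PER_M

# e.g. 10,000 chats/month, each with ~2,000 input tokens (prompt + RAG
# context + tool schemas) and ~300 output tokens
cost = monthly_cost(10_000, 2_000, 300)  # → 14.5 ($)
```

Note how the RAG context dominates the input side: injected documents are re-sent on every request, which is exactly why heavy workloads can push this model toward unsustainability.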

In this scenario, there is no minimum user requirement, which makes it ideal for POCs/MVPs and production systems where the business model favors consumption-based payment (per token).

Prompting, RAG, functions and other advanced techniques for fine-tuned models

When generalist knowledge falls short, the model can be refined using a fine-tuning approach. This process changes the internal behavior of the model by reshaping the synapses built during training. This allows the model to grasp very specific situations and acquire abilities that wouldn't be achievable otherwise.

The pricing model usually includes a hosting charge for the fine-tuned LLM because it is built for a specific use case (yours). These costs are much higher than those of a token-based approach and are reasonable when the token model becomes unsustainable or the enhanced model's capabilities are insufficient.

In order to fine-tune a model, you need a pre-existing dataset of messages that the model can learn from (such as call center conversations or generated data) and a precise understanding of the features you wish to integrate into the model. Once fine-tuning is completed, the model is 'frozen,' and making further adjustments may require a new round of fine-tuning. 
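As a sketch of what such a dataset looks like, several providers accept fine-tuning data as JSONL, one `{"messages": [...]}` conversation per line in the usual chat format. The example conversation below is an invented placeholder, not real call-center data.

```python
# Sketch of preparing a chat-format fine-tuning dataset as JSONL
# (one JSON object per line). The conversation content is invented.
import json

conversations = [
    [
        {"role": "system", "content": "You are the bank's support agent."},
        {"role": "user", "content": "How do I block my card?"},
        {"role": "assistant",
         "content": "You can block it from the app under Cards > Block."},
    ],
    # ...hundreds or thousands more examples covering the target behaviors
]

def to_jsonl(convs: list[list[dict]]) -> str:
    """Serialize conversations, one {"messages": [...]} object per line."""
    return "\n".join(
        json.dumps({"messages": msgs}, ensure_ascii=False) for msgs in convs
    )

jsonl = to_jsonl(conversations)
```

The quality and coverage of these examples is what gets 'baked into' the model; once training runs, changing any of them means a new fine-tuning round, which is the rigidity mentioned earlier.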

Which should you pick?

Based on the business environment, the allocated budget, and the anticipated impact of the agent, multiple right answers are possible.

As with any new development, we could start with a hypothesis and work to validate it as quickly as possible with minimal budget investment. The fine-tuned model represents a significant opportunity for the company, but it also marks the end of a process that will lead to changes in some business processes, and thus, it should be implemented gradually.

A gradual approach 

The starting point of a major revolution can surely be a proof of concept (POC). A POC helps validate the hypothesis internally and assess whether the chosen solution can meet the fundamental needs and, as frequently occurs, reveal new ones.

The following stage is an MVP (minimum viable product), where the agent is exposed to the target audience for validation. This time, the validation is external, meaning a group of users, not from the project team, will be asked to evaluate the agent's capabilities.  

Every step, from the POC to the MVP, will require refinements and revisions to enhance the agent’s behavior. Costs will increase in line with the process's development. 

Do I necessarily need to start with a fine-tuned model for a POC? Luckily, the answer is often no. Unless you are in a very particular situation or under strict regulatory constraints (such as environments with tight data compliance requirements), a generalist LLM can be used, and by leveraging RAG, functions, and advanced prompting, you can create a highly functional and valuable agent. The same strategy can be used for the MVP and the production agent if the workload isn’t too heavy and the response quality is sufficient.

So, which option should you go for?

Once again, the answer is 'it depends.'

  • When the requirement involves a particular process flow, especially one that's strictly defined at its core, a decision-tree chatbot could be a good starting point.
  • If you need to add a translation feature or generate SEO metadata for a CMS, a basic ChatGPT prompt is an affordable and effective way to meet the need.
  • For a customer support agent, an evolved Q&A system, a helpdesk, and other more extensive and complicated use cases, an AI system with advanced functionalities is essential.
  • In exceptional cases, a fine-tuned model may be required even for a POC. When the operational environment is highly specific or not publicly available, a generalist model might struggle to manage the request, even with prompting, RAG, and advanced functions. 


From theory to practice

Let’s talk about an actual use case from the project we worked on for Credem Banca.


By applying the POC-MVP-user testing process, we built a virtual agent to assist users on the official website of the bank.

Check out the case study on Emily, Credem Bank’s virtual agent.