This article illustrates Humanativa’s approach to Generative Artificial Intelligence services, with a particular focus on Large Language Models (LLMs). We present an overview of these technologies, followed by a methodological comparison of different approaches and their impact on customer solutions.
In subsequent articles, we will explore:
- How to improve the LLM’s ability to generate accurate responses in natural language through RAG (Retrieval-Augmented Generation) technology.
- The innovation of Humanativa’s LLM solutions, where the experience and research conducted by our Competence Center are transformed into concrete offerings.
- A further focus on Data Preparation for RAG, known as Indexing, exploring new pre-processing techniques based on Visual Language Models (VLMs), artificial intelligence models that combine computer vision and natural language processing capabilities.
Generative AI and LLM: a new frontier in communication
Generative AI represents one of the most fascinating developments in artificial intelligence. Unlike traditional technologies, designed to recognize, classify, or predict, this new generation of AI is capable of creating original content: text, images, audio, and video. Through training on huge data sets, generative models learn complex patterns, developing the ability to generate realistic and innovative outputs.
From the point of view of consumers and the general public, the hype around generative AI is evident: it has captured the collective imagination.
- Its “creative” side leads users to believe that it is enough to provide a description (the so-called prompt) to have an AI model produce stories and scripts, compose music, or create images and videos. Generative AI is already a driving force in sectors such as entertainment and advertising.
- At the same time, the debate extends to the need to regulate the use of AI and to issues of bias, i.e., cases in which models reproduce distorted, harmful, or prejudiced views.
Despite this general “noise,” the positive aspect is that the hype has reinvigorated the major players, universities, and AI and ML service companies, all aware that this is a “historic” moment for AI: the right time to invest and to showcase the many possibilities AI can offer across industries. Today there is great interest in healthcare, medicine, industrial design, process automation, and the enormous revolution underway in knowledge management.
Knowledge management has always been a focal point for companies that want to make the most of their corporate knowledge to generate services. Here, with Virtual Assistants based on Generative AI, expectations are high around quick and easy access to information, faster decision-making, faster maintenance services, and process automation. All of this is compelling because, for a company, it means “saving time.”
Large Language Models (LLM)
One of the most promising developments in Generative AI is the Large Language Model (LLM). LLMs are advanced models, based on billions of parameters, designed to understand and generate natural language with a level of accuracy never achieved before. Their emergence and development have accompanied the “renaissance” of AI in recent years, driven by the growth of digital data and improved computational capabilities, as well as the introduction of deep learning algorithms.
Large Language Models, such as the well-known GPT, have emerged thanks to advances in neural networks, increased computing power, and abundant data, revolutionizing natural language processing. Here are some examples of possible services:
- Answering questions: They can provide detailed information and answers to questions asked by users.
- Writing assistance: They help write and edit texts, such as articles, emails, and reports.
- Machine translation: They can translate text from one language to another.
- Sentiment analysis: Used to analyze tone and emotions in texts such as reviews or social media posts.
- Virtual assistants: They power virtual assistants and ticketing systems to improve customer service and user interaction. With this technology, the era of rule-based “chatbots” is drawing to a close: the limits of traditional chatbots lay precisely in the rules they were built on and the constant need to update them, which in many user-facing services ended up generating frustration and inadequate responses.
- Content generation: They create original content such as stories, news articles, and scripts.
- Code generation: LLMs can generate code in a wide range of programming languages, including SQL queries suggested from a description in natural language (a minimal sketch follows this list). For example, models such as GPT-4.1, GPT-5-Codex, and Claude Sonnet 4.5 are trained on a wide variety of programming languages.
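To make the last point concrete, here is a minimal sketch of natural-language-to-SQL generation. It assumes the OpenAI Python SDK (v1+) with an API key in the OPENAI_API_KEY environment variable; the schema, question, and model name are illustrative, and any chat-capable model could be substituted.

```python
# Minimal sketch: natural-language-to-SQL with an LLM.
# Assumes the `openai` Python SDK (v1+) and an API key in OPENAI_API_KEY;
# the table schema and question below are purely illustrative.
from openai import OpenAI

client = OpenAI()

# Describe the schema so the model can ground the query it writes.
schema = """Table orders(id INT, customer_id INT, total NUMERIC, created_at DATE)
Table customers(id INT, name TEXT, country TEXT)"""

question = "Total revenue per country in 2024, highest first."

response = client.chat.completions.create(
    model="gpt-4.1",  # illustrative; any chat-capable model would do
    messages=[
        {"role": "system",
         "content": "You translate questions into SQL. "
                    "Use only the tables provided. Reply with SQL only."},
        {"role": "user", "content": f"Schema:\n{schema}\n\nQuestion: {question}"},
    ],
)

# The generated SQL (exact wording is not guaranteed to be identical each run).
print(response.choices[0].message.content)
```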
Given the characteristics outlined above, these models are becoming increasingly popular in various industrial sectors for automating and improving processes involving human language. This has led to widespread interest among large companies in acquiring a Virtual Assistant specifically designed for their industrial environment. We will highlight Humanativa’s solutions in this area later on.
The Value of the Competence Center Experience
Our Competence Center keeps a close eye on the ongoing evolution of LLMs and of the various techniques and approaches around them, always weighing the factors that are essential when proposing a solution to a customer. Here are a few of them:
- Open and closed models
  - Closed: proprietary, cloud-based models such as GPT (OpenAI), Gemini (Google), and Claude (Anthropic)
  - Open: models that can be run on-premise, such as Llama (Meta) and Mistral (Mistral AI)
- Size and related capabilities
  - Small models with limited reasoning capabilities, useful for simple tasks or for research and testing
  - Medium models, useful for more advanced but non-critical tasks, as they are prone to errors
  - Large, multilingual models
  - Multi-modal models
- Analysis of general weaknesses
  - Hallucinations: cases in which the model’s response is incorrect or out of context, citing information, data, or references inconsistent with the question asked
  - Limited input (context) size
  - Training/fine-tuning cost
  - Multilingual capability, especially for Italian: both the quality of responses and the ability to follow instructions in Italian
  - Response times
  - Model size, a factor that cannot be overlooked even for on-premise execution
- The cost of querying a model
  - Number of parameters
  - Maximum number of tokens in a prompt
  - Cost per token
- Compact and non-compact models
On the cost of querying a model: to give an idea of processing capacity, the following table shows reference values such as the number of parameters and the maximum number of tokens (text units: parts of words or whole words) that each model can handle in a prompt. A parameter is a numerical coefficient (a weight or bias) that the model learns during training. In practice, more parameters mean a greater ability to represent complex patterns, but also:
- Higher computational cost (training and inference)
- Higher memory consumption
- Potentially better reasoning/understanding capabilities
| Model Name | Parameters (total / active) | Maximum Context Tokens | Release Date |
|---|---|---|---|
| GPT‑4.1 | undisclosed; estimates range from ~1.0 to ~1.8 trillion | ~1,000,000 | April 2025 |
| Gemini 2.5 | unconfirmed; often reported between ~200B and ~600B | ~1,000,000 | May 2025 |
| Claude (Sonnet / Opus) | undisclosed | ~128k–200k | March 2025 |
| PaLM 2 | hundreds of billions (estimate) | ~8k–32k | May 2023 |
| Llama 4 (Scout / Maverick) | MoE, variable (~17B active) | ~1M (Maverick) to ~10M (Scout) | April 2025 |
| Mistral Large 2 | ~123B | 128k | July 2024 |
| Mixtral 8×7B (MoE) | ~47B total (~13B active) | 32k | January 2024 |
| Falcon 180B | ~180B | ~2k | September 2023 |
| DBRX (MoE) | ~132B total (~36B active) | 32k | March 2024 |
| Phi‑2 (SLM) | ~2.7B | ~2k | December 2023 |
| DeepSeek‑Chat / DeepSeek‑Reasoner | 16B / 236B total (~21B active) | 128k | August 2025 |
The table shows how some models keep growing in the number of parameters and context tokens they offer. However, this information must be weighed against the cost of each prompt.
Large models are often proprietary and are offered through cloud-based API services, with query costs billed per token; the sketch below gives a rough way to estimate that cost.
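As a back-of-the-envelope illustration, the following sketch counts the tokens in a prompt with the tiktoken tokenizer and multiplies them by per-token prices. The prices and the assumed answer length are hypothetical placeholders, not any provider’s actual rates.

```python
# A back-of-the-envelope cost estimate for one LLM query.
# Requires `pip install tiktoken`; the per-token prices below are
# HYPOTHETICAL placeholders: real rates vary by provider and model.
import tiktoken

PRICE_PER_1K_INPUT = 0.002   # USD per 1,000 input tokens (assumed)
PRICE_PER_1K_OUTPUT = 0.008  # USD per 1,000 output tokens (assumed)

# cl100k_base is the tokenizer used by several recent OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

prompt = "Summarize the attached maintenance manual in three bullet points."
n_input = len(enc.encode(prompt))   # tokens actually sent to the model
n_output = 300                      # rough guess at the answer length

cost = (n_input / 1000) * PRICE_PER_1K_INPUT + (n_output / 1000) * PRICE_PER_1K_OUTPUT
print(f"{n_input} input tokens, ~{n_output} output tokens -> ~${cost:.5f} per query")
```

Run over the expected daily query volume, the same arithmetic quickly shows why context length and model choice dominate the operating cost of a Virtual Assistant.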
In addition, what matters is not only the volume of text a model can handle, but also the cost of processing it. For this reason, several LLM vendors now produce “compact” models. A compact model:
- is designed to be smaller, more efficient, and cheaper in terms of computational resources, while maintaining good language comprehension and generation capabilities;
- is a compromise between computational power and linguistic ability, making it ideal for practical applications that prioritize efficiency and speed over the maximum accuracy or token capacity of a large LLM.
Following the market, faster and cheaper small models have become an additional front of competition, with growing attention to cost and to the release of open models. Examples include Google’s Gemma models and Microsoft’s Phi family, as well as Mistral, which rivals both the large and the compact models from OpenAI and Anthropic.
Our Competence Center’s monitoring of LLMs revolves around this trade-off between “efficiency and speed” on one side and “accuracy and token capacity” on the other, since balancing these characteristics is crucial to deriving a solution that fits the customer’s needs. For example, building a Virtual Assistant for a “legislative” domain requires greater attention to the accuracy of responses, and cases of hallucination must be avoided; the sketch below shows one common mitigation.
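As a minimal sketch of that mitigation (assuming the same OpenAI-style SDK as in the earlier example; the context passage and wording are illustrative), the model can be instructed to answer only from retrieved reference text and to abstain when the answer is not there, an idea we will develop further in the upcoming article on RAG.

```python
# Minimal sketch: reducing hallucinations by grounding answers in supplied text.
# Assumes the `openai` SDK and OPENAI_API_KEY as before; the retrieved passage,
# question, and model name are illustrative.
from openai import OpenAI

client = OpenAI()

context = "Art. 12: The permit must be renewed every five years."  # retrieved passage

response = client.chat.completions.create(
    model="gpt-4.1",
    temperature=0,  # deterministic decoding reduces creative drift
    messages=[
        {"role": "system",
         "content": "Answer ONLY from the provided context. If the context does "
                    "not contain the answer, reply 'Not found in the sources.'"},
        {"role": "user",
         "content": f"Context:\n{context}\n\nQuestion: How often must the permit be renewed?"},
    ],
)
print(response.choices[0].message.content)
```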