{"id":416,"date":"2025-11-04T11:05:40","date_gmt":"2025-11-04T10:05:40","guid":{"rendered":"https:\/\/machinelearning.humanativaspa.it\/en\/?p=416"},"modified":"2025-11-04T11:07:05","modified_gmt":"2025-11-04T10:07:05","slug":"from-llms-to-vlms","status":"publish","type":"post","link":"https:\/\/machinelearning.humanativaspa.it\/en\/from-llms-to-vlms\/","title":{"rendered":"From LLMs to VLMs"},"content":{"rendered":"\n<p>In this article, we continue to share our experiences in the field of Large Language Models (LLMs), focusing in particular on how Visual Language Models (VLMs) are revolutionizing document pre-processing in RAG systems.<\/p>\n\n\n\n<p>VLMs are the meeting point between vision and language, between visual content and text.<\/p>\n\n\n\n<p>They represent the next step in the evolution of AI for document understanding and are an essential component of next-generation RAG systems.<\/p>\n\n\n\n<p>With VLMs, document pre-processing in RAG systems is no longer a simple technical step, but becomes a phase of <strong>intelligent data interpretation<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">From the era of LLMs to multimodal understanding<\/h3>\n\n\n\n<p>In recent years, <strong>Large Language Models (LLMs)<\/strong> \u2014 such as GPT-4o, Claude 3.5, or Llama 3.1 \u2014 have transformed the way companies manage and interpret textual data.<\/p>\n\n\n\n<p>From the automatic generation of intelligent responses in support systems to the semantic analysis of business reports and logs, LLMs have become essential tools for improving efficiency and decision quality.<\/p>\n\n\n\n<p>However, most business documents are not made up of text alone.<\/p>\n\n\n\n<p>Technical projects, reports, engineering drawings, or functional specifications contain <strong>images, diagrams, charts, and tables<\/strong> that convey crucial information but are difficult to interpret with purely linguistic models.<\/p>\n\n\n\n<p>This is where a new generation of models comes 
into play: <strong>Visual Language Models (VLMs)<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What are VLMs and why are they a fundamental component of RAG systems<\/h3>\n\n\n\n<p><strong>VLMs<\/strong> combine the visual capabilities of computer vision models with the semantic understanding typical of LLMs. In other words, a VLM is able to <strong>\u201csee\u201d<\/strong> and <strong>\u201cread\u201d<\/strong> <strong>simultaneously<\/strong>, interpreting images, text, and graphic structure as a single coherent language.<\/p>\n\n\n\n<p>In a <strong>Retrieval-Augmented Generation (RAG)<\/strong> architecture, the use of VLMs represents a turning point in the data preparation phase, which includes document pre-processing, chunking, data enrichment, embedding, and indexing in a vector store. More specifically, in the pre-processing phase, a VLM can analyze documents at a visual and semantic level, returning structured representations enriched with metadata.<\/p>\n\n\n\n<p>In practice, while traditional OCR extracts only text, a VLM is capable of understanding <strong>diagrams, legends, tables, and visual relationships<\/strong>, providing a richer knowledge base for retrieval and for generating high-quality responses.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">An example of a VLM pipeline for pre-processing in RAG systems<\/h3>\n\n\n\n<p>To understand the potential of VLMs in <em>document understanding<\/em>, let&#8217;s imagine a pre-processing pipeline designed to process complex documents, such as PDFs containing technical diagrams, tables, and illustrations. These are the main stages of the pipeline:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Multimodal conversion and analysis<br><\/strong>Each page of the document is converted into an image and sent to an advanced VLM (such as Gemini 2.5 Pro, GPT-4o, or an open-source alternative such as LLaVA-NEXT). 
The model simultaneously interprets text, layout, and visual components, returning a <strong>semantic understanding at the \u201cpage\u201d level<\/strong>.<br><\/li>\n\n\n\n<li><strong>Structured extraction<br><\/strong>The result of the analysis is translated into<strong> structured data<\/strong>, such as a JSON file describing text, coordinates, types of visual elements, and spatial relationships. This step provides a unified view of the document, which is useful for subsequent segmentation or intelligent <em>chunking<\/em> operations.<br><\/li>\n\n\n\n<li><strong>Synthetic data generation and fine-tuning<br><\/strong>In the absence of labeled datasets, the pipeline can generate <strong>synthetic data<\/strong> from public documents or controlled internal repositories. This data is used to optimize the model&#8217;s behavior through <em>fine-tuning<\/em> or <em>prompt optimization<\/em>, improving its accuracy in recognizing specific patterns.<br><\/li>\n\n\n\n<li><strong>Indexing and integration with RAG<br><\/strong>The results are then enriched with metadata and sent to the embedding phase to be indexed in a vector database. In this way, the RAG system can subsequently retrieve both textual and visual information, ensuring more relevant responses based on <em>multimodal<\/em> understanding.<br><\/li>\n\n\n\n<li><strong>Pipeline automation<br><\/strong>The pipeline can be executed asynchronously: for example, a service monitors an S3 bucket or a shared directory, automatically processes each new document, and updates the knowledge base index in real time.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">The main Visual Language Models available today<\/h3>\n\n\n\n<p>Thanks to public benchmarks from platforms such as Hugging Face, it is possible to compare the best-performing VLMs on the market today. 
Below is an updated summary:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><td><span style=\"color: #dc661d;\"><strong>Model<\/strong><\/span><\/td><td><span style=\"color: #dc661d;\"><strong>Type<\/strong><\/span><\/td><td><span style=\"color: #dc661d;\"><strong>State of the art<\/strong><\/span><\/td><td><span style=\"color: #dc661d;\"><strong>Production Ready<\/strong><\/span><\/td><td><span style=\"color: #dc661d;\"><strong>Notes<\/strong><\/span><\/td><\/tr><\/thead><tbody><tr><td><strong>Gemini 2.5 Pro<\/strong><\/td><td>Proprietary (Google DeepMind)<\/td><td>High<\/td><td>YES<\/td><td>Full multimodal (text, images, video). Excellent for technical documents.<\/td><\/tr><tr><td><strong>GPT-4o<\/strong><\/td><td>Proprietary (OpenAI)<\/td><td>High<\/td><td>YES<\/td><td>High multimodal performance; already used in production environments.<\/td><\/tr><tr><td><strong>Claude 3.5 Sonnet<\/strong><\/td><td>Proprietary (Anthropic)<\/td><td>High<\/td><td>YES<\/td><td>Strong in diagrams and complex visual comprehension.<\/td><\/tr><tr><td><strong>LLaVA-NEXT<\/strong><\/td><td>Open Source<\/td><td>Medium-high<\/td><td>NO<\/td><td>Good balance between performance and openness; still evolving.<\/td><\/tr><tr><td><strong>Qwen-VL-Max<\/strong><\/td><td>Open Source (Alibaba)<\/td><td>Medium-high<\/td><td>NO<\/td><td>Excellent balance between visual accuracy and speed.<\/td><\/tr><tr><td><strong>InternVL 2.0<\/strong><\/td><td>Open Source<\/td><td>Medium<\/td><td>NO<\/td><td>Interesting for PDFs and complex diagrams; experimental phase.<\/td><\/tr><tr><td><strong>Kosmos-2<\/strong><\/td><td>Open Source (Microsoft)<\/td><td>Low<\/td><td>NO<\/td><td>Solid multimodal OCR, but less effective in deep semantics.<\/td><\/tr><tr><td><strong>Fuyu 8B<\/strong><\/td><td>Open Source (Adept AI)<\/td><td>Low<\/td><td>NO<\/td><td>Excellent speed, ideal for prototyping and testing.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><em>Sources: 
OpenCompass public benchmarks \u2013 Hugging Face<\/em><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">VLM as the key to document understanding in RAG systems<\/h3>\n\n\n\n<p>In conclusion, <strong>Visual Language Models<\/strong> represent the natural evolution of LLMs, paving the way for systems capable of <strong>understanding multimodal documents<\/strong> in a truly intelligent way.<\/p>\n\n\n\n<p>Within RAG pipelines, their contribution is decisive:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>they make <strong>pre-processing more accurate<\/strong>,<\/li>\n\n\n\n<li>enable truly contextual <strong>semantic chunking<\/strong>,<\/li>\n\n\n\n<li>and allow for <strong>automatic enrichment<\/strong> of visual and textual metadata.<\/li>\n<\/ul>\n\n\n\n<p>For organizations that handle large quantities of technical documents, reports, or project diagrams, VLMs offer a concrete advantage: transforming every document \u2014 even the most complex ones \u2014 into <strong>knowledge that can be used by artificial intelligence<\/strong>.<\/p>\n\n\n\n<p>Finally, Humanativa continues to invest in RAG and LLM technologies: based on feedback from its first advanced RAG projects, Humanativa&#8217;s Competence Center will include VLM-based pre-processing modules in the next version of the core system of its LLM solutions.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The new frontier of AI for understanding complex 
documents<\/p>\n","protected":false},"author":7,"featured_media":417,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[56,2,43],"tags":[68,11,110,112,108,103,102,116,111,105,104,115,114],"class_list":["post-416","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-articoli","category-approfondimenti","category-slideshow","tag-ai","tag-artificial-intelligence","tag-generative-ai","tag-generative-artificial-intelligence","tag-gpt","tag-large-language-models","tag-llm","tag-pre-trained-generative-transformer","tag-prompt-engineering","tag-rag","tag-retrieval-augmented-generation","tag-visual-language-models","tag-vlm"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>From LLMs to VLMs - HN Machine Learning en<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/machinelearning.humanativaspa.it\/en\/from-llms-to-vlms\/\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"From LLMs to VLMs - HN Machine Learning en\" \/>\n<meta name=\"twitter:description\" content=\"The new frontier of AI for understanding complex documents\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/machinelearning.humanativaspa.it\/en\/wp-content\/uploads\/sites\/4\/2025\/11\/Dai-LLM-ai-VLM.jpg\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"pierfrancesco\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/from-llms-to-vlms\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/from-llms-to-vlms\\\/\"},\"author\":{\"name\":\"pierfrancesco\",\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/#\\\/schema\\\/person\\\/d2b6fd914c90d166fceb88fea15ee8f6\"},\"headline\":\"From LLMs to VLMs\",\"datePublished\":\"2025-11-04T10:05:40+00:00\",\"dateModified\":\"2025-11-04T10:07:05+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/from-llms-to-vlms\\\/\"},\"wordCount\":922,\"publisher\":{\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/from-llms-to-vlms\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/wp-content\\\/uploads\\\/sites\\\/4\\\/2025\\\/11\\\/Dai-LLM-ai-VLM.jpg\",\"keywords\":[\"AI\",\"Artificial intelligence\",\"Generative AI\",\"Generative Artificial Intelligence\",\"GPT\",\"Large Language Models\",\"LLM\",\"Pre-trained Generative Transformer\",\"Prompt Engineering\",\"RAG\",\"Retrieval-Augmented Generation\",\"Visual Language Models\",\"VLM\"],\"articleSection\":[\"Articles\",\"Insights\",\"Slideshow\"],\"inLanguage\":\"en-GB\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/from-llms-to-vlms\\\/\",\"url\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/from-llms-to-vlms\\\/\",\"name\":\"From LLMs to VLMs - HN Machine Learning 
en\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/from-llms-to-vlms\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/from-llms-to-vlms\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/wp-content\\\/uploads\\\/sites\\\/4\\\/2025\\\/11\\\/Dai-LLM-ai-VLM.jpg\",\"datePublished\":\"2025-11-04T10:05:40+00:00\",\"dateModified\":\"2025-11-04T10:07:05+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/from-llms-to-vlms\\\/#breadcrumb\"},\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/from-llms-to-vlms\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/from-llms-to-vlms\\\/#primaryimage\",\"url\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/wp-content\\\/uploads\\\/sites\\\/4\\\/2025\\\/11\\\/Dai-LLM-ai-VLM.jpg\",\"contentUrl\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/wp-content\\\/uploads\\\/sites\\\/4\\\/2025\\\/11\\\/Dai-LLM-ai-VLM.jpg\",\"width\":1200,\"height\":673},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/from-llms-to-vlms\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"From LLMs to VLMs\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/#website\",\"url\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/\",\"name\":\"HN Machine 
Learning\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/#organization\"},\"alternateName\":\"Humanativa\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-GB\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/#organization\",\"name\":\"HN Machine Learning\",\"url\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/wp-content\\\/uploads\\\/sites\\\/4\\\/2023\\\/09\\\/libellula_hn.jpg\",\"contentUrl\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/wp-content\\\/uploads\\\/sites\\\/4\\\/2023\\\/09\\\/libellula_hn.jpg\",\"width\":696,\"height\":696,\"caption\":\"HN Machine 
Learning\"},\"image\":{\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/HumanativaGroupSpA\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/#\\\/schema\\\/person\\\/d2b6fd914c90d166fceb88fea15ee8f6\",\"name\":\"pierfrancesco\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/a5f644705853b418c243b7844d68179bf0fc5e09b724a62ce889a71ad449dc07?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/a5f644705853b418c243b7844d68179bf0fc5e09b724a62ce889a71ad449dc07?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/a5f644705853b418c243b7844d68179bf0fc5e09b724a62ce889a71ad449dc07?s=96&d=mm&r=g\",\"caption\":\"pierfrancesco\"},\"url\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/author\\\/pierfrancesco\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","_links":{"self":[{"href":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/posts\/416","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/comments?post=416"}],"version-history":[{"count":1,"href":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/posts\/416\/revisions"}],"predecessor-version":[{"id":418,"href":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/posts\/416\/revisions\/418"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/media\/417"}],"wp:attachment":[{"href":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/media?parent=416"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:
\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/categories?post=416"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/tags?post=416"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}