{"id":261,"date":"2022-04-13T15:17:48","date_gmt":"2022-04-13T13:17:48","guid":{"rendered":"https:\/\/machinelearning.humanativaspa.it\/en\/?p=261"},"modified":"2023-03-03T14:59:48","modified_gmt":"2023-03-03T13:59:48","slug":"machine-learning-reinforcement-learning","status":"publish","type":"post","link":"https:\/\/machinelearning.humanativaspa.it\/en\/machine-learning-reinforcement-learning\/","title":{"rendered":"Machine learning: reinforcement learning"},"content":{"rendered":"<p><span data-contrast=\"auto\">Reinforcement Learning is one of the hottest topics in the field of Machine Learning, and also one of the oldest. Indeed, the first studies date back to the 1950s.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:330,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">In 2013 a British startup, <\/span><b><span data-contrast=\"auto\">DeepMind<\/span><\/b><span data-contrast=\"auto\">, showed how the same system, with no game-specific tuning, could learn to play several Atari 2600 games from scratch using only the raw screen pixels as input.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:0,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">The paper describing this work, &#8220;Playing Atari with Deep Reinforcement Learning&#8221;, is available at https:\/\/arxiv.org\/abs\/1312.5602.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:0,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">However, the masterpiece of <\/span><b><span data-contrast=\"auto\">DeepMind<\/span><\/b><span data-contrast=\"auto\"> remains <\/span><b><span data-contrast=\"auto\">AlphaGo<\/span><\/b><span data-contrast=\"auto\">: a <\/span><b><span data-contrast=\"auto\">Reinforcement Learning<\/span><\/b><span data-contrast=\"auto\"> system that in 2017 beat Ke Jie, then the world&#8217;s top-ranked player of <\/span><i><span data-contrast=\"auto\">Go<\/span><\/i><span data-contrast=\"auto\">, an ancient board game of Chinese origin whose number of possible positions exceeds the number of atoms in the observable Universe. This result was achieved by applying the power of <\/span><i><span data-contrast=\"auto\">neural networks<\/span><\/i><span data-contrast=\"auto\"> to the field of <\/span><b><span data-contrast=\"auto\">Reinforcement Learning.<\/span><\/b><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:0,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">In nature, <\/span><i><span data-contrast=\"auto\">learning<\/span><\/i><span data-contrast=\"auto\"> is a process of <\/span><i><span data-contrast=\"auto\">exploration<\/span><\/i><span data-contrast=\"auto\"> and <\/span><i><span data-contrast=\"auto\">interaction with the environment<\/span><\/i><span data-contrast=\"auto\"> aimed at obtaining <\/span><i><span data-contrast=\"auto\">rewards<\/span><\/i><span data-contrast=\"auto\">.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:0,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">This simple paradigm is captured by <\/span><b><span data-contrast=\"auto\">Reinforcement Learning<\/span><\/b><span data-contrast=\"auto\"> and encoded in systems that machines can execute.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:330,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Reinforcement Learning <\/span><\/b><span data-contrast=\"auto\">is an area of study or, more technically, a class of machine learning systems that share a typical structure and way of working.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:330,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">In a Reinforcement Learning system, a software agent makes <\/span><b><span data-contrast=\"auto\">observations<\/span><\/b><span 
data-contrast=\"auto\"> in an <\/span><b><span data-contrast=\"auto\">environment<\/span><\/b><span data-contrast=\"auto\">, performs <\/span><b><span data-contrast=\"auto\">actions<\/span><\/b><span data-contrast=\"auto\"> in it and receives <\/span><b><span data-contrast=\"auto\">rewards<\/span><\/b><span data-contrast=\"auto\"> in return.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:330,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">The agent&#8217;s goal is to maximize the cumulative reward over the long run.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:330,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">A reward can also be negative, acting as a penalty: these are the so-called <\/span><b><span data-contrast=\"auto\">negative rewards<\/span><\/b><span data-contrast=\"auto\">.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Some examples of <\/span><b><span data-contrast=\"auto\">Reinforcement Learning<\/span><\/b><span data-contrast=\"auto\"> applications:<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<ul>\n<li><b><span data-contrast=\"auto\">A walking robot<\/span><\/b><span data-contrast=\"auto\">.<\/span><\/li>\n<\/ul>\n<p><span data-contrast=\"auto\">In this case the agent is the software that controls the robot: it observes the real world through a multitude of sensors, obtaining a reward when it reaches its goal and a penalty (i.e. a negative reward) when it wastes time on useless actions (e.g. 
heading in the wrong direction or falling).<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559739&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559739&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<ul>\n<li><b><span data-contrast=\"auto\">A smart thermostat<\/span><\/b><span data-contrast=\"auto\">.<\/span><\/li>\n<\/ul>\n<p><span data-contrast=\"auto\">An agent does not necessarily have to control the movement of a physical (or virtual) device.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559739&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">For example, during its first weeks of use the <\/span><b><span data-contrast=\"auto\">Google Nest<\/span><\/b><span data-contrast=\"auto\"> thermostat tunes a machine learning model (more precisely, a <\/span><b><span data-contrast=\"auto\">reinforcement learning<\/span><\/b><span data-contrast=\"auto\"> model) to adapt to the user&#8217;s needs.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559739&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Here a positive reward is triggered by setting a suitable temperature while reducing energy consumption, and a negative one by a corrective human intervention (i.e. 
the temperature was wrong).<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559739&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">The agent must therefore learn to anticipate human needs.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559739&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559739&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<ul>\n<li><b><span data-contrast=\"auto\">FinTech<\/span><\/b><\/li>\n<\/ul>\n<p><span data-contrast=\"auto\">The financial sector, especially stock trading, is receptive to these systems because they could assist <\/span><b><span data-contrast=\"auto\">brokers<\/span><\/b><span data-contrast=\"auto\"> and day <\/span><b><span data-contrast=\"auto\">traders<\/span><\/b><span data-contrast=\"auto\"> by observing share prices and deciding when, and how much, to buy or sell.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559739&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Here the rewards are determined by the profit or loss generated.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559739&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559739&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">In all of these cases, the attentive reader will have noticed that a fundamental piece of the logic is still missing.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559739&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">So far we only know that the agent performs actions in an 
environment and receives rewards in return, which it must maximize.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559739&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559739&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">But how does the agent know which action to take at any given moment, based on its observations? It uses an algorithm that determines which actions to perform, and this algorithm is called the <\/span><b><span data-contrast=\"auto\">policy<\/span><\/b><span data-contrast=\"auto\">.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:330,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">A policy can take the form of a neural network that receives the observations as input and outputs the action to take.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:330,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><b><span data-contrast=\"auto\">Q-learning<\/span><\/b><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:330,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Q-learning is one of the best-known <\/span><b><span data-contrast=\"auto\">reinforcement learning algorithms<\/span><\/b><span data-contrast=\"auto\">.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:330,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">It belongs to the family of temporal-difference (TD) learning methods, which learn directly from experience without requiring a complete model of the environment.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:330,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Q-learning is a value-based learning algorithm 
that learns an action-value function Q(s, a), i.e. an estimate of the cumulative (discounted) reward the agent can expect if it takes action a in state s and then keeps acting optimally.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:330,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Its goal is to let a machine learning system adapt to its environment by improving its choice among the available actions. To do so, in each state it prefers the action with the highest Q value, which accounts not only for the next reward but also for the rewards expected afterwards.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:330,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">In its basic form, the model stores all these values in a table, the Q-table, whose rows correspond to the possible states (observations) and whose columns correspond to the possible actions. During training, the cells are filled with values that estimate the expected cumulative reward.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:330,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">The Q-learning setting consists of an agent (technically, the AI model) interacting with an environment, a set of states S and a set of actions A available in each state.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:330,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">The agent performs actions that change the environment and lead to a new state, receiving a reward that can be positive or negative depending on whether the effect of the action brings it closer to the desired result.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:330,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">By performing actions and collecting rewards, the agent learns an optimal strategy for maximizing the reward over time. 
This strategy is the policy described above: a mathematical function whose parameters are optimized during training.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:330,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">A very interesting playground for experimenting with Q-learning is <\/span><b><span data-contrast=\"auto\">OpenAI Gym + Box2D<\/span><\/b><span data-contrast=\"auto\"> (https:\/\/gym.openai.com\/), that is, the Gym toolkit together with its physics-based Box2D environments.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:330,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Gym is a toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to games like Pong or Pinball.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:2,&quot;335559739&quot;:330,&quot;335559740&quot;:360}\">\u00a0<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Reinforcement Learning is one of the hottest topics in the field of Machine Learning, and also one of [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":262,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[56],"tags":[],"class_list":["post-261","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-articoli"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Machine learning: reinforcement learning - HN Machine Learning en<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/machinelearning.humanativaspa.it\/en\/machine-learning-reinforcement-learning\/\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"Machine learning: 
reinforcement learning - HN Machine Learning en\" \/>\n<meta name=\"twitter:description\" content=\"Reinforcement Learning is one of the hottest topics in the field of Machine Learning, and also one of [&hellip;]\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/machinelearning.humanativaspa.it\/en\/wp-content\/uploads\/sites\/4\/2023\/03\/reinforcement_learning.jpg\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Andream\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/machine-learning-reinforcement-learning\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/machine-learning-reinforcement-learning\\\/\"},\"author\":{\"name\":\"Andream\",\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/#\\\/schema\\\/person\\\/a6de167b6fe30bf1d2edfcbfd3417de8\"},\"headline\":\"Machine learning: reinforcement 
learning\",\"datePublished\":\"2022-04-13T13:17:48+00:00\",\"dateModified\":\"2023-03-03T13:59:48+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/machine-learning-reinforcement-learning\\\/\"},\"wordCount\":812,\"publisher\":{\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/machine-learning-reinforcement-learning\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/wp-content\\\/uploads\\\/sites\\\/4\\\/2023\\\/03\\\/reinforcement_learning.jpg\",\"articleSection\":[\"Articles\"],\"inLanguage\":\"en-GB\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/machine-learning-reinforcement-learning\\\/\",\"url\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/machine-learning-reinforcement-learning\\\/\",\"name\":\"Machine learning: reinforcement learning - HN Machine Learning 
en\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/machine-learning-reinforcement-learning\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/machine-learning-reinforcement-learning\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/wp-content\\\/uploads\\\/sites\\\/4\\\/2023\\\/03\\\/reinforcement_learning.jpg\",\"datePublished\":\"2022-04-13T13:17:48+00:00\",\"dateModified\":\"2023-03-03T13:59:48+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/machine-learning-reinforcement-learning\\\/#breadcrumb\"},\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/machine-learning-reinforcement-learning\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/machine-learning-reinforcement-learning\\\/#primaryimage\",\"url\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/wp-content\\\/uploads\\\/sites\\\/4\\\/2023\\\/03\\\/reinforcement_learning.jpg\",\"contentUrl\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/wp-content\\\/uploads\\\/sites\\\/4\\\/2023\\\/03\\\/reinforcement_learning.jpg\",\"width\":1000,\"height\":500},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/machine-learning-reinforcement-learning\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Machine learning: reinforcement 
learning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/#website\",\"url\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/\",\"name\":\"HN Machine Learning\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/#organization\"},\"alternateName\":\"Humanativa\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-GB\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/#organization\",\"name\":\"HN Machine Learning\",\"url\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/wp-content\\\/uploads\\\/sites\\\/4\\\/2023\\\/09\\\/libellula_hn.jpg\",\"contentUrl\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/wp-content\\\/uploads\\\/sites\\\/4\\\/2023\\\/09\\\/libellula_hn.jpg\",\"width\":696,\"height\":696,\"caption\":\"HN Machine 
Learning\"},\"image\":{\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/HumanativaGroupSpA\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/#\\\/schema\\\/person\\\/a6de167b6fe30bf1d2edfcbfd3417de8\",\"name\":\"Andream\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/1f7a315d6f9ae9709ffc015996ad40b2c1779d16ea2dede3da3989ca3cf5aae8?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/1f7a315d6f9ae9709ffc015996ad40b2c1779d16ea2dede3da3989ca3cf5aae8?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/1f7a315d6f9ae9709ffc015996ad40b2c1779d16ea2dede3da3989ca3cf5aae8?s=96&d=mm&r=g\",\"caption\":\"Andream\"},\"sameAs\":[\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/\"],\"url\":\"https:\\\/\\\/machinelearning.humanativaspa.it\\\/en\\\/author\\\/andream\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Machine learning: reinforcement learning - HN Machine Learning en","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/machinelearning.humanativaspa.it\/en\/machine-learning-reinforcement-learning\/","twitter_card":"summary_large_image","twitter_title":"Machine learning: reinforcement learning - HN Machine Learning en","twitter_description":"Reinforcement Learning is one of the hottest topics in the field of Machine Learning, and also one of [&hellip;]","twitter_image":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-content\/uploads\/sites\/4\/2023\/03\/reinforcement_learning.jpg","twitter_misc":{"Written by":"Andream","Estimated reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/machinelearning.humanativaspa.it\/en\/machine-learning-reinforcement-learning\/#article","isPartOf":{"@id":"https:\/\/machinelearning.humanativaspa.it\/en\/machine-learning-reinforcement-learning\/"},"author":{"name":"Andream","@id":"https:\/\/machinelearning.humanativaspa.it\/en\/#\/schema\/person\/a6de167b6fe30bf1d2edfcbfd3417de8"},"headline":"Machine learning: reinforcement 
learning","datePublished":"2022-04-13T13:17:48+00:00","dateModified":"2023-03-03T13:59:48+00:00","mainEntityOfPage":{"@id":"https:\/\/machinelearning.humanativaspa.it\/en\/machine-learning-reinforcement-learning\/"},"wordCount":812,"publisher":{"@id":"https:\/\/machinelearning.humanativaspa.it\/en\/#organization"},"image":{"@id":"https:\/\/machinelearning.humanativaspa.it\/en\/machine-learning-reinforcement-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-content\/uploads\/sites\/4\/2023\/03\/reinforcement_learning.jpg","articleSection":["Articles"],"inLanguage":"en-GB"},{"@type":"WebPage","@id":"https:\/\/machinelearning.humanativaspa.it\/en\/machine-learning-reinforcement-learning\/","url":"https:\/\/machinelearning.humanativaspa.it\/en\/machine-learning-reinforcement-learning\/","name":"Machine learning: reinforcement learning - HN Machine Learning en","isPartOf":{"@id":"https:\/\/machinelearning.humanativaspa.it\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/machinelearning.humanativaspa.it\/en\/machine-learning-reinforcement-learning\/#primaryimage"},"image":{"@id":"https:\/\/machinelearning.humanativaspa.it\/en\/machine-learning-reinforcement-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-content\/uploads\/sites\/4\/2023\/03\/reinforcement_learning.jpg","datePublished":"2022-04-13T13:17:48+00:00","dateModified":"2023-03-03T13:59:48+00:00","breadcrumb":{"@id":"https:\/\/machinelearning.humanativaspa.it\/en\/machine-learning-reinforcement-learning\/#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/machinelearning.humanativaspa.it\/en\/machine-learning-reinforcement-learning\/"]}]},{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/machinelearning.humanativaspa.it\/en\/machine-learning-reinforcement-learning\/#primaryimage","url":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-content\/uploads\/sites\/4\/20
23\/03\/reinforcement_learning.jpg","contentUrl":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-content\/uploads\/sites\/4\/2023\/03\/reinforcement_learning.jpg","width":1000,"height":500},{"@type":"BreadcrumbList","@id":"https:\/\/machinelearning.humanativaspa.it\/en\/machine-learning-reinforcement-learning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/machinelearning.humanativaspa.it\/en\/"},{"@type":"ListItem","position":2,"name":"Machine learning: reinforcement learning"}]},{"@type":"WebSite","@id":"https:\/\/machinelearning.humanativaspa.it\/en\/#website","url":"https:\/\/machinelearning.humanativaspa.it\/en\/","name":"HN Machine Learning","description":"","publisher":{"@id":"https:\/\/machinelearning.humanativaspa.it\/en\/#organization"},"alternateName":"Humanativa","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/machinelearning.humanativaspa.it\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"},{"@type":"Organization","@id":"https:\/\/machinelearning.humanativaspa.it\/en\/#organization","name":"HN Machine Learning","url":"https:\/\/machinelearning.humanativaspa.it\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/machinelearning.humanativaspa.it\/en\/#\/schema\/logo\/image\/","url":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-content\/uploads\/sites\/4\/2023\/09\/libellula_hn.jpg","contentUrl":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-content\/uploads\/sites\/4\/2023\/09\/libellula_hn.jpg","width":696,"height":696,"caption":"HN Machine 
Learning"},"image":{"@id":"https:\/\/machinelearning.humanativaspa.it\/en\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/HumanativaGroupSpA\/"]},{"@type":"Person","@id":"https:\/\/machinelearning.humanativaspa.it\/en\/#\/schema\/person\/a6de167b6fe30bf1d2edfcbfd3417de8","name":"Andream","image":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/secure.gravatar.com\/avatar\/1f7a315d6f9ae9709ffc015996ad40b2c1779d16ea2dede3da3989ca3cf5aae8?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/1f7a315d6f9ae9709ffc015996ad40b2c1779d16ea2dede3da3989ca3cf5aae8?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/1f7a315d6f9ae9709ffc015996ad40b2c1779d16ea2dede3da3989ca3cf5aae8?s=96&d=mm&r=g","caption":"Andream"},"sameAs":["https:\/\/machinelearning.humanativaspa.it\/"],"url":"https:\/\/machinelearning.humanativaspa.it\/en\/author\/andream\/"}]}},"_links":{"self":[{"href":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/posts\/261","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/comments?post=261"}],"version-history":[{"count":1,"href":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/posts\/261\/revisions"}],"predecessor-version":[{"id":263,"href":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/posts\/261\/revisions\/263"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/media\/262"}],"wp:attachment":[{"href":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/media?parent=261"}],"wp:term":[{"taxonomy":"ca
tegory","embeddable":true,"href":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/categories?post=261"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/machinelearning.humanativaspa.it\/en\/wp-json\/wp\/v2\/tags?post=261"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}