Search Machine Learning Papers

Search Results for "reinforcement learning"

Found 20 papers

cs.CL cs.AI cs.LG
2021-04-10
Imperfect also Deserves Reward: Multi-Level and Sequential Reward Modeling for Better Dialog Management
Zhengxu Hou, Bang Liu, Ruihui Zhao, Zijing Ou, Yafei Liu, Xi Chen, Yefeng Zheng

For task-oriented dialog systems, training a Reinforcement Learning (RL) based Dialog Management module suffers from low sample efficiency and slow convergence speed due to the sparse rewards in RL. To...

cs.LG cs.AI cs.CL
2024-03-15
Parameter Efficient Reinforcement Learning from Human Feedback
Hakim Sidahmed, Samrat Phatale, Alex Hutcheson, Zhuonan Lin, Zhang Chen, Zac Yu, Jarvis Jin, Simral Chaudhary, Roman Komarytsia, Christiane Ahlheim, Yonghao Zhu, Bowen Li, Saravanan Ganesh, Bill Byrne, Jessica Hoffmann, Hassan Mansoor, Wei Li, Abhinav Rastogi, Lucas Dixon

While Reinforcement Learning from Human Feedback (RLHF) effectively aligns pretrained Large Language and Vision-Language Models (LLMs and VLMs) with human preferences, its computational cost and comp...

cs.CL cs.AI cs.LG
2024-12-05
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang, Shengyu Zhang, Jie Zhang, Runyi Hu, Xiaoya Li, Tianwei Zhang, Jiwei Li, Fei Wu, Guoyin Wang, Eduard Hovy

Reinforcement learning (RL) enhanced large language models (LLMs), particularly exemplified by DeepSeek-R1, have exhibited outstanding performance. Despite the effectiveness in improving LLM capabilit...

cs.CL cs.AI cs.LG
2023-11-30
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
Marwa Abdulhai, Isadora White, Charlie Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine

Large language models (LLMs) provide excellent text-generation capabilities, but standard prompting and generation methods generally do not lead to intentional or goal-directed agents and might necess...

cs.CL cs.AI cs.LG eess.AS
2019-05-27
AgentGraph: Towards Universal Dialogue Management with Structured Deep Reinforcement Learning
Lu Chen, Zhi Chen, Bowen Tan, Sishan Long, Milica Gasic, Kai Yu

Dialogue policy plays an important role in task-oriented spoken dialogue systems. It determines how to respond to users. The recently proposed deep reinforcement learning (DRL) approaches have been us...

cs.LG cs.AI cs.CL
2024-11-21
Natural Language Reinforcement Learning
Xidong Feng, Bo Liu, Yan Song, Haotian Fu, Ziyu Wan, Girish A. Koushik, Zhiyuan Hu, Mengyue Yang, Ying Wen, Jun Wang

Artificial intelligence progresses towards the "Era of Experience," where agents are expected to learn from continuous, grounded interaction. We argue that traditional Reinforcement Learning (RL), whi...

cs.AI cs.CL cs.LG
2023-07-10
RLTF: Reinforcement Learning from Unit Test Feedback
Jiate Liu, Yiqin Zhu, Kaiwen Xiao, Qiang Fu, Xiao Han, Wei Yang, Deheng Ye

The goal of program synthesis, or code generation, is to generate executable code based on given descriptions. Recently, there has been an increasing number of studies employing reinforcement learning...

cs.CL cs.AI cs.LG
2017-07-01
Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management
Pei-Hao Su, Pawel Budzianowski, Stefan Ultes, Milica Gasic, Steve Young

Deep reinforcement learning (RL) methods have significant potential for dialogue policy optimisation. However, they suffer from poor performance in the early stages of learning. This is especially p...

cs.LG cs.AI cs.CL
2025-02-17
Learning to Reason at the Frontier of Learnability
Thomas Foster, Anya Sims, Johannes Forkel, Mattie Fellows, Jakob Foerster

Reinforcement learning is now widely adopted as the final stage of large language model training, especially for reasoning-style tasks such as maths problems. Typically, models attempt each question m...

cs.LG cs.AI cs.CL
2024-02-11
Using Large Language Models to Automate and Expedite Reinforcement Learning with Reward Machine
Shayan Meshkat Alsadat, Jean-Raphael Gaglione, Daniel Neider, Ufuk Topcu, Zhe Xu

We present LARL-RM (Large language model-generated Automaton for Reinforcement Learning with Reward Machine) algorithm in order to encode high-level knowledge into reinforcement learning using automat...

cs.AI cs.CL cs.LG
2025-03-16
IPCGRL: Language-Instructed Reinforcement Learning for Procedural Level Generation
In-Chang Baek, Sung-Hyun Kim, Seo-Young Lee, Dong-Hyeon Kim, Kyung-Joong Kim

Recent research has highlighted the significance of natural language in enhancing the controllability of generative models. While various efforts have been made to leverage natural language for conten...

cs.LG cs.AI cs.CL cs.RO
2021-01-18
Natural Language Specification of Reinforcement Learning Policies through Differentiable Decision Trees
Pradyumna Tambwekar, Andrew Silva, Nakul Gopalan, Matthew Gombolay

Human-AI policy specification is a novel procedure we define in which humans can collaboratively warm-start a robot's reinforcement learning policy. This procedure comprises two steps: (1) Polic...

cs.CL cs.AI cs.LG
2025-05-21
NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning
Wei Liu, Siya Qi, Xinyu Wang, Chen Qian, Yali Du, Yulan He

Recent advances such as DeepSeek R1-Zero highlight the effectiveness of incentive training, a reinforcement learning paradigm that computes rewards solely based on the final answer part of a language ...

cs.CL cs.AI cs.LG
2024-02-11
Natural Language Reinforcement Learning
Xidong Feng, Ziyu Wan, Mengyue Yang, Ziyan Wang, Girish A. Koushik, Yali Du, Ying Wen, Jun Wang

Reinforcement Learning (RL) has shown remarkable abilities in learning policies for decision-making tasks. However, RL is often hindered by issues such as low sample efficiency, lack of interpretabili...

cs.CL cs.AI cs.LG
2016-06-12
Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads
Ji He, Mari Ostendorf, Xiaodong He, Jianshu Chen, Jianfeng Gao, Lihong Li, Li Deng

We introduce an online popularity prediction and tracking task as a benchmark task for reinforcement learning with a combinatorial, natural language action space. A specified number of discussion thre...

cs.LG cs.AI cs.CL
2022-01-21
Learning Two-Step Hybrid Policy for Graph-Based Interpretable Reinforcement Learning
Tongzhou Mu, Kaixiang Lin, Feiyang Niu, Govind Thattai

We present a two-step hybrid reinforcement learning (RL) policy that is designed to generate interpretable and robust hierarchical policies on the RL problem with graph-based input. Unlike prior deep ...

cs.AI cs.CL cs.LG
2015-11-14
Deep Reinforcement Learning with a Natural Language Action Space
Ji He, Jianshu Chen, Xiaodong He, Jianfeng Gao, Lihong Li, Li Deng, Mari Ostendorf

This paper introduces a novel architecture for reinforcement learning with deep neural networks designed to handle state and action spaces characterized by natural language, as found in text-based gam...

cs.LG cs.AI cs.CL
2025-05-30
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
Zafir Stojanovski, Oliver Stanley, Joe Sharratt, Richard Jones, Abdulhakeem Adefioye, Jean Kaddour, Andreas Köpf

We introduce Reasoning Gym (RG), a library of reasoning environments for reinforcement learning with verifiable rewards. It provides over 100 data generators and verifiers spanning multiple domains in...

q-fin.ST cs.AI cs.CL cs.LG
2021-10-18
Embracing advanced AI/ML to help investors achieve success: Vanguard Reinforcement Learning for Financial Goal Planning
Shareefuddin Mohammed, Rusty Bealer, Jason Cohen

In the world of advice and financial planning, there is seldom one right answer. While traditional algorithms have been successful in solving linear problems, their success often depends on choosing the...

cs.AI cs.CL cs.LG cs.NE
2019-08-27
Deep Reinforcement Learning for Chatbots Using Clustered Actions and Human-Likeness Rewards
Heriberto Cuayáhuitl, Donghyeon Lee, Seonghan Ryu, Sungja Choi, Inchul Hwang, Jihie Kim

Training chatbots using the reinforcement learning paradigm is challenging due to high-dimensional states, infinite action spaces and the difficulty in specifying the reward function. We address such ...