Search Machine Learning Papers

Search Results for "reinforcement learning"

Found 20 papers

cs.CL cs.AI cs.LG

2021-04-10

Imperfect also Deserves Reward: Multi-Level and Sequential Reward Modeling for Better Dialog Management

Zhengxu Hou, Bang Liu, Ruihui Zhao, Zijing Ou, Yafei Liu, Xi Chen, Yefeng Zheng

For task-oriented dialog systems, training a Reinforcement Learning (RL) based Dialog Management module suffers from low sample efficiency and slow convergence speed due to the sparse rewards in RL. To...

cs.CL cs.AI cs.LG

2024-12-05

Reinforcement Learning Enhanced LLMs: A Survey

Shuhe Wang, Shengyu Zhang, Jie Zhang, Runyi Hu, Xiaoya Li, Tianwei Zhang, Jiwei Li, Fei Wu, Guoyin Wang, Eduard Hovy

Reinforcement learning (RL) enhanced large language models (LLMs), particularly exemplified by DeepSeek-R1, have exhibited outstanding performance. Despite the effectiveness in improving LLM capabilit...

cs.LG cs.AI cs.CL

2025-08-07

Exploring Superior Function Calls via Reinforcement Learning

Bingguang Hao, Maolin Wang, Zengzhuang Xu, Yicheng Chen, Cunyin Peng, Jinjie GU, Chenyi Zhuang

Function calling capabilities are crucial for deploying Large Language Models in real-world applications, yet current training approaches fail to develop robust reasoning strategies. Supervised fine-t...

cs.CL cs.AI cs.LG

2023-11-30

LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models

Marwa Abdulhai, Isadora White, Charlie Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine

Large language models (LLMs) provide excellent text-generation capabilities, but standard prompting and generation methods generally do not lead to intentional or goal-directed agents and might necess...

cs.CL cs.AI cs.LG eess.AS

2019-05-27

AgentGraph: Towards Universal Dialogue Management with Structured Deep Reinforcement Learning

Lu Chen, Zhi Chen, Bowen Tan, Sishan Long, Milica Gasic, Kai Yu

Dialogue policy plays an important role in task-oriented spoken dialogue systems. It determines how to respond to users. The recently proposed deep reinforcement learning (DRL) approaches have been us...

cs.AI cs.CL cs.LG

2023-07-10

RLTF: Reinforcement Learning from Unit Test Feedback

Jiate Liu, Yiqin Zhu, Kaiwen Xiao, Qiang Fu, Xiao Han, Wei Yang, Deheng Ye

The goal of program synthesis, or code generation, is to generate executable code based on given descriptions. Recently, there has been an increasing number of studies employing reinforcement learning...

cs.LG cs.AI cs.CL

2024-11-21

Natural Language Reinforcement Learning

Xidong Feng, Bo Liu, Yan Song, Haotian Fu, Ziyu Wan, Girish A. Koushik, Zhiyuan Hu, Mengyue Yang, Ying Wen, Jun Wang

Artificial intelligence progresses towards the "Era of Experience," where agents are expected to learn from continuous, grounded interaction. We argue that traditional Reinforcement Learning (RL), whi...

cs.CL cs.AI cs.LG

2017-07-01

Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management

Pei-Hao Su, Pawel Budzianowski, Stefan Ultes, Milica Gasic, Steve Young

Deep reinforcement learning (RL) methods have significant potential for dialogue policy optimisation. However, they suffer from a poor performance in the early stages of learning. This is especially p...

cs.LG cs.AI cs.CL

2025-02-17

Learning to Reason at the Frontier of Learnability

Thomas Foster, Anya Sims, Johannes Forkel, Mattie Fellows, Jakob Foerster

Reinforcement learning is now widely adopted as the final stage of large language model training, especially for reasoning-style tasks such as maths problems. Typically, models attempt each question m...

cs.LG cs.AI cs.CL

2024-02-11

Using Large Language Models to Automate and Expedite Reinforcement Learning with Reward Machine

Shayan Meshkat Alsadat, Jean-Raphael Gaglione, Daniel Neider, Ufuk Topcu, Zhe Xu

We present LARL-RM (Large language model-generated Automaton for Reinforcement Learning with Reward Machine) algorithm in order to encode high-level knowledge into reinforcement learning using automat...

cs.LG cs.AI cs.CL cs.RO

2021-01-18

Natural Language Specification of Reinforcement Learning Policies through Differentiable Decision Trees

Pradyumna Tambwekar, Andrew Silva, Nakul Gopalan, Matthew Gombolay

Human-AI policy specification is a novel procedure we define in which humans can collaboratively warm-start a robot's reinforcement learning policy. This procedure is comprised of two steps; (1) Polic...

cs.CL cs.AI cs.LG

2025-05-21

NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning

Wei Liu, Siya Qi, Xinyu Wang, Chen Qian, Yali Du, Yulan He

Recent advances such as DeepSeek R1-Zero highlight the effectiveness of incentive training, a reinforcement learning paradigm that computes rewards solely based on the final answer part of a language ...

cs.AI cs.CL cs.LG

2025-03-16

IPCGRL: Language-Instructed Reinforcement Learning for Procedural Level Generation

In-Chang Baek, Sung-Hyun Kim, Seo-Young Lee, Dong-Hyeon Kim, Kyung-Joong Kim

Recent research has highlighted the significance of natural language in enhancing the controllability of generative models. While various efforts have been made to leverage natural language for conten...

cs.CL cs.AI cs.LG

2024-02-11

Natural Language Reinforcement Learning

Xidong Feng, Ziyu Wan, Mengyue Yang, Ziyan Wang, Girish A. Koushik, Yali Du, Ying Wen, Jun Wang

Reinforcement Learning (RL) has shown remarkable abilities in learning policies for decision-making tasks. However, RL is often hindered by issues such as low sample efficiency, lack of interpretabili...

cs.CL cs.AI cs.LG

2016-06-12

Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads

Ji He, Mari Ostendorf, Xiaodong He, Jianshu Chen, Jianfeng Gao, Lihong Li, Li Deng

We introduce an online popularity prediction and tracking task as a benchmark task for reinforcement learning with a combinatorial, natural language action space. A specified number of discussion thre...

cs.AI cs.CL cs.LG

2015-11-14

Deep Reinforcement Learning with a Natural Language Action Space

Ji He, Jianshu Chen, Xiaodong He, Jianfeng Gao, Lihong Li, Li Deng, Mari Ostendorf

This paper introduces a novel architecture for reinforcement learning with deep neural networks designed to handle state and action spaces characterized by natural language, as found in text-based gam...

cs.LG cs.AI cs.CL

2022-01-21

Learning Two-Step Hybrid Policy for Graph-Based Interpretable Reinforcement Learning

Tongzhou Mu, Kaixiang Lin, Feiyang Niu, Govind Thattai

We present a two-step hybrid reinforcement learning (RL) policy that is designed to generate interpretable and robust hierarchical policies on the RL problem with graph-based input. Unlike prior deep ...

cs.LG cs.AI cs.CL

2025-05-30

REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Zafir Stojanovski, Oliver Stanley, Joe Sharratt, Richard Jones, Abdulhakeem Adefioye, Jean Kaddour, Andreas Köpf

We introduce Reasoning Gym (RG), a library of reasoning environments for reinforcement learning with verifiable rewards. It provides over 100 data generators and verifiers spanning multiple domains in...

q-fin.ST cs.AI cs.CL cs.LG

2021-10-18

Embracing advanced AI/ML to help investors achieve success: Vanguard Reinforcement Learning for Financial Goal Planning

Shareefuddin Mohammed, Rusty Bealer, Jason Cohen

In the world of advice and financial planning, there is seldom one right answer. While traditional algorithms have been successful in solving linear problems, its success often depends on choosing the...

cs.AI cs.CL cs.LG cs.NE

2019-08-27

Deep Reinforcement Learning for Chatbots Using Clustered Actions and Human-Likeness Rewards

Heriberto Cuayáhuitl, Donghyeon Lee, Seonghan Ryu, Sungja Choi, Inchul Hwang, Jihie Kim

Training chatbots using the reinforcement learning paradigm is challenging due to high-dimensional states, infinite action spaces and the difficulty in specifying the reward function. We address such ...
