Search Machine Learning Papers
Search Results for "reinforcement learning"
Found 20 papers
Imperfect also Deserves Reward: Multi-Level and Sequential Reward Modeling for Better Dialog Management
For task-oriented dialog systems, training a Reinforcement Learning (RL) based Dialog Management module suffers from low sample efficiency and slow convergence speed due to the sparse rewards in RL.To...
Parameter Efficient Reinforcement Learning from Human Feedback
While Reinforcement Learning from Human Feedback (RLHF) effectively aligns pretrained Large Language and Vision-Language Models (LLMs, and VLMs) with human preferences, its computational cost and comp...
Reinforcement Learning Enhanced LLMs: A Survey
Reinforcement learning (RL) enhanced large language models (LLMs), particularly exemplified by DeepSeek-R1, have exhibited outstanding performance. Despite the effectiveness in improving LLM capabilit...
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
Large language models (LLMs) provide excellent text-generation capabilities, but standard prompting and generation methods generally do not lead to intentional or goal-directed agents and might necess...
AgentGraph: Towards Universal Dialogue Management with Structured Deep Reinforcement Learning
Dialogue policy plays an important role in task-oriented spoken dialogue systems. It determines how to respond to users. The recently proposed deep reinforcement learning (DRL) approaches have been us...
Natural Language Reinforcement Learning
Artificial intelligence progresses towards the "Era of Experience," where agents are expected to learn from continuous, grounded interaction. We argue that traditional Reinforcement Learning (RL), whi...
RLTF: Reinforcement Learning from Unit Test Feedback
The goal of program synthesis, or code generation, is to generate executable code based on given descriptions. Recently, there has been an increasing number of studies employing reinforcement learning...
Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management
Deep reinforcement learning (RL) methods have significant potential for dialogue policy optimisation. However, they suffer from a poor performance in the early stages of learning. This is especially p...
Learning to Reason at the Frontier of Learnability
Reinforcement learning is now widely adopted as the final stage of large language model training, especially for reasoning-style tasks such as maths problems. Typically, models attempt each question m...
Using Large Language Models to Automate and Expedite Reinforcement Learning with Reward Machine
We present LARL-RM (Large language model-generated Automaton for Reinforcement Learning with Reward Machine) algorithm in order to encode high-level knowledge into reinforcement learning using automat...
IPCGRL: Language-Instructed Reinforcement Learning for Procedural Level Generation
Recent research has highlighted the significance of natural language in enhancing the controllability of generative models. While various efforts have been made to leverage natural language for conten...
Natural Language Specification of Reinforcement Learning Policies through Differentiable Decision Trees
Human-AI policy specification is a novel procedure we define in which humans can collaboratively warm-start a robot's reinforcement learning policy. This procedure is comprised of two steps; (1) Polic...
NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning
Recent advances such as DeepSeek R1-Zero highlight the effectiveness of incentive training, a reinforcement learning paradigm that computes rewards solely based on the final answer part of a language ...
Natural Language Reinforcement Learning
Reinforcement Learning (RL) has shown remarkable abilities in learning policies for decision-making tasks. However, RL is often hindered by issues such as low sample efficiency, lack of interpretabili...
Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads
We introduce an online popularity prediction and tracking task as a benchmark task for reinforcement learning with a combinatorial, natural language action space. A specified number of discussion thre...
Learning Two-Step Hybrid Policy for Graph-Based Interpretable Reinforcement Learning
We present a two-step hybrid reinforcement learning (RL) policy that is designed to generate interpretable and robust hierarchical policies on the RL problem with graph-based input. Unlike prior deep ...
Deep Reinforcement Learning with a Natural Language Action Space
This paper introduces a novel architecture for reinforcement learning with deep neural networks designed to handle state and action spaces characterized by natural language, as found in text-based gam...
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
We introduce Reasoning Gym (RG), a library of reasoning environments for reinforcement learning with verifiable rewards. It provides over 100 data generators and verifiers spanning multiple domains in...
Embracing advanced AI/ML to help investors achieve success: Vanguard Reinforcement Learning for Financial Goal Planning
In the world of advice and financial planning, there is seldom one right answer. While traditional algorithms have been successful in solving linear problems, its success often depends on choosing the...
Deep Reinforcement Learning for Chatbots Using Clustered Actions and Human-Likeness Rewards
Training chatbots using the reinforcement learning paradigm is challenging due to high-dimensional states, infinite action spaces and the difficulty in specifying the reward function. We address such ...