Research
I am working to unlock the potential of reinforcement learning techniques in complex real-world applications.
My recent research interests include LLM self-improvement (with minimal human intervention), reward modeling for reinforcement learning, and synthetic data generation.
News
- Our paper on using LLMs for orchestrating bimanual robots was accepted to Humanoids 2024. [Project webpage]
- Our paper on enhancing zero-shot reasoning of LLMs was accepted to LREC-COLING 2024. [Paper link]
- Our paper on explainable reinforcement learning was accepted for an oral presentation at CLeaR 2024. [Paper link]
- Our paper on using LLMs for robotic multimodal exploration was accepted to IROS 2023. [Paper link]
- Our paper on stabilizing RL when the reward is produced by a jointly optimized reward model was accepted to ICML 2023. [Paper link]
Publications
Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic
Xufeng Zhao, Mengdi Li, Wenhao Lu, Cornelius Weber, Jae Hee Lee, Stefan Wermter
LREC-COLING, 2024
project page / arXiv
Aiming to improve the zero-shot chain-of-thought reasoning ability of LLMs, we propose LoT (Logical Thoughts), a neurosymbolic framework that leverages principles from symbolic logic to verify and revise reasoning processes.
Chat with the Environment: Interactive Multimodal Perception using Large Language Models
Xufeng Zhao, Mengdi Li, Cornelius Weber, Burhan Hafez, Stefan Wermter
IROS, 2023
project page / code / video / arXiv / poster / slides
We develop an LLM-centered modular network that provides high-level planning and reasoning skills and controls interactive robot behaviour in a multimodal environment.
Internally Rewarded Reinforcement Learning
Mengdi Li*, Xufeng Zhao*, Jae Hee Lee, Cornelius Weber, Stefan Wermter
ICML, 2023
project page / code / arXiv / poster
We propose the clipped linear reward to stabilize reinforcement learning when reward signals for policy learning are generated by a discriminator-based reward model that depends on, and is jointly optimized with, the policy.
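For illustration only, here is a minimal Python sketch of the general idea of a bounded, linear reward: a raw score from a noisy, jointly trained reward model is clipped before it reaches the policy. The function name, default bounds, and example scores are assumptions for this sketch, not the exact formulation from the paper.

# Illustrative sketch only; names and bounds are assumptions, not the paper's exact formulation.
def clipped_linear_reward(score: float, low: float = 0.0, high: float = 1.0) -> float:
    """Map a raw reward-model score to a reward that is linear within [low, high].

    Clipping limits the effect of extreme scores from an immature,
    jointly optimized reward model on policy updates.
    """
    return min(high, max(low, score))

# Example: raw scores from a noisy discriminator early in training.
for s in (-2.5, 0.3, 0.9, 4.0):
    print(f"score={s:+.1f} -> reward={clipped_linear_reward(s):.2f}")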
Robotic Occlusion Reasoning for Efficient Object Existence Prediction
Mengdi Li, Cornelius Weber, Matthias Kerzel, Jae Hee Lee, Zheni Zeng, Zhiyuan Liu, Stefan Wermter
IROS, 2021
code / video / arXiv
We propose an RNN-based model, jointly trained with supervised and reinforcement learning, to predict the existence of objects in occlusion scenarios.
Neural Networks for Detecting Irrelevant Questions During Visual Question Answering
Mengdi Li, Cornelius Weber, Stefan Wermter
ICANN, 2020
paper
We demonstrate that an efficient neural network designed for VQA can achieve high accuracy in detecting the relevance of questions to images; however, jointly training the model on relevance detection and VQA degrades VQA performance.
Generating Steganographic Image Description by Dynamic Synonym Substitution
Mengdi Li, Kai Mu, Ping Zhong, Juan Wen, Yiming Xue
Signal Processing, 2019
paper
We propose a novel image captioning model that automatically generates stego image descriptions; the generated descriptions are of high quality in both human evaluation and statistical analysis.
The template of this website is borrowed from Jon Barron.