Mengdi Li / 李梦迪

I am currently a postdoctoral researcher at the Provable Responsible AI and Data Analytics (PRADA) Lab at King Abdullah University of Science and Technology (KAUST). Before that, I completed my Ph.D. in the Knowledge Technology group at the University of Hamburg, where I worked on training RL agents to actively collect task-relevant information. My dissertation is available here. Please also feel free to leave anonymous feedback and suggestions on my work and on me here.

Email  /  Google Scholar  /  Twitter  /  Github  /  Linkedin

profile photo

Research

I am working to unlock the potential of reinforcement learning techniques in complex real-world applications. My recent research interests include LLM self-improvement (with minimal human intervention), reward modeling for reinforcement learning, and synthetic data generation.

News

  • Our paper on using LLMs for orchestrating bimanual robots got accepted to Humanoids 2024. [Project webpage]
  • Our paper on enhancing zero-shot reasoning of LLMs got accepted to LREC-COLING 2024. [Paper link]
  • Our paper on explainable reinforcement learning got accepted for oral presentation to CLeaR 2024. [Paper link]
  • Our paper on using LLMs for robotic multimodal exploration got accepted to IROS 2023. [Paper link]
  • Our paper on stabilizing RL when the reward is produced by a jointly optimized reward model got accepted to ICML 2023. [Paper link]

Selected Publications

(see all publications here)

In reverse chronological order / * equal contribution

Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic
Xufeng Zhao, Mengdi Li, Wenhao Lu, Cornelius Weber, Jae Hee Lee, Stefan Wermter,
LREC-COLING, 2024
project page / arXiv

Aiming to improve the zero-shot chain-of-thought reasoning ability of LLMs, we propose LoT (Logical Thoughts), a neurosymbolic framework that leverages principles from symbolic logic to verify and revise reasoning processes.

Chat with the Environment: Interactive Multimodal Perception using Large Language Models
Xufeng Zhao, Mengdi Li, Cornelius Weber, Burhan Hafez, Stefan Wermter,
IROS, 2023
project page / code / video / arXiv / poster / slides

We develop an LLM-centered modular network to provide high-level planning and reasoning skills and control interactive robot behaviour in a multimodal environment.

Internally Rewarded Reinforcement Learning
Mengdi Li*, Xufeng Zhao*, Jae Hee Lee, Cornelius Weber, Stefan Wermter,
ICML, 2023
project page / code / arXiv / poster

We propose the clipped linear reward to stabilize reinforcement learning when reward signals for policy learning are generated by a discriminator-based reward model that depends on, and is jointly optimized with, the policy.

Robotic Occlusion Reasoning for Efficient Object Existence Prediction
Mengdi Li, Cornelius Weber, Matthias Kerzel, Jae Hee Lee, Zheni Zeng, Zhiyuan Liu, Stefan Wermter,
IROS, 2021
code / video / arXiv

We propose an RNN-based model that is jointly trained with supervised and reinforcement learning to predict the existence of objects in occlusion scenarios.

Neural Networks for Detecting Irrelevant Questions During Visual Question Answering
Mengdi Li, Cornelius Weber, Stefan Wermter,
ICANN, 2020
paper

We demonstrate that an efficient neural network designed for VQA can achieve high accuracy in detecting the relevance of questions to images; however, jointly training the model on relevance detection and VQA degrades VQA performance.

Generating Steganographic Image Description by Dynamic Synonym Substitution
Mengdi Li, Kai Mu, Ping Zhong, Juan Wen, Yiming Xue,
Signal Processing, 2019
paper

We propose a novel image captioning model that automatically generates stego image descriptions. The model produces high-quality descriptions according to both human evaluation and statistical analysis.


The template of this website is borrowed from Jon Barron.