Publications
Please see Google Scholar for more recent works and arXiv papers.
*: Equal contribution. †: Corresponding author.
2025
- [arXiv] Look Shallow, Think Deep: What Multimodal Chain-of-Thought Reasoning Can and Cannot Do. arXiv preprint (arXiv), 2025
- [ACL] Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents. In Annual Meeting of the Association for Computational Linguistics (ACL), 2025
- [ACL] A Troublemaker with Contagious Jailbreak Makes Chaos in Honest Towns. In Annual Meeting of the Association for Computational Linguistics (ACL), 2025
- [ACL] Evaluating Personalized Tool-Augmented LLMs from the Perspectives of Personalization and Proactivity. In Annual Meeting of the Association for Computational Linguistics (ACL), 2025
- [ACL] Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis. In Annual Meeting of the Association for Computational Linguistics (ACL), 2025