|
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
Rafael Rafailov*, Yaswanth Chittepu*, Ryan Park*, Harshit Sikchi*, Joey Hejna*, W. Bradley Knox, Chelsea Finn, Scott Niekum (* Equal Contribution), NeurIPS 2024; paper We define and explore the reward overoptimization phenomenon in direct alignment algorithms such as DPO.
|
A Dual Approach to Imitation Learning from Observations with Offline Datasets
Harshit Sikchi, Caleb Chuck, Amy Zhang, Scott Niekum, CoRL 2024; paper A dual approach to learning from observations (LfO) that is principled, computationally efficient, and empirically performant.
|
Score Models for Offline Goal Conditioned Reinforcement Learning
Harshit Sikchi, Rohan Chitnis, Ahmed Touati, Alborz Geramifard, Amy Zhang, Scott Niekum, ICLR 2024; NeurIPS GCRL 2023 paper A discriminator-free occupancy-matching approach for performant offline goal-conditioned reinforcement learning.
|
Contrastive Preference Learning: Learning from Human Feedback without RL
Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh, ICLR 2024; paper We propose Contrastive Preference Learning, a new supervised algorithm for learning optimal policies from regret-based preferences in general MDPs.
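A minimal sketch of a Bradley-Terry-style supervised preference objective over trajectory segments, the general family of losses the entry above refers to; the function name, the discounting, and the temperature `alpha` are illustrative assumptions rather than the paper's exact objective.

```python
import numpy as np

def preference_loss(logp_pos, logp_neg, alpha=0.1, gamma=0.99):
    """Negative log-likelihood that the preferred segment wins under a
    Bradley-Terry model over discounted, policy-log-probability segment scores.

    logp_pos / logp_neg: per-step log pi(a_t | s_t) of the current policy on
    the preferred and rejected segments (hypothetical inputs for illustration).
    """
    logp_pos = np.asarray(logp_pos)
    logp_neg = np.asarray(logp_neg)
    score_pos = alpha * np.sum(gamma ** np.arange(len(logp_pos)) * logp_pos)
    score_neg = alpha * np.sum(gamma ** np.arange(len(logp_neg)) * logp_neg)
    # P(preferred > rejected) = exp(score_pos) / (exp(score_pos) + exp(score_neg))
    return -(score_pos - np.logaddexp(score_pos, score_neg))
```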
|
Dual RL: Unification and New Methods for Reinforcement and Imitation Learning
Harshit Sikchi, Qinqing Zheng, Amy Zhang, Scott Niekum, ICLR 2024 (Spotlight Presentation, Top 5%); ICLR RRL 2023; EWRL 2023 paper A unification of RL and IL methods through the lens of duality, enabling new methods for discriminator-free imitation learning and stable offline reinforcement learning.
|
A Ranking Game for Imitation Learning
Harshit Sikchi, Akanksha Saran, Wonjoon Goo, Scott Niekum, TMLR, 2022 project page / code / blog / talk / arXiv A unifying framework for learning from demonstrations and preferences. We develop methods that solve previously unsolvable tasks in the Learning from Observation (LfO) setting.
|
Learning Off-Policy with Online Planning
Harshit Sikchi, Wenxuan Zhou, David Held, CoRL, 2021 (Oral Presentation, Best Paper Finalist) project page / blog / video / arXiv Improving RL with lookahead: LOOP is an efficient RL framework whose policy selects the best action sequence using imaginary rollouts with a learned model, which can reduce dependence on value function errors. LOOP achieves strong performance across a range of tasks and problem settings.
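A minimal sketch of the lookahead idea described above, assuming hypothetical stand-ins `propose_actions` (base policy), `dynamics_model`, `reward_model`, and `value_fn`; it illustrates generic H-step model-based action selection with a value-function bootstrap, not the exact LOOP implementation.

```python
import numpy as np

def lookahead_action(state, propose_actions, dynamics_model, reward_model,
                     value_fn, horizon=5, num_candidates=64, gamma=0.99):
    """Return the first action of the best imagined H-step rollout."""
    best_score, best_action = -np.inf, None
    for _ in range(num_candidates):
        s, total, discount, first_action = state, 0.0, 1.0, None
        for t in range(horizon):
            a = propose_actions(s)                   # sample from the base policy
            if t == 0:
                first_action = a
            total += discount * reward_model(s, a)   # accumulate imagined reward
            s = dynamics_model(s, a)                 # imagined next state
            discount *= gamma
        total += discount * value_fn(s)              # bootstrap with the learned value
        if total > best_score:
            best_score, best_action = total, first_action
    return best_action
```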
|
Imitative Planning using Conditional Normalizing Flow
Shubhankar Agarwal, Harshit Sikchi, Cole Gulino, Eric Wilkinson, IROS BADUE 2022 (Best Paper) paper / video Imitation-based planning for autonomous driving that models the distribution of expert trajectories with a conditional normalizing flow.
|
Lyapunov Barrier Policy Optimization
Harshit Sikchi, Wenxuan Zhou, David Held, arXiv, 2021; NeurIPS Deep RL Workshop 2020; NeurIPS Real World RL Workshop 2020 paper / video LBPO is a performant model-free safe RL method that leverages Lyapunov constraints as barriers to guarantee safety. Our method shows state-of-the-art safety performance on OpenAI Safety Gym.
|
f-IRL: Inverse Reinforcement Learning via State Marginal Matching
Tianwei Ni*, Harshit Sikchi*, Yufei Wang*, Tejus Gupta*, Ben Eysenbach, Lisa Lee (* Equal Contribution - Dice Rolling) CoRL, 2020 paper / project page Want to get rid of alternating min-max optimization in IRL methods? Check out our method, which differentiates through max-entropy policy optimization to minimize divergence with the demonstrator. We learn transferable reward functions and achieve state-of-the-art performance in imitation learning.
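Schematically, the divergence-minimization view described above can be written as follows (notation assumed, not the paper's exact statement): the learned reward shapes the state marginal of the MaxEnt-optimal policy so that it matches the expert's.

```latex
% Schematic state-marginal-matching objective (notation assumed):
% r_theta is the learned reward, rho_E the expert state marginal, and
% rho_theta the state marginal of the MaxEnt-optimal policy under r_theta;
% D_f denotes an f-divergence.
\min_{\theta} \; D_f\bigl( \rho_E(s) \,\|\, \rho_{\theta}(s) \bigr)
\quad \text{where} \quad
\rho_{\theta} = \text{state marginal of } \pi^{*}_{\mathrm{MaxEnt}}(r_\theta).
```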
|
Robust Lane Detection Using Multiple Features
Tejus Gupta*, Harshit Sikchi*, Debashish Chakravarty (* Equal Contribution - Dice Rolling), Intelligent Vehicles Symposium (IV), 2018 paper Robust lane detection by fusing multiple features.