Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
Rafael Rafailov*, Yaswanth Chittepu*, Ryan Park*, Harshit Sikchi*, Joey Hejna*, W. Bradley Knox, Chelsea Finn, Scott Niekum (* Equal Contribution)
NeurIPS 2024
paper

We define and explore the reward over-optimization phenomenon in direct alignment algorithms, such as DPO.

A Dual Approach to Imitation Learning from Observations with Offline Datasets
Harshit Sikchi, Caleb Chuck, Amy Zhang, Scott Niekum
CoRL 2024
paper

A dual approach to Learning from Observations (LfO) that is principled, computationally efficient, and empirically performant.

Score Models for Offline Goal Conditioned Reinforcement Learning
Harshit Sikchi, Rohan Chitnis, Ahmed Touati, Alborz Geramifard, Amy Zhang, Scott Niekum
ICLR 2024; NeurIPS GCRL 2023
paper

A discriminator-free occupancy matching approach for performant offline goal-conditioned reinforcement learning.

Contrastive Preference Learning: Learning from Human Feedback without RL
Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh
ICLR 2024
paper

We propose Contrastive Preference Learning, a new supervised algorithm for learning optimal policies from regret-based preferences in general MDPs.

Dual RL: Unification and New Methods for Reinforcement and Imitation Learning
Harshit Sikchi, Qinqing Zheng, Amy Zhang, Scott Niekum
ICLR 2024 (Spotlight Presentation, Top 5%); ICLR RRL 2023; EWRL 2023
paper

A unification of RL and IL methods through the lens of duality that allows us to propose new methods for discriminator-free imitation learning and stable offline reinforcement learning.

A Ranking Game for Imitation Learning
Harshit Sikchi, Akanksha Saran, Wonjoon Goo, Scott Niekum
TMLR, 2022
project page / code / blog / talk / arXiv

A unifying framework for learning from demonstrations and preferences. We develop methods to solve previously unsolvable tasks in the Learning from Observation (LfO) setting.

Learning Off-Policy with Online Planning
Harshit Sikchi, Wenxuan Zhou, David Held
CoRL, 2021   (Oral Presentation, Best Paper Finalist)
project page / blog / video / arXiv

Improving RL with lookahead: LOOP is an efficient RL framework in which the policy selects the best action sequence using imaginary rollouts with a learned model, which can reduce its dependence on value-function errors. LOOP achieves strong performance across a range of tasks and problem settings.

Imitative Planning using Conditional Normalizing Flow
Shubhankar Agarwal, Harshit Sikchi, Cole Gulino, Eric Wilkinson
IROS BADUE 2022   (Best Paper)
paper / video

An imitative planning approach for autonomous driving that models the distribution of expert trajectories with a conditional normalizing flow.

Lyapunov Barrier Policy Optimization
Harshit Sikchi, Wenxuan Zhou, David Held
arXiv, 2021; NeurIPS Deep RL Workshop 2020; NeurIPS Real World RL Workshop 2020
paper / video

LBPO is a performant model-free safe RL method that leverages Lyapunov constraints as barriers to guarantee safety. Our method achieves state-of-the-art safety performance on OpenAI Safety Gym.

f-IRL: Inverse Reinforcement Learning via State Marginal Matching
Tianwei Ni*, Harshit Sikchi*, Yufei Wang*, Tejus Gupta*, Ben Eysenbach, Lisa Lee (* Equal Contribution - Dice Rolling)
CoRL, 2020
paper / project page

Want to get rid of alternating min-max optimization in IRL methods? Check out our method, which differentiates through max-entropy policy optimization to minimize divergence with the demonstrator. We learn transferable reward functions and achieve state-of-the-art performance in imitation learning.

Robust Lane Detection Using Multiple Features
Tejus Gupta*, Harshit Sikchi*, Debashish Chakravarty (* Equal Contribution - Dice Rolling)
Intelligent Vehicles Symposium (IV), 2018
paper

Robust lane detection by fusing multiple features.

Conference Talks