pankayaraj

AAAI_2026_AdvBDGen Public

Code base for our work "AdvBDGen: Adversarially fortified prompt-specific fuzzy backdoor generator against LLM alignment"

Python, 2 stars, 2 forks
Robust_Deliberative_Alignment Public

Code for the work "Deliberative Alignment is Deep, but Uncertainty Remains: Inference time safety improvement in reasoning via attribution of unsafe behavior to base model"

Python
ACL_2026_REFORM Public

Code for the work "Reward Models Can Improve Themselves: Reward-Guided Adversarial Failure Mode Discovery for Robust Reward Modeling"

Python
AAAI_2025_RLHFPoisoning Public

"Is poisoning a real threat to LLM alignment? Maybe more so than you think" Pankayaraj Pathmanathan, Souradip Chakraborty, Xiangyu Liu, Yongyuan Liang, Furong Huang. ICML 2024 Workshop MHFAIA

Python, 10 stars, 3 forks
Cognitive_Computation-2023_Continual-Learning-With-Curiosity Public

"Using Curiosity for an Even Representation of Tasks in Continual Offline Reinforcement Learning" Pankayaraj Pathmanathan, Natalia Díaz-Rodríguez, Javier Del Ser. Cognitive Computation journal 2023

Python, 9 stars
Programming_Algorithms Public

Small library of personal implementations of programming algorithms

Python