Safety Pretraining: Toward the Next Generation of Safe AI
Pratyush Maini*, Sachin Goyal*, Dylan Sam*, Alex Robey, Yash Savani, Yiding Jiang, Andy Zou, Zachary C. Lipton, J. Zico Kolter
NeurIPS, 2025
[pdf, website]
Predicting the Performance of Black-box LLMs through Follow-up Queries
Dylan Sam, Marc Finzi, and J. Zico Kolter
NeurIPS, 2025
ICML Reliable and Responsible Foundation Models, 2025
[pdf, code]
Analyzing Similarity Metrics for Data Selection for Language Model Pretraining
Dylan Sam, Ayan Chakrabarti, Afshin Rostamizadeh, Srikumar Ramalingam, Gui Citovsky, Sanjiv Kumar
NeurIPS, 2025
[pdf]
Evaluating Language Model Reasoning about Confidential Information
Dylan Sam, Alex Robey, Andy Zou, Matt Fredrikson, J. Zico Kolter
[pdf, dataset]
Finetuning CLIP to Reason about Pairwise Differences
Dylan Sam, Devin Willmott, Joao D. Semedo, and J. Zico Kolter
TMLR, 2025
[pdf, code]
Understanding Prompt Engineering Does Not Require Rethinking Generalization
Victor Akinwande, Yiding Jiang, Dylan Sam, and J. Zico Kolter
ICLR, 2024
ICML Sampling and Optimization in Discrete Spaces, 2023 (Outstanding Paper & Oral)
[pdf]
Auditing Fairness under Unobserved Confounders
Yewon Byun, Dylan Sam, Michael Oberst, Zachary C. Lipton, and Bryan Wilder
AISTATS, 2024
[pdf, code]
Computing Low-Entropy Couplings for Large-Support Distributions
Samuel Sokota, Dylan Sam, Christian Schroeder de Witt, Spencer Compton, Jakob Nicolaus Foerster, and J. Zico Kolter
UAI, 2024
[pdf, code]
Bayesian Neural Networks with Domain Knowledge Priors
Dylan Sam*, Rattana Pukdee*, Daniel P. Jeong, Yewon Byun, and J. Zico Kolter
ICML Knowledge and Logical Reasoning Workshop, 2023 (Oral)
[pdf, code]
Learning with Explanation Constraints
Rattana Pukdee*, Dylan Sam*, J. Zico Kolter, Maria-Florina Balcan, and Pradeep Ravikumar
NeurIPS, 2023
[pdf]
Label Propagation with Weak Supervision
Rattana Pukdee*, Dylan Sam*, Maria-Florina Balcan, and Pradeep Ravikumar
ICLR, 2023
[pdf, code]
Losses over Labels: Weakly Supervised Learning via Direct Loss Construction
Dylan Sam and J. Zico Kolter
AAAI, 2023
[pdf, code]
Improving Self-Supervised Representation Learning via Sequential Adversarial Masking
Dylan Sam, Min Bai, Tristan McKinney, and Li Erran Li
NeurIPS Workshop on Self-Supervised Learning - Theory and Practice, 2022
[pdf]
Adversarial Multi Class Learning under Weak Supervision with Performance Guarantees
Alessio Mazzetto*, Cyrus Cousins*, Dylan Sam, Stephen H Bach, and Eli Upfal
ICML, 2021
[pdf, code]
Semi-Supervised Aggregation of Dependent Weak Supervision Sources with Performance Guarantees
Alessio Mazzetto, Dylan Sam, Andrew Park, Eli Upfal, and Stephen H Bach
AISTATS, 2021
[pdf, code]
Learning from Dependent Weak Supervision Sources
Dylan Sam
Undergraduate Honors Thesis, 2021
[pdf]
Automated Data Accountability for the Mars Science Laboratory
Ryan Alimo, Dylan Sam, Dounia Lakhmiri, Brian Kahovec, and Dariush Divsalar
IEEE Aerospace Conference, 2021
[pdf, code]
Hierarchical Clustering Analysis of Spectral Fingerprints for Cheminformatics
Dylan Sam and Brenda M Rubenstein
NeurIPS Machine Learning for Molecules Workshop, 2020
[pdf]