machine learning engineer & researcher

Brihat Sharma

I build and study machine learning systems.

I care about evaluation as much as capability: how we decide whether a model or an agent is actually good, not just whether it looks good.

evaluation
agents
measurement
clinical NLP
physics



01 — selected work

Projects


01

Agent Judge Calibration

A study of how reliably LLM judges score agentic tool-use trajectories. Measures inter-judge agreement across a five-axis rubric and shows that agents with similar success rates can have very different failure modes.

LLM evaluation · agents · judges · research

02

Medical Concept Features in Open-Weight LLMs

An interpretability project that trains sparse autoencoders on the residual stream of an open-weight model (Gemma-2-2B) to isolate features for medical concepts such as drugs, diseases, procedures, and symptoms. Features are grounded against medical ontologies (UMLS, SNOMED CT, RxNorm), used to steer behavior on a medical QA benchmark, and probed for spurious correlates: for example, a “diabetes” feature that actually tracks “age over 60.”

interpretability · SAEs · clinical NLP · safety

03

Exoplanet Atlas

An interactive explorer for the 6,000+ confirmed exoplanets in NASA's Exoplanet Archive. Browse and filter the catalog, fly through 3D orbit visualizations with a habitable-zone overlay, compare planets side by side, and explore discovery statistics, along with guides and games.

data viz · Three.js · astronomy · Next.js

02 — notes

Writing
