Projects


Recent Projects

StealthRL

Reinforcement learning · AI safety

  • Prototyped an RL-driven paraphrasing policy (GRPO + LoRA) tuned against detector feedback while preserving meaning and fluency.
  • Added semantic/fluency constraints plus an ESL-fairness penalty to reduce regressions across writing styles.
  • Reduced Fast-DetectGPT score from 0.587 to 0.458 while maintaining semantic similarity of 0.944 at the best checkpoint (exploratory).
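A minimal sketch of the group-relative update at the heart of GRPO, paired with an illustrative composite reward (the weights and function names below are placeholders, not the project's tuned values):

```python
def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sampled paraphrase's reward
    against the mean/std of its sampling group (no learned critic needed)."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5 or 1.0
    return [(r - mean) / std for r in rewards]

def paraphrase_reward(detector_score, semantic_sim, fluency, esl_penalty,
                      w_det=1.0, w_sem=0.5, w_flu=0.25, w_esl=0.25):
    """Composite reward: lower detector score is better, while semantic and
    fluency terms guard meaning and an ESL term penalizes style regressions."""
    return (w_det * (1.0 - detector_score) + w_sem * semantic_sim
            + w_flu * fluency - w_esl * esl_penalty)
```

Because advantages are normalized within each sampling group, rewards only need to be comparable among paraphrases of the same input, not globally calibrated.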

Prompt-Length Optimization via Reinforcement Learning

LLMs · Post-training

  • Formulated discrete prompt optimization as an RL problem (AdvBench, Pythia-70M) and trained with GRPO-style updates to shorten prompt suffixes while preserving likelihood.
  • Compressed prompt suffixes from 16 to 9 tokens (43.8% compression) while per-token log-likelihood improved from -1.64 to -1.12 on representative runs.
  • Ran ablations on entropy/credit assignment and compared against REINFORCE and PPO baselines.
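One way to sketch the reward such a formulation could use, trading off compression against likelihood preservation (the coefficients here are illustrative, not the project's actual values):

```python
def length_reward(orig_len, new_len, orig_ll, new_ll, alpha=1.0, beta=2.0):
    """Reward shorter suffixes while preserving per-token log-likelihood.
    alpha weights compression; beta weights the likelihood delta."""
    compression = 1.0 - new_len / orig_len   # fraction of tokens removed
    ll_delta = new_ll - orig_ll              # positive if likelihood improved
    return alpha * compression + beta * ll_delta
```

With the numbers reported above (16 to 9 tokens, -1.64 to -1.12 per-token log-likelihood), both terms are positive, so the policy is rewarded on both axes at once.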

Concept Bottleneck Models for Chest X-ray

Interpretability · Medical imaging

  • Exploring chest X-ray interpretability using VLMs (CheXAgent/MedGemma), concept detectors (CheX-DETR), and CBM layers.
  • Comparing CBM variants including LF-CBM (BiomedCLIP) and VLG-CBM (CheX), and evaluating concept presence/accuracy via VLM/LLM queries.
  • Visualizing learned concepts and failure cases with Grad-CAM and saliency-style analyses.
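The concept-bottleneck idea itself is easy to sketch: every prediction must pass through an interpretable concept layer. A toy version with plain lists (the matrices below are made up for illustration):

```python
def cbm_predict(features, concept_weights, label_weights):
    """Two-stage concept bottleneck: image features -> concept scores ->
    label logits, so each prediction is mediated by readable concepts."""
    concepts = [sum(f * w for f, w in zip(features, row))
                for row in concept_weights]
    logits = [sum(c * w for c, w in zip(concepts, row))
              for row in label_weights]
    return concepts, logits
```

Inspecting (or intervening on) the intermediate `concepts` vector is what makes variants like LF-CBM and VLG-CBM auditable in a clinical setting.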

NYC Public Transit

Course project · Data analysis

  • Small course project analyzing NYC public transit data, with an emphasis on clean data processing and communicating results clearly.
  • Built reproducible analysis notebooks/scripts and produced plots/visuals to summarize findings.

Probability & Statistics Project Website

Web · Simulations

  • Built a lightweight website to present a probability and statistics project with clean writeups and visuals.
  • Focused on readable explanations and simple interactive elements to communicate the analysis.

LLM Test-Time Scaling using Process Reward Models

LLMs · Post-training

  • Built an LLM test-time reasoning prototype for Lean4 tasks using chain-of-thought style rollouts plus PRM-style scoring and reranking (exploratory).
  • Implemented high-throughput inference with vLLM/FlashAttention, KV-cache batching, and evaluation harnesses to measure success under fixed compute.
  • Launched Kubernetes jobs on shared A100-80GB clusters: fp16 inference, memory-capped dataloaders, PVC-mounted datasets, and containerized PyTorch/PEFT (LoRA) training.
  • Automated checkpoints and Weights & Biases logging for repeatable runs.
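The PRM-style reranking step can be sketched as best-of-N selection over rollouts, aggregating per-step scores (taking the minimum step score is one common convention; the data below is illustrative):

```python
def prm_rerank(rollouts, step_scores, agg=min):
    """Best-of-N: score each chain-of-thought rollout by aggregating its
    per-step PRM scores, then return the highest-scoring rollout."""
    scored = [(agg(scores), r) for r, scores in zip(rollouts, step_scores)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[0][1]
```

Using `min` as the aggregator penalizes a rollout for its single weakest reasoning step, which is often stricter than averaging.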

Older Projects

OCT Analysis with RETFound & Generative Augmentations

Biomedical vision · Generative modeling

  • Co-authored a Bioengineering 2024 paper on RETFound-based retinal OCT feature detection.
  • Fine-tuned a foundation model pretrained on 1.6M OCTs using 1,770 labeled B-scans (SRF/IRF/drusen/PED) and benchmarked single-task vs multi-task vs ResNet-50 baselines.
  • Reached 0.75-0.80 AUC-ROC and explored data augmentation via GAN/Pix2Pix and MONAI latent diffusion models (exploratory).

Far-Field Speaker Verification on Mobile Robots

IEEE SP Cup 2024 · Speaker verification

  • 1st place globally at IEEE SP Cup 2024 (ICASSP): adapted ERes2Net with targeted augmentations (RIR, MUSAN, speed) and robot-ready scoring (cosine + adaptive s-norm).
  • Final leaderboard: minDCF 0.67 and EER 8.93%.
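The s-norm scoring step can be sketched as symmetric z-normalization of a raw cosine score against cohort statistics (a simplified, non-adaptive version; the cohort values in the test are made up):

```python
def s_norm(score, enroll_cohort, test_cohort):
    """Symmetric score normalization: z-normalize the raw cosine score
    against enrollment- and test-side cohort scores, then average."""
    def znorm(s, cohort):
        m = sum(cohort) / len(cohort)
        sd = (sum((c - m) ** 2 for c in cohort) / len(cohort)) ** 0.5 or 1.0
        return (s - m) / sd
    return 0.5 * (znorm(score, enroll_cohort) + znorm(score, test_cohort))
```

The adaptive variant used in practice restricts each cohort to the top-scoring impostor trials, which stabilizes thresholds across far-field conditions.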

Document-Level Text Simplification

Two-stage plan-guided transformer

Designed a plan→generate pipeline in which a RoBERTa planner labels each sentence with copy/rephrase/split/delete operations using surrounding context, then feeds the tags into SIMSUM’s summarizer→simplifier stack. Trained on R‑Wiki-Auto (12k docs) with curriculum scheduling, the model delivered SARI 43.56 / D-SARI 38.52 and held up on the out-of-domain PLABA medical corpus.

  • Built a sentence-level planning component that predicts edit operations using document context.
  • Conditioned generation on the planned operations to control simplification behavior and reduce unwanted deletions.
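A toy sketch of plan-conditioned generation: each sentence is prefixed with its predicted edit tag before being passed downstream (the tag names follow the operations above, but the control-token format is illustrative):

```python
def apply_plan(sentences, ops):
    """Tag each sentence with its planned edit operation; deletions are
    dropped outright, everything else is passed on with a control token."""
    kept = []
    for sent, op in zip(sentences, ops):
        if op == "delete":
            continue
        kept.append(f"<{op}> {sent}")
    return kept
```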

Exploring Self-Supervised Learning with DINO

Self-distillation · Representation learning

Reimplemented DINO’s student–teacher self-distillation with momentum encoders, multi-crop augmentations, and moving-average diagnostics on Imagenette. The distilled backbone exceeded supervised ResNet/Vision Transformer baselines by 12–20% top-1 accuracy, and its frozen features transferred cleanly to CIFAR-10/100 classification and Pascal VOC segmentation.

  • Implemented the student-teacher training loop and stability diagnostics (EMA teacher, centering, temperature schedules).
  • Evaluated representation quality via frozen-backbone transfer to downstream tasks.
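The EMA teacher update that keeps this training stable is essentially a one-liner; a sketch over flat parameter lists (0.996 is DINO's typical base momentum, not necessarily the value used here):

```python
def ema_update(teacher, student, momentum=0.996):
    """Momentum (EMA) update: the teacher drifts slowly toward the student,
    which is what prevents the self-distillation loop from collapsing."""
    return [momentum * t + (1 - momentum) * s
            for t, s in zip(teacher, student)]
```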

Deep Learning for OFDM Channel Estimation

Wireless communication · Model compression

Modeled a 64-subcarrier, 16-QAM OFDM link end to end (pilot insertion with a 3+3j comb pattern, channel simulation, and demapping) and benchmarked classical LS/MMSE estimators against a skip-connected CNN that outputs the 64 complex channel taps as a 64×2 real-valued tensor. The learned model closes much of the MMSE gap at low SNR while significantly outperforming LS, all within a lightweight PyTorch training loop.

  • Built an end-to-end simulation pipeline to generate training and evaluation data under controlled channel conditions.
  • Compared learned estimators against classical baselines across SNR regimes.
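The classical LS baseline is simple enough to sketch directly: divide the received pilot symbols by the known transmitted pilots, then interpolate across the comb (interpolation omitted here; the pilot and channel values in the test are illustrative):

```python
def ls_estimate(rx_pilots, tx_pilots):
    """Least-squares channel estimate at pilot subcarriers: H_k = Y_k / X_k.
    The full pipeline interpolates these taps across non-pilot subcarriers."""
    return [y / x for y, x in zip(rx_pilots, tx_pilots)]
```

LS amplifies noise at low SNR because it does no averaging, which is exactly the regime where the learned estimator has the most room to help.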

Comprehensive Review of Image Denoising

Classical + deep pipelines

Benchmarked wavelet, NLM, BM3D, and WNNM pipelines against autoencoder, DnCNN, RIDNet, CBDNet, and PRIDNet implementations on BSD400/CBSD68 at noise levels 15 and 25. Architectural tweaks (LeakyReLU activations, dropout, and cascaded enhancement attention) pushed RIDNet to SSIM 0.937 / 0.828, highlighting when classical priors still win and where deep residual learning shines.

  • Ran a structured benchmark across classical priors and deep residual/attention models on standard noisy datasets.
  • Documented failure modes and tradeoffs (quality vs compute) for practical denoising pipelines.
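One of the benchmark's core metrics fits in a few lines; a minimal PSNR implementation over flattened images (SSIM is more involved and omitted here):

```python
import math

def psnr(ref, img, peak=255.0):
    """Peak signal-to-noise ratio between two equal-sized images,
    given as flat lists of pixel intensities."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, img)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(peak ** 2 / mse)
```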

PID Control of Drone with Overhead Vision

Robotics club · Real-time control

Authored a Python SDK around the Pluto drone’s UDP protocol (ARM/BOXARM/SET_ATTITUDE) with interchangeable Xbox/keyboard teleop, then layered calibrated ArUco pose estimation for overhead feedback. Cropping the detection ROI to 300×300 shrank compute by 95.7%, letting PID loops run fast enough to hold course during Inter IIT drone swarm trials.

  • Built a real-time control stack combining teleop, overhead vision pose estimation, and PID stabilization.
  • Optimized the vision loop to keep compute bounded and latency stable during flight.
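A minimal discrete PID loop of the kind used for station-keeping (the gains and timestep below are illustrative, not flight-tuned):

```python
class PID:
    """Discrete PID controller: proportional on the current error,
    integral over accumulated error, derivative on its rate of change."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def update(self, error):
        self.integral += error * self.dt
        deriv = (error - self.prev_err) / self.dt
        self.prev_err = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv
```

Keeping the vision loop's latency bounded matters because the derivative term assumes a fixed `dt`; jittery pose updates would otherwise inject noise into the control output.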