OCT Analysis with RETFound & Generative Augmentations
Biomedical vision · Generative modeling
- Co-authored a Bioengineering 2024 paper on RETFound-based retinal OCT feature detection.
- Fine-tuned a foundation model pretrained on 1.6M OCTs using 1,770 labeled B-scans (SRF/IRF/drusen/PED) and benchmarked single-task and multi-task fine-tuning against ResNet-50 baselines.
- Reached 0.75–0.80 AUC-ROC and ran exploratory data augmentation with Pix2Pix GANs and MONAI latent diffusion models.
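The AUC-ROC figures above are per-biomarker metrics in a multi-label setup. As a minimal sketch (not the paper's code; names and data are illustrative), the rank-based AUC can be computed per label like this:

```python
import numpy as np

def auc_roc(labels: np.ndarray, scores: np.ndarray) -> float:
    """Rank-based AUC-ROC: probability a random positive outranks a random negative."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()  # ties count as half
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# One AUC per biomarker (SRF/IRF/drusen/PED); toy labels and probabilities.
biomarkers = ["SRF", "IRF", "drusen", "PED"]
y_true = np.array([[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 1], [0, 0, 0, 1]])
y_prob = np.array([[0.9, 0.2, 0.7, 0.1], [0.3, 0.8, 0.6, 0.2],
                   [0.8, 0.7, 0.4, 0.9], [0.2, 0.1, 0.3, 0.6]])
per_label_auc = {b: auc_roc(y_true[:, i], y_prob[:, i]) for i, b in enumerate(biomarkers)}
```

In practice a library routine (e.g. scikit-learn's `roc_auc_score`) would replace the hand-rolled pairwise version, which is quadratic in the number of samples.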
Far-Field Speaker Verification on Mobile Robots
IEEE SP Cup 2024 · Speaker verification
- Won 1st place globally at IEEE SP Cup 2024 (ICASSP): adapted ERes2Net with targeted augmentations (RIR, MUSAN, speed perturbation) and robot-ready scoring (cosine similarity + adaptive s-norm).
- Final leaderboard: minDCF 0.67 and 8.93% EER.
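Adaptive s-norm, mentioned above, z-normalizes each raw cosine score against statistics of the top-k most competitive cohort (imposter) scores on both the enrollment and test sides. A sketch under assumed embedding shapes (not the competition code):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def adaptive_s_norm(enroll, test, cohort, top_k=3):
    """Adaptive s-norm: average of two z-normalized scores, each using
    the top-k cohort scores for one side of the trial."""
    raw = cosine(enroll, test)
    e_scores = np.sort([cosine(enroll, c) for c in cohort])[-top_k:]
    t_scores = np.sort([cosine(test, c) for c in cohort])[-top_k:]
    z_e = (raw - e_scores.mean()) / (e_scores.std() + 1e-8)
    z_t = (raw - t_scores.mean()) / (t_scores.std() + 1e-8)
    return float(0.5 * (z_e + z_t))

rng = np.random.default_rng(0)
enroll, test = rng.normal(size=192), rng.normal(size=192)  # toy speaker embeddings
cohort = rng.normal(size=(20, 192))                        # imposter embeddings
score = adaptive_s_norm(enroll, test, cohort)
```

The "adaptive" part is the top-k selection: only the cohort members closest to each trial side shape the normalization, which keeps calibration stable as acoustic conditions shift.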
Document-Level Text Simplification
Two-stage plan-guided transformer
Designed a plan→generate pipeline in which a RoBERTa planner labels each sentence with copy/rephrase/split/delete operations using surrounding context, then feeds the tags into SIMSUM’s summarizer→simplifier stack. Trained on R‑Wiki-Auto (12k docs) with curriculum scheduling, the model delivered SARI 43.56 / D-SARI 38.52 and held up on the out-of-domain PLABA medical corpus.
- Built a sentence-level planning component that predicts edit operations using document context.
- Conditioned generation on the planned operations to control simplification behavior and reduce unwanted deletions.
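One simple way to condition generation on a plan is to prepend each sentence with its predicted operation as a control token before the generator sees the document. This is an illustrative sketch of that interface, not the project's actual implementation; tag names and the handling of delete are assumptions:

```python
# Hypothetical plan-conditioned input construction: per-sentence planner
# labels become control tokens in the generator input.
OPS = {"copy", "rephrase", "split", "delete"}

def build_generator_input(sentences, plan):
    assert len(sentences) == len(plan) and set(plan) <= OPS
    parts = []
    for sent, op in zip(sentences, plan):
        if op == "delete":
            continue  # in this sketch, planned deletions never reach the generator
        parts.append(f"<{op.upper()}> {sent}")
    return " ".join(parts)

doc = ["The aetiology remains idiopathic.", "See appendix B.", "Symptoms vary widely."]
plan = ["rephrase", "delete", "copy"]
model_input = build_generator_input(doc, plan)
```

Making deletions an explicit planner decision, rather than something the generator does implicitly, is what gives the pipeline a lever against unwanted content loss.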
Exploring Self-Supervised Learning with DINO
Self-distillation · Representation learning
Reimplemented DINO’s student–teacher self-distillation with a momentum (EMA) teacher, multi-crop augmentations, and training-stability diagnostics on Imagenette. The distilled backbone exceeded supervised ResNet/Vision Transformer baselines by 12–20% top-1 accuracy, and its frozen features transfer cleanly to CIFAR-10/100 classification and Pascal VOC segmentation.
- Implemented the student-teacher training loop and stability diagnostics (EMA teacher, output centering, temperature schedules).
- Evaluated representation quality via frozen-backbone transfer to downstream tasks.
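The core of the training loop above is the DINO objective: cross-entropy between a sharpened, centered teacher distribution and the student distribution, with the teacher and the center both updated by exponential moving averages. A NumPy sketch of those pieces (shapes and hyperparameters illustrative):

```python
import numpy as np

def softmax(x, tau):
    z = (x - x.max(axis=-1, keepdims=True)) / tau
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dino_loss(student_out, teacher_out, center, tau_s=0.1, tau_t=0.04):
    """Cross-entropy from a centered, low-temperature (sharpened) teacher
    distribution to the student distribution; teacher gets no gradient."""
    p_t = softmax(teacher_out - center, tau_t)
    log_p_s = np.log(softmax(student_out, tau_s) + 1e-12)
    return float(-(p_t * log_p_s).sum(axis=-1).mean())

def ema_update(old, new, m=0.996):
    """Momentum update, used for both the teacher weights and the center."""
    return m * old + (1 - m) * new

rng = np.random.default_rng(0)
s_out = rng.normal(size=(8, 64))   # student logits for a batch of views
t_out = rng.normal(size=(8, 64))   # teacher logits for matching views
center = np.zeros(64)
loss = dino_loss(s_out, t_out, center)
center = ema_update(center, t_out.mean(axis=0), m=0.9)  # centering update
```

Centering and sharpening pull in opposite directions; keeping both is what prevents the collapse to a uniform or one-hot output that these stability diagnostics watch for.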
Deep Learning for OFDM Channel Estimation
Wireless communication · Model compression
Modeled a 64-subcarrier, 16-QAM OFDM link end-to-end (comb-pattern pilot insertion with 3+3j pilots, channel simulation, and demapping) and benchmarked classical LS/MMSE estimators against a skip-connected CNN that outputs the 64 complex channel taps as 64×2 real/imaginary pairs. The learned model closes much of the gap to MMSE at low SNR while clearly outperforming LS, all within a lightweight PyTorch training loop.
- Built an end-to-end simulation pipeline to generate training and evaluation data under controlled channel conditions.
- Compared learned estimators against classical baselines across SNR regimes.
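The LS baseline in such a pipeline divides received pilots by the known pilot symbols and interpolates across subcarriers. A toy NumPy version (pilot spacing, channel length, and noise level are assumptions for illustration, not the project's exact configuration):

```python
import numpy as np

rng = np.random.default_rng(1)
N, PILOT = 64, 3 + 3j            # subcarriers, comb pilot symbol
pilot_idx = np.arange(0, N, 4)   # comb pattern (spacing assumed)

# Toy frequency-selective channel: DFT of a short random impulse response.
h_time = (rng.normal(size=4) + 1j * rng.normal(size=4)) / np.sqrt(8)
H_true = np.fft.fft(h_time, N)

X = np.full(N, 1 + 1j, dtype=complex)   # placeholder data symbols
X[pilot_idx] = PILOT
noise = 0.05 * (rng.normal(size=N) + 1j * rng.normal(size=N))
Y = H_true * X + noise                  # flat per-subcarrier channel model

# LS estimate at pilot positions, then linear interpolation in between.
H_ls_p = Y[pilot_idx] / X[pilot_idx]
H_ls = np.interp(np.arange(N), pilot_idx, H_ls_p.real) \
     + 1j * np.interp(np.arange(N), pilot_idx, H_ls_p.imag)
mse = np.mean(np.abs(H_ls - H_true) ** 2)
```

A CNN estimator consumes the same pilot observations but learns the interpolation and denoising jointly, which is where the gains over plain LS come from.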
Comprehensive Review of Image Denoising
Classical + deep pipelines
Benchmarked wavelet, NLM, BM3D, and WNNM pipelines against autoencoder, DnCNN, RIDNet, CBDNet, and PRIDNet implementations on BSD400/CBSD68 at noise levels 15 and 25. Architectural tweaks (LeakyReLU activations, dropout, and cascaded enhancement attention) pushed RIDNet to SSIM 0.937 and 0.828 at the two noise levels, highlighting when classical priors still win and where deep residual learning shines.
- Ran a structured benchmark across classical priors and deep residual/attention models on standard noisy datasets.
- Documented failure modes and tradeoffs (quality vs compute) for practical denoising pipelines.
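Benchmarks like this typically report PSNR alongside SSIM. A minimal helper, assuming images normalized to [0, 1] (toy data, not the benchmark's actual images):

```python
import numpy as np

def psnr(clean: np.ndarray, denoised: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; higher is better."""
    mse = np.mean((clean - denoised) ** 2)
    if mse == 0:
        return float("inf")
    return float(10 * np.log10(peak ** 2 / mse))

rng = np.random.default_rng(0)
clean = rng.random((64, 64))
# The sigma=15 setting: additive Gaussian noise with std 15/255 on [0, 1] images.
noisy = np.clip(clean + rng.normal(scale=15 / 255, size=clean.shape), 0, 1)
baseline_psnr = psnr(clean, noisy)  # score of doing nothing; denoisers must beat this
```

Residual models like DnCNN predict the noise and subtract it (`denoised = noisy - model(noisy)`), so their PSNR gain over this do-nothing baseline directly measures how much noise the network removes without destroying structure.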
PID Control of Drone with Overhead Vision
Robotics club · Real-time control
Authored a Python SDK around the Pluto drone’s UDP protocol (ARM/BOXARM/SET_ATTITUDE) with interchangeable Xbox/keyboard teleop, then layered calibrated ArUco pose estimation for overhead feedback. Cropping the detection ROI to 300×300 shrank compute by 95.7%, letting PID loops run fast enough to hold course during Inter IIT drone swarm trials.
- Built a real-time control stack combining teleop, overhead vision pose estimation, and PID stabilization.
- Optimized the vision loop to keep compute bounded and latency stable during flight.
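The stabilization layer runs one PID loop per axis, fed by pose errors from the overhead ArUco detections. A minimal single-axis sketch (gains are illustrative, not the tuned flight values):

```python
class PID:
    """Single-axis PID controller for position hold."""

    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error: float, dt: float) -> float:
        """One control step: error is (setpoint - measured position)."""
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

# One controller per axis; errors come from ArUco pose estimates on the cropped ROI.
pid_x = PID(kp=1.2, ki=0.01, kd=0.4)
cmd = pid_x.update(error=0.5, dt=0.02)
```

Shrinking the detection ROI matters precisely because `dt` in this loop is bounded by vision latency: less pixel area per frame means higher control rates and tighter position hold.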