Available for new roles — 2026

Samad Rehan

Building production-grade LLM systems, RAG pipelines, and GPU-accelerated inference backends that ship with measurable impact on latency, cost, and reliability.


Engineering ML systems
that work in production.

I'm a Machine Learning Engineer based in Mumbai, India. My work lives at the intersection of LLM systems, NLP pipelines, and production inference infrastructure — where the hard problems aren't just accuracy but latency, cost, and reliability at scale.

I care about the parts of ML that get underspecified: ordering of components, failure modes, safety constraints, and the tradeoffs that only become clear when real traffic hits a system.

25–40%
Latency reduction
60%
Fewer deploy regressions
35–45%
Higher task compliance

Tech stack.

🤖

LLMs & NLP

LLM Fine-tuning · Instruction Tuning · RAG · NL2SQL · NER · Prompt Engineering · ASR Pipelines · Hindi / Hinglish NLP
⚙️

ML Engineering

PyTorch · Hugging Face · Scikit-learn · XGBoost · Feature Engineering · Model Evaluation · Computer Vision · OpenCV
☁️

MLOps & Cloud

FastAPI · Docker · AWS EC2 / S3 / ECR · GPU Inference · CI/CD for ML · Model Versioning · Linux · REST APIs
💻

Languages

Python · C · C++ · Java · SQL · Bash

Where I've worked.

Assistant Systems Engineer

Tata Consultancy Services · Mumbai, India

Mar 2026 — Present
  • Joined TCS Mumbai as an Assistant Systems Engineer, contributing to enterprise-scale systems engineering and technology delivery.
Systems Engineering · Enterprise Tech · TCS

Machine Learning Engineer

Augurs Technologies · Lucknow, India

Dec 2025 — Feb 2026
  • Designed and deployed LLM-powered systems achieving 25–40% lower latency and 30% lower compute cost per request via prompt and inference-path optimization.
  • Built FastAPI inference backends for LLMs and RAG pipelines sustaining sub-second P95 latency at thousands of requests/day with structured logging and auth.
  • Performed LLM fine-tuning (instruction tuning, response calibration) improving task compliance by 35–45% and cutting invalid outputs by 23%.
  • Applied MLOps practices (versioning, monitoring, reproducible deployments), reducing deployment regressions by 60% and shortening rollback time.
LLM Fine-tuning · FastAPI · RAG · MLOps · GPU Inference · AWS

Flagship work.

🏥
LLM · NER · ASR

AI Medical Scribe

Real-time transcription and structured clinical note generation from noisy Hindi/Hinglish doctor–patient conversations. Entity-first pipeline mitigates hallucinated clinical facts.

Reduced hallucinated clinical entities significantly
Identified text normalization as primary error source
Consistent OPD note structure across sessions
FastAPI · Vosk ASR · Llama 3.1 · Ollama · Hindi NLP
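The "entity-first" idea can be sketched as: extract clinical entities before note generation, then verify the drafted note mentions only entities actually present in the transcript. The drug patterns and note template below are illustrative stand-ins for the real NER model and LLM.

```python
import re
from dataclasses import dataclass

# Hypothetical sketch of an entity-first pipeline: entities are
# extracted up front, generation is constrained to them, and any
# entity in the output that was never extracted is flagged as a
# likely hallucination.
DRUG_PATTERN = re.compile(r"\b(paracetamol|metformin|amoxicillin)\b", re.I)

@dataclass
class Entities:
    drugs: list

def extract_entities(transcript: str) -> Entities:
    return Entities(drugs=sorted({m.lower() for m in DRUG_PATTERN.findall(transcript)}))

def draft_note(entities: Entities) -> str:
    # stand-in for the LLM call, constrained to the extracted entities
    meds = ", ".join(entities.drugs) or "none recorded"
    return f"Medications: {meds}"

def hallucinated_drugs(note: str, entities: Entities) -> list:
    # flag any drug in the note that was never in the transcript
    mentioned = {m.lower() for m in DRUG_PATTERN.findall(note)}
    return sorted(mentioned - set(entities.drugs))
```

The verification step is what turns "the model probably didn't hallucinate" into a checkable property per note.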
💬
NLP · SQL · LLM

NL-to-SQL Chatbot

Conversational chatbot that translates natural language to safe, validated SQL across multiple schemas. Context-aware multi-turn memory with schema-aware routing and pronoun/reference resolution.

Multi-schema SQL generation with safety validation
Accurate multi-turn follow-up resolution
Combined structured DB + conversational replies
FastAPI · LLM Inference · SQL · Session Memory · Schema Routing
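A safety gate for LLM-generated SQL like the one described might look like the sketch below: single read-only statement, no DML/DDL keywords, and only tables the routed schema exposes. The allow-list and checks are illustrative, not the project's actual validator.

```python
import re

# Hypothetical SQL safety gate applied before executing LLM output.
ALLOWED_TABLES = {"orders", "customers"}  # illustrative schema allow-list
FORBIDDEN = re.compile(r"\b(insert|update|delete|drop|alter|truncate)\b", re.I)

def validate_sql(sql: str) -> bool:
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:                      # reject multi-statement payloads
        return False
    if not stripped.lower().startswith("select"):
        return False                         # read-only queries only
    if FORBIDDEN.search(stripped):
        return False
    # every referenced table must be on the allow-list
    tables = re.findall(r"\b(?:from|join)\s+(\w+)", stripped, re.I)
    return all(t.lower() in ALLOWED_TABLES for t in tables)
```

A regex gate is a coarse first line of defense; production systems typically pair it with a proper SQL parser and read-only database credentials.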
📊
Applied ML · Open Source

Dynamic Pricing System

End-to-end retail pricing platform combining synthetic data generation, rule-based heuristics, and ML-driven price optimization with hard business safety constraints and full operational observability.

ML corrections outperformed static rule-based baselines
Cold-start bootstrapping via synthetic data generation
Interactive admin dashboard for SKU exploration
FastAPI · Flask · XGBoost · Synthetic Data · Dashboard
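"Hard business safety constraints" over an ML price suggestion typically means clamping: limit how far a single update can move a price, then enforce absolute floor/ceiling bounds. The sketch below is illustrative; the bounds and step limit are assumptions, not the system's real parameters.

```python
# Hypothetical guardrail layer over an ML-suggested price: whatever the
# model proposes, the final price is step-limited relative to the
# current price and clamped to absolute business bounds.
def apply_price_guardrails(current: float, suggested: float,
                           floor: float, ceiling: float,
                           max_step_pct: float = 0.10) -> float:
    # limit how far one update can move the price (e.g. ±10%)
    lo = current * (1 - max_step_pct)
    hi = current * (1 + max_step_pct)
    stepped = min(max(suggested, lo), hi)
    # then enforce absolute floor/ceiling business bounds
    return round(min(max(stepped, floor), ceiling), 2)
```

Applying the guardrails after the model, rather than inside it, keeps the safety behavior auditable independently of model changes.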
🎙️
NLP · LLM

Minutes of Meeting AI

Structured meeting intelligence system extracting decisions, action items, and accountability from multi-speaker conversations. Constrained extraction as an alternative to generic abstractive summaries.

Consistent action-item extraction across meetings
Reduced variance vs. free-form LLM summaries
Handles interruptions and implicit decisions
LLM · Structured Extraction · Multi-speaker ASR · Post-processing
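Constrained extraction usually means asking the LLM for JSON against a fixed schema and rejecting anything that fails validation, rather than accepting free-form prose. A minimal sketch, with illustrative field names:

```python
import json

# Hypothetical validation step for LLM-extracted action items: only
# well-formed items with every required field present survive.
REQUIRED_FIELDS = {"owner", "task"}

def parse_action_items(llm_output: str) -> list:
    try:
        items = json.loads(llm_output)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    # keep only dict items whose required fields are present and non-empty
    return [i for i in items
            if isinstance(i, dict) and all(i.get(f) for f in REQUIRED_FIELDS)]
```

Dropping malformed items instead of passing them through is what reduces variance versus free-form summaries: the downstream format is guaranteed, not hoped for.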
📈
ML · Transformers · Finance

Stock Market Predictor

Full-stack ML application using a TCN Transformer achieving 98% R² on stock forecasting. Real-time prediction pipeline handling thousands of data points per second with 40% latency reduction.

98% R² score on historical datasets
40% query latency reduction
Real-time pipeline at scale
PyTorch · TCN Transformer · Async APIs · Caching
🏋️
ML · FastAPI · XGBoost

Gym PR Predictor

Personalized 1RM prediction app trained on your own workout history. Full-stack FastAPI + vanilla JS with XGBoost model, confidence bands, and Epley baseline for lifter-specific strength insights.

0.99 R² and low MAE on real workout logs
Personalized 1RM vs. log PR comparison per exercise
Interactive dark UI with recent prediction history
FastAPI · XGBoost · scikit-learn · Pandas · Vanilla JS
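The Epley baseline referenced above is the standard formula estimating one-rep max from a submaximal set: 1RM = weight × (1 + reps / 30). The ML model then personalizes around that baseline.

```python
# Standard Epley one-rep-max estimate from a submaximal set.
def epley_1rm(weight: float, reps: int) -> float:
    if reps < 1:
        raise ValueError("reps must be >= 1")
    if reps == 1:
        return float(weight)  # a single rep is itself the 1RM observation
    return round(weight * (1 + reps / 30), 1)
```

For example, a 100 kg set of 10 estimates a ~133 kg max; the baseline gives the model a physically sensible prior before any lifter-specific correction.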

Credentials.

Cloud Practitioner

Amazon Web Services · Nov 2024

Solutions Architect Associate

Amazon Web Services · Mar 2025

CS50AI — AI with Python

Harvard University / edX · Oct 2025

CS50SQL — Databases with SQL

Harvard University / edX · Dec 2025

Azure Fundamentals AZ-900

Microsoft · Dec 2023

PCEP — Python Programmer

Python Institute · Aug 2022

Let's build something.

Open to full-time roles, contract work, and interesting LLM / MLOps challenges.