Available for new roles — 2026

Samad Rehan

Building production-grade LLM systems, RAG pipelines, and GPU-accelerated inference backends that ship with measurable impact on latency, cost, and reliability.


Engineering ML systems
that work in production.

I'm a Machine Learning Engineer based in Mumbai, India. My work lives at the intersection of LLM systems, NLP pipelines, and production inference infrastructure — where the hard problems aren't just accuracy but latency, cost, and reliability at scale.

I care about the parts of ML that get underspecified: ordering of components, failure modes, safety constraints, and the tradeoffs that only become clear when real traffic hits a system.

25–40%
Latency reduction
60%
Fewer deploy regressions
35–45%
Higher task compliance

Tech stack.

🤖

LLMs & NLP

LLM Fine-tuning · Instruction Tuning · RAG · NL2SQL · NER · Prompt Engineering · ASR Pipelines · Hindi / Hinglish NLP
⚙️

ML Engineering

PyTorch · Hugging Face · Scikit-learn · XGBoost · Feature Engineering · Model Evaluation · Computer Vision · OpenCV
☁️

MLOps & Cloud

FastAPI · Docker · AWS EC2 / S3 / ECR · GPU Inference · CI/CD for ML · Model Versioning · Linux · REST APIs
💻

Languages

Python · C · C++ · Java · SQL · Bash

Where I've worked.

Assistant Systems Engineer

Tata Consultancy Services · Mumbai, India

Mar 2026 — Present
  • Joined TCS Mumbai as an Assistant Systems Engineer, contributing to enterprise-scale systems engineering and technology delivery.
Systems Engineering · Enterprise Tech · TCS

Machine Learning Engineer

Augurs Technologies · Lucknow, India

Dec 2025 — Feb 2026
  • Designed and deployed LLM-powered systems achieving 25–40% lower latency and 30% lower compute cost per request via prompt and inference-path optimization.
  • Built FastAPI inference backends for LLMs and RAG pipelines sustaining sub-second P95 latency at thousands of requests/day with structured logging and auth.
  • Performed LLM fine-tuning (instruction tuning, response calibration) improving task compliance by 35–45% and cutting invalid outputs by 23%.
  • Applied MLOps practices (versioning, monitoring, reproducible deployments), reducing deployment regressions by 60% and shortening rollback time.
LLM Fine-tuning · FastAPI · RAG · MLOps · GPU Inference · AWS

Flagship work.

🏥
LLM · NER · ASR

AI Medical Scribe

Real-time transcription and structured clinical note generation from noisy Hindi/Hinglish doctor–patient conversations. Entity-first pipeline mitigates hallucinated clinical facts.

Reduced hallucinated clinical entities significantly
Identified text normalization as primary error source
Consistent OPD note structure across sessions
FastAPI · Vosk ASR · Llama 3.1 · Ollama · Hindi NLP
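The "entity-first" idea can be sketched as: extract clinical entities before note generation, then verify the drafted note mentions only entities actually present in the transcript. The drug patterns and note template below are illustrative stand-ins for the real NER model and LLM.

```python
import re
from dataclasses import dataclass

# Hypothetical sketch of an entity-first pipeline: entities are
# extracted up front, generation is constrained to them, and any
# entity in the output that was never extracted is flagged as a
# likely hallucination.
DRUG_PATTERN = re.compile(r"\b(paracetamol|metformin|amoxicillin)\b", re.I)

@dataclass
class Entities:
    drugs: list

def extract_entities(transcript: str) -> Entities:
    return Entities(drugs=sorted({m.lower() for m in DRUG_PATTERN.findall(transcript)}))

def draft_note(entities: Entities) -> str:
    # stand-in for the LLM call, constrained to the extracted entities
    meds = ", ".join(entities.drugs) or "none recorded"
    return f"Medications: {meds}"

def hallucinated_drugs(note: str, entities: Entities) -> list:
    # flag any drug in the note that was never in the transcript
    mentioned = {m.lower() for m in DRUG_PATTERN.findall(note)}
    return sorted(mentioned - set(entities.drugs))
```

The verification step is what turns "the model probably didn't hallucinate" into a checkable property per note.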
💬
NLP · SQL · LLM

NL-to-SQL Chatbot

Conversational chatbot that translates natural language to safe, validated SQL across multiple schemas. Context-aware multi-turn memory with schema-aware routing and pronoun/reference resolution.

Multi-schema SQL generation with safety validation
Accurate multi-turn follow-up resolution
Combined structured DB + conversational replies
FastAPI · LLM Inference · SQL · Session Memory · Schema Routing
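A safety gate for LLM-generated SQL like the one described might look like the sketch below: single read-only statement, no DML/DDL keywords, and only tables the routed schema exposes. The allow-list and checks are illustrative, not the project's actual validator.

```python
import re

# Hypothetical SQL safety gate applied before executing LLM output.
ALLOWED_TABLES = {"orders", "customers"}  # illustrative schema allow-list
FORBIDDEN = re.compile(r"\b(insert|update|delete|drop|alter|truncate)\b", re.I)

def validate_sql(sql: str) -> bool:
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:                      # reject multi-statement payloads
        return False
    if not stripped.lower().startswith("select"):
        return False                         # read-only queries only
    if FORBIDDEN.search(stripped):
        return False
    # every referenced table must be on the allow-list
    tables = re.findall(r"\b(?:from|join)\s+(\w+)", stripped, re.I)
    return all(t.lower() in ALLOWED_TABLES for t in tables)
```

A regex gate is a coarse first line of defense; production systems typically pair it with a proper SQL parser and read-only database credentials.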
📊
Applied ML · Open Source

Dynamic Pricing System

End-to-end retail pricing platform combining synthetic data generation, rule-based heuristics, and ML-driven price optimization with hard business safety constraints and full operational observability.

ML corrections outperformed static rule-based baselines
Cold-start bootstrapping via synthetic data generation
Interactive admin dashboard for SKU exploration
FastAPI · Flask · XGBoost · Synthetic Data · Dashboard
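"Hard business safety constraints" over an ML price suggestion typically means clamping: limit how far a single update can move a price, then enforce absolute floor/ceiling bounds. The sketch below is illustrative; the bounds and step limit are assumptions, not the system's real parameters.

```python
# Hypothetical guardrail layer over an ML-suggested price: whatever the
# model proposes, the final price is step-limited relative to the
# current price and clamped to absolute business bounds.
def apply_price_guardrails(current: float, suggested: float,
                           floor: float, ceiling: float,
                           max_step_pct: float = 0.10) -> float:
    # limit how far one update can move the price (e.g. ±10%)
    lo = current * (1 - max_step_pct)
    hi = current * (1 + max_step_pct)
    stepped = min(max(suggested, lo), hi)
    # then enforce absolute floor/ceiling business bounds
    return round(min(max(stepped, floor), ceiling), 2)
```

Applying the guardrails after the model, rather than inside it, keeps the safety behavior auditable independently of model changes.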
🎙️
NLP · LLM

Minutes of Meeting AI

Structured meeting intelligence system extracting decisions, action items, and accountability from multi-speaker conversations. Constrained extraction as an alternative to generic abstractive summaries.

Consistent action-item extraction across meetings
Reduced variance vs. free-form LLM summaries
Handles interruptions and implicit decisions
LLM · Structured Extraction · Multi-speaker ASR · Post-processing
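Constrained extraction usually means asking the LLM for JSON against a fixed schema and rejecting anything that fails validation, rather than accepting free-form prose. A minimal sketch, with illustrative field names:

```python
import json

# Hypothetical validation step for LLM-extracted action items: only
# well-formed items with every required field present survive.
REQUIRED_FIELDS = {"owner", "task"}

def parse_action_items(llm_output: str) -> list:
    try:
        items = json.loads(llm_output)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    # keep only dict items whose required fields are present and non-empty
    return [i for i in items
            if isinstance(i, dict) and all(i.get(f) for f in REQUIRED_FIELDS)]
```

Dropping malformed items instead of passing them through is what reduces variance versus free-form summaries: the downstream format is guaranteed, not hoped for.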
📈
ML · Transformers · Finance

Stock Market Predictor

Full-stack ML application using a TCN Transformer achieving 98% R² on stock forecasting. Real-time prediction pipeline handling thousands of data points per second with 40% latency reduction.

98% R² score on historical datasets
40% query latency reduction
Real-time pipeline at scale
PyTorch · TCN Transformer · Async APIs · Caching
🏋️
ML · FastAPI · XGBoost

Gym PR Predictor

Personalized 1RM prediction app trained on your own workout history. Full-stack FastAPI + vanilla JS with XGBoost model, confidence bands, and Epley baseline for lifter-specific strength insights.

0.99 R² and low MAE on real workout logs
Personalized 1RM vs. log PR comparison per exercise
Interactive dark UI with recent prediction history
FastAPI · XGBoost · scikit-learn · Pandas · Vanilla JS
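The Epley baseline referenced above is the standard formula estimating one-rep max from a submaximal set: 1RM = weight × (1 + reps / 30). The ML model then personalizes around that baseline.

```python
# Standard Epley one-rep-max estimate from a submaximal set.
def epley_1rm(weight: float, reps: int) -> float:
    if reps < 1:
        raise ValueError("reps must be >= 1")
    if reps == 1:
        return float(weight)  # a single rep is itself the 1RM observation
    return round(weight * (1 + reps / 30), 1)
```

For example, a 100 kg set of 10 estimates a ~133 kg max; the baseline gives the model a physically sensible prior before any lifter-specific correction.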

Credentials.

Cloud Practitioner

Amazon Web Services · Nov 2024

Solutions Architect Associate

Amazon Web Services · Mar 2025

CS50AI — AI with Python

Harvard University / edX · Oct 2025

CS50SQL — Databases with SQL

Harvard University / edX · Dec 2025

Azure Fundamentals AZ-900

Microsoft · Dec 2023

PCEP — Python Programmer

Python Institute · Aug 2022

Let's build something.

Open to full-time roles, contract work, and interesting LLM / MLOps challenges.