Real-Time Clinical Documentation · Hindi NLP

Hindi Medical
Transcription
Assistant

A real-time system that converts live doctor–patient conversations into structured clinical notes, smart suggestions, and print-ready Hindi PDF reports.

16 kHz
Audio Fidelity
0.0
LLM Temperature
5+
Core Modules
100%
Offline ASR

Everything a clinic needs,
automated in real-time.

From live audio capture to structured notes — the entire documentation workflow handled intelligently.

Real-Time Transcription

Low-latency Hindi speech-to-text powered by Vosk. Runs fully offline, streamed over WebSocket for instant display as the doctor speaks.

📝

Live Clinical Structuring

Gemini incrementally extracts symptoms, medications, and diagnoses in real-time — watch the structured fields populate as the consultation progresses.

💡

Intelligent Suggestions

ChromaDB vector search over historical consultations surfaces relevant diagnoses, missed tests, and common medications based on similar past cases.

📄

Hindi PDF Reports

Auto-generates a print-ready OPD slip with proper Devanagari rendering using NotoSansDevanagari fonts via ReportLab. One click to export.

🛡️

Full Audit Trail

Every session stores raw audio, raw transcripts, and structured JSON in date-partitioned folders. Nothing is ever lost; everything is replayable.

🔒

Privacy-First ASR

Speech recognition runs entirely on-device with Vosk — no audio ever leaves the local machine. Patient data stays within the clinic's infrastructure.

From microphone
to clinical note.

A five-step pipeline from raw audio to a structured, PDF-ready OPD report with AI-powered insights.

1

Browser Captures Audio

The frontend uses the Web Audio API to capture microphone input at 16-bit PCM, 16 kHz — optimised for speech recognition models.

WebSocket Stream
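
To make the sample format concrete, here is a minimal Python sketch of the conversion from floating-point samples to 16-bit PCM. In the project this step happens in the browser; the function name `float_to_pcm16` and the little-endian byte order are assumptions made for illustration.

```python
import struct

SAMPLE_RATE_HZ = 16_000  # sample rate stated in the pipeline description


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes.

    Hypothetical helper: it mirrors what a browser capture script would do
    before sending frames over the WebSocket.
    """
    # Clip first so out-of-range samples saturate instead of overflowing.
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


frame = float_to_pcm16([0.0, 1.0, -1.0, 2.0])
print(len(frame))  # 2 bytes per sample
```

At 16 kHz and 2 bytes per sample, one second of mono audio is 32,000 bytes, which keeps the stream light enough for frequent small frames.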
2

Vosk Transcribes Hindi Speech

FastAPI receives raw PCM frames and feeds them to the Vosk ASR engine (vosk-model-hi-0.22). Emits both partial and final results for low-latency display.

Offline · On-Device
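
The partial/final split can be sketched as a small loop over incoming PCM chunks. This is an illustrative sketch, not the project's adapter: `transcribe_chunks` is a hypothetical name, and the recognizer is passed in so the loop works with any object that exposes Vosk's `AcceptWaveform`, `Result`, `PartialResult`, and `FinalResult` methods.

```python
import json


def transcribe_chunks(recognizer, chunks):
    """Yield ("partial" | "final", text) events for a stream of PCM chunks.

    `recognizer` is expected to behave like vosk.KaldiRecognizer, whose
    result methods return JSON strings.
    """
    for chunk in chunks:
        if recognizer.AcceptWaveform(chunk):
            # An utterance boundary was detected: the result is final.
            kind = "final"
            text = json.loads(recognizer.Result()).get("text", "")
        else:
            # Still mid-utterance: show the running hypothesis.
            kind = "partial"
            text = json.loads(recognizer.PartialResult()).get("partial", "")
        if text:
            yield kind, text
    # Flush whatever is left once the stream ends.
    tail = json.loads(recognizer.FinalResult()).get("text", "")
    if tail:
        yield "final", tail


# With the real library (assumed installed, model unpacked as in the setup steps):
#   from vosk import Model, KaldiRecognizer
#   rec = KaldiRecognizer(Model("models/vosk/hi/vosk-model-hi-0.22"), 16000)
#   for kind, text in transcribe_chunks(rec, pcm_chunks): ...
```

Partial events can be overwritten in the UI as they arrive, while final events are the ones worth appending to the transcript.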
3

Gemini Structures the Conversation

Raw transcript fragments are sent to Google Gemini with a strict JSON schema at temperature 0.0, incrementally building the clinical state: symptoms, medications, BP, temperature, diagnosis.

Deterministic · Schema-Locked
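
One way to picture the incremental build-up is a merge function that folds each structured update into the running clinical state. The sketch below does not call Gemini. The field names follow the list above, but the exact schema and the merge policy (lists accumulate without duplicates, scalars take the latest non-null value) are assumptions.

```python
from copy import deepcopy

# Hypothetical field names, modelled on the fields the pipeline extracts.
LIST_FIELDS = ("symptoms", "medications")
SCALAR_FIELDS = ("bp", "temperature", "diagnosis")


def empty_state():
    """Return a blank clinical state with every field present."""
    state = {field: [] for field in LIST_FIELDS}
    state.update({field: None for field in SCALAR_FIELDS})
    return state


def merge_update(state, update):
    """Fold one structured update into the state without mutating the input."""
    merged = deepcopy(state)
    for field in LIST_FIELDS:
        for item in update.get(field) or []:
            if item not in merged[field]:  # keep first-seen order, skip repeats
                merged[field].append(item)
    for field in SCALAR_FIELDS:
        value = update.get(field)
        if value is not None:  # a null in the update never erases a known value
            merged[field] = value
    return merged


state = empty_state()
state = merge_update(state, {"symptoms": ["fever"], "bp": "120/80"})
state = merge_update(state, {"symptoms": ["fever", "cough"], "diagnosis": "viral fever"})
print(state["symptoms"])
```

Keeping the merge outside the model call means a malformed or empty update leaves the state unchanged rather than wiping earlier fields.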
4

Vector Store Suggests Related Cases

At session end, the finalized consultation is embedded into ChromaDB. A similarity search returns the top-N historical cases, suggesting likely diagnoses and commonly ordered tests.

ChromaDB · Semantic Search
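
The idea behind the suggestion step can be shown with a pure-Python stand-in: rank past cases by cosine similarity to the current one, then tally the diagnoses among the closest matches. This is not ChromaDB's API. The function names, the `embedding` and `diagnosis` keys, and the toy two-dimensional vectors are all assumptions for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n_cases(query, cases, n=3):
    """Return the n cases whose embeddings are most similar to the query."""
    ranked = sorted(cases, key=lambda c: cosine(query, c["embedding"]), reverse=True)
    return ranked[:n]


def suggest_diagnoses(similar_cases):
    """List diagnoses seen in the similar cases, most frequent first."""
    counts = Counter(c["diagnosis"] for c in similar_cases if c.get("diagnosis"))
    return [diagnosis for diagnosis, _ in counts.most_common()]


history = [
    {"id": "a", "embedding": [1.0, 0.0], "diagnosis": "viral fever"},
    {"id": "b", "embedding": [0.9, 0.1], "diagnosis": "viral fever"},
    {"id": "c", "embedding": [0.0, 1.0], "diagnosis": "gastritis"},
]
nearest = top_n_cases([1.0, 0.0], history, n=2)
print(suggest_diagnoses(nearest))
```

In the real system the vector store performs the nearest-neighbour search; only the aggregation over the returned cases would need to live in application code.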
5

PDF OPD Slip Generated

ReportLab renders a formatted, Hindi-compatible clinical PDF using NotoSansDevanagari. The full session (audio, transcript, JSON) is archived for auditability.

ReportLab · Devanagari

Data flow at a glance.

🎙️ Microphone · Browser
FastAPI · WebSocket
🔊 Vosk ASR · Hindi Model
🤖 Gemini LLM · Incremental
🖥️ Dashboard · Live Updates
🗄️ ChromaDB · Vector Store
💡 Suggestions · Similar Cases
📄 PDF Report · ReportLab
🛡️ Audit Store · Sessions

Six clean layers,
zero coupling.

Each module owns a single responsibility, making the system easy to extend, swap, or harden for production.

🌐

Frontend

Dumb renderer — captures mic audio and displays live JSON updates pushed from the backend. All logic lives server-side.

HTML/CSS/JS WebAudio API
🔌

Transport

WebSocket layer handles bi-directional streaming, session management, and triggers the suggestion engine on finalization.

FastAPI WebSocket
🎙️

ASR Engine

Vosk adapter streams PCM chunks and emits partial + final results. Fully offline, privacy-preserving, no external API calls.

Vosk vosk-model-hi-0.22
🤖

LLM Layer

Incremental clinical structuring via Gemini Flash/Pro. Temperature 0.0 and a strict JSON schema keep the output as repeatable as the model allows and always in the expected shape.

Gemini JSON Schema
🗄️

Vector Store

ChromaDB stores embeddings of past consultations. Similarity search returns the top-N matches to surface relevant clinical hints.

ChromaDB Embeddings
📂

Storage & PDF

Date-partitioned session files archive raw audio, transcripts, and structured JSON. ReportLab renders Devanagari-compatible PDFs.

ReportLab NotoSans HI
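
A date-partitioned layout can be sketched with `pathlib`. The directory scheme (`<root>/<YYYY-MM-DD>/<session_id>/`) and the file names below are assumptions chosen for illustration; the project's actual layout may differ.

```python
from datetime import date
from pathlib import Path


def session_dir(root, session_id, day):
    """Build the folder for one session under a per-day partition."""
    return Path(root) / day.isoformat() / session_id


def session_files(directory):
    """Map each archived artifact to a (hypothetical) file name in the folder."""
    return {
        "audio": directory / "audio.wav",
        "transcript": directory / "transcript.txt",
        "structured": directory / "structured.json",
    }


folder = session_dir("data/sessions", "abc123", date(2025, 1, 15))
for name, path in session_files(folder).items():
    print(name, path)
```

Partitioning by day keeps any single directory small and makes it straightforward to archive or purge sessions by date.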

Built with the best tools.

🐍 Python · Core backend
FastAPI · API & WebSocket
🔊 Vosk · Offline Hindi ASR
🤖 Gemini · LLM structuring
🗄️ ChromaDB · Vector database
📄 ReportLab · Hindi PDF generation
🔡 edge-tts · TTS testing
🎵 pydub · Audio processing

Up and running
in five steps.

1

Clone the repository

bash
git clone https://github.com/samadrehan02/upgraded-waddle-llm
cd upgraded-waddle-llm
2

Create a virtual environment

bash
# Windows
python -m venv .venv
.venv\Scripts\activate

# Linux / macOS
python3 -m venv .venv
source .venv/bin/activate
3

Install dependencies & download Vosk model

bash
pip install -r requirements.txt

# Place the unpacked Vosk model at:
#   models/vosk/hi/vosk-model-hi-0.22/
4

Configure environment variables

.env
ENV=dev
GEMINI_API_KEY=your_gemini_api_key_here
GEMINI_MODEL=gemini-2.0-flash-exp
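
On the Python side, these variables can be read and validated once at startup. The sketch below assumes the values are already in the process environment (however the project loads the `.env` file); `load_settings` and the returned dictionary keys are hypothetical names.

```python
import os

PLACEHOLDER_KEY = "your_gemini_api_key_here"  # the sample value from the .env template


def load_settings(env=None):
    """Read the three variables and fail early if the API key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "")
    if not api_key or api_key == PLACEHOLDER_KEY:
        raise RuntimeError("GEMINI_API_KEY is not set")
    return {
        "env": env.get("ENV", "dev"),
        "gemini_api_key": api_key,
        "gemini_model": env.get("GEMINI_MODEL", "gemini-2.0-flash-exp"),
    }


settings = load_settings({"GEMINI_API_KEY": "example-key", "ENV": "dev"})
print(settings["gemini_model"])
```

Failing at startup on a missing or placeholder key is friendlier than discovering it on the first structuring request in the middle of a consultation.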
5

Run the server

bash
# Development (hot reload)
uvicorn main:app --reload

# Production
python main.py

# Open: http://localhost:8000
Functional Proof-of-Concept

A foundation ready for production hardening.

This project demonstrates a complete pipeline from audio ingestion to vector-backed clinical insights. The architecture is modular and suitable for further hardening, security auditing, and integration into real hospital workflows. It is a documentation assistant only — not a diagnostic tool.

Ready to explore
the full source?

Dive into the codebase, fork it, and adapt it for your own clinical documentation needs.