Real-Time Clinical Documentation · Hindi NLP

Hindi Medical Transcription Assistant

A real-time system that converts live doctor–patient conversations into structured clinical notes, smart suggestions, and print-ready Hindi PDF reports.


16 kHz · Audio Fidelity

0.0 · LLM Temperature

Core Modules

100% · Offline ASR

Capabilities

Everything a clinic needs, automated in real time.

From live audio capture to structured notes, the entire documentation workflow is automated.

⚡

Real-Time Transcription

Low-latency Hindi speech-to-text powered by Vosk. Recognition runs fully offline, and results are streamed over WebSocket so they appear on screen as the doctor speaks.

📝

Live Clinical Structuring

Gemini incrementally extracts symptoms, medications, and diagnoses in real time, so the structured fields fill in as the consultation progresses.

💡

Intelligent Suggestions

ChromaDB vector search over historical consultations surfaces relevant diagnoses, possibly missed tests, and common medications based on similar past cases.

📄

Hindi PDF Reports

Auto-generates a print-ready OPD slip with proper Devanagari rendering using NotoSansDevanagari fonts via ReportLab. One click to export.

🛡️

Full Audit Trail

Every session's raw audio, raw transcript, and structured JSON are stored in date-partitioned folders, so any consultation can be reviewed or replayed later.

🔒

Privacy-First ASR

Speech recognition runs entirely on-device with Vosk, so no audio ever leaves the local machine. Only transcript text is sent to Gemini for structuring; raw audio stays within the clinic's infrastructure.

Workflow

From microphone to clinical note.

A five-step pipeline from raw audio to a structured, PDF-ready OPD report with AI-powered insights.

Browser Captures Audio

The frontend captures microphone input with the Web Audio API and streams it as 16-bit PCM at 16 kHz, a format optimised for speech recognition models.

WebSocket Stream
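Browsers usually deliver Web Audio samples as 32-bit floats in the range [-1.0, 1.0], at the AudioContext's sample rate (commonly 44.1 or 48 kHz). Producing a 16-bit, 16 kHz stream therefore means resampling and quantising. The frontend does this in JavaScript. The sketch below shows the same conversion in Python for readability, under some assumptions: the function names `resample_linear` and `float32_to_pcm16`, the mono input, and the linear-interpolation resampler are illustrative, not the project's code.

```python
import struct
from typing import Sequence


def resample_linear(samples: Sequence[float], src_rate: int, dst_rate: int) -> list[float]:
    """Resample by linear interpolation (simple; no anti-aliasing filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    ratio = src_rate / dst_rate
    out = []
    for i in range(n_out):
        pos = i * ratio                      # fractional index into the source
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


def float32_to_pcm16(samples: Sequence[float], src_rate: int, dst_rate: int = 16000) -> bytes:
    """Convert mono float samples in [-1.0, 1.0] to 16-bit little-endian PCM at dst_rate."""
    resampled = resample_linear(samples, src_rate, dst_rate)
    ints = []
    for s in resampled:
        s = max(-1.0, min(1.0, s))           # clip out-of-range samples
        ints.append(round(s * 32767))        # scale to the signed 16-bit range
    return struct.pack(f"<{len(ints)}h", *ints)
```

Linear interpolation without a low-pass filter can alias content above 8 kHz. That is often tolerable for speech, but a filtered resampler is the safer choice if recognition quality suffers.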

Vosk Transcribes Hindi Speech

FastAPI receives raw PCM frames and feeds them to the Vosk ASR engine (vosk-model-hi-0.22), which emits both partial and final results for low-latency display.

Offline · On-Device
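Here is a minimal sketch of the partial/final loop, assuming the Vosk Python bindings. In Vosk, `KaldiRecognizer.AcceptWaveform()` returns true when an utterance is complete; at that point `Result()` holds a final JSON result such as `{"text": "..."}`. Otherwise `PartialResult()` holds the in-progress hypothesis as `{"partial": "..."}`. Some names here are hypothetical: the function `transcribe_chunks` and the `{"type": ..., "text": ...}` message shape. The recognizer is passed in so the loop can be exercised without loading a model.

```python
import json
from typing import Iterable, Iterator, Protocol


class Recognizer(Protocol):
    """The subset of vosk.KaldiRecognizer used by this loop."""

    def AcceptWaveform(self, data: bytes) -> bool: ...
    def Result(self) -> str: ...
    def PartialResult(self) -> str: ...
    def FinalResult(self) -> str: ...


def transcribe_chunks(rec: Recognizer, chunks: Iterable[bytes]) -> Iterator[dict]:
    """Feed PCM chunks to the recognizer and yield partial/final messages for the UI."""
    for chunk in chunks:
        if rec.AcceptWaveform(chunk):
            # Utterance boundary: Result() returns {"text": "..."}
            text = json.loads(rec.Result()).get("text", "")
            if text:
                yield {"type": "final", "text": text}
        else:
            # Mid-utterance: PartialResult() returns {"partial": "..."}
            partial = json.loads(rec.PartialResult()).get("partial", "")
            if partial:
                yield {"type": "partial", "text": partial}
    # Flush any remaining audio when the stream ends
    text = json.loads(rec.FinalResult()).get("text", "")
    if text:
        yield {"type": "final", "text": text}
```

With a real model, `rec` would be `KaldiRecognizer(Model("models/vosk/hi/vosk-model-hi-0.22"), 16000)`. Inside a FastAPI WebSocket handler, each `await websocket.receive_bytes()` would supply one chunk. Because that call is async, the same branch logic would sit inside the handler's receive loop rather than consume a plain iterator.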

Gemini Structures the Conversation

Raw transcript fragments are sent to Google Gemini with a strict JSON schema at temperature 0.0, and each response incrementally updates the clinical state: symptoms, medications, BP, temperature, and diagnosis.

Temperature 0.0 · Schema-Locked
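Each Gemini call sees only a fragment of the conversation, so the server has to fold every structured response into a running state. The Gemini request itself is omitted from this minimal sketch. It assumes the model returns a JSON object keyed by the fields listed above; the key names `symptoms`, `medications`, `bp`, `temperature`, and `diagnosis` are assumptions. List fields are merged without duplicates, scalar fields are overwritten only by non-empty values, and malformed JSON leaves the state unchanged.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClinicalState:
    # Hypothetical field names mirroring the workflow description
    symptoms: list[str] = field(default_factory=list)
    medications: list[str] = field(default_factory=list)
    bp: Optional[str] = None
    temperature: Optional[str] = None
    diagnosis: Optional[str] = None


LIST_FIELDS = ("symptoms", "medications")
SCALAR_FIELDS = ("bp", "temperature", "diagnosis")


def merge_update(state: ClinicalState, raw_json: str) -> ClinicalState:
    """Fold one LLM response into the running state; ignore malformed output."""
    try:
        update = json.loads(raw_json)
    except json.JSONDecodeError:
        return state  # keep the last good state instead of failing the session
    if not isinstance(update, dict):
        return state
    for name in LIST_FIELDS:
        values = update.get(name)
        if not isinstance(values, list):
            continue
        current = getattr(state, name)
        for item in values:
            if not isinstance(item, str):
                continue
            item = item.strip()
            if item and item not in current:   # append-only, no duplicates
                current.append(item)
    for name in SCALAR_FIELDS:
        value = update.get(name)
        if value not in (None, ""):            # null/empty means "not mentioned"
            setattr(state, name, str(value))
    return state
```

Append-only lists suit incremental updates, but they also mean a symptom the doctor later rules out stays in the list. Handling retractions would need the model to return the full state rather than a delta.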

Vector Store Suggests Related Cases

At session end, the finalized consultation is embedded into ChromaDB. A similarity search returns the top-N historical cases, suggesting likely diagnoses and commonly ordered tests.

ChromaDB · Semantic Search
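ChromaDB handles embedding storage and nearest-neighbour lookup itself; in ChromaDB the lookup is a `collection.query(query_embeddings=..., n_results=...)` call. The pure-Python sketch below only illustrates what a top-N similarity search does and how its hits might become suggestions. Several pieces are assumptions for illustration and are not ChromaDB's API: cosine similarity as the metric, the toy vectors, and the `top_n` / `suggest_diagnoses` helpers.

```python
import math
from collections import Counter


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n(query: list[float], cases: dict[str, list[float]], n: int = 3) -> list[tuple[str, float]]:
    """Return the n most similar (case_id, score) pairs, best first."""
    scored = [(case_id, cosine(query, vec)) for case_id, vec in cases.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


def suggest_diagnoses(hits: list[tuple[str, float]], diagnoses: dict[str, str]) -> list[str]:
    """Rank diagnoses by how often they occur among the retrieved cases."""
    counts = Counter(diagnoses[case_id] for case_id, _ in hits if case_id in diagnoses)
    return [dx for dx, _ in counts.most_common()]
```

A brute-force scan like this is linear in the number of stored cases. A vector store typically uses an approximate index instead, which is what keeps lookups fast as the archive grows.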

PDF OPD Slip Generated

ReportLab renders a formatted, Hindi-compatible clinical PDF using NotoSansDevanagari. The full session (audio, transcript, JSON) is archived for auditability.

ReportLab · Devanagari
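This is a minimal sketch of the date-partitioned archive, under some assumptions: a `sessions/<YYYY-MM-DD>/<session_id>/` layout, the file names `audio.wav`, `transcript.txt`, and `structured.json`, and mono audio. The project's actual folder and file names may differ, and `archive_session` is a hypothetical helper. The standard-library `wave` module wraps the raw 16-bit, 16 kHz PCM in a WAV header so the audio can be replayed.

```python
import json
import wave
from datetime import date
from pathlib import Path
from typing import Optional


def archive_session(root: Path, session_id: str, pcm: bytes, transcript: str,
                    structured: dict, day: Optional[date] = None) -> Path:
    """Write one session's audio, transcript, and JSON under root/YYYY-MM-DD/session_id/."""
    day = day or date.today()
    session_dir = root / day.isoformat() / session_id
    session_dir.mkdir(parents=True, exist_ok=True)

    # Raw PCM -> WAV: assumed mono, 16-bit (2-byte) samples, 16 kHz
    with wave.open(str(session_dir / "audio.wav"), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        wav.writeframes(pcm)

    (session_dir / "transcript.txt").write_text(transcript, encoding="utf-8")
    # ensure_ascii=False keeps Devanagari readable in the stored JSON
    (session_dir / "structured.json").write_text(
        json.dumps(structured, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return session_dir
```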

System Pipeline

Data flow at a glance.

🎙️ Microphone (Browser) → ⚡ FastAPI (WebSocket) → 🔊 Vosk ASR (Hindi Model) → 🤖 Gemini LLM (Incremental) → 🖥️ Dashboard (Live Updates)

↓ 🗄️ ChromaDB (Vector Store)
↓ 💡 Suggestions (Similar Cases)
↓ 📄 PDF Report (ReportLab)
↓ 🛡️ Audit Store (Sessions)

Architecture

Six clean layers, loosely coupled.

Each module owns a single responsibility, making the system easy to extend, swap, or harden for production.

🌐

Frontend

Dumb renderer — captures mic audio and displays live JSON updates pushed from the backend. All logic lives server-side.

HTML/CSS/JS WebAudio API

🔌

Transport

WebSocket layer handles bi-directional streaming, session management, and triggers the suggestion engine on finalization.

FastAPI WebSocket
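The "trigger suggestions on finalization" behaviour can be sketched as a small registry, assuming one session per WebSocket connection. The class `SessionRegistry`, its methods, and the `suggest` callback signature are hypothetical. In a FastAPI handler, `open()` would run after `websocket.accept()`, and `finalize()` would run when the client ends the consultation or the handler catches `WebSocketDisconnect`.

```python
import uuid
from typing import Callable

# suggest(session_id, full_transcript) -> list of suggestion strings
SuggestFn = Callable[[str, str], list[str]]


class SessionRegistry:
    """Hypothetical bookkeeping for the transport layer: one entry per open consultation."""

    def __init__(self, suggest: SuggestFn) -> None:
        self._segments: dict[str, list[str]] = {}
        self._suggest = suggest

    def open(self) -> str:
        session_id = uuid.uuid4().hex
        self._segments[session_id] = []
        return session_id

    def add_final_text(self, session_id: str, text: str) -> None:
        # Raises KeyError if the session was never opened or is already closed
        self._segments[session_id].append(text)

    def finalize(self, session_id: str) -> list[str]:
        # pop() removes the session, so the suggestion engine runs at most once per session
        segments = self._segments.pop(session_id)
        return self._suggest(session_id, " ".join(segments))

    def __len__(self) -> int:
        return len(self._segments)
```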

🎙️

ASR Engine

Vosk adapter streams PCM chunks and emits partial + final results. Fully offline, privacy-preserving, no external API calls.

Vosk vosk-model-hi-0.22

🤖

LLM Layer

Incremental clinical structuring via Gemini Flash/Pro. Temperature 0.0 and a strict JSON schema keep the output consistent and machine-readable.

Gemini JSON Schema
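Even with a schema in the request, a response can still arrive truncated or malformed, for example when output hits a token limit. Checking it again on the server is therefore a cheap safeguard. This minimal sketch assumes the clinical fields from the workflow. It pairs a JSON Schema-style description of the expected object (the exact schema format Gemini accepts is not shown in the source and may differ) with a small standard-library validator that rejects unknown keys and wrong types. `CLINICAL_SCHEMA` and `validate_response` are hypothetical names; a full JSON Schema validator library would cover far more cases.

```python
import json

# Hypothetical schema for one structuring response
CLINICAL_SCHEMA = {
    "type": "object",
    "properties": {
        "symptoms": {"type": "array", "items": {"type": "string"}},
        "medications": {"type": "array", "items": {"type": "string"}},
        "bp": {"type": "string"},
        "temperature": {"type": "string"},
        "diagnosis": {"type": "string"},
    },
    "additionalProperties": False,
}

_PY_TYPES = {"string": str, "array": list}


def validate_response(raw: str) -> dict:
    """Parse model output and check it against CLINICAL_SCHEMA; raise ValueError on mismatch."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    props = CLINICAL_SCHEMA["properties"]
    unknown = set(data) - set(props)
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(unknown)}")
    for key, value in data.items():
        if value is None:
            continue  # deliberate relaxation: null means "not mentioned yet"
        spec = props[key]
        if not isinstance(value, _PY_TYPES[spec["type"]]):
            raise ValueError(f"{key}: expected {spec['type']}")
        if spec["type"] == "array" and not all(isinstance(v, str) for v in value):
            raise ValueError(f"{key}: items must be strings")
    return data
```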

🗄️

Vector Store

ChromaDB stores embeddings of past consultations. Similarity search returns the top-N matches to surface relevant clinical hints.

ChromaDB Embeddings
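Before a consultation can be embedded, its structured JSON has to be turned into text. This minimal sketch assumes the field names used elsewhere on this page; `to_embedding_record` and the text layout it produces are hypothetical. Lists are flattened into comma-separated strings so the metadata holds only plain scalar values, which is the safest shape for vector-store metadata filters.

```python
from typing import Any


def to_embedding_record(session_id: str, state: dict[str, Any]) -> tuple[str, str, dict[str, str]]:
    """Return (id, document_text, metadata) for one finalized consultation."""
    symptoms = ", ".join(state.get("symptoms") or [])
    medications = ", ".join(state.get("medications") or [])
    diagnosis = state.get("diagnosis") or ""
    # The document text is what gets embedded; the field order here is an arbitrary choice
    lines = [f"Symptoms: {symptoms}", f"Medications: {medications}", f"Diagnosis: {diagnosis}"]
    document = "\n".join(line for line in lines if not line.endswith(": "))  # drop empty fields
    metadata = {"diagnosis": diagnosis, "medications": medications}
    return session_id, document, metadata
```

The three returned values line up with the `ids`, `documents`, and `metadatas` arguments of ChromaDB's `collection.add()`.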

📂

Storage & PDF

Date-partitioned session files archive raw audio, transcripts, and structured JSON. ReportLab renders Devanagari-compatible PDFs.

ReportLab NotoSansDevanagari

Quick Start

Up and running in five steps.

Clone the repository

bash

git clone https://github.com/samadrehan02/upgraded-waddle-llm
cd upgraded-waddle-llm

Create a virtual environment

bash

# Windows
python -m venv .venv
.venv\Scripts\activate

# Linux / macOS
python3 -m venv .venv
source .venv/bin/activate

Install dependencies & download Vosk model

bash

pip install -r requirements.txt

# Download vosk-model-hi-0.22 from https://alphacephei.com/vosk/models
# and extract it so the model lives at:
#   models/vosk/hi/vosk-model-hi-0.22/

Configure environment variables

.env

ENV=dev
GEMINI_API_KEY=your_gemini_api_key_here
GEMINI_MODEL=gemini-2.0-flash-exp
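The source does not show how `main.py` reads these values; a library such as python-dotenv is a common choice. Below is a minimal standard-library sketch of the idea. It assumes simple `KEY=VALUE` lines and full-line `#` comments, with real environment variables taking precedence over the file. `parse_env_file`, `load_settings`, and the defaults (taken from the example file above) are hypothetical. Failing fast on a missing `GEMINI_API_KEY` surfaces configuration errors at startup rather than mid-consultation.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical defaults, mirroring the example .env above
DEFAULTS = {"ENV": "dev", "GEMINI_API_KEY": "", "GEMINI_MODEL": "gemini-2.0-flash-exp"}


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and full-line # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")  # drop surrounding quotes
    return values


def load_settings(env_path: Path = Path(".env"),
                  environ: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Resolve settings with precedence: process environment > .env file > default."""
    environ = os.environ if environ is None else environ
    file_values = parse_env_file(env_path.read_text(encoding="utf-8")) if env_path.exists() else {}
    settings = {
        key: environ.get(key) or file_values.get(key) or default
        for key, default in DEFAULTS.items()
    }
    if not settings["GEMINI_API_KEY"]:
        raise RuntimeError("GEMINI_API_KEY is not set (add it to .env or the environment)")
    return settings
```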

Run the server

bash

# Development (hot reload)
uvicorn main:app --reload

# Production
python main.py

# Open: http://localhost:8000

Project Status

Functional Proof-of-Concept

A foundation for production use.

This project demonstrates a complete pipeline from audio ingestion to vector-backed clinical insights. The architecture is modular and suitable for further hardening, security auditing, and integration into real hospital workflows. It is a documentation assistant only, not a diagnostic tool.
