
Dental AI Automation Suite

AI-Powered Document Processing & Verification Workflows

Dental AI Automation Suite — Technical Documentation

Architecture: Modular n8n workflows with Google Gemini Vision integration
Deployment: Self-hosted n8n instance with Docker
AI Model: Google Gemini 2.0 Flash (vision-capable)
Data Flow: Trigger → AI Processing → Validation → Output


📐 System Architecture Overview

High-Level Design

The automation suite consists of three independent n8n workflows, each solving a specific document processing challenge. The architecture follows a consistent pattern:

text (diagram)
┌────────────────────────────────────────────────────────────┐
│                 Dental AI Automation Suite                 │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐  │
│  │  Workflow 1  │    │  Workflow 2  │    │  Workflow 3  │  │
│  │    Label     │    │    Clinic    │    │   Invoice    │  │
│  │  Generator   │    │   Analysis   │    │     OCR      │  │
│  └──────┬───────┘    └──────┬───────┘    └──────┬───────┘  │
│         │                   │                   │          │
│         └───────────────────┼───────────────────┘          │
│                             │                              │
│                   ┌─────────▼─────────┐                    │
│                   │   Google Gemini   │                    │
│                   │ Vision 2.0 Flash  │                    │
│                   └───────────────────┘                    │
└────────────────────────────────────────────────────────────┘

Core Design Principles

  1. Workflow Independence — Each workflow is self-contained with its own trigger, processing logic, and output handling
  2. Fail-Safe Architecture — Errors in one workflow don't cascade to others
  3. Structured AI Output — All Gemini responses follow strict JSON schemas for reliable parsing
  4. Validation Layers — Pre-processing input validation + post-processing output verification
  5. Graceful Degradation — Partial success handling when full extraction isn't possible

🛠️ Technology Stack Deep Dive

n8n Workflow Orchestration

  • Version: Latest stable (self-hosted)
  • Deployment: Docker container with PostgreSQL persistence
  • Node Types Used:
    • Manual Trigger — User-initiated workflow execution
    • Google Sheets — Read product data
    • HTTP Request — API calls to Gemini, HTML-to-Image services
    • Code (JavaScript) — Custom validation, parsing, and transformation logic
    • Set — Data structuring and output formatting
    • IF — Conditional branching for error handling

Google Gemini Vision 2.0 Flash

  • Model: gemini-2.0-flash-exp (vision-capable, fast inference)
  • API: Google AI Studio REST API
  • Input: Binary image data (JPG/PNG)
  • Output: Structured JSON via prompt engineering
  • Rate Limits: Managed via n8n delay nodes (2-second intervals)
  • Cost: ~$0.00025 per image (1000 images = $0.25)
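As a rough sketch of how an HTTP Request node could call this model, the helper below assembles a `generateContent` request for the Gemini REST API. The function name `buildGeminiRequest`, the `temperature` value, and the use of JSON response mode are illustrative assumptions rather than settings taken from the workflows.

```javascript
// Sketch: build the URL, headers, and JSON body for a Gemini generateContent call.
// buildGeminiRequest is a hypothetical helper, not part of the workflows.
const GEMINI_BASE_URL = 'https://generativelanguage.googleapis.com/v1beta/models';

function buildGeminiRequest({ model, apiKey, prompt, imageBase64, mimeType }) {
  // The documented inputs are JPG and PNG images only.
  if (!['image/jpeg', 'image/png'].includes(mimeType)) {
    throw new Error(`Unsupported image type: ${mimeType}`);
  }
  return {
    method: 'POST',
    url: `${GEMINI_BASE_URL}/${model}:generateContent`,
    headers: { 'Content-Type': 'application/json', 'x-goog-api-key': apiKey },
    body: {
      contents: [{
        parts: [
          { text: prompt },
          // The image travels inline as base64 next to the prompt text.
          { inline_data: { mime_type: mimeType, data: imageBase64 } },
        ],
      }],
      // Assumption: deterministic output and JSON response mode.
      generationConfig: { temperature: 0, responseMimeType: 'application/json' },
    },
  };
}
```

The returned object maps onto the URL, header, and body settings of an HTTP Request node; the image has to be base64-encoded before it is passed in.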

Supporting Services

  • Google Sheets API — Product data source for label generation
  • HTML-to-Image API — Converts HTML/CSS templates to PNG (htmlcsstoimage.com)
  • Barcode Generation — Server-side barcode API for SKU encoding

🎨 Workflow 1: Product Label Image Generator

Problem Statement

The client needed to generate hundreds of product labels from Google Sheets data, each containing:

  • Product name and description
  • SKU barcode
  • MRP (Maximum Retail Price)
  • Manufacturing and expiry dates
  • Manufacturer details
  • License information

Manual creation in design tools was taking 5-10 minutes per label.

Workflow Architecture

text (diagram)
[Manual Trigger]
      ↓
[Google Sheets: Read Rows]
      ↓
[Loop Over Items] ← Process one row at a time
      ↓
[Code: Validate Required Fields]
      ↓
[Code: Generate HTML Template]
      ↓
[HTTP: Barcode API] → Generate barcode image
      ↓
[Code: Inject Barcode into HTML]
      ↓
[HTTP: HTML-to-Image API]
      ↓
[IF: Conversion Success?]
   ├─ Yes → [Set: Format Response]
   └─ No  → [Code: Log Error]

Key Implementation Details

1. Data Validation

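A minimal sketch of what the validation step in a Code node might look like. The column names (`product_name`, `sku`, `mrp`, and so on) are hypothetical, inferred from the label contents listed in the problem statement, and the extra MRP and date checks are assumptions.

```javascript
// Hypothetical column names; adjust to match the actual Google Sheet headers.
const REQUIRED_FIELDS = [
  'product_name', 'sku', 'mrp', 'mfg_date', 'expiry_date', 'manufacturer', 'license_no',
];

function isBlank(value) {
  return value === undefined || value === null || String(value).trim() === '';
}

function validateRow(row) {
  const errors = [];
  for (const field of REQUIRED_FIELDS) {
    if (isBlank(row[field])) errors.push(`Missing required field: ${field}`);
  }
  // MRP must be a positive number when present.
  if (!isBlank(row.mrp)) {
    const mrp = Number(row.mrp);
    if (!Number.isFinite(mrp) || mrp <= 0) errors.push(`Invalid MRP: ${row.mrp}`);
  }
  // Expiry must fall after the manufacturing date when both parse as dates.
  const mfg = Date.parse(row.mfg_date);
  const exp = Date.parse(row.expiry_date);
  if (!Number.isNaN(mfg) && !Number.isNaN(exp) && exp <= mfg) {
    errors.push('Expiry date must be after manufacturing date');
  }
  return { valid: errors.length === 0, errors };
}
```

Rows that come back with `valid: false` would be skipped and logged, matching the "skip row, log warning" rule in the error handling section.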

2. HTML Template Generation

The label template uses inline CSS to ensure consistent rendering across HTML-to-Image APIs.
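The sketch below shows one way a Code node could build such a template as a single string with inline styles only. The layout, dimensions, and field names are illustrative assumptions, not the actual label design; the text fallback mirrors the "SKU: [value]" rule used when barcode generation fails.

```javascript
// Escape sheet values so they cannot break the generated markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build a label as one HTML string that relies on inline styles only,
// so the renderer needs no external stylesheet.
// Field names and dimensions are illustrative assumptions.
function buildLabelHtml(product, barcodeUrl) {
  const barcode = barcodeUrl
    ? `<img src="${escapeHtml(barcodeUrl)}" alt="barcode" style="height:40px;">`
    : `<div style="font-family:monospace;">SKU: ${escapeHtml(product.sku)}</div>`;
  return [
    '<div style="width:400px;padding:12px;border:1px solid #000;font-family:Arial,sans-serif;">',
    `<div style="font-size:18px;font-weight:bold;">${escapeHtml(product.product_name)}</div>`,
    `<div style="font-size:12px;">MRP: ${escapeHtml(product.mrp)}</div>`,
    `<div style="font-size:12px;">Mfg: ${escapeHtml(product.mfg_date)} | Exp: ${escapeHtml(product.expiry_date)}</div>`,
    barcode,
    '</div>',
  ].join('\n');
}
```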

3. Error Handling & Image Caching Strategy

  • Missing data: Skip row, log warning, continue processing
  • Barcode generation failure: Use text fallback "SKU: [value]"
  • HTML-to-Image API failure: Retry 3x with exponential backoff (2s, 4s, 8s)
  • Rate limiting: 1-second delay between API calls to stay within free tier limits
  • Image Caching (API-side): The HTML-to-Image service implicitly caches generated images based on the HTML payload hash. Re-running the workflow with unchanged SKU data returns the cached PNG URL instantly, bypassing the rendering engine and saving compute time.
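The retry rule above (three retries at 2s, 4s, and 8s) can be sketched as a small helper. This is a minimal illustration, not the workflow's actual code: `withRetry` and `backoffDelays` are hypothetical names, and the `wait` parameter exists only so the delays can be swapped out in tests.

```javascript
// Delays for each retry: 2s, 4s, 8s with the defaults.
function backoffDelays(retries = 3, baseMs = 2000) {
  return Array.from({ length: retries }, (_, i) => baseMs * 2 ** i);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run fn once, then retry after each backoff delay until it succeeds.
// Rethrows the last error when every attempt has failed.
async function withRetry(fn, { retries = 3, baseMs = 2000, wait = sleep } = {}) {
  const delays = backoffDelays(retries, baseMs);
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < retries) await wait(delays[attempt]);
    }
  }
  throw lastError;
}
```

A caller would wrap the HTML-to-Image request in `withRetry` and fall back to logging the row only after the final attempt fails.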

Performance Metrics

  • Processing time: ~5 seconds per label (vs 5-10 minutes manual)
  • Batch capacity: 100 labels in ~8 minutes
  • Error rate: <2% (mostly due to malformed input data)
  • Cost: ~$0.02 per label (HTML-to-Image API)

🎥 Live Demonstration & Architecture

Product Label Generator Architecture


🏥 Workflow 2: Dental Clinic Image Analysis

Problem Statement

The compliance team needed to verify clinic images for:

  • Person count — Ensure staff presence during verification
  • Location validation — Extract pincode from metadata overlay
  • Equipment inventory — Identify visible dental equipment
  • Timestamp verification — Confirm image recency

Manual verification took 5+ minutes per image and was inconsistent across reviewers.

Workflow Architecture

text (diagram)
[Manual Trigger / Webhook]
      ↓
[Set: Define Image URL]
      ↓
[HTTP: Fetch Image Binary]
      ↓
[Gemini: Analyze Image]
      ↓
[Code: Parse JSON Response]
      ↓
[Code: Validate Extracted Data]
      ↓
[IF: Confidence > 0.7?]
   ├─ Yes → [Set: Format Output]
   └─ No  → [Set: Flag for Manual Review]

Key Implementation Details

1. Gemini Vision Prompt Engineering

The prompt is structured to guide Gemini through a systematic scan.
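To illustrate the idea of a stepwise scan, the sketch below assembles a prompt from an ordered list of steps plus an output schema. The wording, step order, and field names are assumptions made for illustration; the prompt variations actually used are kept in `PROMPTS.md` in the repository.

```javascript
// Illustrative prompt only; the wording and field names are assumptions.
const OUTPUT_SCHEMA = {
  person_count: 'integer',
  pincode: 'string of 6 digits, or null if no overlay is visible',
  equipment: 'array of strings',
  timestamp: 'string as shown in the overlay, or null',
  confidence: 'number between 0 and 1',
};

function buildClinicPrompt() {
  const steps = [
    'Count the people who are fully visible. Ignore partial figures in the background.',
    'Read the metadata overlay and extract the 6-digit pincode and the timestamp.',
    'List the dental equipment that is clearly visible.',
    'Rate your overall confidence in the extraction from 0 to 1.',
  ];
  return [
    'You are verifying a dental clinic photo. Work through these steps in order:',
    ...steps.map((step, i) => `${i + 1}. ${step}`),
    'Respond with ONLY valid JSON matching this schema:',
    JSON.stringify(OUTPUT_SCHEMA, null, 2),
  ].join('\n');
}
```

Keeping the steps in an array makes it easy to reorder or add checks without rewriting the whole prompt.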

2. Confidence-Based Validation

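A minimal sketch of the routing decision behind the `Confidence > 0.7?` branch. It assumes the parsed Gemini response carries a numeric `confidence` field; the function name and the `status` and `review_reason` fields are hypothetical.

```javascript
const DEFAULT_THRESHOLD = 0.7;

// Route a parsed result: accept it or flag it for manual review.
// Assumes `result.confidence` is a number between 0 and 1.
function routeByConfidence(result, threshold = DEFAULT_THRESHOLD) {
  const confidence = Number(result.confidence);
  const usable = Number.isFinite(confidence) && confidence >= 0 && confidence <= 1;
  if (usable && confidence > threshold) {
    return { ...result, status: 'accepted', needs_review: false };
  }
  return {
    ...result,
    status: 'manual_review',
    needs_review: true,
    review_reason: usable
      ? `confidence ${confidence} <= ${threshold}`
      : 'missing or invalid confidence',
  };
}
```

A missing or out-of-range score is treated the same as a low one, so a malformed response never slips through as accepted.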

3. Edge Case Handling

  • No metadata overlay: Return pincode: null, continue with other extractions
  • Ambiguous person count: Use lower bound (conservative estimate)
  • Poor image quality: Raise the manual-review confidence threshold from 0.7 to 0.8
  • Multiple people in background: Explicit prompt instruction to ignore partial figures
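The edge-case rules above could be applied in a post-processing step along these lines. This is a sketch under assumptions: the input field names (`person_count`, `pincode`, `image_quality`) and the idea that the model reports an ambiguous count as a range are hypothetical.

```javascript
// Normalize raw model output according to the edge-case rules.
// Input field names are assumptions, not the workflow's actual schema.
function normalizeClinicResult(raw) {
  // Ambiguous counts such as "2-3" or [2, 3] collapse to the lower bound.
  let personCount = raw.person_count;
  if (Array.isArray(personCount)) {
    personCount = Math.min(...personCount.map(Number));
  } else if (typeof personCount === 'string') {
    const numbers = personCount.match(/\d+/g);
    personCount = numbers ? Math.min(...numbers.map(Number)) : null;
  }
  // A missing overlay yields pincode: null rather than an error.
  const pincodeText = String(raw.pincode ?? '');
  const pincode = /^\d{6}$/.test(pincodeText) ? pincodeText : null;
  // Poor-quality images use the stricter 0.8 review threshold.
  const reviewThreshold = raw.image_quality === 'poor' ? 0.8 : 0.7;
  return { ...raw, person_count: personCount, pincode, reviewThreshold };
}
```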

Performance Metrics

  • Processing time: ~8 seconds per image (vs 5+ minutes manual)
  • Accuracy: 95%+ for person count, 98%+ for pincode extraction
  • False positive rate: <3% (mostly ambiguous partial figures)
  • Cost: $0.00025 per image

🎥 Live Demonstration & Architecture

Clinic Compliance Verification Architecture


📄 Workflow 3: Invoice OCR Extractor

Problem Statement

The client receives supplier invoices printed on pink thermal paper, a format that is challenging for OCR. Manual data entry was required to extract:

  • PIN code (6-digit) from buyer address
  • Item descriptions (typically 7 items) from "Description of Goods" table

Pink background and thermal printing made traditional OCR unreliable.

Workflow Architecture

text (diagram)
[Manual Trigger]
      ↓
[Set: Define Invoice Image Path]
      ↓
[HTTP: Fetch Image Binary]
      ↓
[Gemini: OCR with Specialized Prompt]
      ↓
[Code: Parse JSON Response]
      ↓
[Code: Validate PIN Format]
      ↓
[Code: Clean Item Descriptions]
      ↓
[Set: Format Final Output]

Key Implementation Details

1. Thermal Invoice Prompt Optimization

The prompt explicitly addresses the pink background challenge.

2. PIN Code Validation

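A minimal sketch of the PIN check, assuming Indian postal PIN codes (six digits, first digit never 0). The return shape and the whitespace cleanup are illustrative choices, not the workflow's confirmed behaviour.

```javascript
// Indian PIN codes have six digits and never start with 0.
const PIN_PATTERN = /^[1-9][0-9]{5}$/;

function validatePin(rawPin) {
  if (rawPin === null || rawPin === undefined) {
    return { pin: null, valid: false, reason: 'missing' };
  }
  // Drop spaces and hyphens that OCR sometimes inserts, e.g. "110 001".
  const cleaned = String(rawPin).replace(/[\s-]/g, '');
  if (!PIN_PATTERN.test(cleaned)) {
    return { pin: null, valid: false, reason: `invalid format: ${rawPin}` };
  }
  return { pin: cleaned, valid: true, reason: null };
}
```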

3. Item Description Cleaning

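One plausible shape for the cleaning step is sketched below. The specific rules (collapsing whitespace, stripping leading serial numbers, dropping empty entries) are assumptions about what cleaning means here.

```javascript
// Normalize OCR'd item descriptions: collapse whitespace, trim,
// strip leading serial numbers, and drop empty entries.
function cleanItemDescriptions(items) {
  const cleaned = [];
  for (const item of Array.isArray(items) ? items : []) {
    const text = String(item ?? '')
      .replace(/\s+/g, ' ')
      .trim()
      // Remove "1. " or "2) " prefixes; the required space keeps "2.5mm" intact.
      .replace(/^\d+[.)]\s+/, '');
    if (text !== '') cleaned.push(text);
  }
  return cleaned;
}
```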

4. Partial Success Handling

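Partial success handling might be sketched as follows, using the typical count of 7 items from the problem statement as the expected total. The status names and output fields are hypothetical.

```javascript
const EXPECTED_ITEM_COUNT = 7; // typical invoice, per the problem statement

// Classify an extraction as success, partial, or failed.
// The status names and output shape are illustrative assumptions.
function summarizeExtraction({ pin, items }, expected = EXPECTED_ITEM_COUNT) {
  const itemList = Array.isArray(items) ? items : [];
  const hasPin = typeof pin === 'string' && pin.length > 0;
  let status = 'failed';
  if (hasPin && itemList.length >= expected) status = 'success';
  else if (hasPin || itemList.length > 0) status = 'partial';
  return {
    status,
    pin: hasPin ? pin : null,
    items: itemList,
    missing_items: Math.max(expected - itemList.length, 0),
    needs_review: status !== 'success',
  };
}
```

Returning whatever was extracted, together with a review flag, lets downstream steps use the available data instead of discarding the whole invoice.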

Performance Metrics

  • Processing time: ~6 seconds per invoice (vs 3-5 minutes manual)
  • PIN code accuracy: 98%+ (regex validation catches errors)
  • Item extraction accuracy: 92%+ (7/7 items correctly extracted)
  • Partial success rate: 5% (typically 5-6 items instead of 7)
  • Cost: $0.00025 per invoice

🎥 Live Demonstration & Architecture

Invoice OCR Extractor Architecture


🧠 Prompt Engineering Patterns

1. Structured Output Enforcement

Problem: LLMs often return narrative text instead of structured data.

Solution: Include an explicit JSON schema in the prompt, plus an instruction to return ONLY valid JSON.
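Even with a strict instruction, a small share of replies can arrive wrapped in a Markdown code fence or with stray text around the JSON. The sketch below shows one defensive way to parse such replies; `parseModelJson` is a hypothetical helper, not code from the workflows.

```javascript
// Build the fence marker programmatically to keep this snippet fence-safe.
const FENCE = '`'.repeat(3);
const FENCED_BLOCK = new RegExp(`${FENCE}(?:json)?\\s*([\\s\\S]*?)${FENCE}`, 'i');

// Parse a model reply that should be JSON but may be fenced or padded.
function parseModelJson(replyText) {
  const text = String(replyText).trim();
  const fenced = text.match(FENCED_BLOCK);
  const candidate = fenced ? fenced[1].trim() : text;
  try {
    return { ok: true, data: JSON.parse(candidate) };
  } catch (firstError) {
    // Fall back to the outermost {...} span, if any.
    const start = candidate.indexOf('{');
    const end = candidate.lastIndexOf('}');
    if (start !== -1 && end > start) {
      try {
        return { ok: true, data: JSON.parse(candidate.slice(start, end + 1)) };
      } catch (secondError) {
        return { ok: false, error: secondError.message };
      }
    }
    return { ok: false, error: firstError.message };
  }
}
```

Returning `{ ok: false }` instead of throwing lets an IF node route unparseable replies to manual review.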

Result: 98%+ valid JSON responses on first attempt.

2. Edge Case Enumeration

Problem: AI models make assumptions about ambiguous inputs.

Solution: Explicitly list what to ignore or exclude, such as partial figures in the background.

Result: Reduced false positives by 40%.

3. Context-Specific Instructions

Problem: Generic prompts underperform on domain-specific tasks.

Solution: Add context about the specific challenge, such as the pink background and thermal printing of the invoices.

Result: OCR accuracy improved from 75% to 92% on thermal invoices.

4. Confidence Scoring

Problem: No way to know when AI is uncertain.

Solution: Request confidence scores in the output schema.

Result: Enabled automatic flagging of low-confidence results for manual review.


🛡️ Error Handling & Production Readiness

Validation Layers

Layer 1: Pre-Processing Validation

  • Check required fields exist
  • Validate data types and formats
  • Verify image file size and format
  • Confirm API credentials are present

Layer 2: API Error Handling

  • Retry logic with exponential backoff (2s, 4s, 8s)
  • Rate limit compliance (2-second delays between calls)
  • Timeout handling (30-second max per API call)
  • Graceful degradation on partial failures

Layer 3: Post-Processing Validation

  • JSON parsing with try/catch
  • Schema validation against expected structure
  • Confidence threshold checks
  • Data format validation (regex for PIN codes, etc.)

Monitoring & Logging

Each workflow includes comprehensive logging.
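As an illustration, a structured log entry could be built per workflow step like this. The field names and the sample values (`invoice-ocr`, `inv-001`) are hypothetical, not the workflows' actual log format.

```javascript
// Build one structured log entry per workflow step.
// Field names are illustrative, not the workflows' actual log format.
function buildLogEntry({ workflow, step, status, itemId = null, durationMs = null, error = null }) {
  return {
    timestamp: new Date().toISOString(),
    workflow,
    step,
    status, // e.g. 'success', 'partial', 'failed'
    item_id: itemId,
    duration_ms: durationMs,
    error: error ? String(error.message || error) : null,
  };
}

const entry = buildLogEntry({
  workflow: 'invoice-ocr',
  step: 'validate-pin',
  status: 'failed',
  itemId: 'inv-001',
  error: new Error('invalid format'),
});
console.log(JSON.stringify(entry));
```

Emitting one JSON object per step keeps the logs easy to filter by workflow, step, or status.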

Failure Recovery Strategies

  1. Retry with Backoff — Transient API failures
  2. Partial Success — Return available data, flag for review
  3. Fallback Logic — Use text alternatives when image processing fails
  4. Dead Letter Queue — Log failed items for batch reprocessing
  5. Manual Review Queue — Flag low-confidence results for human verification

📊 Production Deployment Considerations

Scalability

  • Current capacity: 100 images/hour per workflow
  • Bottleneck: Gemini API rate limits (60 requests/minute)
  • Scaling strategy: Implement request queuing with Redis for burst handling

Cost Analysis

| Workflow | Cost per Item | Monthly Volume (est.) | Monthly Cost |
|---|---|---|---|
| Label Generator | $0.02 | 500 labels | $10.00 |
| Clinic Analysis | $0.00025 | 200 images | $0.05 |
| Invoice OCR | $0.00025 | 1000 invoices | $0.25 |
| Total | | | $10.30 |
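The monthly figures follow directly from cost per item multiplied by estimated volume. The short script below recomputes them from the values in the table; only the variable names are invented.

```javascript
// Recompute the monthly cost table from per-item cost and volume.
const workflows = [
  { name: 'Label Generator', costPerItem: 0.02, monthlyVolume: 500 },
  { name: 'Clinic Analysis', costPerItem: 0.00025, monthlyVolume: 200 },
  { name: 'Invoice OCR', costPerItem: 0.00025, monthlyVolume: 1000 },
];

const rows = workflows.map((w) => ({ ...w, monthlyCost: w.costPerItem * w.monthlyVolume }));
const total = rows.reduce((sum, row) => sum + row.monthlyCost, 0);

for (const row of rows) {
  console.log(`${row.name}: $${row.monthlyCost.toFixed(2)}`);
}
console.log(`Total: $${total.toFixed(2)}`); // Total: $10.30
```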

Security & Privacy

  • Data sanitization: All sensitive data (SKUs, customer info) removed before public showcase
  • API key management: Stored in n8n credentials vault, never in workflow JSON
  • Access control: n8n instance behind authentication, workflows not publicly accessible
  • Audit trail: All executions logged with timestamps and user IDs

🎯 Lessons Learned & Best Practices

What Worked Well

  1. Modular architecture — Independent workflows made debugging and iteration much faster
  2. Structured prompting — Explicit JSON schemas eliminated 90%+ of parsing errors
  3. Validation layers — Catching errors early saved API costs and processing time
  4. Comprehensive PRDs — Clear requirements prevented scope creep and rework

What Could Be Improved

  1. Batch processing — Current workflows process one item at a time; batch API calls would reduce latency
  2. Internal Result Caching — External APIs such as HTML-to-Image already cache by payload on their side. An internal lookup table (e.g., Postgres or Redis) for already-processed SKUs and identical images would let n8n skip those API calls entirely, further reducing execution latency.
  3. A/B testing — Systematic prompt testing would optimize accuracy further
  4. Monitoring dashboard — Real-time metrics would help identify issues faster

Recommendations for Similar Projects

  1. Start with PRDs — Define inputs, outputs, and edge cases before writing code
  2. Test edge cases early — Pink invoices, metadata overlays, and partial data all required special handling
  3. Invest in prompt engineering — 80% of accuracy comes from prompt quality, not model choice
  4. Build validation layers — Pre-processing and post-processing validation catches 95%+ of errors
  5. Document everything — Future maintainers (including yourself) will thank you

📂 Repository Structure

text (diagram)
Dental-AI-Automation-Suite/
├── README.md                     # Executive Summary (This file on GitHub)
├── TECHNICAL-DOCUMENTATION.md    # Architecture and implementation details (This file)
├── 01-AI-Product-Label-Generator/
│   ├── PRD.md                    # Product requirements
│   ├── SETUP_GUIDE.md            # Deployment instructions
│   ├── TASK 1 -[DEV] workflow.json  # n8n workflow export
│   └── Gemini_Generated_Image_*.png # Reference label template
├── 02-Clinic-Compliance-Verifier/
│   ├── PRD.md
│   ├── PROMPTS.md                # Gemini prompt variations
│   ├── [DEV] Task 2 workflow.json
│   └── Image_*.jpg               # Sample clinic image
├── 03-Thermal-Invoice-OCR/
│   ├── PRD.md
│   ├── TASK-3 [PROD] workflow.json
│   ├── 1000440214.jpg            # Sample invoice
│   └── sample_workflows/
│       └── TASK_3_*.json         # Workflow iterations
└── [DEV] *.mp4                   # Demo videos (3 total)

🚀 Getting Started

Prerequisites

  • n8n instance (self-hosted or cloud)
  • Google AI Studio API key (Gemini Vision access)
  • Google Sheets API credentials (for Workflow 1)
  • HTML-to-Image API key (for Workflow 1)

Quick Start

  1. Import workflow JSON files into n8n
  2. Configure API credentials in n8n settings
  3. Update trigger nodes with your data sources
  4. Test with sample data from PRDs
  5. Monitor execution logs for errors

Support & Questions

For implementation questions or collaboration inquiries, reach out via:


Built by Aman Suryavanshi — AI Solutions Architect & Full-Stack Automation Developer
Specializing in n8n workflow automation, LangGraph agents, and production AI integrations.