PDF Pal (2024): Chat with PDFs using RAG

A web app that lets users upload PDFs and ask questions in natural language using Retrieval-Augmented Generation (RAG) over document chunks.

View Live

Product Screenshots

PDF Pal — split-pane PDF viewer with RAG chat — Split-pane RAG chat — ask a question and get an answer grounded in the document, alongside the source PDF (zoom, rotate, fullscreen).

PDF Pal — upload a PDF up to 32MB — Upload a PDF (up to 32MB); it's parsed, embedded per page, and indexed in Pinecone for retrieval.

Problem

I built this in university, during a year I was missing a lot of classes and needed to get through dense course PDFs quickly without falling behind. The problem underneath: PDFs are hard to search, slow to read, and don't let you ask questions grounded in the document.

Solution

A SaaS application where users: (1) Upload a PDF, (2) Ask questions in natural language, (3) Receive contextual answers grounded in the PDF using RAG.

Tech Stack

Frontend

Next.js 16 (App Router)

React 19

TypeScript

Tailwind CSS

Radix UI

Backend

Next.js API Routes

tRPC

Database

PostgreSQL + Prisma

Auth

Clerk

OpenAI (GPT-3.5-turbo + embeddings)

LangChain

Pinecone

File storage

UploadThing

Core Data Flows

Upload & Processing Pipeline

UploadThing receives the PDF
onUploadComplete triggers: (1) Create file record in DB with PROCESSING status, (2) Extract text pages via PDFLoader, (3) Generate one embedding per page via OpenAI (page-level vectors), (4) Store embeddings in Pinecone (namespace per file), (5) Update status to SUCCESS or FAILED
Each PDF uses its own Pinecone namespace for isolation
Page-level vectors keep every answer traceable to a specific source page
Status tracking supports real-time UI feedback

Chat / RAG Pipeline

User asks a question
POST /api/message: (1) Save user message to DB, (2) Embed question, (3) Similarity search Pinecone (top 4 pages), (4) Build prompt including retrieved context + recent message history + current question, (5) Stream OpenAI response to client, (6) Save AI response to DB

Database Design

Three core entities: User (email + auth), File (uploadStatus + URL/key + relations), and Message (text + isUserMessage + relations). UploadStatus enum tracks the pipeline: PENDING, PROCESSING, SUCCESS, FAILED.

Architecture Highlights

End-to-end type safety via tRPC
Vector isolation via Pinecone namespace per file
Streaming UX for responsiveness
Upload status polling with automatic UI updates
PostgreSQL for relational data + Pinecone for similarity search
Edge-ready Vercel deployment configuration

Key Features

PDF Viewer

Page navigation
Zoom (100%-300%)
Rotation
Fullscreen
Responsive split-pane layout

Chat

Streaming responses
Infinite scroll history
Markdown rendering
Optimistic updates
Context-aware answers

Key Design Decisions

Used Pinecone namespaces per document instead of a shared index. Eliminates cross-document leakage and simplifies deletion without index-wide operations
Went with tRPC instead of REST. When the types run all the way from the database to the UI, the frontend/backend mismatch bugs mostly stop happening.
Implemented streaming responses from OpenAI rather than waiting for completion. Improves perceived latency for long answers
Built a split-pane layout (chat + PDF viewer with zoom, rotate, fullscreen). Users need to verify AI answers against the source document
Embedded one vector per PDF page instead of character-chunking. Keeps every retrieved answer traceable to a source page and makes per-file deletion trivial
Chose Clerk over custom auth. Authentication is not a differentiator for this product

Tradeoffs

A Pinecone namespace per document means you can't query across documents. That's the point: it kills context bleed, the failure mode that quietly wrecks multi-doc RAG.
tRPC ties the frontend and backend together tightly, which would bite the day I needed a public API. For a solo SaaS, end-to-end types were worth more to me than that flexibility.
Streaming made error handling and message persistence fiddlier than a plain request/response. Watching the answer land token by token earned back the trouble.

Outcome

Live at pdfpal.enkambale.com. Users upload a PDF and get streaming, context-grounded AI answers within seconds. Per-document vector isolation prevents the most common RAG failure: cross-document context bleed.

Lessons Learned

Swapping the model barely moved answer quality. Chunking and isolation moved it a lot. That surprised me, and it's shaped how I build RAG ever since.
Half of 'good AI UX' is just progressive rendering. The exact same answer feels broken when it lands all at once and responsive when it streams in.
One namespace per document is the line between 'trustworthy' and 'why is it citing someone else's contract'. I wouldn't ship multi-tenant RAG without it.
Shipping the whole thing, auth through storage through AI, surfaced constraints no prototype ever would. You learn what a system actually costs by running it.

All Projects