Full-Stack AI Application (2025)

The Suite: Enterprise AI Intelligence

The Suite is a production-grade RAG (Retrieval-Augmented Generation) application built with Next.js and Supabase. Users upload documents that are processed end to end: content is extracted, chunked using paragraph and heading detection so document structure survives intact, then embedded and stored in Pinecone. When a user asks a question, the system embeds the query, uses an LLM to determine intent and whether the knowledge base is needed, retrieves the top five relevant chunks from Pinecone, and feeds them to the LLM to generate a response. Responses are validated; when validation fails or live information is required, the pipeline augments the context with real-time web search via the Exa API and regenerates a grounded answer. For personalization, user-provided content is folded into the context, and Mem0 persists user preferences and decisions across sessions.

The Story

The Problem

Enterprises needed a reliable way to query internal documents (policies, SLAs, contracts) without hallucinations. Generic chunking lost context at paragraph and heading boundaries, and static RAG flows could not decide when to use the KB vs. the open web or adapt to individual user preferences.

The Solution

Designed a full RAG pipeline: document upload → extraction → structure-aware chunking (paragraph + heading detection) → embeddings → Pinecone storage. Query path: user question → query embedding → LLM intent detection → conditional KB lookup (top 5 chunks) → LLM generation → response validation → optional Exa API web search and re-generation. Integrated Mem0 for persistent user preferences and decisions. Delivered as a single Next.js + Supabase application.

My Approach

End-to-end RAG with structure-preserving chunking, intent-driven retrieval, validation, and optional web augmentation; user context and Mem0 for personalization.

Technologies Used

Next.js · Supabase · Pinecone · OpenAI · Exa API · Mem0 · Langfuse

Core Platform Modules

A. Document ingestion

Upload documents; extract full content and chunk by paragraphs and headings so no structural data is lost before embedding and indexing.

  • Upload pipeline
  • Paragraph & heading detection
  • Structure-aware chunking
  • Embedding generation
  • Pinecone upsert
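The chunking step can be sketched as below. Note the heading heuristic (short lines without terminal punctuation) is an assumption for illustration; the production detection rules are not specified in this document.

```typescript
// Minimal sketch of structure-aware chunking: split on blank lines so
// paragraphs are never cut mid-sentence, and carry the nearest preceding
// heading into each chunk so context is preserved for embedding.

interface Chunk {
  heading: string | null; // nearest preceding heading, kept for context
  text: string;
}

function isHeading(line: string): boolean {
  // Heuristic (assumed): headings are short and lack terminal punctuation.
  const t = line.trim();
  return t.length > 0 && t.length <= 80 && !/[.!?:;,]$/.test(t);
}

function chunkByStructure(doc: string): Chunk[] {
  const chunks: Chunk[] = [];
  let heading: string | null = null;
  for (const block of doc.split(/\n\s*\n/)) {
    const text = block.trim();
    if (!text) continue;
    if (isHeading(text)) {
      heading = text; // attach this heading to the paragraphs that follow
      continue;
    }
    chunks.push({ heading, text });
  }
  return chunks;
}
```

Keeping the heading alongside each paragraph is what prevents the context loss described above: a chunk like "Refunds are issued within 30 days" stays attached to its "Refund Policy" heading when embedded.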

B. Query & retrieval

User question is embedded; LLM determines intent and whether to consult the knowledge base; top 5 chunks are fetched from Pinecone and passed to the LLM.

  • Query embedding
  • Intent detection
  • Top‑5 chunk retrieval
  • LLM generation
  • Response validation
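The top-5 lookup delegated to Pinecone amounts to a nearest-neighbor search over the stored embeddings. An in-memory illustration (vectors and data here are made up; Pinecone's actual query API is not reproduced):

```typescript
// In-memory illustration of the top-k similarity lookup the pipeline
// delegates to Pinecone at query time.

interface IndexedChunk {
  id: string;
  vector: number[]; // embedding produced at ingestion time
  text: string;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k chunks whose embeddings are most similar to the query vector.
function topK(query: number[], index: IndexedChunk[], k = 5): IndexedChunk[] {
  return [...index]
    .sort((x, y) => cosine(query, y.vector) - cosine(query, x.vector))
    .slice(0, k);
}
```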

C. Web augmentation

When validation or intent requires live information, the pipeline calls the Exa API, then regenerates a response grounded in both KB and web results.

  • Exa API integration
  • Conditional web search
  • Re-generation with combined context
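The decision to fall back to web search hinges on the validation step. The pipeline uses an LLM for this; the token-overlap heuristic below is a stand-in that only illustrates the control flow, not the production validator.

```typescript
// Sketch of the validation gate: if too little of the generated answer is
// supported by the retrieved context, trigger the Exa web-search fallback.

function tokens(s: string): Set<string> {
  return new Set(s.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

/** Fraction of answer tokens that also appear in the retrieved context. */
function groundingScore(answer: string, context: string[]): number {
  const ctx = tokens(context.join(" "));
  const ans = [...tokens(answer)];
  if (ans.length === 0) return 0;
  return ans.filter((t) => ctx.has(t)).length / ans.length;
}

// The 0.5 threshold is illustrative, not a documented production value.
function needsWebSearch(answer: string, context: string[], threshold = 0.5): boolean {
  return groundingScore(answer, context) < threshold;
}
```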

D. Personalization

User content is included in context; Mem0 stores and recalls user preferences and decisions for personalized answers across sessions.

  • User content in context
  • Mem0 memory
  • Preference-aware responses
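The role Mem0 plays can be sketched with an in-memory stand-in. Mem0's actual SDK differs; this only shows how remembered preferences feed back into the prompt context across turns.

```typescript
// In-memory stand-in for Mem0's role: persist per-user preferences and
// decisions, then prepend them to the prompt so answers are personalized.

class PreferenceMemory {
  private store = new Map<string, string[]>();

  remember(userId: string, fact: string): void {
    const facts = this.store.get(userId) ?? [];
    facts.push(fact);
    this.store.set(userId, facts);
  }

  recall(userId: string): string[] {
    return this.store.get(userId) ?? [];
  }

  /** Build an LLM prompt that carries the user's remembered preferences. */
  buildContext(userId: string, question: string): string {
    const prefs = this.recall(userId).map((f) => `- ${f}`).join("\n");
    return prefs
      ? `User preferences:\n${prefs}\n\nQuestion: ${question}`
      : `Question: ${question}`;
  }
}
```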

System Architecture Principles

  • Document upload → extract → paragraph/heading chunking → embed → Pinecone
  • User question → embed → LLM intent → (if KB) top 5 chunks → LLM → validate → (if needed) Exa → re-generate
  • User context + Mem0 for personalized, persistent preferences
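The query path above can be composed into a single orchestration. Each dependency is injected as a function so the real services (OpenAI, Pinecone, Exa) can be slotted in; the names and shapes here are illustrative, not the production API.

```typescript
// Orchestration sketch of the query path: embed -> intent -> conditional
// top-5 retrieval -> generate -> validate -> optional web augmentation
// -> regenerate. All dependencies are injected stand-ins.

interface Deps {
  embed: (text: string) => number[];
  needsKb: (question: string) => boolean;              // LLM intent detection
  retrieve: (vector: number[], k: number) => string[]; // Pinecone top-k
  generate: (question: string, context: string[]) => string;
  isGrounded: (answer: string, context: string[]) => boolean; // validation
  webSearch: (question: string) => string[];           // Exa fallback
}

function answerQuestion(question: string, deps: Deps): string {
  // Only consult the knowledge base when intent detection says to.
  const context = deps.needsKb(question)
    ? deps.retrieve(deps.embed(question), 5)
    : [];
  let answer = deps.generate(question, context);
  if (!deps.isGrounded(answer, context)) {
    // Augment with live web results and regenerate a grounded answer.
    const combined = [...context, ...deps.webSearch(question)];
    answer = deps.generate(question, combined);
  }
  return answer;
}
```

Injecting the steps as functions also makes the pipeline testable with stubs, which is useful when tracing flows in Langfuse without hitting paid APIs.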