
Rahul Agarwal

Founder | Agentic AI... • 22h

6 Chunking Methods for RAG you should know. I've explained them in a simple, step-by-step way.

𝗪𝗵𝗮𝘁 𝗶𝘀 𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴?
1. Chunking means splitting large documents into smaller pieces.
2. It helps the retriever find relevant passages and gives the LLM focused context to work with.
3. It is essential for Retrieval-Augmented Generation (RAG).

𝗦𝘁𝗲𝗽 1: 𝗦𝗲𝗺𝗮𝗻𝘁𝗶𝗰 𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴
• Split content based on meaning, not just size.
• Group sentences that talk about the same idea.
• Uses embeddings to detect topic changes.
• Produces high-quality chunks but costs more compute, because every sentence has to be embedded.
Best for: Meaning-heavy content where context matters.

𝗦𝘁𝗲𝗽 2: 𝗥𝗲𝗰𝘂𝗿𝘀𝗶𝘃𝗲 𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴
• Break text using a hierarchy of separators (paragraphs → sentences → words).
• Falls back to the next, finer separator only when a piece is still too large, so chunks stay within token limits.
• Works well for most text documents.
Best for: General-purpose RAG pipelines.

𝗦𝘁𝗲𝗽 3: 𝗦𝗲𝗻𝘁𝗲𝗻𝗰𝗲-𝗟𝗲𝘃𝗲𝗹 𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴
• Split text strictly at sentence boundaries.
• Combine multiple whole sentences into one chunk.
• Preserves natural language flow.
Best for: Articles, blogs, and readable text.

𝗦𝘁𝗲𝗽 4: 𝗣𝗮𝗿𝗲𝗻𝘁–𝗖𝗵𝗶𝗹𝗱 𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴
• Index small child chunks for search accuracy.
• Return the larger parent chunk that contains the match for full context.
• Balances precision and completeness.
Best for: Question answering systems.

𝗦𝘁𝗲𝗽 5: 𝗔𝗦𝗧-𝗔𝘄𝗮𝗿𝗲 𝗖𝗼𝗱𝗲 𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴
• Split code using its structure (functions, classes).
• Avoids breaking logical blocks, so each chunk is a complete unit of code.
• Requires language-specific parsers.
Best for: Codebases and developer tools.

𝗦𝘁𝗲𝗽 6: 𝗛𝘆𝗯𝗿𝗶𝗱 𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴 𝗦𝘁𝗿𝗮𝘁𝗲𝗴𝘆
• Choose the chunking method based on content type.
1. Code → AST-aware
2. PDFs → Page-based
3. Text → Recursive or Semantic
• Often gives the best retrieval accuracy, because each content type gets a splitter suited to it.
Best for: Production-grade AI systems.

✅ 𝗜𝗻 𝘀𝗵𝗼𝗿𝘁
• 𝗦𝗲𝗺𝗮𝗻𝘁𝗶𝗰 → splits by meaning
• 𝗥𝗲𝗰𝘂𝗿𝘀𝗶𝘃𝗲 → splits by structure (most common)
• 𝗦𝗲𝗻𝘁𝗲𝗻𝗰𝗲-𝗹𝗲𝘃𝗲𝗹 → simple and natural
• 𝗣𝗮𝗿𝗲𝗻𝘁–𝗖𝗵𝗶𝗹𝗱 → small search, big context
• 𝗔𝗦𝗧-𝗮𝘄𝗮𝗿𝗲 → best for code
• 𝗛𝘆𝗯𝗿𝗶𝗱 → picks the right method for each content type

Test with real queries, adjust chunk size, monitor performance, and continuously improve your RAG pipeline.
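To make Step 1 (semantic chunking) concrete, here is a minimal Python sketch. It is only an illustration: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the names `embed`, `cosine`, `semantic_chunks` and the `threshold` value are all assumptions, not part of any library.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(sentences: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Start a new chunk when two adjacent sentences fall below the similarity threshold."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            chunks.append([cur])      # topic change detected: open a new chunk
        else:
            chunks[-1].append(cur)    # same topic: keep growing the current chunk
    return chunks


sentences = [
    "Cats sleep most of the day.",
    "Cats also sleep at night.",
    "Python functions are defined with def.",
    "Python classes are defined with class.",
]
chunks = semantic_chunks(sentences)
```

With a real embedding model in place of `embed`, the threshold would need tuning on your own documents.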
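Step 2 (recursive chunking) can be sketched in a few lines of Python. This is a simplified illustration, not the implementation of any particular library: `recursive_split` is a hypothetical name, and it measures chunk size in characters as a rough stand-in for tokens.

```python
def recursive_split(
    text: str,
    max_len: int = 80,
    separators: tuple[str, ...] = ("\n\n", ". ", " "),
) -> list[str]:
    """Split on the coarsest separator first; recurse with finer ones only when needed."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut every max_len characters.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= max_len:
            current = candidate                  # piece still fits in the current chunk
            continue
        if current:
            chunks.append(current)               # close the chunk built so far
            current = ""
        if len(piece) <= max_len:
            current = piece                      # piece starts a fresh chunk
        else:
            chunks.extend(recursive_split(piece, max_len, finer))  # too big: go finer
    if current:
        chunks.append(current)
    return chunks


text = (
    "Chunking splits large documents into smaller pieces. "
    "Smaller pieces are easier to search.\n\n"
    "Recursive chunking tries paragraph breaks first, then sentences, then words."
)
chunks = recursive_split(text, max_len=60)
```

Every chunk stays within `max_len`, and paragraph and sentence boundaries are preferred over word boundaries whenever they fit.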
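For Step 4 (parent-child chunking), the idea of "search small, return big" can be shown with a toy example. The sketch below assumes sentences as child chunks and uses simple word overlap in place of vector search; `build_children` and `retrieve_parent` are hypothetical helper names.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased word set, used here as a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_children(parents: list[str]) -> list[tuple[str, int]]:
    """Split each parent into sentences and remember which parent each came from."""
    children = []
    for parent_id, parent in enumerate(parents):
        for sentence in re.split(r"(?<=[.!?])\s+", parent.strip()):
            if sentence:
                children.append((sentence, parent_id))
    return children


def retrieve_parent(query: str, parents: list[str], children: list[tuple[str, int]]) -> str:
    """Match the query against small children, then return the full parent chunk."""
    query_words = tokenize(query)
    _, best_parent_id = max(children, key=lambda c: len(query_words & tokenize(c[0])))
    return parents[best_parent_id]


parents = [
    "Recursive chunking splits on paragraphs first. It falls back to sentences and words.",
    "AST-aware chunking parses source code. It keeps functions and classes intact.",
]
children = build_children(parents)
result = retrieve_parent("Which method keeps functions intact?", parents, children)
```

The query only matches one short sentence, but the caller receives the whole parent, including the part about parsing source code.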
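Step 5 (AST-aware code chunking) is easy to try for Python source, because the standard library ships a parser. The sketch below is a minimal illustration under that assumption; `chunk_python_source` is a hypothetical name, and other languages would need their own parsers.

```python
import ast


def chunk_python_source(source: str) -> list[str]:
    """Return one chunk per top-level function or class, never cutting a block in half."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        # Other top-level statements (imports, assignments) are skipped in this sketch,
        # and decorator lines are not included in the extracted segment.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                chunks.append(segment)
    return chunks


source = '''
import math

def area(r):
    return math.pi * r * r

class Circle:
    def __init__(self, r):
        self.r = r
'''
chunks = chunk_python_source(source)
```

Each chunk is a complete function or class definition, so a retrieved chunk can be read on its own.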
✅ Repost for others who can benefit from this.

