<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Iteradores on Cesar Gimenes</title><link>https://crg.eti.br/en/tags/iteradores/</link><description>Recent content in Iteradores on Cesar Gimenes</description><generator>Hugo -- gohugo.io</generator><language>en</language><managingEditor>crg@crg.eti.br (Cesar Gimenes)</managingEditor><webMaster>crg@crg.eti.br (Cesar Gimenes)</webMaster><lastBuildDate>Sat, 06 Jun 2026 11:57:59 -0300</lastBuildDate><atom:link href="https://crg.eti.br/en/tags/iteradores/index.xml" rel="self" type="application/rss+xml"/><item><title>Overlapping text chunker for RAG pipelines</title><link>https://crg.eti.br/en/post/text_chunker/</link><pubDate>Sat, 06 Jun 2026 11:57:59 -0300</pubDate><author>crg@crg.eti.br (Cesar Gimenes)</author><guid>https://crg.eti.br/en/post/text_chunker/</guid><description>&lt;p>In a RAG (&lt;em>Retrieval-Augmented Generation&lt;/em>) pipeline, the first step is almost always the same: take a large text and break it into pieces before vectorizing. The pieces can&amp;rsquo;t be too big, because the model has a context limit, nor too small, because then the embedding loses semantics. Neighboring chunks also need to overlap; otherwise an answer that lands right on a boundary gets split across two chunks and the retriever misses it.&lt;/p></description></item></channel></rss>