LlamaIndex llamaindex - Bluesky Statics

Why Reading PDFs is Hard

LlamaIndex is a simple, flexible framework for building knowledge assistants using LLMs connected to your enterprise data.

Read the full breakdown of why PDFs are so challenging and how we're tackling it: www.llamaindex.ai/blog/why-re...

06.03.2026 19:01 — 👍 0 🔁 0 💬 0 📌 0

We built LlamaParse using this hybrid approach: fast text extraction for standard content, vision models for complex layouts. It's how we're solving document processing at scale.

06.03.2026 19:01 — 👍 0 🔁 0 💬 1 📌 0
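
The hybrid approach in the post above can be pictured as a per-page routing decision. The sketch below is only a guess at the general shape, not how LlamaParse decides: `PageStats`, `choose_path`, and every threshold are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    # Hypothetical per-page signals; a real parser would derive its own.
    extractable_chars: int    # characters recovered by plain text extraction
    image_area_ratio: float   # fraction of the page covered by images
    ruling_lines: int         # drawn line segments, a rough hint of tables


def choose_path(stats, min_chars=200, max_image_ratio=0.3, max_lines=20):
    """Return "text" for simple pages and "vision" for complex ones.

    All thresholds are assumptions made up for this sketch.
    """
    if stats.extractable_chars < min_chars:
        return "vision"  # little embedded text, likely a scan
    if stats.image_area_ratio > max_image_ratio:
        return "vision"  # mostly figures or charts
    if stats.ruling_lines > max_lines:
        return "vision"  # many drawn lines, probably a dense table
    return "text"


plain_page = PageStats(extractable_chars=1500, image_area_ratio=0.05, ruling_lines=4)
scanned_page = PageStats(extractable_chars=12, image_area_ratio=0.9, ruling_lines=0)
table_page = PageStats(extractable_chars=1500, image_area_ratio=0.05, ruling_lines=60)

routes = [choose_path(p) for p in (plain_page, scanned_page, table_page)]
print(routes)
```

Routing per page rather than per document keeps the cheap path for the bulk of ordinary pages and reserves the expensive model for the pages that need it.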

🔄 Reading order is pure guesswork — content streams have zero relationship to visual flow
🤖 Seventy years of OCR evolution led us to combine text extraction with vision models for optimal results

06.03.2026 19:01 — 👍 0 🔁 0 💬 1 📌 0

📝 PDF text isn't stored as characters: it's glyph shapes positioned at coordinates with no semantic meaning
📊 Tables don't exist as objects: they're just lines and text that happen to look tabular when rendered

06.03.2026 19:01 — 👍 0 🔁 0 💬 1 📌 0
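
To make the point about positioned glyphs concrete, here is a minimal sketch of how a parser might rebuild table rows from text fragments that carry only coordinates. The fragment data and the `group_into_rows` helper are hypothetical, and the baseline-bucketing heuristic is a common simplification rather than the method used by LlamaParse.

```python
from collections import defaultdict

# Hypothetical fragments as a content stream might yield them: (x, y, text),
# listed in stream order, which need not match the visual order.
fragments = [
    (300.0, 700.0, "Revenue"),
    (72.0, 680.0, "2023"),
    (72.0, 700.0, "Year"),
    (300.0, 680.0, "10"),
]


def group_into_rows(frags, y_tolerance=2.0):
    """Bucket fragments by baseline (y), then sort each row left to right."""
    rows = defaultdict(list)
    for x, y, text in frags:
        # Snap y to a coarse grid so nearly aligned baselines share a row.
        key = round(y / y_tolerance)
        rows[key].append((x, text))
    # PDF y coordinates grow upward, so a larger y is higher on the page.
    ordered = sorted(rows.items(), key=lambda item: -item[0])
    return [[text for _, text in sorted(cells)] for _, cells in ordered]


table = group_into_rows(fragments)
print(table)
```

Even this toy case needs a tolerance and a sort direction chosen by hand, which hints at why multi-column pages and merged cells defeat simple heuristics.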

PDFs are the bane of every AI agent's existence: here's why parsing them is so much harder than you think 📄

Every developer building document agents eventually hits the same wall: PDFs weren't designed to be machine-readable. They're drawing instructions from 1982, not structured data.

06.03.2026 19:01 — 👍 1 🔁 0 💬 1 📌 0

LlamaParse vs. the LLMs: Live OCR Battleground

LlamaParse vs. The LLMs — a free webinar where we parse the ugliest documents we can find across every leading model and show the results side by side.

Hosted by George, Head of Engineering, LlamaIndex

When: March 26th, 9 AM PST

Register 👇
landing.llamaindex.ai/llamaparsev...

05.03.2026 20:00 — 👍 0 🔁 0 💬 0 📌 0

"Just send the PDF to GPT-4o"

Ok. We did. Here's what happened:

• Reading order? Wrong.
• Tables? Half missing.
• Hallucinated data? Everywhere.
• Bounding boxes? Nonexistent.
• Cost at 100K pages? Brutal.

So we're doing it live.

05.03.2026 20:00 — 👍 0 🔁 0 💬 1 📌 0

DBOS Durable Execution

This integration with DBOS removes all the manual snapshot work from durable workflows. Just pass a DBOS runtime to your workflow and get great reliability.

Learn how to build durable agents on our new docs: developers.llamaindex.ai/python/llam...

05.03.2026 17:03 — 👍 2 🔁 0 💬 0 📌 0

💤 Idle release feature frees memory for long-running workflows waiting on human input
🛡️ Built-in crash recovery detects and relaunches incomplete workflows automatically

05.03.2026 17:03 — 👍 2 🔁 0 💬 1 📌 0

🔄 Every step transition persists automatically - workflows resume exactly where they left off
⚡ Zero external dependencies with SQLite, or scale to multi-replica deployments with Postgres
👯‍♀️ Built for replication - each replica owns its workflows, with Postgres coordinating across instances

05.03.2026 17:03 — 👍 2 🔁 0 💬 1 📌 0
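
The idea of persisting every step transition can be sketched with nothing but the standard library. This is not the DBOS or LlamaIndex workflow API: `run_workflow`, the `checkpoints` table, and the string-valued state are all assumptions made for illustration.

```python
import sqlite3


def run_workflow(conn, run_id, steps):
    """Run steps in order, persisting progress after each transition."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "run_id TEXT PRIMARY KEY, next_step INTEGER, state TEXT)"
    )
    row = conn.execute(
        "SELECT next_step, state FROM checkpoints WHERE run_id = ?", (run_id,)
    ).fetchone()
    # Resume from the saved position, or start fresh if none exists.
    next_step, state = row if row else (0, "")
    for index in range(next_step, len(steps)):
        state = steps[index](state)
        # Commit after every step so a crash repeats at most one step.
        conn.execute(
            "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
            (run_id, index + 1, state),
        )
        conn.commit()
    return state


calls = {"count": 0}


def flaky(state):
    """Fail on the first call to simulate a crash, succeed afterwards."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("simulated crash")
    return state + "b"


steps = [lambda s: s + "a", flaky, lambda s: s + "c"]
conn = sqlite3.connect(":memory:")
try:
    run_workflow(conn, "run-1", steps)
except RuntimeError:
    pass  # the run stopped after step 0 had been checkpointed

result = run_workflow(conn, "run-1", steps)  # resumes at step 1
print(result)
```

The second call skips the completed first step and continues from the saved state, which is the behavior the posts describe as resuming exactly where the workflow left off.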

Creating agent workflows and architecting the logic is one thing; making them durable and fail-safe is another 👇

New integration for durable agent workflows with @dbos.dev execution - Make sure your agents survive crashes, restarts, and errors without writing any checkpoint code.

05.03.2026 17:03 — 👍 3 🔁 5 💬 1 📌 0

Our DevRel @tuana.dev gave a 30-minute workshop to get participants started on document agents with LlamaParse. We saw some amazing projects submitted, with no lack of creativity and imagination. Congrats to the 3 winning teams, and see you next time!

04.03.2026 20:04 — 👍 0 🔁 0 💬 0 📌 0

Huge thank you to everyone who joined the Google DeepMind hackathon in NYC with us over the weekend 💛

04.03.2026 20:04 — 👍 1 🔁 0 💬 1 📌 0

Getting Started

Introduction to the Split API, a tool for automatically segmenting concatenated PDFs into logical document sections based on content categories.

🎥 Watch the full video here:
📘 Or get started right away with the docs (UI + code examples): developers.llamaindex.ai/python/clou...

04.03.2026 16:58 — 👍 2 🔁 1 💬 0 📌 0

In this walkthrough, @cle-does-things.bsky.social demonstrates how to configure LlamaSplit to break down Environmental Impact Reports into clearly defined impact categories 🌳

04.03.2026 16:58 — 👍 1 🔁 0 💬 1 📌 0

With the intuitive UI, you can:
• Define a custom configuration for how your documents should be categorized
• Specify the exact sections or impact types you want extracted
• Run the job and explore the results through an interactive interface🔍

04.03.2026 16:58 — 👍 0 🔁 0 💬 1 📌 0
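
Category-based splitting, as described in this thread, amounts to labeling each page and merging consecutive pages that share a label. The sketch below shows that idea with a keyword matcher. It is not the LlamaSplit API: the `CATEGORIES` mapping, `classify`, and `split_document` are hypothetical, and a real system would classify pages with a model rather than keywords.

```python
from itertools import groupby

# Hypothetical configuration: category name -> keywords that signal it.
CATEGORIES = {
    "air_quality": ["emissions", "air quality"],
    "noise": ["noise", "decibel"],
}


def classify(page_text):
    """Return the first category whose keywords appear in the page."""
    text = page_text.lower()
    for category, keywords in CATEGORIES.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "uncategorized"


def split_document(pages):
    """Return (category, first_page, last_page) segments, 1-indexed."""
    labeled = [(classify(text), number) for number, text in enumerate(pages, start=1)]
    segments = []
    # groupby merges only adjacent pages, so page order is preserved.
    for category, group in groupby(labeled, key=lambda item: item[0]):
        numbers = [number for _, number in group]
        segments.append((category, numbers[0], numbers[-1]))
    return segments


pages = [
    "Projected emissions during construction",
    "Air quality monitoring results",
    "Noise levels measured in decibels",
    "Appendix",
]
segments = split_document(pages)
print(segments)
```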

If you need to split complex or composite documents into structured categories or sections, LlamaSplit is built for the job ✂️

04.03.2026 16:58 — 👍 0 🔁 0 💬 1 📌 0

LlamaIndex is more than a RAG Framework. It is Agentic Document Processing.

LlamaIndex started as a RAG framework. It's evolved into something more focused: best-in-class document infrastructure for agentic work automation. As agent reasoning and coding tools advanced, framework abstractions became less critical. What hasn't changed is the need for accurate document understanding: the vast majority of enterprise knowledge lives in PDFs and spreadsheets, and extracting it reliably remains an unsolved, high-value problem.

Read about our evolution and what's next: www.llamaindex.ai/blog/llamai...

03.03.2026 20:04 — 👍 0 🔁 0 💬 0 📌 0

Our mission is now providing core infrastructure to automate knowledge work over documents, not just being connective tissue between LLMs and data.

03.03.2026 20:04 — 👍 1 🔁 0 💬 1 📌 0

⚙️ Real automation potential exists in workflows where humans manually process documents daily - financial analysis, contract review, insurance underwriting can all become end-to-end agentic processes

03.03.2026 20:04 — 👍 0 🔁 0 💬 1 📌 0

🏢 LlamaParse now serves 300k+ users and handles 50+ formats for enterprises like Carlyle, CEMEX, and KPMG, with multi-agent workflows combining OCR, computer vision, and LLM reasoning

03.03.2026 20:04 — 👍 0 🔁 0 💬 1 📌 0

📄 Document understanding remains a massive opportunity - frontier vision models still struggle with complex tables, charts, and long documents at scale

03.03.2026 20:04 — 👍 0 🔁 0 💬 1 📌 0

LlamaIndex has evolved far beyond a RAG framework - we're now focused on agentic document processing that automates knowledge work.

🚀 Agent orchestration has fundamentally changed with sophisticated reasoning loops, tool discovery through Skills/MCP, and coding agents that write Python for you

03.03.2026 20:04 — 👍 0 🔁 0 💬 1 📌 0

llamaparse_images.ipynb Colab notebook

colab.research.google.com/drive/1EqsH...

02.03.2026 18:14 — 👍 0 🔁 0 💬 0 📌 0

When you parse a document with LlamaParse, you also get access to layout data for figures, charts, etc.

Parse the document with layout image saving enabled, then access those images on the response! Each image is a cropped screenshot of that specific layout element.

02.03.2026 18:14 — 👍 1 🔁 0 💬 1 📌 0

Parse Charts in PDFs and Analyze with Pandas

Check out the full tutorial: developers.llamaindex.ai/python/clou...

27.02.2026 17:02 — 👍 0 🔁 0 💬 0 📌 0

⚡ Use the items view to get per-page structured data including tables and figures

We demonstrate this using a 2024 Executive Summary PDF, extracting a fiscal year chart showing Budget Deficit vs Net Operating Cost data spanning 2020-2024, and reproducing the key financial insights.

27.02.2026 17:02 — 👍 0 🔁 0 💬 1 📌 0

📊 Enable specialized chart parsing to convert visual charts into structured table data
🐼 Extract table rows directly from parsed PDF pages and load them into DataFrames
📈 Perform year-over-year analysis, calculate gaps between metrics, and create visualizations

27.02.2026 17:02 — 👍 0 🔁 0 💬 1 📌 0
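
Once chart data has been converted to table rows, the analysis steps listed above are ordinary pandas operations. The sketch below assumes the rows arrive as a list of dictionaries. The column names are hypothetical and the numbers are placeholders invented for illustration, not values from the tutorial's PDF.

```python
import pandas as pd

# Placeholder rows shaped like a parsed chart table (values are invented).
rows = [
    {"fiscal_year": 2020, "budget_deficit": 100.0, "net_operating_cost": 120.0},
    {"fiscal_year": 2021, "budget_deficit": 90.0, "net_operating_cost": 99.0},
    {"fiscal_year": 2022, "budget_deficit": 45.0, "net_operating_cost": 135.0},
]

df = pd.DataFrame(rows).set_index("fiscal_year")

# Gap between the two metrics for each year.
df["gap"] = df["net_operating_cost"] - df["budget_deficit"]

# Year-over-year change in the deficit, as a percentage of the prior year.
df["deficit_yoy_pct"] = df["budget_deficit"].pct_change() * 100

print(df)
```

The first year has no prior value, so its year-over-year entry is `NaN`. From here, `df.plot()` would give a quick visualization if matplotlib is installed.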

Turn your PDF charts into pandas DataFrames with specialized chart parsing in LlamaParse!

This tutorial walks you through extracting structured data from charts and graphs in PDFs, then running data analysis with pandas - no manual data entry required.

27.02.2026 17:02 — 👍 1 🔁 0 💬 1 📌 0

Creating a Deal Sourcing Agent with LlamaAgents Builder

LlamaIndex is a simple, flexible framework for building knowledge assistants using LLMs connected to your enterprise data.

Read the full tutorial: www.llamaindex.ai/blog/creati...

26.02.2026 17:02 — 👍 1 🔁 0 💬 0 📌 0

Posts by LlamaIndex (@llamaindex.bsky.social)