Building Intelligent Document Processing Pipelines: On-Demand vs Batch Inference with Amazon Bedrock

Unlocking Hidden Business Intelligence from Documents

Many organizations sit on goldmines of untapped business intelligence locked away in paper documents and PDFs. With the rise of generative AI and large language models, we now have powerful tools to extract meaningful data from these documents at scale. In this post, we'll explore how to build intelligent document processing pipelines using Amazon Bedrock that give you flexibility in both processing time and cost.

The Challenge: Millions of Documents, Multiple Formats

Imagine having hundreds of millions of scanned PDF documents—like land lease agreements—sitting in your backlog, with new ones arriving daily. Each document might have a different format: some present information in numbered lists, others in tables, and some even include hand-drawn diagrams. How do you efficiently extract structured data from this variety while balancing speed and cost?

Solution Overview: Dual Pipeline Architecture

The solution involves building two complementary inference pipelines:

On-demand pipeline: Processes documents individually for time-sensitive requests, returning results within seconds
Batch inference pipeline: Handles multiple documents asynchronously for cost-optimized processing

Both pipelines leverage Amazon Bedrock's Prompt Management feature, allowing you to dynamically specify different large language models and prompts at the document level—perfect for handling varying document formats with the same infrastructure.

On-Demand Pipeline: Speed When You Need It

The on-demand pipeline uses an AWS SQS FIFO queue to maintain message ordering and ensure reliable delivery. Here's how it works:

Key Components:

SQS FIFO Queue: Triggers processing with message attributes including document ID, LLM model ID, and prompt versions
Lambda Function: Handles the heavy lifting of document processing
Amazon Bedrock: Performs the actual data extraction using multimodal models
DynamoDB: Stores extracted results and performance metrics

Processing Flow:

Lambda retrieves the PDF from S3 and converts pages to PNG images
Relevant prompts are fetched from Amazon Bedrock Prompt Management
For documents with more than 20 pages (current Claude 4 Sonnet limit), the function splits them into manageable chunks
The LLM processes the images and extracts data in JSON format
Results are stored in DynamoDB with tracking information for chunks

Dynamic Prompt Management

One of the most powerful features is the ability to use different prompts for different document types. Since land lease documents can vary dramatically in format, you can specify the appropriate prompt ID and version in each queue message. This ensures optimal extraction accuracy for each document type.

Batch Pipeline: Cost-Optimized Processing

For non-urgent processing needs, the batch pipeline offers significant cost savings by processing multiple documents in a single Amazon Bedrock batch inference job.

Key Differences:

Uses standard SQS queue for higher throughput
EventBridge Scheduler triggers processing on a schedule
Requires minimum 100 records for batch jobs
Handles duplicate message detection (since standard SQS doesn't guarantee exactly-once delivery)
Asynchronous processing with EventBridge rules for completion handling

Practical Implementation Tips

Message Structure

Queue messages contain essential attributes like:

Document S3 location
LLM model ID
Prompt ID and version
System prompt ID and version

Handling Large Documents

The solution elegantly handles documents exceeding model limits by chunking them and tracking each piece with unique identifiers. This ensures no data is lost while maintaining processing efficiency.

Prompt Versioning Strategy

With Amazon Bedrock's limit of 50 prompts per region and 10 versions per prompt, careful prompt management becomes crucial. Consider creating prompts for different document categories and using versioning for refinements.

Choosing Your Pipeline

The decision between on-demand and batch processing depends on your specific needs:

Choose on-demand when:

Processing time-sensitive documents
Need immediate results for downstream systems
Handling small volumes of high-priority documents

Choose batch processing when:

Cost optimization is the priority
Processing large backlogs
Results can wait for scheduled processing windows

The Bottom Line

This dual-pipeline approach gives you the flexibility to handle diverse document processing needs efficiently. By leveraging Amazon Bedrock's prompt management and both on-demand and batch inference capabilities, you can unlock valuable business intelligence from your document archives while optimizing for both speed and cost.

The key to success lies in the dynamic prompt selection capability, which allows a single infrastructure to handle multiple document types effectively. Whether you're processing urgent contracts or working through historical archives, this solution adapts to your needs.

Source: AWS Machine Learning Blog by Tim Shear

Building Intelligent Document Processing Pipelines: On-Demand vs Batch Inference with Amazon Bedrock

Unlocking Hidden Business Intelligence from Documents

The Challenge: Millions of Documents, Multiple Formats

Solution Overview: Dual Pipeline Architecture

On-Demand Pipeline: Speed When You Need It

Key Components:

Processing Flow:

Dynamic Prompt Management

Batch Pipeline: Cost-Optimized Processing

Key Differences:

Practical Implementation Tips

Message Structure

Handling Large Documents

Prompt Versioning Strategy

Choosing Your Pipeline

The Bottom Line

Share this post

Related Posts

How Rocket Close Built an AI Agent That Revolutionized Title Operations

OLMo-Eval: A Game-Changing Evaluation Framework for AI Model Development

Understanding PyTorch Performance: A Deep Dive into Neural Network Optimization

Attribution & Credits

Unlocking Hidden Business Intelligence from Documents

The Challenge: Millions of Documents, Multiple Formats

Solution Overview: Dual Pipeline Architecture

On-Demand Pipeline: Speed When You Need It

Key Components:

Processing Flow:

Dynamic Prompt Management

Batch Pipeline: Cost-Optimized Processing

Key Differences:

Practical Implementation Tips

Message Structure

Handling Large Documents

Prompt Versioning Strategy

Choosing Your Pipeline

The Bottom Line

Share this post

Related Posts

How Rocket Close Built an AI Agent That Revolutionized Title Operations

OLMo-Eval: A Game-Changing Evaluation Framework for AI Model Development

Understanding PyTorch Performance: A Deep Dive into Neural Network Optimization

Attribution & Credits

Quick Feedback