Building Intelligent Document Processing Pipelines: On-Demand vs Batch Inference with Amazon Bedrock

admin June 12, 2026 4 min read LLM Development

Unlocking Hidden Business Intelligence from Documents

Many organizations sit on goldmines of untapped business intelligence locked away in paper documents and PDFs. With the rise of generative AI and large language models, we now have powerful tools to extract meaningful data from these documents at scale. In this post, we'll explore how to build intelligent document processing pipelines using Amazon Bedrock that give you flexibility in both processing time and cost.

The Challenge: Millions of Documents, Multiple Formats

Imagine having hundreds of millions of scanned PDF documents—like land lease agreements—sitting in your backlog, with new ones arriving daily. Each document might have a different format: some present information in numbered lists, others in tables, and some even include hand-drawn diagrams. How do you efficiently extract structured data from this variety while balancing speed and cost?

Solution Overview: Dual Pipeline Architecture

The solution involves building two complementary inference pipelines:

  • On-demand pipeline: Processes documents individually for time-sensitive requests, returning results within seconds
  • Batch inference pipeline: Handles multiple documents asynchronously for cost-optimized processing

Both pipelines leverage Amazon Bedrock's Prompt Management feature, allowing you to dynamically specify different large language models and prompts at the document level—perfect for handling varying document formats with the same infrastructure.

On-Demand Pipeline: Speed When You Need It

The on-demand pipeline uses an AWS SQS FIFO queue to maintain message ordering and ensure reliable delivery. Here's how it works:

Key Components:

  • SQS FIFO Queue: Triggers processing with message attributes including document ID, LLM model ID, and prompt versions
  • Lambda Function: Handles the heavy lifting of document processing
  • Amazon Bedrock: Performs the actual data extraction using multimodal models
  • DynamoDB: Stores extracted results and performance metrics

Processing Flow:

  1. Lambda retrieves the PDF from S3 and converts pages to PNG images
  2. Relevant prompts are fetched from Amazon Bedrock Prompt Management
  3. For documents with more than 20 pages (current Claude 4 Sonnet limit), the function splits them into manageable chunks
  4. The LLM processes the images and extracts data in JSON format
  5. Results are stored in DynamoDB with tracking information for chunks

Dynamic Prompt Management

One of the most powerful features is the ability to use different prompts for different document types. Since land lease documents can vary dramatically in format, you can specify the appropriate prompt ID and version in each queue message. This ensures optimal extraction accuracy for each document type.

Batch Pipeline: Cost-Optimized Processing

For non-urgent processing needs, the batch pipeline offers significant cost savings by processing multiple documents in a single Amazon Bedrock batch inference job.

Key Differences:

  • Uses standard SQS queue for higher throughput
  • EventBridge Scheduler triggers processing on a schedule
  • Requires minimum 100 records for batch jobs
  • Handles duplicate message detection (since standard SQS doesn't guarantee exactly-once delivery)
  • Asynchronous processing with EventBridge rules for completion handling

Practical Implementation Tips

Message Structure

Queue messages contain essential attributes like:

  • Document S3 location
  • LLM model ID
  • Prompt ID and version
  • System prompt ID and version

Handling Large Documents

The solution elegantly handles documents exceeding model limits by chunking them and tracking each piece with unique identifiers. This ensures no data is lost while maintaining processing efficiency.

Prompt Versioning Strategy

With Amazon Bedrock's limit of 50 prompts per region and 10 versions per prompt, careful prompt management becomes crucial. Consider creating prompts for different document categories and using versioning for refinements.

Choosing Your Pipeline

The decision between on-demand and batch processing depends on your specific needs:

Choose on-demand when:

  • Processing time-sensitive documents
  • Need immediate results for downstream systems
  • Handling small volumes of high-priority documents

Choose batch processing when:

  • Cost optimization is the priority
  • Processing large backlogs
  • Results can wait for scheduled processing windows

The Bottom Line

This dual-pipeline approach gives you the flexibility to handle diverse document processing needs efficiently. By leveraging Amazon Bedrock's prompt management and both on-demand and batch inference capabilities, you can unlock valuable business intelligence from your document archives while optimizing for both speed and cost.

The key to success lies in the dynamic prompt selection capability, which allows a single infrastructure to handle multiple document types effectively. Whether you're processing urgent contracts or working through historical archives, this solution adapts to your needs.

Source: AWS Machine Learning Blog by Tim Shear

Related Posts

Attribution & Credits

Content Type: Original content created by the author.

No external sources or adaptations.

Share Feedback