AWS Textract Visualization

AWS Textract Bounding Box Visualization
Context
I have AWS Textract JSON output containing document analysis results with bounding box coordinates for text elements, tables, and other document components. I need to create a visualization tool that plots these bounding boxes on a coordinate system to understand the document layout.
Task
Create a Python script or web application that:

Parses the AWS Textract JSON containing:

Multiple blocks (PAGE, LINE, WORD, CELL, TABLE_TITLE, etc.)
Each block has geometry data with BoundingBox and Polygon coordinates
Coordinates are normalized (0-1 range) representing relative positions on the page

Visualizes the bounding boxes by:

Plotting rectangles for each bounding box on a canvas/plot
Using different colors for different block types (LINE, WORD, CELL, etc.)
Scaling coordinates appropriately for display
Adding labels or hover information showing the text content and block type

Key coordinate system details:

BoundingBox format: {Width, Height, Left, Top} (all values 0-1)
Polygon format: Array of {X, Y} points (all values 0-1)
Origin (0,0) is at top-left corner
Coordinates are fractions of the page width (X, Left, Width) and page height (Y, Top, Height)
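The coordinate details above can be sketched as a small conversion helper. This is a minimal illustration, not the code the tool generated: the function name `bbox_to_pixels` and the 800x1000 display size are assumptions taken from the requirements in this message.

```python
from typing import Dict, Tuple


def bbox_to_pixels(
    bbox: Dict[str, float], page_width: int, page_height: int
) -> Tuple[float, float, float, float]:
    """Map a normalized BoundingBox to (x, y, width, height) in pixels.

    Left and Width scale with the page width, Top and Height with the
    page height. The origin stays at the top-left corner, so no axis
    flip is needed for screen-style coordinate systems.
    """
    x = bbox["Left"] * page_width
    y = bbox["Top"] * page_height
    w = bbox["Width"] * page_width
    h = bbox["Height"] * page_height
    return x, y, w, h


# The "ALMAC" line from the sample JSON, mapped onto an 800x1000 display.
sample_bbox = {"Width": 0.1078, "Height": 0.0156, "Left": 0.0603, "Top": 0.0629}
pixel_box = bbox_to_pixels(sample_bbox, 800, 1000)
print(pixel_box)
```

Plotting libraries whose y axis grows upward (matplotlib's default, for example) would need the y axis inverted before these values are drawn.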

Sample JSON Structure
{
  "Blocks": [
    {
      "BlockType": "LINE",
      "Text": "ALMAC",
      "Geometry": {
        "BoundingBox": {
          "Width": 0.1078,
          "Height": 0.0156,
          "Left": 0.0603,
          "Top": 0.0629
        },
        "Polygon": [
          {"X": 0.0603, "Y": 0.0629},
          {"X": 0.1681, "Y": 0.0629},
          {"X": 0.1681, "Y": 0.0785},
          {"X": 0.0603, "Y": 0.0785}
        ]
      }
    }
  ]
}
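To show how the two geometry fields in this structure relate, here is a hedged sketch that parses the sample and derives a bounding box from the Polygon points. The helper name `polygon_bounds` and the 1e-4 tolerance are assumptions for illustration; only the standard `json` module is used.

```python
import json

# The sample block from this message, embedded as a string for a self-contained run.
SAMPLE = """
{"Blocks": [{"BlockType": "LINE", "Text": "ALMAC", "Geometry": {
  "BoundingBox": {"Width": 0.1078, "Height": 0.0156, "Left": 0.0603, "Top": 0.0629},
  "Polygon": [{"X": 0.0603, "Y": 0.0629}, {"X": 0.1681, "Y": 0.0629},
              {"X": 0.1681, "Y": 0.0785}, {"X": 0.0603, "Y": 0.0785}]}}]}
"""


def polygon_bounds(polygon):
    """Return the axis-aligned box that encloses a list of {X, Y} points."""
    xs = [point["X"] for point in polygon]
    ys = [point["Y"] for point in polygon]
    left, top = min(xs), min(ys)
    return {"Left": left, "Top": top, "Width": max(xs) - left, "Height": max(ys) - top}


blocks = json.loads(SAMPLE)["Blocks"]
for block in blocks:
    geometry = block["Geometry"]
    derived = polygon_bounds(geometry["Polygon"])
    reported = geometry["BoundingBox"]
    # Compare with a small tolerance because the values are rounded floats.
    matches = all(abs(derived[key] - reported[key]) < 1e-4 for key in reported)
    print(block["BlockType"], block.get("Text", ""), "polygon matches box:", matches)
```

For this sample the polygon corners agree with the BoundingBox (Left + Width = 0.1681, Top + Height = 0.0785). For rotated or skewed text the polygon would generally be tighter than its enclosing box.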
Requirements

Option to toggle visibility of different block types
Display text content when hovering over or clicking bounding boxes
Scale the visualization to a reasonable size (e.g., 800x1000 pixels for a page)
Handle the coordinate transformation from normalized (0-1) to pixel coordinates
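The visibility toggle in these requirements could be modeled as a set of enabled block types that the renderer consults before drawing. A minimal sketch, assuming a hypothetical helper `visible_blocks`; the blocks here are trimmed to the fields the filter needs.

```python
from typing import Dict, Iterable, List, Set


def visible_blocks(blocks: Iterable[Dict], enabled_types: Set[str]) -> List[Dict]:
    """Keep only the blocks whose BlockType is currently switched on."""
    return [block for block in blocks if block.get("BlockType") in enabled_types]


blocks = [
    {"BlockType": "PAGE"},
    {"BlockType": "LINE", "Text": "ALMAC"},
    {"BlockType": "WORD", "Text": "ALMAC"},
]

enabled = {"LINE", "WORD"}
enabled.discard("WORD")  # simulates unticking the WORD checkbox
print([block["BlockType"] for block in visible_blocks(blocks, enabled)])  # prints ['LINE']
```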

Expected Output
A tool that loads the Textract JSON and displays a visual representation of the document layout with all detected text regions, table cells, and other elements clearly outlined with their bounding boxes.
Please create a complete, runnable solution with clear usage instructions.

203d ago

I'll create a comprehensive AWS Textract bounding box visualization tool that can parse JSON output and display interactive visualizations of document layouts.

I've created a comprehensive AWS Textract bounding box visualization tool with the following features:

Key Features

JSON Input Options:
- File upload for Textract JSON files
- Direct paste functionality
- Sample data for testing
Interactive Visualization:
- Canvas-based rendering with proper coordinate scaling
- Color-coded block types (LINE, WORD, CELL, TABLE, etc.)
- Hover tooltips showing text content, confidence scores, and coordinates
- Toggle visibility for different block types
Coordinate Handling:
- Converts normalized coordinates (0-1) to pixel coordinates
- Proper scaling for 800x1000 canvas display
- Maintains aspect ratio and positioning
Block Type Management:
- Comprehensive color scheme for all Textract block types
- Count display for each block type
- Easy toggle controls with checkboxes
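The block type management described above comes down to a color lookup plus a per-type count. The sketch below is an illustration only: the hex colors, the fallback color, and the `summarize` helper are assumptions, not the palette the generated tool uses.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical palette; any distinct colors would do.
BLOCK_COLORS: Dict[str, str] = {
    "PAGE": "#6b7280",
    "LINE": "#2563eb",
    "WORD": "#16a34a",
    "TABLE": "#dc2626",
    "CELL": "#f59e0b",
}
DEFAULT_COLOR = "#9ca3af"  # used for block types missing from the palette


def summarize(blocks: List[Dict]) -> List[Tuple[str, int, str]]:
    """Return (block type, count, color) rows sorted by block type."""
    counts = Counter(block["BlockType"] for block in blocks)
    return [
        (block_type, count, BLOCK_COLORS.get(block_type, DEFAULT_COLOR))
        for block_type, count in sorted(counts.items())
    ]


demo = [{"BlockType": t} for t in ("PAGE", "LINE", "WORD", "WORD", "KEY_VALUE_SET")]
for block_type, count, color in summarize(demo):
    print(f"{block_type:<14} {count:>3}  {color}")
```

Falling back to a default color keeps the legend usable when a response contains block types the palette does not list.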

Usage Instructions

Load Data: Click "Load Sample Data" to see the tool in action, or upload your own Textract JSON file
Explore: Hover over bounding boxes to see detailed information
Filter: Use the checkboxes to show/hide specific block types
Analyze: View the document layout with color-coded elements

Technical Details

Coordinate System: Handles AWS Textract's normalized coordinates (0-1 range) with origin at top-left
Rendering: Uses HTML5 Canvas for smooth, interactive visualization
Responsive: Adapts to different screen sizes while maintaining proper scaling
Performance: Efficient rendering with proper event handling
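Hover handling of the kind listed above typically needs a hit test: convert the pointer position back to normalized coordinates and find the block under it. A rough sketch, assuming a hypothetical `block_at` helper that prefers the smallest containing box so that a WORD is picked over its enclosing LINE or PAGE.

```python
from typing import Dict, List, Optional


def block_at(
    blocks: List[Dict], px: float, py: float, canvas_w: int, canvas_h: int
) -> Optional[Dict]:
    """Return the smallest block whose box contains the pixel (px, py), or None."""
    # Convert the pointer position from pixels to the 0-1 range.
    nx, ny = px / canvas_w, py / canvas_h
    hits = []
    for block in blocks:
        box = block["Geometry"]["BoundingBox"]
        inside_x = box["Left"] <= nx <= box["Left"] + box["Width"]
        inside_y = box["Top"] <= ny <= box["Top"] + box["Height"]
        if inside_x and inside_y:
            hits.append(block)

    def area(block: Dict) -> float:
        box = block["Geometry"]["BoundingBox"]
        return box["Width"] * box["Height"]

    return min(hits, key=area, default=None)


demo_blocks = [
    {"BlockType": "PAGE",
     "Geometry": {"BoundingBox": {"Width": 1.0, "Height": 1.0, "Left": 0.0, "Top": 0.0}}},
    {"BlockType": "LINE", "Text": "ALMAC",
     "Geometry": {"BoundingBox": {"Width": 0.1078, "Height": 0.0156,
                                  "Left": 0.0603, "Top": 0.0629}}},
]

hovered = block_at(demo_blocks, 100, 70, 800, 1000)
print(hovered["BlockType"] if hovered else None)
```

A linear scan like this is usually adequate for a single page; a spatial index would only be worth considering for very large block counts.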

The tool includes sample data that demonstrates various Textract block types including text lines, words, table cells, and page structure. You can immediately see how the visualization works by clicking "Load Sample Data".

203d ago

The user will upload a JPEG; draw the bounding boxes on top of it.

7f4a52bf-2f37-4541-ab86-ae8955775347_page-0 1.jpeg

203d ago

I'll enhance the AWS Textract visualization tool to support uploading a JPEG image and overlaying the bounding boxes directly on top of the document image. This will provide a much more intuitive way to verify Textract's detection accuracy.

I've enhanced the AWS Textract visualization tool with the following new capabilities:

New Features Added

Document Image Upload:
- Upload JPEG, PNG, or other image formats
- Automatic canvas resizing to match image aspect ratio
- Cross-origin handling for proper image rendering
Image Overlay Visualization:
- Bounding boxes are now drawn directly on top of the original document
- Maintains proper coordinate scaling between normalized Textract coordinates and image pixels
- Transparent bounding boxes allow you to see the underlying text
Sample Integration:
- Added the uploaded ALMAC document as a sample image
- "Load Sample Image" button to quickly test with the provided document
- Coordinated sample data and image for immediate demonstration
Enhanced User Experience:
- Dynamic canvas sizing based on image dimensions
- Improved visual feedback with image backgrounds
- Better coordinate accuracy verification
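The dynamic sizing described above can be sketched as an aspect-ratio-preserving fit. This is an assumption about how such sizing might work, not the generated code: `fit_within` is a hypothetical helper, and the 800x1000 limits come from the original request.

```python
from typing import Tuple


def fit_within(
    image_w: int, image_h: int, max_w: int = 800, max_h: int = 1000
) -> Tuple[int, int, float]:
    """Scale an image to fit inside max_w x max_h without distorting it.

    Returns (display width, display height, scale factor). The smaller of
    the two axis ratios is used so that neither dimension overflows.
    Note that images smaller than the limits are scaled up as well.
    """
    scale = min(max_w / image_w, max_h / image_h)
    return round(image_w * scale), round(image_h * scale), scale


display_w, display_h, scale = fit_within(1600, 1000)
print(display_w, display_h, scale)  # prints 800 500 0.5

# Because Textract coordinates are normalized, a box is positioned by
# multiplying by the displayed size, not the original image size.
left_px = 0.0603 * display_w
top_px = 0.0629 * display_h
```

Sizing the overlay to the displayed image in this way keeps the boxes aligned regardless of the resolution of the uploaded file.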

Usage Workflow

Upload Document Image: Click "Upload Document Image" to select your original document
Upload Textract JSON: Upload the corresponding AWS Textract analysis results
Verify Accuracy: See exactly where Textract detected text, tables, and other elements
Fine-tune View: Toggle different block types to focus on specific detection results

This enhancement makes it much easier to:

Validate Textract Results: Visually verify that bounding boxes align with actual text
Debug Detection Issues: Identify areas where Textract may have missed or misidentified content
Understand Document Structure: See how Textract interprets complex layouts like tables and forms
Quality Assurance: Ensure detection accuracy before processing documents in production

The tool now provides a complete document analysis workflow, allowing you to overlay machine-detected regions directly onto the source document for immediate visual verification.

203d ago

Convert the code to JSX and use a library instead of canvas.

203d ago
