OCR Inferencer Node
Overview
The OCR Inferencer node provides specialized Optical Character Recognition (OCR) capabilities using PaddleOCR models running in Docker containers. It supports a full OCR pipeline as well as standalone text detection, text recognition, and document orientation detection for processing documents and images that contain text.
Key Features
- Multiple OCR tasks: Full OCR, detection only, recognition only, document rotation
- PaddleOCR integration: Industry-leading OCR accuracy and speed
- Multi-language support: Configurable detection and recognition models
- Document preprocessing: Unwarping, orientation correction, text line rotation
- Docker-based execution: Isolated Python OCR server
- Promise mode: Asynchronous batch processing
- Debug visualization: Display OCR results on canvas
- Auto-warmup: Pre-warm models for faster first requests
- Configurable thresholds: Fine-tune detection and recognition parameters
Architecture
Components
- Node-RED Node: Configuration and message handling
- Docker Container: PaddleOCR Python server
- gRPC Protocol: Communication layer
- PaddleOCR Models: Detection and recognition models
Supported Tasks
| Task | Description | Output |
|---|---|---|
| OCR | Full pipeline (detect + recognize) | Text boxes with recognized text |
| Detection | Find text regions only | Text box coordinates |
| Recognition | Recognize text from images | Text strings |
| Document Rotation | Detect document orientation | Rotation angle (0°, 90°, 180°, 270°) |
Configuration
Settings Tab
Name
- Type: String
- Optional: Yes
- Description: Display name for the node
Task
- Type: Select
- Options: OCR, Detection, Recognition, Document Rotation
- Default: OCR
- Description: Type of OCR operation to perform
Input Field
- Type: Message property path
- Default: `payload`
- Description: Message field containing input images
Output Field
- Type: Message property path
- Default: `payload`
- Description: Where OCR results will be stored
Detection Model
- Type: Select
- Visible: OCR and Detection tasks
- Options:
  - `en_PP-OCRv3_det`: English optimized
  - `ch_PP-OCRv4_det`: Chinese optimized
  - `ml_PP-OCRv3_det`: Multi-language
- Description: Model for text detection
Recognition Model
- Type: Select
- Visible: OCR and Recognition tasks
- Options:
  - `en_PP-OCRv4_rec`: English optimized
  - `ch_PP-OCRv4_rec`: Chinese optimized
  - `latin_PP-OCRv3_rec`: Latin languages
  - `arabic_PP-OCRv3_rec`: Arabic
  - `cyrillic_PP-OCRv3_rec`: Cyrillic
  - `korean_PP-OCRv3_rec`: Korean
  - `japan_PP-OCRv3_rec`: Japanese
- Description: Model for text recognition
Use Document Unwarping
- Type: Checkbox
- Visible: OCR task only
- Default: Disabled
- Description: Straighten curved/distorted documents
Use Document Orientation
- Type: Checkbox
- Visible: OCR task only
- Default: Disabled
- Description: Correct document rotation (0°, 90°, 180°, 270°)
Use Text Line Orientation
- Type: Checkbox
- Visible: OCR task only
- Default: Disabled
- Description: Correct individual text line rotation
Promise Mode
- Type: Checkbox
- Default: Disabled
- Description: Return promises for async processing
Promises Field
- Type: Message property path
- Default: `promises`
- Visible: When Promise Mode is enabled
Show Debug Image
- Type: Checkbox
- Default: Disabled
Debug Interval
- Type: Number
- Default: 1
Debug Image Width
- Type: Number
- Default: 200
JSON Config Tab
Advanced PaddleOCR configuration:
OCR Task Configuration
```json
{
  "common": {
    "model_name": "PADDLE",
    "config_type": "predict",
    "task": "OCR",
    "device": "auto",
    "verbose": false,
    "max_batch": 100
  },
  "task_specific": {
    "det_model": "en_PP-OCRv3_det",
    "rec_model": "en_PP-OCRv4_rec"
  },
  "model_specific": {
    "required": {
      "use_doc_unwarping": false,
      "use_doc_orientation_classify": false,
      "use_textline_orientation": false
    }
  }
}
```

Detection Task Configuration
```json
{
  "common": {
    "model_name": "PADDLE",
    "task": "Detection"
  },
  "task_specific": {
    "det_model": "en_PP-OCRv3_det"
  }
}
```

Recognition Task Configuration
```json
{
  "common": {
    "model_name": "PADDLE",
    "task": "Recognition"
  },
  "task_specific": {
    "rec_model": "en_PP-OCRv4_rec"
  }
}
```

Document Rotation Task Configuration
```json
{
  "common": {
    "model_name": "PADDLE",
    "task": "Document_Orientation"
  }
}
```

Note: Document Rotation is a classification task and does NOT accept OCR preprocessing parameters.
Image Format
Same as Inferencer node:
Rosepetal Bitmap Format
```javascript
{
  width: 1920,
  height: 1080,
  data: Buffer,          // raw pixel data
  colorSpace: "RGB",     // or "GRAY", "BGR"
  channels: 3,
  dtype: "uint8"
}
```

JPEG/PNG Buffers

```javascript
msg.payload = imageBuffer;
```

Array of Images

```javascript
msg.payload = [image1, image2, image3];
```

Input
Basic OCR
```javascript
msg.payload = {
  width: 1024,
  height: 768,
  data: documentImageBuffer,
  colorSpace: "RGB"
};
return msg;
```

Document Rotation Detection
```javascript
// Configure the node with Task: Document Rotation
msg.payload = documentImage;
return msg;
```

Batch Processing
```javascript
msg.payload = [doc1, doc2, doc3];
return msg;
```

Output
OCR Results
```javascript
{
  payload: [
    {
      text: "Hello World",
      confidence: 0.98,
      box: {
        points: [[x1, y1], [x2, y2], [x3, y3], [x4, y4]],
        xyxy: [x_min, y_min, x_max, y_max]
      }
    },
    {
      text: "Sample Text",
      confidence: 0.95,
      box: {
        points: [[x1, y1], [x2, y2], [x3, y3], [x4, y4]],
        xyxy: [x_min, y_min, x_max, y_max]
      }
    }
  ],
  performance: {
    payload: {
      inferenceTime: 150.5,
      preprocessTime: 12.3,
      postprocessTime: 8.7,
      totalTime: 171.5
    }
  }
}
```
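Results are returned in detection order, which is not always reading order. A minimal sketch for sorting boxes top-to-bottom, then left-to-right, assuming the `xyxy` layout shown above (the 10 px row tolerance is illustrative):

```javascript
// Function node: sort OCR results into reading order
const rowTolerance = 10; // pixels; boxes within this vertical distance count as the same line

const sorted = [...msg.payload].sort((a, b) => {
  const dy = a.box.xyxy[1] - b.box.xyxy[1]; // compare top edges (y_min)
  if (Math.abs(dy) > rowTolerance) return dy; // different lines: top first
  return a.box.xyxy[0] - b.box.xyxy[0];       // same line: left first
});

msg.payload = sorted;
msg.plainText = sorted.map(r => r.text).join(' ');
return msg;
```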
Detection Results

```javascript
{
  payload: [
    {
      box: {
        points: [[100, 50], [300, 50], [300, 80], [100, 80]],
        xyxy: [100, 50, 300, 80]
      }
    },
    {
      box: {
        points: [[100, 100], [400, 100], [400, 140], [100, 140]],
        xyxy: [100, 100, 400, 140]
      }
    }
  ]
}
```

Recognition Results
```javascript
{
  payload: [
    {
      text: "Recognized text line 1",
      confidence: 0.97
    },
    {
      text: "Recognized text line 2",
      confidence: 0.89
    }
  ]
}
```

Document Rotation Results
```javascript
{
  payload: {
    rotation: 90, // 0, 90, 180, or 270 degrees
    confidence: 0.99
  }
}
```

Usage Examples
Example 1: Extract Text from Document
```javascript
// Function node: Load document
const fs = require('fs');
msg.payload = fs.readFileSync('/path/to/document.jpg');
return msg;
```

OCR Inferencer configuration:
- Task: OCR
- Detection: en_PP-OCRv3_det
- Recognition: en_PP-OCRv4_rec
```javascript
// Function node: Extract all text
const allText = msg.payload.map(result => result.text).join('\n');
msg.payload = allText;
return msg;
```

Example 2: Document Preprocessing Pipeline
```
[Load Image] → [OCR: Document Rotation] → [Function: Rotate] → [OCR: Full OCR] → [Output]
```

Step 1: Detect rotation
```javascript
// Function node after OCR Inferencer (Task: Document Rotation)
// msg.originalImage must be stored on the message before the rotation node
msg.rotation = msg.payload.rotation;
msg.payload = msg.originalImage;
return msg;
```

Step 2: Rotate image
```javascript
// Function node: Rotate based on detected angle
const sharp = require('sharp');
msg.payload = await sharp(msg.payload)
  .rotate(-msg.rotation) // counter-rotate back to upright
  .toBuffer();
return msg;
```

Step 3: Extract text
```javascript
// Function node after OCR Inferencer (Task: OCR)
msg.extractedText = msg.payload.map(r => r.text);
return msg;
```

Example 3: Multi-Language Documents
```javascript
// English section (cropEnglishRegion / cropArabicRegion are user-supplied helpers)
msg.englishPart = {
  width: 800,
  height: 400,
  data: cropEnglishRegion(imageBuffer)
};

// Arabic section
msg.arabicPart = {
  width: 800,
  height: 400,
  data: cropArabicRegion(imageBuffer)
};
```

Two OCR nodes with different models:
- Node 1: English detection + recognition
- Node 2: Arabic detection + recognition
Example 4: Table Detection and OCR
```javascript
// Step 1: Detect text regions
// Function node after OCR Inferencer (Task: Detection)
msg.textBoxes = msg.payload;
msg.payload = originalImage; // originalImage: source image saved earlier (e.g., on msg or in context)
return msg;
```

```javascript
// Step 2: Filter boxes by position (table columns)
const column1 = msg.textBoxes.filter(box =>
  box.box.xyxy[0] < 200 // X position < 200px
);
const column2 = msg.textBoxes.filter(box =>
  box.box.xyxy[0] >= 200 && box.box.xyxy[0] < 400
);
```

```javascript
// Step 3: Crop and recognize each region (cropImage is a user-supplied helper)
const crops = column1.map(box => cropImage(originalImage, box.box.xyxy));
msg.payload = crops;
return msg;
```

OCR Inferencer (Task: Recognition)
Example 5: Confidence Filtering
```javascript
// Function node: Filter low confidence results
const minConfidence = 0.85;
msg.highConfidence = msg.payload.filter(result =>
  result.confidence >= minConfidence
);
msg.lowConfidence = msg.payload.filter(result =>
  result.confidence < minConfidence
);
// Log low confidence for review
if (msg.lowConfidence.length > 0) {
  node.warn(`${msg.lowConfidence.length} low confidence results`);
}
msg.payload = msg.highConfidence;
return msg;
```

Example 6: Batch Document Processing
```javascript
// Function node: Load all documents
const fs = require('fs');
const files = fs.readdirSync('/documents/inbox')
  .filter(f => f.endsWith('.jpg') || f.endsWith('.png'));
msg.payload = files.map(f => fs.readFileSync(`/documents/inbox/${f}`));
msg.fileNames = files; // filtered first, so names stay aligned with payload order
return msg;
```

OCR Inferencer (Promise mode enabled)
```javascript
// Promise Reader resolves all
msg.ocrResults = await Promise.all(msg.promises);
// Combine with filenames
msg.documents = msg.fileNames.map((name, idx) => ({
  fileName: name,
  text: msg.ocrResults[idx].map(r => r.text).join('\n')
}));
return msg;
```

Performance Optimization
Model Selection
For speed:
- Detection: `en_PP-OCRv3_det` (faster)
- Recognition: `en_PP-OCRv4_rec`

For accuracy:
- Detection: `ch_PP-OCRv4_det` (more accurate)
- Recognition: Match the model to the document language
Preprocessing
Enable for challenging documents:
- Document unwarping: Curved pages, photos of documents
- Document orientation: Scanned documents in wrong rotation
- Text line orientation: Mixed orientation text
Disable for clean documents:
- Turn all preprocessing OFF for clean, correctly oriented scans
- Faster processing, less overhead
Image Preparation
Before OCR:
```javascript
const sharp = require('sharp');

// Enhance contrast
msg.payload = await sharp(msg.payload)
  .normalize()
  .toBuffer();

// Resize to a consistent working resolution (fit within 2000 px, never upscale)
msg.payload = await sharp(msg.payload)
  .resize(2000, 2000, { fit: 'inside', withoutEnlargement: true })
  .toBuffer();
```

Batch Size
Optimal batch sizes (a chunking sketch follows this list):
- Document rotation: 10-20 images
- Detection only: 10-15 images
- Full OCR: 5-10 images
- Recognition only: 15-25 images
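When a single message carries more images than these sizes recommend, the array can be split into smaller batches before the OCR node. A minimal sketch, where the batch size of 10 is only illustrative:

```javascript
// Function node: split a large image array into OCR-sized batches
const batchSize = 10; // illustrative; adjust per task (see list above)
for (let i = 0; i < msg.payload.length; i += batchSize) {
  node.send({ payload: msg.payload.slice(i, i + batchSize) });
}
return null; // messages were already sent above
```

Each batch arrives at the OCR node as its own message, so memory use stays bounded even for large folders.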
Docker Container Management
Container Lifecycle
Similar to Inferencer node:
- Auto-starts on deploy
- Ready indicator in node status
- Clean removal on redeploy
Memory Usage
Per container (approximate; a live-usage check follows this list):
- Detection model: 500MB-1GB
- Recognition model: 500MB-1GB
- Full OCR: 1.5GB-2.5GB
- Document rotation: 300MB-500MB
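These figures are approximations; actual usage on a running host can be checked with the standard Docker CLI. A rough sketch using a function node (the format template is standard `docker stats` syntax):

```javascript
// Function node: query memory usage of running containers via the Docker CLI
const { exec } = require('child_process');

exec('docker stats --no-stream --format "{{.Name}}: {{.MemUsage}}"', (err, stdout) => {
  if (err) {
    node.error(`docker stats failed: ${err.message}`, msg);
    return;
  }
  msg.payload = stdout.trim().split('\n'); // one line per container
  node.send(msg);
});
return null; // the message is sent asynchronously in the callback
```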
Container Logs
```bash
docker logs <ocr-container-id>
```

Available Models
Detection Models
| Model | Language | Version | Use Case |
|---|---|---|---|
| `en_PP-OCRv3_det` | English | v3 | English documents, fast |
| `ch_PP-OCRv4_det` | Chinese | v4 | Chinese/mixed, accurate |
| `ml_PP-OCRv3_det` | Multi-lang | v3 | Multiple languages |
Recognition Models
| Model | Language | Version | Use Case |
|---|---|---|---|
| `en_PP-OCRv4_rec` | English | v4 | English text, latest |
| `ch_PP-OCRv4_rec` | Chinese | v4 | Chinese characters |
| `latin_PP-OCRv3_rec` | Latin | v3 | European languages |
| `arabic_PP-OCRv3_rec` | Arabic | v3 | Arabic script |
| `cyrillic_PP-OCRv3_rec` | Cyrillic | v3 | Russian, etc. |
| `korean_PP-OCRv3_rec` | Korean | v3 | Korean characters |
| `japan_PP-OCRv3_rec` | Japanese | v3 | Japanese text |
Error Handling
Common Errors
"No text detected"
- Cause: Image too low quality, wrong preprocessing
- Solution: Enhance image contrast, adjust preprocessing
"Low confidence results"
- Cause: Poor image quality, wrong recognition model
- Solution: Improve image quality, select correct language model
"Container startup failed"
- Cause: Insufficient memory, model download failed
- Solution: Check available memory, verify model files
"Invalid task configuration"
- Cause: Document rotation with OCR preprocessing params
- Solution: Remove preprocessing params from Document_Orientation task
Debugging
Check text detection:
- Use Detection task first
- Visualize detected boxes (see the overlay sketch after this section)
- Verify boxes capture all text
Check recognition:
- Extract detected regions manually
- Test with Recognition task
- Validate model matches language
Enable debug visualization:
- Shows detected boxes on image
- Displays recognized text
- Helps identify issues
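Beyond the node's built-in debug image, detected boxes can be drawn manually in a function node for inspection. A rough sketch using sharp's SVG compositing, assuming Detection-task results were saved to `msg.textBoxes` and the original image buffer is back in `msg.payload`:

```javascript
// Function node: overlay detected text boxes on the image for inspection
const sharp = require('sharp');

const { width, height } = await sharp(msg.payload).metadata();

// One SVG rectangle per detected box (xyxy = [x_min, y_min, x_max, y_max])
const rects = msg.textBoxes.map(r => {
  const [x1, y1, x2, y2] = r.box.xyxy;
  return `<rect x="${x1}" y="${y1}" width="${x2 - x1}" height="${y2 - y1}"
          fill="none" stroke="red" stroke-width="3"/>`;
}).join('');
const svg = `<svg width="${width}" height="${height}">${rects}</svg>`;

msg.payload = await sharp(msg.payload)
  .composite([{ input: Buffer.from(svg), top: 0, left: 0 }])
  .jpeg()
  .toBuffer();
return msg;
```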
Best Practices
Image Quality
- Minimum resolution: 100 DPI for good results
- Contrast: High contrast between text and background
- Lighting: Even illumination, no shadows
- Focus: Sharp, clear text
Model Selection
- Match model to language: Don't use English model for Chinese
- Use latest versions: v4 models generally better than v3
- Test on sample data: Validate before production
Preprocessing
- Start simple: Try without preprocessing first
- Add selectively: Enable only needed features
- Test impact: Measure accuracy improvement vs speed cost
Confidence Thresholds
- Set appropriate filters: 0.8-0.9 for critical applications
- Log low confidence: Review for quality issues
- Adjust per use case: Medical records need higher threshold than labels
Batch Processing
- Group similar documents: Same language, same orientation
- Use promise mode: For large batches
- Monitor memory: Don't exceed capacity
Troubleshooting
Poor OCR Accuracy
Possible causes:
- Wrong language model
- Poor image quality
- Incorrect preprocessing
Solutions:
- Verify model matches text language
- Enhance image (contrast, resolution)
- Try different preprocessing combinations
Missing Text
Possible causes:
- Text too small
- Low contrast
- Unusual fonts
Solutions:
- Increase image resolution
- Enhance contrast
- Use multi-language detection model
Slow Processing
Possible causes:
- Too many preprocessing steps
- Large images
- Complex documents
Solutions:
- Disable unnecessary preprocessing
- Resize images before OCR
- Process in smaller batches
See Also
- Inferencer Node - General AI inference
- Promise Reader Node - Resolve async promises
- Vision Platform Overview - Complete platform documentation
- PaddleOCR Documentation - Official PaddleOCR docs