OCR Inferencer Node

Overview

The OCR Inferencer node provides specialized Optical Character Recognition (OCR) capabilities using PaddleOCR models running in Docker containers. It supports the full OCR pipeline, text detection only, text recognition only, and document orientation detection for processing documents and images that contain text.

Key Features

  • Multiple OCR tasks: Full OCR, detection only, recognition only, document rotation
  • PaddleOCR integration: Industry-leading OCR accuracy and speed
  • Multi-language support: Configurable detection and recognition models
  • Document preprocessing: Unwarping, orientation correction, text line rotation
  • Docker-based execution: Isolated Python OCR server
  • Promise mode: Asynchronous batch processing
  • Debug visualization: Display OCR results on canvas
  • Auto-warmup: Pre-warm models for faster first requests
  • Configurable thresholds: Fine-tune detection and recognition parameters

Architecture

Components

  1. Node-RED Node: Configuration and message handling
  2. Docker Container: PaddleOCR Python server
  3. gRPC Protocol: Communication layer
  4. PaddleOCR Models: Detection and recognition models

Supported Tasks

| Task | Description | Output |
|------|-------------|--------|
| OCR | Full pipeline (detect + recognize) | Text boxes with recognized text |
| Detection | Find text regions only | Text box coordinates |
| Recognition | Recognize text from images | Text strings |
| Document Rotation | Detect document orientation | Rotation angle (0°, 90°, 180°, 270°) |

Configuration

Settings Tab

Name

  • Type: String
  • Optional: Yes
  • Description: Display name for the node

Task

  • Type: Select
  • Options: OCR, Detection, Recognition, Document Rotation
  • Default: OCR
  • Description: Type of OCR operation to perform

Input Field

  • Type: Message property path
  • Default: payload
  • Description: Message field containing input images

Output Field

  • Type: Message property path
  • Default: payload
  • Description: Where OCR results will be stored

Detection Model

  • Type: Select
  • Visible: OCR and Detection tasks
  • Options:
    • en_PP-OCRv3_det: English optimized
    • ch_PP-OCRv4_det: Chinese optimized
    • ml_PP-OCRv3_det: Multi-language
  • Description: Model for text detection

Recognition Model

  • Type: Select
  • Visible: OCR and Recognition tasks
  • Options:
    • en_PP-OCRv4_rec: English optimized
    • ch_PP-OCRv4_rec: Chinese optimized
    • latin_PP-OCRv3_rec: Latin languages
    • arabic_PP-OCRv3_rec: Arabic
    • cyrillic_PP-OCRv3_rec: Cyrillic
    • korean_PP-OCRv3_rec: Korean
    • japan_PP-OCRv3_rec: Japanese
  • Description: Model for text recognition

Use Document Unwarping

  • Type: Checkbox
  • Visible: OCR task only
  • Default: Disabled
  • Description: Straighten curved/distorted documents

Use Document Orientation

  • Type: Checkbox
  • Visible: OCR task only
  • Default: Disabled
  • Description: Correct document rotation (0°, 90°, 180°, 270°)

Use Text Line Orientation

  • Type: Checkbox
  • Visible: OCR task only
  • Default: Disabled
  • Description: Correct individual text line rotation

Promise Mode

  • Type: Checkbox
  • Default: Disabled
  • Description: Return promises for async processing

Promises Field

  • Type: Message property path
  • Default: promises
  • Visible: When Promise Mode is enabled

Show Debug Image

  • Type: Checkbox
  • Default: Disabled

Debug Interval

  • Type: Number
  • Default: 1

Debug Image Width

  • Type: Number
  • Default: 200

JSON Config Tab

Advanced PaddleOCR configuration:

OCR Task Configuration

json
{
  "common": {
    "model_name": "PADDLE",
    "config_type": "predict",
    "task": "OCR",
    "device": "auto",
    "verbose": false,
    "max_batch": 100
  },
  "task_specific": {
    "det_model": "en_PP-OCRv3_det",
    "rec_model": "en_PP-OCRv4_rec"
  },
  "model_specific": {
    "required": {
      "use_doc_unwarping": false,
      "use_doc_orientation_classify": false,
      "use_textline_orientation": false
    }
  }
}

Detection Task Configuration

json
{
  "common": {
    "model_name": "PADDLE",
    "task": "Detection"
  },
  "task_specific": {
    "det_model": "en_PP-OCRv3_det"
  }
}

Recognition Task Configuration

json
{
  "common": {
    "model_name": "PADDLE",
    "task": "Recognition"
  },
  "task_specific": {
    "rec_model": "en_PP-OCRv4_rec"
  }
}

Document Rotation Task Configuration

json
{
  "common": {
    "model_name": "PADDLE",
    "task": "Document_Orientation"
  }
}

Note: Document Rotation is a classification task and does NOT accept OCR preprocessing parameters.
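
Because of this restriction, a function node can strip the OCR-only parameters defensively before a config reaches a Document Rotation node. A minimal sketch (the helper name `sanitizeRotationConfig` is ours; field names follow the JSON examples above):

```javascript
// Strip OCR-only preprocessing parameters from a Document Rotation config.
// Field names follow the JSON Config examples above.
function sanitizeRotationConfig(config) {
  const clean = JSON.parse(JSON.stringify(config)); // deep copy
  if (clean.common && clean.common.task === "Document_Orientation") {
    // use_doc_unwarping etc. live under model_specific and are rejected
    delete clean.model_specific;
    // det/rec models are not used by the classification task either
    delete clean.task_specific;
  }
  return clean;
}
```

Configs for other tasks pass through unchanged, so the helper is safe to apply to every message.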

Image Format

Same as Inferencer node:

Rosepetal Bitmap Format

javascript
{
  width: 1920,
  height: 1080,
  data: Buffer,
  colorSpace: "RGB",  // or "GRAY", "BGR"
  channels: 3,
  dtype: "uint8"
}

JPEG/PNG Buffers

javascript
msg.payload = imageBuffer;

Array of Images

javascript
msg.payload = [image1, image2, image3];

Input

Basic OCR

javascript
msg.payload = {
  width: 1024,
  height: 768,
  data: documentImageBuffer,
  colorSpace: "RGB"
};
return msg;

Document Rotation Detection

javascript
// Configure node with Task: Document Rotation
msg.payload = documentImage;
return msg;

Batch Processing

javascript
msg.payload = [doc1, doc2, doc3];
return msg;

Output

OCR Results

javascript
{
  payload: [
    {
      text: "Hello World",
      confidence: 0.98,
      box: {
        points: [[x1, y1], [x2, y2], [x3, y3], [x4, y4]],
        xyxy: [x_min, y_min, x_max, y_max]
      }
    },
    {
      text: "Sample Text",
      confidence: 0.95,
      box: {
        points: [[x1, y1], [x2, y2], [x3, y3], [x4, y4]],
        xyxy: [x_min, y_min, x_max, y_max]
      }
    }
  ],
  performance: {
    payload: {
      inferenceTime: 150.5,
      preprocessTime: 12.3,
      postprocessTime: 8.7,
      totalTime: 171.5
    }
  }
}
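
Because each result carries its bounding box, the flat list can be re-sorted into rough reading order before joining the text. A sketch assuming the `xyxy` layout shown above (`lineTolerance` is our own parameter for grouping boxes onto the same line):

```javascript
// Sort OCR results into rough reading order (top-to-bottom, then
// left-to-right) using the xyxy boxes [x_min, y_min, x_max, y_max].
function toReadingOrder(results, lineTolerance = 10) {
  return [...results].sort((a, b) => {
    const dy = a.box.xyxy[1] - b.box.xyxy[1];
    // Boxes whose tops are within the tolerance count as the same line.
    if (Math.abs(dy) > lineTolerance) return dy;
    return a.box.xyxy[0] - b.box.xyxy[0];
  });
}

const results = [
  { text: "World", box: { xyxy: [120, 52, 200, 80] } },
  { text: "Hello", box: { xyxy: [10, 50, 100, 80] } },
  { text: "Below", box: { xyxy: [10, 100, 90, 130] } },
];
const line = toReadingOrder(results).map(r => r.text).join(" ");
// line === "Hello World Below"
```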

Detection Results

javascript
{
  payload: [
    {
      box: {
        points: [[100, 50], [300, 50], [300, 80], [100, 80]],
        xyxy: [100, 50, 300, 80]
      }
    },
    {
      box: {
        points: [[100, 100], [400, 100], [400, 140], [100, 140]],
        xyxy: [100, 100, 400, 140]
      }
    }
  ]
}
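
Detection output is often post-filtered before recognition, for example to drop tiny boxes that are usually noise. A sketch using the `xyxy` form above (`minArea` is our own parameter):

```javascript
// Drop detection boxes below a minimum pixel area, computed from the
// xyxy form [x_min, y_min, x_max, y_max].
function filterSmallBoxes(detections, minArea = 100) {
  return detections.filter(d => {
    const [x1, y1, x2, y2] = d.box.xyxy;
    return (x2 - x1) * (y2 - y1) >= minArea;
  });
}

const detections = [
  { box: { xyxy: [100, 50, 300, 80] } },  // area 6000, kept
  { box: { xyxy: [10, 10, 15, 15] } },    // area 25, dropped
];
const kept = filterSmallBoxes(detections);
// kept.length === 1
```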

Recognition Results

javascript
{
  payload: [
    {
      text: "Recognized text line 1",
      confidence: 0.97
    },
    {
      text: "Recognized text line 2",
      confidence: 0.89
    }
  ]
}

Document Rotation Results

javascript
{
  payload: {
    rotation: 90,        // 0, 90, 180, or 270
    confidence: 0.99
  }
}
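
To straighten the document, the detected angle is typically inverted before rotating. A one-line helper (our own naming) that normalizes the counter-rotation into the 0–359 range instead of passing a negative angle:

```javascript
// Counter-rotation to apply for a detected rotation of 0/90/180/270,
// normalized to the 0-359 range.
function counterRotation(rotation) {
  return (360 - rotation) % 360;
}
// counterRotation(90) === 270, counterRotation(0) === 0
```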

Usage Examples

Example 1: Extract Text from Document

javascript
// Function node: Load document
const fs = require('fs');
msg.payload = fs.readFileSync('/path/to/document.jpg');
return msg;

OCR Inferencer configuration:

  • Task: OCR
  • Detection: en_PP-OCRv3_det
  • Recognition: en_PP-OCRv4_rec
javascript
// Function node: Extract all text
const allText = msg.payload.map(result => result.text).join('\n');
msg.payload = allText;
return msg;

Example 2: Document Preprocessing Pipeline

[Load Image] → [OCR: Document Rotation] → [Function: Rotate] → [OCR: Full OCR] → [Output]

Step 1: Detect rotation

javascript
// Function node after OCR Inferencer (Task: Document Rotation)
// (assumes the original image was saved to msg.originalImage upstream)
msg.rotation = msg.payload.rotation;
msg.payload = msg.originalImage;
return msg;

Step 2: Rotate image

javascript
// Function node: Rotate based on detected angle
const sharp = require('sharp');
msg.payload = await sharp(msg.payload)
  .rotate(-msg.rotation)  // Counter-rotate
  .toBuffer();
return msg;

Step 3: Extract text

javascript
// OCR Inferencer (Task: OCR)
msg.extractedText = msg.payload.map(r => r.text);
return msg;

Example 3: Multi-Language Documents

javascript
// Function node: split the page into per-language regions
// (cropEnglishRegion/cropArabicRegion are user-supplied helpers)
msg.englishPart = {
  width: 800,
  height: 400,
  data: cropEnglishRegion(imageBuffer)
};

msg.arabicPart = {
  width: 800,
  height: 400,
  data: cropArabicRegion(imageBuffer)
};
return msg;

Two OCR nodes with different models:

  • Node 1: English detection + recognition models
  • Node 2: Arabic detection + recognition models

Example 4: Table Detection and OCR

javascript
// Step 1: Detect text regions
// OCR Inferencer (Task: Detection); the original image was saved
// to msg.originalImage before this node
msg.textBoxes = msg.payload;
msg.payload = msg.originalImage;
return msg;
javascript
// Step 2: Filter boxes by position (table columns)
msg.column1 = msg.textBoxes.filter(box =>
  box.box.xyxy[0] < 200  // X position < 200px
);

msg.column2 = msg.textBoxes.filter(box =>
  box.box.xyxy[0] >= 200 && box.box.xyxy[0] < 400
);
return msg;
javascript
// Step 3: Crop and recognize each region
// (cropImage is a user-supplied helper)
const crops = msg.column1.map(box => cropImage(msg.originalImage, box.box.xyxy));
msg.payload = crops;
return msg;

OCR Inferencer (Task: Recognition)
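
A possible Step 4: the Recognition node returns one result per submitted crop, in submission order (Example 6 relies on the same index alignment), so text can be re-attached to the boxes the crops came from. A self-contained sketch with placeholder data standing in for the earlier steps:

```javascript
// Step 4: Re-attach recognized text to the source boxes.
// column1 and recResults are placeholder data here; in the flow they
// come from the detection filter (Step 2) and the Recognition node.
const column1 = [
  { box: { xyxy: [10, 50, 180, 80] } },
  { box: { xyxy: [10, 100, 180, 130] } },
];
const recResults = [
  { text: "Invoice #", confidence: 0.97 },
  { text: "Total", confidence: 0.93 },
];
// Recognition results arrive in submission order, so index i matches.
const cells = column1.map((b, i) => ({
  text: recResults[i].text,
  confidence: recResults[i].confidence,
  xyxy: b.box.xyxy,
}));
```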

Example 5: Confidence Filtering

javascript
// Function node: Filter low confidence results
const minConfidence = 0.85;

msg.highConfidence = msg.payload.filter(result =>
  result.confidence >= minConfidence
);

msg.lowConfidence = msg.payload.filter(result =>
  result.confidence < minConfidence
);

// Log low confidence for review
if (msg.lowConfidence.length > 0) {
  node.warn(`${msg.lowConfidence.length} low confidence results`);
}

msg.payload = msg.highConfidence;
return msg;

Example 6: Batch Document Processing

javascript
// Function node: Load all documents
const fs = require('fs');
const files = fs.readdirSync('/documents/inbox');

const imageFiles = files
  .filter(f => f.endsWith('.jpg') || f.endsWith('.png'));

msg.payload = imageFiles.map(f => fs.readFileSync(`/documents/inbox/${f}`));
msg.fileNames = imageFiles;  // keep names aligned with the filtered payload
return msg;

OCR Inferencer (Promise mode enabled)

javascript
// Promise Reader resolves all
msg.ocrResults = await Promise.all(msg.promises);

// Combine with filenames
msg.documents = msg.fileNames.map((name, idx) => ({
  fileName: name,
  text: msg.ocrResults[idx].map(r => r.text).join('\n')
}));

return msg;

Performance Optimization

Model Selection

For speed:

  • Detection: en_PP-OCRv3_det (faster)
  • Recognition: en_PP-OCRv4_rec

For accuracy:

  • Detection: ch_PP-OCRv4_det (more accurate)
  • Recognition: Match to language

Preprocessing

Enable for challenging documents:

  • Document unwarping: Curved pages, photos of documents
  • Document orientation: Scanned documents in wrong rotation
  • Text line orientation: Mixed orientation text

Disable for clean documents:

  • All preprocessing OFF for scanned documents
  • Faster processing, less overhead

Image Preparation

Before OCR:

javascript
const sharp = require('sharp');

// Enhance contrast
msg.payload = await sharp(msg.payload)
  .normalize()
  .toBuffer();

// Increase resolution
msg.payload = await sharp(msg.payload)
  .resize(2000, 2000, { fit: 'inside', withoutEnlargement: true })
  .toBuffer();

Batch Size

Optimal batch sizes:

  • Document rotation: 10-20 images
  • Detection only: 10-15 images
  • Full OCR: 5-10 images
  • Recognition only: 15-25 images
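
The sizes above can be applied with a small batching helper (our own sketch) before feeding the node, e.g. sending one message per batch with Promise mode enabled:

```javascript
// Split a large image array into batches of a chosen size
// (e.g. 5-10 for full OCR, per the guidance above).
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
// chunk([1, 2, 3, 4, 5, 6, 7], 3) -> [[1, 2, 3], [4, 5, 6], [7]]
```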

Docker Container Management

Container Lifecycle

Similar to Inferencer node:

  1. Auto-starts on deploy
  2. Ready indicator in node status
  3. Clean removal on redeploy

Memory Usage

Per container (approximate):

  • Detection model: 500MB-1GB
  • Recognition model: 500MB-1GB
  • Full OCR: 1.5GB-2.5GB
  • Document rotation: 300MB-500MB

Container Logs

bash
docker logs <ocr-container-id>

Available Models

Detection Models

| Model | Language | Version | Use Case |
|-------|----------|---------|----------|
| en_PP-OCRv3_det | English | v3 | English documents, fast |
| ch_PP-OCRv4_det | Chinese | v4 | Chinese/mixed, accurate |
| ml_PP-OCRv3_det | Multi-lang | v3 | Multiple languages |

Recognition Models

| Model | Language | Version | Use Case |
|-------|----------|---------|----------|
| en_PP-OCRv4_rec | English | v4 | English text, latest |
| ch_PP-OCRv4_rec | Chinese | v4 | Chinese characters |
| latin_PP-OCRv3_rec | Latin | v3 | European languages |
| arabic_PP-OCRv3_rec | Arabic | v3 | Arabic script |
| cyrillic_PP-OCRv3_rec | Cyrillic | v3 | Russian, etc. |
| korean_PP-OCRv3_rec | Korean | v3 | Korean characters |
| japan_PP-OCRv3_rec | Japanese | v3 | Japanese text |

Error Handling

Common Errors

"No text detected"

  • Cause: Image too low quality, wrong preprocessing
  • Solution: Enhance image contrast, adjust preprocessing

"Low confidence results"

  • Cause: Poor image quality, wrong recognition model
  • Solution: Improve image quality, select correct language model

"Container startup failed"

  • Cause: Insufficient memory, model download failed
  • Solution: Check available memory, verify model files

"Invalid task configuration"

  • Cause: Document rotation with OCR preprocessing params
  • Solution: Remove preprocessing params from Document_Orientation task

Debugging

Check text detection:

  • Use Detection task first
  • Visualize detected boxes
  • Verify boxes capture all text

Check recognition:

  • Extract detected regions manually
  • Test with Recognition task
  • Validate model matches language

Enable debug visualization:

  • Shows detected boxes on image
  • Displays recognized text
  • Helps identify issues

Best Practices

Image Quality

  1. Minimum resolution: 100 DPI for good results
  2. Contrast: High contrast between text and background
  3. Lighting: Even illumination, no shadows
  4. Focus: Sharp, clear text

Model Selection

  1. Match model to language: Don't use English model for Chinese
  2. Use latest versions: v4 models generally better than v3
  3. Test on sample data: Validate before production

Preprocessing

  1. Start simple: Try without preprocessing first
  2. Add selectively: Enable only needed features
  3. Test impact: Measure accuracy improvement vs speed cost

Confidence Thresholds

  1. Set appropriate filters: 0.8-0.9 for critical applications
  2. Log low confidence: Review for quality issues
  3. Adjust per use case: Medical records need higher threshold than labels

Batch Processing

  1. Group similar documents: Same language, same orientation
  2. Use promise mode: For large batches
  3. Monitor memory: Don't exceed capacity

Troubleshooting

Poor OCR Accuracy

Possible causes:

  • Wrong language model
  • Poor image quality
  • Incorrect preprocessing

Solutions:

  • Verify model matches text language
  • Enhance image (contrast, resolution)
  • Try different preprocessing combinations

Missing Text

Possible causes:

  • Text too small
  • Low contrast
  • Unusual fonts

Solutions:

  • Increase image resolution
  • Enhance contrast
  • Use multi-language detection model

Slow Processing

Possible causes:

  • Too many preprocessing steps
  • Large images
  • Complex documents

Solutions:

  • Disable unnecessary preprocessing
  • Resize images before OCR
  • Process in smaller batches

See Also