
Inferencer Node

Overview

The Inferencer node provides AI-powered computer vision inference by running machine learning models in Docker containers. It acts as a bridge between Node-RED flows and Python-based ML models, supporting detection, classification, segmentation, and anomaly detection tasks.

Key Features

  • Multiple ML frameworks: YOLO, PaddlePaddle, RT-DETR, and anomaly-detection models
  • Task support: Object detection, image classification, instance/semantic segmentation
  • Docker-based execution: Isolated Python containers for model inference
  • Load balancing: Run multiple container instances for higher throughput
  • Dynamic models: Load models at runtime or configure statically
  • Auto-warmup: Pre-warm models on startup for faster first predictions
  • Firebase integration: Automatic model download from Firebase storage
  • Promise mode: Asynchronous processing with promise-reader integration
  • Output customization: Configure detection boxes, masks, and result formats
  • Class mapping: Rename or filter model output classes
  • Debug visualization: Display processed images on Node-RED canvas
  • Performance tracking: Built-in metrics for inference timing

Architecture

Components

  1. Node-RED Node: Configuration and message handling
  2. Docker Containers: Python inference servers (1-3 instances)
  3. gRPC Protocol: Communication between Node-RED and containers
  4. Model Storage: /opt/storage/models/ directory
  5. Firebase: Optional model download source

Data Flow

Input Images → Queue → gRPC → Docker Container(s) → ML Model → Predictions → Output

When more than one server is configured, the queue load-balances requests across the containers.

Configuration

Settings Tab

Name

  • Type: String
  • Optional: Yes
  • Description: Display name for the node

Input Field

  • Type: Message property path
  • Default: payload
  • Description: Message field containing input images (single image or array)

Output Field

  • Type: Message property path
  • Default: payload
  • Description: Where inference results will be stored
  • Note: Performance stats saved to msg.performance.<field>
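
For example, with Output Field set to detections, a downstream function node could read the timing stats like this (a sketch; the stat names follow the Output examples later on this page):

javascript
// Read inference timing for the configured output field ("detections" here)
const stats = msg.performance && msg.performance.detections;
if (stats) {
  node.warn(`inference ${stats.inferenceTime} ms, total ${stats.totalTime} ms`);
}
return msg;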

Model Source

  • Type: Select
  • Options:
    • Select here: Choose from available models in /opt/storage/models
    • Dynamic: Provide model name at runtime via message/flow/global property

Model Name (Static Mode)

  • Type: Dropdown
  • Description: Select from available models
  • Display: Shows model name, task type, and framework

Model Field (Dynamic Mode)

  • Type: TypedInput (msg, flow, global)
  • Example: msg.modelName, flow.currentModel, global.activeModel
  • Description: Property path containing model name at runtime
  • Note: Enables automatic model download if Firebase is configured

Auto Warmup (Static Mode)

  • Type: Checkbox
  • Default: Disabled
  • Description: Trigger synthetic inference on startup to prepare model
  • Benefit: Eliminates cold-start latency on first real request

Promise Mode

  • Type: Checkbox
  • Default: Disabled
  • Description: Return promises instead of waiting for results
  • Requires: promise-reader node to resolve promises
  • Use Case: High-throughput batch processing

Promises Field

  • Type: Message property path
  • Default: promises
  • Visible: When Promise Mode enabled
  • Description: Array field to store pending promises

Number of Concurrent Servers

  • Type: Number
  • Range: 1-3
  • Default: 1
  • Description: Docker container instances for load balancing
  • Impact: Higher = more throughput, more memory usage

Maximum Concurrent Predictions

  • Type: Number
  • Range: 1-20
  • Default: 5
  • Description: Parallel requests per server
  • Behavior: Excess requests queue until slot available

Show Debug Image

  • Type: Checkbox
  • Default: Disabled
  • Description: Display processed images on Node-RED canvas

Debug Interval

  • Type: Number
  • Default: 1
  • Visible: When debug enabled
  • Description: Show every Nth image (1 = all images)

Debug Image Width

  • Type: Number (pixels)
  • Default: 200
  • Visible: When debug enabled
  • Description: Display width for debug images

JSON Config Tab

Advanced model configuration in JSON format. Structure varies by model type:

Common Fields (All Models)

json
{
  "common": {
    "model_name": "my-model",
    "config_type": "predict",
    "task": "Detection",
    "device": "auto",
    "image_shape": {
      "width": 640,
      "height": 640
    },
    "verbose": false,
    "max_batch": 100
  }
}

Fields:

  • model_name: Model identifier
  • config_type: Always "predict"
  • task: Detection, Classification, Segmentation, Anomaly
  • device: "auto", "cpu", "cuda", "mps"
  • image_shape: Target dimensions for resizing
  • verbose: Enable detailed logging
  • max_batch: Maximum batch size

Detection Models (YOLO, RT-DETR, PaddlePaddle)

json
{
  "common": { /* ... */ },
  "task_specific": {
    "conf_threshold": 0.5,
    "nms_iou_threshold": 0.7,
    "max_det": 300
  }
}

Task-specific fields:

  • conf_threshold: Confidence threshold (0.0-1.0)
  • nms_iou_threshold: Non-maximum suppression IoU threshold
  • max_det: Maximum detections per image

Classification Models

json
{
  "common": { /* ... */ },
  "task_specific": {
    "top_k": 5
  }
}

Task-specific fields:

  • top_k: Return top K predictions

Segmentation Models (YOLO)

json
{
  "common": { /* ... */ },
  "task_specific": {
    "conf_threshold": 0.5,
    "retina_masks": true
  }
}

Task-specific fields:

  • conf_threshold: Detection confidence threshold
  • retina_masks: High-resolution mask output

Anomaly Detection

json
{
  "common": { /* ... */ },
  "task_specific": {
    "threshold": 0.5,
    "normalize": true
  },
  "model_specific": {
    "optional": {
      "image_threshold": 0.5,
      "pixel_threshold": 0.5
    }
  }
}

Output Formats Tab

Configure detection and segmentation output formats.

Detection Output Formats

  • boxes_xyxy: Bounding boxes [x1, y1, x2, y2] (top-left, bottom-right)
  • boxes_xywh: Bounding boxes [x_center, y_center, width, height]
  • boxes_tlwh: Bounding boxes [x_top_left, y_top_left, width, height]
  • boxes_corners: Four corners [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
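
For illustration, the same 200×250 detection box expressed in each format (a worked example, not output from the node):

javascript
// One box with top-left (100, 150) and bottom-right (300, 400)
const formats = {
  boxes_xyxy:    [100, 150, 300, 400],
  boxes_xywh:    [200, 275, 200, 250],   // center x, center y, width, height
  boxes_tlwh:    [100, 150, 200, 250],   // top-left x, top-left y, width, height
  boxes_corners: [[100, 150], [300, 150], [300, 400], [100, 400]]
};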

Segmentation Output Formats

  • masks_rle: Run-length encoding (compact format)
  • masks_polygon: Polygon contours
  • masks_bitmap: Binary pixel masks

Classes Tab

Map or filter model output classes.

Features:

  • Rename classes to custom labels
  • Filter results by remapping to empty string
  • Per-model configuration storage

Example:

Original: "person" → Custom: "human"
Original: "car" → Custom: "vehicle"
Original: "background" → Custom: "" (filtered out)

Image Format

The node accepts multiple image formats:

1. Rosepetal Bitmap Format

javascript
{
  width: 1920,
  height: 1080,
  data: Buffer,           // Raw pixel data
  colorSpace: "RGB",      // "GRAY", "RGB", "RGBA", "BGR", "BGRA"
  channels: 3,            // Auto-inferred if omitted
  dtype: "uint8"          // Currently only uint8 supported
}

Color space mapping:

  • GRAY: 1 channel
  • RGB / BGR: 3 channels
  • RGBA / BGRA: 4 channels

2. JPEG/PNG Buffers

javascript
const fs = require('fs'); // available in function nodes when external modules are enabled
msg.payload = fs.readFileSync('image.jpg');

The node automatically decodes standard image formats.

3. Array of Images

javascript
msg.payload = [image1, image2, image3];

Processes multiple images in batch for improved performance.

Input

Basic Input

javascript
msg.payload = imageBuffer;
return msg;

Dynamic Model Selection

javascript
// Configure node with Model Source: Dynamic, Model Field: msg.modelName
msg.modelName = "yolo-detection-v8";
msg.payload = imageBuffer;
return msg;

Warmup Request

javascript
msg.warmup = true;
msg.payload = syntheticImage; // Or omit for auto-generated
return msg;

Triggers model warmup without returning results.

Output

Detection Results

javascript
{
  payload: [
    {
      box: {
        xyxy: [100, 150, 300, 400],
        xywh: [200, 275, 200, 250],
        confidence: 0.95
      },
      class: "person",
      class_id: 0
    }
  ],
  performance: {
    payload: {
      inferenceTime: 45.2,
      preprocessTime: 5.1,
      postprocessTime: 3.8,
      totalTime: 54.1
    }
  }
}

Classification Results

javascript
{
  payload: [
    {
      class: "cat",
      class_id: 281,
      confidence: 0.98
    },
    {
      class: "dog",
      class_id: 179,
      confidence: 0.01
    }
  ]
}
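
To keep only the best prediction per message, a small function-node sketch:

javascript
// Pick the highest-confidence class from the classification results
if (!Array.isArray(msg.payload) || msg.payload.length === 0) return null;
const top = msg.payload.reduce((best, p) => (p.confidence > best.confidence ? p : best));
msg.topic = top.class;
msg.payload = top;
return msg;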

Segmentation Results

javascript
{
  payload: [
    {
      box: { xyxy: [100, 150, 300, 400] },
      class: "person",
      mask: {
        rle: "...",           // Run-length encoded
        polygon: [[x1, y1], [x2, y2], ...],
        bitmap: Buffer        // Binary mask
      }
    }
  ]
}

Promise Mode Output

javascript
{
  promises: [
    Promise { <pending> },
    Promise { <pending> }
  ]
}

Resolve with promise-reader node.

Usage Examples

Example 1: Simple Object Detection

javascript
// Function node: Load image
msg.payload = {
  width: 640,
  height: 480,
  data: imageBuffer,
  colorSpace: "RGB"
};
return msg;

Inferencer configuration:

  • Model: yolo-v8-detection
  • Input: payload
  • Output: detections

javascript
// Function node: Filter high confidence
msg.payload = msg.detections.filter(det => det.box.confidence > 0.8);
return msg;

Example 2: Batch Processing

javascript
// Function node: Prepare batch
const images = [
  { width: 640, height: 480, data: buffer1, colorSpace: "RGB" },
  { width: 640, height: 480, data: buffer2, colorSpace: "RGB" },
  { width: 640, height: 480, data: buffer3, colorSpace: "RGB" }
];
msg.payload = images;
return msg;

Output:

javascript
msg.payload = [
  [detection1a, detection1b],  // Results from image 1
  [detection2a],                // Results from image 2
  []                            // No detections in image 3
];
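
If you need a flat list afterwards, one way to flatten the per-image arrays while keeping track of the source image (a sketch):

javascript
// Flatten per-image result arrays and tag each detection with its image index
msg.payload = msg.payload.flatMap((detections, imageIndex) =>
  detections.map(det => ({ ...det, imageIndex }))
);
return msg;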

Example 3: Dynamic Model Selection

javascript
// Function node: Choose model based on input
if (msg.topic === "quality-check") {
  msg.modelName = "defect-detection-model";
} else if (msg.topic === "classification") {
  msg.modelName = "product-classifier";
}
msg.payload = imageData;
return msg;

Example 4: Promise Mode for High Throughput

javascript
// Function node: Send batch with promises
msg.payload = imageArray;
return msg;

Inferencer (Promise enabled)

javascript
// Promise Reader node resolves all
msg.results = await Promise.all(msg.promises);
return msg;

Example 5: Class Filtering

Inferencer Classes config:

person → person
car → vehicle
truck → vehicle
bicycle →
motorcycle →

Result: Only "person" and "vehicle" classes in output, bicycle/motorcycle filtered.

Example 6: Multi-Model Pipeline

[Camera] → [Inferencer: Detection] → [Function: Filter] → [Inferencer: Classification] → [Output]

Detection step:

javascript
// Keep the raw detections and replace the payload with cropped regions.
// cropDetections is a user-supplied helper (see the sketch below); it needs
// the original image, e.g. saved upstream in msg.image.
msg.detections = msg.payload;
msg.payload = cropDetections(msg.detections, msg.image); // Extract regions
return msg;
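
A hedged sketch of such a helper, assuming detections use the xyxy box format and the original Rosepetal bitmap was preserved in msg.image by an earlier node:

javascript
// Hypothetical helper: cut each detected region out of a raw bitmap
function cropDetections(detections, image) {
  const channels = image.channels ||
    { GRAY: 1, RGB: 3, BGR: 3, RGBA: 4, BGRA: 4 }[image.colorSpace];
  return detections.map(det => {
    const [x1, y1, x2, y2] = det.box.xyxy.map(Math.round);
    const width = x2 - x1;
    const height = y2 - y1;
    const data = Buffer.alloc(width * height * channels);
    for (let row = 0; row < height; row++) {
      const src = ((y1 + row) * image.width + x1) * channels;
      image.data.copy(data, row * width * channels, src, src + width * channels);
    }
    return { width, height, data, colorSpace: image.colorSpace };
  });
}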

Classification step:

javascript
// Classify each detected region
msg.classifications = msg.payload;
return msg;

Performance Optimization

Concurrency Settings

Single server, low concurrency:

  • Servers: 1
  • Max concurrent: 5
  • Use case: Low memory systems, sporadic requests

Multiple servers, high concurrency:

  • Servers: 3
  • Max concurrent: 10
  • Use case: High-throughput production, powerful hardware

Batch Processing

Optimal batch size:

  • Small models (YOLO-nano): 10-20 images
  • Medium models (YOLO-v8): 5-10 images
  • Large models (Segmentation): 2-5 images
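
A minimal function-node sketch for chunking a large image array before the inferencer, assuming a medium model and a batch size of 10 (tune for your model and hardware):

javascript
// Split a large image array into batches of 10 and send one message per batch
const BATCH_SIZE = 10;
const images = msg.payload;
const messages = [];
for (let i = 0; i < images.length; i += BATCH_SIZE) {
  messages.push({ ...msg, payload: images.slice(i, i + BATCH_SIZE) });
}
return [messages]; // array of messages sent individually on output 1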

Image Preprocessing

Before sending to inferencer:

  • Resize to model's expected dimensions
  • Convert color space if needed
  • Compress if bandwidth limited

javascript
// Resize an encoded image (e.g. a JPEG buffer in msg.payload.data) to the
// model's input size and forward raw pixels in the Rosepetal bitmap format
const sharp = require('sharp');
msg.payload.data = await sharp(msg.payload.data)
  .resize(640, 640)
  .raw()
  .toBuffer();
msg.payload.width = 640;
msg.payload.height = 640;
msg.payload.colorSpace = "RGB"; // raw() output from a typical JPEG source is 3-channel
return msg;

Model Warmup

Enable auto-warmup for:

  • Production environments
  • Time-sensitive applications
  • Models with slow cold-start

Disable warmup for:

  • Development/testing
  • Dynamic model switching
  • Memory-constrained systems

Docker Container Management

Container Lifecycle

  1. Startup: Node creates Docker containers
  2. Ready: Containers accept requests
  3. Running: Process inference requests
  4. Shutdown: Clean container removal on deploy

Container Ports

Automatically assigned from available ports. No manual configuration needed.

Container Logs

bash
# View container logs
docker logs <container-id>

# Find inferencer containers
docker ps | grep rosepetal-serving

Memory Usage

Per container (approximate):

  • YOLO-nano: 500MB-1GB
  • YOLO-v8: 2GB-4GB
  • Segmentation: 4GB-8GB
  • PaddlePaddle OCR: 2GB-3GB

Multiple servers multiply memory usage.

Model Management

Model Directory Structure

/opt/storage/models/
├── yolo-v8-detection/
│   ├── model.pt
│   └── config.json
├── product-classifier/
│   ├── model.onnx
│   └── config.json
└── segmentation-model/
    ├── model.pt
    └── config.json

Adding Models Manually

  1. Create directory: /opt/storage/models/<model-name>/
  2. Copy model file: model.pt, model.onnx, etc.
  3. Create config.json (optional, for auto-detection)
  4. Refresh Node-RED editor

Firebase Model Download

Requirements:

  • Firebase config node connected
  • Model exists in Firebase storage
  • Dynamic mode enabled

Behavior:

  • Checks local storage first
  • Downloads if missing
  • Caches for future use
  • Shows download progress in logs

Error Handling

Common Errors

"Docker image not available"

  • Cause: Inference Docker image not pulled
  • Solution: docker pull <image-name>

"Model not found"

  • Cause: Model doesn't exist in /opt/storage/models
  • Solution: Verify model name, check directory, download if needed

"Container failed to start"

  • Cause: Port conflict, insufficient memory, corrupted model
  • Solution: Check Docker logs, verify system resources

"gRPC connection failed"

  • Cause: Container not ready, network issue
  • Solution: Wait for container startup, check Docker status

"Invalid image format"

  • Cause: Missing required fields, wrong data type
  • Solution: Validate image object structure

"CUDA out of memory"

  • Cause: Batch too large, concurrent requests exceeded capacity
  • Solution: Reduce batch size, lower concurrency, use CPU

Debugging

Enable verbose logging:

json
{
  "common": {
    "verbose": true
  }
}

Check container logs:

bash
docker logs <container-id>

Enable debug visualization:

  • Check "Show debug image"
  • Set debug interval: 1
  • Verify images display correctly

Promise Mode

Overview

Promise mode enables asynchronous batch processing:

  • Send multiple images without waiting
  • Process results when ready
  • Higher throughput for large batches

Configuration

Inferencer:

  • Enable "Promise" checkbox
  • Set "Promises field": promises

Flow:

[Function: Batch] → [Inferencer: Promise] → [Promise Reader] → [Function: Process]

Usage

Send batch:

javascript
msg.payload = arrayOf100Images;
return msg;

Inferencer outputs:

javascript
msg.promises = [Promise, Promise, ...]; // 100 promises

Promise Reader resolves:

javascript
msg.results = [...]; // 100 resolved results
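
If you prefer resolving in a function node rather than the promise-reader node, a hedged sketch with per-image error handling:

javascript
// Resolve all pending predictions; keep failures per image instead of
// rejecting the whole batch
const settled = await Promise.allSettled(msg.promises);
msg.results = settled.map(s => (s.status === "fulfilled" ? s.value : null));
msg.errors = settled.filter(s => s.status === "rejected").map(s => s.reason);
return msg;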

Benefits

  • Non-blocking operation
  • Better resource utilization
  • Simplified batch handling
  • Automatic parallelization

Best Practices

Model Selection

  1. Match task to model: Detection vs Classification vs Segmentation
  2. Consider speed/accuracy tradeoff: nano (fast) vs large (accurate)
  3. Test on representative data: Validate before production
  4. Version models: Track changes, enable rollback

Configuration

  1. Use JSON config for repeatability: Save configurations
  2. Document class mappings: Clear naming conventions
  3. Set appropriate thresholds: Balance false positives/negatives
  4. Enable warmup in production: Eliminate first-request latency

Performance

  1. Right-size concurrency: Match hardware capabilities
  2. Use batch processing: Process multiple images together
  3. Monitor memory usage: Prevent OOM errors
  4. Profile inference times: Identify bottlenecks

Maintenance

  1. Clean up unused containers: docker system prune
  2. Monitor disk space: Models can be large
  3. Update Docker images: Stay current with improvements
  4. Log performance metrics: Track degradation over time

Troubleshooting

Slow Inference

Possible causes:

  • Model too large for hardware
  • CPU inference on GPU model
  • High concurrent load
  • Network bottleneck (if remote storage)

Solutions:

  • Use smaller model variant
  • Enable GPU if available
  • Reduce concurrency
  • Cache models locally

High Memory Usage

Possible causes:

  • Too many servers
  • Large batch sizes
  • Memory leak in model

Solutions:

  • Reduce number of servers
  • Process smaller batches
  • Restart containers periodically
  • Update to latest image

Inconsistent Results

Possible causes:

  • Wrong color space
  • Incorrect image dimensions
  • Threshold too sensitive

Solutions:

  • Validate input format
  • Check preprocessing
  • Adjust confidence thresholds
  • Enable debug visualization

See Also