Inferencer Node
Overview
The Inferencer node provides AI-powered computer vision inference by running machine learning models in Docker containers. It acts as a bridge between Node-RED flows and Python-based ML models, supporting detection, classification, segmentation, and anomaly detection tasks.
Key Features
- Multiple ML frameworks: YOLO, PaddlePaddle, RT-DETR, Anomaly detection
- Task support: Object detection, image classification, instance/semantic segmentation
- Docker-based execution: Isolated Python containers for model inference
- Load balancing: Run multiple container instances for higher throughput
- Dynamic models: Load models at runtime or configure statically
- Auto-warmup: Pre-warm models on startup for faster first predictions
- Firebase integration: Automatic model download from Firebase storage
- Promise mode: Asynchronous processing with promise-reader integration
- Output customization: Configure detection boxes, masks, and result formats
- Class mapping: Rename or filter model output classes
- Debug visualization: Display processed images on Node-RED canvas
- Performance tracking: Built-in metrics for inference timing
Architecture
Components
- Node-RED Node: Configuration and message handling
- Docker Containers: Python inference servers (1-3 instances)
- gRPC Protocol: Communication between Node-RED and containers
- Model Storage: /opt/storage/models/ directory
- Firebase: Optional model download source
Data Flow
Input Images → Queue → gRPC → Docker Container → ML Model → Predictions → Output
↓
Load Balancing (multiple containers)
Configuration
Settings Tab
Name
- Type: String
- Optional: Yes
- Description: Display name for the node
Input Field
- Type: Message property path
- Default: payload
- Description: Message field containing input images (single image or array)
Output Field
- Type: Message property path
- Default: payload
- Description: Where inference results will be stored
- Note: Performance stats saved to msg.performance.<field>
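For example, with Output Field set to detections, a downstream Function node can read both the results and the matching performance entry. A minimal sketch (the detections field name and the logged values are illustrative; see the output samples later in this document for the full result structure):
// Downstream Function node, assuming Output Field is configured as "detections"
const results = msg.detections || [];
const stats = msg.performance && msg.performance.detections;

if (stats) {
    node.warn(`Inference took ${stats.totalTime} ms (${results.length} results)`);
}
return msg;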
Model Source
- Type: Select
- Options:
  - Select here: Choose from available models in /opt/storage/models
  - Dynamic: Provide model name at runtime via message/flow/global property
Model Name (Static Mode)
- Type: Dropdown
- Description: Select from available models
- Display: Shows model name, task type, and framework
Model Field (Dynamic Mode)
- Type: TypedInput (msg, flow, global)
- Example: msg.modelName, flow.currentModel, global.activeModel
- Description: Property path containing model name at runtime
- Note: Enables automatic model download if Firebase is configured
Auto Warmup (Static Mode)
- Type: Checkbox
- Default: Disabled
- Description: Trigger synthetic inference on startup to prepare model
- Benefit: Eliminates cold-start latency on first real request
Promise Mode
- Type: Checkbox
- Default: Disabled
- Description: Return promises instead of waiting for results
- Requires: promise-reader node to resolve promises
- Use Case: High-throughput batch processing
Promises Field
- Type: Message property path
- Default: promises
- Visible: When Promise Mode enabled
- Description: Array field to store pending promises
Number of Concurrent Servers
- Type: Number
- Range: 1-3
- Default: 1
- Description: Docker container instances for load balancing
- Impact: Higher = more throughput, more memory usage
Maximum Concurrent Predictions
- Type: Number
- Range: 1-20
- Default: 5
- Description: Parallel requests per server
- Behavior: Excess requests queue until slot available
Show Debug Image
- Type: Checkbox
- Default: Disabled
- Description: Display processed images on Node-RED canvas
Debug Interval
- Type: Number
- Default: 1
- Visible: When debug enabled
- Description: Show every Nth image (1 = all images)
Debug Image Width
- Type: Number (pixels)
- Default: 200
- Visible: When debug enabled
- Description: Display width for debug images
JSON Config Tab
Advanced model configuration in JSON format. Structure varies by model type:
Common Fields (All Models)
{
"common": {
"model_name": "my-model",
"config_type": "predict",
"task": "Detection",
"device": "auto",
"image_shape": {
"width": 640,
"height": 640
},
"verbose": false,
"max_batch": 100
}
}
Fields:
- model_name: Model identifier
- config_type: Always "predict"
- task: Detection, Classification, Segmentation, Anomaly
- device: "auto", "cpu", "cuda", "mps"
- image_shape: Target dimensions for resizing
- verbose: Enable detailed logging
- max_batch: Maximum batch size
Detection Models (YOLO, RT-DETR, PaddlePaddle)
{
"common": { /* ... */ },
"task_specific": {
"conf_threshold": 0.5,
"nms_iou_threshold": 0.7,
"max_det": 300
}
}
Task-specific fields:
- conf_threshold: Confidence threshold (0.0-1.0)
- nms_iou_threshold: Non-maximum suppression IoU threshold
- max_det: Maximum detections per image
Classification Models
{
"common": { /* ... */ },
"task_specific": {
"top_k": 5
}
}
Task-specific fields:
- top_k: Return top K predictions
Segmentation Models (YOLO)
{
"common": { /* ... */ },
"task_specific": {
"conf_threshold": 0.5,
"retina_masks": true
}
}
Task-specific fields:
- conf_threshold: Detection confidence threshold
- retina_masks: High-resolution mask output
Anomaly Detection
{
"common": { /* ... */ },
"task_specific": {
"threshold": 0.5,
"normalize": true
},
"model_specific": {
"optional": {
"image_threshold": 0.5,
"pixel_threshold": 0.5
}
}
}
Output Formats Tab
Configure detection and segmentation output formats.
Detection Output Formats
- boxes_xyxy: Bounding boxes [x1, y1, x2, y2] (top-left, bottom-right)
- boxes_xywh: Bounding boxes [x_center, y_center, width, height]
- boxes_tlwh: Bounding boxes [x_top_left, y_top_left, width, height]
- boxes_corners: Four corners [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
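These box layouts are simple transformations of one another. As a reference, here is a hypothetical Function-node snippet that derives the xywh and tlwh layouts from a boxes_xyxy result; it assumes the detection output structure shown later in this document (det.box.xyxy):
// Convert an [x1, y1, x2, y2] box into the other layouts described above
function convertBox([x1, y1, x2, y2]) {
    const width = x2 - x1;
    const height = y2 - y1;
    return {
        xywh: [x1 + width / 2, y1 + height / 2, width, height], // center-based
        tlwh: [x1, y1, width, height]                           // top-left based
    };
}

msg.payload = msg.payload.map(det => ({ ...det, box: { ...det.box, ...convertBox(det.box.xyxy) } }));
return msg;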
Segmentation Output Formats
- masks_rle: Run-length encoding (compact format)
- masks_polygon: Polygon contours
- masks_bitmap: Binary pixel masks
Classes Tab
Map or filter model output classes.
Features:
- Rename classes to custom labels
- Filter results by remapping to empty string
- Per-model configuration storage
Example:
Original: "person" → Custom: "human"
Original: "car" → Custom: "vehicle"
Original: "background" → Custom: "" (filtered out)Image Format
The node accepts multiple image formats:
1. Rosepetal Bitmap Format
{
width: 1920,
height: 1080,
data: Buffer, // Raw pixel data
colorSpace: "RGB", // "GRAY", "RGB", "RGBA", "BGR", "BGRA"
channels: 3, // Auto-inferred if omitted
dtype: "uint8" // Currently only uint8 supported
}
Color space mapping:
- GRAY: 1 channel
- RGB/BGR: 3 channels
- RGBA/BGRA: 4 channels
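A Function node can wrap a raw pixel buffer in this format before sending it to the inferencer. A minimal sketch, assuming an upstream node delivers a raw RGB buffer in msg.payload and 640×480 dimensions (both illustrative):
// Function node: wrap a raw RGB pixel buffer in the Rosepetal bitmap format
const rawBuffer = msg.payload;   // Buffer of width * height * 3 bytes

msg.payload = {
    width: 640,
    height: 480,
    data: rawBuffer,
    colorSpace: "RGB",   // 3 channels; "channels" is inferred when omitted
    dtype: "uint8"
};
return msg;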
2. JPEG/PNG Buffers
msg.payload = fs.readFileSync('image.jpg');
The node automatically decodes standard image formats.
3. Array of Images
msg.payload = [image1, image2, image3];
Processes multiple images in batch for improved performance.
Input
Basic Input
msg.payload = imageBuffer;
return msg;
Dynamic Model Selection
// Configure node with Model Source: Dynamic, Model Field: msg.modelName
msg.modelName = "yolo-detection-v8";
msg.payload = imageBuffer;
return msg;
Warmup Request
msg.warmup = true;
msg.payload = syntheticImage; // Or omit for auto-generated
return msg;
Triggers model warmup without returning results.
Output
Detection Results
{
payload: [
{
box: {
xyxy: [100, 150, 300, 400],
xywh: [200, 275, 200, 250],
confidence: 0.95
},
class: "person",
class_id: 0
}
],
performance: {
payload: {
inferenceTime: 45.2,
preprocessTime: 5.1,
postprocessTime: 3.8,
totalTime: 54.1
}
}
}
Classification Results
{
payload: [
{
class: "cat",
class_id: 281,
confidence: 0.98
},
{
class: "dog",
class_id: 179,
confidence: 0.01
}
]
}
Segmentation Results
{
payload: [
{
box: { xyxy: [100, 150, 300, 400] },
class: "person",
mask: {
rle: "...", // Run-length encoded
polygon: [[x1, y1], [x2, y2], ...],
bitmap: Buffer // Binary mask
}
}
]
}
Promise Mode Output
{
promises: [
Promise { <pending> },
Promise { <pending> }
]
}
Resolve with promise-reader node.
Usage Examples
Example 1: Simple Object Detection
// Function node: Load image
msg.payload = {
width: 640,
height: 480,
data: imageBuffer,
colorSpace: "RGB"
};
return msg;
Inferencer configuration:
- Model: yolo-v8-detection
- Input: payload
- Output: detections
// Function node: Filter high confidence
msg.payload = msg.detections.filter(det => det.box.confidence > 0.8);
return msg;
Example 2: Batch Processing
// Function node: Prepare batch
const images = [
{ width: 640, height: 480, data: buffer1, colorSpace: "RGB" },
{ width: 640, height: 480, data: buffer2, colorSpace: "RGB" },
{ width: 640, height: 480, data: buffer3, colorSpace: "RGB" }
];
msg.payload = images;
return msg;
Output:
msg.payload = [
[detection1a, detection1b], // Results from image 1
[detection2a], // Results from image 2
[] // No detections in image 3
];
Example 3: Dynamic Model Selection
// Function node: Choose model based on input
if (msg.topic === "quality-check") {
msg.modelName = "defect-detection-model";
} else if (msg.topic === "classification") {
msg.modelName = "product-classifier";
}
msg.payload = imageData;
return msg;
Example 4: Promise Mode for High Throughput
// Function node: Send batch with promises
msg.payload = imageArray;
return msg;
Inferencer (Promise enabled)
// Promise Reader node resolves all
msg.results = await Promise.all(msg.promises);
return msg;
Example 5: Class Filtering
Inferencer Classes config:
person → person
car → vehicle
truck → vehicle
bicycle →
motorcycle →
Result: Only "person" and "vehicle" classes in output, bicycle/motorcycle filtered.
Example 6: Multi-Model Pipeline
[Camera] → [Inferencer: Detection] → [Function: Filter] → [Inferencer: Classification] → [Output]
Detection step:
msg.detections = msg.payload;
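// cropDetections below is a hypothetical helper, not something the node provides.
// A minimal sketch, assuming msg.image still holds the original Rosepetal RGB bitmap
// and each detection carries a box.xyxy array in pixel coordinates:
const src = msg.image;          // { width, height, data, colorSpace: "RGB" }
const channels = 3;
function cropDetections(detections) {
    return detections.map(det => {
        const [x1, y1, x2, y2] = det.box.xyxy.map(Math.round);
        const w = x2 - x1, h = y2 - y1;
        const out = Buffer.alloc(w * h * channels);
        for (let row = 0; row < h; row++) {
            const srcStart = ((y1 + row) * src.width + x1) * channels;
            src.data.copy(out, row * w * channels, srcStart, srcStart + w * channels);
        }
        return { width: w, height: h, data: out, colorSpace: "RGB" };
    });
}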
msg.payload = cropDetections(msg.payload); // Extract regions
return msg;
Classification step:
// Classify each detected region
msg.classifications = msg.payload;
return msg;
Performance Optimization
Concurrency Settings
Single server, low concurrency:
- Servers: 1
- Max concurrent: 5
- Use case: Low memory systems, sporadic requests
Multiple servers, high concurrency:
- Servers: 3
- Max concurrent: 10
- Use case: High-throughput production, powerful hardware
Batch Processing
Optimal batch size:
- Small models (YOLO-nano): 10-20 images
- Medium models (YOLO-v8): 5-10 images
- Large models (Segmentation): 2-5 images
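To stay within these batch sizes, a Function node can split a large image array into fixed-size chunks and send each chunk as its own message. A minimal sketch (the batchSize value of 10 is illustrative; pick a value from the guidance above):
// Split a large image array into batches and emit each batch as a separate message
const batchSize = 10;
const images = msg.payload;
const messages = [];

for (let i = 0; i < images.length; i += batchSize) {
    messages.push({ ...msg, payload: images.slice(i, i + batchSize) });
}
return [messages];   // array of messages on a single output: sent in sequence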
Image Preprocessing
Before sending to inferencer:
- Resize to model's expected dimensions
- Convert color space if needed
- Compress if bandwidth limited
// Resize before inference (sharp must be made available to Function nodes,
// e.g. via functionExternalModules or functionGlobalContext)
const sharp = require('sharp');
// Assumes msg.payload.data holds an encoded image (JPEG/PNG); for raw pixel
// data, pass the raw input options to sharp() instead
msg.payload.data = await sharp(msg.payload.data)
    .resize(640, 640)
    .raw()
    .toBuffer();
msg.payload.width = 640;
msg.payload.height = 640;
Model Warmup
Enable auto-warmup for:
- Production environments
- Time-sensitive applications
- Models with slow cold-start
Disable warmup for:
- Development/testing
- Dynamic model switching
- Memory-constrained systems
Docker Container Management
Container Lifecycle
- Startup: Node creates Docker containers
- Ready: Containers accept requests
- Running: Process inference requests
- Shutdown: Clean container removal on deploy
Container Ports
Automatically assigned from available ports. No manual configuration needed.
Container Logs
# View container logs
docker logs <container-id>
# Find inferencer containers
docker ps | grep rosepetal-serving
Memory Usage
Per container (approximate):
- YOLO-nano: 500MB-1GB
- YOLO-v8: 2GB-4GB
- Segmentation: 4GB-8GB
- PaddlePaddle OCR: 2GB-3GB
Multiple servers multiply memory usage.
Model Management
Model Directory Structure
/opt/storage/models/
├── yolo-v8-detection/
│ ├── model.pt
│ └── config.json
├── product-classifier/
│ ├── model.onnx
│ └── config.json
└── segmentation-model/
├── model.pt
└── config.json
Adding Models Manually
- Create directory: /opt/storage/models/<model-name>/
- Copy model file: model.pt, model.onnx, etc.
- Create config.json (optional, for auto-detection)
- Refresh Node-RED editor
Firebase Model Download
Requirements:
- Firebase config node connected
- Model exists in Firebase storage
- Dynamic mode enabled
Behavior:
- Checks local storage first
- Downloads if missing
- Caches for future use
- Shows download progress in logs
Error Handling
Common Errors
"Docker image not available"
- Cause: Inference Docker image not pulled
- Solution: docker pull <image-name>
"Model not found"
- Cause: Model doesn't exist in /opt/storage/models
- Solution: Verify model name, check directory, download if needed
"Container failed to start"
- Cause: Port conflict, insufficient memory, corrupted model
- Solution: Check Docker logs, verify system resources
"gRPC connection failed"
- Cause: Container not ready, network issue
- Solution: Wait for container startup, check Docker status
"Invalid image format"
- Cause: Missing required fields, wrong data type
- Solution: Validate image object structure
"CUDA out of memory"
- Cause: Batch too large, concurrent requests exceeded capacity
- Solution: Reduce batch size, lower concurrency, use CPU
Debugging
Enable verbose logging:
{
"common": {
"verbose": true
}
}
Check container logs:
docker logs <container-id>
Enable debug visualization:
- Check "Show debug image"
- Set debug interval: 1
- Verify images display correctly
Promise Mode
Overview
Promise mode enables asynchronous batch processing:
- Send multiple images without waiting
- Process results when ready
- Higher throughput for large batches
Configuration
Inferencer:
- Enable "Promise" checkbox
- Set "Promises field":
promises
Flow:
[Function: Batch] → [Inferencer: Promise] → [Promise Reader] → [Function: Process]
Usage
Send batch:
msg.payload = arrayOf100Images;
return msg;
Inferencer outputs:
msg.promises = [Promise, Promise, ...]; // 100 promises
Promise Reader resolves:
msg.results = [...]; // 100 resolved results
Benefits
- Non-blocking operation
- Better resource utilization
- Simplified batch handling
- Automatic parallelization
Best Practices
Model Selection
- Match task to model: Detection vs Classification vs Segmentation
- Consider speed/accuracy tradeoff: nano (fast) vs large (accurate)
- Test on representative data: Validate before production
- Version models: Track changes, enable rollback
Configuration
- Use JSON config for repeatability: Save configurations
- Document class mappings: Clear naming conventions
- Set appropriate thresholds: Balance false positives/negatives
- Enable warmup in production: Eliminate first-request latency
Performance
- Right-size concurrency: Match hardware capabilities
- Use batch processing: Process multiple images together
- Monitor memory usage: Prevent OOM errors
- Profile inference times: Identify bottlenecks
Maintenance
- Clean up unused containers: docker system prune
- Monitor disk space: Models can be large
- Update Docker images: Stay current with improvements
- Log performance metrics: Track degradation over time
Troubleshooting
Slow Inference
Possible causes:
- Model too large for hardware
- CPU inference on GPU model
- High concurrent load
- Network bottleneck (if remote storage)
Solutions:
- Use smaller model variant
- Enable GPU if available
- Reduce concurrency
- Cache models locally
High Memory Usage
Possible causes:
- Too many servers
- Large batch sizes
- Memory leak in model
Solutions:
- Reduce number of servers
- Process smaller batches
- Restart containers periodically
- Update to latest image
Inconsistent Results
Possible causes:
- Wrong color space
- Incorrect image dimensions
- Threshold too sensitive
Solutions:
- Validate input format
- Check preprocessing
- Adjust confidence thresholds
- Enable debug visualization
See Also
- OCR Inferencer Node - Specialized OCR inference
- Promise Reader Node - Resolve async promises
- Dataset Upload Node - Upload training data
- Firebase Config Node - Configure Firebase access
- Vision Platform Overview - Complete platform documentation