
From Dataset Node

Overview

The From Dataset node streams images from Firebase datasets at a controlled rate. It's designed for processing large datasets without overwhelming the system, featuring intelligent buffering, automatic download management, and flexible filtering options.

Key Features

  • Controlled streaming: Set interval between images
  • Intelligent buffering: Pre-downloads images for smooth delivery
  • Tag filtering: Process only specific labels
  • Set filtering: Filter by TRAIN/VALID/TEST sets
  • Loop mode: Continuous streaming or one-time processing
  • Pause/Resume: Control stream mid-process
  • Multiple color spaces: RGB, BGR, RGBA, BGRA
  • Progress tracking: Monitor processing status
  • BMP decoding: Native BMP file support

Configuration

Properties

Firebase Config

  • Type: Node reference
  • Required: Yes
  • Description: Firebase configuration node

Dataset

  • Type: Dropdown
  • Required: Yes
  • Description: Dataset to stream from

Interval (ms)

  • Type: Number
  • Default: 5000
  • Description: Milliseconds between images

Loop

  • Type: Checkbox
  • Default: false
  • Description: Restart from beginning after completion

Color Space

  • Type: Select
  • Options: RGB, BGR, RGBA, BGRA
  • Default: RGB
  • Description: Output image color format

Tag Filter

  • Type: Multi-select
  • Optional: Yes
  • Description: Only process images with selected tags

Set Filter

  • Type: Multi-select
  • Options: TRAIN, VALID, TEST, PREDETERMINED
  • Optional: Yes
  • Description: Only process images from selected sets

Input

Control Messages

Start Streaming

javascript
msg.payload = { action: "start" };

Pause Streaming

javascript
msg.payload = { action: "pause" };

Resume Streaming

javascript
msg.payload = { action: "resume" };

Stop Streaming

javascript
msg.payload = { action: "stop" };
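All four actions share the same message shape, so a small helper (hypothetical, for use inside a Function node) can build them and reject typos early:

```javascript
// Allowed control actions, mirroring the four documented commands.
const ACTIONS = new Set(["start", "pause", "resume", "stop"]);

// Build a control message for the From Dataset node.
function controlMsg(action) {
  if (!ACTIONS.has(action)) {
    throw new Error(`Unknown control action: ${action}`);
  }
  return { payload: { action } };
}
```

For example, `return controlMsg("pause");` from a Function node wired into the From Dataset input.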

Output

Image Message

javascript
{
  payload: {
    width: 1920,
    height: 1080,
    data: Buffer,
    colorSpace: "RGB",
    channels: 3,
    dtype: "uint8"
  },
  imageIndex: 0,
  totalImages: 1000,
  imageId: "img_abc123",
  tags: ["OK", "frontal"],
  setType: "TRAIN",
  progress: {
    current: 1,
    total: 1000,
    percentage: 0.1
  }
}

Progress Fields

  • imageIndex: Current image index (0-based)
  • totalImages: Total images to process
  • progress.current: Images processed so far
  • progress.total: Total images in dataset
  • progress.percentage: Completion percentage
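Since `dtype` is `uint8`, the raw buffer should hold exactly one byte per channel per pixel. A downstream Function node can sanity-check incoming image messages with a helper like this (a sketch, not part of the node itself):

```javascript
// Check that an image payload's buffer length matches its declared
// dimensions: width * height * channels bytes for dtype "uint8".
function isConsistentImage(payload) {
  return payload.data.length === payload.width * payload.height * payload.channels;
}
```

Dropping or logging inconsistent messages before inference avoids hard-to-trace shape errors later in the flow.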

Usage Examples

Example 1: Process All Images

[Inject: Start] → [From Dataset] → [Inferencer] → [To Dataset]

Inject node:

javascript
msg.payload = { action: "start" };
return msg;

Example 2: Filter by Tag

Configuration:

  • Tag Filter: ["DEFECT", "SCRATCH"]
  • Set Filter: ["TRAIN"]

Result: Only training images tagged as DEFECT or SCRATCH
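If you need the same filtering logic downstream (for routing or auditing), it can be mirrored in a Function node. This is a hypothetical helper, and treating an empty filter as "match all" is an assumption about the node's behavior:

```javascript
// Re-apply tag/set filtering to an image message: it passes if it
// carries at least one selected tag AND belongs to a selected set.
// An empty filter array is assumed to mean "no restriction".
function matchesFilters(msg, tagFilter, setFilter) {
  const tagOk = tagFilter.length === 0 || msg.tags.some(t => tagFilter.includes(t));
  const setOk = setFilter.length === 0 || setFilter.includes(msg.setType);
  return tagOk && setOk;
}
```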

Example 3: Continuous Loop

Configuration:

  • Loop: Enabled
  • Interval: 1000ms

Behavior: Cycles through dataset repeatedly
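Since one image is emitted per interval, a full pass over the dataset takes roughly `totalImages × interval` (ignoring download stalls):

```javascript
// Rough duration of one full pass in loop mode:
// one image emitted per interval.
function cycleDurationMs(totalImages, intervalMs) {
  return totalImages * intervalMs;
}
```

For example, 1000 images at a 1000 ms interval is about 1,000,000 ms (~16.7 minutes) per cycle.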

Example 4: Progress Monitoring

javascript
// Function node after From Dataset
const pct = msg.progress.percentage.toFixed(1);
node.status({ fill: "blue", shape: "dot", text: `${pct}%` });

if (msg.progress.current === msg.progress.total) {
  msg.payload = "Processing complete!";
  return msg;
}
return null; // drop intermediate messages (or `return msg` to pass images through)

Example 5: Batch Processing with Pause

javascript
// Function node with two outputs: pause after every 100 images.
// Output 1 → downstream processing; output 2 → From Dataset input.
if (msg.imageIndex % 100 === 0 && msg.imageIndex > 0) {
  const pauseMsg = { payload: { action: "pause" } };
  node.send([msg, pauseMsg]);

  // Resume after 5 seconds
  setTimeout(() => {
    const resumeMsg = { payload: { action: "resume" } };
    node.send([null, resumeMsg]);
  }, 5000);
  return null; // message already forwarded via node.send
}
return msg;

Performance Considerations

Buffer Size

The node automatically calculates optimal buffer size based on:

  • Download speed
  • Interval time
  • Target: keep enough images pre-downloaded to cover 20% of the interval time

Example:

  • Interval: 5000ms
  • Download time: 1000ms
  • Buffer size: ceil(1000 / (5000 * 0.2)) = 1 image
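The worked example above can be reproduced directly from the stated formula (the node's internal algorithm may differ in detail):

```javascript
// Buffer size from the documented formula: enough pre-downloaded
// images to cover 20% of the interval time.
function optimalBufferSize(downloadMs, intervalMs) {
  return Math.ceil(downloadMs / (intervalMs * 0.2));
}
```

With a 5000 ms interval and 1000 ms download time this yields 1; slower downloads (e.g. 3000 ms) raise the buffer to 3.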

Memory Usage

Per buffered image: ~5-10 MB (typical)

  • 1920x1080 RGB: ~6 MB
  • Buffer of 5 images: ~30 MB
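The figures above follow from one byte per channel per pixel for `uint8` data:

```javascript
// Raw memory per buffered image: one byte per channel per pixel.
function imageBytes(width, height, channels) {
  return width * height * channels;
}
```

`imageBytes(1920, 1080, 3)` is 6,220,800 bytes (~6 MB), so a buffer of five frames is roughly 30 MB, matching the estimates above.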

Network Usage

Optimization:

  • Pre-downloads images before they are needed
  • Downloads each image only once (no re-fetching)
  • All downloads count against your Firebase Storage bandwidth quota

Best Practices

  1. Set appropriate intervals: Match to processing speed
  2. Use tag filters: Reduce unnecessary downloads
  3. Monitor progress: Track completion percentage
  4. Handle completion: Check when processing finishes
  5. Memory management: Avoid letting unprocessed messages queue up downstream
  6. Error handling: Handle download failures gracefully

Troubleshooting

Slow streaming

Causes:

  • Slow network connection
  • Large images
  • Interval too short

Solutions:

  • Increase interval
  • Check network speed
  • Resize images in Firebase

Missing images

Causes:

  • Tag filter too restrictive
  • Set filter excludes images
  • Dataset empty

Solutions:

  • Check filter settings
  • Verify dataset has images
  • Review tag/set assignments

High memory usage

Causes:

  • Too many images buffered
  • Processing slower than streaming

Solutions:

  • Increase interval
  • Optimize processing
  • Pause during heavy operations

See Also