
Dataset Upload Node

Overview

The Dataset Upload node uploads images from local storage to Firebase datasets. It is designed for batch processing of captured images. Uploads run concurrently, progress is reported per image, and uploaded files are deleted from local storage afterwards.

Key Features

  • Batch image upload: Upload multiple images from configured directories
  • Firebase integration: Direct upload to Firebase Storage and Firestore
  • Concurrent processing: Up to 10 uploads run in parallel
  • Progress tracking: Real-time feedback with images done/left counters
  • Timestamp filtering: Upload only images matching a specific timestamp
  • Automatic cleanup: Deletes source images and thumbnails after upload
  • Duplicate detection: Automatically skips already uploaded images
  • Multi-folder support: Process images from multiple directories
  • Flexible configuration: Load config from message, flow, global, or JSON
  • Interruptible: Stop the upload process mid-operation

Configuration

Properties

Name

  • Type: String
  • Optional: Yes
  • Description: Custom name for the node instance

Firebase Config

  • Type: Node reference
  • Required: Yes
  • Description: Reference to a firebase-config node for authentication and Firebase access

Config

  • Type: TypedInput (msg, flow, global, json)
  • Default: global.configTraining
  • Description: Source location for the training configuration array

Available sources:

  • msg: Get configuration from message property (e.g., msg.uploadConfig)
  • flow: Get configuration from flow context
  • global: Get configuration from global context (default)
  • json: Provide configuration directly as JSON
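The sketch below shows one way the four source types could be resolved into the configuration array. It is a minimal illustration, not the node's code. The `resolveTrainingConfig` helper and the `ctx` object are hypothetical. `ctx` holds `msg`, `flow`, and `global` stand-ins that expose `get(key)`, so the example runs outside Node-RED. Inside a real Node-RED node, this is typically handled by `RED.util.evaluateNodeProperty`.

```javascript
// Hypothetical helper: resolve the "Config" TypedInput into an array.
// ctx.msg is the incoming message; ctx.flow and ctx.global stand in for
// Node-RED context stores and only need a get(key) method.
function resolveTrainingConfig(type, value, ctx) {
  let config;
  switch (type) {
    case "msg":
      // Walk a dotted path such as "uploadConfig" or "payload.config"
      config = value
        .split(".")
        .reduce((obj, key) => (obj == null ? undefined : obj[key]), ctx.msg);
      break;
    case "flow":
      config = ctx.flow.get(value);
      break;
    case "global":
      config = ctx.global.get(value);
      break;
    case "json":
      config = JSON.parse(value);
      break;
    default:
      throw new Error(`Unsupported config type: ${type}`);
  }
  if (!Array.isArray(config)) {
    throw new Error(`Training configuration (${type}) is not an array`);
  }
  return config;
}

// Usage: a Map provides get(key), so it can stand in for the global context
const globalStore = new Map([
  ["configTraining", [{ dataset: "my-dataset", folder: "/opt/images", sourceName: "cam1" }]],
]);
const config = resolveTrainingConfig("global", "configTraining", {
  msg: {},
  flow: new Map(),
  global: globalStore,
});
console.log(config[0].dataset); // prints my-dataset
```

Throwing when the resolved value is not an array surfaces a missing or mistyped context key immediately, instead of failing later during file discovery.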

Training Configuration

The node requires a configuration array that maps folders and source names to target datasets.

Configuration Format

javascript
[
  {
    "dataset": "rp-class-test",
    "folder": "/opt/storage/images/camera1",
    "sourceName": "raw",
    "suffix": "a"
  },
  {
    "dataset": "rp-detect-parts",
    "folder": "/opt/storage/images/camera2",
    "sourceName": "processed",
    "suffix": "b"
  }
]

Configuration Fields

dataset

  • Type: String
  • Required: Yes
  • Description: Target dataset ID in Firebase where images will be uploaded

folder

  • Type: String
  • Required: Yes
  • Description: Path to the directory containing images, treated as an absolute path
  • Notes:
    • A missing leading slash is added (e.g., opt/storage becomes /opt/storage)
    • Path is automatically normalized
    • Must exist and be readable
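The folder rules above can be sketched as follows, assuming POSIX paths. Only the added leading slash is stated on this page. Collapsing duplicate slashes and dropping a trailing slash are assumptions, and `normalizeFolder` and `assertReadableDir` are hypothetical names.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical: prepend a missing leading slash and tidy the path.
// "opt/storage" -> "/opt/storage", "/opt//storage/" -> "/opt/storage"
function normalizeFolder(folder) {
  const withSlash = folder.startsWith("/") ? folder : "/" + folder;
  const normalized = path.posix.normalize(withSlash);
  // Drop a trailing slash, but keep the root "/" intact
  return normalized.length > 1 && normalized.endsWith("/")
    ? normalized.slice(0, -1)
    : normalized;
}

// Hypothetical: throw if the folder is missing, unreadable, or not a directory
function assertReadableDir(folder) {
  fs.accessSync(folder, fs.constants.R_OK); // throws if missing or unreadable
  if (!fs.statSync(folder).isDirectory()) {
    throw new Error(`${folder} is not a directory`);
  }
}
```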

sourceName

  • Type: String
  • Required: Yes
  • Description: Identifier, taken from each filename, that matches an image to a config entry
  • Format: The second-to-last underscore-separated segment of the filename
  • Example: In 1234567890_camera1_a.jpg, the sourceName is camera1

suffix

  • Type: String
  • Optional: Yes
  • Description: Validation suffix (currently not enforced but preserved for future use)
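The field rules above can be checked before an upload starts. This is a minimal sketch using a hypothetical `validateTrainingConfig` helper that collects problems instead of throwing. The node itself may validate differently.

```javascript
// Hypothetical validator for the training configuration array.
// Returns a list of problems; an empty list means the config looks valid.
function validateTrainingConfig(config) {
  if (!Array.isArray(config)) return ["Configuration must be an array"];
  const errors = [];
  config.forEach((entry, i) => {
    // dataset, folder and sourceName are required non-empty strings
    for (const field of ["dataset", "folder", "sourceName"]) {
      const value = entry?.[field];
      if (typeof value !== "string" || value.trim() === "") {
        errors.push(`Entry ${i}: "${field}" must be a non-empty string`);
      }
    }
    // suffix is optional, but should be a string when present
    if (entry?.suffix !== undefined && typeof entry.suffix !== "string") {
      errors.push(`Entry ${i}: "suffix" must be a string when provided`);
    }
  });
  return errors;
}

console.log(validateTrainingConfig([{ dataset: "my-dataset", folder: "" }]));
// two problems: "folder" is empty and "sourceName" is missing
```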

Configuration Sources

Example: Global context

javascript
// In a function node before upload
global.set("configTraining", [
  {
    dataset: "my-dataset",
    folder: "/opt/images",
    sourceName: "cam1"
  }
]);

Example: Message property

javascript
// Configure node: Config = msg.uploadConfig
msg.uploadConfig = [
  {
    dataset: "my-dataset",
    folder: "/opt/images",
    sourceName: "cam1"
  }
];
return msg;

Example: JSON directly

json
[{"dataset":"my-dataset","folder":"/opt/images","sourceName":"cam1"}]

Filename Format

Images must follow a specific naming convention for proper processing:

Format

{timestamp}_{sourceName}_{suffix}.{extension}

Components

  • timestamp: Unix timestamp in milliseconds
  • sourceName: Identifier matching configuration entry
  • suffix: Optional identifier (e.g., camera position)
  • extension: Image file extension (jpg, png, etc.)

Examples

1759745408220_camera1_a.jpg
1759745408221_raw_b.png
1759745408222_processed_x.jpg

Parsing Logic

The node takes the sourceName from the second-to-last underscore-separated segment:

Filename: 1759745408220_camera1_a.jpg
          ^^^^^^^^^^^^^ ^^^^^^^ ^
          timestamp     source  suffix
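The parsing rule can be sketched as shown below, assuming the underscore-separated layout above. `parseImageFilename` is a hypothetical helper. It returns `null` for names with fewer than three segments. If the suffix were left out, the second-to-last segment would be the timestamp rather than the sourceName.

```javascript
// Hypothetical parser for {timestamp}_{sourceName}_{suffix}.{extension}.
// sourceName is the second-to-last underscore-separated segment.
function parseImageFilename(filename) {
  const dot = filename.lastIndexOf(".");
  const base = dot > 0 ? filename.slice(0, dot) : filename;
  const extension = dot > 0 ? filename.slice(dot + 1) : "";
  const segments = base.split("_");
  if (segments.length < 3) return null; // does not follow the convention
  return {
    timestamp: segments[0],
    sourceName: segments[segments.length - 2],
    suffix: segments[segments.length - 1],
    extension,
  };
}

console.log(parseImageFilename("1759745408220_camera1_a.jpg"));
// -> timestamp "1759745408220", sourceName "camera1", suffix "a", extension "jpg"
```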

Input

The node accepts messages with different payload values to control operation:

Start Upload (All Images)

javascript
msg.payload = true;

Uploads all images from configured folders.

Start Upload (Filtered by Timestamp)

javascript
msg.payload = "1759745408220";

Uploads only images whose filenames contain the specified timestamp.

Use case: Upload images from a specific capture session.

Stop Upload

javascript
msg.payload = false;

Gracefully stops the current upload process.

Behavior:

  • Completes current uploads
  • Cancels pending uploads
  • Outputs messages for completed images only

Output

The node sends one message per processed image with detailed progress information.

Output Message Format

javascript
{
  payload: "1759745408220_camera1_a.jpg",  // Filename
  done: true,                               // Success status
  message: "",                              // Error message or status
  imagesDone: 5,                            // Images processed so far
  imagesLeft: 15                            // Images remaining
}
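The counters in this message are related by `imagesLeft = total - imagesDone`. This is a sketch of a builder, assuming a hypothetical `buildProgressMsg` helper and a known total image count.

```javascript
// Hypothetical builder for one progress message per processed image
function buildProgressMsg(filename, ok, message, imagesDone, total) {
  return {
    payload: filename,       // image filename
    done: ok,                // false only when the upload failed
    message: message || "",  // "" on success, otherwise a status or error text
    imagesDone,              // images processed so far, including this one
    imagesLeft: Math.max(total - imagesDone, 0),
  };
}

console.log(buildProgressMsg("1759745408220_camera1_a.jpg", true, "", 5, 20));
// -> payload "1759745408220_camera1_a.jpg", done true, message "", imagesDone 5, imagesLeft 15
```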

Output Fields

payload

  • Type: String
  • Description: Name of the image file being processed

done

  • Type: Boolean
  • Values:
    • true: Upload succeeded
    • false: Upload failed

message

  • Type: String
  • Values:
    • "": Upload succeeded (empty string)
    • "Image already exists": Upload skipped (duplicate)
    • Error message: Upload failed with reason

imagesDone

  • Type: Number
  • Description: Number of images processed so far, including the current one
  • Use: Progress tracking, percentage calculation

imagesLeft

  • Type: Number
  • Description: Remaining images to process (total - imagesDone)
  • Use: Progress bar, ETA calculation
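A downstream Function node could turn these counters into a percentage and a rough ETA. The sketch below is hypothetical, and `progressStats` is not part of the node. It assumes the start time is recorded when the first message arrives, for example with `context.set`. It also assumes the average time per image so far is a fair predictor for the remaining images.

```javascript
// Hypothetical: percentage complete and estimated time remaining
function progressStats(imagesDone, imagesLeft, startedAtMs, nowMs) {
  const total = imagesDone + imagesLeft;
  const percent = total === 0 ? 100 : Math.round((imagesDone / total) * 100);
  const elapsedMs = nowMs - startedAtMs;
  // Average wall-clock time per finished image (reflects concurrent throughput)
  const msPerImage = imagesDone > 0 ? elapsedMs / imagesDone : 0;
  return { percent, etaMs: Math.round(msPerImage * imagesLeft) };
}

console.log(progressStats(5, 15, 0, 10000));
// -> percent 25, etaMs 30000
```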

Example Output Sequence

javascript
// First image
{ payload: "img1.jpg", done: true, message: "", imagesDone: 1, imagesLeft: 9 }

// Second image (duplicate)
{ payload: "img2.jpg", done: true, message: "Image already exists", imagesDone: 2, imagesLeft: 8 }

// Third image (error)
{ payload: "img3.jpg", done: false, message: "Dataset not found", imagesDone: 3, imagesLeft: 7 }

// ... continues for all images
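A skipped duplicate is reported with `done: true`, so a downstream consumer must check both `done` and `message` to tell the three outcomes apart. This sketch uses hypothetical `classifyResult` and `summarize` helpers and applies them to the example sequence above.

```javascript
// Hypothetical: map one output message to an outcome
function classifyResult(msg) {
  if (!msg.done) return "failed";
  if (msg.message === "Image already exists") return "skipped";
  return "uploaded";
}

// Hypothetical: count outcomes over a run
function summarize(messages) {
  const counts = { uploaded: 0, skipped: 0, failed: 0 };
  for (const m of messages) counts[classifyResult(m)] += 1;
  return counts;
}

const sample = [
  { payload: "img1.jpg", done: true, message: "", imagesDone: 1, imagesLeft: 9 },
  { payload: "img2.jpg", done: true, message: "Image already exists", imagesDone: 2, imagesLeft: 8 },
  { payload: "img3.jpg", done: false, message: "Dataset not found", imagesDone: 3, imagesLeft: 7 },
];
console.log(summarize(sample)); // { uploaded: 1, skipped: 1, failed: 1 }
```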

Processing Flow

1. Initialization

  • Validates Firebase config node is connected and ready
  • Loads training configuration from specified source
  • Normalizes folder paths

2. File Discovery

  • Reads all files from each configured folder
  • Skips the thumbnails subfolder
  • Filters by timestamp if provided
  • Validates files are actually files (not directories)
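The discovery step for one folder can be sketched as follows. This sketch assumes the thumbnails subfolder is literally named `thumbnails`. It also assumes the timestamp filter is a plain substring match, as described under Input. `discoverImages` is hypothetical.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical: list candidate image paths in one configured folder
async function discoverImages(folder, timestamp) {
  const names = await fs.promises.readdir(folder);
  const files = [];
  for (const name of names) {
    if (name === "thumbnails") continue; // assumed thumbnail folder name
    if (timestamp && !name.includes(String(timestamp))) continue; // optional filter
    const fullPath = path.join(folder, name);
    const stat = await fs.promises.stat(fullPath);
    if (stat.isFile()) files.push(fullPath); // skip any other directories
  }
  return files;
}
```

`fs.promises.readdir` does not guarantee any particular order. If upload order matters, the result should be sorted.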

3. Concurrent Upload

  • Processes up to 10 images concurrently
  • For each image:
    • Extracts sourceName from filename
    • Matches to configuration entry
    • Reads image file as buffer
    • Uploads to Firebase Storage
    • Creates/updates Firestore document
    • Deletes local image and thumbnail
    • Sends output message with progress
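The upload loop above, together with the Stop Upload behavior, can be sketched as a small concurrency-limited runner. This is a minimal sketch under assumptions, not the node's implementation. `createUploadRunner` is hypothetical. `worker` is a placeholder for the per-image steps (read, upload to Storage, write to Firestore, delete local files). Those steps are not shown because their exact Firebase calls are not documented here. A worker may return a status string such as "Image already exists". A thrown error becomes `done: false`.

```javascript
// Hypothetical concurrency-limited runner for the upload loop.
// items:  images to process
// worker: async function(item) doing the per-image steps; it may return a
//         status string (e.g. "Image already exists") or throw on failure
// limit:  maximum number of images in flight at once
function createUploadRunner(items, worker, limit = 10) {
  let stopped = false;
  let next = 0;
  let imagesDone = 0;

  // One "lane" keeps pulling items until the queue is empty or stop() is called
  async function lane(onProgress) {
    while (!stopped && next < items.length) {
      const item = items[next++]; // claimed synchronously, so no item runs twice
      let ok = true;
      let message = "";
      try {
        message = (await worker(item)) || "";
      } catch (err) {
        ok = false;
        message = err instanceof Error ? err.message : String(err);
      }
      imagesDone += 1;
      onProgress({
        payload: item,
        done: ok,
        message,
        imagesDone,
        imagesLeft: items.length - imagesDone,
      });
    }
  }

  return {
    // Resolves once every lane has finished or stopped
    run(onProgress) {
      const laneCount = Math.min(limit, items.length);
      return Promise.all(Array.from({ length: laneCount }, () => lane(onProgress)));
    },
    // Pending items are not started; in-flight ones still finish and report
    stop() {
      stopped = true;
    },
  };
}

// Usage sketch inside a node (uploadOne is a hypothetical per-image function):
// const runner = createUploadRunner(files, uploadOne, 10);
// runner.run((progressMsg) => node.send(progressMsg));
// ...and on msg.payload === false: runner.stop();
```

Each lane claims the next index before awaiting, so the lanes never share an image. Setting `stopped` only blocks new claims. Uploads already in flight therefore complete and emit their messages, while pending ones are never started.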

4. Completion

  • Sets global.trainingCount to the number of files remaining
  • Logs completion message
  • Resets upload state

Upload Process Diagram