Dataset Operations

This section covers the dataset operations available in the Rosepetal API.

Overview

Datasets contain collections of labeled images used for computer vision training and analysis. The API provides endpoints to retrieve dataset information, generate training files, and download data packages.

Dataset Types

The API supports four main dataset types:

MULTICLASS: Classification with a single label per image
MULTILABEL: Segmentation with multiple labels per image
ANOMALY: Anomaly detection datasets
imageObjectDetection: Object detection with bounding boxes

Endpoints

Generate CSV Training File

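Generate a CSV file of the dataset's images and labels for model training. The file is written to Cloud Storage, and the response returns its `gs://` URI in `result.training.dataset`.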

GET /dataset/{dataset_id}/csv

Parameters:

Parameter	Type	Description
`dataset_id`	string	Unique dataset identifier (path param)
`test`	number (optional)	Percentage of data for testing (query param)
`validation`	number (optional)	Percentage of data for validation (query param)
`tagmap`	string (optional)	Tag mapping configuration (query param)

Example Request:

GET /dataset/my-dataset-123/csv?test=20&validation=10

Response:

{
  "status": "success",
  "result": {
    "error": false,
    "training": {
      "name": "my-dataset-123",
      "dataset": "gs://project-bucket/model-config/my-dataset-123.csv"
    }
  }
}
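The request and response above can be wrapped in a small client. This is a minimal sketch using only the Python standard library. It makes several assumptions: the base URL `https://api.example.com` is a placeholder, authentication is omitted because this section does not describe it, and the helper names (`build_csv_url`, `parse_csv_response`, `generate_csv`) are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Placeholder host (assumption); replace with your Rosepetal API base URL.
BASE_URL = "https://api.example.com"


def build_csv_url(base_url, dataset_id, test=None, validation=None, tagmap=None):
    """Build GET /dataset/{dataset_id}/csv, including only the query params that are set."""
    params = {}
    if test is not None:
        params["test"] = test
    if validation is not None:
        params["validation"] = validation
    if tagmap is not None:
        params["tagmap"] = tagmap
    # Percent-encode the ID so it cannot break out of its path segment.
    path = f"/dataset/{urllib.parse.quote(dataset_id, safe='')}/csv"
    query = urllib.parse.urlencode(params)
    return f"{base_url}{path}?{query}" if query else f"{base_url}{path}"


def parse_csv_response(body):
    """Return (name, gs:// URI) from a response body, raising on any error flag."""
    if body.get("error") or body.get("status") != "success":
        raise RuntimeError(f"CSV generation failed: {body.get('result')}")
    result = body["result"]
    if result.get("error"):
        raise RuntimeError(f"CSV generation failed: {result}")
    training = result["training"]
    return training["name"], training["dataset"]


def generate_csv(base_url, dataset_id, **params):
    """Call the endpoint (no auth headers) and return the generated file's name and URI."""
    with urllib.request.urlopen(build_csv_url(base_url, dataset_id, **params)) as resp:
        return parse_csv_response(json.load(resp))
```

For example, `generate_csv(BASE_URL, "my-dataset-123", test=20, validation=10)` issues the request shown above. `urlopen` raises `urllib.error.HTTPError` for non-2xx statuses, so an error body may arrive on that exception rather than through `parse_csv_response`.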

Download Dataset ZIP

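Download a ZIP archive of selected dataset images, organized into folders by label. The request body lists the IDs of the images to include in `images`, along with a `userId`.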

POST /dataset/{dataset_id}/downloadZip

Parameters:

Parameter	Type	Description
`dataset_id`	string	Unique dataset identifier

Request Body:

{
  "images": ["image_id_1", "image_id_2"],
  "userId": "user_123"
}

Response:

Content-Type: application/zip
Content-Disposition: attachment; filename="dataset_2024-01-15.zip"
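A minimal standard-library sketch of this call follows. It sends the JSON body and saves the archive under the filename from `Content-Disposition`. The base URL, the absence of authentication, and the helper names (`build_zip_request`, `filename_from_disposition`, `download_zip`) are assumptions for illustration.

```python
import email.message
import json
import os
import urllib.request


def build_zip_request(base_url, dataset_id, image_ids, user_id):
    """Build (without sending) POST /dataset/{dataset_id}/downloadZip."""
    image_ids = list(image_ids)
    if not image_ids:
        # Fail early instead of waiting for the API's "Images array list is required" error.
        raise ValueError("image_ids must not be empty")
    payload = json.dumps({"images": image_ids, "userId": user_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/dataset/{dataset_id}/downloadZip",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def filename_from_disposition(header, default="dataset.zip"):
    """Read the filename parameter of a Content-Disposition header."""
    msg = email.message.Message()
    msg["Content-Disposition"] = header
    name = msg.get_filename() or default
    # Keep only the final path component so the server cannot choose a directory.
    return os.path.basename(name) or default


def download_zip(base_url, dataset_id, image_ids, user_id, out_dir="."):
    """Send the request and save the archive under the server-suggested filename."""
    request = build_zip_request(base_url, dataset_id, image_ids, user_id)
    with urllib.request.urlopen(request) as resp:
        name = filename_from_disposition(resp.headers.get("Content-Disposition", ""))
        path = os.path.join(out_dir, name)
        with open(path, "wb") as out:
            out.write(resp.read())
    return path
```

The `email.message` module is used only to parse the header, because it already handles quoted parameter values.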

The ZIP file structure varies by dataset type:

MULTICLASS/ANOMALY Structure:

dataset.zip/
├── Label1/
│   ├── image1.jpeg
│   └── image2.jpeg
├── Label2/
│   ├── image3.jpeg
│   └── image4.jpeg
└── Unclassified/
    └── image5.jpeg

MULTILABEL Structure:

dataset.zip/
├── Normal/
│   ├── image1.jpeg
│   └── image2.jpeg
└── Anomaly/
    ├── 0/
    │   └── unlabeled_image.jpeg
    └── DefectType/
        ├── image3.jpeg
        └── Masks/
            └── image3_mask.png
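The two layouts can be indexed from an archive's entry names. This sketch rests on three assumptions taken from the trees above: entries are stored relative to the archive root (`dataset.zip/` denotes the archive itself), masks are named `<image>_mask.png` inside a `Masks/` folder next to their image, and `Anomaly/0/` holds unlabeled anomalies with no mask. `group_by_label` and `pair_masks` are hypothetical helpers.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_label(names):
    """MULTICLASS/ANOMALY: map each top-level folder (label) to its files."""
    groups = defaultdict(list)
    for name in names:
        parts = PurePosixPath(name).parts
        if name.endswith("/") or len(parts) < 2:
            continue  # skip directory entries and files at the archive root
        groups[parts[0]].append(name)
    return dict(groups)


def pair_masks(names):
    """MULTILABEL: map each image under Anomaly/<defect>/ to its mask path, or None."""
    masks, images = {}, []
    for name in names:
        parts = PurePosixPath(name).parts
        if name.endswith("/") or parts[:1] != ("Anomaly",):
            continue
        if len(parts) == 4 and parts[2] == "Masks":
            stem = PurePosixPath(parts[3]).stem
            if stem.endswith("_mask"):
                # Anomaly/<defect>/Masks/<image>_mask.png belongs to Anomaly/<defect>/<image>.*
                masks[(parts[1], stem[: -len("_mask")])] = name
        elif len(parts) == 3:
            images.append(name)
    return {
        image: masks.get((PurePosixPath(image).parts[1], PurePosixPath(image).stem))
        for image in images
    }
```

To use either function, pass it `zipfile.ZipFile(path).namelist()`. Both functions work on names only, so they can be tested without a real archive.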

Get Annotation Crops

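Retrieve the cropped image regions for annotations that carry a given tag. Each crop is returned in `cropUri` as a base64-encoded image data URI (WebP in the example below).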

GET /dataset/{dataset_id}/annotationsCrops/{tag_id}

Parameters:

Parameter	Type	Description
`dataset_id`	string	Unique dataset identifier
`tag_id`	string	Annotation tag identifier

Example Request:

GET /dataset/my-dataset/annotationsCrops/defect-type-1

Response:

[
  {
    "id": "defect-type-1",
    "imageId": "image_123",
    "cropUri": "data:image/webp;base64,UklGRiYAAABXRUJQVlA4...",
    "tagIndex": 0
  },
  {
    "id": "defect-type-1",
    "imageId": "image_456",
    "cropUri": "data:image/webp;base64,UklGRiYAAABXRUJQVlA4...",
    "tagIndex": 1
  }
]
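Because each `cropUri` is a data URI, the image bytes can be recovered with the standard `base64` module. This is a minimal sketch. The output naming scheme `<imageId>_<tagIndex>.<ext>` and the helper names `decode_crop` and `save_crops` are assumptions for illustration.

```python
import base64
import re
from pathlib import Path

# Matches "data:<type>/<subtype>;base64,<payload>" as used by cropUri.
_DATA_URI = re.compile(r"^data:(?P<mime>[\w.+-]+/[\w.+-]+);base64,(?P<data>.*)$", re.DOTALL)


def decode_crop(crop_uri):
    """Split a base64 data URI into (mime_type, raw_bytes)."""
    match = _DATA_URI.match(crop_uri)
    if match is None:
        raise ValueError("cropUri is not a base64 data URI")
    return match["mime"], base64.b64decode(match["data"])


def save_crops(crops, directory="crops"):
    """Write each crop to <imageId>_<tagIndex>.<ext> and return the written paths."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for crop in crops:
        mime, data = decode_crop(crop["cropUri"])
        ext = mime.split("/", 1)[1]  # "image/webp" -> "webp"
        path = out_dir / f"{crop['imageId']}_{crop['tagIndex']}.{ext}"
        path.write_bytes(data)
        paths.append(path)
    return paths
```

`save_crops` takes the parsed JSON array returned by this endpoint.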

Refresh Dataset Counters

Recalculate image and annotation counters for a dataset.

POST /dataset/{dataset_id}/refreshCounters

Parameters:

Parameter	Type	Description
`dataset_id`	string	Unique dataset identifier

Response:

{
  "error": false,
  "status": "success"
}
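A minimal sketch of triggering a recount with the standard library follows. It assumes that no request body is required, since none is documented, and it omits authentication. The base URL and the helper names are hypothetical.

```python
import json
import urllib.request


def build_refresh_request(base_url, dataset_id):
    """Build (without sending) POST /dataset/{dataset_id}/refreshCounters; no body is sent."""
    return urllib.request.Request(
        f"{base_url}/dataset/{dataset_id}/refreshCounters", method="POST"
    )


def refresh_counters(base_url, dataset_id):
    """Trigger a recount and return the parsed body, raising if it reports an error."""
    with urllib.request.urlopen(build_refresh_request(base_url, dataset_id)) as resp:
        body = json.load(resp)
    if body.get("error") or body.get("status") != "success":
        raise RuntimeError(f"refreshCounters failed: {body}")
    return body
```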

Dataset Metadata Structure

When working with datasets, you'll encounter these key properties:

Field	Type	Description
`type`	string	Dataset type (MULTICLASS, MULTILABEL, etc.)
`name`	string	Dataset display name
`tags`	array	Available annotation tags
`imageCounter`	number	Total number of images
`createdAt`	timestamp	Dataset creation time

Tag Properties

Dataset tags contain the following information:

Field	Type	Description
`id`	string	Unique tag identifier
`name`	string	Tag display name
`color`	string	Color code for visualization
`imageCounter`	number	Images containing this tag
`annotationCounter`	number	Total annotations with this tag
`unclassified`	boolean	Whether tag represents unclassified data
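The two tables above map naturally onto typed records. This sketch assumes the metadata arrives as a JSON object using the documented field names. This section does not show which endpoint returns it, and the timestamp format of `createdAt` is unspecified, so that value is kept as-is. The class names `Dataset` and `Tag` and the helper `labelled_tags` are illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Tag:
    id: str
    name: str
    color: str = ""
    image_counter: int = 0        # imageCounter: images containing this tag
    annotation_counter: int = 0   # annotationCounter: annotations with this tag
    unclassified: bool = False

    @classmethod
    def from_dict(cls, d):
        return cls(
            id=d["id"],
            name=d["name"],
            color=d.get("color", ""),
            image_counter=d.get("imageCounter", 0),
            annotation_counter=d.get("annotationCounter", 0),
            unclassified=d.get("unclassified", False),
        )


@dataclass
class Dataset:
    type: str                     # e.g. "MULTICLASS", "MULTILABEL", "ANOMALY"
    name: str
    tags: List[Tag] = field(default_factory=list)
    image_counter: int = 0
    created_at: Optional[Any] = None  # raw createdAt value; format not specified here

    @classmethod
    def from_dict(cls, d):
        return cls(
            type=d["type"],
            name=d["name"],
            tags=[Tag.from_dict(t) for t in d.get("tags", [])],
            image_counter=d.get("imageCounter", 0),
            created_at=d.get("createdAt"),
        )

    def labelled_tags(self):
        """Tags that are not flagged as unclassified."""
        return [t for t in self.tags if not t.unclassified]
```

Missing counters default to 0 so that partially populated metadata still parses.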

Set Types

Each image in a dataset belongs to one of the following sets:

TRAIN: Training data (typically 70-80%)
TEST: Testing data (typically 10-20%)
VALIDATION: Validation data (typically 10-20%)
PREDETERMINED: Default set before splitting
REVIEW: Images requiring manual review
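The TEST and VALIDATION shares presumably correspond to the `test` and `validation` percentages accepted by the CSV endpoint. The sketch below illustrates a percentage-based random split in which TRAIN receives the remainder. It is an illustration, not the server's documented algorithm. `assign_sets` is a hypothetical helper, and PREDETERMINED and REVIEW are not modelled.

```python
import random


def assign_sets(image_ids, test=20, validation=10, seed=0):
    """Randomly assign image IDs to TEST, VALIDATION and TRAIN by percentage.

    Illustration only: TRAIN receives whatever remains after TEST and VALIDATION.
    """
    if test < 0 or validation < 0 or test + validation > 100:
        raise ValueError("test and validation must be non-negative and sum to at most 100")
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # fixed seed -> reproducible split
    n_test = int(len(ids) * test / 100)
    n_val = int(len(ids) * validation / 100)
    sets = {}
    for position, image_id in enumerate(ids):
        if position < n_test:
            sets[image_id] = "TEST"
        elif position < n_test + n_val:
            sets[image_id] = "VALIDATION"
        else:
            sets[image_id] = "TRAIN"
    return sets
```

Rounding down the TEST and VALIDATION counts guarantees they never exceed the number of images, at the cost of TRAIN receiving any leftover images.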

Error Handling

Failed requests return `"error": true` with a human-readable message in `result`. Common error responses:

Dataset Not Found

{
  "error": true,
  "result": "Dataset with ID \"dataset-123\" not found"
}

Invalid Parameters

{
  "error": true,
  "result": "Images array list is required"
}

Processing Error

{
  "error": true,
  "result": "Error generating CSV: insufficient data"
}
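Since the error bodies share one shape, a client can check them in one place. This minimal sketch also covers the nested `result.error` flag used by the CSV endpoint's success response. The exception class `RosepetalAPIError` and the function `check_response` are hypothetical names, not part of the API.

```python
class RosepetalAPIError(Exception):
    """Raised when a response body reports an error."""


def check_response(body):
    """Return the body unchanged, or raise with the API's error message."""
    if not isinstance(body, dict):
        return body  # e.g. the JSON array returned by annotationsCrops
    if body.get("error"):
        # Top-level envelope: {"error": true, "result": "<message>"}
        raise RosepetalAPIError(str(body.get("result", "unknown error")))
    result = body.get("result")
    if isinstance(result, dict) and result.get("error"):
        # Nested flag, as in the CSV endpoint's {"result": {"error": ...}}
        raise RosepetalAPIError(str(result))
    return body
```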
