Dataset Operations

This section covers all read operations available for datasets in the Rosepetal API.

Overview

Datasets contain collections of labeled images used for computer vision training and analysis. The API provides endpoints to retrieve dataset information, generate training files, and download data packages.

Dataset Types

The API supports four dataset types:

  • MULTICLASS: Classification with single labels per image
  • MULTILABEL: Segmentation with multiple labels per image
  • ANOMALY: Anomaly detection datasets
  • imageObjectDetection: Object detection with bounding boxes

Endpoints

Generate CSV Training File

Generate a CSV file for model training with dataset images and labels.

http
GET /dataset/{dataset_id}/csv

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| dataset_id | string | Unique dataset identifier |
| test | number (optional) | Percentage of data for testing (query param) |
| validation | number (optional) | Percentage of data for validation (query param) |
| tagmap | string (optional) | Tag mapping configuration (query param) |

Example Request:

http
GET /dataset/my-dataset-123/csv?test=20&validation=10

Response:

json
{
  "status": "success",
  "result": {
    "error": false,
    "training": {
      "name": "my-dataset-123",
      "dataset": "gs://project-bucket/model-config/my-dataset-123.csv"
    }
  }
}
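
As a minimal sketch of how a client might call this endpoint, the helper below builds the request URL from the documented path and query parameters and reads the CSV location out of a decoded response. The base URL `https://api.example.com` and the function names are assumptions for illustration; they are not part of the Rosepetal API.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL; replace with the address of your own deployment.
BASE_URL = "https://api.example.com"


def build_csv_url(dataset_id, test=None, validation=None, tagmap=None):
    """Build the GET /dataset/{dataset_id}/csv URL, adding only the query params that were set."""
    params = {}
    if test is not None:
        params["test"] = test
    if validation is not None:
        params["validation"] = validation
    if tagmap is not None:
        params["tagmap"] = tagmap
    url = f"{BASE_URL}/dataset/{quote(dataset_id, safe='')}/csv"
    return f"{url}?{urlencode(params)}" if params else url


def csv_location(response_body):
    """Return the storage URI of the generated CSV from a decoded JSON response."""
    return response_body["result"]["training"]["dataset"]
```

For the example request above, `build_csv_url("my-dataset-123", test=20, validation=10)` yields a URL ending in `/dataset/my-dataset-123/csv?test=20&validation=10`.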

Download Dataset ZIP

Download a ZIP file containing dataset images organized by labels.

http
POST /dataset/{dataset_id}/downloadZip

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| dataset_id | string | Unique dataset identifier |

Request Body:

json
{
  "images": ["image_id_1", "image_id_2"],
  "userId": "user_123"
}

Response:

  • Content-Type: application/zip
  • Content-Disposition: attachment; filename="dataset_2024-01-15.zip"
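
The request and response headers above could be handled along these lines. This is a sketch using only the Python standard library; the base URL and the helper names are hypothetical, and authentication (which this section does not describe) is left out.

```python
import json
from email.message import Message
from urllib.request import Request

# Hypothetical base URL; replace with the address of your own deployment.
BASE_URL = "https://api.example.com"


def build_download_zip_request(dataset_id, image_ids, user_id):
    """Prepare the POST /dataset/{dataset_id}/downloadZip request with the documented JSON body."""
    body = json.dumps({"images": list(image_ids), "userId": user_id}).encode("utf-8")
    return Request(
        f"{BASE_URL}/dataset/{dataset_id}/downloadZip",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def filename_from_disposition(header_value):
    """Extract the file name from a Content-Disposition header value."""
    message = Message()
    message["Content-Disposition"] = header_value
    return message.get_filename()
```

The prepared request can be sent with `urllib.request.urlopen(request)`, and the ZIP bytes read from the response with `.read()`.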

The ZIP file structure varies by dataset type:

MULTICLASS/ANOMALY Structure:

dataset.zip/
├── Label1/
│   ├── image1.jpeg
│   └── image2.jpeg
├── Label2/
│   ├── image3.jpeg
│   └── image4.jpeg
└── Unclassified/
    └── image5.jpeg

MULTILABEL Structure:

dataset.zip/
├── Normal/
│   ├── image1.jpeg
│   └── image2.jpeg
└── Anomaly/
    ├── 0/
    │   └── unlabeled_image.jpeg
    └── DefectType/
        ├── image3.jpeg
        └── Masks/
            └── image3_mask.png
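
To illustrate how the MULTICLASS/ANOMALY layout might be consumed, the sketch below maps each label folder to the image files it contains. It assumes that `dataset.zip/` in the tree denotes the archive itself, so entries look like `Label1/image1.jpeg`; deeper paths, such as the MULTILABEL `Anomaly/DefectType/...` entries, are skipped. The function name is hypothetical.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def images_by_label(zip_bytes):
    """Map each top-level folder (label) to the file names directly inside it."""
    groups = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip explicit directory entries
            parts = PurePosixPath(info.filename).parts
            if len(parts) == 2:  # exactly Label/image.ext
                groups[parts[0]].append(parts[1])
    return dict(groups)
```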

Get Annotation Crops

Retrieve cropped regions from images containing specific annotations.

http
GET /dataset/{dataset_id}/annotationsCrops/{tag_id}

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| dataset_id | string | Unique dataset identifier |
| tag_id | string | Annotation tag identifier |

Example Request:

http
GET /dataset/my-dataset/annotationsCrops/defect-type-1

Response:

json
[
  {
    "id": "defect-type-1",
    "imageId": "image_123",
    "cropUri": "...",
    "tagIndex": 0
  },
  {
    "id": "defect-type-1",
    "imageId": "image_456",
    "cropUri": "...",
    "tagIndex": 1
  }
]
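
Because the response is a flat list with one entry per crop, a client may want to regroup it by source image. The following sketch does that for a decoded response; the function name is hypothetical, and the field names are taken from the example above.

```python
from collections import defaultdict


def crops_by_image(crops):
    """Group crop URIs by the image they were cut from, preserving response order."""
    grouped = defaultdict(list)
    for crop in crops:
        grouped[crop["imageId"]].append(crop["cropUri"])
    return dict(grouped)
```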

Refresh Dataset Counters

Recalculate image and annotation counters for a dataset.

http
POST /dataset/{dataset_id}/refreshCounters

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| dataset_id | string | Unique dataset identifier |

Response:

json
{
  "error": false,
  "status": "success"
}

Dataset Metadata Structure

When working with datasets, you'll encounter these key properties:

| Field | Type | Description |
| --- | --- | --- |
| type | string | Dataset type (MULTICLASS, MULTILABEL, etc.) |
| name | string | Dataset display name |
| tags | array | Available annotation tags |
| imageCounter | number | Total number of images |
| createdAt | timestamp | Dataset creation time |

Tag Properties

Dataset tags contain the following information:

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique tag identifier |
| name | string | Tag display name |
| color | string | Color code for visualization |
| imageCounter | number | Images containing this tag |
| annotationCounter | number | Total annotations with this tag |
| unclassified | boolean | Whether tag represents unclassified data |
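
One way to work with these fields in client code is to load each tag into a small typed record. The sketch below is an illustration only: the `Tag` class and `classified_tags` helper are hypothetical names, and the exact shape of tag objects returned by the API is assumed to match the field list above.

```python
from dataclasses import dataclass, fields


@dataclass
class Tag:
    id: str
    name: str
    color: str
    imageCounter: int
    annotationCounter: int
    unclassified: bool

    @classmethod
    def from_dict(cls, raw):
        """Build a Tag from a decoded JSON object, ignoring any extra keys."""
        return cls(**{f.name: raw[f.name] for f in fields(cls)})


def classified_tags(raw_tags):
    """Return only the tags that do not represent unclassified data."""
    tags = [Tag.from_dict(raw) for raw in raw_tags]
    return [tag for tag in tags if not tag.unclassified]
```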

Set Types

Each image in a dataset is assigned to one of the following sets:

  • TRAIN: Training data (typically 70-80%)
  • TEST: Testing data (typically 10-20%)
  • VALIDATION: Validation data (typically 10-20%)
  • PREDETERMINED: Default set before splitting
  • REVIEW: Images requiring manual review
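
The typical percentages above are consistent with the `test` and `validation` query parameters of the CSV endpoint. As a sketch, and assuming (the source does not state this explicitly) that whatever is not assigned to TEST or VALIDATION goes to TRAIN, the training share can be derived as follows. The function name is hypothetical.

```python
def training_share(test=0, validation=0):
    """Return the percentage left for TRAIN after reserving TEST and VALIDATION shares.

    Assumption: the remainder of the dataset is used for training.
    """
    if test < 0 or validation < 0:
        raise ValueError("percentages must not be negative")
    if test + validation >= 100:
        raise ValueError("test and validation must leave some data for training")
    return 100 - test - validation
```

With `test=20` and `validation=10`, as in the CSV example request, this leaves 70% for training.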

Error Handling

Common error responses:

Dataset Not Found

json
{
  "error": true,
  "result": "Dataset with ID \"dataset-123\" not found"
}

Invalid Parameters

json
{
  "error": true,
  "result": "Images array list is required"
}

Processing Error

json
{
  "error": true,
  "result": "Error generating CSV: insufficient data"
}
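
All three error bodies share the same shape: a top-level `error` flag set to `true` and a human-readable `result` string. A client could turn them into exceptions with a helper like the one sketched here. The exception class and function name are hypothetical, and the sketch assumes the flag always appears at the top level of error responses, as in the examples above.

```python
class DatasetApiError(Exception):
    """Raised when a decoded response body carries "error": true."""


def raise_for_error(body):
    """Return the body unchanged, or raise if its top-level error flag is set."""
    if isinstance(body, dict) and body.get("error") is True:
        raise DatasetApiError(str(body.get("result")))
    return body
```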