Dataset Operations
This section covers the dataset operations available in the Rosepetal API.
Overview
Datasets contain collections of labeled images used for computer vision training and analysis. The API provides endpoints to retrieve dataset information, generate training files, and download data packages.
Dataset Types
The API supports four dataset types:
- MULTICLASS: Classification with a single label per image
- MULTILABEL: Segmentation with multiple labels per image
- ANOMALY: Anomaly detection datasets
- imageObjectDetection: Object detection with bounding boxes
Endpoints
Generate CSV Training File
Generate a CSV file for model training with dataset images and labels.
GET /dataset/{dataset_id}/csv
Parameters:
Parameter | Type | Description |
---|---|---|
dataset_id | string | Unique dataset identifier |
test | number (optional) | Percentage of data for testing (query param) |
validation | number (optional) | Percentage of data for validation (query param) |
tagmap | string (optional) | Tag mapping configuration (query param) |
Example Request:
GET /dataset/my-dataset-123/csv?test=20&validation=10
Response:
{
  "status": "success",
  "result": {
    "error": false,
    "training": {
      "name": "my-dataset-123",
      "dataset": "gs://project-bucket/model-config/my-dataset-123.csv"
    }
  }
}
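As a minimal sketch of how a client might assemble this request and read the result, the Python below builds the URL with only the query parameters that are set and pulls the CSV location out of a successful response. The host `https://api.example.com` and the helper names `build_csv_url` and `csv_location` are illustrative assumptions, not part of the documented API.

```python
from urllib.parse import quote, urlencode

# Hypothetical host: the source does not state where the API is served.
BASE_URL = "https://api.example.com"


def build_csv_url(dataset_id, test=None, validation=None, tagmap=None, base_url=BASE_URL):
    """Build the GET URL for /dataset/{dataset_id}/csv, omitting unset query params."""
    params = {"test": test, "validation": validation, "tagmap": tagmap}
    query = urlencode({key: value for key, value in params.items() if value is not None})
    url = f"{base_url}/dataset/{quote(dataset_id, safe='')}/csv"
    return f"{url}?{query}" if query else url


def csv_location(payload):
    """Return the storage URI of the generated CSV from a successful response body."""
    return payload["result"]["training"]["dataset"]
```

Keeping URL construction separate from the HTTP call makes it easy to swap in whichever HTTP client the project already uses.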
Download Dataset ZIP
Download a ZIP file containing dataset images organized by labels.
POST /dataset/{dataset_id}/downloadZip
Parameters:
Parameter | Type | Description |
---|---|---|
dataset_id | string | Unique dataset identifier |
Request Body:
{
  "images": ["image_id_1", "image_id_2"],
  "userId": "user_123"
}
Response:
- Content-Type: application/zip
- Content-Disposition: attachment; filename="dataset_2024-01-15.zip"
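The request and response described above could be handled along the following lines. This is a sketch under assumptions: the host is hypothetical, sending the body as JSON with a `Content-Type: application/json` header is assumed rather than documented, and both helper names are invented for illustration. The request is only prepared, not sent.

```python
import json
from email.message import Message
from urllib.request import Request

# Hypothetical host: the source does not state where the API is served.
BASE_URL = "https://api.example.com"


def build_download_request(dataset_id, image_ids, user_id, base_url=BASE_URL):
    """Prepare (but do not send) the POST request for /dataset/{dataset_id}/downloadZip."""
    body = json.dumps({"images": list(image_ids), "userId": user_id}).encode("utf-8")
    return Request(
        f"{base_url}/dataset/{dataset_id}/downloadZip",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed, not documented
        method="POST",
    )


def filename_from_disposition(header_value, default="dataset.zip"):
    """Extract the filename parameter from a Content-Disposition header value."""
    message = Message()
    message["Content-Disposition"] = header_value
    return message.get_filename() or default
```

Passing the prepared request to `urllib.request.urlopen` would perform the download; the response body can then be written to the name returned by `filename_from_disposition`.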
The ZIP file structure varies by dataset type:
MULTICLASS/ANOMALY Structure:
dataset.zip/
├── Label1/
│   ├── image1.jpeg
│   └── image2.jpeg
├── Label2/
│   ├── image3.jpeg
│   └── image4.jpeg
└── Unclassified/
    └── image5.jpeg
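For the MULTICLASS/ANOMALY layout above, a downloaded archive can be turned back into a label-to-images mapping with the standard `zipfile` module. The sketch assumes entries are stored relative to the archive root (so `dataset.zip/` in the tree is the archive itself) and that each image sits exactly one folder deep; `group_images_by_label` is an illustrative name.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def group_images_by_label(zip_bytes):
    """Map each top-level folder (label) to the file names stored directly inside it."""
    groups = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            parts = PurePosixPath(info.filename).parts
            # Keep only entries shaped like <label>/<image>; ignore anything deeper.
            if len(parts) == 2:
                groups[parts[0]].append(parts[1])
    return dict(groups)
```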
MULTILABEL Structure:
dataset.zip/
├── Normal/
│   ├── image1.jpeg
│   └── image2.jpeg
└── Anomaly/
    ├── 0/
    │   └── unlabeled_image.jpeg
    └── DefectType/
        ├── image3.jpeg
        └── Masks/
            └── image3_mask.png
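When consuming the MULTILABEL archive, each annotated image usually needs to be matched with its mask. The sketch below works on a list of archive entry names and makes two assumptions drawn only from the single example above: images live at `Anomaly/<tag>/<image>`, and a mask is stored in that tag's `Masks/` folder as `<image stem>_mask.png`. The function name is illustrative.

```python
from pathlib import PurePosixPath


def pair_images_with_masks(names):
    """Map each image under Anomaly/<tag>/ to its mask path, or None if no mask exists."""
    name_set = set(names)
    pairs = {}
    for name in names:
        path = PurePosixPath(name)
        parts = path.parts
        # Only direct children of Anomaly/<tag>/ are candidate images.
        if len(parts) != 3 or parts[0] != "Anomaly":
            continue
        mask = str(path.parent / "Masks" / f"{path.stem}_mask.png")
        pairs[name] = mask if mask in name_set else None
    return pairs
```

Images without a mask, such as those in the `0/` folder, map to `None`.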
Get Annotation Crops
Retrieve cropped regions from images containing specific annotations.
GET /dataset/{dataset_id}/annotationsCrops/{tag_id}
Parameters:
Parameter | Type | Description |
---|---|---|
dataset_id | string | Unique dataset identifier |
tag_id | string | Annotation tag identifier |
Example Request:
GET /dataset/my-dataset/annotationsCrops/defect-type-1
Response:
[
  {
    "id": "defect-type-1",
    "imageId": "image_123",
    "cropUri": "...",
    "tagIndex": 0
  },
  {
    "id": "defect-type-1",
    "imageId": "image_456",
    "cropUri": "...",
    "tagIndex": 1
  }
]
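Because the response is a flat list with one entry per crop, a client that wants all crops for a given image has to group them itself. A minimal sketch, using only the `imageId` and `cropUri` fields shown above (the function name is an assumption):

```python
from collections import defaultdict


def crops_by_image(crops):
    """Group crop URIs by the image they were cut from, preserving response order."""
    groups = defaultdict(list)
    for crop in crops:
        groups[crop["imageId"]].append(crop["cropUri"])
    return dict(groups)
```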
Refresh Dataset Counters
Recalculate image and annotation counters for a dataset.
POST /dataset/{dataset_id}/refreshCounters
Parameters:
Parameter | Type | Description |
---|---|---|
dataset_id | string | Unique dataset identifier |
Response:
{
  "error": false,
  "status": "success"
}
Dataset Metadata Structure
When working with datasets, you'll encounter these key properties:
Field | Type | Description |
---|---|---|
type | string | Dataset type (MULTICLASS, MULTILABEL, etc.) |
name | string | Dataset display name |
tags | array | Available annotation tags |
imageCounter | number | Total number of images |
createdAt | timestamp | Dataset creation time |
Tag Properties
Dataset tags contain the following information:
Field | Type | Description |
---|---|---|
id | string | Unique tag identifier |
name | string | Tag display name |
color | string | Color code for visualization |
imageCounter | number | Images containing this tag |
annotationCounter | number | Total annotations with this tag |
unclassified | boolean | Whether tag represents unclassified data |
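The dataset and tag fields listed above map naturally onto typed records on the client side. The following sketch uses Python dataclasses; the class names, the default values, and the choice to ignore unknown keys are assumptions made for illustration, and `createdAt` is left untyped because the source does not specify the timestamp format.

```python
from dataclasses import dataclass, field, fields
from typing import Any, List


@dataclass
class Tag:
    id: str
    name: str
    color: str = ""
    imageCounter: int = 0
    annotationCounter: int = 0
    unclassified: bool = False

    @classmethod
    def from_dict(cls, data):
        # Ignore keys that are not documented fields so extra data does not break parsing.
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in data.items() if key in known})


@dataclass
class DatasetMetadata:
    type: str
    name: str
    tags: List[Tag] = field(default_factory=list)
    imageCounter: int = 0
    createdAt: Any = None  # timestamp format is not specified in the source

    @classmethod
    def from_dict(cls, data):
        return cls(
            type=data["type"],
            name=data["name"],
            tags=[Tag.from_dict(tag) for tag in data.get("tags", [])],
            imageCounter=data.get("imageCounter", 0),
            createdAt=data.get("createdAt"),
        )
```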
Set Types
Each image in a dataset is assigned to one of the following sets:
- TRAIN: Training data (typically 70-80%)
- TEST: Testing data (typically 10-20%)
- VALIDATION: Validation data (typically 10-20%)
- PREDETERMINED: Default set before splitting
- REVIEW: Images requiring manual review
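To illustrate how percentage-based splitting relates to these sets, the sketch below computes how many images would land in TRAIN, TEST, and VALIDATION for given `test` and `validation` percentages. It assumes integer inputs and rounds the test and validation counts down, giving the remainder to TRAIN; the API's actual rounding and assignment rules are not documented here and may differ.

```python
def split_counts(total, test_pct=0, validation_pct=0):
    """Return image counts per set for integer percentages; the remainder goes to TRAIN."""
    if test_pct < 0 or validation_pct < 0 or test_pct + validation_pct > 100:
        raise ValueError("test and validation percentages must be >= 0 and sum to at most 100")
    test = total * test_pct // 100
    validation = total * validation_pct // 100
    return {"TRAIN": total - test - validation, "TEST": test, "VALIDATION": validation}
```

For example, 1000 images with `test=20` and `validation=10` yield 700 training, 200 testing, and 100 validation images under these assumptions.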
Error Handling
Common error responses:
Dataset Not Found
{
  "error": true,
  "result": "Dataset with ID \"dataset-123\" not found"
}
Invalid Parameters
{
  "error": true,
  "result": "Images array list is required"
}
Processing Error
{
  "error": true,
  "result": "Error generating CSV: insufficient data"
}
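A client can centralize handling of these responses in one helper. The sketch below treats a top-level `"error": true` as a failure and uses `result` as the message, matching the examples above. It also checks for an `error` flag nested inside `result`, since the CSV response carries one there; how a nested failure would report its message is not documented, so a generic message is used. The exception class and function name are hypothetical.

```python
class RosepetalApiError(Exception):
    """Hypothetical exception type for error payloads returned by the API."""


def raise_for_api_error(payload):
    """Raise RosepetalApiError if the decoded JSON payload signals an error; else return it."""
    if isinstance(payload, dict):
        if payload.get("error") is True:
            raise RosepetalApiError(str(payload.get("result", "unknown error")))
        result = payload.get("result")
        # The CSV endpoint nests its error flag inside "result".
        if isinstance(result, dict) and result.get("error") is True:
            raise RosepetalApiError("request failed (nested error flag set)")
    return payload
```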