Train AI Models

What is Training?

Model training is the process by which an AI algorithm learns to recognize patterns in labeled images so that it can make accurate predictions on new, previously unseen images.

Prerequisites

📁 Prepared Dataset

Uploaded images: At least 50-100 images per class (see the recommended amounts per task type below)
Complete labeling: All images must be annotated
Class balance: Similar number of examples per category
Validation performed: No errors in annotations
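
The checklist above can be partly automated before training starts. The following is a minimal sketch, assuming the labels are available as a plain list with one class name per image and `None` for unlabeled images. The function name `check_dataset` and the thresholds `min_per_class` and `max_imbalance` are illustrative assumptions, not platform settings.

```python
from collections import Counter

def check_dataset(labels, min_per_class=50, max_imbalance=2.0):
    """Return a list of human-readable problems found in a list of class labels.

    labels: one class name per image; None marks an unlabeled image.
    min_per_class and max_imbalance are illustrative thresholds, not platform rules.
    """
    problems = []
    unlabeled = sum(1 for label in labels if label is None)
    if unlabeled:
        problems.append(f"{unlabeled} image(s) have no label")

    counts = Counter(label for label in labels if label is not None)
    for name, count in sorted(counts.items()):
        if count < min_per_class:
            problems.append(f"class '{name}' has {count} images (< {min_per_class})")

    if counts:
        largest, smallest = max(counts.values()), min(counts.values())
        if largest / smallest > max_imbalance:
            problems.append(f"imbalance ratio {largest / smallest:.1f} exceeds {max_imbalance}")
    return problems

# Hypothetical dataset: 120 "ok" images, 40 "defect" images, 3 unlabeled.
labels = ["ok"] * 120 + ["defect"] * 40 + [None] * 3
for problem in check_dataset(labels):
    print(problem)
```

An empty result means the label list passes these three checks; it does not verify that the annotations themselves are correct.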

⚖️ Recommended Balance

Classification: 100-500 images per class
Detection: 50-200 examples per object type
Segmentation: 20-100 images with precise masks
Anomalies: 200+ "normal" images

Start Training

🚀 From the Models Tab

Open the dataset you want to train on
Go to the "Models" tab in the dataset view
Click the "Train Model" button (or its equivalent)
Configure the training parameters

⚙️ Basic Configuration

Main Parameters

Model name: Unique identifier
Description: Purpose and specific characteristics
Model type: According to your dataset (classification, detection, etc.)
Version: For version control

Data Split

Training (70%): Data the model learns patterns from
Validation (20%): Data for tuning hyperparameters and detecting overfitting during training
Test (10%): Data held out for the final evaluation
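
A 70/20/10 split like the one above can be sketched as follows. This is an illustration only: the function name `split_dataset` and the fixed seed are assumptions, and the platform may split the data differently (for example, stratified by class).

```python
import random

def split_dataset(items, train=0.7, val=0.2, seed=42):
    """Shuffle items and split them into train/validation/test lists.

    The test set receives whatever remains after train and validation.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train_set, val_set, test_set = split_dataset(range(1000))
print(len(train_set), len(val_set), len(test_set))  # 700 200 100
```

Shuffling before splitting matters when images were uploaded in class order; otherwise the test set could end up containing a single class.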

🔧 Advanced Configurations

Hyperparameters

Learning Rate: Step size of each weight update (0.001 is a typical starting value)
Batch Size: Number of images processed together in one training step
Epochs: Number of complete passes through the data
Architecture: Type of neural network (ResNet, EfficientNet, etc.)
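
Batch size and epochs together determine how many weight updates a run performs. A small sketch of that arithmetic, with a hypothetical helper name `training_steps`:

```python
import math

def training_steps(num_images, batch_size, epochs):
    """Return (steps per epoch, total weight updates) for a training run."""
    steps_per_epoch = math.ceil(num_images / batch_size)  # the last batch may be partial
    return steps_per_epoch, steps_per_epoch * epochs

print(training_steps(num_images=700, batch_size=32, epochs=50))  # (22, 1100)
```

Doubling the batch size roughly halves the number of updates per epoch, which is one reason the learning rate and batch size are usually tuned together.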

Data Augmentation

Rotation: Rotate images for greater variability
Zoom: Zoom in/out for different scales
Flip: Horizontal/vertical flip
Noise: Add variations for robustness
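
To make the geometric augmentations concrete, here is a sketch of flips and a 90-degree rotation on a tiny image stored as a nested list of pixel values. The function names are illustrative; a real pipeline would use an image library and would typically rotate by small random angles rather than fixed quarter turns.

```python
def flip_horizontal(image):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in image]

def flip_vertical(image):
    """Reverse the order of the rows (top-to-bottom mirror)."""
    return [row[:] for row in image[::-1]]

def rotate_90_clockwise(image):
    """Rotate by 90 degrees clockwise: reverse the rows, then transpose."""
    return [list(column) for column in zip(*image[::-1])]

image = [[1, 2, 3],
         [4, 5, 6]]
print(flip_horizontal(image))      # [[3, 2, 1], [6, 5, 4]]
print(flip_vertical(image))        # [[4, 5, 6], [1, 2, 3]]
print(rotate_90_clockwise(image))  # [[4, 1], [5, 2], [6, 3]]
```

For detection and segmentation datasets, the same transformation has to be applied to the bounding boxes or masks so that the annotations stay aligned with the image.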

Training Process

📈 Training States

🟡 Starting

Data preparation: Loading and preprocessing
Model initialization: Architecture configuration
Parameter validation: Configuration verification

🔵 Training

Visible progress: Progress bar with percentage
Real-time metrics: Loss, accuracy per epoch
Estimated time: Approximate remaining duration
Cancellation possible: Option to stop if necessary

✅ Completed

Final metrics: Precision, recall, F1-score
Convergence graphs: Evolution during training
Saved model: Available for predictions
Detailed report: Complete analysis of results

❌ Failed

Error message: Problem description
Detailed logs: For technical diagnosis
Suggestions: Possible solutions
Retry: Option to correct and retrain

📊 Monitoring During Training

Visible Metrics

Loss: Should decrease over time
Accuracy: Should increase progressively
Validation Loss: To detect overfitting
Learning Curves: Evolution graphs

Health Indicators

✅ Normal convergence: Loss decreases smoothly
⚠️ Overfitting: Validation loss worsens while training loss keeps improving
❌ Underfitting: Both training and validation metrics stagnate at poor values
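
The indicators above can be turned into a rough heuristic over the per-epoch loss values. This is a sketch under stated assumptions: the function name `diagnose`, the `window` of three epochs, and the `tolerance` of 0.01 are illustrative choices, not how the platform computes its indicators.

```python
def diagnose(train_loss, val_loss, window=3, tolerance=0.01):
    """Label a run from per-epoch losses by comparing the last epoch with `window` epochs earlier.

    window and tolerance are illustrative values, not platform defaults.
    """
    if len(train_loss) <= window or len(val_loss) <= window:
        return "not enough epochs"
    train_change = train_loss[-1] - train_loss[-1 - window]
    val_change = val_loss[-1] - val_loss[-1 - window]

    if train_change < -tolerance and val_change > tolerance:
        return "overfitting"          # training improves, validation worsens
    if abs(train_change) <= tolerance and abs(val_change) <= tolerance:
        return "stagnating"           # underfitting if the losses are still high
    if train_change < -tolerance and val_change < -tolerance:
        return "normal convergence"   # both losses are still decreasing
    return "inconclusive"

print(diagnose([1.0, 0.8, 0.6, 0.5, 0.4, 0.3], [1.1, 0.9, 0.7, 0.6, 0.5, 0.4]))    # normal convergence
print(diagnose([1.0, 0.8, 0.6, 0.4, 0.3, 0.2], [1.1, 0.9, 0.7, 0.75, 0.8, 0.9]))  # overfitting
```

Stagnation alone does not distinguish underfitting from a model that has finished converging; the absolute loss level decides which one it is.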

Configurations by Model Type

🏷️ Classification

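Typical ranges: 50-100 epochs, batch size 16-32. The example uses the lower bound of each range so that it is valid JSON.

json

{
  "epochs": 50,
  "batch_size": 16,
  "learning_rate": 0.001,
  "optimizer": "Adam",
  "augmentation": true
}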

🎯 Object Detection

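Typical ranges: 100-200 epochs, batch size 8-16. The example uses the lower bound of each range so that it is valid JSON.

json

{
  "epochs": 100,
  "batch_size": 8,
  "learning_rate": 0.0001,
  "backbone": "ResNet50",
  "anchor_sizes": [32, 64, 128]
}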

🎨 Segmentation

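Typical ranges: 150-300 epochs, batch size 4-8. The example uses the lower bound of each range so that it is valid JSON.

json

{
  "epochs": 150,
  "batch_size": 4,
  "learning_rate": 0.0001,
  "architecture": "U-Net",
  "loss_function": "Dice + CrossEntropy"
}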

🚨 Anomaly Detection

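Typical ranges: 100-200 epochs, batch size 32-64. The example uses the lower bound of each range so that it is valid JSON.

json

{
  "epochs": 100,
  "batch_size": 32,
  "learning_rate": 0.001,
  "latent_dim": 128,
  "reconstruction_loss": "MSE"
}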
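
When configurations are generated by a script, the recommended ranges listed above can be kept in one place and resolved to concrete values. The sketch below is an assumption about how one might organize this: the names `PRESETS` and `starting_config` are hypothetical, only the three shared fields are included, and picking the lower bound of each range is just one reasonable starting point.

```python
import json

# Ranges copied from the configurations above; keys mirror the JSON fields.
PRESETS = {
    "classification": {"epochs": (50, 100), "batch_size": (16, 32), "learning_rate": 0.001},
    "detection": {"epochs": (100, 200), "batch_size": (8, 16), "learning_rate": 0.0001},
    "segmentation": {"epochs": (150, 300), "batch_size": (4, 8), "learning_rate": 0.0001},
    "anomaly": {"epochs": (100, 200), "batch_size": (32, 64), "learning_rate": 0.001},
}

def starting_config(model_type):
    """Return a concrete config, taking the lower bound of every range."""
    preset = PRESETS[model_type]
    return {key: value[0] if isinstance(value, tuple) else value
            for key, value in preset.items()}

print(json.dumps(starting_config("detection")))
# {"epochs": 100, "batch_size": 8, "learning_rate": 0.0001}
```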

Results Evaluation

📊 Main Metrics

For Classification

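Global Accuracy: Percentage of all predictions that are correct
Precision per Class: Share of the predictions for a class that actually belong to that class
Recall per Class: Share of the actual instances of a class that the model detects
Confusion Matrix: Table of actual versus predicted classes, showing which classes get confused with each other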
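
The classification metrics above follow directly from the counts of (actual, predicted) label pairs. A minimal sketch, assuming the true and predicted labels are available as two parallel lists; the function name `per_class_metrics` and the sample labels are made up for illustration.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred):
    """Return {class: (precision, recall)} computed from two label lists."""
    pairs = Counter(zip(y_true, y_pred))  # confusion matrix as (actual, predicted) counts
    predicted = Counter(y_pred)           # how often each class was predicted
    actual = Counter(y_true)              # how often each class really occurs
    metrics = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        true_positives = pairs[(cls, cls)]
        precision = true_positives / predicted[cls] if predicted[cls] else 0.0
        recall = true_positives / actual[cls] if actual[cls] else 0.0
        metrics[cls] = (precision, recall)
    return metrics

# Hypothetical labels: one "cat" image is misclassified as "dog".
y_true = ["cat", "cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "cat", "dog", "dog", "dog", "dog"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)
print(per_class_metrics(y_true, y_pred))
```

In this example "cat" has perfect precision but a recall of only two thirds, while "dog" has perfect recall but a precision of 0.75. Global accuracy alone (five out of six) would hide that asymmetry.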

For Detection

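mAP@0.5: Mean average precision, counting a detection as correct when its IoU with the ground truth is at least 0.5
mAP@0.5:0.95: Mean average precision averaged over IoU thresholds from 0.5 to 0.95 (in steps of 0.05)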
Detections per Image: Average number of objects found
False Positives/Negatives: Detection errors
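
Both mAP variants rest on the IoU (intersection over union) between a predicted box and a ground-truth box. A sketch of that calculation, assuming boxes are given as `(x_min, y_min, x_max, y_max)` tuples; the function name `box_iou` and the sample boxes are illustrative.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

prediction = (0, 0, 10, 10)
ground_truth = (5, 0, 15, 10)
iou = box_iou(prediction, ground_truth)
print(round(iou, 3), iou >= 0.5)  # 0.333 False
```

Here the boxes overlap by half their width, yet the IoU is only one third, so this prediction would count as a false positive at the 0.5 threshold.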

For Segmentation

IoU per Class: Intersection over union per category
Dice Score: Similarity measure between masks
Pixel Accuracy: Percentage of correct pixels
Boundary F1: F1 score computed on the mask boundaries, measuring how accurately edges are traced
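
IoU and Dice for a single class can be computed from two binary masks. The sketch below assumes masks stored as nested lists of 0 and 1; the function name `mask_scores` and the tiny masks are illustrative.

```python
def mask_scores(pred, target):
    """Return (IoU, Dice) for two binary masks given as nested lists of 0/1."""
    pred_flat = [p for row in pred for p in row]
    target_flat = [t for row in target for t in row]
    intersection = sum(1 for p, t in zip(pred_flat, target_flat) if p and t)
    pred_area, target_area = sum(pred_flat), sum(target_flat)
    union = pred_area + target_area - intersection
    if union == 0:
        return 1.0, 1.0  # two empty masks are treated as a perfect match
    iou = intersection / union
    dice = 2 * intersection / (pred_area + target_area)
    return iou, dice

pred   = [[1, 1, 0],
          [0, 1, 0]]
target = [[1, 1, 0],
          [0, 0, 1]]
print(mask_scores(pred, target))  # IoU 0.5, Dice about 0.667
```

Dice is never lower than IoU for the same pair of masks, so the two scores should not be compared against the same threshold.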

📈 Convergence Analysis

Important Graphs

Training vs Validation Loss: To detect overfitting
Accuracy Curves: Evolution of accuracy across epochs
Learning Rate Schedule: How the learning rate is adjusted during training
Gradient Flow: Gradient magnitudes across layers, to spot vanishing or exploding gradients

Interpretation

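Parallel curves: Training and validation loss decrease together, indicating healthy training
Divergence: Validation loss rises while training loss keeps falling, indicating possible overfitting
Stagnation: Neither curve improves, indicating possible underfitting or an inadequate learning rate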

Model Optimization

🎯 Performance Improvement

If Model Doesn't Converge

Increase epochs: More training time
Reduce learning rate: More gradual learning
Change architecture: More appropriate model
Review data: Dataset quality and balance

If There's Overfitting

Data Augmentation: More data variability
Dropout: Regularization during training
Early Stopping: Stop when validation worsens
More data: Increase dataset size
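
Early stopping, listed above, is commonly implemented with a patience counter on the validation loss. A minimal sketch, assuming the per-epoch validation losses are available as a list; the function name `early_stopping_epoch` and the defaults for `patience` and `min_delta` are illustrative, not platform settings.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None if it never triggers.

    Training stops after `patience` consecutive epochs without the validation
    loss improving on the best value seen so far by more than `min_delta`.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

val_losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.70]
print(early_stopping_epoch(val_losses))  # 5
```

In this example the best validation loss is reached at epoch 2, so the checkpoint from that epoch, not from the stopping epoch, is the one worth keeping.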

If There's Underfitting

More complex model: More layers or parameters
Increase learning rate: Faster learning, if the loss is decreasing too slowly
Less regularization: Reduce dropout
More epochs: More training time

⚡ Speed Optimization

During Training

Larger batch size: More parallelization (if memory allows)
Mixed precision: Use of float16 + float32
More powerful GPU: Specialized hardware
Optimized preprocessing: More efficient data loading

For Inference

Model pruning: Remove unnecessary connections
Quantization: Reduce weight precision
TensorRT: NVIDIA GPU-specific optimization
ONNX: Portable model format that can be run by optimized inference runtimes
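
To illustrate what "reduce weight precision" means, here is a sketch of symmetric 8-bit quantization on a plain list of weights. It is a simplified, per-tensor scheme chosen for illustration; the function names and sample weights are assumptions, and production toolchains use more elaborate variants.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0  # size of one integer step in float units
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Map the integers back to approximate float weights."""
    return [q * scale for q in quantized]

weights = [0.50, -1.27, 0.003, 1.00]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
print(quantized)  # [50, -127, 0, 100]
print(max(abs(w - r) for w, r in zip(weights, restored)) <= scale / 2 + 1e-12)  # True
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of at most half a step; the small weight 0.003 is rounded to zero here, which is the kind of loss to check for when evaluating a quantized model.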

Versioning and Management

📦 Version Control

Descriptive names: v1.0_high_precision, v2.0_fast
Metadata: Dates, parameters, dataset used
Comparison: Side-by-side metrics between versions
Rollback: Ability to return to previous version
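
Version metadata and side-by-side comparison can be modeled with a small record type. This is a hypothetical sketch: the class `ModelVersion`, the function `compare`, the dataset name, and all metric values are invented for illustration and do not reflect the platform's storage format.

```python
from dataclasses import dataclass, field

@dataclass
class ModelVersion:
    name: str       # descriptive version name
    dataset: str    # dataset the version was trained on
    params: dict    # training parameters used
    metrics: dict = field(default_factory=dict)

def compare(old, new):
    """Return {metric: new - old} for the metrics present in both versions."""
    shared = old.metrics.keys() & new.metrics.keys()
    return {m: round(new.metrics[m] - old.metrics[m], 4) for m in sorted(shared)}

# Made-up example values.
v1 = ModelVersion("v1.0_high_precision", "dataset_a", {"epochs": 100},
                  {"precision": 0.94, "recall": 0.88})
v2 = ModelVersion("v2.0_fast", "dataset_a", {"epochs": 50},
                  {"precision": 0.91, "recall": 0.90})
print(compare(v1, v2))  # {'precision': -0.03, 'recall': 0.02}
```

A negative difference on a key metric is a signal to keep the previous version available for rollback.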

💾 Storage

Cloud models: Automatic backup
Compression: Reduce file sizes
Checkpoints: Save points during training
Export formats: TensorFlow, PyTorch, ONNX

🔄 Lifecycle

Training: Create new version
Evaluation: Compare with previous versions
Testing: Test on real data
Deploy: Put into production
Monitor: Track performance in use
Retrain: Update with new data

Best Practices

📋 Before Training

Data cleaning: Review image and label quality
Exploratory analysis: Understand data distribution
Baseline: Establish minimum acceptable metrics
Strategy: Plan for different scenarios

🎯 During Training

Active monitoring: Watch real-time metrics
Regular checkpoints: Save progress
Detailed logging: Record parameters and results
Experimentation: Try different configurations

✅ After Training

Independent validation: Confirm results on held-out data, or use k-fold cross-validation
Error analysis: Analyze cases where model fails
Documentation: Record configuration and results
Prepare deployment: Optimize for production
