Train AI Models
What is Training?
Model training is the process in which the AI algorithm learns to recognize patterns in labeled images so that it can make accurate predictions on new, previously unseen images.
Prerequisites
📁 Prepared Dataset
- Uploaded images: Minimum 50-100 images per class
- Complete labeling: All images must be annotated
- Class balance: Similar number of examples per category
- Validation performed: No errors in annotations
⚖️ Recommended Balance
- Classification: 100-500 images per class minimum
- Detection: 50-200 examples per object type
- Segmentation: 20-100 images with precise masks
- Anomalies: 200+ "normal" images
Start Training
🚀 From the Models Tab
- Open the dataset you want to train on
- Go to the "Models" tab in the dataset view
- Click the "Train Model" button (or equivalent)
- Configure the training parameters
⚙️ Basic Configuration
Main Parameters
- Model name: Unique identifier
- Description: Purpose and specific characteristics
- Model type: According to your dataset (classification, detection, etc.)
- Version: For version control
Data Split
- Training (70%): Data the model learns patterns from
- Validation (20%): Data for hyperparameter tuning and model selection
- Test (10%): Held-out data for the final evaluation (see the split sketch below)
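A stratified split preserves the class balance across all three subsets. Below is a minimal sketch using scikit-learn; the file names and labels are hypothetical placeholders for your own data.

```python
# Minimal 70/20/10 stratified split with scikit-learn.
# `samples` and `labels` are hypothetical placeholders.
from sklearn.model_selection import train_test_split

samples = [f"img_{i:04d}.png" for i in range(1000)]
labels = [i % 4 for i in range(1000)]  # four balanced classes

# First carve out 30% for validation + test, stratified by class.
train_x, rest_x, train_y, rest_y = train_test_split(
    samples, labels, test_size=0.30, stratify=labels, random_state=42
)
# Split that 30% into 20% validation and 10% test (a 2:1 ratio).
val_x, test_x, val_y, test_y = train_test_split(
    rest_x, rest_y, test_size=1 / 3, stratify=rest_y, random_state=42
)
print(len(train_x), len(val_x), len(test_x))  # 700 200 100
```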
🔧 Advanced Configurations
Hyperparameters
- Learning Rate: Step size for weight updates (0.001 is a common starting point)
- Batch Size: Number of images processed simultaneously
- Epochs: Number of complete passes through the data
- Architecture: Type of neural network (ResNet, EfficientNet, etc.); a wiring sketch follows this list
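As a rough illustration of where each of these parameters plugs in, here is a minimal PyTorch training loop; the model and data are random stand-ins, not the platform's actual implementation.

```python
# Sketch: how the hyperparameters above map onto a PyTorch training loop.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 4))   # architecture
dataset = TensorDataset(torch.randn(256, 3, 64, 64),
                        torch.randint(0, 4, (256,)))
loader = DataLoader(dataset, batch_size=32, shuffle=True)        # batch size
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)       # learning rate
loss_fn = nn.CrossEntropyLoss()

for epoch in range(50):                                          # epochs
    for images, targets in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), targets)
        loss.backward()
        optimizer.step()
```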
Data Augmentation
- Rotation: Rotate images for greater variability
- Zoom: Zoom in/out for different scales
- Flip: Horizontal/vertical flip
- Noise: Add random variations for robustness (see the transform sketch below)
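A sketch of these augmentations using torchvision follows; the specific transforms and parameter values are illustrative choices, not platform defaults.

```python
# Sketch of the augmentations above using torchvision transforms.
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),                # rotation
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # zoom via random crop
    transforms.RandomHorizontalFlip(p=0.5),               # flip
    transforms.ToTensor(),
    # noise: small Gaussian perturbation, clamped back to valid pixel range
    transforms.Lambda(lambda t: (t + 0.02 * torch.randn_like(t)).clamp(0, 1)),
])
```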
Training Process
📈 Training States
🟡 Starting
- Data preparation: Loading and preprocessing
- Model initialization: Architecture configuration
- Parameter validation: Configuration verification
🔵 Training
- Visible progress: Progress bar with percentage
- Real-time metrics: Loss, accuracy per epoch
- Estimated time: Approximate remaining duration
- Cancellation possible: Option to stop if necessary
✅ Completed
- Final metrics: Precision, recall, F1-score
- Convergence graphs: Evolution during training
- Saved model: Available for predictions
- Detailed report: Complete analysis of results
❌ Failed
- Error message: Problem description
- Detailed logs: For technical diagnosis
- Suggestions: Possible solutions
- Retry: Option to correct and retrain
📊 Monitoring During Training
Visible Metrics
- Loss: Should decrease over time
- Accuracy: Should increase progressively
- Validation Loss: To detect overfitting
- Learning Curves: Evolution graphs
Health Indicators
- ✅ Normal convergence: Loss decreases smoothly
- ⚠️ Overfitting: Validation loss worsens while training loss keeps improving (see the early-stopping sketch below)
- ❌ Underfitting: Both training and validation metrics stagnate at poor values
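The overfitting signal is usually caught with early stopping on the validation loss. A minimal sketch, with stand-in loss functions that simulate a run starting to overfit around epoch 20:

```python
# Early stopping: stop when validation loss stops improving.
def train_one_epoch(epoch):   # stand-in: training loss keeps falling
    return 1.0 / (epoch + 1)

def evaluate(epoch):          # stand-in: validation worsens after epoch 20
    return 0.5 / (epoch + 1) + 0.01 * max(0, epoch - 20)

best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(200):
    train_loss = train_one_epoch(epoch)
    val_loss = evaluate(epoch)
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0   # checkpoint the model here
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"Early stop at epoch {epoch}: validation no longer improves")
            break
```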
Configurations by Model Type
🏷️ Classification
```json
{
  "epochs": "50-100",
  "batch_size": "16-32",
  "learning_rate": 0.001,
  "optimizer": "Adam",
  "augmentation": true
}
```
🎯 Object Detection
```json
{
  "epochs": "100-200",
  "batch_size": "8-16",
  "learning_rate": 0.0001,
  "backbone": "ResNet50",
  "anchor_sizes": [32, 64, 128]
}
```
🎨 Segmentation
```json
{
  "epochs": "150-300",
  "batch_size": "4-8",
  "learning_rate": 0.0001,
  "architecture": "U-Net",
  "loss_function": "Dice + CrossEntropy"
}
```
🚨 Anomaly Detection
```json
{
  "epochs": "100-200",
  "batch_size": "32-64",
  "learning_rate": 0.001,
  "latent_dim": 128,
  "reconstruction_loss": "MSE"
}
```
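This configuration implies an autoencoder-style model: it learns to reconstruct "normal" images, and a high reconstruction error flags an anomaly. A minimal sketch, assuming 64×64 grayscale inputs (the architecture is illustrative):

```python
# Sketch: autoencoder with a 128-d latent space and MSE reconstruction loss.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 64, 512), nn.ReLU(),
            nn.Linear(512, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, 64 * 64), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x)).view(x.shape)

model = AutoEncoder()
loss_fn = nn.MSELoss()               # reconstruction_loss: "MSE"
batch = torch.rand(32, 1, 64, 64)    # batch of "normal" images
loss = loss_fn(model(batch), batch)
# At inference time, images with unusually high reconstruction
# error are flagged as anomalies.
```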
Results Evaluation
📊 Main Metrics
For Classification
- Global Accuracy: Overall percentage of correct predictions
- Precision per Class: Of the predictions made for a class, the fraction that are correct
- Recall per Class: Of the actual examples of a class, the fraction the model finds
- Confusion Matrix: Table of predicted vs. actual classes (a computation sketch follows)
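These can be computed offline with scikit-learn; `y_true` and `y_pred` below are hypothetical label lists.

```python
# Sketch: per-class precision/recall/F1 and the confusion matrix.
from sklearn.metrics import classification_report, confusion_matrix

y_true = [0, 0, 1, 1, 2, 2, 2, 1]   # ground-truth labels
y_pred = [0, 1, 1, 1, 2, 2, 0, 1]   # model predictions

print(confusion_matrix(y_true, y_pred))       # confusion table
print(classification_report(y_true, y_pred))  # per-class precision/recall/F1
```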
For Detection
- mAP@0.5: Mean Average Precision at an IoU threshold of 0.5
- mAP@0.5:0.95: mAP averaged over IoU thresholds from 0.5 to 0.95
- Detections per Image: Average number of objects found
- False Positives/Negatives: Detection errors, judged by the IoU matching sketched below
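All of these rest on the IoU (Intersection over Union) between a predicted and a ground-truth box. A minimal sketch with boxes given as `(x1, y1, x2, y2)` pixel coordinates:

```python
# Sketch: IoU between two axis-aligned boxes, the basis of the mAP metrics.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 100, 100), (50, 50, 150, 150)))  # ~0.143: below the 0.5 threshold
```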
For Segmentation
- IoU per Class: Intersection over union per category
- Dice Score: Overlap-based similarity between predicted and ground-truth masks (sketched below)
- Pixel Accuracy: Percentage of correctly classified pixels
- Boundary F1: Precision along mask boundaries
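A minimal Dice score sketch for binary masks with NumPy; the masks are hypothetical 0/1 arrays.

```python
# Sketch: Dice score between two binary masks.
import numpy as np

def dice_score(pred, target, eps=1e-7):
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

pred = np.zeros((64, 64), dtype=np.uint8); pred[10:40, 10:40] = 1
target = np.zeros((64, 64), dtype=np.uint8); target[15:45, 15:45] = 1
print(dice_score(pred, target))  # overlap-based similarity in [0, 1]
```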
📈 Convergence Analysis
Important Graphs
- Training vs Validation Loss: To detect overfitting
- Accuracy Curves: Precision evolution
- Learning Rate Schedule: Learning speed adjustment
- Gradient Flow: Gradient magnitudes across layers, to spot vanishing or exploding gradients
Interpretation
- Parallel curves: Healthy training
- Divergence: Possible overfitting
- Stagnation: Possible underfitting or inadequate learning rate
Model Optimization
🎯 Performance Improvement
If Model Doesn't Converge
- Increase epochs: More training time
- Reduce learning rate: More gradual learning
- Change architecture: More appropriate model
- Review data: Dataset quality and balance
If There's Overfitting
- Data Augmentation: More data variability
- Dropout: Regularization during training (see the sketch after this list)
- Early Stopping: Stop when validation worsens
- More data: Increase dataset size
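As an example of the dropout remedy, here is a sketch of a PyTorch classifier head with dropout; the layer sizes are illustrative.

```python
# Sketch: dropout regularization in a classifier head.
import torch.nn as nn

head = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes 50% of activations during training
    nn.Linear(256, 4),
)
# Dropout is active in model.train() mode and disabled in model.eval() mode.
```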
If There's Underfitting
- More complex model: More layers or parameters
- Increase learning rate: Faster learning
- Less regularization: Reduce dropout
- More epochs: More training time
⚡ Speed Optimization
During Training
- Larger batch size: More parallelization (if memory allows)
- Mixed precision: Combine float16 compute with float32 accumulation (sketched below)
- More powerful GPU: Specialized hardware
- Optimized preprocessing: More efficient data loading
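A minimal mixed-precision sketch with PyTorch AMP, assuming a CUDA GPU; the model and data are random stand-ins.

```python
# Sketch: mixed-precision training (float16 compute, float32 master weights).
import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(1024, 10).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for step in range(10):
    x = torch.randn(64, 1024, device=device)
    y = torch.randint(0, 10, (64,), device=device)
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = loss_fn(model(x), y)   # forward pass runs in float16
    scaler.scale(loss).backward()     # scale loss to avoid float16 underflow
    scaler.step(optimizer)
    scaler.update()
```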
For Inference
- Model pruning: Remove unnecessary connections
- Quantization: Reduce weight precision
- TensorRT: NVIDIA GPU-specific optimization
- ONNX: Portable, optimized format for production (an export sketch follows)
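A sketch of an ONNX export from PyTorch; the model and input shape are hypothetical stand-ins.

```python
# Sketch: export a trained PyTorch model to ONNX for production serving.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 4),
)
model.eval()

dummy = torch.randn(1, 3, 224, 224)        # example input fixes graph shapes
torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["image"], output_names=["logits"],
    dynamic_axes={"image": {0: "batch"}},  # allow variable batch size
)
```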
Versioning and Management
📦 Version Control
- Descriptive names: v1.0_high_precision, v2.0_fast
- Metadata: Dates, parameters, dataset used
- Comparison: Side-by-side metrics between versions
- Rollback: Ability to return to previous version
💾 Storage
- Cloud models: Automatic backup
- Compression: Reduce file sizes
- Checkpoints: Save points during training
- Export formats: TensorFlow, PyTorch, ONNX
🔄 Lifecycle
- Training: Create new version
- Evaluation: Compare with previous versions
- Testing: Test on real data
- Deploy: Put into production
- Monitor: Track performance in use
- Retrain: Update with new data
Best Practices
📋 Before Training
- Data cleaning: Review image and label quality
- Exploratory analysis: Understand data distribution
- Baseline: Establish minimum acceptable metrics
- Strategy: Plan for different scenarios
🎯 During Training
- Active monitoring: Watch real-time metrics
- Regular checkpoints: Save progress
- Detailed logging: Record parameters and results
- Experimentation: Try different configurations
✅ After Training
- Cross validation: Confirm results on independent data
- Error analysis: Analyze cases where model fails
- Documentation: Record configuration and results
- Prepare deployment: Optimize for production