AutoML Training

AutoML (Automated Machine Learning) lets you build, optimize, and deploy machine learning models with minimal manual effort. It automatically searches for well-performing algorithms, hyperparameters, and preprocessing steps for your dataset, making machine learning accessible to users of all experience levels.

Benefits of AutoML

Save Time

Reduce the time spent on model selection, hyperparameter tuning, and preprocessing by automating the entire process.

Improved Performance

Discover model combinations and parameters that you might not have tried manually, which can lead to better overall model performance.

No Coding Required

Create production-ready machine learning pipelines without writing a single line of code or understanding complex algorithms.

Best Practices Built-in

Leverage industry best practices for preprocessing, cross-validation, feature selection, and model evaluation automatically.

Ways to Start AutoML

ML Clever provides three different approaches to start your AutoML process, depending on your workflow preferences and requirements:

Method 1: One-Click AutoML from Dataset

Launch AutoML directly from your dataset view with minimal configuration using the AutoML button.

Steps:

1. Navigate to your dataset view page
2. Click the AutoML button in the top right section
3. Select your desired automation level (see "AutoML Levels" below)
4. Click "Train AutoML Models" to begin the automated process

Ideal for: Quick exploration of your dataset's potential or when you want a completely automated solution with minimal input.

Method 2: Manual Preprocessing + AutoML

Manually preprocess your data first to apply domain-specific transformations, then use AutoML for model selection and optimization.

Steps:

1. Navigate to your dataset view page
2. Click the "Preprocess Dataset" button to access preprocessing options
3. Apply desired preprocessing steps (imputation, encoding, scaling, etc.)
4. After preprocessing, select "Configure AutoML" from the model training options
5. Adjust AutoML settings if needed
6. Click "Train AutoML Models" to begin training with your preprocessed data

Ideal for: Users who want to apply their domain knowledge to data preparation while letting AutoML handle the model selection and optimization.

Method 3: Pipeline with AutoML Component

Create a complete end-to-end pipeline that includes preprocessing steps and AutoML in a reusable workflow.

Steps:

1. Navigate to Workflows and create a new pipeline
2. Add data source components to load your dataset
3. Add preprocessing components as needed
4. Add an AutoML component to the pipeline
5. Configure each component's settings
6. Connect the components to create a complete flow
7. Run the pipeline to execute the entire workflow

Ideal for: Production workflows, recurring tasks, or when you need to standardize your ML process across multiple datasets.
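Conceptually, a pipeline is an ordered chain of components in which each component consumes the output of the previous one. The sketch below illustrates that idea in plain Python. It is not ML Clever's workflow API (pipelines are built in the Workflows UI), and the component names `load_rows`, `drop_incomplete`, and `train_stub`, as well as the sample rows, are hypothetical.

```python
from typing import Any, Callable, List

Component = Callable[[Any], Any]

def load_rows(_: Any) -> List[dict]:
    # Hypothetical data source component: returns a tiny in-memory dataset.
    return [
        {"age": 34, "income": 52000, "churned": 0},
        {"age": None, "income": 61000, "churned": 1},
        {"age": 45, "income": 48000, "churned": 0},
    ]

def drop_incomplete(rows: List[dict]) -> List[dict]:
    # Hypothetical preprocessing component: keep rows with no missing values.
    return [r for r in rows if all(v is not None for v in r.values())]

def train_stub(rows: List[dict]) -> dict:
    # Stand-in for an AutoML component: only reports what it would train on.
    features = sorted(k for k in rows[0] if k != "churned")
    return {"rows_used": len(rows), "features": features}

def run_pipeline(components: List[Component], initial: Any = None) -> Any:
    # Run components in order, feeding each output into the next component.
    result = initial
    for component in components:
        result = component(result)
    return result

summary = run_pipeline([load_rows, drop_incomplete, train_stub])
print(summary)
```

Because the order of components is the only coupling between them, the same chain can be rerun on a different data source by swapping the first component.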

Pro Tip: Check the "ML Readiness" tab on your dataset view page before starting AutoML. Addressing any data quality issues first can significantly improve your AutoML results.

AutoML Levels

ML Clever offers three levels of automation to match your specific needs and time constraints:

Quick Exploration

A rapid analysis that tries several common models with basic optimization to give you quick insights.

• Training time: 2-5 minutes
• Tests 3-5 model types
• Basic hyperparameter tuning
• Standard preprocessing only

Ideal for initial dataset exploration or when you need quick results.

Balanced (Recommended)

A well-rounded approach that balances thoroughness with reasonable time constraints.

• Training time: 10-20 minutes
• Tests 5-10 model types
• Moderate hyperparameter optimization
• Advanced preprocessing techniques
• Ensemble methods included

Suitable for most use cases, including models intended for production.

Thorough Optimization

An extensive search aimed at the best-performing model, with comprehensive optimization.

• Training time: 30-60+ minutes
• Tests 10+ model types
• Extensive hyperparameter optimization
• Comprehensive feature engineering
• Advanced ensemble techniques
• Cross-validation with multiple metrics

Best for critical applications where model performance is paramount.
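One way to reason about the three levels is as a small lookup table keyed by time budget. The following sketch encodes the training times and model counts listed above. The `AutoMLLevel` class and `most_thorough_level` helper are hypothetical names for illustration, and treating the open-ended "60+" bound as 60 minutes is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class AutoMLLevel:
    name: str
    min_minutes: int
    max_minutes: Optional[int]  # None means open-ended ("60+")
    model_types: str

LEVELS: List[AutoMLLevel] = [
    AutoMLLevel("Quick Exploration", 2, 5, "3-5"),
    AutoMLLevel("Balanced", 10, 20, "5-10"),
    AutoMLLevel("Thorough Optimization", 30, None, "10+"),
]

def most_thorough_level(budget_minutes: int) -> Optional[AutoMLLevel]:
    # Return the most thorough level whose typical upper bound fits the budget.
    # Assumption: the open-ended level is treated as needing 60 minutes.
    best = None
    for level in LEVELS:
        upper = level.max_minutes if level.max_minutes is not None else 60
        if upper <= budget_minutes:
            best = level
    return best

choice = most_thorough_level(25)
print(choice.name if choice else "No level fits this budget")
```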

Understanding the AutoML Process

When you start an AutoML training job, ML Clever systematically works through the following stages:

1. Data Analysis
AutoML begins by analyzing your dataset's characteristics, including:
- Feature types and distributions
- Missing value patterns
- Target variable type (classification or regression)
- Potential data quality issues
- Feature importance estimates
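To make the data analysis stage more concrete, here is a minimal sketch of two of the checks listed above: counting missing values per column and guessing whether the target calls for classification or regression. The heuristic (non-numeric targets, or integer-valued targets with at most `max_classes` distinct values, count as classification) is an assumption for illustration, not ML Clever's actual rule.

```python
from typing import Any, Dict, List

def infer_target_type(values: List[Any], max_classes: int = 20) -> str:
    # Heuristic (assumption): non-numeric targets, or integer-valued targets
    # with few distinct values, are treated as classification.
    present = [v for v in values if v is not None]
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if not all_numeric:
        return "classification"
    all_integral = all(float(v).is_integer() for v in present)
    if all_integral and len(set(present)) <= max_classes:
        return "classification"
    return "regression"

def missing_counts(rows: List[Dict[str, Any]]) -> Dict[str, int]:
    # Count None values per column across all rows.
    counts: Dict[str, int] = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (1 if value is None else 0)
    return counts

print(infer_target_type([0, 1, 1, 0]))
print(missing_counts([{"a": 1, "b": None}, {"a": None, "b": None}]))
```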
2. Preprocessing Selection
Based on the data analysis, AutoML selects appropriate preprocessing techniques:
- Imputation strategies for missing values
- Encoding methods for categorical features
- Scaling transformations for numerical features
- Feature selection to eliminate irrelevant variables
- Data transformation to address skewness or outliers
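Two of the techniques named above, scaling numerical features and encoding categorical features, can be sketched with the standard library alone. This is a generic illustration of z-score scaling and one-hot encoding, not a description of which variants ML Clever applies; the function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List

def standardize(values: List[float]) -> List[float]:
    # Z-score scaling: subtract the mean, divide by the population std dev.
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def one_hot(values: List[str]) -> List[Dict[str, int]]:
    # One indicator column per distinct category, in sorted order.
    categories = sorted(set(values))
    return [{c: int(v == c) for c in categories} for v in values]

scaled = standardize([1.0, 2.0, 3.0])
encoded = one_hot(["red", "blue", "red"])
print(scaled)
print(encoded)
```

One-hot encoding grows one column per category, which is why high-cardinality features are usually handled with other encodings.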
3. Model Selection & Training
AutoML intelligently selects and trains multiple model types:
- Trains a diverse set of models appropriate for your problem type
- For classification: Decision Trees, Random Forests, Gradient Boosting, Neural Networks, etc.
- For regression: Linear Regression, Ridge/Lasso, Random Forests, XGBoost, etc.
- Uses cross-validation to ensure reliable performance estimates
- Applies early stopping to less promising models to focus on high-potential candidates
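Cross-validation, mentioned above, estimates performance by training on part of the data and validating on the rest, rotating which part is held out. A minimal sketch of the index bookkeeping for k-fold cross-validation follows. The helper name `k_fold_indices` is hypothetical, and the folds are contiguous, so in practice the rows would typically be shuffled first.

```python
from typing import Iterator, List, Tuple

def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    # Split indices 0..n_samples-1 into k contiguous folds.
    # The first (n_samples % k) folds receive one extra sample.
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        training = list(range(0, start)) + list(range(start + size, n_samples))
        yield training, validation
        start += size

folds = list(k_fold_indices(10, 3))
for training, validation in folds:
    print(len(training), validation)
```

Every sample lands in exactly one validation fold, so each model is scored once on every row without ever being scored on data it was trained on.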
4. Hyperparameter Optimization
For promising models, AutoML performs sophisticated hyperparameter tuning:
- Uses Bayesian optimization and other advanced techniques
- Efficiently explores hyperparameter spaces
- Focuses computational resources on the most promising configurations
- Balances exploration vs. exploitation of hyperparameter settings
- Adapts search strategies based on intermediate results
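The core loop of any hyperparameter search is: propose a configuration, score it, keep the best. The sketch below uses plain random search, which is a simpler stand-in and not the Bayesian optimization named above (a Bayesian optimizer would use earlier scores to choose the next proposal). The objective `toy_objective`, the parameter names, and the ranges are all hypothetical.

```python
import random
from typing import Callable, Dict, Tuple

def random_search(
    objective: Callable[[Dict[str, float]], float],
    space: Dict[str, Tuple[float, float]],
    n_trials: int,
    seed: int = 0,
) -> Tuple[Dict[str, float], float]:
    # Sample each hyperparameter uniformly from its range and keep the best score.
    rng = random.Random(seed)
    best_params: Dict[str, float] = {}
    best_score = float("-inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_objective(params: Dict[str, float]) -> float:
    # Hypothetical validation score, peaking at learning_rate=0.1, max_depth=6.
    return (
        1.0
        - (params["learning_rate"] - 0.1) ** 2
        - ((params["max_depth"] - 6) / 10) ** 2
    )

space = {"learning_rate": (0.01, 0.5), "max_depth": (2.0, 12.0)}
best_params, best_score = random_search(toy_objective, space, n_trials=200)
print(best_params, best_score)
```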
5. Ensemble Creation
AutoML builds ensemble models that combine top-performing individual models:
- Stacking: Trains a meta-model on the predictions of base models
- Blending: Combines predictions using optimized weighting
- Voting: Uses majority voting for classification or averaging for regression
- Boosting: Sequential training of models to correct previous errors
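Of the ensemble strategies listed above, voting is the simplest to show in code. The following sketch implements majority voting for classification and averaging for regression over the predictions of several models. The function names and sample predictions are hypothetical, and breaking ties in favor of the earliest model's label is an assumption.

```python
from collections import Counter
from statistics import mean
from typing import List, Sequence

def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    # predictions[m][i] is model m's label for sample i.
    # On a tie, the label seen first among the models wins.
    return [
        Counter(sample_votes).most_common(1)[0][0]
        for sample_votes in zip(*predictions)
    ]

def average_predictions(predictions: Sequence[Sequence[float]]) -> List[float]:
    # predictions[m][i] is model m's numeric prediction for sample i.
    return [mean(sample_values) for sample_values in zip(*predictions)]

labels = majority_vote([["cat", "dog"], ["cat", "cat"], ["dog", "dog"]])
values = average_predictions([[1.0, 2.0], [3.0, 4.0]])
print(labels, values)
```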
6. Final Evaluation & Selection
AutoML performs a comprehensive evaluation to select the best model:
- Evaluates models on holdout data not used during training
- Considers multiple performance metrics relevant to your task
- For classification: Accuracy, F1-score, AUC-ROC, precision, recall
- For regression: RMSE, MAE, R², explained variance
- Assesses model reliability, training time, and inference speed
- Produces a final ranking of models based on overall performance
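For reference, several of the metrics listed above can be computed directly from true and predicted values. This is a minimal sketch using the standard definitions. It assumes binary labels with 1 as the positive class, and the regression helper assumes the true values are not all identical (otherwise R² is undefined).

```python
import math
from typing import Dict, Sequence

def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    # Binary metrics with 1 as the positive class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # must be non-zero
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,
    }

clf = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
reg = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(clf)
print(reg)
```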

Understanding AutoML Results

Once your AutoML process completes, ML Clever presents the results in an organized dashboard:

Top Models Section

The Top Models section displays all models created during your AutoML run, sorted by performance:

Model Card Information

Model Name: Descriptive name of the model type (e.g., "Random Forest", "XGBoost")
Model Type: Technical classification of the model
Score: Performance metric value (color-coded by performance level)
- 90-100: Very good (dark green)
- 75-89: Good (light green)
- 50-74: Average (yellow)
- 25-49: Warning (orange)
- 0-24: Poor (red)
Status Indicator: Visual indicator of the model's current state
- Completed: Training finished successfully
- Started: Model is currently training
- Error: Training failed (hover for details)
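If you export scores and want to reproduce the color coding above in your own reports, the bands reduce to a few threshold checks. In this sketch the function name `score_band` is hypothetical, and assigning fractional scores (for example 89.5) to the band whose lower bound they meet is an assumption, since the bands above are stated for whole numbers.

```python
from typing import Tuple

def score_band(score: float) -> Tuple[str, str]:
    # Map a 0-100 score to its (label, color) band.
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return ("Very good", "dark green")
    if score >= 75:
        return ("Good", "light green")
    if score >= 50:
        return ("Average", "yellow")
    if score >= 25:
        return ("Warning", "orange")
    return ("Poor", "red")

print(score_band(92))
print(score_band(61))
```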

Action Buttons

Create Dashboard: Generate a detailed performance dashboard for the selected model
View Deployment: Access deployment options for the trained model
Predict: Use the model to make predictions on new data

Note: Clicking on any model card will take you to the detailed model page where you can explore performance metrics, feature importance, confusion matrices (for classification), and other in-depth analyses.

AutoML Best Practices

Follow these guidelines to get the most out of ML Clever's AutoML capabilities:

Clean Your Data First

While AutoML handles preprocessing, addressing obvious data quality issues (extreme outliers, duplicate records, irrelevant columns) beforehand can improve results significantly.

Choose the Right AutoML Level

Match the AutoML level to your use case. For initial exploration, choose Quick Exploration; for most projects, Balanced is the recommended default; for critical applications with high performance requirements, choose Thorough Optimization.

Consider Compute Resources

Thorough optimization requires significant computing power. For large datasets, consider running AutoML during off-hours or upgrading your compute resources.

Review Multiple Models

While AutoML selects the best overall model, review the top 3-5 models. Sometimes the second-best model might be preferable due to explainability, inference speed, or other factors.

Combine with Domain Knowledge

For critical applications, use AutoML as a starting point, then fine-tune the best model with your domain expertise using the manual model configuration options.

Save Pipeline Configurations

When you find effective preprocessing + AutoML combinations, save them as pipeline templates for future use with similar datasets.

Troubleshooting Common Issues

AutoML Process Stops or Fails

Occasionally, the AutoML process might stop unexpectedly or fail to complete.

Solutions:

Check that your dataset size is within platform limits
Ensure there are no extreme outliers causing numerical instability
Try reducing the AutoML level for very large datasets
Check if your account has sufficient compute credits remaining
For persistent issues, use the "Report Issue" button to contact support

Poor Model Performance

If all AutoML models show unexpectedly poor performance, there might be underlying data issues.

Solutions:

Review the ML Readiness score and address identified issues
Check for data leakage or target variable problems
Ensure your target variable has sufficient variance and signal
Consider feature engineering to create more predictive variables
Try a different problem formulation (e.g., binary vs. multi-class)

Long Processing Times

AutoML can sometimes take longer than expected, especially with complex datasets.

Solutions:

Select a faster AutoML level for quicker results
Reduce the number of features by removing low-importance columns
Consider sampling your data if you have millions of rows
Schedule AutoML jobs during off-hours for large datasets
Use the "Stop Task" button if needed, and the best models found so far will be saved
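The sampling suggestion in the list above can be applied before uploading a dataset. The sketch below draws a reproducible random sample of rows while preserving their original order. The helper name `sample_rows` and the fixed seed are assumptions for illustration; for imbalanced classification targets, a stratified sample may be a better choice.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def sample_rows(rows: Sequence[T], max_rows: int, seed: int = 42) -> List[T]:
    # Return at most max_rows rows, chosen uniformly at random without
    # replacement, in their original order. A fixed seed makes it repeatable.
    if len(rows) <= max_rows:
        return list(rows)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(rows)), max_rows))
    return [rows[i] for i in keep]

rows = list(range(1000))
subset = sample_rows(rows, 100)
print(len(subset))
```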

Memory Errors

Very large datasets or complex models might exceed available memory.

Solutions:

Reduce dataset size through appropriate sampling
Remove unnecessary high-cardinality categorical features
Upgrade to a higher compute tier with more memory
Use the Quick or Balanced AutoML level instead of Thorough
Pre-aggregate or bin features with high dimensionality

Frequently Asked Questions

How long does AutoML typically take to run?

Runtime varies significantly based on dataset size, number of features, and the selected AutoML level. Quick Exploration usually completes in 2-5 minutes, Balanced takes 10-20 minutes, and Thorough Optimization can take 30-60+ minutes for complex datasets. The progress indicators will show estimated completion time.

Can I stop an AutoML run in the middle?

Yes, you can stop an AutoML run at any time using the "Stop Task" button. ML Clever will save all completed models up to that point, allowing you to review partial results. This is useful if you've already found a satisfactory model or need to adjust your approach.

What algorithms does AutoML try?

For classification tasks, AutoML evaluates models including Logistic Regression, Decision Trees, Random Forests, Gradient Boosting, Support Vector Machines, K-Nearest Neighbors, and Neural Networks. For regression, it tries Linear Regression, Ridge/Lasso Regression, Decision Trees, Random Forests, Gradient Boosting (XGBoost, LightGBM), and Neural Networks. The specific set varies based on your data characteristics and chosen AutoML level.

How does AutoML handle missing values and categorical features?

AutoML automatically detects and handles missing values using imputation techniques appropriate for your data. For numerical features, it may use mean, median, or model-based imputation. For categorical features, it typically uses mode imputation or creates a dedicated "missing" category. Categorical features are automatically encoded using appropriate methods like one-hot, label, or target encoding based on cardinality and relationship with the target.
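As an illustration of the simpler imputation strategies mentioned above, the following sketch fills missing numerical values with the median and missing categorical values with either the mode or a dedicated "missing" category. It uses `None` to mark missing values, and the function names are hypothetical; it does not show how ML Clever chooses between strategies.

```python
from collections import Counter
from statistics import median
from typing import List, Optional

def impute_median(values: List[Optional[float]]) -> List[float]:
    # Replace missing numbers with the median of the observed values.
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def impute_mode(values: List[Optional[str]]) -> List[str]:
    # Replace missing categories with the most frequent observed category.
    observed = [v for v in values if v is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]

def impute_missing_category(values: List[Optional[str]], label: str = "missing") -> List[str]:
    # Treat "missing" as its own category instead of guessing a value.
    return [label if v is None else v for v in values]

print(impute_median([1.0, None, 3.0, 10.0]))
print(impute_mode(["a", None, "b", "a"]))
print(impute_missing_category(["a", None]))
```

The median is less sensitive to outliers than the mean, and a dedicated "missing" category preserves the information that a value was absent.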

Can I see which preprocessing steps were applied?

Yes, in the detailed model view (accessible by clicking on any model card), you'll find a "Preprocessing Pipeline" section that shows all preprocessing steps applied to your data, including imputation methods, encoding techniques, scaling approaches, and feature transformations.

How do I choose between multiple good models?

Consider these factors beyond the primary score: (1) Secondary metrics specific to your problem (e.g., precision vs. recall for imbalanced classification), (2) Model explainability if interpretability is important, (3) Inference speed for real-time applications, (4) Memory footprint for deployment constraints, and (5) Robustness to data drift for long-term stability.
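One way to weigh these factors against each other is a simple weighted sum. The sketch below ranks candidate models by a weighted combination of criteria, each already scaled to the range 0 to 1 with higher meaning better. The model values and weights are hypothetical placeholders, not results from any real run, and a weighted sum is only one of several reasonable ways to combine criteria.

```python
from typing import Any, Dict, List

def rank_models(models: List[Dict[str, Any]], weights: Dict[str, float]) -> List[str]:
    # Each criterion must already be scaled to 0..1, where higher is better.
    def total(model: Dict[str, Any]) -> float:
        return sum(weight * model[criterion] for criterion, weight in weights.items())
    return [m["name"] for m in sorted(models, key=total, reverse=True)]

# Hypothetical candidates with made-up criterion values.
candidates = [
    {"name": "Gradient Boosting", "score": 0.92, "explainability": 0.3, "speed": 0.5},
    {"name": "Logistic Regression", "score": 0.88, "explainability": 0.9, "speed": 0.95},
]

balanced_ranking = rank_models(candidates, {"score": 0.5, "explainability": 0.3, "speed": 0.2})
score_only_ranking = rank_models(candidates, {"score": 1.0})
print(balanced_ranking)
print(score_only_ranking)
```

With these placeholder numbers, the slightly less accurate model ranks first once explainability and speed carry weight, which is the situation described above.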

Related Resources

Regression Models

Detailed guide to regression model types, metrics, and use cases

Classification Models

In-depth explanation of classification model options and evaluation

Data Preprocessing

Learn about manual preprocessing options for your datasets

Model Evaluation

Understand how to interpret model performance metrics and charts

Last updated: 3/22/2025
