
AutoML Training

AutoML (Automated Machine Learning) helps you build, optimize, and deploy high-quality machine learning models with minimal manual effort. It automatically searches for the algorithms, hyperparameters, and preprocessing steps that work best for your dataset, which makes machine learning accessible to users at every experience level.

AutoML Overview Screenshot

Benefits of AutoML

Save Time

Reduce the time spent on model selection, hyperparameter tuning, and preprocessing by automating the entire process.

Improved Performance

Discover model and parameter combinations you might not have tried manually, which can improve overall model performance.

No Coding Required

Create production-ready machine learning pipelines without writing code or needing in-depth knowledge of the underlying algorithms.

Best Practices Built-in

Leverage industry best practices for preprocessing, cross-validation, feature selection, and model evaluation automatically.

Ways to Start AutoML

ML Clever provides three different approaches to start your AutoML process, depending on your workflow preferences and requirements:

Method 1: One-Click AutoML from Dataset

Launch AutoML directly from your dataset view with minimal configuration using the AutoML button.

Steps:

  1. Navigate to your dataset view page
  2. Click the AutoML button in the top right section
  3. Select your desired automation level (see "AutoML Levels" below)
  4. Click "Train AutoML Models" to begin the automated process

Ideal for: Quick exploration of your dataset's potential or when you want a completely automated solution with minimal input.

Method 2: Manual Preprocessing + AutoML

Manually preprocess your data first to apply domain-specific transformations, then use AutoML for model selection and optimization.

Steps:

  1. Navigate to your dataset view page
  2. Click the "Preprocess Dataset" button to access preprocessing options
  3. Apply desired preprocessing steps (imputation, encoding, scaling, etc.)
  4. After preprocessing, select "Configure AutoML" from the model training options
  5. Adjust AutoML settings if needed
  6. Click "Train AutoML Models" to begin training with your preprocessed data

Ideal for: Users who want to apply their domain knowledge to data preparation while letting AutoML handle the model selection and optimization.

Method 3: Pipeline with AutoML Component

Create a complete end-to-end pipeline that includes preprocessing steps and AutoML in a reusable workflow.

Steps:

  1. Navigate to Workflows and create a new pipeline
  2. Add data source components to load your dataset
  3. Add preprocessing components as needed
  4. Add an AutoML component to the pipeline
  5. Configure each component's settings
  6. Connect the components to create a complete flow
  7. Run the pipeline to execute the entire workflow

Ideal for: Production workflows, recurring tasks, or when you need to standardize your ML process across multiple datasets.

Pro Tip: Check the "ML Readiness" tab on your dataset view page before starting AutoML. Addressing any data quality issues first can significantly improve your AutoML results.

AutoML Levels

ML Clever offers three levels of automation to match your specific needs and time constraints:

Quick Exploration

A rapid analysis that tries several common models with basic optimization to give you quick insights.

  • Training time: 2-5 minutes
  • Tests 3-5 model types
  • Basic hyperparameter tuning
  • Standard preprocessing only

Ideal for initial dataset exploration or when you need quick results.

Balanced (Recommended)

A well-rounded approach that balances thoroughness with reasonable time constraints.

  • Training time: 10-20 minutes
  • Tests 5-10 model types
  • Moderate hyperparameter optimization
  • Advanced preprocessing techniques
  • Ensemble methods included

Suitable for most use cases and production-ready models.

Thorough Optimization

An extensive search for the best-performing model, with comprehensive optimization.

  • Training time: 30-60+ minutes
  • Tests 10+ model types
  • Extensive hyperparameter optimization
  • Comprehensive feature engineering
  • Advanced ensemble techniques
  • Cross-validation with multiple metrics

Best for critical applications where model performance is paramount.
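If you are choosing a level under a fixed time budget, the runtimes above can be turned into a simple planning rule. The sketch below is hypothetical (the `LEVEL_UPPER_MINUTES` table and `pick_level` function are not part of ML Clever): it picks the most thorough level whose typical upper runtime fits the time you have. Actual runtimes depend on dataset size and feature count, and Thorough Optimization can exceed 60 minutes, so treat the result as a rough guide only.

```python
# Hypothetical planning helper; not part of ML Clever.
# Upper ends of the typical runtimes listed above, in minutes. Thorough
# Optimization is open-ended ("30-60+"), so 60 is only a rough planning figure.
LEVEL_UPPER_MINUTES = {
    "Quick Exploration": 5,
    "Balanced": 20,
    "Thorough Optimization": 60,
}


def pick_level(minutes_available: float) -> str:
    """Return the most thorough level whose typical upper runtime fits the budget."""
    fitting = [name for name, upper in LEVEL_UPPER_MINUTES.items()
               if upper <= minutes_available]
    if not fitting:
        # Nothing fits comfortably; fall back to the fastest level.
        return "Quick Exploration"
    # Dicts preserve insertion order, so the last fitting entry is the most thorough.
    return fitting[-1]
```

For example, `pick_level(25)` returns `"Balanced"`, while a 15-minute budget falls back to `"Quick Exploration"` because Balanced can take up to 20 minutes.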

Understanding the AutoML Process

When you start an AutoML training job, ML Clever systematically works through the following stages:

  1. Data Analysis

    AutoML begins by analyzing your dataset's characteristics, including:

    • Feature types and distributions
    • Missing value patterns
    • Target variable type (classification or regression)
    • Potential data quality issues
    • Feature importance estimates
  2. Preprocessing Selection

    Based on the data analysis, AutoML selects appropriate preprocessing techniques:

    • Imputation strategies for missing values
    • Encoding methods for categorical features
    • Scaling transformations for numerical features
    • Feature selection to eliminate irrelevant variables
    • Data transformation to address skewness or outliers
  3. Model Selection & Training

    AutoML intelligently selects and trains multiple model types:

    • Trains a diverse set of models appropriate for your problem type
    • For classification: Decision Trees, Random Forests, Gradient Boosting, Neural Networks, etc.
    • For regression: Linear Regression, Ridge/Lasso, Random Forests, XGBoost, etc.
    • Uses cross-validation to ensure reliable performance estimates
    • Stops training less promising models early so that compute is focused on high-potential candidates
  4. Hyperparameter Optimization

    For promising models, AutoML performs sophisticated hyperparameter tuning:

    • Uses Bayesian optimization and other advanced techniques
    • Efficiently explores hyperparameter spaces
    • Focuses computational resources on the most promising configurations
    • Balances exploring new hyperparameter settings against refining settings that already perform well
    • Adapts search strategies based on intermediate results
  5. Ensemble Creation

    AutoML builds ensemble models that combine top-performing individual models:

    • Stacking: Trains a meta-model on the predictions of base models
    • Blending: Combines predictions using optimized weighting
    • Voting: Uses majority voting for classification or averaging for regression
    • Boosting: Trains models sequentially, with each model correcting the errors of the previous ones
  6. Final Evaluation & Selection

    AutoML performs a comprehensive evaluation to select the best model:

    • Evaluates models on holdout data not used during training
    • Considers multiple performance metrics relevant to your task
    • For classification: Accuracy, F1-score, AUC-ROC, precision, recall
    • For regression: RMSE, MAE, R², explained variance
    • Assesses model reliability, training time, and inference speed
    • Produces a final ranking of models based on overall performance
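Stages 3 and 6 rest on two standard ideas: ranking candidate models by cross-validated error, then confirming the winner on holdout data that played no part in training or selection. The sketch below shows both in plain Python for a single-feature regression problem. It is a toy illustration, not ML Clever's implementation: the two candidate models, the function names (`cv_rmse`, `rank_candidates`), the 5-fold split, and the 20% holdout are all assumptions chosen for brevity.

```python
import random
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error (lower is better)."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred)) ** 0.5


class MeanModel:
    """Baseline candidate: always predicts the mean of the training targets."""
    name = "Mean baseline"

    def fit(self, xs, ys):
        self.value = mean(ys)
        return self

    def predict(self, xs):
        return [self.value for _ in xs]


class LineModel:
    """Candidate: ordinary least squares on a single numeric feature."""
    name = "Linear regression"

    def fit(self, xs, ys):
        mx, my = mean(xs), mean(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.slope = sxy / sxx if sxx else 0.0
        self.intercept = my - self.slope * mx
        return self

    def predict(self, xs):
        return [self.intercept + self.slope * x for x in xs]


def cv_rmse(model_cls, xs, ys, k=5):
    """Mean RMSE over k folds; each row is used for validation exactly once."""
    n = len(xs)
    scores = []
    for fold in range(k):
        val = list(range(fold, n, k))  # every k-th row, offset by the fold number
        val_set = set(val)
        train = [i for i in range(n) if i not in val_set]
        model = model_cls().fit([xs[i] for i in train], [ys[i] for i in train])
        preds = model.predict([xs[i] for i in val])
        scores.append(rmse([ys[i] for i in val], preds))
    return mean(scores)


def rank_candidates(xs, ys, candidates, holdout_fraction=0.2, seed=0):
    """Rank candidates by cross-validated RMSE, then score the winner on a holdout."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - holdout_fraction))
    dev, hold = idx[:cut], idx[cut:]  # holdout rows are never used for fitting or ranking
    dev_x, dev_y = [xs[i] for i in dev], [ys[i] for i in dev]
    ranking = sorted(candidates, key=lambda cls: cv_rmse(cls, dev_x, dev_y))
    best = ranking[0]().fit(dev_x, dev_y)  # refit the winner on all development rows
    holdout = rmse([ys[i] for i in hold], best.predict([xs[i] for i in hold]))
    return [cls.name for cls in ranking], holdout
```

On perfectly linear data such as `ys = [2 * x + 1 for x in range(50)]`, the linear candidate ranks first and its holdout RMSE is close to zero. A real AutoML run follows the same pattern with many more model types, tuned hyperparameters, and metrics suited to the task.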

Understanding AutoML Results

Once your AutoML process completes, ML Clever presents the results in an organized dashboard:

AutoML Results Dashboard Screenshot

Top Models Section

The Top Models section displays all models created during your AutoML run, sorted by performance:

Model Card Information

  • Model Name: Descriptive name of the model type (e.g., "Random Forest", "XGBoost")
  • Model Type: Technical classification of the model
  • Score: Performance metric value (color-coded by performance level)
    • 90-100: Very good (dark green)
    • 75-89: Good (light green)
    • 50-74: Average (yellow)
    • 25-49: Warning (orange)
    • 0-24: Poor (red)
  • Status Indicator: Visual indicator of the model's current state
    • Completed: Training finished successfully
    • Started: Model is currently training
    • Error: Training failed (hover for details)
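The score color bands above amount to a simple threshold lookup. A minimal sketch, assuming scores are on the 0-100 scale shown and that a fractional score (for example 89.5) falls into the lower band; `score_band` is a hypothetical helper, not an ML Clever function:

```python
def score_band(score: float) -> tuple[str, str]:
    """Map a 0-100 model score to the label and color used on model cards."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Check bands from best to worst; the first threshold met wins.
    if score >= 90:
        return ("Very good", "dark green")
    if score >= 75:
        return ("Good", "light green")
    if score >= 50:
        return ("Average", "yellow")
    if score >= 25:
        return ("Warning", "orange")
    return ("Poor", "red")
```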

Action Buttons

  • Create Dashboard: Generate a detailed performance dashboard for the selected model
  • View Deployment: Access deployment options for the trained model
  • Predict: Use the model to make predictions on new data

Note: Clicking any model card opens the detailed model page, where you can explore performance metrics, feature importance, confusion matrices (for classification), and other in-depth analyses.

AutoML Best Practices

Follow these guidelines to get the most out of ML Clever's AutoML capabilities:

Clean Your Data First

While AutoML handles preprocessing, addressing obvious data quality issues (extreme outliers, duplicate records, irrelevant columns) beforehand can improve results significantly.
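As a minimal illustration of this kind of cleanup, the sketch below drops columns you consider irrelevant (such as a row ID) and then removes exact duplicate records. It assumes rows are plain dictionaries with hashable values; `clean_rows` is a hypothetical helper. Dropping the ID column first matters: two copies of the same record with different IDs would otherwise not be detected as duplicates.

```python
def clean_rows(rows, drop_columns=()):
    """Drop the listed columns, then remove exact duplicates, keeping the first occurrence."""
    seen = set()
    cleaned = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in drop_columns}
        # Sort by column name so the key does not depend on dict ordering.
        key = tuple(sorted(kept.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(kept)
    return cleaned
```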

Choose the Right AutoML Level

Match the AutoML level to your use case. For initial exploration, choose Quick; for critical applications with high performance requirements, choose Thorough.

Consider Compute Resources

Thorough optimization requires significant computing power. For large datasets, consider running AutoML during off-hours or upgrading your compute resources.

Review Multiple Models

While AutoML selects the best overall model, review the top 3-5 models. Sometimes the second-best model might be preferable due to explainability, inference speed, or other factors.

Combine with Domain Knowledge

For critical applications, use AutoML as a starting point, then fine-tune the best model with your domain expertise using the manual model configuration options.

Save Pipeline Configurations

When you find effective preprocessing + AutoML combinations, save them as pipeline templates for future use with similar datasets.

Troubleshooting Common Issues

AutoML Process Stops or Fails

Occasionally, the AutoML process might stop unexpectedly or fail to complete.

Solutions:

  • Check that your dataset size is within platform limits
  • Ensure there are no extreme outliers causing numerical instability
  • Try reducing the AutoML level for very large datasets
  • Check if your account has sufficient compute credits remaining
  • For persistent issues, use the "Report Issue" button to contact support
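One common way to find extreme outliers before a run is Tukey's interquartile-range (IQR) fences. This is a general rule of thumb, not necessarily the check ML Clever performs; the function name and the fence factor of 3.0 (a conventional cutoff for "extreme" values, versus 1.5 for ordinary outliers) are assumptions.

```python
from statistics import quantiles


def iqr_outliers(values, factor=3.0):
    """Return indices of values outside [Q1 - factor*IQR, Q3 + factor*IQR].

    Requires at least two values (a limitation of statistics.quantiles).
    """
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]
```

For example, in `[10, 11, 12, 13, 14, 15, 16, 1000]` only the value at index 7 is flagged. Review flagged rows before removing them, since an extreme value can also be a genuine, important observation.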

Poor Model Performance

If all AutoML models show unexpectedly poor performance, there might be underlying data issues.

Solutions:

  • Review the ML Readiness score and address identified issues
  • Check for data leakage or target variable problems
  • Ensure your target variable has sufficient variance and signal
  • Consider feature engineering to create more predictive variables
  • Try a different problem formulation (e.g., binary vs. multi-class)
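Two of these checks can be partly automated. The sketch below flags a constant target and any numeric feature whose correlation with the target is so high that it may be leaking the answer. It is a hypothetical helper (the names and the 0.99 threshold are assumptions), and correlation only catches linear relationships between numeric columns, so a clean result does not rule out leakage.

```python
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either column is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def diagnose(features, target, leak_threshold=0.99):
    """features: dict mapping column name -> list of numeric values."""
    warnings = []
    if pvariance(target) == 0:
        warnings.append("target is constant: there is nothing to predict")
    for name, values in features.items():
        r = pearson(values, target)
        if abs(r) >= leak_threshold:
            warnings.append(
                f"'{name}' is almost perfectly correlated with the target "
                f"(r={r:.3f}); check for leakage"
            )
    return warnings
```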

Long Processing Times

AutoML can sometimes take longer than expected, especially with complex datasets.

Solutions:

  • Select a faster AutoML level for quicker results
  • Reduce the number of features by removing low-importance columns
  • Consider sampling your data if you have millions of rows
  • Schedule AutoML jobs during off-hours for large datasets
  • Use the "Stop Task" button if needed; models that have already finished training will be saved
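When sampling a classification dataset, plain random sampling can under-represent rare classes. A minimal stratified-sampling sketch, assuming you supply a function that reads each row's label; `stratified_sample` is hypothetical and not an ML Clever feature:

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_of, fraction, seed=0):
    """Sample about `fraction` (0 to 1) of the rows from each class, preserving class proportions."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    sample = []
    for members in by_class.values():
        # Keep at least one row of every class so rare classes never disappear.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample
```

With 900 rows of class "a" and 100 of class "b", a 10% sample keeps 90 and 10 rows respectively, so the 9:1 ratio is preserved.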

Memory Errors

Very large datasets or complex models might exceed available memory.

Solutions:

  • Reduce dataset size through appropriate sampling
  • Remove unnecessary high-cardinality categorical features
  • Upgrade to a higher compute tier with more memory
  • Use the Quick or Balanced AutoML level instead of Thorough
  • Pre-aggregate or bin features with high dimensionality
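For a high-cardinality categorical feature, one way to pre-aggregate is to merge rarely seen categories into a single bucket, which shrinks one-hot encodings and the memory they need. A minimal sketch; the function name and the `min_count` threshold are assumptions to tune for your data:

```python
from collections import Counter


def lump_rare_categories(values, min_count=10, other="Other"):
    """Replace categories seen fewer than `min_count` times with a single bucket."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]
```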

Frequently Asked Questions

How long does AutoML typically take to run?

Runtime varies significantly based on dataset size, number of features, and the selected AutoML level. Quick Exploration usually completes in 2-5 minutes, Balanced takes 10-20 minutes, and Thorough Optimization can take 30-60+ minutes for complex datasets. The progress indicators will show estimated completion time.

Can I stop an AutoML run in the middle?

Yes, you can stop an AutoML run at any time using the "Stop Task" button. ML Clever will save all completed models up to that point, allowing you to review partial results. This is useful if you've already found a satisfactory model or need to adjust your approach.

What algorithms does AutoML try?

For classification tasks, AutoML evaluates models including Logistic Regression, Decision Trees, Random Forests, Gradient Boosting, Support Vector Machines, K-Nearest Neighbors, and Neural Networks. For regression, it tries Linear Regression, Ridge/Lasso Regression, Decision Trees, Random Forests, Gradient Boosting (XGBoost, LightGBM), and Neural Networks. The specific set varies based on your data characteristics and chosen AutoML level.

How does AutoML handle missing values and categorical features?

AutoML automatically detects and handles missing values using imputation techniques appropriate for your data. For numerical features, it may use mean, median, or model-based imputation. For categorical features, it typically uses mode imputation or creates a dedicated "missing" category. Categorical features are automatically encoded using appropriate methods like one-hot, label, or target encoding based on cardinality and relationship with the target.
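The strategies described above can be illustrated in plain Python. This is a simplified sketch, not ML Clever's preprocessing code: missing values are represented as `None`, numeric gaps are filled with the median, categorical gaps get either a dedicated "missing" label or the mode, and the encoding is chosen by cardinality alone using an arbitrary threshold of 15 (the platform may also consider each feature's relationship with the target, as noted above). All function names are hypothetical.

```python
from statistics import median, mode

MISSING = None  # this sketch represents missing values as None


def impute_numeric(values):
    """Fill missing numeric values with the median of the observed ones."""
    observed = [v for v in values if v is not MISSING]
    fill = median(observed)
    return [fill if v is MISSING else v for v in values]


def impute_categorical(values, strategy="missing_category"):
    """Fill missing categories with a dedicated label or with the most common value."""
    observed = [v for v in values if v is not MISSING]
    fill = "missing" if strategy == "missing_category" else mode(observed)
    return [fill if v is MISSING else v for v in values]


def choose_encoding(values, max_one_hot=15):
    """Pick an encoding by cardinality; the threshold of 15 is an arbitrary example."""
    return "one-hot" if len(set(values)) <= max_one_hot else "target"


def one_hot(values):
    """Return one 0/1 column per category, plus the category order used."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories
```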

Can I see which preprocessing steps were applied?

Yes, in the detailed model view (accessible by clicking on any model card), you'll find a "Preprocessing Pipeline" section that shows all preprocessing steps applied to your data, including imputation methods, encoding techniques, scaling approaches, and feature transformations.

How do I choose between multiple good models?

Consider these factors beyond the primary score: (1) Secondary metrics specific to your problem (e.g., precision vs. recall for imbalanced classification), (2) Model explainability if interpretability is important, (3) Inference speed for real-time applications, (4) Memory footprint for deployment constraints, and (5) Robustness to data drift for long-term stability.
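One way to apply these factors is to treat models whose scores fall within a small tolerance of the best as equivalent, then choose among them on a secondary criterion. The sketch below prefers the lowest inference latency and can optionally restrict the pool to interpretable models. The `Candidate` fields, the 1-point tolerance, and `pick_model` are hypothetical; you would fill in the values from each model's detail page or your own measurements.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    score: float        # primary metric, higher is better
    latency_ms: float   # measured inference time per prediction
    interpretable: bool


def pick_model(candidates, tolerance=1.0, need_interpretable=False):
    """Among models within `tolerance` points of the best score, prefer the fastest."""
    pool = [c for c in candidates if c.interpretable] if need_interpretable else list(candidates)
    if not pool:
        raise ValueError("no candidate satisfies the constraints")
    best_score = max(c.score for c in pool)
    near_best = [c for c in pool if c.score >= best_score - tolerance]
    return min(near_best, key=lambda c: c.latency_ms)
```

For instance, a model scoring 90.4 with 5 ms latency would be chosen over one scoring 91.0 with 40 ms latency, which matches the advice that the second-best model is sometimes preferable.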


Last updated: 3/22/2025
