
Train Your Regression and Classification Models

Welcome to the ML Clever model training portal—your one-stop solution to build robust regression and classification models without writing any code. Whether you prefer fine-tuning parameters manually or leveraging our intelligent AutoML engine, this guide will help you turn your preprocessed data into high-performance models.

Regression Models

Powerful predictive modeling algorithms for your no-code machine learning applications. Select from industry-standard regression techniques, each shipped with sensible default configurations.

Machine Learning Made Simple

Our no-code platform leverages the power of scikit-learn, TensorFlow, and PyTorch to provide state-of-the-art predictive modeling capabilities with minimal setup. Each model is carefully documented with implementation guidance, parameter configurations, and performance characteristics to help you choose the right tool for your analytical needs without writing a single line of code.

Available Regression Models

When to use Linear Regression: ideal for datasets with clear linear trends, baseline modeling, and high interpretability in straightforward predictive tasks.
Limitations of Linear Regression: struggles with capturing non-linear patterns and is highly sensitive to outliers, which can reduce prediction accuracy.

Statistical Foundation

Linear Regression fits its coefficients by ordinary least squares, minimizing the sum of squared errors; under the standard Gauss–Markov assumptions this yields the best linear unbiased estimates.

Computational Efficiency

Training reduces to a single least-squares solve, so cost grows with the number of features and samples; the underlying matrix operations are efficient enough for production deployment.

Data Requirements

Performs best when input features are not strongly collinear; feature scaling is optional but often helpful. Continuous and categorical variables are handled with appropriate preprocessing (for example, one-hot encoding).
Configure these parameters to optimize the Linear Regression for your specific use case. Each parameter affects model training, performance, and prediction accuracy.
Parameter | Type | Default | Description
fit_intercept | Boolean | True | Whether to calculate the intercept for this model
normalize | Boolean | False | Normalizes features before regression

Detailed Parameter Reference

fit_intercept: Whether to calculate the intercept for this model
Technical Implementation Guidance: True in most cases; False if the data is already centered or you want to force the line through the origin.
Effect on Model Performance: Setting to False forces the regression line to pass through the origin (0, 0).

normalize: Normalizes features before regression
Technical Implementation Guidance: True when features have different scales.
Effect on Model Performance: Helps when features have very different magnitudes or units.
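
If you want to see what these options map to outside the platform, the sketch below shows a rough scikit-learn equivalent; the synthetic data and parameter values are purely illustrative, and newer scikit-learn releases drop the normalize argument in favour of an explicit scaling step.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for your preprocessed dataset
X, y = make_regression(n_samples=200, n_features=5, noise=0.5, random_state=0)

# fit_intercept=True is the default; StandardScaler plays the role of normalize=True
model = make_pipeline(StandardScaler(), LinearRegression(fit_intercept=True))
model.fit(X, y)
print(model.score(X, y))  # R^2 on the training data
```
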
When is Random Forest Regressor most effective? Ideal for modeling complex non-linear relationships and delivering robust predictions across mixed data types.
Limitations of Random Forest: can be computationally intensive and offers less interpretability compared to simpler linear models.

Statistical Foundation

Random Forest Regressor averages the predictions of many decision trees, each trained on a bootstrap sample with a random subset of features, which reduces variance relative to a single tree.

Computational Efficiency

Training cost grows with the number of trees, the sample size, and tree depth; because the trees are built independently, training parallelizes well.

Data Requirements

No feature scaling is required. Handles non-linear relationships and mixed data types, with categorical variables encoded during preprocessing.
Configure these parameters to optimize the Random Forest Regressor for your specific use case. Each parameter affects model training, performance, and prediction accuracy.
Parameter | Type | Default | Description
n_estimators | Integer | 100 | Number of trees in the forest
max_depth | Integer | None | Maximum depth of each tree
min_samples_split | Integer | 2 | Minimum samples required to split a node

Detailed Parameter Reference

n_estimators: Number of trees in the forest
Technical Implementation Guidance: 100-200 for most datasets, 300+ for very complex problems.
Effect on Model Performance: More trees improve performance but increase computation time; diminishing returns after 100-200 trees.
Valid Numerical Range: 10–500

max_depth: Maximum depth of each tree
Technical Implementation Guidance: 4-10 for small/medium datasets, 10-20 for larger datasets; None lets trees grow fully.
Effect on Model Performance: Controls overfitting; smaller values create simpler models, larger values may overfit.
Valid Numerical Range: 1–50

min_samples_split: Minimum samples required to split a node
Technical Implementation Guidance: 2 for maximum tree growth, 5-10 for preventing overfitting.
Effect on Model Performance: Higher values prevent creating nodes with few samples, reducing overfitting.
Valid Numerical Range: 2–20
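
As a point of reference, a rough scikit-learn equivalent of this configuration might look like the following (synthetic data; the values are the suggested starting points above, not platform defaults).

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=1000, n_features=20, noise=0.3, random_state=0)

model = RandomForestRegressor(
    n_estimators=200,     # more trees help, with diminishing returns
    max_depth=10,         # limit depth to curb overfitting
    min_samples_split=5,  # require a few samples before splitting a node
    random_state=0,
)
model.fit(X, y)
```
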
Best use cases for Ridge Regression: effective for datasets with correlated features and where reducing overfitting in linear models is critical.
Limitations of Ridge Regression: it still assumes linear relationships and may not capture complex non-linear trends.

Statistical Foundation

Ridge Regression adds an L2 penalty to the least-squares objective, shrinking coefficients toward zero; the estimates are slightly biased but have lower variance, which stabilizes models with correlated features.

Computational Efficiency

Training cost grows with the number of features and samples; efficient closed-form and iterative solvers keep it fast enough for production deployment.

Data Requirements

Features should be standardized so the penalty treats all coefficients on the same scale; categorical variables need appropriate encoding during preprocessing.
Configure these parameters to optimize the Ridge Regression for your specific use case. Each parameter affects model training, performance, and prediction accuracy.
Parameter | Type | Default | Description
alpha | Float | 1.0 | Regularization strength
solver | String | 'auto' | Solver algorithm to use

Detailed Parameter Reference

alpha: Regularization strength
Technical Implementation Guidance: 0.1-0.5 for mild regularization, 1.0-5.0 for stronger regularization.
Effect on Model Performance: Higher values increase regularization, shrinking coefficients closer to zero.
Valid Numerical Range: 0.0–10.0

solver: Solver algorithm to use
Technical Implementation Guidance: 'auto' for automatic selection; 'sag' or 'saga' for large datasets.
Effect on Model Performance: Different solvers are optimized for different dataset sizes and structures.
Available Configuration Options: ['auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga']
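
A comparable scikit-learn sketch, standardizing features first so the alpha penalty treats all coefficients equally (data and values are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=30, noise=1.0, random_state=0)

# Standardizing keeps the alpha penalty comparable across features
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0, solver="auto"))
model.fit(X, y)
```
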
When to use CatBoost Regressor: ideal for datasets with numerous categorical features and for building production-ready models with minimal preprocessing.
Limitations of CatBoost: training can be slower than other boosting methods, which may affect rapid experimentation.

Statistical Foundation

CatBoost Regressor is a gradient boosting method: each new tree is fit to the gradient of the loss on the current ensemble's errors, with ordered boosting and native categorical handling designed to reduce overfitting.

Computational Efficiency

Training cost grows with the number of iterations, tree depth, and sample size, and is typically slower than other boosting implementations.

Data Requirements

No feature scaling is required, and categorical features can be passed directly, which keeps preprocessing minimal.
Configure these parameters to optimize the CatBoost Regressor for your specific use case. Each parameter affects model training, performance, and prediction accuracy.
Parameter | Type | Default | Description
iterations | Integer | 1000 | Maximum number of trees to build
learning_rate | Float | 0.1 | Step size shrinkage used to prevent overfitting
depth | Integer | 6 | Depth of the trees

Detailed Parameter Reference

iterations: Maximum number of trees to build
Technical Implementation Guidance: 500-1000 for small datasets, 1000-3000 for medium/large datasets.
Effect on Model Performance: More iterations improve accuracy until convergence but increase training time.
Valid Numerical Range: 100–10000

learning_rate: Step size shrinkage used to prevent overfitting
Technical Implementation Guidance: 0.01-0.03 with many iterations (3000+), 0.05-0.1 with a medium number of iterations (1000-2000).
Effect on Model Performance: Lower values require more iterations but often produce better models.
Valid Numerical Range: 0.01–1.0

depth: Depth of the trees
Technical Implementation Guidance: 6-8 for most problems, 8-10 for complex relationships.
Effect on Model Performance: Deeper trees can model more complex patterns but risk overfitting.
Valid Numerical Range: 1–16
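
For orientation, an equivalent setup with the CatBoost Python package might look roughly like this (synthetic numeric data; pass cat_features to fit() when categorical columns are present):

```python
from catboost import CatBoostRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=1000, n_features=15, noise=0.5, random_state=0)

model = CatBoostRegressor(
    iterations=1000,     # maximum number of trees
    learning_rate=0.05,  # lower rate pairs with more iterations
    depth=6,             # tree depth
    verbose=False,
)
model.fit(X, y)  # add cat_features=[...] for categorical columns
```
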
Best scenarios for LightGBM Regressor: optimized for large datasets with fast training and efficient memory usage.
Limitations of LightGBM: requires careful parameter tuning—especially on smaller datasets—to avoid overfitting.

Statistical Foundation

LightGBM Regressor is a gradient boosting method that grows trees leaf-wise (best-first) and finds splits with histogram-based binning.

Computational Efficiency

Histogram-based training is fast and memory-efficient, so cost grows gently with sample size and feature count; well suited to large datasets.

Data Requirements

No feature scaling is required. Categorical variables should be encoded (or marked as categorical) during preprocessing.
Configure these parameters to optimize the LightGBM Regressor for your specific use case. Each parameter affects model training, performance, and prediction accuracy.
Parameter | Type | Default | Description
n_estimators | Integer | 100 | Number of boosting iterations
learning_rate | Float | 0.1 | Controls how much each tree contributes to the final outcome
max_depth | Integer | -1 | Maximum tree depth; -1 means no limit
num_leaves | Integer | 31 | Maximum number of leaves in one tree

Detailed Parameter Reference

n_estimators: Number of boosting iterations
Technical Implementation Guidance: 100-500 for most datasets; increase for complex problems.
Effect on Model Performance: More estimators improve accuracy but with diminishing returns and increased training time.
Valid Numerical Range: 10–1000

learning_rate: Controls how much each tree contributes to the final outcome
Technical Implementation Guidance: 0.05-0.1 is a good starting point; use lower values with more trees.
Effect on Model Performance: Lower rates need more trees but often produce better-generalizing models.
Valid Numerical Range: 0.01–1.0

max_depth: Maximum tree depth; -1 means no limit
Technical Implementation Guidance: 8-15 for controlled growth, -1 for maximum accuracy (with risk of overfitting).
Effect on Model Performance: Limiting depth prevents overfitting but may underfit complex relationships.
Valid Numerical Range: -1 to 100

num_leaves: Maximum number of leaves in one tree
Technical Implementation Guidance: 20-40 for small/medium datasets, 50-100 for large complex datasets.
Effect on Model Performance: Primary parameter affecting model complexity; higher values increase accuracy but may overfit.
Valid Numerical Range: 2–1000
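
A rough equivalent using the LightGBM Python package, with values taken from the guidance above (synthetic data, illustrative only):

```python
from lightgbm import LGBMRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=5000, n_features=40, noise=0.5, random_state=0)

model = LGBMRegressor(
    n_estimators=300,    # boosting iterations
    learning_rate=0.05,  # lower rate, more iterations
    max_depth=-1,        # no depth limit; complexity governed by num_leaves
    num_leaves=31,       # primary complexity control
)
model.fit(X, y)
```
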
When to opt for XGBoost Regressor: highly effective for tabular data with complex interactions and for achieving competition-winning performance.
Limitations of XGBoost: can be prone to overfitting if hyperparameters are not meticulously tuned.

Statistical Foundation

XGBoost Regressor is a regularized gradient boosting method: each tree is fit to the gradient (and curvature) of the loss, with L1/L2 penalties on leaf weights to control overfitting.

Computational Efficiency

Training cost grows with the number of trees, their depth, and the sample size; split finding is parallelized, and early stopping can shorten training.

Data Requirements

No feature scaling is required; missing values are handled natively, and categorical variables need encoding during preprocessing.
Configure these parameters to optimize the XGBoost Regressor for your specific use case. Each parameter affects model training, performance, and prediction accuracy.
Parameter | Type | Default | Description
n_estimators | Integer | 100 | Number of gradient boosted trees
learning_rate | Float | 0.1 | Step size shrinkage used to prevent overfitting
max_depth | Integer | 6 | Maximum depth of a tree
subsample | Float | 1.0 | Subsample ratio of training instances

Detailed Parameter Reference

n_estimators: Number of gradient boosted trees
Technical Implementation Guidance: 100-500 for most problems; more for complex datasets with early stopping.
Effect on Model Performance: More trees improve accuracy but require more computation time.
Valid Numerical Range: 10–1000

learning_rate: Step size shrinkage used to prevent overfitting
Technical Implementation Guidance: 0.01-0.05 with more trees (500+), 0.1 with fewer trees (100-200).
Effect on Model Performance: Lower rates provide more accurate models but require more trees and longer training.
Valid Numerical Range: 0.01–1.0

max_depth: Maximum depth of a tree
Technical Implementation Guidance: 3-6 for simpler problems, 6-10 for complex datasets.
Effect on Model Performance: Deeper trees can model more complex patterns but increase overfitting risk.
Valid Numerical Range: 1–15

subsample: Subsample ratio of training instances
Technical Implementation Guidance: 0.8-1.0 for most cases, 0.5-0.8 to prevent overfitting on noisy data.
Effect on Model Performance: Lower values make training robust to noise but might increase underfitting.
Valid Numerical Range: 0.5–1.0
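
A comparable sketch with the XGBoost Python package (values follow the guidance above; the data is synthetic):

```python
from sklearn.datasets import make_regression
from xgboost import XGBRegressor

X, y = make_regression(n_samples=2000, n_features=25, noise=0.5, random_state=0)

model = XGBRegressor(
    n_estimators=300,    # gradient boosted trees
    learning_rate=0.05,  # shrinkage per tree
    max_depth=6,         # tree depth
    subsample=0.8,       # row sampling adds robustness to noise
)
model.fit(X, y)
```
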
Best use cases for Lasso Regression: ideal for high-dimensional datasets needing automatic feature selection to build sparse, interpretable models.
Limitations of Lasso: may perform poorly with highly correlated features, leading to less stable predictions.

Statistical Foundation

Lasso Regression minimizes squared error with an added L1 penalty, which shrinks some coefficients exactly to zero and so performs automatic feature selection.

Computational Efficiency

Fitting uses coordinate descent, so training cost grows with the number of features, samples, and iterations; it remains practical for high-dimensional data.

Data Requirements

Standardize features so the penalty is applied evenly across them; categorical variables need appropriate encoding during preprocessing.
Configure these parameters to optimize the Lasso Regression for your specific use case. Each parameter affects model training, performance, and prediction accuracy.
Parameter | Type | Default | Description
alpha | Float | 1.0 | Constant that multiplies the L1 term, controlling regularization strength
max_iter | Integer | 1000 | Maximum number of iterations
selection | String | 'cyclic' | If 'random', a random coefficient is updated every iteration

Detailed Parameter Reference

alpha: Constant that multiplies the L1 term, controlling regularization strength
Technical Implementation Guidance: 0.01-0.1 for mild feature selection, 0.5-1.0 for aggressive selection.
Effect on Model Performance: Higher values create simpler models by forcing more coefficients to exactly zero.
Valid Numerical Range: 0.0–10.0

max_iter: Maximum number of iterations
Technical Implementation Guidance: 1000-5000 for most datasets; more for complex problems.
Effect on Model Performance: Increase if the model fails to converge with default settings.
Valid Numerical Range: 100–10000

selection: If 'random', a random coefficient is updated every iteration
Technical Implementation Guidance: 'cyclic' for deterministic results; 'random' sometimes helps with convergence.
Effect on Model Performance: Random selection can be faster for high-dimensional problems.
Available Configuration Options: ['cyclic', 'random']
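
To illustrate the feature-selection effect, here is a rough scikit-learn sketch on synthetic data where only a few features are informative (values are illustrative):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Only 10 of the 100 features actually carry signal
X, y = make_regression(n_samples=500, n_features=100, n_informative=10,
                       noise=1.0, random_state=0)

model = make_pipeline(StandardScaler(),
                      Lasso(alpha=0.1, max_iter=5000, selection="cyclic"))
model.fit(X, y)
print(np.sum(model.named_steps["lasso"].coef_ != 0), "features kept")
```
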
When to use Elastic Net: best for datasets with correlated features requiring balanced feature selection and moderate model complexity.
Limitations of Elastic Net: increased tuning complexity because it requires balancing two regularization parameters.

Statistical Foundation

Elastic Net minimizes squared error with a weighted combination of L1 and L2 penalties, balancing the sparsity of Lasso with the stability of Ridge on correlated features.

Computational Efficiency

Fitting uses coordinate descent, so training cost grows with the number of features, samples, and iterations.

Data Requirements

Standardize features so the penalties are applied evenly; categorical variables need appropriate encoding during preprocessing.
Configure these parameters to optimize the Elastic Net for your specific use case. Each parameter affects model training, performance, and prediction accuracy.
Parameter | Type | Default | Description
alpha | Float | 1.0 | Constant that multiplies the penalty terms
l1_ratio | Float | 0.5 | The mixing parameter, with 0 <= l1_ratio <= 1
max_iter | Integer | 1000 | Maximum number of iterations

Detailed Parameter Reference

alpha: Constant that multiplies the penalty terms
Technical Implementation Guidance: 0.1-1.0 for most datasets; tune based on cross-validation.
Effect on Model Performance: Controls overall regularization strength; higher values increase regularization.
Valid Numerical Range: 0.0–10.0

l1_ratio: The mixing parameter, with 0 <= l1_ratio <= 1
Technical Implementation Guidance: 0.1-0.3 for ridge-like behavior, 0.7-0.9 for lasso-like behavior, 0.5 for an equal mix.
Effect on Model Performance: 0 is the Ridge penalty, 1 is the Lasso penalty, and values in between mix both.
Valid Numerical Range: 0.0–1.0

max_iter: Maximum number of iterations
Technical Implementation Guidance: 1000-5000 for convergence in most problems.
Effect on Model Performance: Increase for complex datasets if convergence warnings appear.
Valid Numerical Range: 100–10000
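
A rough scikit-learn equivalent, again on synthetic, illustrative data:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=50, n_informative=15,
                       noise=1.0, random_state=0)

# l1_ratio=0.5 mixes the Lasso (L1) and Ridge (L2) penalties equally
model = make_pipeline(StandardScaler(),
                      ElasticNet(alpha=0.5, l1_ratio=0.5, max_iter=5000))
model.fit(X, y)
```
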
Best use cases for K Neighbors Regressor: suited for small to medium datasets with non-linear relationships, offering a simple, instance-based prediction method.
Limitations of K Neighbors: performance can suffer on large datasets and is highly sensitive to irrelevant features and feature scaling.

Statistical Foundation

K Neighbors Regressor is a non-parametric method: a prediction is the (optionally distance-weighted) average of the target values of the k nearest training points.

Computational Efficiency

Training amounts to storing the data; prediction cost grows with training-set size and dimensionality, so large datasets slow inference.

Data Requirements

Feature scaling is essential because predictions depend on distances; irrelevant features should be removed, and categorical variables encoded during preprocessing.
Configure these parameters to optimize the K Neighbors Regressor for your specific use case. Each parameter affects model training, performance, and prediction accuracy.
Parameter | Type | Default | Description
n_neighbors | Integer | 5 | Number of neighbors to use for prediction
weights | String | 'uniform' | Weight function used in prediction
algorithm | String | 'auto' | Algorithm used to compute nearest neighbors

Detailed Parameter Reference

n_neighbors: Number of neighbors to use for prediction
Technical Implementation Guidance: 3-5 for small datasets, 5-10 for medium datasets, 10+ for noisy data.
Effect on Model Performance: Lower values make the model sensitive to noise, while higher values smooth predictions.
Valid Numerical Range: 1–20

weights: Weight function used in prediction
Technical Implementation Guidance: 'uniform' for equal neighbor weights; 'distance' when closer neighbors should matter more.
Effect on Model Performance: 'distance' gives more weight to closer neighbors, often improving accuracy.
Available Configuration Options: ['uniform', 'distance']

algorithm: Algorithm used to compute nearest neighbors
Technical Implementation Guidance: 'auto' selects the most appropriate algorithm based on the data.
Effect on Model Performance: 'ball_tree' suits high dimensions, 'kd_tree' suits lower dimensions, and 'brute' suits small datasets.
Available Configuration Options: ['auto', 'ball_tree', 'kd_tree', 'brute']
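
Because predictions are distance-based, scaling matters; a rough scikit-learn sketch with illustrative values:

```python
from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=800, n_features=8, noise=0.5, random_state=0)

# Scaling is essential: predictions depend on distances between points
model = make_pipeline(
    StandardScaler(),
    KNeighborsRegressor(n_neighbors=5, weights="distance", algorithm="auto"),
)
model.fit(X, y)
```
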
When to choose SVM Regressor: effective for medium-sized datasets with complex patterns in high-dimensional spaces.
Limitations of SVM Regressor: can be slow on large datasets and requires careful parameter tuning for optimal performance.

Statistical Foundation

SVM Regressor (epsilon-SVR) fits a function that keeps most training points within an epsilon-wide tube, penalizing only points outside it, and uses kernels to capture non-linear patterns.

Computational Efficiency

Training cost grows rapidly (roughly quadratically or worse) with the number of samples, so it is best suited to small and medium datasets.

Data Requirements

Requires scaled features for good performance; categorical variables need appropriate encoding during preprocessing.
Configure these parameters to optimize the SVM Regressor for your specific use case. Each parameter affects model training, performance, and prediction accuracy.
Parameter | Type | Default | Description
kernel | String | 'rbf' | Specifies the kernel type to be used in the algorithm
C | Float | 1.0 | Regularization parameter
epsilon | Float | 0.1 | Epsilon in the epsilon-SVR model; specifies the epsilon-tube within which no penalty is given

Detailed Parameter Reference

kernel: Specifies the kernel type to be used in the algorithm
Technical Implementation Guidance: 'linear' for linearly separable data, 'rbf' for most problems, 'poly' for specific non-linear patterns.
Effect on Model Performance: Determines how the algorithm transforms the data to find support vectors.
Available Configuration Options: ['linear', 'poly', 'rbf', 'sigmoid']

C: Regularization parameter
Technical Implementation Guidance: 0.1-1.0 for noisy data, 1.0-10.0 for clean data; larger for more complex patterns.
Effect on Model Performance: Smaller values specify stronger regularization, helping to prevent overfitting.
Valid Numerical Range: 0.1–100.0

epsilon: Epsilon in the epsilon-SVR model; specifies the epsilon-tube within which no penalty is given
Technical Implementation Guidance: 0.1 for most problems, or smaller (0.01-0.05) for more accurate models.
Effect on Model Performance: Controls the width of the epsilon-tube, which affects the number of support vectors.
Valid Numerical Range: 0.01–1.0
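
A comparable scikit-learn sketch; SVR is scale-sensitive, so features are standardized first (data and values are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = make_regression(n_samples=1000, n_features=10, noise=0.5, random_state=0)

# Standardize before fitting: the RBF kernel depends on feature scale
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
model.fit(X, y)
```
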
Best scenarios for Decision Tree Regressor: ideal for interpretable modeling, capturing non-linear patterns, and handling mixed data types.
Limitations of Decision Trees: prone to overfitting and instability, as small data changes can lead to very different tree structures.

Statistical Foundation

Decision Tree Regressor recursively partitions the feature space, choosing splits that most reduce the squared error (variance) within the resulting nodes.

Computational Efficiency

Training cost grows with the number of samples, features, and tree depth; prediction is very fast.

Data Requirements

No feature scaling is required. Handles non-linear patterns and mixed data types, with categorical variables encoded during preprocessing.
Configure these parameters to optimize the Decision Tree Regressor for your specific use case. Each parameter affects model training, performance, and prediction accuracy.
Parameter | Type | Default | Description
max_depth | Integer | None | Maximum depth of the tree
min_samples_split | Integer | 2 | Minimum number of samples required to split a node
min_samples_leaf | Integer | 1 | Minimum number of samples required to be at a leaf node

Detailed Parameter Reference

max_depth: Maximum depth of the tree
Technical Implementation Guidance: 3-5 for simple problems, 5-10 for medium complexity; None lets the tree grow fully (risk of overfitting).
Effect on Model Performance: Controls model complexity; deeper trees capture complex relationships but may overfit.
Valid Numerical Range: 1–50

min_samples_split: Minimum number of samples required to split a node
Technical Implementation Guidance: 5-10 for small datasets, 20-50 for large datasets to prevent overfitting.
Effect on Model Performance: Larger values prevent creating nodes with few samples, reducing the risk of overfitting.
Valid Numerical Range: 2–20

min_samples_leaf: Minimum number of samples required to be at a leaf node
Technical Implementation Guidance: 1-5 for small datasets, 10-20 for larger datasets.
Effect on Model Performance: Prevents creation of very small leaves, improving model generalization.
Valid Numerical Range: 1–20
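
A rough scikit-learn equivalent with the conservative settings suggested above (synthetic data, illustrative only):

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=12, noise=0.5, random_state=0)

model = DecisionTreeRegressor(
    max_depth=5,           # cap depth to keep the tree interpretable
    min_samples_split=10,  # don't split very small nodes
    min_samples_leaf=5,    # keep leaves from becoming too specific
    random_state=0,
)
model.fit(X, y)
```
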
When to use Gradient Boosting Regressor: ideal for achieving high predictive accuracy across diverse feature types and for handling outliers robustly.
Limitations of Gradient Boosting: can be computationally intensive and may overfit if default parameters are not properly tuned.

Statistical Foundation

Gradient Boosting Regressor builds trees sequentially, with each new tree fit to the negative gradient of the loss (the residuals) of the current ensemble.

Computational Efficiency

Because trees are built sequentially, training cost scales roughly linearly with the number of boosting stages times the cost of a single tree.

Data Requirements

No feature scaling is required; categorical variables need appropriate encoding during preprocessing.
Configure these parameters to optimize the Gradient Boosting Regressor for your specific use case. Each parameter affects model training, performance, and prediction accuracy.
Parameter | Type | Default | Description
n_estimators | Integer | 100 | Number of boosting stages to perform
learning_rate | Float | 0.1 | Shrinks the contribution of each tree
max_depth | Integer | 3 | Maximum depth of the regression estimators
subsample | Float | 1.0 | Fraction of samples used for fitting individual trees

Detailed Parameter Reference

n_estimators: Number of boosting stages to perform
Technical Implementation Guidance: 100-500 depending on dataset size and complexity.
Effect on Model Performance: More estimators generally improve performance until a point of diminishing returns.
Valid Numerical Range: 50–1000

learning_rate: Shrinks the contribution of each tree
Technical Implementation Guidance: 0.05-0.1 for most cases, with lower values (0.01-0.05) if more trees are used.
Effect on Model Performance: Lower learning rates require more trees but often yield better accuracy.
Valid Numerical Range: 0.01–0.3

max_depth: Maximum depth of the regression estimators
Technical Implementation Guidance: 3-5 is ideal for most problems.
Effect on Model Performance: Shallow trees are generally preferred to balance accuracy and overfitting risk.
Valid Numerical Range: 1–10

subsample: Fraction of samples used for fitting individual trees
Technical Implementation Guidance: 0.8-1.0 for most datasets, or 0.5-0.8 for very noisy data to introduce randomness.
Effect on Model Performance: Values below 1.0 can help prevent overfitting by adding randomness to the training process.
Valid Numerical Range: 0.5–1.0
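
A comparable scikit-learn sketch with values drawn from the guidance above (synthetic, illustrative data):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=2000, n_features=20, noise=0.5, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=300,    # boosting stages
    learning_rate=0.05,  # lower rate pairs with more stages
    max_depth=3,         # shallow trees work well for boosting
    subsample=0.8,       # stochastic boosting to reduce overfitting
    random_state=0,
)
model.fit(X, y)
```
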
