Data imputation is the process of replacing missing values in your dataset. Missing data can reduce model accuracy and lead to biased results. Our platform makes it easy to apply various imputation techniques without writing code.
The imputation selection in our form offers three methods to handle missing values in your dataset:
Simple Imputer: Replaces missing values with the mean, median, or mode of the column. This is the fastest option and works well for most datasets where values are missing at random.
KNN Imputer: Uses k-nearest neighbors to fill missing values based on similar data points. Works well when similar records tend to have similar values.
Iterative Imputer: Models each feature with missing values as a function of the other features. Best when there are relationships between features in your data.
To choose an imputation method, click one of the three imputation option cards in the form. The selected card is highlighted in the primary color.
Imputation Method | Best For | Consider When |
---|---|---|
Simple Imputer | | You have limited computational resources or need a quick solution |
KNN Imputer | | Your data has clear patterns or groups where similar records share similar values |
Iterative Imputer | | You need high accuracy and can trade computation time for better imputation results |
Simple imputation replaces missing values using basic statistical measures from your data: the mean, median, or mode of the column.
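No code is needed to use this option in the form. For readers who want to see the idea concretely, the following is a minimal Python sketch of filling a single column with one statistic. It is an illustration under assumptions, not the platform's implementation: the function name `simple_impute`, the `strategy` argument, and the sample values are all hypothetical, and missing entries are represented as `None`.

```python
from statistics import mean, median, mode


def simple_impute(column, strategy="mean"):
    """Replace None entries with one statistic of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    # Pick the statistic by name and compute it once for the whole column.
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in column]


# Hypothetical column with one missing entry.
ages = [20, None, 30, 30, 100]
print(simple_impute(ages, "mean"))    # missing entry becomes 45
print(simple_impute(ages, "median"))  # missing entry becomes 30
print(simple_impute(ages, "mode"))    # missing entry becomes 30
```

In this sample the outlier (100) pulls the mean well above the median and mode, which is one reason the choice of statistic matters for skewed columns.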
The K-Nearest Neighbors imputer fills each missing value using the values of the most similar data points.
This preserves the relationships between data points and works well when similar records have similar values.
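As a rough illustration of the nearest-neighbor idea, the sketch below fills each missing cell with the average of that column over the `k` most similar rows. It is a simplified sketch under assumptions, not the platform's implementation: `knn_impute`, the sample rows, and the distance rule (root mean squared difference over the columns both rows have observed) are chosen here for illustration only.

```python
import math


def knn_impute(rows, k=2):
    """Fill None cells with the column average over the k nearest rows."""

    def distance(a, b):
        # Compare only the columns that are observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other rows where column j is observed.
            donors = [r for n, r in enumerate(rows) if n != i and r[j] is not None]
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            if nearest:
                filled[i][j] = sum(r[j] for r in nearest) / len(nearest)
    return filled


# Hypothetical records: [height_cm, weight_kg]
people = [
    [160, 55],
    [162, 57],
    [185, 90],
    [187, 92],
    [161, None],
]
print(knn_impute(people, k=2)[4])  # weight averaged from the two closest heights
```

Here the last record is closest in height to the first two, so its weight is filled with their average (56.0) rather than the overall column mean, which the two heavier records would pull upward.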
The Iterative imputer is the most sophisticated option: it models each feature with missing values as a function of the other features.
This method is especially powerful when features in your dataset have strong relationships with each other.
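To make "modeling each feature as a function of the others" concrete, here is a deliberately small Python sketch restricted to two columns. It starts from a mean fill, then repeatedly fits a straight line predicting each column from the other and refreshes only the originally missing cells. Everything here is an assumption made for illustration: the names `fit_line` and `iterative_impute`, the use of a least-squares line as the model, the fixed number of rounds, and the sample data. The platform's actual model and stopping rule are not described in this guide.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return slope, my - slope * mx


def iterative_impute(rows, n_rounds=10):
    """Two-column sketch: each column is regressed on the other in turn."""
    missing = [[v is None for v in r] for r in rows]
    # Start from a simple mean fill, then refine it round by round.
    col_means = [mean([r[j] for r in rows if r[j] is not None]) for j in (0, 1)]
    data = [[col_means[j] if r[j] is None else r[j] for j in (0, 1)] for r in rows]
    for _ in range(n_rounds):
        for target in (0, 1):
            other = 1 - target
            # Fit only on rows where the target column was actually observed.
            train = [r for r, m in zip(data, missing) if not m[target]]
            slope, intercept = fit_line(
                [r[other] for r in train], [r[target] for r in train]
            )
            # Overwrite only the cells that were missing in the input.
            for r, m in zip(data, missing):
                if m[target]:
                    r[target] = slope * r[other] + intercept
    return data


# Hypothetical data in which the second column is exactly twice the first.
table = [[1, 2], [2, 4], [3, 6], [4, None], [None, 10]]
for row in iterative_impute(table, n_rounds=25):
    print([round(v, 2) for v in row])
```

With this sample, the two filled cells settle close to 8 and 5, the values implied by the relationship between the columns, whereas a plain mean fill would have used 5.5 and 2.5. The repeated rounds are also why this option costs more computation than the other two.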