Data Imputation

Data imputation is the process of replacing missing values in your dataset. Missing data can reduce model accuracy and lead to biased results. Our platform makes it easy to apply various imputation techniques without writing code.

Imputation Options

The imputation selection in our form offers three methods to handle missing values in your dataset:

Simple Imputer

Replaces missing values with the mean, median, or mode of the column. This is the fastest option and works well when only a small share of values is missing and the gaps occur at random.

KNN Imputer

Uses k-nearest neighbors to fill missing values based on similar data points. Works well when similar records tend to have similar values.

Iterative Imputer

Models each feature with missing values as a function of other features. Best when there are relationships between features in your data.

Selecting an Imputation Method

To choose an imputation method in the form, click one of the three imputation option cards. The selected card is highlighted in the primary color.

Selection Tips

  • For datasets with few missing values, Simple Imputer is usually sufficient
  • For datasets where similar records have similar values, use KNN Imputer
  • For complex datasets with relationships between features, use Iterative Imputer

When to Use Each Method

Simple Imputer
  • Best for: randomly missing data, quick analysis, small datasets
  • Consider when: you have limited computational resources or need a quick solution

KNN Imputer
  • Best for: structured datasets, similar records, clustered data
  • Consider when: your data has clear patterns or groups where similar records share similar values

Iterative Imputer
  • Best for: complex datasets, feature relationships, production models
  • Consider when: you need high accuracy and can trade computation time for better imputation results

How Each Imputation Method Works

Simple Imputer

Simple imputation replaces missing values using basic statistical measures from your data:

  • Mean: Replaces missing values with the average of the column (for numerical data)
  • Median: Replaces missing values with the middle value of the column (good for skewed data)
  • Mode: Replaces missing values with the most frequent value (works for categorical data)
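
The platform applies these strategies without code, but the arithmetic behind them can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the platform's implementation; the function name simple_impute and the sample columns are hypothetical. (scikit-learn ships a class named SimpleImputer with comparable strategies; whether the platform uses it is not stated here.)

```python
from statistics import mean, median, mode


def simple_impute(column, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    # Pick the summary statistic that matches the requested strategy.
    statistic = {"mean": mean, "median": median, "mode": mode}[strategy]
    fill = statistic(observed)
    return [fill if v is None else v for v in column]


# Hypothetical sample columns; None marks a missing value.
ages = [22, None, 35, 41, None, 90]
colors = ["red", "blue", None, "red"]

ages_mean = simple_impute(ages, "mean")      # gaps filled with 47
ages_median = simple_impute(ages, "median")  # gaps filled with 38.0
colors_mode = simple_impute(colors, "mode")  # gap filled with "red"
```

In this sample the outlier 90 pulls the mean fill (47) well above the median fill (38), which is why the median is the safer choice for skewed columns.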

KNN Imputer

The K-Nearest Neighbors imputer:

  1. Identifies the k most similar records to the one with missing values
  2. Uses the values from these similar records to calculate a replacement
  3. Weights closer neighbors more heavily in the calculation

This preserves the relationships between data points and works well when similar records have similar values.
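
The three steps above can be sketched as follows. This is an illustrative, standard-library-only version, not the platform's implementation. It assumes Euclidean distance over the features both records share and inverse-distance weighting; the function name knn_impute, the parameters k and eps, and the sample rows are all hypothetical.

```python
import math


def knn_impute(rows, k=2, eps=1e-9):
    """Fill None entries with an inverse-distance-weighted average of k nearest rows."""

    def distance(a, b):
        # Compare only the features that are observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Step 1: rank the other rows that have feature j by similarity.
            neighbors = sorted(
                (distance(row, other), other[j])
                for n, other in enumerate(rows)
                if n != i and other[j] is not None
            )[:k]
            neighbors = [(d, v) for d, v in neighbors if d < math.inf]
            if not neighbors:
                continue  # nothing comparable; leave the gap in place
            # Steps 2 and 3: average neighbor values, weighting closer rows more.
            weights = [1.0 / (d + eps) for d, _ in neighbors]
            filled[i][j] = sum(w * v for w, (_, v) in zip(weights, neighbors)) / sum(weights)
    return filled


# Hypothetical records: the second row is missing its third feature.
records = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, None],
    [0.9, 1.9, 12.0],
    [8.0, 9.0, 50.0],
]
completed = knn_impute(records, k=2)
imputed = completed[1][2]  # about 10.67, nearer to the closest neighbor's 10.0 than to 12.0
```

The distant fourth record does not influence the result because only the two nearest neighbors are used. With features on very different scales, distances would be dominated by the largest one, so a real pipeline would normally scale features first.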

Iterative Imputer

The Iterative imputer is the most sophisticated option:

  1. Treats each feature with missing values as a target variable
  2. Uses all other features to predict the missing values
  3. Repeats this process multiple times, refining estimates with each iteration

This method is especially powerful when features in your dataset have strong relationships with each other.
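
As a rough illustration of the loop described above, the sketch below runs round-robin imputation on a dataset with just two features, so "all other features" reduces to a single predictor and a simple least-squares line is enough. It is a simplified, standard-library-only example under those assumptions, not the platform's implementation; the names fit_line and iterative_impute, the n_iter parameter, and the sample data are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def iterative_impute(rows, n_iter=10):
    """Two-feature round-robin imputation: regress each column on the other, refill, repeat."""
    missing = [[v is None for v in row] for row in rows]
    # Initial guess: fill each gap with its column mean.
    col_means = [mean(row[j] for row in rows if row[j] is not None) for j in range(2)]
    data = [[col_means[j] if row[j] is None else row[j] for j in range(2)] for row in rows]

    for _ in range(n_iter):
        for target in (0, 1):
            predictor = 1 - target
            # Step 1: treat this column as the target, training on rows where it is observed.
            train = [(r[predictor], r[target]) for r, m in zip(data, missing) if not m[target]]
            slope, intercept = fit_line(*zip(*train))
            # Step 2: predict the missing entries from the other feature.
            for r, m in zip(data, missing):
                if m[target]:
                    r[target] = slope * r[predictor] + intercept
        # Step 3: the outer loop repeats, so each pass starts from better estimates.
    return data


# Hypothetical data generated from y = 2x + 1, with one gap in each column.
samples = [[1.0, 3.0], [2.0, None], [3.0, 7.0], [4.0, 9.0], [None, 11.0], [6.0, 13.0]]
result = iterative_impute(samples)
```

Under the generating rule both gaps have a true value of 5.0. The column-mean starting guesses (8.6 and 3.2) are far off, and the repeated passes pull the estimates toward the values implied by the relationship between the two features. With more than two features, each step would instead fit a multivariate model on all the other columns.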

Last updated: 3/22/2025

ML Clever Docs