Feature Encoding

Feature encoding converts categorical data into numerical formats that machine learning algorithms can process. Which method is suitable depends on the type of categorical data and on the modeling approach.

Encoding Options

Our form provides four encoding options for transforming categorical data in your dataset:

One-Hot Encoder

Creates one binary column for each category value. Best for nominal categories, where the values have no inherent order or ranking.

Label Encoder

Assigns a unique integer to each category. Simple, but the integers imply an ordering that may not be appropriate for all categorical data.

Ordinal Encoder

Similar to Label Encoder but specifically designed for ordinal data where categories have a meaningful order or rank.

None

Skip encoding if your data is already properly encoded or if you only have numerical features.

Selecting an Encoding Method

To choose an encoding method in the form, click one of the encoding option cards. The selected card is highlighted in the primary color.

Selection Tips

  • For most categorical variables with no inherent order, use One-Hot Encoder
  • For ordinal data with a clear ranking (e.g., small/medium/large), use Ordinal Encoder
  • For target variables in classification problems, use Label Encoder
  • When no categorical variables exist or for already encoded data, select None
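The tips above can be read as a small decision rule. The following is a minimal sketch of that rule, not the form's actual logic: the function name `suggest_encoding`, its parameters, and the order in which the checks are applied (target before ordinal) are assumptions made for illustration.

```python
def suggest_encoding(is_categorical: bool,
                     is_target: bool = False,
                     is_ordered: bool = False) -> str:
    """Return the encoding option suggested by the selection tips.

    Hypothetical helper: the names and the precedence of the checks
    are illustrative assumptions, not part of the product.
    """
    if not is_categorical:
        # Numerical or already-encoded columns need no encoding.
        return "None"
    if is_target:
        # Classification targets are mapped to integer labels.
        return "Label Encoder"
    if is_ordered:
        # Ranked categories such as small/medium/large.
        return "Ordinal Encoder"
    # Default for categories with no inherent order.
    return "One-Hot Encoder"


print(suggest_encoding(is_categorical=True))
print(suggest_encoding(is_categorical=True, is_ordered=True))
print(suggest_encoding(is_categorical=True, is_target=True))
print(suggest_encoding(is_categorical=False))
```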

When to Use Each Method

Encoding Method | Best For | Example Use Case
One-Hot Encoder | Nominal categories; no ordinal relationship; low cardinality features | Color, product type, country, or any category without an inherent order
Label Encoder | Target variables; decision tree models; high cardinality features | Target variables in classification, text labels where size is a concern
Ordinal Encoder | Ranked categories; clear ordering; hierarchical data | Education level, size categories (small/medium/large), satisfaction ratings
None | Numerical-only datasets; pre-encoded data; text/image processing | Already processed data, pure numerical datasets, specialized preprocessing pipelines

How Each Encoding Method Works

One-Hot Encoder

One-hot encoding creates a new binary column for each category value:

Original | Encoded (Red) | Encoded (Blue) | Encoded (Green)
Red | 1 | 0 | 0
Blue | 0 | 1 | 0
Green | 0 | 0 | 1
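As a rough illustration of the table above, the sketch below builds the same binary columns in plain Python. The function `one_hot_encode` is a hypothetical helper written for this example and is not the form's implementation; it assumes columns are created in order of first appearance.

```python
def one_hot_encode(values):
    """Return (categories, rows) with one binary column per category.

    Hypothetical helper: categories are ordered by first appearance.
    """
    # dict.fromkeys keeps insertion order and drops duplicates.
    categories = list(dict.fromkeys(values))
    # Each row has a 1 in the column of its own category and 0 elsewhere.
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


categories, rows = one_hot_encode(["Red", "Blue", "Green"])
print(categories)  # ['Red', 'Blue', 'Green']
for row in rows:
    print(row)     # [1, 0, 0] then [0, 1, 0] then [0, 0, 1]
```

Libraries such as pandas (`get_dummies`) and scikit-learn (`OneHotEncoder`) provide the same transformation for real datasets.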

Label Encoder

Label encoding assigns a unique integer to each category:

Original | Encoded
Red | 0
Blue | 1
Green | 2

Note: This creates an implicit ordering (Red < Blue < Green) which may not be appropriate for all categorical data.
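A minimal sketch of label encoding follows, assuming integers are assigned in order of first appearance so that the result matches the table above. The helper `label_encode` is hypothetical, and how the form itself numbers categories is not stated here. Library encoders may number them differently; scikit-learn's `LabelEncoder`, for example, sorts the classes, which would give Blue=0, Green=1, Red=2.

```python
def label_encode(values):
    """Return (mapping, codes), one integer per distinct category.

    Hypothetical helper: integers follow order of first appearance.
    """
    mapping = {}
    for value in values:
        # A new category receives the next unused integer.
        mapping.setdefault(value, len(mapping))
    codes = [mapping[value] for value in values]
    return mapping, codes


mapping, codes = label_encode(["Red", "Blue", "Green", "Blue"])
print(mapping)  # {'Red': 0, 'Blue': 1, 'Green': 2}
print(codes)    # [0, 1, 2, 1]
```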

Ordinal Encoder

Ordinal encoding is similar to label encoding but is explicitly used for data with a natural order:

Original | Encoded
Small | 0
Medium | 1
Large | 2

Here the ordering (Small < Medium < Large) correctly represents the inherent hierarchy in the data.
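The key difference from label encoding is that the order is declared explicitly rather than derived from the data. The sketch below shows this with a hypothetical helper, `ordinal_encode`, that takes the ranking as an argument. Raising an error for values outside the declared order is a design choice made for this example.

```python
def ordinal_encode(values, order):
    """Encode values by their position in an explicitly declared order.

    Hypothetical helper: `order` lists categories from lowest to highest.
    """
    rank = {category: position for position, category in enumerate(order)}
    unknown = sorted(set(values) - set(rank))
    if unknown:
        # Fail loudly instead of silently inventing a rank.
        raise ValueError(f"values not in the declared order: {unknown}")
    return [rank[value] for value in values]


sizes = ["Medium", "Small", "Large", "Small"]
size_codes = ordinal_encode(sizes, order=["Small", "Medium", "Large"])
print(size_codes)  # [1, 0, 2, 0]
```

Because the ranks follow the declared order, comparisons such as `code(Small) < code(Large)` hold regardless of the order in which the values appear in the data.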

Important Considerations

Encoding Trade-offs

One-Hot Encoding

  • Advantage: No ordinal relationship implied
  • Advantage: Works well with most algorithms
  • Drawback: Creates many new columns (higher dimensionality)

Label/Ordinal Encoding

  • Advantage: Maintains original column count
  • Advantage: Handles high cardinality features well
  • Drawback: May imply incorrect ordering in non-ordinal data

Handling High Cardinality

One-hot encoding can create too many columns when a feature has many unique values (high cardinality). For features with more than roughly 10 to 15 unique values, consider label encoding or a more advanced technique such as target encoding.
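To make the guideline concrete, the sketch below checks a feature's cardinality and shows the simplest form of target encoding, in which each category is replaced by the mean of the target for that category. Both helpers are hypothetical, and the default threshold of 15 is an assumption taken from the upper end of the range above. This basic version applies no smoothing and computes the means on the same rows it encodes, so it can leak target information; treat it as an illustration of the idea only.

```python
from collections import defaultdict


def is_high_cardinality(values, threshold=15):
    """Return True if the feature has more unique values than `threshold`.

    Hypothetical helper: the threshold is an assumed rule of thumb.
    """
    return len(set(values)) > threshold


def target_encode(categories, targets):
    """Replace each category with the mean target value of that category.

    Hypothetical helper: plain mean encoding without smoothing.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    means = {category: sums[category] / counts[category] for category in sums}
    return [means[category] for category in categories]


cities = ["A", "B", "A", "C", "B", "A"]
purchased = [1, 0, 0, 1, 1, 1]

print(is_high_cardinality(cities))  # False: only 3 unique values
encoded = target_encode(cities, purchased)
print(encoded)  # A maps to 2/3, B to 0.5, C to 1.0
```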

Last updated: 3/22/2025

ML Clever Docs