
Viewing & Analyzing Your Dataset

The Dataset View page serves as your central hub for data exploration, analysis, and preparation for machine learning. This interactive interface allows you to understand your data's characteristics, identify patterns, detect issues, and prepare for the next stages in your ML workflow.

Dataset Header & Controls

The dataset header provides essential information and powerful management controls:

Information Elements

Dataset Name

Double-click to edit the name directly in the header

Category Badge (e.g., FINANCE)

AI-selected indicator of your dataset's domain. Proper categorization enables intelligent recommendations.

Target Column (e.g., profit_margin)

Indicates which column is selected as your prediction target. The target selection impacts all downstream ML operations.

Action Controls

Edit Mode

Toggles Edit Mode, which enables column and row deletion

Delete Dataset

Permanently removes the dataset (with confirmation)

Preprocess Dataset

Proceeds to data preprocessing steps such as imputation, encoding, and scaling

AutoML

Runs automated preprocessing, model selection, and evaluation. Check the ML Readiness metrics before running it.

Warning: Any changes made while in Edit Mode (deleting rows/columns) are permanent and cannot be undone. Consider exporting a backup before making significant modifications.

Interactive Analysis Tabs

ML Clever provides seven specialized tabs for comprehensive data analysis, each offering unique insights:

Dataset Table

The primary view showing your raw data in tabular format with the following capabilities:

View data with column headers and all rows
In Edit Mode: Remove columns via the "X" above each column header
In Edit Mode: Delete specific rows via the "X" at the start of each row
Automatic data loading as you scroll down for efficient handling of large datasets
Sticky headers that remain visible while scrolling through data

Basic Stats

Statistical summary of each feature in your dataset:

For numerical features: mean, median, min, max, standard deviation, and quartiles
For categorical features: unique value counts, most frequent values, and cardinality
Data type detection and statistics appropriate for each type
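
To make the statistics above concrete, here is a minimal sketch of how such summaries can be computed with Python's standard library. It is not ML Clever's implementation: the function names, the sample columns, the use of None for missing values, and the "inclusive" quartile method are all assumptions made for illustration.

```python
import statistics
from collections import Counter

def numeric_summary(values):
    """Summary statistics for a numerical column (None marks a missing value)."""
    xs = [v for v in values if v is not None]
    # Quartile method is an assumption; tools differ in how they interpolate.
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "min": min(xs),
        "max": max(xs),
        "std": statistics.stdev(xs),  # sample standard deviation
        "q1": q1,
        "q3": q3,
    }

def categorical_summary(values, top=3):
    """Cardinality and most frequent values for a categorical column."""
    counts = Counter(v for v in values if v is not None)
    return {
        "cardinality": len(counts),
        "most_frequent": counts.most_common(top),
    }

# Hypothetical sample columns
revenue = [120.0, 95.5, None, 130.0, 101.25]
region = ["EU", "US", "EU", None, "APAC"]

print(numeric_summary(revenue))
print(categorical_summary(region))
```

Comparing figures like these with the values shown in the Basic Stats tab can be a quick sanity check after an upload, keeping in mind that quartile conventions may differ.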

Missing Values

Visual representation of missing data patterns:

Heatmap showing locations and patterns of missing values
Bar chart of missing percentage by column
Correlation between missing value patterns across columns
Recommendations for appropriate imputation strategies
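
The per-column missing percentage shown in the bar chart can be sketched as follows. This is an illustrative approximation, not ML Clever's code: rows are assumed to be dictionaries with None for missing cells, and both function names and the sample data are hypothetical.

```python
def missing_percentage(rows, columns):
    """Percentage of rows in which each column is missing (None)."""
    total = len(rows)
    if total == 0:
        return {col: 0.0 for col in columns}
    return {
        col: 100.0 * sum(1 for row in rows if row.get(col) is None) / total
        for col in columns
    }

def co_missing_count(rows, col_a, col_b):
    """Number of rows where both columns are missing at the same time."""
    return sum(
        1 for row in rows
        if row.get(col_a) is None and row.get(col_b) is None
    )

# Hypothetical sample rows
rows = [
    {"revenue": 120.0, "cost": 80.0, "region": "EU"},
    {"revenue": None, "cost": None, "region": "US"},
    {"revenue": 95.5, "cost": None, "region": None},
    {"revenue": 130.0, "cost": 90.0, "region": "EU"},
]

print(missing_percentage(rows, ["revenue", "cost", "region"]))
print(co_missing_count(rows, "revenue", "cost"))
```

Columns that are frequently missing together, as counted by the second function, are the kind of pattern the missing-value correlation view is meant to surface.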

Correlation

Interactive correlation matrix visualization:

Heatmap showing relationships between all numerical features
Highlights high correlation pairs that may indicate redundant features
Color-coded display from strong negative to strong positive correlations
Tooltips showing exact correlation values on hover

Feature Distribution

Visual analysis of data distributions:

Histograms for numerical features showing value distribution
Bar charts for categorical features showing value frequencies
Detection of outliers and skewness in distributions
Split view showing distribution in relation to target variable
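
The page does not say how outliers and skewness are detected. As a rough sketch under that caveat, the example below uses two common conventions: the third standardized moment for skewness and the 1.5 × IQR rule for outliers. Both choices, the function names, and the sample data are assumptions.

```python
import statistics

def skewness(xs):
    """Population skewness (third standardized moment); 0.0 for a constant column."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0

def iqr_outliers(xs, k=1.5):
    """Values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < low or x > high]

# Hypothetical column with one extreme value
order_size = [10, 12, 11, 13, 12, 11, 95]

print(skewness(order_size))      # positive: long right tail
print(iqr_outliers(order_size))
```

A strongly positive skew like this one is the sort of distribution that might call for a transformation or closer inspection of the extreme values.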

Feature Importance

Preliminary assessment of feature relevance:

Bar chart ranking features by their estimated importance to the target
Uses statistical relationships and preliminary models to assess relevance
Helps identify potentially irrelevant features that might be candidates for removal
Shows connection strength between each feature and your target variable

ML Readiness

Comprehensive assessment of your dataset's readiness for machine learning:

Overall readiness score with detailed breakdown
Checklist of potential issues that need addressing
Personalized recommendations for preprocessing steps
Data quality metrics and warnings about potential problems

Essential Tasks & Workflows

Dataset Management

Rename Your Dataset

Double-click on the dataset name in the header to activate inline editing. Type your new name and click anywhere outside the input to save changes.

Remove Irrelevant Columns

Click the Edit button (pencil icon) to enter Edit Mode, then click the "X" above any column header you wish to remove. Use the Correlation and Feature Importance tabs to make informed decisions about which columns might be redundant or irrelevant.

Remove Problematic Rows

With Edit Mode active, click the "X" at the beginning of any row to remove that specific data point. This is useful for eliminating obvious outliers or corrupted records that could negatively impact your model performance.

Navigate Large Datasets

Scroll through the dataset table to load additional rows automatically. ML Clever paginates the data and loads the next page as you scroll down, which keeps large datasets responsive.

Data Analysis Workflow

Assess Data Completeness

Start with the "Missing Values" tab to understand the extent and patterns of missing data. High percentages of missing values in certain columns may require imputation or column removal before modeling.

Explore Feature Distributions

Use the "Feature Distribution" tab to examine how values are distributed across each column. Look for heavily skewed distributions, multi-modal patterns, or extreme outliers that might require transformation or special handling.

Identify Correlated Features

Check the "Correlation" tab to find highly correlated feature pairs (typically |r| > 0.8). Such features often carry redundant information, and removing one feature from each such pair can simplify your model, usually with little loss of predictive power.
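
The |r| > 0.8 rule of thumb can be illustrated with a short sketch that computes Pearson correlations and lists the pairs above the threshold. It assumes complete numerical columns of equal length. The function names and sample data are hypothetical, and the tab itself may compute correlations differently.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)

def high_correlation_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    pairs = []
    for (a, xs), (b, ys) in combinations(columns.items(), 2):
        r = pearson(xs, ys)
        if abs(r) > threshold:  # NaN never passes this comparison
            pairs.append((a, b, r))
    return pairs

# Hypothetical numerical columns
columns = {
    "revenue": [100, 200, 300, 400, 500],
    "cost": [55, 105, 155, 205, 255],   # linear in revenue
    "headcount": [3, 9, 4, 8, 5],
}

print(high_correlation_pairs(columns))
```

Here only the revenue and cost pair is reported, so one of those two columns would be a candidate for removal.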

Prioritize Important Features

Review the "Feature Importance" tab to see which variables have the strongest relationship with your target. This helps focus your attention on the most relevant features and potentially identify candidates for feature engineering.

Check ML Readiness

Finally, visit the "ML Readiness" tab for an overall assessment and specific recommendations to address any remaining issues before proceeding to preprocessing and modeling steps.

Moving Forward

After thoroughly exploring and understanding your dataset, you'll be ready to move on to preprocessing steps that prepare your data for optimal model performance:

Preprocessing Your Dataset

Click the "Preprocess Dataset" button in the header to access ML Clever’s comprehensive preprocessing suite:

Data Imputation – Fill missing values using various statistical methods
Feature Encoding – Convert categorical variables to numerical formats
Feature Scaling – Normalize or standardize numerical features
Feature Engineering – Create new features to improve model performance
Dimension Reduction – Reduce feature space while preserving information
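
For readers who want a feel for what the first three steps do to the data, here is a minimal sketch of median imputation, one-hot encoding, and standardization in plain Python. These are common techniques chosen for illustration. The methods ML Clever actually applies, and all names below, are assumptions.

```python
import statistics

def impute_median(xs):
    """Replace missing values (None) with the median of the observed values."""
    median = statistics.median([x for x in xs if x is not None])
    return [median if x is None else x for x in xs]

def one_hot(values):
    """Encode a categorical column as one 0/1 column per category."""
    categories = sorted(set(values))
    return {f"is_{c}": [1 if v == c else 0 for v in values] for c in categories}

def standardize(xs):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(xs)
    std = statistics.pstdev(xs)
    return [(x - mean) / std if std > 0 else 0.0 for x in xs]

# Hypothetical columns
cost = [80.0, None, 100.0, 90.0]
region = ["EU", "US", "EU"]

filled = impute_median(cost)
print(filled)
print(one_hot(region))
print(standardize(filled))
```

The order matters here: imputation comes before scaling so that the mean and standard deviation are computed on a complete column.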

Pro Tips for Data Exploration

Iterative Analysis: Data exploration is an iterative process. Revisit analysis tabs as you make changes.
Use the Magic Button: ML Clever’s AI-powered insights can highlight issues you might miss and suggest optimal preprocessing steps.
Data Quality First: Address missing values and outliers before advanced preprocessing.
Documentation: Keep notes of your findings and decisions to inform your modeling strategy.


Last updated: 5/16/2025
