
Viewing & Analyzing Your Dataset

The Dataset View page serves as your central hub for data exploration, analysis, and preparation for machine learning. This interactive interface allows you to understand your data's characteristics, identify patterns, detect issues, and prepare for the next stages in your ML workflow.

[Screenshot: Dataset Overview]

Dataset Header & Controls

The dataset header provides essential information and powerful management controls:

Information Elements

Dataset Name

Double-click to edit the name directly in the header

Category Badge (example: FINANCE)

AI-selected indicator of your dataset's domain. Proper categorization enables intelligent recommendations.

Target Column (example: profit_margin)

Indicates which column is selected as your prediction target. The target selection impacts all downstream ML operations.

Action Controls

Edit Mode

Toggles editable mode that enables column and row deletion

Delete Dataset

Permanently removes the dataset (with confirmation)

Preprocess Dataset

Proceeds to data preprocessing steps such as imputation, encoding, and scaling

AutoML

Runs automated preprocessing, model selection, and evaluation. Check the ML Readiness metrics before running it.

Warning: Any changes made while in Edit Mode (deleting rows/columns) are permanent and cannot be undone. Consider exporting a backup before making significant modifications.

Interactive Analysis Tabs

ML Clever provides seven specialized tabs for comprehensive data analysis, each offering unique insights:

[Screenshot: Interactive Analysis Tabs]

Dataset Table

The primary view showing your raw data in tabular format with the following capabilities:

  • View data with column headers and all rows
  • In Edit Mode: Remove columns via the "X" above column headers
  • In Edit Mode: Delete specific rows via the "X" at row start
  • Automatic data loading as you scroll down for efficient handling of large datasets
  • Sticky headers that remain visible while scrolling through data

Basic Stats

Statistical summary of each feature in your dataset:

  • For numerical features: mean, median, min, max, standard deviation, and quartiles
  • For categorical features: unique value counts, most frequent values, and cardinality
  • Data type detection and statistics appropriate for each type
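To make these summaries concrete, the following is a minimal sketch of how the same statistics could be computed by hand in plain Python. It is not ML Clever's implementation: the helper names `summarize_numeric` and `summarize_categorical` are hypothetical, missing values are assumed to be represented as `None`, and the quartile method (inclusive) is an assumption, so the figures shown in the tab may differ slightly.

```python
import statistics
from collections import Counter


def summarize_numeric(values):
    """Basic statistics for a numeric column; None entries are skipped."""
    clean = sorted(v for v in values if v is not None)
    # Quartiles using the inclusive method (an assumption for this sketch).
    q1, _, q3 = statistics.quantiles(clean, n=4, method="inclusive")
    return {
        "mean": statistics.mean(clean),
        "median": statistics.median(clean),
        "min": clean[0],
        "max": clean[-1],
        "std": statistics.stdev(clean),  # sample standard deviation
        "q1": q1,
        "q3": q3,
    }


def summarize_categorical(values):
    """Cardinality and most frequent value for a categorical column."""
    counts = Counter(v for v in values if v is not None)
    return {
        "cardinality": len(counts),
        "most_frequent": counts.most_common(1)[0],  # (value, count)
    }


numeric_summary = summarize_numeric([10, 20, 30, 40, 50])
categorical_summary = summarize_categorical(["retail", "retail", "energy", None])
```

Comparing a hand-computed summary like this against the Basic Stats tab can be a quick sanity check after an import.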

Missing Values

Visual representation of missing data patterns:

  • Heatmap showing locations and patterns of missing values
  • Bar chart of missing percentage by column
  • Correlation between missing value patterns across columns
  • Recommendations for appropriate imputation strategies
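The per-column missing percentage behind the bar chart is simple to reproduce. The sketch below assumes rows are dictionaries and that a missing value is stored as `None`; the function name `missing_percentage` and the sample data are hypothetical and only illustrate the calculation, not how ML Clever computes it internally.

```python
def missing_percentage(rows, columns):
    """Percentage of missing (None) entries per column."""
    total = len(rows)
    return {
        col: 100.0 * sum(1 for row in rows if row.get(col) is None) / total
        for col in columns
    }


# Hypothetical sample data for illustration only.
rows = [
    {"revenue": 120.0, "region": "north"},
    {"revenue": None, "region": "south"},
    {"revenue": 95.5, "region": None},
    {"revenue": None, "region": "east"},
]
missing = missing_percentage(rows, ["revenue", "region"])
```

Here `revenue` is missing in two of four rows and `region` in one of four.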

Correlation

Interactive correlation matrix visualization:

  • Heatmap showing relationships between all numerical features
  • Highlights high correlation pairs that may indicate redundant features
  • Color-coded display from strong negative to strong positive correlations
  • Tooltips showing exact correlation values on hover
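As an illustration of what "high correlation pairs" means, here is a minimal sketch that computes Pearson correlation for every pair of numeric columns and flags pairs whose absolute value exceeds a threshold. The function names, the sample columns, and the 0.8 default are assumptions for this example; the tab may use a different coefficient or cutoff.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def high_correlation_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| above threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            pairs.append((name_a, name_b, r))
    return pairs


# Hypothetical sample columns for illustration only.
columns = {
    "revenue": [100, 200, 300, 400, 500],
    "cost": [60, 110, 170, 220, 280],
    "headcount": [7, 3, 9, 2, 6],
}
flagged = high_correlation_pairs(columns)
```

In this sample only `revenue` and `cost` move together closely enough to be flagged.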

Feature Distribution

Visual analysis of data distributions:

  • Histograms for numerical features showing value distribution
  • Bar charts for categorical features showing value frequencies
  • Detection of outliers and skewness in distributions
  • Split view showing distribution in relation to target variable
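This page does not state how outliers and skewness are detected, so the sketch below uses two common textbook choices as stand-ins: Tukey's 1.5 × IQR fences for outliers and the population skewness coefficient (the mean cubed z-score). The function names and the sample `margins` values are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def skewness(values):
    """Population skewness: average of cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


# Hypothetical sample values with one extreme entry.
margins = [0.10, 0.12, 0.11, 0.13, 0.12, 0.11, 0.95]
outliers = iqr_outliers(margins)
skew = skewness(margins)
```

A clearly positive skewness together with a flagged high value, as in this sample, is the kind of pattern that often calls for a transformation or a closer look at the record.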

Feature Importance

Preliminary assessment of feature relevance:

  • Bar chart ranking features by their estimated importance to the target
  • Uses statistical relationships and preliminary models to assess relevance
  • Helps identify potentially irrelevant features that might be candidates for removal
  • Shows connection strength between each feature and your target variable
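The exact scoring method used by this tab is not described here. As one simple statistical proxy, the sketch below ranks numeric features by the absolute value of their Pearson correlation with the target. This is an assumption for illustration, not ML Clever's algorithm, and it only captures linear relationships; the function names and sample data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_by_target_correlation(features, target):
    """Sort features by |correlation with target|, strongest first."""
    scores = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample data for illustration only.
features = {
    "revenue": [100, 200, 300, 400, 500],
    "headcount": [7, 3, 9, 2, 6],
}
profit_margin = [0.11, 0.14, 0.19, 0.22, 0.27]
ranking = rank_by_target_correlation(features, profit_margin)
```

Features near the bottom of such a ranking are candidates for removal, although a weak linear score alone does not prove a feature is irrelevant.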

ML Readiness

Comprehensive assessment of your dataset's readiness for machine learning:

  • Overall readiness score with detailed breakdown
  • Checklist of potential issues that need addressing
  • Personalized recommendations for preprocessing steps
  • Data quality metrics and warnings about potential problems

Essential Tasks & Workflows

[Screenshot: Tasks and Workflow]

Dataset Management

1. Rename Your Dataset

Double-click on the dataset name in the header to activate inline editing. Type your new name and click anywhere outside the input to save changes.

2. Remove Irrelevant Columns

Click the Edit button (pencil icon) to enter Edit Mode, then click the "X" above any column header you wish to remove. Use the Correlation and Feature Importance tabs to make informed decisions about which columns might be redundant or irrelevant.

3. Remove Problematic Rows

With Edit Mode active, click the "X" at the beginning of any row to remove that specific data point. This is useful for eliminating obvious outliers or corrupted records that could negatively impact your model performance.

4. Navigate Large Datasets

Scroll through the dataset table to load additional rows automatically. ML Clever fetches rows in pages as you scroll down, which keeps large datasets responsive.
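The idea of loading rows in pages can be sketched with a small generator. This is a generic illustration of incremental loading, not ML Clever's actual mechanism; the function name `iter_pages` and the page size are hypothetical.

```python
from itertools import islice


def iter_pages(rows, page_size):
    """Yield successive lists of at most page_size rows from any iterable."""
    iterator = iter(rows)
    while True:
        page = list(islice(iterator, page_size))
        if not page:
            return
        yield page


# Ten rows split into pages of four; the last page is shorter.
pages = list(iter_pages(range(10), page_size=4))
```

Because each page is produced only when requested, the full dataset never has to be held in view at once.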

Data Analysis Workflow

1. Assess Data Completeness

Start with the "Missing Values" tab to understand the extent and patterns of missing data. High percentages of missing values in certain columns may require imputation or column removal before modeling.

2. Explore Feature Distributions

Use the "Feature Distribution" tab to examine how values are distributed across each column. Look for heavily skewed distributions, multi-modal patterns, or extreme outliers that might require transformation or special handling.

3. Identify Correlated Features

Check the "Correlation" tab to find highly correlated feature pairs (typically |r| > 0.8). Such features often carry redundant information, so removing one feature from each pair can simplify your model, usually with little loss of predictive power.

4. Prioritize Important Features

Review the "Feature Importance" tab to see which variables have the strongest relationship with your target. This helps focus your attention on the most relevant features and potentially identify candidates for feature engineering.

5. Check ML Readiness

Finally, visit the "ML Readiness" tab for an overall assessment and specific recommendations to address any remaining issues before proceeding to preprocessing and modeling steps.

Moving Forward

After thoroughly exploring and understanding your dataset, you'll be ready to move on to preprocessing steps that prepare your data for optimal model performance:

[Screenshot: Preprocessing Workflow]

Preprocessing Your Dataset

Click the "Preprocess Dataset" button in the header to access ML Clever’s preprocessing suite, which covers steps such as imputation, encoding, and scaling.

Pro Tips for Data Exploration

  • Iterative Analysis: Data exploration is an iterative process. Revisit analysis tabs as you make changes.
  • Use the Magic Button: ML Clever’s AI-powered insights can highlight issues you might miss and suggest optimal preprocessing steps.
  • Data Quality First: Address missing values and outliers before advanced preprocessing.
  • Documentation: Keep notes of your findings and decisions to inform your modeling strategy.

Last updated: 3/22/2025

ML Clever Docs