The Dataset View page serves as your central hub for data exploration, analysis, and preparation for machine learning. This interactive interface allows you to understand your data's characteristics, identify patterns, detect issues, and prepare for the next stages in your ML workflow.
The dataset header provides essential information and powerful management controls:
Dataset name: Double-click to edit the name directly in the header.
Domain badge (for example, FINANCE): An AI-selected indicator of your dataset's domain. Correct categorization enables more relevant recommendations.
Target column (for example, profit_margin): Indicates which column is selected as your prediction target. The target selection affects all downstream ML operations.
Edit button: Toggles Edit Mode, which enables column and row deletion.
Delete button: Permanently removes the dataset (with confirmation).
Preprocess Dataset button: Proceeds to data preprocessing steps such as imputation, encoding, and scaling.
Warning: Any changes made while in Edit Mode (deleting rows/columns) are permanent and cannot be undone. Consider exporting a backup before making significant modifications.
ML Clever provides seven specialized tabs for comprehensive data analysis, each offering unique insights:
Primary view: your raw data in tabular format.
Statistical summary: descriptive statistics for each feature in your dataset.
Missing Values: a visual representation of missing data patterns.
Correlation: an interactive correlation matrix visualization.
Feature Distribution: a visual analysis of data distributions.
Feature Importance: a preliminary assessment of feature relevance.
ML Readiness: a comprehensive assessment of your dataset's readiness for machine learning.
Double-click on the dataset name in the header to activate inline editing. Type your new name and click anywhere outside the input to save changes.
Click the Edit button (pencil icon) to enter Edit Mode, then click the "X" above any column header you wish to remove. Use the Correlation and Feature Importance tabs to make informed decisions about which columns might be redundant or irrelevant.
With Edit Mode active, click the "X" at the beginning of any row to remove that specific data point. This is useful for eliminating obvious outliers or corrupted records that could degrade your model's performance.
Scroll down through the dataset table to load additional rows automatically. ML Clever paginates the data, so large datasets are loaded incrementally as you scroll rather than all at once.
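Because row deletions in Edit Mode are permanent, it can help to flag candidate outliers offline before removing anything. The sketch below applies the common 1.5 times IQR rule to a single numeric column. It is an illustration only, not how ML Clever detects outliers; the function name `iqr_outlier_indices` and the sample `margins` values are hypothetical.

```python
import statistics
from typing import List


def iqr_outlier_indices(values: List[float], k: float = 1.5) -> List[int]:
    """Return the positions of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical profit_margin column with one suspicious entry.
margins = [0.18, 0.21, 0.19, 0.22, 0.20, 0.95, 0.17, 0.23]
for index in iqr_outlier_indices(margins):
    print(f"row {index}: value {margins[index]} is a candidate outlier")
```

A flagged row is only a candidate for deletion: a value can be extreme and still be correct, so review it in the table before removing it.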
Start with the "Missing Values" tab to understand the extent and patterns of missing data. High percentages of missing values in certain columns may require imputation or column removal before modeling.
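To make the idea of a per-column missing-value percentage concrete, here is a minimal sketch in plain Python. It assumes rows are dictionaries and treats `None` and blank strings as missing; both the function name `missing_percentages` and that definition of "missing" are assumptions for illustration, not ML Clever's implementation.

```python
from typing import Any, Dict, List


def missing_percentages(rows: List[Dict[str, Any]]) -> Dict[str, float]:
    """Return the percentage of missing values for each column."""
    if not rows:
        return {}
    # Collect column names in first-seen order.
    columns: Dict[str, None] = {}
    for row in rows:
        for key in row:
            columns.setdefault(key)
    result: Dict[str, float] = {}
    for col in columns:
        missing = 0
        for row in rows:
            value = row.get(col)
            # Assumption: None and blank strings both count as missing.
            if value is None or (isinstance(value, str) and not value.strip()):
                missing += 1
        result[col] = 100.0 * missing / len(rows)
    return result


# Hypothetical sample rows.
rows = [
    {"revenue": 120.0, "region": "EU", "profit_margin": 0.21},
    {"revenue": None, "region": "US", "profit_margin": 0.18},
    {"revenue": 95.5, "region": "", "profit_margin": None},
    {"revenue": None, "region": "EU", "profit_margin": 0.25},
]
report = missing_percentages(rows)
for col, pct in sorted(report.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{col}: {pct:.1f}% missing")
```

Sorting by percentage puts the columns most likely to need imputation or removal at the top of the report.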
Use the "Feature Distribution" tab to examine how values are distributed across each column. Look for heavily skewed distributions, multi-modal patterns, or extreme outliers that might require transformation or special handling.
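Skew can also be quantified rather than judged by eye. The sketch below computes the Fisher-Pearson moment coefficient of skewness and shows a log transform as one common remedy for right-skewed positive data. The function name and sample values are hypothetical, and the rule of thumb that an absolute skewness above about 1 indicates strong skew is a convention, not a threshold taken from ML Clever.

```python
import math
from typing import List


def skewness(values: List[float]) -> float:
    """Fisher-Pearson moment coefficient of skewness (population form)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant column: no spread, so no skew
    return m3 / m2 ** 1.5


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 1.0, 2.0, 2.0, 3.0, 50.0]
# A log transform compresses large values and usually reduces right skew.
logged = [math.log(v) for v in right_skewed]

print("symmetric:", skewness(symmetric))
print("right-skewed:", skewness(right_skewed))
print("after log transform:", skewness(logged))
```

The log transform only applies to strictly positive values; columns containing zeros or negatives need a different transformation.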
Check the "Correlation" tab to discover highly correlated feature pairs (typically |r| > 0.8). Such features often carry redundant information, so removing one feature from each highly correlated pair can simplify your model, usually with little loss of predictive power.
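The |r| > 0.8 check can be reproduced with a few lines of plain Python. This sketch computes the Pearson correlation for every pair of numeric columns and lists the pairs above the threshold. The helper names and sample columns are hypothetical, and the 0.8 default simply mirrors the guideline above.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def highly_correlated_pairs(
    columns: Dict[str, List[float]], threshold: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Return (name_a, name_b, r) for every pair with |r| above threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            pairs.append((name_a, name_b, r))
    # Strongest relationships first.
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Hypothetical columns: revenue_eur is revenue scaled by a constant.
columns = {
    "revenue": [100.0, 150.0, 200.0, 250.0, 300.0],
    "revenue_eur": [92.0, 138.0, 184.0, 230.0, 276.0],
    "headcount": [12.0, 9.0, 15.0, 8.0, 11.0],
}
pairs = highly_correlated_pairs(columns)
for name_a, name_b, r in pairs:
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
```

Pearson correlation only captures linear relationships, so a pair below the threshold can still be related in a nonlinear way.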
Review the "Feature Importance" tab to see which variables have the strongest relationship with your target. This helps focus your attention on the most relevant features and potentially identify candidates for feature engineering.
Finally, visit the "ML Readiness" tab for an overall assessment and specific recommendations to address any remaining issues before proceeding to preprocessing and modeling steps.
After thoroughly exploring and understanding your dataset, you'll be ready to move on to preprocessing steps that prepare your data for optimal model performance:
Click the "Preprocess Dataset" button in the header to access ML Clever’s comprehensive preprocessing suite: