Exploratory
  • Introduction
  • Product Features
    • Summary View
    • Table View
    • Row Filter
    • Column Filter
    • Dashboard
    • Dashboard (日本語)
    • Note
    • Note (日本語)
    • Steps (Right-hand side)
    • Branch
    • Parameter
    • Parameter (日本語)
    • Export
    • Share
      • Share Type
      • Chart / Analytics
      • Data
      • Report (Note / Dashboard)
      • Notification
      • Version History
      • Restore Older Version
      • CSV API
    • Share (日本語)
      • 共有のタイプ
      • チャート / アナリティクス
      • データ
      • レポート (ノート / ダッシュボード)
      • 通知
      • バージョンの履歴
      • 古いバージョンの復元
      • CSV API
    • Schedule
      • Manage Schedules
      • Notification
      • Scheduling History
    • Schedule (日本語)
      • スケジュールの設定
      • 通知
      • スケジュールの履歴
    • Team
      • Manage Teams
    • Team (日本語)
      • チームの設定
    • Project
      • Import
      • Export
      • Search
  • Data Import
    • File Data
      • CSV / Delimited File
      • Amazon S3
      • Google Drive
      • Google Cloud Storage
      • Excel
      • JSON
      • Log File
      • Microsoft Azure
      • Stats - SAS / SPSS / STATA
      • RData / RDS
      • Parquet File
      • EDF - Exploratory
    • Database Data
      • SQL Troubleshooting
      • Create Connection
      • Amazon Athena
      • Amazon Aurora
      • Amazon Redshift
      • Amazon Redshift (日本語)
      • Google BigQuery
      • HP Vertica
      • MariaDB / MySQL DB
      • MariaDB / MySQL DB (日本語)
      • Microsoft Access
      • MongoDB
      • ODBC
      • Oracle
      • PostgreSQL
      • PostgreSQL (日本語)
      • Presto
      • Snowflake
      • SQLServer (DSN)
      • SQLServer
      • Teradata
      • Treasure Data
    • Cloud Apps Data
      • Create Connection
      • FRED - Federal Reserve of Economic Data
      • Github Issues
      • Google Analytics
      • Google Analytics (日本語)
      • Google Spreadsheet
      • Google Cloud Storage
      • Salesforce
      • Twitter Search
      • Stripe
      • Weather Data
      • Stock Price Data
    • Write R Script as Data
      • Currency Exchange Rate
    • Write R Script as Data (日本語)
    • Web Page Scraping
    • Text Input Data
    • Data Source Extension
      • Quandl
      • Holiday
      • RSS Data
    • Create Custom Data Source
  • Data Wrangling
    • Command Line mode for faster and more flexible data interaction in Exploratory
    • Select / Remove Columns
    • Reorder Columns
    • Create New Calculation
    • Create New Calculation for Multiple Columns
    • Summarize (Aggregate)
    • Group
    • Filter
    • Rename
    • Arrange (Sort)
    • Top / Bottom N
    • Join
    • Merge
    • Gather
    • Spread
    • Pivot
    • Expand
    • Complete
    • Separate
    • Unite
    • Bind Rows
    • Bind Columns
    • Keep Only Unique Rows
    • Keep Only Duplicated Rows
    • Slice
    • Drop NA
    • Sample
    • Impute NA
    • Fill
    • Create Buckets
    • Assign New Values to Existing Values - Recode
    • Assign New Values by Setting Conditions - Case When
    • Work with Categories
    • Data Type Conversion
    • Row as header
    • Ungroup
    • Unnest
    • Separate List Items into Columns (Unnest Wider)
    • Separate List Items into Rows (Unnest Longer)
    • Separate Address (Japan)
    • Hoist
    • Remove Empty Rows
    • Remove Empty Columns
    • Clean Column Names
    • Window Calculation
    • Window Calculation (日本語)
    • Add Row
    • Text Wrangling
    • Regular Expression Cheat Sheet
    • Regular Expression Cheat Sheet (日本語)
  • Visualization
    • Types
      • Pivot
      • Summarize Table
      • Table
      • Bar
      • Line
      • Area
      • Pie/Ring
      • Radar
      • Histogram
      • Density Plot
      • Scatter (No Aggregation)
      • Scatter (With Aggregation)
      • Boxplot
      • Violin
      • Error Bar
      • Error Bar (Summarized Data)
      • Map - Standard
      • Map - Extension
      • Map - Long/Lat
      • Map - Heatmap
      • Heatmap
      • Contour
      • Number
      • Word Cloud
      • Word Cloud (日本語)
    • Features
      • Trend Line
      • Reference Line
      • Repeat By
      • Window Calculation
      • Date/Time Aggregation
      • Show Range
      • Highlight
      • Change Marker
      • Multiple Y-Axis Columns
      • Layout Configuration
      • Column Configuration
      • Column Configuration Dialog
      • Color and Group Setting
      • Color and Group Setting (日本語)
      • Color Setting
      • User Color Palette Setting
      • Pin
      • Save as PNG/SVG
      • Save as Exploratory Data File
      • Share/Schedule
      • URL Link
      • Category (Binning)
      • Highlight
      • Limit Values
      • 'Others' Group
      • Edit Display Name
      • Missing Value Handling
      • Rename Column Names
      • Axis Setting
      • Axis Formatting
      • Show Detail
      • Fit to Screen (Table)
      • Number of Unique Values Check
      • Number of Unique Values Check (日本語)
  • Analytics
    • Correlation
    • Distance
    • K-Means Clustering
    • Principal Component Analysis
    • Factor Analysis
    • Correspondence Analysis
    • Linear Regression Analysis
    • Logistic Regression Analysis
    • Generalized Linear Models
    • Survival Curve
    • Cox Regression
    • Random Survival Forest
    • Decision Tree
    • Random Forest
    • XGBoost
    • Time Series Forecasting (Prophet)
    • Time Series Forecasting (ARIMA)
    • Time Series Clustering
    • Anomaly Detection
    • Word Count
    • Text Clustering with Topic Model (LDA)
    • Market Basket Analysis
    • T Test
    • T Test (Aggregated Data)
    • ANOVA
    • Wilcoxon Test
    • Kruskal-Wallis Test
    • Chi-Square Test
    • A/B Test
    • Normality Test
    • Prediction
    • Dictionaries for Text Analysis
  • Statistics
    • Correlation
    • Distance
    • Cosine Similarity
    • SVD
    • Multi Dimensional Scaling
    • T-test
    • F-test
    • Chi-square test
    • A/B Test (Bayesian)
  • Machine Learning
    • Linear Regression
    • Logistic Regression
    • GLM
    • Multinomial Logistic Regression
    • K-means Clustering
    • Random Forest
    • XGBoost
    • Forecasting
    • Time Series Clustering
    • Anomaly Detection
    • Survival Curve
    • Survival Model (Cox Regression)
    • Market Basket
    • Causal Impact
    • Evaluate Prediction - Regression
    • Evaluate Prediction - Binary
    • Calculate ROC
    • Evaluate Prediction - Multiclass
    • Prediction
    • Prediction - Binary Classification
    • Prediction - Survival Model
    • Simulate Survival Curve
    • Extract Summary of Fit
    • Extract Parameter Estimates
    • Run ANOVA Test
    • Fix Imbalanced Data (SMOTE)
  • Text Analysis
    • Tokenize Text
    • Create N-gram Tokens
    • Calculate tf-idf
    • Count Text Pairs
  • Extend with R
    • R Package Install
    • Custom R Script
    • Custom Model Function
  • Setup
    • Disable McAfee virus scan
    • Change Repository Location
    • Change Repository Location (日本語)
    • Holidays Data for Forecast
    • Possible Reasons for Install Error
    • Upgrade Microsoft .NET Framework
  • Diagnostics
    • Log file for debugging
    • Log file for debugging (日本語)
    • Startup Log file for debugging
    • Startup Log file for debugging (日本語)
    • Check version of Exploratory Desktop
    • How to Recover the History Data
  • Keyboard shortcuts
Powered by GitBook
On this page
  • Input Data
  • Analytics Properties
  • How to Use This Feature
  • "Summary" View
  • "Prediction" View
  • "Importance" View
  • "Coefficients" View
  • "Coefficients Table" View
  • "Collinearity" View
  • "Residuals" View
  • "Residual Distribution" View
  • "Data" View
  • R Package

Was this helpful?

  1. Analytics

Linear Regression Analysis

Build Linear Regression Model

Input Data

Input data should contain a numeric column for "Target Variable" and more than one categorical and/or numeric columns as Predictor Variable Columns.

  • Target Variable - Numeric column that you want to Predict.

  • Predictor Variable Columns - Numeric and/or Categorical columns. Prediction is made based on values of those columns.

Analytics Properties

  • Coefficients

    • P Value Threshold to be Significant - P value must be smaller than this value for coefficients to be considered statistically significant.

    • Sort Variables by Coefficients - If set to TRUE, variables displayed in Coefficients View are sorted by coefficients.

  • Variable Importance

    • Enable Relative Importance - Enable/disable calculation of variable importance.

    • Method - Method to calculate variable importance.

      • First Variable - Based on R Squared of the model that predicts with only the variable to estimate the importance of.

      • Last Variable - Based on R Squared of the model that predicts with all the variables except for the variable to estimate the importance of.

      • Lindeman, Merenda, and Gold - Based on average of contribution of the variable to estimate to R Squared in all possible order of adding variables to the model.

    • Confidence Interval - Level of confidence interval to estimate for variable importance of each variable.

    • Bootstrap Method - Bootstrap method to use to estimate confidence interval of variable importance.

    • Number of Bootstrap Iterations - Bootstrap iterations to perform to estimate confidence interval of variable importance.

  • Data Preprocessing

    • Sample Data Size - Number of rows to sample before building linear regression model.

    • Random Seed - Seed used to generate random numbers. Specify this value to always reproduce the same result.

    • Max # of Categories for Predictor Vars - If categorical predictor column has more categories than this number, less frequent categories are combined into 'Other' category.

  • Evaluation

    • Test Mode - Enable/disable Test Mode, where data is split into training data and test data for evaluation of the created linear regression model.

    • Ratio for Test Data - Ratio of test data to select from the entire data. Default is 0.3.

    • Data Splitting Method

      • Random - Test data is randomly sampled from the original data.

      • Reserve Order in Data - Data that shows up later in the original data is used as test data.

How to Use This Feature

  1. Click Analytics View tab.

  2. If necessary, click "+" button on the left of existing Analytics tabs, to create a new Analytics.

  3. Select "Regression Analysis" for Analytics Type.

  4. Select What to Predict Column.

  5. Click Variable Columns and open Column Selector Dialog.

  1. Select Columns that you want to see importance.

  2. Click Run button to run the analytics.

  3. Select view type (explained below) by clicking view type link to see each type of generated visualization.

"Summary" View

"Summary" View displays the summary of the model created for this Regression Analysis. Each column shows the model information like R Squared, Root Mean Square Error from where you can understand model performance.

  • R Squared - A statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression. 1 (100%) indicates that the model explains all the variability of the response data around its mean.

  • Adjusted R Squared - The adjusted R-squared is a modified version of R-squared that has been adjusted for the number of predictors in the model. The adjusted R-squared increases only if the new term improves the model more than would be expected by chance. It decreases when a predictor improves the model by less than expected by chance. The adjusted R-squared can be negative, but it’s usually not. It is always lower than the R-squared.

  • Root Mean Square errors - The Root Mean Square Error (RMSE) (also called the root mean square deviation, RMSD) is a frequently used measure of the difference between values predicted by a model and the values actually observed from the environment that is being modeled.

  • F Ratio - F Ratio gives you a measure of how much of the variation is explained by the model (per parameter) versus how much of the variation is unexplained (per remaining degrees of freedom). This unexplained variation is your error sums of squares. With this F Ratio, a higher ratio means that your model explains that much more of the variation per parameter than there is error per remaining degree of freedom.

"Prediction" View

"Prediction" View plots how the predicted value varies as each variable changes. This is so called partial dependence plot.

"Importance" View

"Importance" View displays importances of variables for the prediction of the target value. Importances are calculated by permutation importance with residual sum of squares as the cost function.

"Coefficients" View

"Coefficients" View displays coefficient estimates for all the predictor variables with Error Bars with P value as a color (i.e. If P Value < 0.05, the color is blue)

"Coefficients Table" View

"Coefficients Table" View displays more details for all the variables along with other metrics like Coefficient, Standard Error, t-Ratio, P-Value, etc. You can click on the column headers to sort the data with a help of bar visualization.

"Collinearity" View

"Collinearity" View displays VIF (Variance Inflation Factor) of each predictor variables. VIF greater than 10 is commonly considered to be the indicator of problematic degree of multicollinearity.

"Residuals" View

"Residuals" View compares the predicted values and the actual values to see how good or bad the model’s prediction capability is. If it is the perfect model, meaning it can predict with 100% accuracy, then all the dots should be lining up along with the gray line called ‘Perfect Fit’.

"Residual Distribution" View

"Data" View

"Data" View displays prediction results on each row in the data.

R Package

PreviousCorrespondence AnalysisNextLogistic Regression Analysis

Last updated 4 years ago

Was this helpful?

"Residual Distribution" View shows the residual which is difference between the predicted and the actual values. And visualizing the Residuals can reveal a lot of useful information that can guide you to decide what types of data transformations are needed to improve the model. (e.g. )

The Regression Analysis uses function from Exploratory R Package under the hood.

Interpreting residual plots to improve your regression
build_lm.fast