Survival Model (Cox Regression)

Last updated 3 years ago

With the Cox Proportional Hazards Model, you can predict how a particular type of subject would survive as time goes by, based on past survival data.

Input Data

Input data should be survival data. Each row should represent one observation (e.g. one user of a subscription service). It should have the following columns.

  • Survival Time - A numeric column with survival time. Also called "time to event".

  • Survival Status - A boolean or binary numeric (1 or 0) column with survival status. When this column is true or 1, the event of interest happened to the subject at the end of the survival time. When it is false or 0, we know that the event had not happened to the subject at least until the end of the survival time, but we don't know what happened to the subject after that point (the observation is censored).
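
As a rough illustration of the layout described above, the sketch below builds a few made-up rows in plain Python and counts events versus censored observations. The column names `survival_time` and `survival_status`, the helper `count_events_and_censored`, and the data values are all hypothetical; they are not names used by Exploratory.

```python
# Made-up subscription data: one row per user (values are illustrative only).
survival_data = [
    {"user": "A", "survival_time": 12, "survival_status": True},   # cancelled at month 12
    {"user": "B", "survival_time": 30, "survival_status": False},  # still subscribed at month 30
    {"user": "C", "survival_time": 7, "survival_status": 1},       # binary numeric form of "true"
    {"user": "D", "survival_time": 18, "survival_status": 0},      # binary numeric form of "false"
]


def count_events_and_censored(rows):
    """Return (number of events, number of censored observations)."""
    events = sum(1 for row in rows if bool(row["survival_status"]))
    return events, len(rows) - events


n_events, n_censored = count_events_and_censored(survival_data)
print(n_events, n_censored)
```

Rows with a false or 0 status are not discarded. They still tell the model that the subject survived at least until the recorded survival time.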

How to Access This Feature

  • Click the "+" button, mouse over "Build Model ...", and select the "Build Survival Model (Cox Regression)" submenu to open the "Build Survival Model (Cox Regression)" dialog.

  • You can also select "Analytics" from the column menu of the survival time column, and then select the "Build Survival Model (Cox Regression) for" submenu to open the "Build Survival Model (Cox Regression)" dialog.

Build Survival Model

After the "Build Survival Model (Cox Regression)" dialog opens, follow the steps below to build the Survival Model.

  1. Select the survival time column with the "Survival Time (Time to Event)" dropdown.

  2. Select the survival status column with the "Survival Status (Event)" dropdown.

  3. Select predictor columns in the "Predictor" section.

  • If you want to include all the columns in the input data other than the survival time column and the survival status column, choose the "All" radio button.

  • If you want to specify particular columns in the input data as predictors, choose the "Select" radio button, and select columns from the column selector that appears in this section.

  4. Split test data from training data. In the "Data Split" section, you can split the data into training and test sets to evaluate the performance of the model later.

  • Test Data Set Ratio - Proportion of the whole data to set aside as test data.

  • Random Seed to Split Training/Test - You can change the random seed to try a different combination of training and test data.

  5. (Optional) Specify additional parameters in the "Parameters" section.

  • A Vector to Subset Data

  • Weight Vector

  • How to treat NA?

  • Parameter to Start

  • How to Treat Ties

  • Allow Singular Fit

  • Return Model Object

  • Return Model Matrix X

  • Return Model Matrix Y

  • Time-Transform Function

  • Convergence Tolerance (Epsilon)

  • Tolerance for Singularity during Cholesky Decomposition (Epsilon)

  • Max Iterations

  • Tolerance for Infinite Coefficient Value

  • Max Iterations for Outer Loop

  6. Click the "Run" button.
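
The steps above can be sketched in plain Python to show what happens conceptually when the model is built. This is a minimal illustration under stated assumptions, not Exploratory's implementation: the function names `split_train_test` and `fit_cox_one_predictor` are hypothetical, the data is made up, only a single numeric predictor is handled, and tied event times use the Breslow approximation. The `test_ratio` and `seed` arguments stand in for "Test Data Set Ratio" and "Random Seed to Split Training/Test", while `start`, `eps`, and `max_iter` stand in for "Parameter to Start", "Convergence Tolerance (Epsilon)", and "Max Iterations".

```python
import math
import random


def split_train_test(rows, test_ratio=0.25, seed=1):
    """Shuffle with a fixed seed, then hold out test_ratio of the rows."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]  # (training, test)


def fit_cox_one_predictor(rows, start=0.0, eps=1e-9, max_iter=20):
    """Maximize the Cox partial log-likelihood for one predictor (Newton-Raphson).

    rows are (survival_time, survival_status, predictor) tuples.
    """
    def evaluate(beta):
        loglik = score = info = 0.0
        for t_i, status_i, x_i in rows:
            if not status_i:
                continue  # censored rows contribute only through risk sets
            # Risk set: every subject still under observation at time t_i.
            risk = [(math.exp(beta * x_j), x_j) for t_j, _, x_j in rows if t_j >= t_i]
            s0 = sum(w for w, _ in risk)
            s1 = sum(w * x for w, x in risk)
            s2 = sum(w * x * x for w, x in risk)
            loglik += beta * x_i - math.log(s0)
            score += x_i - s1 / s0               # first derivative
            info += s2 / s0 - (s1 / s0) ** 2     # negative second derivative
        return loglik, score, info

    beta = start
    loglik, score, info = evaluate(beta)
    converged = False
    iterations = 0
    # Stop early if the information is numerically zero (no finite estimate).
    while iterations < max_iter and info > 1e-12:
        beta += score / info  # Newton-Raphson step
        iterations += 1
        new_loglik, score, info = evaluate(beta)
        # Converged when the relative change in log-likelihood is below eps.
        converged = abs(new_loglik - loglik) <= eps * max(abs(new_loglik), 1e-12)
        loglik = new_loglik
        if converged:
            break
    return {"estimate": beta, "log_likelihood": loglik, "score": score,
            "iterations": iterations, "converged": converged}


# Made-up data: (survival_time, survival_status, predictor).
rows = [(5, 1, 1), (8, 1, 0), (12, 0, 0), (3, 1, 1),
        (9, 1, 1), (15, 0, 0), (7, 1, 0), (11, 1, 1)]

training, test = split_train_test(rows, test_ratio=0.25, seed=42)
model = fit_cox_one_predictor(training)
print(len(training), len(test), model["converged"], model["estimate"])
```

Changing `seed` produces a different training and test combination, which is one way to check how sensitive the fitted estimate is to the particular split.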

How to Read Model Summary

After the Survival Model is built, the following Model Summary is displayed.

Summary of Fit

  • Number of Rows - Number of training data rows.

  • Number of Events - Number of events (deaths) that happened in the training survival data.

  • Likelihood Ratio Test - Test statistic for how well the model fits the training data. The larger the better. See Reference.

  • Likelihood Ratio Test P Value - P value calculated from the Likelihood Ratio Test result. It is the probability of seeing a test result at least this large if the model were actually no better than a model with no predictors (null model). A small value suggests that the model fits better than the null model.

  • Score Test - Another test statistic for how well the model fits the training data.

  • Score Test P Value - P value calculated from the Score Test result.

  • Wald Test - Another test statistic for how well the model fits the training data.

  • Wald Test P Value - P value calculated from the Wald Test result.

  • R Square - Pseudo R-squeared. See Reference.

  • R Square Max - Maximum for the pseudo R-squared. See Reference.

  • Concordance - How well the order of the survival times of the subjects in the training data agrees with the order of the hazards that the model predicts for those subjects.

  • Std Error Concordance - Standard error of the concordance when it is viewed as a random variable.

  • Log Likelihood - Likelihood of the model producing the same result as the training data, on a log scale.

  • AIC - Akaike information criterion

  • BIC - Bayesian information criterion
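
To make a few of the metrics above more concrete, the sketch below computes them by hand in plain Python. It is only an illustration under stated assumptions: the function names are hypothetical, the input numbers are made up, the p value formula shown is valid only for a model with a single predictor (1 degree of freedom), and the concordance is the simple pairwise version that counts ties in predicted risk as half. Exploratory's own calculation may differ in details such as tie handling.

```python
import math


def likelihood_ratio_test(loglik_null, loglik_model):
    """Twice the gain in log-likelihood over the null (no predictor) model."""
    return 2.0 * (loglik_model - loglik_null)


def chi_square_p_value_1df(statistic):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


def aic(loglik_model, n_params):
    """Akaike information criterion: smaller is better."""
    return 2.0 * n_params - 2.0 * loglik_model


def concordance(times, events, risk_scores):
    """Share of comparable pairs where the shorter survivor has the higher risk."""
    concordant = discordant = tied = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the "shorter" one in a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] < risk_scores[j]:
                    discordant += 1
                else:
                    tied += 1
    comparable = concordant + discordant + tied
    return (concordant + 0.5 * tied) / comparable


# Made-up log-likelihoods for a null model and a one-predictor model.
lr = likelihood_ratio_test(loglik_null=-10.0, loglik_model=-7.5)
print(lr, chi_square_p_value_1df(lr), aic(-7.5, n_params=1))

# Made-up survival times, statuses, and predicted risk scores.
print(concordance([2, 4, 6, 8], [1, 1, 0, 1], [3.0, 2.0, 2.5, 1.0]))
```

A concordance of 0.5 corresponds to random ordering, and 1.0 means the predicted risks order every comparable pair correctly.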

Parameter Estimates

  • Term - Name of a predictor

  • Estimate - Fitted coefficient for the predictor

  • Std Error - Standard error of "Estimate" when it is viewed as a random variable.

  • t Ratio - "Estimate" divided by "Std Error". Indicator of whether the estimate is statistically significantly different from zero.

  • P Value - The probability of seeing an estimate at least this far from zero if the predictor actually had no effect. A small value suggests that the predictor is relevant.

  • Conf Low - Lower limit of confidence interval for "Estimate"

  • Conf High - Upper limit of confidence interval for "Estimate"

  • Hazard Ratio - If this predictor goes up by one unit, the hazard of the event (death) per unit time is multiplied by this ratio.
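
The relationships among the columns above can be sketched as follows. This is a hedged illustration, not Exploratory's code: the function name `wald_summary` is hypothetical, the inputs are made up, and it assumes a normal approximation with a 95% confidence level (this page does not state which level the product uses). For a Cox model, the hazard ratio is the exponential of the fitted coefficient.

```python
import math


def wald_summary(estimate, std_error, z_crit=1.959964):
    """Derive the other columns from an estimate and its standard error.

    z_crit is the normal quantile for a two-sided 95% interval (an assumption).
    """
    ratio = estimate / std_error
    return {
        "t_ratio": ratio,
        # Two-sided p value under a standard normal approximation.
        "p_value": math.erfc(abs(ratio) / math.sqrt(2.0)),
        "conf_low": estimate - z_crit * std_error,
        "conf_high": estimate + z_crit * std_error,
        # A one-unit increase multiplies the hazard by exp(estimate).
        "hazard_ratio": math.exp(estimate),
    }


# Made-up coefficient and standard error.
summary = wald_summary(estimate=0.5, std_error=0.25)
for name, value in summary.items():
    print(name, round(value, 4))
```

A hazard ratio above 1 means the hazard increases as the predictor increases, and a hazard ratio below 1 means it decreases.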

Reference

Introduction to Survival Analysis Part 2 — Survival Model (Cox Regression)
How Are the Likelihood Ratio, Wald, and Lagrange Multiplier (Score) Tests Different and/or Similar?
What Are Pseudo R-Squareds?