# Logistic Regression

## Introduction

A logistic regression model is a statistical model that fits a binary (0 or 1) response variable using a linear combination of predictors.
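
As a minimal sketch of what such a model computes, the linear predictor is passed through the logistic (inverse logit) function to give a probability between 0 and 1. The function name `predict_probability` and the coefficient values are made up for illustration and do not come from Exploratory.

```python
import math

def predict_probability(intercept, coefs, xs):
    """Probability that the response is 1 for one row of predictor values."""
    # Linear predictor: intercept plus the weighted sum of the predictors.
    eta = intercept + sum(b * x for b, x in zip(coefs, xs))
    # Inverse logit maps any real number into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients for two predictors.
p = predict_probability(-1.0, [0.5, 2.0], [2.0, 0.0])
print(p)  # eta = -1 + 1 + 0 = 0, so the probability is 0.5
```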

## How to Access?

There are two ways to access it. One is from the 'Add' (Plus) button.

![](https://2850417076-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M4HLCK3olgduYoe3RVS%2F-M4oMvCUDQwHTJ0eWi_f%2F-M4oNDzD0Y6IVPsqA4UV%2Flr_add.png?generation=1586795485698348\&alt=media)

The other is from the column header menu of a numeric column.

![](https://2850417076-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M4HLCK3olgduYoe3RVS%2F-M4oMvCUDQwHTJ0eWi_f%2F-M4oNDzFYE84skvQsf6Z%2Flr_cols.png?generation=1586795485790483\&alt=media)

## How to Use?

### Column Selection

There are two ways to specify which column you want to predict and which columns to use as predictors.

![](https://2850417076-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M4HLCK3olgduYoe3RVS%2F-M4oMvCUDQwHTJ0eWi_f%2F-M4oNDzH4u0X4wvDSMOT%2Ffml_col_selection.png?generation=1586795485601070\&alt=media)

If you are on the "Select Columns" tab, you can set them with the column selector.

![](https://2850417076-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M4HLCK3olgduYoe3RVS%2F-M4oMvCUDQwHTJ0eWi_f%2F-M4oNDzJrXqnrD85G44U%2Ffml_custom.png?generation=1586795485831641\&alt=media)

If you are on the "Custom" tab, you can type a formula directly.

### Train Test Split

![](https://2850417076-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M4HLCK3olgduYoe3RVS%2F-M4oMvCUDQwHTJ0eWi_f%2F-M4oNDzLGK7ZU1RCNZk_%2Ftrain_test_split.png?generation=1586795485645143\&alt=media)

You can split the data into training and test sets to evaluate the performance of the model. You can set:

* Test Data Set Ratio - Ratio of test data to the whole data.
* Random Seed to Split Training/Test - Change the random seed to try a different combination of training and test data.
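
The idea behind these two settings can be sketched as follows. This is only an illustration of a seeded random split, not Exploratory's actual implementation. The function name `train_test_split` and its arguments are hypothetical.

```python
import random

def train_test_split(rows, test_ratio, seed):
    """Shuffle row indices with a fixed seed, then hold out a share as test data."""
    rng = random.Random(seed)            # the same seed always gives the same split
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = round(len(rows) * test_ratio)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test

rows = list(range(10))
train, test = train_test_split(rows, test_ratio=0.3, seed=1)
print(len(train), len(test))  # 7 3
```

Changing `seed` moves different rows into the test set, which is a quick way to check whether the evaluation depends heavily on one particular split.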

### Parameters

* Weight Vector (Optional) - "weights" parameter of glm function.
* A Vector to Subset Data (Optional) - "subset" parameter of glm function.
* How to treat NA? (Optional) - "na.action" parameter of glm function. The default is "na.fail". This changes how rows with NA values are handled. Can be one of the following.
  * "na.omit"
  * "na.fail"
  * "na.exclude"
  * "na.pass"
  * NULL
* Parameter to Start (Optional) - "start" parameter in glm. Starting values for the parameters in the linear predictor.
* Predictor to Start (Optional) - "etastart" parameter in glm. Starting values for the linear predictor.
* Means to Start (Optional) - "mustart" parameter in glm. Starting values for the vector of means.
* Offset (Optional) - "offset" parameter in glm. This can be used to specify an a priori known component to be included in the linear predictor during fitting.
* Convergence Tolerance ε (Optional) - "epsilon" parameter in glm. Positive convergence tolerance ε.
* Maximum # of Iteration - "maxit" parameter in glm. Integer giving the maximal number of iterative weighted least squares iterations.
* Generate Result per Iteration - "trace" parameter in glm. Logical indicating if output should be produced for each iteration.
* Return Model Object - "model" parameter in glm. A logical value indicating whether the model frame should be included as a component of the returned value.
* Which method to apply? (Optional) - "method" parameter of glm function. The method to be used in fitting the model. The default is "glm.fit". This can be one of the following.
  * "glm.fit"
  * "model.frame"
* Return Model Matrix X (Optional) - "x" parameter of glm function. Whether the model matrix x should be included in the returned value.
* Return Model Matrix Y (Optional) - "y" parameter of glm function. Whether the response vector y should be included in the returned value.

Take a look at the [reference document](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/glm.html) for the 'glm' function from base R for more details on the parameters.
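
To give a feel for what the start, tolerance, iteration and trace settings control, here is a minimal sketch of iteratively reweighted least squares for a model with one predictor and an intercept. It is a simplified illustration, not the glm source code. The function names are hypothetical, the data is made up, and starting from zero coefficients is an assumption made for brevity. The stopping rule follows the one described in the glm documentation: stop when |dev - dev_old| / (|dev| + 0.1) < epsilon.

```python
import math

def deviance(x, y, b0, b1):
    """Binomial deviance for 0/1 data: -2 times the log-likelihood."""
    total = 0.0
    for xi, yi in zip(x, y):
        mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        total += yi * math.log(mu) + (1 - yi) * math.log(1.0 - mu)
    return -2.0 * total

def fit_logistic_irls(x, y, start=(0.0, 0.0), epsilon=1e-8, maxit=25, trace=False):
    """Fit logit(p) = b0 + b1 * x; returns (b0, b1, converged, iterations)."""
    b0, b1 = start
    dev_old = float("inf")
    for iteration in range(1, maxit + 1):
        # Accumulate the weighted sums needed for a 2x2 weighted least squares.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi                 # linear predictor
            mu = 1.0 / (1.0 + math.exp(-eta))  # fitted mean
            w = mu * (1.0 - mu)                # working weight
            z = eta + (yi - mu) / w            # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the weighted normal equations for the new coefficients.
        det = s_w * s_wxx - s_wx * s_wx
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
        dev = deviance(x, y, b0, b1)
        if trace:
            print(f"iteration {iteration}: deviance = {dev}")
        if abs(dev - dev_old) / (abs(dev) + 0.1) < epsilon:
            return b0, b1, True, iteration
        dev_old = dev
    return b0, b1, False, maxit

# Made-up, non-separable data.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1, converged, iterations = fit_logistic_irls(x, y, trace=True)
print(converged, iterations)
```

A smaller tolerance or a larger iteration limit only matters when the fit is slow to converge. With well-behaved data a handful of iterations is usually enough.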

## How to Read Summary

![](https://2850417076-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M4HLCK3olgduYoe3RVS%2F-M4oMvCUDQwHTJ0eWi_f%2F-M4oNDzNaxemsrnAsdxF%2Flr_summary.png?generation=1586795485623301\&alt=media)

Once you run it, you will see summary info like this.

### Summary of Fit

* Null Deviance - The deviance of the null model, which contains only the intercept.
* DF for Null Model - The degrees of freedom of the null model.
* Log Likelihood - The data's log-likelihood under the model.
* AIC - Akaike Information Criterion. Lower values indicate a better trade-off between fit and model complexity.
* BIC - Bayesian Information Criterion. Similar to AIC, with a stronger penalty for the number of parameters.
* Deviance - The deviance of the fitted model (residual deviance).
* Residual DF - The residual degrees of freedom.
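
As a rough sketch of how these quantities relate to each other for a binary response, they can all be derived from the fitted probabilities. The function names are hypothetical and the fitted probabilities below are made-up values rather than the output of a real fit.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

def fit_summary(y, fitted, n_coefs):
    """Summary statistics for a binary model with n_coefs estimated coefficients."""
    n = len(y)
    p_null = sum(y) / n  # an intercept-only model predicts the overall share of 1s
    loglik = log_likelihood(y, fitted)
    return {
        "Null Deviance": -2 * log_likelihood(y, [p_null] * n),
        "DF for Null Model": n - 1,
        "Log Likelihood": loglik,
        "AIC": -2 * loglik + 2 * n_coefs,
        "BIC": -2 * loglik + math.log(n) * n_coefs,
        "Deviance": -2 * loglik,
        "Residual DF": n - n_coefs,
    }

# Hypothetical outcomes and fitted probabilities (intercept plus one predictor).
y = [0, 0, 1, 1]
fitted = [0.2, 0.4, 0.6, 0.8]
summary = fit_summary(y, fitted, n_coefs=2)
for name, value in summary.items():
    print(name, value)
```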

### Parameter Estimates

* Term - The term in the model being estimated and tested.
* Estimate - The estimated coefficient, on the log-odds scale.
* Std Error - The standard error of the estimated coefficient.
* t Ratio - The test statistic: the estimate divided by its standard error (a z-statistic for a binomial model).
* P Value - Two-sided p-value.
* Conf Low - Lower bound of the 95% confidence interval.
* Conf High - Upper bound of the 95% confidence interval.
* Odds Ratio - Exponential of the estimated coefficient.
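
The columns above can be related to each other with a short sketch. It assumes Wald-type (normal approximation) p-values and confidence intervals, which may differ from the method the software actually uses, for example profile likelihood intervals. The function name `wald_summary` and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def wald_summary(estimate, std_error, level=0.95):
    """Test statistic, p-value, confidence interval and odds ratio for one term."""
    z = estimate / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    return {
        "statistic": z,
        "p_value": p_value,
        "conf_low": estimate - crit * std_error,
        "conf_high": estimate + crit * std_error,
        "odds_ratio": math.exp(estimate),
    }

# Hypothetical coefficient and standard error.
result = wald_summary(estimate=0.7, std_error=0.35)
for name, value in result.items():
    print(name, value)
```

An odds ratio above 1 means the odds of the response being 1 increase as the predictor increases by one unit. A value below 1 means the odds decrease.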

## Step-by-step

Here's a step-by-step tutorial on how to build a logistic regression model, make predictions with it, and evaluate it.

* [Introduction to Logistic Regression in Exploratory](https://blog.exploratory.io/quick-introduction-to-logistic-regression-in-exploratory-fdcf321e2d7d)
