Logistic regression is a supervised learning technique applied to classification problems.

In this post, let us explore:

- Logistic Regression model
- Advantages
- Disadvantages
- Example
- Hyperparameters and Tuning

### Logistic Regression Model

Logistic functions model growth that is initially exponential but levels off when resources are limited (read more here and here).

The sigmoid function is a special case of the logistic function, as shown in the picture below (link). In the following pictures, I show how to derive the log-odds from the sigmoid function.
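For reference, one common way to write the general logistic function uses a maximum value $L$, a steepness $k$ and a midpoint $x_0$ (standard notation, assumed here because the original picture is not reproduced). The sigmoid is the special case $L = 1$, $k = 1$, $x_0 = 0$:

$$
f(x) = \frac{L}{1 + e^{-k(x - x_0)}}, \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$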

We use the sigmoid function to model the probability that the dependent variable is 1 rather than 0 (binary classification).

Now we will see how to derive the log-odds, log[p/(1-p)], where p/(1-p) is the odds of the positive class.
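The derivation can be written out as follows, assuming the linear predictor $z = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$ (the $\beta$ notation is introduced here for illustration):

$$
p = \sigma(z) = \frac{1}{1 + e^{-z}} \quad\Rightarrow\quad 1 - p = \frac{e^{-z}}{1 + e^{-z}}
$$

$$
\frac{p}{1 - p} = \frac{1}{e^{-z}} = e^{z} \quad\Rightarrow\quad \log\frac{p}{1 - p} = z = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n
$$

Taking the log of the odds undoes the exponential, which is why the logit is linear in the independent variables.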

As shown in the pictures above, the log-odds (logit) is a linear function of the independent variables.
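As a quick numerical check, the sketch below uses hypothetical coefficients and feature values (not fitted to any data), turns the linear predictor into probabilities with the sigmoid, and confirms that taking the log-odds recovers the linear predictor.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical intercept and coefficients (not from the post)
beta0, beta = -1.0, np.array([0.5, -2.0])
# Hypothetical feature rows: 3 samples, 2 features
X = np.array([[1.0, 0.2], [3.0, 0.0], [-2.0, 1.5]])

z = beta0 + X @ beta            # linear predictor for each sample
p = sigmoid(z)                  # predicted probability of class 1
log_odds = np.log(p / (1 - p))  # logit of the predicted probability

print(np.allclose(log_odds, z))  # the logit equals the linear predictor
```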

### Advantage

**We can interpret the coefficients (white box model)**
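Because the log-odds is linear, a one-unit increase in a feature multiplies the odds by exp(coefficient), holding the other features fixed. The sketch below checks this with sklearn on synthetic data from `make_classification` (an assumption for illustration; it is not the Titanic data used later).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (not the Titanic data): 200 samples, 2 features
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

model = LogisticRegression().fit(X, y)
coef = model.coef_[0]           # one coefficient per feature (binary case)
odds_multiplier = np.exp(coef)  # change in odds per one-unit feature increase

def odds(x):
    """Odds p/(1-p) of class 1 predicted by the model for one sample."""
    p = model.predict_proba(x.reshape(1, -1))[0, 1]
    return p / (1 - p)

x0 = X[0]
x1 = x0.copy()
x1[0] += 1.0                    # increase feature 0 by one unit, others fixed
ratio = odds(x1) / odds(x0)

print("coefficients:", coef)
print("exp(coefficients):", odds_multiplier)
print("observed odds ratio for +1 in feature 0:", ratio)
```

The observed odds ratio matches exp(coef) for feature 0 regardless of which sample is used, which is what makes the coefficients directly interpretable.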

### Disadvantage

- Requires that each data point be independent of other data points (source)

### Example

The data used to demonstrate logistic regression comes from the Titanic dataset. For simplicity, I have used only three features (Age, Fare and Pclass).

After splitting the data into training (80%) and test (20%) sets, I performed 5-fold cross-validation (cv=5). I report accuracy both from cross-validation and on the held-out test set.

```
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
```

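The above output shows the default hyperparameters of `LogisticRegression` in the sklearn version used for this post. (Since sklearn 0.22, the default solver is 'lbfgs' and multi_class defaults to 'auto'.)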

Let us now perform the 5-fold cross-validation and find out the accuracy.

```
Accuracy: 0.68 (+/- 0.04)
Test Accuracy: 0.70
```
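The original code for this step is not shown. A minimal sketch of the same procedure follows, assuming synthetic data from `make_classification` in place of the three Titanic features (so its numbers will not match the output above), and assuming the interval is reported as two standard deviations of the fold scores.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the three features (Age, Fare, Pclass)
X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, random_state=42)

# 80% training / 20% test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression()

# 5-fold cross-validation on the training set
scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

# Fit on the whole training set, then evaluate once on the held-out test set
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print("Test Accuracy: %0.2f" % test_accuracy)
```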

We can compute the confusion matrix with sklearn's `confusion_matrix`, and we can also visualize it for easier interpretation.
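A minimal sketch using `sklearn.metrics.confusion_matrix` on hypothetical labels (not the post's predictions); in practice `y_pred` would come from `model.predict(X_test)`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted binary labels (illustration only)
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_pred = np.array([0, 1, 1, 0, 0, 1, 0, 1])

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)
```

For a plot, sklearn 1.0 and later provide `ConfusionMatrixDisplay.from_predictions(y_true, y_pred)`, which requires matplotlib.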

### Hyperparameters

Let us look at the important hyperparameters of logistic regression one by one, in the order in which they appear in sklearn's printed model. As above, the default hyperparameters are:

```
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
```

**C**

C is the inverse of the regularization strength. The lower the value of C, the stronger the regularization and hence the lower the chance of overfitting (although too small a C can cause underfitting). The default is C=1.0.
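To see the effect of C, the sketch below fits `LogisticRegression` on synthetic data (`make_classification`, an assumption for illustration) for a few values of C and prints the size of the coefficient vector. Stronger regularization (smaller C) should shrink it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data for illustration only
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

norms = {}
for C in [0.01, 1.0, 100.0]:
    model = LogisticRegression(C=C, solver="lbfgs", max_iter=1000).fit(X, y)
    # L2 norm of the coefficients (lbfgs does not penalize the intercept)
    norms[C] = np.linalg.norm(model.coef_)
    print(f"C={C:>6}: ||coef|| = {norms[C]:.3f}")
```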

**Class weight**

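By default (class_weight=None), every class has a weight of one. Setting class_weight='balanced' makes sklearn automatically adjust the class weights inversely proportional to the class frequencies, as n_samples / (n_classes * np.bincount(y)). This is useful for handling class imbalance in the dataset.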
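The sketch below computes the 'balanced' weights for a hypothetical imbalanced label vector with `sklearn.utils.class_weight.compute_class_weight` and checks them against the formula above.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: 8 samples of class 0, 2 of class 1
y = np.array([0] * 8 + [1] * 2)
classes = np.unique(y)

# Weights that class_weight='balanced' corresponds to
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

# Same formula by hand: n_samples / (n_classes * count per class)
manual = len(y) / (len(classes) * np.bincount(y))
print(weights, manual)
```

The minority class receives four times the weight of the majority class here, mirroring the 8:2 imbalance.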

**Fit Intercept**

By default (fit_intercept=True), an intercept (bias) term is added to the logistic regression model.

**Multiclass**

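There are three options: 'ovr', 'multinomial' and 'auto'.

'ovr' corresponds to One-vs-Rest, which fits one binary classifier per class, while 'multinomial' fits a single model with the multinomial loss across all classes. 'auto' selects 'ovr' when the problem is binary classification or the solver is 'liblinear', and 'multinomial' otherwise.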
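A sketch contrasting the two strategies on sklearn's bundled iris data (an illustrative choice, not from the post). Because recent sklearn releases deprecate the multi_class parameter, it builds One-vs-Rest explicitly with `OneVsRestClassifier` and relies on the default lbfgs solver fitting a multinomial model for multiclass targets, which holds in recent versions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

# One-vs-Rest: one binary logistic regression per class
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# Multinomial: a single model over all classes (lbfgs default, recent sklearn)
multinomial = LogisticRegression(max_iter=1000).fit(X, y)

print(len(ovr.estimators_))     # one binary classifier per class
print(multinomial.coef_.shape)  # (n_classes, n_features)
```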

### Tuning the Hyperparameters

Now let us tune the hyperparameters using grid search (GridSearchCV).

```
{'C': 2, 'penalty': 'l2'}
```

You can then use these values in the pipeline to fit the final model (for more).
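The post's grid is not shown, so the sketch below uses a hypothetical grid over C and penalty, with synthetic data from `make_classification` (an assumption); its best parameters will generally differ from those above. The 'liblinear' solver is chosen because it supports both 'l1' and 'l2' penalties.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (not the Titanic features)
X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

# Hypothetical grid: 6 values of C x 2 penalties = 12 candidates
param_grid = {"C": [0.1, 0.5, 1, 2, 5, 10], "penalty": ["l1", "l2"]}

grid = GridSearchCV(LogisticRegression(solver="liblinear"), param_grid,
                    cv=5, scoring="accuracy")
grid.fit(X, y)
print(grid.best_params_)
```

With the default refit=True, `grid.best_estimator_` is already refit on all of X with the best parameters.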

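We can also use randomized search (RandomizedSearchCV) to find the best parameters. Its advantages are that it can be faster, since it evaluates a fixed number of sampled candidates instead of every combination, and that hyperparameters such as C can be searched over a distribution of values rather than a fixed list.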
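A sketch of randomized search on synthetic data (an assumption), sampling C from a log-uniform distribution via `scipy.stats.loguniform`. The range 1e-3 to 1e2 and `n_iter=20` are illustrative choices, not values from the post.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (not the Titanic features)
X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

# C is drawn from a continuous distribution instead of a fixed grid
param_distributions = {"C": loguniform(1e-3, 1e2), "penalty": ["l1", "l2"]}

search = RandomizedSearchCV(LogisticRegression(solver="liblinear"),
                            param_distributions, n_iter=20, cv=5,
                            scoring="accuracy", random_state=0)
search.fit(X, y)
print(search.best_params_)
```

Only 20 candidates are evaluated, however wide the range of C, which is where the speed advantage comes from.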