-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy path04_Section_4.qmd
More file actions
executable file
·71 lines (50 loc) · 1.9 KB
/
04_Section_4.qmd
File metadata and controls
executable file
·71 lines (50 loc) · 1.9 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
---
title: Logistic Regression
format: html
filters:
- pyodide
---
## Logistic Regression
[Fill description]{style="color:red;"}.
### Quiz : Draw a line!
## Train-Test Split
[Fill description]{style="color:red;"}.
First, we will perform a train-test split.
```{pyodide-python}
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, accuracy_score, recall_score, f1_score, roc_auc_score, confusion_matrix
data_url = "https://raw.githubusercontent.com/zadchin/RT-test/master/final_dataset.csv"
data = pd.read_csv(data_url)
features = data.drop("diagnosis", axis = 1)
target = data.diagnosis
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size = 0.3, random_state=42)
print("X train shape: ", X_train.shape)
print("X test shape: ", X_test.shape)
```
## Algorithm
Next, we will run logistic regression on our training dataset, then compute the performance of the metrics using our dataset and then
```{pyodide-python}
import numpy as np
LR_model = LogisticRegression(max_iter=10000, C=1.0, class_weight="balanced")
LR_model.fit(X_train, y_train)
y_pred_LR = LR_model.predict(X_test)
print("Accuracy: ", np.round(accuracy_score(y_test, y_pred_LR), 2))
print("Precision: ",np.round(precision_score(y_test, y_pred_LR), 2))
print("Recall: ", np.round(recall_score(y_test, y_pred_LR), 2))
print("F1 Score: ", np.round(f1_score(y_test, y_pred_LR), 2))
```
::: {.callout-tip appearance="simple"}
Can you try to delete the class weight and changing the C-value to observe what happened to the performance of the algorithm?
:::
## Feature Importance
```{pyodide-python}
feature_importance = pd.DataFrame({
'Feature': X_train.columns,
'Importance': LR_model.coef_[0]
})
feature_importance = feature_importance.sort_values(by='Importance', ascending=False)
feature_importance
```
$\,$