{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Regressione: Come Scegliere il modello"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Questo notebook fornisce un esempio che permette di comparare il comportamento di diversi regressori e scegliere il migliore. Al fine di fare questa scelta viene utilizzata la cross-validation che permette di capire quanto il regressore รจ robusto a variazioni nel train-test set.\n",
"\n",
"\n",
"1. Caricare il dataset (di regressione). \n",
"2. Comparare i modelli\n",
"3. Scegliere il modello migliore\n",
"4. Applicare il modello allenato a un nuovo set di dati"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Importare le librerie"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.preprocessing import MinMaxScaler\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn import linear_model\n",
"from sklearn.metrics import mean_squared_error # MSE\n",
"from sklearn.metrics import mean_absolute_error # MAE\n",
"from sklearn.metrics import median_absolute_error # MedAE\n",
"from sklearn.preprocessing import PolynomialFeatures\n",
"from sklearn.decomposition import PCA\n",
"from sklearn.model_selection import KFold\n",
"from sklearn.model_selection import cross_val_score\n",
"from sklearn.linear_model import RANSACRegressor, SGDRegressor, HuberRegressor, TheilSenRegressor\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.linear_model import Lasso, Ridge\n",
"from sklearn.linear_model import ElasticNet\n",
"from sklearn.tree import DecisionTreeRegressor\n",
"from sklearn.neighbors import KNeighborsRegressor\n",
"from sklearn.svm import SVR\n",
"from sklearn.ensemble import AdaBoostRegressor\n",
"from sklearn.ensemble import GradientBoostingRegressor\n",
"from sklearn.ensemble import RandomForestRegressor\n",
"from sklearn.ensemble import ExtraTreesRegressor\n",
"import seaborn as sn\n",
"import matplotlib.pyplot as plt\n",
"import random\n",
"import numpy as np\n",
"import plotly.graph_objects as go\n",
"import pickle\n",
"import json\n",
"from sklearn.pipeline import Pipeline\n",
"import pandas as pd\n",
"from sklearn import datasets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Caricare il dataset\n",
"Assumiamo che l'ultima colonna del dataset sia il valore di target e che il dataset contenga come input solo valori numerici."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def load_dataset(name=\"boston\"):\n",
" datasets_list = [\"boston\", \"diabetes\"] #,\"tips\"]\n",
" chosen_dataset = name #\"boston\"\n",
"\n",
" boston = datasets.load_boston()\n",
" df_boston = pd.DataFrame(boston.data,columns=boston.feature_names)\n",
" df_boston[\"price\"] = boston.target\n",
"\n",
" diabetes = datasets.load_diabetes()\n",
" df_diabetes = pd.DataFrame(diabetes.data,columns=diabetes.feature_names)\n",
" df_diabetes[\"desease\"] = diabetes.target\n",
" df_diabetes = df_diabetes.round(decimals=5)\n",
"\n",
" #df_nuovo_dataset = pd.read_csv(\"https://docs.google.com/spreadsheets/d/e/2PACX-1vSqHhx2kS9gCNmI04yksqTP2PRsT6ifTU2DLokKs3Y6KgcSGIAL7_4t_q_8kNhVkFA0xD2nt7hn_w-5/pub?output=csv\")\n",
"\n",
" dict_datasets = {\n",
" \"boston\": df_boston,\n",
" \"diabetes\": df_diabetes,\n",
" # \"nuovo\" : df_nuovo_dataset\n",
" } \n",
" df = dict_datasets[chosen_dataset]#.head(10)\n",
" df = df.dropna()\n",
"\n",
" #print(df.head())\n",
" X = df.iloc[:,0:-1].values # Input: dalla prima alla penultima colonna\n",
" Y = df.iloc[:,-1].values # Target: utlima colonna\n",
"\n",
" return X,Y,df"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n", " | CRIM | \n", "ZN | \n", "INDUS | \n", "CHAS | \n", "NOX | \n", "RM | \n", "AGE | \n", "DIS | \n", "RAD | \n", "TAX | \n", "PTRATIO | \n", "B | \n", "LSTAT | \n", "price | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "0.00632 | \n", "18.0 | \n", "2.31 | \n", "0.0 | \n", "0.538 | \n", "6.575 | \n", "65.2 | \n", "4.0900 | \n", "1.0 | \n", "296.0 | \n", "15.3 | \n", "396.90 | \n", "4.98 | \n", "24.0 | \n", "
1 | \n", "0.02731 | \n", "0.0 | \n", "7.07 | \n", "0.0 | \n", "0.469 | \n", "6.421 | \n", "78.9 | \n", "4.9671 | \n", "2.0 | \n", "242.0 | \n", "17.8 | \n", "396.90 | \n", "9.14 | \n", "21.6 | \n", "
2 | \n", "0.02729 | \n", "0.0 | \n", "7.07 | \n", "0.0 | \n", "0.469 | \n", "7.185 | \n", "61.1 | \n", "4.9671 | \n", "2.0 | \n", "242.0 | \n", "17.8 | \n", "392.83 | \n", "4.03 | \n", "34.7 | \n", "
3 | \n", "0.03237 | \n", "0.0 | \n", "2.18 | \n", "0.0 | \n", "0.458 | \n", "6.998 | \n", "45.8 | \n", "6.0622 | \n", "3.0 | \n", "222.0 | \n", "18.7 | \n", "394.63 | \n", "2.94 | \n", "33.4 | \n", "
4 | \n", "0.06905 | \n", "0.0 | \n", "2.18 | \n", "0.0 | \n", "0.458 | \n", "7.147 | \n", "54.2 | \n", "6.0622 | \n", "3.0 | \n", "222.0 | \n", "18.7 | \n", "396.90 | \n", "5.33 | \n", "36.2 | \n", "