{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [], "collapsed_sections": [ "tdj5M1nLwzq4" ] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "# **Linear Regression**\n", "\n", "**Model function**\n", "$$y = \\sum_{i=1}^{p}({x_iw_i}) + b$$\n", "\n", "\n", "$$y = x_1w_1 + x_2w_2 + ... + x_pw_p + b$$\n", "\n", "or, folding the bias into the weights as $w_0 = b$:\n", "\n", "$$y = \\sum_{i=0}^{p}({x_iw_i})$$\n", "$$x_0 = 1$$\n", "\n", "**Prediction** - $\\hat{y}$\n", "$$\\hat{y} = \\sum_{i=1}^{p}({x_i\\hat{w_i}}) + \\hat{b}$$\n", "\n", "**Goal** - choose $\\hat{w_i}$ and $\\hat{b}$ so that the difference between $y$ (the true value of the target variable) and $\\hat{y}$ (the model's prediction) is minimal.\n", "\n", "**Loss function**\n", "$$L(y,\\hat{y}) = \\frac{1}{n}\\sum_{i=1}^{n}({y_i -\\hat{y_i} })^2$$\n", "$$L(w_1,...,w_p) = \\frac{1}{n}\\sum_{i=1}^{n}({y_i -(x_1w_1 + ... + x_pw_p + b) })^2$$\n", "\n" ], "metadata": { "id": "tdj5M1nLwzq4" } }, { "cell_type": "markdown", "source": [ "## **Problem Statement**\n", "\n", "\n", "\n", "\n", "We are given a sample of feature vectors:\n", "$$X_n : \\{x_1, x_2, ..., x_n \\space| \\space x_i \\in R^p\\}$$\n", "\n", "Here $n$ is the number of samples in the input data and $p$ is the dimensionality of the feature space.\n", "\n", "We are also given the corresponding values of the target variable:\n", "$$Y_n : \\{y_1, y_2, ..., y_n \\space| \\space y_i \\in R\\}$$\n", "\n", "Together they form the dataset:\n", "$$D : \\{(x, y)_i\\},\\space i = 1...n $$\n", "\n", "We are given a parametric family of functions $f(w, x)$ that depends on the parameters $w$ and the input features $x$:\n", "$$f(w,x) = x_0w_0+x_1w_1 + x_2w_2 + ... 
+ x_pw_p$$\n", "\n", "We need to build a model that, given $x_i$, predicts a value $\\hat{y_i}$ as close as possible to $y_i$:\n", "$$\\hat{y_i} = f(w, x_i)$$\n", "\n", "$$|\\hat{y_i} - y_i| \\to 0$$" ], "metadata": { "id": "lK9sOBtmn1dW" } }, { "cell_type": "markdown", "source": [ "## **Gradient Descent**\n", "\n", "**Gradient** - the vector that points in the direction of steepest increase of a function:\n", "$$\\nabla L(w_1,...,w_p) = (\\frac{\\partial L}{\\partial w_1},...,\\frac{\\partial L}{\\partial w_p})$$\n", "$$\\frac{\\partial L}{\\partial w_j} = \\frac{\\partial}{\\partial w_j} \\Bigl(\\frac{1}{n}\\sum_{i=1}^{n}({y_i -(x_1w_1 + ... + x_pw_p + b) })^2\\Bigr)$$\n", "\n", "For each weight:\n", "\n", "$$\\frac{\\partial L}{\\partial w_1} = \\frac{2}{n}\\sum_{i=1}^{n}({x_1w_1 + ... + x_pw_p + b -y_i })x_1 = \\frac{2}{n}\\sum_{i=1}^{n}(wx+b-y_i)x_1$$\n", "$$...$$\n", "$$\\frac{\\partial L}{\\partial w_p} = \\frac{2}{n}\\sum_{i=1}^{n}({x_1w_1 + ... + x_pw_p + b -y_i })x_p = \\frac{2}{n}\\sum_{i=1}^{n}(wx+b-y_i)x_p$$  \n", "\n", "Bias ($b$):\n", "\n", "$$\\frac{\\partial L}{\\partial b} = \\frac{2}{n}\\sum_{i=1}^{n}({x_1w_1 + ... 
+ x_pw_p + b -y_i }) = \\frac{2}{n}\\sum_{i=1}^{n}(wx+b-y_i)$$  \n", "Weight and bias update:\n", "$$w := w-\\alpha\\nabla_w L$$\n", "$$b := b-\\alpha\\frac{\\partial L}{\\partial b}$$\n", "$\\alpha$ is the learning rate.\n", "\n", "**Training procedure**:  \n", "\n", "$w_1, ..., w_p :=0$  \n", "$b :=0$  \n", "$for\\ i\\ in\\ range(n\\_iter):$  \n", "$\\ \\ \\ \\ \\ w_1:=w_1 - \\alpha \\frac{\\partial L}{\\partial w_1}$  \n", "$\\ \\ \\ \\ \\ ...$  \n", "$\\ \\ \\ \\ \\ w_p:=w_p - \\alpha \\frac{\\partial L}{\\partial w_p}$  \n", "$\\ \\ \\ \\ \\ b:=b - \\alpha \\frac{\\partial L}{\\partial b}$" ], "metadata": { "id": "2vT1hAyy5veJ" } }, { "cell_type": "markdown", "source": [ "### Model with a single feature\n", "$$y = wx+b$$\n", "$$\\hat{y} = {\\hat{w}x} + \\hat{b}$$\n", "$$L(y,\\hat{y}) = \\frac{1}{n}\\sum_{i=1}^{n}({y_i -\\hat{y_i} })^2$$\n", "Let:  \n", "$X = [1, 2, 3]$  \n", "$Y = [1, 2, 3]$  \n", "$n = 3$ (number of data samples)  \n", "$w = 1.5$  \n", "$b = 0$  \n", "\n", "Then:  \n", "$L(w, b) = \\frac{1}{n}\\sum_{i=1}^{n}({y_i -(wx_i+b) })^2$  \n", "$\\frac{\\partial L}{\\partial w} = \\frac{\\partial}{\\partial w} \\frac{1}{n}\\sum_{i=1}^{n}({y_i -(wx_i+b) })^2 = \\frac{1}{n}\\sum_{i=1}^{n} \\frac{\\partial}{\\partial w}({y_i -(wx_i+b) })^2$  \n", "\n", "$\\frac{\\partial L}{\\partial w} = \\frac{1}{n}\\sum_{i=1}^{n}2({y_i -(wx_i+b)})(-x_i)$  \n", "\n", "$\\frac{\\partial L}{\\partial w} = \\frac{2}{n}\\sum_{i=1}^{n}({wx_i+b - y_i })x_i$  \n", "\n", "Gradient (computed over every sample in the dataset, i.e. one full **epoch**):  \n", "$\\frac{\\partial L}{\\partial w} = \\frac{2}{3}\\Bigl((1.5 \\cdot 1-1) \\cdot 1 + (1.5 \\cdot 2-2) \\cdot 2 + (1.5 \\cdot 3-3) \\cdot 3\\Bigr) = \\frac{2}{3} \\cdot 7 \\approx 4.67$  \n", "\n", "* If the gradient is computed on the full set (batch) of samples, i.e. the whole dataset, this is **batch gradient descent (Batch GD)**.  \n", "* If the gradient is computed on just one random sample, this is **stochastic gradient descent (SGD)**. 
\n", "* If the gradient is computed on a mini-batch (a random subset of the dataset of a given size), this is **mini-batch gradient descent (Mini-batch GD)**" ], "metadata": { "id": "rB_vpJWO9JcR" } }, { "cell_type": "markdown", "source": [ "# **Code**" ], "metadata": { "id": "Bpy8IhU390cS" } }, { "cell_type": "code", "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt" ], "metadata": { "id": "VF9a5ORmFLd_" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "### Linear regression function" ], "metadata": { "id": "MMa2luy6G6O4" } }, { "cell_type": "code", "source": [ "def func_lin_reg(x, w, b):\n", "    return x * w + b" ], "metadata": { "id": "JU1kCgISG2nW" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "### Example with three points" ], "metadata": { "id": "sGYd0NKcHLq_" } }, { "cell_type": "markdown", "source": [ "**Documentation:**  \n", "[numpy.linspace](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html)" ], "metadata": { "id": "5q07p7fj-tpZ" } }, { "cell_type": "code", "source": [ "lin_space_row = np.linspace(start=0, stop=100, num=5)\n", "lin_space_row" ], "metadata": { "id": "7w9cTgel-Si8" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "**Documentation:**  \n", "[matplotlib.pyplot.plot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html)" ], "metadata": { "id": "8vF3b19Z_KnT" } }, { "cell_type": "code", "source": [ "w = 1\n", "b = 0\n", "\n", "X_fake = np.linspace(0, 100, 100)\n", "\n", "plt.plot(X_fake, func_lin_reg(X_fake, w, b), color='blue')\n", "plt.plot([1, 2, 3], [1, 2, 3], 'x', color='red', linewidth=20, markersize=12)\n", "plt.xlim(0, 4)\n", "plt.ylim(0, 4)" ], "metadata": { "id": "VfqEYa-5HQBi" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "from ipywidgets import interact, FloatSlider\n", 
"%matplotlib inline\n", "\n", "# Interactive function that draws the plot\n", "@interact(w=FloatSlider(value=1.0, min=-2.0, max=2.0, step=0.1, description='w:'),\n", "          b=FloatSlider(value=0.0, min=-2.0, max=2.0, step=0.1, description='b:'))\n", "def plot_regression(w=1.0, b=0.0):\n", "    plt.figure(figsize=(6, 5))\n", "\n", "    X_fake = np.linspace(0, 100, 100)\n", "    plt.plot(X_fake, func_lin_reg(X_fake, w, b), color='blue', label=f'y = {w:.1f}x + {b:.1f}')\n", "\n", "    # Data points\n", "    plt.plot([1, 2, 3], [1, 2, 3], 'x', color='red', markersize=12, label='Data')\n", "\n", "    plt.xlim(0, 4)\n", "    plt.ylim(0, 4)\n", "    plt.xlabel('x')\n", "    plt.ylabel('y')\n", "    plt.legend()" ], "metadata": { "id": "Ia9Ewqgys_kD" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "### Loss function\n", "$$L(y,\\hat{y}) = \\frac{1}{n}\\sum_{i=1}^{n}({y_i -\\hat{y_i} })^2$$" ], "metadata": { "id": "UazHl4VcJeWS" } }, { "cell_type": "markdown", "source": [ "**Documentation:**  \n", "[numpy.mean](https://numpy.org/doc/2.0/reference/generated/numpy.mean.html)  \n", "[numpy.square](https://numpy.org/doc/2.0/reference/generated/numpy.square.html)" ], "metadata": { "id": "GBywCB6s_mj-" } }, { "cell_type": "code", "source": [ "def mse(y, y_pred):\n", "    return np.mean(np.square(y - y_pred))" ], "metadata": { "id": "UuviB1NjJhXF" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "X_three = np.array([1, 2, 3])\n", "y_three = np.array([1, 2, 3])\n", "mse(y_three, func_lin_reg(X_three, 1, 0))" ], "metadata": { "id": "3iJDmtNsJ6HT" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "w_possible_values = np.linspace(-5, 5, 30)\n", "X_three_dots = np.array([1, 2, 3])\n", "y_three_dots = np.array([1, 2, 3])\n", "\n", "w = 1.5\n", "b = 0.0\n", "\n", "fig, (ax1, ax2) = plt.subplots(1, 2)\n", "\n", "# prediction plot\n", "ax1.plot(X_fake, func_lin_reg(X_fake, w, b), color='blue')\n", "ax1.plot(X_three_dots, 
y_three_dots, 'x', color='red', linewidth=20, markersize=12)\n", "ax1.set_xlim(0, 4)\n", "ax1.set_ylim(0, 4)\n", "\n", "# loss plot\n", "ax2.plot(w_possible_values,\n", "         [mse(y_three_dots, func_lin_reg(X_three_dots, w, b)) for w in w_possible_values])\n", "ax2.plot(w, mse(y_three_dots, func_lin_reg(X_three_dots, w, b)), 'x', color='red', markersize=12)\n", "ax2.set_xlim(-1, 3)\n", "ax2.set_ylim(0, 10)" ], "metadata": { "id": "YlfaooKyKJBs" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from ipywidgets import interact, FloatSlider\n", "\n", "%matplotlib inline\n", "\n", "# --- Helper functions (must be defined BEFORE they are used) ---\n", "def func_lin_reg(x, w, b):\n", "    return w * x + b\n", "\n", "def mse(y_true, y_pred):\n", "    return np.mean((y_true - y_pred) ** 2)\n", "\n", "# --- Constants ---\n", "X = np.array([1, 2, 3], dtype=float)\n", "y = np.array([1, 2, 3], dtype=float)\n", "X_fake = np.linspace(0, 4, 200)\n", "b = 0.0 # fixed bias\n", "\n", "# --- Static MSE(w) curve for the fixed b ---\n", "w_curve = np.linspace(-1, 3, 400)\n", "mse_curve = np.array([mse(y, func_lin_reg(X, w_i, b)) for w_i in w_curve])\n", "\n", "# --- Interactive visualization ---\n", "@interact(\n", "    w=FloatSlider(value=1.0, min=-1.0, max=3.0, step=0.05, description='w:')\n", ")\n", "def plot_interactive(w=1.0):\n", "    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))\n", "\n", "    # --- Left plot: predictions ---\n", "    ax1.plot(X_fake, func_lin_reg(X_fake, w, b), 'b-', label=f'y = {w:.2f}x + {b:.1f}')\n", "    ax1.plot(X, y, 'rx', markersize=12, label='Data') # red crosses: the DATA points\n", "    ax1.set_xlim(0, 4)\n", "    ax1.set_ylim(0, 4)\n", "    ax1.set_xlabel('x')\n", "    ax1.set_ylabel('y')\n", "    ax1.legend()\n", "    ax1.grid(True)\n", "    ax1.set_title('Model predictions')\n", "\n", "    # --- Right plot: static parabola + moving point ---\n", "    ax2.plot(w_curve, mse_curve, 'g-', linewidth=2, label='MSE(w)')\n", "    current_mse = mse(y, func_lin_reg(X, w, b))\n", "    ax2.plot(w, current_mse, 'rx', markersize=12, label='Current point') # red cross on the parabola\n", "    ax2.set_xlim(-1, 3)\n", "    ax2.set_ylim(0, 10)\n", "    ax2.set_xlabel('w')\n", "    ax2.set_ylabel('MSE')\n", "    ax2.legend()\n", "    ax2.grid(True)\n", "    ax2.set_title(f'Loss function (MSE), b = {b:.1f}')\n", "\n", "    plt.tight_layout()\n", "    plt.show()" ], "metadata": { "id": "dgG-TameD1qQ" }, "execution_count": null, "outputs": [] },
{ "cell_type": "markdown", "source": [ "### Generating the dataset  \n", "**Documentation:**  \n", "[numpy.random.gumbel](https://numpy.org/doc/stable/reference/random/generated/numpy.random.gumbel.html)  \n", "[numpy.reshape](https://numpy.org/doc/2.0/reference/generated/numpy.reshape.html)  \n", "[numpy.random.normal](https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html)" ], "metadata": { "id": "z-aBRLj0FWt-" } }, { "cell_type": "code", "source": [ "gumbel = np.random.gumbel(3, 2, 10)\n", "gumbel" ], "metadata": { "id": "392A4pLMETf2" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "gumbel.shape" ], "metadata": { "id": "alyeYYaJEwty" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "gumbel.reshape(-1,1)" ], "metadata": { "id": "M38X0OmsFL6R" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "gumbel.reshape(10,1)" ], "metadata": { "id": "JEElNAsHEzQW" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "$$y = 80000 \\cdot x$$" ], "metadata": { "id": "zKuqubn_FWwu" } }, { "cell_type": "code", "source": [ "X = np.random.gumbel(loc=50, scale=10, size=1000).reshape(-1,1) # generate the features\n", "y = X * 80000 # generate the target\n", "y = y + np.random.normal(loc=0, scale=800000, size=(1000, 1)) # add noise" ], "metadata": { "id": "Wcw3KcxLFWOH" }, "execution_count": null, "outputs": [] },
{ "cell_type": "code", "source": [ "X[:5] # first 5 values" ], "metadata": { "id": "b0zAz4YyGW98" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "y[:5] # first 5 values" ], "metadata": { "id": "I4S_t-AGGiY5" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "plt.plot(X, y, 'o', alpha=0.3)" ], "metadata": { "id": "C07Cv3wjF_5z" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Ideal model (known weights plugged in)" ], "metadata": { "id": "9UzCn7gcGtXd" } }, { "cell_type": "code", "source": [ "w = 80000\n", "b = 0\n", "\n", "x_vals = np.arange(30, 110) # range of x values\n", "\n", "plt.plot(X, y, 'o', linewidth=20, alpha=0.3)\n", "plt.plot(x_vals, func_lin_reg(x_vals, w, b), color='red')" ], "metadata": { "id": "atOxor4tImjG" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "### The same, but with the generated dataset" ], "metadata": { "id": "g4_Me8gXKr5H" } }, { "cell_type": "code", "source": [ "X_linespace = np.linspace(0, 100, 100)\n", "w_possible_values = np.linspace(0, 100000, 10000)\n", "\n", "w = 80000\n", "b = 0\n", "\n", "fig, (ax1, ax2) = plt.subplots(1, 2)\n", "\n", "fig.set_figwidth(10)\n", "\n", "# prediction plot\n", "ax1.plot(X, y, 'x', color='blue', linewidth=20, markersize=12)\n", "ax1.plot(x_vals, func_lin_reg(x_vals, w, b), color='red')\n", "\n", "# loss plot\n", "ax2.plot(w_possible_values, [mse(y, func_lin_reg(X, w, b)) for w in w_possible_values])\n", "ax2.plot(w, mse(y, func_lin_reg(X, w, b)), 'o', color='red', alpha=0.3)" ], "metadata": { "id": "mHC_XiizKx7E" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Train/test split  \n", "\n", "**Documentation**  \n", "[sklearn.model_selection.train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)" ], "metadata": { "id": "JQDlceXhHtZs" 
} }, { "cell_type": "code", "source": [ "from sklearn.model_selection import train_test_split\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)\n", "\n", "print(\"X_train.shape: {}\".format(X_train.shape))\n", "print(\"y_train.shape: {}\".format(y_train.shape))\n", "print(\"X_test.shape: {}\".format(X_test.shape))\n", "print(\"y_test.shape: {}\".format(y_test.shape))" ], "metadata": { "id": "ULgnCbZiLxVO" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "### Visualizing the split" ], "metadata": { "id": "616JmuODMTHP" } }, { "cell_type": "code", "source": [ "plt.plot(X_train, y_train, 'bo', label=\"Train\", alpha=0.3) # training data\n", "plt.plot(X_test, y_test, 'rx', label=\"Test\", alpha=0.3) # test data\n", "plt.xlabel(\"Floor area\") # x-axis label\n", "plt.ylabel(\"Price\") # y-axis label\n", "plt.legend() # show the legend\n", "plt.show() # show the plot" ], "metadata": { "id": "-1nBdHj1MXsr" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "### Model definition" ], "metadata": { "id": "Gj1qjZqzNjru" } },
{ "cell_type": "code", "source": [ "class MyLinearRegression:\n", "\n", "    def __init__(self, lr, n_epochs):\n", "        self.lr = lr # learning rate\n", "        self.n_epochs = n_epochs # number of epochs\n", "\n", "    def mse(self, y, y_pred):\n", "        return np.sum(np.square(y - y_pred)) / y.shape[0]\n", "\n", "    def loss_gradient_w(self, y_pred, y, x):\n", "        return 2 * np.sum((y_pred - y) * x) / y.shape[0]\n", "\n", "    def loss_gradient_b(self, y_pred, y):\n", "        return 2 * np.sum(y_pred - y) / y.shape[0]\n", "\n", "    def fit(self, X, y):\n", "        self.w = 0\n", "        self.b = 0\n", "        for i in range(self.n_epochs):\n", "            # compute both gradients from the same predictions, then update\n", "            y_pred = self.predict(X)\n", "            grad_w = self.loss_gradient_w(y_pred, y, X)\n", "            grad_b = self.loss_gradient_b(y_pred, y)\n", "            self.w = self.w - self.lr * grad_w\n", "            self.b = self.b - self.lr * grad_b\n", "            print(f\"MSE: {self.mse(y, self.predict(X))}\")\n", "\n", "    def predict(self, X):\n", "        return self.w * X + self.b" ], "metadata": { "id": "w0HTeSXYNnuH" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "y_train.shape[0]" ], "metadata": { "id": "G7OcCJCsIhsr" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "### Training the model" ], "metadata": { "id": "O4IeuhlhOG0E" } }, { "cell_type": "code", "source": [ "X_train.shape" ], "metadata": { "id": "PEdHweGRH5oy" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "model = MyLinearRegression(1e-4, 20)\n", "model.fit(X_train, y_train)" ], "metadata": { "id": "19j47zR9OK7v" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "### Visualizing the results" ], "metadata": { "id": "D2MkFVcgOmxH" } },
{ "cell_type": "code", "source": [ "plt.plot(X_train, y_train, 'bo', label=\"Train\", alpha=0.2) # training data\n", "plt.plot(X_test, y_test, 'bx', label=\"Test\", alpha=0.2) # test data\n", "xx = np.arange(30, 110).reshape(-1, 1) # range of x values\n", "plt.plot(xx, model.predict(xx), 'r--', label=\"Model\") # model predictions\n", "plt.xlabel(\"Floor area\") # x-axis label\n", "plt.ylabel(\"Price\") # y-axis label\n", "plt.legend() # show the legend\n", "plt.show() # show the plot" ], "metadata": { "id": "CQqqTiGzOpvr" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "model.w" ], "metadata": { "id": "qwbuDXAQQgvT" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "model.b" ], "metadata": { "id": "zpHzxv3VQ1ef" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "### The same, but using `sklearn`\n", "\n", "📘 Documentation: 
[LinearRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression)" ], "metadata": { "id": "CC42G-vDQB_9" } }, { "cell_type": "code", "source": [ "from sklearn.linear_model import LinearRegression\n", "\n", "sklearn_model = LinearRegression()\n", "sklearn_model.fit(X_train, y_train)" ], "metadata": { "id": "mH1Hj4f6QKym" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "The `sklearn_model.coef_` attribute holds the model weights (W)  \n", "The `sklearn_model.intercept_` attribute holds the intercept (bias) of the model (b)" ], "metadata": { "id": "RyymplyAQRAE" } }, { "cell_type": "code", "source": [ "sklearn_model.coef_" ], "metadata": { "id": "5uo47XVEQcMU" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "sklearn_model.intercept_" ], "metadata": { "id": "mUZTyJMeQ4Pv" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "### Visual comparison of the models" ], "metadata": { "id": "_8lndGN4RQSx" } }, { "cell_type": "code", "source": [ "plt.plot(X_train, y_train, 'bo', label=\"Train\", alpha=0.2) # training data\n", "plt.plot(X_test, y_test, 'bx', label=\"Test\", alpha=0.2) # test data\n", "xx = np.arange(30, 110).reshape(-1, 1) # range of x values\n", "plt.plot(xx, model.predict(xx), 'r--', label=\"Model\") # custom model predictions\n", "plt.plot(xx, sklearn_model.predict(xx), 'm--', label=\"sklearn Model\") # sklearn model predictions\n", "plt.xlabel(\"Floor area\") # x-axis label\n", "plt.ylabel(\"Price\") # y-axis label\n", "plt.legend() # show the legend\n", "plt.show() # show the plot" ], "metadata": { "id": "Sej4eRK8RW36" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "plt.plot(X_train, y_train, 'bo', label=\"Train\", alpha=0.2) # training data\n", "plt.plot(X_test, y_test, 'bx', label=\"Test\", alpha=0.2) # test data\n", "xx = np.arange(30, 
110).reshape(-1, 1) # range of x values\n", "plt.plot(xx, model.predict(xx), 'r--', label=\"Model\") # custom model predictions\n", "plt.plot(xx, sklearn_model.predict(xx), 'm--', label=\"sklearn Model\") # sklearn model predictions\n", "plt.xlabel(\"Floor area\") # x-axis label\n", "plt.ylabel(\"Price\") # y-axis label\n", "plt.legend() # show the legend\n", "\n", "plt.xlim(40, 45)\n", "plt.ylim(3100000, 3800000)\n", "plt.show() # show the plot" ], "metadata": { "id": "gbJxB7_URqEw" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "y_pred_mymodel = model.predict(X_test)\n", "y_pred_skmodel = sklearn_model.predict(X_test)" ], "metadata": { "id": "gbocF5MyJXKt" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "mse_mymodel = mse(y_test, y_pred_mymodel)\n", "mse_mymodel" ], "metadata": { "id": "alV1Md_HJsRF" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "mse_skmodel = mse(y_test, y_pred_skmodel)\n", "mse_skmodel" ], "metadata": { "id": "-Kk7cPNvJ3fk" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "$$MSE = \\frac{1}{n}\\sum_{i=1}^{n}({y_i -\\hat{y_i} })^2$$\n" ], "metadata": { "id": "KBhNci23k3Ns" } }, { "cell_type": "code", "source": [ "from sklearn.metrics import mean_squared_error\n", "\n", "mse_mymodel = mean_squared_error(y_test, y_pred_mymodel)\n", "mse_mymodel" ], "metadata": { "id": "hM4YylcCKtU5" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "mse_skmodel = mean_squared_error(y_test, y_pred_skmodel)\n", "mse_skmodel" ], "metadata": { "id": "lHYRbLJKK2B2" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "$$MAE = \\frac{1}{n}\\sum_{i=1}^{n}|{y_i -\\hat{y_i} }|$$\n" ], "metadata": { "id": "xIGEZYtVK9RA" } }, { "cell_type": "code", "source": [ "from sklearn.metrics import mean_absolute_error\n", "\n", "mae_mymodel = mean_absolute_error(y_test, 
y_pred_mymodel)\n", "mae_mymodel" ], "metadata": { "id": "sOvixBbZJqMn" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "mae_skmodel = mean_absolute_error(y_test, y_pred_skmodel)\n", "mae_skmodel" ], "metadata": { "id": "i1FqOY7dKMsK" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "$$MAPE = \\frac{1}{n}\\sum_{i=1}^{n}\\frac{|{y_i -\\hat{y_i} }|}{|y_i|}$$\n", "\n", "\n" ], "metadata": { "id": "o6D0JokuLAOD" } }, { "cell_type": "code", "source": [ "from sklearn.metrics import mean_absolute_percentage_error\n", "\n", "mape_mymodel = mean_absolute_percentage_error(y_test, y_pred_mymodel)\n", "mape_mymodel" ], "metadata": { "id": "7e4GBXxCKVjy" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "mape_skmodel = mean_absolute_percentage_error(y_test, y_pred_skmodel)\n", "mape_skmodel" ], "metadata": { "id": "7BSV0FrXKiV3" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "$$R^2 = 1 - \\frac{\\sum_{i=1}^{n}({y_i -\\hat{y_i} })^2} {\\sum_{i=1}^{n}({y_i -\\bar{y} })^2}$$\n", "\n", "$$R^2 = 1 - \\frac{MSE_{model}}{MSE_{avg}}$$\n", "\n", "Here $\\bar{y}$ is the mean of the true values and $MSE_{avg}$ is the MSE of a baseline that always predicts $\\bar{y}$." ], "metadata": { "id": "I9BuQ0gbLDSS" } }, { "cell_type": "code", "source": [ "from sklearn.metrics import r2_score\n", "\n", "r2_mymodel = r2_score(y_test, y_pred_mymodel)\n", "r2_mymodel" ], "metadata": { "id": "7ThJ8hWMLZ5M" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "r2_skmodel = r2_score(y_test, y_pred_skmodel)\n", "r2_skmodel" ], "metadata": { "id": "G-XY1iG1Lgub" }, "execution_count": null, "outputs": [] } ] }