{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "collapsed_sections": [
        "tdj5M1nLwzq4"
      ]
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "#**Линейная регрессия**\n",
        "▶ Linear Regression  \n",
        "\n",
        "**Функция модели**\n",
        "$$y = \\sum_{i=1}^{p}({x_iw_i}) + b$$\n",
        "\n",
        "\n",
        "$$y = x_1w_1 + x_2w_2 + ... + x_pw_p + b$$\n",
        "\n",
        "или\n",
        "\n",
        "$$y = \\sum_{i=0}^{p}({x_iw_i})$$\n",
        "$$x_0 = 1$$\n",
        "\n",
        "**Предсказание** - $\\hat{y}$\n",
        "$$\\hat{y} = \\sum_{i=1}^{p}({x_i\\hat{w_i}}) + \\hat{b}$$\n",
        "\n",
        "**Цель** - подобрать $\\hat{w_i}$ и $\\hat{b}$ так, чтобы разница между $y$ (истинным значением целевой функции) и $\\hat{y}$ (предсказанием модели) была минимальной.\n",
        "\n",
        "**Функция потерь**\n",
        "$$L(y,\\hat{y}) = \\frac{1}{n}\\sum_{i=1}^{n}({y_i -\\hat{y_i} })^2$$\n",
        "$$L(w_1,...,w_p) = \\frac{1}{n}\\sum_{i=1}^{n}({y_i -(x_1w_1 + ... + x_pw_p + b) })^2$$\n",
        "\n"
      ],
      "metadata": {
        "id": "tdj5M1nLwzq4"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "##**Формулировка задачи**\n",
        "\n",
        "\n",
        "\n",
        "\n",
        "Задана выборка значений признаков:\n",
        "$$X_n : \\{x_1, x_2, ..., x_n  \\space|  \\space x_i \\in R^p\\}$$\n",
        "\n",
        "Здесь $n$ - количество элементов в выборке входных данных, $p$ - размерность признакового пространства.\n",
        "\n",
        "Задана выборка соответствующих значений целевой переменной:\n",
        "$$Y_n : \\{y_1, y_2, ..., y_n \\space| \\space y_i \\in R\\}$$\n",
        "\n",
        "Получаем множество исходных данных:\n",
        "$$D : \\{(x, y)_i\\},\\space i = 1...n $$\n",
        "\n",
        "Задано параметрическое семейство функций $f(w, x)$ зависящее от параметров W и от входных признаков X:\n",
        "$$f(w,x) = x_0w_0+x_1w_1 + x_2w_2 + ... + x_pw_p$$\n",
        "\n",
        "Нужно построить модель, предсказывающую по $x_i$ значение $\\hat{y_i}$, наиболее близкое к $y_i$ :\n",
        "$$\\hat{y_i} = f(w, x_i)$$\n",
        "\n",
        "$$|\\hat{y_i} - y_i| → 0$$"
      ],
      "metadata": {
        "id": "lK9sOBtmn1dW"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "##**Градиентный спуск**\n",
        "▶ Gradient Descent  \n",
        "\n",
        "**Градиент** - вектор, указывающий направление роста функции:\n",
        "$$\\nabla L(w_1,...,w_p) = (\\frac{\\partial L}{\\partial w_1},...,\\frac{\\partial L}{\\partial w_p})$$\n",
        "$$\\frac{\\partial L}{\\partial w_i} = \\frac{\\partial}{\\partial w_i} \\Bigr(\\frac{1}{n}\\sum_{i=1}^{n}({y_i -(x_1w_1 + ... + x_pw_p + b) })^2\\Bigl)$$\n",
        "\n",
        "Для каждого веса:\n",
        "\n",
        "$$\\frac{\\partial L}{\\partial w_1} = \\frac{2}{n}\\sum_{i=1}^{n}({x_1w_1 + ... + x_pw_p + b -y_i })x_1 = \\frac{2}{n}\\sum_{i=1}^{n}(wx+b-y_i)x_1$$\n",
        "$$...$$\n",
        "$$\\frac{\\partial L}{\\partial w_p} = \\frac{2}{n}\\sum_{i=1}^{n}({x_1w_1 + ... + x_pw_p + b -y_i })x_p = \\frac{2}{n}\\sum_{i=1}^{n}(wx+b-y_i)x_p$$  \n",
        "\n",
        "Смещение ($bias$):\n",
        "\n",
        "$$\\frac{\\partial L}{\\partial b} = \\frac{2}{n}\\sum_{i=1}^{n}({x_1w_1 + ... + x_pw_p + b -y_i }) = \\frac{2}{n}\\sum_{i=1}^{n}(wx+b-y_i)$$  \n",
        "Обновление весов и смещения:\n",
        "$$w = w-α\\nabla L(w)$$\n",
        "$$b = b-α\\nabla L(b)$$\n",
        "$\\alpha$ - скорость обучения ($learning\\ rate$)\n",
        "\n",
        "**Процесс обучения**:  \n",
        "\n",
        "$w_1, ..., w_p :=0$  \n",
        "$b :=0$  \n",
        "$for\\ i\\ in\\ range(n\\_iter):$  \n",
        "$\\ \\ \\ \\ \\ w_1:=w_1 - \\alpha \\frac{\\partial L}{\\partial w_1}$  \n",
        "$\\ \\ \\ \\ \\ ...$  \n",
        "$\\ \\ \\ \\ \\ w_p:=w_p - \\alpha \\frac{\\partial L}{\\partial w_p}$  \n",
        "$\\ \\ \\ \\ \\ b:=b - \\alpha \\frac{\\partial L}{\\partial b}$"
      ],
      "metadata": {
        "id": "2vT1hAyy5veJ"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Модель с единственным признаком\n",
        "$$y = wx+b$$\n",
        "$$\\hat{y} = {\\hat{w}x} + \\hat{b}$$\n",
        "$$L(y,\\hat{y}) = \\frac{1}{n}\\sum_{i=1}^{n}({y_i -\\hat{y_i} })^2$$\n",
        "Пусть:  \n",
        "$X = [1, 2, 3]$  \n",
        "$Y = [1, 2, 3]$  \n",
        "$n = 3$ (число экземпляров данных)  \n",
        "$w = 1,5$  \n",
        "$b = 0$  \n",
        "\n",
        "Тогда:  \n",
        "$L(x) = \\frac{1}{n}\\sum_{i=1}^{n}({y_i -(wx_i+b) })^2$  \n",
        "$\\frac{\\partial L}{\\partial w} = \\frac{\\partial}{\\partial w} \\frac{1}{n}\\sum_{i=1}^{n}({y_i -(wx_i+b) })^2 = \\frac{1}{n}\\sum_{i=1}^{n} \\frac{\\partial}{\\partial w}({y_i -(wx_i+b) })^2$  \n",
        "\n",
        "$\\frac{\\partial L}{\\partial w} = \\frac{1}{n}\\sum_{i=1}^{n}2({y_i -(wx_i+b)(-x_i)})$    \n",
        "\n",
        "$\\frac{\\partial L}{\\partial w} = \\frac{2}{n}\\sum_{i=1}^{n}({wx_i+b - y_i })x_i$  \n",
        "\n",
        "Градиент (расчет выполняется для всех данных выборки - полная **эпоха**):  \n",
        "$\\frac{\\partial L}{\\partial w} = \\frac{2}{3}\\Bigr((1,5w*1-1)*1 + ((1,5w*2-2)*2) + ((1,5w*3-3)*3)\\Bigl)$  \n",
        "\n",
        "* Если при расчете используется полный набор (батч) экземпляров данных (используется весь датасет) - **Градиентный спуск, Batch GD**.  \n",
        "* Если при расчете используется только 1 случайный экземпляр данных - **Стохастический градиентный спуск, SGD**.  \n",
        "* Если при расчете используется мини-батч (случайное подмножество экземпляров данных заданного размера, часть от исходного датасета) - **Градиентный спуск с использованием мини-батчей, Mini-batch GD**"
      ],
      "metadata": {
        "id": "rB_vpJWO9JcR"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "# **Код**"
      ],
      "metadata": {
        "id": "Bpy8IhU390cS"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import numpy as np\n",
        "import pandas as pd\n",
        "import matplotlib.pyplot as plt"
      ],
      "metadata": {
        "id": "VF9a5ORmFLd_"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "###Функция линейной регрессии"
      ],
      "metadata": {
        "id": "MMa2luy6G6O4"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "def func_lin_reg(x, w, b):\n",
        "  return x * w + b"
      ],
      "metadata": {
        "id": "JU1kCgISG2nW"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "###Пример с тремя точками"
      ],
      "metadata": {
        "id": "sGYd0NKcHLq_"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "**Документация:**  \n",
        "[numpy.linspace](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html)"
      ],
      "metadata": {
        "id": "5q07p7fj-tpZ"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "lin_space_row = np.linspace(start=0, stop=100, num=5)\n",
        "lin_space_row"
      ],
      "metadata": {
        "id": "7w9cTgel-Si8"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "**Документация:**  \n",
        "[matplotlib.pyplot.plot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html)"
      ],
      "metadata": {
        "id": "8vF3b19Z_KnT"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "w = 1\n",
        "b = 0\n",
        "\n",
        "X_fake = np.linspace(0, 100, 100)\n",
        "\n",
        "plt.plot(X_fake, func_lin_reg(X_fake, w, b), color='blue')\n",
        "plt.plot([1, 2, 3], [1, 2, 3], 'x', color='red', linewidth=20, markersize=12)\n",
        "plt.xlim(0, 4)\n",
        "plt.ylim(0, 4)"
      ],
      "metadata": {
        "id": "VfqEYa-5HQBi"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "from ipywidgets import interact, FloatSlider\n",
        "%matplotlib inline\n",
        "\n",
        "# Интерактивная функция для отображения графика\n",
        "@interact(w=FloatSlider(value=1.0, min=-2.0, max=2.0, step=0.1, description='w:'),\n",
        "          b=FloatSlider(value=0.0, min=-2.0, max=2.0, step=0.1, description='b:'))\n",
        "\n",
        "def plot_regression(w=1.0, b=0.0):\n",
        "    plt.figure(figsize=(6, 5))\n",
        "\n",
        "    X_fake = np.linspace(0, 100, 100)\n",
        "    plt.plot(X_fake, func_lin_reg(X_fake, w, b), color='blue', label=f'y = {w:.1f}x + {b:.1f}')\n",
        "\n",
        "    # Точки данных\n",
        "    plt.plot([1, 2, 3], [1, 2, 3], 'x', color='red', markersize=12, label='Данные')\n",
        "\n",
        "    plt.xlim(0, 4)\n",
        "    plt.ylim(0, 4)\n",
        "    plt.xlabel('x')\n",
        "    plt.ylabel('y')\n",
        "    plt.legend()"
      ],
      "metadata": {
        "id": "Ia9Ewqgys_kD"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "###Функция потерь\n",
        "$$L(y,\\hat{y}) = \\frac{1}{n}\\sum_{i=1}^{n}({y_i -\\hat{y_i} })^2$$"
      ],
      "metadata": {
        "id": "UazHl4VcJeWS"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "**Документация:**  \n",
        "[numpy.mean](https://numpy.org/doc/2.0/reference/generated/numpy.mean.html)  \n",
        "[numpy.square](https://numpy.org/doc/2.0/reference/generated/numpy.square.html)"
      ],
      "metadata": {
        "id": "GBywCB6s_mj-"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "def mse(y, y_pred):\n",
        "  return np.mean(np.square(y - y_pred))"
      ],
      "metadata": {
        "id": "UuviB1NjJhXF"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "X_three = np.array([1, 2, 3])\n",
        "y_three = np.array([1, 2, 3])\n",
        "mse(y_three, func_lin_reg(X_three, 1, 0))"
      ],
      "metadata": {
        "id": "3iJDmtNsJ6HT"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "w_possible_values = np.linspace(-5, 5, 30)\n",
        "X_three_dots = np.array([1, 2, 3])\n",
        "y_three_dots = np.array([1, 2, 3])\n",
        "\n",
        "w=1.5\n",
        "b=0.0\n",
        "\n",
        "fig, (ax1, ax2) = plt.subplots(1, 2)\n",
        "\n",
        "# prediction plot\n",
        "ax1.plot(X_fake, func_lin_reg(X_fake, w, b), color='blue')\n",
        "ax1.plot(X_three_dots, y_three_dots, 'x', color='red', linewidth=20, markersize=12)\n",
        "ax1.set_xlim(0, 4)\n",
        "ax1.set_ylim(0, 4)\n",
        "\n",
        "# loss plot\n",
        "ax2.plot(w_possible_values,\n",
        "        [mse(y_three_dots, func_lin_reg(X_three_dots, w, b)) for w in w_possible_values])\n",
        "ax2.plot(w, mse(y_three_dots, func_lin_reg(X_three_dots, w, b)), 'x', color='red', markersize=12)\n",
        "ax2.set_xlim(-1, 3)\n",
        "ax2.set_ylim(0, 10)"
      ],
      "metadata": {
        "id": "YlfaooKyKJBs"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import numpy as np\n",
        "import matplotlib.pyplot as plt\n",
        "from ipywidgets import interact, FloatSlider\n",
        "\n",
        "%matplotlib inline\n",
        "\n",
        "# --- Вспомогательные функции (должны быть определены ДО использования) ---\n",
        "def func_lin_reg(x, w, b):\n",
        "    return w * x + b\n",
        "\n",
        "def mse(y_true, y_pred):\n",
        "    return np.mean((y_true - y_pred) ** 2)\n",
        "\n",
        "# --- Константы ---\n",
        "X = np.array([1, 2, 3], dtype=float)\n",
        "y = np.array([1, 2, 3], dtype=float)\n",
        "X_fake = np.linspace(0, 4, 200)\n",
        "b = 0.0  # фиксированное смещение\n",
        "\n",
        "# --- Статичная кривая MSE(w) при фиксированном b ---\n",
        "w_curve = np.linspace(-1, 3, 400)\n",
        "mse_curve = np.array([mse(y, func_lin_reg(X, w_i, b)) for w_i in w_curve])\n",
        "\n",
        "# --- Интерактивная визуализация ---\n",
        "@interact(\n",
        "    w=FloatSlider(value=1.0, min=-1.0, max=3.0, step=0.05, description='w:')\n",
        ")\n",
        "def plot_interactive(w=1.0):\n",
        "    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))\n",
        "\n",
        "    # --- Левый график: предсказания ---\n",
        "    ax1.plot(X_fake, func_lin_reg(X_fake, w, b), 'b-', label=f'y = {w:.2f}x + {b:.1f}')\n",
        "    ax1.plot(X, y, 'rx', markersize=12, label='Данные')  # ← красные точки ДАННЫХ\n",
        "    ax1.set_xlim(0, 4)\n",
        "    ax1.set_ylim(0, 4)\n",
        "    ax1.set_xlabel('x')\n",
        "    ax1.set_ylabel('y')\n",
        "    ax1.legend()\n",
        "    ax1.grid(True)\n",
        "    ax1.set_title('Предсказания модели')\n",
        "\n",
        "    # --- Правый график: статичная парабола + подвижная точка ---\n",
        "    ax2.plot(w_curve, mse_curve, 'g-', linewidth=2, label='MSE(w)')\n",
        "    current_mse = mse(y, func_lin_reg(X, w, b))\n",
        "    ax2.plot(w, current_mse, 'rx', markersize=12, label='Текущая точка')  # ← красный крестик на параболе\n",
        "    ax2.set_xlim(-1, 3)\n",
        "    ax2.set_ylim(0, 10)\n",
        "    ax2.set_xlabel('w')\n",
        "    ax2.set_ylabel('MSE')\n",
        "    ax2.legend()\n",
        "    ax2.grid(True)\n",
        "    ax2.set_title(f'Функция потерь (MSE), b = {b:.1f}')\n",
        "\n",
        "    plt.tight_layout()\n",
        "    plt.show()"
      ],
      "metadata": {
        "id": "dgG-TameD1qQ"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Генерация набора данных  \n",
        "**Документация:**  \n",
        "[numpy.random.gumbel](https://numpy.org/doc/stable/reference/random/generated/numpy.random.gumbel.html)  \n",
        "[numpy.reshape](https://numpy.org/doc/2.0/reference/generated/numpy.reshape.html)  \n",
        "[numpy.random.normal](https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html)"
      ],
      "metadata": {
        "id": "z-aBRLj0FWt-"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "gumbel = np.random.gumbel(3, 2, 10)\n",
        "gumbel"
      ],
      "metadata": {
        "id": "392A4pLMETf2"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "gumbel.shape"
      ],
      "metadata": {
        "id": "alyeYYaJEwty"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "gumbel.reshape(-1,1)"
      ],
      "metadata": {
        "id": "M38X0OmsFL6R"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "gumbel.reshape(10,1)"
      ],
      "metadata": {
        "id": "JEElNAsHEzQW"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "$$y = 80000*x$$"
      ],
      "metadata": {
        "id": "zKuqubn_FWwu"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "X = np.random.gumbel(loc=50, scale=10, size=1000).reshape(-1,1) # генерируем фичи\n",
        "y = X * 80000 # генерируем таргет данные\n",
        "y = y + np.random.normal(loc=0, scale=800000, size=(1000, 1)) # добавляем шум"
      ],
      "metadata": {
        "id": "Wcw3KcxLFWOH"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "X[:5] # первые 5 значений"
      ],
      "metadata": {
        "id": "b0zAz4YyGW98"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "y[:5] # первые 5 значений"
      ],
      "metadata": {
        "id": "I4S_t-AGGiY5"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "plt.plot(X, y, 'o', alpha=0.3)"
      ],
      "metadata": {
        "id": "C07Cv3wjF_5z"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Идеальная модель (подставили известные веса)"
      ],
      "metadata": {
        "id": "9UzCn7gcGtXd"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "w = 80000\n",
        "b = 0\n",
        "\n",
        "x_vals = np.arange(30, 110) # числовой ряд\n",
        "\n",
        "plt.plot(X, y, 'o', linewidth=20, alpha=0.3)\n",
        "plt.plot(x_vals, func_lin_reg(x_vals, w, b), color='red')"
      ],
      "metadata": {
        "id": "atOxor4tImjG"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "### То же самое, но со сгенерированным датасетом"
      ],
      "metadata": {
        "id": "g4_Me8gXKr5H"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "X_linespace = np.linspace(0, 100, 100)\n",
        "w_possible_values = np.linspace(0, 100000, 10000)\n",
        "\n",
        "w = 80000\n",
        "b = 0\n",
        "\n",
        "fig, (ax1, ax2) = plt.subplots(1, 2)\n",
        "\n",
        "fig.set_figwidth(10)\n",
        "\n",
        "# prediction plot\n",
        "ax1.plot(X, y, 'x', color='blue', linewidth=20, markersize=12)\n",
        "ax1.plot(x_vals, func_lin_reg(x_vals, w, b), color='red')\n",
        "\n",
        "# loss plot\n",
        "ax2.plot(w_possible_values, [mse(y, func_lin_reg(X, w, b)) for w in w_possible_values])\n",
        "ax2.plot(w, mse(y, func_lin_reg(X, w, b)), 'o', color='red', alpha=0.3)"
      ],
      "metadata": {
        "id": "mHC_XiizKx7E"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Разбиение на train/test  \n",
        "\n",
        "**Документация**  \n",
        "[sklearn.model_selection.train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)"
      ],
      "metadata": {
        "id": "JQDlceXhHtZs"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from sklearn.model_selection import train_test_split\n",
        "\n",
        "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)\n",
        "\n",
        "print(\"X_train.shape: {}\".format(X_train.shape))\n",
        "print(\"y_train.shape: {}\".format(y_train.shape))\n",
        "print(\"X_test.shape: {}\".format(X_test.shape))\n",
        "print(\"y_test.shape: {}\".format(y_test.shape))"
      ],
      "metadata": {
        "id": "ULgnCbZiLxVO"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "###Визуализация разбиения"
      ],
      "metadata": {
        "id": "616JmuODMTHP"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "plt.plot(X_train, y_train, 'bo', label=\"Train\", alpha=0.3) # тренировочные данные\n",
        "plt.plot(X_test, y_test, 'rx', label=\"Test\", alpha=0.3) # тестовые данные\n",
        "plt.xlabel(\"Метраж\") # надпись по оси X\n",
        "plt.ylabel(\"Стоимость\") # надпись по оси Y\n",
        "plt.legend() # отображение легенды\n",
        "plt.show() # отображение графика"
      ],
      "metadata": {
        "id": "-1nBdHj1MXsr"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "###Описание модели"
      ],
      "metadata": {
        "id": "Gj1qjZqzNjru"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "class MyLinearRegression:\n",
        "\n",
        "  def __init__(self, lr, n_epochs):\n",
        "    self.lr = lr # скорость обучения\n",
        "    self.n_epochs = n_epochs # число эпох\n",
        "\n",
        "  def mse(self, y, y_pred):\n",
        "    return np.sum(np.square(y - y_pred)) / y.shape[0]\n",
        "\n",
        "  def loss_gradient_w(self, y_pred, y, x):\n",
        "    return 2 * np.sum((y_pred - y) * x) / y.shape[0]\n",
        "\n",
        "  def loss_gradient_b(self, y_pred, y):\n",
        "    return 2 * np.sum(y_pred - y) / y.shape[0]\n",
        "\n",
        "  def fit(self, X, y):\n",
        "    self.w = 0\n",
        "    self.b = 0\n",
        "    for i in range(self.n_epochs):\n",
        "      self.w = self.w - self.lr * self.loss_gradient_w(self.predict(X), y, X)\n",
        "      self.b = self.b - self.lr * self.loss_gradient_b(self.predict(X), y)\n",
        "      print(f\"MSE: {mse(y, self.predict(X))}\")\n",
        "\n",
        "  def predict(self, X):\n",
        "    return self.w * X + self.b"
      ],
      "metadata": {
        "id": "w0HTeSXYNnuH"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "y_train.shape[0]"
      ],
      "metadata": {
        "id": "G7OcCJCsIhsr"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Обучение модели"
      ],
      "metadata": {
        "id": "O4IeuhlhOG0E"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "X_train.shape"
      ],
      "metadata": {
        "id": "PEdHweGRH5oy"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "model = MyLinearRegression(10e-5, 20)\n",
        "model.fit(X_train, y_train)"
      ],
      "metadata": {
        "id": "19j47zR9OK7v"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Визуализация результатов"
      ],
      "metadata": {
        "id": "D2MkFVcgOmxH"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "plt.plot(X_train, y_train, 'bo', label=\"Train\", alpha=0.2) # тренировочные данные\n",
        "plt.plot(X_test, y_test, 'bx', label=\"Test\", alpha=0.2) # тестовые данные\n",
        "xx = np.arange(30, 110).reshape(-1, 1) # числовой ряд\n",
        "plt.plot(xx, model.predict(xx), 'r--', label=\"Model\") # график предсказаний модели\n",
        "plt.xlabel(\"Метраж\") # надпись по оси X\n",
        "plt.ylabel(\"Стоимость\") # надпись по оси Y\n",
        "plt.legend() # отображение легенды\n",
        "plt.show() # отображение графика"
      ],
      "metadata": {
        "id": "CQqqTiGzOpvr"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "model.w"
      ],
      "metadata": {
        "id": "qwbuDXAQQgvT"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "model.b"
      ],
      "metadata": {
        "id": "zpHzxv3VQ1ef"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "### То же самое, но с использованием `sklearn`\n",
        "\n",
        "📘 Ссылка на документацию: [LinearRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression)"
      ],
      "metadata": {
        "id": "CC42G-vDQB_9"
      }
    },
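    {
      "cell_type": "markdown",
      "source": [
        "For comparison with gradient descent: `LinearRegression` fits ordinary least squares, so it solves for the minimizer of the squared error directly instead of iterating. As a sketch for the single-feature case (the symbols $\\bar{x}$ and $\\bar{y}$ for the sample means are introduced here only for illustration), setting both partial derivatives of the loss to zero gives:\n",
        "$$\\hat{w} = \\frac{\\sum_{i=1}^{n}(x_i - \\bar{x})(y_i - \\bar{y})}{\\sum_{i=1}^{n}(x_i - \\bar{x})^2}, \\quad \\hat{b} = \\bar{y} - \\hat{w}\\bar{x}$$\n",
        "\n",
        "On the three-point example $X = [1, 2, 3]$, $Y = [1, 2, 3]$ we have $\\bar{x} = \\bar{y} = 2$, so:\n",
        "$$\\hat{w} = \\frac{(-1)(-1) + 0 \\cdot 0 + 1 \\cdot 1}{(-1)^2 + 0^2 + 1^2} = 1, \\quad \\hat{b} = 2 - 1 \\cdot 2 = 0$$"
      ],
      "metadata": {}
    },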
    {
      "cell_type": "code",
      "source": [
        "from sklearn.linear_model import LinearRegression\n",
        "\n",
        "sklearn_model = LinearRegression()\n",
        "sklearn_model.fit(X_train, y_train)"
      ],
      "metadata": {
        "id": "mH1Hj4f6QKym"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "Параметр `model.coef_` - веса модели (W)  \n",
        "Параметр `model.intercept`_ - свободный параметр (смещение) модели (b)"
      ],
      "metadata": {
        "id": "RyymplyAQRAE"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "sklearn_model.coef_"
      ],
      "metadata": {
        "id": "5uo47XVEQcMU"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "sklearn_model.intercept_"
      ],
      "metadata": {
        "id": "mUZTyJMeQ4Pv"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Визуальное сравнение моделей"
      ],
      "metadata": {
        "id": "_8lndGN4RQSx"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "plt.plot(X_train, y_train, 'bo', label=\"Train\", alpha=0.2) # тренировочные данные\n",
        "plt.plot(X_test, y_test, 'bx', label=\"Test\", alpha=0.2) # тестовые данные\n",
        "xx = np.arange(30, 110).reshape(-1, 1) # числовой ряд\n",
        "plt.plot(xx, model.predict(xx), 'r--', label=\"Model\") # график предсказаний модели\n",
        "plt.plot(xx, sklearn_model.predict(xx), 'm--', label=\"sklearn Model\") # график предсказаний модели\n",
        "plt.xlabel(\"Метраж\") # надпись по оси X\n",
        "plt.ylabel(\"Стоимость\") # надпись по оси Y\n",
        "plt.legend() # отображение легенды\n",
        "plt.show() # отображение графика"
      ],
      "metadata": {
        "id": "Sej4eRK8RW36"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "plt.plot(X_train, y_train, 'bo', label=\"Train\", alpha=0.2) # тренировочные данные\n",
        "plt.plot(X_test, y_test, 'bx', label=\"Test\", alpha=0.2) # тестовые данные\n",
        "xx = np.arange(30, 110).reshape(-1, 1) # числовой ряд\n",
        "plt.plot(xx, model.predict(xx), 'r--', label=\"Model\") # график предсказаний модели\n",
        "plt.plot(xx, sklearn_model.predict(xx), 'm--', label=\"sklearn Model\") # график предсказаний модели\n",
        "plt.xlabel(\"Метраж\") # надпись по оси X\n",
        "plt.ylabel(\"Стоимость\") # надпись по оси Y\n",
        "plt.legend() # отображение легенды\n",
        "\n",
        "plt.xlim(40, 45)\n",
        "plt.ylim(3100000, 3800000)\n",
        "plt.show() # отображение графика"
      ],
      "metadata": {
        "id": "gbJxB7_URqEw"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "y_pred_mymodel = model.predict(X_test)\n",
        "y_pred_skmodel = sklearn_model.predict(X_test)"
      ],
      "metadata": {
        "id": "gbocF5MyJXKt"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "mse_mymodel = mse(y_test, y_pred_mymodel)\n",
        "mse_mymodel"
      ],
      "metadata": {
        "id": "alV1Md_HJsRF"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "mse_skmodel = mse(y_test, y_pred_skmodel)\n",
        "mse_skmodel"
      ],
      "metadata": {
        "id": "-Kk7cPNvJ3fk"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "$$MSE = \\frac{1}{n}\\sum_{i=1}^{n}({y_i -\\hat{y_i} })^2$$\n"
      ],
      "metadata": {
        "id": "KBhNci23k3Ns"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from sklearn.metrics import mean_squared_error\n",
        "\n",
        "mse_mymodel = mean_squared_error(y_test, y_pred_mymodel)\n",
        "mse_mymodel"
      ],
      "metadata": {
        "id": "hM4YylcCKtU5"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "mse_skmodel = mean_squared_error(y_test, y_pred_skmodel)\n",
        "mse_skmodel"
      ],
      "metadata": {
        "id": "lHYRbLJKK2B2"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "$$MAE = \\frac{1}{n}\\sum_{i=1}^{n}|{y_i -\\hat{y_i} }|$$\n"
      ],
      "metadata": {
        "id": "xIGEZYtVK9RA"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from sklearn.metrics import mean_absolute_error\n",
        "\n",
        "mae_mymodel = mean_absolute_error(y_test, y_pred_mymodel)\n",
        "mae_mymodel"
      ],
      "metadata": {
        "id": "sOvixBbZJqMn"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "mae_skmodel = mean_absolute_error(y_test, y_pred_skmodel)\n",
        "mae_skmodel"
      ],
      "metadata": {
        "id": "i1FqOY7dKMsK"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "$$MAPE = \\frac{1}{n}\\sum_{i=1}^{n}\\frac{|{y_i -\\hat{y_i} }|}{y_i}$$\n",
        "\n",
        "\n"
      ],
      "metadata": {
        "id": "o6D0JokuLAOD"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from sklearn.metrics import mean_absolute_percentage_error\n",
        "\n",
        "mape_mymodel = mean_absolute_percentage_error(y_test, y_pred_mymodel)\n",
        "mape_mymodel"
      ],
      "metadata": {
        "id": "7e4GBXxCKVjy"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "mape_skmodel = mean_absolute_percentage_error(y_test, y_pred_skmodel)\n",
        "mape_skmodel"
      ],
      "metadata": {
        "id": "7BSV0FrXKiV3"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "$$R^2 = 1 - \\frac{\\sum_{i=1}^{n}({y_i -\\hat{y_i} })^2} {\\sum_{i=1}^{n}({y_i -\\bar{y_i} })^2}$$\n",
        "\n",
        "$$R^2 = 1 - \\frac{MSE_{model}}{MSE_{avg}}$$"
      ],
      "metadata": {
        "id": "I9BuQ0gbLDSS"
      }
    },
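    {
      "cell_type": "markdown",
      "source": [
        "A small worked example of $R^2$, using hypothetical numbers chosen only for illustration: the targets are $y = (1, 2, 3)$, so the mean is $2$ and the sum of squared deviations from the mean is $1 + 0 + 1 = 2$.\n",
        "\n",
        "For predictions $\\hat{y} = (1.1, 1.9, 3.2)$, which are close to the targets:\n",
        "$$R^2 = 1 - \\frac{0.1^2 + 0.1^2 + 0.2^2}{2} = 1 - \\frac{0.06}{2} = 0.97$$\n",
        "\n",
        "For predictions $\\hat{y} = (1.5, 3, 4.5)$, which are worse than always predicting the mean:\n",
        "$$R^2 = 1 - \\frac{0.5^2 + 1^2 + 1.5^2}{2} = 1 - \\frac{3.5}{2} = -0.75$$\n",
        "\n",
        "A value close to 1 indicates a good fit, while a negative value means the model does worse than the constant prediction equal to the mean."
      ],
      "metadata": {}
    },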
    {
      "cell_type": "code",
      "source": [
        "from sklearn.metrics import r2_score\n",
        "\n",
        "r2_mymodel = r2_score(y_test, y_pred_mymodel)\n",
        "r2_mymodel"
      ],
      "metadata": {
        "id": "7ThJ8hWMLZ5M"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "r2_skmodel = r2_score(y_test, y_pred_skmodel)\n",
        "r2_skmodel"
      ],
      "metadata": {
        "id": "G-XY1iG1Lgub"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}