{ "cells": [ { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### Библиотеки" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### Создание" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### из списка" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "создаем Series значений из списка целых чисел" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2019-01-24T11:40:46.066107Z", "start_time": "2019-01-24T11:40:46.060063Z" }, "hidden": true }, "outputs": [], "source": [ "s = pd.Series(data=[10, 11, 12, 13, 14],\n", " index=[1, 2, 3, 5, 7])\n", "s" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "hidden": true }, "source": [ "создаем Series из строковых значений" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "s = pd.Series(['Blue', 'Yellow', 'Green'])\n", "s" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "создаем Series из 5 элементов, каждый элемент - list python" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true, "scrolled": true }, "outputs": [], "source": [ "l = [[1, 2]]\n", "s = pd.Series(l*5)\n", "s" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "создаем DataFrame из двумерного списка" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "df = pd.DataFrame([[10, 11], [20, 21], [30, 31]])\n", "df" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "задаем имена столбцов" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, 
"outputs": [], "source": [ "df = pd.DataFrame([[10, 11], [20, 21], [30, 31]],\n", " columns=['A', 'B'])\n", "df" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "создаем DataFrame для списка объектов Series" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "series_1 = pd.Series([70, 90])\n", "series_2 = pd.Series([71, 91])\n", "df = pd.DataFrame([series_1, series_2])\n", "df" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "задаем имена столбцов после создания датафрейма" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "df.columns = ['col_1', 'col_2']\n", "df" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### из словаря " ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "hidden": true }, "source": [ "создаем объект Series из словаря, при этом посмотрим, как изменились индексы" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "s = pd.Series({'Homer': 'Dad',\n", " 'Marge': 'Mom',\n", " 'Bart': 'Son',\n", " 'Lisa': 'Daughter',\n", " 'Maggie': 'Daughter'})\n", "s" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "создание DataFrame с помощью питоновского словаря" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "list_1 = [70, 71]\n", "list_2 = [90, 91]\n", "temperatures = {'col_1': list_1,\n", " 'col_2': list_2}\n", "pd.DataFrame(temperatures)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "создание DataFrame с помощью словаря, состоящего из объектов Series" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "series_1 = pd.Series([70, 71])\n", "series_2 = pd.Series([90, 
91])\n", "\n", "df = pd.DataFrame({'col_1': series_1,\n", " 'col_2': series_2})\n", "df" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### using functions" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "create a Series with np.arange: a sequence of numbers from **start** up to, but not including, **stop**, with step **step**:\n", "```python \n", "np.arange(start, stop, step) \n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "s = pd.Series(np.arange(15,25,2))\n", "s" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "create a Series of 5 values evenly spaced over the interval from 0 to 9 (both endpoints included)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "s = pd.Series(np.linspace(0, 9, 5))\n", "s" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Generating random numbers.\n", "\n", "We fix the seed value so that the results can be reproduced later" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Create a Series of 5 normally distributed random numbers" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "np.random.seed(123)\n", "s = pd.Series(np.random.normal(size=5))\n", "s" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Create a 4x3 DataFrame of random numbers" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "np.random.seed(123)\n", "df = pd.DataFrame(np.random.normal(size=12).reshape(4, 3),\n", " index=['ind_1', 'ind_2', 'ind_3', 'ind_4'],\n", " columns=['col_1', 'col_2', 'col_3'])\n", "df" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ 
"#### [из файла](http://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-tools-text-csv-hdf5)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "| Column Name | Description\n", "| ------------- |:-------------:|\n", "|Symbol|Сокращенное название организации|\n", "|Name|Полное название организации|\n", "|Sector|Сектор экономики|\n", "|Price|Стоимость акции|\n", "|Dividend Yield|Дивидендная доходность|\n", "|Price/Earnings|Цена / прибыль|\n", "|Earnings/Share|Прибыль на акцию|\n", "|Book Value|Балансовая стоимость компании|\n", "|52 week low|52-недельный минимум|\n", "|52 week high|52-недельный максимум|\n", "|Market Cap|Рыночная капитализация|\n", "|EBITDA|**E**arnings **b**efore **i**nterest, **t**axes, **d**epreciation and **a**mortization|\n", "|Price/Sales|Цена / объём продаж|\n", "|Price/Book|Цена / балансовая стоимость|\n", "|SEC Filings|Ссылка *sec.gov*|" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "pd.read_csv(filepath_or_buffer = \"../data/sp500.csv\",\n", " sep = ';')" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "разделитель" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "pd.read_csv(filepath_or_buffer = \"../data/sp500.csv\",\n", " sep = ',')" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "количество строк" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "pd.read_csv(filepath_or_buffer = \"../data/sp500.csv\",\n", " sep = ',',\n", " nrows = 3)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "столбцы" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "pd.read_csv(filepath_or_buffer = \"../data/sp500.csv\",\n", " sep = ',',\n", " nrows = 3,\n", " usecols=['Symbol', 'Sector', 
'Price', 'Book Value'])" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "index column (`index_col`)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "pd.read_csv(filepath_or_buffer = \"../data/sp500.csv\",\n", " sep = ',',\n", " nrows = 3,\n", " usecols=['Symbol', 'Sector', 'Price', 'Book Value'],\n", " index_col='Symbol')" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "iterator over chunks (`chunksize`)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "df_chunk = pd.read_csv(filepath_or_buffer = \"../data/sp500.csv\",\n", " sep = ',',\n", " chunksize=50,\n", " usecols=['Symbol', 'Sector', 'Price', 'Book Value'],\n", " index_col='Symbol')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "for df_tmp in df_chunk:\n", " print('DataFrame part:', df_tmp.shape) " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500 = pd.read_csv(filepath_or_buffer = \"../data/sp500.csv\",\n", " sep = ',',\n", " usecols=['Symbol', 'Sector', 'Price', 'Book Value'],\n", " index_col='Symbol')\n", "sp500" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### Properties" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### create Series for the examples" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "Simpsons = pd.Series({'Homer': 120,\n", " 'Marge': 60,\n", " 'Bart': 35,\n", " 'Lisa': 30,\n", " 'Maggie': 7})\n", "\n", "Simpsons" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "np.random.seed(123)\n", "numbers = pd.Series(data = np.random.normal(size=10),\n", " index = np.arange(25,35))\n", "numbers" ] }, { 
"cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### тип данных" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "Simpsons.dtype" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.dtypes" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### количество элементов" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Series:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "print('Первый способ:', len(Simpsons))\n", "print('Второй способ:', Simpsons.size)\n", "print('Третий способ:', Simpsons.shape)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "DataFrame:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "print('Первый способ:', len(sp500))\n", "print('Второй способ:', sp500.size)\n", "print('Третий способ:', sp500.shape)" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### количество уникальных элементов" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "Simpsons.nunique()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.nunique()" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### индекс и значения" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Series:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "Simpsons.index" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], 
"source": [ "Simpsons.values" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "DataFrame:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.index" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.values" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.columns" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "#### присвоение / изменение имени" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "##### объекта Series" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "Simpsons.name = 'Simpsons weight'\n", "Simpsons" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "##### индекса" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "Simpsons.index.name = 'First name'\n", "Simpsons" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "##### столбца" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_copy = sp500.rename(columns = {'Book Value': 'BookValue'})" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "проверяем, не изменились ли имена столбцов в исходном датафрейме" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_copy.columns" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "этот программный код переименовывает 
column in place" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_copy.rename(columns = {'Book Value': 'BookValue'},\n", " inplace=True)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "check whether the column name has changed" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_copy.columns" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### Displaying values" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### first / last rows" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "Simpsons.tail(3)" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### columns" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "extract the Sector column" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500['Sector'].head()" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "type of a DataFrame column:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "type(sp500['Sector'])" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "extract the Price and Book Value columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500[['Price', 'Book Value']].head()" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "show that the result is a DataFrame" ] }, { "cell_type": "code", "execution_count": 
null, "metadata": { "hidden": true }, "outputs": [], "source": [ "type(sp500[['Price', 'Book Value']])" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "attribute access to a column by name" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.Price.head()" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "example with the name \"Book Value\": attribute access does not work for column names that contain a space, so this line raises a SyntaxError" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.Book Value" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### rows" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "##### by label" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "**Series**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers.loc[[25,33]]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "error: the label does not exist" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers.loc[0]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "**DataFrame**" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "get the row with index label MMM, which is returned as a Series" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.loc['MMM']" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "type(sp500.loc['MMM'])" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "get the rows MMM and MSFT; the result is a DataFrame" ] }, { "cell_type": "code", "execution_count": null, 
"metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.loc[['MMM', 'MSFT']]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "type(sp500.loc[['MMM', 'MSFT']])" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "##### по позиции" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "**Series** " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "по позиции" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers.iloc[[5,-5]]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "ошибка:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers.iloc[10]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "**DataFrame**" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "получаем строки, имеющие позиции 0 и 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.iloc[[0, 2]]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "получаем позиции меток MMM и A в индексе" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "i1 = sp500.index.get_loc('MMM')\n", "i2 = sp500.index.get_loc('A')\n", "(i1, i2)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "и извлекаем строки" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.iloc[[i1, i2]]" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": 
[ "#### поиск скалярного значения" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "ищем скалярное значение по метке строки и метке (имени) столбца" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.at['MMM', 'Price']" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "ищем скалярное значение по позиции строки и позиции столбца; извлекаем значение в строке 0, столбце 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.iat[0, 1]" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### одновременный отбор строк и столбцов" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "отбираем строки с метками индекса ABT и ZTS для столбцов Sector и Price" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.loc[['ABT', 'ZTS']][['Sector', 'Price']]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.loc[['ABT', 'ZTS'],['Sector', 'Price']]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "определение номера позиций заданных меток" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "print(sp500.index.get_loc('ABT'),sp500.index.get_loc('ZTS')) " ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "отбор строк и столбцов по номеру позиций" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.iloc[[1,499],[0,1]]" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### транспонирование" ] }, { "cell_type": "code", "execution_count": null, "metadata": { 
"hidden": true }, "outputs": [], "source": [ "sp500.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.T.head()" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### переиндексация" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "ошибка:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.loc[['MMM', 'ABBV', 'NEW VALUE']]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "делаем переиндексацию, задав метки MMM, ABBV и NEW VALUE" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "reindexed = sp500.reindex(index=['MMM', 'ABBV', 'NEW VALUE'])" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "обратите внимание, что все индексы, кромя перечисленных при вызове, удалены, а *NEW VALUE* содержит значения *NaN*" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "reindexed" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "выполняем переиндексацию столбцов" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.reindex(columns=['Price', 'Book Value', 'NewCol']).head()" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "при этом можем заполнить отсутствующие значения константами вместо *NaN*" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.reindex(columns=['Price',\n", " 'Book Value',\n", " 'NewCol'],\n", " fill_value=0).head()" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### случайная подвыборка" ] }, { "cell_type": "markdown", 
"metadata": { "hidden": true }, "source": [ "отбираем три случайные строки" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.sample(n=3)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "случайный отбор с возвращением" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.sample(frac=5, replace=True, random_state=777)" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### [настройки вывода](http://pandas.pydata.org/pandas-docs/stable/user_guide/options.html#available-options)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "pd.options.display.max_rows" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "pd.options.display.max_rows = 10" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "pd.options.display.max_rows" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### Срезы данных" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### Series" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Задаём срез по правилу: [начальная позиция: конечная позиция: величина шага], при этом:\n", "- Правая граница - не включается\n", "- Шаг может быть отрицательным\n", "- Позиция также может быть отрицательной - тогда отсчёт происходит \"с другого конца\"\n", "- Нумерация происходит от нуля" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "срез, содержащий элементы с позициями от 1 по 5" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers.iloc[1:6]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "выбираем элементы в позициях 1, 3, 5 == выбираем элементы с 1 по 5 позицию с шагом 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers.iloc[1:6:2]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "можем оставить только конечную позицию" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers.iloc[:6]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "либо оставим только начальную позицию" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers.iloc[3:]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "отбираем элементы Series в обратном порядке, начиная с 5" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers.iloc[5::-1]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "отбор 4 последних строк" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers.iloc[-4:]" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### DataFrame" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.iloc[:5]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "в обратном порядке" ] }, { 
"cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.iloc[4::-1]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "строки, начиная с метки ABT и заканчивая меткой ACN" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.loc['ABT':'ACN']" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### Копирование и ссылки" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "элементы с 1 по 4" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers.iloc[[1,2,3,4]]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "сохранили в переменную n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "n = numbers.iloc[[1,2,3,4]]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "n" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "присваиваем значение 0 всем элементам" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "n.loc[:] = 0\n", "n" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "что-нибудь произошло с numbers?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "save the elements at positions 1 through 4 once more" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "n = numbers.iloc[[1,2,3,4]]\n", "n" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "create the variable k as a slice of the elements at positions 1 through 4" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "k = numbers[1:5]\n", "k.loc[:] = 0\n", "k" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "restore numbers" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers[1:5] = n\n", "numbers" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### Deletion" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### del" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Series" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "Simpsons" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "Simpsons_copy = Simpsons.copy()\n", "del Simpsons_copy['Maggie']\n", "Simpsons_copy" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "DataFrame" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_copy = 
sp500.copy()\n", "del sp500_copy['Price']\n", "sp500_copy.iloc[:2]" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### pop " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_copy = sp500.copy()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_copy.head(3)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "this line removes the Sector column and returns it as a Series" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "popped_column = sp500_copy.pop('Sector')" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "the Sector column has been removed in place" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_copy.head(3)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "and we have the Sector column that pop returned" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "popped_column.head(3)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "For a Series, .pop is used the same way" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### drop " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_copy = sp500.copy()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_copy.head(3)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "- this line returns a new DataFrame without the 'Sector' column\n", "- the sp500_copy DataFrame itself is not modified" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_copy_after_drop = sp500_copy.drop(['Sector'], axis=1)\n", "sp500_copy_after_drop.head(3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_copy.head(3)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "take a copy of the first 5 rows of the sp500 DataFrame" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_part_copy = sp500.iloc[:5].copy()\n", "sp500_part_copy" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "drop the rows labeled ABT and ACN" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_part_copy = sp500_part_copy.drop(['ABT', 'ACN'], axis=0)\n", "sp500_part_copy.head(5)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ ".drop works the same way on a Series" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### Filtering by condition" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### Series" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "which rows have values greater than 0 and less than 1?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "logical_results = (numbers > 0) & (numbers < 1)\n", "logical_results" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Mind the parentheses! 
The & operator binds more tightly than the comparison operators, so the following code raises an exception\n", "```python\n", "numbers > 0 & numbers < 1\n", "```" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "the result is a boolean Series that can be used to select the values we are interested in" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "type(logical_results)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "select the rows where the mask is True" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers[logical_results]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "using the .where method: values that fail the condition are replaced with NaN, or with other if it is given" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers.where((numbers > 0) & (numbers < 1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers.where((numbers > 0) & (numbers < 1), other=-1)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "are all elements >= 0?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "(numbers >= 0).all()" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "is there any element < 2?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "(numbers < 2).any()" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "how many values are < 1?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "numbers < 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "(numbers < 1).sum()" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### DataFrame" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "which rows have Price < 100?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.Price < 100" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "now select the rows where Price < 100" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500[sp500.Price < 100]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "extract only the rows where Price is greater than 6 and less than 10" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "r = sp500[(sp500['Price'] < 10) &\n", "          (sp500.Price > 6)]['Price']\n", "r" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "extract the rows where Sector equals Health Care and Price is greater than or equal to 100.00" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "r = sp500[(sp500.Sector == 'Health Care') &\n", "          (sp500.Price >= 100.00)][['Price', 'Sector']]\n", "r" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "using the .isin method" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "s_tmp = sp500.Sector.isin(['Information Technology', 'Financials'])\n", "s_tmp" ] }, { "cell_type": "code", "execution_count": null, 
"metadata": { "hidden": true }, "outputs": [], "source": [ "sp500[s_tmp].head()" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "using the .query method: the same selection written first with a boolean mask and then with .query" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "r = sp500[(sp500.Sector == 'Health Care') &\n", "          (sp500.Price >= 100.00)][['Price', 'Sector']]\n", "r" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "q = sp500.query(\"Sector=='Health Care' & Price >= 100\")[['Price', 'Sector']]\n", "q" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### Adding" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### the [ ] operator" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "make a copy so that the original data stays unchanged" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_copy = sp500.copy()" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "add a column" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_copy['RoundedPrice'] = sp500_copy.Price.round()\n", "sp500_copy.head(3)" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### the .insert() method" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "make a copy so that the original data stays unchanged" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_copy = sp500.copy()" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "insert the RoundedPrice column at position 1, making it the second column of the DataFrame" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_copy.insert(1, 'RoundedPrice', sp500_copy.Price.round())\n", "sp500_copy.head(3)" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### the .assign() method" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "make a copy so that the original data stays unchanged" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_copy = sp500.copy()" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "adding two columns at once; the second is computed from the first, and .assign returns a new DataFrame:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_copy.assign(Rounded_Price=sp500_copy.Price.round(),\n", "                  R_BookValue_Price=lambda x: (x['Book Value'] / x['Rounded_Price']))" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### Data alignment" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### Series" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "first example Series" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "s_1 = pd.Series(data=[77, 33, 11], index=['a', 'b', 'f'])\n", "s_1" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "second example Series" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "s_2 = pd.Series(data=[11, 5, 6], index=['c', 'b', 'a'])\n", "s_2" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "labels that appear in only one of the two indexes produce NaN in the result" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "s_1 + s_2" 
] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "labels do not have to be unique" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "s_1 = pd.Series(data=[77, 33, 15, 3], index=['a', 'a', 'a', 'd'])\n", "s_1" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "s_2 = pd.Series(data=[11, 5, 6], index=['c', 'a', 'a'])\n", "s_2" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "3 labels 'a' in s_1 and 2 labels 'a' in s_2 give 3 x 2 = 6 labels 'a' in the result" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "s_2 + s_1" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### DataFrame" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_part_1 = sp500.iloc[0:5, 0:2].copy()\n", "sp500_part_2 = sp500.iloc[2:7, 1:3].copy()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_part_1" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_part_2" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500_part_1 + sp500_part_2" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "alignment also happens when a DataFrame is created from Series" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "series_1 = pd.Series([70, 90])\n", "series_2 = pd.Series([71, 91])\n", "series_3 = pd.Series([85, 87], index=[1, 2])\n", "df = pd.DataFrame({'col_1': series_1,\n", "                   'col_2': series_2,\n", "                   'col_3': series_3})\n", "df" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, 
"source": [ "### Sorting" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### by index" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.sort_index().head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.sort_index(axis=1).head()" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### by value" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.sort_values(by='Price').head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.sort_values(by='Price', ascending=False).head()" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "#### smallest / largest values" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.nsmallest(5, 'Price')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "sp500.nlargest(5, 'Price')" ] } ], "metadata": { "hide_input": false, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, 
"toc_position": { "height": "829px", "left": "1473.61px", "top": "110px", "width": "340.788px" }, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }