{ "cells": [ { "cell_type": "code", "execution_count": 128, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Read the dataset\n", "To read the dataset we use the `read_csv` function from the [pandas library](https://pandas.pydata.org/). In the following cell the dataset is loaded as a \"dataframe\" (similar to those in R): each column corresponds to a variable (dimension) and each row to a data point.\n", "\n", "This dataset consists of $n=9$ __physiological and medical variables (columns)__ measured for $m=768$ __patients (rows)__.\n", "\n", "The columns represent the following variables:\n", "\n", "+ column 0: *Pregnancies*: Number of times pregnant\n", "+ column 1: *Glucose*: Plasma glucose concentration at 2 hours in an oral glucose tolerance test\n", "+ column 2: *BloodPressure*: Diastolic blood pressure (mm Hg)\n", "+ column 3: *SkinThickness*: Triceps skin fold thickness (mm)\n", "+ column 4: *Insulin*: 2-Hour serum insulin (mu U/ml)\n", "+ column 5: *BMI*: Body mass index (weight in kg/(height in m)^2)\n", "+ column 6: *DiabetesPedigreeFunction*: Diabetes pedigree function\n", "+ column 7: *Age*: Age (years)\n", "+ column 8: *Outcome*: Whether the person is diabetic (1) or not (0)\n" ] }, { "cell_type": "code", "execution_count": 129, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "dataset = pd.read_csv('diabetes.csv', header=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see the first 3 rows of the dataset we call the `head` method with the argument `3`." ] }, { "cell_type": "code", "execution_count": 130, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |\n",
"|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |\n",
"| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |\n",
"| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |\n", "