{ "cells": [ { "cell_type": "markdown", "id": "76a37951", "metadata": {}, "source": [ "# Introduction\n", "This notebook is a writeup of my entry to the Titanic challenge on Kaggle, a competition platform for machine learning and data analytics. In this challenge we build a probabilistic model that predicts whether a passenger survived, based on gender, age, and socio-economic status.\n", "\n", "# Coding\n", "Kaggle provides the file `train.csv` for training our model. A second file, `test.csv`, lists the passengers whose survival we have to predict; the predictions are then uploaded to the platform.\n", "\n", "## Loading libraries\n", "First, we load all the necessary libraries." ] }, { "cell_type": "code", "execution_count": 1, "id": "f8cf915a", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from pandas_profiling import ProfileReport\n", "from sklearn.naive_bayes import GaussianNB\n", "from sklearn.model_selection import train_test_split" ] }, { "cell_type": "markdown", "id": "a8fc8a86", "metadata": {}, "source": [ "## Loading data\n", "Next, we load the data provided by Kaggle and explore it." ] }, { "cell_type": "code", "execution_count": 2, "id": "d1238cb6", "metadata": {}, "outputs": [], "source": [ "train = pd.read_csv('kaggle/train.csv')\n", "test = pd.read_csv('kaggle/test.csv')" ] }, { "cell_type": "markdown", "id": "b315af35", "metadata": {}, "source": [ "Besides `PassengerId` and `Name`, the columns of the data are as follows:\n", "- Survived: Survival (0 = No, 1 = Yes)\n", "- Pclass: Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)\n", "- Sex: Sex\n", "- Age: Age in years\n", "- SibSp: Number of siblings / spouses aboard the Titanic\n", "- Parch: Number of parents / children aboard the Titanic\n", "- Ticket: Ticket number\n", "- Fare: Passenger fare\n", "- Cabin: Cabin number\n", "- Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)" ] }, { "cell_type": "code", "execution_count": 3, "id": "5bf12433", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
| 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
| 888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen \"Carrie\" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
| 889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
| 890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |

891 rows × 12 columns

|  | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 891.000000 | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
| mean | 446.000000 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min | 1.000000 | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 223.500000 | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
| 50% | 446.000000 | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
| 75% | 668.500000 | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
| max | 891.000000 | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |