Titanic Disaster Survival Prediction System

About The Disaster -

The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew.


Getting Started on the Challenge -

The objective of this competition is to learn from the given training data set and predict who would survive or perish in the Titanic disaster. In essence, we will create our own Machine Learning (ML) model that, trained on the provided data, automatically predicts passenger survival.


Submission -

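Datasets : Three data sets are provided for this competition: the training dataset (train.csv), the test dataset (test.csv) and an example submission file (gender_submission.csv). The training set includes the “Survived” label and is used to train our Machine Learning model. The test set has no label; the model's predictions on it are what gets submitted. gender_submission.csv shows the expected submission format and predicts that all female passengers survive and all male passengers do not.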

Dataset preview with head() :

train_data.head()
test_data.head()
gender.head()
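The loading and preview step can be sketched with pandas, assuming the three Kaggle files sit together in one directory. The helper name `load_titanic_data` and its `data_dir` parameter are hypothetical; the original notebook's loading code is not shown.

```python
import os

import pandas as pd


def load_titanic_data(data_dir="."):
    """Hypothetical helper: read the three competition CSVs from data_dir."""
    train_data = pd.read_csv(os.path.join(data_dir, "train.csv"))
    test_data = pd.read_csv(os.path.join(data_dir, "test.csv"))
    gender = pd.read_csv(os.path.join(data_dir, "gender_submission.csv"))
    return train_data, test_data, gender
```

Usage: `train_data, test_data, gender = load_titanic_data("input")`, then `train_data.head()` prints the first five rows of the training set.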

Survivals : To estimate what percentage of men and women survived, we can use the “Survived” attribute and calculate the survival percentage for each group.
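A minimal pandas sketch of that calculation follows, assuming `train_data` was loaded from train.csv. Because “Survived” is 0 or 1, the mean within each Sex group is the fraction of that group that survived. The function name is hypothetical.

```python
import pandas as pd


def survival_percentage_by_sex(train_data: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: percentage of each Sex group with Survived == 1."""
    # Survived is 0/1, so the group mean equals the survival fraction.
    return train_data.groupby("Sex")["Survived"].mean() * 100
```

Usage: `rates = survival_percentage_by_sex(train_data)`, then `rates["female"]` and `rates["male"]` hold the two percentages.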

Initial Score : After an initial submission made by following the Kaggle Titanic tutorial, the score was 77.5%.


Contribution - Random Forest Classifier

The objective was to improve on the score obtained by following the Kaggle Titanic tutorial.

First, identify the unique values in each column of the training dataset.
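One way to do this in pandas is sketched below. It assumes the data is already in a DataFrame, and the helper name and its two output columns are illustrative choices, not the original notebook's.

```python
import pandas as pd


def unique_value_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: distinct-value count plus up to five examples per column."""
    examples = pd.Series({
        # drop_duplicates keeps first-seen order; NaN is excluded first
        col: ", ".join(str(v) for v in df[col].dropna().drop_duplicates().head(5).tolist())
        for col in df.columns
    })
    # nunique() ignores NaN by default
    return pd.DataFrame({"n_unique": df.nunique(), "examples": examples})
```

Columns with very many distinct values (such as Name or Ticket) are usually poor direct inputs to a model, while low-cardinality ones (Sex, Pclass, Embarked) are natural candidates.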

Next, visualize the dataset with a bar graph.

Here, we can see that women survived at a higher rate than men. Gender alone is not enough, however; many other features need to be considered to reach a higher accuracy.
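A sketch of such a bar graph, splitting survival rate by both Pclass and Sex, is below. Using Pclass as the second feature is an illustrative assumption, as is plotting through pandas with matplotlib; matplotlib is imported only inside the plotting function, so the table helper works without it.

```python
import pandas as pd


def survival_rate_table(train_data, index="Pclass", columns="Sex"):
    """Hypothetical helper: mean of Survived for every (index, columns) pair."""
    return train_data.groupby([index, columns])["Survived"].mean().unstack(columns)


def plot_survival_bars(train_data, index="Pclass", columns="Sex"):
    """Draw a grouped bar chart of the table above (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only plotting needs it

    ax = survival_rate_table(train_data, index, columns).plot(kind="bar")
    ax.set_ylabel("Survival rate")
    plt.tight_layout()
    plt.show()
```

Each group of bars is one passenger class, with one bar per sex, so the gap between women and men can be compared across classes.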


Next, check the data for null values that need to be cleaned.
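Counting missing values per column is a one-liner in pandas; the sketch below (hypothetical helper name) keeps only the columns that actually have gaps. In train.csv these are Age, Cabin and Embarked.

```python
import pandas as pd


def null_counts(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: missing-value count for each column that has any."""
    counts = df.isnull().sum()
    return counts[counts > 0].sort_values(ascending=False)
```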

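Then, replace the null values in a few columns, such as Age and Embarked. Age is numeric, so its mean value can be used; Embarked holds text port codes and has no mean, so its most frequent value is the usual substitute.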
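A minimal sketch of this imputation is below, assuming the mean for Age and the mode for Embarked. The helper and its parameters are hypothetical; the original notebook's exact fill values are not shown.

```python
import pandas as pd


def fill_missing(df, age_fill=None, embarked_fill=None):
    """Hypothetical helper: return a copy with Age and Embarked gaps filled."""
    df = df.copy()  # leave the caller's DataFrame untouched
    if age_fill is None:
        age_fill = df["Age"].mean()  # mean skips NaN by default
    if embarked_fill is None:
        embarked_fill = df["Embarked"].mode().iloc[0]  # most frequent port code
    df["Age"] = df["Age"].fillna(age_fill)
    df["Embarked"] = df["Embarked"].fillna(embarked_fill)
    return df
```

The fill values are exposed as parameters so that the statistics computed on the training set can be reused for the test set, for example `fill_missing(test_data, age_fill=train_data["Age"].mean(), embarked_fill="S")`. That keeps the two sets consistent.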

Now, use a 'lambda' function to convert categorical text values, such as the port codes in Embarked, into numbers by looking each value up in a dictionary.
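The dictionary-plus-lambda conversion can be sketched as below. The specific number assigned to each label is an arbitrary assumption; any consistent mapping works. Encoding Sex in the same way is also assumed, since the model needs it as a number.

```python
import pandas as pd

# Hypothetical code assignments; any consistent label-to-number mapping works.
SEX_CODES = {"male": 0, "female": 1}
EMBARKED_CODES = {"S": 0, "C": 1, "Q": 2}


def encode_categoricals(df):
    """Hypothetical helper: map text labels to integers via dictionary lookups."""
    df = df.copy()
    # A plain dict lookup raises KeyError on NaN or unseen labels,
    # so missing values should be filled before this step.
    df["Sex"] = df["Sex"].apply(lambda value: SEX_CODES[value])
    df["Embarked"] = df["Embarked"].apply(lambda value: EMBARKED_CODES[value])
    return df
```

`Series.map(EMBARKED_CODES)` would do the same without a lambda, but it silently turns unknown labels into NaN, whereas the lambda version fails loudly.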

Predictions were made with a Random Forest Classifier. This method trains a number of binary decision trees, each splitting the data on conditions over the features, and combines the trees' predictions into one answer. For example, one tree might follow the path Sex = female -> Pclass = 1st -> Survived.
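A sketch of training and predicting with scikit-learn's `RandomForestClassifier` is below. The feature list and the hyperparameters (`n_estimators=100`, `max_depth=5`) are illustrative assumptions, not the settings that produced the reported score. All feature columns must already be numeric and free of nulls, which is what the previous cleaning and encoding steps are for.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Assumed feature set; all columns must be numeric with no missing values.
FEATURES = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Embarked"]


def train_and_predict(train_df, test_df, features=FEATURES, random_state=1):
    """Hypothetical helper: fit a random forest on train_df, predict test_df."""
    model = RandomForestClassifier(
        n_estimators=100,  # number of trees in the forest
        max_depth=5,       # caps each tree's depth to limit overfitting
        random_state=random_state,  # makes bootstrap sampling reproducible
    )
    model.fit(train_df[features], train_df["Survived"])
    return model, model.predict(test_df[features])
```

Note that scikit-learn averages the trees' predicted class probabilities rather than counting hard votes, but the idea of many trees jointly deciding is the same.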

Using the Random Forest Classifier, the output was generated and the score increased to 78.4%.
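Generating the output file can be sketched as follows. It assumes the Kaggle format shown in gender_submission.csv: one row per test passenger with the columns PassengerId and Survived. The helper name and default path are hypothetical.

```python
import pandas as pd


def make_submission(test_df, predictions, path="submission.csv"):
    """Hypothetical helper: write predictions in the PassengerId,Survived format."""
    output = pd.DataFrame({
        "PassengerId": test_df["PassengerId"],
        "Survived": predictions,  # one 0/1 prediction per test row, in order
    })
    output.to_csv(path, index=False)  # no extra index column in the file
    return output
```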
