Titanic Disaster Survival Prediction System

About The Disaster -
The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew.
Getting Started on Challenge -
The objective of this competition is to learn from a given training data set to predict who would survive or perish in the Titanic disaster. In essence, we will create our own Machine Learning (ML) model that, using the provided training data, will automatically predict passenger survival.
Submission -
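Datasets : Three data sets are provided to us for this competition: the training dataset (train.csv), the test dataset (test.csv) and a sample submission (gender_submission.csv). The first is used to train our Machine Learning model, the second holds the passengers whose survival the model must predict, and the third shows the expected submission format, filled in under the assumption that only the female passengers survived.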

train_data.head()
test_data.head()
gender.head()
Survivals : To estimate what percentage of men and of women survived, we can use the “Survived” attribute, which is 1 for survivors and 0 otherwise, and calculate the survival percentage for each sex.
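The calculation can be sketched as follows. This is only an illustration: the small DataFrame is made-up data that borrows the "Sex" and "Survived" column names from train.csv, and the variable names are assumptions, not the original notebook code.

```python
import pandas as pd

# Hypothetical sample rows; the real values come from train.csv.
train_data = pd.DataFrame({
    "Sex": ["female", "female", "female", "male", "male", "male", "male", "female"],
    "Survived": [1, 1, 0, 0, 0, 1, 0, 1],
})

# Survived is 0/1, so its mean within each group is that group's survival rate.
survival_rate = train_data.groupby("Sex")["Survived"].mean() * 100

rate_women = survival_rate["female"]
rate_men = survival_rate["male"]
print(f"% of women who survived: {rate_women:.1f}")
print(f"% of men who survived: {rate_men:.1f}")
```

On the real data, the same two lines of grouping logic apply once train_data has been read from train.csv.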

Initial Score :
After the initial submission, which followed the Kaggle Titanic tutorial, the initial score was 77.5%.
Contribution - RandomForest Classifier
The objective was to improve on the original score obtained by following the Titanic tutorial on Kaggle.
Firstly, identify the unique values of the training dataset.
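One way to inspect the unique values, assuming the data sits in a pandas DataFrame, is shown below. The rows are invented for illustration; only the column names follow train.csv.

```python
import pandas as pd

# Hypothetical sample rows with train.csv-style column names.
train_data = pd.DataFrame({
    "Pclass": [1, 3, 3, 2],
    "Sex": ["female", "male", "male", "female"],
    "Embarked": ["S", "C", "S", None],
})

# Number of distinct non-null values in every column.
unique_counts = train_data.nunique()
print(unique_counts)

# The distinct values of a single column, with missing entries dropped first.
embarked_values = sorted(train_data["Embarked"].dropna().unique())
print(embarked_values)
```

Columns with only a handful of distinct values (such as Sex or Embarked) are usually the categorical ones that need encoding later.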

Now, visualize the dataset as a bar graph.

Here, we can see that the survival rate of women is higher than that of men. Many other features also need to be considered to reach a higher accuracy.
Now, we need to clean the data of all null values.
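Before cleaning, it helps to see where the nulls are. A minimal sketch, assuming a pandas DataFrame and using invented rows with train.csv-style column names:

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; Age and Embarked each contain one missing value.
train_data = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Embarked": ["S", "C", None, "S"],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

# isnull() marks every missing cell; summing per column counts them.
null_counts = train_data.isnull().sum()
print(null_counts)
```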

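Secondly, we need to replace the null values of a few columns such as Age and Embarked: Age with its mean value and, since Embarked is categorical and has no mean, Embarked with its most frequent value.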
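A sketch of this imputation step follows. The original code is not shown, so the choices here are assumptions: Age is filled with the column mean, and Embarked, which holds text codes and so has no mean, is filled with its most frequent value. The rows are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows with one missing Age and one missing Embarked.
train_data = pd.DataFrame({
    "Age": [20.0, np.nan, 30.0, 40.0],
    "Embarked": ["S", "C", None, "S"],
})

# Numeric column: replace missing values with the mean of the known values.
train_data["Age"] = train_data["Age"].fillna(train_data["Age"].mean())

# Categorical column: replace missing values with the most frequent value.
most_common_port = train_data["Embarked"].mode()[0]
train_data["Embarked"] = train_data["Embarked"].fillna(most_common_port)

print(train_data)
```

The same fill values computed on the training set would normally be reused for test.csv, so that both sets are treated consistently.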

Now, using a 'lambda' function, map the data in Age and Embarked to numbers through a lookup dictionary.
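The lambda-plus-dictionary conversion might look like the sketch below. Embarked is used as the example because it holds the text codes S, C and Q; the dictionary name and the numeric codes are hypothetical choices, not taken from the original notebook.

```python
import pandas as pd

# Hypothetical sample rows.
train_data = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"]})

# Hypothetical lookup table from port code to an integer label.
embarked_codes = {"S": 0, "C": 1, "Q": 2}

# apply() calls the lambda once per value and collects the results.
train_data["Embarked"] = train_data["Embarked"].apply(
    lambda port: embarked_codes[port]
)

print(train_data["Embarked"].tolist())
```

The nulls have to be filled first, because the dictionary lookup raises a KeyError for any value it does not contain. `Series.map(embarked_codes)` is an equivalent shortcut that yields NaN for unknown values instead.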

The model was built using the Random Forest Classifier. In this method, a number of decision trees each make a prediction based on conditions on the features, and the forest returns the majority vote of the trees. E.g. If Gender = female -> Pclass = 1st -> Survived
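A minimal sketch of training and predicting with scikit-learn's RandomForestClassifier is given below. The rows, the feature list and the hyperparameter values are illustrative assumptions, and Sex is taken to be already encoded as 0 (female) and 1 (male).

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical, already-encoded training rows (Sex: 0 = female, 1 = male).
train = pd.DataFrame({
    "Pclass":   [1, 1, 2, 3, 3, 1, 2, 3],
    "Sex":      [0, 0, 0, 0, 1, 1, 1, 1],
    "Survived": [1, 1, 1, 1, 0, 0, 0, 0],
})
features = ["Pclass", "Sex"]

# Each of the 100 trees is fitted on a bootstrap sample of the rows;
# random_state fixes the randomness so the run is repeatable.
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)
model.fit(train[features], train["Survived"])

# Hypothetical unseen passengers: a first-class woman and a third-class man.
test = pd.DataFrame({"Pclass": [1, 3], "Sex": [0, 1]})
predictions = model.predict(test[features])
print(predictions)
```

For a Kaggle submission, the predictions would be paired with the PassengerId column of test.csv and written to a CSV file.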

Using the Random Forest Classifier, the output was generated and the accuracy increased to 78.4%.
