A list of awesome projects I have completed on my journey exploring Data Science, AI and ML.
I am in the process of showcasing / open-sourcing more of my projects here very soon. Stay tuned!
The main task was to identify duplicate questions asked on Quora. I focused on finding the number of unique questions and the occurrences of each question, along with Feature Extraction, EDA and Text Preprocessing. I also explored Advanced Feature Extraction (NLP and Fuzzy Features), and Logistic Regression & Linear SVM with hyperparameter tuning. I also generated a WordCloud.
Skills Used - Nltk, distance, BeautifulSoup, fuzzywuzzy, Numpy, Pandas, Seaborn, Matplotlib, Plotly, re, Python
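A minimal sketch of the fuzzy-feature extraction described above, assuming fuzzywuzzy and two hypothetical question columns named "question1" and "question2":

```python
import pandas as pd
from fuzzywuzzy import fuzz

# Toy question pairs; the real project works on the full Quora dataset
df = pd.DataFrame({
    "question1": ["How do I learn Python?", "What is machine learning?"],
    "question2": ["What is the best way to learn Python?", "What is deep learning?"],
})

# Fuzzy similarity features used alongside the basic NLP features
df["fuzz_ratio"] = df.apply(lambda r: fuzz.QRatio(r.question1, r.question2), axis=1)
df["token_sort_ratio"] = df.apply(lambda r: fuzz.token_sort_ratio(r.question1, r.question2), axis=1)
df["token_set_ratio"] = df.apply(lambda r: fuzz.token_set_ratio(r.question1, r.question2), axis=1)

print(df[["fuzz_ratio", "token_sort_ratio", "token_set_ratio"]])
```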
Payon is a banking web-app to assist people with seamless digital transactions, where clients can create a new bank account and get a unique account number on sign-in. They can store and edit their account details, and can also transfer (fictitious) money from one bank account to another. It was developed as a curriculum project for Database Management Systems.
Skills Used - Python, Django, Flask, SQL, HTML, CSS
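A rough sketch of how an account with a unique account number could be modelled in Django; the Account model and its field names are assumptions for illustration, not the project's actual schema.

```python
# models.py -- hypothetical Account model; field names are illustrative only
import uuid
from django.db import models
from django.contrib.auth.models import User


class Account(models.Model):
    owner = models.OneToOneField(User, on_delete=models.CASCADE)
    # Unique account number generated the first time the account is saved
    account_number = models.CharField(max_length=12, unique=True, editable=False)
    balance = models.DecimalField(max_digits=12, decimal_places=2, default=0)

    def save(self, *args, **kwargs):
        if not self.account_number:
            self.account_number = uuid.uuid4().hex[:12].upper()
        super().save(*args, **kwargs)
```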
The main task was to extract the unique users for each month and calculate, for each user per month, the total number of bookings made, the total amount spent and the total room nights stayed, and then merge these summarized datasets for a collective Data Exploration.
Skills Used - Numpy, Pandas, Python
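A short sketch of the per-user, per-month aggregation described above; the column names here are assumptions, not the dataset's real schema.

```python
import pandas as pd

# Toy bookings table; the real data has many more users and columns
bookings = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "booking_date": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-01-10", "2021-02-03"]),
    "amount": [120.0, 80.0, 200.0, 150.0],
    "room_nights": [2, 1, 3, 2],
})

bookings["month"] = bookings["booking_date"].dt.to_period("M")

# Total bookings, amount spent and room nights per user per month
summary = (
    bookings.groupby(["user_id", "month"])
    .agg(total_bookings=("booking_date", "count"),
         total_amount=("amount", "sum"),
         total_room_nights=("room_nights", "sum"))
    .reset_index()
)
print(summary)
```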
The main task was to devise the best algorithm to predict user ratings for films. I focused on minimizing RMSE and providing data interpretability. I created a sparse matrix from the data frame and computed the global average of all movie ratings. Then I calculated a User Similarity Matrix with dimensionality reduction. The most similar movies were found using the similarity matrix, and Matrix Factorization techniques were also used.
Skills Used - Pandas, Matplotlib, Pyplot, Sklearn, Datetime, xgboost, Seaborn, Os, Scipy, Random, Python
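A minimal sketch of the sparse ratings matrix, global average and user-similarity steps described above; the column names and the SVD dimensionality are assumptions.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Toy ratings data frame (user, movie, rating)
ratings = pd.DataFrame({
    "user": [0, 0, 1, 2, 2],
    "movie": [0, 1, 1, 0, 2],
    "rating": [4.0, 3.0, 5.0, 2.0, 4.0],
})

# Sparse user x movie matrix and the global average of all ratings
sparse_matrix = csr_matrix((ratings.rating, (ratings.user, ratings.movie)))
global_average = sparse_matrix.sum() / sparse_matrix.count_nonzero()

# Reduce dimensionality before computing user-user similarity
user_factors = TruncatedSVD(n_components=2, random_state=42).fit_transform(sparse_matrix)
user_similarity = cosine_similarity(user_factors)
print(global_average, user_similarity.shape)
```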
The main task was to solve a mathematical equation from an image. I started with a manual collection of the mathematical operators and stored them along with the MNIST dataset in HDF5 format. OpenCV techniques were used to extract the equations, digits and operators. Then Caffe and TensorFlow were used to train, validate and recognise the digits and operators. Finally, an Abstract Syntax Tree model was formed to generate the result of the equation.
Skills Used - TensorFlow, Caffe, HDF5, OpenCV, Numpy, Pandas, Pillow, Sklearn, Python
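A small sketch of the OpenCV step that isolates digits and operators from an equation image; the file name "equation.png" and the threshold values are placeholders, not the project's actual parameters.

```python
import cv2

# Placeholder path to an equation image
image = cv2.imread("equation.png", cv2.IMREAD_GRAYSCALE)

# Binarise and find the external contour of each symbol
_, binary = cv2.threshold(image, 127, 255, cv2.THRESH_BINARY_INV)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Crop each symbol left-to-right so it can be fed to the digit/operator recogniser
symbols = []
for contour in sorted(contours, key=lambda c: cv2.boundingRect(c)[0]):
    x, y, w, h = cv2.boundingRect(contour)
    symbols.append(binary[y:y + h, x:x + w])
```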
The main task was to predict the probability of each data point belonging to each of the 9 malware classes given in the dataset. I used the following metrics - multi-class log-loss and the confusion matrix - and performed EDA. I tried Feature Extraction and performed Multivariate and Univariate Analysis. I also tried K Nearest Neighbour classification, Logistic Regression, Random Forest and XGBoost classifiers with the best hyperparameters found using RandomSearch.
Skills Used - Tqdm, Warnings, shutil, os, Pandas, Matplotlib, Seaborn, Numpy, pickle, sklearn, Random, xgboost
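A sketch of the random hyperparameter search with XGBoost and the multi-class log-loss / confusion-matrix evaluation described above, using synthetic 9-class data and an assumed parameter grid rather than the real malware features.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.metrics import log_loss, confusion_matrix
from xgboost import XGBClassifier

# Synthetic stand-in for the 9-class malware dataset
X, y = make_classification(n_samples=500, n_classes=9, n_informative=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Random search over an illustrative parameter grid, scored by multi-class log-loss
search = RandomizedSearchCV(
    XGBClassifier(objective="multi:softprob"),
    param_distributions={"max_depth": [3, 5, 7], "n_estimators": [50, 100, 200]},
    n_iter=5, scoring="neg_log_loss", cv=3, random_state=42,
)
search.fit(X_train, y_train)

proba = search.predict_proba(X_test)
print(log_loss(y_test, proba, labels=list(range(9))))
print(confusion_matrix(y_test, search.predict(X_test)))
```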
The main task was to implement a neural network for semantic segmentation. I started with dataset processing and model definition, and then moved on to model training. The model followed a convolutional Encoder-Decoder architecture: the encoder in my network was similar to VGG-16, and the decoder layers were the inverse of the layers used in the encoder. I also tried a bit of Data Augmentation.
Skills Used - Python, Tensorflow, OpenCV
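A minimal sketch of a VGG-style encoder-decoder for segmentation in tf.keras; the input shape, depth and number of classes are assumptions, not the project's exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

num_classes = 2
inputs = tf.keras.Input(shape=(128, 128, 3))

# Encoder: VGG-like convolution blocks with max pooling
x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D()(x)

# Decoder: mirror of the encoder using transposed convolutions
x = layers.Conv2DTranspose(128, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```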
The main task was to build a model that predicts human activities such as Walking, Sitting, Standing or Laying on the basis of data collected from smartphone sensors (accelerometer and gyroscope). '3-axial linear acceleration' from the accelerometer and '3-axial angular velocity' from the gyroscope were used to capture the sequences. Logistic Regression, Decision Tree, Random Forest and other classifiers were compared on accuracy.
Skills Used - Numpy, Pandas, Datetime, Seaborn, Sklearn, Matplotlib, Python
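A sketch of comparing the baseline classifiers on the engineered sensor features; the data here is synthetic (random numbers sized like the HAR feature matrix) and the label count is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 561))   # synthetic stand-in for the 561 accelerometer/gyroscope features
y = rng.integers(0, 6, size=300)  # six activities: walking, sitting, standing, ...

# Cross-validated accuracy for each baseline model
for model in (LogisticRegression(max_iter=1000),
              DecisionTreeClassifier(),
              RandomForestClassifier(n_estimators=100)):
    scores = cross_val_score(model, X, y, cv=3)
    print(type(model).__name__, scores.mean())
```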
The main task was to predict heart disease using the 14-attribute Cleveland database. I undertook EDA, Data Visualization and a disease-vs-age frequency correlation. I also generated a Decision Tree and a learning curve for the training and cross-validation scores, along with the Confusion Matrix, Precision, Recall, F Score and false negative score. I compared the performance of Random Forest, Naive Bayes and KNN.
Skills Used - Numpy, Pandas, Matplotlib, Seaborn, sklearn, Python
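A sketch of the model comparison and evaluation described above; the CSV path "heart.csv" and the "target" column name are assumptions about how the Cleveland data is stored.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

df = pd.read_csv("heart.csv")                   # hypothetical Cleveland dataset file
X, y = df.drop(columns="target"), df["target"]  # binary "target" label assumed
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Cross-validated comparison of Random Forest, Naive Bayes and KNN
for model in (RandomForestClassifier(), GaussianNB(), KNeighborsClassifier()):
    print(type(model).__name__, cross_val_score(model, X_train, y_train, cv=5).mean())

# Confusion matrix and precision / recall / F score for one fitted model
best = RandomForestClassifier().fit(X_train, y_train)
pred = best.predict(X_test)
print(confusion_matrix(y_test, pred))
print(precision_score(y_test, pred), recall_score(y_test, pred), f1_score(y_test, pred))
```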
The main task was to identify the factors that lead to employee attrition and explore how factors like 'Distance from home' or 'Average monthly income' affect attrition. I focused on EDA, Feature Selection and SMOTE, and evaluated performance using Precision and related metrics. Predictions were made using ANNs, and One Hot Encoding was also performed.
Skills Used - Keras, Numpy, Pandas, Matplotlib, Seaborn, Sklearn, Python
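A sketch of the SMOTE resampling followed by a small ANN, as described above; the feature matrix is synthetic and the network size is an assumption, not the project's exact architecture.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score
from tensorflow import keras

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 30))            # stand-in for one-hot encoded HR features
y = (rng.random(1000) < 0.16).astype(int)  # imbalanced attrition label

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Oversample the minority (attrition) class on the training split only
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)

# Small fully connected ANN for the binary attrition prediction
model = keras.Sequential([
    keras.Input(shape=(X.shape[1],)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X_res, y_res, epochs=10, batch_size=32, verbose=0)

pred = (model.predict(X_test) > 0.5).astype(int).ravel()
print("precision:", precision_score(y_test, pred))
```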