A list of awesome projects I have completed on my journey exploring Data Science, AI and ML.
I am in the process of showcasing / open-sourcing more of my projects here very soon. Stay tuned!
The main task was to identify duplicate questions asked on Quora. I focused on finding the number of unique questions and the occurrences of each question, along with Feature Extraction, EDA and Text Preprocessing. I also explored Advanced Feature Extraction (NLP and Fuzzy Features), and Logistic Regression & Linear SVM with hyperparameter tuning. I also generated a WordCloud.
Skills Used - Nltk, distance, BeautifulSoup, fuzzywuzzy, Numpy, Pandas, Seaborn, Matplotlib, Plotly, re, Python
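A minimal sketch of the fuzzy-feature extraction described above, assuming fuzzywuzzy and two hypothetical question columns named "question1" and "question2":

```python
import pandas as pd
from fuzzywuzzy import fuzz

# Toy question pairs; the real project works on the full Quora dataset
df = pd.DataFrame({
    "question1": ["How do I learn Python?", "What is machine learning?"],
    "question2": ["What is the best way to learn Python?", "What is deep learning?"],
})

# Fuzzy similarity features used alongside the basic NLP features
df["fuzz_ratio"] = df.apply(lambda r: fuzz.QRatio(r.question1, r.question2), axis=1)
df["token_sort_ratio"] = df.apply(lambda r: fuzz.token_sort_ratio(r.question1, r.question2), axis=1)
df["token_set_ratio"] = df.apply(lambda r: fuzz.token_set_ratio(r.question1, r.question2), axis=1)

print(df[["fuzz_ratio", "token_sort_ratio", "token_set_ratio"]])
```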
Payon is a banking web-app to assist people with seamless digital transactions, where clients can create a new bank account and get a unique account number on sign-in. They can store and edit their account details, and can also transfer (fictitious) money from one bank account to another. It was developed as a curriculum project for Database Management Systems.
Skills Used - Python, Django, Flask, SQL, HTML, CSS
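A rough sketch of how an account with a unique account number could be modelled in Django; the Account model and its field names are assumptions for illustration, not the project's actual schema.

```python
# models.py -- hypothetical Account model; field names are illustrative only
import uuid
from django.db import models
from django.contrib.auth.models import User


class Account(models.Model):
    owner = models.OneToOneField(User, on_delete=models.CASCADE)
    # Unique account number generated the first time the account is saved
    account_number = models.CharField(max_length=12, unique=True, editable=False)
    balance = models.DecimalField(max_digits=12, decimal_places=2, default=0)

    def save(self, *args, **kwargs):
        if not self.account_number:
            self.account_number = uuid.uuid4().hex[:12].upper()
        super().save(*args, **kwargs)
```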
The main task was to extract the unique users for each month and calculate, for each user per month, the total number of bookings made, the total amount spent and the total room nights stayed, and then merge these summarized datasets for a collective Data Exploration.
Skills Used - Numpy, Pandas, Python
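A short sketch of the per-user, per-month aggregation described above; the column names here are assumptions, not the dataset's real schema.

```python
import pandas as pd

# Toy bookings table; the real data has many more users and columns
bookings = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "booking_date": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-01-10", "2021-02-03"]),
    "amount": [120.0, 80.0, 200.0, 150.0],
    "room_nights": [2, 1, 3, 2],
})

bookings["month"] = bookings["booking_date"].dt.to_period("M")

# Total bookings, amount spent and room nights per user per month
summary = (
    bookings.groupby(["user_id", "month"])
    .agg(total_bookings=("booking_date", "count"),
         total_amount=("amount", "sum"),
         total_room_nights=("room_nights", "sum"))
    .reset_index()
)
print(summary)
```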
The main task was to devise the best algorithm to predict user ratings for films. I focused on minimizing RMSE and providing data interpretability. I created a sparse matrix from the data frame and computed the global average of all movie ratings. Then I calculated a User Similarity Matrix with dimensionality reduction. The most similar movies were found using the similarity matrix, and Matrix Factorization techniques were also used.
Skills Used - Pandas, Matplotlib, Pyplot, Sklearn, Datetime, xgboost, Seaborn, Os, Scipy, Random, Python
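A minimal sketch of the sparse ratings matrix, global average and user-similarity steps described above; the column names and the SVD dimensionality are assumptions.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Toy ratings data frame (user, movie, rating)
ratings = pd.DataFrame({
    "user": [0, 0, 1, 2, 2],
    "movie": [0, 1, 1, 0, 2],
    "rating": [4.0, 3.0, 5.0, 2.0, 4.0],
})

# Sparse user x movie matrix and the global average of all ratings
sparse_matrix = csr_matrix((ratings.rating, (ratings.user, ratings.movie)))
global_average = sparse_matrix.sum() / sparse_matrix.count_nonzero()

# Reduce dimensionality before computing user-user similarity
user_factors = TruncatedSVD(n_components=2, random_state=42).fit_transform(sparse_matrix)
user_similarity = cosine_similarity(user_factors)
print(global_average, user_similarity.shape)
```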
The main task was to solve a mathematical equation from an image. I started with a manual collection of the mathematical operators and stored them along with the MNIST dataset in HDF5 format. OpenCV techniques were used to extract the equations, digits and operators. Then Caffe and TensorFlow were used to train, validate and recognise the digits and operators. Finally, an Abstract Syntax Tree model was formed to generate the result of the equation.
Skills Used - TensorFlow, Caffe, HDF5, OpenCV, Numpy, Pandas, Pillow, Sklearn, Python
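A small sketch of the OpenCV step that isolates digits and operators from an equation image; the file name "equation.png" and the threshold values are placeholders, not the project's actual parameters.

```python
import cv2

# Placeholder path to an equation image
image = cv2.imread("equation.png", cv2.IMREAD_GRAYSCALE)

# Binarise and find the external contour of each symbol
_, binary = cv2.threshold(image, 127, 255, cv2.THRESH_BINARY_INV)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Crop each symbol left-to-right so it can be fed to the digit/operator recogniser
symbols = []
for contour in sorted(contours, key=lambda c: cv2.boundingRect(c)[0]):
    x, y, w, h = cv2.boundingRect(contour)
    symbols.append(binary[y:y + h, x:x + w])
```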
The main task was to predict the probability of each data point belonging to each of the 9 malware classes given in the dataset. I used the following metrics - multi-class log-loss and the confusion matrix - and performed EDA. I tried Feature Extraction and performed Multivariate and Univariate Analysis. I also tried K Nearest Neighbour classification, Logistic Regression, Random Forest and XGBoost classifiers with the best hyperparameters found using RandomSearch.
Skills Used - Tqdm, Warnings, shutil, os, Pandas, Matplotlib, Seaborn, Numpy, pickle, sklearn, Random, xgboost
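A sketch of the random hyperparameter search with XGBoost and the multi-class log-loss / confusion-matrix evaluation described above, using synthetic 9-class data and an assumed parameter grid rather than the real malware features.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.metrics import log_loss, confusion_matrix
from xgboost import XGBClassifier

# Synthetic stand-in for the 9-class malware dataset
X, y = make_classification(n_samples=500, n_classes=9, n_informative=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Random search over an illustrative parameter grid, scored by multi-class log-loss
search = RandomizedSearchCV(
    XGBClassifier(objective="multi:softprob"),
    param_distributions={"max_depth": [3, 5, 7], "n_estimators": [50, 100, 200]},
    n_iter=5, scoring="neg_log_loss", cv=3, random_state=42,
)
search.fit(X_train, y_train)

proba = search.predict_proba(X_test)
print(log_loss(y_test, proba, labels=list(range(9))))
print(confusion_matrix(y_test, search.predict(X_test)))
```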
The main task was to implement a neural network for semantic segmentation. I started with dataset processing and model definition, and then moved on to model training. The model followed a convolutional Encoder-Decoder architecture: the encoder in my network was similar to VGG-16, and the decoder layers were the inverse of the layers used in the encoder. I also tried a bit of Data Augmentation.
Skills Used - Python, Tensorflow, OpenCV
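A minimal sketch of a VGG-style encoder-decoder for segmentation in tf.keras; the input shape, depth and number of classes are assumptions, not the project's exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

num_classes = 2
inputs = tf.keras.Input(shape=(128, 128, 3))

# Encoder: VGG-like convolution blocks with max pooling
x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D()(x)

# Decoder: mirror of the encoder using transposed convolutions
x = layers.Conv2DTranspose(128, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```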
The main task was to build a model that predicts human activities such as Walking, Sitting, Standing or Laying on the basis of data collected from smartphone sensors (accelerometer and gyroscope). '3-axial linear acceleration' from the accelerometer and '3-axial angular velocity' from the gyroscope were used to capture the sequences. Logistic Regression, Decision Tree, Random Forest and other classifiers were compared on accuracy.
Skills Used - Numpy, Pandas, Datetime, Seaborn, Sklearn, Matplotlib, Python
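A sketch of comparing the baseline classifiers on the engineered sensor features; the data here is synthetic (random numbers sized like the HAR feature matrix) and the label count is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 561))   # synthetic stand-in for the 561 accelerometer/gyroscope features
y = rng.integers(0, 6, size=300)  # six activities: walking, sitting, standing, ...

# Cross-validated accuracy for each baseline model
for model in (LogisticRegression(max_iter=1000),
              DecisionTreeClassifier(),
              RandomForestClassifier(n_estimators=100)):
    scores = cross_val_score(model, X, y, cv=3)
    print(type(model).__name__, scores.mean())
```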
The main task was to predict heart disease using the 14-attribute Cleveland database. I undertook EDA, Data Visualization and a disease-vs-age frequency correlation. I also generated a Decision Tree and a learning curve for the training and cross-validation scores, along with the Confusion Matrix, Precision, Recall, F Score and false negative score. I compared the performance of Random Forest, Naive Bayes and KNN.
Skills Used - Numpy, Pandas, Matplotlib, Seaborn, sklearn, Python
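A sketch of the model comparison and evaluation described above; the CSV path "heart.csv" and the "target" column name are assumptions about how the Cleveland data is stored.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

df = pd.read_csv("heart.csv")                   # hypothetical Cleveland dataset file
X, y = df.drop(columns="target"), df["target"]  # binary "target" label assumed
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Cross-validated comparison of Random Forest, Naive Bayes and KNN
for model in (RandomForestClassifier(), GaussianNB(), KNeighborsClassifier()):
    print(type(model).__name__, cross_val_score(model, X_train, y_train, cv=5).mean())

# Confusion matrix and precision / recall / F score for one fitted model
best = RandomForestClassifier().fit(X_train, y_train)
pred = best.predict(X_test)
print(confusion_matrix(y_test, pred))
print(precision_score(y_test, pred), recall_score(y_test, pred), f1_score(y_test, pred))
```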
The main task was to identify the factors that lead to employee attrition and explore how factors like 'Distance from home' or 'Average monthly income' affect attrition. I focused on EDA, Feature Selection and SMOTE, and evaluated performance using Precision and related metrics. Predictions were made using ANNs, and One Hot Encoding was also performed.
Skills Used - Keras, Numpy, Pandas, Matplotlib, Seaborn, Sklearn, Python
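A sketch of the SMOTE resampling followed by a small ANN, as described above; the feature matrix is synthetic and the network size is an assumption, not the project's exact architecture.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score
from tensorflow import keras

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 30))            # stand-in for one-hot encoded HR features
y = (rng.random(1000) < 0.16).astype(int)  # imbalanced attrition label

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Oversample the minority (attrition) class on the training split only
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)

# Small fully connected ANN for the binary attrition prediction
model = keras.Sequential([
    keras.Input(shape=(X.shape[1],)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X_res, y_res, epochs=10, batch_size=32, verbose=0)

pred = (model.predict(X_test) > 0.5).astype(int).ravel()
print("precision:", precision_score(y_test, pred))
```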