A portfolio of machine learning projects that I have worked on for learning and academic purposes. The projects use datasets such as Amazon Fine Food Reviews, Personalised Cancer Diagnosis, and What's happening in LA (a real-time dataset).
- Machine Learning
- Personalised Cancer Diagnosis : Tested several supervised learning algorithms to build a model that accurately classifies a given genetic variation/mutation based on evidence from text-based clinical literature, using various statistical analysis tools.
- Quora Question Pair Similarity Problem : A dataset where the task is to predict whether two questions are similar or not. Explored new featurization methods for the similarity of two sentences, such as the fuzzywuzzy ratio. Tuned the hyperparameters of the GBDT (XGBoost) model using built-in XGBoost methods.
- Natural Language Processing
- KNN on Amazon Fine Food Reviews : Used the KNN algorithm to predict the polarity of the reviews. Used various techniques to vectorize the text and improve the accuracy.
- Logistic Regression on Amazon Fine Food Reviews : Built a model using logistic regression on Amazon Fine Food Reviews using various vectorizing techniques.
- SGD Implementation : Implemented stochastic gradient descent and gradient descent from scratch and compared them with the scikit-learn implementation on the Boston Housing Price Dataset.
- Decision Tree on Amazon Fine Food Reviews : Explored a decision tree model on Amazon Fine Food Reviews using various vectorizing techniques.
- Random Forest vs GBDT(XGBoost) vs GBDT(LightGBM) : A comparison between a random forest implemented using scikit-learn and gradient boosted decision trees implemented using XGBoost and LightGBM on the Amazon Fine Food Reviews Dataset.
- KMeans Clustering vs Agglomerative Clustering vs DBSCAN Clustering : A brief comparison between KMeans clustering, agglomerative clustering and DBSCAN clustering and how they work.
- Truncated SVD With KMeans Clustering : Created a co-occurrence matrix (not a covariance matrix), performed truncated singular value decomposition on it, and used KMeans clustering to form clusters of words.
- Linear SVM vs RBF SVM on Amazon Fine Food Reviews : A brief comparison between the linear kernel and the RBF kernel of SVM on Amazon Fine Food Reviews, measuring how well they perform using AUC score as the performance metric.
- Data Analysis And Visualization
- TSNE on Amazon Fine Food Reviews : Performed t-SNE on Amazon Fine Food Reviews to reduce the word vector dimensions and see whether positive and negative reviews can be separated.
- Dashboard Of NYPD Motor Vehicle Collisions : A dashboard of NYPD motor vehicle collisions. The dataset is updated almost daily, and the dashboard is configured to run daily at 0000 hrs UTC so that the visuals stay up to date.
- EDA on Haberman's Survival Dataset : Exploratory data analysis of Haberman's Cancer Survival study dataset.
- Deep Learning (Keras)
- Amazon Fine Food Reviews with LSTM : A dataset containing 500k data points of food reviews by users on Amazon. Used an LSTM to predict the polarity of the reviews.
- Music Generation Using Char RNN : Generated good-quality music after training on around 1850 data points. The files were downloaded from here and compiled into one dataset. The dataset contains music in ABC format, and the music is generated using a character RNN.
- Deep Learning (PyTorch)
- Prediction of Fashion Type of Image : A simple model predicting the type of fashion item in an image. The dataset is the example Fashion MNIST dataset. The model is created in PyTorch using two methods: first by creating the model directly, and second by creating a class that inherits from the nn.Module class.
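
The two model-building styles mentioned in the Fashion MNIST project can be sketched roughly as follows. This is only an illustration, not the project's actual code: it assumes that "directly creating the model" means using `nn.Sequential`, and the hidden width of 128, the class name `FashionClassifier`, and the random input batch are all hypothetical choices made for the example.

```python
import torch
from torch import nn

# Assumed sizes: Fashion MNIST images are 28x28 grayscale with 10 classes.
# The hidden width of 128 is an illustrative choice, not taken from the project.
IN_FEATURES, HIDDEN, N_CLASSES = 28 * 28, 128, 10

# Method 1 (assumed): build the model directly with nn.Sequential.
sequential_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(IN_FEATURES, HIDDEN),
    nn.ReLU(),
    nn.Linear(HIDDEN, N_CLASSES),
)


# Method 2: define the same architecture as a class inheriting from nn.Module.
class FashionClassifier(nn.Module):  # hypothetical name
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.hidden = nn.Linear(IN_FEATURES, HIDDEN)
        self.output = nn.Linear(HIDDEN, N_CLASSES)

    def forward(self, x):
        x = self.flatten(x)
        x = torch.relu(self.hidden(x))
        return self.output(x)  # raw logits; pair with nn.CrossEntropyLoss


class_model = FashionClassifier()

# A random batch standing in for real images: (batch, channel, height, width).
batch = torch.randn(4, 1, 28, 28)
logits_a = sequential_model(batch)
logits_b = class_model(batch)
```

Both models map a batch of images to one logit per class. The class-based form takes more lines but leaves room for custom logic inside `forward`, which is usually the reason to prefer it once a model grows.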
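
For the SGD Implementation project, the difference between full-batch gradient descent and stochastic gradient descent on a linear regression can be sketched as below. This is a minimal NumPy illustration under stated assumptions, not the project's code: it uses synthetic data in place of the Boston Housing Price Dataset, and the function names, learning rates, and epoch counts are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a housing dataset: 200 rows, 3 standardized features.
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
true_b = 4.0
y = X @ true_w + true_b + rng.normal(scale=0.1, size=200)


def gradients(X_part, y_part, w, b):
    """Gradients of mean squared error with respect to w and b."""
    error = X_part @ w + b - y_part
    grad_w = 2.0 * X_part.T @ error / len(y_part)
    grad_b = 2.0 * error.mean()
    return grad_w, grad_b


def gradient_descent(X, y, lr=0.1, epochs=200):
    """Full-batch gradient descent: one update per pass over all rows."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = gradients(X, y, w, b)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def stochastic_gradient_descent(X, y, lr=0.01, epochs=50, seed=0):
    """SGD: one update per single row, visiting rows in shuffled order."""
    order_rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in order_rng.permutation(len(y)):
            grad_w, grad_b = gradients(X[i:i + 1], y[i:i + 1], w, b)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


w_gd, b_gd = gradient_descent(X, y)
w_sgd, b_sgd = stochastic_gradient_descent(X, y)
```

On this synthetic problem both variants should recover weights close to `true_w` and `true_b`. A comparison with scikit-learn could then fit `sklearn.linear_model.SGDRegressor` on the same data and compare the learned coefficients.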