Education
University of Glasgow
MSc Data Science (2021-22)
Deep understanding of core data science topics such as information retrieval, recommendation systems, machine learning, deep learning, and neural networks.
Knowledge of word2vec-based query expansion, sentiment analysis, stance detection, text representation, text classification, and text pre-processing, including tokenization, normalization, and vectorization.
In-depth knowledge of information visualization concepts and hands-on experience with the Altair library and Tableau.
2:1 (passed with Merit)
University of Kerala
B. Tech Information Technology (2008-12)
Government Engineering College, Barton Hill
Strong foundation in Internet Technology, Design and Analysis of Algorithms, Theory of Computation, and Engineering Mathematics with a CGPA of 82% (Distinction).
Activities and societies: Active participant in the college IT tech fest, “Inceptra”.
Member of the College Magazine Editorial Board; published articles on socially relevant topics in the college's annual magazines.
Received job offers for the role of Software Developer from two leading multinational companies (Infosys Ltd and Accenture India) during the campus placement drive conducted at the college.
Projects
Multi-lingual Stance Detection on Social Media, MS Dissertation
The objective of stance detection is to ascertain a text's (or author’s) attitude toward a certain topic or statement. It is a key component of many NLP tasks such as rumour confirmation, fact-checking, and fake news detection.
Due to the lack of annotated data in other languages, the majority of stance detection research has focused on English.
In this dissertation, I focussed on the problem of multi-lingual stance detection on social media platforms.
I investigated the difficulty of text encoding in a multilingual setting using several machine learning algorithms, including SVM and LR, and transformer models such as BERT, M-BERT, XLM, and XLM-RoBERTa, and compared the performance of the various models.
I also examined how language and label imbalances within the dataset contribute to the difficulty of the stance detection (SD) task.
The annotated dataset used for these experiments contains tweets on various political debates in five languages: English, Spanish, Catalan, French, and Italian.
Extensive experiments show that multilingual language models outperform both traditional classifiers and language-specific state-of-the-art transformer models.
Supervised Text Classification on Reddit Posts, second semester, MS
Explored a Reddit dataset, created a custom tokeniser, applied a vectorizer, and classified Reddit posts using various classifiers, including Logistic Regression and SVC.
Applied feature engineering using Gensim Word2Vec word embeddings.
Performed hyperparameter tuning and manual error analysis. Evaluated performance using precision, recall, F1-score, and the confusion matrix.
Batch-based text search and filtering pipeline in Apache Spark, second semester, MS
Filtered and ranked the top 10 news articles from a large dataset (the Washington Post corpus) on the basis of their relevance to queries from the QuerySet.
Pre-processed the news articles by removing stopwords and applying stemming and tokenization.
Ranked the documents for each query by DPH score, computed by calling a DPH score calculator API.
Deployed and tested the code locally and remotely.
Attendance Marking using Mobile App, B.Tech Dissertation
Developed an Android application for recording and managing student attendance across the university.