Data Science Projects
Information retrieval-based system for The Kaggle Allen AI Science Challenge
December 2015 to February 2016
I developed an information retrieval-based system to compete in the Kaggle Allen AI Science Challenge. The challenge required participants to build systems that predict the correct answers to multiple-choice questions drawn from 8th-grade science exams. My system ranked 42nd among 171 teams. In this project, I used Perl for text processing, Apache Lucy for information retrieval, and Stanford CoreNLP for sentence parsing. I gained hands-on experience with carrying out a data science project and with using an information retrieval library.
Machine Learning/NLP Work
Machine Learning Dojo Project (Study project)
March 1, 2016 to March 9, 2016
Project page: https://github.com/minhpqn/Machine-Learning-Dojo
To improve my Python programming skills and refresh my Machine Learning knowledge, I re-implemented in Python all the programming assignments from the Machine Learning course taught by Andrew Ng. The course's original programming assignments were implemented in Octave. Through this project, I gained more experience with Python scientific libraries: NumPy, scikit-learn, SciPy, and matplotlib. For reference, Dojo (Japanese: 道場; Vietnamese: “đạo tràng”) is “a Japanese term which refers to a training place specially for Japanese martial arts such as aikido, judo, karate, or samurai.”
100 NLP Drill Exercises (Study project)
Project page: https://github.com/minhpqn/nlp_100_drill_exercises
To improve my Python programming skills for NLP and my Japanese language skills, I translated the 100 NLP drill exercises provided by an NLP laboratory at Tohoku University from Japanese to Vietnamese. This collection of exercises is a good starting point for “newbies” to Natural Language Processing who want to improve their NLP programming skills. It is also useful for anyone learning a programming language who wants to practice with non-trivial exercises.