Multi-Task Deep Neural Networks for Natural Language Understanding was published at ACL 2019 by Microsoft.
As the world’s largest machine learning competition platform, Kaggle always has ongoing competitions with prizes. More importantly, if you are looking for a machine learning or data science-related job, achieving good results on Kaggle can significantly enhance your resume.
Python's line_profiler is a very convenient package that lets you easily see how long each line of code takes to execute. However, it has a fatal flaw: it does not support profiling code that runs under multiprocessing, and there has been an open issue about this on GitHub since 2016. Here, I provide a hacky workaround for using line_profiler with multiprocessing.
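As a rough illustration of one possible workaround (a minimal sketch under my own assumptions, not necessarily the exact approach from the post): create a separate LineProfiler inside each worker process and dump its stats to a per-process file, since profiler state is not shared across processes. The function names and file paths below are placeholders.

```python
import multiprocessing as mp
from line_profiler import LineProfiler

def heavy_work(n):
    # placeholder workload to profile
    total = 0
    for i in range(n):
        total += i * i
    return total

def worker(n):
    # build the profiler inside the child process, so each process profiles itself
    lp = LineProfiler()
    profiled = lp(heavy_work)          # wrap the target function in this process
    result = profiled(n)
    # write one stats file per process; inspect later with:
    #   python -m line_profiler worker_<pid>.lprof
    lp.dump_stats(f"worker_{mp.current_process().pid}.lprof")
    return result

if __name__ == "__main__":
    with mp.Pool(2) as pool:
        print(pool.map(worker, [10**6, 10**6]))
```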
When we talk about asset allocation, we refer to the distribution of funds across different types of assets, which carry different levels of risk and return. For example, stocks have high expected returns but also high risk, while government bonds are a low-risk asset with relatively low expected returns. In addition to stocks and bonds, commonly traded assets include gold, commodity futures, and real estate investment trusts (REITs). Here, we will analyze the advantages and disadvantages of various bond…
Record of my interview experience with QuantumBlack (McKinsey) as a data scientist in Singapore in 2019.
Someone asked this question on PTT (in Chinese). He trained a rectal cancer detection model on MRI images with 5-fold cross-validation, but the out-of-fold AUC was below 0.5 in every fold. After searching the Internet, he found advice along the lines of: if you flip the labels (swap class 0 and 1), the AUC rises above 0.5, so your model still learned something. In my humble opinion, flipping the labels of a worse-than-random model is very dangerous. So how should this be solved?
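For context, here is a small sketch of my own (not from the original PTT thread) showing why flipping looks so tempting: negating the scores turns an AUC of a into 1 − a, so any sub-0.5 number can be "fixed" mechanically, regardless of whether the model learned a real signal. The synthetic labels and scores below are illustrative only.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)        # synthetic binary labels
scores = rng.normal(size=1000) - y       # scores anti-correlated with the label

auc = roc_auc_score(y, scores)
print(auc)                               # well below 0.5: worse than random
print(roc_auc_score(y, -scores))         # exactly 1 - auc: flipping "fixes" the number,
                                         # but says nothing about whether the signal is real
```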
This article introduces how to download a Wikipedia corpus and train word embeddings on it. All the code will be on GitHub. Downloading and training both take an extremely long time, so I have also uploaded my pretrained embeddings. You can download them here: Chinese Word2Vec, Chinese FastText, English Word2Vec, English FastText.
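As a minimal sketch of what such a pipeline can look like (my assumptions, not the exact script in the repo), gensim's WikiCorpus can stream articles from a dump file and Word2Vec can train on the resulting token lists. The dump filename, output path, and hyperparameters below are placeholders; note that gensim 3.x called `vector_size` `size`.

```python
from gensim.corpora.wikicorpus import WikiCorpus
from gensim.models import Word2Vec

# passing dictionary={} skips building a gensim Dictionary; we only want the token streams
wiki = WikiCorpus("enwiki-latest-pages-articles.xml.bz2", dictionary={})

# Word2Vec needs a restartable iterable; materializing the full corpus as a list
# is simple but requires a lot of RAM and time for a full Wikipedia dump
sentences = list(wiki.get_texts())

model = Word2Vec(sentences, vector_size=300, window=5, min_count=5, workers=4)
model.wv.save_word2vec_format("enwiki_word2vec.txt")
```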
I recently bought a new computer, ran into some strange problems, and was stuck for a long time. I am documenting the environment setup steps here.