4-Minute Read

646 words

As the world’s largest machine learning competition platform, Kaggle always has ongoing competitions with prizes. More importantly, if you are looking for a machine learning or data science-related job, achieving good results on Kaggle can significantly enhance your resume.

5-Minute Read

930 words

Asset allocation refers to how funds are distributed across different types of assets, each with its own risk and return profile. For example, stocks have high expected returns but also high risk, while government bonds are a low-risk asset with relatively low expected returns. In addition to stocks and bonds, commonly traded assets include gold, commodity futures, and real estate investment trusts (REITs). Here, we will analyze the advantages and disadvantages of various bond…

2-Minute Read

384 words

Someone asked this question on PTT (in Chinese). He trained a rectal cancer detection model on MRI images with 5-fold cross-validation, but the out-of-fold AUC was below 0.5 in every fold. After searching the Internet, he found someone saying that if you reverse the labels (swap classes 0 and 1), you get an AUC better than 0.5, so the model must still have learned something. In my humble opinion, it is very dangerous to reverse the labels of a worse-than-random model. So, how should this be solved?
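To see why the label-reversal trick is only arithmetic, here is a minimal sketch in plain Python. The `auc` helper and the toy `labels` and `scores` are made up for illustration and are not from the original PTT thread. It computes AUC as the fraction of positive/negative pairs that the scores rank correctly, then repeats the calculation with the labels swapped.

```python
def auc(labels, scores):
    """Pairwise AUC: chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, hypothetical out-of-fold predictions from a worse-than-random model.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.9, 0.6, 0.2, 0.4, 0.7, 0.8]

original = auc(labels, scores)
flipped = auc([1 - y for y in labels], scores)  # swap class 0 and class 1

print(f"original AUC: {original:.3f}")
print(f"flipped AUC:  {flipped:.3f}")
print(f"sum:          {original + flipped:.3f}")
```

Swapping the labels turns every correctly ranked pair into an incorrectly ranked one and vice versa, so the two values always add up to 1. A flipped AUC above 0.5 is therefore just a restatement of the original AUC below 0.5. On its own it is not new evidence about what the model learned.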

3-Minute Read

583 words

This article shows how to download the Wikipedia corpus and train word embeddings on it. All the code is on GitHub. Downloading and training both take an extremely long time, so I have also uploaded my pretrained embeddings. You can download them here: Chinese Word2Vec, Chinese FastText, English Word2Vec, English FastText.
