Multi-Task Deep Neural Networks for Natural Language Understanding was published at ACL 2019 by Microsoft.
As the world’s largest machine learning competition platform, Kaggle always has ongoing competitions with prizes. More importantly, if you are looking for a machine learning or data science-related job, achieving good results on Kaggle can significantly enhance your resume.
Python's line_profiler is a very convenient package that lets you easily see how long each line of code takes to execute. However, it has a fatal flaw: it does not support profiling code that runs under multiprocessing, and there has been an open issue about this on GitHub since 2016. Here, I provide a hacky workaround for using line_profiler with multiprocessing.
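As a rough illustration of one possible workaround (a minimal sketch under my own assumptions, not necessarily the exact approach from the post): create a separate LineProfiler inside each worker process and dump its stats to a per-process file, since profiler state is not shared across processes. The function names and file paths below are placeholders.

```python
import multiprocessing as mp
from line_profiler import LineProfiler

def heavy_work(n):
    # placeholder workload to profile
    total = 0
    for i in range(n):
        total += i * i
    return total

def worker(n):
    # build the profiler inside the child process, so each process profiles itself
    lp = LineProfiler()
    profiled = lp(heavy_work)          # wrap the target function in this process
    result = profiled(n)
    # write one stats file per process; inspect later with:
    #   python -m line_profiler worker_<pid>.lprof
    lp.dump_stats(f"worker_{mp.current_process().pid}.lprof")
    return result

if __name__ == "__main__":
    with mp.Pool(2) as pool:
        print(pool.map(worker, [10**6, 10**6]))
```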
When we talk about asset allocation, we refer to the distribution of funds across different types of assets, which carry different levels of risk and return. For example, stocks have high expected returns but also high risk, while government bonds are a low-risk asset with relatively low expected returns. In addition to stocks and bonds, commonly traded assets include gold, commodity futures, and real estate investment trusts (REITs). Here, we will analyze the advantages and disadvantages of various bond…
Record of my interview experience with QuantumBlack (McKinsey) as a data scientist in Singapore in 2019.
Someone asked this question on PTT (in Chinese). He trained a rectal cancer detection model on MRI images with 5-fold cross-validation, but the out-of-fold AUC was below 0.5 in every fold. After searching the Internet, he found advice along the lines of: if you flip the labels (swap class 0 and 1), the AUC rises above 0.5, so your model still learned something. In my humble opinion, flipping the labels of a worse-than-random model is very dangerous. So how should this be solved?
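For context, here is a small sketch of my own (not from the original PTT thread) showing why flipping looks so tempting: negating the scores turns an AUC of a into 1 − a, so any sub-0.5 number can be "fixed" mechanically, regardless of whether the model learned a real signal. The synthetic labels and scores below are illustrative only.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)        # synthetic binary labels
scores = rng.normal(size=1000) - y       # scores anti-correlated with the label

auc = roc_auc_score(y, scores)
print(auc)                               # well below 0.5: worse than random
print(roc_auc_score(y, -scores))         # exactly 1 - auc: flipping "fixes" the number,
                                         # but says nothing about whether the signal is real
```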
This article introduces how to download a Wikipedia corpus and train word embeddings on it. All the code will be on GitHub. Downloading and training both take an extremely long time, so I have also uploaded my pretrained embeddings. You can download them here: Chinese Word2Vec, Chinese FastText, English Word2Vec, English FastText.
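As a minimal sketch of what such a pipeline can look like (my assumptions, not the exact script in the repo), gensim's WikiCorpus can stream articles from a dump file and Word2Vec can train on the resulting token lists. The dump filename, output path, and hyperparameters below are placeholders; note that gensim 3.x called `vector_size` `size`.

```python
from gensim.corpora.wikicorpus import WikiCorpus
from gensim.models import Word2Vec

# passing dictionary={} skips building a gensim Dictionary; we only want the token streams
wiki = WikiCorpus("enwiki-latest-pages-articles.xml.bz2", dictionary={})

# Word2Vec needs a restartable iterable; materializing the full corpus as a list
# is simple but requires a lot of RAM and time for a full Wikipedia dump
sentences = list(wiki.get_texts())

model = Word2Vec(sentences, vector_size=300, window=5, min_count=5, workers=4)
model.wv.save_word2vec_format("enwiki_word2vec.txt")
```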
I recently bought a new computer, ran into some strange problems, and was stuck for a long time. I am documenting the environment setup steps here.