The productization of ChatGPT has set off another wave of AI fever around the world. Unlike AlphaGo, this time everyone can register an account and play with it. In addition to text-generating AI, there are image-generating AIs producing images nearly indistinguishable from human work, and anyone can try them for free. Even people who can't code can use them easily, which will certainly affect the work of machine learning engineers, but I think machine learning engineers can continue to contribute value.
Machine learning jobs today
In terms of model development, there are two categories of machine learning work today: 1. developing new model architectures and publishing papers, and 2. applying existing models to your own company's scenarios. Developing new models is actually very broad: it is not just proposing a new architecture and training a model that works well, but also many other tasks, such as proposing new optimizers or new activation functions and proving mathematical theorems. These tasks will not be replaced by generative AI and will continue to exist in the future.
As for applying existing models to your own company's scenarios: with ChatGPT, you may be able to replace some conversational bots, and if the out-of-the-box results are not good enough, some finetuning gets you the rest of the way. ChatGPT is just like a non-open-source pre-trained BERT or EfficientNet: a huge pre-trained model. Using it directly gives us basic results, but finetuning gives better ones. How to finetune is where machine learning engineers can contribute value.
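To make the freeze-and-finetune idea concrete, here is a minimal NumPy sketch. The "backbone" is just a fixed random projection standing in for a real pre-trained model such as BERT or EfficientNet, and every name and number here is made up for illustration; only the small head on top is trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pre-trained backbone: a fixed random projection.
# (In practice this would be a real pre-trained network; "frozen" means
# its weights are never updated during finetuning.)
W_backbone = 0.1 * rng.normal(size=(8, 16))

def backbone(x):
    return np.tanh(x @ W_backbone)  # frozen feature extractor

# Toy binary-classification data for the downstream task.
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Trainable head: logistic regression on top of the frozen features.
w = np.zeros(16)
b = 0.0
lr = 0.5
feats = backbone(X)  # computed once, since the backbone never changes
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid probabilities
    grad_w = feats.T @ (p - y) / len(y)         # cross-entropy gradient
    grad_b = np.mean(p - y)
    w -= lr * grad_w                            # only the head is updated
    b -= lr * grad_b

acc = np.mean(((feats @ w + b) > 0) == (y == 1))
print(f"head-only finetune accuracy: {acc:.2f}")
```

The same pattern scales up: keep the expensive pre-trained part fixed, and spend your training budget on a small task-specific layer and on the data that feeds it.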
Data is always the most important
The most important thing for the performance of a model is always the data. Choosing the most suitable model of the moment and simply tuning a set of hyperparameters will get you reasonable performance. Spending more effort on the model structure and hyperparameters will improve the results, but not by much. Especially with a pre-trained model, the original pre-training becomes ineffective if you change the structure too much, and without the data and computational resources to re-pre-train, you could even end up with worse performance. Changing how the data is used, changing the features, or adding new data always has a chance to improve performance significantly.

When I competed in Kaggle competitions a few years ago, most of the winners had observed the data carefully and then made smart pre-processing choices or found good features, which is how they beat everyone else and won the prize. It has been a long time since I heard of a newly invented model being the key to winning (or maybe I am just ignorant; I haven't played Kaggle for a long time). The value of a machine learning engineer lies in properly processing data and finding good features.
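A tiny illustration of why a good feature beats a fancier model, using made-up data: when the target is quadratic in the input, no amount of tuning helps a plain linear fit, but adding one engineered feature fixes it immediately.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=200)
y = x**2 + rng.normal(scale=0.1, size=200)  # quadratic relationship + noise

def fit_rmse(X, y):
    # Ordinary least squares, then root-mean-squared error of the fit.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sqrt(np.mean((X @ coef - y) ** 2))

X_raw = np.column_stack([x, np.ones_like(x)])        # raw feature only
X_eng = np.column_stack([x, x**2, np.ones_like(x)])  # + engineered x^2 feature

rmse_raw = fit_rmse(X_raw, y)
rmse_eng = fit_rmse(X_eng, y)
print(f"raw features RMSE: {rmse_raw:.2f}, with x^2 feature: {rmse_eng:.2f}")
```

The engineered feature comes from looking at the data and noticing the curvature, which is exactly the kind of observation Kaggle winners make.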
In addition to features, how to properly translate a business problem into a mathematical one, how to design appropriate metrics, how to split the data into training and validation sets, and how to interpret experimental results are all very important. None of these concern the model itself, yet all of them matter greatly to the results. There are already many products that let you build models without writing a single line of code, but they have not replaced machine learning engineers. There will be less programming, and machine learning engineers can focus on contributing value in more irreplaceable (and perhaps, to some, more interesting) places.
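Metric design is easy to get wrong in ways no modeling tool will catch for you. A small sketch with made-up numbers: on an imbalanced problem (say, fraud detection with 2% positives), a model that does nothing looks excellent by accuracy but is exposed immediately by recall.

```python
import numpy as np

# 1000 examples, 2% positives: a typical imbalanced business problem.
# (The numbers here are invented purely for illustration.)
y_true = np.zeros(1000, dtype=int)
y_true[:20] = 1

# A useless model that predicts "negative" for everything.
y_pred = np.zeros(1000, dtype=int)

accuracy = np.mean(y_pred == y_true)        # 0.98 -- looks great
tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
recall = tp / np.sum(y_true == 1)           # 0.0  -- the model catches nothing
print(f"accuracy: {accuracy:.2f}, recall: {recall:.2f}")
```

Choosing recall (or precision, or a cost-weighted metric) over accuracy here is a translation from the business problem, not a modeling decision, and no auto-ML product makes that call for you.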
Mathematics is the underlying logic of machine learning
Eight years ago, AI surpassed human performance in image classification for the first time, and since then all kinds of new models have appeared and disappeared; many are no longer used. ChatGPT is also updated and retrained every year to get significantly better results, while the old models are discarded. You can never catch up with all the latest models.
Anyone familiar with machine learning must be familiar with calculus, which was invented hundreds of years ago; since then some new theorems have been proved and some proofs rewritten more elegantly, but no matter how much has happened in between, derivatives and integrals are the same. In data-driven work, mathematics is always the most important thing. It is not about knowing difficult mathematics such as real analysis or stochastic calculus, or being able to prove the basic theorems of calculus with your eyes closed. Rather, it is about really understanding the mathematics you use, and knowing how to translate a business problem into one machine learning can solve. Then, when a result is not as expected, we know what happened and how to fix it.
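One small example of the kind of check that this understanding enables: when a gradient looks wrong, you can verify it numerically with finite differences, a technique that follows directly from the definition of the derivative. A minimal sketch:

```python
import numpy as np

def numerical_derivative(f, x, h=1e-5):
    # Central finite difference: (f(x+h) - f(x-h)) / (2h) approximates f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)

xs = np.linspace(-2, 2, 9)
approx = numerical_derivative(np.exp, xs)
exact = np.exp(xs)  # d/dx e^x = e^x
print("max error:", np.max(np.abs(approx - exact)))
```

This is the sort of five-line sanity check that never goes out of date, no matter which model architecture is fashionable.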
Maybe in the future finetuning will no longer need backpropagation to update model parameters, and maybe more people will be able to use machine learning to solve problems, but $\frac{d}{dx}e^x$ will still be $e^x$.