A step back: let's talk about MLOps
Machine learning operations (MLOps) is a critical component of successful data science project deployment. It's a process that helps organizations and businesses generate long-term value and reduce the risk associated with data science, machine learning, and AI.
Since it's still a relatively new concept, the questions are: why is it so valuable to organizations, and what is it exactly? The answer is still evolving, but in this post I'm going to talk about one component of MLOps: MLflow, and what it offers for the machine learning lifecycle.
MLflow is one piece of the MLOps puzzle
Hi everyone, I'm Felipe Veloso, back with a new post on feedingthemachine.ai. This time I'm going to talk about MLflow, an open source platform built and actively maintained by Databricks. It supports the ML lifecycle so you can deploy to different environments, and it lets everyone on the team reproduce experiments across different instances.
In the advanced analytics world we build a lot of different experiments in our day-to-day job, and the deliverable of a data scientist or AI engineer is often just a script (Python, R, even Java), probably living in a Jupyter notebook. So when you try to hand off the model as a pickle file, you can run into trouble with missing metadata, or with files being overwritten because they share the same name.
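To make that pain concrete, here is a minimal sketch of the pickle-only hand-off; the dataset, model, and file name are stand-ins chosen purely for illustration:

```python
# A naive model hand-off, the workflow MLflow is meant to replace.
# Dataset, model, and file name are illustrative, not from the repo.
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# The pickle stores only the fitted estimator: no record of
# hyperparameters, metrics, or library versions. A second
# "model.pkl" silently overwrites the first.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
```

Nothing about that file tells the next person which hyperparameters or metrics produced it, which is exactly the gap MLflow's tracking fills.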
What is MLflow?
MLflow is an open source platform for the machine learning lifecycle, and it tries to solve the main issues of experimenting with and retraining ML models. It lets you map all of the experiments and models you build and create an artifact from each one. On top of that, it brings a very useful UI to visualize all of the models you or your team create in a centralized environment, including the metadata and hyperparameters, so when you need to compare different models or metrics you can do it with a couple of clicks.
Another thing to consider: when you want to promote a model to another environment or use it in an A/B test, you can deploy it easily with the help of MLflow and the artifact you created.
MLflow consists of four modules (Tracking, Projects, Models, and the Model Registry) with no interdependency, so you can use one, two, or all four of them. Together they help with the different tasks of publishing a model to different environments or explaining its hyperparameters to a coworker in your organization.
So let's start with the Tracking module.
Mlflow Tracking API
MLflow Tracking is organized around the concept of runs, which are executions of some piece of data science code.
You can record all of your runs in your environment (or on a remote server), even push them to the cloud (when you work with the MLflow Projects module), and give visibility to your models. You can also launch experiments from your environment or from GitHub, with a practical UI to use after training your model. If you have a remote tracking server, you can record all of the metrics and more in a centralized way, so the whole team shares the same view.
Proof of Concept
In the repo https://github.com/feedingthemachine/exblog/tree/master/heart_mlflow you can download the code and recreate the experiment.

In this case we use the localhost scenario, so the mlruns directory is kept in your local environment.
In this simple scenario, the MLflow client uses the following interfaces to record MLflow entities and artifacts:
- An instance of a LocalArtifactRepository (to store artifacts)
- An instance of a FileStore (to save MLflow entities)
For the commands, you need to install MLflow and then launch its UI; the rest is very similar to working in a Jupyter environment. Note that these are two separate commands:

pip3 install mlflow
mlflow ui
For a clearer walkthrough, let's watch the video.