API Series – OctoML: ML APIs must learn from their ancestors


This is a guest post for the Computer Weekly Developer Network written by Jason Knight in his role as Product Manager at OctoML – the company is known for its work bringing DevOps agility to ML deployment to enable developers and IT operations to build AI-powered applications.

Knight writes as follows…

We’ve all seen the surprising feats that modern machine learning is capable of. But these are just the tip of the iceberg. The unsung heroes of machine learning are the smaller models that make existing software work better – often much better – and enable new, smaller experiences that weren’t possible before.

But building intelligent applications that combine machine learning with traditional software engineering still involves a large amount of pain, sweat and tears.

This is largely due to the lack of stable and robust APIs for machine learning.

Bottlenecks and Technical Debt

Early deep learning frameworks like Theano and Caffe were originally created to give data scientists APIs to define and then train models on example datasets. Deployment was often an afterthought, or not considered at all, because these frameworks were written by and for the academic machine learning community.

Later, TensorFlow and then PyTorch increased the flexibility and capabilities available to data scientists. PyTorch’s embrace of the Python interpreter and language as the primary ML API has enabled great advances in ergonomics and flexibility for the data scientist.

But these benefits come at a cost. PyTorch’s Python-language-as-model-definition approach makes it difficult to send models to other stacks or production devices.
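To make the cost concrete, here is a toy sketch of why a model defined as ordinary Python is hard to ship elsewhere. It uses a hand-rolled tracer (the names Traced, trace and replay are hypothetical and are not PyTorch's export machinery) to record the operations a function performs on one example input. Because the branch is resolved by the Python interpreter, the recorded "graph" silently captures only one path.

```python
# Toy illustration only: a hand-rolled tracer, not PyTorch's real export tooling.

class Traced:
    """Wraps a number and records each arithmetic op applied to it."""

    def __init__(self, value, tape):
        self.value = value
        self.tape = tape

    def __mul__(self, k):
        self.tape.append(("mul", k))
        return Traced(self.value * k, self.tape)

    def __add__(self, k):
        self.tape.append(("add", k))
        return Traced(self.value + k, self.tape)

    def __gt__(self, k):
        # The comparison collapses to a plain bool at trace time,
        # so the branch that was taken never appears on the tape.
        return self.value > k


def model(x):
    """A 'model' whose definition is plain Python with a data-dependent branch."""
    if x > 0:
        return x * 2
    return x + 10


def trace(fn, example):
    """Run fn once on an example input and return the recorded ops."""
    tape = []
    fn(Traced(example, tape))
    return tape


def replay(tape, x):
    """Execute the recorded ops without the original Python code."""
    for op, k in tape:
        x = x * k if op == "mul" else x + k
    return x


tape = trace(model, 3)               # records only the x > 0 path
python_result = model(-1)            # original Python takes the other branch
exported_result = replay(tape, -1)   # the replayed tape does not
print(tape, python_result, exported_result)
```

The replayed tape disagrees with the original function for negative inputs, which is the kind of gap that has to be closed by hand, or by more sophisticated tooling, when a Python-defined model moves to another runtime.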

This contributes to the difficulties in bringing machine learning models from development to production. For a deeper dive into these trade-offs and challenges, see this blog post by PyTorch creator Soumith Chintala giving a brief history of PyTorch’s trade-offs, or the infamous and still very applicable Google paper “Machine Learning: The High Interest Credit Card of Technical Debt”.

To deal with the complexity that results from coupling Python code with the ML model definition, PyTorch code is often “thrown over the wall” from the development teams in an organization to the operations or production teams, who are responsible for porting, maintaining or deploying this code on production APIs and systems.

Does this sound familiar to anyone who did software development before we had APIs to automatically test, provision and monitor deployed software?

Enabling ML Developers

To accelerate the advancement of ML to power the intelligent applications of tomorrow, we need to make it easier for data scientists to deploy their own code by giving them tools to match their development APIs to production APIs, whether in the cloud or at the edge.

The only way to achieve this is to create better abstractions (APIs) and platforms that still retain the flexibility that developers enjoy today, but also enable hardware portability and performance without the manual effort of porting and integration.
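As a minimal sketch of what such an abstraction could look like, assume a single narrow interface that both a prototype and a deployment backend implement. Every name here (Predictor, NotebookModel, CompiledModel, score_request) is hypothetical and stands in for whatever tools a team actually uses; the point is only that application code depends on the interface rather than on a particular backend.

```python
from typing import Protocol, Sequence


class Predictor(Protocol):
    """Hypothetical narrow interface: the only thing application code sees."""

    def predict(self, features: Sequence[float]) -> float: ...


class NotebookModel:
    """Stand-in for a model as a data scientist might prototype it."""

    def __init__(self, weights: Sequence[float]):
        self.weights = list(weights)

    def predict(self, features: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(self.weights, features))


class CompiledModel:
    """Stand-in for a deployment backend with a fixed input size."""

    def __init__(self, weights: Sequence[float]):
        self._weights = tuple(weights)
        self._n = len(self._weights)

    def predict(self, features: Sequence[float]) -> float:
        if len(features) != self._n:
            raise ValueError("feature count must match the compiled model")
        total = 0.0
        for i in range(self._n):
            total += self._weights[i] * features[i]
        return total


def score_request(model: Predictor, features: Sequence[float]) -> str:
    """Application code written once against the interface."""
    return "approve" if model.predict(features) > 0.5 else "review"


weights = [0.2, 0.3]
dev_decision = score_request(NotebookModel(weights), [1.0, 2.0])
prod_decision = score_request(CompiledModel(weights), [1.0, 2.0])
print(dev_decision, prod_decision)
```

Swapping the backend does not touch score_request, which is the property that lets the same code move from a notebook to a production target without a rewrite.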

We are starting to see the first signs of this with libraries and tools that better encapsulate complexity and allow users to do more with less. Examples include Hugging Face’s Transformers library and BentoML.

We are also seeing end-to-end machine learning platforms (AKA hosted ML API offerings). These platforms can be useful for people new to the space because, by design, they make the transition between ML development APIs and hosted ML APIs more seamless, but it remains to be seen whether they become the predominant way to do machine learning. An interesting historical data point for comparison is the classic software engineering world, where we’ve seen modest success for end-to-end development platforms like Heroku, but in general, software development today is still largely done on a mix of hosted and non-hosted solutions that teams combine in different ways.

Another way ML development APIs could become more closely aligned with production APIs is through the rise of foundation models – a smaller set of large, flexible, community-created models. These foundation models are distributed freely by a small set of sophisticated ML hubs and are then fine-tuned or quickly adapted for a given purpose.

This could narrow ML engineering workflows enough to simplify the problem of aligning development APIs and production APIs. The distribution of flexible foundation-model building blocks also bears analogies to the rise of open source software stacks and APIs such as the LAMP stack in early web programming, SQLite in embedded data storage, or MPI and then later the Kubernetes API for distributed programming.

But only time will tell if consolidation around a smaller set of foundation models (and hence workflows and APIs) will outpace diversification as ML continues to grow and specialize.

What can ML developers do today?

For those of you building intelligent applications today, the name of the game is avoiding accidental complexity – as opposed to essential complexity – wherever possible. This is true for software engineering in general, but becomes even more important when adding ML to software, because the essential complexity of ML itself means you have even less complexity budget to waste.

In practice, this means adopting (and building) the minimum number of new ML tools, API platforms and services. And for those you do adopt, make sure they have a limited scope, in line with what the Unix philosophy teaches us: tools that serve a single purpose and expose simple APIs.

Over time, our software development and deployment APIs will continue to blend and expand enough to encompass ML just as they have grown to handle past innovations in software development before them.

