
Announcing Kubeflow 0.1

By Jeremy Lewi (Google), David Aronchick (Google) | Friday, May 04, 2018

Since Last We Met

Since the initial announcement of Kubeflow at the last KubeCon+CloudNativeCon, we have been both surprised and delighted by the excitement for building great ML stacks for Kubernetes. In just over five months, the Kubeflow project now has:

70+ contributors
20+ contributing organizations
15 repositories
3100+ GitHub stars
700+ commits

and is already among the top 2% of GitHub projects ever.

People are excited to chat about Kubeflow as well! The Kubeflow community has held meetups, talks and public sessions all around the world with thousands of attendees. With all this help, we’ve started to make substantial progress in every step of ML, from building your first model all the way to building production-ready, high-scale deployments. At the end of the day, our mission remains the same: we want to let data scientists and software engineers focus on the things they do well by giving them an easy-to-use, portable and scalable ML stack.

Introducing Kubeflow 0.1

Today, we’re proud to announce the availability of Kubeflow 0.1, which provides a minimal set of packages to begin developing, training and deploying ML. In just a few commands, you can get:

JupyterHub for collaborative and interactive training
A TensorFlow Training Controller with native distributed training
TensorFlow Serving for hosting
Argo for workflows
Seldon Core for complex inference and non-TensorFlow models
Ambassador for Reverse Proxy
Wiring to make it work on any Kubernetes anywhere

To get started, it’s just as easy as it always has been:

# Create a namespace for the Kubeflow deployment
NAMESPACE=kubeflow
kubectl create namespace ${NAMESPACE}

# Kubeflow release to install
VERSION=v0.1.3

# Initialize a ksonnet app. Set the namespace for its default environment.
APP_NAME=my-kubeflow
ks init ${APP_NAME}
cd ${APP_NAME}
ks env set default --namespace ${NAMESPACE}

# Install Kubeflow components
ks registry add kubeflow github.com/kubeflow/kubeflow/tree/${VERSION}/kubeflow
ks pkg install kubeflow/core@${VERSION}
ks pkg install kubeflow/tf-serving@${VERSION}
ks pkg install kubeflow/tf-job@${VERSION}

# Create templates for core components
ks generate kubeflow-core kubeflow-core

# Deploy Kubeflow
ks apply default -c kubeflow-core
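
After the apply finishes, it can be useful to confirm what was created in the namespace. The helper below is a minimal sketch rather than part of Kubeflow: `check_kubeflow` is a hypothetical name, and it assumes `kubectl` is configured for the same cluster and that `NAMESPACE` is set as in the commands above (it falls back to `kubeflow` otherwise).

```shell
# Hypothetical helper (not part of Kubeflow): list what was deployed.
# Usage: check_kubeflow [namespace]
check_kubeflow() {
  # Use the first argument, else ${NAMESPACE}, else "kubeflow".
  local ns="${1:-${NAMESPACE:-kubeflow}}"
  # Pods created by the deployment and their current status.
  kubectl get pods --namespace "${ns}"
  # Services created by the deployment.
  kubectl get services --namespace "${ns}"
}
```

Calling `check_kubeflow` with no arguments inspects the namespace held in `${NAMESPACE}`.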

And that’s it! JupyterHub is deployed, so we can now use Jupyter to begin developing models. Once we have Python code to build our model, we can build a Docker image and train our model using our TFJob operator by running commands like the following:

ks generate tf-job my-tf-job --name=my-tf-job --image=gcr.io/my/image:latest
ks apply default -c my-tf-job
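
Before applying a generated component, it can help to look at what ksonnet will send to the cluster. The helper below is a sketch under assumptions: `preview_component` is a hypothetical name, and it relies on the ksonnet subcommands `ks param list` and `ks show` being available in the installed ksonnet version. It must be run from inside the ksonnet app directory.

```shell
# Hypothetical helper (not part of Kubeflow or ksonnet).
# Usage: preview_component <component> [environment]
preview_component() {
  local component="$1"
  local env="${2:-default}"
  # Parameters currently set on the component (e.g. name, image).
  ks param list "${component}"
  # Manifest that `ks apply` would submit for this component.
  ks show "${env}" -c "${component}"
}
```

For the training job above, that would be `preview_component my-tf-job`.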

We could then deploy the model by running:

ks generate tf-serving ${MODEL_COMPONENT} --name=${MODEL_NAME}
ks param set ${MODEL_COMPONENT} modelPath ${MODEL_PATH}
ks apply ${ENV} -c ${MODEL_COMPONENT}
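
The serving commands above rely on four shell variables that this post does not define. A minimal sketch of setting them first follows. Every value here is a hypothetical placeholder chosen for illustration, except `default`, which is the environment configured by the `ks env set default` step earlier.

```shell
# Hypothetical placeholder values; replace them with your own.
MODEL_COMPONENT=my-model-server      # name of the ksonnet component to generate
MODEL_NAME=my-model                  # name passed to the tf-serving prototype
MODEL_PATH=gs://my-bucket/my-model   # assumed location of the exported model
ENV=default                          # ksonnet environment to deploy to

# Report an error if any of the variables is unset or empty.
: "${MODEL_COMPONENT:?}" "${MODEL_NAME:?}" "${MODEL_PATH:?}" "${ENV:?}"
```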

With just a few commands, data scientists and software engineers can now create even complicated ML solutions and focus on what they do best: answering business-critical questions.

Community Contributions

It would have been impossible to get where we are without enormous help from everyone in the community. Some specific contributions that we want to highlight include:

Argo for managing ML workflows
Caffe2 Operator for running Caffe2 jobs
Horovod & OpenMPI for improved distributed training performance of TensorFlow
Identity Aware Proxy, which enables securing your services with identities rather than VPNs and firewalls
Katib for hyperparameter tuning
Kubernetes volume controller, which provides basic volume and data management using volumes and volume sources in a Kubernetes cluster
Kubebench for benchmarking of HW and ML stacks
Pachyderm for managing complex data pipelines
PyTorch operator for running PyTorch jobs
Seldon Core for running complex model deployments and non-TensorFlow serving

It’s difficult to overstate how much the community has helped bring all these projects (and more) to fruition. Just a few of the contributing companies include: Alibaba Cloud, Ant Financial, Caicloud, Canonical, Cisco, Datawire, Dell, GitHub, Google, Heptio, Huawei, Intel, Microsoft, Momenta, One Convergence, Pachyderm, Project Jupyter, Red Hat, Seldon, Uber and Weaveworks.

Learning More

If you’d like to try out Kubeflow, we have a number of options for you:

You can use sample walkthroughs hosted on Katacoda
You can follow a guided tutorial with existing models from the examples repository. These include GitHub Issue Summarization, MNIST and Reinforcement Learning with Agents.
You can start a cluster on your own and try your own model. Any Kubernetes-conformant cluster will support Kubeflow, including those from contributors Caicloud, Canonical, Google, Heptio, Mesosphere, Microsoft, IBM, Red Hat/OpenShift and Weaveworks.

There were also a number of sessions at KubeCon + CloudNativeCon EU 2018 covering Kubeflow. The links to the talks are here; the associated videos will be posted in the coming days.

What’s Next?

Our next major release, 0.2, is coming this summer. In it, we expect to land the following new features:

Simplified setup via a bootstrap container
Improved accelerator integration
Support for more ML frameworks, e.g., Spark ML, XGBoost, sklearn
Autoscaled TF Serving
Programmatic data transforms, e.g., tf.transform

But the most important feature is the one we haven’t heard yet. Please tell us! Some options for making your voice heard include:

The Kubeflow Slack channel
The Kubeflow-discuss email list
The Kubeflow Twitter account
Our weekly community meeting
Please download and run Kubeflow, and submit bugs!