Follow the setup instructions on the Petuum documentation page.
Petuum provides essential distributed programming tools to tackle the challenges of running ML at scale: Big Data (many data samples) and Big Models (very large parameter and intermediate variable spaces). Unlike general-purpose distributed programming platforms, Petuum is designed specifically for ML algorithms. This means that Petuum takes advantage of data correlation, staleness, and other statistical properties to maximize the performance for ML algorithms, realized through core features such as Bösen, a bounded-asynchronous distributed key-value store, and Strads, a dynamic ML update scheduler.
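To illustrate the bounded-asynchronous idea behind Bösen, here is a minimal single-process sketch of the Stale Synchronous Parallel (SSP) rule. This is toy code with hypothetical names, not the Bösen API: each worker advances a logical clock, and a worker may proceed only while the slowest worker is within a fixed staleness bound of it.

```python
# Toy sketch of the Stale Synchronous Parallel (SSP) consistency rule
# used by Bosen-style bounded-asynchronous parameter servers.
# Hypothetical names; not the actual Petuum API.

class SSPTable:
    def __init__(self, num_workers, staleness):
        self.staleness = staleness          # max allowed clock gap
        self.clocks = [0] * num_workers     # per-worker logical clocks
        self.value = 0.0                    # one shared parameter

    def clock(self, worker):
        """Worker signals that it finished one iteration."""
        self.clocks[worker] += 1

    def can_read(self, worker):
        """SSP condition: a worker may proceed only if the slowest
        worker is within `staleness` clocks of it."""
        return self.clocks[worker] - min(self.clocks) <= self.staleness

    def inc(self, worker, delta):
        """Apply an additive update to the shared parameter."""
        self.value += delta


table = SSPTable(num_workers=2, staleness=1)
table.clock(0)            # worker 0 finishes iteration 0
table.clock(0)            # worker 0 finishes iteration 1; worker 1 at 0
print(table.can_read(0))  # False: gap of 2 exceeds the bound of 1
table.clock(1)            # worker 1 catches up to clock 1
print(table.can_read(0))  # True: gap of 1 is within the bound
```

The point of the bound is that fast workers never wait for a full barrier (as in Bulk Synchronous Parallel) yet never read arbitrarily stale values (as in fully asynchronous systems), which is what preserves ML convergence guarantees.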
For an overview of the science behind Petuum, our Research page provides links to 7 tutorials and presentations, and 8 accepted journal and conference papers.
Petuum allows you to run bigger models faster on less hardware, without compromising accuracy. Some speed highlights:
Cloud compute instances with 16 cores and 128GB memory are available for less than $2/hr from providers like Amazon and Google.
Beyond open source, Petuum has been used to develop industrial-strength solutions to big problems that take advantage of big hardware capacity:
Petuum comes from "perpetuum mobile," which is a musical style characterized by a continuous steady stream of notes. Paganini's Moto Perpetuo is an excellent example. In turn, Bösen and Strads embody the well-known Bösendorfer piano and Stradivarius violin -- fine instruments, played by Liszt and Paganini respectively. It is our goal to build a system that runs efficiently and reliably -- in perpetual motion.
Our paper, On Convergence of Model Parallel Proximal Gradient Algorithm for Stale Synchronous Parallel System, is accepted to AISTATS 2016! This work lays the theoretical foundations for combining model-parallelism (Strads) and Stale Synchronous Parallel (Bösen) to achieve even higher levels of performance than before. It will be featured in our upcoming 2nd-generation system, which mixes the best parts of Bösen and Strads and incorporates new productivity features such as container management and elasticity.
Our paper, STRADS: A Distributed Framework for Scheduled Model Parallel Machine Learning, is accepted to EuroSys 2016! This paper showcases our high-performance STRADS system, built to schedule and prioritize ML computations for maximum performance and minimum quality loss, achieving 10x or greater speedups over traditional, non-scheduled systems.
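The prioritization idea behind scheduled model parallelism can be sketched in a few lines. This is a toy illustration with hypothetical names, not the STRADS API: parameters whose recent updates changed the most are scheduled for the next round of work, so computation concentrates where it still improves the model.

```python
import heapq

# Toy sketch of priority-based parameter scheduling in the spirit of
# STRADS (hypothetical names, not the real API): rank parameters by
# the magnitude of their most recent update and work on the top k.

def schedule(recent_deltas, k):
    """Pick the k parameter indices with the largest recent change."""
    return heapq.nlargest(k, range(len(recent_deltas)),
                          key=lambda i: abs(recent_deltas[i]))

recent_deltas = [0.01, 1.5, 0.0, -2.0, 0.3]
print(schedule(recent_deltas, 2))  # parameters 3 and 1 changed most
```

A real scheduler must also avoid dispatching dependent (correlated) parameters to different workers in the same round; the sketch above shows only the prioritization half of the idea.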
We have posted a new arXiv paper, Strategies and Principles of Distributed Machine Learning on Big Data. This paper explains four keystone principles for building big distributed ML systems, distilled from our years of experience building Petuum. By composing these principles and strategies, significant and near-ideal speedup can be realized for not just one or two, but a wide range of ML algorithms. All research can also be found at our Research page.
As a follow-up to the v1.0 Poseidon release, we have posted an arXiv paper explaining the unique features of Poseidon, such as Distributed Wait-Free Backpropagation, Structure-Aware Communication, a 3-level CPU/GPU hybrid architecture, and Sufficient Factor Broadcasting. All research can also be found at our Research page.
Our paper Distributed Machine Learning via Sufficient Factor Broadcasting is now available as an arXiv preprint. This paper shows how, instead of relying on a centralized parameter server architecture, one can use compressed peer-to-peer (P2P) messages called "Sufficient Factors" to get a 10x speedup in Deep Learning (e.g. our Poseidon system) and other matrix-based applications such as sparse coding and multiclass logistic regression. Sufficient Factor Broadcasting complements the managed communication and Stale Synchronous Parallel methods previously developed in Petuum, enabling additional speedups on top of each other.
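The compression behind sufficient factors can be shown concretely. The sketch below is toy code, not the Petuum/Poseidon implementation: for models such as multiclass logistic regression, a single-sample gradient of the K x D weight matrix is a rank-1 outer product u v^T, so peers can broadcast the two vectors (K + D numbers) instead of the full K * D matrix and reconstruct the update locally.

```python
# Toy sketch of the "sufficient factor" idea for matrix-shaped updates
# (illustrative only; not the Petuum/Poseidon code). The sender
# transmits two vectors u (length K) and v (length D); the receiver
# rebuilds the rank-1 gradient u v^T and applies it to its own copy
# of the K x D weight matrix.

def outer(u, v):
    """Reconstruct the full rank-1 update u v^T at the receiver."""
    return [[ui * vj for vj in v] for ui in u]

# Sender side: broadcast only the sufficient factors (K + D numbers).
u = [0.5, -0.5]          # e.g. prediction error over K = 2 classes
v = [1.0, 2.0, 3.0]      # e.g. feature vector of dimension D = 3

# Receiver side: reconstruct the update and take a gradient step.
W = [[0.0] * 3 for _ in range(2)]
delta = outer(u, v)
for i in range(2):
    for j in range(3):
        W[i][j] -= delta[i][j]   # unit step size for illustration

print(W[0])  # [-0.5, -1.0, -1.5]
```

For large K and D the savings are substantial: a 10,000 x 10,000 update shrinks from 10^8 numbers to 2 x 10^4, which is what makes peer-to-peer broadcasting of updates affordable.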
We are pleased to announce the v1.0 release of Poseidon, our open-source distributed GPU deep learning platform built upon the popular Caffe framework! Poseidon inherits many functionalities and benefits of Petuum, including managed communication, bandwidth management, and sufficient factor broadcasting, and offers excellent scalability as GPU machines are added. Moreover, almost all the Caffe interfaces have been kept unchanged, so that existing Caffe users will find Poseidon easy to use.
Eric Xing gave a keynote speech at ACML 2015: How to Go Really Big in AI: Strategies & Principles for Distributed Machine Learning. This talk covers 4 major principles for developing distributed Machine Learning and AI systems for Big Data and Big Models, and provides an excellent summary of the driving philosophy behind Petuum's development! While at ACML, we also gave a tutorial: A New Look at the System, Algorithm and Theory Foundations of Distributed Machine Learning.
Our friends at Microsoft have released their Distributed ML Toolkit! This Toolkit has proud origins in our LightLDA (an extremely fast topic model Gibbs sampler) collaboration published at WWW 2015 and our Bösen parameter server system published at NIPS 2013, AAAI 2014 and SoCC 2015. We are ecstatic to see this continued industrial adoption of Petuum, and we will soon be offering NLP services based on LightLDA - please continue to watch this space!
Our paper Managed Communication and Consistency for Fast Data-Parallel Iterative Analytics has won the best paper award at SOCC'15! This work shows how consistency models and network bandwidth management, built on top of the Petuum Bösen system, can be combined to improve the performance of data-parallel ML, without sacrificing correctness.
Our paper Petuum: A New Platform for Distributed Machine Learning on Big Data is accepted to the inaugural issue of IEEE Transactions on Big Data! This work is a longer journal version of our KDD'15 paper, and provides a self-contained introduction to the Petuum platform, with new theoretical results on correctness and speed.
Eric Xing and Qirong Ho gave a KDD 2015 tutorial: A New Look at the System, Algorithm and Theory Foundations of Distributed Machine Learning.
Eric Xing and Qirong Ho gave an IJCAI 2015 tutorial: A New Look at the System, Algorithm and Theory Foundations of Distributed Machine Learning.
Eric Xing gave a talk at the Data Science Summit 2015: Petuum: A New Platform for Distributed Machine Learning on Big Data.
Petuum v1.1 officially released on GitHub. Highlights: Java language support, YARN+HDFS support for running on Hadoop clusters, distributed GPU support for the CNN app.
Eric Xing gave a WWW 2015 tutorial: The Algorithm and System Interface of Distributed Machine Learning.
Our paper Petuum: A New Platform for Distributed Machine Learning on Big Data is accepted to KDD'15! This work ties together the data- and model-parallel aspects of Petuum as a unified framework for large-scale distributed ML.
Eric Xing gave a WSDM 2015 Winter School talk: The Algorithm and System Interface of Distributed Machine Learning.
Our LightLDA paper is accepted to WWW'15! LightLDA is the world's biggest topic model, running on standard hardware that costs 100x less than competing systems. Coming soon: a large-scale LDA server powered by Petuum LightLDA, that you can try from your browser!
Petuum v1.0 officially released on GitHub. Highlights: many new ML algorithms (our library is over 2x larger!), improved Parameter Server performance, and a new interface for Strads.
Eric Xing gave a keynote speech at the Big Data Technology Conference 2014 in China: A New Platform for Cloud-Based Distributed Machine Learning on Big Data.
Petuum beta release v0.9 is officially on GitHub. Highlights: improved SSP parameter server performance, parameter server snapshots for fault recovery, single-machine out-of-core parameter server, new Strads API, and Logistic Regression application.
Eric Xing gave a keynote speech at ParLearning 2014: On the Algorithmic & System Interface of Big Learning.
Petuum alpha release v0.2 is officially on GitHub. Highlights: new built-in applications, better documentation, an improved Parameter Server, and the Strads variable scheduler.
Petuum alpha release v0.1 is officially on GitHub.
Qirong Ho gave an oral presentation at NIPS 2013 on More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server.