
John's Soapbox

  1. It would benefit greatly from feedback from the community of researchers who might want to use it. Release early and often, as they say.
  2. I am teaching a class at Berkeley called Deep Reinforcement Learning, and I want to make CGT available as a well-supported option that students can use. [That said, students will also be free to use other libraries like Theano, Caffe, or Torch, or roll everything from scratch. If you're interested in the course, watch our webpage for updates. The class starts this Wednesday (August 26th), and we'll be posting notes, code, and lecture videos.] You can expect to see a whole menagerie of Deep RL algorithms implemented in CGT in the coming months.

CGT makes it possible to work with large recurrent networks, unrolled across time, without having to worry about compilation time. In our examples directory, you can find a working implementation of the Neural Turing Machine. This implementation operates on batches of inputs, in contrast to all of the other open-source reimplementations I am aware of. In this file, the whole process of constructing the computation graph and compiling it to a callable function takes 8s for a computation graph with 15,000 nodes (each corresponding to an operation). Also in the examples directory, you can find a reimplementation of Karpathy's char-rnn code, where an unrolled deep LSTM with 18,000 operations is compiled in 7s.
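To make "unrolled across time" concrete, here is a minimal NumPy sketch (not CGT's API) of a vanilla RNN forward pass. In a symbolic framework, each loop iteration adds a fixed number of nodes to the computation graph, so graph size, and hence compilation work, grows linearly with the number of timesteps:

```python
import numpy as np

def rnn_unroll(x_seq, W_xh, W_hh, h0):
    """Unroll a vanilla RNN across time: one affine step + tanh per timestep.

    In a symbolic framework, each iteration would append nodes to the
    computation graph, so a T-step unroll yields a graph of size O(T).
    """
    h = h0
    hs = []
    for x_t in x_seq:                       # one graph "layer" per timestep
        h = np.tanh(x_t @ W_xh + h @ W_hh)  # affine transform + nonlinearity
        hs.append(h)
    return hs

rng = np.random.default_rng(0)
T, n_in, n_hid = 5, 3, 4
x_seq = [rng.standard_normal(n_in) for _ in range(T)]
W_xh = rng.standard_normal((n_in, n_hid))
W_hh = rng.standard_normal((n_hid, n_hid))
hs = rnn_unroll(x_seq, W_xh, W_hh, np.zeros(n_hid))
```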

If you are interested in giving CGT a spin, check out the examples directory, which includes the following:

Or, start simple with the Tutorial.

You may be wondering how CGT currently stacks up against Theano with regard to compilation time and runtime. Here are the results from some simple examples where I've implemented the exact same model in Theano and CGT. First, I'll show some benchmark results for feedforward networks that operate on MNIST-size inputs and run on the CPU.

Fully-connected Network

  Library / setting    Runtime
  Theano               0.22s
  CGT, sequential      0.24s
  CGT, num_threads=4   0.18s

See examples/

ConvNet

  Library / setting    Runtime
  Theano               55s
  CGT, sequential      5.23s
  CGT, num_threads=4   1.84s

See examples/

Apparently Theano has very slow CPU convolutions, at least on the platform I am using. CGT uses Caffe's im2col approach for convolutions on the CPU, and cuDNN on the GPU (though the API for GPU usage is not quite ready).
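The im2col trick turns convolution into a single matrix multiply by laying each input patch out as a row of a matrix. Here is a minimal NumPy sketch (2-D, single channel, stride 1, no padding; an illustration of the idea, not Caffe's or CGT's actual code):

```python
import numpy as np

def im2col(img, kh, kw):
    """Lay out every kh x kw patch of a 2-D image as a row of a matrix."""
    H, W = img.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((out_h * out_w, kh * kw))
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = img[i:i + kh, j:j + kw].ravel()
    return cols

def conv2d_im2col(img, kernel):
    """Convolution (cross-correlation) as one matrix-vector product."""
    kh, kw = kernel.shape
    out_h, out_w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    return (im2col(img, kh, kw) @ kernel.ravel()).reshape(out_h, out_w)

img = np.arange(16.0).reshape(4, 4)
kernel = np.ones((2, 2))      # each output entry = sum of a 2x2 patch
out = conv2d_im2col(img, kernel)
```

The payoff is that the heavy lifting becomes one large matrix multiply, which BLAS libraries execute far faster than a naive nested-loop convolution.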

I ran another experiment with a gated recurrent unit network unrolled across time. In the table below, the top row shows the number of timesteps that the computation was unrolled for. Theano fails when the computation has more than 30 timesteps, throwing a "maximum recursion depth exceeded" exception. While Theano provides a Scan operator which could allow this GRU network to be used for more timesteps, Theano takes prohibitively long to compile larger/deeper recurrent models when using Scan. CGT's graph optimization uses a non-recursive algorithm whose time is linear in the size of the graph. (CGT works for at least 2000 timesteps of this model.)

Benchmark with GRU (times in seconds)

  Library / setting    T=10           T=20           T=30           T=40           T=80
                       Compile  Run   Compile  Run   Compile  Run   Compile  Run   Compile  Run
  Theano               5.7      0.8   11.2     1.7   19.7     2.8   FAIL           FAIL
  CGT, sequential      0.3      1.0   0.7      1.9   1.0      2.9   1.3      4.2   2.7      8.5
  CGT, num_threads=4   0.3      0.56  0.6      1.1   1.0      1.6   1.3      2.2   2.7      4.7

See examples/bench/ and examples/bench/

These results were obtained with single-threaded BLAS on my quad-core laptop. On my machine, CGT with num_threads=4 is also faster than Theano on these examples when using a multi-threaded BLAS (vecLib).
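The non-recursive traversal mentioned above can be sketched in plain Python: an explicit stack replaces the call stack, so even a chain graph far deeper than the interpreter's recursion limit is ordered in time linear in the graph size. This is a sketch of the general technique, not CGT's actual code; `parents` is a hypothetical function mapping a node to its inputs:

```python
import sys

def topo_order(outputs, parents):
    """Parents-before-children ordering of a DAG using an explicit stack.

    Runs in O(nodes + edges) and never deepens the Python call stack,
    so it is immune to "maximum recursion depth exceeded" errors.
    """
    order = []
    seen = set(outputs)
    stack = [(n, iter(parents(n))) for n in outputs]
    while stack:
        node, it = stack[-1]
        advanced = False
        for p in it:
            if p not in seen:            # descend into an unvisited parent
                seen.add(p)
                stack.append((p, iter(parents(p))))
                advanced = True
                break
        if not advanced:                 # all parents done: emit this node
            stack.pop()
            order.append(node)
    return order

# A chain graph far deeper than Python's default recursion limit:
N = 10 * sys.getrecursionlimit()
order = topo_order([N], lambda n: [n - 1] if n > 0 else [])
```

A naive recursive depth-first traversal of the same chain would raise RecursionError long before reaching node 0.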

Q & A

Here, I'll address some questions that people might ask about CGT and this endeavor.

You're reimplementing much of Theano's functionality as part of this effort. Why didn't you just contribute to Theano development instead?

We are making some changes at the very core of the software (such as the graph data structure itself), which will make new functionality possible and also allow for a much cleaner codebase. We explain some of these changes more thoroughly here: Why Not Build on Theano?

New deep learning libraries are announced every other week, and it's getting tiresome. Why another one?

CGT is not a deep learning library: it provides general functionality for automatic differentiation and efficient execution of computations involving tensors. Currently the only comparable software occupying this niche is Theano. We hope that libraries will be built on top of CGT, as they have been built on Theano (even outside of the realm of neural networks and deep learning, e.g., PyMC3).
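To illustrate the core functionality in question, here is a toy scalar reverse-mode automatic differentiation sketch (CGT and Theano do this symbolically on tensors; the class and function names here are hypothetical, for illustration only):

```python
class Var:
    """A scalar node in a computation graph."""
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0
    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])
    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

def backprop(out):
    """Reverse-mode AD: visit nodes in reverse topological order,
    accumulating d(out)/d(node) into node.grad."""
    order, seen = [], set()
    def visit(n):
        if id(n) not in seen:
            seen.add(id(n))
            for p, _ in n.parents:
                visit(p)
            order.append(n)
    visit(out)
    out.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad

x, y = Var(3.0), Var(4.0)
z = x * y + x          # dz/dx = y + 1 = 5, dz/dy = x = 3
backprop(z)
```

The same chain-rule bookkeeping, applied to tensor operations and followed by graph optimization and code generation, is the niche CGT occupies.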

Part of my motivation for developing CGT was to provide a base layer for developing a library for implementing the algorithms from this paper on stochastic computation graphs, which provide a generalization of backpropagation that includes policy gradient and variational inference methods as special cases. These algorithms require various queries on the graph, which can only be implemented straightforwardly for a flat computation graph, i.e., one that doesn't contain composite operations like Scan. Theano can only handle recurrent networks via Scan, so it was unsuitable for implementing this library. I hope that the computation graph representation used by CGT will be helpful for implementing other algorithms that go beyond just computing gradients, for example this recent paper.

I also believe that the usefulness of software tools is usually greatly underrated. Better tools can act as a significant multiplier on everyone's productivity, and the right tools can make it easier for researchers to share code with each other. Code should be concise and readable (closely resembling the underlying math and algorithms), and should have light dependencies.

Can I help?

Yes! There's lots to help with! Take a look at our Issues page, and feel free to post on the cgt-devel discussion group.

I downloaded your code and ran into problem XYZ.

Please post to the cgt-users discussion group.

Are you planning to turn CGT into a commercial product?

CGT is MIT-licensed, and I hope it is of interest to people in academia as well as industry. I personally have no plans to commercialize it.

What about GPU support?

GPU and multi-GPU computation has been a core consideration in CGT's design from day one. Usage of GPUs is currently not documented, and some work is needed to straighten out the API, but the basic scaffolding is in place for transporting data to and from the GPU, calling libraries like cuBLAS and cuDNN, and compiling kernels on the fly. We plan to substantially improve GPU and multi-GPU support in the coming weeks and months. So far, the GPU implementations use CUDA, but we are glad to accept code contributions providing OpenCL support, which should be doable given how CGT's code generation works.
