Teaching machines to render
I’ve been studying AI tech a fair bit recently for a variety of reasons. There are lots of areas I want to explore using AI as solvers/approximators, but as someone who is generally employed to do graphics stuff, there’s always an interest in applying AI technology to rendering. Currently, except for a few papers on copying artistic styles onto photos, it’s not yet a major discipline.
The really big thing to take away from AI like deep neural nets is that they are approximate function solvers, taught by being shown data rather than being explicitly programmed. Once trained, they evaluate the inputs through their learnt ‘function’ and give you a result. With enough nodes and layers in the network and a lot of training, they can in principle approximate any function you have existing data for.
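To make the "taught by data" idea concrete, here is a minimal sketch in plain Python, not taken from my own library or from TensorFlow. The network size, the learning rate, the target function x*x and the name train_tiny_net are all assumptions picked purely for illustration. It fits a one-hidden-layer network to sampled points and reports the error before and after training.

```python
import math
import random


def train_tiny_net(samples, hidden=8, epochs=300, lr=0.05, seed=0):
    """Fit (x, target) samples with one tanh hidden layer using plain SGD.

    Returns (predict, initial_error, final_error). All sizes and rates are
    illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]  # input -> hidden
    b1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]  # hidden -> output
    b2 = 0.0

    def forward(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
        y = sum(w2[j] * h[j] for j in range(hidden)) + b2
        return h, y

    def mean_squared_error():
        return sum((forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)

    initial_error = mean_squared_error()
    for _ in range(epochs):
        for x, t in samples:
            h, y = forward(x)
            err = y - t  # derivative of 0.5 * (y - t)^2 with respect to y
            for j in range(hidden):
                # Backpropagate through tanh before touching w2[j].
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * grad_h * x
                b1[j] -= lr * grad_h
            b2 -= lr * err
    return (lambda x: forward(x)[1]), initial_error, mean_squared_error()


# The 'existing data': samples of the function we want the net to learn.
samples = [(-1.0 + 2.0 * i / 40, (-1.0 + 2.0 * i / 40) ** 2) for i in range(41)]
predict, initial_error, final_error = train_tiny_net(samples)
print("error before:", initial_error, "error after:", final_error)
```

Nothing in the code knows that the target is x*x; it only ever sees the samples, which is the whole point.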
In rendering, the classic Kajiya rendering equation is what real-time and offline renderers attempt to solve. The reason rendering takes so much compute power is that directly solving the equation is infeasibly complex, and the approximations we use have massive dimensional complexity.
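For reference, the Kajiya rendering equation is commonly written in the hemispherical form below. The symbol names are the usual textbook ones rather than anything specific to this post: L_o is outgoing radiance at point x in direction omega_o, L_e is emitted radiance, f_r is the BRDF, L_i is incoming radiance, and n is the surface normal.

```latex
L_o(x, \omega_o) = L_e(x, \omega_o)
  + \int_{\Omega} f_r(x, \omega_i, \omega_o)\, L_i(x, \omega_i)\, (\omega_i \cdot n)\, \mathrm{d}\omega_i
```

The incoming radiance L_i at one point is itself the outgoing radiance L_o from some other point, so the integral is recursive, which is where the dimensional complexity comes from.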
So the question is: can we replace parts (all?) of the Kajiya equation with a trained deep AI?
A rule of thumb in deep AI is that if humans can do something in 0.1 seconds, then it’s tractable with the layer and node counts we have now. We know that many artists are capable of a good approximation to Kajiya in real time, so it seems to imply that a neural network might be able to do parts of it.
Teaching an AI to paint/shade is a pretty unexplored area, as at this point most uses of deep AI reduce complex data, whereas paint/shade is data expansion.
TensorFlow
TensorFlow is Google’s open source machine learning system. It consists of a Python/C++/CUDA framework for manipulating and training AI systems. My explorations of AI required me to get some underlying ideas about how it worked at the machine level, so contrary to most introductions I wrote a small library in C++ to get a feel for the data at the low level. Whilst in practice I won’t use it for anything serious and am instead using TensorFlow, going through the exercise of writing my own models, data normalisation and training at a low level helps me understand how TensorFlow and other libraries actually work.
The TensorFlow tutorials cover using it for the MNIST classifier problem. MNIST is a database of handwritten digits, and the AI is trained to output the actual number each image represents. Each entry in the training set consists of a 28x28 greyscale image and the number it represents. A deep ANN (Artificial Neural Network), usually with convolution layers as well, is then trained, and the score shows how well the AI is doing.
This is classic deep AI: taking large structured data (pictures, audio, words) and reducing it down to a simpler classification. In the MNIST case it takes 784-dimensional data (the 28x28 input image) and reduces it down to one of the 10 digits.
This is a relatively simple problem for AI, hence its use as a classic tutorial, but even so training can take a while. TensorFlow has a GPU backend, but it currently only supports CUDA under Linux, so using Apple’s AMD GPUs on OS X rules the GPU backend out completely!
What I’m currently working on is the first step in my paint/shade AI idea. I’m creating a backwards MNIST: the aim is to teach the AI that, given a numeric digit, it should output a 28x28 image. TensorFlow’s Python interface makes it easy to try different models out, so I’m hopefully converging on a working solution.
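The shape of the backwards MNIST problem can be sketched in plain Python as below. This is not my TensorFlow model and it does not load the real MNIST data: placeholder_image is a hypothetical stand-in that draws a bright band per digit, and the single sigmoid layer, the learning rate and the names DigitPainter and train are assumptions chosen to keep the example tiny. What it does show is the data expansion, a 10-element one-hot input going to 784 output pixels.

```python
import math

SIDE = 28
PIXELS = SIDE * SIDE  # 784 outputs, one per pixel
DIGITS = 10           # 10 inputs, a one-hot encoding of the digit


def one_hot(digit):
    return [1.0 if i == digit else 0.0 for i in range(DIGITS)]


def placeholder_image(digit):
    """Hypothetical stand-in for an MNIST image: rows 2*digit+4 and 2*digit+5 are lit."""
    top = 2 * digit + 4
    return [1.0 if top <= p // SIDE < top + 2 else 0.0 for p in range(PIXELS)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class DigitPainter:
    """One dense layer plus sigmoid: 10 inputs expanded to 784 pixel intensities."""

    def __init__(self):
        self.weights = [[0.0] * PIXELS for _ in range(DIGITS)]
        self.bias = [0.0] * PIXELS

    def paint(self, digit):
        x = one_hot(digit)
        return [
            sigmoid(self.bias[p] + sum(x[d] * self.weights[d][p] for d in range(DIGITS)))
            for p in range(PIXELS)
        ]

    def error(self):
        """Mean squared pixel error over all ten placeholder images."""
        total = 0.0
        for digit in range(DIGITS):
            target = placeholder_image(digit)
            total += sum((o - t) ** 2 for o, t in zip(self.paint(digit), target))
        return total / (DIGITS * PIXELS)


def train(painter, epochs=30, lr=1.0):
    """Plain SGD on the per-pixel cross-entropy loss (gradient is output - target)."""
    for _ in range(epochs):
        for digit in range(DIGITS):
            x = one_hot(digit)
            target = placeholder_image(digit)
            output = painter.paint(digit)
            for p in range(PIXELS):
                grad = output[p] - target[p]
                painter.bias[p] -= lr * grad
                for d in range(DIGITS):
                    if x[d]:  # only the active input's weights receive gradient
                        painter.weights[d][p] -= lr * grad * x[d]


painter = DigitPainter()
error_before = painter.error()
train(painter)
error_after = painter.error()
print("error before:", error_before, "error after:", error_after)
```

With a one-hot input and a single layer this amounts to learning one template image per digit, so a real attempt needs more varied inputs and deeper models; the sketch is only meant to show the direction of the mapping.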
The compute problem tempts me to try cloud GPU vendors. Even a few hours on 4x NVIDIA Titans would make my life much easier. If anyone has any recommendations, I’d be happy to have them. I’m deliberately trying to avoid getting sucked into seeing how fast I can make TensorFlow run on my machines, as I want to work on the algorithm side of things, not the innards of the math kernels, but when it takes hours for even simple training it’s tempting to see how I could improve things…
Where I want to go
Where I want it to go is to train an AI that, given some basic scene data as input, is able to render an image without being programmed. I seriously doubt it will be faster than traditional rendering by a wide margin, but it would offer the interesting possibility of copying the look of movie or offline rendering onto new data sets. Take something like Ambient Occlusion, which is extremely expensive to render: if we can train an AI to do a ‘best guess’, it may lead to new approaches to the hard problems that the Kajiya equation gives us.
Of course I know most people will think it’s nuts and an arse-backwards way of working, but it’s interesting to me and I like thinking laterally :)