MindsEye Artist’s Kit

Andrew Charneski
10 min read · Apr 24, 2018

Hello! Today we’ll be learning about the MindsEye Artist’s Kit, a simple way for nearly anyone to produce their own deep learning artwork! Our goal is to make this easy to use and simple to set up, using remote execution on Amazon EC2 so that no special hardware is required. Only basic software skills, or a willingness to learn, are needed.

Although EC2 is a paid service, the prices are economical. We use p3.2xlarge instances, which cost about a nickel per minute. That may sound steep, but this instance seems to perform almost 20x faster than my desktop, so the roughly $3 for an hour of compute is comparable to the $2–3 of energy my home machine costs per day (a 1 kW PC running at 100% for 24 hours at $0.12 per kWh comes to $2.88; in practice machines use power-saving modes when idling or sleeping). Even though the price is reasonable, and even though the software covered here does its best to clean up the instances after use, it is a good idea to double-check the AWS console to make sure you haven’t left any machines running when you are done!

Art Examples

What kind of images can we generate with this project? Here are a few examples. Keep in mind these are fairly raw first-generation examples. Your work could be way cooler, and I can’t wait to see it!

  • This high-resolution image of the Earth has been processed using the Deep Dream algorithm, which enhances latent patterns found in the image:
  • Here is an animation of several different style transfer results, each inspired by the style of a different painting, each rendering a content image of the Taj Mahal:
  • Abstract textures, with optional support for tiling, can also be generated:
  • These techniques can also be hybridized to create computer-generated abstract art:

Background

At the fundamental level, we are using neural networks. Neural networks are an artificial intelligence technique inspired by the biological brain found inside most humans. Each neuron individually processes only a tiny amount of data, but through many layers of interconnections more advanced behaviors can emerge. The mechanics of a single layer of neurons work largely on the principles of linear algebra, and the network is generally trained using backpropagation.

When processing visual data, however, we can take advantage of self-similarity within the data: a shape in the upper-left of one image is still the same shape when it appears in the lower-right of another, and pixels that are next to each other tend to be of similar color. These spatial relationships allow us to impose this symmetry on the network itself, creating what is known as a convolutional network. Convolutional networks are great at processing image data and are known to pick out edges, textures, and shapes at increasing levels of complexity.

One of our standard networks is VGG19, a pre-trained model built by researchers at Oxford for the ImageNet Challenge 2014 image recognition dataset. This “deep” convolutional network consists of 20–70 layers, depending on how you count them, and is nearly a gigabyte in size. It was pre-trained for image classification, but we are going to re-use it for art generation by extracting the first few layers as “feature detection layers”. These first layers have implicitly been trained to respond to details that are relevant to object detection, with the idea that these are also the details that matter most to the human visual system.

One of the first creative uses for these feature layers was the Deep Dream algorithm. This method modifies an image so as to maximize the L2 norm of the activation of a given layer. Maximizing that layer’s signal visually enhances whatever textures the layer was detecting. The method can also use a weighted combination of the signal strengths of multiple layers, providing some control over the result.
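
To make that concrete, here is a minimal, self-contained Java sketch of the quantity being maximized — a weighted sum of squared L2 norms of per-layer activations. This is an illustration of the idea only, not the MindsEye implementation; in practice the image pixels are then adjusted to increase this value.

    // Illustrative sketch of the Deep Dream objective (not the MindsEye code):
    // a weighted sum of the squared L2 norms of per-layer activations.
    public class DeepDreamObjective {

      /** Squared L2 norm of a single layer's activations. */
      static double squaredL2(double[] activations) {
        double sum = 0.0;
        for (double a : activations) {
          sum += a * a;
        }
        return sum;
      }

      /** Weighted combination over several layers, as described above. */
      static double objective(double[][] layerActivations, double[] layerWeights) {
        double total = 0.0;
        for (int i = 0; i < layerActivations.length; i++) {
          total += layerWeights[i] * squaredL2(layerActivations[i]);
        }
        return total;
      }

      public static void main(String[] args) {
        double[][] activations = {{0.2, -1.0, 0.5}, {1.5, 0.1}};
        double[] weights = {1.0, 0.5};
        System.out.println("objective = " + objective(activations, weights));
      }
    }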

A more flexible method uses an input “style” image to gather inspiration from. This style image is pre-processed by the feature layers, and then we collect metrics about these signals. Specifically, we track the mean value of each feature channel (similar to a color channel, but representing a larger “thing” such as a texture or a shape) and the mean of all cross-products, technically known as a Gram Matrix. By recording these metrics and then optimizing an image to have similar metrics, we can encourage any input image to have a similar distribution of shapes and textures; i.e. style transfer!
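
For the curious, here is a minimal plain-Java sketch of those style metrics: per-channel means and the Gram matrix of mean channel cross-products. Again, this is just an illustration, not the MindsEye code.

    // Illustrative sketch of the style metrics described above (not the MindsEye code):
    // per-channel means and the Gram matrix of mean channel cross-products.
    public class StyleMetrics {

      /** activations[position][channel] -> mean value of each feature channel. */
      static double[] channelMeans(double[][] activations) {
        int channels = activations[0].length;
        double[] means = new double[channels];
        for (double[] position : activations) {
          for (int c = 0; c < channels; c++) {
            means[c] += position[c];
          }
        }
        for (int c = 0; c < channels; c++) {
          means[c] /= activations.length;
        }
        return means;
      }

      /** activations[position][channel] -> Gram matrix of mean cross-products. */
      static double[][] gramMatrix(double[][] activations) {
        int channels = activations[0].length;
        double[][] gram = new double[channels][channels];
        for (double[] position : activations) {
          for (int i = 0; i < channels; i++) {
            for (int j = 0; j < channels; j++) {
              gram[i][j] += position[i] * position[j];
            }
          }
        }
        for (int i = 0; i < channels; i++) {
          for (int j = 0; j < channels; j++) {
            gram[i][j] /= activations.length;
          }
        }
        return gram;
      }
    }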

This is, of course, only a very brief theoretical overview. There are many other pre-trained networks out there, and there are many variations on these techniques. There is nearly endless potential here for mathematically-based creativity!

Requirements

By using EC2 and Java, we can reduce the requirements for running this project down to commonly available and free components. These requirements are largely the same as for any Java software project:

  1. An active AWS account, with root user and/or full permissions. This account itself is free to sign up for, though the machine time is a paid resource.
  2. AWS Command Line Tools — Used to configure the local AWS credentials.
  3. A Git client — Used to retrieve and manage the code for this project.
  4. Java 8+ JDK — Code is built locally and deployed automatically, so you need the tools to build Java.
  5. IntelliJ (or compatible tool such as Eclipse or Maven) — Needed to open, build, and run the project. Further instructions assume the reader is using IntelliJ.

Environment Setup

First, let’s assume you don’t have all the requirements installed. All you have is a computer, the internet, an email address, a web browser, and a credit card. No problem! You just need to set up three main things:

Amazon Web Services provides cloud computing facilities, including state-of-the-art hardware for deep learning with CuDNN.

  1. Sign up for an AWS Account
  2. Install the AWS Command Line Tools
  3. Configure your local AWS credentials (typically by running aws configure and entering your access key and secret key)

Git is a popular distributed version control system used to manage source code.

  1. Make a free GitHub account — This is needed to fork the mindseye-art project, if you want to post your own code modifications online.
  2. Install a Git client — This is needed to download the mindseye-art project.
    Alternatively, GitHub provides a nice graphical user interface client.

Java Development Tools are needed, including:

  1. Java 8 JDK — The Java Development Kit is used to build Java source code
  2. IntelliJ — The community edition is free and works great; MindsEye was developed using it.

Project Setup

Pulling the project is as simple as any other Java project:

  1. Fork the project — This lets you publish your own work. This is optional and you can do it later, but it is easier to do as step #1.
  2. Clone the project — Use Git to download the project to your machine
  3. Load the project in IntelliJ — This will download all other dependencies

Execution

Execution of any script is easy:

  1. Select a script — for example, com.simiacryptus.mindseye.deep_dream.Simple
  2. Edit the script if desired, for example by editing these common parameters (see the sketch after this list):
    Verbosity — If set to true, the report will include details of the training process used to generate the images.
    Max Iterations — Sets the maximum number of iterations used in each training phase.
    Training Minutes — Sets the maximum number of minutes used in each training phase.
    Input Image — Each of these scripts uses one or more input images.
    Output Resolution — The resolution of the output image. Most scripts risk running out of memory at a resolution of around 1000px; the HiDef script variants should be used to deal with this limit.
  3. Run EC2 entry point — In IntelliJ, you can right click the EC2 inner class and select “Run Simple$EC2.main()”
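
As an aside, the common parameters listed above generally appear as plain Java values near the top of each script. The field names below are hypothetical and vary from script to script; this is only a sketch of the kind of edit involved, not the actual source.

    // Hypothetical sketch of the kind of parameter edits described above;
    // the real scripts use their own field names and types.
    public class ExampleSettings {
      boolean verbose = true;          // include training details in the report
      int maxIterations = 50;          // cap on iterations per training phase
      double trainingMinutes = 10;     // cap on wall-clock minutes per training phase
      String inputImage = "file:///path/to/content.jpg";  // input image location
      int outputResolution = 800;      // keep below ~1000px unless using a HiDef variant
    }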

First Time Setup

When run, the EC2Runner class looks for user-settings.json and ec2-settings.json; if they are not found, it starts a bootstrapping process to initialize itself. This is mostly automated, as long as AWS is set up with full permissions for the user.

First, the user is prompted for an email address via standard input. There is no GUI for this, so simply type your email into the console window and press enter. This address will be verified via AWS, which sends a confirmation email to the provided address asking you to enable messages.

Next, using the AWS user configured via the CLI, the application sets up various AWS entities to support execution, including:

  1. S3 Bucket — This stores published results and code deployed to EC2
  2. IAM role — A non-administrative role is configured for the EC2 node
  3. EC2 Security Group — Configures networking security
  4. SSH Keys — Used to control the EC2 node once launched

Monitor Execution

Once the initialization logic completes, the remote process is launched. This is a complex, multi-stage process involving launching a new EC2 instance, connecting to it, deploying code, and managing remote execution. This is largely transparent to the user, who will monitor the process over these stages:

  1. Browser Windows Open — The process will open two browser windows; the first displays the logged progress of the local process, which dispatches the remote task. The second browser window opens after the remote process has been started, and displays the output of the main script being remotely run.
  2. A “Start” Email is Sent — The user is notified by email with links to monitor output progress and to manage the node in the AWS Console.
  3. A “Finished” Email is Sent — This email includes the full output of the script, with appended links to the HTML, PDF, and ZIP formatted results.

At the end of execution, the instance should automatically shut down. It can also be manually terminated using the AWS Console, which is linked in the “start” email.

The MindsEye Artist’s Kit includes a variety of starter scripts, currently available in 3 families and 5 sub-types:

Script Families

Deep Dream — This process enhances latent patterns within an input image. This script uses the following parameters:

  1. Content Image — A single input image is given, which is input directly and gradually altered.
  2. Per-layer “mean” coefficient — These coefficients anchor the result to the original content by providing a penalty for L2 deviations from the ground-truth signal at a given layer.
  3. Per-layer “gain” coefficient — These coefficients determine the strength of amplification for each layer’s signal (see the sketch after this list).
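
Putting these coefficients together, one plausible form of the Deep Dream objective — a sketch consistent with the descriptions above, not necessarily the exact implementation — is:

    \max_{x} \sum_{l} \Big( \mathrm{gain}_l \,\lVert a_l(x) \rVert_2^2 \;-\; \mathrm{mean}_l \,\lVert a_l(x) - a_l(x_0) \rVert_2^2 \Big)

where a_l(x) is the layer-l feature signal of the working image x, and x_0 is the original content image.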

Style Transfer — This combines two images, using the content of one image and the style of another to render a unique combination of the two. This script uses the following parameters:

  1. Content Image — The primary input image determines the content. This is first passed through a degradation process to initialize the output image, which is then evolved using the feature signals of the undegraded input.
  2. Style Image — One or more style images are also input; these are pre-processed to gather aggregate metrics which describe the overall patterns and textures contained within.
  3. Per-layer “mean” coefficient — These coefficients determine how tightly to match the mean values of each feature channel on the given layer with the target style.
  4. Per-layer “cov” coefficient — These coefficients determine how tightly to match the Gram matrices of each feature channel on the given layer with the target style.
  5. Per-layer “gain” coefficient — These coefficients add the components used in Deep Dream, causing signal amplification at each layer where they are set (see the sketch after this list).
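
As with Deep Dream, here is a rough sketch of the combined objective these coefficients describe (the exact weighting in the code may differ):

    \min_{x} \sum_{l} \Big( \mathrm{mean}_l \,\lVert \mu_l(x) - \mu_l(s) \rVert_2^2 \;+\; \mathrm{cov}_l \,\lVert G_l(x) - G_l(s) \rVert_F^2 \;-\; \mathrm{gain}_l \,\lVert a_l(x) \rVert_2^2 \Big)

where μ_l and G_l are the per-channel means and Gram matrix of the layer-l features (as in the StyleMetrics sketch earlier), and s is the style image.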

Texture Generation — Very similar to style transfer, this process lacks a content image and thus generates abstract imagery. This script uses the following parameters:

  1. Style Image — One or more style images are input; these are pre-processed to gather aggregate metrics which describe the overall patterns and textures contained within.
  2. Per-layer “mean” coefficient — These coefficients determine how tightly to match the mean values of each feature channel on the given layer with the target style.
  3. Per-layer “cov” coefficient — These coefficients determine how tightly to match the Gram matrices of each feature channel on the given layer with the target style.
  4. Per-layer “gain” coefficient — These coefficients add the components used in Deep Dream, causing signal amplification at each layer where set.

Script Sub-Types

Each family of script is provided in several forms, each supporting a slightly different use-case:

  1. Simple — An example is provided that is as simple as possible, using a single phase and a single set of inputs.
  2. Enlarging — The basic process is repeated over several iterations, between which the working image is gradually enlarged. This can provide new behavior, for example by combining multiple scales and resolutions.
  3. ParameterSweep — This repeats the basic process over a range of input parameters, displaying the resulting progression as a formatted table and as an animation.
  4. StyleSurvey — Only applicable to Style Transfer and Texture Generation, this script iterates over a collection of style images to display a variety of output images, formatted as a table and as an animation.
  5. HiDef — These scripts include special logic for processing high-resolution images. This is generally done by breaking the calculation down at some level to process using image tiles (see the sketch after this list).
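
As a generic illustration of that tiling idea (not the MindsEye code), a large image can be split into fixed-size tiles, each small enough to process within memory limits:

    import java.awt.image.BufferedImage;
    import java.util.ArrayList;
    import java.util.List;

    // Generic illustration of tiling a large image so each piece fits in memory.
    public class Tiles {
      static List<BufferedImage> split(BufferedImage image, int tileSize) {
        List<BufferedImage> tiles = new ArrayList<>();
        for (int y = 0; y < image.getHeight(); y += tileSize) {
          for (int x = 0; x < image.getWidth(); x += tileSize) {
            int w = Math.min(tileSize, image.getWidth() - x);
            int h = Math.min(tileSize, image.getHeight() - y);
            tiles.add(image.getSubimage(x, y, w, h));
          }
        }
        return tiles;
      }
    }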

Script Output Examples

For illustration, here are some prior runs of each script. Keep in mind these are works in progress. Feel free to submit your results and improvements!

Deep Dream

  1. Simple: Zip · PDF · HTML
  2. High Resolution: Zip · PDF · HTML

Style Transfer

  1. Simple: Zip · PDF · HTML
  2. Enlarging: Zip · PDF · HTML
  3. Style Survey: Zip · PDF · HTML
  4. Parameter Sweep: Zip · PDF · HTML
  5. High Resolution: Zip · PDF · HTML

Texture Generation

  1. Simple: Zip · PDF · HTML
  2. Enlarging: Zip · PDF · HTML
  3. Style Survey: Zip · PDF · HTML
  4. Parameter Sweep: Zip · PDF · HTML
  5. High Resolution: Zip · PDF · HTML

Further Reading

  1. Simia Cryptus MindsEye — The parent project, focusing on Java 8 Neural Networks
  2. Original Style Transfer Paper
  3. Original Google Deep Dream Blog Post
  4. CuDNN powers most of the heavy compute
  5. Aparapi is another supported tool for GPU-accelerated layers
  6. Google Arts & Culture — Excellent resource for inspiration

Enjoy!

Andrew Charneski

Big Data Engineer and Artificial Intelligence Researcher