Classify a satellite image using a convolutional neural network

Thanks to Koray Kavukcuoglu for answers to these questions. Thanks also to Marc’Aurelio Ranzato and Yann LeCun.

How do you design the connection tables between the network layers?

If an image has multiple spectral bands such as red/green/blue/infrared, the resulting feature maps should combine information from more than one spectral band. One suggestion is to keep the spectral bands separate in the first connection table and randomly intermix feature maps from different spectral bands in the second connection table. Make sure that each second-layer feature map draws from a unique set of first-layer feature maps and that the random assignment is well balanced (each first-layer map feeds roughly the same number of second-layer maps) rather than following a regular pattern. [1]
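As a concrete illustration, here is a minimal Python sketch of such a table. The original stack was Lush/EBLearn, so the language, the helper function, and the band counts and fan-in below are all assumptions. It keeps each second-layer input set unique and forces every set to span at least two bands:

```python
import random

def make_connection_table(maps_per_band, n_bands, n_layer2_maps, fan_in, seed=0):
    """Assign each second-layer map a set of first-layer inputs.

    First-layer maps are grouped by spectral band: band b owns maps
    b*maps_per_band .. (b+1)*maps_per_band - 1.  Each second-layer map
    draws from fan_in distinct first-layer maps spanning at least two
    bands, and no two second-layer maps share the same input set.
    (A fuller version would also balance how often each first-layer
    map is used.)
    """
    n_layer1_maps = maps_per_band * n_bands
    rng = random.Random(seed)
    table, seen = [], set()
    while len(table) < n_layer2_maps:
        inputs = tuple(sorted(rng.sample(range(n_layer1_maps), fan_in)))
        bands = {m // maps_per_band for m in inputs}
        if len(bands) >= 2 and inputs not in seen:  # mix bands, stay unique
            seen.add(inputs)
            table.append(inputs)
    return table

# e.g. four bands (R, G, B, IR) with two first-layer maps each,
# feeding sixteen second-layer maps that each read three inputs
print(make_connection_table(maps_per_band=2, n_bands=4,
                            n_layer2_maps=16, fan_in=3))
```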

How do you determine good convolutional kernel sizes and subsampling ratios?

First determine the eligible combinations of convolutional kernel sizes and subsampling ratios for your network architecture, given the size of your input matrix. [2] Then train classifiers using each eligible combination and select the classifier with the best performance on the test set. [1]
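The eligibility check is just arithmetic: with "valid" convolutions, a kernel of width k shrinks an n-wide map to n - k + 1, and the subsampling ratio p must then divide the result evenly. A minimal Python sketch that enumerates the combinations for a given input width (the stage count and candidate sizes are assumptions):

```python
def eligible_architectures(input_size, n_stages=2,
                           kernel_sizes=(3, 5, 7), pool_sizes=(2, 3, 4)):
    """Enumerate (kernel, pool) choices per stage that leave a positive,
    whole-numbered feature-map size after every convolution and
    subsampling step."""
    def recurse(size, stages):
        if not stages:
            yield ()
            return
        for k in kernel_sizes:
            after_conv = size - k + 1          # 'valid' convolution
            if after_conv <= 0:
                continue
            for p in pool_sizes:
                if after_conv % p:             # pooling must tile evenly
                    continue
                for rest in recurse(after_conv // p, stages - 1):
                    yield ((k, p),) + rest
    return list(recurse(input_size, n_stages))

# e.g. all eligible two-stage architectures for a 32x32 input window
for arch in eligible_architectures(32):
    print(arch)
```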

How do you determine the appropriate number of hidden units?

Determine the number of hidden units experimentally, varying it in coarse multiplicative steps. For example, train classifiers using twenty, forty and eighty hidden units and select the classifier with the best performance on the test set. [1] Increasing the number of hidden units increases the number of trainable parameters in the classifier, which may improve performance if you have a correspondingly large training set. However, too many hidden units can worsen performance if there is not enough training data to sufficiently calibrate all of them.
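In outline, the sweep looks like this (a Python sketch; train_classifier, evaluate, train_set, and test_set are hypothetical stand-ins for your training pipeline):

```python
# Sweep the hidden-layer width in coarse multiplicative steps and keep
# the classifier that scores best on the held-out test set.
# train_classifier and evaluate are hypothetical stand-ins.
best_score, best_model = -1.0, None
for n_hidden in (20, 40, 80):
    model = train_classifier(train_set, n_hidden=n_hidden)
    score = evaluate(model, test_set)
    print(f"{n_hidden} hidden units -> test accuracy {score:.3f}")
    if score > best_score:
        best_score, best_model = score, model
```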

How do you train the classifier when there are more negative than positive examples?

First train the classifier using a 50/50 training set, generated by drawing one example from each class in turn and looping over the smaller set. Stop training when the error rate converges. Then continue training the same classifier using a second training set that approximates the natural distribution of buildings to non-buildings. [1]
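A minimal Python sketch of the first phase (the example lists and the training step that consumes the stream are hypothetical):

```python
import itertools

def balanced_stream(pos, neg):
    """Yield a 50/50 stream: one example from each class in turn,
    cycling (looping) over the smaller set until the larger set is
    exhausted."""
    small, large = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    for a, b in zip(itertools.cycle(small), large):
        yield a
        yield b

# Phase 1: train on the 50/50 stream until the error rate converges,
# e.g. balanced_stream(building_windows, background_windows).
# Phase 2: continue training the same classifier on a stream drawn
# from the natural building/non-building distribution.
```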

How can we use information from both the high-resolution panchromatic and low-resolution multispectral images?

One option is to redesign the neural network architecture so that the panchromatic image runs through one convolutional+subsampling layer, which also brings it down to the resolution of the multispectral image, then concatenate the resulting feature map matrix onto the multispectral image matrix. Run the concatenated matrix through two more convolutional+subsampling layers and make sure that the entire architecture is trained as a single network. [1]
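A minimal PyTorch sketch of this two-branch layout. PyTorch, the layer sizes, and the assumption that the panchromatic image has twice the multispectral resolution are all illustrative choices, not the original Lush implementation:

```python
import torch
import torch.nn as nn

class PanMsNet(nn.Module):
    """Pan branch: one conv+subsample stage that also brings the
    high-resolution panchromatic image down to the multispectral grid.
    Its feature maps are concatenated with the multispectral channels,
    then run through two more conv+subsample stages; everything is
    trained end to end as a single network."""
    def __init__(self, n_ms_bands=4, pan_maps=6):
        super().__init__()
        self.pan_stage = nn.Sequential(
            nn.Conv2d(1, pan_maps, kernel_size=5, padding=2),
            nn.Tanh(),
            nn.AvgPool2d(2),           # 2x subsample: pan grid -> MS grid
        )
        self.trunk = nn.Sequential(
            nn.Conv2d(pan_maps + n_ms_bands, 16, kernel_size=5, padding=2),
            nn.Tanh(),
            nn.AvgPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
            nn.Tanh(),
            nn.AvgPool2d(2),
        )
        self.head = nn.Linear(32 * 8 * 8, 2)   # building / non-building

    def forward(self, pan, ms):
        # pan: (N, 1, 64, 64), ms: (N, n_ms_bands, 32, 32) in this sketch
        x = torch.cat([self.pan_stage(pan), ms], dim=1)
        return self.head(self.trunk(x).flatten(1))

net = PanMsNet()
scores = net(torch.randn(2, 1, 64, 64), torch.randn(2, 4, 32, 32))
print(scores.shape)                    # torch.Size([2, 2])
```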

Another option is to enlarge the low-resolution multispectral image to match the size of the high-resolution panchromatic image. However, this method will introduce artifacts and greatly increase the size of the input matrix. [1]

Another option is to shrink the high-resolution panchromatic image to match the size of the low-resolution multispectral image. However, this method may discard potentially useful information from the panchromatic image.

How do we interpret the output of the convolutional neural network?

If the input matrix has the same dimensions as the training examples, then the output of the classifier is a vector whose length is the number of possible output classes. In the building detection problem, there are two classes (building or non-building) so the classifier outputs a vector of length two.

Although the classifier returns a scalar value for each class, it is still necessary to tune a threshold to determine whether the classifier’s output is positive. In the building detection problem, we compare the scalar result corresponding to the building class of the output vector against the threshold to determine whether the window is marked as containing a building. The threshold must be determined experimentally. Evaluate the classifier’s predictions using different thresholds to plot an ROC curve, then choose the threshold with the desired trade-off between false positives and false negatives. [1]
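For instance, scikit-learn's roc_curve sweeps the candidate thresholds for you. A Python sketch (the library choice and the toy score arrays are assumptions; the original stack was Lush):

```python
import numpy as np
from sklearn.metrics import roc_curve

# y_true: 1 for windows that contain a building, 0 otherwise.
# building_scores: the scalar the classifier emits for the building class.
# (Both arrays here are toy stand-ins for your validation results.)
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
building_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9])

fpr, tpr, thresholds = roc_curve(y_true, building_scores)
# One common choice: the threshold closest to the top-left corner of
# the ROC plot, i.e. few false positives and few missed buildings.
best = np.argmin(fpr**2 + (1 - tpr)**2)
print(f"threshold={thresholds[best]:.2f}  "
      f"FPR={fpr[best]:.2f}  TPR={tpr[best]:.2f}")
```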

How many training examples do we need for good performance?

More training data improves the performance of the convolutional neural network. Use all available training data. Don’t handicap yourself by limiting the size of your training set unnecessarily. [1]

Is it better to train on all available training data at once?

It is simpler to train on all available training data at once. Another option is iterative: train on one portion of the training data, scan the resulting classifier over another portion, and then either continue training the old classifier on its mistakes or retrain a new classifier using the original training set augmented with the mistakes (see the sketch below). However, it is unknown whether the lump-sum approach is better than the iterative approach. [1]
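In outline, the iterative variant looks like this (a Python sketch; train and scan are hypothetical stand-ins for your pipeline):

```python
# train(examples, init=None) and scan(clf, examples) are hypothetical
# stand-ins: train fits (or fine-tunes) a classifier, and scan yields
# (window, label, prediction) triples over a portion of the data.
clf = train(portion_a)
mistakes = [(window, label)
            for window, label, pred in scan(clf, portion_b)
            if pred != label]
clf = train(mistakes, init=clf)        # option 1: fine-tune on mistakes
# clf = train(portion_a + mistakes)    # option 2: retrain from scratch
```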

Is it better to retrain a new classifier when you have new training data?

When you have new training data, you can either continue training an existing classifier on the new training data or you can retrain a new classifier using all available training data. In most cases, it is sufficient to load an existing classifier and continue its training using the new training data. [1]
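In practice this means loading the saved weights and running more gradient steps on the new examples only. A PyTorch sketch (the checkpoint path, the data loader, and the reuse of the PanMsNet sketch from above are all assumptions):

```python
import torch

# Load the existing classifier and continue its training on new data.
model = PanMsNet()                          # the architecture you saved
model.load_state_dict(torch.load("classifier.pt"))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

model.train()
for pan, ms, labels in new_loader:          # loader over the new data only
    optimizer.zero_grad()
    loss = loss_fn(model(pan, ms), labels)
    loss.backward()
    optimizer.step()
```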

Where is the code for using convolutional neural networks?

The original convolutional neural network library is written in a Lisp variant called Lush. A newer library called EBLearn, based on recent advances in energy-based learning research, is available in C++ and Lush. [1]

How do you efficiently scan the classifier over a large image?

Send the classifier the entire image instead of a single window at a time. [3] Because neighboring windows share most of their pixels, the network computes the overlapping convolutions only once and returns the resulting classifications as a matrix. [1]
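The only change needed is to express the final fully connected layer as a convolution, so the network accepts inputs of any size. A minimal PyTorch sketch (the layer sizes are illustrative; with these choices a 32x32 training window corresponds to exactly one cell of the output grid):

```python
import torch
import torch.nn as nn

# Same feature stages as the window-sized model, but the head is a
# convolution instead of a fully connected layer, so the whole network
# slides over an image of any size in one pass.
features = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),
    nn.Conv2d(16, 32, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),
)
head = nn.Conv2d(32, 2, kernel_size=5)  # was Linear(32*5*5, 2) on windows

big_image = torch.randn(1, 3, 512, 512)  # the whole scene, not one window
score_map = head(features(big_image))    # a grid of per-window class scores
print(score_map.shape)                   # torch.Size([1, 2, 121, 121])
```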

How do you limit the memory usage of the classifier?

Transfer the input matrix using the much more compact binary IDX format instead of ASCII IDX format. Use the Lush map-matrix function to lazy-load huge input matrices from the hard disk. [1]
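Outside of Lush, numpy's memmap provides analogous lazy loading. A Python sketch that maps a binary IDX file without reading it into memory (it assumes the standard IDX header layout: a 4-byte magic number whose third byte encodes the element type and whose fourth byte gives the rank, followed by one big-endian 4-byte length per dimension):

```python
import struct
import numpy as np

# IDX element-type codes from the standard format description.
DTYPES = {0x08: np.uint8, 0x09: np.int8, 0x0B: ">i2",
          0x0C: ">i4", 0x0D: ">f4", 0x0E: ">f8"}

def map_idx(path):
    """Memory-map a binary IDX file: read the small header eagerly,
    then let numpy page the data in from disk on demand (the analogue
    of Lush's map-matrix)."""
    with open(path, "rb") as f:
        magic = f.read(4)
        dtype, ndim = DTYPES[magic[2]], magic[3]
        shape = struct.unpack(">" + "i" * ndim, f.read(4 * ndim))
        offset = f.tell()                # data starts right after header
    return np.memmap(path, dtype=dtype, mode="r", offset=offset, shape=shape)
```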

[1] Credit to Koray Kavukcuoglu
[2] Credit to Marc’Aurelio Ranzato
[3] Credit to Yann LeCun