Slide 1
Neural networks are built out of a series of neuron layers, where each layer essentially performs a dot product between a vector of inputs and a matrix of weights. In computing terms this operation is called multiply-and-accumulate, or MAC. To make a neural network accelerator, we just want to do this operation as fast and as efficiently as possible.
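To make that concrete, here is a minimal sketch in Python of what one layer computes. The names `layer`, `inputs`, and `weights` are illustrative, not from the chip:

```python
import numpy as np

def layer(inputs: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One neuron layer: each output is the dot product of the input
    vector with one column of the weight matrix. Every multiply feeds
    an accumulator -- a multiply-and-accumulate (MAC) operation."""
    outputs = np.zeros(weights.shape[1])
    for j in range(weights.shape[1]):        # one accumulator per neuron
        for i, x in enumerate(inputs):       # multiply...
            outputs[j] += x * weights[i, j]  # ...and accumulate
    return outputs                           # same result as inputs @ weights

# Example: 4 inputs feeding 3 neurons
x = np.array([1.0, 0.5, -0.2, 0.3])
W = np.random.randn(4, 3)
print(layer(x, W))
```

Every iteration of the inner loop is one MAC; a hardware accelerator's job is to perform as many of these as possible at once.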

Slide 2
Our chip uses a standard 8T SRAM as an accelerator for the input layer of an image classifier. We first take our 5-bit pixels and generate a PWM pulse on each read word line (RWL) whose width reflects the intensity of the pixel. Each bit-cell sees the PWM pulse and discharges one of the two differential bit-lines, depending on the stored weight. This is the multiply operation. Because all the rows are activated at the same time, the total discharge on each bit-line sums up in parallel. This is the accumulate operation.
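As a rough behavioral model of one column, assuming an ideal linear discharge and ignoring analog non-idealities, the computation looks like this. The function and variable names are hypothetical:

```python
import numpy as np

def sram_column_mac(pixels: np.ndarray, weights: np.ndarray) -> float:
    """Behavioral model of one SRAM column performing an analog MAC.

    pixels:  5-bit intensities (0..31); each drives a PWM pulse whose
             width is proportional to the pixel value (the RWL pulse).
    weights: stored +1/-1 values, one per bit-cell in the column.

    Each cell discharges RBL (weight +1) or RBLB (weight -1) for the
    duration of its pulse, so the two bit-lines accumulate the positive
    and negative partial products in parallel.
    """
    assert np.all((pixels >= 0) & (pixels < 32)), "pixels must be 5-bit"
    rbl_discharge = np.sum(pixels[weights == +1])   # multiply + accumulate
    rblb_discharge = np.sum(pixels[weights == -1])
    # The differential readout recovers the signed dot product.
    return float(rbl_discharge - rblb_discharge)

pixels = np.array([31, 0, 17, 5])
weights = np.array([+1, -1, -1, +1])
print(sram_column_mac(pixels, weights))  # 31 - 17 + 5 = 19
```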

The weights come from software-based ML training, which requires continuous writes to update them. But once the weights are trained, they persist in the array and can be reused for many operations.
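Since each bit-cell stores only a +1 or -1, the trained floating-point weights have to be reduced to those two values before being written into the array. A minimal sketch of that step, assuming a simple sign-based binarization (the chip's actual quantization scheme is not specified here):

```python
import numpy as np

def binarize_weights(trained: np.ndarray) -> np.ndarray:
    """Map trained floating-point weights to the +1/-1 values a
    bit-cell can store. Sign-based binarization is one common choice;
    the actual training flow may use a different scheme."""
    return np.where(trained >= 0, +1, -1).astype(np.int8)

trained = np.array([0.8, -0.3, 0.1, -1.2])
print(binarize_weights(trained))  # [ 1 -1  1 -1]
```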

Slide 3
This diagram shows visually what a bit-cell does. When the input pulse is 0, neither of the bit-lines discharges. When there is a non-zero pulse and the weight is +1, RBL discharges, and the opposite happens when the weight is -1. Again, this happens in parallel across the whole SRAM, so in 1 cycle you get one full MAC operation, equivalent to about 10,000 ALU operations.
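The per-cell behavior reduces to a small truth table; here is a hypothetical sketch of it (the function and signal names are illustrative):

```python
def bitcell_discharge(pulse_width: int, weight: int) -> tuple[int, int]:
    """Which bit-line a single cell discharges, and by how much.

    Returns (rbl_discharge, rblb_discharge):
      pulse 0            -> neither line discharges
      pulse > 0, w = +1  -> RBL discharges in proportion to the pulse
      pulse > 0, w = -1  -> RBLB discharges in proportion to the pulse
    """
    if pulse_width == 0:
        return (0, 0)
    return (pulse_width, 0) if weight == +1 else (0, pulse_width)

for pulse, w in [(0, +1), (12, +1), (12, -1)]:
    print(pulse, w, bitcell_discharge(pulse, w))
```

Every cell in the array evaluates this table simultaneously during the same access, which is where the one-cycle MAC comes from.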