TensorFlow Quantization
During inference, full 32-bit floating-point precision is usually not needed: values can be represented with 8 bits instead. This bins continuous values into discrete ranges, a process known as quantization. It lets roughly four times as many values move across the same memory bandwidth and shrinks the model's memory footprint.
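To make the binning concrete, here is a minimal NumPy sketch of affine (asymmetric) quantization, the common scale-and-zero-point scheme. The function names and details are illustrative, not TensorFlow's actual API:

```python
import numpy as np

def quantize(x, num_bits=8):
    """Map the continuous range [min(x), max(x)] onto the 2**num_bits
    discrete bins available at the lower precision."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard constant inputs
    zero_point = int(round(qmin - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from the 8-bit codes."""
    return scale * (q.astype(np.float32) - zero_point)

weights = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale, zp = quantize(weights)
recovered = dequantize(q, scale, zp)
```

Each stored value drops from 4 bytes to 1, and the reconstruction error is bounded by the bin width (`scale`), which is why inference accuracy typically suffers little.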