During inference, full 32-bit floating-point precision is usually unnecessary: weights and activations can be represented with 8 bits instead of 32. This amounts to binning continuous values into a discrete set of ranges, which is why the technique is known as quantization.
This lets us move more data per unit of memory bandwidth (or, if the bandwidth stays the same, shrink the memory footprint) during inference, and it also reduces latency, which matters in real-time settings.
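As a rough sketch of the idea (the scale and zero-point below follow the usual affine quantization scheme; the numbers are illustrative, not what TensorFlow Lite would actually pick), a float range can be mapped onto 8-bit integers like this:

import numpy as np

# Illustrative affine quantization of a float array to uint8.
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
scale = (x.max() - x.min()) / 255.0          # width of each discrete bin
zero_point = int(round(-x.min() / scale))    # integer that represents 0.0
q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
x_restored = (q.astype(np.float32) - zero_point) * scale  # dequantize

Every float is rounded to the nearest of 256 bins; dequantizing recovers an approximation of the original value, which is the accuracy/size trade-off quantization makes.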
Example Code:
import tensorflow as tf

# Build a TFLite converter from the frozen graph (TensorFlow 1.x contrib API).
# input_node_names / output_node_names list the graph's input and output tensor names.
converter = tf.contrib.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file='tflite-models/tf_model.pb',
    input_arrays=input_node_names,
    output_arrays=output_node_names)
# Quantize the weights to 8 bits during conversion.
converter.post_training_quantize = True
tflite_model = converter.convert()
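convert() returns the serialized model as bytes, so the quantized model can simply be written to disk for deployment (the output filename below is only a placeholder):

with open('tflite-models/tf_model_quant.tflite', 'wb') as f:
    f.write(tflite_model)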