vision image-classification mobilenet tensorflow

Model name: mobilenet_v3_small_100_224

Description adapted from TFHub

Overview

MobileNet V3 is a family of neural network architectures for efficient on-device image classification and related tasks, originally published by

Similar to other Mobilenets, MobileNet V3 uses a multiplier for the depth (number of features) in the convolutional layers to tune the accuracy vs. latency tradeoff. In addition, MobileNet V3 comes in two different sizes, small and large, to adapt the network to low or high resource use cases. Although V3 networks can be built with custom input resolutions, just like other Mobilenets, all pre-trained checkpoints were published with the same 224x224 input resolution.

For a quick comparison between these variants, please refer to the following table:

Size Depth multiplier Top1 accuracy (%) Pixel 1 (ms) Pixel 2 (ms) Pixel 3 (ms)
Large 1.0 75.2 51.2 61 44
Large 0.75 73.3 39.8 48 32
Small 1.0 67.5 15.8 19.4 14.4
Small 0.75 65.4 12.8 15.9 11.6

This model uses the TF-Slim implementation of mobilenet_v3 as a small network with a depth multiplier of 1.0.

The model contains a trained instance of the network, packaged to do the image classification that the network was trained on. If you merely want to transform images into feature vectors, use google/imagenet/mobilenet_v3_small_100_224/feature_vector/5 instead, and save the space occupied by the classification layer.

Training

The checkpoint exported into this model was v3-small_224_1.0_float/ema/model-388500 downloaded from MobileNet V3 pre-trained models. Its weights were originally obtained by training on the ILSVRC-2012-CLS dataset for image classification ("Imagenet").

Usage

This model can be used with the hub.KerasLayer as follows. It cannot be used with the hub.Module API for TensorFlow 1.

Using TF Hub and HF Hub

model_path = snapshot_download(repo_id="Dimitre/mobilenet_v3_small")
model =  KerasLayer(handle=model_path)

img = np.random.rand(1, 224, 224, 3) # (batch_size, height, width, num_channels)
model(img) # output shape (1, 1001)

Using TF Hub fork

model = pull_from_hub(repo_id="Dimitre/mobilenet_v3_small")

img = np.random.rand(1, 224, 224, 3) # (batch_size, height, width, num_channels)
model(img) # output shape (1, 1001)

The output is a batch of logits vectors. The indices into the logits are the num_classes = 1001 classes of the classification from the original training (see above). The mapping from indices to class labels can be found in the file at download.tensorflow.org/data/ImageNetLabels.txt (with class 0 for "background", followed by 1000 actual ImageNet classes).

The input images are expected to have color values in the range [0,1], following the common image input conventions. For this model, the size of the input images is fixed to height x width = 224 x 224 pixels.

Fine-tuning

In principle, consumers of this model can fine-tune it by passing trainable=True to hub.KerasLayer.

However, fine-tuning through a large classification might be prone to overfit.

The momentum (a.k.a. decay coefficient) of batch norm's exponential moving averages defaults to 0.99 for this model, in order to accelerate training on small datasets (or with huge batch sizes).

Using TF Hub and HF Hub

model_path = snapshot_download(repo_id="Dimitre/mobilenet_v3_small")
model =  KerasLayer(handle=model_path, trainable=True)

img = np.random.rand(1, 224, 224, 3) # (batch_size, height, width, num_channels)
model(img) # output shape (1, 1001)

Using TF Hub fork

model = pull_from_hub(repo_id="Dimitre/mobilenet_v3_small", trainable=True)

img = np.random.rand(1, 224, 224, 3) # (batch_size, height, width, num_channels)
model(img) # output shape (1, 1001)