In this post we are going to perform object localization, detecting a single quadcopter (one quadcopter per image) using a simple CNN. This approach has its advantages: it is much faster than more complex approaches such as YOLO, SSD or Faster R-CNN. All we need is to show the image to a network and get the quadcopter's coordinates as output.
The post is intended to be more educational than practical: we are going to address the issue of dataset generation. After all, to train a NN we need a lot of images, and in our case it is going to be THE SAME quadcopter, in different rooms, at different angles and distances. So... why don't we take two dozen quadcopter images and paste them, at different scales and spots, onto photos of rooms? This way we can get a nearly infinite number of images for training!
As always, the obvious advantages of a method are accompanied by obvious disadvantages.
First of all, we cannot have more than one quadcopter per image. That is fine, as it is what
we assumed from the start. Another problem is the size of a copter relative to the size of an image.
If the copter is far enough from the camera, it becomes too small for the input resolution of the CNN.
For example, the input of EfficientNet B0 is 224x224 pixels. If a copter occupies 1/10 of an image, it will be only 22x22 px. Below, we'll try fixing the problem by using larger input sizes (B1 to B7), but at some point the quality of our image recognition will stop improving.
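To get a feel for the numbers, here is a small sketch that computes how many pixels a copter spans at each EfficientNet input resolution. The resolutions are the ones listed in the Keras EfficientNet fine-tuning guide; the helper name `copter_side_px` and the 1/10 fraction are illustrative assumptions, not part of the project code.

```python
# Native input resolutions of EfficientNet B0..B7 (per the Keras fine-tuning guide)
EFFICIENTNET_INPUT_SIZE = {
    "B0": 224, "B1": 240, "B2": 260, "B3": 300,
    "B4": 380, "B5": 456, "B6": 528, "B7": 600,
}

def copter_side_px(input_size, fraction):
    """Side of the copter in pixels if it spans `fraction` of the image side."""
    return int(input_size * fraction)

# Print the copter size for a copter that spans 1/10 of the image
for name, size in EFFICIENTNET_INPUT_SIZE.items():
    print(name, size, copter_side_px(size, 1 / 10))
```

Even the largest variant leaves only a few dozen pixels for a distant copter, which is why larger inputs help only up to a point.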
The rest of this post is a fully functional example, so let's just go through it, looking at the details.
As usual, we start with allowing Google Colab to access Google Drive:
from google.colab import drive
drive.mount("/content/drive/", force_remount=True)
Install EfficientNet in our system. We use the so-called "transfer learning" approach. Someone (Google) has already trained EfficientNet on a huge dataset, so it can recognize cats, chairs and so on. That is of no use for quadcopter detection by itself; however, it means the network has already seen a lot of graphical primitives, like lines, circles etc. So we will take it, alter it a bit, and train it additionally on our quadcopter set. This makes the training process dramatically shorter and also dramatically reduces the required dataset size.
There are eight EfficientNets, from EfficientNetB0 to EfficientNetB7; the difference is in the input (and overall) size. We use the table from the Keras fine-tuning guide (https://keras.io/examples/vision/image_classification_efficientnet_fine_tuning/) as a reference for those input sizes: B0 is 224, B1 is 240, B2 is 260, B3 is 300, B4 is 380, B5 is 456, B6 is 528 and B7 is 600 pixels per side.
!pip install -q efficientnet
import efficientnet.tfkeras as efn
Import everything we need. I am sure some of these imports are not required, but I was too lazy to find them and comment them out:
import numpy as np
from sklearn.utils import shuffle
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers
import json
from tensorflow.keras.utils import Sequence
import sys
import random
import math
from copy import copy, deepcopy
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import os
from os import listdir
from os.path import isfile, join
from tensorflow.keras import regularizers
from tensorflow.keras.optimizers import Adamax
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.preprocessing.image import array_to_img, img_to_array
from tensorflow.keras import backend as K
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.applications import InceptionResNetV2, Xception
from tensorflow.keras.applications import NASNetLarge
from mpl_toolkits.mplot3d import Axes3D
from sklearn.manifold import TSNE
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras.layers import Flatten, Lambda, concatenate
from tensorflow.keras.layers import BatchNormalization, GlobalAveragePooling2D
from tensorflow.keras.callbacks import LambdaCallback
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.models import Model
from tensorflow.keras.models import Sequential
from sklearn.neighbors import NearestNeighbors
import seaborn as sns
import cv2
import re
Is the GPU working? Google provides us with access to their GPUs, and sometimes we forget to turn this feature on:
import tensorflow as tf
print(tf.__version__)
tf.test.gpu_device_name()
I am going to save the last NN, in case the training is interrupted, and the best NN for future use:
# Folders we are going to use in our project
working_path = "/content/drive/My Drive/copter_detect_my/"
best_weights_filepath = working_path + "models/copter_detect_best.h5"
last_weights_filepath = working_path + "models/copter_detect_last.h5"
The following boolean flag allows us to either train the NN, or load a previously trained one, if we want to do testing. As we save our last training configuration, we don't have to redo training every time we start the notebook: we can simply reload from disk.
bDoTraining = True
I am going to try a few versions of EfficientNet (B0, B1, ...), so here are the input sizes they accept, in no particular order. For a reference, see the table above. Also, when I select a particular size here, I have to change the type of network below, in the createModel() function:
# Constants for training and image preprocessing
IMAGE_SIZE_X = 380 #300 #260 #240 #224 #456 #512 #700
IMAGE_SIZE_Y = 380 #300 #260 #240 #224 #456 #683 #934
The larger the network, the higher the chances that it will not fit in Colab's memory. As a simple countermeasure, we can reduce the batch size. Generally, we should try to keep the batch size as large as we can, as it improves training.
BATCH_SIZE = 4 #1
Now, the most important part: where are we going to get the images? We could, of course, take a large number of photos, but that is time consuming. Instead, we take a small number of photos of quadcopters, taken from different angles. We clean them, making the background transparent. Then we paste them dynamically (meaning, we generate images on the fly), at different (random) scales, onto random spots of photos of rooms.
First, let's load the images of quadcopters. We keep them in an array in memory, as there are just 37 images. Instead of rectangle coordinates (left - top - right - bottom), we keep center and radius:
copter_images_path = working_path + "images_copter/"

# Load annotations.json
with open(copter_images_path + 'annotations.json') as f:
    json_data = json.load(f)

# Scan json
arrCopterImageNames = []
arrCoordinates = []
arrCopterImages = []

file_info_json = json_data['_via_img_metadata']
for x in file_info_json:
    strImageFileName = file_info_json[x]['filename']
    # There is a single annotated region (the copter) per image
    shape = file_info_json[x]['regions'][0]['shape_attributes']
    copter_rect = [shape['x'], shape['y'], shape['width'], shape['height']]
    #print("%s:\r\n\t%s, %s" % (x, strImageFileName, copter_rect))

    # ---

    arrCopterImageNames.append(copter_images_path + strImageFileName)

    # Note that we are loading a full-size image, including its alpha channel
    img_copter = cv2.imread(copter_images_path + strImageFileName, cv2.IMREAD_UNCHANGED)
    img_copter = cv2.cvtColor(img_copter, cv2.COLOR_BGRA2RGBA)
    arrCopterImages.append(img_copter)

    # Replace rect (x, y, width, height) with center/radius
    copter_center_x = copter_rect[0] + int(copter_rect[2] / 2)
    copter_center_y = copter_rect[1] + int(copter_rect[3] / 2)
    copter_radius = int(max(copter_rect[2], copter_rect[3]) / 2)
    arrCoordinates.append([copter_center_x, copter_center_y, copter_radius])

arrCoordinates = np.array(arrCoordinates, dtype="float32")

print(arrCopterImageNames)
print(arrCoordinates)
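The rectangle-to-circle conversion used above can be isolated into a tiny helper, which makes it easy to check on a known box. This is only a sketch: the name `rect_to_circle` is hypothetical and the input is assumed to be (left, top, width, height) in pixels, as in the annotation file.

```python
def rect_to_circle(x, y, width, height):
    """Convert a bounding box (left, top, width, height) to (center_x, center_y, radius)."""
    center_x = x + width // 2
    center_y = y + height // 2
    # The radius covers the longer side, so the circle never cuts the copter
    radius = max(width, height) // 2
    return center_x, center_y, radius

# A 100x60 box whose top-left corner is at (10, 20)
print(rect_to_circle(10, 20, 100, 60))
```

Taking the longer side for the radius means the circle is a little generous for elongated boxes, which is a reasonable trade-off once the copter starts rotating.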
Then we load images of rooms:
arrAppartmentImageNames = []
arrAppartmentImages = []

appartment_images_path = working_path + "images_appartment/"
for strImageFileName in os.listdir(appartment_images_path):
    if strImageFileName.endswith('.jpg'):
        arrAppartmentImageNames.append(strImageFileName)
        img = cv2.imread(appartment_images_path + strImageFileName)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (IMAGE_SIZE_X, IMAGE_SIZE_Y))
        arrAppartmentImages.append(img)

print(arrAppartmentImageNames)
Now, as we are going to rescale images as we combine them, we need to work with copies, not with items in the original array:
def loadCopterImage(nIdx):
    return arrCopterImages[nIdx].copy()

def loadAppartmentImage(nIdx):
    return arrAppartmentImages[nIdx].copy()
Let's test our loaders. Whenever we write a function that can be tested, it is a good idea to write the testing code as well (for the loaders above it is not strictly necessary, as we can inspect the arrays using Colab's "run selection"):
# Let's test our loadCopterImage() function, just to be sure it works properly
nImageIdx = random.randint(0, len(arrCopterImageNames) - 1)
print(arrCopterImageNames[nImageIdx])

img = loadCopterImage(nImageIdx)

figure, ax = plt.subplots(1)
circle = patches.Circle((arrCoordinates[nImageIdx][0], arrCoordinates[nImageIdx][1]),
    arrCoordinates[nImageIdx][2], linewidth=2, edgecolor='r', facecolor="none")
ax.imshow(img)
ax.add_patch(circle)
plt.show()
Similar test for a room loader:
# Let's test our loadAppartmentImage() function, just to be sure it works properly
nImageIdx = random.randint(0, len(arrAppartmentImageNames) - 1)
print(arrAppartmentImageNames[nImageIdx])

img = loadAppartmentImage(nImageIdx)

figure, ax = plt.subplots(1)
ax.imshow(img)
plt.show()
We are going to change not only the scale and location of the copter image within the image of a room, but also tilt it a bit:
def rotate_point(pointX, pointY, originX, originY, angle):
    angle = angle * math.pi / 180.0
    return [
        math.cos(angle) * (pointX-originX) - math.sin(angle) * (pointY-originY) + originX,
        math.sin(angle) * (pointX-originX) + math.cos(angle) * (pointY-originY) + originY
    ]

def rotateImage(image, angle):
    # Put the image in the middle of a transparent square canvas twice its size,
    # so that nothing is cut off by the rotation
    new_image_max_size = max(image.shape[0], image.shape[1]) * 2
    img_background = np.zeros((new_image_max_size, new_image_max_size, 4), np.uint8)

    padding = int(new_image_max_size / 4)
    x = int((new_image_max_size - image.shape[0]) / 2)
    y = int((new_image_max_size - image.shape[1]) / 2)
    img_background[x : x + image.shape[0], y : y + image.shape[1]] = image

    row, col = img_background.shape[1::-1]
    center = tuple(np.array([row, col]) / 2)
    rot_mat = cv2.getRotationMatrix2D(center, angle, 1.0)

    dst_mat = np.zeros((new_image_max_size, new_image_max_size, 4), np.uint8)
    new_image = cv2.warpAffine(img_background, rot_mat,
        (new_image_max_size, new_image_max_size), dst_mat,
        flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_TRANSPARENT)

    return new_image
The scaled/rotated image of a quadcopter needs to be pasted onto the image of the room, preserving the transparent background:
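One detail worth checking is the direction of rotation. In image coordinates the y axis points down, so the textbook rotation formula used in rotate_point() moves points clockwise on screen for positive angles, while the OpenCV documentation for cv2.getRotationMatrix2D describes positive angles as counter-clockwise. The sketch below uses only the standard library and re-declares rotate_point() so it is self-contained; the sample point is an arbitrary assumption.

```python
import math

def rotate_point(px, py, ox, oy, angle_deg):
    """Textbook rotation of (px, py) around (ox, oy), same formula as in the post."""
    a = math.radians(angle_deg)
    return (math.cos(a) * (px - ox) - math.sin(a) * (py - oy) + ox,
            math.sin(a) * (px - ox) + math.cos(a) * (py - oy) + oy)

# A point 10 px to the right of the center of a 100x100 image
center = (50, 50)
point = (60, 50)

# Positive angle: y grows, so the point moves DOWN on screen (clockwise)
down = rotate_point(*point, *center, 90)
# Negative angle: y shrinks, so the point moves UP on screen (counter-clockwise)
up = rotate_point(*point, *center, -90)
print(down, up)
```

So a label that has to follow an image rotated by `angle` with OpenCV should be passed through rotate_point() with `-angle`. For small tilts and a copter that sits near the center of its image the difference is small, but it grows with the angle.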
def overlay_image_alpha(img, img_overlay, x, y, alpha_mask):
    """Overlay `img_overlay` onto `img` at (x, y) and blend using `alpha_mask`.

    `alpha_mask` must have same HxW as `img_overlay` and values in range [0, 1].
    """
    # Image ranges
    y1, y2 = max(0, y), min(img.shape[0], y + img_overlay.shape[0])
    x1, x2 = max(0, x), min(img.shape[1], x + img_overlay.shape[1])

    # Overlay ranges
    y1o, y2o = max(0, -y), min(img_overlay.shape[0], img.shape[0] - y)
    x1o, x2o = max(0, -x), min(img_overlay.shape[1], img.shape[1] - x)

    # Exit if nothing to do
    if y1 >= y2 or x1 >= x2 or y1o >= y2o or x1o >= x2o:
        return

    # Blend overlay within the determined ranges
    img_crop = img[y1:y2, x1:x2]
    img_overlay_crop = img_overlay[y1o:y2o, x1o:x2o]
    alpha = alpha_mask[y1o:y2o, x1o:x2o, np.newaxis]
    alpha_inv = 1.0 - alpha

    img_crop[:] = alpha * img_overlay_crop + alpha_inv * img_crop

    return img
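The blending formula at the heart of this function is easy to verify on a toy example. The sketch below applies the same `alpha * overlay + (1 - alpha) * background` rule to a 2x2 image; the colors and alpha values are made up for illustration.

```python
import numpy as np

# 2x2 white background and a 2x2 red overlay, both RGB floats in [0, 1]
background = np.ones((2, 2, 3), dtype=np.float32)
overlay = np.zeros((2, 2, 3), dtype=np.float32)
overlay[..., 0] = 1.0

# Alpha per pixel: opaque, half-transparent, fully transparent, opaque.
# The extra axis lets the mask broadcast over the three color channels.
alpha = np.array([[1.0, 0.5], [0.0, 1.0]], dtype=np.float32)[..., np.newaxis]

blended = alpha * overlay + (1.0 - alpha) * background
print(blended[0, 0])  # opaque pixel: pure overlay color
print(blended[0, 1])  # half-transparent pixel: mix of red and white
print(blended[1, 0])  # transparent pixel: background unchanged
```

This is why the copter images need a clean alpha channel: wherever alpha is 0, the room photo shows through untouched.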
Now we can use all functions above to produce a combined image:
def loadCombinedImage(nAppartmentImageIdx, nCopterImageIdx):
    img_appartment = loadAppartmentImage(nAppartmentImageIdx)

    #figure, ax = plt.subplots(1, figsize=(12, 12))
    #ax.imshow(img_appartment)
    #plt.show()

    # ---

    dAngle = np.random.randint(0, 30)

    img_copter = loadCopterImage(nCopterImageIdx)
    img_copter_rotated = rotateImage(img_copter, dAngle)

    # ---

    # We use IMAGE_SIZE_X to scale
    copter_image_new_width = random.randint(int(IMAGE_SIZE_X / 10), int(IMAGE_SIZE_X / 2))
    copter_image_scale = copter_image_new_width / img_copter_rotated.shape[1]
    copter_image_new_height = int(img_copter_rotated.shape[0] * copter_image_scale)

    copter_x = random.randint(0, IMAGE_SIZE_X - 1 - copter_image_new_width)
    copter_y = random.randint(0, IMAGE_SIZE_Y - 1 - copter_image_new_height)

    img_copter_rotated_scaled = cv2.resize(img_copter_rotated,
        (copter_image_new_width, copter_image_new_height))

    # ---

    # Perform blending
    alpha_mask = img_copter_rotated_scaled[:, :, 3] / 255.0
    img_result = img_appartment[:, :, :3].copy()
    img_overlay = img_copter_rotated_scaled[:, :, :3]
    img_result = overlay_image_alpha(img_result, img_overlay, copter_x, copter_y, alpha_mask)

    #img_result = img_to_array(img_result) / 255.
    #img_result = add_noise(img_result)
    #img_result = datagen.random_transform(img_result)
    #img_result = shiftChannelColors(img_result)
    #img_result = np.array(img_result, dtype="float32")

    # Follow the copter center through padding, rotation, scaling and pasting
    copter_center_shift_x = (img_copter_rotated.shape[1] - img_copter.shape[1]) / 2
    copter_center_shift_y = (img_copter_rotated.shape[0] - img_copter.shape[0]) / 2

    # cv2.getRotationMatrix2D treats positive angles as counter-clockwise,
    # while rotate_point() turns clockwise in image coordinates, hence -dAngle
    corter_center = rotate_point(
        copter_center_shift_x + arrCoordinates[nCopterImageIdx][0],
        copter_center_shift_y + arrCoordinates[nCopterImageIdx][1],
        int(img_copter_rotated.shape[1] / 2),
        int(img_copter_rotated.shape[0] / 2),
        -dAngle)

    corter_center_x = copter_x + corter_center[0] * copter_image_scale
    corter_center_y = copter_y + corter_center[1] * copter_image_scale
    copter_radius = arrCoordinates[nCopterImageIdx][2] * copter_image_scale

    return img_result/255., corter_center_x, corter_center_y, copter_radius
Let's test it:
if(bDoTraining):
    nAppartmentImageIdx = random.randint(0, len(arrAppartmentImageNames) - 1)
    # np.random.randint() excludes the upper bound, unlike random.randint()
    nCopterImageIdx = np.random.randint(0, len(arrCopterImageNames))

    print("Appartment:", arrAppartmentImageNames[nAppartmentImageIdx],
        "; Copter: ", arrCopterImageNames[nCopterImageIdx])

    img_result, corter_center_x, corter_center_y, copter_radius = loadCombinedImage(
        nAppartmentImageIdx, nCopterImageIdx)
    print(corter_center_x, corter_center_y, copter_radius)

    figure, ax = plt.subplots(1, figsize=(12, 12))
    ax.imshow(img_result)
    circle = patches.Circle((corter_center_x, corter_center_y), copter_radius,
        linewidth=2, edgecolor='r', facecolor="none")
    ax.add_patch(circle)
    plt.show()
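The label bookkeeping inside loadCombinedImage() is the part that is easiest to get wrong, so here is a stripped-down sketch of the same chain (pad, rotate, scale, paste) without any images. The function name `transform_center` and the sample numbers are assumptions for illustration; the rotation uses the counter-clockwise-on-screen convention that the OpenCV documentation gives for cv2.getRotationMatrix2D.

```python
import math

def transform_center(cx, cy, img_w, img_h, angle_deg, scale, paste_x, paste_y):
    """Follow a point of the copter image through pad -> rotate -> scale -> paste."""
    canvas = 2 * max(img_w, img_h)  # square canvas, as in rotateImage()
    # 1. Padding: the copter image sits in the middle of the canvas
    px = cx + (canvas - img_w) / 2
    py = cy + (canvas - img_h) / 2
    # 2. Rotation around the canvas center; positive angles turn
    #    counter-clockwise on screen (y axis points down)
    a = math.radians(angle_deg)
    dx, dy = px - canvas / 2, py - canvas / 2
    rx = math.cos(a) * dx + math.sin(a) * dy + canvas / 2
    ry = -math.sin(a) * dx + math.cos(a) * dy + canvas / 2
    # 3. Scaling, then 4. pasting at (paste_x, paste_y) in the room image
    return paste_x + rx * scale, paste_y + ry * scale

# Copter centered in a 200x100 image, no rotation, shrunk 4x, pasted at (30, 40)
print(transform_center(100, 50, 200, 100, 0, 0.25, 30, 40))
# A point 40 px to the right of the center, rotated by 90 degrees, ends up above it
print(transform_center(140, 50, 200, 100, 90, 1.0, 0, 0))
```

Checking a couple of hand-computed cases like these is cheaper than debugging a network that was trained on slightly shifted labels.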
To be able to restart training, we need to delete an old stored NN:
def deleteSavedNet(weights_filepath):
    if(os.path.isfile(weights_filepath)):
        os.remove(weights_filepath)
        print("deleteSavedNet():File removed")
    else:
        print("deleteSavedNet():No file to remove")
The following functions serve as a wrapper to plot training history (history is returned by fit() of Keras):
def plotHistory(history, strParam1, strParam2):
    plt.plot(history.history[strParam1], label=strParam1)
    plt.plot(history.history[strParam2], label=strParam2)
    #plt.title('strParam1')
    #plt.ylabel('Y')
    #plt.xlabel('Epoch')
    plt.legend(loc="best")
    plt.show()

def plotFullHistory(history):
    arrHistory = []
    for i, his in enumerate(history.history):
        arrHistory.append(his)

    # Pair each training curve with its validation counterpart
    plotHistory(history, arrHistory[0], arrHistory[2])
    plotHistory(history, arrHistory[1], arrHistory[3])
Let's create a model. We are going to use a pretrained EfficientNet of different sizes, with our own layers on top. Note that instead of using four coordinates of a rectangle, I "cheated" and used 3 coordinates: center and radius. As we rotate our quadcopter, using rectangle is less convenient, while 3 outputs are better than four:
def createModel(nL2, dDrop, optimizer):
    inputs = Input(shape=(IMAGE_SIZE_X, IMAGE_SIZE_Y, 3))

    # As we change input size above, we also need to change model type here
    model_b0 = efn.EfficientNetB3(weights='imagenet', include_top=False)(inputs)
    # Note: model_b0 is the output tensor of EfficientNet, not the model object,
    # so this assignment probably has no effect on which layers are trained
    model_b0.trainable = False

    model_concat = model_b0
    flatten = layers.Flatten(name="Flatten")(model_concat)

    # Above, in "model_b0 = efn.EfficientNetB3", we used
    # include_top=False to ignore default classifier. Here we add
    # our own classifier instead:
    bboxHead = Dense(128, kernel_regularizer=regularizers.l2(nL2), activation="relu")(flatten)
    bboxHead = Dense(64, kernel_regularizer=regularizers.l2(nL2), activation="relu")(bboxHead)
    bboxHead = Dense(32, kernel_regularizer=regularizers.l2(nL2), activation="relu")(bboxHead)

    # outputs: center x, center y, radius (all scaled to 0..1)
    base_model = layers.Dense(3, activation="sigmoid",
        kernel_regularizer=regularizers.l2(nL2), name="DenseEmbedding")(bboxHead)

    model = Model(inputs=inputs, outputs=base_model, name="embedding_model")

    model.compile(loss='mean_squared_error', optimizer=optimizer,
        metrics=['MeanSquaredError'])

    # summarize layers
    # print(model.summary())
    # plot graph
    # plot_model(model, to_file='convolutional_neural_network.png')

    return model
In our previous posts we used a getStepSizes() function, to figure out how many steps should there be in an epoch... Not this time: in our synthetic dataset, we have nearly infinite number of images.
def getStepSizes():
    # nNumOfSamples = len(arrCopterImageNames)
    # nNumOfTrainSamples = nNumOfSamples * TRAINING_IMAGES_PERCENT
    # nNumOfValidSamples = nNumOfSamples - nNumOfTrainSamples
    #
    # step_train = nNumOfTrainSamples // BATCH_SIZE
    #
    # step_valid = nNumOfValidSamples // BATCH_SIZE
    #
    # if(step_train < 100):
    #     step_train = 1000
    #
    # if(step_valid < 100):
    #     step_valid = 100
    #
    # return (step_train, step_valid)

    # We generate nearly infinite number of different images, so no need to calculate step sizes
    return (100, 100)
The image data generator returns batches of images and corresponding labels.
class MyImageDataGenerator(Sequence):
    def __init__(self, bIsTrain):
        self.batch_size = BATCH_SIZE
        self.bIsTrain = bIsTrain

        step_train, step_valid = getStepSizes()

        # We generate nearly infinite number of different
        # images, so this is a simplified version
        if(bIsTrain):
            self.STEP_SIZE = step_train
        else:
            self.STEP_SIZE = step_valid

        print("STEP_SIZE: ", self.STEP_SIZE, " (bIsTrain: ", bIsTrain, ")")

    def __len__(self):
        return self.STEP_SIZE

    def __getitem__(self, idx):
        arrBatchImages = []
        arrBatchLabels = []

        for i in range(self.batch_size):
            nAppartmentImageIdx = random.randint(0, len(arrAppartmentImageNames) - 1)
            # np.random.randint() excludes the upper bound, unlike random.randint()
            nCopterImageIdx = np.random.randint(0, len(arrCopterImageNames))

            img_result, corter_center_x, corter_center_y, copter_radius = loadCombinedImage(
                nAppartmentImageIdx, nCopterImageIdx)

            arrBatchImages.append(img_result)

            # We scale radius to IMAGE_SIZE_X, as above we scaled copter to it
            arrBatchLabels.append([corter_center_x / IMAGE_SIZE_X,
                corter_center_y / IMAGE_SIZE_Y,
                copter_radius / (IMAGE_SIZE_X / 5)]) #, bIsPositive])

        return np.array(arrBatchImages), np.array(arrBatchLabels)
Now we create generators for training and validation:
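The labels are normalized so that they fit the 0..1 range of the sigmoid output, and every prediction has to be converted back to pixels with the same constants. A minimal round-trip sketch, assuming a 380x380 input; the helper names `encode_label` and `decode_label` are hypothetical and not part of the project code:

```python
IMAGE_SIZE_X = 380
IMAGE_SIZE_Y = 380
RADIUS_SCALE = IMAGE_SIZE_X / 5  # same divisor the generator uses for the radius

def encode_label(center_x, center_y, radius):
    """Pixels -> normalized values the network is trained on."""
    return [center_x / IMAGE_SIZE_X, center_y / IMAGE_SIZE_Y, radius / RADIUS_SCALE]

def decode_label(label):
    """Normalized values -> pixels, the inverse of encode_label()."""
    return label[0] * IMAGE_SIZE_X, label[1] * IMAGE_SIZE_Y, label[2] * RADIUS_SCALE

label = encode_label(190, 95, 38)
print(label)
print(decode_label(label))
```

Note that a radius larger than IMAGE_SIZE_X / 5 would map to a value above 1, which a sigmoid cannot produce, so the divisor has to stay in line with the largest copter the generator can create.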
if(bDoTraining):
    gen_train = MyImageDataGenerator(True)
    gen_valid = MyImageDataGenerator(False)
Also note that, as we generate images on the fly, there is no need to split the dataset into training and validation subsets: every image is unique.
Let's create a function to show an image:
# Same way as for all image processing routines, let's make sure everything works
def ShowImg(img, label):
    print(label)

    figure, ax = plt.subplots(1, figsize=(12, 12))
    ax.imshow(img)

    circle_gt = patches.Circle((label[0] * IMAGE_SIZE_X, label[1] * IMAGE_SIZE_Y),
        label[2] * (IMAGE_SIZE_X / 5), linewidth=2, edgecolor='r', facecolor="none")
    print((label[0] * IMAGE_SIZE_X, label[1] * IMAGE_SIZE_Y), label[2] * (IMAGE_SIZE_X / 5))
    ax.add_patch(circle_gt)

    plt.show()
And let's test it on our generator:
if(bDoTraining):
    (images, labels) = gen_train.__getitem__(0) #next(gen_train)
    for i, img in enumerate(images):
        ShowImg(img, labels[i])
        break
Callbacks are functions Keras will call at particular points of training:
def getCallbacks(monitor, mode, model):
    checkpoint = ModelCheckpoint(best_weights_filepath, monitor=monitor,
        save_best_only=True, save_weights_only=True, mode=mode, verbose=1,
        save_freq='epoch')

    save_model_at_epoch_end_callback = LambdaCallback(
        on_epoch_end=lambda epoch, logs: model.save_weights(last_weights_filepath))

    callbacks_list = [checkpoint, save_model_at_epoch_end_callback]

    return callbacks_list
Function to load a previously stored model:
def loadModel(model, bBest):
    if(bBest):
        path = best_weights_filepath
        strMessage = "load best model"
    else:
        path = last_weights_filepath
        strMessage = "load last model"

    if(os.path.isfile(path)):
        model.load_weights(path)
        print(strMessage, ": File loaded")
    else:
        print(strMessage, ": No file to load")

    return model
As we are doing an "educational" project, let's make it more visual. Below you can see the train_and_test() function, which is what we would use if we did it the "straightforward" way. Instead, we interrupt training every 5 epochs to show images, so we can see the improvement:
def trainNetwork(EPOCHS, nL2, nDrop, optimizer, bCumulativeLearning):
    if(bCumulativeLearning == False):
        deleteSavedNet(best_weights_filepath)

    model = createModel(nL2, nDrop, optimizer)
    print("Model created")

    callbacks_list = getCallbacks("val_mean_squared_error", 'min', model)

    print(bCumulativeLearning)
    if(bCumulativeLearning == True):
        loadModel(model, False)

    STEP_SIZE_TRAIN, STEP_SIZE_VALID = getStepSizes()
    print(STEP_SIZE_TRAIN, STEP_SIZE_VALID)

    print("Available metrics: ", model.metrics_names)

    history = model.fit(gen_train, validation_data=gen_valid, verbose=1,
        epochs=EPOCHS, steps_per_epoch=STEP_SIZE_TRAIN,
        validation_steps=STEP_SIZE_VALID, callbacks=callbacks_list)

    print(nL2)
    plotFullHistory(history)

    # TBD: here, return best model, not last one
    return model, history

# This function performs the actual training and
# loads the best weights found during it
def train_and_test(EPOCHS, nL2, nDrop, optimizer, learning_rate, bCumulativeLearning):
    model, history = trainNetwork(EPOCHS, nL2, nDrop, optimizer, bCumulativeLearning)

    print("loading best model")
    model = loadModel(model, True)

    return model
Now, the test() function makes a prediction and displays the result on the image. It is used in our "educational" code:
# Testing on a freshly generated image
def test(model):
    nAppartmentImageIdx = random.randint(0, len(arrAppartmentImageNames) - 1)
    # np.random.randint() excludes the upper bound, unlike random.randint()
    nCopterImageIdx = np.random.randint(0, len(arrCopterImageNames))

    img_result, corter_center_x, corter_center_y, copter_radius = loadCombinedImage(
        nAppartmentImageIdx, nCopterImageIdx)
    print("GT: ", corter_center_x, corter_center_y, copter_radius)

    # predict() returns one row of (center x, center y, radius) per input image
    test_preds = model.predict(img_result.reshape(1, IMAGE_SIZE_X, IMAGE_SIZE_Y, 3))
    print("Pred: ", test_preds[0][0] * IMAGE_SIZE_X, test_preds[0][1] * IMAGE_SIZE_Y,
        test_preds[0][2] * (IMAGE_SIZE_X/5))

    figure, ax = plt.subplots(1, figsize=(12, 12))
    ax.imshow(img_result)

    # Ground truth in red, prediction in blue
    circle_gt = patches.Circle((corter_center_x, corter_center_y), copter_radius,
        linewidth=2, edgecolor='r', facecolor="none")
    ax.add_patch(circle_gt)

    circle_pred = patches.Circle((test_preds[0][0] * IMAGE_SIZE_X, test_preds[0][1] * IMAGE_SIZE_Y),
        test_preds[0][2] * (IMAGE_SIZE_X/5), linewidth=2, edgecolor='b', facecolor="none")
    ax.add_patch(circle_pred)

    plt.show()
Finally, the "educational training" code, with "proper training" parts commented out:
INIT_LR = 2e-3
opt = tf.keras.optimizers.Adam(0.0002)
nL2 = 0.02
nDrop = 0.0 #0.2

if(bDoTraining):
    EPOCHS = 5

    model = createModel(nL2, nDrop, opt)
    model = loadModel(model, False)
    print("Model created")

    callbacks_list = getCallbacks("val_mean_squared_error", 'min', model)

    STEP_SIZE_TRAIN, STEP_SIZE_VALID = getStepSizes()

    np.random.seed(7)

    for i in range(40):
        history = model.fit(gen_train, validation_data=gen_valid, verbose=1,
            epochs=EPOCHS, steps_per_epoch=STEP_SIZE_TRAIN,
            validation_steps=STEP_SIZE_VALID, callbacks=callbacks_list)

        print(nL2)
        plotFullHistory(history)

        # Show a few predictions after every 5 epochs
        test(model)
        test(model)
        test(model)
        test(model)
        test(model)
        test(model)

    model = loadModel(model, True)
    model.save(best_weights_filepath) # A full model is saved
I have tested different versions of EfficientNet, from B0 to B5. The difference is in the input size they (Google) used for training.
Below are images returned by the test() function, 3 images for each version of the net. They are zoomed to the same size, but please keep in mind that the actual size is as in the table quoted above, so B0 uses 224x224 px images and so on.
As a result, for B0 the smallest quadcopters I used were 1/5 of the room photo. For the rest of the nets I used 1/10 as the minimum size.
From the images we can see that:
1. The method does work, and it works rather well.
2. The smaller the quadcopter image, the lower the precision: both the predicted position and the predicted size of the circle are off by more than for larger images.
3. (This is not obvious from the images) The larger the net and the corresponding input images, the longer the training takes. Well, this was expected. What was not expected is that accuracy does not improve that much with more detailed images.
4. We can use this technique to locate quadcopters and follow them with a camera, but we probably cannot use it as is to build a Star Wars-like robotic cannon that shoots copters down. It will miss, mostly because of errors in the size estimate: the size of a copter is how we estimate the distance to it, and that matters especially for projectile weapons, which follow a parabolic trajectory rather than a straight line.
5. Then again, we can collect data from a few consecutive frames and calculate a much more accurate average.
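The averaging over consecutive frames mentioned in the last point can be sketched with a small running-average helper. This is only an illustration, assuming predictions arrive as (center_x, center_y, radius) tuples in pixels; the class name `PredictionSmoother` and the window size are hypothetical.

```python
from collections import deque

class PredictionSmoother:
    """Running average of the last `window` (center_x, center_y, radius) predictions."""

    def __init__(self, window=5):
        # deque with maxlen drops the oldest prediction automatically
        self.history = deque(maxlen=window)

    def update(self, prediction):
        self.history.append(prediction)
        n = len(self.history)
        return tuple(sum(p[i] for p in self.history) / n for i in range(3))

smoother = PredictionSmoother(window=3)
for pred in [(100, 100, 20), (104, 98, 22), (96, 102, 18)]:
    smoothed = smoother.update(pred)
print(smoothed)
```

A plain average like this lags behind a fast-moving copter, so the window has to stay short, or be replaced with something that models motion, if the target is not hovering.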