[an error occurred while processing this directive] Coins classifier Neural Network. [an error occurred while processing this directive]

# Coins Classification using Neural Networks

This is the first article in a serie dedicated to coins classification.

Having countless "dogs vs cats" or "find a pedestrian on the street" classifiers all over the Internet, coins classification doesn't look like a difficult task. At first. Unfortunately, it is degree of magnitude harder - a formidable challenge indeed. You can easily tell heads of tails? Great. Can you figure out if the number is 1 mm shifted to the left? See, from classifier's view it is still the same head... while it can make a difference between a common coin priced according to the number on it and a rare one, 1000 times more expensive.

Of course, we can do what we usually do in image classification: provide 10,000 sample images... No, wait, we can not. Some types of coins are rare indeed - you need to sort through a BASKET (10 liters) of coins to find one. Easy arithmetics suggests that to get 10000 images of DIFFERENT coins you will need 10,000 baskets of coins to start with. Well, and unlimited time.

So it is not that easy.

Anyway, we are going to begin with getting large number of images and work from there. We will use Russian coins as an example, as Russia had money reform in 1994 and so the number of coins one can expect to find in the pocket is limited. Unlike USA with its 200 years of monetary history. And yes, we are ONLY going to focus on current coins: the ultimate goal of our work is to write a program for smartphone to classify coins you have received in a grocery store as a change.

Which makes things even worse, as we can not count on good lighting and quality cameras anymore. But we'll still try.

In addition to "only Russian coins, beginning from 1994", we are going to add an extra limitation: no special occasion coins. Those coins look distinctive, so anyone can figure that this coin is special. We focus on REGULAR coins. Which limits their number severely.

Don't take me wrong: if we need to apply the same approach to a full list of coins... it will work. But I got 15 GB of images for that limited set, can you imagine how large the complete set will be?!

To get images, I am going to scan one of the largest Russian coins site "meshok.ru".

This site allows buyers and sellers to find each other; sellers can upload images... just what we need. Unfortunately, a business-oriented seller can easily upload his 1 rouble image to 1, 2, 5, 10 roubles topics, just to increase the exposure. So we can not count on the topic name, we have to determine what coin is on the photo ourselves.

To scan the site, a simple scanner was written, based on the Python's Beautiful Soup library. In just few hours I got over 50,000 photos. Not a lot by Machine Learning standards, but definitely a start.

After we got the images, we have to - unfortunately - revisit them by hand, looking for images we do not want in our training set, or for images that should be edited somehow. For example, someone could have uploaded a photo of his cat. We don't need a cat in our dataset.

First, we delete all images, that can not be split to head/hail:

We can also edit images we don't like:

Before:

After:

After we are done with the "unfortunately manual" part, we have a long list of images with names inherited from meshok.ru:

28009792_001.jpg

28645289_000.jpg

28645289_001.jpg

30691757_005.jpg

33768803_003.jpg

34831028_001.jpg

39834334_002.jpg

40450398_000.jpg

42780070_002.jpg

The images themselves can be gropped according to the following types:

a) One image for tail, one image for head.

Or it can be few images for tail and few for head, but the ultimate thing is, there is only one coin per image, and for the same image name (same, except for postfix) it is the same coin:

To process these images, we need to locate the coin (coin might occupy small part of the image, so we want to crop it), and for all images, run a head-or-tail classifier. Then we can combine all heads with all tails (for 4 images above it will yeild 2 pairs). These images are going to be for the same coin, but we can consider it as sort of an additional augmentation due to light and angle change.

b) Single image with tail and head on it.

For this image, we KNOW there are only two coins (one head / one tail) per image, so we locate and crop images, and run head-or-tail classifier. As above, the result is pair of images. Also, we can discard images that have all heads and no tails and vice versa (provided, there is no second image with the same name and different postfix, containing missing coin pictures, see "c" below).

c) Image containing multiple coins.

There should also be images with same name / different postfix, where same coins are flipped. Coins should remain more or less on the same place in both images, so the program can figure out the correspondance:

For this sort of images, we need to locate coins, locate their centers, find closest centers on the second image, and group corresponding coins. We need to make sure we got head/tail pairs. Note that sometimes authors publish images with, say, 6 coins (one image for heads, one for tails) and extra images with, say, 4 coins zoomed in. We need to remove such zoomed images during the "manual cleanup", as I don't want to spend time on the algorithm do do it programmatically.

As the result of the first pass of our processing, we have files with the following names:

27855071_00a.png

27855071_00b.png

...

Here "27855071" is image name (postfix removed), _00 is the coin number within that image, and a/b are coin sides. Note that at this stage we haven't ran the head-or-tail classifier yet, so some pairs will be removed later if a/b do not correspond to head/tail (or tail/head).

Let's take a closer look at the algorithm we used to isolate individual coins on the image. We are going to use YOLO3, as a references, the following links are recommended:
https://github.com/AntonMu/TrainYourOwnYOLO

As we use Google Colab, we need some space to store our data, and Google Drive looks like a good idea. We have large amount of images, but 15Gb it provides is (barely) enough.

drive.mount("/content/drive/", force_remount=True)

We use TF 1.x, as that particular version of YOLO doesn't support v2 yet:

%tensorflow_version 1.x

\$ TensorFlow 1.x selected.

Path for our project and subdirectory for YOLO3. As we work with Colab, we have two choices. First, and this is the most common approach used in online tutorials, we can store files (that we have downloaded from internet sources, like GIT) at the content level, ABOVE the /content/drive/. The advantage is, we don't need Google Drive, or at least, we don't keep NN files at Google Drive, which gives us more space for images. As for disadvantage, we loose files when the session is over. Downloading YOLO files every time is counter productive, so I am going to copy them to Drive:

working_path = "/content/drive/My Drive/01h_yolo3_object_detection/"
yolo_path = working_path + "yolo3/"

The following flag determines if we want to re-train our YOLO or if we simply want to download weights we saved earlier.

bDoTraining = False

Now let's clone YOLO files:

# TBD: check, if folder exists, then skip
!mkdir '{yolo_path}'
!git clone https://github.com/AntonMu/TrainYourOwnYOLO '{yolo_path}'

\$ mkdir: cannot create directory ‘/content/drive/My Drive/01h_yolo3_object_detection/yolo3/’: File exists

\$ fatal: destination path '/content/drive/My Drive/01h_yolo3_object_detection/yolo3' already exists and is not an empty directory.

The error says that files are already there, so it is more of a warning and should be ignored.

There are some problems with code at the repository above: it is not flexible enough. So I am going to copy bits and pieces of it, and use the "local" modified copy. Commented, is the original code.

import os
import subprocess
import time
import sys
import argparse
import requests
import progressbar

FLAGS = None

root_folder = yolo_path
"src", "keras_yolo3/")
data_folder = os.path.join(root_folder, "Data")
model_folder = os.path.join(data_folder, "Model_Weights")

if True:
if True: #not FLAGS.is_tiny:
weights_file = yolo_path + "2_Training/src/keras_yolo3/yolov3.weights"
h5_file = yolo_path + "2_Training/src/keras_yolo3/model_data/yolo.h5"
cfg_file = yolo_path + "2_Training/src/keras_yolo3/yolov3.cfg"
# Original URL: https://pjreddie.com/media/files/yolov3.weights
gdrive_id = "1ENKguLZbkgvM8unU3Hq1BoFzoLeGWvE_"

start = time.time()
call_string = "python '" + download_script + "' " + gdrive_id +
" '" + weights_file + "'"

subprocess.call(call_string, shell=True)

end = time.time()

convert_path = yolo_path + "2_Training/src/keras_yolo3/convert.py"
call_string = f"python '{convert_path}' '{cfg_file}' '{weights_file}' '{h5_file}'"

"""
MODIFIED FROM keras-yolo3 PACKAGE, https://github.com/qqwweee/keras-yolo3
Retrain the YOLO model for your own dataset.
"""

import os
import sys
import argparse
import warnings
import numpy as np
import keras.backend as K
from keras.layers import Input, Lambda
from keras.models import Model

#os.path.join(get_parent_dir(0), "src")
src_path = working_path + "yolo3/2_Training/src/"

sys.path.append(src_path)

#os.path.join(get_parent_dir(1), "Utils")
utils_path = working_path + "yolo3/Utils/"

sys.path.append(utils_path)

from keras.callbacks import (
TensorBoard,
ModelCheckpoint,
ReduceLROnPlateau,
EarlyStopping,
)
from keras_yolo3.yolo3.model import (
preprocess_true_boxes,
yolo_body,
tiny_yolo_body,
yolo_loss,
)
from keras_yolo3.yolo3.utils import get_random_data
from PIL import Image
from time import time
import tensorflow.compat.v1 as tf
import pickle

from Train_Utils import (
get_classes,
get_anchors,
create_model,
create_tiny_model,
data_generator,
data_generator_wrapper,
ChangeToOtherMachine,
)

# os.path.join(src_path, "keras_yolo3")
keras_path = working_path + "yolo3/2_Training/src/keras_yolo3/"

#os.path.join(get_parent_dir(1), "Data")
Data_Folder = working_path + "yolo3/Data/"

#os.path.join(Data_Folder, "Source_Images", "Training_Images")
Image_Folder = working_path + "images/train/"

#VoTT_Folder = os.path.join(Image_Folder, "vott-csv-export")

#os.path.join(VoTT_Folder, "data_train.txt")
annotations_file_name = working_path + "images/annotations.txt"

# Model_Folder = os.path.join(Data_Folder, "Model_Weights")

# os.path.join(Model_Folder, "data_classes.txt")
yolo_class_names = working_path + "yolo3/Data/Model_Weights/classes.txt"

#Model_Folder
log_dir = working_path + "yolo3/Data/Model_Weights/"

#os.path.join(keras_path, "model_data", "yolo_anchors.txt")
anchors_path = working_path +
"yolo3/2_Training/src/keras_yolo3/model_data/yolo_anchors.txt"

# Attn! No yolo.h5 at os.path.join(keras_path, "yolo.h5")!
# Either error, or this file is a "created new".

#os.path.join(keras_path, "yolo.h5")
weights_path = working_path +
"yolo3/2_Training/src/keras_yolo3/model_data/yolo.h5"

Training the YOLO3:

FLAGS = None

if bDoTraining:
val_split = 0.1
is_tiny = False
random_seed = None # 42
epochs = 51

if True: #not FLAGS.warnings:
tf.logging.set_verbosity(tf.logging.ERROR)
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
warnings.filterwarnings("ignore")

np.random.seed(random_seed) #FLAGS.random_seed)

#log_dir = FLAGS.log_dir

class_names = get_classes(yolo_class_names) #FLAGS.yolo_class_names)
num_classes = len(class_names)

anchors = get_anchors(anchors_path)

input_shape = (416, 416)  # multiple of 32, height, width
epoch1, epoch2 = epochs, epochs #FLAGS.epochs, FLAGS.epochs

if False: #FLAGS.is_tiny:
model = create_tiny_model(
input_shape, anchors, num_classes, freeze_body=2,
weights_path=weights_path
)
else:
model = create_model(
input_shape, anchors, num_classes, freeze_body=2,
weights_path=weights_path
)  # make sure you know what you freeze

log_dir_time = os.path.join(log_dir, "{}".format(int(time())))
logging = TensorBoard(log_dir=log_dir_time)
checkpoint = ModelCheckpoint(
os.path.join(log_dir, "checkpoint.h5"),
monitor="val_loss",
save_weights_only=True,
save_best_only=True,
period=5,
)
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.9,
patience=10, verbose=1)
early_stopping = EarlyStopping(monitor="val_loss", min_delta=0,
patience=100, verbose=1)

#with open(FLAGS.annotation_file_name) as f:
with open(annotations_file_name) as f:

# This step makes sure that the path names correspond
# to the local machine
# This is important if annotation and training are done on
# different machines (e.g. training on AWS)
# lines = ChangeToOtherMachine(lines, remote_machine="")

for i, line in enumerate(lines):
line = working_path + "images/train/" + line.split("/")[-1]
lines[i] = line

np.random.shuffle(lines)
num_val = int(len(lines) * val_split)
num_train = len(lines) - num_val

# Train with frozen layers first, to get a stable loss.
# to obtain a decent model.
if True:
model.compile(
loss={
# use custom yolo_loss Lambda layer.
"yolo_loss": lambda y_true, y_pred: y_pred
},
)

batch_size = 32
print(
"Train on {} samples, val on {} samples, with batch size {}.".format(
num_train, num_val, batch_size
)
)
history = model.fit_generator(
data_generator_wrapper(
lines[:num_train], batch_size, input_shape, anchors, num_classes
),
steps_per_epoch=max(1, num_train // batch_size),
validation_data=data_generator_wrapper(
lines[num_train:], batch_size, input_shape, anchors, num_classes
),
validation_steps=max(1, num_val // batch_size),
epochs=epoch1,
initial_epoch=0,
callbacks=[logging, reduce_lr, checkpoint],
)
model.save_weights(os.path.join(log_dir, "trained_weights_stage_1.h5"))

step1_train_loss = history.history["loss"]

file = open(os.path.join(log_dir_time, "step1_loss.npy"), "w")
with open(os.path.join(log_dir_time, "step1_loss.npy"), "w") as f:
for item in step1_train_loss:
f.write("%s\n" % item)
file.close()

step1_val_loss = np.array(history.history["val_loss"])

file = open(os.path.join(log_dir_time, "step1_val_loss.npy"), "w")
with open(os.path.join(log_dir_time, "step1_val_loss.npy"), "w") as f:
for item in step1_val_loss:
f.write("%s\n" % item)
file.close()

# Unfreeze and continue training, to fine-tune.
# Train longer if the result is unsatisfactory.
if True:
for i in range(len(model.layers)):
model.layers[i].trainable = True
model.compile(
y_pred: y_pred}
)  # recompile to apply the change
print("Unfreeze all layers.")

batch_size = (
4  # note that more GPU memory is required after unfreezing the body
)
print(
"Train on {} samples, val on {} samples, with batch size {}.".format(
num_train, num_val, batch_size
)
)
history = model.fit_generator(
data_generator_wrapper(
lines[:num_train], batch_size, input_shape, anchors, num_classes
),
steps_per_epoch=max(1, num_train // batch_size),
validation_data=data_generator_wrapper(
lines[num_train:], batch_size, input_shape, anchors, num_classes
),
validation_steps=max(1, num_val // batch_size),
epochs=epoch1 + epoch2,
initial_epoch=epoch1,
callbacks=[logging, checkpoint, reduce_lr, early_stopping],
)
model.save_weights(os.path.join(log_dir, "trained_weights_final.h5"))
step2_train_loss = history.history["loss"]

file = open(os.path.join(log_dir_time, "step2_loss.npy"), "w")
with open(os.path.join(log_dir_time, "step2_loss.npy"), "w") as f:
for item in step2_train_loss:
f.write("%s\n" % item)
file.close()

step2_val_loss = np.array(history.history["val_loss"])

file = open(os.path.join(log_dir_time, "step2_val_loss.npy"), "w")
with open(os.path.join(log_dir_time, "step2_val_loss.npy"), "w") as f:
for item in step2_val_loss:
f.write("%s\n" % item)
file.close()

if(bDoTraining):
%tensorboard --logdir '{log_dir}'

Now it is the inference time.

input_path = working_path + "images/"
output_file = working_path + "output.txt"
detector_path = working_path + "yolo3/3_Inference/Detector.py"

import os
import sys

#os.path.join(get_parent_dir(1), "2_Training", "src")
src_path = working_path + "yolo3/2_Training/"

#os.path.join(get_parent_dir(1), "Utils")
utils_path = working_path + "yolo3/Utils/"

sys.path.append(src_path)
sys.path.append(utils_path)

import argparse
from keras_yolo3.yolo import YOLO, detect_video, detect_webcam

from PIL import Image
from timeit import default_timer as timer
parse_input, detect_object
import test
import utils
import pandas as pd
import numpy as np
from Get_File_Paths import GetFileList
import random
from Train_Utils import get_anchors

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

# Set up folder names for default values

#os.path.join(get_parent_dir(n=1), "Data")
data_folder = working_path + "yolo3/Data/"

#os.path.join(image_folder, "Test_Images")
image_test_folder = working_path + "my_converted_from_dcim/"

#os.path.join(image_folder, "Test_Image_Detection_Results")
detection_results_folder = working_path +
"images/Test_Image_Detection_Results/"

detection_results_file = os.path.join(detection_results_folder,
"Detection_Results.csv")

model_folder = os.path.join(data_folder, "Model_Weights")

model_weights = os.path.join(model_folder, "trained_weights_final.h5")
model_classes = os.path.join(model_folder, "classes.txt")

#anchors_path = os.path.join(src_path, "keras_yolo3",
#	"model_data", "yolo_anchors.txt")

#os.path.join(keras_path, "model_data", "yolo_anchors.txt")
anchors_path = working_path +
"yolo3/2_Training/src/keras_yolo3/model_data/yolo_anchors.txt"

FLAGS = None

# Replacement for command line arguments

# Path to image/video directory. All subdirectories will be included.
input_path = image_test_folder

# Output path for detection results.
output = detection_results_folder

# Only save bounding box coordinates but do not
# save output images with annotated boxes.
no_save_img = False

# Specify list of file types to include.
# Default is --file_types .jpg .jpeg .png .mp4
file_types = []

# Path to pre-trained weight files. Default is " + model_weights
model_path = model_weights

# Path to YOLO anchors.
#anchors_path = anchors_path

# Path to YOLO class specifications.
classes_path = model_classes

# Number of GPU to use.
gpu_num = 1

# Threshold for YOLO object confidence score
# to show predictions.
score = 0.25

# File to save bounding box results to.
box = output_file #detection_results_file

# Specify the postfix for images with bounding boxes.
# Default is "coin"
postfix = "_coin"

# Use the tiny Yolo version for better performance
# and less accuracy.
is_tiny = False

# Use webcam for real-time detection. Default is False.
webcam = False

save_img = not no_save_img
#file_types = FLAGS.file_types

webcam_active = webcam

if file_types:
input_paths = GetFileList(input_path, endings=file_types)
else:
input_paths = GetFileList(input_path)

# Split images and videos
img_endings = (".jpg", ".jpeg", ".png")
vid_endings = (".mp4", ".mpeg", ".mpg", ".avi")

input_image_paths = []
input_video_paths = []
for item in input_paths:
if item.endswith(img_endings):
input_image_paths.append(item)
elif item.endswith(vid_endings):
input_video_paths.append(item)

output_path = output
if not os.path.exists(output_path):
os.makedirs(output_path)

if is_tiny and anchors_path == anchors_path:
anchors_path = os.path.join(os.path.dirname(anchors_path),
"yolo-tiny_anchors.txt")

anchors = get_anchors(anchors_path)

# define YOLO detector
yolo = YOLO(
**{
"model_path": model_path,
"anchors_path": anchors_path,
"classes_path": classes_path,
"score": score,
"gpu_num": gpu_num,
"model_image_size": (416, 416)
}
)

# Make a dataframe for the prediction outputs
out_df = pd.DataFrame(
columns=[
"image",
"image_path",
"xmin",
"ymin",
"xmax",
"ymax",
"label",
"confidence",
"x_size",
"y_size",
]
)

# labels to draw on images
class_file = open(classes_path, "r")
input_labels = [line.rstrip("\n") for line in class_file.readlines()]
print("Found {} input labels: {} ...".format(len(input_labels),
input_labels))

arrPredictions = []

if input_image_paths and not webcam_active:
print(
"Found {} input images: {} ...".format(
len(input_image_paths),
[os.path.basename(f) for f in input_image_paths[:5]],
)
)
start = timer()
text_out = ""

# This is for images
for i, img_path in enumerate(input_image_paths):

print(img_path)
prediction, image = detect_object(
yolo,
img_path,
save_img=save_img,
save_img_path=output,
postfix=postfix,
)
y_size, x_size, _ = np.array(image).shape

arrPredictions.append([
img_path.rstrip("\n"),
prediction,
x_size, y_size,
""]) # single/pair, to be filled later

end = timer()
print(
"Processed {} images in {:.1f}sec - {:.1f}FPS".format(
len(input_image_paths),
end - start,
len(input_image_paths) / (end - start),
)
)
#out_df.to_csv(box, index=False)

# This is for videos
# for pre-recorded videos present in the Test_Images folder

Open CV loads images based on EXIF rotation info, which is NOT what we need, so we turn it off.

import cv2

if bDoTraining:
for image_info in arrCoinLocations[1:]:
image_path = image_info[0]

clone = image.copy()
# loop over the bounding boxes and associated probabilities
for boxes in image_info[1:-3]:
for box in boxes:
# draw the bounding box, label, and
# probability on the image
(startX, startY, endX, endY) = box[0], box[1],
box[2], box[3] #int(box[0]), int(box[1]),
int(box[2]), int(box[3])
cv2.rectangle(clone, (startX, startY), (endX, endY),
(0, 255, 0), 2)

# show the output after *before* running NMS
cv2_imshow(clone)

Example of frame the algorithm draws around the coin.

As you can see, we got frames around images, but they are too tight and cross the coin sometimes. So we need to provide extra space:

# Crop images according to boxes but a bit larger

import cv2
import seaborn as sns

cmap = (np.array(sns.color_palette("Paired", 100))
* 255).astype(int)
#print(cmap)

dResize = 0.07

for nCount, image_info in enumerate(arrCoinLocations):

#image_info = arrCoinLocations[:][0]

image_path = image_info[0]
size_x = image.shape[1]
size_y = image.shape[0]

clone = image.copy()
# loop over the bounding boxes and associated probabilities
for boxes in image_info[1:-3]:
for nColorIdx, box in enumerate(boxes):
# draw the bounding box, label, and probability on the image
(startX, startY, endX, endY) = box[0], box[1], box[2], box[3]

box_width = endX - startX + 1
box_height = endY - startY + 1
startX = 0 if startX - box_width * dResize < 0
else int(startX - box_width * dResize)
startY = 0 if startY - box_height * dResize < 0
else int(startY - box_height * dResize)
endX = size_x - 1 if endX + box_width * dResize > size_x - 1
else int(endX + box_width * dResize)
endY = size_y - 1 if endY + box_height * dResize > size_y - 1
else int(endY + box_height * dResize)

box[0], box[1], box[2], box[3] = startX, startY, endX, endY

cv2.rectangle(clone, (startX, startY), (endX, endY),
(int(cmap[nColorIdx][0]), int(cmap[nColorIdx][1]),
int(cmap[nColorIdx][2])), 2)

#img_crop = clone[startY:endY, startX:endX]
#cv2_imshow(img_crop)
#cv2.imwrite(working_path + "images/cropped/" + , img_crop)

# show the output after *before* running NMS
#print(image_path)
#cv2_imshow(clone)
if(nCount%100 == 0):
print(nCount)

from pathlib import Path
from shutil import copyfile

for i, coin_info in enumerate(arrCoinLocations):
image_path = coin_info[0]

# Another way
#filename = imagePath.split(os.path.sep)[-1]
#filename = filename[:filename.rfind(".")]

# Our algoritm scans from current item forward, for each
# item (see below for item[-1] = "pair").
# If an item is already marked, we don't need to re-scan it
if(coin_info[-1] != ""):
continue

# We need to go from path/name_XXX.yyy to name
word_list = image_path.split("/")
image_name = word_list[-1]
#print(image_name)

# ['37076234', '000.jpg']
image_root = re.split("_", image_name)

image_num = image_root[1].split(".")[0]

# Look for other images with same root
for j,item in enumerate(arrCoinLocations[i:]):
if image_root[0] in item[0] and image_name not in item[0]:
#print("\tPairs:\n\t\t", image_path, "\n\t\t",
#   arrCoinLocations[i + j][0])
item[-1] = "pair"
coin_info[-1] = "pair"

# Now only single images left
if(coin_info[-1] == "pair"):
continue

coin_info[-1] = "single"

def get_resize_dim(left_1, top_1, right_1, bottom_1):
if(right_1 - left_1 > bottom_1 - top_1):
dim = (nMinSize, int(nMinSize * (bottom_1 - top_1)
/ (right_1 - left_1)))
else:
dim = (int(nMinSize * (right_1 - left_1)
/ (bottom_1 - top_1)), nMinSize)
return dim

nTotalCoins = 0
nCoinsMinSizePlus = 0
nMinSize = 512

for i, coin_info in enumerate(arrCoinLocations):
if(i%100 == 0):
print(i)

image_path = coin_info[0]

save_path = working_path + "images_individual_coins/"

# We need to go from path/name_XXX.yyy to name
word_list = image_path.split("/")
image_name = word_list[-1]

# ['37076234', '000.jpg']
image_root = re.split("_", image_name)

image_name_no_ext = re.split("\.", image_name)[0]

if(coin_info[-1] == "single"):
nTotalCoins = nTotalCoins + 1

#print("\tSingle:", image_name)

# it was a single image, but there is only one coin on it,
# or more than two coins: can not split it to head+tail
if(len(coin_info[1]) != 2):
continue

# For a "single" type, we expect one image with both head and tail on it
rectSideOne = coin_info[1][0][:4]
rectSideTwo = coin_info[1][1][:4]

(left_1, top_1, right_1, bottom_1) = rectSideOne
(left_2, top_2, right_2, bottom_2) = rectSideTwo

if((bottom_1 - top_1 >= nMinSize or right_1 - left_1 >= nMinSize)
and (bottom_2 - top_2 >= nMinSize or right_2 - left_2 >= nMinSize)):
nCoinsMinSizePlus = nCoinsMinSizePlus + 1

crop_img_1 = image_np[top_1:bottom_1, left_1:right_1]
crop_img_2 = image_np[top_2:bottom_2, left_2:right_2]

# resize image
dim = get_resize_dim(left_1, top_1, right_1, bottom_1)
crop_img_1 = cv2.resize(crop_img_1, dim,
interpolation = cv2.INTER_AREA)

dim = get_resize_dim(left_2, top_2, right_2, bottom_2)
crop_img_2 = cv2.resize(crop_img_2, dim,
interpolation = cv2.INTER_AREA)

image_save_name = save_path + image_root[0] + "_00a.png"
#plt.imsave(save_path + image_save_name, crop_img_1)
cv2.imwrite(image_save_name, crop_img_1)

image_save_name = save_path + image_root[0] + "_00b.png"
#plt.imsave(save_path + image_save_name, crop_img_2)
cv2.imwrite(image_save_name, crop_img_2)

#plt.figure(figsize=(4,4))
#f, axarr = plt.subplots(1,2)

#axarr[0].imshow(crop_img_1)
#axarr[1].imshow(crop_img_2)

#plt.show()

#print("======================")

# We may have one OR MORE extra images: _001, _002...
# As the human (owner of a dataset) haven't removed them,
# they ALL are considered valid.
# Example: we have 3 images: 001, 002, 003, each containing 2 coins,
# a and b, (001 contains heads, 002, 003 contain tails, but shot from
# a different angle, so
# the human decided to include them, to increase dataset's divercity).
# We want to create pairs 001a/002a, 001b/002b, 001a/003a,
#     001b/003b, 002a/003a, 002b/003b
# it means that we DO NOT prevent an image that was already scanned
# as "item" to be scanned again as "coin_info" (otherwise, uncomment
#     "processed" tag.
elif(coin_info[-1] == "pair"): # or coin_info[-1] == "processed"):

if(len(coin_info[1]) < 1):
continue

nTotalCoins = nTotalCoins + len(coin_info[1])

for j,item in enumerate(arrCoinLocations[i:]):

# Both images should contain same number of coins.
# It is possible to create a valid image pair NOT having
# same num. of coins (say, we have 5 coins
# on first image, and zoom in to keep 3 coins on second one).
# Human can probably figure that out, but we discart this
# case, as an extreamely rare one.
if(len(item[1]) != len(coin_info[1])):
continue

# Note that we may have image names like
# 125_001.jpg vs 2125_001.jpg, so we need to do exact check,
# rather than ising "in"
image_path_1 = item[0]
word_list_1 = image_path_1.split("/")
image_name_1 = word_list_1[-1]
# ['37076234', '000.jpg']
image_root_1 = re.split("_", image_name_1)

# image name is 125_001, 125 is in 125_000, but 125_000 not
# in 125_001, we have found the next image of the set
# name_001, name_002...
# Note that this check is excessive, as
# "for j,item in enumerate(arrCoinLocations[i:]):"
# guarantees the "image_name not in item[0]"
if image_root[0] == image_root_1[0] and image_name not in image_name_1:
#print("\tPair:", image_name, ", ", image_name_1)

image_name_no_ext_1 = re.split("\.", image_name_1)[0]

# Find centers of rectangles for second coin side
# (array as there can be more than one coin on image),
# and adjust them for a bounding box of coin group
# (this is done in case "heads" are zoomed, compared to "tails").
left_1 = coin_info[2]      # it is going to decrease
top_1 = coin_info[3]

arrCenters_1 = []
for nRectIdx in range(0, len(coin_info[1])):
(left, top, right, bottom) = coin_info[1][nRectIdx][:4]
arrCenters_1.append([left + (right - left + 1) / 2,
top + (bottom - top + 1) / 2])
left_1 = min(left_1, int(left))
top_1 = min(top_1, int(top))

for center in arrCenters_1:
center[0] = center[0] - left_1
center[1] = center[1] - top_1

left_2 = item[2]
top_2 = item[3]

arrCenters_2 = []
for nRectIdx in range(0, len(item[1])):
(left, top, right, bottom) = item[1][nRectIdx][:4]
arrCenters_2.append([left + (right - left + 1) / 2,
top + (bottom - top + 1) / 2])
left_2 = min(left_2, left)
top_2 = min(top_2, top)

for center in arrCenters_2:
center[0] = center[0] - left_2
center[1] = center[1] - top_2

# Now we need to find the correspondence
arrPairIdx = []
for idx_1,center_1 in enumerate(arrCenters_1):
(x1,y1) = center_1
nMinIdx = 0
nMinDistance = 100000000 # more than any image size squared
for idx_2,center_2 in enumerate(arrCenters_2):
(x2,y2) = center_2
distance = (x1-x2)**2 + (y1 - y2)**2
if(nMinDistance > distance):
nMinIdx = idx_2
nMinDistance = distance
arrPairIdx.append([idx_1, nMinIdx])

# At this point we know which rect on first image
# corresponds to which one on second

for k, idxs in enumerate(arrPairIdx):
(left_1, top_1, right_1, bottom_1) = coin_info[1][idxs[0]][:4]
(left_2, top_2, right_2, bottom_2) = item[1][idxs[1]][:4]

if((bottom_1 - top_1 >= nMinSize or right_1 - left_1 >= nMinSize)
and (bottom_2 - top_2 >= nMinSize or right_2 - left_2 >= nMinSize)):
nCoinsMinSizePlus = nCoinsMinSizePlus + 1

crop_img_1 = image_np_1[top_1:bottom_1, left_1:right_1]

# resize image
dim = get_resize_dim(left_1, top_1, right_1, bottom_1)
crop_img_1 = cv2.resize(crop_img_1, dim,
interpolation = cv2.INTER_AREA)

image_save_name = image_root[0] + "_{:02d}".format(k) + "a.png"
#plt.imsave(save_path + image_save_name, crop_img)
cv2.imwrite(save_path + image_save_name, crop_img_1)

crop_img_2 = image_np_2[top_2:bottom_2, left_2:right_2]

dim = get_resize_dim(left_2, top_2, right_2, bottom_2)
crop_img_2 = cv2.resize(crop_img_2, dim,
interpolation = cv2.INTER_AREA)

image_save_name = image_root[0] + "_{:02d}".format(k) + "b.png"
#plt.imsave(save_path + image_save_name, crop_img)
cv2.imwrite(save_path + image_save_name, crop_img_2)

#axarr[1].imshow(crop_img)
#plt.show()

#print("======================")

#item[-1] = "processed"

# Now we need to move source file to trash, but make it
# zero size first so it doesn't take space there

# overwrite and make the file blank instead
open(image_path, 'w').close()
os.remove(image_path)

print("nTotalCoins:", nTotalCoins, "; nCoinsMinSizePlus:", nCoinsMinSizePlus)

Cleanup:

yolo.close_session()

Let's summarize this step. We have, as the input, raw (ok, some manual cleanup was applied) images, and as the output, we got pairs of (presumably) head/tail. By "presumably" I mean that we'll figure it out on the next step. [an error occurred while processing this directive]