Full-Cycle Development of a Keras Program
It all started when I learned that there is an application in the Apple App Store capable of estimating the quality of a watermelon by the sound it makes. The program was... strange. For example, instead of using your knuckles or fingertips (which, I found, produce more reliable results), it tells you to knock on the watermelon... with the phone itself! Nevertheless, I decided to reproduce this functionality on the Android platform.
Our task can be solved in a few different ways; frankly, I had to restrain my own enthusiasm in order to go the "simple" way. That means no Fourier transforms, no wavelets, and no signal editors. In other words, I wanted to get some experience with neural networks, so I made the network do the data analysis.
As the library for creating and training neural networks, I chose Keras, a wrapper around Google's TensorFlow. It is a logical choice whether you are a professional or just a beginner in the field of Deep Learning; it is hard to think of a better tool. On one hand, Keras is very powerful, optimized for speed, memory use and hardware (yes, it can run on video cards and even clusters of video cards). On the other hand, everything that can be "hidden" from the end user is hidden, so you don't have to think, for example, about how the layers of a neural network are connected. It is very convenient.
Keras, like most modern Deep Learning tools, requires some knowledge of Python. This language, like a big ugly snake... sorry, I just got emotional :) Anyway, I don't think one can seriously work with deep networks without using Python. Lucky for us, Python can be mastered in two weeks, one month at most.
In addition to Python, you will need some extra libraries, but that is not going to be difficult; I mean, if you managed to handle Python itself. You will need to be familiar (just familiar) with NumPy, PyPlot and maybe a couple of other libraries from which we are going to use just a few functions.
To conclude, note that for this particular task we are not going to need the above-mentioned clusters of video cards: it can be solved on the CPU of an average computer. It is not very fast, but not critically slow either.
First we need to create a neural network using Python and Keras. We are going to work under Ubuntu; it is also possible to use an Ubuntu emulation program, like VMware Player. Technically, it is fine to work under Windows, but in my opinion you will waste too much extra time, because Python and Keras were originally developed for Linux. If you spend the same amount of time learning Ubuntu instead, you will get a much more comfortable environment, plus useful knowledge.
As our objective is to create not just an abstract neural network but "something useful", the next step is writing a program. I plan on doing it in Java for Android. At this step it is going to be a prototype, meaning that it will have a UI, but no neural network yet.
You might ask: what is the purpose of creating this "dull" program? Wouldn't it be better to do it at the end, when we already have a neural network to integrate into it? The problem is, any data-analyzing program requires... data. In particular, we need samples to train our net; where should we obtain them? Indeed, imagine how many watermelons we have to "knock on" (and taste, as we need to provide both an audio sample and the result of a taste test for the network to learn from) before the neural net can build a reliable model. One hundred? More?
This is where our "dull" program comes into play: we put it on Google Play and give it to our friends (ok, brutally force our friends to install it), and data will slowly trickle in... By the way, where should it trickle to?
The next step is writing a server-side program that accepts data from our Android application. This server program is very simple; it only took me about twenty minutes to finish, but it is still a separate step in our project.
Finally, when we have enough data, we can train the neural network.
Then we port the network to Java and release a fully functional version of the Android app.
Profit. No, wait, the program is free. Only experience, and the eBook that I can write and offer for sale.
Installing Python and Libraries
All installation instructions below are for Ubuntu. Instructions for Windows and other operating systems are available on the Internet, but as Keras is primarily intended for Ubuntu and is ported to Windows as an afterthought... I suggest that you work under Ubuntu.
Install Python:
Ubuntu nowadays ships with both Python 3 and Python 2 pre-installed. I am going to use Python 3. For beginners: the following commands should be typed in the Ubuntu Terminal.
To make sure that our packages are up to date, let's upgrade the system with apt-get (the $ sign here is the system prompt):
$ sudo apt-get update
$ sudo apt-get -y upgrade
The -y flag confirms in advance that we agree to everything apt-get wants to install.
When the process is over, check the version of Python that is installed in your system:
$ python3 -V
(Python 3.5.2)
Install pip:
$ sudo apt-get install -y python3-pip
pip is a Python package manager; it allows you to install and manage Python packages. To install some arbitrary Python package, type:
$ pip3 install package_name
* * *
There are some additional packages and tools you may want to install "just in case":
$ sudo apt-get install build-essential libssl-dev libffi-dev python-dev
To have copy-paste and a full-screen desktop in your VM, install open-vm-tools-desktop. First, remove the pre-installed open-vm-tools:
$ sudo apt-get autoremove open-vm-tools
Reboot the VM, if you use one.
$ sudo apt-get install open-vm-tools-desktop
Reboot the VM again; after the reboot, copy/paste and drag/drop will work!
There are a few ways to organize our tools. If interested, google iPython and Anaconda. In this project we are going to use a simple yet powerful concept: virtual environments.
What is a virtual environment? As we work on our project, we use certain versions of certain tools. For example, typing
$ python filename.py
we expect Python 3 to run and to use particular versions of a good dozen libraries. What if for another project we need a different version of some library?
By activating a virtual environment, we make sure that whatever was installed under it is what gets used. If we start another virtual environment, with different versions of the tools installed, it will not interfere with the tools installed under other virtual environments.
$ sudo apt-get install -y python3-venv
First of all, we need a virtual environment; then we will install Keras in it:
$ mkdir python-virtual-environments && cd python-virtual-environments
Create a new virtual environment inside the directory (Python 3):
$ python3 -m venv keras-tf
After the Virtual Environment is created, it has to be activated:
$ source keras-tf/bin/activate
To deactivate the environment:
(keras-tf)$ deactivate
Install TensorFlow
From the activated virtual environment, type in the terminal:
pip install -U tensorflow
To validate the TensorFlow installation, type:
(venv)$ python -c "import tensorflow as tf; print(tf.__version__)"
Installing Keras
pip install keras
You will have to install numpy, scipy, matplotlib, ipython, jupyter, pandas, sympy, and nose.
(venv)$ python -m pip install numpy scipy matplotlib ipython jupyter pandas sympy nose
(venv)$ sudo apt-get update
If you need to install something else (say, you want some extra functionality), just use pip:
(venv)$ pip install Pillow
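Before moving on, it is worth a quick sanity check that Keras actually sees its TensorFlow backend; importing keras should print a "Using TensorFlow backend" message, and the command below should print a version number (this check is my suggestion, in the same spirit as the TensorFlow check above):

(venv)$ python -c "import keras; print(keras.__version__)"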
At this point we are about to start writing Python code to import our sound samples and feed them to a neural network. However, so far we have no idea what our sound samples are going to look like; that means we have to put Python aside and switch to writing the "dull" application. No matter how "dull" it is, it will be able to record audio samples, so we will be working with something we can be sure will not change later due to our miscalculations.
First of all, let's decide what the UI should be. This is a common (and often skipped) step of any project; as our program is simple, describing its features in free form is sufficient. Managers of larger projects enjoy dozens of meetings, a few hours each, spending millions of dollars on icons and backgrounds (I am not kidding!). Let's thank them for showing us a path we are not going to walk.
A detailed description of the UI flow is available at the project's site.
Using Android Studio, create an empty project. As the app is going to be relatively simple, we will keep most of the code in a single Activity.
In build.gradle (Module: app), add the dependency for TensorFlow support (note: you do not have to do it right now, just do not forget to add it when importing the neural net from Keras):
dependencies {
    implementation 'org.tensorflow:tensorflow-android:+'
    ...
}
In the Manifest, ask for the permissions we need:
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.snowcron.cortex.melonaire">

    <uses-permission android:name="android.permission.RECORD_AUDIO"/>
    <uses-permission android:name="android.permission.INTERNET" />

    <application
        android:allowBackup="true"
        android:icon="@mipmap/ic_launcher"
        android:label="@string/app_name"
        android:roundIcon="@mipmap/ic_launcher_round"
        android:supportsRtl="true"
        android:theme="@style/AppTheme">

        <activity android:name=".MainActivity"
            android:screenOrientation="portrait">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />
                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>

    </application>
</manifest>
Add icons for the buttons we are going to use.
Note: I have added them to res/drawable, which is the easy approach, but if you want different icons for different screen resolutions, you will have to create the corresponding subdirectories for each icon size. From now on, I will assume that you know the basics of Android programming.
layout_main.xml
This is the main layout file of our application. The second (and last) layout file is included from the first one and contains a simple template for the tab buttons at the top of the screen.
What happens here: we are creating a screen with three tabs, Test, Save and Submit (see the project's site for details). Each tab has its own section in this same file, and each section contains its UI elements.
To make it easier to grasp, below are screenshots of the three screens; second and third screens are compressed to take up less screen space.
"Test" screen.
"Save" screen.
"Submit" screen.
MainActivity.java
Our project contains just two Java files; MainActivity.java is the main one. It handles user interaction with layout_main.xml, audio recording, and feeding data to the neural network; in other words, everything except uploading audio samples to our server, which is handled by HttpFileUpload.java.
Let's explore the most important parts of this code.
First of all, we are going to handle user interaction with screen controls in a centralized way, in the Activity itself. It is a common approach in Android programming.
public class MainActivity extends Activity implements View.OnClickListener
We move all class members to the top part of the class, just to improve readability:
// Permissions can be asked explicitly, later in code.
// I will do it, just as an exercise. For our "common"
// permissions it is not really necessary.
final int PERMISSION_RECORD_AUDIO = 1;
final int PERMISSION_INTERNET = 2;

// In more or less complex applications, it is a good idea to
// store controls rather than to obtain them every time by
// the resource id. In our simple app it is not a must.
ImageButton m_btnMicrophone = null;
ImageButton m_btnMelon = null;
ImageButton m_btnSave = null;
ImageButton m_btnSubmit = null;

TextView m_viewTest = null;
TextView m_viewSave = null;
TextView m_viewSubmit = null;
TextView m_viewResults = null;

View m_icnProgressMicrophone = null;
View m_icnProgressMelon = null;
View m_icnSubmitProgress = null;

// Counter of files with audio samples we created.
int m_nCounter = 1;

// Names of files with audio samples
public List<String> m_arrFileNames = null;

// ---

// 5 seconds of audio recording. This, of course, should be a constant.
int m_nDurationOfSample = 5;

// Size of the audio buffer we are going to use during recording.
int m_BufferSize = 44100;

// Variables used by the audio recorder
AudioRecord audioRecord = null;
boolean m_bIsReading = false;
final String TAG = "AudioRecorderLog";

// Note: Our app is capable of finding the max. sample rate
// a current device supports and of using it; however, I am
// only going to use 44.1K samples to teach the NN.
// So this 8000 Hz default is not very useful. Nevertheless,
// the code is in place.
int m_nSampleRate = 8000;

OutputStream os = null;

// ---

// Load the TensorFlow inference library.
// You can comment it out for the "dull" version of the program.
static
{
    System.loadLibrary("tensorflow_inference");
}

// PATH TO OUR MODEL FILE AND NAMES OF THE INPUT AND OUTPUT NODES
// converted.pb is the file we (will) export from Keras.
// You can comment it out for the "dull" version of the program.
private String MODEL_PATH = "file:///android_asset/converted.pb";

// When you have many (hundreds of) labels (outputs) for your neural
// networks, it is a good idea to keep them in a file, not in the code.
// I am not going to use this approach in this project, as our labels
// are:
// {
//     "0" : "Sweetness",
//     "1" : "Ripeness"
// }
private static final String LABEL_PATH = "file:///android_asset/labels.json";

// Names of the inputs and outputs of a network. I am going to explain
// how to obtain them later, when building the Keras NN.
private String INPUT_NAME = "c1d_input";
private String OUTPUT_NAME = "output_1";
private TensorFlowInferenceInterface tf;

// ARRAY TO HOLD THE PREDICTIONS AND FLOAT VALUES TO HOLD THE AUDIO DATA
// Note: see explanation for labels.json above.
private float[] m_arrPrediction = new float[2];
private float[] m_arrInput = null;
PredictionTask creates the neural network from the file exported from Keras and uses it to make predictions. It can be commented out in the "dull" version.
class PredictionTask extends AsyncTask<Void, Void, Void>
{
    @Override
    protected void onPreExecute()
    {
        super.onPreExecute();

        // While the network is busy calculating, change the appearance
        // and functions of controls (make sure the user does not
        // click the same button twice):
        m_btnMelon.setEnabled(false);
        m_btnMicrophone.setEnabled(false);
        m_icnProgressMelon.setVisibility(View.VISIBLE);

        // Create a TensorFlow object
        tf = new TensorFlowInferenceInterface(getAssets(), MODEL_PATH);
    }

    // ---

    @Override
    protected Void doInBackground(Void... params)
    {
        try
        {
            // Pass the input into TensorFlow
            tf.feed(INPUT_NAME, m_arrInput, 1 /*batch*/, m_arrInput.length, 1 /*channels*/);

            // Compute predictions
            tf.run(new String[]{OUTPUT_NAME});

            // Copy the output into the predictions array
            tf.fetch(OUTPUT_NAME, m_arrPrediction);
        }
        catch (Exception e)
        {
            e.getMessage();
        }

        return null;
    }

    // ---

    @Override
    protected void onPostExecute(Void result)
    {
        super.onPostExecute(result);

        // Cleanup. We are going to re-initialize these variables
        tf = null;
        m_arrInput = null;

        // Make controls available again.
        m_btnMelon.setEnabled(true);
        m_btnMicrophone.setEnabled(true);

        // From the predicted values, build a string and
        // show it to the user.
        int nSweetness = Math.round(m_arrPrediction[0]);
        int nRipeness = Math.round(m_arrPrediction[1]);

        String[] arrSweetness = getResources().getStringArray(R.array.sweetness);
        String[] arrRipeness = getResources().getStringArray(R.array.ripeness);

        String strResult = "Sweet.: " + arrSweetness[nSweetness]
            + "; Ripe.: " + arrRipeness[nRipeness];

        m_viewTest.setText(m_viewTest.getText() + strResult + "\n");
        m_viewResults.setText(strResult);

        // Hide the progress spinner
        m_icnProgressMelon.setVisibility(View.GONE);

        // Make the tabs available again when the job is over
        enableTabs();
    }
}
In the UI part of our program, we are going to create and populate tabs.
public void onCreate(Bundle savedInstanceState)
{
    super.onCreate(savedInstanceState);
    setContentView(R.layout.layout_main);

    // Create and populate tabs
    TabHost tabHost = findViewById(android.R.id.tabhost);
    tabHost.setup();

    TabHost.TabSpec tabSpec;

    tabSpec = tabHost.newTabSpec("tag1");
    tabSpec.setIndicator(getString(R.string.tab_test),
        getResources().getDrawable(R.drawable.tab_icon_selector));
    tabSpec.setContent(R.id.tab_1);
    tabHost.addTab(tabSpec);

    tabSpec = tabHost.newTabSpec("tag2");
    tabSpec.setIndicator(getString(R.string.tab_save),
        getResources().getDrawable(R.drawable.tab_icon_selector));
    tabSpec.setContent(R.id.tab_2);
    tabHost.addTab(tabSpec);

    tabSpec = tabHost.newTabSpec("tag3");
    tabSpec.setIndicator(getString(R.string.tab_submit),
        getResources().getDrawable(R.drawable.tab_icon_selector));
    tabSpec.setContent(R.id.tab_3);
    tabHost.addTab(tabSpec);

    tabHost.setCurrentTabByTag("tag1");
    enableTabs();

    // Initialize the variables holding controls and set click
    // listeners for them
    m_btnMelon = (ImageButton)findViewById(R.id.btnMelon);
    m_btnMelon.setOnClickListener(this);

    m_btnMicrophone = (ImageButton)findViewById(R.id.btnMicrophone);
    m_btnMicrophone.setOnClickListener(this);

    m_viewTest = findViewById(R.id.viewTest);
    m_viewTest.setMovementMethod(new ScrollingMovementMethod());

    ImageButton btnInstructions = (ImageButton)findViewById(R.id.btnInstructions);
    btnInstructions.setOnClickListener(this);

    m_btnSave = (ImageButton)findViewById(R.id.btnSave);
    m_btnSave.setOnClickListener(this);

    m_viewSave = findViewById(R.id.viewSave);
    m_viewSave.setMovementMethod(new ScrollingMovementMethod());

    m_btnSubmit = (ImageButton)findViewById(R.id.btnSubmit);
    m_btnSubmit.setOnClickListener(this);

    m_viewSubmit = findViewById(R.id.viewSubmit);
    m_viewSubmit.setMovementMethod(new ScrollingMovementMethod());

    m_viewResults = findViewById(R.id.viewResults);

    m_icnProgressMicrophone = findViewById(R.id.icnProgressMicrophone);
    m_icnProgressMelon = findViewById(R.id.icnProgressMelon);
    m_icnSubmitProgress = findViewById(R.id.icnSubmitProgress);

    // The counter is saved to (permanent) settings and
    // loaded from settings when the activity restarts. If we
    // have saved data for N watermelons without clicking
    // "Submit", the counter will be N + 1.
    SharedPreferences sPref = getPreferences(MODE_PRIVATE);
    m_nCounter = sPref.getInt("FILE_COUNTER", 1);
}

// enableTabs() handles availability of the program's tabs.
// For example, if the user hasn't recorded any audio samples yet,
// there is no reason to enable the "Save" or "Submit" tabs, as there
// is nothing to save / submit.
public void enableTabs()
{
    TabHost tabHost = findViewById(android.R.id.tabhost);

    String strOldFileName = "test.pcm";
    File fileOld = new File(/*path*/ this.getFilesDir(), strOldFileName);
    if(!fileOld.exists())
    {
        tabHost.getTabWidget().getChildTabViewAt(1).setEnabled(false);
        tabHost.setCurrentTabByTag("tag1");
    }
    else
        tabHost.getTabWidget().getChildTabViewAt(1).setEnabled(true);

    if(m_nCounter == 1)
    {
        tabHost.getTabWidget().getChildTabViewAt(2).setEnabled(false);
        tabHost.setCurrentTabByTag("tag1");
    }
    else
        tabHost.getTabWidget().getChildTabViewAt(2).setEnabled(true);
}

// Handle user interaction with screen controls
@Override
public void onClick(View v)
{
    // Used to ask for microphone and Internet permissions.
    boolean bPermissions;

    switch (v.getId())  // id of the button pressed
    {
        // Feed the last audio sample recorded to the NN
        case R.id.btnMelon:
        {
            try
            {
                // The last audio sample is always saved as test.pcm
                String filename = "test.pcm";
                File inFile = new File(this.getFilesDir(), filename);
                if(inFile.exists())
                {
                    // "/2" as short == 2 bytes
                    int len = (int) inFile.length() / 2;
                    byte[] data = new byte[(int) inFile.length()];

                    FileInputStream fis = new FileInputStream(inFile);
                    fis.read(data);
                    fis.close();

                    // Quick and dirty conversion to the "shorts" that
                    // are expected by the NN
                    m_arrInput = new float[len];
                    int nMax = 0;
                    for(int i = 0; i < data.length - 1; i += 2)
                    {
                        short n = 0;
                        // Mask the low byte: otherwise sign extension corrupts the value
                        n |= data[i] & 0xFF;
                        n |= data[i + 1] << 8;

                        m_arrInput[i / 2] = n;

                        if(nMax < Math.abs(n))
                            nMax = Math.abs(n);
                    }

                    // Normalize to the -1:1 range expected by the NN
                    if(nMax != 0)
                    {
                        for(int i = 0; i < m_arrInput.length; i++)
                            m_arrInput[i] /= nMax;
                    }

                    // Start a task in a separate thread
                    PredictionTask prediction_task = new PredictionTask();
                    prediction_task.execute();
                }
            }
            catch(Exception e)
            {
                e.printStackTrace();
            }
        }
        break;

        // Record a 5-second audio sample
        case R.id.btnMicrophone:
            // Check (and ask) permission for the microphone
            bPermissions = checkRecordPermission(
                Manifest.permission.RECORD_AUDIO, PERMISSION_RECORD_AUDIO);

            // If the user said "no"
            if(!bPermissions)
                return;

            // Start the audio recorder
            AudioRecorderTask task = new AudioRecorderTask();
            task.execute();
            break;

        // Open a Web page with an online tutorial
        case R.id.btnInstructions:
            Intent browserIntent = new Intent(Intent.ACTION_VIEW,
                Uri.parse("https://robotics.snowcron.com/melonaire.htm"));
            startActivity(browserIntent);
            break;

        // Save the audio sample together with the
        // user's estimation of the fruit quality
        case R.id.btnSave:
            saveTestResult();
            enableTabs();
            break;

        // Submit all data saved by "Save" to our server
        case R.id.btnSubmit:
            bPermissions = checkRecordPermission(
                Manifest.permission.INTERNET, PERMISSION_INTERNET);
            if (!bPermissions)
                return;

            submitTestResults();
            break;
    }
}

// On destroying the Activity, release the recorder
@Override
protected void onDestroy()
{
    super.onDestroy();

    m_bIsReading = false;
    if (audioRecord != null)
        audioRecord.release();
}

// ---

// Request permission for Microphone or Internet,
// if not granted already
private boolean checkRecordPermission(String strPermission, int nPermission)
{
    if(ContextCompat.checkSelfPermission(this, strPermission)
        != PackageManager.PERMISSION_GRANTED)
    {
        if(ActivityCompat.shouldShowRequestPermissionRationale(this, strPermission))
        {
            // Show an explanation to the user *asynchronously*
            // -- don't block this thread waiting for the
            // user's response! After the user sees the explanation,
            // try again to request the permission.
        }
        else
        {
            // No explanation needed; request the permission
            ActivityCompat.requestPermissions(this,
                new String[]{strPermission}, nPermission);
        }

        return false;
    }
    else
    {
        // Permission has already been granted
        return true;
    }
}
Below is the code that supports the audio recorder. Note that an "honest" Android device should support a 44.1 kHz sample rate. We are NOT going to train our NN on samples recorded at lower rates. Distinguishing them is easy: the file size should be 200+ KB. If it is not, do not include the sample in the training/testing pool.
void createAudioRecorder()
{
    int channelConfig = AudioFormat.CHANNEL_IN_MONO;
    int audioFormat = AudioFormat.ENCODING_PCM_16BIT;

    getValidSampleRates();      // m_nSampleRate set

    m_BufferSize = AudioRecord.getMinBufferSize(m_nSampleRate, channelConfig, audioFormat);

    if(m_BufferSize == AudioRecord.ERROR || m_BufferSize == AudioRecord.ERROR_BAD_VALUE)
        m_BufferSize = m_nSampleRate * m_nDurationOfSample * 2;
    else
        m_BufferSize = m_nSampleRate * m_nDurationOfSample * 2;

    audioRecord = new AudioRecord(MediaRecorder.AudioSource.MIC,
        m_nSampleRate, channelConfig, audioFormat, m_BufferSize);
}

// In this function we test for common sample rates. But
// note that we are only, really, after 44.1K, so it is ok to
// return an error if it is not supported.
public void getValidSampleRates()
{
    m_nSampleRate = 0;
    for (int rate : new int[] {44100, 22050, 11025, 16000, 8000})
    {
        // add the rates you wish to check against
        int bufferSize = AudioRecord.getMinBufferSize(rate,
            AudioFormat.CHANNEL_CONFIGURATION_DEFAULT,
            AudioFormat.ENCODING_PCM_16BIT);
        if (bufferSize > 0)
        {
            // buffer size is valid, sample rate supported
            m_nSampleRate = rate;
            break;
        }
    }
}

// ---

public void recordStart()
{
    createAudioRecorder();

    Log.d(TAG, "init state = " + audioRecord.getState());

    android.os.Process.setThreadPriority(android.os.Process.THREAD_PRIORITY_AUDIO);

    byte[] audioBuffer = new byte[m_BufferSize / 2];

    if (audioRecord.getState() != AudioRecord.STATE_INITIALIZED)
    {
        Log.e(TAG, "Audio Record can't initialize!");
        return;
    }

    audioRecord.startRecording();
    m_bIsReading = true;

    int recordingState = audioRecord.getRecordingState();
    Log.d(TAG, "recordingState = " + recordingState);

    // ---

    int read = 0;
    String filename = "test.pcm";
    int nExpectedSampleSize = m_nDurationOfSample * m_nSampleRate;

    try
    {
        File file = new File(this.getFilesDir(), filename);
        os = new FileOutputStream(file);

        while(m_bIsReading && read < nExpectedSampleSize)
        {
            read += audioRecord.read(audioBuffer, 0, audioBuffer.length);

            if(AudioRecord.ERROR_INVALID_OPERATION != read)
            {
                os.write(audioBuffer);
            }

            Log.d(TAG, "recorded = " + read + " Bytes");
        }

        os.close();
        recordStop();

        // --- Now load the resulting file to an array
    }
    catch(Exception e)
    {
        e.printStackTrace();
    }
}

// ---

public void recordStop()
{
    m_bIsReading = false;

    Log.d(TAG, "record stop");
    audioRecord.stop();
    audioRecord.release();
}

// This function is not used, but maybe you would like to
// listen to what you have just recorded.
private void playRecording()
{
    String filename = "test.pcm";
    File file = new File(this.getFilesDir(), filename);

    byte[] audioData = null;

    try
    {
        InputStream inputStream = new FileInputStream(file);
        audioData = new byte[m_BufferSize];

        AudioTrack audioTrack = new AudioTrack(AudioManager.STREAM_MUSIC,
            m_nSampleRate, AudioFormat.CHANNEL_OUT_MONO,
            AudioFormat.ENCODING_PCM_16BIT, m_BufferSize,
            AudioTrack.MODE_STREAM);

        audioTrack.play();

        int i = 0;
        while((i = inputStream.read(audioData)) != -1)
        {
            audioTrack.write(audioData, 0, i);
        }
    }
    catch(FileNotFoundException fe)
    {
        Log.e(TAG, "File not found");
    }
    catch(IOException io)
    {
        Log.e(TAG, "IO Exception");
    }
}
After the audio sample is recorded, the user has two options (they are not mutually exclusive). First, by pressing the "Watermelon" button, the user can feed the data to the NN (see the code above). Alternatively, the sample can be annotated on the "Save" tab and saved together with the user-estimated sweetness and ripeness.
public void saveTestResult()
{
    try
    {
        String strOldFileName = "test.pcm";
        File fileOld = new File(this.getFilesDir(), strOldFileName);
        if(!fileOld.exists())
            return;

        // --- We append the user's data at the end (2 last bytes) of the PCM file
        os = new FileOutputStream(fileOld, true);

        Spinner spinnerSweetness = findViewById(R.id.actualSweetness);
        // 0, 1, 2
        int nSweetness = spinnerSweetness.getSelectedItemPosition();

        Spinner spinnerRipeness = findViewById(R.id.actualRipeness);
        // 0, 1, 2
        int nRipeness = spinnerRipeness.getSelectedItemPosition();

        os.write(nSweetness);
        os.write(nRipeness);
        os.close();

        // Now we rename the file to something that is not
        // going to be overwritten the next time we record a sample
        String strNewFileName = "test" + m_nCounter + ".pcm";
        File fileNew = new File(this.getFilesDir(), strNewFileName);
        fileOld.renameTo(fileNew);

        // Increase and store the counter
        m_nCounter++;

        SharedPreferences sPref = getPreferences(MODE_PRIVATE);
        SharedPreferences.Editor ed = sPref.edit();
        ed.putInt("FILE_COUNTER", m_nCounter);
        ed.commit();
    }
    catch(Exception e)
    {
        e.printStackTrace();
    }
}
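For reference, the two values written above become the last two bytes of the PCM file, which is exactly what the Python code later reads back as the low and high bytes of the final 16-bit word. A tiny Python 3 sketch (illustrative only; read_labels is my own name, not part of the project code) shows the layout:

def read_labels(strFileName):
    # saveTestResult() appends sweetness first, then ripeness, one byte each
    with open(strFileName, 'rb') as fh:
        data = fh.read()
    nSweetness = data[-2]    # low byte of the trailing int16
    nRipeness = data[-1]     # high byte of the trailing int16
    return nSweetness, nRipeness

print(read_labels("test1.pcm"))    # e.g. (1, 2)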
After a few samples have been recorded, the user returns from the farmland to a place with Internet access. Now it is possible to submit the recorded samples to us, so we can include them in the next training pool.
public boolean UploadFiles()
{
    try
    {
        new HttpFileUpload(this).execute("");
        return true;
    }
    catch (Exception e)
    {
        return false;
    }
}

// ---

public void submitTestResults()
{
    // Create an array of test1.pcm, test2.pcm...
    m_arrFileNames = new ArrayList<String>();
    for(int i = 1; i < m_nCounter; i++)
    {
        String strFileName = this.getFilesDir() + "/test" + i + ".pcm";
        m_arrFileNames.add(strFileName);
    }

    // And upload all these files to our server
    UploadFiles();
}
After the files have been submitted, the counter should be reset to its default value (which is 1).
public void resetCounter()
{
    m_nCounter = 1;

    SharedPreferences sPref = getPreferences(MODE_PRIVATE);
    SharedPreferences.Editor ed = sPref.edit();
    ed.putInt("FILE_COUNTER", m_nCounter);
    ed.commit();
}
"AudioRecorderTask" is executed in a separate thread: it acquires a 5 seconds audio sample. We assume that in these 5 seconds, the user will knock on the watermelon three times.
class AudioRecorderTask extends AsyncTask<Void, Void, Void>
{
    @Override
    protected void onPreExecute()
    {
        super.onPreExecute();

        // Disable the buttons for the duration of the task
        m_btnMelon.setEnabled(false);
        m_btnMicrophone.setEnabled(false);

        // Show the progress spinner
        m_icnProgressMicrophone.setVisibility(View.VISIBLE);
    }

    // ---

    @Override
    protected Void doInBackground(Void... params)
    {
        try
        {
            recordStart();
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }

        return null;
    }

    @Override
    protected void onPostExecute(Void result)
    {
        super.onPostExecute(result);

        // Enable the buttons when the task is over
        m_btnMelon.setEnabled(true);
        m_btnMicrophone.setEnabled(true);

        m_icnProgressMicrophone.setVisibility(View.GONE);

        enableTabs();
    }
}
HttpFileUpload.java
This file submits data to the server via HTTP(S). As there is nothing unexpected in the code, I will just provide the listing.
Our Java application cannot provide quality predictions yet, as we do not have a trained neural network. However, it is fully capable of recording audio samples and submitting them to us, together with the user's estimation of the quality of the watermelon the sample was recorded for.
The files it sends should go to some online storage. In other words, we need a server-side script to receive the data and store it in a dedicated directory on the server, invisible to anyone but the site owner.
To do this, you will need a hosting provider with PHP (or another scripting language) enabled; I personally use DreamHost.
The server script itself is very simple; it took me about 20 minutes to create, but it is still a separate stage of our project. The script "catches" the files that were sent to it, renames them with unique file names, and stores them in a separate folder called "melonaire".
<?php
if (is_uploaded_file($_FILES['file']['tmp_name']))
{
    $uploads_dir = './melonaire/';
    $tmp_name = $_FILES['file']['tmp_name'];
    $pic_name = $_FILES['file']['name'];

    $filename = md5(date('Y-m-d H:i:s:u'));
    move_uploaded_file($tmp_name, $uploads_dir.$filename);
}
else
{
    echo "File not uploaded successfully.";
}
?>
That's it. The code works together with the client part (see HttpFileUpload.java above).
As our program (the "dull" one or the fully functional one) is used, a pile of files accumulates on the server. From time to time, we are going to copy them to our local computer and use them to teach the neural net; hopefully, the more files we have, the better the predictions we can make. However, let's face it: not all samples we receive are going to be good. Sometimes the user simply presses buttons in random order, trying to figure out what it is all about, and we get a "flat" recording with no knocks in it. Sometimes they knock, but in a wrong way, so wrong that it can be recognized just by looking at the chart... So let's load the data, build charts, and delete the samples that look obviously wrong.
What we are going to do:
1. Copy the files (I use WinSCP, but you can use any SFTP/FTP client) from the folder on your site to your local computer (in the code below, the local folder is called "data_real").
2. Load the files and build a chart for each of them.
3. Save charts as JPG or PNG images.
This way, we will have an image alongside each data file, and the moment we see that a particular image looks wrong, we delete the image AND the data file. Files that "survive" this cleaning are going to be used for teaching the NN.
Here is an example of a "good" sample:
Sample that looks suspicious:
As we plan on working with Keras, importing and charting is done in Python. Let's walk through the most important points of the code (for the complete listing, check here):
...

# The following line suppresses the warning about memory usage
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

...

# A nice trick that makes sure two consecutive runs of the code
# produce the same results
np.random.seed(7)

# 500x500 pixels
nTargetImageSize = 500

# Returns the directory we work in
def get_script_path():
    return os.path.dirname(os.path.realpath(sys.argv[0]))

# Using the PyPlot library, create a chart.
# After this function call, we will have an image containing a
# chart (and, optionally, a .dat file containing data)
def saveChart(strFileName, arr):  #, image_dim):
    fig, ax = plt.subplots(nrows=1, ncols=1)

    DPI = fig.get_dpi()
    fig.set_size_inches(nTargetImageSize / DPI, nTargetImageSize / DPI)

    # We do not need axes, so create a chart without them
    ax.axes.get_xaxis().set_visible(False)
    ax.axes.get_yaxis().set_visible(False)

    ax.plot(arr, color='blue')

    fig.savefig(strFileName + ".jpg", pad_inches=0, transparent='False')
    plt.close(fig)

    # Now save data files
    #arr.astype('int16').tofile(strFileName + ".dat")

# As I mentioned in the tutorial above, just for the sake
# of an exercise, it is possible to use a Conv2D net and to
# analyze this chart, rather than the raw data. To do so, we
# do not need 3 color channels. This function converts an
# image to B&W.
def rgb2Bw(strFileName):
    img = Image.open(strFileName)   # open colour image
    img = img.convert("L")

    #thresh = 200
    #imgData = np.asarray(img)
    #thresholdedData = (imgData > thresh) * 1.0

    img.save(strFileName)

# Get the initial data file, extract the info about ripeness and
# sweetness (encoded by the Java app), and create a chart for it
def saveImages(strFileName, strTargetDir):
    with open('data_real/' + strFileName, 'rb') as fh:
        loaded_array = np.fromfile(fh, dtype=np.int16)

        nData = loaded_array[len(loaded_array) - 1]
        nSweetness = np.int16(nData << 8) >> 8
        nRipeness = nData >> 8

        print("Sweetness: ", nSweetness, "; Ripeness: ", nRipeness)

        # remove sweetness/ripeness data (or else our NN
        # will learn to cheat)
        loaded_array = loaded_array[:-1]

        saveChart(strTargetDir + strFileName, loaded_array)

...

# --- Create train and test arrays of file names
arrFileNames = []
for file in os.listdir(get_script_path() + "/data_real"):
    arrFileNames.append(file)

print("Images:")
for file in arrFileNames:
    print(file)
    saveImages(file, "data_real/")
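While inspecting the charts, it is also worth enforcing the file-size rule mentioned earlier: a 5-second sample recorded at 44.1 kHz takes roughly 200+ KB of raw PCM, so anything much smaller was recorded at a lower rate. The helper below is a sketch of mine (the function names and the 200 KB threshold are my assumptions, not part of the original listing); it drops undersized recordings and removes a chart/data pair once you decide a sample looks wrong:

import os

MIN_SIZE_BYTES = 200 * 1024    # assumed threshold: below this, the sample was not recorded at 44.1 kHz

def remove_sample(strDataDir, strFileName):
    # Delete both the raw data file and the .jpg chart built for it
    for strPath in (os.path.join(strDataDir, strFileName),
                    os.path.join(strDataDir, strFileName + ".jpg")):
        if os.path.exists(strPath):
            os.remove(strPath)

def drop_undersized(strDataDir):
    # Remove every recording that is too small to have been made at 44.1 kHz
    for strFileName in os.listdir(strDataDir):
        if strFileName.endswith(".jpg"):
            continue        # that is a chart, not a recording
        if os.path.getsize(os.path.join(strDataDir, strFileName)) < MIN_SIZE_BYTES:
            remove_sample(strDataDir, strFileName)

drop_undersized("data_real/")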
Working with an audio file (knocking on a watermelon produces one) means, in most cases, either recurrent neural networks or so-called one-dimensional convolutional nets. Lately, convolutional networks are used in almost all cases, as recurrent nets are less flexible, less simple and less scalable. The idea of a convolutional neural net is this: a "window" slides along the data array (a time-intensity chart of the sound), and instead of analysing hundreds of thousands of samples at once, we only work with whatever is inside that "window". The layers located after the convolutional one perform the job of combining and analysing the sub-samples it produces.
To get a better understanding, imagine that you need to find a seagull in a naval landscape picture. You scan the image (a "window" of your attention slides along the imaginary rows and columns), looking for a white checkmark-looking pattern. This is how a Conv2D network works; a 1D net only "slides" along a single coordinate, which makes it an optimal choice when we deal with an audio recording.
It should be mentioned, by the way, that Conv1D nets are not the only choice. As an exercise, I tried analysing images (the charts we built above) using a Conv2D net; surprisingly, it worked just fine. After all, creating a bitmap and analysing it instead of dealing with the raw data is nothing but a rather perverse form of data preprocessing.
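To see the "sliding window" in numbers before looking at the full training script, here is a minimal sketch (not part of the project code; the sample length is my assumption for a 5-second, 44.1 kHz mono recording) that builds a single Conv1D layer with the same parameters used later and prints the shape of its output:

import numpy as np
from keras.models import Sequential
from keras.layers import Conv1D

nSampleSize = 220500   # assumed: ~5 seconds at 44.1 kHz, mono

model = Sequential()
# 32 "windows" (filters), each 512 samples wide, moving 3 samples per step
model.add(Conv1D(filters=32, kernel_size=512, strides=3,
                 input_shape=(nSampleSize, 1), activation='relu'))

x = np.random.rand(1, nSampleSize, 1)   # one fake recording, batch of 1
print(model.predict(x).shape)           # (1, 73330, 32): ~73K window positions, 32 filters each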
Below, the important points of the Deep Learning code are highlighted; for the complete code, see here.
...

# Suppress the memory use warning
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

...

np.random.seed(7)

# ...

# In our case, an epoch is a single, randomly chosen,
# file. So, if we have 100 sample files and 1000 epochs,
# it means that we are going to randomly choose one of
# these 100 files, 1000 times.
epochs = 10000

# If we have 100 files, 70 will be used as a training
# set, and 30 - as a testing set. Never do testing on
# samples that were used for training!
testing_split = 0.7     # 70%

...

# Our NN produces 2 outputs: sweetness and ripeness
nNumOfOutputs = 2
nSampleSize = 0

# -------------------

def get_script_path():
    return os.path.dirname(os.path.realpath(sys.argv[0]))

# Take a data file and break it into sample, ripeness and
# sweetness. Note: the latter two were appended at the
# end of the data array by the "Save" functionality of the Java app
def loadData(strFileName):
    global nSampleSize

    with open('data_real/' + strFileName, 'rb') as fh:
        loaded_array = np.fromfile(fh, dtype=np.int16)

        # of all the data files, select the shortest
        if(nSampleSize == 0):
            nSampleSize = len(loaded_array) - 1
        elif(nSampleSize > len(loaded_array) - 1):
            nSampleSize = len(loaded_array) - 1

        nData = loaded_array[len(loaded_array) - 1]
        nSweetness = np.int16(nData << 8) >> 8
        nRipeness = nData >> 8

        print(strFileName, ": Sweetness: ", nSweetness,
            "; Ripeness: ", nRipeness)

        loaded_array = loaded_array[:-1]

        return nSweetness, nRipeness, loaded_array

# Load the data files, split them into samples and labels
# (which are sweetness/ripeness) and store them in the
# arr_data and arr_labels arrays.
arr_data = []
arr_labels = []

path = get_script_path() + "/data_real/"
for file_name in os.listdir(path):
    nSweetness, nRipeness, arr_loaded = loadData(file_name)
    arr_data.append(arr_loaded / max(abs(arr_loaded)))
    # divide by 2 (number of spinner positions minus 1) to scale labels to 0..1
    arr_labels.append([nSweetness / 2.0, nRipeness / 2.0])

# Switch from Python arrays to NumPy arrays
arrData = np.asarray(arr_data)
arrLabels = np.asarray(arr_labels)

# We need an extra dimension in arrData.
arrData = arrData[..., np.newaxis]

# Split the data into training and testing sets. The training set
# is used to teach the NN, while the testing set is used to
# estimate how well the NN works on data it never saw before.
nTrainData = int(len(arrData) * testing_split)
nTestData = len(arrData) - nTrainData

arrTrainData = arrData[:nTrainData]
arrTestData = arrData[nTrainData:]

arrTrainLabels = arrLabels[:nTrainData]
arrTestLabels = arrLabels[nTrainData:]

print("len(arrTrainLabels): ", len(arrTrainLabels),
    "; len(arrTestLabels): ", len(arrTestLabels))

# Creating the NN itself: a Keras model
model = Sequential()

# filters - how many "windows" we have simultaneously
# kernel_size - size of the "window"
# strides - the "step" the "window" moves at
# input_shape=(nSampleSize, batch)
# name - we are going to use it when exporting the NN
model.add(Conv1D(filters=32, kernel_size=512, strides=3,
    padding='valid', use_bias=False,
    input_shape=(nSampleSize, 1), name='c1d', activation='relu'))

# "Processor" network layer, making sense of the
# output of the Conv1D layer
model.add(Activation('relu', input_shape=(nSampleSize, 1)))

# Reduce the size of the data
model.add(MaxPooling1D(pool_size=(2)))

# Again
model.add(Conv1D(32, (3)))
model.add(Activation('relu'))
model.add(MaxPooling1D(pool_size=(2)))

# And again
model.add(Conv1D(64, (3)))
model.add(Activation('relu'))
model.add(MaxPooling1D(pool_size=(2)))

# Flattening, to prepare the data for the following "Dense" layers
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))

# This layer is used to prevent overfitting
model.add(Dropout(0.5))

model.add(Dense(nNumOfOutputs))  #1))
model.add(Activation('sigmoid'))

model.compile(loss='mean_squared_error',
    optimizer='adam', metrics=['accuracy'])

# For the chosen number of epochs, teach the NN, and every 10th
# cycle calculate the error (in a primitive way, Keras has
# better tools) and save it for later charting.
arrResult = []
arrRip = []
arrSwt = []

for i in range(epochs):
    # Choose a random training sample
    nTrainIdx = np.random.randint(0, nTrainData - 1)
    trainData = arrTrainData[nTrainIdx]

    # expand, to provide the data with a batch dimension,
    # as expected by train_on_batch()
    arrTrain = np.expand_dims(trainData, axis=0)

    trainLabels = arrTrainLabels[nTrainIdx]
    arrLabels = np.expand_dims(trainLabels, axis=0)

    model.train_on_batch(arrTrain, arrLabels)

    if i % 10 == 0:
        # Note: testing uses the testing set, not the training one
        nTestIdx = np.random.randint(0, nTestData - 1)
        testData = arrTestData[nTestIdx]
        testLabels = arrTestLabels[nTestIdx]

        arrTest = np.expand_dims(testData, axis=0)

        arrPrediction = model.predict(arrTest, batch_size=None, verbose=1)
        print("Sweetness, Ripeness: ", str(arrPrediction),
            "(Expected: ", str(testLabels), ")")

        arrResult.append(arrPrediction)
        # index 0 is sweetness, index 1 is ripeness
        arrSwt.append(abs(arrPrediction[0][0] - testLabels[0]))
        arrRip.append(abs(arrPrediction[0][1] - testLabels[1]))

# After the data is collected, plot the charts
plt.plot(arrRip)
plt.plot(arrSwt)
plt.show()

# To export the NN, let's save it on disk:
model.save(get_script_path() + "/models/model.h5")

...
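The error tracking above is deliberately primitive; as the comment says, Keras has better tools. Assuming all your recordings have the same length (they should, at 5 seconds each), something like the following sketch evaluates the whole testing set in a single call; it is my addition, not part of the project listing:

# Evaluate the loss and accuracy over the complete testing set at once
loss, acc = model.evaluate(arrTestData, arrTestLabels, verbose=0)
print("Test loss:", loss, "; test accuracy:", acc)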
In the last line of the code above, we saved the network to the model.h5 file. As you recall, our Java application uses the "converted.pb" file, so we need to convert model.h5 to converted.pb.
This part is very well formalized; the only thing to keep in mind is that TensorFlow/Keras is a work in progress, and exporting a network keeps getting easier. So before you do it, make sure your textbook is up to date.
We are going to use the following Python code to export the NN to a format that the Java Android app can use (for the complete code, see here):
...

# In the Java code, we need to provide names for the inputs and outputs.
# This function prints them, so we don't have to look throughout
# the code.
def print_graph_nodes(filename):
    g = tf.GraphDef()
    g.ParseFromString(open(filename, 'rb').read())

    print()
    print(filename)

    print("INPUT:")
    print([n for n in g.node if n.name.find('input') != -1])

    print("OUTPUT:")
    print([n for n in g.node if n.name.find('output') != -1])

    print("KERAS_LEARNING:")
    print([n for n in g.node if n.name.find('keras_learning_phase') != -1])

    print("======================================================")
    print()

# -------------------

def get_script_path():
    return os.path.dirname(os.path.realpath(sys.argv[0]))

# -------------------

def keras_to_tensorflow(keras_model, output_dir, model_name,
        out_prefix="output_", log_tensorboard=True):

    if os.path.exists(output_dir) == False:
        os.mkdir(output_dir)

    out_nodes = []

    for i in range(len(keras_model.outputs)):
        out_nodes.append(out_prefix + str(i + 1))
        tf.identity(keras_model.output[i], out_prefix + str(i + 1))

    sess = K.get_session()

    from tensorflow.python.framework import graph_util, graph_io

    init_graph = sess.graph.as_graph_def()

    main_graph = graph_util.convert_variables_to_constants(
        sess, init_graph, out_nodes)

    graph_io.write_graph(main_graph, output_dir,
        name=model_name, as_text=False)

    if log_tensorboard:
        from tensorflow.python.tools import import_pb_to_tensorboard

        import_pb_to_tensorboard.import_to_tensorboard(
            os.path.join(output_dir, model_name), output_dir)

# ------

model = load_model(get_script_path() + "/models/model.h5")

# Write converted.pb into the "models" directory, next to model.h5
keras_to_tensorflow(model,
    output_dir=get_script_path() + "/models/",
    model_name="converted.pb")

print_graph_nodes(get_script_path() + "/models/converted.pb")
The code of the "dull" Java app above had multiple sections that should have been commented out until the NN is available. Well, now it is available. Uncomment those sections, copy "converted.pb" to the "assets" folder of your project, and recompile.
At this point you should have a fully working application.
I would like to thank everyone who downloaded the Melonaire program from Google Play and sent me countless audio samples. It is thanks to your involvement that the program has now learned (provided you "knock" the right way) how to estimate the quality of watermelons.
Thank you.