-
Extras
09/17/2018 at 09:33
Downloaded and installed a better MIDI synth, VirtualMidiSynth.
Added the FluidR3 GM Bank SoundFont from http://www.synthfont.com/soundfonts.html (the sfArk decompression tool is available from the same website).
Added a "Drum" checkbox to the user interface. When it is ticked, all output is sent on MIDI channel 10, the General MIDI percussion channel, so note pitch now selects the drum type.
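As a rough illustration of what the "Drum" option changes at the MIDI level, the sketch below builds a raw note-on message. It is written in Python for brevity and is not the project's C# code; `note_on` and `channel_for` are hypothetical helper names, and the default melodic channel of 1 is an assumption.

```python
# Illustrative sketch only: helper names and the default channel are assumptions.
DRUM_CHANNEL = 10  # General MIDI reserves channel 10 for percussion


def note_on(channel: int, key: int, velocity: int = 100) -> bytes:
    """Return a 3-byte MIDI note-on message for a 1-based channel number."""
    if not 1 <= channel <= 16:
        raise ValueError("MIDI channels are numbered 1 to 16")
    if not 0 <= key <= 127 or not 0 <= velocity <= 127:
        raise ValueError("key and velocity must be in 0..127")
    status = 0x90 | (channel - 1)  # 0x9n = note-on, n = zero-based channel
    return bytes([status, key, velocity])


def channel_for(drum_mode: bool, melodic_channel: int = 1) -> int:
    """Pick the output channel depending on the state of the checkbox."""
    return DRUM_CHANNEL if drum_mode else melodic_channel


# With drum mode on, key 36 is a bass drum in the General MIDI percussion map.
message = note_on(channel_for(drum_mode=True), key=36)
print(message.hex())  # 992464
```

The same key number sent on a melodic channel would play a pitched note instead, which is why the grid rows start to behave as drum types once the box is ticked.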
-
Demo
09/16/2018 at 15:07
-
User Interface
09/16/2018 at 14:44
In addition to the tangible blocks interface, there is a traditional Windows Forms user interface. This allows parameters such as tempo, instrument, transposition and octave to be set. These parameters can be altered while the audio is playing, and playback reacts in real time.
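The log does not say how these settings are combined into a note number, so the following Python sketch is only one plausible arrangement. It assumes each grid row is one semitone above the previous one and that octave 4 starts at MIDI note 60 (middle C); the `Settings` class and `midi_note` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    """Values a user interface could change while the sequencer is running."""
    tempo_bpm: float = 120.0
    instrument: int = 0   # MIDI program number, 0..127
    transpose: int = 0    # semitones, may be negative
    octave: int = 4       # assumption: octave 4 starts at MIDI note 60


def midi_note(row: int, settings: Settings) -> int:
    """Map a grid row to a MIDI note number, clamped to the valid range."""
    note = 12 * (settings.octave + 1) + settings.transpose + row
    return max(0, min(127, note))


settings = Settings()
print(midi_note(0, settings))  # 60

# Changing the settings affects the very next note that is computed.
settings.octave = 5
settings.transpose = 2
print(midi_note(0, settings))  # 74
```

Because the settings are read each time a note is computed rather than baked into the stored sequence, a change made in the form takes effect on the next step.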
The user interface also shows the view from the webcam and overlays some markers showing the positions of detected blocks and the current grid column being played.
-
Software - Vision subsystem
09/16/2018 at 14:43
The job of the vision subsystem is to keep the sequencer updated as the arrangement of the blocks is changed.
The software continually grabs video frames from the webcam, identifies the block positions and maps them to cells on a 16 by 16 virtual grid. These cells are then converted to an array of notes for loading into the sequencer.
To identify the blocks, the "SimpleBlobDetector" class from Emgu.CV is used. As the name suggests, it finds blobs in an image and outputs a list of the blob centroids (the coordinates of the block centres). The detector can be configured to accept only blobs within a certain size range, which can be tuned by trial and error for the particular blocks used.
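To make the idea of "blob centroids filtered by size" concrete, here is a small pure-Python sketch. It is not how SimpleBlobDetector works internally; it swaps in a plain flood fill over a binary mask, and `find_blobs`, `min_area` and `max_area` are names chosen for this example.

```python
from collections import deque


def find_blobs(mask, min_area=2, max_area=50):
    """Return (x, y) centroids of 4-connected regions of truthy pixels
    whose pixel count lies within [min_area, max_area]."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    centroids = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill collects every pixel of this region.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            pixels = []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # The size filter rejects noise specks and oversized regions.
            if min_area <= len(pixels) <= max_area:
                cx = sum(p[0] for p in pixels) / len(pixels)
                cy = sum(p[1] for p in pixels) / len(pixels)
                centroids.append((cx, cy))
    return centroids


mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
print(find_blobs(mask))  # [(1.5, 1.5)]
```

The single stray pixel on the right is dropped by the minimum-area check, which is the same role the size range plays when tuning the real detector for a particular set of blocks.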
Once the block coordinates are obtained they can be mapped to the nearest grid cells and then to an array of notes, the cell row and column giving the pitch and order of the notes respectively. The note array is then loaded into the sequencer.
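The mapping from centroids to a note array might look something like the Python sketch below. It assumes the 16 by 16 grid covers the whole camera frame and that the note array is indexed as `[column][row]`; neither detail is stated in the log, and the function names are illustrative.

```python
GRID_SIZE = 16


def to_cell(x, y, frame_width, frame_height, grid=GRID_SIZE):
    """Return the (row, column) of the grid cell containing pixel (x, y).
    The containing cell is also the one whose centre is nearest."""
    col = min(grid - 1, max(0, int(x * grid / frame_width)))
    row = min(grid - 1, max(0, int(y * grid / frame_height)))
    return row, col


def to_note_grid(centroids, frame_width, frame_height, grid=GRID_SIZE):
    """Build a grid of booleans: notes[column][row] is True where a block sits.
    The column gives the order of play and the row gives the pitch."""
    notes = [[False] * grid for _ in range(grid)]
    for x, y in centroids:
        row, col = to_cell(x, y, frame_width, frame_height, grid)
        notes[col][row] = True
    return notes


# A 640 x 480 frame gives cells of 40 x 30 pixels.
print(to_cell(100, 470, 640, 480))  # (15, 2)
notes = to_note_grid([(100, 470), (20, 10)], 640, 480)
print(notes[2][15], notes[0][0], notes[1][1])  # True True False
```

Clamping the indices keeps a centroid that lands exactly on the frame edge inside the grid instead of raising an index error.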
This process runs independently of the sequencer so differences in frame rate or blob detection time do not affect the timing of the audio output.
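One common way to get this kind of decoupling is to let the vision loop publish a complete note array and let the sequencer read whichever array is current when its timer fires. The Python sketch below shows that pattern with a lock; the log does not describe the project's actual mechanism, and `SharedNotes` is a hypothetical name.

```python
import threading


class SharedNotes:
    """Holds the most recent note array produced by the vision loop."""

    def __init__(self, notes):
        self._lock = threading.Lock()
        self._notes = notes

    def publish(self, notes):
        """Called by the vision thread whenever a frame has been processed."""
        with self._lock:
            self._notes = notes

    def snapshot(self):
        """Called by the sequencer on each timer tick; never waits for a frame."""
        with self._lock:
            return self._notes


shared = SharedNotes([[False]])


def vision_loop():
    # Stand-in for grab frame, detect blobs, map to grid.
    for _ in range(3):
        shared.publish([[True]])


worker = threading.Thread(target=vision_loop)
worker.start()
worker.join()
print(shared.snapshot())  # [[True]]
```

The lock is held only long enough to swap or read a reference, so a slow frame delays the next update to the sequence but not the beat itself.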
The vision subsystem also displays the captured images and overlays markers showing the positions of detected blocks and the grid column currently being played.
-
Software - Audio subsystem
09/16/2018 at 14:42
The key component here is the Sequencer. This maintains the sequence of notes to be played and steps from one set of notes to the next on receipt of a timer tick. Once the last set of notes has been played, it repeats from the beginning.
The human ear is very sensitive to changes in timing of sounds, so it's important to use a regular beat for the timer. Standard Windows timers have a resolution of about 15 ms which is not quite good enough. Using the multimedia timer gives a resolution down to 1 ms and can be set to generate a periodic time tick with good consistency.
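A quick calculation shows why the timer resolution matters. The Python sketch below assumes a tempo of 120 beats per minute and four sequencer steps per beat; both figures are examples, not values taken from the project.

```python
def step_interval_ms(tempo_bpm, steps_per_beat=4):
    """Time between sequencer steps in milliseconds."""
    return 60_000 / (tempo_bpm * steps_per_beat)


def worst_case_error_percent(resolution_ms, interval_ms):
    """Timer resolution expressed as a percentage of one step."""
    return 100 * resolution_ms / interval_ms


interval = step_interval_ms(120)
print(interval)                                # 125.0
print(worst_case_error_percent(15, interval))  # 12.0
print(worst_case_error_percent(1, interval))   # 0.8
```

Under these assumptions a 15 ms timer can be off by roughly an eighth of a step, while a 1 ms timer keeps the error below one percent.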
To simplify the audio output code, MIDI synthesis is used. This avoids having to generate audio waveforms directly: the software only sends note messages and the synthesizer produces the sound. For this project the built-in Microsoft GS Wavetable MIDI Synth is used. It is not the best sounding, but it allows up to 32 notes to be played simultaneously from a selection of 128 General MIDI instruments (program numbers 0 to 127) and is adequate for our purposes.
The sequencer component can be run and tested independently. A sequence can be loaded using the SetNotes method which takes a 2-dimensional array of Notes and will then play that sequence in a loop until stopped.
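The behaviour described in this entry can be sketched in a few lines of Python. This is not the project's C# Sequencer; the `[column][row]` layout of booleans, the `play_step` callback and the `tick` method are assumptions made for the example, with `set_notes` standing in for the SetNotes method.

```python
class Sequencer:
    """Steps through columns of a note grid, one column per timer tick."""

    def __init__(self, play_step):
        self._play_step = play_step  # callback that receives the active rows
        self._notes = [[]]           # one empty column until a sequence is set
        self._position = 0

    def set_notes(self, notes):
        """Load a new sequence; the playback position is kept if still valid."""
        if not notes:
            raise ValueError("the sequence needs at least one column")
        self._notes = notes
        self._position %= len(notes)

    def tick(self):
        """Play the current column, then advance and wrap around at the end."""
        column = self._notes[self._position]
        active_rows = [row for row, on in enumerate(column) if on]
        self._play_step(active_rows)
        self._position = (self._position + 1) % len(self._notes)


# Testing without a timer or MIDI device: record what would be played.
played = []
sequencer = Sequencer(played.append)
sequencer.set_notes([[True, False], [False, True]])
for _ in range(3):
    sequencer.tick()
print(played)  # [[0], [1], [0]]
```

Passing the output in as a callback is what makes a stand-alone test like this possible: the same class could be driven by a real timer and a MIDI output in the application.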
-
Software Overview
09/16/2018 at 14:41
The overview of the system is as follows:
On the left are the vision components. These are responsible for getting images of the blocks from the webcam and converting them into a set of notes to be played. OpenCV is a well-known library for image processing, and here we use the Emgu.CV wrapper for C#/.NET.
On the right are the audio components. These convert the extracted notes into sound. The NAudio library is used to send MIDI messages to the synthesizer.
-
Hardware Design
09/16/2018 at 14:35
The hardware consists of a webcam and some blocks. The webcam is connected to a laptop where all image processing and sound generation take place.
To keep it simple, Lego blocks were used. They have a consistent size and shape, which should make it easier for the computer vision system to track them reliably. Yellow blocks were found to be detected less reliably and to be more sensitive to lighting conditions. Other, darker colours worked better.
The webcam used was a "GUCEE HD92 720P" model. A resolution of 640 x 480 is sufficient for this project so most webcams should work OK.
The webcam was mounted directly over the blocks using a hacked IKEA Tertial lamp. The original webcam clip was removed and the lamp bracket drilled out to accept the fitting.
For audio output the built-in PC sound card was used.