
A Yahoo! employee, currently "playing" as a mobile engineer. The posts and comments here are my own and don't necessarily represent my employer, Yahoo!. You, or anyone else, don't have to agree with what I'm posting or commenting on.

Playing with Audio in HTML5

03.04.2013

Recently, we had a small challenge: how to record our own voice through the microphone, with minimal effort from the user, and store it in the cloud. By minimal effort I mean that you shouldn’t have to record your voice in a full-blown audio application such as GarageBand or Audacity and then upload the files to the cloud manually with ftp or scp (Windows people, please go ahead and read the UNIX/Linux manual).

I know those steps are simple, but I think we can hand the uploading over to the computer and let the machine do it for us. After all, our wish is its command. So I remembered the audio features in HTML5 and decided to play with them. (By the way, I thought about using Flash too, but I think you already know my answer. In fact, by the end of this article, I still fall back on Flash’s capabilities to get things done.)

First, I needed to know how the browser receives audio input from the microphone. It turns out that HTML5, or more precisely the Media Capture and Streams specification, introduces a new API for this: navigator.getUserMedia(). Be aware that most browsers ship their own vendor-prefixed implementation: you get navigator.webkitGetUserMedia() in Chrome and navigator.mozGetUserMedia() in Firefox.

It’s a good idea to check whether the browser you (or your users) are running supports this new API. If it doesn’t, you can fall back to whatever alternative suits your situation, including Flash (ha!). If it does, congratulations: your journey into the awesomeness of HTML5 begins now.

Your checking routine will look something like this:

function hasGetUserMedia() {
  return !!(navigator.getUserMedia || navigator.webkitGetUserMedia ||
            navigator.mozGetUserMedia || navigator.msGetUserMedia);
}

or this:

navigator.getMedia = ( navigator.getUserMedia ||
                       navigator.webkitGetUserMedia ||
                       navigator.mozGetUserMedia ||
                       navigator.msGetUserMedia);
                       
if (navigator.getMedia) {
  // do something here
}
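The prefixed functions above all take success and error callbacks. Current browsers also expose a standard, promise-based navigator.mediaDevices.getUserMedia(). As a minimal sketch (resolveGetUserMedia is a hypothetical name, not something from the original code or any browser), the helper below prefers the standard API and otherwise wraps whichever legacy, possibly prefixed, function exists in a Promise, returning null when there is no support at all so you can pick your fallback.

```javascript
// Hypothetical helper (not from the original article).
// Returns getMedia(constraints) -> Promise<MediaStream>, or null if the
// browser has no getUserMedia at all. `nav` is normally window.navigator;
// taking it as a parameter lets you test the helper with a stub object.
function resolveGetUserMedia(nav) {
  // Standard promise-based API in current browsers
  if (nav.mediaDevices && typeof nav.mediaDevices.getUserMedia === 'function') {
    return function (constraints) {
      return nav.mediaDevices.getUserMedia(constraints);
    };
  }

  // Older callback-based, possibly vendor-prefixed versions
  var legacy = nav.getUserMedia || nav.webkitGetUserMedia ||
               nav.mozGetUserMedia || nav.msGetUserMedia;
  if (!legacy) {
    return null; // no support: use your fallback (Flash, or anything else)
  }

  return function (constraints) {
    return new Promise(function (resolve, reject) {
      // Invoke with navigator as `this`, as the prefixed methods expect
      legacy.call(nav, constraints, resolve, reject);
    });
  };
}

// Usage in a page:
// var getMedia = resolveGetUserMedia(navigator);
// if (getMedia) {
//   getMedia({audio: true}).then(function (stream) { /* ... */ }, onRejected);
// }
```

Wrapping the callbacks in a Promise means the rest of your code only has to deal with one calling convention, whichever browser it ends up running in.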

Next, you have to explicitly ask the browser for access to the specific hardware, in this case the microphone. You request this permission by passing a constraints object, {audio: true}, as the first argument to getUserMedia. You can request the camera with {video: true}, so if you want both, just write {audio: true, video: true}.

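var audio = document.querySelector('audio'); // the page's <audio> element
var localStream;

function onRejected(e) { console.log('Microphone access was rejected', e); }

function prepareMicrophone() {
  // navigator.getMedia is the prefix-resolved function from the check above
  navigator.getMedia({audio: true}, function(localMediaStream) {
    if ('srcObject' in audio) audio.srcObject = localMediaStream; // current browsers
    else audio.src = window.URL.createObjectURL(localMediaStream); // older ones

    audio.onloadedmetadata = function(e) {
      localStream = localMediaStream;
      // streamRecorder = localMediaStream.record(); // early draft API, not implemented by browsers
    };
  }, onRejected);
}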

How permission is granted varies from browser to browser. Chrome shows a prompt at the top of the page asking whether you allow the application to access your hardware (in my case, the microphone). Firefox is a bit more confusing: at the time of writing, you have to open about:config and change the default value of media.navigator.enabled from false to true. Other browsers may behave differently, so read their documentation first.
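If the user refuses (or a browser setting blocks access), the error callback passed to getUserMedia, onRejected in the snippet above, is called instead of the success callback. A small sketch of what it could do, assuming you want to show a friendlier message: describeMediaError is a hypothetical helper that maps the error's name to text. The standard names (NotAllowedError, NotFoundError, NotReadableError) are what current browsers report; the older Chrome names shown are mapped to the same messages.

```javascript
// Hypothetical helper (not from the original article) for the error callback.
// Accepts an error object or a bare name string. Current browsers reject with
// standard DOMException names such as NotAllowedError; older prefixed Chrome
// builds used names like PermissionDeniedError, mapped to the same text here.
function describeMediaError(err) {
  var name = typeof err === 'string' ? err : ((err && err.name) || '');
  switch (name) {
    case 'NotAllowedError':
    case 'PermissionDeniedError':
      return 'Microphone access was denied.';
    case 'NotFoundError':
    case 'DevicesNotFoundError':
      return 'No microphone was found.';
    case 'NotReadableError':
    case 'TrackStartError':
      return 'The microphone could not be started (is another app using it?).';
    default:
      return 'Could not access the microphone' + (name ? ' (' + name + ')' : '') + '.';
  }
}

// Usage: function onRejected(e) { alert(describeMediaError(e)); }
```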

Once you’ve been granted permission, the browser is ready to capture the user’s voice. The <audio> element now takes its source from the MediaStream object handed to the success callback.

At this point, you can capture the user’s voice using nothing but the browser and the magic of JavaScript. Interesting, huh? ;)

The last part is sending the voice data to the server, and here is my conclusion:

These steps are not complete, because my final goal was to send the voice data to the server, and I couldn’t find anything in the specification, or anywhere near it, on how to take this voice data and process it further. If any of you have found a way to do it (or if I’m missing one of those APIs), please share your findings in the comments section and I will update my post.
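For reference, one workaround is to route the MediaStream into the Web Audio API (AudioContext.createMediaStreamSource connected to a processing node such as a ScriptProcessorNode), collect the raw Float32 samples yourself, and encode them as a WAV file in JavaScript before uploading. The sketch below covers only the encoding step. encodeWav is a hypothetical name, and it assumes a single (mono) channel of samples in the range [-1, 1], written out as 16-bit PCM.

```javascript
// Hypothetical helper (not from the original article): encodes mono PCM
// samples (Float32Array, values in [-1, 1]) as a 16-bit PCM WAV file.
// Returns an ArrayBuffer containing the 44-byte header plus the samples.
function encodeWav(samples, sampleRate) {
  var buffer = new ArrayBuffer(44 + samples.length * 2);
  var view = new DataView(buffer);

  function writeString(offset, str) {
    for (var i = 0; i < str.length; i++) {
      view.setUint8(offset + i, str.charCodeAt(i));
    }
  }

  // RIFF header
  writeString(0, 'RIFF');
  view.setUint32(4, 36 + samples.length * 2, true); // file size minus 8 bytes
  writeString(8, 'WAVE');

  // "fmt " chunk: PCM, 1 channel, 16 bits per sample
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);             // chunk size
  view.setUint16(20, 1, true);              // audio format: 1 = PCM
  view.setUint16(22, 1, true);              // channels: mono
  view.setUint32(24, sampleRate, true);     // sample rate
  view.setUint32(28, sampleRate * 2, true); // byte rate = rate * channels * 2
  view.setUint16(32, 2, true);              // block align = channels * 2
  view.setUint16(34, 16, true);             // bits per sample

  // "data" chunk: clamp each float and store it as a little-endian int16
  writeString(36, 'data');
  view.setUint32(40, samples.length * 2, true);
  for (var i = 0; i < samples.length; i++) {
    var s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
  }
  return buffer;
}

// Usage in a page (the '/upload' URL is hypothetical):
// var blob = new Blob([encodeWav(samples, audioContext.sampleRate)], {type: 'audio/wav'});
// var xhr = new XMLHttpRequest(); xhr.open('POST', '/upload'); xhr.send(blob);
```

Uncompressed WAV keeps the encoder tiny, at the cost of larger uploads (about 2 bytes per sample per second of audio).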

Flash to the rescue!

If you want to see the sample and the full source code, visit http://ksetyadi.com/labs/audio-html5

Published at DZone with permission of its author, Kristiono Setyadi.
