VoiceXML
VoiceXML (VXML) is the W3C's standard XML format for specifying interactive voice dialogues between a human and a computer. It is closely analogous to HTML, and brings to voice applications the same advantages of web application development and deployment that HTML brings to visual applications. Just as HTML documents are interpreted by a visual web browser, VoiceXML documents are interpreted by a voice browser. A common architecture is to deploy banks of voice browsers attached to the public switched telephone network (PSTN) so that users can simply pick up a phone to interact with voice applications.
There are already thousands of commercial VoiceXML applications deployed, processing many millions of calls per day. These applications perform a huge variety of services, including order inquiry, package tracking, driving directions, emergency notification, wake-up, flight tracking, voice access to email, customer relationship management, prescription refilling, audio newsmagazines, voice dialing, and real-estate information. They serve all industries, and range in size all the way up to massive national directory assistance applications.
VoiceXML has tags that instruct the voice browser to provide speech synthesis, automatic speech recognition, dialog management, and soundfile playback. The following is an example of a VoiceXML document:
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        Hello world!
      </prompt>
    </block>
  </form>
</vxml>
When interpreted by a VoiceXML interpreter, this document outputs "Hello world!" in synthesized speech.
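A slightly larger sketch can show the other capabilities named above (speech recognition, dialog management, and soundfile playback) working together. The form name, the field name, and the file welcome.wav are hypothetical, and the built-in "digits" field type is assumed to be supported by the platform:

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order_status">
    <block>
      <!-- Soundfile playback: the inline text is spoken only if
           welcome.wav (a hypothetical file) cannot be played -->
      <prompt>
        <audio src="welcome.wav">Welcome to the order line.</audio>
      </prompt>
    </block>
    <!-- Speech recognition: the built-in "digits" type accepts
         spoken or keyed digits -->
    <field name="order_number" type="digits">
      <prompt>Please say your order number.</prompt>
      <!-- Dialog management: handlers for silence and for input
           that the recognizer could not match -->
      <noinput>I did not hear anything. <reprompt/></noinput>
      <nomatch>I did not understand that. <reprompt/></nomatch>
      <filled>
        <prompt>You said <value expr="order_number"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

The interpreter visits the form items in document order: it plays the greeting, then stays on the field until the caller supplies a value, and only then runs the filled handler.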
Typically, HTTP is used as the transport protocol for fetching VoiceXML pages. While simpler applications may use static VoiceXML pages, most rely on dynamic VoiceXML page generation using an application server such as Tomcat, WebLogic, a .NET server, or WebSphere. In a well-architected web application, the voice interface and the visual interface share the same back-end business logic.
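From the voice browser's side, requesting a dynamically generated page might look like the following sketch. The URL http://www.example.com/track and the field name are placeholders, not part of any real service; the server is assumed to answer with another VoiceXML document built from the submitted value:

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="track_package">
    <field name="tracking_number" type="digits">
      <prompt>Please say or key in your tracking number.</prompt>
    </field>
    <block>
      <!-- HTTP GET to a hypothetical application server; the response
           is expected to be the next VoiceXML document -->
      <submit next="http://www.example.com/track"
              namelist="tracking_number" method="get"/>
    </block>
  </form>
</vxml>
```

Because the request is ordinary HTTP, the same server-side code that answers a visual web form could in principle answer this one, which is how the two interfaces come to share back-end logic.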
Historically, VoiceXML platform vendors have implemented the standard in different ways, and added proprietary features. But the W3C's new VoiceXML 2.0 standard clarifies most areas of difference, and vendors are going through a rigorous conformance testing process set up by the VoiceXML Forum, the industry group promoting the use of the standard.
Two closely related W3C standards used with VoiceXML are the Speech Synthesis Markup Language (SSML) and the Speech Recognition Grammar Specification (SRGS). SSML is used to decorate textual prompts with information on how best to render them in synthetic speech, for example which speech synthesizer voice to use, and when to speak louder. SRGS is used to tell the speech recognizer what sentence patterns it should expect to hear.
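Both languages can appear inline in a VoiceXML document. The following is a minimal sketch with a hypothetical drink-ordering dialog; the voice that is actually selected and the way volume is rendered depend on the synthesizer:

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="drink_order">
    <field name="drink">
      <prompt>
        <!-- SSML: request a voice, speak one phrase louder, then pause -->
        <voice gender="female">
          Would you like <prosody volume="loud">coffee or tea</prosody>?
          <break time="500ms"/>
        </voice>
      </prompt>
      <!-- SRGS (XML form): an optional lead-in, one of two drinks,
           and an optional "please" -->
      <grammar version="1.0" root="drink" mode="voice"
               type="application/srgs+xml">
        <rule id="drink">
          <item repeat="0-1">I would like</item>
          <one-of>
            <item>coffee</item>
            <item>tea</item>
          </one-of>
          <item repeat="0-1">please</item>
        </rule>
      </grammar>
      <filled>
        <prompt>Thank you.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

With this grammar the recognizer would accept utterances such as "tea", "coffee please", or "I would like tea please", and would raise a nomatch event for anything else.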
The Call Control eXtensible Markup Language (CCXML) is a complementary W3C standard. A CCXML interpreter is used on some VoiceXML platforms to handle the initial call setup between the caller and the voice browser, and to provide telephony services like call transfer and disconnect for the voice browser. CCXML is also very useful in non-VoiceXML contexts.
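As a rough illustration of that division of labour, a CCXML document might answer an incoming call, hand it to a VoiceXML dialog, and end the session when the dialog finishes. This is a sketch based on the event names in the CCXML specification; hello.vxml is a hypothetical file, and details may differ between platforms and specification drafts:

```xml
<?xml version="1.0"?>
<ccxml version="1.0" xmlns="http://www.w3.org/2002/09/ccxml">
  <eventprocessor>
    <!-- Incoming call is ringing: answer it -->
    <transition event="connection.alerting">
      <accept/>
    </transition>
    <!-- Call is connected: start a VoiceXML dialog on it
         (src is an ECMAScript expression, hence the inner quotes) -->
    <transition event="connection.connected">
      <dialogstart src="'hello.vxml'"/>
    </transition>
    <!-- Dialog has finished: end the CCXML session -->
    <transition event="dialog.exit">
      <exit/>
    </transition>
  </eventprocessor>
</ccxml>
```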
External links
- W3C's Voice Browser Working Group (http://www.w3.org/Voice/)
- W3C's VoiceXML 2.0 Recommendation (http://www.w3.org/TR/voicexml20/)
- W3C's SRGS 1.0 Recommendation (http://www.w3.org/TR/speech-grammar/)
- W3C's SSML 1.0 Recommendation (http://www.w3.org/TR/speech-synthesis/)
- W3C's CCXML 1.0 standard (http://www.w3.org/TR/ccxml/)
- VoiceXML Forum (http://www.voicexml.org/)
- VoiceXML Forum Tutorial (http://www.voicexml.org/tutorials/intro1.html)
- VoiceXML Review e-zine (http://www.voicexmlreview.org/)
- OpenVXI, an open-source VoiceXML interpreter (http://www.speech.cs.cmu.edu/openvxi/index.html)
- VoiceXML Italian User Group (VIUG) (http://www.vxmlitalia.com)