{"id":184,"date":"2010-11-19T08:27:15","date_gmt":"2010-11-19T15:27:15","guid":{"rendered":"http:\/\/journeyman.ivystreetinc.com\/?p=184"},"modified":"2010-11-19T09:26:20","modified_gmt":"2010-11-19T16:26:20","slug":"voice-recognition-kinda-hard","status":"publish","type":"post","link":"http:\/\/10kdev.net\/?p=184","title":{"rendered":"Voice Recognition?  Kinda hard . . . ."},"content":{"rendered":"<p>OK, I decided on yet another foray into speech-to-text software.\u00a0 I want to see what exactly I can do with it and, honestly, the idea of being a poor man&#8217;s Tony Stark is just too cool to pass up.\u00a0 Well, I guess his commands are like &#8220;computer, reconfigure titanium actuator motors&#8221; and mine would just be &#8220;computer, open Minesweeper.&#8221;\u00a0 Still cool though.<\/p>\n<p>Speech to text: You talk, and the computer either types your statement or executes a command.\u00a0 Speech to text is also called Voice Recognition &#8211; VR.\u00a0 A lot of these applications also do text to speech, where the computer speaks the text or responds vocally.<\/p>\n<p>I started to research what was open source and what wasn&#8217;t, with these four features as priorities:<\/p>\n<ol>\n<li>Be able to take dictation.<\/li>\n<li>Be able to process already recorded audio files (MP3s) into text.<\/li>\n<li>Execute commands.<\/li>\n<li>Respond like HAL 9000.<\/li>\n<\/ol>\n<p>After some research I came up with these applications:<\/p>\n<ul>\n<li>Open source &#8211; Sphinx project from Carnegie Mellon.<\/li>\n<li>Open source &#8211; Simon (a non-American effort).<\/li>\n<li>Plain old Microsoft VR functionality on a PC.<\/li>\n<li>Shareware &#8211; E-Speaking VR software ($14; 30-day trial).<\/li>\n<li>Commercial &#8211; Dragon Speech from Nuance (currently on sale at $75 with a headset).<\/li>\n<\/ul>\n<p><em><strong>Sphinx <\/strong><\/em>was purely an API for people wanting to build academic applications or needing a core library to make an 
app.\u00a0 There were pieces in both C++ and Java, but I found the ramp-up time too long: I don&#8217;t want to CODE speech to text, I want to DO speech to text.<\/p>\n<p>Here is the Sphinx site:\u00a0 http:\/\/cmusphinx.sourceforge.net\/<\/p>\n<p>I played with <em><strong>Simon <\/strong><\/em>for about 4 hours, and came to this conclusion: it may be a fancy, high-potential application, but it&#8217;s over-engineered and complicated, and it lacks basic &#8220;how-to&#8221;\u00a0 instructions, the kind of gap that would have any &#8220;see-I-told-you-so&#8221; usability expert writing a second edition of their book scolding us developers.\u00a0 There is a LOT of setup to get the thing off the ground: you have to download a dictionary,\u00a0 load it, create &#8220;scenarios,&#8221; create grammars, and record a lot of samples before it will do anything.\u00a0 And since me and my homey Noam Chomsky haven&#8217;t hung out in years, I was a little short on the linguistics PhD knowledge needed to make this thing fly.\u00a0 It exposed all the innards I didn&#8217;t want to know about.<\/p>\n<p>Also, Simon comes with Linux-style KDE graphics and tooling.\u00a0 I didn&#8217;t dig into the architecture, but it kinda reminds me of a desktop KDE install on a PC.\u00a0 But I think this product is platform-independent and even runs on Mac.<\/p>\n<p>The Simon site: http:\/\/simon-listens.org<\/p>\n<p><strong>In all fairness, projects like Sphinx and Simon are crucial for the advancement of VR technology, CRUCIAL.\u00a0 I appreciate the efforts on both projects. Thank you!<\/strong><\/p>\n<p>But I want to be Tony Stark.\u00a0 NOW, dammit.<\/p>\n<p>Through my searches I found out <em><strong>Microsoft has some VR built in<\/strong><\/em>, depending on what you have installed.\u00a0\u00a0 Here we go again\u00a0 . . . grasping at straws . . . . 
maybe.\u00a0\u00a0 First I look in my install of Office 2007.\u00a0 Hmm, nothing.\u00a0 I look in Control Panel under Speech: the VR tab is missing.\u00a0 Upon reading I find this out:\u00a0 XP itself doesn&#8217;t have it; you need to have installed Office 2002 or 2003.\u00a0\u00a0 After that, starting with Vista, they moved the VR module into the OS, so Office 2007 and later do not have it.\u00a0 CRAP.\u00a0 So I find this site:\u00a0 http:\/\/support.microsoft.com\/kb\/306537\u00a0 and a light goes on: just install the right service pack.<\/p>\n<p>I find all the software here:\u00a0 http:\/\/www.microsoft.com\/downloads\/en\/details.aspx?FamilyID=5e86ec97-40a7-453f-b0ee-6583171b4530&amp;displaylang=en<\/p>\n<p>And after getting all of it I finally figure out that what I want is the 68 MB file (this was quite painful, I don&#8217;t know WHAT I did to my poor Compaq).\u00a0 So Bingo\/Bango it&#8217;s installed . . . I can, uh, train it.\u00a0 I stopped there, because I wasn&#8217;t sure how to make it really work on my XP machine.<\/p>\n<p><em><strong>E-Speaking<\/strong><\/em> is next.\u00a0 I am impressed.\u00a0 It gives you immediate out-of-the-box functionality and a nice interface with commands for your PC.\u00a0 The UI is relatively easy to use.\u00a0 It TYPES into Notepad!\u00a0\u00a0 It has a cool talking face you can skin.\u00a0 Very impressed.\u00a0 The E-Speaking product does most of what I want, with relatively little pain.\u00a0 Awesome.\u00a0 Also, it sits on top of the Microsoft SAPI engine, so it is PC only, but who cares for my uses.<\/p>\n<p>Here&#8217;s the E-Speaking site:\u00a0 http:\/\/www.e-speaking.com\/<\/p>\n<p><em><strong>Dragon Speech<\/strong><\/em> is a pay-for product.\u00a0 I have used it before; it is FAST and the UI is very good.\u00a0 For more money it does what you want.\u00a0 What I don&#8217;t like is that, reading their site, they seem to want to limit your choice of hardware to theirs.\u00a0 I am not sure though if they can lock out your own Bluetooth 
headset; that would totally SUCK.\u00a0\u00a0\u00a0 At the current $75 price point I may purchase it though.\u00a0 Also, it&#8217;s difficult to figure out exactly what features come with the different prices\/versions on their site.\u00a0 I don&#8217;t want to have to pay another $100 to have the word commands &#8220;unlocked,&#8221; and I can&#8217;t tell whether I can even train my own apps on their software; it might be quite locked down.\u00a0 Also, Dragon at a higher price level (for both Mac and PC) can transcribe audio files.<\/p>\n<p>Dragon&#8217;s site:\u00a0 http:\/\/www.nuance.com<\/p>\n<p>It&#8217;s an interesting revenue model for E-Speaking to follow; I have a feeling Dragon will follow it too.\u00a0 Basically, if you build the wrapper and some basic functionality, you can charge for different voices, bigger dictionaries, command libraries for applications, etc.\u00a0 It&#8217;s like mapping software: you buy the wrapper cheap, and pay more for the maps you want if you want them.\u00a0 Not a bad idea.<\/p>\n<p>The winner, right now, for my purposes: E-Speaking.\u00a0\u00a0 It does goals 1, 3, and 4, and very quickly.\u00a0 None of them did goal 2, processing MP3s to text, short of the higher-priced Dragon version.\u00a0 Maybe I&#8217;ll see what it takes to write something like that.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>OK, I decided on yet another foray into speech-to-text software.\u00a0 I want to see what exactly I can do with it and, honestly, the idea of being a poor man&#8217;s Tony Stark is just too cool to pass up.\u00a0 Well, I guess his commands are like &#8220;computer, reconfigure titanium actuator motors&#8221; and mine would just be 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/10kdev.net\/index.php?rest_route=\/wp\/v2\/posts\/184"}],"collection":[{"href":"http:\/\/10kdev.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/10kdev.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/10kdev.net\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/10kdev.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=184"}],"version-history":[{"count":3,"href":"http:\/\/10kdev.net\/index.php?rest_route=\/wp\/v2\/posts\/184\/revisions"}],"predecessor-version":[{"id":186,"href":"http:\/\/10kdev.net\/index.php?rest_route=\/wp\/v2\/posts\/184\/revisions\/186"}],"wp:attachment":[{"href":"http:\/\/10kdev.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=184"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/10kdev.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=184"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/10kdev.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=184"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}