Audio Synthesis, Speech to Text, Text to Speech

I am still not done latexing Brexxa’s left hand… I am using Mold Builder latex… it is superior latex, very stretchy, more yellow, less white, unlike slip. I have been having fun with Plastilina… I seem to find myself sculpting a lot from scratch… just using existing WowWee parts, changing 10% of the design, and soon to prototype… I am also looking into casting clear resin… and looking forward to spray painting, especially in metallic. For forms I have come up with using hobby wire/chicken wire, or rather rabbit wire, something to hold a basic form, duct taping, and foam filler if needed, then Plastilina for the custom sculpt. JOOt is a new one my teacher told me about for use in plaster moulds… so they don’t break and are virtually indestructible. I was reading the new World of Warcraft comic book and am currently finding interest in the blood elves… and matching the face/head design of my first few fembot heads… to cast with high-elf ears… and give LED eyes behind clear deodorant roll-ons. The cosmetology heads seem to look similar to the blood elves in World of Warcraft.

You’d have to spend quite a lot of money on your project… Reproducing human motion is a difficult goal to reach, and it’s the same for AI. Dragon (Nuance) takes a lot of CPU power to “understand” words.

After a lot of research, I concluded that the simplest and most affordable project is:

  • VEX robotics for the frame
  • Phidgets for sensors (with RFID to “detect” and recognize objects);
  • SAPI 4.0 speech recognition (bad luck: it’s only English and Japanese here);
  • Lernout & Hauspie for text to speech (in maybe 16 languages);
  • A lot of programming to make custom software (SYNOPSIS) to act as the main controller;
  • An Eee PC for its small size and modest specs, so my software will also run on many low-cost PCs.
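For the speech-output side, here is a minimal sketch of driving a Windows speech engine from a controller program, assuming Python with the pywin32 package and a SAPI 5 voice installed (Jeff’s setup uses SAPI 4 and a Lernout & Hauspie voice, so the details would differ, but the idea is the same):

```python
import win32com.client  # pip install pywin32; Windows only

# SAPI COM object; whichever voice engine is installed
# (e.g. an L&H / RealSpeak voice) does the actual synthesis.
voice = win32com.client.Dispatch("SAPI.SpVoice")

# List the voices installed on this machine.
voices = voice.GetVoices()
for i in range(voices.Count):
    print(voices.Item(i).GetDescription())

# Blocking call: the text is spoken through the default audio device.
voice.Speak("Main controller online.")
```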

For the AI, I took the Excel file from ALICE that reproduces all kinds of answers, which makes a basic discussion possible. It’s not really an “AI”; I was disappointed to see that it’s only statistical software… Using a WowWee robot as the “main robot” is nice, but it requires a lot of hacking, and searching the web I didn’t find many hacks.
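To make that concrete, here is a minimal sketch of the statistical stimulus/response lookup described above, assuming the ALICE pairs have been exported from the Excel file to a two-column CSV (the file name and the scoring rule are just illustrative):

```python
import csv
import re

def load_pairs(path):
    """Load pattern/response pairs (hypothetical CSV export of the
    ALICE spreadsheet: column 1 = pattern, column 2 = response)."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(pattern.lower(), response) for pattern, response in csv.reader(f)]

def best_response(pairs, sentence, default="I'm sorry, I don't understand."):
    """Score each stored pattern by word overlap with the input and return
    the response of the best match: purely statistical, no real understanding."""
    words = set(re.findall(r"\w+", sentence.lower()))
    best, best_score = default, 0
    for pattern, response in pairs:
        score = len(words & set(re.findall(r"\w+", pattern)))
        if score > best_score:
            best, best_score = response, score
    return best

pairs = load_pairs("alice_pairs.csv")  # hypothetical export of the Excel file
print(best_response(pairs, "What is your name?"))
```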

Good luck on your project.

Jeff

Cool ideas. E-Speaking uses SAPI and the .NET Framework. Synopsis (my software) uses the same technology, except that it will be available in maybe more than 10 languages. I’ll add many functionalities, like the possibility to send signals over the USB port, among others.

For Brexxa, it sounds interesting, but I don’t know if the AI is accurate enough. I prefer taking the algorithm from the A.L.I.C.E. project (which is free) and implementing my own code to reproduce statistical answers.

Good luck in your project.

Jeff

Nice ideas, but it’ll probably be too much for the computers we have today. You’ll have to wait for the next generation of computers: bio-computers with neural signalling built in. I’m hoping for that within the next decade.

Until then, I think the best way to make a robot is to use the strict minimum. There are too many practical problems in making something that resembles a human: energy, mobility, autonomy, etc. The best current approaches are things like the WowWee robots, Nalbag, and voice recognition for cars…

Wait for Synopsis; I’ll need some help testing it and suggestions for incorporating robotics into it. Keep me informed, you seem to have the same kind of passion.

See you,

Jeff

Basically, you need an audio synthesis module to have your project speak, and a voice recognition circuit to have it understand commands. You may also need a microcontroller to bridge the two. Most chat bots are PC based and do in software exactly what you wish to do in hardware: they have a voice recognition engine and a text-to-speech engine. If you prefer a female voice (easier to understand), the Devantech SP03 is now available with a female voice type (the website will be updated soon). Hardware-based solutions do not necessarily offer the flexibility of PC-software voice recognition, but they work well within their limits. The event buttons on the Magnevation are used to trigger playback of phrases/sounds.

The Magnevation SpeakJet is truly versatile, but with that added versatility comes increased programming. The module can produce a huge variety of sounds that can be used to make up parts of speech. It includes 72 “speech elements” as well as 43 “sound effects”. The 72 speech elements can be used to build words, and subsequently sentences. The module as such does not include a “male” or “female” voice, but rather digitally synthesized sounds that make up a voice (as such, the sounds are unisex, neither male nor female). Programmed pauses can separate words.
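As a rough illustration, here is a minimal sketch of sending speech-element codes to a SpeakJet from a PC over a serial adapter, assuming the module’s default 9600 baud serial input; the port name and the byte values below are placeholders, not real entries from the SpeakJet phoneme chart:

```python
import serial  # pyserial: pip install pyserial

PORT = "/dev/ttyUSB0"  # hypothetical USB-serial adapter wired to the SpeakJet RX pin

# Placeholder codes only: look the real values up in the SpeakJet phoneme
# chart (72 speech elements plus 43 sound effects), one byte per element.
WORD = bytes([183, 7, 159, 146, 164])
PAUSE = bytes([6])  # a short programmed pause between words (value is a placeholder)

with serial.Serial(PORT, 9600, timeout=1) as sp:
    sp.write(WORD)   # each byte triggers one speech element
    sp.write(PAUSE)  # silence before the next word
```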

The Devantech speech synthesizer, however, is able to translate regular text into (almost) proper speech with a more male-sounding voice (a female version will be available soon). The Magnevation SpeakJet, at a quarter of the price, could possibly be used to synthesize a male or female voice, but would require significantly more programming and testing. There are sample sounds (rather than voices) available at speakjet.com/ (click the banner to enter, then “sample sounds”).

Hope this explains a bit more.

I am not currently experimenting with speech to speech, but the concept has always fascinated me - I suspect science fiction was the first to have humans interact verbally with robots, and it’s nice to see that companies are following suit and developing products accordingly. I would think the setup would be in several parts, as JBrunet indicated:

1. Human speech (via microphone) to computer.
2. Computer converts the speech to text (via software).
3. Computer interprets the text and a search algorithm finds the response.
4. The response is converted to speech and output via speaker.
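For what it’s worth, here is a minimal sketch of those four steps on a PC, assuming the third-party Python packages SpeechRecognition and pyttsx3 (which wrap the platform’s recognition and text-to-speech engines, e.g. SAPI on Windows); the response lookup in step 3 is only a stub:

```python
import speech_recognition as sr  # pip install SpeechRecognition
import pyttsx3                   # pip install pyttsx3 (uses SAPI on Windows)

recognizer = sr.Recognizer()
tts = pyttsx3.init()

def find_response(text):
    """Step 3: stand-in for the search algorithm / chat-bot lookup."""
    replies = {"hello": "Hello there!", "how are you": "I'm fine, thanks."}
    return next((r for key, r in replies.items() if key in text.lower()),
                "I'm sorry, I don't understand what you said.")

with sr.Microphone() as mic:                   # step 1: human speech via microphone
    print("Say something...")
    audio = recognizer.listen(mic)

try:
    text = recognizer.recognize_google(audio)  # step 2: speech to text
except sr.UnknownValueError:
    text = ""

reply = find_response(text)                    # step 3: pick a response
tts.say(reply)                                 # step 4: text back to speech
tts.runAndWait()
```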

I have seen all the components separately (AIMovie was one of the first I saw with a simple but free artificial intelligence that could reply to whole sentences). Telephone systems like FIDO’s convert speech to speech effectively, so maybe people are developing freeware? Try to find the companies that produce these systems and see if they offer a free (albeit simple) version you can modify yourself.

My advice is to start simply - if you want the computer to “learn”, treat it like a 2-year-old; have code that can interpret “Hi my name is” and then look for a variable “name”. The automated response would be “Hello {name}”. Having a well-planned default reply such as “I’m sorry, I don’t understand what you said, I’m still learning” makes it sound more intelligent than it really is.
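In Python terms, that idea might look something like this minimal sketch (the pattern and replies are just illustrative):

```python
import re

def reply_to(sentence):
    """Look for 'my name is <name>' and greet by name; otherwise fall
    back to a polite default so the bot still sounds responsive."""
    match = re.search(r"my name is (\w+)", sentence, re.IGNORECASE)
    if match:
        return "Hello " + match.group(1).capitalize()
    return "I'm sorry, I don't understand what you said, I'm still learning."

print(reply_to("Hi my name is alex"))  # -> Hello Alex
print(reply_to("What time is it?"))    # -> the default reply
```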

Good luck with the project and I hope you keep us posted here with your progress - I for one am very interested!