What hardware, shields, code, etc. would I need to set up a Photon to listen for vocal cues and play back prerecorded replies?
This encompasses a lot of functionality. I’m no Photon expert, so I’m not sure what hardware may be available to help, but coming from a software development background I think you would need to cover:
- Sampling audio locally on the Photon, so some sort of microphone input.
- A method for understanding what you are hearing. There are cloud services that take audio data via REST APIs and return the results of speech-to-text analysis. You would need to take your raw sampled audio and deliver it to the service in a format it can understand. You may be limited by the Photon’s memory as to how long a piece of audio you can use, so check whether it is possible to stream audio data to one of the cloud services instead. You would likely use REST API calls and an HTTP client to achieve this.
- A method for retrieving the audio you wish to play back from some resource (an SD card, a URL in the cloud, etc.), because I don’t think the Photon has the memory to keep all of this inside its head.
- A way to convert that audio data if needed, and then drive a speaker from the Photon to play it back.
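To make the "format it can understand" and memory points a bit more concrete, here is a minimal sketch in plain C++ (nothing Photon-specific). It assumes 16-bit mono PCM samples and assumes the speech service you pick accepts WAV uploads, which you would need to check in that service's documentation. The function names `pcmBytesNeeded` and `wrapAsWav` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes needed to hold `seconds` of mono 16-bit audio at `sampleRate` Hz.
// Useful for checking a clip length against the RAM you actually have free.
std::size_t pcmBytesNeeded(std::uint32_t sampleRate, std::uint32_t seconds) {
    return static_cast<std::size_t>(sampleRate) * seconds * sizeof(std::int16_t);
}

// Append the low `bytes` bytes of `value` in little-endian order.
static void putLE(std::vector<std::uint8_t>& out, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
    }
}

// Append a four-character chunk tag such as "RIFF".
static void putTag(std::vector<std::uint8_t>& out, const char* tag) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<std::uint8_t>(tag[i]));
    }
}

// Wrap raw 16-bit mono samples in a canonical 44-byte WAV (RIFF) header.
// Assumption: mono, 16-bit, uncompressed PCM.
std::vector<std::uint8_t> wrapAsWav(const std::vector<std::int16_t>& samples,
                                    std::uint32_t sampleRate) {
    assert(sampleRate > 0);
    const std::uint32_t channels = 1;
    const std::uint32_t bitsPerSample = 16;
    const std::uint32_t blockAlign = channels * bitsPerSample / 8;
    const std::uint32_t dataBytes =
        static_cast<std::uint32_t>(samples.size()) * blockAlign;

    std::vector<std::uint8_t> out;
    out.reserve(44 + dataBytes);
    putTag(out, "RIFF");
    putLE(out, 36 + dataBytes, 4);            // size of everything after this field
    putTag(out, "WAVE");
    putTag(out, "fmt ");
    putLE(out, 16, 4);                        // fmt chunk size for plain PCM
    putLE(out, 1, 2);                         // audio format 1 = PCM
    putLE(out, channels, 2);
    putLE(out, sampleRate, 4);
    putLE(out, sampleRate * blockAlign, 4);   // byte rate
    putLE(out, blockAlign, 2);
    putLE(out, bitsPerSample, 2);
    putTag(out, "data");
    putLE(out, dataBytes, 4);
    for (std::int16_t s : samples) {
        putLE(out, static_cast<std::uint16_t>(s), 2);
    }
    return out;
}
```

As a rough feel for the memory limit: `pcmBytesNeeded(8000, 2)` is 32,000 bytes, so even two seconds at a low 8 kHz sample rate takes a noticeable slice of a microcontroller's RAM. That is why streaming the audio out in chunks may end up being the more practical route.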
Hope this helps, at least a bit
Something like this might work, depending on what you’re trying to do: https://www.sparkfun.com/products/13316
As for specifics, I was hoping to have it hear a few words, and say them back in various languages depending on the selection input.
I was going to use the cloud for the sound files (prerecorded sound bites), or a scripted translator that reads the text back out loud from something like Google Translate.
I’m a decent coder, and have tinkered with other products a bit, but nothing as in-depth as this. It’s difficult to find specific answers about the Photon without asking directly, so I figured I’d post.
As for that VR board, it’s a bit expensive, but I’ll look to see if I can find the 2.0 version, since it’s probably cheaper. From what I saw on that listing, the 2.0 comes pre-assembled while the 3.0 does not.