Machine learning is becoming ever more important for synthesizing and learning from data captured by connected devices (like Nest or Fitbit), but solutions take time to build and are often challenging to implement in practice. At glowfi.sh, we want to change that by making machine learning as easy as posting event data to the Particle Cloud. Our API consumes streaming data from devices like the Photon via Particle webhooks (see the example webhook definition below), and it can return predictions to the device, a specified URL, or ThingSpeak for visualization.
To get started, just sign up here for free access.
Here are some of our ideas for using glowfi.sh on Particle devices:
Real-time analytics - post sensor data to glowfi.sh and let us categorize it, compile it, and return statistics of usage.
Intelligent sensing - we have endpoints that can intelligently group multi-sensor data (with or without supervision) to estimate things like activity patterns or predict outcomes.
Collaborative data analysis and learning across networks of Particle devices publishing to common public events.
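For reference, the webhook hookup mentioned above can be as simple as a JSON definition registered with the Particle CLI (e.g. `particle webhook create hook.json`). The sketch below is only an illustration - the event name and URL are placeholders rather than the actual glowfi.sh endpoint, so substitute whatever the docs give you after signing up:

```json
{
  "event": "glowfish_data",
  "url": "https://api.glowfi.sh/v1/your_endpoint/",
  "requestType": "POST",
  "json": {
    "data": "{{SPARK_EVENT_VALUE}}",
    "device": "{{SPARK_CORE_ID}}"
  },
  "mydevices": true
}
```

Here {{SPARK_EVENT_VALUE}} and {{SPARK_CORE_ID}} are the standard webhook template variables; they get replaced with the published event data and the ID of the device that published it.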
The short answer is yes, but we would have to consider how the data is being sent to glowfi.sh. Are the short sound clips a continuous time-series from your device, or does each sound trigger the device to send a single event? Also, with unsupervised classification like this we could identify events as being different; however, if some events were "good" and others were "bad" and you wanted to know which was which, you would need to label them.
We are happy to work with you to test your idea out. Let me know if you are interested.
Cool! It would have to be short clips - I'm thinking 1 second clips at 8kHz, with 12-bit samples. Not continuously, perhaps only triggered when it looks like there's something in the clip.
I was going to generate labeled samples for learning, and my hope is that the machine can then identify which label would go with a new sample. I expect a handful of labels.
Would it be possible to download the result of the learning, to run locally? This way I could save energy by not connecting to the cloud (or perhaps only connecting for ambiguous samples).
Learning usually requires more than a few samples to produce accurate predictions, but hey, let's give it a shot. I think that with chirps as you describe them, you will want to transform the signal to the frequency domain so that attributes (frequency components) of the chirp are independent. This is probably the best way to classify these types of signals as categories of sounds. We can chat about this if you want.
As far as using the learning model on your device goes: in our classification "train" endpoint, we have an option to return a statistical model. But using that independently of our API's "predict" endpoint is also somewhat involved.
OK, I can try to generate more - it may take a few weekends but I guess we can try even before I have the full set.
That signal transformation is going to be a bit of work… perhaps I should do that on a computer instead of on the chip? If I have to send it out anyways, it may make sense. Then again, if I do that, I'll have to code the FFT all over again on the chip if I decide later to try using the statistical model.
Is there documentation about the model? I couldn't find it on the train API page.
Thank you for your help so far! It feels a bit intimidating to have a CEO help me with my hobby project.
We are an early-stage startup, so I do all kinds of things… and I like to solve problems like this. Anyway, CEOs don't become intimidating until companies get much, much bigger than us.
Let me suggest a couple of things. Why don't you send me a few examples of the different chirp signals in whatever format you want. We'll take a look at the signals, determine whether transformation is beneficial, and let you know. Then, if it is advantageous to transform (which I believe it will be), we can probably provide you with C++ code for a simple DFT/DCT to perform on the device… I think it is doable on a Photon. In terms of the model glowfi.sh produces, that depends on the endpoint and the number of features sent. I'll provide you some reference later on what type of model we produce, as this is not the typical use case for us, so we don't document it precisely in our API docs.
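Not the code we would send you, but just to give a feel for the scale of the computation: a textbook DCT-II fits in a few lines of C++. The sketch below is a naive O(N²) version over a plain float buffer; with 8000 samples per chirp it will be slow on a Photon, and two 8000-element float buffers already take roughly 64 KB of RAM, so in practice you would want a fixed-point and/or FFT-based implementation.

```cpp
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

// Naive DCT-II: turns n time-domain samples into n de-correlated frequency
// components. O(n^2), so this is for experimenting only; an FFT-based DCT is
// the realistic option for n = 8000 on a Photon.
void dct2(const float *samples, float *coeffs, int n) {
    for (int k = 0; k < n; k++) {
        float sum = 0.0f;
        for (int i = 0; i < n; i++) {
            sum += samples[i] * cosf((float)M_PI / n * (i + 0.5f) * k);
        }
        coeffs[k] = sum;
    }
}
```

The coefficients are left unnormalized; since every chirp gets the same scaling, that should not matter for classification.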
To answer your request, yes, glowfi.sh can distinguish between acoustic chirp signals from various sources…
I took your 167 acoustic chirp files (the new ones only) and transformed them using a Discrete Cosine Transform to get the 8000 de-correlated frequency components corresponding to your 8000 time samples (I ignored signal phase for this test). I then ran the transformed 8000 components for all 167 1-sec chirps through the glowfi.sh feature_select endpoint to determine which frequency components were discriminating for the three cases you have (washer on - sensor on washer, washer on - sensor on dryer, and dryer on - sensor on dryer). Our feature selection identified 1352 of the 8000 frequency components as significant in differentiating between these cases. I then ran these 1352 components for a randomly selected 146 of the 167 chirps through our train endpoint to train a classification model for your three acoustic cases. Then I tested this classifier by running the remaining 21 chirps (~7 chirps for each of the three cases) through our predict endpoint. The result is a composite accuracy of 81% for correctly predicting the class of your acoustic signals based on DCT transformations. I include a portion of the JSON return showing other accuracy numbers. The total run time for predicting the 21 samples was ~500 ms.
I include a plot of log10(Amplitude) of DCT transforms of three example acoustic chirps.
Let me know if you want me to go over the flow to/from our API.
Big Thank YOU to @michaelT for helping me with a data model in a matter of a few minutes. I gave him what I wanted to model, what data is/will be available and he laid out a clear and simple path to get to the analytical output I want. Very helpful and clearly communicated what I needed in order to achieve my goals. When I get data feeds live and have results I will share more. Thanks again.
If you look at our GitHub repo for Particle here, you can find the code for formatting data in JSON for glowfi.sh POSTs via Particle webhooks. I can provide guidance on the DCT transform as well.
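As a rough illustration of the device side (this is not the code from the repo, and the event name, field names, and values are made up), publishing a small JSON row for a webhook to forward could look like the snippet below. Keep in mind that a single Particle.publish payload is limited to a couple of hundred bytes, so a full 8000-component feature vector would need to be sent in chunks or, more realistically, the raw clip would be shipped off the device and transformed on a computer.

```cpp
#include "Particle.h"
#include <stdio.h>

// Hypothetical sketch: publish three feature values plus a label as JSON.
// A webhook (like the one sketched earlier) would forward this to glowfi.sh.
void publishTrainingRow(float f1, float f2, float f3, const char *label) {
    char payload[256];
    snprintf(payload, sizeof(payload),
             "{\"feature1\":[%.3f],\"feature2\":[%.3f],\"feature3\":[%.3f],"
             "\"response\":[\"%s\"]}",
             f1, f2, f3, label);
    Particle.publish("glowfish_data", payload, PRIVATE);
}
```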
Thanks! So I'm looking at all this and trying to figure it out. For the feature_select endpoint it looks like each individual frequency would be a feature (so I'd have perhaps 8000 features), and then for each of those I put one value per sample I have (or 167 samples, as you did). So I'd pass in a dictionary of 8000 keys, each with 167 values. But how do I tell it which samples are washer and which are dryer?
Yes. You pass in 8000 features for each file, and a response variable for each as well that tells glowfi.sh what it is. A response is some integer or string that denotes the case for each chirp. In your case, the response for sensor on washer and washer ON could be "WW", while washer on and sensor on dryer could be "WD". Whatever you use to denote the response, glowfi.sh will return the same when you ask for a prediction.
The first row of input for feature_select for a single file (you would have 167 of these) could look like this:
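(The values here are just placeholders and the middle 7,996 features are elided; the point is the shape - one key per frequency component with one value, plus the response for that file.)

```json
{"feature1": [0.021], "feature2": [1.358], "feature3": [0.447], ..., "feature8000": [0.002], "response": ["WW"]}
```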
The return from feature_select will be the most important frequencies for determining the difference between your three cases: dryer dryer, washer dryer, and washer washer. Then you would pass in only those important frequencies, with the respective responses, to our "train" endpoint to create a model to predict from. In the future you pass those same frequencies with NO response to our "predict" endpoint, and glowfi.sh gives you the case (washer washer, dryer dryer, or washer dryer) based on its predictions. So feature_select and "train" are done in advance, and calls to "predict" are done whenever you want a prediction - every 5 sec, for example, if your washer and dryer are running and you want to know which is which. If you also want to know when they turn off, you need to build another case, "all off", and train with it alongside the other three cases. I can send you the files I used if you need examples.
OK, thanks! I'll try it next time I have a moment. As a beginner, I was confused because (a) it wasn't clear that "feature1" in the example was text I could change, and (b) I didn't know how to specify outputs ("dependent variables"). In fact I'm still confused. Is there something magical about naming it "response"? Because I am still not sure why the algo doesn't end up telling me that the "response" feature is the one that I should select. And of course it's the one thing I can't give it.
a) Yes, you can name the features anything you like as long as you are consistent from API call to API call. And you need to use the same names when you call the "train" endpoint (to build the predictive model) and "predict" (to get a prediction from features).
b) The key "response" is there so that glowfi.sh knows which data describes the "class" your current training data is in. So when you call the "train" endpoint to build your predictive model, you do know the response: it will be your tags for your acoustic signals (e.g., washer on with the sensor on the washer could be called "WW", and washer on with the sensor on the dryer could be called "WD"). The "response" key in your JSON for "train" would look like this: {"response":["WW","WW",…,"WD"]}. Now, when you want to get a prediction from glowfi.sh for an acoustic chirp that you don't have a response for, you just pass in your JSON with the feature data and no response key. glowfi.sh then returns an array equal in length to your input data, {"response":[<prediction 1>, …, <prediction N>]}, where N is the number of rows of input features used when calling the "predict" endpoint.
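To make that concrete with a toy example (the feature names and values are made up, and only three of the selected frequencies are shown), a "train" payload covering three chirps could look like this:

```json
{
  "frequency_12": [0.42, 0.39, 0.05],
  "frequency_87": [1.10, 0.02, 0.98],
  "frequency_930": [0.07, 0.66, 0.71],
  "response": ["WW", "WD", "DD"]
}
```

A later call to "predict" sends the same three feature keys with new values and no "response" key, and the return carries one predicted label per input row, e.g. {"response": ["WD"]}.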
I was asking about feature_select; I'm not yet ready for train and predict (I first have to figure out why my sensor seems to have stopped working).
I think you're saying that the call to feature_select must include one field named "response" (with the output), and all the other fields can be named anything (and they're inputs, aka features).