Glowfi.sh machine learning service - free for Particle users

Particle community, we need beta users!

Machine learning is becoming ever more important for synthesizing and learning from data captured by connected devices (like Nest or Fitbit), but solutions take time to build and are often challenging to implement in practice. At glowfi.sh, we want to change that by making machine learning as easy as posting event data to the Particle Cloud. Our API consumes streaming data from devices like the Photon via Particle webhooks, and can return predictions to the device, to a specified URL, or to ThingSpeak for visualization.

To get started, just sign up here for free access.

Here are some of our ideas for using glowfi.sh on Particle devices:

  1. Multi-sensor anomaly detection and notification - Check out our Photon real-time learning example video below (code is at https://github.com/glowfishAPI/glowfish-particle).
  2. Real-time analytics - post sensor data to glowfi.sh and let us categorize it, compile it, and return statistics of usage.
  3. Intelligent sensing - we have endpoints that can intelligently group multi-sensor data (with or without supervision) to estimate things like activity patterns or predict outcomes.
  4. Collaborative data analysis and learning across networks of Particle devices publishing to common public events.
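To give a feel for the device side, it can be as simple as publishing a reading and listening for the webhook response. Here is a rough, untested sketch - the "sensor_data" event name, pin A0, and payload field are placeholders; the full working example is in the repo linked above:

#include "Particle.h"

void predictionHandler(const char *event, const char *data) {
    // data carries whatever your webhook response template passes back
    Serial.printlnf("prediction: %s", data);
}

void setup() {
    Serial.begin(9600);
    // catch the response the webhook returns for our published event
    Particle.subscribe("hook-response/sensor_data", predictionHandler, MY_DEVICES);
}

void loop() {
    int reading = analogRead(A0);                  // any sensor you like
    char payload[64];
    snprintf(payload, sizeof(payload), "{\"value\": %d}", reading);
    Particle.publish("sensor_data", payload, PRIVATE);
    delay(5000);                                   // stay well under the publish rate limit
}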

Thanks,

Mike and the glowfi.sh team

To learn more about us, you can check out our site (https://glowfi.sh) and our API docs (http://glowfish.readme.io)


Would this work with short sound clips? It’d be cool to distinguish between a closing door, dog bark, or something else.

@cyan

The short answer is yes, but we would have to consider how the data is being sent to glowfi.sh. Are the short sound clips part of a continuous time series from your device, or does each sound trigger the device to send a single event? Also, with unsupervised classification like this, we could identify events as being different from one another; however, if some events were “good” and others “bad” and you wanted to know which was which, you would need to label at least some of them.

We are happy to work with you to test your idea out. Let me know if you are interested.

Mike

Cool! It would have to be short clips - I’m thinking 1-second clips at 8 kHz, with 12-bit samples. Not continuously, perhaps only triggered when it looks like there’s something in the clip.
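Something like this is what I have in mind on the Photon - just a rough, untested sketch, and the pin, bias level, and trigger threshold are guesses:

#include "Particle.h"

const int MIC_PIN = A0;               // guess: analog mic/amp on A0
const int SAMPLE_RATE_HZ = 8000;
const int CLIP_SAMPLES = 8000;        // 1 second at 8 kHz
const int TRIGGER_LEVEL = 300;        // guess: deviation from the idle level

uint16_t clip[CLIP_SAMPLES];          // 12-bit ADC samples (0..4095)

void setup() {
}

void loop() {
    // crude trigger: wait until the signal departs from the mid-scale bias
    int s = analogRead(MIC_PIN);
    if (abs(s - 2048) > TRIGGER_LEVEL) {
        const unsigned long periodUs = 1000000UL / SAMPLE_RATE_HZ;   // 125 us
        for (int i = 0; i < CLIP_SAMPLES; i++) {
            clip[i] = (uint16_t)analogRead(MIC_PIN);
            delayMicroseconds(periodUs);           // timing only approximate
        }
        // ...then transform the clip and/or send it off for classification
    }
}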

I was going to generate labeled samples for learning, and my hope is that the machine can then identify which label would go with a new sample. I expect a handful of labels.

Would it be possible to download the result of the learning, to run locally? This way I could save energy by not connecting to the cloud (or perhaps only connecting for ambiguous samples).

@cyan

Learning usually requires more than a few samples to produce accurate predictions, but hey, let’s give it a shot. I think that with chirps as you describe them, you will want to transform the signal to the frequency domain so that attributes (frequency components) of the chirp are independent. This is probably the best way to classify these types of signals as categories of sounds. We can chat about this if you want.

As far as using the learning model on your device: our classification “train” endpoint has an option to return a statistical model, but using that independently of our API “predict” endpoint is somewhat involved.

Mike

@michaelT

OK, I can try to generate more - it may take a few weekends but I guess we can try even before I have the full set.

That signal transformation is going to be a bit of work… perhaps I should do that on a computer instead of on the chip? If I have to send it out anyways, it may make sense. Then again if I do that, I’ll have to code the FFT all over again on the chip if I decide later to try using the statistical model.

Is there documentation about the model? I couldn’t find it on the train API page.

Thank you for your help so far! It feels a bit intimidating to have a CEO help me with my hobby project :smile:

@cyan

We are an early-stage startup, so I do all kinds of things…and I like to solve problems like this. Anyway, CEOs don’t become intimidating until companies get much much bigger than us :wink:

Let me suggest a couple of things. Why don’t you send me a few examples of the different chirp signals in whatever format you want? We’ll take a look at the signals, determine whether transformation is beneficial, and let you know. If it is advantageous to transform (which I believe it will be), we can probably provide you with C++ code for a simple DFT/DCT to perform on the device - I think it is doable on a Photon. As for the model glowfi.sh produces, that depends on the endpoint and the number of features sent. I’ll send you some references later on what type of model we produce, as this is not a typical use case for us and we don’t document it precisely in our API docs.
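To give you an idea, the unoptimized form of a DCT-II is only a few lines of C++ - something like this untested sketch:

#include <cmath>
#include <vector>

// Naive DCT-II: X[k] = sum_n x[n] * cos(pi/N * (n + 0.5) * k)
// O(N^2), so fine for experimenting on a PC with 8000-sample clips, but you'd
// want an FFT-based DCT (or a shorter frame) before running it on the Photon.
std::vector<double> dct2(const std::vector<double>& x) {
    const double PI = 3.14159265358979323846;
    const std::size_t N = x.size();
    std::vector<double> X(N, 0.0);
    for (std::size_t k = 0; k < N; ++k) {
        double sum = 0.0;
        for (std::size_t n = 0; n < N; ++n) {
            sum += x[n] * std::cos(PI / N * (n + 0.5) * k);
        }
        X[k] = sum;              // real-valued output, so no phase to worry about
    }
    return X;
}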

Mike

@cyan

Feel free to email me at michael@glowfi.sh when you have your samples.

Mike

I did, thank you so much for working with me on that, it’s awesome! This is a lot more than I was expecting :smile:

@cyan

To answer your request, yes, glowfi.sh can distinguish between acoustic chirp signals from various sources…

I took your 167 acoustic chirp files (the new ones only) and transformed them with a Discrete Cosine Transform to get the 8000 de-correlated frequency components corresponding to your 8000 time samples (I ignored signal phase for this test). I then ran the transformed 8000 components for all 167 one-second chirps through the glowfi.sh feature_select endpoint to determine which frequency components were discriminating for the three cases you have (washer on - sensor on washer, washer on - sensor on dryer, and dryer on - sensor on dryer). Our feature selection identified 1352 of the 8000 frequency components as significant in differentiating between these cases.

I then ran these 1352 components for a randomly selected 146 of the 167 chirps through our train endpoint to build a classification model for your three acoustic cases, and tested the classifier by running the remaining 21 chirps (~7 per case) through our predict endpoint. The result is a composite accuracy of 81% for correctly predicting the class of your acoustic signals from their DCT transforms. I include a portion of the JSON return showing the other accuracy numbers. The total run time for predicting the 21 samples was ~500 ms.

I include a plot of log10(Amplitude) of DCT transforms of three example acoustic chirps.

Let me know if you want me to go over the flow to/from our API.

Mike

glowfi.sh API return

"accuracy_data": {
  "recall": [0.8, 0.57, 1.0],
  "f1_scores": [0.67, 0.67, 1.0],
  "precision": [0.57, 0.8, 1.0],
  "class_names": ["DD", "DW", "WW"],
  "Composite_Accuracy": 0.81
}

Big Thank YOU to @michaelT for helping me with a data model in a matter of a few minutes. I gave him what I wanted to model, what data is/will be available and he laid out a clear and simple path to get to the analytical output I want. Very helpful and clearly communicated what I needed in order to achieve my goals. When I get data feeds live and have results I will share more. Thanks again.

wow @michaelT, that’s awesome! It may have to wait for the weekend but I’m super excited!

@cyan

If you look at our GitHub repo for Particle here, you can find the code for formatting data as JSON for glowfi.sh POSTs via Particle webhooks. I can provide guidance on the DCT transform as well.
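Roughly, the formatting boils down to building a JSON string of "feature name": value pairs and publishing it for the webhook to forward. A trimmed-down sketch - not the repo code verbatim, and the "glowfish_data" event name and "frequency N" key names are just placeholders:

#include "Particle.h"

// Pack a few selected frequency amplitudes as JSON and publish them so a
// Particle webhook can POST them to glowfi.sh. The key names must match the
// ones used for feature_select/train. With ~1352 selected features you would
// need to split the data across several publishes (mind the publish size
// limit) or send it from a PC instead.
void publishFeatures(const double *amp, const int *freqIdx, int count) {
    char payload[255];
    size_t len = snprintf(payload, sizeof(payload), "{");
    for (int i = 0; i < count; i++) {
        int n = snprintf(payload + len, sizeof(payload) - len,
                         "%s\"frequency %d\": %.3f",
                         (i == 0) ? "" : ", ", freqIdx[i], amp[i]);
        if (n < 0 || len + n >= sizeof(payload) - 1) break;   // out of room
        len += n;
    }
    snprintf(payload + len, sizeof(payload) - len, "}");
    Particle.publish("glowfish_data", payload, PRIVATE);      // webhook forwards this
}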

Mike

Thanks! So I’m looking at all this and trying to figure it out. For the feature_select endpoint it looks like each individual frequency would be a feature (so I’d have perhaps 8000 features), and then for each of those I put one value per sample I have (or 167 samples, as you did). So I’d pass in a dictionary of 8000 keys, each with 167 values. But how do I tell it which samples are washer and which are dryer?

@cyan

Yes. You pass in 8000 features for each file, plus a response variable that tells glowfi.sh what it is. The response is just an integer or string that denotes the case for each chirp. In your case, the response for sensor on washer with washer on could be "WW", while washer on with sensor on dryer could be "DW". Whatever you use to denote the response, glowfi.sh will return that same label when you ask for a prediction.

The first row of input for feature_select for a single file (you would have 167 of these) could look like:

"frequency 1": (freq 1 datapoint), "frequency 2": (freq 2 datapoint), …, "frequency 8000": (freq 8000 datapoint), "response": "DD"

The return from feature_select will be the most important frequencies for distinguishing between your three cases: dryer-dryer, washer-dryer, and washer-washer. You would then pass only those important frequencies, with their respective responses, to our "train" endpoint to create a model to predict from. After that, you pass the same frequencies with NO response to our "predict" endpoint, and glowfi.sh gives you the case - washer-washer, dryer-dryer, or washer-dryer - based on its prediction.

So feature_select and "train" are done in advance, and calls to "predict" are made whenever you want a prediction - every 5 seconds, for example, if your washer and dryer are running and you want to know which is which. If you also want to know when they turn off, you would build a fourth case, "all off", and include it in training along with the other three. I can send you the files I used if you need examples.
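And if you do the transform and the feature_select/train calls from a PC rather than the Photon (as you mentioned you might), they are just HTTPS POSTs of that JSON. Here is a bare-bones, untested sketch with libcurl; the endpoint URL, any auth header, and the tiny example body are placeholders to fill in from our API docs:

#include <curl/curl.h>
#include <string>

// Placeholder -- take the real endpoint URL and auth details from the API docs.
static const char* ENDPOINT_URL = "<feature_select / train / predict endpoint URL>";

bool postJson(const std::string& body) {
    CURL* curl = curl_easy_init();
    if (!curl) return false;

    struct curl_slist* headers = nullptr;
    headers = curl_slist_append(headers, "Content-Type: application/json");
    // add whatever authentication header/key the docs call for here

    curl_easy_setopt(curl, CURLOPT_URL, ENDPOINT_URL);
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());

    // perform the POST; with no write callback set, libcurl prints the response
    // body (selected features, accuracy data, predictions, ...) to stdout
    CURLcode res = curl_easy_perform(curl);

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    return res == CURLE_OK;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    // Tiny stand-in body; the real one carries your frequency features and
    // "response" labels in the structure described above / in the docs.
    std::string body = "{\"frequency 1\": [0.12], \"response\": [\"DD\"]}";
    postJson(body);
    curl_global_cleanup();
    return 0;
}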

OK, thanks! I’ll try it next time I have a moment. As a beginner, I was confused because (a) it wasn’t clear that "feature1" in the example was text I could change, and (b) I didn’t know how to specify outputs ("dependent variables"). In fact, I’m still confused. Is there something magical about naming it "response"? I’m still not sure why the algorithm doesn’t end up telling me that the "response" feature is the one I should select - and of course it’s the one thing I can’t give it.

@cyan

(a) Yes, you can name the features anything you like, as long as you are consistent from API call to API call. You need to use the same names when you call the "train" endpoint (to build the predictive model) and the "predict" endpoint (to get a prediction from features).

(b) The key "response" is how glowfi.sh knows which class each row of your training data belongs to. So, when you call the "train" endpoint to build your predictive model, you do know the response: it will be your tags for your acoustic signals (e.g., washer on, sensor on washer could be called "WW", and washer on, sensor on dryer could be called "WD"). The "response" key in your JSON for "train" would then look like this: {"response": ["WW", "WW", …, "WD"]}. Now, when you want a prediction from glowfi.sh for a chirp that you don’t have a response for, you just pass in your JSON with the feature data and no response key. glowfi.sh then returns {"response": [<prediction 1>, …, <prediction N>]}, an array equal in length to your input, where N is the number of rows of input features used when calling the "predict" endpoint.

Does this help?
Mike

I was asking about feature_select; I’m not yet ready for train and predict (I first have to figure out why my sensor seems to have stopped working).

I think you’re saying that the call to feature_select must include one field named “response” (with the output), and all the other fields can be named anything (and they’re inputs, aka features).