I have two cores, both connected to my home Wi-Fi. Core 2 is indoors - it Publishes the temperature / humidity as well as the state of two toggling push buttons. Core 1 is outdoors - it Publishes the temperature / humidity and Subscribes to the state of the push buttons to turn on / off two relays controlling two lights. The relays can also be controlled by a Function.
Everything works fine for days at a time then, spuriously, Core 1 fails to respond to the Published push button states. I have verified that Core 2 is still Publishing the states. Core 1 continues Publishing the temperature / humidity and responds to the Function controlling the relays.
The only way to remedy the problem is to power off both cores and then power them back on. Doing one at a time does not remedy the problem.
Any idea what could be happening and how to deal with it?
Hey @drbrodie, most likely you're seeing this known issue: https://github.com/spark/firmware/issues/278
No timeline on a fix, but I'd love to get it into the next release. I'll add a comment to the issue linking here.
I implemented the fix recommended by FRAGMA Sep '14 and the problem was solved.
However, a few days ago I flashed my existing publish / subscribe code to a new third core. At first it worked just fine, but then after a few hours neither Core2 nor Core3 receives the data published by Core1 (which I have verified is still being published). Furthermore, until I commented out the "FRAGMA fix" I would periodically get the SOS Code 1.
Any suggestions?
Hi @drbrodie,
Hmm, I think the firmware team is hoping to address this problem this sprint, but in the meantime you could do a "scorched earth" type workaround: say, publish a "heartbeat" event once a minute, and if your listeners don't receive it after a few consecutive minutes, have them call System.reset(). This is a bit brute force, but it should help things fix themselves if messages stop coming through:
unsigned long last_heartbeat = 0;  // millis() timestamp of the last heartbeat received
#define HEARTBEAT_PERIOD_SECONDS 60
#define MAX_MISSED_HEARTBEATS 3

void setup() {
    Serial.begin(115200);
    Spark.subscribe("heartbeat", heartbeat_handler);
    //Spark.subscribe("heartbeat", heartbeat_handler, MY_DEVICES);
    last_heartbeat = millis();
}

void loop() {
    // For roughly the first minute of uptime, keep resetting the timer so we
    // don't count missed heartbeats while the connection is still coming up.
    if (last_heartbeat < 60000) {
        last_heartbeat = millis();
    }

    double elapsedSeconds = (millis() - last_heartbeat) / 1000.0;
    if (elapsedSeconds > (MAX_MISSED_HEARTBEATS * HEARTBEAT_PERIOD_SECONDS)) {
        Serial.println("Subscribe is dead, long live subscribe!");
        delay(500);
        System.reset();
    }
    else {
        Serial.println("things are okay... but it's been " + String(elapsedSeconds) + " seconds since last heartbeat");
        delay(1000);
    }
}

void heartbeat_handler(const char *topic, const char *data) {
    last_heartbeat = millis();
    Serial.println("Heartbeat... at " + Time.timeStr());
}
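On the publishing side, the matching heartbeat could be as simple as this (a minimal sketch, assuming the event name "heartbeat" and a once-a-minute period to match the listener above):

unsigned long last_beat = 0;

void setup() {
}

void loop() {
    // publish the heartbeat once a minute; listeners reset after 3 missed beats
    if (millis() - last_beat > 60000) {
        Spark.publish("heartbeat");
        // or Spark.publish("heartbeat", NULL, 60, PRIVATE); if you use the
        // MY_DEVICES variant of the subscription
        last_beat = millis();
    }
}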
Dave,
I don't know if that is a solution. I have manually reset the cores numerous times but have never been able to re-establish a subscription, not once, not even for a second, since the initial failure. Does System.reset() do anything different than a manual reset? If not, I think we need to find out why the failure to subscribe, on hardware / firmware that had been operating well for weeks, coincided so closely with the addition of the third core.
Dean
Hi @drbrodie,
I'm just seeing your message now, looks like I didn't get tagged, sorry.
Hmm, that's weird. Would you want to share your code and we can take a look?
Thanks,
David
Dave,
Here is the code which, previous to this problem, ran flawlessly for months. There are numerous other things going on here, but the key point is that it does not seem to subscribe to the event TEMPHUMPOOL sent by another core. If the line previously added to correct spurious failures, "spark_protocol.send_subscription("temphumpool", SubscriptionScope::MY_DEVICES);", is not commented out, the core periodically goes into SOS Code 1 and resets.
// This #include statement was automatically added by the Spark IDE.
#include "LiquidCrystal.h"
// This #include statement was automatically added by the Spark IDE.
#include "dht.h"

#define DHTPIN D4
#define DHTTYPE DHT22
DHT dht(DHTPIN, DHTTYPE);

extern SparkProtocol spark_protocol;

char eventinfo[64];
unsigned int ms;
int publishdelay = 5 * 60 * 1000;
#define ONE_DAY_MILLIS (24 * 60 * 60 * 1000)
unsigned long lastSync = millis();

void displayData(const char *data, const char *poolData);

LiquidCrystal lcd(A5, A4, A3, A2, A1, A0);

int inputPin1 = D0;     // local button
int inputPin2 = D1;     // local button
int sendLedPin1 = D5;   // local LED
int sendLedPin2 = D6;   // local LED
int sendLedVal1 = 0;    // local LED status
int sendLedVal2 = 0;    // local LED status
int sendLedVal1Old = 0; // previous local LED status
int sendLedVal2Old = 0; // previous local LED status

unsigned long lastPub = millis();
unsigned long lastSub;
unsigned long elapsedSub;
unsigned long lastLcd = millis();

// Read the DHT22 and publish temperature, humidity, and dew point (converted to F).
void PublishDHTInfo() {
    float h = dht.readHumidity();
    float t = dht.readTemperature();
    float d = dht.dewPoint(t, h);
    t = (t * 1.8) + 32;
    d = (d * 1.8) + 32;
    sprintf(eventinfo, "T=%.0f H=%.0f%% DP=%.0f", t, h, d);
    Publish(eventinfo);
}

void setup() {
    dht.begin();
    Spark.subscribe("temphumpool", displayData, MY_DEVICES);
    lcd.begin(20, 4);
    lcd.print("Out:");
    lcd.setCursor(0, 1);
    lcd.print("Waiting for data");
    lcd.setCursor(0, 2);
    lcd.print("In:");
    lcd.setCursor(0, 3);
    lcd.print("Waiting for data");
    pinMode(sendLedPin1, OUTPUT);
    pinMode(sendLedPin2, OUTPUT);
    digitalWrite(sendLedPin1, LOW);
    digitalWrite(sendLedPin2, LOW);
    Spark.publish("pToggle1", "State", 0, PRIVATE);
    Spark.publish("pToggle2", "State", 0, PRIVATE);
    Spark.function("fToggle1", netToggle1);
    Spark.function("fToggle2", netToggle2);
    attachInterrupt(inputPin1, L1, RISING);
    attachInterrupt(inputPin2, L2, RISING);
}

// Subscribe handler: the first argument is the event name, the second the payload.
// Shows the other core's data on the LCD and records when it last arrived.
void displayData(const char *data, const char *poolData) {
    lcd.setCursor(0, 1);
    lcd.print(poolData);
    lastSub = millis();
}

void Publish(char* szEventInfo) {
    Spark.publish("temphumhse", szEventInfo);
}

void loop() {
    // publish local readings once a minute
    if (millis() - lastPub > 60000) {
        PublishDHTInfo();
        lastPub = millis();
    }
    // refresh the local line of the LCD once a minute
    if (millis() - lastLcd > 60000) {
        float h = dht.readHumidity();
        float t = dht.readTemperature();
        float d = dht.dewPoint(t, h);
        t = (t * 1.8) + 32;
        d = (d * 1.8) + 32;
        sprintf(eventinfo, "T=%.0f H=%.0f%% DP=%.0f ", t, h, d);
        lcd.setCursor(0, 3);
        lcd.print("                    "); // blank the 20-character row first
        lcd.setCursor(0, 3);
        lcd.print(eventinfo);
        lastLcd = millis();
    }
    lcd.setCursor(0, 0);
    lcd.print("Out:");

    // the "FRAGMA fix": re-send the subscription if nothing has arrived for 100 s
    elapsedSub = (millis() - lastSub) / 1000;
    if (elapsedSub > 100) {
        spark_protocol.send_subscription("temphumpool", SubscriptionScope::MY_DEVICES);
    }

    // apply and publish button-driven state changes flagged by the ISRs
    if (sendLedVal1 != sendLedVal1Old) {
        digitalWrite(sendLedPin1, sendLedVal1 ? HIGH : LOW);
        Spark.publish("pToggle1", sendLedVal1 ? "ON" : "OFF");
        sendLedVal1Old = sendLedVal1;
    }
    if (sendLedVal2 != sendLedVal2Old) {
        digitalWrite(sendLedPin2, sendLedVal2 ? HIGH : LOW);
        Spark.publish("pToggle2", sendLedVal2 ? "ON" : "OFF");
        sendLedVal2Old = sendLedVal2;
    }
}

// Cloud functions: toggle a relay/LED when called with a command containing "tgl".
int netToggle1(String command) {
    if (command.substring(3, 6) == "tgl") {
        sendLedVal1 = !sendLedVal1;
        digitalWrite(sendLedPin1, sendLedVal1 ? HIGH : LOW);
        Spark.publish("pToggle1", sendLedVal1 ? "ON" : "OFF");
    }
    return 1;
}

int netToggle2(String command) {
    if (command.substring(3, 6) == "tgl") {
        sendLedVal2 = !sendLedVal2;
        digitalWrite(sendLedPin2, sendLedVal2 ? HIGH : LOW);
        Spark.publish("pToggle2", sendLedVal2 ? "ON" : "OFF");
    }
    return 1;
}

// Button ISRs: just flip the requested state; loop() applies and publishes it.
void L1() {
    sendLedVal1 = !sendLedVal1;
}

void L2() {
    sendLedVal2 = !sendLedVal2;
}
I have run a stripped-down version of this code, which eliminates everything but the portions related to displaying the subscribed data, with no luck.
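That minimal version was essentially just the subscription and a print of whatever arrives, roughly like this (a reconstruction, not the exact code):

void showData(const char *topic, const char *data) {
    Serial.println(String(topic) + ": " + String(data));
}

void setup() {
    Serial.begin(115200);
    Spark.subscribe("temphumpool", showData, MY_DEVICES);
}

void loop() {
}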
Dean
@Dave can you update us on the status of this fix? Any idea of a time frame for it? It is seriously holding up my projects. Thanks.
Hi @Muskie,
I think the firmware team hasn't had a chance to address this yet, so maybe I can add a workaround on the cloud. I think I'll build and expose a feature so you can ask the cloud to remember your subscriptions until you flash a new app, and set them back up when the device reconnects. This could be a workaround until the firmware can be fixed. I probably won't get a chance to look at this for at least a few days, but I'll bump this thread when I do.
Thanks,
David
Ahh, I've totally had this problem too! @Dave, curious if there is any update on your cloud-based workaround? Thanks!
Any news on the fix?
I'm subscribing to published events via node-red on a Pi, so I cannot implement the @Dave workaround…
Heya @achronite,
Hmm, if you're hitting the API for server-sent events, are you seeing that you're losing the connection, or that events aren't coming through?
It'll probably always be the case that network connections will disconnect eventually, so it's good to have your SSE stuff reconnect automatically; we have a few examples for that in SparkJS https://github.com/spark/sparkjs
I hope that helps! 
David
It is a problem on the node-red side, which I suspect materialises when my spark-core loses its internet connection. Redeploying the nodes in node-red forced the subscription to restart. I'll submit a bug report on the node-red-node-spark code to see if your sparkjs SSE reconnect example can be incorporated.
Thanks.
The bug for this on github seems to be closed - and a fix of sorts is apparently available - but has it made it to the production code?
Hi @daneboomer,
edit: oops! sent too soon. 
It's normal for clients listening to SSE events to disconnect periodically; I think the example I wrote in Spark-JS to resubscribe after a disconnect is out in the wild. I'm not sure if you mean something else though?
Thanks,
David
Thanks for your quick reply! 
I did. I wondered whether the Particle firmware had been fixed yet, so that no software workarounds are needed? Looks like it's been on the cards for a while.
In the meantime, I'll try to use your code, Dave, but will this method only work four times? Thanks
Hi @Dave, sorry, looks like my last reply won't have reached you because I didn't reference you using the @ symbol.
Hi @daneboomer,
Ahh, thanks for the @!
As far as I know, this issue was fixed in the most recent firmware, which is what the Photon uses. We'll be making that available to the Core as well in the coming weeks.
Thanks!
David
Thanks, @Dave. So if I wait a few more weeks, then I won't need to use the heartbeat/reset workaround.
In the meantime, is there an alternative I'm overlooking? Could the Cores "ping" each other more directly, bypassing the cloud but obviously still over Wi-Fi? Is there any other, more reliable way they can send each other the most basic of messages?
In essence, if a switch connected to Core A is HIGH, I would like an LED on Core B to go HIGH more or less instantly (that's instant in human, not electronics, terms). Obviously there's a bit more to my project than that, but if an alternative can do that, it can do everything else I would ask of it, too.
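For example, I imagine the Cores could talk directly over UDP on the LAN, something like this rough, untested sketch (the address, port, and pins are made up; Core B would need a known or DHCP-reserved IP, and since UDP is best-effort a periodic resend would guard against dropped packets):

// Core A (sender): push the switch state straight to Core B whenever it changes
UDP udp;
IPAddress coreB(192, 168, 1, 50); // assumed address of Core B
const int PORT = 8888;
int switchPin = D0;
int lastState = LOW;

void setup() {
    pinMode(switchPin, INPUT_PULLDOWN);
    udp.begin(PORT);
}

void loop() {
    int state = digitalRead(switchPin);
    if (state != lastState) {
        udp.beginPacket(coreB, PORT);
        udp.write(state == HIGH ? '1' : '0');
        udp.endPacket();
        lastState = state;
    }
}

// Core B (receiver, a separate sketch): mirror the last byte received onto the LED
UDP udp;
const int PORT = 8888;
int ledPin = D7;

void setup() {
    pinMode(ledPin, OUTPUT);
    udp.begin(PORT);
}

void loop() {
    if (udp.parsePacket() > 0) {
        char c = udp.read();
        digitalWrite(ledPin, c == '1' ? HIGH : LOW);
    }
}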
@Dave, just a thought, there isn't an alpha or a beta of the firmware available I could use at my own risk to potentially get those bug fixes now, is there?