[SOLVED] Cloud Flashing hardly works when this program is loaded

I’ve spent days now trying to get the Spark to communicate reliably, both when flashing a program to it via the cloud and when it talks to a live server. Spark (Dave & bko) has been quite helpful, but problems still persist.

I can take the Blink sample and cloud flash it to my Spark, and it works 12 times out of 15 tries. Not perfect, but I can live with it.

But if I take the slightly more complicated program listed below, I’m only successful at flashing 3 times out of 15. Since the web-based IDE waits so long to recover after a failed attempt, it makes development almost unbearable.

So, why is the following program almost preventing flashing from the cloud?

In a nutshell, the program sends a GET to our server once every 5 seconds and listens for a result, sending it out Serial1.
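The 5-second schedule in the listing uses the check `(time - lasttime) >= 5000L`, which is the rollover-safe form: `millis()` is a 32-bit unsigned counter on the Core, so the unsigned subtraction still yields the correct elapsed time when the counter wraps (roughly every 49.7 days). A minimal standalone sketch of that check, using a hypothetical helper `intervalElapsed()` and `uint32_t` in place of the Core's 32-bit `unsigned long`:

```cpp
#include <cassert>
#include <cstdint>

// Rollover-safe "is it time yet?" test, in the style of the listing's
// (time - lasttime) >= 5000L check. uint32_t models the Core's 32-bit millis().
// Unsigned subtraction wraps modulo 2^32, so the elapsed time stays correct
// even when `now` has wrapped past zero and `last` has not.
bool intervalElapsed(uint32_t now, uint32_t last, uint32_t intervalMs) {
    return static_cast<uint32_t>(now - last) >= intervalMs;
}
```

Advancing `lasttime` by a fixed 5000 (as the listing does) rather than setting it to `time` keeps the requests on a fixed 5-second grid; if `loop()` ever falls behind, the next few checks fire back-to-back until it catches up.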

BTW, I’ve tried this on boards with both the chip antenna and an external antenna. Right now my RSSI is running at -52. Other wireless devices work OK with this router (Cisco E2000). Nevertheless, if someone at Spark could verify whether this program impedes cloud communication, it would be helpful.

The delay(10)s are in the program to suppress lots of red SOS messages I was getting; they are just a stopgap until the problem can be solved. The SPARK_WLAN_Loop() call was put in to try to overcome the cloud problem, but I don’t think it helped at all.

Thanks

// 
// June 28, 2014
// Firmware version 0.2.3
//
// This is a test program which demonstrates loss 
// of cloud support due to program complexity.
// If I flash the Spark with the sample blink program 
// it succeeds 12 times out of 15 tries.
//
// With this code I am only successful flashing the device 
// 3 times out of 15 tries.
//
// Notes:
// Delays were added after the client.println() calls to prevent frequent
// red SOS messages.
// SPARK_WLAN_Loop() was added in attempt to allow access to the 
// Spark Cloud but it had no effect.
//
TCPClient client;
byte serverAddress[] = { 50, 22, 11, 32 };
unsigned long lasttime = 0L;
int connects = 0;
int fails = 0;
volatile int Counter;
//void counterISR(void);

//void counterISR()
//{
//    Counter++;
//}
 
void setup()
{
    //pinMode(D0, INPUT);
    //attachInterrupt(D0, counterISR, RISING);
    Serial1.begin(9600);
    delay(2000);
    Serial1.flush();
    Serial1.println(Network.SSID());
    Serial1.println(Network.gatewayIP());
    Serial1.println(Network.subnetMask());
    Serial1.println(Network.localIP());
    Serial1.println(Network.RSSI());
}
 
void loop()
    {
    char c;
    unsigned long time;
    time = millis();
    
    if ((time - lasttime) >= 5000L)
        {
        lasttime += 5000L;
        Serial1.println(time);
        Serial1.write("\r\n");
        Serial1.println(Counter);
        Serial1.write("\r\n");
        //noInterrupts();
        //Counter = 0;
        //interrupts();

        if (client.connect(serverAddress, 80))
            {
            Serial1.write("(Try65 Connect Success)\r\n");
            delay(10);
            client.println("GET /get_test.php HTTP/1.1");
            delay(10);
            client.println("Host: www.takenwithyou.com");
            delay(10);
            client.println("Content-Length: 0");
            delay(10);
            client.println();
            delay(10);
            while ((millis() - lasttime) < 1000)
                {
                SPARK_WLAN_Loop();
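                // read() returns an int; -1 means "no byte yet". Keep it as
                // an int so -1 is not turned into 255 by an unsigned char.
                int r = client.read();
                if (r >= 0)
                    {
                    c = (char)r;
                    Serial1.print(c);
                    }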
                }
            client.stop();
            } 
      else
            {
            Serial1.write("(Connect Fail) ");
            client.stop();
            }

        }
    }
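One detail in the read loop is worth checking: `client.read()` returns an `int`, with -1 meaning no byte is waiting. If that value is stored in a plain `char` before the `c >= 0` test, the result depends on whether `char` is signed. GCC for ARM targets such as the Core's STM32 treats plain `char` as unsigned, so -1 becomes 255, the test always passes, and 0xFF bytes get echoed while the loop waits. The standalone sketch below makes the signedness explicit; `fakeRead()`, `buggyHasByte()` and `fixedHasByte()` are hypothetical names, with `fakeRead()` standing in for `TCPClient::read()`.

```cpp
#include <cassert>

// Hypothetical stand-in for a byte-stream read(): returns the next byte
// as 0..255, or -1 when no data is waiting (the TCPClient::read() contract).
int fakeRead(bool hasData, unsigned char byte) {
    return hasData ? byte : -1;
}

// The listing's original pattern, with char's signedness made explicit.
// On ARM GCC plain `char` is unsigned, so -1 becomes 255 and passes the test.
bool buggyHasByte(int readResult) {
    unsigned char c = static_cast<unsigned char>(readResult);
    return c >= 0;  // always true for an unsigned type
}

// Corrected pattern: test the int first, narrow to a byte only afterwards.
bool fixedHasByte(int readResult) {
    int c = readResult;
    return c >= 0;
}
```

On the Core the corrected loop body would be along the lines of `int r = client.read(); if (r >= 0) Serial1.write((uint8_t)r);`. This is most likely separate from the OTA flashing problem discussed in this thread; it affects what gets echoed to Serial1.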

Hi @faraday,

I wonder if you moved the delay(10)s into one delay(100) right before the client.stop() call you might have more luck? I don’t see anything that should be an issue, but I know sockets are limited. I’ve also seen client.flush(); work wonders before a stop as well. I’ll ping @satishgn and @zachary in case they can shed some light on this.

Thanks,
David

Dave, in order to stop all of the red SOS hard stops I had to put the delay before the client.read() loop. When I used one delay(100) there I lost a lot of the characters coming back from the live server. I would try it again, but it takes so many tries to flash a new program that I haven’t been able to do it for 30 minutes now, and I’m kind of getting tired of trying. If the other fellows on the team have any concrete ideas it would be a great help.
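The trade-off described here (fixed delays before reading avoid the SOS faults but lose characters) suggests reading against a deadline instead of sleeping for fixed times. A minimal sketch on a simulated clock: `FakeClient` and `drainResponse()` are hypothetical names, with `read(now)` standing in for `TCPClient::read()` and `now` standing in for `millis()`. The loop keeps reading until the server has been quiet for `idleMs` or until `totalMs` has passed, so no bytes are skipped while a delay runs. It has not been tested on a Core and is not claimed to cure the SOS faults.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>

// Hypothetical simulated socket: each byte becomes readable at a given time (ms).
struct FakeClient {
    std::deque<std::pair<uint32_t, char>> pending;  // (arrival time, byte)

    // Returns the next byte (0..255) if it has arrived by `now`, else -1.
    int read(uint32_t now) {
        if (pending.empty() || pending.front().first > now) return -1;
        char b = pending.front().second;
        pending.pop_front();
        return static_cast<unsigned char>(b);
    }
};

// Read until no byte has arrived for idleMs, or until totalMs has elapsed.
// `now` advances 1 ms per empty poll to stand in for millis().
std::string drainResponse(FakeClient& client, uint32_t start,
                          uint32_t idleMs, uint32_t totalMs) {
    std::string out;
    uint32_t now = start;
    uint32_t lastByte = start;
    while ((now - start) < totalMs && (now - lastByte) < idleMs) {
        int c = client.read(now);
        if (c >= 0) {
            out.push_back(static_cast<char>(c));
            lastByte = now;  // restart the idle timer on every byte
        } else {
            ++now;  // on the Core, a short delay() would go here
        }
    }
    return out;
}
```

On the Core, the `else` branch is the natural place for a short `delay()`, which also gives the background WiFi/cloud task a chance to run. Note that the idle timer starts immediately, so a server that takes longer than `idleMs` to send its first byte would be cut off; a real version might start the idle timer only after the first byte arrives.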

Did you have an opportunity to try the code yourself and see if it fails in your environment?

Hi @dave @faraday, I will try to debug this issue.


You rock, thank you @satishgn!

From my very amateur understanding, it might be possible that the program that’s on the Spark is interfering with the attempts at flashing it - probably due to the delays.

Do give this thread a peek : https://community.spark.io/t/spark-not-accepting-new-firmware/5043

cya
R


So just looking at this part of your issue... I'm wondering how long the Spark Web IDE will attempt to send a message to the Core to reprogram it before giving up. If it's less time than a few seconds, the ability of the Core to receive this signal is blocked by the client.connect() code. @satishgn can you comment on these times?

If so, you could circumvent this problem by adding in a forced reprogram mode like this:

Rehaan, the delays were necessary to prevent an even more serious issue of the Spark crashing due to red SOS events.

BDub, I had thought about doing some sort of workaround like sensing an input to put the Spark into a state where re-flashing would be easier. The trouble is that this doesn’t solve the long-term problem when you have a gillion of these things deployed and you can’t get to them.

Have any of you kind gentlemen been able to plug the code in and replicate this problem? That would tell me if I have something wrong here in my development environment, or if it’s a real problem.

Hi @faraday, there seems to be an issue with OTA updates when your test code is running. You don’t need to call SPARK_WLAN_Loop() explicitly in your case, since the firmware calls it anyway; only in a lengthy or infinite loop does it need help, and there adding a simple delay(10) should be sufficient. After removing SPARK_WLAN_Loop() from your loop() code, I sometimes see a stuck magenta LED and then the core restarts.
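The point here is that the firmware services the WiFi/cloud task on its own between calls to `loop()`, and that a `delay()` gives it a chance to run during a long loop. A simplified, hypothetical model of that behaviour follows. It is not the actual core-firmware source: the `Firmware` struct, `backgroundTask()`, `runOnce()` and the 10 ms servicing interval inside `delayMs()` are illustrative assumptions. It shows why a busy-wait starves the background task while a `delay()`-based wait does not.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical model of a cooperative firmware main loop.
// Not the real Spark Core source; names and timings are illustrative only.
struct Firmware {
    uint32_t clockMs = 0;     // simulated millis()
    int backgroundRuns = 0;   // how often the WiFi/cloud task got CPU time

    // Stands in for SPARK_WLAN_Loop(): cloud housekeeping, OTA chunks, etc.
    void backgroundTask() { ++backgroundRuns; }

    // Model of a delay() that services the background task while waiting
    // (assumed here to happen every 10 ms of simulated time).
    void delayMs(uint32_t ms) {
        for (uint32_t i = 0; i < ms; ++i) {
            ++clockMs;
            if (clockMs % 10 == 0) backgroundTask();
        }
    }

    // One pass of the outer loop: background work first, then the user's loop().
    template <typename UserLoop>
    void runOnce(UserLoop userLoop) {
        backgroundTask();
        userLoop(*this);
    }
};
```

In this model a `loop()` that busy-waits for 1000 ms lets the background task run once, while the same wait done with `delayMs(1000)` lets it run 101 times. If the background task is what receives an OTA update, starving it is one plausible way for flashing to become unreliable; the eventual fix in this thread was made in the firmware itself, so the model only illustrates the scheduling idea.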

Yours is a good test case for reproducing this bug: https://community.spark.io/t/bug-bounty-ota-flash-reliability-issue/ or https://github.com/spark/core-firmware/issues/233

Currently working on debugging this and other similar issues.


Thanks Satishgn for looking at this and verifying that it is an issue.

You mentioned that you sometimes see the stuck Magenta. I wonder if you’ve also seen a variety of other (not good) color patterns while doing this. I’ve seen stuck blue, rapid blue, funny magenta, brief magenta, stuck magenta, pretty much everything but red (unless of course I remove the delays and then I see that).

The reason I mention this is that I’m wondering if the situation could get worse depending on network latency, etc. You’d never see it in a good environment, but a real-world deployment would. We are in the mountains and connect via a 2.4GHz link to a nearby tower (our internet is quite satisfactory for all normal use). Here is a tracert from my XP machine. Note, though, that things get worse in parts of the network far removed from us:

C:\Documents and Settings\user>tracert www.spark.io

Tracing route to elb024912-2143350218.us-east-1.elb.amazonaws.com [54.225.148.31]
over a maximum of 30 hops:
1-5 removed for privacy reasons but all were under 60ms.
  6    35 ms    27 ms    37 ms  te-5-0-0-spk-cr1.cet.com [206.63.80.1]
  7    50 ms    57 ms    46 ms  te-2-1-sea-cr1.cet.com [198.202.26.5]
  8    16 ms    33 ms    31 ms  ge-6-4.car3.Seattle1.Level3.net [4.71.152.73]
  9    85 ms    92 ms    94 ms  ae-32-52.ebr2.Seattle1.Level3.net [4.69.147.182]
 10    86 ms    83 ms   104 ms  ae-2-2.ebr2.Denver1.Level3.net [4.69.132.54]
 11   100 ms   101 ms   101 ms  ae-3-3.ebr1.Chicago2.Level3.net [4.69.132.62]
 12   137 ms   122 ms    99 ms  ae-1-100.ebr2.Chicago2.Level3.net [4.69.132.114]
 13   112 ms   140 ms   134 ms  ae-6-6.ebr2.Washington12.Level3.net [4.69.148.145]
 14   107 ms    99 ms   103 ms  ae-47-47.ebr2.Washington1.Level3.net [4.69.202.57]
 15    86 ms    96 ms   106 ms  ae-82-82.csw3.Washington1.Level3.net [4.69.134.154]
 16   100 ms    94 ms   100 ms  ae-1-80.edge1.Washington1.Level3.net [4.69.149.141]
 17    92 ms    98 ms   107 ms  AMAZON.COM.edge1.Washington1.Level3.net [4.28.125.110]
 18   100 ms   135 ms   117 ms  72.21.220.141
 19    97 ms   110 ms   120 ms  72.21.222.33
 20     *        *        *     Request timed out.
 21     *        *        *     Request timed out.
 22     *        *        *     Request timed out.
 23   111 ms   141 ms   109 ms  216.182.224.81
 24     *        *        *     Request timed out.
 25     *        *        *     Request timed out.
 26     *        *        *     Request timed out.
 27     *        *        *     Request timed out.
 28     *        *        *     Request timed out.
 29     *        *        *     Request timed out.
 30     *        *        *     Request timed out.

Trace complete.

Correct, @faraday. I wasn’t able to reproduce this bug on a 3G/HSPA mobile network, but when I downgraded to a 2G/EDGE network, things started behaving erratically because of the slow speed.


Hi @faraday, I got your issue fixed in the following commit: https://github.com/spark/core-firmware/commit/6d0b3ad20cd7d5ecbbb64bea6c26160290ac0e61

If you’d like to test it by building locally, please use the following source repos:

https://github.com/spark/core-firmware/tree/bug-fix/ota-flash-reliability


Thanks!!

Really dumb question: do I need to get set up with JTAG or serial to do this, or can it be done via WiFi? I’m pretty sure I can build the firmware OK; I’m just unsure of the easiest way to get the code into the device. I’m running XP and have a JTAG breakout board and an ST-Link, but would prefer wireless if that’s possible.

Hi @faraday,

Good question! You can use something like the Spark CLI ( https://github.com/spark/spark-cli ) to flash compiled binaries right to your core when it’s online, or via usb if you have dfu-util and the drivers installed. With the CLI the commands would be:

via the cloud:
spark flash my_core_name firmware.bin

via usb / dfu:
spark flash --usb firmware.bin

Thanks!
David

This looks like it fixed the problem.

I took the core-firmware.bin file (77KB 7/2/2014 5:54AM) and programmed the Spark using CLI:

spark flash mydevicename core-firmware.bin

I repeated this several times and it seemed to work without any problems.

Then I did:
spark flash mydevicename testfile.ino

Doing this several times seems to be working.

My question is, when you flash the .ino program through CLI, does it also re-flash the firmware? And if so, which version does it use? The one I just flashed, or the current release version from the web site?

What about if you flash from the web IDE? Does it replace the newer, fixed firmware with the old, buggy release version?


Hi @faraday,

Good question! If you pass source files (like ‘ino’, or ‘cpp’ files), they’re compiled using the same system as the build IDE. In both cases, your source is compiled against the compile-server2 branches on the various core-firmware repositories.

Your source, and your setup/loop functions are run inside the main firmware for the Core. So when you re-flash the program, you’re replacing all the running code on the Core (except the bootloader), with your new program.

I hope that helps! :slight_smile:

Thanks,
David

Thanks David.
So, how do I use the CLI to flash my user land program without disturbing the core firmware?

Or, if I have to do it this way, what is the CLI command to upload both my fixed core firmware & my user land program at the same time?

(I am assuming that this would never be possible using the web interface)

Hi @faraday,

Hmm, your user-land software is also the core firmware. :slight_smile: Your source files are combined with that source project, and compiled together. If you setup a local build - (instructions here https://github.com/spark/core-firmware ), then you have total control over what is loaded and run. You can flash your compiled binary with just spark flash my_core my_firmware.bin, and that’ll copy your application verbatim to the core.

Thanks!
David

Wow, it took going the CLI route for me to see how this whole thing is supposed to work. Actually, while the web interface is nice, it hides so much of what’s going on that I kind of got off on the wrong track.

I programmed my first microcomputer (Z80) with a teletype (you know, punched paper tape), and I spent a great deal of time with the early PICs (< 200 bytes of RAM), then on through 16-bit and now 32-bit microcontrollers.

It’s all been a wild ride, but I’d have to say that the Spark Core is one of the most exciting technologies that I’ve ever encountered.

Thanks for your foresight, all the hard work, and help along the way (Satishgn, Dave, BDub and others).
