Unable to reliably flash spark core from CLI (Mac OS X)

rgw · October 29, 2014, 7:54pm

I can’t consistently flash a spark core from the command line interface (Mac OS X). By this I mean that initially (see below for what “initially” means) everything is fine, but after a few successful flashes the core becomes unresponsive, and the CLI gives me this error message: ‘flash core got error: {“code”:“ECONNRESET”}’. All the while, the core is blissfully breathing cyan.

Note that the core is reported as present and online when I ask ‘spark list’. I note this in part because I saw in a related prior post that kennethlimcp wrote “If spark list works, spark flash should work fine.” Unfortunately that’s not so for me.

Merely resetting, or powering off/on the core doesn’t help. But I have come upon a workaround for the problem:

1. Remove the core using ‘spark core remove’ and kill the Spark iOS app if necessary.
2. Reset the core by holding MODE, briefly pressing RESET, and releasing MODE when I see fast white flashes.
3. When the core enters listening mode, launch the Spark iOS app and claim the core.

If all goes well with the above process, I’ve arrived at the aforementioned “initial” stage, and I can usually flash code to my core a few times before the problem inevitably surfaces again. And of course the workaround is very time-consuming, so it’s frustrating.

A couple of additional notes that might be useful:

I own two separate cores, and the same problem / pattern occurs on both.
The core is placed only a few feet away from my wireless router.
I said above that “the core is blissfully breathing cyan”, but this isn’t quite accurate: every once in a while, the core briefly switches from breathing cyan to fast-blinking cyan for a second or two. I have no idea what this means or if it’s relevant.

Any guidance is very much appreciated.

kennethlimcp · October 29, 2014, 9:39pm

ECONNRESET means your laptop’s connection to Spark was reset before the request completed.

You might want to check on your connection.

Flashing cyan also means that the core is trying to connect to the Spark Cloud, so that indicates some issue with the internet connection.

rgw · October 30, 2014, 1:56am

When you say I might want to check on my connection, do you mean my home network connection (cable modem and/or WiFi)? If so, I’m still stumped, because:

I can consistently connect various other devices to my home WiFi network without incident, including some as far away as upstairs and across the house.
As I said in my original post, the spark core is only a few feet away from my wireless router.
As I noted in my original post, ‘spark list’ does in fact accurately report the core as online. If there were a problem with my home network, wouldn’t that also inhibit my Mac’s ability to find the core on my home WiFi network?

I genuinely appreciate your help.

rgw · November 1, 2014, 8:00am

I’m pretty sure I’ve diagnosed the problem, though I had to make some guesses about how the Spark core hardware interacts with sketches uploaded to it.

Summary:

I can’t flash the Spark core when the sketch it’s running rarely or never iterates over the Arduino-like main loop(). So long as the extant sketch iterates over loop() frequently, there’s no problem. I’m guessing, but I think this is because the Spark only does certain WiFi admin stuff (e.g. listening for new firmware flashes) when it loop()s.

The obvious workaround is to make sure sketches are constructed such that they frequently loop(). But it appears that iterating over loop() incurs overhead, and in my case the performance differences are dramatic: 2 stepper motors I’m controlling move 5x faster (!) in a sketch that avoids iterating over loop(), vs. an otherwise equivalent sketch that loop()s frequently.

I can cope with the problem by writing sketches that loop() with a frequency that trades off performance vs. access to the core over WiFi. Still, I’d recommend this behavior be documented clearly, because the source of and solution to the symptoms I observed was very non-obvious. I also think the behavior should be changed, i.e. fixed, if possible.

Details:

My diagnosis is consistent with the data I’ve gathered, with one important exception: As noted before, ‘spark list’ showed core(s) as online that I was unable to flash to. I don’t have a great explanation for this. But: (1) I’ve noticed that ‘spark list’ reports lag reality, so maybe that’s what I saw. Consistent with this, I now see that ‘spark list’ eventually reports my unflashable cores as either offline or nonexistent. (2) It might be that the core hw works in such a way that ‘flash’ depends on the sketch loop()ing, but ‘list’ does not. But this is pure speculation.
As I said, my diagnosis relies on some guesses about how the core hardware works. But try this: put a while(1) loop immediately inside your sketch’s main loop(), so it looks like this: void loop() { while(1) { …code… } }. This keeps the sketch from ever loop()ing. And if your results are like mine, any core running such a sketch will be stubbornly unflashable.
If I’m right, the real problem probably has nothing to do with either the CLI or OS X. It’s really about (a) the inability to talk to a core that loop()s rarely or never, and (b) the unexpectedly poor performance of some sketches that loop() frequently.
It would be easy to dismiss the problem – just as my initial post was evidently dismissed – as fundamentally about software that’s been designed badly, or at least idiosyncratically. That would be a mistake, because it would ignore the fact that so-deemed “good” software design can badly hurt performance. As an example, I wrote two sketches to control (x,y) stepper motors, moving a printer head about 10cm diagonally and back. One sketch loop()s very frequently, and completes the task in 50s. The other doesn’t loop() at all but is otherwise identical. It takes 11s.

Short-Term Recommendation: Document the Behaviors

It looks to me like there are two noteworthy behaviors:

Cores running sketches that rarely or never loop() are unreachable over WiFi; and
loop()ing incurs some overhead such that some frequently loop()ing sketches can be markedly and unexpectedly slow.

I suspect that if you look carefully enough at the Spark core documentation, you can find information that suggests or even explicitly informs readers about one or both of these behaviors. But I personally don’t remember anything like that, and I read the docs pretty carefully. Also, apparently no one who read my initial post could diagnose the symptoms I described. So my claim is that these behaviors are strange and important enough that you need to document them more thoroughly.

More important, the symptoms I saw when working with the Spark core seemed downright bizarre:

Very simple sketch variants, with apparently trivial structural differences, yielded huge (5x) performance differences, which took me forever to debug.
These apparently trivial changes in sketch structure also dictated whether a Spark core would subsequently respond to or ignore flash attempts.
Sketches that plodded along on the 32-bit, 72MHz Spark core zipped snappily on an 8-bit 16MHz Arduino Uno! Moreover, sketch variants that produced dramatic performance differences on the Spark seemed to have no effect at all on Arduino.
For most kinds of sketches, loop()-frequency sketch variants seem to have no effect when run on the Spark core either. I suspect the problem only occurs in sketches that depend on very precise timing, i.e. in microseconds. This is certainly true of my problematic sketches: there are several stepper motor-controlling programs and libraries in the public domain, but common to most of them are delays between writing to output pins on the order of hundreds or even tens of microseconds. For sketches using such very short delays, could an innocuous-seeming loop() overhead end up slowing things down considerably?

Another clue comes from this Stack Exchange post, which explains that Arduino’s loop() overhead is so modest as to be more or less negligible. By contrast, and again I’m guessing, the Spark core’s overhead is expensive enough (presumably in support of its WiFi comms) that in some cases it becomes meaningful relative to the time per typical loop().
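To put rough numbers on that guess, here is a back-of-envelope sketch in plain C++ that uses only the 50 s and 11 s timings reported in the Details above. It rests on two assumptions that were not measured: the sketch that never returns from loop() pays no loop() overhead at all, and the frequently looping sketch pays exactly one loop() pass per motor step. The function names and the 500 µs example step delay are hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// Ratio of per-step loop() overhead to per-step delay, under the model
//   total_time = steps * (step_delay + overhead_per_step)
// where overhead_per_step is taken as 0 for the sketch that never
// returns from loop() (an assumption, not a measurement).
double overheadToDelayRatio(double loopingSeconds, double nonLoopingSeconds) {
    assert(nonLoopingSeconds > 0.0 && loopingSeconds >= nonLoopingSeconds);
    return (loopingSeconds - nonLoopingSeconds) / nonLoopingSeconds;
}

// Overhead per loop() pass implied by that ratio for a given step delay.
double impliedOverheadMicros(double ratio, double stepDelayMicros) {
    return ratio * stepDelayMicros;
}

// Prints the estimate for the timings quoted in the post.
void printEstimate() {
    const double ratio = overheadToDelayRatio(50.0, 11.0);  // timings from the post
    const double stepDelayMicros = 500.0;                   // hypothetical step delay
    std::printf("overhead/delay ratio: %.2f\n", ratio);
    std::printf("implied overhead at %.0f us per step: %.0f us\n",
                stepDelayMicros, impliedOverheadMicros(ratio, stepDelayMicros));
}
```

Under this model the overhead per loop() pass comes out at roughly 3.5 times the per-step delay, which is the sense in which the overhead would be "meaningful relative to the time per typical loop()". The absolute figure depends entirely on the assumed step delay.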

Of course, when all the evidence is summarized like this, the likely culprits suggest themselves. But piecing it all together took me many, many hours. In my opinion, today’s documentation doesn’t adequately (if at all) explain these behaviors, which come and go with small and seemingly arbitrary changes, and which tend to manifest in ways that obscure rather than identify what’s really going on.

So I think it would be better if the official documentation pointed out the gotchas more clearly.

Medium-Term Recommendation: Fix the Bug

I realize that some people will dispute whether these behaviors constitute a “bug”. So as evidence, let me offer the AccelStepper library for Arduino. AccelStepper improves on the standard Arduino IDE Stepper library in a variety of ways, and is by Arduino standards astonishingly well-documented. In my experience, AccelStepper is f***ing awesome. It’s not perfect, but honestly it’s beautiful.

So you can imagine my disappointment – and confusion – when my efforts to port AccelStepper to the Spark core failed miserably. I changed “Arduino.h” to “application.h” (aside: had to dig deeper for that one than seems reasonable), and I was careful to use the right GPIO pins for my various needs. But it just doesn’t work. Among other things, steppers controlled by AccelStepper are way too slow on a Spark core – much, much slower than the same code running on a dumb, slow Arduino.

Why? Until a few hours ago, I was stumped. But now I think I get it: unlike most other stepper-controlling software, AccelStepper employs an architecture centered on frequent calls to a certain administrative function called run(). The documentation advises sketches to call run() as frequently as possible so as to ensure that the controlled motor is stepped at just the right time, accelerating and decelerating, bobbing and weaving, etc. It really is something. But now I strongly suspect that AccelStepper commonly employs some very brief delays, compared with which the Spark core’s loop() overhead is material. So everything gets screwed up.

Because this reliance on frequent calls to run() lies at the heart of the AccelStepper library’s software design, porting the library to the Spark core is quite non-trivial. I’m certainly not going to do it. Instead, to migrate from Arduino to the Spark core, I’m going to abandon AccelStepper and either substantially modify some other stepper library, or write my own from scratch.

A big part of the Spark core’s value proposition is that it’s Arduino-compatible, i.e.: code written for Arduino will run on the Spark with little or no modification. As far as AccelStepper goes, that’s just plain untrue.

Whether AccelStepper is common or rare I don’t know. But it sure feels like a bug to me.

But if All This Is too Much Trouble…

Maybe we can still blame the whole thing on some silly noob and his flaky home WiFi network.

kennethlimcp · November 1, 2014, 8:14am

This has been fixed in the next planned firmware release, V0.4.0. The loop() frequency is increased to around the 20+ kHz range, if my memory does not fail me.

The short-term solution is to add Spark.process() calls in areas where your code is blocking. That’s the function used to process communication with the cloud.
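A minimal sketch of that suggestion, assuming a hand-rolled blocking step routine rather than any particular library. Spark.process() is the call named above. The pin, the step timing, and the STEPS_PER_SERVICE knob are hypothetical, as are blockingMove() and the host-side stand-ins in the #else branch, which exist only so the logic can be compiled and checked off-device. The sketch also assumes the firmware build defines the SPARK macro.

```cpp
#if defined(SPARK)
#include "application.h"   // Spark Core firmware header
#else
// Hypothetical host-side stand-ins, used only when building off-device.
#include <cassert>
enum { D0 = 0, LOW = 0, HIGH = 1, OUTPUT = 1 };
struct SparkStandIn {
    unsigned long processCalls = 0;
    void process() { ++processCalls; }   // counts calls instead of talking to the cloud
};
static SparkStandIn Spark;
static unsigned long pulsesSent = 0;
inline void pinMode(int, int) {}
inline void digitalWrite(int, int level) { if (level == HIGH) ++pulsesSent; }
inline void delayMicroseconds(unsigned int) {}
#endif

const int STEP_PIN = D0;                  // hypothetical wiring
const unsigned int STEP_DELAY_US = 500;   // hypothetical step timing
const long STEPS_PER_SERVICE = 200;       // hypothetical tuning knob

// Blocking move: pulses STEP_PIN `steps` times and services the cloud
// connection once every STEPS_PER_SERVICE steps, so the core can still
// be reached while the move is in progress.
void blockingMove(long steps) {
    for (long i = 0; i < steps; ++i) {
        digitalWrite(STEP_PIN, HIGH);
        delayMicroseconds(STEP_DELAY_US);
        digitalWrite(STEP_PIN, LOW);
        delayMicroseconds(STEP_DELAY_US);
        if ((i + 1) % STEPS_PER_SERVICE == 0) {
            Spark.process();
        }
    }
}

void setup() {
    pinMode(STEP_PIN, OUTPUT);
}

void loop() {
    blockingMove(1000);
}
```

Raising STEPS_PER_SERVICE favors motor speed and lowering it favors responsiveness to the cloud, which is the same trade-off described earlier in the thread.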

As for library porting, some authors baked in AVR specific code and that’s definitely hard to port.

Generic sensors on SPI/I2C/Tx/Rx/analogRead() etc. work great with the core as long as the library doesn’t use hardware-specific code.

rgw · November 1, 2014, 6:28pm

Gotcha – very helpful, thanks.

One more thing and I’ll shut up about it:

I wouldn’t describe the problem as the library using AVR-specific code. I suspect that if you asked the authors of the AccelStepper library, they’d say they don’t do that at all unless you count ‘#include “Arduino.h”’.

Technically I suppose it’s true, but only in this sense: Any library written for Arduino is effectively unusable on hardware whose loop() is “slow”, defined here as “meaningful relative to timers or delays in the library”. It appears to be the case that loop() is so fast on an AVR that it’s never an issue, but not so for the Spark. I guess you could call that “hardware-specific code” in the same way you could call it hardware-specific code if, say, the Spark core couldn’t do floating point math.

But I think that’s missing the point. What we usually mean by hardware-specific is code that expressly takes advantage of very particular properties of the target hardware, which are unlikely to be true of different but otherwise suitable hardware.

That’s not what’s going on here. Rather, what’s going on is that loop() performance on the Spark core is way, way slower than on an AVR. This is understandable given that Spark’s loop() has to do more stuff. (In fact it would be much more accurate to warn that the Spark core’s loop() performance is slow due to hardware-specific tasks it does.) And I’m happy to learn that the new firmware is going to improve matters. But on paper the Spark core looks much faster, so it’s not at all obvious that it would be too slow to power some libraries originally written for Arduino.

rgw · November 1, 2014, 8:26pm

CORRECTION:

After I wrote this, a simple (and embarrassingly obvious) workaround for using the AccelStepper library occurred to me: instead of putting AccelStepper::run() directly inside the main loop(), put it inside some sort of conditional loop so there’s an N:1 ratio of run()s to loop()s. The elegant way is a while loop that begins every time there’s a new target position for the stepper, and ends when the stepper reaches that target.

In tests this works beautifully.

So the basic behaviors are as I’ve discussed, but I substantially overstated the effort required to adapt a library like AccelStepper for the Spark core. It’s actually very easy, so long as you know the minor but essential changes that need to be made to AccelStepper’s example sketches for Arduino.
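The while-loop pattern described in this correction might look roughly like the sketch below. It is an illustration, not the code actually tested in the thread. moveTo(), distanceToGo(), run(), setMaxSpeed() and setAcceleration() are AccelStepper calls. moveAndWait(), the pins, the speeds and the target positions are hypothetical. The #else branch is a hypothetical host-side stand-in for the small part of AccelStepper used here, and the sketch assumes the firmware build defines the SPARK macro and that a ported AccelStepper.h is available on the device.

```cpp
#if defined(SPARK)
#include "application.h"
#include "AccelStepper.h"   // assumes a copy already ported to the Spark core
#else
// Hypothetical host-side stand-in, used only when building off-device.
#include <cassert>
enum { D0 = 0, D1 = 1 };
class AccelStepper {
public:
    enum { DRIVER = 1 };
    AccelStepper(int, int, int) {}
    void setMaxSpeed(float) {}
    void setAcceleration(float) {}
    void moveTo(long absolute) { target_ = absolute; }
    long distanceToGo() { return target_ - position_; }
    long currentPosition() { return position_; }
    bool run() {   // stand-in: moves one step toward the target per call
        if (position_ < target_) ++position_;
        else if (position_ > target_) --position_;
        return position_ != target_;
    }
private:
    long target_ = 0;
    long position_ = 0;
};
#endif

AccelStepper stepper(AccelStepper::DRIVER, D0, D1);   // hypothetical step/dir pins

// Sets a new target, then calls run() many times per loop() pass:
// the inner while loop ends only when the stepper reaches the target.
void moveAndWait(long target) {
    stepper.moveTo(target);
    while (stepper.distanceToGo() != 0) {
        stepper.run();
    }
}

void setup() {
    stepper.setMaxSpeed(1000.0);      // hypothetical values
    stepper.setAcceleration(500.0);
}

void loop() {
    static long target = 2000;        // hypothetical travel, in steps
    moveAndWait(target);
    target = -target;                 // head back the other way on the next pass
}
```

Because loop() returns only between moves, the core would presumably be unreachable for the duration of a long move. If that matters, an occasional Spark.process() call inside the while loop is one option, at some cost in step timing.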
