Core becoming unresponsive, Timed out

Hi all,

some forum members and I have already been trying to solve an issue here - https://community.spark.io/t/adafruit-neopixel-library/1143/104 - it’s about the NeoPixel lib and the Spark Core - but I would love to get some more insight into how exposed functions work in general. I assume it is some kind of timing issue. It would be great to get some feedback.

Let’s say I have exposed a function like this:

Spark.function("lights", lights);
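For reference, here is the minimal pattern in full (the handler body is just a placeholder; exposed functions on the Core take a String argument and return an int):

// Minimal sketch of exposing a cloud function on the Spark Core.
int lights(String args)
{
    // parse args and drive the LEDs here (placeholder)
    return 1;   // this value is returned to the API caller
}

void setup()
{
    Spark.function("lights", lights);
}

void loop()
{
    // the questions below are about what happens here
}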
  • What happens when some longer-running code is executing in loop()? Would that code be interrupted if a function call for “lights” comes in?
  • Or does the Spark Core check for function calls each time after the code in loop() has run?
  • Are there any requirements for how long an exposed function like lights may run? Can code in the exposed function run for a few seconds, for example? What is the timeout?
  • I also heard that there are some differences between Arduino and the Spark Core. Are these documented somewhere? I think a lot of people probably “prototype” snippets of code on an Arduino, then copy them over to the Spark Build environment. It would be great to know what is OK and what is not.
  • Somewhere I saw an example claiming that calculating a value within a function call is no good - it was something like functionCall(i++, BLA), the problem being the i++ inside the call. Is that true? Would one better calculate the new value of i outside, e.g. on a separate line above, and then just pass i?

Thx!
Sven

Oh, and sorry, a bonus question: where are the Spark Cloud servers physically located? Could that be an issue for users in Europe, e.g. latency?

In the current implementation, the function calls are checked at the end of the user loop(). If the loop is too long, it causes stability issues.
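Conceptually, the main loop looks something like this (a simplified sketch, not the actual firmware source; Spark_Idle is an illustrative name for the background task):

while (true)
{
    loop();        // user code runs to completion first
    Spark_Idle();  // illustrative: cloud I/O, function dispatch, heartbeat
}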

The timeout is currently set at 10 seconds.

There are definite hardware differences, but these have not been documented in one location. Now that you mention it, we do need such a list! As far as user code is concerned, we have made it as compatible with Arduino as possible. I like the idea of documenting this as well.

I think you answered your own question there. Creating a local variable, assigning/manipulating the value and then passing it is ideal.

> where are the Spark Cloud servers physically located? Could therein be an issue for users in Europe?

We use Amazon web servers. Not sure if they have one fixed location. @Dave @jgoggins, any idea?

Right now our cloud is hosted on EC2, and we’re planning on adding local regions worldwide as capacity grows. At the moment I think the physical servers are located in Virginia, but that could change. We’re measuring average roundtrip latency on the cloud and displaying it here: http://status.spark.io/

I would love to add extra cloud regions, so hopefully we can start adding those as more Cores come online. :)

Hi @mohit, first of all thanks a lot - very good answers. The info about when functions are checked is very useful. Now, about the 10-second limit: is that the duration, measured from the Spark Cloud, of a round trip to the device, having the device call the function and return? Or is that duration checked on the device only?

I have a strange issue in combination with the NeoPixel lib by @BDub: my Spark halts (I can see this because I am flashing the D7 LED) once I call a function that in turn causes some NeoPixel code to be executed. Strangely, it works for a single setPixel call, but stops once I address more than one RGB NeoPixel, e.g. setPixelColor(base, c) and setPixelColor(base+1, c). Once base+1 / the second call is in, I can call the exposed function ONCE - then the device halts. It flashes cyan, all looks good, but every API call times out.

For the i++ issue - why is that? Is there an explanation I could understand? It looks like these things are rarely an issue on Arduino - what makes them an issue on the Spark Core? (Sorry, electronics dummy here, also very little C/C++ knowledge.)

Thx!

The 10 second limit has to do with a heartbeat that the Core is responsible for maintaining. The Spark loop is run after the user loop; if 10+ seconds pass without the Spark loop running (if, for example, your code blocks for that long), then the heartbeat is not sent, and the Core disconnects from the Cloud.
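So the rule of thumb is to return from loop() often. A sketch of slicing long work so the heartbeat keeps going (all names here are illustrative):

const int TOTAL_STEPS = 64;   // e.g. one step per pixel
int step = 0;

void runAnimationStep(int s)
{
    // placeholder: do one small slice of the work here
}

void loop()
{
    runAnimationStep(step);            // each pass stays well under 10 seconds
    step = (step + 1) % TOTAL_STEPS;
    // loop() returns quickly, so the Spark loop and heartbeat
    // run between slices
}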

Is it possible that something about your code is causing the Core to either crash or hang? That would explain the behavior that you're seeing.

I don't think this is a problem per se, although it's not a coding style I would recommend; it's much cleaner and easier to follow if the increment happens on its own line rather than inside the function call. At least I can't see any reason why it would be processed any differently on the Arduino vs. the Core.
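For illustration, the two styles side by side (functionCall and BLA stand in for the snippet from the question):

void functionCall(int channel, int color) { /* placeholder */ }

void demo()
{
    int i = 0;
    const int BLA = 0xFF0000;

    functionCall(i++, BLA);   // increment buried inside the call

    functionCall(i, BLA);     // equivalent effect, easier to follow:
    i++;                      // pass the current value, then increment
}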

@zach many thanks. My latest code, which I use to control the 8x8 RGB123 board, is here: https://community.spark.io/t/adafruit-neopixel-library/1143/104?u=hansamann - it has the issue that I am able to call the “lights” function once, the 4 correct pixels light up (I created 16 channels for the 64-LED RGB board, 4 RGBs each), then the Spark halts.

What is totally strange about this: with a single pixel, I can toggle the pixel repeatedly. But that does not really help, as later on I want to toggle many RGB rings, each having 12 RGB LEDs. What is even weirder is that it seems to work for @BDub. But I am really using 100% the same code; the only differences might be that I am over here in Europe, that I have an 8x8 RGB123 pixel board, that I might have a different router… etc.

Did you notice that BDub’s NeoPixel library uses the __disable_irq(); command while sending data to the LEDs? Maybe that is the key to the problem?

@hansamann are you confident your power supply and/or decoupling is good enough when all 16 LEDs are toggling?


I have a 5V/4A power supply - it’s connected to the RGB123 5V/GND pins, and GND is also connected to the Spark Core GND. Sounds good? Not sure what you mean by decoupling…

That’s interesting. @BDub, could that be an issue? Could disabling the interrupts somehow interfere with the Spark heartbeat or with the functions being callable?

I’ve thought about this, but I think it should be OK as long as no incoming requests arrive while the NeoPixels are being updated. It shouldn’t mess with the CC3000’s ability to receive messages. Your example - waiting for a message, getting it, and updating the NeoPixels once - is by far the least obtrusive way of disabling the interrupts.
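For context, the pattern under discussion looks roughly like this (a rough sketch, not the actual library code):

void show()
{
    __disable_irq();   // timing-critical section begins: the NeoPixel
                       // protocol needs sub-microsecond bit timing
    // ... clock out 24 bits per pixel with cycle-counted timing ...
    __enable_irq();    // interrupts (and the background task) resume
}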

In my opinion, the CC3000 and the background architecture should not be so sensitive that they can’t miss some of those interrupts. It might currently be that way, and I’m pretty sure work is being done to change that by @david_s5… is that right, David?

@hansamann @BDub

I will have to look at the code, but the CC3000 should be able to tolerate short periods of the interrupts being off, as long as the restoration is conditional on the state of the interrupts at the time they were disabled.

typedef uint32_t intState;

// Disable interrupts, returning the previous PRIMASK state so the
// caller can restore it later.
inline intState DISABLE_INT()
{
    intState is = __get_PRIMASK();
    __disable_irq();
    return is;
}

// Re-enable interrupts only if they were enabled (PRIMASK bit 0 == 0)
// when DISABLE_INT() was called; returns whether they were re-enabled.
inline int ENABLE_INT(intState is)
{
    int rv = ((is & 1) == 0);
    if ((is & 1) == 0) {
        __enable_irq();
    }
    return rv;
}
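Usage would look something like this (the function name is illustrative):

void updatePixelsAtomically()
{
    intState is = DISABLE_INT();   // remember whether IRQs were enabled
    // ... timing-critical work here ...
    ENABLE_INT(is);                // restore only if they were enabled before
}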

The other real possibility is a fault that is not reported by the current Spark build. I had one myself, see the log below.

I would try a debug build and capture the Serial1 output using:

rm -rf git_master_new
mkdir git_master_new
cd git_master_new
git clone https://github.com/Spark-Works/core-firmware.git
git clone https://github.com/Spark-Works/core-common-lib.git
git clone https://github.com/particle-iot-archived/core-communication-lib.git
cd core-firmware
git checkout spark_master_new
cd ../core-common-lib
git checkout spark_master_new

Then build with make clean && make DEBUG_BUILD=y

The debug output comes out on the TX/RX pins (3.3V logic).

Once that works, update application.cpp to your code and add:

void debug_output_(const char *p)
{
    // initialize Serial1 on the first debug message
    static boolean once = false;
    if (!once)
    {
        once = true;
        Serial1.begin(115200);
    }

    Serial1.print(p);
}

and see what the log says.

The log output will look like:

0000000001:<DEBUG> int main() (103):Hello from Spark!
0000001994:<DEBUG> int Spark_Connect() (616):sparkSocket Now =-1
0000002000:<DEBUG> int Spark_Disconnect() (654):
0000002004:<DEBUG> set_socket_active_status (810):Sd=0, Status SOCKET_STATUS_ACTIVE
0000002012:<DEBUG> int Spark_Connect() (623):socketed sparkSocket=0
0000002018:<DEBUG> int Spark_Connect() (644):connect
0000002164:<DEBUG> int Spark_Connect() (646):connected connect=0
0000002306:<DEBUG> int Spark_Receive(unsigned char*, int) (366):bytes_received 40
0000002314:<PANIC> char* _sbrk(int) (139):Out Of Heap

@david_s5 can you explain why you save the PRIMASK and only enable the interrupts again if bit 0 of the PRIMASK is 0?

From what I’m reading, it sounds like __disable_irq(); will set bit 0 of the PRIMASK and __enable_irq(); will clear it. I’m guessing you are attempting to detect whether the interrupts were previously disabled by some other piece of code, in which case we should not try to enable them yet. However, I would think any other piece of code that disables and enables the IRQs would be in an ISR handler, so they should theoretically be enabled again by the time user code executes. And given that IRQs should be enabled whenever user code is running, if user code disables IRQs, it should be running in a state of bliss until it enables them again.

Is your code just for safety then?

The only other tricky part is if there are any NMIs set in the system, because __disable_irq() would not disable those.

@BDub It allows nested calls. I use it in the drivers and in the foreground (FG) to perform atomic operations in code that is called both from an ISR and from the FG. It is akin to not leaving trash at the beach… do things cleanly and the world is a better place.
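To illustrate the nesting (hypothetical callers):

void inner()
{
    intState is = DISABLE_INT();   // PRIMASK is already 1 here, so is == 1
    // ... atomic work ...
    ENABLE_INT(is);                // saved state says "was disabled":
                                   // interrupts stay off for the caller
}

void outer()
{
    intState is = DISABLE_INT();   // PRIMASK was 0, so is == 0
    inner();                       // nested critical section is safe
    ENABLE_INT(is);                // saved state was 0: re-enable interrupts
}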
