Hard fault caused by the TCPServer example from the Spark docs and a simple Python client

Hi @Dave,
All my client.write calls have the correct size specified. I’m not sure what to do at this time other than rewrite with my own socket calls.

Unfortunately this has stalled my development; until I find a workaround, or wait for the Photon, I’m not sure what else to do. I’ve thought about adding a BTLE micro and feeding I2C data to the Spark from other sensors, but that’s a fair bit of work.

What’s confusing to me is that other people don’t seem to be having problems. I have an app that wakes up, turns on the CC3000, writes a chunk of data, then closes the socket and shuts the CC3000 down; that has been working for over 20 days. I guess keeping WiFi on all the time with frequent socket connections just isn’t stable.

1 Like

You are not alone. Other people are having problems.

Sometimes, I grant, it will be their own fault. The example cited above, where a TCP or UDP write reads memory it shouldn’t because of an improperly terminated string, is just one of the many ways to screw up any C program.

But often the fault is with the Spark Core. I too am in your position. The Spark Core does not work as advertised, as intended, or as documented in precisely the area which distinguishes the Core from the Arduino: the networking and the Cloud. I cannot use the Cloud because if it goes away (for whatever reason) my user code blocks. I use the so-called “UDP” instead, but that is unstable, and so I have to force my Cores to reboot every 15 minutes. Other people report similar problems.

When pushed, Spark HQ reports that it has stopped actively addressing problems with the Core while it focuses on the Photon. That is unreasonable, in my view.

1 Like

@psb777 You may just want to shut down the WiFi every 10 minutes or so and restart it. I’ve had an app running for over 20 days that does that, only in reverse: it monitors a bunch of sensors, and every 10 minutes it starts up the WiFi, posts its data, and shuts the WiFi down again. The loop keeps running the whole time, so instead of rebooting the Core you just reboot the WiFi. (A sketch of the pattern is below.) I’m not sure it will help in my case, though, since I’ve found that if the client tries to connect while the server isn’t responding, the client panics. :frowning:
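
Roughly, the pattern looks like this. This is a minimal sketch rather than my actual code: it assumes the documented WiFi.on() / WiFi.connect() / WiFi.ready() / WiFi.off() behavior in one of the non-automatic system modes, and postData() is just a stand-in for your own send routine:

#include <application.h>

#define POST_INTERVAL_MS (10UL * 60UL * 1000UL)   // ~10 minutes between posts

unsigned long lastPost = 0;

void postData()
{
  // open a socket, write the sensor readings, close the socket
}

void setup()
{
  // sensor setup goes here
}

void loop()
{
  // read sensors here; the loop keeps running the whole time

  if (millis() - lastPost >= POST_INTERVAL_MS) {
    WiFi.on();                                   // power up the CC3000
    WiFi.connect();
    while (!WiFi.ready()) SPARK_WLAN_Loop();     // wait for an IP address
    postData();
    WiFi.off();                                  // power the CC3000 back down
    lastPost = millis();
  }
}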

1 Like

Hey Guys,

I spent a few hours looking into this yesterday (which is why I posted), so I would definitely say we’re actively supporting and helping resolve issues whenever we can. :slight_smile: The firmware we’re writing for the Photon uses a hardware abstraction layer, so any improvements we make there will also apply to the Core. We haven’t released that yet because it’s still under active development, but I personally am very interested in seeing this issue fixed as well. I’m going to look into offering a bug bounty to see if we can speed up the resolution.

Thanks,
David

Hi @mtnscott,

We’re going to go ahead with writing up and posting a bug bounty. I think you already sent me your code; did you have any luck producing a minimal failing test case? If you guys have specific failing test examples, can you share them with me?

I’m going to write up guidelines, test cases, and examples, and we’re going to post a bug bounty. Even if we don’t have the time ourselves this week, we still care deeply about the experience and are willing to pay to improve it for everyone. :slight_smile:

Thanks,
David

Great to hear; we need more blog posts about what is happening with the Photon. I’m putting my Christmas light project on hold for the time being, as the Core just isn’t reliable enough for my needs on that project (and it’s only for next Christmas anyway).

@Dave Use this code example:


// SYNTAX: TCPServer server = TCPServer(port);
// Parameters: port - the port to listen on (int)
// EXAMPLE USAGE
#include <application.h>

// listen on port 8888 (telnet's default is 23, but any free port works)
TCPServer server = TCPServer(8888);
TCPClient client;

/**************************************************************************/
/*!
    Heartbeat variables and defines
*/
/**************************************************************************/
#define HEARTBEAT_DELAY 1000
int heartbeat_led = D7;
bool heartbeat_state = false;
unsigned long SparkHeartBeatNext;          // next time to toggle heartbeat led

void setup()
{
  // start listening for clients
  server.begin();

  // Configure the LED heartbeat pin
  pinMode(heartbeat_led, OUTPUT);
  digitalWrite(heartbeat_led, LOW);
  heartbeat_state = false;
  SparkHeartBeatNext = millis() + HEARTBEAT_DELAY;

  // Make sure your Serial Terminal app is closed before powering your Core
  Serial.begin(9600);
  // Now open your Serial Terminal, and hit any key to continue!
  while(!Serial.available()) SPARK_WLAN_Loop();

  Serial.println(WiFi.localIP());
  Serial.println(WiFi.subnetMask());
  Serial.println(WiFi.gatewayIP());
  Serial.println(WiFi.SSID());
}

void loop()
{
  unsigned long _curTime = millis();

  /*
   * Flash the heartbeat led
   */
  if (_curTime > SparkHeartBeatNext)
  {
    digitalWrite(heartbeat_led, heartbeat_state ? LOW : HIGH);   // pulse the LED to its opposite state
    delay(20);
    digitalWrite(heartbeat_led, heartbeat_state ? HIGH : LOW);   // and 20 ms later restore it
    SparkHeartBeatNext = millis() + HEARTBEAT_DELAY;
  }

  client = server.available();
  if (client.connected()) {
    // echo all available bytes back to the client
    while (client.available()) {
      server.write(client.read());
    }
    server.begin();   // start listening again for the next connection
  }
}

It’s what I was using with @bko and @guppy. Simply put: if you write data to the Spark Core, it can’t seem to handle it and gets into a state where it appears to hang; it doesn’t seem to recover.

Here is some really simple Python code to push and pull data from the Spark:

#!/usr/bin/python

import socket
from time import sleep

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('192.168.1.105', 8888))

message = "0123456789\n"

while True:
#   s.send(message)             # tried plain send() as well
    s.sendall(message)
    print s.recv(len(message))
#   sleep(1)                    # adding a delay only postpones the failure

The above example will fail after the very first send/receive. You could say it hammers the Spark; I’ve tried putting in delays, but it will eventually fail anyway.
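
One more data point: a variant I’ve been meaning to try paces itself on the echo instead of a fixed sleep, treating the echoed bytes as an application-level ACK before sending again. Just a sketch; the 5-second timeout is an arbitrary choice:

#!/usr/bin/python
# Sketch: wait for the complete echo before sending again, so the echo
# acts as an application-level ACK that throttles the sender.

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(5.0)                        # give up if the Core stops echoing
s.connect(('192.168.1.105', 8888))

message = "0123456789\n"

while True:
    s.sendall(message)
    received = ''
    while len(received) < len(message):  # recv() may return partial data
        chunk = s.recv(len(message) - len(received))
        if not chunk:
            raise RuntimeError('connection closed by the Core')
        received += chunk
    print received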

The code I sent you earlier is a more complicated example that sends a JSON message (maybe a few hundred bytes) every 5 seconds; in that case the Core fails after about 10-15 minutes (once it ran for 45 minutes). When the Core goes into a failed state, it no longer accepts incoming TCP connections and runs the user loop only once every 20 seconds. I have D7 blink when the user loop runs; otherwise everything looks normal (breathing LED).

UPDATE: Just updated the Spark example above to blink an LED while the loop runs.

UPDATE: 12:26 PM - Just ran the test again; the Core goes into a panic SOS. I’m not set up for JTAG debugging, so I can’t really dig any deeper. Maybe a custom firmware build will get us more debugging info.

2 Likes

I’m happy to read this. In my opinion, one of the bugs that needs addressing is the one that, as I understand it, cannot be resolved by trickle-down from the Photon development. This is the design bug where an evaporating Cloud causes user code to stop. Once said to be a necessary consequence of blocking calls to the TI chip, a fix for this accidental/deliberate design bug is now conceded to be possible by Spark HQ and is said to be well within the skill set of a reasonably clued-up user. I am not that user.

@psb777 Not sure the Cloud has anything to do with this. This, IMO, is in the WiFi layer; I can get it to fail the same way without the Cloud active. So is it the evaporating WiFi? :wink:

2 Likes

You’re right. Perhaps a topic change was needed earlier. But @Dave, I thought, was responding to us on the general point of a lack of support for Core issues. I agree this particular issue is not Cloud related.

However, I would say we’re complaining about (oops, discussing) the same thing, fundamentally: there has seemed to be a family of related Core problems not getting much attention. Your particular problem does at least potentially benefit from the trickle-down effect of the Photon development. But that the Core cannot use the Cloud and still allow user code to run should the Cloud fail is an issue that can only be addressed directly. Spark HQ admit a solution is possible and relatively simple. They have said they will not be working on it.

I did not realize they know how to fix it but have not prioritized it. That means all of the inventory I currently have may never be reliable, and I will have to either return it or write it off as a loss. :frowning: My applications need to retain state, so rebooting is not an option, and writing state to flash every 10 minutes would exceed the flash write-cycle limit in very short order.

1 Like

Is there something I missed here? I don’t think anybody knows how to fix this right now. When did “Spark HQ admit a solution is possible and relatively simple”?

@Dave said he would offer up a bug bounty to a member of the community who supplies a pull request that fixes this; that is all I see here.

1 Like

Hey Guys,

I’d appreciate it if we could keep the discourse a little less accusatory. We’re here to help: I’m a Spark employee, and I’m here trying to help. We’re committed to supporting the Core and the community, and we’re making every effort to make the Photon as compatible as possible.

This isn’t a solved issue yet; if it were, it’d be fixed already :slight_smile: The issue I’m seeing, based on this conversation, is that the current flow control when sending data over the CC3000 is insufficient for long-running / high-traffic use. I believe the 1.14 patch TI provided in the last few weeks exposes ACKs, which could be used to throttle requests to the device and avoid overwhelming it. It’s an embedded device, so we shouldn’t expect to be able to stream video or anything, but I’d like a stable transmit rate we can count on when developing, one that doesn’t cause the CC3000 to freeze up. I’d also add that I don’t think this is necessarily a simple fix; well, not simple for me, anyway. :slight_smile:
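
In the meantime, one application-level workaround to experiment with is breaking large writes into small chunks and pausing between them, so the CC3000’s buffers get a chance to drain. This is only a sketch of mine, not a verified fix; the chunk size and delay are guesses to tune:

#include <application.h>

// Sketch: pace outgoing writes so the CC3000 isn't overwhelmed.
// CHUNK and the 5 ms delay are guesses, not verified values.
void writeThrottled(TCPServer &srv, const uint8_t *buf, size_t len)
{
  const size_t CHUNK = 64;
  for (size_t i = 0; i < len; i += CHUNK) {
    size_t n = (len - i < CHUNK) ? (len - i) : CHUNK;
    srv.write(buf + i, n);   // documented write(buffer, size) overload
    delay(5);                // let the CC3000 drain its buffers
  }
}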

There is a big community around the cc3000, and our community is one of the biggest users, so I wouldn’t worry about not getting the support you need.

Thanks,
David

1 Like

As a side note, I don’t want to hijack this thread, but

@psb777 I think maybe you missed the release and documentation for the semi-automatic and manual modes that address your concerns here:

http://docs.spark.io/firmware/#advanced-system-modes
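
For example, here is a minimal sketch from my reading of those docs; the reconnect check in loop() is just illustrative:

#include <application.h>

// Semi-automatic mode: user code starts immediately and keeps running
// even if the Cloud connection drops.
SYSTEM_MODE(SEMI_AUTOMATIC);

void setup()
{
  Spark.connect();   // connect to the Cloud when *we* decide to
}

void loop()
{
  // application work continues regardless of Cloud state
  if (!Spark.connected()) {
    // optionally retry here, e.g. Spark.connect() on a timer
  }
}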

Thanks,
David

AMEN! :smile:

All I need is a stable platform for a few hundred bytes of data every so often. Currently I can't get 200 bytes of data every 5 seconds to stay stable for longer than 10-15 minutes. I don't think that is an unreasonable ask for the Spark.

1 Like

Agreed. I want to love the Spark Core, but on something that is such a basic feature of it, it’s failing terribly.

3 Likes

I moved 5 posts to a new topic: User code sometimes blocked by cc3000 in manual mode?

I feel the same way. Bringing some robustness to the TCP stack would be a wonderful thing; with the code opaquely locked inside the CC3000, our options are limited.

But try we must! I have some free time over the holiday period to take a look at the TCP stack, in particular the ACK/NAK application handling that the new TI driver patch offers, in the hope that we can fix these issues. Fingers crossed! :slight_smile:

3 Likes

@mdma Hey, I’m happy to pitch in too. Let me know how I can help. I have a JTAG shield and a Segger J-Link, and I’m not afraid to use them :smile:

OK, well, I have it, but I haven’t yet figured out how to use it.

Thanks for your offer of help! There’s a great tutorial by Elco on how to set up JTAG debugging. Familiarize yourself with debugging first, and then we can take it from there!

1 Like