Random EEPROM write/put failures - ARGON/XENON

argon
xenon
Tags: #<Tag:0x00007fe220526ca0> #<Tag:0x00007fe220526ae8>

#21

So (thinking aloud) perhaps this is where the issue lies (IMHO) - in single threaded mode, access is no issue as all access is happening “sequentially” in the same thread but when is SYSTEM_THREAD(ENABLED) - there is concurrent access to the LittleFS library and the issue arises - same problem for example, when using this mode and the LCD library and software timers - it causes corruption as the underlying drivers are not thread aware and can be ungraciously interrupted as DeviceOS assumes its in control. Even though SYSTEM_THREAD(ENABLED) works well in most cases, all built in libraries (like EEPROM, WIRE, SPI etc) would need to have “transactional” support refined going forward.


#22

@shanevanj, yup, thread safe behaviour for shared resources in the DeviceOS seems to not be consistent. Transnational devices (SPI, I2C, etc) also need to be protected, which seems to be lacking (eg the LCD example).


#23

Wild thought here - could we start a tutorial type exercise to convert the current wire library to a transactional based one - I for one would love to learn how to do something like this as my code skills are not much beyond pointers, never mind classes and transactional code etc. I could not contribute in terms of code but would be happy to destructively test …


#24

@shanevanj, I’m not sure why you think the existing Wire support in DeviceOS is not transactional. Remember that it is tied to the hardware and may be implemented in the Nordic SoftDevice stack. Furthermore, I2C is intrinsically transactional. Perhaps you can explain more about what you want to achieve?


#25

The SPI API does have a locking semantic.

https://docs.particle.io/reference/device-os/firmware/electron/#begintransaction-

The Wire I2C API does as well via Wire.lock/unlock but I can’t see it in the latest API documentation unless I’m missing something.


#26

I may have used the wrong layer here :-), and I may also have mangled some concepts - on reflection, if I take the issue I had with LCD corruption - I see now that the issue is going to be in the I2C LCD library and not the wire library - so this requirement of mine should change to say how do we make the I2C LCD library “transactional” i.e. if app thread is updating LCD and software timer (system thread) fires an lcd update - the two somehow play nice so you don’t end up with a corrupted display - an example would be a software timer updating the time of day in top left corner of LCD, while the app thread updates values from a sensor in the second line of the LCD. Currently with system_thread(enabled) this results in a corrupted display as the (in my understanding) LCD library is not thread aware (if thats even possible) nor is there a way to “take exclusive” ownership of the lcd. Perhaps it is as simple as implementing a few new functions like

  • bool lcd.available() - returns a boolean if not locked.
  • int lcd.lock() - returns a int (lockNumber) if completed, -1 if locked already
  • bool lcd.release(int lockNumber) - true completed, false lock number not found

#27

@shanevanj, I2C is transactional. The problem is when the software timer interrupts an ongoing transaction being done by the user application. The typical solution is to create a MUTEX which prevents the single resource, the I2C hardware, from being interrupted in the middle of a transaction until it is complete. Thus “with lock” and “release” you mention.

Another solution, a la @rickkas7, is to create a thread that handles all display transactions with a buffer whose elements are built in the user application and/or software timer. This way ONLY the thread transacts with the I2C device.


#28

Retested with DeviceOS1.4.0 - it happens far less frequently, but it still fails. If system thread disabled, it does not fail in 24hrs of iterations of test code below. Device is locked up and needs a hard reset.

SYSTEM_THREAD(ENABLED);

const int dbOffset = 50;
const int sizeOfAttributeName = 4;
const int largestAttributeValue = 20;
const int numOfAttributes = 150;
retained unsigned int iterationCounter = 0;

struct
{ // common struct for attributes stored in EEPROM
  int attributeNum;
  char attributeValue[largestAttributeValue + 1];
  time_t lastChanged;
} attributes;

// ------------------------------------------------------------------------------
void eepromCreateDataTable() //set NVram to defaults
// ------------------------------------------------------------------------------
{

  for (int recno = 0; recno < numOfAttributes; recno++)
  { //

    attributes.attributeNum = recno;
    attributes.lastChanged = Time.now();
    snprintf(attributes.attributeValue, largestAttributeValue, "%05d", recno);

    EEPROM.put((recno * sizeof(attributes)) + dbOffset, attributes);

    Serial.printf("\nRecord %d added", recno);
  }
}

// ------------------------------------------------------------------------------
void selectAll()
// ------------------------------------------------------------------------------
{
  Serial.print("\n#\tName\tUpdated\t\tValue");

  for (int recno = 0; recno < numOfAttributes; recno++)
  {

    EEPROM.get((recno * sizeof(attributes)) + dbOffset, attributes);

    Serial.printf("\n%d\t%d\t\t%lu\t%s", recno, attributes.attributeNum, attributes.lastChanged, attributes.attributeValue);
  }
}

// ------------------------------------------------------------------------------
void setup()
// ------------------------------------------------------------------------------
{
  Serial.begin();               // open serial over USB
  while (!Serial.isConnected()) // wait for Host to open serial port
    Particle.process();
}
// ------------------------------------------------------------------------------
void loop()
// ------------------------------------------------------------------------------
{
  Serial.print("\nStarting EEPROM test");
  Particle.publish("EEPROM_TEST", "Started", PRIVATE);
  eepromCreateDataTable();
  selectAll();

  Particle.publish("EEPROM_TEST", String::format("Iteration %u completed", iterationCounter++), PRIVATE);
  Serial.printf("\nIteration %u completed", iterationCounter++);

  delay(5000);
}


#29

This code fails every time

SYSTEM_THREAD(ENABLED);
STARTUP(System.enableFeature(FEATURE_RETAINED_MEMORY));

const int dbOffset = 0;
const int sizeOfAttributeName = 4;
const int largestAttributeValue = 20;
retained unsigned int iterationCounter = 0;
int maxRecords = 0;

struct
{ // common struct for attributes stored in EEPROM
  int attributeNum;
  char attributeValue[largestAttributeValue + 1];
  time_t lastChanged;
} attributes;

// ------------------------------------------------------------------------------
void eepromCreateDataTable()
// ------------------------------------------------------------------------------
{
  Serial.print("\nCreating data table");

  for (int recno = 0; recno < maxRecords; recno++)
  {
    attributes.attributeNum = recno;
    attributes.lastChanged = Time.now();
    snprintf(attributes.attributeValue, largestAttributeValue, "%05d", recno);

    EEPROM.put((recno * sizeof(attributes)) + dbOffset, attributes);

    //Check what we wrote
    // EEPROM.get((recno * sizeof(attributes)) + dbOffset, attributes);
    // if (atoi(attributes.attributeValue) != recno)
    // {
    //   Serial.print("\n#\tName\tUpdated\t\tValue  - # Name and Value should all be the same face value");
    //   Serial.printf("\n%d\t%d\t\t%lu\t%s", recno, attributes.attributeNum, attributes.lastChanged, attributes.attributeValue);
    //   Serial.printf(
    //       "- Error record %d value is %d",
    //       recno,
    //       attributes.attributeValue);
    // }
  }
}

// ------------------------------------------------------------------------------
void selectAll()
// ------------------------------------------------------------------------------
{
  Serial.print("\nVerifying data table");

  for (int recno = 0; recno < maxRecords; recno++)
  {

    EEPROM.get((recno * sizeof(attributes)) + dbOffset, attributes);

    if (atoi(attributes.attributeValue) != recno)
    {
      Serial.print("\n#\tName\tUpdated\t\tValue  - # Name and Value should all be the same face value");
      Serial.printf("\n%d\t%d\t\t%lu\t%s", recno, attributes.attributeNum, attributes.lastChanged, attributes.attributeValue);
      Serial.printf(
          "- Error record %d value is %d",
          recno,
          attributes.attributeValue);
    }
  }
}

// ------------------------------------------------------------------------------
void setup()
// ------------------------------------------------------------------------------
{
  Serial.begin();
  while (!Serial.isConnected())
    Particle.process();

  maxRecords = (EEPROM.length() - 1) / sizeof(attributes);
}
// ------------------------------------------------------------------------------
void loop()
// ------------------------------------------------------------------------------
{
  Serial.printf(
      "\nStarting EEPROM test - EEPROM size - %d",
      EEPROM.length());
  Serial.printf(
      "\nSpace for %d records of test struct of %d bytes",
      maxRecords,
      sizeof(attributes));
  Serial.print("\nNo progres shown unless error is detected");

  Particle.publish("EEPROM_TEST", "Started", PRIVATE);

  eepromCreateDataTable();
  selectAll();

  Particle.publish("EEPROM_TEST", String::format("Iteration %u completed", iterationCounter), PRIVATE);
  Serial.printf("\nIteration %u completed", iterationCounter++);

  delay(5000);
}


#30

In what way does it fail exactly?
Do your serial prints provide any hints about how far it works and where it breaks?
Can you add more to get a clearer picture?
With SYSTEM_THREAD(ENABLED) you must ensure Particle.connected() == true before trying to Particle.publish().

BTW, it doesn’t matter for your code as is, but for your maxRecords calculation you’d also need to deduct dbOffset for the case dbOffset may not be zero anymore.


#31

So this is interesting … sometimes it just stops in the Creating table function.

If I uncomment the //Check what we write section, then it takes much, much longer to lock up, which is why o wanted to use the retained iterationCounter - but that is not behaving as I expect see:

I have left the Argon off for 4 hours now and just connected it and so far has done 150 iterations all ok … so I will let it run overnight and see what I get?


#32

So there is definitely an anomaly that randomly happens

Test code:

SYSTEM_THREAD(ENABLED);

const int dbOffset = 0;
const int sizeOfAttributeName = 4;
const int largestAttributeValue = 20;
int maxRecords = 0;

struct
{ // common struct for attributes stored in EEPROM
  int attributeNum;
  char attributeValue[largestAttributeValue + 1];
  time_t lastChanged;
} attributes;

// ------------------------------------------------------------------------------
void eepromCreateDataTable()
// ------------------------------------------------------------------------------
{

  Serial.println("\nCreating data table");

  for (int recno = 0; recno < maxRecords; recno++)
  {
    attributes.attributeNum = recno;
    attributes.lastChanged = Time.now();
    snprintf(attributes.attributeValue, largestAttributeValue, "%05d", recno);

    EEPROM.put((recno * sizeof(attributes)) + dbOffset, attributes);
    printDots();

    // Check what we wrote
    EEPROM.get((recno * sizeof(attributes)) + dbOffset, attributes);
    if (atoi(attributes.attributeValue) != recno)
    {

      Serial.print("\n#\tName\tUpdated\t\tValue  - # Name and Value should all be the same face value");
      Serial.printf("\n%d\t%d\t\t%lu\t%s", recno, attributes.attributeNum, attributes.lastChanged, attributes.attributeValue);
      Serial.printf(
          "- Error record %d value is %d",
          recno,
          attributes.attributeValue);
    }
  }
}

// ------------------------------------------------------------------------------
void selectAll()
// ------------------------------------------------------------------------------
{
  Serial.println("\nVerifying data table");

  for (int recno = 0; recno < maxRecords; recno++)
  {

    EEPROM.get((recno * sizeof(attributes)) + dbOffset, attributes);
    printDots();

    if (atoi(attributes.attributeValue) != recno)
    {
      Serial.print("\n#\tName\tUpdated\t\tValue  - # Name and Value should all be the same face value");
      Serial.printf("\n%d\t%d\t\t%lu\t%s", recno, attributes.attributeNum, attributes.lastChanged, attributes.attributeValue);
      Serial.printf(
          "- Error record %d value is %d",
          recno,
          attributes.attributeValue);
      while (1)
      {
        Particle.process();
      }
    }
  }
}

// ------------------------------------------------------------------------------
void printDots()
// ------------------------------------------------------------------------------
{
  const int maxDotWidth = 40;
  static int dotWidth = 0;

  if (dotWidth++ < maxDotWidth)
  {
    Serial.print(".");
  }
  else
  {
    Serial.println();
    dotWidth = 0;
  }
}

// ------------------------------------------------------------------------------
void setup()
// ------------------------------------------------------------------------------
{
  Serial.begin();
  while (!Serial.isConnected())
    Particle.process();

  maxRecords = (EEPROM.length() - 1 - dbOffset) / sizeof(attributes);
}
// ------------------------------------------------------------------------------
void loop()
// ------------------------------------------------------------------------------
{
  static int iterationCount = 0;

  Serial.printlnf(
      "\nStarting EEPROM test - EEPROM size - %d \
      \nSpace for %d records of test struct of %d bytes",
      EEPROM.length(),
      maxRecords,
      sizeof(attributes));

  eepromCreateDataTable();
  selectAll();

  Serial.printf("\nIteration %u completed", iterationCount++);

  delay(5000);
}

Sometimes only a manual DFU mode will bring it back.

If the SYSTEM_THREAD(ENABLED); is commented out - it runs perfectly okay.


#34

Update:

SYSTEM_THREAD is NOT enabled - 2150 iteration completed with no issue

With SYSTEM_THREAD(ENABLED) fails within first iteration