Possible problem with 2 MB Flash on Core

I have been using the flash-eeprom library for some time now to emulate about 1.5KBytes of eeprom. I have two cores (an old one and a recent one) and my colleague has an old core as well. The flash-eeprom was working well on my old core for a while, but then it started to die. Not the core – it works fine. It’s just that the data written to flash via flash-eeprom would not come back reliably after power cycling and the problem got worse with time and now flash doesn’t store at all. OK, a hardware problem. So now I’m trying this on the new Core and only part of my data stores correctly in non-volatile memory. The same firmware seems to still work fine on my colleague’s older core.

Is there some production hardware problem with the 2MB external flash that flash eeprom uses? Has the flash eeprom library been changed recently (or the underlying libraries - perhaps flash storage timing)? Is there a simple, guaranteed-to-work test program that tests emulated eeprom through flash eeprom that I can use to see if there is a hardware problem with my cores or not? Is there a known hardware problem here and perhaps I should just wait for my photon to arrive, since the photon does not have this external flash chip?

Can you say which library you are using? Did you mean Flashee?

How often are you writing memory?

Can you say what failures you are seeing? Flash erases to an all-1’s (0xff) condition, so typically you see some zero bits after an erase cycle when it is failing. Some sectors will eventually need to be marked as “bad” in a typical implementation.
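To make the erase behaviour described above concrete, here is a minimal sketch of a flash sector held in RAM. The `SimSector` type and `countNonErased` helper are hypothetical names invented for this illustration; they are not part of Flashee or the Core firmware. The sketch assumes the usual flash rule that programming can only clear bits and that only an erase sets them back to 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-RAM model of one flash sector (not the Flashee API).
struct SimSector {
    std::vector<std::uint8_t> cells;

    explicit SimSector(std::size_t size = 4096) : cells(size, 0xFF) {}

    // Erase sets every bit in the sector back to 1.
    void erase() { cells.assign(cells.size(), 0xFF); }

    // Programming can only clear bits (1 -> 0), so the stored value is
    // the bitwise AND of the old contents and the new data.
    void program(std::size_t addr, std::uint8_t value) {
        assert(addr < cells.size());
        cells[addr] &= value;
    }

    std::uint8_t read(std::size_t addr) const {
        assert(addr < cells.size());
        return cells[addr];
    }
};

// Counts bytes that are not 0xFF; for a freshly erased, healthy sector this is 0.
std::size_t countNonErased(const SimSector& s) {
    std::size_t n = 0;
    for (std::uint8_t b : s.cells) {
        if (b != 0xFF) ++n;
    }
    return n;
}
```

In this model, programming 0x0F and then 0xF0 to the same address without an erase in between leaves 0x00 stored, which is why a sector has to be erased before it can take new data.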

The built-in EEPROM emulator with wear-leveling emulates 100 bytes using a much larger amount of flash, for instance. The @mdma library Flashee is similar in that a relatively large number of pages are typically used to emulate a smaller amount of storage.

I am using the eeprom library (McGowen) and it tested fine on my original core. The memory has since degraded, sometimes writing correctly and sometimes not writing at all. That is to say, locations in the memory appear to have been going bad (meaning won’t accept a new write). Now, it will not write correctly at all.

I was testing extensively while this degradation was happening. I might have performed a few hundred writes, perhaps even > one thousand. But the testing was manual, so there is no way that I could have performed 100K + writes. And anyhow, this is supposed to be flash memory which does not wear like real eeprom does.

My real concern is not about development. It is that I have a brand new core and, right out of the box, it will not write to the emulated eeprom (at least not to the first 4K page, which is what I am using). I suspect hardware problems, but perhaps the eeprom library has been changed and a bug introduced?

I am using the following features of the eeprom library:

flash = Devices::createAddressErase(4096); // in setup, one page of 4k bytes, using about 1.5K bytes

flash->writeString(buf, offset); //buf is pointer to a string buffer, offset is bytes in eeprom page

flash->read(buf, loc, size); //buf is pointer to a string buffer, loc is eeprom memory location, size is number of bytes to read from loc

As I mentioned in my original post, my code was working perfectly fine on the old core, but the old core has degraded. The new core does not work at all with the same code.

Is there a hardware problem on cores? Is there a better way to use the library?

Hi @BobG

Flash memory is divided into sectors, and a sector is the minimum erasable unit. On the Spark Core’s external flash chip the sector size is 4K. That means every operation you have been doing has gone to the same sector, and it seems very likely that it is now worn out.

You need to give the flashee library many sectors in order for it to do its job of wear-leveling. Without extra room there is nothing it can do but use the one sector over and over again. With extra room it can translate the address you use for your data to different physical sectors on the flash chip.
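As a rough illustration of why extra sectors help, the sketch below models one logical page backed by a pool of physical sectors. The `WearLevelledPage` class is hypothetical and is not how Flashee is implemented; it only assumes that each rewrite costs one erase and that the least-worn sector is chosen each time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: one logical page backed by a pool of physical sectors.
// This is NOT Flashee's algorithm; it only illustrates the idea.
class WearLevelledPage {
public:
    explicit WearLevelledPage(std::size_t poolSize)
        : eraseCounts_(poolSize, 0), current_(0) {
        assert(poolSize > 0);
    }

    // Each rewrite moves the logical page to the least-worn sector,
    // which costs one erase of that sector.
    void rewrite() {
        auto least = std::min_element(eraseCounts_.begin(), eraseCounts_.end());
        current_ = static_cast<std::size_t>(least - eraseCounts_.begin());
        ++(*least);
    }

    // Physical sector that currently holds the logical page.
    std::size_t currentSector() const { return current_; }

    // Worst-case wear across the pool.
    unsigned maxEraseCount() const {
        return *std::max_element(eraseCounts_.begin(), eraseCounts_.end());
    }

private:
    std::vector<unsigned> eraseCounts_;
    std::size_t current_;
};
```

With a pool of one sector, every rewrite lands on the same sector. With a pool of 64, the same number of rewrites is spread out, so in this model the most-worn sector sees 1/64 of the erases.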

The good news is that all the sectors are independent and you can start over avoiding the one worn out sector and this time give it a lot more room to maneuver, like say 64 or 128 sectors.

How do I do this with the eeprom library? Also, this does not explain why a brand new Core should not store to this sector, even once. Actually, as a point of clarification, some of the data that I tried to store in this sector got stored, but the rest of the data did not. The data that got stored successfully was in approximately the first 1/2 of the approximately 1.5 KB total that I am trying to save in non-volatile store. In other words, not even one store operation was successful in this part of this sector.

In any event, I should try wear leveling, so please let me know how to do this.

We should ask @mdma to chime in here too–I don’t understand the failure in the new core either. Did you erase before writing?

Please see this thread:

You need to create a large pool of sectors for the wear-leveled device.

I am not sure how you will be able to avoid the bad sector or if the code will map it out of the pool. Matt likely knows how his code handles this.

Flash RAM erases to all ones (0xff) so when it fails you usually see some zero bits after an erase, but you can also have certain bits that you cannot write to zero.

@bko: Thanks for the tips so far. Re. eeprom library: How do you erase before writing? I do not see any methods for this; it appears to be internal to the write() and writeString() methods. I’m only using writeString(). Also, is there a better API description than the “elevator pitch” on GitHub? There seems to be a lot of detail that I am missing.

Hi @BobG

I learned the most about @mdma's library by reading the unit tests. They describe how to use it in the various modes by testing it. There is a lot of code there and I have certainly not read it all.

You are right, I don’t see the erase option in this library–sorry my bad.

At the end of that thread referenced above the author says to use:

FlashDevice* flash;

void setup() {
   flash = Devices::createAddressErase();
}

Since it allows for the highest number of erase cycles and gives the longest life.

@bko: thanks again. I will change the constructor as you have indicated. Absent anyone else weighing in, I will assume that my second Core simply has bad hardware. The Photon will change all of this anyway. @mdma has indicated that he might update the eeprom library to support Photon as well as Core, and I certainly hope that he does so. Otherwise, the Spark folks say that Photon will natively support 4K (or was it 8K, can’t recall) of emulated eeprom, vs the meager 100 bytes on the Core.

Hi @BobG You don’t need to use erase - the raison d’etre of the library is that it emulates eeprom so you don’t have to worry about managing sectors and erases.

If you do want to manage the erases and writes yourself (losing benefits of wear levelling), you can directly access the user flash via Devices::userFlash(). Then you will have to erase each 4096 byte sector before writing to it.
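For readers wondering what managing erases by hand involves, here is a minimal sketch against a RAM stand-in for raw flash. `RawFlashSim` and `updateInSector` are hypothetical names for this illustration and are not the `Devices::userFlash()` API; the only facts taken from the thread are the 4096-byte sector size and the need to erase a sector before writing to it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

const std::size_t kSectorSize = 4096;  // sector size quoted in the thread

// Hypothetical RAM stand-in for raw flash; not the Devices::userFlash() API.
struct RawFlashSim {
    std::vector<std::uint8_t> cells;
    std::vector<unsigned> eraseCounts;

    explicit RawFlashSim(std::size_t sectors)
        : cells(sectors * kSectorSize, 0xFF), eraseCounts(sectors, 0) {}

    void eraseSector(std::size_t sector) {
        assert(sector < eraseCounts.size());
        std::memset(&cells[sector * kSectorSize], 0xFF, kSectorSize);
        ++eraseCounts[sector];
    }

    // Programming only clears bits, as on real flash.
    void program(std::size_t addr, const std::uint8_t* data, std::size_t len) {
        assert(addr + len <= cells.size());
        for (std::size_t i = 0; i < len; ++i) cells[addr + i] &= data[i];
    }
};

// Update `len` bytes inside a single sector: copy the sector to RAM,
// patch the copy, erase the sector, then program the whole copy back.
void updateInSector(RawFlashSim& flash, std::size_t addr,
                    const std::uint8_t* data, std::size_t len) {
    const std::size_t sector = addr / kSectorSize;
    const std::size_t offset = addr % kSectorSize;
    assert(sector < flash.eraseCounts.size());
    assert(offset + len <= kSectorSize);  // this sketch does not span sectors

    std::vector<std::uint8_t> copy(
        flash.cells.begin() + sector * kSectorSize,
        flash.cells.begin() + (sector + 1) * kSectorSize);
    std::memcpy(&copy[offset], data, len);
    flash.eraseSector(sector);
    flash.program(sector * kSectorSize, copy.data(), kSectorSize);
}
```

In this naive scheme every small update costs a full erase of the sector it lands in, so all the wear concentrates on that one sector. A real library may well avoid some of those erases; the sketch makes no claim about how Flashee does it.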

The photon has only internal flash and there is relatively little of it. (1MB, into which we have to fit 3 copies of firmware - and with basic tinker compiling to 380K that’s quite tricky!) We will certainly be able to support more built-in emulated eeprom than on the core - there is room for 8K, but we have to see how the current emulation algorithm performs, since it also requires as much ram as the eeprom that is emulated.

I will add support for the emulated eeprom in flashee so that it will be simple to migrate code from one to the other - only the line of code that creates the flash device will need changing - the rest will work as is. :smile: Yay for abstractions!

@mdma: 8K on the Photon is more than enough for my needs. I just need to store system configuration information from the user – about 1.5KB worth. This is where 8K works and 100 B does not! The alternative is external eeprom – more hardware and manufacturing cost. Thanks for the library and all of the hard work behind it.

Bad sector detection is a great idea @bko, I’ll add an issue for that.

@BobG - to avoid using the first bad sector you can create the flash device like this:

page_size_t pageSize = Devices::userFlash().pageSize();
FlashDevice* flash = Devices::createAddressErase(pageSize*1, pageSize*256);

The pageSize*1 tells flashee to start from the 2nd page (pages counted from 0.)
We allocate 256 pages so that there is plenty of room for wear levelling.

If you later need more flash you can simply create another region in the remaining space:

FlashDevice* flash2 = Devices::createAddressErase(pageSize*256, pageSize*384);

If you think a sector is bad, you can test it. I’d like this to be part of the library, but for now you can try the code here - https://github.com/m-mcgowan/spark-flashee-eeprom/issues/15
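The code in the linked issue is not reproduced here. As a rough idea of what such a test can look like, the sketch below runs an erase-and-verify pass followed by a program-and-verify pass on a simulated sector. `TestableSector`, `SectorHealth` and `checkSector` are hypothetical names, and the stuck-bit model is an assumption made purely so the failure cases can be exercised. On real hardware a test like this destroys whatever the sector held.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sector model with optional stuck bits, for illustration only.
struct TestableSector {
    std::vector<std::uint8_t> cells;
    std::uint8_t stuckLow;   // bits that always read 0 (fail to erase)
    std::uint8_t stuckHigh;  // bits that always read 1 (fail to program)

    TestableSector(std::size_t size, std::uint8_t low, std::uint8_t high)
        : cells(size, 0xFF), stuckLow(low), stuckHigh(high) {}

    void erase() { cells.assign(cells.size(), 0xFF); }

    void program(std::size_t addr, std::uint8_t v) {
        assert(addr < cells.size());
        cells[addr] &= v;
    }

    std::uint8_t read(std::size_t addr) const {
        assert(addr < cells.size());
        return static_cast<std::uint8_t>((cells[addr] & ~stuckLow) | stuckHigh);
    }
};

enum class SectorHealth { Good, EraseFailed, ProgramFailed };

// Erase and expect all 0xFF, then program all zeros and expect all 0x00.
// Destroys the sector's contents.
SectorHealth checkSector(TestableSector& s) {
    s.erase();
    for (std::size_t i = 0; i < s.cells.size(); ++i) {
        if (s.read(i) != 0xFF) return SectorHealth::EraseFailed;
    }
    for (std::size_t i = 0; i < s.cells.size(); ++i) {
        s.program(i, 0x00);
    }
    for (std::size_t i = 0; i < s.cells.size(); ++i) {
        if (s.read(i) != 0x00) return SectorHealth::ProgramFailed;
    }
    return SectorHealth::Good;
}
```

The two failure results correspond to the two symptoms mentioned earlier in the thread: zero bits that remain after an erase, and bits that cannot be written to zero.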

@mdma: thank you so much. I tried this with my test program on the old Core (worn out flash) and it worked great! I tried the same code with my new core that wrote only once (and then only partially) to the flash, and it did not work. This validates my belief that the new Core has bad external flash memory; not a software problem.

One question: when you execute flash->writeString(stringArg) with a string argument say 60 bytes long, how many erases and how many writes are actually performed on the flash? Although you have helped me confirm that I wore out the first page on my old Core, I cannot possibly have executed this command more than several hundred times in total. Flash is supposed to be good for over a million erase/write cycles and I am wondering why it wore out so easily, even without wear leveling. Of course, I will use full wear leveling from now onward.

@mdma: I could use a little more help, please. As I indicated above, the fix that you gave works fine on my test code. However, when I put it into my full code (the same code as far as emulated eeprom is concerned), flashing the code fails. Specifically, I get the correct purple flashing, then green flashing, but then it goes into red flashing (bad sign) and cycles between cyan and red flashing thereafter.

I reduced the amount of flash used by the library to one page (like the original code, which works):
FlashDevice* flash = Devices::createAddressErase(pageSize*1, pageSize*2);
I still get the same result. When I replace this with the original code:
FlashDevice* flash = Devices::createAddressErase(4096);
Then the code flashes fine and everything works OK (except for the worn-out flash page, which does not store anymore).

My full firmware is large and it takes up a lot of the RAM, as well as a lot of the code memory. I am not sure how the eeprom constructor works – how it allocates either code flash or RAM or both. What is the difference between these two constructors that might account for crashing my core?

When the core crashes (flashing red), it really crashes! I have to perform a factory reset to get it back – flashing blue – and then setting it up from scratch.

I very much appreciate your help here. Something is limiting me to using the first page of external flash only, but I don’t know what it is.