Cloud Functions Interrupt SD card operations

Hello,

I am working with bsom boards and micro SD cards using the SdFat library. I have found that the SdFat object will occasionally die as a result of cloud function calls. This can even happen on startup, while the device is connecting to the Particle cloud. It is a fairly rare event: the cloud interrupt has to occur just as an SD operation (typically a write) is in progress. Once that happens, all subsequent SD operations fail.

Because it is rare, I have prepared a Particle project for a bsom running 3.30. This firmware writes about 2000 bytes to a file over and over again, printing “Failed” when a write fails. Triggering the failure requires calling the Particle function “radek-test” multiple times. I have triggered it through the Particle console by clicking the cloud function, but it is easier to use a Python script that calls the function in a loop. In this code, I recover by calling SD.begin() after a failure. Again, this code is set up as a proof of concept of my error, because it is fairly hard to trigger, but I have also triggered it in my actual project code.

I have found that it is possible to recover from these events by calling SD.begin(), but I am not sure what a good way is to tell whether the SD card object has died. Is there a way to check the health of the SD object, or should I do something like checking whether I can open some file and use that as my reference?

Alternatively, is there a way to control when we handle the responses to cloud functions in the firmware? This would let me control the timing of the response so that it does not occur when the SD card is being used.
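To illustrate what I mean by controlling the timing, here is a rough sketch in plain C++. All names (`pendingCalls`, `onCloudCall`, `servicePendingCalls`) are hypothetical, and the handler signature is simplified (a real Particle function handler takes a String argument). The handler only records that a call arrived, and the main loop does the actual work at a point where no SD operation is in flight. I do not know whether this would avoid the failure, since it only defers the handler's own work and not whatever the system does for the cloud connection itself.

```cpp
#include <atomic>

// Hypothetical counter of cloud calls that have arrived but not been handled.
std::atomic<int> pendingCalls{0};

// Stand-in for the cloud function handler: it only records the request
// and returns immediately, so it never touches the SD card itself.
int onCloudCall() {
    pendingCalls.fetch_add(1);
    return 0;
}

// Called from the main loop when no SD operation is in progress.
// Runs `work` once per recorded call and returns how many calls were handled.
template <typename Work>
int servicePendingCalls(Work work) {
    int n = pendingCalls.exchange(0);  // take all pending calls atomically
    for (int i = 0; i < n; ++i) {
        work();
    }
    return n;
}
```

In the main loop this would sit after the SD write, for example finishing the write first and then calling servicePendingCalls with whatever the cloud function is supposed to do.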

I am now using SD.printSdError() to get the error information. I am getting the following error message: SD error: SD_CARD_ERROR_CMD18 = 0xC,0xFF, which corresponds to a “Read multiple blocks” error. Does that mean I am giving too many commands to the SD card in rapid succession in this example code?

I am now thinking that if this is the case, then it might also be the same error I am seeing in my real code, since I am using the SD card across 3 threads (I have locking implemented, but I am not rate limiting the SD calls). Is there a max frequency that I can call the SD card functions? I did some testing using 25ms delays, but the error seems to still occur, so maybe it is something else?

I updated the test code to replicate and print the error more easily. Now I am getting the error message SD error: SD_CARD_ERROR_WRITE_DATA = 0x20,0x0, which makes more sense, but I am not sure what the cause is.

I have played with adjusting the SPI speed in SPI.begin() and found that, with the speed reduced to 4MHz, I cannot trigger the issue. I am not sure whether the solution here is to catch the error and recover from it, or to avoid it entirely by lowering the speed.

@Radek this is very interesting. Looks like you have done a lot of debug… no doubt very frustrating.

Q1. What version of SD_FAT are you using?

I am wondering whether the following comments in my init code have any relevance to your workaround?

        // Use SPI_HALF_SPEED on bread boards to avoid bus errors
        // Use SPI_FULL_SPEED for more performance.
        if (!sd.begin(nSS, SPI_FULL_SPEED))
        {
            Log.error("sd.bgn"); // FAIL of some sort
        }

Q2. What is the physical connection with the SD CARD, are you using a breadboard or is it on a PCB?

I think I have an issue similar to yours: at times the SD card cannot be accessed until after a reboot.

I am using v2.0.7, specifically git commit caece65d13bad93272b77b9c862175dd4e08b6f9.
I have also tried the latest version of SdFat, which didn’t really have an impact. We also #define ENABLE_SPI_TRANSACTIONS 1, but idk if this is relevant. We are using a PCB.

The comments are exactly how I varied the speed, which helped reduce the probability of error but did not eliminate it. At this point, I don't think it is possible to eliminate such errors, so we are opting to recover by checking sd.errorCode() and calling sd.begin() again.
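For anyone hitting the same thing, here is a minimal sketch of that recovery pattern. It is not our actual project code: the helper name `runWithSdRecovery` and its three callbacks are hypothetical, and it is written against plain callables rather than SdFat directly so that it does not assume any particular library signature.

```cpp
// Hypothetical helper, not part of SdFat or Device OS.
//   op           : performs one SD operation, returns true on success
//   cardHasError : returns true if the card driver reports an error
//   reinit       : re-initializes the card, returns true on success
template <typename Op, typename HasError, typename Reinit>
bool runWithSdRecovery(Op op, HasError cardHasError, Reinit reinit,
                       int maxAttempts = 3) {
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        if (op()) {
            return true;   // operation succeeded
        }
        if (!cardHasError()) {
            return false;  // not a card error, so re-initializing will not help
        }
        if (!reinit()) {
            return false;  // card could not be brought back
        }
        // Card was re-initialized; loop around and retry the operation.
    }
    return false;          // still failing after maxAttempts tries
}
```

In firmware, op would wrap the file write, cardHasError would check whether sd.errorCode() is non-zero, and reinit would call sd.begin() with the same arguments used at startup. Capping the number of attempts keeps a genuinely dead card from stalling the loop.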

I like your solution:

Makes sense.