How to best handle global Strings / strings / text data?

I'm trying to understand some best practices for global Strings and am not entirely sure of the best way to handle them.

(1) For example to cast and concatenate global text variables as Strings:

String(var1) + ";" + String(var2) 

(2) Or to have persistent global Strings outside of function calls:

String someStatus = "Change as needed..";

In (1), the two global vars (typecast as String) and one constant string are concatenated into a single String, locally within a function, which seems OK. But for (2), Particle > Code Size Tips says to beware of fragmentation with global vars, namely Strings, right? So how should these be handled?

Some notes: uppercase String is from the Spark-Wiring library, right? Versus the lowercase string from std:: (the C++ standard library), right? Perhaps both can be inputs to Particle’s library that has String(), .c_str(), etc. methods, no?

So how should text data persist BEYOND a function call, with static local or global vars (ex, in a struct) right? Should you equate them to “ ” (empty) before each assignment? Or perhaps once per hour? But not sure that solves fragmentation as they’d never be deallocated within the app life / run time (or between resets)…

Would it make sense to forgo the strings/Strings entirely and adopt const char * for globals? Does that solve fragmentation because the pointer is a constant which implies a place in heap memory that never changes? And then use the Particle String methods on these now ‘less functional’ char arrays?

OR considering (const char*) type casting is used on strings for Log.info (OS examples), can you (const char*) type cast String globals? But again these global vars could still get fragmented no?

Side note, thought this was a cool thread on determining fragmentation.

Has to be an easy answer here…

Also some interesting links,

Is it possible to completely avoid heap fragmentation?
Are global variables in C++ stored on the stack, heap or neither of them?

Hi Eric,

I think the best practice here is to avoid Strings and use c strings, as was discussed in this post, among others:

Cheers
Gustavo.

One important consideration is with this:

String someStatus = "Change as needed..";

If the string is always set to a literal string (not generated), then you’re better off with:

const char *someStatus = "Status 1";

Then when you change it:

someStatus = "Status 2";

The reason is that string literals are only stored in flash, so the RAM usage is 4 bytes for the pointer. When you store it in a String, there’s some overhead plus a copy of the whole string in RAM, plus the space is still used in flash for the string literal that is copied into RAM.

Of course if you’re generating strings then they will need to be stored in RAM, so there’s less of a downside there.
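
To make the literal-only case concrete, here is a minimal sketch. It assumes plain standard C++ so it can also be tried off-device; setConnected() and statusIs() are hypothetical helper names, not part of Device OS.

```cpp
#include <cstring>

// Pointer to a string literal: the text itself sits in read-only storage
// (flash on the device), only the pointer occupies RAM.
const char *someStatus = "Status 1";

// Hypothetical helper: changing the status just re-points the pointer.
// Nothing is copied and the heap is never touched.
void setConnected(bool connected) {
    someStatus = connected ? "Status 2" : "Status 1";
}

// Hypothetical helper: compare the text, not the pointer value.
bool statusIs(const char *expected) {
    return std::strcmp(someStatus, expected) == 0;
}
```

Note that const here protects the characters, not the pointer: assigning another literal is fine, writing into the text through the pointer is not.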

Wow, thank you Gustavo, Rick-K! Tons of info and still digesting, I didn’t see your ‘best practices’ thread so thank you.

So far I gather,
char arr[] = "Global, mutable strings from now on";
is preferred per PKinOttawa in that thread. And basically stored in RAM per Rick-K above.

But wondering if ‘Memory management when modifying strings is programmer’s responsibility’ since it’s a C-style string (and seems like C++ strings aren’t a practice here; either C-style or String class). And if so, no action ultimately needed as char arr[] can be reassigned but never really fragments or uses memory beyond the longest string assigned to it.

And then what to make of const char* as found in Particle OS docs, used for typecasting right? Besides making immutable strings (program crashes if you try to reassign?) as Rick-K also showed. In general hoping to avoid complexities of pointers.

Forgive my ignorance, to highlight an earlier side note, why not use:
std::string str = "a handy string";
(C++ style string without the overhead / fragmentation of String class)?

Or similarly in the .cpp file:
using namespace std;
string str = "a handy string";

And same line of thought, why not C++ vectors? I mean whatever is best I’m all for it, just trying to see what that is.

This is a dangerous misconception!
There is nothing stopping you from breaking the boundaries of a string but you as a programmer.

std::string still uses the heap and will fragment it when mutated.

The problem with using dynamic memory (including vectors, lists, dictionaries, ...) on this platform is that there is no garbage collector that would collect freed space and defragment the heap as there is on fully fledged computer systems.
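
For generated text such as the "var1;var2" concatenation from the first post, one heap-free option is to format into a fixed-size global buffer. This is only a sketch in standard C++: statusLine, its size of 32, and buildStatus() are assumptions made up for illustration.

```cpp
#include <cstddef>
#include <cstdio>

// Fixed-size global buffer: reserved once at link time and never resized,
// so updating it cannot fragment the heap.
char statusLine[32] = "";

// Hypothetical helper: write "var1;var2" into the buffer.
// snprintf() never writes more than sizeof(statusLine) bytes and always
// terminates the result. Returns false if the text had to be truncated.
bool buildStatus(int var1, int var2) {
    int n = std::snprintf(statusLine, sizeof(statusLine), "%d;%d", var1, var2);
    return n >= 0 && static_cast<std::size_t>(n) < sizeof(statusLine);
}
```

The trade-off is that the buffer size has to be picked up front for the longest text you expect.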

Thank you ScruffR.

So then to properly manipulate global char arr[] = "A string"; would you prefer the following?

char arr[] = "A string";

void loop() {
   if(someCondition) {
      String temp = "A new string";               // Local String on 'stack' only
      temp.toCharArray(arr, sizeof(arr));         // Play nicely w/ C-strings
   }
}

Guessing this resizes, puts the \0, and otherwise keeps things under control for C-strings. And similarly could replace sizeof(arr) with 64 if say arr[64]. Basically .toCharArray() is OS’s version of strncpy(), and output can feed into Log.info, etc. as arguments right?

Nope, it's a method of the String class and hence requires an intermediate instance of the string to copy from.

I'd rather use

strncpy(arr, "A new string", sizeof(arr)-1); // assuming the terminating zero is still in place
// otherwise make sure to terminate
arr[sizeof(arr)-1] = '\0';

This wouldn't resize arr[] but truncate the string to A new st.
When strictly sticking with C string functions like strncpy() I know exactly what it does and don't have to assume/investigate whether String::toCharArray() would terminate a truncated string or not. strncpy() does not terminate when it truncates a string.
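
A small standalone sketch of that truncation behaviour, in standard C++ so it can be run on a desktop compiler; setArr() is a hypothetical wrapper name.

```cpp
#include <cstring>

// "A string" is 8 characters plus the terminating zero: 9 bytes, fixed.
char arr[] = "A string";

// Hypothetical helper: copy at most sizeof(arr)-1 characters into arr
// and always rewrite the terminator in the last byte.
void setArr(const char *text) {
    std::strncpy(arr, text, sizeof(arr) - 1);  // no terminator written if text is too long
    arr[sizeof(arr) - 1] = '\0';               // so put it back unconditionally
}
```

Calling setArr("A new string") leaves "A new st" in arr, since only 8 characters fit in front of the terminator.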

BTW, toCharArray() uses this function


void String::getBytes(unsigned char *buf, unsigned int bufsize, unsigned int index) const
{
	if (!bufsize || !buf) return;
	if (index >= len) {
		buf[0] = 0;
		return;
	}
	unsigned int n = bufsize - 1;
	if (n > len - index) n = len - index;
	strncpy((char *)buf, buffer + index, n);
	buf[n] = 0;
}

hence using strncpy() directly would impose less overhead and avoid using the heap for the intermediate copy.

So enlightening!

I’ve never been so happy being so wrong. :joy: Coz you don’t learn if the answer is always right. Cheers on the inner workings and takeaways.

Few notes,

Seems strlcpy() is not a standard function, although recognized by Particle Workbench, and is preferred due to ‘safety’ (does NOT overrun the destination string, and includes a terminating ‘\0’).

And think you need strlen()+1 here to capture the whole source string (at least that’s how it works in a non-Workbench cpp app in VSCode so I can see the result in cout):

    strlcpy(arrDest, arrSource, strlen(arrSource)+1);

That must be a typo, it should be strncpy().
Where did you see strlcpy()?

And no, you should not use strlen(source)+1 but sizeof(destination).
Your limiting factor is the destination. You cannot resize the array to accommodate a longer string.

Hmm, Geeks for Geeks had something and StackOverflow mentioned too. Actually seems strncpy() may not be the safest either…

Huh, doesn’t sizeof(dest) potentially miss some of the source, or is that the point, to avoid overrun? And is the ‘\0’ included by the way?

That is exactly the point. An array once created is fixed in size.
Just like a bucket won't grow just because you tell it that you intend to fill extra 5 liters into it :wink:

With sizeof(dest)-1 you leave space for the zero terminator.
When you already had a zero terminator in there from the initialization, the -1 will ensure that it is not overwritten by any subsequent updates to that array (when only using "safe" instructions).
But if you want to be doubly sure you can always ensure the termination by (unconditionally) rewriting the terminator.

That's why I suggested this code earlier

When used this way strncpy() is safe (ignoring topics like thread safety tho').

Copy that!

Thank you

Hmm that exact code stops copying to destination buffer at 3 chars…

Note dest buffer is 128 (setup as char arrd[128] = "Init string";). Even with a for loop to set all (buffer) char to ‘0’ (plain zero) just in case there was an erroneous ‘\0’. Also to ensure any copied string is treated as ASCII (ie, char type).

sizeof(arrd) returns warning "will return size of char*", nevertheless Log.info("%u", (unsigned)sizeof(arrd)) does in fact return 128. Further strncpy’s 3rd param is size_t n and return type of sizeof() is size_t. So apples to apples.

Could you reiterate what "that exact code" is? (including all relevant definitions)

Yes for example (plus/minus some testing, like (const char*)arr2 within strncpy line just to be sure).

char arrd[128] = "Init string";                         // Global string, desired
char arrs[128] = "A test string";

void loop() {
    scpy(arrd, arrs);
    Log.info("arrd: %s", arrd);
    delay(5000); 
    Log.info("Revert");  
    scpy(arrd, "Init string");
}

void scpy(char arr1[], const char arr2[]) {             // Custom 'safe' strncpy
    for(int i = 0; i < strlen(arr1); i++) {             // empty dest arr
      arr1[i] = ' ';
    }
    strncpy(arr1, arr2, sizeof(arr1)-1);                // copy up to null terminator of dest
    arr1[sizeof(arr1)-1] = '\0';                        // put null terminator just in case
}

Result of arrd is always “A t” or “Ini” (limited to 3 char).

Nope, you are not passing arrd[128] to the function, you are only passing an “anonymous” pointer (a char* with a size of 4 bytes) - that’s what the warning above informed you about :wink:
That is why you need to pass a size parameter to every C standard function that needs to know the size of the buffer, like the size parameter of strncpy(). If it could deduce the size from the first parameter, the third would be superfluous or at least optional.

To stick with the bucket analogy from above:
Since moving a full bucket around is hard, you are not handing it over directly but rather use a hose that is attached to that bucket (a char* pointer).
Consequently the function only knows the diameter of the hose but not how much volume the bucket at the end has.
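
One way to repair scpy() along those lines is to hand the destination size over explicitly. The sketch below is standard C++ under that assumption; the three-argument signature and the template overload are illustrative choices, not something prescribed in this thread.

```cpp
#include <cstddef>
#include <cstring>

char arrd[128] = "Init string";   // Global string, desired
char arrs[128] = "A test string";

// Explicit-size variant: the caller states how big the destination is,
// because inside the function dest is only a pointer.
void scpy(char *dest, std::size_t destSize, const char *src) {
    if (dest == nullptr || src == nullptr || destSize == 0) return;
    std::strncpy(dest, src, destSize - 1);  // leave room for the terminator
    dest[destSize - 1] = '\0';              // terminate even when truncated
}

// Template variant: when a real array (not a pointer) is passed by
// reference, the compiler deduces its size N for us.
template <std::size_t N>
void scpy(char (&dest)[N], const char *src) {
    scpy(dest, N, src);
}
```

With this, scpy(arrd, arrs) copies the whole "A test string", and scpy(arrd, sizeof(arrd), "Init string") reverts it. The clearing loop from the original is not needed, since strncpy() pads the rest of the copied range with zeros.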
