Latest firmware released to Spark Build

It was undefined...

But your pull request looks like it should define it when RELEASE_BUILD is defined so that's good :wink:

Can you #define DEBUG_BUILD in the web IDE if you desire debugging?

I will start a new topic. We absolutely need the PROGMEM functionality that Arduino has, which allows const vars to be read directly out of flash and NOT copied to RAM.
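For reference, here is a minimal sketch of what "const data read straight from flash" usually looks like on an ARM Cortex-M part built with GCC. It assumes a typical linker script that maps read-only data into flash; the table names and contents are hypothetical and not taken from core-firmware.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical lookup table, used only for illustration.
// It is const with static storage duration, so GCC emits it into a
// read-only section. With a typical Cortex-M linker script (an assumption
// here) that section lives in flash and is read in place, with no RAM copy.
static const uint8_t kSquares[8] = {0, 1, 4, 9, 16, 25, 36, 49};

// The same data without const goes to .data: the initial values sit in
// flash AND the startup code copies them into SRAM.
static uint8_t scratch[8] = {0, 1, 4, 9, 16, 25, 36, 49};

// Plain array indexing; flash and RAM share one address space on ARM,
// so no pgm_read_byte()-style accessor is needed.
uint8_t squareLookup(std::size_t i) {
    return kSquares[i % sizeof kSquares];
}

// Writes go to the RAM-resident copy only.
void scratchStore(std::size_t i, uint8_t value) {
    scratch[i % sizeof scratch] = value;
}

uint8_t scratchLoad(std::size_t i) {
    return scratch[i % sizeof scratch];
}
```

Whether a given table really ended up in flash can be checked from its address or its nm type letter rather than taken on faith.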


Hi @satishgn, @zachary, @david_s5, and everybody,

I have been doing a bit of research tonight with a local build of an empty application.cpp. I have been looking through the output from:

    arm-none-eabi-readelf -a core-firmware.elf

I pulled the OBJECT lines and their sizes out into a spreadsheet and sorted them by size. Some of these are obvious like wlan_tx_buffer and wlan_rx_buffer. It looks like there are two 4k buffers that seem to be crypto related, RT0…3 and FT0…3. I looked at spark_protocol and there are some 294- and 612-byte keys plus a 640-byte queue, but that is surprisingly the largest single object. There are lots of good features in here like the USART_Rx_Buffer, but maybe that size could be optimized.

Here are the top memory hogs:

2064	spark_protocol           
1064	impure_data              
1032	__malloc_av_             
1024	RT3                      
1024	RT0                      
1024	RT1                      
1024	RT2                      
1024	FT0                      
1024	FT3                      
1024	FT1                      
1024	FT2                      
1024	wlan_tx_buffer           
1024	wlan_rx_buffer           
352	User_Func_Lookup_Table   
256	FSb                      
256	RSb                      
256	sbox                     
256	USART_Rx_Buffer          
256	rsbox                    
200	__mprec_tens             
200	User_Var_Lookup_Table    
192	aes_test_cfb128_ct       
176	expandedKey              
140	sha1_hmac_test_sum       
96	flash_codes              
96	aes_test_cfb128_key      
72	rx_buffer                
72	tx_buffer                
68	tSLInformation           
67	profileArray             
67	Virtual_Com_Port_ConfigDe
64	_ZL13exti_channels       
64	aes_test_cfb128_pt       
64	sha1_padding             
64	USB_Rx_Buffer            
60	sha1_test_sum            
58	ip_config                
56	lconv                    
50	Virtual_Com_Port_StringPr
48	aes_test_ecb_dec         
48	aes_test_ecb_enc         
48	aes_test_cbc_dec         
48	aes_test_cbc_enc         
48	Device_Property          
40	RCON                     
40	_ZTV9USBSerial           
40	__mprec_bigtens          
40	_ZTV7TwoWire             
40	__malloc_current_mallinfo
40	_ZTV11USARTSerial        
40	ADC_DualConvertedValues  
38	Virtual_Com_Port_StringVe
36	User_Standard_Requests   
<stuff 32-bytes and smaller> 

I am not sure where to go next. This is good data for understanding where that base of around 11k is coming from, but it is not obvious from this list what might need to be worked on and what is already optimized.


OK, so I also should be looking at the addresses to see what is const in FLASH and not in RAM. The sbox and rsbox, which are used by the CC3000 security.cpp, are in flash, for instance, because their addresses are 0x08015…

    1588: 08015e80   256 OBJECT  GLOBAL DEFAULT    2 sbox
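The address check can be written down as a small helper. This is a sketch, not firmware code: the base addresses are the standard STM32 memory map values, while the region sizes (128 KB flash, 20 KB SRAM) and all names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

enum class Region { Flash, Sram, Other };

// Standard STM32 base addresses for on-chip flash and SRAM.
constexpr uint32_t kFlashBase = 0x08000000u;
constexpr uint32_t kSramBase  = 0x20000000u;

// Assumed sizes, for illustration only.
constexpr uint32_t kFlashSize = 128u * 1024u;
constexpr uint32_t kSramSize  = 20u * 1024u;

// Classifies a symbol address as reported by readelf or nm.
constexpr Region regionOf(uint32_t addr) {
    return (addr >= kFlashBase && addr - kFlashBase < kFlashSize) ? Region::Flash
         : (addr >= kSramBase && addr - kSramBase < kSramSize)    ? Region::Sram
                                                                   : Region::Other;
}
```

With these assumptions, the sbox address 0x08015e80 quoted above classifies as flash.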

I do think there are two complete AES implementations in there right now, one in the CC3000 code and one in the tropicssl code.

There is a .map file in the build directory of core-firmware that has all the output generated by the linker which may be helpful for your research.

If you do

    arm-none-eabi-nm -S --size-sort core-firmware.elf

you’ll get a size sorted list of all the symbols. Along with each symbol is a single letter indicating “type”, where T/t => text (not taking up SRAM), B/b => bss, and D/d => data.
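As a quick reference, the type letters can be turned into a lookup. This is only a sketch covering the three letter pairs mentioned here; the enum and function names are made up for illustration. In nm output an uppercase letter marks a global symbol and a lowercase letter a local one.

```cpp
#include <cassert>

enum class Storage {
    FlashOnly,          // T/t: text, not taking up SRAM
    SramZeroInit,       // B/b: bss, zero-filled SRAM
    SramWithFlashCopy,  // D/d: data, SRAM plus initial values kept in flash
    Unknown             // any letter not covered by this sketch
};

// Maps an nm type letter to where the symbol lives at run time.
Storage storageForNmType(char type) {
    switch (type) {
        case 'T': case 't': return Storage::FlashOnly;
        case 'B': case 'b': return Storage::SramZeroInit;
        case 'D': case 'd': return Storage::SramWithFlashCopy;
        default:            return Storage::Unknown;
    }
}

// True when the symbol costs SRAM.
bool usesSram(char type) {
    const Storage s = storageForNmType(type);
    return s == Storage::SramZeroInit || s == Storage::SramWithFlashCopy;
}
```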

nm docs:

You’ll see spark_protocol as the biggest SRAM taker with 0x810 in the build I just did. Those FT0, RT0, etc are all const, so they don’t use SRAM. These four are big—the CC3000 buffers and the other two I don’t recognize:

    20001f9c 00000400 B wlan_rx_buffer
    20001b9c 00000400 B wlan_tx_buffer
    20000704 00000408 D __malloc_av_
    200002d8 00000428 d impure_data

Oh, thanks - played a bit more with nm:

    arm-none-eabi-nm -S --size-sort -r core-firmware.elf --radix decimal --debug-syms --special-syms --synthetic | grep -e " [BbDd] "

gives a list with all RAM blocks, along with their decimal size, largest block first :slight_smile:

However, the total of these amounts to 8k only, 7k of them for the largest 9 blocks.

Where’s the 6k difference to the 14k gcc reports for bss+data coming from?

    536874336 00002064 B spark_protocol
    536871296 00001064 d impure_data
    536872364 00001032 D __malloc_av_
    536877640 00001024 B wlan_tx_buffer
    536878664 00001024 B wlan_rx_buffer
    536873776 00000352 B User_Func_Lookup_Table
    536873500 00000256 B USART_Rx_Buffer
    536874132 00000200 B User_Var_Lookup_Table
    536877288 00000176 B expandedKey
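To double-check totals like these without a spreadsheet, the size column can be summed mechanically. The following is a rough sketch that assumes the four-column decimal layout shown above (address, size, type, name); the function name is hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Sums the size column of decimal nm output for symbols whose type letter
// is B, b, D or d. Each line is expected to look like:
//   <address> <size> <type> <name>
// Lines that do not match that layout are skipped.
unsigned long long sumRamBytes(const std::string& nmOutput) {
    std::istringstream lines(nmOutput);
    std::string line;
    unsigned long long total = 0;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        unsigned long long address = 0;
        unsigned long long size = 0;
        char type = 0;
        std::string name;
        if (!(fields >> address >> size >> type >> name)) {
            continue;
        }
        if (type == 'B' || type == 'b' || type == 'D' || type == 'd') {
            total += size;
        }
    }
    return total;
}
```

Fed the nine lines above, this adds up to 7192 bytes, which matches the "7k for the largest 9 blocks" figure.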


Zachary, those four items you listed add up to 4KB, leaving that 16KB “wall” we are seeing in RAM. What are the qualifiers “B”, “D” and “d”?

Thanks! That makes it much easier to understand. Reading online a bit, these two, __malloc_av_ and impure_data, are coming from newlib. These can be controlled with switches (or other implementations of new). impure_data is used to allow functions to be reentrant and some space is always needed.

I think from the top few it is clear that the great new stability and other features like interrupt-driven USART come with a RAM price tag. Maybe some of the advanced folks here can experiment with reducing sizes and see if the price can come down at all.

I think david_s5 pegged it when he said that the Spark needs a proper dynamic memory manager. The Spark Team is aware of this and if david_s5 has the time, we will see something in a future sprint.

So if I understand correctly, there is about 6KB of RAM left for user apps. If a user creates a TCPClient or UDPClient, each one eats up 512B of RAM in buffers (?). It is not clear to me whether const vars are being copied out to RAM; if they are, that 6KB can disappear really fast!
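A back-of-the-envelope version of that budget, as a sketch only: the 6KB and 512B figures are the unconfirmed estimates from this post, and the function name is hypothetical.

```cpp
#include <cassert>

// Estimated figures from the discussion; both are unverified.
constexpr long kFreeRamBytes    = 6 * 1024;
constexpr long kBytesPerClient  = 512;

// RAM left for the user app after creating `clientCount` network clients.
// A negative result means the budget is already blown.
constexpr long remainingRam(long clientCount) {
    return kFreeRamBytes - clientCount * kBytesPerClient;
}
```

Under these assumptions, three clients would already take a quarter of the free RAM before the app stores any data of its own.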

@bko @peekay123 @zachary @luz Good work pulling this together! All good points. I think there should be a branch created with an eye on transforming the code into something more “optimized for a resource-limited embedded system”.

After 20+ years of embedded development, I have learned that it is better to over-optimize everything, constantly: power management, speed, RAM, flash. This can be hard to do with a deadline-driven project or feature requirement and varying skill sets. But if every line of code, every module designed, and every feature added is optimized from the “get go”, the “world is a better place”.

Having said that, most of the projects I have gotten involved with have resource issues. One project had an insane “out of runway” deadline of 3 months to production. So I specced the fastest MPU with the biggest RAM and flash I could find that would still meet the run-on-battery-for-X-hours requirement. But still, every line of code, every module designed, and every feature added was optimized.

There are 3 ways that come to mind to mitigate embedded resource constraints:

  1. Write perfect code - not going to happen.
  2. N tiers of development team.
  3. Design spiral.

Given the degrees of freedom: time, skills, money

Design spiral: prototype, test, learn, optimize, and fix on feature branches. Integrate and repeat while converging on the goal.

or as I like to think of it as :smile:

The design spiral costs time to revisit and requires skills to get to the goal.

N tiers of development team: tier 1 does quick feature BRANCH, (may have to #ifdef out stuff to get it to fit), tier 2 does optimization and integration to master. Skill sets increase as N goes up.

That gives a faster time to goal, but it costs money and requires teams with increasing skill sets.

Optimizations :

The byte killer branch: the ARM supports bit-banding, which right there covers all the bools and uint8_t flags.
The misuse branch: kill things like uint32 for values that have a 0:1 or 0:255 range.
The power branch: use interrupts, remove all polling, optimize IPC and queues.
The clever reuse branch: eliminate copies and use the TI buffers in a clever way.
The heap branch: put in real heap management.
The tricks of the trade branch: use tricks of the trade. For example, a buffer is by definition empty or full. Therefore one does not need buffer, on, off. Just buffer and offset, or better yet, store the data backwards so that the offset equals the count and the next pointer. Cool, huh!
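To make the bit-banding item concrete, here is a sketch of the Cortex-M3 alias address calculation for the SRAM bit-band region. The region and alias base addresses are the architectural ones; the function name is made up, and nothing here is taken from core-firmware.

```cpp
#include <cassert>
#include <cstdint>

// Cortex-M3 SRAM bit-band region and its alias region.
constexpr uint32_t kSramBitBandBase  = 0x20000000u;
constexpr uint32_t kSramBitBandAlias = 0x22000000u;

// Each bit in the bit-band region has its own 32-bit word in the alias
// region: alias = alias_base + byte_offset * 32 + bit_number * 4.
// byteAddr must lie in the 1 MB bit-band region, bit must be 0..7.
constexpr uint32_t bitBandAlias(uint32_t byteAddr, uint32_t bit) {
    return kSramBitBandAlias + (byteAddr - kSramBitBandBase) * 32u + bit * 4u;
}

// On the target only (not on a host PC), a flag could then be set with a
// single store and no read-modify-write sequence:
//   *reinterpret_cast<volatile uint32_t*>(bitBandAlias(addr, bit)) = 1;
```

The appeal for RAM is that eight flags can share one byte while each flag still gets a plain single-store write through its alias word.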

So how should we do this?


@david_s5 You are a great asset to this community!

david_s5, I believe targeted optimization may be the way to go. It’s clear that the immediate need is for better RAM resource management, especially in the numerous buffers and heaps and whathaveyounots. The design spiral is great, but let’s face it: if the STM32 had more flash and RAM, we would be worrying about other things instead of these resources. Perhaps the spiral with clear resource optimization goals is another way to go, not unlike what is happening with the WAN/CLOUD connect/disconnect evolution.

Less than a handful of folks (including you of course) are truly capable of optimizing this puppy. So which way do you think we should go? :smile:

From something @zachary said, I thought that the build server had a -DUSE_ONLY_PANIC defined.

@BDub I am not sure how the web IDE compile works. Maybe @Dave can shed some light.

It would be cool to have release, release-panic-only, debug as build options per core/per project
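One way such build options could look on the code side, purely as a sketch: the macro names DEBUG_BUILD and USE_ONLY_PANIC come from this thread, but the mapping to log levels and every other name below are assumptions, not how core-firmware actually does it.

```cpp
#include <cassert>

// Hypothetical log levels, one per proposed build option.
enum class LogLevel { All, PanicOnly, None };

// Picks the level at compile time from the build flags.
//   -DDEBUG_BUILD      -> debug build, everything logged
//   -DUSE_ONLY_PANIC   -> release-panic-only build
//   neither            -> plain release build, logging compiled out
constexpr LogLevel buildLogLevel() {
#if defined(DEBUG_BUILD)
    return LogLevel::All;
#elif defined(USE_ONLY_PANIC)
    return LogLevel::PanicOnly;
#else
    return LogLevel::None;
#endif
}

// True when ordinary debug messages should be emitted in this build.
constexpr bool debugMessagesEnabled() {
    return buildLogLevel() == LogLevel::All;
}
```

Because the choice is made by the preprocessor, unused logging code and its strings can be dropped from the image entirely, which also helps flash and RAM.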

Hmm, at the moment I think it would always be “RELEASE_BUILD”, since we’re not setting the DEBUG_BUILD environment var before builds. I think we could add an option to pass in a set of build flags, I suspect the tricky part becomes presenting those options in a clear way.

ifeq ("$(DEBUG_BUILD)","y")