Wednesday, December 4, 2013

Trimming the fat from avr-gcc code

Although writing in AVR assembly makes it easy to write programs that fit in a small codespace, writing in C and using AVR Libc is more convenient.  This article outlines how to write C code that avr-gcc will build to a minimal size.  There are a number of other guides for writing small AVR code, including Atmel's application note AVR4027, but none of them seems to address the overhead of avr-gcc's start-up library (gcrt1).

Many people still seem to be using avr-gcc 4.3.3, as it usually generates smaller code than 4.5.3 and 4.7.  I recently tried avr-gcc 4.8.2 (linux RPM cross-avr-gcc-4.8.2-3.2), and for the program I use here, it generates even smaller code than 4.3.3.

The test program uses the ATtiny85's internal temperature sensor and flashes the temperature using an LED.  When compiled with avr-gcc 4.3.3 using -Os, it results in a 274-byte program:
avr-size temperature
   text    data     bss     dec     hex filename
    274       0       0     274     112 temperature.bu
With avr-gcc 4.8.2 that drops to 240 bytes:
avr-size temperature-4.8
   text    data     bss     dec     hex filename
    240       0       0     240      f0 temperature-4.8

The difference is primarily in the startup files linked to the code.  Disassembling the code with avr-objdump -d shows the reset vector contains a jump to a function called __ctors_end:
   0:   0e c0           rjmp  .+28      ; 0x1e <__ctors_end>
0000001e <__ctors_end>:
  1e:   11 24           eor     r1, r1
  20:   1f be           out     0x3f, r1        ; 63
  22:   cf e5           ldi     r28, 0x5F       ; 95
  24:   d2 e0           ldi     r29, 0x02       ; 2
  26:   de bf           out     0x3e, r29       ; 62
  28:   cd bf           out     0x3d, r28       ; 61

The function __ctors_end falls into __do_copy_data, which falls into __do_clear_bss before an rcall to main followed by an rjmp to _exit.  In total it's about 50 bytes of code before calling main.  With avr-gcc 4.8.2, the only code before main is __ctors_end, or 16 bytes of what would seem to be overhead.

Before trying to cut out __ctors_end, I wanted to make sure the code in __ctors_end is really overhead that can be safely removed.  The first two lines clear SREG.  Section 8.1 of the ATtinyX5 datasheet states, "During reset, all I/O Registers are set to their initial values, and the program starts execution from the Reset Vector."  The datasheet also indicates that SREG's initial value is 0, so the first two lines can go.  The last 4 lines set the stack pointer (SPL and SPH) to RAMEND, which section 4.6 of the datasheet indicates is its initial value.  So it is safe to get rid of __ctors_end and jump straight to main from the reset vector, for a savings of 16 bytes.
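As a quick check of the stack pointer claim, the two ldi constants in the listing can be combined and compared with RAMEND.  The sketch below is plain host-side C, not AVR code.  The SRAM start address (0x0060) and size (512 bytes) are ATtiny85 values assumed from the datasheet's memory map rather than taken from this post, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed ATtiny85 memory map (from the datasheet, not from the listing):
   SRAM begins right after the register file and I/O space. */
#define SRAM_START 0x0060u
#define SRAM_SIZE  512u

/* Last valid SRAM address, i.e. what AVR Libc calls RAMEND. */
uint16_t ramend(void)
{
    return (uint16_t)(SRAM_START + SRAM_SIZE - 1u);
}

/* Combine the bytes written to SPH (I/O 0x3e) and SPL (I/O 0x3d). */
uint16_t stack_pointer(uint8_t sph, uint8_t spl)
{
    return (uint16_t)(((uint16_t)sph << 8) | spl);
}
```

With the values from the listing, stack_pointer(0x02, 0x5F) and ramend() both come out to 0x025F, so those four instructions only rewrite the value the stack pointer already holds after reset.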

Another 30 bytes of flash are used for the interrupt vector table (and even more than 30 bytes on the ATmega series MCUs).  Section 9.1 of the datasheet states, "If the program never enables an interrupt source, the Interrupt Vectors are not used, and regular program code can be placed at these locations."  My temperature blinking program doesn't use interrupts, so more space can be saved by getting rid of the interrupt table.
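The 30-byte figure follows from the vector count: the ATtiny85 has 15 vectors (reset plus 14 interrupt sources), each holding one 2-byte rjmp, which also matches __ctors_end starting at 0x1e in the listing.  Here is a small host-side C sketch of that arithmetic.  The function name is hypothetical, and the ATmega328P numbers mentioned afterwards (26 vectors, each a 4-byte jmp) are an assumption taken from its datasheet as a comparison point, not something measured for this post.

```c
/* Bytes of flash occupied by an interrupt vector table:
   one jump instruction per vector. */
unsigned vector_table_bytes(unsigned vectors, unsigned bytes_per_vector)
{
    return vectors * bytes_per_vector;
}
```

vector_table_bytes(15, 2) gives the 30 bytes seen on the ATtiny85, while vector_table_bytes(26, 4) gives 104 bytes for an ATmega328P-style table.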

The way to tell avr-gcc not to link in the startup code is -nostartfiles.  If that is all you do with your C code, then avr-gcc will stick the first object file at address 0 (the reset vector).  To ensure the reset vector contains a jump to main, I wrote a small assembly program (crt1.S).  I use this custom startup code instead of the gcrt1 included with the compiler libraries.  The code isn't long, so I'll include it inline:
.org 0x0000          ; place this code at the reset vector
__vectors:           ; same symbol name the stock startup code uses
rjmp main            ; jump straight to main, skipping all init code

Compile it (avr-gcc -c crt1.S), and link it with your C code.  For compiling temperature.c here's the command line I used, including a couple of extra flags helpful for generating small code:
avr-gcc -mmcu=attiny85 -Os -fno-inline-small-functions -mrelax -nostartfiles crt1.o temperature.c -o temperature

The resulting program is 190 bytes, saving 84 bytes vs. avr-gcc 4.3.3 or 50 bytes vs. 4.8.2:
avr-size temperature
   text    data     bss     dec     hex filename
    190       0       0     190      be temperature
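The savings quoted above are plain differences of the avr-size totals.  The host-side C sketch below reproduces them; the function name is hypothetical and only the three sizes reported in this post are used.

```c
/* Bytes saved when a build shrinks from `before` to `after` bytes. */
int bytes_saved(int before, int after)
{
    return before - after;
}
```

bytes_saved(274, 190) gives 84 for the avr-gcc 4.3.3 build, and bytes_saved(240, 190) gives 50 for the 4.8.2 build.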

Note that many virtual bootloaders for the ATtiny MCUs will cause problems with this technique as they tend to assume application code doesn't start until after the interrupt vector table.  MCUs with hardware bootloader support (i.e. the ATmega series) will not have problems.  Picoboot, the bootloader I am writing, will only assume the reset vector contains an rjmp to the start of application code and therefore will work with my custom crt1.o.
