| ay back in my October 1996 column in MSJ, I 
            addressed a question concerning the size of executable files. Back 
            then, a simple Hello World program compiled to a 32KB executable. 
            Two compiler versions later, the problem is only slightly better. 
            The same program with the Visual C++? 6.0 compiler is now 
            28KB. In that column, I 
            provided a replacement runtime library that lets you create very 
            small executable programs. There were some restrictions on what 
            situations it was useful for, but for a large number of my own 
            programs it worked well. After living with these restrictions for 
            quite a while, I decided it was time to fix some of them. Making 
            these modifications also happens to provide a great opportunity to 
            describe a little-known linker option that can be used to further 
            reduce program size.
 EXE and DLL SizeBefore 
            jumping into the code for my replacement runtime library, it‘s worth 
            taking the time to review why simple EXEs and DLLs are bigger than 
            you might expect. Consider the canonical Hello World program:
 Let‘s compile this program for size, and generate a map file. Using 
            the command-line Visual C++ compiler, the syntax would be:#include <stdio.h>
void main()
{
    printf ("Hello World!\n" );
}
 
 First, look at 
            the .MAP file; a trimmed down version is shown in Figure 1. 
            From looking at the addresses of main (0001:00000000) and of printf 
            (0001:0000000C), you can infer that function main‘s code is only 0xC 
            bytes in length. Looking at the last line of the file, the __chkstk 
            function at address 0001:00003B10, you can also infer that there‘s 
            at least 0x3B10 bytes of code in the executable. That‘s over 14KB of 
            code to send Hello World to the 
            screen.Cl /O1 Hello.CPP /link /MAP
 Now, start looking 
            through some of the other .MAP file lines. Some items make sense, 
            for example, the __initstdio function. After all, printf writes its 
            output to a file, so some amount of underlying runtime library 
            support routines for stdio makes sense. Likewise, it‘s reasonable to 
            expect that the printf code might call strlen, so its inclusion 
            isn‘t a surprise.
 However, 
            take a look at some of the other functions, for instance 
            __sbh_heap_init. This is the initialization function for the runtime 
            library‘s small block heap. The Win32?-based operating systems offer 
            up their own heap in the form of the HeapAlloc family of functions. 
            Potential performance gains notwithstanding, the Visual C++ library 
            could choose to use the Win32 heap APIs, but doesn‘t. Thus, you end 
            up with more code than necessary in your 
            executable.
 While some people 
            might not care that the runtime library implements its own heap, 
            there are other less defensible examples. Consider the 
            __crtMessageBoxA function near the bottom of the map file. This 
            function allows the runtime library to call the MessageBox API 
            without forcing the executable to link against USER32.DLL. For a 
            simple Hello World program, it‘s hard to anticipate the need to call 
            MessageBox.
 Consider another 
            example: the __crtLCMapStringA function, which does locale-dependent 
            transformations of strings. While Microsoft is somewhat obligated to 
            provide locale support, it‘s not really needed for a large number of 
            programs. Why make programs that don‘t use locales pay the cost for 
            those that do?
 I could 
            continue with other examples of unneeded code, but I‘ve made my 
            point. A typical small program contains lots of little nuggets of 
            code that aren‘t used. By themselves, they don‘t contribute much to 
            the code size, but add up all the cases and you‘re into serious 
            amounts of code!
 What About the C++ Runtime Library DLL?Alert readers might say, "Hey 
            Matt! Why don‘t you just use the DLL version of the runtime 
            library?" In the past, I could make the argument that there was no 
            consistently named version of the C++ runtime library DLL available 
            on Windows? 95, Windows 98, Windows NT 3.51, Windows NT? 4.0, and so 
            forth. Luckily, we‘ve moved past those days, and in most cases you 
            can rely on MSVCRT.DLL being available on your target 
            machines.Making this switch 
            and recompiling Hello.CPP, the resulting executable is now only 
            16KB. Not bad, but you can do better. More importantly, you‘re just 
            shifting all of this unneeded code to someplace else (that is, to 
            MSVCRT.DLL). In addition, when your program starts up, another DLL 
            will have to be loaded and initialized. This initialization includes 
            items like locale support, which you may not care about. If 
            MSVCRT.DLL suits your needs, then by all means use it. However, I 
            believe that using a stripped-down, statically linked runtime 
            library still has merit.
 I 
            may be tilting at windmills here, but my e-mail conversations with 
            readers show that I‘m not alone. There are people out there who want 
            the leanest possible code. In this day of writeable CDs, DVDs, and 
            fast Internet connections, it‘s easy not to worry about code size. 
            However, the best Internet connection I can get at home is only 
            24Kbps. I hate wasting time downloading bloated controls for a Web 
            page.
 As a matter of 
            principle, I want my code to have as small a footprint as possible. 
            I don‘t want to load any extra DLLs that I don‘t really need. Even 
            if I might need a DLL, I‘ll try to delayload it so that I don‘t 
            incur the cost of loading it until I use the DLL. Delayloading is a 
            topic I‘ve described in previous columns, and I strongly encourage 
            you to become familiar with it. See Under the Hood in the December 
            1998 issue of MSJ for starters.
 Digging DeeperNow that 
            I‘ve beaten up the unneeded code within the program, let‘s turn to 
            the executable file itself. If you were to run DUMPBIN /HEADERS on 
            my Hello.EXE, you‘d see the following two lines in the output:
 The second line is interesting. It says that every code and data 
            section in the executable is aligned on a 4KB (0x1000) byte 
            boundary. Because sections are stored contiguously in a file, it‘s 
            not hard to see the potential for wasting up to 4KB between the end 
            of one section and the start of the 
            next.1000 section alignment
1000 file alignment
 If I had linked the 
            program with a version of the linker that came before Visual C++ 
            6.0, I would have seen something different, as you see here:
 
 The key difference is that the alignment between sections is only 
            512 bytes (0x200). There‘s much less space available to waste. In 
            Visual C++ 6.0, the linker defaults were changed to make the file 
            alignment of sections equal to the alignment in memory. This 
            provides a slight load-time performance improvement on Windows 
            9x, but makes executables 
            bigger.1000 section alignment
200 file alignment
 Luckily, the Visual 
            C++ linker has a way to go back to the previous behavior. The magic 
            switch is /OPT:NOWIN98. Rebuilding Hello.CPP as before, but with the 
            addition of this linker switch gets the executable file down to 
            21KB?a savings of 7KB. If I switch to linking with MSVCRT.DLL and 
            using /OPT:NOWIN98, the executable size drops to 2560 bytes!
 
 LIBCTINY: A Minimal Runtime LibraryNow that you understand the 
            problem of why simple EXEs and DLLs are so large, it‘s time to 
            introduce my new and improved replacement runtime library. In the 
            October 1996 column (mentioned earlier), I created a small static 
            .LIB file designed to replace or augment the Microsoft LIBC.LIB and 
            LIBCMT.LIB libraries. I called this replacement runtime library 
            LIBCTINY.LIB, since it was a very stripped-down version of 
            Microsoft‘s own runtime library 
            sources.LIBCTINY.LIB is 
            intended for simple applications that don‘t require a huge amount of 
            runtime library support. Thus, it‘s not suitable for MFC 
            applications or other complicated scenarios that make extensive use 
            of the C++ runtime. LIBCTINY‘s ideal target is small programs or 
            DLLs that call some Win32 APIs and perhaps display some simple 
            output.
 There are two 
            guiding principles behind LIBCTINY.LIB. First, it replaces the 
            standard Visual C++ startup routines with much simpler code. This 
            simpler code doesn‘t refer to any of the more esoteric runtime 
            library functions like __crtLCMapStringA. Because of this, much less 
            extraneous code is linked into your binary. As I‘ll show shortly, 
            the LIBCTINY routines perform a bare minimum of tasks before calling 
            your WinMain, main, or DllMain 
            routines.
 The second guiding 
            principle of LIBCTINY.LIB is to implement relatively large functions 
            like malloc or printf with code that‘s already in the Win32 system 
            DLLs. Beyond the minimal startup code, most of the other LIBCTINY 
            source files are simple implementations of standard C++ runtime 
            library functions such as malloc, free, new, delete, printf, strupr, 
            strlwr, and so on. Take a look at the implementation of printf in 
            printf.cpp (see Figure 2) 
            to get an idea of what I‘m talking 
            about.
 In my original version 
            of LIBCTINY.LIB there were two restrictions that annoyed me. First, 
            the original version did not support DLLs. You could make tiny 
            console and GUI executable programs, but if you wanted to create a 
            tiny DLL, you were out of 
            luck.
 Second, the 
            original LIBCTINY did not support static C++ constructors and 
            destructors. By this, I mean constructors and destructors declared 
            at global scope. In the new version, I‘ve added the basic code that 
            implements this support. Along the way, I learned quite a bit about 
            how the compiler and runtime library play a complicated game to make 
            static constructors and destructors work.
 The Dark Underbelly of ConstructorsWhen the compiler processes a 
            source file that has a static constructor, it generates two things. 
            The first is a small blob of code with a name like $E2 that calls 
            the constructor. The second thing the compiler emits is a pointer to 
            this blob of code. This pointer is written to a specially named 
            section in the .OBJ called 
            .CRT$XCU.Why the funny 
            section name? It‘s a bit complicated. Let me throw another piece of 
            data at you to help explain. If you examine the Visual C++ runtime 
            library sources (for instance, CINITEXE.C), you‘ll find the 
            following:
 
 The previous lines of code create two data segments, .CRT$XCA and 
            .CRT$XCZ. In each segment it places a variable (__xc_a and __xc_z, 
            respectively). Note that the segment names are very similar to the 
            .CRT$XCU segment to which the compiler emits the constructor code 
            pointer.#pragma data_seg(".CRT$XCA")
_PVFV __xc_a[] = { NULL };
#pragma data_seg(".CRT$XCZ")
_PVFV __xc_z[] = { NULL };
 At this point, a 
            little linker theory is needed. When processing all of the segments 
            to create the final portable executable (PE) file, the linker 
            concatenates all the data from identically named segments. Thus, if 
            A.OBJ has a section called .data, and B.OBJ also has a .data 
            section, all the data from A.OBJ and B.OBJ will be written 
            contiguously into a single .data section in the PE 
            file.
 The use of a $ in a 
            segment name puts a new twist on things. When encountering segment 
            names with a $ in them, the linker treats the portion of the name 
            preceding the $ as the final segment name. Thus, the .CRT$XCA, 
            .CRT$XCU, and .CRT$XCZ segments all end up together in the final 
            executable in a segment called 
            .CRT.
 What about the part of 
            the segment name following the $? When combining these types of 
            sections, the linker writes out the segments in the order dictated 
            by the string following the $. The ordering is alphabetical, so all 
            the data from .CRT$XCA goes first, followed by all of the data from 
            .CRT$XCU, and finally all of the data from .CRT$XCZ. This is a 
            crucial point to 
            understand.
 What‘s going on 
            here is that the runtime library code has no idea how many static 
            constructor calls are needed for a given EXE or DLL. However, it 
            does know that only pointers to constructor code blobs will be in 
            the .CRT$XCU segment. When the linker concatenates all the .CRT$XCU 
            sections, it has the net effect of creating a function pointer 
            array. By defining .CRT$XCA and .CRT$XCZ segments along with the 
            __xc_a and __xc_z symbols, the runtime library can reliably locate 
            the beginning and end of the function pointer 
            array.
 As you might 
            expect, calling all the static constructors in a module is a simple 
            matter of enumerating through the function pointer array, calling 
            each pointer in turn. The routine that does this is _initterm, shown 
            in Figure 3. 
            This routine is identical to the version from the Visual C++ runtime 
            library sources.
 All 
            things considered, getting static constructors to work in LIBCTINY 
            was relatively easy. It was mostly a matter of defining the right 
            data segments (specifically, .CRT$XCA and .CRT$XCZ), and calling 
            _initterm from the correct spot in the startup code. Getting static 
            destructors to work was a bit 
            trickier.
 Unlike the function 
            pointer array that the compiler and linker conspire to create for 
            static constructors, the list of static destructors to call is built 
            at runtime. To build this list, the compiler generates calls to the 
            atexit function, which is part of the Visual C++ runtime. The atexit 
            function takes a function pointer and adds the pointer to a 
            first-in, last-out list. When the EXE or DLL unloads, the runtime 
            library iterates through the list and calls each function 
            pointer.
 LIBCTINY‘s 
            implementation of the atexit functionality is significantly simpler 
            than what the Visual C++ runtime library does. There are three 
            functions and a handful of static variables for this support, which 
            is also in initterm.cpp. The _atexit_init function simply allocates 
            an array to hold 32 function pointers, and stores the pointer in the 
            pf_atexitlist static 
            variable.
 The atexit function 
            checks to see if there‘s room in the array, and if so, adds the 
            pointer to the end of the list. A more robust version of this code 
            would reallocate the array to a larger size if necessary. Finally, 
            the _DoExit function uses your friend, _initterm, to iterate through 
            the array and call each function pointer. In an ideal world, _DoExit 
            would iterate through the array in reverse order, mimicking the 
            behavior of the Visual C++ runtime library implementation. However, 
            the whole purpose of LIBCTINY is to be simple and small, rather than 
            striving for perfect compatibility.
 LIBCTINY‘s Minimal Startup RoutinesNow let‘s take a look at 
            LIBCTINY‘s new support for small DLLs. As with EXEs, the trick is to 
            make the DLL‘s entry point code as small as possible and omit calls 
            to unneeded routines that bring in lots of other code. Figure 4 shows 
            the minimal DLL startup code. When your DLL is loaded, it is this 
            code, not your DllMain routine, that executes 
            first.The _DllMainCRTStartup 
            is the very first place execution begins in your DLL. In LIBCTINY‘s 
            implementation, it first checks to see if the DLL is in its 
            DLL_PROCESS_ATTACH call. If so, the code calls _atexit_init 
            (described earlier), and _initterm to invoke any static 
            constructors. The heart of the function is the call to DllMain, 
            which is the routine you supply as part of your DLL‘s code. This 
            DllMain call is made for all four notification types (process 
            attach/detach, and thread 
            attach/detach).
 The last 
            thing DllMainCRTStartup does is to check if the DLL is in its 
            DLL_PROCESS_DETACH code. If so, the code calls _DoExit. As described 
            earlier, this causes any static destructors to be called. If you‘re 
            curious about the startup code for console and GUI mode EXEs, be 
            sure to check out CRT0TCON.CPP and CRT0TWIN.CPP, respectively. 
            (These modules accompany the code download, found at the link at the 
            top of this article.)
 One 
            other thing worth checking out in DLLCRTO.CPP (see Figure 4) 
            is this line near the top:
 
 This 
            puts a linker directive into the DLLCRT0.OBJ file that tells the 
            linker to use the /OPT:NOWIN98 switch. The benefit is that you don‘t 
            have to manually add /OPT:NOWIN98 to your make files or project 
            files by hand. I figure if you‘re using LIBCTINY, you‘d probably 
            want to use /OPT:NOWIN98 as well.#pragma comment(linker, "/OPT:NOWIN98")
 Using LIBCTINY.LIBUsing LIBCTINY is very simple. 
            All you have to do is add LIBCTINY.LIB to the linker‘s list of .LIB 
            files to search. If you‘re using the Visual Studio? IDE, this would 
            be in the Projects | Settings | Link tab. It doesn‘t matter what 
            type of binary you‘re building (console EXE, GUI EXE, or DLL), since 
            LIBCTINY.LIB contains appropriate entry point routines for each of 
            them.Take a look at 
            TEST.CPP in Figure 5. 
            This program simply exercises a few of the routines that 
            LIBCTINY.LIB implements, and includes a static constructor and 
            destructor invocation. When I compile it normally with Visual C++ 
            6.0,
 
 the resulting executable is 
            32768 bytes. By simply adding LIBCTINY.LIB to the command lineCL /O1 TEST.CPP
 
 the resulting 
            executable shrinks to 3072 
            bytes.CL /O1 TEST.CPP LIBCTINY.LIB
 You might be wondering 
            about the runtime library routines that LIBCTINY doesn‘t implement. 
            For instance, in TEST.CPP, there‘s a call to strrchr. There‘s no 
            problem here because that function exists in the regular LIBC.LIB or 
            LIBCMT.LIB that Visual C++ provides. Both LIBCTINY.LIB and LIBC.LIB 
            implement a variety of routines. LIBCTINY‘s list is obviously 
            smaller than what LIBC.LIB provides. The important thing for your 
            purposes is that the linker finds the LIBCTINY routines first when 
            resolving function calls, and so LIBCTINY‘s routines are what‘s 
            used. If something isn‘t implemented in LIBCTINY, the linker finds 
            it in LIBC.LIB 
            instead.
 Finally, it‘s worth 
            repeating that LIBCTINY isn‘t suitable for all purposes. For 
            example, if your code makes use of multiple threads and relies on 
            the runtime library‘s per-thread data support, then LIBCTINY isn‘t 
            for you. What I do is try LIBCTINY with a prospective program. If it 
            works, great! If not, I simply use the normal runtime library.
 Metadata Article CorrectionIn my October 
            2000 MSDN Magazine article "Avoiding 
            DLL Hell: Introducing Application Metadata in the Microsoft .NET 
            Framework," I said that using the Visual C++ 6.0 #import 
            directive causes the compiler to read in a COM type library and 
            generate ATL-ready header files for all the interfaces contained 
            within. While header files are generated by #import, it turns out 
            they don‘t use ATL.Richard 
            Grimes, author of Professional ATL COM 
            Programming (Wrox Press, 1998), kindly pointed out to me 
            that #import generates what Microsoft calls "compiler COM support 
            classes," which are supported by the COMDEF.H header. Richard goes 
            on to say, "There are many differences between the COM compiler 
            support classes and the equivalent in ATL. The most important is 
            that ATL does not use C++ exceptions. In fact, the ATL classes are 
            more lightweight than the COM compiler support classes and so I 
            would have preferred if Microsoft had decided to generate ATL 
            code."
 I have to confess that 
            I should have studied this more before I wrote it. My experience 
            with ATL is limited to the wizards in Visual C++, and tweaking the 
            resulting code. I have used #import on a few occasions, but not 
            enough to have made the connection that the resulting code wasn‘t 
            ATL. Thanks to Richard for pointing this out to me, and for giving 
            me even more incentive to verify everything before I write about 
          it.
 |