Borland/Embarcadero inline assembly experience
I happen to be a weirdo who likes to write the occasional inline assembly routine just for fun. Over the years I've learned some stuff about inline assembly in C++ Builder, and I figured that if I write it down I might remember it better - some of the things I've had to discover more than once :-) And since I'm at it, I figured someone else might find this useful, so here we go:
- BEWARE that references to AnsiStrings are pointers to possibly null pointers, remember to dereference, and remember to check for null.
- BEWARE of the reference count if you modify an AnsiString: It might not be 1 unless you provide a wrapper function that calls AnsiString::Unique().
- The string length is at offset -4. It is never 0; empty strings have a null pointer instead.
- Looping over the chars of an AnsiString is most efficiently done by reading the length, adding the length to the pointer and negating the length, like this in 32 bit:
MOV ECX, -4[EAX]
ADD EAX, ECX
NEG ECX
and like this in 64 bit:
MOV R8, -4[RCX]
ADD RCX, R8
NEG R8
The loop then indexes the chars with pointer + negated length and counts the negated length up until it reaches 0. This only requires 2 registers, and no CMP's. BUT: It leaves the string pointer at the string end, so it can't be reused, and the length can't be accessed again. If you need to use the string later it will still be more efficient to just store a copy of the string pointer or length instead of CMP'ing a counter to the string length in memory (which also only requires 2 registers).
- Some string ops can be done with XMM registers. Although somewhat cumbersome they do have the advantage of providing 6 or 8 extra registers and operating on 16 bytes simultaneously.
- Remember: Jcc's can be quite expensive. SETcc, CMOVcc, ADC and SBB enable many small tricks that can eliminate Jcc's. Though often at the expense of an extra register used :-(
- Stack usage: Beware of exception handling, especially in 64 bit code, and if accessing memory. You probably won't get into trouble in 32 bit code if you leave EBP alone.
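The negated length loop and the Jcc-free counting from the list above can be sketched in plain C++. This is only an illustration of the idea, not the assembly itself: count_char and its parameters are hypothetical names, and whether the compiler actually emits SETcc/ADC instead of a jump depends on the compiler and its settings.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts how often `wanted` occurs in the first
// `length` chars of `text`. A null `text` stands for an empty string.
std::size_t count_char(const char* text, std::size_t length, char wanted) {
    assert(text != nullptr || length == 0);
    if (text == nullptr) return 0;       // empty strings are null pointers

    const char* end = text + length;     // pointer parked at the string end
    std::ptrdiff_t index = -static_cast<std::ptrdiff_t>(length);  // negated length

    std::size_t count = 0;
    for (; index != 0; ++index) {        // counts up to zero, no compare against the length
        count += (end[index] == wanted); // comparison result added as 0 or 1
    }
    return count;
}
```

The loop condition is just "index reached zero", which is what the flags from an INC or ADD already tell you in assembly. The price is the same as in the assembly version: only the end pointer and the negated index are kept, so the original pointer and length have to be saved separately if they are needed afterwards.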
32 bit Borland x86
For pure assembly functions or methods use:
Type __declspec(naked) Function(Parameters)
REMEMBER THE RET INSTRUCTION!!! It's a nightmare debugging the behavior of a missing RET.
- The newer instruction sets aren't supported, so you'll have to make the instructions yourself by DB'ing the instruction bytes into the program. There's a tip for getting the addressing modes right: Make a supported instruction with the addressing mode you wish to use, look in the debugger for the right bytes and use them on the unsupported instruction. Also, the debugger at least supports a few more instructions than the assembler.
- You can't take the addresses of labels, so forget about DB, DW, DD and DQ'ing data into your routine. But you can get access to variables and constants in the C++ file outside your routine.
- You can't call class/object methods. You can't call labels. And forget virtual methods, there is no reliable way to get the vTable index. You CAN call non class/object functions. And you might be able to cheat: you can call addresses in a register, and you might be able to define function pointer constants outside your routine and use those. Another possibility is exiting the asm block and generating a C++ call like this: ((Type*) _EAX)->Method(_EDX, _ECX); This seems to generate a single call instruction to Method. Then you can begin another asm block right after the call.
- Know your calling conventions. Use __fastcall: 3 parameters in EAX, EDX, ECX. All other general purpose registers must be preserved. Further parameters on the stack. Return value in AL, AX, EAX or EDX:EAX or an x87 floating point register. BEWARE who is supposed to clean up any parameters on the stack.
- Hidden parameters: For object methods there is the this pointer as a hidden first parameter, in EAX for __fastcall. For return values that can't be returned in the above mentioned registers a hidden pointer is passed, as far as I remember after the last parameter. As far as I know this pointer isn't available in any way in C++, so you could use inline assembly to get at it.
- Be careful about MMX registers, they are shared with the x87 registers, BCC32 uses those.
- Use your SSE (XMM) registers if possible. BCC32 doesn't use them at all, so that's 8 extra registers with float, double, some 64 bit integer operations and 16 byte parallel string operations. BEWARE however that getting BCC32 to align your data seems impossible, and most SSE instructions require 16 byte alignment!
- Some routines might work for both signed and unsigned data, and/or pointers. You can make multiple entry points by declaring several prototypes and then using #pragma alias to connect them to the same routine. BEWARE that #pragma alias works with mangled names! EDIT: 10 Update 1 status: #pragma alias no longer works with bcc64 and doesn't work with the new Clang based bcc32c!
- Access object members by prefixing them with a ., like this: MOV ECX, .Member[EAX]. The member names must be unique! If in trouble a quick anonymous union might solve the problem.
- When making assembly macros it's a good idea to put a ; after each instruction, although it is possible to make multi line macros.
32 bit Clang x86
Should be largely the same as BCC64, but isn't quite: it has problems with Borland's mangled symbols and their @'s.
- Still uses Borland fastcall calling convention.
- Don't know about exceptions. Linker seems to still be the same, so exceptions might be as well.
- Don't know about XMM registers, seems likely that Clang/bcc32c uses them. EDIT: Actually BCC32C seems not to use them, at least not with default settings.
- Extended assembly can't preserve EBP - or ESP unsurprisingly.
- Extended assembly makes no attempt to preserve MMx or XMMx registers. Code still uses x87 for FP.
64 bit x64
asm(".intel_syntax;"
" Instruction 1;"
" Instruction 2;");
Yes, I know it's EXTREMELY annoying and UGLY. If you leave out the .intel_syntax you get .att_syntax, which, believe it or not, is even more EXTREMELY annoying and UGLY. EDIT: 10 Update 1 status: Error reporting in the message window seems to only be able to highlight the right line for an error if you use "\n" instead of ";".
For pure assembly methods use:
void __fastcall __declspec(naked) Function()
{
asm(".intel_syntax\n"
" Instruction 1\n"
" Instruction 2\n");
}
or use __attribute__((naked)) instead of __declspec(naked)
Currently there is no need to remember the RET instruction, the bugged implementation of the naked attribute adds one, but you might as well be safe, the worst that can happen is a 6.25 % chance of wasting 16 bytes because of alignment.
- The implementation of the naked attribute is bugged. In debug mode your function MUST be void Function(void), instructions for parameters and return values WILL be generated without the code to set up a stack frame, and they WILL trash your stack and destroy your return value. In release mode with no debugging info you'll be OK. This is not quite as bad as it sounds, you can use the 32 bit tip for using multiple entries to the same function: Put the correct prototypes in the header, then use #pragma alias to connect them to your implementation. Again BEWARE that #pragma alias works with mangled names! BEWARE that methods have this as a hidden parameter, and that return values that can't be returned in a register will also generate a hidden parameter! So stick with void Function(void)! EDIT: 10 Update 1 status: Still adds the RET. Internal compiler error if returning a pointer. Otherwise it seems the bugs are fixed.
- Know your calling convention: Parameters are passed in RCX/XMM0, RDX/XMM1, R8/XMM2, R9/XMM3. Return value in RAX. R10, R11, XMM4, XMM5 can be freely used. MMX and x87 registers should also all be available, but their use is not recommended. BEWARE of who is supposed to clean up any parameters on the stack. For methods this is a first hidden parameter. For functions with return values that can't be returned in a register a pointer to the return object/struct/union is a first hidden parameter. In a method with a non register return this is the first hidden parameter, the pointer to the return object/struct/union is the second.
- In BCC64 you can take the address of, and call your labels. You can also call mangled names, so that includes class methods, object methods and functions. Probably also provides access to global and class variables.
- Local labels of the form 0: 1: 2: 3: ... don't work in intel mode. Local symbols might, not sure yet. EDIT: Local labels do work in .intel_syntax, but you have to put () around their use, like (1b) to access the last 1: before. Local symbols with .LSymbol: seems to be OK too, and potentially quite useful, you can jump between functions. EDIT: 10 Update 1 status: Local labels no longer work in Intel syntax, even with () around.
- Stack use: BEWARE that when calling compiler generated routines that YOU are supposed to do part of the stack setup even if the parameters fit in registers! Also beware that the so called "Red Zone" is only available on Linux. Resist the temptation to free up registers by using the 32 bytes available to you on the stack, exceptions will skip the code to restore the registers. It should be OK to store temp values in the 32 bytes. BTW the 32 bytes should be addressed as [RSP+(8:32)].
- BEWARE of exception handling if you make calls (or access memory that might be invalid!)! There is apparently a Windows function you must call to register your function to get the stack properly unwound if a subroutine generates an exception! EDIT: On closer inspection it seems that you'll be OK so long as you don't mess with ANYTHING you're not allowed to, including the stack! No pushing to preserve registers! A normal function with extended inline assembly can be used if you need more registers. EDIT: You might be OK pushing to the stack as long as whatever you are pushing can't possibly be interpreted as a pointer into code (that is, as a call return value). In particular that means the following should always be OK: null, pointers to data, 1 and 2 byte values. No 4 or 8 byte values, you can't know the address of your code or runtime linked DLL's!
- BCC64 doesn't seem to have any way to compile to assembly, so the site http://gcc.godbolt.org/ might prove useful, it translates C++, including inline assembly, to assembly. If you select Ubuntu clang version 3.0.6 you should get a Clang of about the same vintage as the one BCC64 is based on. BEWARE: Not an exact clone of BCC64 though, for instance it handles clobber lists differently. EDIT: 10 Update 1 status: Borland Clang is up to 3.1, seems 3.2 is the best bet. BEWARE: bcc32c seems quite different from Clang because of the Borland extensions.
- A number of the condition codes for the CMOVcc, SETcc and Jcc instructions are missing! For instance z, nz, c, nc, where you have to use e, ne, b, ae instead. VERY VERY strange! EDIT: It's even worse, the SETcc and Jcc instructions are just missing, the CMOVcc instructions have errors for the memory to register versions, and their parameters reversed for the register to register versions! EDIT: 10 Update 1 status: SETcc, Jcc seem fixed, CMOVcc seems worse.
- There is another potentially simpler way to write pure simple assembly functions:
This way you don't need the #pragma alias, and any excess code being generated by the compiler. You also don't have a snowball's chance in hell of getting exception handling, so hands off the relevant registers, especially RSP. AND REMEMBER THE RET INSTRUCTION. EDIT: 10 Update 1 status: No longer works, at least not in Clang/bcc32c 32 bit mode.
- .loc directive can be used to generate information for debugging, including enabling breakpoints in inline assembly. Use a couple of macros to prefix to every assembly line:
#define IAsmStrify2(P1) #P1
#define IAsmStrify(P1) IAsmStrify2(P1)
#define AsL ".loc 1 " IAsmStrify(__LINE__) " 0\n"
asm(".intel_syntax\n"
AsL " CLR ESI\n"
AsL " TST EAX\n"
AsL " MOV EBX, ECX\n");
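The reason for the two IAsmStrify macros is that # stringifies its argument without expanding it first, so a single level would paste the text __LINE__ instead of the line number. A small sketch in standard C++ that makes the expansion visible as an ordinary string; loc_prefixed_nop, naive_line and IAsmStrifyDirect are hypothetical names added for the demonstration, and NOP just stands in for a real instruction:

```cpp
#include <string>

#define IAsmStrify2(P1) #P1
#define IAsmStrify(P1) IAsmStrify2(P1)
#define AsL ".loc 1 " IAsmStrify(__LINE__) " 0\n"

// Hypothetical one-level variant, for comparison only.
#define IAsmStrifyDirect(P1) #P1

// Returns the string the assembler would see for one prefixed line:
// ".loc 1 <line number> 0\n NOP\n"
inline std::string loc_prefixed_nop() {
    return AsL " NOP\n";
}

// Returns the literal text "__LINE__", because # does not expand its argument.
inline std::string naive_line() {
    return IAsmStrifyDirect(__LINE__);
}
```

Adjacent string literals are concatenated by the compiler, which is why AsL can simply be written in front of each instruction string.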
So the big one: extended assembly
Which specifies outputs, inputs and clobbers after the code like this:
asm("Instructions" : Outputs : Inputs : Clobbers);
Page created by: Sune E. M. Andersen 2015-01-18. Last updated: 2019-04-21.
- Many tips and warnings for simple inline assembly remain valid for extended!
- The usefulness in intel_syntax is severely reduced:
- No register or memory inputs or outputs can be used. The compiler insists on using att_syntax for the registers, and stuff like -40(%rsp) or %rcx is not tolerated in intel mode. EDIT: You can use "a", "b", "c", "d", "S", "D" for RAX, RBX, RCX, RDX, RSI, RDI registers, then you don't have to use %N to get access to them. Limits the compiler's optimization choices though.
- The clobber list seems to be useless, or maybe in my simple test the effect was simply optimized away. Either way beware if it tries to use a non existent stack frame! EDIT: Not so, further tests reveal that the clobber list works OK, and is implemented with PUSH'es. It just seems that in some situations the effect might get optimized away. EDIT 2: No pushes for XMM registers of course, SUB RSP,X + MOVAPS is used.
- And not only does the clobber mechanism work, it's aware of which registers are volatile, and it doesn't bother to preserve them. So you can use it to investigate the calling conventions of your architecture.
- However the clobber implementation isn't perfect: If it's activated it also PUSH'es RAX but doesn't bother to restore it if there's no return value.
- Constant inputs are possible, and useful: "i" (&Class::Member) provides offsets to object members. BEWARE that using the constant requires %c0 instead of %0, %0 inserts att_syntax constants with a $ prefix!
- This isn't necessarily as bad as it sounds, if you write entire routines in assembly you already know your inputs and outputs from the calling convention.
- .att_mode works for memory, constants and registers, although I'm not sure about the clobber list here either. EDIT: Works as expected.
- Remember: att_syntax requires extra % signs in extended assembly!
- Feeling tempted to use att syntax to access inputs and outputs, and intel syntax for the rest? No such luck, the .att_syntax directive doesn't work in BCC64. You start out in att syntax, but once you change to intel syntax you're stuck there. EDIT: 10 Update 1 status: You can now switch back to att_syntax.
- BEWARE of optimizations! Clang/BCC64 can inline functions with extended assembly, and it will optimize away any parameters and outputs you don't use. The volatile keyword might help.
- const AnsiString& parameters can be added as inputs to extended asm in 2 ways: Parm or &Parm. &Parm passes the reference directly, Parm dereferences it for you, but in either case you must remember to test for an empty string. Note that the Parm form might well have merit since it can be used directly without changing it, and thus it can be used as a pure input. See below.
- BEWARE: Officially you are not allowed to modify your extended assembly inputs, presumably not even temporarily if there's a chance of needing to handle exceptions! A simple solution is to specify your inputs as outputs as well, BCC64 will even allow you to specify a const value as an output! BEWARE that you are not allowed to use the obvious solution of specifying your inputs in the clobber list as well.
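A few of the points above (outputs, inputs and clobbers, an input that gets modified declared as an output, and %c for constants) can be sketched in att_syntax for a GCC or Clang style compiler on x86. This is a sketch under assumptions, not tested on BCC64: Sample, add_u32 and read_second are hypothetical names, offsetof is used here as a standard stand-in for the "i" (&Class::Member) form, and the #else branches are only there so the code also builds on other targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Sample {                 // hypothetical type for the offset demonstration
    std::uint32_t first;
    std::uint32_t second;
};

// Adds b to a. The ADD modifies a, so a is declared as an input/output
// ("+r") instead of as a plain input.
inline std::uint32_t add_u32(std::uint32_t a, std::uint32_t b) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    asm volatile("addl %1, %0"  // att_syntax: source first, destination last
                 : "+r"(a)      // outputs
                 : "r"(b)       // inputs
                 : "cc");       // clobbers: ADD changes the flags
#else
    a += b;                     // fallback for other targets
#endif
    return a;
}

// Reads Sample::second through a constant member offset passed as "i".
inline std::uint32_t read_second(const Sample* s) {
    assert(s != nullptr);
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    std::uint32_t value;
    asm volatile("movl %c1(%2), %0"  // %c1 prints the bare constant, %1 would print it with a $ prefix
                 : "=r"(value)
                 : "i"(offsetof(Sample, second)), "r"(s)
                 : "memory");
    return value;
#else
    return s->second;           // fallback for other targets
#endif
}
```

The volatile keeps the compiler from dropping the asm statements when it thinks the result is unused, which matches the optimization warning above.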