Borland/Embarcadero inline assembly experience
I happen to be a weirdo who likes to write the occasional inline
assembly routine just for fun. Over the years I've learned some
stuff about inline assembly in C++ Builder, and I figured that if
I write it down I might remember it better - some of the things
I've had to discover more than once :-) And since I'm
at it I figured maybe someone else might find this useful, so here
it is.
- BEWARE that references to AnsiStrings are pointers to
possibly null pointers, remember to dereference, and
remember to check for null.
- BEWARE of the reference count if you modify an
AnsiString: It might not be 1 unless you provide a wrapper
function that calls AnsiString::Unique().
- The string length is at offset -4. It is never 0; empty
strings have a null pointer instead.
- Looping over the chars of an AnsiString is most
efficiently done by reading the length, adding the length to
the pointer and negating the length, like this in 32 bit code:
MOV ECX, -4[EAX]
ADD EAX, ECX
NEG ECX
and like this in 64 bit code:
MOV R8, -4[RCX]
ADD RCX, R8
NEG R8
The loop then indexes the chars relative to the string end and
counts the negative index up until it reaches 0.
This only requires 2 registers, and no CMP's. BUT: It leaves
the string pointer at the string end, so it can't be reused,
and the length can't be accessed again. If you need to use
the string later it will still be more efficient to just
store a copy of the string pointer or length instead of
CMP'ing a counter to the string length in memory (which also
only requires 2 registers).
- Some string ops can be done with XMM registers. Although
somewhat cumbersome they do have the advantage of providing
6 or 8 extra registers and operating on 16 bytes at a time.
- Remember: Jcc's can be quite expensive. SETcc, CMOVcc, ADC and
SBB enable many small tricks that can eliminate Jcc's. Though
often at the expense of an extra register used :-(
- Stack usage: Beware of exception handling, especially in 64
bit code, and if accessing memory. You probably won't get into
trouble in 32 bit code if you leave EBP alone.
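The negative index loop and the "no Jcc in the loop body" idea from the list above can be sketched in plain C++. This is only an illustration under assumptions: CountChar is a hypothetical helper, it takes a raw pointer and length instead of an AnsiString, and whether the compiler really emits SETcc/ADC style code for it is up to the optimizer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts how many bytes in [text, text + length)
// equal `wanted`. Not part of any Borland/Embarcadero API.
std::size_t CountChar(const char* text, std::size_t length, char wanted)
{
    // An empty AnsiString is a null pointer, so a null pointer is only
    // acceptable together with a length of 0.
    assert(text != nullptr || length == 0);
    if (length == 0)
        return 0;

    const char* end = text + length;                               // ADD EAX, ECX
    std::ptrdiff_t index = -static_cast<std::ptrdiff_t>(length);   // NEG ECX
    std::size_t count = 0;

    // The index runs from -length up to 0, so the loop test is a plain
    // "did the increment reach zero" check and needs no separate CMP.
    for (; index != 0; ++index)
    {
        // Adding the comparison result (0 or 1) avoids a conditional jump
        // in the loop body, in the spirit of SETcc/ADC.
        count += static_cast<std::size_t>(end[index] == wanted);
    }
    return count;
}
```

Calling CountChar("banana", 6, 'a') walks the six chars once and adds up the three matches. As in the assembly version, the original pointer is only reachable afterwards because the caller still holds it.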
32 bit Borland x86
For pure assembly functions or methods use:
Type __declspec(naked) Function(Parameters)
REMEMBER THE RET INSTRUCTION!!! It's a nightmare debugging
the behavior of a missing RET.
- The newer instruction sets aren't supported, so you'll have to
make them yourself by DB'ing the instruction bytes into the
program. There's a tip for getting the addressing modes right:
Make a supported instruction with the addressing mode you wish
to use, look in the debugger for the right bytes and use them on
the unsupported instruction. Also, at least the debugger supports
a few more instructions than the assembler.
- You can't take the addresses of labels, so forget about DB,DW,
DD and DQ'ing data into your routine. But you can get access to
variables and constants in the C++ file outside your routine.
- You can't call class/object methods. You can't call labels.
And forget virtual methods, there is no reliable way to get the
vTable index. You CAN call non class/object functions. And you
might be able to cheat, you can call addresses in a register,
and you might be able to define function pointer constants
outside your routine and use those. Another possibility is
exiting the asm block and generating a C++ call like
this: ((Type*) _EAX)->Method(_EDX, _ECX);. This seems to
generate a single call instruction to Method. Then you can begin
another asm block right after the call.
- Know your calling conventions. Use __fastcall: 3 parameters in
EAX, EDX, ECX. All other general purpose registers must be
preserved. Further parameters on the stack. Return value in AL,
AX, EAX or EDX:EAX or an x87 floating point register. BEWARE who
is supposed to clean up any parameters on the stack.
- Hidden parameters: For object methods there is the this
pointer as a hidden first parameter, in EAX for __fastcall. For
return values that can't be returned in the above mentioned
registers a hidden pointer is passed, as far as I remember after
the last parameter. As far as I know this pointer isn't
available in any way in C++, so you could use inline assembly to
get at it.
- Be careful about MMX registers, they are shared with the x87
registers, BCC32 uses those.
- Use your SSE (XMM) registers if possible. BCC32 doesn't use
them at all, so that's 8 extra registers with float, double,
some 64 bit integer operations and 16 byte parallel string
operations. BEWARE however that getting BCC32 to align your data
seems impossible, and most SSE instructions require 16 byte
alignment.
- Some routines might work for both signed and unsigned data,
and/or pointers. You can make multiple entry points by declaring
several prototypes and then using #pragma alias to connect them
to the same routine. BEWARE that #pragma alias works with
mangled names! EDIT: 10 Update 1 status: #pragma alias
no longer works with bcc64 and doesn't work with the new Clang
- Access object members by prefixing them with a ., like
this: MOV ECX, .Member[EAX]. The member names must be unique! If
in trouble a quick anonymous union might solve the problem.
- When making assembly macros it's a good idea to put a ; after
each instruction, although it is possible to make multi line
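One possible workaround for the alignment problem mentioned in the list above is to over-allocate a buffer and round the pointer up by hand. The sketch below is written in standard C++11 and is untested with BCC32; AlignUp and AlignedScratch are hypothetical names made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: rounds `p` up to the next multiple of `alignment`,
// which must be a power of two.
inline void* AlignUp(void* p, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    std::uintptr_t address = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment - 1);
    address = (address + mask) & ~mask;
    return reinterpret_cast<void*>(address);
}

// Hypothetical scratch buffer: holds 15 spare bytes so that a 16 byte
// aligned block of `Size` bytes always fits somewhere inside `raw`.
template <std::size_t Size>
struct AlignedScratch
{
    unsigned char raw[Size + 15];

    // Returns the first 16 byte aligned address inside `raw`.
    void* data() { return AlignUp(raw, 16); }
};
```

A local AlignedScratch<64> then gives 64 bytes whose start address is a multiple of 16, wherever the compiler happened to place the object. The price is 15 wasted bytes per buffer.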
32 bit Clang x86
Should be largely the same as BCC64, but isn't quite: it has
problems with Borland's mangled symbols and their @'s.
- Still uses Borland fastcall calling convention.
- Don't know about exceptions. Linker seems to still be the
same, so exceptions might be as well.
- Don't know about XMM registers, seems likely that Clang/bcc32c
uses them. EDIT: Actually BCC32C seems not to use them,
at least not with default settings.
- Extended assembly can't preserve EBP - or ESP unsurprisingly.
- Extended assembly makes no attempt to preserve MMx or XMMx
registers. Code still uses x87 for FP.
64 bit x64
asm(".intel_syntax\n"
" Instruction 1;"
" Instruction 2;");
Yes, I know it's EXTREMELY annoying and UGLY. If you leave out
the .intel_syntax you get .att_syntax, which, believe it or not, is
even more EXTREMELY annoying and UGLY. EDIT: 10 Update 1
status: Error reporting in the message window seems to only be able
to highlight the right line for an error if you use "\n" instead of
";" to end each instruction.
For pure assembly methods use:
void __fastcall __declspec(naked) Function()
{
asm(".intel_syntax\n"
" Instruction 1\n"
" Instruction 2\n");
}
or use __attribute__((naked)) instead of __declspec(naked)
Currently there is no need to remember the RET instruction, the
bugged implementation of the naked attribute adds one, but you might
as well be safe, the worst that can happen is a 6.25 % chance of
wasting 16 bytes because of alignment.
- The implementation of the naked attribute is bugged. In debug
mode your function MUST be void Function(void), instructions for
parameters and return values WILL be generated without the code
to setup a stack frame, and they WILL thrash your stack and
destroy your return value. In release mode with no debugging
info you'll be OK. This is not quite as bad as it sounds, you
can use the 32 bit tip for using multiple entries to the same
function: Put the correct prototypes in the header, then use
#pragma alias to connect them to your implementation. Again
BEWARE that #pragma alias works with mangled names! BEWARE that
methods have this as a hidden parameter, and that return values
that can't be returned in a register will also generate a hidden
parameter! So stick with void Function(void)! EDIT: 10
Update 1 status: Still adds the RET. Internal compiler error if
returning a pointer. Otherwise seems the bugs are fixed.
- Know your calling convention: Parameters are passed in
RCX/XMM0, RDX/XMM1, R8/XMM2, R9/XMM3. Return value in RAX. R10,
R11, XMM4, XMM5 can be freely used. MMX and x87 registers should
also all be available, but their use is not recommended. BEWARE
of who is supposed to clean up any parameters on the stack. For
methods the this pointer is a first hidden parameter. For functions
with return values that can't be returned in a register a
pointer to the return object/struct/union is a first hidden
parameter. In a method with a non register return the this pointer
is the first hidden parameter, the pointer to the return
object/struct/union is the second.
- In BCC64 you can take the address of, and call your labels.
You can also call mangled names, so that includes class methods,
object methods and functions. Probably also provides access to
global and class variables.
- Local labels of the form 0: 1: 2: 3: ... don't work in intel
mode. Local symbols might, not sure yet. EDIT: Local
labels do work in .intel_syntax, but you have to put () around
their use, like (1b) to access the last 1: before. Local symbols
with .LSymbol: seem to be OK too, and potentially quite useful,
you can jump between functions. EDIT: 10 Update 1
status: Local labels no longer work in Intel syntax, even with ().
- Stack use: BEWARE that when calling compiler generated
routines YOU are supposed to do part of the stack setup
even if the parameters fit in registers! Also beware that the so
called "Red Zone" is only available on Linux. Resist the
temptation to free up registers by using the 32 bytes available
to you on the stack, exceptions will skip the code to restore
the registers. It should be OK to store temp values in the 32
bytes. BTW the 32 bytes should be addressed as [RSP+(8:32)].
- BEWARE of exception handling if you make calls (or access
memory that might be invalid!)! There is apparently a windows
function you must call to register your function to get the
stack properly unwound if a sub routine generates an exception!
EDIT: On closer inspection it seems that you'll be OK so
long as you don't mess with ANYTHING you're not allowed to,
including the stack! No pushing to preserve registers! A normal
function with extended inline assembly can be used if you need
more registers. EDIT: You might be OK pushing to the
stack as long as what ever you are pushing can't possibly be
interpreted as a pointer into code (that is, as a call return
value). In particular that means the following should always be
ok: null, pointers to data, 1 and 2 byte values. No 4 or 8 byte
values, you can't know the address of your code or runtime
- BCC64 doesn't seem to have any way to compile to assembly, so
the site http://gcc.godbolt.org/
might prove useful, it translates C++, including inline
assembly, to assembly. If you select Ubuntu clang version
3.0.6 you should get a Clang of about the same vintage as the one
BCC64 is based on. BEWARE: Not an exact clone of BCC64 though,
for instance it handles clobber lists differently. EDIT: 10
Update 1 status: Borland Clang is up to 3.1, seems 3.2 is best
bet. BEWARE: bcc32c seems quite different from Clang because of
the Borland extensions.
- A number of the condition codes for the CMOVcc, SETcc and Jcc
instructions are missing! For instance z, nz, c, nc, where you
have to use e, ne, b, ae instead. VERY VERY strange! EDIT:
It's even worse, the SETcc and Jcc instructions are just
missing, the CMOVcc instructions have errors for the memory to
register versions, and their parameters reversed for
register to register versions! EDIT: 10 Update 1
status: SETcc and Jcc seem fixed, CMOVcc seems worse.
- There is another potentially simpler way to write pure simple
This way you don't need the #pragma alias, and any excess code
being generated by the compiler. You also don't have a
snowball's chance in hell of getting exception handling, so
hands off the relevant registers, especially RSP. AND
REMEMBER THE RET INSTRUCTION. EDIT: 10 Update 1 status: No
longer works, at least not in Clang/bcc32c 32 bit mode.
- .loc directive can be used to generate information for
debugging, including enabling breakpoints in inline assembly.
Use a couple of macros to prefix to every assembly line:
#define IAsmStrify2(P1) #P1
#define IAsmStrify(P1) IAsmStrify2(P1)
#define AsL ".loc 1 " IAsmStrify(__LINE__) " 0\n"
AsL " CLR ESI\n"
AsL " TST EAX\n"
AsL " MOV EBX, ECX\n");
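What the AsL macro actually expands to can be checked in plain C++ without any assembly at all. In the sketch below the three macros are the ones from the text; IAsmStrifyDirect, LocPrefixHere and DirectStrify are hypothetical names added only to show why the stringify macro needs two levels.

```cpp
#include <cassert>
#include <string>

#define IAsmStrify2(P1) #P1
#define IAsmStrify(P1) IAsmStrify2(P1)
#define AsL ".loc 1 " IAsmStrify(__LINE__) " 0\n"

// Hypothetical one-level variant, shown only for comparison.
#define IAsmStrifyDirect(P1) #P1

// Returns the text AsL expands to and reports the source line it was
// expanded on, so the caller can compare the two.
inline std::string LocPrefixHere(int& line)
{
    line = __LINE__; const char* text = AsL;   // same source line for both
    return text;
}

// With only one macro level, '#' stringifies the argument before it is
// expanded, so the result is the literal text "__LINE__".
inline std::string DirectStrify()
{
    return IAsmStrifyDirect(__LINE__);
}
```

LocPrefixHere returns ".loc 1 N 0\n" where N is the line number it reports, which is the prefix each assembly line gets. DirectStrify returns the string "__LINE__", which is why the extra IAsmStrify2 level is there.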
So the big one: extended assembly
This form specifies outputs, inputs and clobbers after the code, like this:
asm("Instructions" : Outputs : Inputs : Clobbers);
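A minimal sketch of the three sections, assuming a GCC or Clang style compiler targeting x86 and the default att_syntax. It is untested with BCC64, AddWithAsm is a hypothetical name, and the plain C++ fallback is there only so the example still compiles elsewhere.

```cpp
#include <cassert>

// Hypothetical example: returns a + b, using one extended asm statement
// where the toolchain is known to accept this form.
int AddWithAsm(int a, int b)
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    int result;
    asm("movl %1, %0\n\t"      // result = a
        "addl %2, %0"          // result += b
        : "=&r"(result)        // outputs: %0, early clobber since it is
                               // written before %2 has been read
        : "r"(a), "r"(b)       // inputs: %1 and %2, left unmodified
        : "cc");               // clobbers: ADD changes the flags
    return result;
#else
    return a + b;              // fallback for other compilers and targets
#endif
}
```

The compiler picks the registers for %0, %1 and %2 itself, which is the main difference from simple inline assembly where you choose them.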
Page created by: Sune E. M. Andersen 2015-01-18.
Last updated: 2023-08-18.
- Many tips and warnings for simple inline assembly remain valid
- The usefulness in intel_syntax is severely reduced:
- No register or memory inputs or outputs can be used. The
compiler insists on using att_syntax for the registers, and
stuff like -40(%rsp) or %rcx is not tolerated in intel
mode. EDIT: You can use "a", "b", "c", "d", "S", "D"
for RAX, RBX, RCX, RDX, RSI, RDI registers, then you don't
have to use %N to get access to them. Limits the compilers
optimization choices though.
- The clobber list seems to be useless, or maybe in my
simple test the effect was simply optimized away. Either way
beware if it tries to use a non existent stack frame! EDIT:
Not so, further tests reveal that the clobber list works
OK, and is implemented with PUSH'es. It just seems that in
some situations the effect might get optimized away. EDIT
2: No pushes for XMM registers of course, SUB RSP,X +
MOVAPS is used.
- And not only does the clobber mechanism work, it's aware
of which registers are volatile, and it doesn't bother to
preserve them. So you can use it to investigate the calling
conventions of your architecture.
- However the clobber implementation isn't perfect: If it's
activated it also PUSH'es RAX but doesn't bother to restore
it if there's no return value.
- Constant inputs are possible, and useful: "i"
(&Class::Member) provides offsets to object members.
BEWARE that using the constant requires %c0 instead of %0, %0
inserts att_syntax constants with a $ prefix!
- This isn't necessarily as bad as it sounds, if you write
entire routines in assembly you already know your inputs and
outputs from the calling convention.
- att_syntax works for memory, constants and registers, although
I'm not sure about the clobber list here either. EDIT:
Works as expected.
- Remember: att_syntax requires extra % signs in extended assembly.
- Feeling tempted to use att syntax to access inputs and
outputs, and intel syntax for the rest? No such luck, the
.att_syntax directive doesn't work in BCC64. You start out in
att syntax, but once you change to intel syntax you're stuck
there. EDIT: 10 Update 1 status: You can now switch back
- BEWARE of optimizations! Clang/BCC64 can inline functions with
extended assembly, and it will optimize away any parameters and
outputs you don't use. The volatile keyword might help.
- const AnsiString& parameters can be added as inputs to
extended asm in 2 ways: Parm or &Parm. &Parm passes the
reference directly, Parm dereferences it for you, but in either
case you must remember to test for an empty string. Note that
the Parm form might well have merit since it can be used
directly without changing it, and thus it can be used as a pure
input. See below.
- BEWARE: Officially you are not allowed to modify your extended
assembly inputs, presumably not even temporarily if there's
a chance of needing to handle exceptions! A simple solution is to
specify your inputs as outputs as well, BCC64 will even allow
you to specify a const value as an output! BEWARE that you are
not allowed to use the obvious solution of specifying your
inputs in the clobber list as well.
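The "specify your inputs as outputs as well" trick from the last point can be sketched with read-write operands. This assumes a GCC or Clang style compiler on x86 with att_syntax and is untested with BCC64; SumDown is a hypothetical example function, and the C++ loop is a fallback for other toolchains.

```cpp
#include <cassert>

// Hypothetical example: returns n + (n-1) + ... + 1, or 0 when n <= 0.
// The asm loop counts its own input down, so the input is declared as a
// read-write operand instead of a plain input.
int SumDown(int n)
{
    int total = 0;
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    if (n > 0)
    {
        // volatile is not strictly needed here because both outputs are
        // used afterwards; it is included because it is the keyword that
        // keeps an asm statement from being optimized away.
        asm volatile(
            "1:\n\t"
            "addl %1, %0\n\t"   // total += n
            "decl %1\n\t"       // --n, sets ZF when n reaches 0
            "jnz 1b"            // loop while n != 0
            : "+r"(total), "+r"(n)   // both are read AND written
            :                        // no pure inputs
            : "cc");                 // the flags are modified
    }
#else
    for (; n > 0; --n)
        total += n;
#endif
    return total;
}
```

Since n is passed by value, the caller never sees the modified copy. The "+r" only tells the compiler that the register holding n no longer contains the original value after the asm statement.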