NAME
inline - in-line procedure call expander
DESCRIPTION
Assembly language call instructions are replaced by a copy
of their corresponding function body obtained from the
inline template (*.il) file.
Inline files have a suffix of .il,
for example: % CC foo.il hello.c
Inlining is done by the code generator (cg).
USAGE
Each inline file contains one or more labeled assembly
language templates of the form:
inline-directive
instructions
...
.end
where the instructions constitute an in-line expansion of
the named routine. An inline-directive is a command of the
form:
.inline identifier, argsize
This declares a block of code for the routine named by iden-
tifier, with argsize as the total size of the routine's
arguments, in bytes. Calls to the named routine are
replaced by the code in the in-line template.
NOTE:
The value of argsize is ignored but the argument should be
included for compatibility with compiler versions predating
the Sun WorkShop[tm] 5.0 compilers.
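As an illustration only, a minimal template might look like the
following sketch. The routine name my_add and its C prototype are
hypothetical, and the single instruction assumes the SPARC
conventions described under "Coding Conventions for SPARC Systems"
(first two arguments in %o0 and %o1, result in %o0). The argsize
of 8 stands for two 4-byte arguments and, as noted above, is
ignored. The use of ! to start a comment assumes that the template
file accepts the SPARC assembler's comment syntax.

```asm
! Hypothetical routine, assumed C prototype: int my_add(int a, int b);
	.inline my_add,8
	add	%o0,%o1,%o0	! a + b, result left in %o0
	.end
```

If this were saved in a file named my_add.il (an assumed name), it
would be given to the compiler together with the source that calls
my_add, in the same way as foo.il in the command line shown under
DESCRIPTION.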
Multiple templates are permitted; matching templates after
the first are ignored. Duplicate templates may be placed in
order of decreasing performance of the corresponding
hardware; thus the most efficient usable version will be
selected.
Coding Conventions for all Sun Systems
Inline templates should be coded as expansions of C-
compatible procedure calls, with the difference that the
return address cannot be depended upon to be in the expected
place, since no call instruction will have been executed.
Inline templates must conform to standard Sun parameter
passing and register usage conventions, as detailed below.
They must not call routines that violate these conventions;
for example, assembly language routines such as setjmp(3C)
may cause problems.
Registers other than the ones mentioned below must not be
used or set.
Branch instructions in an in-line template may only transfer
to numeric labels (1f, 2b, and so on) defined within the
in-line template. No other control transfers are allowed.
Templates do not need ret or retl instructions, and should
not include them.
Only opcodes and addressing modes generated by Sun compilers
are guaranteed to work. Binary encodings of instructions
are not supported.
Coding Conventions for SPARC Systems
Arguments are passed in registers %o0-%o5, followed by
memory locations starting at [%sp+0x5c]. %sp is guaranteed
to be 64-bit aligned. The contents of %o7 are undefined,
since no call instruction will have been executed.
Results are returned in %o0 or %f0/%f1.
Registers %o0-%o5 and %f0-%f31 may be used as temporaries.
Integral and single-precision floating-point arguments are
32-bit aligned.
Double-precision floating-point arguments are guaranteed to
be 64-bit aligned if their offsets are multiples of 8.
Each control-transfer instruction (branches and calls) must
be immediately followed by a nop.
Call instructions must include an extra (final) argument
which indicates the number of registers used to pass parame-
ters to the called routine.
Note that for SPARC systems, the instruction following an
expanded 'call' is deleted.
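The following sketch shows how several of these conventions fit
together. The routine name my_absdiff and its prototype are
hypothetical; the code assumes 32-bit signed integer arguments and
is not taken from libm.il or any shipped template. Arguments arrive
in %o0 and %o1, %o2 serves as a temporary, the branch targets a
numeric label defined inside the template and is followed by a nop,
and no ret or retl appears.

```asm
! Hypothetical routine, assumed C prototype: int my_absdiff(int a, int b);
! Leaves |a - b| in %o0 (overflow is not handled).
	.inline my_absdiff,8
	cmp	%o0,%o1		! compare a with b
	bge	1f		! a >= b: operands are already ordered
	nop			! required after every control transfer
	mov	%o0,%o2		! otherwise swap a and b through %o2
	mov	%o1,%o0
	mov	%o2,%o1
1:
	sub	%o0,%o1,%o0	! larger minus smaller, result in %o0
	.end
```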
Coding Conventions for x86 Systems
Arguments are passed on the stack. Since no call instruction
was issued, the first argument is at (%esp), the second
argument is at 4(%esp), etc. Integer results of 32 bits or
less are returned in %eax; 64-bit integer results are
returned in %edx:%eax. Floating point results are returned
in %st(0).
The code may use registers %eax, %ecx and %edx. The values
in any other registers must be preserved. The floating point
stack will be empty at the start of the inline expansion
template, and must be empty (except for a returned floating
point value) at the end.
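A minimal sketch of a template that follows these conventions is
shown below. The routine name my_add and its prototype are
hypothetical. Both arguments are read from the stack relative to
%esp, only %eax is modified, and the result is left in %eax. The
comment lines assume that the template file accepts the Solaris x86
assembler's / comment character.

```asm
/ Hypothetical routine, assumed C prototype: int my_add(int a, int b);
/ a is at (%esp), b is at 4(%esp); the sum is returned in %eax.
	.inline my_add,8
	movl	(%esp),%eax
	addl	4(%esp),%eax
	.end
```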
SPECIAL x86 NOTE
Programs compiled with -xarch={sse|sse2} to run on Solaris
x86 SSE/SSE2 Pentium 4-compatible platforms must be run only
on platforms that are SSE/SSE2 enabled. Running such pro-
grams on platforms that are not SSE/SSE2-enabled could
result in segmentation faults or incorrect results occurring
without any explicit warning messages. Patches to the OS and
compilers to prevent execution of SSE/SSE2-compiled binaries
on platforms not SSE/SSE2-enabled might be made available at
a later date.
OS releases starting with Solaris 9 update 6 are SSE/SSE2-
enabled on Pentium 4-compatible platforms. Earlier versions
of Solaris OS are not SSE/SSE2-enabled.
This warning extends also to programs that employ .il inline
assembly language functions or __asm() assembler code that
utilize SSE/SSE2 instructions.
If you compile and link in separate steps, always link using
the compiler and with -xarch={sse|sse2} to ensure that the
correct startup routine is linked.
Coding Conventions for AMD-64 Platforms
Arguments are passed according to their classification. The
classes are integer, sse and memory arguments.
Arguments of types (signed and unsigned) _Bool, char, short,
int, long, long long and pointers are integer arguments.
Arguments of aggregate types (struct, union, array) of size
less than or equal to 16 bytes and that contain aligned
members of types _Bool, char, short, int, long, long long
and pointers are also integer.
Arguments of types float and double are sse arguments.
Arguments of aggregate types of size less than or equal to
16 bytes and that contain aligned members of types float and
double are also sse.
Arguments of type long double, and of aggregate types of
size greater than 16 bytes or with unaligned members, are
memory arguments.
Integer arguments are passed in integer registers in the
following order: %rdi, %rsi, %rdx, %rcx, %r8 and %r9. One
integer argument of aggregate type can occupy up to 2 integer
registers. If there are more than 6 integer arguments, the
7th and subsequent integer arguments are treated as memory
arguments.
Sse arguments are passed in sse registers in order from
%xmm0 to %xmm7. One sse argument of aggregate type can occupy
up to 2 sse registers; each sse register holds up to 8 bytes
of the argument. For example, an argument of type double
complex is passed in 2 consecutive sse registers, and an
argument of type float complex is passed in 1 sse register.
If there are more than 8 sse arguments, the 9th and
subsequent sse arguments are treated as memory arguments.
Integer and sse arguments are numbered independently.
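As a sketch of register argument passing, the two hypothetical
templates below assume the prototypes given in their comments. In
my_add64 both arguments are integer arguments, so they arrive in
%rdi and %rsi. In my_addd the first argument is an integer argument
and the other two are sse arguments; because the two classes are
numbered independently, x and y arrive in %xmm0 and %xmm1 even
though they are the second and third arguments. Results are left in
%rax and %xmm0 respectively, following the rules for results given
later in this section. The comment lines assume that the template
file accepts the Solaris x86 assembler's / comment character.

```asm
/ Hypothetical routine, assumed C prototype: long my_add64(long a, long b);
/ a is in %rdi, b is in %rsi; the sum is returned in %rax.
	.inline my_add64,16
	movq	%rdi,%rax
	addq	%rsi,%rax
	.end

/ Hypothetical routine, assumed C prototype:
/     double my_addd(long n, double x, double y);
/ n is in %rdi (unused), x is in %xmm0, y is in %xmm1;
/ the sum x + y is returned in %xmm0.
	.inline my_addd,24
	addsd	%xmm1,%xmm0
	.end
```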
Memory arguments are passed on the stack in right-to-left
order of their appearance in the function's argument list.
Each argument on the stack is aligned according to its size:
on 8 bytes if its size is less than or equal to 8, on 16
bytes otherwise. At the start of the inline expansion
template, the stack is aligned on 16 bytes.
Since no call instruction was issued, the first memory argu-
ment is at (%rsp), the second argument is at 8(%rsp) or at
16(%rsp) depending on the first memory argument size and the
second memory argument alignment, etc.
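For instance, a routine with seven integer arguments receives the
first six in registers and the seventh as a memory argument. The
sketch below, with a hypothetical routine name and prototype,
returns that seventh argument; since it is the first (and only)
memory argument, it is read from (%rsp). The result is left in %rax,
as described for integer results later in this section.

```asm
/ Hypothetical routine, assumed C prototype:
/     long my_seventh(long a1, long a2, long a3, long a4,
/                     long a5, long a6, long a7);
/ a1..a6 are in %rdi, %rsi, %rdx, %rcx, %r8 and %r9.
/ a7 is the first memory argument, at (%rsp).
	.inline my_seventh,56
	movq	(%rsp),%rax
	.end
```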
Return values are classified in the same way as arguments.
Integer results of 8 bytes or less are returned in %rax,
integer results of 9 to 16 bytes are returned in %rdx:%rax.
Sse results are likewise returned according to their size, in
%xmm0 or in %xmm1:%xmm0.
Results of type long double are returned in %st(0).
If the return value is of type long double complex, the real
part of the value is returned in %st(0) and the imaginary
part in %st(1).
For memory results the caller provides space for the return
value and passes the address of this storage in %rdi as if
it were the first argument to the function. In effect, this
address becomes a hidden first argument. On return %rax
will contain the address that has been passed in by the
caller in %rdi.
The code may not change register %rbp. The floating point
stack will be empty at the start of the inline expansion
template, and must be empty (except for a returned floating
point value) at the end.
EXAMPLES
Please review libm.il or vis.il for examples. You can find a
version of these libraries that is specific to each sup-
ported architecture under the compiler's lib/ directory.
WARNING
inline does not check for violations of the coding conven-
tions described above.
SEE ALSO:
"Techniques for Optimizing Applications: High Performance
Computing" by Rajat P. Garg and Ilya Sharapov uses Fortran
to provide a useful explanation of inline templates. See
Chapter 8.
"The SPARC Architecture Manual Version 9" provided by SPARC
International Inc. at http://www.sparc.com/resource.htm. See
appendix G.