NAME
bit - Binary Improvement Tool
SYNOPSIS
bit instrument [ general_option|instrument_option]... target
bit analyze [ general_option|analyze_option]... target
bit optimize [ general_option|optimize_option]... target
bit collect [ general_option|optimize_option|analyze_option
]... target [ target_arguments ]
bit -V
bit -?
Where:
general_option:= {-i {on|static|off}|-d directory|-s suffix
|-n|-V|-v}
instrument_option:= {-m {on|off}|-b suffix}
optimize_option:= {-O {0|1|2}|-f|-Q{y|n}|
-xinline[= v[,v]...]}
analyze_option:= {-o filename|-o experiment-name.er |-e|-E
filterspec| -C comment |-g group_name|-A {on|off|copy}|-a report}
report:={ifreq[={N|function}]|cc[={N|function}]|bbc[={N|function}]|
branch[={N|function}]}
AVAILABILITY
***********************************************************
SUN PROPRIETARY AND CONFIDENTIAL -- FOR INTERNAL USE ONLY
***********************************************************
Internal access, Sun Studio 11
DESCRIPTION
bit is a suite of tools for improving binaries. These tools
are used via four subcommands:
The instrument subcommand instruments a binary (the target)
so that when the instrumented target is run, it creates an
instrumentation data file with information about the execu-
tion of target.
The analyze subcommand uses the instrumentation data to produce
reports on instruction execution. Analysis reports can
be generated as ASCII text, or as an experiment. An experiment
may be examined with a GUI (analyzer) or a command-line
program (er_print).
The optimize subcommand uses the instrumentation data to
optimize target.
The collect subcommand combines all the other subcommands
with a target run. It first instruments target, then runs
it, passing target_arguments at run time. Then it analyzes
it if certain analyze_options are present on the command
line, and/or optimizes it, if certain optimize_options are
present on the command line.
target is the path name of the executable for which you want
to collect performance data. (bit will not search PATH to
find target.) target must be compiled at optimization level
-xO1 or greater, and must be prepared by compiling with
-xbinopt=prepare. To see annotated source when viewing the
experiment, compile target with the -g flag and do not strip
it.
Example Use
Typically one would use the commands in a sequence to
instrument, run, and analyze or optimize. To create a simple
experiment, use this sequence of commands:
bit instrument a.out
a.out.instr < input1
bit analyze -e a.out
Or to optimize a binary:
bit instrument a.out
a.out.instr < input1
bit optimize a.out
The examples above can be rewritten more simply using the
collect subcommand:
bit collect -e a.out < input1
and
bit collect -O1 a.out < input1
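The relationship between the three-step sequence and the collect shorthand can be sketched in Python. This is only an illustration: the helper names manual_sequence and collect_shorthand are hypothetical, the instrumented binary name assumes the default ".instr" suffix, and the functions only build command lines as lists without running bit. Input redirection such as "< input1" is left to the shell.

```python
def manual_sequence(target, final_step, instr_suffix=".instr"):
    """Return the three command lines of the manual workflow.

    final_step is the last bit invocation without the target, for
    example ["analyze", "-e"] or ["optimize"]. Hypothetical helper.
    """
    return [
        ["bit", "instrument", target],  # creates target + instr_suffix
        [target + instr_suffix],         # training run writes the data file
        ["bit", *final_step, target],   # analyze or optimize using the data
    ]


def collect_shorthand(target, options):
    """Return the single collect command that replaces the sequence."""
    return ["bit", "collect", *options, target]
```

For example, manual_sequence("a.out", ["analyze", "-e"]) corresponds to the first sequence above, and collect_shorthand("a.out", ["-e"]) to its one-line equivalent.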
OPTIONS
If invoked with no arguments, print a usage message.
General options
-i {on|static|off}
Instrumentation data source. This option is not allowed
in the instrument subcommand.
on Use dynamic instrumentation data. This is the
default mode. The data is stored in a file called
target.instrdata by default.
off Do not use instrumentation data. This is only
applicable in the optimize subcommand.
static
Analyze an executable statically. Every instruc-
tion in the program is assumed to execute once.
This is only applicable in the analyze or collect
subcommands. In the collect subcommand, disable
the instrument and target-run phases.
-d directory
Place the experiment, the instrumented binary, and the
instrumentation data file in directory. By default,
use the current working directory.
-s suffix
Add suffix to target to form the instrumentation data file
name. The default is ".instrdata".
-n Print the commands that would be run without executing
them.
-V Print the current version. Do not examine further
arguments, and do no further processing.
-v Print the current version and verbose information about
the commands being executed.
Instrument options
-m {on|off}
-m on means instrument for multithreading. Any mul-
tithreaded application should be instrumented with this
option. The default is on. Turning it off for single-
threaded applications may result in faster instrumenta-
tion runs.
-b suffix
Add suffix to target to form the instrumented binary name.
The default is ".instr".
Optimize options
These options are valid for the optimize and collect subcom-
mands.
In the collect subcommand, -O and -f turn on optimizations.
If neither of these options is present, any other optimize
options are ignored.
-O{0|1|2}
Optimize target. The optimized target overwrites the
original. At level 0, no optimizations are performed.
At level 1, do code reordering optimizations. At level
2, data-flow information is constructed and more
aggressive optimizations like inlining and address
related optimizations are performed.
The default optimization level is 1 when the optimize
subcommand is used. There is no default when the
collect subcommand is used; a -On must be given to turn
on optimization.
-f Finalize the output binary so that no more binary
optimizations may be performed.
-Q{y|n}
If -Qy is used, identification information is added to
the output binary. If -Qn is used, this information is
not added. -Qy is the default.
-xinline[=v[,v]...]
where v is {%auto|func_name|no%func_name}.
Inline only those functions specified in the list. The
list consists of either a comma-separated list of
function names, or a comma-separated list of
no%func_name values, or the value %auto. If
no%func_name is specified, do not inline func_name. If
%auto is specified, attempt to automatically inline
functions.
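As a rough illustration of how a -xinline value list breaks down, the following Python sketch splits the option into its three kinds of values. The function name parse_xinline and the returned dictionary layout are hypothetical. The sketch accepts mixed lists and a bare -xinline without complaint, although this page does not say how bit itself treats those cases.

```python
def parse_xinline(option):
    """Split a -xinline option string into its component values."""
    if not option.startswith("-xinline"):
        raise ValueError("not a -xinline option")
    _, _, values = option.partition("=")
    result = {"auto": False, "inline": [], "no_inline": []}
    for value in filter(None, values.split(",")):
        if value == "%auto":
            result["auto"] = True                  # automatic inlining requested
        elif value.startswith("no%"):
            result["no_inline"].append(value[3:])  # strip the "no%" prefix
        else:
            result["inline"].append(value)         # plain function name
    return result
```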
Analyze options
These options are valid in the analyze and collect subcom-
mands. In the collect subcommand, -e, -E, and -a turn on
analysis. If none of these options are present, any other
analyze options are ignored.
At least one -e or -E option must be provided
in order to create an experiment.
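The way option presence selects the phases of a collect run can be sketched as follows. This is a simplified model rather than bit's actual logic: collect_phases is a hypothetical name, the analyze-before-optimize ordering is an assumption, and option arguments are not validated.

```python
def collect_phases(options):
    """Return the phases a 'bit collect' run would go through.

    options is the option part of the command line as a list of
    strings, for example ["-e", "-O1"] or ["-i", "static", "-a", "ifreq"].
    """
    # -i static disables the instrument and target-run phases.
    static = any(
        opt == "-i" and arg == "static"
        for opt, arg in zip(options, options[1:])
    )
    phases = [] if static else ["instrument", "run"]
    # -e, -E, and -a turn on analysis.
    if any(opt in ("-e", "-E", "-a") for opt in options):
        phases.append("analyze")
    # -O{0|1|2} and -f turn on optimization.
    if any(opt == "-f" or opt.startswith("-O") for opt in options):
        phases.append("optimize")
    return phases
```

Under this model, a command line carrying only -C comment would instrument and run the target but neither analyze nor optimize it.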
-o filename
If filename does not end in ".er", write textual
reports (see -a report below) to filename. Multiple -o
filename options may be given; each one affects the
destination of subsequent -a report options on the com-
mand line. Default is standard out.
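The positional effect of -o filename on later -a report options can be sketched like this. The function name route_reports and the "<stdout>" placeholder are hypothetical, and the sketch only models the ordering rule described above.

```python
def route_reports(args):
    """Pair each -a report with the output destination in effect for it."""
    destination = "<stdout>"  # default destination is standard output
    routed = []
    it = iter(args)
    for opt in it:
        if opt == "-o":
            name = next(it)
            # Names ending in ".er" select an experiment, not a report file.
            if not name.endswith(".er"):
                destination = name
        elif opt == "-a":
            routed.append((next(it), destination))
    return routed
```

For instance, in "-a ifreq -o cc.txt -a cc=9" the ifreq report goes to standard output and the cc report goes to cc.txt.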
-o experiment-name.er
Use experiment-name as the name of the experiment to be
recorded. Only one -o experiment-name.er option is
allowed on the command line.
If -o experiment-name.er is not specified, and experiment
generation is requested with the -e and/or -E
options, record an experiment with a name in the form
stem.n.er, where stem is a string, and n is a number.
If a -g argument is given, use the string appearing
before the .erg suffix in the group name as the stem
prefix; if no -g argument is given, set the stem prefix
to "test".
If the name is not specified in the form stem.n.er,
and the given name is in use, print an error message
and do not generate an experiment. If the name is of
the form stem.n.er, and the name is in use, record the
experiment under a name corresponding to the first
available value of n that is not in use. Issue a warn-
ing if the name is changed.
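The naming rules above can be summarized in a short Python sketch. Several details are assumptions not stated on this page: default numbering starts at 1, the search for a free n moves upward from the requested value, and a warning is produced only when an explicitly requested name is changed. The function name choose_experiment_name is hypothetical.

```python
import re


def choose_experiment_name(existing, requested=None, group=None):
    """Return (name, warning) for a new experiment.

    existing  -- set of experiment names already in use
    requested -- the -o experiment-name.er value, if given
    group     -- the -g group_name value, if given
    """
    if group is not None and not group.endswith(".erg"):
        raise ValueError("group name must end in .erg")
    explicit = requested is not None
    if not explicit:
        stem = group[: -len(".erg")] if group else "test"
        requested = f"{stem}.1.er"  # assumption: numbering starts at 1
    if requested not in existing:
        return requested, None
    match = re.fullmatch(r"(.+)\.(\d+)\.er", requested)
    if match is None:
        # Not of the form stem.n.er and already in use: report an error.
        raise FileExistsError(f"{requested} is already in use")
    stem, n = match.group(1), int(match.group(2))
    while f"{stem}.{n}.er" in existing:
        n += 1  # assumption: search upward from the requested n
    name = f"{stem}.{n}.er"
    warning = (
        f"experiment name changed from {requested} to {name}" if explicit else None
    )
    return name, warning
```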
-e Create an experiment with simulated hardware counters
representing function count, instruction executed
count, and instruction annulled count.
-E filterspec
Generate a custom counter in the experiment, which will
be viewed in its own column in analyzer or er_print.
See the FILTERSPEC section for more information. Any
number of -E options can be given. Each unique -E
option will produce one counter in the experiment.
-C comment
Put the comment, either a single token, or a quoted
string, into the experiment. Up to ten comments may be
provided.
-g group_name
Consider the experiment to be part of experiment group
group_name. The group_name string must end in ".erg";
if not, report an error and do not create the experi-
ment.
-A option
Control whether or not load-objects used by the target
process should be copied into the recorded experiment.
The allowed values of option are:
Value Meaning
on Archive load objects into the experiment.
off Do not archive load objects into the experi-
ment.
copy Copy and archive load objects into the exper-
iment.
If the user copies experiments onto a different
machine, or reads them on a different machine, the user
should specify -A copy. Note that doing so does not
copy any sources or object files. It is the responsi-
bility of the user to ensure that those files are
accessible on the machine where the experiment is
copied.
-a report
Write a textual (ASCII) report to the current output
filename (see -o filename above). In any of these
reports, if the optional argument (=N or =function) is
not given, or if a limit of 0 is given, the report covers
the whole program. Available reports are:
ifreq[=N]
Instruction frequency. Print a profile of instruc-
tion execution counts for the sum of the top N hot
functions, in descending order of frequency.
Example:
bit analyze -a ifreq a.out | head -13
Instruction frequencies for whole program
Instruction Executed (%)
TOTAL 169067648498 (100.0)
float ops 170346 ( 0.0)
float ld st 170346 ( 0.0)
load store 36788000338 ( 21.7)
load 25144202260 ( 14.8)
store 11643798078 ( 6.8)
-------------------------------------------
Instruction Executed (%) Annulled In Delay Slot
add 16935512560 ( 10.0) 2992 3112858420
br 16762242816 ( 9.9) 0 0
sll 14368909396 ( 8.4) 16 916733870
subcc 13842547720 ( 8.1) 0 1938670930
ifreq=<function>
Prints an instruction frequency breakdown for the
named function.
cc[=N]
Caller-callee report. Prints the top N hottest
caller-callee edges. JMPL's (dynamic calls) are
indicated by the function name "**INDIRECT
CALL**".
Example:
bit analyze -a cc=9 a.out
Top 9 caller-callee edges
Count Caller ---> Callee
563227968 compress_block ---> send_bits
397429280 deflate ---> ct_tally
263338416 deflate_fast ---> ct_tally
165937792 deflate_fast ---> longest_match
5842268 build_tree ---> bi_reverse
2805034 send_tree ---> send_bits
313216 send_all_trees ---> send_bits
82828 huft_build ---> malloc
82828 inflate_dynamic ---> free
cc=<function>
Prints a list of callers in frequency order, then
the given function, followed by a frequency-ordered
list of callees. All calls through a jmpl
are summed and attributed to "**unknown**".
Example:
bit analyze -a cc=deflate a.out
Callers and callees of deflate
Callers: Count
15 zip
55 **unknown**
------> 70 deflate
Callees: Count
596143936 ct_tally
18423 fill_window
18213 flush_block
6 deflate_fast
bbc[=N]
Basic Block Count. Prints a list of the top N hot
basic blocks. If the block happens to be a func-
tion entry point, the function name is printed.
The listing includes the PC of the first instruc-
tion of the block and the number of instructions
in the block.
Example:
bit analyze -a bbc=6 a.out
Basic Block Counts for top 6 blocks
Count PC #Instrs Function name
991151488 0x10000e940 24 ct_tally
991151488 0x10000e9cc 2
991151488 0x10000e9dc 16
985851840 0x10000e9a0 11
966297600 0x10000e9d8 1
867284096 0x10000e9d4 1
bbc=<function>
Basic Block Count for all blocks in a function.
branch[=N]
Branch taken/not-taken report. For the top N hot
branches, print branch statistics, including
branch direction (Forward or Backward), total execution
count, taken and not-taken counts and percentages,
and an indication of whether the compiler
correctly set the prediction bit.
Example:
bit analyze -a branch=6 a.out
Branch taken/not taken report for top 6 branches
PC Dir %Taken %Not Compiler Trip Cnt Taken etc...
Taken Prediction
Correct?
10000e998 F 0.5% 99.5% Y 991151488 5299648 ..
10000e9cc F 12.5% 87.5% Y 991151488 123867416 ..
100003668 F 49.4% 50.6% Y 849549760 420060832 ..
100009484 F 1.4% 98.6% Y 834797568 11554240 ..
1000094d4 F 0.2% 99.8% Y 834797568 1372160 ..
1000094e8 F 0.6% 99.4% Y 834797568 5033466 ..
branch=<function>
Print branch taken/not-taken information for all
branches in <function>.
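The shared shape of the -a report arguments (a report name with an optional =N or =function) can be illustrated with a small Python sketch. The function name parse_report and the returned tuple layout are hypothetical. Treating any all-digit argument as N is an assumption about how a number is told apart from a function name.

```python
REPORTS = ("ifreq", "cc", "bbc", "branch")


def parse_report(spec):
    """Return (report, scope, value) for a -a report argument."""
    name, _, arg = spec.partition("=")
    if name not in REPORTS:
        raise ValueError(f"unknown report: {name!r}")
    if arg == "":
        return name, "whole program", None
    if arg.isdigit():
        limit = int(arg)
        if limit == 0:  # a limit of 0 also means the whole program
            return name, "whole program", None
        return name, "top", limit
    return name, "function", arg
```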
FILTERSPECS
This syntax is used to produce a custom analysis column for
er_print or analyzer. You can use many of these flags in a
single invocation of bit to produce multiple custom columns
in the experiment.
----------------------------------------
Spec for filter parameters to bit flags.
Format:
-E filterspec
filterspec := element [ ':' element ... ]
element := instrselector | limiter | metricspecifier
instrselector := instr | instrgroup
instr := <lower case mnemonic from SPARC instruction>
instrgroup := 'BR' | 'LD' | 'ST' | 'CALL' | 'JUMP' | 'SAVE'
| 'RESTORE' | 'CMP' | 'BA' | 'BN' | 'CBR' | 'ICALL' |
'SWITCH'
limiter := positive_limiter | 'n' positive_limiter
positive_limiter := 'ds' | 'float'
metricspecifier := metric ['%'] | 'targetmark'
metric := pmetric | 'n' pmetric
pmetric := 'executed' | 'annul' | 'taken' | 'correct' |
'target'
NOTES:
1. All instrselector elements are logically OR'd to
produce a pool of instructions. Each of the limiters is
logically AND'd against the result.
2. If no instrselector is given, all instructions are
selected.
3. The default metricspecifier is 'executed'.
4. Only one metricspecifier is allowed.
5. Only one limiter is allowed.
6. The "%" metricspecifier calls out the frequency of
the specified metric vs. the frequency of the contain-
ing block.
7. 'target' is the sum of all counts on
incoming branch edges to the instruction. 'ntarget' is
the fallthrough count.
8. targetmark prints the number of branch edges coming
in to the instruction. The 'n' or '%' modifiers are not
allowed.
Examples:
LD:ST:ds will display the issue count for every load or
store in the program that happens to be placed in a delay
slot.
BR:ds will produce a count of zero for every instruction
because a branch cannot be in a delay slot.
BR:ntaken will produce a count showing how many times each
branch was not taken.
BR:correct% will show how often the compiler branch predic-
tion was correct as a percentage.
nop:nds produces counts for all nops that are not in delay
slots.
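To make the grammar and notes concrete, here is a minimal Python sketch of a filterspec parser. It is an interpretation of the grammar above and not bit's own parser: parse_filterspec and the result layout are hypothetical, lowercase mnemonics are accepted without being checked against the mnemonic list, and the one-limiter rule of note 5 is enforced. An empty selector list stands for "all instructions".

```python
GROUPS = {"BR", "LD", "ST", "CALL", "JUMP", "SAVE", "RESTORE",
          "CMP", "BA", "BN", "CBR", "ICALL", "SWITCH"}
LIMITERS = {"ds", "float"}
METRICS = {"executed", "annul", "taken", "correct", "target"}


def _known(word, base):
    """True if word, or word without a leading 'n', is in base."""
    return word in base or (word.startswith("n") and word[1:] in base)


def parse_filterspec(spec):
    """Split a filterspec into selectors, one limiter, and one metric."""
    selectors, limiter, metric = [], None, None
    for element in spec.split(":"):
        core = element[:-1] if element.endswith("%") else element
        if _known(core, {"targetmark"}) and element != "targetmark":
            raise ValueError("targetmark takes no 'n' or '%' modifier")
        if element in GROUPS:
            selectors.append(element)
        elif _known(element, LIMITERS):
            if limiter is not None:
                raise ValueError("only one limiter is allowed")
            limiter = element
        elif element == "targetmark" or _known(core, METRICS):
            if metric is not None:
                raise ValueError("only one metricspecifier is allowed")
            metric = element
        elif element.isalnum() and element.islower():
            selectors.append(element)  # assumed to be a SPARC mnemonic
        else:
            raise ValueError(f"unrecognized element: {element!r}")
    return {"selectors": selectors, "limiter": limiter,
            "metric": metric or "executed"}  # 'executed' is the default
```

With this sketch, "nop:nds" yields the selector nop, the limiter nds, and the default metric executed.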
Mnemonics for instr specifier
add addc addcc addccc alignaddr
alignaddrl and andcc andn andncc
array16 array32 array8 bitextract bmask
bpr br bshuffle call casa
casxa done edge16 edge16l edge16ln
edge16n edge32 edge32l edge32ln edge32n
edge8 edge8l edge8ln edge8n fabsd
fabsq fabss faddd faddq fadds
faligndata fand fandnot1 fandnot1s fandnot2
fandnot2s fands fbr fchksm16 fcmpd
fcmped fcmpeq fcmpeq16 fcmpeq32 fcmpes
fcmpgt16 fcmpgt32 fcmple16 fcmple32 fcmpne16
fcmpne32 fcmpq fcmps fdivd fdivq
fdivs fdmulq fdtoi fdtoq fdtos
fdtox fexpand fitod fitoq fitos
flcmpd flcmps flush flushw fmean16
fmovd fmovq fmovrd fmovrq fmovrs
fmovs fmul8sux16 fmul8ulx16 fmul8x16 fmul8x16al
fmul8x16au fmuld fmuld16x16 fmuld8sux16 fmuld8ulx16
fmulq fmuls fnand fnands fnegd
fnegq fnegs fnor fnors fnot1
fnot1s fnot2 fnot2s fone fones
for fornot1 fornot1s fornot2 fornot2s
fors fpack16 fpack32 fpackfix fpadd16
fpadd16s fpadd32 fpadd32s fpadds16 fpadds16s
fpadds32 fpadds32s fpmerge fpmovc16 fpmovc32
fpsub16 fpsub16s fpsub32 fpsub32s fpsubs16
fpsubs16s fpsubs32 fpsubs32s fqtod fqtoi
fqtos fqtox fshl16 fshl32 fshlas16
fshlas32 fshra16 fshra32 fshrl16 fshrl32
fsmuld fsqrtd fsqrtq fsqrts fsrc1
fsrc1s fsrc2 fsrc2s fstod fstoi
fstoq fstox fsubd fsubq fsubs
fxnor fxnors fxor fxors fxtod
fxtoq fxtos fzero fzeros illtrap
jmpl ld lda ldd ldda
ldq ldqa ldsb ldsba ldsh
ldsha ldstub ldstuba ldsw ldswa
ldub lduba lduh lduha lduw
lduwa ldx ldxa lzd membar
mov movr mulscc mulx nop
or orcc orn orncc pdist
popc prefetch prefetcha rd rdpr
restore restored retry return save
saved sbshuffle sdiv sdivcc sdivx
sethi sfabss sfadds sfcmpseq sfcmpsgt
sfcmpsle sfcmpsne sfitos sfmuls sfnegs
sfstoi sfsubs shutdown siam sir
sll smul smulcc sra srl
st sta stb stba stbar
std stda sth stha stq
stqa stw stwa stx stxa
sub subc subcc subccc swap
swapa taddcc taddcctv trap tsubcc
tsubcctv udiv udivcc udivx umul
umulcc wr wrpr xnor xnorcc
xor xorcc
INSTRUMENTATION DATA DETAILS
Only Annotated Code is Counted
bit can only instrument and count (for analyze) or
optimize code which has been annotated by the compiler
by compiling it with -xbinopt=prepare and an optimiza-
tion level of -xO1 or above. Specifically excluded are
assembly language modules, functions which contain
"asm" statements or .il templates, C++ template func-
tions, and modules compiled with -xF.
Only the code which is linked into the executable is
instrumented. Shared libraries and dynamic libraries
are excluded. Extremely small functions in annotated
code modules are not counted when called from nonanno-
tated code.
Some instructions may be overcounted in the face of
asynchronous events, for example if a signal handler
calls longjmp().
Hardware Counter Overflow Profiling
The experiment generated appears as a hardware-
counter-overflow profiling experiment with multiple
counters in the same run, generated on the uninstru-
mented target. Only leaf-PCs are captured. No CPU,
thread, or LWP IDs are recorded---the data is aggre-
gated across all threads and CPUs. No timestamps are
recorded.
The counters generated depend on the arguments passed
to bit. See the Analyze Options section above.
Instruction frequency metrics
A bit experiment contains summary data describing the
execution frequency of various instructions in the run.
The data is shown in response to the ifreq command in
er_print, and on the Inst.Freq. tab in the Analyzer.
EXIT STATUS
The following exit values are returned:
0 Successful completion.
1 An error occurred.
SEE ALSO
analyzer(1), binopt(1), collect(1), er_archive(1),
er_bit(1), er_cp(1), er_export(1), er_mv(1), er_print(1),
er_rm(1), er_src(1), and the Performance Analyzer manual.