Man Page bit.3f




NAME

     bit - Binary Improvement Tool


SYNOPSIS

     bit instrument [ general_option|instrument_option]... target
     bit analyze [ general_option|analyze_option]... target
     bit optimize [ general_option|optimize_option]... target
     bit collect [ general_option|optimize_option|analyze_option]...
          target [ target_arguments ]
     bit -V
     bit -?

     Where:
     general_option:= {-i {on|static|off}|-d directory|-s suffix
     |-n|-V|-v}
     instrument_option:= {-m {on|off}|-b suffix}
     optimize_option:= {-O {0|1|2}|-f|-Q{y|n}|
      -xinline[= v[,v]...]}
     analyze_option:= {-o filename|-o experiment-name.er|-e|-E
     filterspec|-C comment|-g group_name|-A {on|off|copy}|-a report}
     report:={ifreq[={N|function}]|cc[={N|function}]|bbc[={N|function}]|
     branch[={N|function}]}


AVAILABILITY

     ***********************************************************
     SUN PROPRIETARY AND CONFIDENTIAL -- FOR INTERNAL USE ONLY
     ***********************************************************
     Internal access, Sun Studio 11


DESCRIPTION

     bit is a suite of tools for improving binaries. These  tools
     are used via four subcommands:

     The instrument subcommand instruments a binary (the  target)
     so  that  when the instrumented target is run, it creates an
     instrumentation data file with information about the  execu-
     tion of target.

     The analyze subcommand uses the instrumentation data to
     produce reports on instruction execution. Analysis reports
     can be generated as ASCII text or as an experiment. An
     experiment can be examined with a GUI (analyzer) or a
     command-line program (er_print).

     The optimize subcommand uses  the  instrumentation  data  to
     optimize target.

     The collect subcommand combines all  the  other  subcommands
     with  a  target  run. It first instruments target, then runs
     it, passing target_arguments at run time. Then  it  analyzes
     it  if  certain  analyze_options  are present on the command
     line, and/or optimizes it, if certain  optimize_options  are
     present on the command line.
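
     The phase selection performed by the collect subcommand can
     be sketched as follows. This is only an illustration of the
     rules stated in this page (see also OPTIONS), not bit's
     implementation; the function name collect_phases and its
     arguments are hypothetical.

```python
def collect_phases(flags, instr_mode="on"):
    """Return the phases a `bit collect` run would perform, in order.

    flags      -- set of option letters seen on the command line,
                  e.g. {"-e", "-O"} (hypothetical representation)
    instr_mode -- value of the -i option: "on" or "static"
    """
    phases = []
    # With -i static, the instrument and target-run phases are disabled.
    if instr_mode != "static":
        phases += ["instrument", "run"]
    # -e, -E, and -a turn on analysis.
    if any(f in flags for f in ("-e", "-E", "-a")):
        phases.append("analyze")
    # -O and -f turn on optimization.
    if any(f in flags for f in ("-O", "-f")):
        phases.append("optimize")
    return phases


print(collect_phases({"-e"}))            # instrument, run, analyze
print(collect_phases({"-e"}, "static"))  # analyze only
```

     Options such as -C or -Q do not appear in the sketch because
     on their own they do not turn on any phase.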

     target is the path name of the executable for which you want
     to collect performance data. (bit does not search PATH to
     find target.) target must be compiled at optimization level
     -xO1 or higher, and must be prepared by compiling with
     -xbinopt=prepare. To see annotated source when viewing the
     experiment, compile target with the -g flag and do not strip
     it.


  Example Use
     Typically one would  use  the  commands  in  a  sequence  to
     instrument, run, and analyze or optimize. To create a simple
     experiment, use this sequence of commands:

          bit instrument a.out
          a.out.instr < input1
          bit analyze -e a.out

     Or to optimize a binary:

          bit instrument a.out
          a.out.instr < input1
          bit optimize a.out

     The examples above can be rewritten more  simply  using  the
     collect subcommand:

          bit collect -e a.out < input1
     and
          bit collect -O1 a.out < input1
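
     For scripting, the two forms above can be generated from the
     same inputs. The sketch below only builds argument lists; it
     does not run anything. The helper names manual_sequence and
     collect_shortcut are hypothetical, and the ".instr" suffix is
     the documented default of the -b option.

```python
def manual_sequence(target, action, binary_suffix=".instr"):
    """Return the three commands of the manual workflow as argv lists.

    action is "analyze" or "optimize". The input redirection shown in
    the examples (< input1) is left to the caller.
    """
    if action == "analyze":
        final = ["bit", "analyze", "-e", target]
    elif action == "optimize":
        final = ["bit", "optimize", target]
    else:
        raise ValueError("action must be 'analyze' or 'optimize'")
    return [
        ["bit", "instrument", target],
        [target + binary_suffix],  # run the instrumented binary
        final,
    ]


def collect_shortcut(target, action):
    """Return the single `bit collect` command used in the examples."""
    flag = {"analyze": "-e", "optimize": "-O1"}[action]
    return ["bit", "collect", flag, target]


for argv in manual_sequence("a.out", "analyze"):
    print(" ".join(argv))
print(" ".join(collect_shortcut("a.out", "analyze")))
```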



OPTIONS

     If invoked with no arguments, print a usage message.

  General options
     -i {on|static|off}
          Instrumentation data source. This option is not allowed
          in the instrument subcommand.

          on   Use dynamic instrumentation data.  This is the
               default mode.  By default, the data is stored in a
               file called target.instrdata (see -s below).

          off  Do not use  instrumentation  data.  This  is  only
               applicable in the optimize subcommand.

          static
               Analyze an executable statically.  Every  instruc-
               tion  in  the  program is assumed to execute once.
               This is only applicable in the analyze or  collect
               subcommands.  In  the  collect subcommand, disable
               the instrument and target-run phases.

     -d directory
          Place the experiment, the instrumented binary, and  the
          instrumentation  data  file  in directory.  By default,
          use the current working directory.

     -s suffix
          Add suffix to target to form the instrumentation data
          file name.  The default is ".instrdata".

     -n   Print the commands that would be run without  executing
          them.

     -V   Print the current  version.   Do  not  examine  further
          arguments, and do no further processing.

     -v   Print the current version and verbose information about
          the commands being executed.

  Instrument options
     -m {on|off}
          -m on means instrument  for  multithreading.  Any  mul-
          tithreaded application should be instrumented with this
          option. The default is on. Turning it off  for  single-
          threaded applications may result in faster instrumenta-
          tion runs.

     -b suffix
          Add suffix to target for the instrumented binary name.
          The default is ".instr".
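
          The way -d, -b, and -s combine into file names can be
          sketched as below. The function instrumentation_paths
          is hypothetical, and the sketch assumes that only the
          base name of target is used inside directory, which
          this page does not state explicitly.

```python
import os


def instrumentation_paths(target, directory=".",
                          binary_suffix=".instr",
                          data_suffix=".instrdata"):
    """Return (instrumented_binary, data_file) for a target.

    directory     -- value of -d (default: current working directory)
    binary_suffix -- value of -b (documented default ".instr")
    data_suffix   -- value of -s (documented default ".instrdata")
    """
    # Assumption: the files are named after the base name of target.
    base = os.path.basename(target)
    return (
        os.path.join(directory, base + binary_suffix),
        os.path.join(directory, base + data_suffix),
    )


binary, data = instrumentation_paths("a.out")
print(binary)
print(data)
```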

  Optimize options
     These options are valid for the optimize and collect subcom-
     mands.

     In the collect subcommand, -O and -f turn on optimizations.
     If neither of these options is present, any other optimize
     options are ignored.

     -O{0|1|2}
          Optimize target.  The optimized target  overwrites  the
          original.   At level 0, no optimizations are performed.
          At level 1, do code reordering optimizations.  At level
          2,   data-flow  information  is  constructed  and  more
          aggressive  optimizations  like  inlining  and  address
          related optimizations are performed.

          The default optimization level is 1 when  the  optimize
          subcommand  is  used.  There  is  no  default  when the
          collect subcommand is used; a -On must be given to turn
          on optimization.

     -f   Finalize the output  binary  so  that  no  more  binary
          optimizations may be performed.

     -Q{y|n}
          If -Qy is used, identification information is added  to
          the  output binary. If -Qn is used, this information is
          not added. -Qy is the default.

     -xinline[=v[,v]...]
          where v is [{%auto,func_name,no%func_name}].
          Inline only those functions specified in the list.  The
          list consists of a comma-separated list of function
          names, a comma-separated list of no%func_name values,
          or the value %auto.  If no%func_name is specified, do
          not inline func_name.  If %auto is specified, attempt
          to inline functions automatically.
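
          As an illustration of the value syntax, the sketch
          below splits a -xinline value into its three kinds of
          entries. parse_xinline is a hypothetical helper, and
          the sketch accepts mixed lists, which bit itself may or
          may not allow.

```python
def parse_xinline(value):
    """Split the value of -xinline=v[,v]... into its parts.

    Returns (auto, inline, no_inline):
      auto      -- True if %auto was given
      inline    -- function names to inline
      no_inline -- function names given as no%func_name
    """
    auto = False
    inline, no_inline = [], []
    for item in value.split(","):
        if not item:
            raise ValueError("empty entry in -xinline list")
        if item == "%auto":
            auto = True
        elif item.startswith("no%"):
            no_inline.append(item[len("no%"):])
        else:
            inline.append(item)
    return auto, inline, no_inline


print(parse_xinline("foo,bar"))
print(parse_xinline("no%baz"))
```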

  Analyze options
     These options are valid in the analyze and  collect  subcom-
     mands.  In  the  collect  subcommand, -e, -E, and -a turn on
     analysis. If none of these options are  present,  any  other
     analyze options are ignored.

     At least one -e or -E option must be provided in order to
     create an experiment.

     -o filename
          If filename  does  not  end  in  ".er",  write  textual
          reports  (see -a report below) to filename. Multiple -o
          filename options may be given;  each  one  affects  the
          destination of subsequent -a report options on the
          command line. The default is standard output.
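
          The positional effect of -o on later -a options can be
          sketched as a single left-to-right pass over the
          arguments. report_destinations is a hypothetical
          helper; the string "<stdout>" is only a placeholder for
          standard output.

```python
def report_destinations(args):
    """Pair each -a report with the output file in effect for it.

    args is the option list as it appears on the command line.
    Names ending in ".er" select the experiment name and do not
    change the destination of textual reports.
    """
    destination = "<stdout>"  # default until a -o filename is seen
    routed = []
    i = 0
    while i < len(args):
        opt = args[i]
        if opt in ("-o", "-a"):
            if i + 1 >= len(args):
                raise ValueError(f"{opt} requires a value")
            value = args[i + 1]
            if opt == "-a":
                routed.append((value, destination))
            elif not value.endswith(".er"):
                destination = value
            i += 2
        else:
            i += 1
    return routed


print(report_destinations(
    ["-a", "ifreq", "-o", "out1.txt", "-a", "cc=9", "-o", "exp.er", "-a", "bbc"]
))
```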

     -o experiment-name.er
          Use experiment-name as the name of the experiment to be
          recorded.  Only one -o experiment-name.er option is
          allowed on the command line.

          If -o experiment-name.er is not specified, and experiment
          generation is requested with the -e and/or -E
          options, record an experiment with a name in  the  form
          stem.n.er,  where  stem is a string, and n is a number.
          If a -g argument is given,  use  the  string  appearing
          before  the  .erg  suffix in the group name as the stem
          prefix; if no -g argument is given, set the stem prefix
          to "test".

          If the name is not specified in  the  form   stem.n.er,
          and  the  given  name is in use, print an error message
          and do not generate an experiment.  If the name  is  of
          the form  stem.n.er, and the name is in use, record the
          experiment under a  name  corresponding  to  the  first
          available  value of n that is not in use. Issue a warn-
          ing if the name is changed.
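
          The naming rules above can be sketched as follows.
          choose_experiment_name is a hypothetical helper. Two
          details are assumptions, because this page does not
          specify them: default numbering starts at 1, and the
          search for a free n counts upward from the requested n.

```python
import re

_STEM_N_ER = re.compile(r"^(?P<stem>.+)\.(?P<n>\d+)\.er$")


def choose_experiment_name(requested, in_use, group=None):
    """Return (name, warning) for a new experiment.

    requested -- value of -o experiment-name.er, or None
    in_use    -- set of experiment names that already exist
    group     -- value of -g (must end in ".erg"), or None
    """
    if requested is None:
        stem = group[:-len(".erg")] if group else "test"
        requested = f"{stem}.1.er"  # assumption: numbering starts at 1
    if requested not in in_use:
        return requested, None
    match = _STEM_N_ER.match(requested)
    if match is None:
        # Not of the form stem.n.er: report an error, no experiment.
        raise FileExistsError(f"experiment {requested} is in use")
    stem, n = match.group("stem"), int(match.group("n"))
    while f"{stem}.{n}.er" in in_use:  # assumption: count upward
        n += 1
    name = f"{stem}.{n}.er"
    return name, f"warning: experiment name changed to {name}"


print(choose_experiment_name(None, set()))
print(choose_experiment_name("test.1.er", {"test.1.er", "test.2.er"}))
```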

     -e   Create an experiment with simulated  hardware  counters
          representing   function   count,  instruction  executed
          count, and instruction annulled count.

     -E filterspec
          Generate a custom counter in the experiment, which is
          shown in its own column in analyzer or er_print.  See
          the FILTERSPECS section for more information.  Any
          number of -E options can be given.  Each unique -E
          option produces one counter in the experiment.

     -C comment
          Put the comment, either a single  token,  or  a  quoted
          string, into the experiment.  Up to ten comments may be
          provided.

     -g group_name
          Consider the experiment to be part of experiment  group
          group_name.   The group_name string must end in ".erg";
          if not, report an error and do not create  the  experi-
          ment.

     -A option
          Control whether or not load-objects used by the  target
          process  should be copied into the recorded experiment.
          The allowed values of option are:

          Value     Meaning

          on        Archive load objects into the experiment.

          off       Do not archive load objects into the  experi-
                    ment.

          copy      Copy and archive load objects into the exper-
                    iment.

          If  the  user  copies  experiments  onto  a   different
          machine, or reads them on a different machine, the user
          should specify -A copy.  Note that doing  so  does  not
          copy  any  sources or object files. It is the responsi-
          bility of the user  to  ensure  that  those  files  are
          accessible  on  the  machine  where  the  experiment is
          copied.

     -a report
          Write a textual (ASCII) report to the current output
          filename (see -o filename above).  In any of these
          reports, if the optional argument (=N or =function) is
          not given, or if a limit of 0 is given, the report
          covers the whole program.  Available reports are:


          ifreq[=N]
               Instruction frequency. Print a profile of instruc-
               tion execution counts for the sum of the top N hot
               functions, in descending order of frequency.
                  Example:
                  bit analyze -a ifreq a.out | head -13
               Instruction frequencies for whole program
               Instruction               Executed     (%)
                TOTAL                169067648498 (100.0)
                float ops                  170346 (  0.0)
                float ld st                170346 (  0.0)
                load store            36788000338 ( 21.7)
                load                  25144202260 ( 14.8)
                store                 11643798078 (  6.8)
               -------------------------------------------
               Instruction               Executed     (%)        Annulled   In Delay Slot
                add                   16935512560 ( 10.0)            2992      3112858420
                br                    16762242816 (  9.9)               0               0
                sll                   14368909396 (  8.4)              16       916733870
                subcc                 13842547720 (  8.1)               0      1938670930


          ifreq=<function>
               Prints an instruction frequency breakdown for  the
               named function.

          cc[=N]
               Caller-callee report. Prints  the  top  N  hottest
               caller-callee  edges.   JMPL's (dynamic calls) are
               indicated  by  the   function   name   "**INDIRECT
               CALL**".
                 Example:
                 bit analyze -a cc=9 a.out
               Top 9 caller-callee edges
                              Count    Caller ---> Callee
                          563227968    compress_block ---> send_bits
                          397429280    deflate ---> ct_tally
                          263338416    deflate_fast ---> ct_tally
                          165937792    deflate_fast ---> longest_match
                            5842268    build_tree ---> bi_reverse
                            2805034    send_tree ---> send_bits
                             313216    send_all_trees ---> send_bits
                              82828    huft_build ---> malloc
                              82828    inflate_dynamic ---> free

          cc=<function>
               Prints a list of callers in frequency order,  then
               the  given  function,  followed  by  a  frequency-
               ordered list of callees. All calls through a  jmpl
               are summed and attributed to "unknown".
                 Example:
                     bit analyze -a cc=deflate a.out
               Callers and callees of deflate
               Callers:       Count
                                 15    zip
                                 55    **unknown**
               ------>                   70    deflate
               Callees:                       Count
                                          596143936    ct_tally
                                              18423    fill_window
                                              18213    flush_block
                                                  6    deflate_fast

          bbc[=N]
               Basic Block Count. Prints a list of the top N  hot
               basic  blocks.  If the block happens to be a func-
               tion entry point, the function  name  is  printed.
               The  listing includes the PC of the first instruc-
               tion of the block and the number  of  instructions
               in the block.
                  Example:
                  bit analyze -a bbc=6 a.out
               Basic Block Counts for top 6 blocks
                              Count               PC    #Instrs  Function name
                          991151488      0x10000e940         24  ct_tally
                          991151488      0x10000e9cc          2
                          991151488      0x10000e9dc         16
                          985851840      0x10000e9a0         11
                          966297600      0x10000e9d8          1
                          867284096      0x10000e9d4          1

          bbc=<function>
               Basic Block Count for all blocks in a function.

          branch[=N]
               Branch taken/not-taken report. For the top N hot
               branches, print branch statistics, including
               branch direction (Forward or Backward), total
               execution count, taken and not-taken counts and
               percentages, and an indication of whether the
               compiler correctly set the prediction bit.
                  Example:
                  bit analyze -a branch=6 a.out
               Branch taken/not taken report for top 6 branches
                    PC   Dir %Taken  %Not  Compiler    Trip Cnt           Taken  etc...
                                     Taken Prediction
                                            Correct?
                  10000e998 F    0.5%  99.5%   Y        991151488          5299648  ..
                  10000e9cc F   12.5%  87.5%   Y        991151488        123867416  ..
                  100003668 F   49.4%  50.6%   Y        849549760        420060832  ..
                  100009484 F    1.4%  98.6%   Y        834797568         11554240  ..
                  1000094d4 F    0.2%  99.8%   Y        834797568          1372160  ..
                  1000094e8 F    0.6%  99.4%   Y        834797568          5033466  ..
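
               The percentage columns in the sample above are
               consistent with dividing the Taken count by the
               Trip Cnt and rounding to one decimal place. The
               sketch below reproduces that arithmetic; it is
               inferred from the sample numbers only, and
               branch_percentages is a hypothetical helper.

```python
def branch_percentages(trip_count, taken):
    """Return (%Taken, %Not Taken) formatted as in the report."""
    if trip_count <= 0 or not 0 <= taken <= trip_count:
        raise ValueError("need 0 <= taken <= trip_count and trip_count > 0")
    pct_taken = 100.0 * taken / trip_count
    return f"{pct_taken:.1f}%", f"{100.0 - pct_taken:.1f}%"


# First rows of the sample report above.
print(branch_percentages(991151488, 5299648))
print(branch_percentages(991151488, 123867416))
```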

          branch=<function>
               Print branch taken/not-taken information  for  all
               branches in <function>.
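
          All four reports share the same argument convention: no
          argument or a limit of 0 means the whole program, a
          number means the top N, and anything else names a
          function. A minimal sketch of that convention, using a
          hypothetical parse_report helper:

```python
REPORT_KINDS = ("ifreq", "cc", "bbc", "branch")


def parse_report(spec):
    """Classify the value of a -a option.

    Returns (kind, scope, value) where scope is "whole-program",
    "top" (value is N), or "function" (value is the function name).
    """
    kind, sep, arg = spec.partition("=")
    if kind not in REPORT_KINDS:
        raise ValueError(f"unknown report: {kind!r}")
    if not sep:
        return (kind, "whole-program", None)
    if not arg:
        raise ValueError("missing value after '='")
    if arg.isdigit():
        limit = int(arg)
        if limit == 0:  # a limit of 0 covers the whole program
            return (kind, "whole-program", None)
        return (kind, "top", limit)
    return (kind, "function", arg)


print(parse_report("cc=9"))
print(parse_report("cc=deflate"))
```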




FILTERSPECS

     This syntax is used to produce a custom analysis column  for
     er_print  or  analyzer. You can use many of these flags in a
     single invocation of bit to produce multiple custom  columns
     in the experiment.

     ----------------------------------------
     Spec for filter parameters to bit flags.

     Format:

     -E filterspec

     filterspec :=  element [ ':' element ... ]
     element := instrselector | limiter | metricspecifier
     instrselector := instr | instrgroup
     instr := <lower-case SPARC instruction mnemonic>

     instrgroup := 'BR' | 'LD' | 'ST' | 'CALL' | 'JUMP'  | 'SAVE'
          |  'RESTORE'  | 'CMP' | 'BA' | 'BN' | 'CBR' | 'ICALL' |
          'SWITCH'
     limiter := positive_limiter | 'n' positive_limiter
     positive_limiter := 'ds' | 'float'
     metricspecifier := metric ['%'] | 'targetmark'
     metric := pmetric | 'n' pmetric
     pmetric := 'executed' | 'annul'  |  'taken'  |  'correct'  |
     'target'

     NOTES:

          1.  All instrselector elements are  logically  OR'd  to
          produce a pool of instructions. Each of the limiters is
          logically AND'd against the result.

          2. If no instrselector is given, all  instructions  are
          selected.

          3. The default metricspecifier is 'executed'.

          4. Only one metricspecifier is allowed.

          5. Only one limiter is allowed.

          6. The "%" metricspecifier calls out the  frequency  of
          the  specified metric vs. the frequency of the contain-
          ing block.

          7. 'target' is the sum of all counts on incoming branch
          edges to the instruction. 'ntarget' is the fallthrough
          count.

          8. targetmark prints the number of branch edges  coming
          in to the instruction. The 'n' or '%' modifiers are not
          allowed.
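
          Notes 1 and 2 can be sketched as a predicate over one
          instruction. The record layout (mnemonic, groups,
          in_delay_slot, is_float) and the function name selects
          are hypothetical; this page does not define which
          mnemonics belong to which instrgroup, so group
          membership is supplied by the caller.

```python
def selects(instr, selectors, limiter=None):
    """Return True if a filter selects the given instruction.

    instr     -- dict with keys "mnemonic", "groups" (a set of
                 instrgroup names), "in_delay_slot", "is_float"
    selectors -- list of instr / instrgroup names (OR'd together);
                 an empty list selects all instructions
    limiter   -- "ds", "float", "nds", "nfloat", or None (AND'd)
    """
    if selectors and not any(
        s == instr["mnemonic"] or s in instr["groups"] for s in selectors
    ):
        return False
    if limiter is None:
        return True
    negated = limiter in ("nds", "nfloat")
    name = limiter[1:] if negated else limiter
    flags = {"ds": instr["in_delay_slot"], "float": instr["is_float"]}
    return flags[name] != negated


load_in_ds = {"mnemonic": "ldx", "groups": {"LD"},
              "in_delay_slot": True, "is_float": False}
print(selects(load_in_ds, ["LD", "ST"], "ds"))
```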


     Examples:

          LD:ST:ds will display the issue count for every load or
          store in the program that happens to be placed in a
          delay slot.

          BR:ds will produce a count of zero for every
          instruction because a branch cannot be in a delay slot.

          BR:ntaken will produce a count showing how many times
          each branch was not taken.

          BR:correct% will show how often the compiler branch
          prediction was correct as a percentage.

          nop:nds produces counts for all nops that are not in
          delay slots.
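
     The grammar and notes above can be checked mechanically. The
     sketch below is a hypothetical parser (parse_filterspec is
     not part of bit) that classifies each element and enforces
     notes 3, 4, 5, and 8. As a simplification it accepts any
     lower-case alphanumeric token as an instr instead of
     checking it against the mnemonic list.

```python
INSTR_GROUPS = {"BR", "LD", "ST", "CALL", "JUMP", "SAVE", "RESTORE",
                "CMP", "BA", "BN", "CBR", "ICALL", "SWITCH"}
LIMITERS = {"ds", "float", "nds", "nfloat"}
PMETRICS = {"executed", "annul", "taken", "correct", "target"}
METRICS = PMETRICS | {"n" + m for m in PMETRICS}


def parse_filterspec(spec):
    """Parse element[:element...] into a dict describing the filter."""
    selectors, limiter, metric, percent = [], None, None, False
    for element in spec.split(":"):
        base = element[:-1] if element.endswith("%") else element
        if element in INSTR_GROUPS:
            selectors.append(element)
        elif element in LIMITERS:
            if limiter is not None:
                raise ValueError("only one limiter is allowed")
            limiter = element
        elif element == "targetmark" or base in METRICS:
            if metric is not None:
                raise ValueError("only one metricspecifier is allowed")
            metric, percent = base, element.endswith("%")
        elif element.isalnum() and element.islower():
            selectors.append(element)  # assumed to be an instr mnemonic
        else:
            raise ValueError(f"unrecognized element: {element!r}")
    return {"selectors": selectors, "limiter": limiter,
            "metric": metric or "executed", "percent": percent}


print(parse_filterspec("LD:ST:ds"))
print(parse_filterspec("BR:correct%"))
```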

  Mnemonics for instr specifier
     add          addc         addcc        addccc        alignaddr
     alignaddrl   and          andcc        andn          andncc
     array16      array32      array8       bitextract    bmask
     bpr          br           bshuffle     call          casa
     casxa        done         edge16       edge16l       edge16ln
     edge16n      edge32       edge32l      edge32ln      edge32n
     edge8        edge8l       edge8ln      edge8n        fabsd
     fabsq        fabss        faddd        faddq         fadds
     faligndata   fand         fandnot1     fandnot1s     fandnot2
     fandnot2s    fands        fbr          fchksm16      fcmpd
     fcmped       fcmpeq       fcmpeq16     fcmpeq32      fcmpes
     fcmpgt16     fcmpgt32     fcmple16     fcmple32      fcmpne16
     fcmpne32     fcmpq        fcmps        fdivd         fdivq
     fdivs        fdmulq       fdtoi        fdtoq         fdtos
     fdtox        fexpand      fitod        fitoq         fitos
     flcmpd       flcmps       flush        flushw        fmean16
     fmovd        fmovq        fmovrd       fmovrq        fmovrs
     fmovs        fmul8sux16   fmul8ulx16   fmul8x16      fmul8x16al
     fmul8x16au   fmuld        fmuld16x16   fmuld8sux16   fmuld8ulx16
     fmulq        fmuls        fnand        fnands        fnegd
     fnegq        fnegs        fnor         fnors         fnot1
     fnot1s       fnot2        fnot2s       fone          fones
     for          fornot1      fornot1s     fornot2       fornot2s
     fors         fpack16      fpack32      fpackfix      fpadd16
     fpadd16s     fpadd32      fpadd32s     fpadds16      fpadds16s
     fpadds32     fpadds32s    fpmerge      fpmovc16      fpmovc32
     fpsub16      fpsub16s     fpsub32      fpsub32s      fpsubs16
     fpsubs16s    fpsubs32     fpsubs32s    fqtod         fqtoi
     fqtos        fqtox        fshl16       fshl32        fshlas16
     fshlas32     fshra16      fshra32      fshrl16       fshrl32
     fsmuld       fsqrtd       fsqrtq       fsqrts        fsrc1
     fsrc1s       fsrc2        fsrc2s       fstod         fstoi
     fstoq        fstox        fsubd        fsubq         fsubs
     fxnor        fxnors       fxor         fxors         fxtod
     fxtoq        fxtos        fzero        fzeros        illtrap
     jmpl         ld           lda          ldd           ldda
     ldq          ldqa         ldsb         ldsba         ldsh
     ldsha        ldstub       ldstuba      ldsw          ldswa
     ldub         lduba        lduh         lduha         lduw
     lduwa        ldx          ldxa         lzd           membar
     mov          movr         mulscc       mulx          nop
     or           orcc         orn          orncc         pdist
     popc         prefetch     prefetcha    rd            rdpr
     restore      restored     retry        return        save
     saved        sbshuffle    sdiv         sdivcc        sdivx
     sethi        sfabss       sfadds       sfcmpseq      sfcmpsgt
     sfcmpsle     sfcmpsne     sfitos       sfmuls        sfnegs
     sfstoi       sfsubs       shutdown     siam          sir
     sll          smul         smulcc       sra           srl
     st           sta          stb          stba          stbar
     std          stda         sth          stha          stq
     stqa         stw          stwa         stx           stxa
     sub          subc         subcc        subccc        swap
     swapa        taddcc       taddcctv     trap          tsubcc
     tsubcctv     udiv         udivcc       udivx         umul
     umulcc       wr           wrpr         xnor          xnorcc
     xor          xorcc


INSTRUMENTATION DATA DETAILS

     Only Annotated Code is Counted
          bit can only instrument  and  count  (for  analyze)  or
          optimize  code which has been annotated by the compiler
          by compiling it with -xbinopt=prepare and an  optimiza-
          tion  level of -xO1 or above. Specifically excluded are
          assembly  language  modules,  functions  which  contain
          "asm"  statements  or .il templates, C++ template func-
          tions, and modules compiled with -xF.

          Only the code which is linked into  the  executable  is
          instrumented.  Shared  libraries  and dynamic libraries
          are excluded.  Extremely small functions  in  annotated
          code  modules are not counted when called from nonanno-
          tated code.

          Some instructions may be overcounted in the face of
          asynchronous events, for example if a signal handler
          calls longjmp().


     Hardware Counter Overflow Profiling
          The experiment generated appears as a hardware-counter-
          overflow profiling experiment with multiple counters in
          the same run, generated on the uninstrumented target.
          Only leaf PCs are captured. No CPU, thread, or LWP IDs
          are recorded; the data is aggregated across all threads
          and CPUs. No timestamps are recorded.

          The counters generated depend on the  arguments  passed
          to bit. See the Analyze Options section above.

     Instruction frequency metrics
          A bit experiment contains summary data  describing  the
          execution frequency of various instructions in the run.
          The data is shown in response to the ifreq  command  in
          er_print, and on the Inst.Freq. tab in the Analyzer.



EXIT STATUS

     The following exit values are returned:

     0    Successful completion.

     1    An error occurred.



SEE ALSO

     analyzer(1),    binopt(1),    collect(1),     er_archive(1),
     er_bit(1),  er_cp(1),  er_export(1),  er_mv(1), er_print(1),
     er_rm(1), er_src(1), and the Performance Analyzer manual.