Man Page collect.1


NAME

     collect - command used for performance data collection


SYNOPSIS

     collect <args> target <target-args>


AVAILABILITY

     Part of Forte(TM) C, Forte C++, and Forte for  High  Perfor-
     mance Computing.


DESCRIPTION

     The collect command can record a variety of  different  per-
     formance data, and convert each of them into metrics of per-
     formance computed against functions, callers and callees  of
     any function, and against source and disassembly representa-
     tions of the target program. It  also  records  global  data
     with periodic or manual sampling.

     target is the path name of the executable for which you want
     to  collect  performance data. Programs that are targets for
     the collect command can be compiled with any level of optim-
     ization,  but  must  use  dynamic  linking.  If a program is
     statically linked, the collect command prints an error  mes-
     sage.


ARGUMENTS

     If invoked with no arguments, collect prints  a  usage  mes-
     sage, including the names of any hardware counters available
     for profiling.

  Data Specifications
     -p interval
          Collect clock-based profiles.  interval can  be  speci-
          fied as one of the following:

           o the string on implying the default of 10 millisecond
           profiling

           o the string off implying no clock-based profiling

            o a positive, non-zero number, implying that value is
            to be used as the profiling interval, given in milli-
            seconds.  The value is rounded down to the nearest
           multiple  of  the  resolution available on the system,
           except if the value is lower than the  resolution,  in
           which case it is set to the system resolution.

         NOTE: if no data specification arguments  are  given,  a
         default of clock-based profiling is set.
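
          For example, clock-based profiling with a 5-millisecond
          interval might be requested as follows (a.out and its
          arguments are placeholders for an actual target):

               % collect -p 5 a.out 3 5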

     -s threshold
          Collect synchronization trace data, using the specified
          threshold.   threshold  can  be specified as one of the
          following:

           o the string on or the string calibrate, implying set-
           ting the threshold value by calibration at runtime

           o the string off, implying no synchronization tracing

           o a positive non-zero number, representing the minimum
           delay,  expressed  in microseconds, for an event to be
           traced

            o the string all, implying that all synchronization
            events are to be recorded.
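
          For example, to trace only those synchronization events
          whose delay exceeds 100 microseconds (a.out and its
          arguments are placeholders for an actual target):

               % collect -s 100 a.out 3 5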

      -h counter[,value[,counter2[,value2]]]
          Collect  hardware  counter  overflow   profiles.    The
          counter  name can be either a standard counter name, or
           the internal name, as used by cputrack(1). If the name
           is an internal name and is followed by a slash and a
           single digit (0 or 1), the event register specified by
           that digit is used; otherwise, whichever event register
           supports the named counter is used.  If the optional
          value  is  omitted,  or  set  to  an explicit zero, the
          default  value  for  that  counter  is  used.   If  the
          optional  value is specified as the letter h, the high-
          resolution default for that counter is  used.   If  the
          optional  value is specified as a number, that value is
          used.

          An optional second counter and second interval  can  be
          specified  to  allow profiling based on two counters at
          once.  If specified, the value for  the  first  counter
          must  be  specified,  although  it  can be specified as
          zero.  The two named counters specified must be on dif-
          ferent event registers.

          Hardware counter profiling  and  clock-based  profiling
          are  mutually  exclusive.  Only  one of them can record
          data in a given run.  If both are specified,  an  error
           message will be printed, and no experiment is run.

          NOTE: Hardware counter profiling cannot  be  run  on  a
          system  where  cpustat  is  running, as that code takes
          control of the counters, and does not let a  user  pro-
          cess access them.
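
          For example, using the "cycles" counter name that appears
          in the usage message described under DATA COLLECTED below
          (a.out and its arguments are placeholders for an actual
          target):

               % collect -h cycles a.out 3 5
               % collect -h Cycle_cnt/0,h a.out 3 5

          The first form uses the default overflow value for the
          counter; the second names the counter by its internal
          name on event register 0 and uses its high-resolution
          default.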

     -a   Collect address space data.  Address  space  data  con-
          sists  of  page  referenced  and modified bits for each
          segment of the target's address space.  It is collected
          at sample points only.

          NOTE: address space data is  only  available  on  SPARC
          machines.
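
          For example, to record address space data along with the
          default clock-based profiling (a.out and its arguments
          are placeholders for an actual target):

               % collect -a a.out 3 5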

      -n   Disable all data collection.  Used only to simplify
           scripts for conditional data collection, for example,
           collecting data from only some MPI processes in a job
           (see the example at the end of this subsection).

     If no data specification arguments are supplied, clock-based
     profiling data, with the default resolution, is collected.

     If clock-based profiling is explicitly disabled, and neither
     synchronization  tracing, nor hardware counter based profil-
     ing is enabled, the collect command  reports  an  error  and
     fails.
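
     As an example of conditional collection with -n, a wrapper
     script along the following lines (a sketch only; the script
     name and the environment variable holding the MPI rank, here
     MP_RANK, are assumptions that depend on the MPI implementa-
     tion) records data for rank 0 and disables collection for
     all other ranks:

          #!/bin/sh
          # Record a default experiment for rank 0 only;
          # all other ranks run with data collection disabled.
          if [ "$MP_RANK" = "0" ]; then
               exec collect -d /tmp/mydirectory "$@"
          else
               exec collect -n "$@"
          fi

     The MPI job is then started with the wrapper script in place
     of the collect command, for example:
          % mprun -np 16 ./collect-rank0.sh a.out 3 5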

  Experiment Controls
     -x   Leave the target process stopped on the exit  from  the
          exec  system  call,  in  order  to  allow a debugger to
          attach to it.

     -l signal
          Take a sample whenever the given signal is delivered to
          the process.
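
          For example, assuming the signal can be given by name,
          samples could be triggered from another shell while the
          target runs (a.out and its arguments are placeholders
          for an actual target, and pid stands for its process
          ID):

               % collect -l USR2 a.out 3 5
               % kill -USR2 pid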

     -y signal[,r]
          Control recording of profiling data with signal.  When-
          ever  the  given  signal  is  delivered to the process,
          switch between paused (no data is recorded) and resumed
          (data  is recorded) states. The collector is started in
          the resumed state if the optional  ,r  flag  is  given,
          otherwise it is started in the paused state.
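
          For example, to start with recording enabled and toggle
          it from another shell (again assuming the signal can be
          given by name, with a.out as a placeholder target and
          pid as its process ID):

               % collect -y USR1,r a.out 3 5
               % kill -USR1 pid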

  Output Controls
     -o experiment_name
          Use experiment_name as the name of the experiment to be
          recorded.   The  experiment_name string must end in the
          string .er; if not, an error is reported, and no exper-
          iment is run.

          If -o is not specified, a name of the form stem.n.er is
          chosen,  where stem is a string, and n is a number.  If
          a -g argument is given, the string appearing before the
          .erg  suffix in the group name is used as the stem pre-
          fix; if no -g argument is given, the stem prefix is set
          to the string test.

          If the collect command is  launched  from  one  of  the
          various MPI commands and -o is not specified, the value
          of n used in the name is  taken  from  the  environment
          variable  used  to define the MPI Rank of that process;
          otherwise, n is set to the lowest integer not in use.
           If the name is not specified in the form stem.n.er, and
           the given name is in use, an error message is printed,
           and no experiment is run.  If the name is of that form,
           and the name is in use, the experiment is recorded
           under a name corresponding to the first available value
           of n that is not in use; a warning is issued if the
           name is changed.

     -d directory_name
           Place the experiment in directory directory_name.  If
           none is given, the experiment is recorded in the cur-
           rent working directory.

     -g group_name
          Consider the experiment to be part of experiment  group
          group_name.   The  group_name  string  must  end in the
          string .erg; if not,  an  error  is  reported,  and  no
          experiment is run.
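
     For example, the output controls might be combined as follows
     (a.out and its arguments are placeholders for an actual tar-
     get):

          % collect -d /tmp/mydirectory -g run1.erg -o test.1.er
          a.out 3 5

     This records the experiment test.1.er in /tmp/mydirectory and
     makes it part of the experiment group run1.erg.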

  Other Arguments
     -V   Print the current version.  No  further  arguments  are
          examined, and no further processing is done.

     -v   Print the current version and further detailed informa-
          tion about the experiment being run.


USING COLLECT WITH MPI

     collect can be used with MPI by simply prefacing the  target
     and its arguments with collect and its arguments in the com-
     mand line that starts the MPI job.  For example:
          % mprun -np 16 a.out 3 5
     can be replaced by:
          % mprun -np 16 collect -d /tmp/mydirectory -g  run1.erg
          a.out 3 5
     to run a default profiling experiment on each of the 16  MPI
     processes,  collecting them all in a specific directory, and
     collecting them as a group.  The individual experiments  are
     named by the MPI rank, as described above.


DATA COLLECTED

  Program-based metrics
     Clock-based Profiling
          Clock-based profiling can run at normal  frequency  (10
          ms.),  high-resolution  frequency  (1 ms.), or a custom
          frequency,  specified  in  milliseconds.    For   high-
          resolution  profiling,  the  operating  system  on  the
          machine must be running with  a  high-resolution  clock
          routine, which can be done by putting the line:

               set hires_tick=1

          in the file /etc/system and rebooting.  High-resolution
          profiles  record ten times as much data for a given run
          as normal profiles.  Attempting to set  high-resolution
          profiling  on a machine whose operating system does not
           support it posts a warning, and reverts to the highest
          resolution  supported. Similarly, a custom setting that
          is not a multiple of the resolution  supported  by  the
          system  is  rounded to the nearest non-zero multiple of
           that resolution, and a warning message is given.
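
          High-resolution profiling corresponds to a 1-millisecond
          interval and might therefore be requested with an expli-
          cit interval (a.out is a placeholder target):

               % collect -p 1 a.out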

          Profiling  produces  data  to  support  the   following
          metrics:

               User CPU Time
               Wall Time
               Total LWP Time
               System CPU Time
               Wait CPU Time
               Text Page Fault Time
               Data Page Fault Time
               Other Wait Time

          For multiprocessor experiments, all of  the  times  are
          summed across all LWPs in the process.  Total time adds
          up to the wall-clock time, multiplied  by  the  average
          number  of LWPs in the process.  Each record contains a
          timestamp, and the thread and LWP IDs at  the  time  of
          the clock tick.

     Synchronization Delay Tracing

          Synchronization delay tracing records all calls to  the
          various   thread  synchronization  routines  where  the
          real-time delay in the call exceeds a specified  thres-
          hold.  Each  record contains a timestamp, the thread ID
          and the LWP ID at the time the  request  is  initiated.
          (Synchronization  requests  from  a  thread can be ini-
          tiated on one  LWP,  but  complete  on  another.)   MPI
          blocking   calls   are   considered   to   be   thread-
          synchronization routines, and are traced where applica-
          ble.

          Synchronization delay tracing produces data to  support
          the following metrics:

               Synchronization Delay Events
               Synchronization Wait Time

     Hardware Counter Overflow Profiling
          Hardware  counter  overflow   profiling   records   the
          callstack  of each LWP at the time the hardware counter
          for the CPU on which it is running overflows.  The data
          also includes a timestamp, and the CPU, thread, and LWP
          IDs.  Hardware counter overflow profiling can  be  done
          only  on UltraSPARC-III systems running the Solaris(TM)
          8 Operating Environment (SPARC Platform Edition) or  on
          Intel  systems (Pentium II and Pentium III) running the
          Solaris 8 Operating Environment  (Intel  Platform  Edi-
          tion).  On other machines, an attempt to set HW counter
          overflow profiling generates an error.

          The counters available depend on the specific CPU  chip
          and  OS.  They can be determined by running the collect
          command with no arguments.  It prints out a usage  mes-
          sage  that  contains  the  names  of the counters.  The
          standard counters are displayed first in the list, fol-
          lowed by a list of all counters.

          Lines for a standard counter are formatted as follows:

               CPU Cycles (cycles = Cycle_cnt/0) 1000003  h=200003

           In this line, the first field, "CPU Cycles", represents
          the  metric name. The second field, "cycles", gives the
          counter name that should be used in the  -h  counter...
          argument.  The  third  field,  "Cycle_cnt/0", gives the
          internal name as used by cputrack(1) and  the  register
          number  on  which  that  counter  can be used. The next
          field is the default overflow interval,  and  the  last
          field is the default high-resolution overflow interval.

          Lines for the non-standard counters  are  formatted  as
          follows:

               Cycle_cnt/0 events 1000003  h=200003

          In this line, the first field, "Cycle_cnt/0", gives the
          internal  name  as used by cputrack(1) and the register
          number on which that counter can be  used.  The  string
          "Cycle_cnt/0  events"  is  the  metric  name  for  this
          counter. The next field is the default overflow  inter-
          val,  and the last field is the default high-resolution
          overflow interval.

          For counters that count in cycles, the metrics reported
          are  converted  to  inclusive  and exclusive times; for
          counters that do  not  count  in  cycles,  the  metrics
          reported are inclusive and exclusive event counts.

  Sampling and Global Data
     Sampling refers to the process of generating  markers  along
     the  time  line of execution, and allowing the processing of
     data for just part of the run. At each sample point,  execu-
     tion  statistics, and, if specified, address space data, are
     recorded. All of the  data  recorded  at  sample  points  is
     global  to  the  program, and does not map to function-level
     metrics.

     Sampling
          Under the collect command, samples  are  taken  at  the
          start  of the process, and at its termination. In addi-
          tion, a sample can be taken  by  using  the  libcollec-
          tor(3) API.

          The data recorded at  each  sample  point  consists  of
          microstate  accounting  information  from  the  kernel,
          along with various other statistics  maintained  within
          the kernel.


SEE ALSO

     analyzer(1), collector(1), dbx(1), er_archive(1),  er_cp(1),
     er_export(1),  er_mv(1),  er_print(1),  er_rm(1), er_src(1),
     workshop(1), libcollector(3), and
     Analyzing Program Performance With Sun WorkShop.