Man Page collect.1
NAME
collect - command used for performance data collection
SYNOPSIS
collect <args> target <target-args>
AVAILABILITY
Part of Forte(TM) C, Forte C++, and Forte for High Perfor-
mance Computing.
DESCRIPTION
The collect command can record a variety of different per-
formance data, and convert each of them into metrics of per-
formance computed against functions, callers and callees of
any function, and against source and disassembly representa-
tions of the target program. It also records global data
with periodic or manual sampling.
target is the path name of the executable for which you want
to collect performance data. Programs that are targets for
the collect command can be compiled with any level of optim-
ization, but must use dynamic linking. If a program is
statically linked, the collect command prints an error mes-
sage.
ARGUMENTS
If invoked with no arguments, collect prints a usage mes-
sage, including the names of any hardware counters available
for profiling.
Data Specifications
-p interval
Collect clock-based profiles. interval can be speci-
fied as one of the following:
o the string on implying the default of 10 millisecond
profiling
o the string off implying no clock-based profiling
o a positive, non-zero number, implying that value is
to be used as the profile interrupt, given in mil-
liseconds. The value is rounded down to the nearest
multiple of the resolution available on the system,
except if the value is lower than the resolution, in
which case it is set to the system resolution.
NOTE: if no data specification arguments are given, a
default of clock-based profiling is set.
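The rounding rule for a numeric interval can be sketched as follows. This is only an illustration of the rule as stated above, not the code collect uses; the function name and the idea of passing the system resolution as a parameter are assumptions made for the example.

```python
def effective_interval_ms(requested_ms, resolution_ms):
    """Apply the -p rounding rule described above (illustrative only).

    requested_ms:  interval given on the command line, in milliseconds
    resolution_ms: clock resolution of the system, in milliseconds
                   (assumed to be known to the caller)
    """
    if requested_ms <= 0 or resolution_ms <= 0:
        raise ValueError("interval and resolution must be positive")
    # A value below the resolution is raised to the resolution itself.
    if requested_ms < resolution_ms:
        return resolution_ms
    # Otherwise round down to the nearest multiple of the resolution.
    return (requested_ms // resolution_ms) * resolution_ms
```

With a 10 millisecond resolution, for instance, a request of 25 would yield 20, and a request of 3 would yield 10.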
-s threshold
Collect synchronization trace data, using the specified
threshold. threshold can be specified as one of the
following:
o the string on or the string calibrate, implying set-
ting the threshold value by calibration at runtime
o the string off, implying no synchronization tracing
o a positive non-zero number, representing the minimum
delay, expressed in microseconds, for an event to be
traced
o the string all, implying that all synchronization
events are to be recorded.
-h counter[,value[,counter2[,value2]]]
Collect hardware counter overflow profiles. The
counter name can be either a standard counter name, or
the internal name, as used by cputrack(1). If the name
is an internal name, and it is trailed by a slash and a
single digit (0 or 1), the event register specified by
that digit is used; if not, whichever event register
supports that named counter is used. If the optional
value is omitted, or set to an explicit zero, the
default value for that counter is used. If the
optional value is specified as the letter h, the high-
resolution default for that counter is used. If the
optional value is specified as a number, that value is
used.
An optional second counter and second interval can be
specified to allow profiling based on two counters at
once. If specified, the value for the first counter
must be specified, although it can be specified as
zero. The two named counters specified must be on dif-
ferent event registers.
Hardware counter profiling and clock-based profiling
are mutually exclusive. Only one of them can record
data in a given run. If both are specified, an error
message is printed, and no experiment is run.
NOTE: Hardware counter profiling cannot be run on a
system where cpustat is running, as that code takes
control of the counters, and does not let a user pro-
cess access them.
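The way the fields of the -h argument combine can be sketched as a small parser. This is a hypothetical illustration of the rules above, not the parser collect uses; the function names and the "default" and "hires" markers are invented for the example, and the counter name Other_cnt/1 in the usage note is a placeholder.

```python
from typing import List, Tuple, Union

Value = Union[str, int]  # "default", "hires", or an explicit number


def _normalize_value(text: str) -> Value:
    """Map one optional value field onto the cases described above."""
    if text == "":
        return "default"            # value omitted
    if text == "h":
        return "hires"              # high-resolution default
    number = int(text)              # raises ValueError if not numeric
    if number < 0:
        raise ValueError("value must not be negative")
    return "default" if number == 0 else number


def parse_hwc_spec(spec: str) -> List[Tuple[str, Value]]:
    """Split counter[,value[,counter2[,value2]]] into (name, value) pairs."""
    fields = spec.split(",")
    if len(fields) > 4 or not fields[0]:
        raise ValueError("expected counter[,value[,counter2[,value2]]]")
    if len(fields) >= 3:
        # With a second counter, the first value must be present (0 is allowed).
        if fields[1] == "":
            raise ValueError("value for the first counter must be specified")
        if not fields[2]:
            raise ValueError("second counter name is empty")
    pairs = []
    for i in range(0, len(fields), 2):
        value_text = fields[i + 1] if i + 1 < len(fields) else ""
        pairs.append((fields[i], _normalize_value(value_text)))
    return pairs
```

For example, parse_hwc_spec("cycles,h") gives one pair requesting the high-resolution default, while parse_hwc_spec("Cycle_cnt/0,0,Other_cnt/1,100003") gives two pairs. The sketch does not check that the two counters are on different event registers.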
-a Collect address space data. Address space data con-
sists of page referenced and modified bits for each
segment of the target's address space. It is collected
at sample points only.
NOTE: address space data is only available on SPARC
machines.
-n Disable all data collection. Used only to simplify
scripts for conditional data collection, as would be
used, for example, to collect data from only some MPI
processes in a job.
If no data specification arguments are supplied, clock-based
profiling data, with the default resolution, is collected.
If clock-based profiling is explicitly disabled, and neither
synchronization tracing, nor hardware counter based profil-
ing is enabled, the collect command reports an error and
fails.
Experiment Controls
-x Leave the target process stopped on the exit from the
exec system call, in order to allow a debugger to
attach to it.
-l signal
Take a sample whenever the given signal is delivered to
the process.
-y signal[,r]
Control recording of profiling data with signal. When-
ever the given signal is delivered to the process,
switch between paused (no data is recorded) and resumed
(data is recorded) states. The collector is started in
the resumed state if the optional ,r flag is given,
otherwise it is started in the paused state.
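The pause and resume behavior can be pictured as a two-state toggle. The following is a minimal sketch of that behavior, not the collector's implementation; the names are invented, and the signal is treated as an opaque string because this page does not say which spellings are accepted.

```python
from typing import Tuple


def parse_y_argument(arg: str) -> Tuple[str, bool]:
    """Split 'signal[,r]' into (signal, start_resumed)."""
    signal, sep, flag = arg.partition(",")
    if not signal or (sep and flag != "r"):
        raise ValueError("expected signal[,r]")
    return signal, flag == "r"


class RecordingState:
    """Tracks whether profiling data is currently being recorded."""

    def __init__(self, start_resumed: bool) -> None:
        # Paused unless the ,r flag was given.
        self.recording = start_resumed

    def on_signal(self) -> bool:
        # Each delivery of the signal switches between paused and resumed.
        self.recording = not self.recording
        return self.recording
```

With an argument such as SIGUSR1 (a placeholder value) and no ,r flag, the state starts paused; the first delivery of the signal resumes recording and the second pauses it again.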
Output Controls
-o experiment_name
Use experiment_name as the name of the experiment to be
recorded. The experiment_name string must end in the
string .er; if not, an error is reported, and no exper-
iment is run.
If -o is not specified, a name of the form stem.n.er is
chosen, where stem is a string, and n is a number. If
a -g argument is given, the string appearing before the
.erg suffix in the group name is used as the stem pre-
fix; if no -g argument is given, the stem prefix is set
to the string test.
If the collect command is launched from one of the
various MPI commands and -o is not specified, the value
of n used in the name is taken from the environment
variable used to define the MPI Rank of that process;
otherwise, n is set to the lowest integer not in use.
If the name is not specified in the form stem.n.er,
and the given name is in use, an error message is
printed, and no experiment is run. If the name is of that
form, and the name is in use, the experiment is
recorded under a name corresponding to the first available
value of n that is not in use; a warning is issued if
the name is changed.
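The naming rules can be summarized in a short sketch. It is an illustration under stated assumptions, not the logic collect uses: the function name is invented, the MPI rank case is left out, numbering is assumed to start at 1, and the search for a free value of n is assumed to proceed upward from the requested value.

```python
import re
from typing import Optional, Set

_NUMBERED = re.compile(r"^(?P<stem>.+)\.(?P<n>\d+)\.er$")


def choose_experiment_name(in_use: Set[str],
                           requested: Optional[str] = None,
                           group: Optional[str] = None) -> str:
    """Pick an experiment name following the -o rules described above."""
    if group is not None and not group.endswith(".erg"):
        raise ValueError("group name must end in .erg")
    if requested is None:
        # Stem comes from the group name, or defaults to "test".
        stem = group[:-len(".erg")] if group else "test"
        n = 1  # assumption: the lowest candidate integer is 1
        while f"{stem}.{n}.er" in in_use:
            n += 1
        return f"{stem}.{n}.er"
    if not requested.endswith(".er"):
        raise ValueError("experiment name must end in .er")
    if requested not in in_use:
        return requested
    match = _NUMBERED.match(requested)
    if match is None:
        # Not of the form stem.n.er and already in use: an error.
        raise FileExistsError(f"{requested} is already in use")
    stem, n = match.group("stem"), int(match.group("n"))
    while f"{stem}.{n}.er" in in_use:
        n += 1  # assumption: search upward from the requested n
    return f"{stem}.{n}.er"
```

For example, with no -o and -g run1.erg in an empty directory, the sketch yields run1.1.er; a request for a.3.er when that name is already taken yields a.4.er.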
-d directory_name
Place the experiment in directory directory_name. If
no directory is given, the experiment is recorded in the
current working directory.
-g group_name
Consider the experiment to be part of experiment group
group_name. The group_name string must end in the
string .erg; if not, an error is reported, and no
experiment is run.
Other Arguments
-V Print the current version. No further arguments are
examined, and no further processing is done.
-v Print the current version and further detailed informa-
tion about the experiment being run.
USING COLLECT WITH MPI
collect can be used with MPI by simply prefacing the target
and its arguments with collect and its arguments in the com-
mand line that starts the MPI job. For example:
% mprun -np 16 a.out 3 5
can be replaced by:
% mprun -np 16 collect -d /tmp/mydirectory -g run1.erg
a.out 3 5
to run a default profiling experiment on each of the 16 MPI
processes, collecting them all in a specific directory, and
collecting them as a group. The individual experiments are
named by the MPI rank, as described above.
DATA COLLECTED
Program-based metrics
Clock-based Profiling
Clock-based profiling can run at normal frequency (10
ms.), high-resolution frequency (1 ms.), or a custom
frequency, specified in milliseconds. For high-
resolution profiling, the operating system on the
machine must be running with a high-resolution clock
routine, which can be done by putting the line:
set hires_tick=1
in the file /etc/system and rebooting. High-resolution
profiles record ten times as much data for a given run
as normal profiles. Attempting to set high-resolution
profiling on a machine whose operating system does not
support it posts a warning, and reverts to the highest
resolution supported. Similarly, a custom setting that
is not a multiple of the resolution supported by the
system is rounded to the nearest non-zero multiple of
that resolution, and a warning message given.
Profiling produces data to support the following
metrics:
User CPU Time
Wall Time
Total LWP Time
System CPU Time
Wait CPU Time
Text Page Fault Time
Data Page Fault Time
Other Wait Time
For multiprocessor experiments, all of the times are
summed across all LWPs in the process. Total time adds
up to the wall-clock time, multiplied by the average
number of LWPs in the process. Each record contains a
timestamp, and the thread and LWP IDs at the time of
the clock tick.
Synchronization Delay Tracing
Synchronization delay tracing records all calls to the
various thread synchronization routines where the
real-time delay in the call exceeds a specified thres-
hold. Each record contains a timestamp, the thread ID
and the LWP ID at the time the request is initiated.
(Synchronization requests from a thread can be ini-
tiated on one LWP, but complete on another.) MPI
blocking calls are considered to be thread-
synchronization routines, and are traced where applica-
ble.
Synchronization delay tracing produces data to support
the following metrics:
Synchronization Delay Events
Synchronization Wait Time
Hardware Counter Overflow Profiling
Hardware counter overflow profiling records the
callstack of each LWP at the time the hardware counter
for the CPU on which it is running overflows. The data
also includes a timestamp, and the CPU, thread, and LWP
IDs. Hardware counter overflow profiling can be done
only on UltraSPARC-III systems running the Solaris(TM)
8 Operating Environment (SPARC Platform Edition) or on
Intel systems (Pentium II and Pentium III) running the
Solaris 8 Operating Environment (Intel Platform Edition).
On other machines, an attempt to set hardware counter
overflow profiling generates an error.
The counters available depend on the specific CPU chip
and OS. They can be determined by running the collect
command with no arguments. It prints out a usage mes-
sage that contains the names of the counters. The
standard counters are displayed first in the list, fol-
lowed by a list of all counters.
Lines for a standard counter are formatted as follows:
CPU Cycles (cycles = Cycle_cnt/0) 1000003 h=200003
In this line, the first field, "CPU Cycles", represents
the metric name. The second field, "cycles", gives the
counter name that should be used in the -h counter...
argument. The third field, "Cycle_cnt/0", gives the
internal name as used by cputrack(1) and the register
number on which that counter can be used. The next
field is the default overflow interval, and the last
field is the default high-resolution overflow interval.
Lines for the non-standard counters are formatted as
follows:
Cycle_cnt/0 events 1000003 h=200003
In this line, the first field, "Cycle_cnt/0", gives the
internal name as used by cputrack(1) and the register
number on which that counter can be used. The string
"Cycle_cnt/0 events" is the metric name for this
counter. The next field is the default overflow inter-
val, and the last field is the default high-resolution
overflow interval.
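The two line formats described above could be read by a script along the following lines. This is a sketch that assumes the fields are separated by whitespace exactly as in the sample lines; the type and function names are invented for the example, and the real usage message may contain other lines that the sketch simply ignores.

```python
import re
from typing import NamedTuple, Optional


class CounterInfo(NamedTuple):
    metric: str            # metric name
    counter: str           # name to give to the -h argument
    internal: str          # internal name, as used by cputrack(1)
    register: int          # event register number
    default_interval: int  # default overflow interval
    hires_interval: int    # default high-resolution overflow interval


# e.g. "CPU Cycles (cycles = Cycle_cnt/0) 1000003 h=200003"
_STANDARD = re.compile(
    r"^(?P<metric>.+?)\s+\((?P<name>\S+)\s*=\s*"
    r"(?P<internal>[^/\s]+)/(?P<reg>\d)\)\s+"
    r"(?P<default>\d+)\s+h=(?P<hires>\d+)$")

# e.g. "Cycle_cnt/0 events 1000003 h=200003"
_OTHER = re.compile(
    r"^(?P<internal>[^/\s]+)/(?P<reg>\d)\s+events\s+"
    r"(?P<default>\d+)\s+h=(?P<hires>\d+)$")


def parse_counter_line(line: str) -> Optional[CounterInfo]:
    """Return the fields of one counter line, or None if it does not match."""
    text = line.strip()
    m = _STANDARD.match(text)
    if m:
        return CounterInfo(m["metric"], m["name"], m["internal"],
                           int(m["reg"]), int(m["default"]), int(m["hires"]))
    m = _OTHER.match(text)
    if m:
        with_reg = f"{m['internal']}/{m['reg']}"
        # For non-standard counters the metric name is "<name>/<reg> events".
        return CounterInfo(f"{with_reg} events", with_reg, m["internal"],
                           int(m["reg"]), int(m["default"]), int(m["hires"]))
    return None
```

Applied to the first sample line, the sketch reports the metric "CPU Cycles", the counter name "cycles", and the intervals 1000003 and 200003.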
For counters that count in cycles, the metrics reported
are converted to inclusive and exclusive times; for
counters that do not count in cycles, the metrics
reported are inclusive and exclusive event counts.
Sampling and Global Data
Sampling refers to the process of generating markers along
the time line of execution, and allowing the processing of
data for just part of the run. At each sample point, execu-
tion statistics, and, if specified, address space data, are
recorded. All of the data recorded at sample points is
global to the program, and does not map to function-level
metrics.
Sampling
Under the collect command, samples are taken at the
start of the process, and at its termination. In addi-
tion, a sample can be taken by using the libcollec-
tor(3) API.
The data recorded at each sample point consists of
microstate accounting information from the kernel,
along with various other statistics maintained within
the kernel.
SEE ALSO
analyzer(1), collector(1), dbx(1), er_archive(1), er_cp(1),
er_export(1), er_mv(1), er_print(1), er_rm(1), er_src(1),
workshop(1), libcollector(3), and
Analyzing Program Performance With Sun WorkShop.