Sun WorkShop[tm] 6 update 2 Performance Analyzer Readme

Updated 2001/05/22

Contents

A. Introduction

B. About Sun WorkShop 6 update 2 Performance Analyzer

C. New Features

D. Software Corrections

E. Problems and Workarounds

F. Limitations and Incompatibilities

G. Documentation Errata

H. Required Patches

A. Introduction

This document contains last-minute information about the Sun WorkShop 6 update 2 Performance Analyzer, Sampling Collector and related software. This document describes the software corrections addressed by this release and lists known problems, limitations, and incompatibilities.

For installation-related and late-breaking information about this release, see the Sun WorkShop 6 update 2 Release Notes. Information in the release notes overrides information in all readme files.

To access the release notes and the full Forte[tm] Developer/Sun WorkShop[tm] documentation set, point your Netscape[tm] Communicator 4.0 or compatible browser to the documentation index (file:/opt/SUNWspro/docs/index.html).

To access the HTML version of this readme, do one of the following:

Choose Help > Readme from the Sun WorkShop main window.

Point your Netscape Communicator 4.0 or compatible browser to file:/opt/SUNWspro/docs/index.html.

To view the text version of this readme, type the following at a command prompt:

example% more /opt/SUNWspro/READMEs/analyzer

Note - If your Sun WorkShop software is not installed in the /opt directory, ask your system administrator for the equivalent path on your system.

Note - In this document the term "IA" refers to the Intel 32-bit processor architecture, which includes the Pentium, Pentium Pro, Pentium II, Pentium II Xeon, Celeron, Pentium III, and Pentium III Xeon processors and compatible microprocessor chips made by AMD and Cyrix.

B. About Sun WorkShop 6 update 2 Performance Analyzer

The Sun WorkShop 6 update 2 Performance Analyzer is available on the Solaris[tm] operating environment (SPARC[tm] Platform Edition) and Solaris operating environment (Intel Platform Edition) versions 2.6, 7, and 8.

The Performance Analyzer and the Sampling Collector are a pair of tools that collect and analyze statistical profiles of a program's performance. The data is converted into performance metrics that can be viewed at the load object, function, source line, or instruction level. The Performance Analyzer provides a means of navigating program structure that is useful for identifying the functions and code paths responsible for resource usage, inefficiencies, or time delays.

In addition to these tools, the performance tools package includes command-line utilities for data collection, program analysis and annotated source browsing.

C. New Features

This section describes the new and changed features for the Sun WorkShop 6 update 2 Performance Analyzer, Sampling Collector and related software. In addition, it lists the new features that were introduced in the Sun WorkShop 6 and Sun WorkShop 6 update 1 releases of the Performance Analyzer, Sampling Collector and related software.

See also What's New in Sun WorkShop 6 update 1, which describes all the new features in the Sun WorkShop 6 update 1 release and in the Sun WorkShop 6 release. You can access this book by pointing your browser to http://docs.sun.com. Click Search Book Titles Only, and search for "What's New."

Sun WorkShop 6 update 2 New Features

Sun WorkShop 6 update 1 New Features

Sun WorkShop 6 New Features

Sun WorkShop 6 update 2 New Features

The Sun WorkShop 6 update 2 Performance Analyzer, Sampling Collector and related software include the following new and changed features.

Hardware counter overflow profiling using two counters has been implemented. The Sampling Collector window, the -h option of the collect command and the dbx collector hwprofile counter subcommand have been extended to permit the specification of two hardware counters. See the collect(1) and collector(1) man pages for details.

A standalone source browser, er_src, is provided for viewing annotated source and disassembly code, including compiler commentary, without having to load an experiment. See the er_src(1) man page for details. Note that compiler commentary is not yet fully implemented.

The dbx collector command has two new subcommands, pause and resume. The pause subcommand turns off the recording of profiling data, and the resume subcommand turns it back on. Sample points are still recorded while collection is paused. The same functionality is available from the collector library libcollector.so through the C routines collector_pause() and collector_resume(), which are called with no arguments. The collector library does not have a Fortran interface, but the C routines can be called from Fortran with the insertion of a compiler directive.
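
The following C sketch shows one way to keep an uninteresting setup phase out of the recorded profile by bracketing it with collector_pause() and collector_resume(). This readme states only that the routines take no arguments; the void return type and the weak-symbol guard (#pragma weak, which lets the program link and run when libcollector.so is not linked in, in which case the calls are simply skipped) are assumptions of this sketch. The names setup_phase(), compute_phase() and run_with_pause() are hypothetical.

```c
#include <stddef.h>

/* Assumed prototypes: the readme says only that both routines take no
 * arguments; the void return type is an assumption. */
extern void collector_pause(void);
extern void collector_resume(void);
/* Weak references: if libcollector.so is not linked in, the symbols
 * resolve to NULL and the calls below are skipped. */
#pragma weak collector_pause
#pragma weak collector_resume

/* Hypothetical setup work that should not appear in the profile. */
static long setup_phase(long n)
{
    long sum = 0;
    for (long i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Hypothetical work that should be profiled. */
static long compute_phase(long n)
{
    long prod = 1;
    for (long i = 1; i <= n; i++)
        prod = (prod * i) % 1000003;
    return prod;
}

long run_with_pause(long n)
{
    if (collector_pause != NULL)
        collector_pause();          /* stop recording profiling data */
    long setup = setup_phase(n);
    if (collector_resume != NULL)
        collector_resume();         /* start recording again */
    return setup + compute_phase(n);
}
```

For example, run_with_pause(5) returns 130 whether or not the collector library is present; only the recorded profile differs.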

The collect command has a new option, -y <signal> [,r]. The named signal is used to turn on and turn off data collection, in the same way as the new pause and resume subcommands of dbx collector. With this option, the collector starts in the paused state by default, and waits for the given signal before recording data. If the ,r option is given, the collector starts in the resumed state and begins recording data immediately.
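
The semantics of -y can be modeled as a flag that the named signal toggles. The C sketch below is a hypothetical model, not the Collector's implementation: it uses SIGUSR1 as an example signal, starts in the paused state unless resume_flag (standing in for the ,r suffix) is set, and flips between the paused and resumed states on each delivery of the signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model state: nonzero while profiling data would be recorded. */
static volatile sig_atomic_t recording;

/* Each delivery of the -y signal flips between paused and resumed. */
static void on_toggle(int sig)
{
    (void)sig;
    recording = !recording;
}

/* Install the toggle handler for sig. resume_flag stands in for ",r":
 * false starts paused (the default), true starts recording. */
int model_init(int sig, bool resume_flag)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_toggle;
    sigemptyset(&sa.sa_mask);
    recording = resume_flag ? 1 : 0;
    return sigaction(sig, &sa, NULL);
}

bool model_is_recording(void)
{
    return recording != 0;
}
```

Calling model_init(SIGUSR1, false) and then raise(SIGUSR1) once leaves model_is_recording() true; a second raise(SIGUSR1) pauses the model again.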

The default naming scheme for MPI experiments has been improved. The number n in the standard name test.n.er defaults to the MPI rank. If an experiment group <group>.erg is specified, test is replaced by <group> in the experiment names. If an experiment name is given, the naming scheme reverts to the default scheme in which the number n is incremented for each new experiment. In any of these cases, if the experiment name exists, n is incremented with a warning.
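
The rank-based and group-based cases above can be sketched in C as follows. This is an illustration under stated assumptions, not the Collector's code: the function names are hypothetical, the case in which an explicit experiment name is given is not modeled, and the warning text is invented.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Callback reporting whether an experiment name is already taken. */
typedef bool (*exists_fn)(const char *name);

/* Example callback: treat a name as taken if it exists on disk. */
bool exists_on_disk(const char *name)
{
    return access(name, F_OK) == 0;
}

/* Build the experiment name for the MPI process with the given rank.
 * group is an experiment group such as "run.erg", or NULL for none.
 * Returns the number n used in the name, or -1 on error. */
int mpi_experiment_name(char *out, size_t outlen, const char *group,
                        int rank, exists_fn exists)
{
    char stem[256] = "test";          /* default stem: test.n.er */

    if (rank < 0)
        return -1;
    if (group != NULL) {
        size_t len = strlen(group);
        /* A group name must end in .erg; its stem replaces "test". */
        if (len < 5 || strcmp(group + len - 4, ".erg") != 0 ||
            len - 4 >= sizeof stem)
            return -1;
        memcpy(stem, group, len - 4);
        stem[len - 4] = '\0';
    }
    /* n starts at the MPI rank and is incremented, with a warning,
     * while the candidate name already exists. */
    for (int n = rank; n < INT_MAX; n++) {
        int w = snprintf(out, outlen, "%s.%d.er", stem, n);
        if (w < 0 || (size_t)w >= outlen)
            return -1;
        if (exists == NULL || !exists(out))
            return n;
        fprintf(stderr, "warning: %s already exists\n", out);
    }
    return -1;
}
```

With no group, rank 3 yields test.3.er; with the group run.erg, rank 2 yields run.2.er. Passing exists_on_disk makes the function skip names that already exist in the current directory.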

Experiment groups are now supported by the dbx collector store group <group-name> subcommand. The group name must end in .erg.

The .er suffix for experiment names is now enforced in all interfaces to the Collector.

The Performance Analyzer, er_print and er_src now read defaults files when they are launched. A defaults file can contain er_print directives that define the default metrics and metric sort order, the classes of compiler commentary to be displayed, and a path to an alternate library for C++ name demangling. A user defaults file must be named .er.rc and can be placed in the user's home directory, in a local directory, or both. Values from the file in the local directory override those from the file in the home directory, which in turn override those from the system-wide defaults file. See the er_print(1) man page for details.
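
The precedence rule for a single setting can be sketched as a lookup that consults the local file first, then the home-directory file, then the system-wide file. The key/value representation and all names in this C sketch are assumptions made for illustration; the actual directive syntax is described in the er_print(1) man page.

```c
#include <stddef.h>
#include <string.h>

/* One setting read from a defaults file (hypothetical key/value form). */
typedef struct {
    const char *key;
    const char *value;
} setting;

/* The settings read from one defaults file. */
typedef struct {
    const setting *items;
    size_t count;
} defaults_file;

/* Look up key in one file; NULL if the file is absent or does not set it. */
static const char *file_lookup(const defaults_file *f, const char *key)
{
    if (f == NULL)
        return NULL;
    for (size_t i = 0; i < f->count; i++)
        if (strcmp(f->items[i].key, key) == 0)
            return f->items[i].value;
    return NULL;
}

/* Resolve key with the documented precedence:
 * local .er.rc, then home-directory .er.rc, then the system-wide file.
 * Any of the three may be NULL (file not present). */
const char *resolve_default(const defaults_file *local,
                            const defaults_file *home,
                            const defaults_file *system_wide,
                            const char *key)
{
    const char *v;
    if ((v = file_lookup(local, key)) != NULL)
        return v;
    if ((v = file_lookup(home, key)) != NULL)
        return v;
    return file_lookup(system_wide, key);
}
```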

The meaning of the "." and "+" components of the metric keywords in er_print has been changed. "." now means time and "+" means value. This change only affects hardware counter metrics for which the count is in cycles.

The presentation of hardware counter metrics for cycles and instructions has changed. Previously, all cycle counts were aggregated, whether recorded with the counter name "cycles", "Cycle_cnt/0" or "Cycle_cnt/1", and presented as cycles. Cycle counts recorded on registers 0 and 1 are now presented separately. The same is true for instruction counts.

The collector enable_once subcommand of dbx is no longer supported. The Sampling Collector window now has only radio buttons to turn data collection on and off.

The er_archive command has two new options that make it easier to repair experiment files. See the er_archive(1) man page for details.

Sun WorkShop 6 update 1 New Features

Sun WorkShop 6 update 1 Performance Analyzer, Sampling Collector and related software included the following new and changed features.

Full support for hardware counter overflow profiling on UltraSPARC[tm] III systems running the Solaris 8 operating environment (SPARC Platform Edition) and Intel 32-bit processor architecture (IA) systems running the Solaris 8 operating environment (Intel Platform Edition).

Stand-alone collect command that allows you to collect performance data on your applications independently of Sun WorkShop and dbx.

Improved MPI (Message Passing Interface) support:

The collect command lets you specify experiment groups, allowing experiments from all of the processes of an MPI run to be grouped and processed together.

Synchronization delay tracing records all calls to the various thread synchronization routines where the real-time delay in the call exceeds a specified threshold.

Improved support for OpenMP (libmtsk) applications that lets you distinguish when a slave thread is waiting for synchronization at the end of a parallel region, and when it is waiting because the code is in a serial region.

Improved map file generation: the map file now orders the functions in the executable by whatever metric is currently used to sort the Function List.
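
Conceptually, the ordering step amounts to sorting the Function List by its current sort metric, largest first, so that the most heavily used functions are placed together. The C sketch below assumes a hypothetical func_entry record and breaks ties by name so that the order is deterministic; it does not show the map file syntax itself, which this readme does not describe.

```c
#include <stdlib.h>
#include <string.h>

/* One row of the Function List: a function name and the value of the
 * metric currently chosen as the sort key (hypothetical record). */
typedef struct {
    const char *name;
    double sort_metric;
} func_entry;

/* qsort comparator: larger metric first, ties broken by name. */
static int by_metric_desc(const void *a, const void *b)
{
    const func_entry *fa = a;
    const func_entry *fb = b;
    if (fa->sort_metric > fb->sort_metric)
        return -1;
    if (fa->sort_metric < fb->sort_metric)
        return 1;
    return strcmp(fa->name, fb->name);
}

/* Reorder funcs in place into the order in which they would be
 * placed in the executable. */
void order_for_mapfile(func_entry *funcs, size_t n)
{
    qsort(funcs, n, sizeof *funcs, by_metric_desc);
}
```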

Additions to the Select Filters dialog box that let you select experiments for which you want to change the data displayed, and enable and disable data display for experiments.

Sun WorkShop 6 New Features

The Sun WorkShop 6 Performance Analyzer, Sampling Collector and related software included the following new and changed features.

The Function List is the primary display, and is displayed by default when the Analyzer is invoked.

The Function List displays multiple metrics at the same time, instead of requiring you to select one category at a time to view. The Function List can also display metrics as values or as percentages.

A new Summary Metrics window, accessed from the View menu, displays all metrics recorded for a selected function, both as values and as percentages. The contents of the Summary Metrics window are independent of what appears in the Function List display.

From the Function List, you can access a new Callers-Callees window that shows how metrics are attributed from the callees of a selected function and to the callers of that function.

You can generate annotated source code for a selected function and display the results in an edit window.

You can generate annotated disassembly for the selected function and display the results in an edit window.

You can now use the Select Filters dialog box to filter data by samples, threads, LWPs, or any combination of these. All displays and windows are updated to show data from the selected subset only.

Two thread synchronization delay metrics are now available: a count of synchronization events exceeding the designated threshold, and the aggregate delay from those events.
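
The two metrics can be sketched as a filter over traced call delays: count the calls whose delay exceeds the threshold and sum their delays. This C sketch is illustrative only (the names and the use of microseconds for the inputs are assumptions); it also shows why a threshold of 1 microsecond, as discussed under Problems and Workarounds, can miss shorter events.

```c
#include <stddef.h>

/* The two synchronization delay metrics. */
typedef struct {
    size_t event_count;        /* events whose delay exceeded the threshold */
    double total_delay_usec;   /* aggregate delay of those events */
} sync_metrics;

/* delays_usec holds the real-time delay of each traced synchronization
 * call. Only calls whose delay exceeds threshold_usec are counted. */
sync_metrics sync_delay_metrics(const double *delays_usec, size_t n,
                                double threshold_usec)
{
    sync_metrics m = { 0, 0.0 };
    for (size_t i = 0; i < n; i++) {
        if (delays_usec[i] > threshold_usec) {
            m.event_count++;
            m.total_delay_usec += delays_usec[i];
        }
    }
    return m;
}
```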

You can now load multiple experiments into the Analyzer at the same time. Their combined metrics appear in the Function List display.

D. Software Corrections

The following bugs in the Sun WorkShop 6 update 1 release have been fixed:

The sort metric displayed in the right pane of the Callers-Callees window was incorrectly linked to the Function List sort metric. As a consequence, the Sort By radio buttons in the Callers-Callees Select Metrics dialog box did not work (analyzer bug 4373757). These problems have been corrected.

The collector close subcommand of dbx did not function correctly. It is now a synonym for collector disable. The collector quit subcommand is now also a synonym for collector disable. Both of these subcommands are now considered obsolete.

The annotated source code display for C++ programs did not work correctly (analyzer bug 4400219). This problem has been fixed.

The exclusive and inclusive percentages reported in the Callers-Callees window of the Analyzer were incorrect (analyzer bug 4386940). They have been corrected to display percentages of the total program metric.

Sleep time was incorrectly added to Kernel Page Fault time, as well as being added to Other Wait time. All kernel microaccounting states are now correctly accrued in the appropriate metric.

The time for calls to __mt_MasterFunction_ and __mt_WorkSharing_ was not attributed to the source code line on which the OpenMP directive appears (compiler bug 4333641). This bug has been fixed.

E. Problems and Workarounds

This section discusses the following software bugs that could not be fixed for this release. For updates, check the Forte Developer Hot Product News web page (http://www.sun.com/forte/developer/hotnews.html).

See also the Required Patches section in this Readme.

Note - Some bugs that appear to be in the Analyzer may actually be Collector bugs. For information about these bugs, see the dbx readme and the Collector man page. Bugs in the compilers and the Solaris operating environment can also affect the Analyzer.

Lost Clock-Based Profiling Data for LWPs

Lost Hardware Counter Profiling Interrupts

Clock-Based Profiling Inaccuracies on UltraSPARC III Hardware

Poor Scalability Past 32 CPUs

Unpredictable Behavior With libaio.so

Address Space Data on IA Hardware

Call to system() Causes Collector Failure

Incorrect Behavior of dbx collector Subcommands

Cannot Collect All Synchronization Events With dbx collector Commands

er_mv Corrupts Original Experiment When There Is Insufficient Space

Detaching From an Attached Process in dbx Before Closing Experiment Causes Hang

Using -d before -o in collect Produces an Error

Lost Clock-Based Profiling Data for LWPs

Under some circumstances, profiling interrupts (SIGPROF) for one or more LWPs may be lost. When this happens, the data displayed does not include thread profile metrics for threads that ran on those LWPs. This happens most often with unbound threads on the Solaris 2.6 and Solaris 7 operating environments (libthread bug 4248299), and much less frequently on the Solaris 8 operating environment (libthread bug 4298226).

Lost Hardware Counter Profiling Interrupts

When a multiprocessor application is running with unbound threads, the interrupt from a hardware counter overflow (SIGEMT) is usually lost and cannot be recovered (libthread bug 4352643). The workaround is to use bound threads, or the alternate libthread library (which uses bound threads, even in support of the unbound-thread APIs).
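
The following C sketch creates worker threads with system contention scope, which on Solaris corresponds to bound threads (each thread permanently attached to its own LWP). The worker function and the 64-thread limit are hypothetical; link with -lpthread (or -mt with the Sun compilers) as your environment requires.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 64   /* hypothetical limit for this sketch */

/* Hypothetical worker: replaces *arg with the sum 1 + 2 + ... + *arg. */
static void *worker(void *arg)
{
    long *p = arg;
    long sum = 0;
    for (long i = 1; i <= *p; i++)
        sum += i;
    *p = sum;
    return NULL;
}

/* Run nthreads workers as bound threads, one argument slot per thread.
 * Returns 0 on success or an error code on failure. */
int run_bound_workers(long *args, int nthreads)
{
    pthread_t tids[MAX_WORKERS];
    pthread_attr_t attr;
    int started = 0;
    int err;

    if (nthreads < 0 || nthreads > MAX_WORKERS)
        return EINVAL;
    if ((err = pthread_attr_init(&attr)) != 0)
        return err;
    /* System contention scope: on Solaris this requests a bound thread. */
    err = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    while (err == 0 && started < nthreads) {
        err = pthread_create(&tids[started], &attr, worker, &args[started]);
        if (err == 0)
            started++;
    }
    /* Join only the threads that were actually started. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    pthread_attr_destroy(&attr);
    return err;
}
```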

Clock-Based Profiling Inaccuracies on UltraSPARC III Hardware

When you profile an application on a loaded system, kernel bug 4350574 may cause User CPU time to be significantly undercounted, by up to 20%. The missing User CPU time shows up as either System CPU time or Wait-CPU time. The problem can occur on any hardware type, but occurs far more frequently on UltraSPARC III hardware.

Poor Scalability Past 32 CPUs

Due to libthread bug 4273174, applications being measured may slow down considerably when using more than 32 CPUs or threads.

Unpredictable Behavior With libaio.so

Programs that use the asynchronous I/O library, libaio.so, can behave unpredictably when performance data is collected using dbx or the Debugging GUI. The problems occur when periodic sampling is enabled. The workaround is to select manual sampling.

Address Space Data on IA Hardware

Due to kernel bug 4283005, address space data collection on IA hardware corrupts files, including experiment files. Address space data collection has been disabled on IA hardware.

Call to system() Causes Collector Failure

If the LD_LIBRARY_PATH environment variable is set in the shell resource file (for example, .cshrc or .profile) that is read when the program calls system(), the collector fails (bug 4405865). The workaround is to ensure that LD_LIBRARY_PATH is not set in the resource file read by the shell named in the SHELL environment variable.

Incorrect Behavior of dbx collector Subcommands

If an experiment is active, the dbx collector data collection and output subcommands address_space, hwprofile, profile, sample, synctrace and store are accepted silently and applied to the next experiment, if any (bug 4445393). They should be ignored with a warning if an experiment is active.

Cannot Collect All Synchronization Events With dbx collector Commands

It is not possible to set the synchronization threshold to zero using dbx commands or the Collector GUI. The dbx collector synctrace threshold 0 command reports an error (bug 4455260). Selecting the All option from the Synchronization Threshold list box in the GUI fails with an error message but does not reset the information in the GUI to the last accepted threshold.

The workaround is to use the collect command with the -s all option if you want to collect synchronization wait tracing data for all events, regardless of delay. Setting the smallest possible threshold of 1 microsecond in the GUI or in dbx might collect most events, but cannot be guaranteed to collect all events.

er_mv Corrupts Original Experiment When There Is Insufficient Space

When er_mv moves an experiment, it deletes files as it progresses. If there is insufficient space in the destination, the experiment is split between the source and the destination, and the source experiment is incomplete (bug 4421263). The workaround is to check that there is enough disk space before moving an experiment.
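
A minimal C sketch of that check, assuming you have already measured the experiment's size (for example with du -sk), compares it against the space that statvfs() reports as available on the destination file system. The function name is hypothetical, and in practice you may want to leave a margin rather than require an exact fit.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/statvfs.h>

/* Return true if the file system holding dest_dir has at least
 * needed_bytes available to unprivileged users. */
bool has_room_for(const char *dest_dir, uint64_t needed_bytes)
{
    struct statvfs fs;
    if (statvfs(dest_dir, &fs) != 0)
        return false;           /* cannot tell: treat as "no room" */
    /* f_bavail is counted in units of f_frsize bytes. */
    uint64_t avail = (uint64_t)fs.f_bavail * (uint64_t)fs.f_frsize;
    return avail >= needed_bytes;
}
```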

Detaching From an Attached Process in dbx Before Closing Experiment Causes Hang

If you attach dbx to a process, enable data collection (using collector enable), and then try to detach from the process before disabling data collection, dbx unloads the collector library, fails to close the experiment, fails to detach from the process, and cannot complete any more commands (bug 4456506). You must kill dbx, which in turn kills the target process. The workaround is to explicitly disable data collection (using collector disable) before detaching dbx from the process.

Using -d before -o in collect Produces an Error

If the -d option precedes the -o option in the collect command, the command fails with the following error message:

Experiment name may only be set once

The workaround is to ensure that, if you use the -d option, it follows the -o option.

F. Limitations and Incompatibilities

This section describes some known limitations and incompatibilities of the software.

Hardware-Counter Overflow Profiling

Finding Source and Object Files

Experiment Incompatibility

Optimized C/C++ Code on IA Platforms

Hardware-Counter Overflow Profiling

Hardware-counter overflow profiling is not supported on processors earlier than the UltraSPARC III series.

Hardware-counter overflow profiling is not supported on versions of the operating environment that precede the Solaris 8 release.

Some early versions of UltraSPARC III hardware do not support profiling based on User DTLB or ITLB misses. They support TLB counters for kernel mode only.

Hardware-counter profiling and clock-based profiling are mutually exclusive. Only one of them can record data in a given run. If both are specified, an error message is printed, and no experiment is run.

Finding Source and Object Files

The executable name generated by the debugger for attached experiments may be a relative path rather than an absolute path, or it may be an absolute path that is not accessible to the Analyzer. Similar problems can arise with object files loaded from an archive (.a).

The Performance Analyzer copes with this problem as follows:

It looks for the file using the given path.

If it does not find the file, it extracts the basename from the given path (the name following the last "/") and looks for the file as ./<basename>.

If it still does not find the file, it generates an error or warning, showing the path as it originally appeared in the experiment.

At this point you can look at the Summary Metrics window to determine the file name the Performance Analyzer is using, then set up a symbolic link from the current directory pointing to the real file and try the operation again.
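
The lookup order above can be sketched in C as follows. This is an approximation, not the Analyzer's code: the function name is hypothetical, and checking readability with access(R_OK) is an assumption about what counts as finding the file.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Returns 0 and stores the path found in out, or returns -1 after
 * printing a warning that shows the path as recorded. */
int find_object_file(const char *recorded, char *out, size_t outlen)
{
    int w;

    /* Step 1: the path exactly as recorded in the experiment. */
    w = snprintf(out, outlen, "%s", recorded);
    if (w >= 0 && (size_t)w < outlen && access(out, R_OK) == 0)
        return 0;

    /* Step 2: ./<basename>, where basename follows the last '/'. */
    const char *slash = strrchr(recorded, '/');
    const char *base = (slash != NULL) ? slash + 1 : recorded;
    w = snprintf(out, outlen, "./%s", base);
    if (w >= 0 && (size_t)w < outlen && access(out, R_OK) == 0)
        return 0;

    /* Step 3: give up, reporting the original path. */
    fprintf(stderr, "warning: cannot find %s\n", recorded);
    return -1;
}
```

Given a recorded path such as /build/obj/foo.o (hypothetical) that is no longer accessible, the function tries ./foo.o next, which is why a symbolic link in the current directory resolves the problem.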

Experiment Incompatibility

The Analyzer cannot load experiments created with versions of the Sampling Collector prior to the Sun WorkShop 6 software release. It reports, "Unrecognized version number; expected 8".

Optimized C/C++ Code on IA Platforms

If you compile a C program on an IA platform with an optimization level of 4 or 5, the Collector is unable to reliably unwind the call stack. As a consequence, only the exclusive metrics for a function are correct. If you compile a C++ program on an IA platform, you can use any optimization level, provided you do not use the -noex (or -features=no%except) compiler option to disable C++ exceptions. If you do use this option, the Collector is unable to reliably unwind the call stack, and only the exclusive metrics for a function are correct.
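
Why only the exclusive metrics survive an unreliable unwind can be seen in a simplified model of metric attribution. In this C sketch (hypothetical names; not the Collector's implementation), each sample carries its call stack with the leaf first. The exclusive metric needs only the leaf frame, which is known even when unwinding fails, while the inclusive metric needs every caller frame.

```c
#include <stddef.h>
#include <string.h>

/* One profile event: the call stack at the time of the sample, leaf
 * (currently executing function) first, and the metric value charged
 * to the event. */
typedef struct {
    const char *const *frames;  /* frames[0] is the leaf */
    size_t depth;
    double value;
} sample;

/* Exclusive metric: only the leaf frame is charged. */
double exclusive_metric(const sample *s, size_t n, const char *func)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        if (s[i].depth > 0 && strcmp(s[i].frames[0], func) == 0)
            total += s[i].value;
    return total;
}

/* Inclusive metric: charged once per sample if func appears anywhere
 * on the stack (once, so recursion is not double-counted). */
double inclusive_metric(const sample *s, size_t n, const char *func)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        for (size_t d = 0; d < s[i].depth; d++) {
            if (strcmp(s[i].frames[d], func) == 0) {
                total += s[i].value;
                break;
            }
        }
    }
    return total;
}
```

For example, with samples {leaf, mid, main} = 1.0, {mid, main} = 2.0, and a truncated {leaf} = 4.0, the exclusive metric of leaf is 5.0, but the inclusive metric of main is 3.0 instead of the 7.0 that a complete unwind would give.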

G. Documentation Errata

This section discusses the following errors found in the manual, Analyzing Program Performance With Sun WorkShop.

The description of tcov in Appendix A does not define a basic block. The definition of a basic block is "a contiguous section of code that has no branches".

The following note was omitted from the chapter, Before You Begin.
Note - In this document the term "IA" refers to the Intel 32-bit processor architecture, which includes the Pentium, Pentium Pro, Pentium II, Pentium II Xeon, Celeron, Pentium III, and Pentium III Xeon processors and compatible microprocessor chips made by AMD and Cyrix.

H. Required Patches

The following patches are not included in the Sun WorkShop 6 update 2 Early Access 1 release, but are required for the proper operation of the Sampling Collector and Performance Analyzer.

Patch                                Solaris   SPARC Platform           IA Platform
                                     Version
libaio, libc, and watchmalloc patch  2.6       105210-27                105211-27
libaio patch                         2.7       108244-01                108245-01
                                               (requires 106541-09)
/usr/lib/libthread.so.1 patch        2.6       105568-17                105569-16
                                               (requires 105210-27)     (requires 105211-27)
                                     2.7       106980-11                106981-10
                                               (requires 106541-09)     (requires 106542-09)
/usr/lib/fs/ufs/fsck patch           2.7       107544-03                107545-03
Kernel patch for Performance Tools   2.7       106541-11                106542-09
                                               (requires 107544-03)     (requires 107545-03)
SIGEMT                               2.8       108528-02                108529-02
alt-libthread                        2.8       109461-01                109462-01

These patches can be downloaded at http://sunsolve.sun.com.

The Collector and Analyzer encounter the following problems when the patches are not installed:

Programs that use libaio and invoke aio_cancel() abort during data collection with a variety of error messages, including the following:

dbx: Cannot read status for 1@1--No such file or directory
dbx: Warning: proc state race condition encountered!

Multithreaded executables cause a SEGV during data collection. Sometimes the core dump occurs in the thread library code, and sometimes it occurs in sigacthandler() for the SIGPROF signal.

Multithreaded executables can fail during collection with various dbx error messages, including those listed in the first item above and messages reporting the following:

generic libthread_db.so error

Multithreaded executables can fail during collection with a libthread panic relating to a signal fault in a critical section.

The workaround for all these problems is to install the required patches.

Copyright 2001 Sun Microsystems, Inc., 901 San Antonio Road, Palo Alto, CA 94303, U.S.A. All rights reserved.

Sun, Sun Microsystems, the Sun logo, docs.sun.com, and Solaris are trademarks, registered trademarks, or service marks of Sun Microsystems, Inc. in the U.S. and other countries.
