Standard Wording for Capabilities
Version 2.0 (November 14, 1995)
This portion of the document provides standard verbiage for all the capabilities
(both Baseline Development Environment and Priority Capabilities) identified as
important to a significant number of HPC user sites.
The task force strongly urges that the entire Baseline Development
Environment -- that is, all capabilities identified as (BDE) -- be included
on every procurement for a parallel or clustered machine, even if a particular
site does not believe that a particular feature is critical. Only by presenting
HPC providers with consistent requirements can the user community hope to achieve
a consistent program development environment spanning multiple platforms.
In addition, most sites will want to include capabilities from one or more of the
priority levels (identified as (D1) for "desirables 1," etc.; see the
Terms and Definitions section for more information on
the levels). The standard verbiage does not include any specific performance
criteria to be met by proposers' implementations, nor any indication of how
compliance will be evaluated. It is expected that each agency will add such
criteria, according to its own procurement regulations.
Some RFPs will also need additional items
addressing site-specific software needs that are not covered here.
We recommend that all such additions be made in the form of new requirements,
rather than modifications to the standard verbiage. It should be remembered that
HPC providers will already be familiar with the standard capabilities, and are
unlikely to notice small changes in the wording introduced by a particular site.
New, numbered items are more likely to be noted.
Terminology
The following terms have been defined for specific use in this document:
platform, PE (Processing Element), API (Application
Programming Interface), standard API, published API, current
standard, fully-supported implementations, XXX-compatible
software, and single-point control interfaces. See the
Terms and Definitions section for details.
Contents
1.1. Shells and Utilities
- 1.1.1 (BDE) Fully supported implementation of sh, as specified in POSIX
1003.2.
- 1.1.2 (BDE) Fully supported implementation of csh
(compatible with version 4 of System V UNIX).
- 1.1.3 (D3) Availability of tcsh, version 6.03.
- 1.1.4 (D3) Availability of ksh, version dated 12/28/93.
- 1.1.5 (BDE) Fully supported implementation of grep, egrep, sed,
and diff, version 2.0.
- 1.1.6 (D2) Availability of perl, version 5.0.
- 1.1.7 (BDE) Fully supported implementation of vi
(compatible with version 4 of System V UNIX, including
ex), supporting line-oriented as well as full-screen mode.
1.2. Sequential Compilers
- 1.2.1 (BDE) Fully supported implementations of Fortran77 (ANSI standard
plus MIL standard extensions) and C (current ANSI standard).
These will be referred to hereafter as the baseline
languages.
- 1.2.2 (D1) Fully supported implementation of Fortran90 (current ANSI
standard).
- 1.2.3 (D1) Fully supported implementation of C++ (ANSI standard when
it becomes available).
1.3. Parallelizing Compilers/Translators
- 1.3.1 (D1) Fully supported implementation of Subset HPF Version 1.1.
Platform-specific extensions are allowed, but the full subset must
be included.
- 1.3.2 (D3) Availability of full HPF standard. Platform-specific
extensions are allowed.
- 1.3.3 (D1) For shared-memory systems, capability of a Fortran compiler
to support parallelism through directives or language constructs.
- 1.3.4 (D2) For shared-memory systems, capability of a Fortran compiler
to perform automatic parallelization.
- 1.3.5 (D2) For shared-memory systems, capability of a C compiler
to support parallelism through pragmas or language constructs.
1.4. Additional Language Support
- 1.4.1 (BDE) Fully supported implementation of mechanisms for mixed-language
applications (i.e., allowing
inter-language subprocedure invocations), for the baseline
languages.
- 1.4.2 (D1) Availability of mixed language support, as defined in
1.4.1, for all languages specified in Section 1.2.
- 1.4.3 (D1) Availability of an assembler.
- 1.4.4 (D2) Availability of compiler option(s) to produce pseudo-
assembly-language listings for the baseline languages.
- 1.4.5 (D3) Availability of compiler directives/pragmas allowing the
embedding of assembly-language instructions in applications written
in the baseline languages.
- 1.4.6 (D2) Availability of mechanisms to perform basic
interprocedural analysis (e.g., variable cross-reference
listing, COMMON block analysis, use/def analysis) for applications
written in the baseline languages.
- 1.4.7 (D2) Capability for language-sensitive modes, handling the
baseline languages, in some editor (e.g., emacs).
- 1.4.8 (D3) Availability of lint-like tool(s) for the baseline
languages.
1.5. Application Building
- 1.5.1 (BDE) Fully supported make, conforming to the GNU make interface
(version 3.74).
- 1.5.2 (D1) Capability of preprocessing ANSI C preprocessor directives in
applications written in any of the baseline languages.
- 1.5.3 (D3) Capability of performing make in parallel.
- 1.5.4 (BDE) Fully supported implementation of an object module linker.
- 1.5.5 (D1) Capability of re-linking selected portions of an application
(i.e., replace specific objects within the binary).
- 1.5.6 (D2) Capability of deferring linking of library objects until load
time (i.e., entire library linked to application as it is
loaded).
- 1.5.7 (D3) Capability of deferring linking of library objects until run
time (i.e., individual objects are linked in only if they
are referenced at run time).
- 1.5.8 (D1) Capability of building (i.e., preprocess, compile, and link)
an application intended to execute on a parallel computer, using
some common workstation platform; this requirement encompasses
any special licensing arrangements that might be necessary
for such cross-platform development.
- 1.5.9 (D3) Availability of a source code management tool, for projects
involving multiple programmers (e.g., SCCS, USM, RCS, CVS).
2.1. Stack Traceback Utilities
- 2.1.1 (BDE) Fully supported implementation of a feature whereby
critical information is generated to stderr upon interruption
of a process/thread involving any trap for which the application
has not defined a handler. The information will include a
source-level stack traceback (indicating the approximate
location of the process/thread in terms of source routine and
line number) and an indication of the interrupt type.
- 2.1.2 (D1) Availability of a feature whereby critical information is
generated to stderr upon failure of a parallel application.
The information will include aggregate source-level stack
tracebacks (as defined in 2.1.1), representing all
processes/threads involved in the
application, and an indication of the reason for failure.
- 2.1.3 (D1) Availability of the standard lightweight corefile
API, defined by the Parallel
Tools Consortium, to trigger generation of aggregate
traceback data like that described in 2.1.2 (may
produce a platform-specific format).
- 2.1.4 (D2) The format of the information in 2.1.1 (also 2.1.2 and
2.1.3, if included in the requirements) must be that specified
for the "lightweight corefile" facility defined by the Parallel
Tools Consortium.
- 2.1.5 (D2) Availability of the "lightweight corefile" facility
defined by the Parallel Tools Consortium, including capabilities
for storing lightweight corefiles in standardized format, as well
as command-line and graphical browsers for analyzing and
displaying that information.
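As an illustration of the kind of output described in 2.1.1, the following Python sketch builds a report containing the interrupt type and a source-level stack traceback (routine and line number) and writes it to stderr. It is an analogy built on Python's standard library only. The function names are hypothetical, the handler stands in for a system-provided default, and the output is not the Parallel Tools Consortium lightweight corefile format.

```python
import signal
import sys
import traceback


def format_trap_report(signum, frame):
    """Build a report naming the interrupt type and the source-level stack."""
    name = signal.Signals(signum).name
    lines = [f"interrupt: {name}\n"]
    # Each traceback entry gives the file, line number, and routine name.
    lines.extend(traceback.format_stack(frame))
    return "".join(lines)


def report_trap(signum, frame):
    """Signal handler (hypothetical stand-in for a system default)."""
    sys.stderr.write(format_trap_report(signum, frame))


def compute():
    # Deliver a signal to this process to simulate a trap inside a routine.
    signal.raise_signal(signal.SIGTERM)


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, report_trap)
    compute()
```

For a parallel application (2.1.2), one such report per process/thread would have to be gathered into an aggregate; that step is not shown here.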
2.2. Interactive Debugger
- 2.2.1 (BDE) Fully supported implementation of an interactive parallel
debugger providing single-point control for debugging both
sequential and parallel applications (multiple debugger invocations
to control individual processes are not acceptable). At least
the following functionality must be supported:
- control of parallel processes: start/stop
processes, set/list/remove breakpoints and data watchpoints,
single-step into/over subprocedure invocations
- examination of program state: stack traceback(s) for
processes, contents of variables, aggregates, and blocks of
memory, current states and source locations of processes
- modification of program state: change contents of
variables, aggregates, and blocks of memory
The debugger must report information at the level of
application source code (before preprocessing) for all baseline
languages, including support for mixed-language applications.
Both a full-screen, window-based interface and a command-line
interface must be fully supported, although they need not be
functionally identical.
- 2.2.2 (BDE) Where the programming model supports it, fully supported
implementation of some mechanism for
viewing and controlling MPMD (multiple executable) as well as SPMD
applications.
- 2.2.3 (BDE) In the presence of code optimization, fully supported
implementation of some mechanism for reporting at least
minimal information on program state (stack traceback, access to
variables that have not been eliminated) and some degree of
functionality (breakpoints where possible, single-stepping at some
level, stepping over subroutines).
- 2.2.4 (D1) Capability of reporting information at the source level
(before preprocessing), for all languages in 1.2 and 1.3,
including support for mixed-language applications.
- 2.2.5 (D1) Capability of viewing source information after preprocessing
(i.e., #ifdef's, #include's, and macros expanded), for all languages
in 1.2 and 1.3, including support for mixed-language
applications.
- 2.2.6 (BDE) Fully supported implementation of some mechanism for invoking
the debugger for examining the final state of an application that
failed ("postmortem debugging"). Facilities for modifying program state
and/or continuing execution need not be available in this mode.
If the code was not compiled for debugging, it is understood that
access to source-level information may be limited.
- 2.2.7 (D2) Capability of attaching/detaching the debugger to/from an
executing (serial or
parallel) application. Facilities for modifying program state and
continuing execution must be available. If the code was not
compiled for debugging, it is understood that access to
source-level information may be limited.
- 2.2.8 (D1) Availability of an evaluator capable of calculating the
results of simple expressions (in some scripting-like language)
such as values of conditionals and indirect array references.
- 2.2.9 (D2) Same as 2.2.8, but supporting Fortran and C expressions.
- 2.2.10 (D3) Availability of a high-level language interpreter (the
language(s) supported by the debugger) so user-defined functions
can be executed at breakpoints and watchpoints.
2.3. Debugger Interface Issues
- 2.3.1 (D2) Capability of specifying that debugger operation(s)
be applied to just a subset of the processes/threads in a parallel
application.
- 2.3.2 (D1) For time-consuming operations, a visual indication of
operation-in-progress, where time-consuming refers to any
situation where there is a long enough lapse that the
user might be led to think that nothing was happening and would
attempt the operation a second time.
- 2.3.3 (D2) For time-consuming operations (as defined in 2.3.2),
capability of canceling the operation after it has been
started. There need be no guarantee that the original state
(prior to start of the operation) will be restored, just that
the operation will not continue.
- 2.3.4 (D3) Ability to compose macros as a shortcut for specifying
frequently executed sequences of debugger commands.
- 2.3.5 (D2) Online help for all debugger features, including online
access to all debugger manuals.
- 2.3.6 (D1) Capability of setting a breakpoint by point-and-click on a
source location in the source code listing. Point-and-click refers
to a simple one- or two-step operation; solutions that require
highlighting a portion of the code and then moving the cursor to invoke
menu items, or that require multi-step operations, are not acceptable.
- 2.3.7 (D1) Capability of viewing the source code for a function or
subroutine by point-and-click (as defined in 2.3.6) on its name,
in a routine list or in source lines in other routines.
- 2.3.8 (D2) Capability of setting a data watchpoint by point-and-click (as
defined in 2.3.6) on the variable's name in the source listing.
- 2.3.9 (D1) Capability of viewing the value of a variable by
point-and-click (as defined in 2.3.6) on its name in the source
listing.
- 2.3.10 (D3) Capability of modifying the value of a variable by
point-and-click (as defined in 2.3.6) on its name in the source
listing.
- 2.3.11 (D3) Availability of facilities to view/filter the contents of
message queues, as specified by the Parallel Tools Consortium's
"message queue manager."
- 2.3.12 (D3) Capability of editing source code and re-making the executable
from within (i.e., without having to exit) the debugger.
- 2.3.13 (D3) Capability of opening some simple editor (e.g., vi)
at the offending source code location when an error occurs.
2.4. Debugger Data Displays
- 2.4.1 (D1) Availability of a scrollable, tabular representation showing
all values in a vector, matrix, or 2-D array slice.
- 2.4.2 (D1) Availability of multiple visual representations of values in
a matrix or 2-D array slice (e.g., bitmap showing elements exceeding
a threshold value, colormap, surface map, contour map).
- 2.4.3 (D1) Capability of viewing the layout and values of aggregate data
structures other than arrays with point-and-click navigation,
as defined in 2.3.6.
- 2.4.4 (D3) Capability of assimilating the local values of variables
that are replicated across multiple threads/processes, and
presenting a condensed summary within a single window.
- 2.4.5 (D3) Where distributed arrays are supported by the programming
model, capability of gathering the elements of a distributed 2-D
array and presenting them in a single table/visualization, as
described in 2.4.1 and 2.4.2.
- 2.4.6 (D3) Capability of specifying that the displays defined in Section
2.4 be shown for just a subset of the processes/threads in a
parallel application.
2.5. Profiling Tool
- 2.5.1 (BDE) Fully supported implementation of a tool for profiling
CPU time distribution from all processes/threads in a parallel
application, at the levels of subprocedures and coarse blocks (e.g.,
large loops). Must include capability for statically restricting
the amount of profiling data collected
to certain portions of the source code (e.g., a specific
subset of procedures), through the use of compiler directives or
command-line switches. Must provide visual as well as textual
displays of tool output.
- 2.5.2 (D1) Capability to provide a consolidated summary report for the
data provided in 2.5.1.
- 2.5.3 (D3) Availability of a published API (possibly platform-specific)
for dynamically activating and deactivating profiling during
execution.
- 2.5.4 (D3) Capability to view the source code listing for a function or
subroutine by point-and-click (as defined in 2.3.6) on its name
in the profiling tool's display.
- 2.5.5 (D2) Availability of online help for all profiling tool features,
including online access to all manuals.
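The dynamic activation and deactivation of profiling named in 2.5.3 can be illustrated with Python's standard cProfile module, whose enable() and disable() calls bracket the region of interest. This is a single-process sketch offered as an analogy; the routine names setup_phase and solver_phase are hypothetical, and a platform's published API for parallel applications would differ.

```python
import cProfile
import pstats


def setup_phase():
    return sum(range(1000))


def solver_phase():
    return sum(i * i for i in range(1000))


profiler = cProfile.Profile()

setup_phase()        # runs before activation, so it is not profiled

profiler.enable()    # activate profiling for the region of interest
solver_phase()
profiler.disable()   # deactivate profiling again

# Collect the names of the routines that appear in the profile data.
stats = pstats.Stats(profiler)
profiled_names = {func_name for (_file, _line, func_name) in stats.stats}
```

Only routines executed between the two calls appear in profiled_names, which is the dynamic counterpart of the static restriction described in 2.5.1.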
2.6. Event Tracing Tool
- 2.6.1 (BDE) Fully supported implementation of an event tracing tool.
Mechanisms for generating event records must include timestamp
and event type designator and be formatted in SDDF (self-defining
data format), and require the availability
of a published API (possibly platform-specific)
for dynamically activating and deactivating event monitoring
during execution. A single visual tool must be capable of
displaying the event data.
- 2.6.2 (BDE) For all message-passing libraries supported on the platform,
fully supported implementation of some mechanism
for tracing message sends, receives, and
synchronizations, at least to the level supported interactively
by the Parallel Tools Consortium's "message queue manager."
- 2.6.3 (D1) Where the programming model supports it, capability of
viewing MPMD (multiple executable) as well as SPMD
applications.
- 2.6.4 (D2) Capability of viewing the pertinent source code by
point-and-click (as defined in 2.3.6) on an event representation
in the tool's display.
- 2.6.5 (D2) Availability of online help for all event tool features,
including online access to all manuals.
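A minimal sketch of the event-record mechanism in 2.6.1 follows: each record carries a timestamp and an event type designator, and monitoring can be switched on and off during execution. The class and field names are hypothetical, and the in-memory record layout is an assumption made for illustration; it is not SDDF.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRecord:
    timestamp: float   # seconds read from a monotonic clock
    event_type: str    # event type designator
    detail: str


class EventTracer:
    """Collects event records only while monitoring is active."""

    def __init__(self):
        self.active = False
        self.records = []

    def activate(self):
        self.active = True

    def deactivate(self):
        self.active = False

    def record(self, event_type, detail=""):
        if self.active:
            self.records.append(
                EventRecord(time.monotonic(), event_type, detail))


tracer = EventTracer()
tracer.record("send", "ignored: monitoring is off")
tracer.activate()
tracer.record("send", "to PE 1")
tracer.record("recv", "from PE 1")
tracer.deactivate()
tracer.record("send", "ignored: monitoring is off again")
```

Here only the two events recorded between activate() and deactivate() are retained.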
2.7. Performance Statistics Tool
- 2.7.1 (BDE) Fully supported implementation of performance statistics
tool(s), whereby performance measures obtained for individual
PEs/processes are reported and summarized for the entire
application. There must be some mechanism for capturing the
statistics and storing them for later analysis/viewing. The
measures may be platform-specific, but must include a summary
of memory usage.
- 2.7.2 (D1) Where the programming model supports it, capability of
generating and viewing performance statistics for
MPMD (multiple executable) as well as SPMD applications.
- 2.7.3 (D3) Capability of reporting statistics on cache misses.
- 2.7.4 (D3) Capability of reporting statistics on page faults.
- 2.7.5 (D3) Capability of reporting statistics on communications,
in terms of bytes sent/received.
- 2.7.6 (D3) Capability of reporting statistics on FLOPS.
- 2.7.7 (D3) Capability of reporting statistics on OPS.
3.1. Message-Passing Libraries
- 3.1.1 (BDE) Fully supported implementation of the current standard,
as defined by the most recent specification from the MPI Forum.
- 3.1.2 (BDE) Fully supported implementation of the dynamic process control
routines specified by the MPI Forum (released at Supercomputing
'95).
- 3.1.3 (BDE) Fully supported implementation of PVM (version 3.3.7).
- 3.1.4 (D3) Availability of the current standard, as defined by the
most recent public-domain version from ORNL.
3.2. Remote Memory Operations
- 3.2.1 (D1) Availability of a published API (possibly platform-specific)
for performing remote get/put operations.
- 3.2.2 (D2) Availability of a published API (possibly platform-specific)
for performing atomic-increment-and-return-previous-value.
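The semantics of the atomic-increment-and-return-previous-value operation in 3.2.2 can be sketched as follows. The sketch uses threads sharing one address space and a lock, purely to show the expected behavior (every caller receives a distinct previous value); it is not a remote memory operation, and the class name AtomicCounter is hypothetical.

```python
import threading


class AtomicCounter:
    """Counter whose increment returns the value held before the update."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_increment(self, amount=1):
        # The read and the update happen under one lock, so no two
        # callers can observe the same previous value.
        with self._lock:
            previous = self._value
            self._value = previous + amount
            return previous

    def load(self):
        with self._lock:
            return self._value


counter = AtomicCounter()
tickets = []
tickets_lock = threading.Lock()


def worker():
    for _ in range(100):
        ticket = counter.fetch_and_increment()
        with tickets_lock:
            tickets.append(ticket)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A typical use is handing out unique work-item indices to cooperating processes without further coordination.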
3.3. Thread Operations
- 3.3.1 (BDE) Fully supported implementation, as defined by the POSIX 1003.4
working group standard, of thread operations, in shared address
spaces.
3.4. Math Libraries
- 3.4.1 (BDE) Fully supported implementation of
a published API (may be platform-specific) for one-, two- and
three-dimensional FFTs for both radix-2 and mixed-radix arrays,
executed on a single PE. Must handle complex-to-complex,
real-to-complex, and complex-to-real formats.
- 3.4.2 (BDE) Fully supported implementation of
a published API (may be platform-specific) for one-, two- and
three-dimensional FFTs for both radix-2 and mixed-radix arrays,
in parallel form for execution across multiple PEs, handling
the same formats as 3.4.1.
- 3.4.3 (BDE) Fully supported implementation of levels 1, 2, and 3 of the
BLAS, executed on a single PE.
- 3.4.4 (BDE) Fully supported implementation of LAPACK (single-PE)
and ScaLAPACK (multiple-PE).
- 3.4.5 (D1) Availability of a published API (may be platform-specific)
supporting both sequential and parallel sparse matrix
operations, using both general-sparse and sparse/block
representations, for the following functionality:
- (sparse-matrix)*(dense-vector)
- (dense-vector)*(sparse-matrix)
- (sparse-matrix)*(sparse-matrix)
- (sparse-matrix)*(dense-matrix)
- (dense-matrix)*(sparse-matrix)
- scatter and gather communications
- 3.4.6 (D2) Availability of a published API (may be platform-specific)
supporting both sequential and parallel mesh operations,
for the following functionality:
- generate_dual
- partition_mesh
- reorder_pointers
- 3.4.7 (D1) Availability of a published API (may be platform-specific)
supporting both sequential and parallel eigensolver
routines for sparse matrices, analogous to the routines
defined by LAPACK for dense matrices.
- 3.4.8 (BDE) Fully supported implementation of
a published API (may be platform-specific) for a parallel lagged
Fibonacci random number generator using Mascagni's seed selection
algorithm, so that
- the same seed for the same random number generator produces
the same (reproducible) sequence of random numbers on all
platforms; and
- there is a mathematically sound method of choosing seeds for
the capability of producing different sequences of random
numbers on different processors.
- 3.4.9 (BDE) Fully supported implementation of
a published API (may be platform-specific) for transposing
arrays among the PEs corresponding to all permutations of the
array's indices, including straightforward (blocked) distribution.
It must be possible for the user to specify which indices
correspond to data that is distributed.
- 3.4.10 (BDE) Fully supported implementation of
a published API (may be platform-specific) for converting
among standard data decompositions, including the ScaLAPACK
distribution, blocked distributions where up to three indices are
distributed (other indices are serial), and all other distributions
supported on the platform.
- 3.4.11 (D1) Availability of a published API (may be platform-specific)
supporting gather and scatter operations.
- 3.4.12 (D3) Availability of a published API (may be platform-specific)
supporting blocked and grouped
gather/scatter operations (i.e., collective communications
within a subset of PEs).
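To make the first operation listed under 3.4.5 concrete, the sketch below computes (sparse-matrix)*(dense-vector) sequentially. It assumes compressed sparse row (CSR) storage as one possible general-sparse representation; the standard wording does not prescribe a storage format, and the function name csr_matvec is hypothetical.

```python
def csr_matvec(values, col_index, row_ptr, x):
    """Return y = A * x for a matrix A stored in CSR form.

    values    -- nonzero entries, row by row
    col_index -- column of each entry in values
    row_ptr   -- row_ptr[i]:row_ptr[i + 1] spans the entries of row i
    """
    y = []
    for row in range(len(row_ptr) - 1):
        total = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            total += values[k] * x[col_index[k]]
        y.append(total)
    return y


# The 3x3 matrix
#   [[2, 0, 1],
#    [0, 3, 0],
#    [4, 0, 5]]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
col_index = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]

y = csr_matvec(values, col_index, row_ptr, [1.0, 2.0, 3.0])
# y == [5.0, 6.0, 19.0]
```

A parallel version would typically distribute rows across PEs and gather the needed entries of x, which is where the scatter and gather communications of 3.4.5 come in.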
3.5. Performance Measurement Libraries
- 3.5.1 (BDE) Fully supported implementation of the standard API defined by
the Parallel Tools Consortium for interval wallclock timers local
to a thread/process. This must access the best available
wallclock timer on the platform, in terms of accuracy and
non-intrusiveness.
- 3.5.2 (BDE) Fully supported implementation of the standard API defined by
the Parallel Tools Consortium for interval CPU timers local
to a thread/process. This must provide access to both
user CPU time and system CPU time (where criteria for each may
be platform-dependent) and must access the best available timers
on the platform, in terms of accuracy and non-intrusiveness.
- 3.5.3 (D2) Availability of a published API (possibly platform-specific) to
a globally synchronized wallclock timer, with uniform count
intervals and 64-bit standardized readout format.
- 3.5.4 (D3) Availability of a published API (possibly platform-specific)
for
routines that measure approximate counts for floating-point
add, multiply, and divide operations. These may be intended for
application to relatively coarse execution units (e.g., large
loops or significant numbers of iterations).
- 3.5.5 (D3) Availability of an API as specified in 3.5.4, to measure
square root operations.
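The interval timers of 3.5.1 and 3.5.2 can be pictured with the following sketch, which reports wallclock, user CPU, and system CPU time for a bracketed interval. It uses Python's time.perf_counter and os.times, which measure the whole process rather than a single thread. The class name IntervalTimer is hypothetical, and this is not the Parallel Tools Consortium API.

```python
import os
import time


class IntervalTimer:
    """Measures wallclock, user CPU, and system CPU time for an interval."""

    def start(self):
        self._wall0 = time.perf_counter()
        cpu = os.times()
        self._user0, self._sys0 = cpu.user, cpu.system

    def stop(self):
        wall = time.perf_counter() - self._wall0
        cpu = os.times()
        return {
            "wall": wall,
            "user": cpu.user - self._user0,
            "system": cpu.system - self._sys0,
        }


timer = IntervalTimer()
timer.start()
checksum = sum(i * i for i in range(200000))   # work being timed
elapsed = timer.stop()
```

The split between user and system time follows the platform's own accounting, consistent with the allowance in 3.5.2 that the criteria for each may be platform-dependent.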
3.6. Parallel I/O
- 3.6.1 (BDE) Fully supported implementation of a published API (possibly
platform-specific) supporting four kinds of concurrent file I/O,
where it is the user's responsibility to ensure that all
participating processes open the logical file in the same mode:
- Sequential read: all participating processes
read from a logically shared file using a shared file
pointer. Each record will be read just once.
- Parallel read: all participating processes read
from a logically shared file using independent file pointers.
Thus, each process reads each record.
- Sequentialized write: all participating processes
write to a logically shared file using a shared file pointer.
Records are atomic and cannot be overwritten, but they may
be merged into the shared output file in any order.
- Direct access read/write: each participating process
can read or write any specified record location of a logically
shared file. It is the user's responsibility to ensure that
records do not overlap. When a file is open for reading
and writing at the same time, the effect of reads and
writes into the same location is implementation dependent.
Updates are not guaranteed to take effect until the file
is closed.
- 3.6.2 (BDE) All processes in a parallel application must be capable of
performing the operations in 3.6.1, although there may be variation
in performance from one process to another.
- 3.6.3 (BDE) A process' buffers associated with the operations in 3.6.2 must
be flushed automatically upon completion or failure of the
process.
- 3.6.4 (D3) Availability of a sequentialized write, as specified in 3.6.1,
whereby output directed to stdout/stderr is automatically prefixed
by a label indicating which process performed the write.
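The sequentialized write of 3.6.1, with the process labels of 3.6.4, can be sketched as below. Threads stand in for the participating processes and an in-memory stream stands in for the shared file; both are assumptions made to keep the example self-contained. The class name and the "[proc N]" label format are hypothetical.

```python
import io
import threading


class SequentializedWriter:
    """Appends whole records to a shared stream, one writer at a time."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()

    def write_record(self, process_label, text):
        # Each record is written atomically; the order in which
        # records from different writers arrive is not defined.
        with self._lock:
            self._stream.write(f"[{process_label}] {text}\n")


shared = io.StringIO()
writer = SequentializedWriter(shared)


def worker(rank):
    for i in range(3):
        writer.write_record(f"proc {rank}", f"record {i}")


threads = [threading.Thread(target=worker, args=(rank,)) for rank in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

lines = shared.getvalue().splitlines()
```

Every record arrives intact and labeled, but records from different writers may be interleaved in any order, as 3.6.1 permits.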
4.1. Authentication/Security/Namespace Services
- 4.1.1 (BDE) Fully supported implementation of DCE-compatible (version 1.1)
authentication and access control services. (Note that Kerberos
version 5 satisfies the authentication portion of this
requirement.) Such facilities must be capable of processing
messages, and also the following UNIX commands:
- login and passwd on machines defined to the
authentication facility; and
- rcp, rlogin, rsh, rexec, telnet, and ftp coming
from machines defined to the authentication facility.
- 4.1.2 (BDE) Fully supported implementation of DCE-compatible RPCs.
- 4.1.3 (BDE) Fully supported implementation mapping service names to service
locations, so that clients are not required to know locations.
- 4.1.4 (BDE) Fully supported implementation of mechanisms for the name
mapping service of 4.1.3, so that it works for the RPCs specified in
4.1.2, as well as any system messages supported (e.g., Mach
messages).
- 4.1.5 (D1) Capability to allow smart services that redirect requests to
replicated servers for redundancy.
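The name mapping of 4.1.3 and the redirection to replicated servers of 4.1.5 can be pictured with the following in-memory sketch. The class, the service name, and the "host:port" location strings are all hypothetical, and no real directory or DCE service is involved.

```python
class NameService:
    """Maps service names to one or more replica locations."""

    def __init__(self):
        self._locations = {}

    def register(self, service, location):
        self._locations.setdefault(service, []).append(location)

    def resolve(self, service, is_available=lambda location: True):
        # Return the first registered replica that is reachable, so the
        # client never needs to know where the service actually runs.
        for location in self._locations.get(service, []):
            if is_available(location):
                return location
        raise LookupError(f"no available location for service {service!r}")


names = NameService()
names.register("batch-queue", "pe03:7001")
names.register("batch-queue", "pe17:7001")

primary = names.resolve("batch-queue")

# Simulate a failed replica: requests are redirected to the next one.
down = {"pe03:7001"}
fallback = names.resolve("batch-queue", lambda loc: loc not in down)
```

Clients ask for "batch-queue" by name in both cases; only the service's answer changes when a replica fails.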
4.2. Job Management and Scheduling
- 4.2.1 (BDE) Fully supported commands to manipulate a job as a single
entity, including kill, modify, query characteristics, and query
state (similar to commands provided for UNIX processes); must
include mechanisms whereby the system can
fully kill all processes of any job and free all resources. Crash
recovery methods must clean up all cases of "partially dead" jobs,
taking special care to release locks on allocated resources.
- 4.2.2 (BDE) Fully supported implementation of a batch system interface
conforming to POSIX 1003.2d. (Note that PBS 2.1 satisfies this
requirement.)
- 4.2.3 (BDE) Capability for a single batch system to span any subset of
user-accessible PEs.
- 4.2.4 (BDE) Fully supported implementation of spacesharing, or tiling,
making it possible to allocate PEs as dedicated
resources to support non-overlapping jobs. This feature
is critical for benchmarking purposes and for special
resources that may suffer performance degradation if shared
among multiple jobs.
- 4.2.5 (D1) Capability of the job scheduling system to ensure that jobs are
processed by the security system specified in 4.1.1 (e.g., ability
of job scheduling system to handle tickets).
- 4.2.6 (D2) Availability of time-sharing (with no guarantee
of synchronization across PEs).
- 4.2.7 (D3) Availability of a published API (perhaps platform-specific)
enabling the system administrator to tailor the scheduling policy to
specific site requirements.
4.3. Job Checkpointing
- 4.3.1 (D1) Capability to perform process-level
checkpointing, at the request of either the job or the
job scheduler, and continue execution.
- 4.3.2 (D1) Capability to perform job-level checkpointing (i.e.,
storing state information on all its processes) and continue
execution.
- 4.3.3 (D1) Availability of a published API (possibly platform-specific)
for user-initiated checkpoint and restart operations. The
implementation may specify conditions for successful
checkpoint/restart.
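A user-initiated checkpoint and restart cycle of the kind named in 4.3.3 might look like the sketch below, in which an application saves its own state and later resumes from it. The function names, the state layout, and the use of pickle files are assumptions made for illustration; a platform's published API would capture process state at a much lower level.

```python
import os
import pickle
import tempfile


def checkpoint(state, path):
    """Write the state to a temporary file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)   # a partial write never replaces a good file


def restart(path):
    """Load the most recently saved state."""
    with open(path, "rb") as f:
        return pickle.load(f)


def run(steps, path, state=None):
    """Accumulate a running total, checkpointing after every step."""
    state = state or {"step": 0, "total": 0}
    while state["step"] < steps:
        state["total"] += state["step"]
        state["step"] += 1
        checkpoint(state, path)
    return state


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "job.ckpt")
    run(5, ckpt)                          # job stops after five steps
    resumed = run(10, ckpt, restart(ckpt))  # later run continues from step 5
```

The resumed run reaches the same final state as an uninterrupted ten-step run, which is the property a checkpoint/restart facility is expected to preserve.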
4.4. Resource Management and Accounting
- 4.4.1 (BDE) Fully supported implementation to manage a minimal set of
resources, including number and type of PEs, plus per-PE as
well as aggregate CPU time, wallclock time, memory (high-water
allocation), network adapters, and temporary disk space.
- 4.4.2 (BDE) At least the minimal resource set, as defined in 4.4.1, must be
allocatable to individual jobs.
- 4.4.3 (BDE) At least the minimal resource set, as defined in 4.4.1, must be
allocatable to individual processes within a job.
- 4.4.4 (BDE) Availability of a published API (possibly platform-specific) for
getting and setting the status of at least the minimal resource
set, as defined in 4.4.1.
- 4.4.5 (BDE) Fully supported implementation of job accounting, where data
for all processes of a job is combined to provide an aggregate job
accounting record, for at least the minimal resource set defined
in 4.4.1.
- 4.4.6 (BDE) Fully supported implementation of mechanisms enforcing a hard
limit for at least the minimal resource set defined in 4.4.1.
- 4.4.7 (D2) Capability to assign and detect/report a soft limit for each
supported resource (at least the minimal resource set
defined in 4.4.1).
- 4.4.8 (D2) Accurate accounting at the level of individual processes,
providing conventional UNIX accounting (including system and user
CPU time, wallclock time, network usage, memory usage, disk usage,
and I/O performed).
- 4.4.9 (D1) Capability to selectively preempt or revoke critical resources
from an application using a published API (possibly
platform-specific), including default provisions for the application to
abort or suspend.
- 4.4.10 (D3) Availability of a published API (possibly platform-specific)
to determine the status of overall system resources.
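The aggregate job accounting record of 4.4.5 combines the data for all processes of a job. The sketch below shows one way this could be done for part of the minimal resource set. The record fields and the combining rules are assumptions: CPU time is summed, wallclock time is taken as the longest process, and the memory high-water marks are summed as an upper bound. A site may define these differently.

```python
from dataclasses import dataclass


@dataclass
class ProcessRecord:
    pe: int                     # PE on which the process ran
    cpu_seconds: float
    wall_seconds: float
    memory_high_water_kb: int


def aggregate_job_record(records):
    """Combine per-process records into one job-level record."""
    return {
        "pe_count": len({r.pe for r in records}),
        "cpu_seconds": sum(r.cpu_seconds for r in records),
        "wall_seconds": max(r.wall_seconds for r in records),
        "memory_high_water_kb": sum(r.memory_high_water_kb for r in records),
    }


job = aggregate_job_record([
    ProcessRecord(pe=0, cpu_seconds=10.0, wall_seconds=12.0,
                  memory_high_water_kb=2048),
    ProcessRecord(pe=1, cpu_seconds=9.5, wall_seconds=12.5,
                  memory_high_water_kb=1024),
    ProcessRecord(pe=1, cpu_seconds=0.5, wall_seconds=3.0,
                  memory_high_water_kb=512),
])
```

The same per-process records can also serve the per-PE reporting of 4.4.1 when grouped by the pe field instead of being combined across the whole job.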
4.5. File System
- 4.5.1 (BDE) Fully supported implementation of POSIX-compliant (version
1003) file system, including long filename support.
- 4.5.2 (BDE) Fully supported implementation of file system larger than
4 gigabytes.
- 4.5.3 (BDE) Fully supported implementation of file system capable of
supporting files larger than 4 gigabytes.
- 4.5.4 (D1) Availability of a DCE/DFS-compatible (version 1.1) distributed
filesystem. (Note that AFS version 3.3 does not satisfy needs for
replicated file service, nor for integration with the security
and authentication services.)
- 4.5.5 (D1) Availability of an efficient mechanism to back up and restore
all file systems (local as well as distributed) while maintaining
the ACLs specified in 4.1.1.
4.6. System Availability
- 4.6.1 (BDE) Fully supported implementation of mechanisms for detecting
and reporting failures of critical resources, including
PEs, network paths, and disks.
- 4.6.2 (BDE) Message delivery must be guaranteed; neither messages nor RPCs
may be discarded or ignored without notification to the sender.
- 4.6.3 (D1) Availability of a tool for performing self-consistency checks
of system configuration parameters.
- 4.6.4 (D2) Availability of error recovery mechanisms for the failures
specified in 4.6.1.
- 4.6.5 (D2) Capability for performing optional full system consistency
checks at reboot.
- 4.6.6 (D2) Availability of speedy reboots.
- 4.6.7 (D2) Capability to perform self-tests at PE power-up or reboot,
including at least local memory, local file systems, and external
adapters.
- 4.6.8 (D3) Extension of capability specified in 4.6.7 to include
any global or distributed shared system memory, distributed
file systems, and inter-PE communication paths.
4.7. Other Services
- 4.7.1 (BDE) Fully supported TCP/IP suite.
- 4.7.2 (D2) Availability of IPI-3 channel protocol.
- 4.7.3 (D1) Availability of operating system on each PE that provides
XPG4-compliant (version 2) functionality. The OS need not
be resident on each PE, but must be accessible to an application
running on any PE. (Note that POSIX 1003 is not sufficient to
meet this requirement.)
5.1. Resource Administration
- 5.1.1 (BDE) Fully supported implementation of a single-point
system administration tool for parallel or clustered machines
administered as a single system to handle:
- file system mounts
- PE booting, where appropriate
- PE status, where appropriate
- PE consistency checks, where appropriate
- software installation
- resource administration
- 5.1.2 (D2) Availability of a tool for managing user administration,
including some means of integrating the namespace manager and the
authentication server in order to facilitate adding, removing,
and modifying users.
- 5.1.3 (D3) Availability of a centralized resource data repository,
keeping track of the state of all system resources and their
current usage policies.
5.2. System Debugging and Performance Analysis
- 5.2.1 (D1) Availability of a tool to dynamically monitor and display
system performance, including:
- PE status
- key resources: system CPU usage, memory usage, page faults
- run queues (on each PE)
- current scheduling information
- current system configuration
- 5.2.2 (D2) Availability of a utility for remotely examining a PE
operating system image, a core image, or the running kernel.
- 5.2.3 (D1) Availability of a centralized system error logging mechanism.
(Note that this requirement is not met by independent error logs
located on each PE.)
- 6.1.1 (BDE) Fully supported availability of online versions, in a
non-proprietary format (preferably SGML, HTML, or PostScript)
for all documentation on baseline software.
- 6.1.2 (D1) Fully supported availability of online versions, as
specified in 6.1.1, for all documentation on other (non-baseline)
delivered software.