RADICAL-Pilot 0.50.21 Documentation
RADICAL-Pilot (RP) is a Pilot Job system written in Python. It allows a user to run large numbers of computational tasks (called ComputeUnits) concurrently on one or more remote ComputePilots, which RADICAL-Pilot can start transparently on a variety of distributed resources, such as HPC clusters and clouds.
In this model, a part (slice) of a resource is acquired by a user’s application so that the application can schedule ComputeUnits directly into that resource slice, rather than going through the system’s job scheduler. In many cases, this can drastically shorten overall execution time, as the individual ComputeUnits don’t have to wait in the system’s scheduler queue but can execute directly on the ComputePilots.
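The effect of bypassing the scheduler queue can be illustrated with a rough back-of-the-envelope model. The sketch below is a toy simulation in plain Python and does not use the RADICAL-Pilot API. The function names (`makespan`, `time_with_batch_jobs`, `time_with_pilot`), the fixed queue wait, and the assumption that every batch job pays that wait once a slot frees up are all hypothetical simplifications chosen for illustration.

```python
# Toy model only: this does NOT use the RADICAL-Pilot API.
# All names and numbers below are illustrative assumptions.
import heapq


def makespan(durations, slots):
    """Finish time when each task is placed on the earliest-free slot."""
    finish_times = [0.0] * slots
    heapq.heapify(finish_times)
    for duration in durations:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + duration)
    return max(finish_times)


def time_with_batch_jobs(durations, queue_wait, slots):
    """Every task is its own batch job and pays the queue wait itself."""
    return makespan([queue_wait + d for d in durations], slots)


def time_with_pilot(durations, queue_wait, slots):
    """Only the pilot waits in the queue; tasks then run inside its slice."""
    return queue_wait + makespan(durations, slots)


if __name__ == "__main__":
    tasks = [10] * 8   # eight tasks of 10 time units each (assumed)
    wait = 60          # assumed queue wait per batch job
    cores = 4          # assumed number of concurrent slots
    print("one batch job per task:", time_with_batch_jobs(tasks, wait, cores))
    print("single pilot:          ", time_with_pilot(tasks, wait, cores))
```

Under these assumptions the per-task submission takes 140 time units while the pilot takes 80, because the queue wait is paid once rather than once per task. Real queue behavior is far less regular, so the numbers only indicate the direction of the effect.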
ComputeUnits can be sequential, multi-threaded (e.g. OpenMP), or parallel-process (e.g. MPI) executables, as well as Hadoop or Spark applications.
RADICAL-Pilot is not a static system; rather, it provides the user with a programming library (“Pilot-API”) that offers abstractions for resource access and task management. With this library, the user can develop everything from simple “submission scripts” to arbitrarily complex applications, higher-level services and tools.
Links
- repository: https://github.com/radical-cybertools/radical.pilot
- user list: https://groups.google.com/d/forum/radical-pilot-users
- developer list: https://groups.google.com/d/forum/radical-pilot-devel
Contents:
- 1. Introduction
- 2. RADICAL-Pilot - Overview
- 3. Installation
- 4. User Guide
- 4.1. Getting Started
- 4.2. Obtaining Unit Details
- 4.3. Handle Failing Units
- 4.4. Use Multiple Pilots
- 4.5. Selecting a Unit Scheduler
- 4.6. Staging Unit Input Data
- 4.7. Staging Unit Output Data
- 4.8. Sharing Unit Input Data
- 4.9. Setup Unit Environment
- 4.10. MPI Applications
- 4.11. Using Pre- and Post-exec commands
- 5. Examples
- 6. API Reference
- 7. Data Staging
- 8. Using Local and Remote HPC Resources
- 9. Unit Scheduler
- 10. Testing
- 11. Benchmarks
- 12. Details on Profiling
- 13. Frequently Asked Questions
- 14. Developer Documentation