12. API Reference

12.1. Sessions and Security Contexts

12.1.1. Sessions

class radical.pilot.Session(database_url=None, database_name='radicalpilot', session_uid=None)[source]

A Session encapsulates a RADICAL-Pilot instance and is the root object for all other RADICAL-Pilot objects.

A Session holds radical.pilot.PilotManager and radical.pilot.UnitManager instances which in turn hold radical.pilot.Pilot and radical.pilot.ComputeUnit instances.

Each Session has a unique identifier radical.pilot.Session.uid that can be used to re-connect to a RADICAL-Pilot instance in the database.

Example:

s1 = radical.pilot.Session(database_url=DBURL)
s2 = radical.pilot.Session(database_url=DBURL, session_uid=s1.uid)

# s1 and s2 are pointing to the same session
assert s1.uid == s2.uid
__init__(database_url=None, database_name='radicalpilot', session_uid=None)[source]

Creates a new session or reconnects to an existing one.

If called without a session_uid, a new Session instance is created and stored in the database. If session_uid is set, an existing session is retrieved from the database.

Arguments:
  • database_url (string): The MongoDB URL. If none is given, RP uses the environment variable RADICAL_PILOT_DBURL. If that is not set, an error is raised.
  • database_name (string): An alternative database name (default: ‘radicalpilot’).
  • session_uid (string): If session_uid is set, RP tries to re-connect to an existing session instead of creating a new one.
Returns:
  • A new Session instance.
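The lookup order described for database_url (explicit argument first, then the RADICAL_PILOT_DBURL environment variable, otherwise an error) can be sketched in a few lines. This is only an illustration: resolve_database_url is a hypothetical helper that is not part of RADICAL-Pilot, and the exception type raised here is an assumption, since the reference does not name it.

```python
import os


def resolve_database_url(database_url=None, environ=None):
    """Return the MongoDB URL to use, following the documented lookup order.

    Hypothetical helper for illustration; not part of RADICAL-Pilot.
    """
    env = os.environ if environ is None else environ
    if database_url:
        # 1. An explicitly passed URL always wins.
        return database_url
    url = env.get("RADICAL_PILOT_DBURL")
    if url:
        # 2. Otherwise fall back to the environment variable.
        return url
    # 3. Nothing configured: fail early (ValueError is an assumption).
    raise ValueError("no database_url given and RADICAL_PILOT_DBURL is not set")


print(resolve_database_url("mongodb://localhost:27017"))
print(resolve_database_url(environ={"RADICAL_PILOT_DBURL": "mongodb://db.example.org:27017"}))
```

Passing the environment as a parameter keeps the helper easy to test without modifying the real process environment.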
Raises:
close(cleanup=True, terminate=True, delete=None)[source]

Closes the session.

All subsequent attempts to access objects attached to the session will result in an error. If cleanup is set to True (default), the session data is removed from the database.

Arguments:
  • cleanup (bool): Remove session from MongoDB (implies terminate).
  • terminate (bool): Shut down all pilots associated with the session.
Raises:
  • radical.pilot.IncorrectState if the session is closed or doesn’t exist.
as_dict()[source]

Returns a Python dictionary representation of the object.

created

Returns the UTC date and time the session was created.

last_reconnect

Returns the most recent UTC date and time the session was reconnected to.

list_pilot_managers()[source]

Lists the unique identifiers of all radical.pilot.PilotManager instances associated with this session.

Example:

s = radical.pilot.Session(database_url=DBURL)
for pm_uid in s.list_pilot_managers():
    pm = s.get_pilot_managers(pilot_manager_ids=pm_uid)
Returns:
Raises:
  • radical.pilot.IncorrectState if the session is closed or doesn’t exist.
get_pilot_managers(pilot_manager_ids=None)[source]

Re-connects to and returns one or more existing PilotManager(s).

Arguments:

  • pilot_manager_ids [string or list of strings]: The unique identifier(s) of the PilotManager(s) to re-connect to.

Returns:

Raises:

  • radical.pilot.PilotException if a PilotManager with pilot_manager_uid doesn’t exist in the database.
list_unit_managers()[source]

Lists the unique identifiers of all radical.pilot.UnitManager instances associated with this session.

Example:

s = radical.pilot.Session(database_url=DBURL)
for um_uid in s.list_unit_managers():
    um = s.get_unit_managers(unit_manager_ids=um_uid)
Returns:
Raises:
  • radical.pilot.IncorrectState if the session is closed or doesn’t exist.
get_unit_managers(unit_manager_ids=None)[source]

Re-connects to and returns one or more existing UnitManager(s).

Arguments:

  • unit_manager_ids [string or list of strings]: The unique identifier(s) of the UnitManager(s) to re-connect to.

Returns:

Raises:

  • radical.pilot.PilotException if a UnitManager with the given ID doesn’t exist in the database.
add_resource_config(resource_config)[source]

Adds a new radical.pilot.ResourceConfig to the PilotManager’s dictionary of known resources, or accepts a string which points to a configuration file.

For example:

rc = radical.pilot.ResourceConfig
rc.name                 = "mycluster"
rc.job_manager_endpoint = "ssh+pbs://mycluster"
rc.filesystem_endpoint  = "sftp://mycluster"
rc.default_queue        = "private"
rc.bootstrapper         = "default_bootstrapper.sh"

pm = radical.pilot.PilotManager(session=s)
pm.add_resource_config(rc)

pd = radical.pilot.ComputePilotDescription()
pd.resource = "mycluster"
pd.cores    = 16
pd.runtime  = 5 # minutes

pilot = pm.submit_pilots(pd)
get_resource_config(resource_key)[source]

Returns a dictionary of the requested resource config.

12.1.2. Security Contexts

class radical.pilot.Context(ctype, thedict=None)[source]
classmethod from_dict(thedict)[source]

Creates a new object instance from a dictionary, so that c.from_dict(x.as_dict()) == x.

12.2. Pilots and PilotManagers

12.2.1. PilotManagers

class radical.pilot.PilotManager(session, pilot_launcher_workers=1, _reconnect=False)[source]

A PilotManager holds radical.pilot.ComputePilot instances that are submitted via the radical.pilot.PilotManager.submit_pilots() method.

It is possible to attach one or more resource configurations (see Using Local and Remote HPC Resources) to a PilotManager to outsource machine-specific configuration parameters to an external configuration file.

Each PilotManager has a unique identifier radical.pilot.PilotManager.uid that can be used to re-connect to a previously created PilotManager in a given radical.pilot.Session.

Example:

s = radical.pilot.Session(database_url=dbURL)

pm1 = radical.pilot.PilotManager(session=s, resource_configurations=RESCONF)
# Re-connect via the 'get()' method.
pm2 = radical.pilot.PilotManager.get(session=s, pilot_manager_id=pm1.uid)

# pm1 and pm2 are pointing to the same PilotManager
assert pm1.uid == pm2.uid
__init__(session, pilot_launcher_workers=1, _reconnect=False)[source]

Creates a new PilotManager and attaches it to the session.

Note

The resource_configurations (see Using Local and Remote HPC Resources) parameter is currently mandatory for creating a new PilotManager instance.

Arguments:

  • session [radical.pilot.Session]: The session instance to use.

  • resource_configurations [string or list of strings]: A list of URLs pointing to resource configuration files (see Using Local and Remote HPC Resources). Currently file://, http:// and https:// URLs are supported.

    If one or more resource_configurations are provided, Pilots submitted via this PilotManager can access the configuration entries in the files via the ComputePilotDescription. For example:

    pm = radical.pilot.PilotManager(session=s)
    
    pd = radical.pilot.ComputePilotDescription()
    pd.resource = "futuregrid.india"  # defined in futuregrid.json
    pd.cores    = 16
    pd.runtime  = 5 # minutes
    
    pilot = pm.submit_pilots(pd)
    
  • pilot_launcher_workers (int): The number of pilot launcher worker processes to start in the background.

Note

pilot_launcher_workers can be used to tune RADICAL-Pilot’s performance. However, you should only change the default values if you know what you are doing.

Returns:

Raises:
close(terminate=True)[source]

Shuts down the PilotManager and its background workers in a coordinated fashion.

Arguments:

  • terminate [bool]: If set to True, all active pilots will get canceled (default: True).
as_dict()[source]

Returns a Python dictionary representation of the object.

submit_pilots(pilot_descriptions)[source]

Submits a new radical.pilot.ComputePilot to a resource.

Returns:

Raises:

list_pilots()[source]

Lists the unique identifiers of all radical.pilot.ComputePilot instances associated with this PilotManager.

Returns:

Raises:

get_pilots(pilot_ids=None)[source]

Returns one or more radical.pilot.ComputePilot instances.

Arguments:

  • pilot_ids [list of strings]: If pilot_ids is set, only the Pilots with the specified IDs are returned. If pilot_ids is None, all Pilots are returned.

Returns:

Raises:

wait_pilots(pilot_ids=None, state=['Done', 'Failed', 'Canceled'], timeout=None)[source]

Returns when one or more radical.pilot.ComputePilots reach a specific state or when an optional timeout is reached.

If pilot_ids is None, wait_pilots returns when all Pilots reach the state defined in state.

Arguments:

  • pilot_ids [string or list of strings] If pilot_ids is set, only the Pilots with the specified IDs are considered. If pilot_ids is None (default), all Pilots are considered.

  • state [list of strings] The state(s) that Pilots have to reach in order for the call to return.

    By default wait_pilots waits for the Pilots to reach a terminal state, which can be one of the following:

    • radical.pilot.DONE
    • radical.pilot.FAILED
    • radical.pilot.CANCELED
  • timeout [float] Optional timeout in seconds before the call returns regardless of whether the Pilots have reached the desired state or not. The default value None never times out.
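The waiting behaviour described by these arguments (return once all objects are in one of the requested states, or once the timeout expires) can be illustrated with a small polling loop. This is a minimal sketch under assumptions, not RADICAL-Pilot's implementation: wait_for_states, get_states and poll_interval are hypothetical names, and the state strings are taken from the default values in the signature above.

```python
import time

FINAL_STATES = ["Done", "Failed", "Canceled"]


def wait_for_states(get_states, state=None, timeout=None, poll_interval=0.01):
    """Poll get_states() until every returned state is in `state`.

    Returns the last observed list of states, either because all of them
    matched or because `timeout` seconds passed (None waits indefinitely).
    """
    targets = FINAL_STATES if state is None else state
    if isinstance(targets, str):
        targets = [targets]          # accept a single state as well as a list
    start = time.monotonic()
    while True:
        states = get_states()
        if all(s in targets for s in states):
            return states
        if timeout is not None and time.monotonic() - start >= timeout:
            return states
        time.sleep(poll_interval)


# Replay a recorded sequence of state snapshots for two pilots.
history = [["Launching", "Active"], ["Active", "Done"], ["Done", "Failed"]]


def fake_get_states():
    # Hand out the snapshots in order, then keep returning the last one.
    return history.pop(0) if len(history) > 1 else history[0]


result = wait_for_states(fake_get_states)
print(result)
```

Because the function returns the last observed states in both cases, a caller that passes a timeout has to inspect the result to tell a timeout from a successful wait.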

Raises:

cancel_pilots(pilot_ids=None)[source]

Cancels one or more ComputePilots.

Arguments:

  • pilot_ids [string or list of strings] If pilot_ids is set, only the Pilots with the specified IDs are canceled. If pilot_ids is None, all Pilots are canceled.
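Several PilotManager methods share the same selection convention: an ID argument that may be None (meaning all Pilots), a single string, or a list of strings. A rough sketch of that convention follows. select_by_uid is a hypothetical helper operating on a plain dictionary, and raising KeyError for unknown IDs is an assumption, since the reference does not say how unknown IDs are handled.

```python
def select_by_uid(objects, uids=None):
    """Select entries from `objects` (a dict mapping uid -> object).

    uids=None selects everything; a single string is treated like a
    one-element list. Hypothetical helper for illustration only.
    """
    if uids is None:
        return list(objects.values())
    if isinstance(uids, str):
        uids = [uids]
    missing = [uid for uid in uids if uid not in objects]
    if missing:
        # Assumption: unknown IDs are reported as an error.
        raise KeyError("unknown uid(s): %s" % ", ".join(missing))
    return [objects[uid] for uid in uids]


pilots = {"pilot.0000": "first pilot", "pilot.0001": "second pilot"}

print(select_by_uid(pilots))                   # all pilots
print(select_by_uid(pilots, "pilot.0001"))     # a single ID
print(select_by_uid(pilots, ["pilot.0000"]))   # a list of IDs
```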

Raises:

register_callback(callback_function, callback_data=None)[source]

Registers a new callback function with the PilotManager. Manager-level callbacks get called if any of the ComputePilots managed by the PilotManager change their state.

All callback functions need to have the same signature:

def callback_func(obj, state, data)

where obj is a handle to the object that triggered the callback, state is the new state of that object, and data is the data passed on callback registration.
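To make the callback signature concrete, the following sketch registers a callback on a stand-in manager and triggers it by hand. FakeManager and its notify method are hypothetical and exist only so the example runs without a database or a resource; only the register_callback signature and the (obj, state, data) callback signature are taken from this reference.

```python
class FakeManager:
    """Hypothetical stand-in that only mimics callback registration."""

    def __init__(self):
        self._callbacks = []

    def register_callback(self, callback_function, callback_data=None):
        self._callbacks.append((callback_function, callback_data))

    def notify(self, obj, state):
        # Invoke every registered callback with (obj, state, data).
        for func, data in self._callbacks:
            func(obj, state, data)


seen = []


def pilot_state_cb(obj, state, data):
    # `data` is whatever was passed as callback_data on registration.
    seen.append((obj, state, data))
    print("[%s] %s is now %s" % (data, obj, state))


mgr = FakeManager()
mgr.register_callback(pilot_state_cb, callback_data="run-1")
mgr.notify("pilot.0000", "Active")
```

Passing context through callback_data avoids global variables when the same function is registered for several runs.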

12.2.2. ComputePilotDescription

class radical.pilot.ComputePilotDescription[source]

A ComputePilotDescription object describes the requirements and properties of a radical.pilot.Pilot and is passed as a parameter to radical.pilot.PilotManager.submit_pilots() to instantiate a new pilot.

Note

A ComputePilotDescription MUST define at least resource and the number of cores to allocate on the target resource.

Example:

pm = radical.pilot.PilotManager(session=s)

pd = radical.pilot.ComputePilotDescription()
pd.resource = "local.localhost"  # defined in futuregrid.json
pd.cores    = 16
pd.runtime  = 5 # minutes

pilot = pm.submit_pilots(pd)
resource

[Type: string] [mandatory] The key of a resource configuration entry (see Using Local and Remote HPC Resources). If the key exists, the machine-specific configuration is loaded from the configuration once the ComputePilotDescription is passed to radical.pilot.PilotManager.submit_pilots(). If the key doesn’t exist, a radical.pilot.PilotException is raised.

access_schema

[Type: string] [optional] The key of the access mechanism to use. The valid access mechanisms are defined in the resource configurations, see Using Local and Remote HPC Resources. If none is specified, the first one defined there is used.

runtime

[Type: int] [mandatory] The maximum run time (wall-clock time) in minutes of the ComputePilot.

sandbox

[Type: string] [optional] The working (“sandbox”) directory of the ComputePilot agent. If not set, it defaults to radical.pilot.sandbox in your home or login directory.

Warning

If you define a ComputePilot on an HPC cluster and you want to set sandbox manually, make sure that it points to a directory on a shared filesystem that can be reached from all compute nodes.

cores

[Type: int] [mandatory] The number of cores the pilot should allocate on the target resource.

memory

[Type: int] [optional] The amount of memory (in MB) the pilot should allocate on the target resource.

queue

[Type: string] [optional] The name of the job queue the pilot should get submitted to. If queue is also defined in the resource configuration (resource), the value set here overrides it.

project

[Type: string] [optional] The name of the project / allocation to charge for used CPU time. If project is defined in the machine configuration (resource), defining project will override it explicitly.

cleanup

[Type: bool] [optional] If cleanup is set to True, the pilot will delete its entire sandbox upon termination. This includes individual ComputeUnit sandboxes and all generated output data. Only log files will remain in the sandbox directory.
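Since several attributes above are marked mandatory, it can help to check a description before submitting it. The sketch below does this for a plain dictionary that stands in for a ComputePilotDescription. missing_mandatory and MANDATORY_ATTRIBUTES are hypothetical names, and the check itself is an illustration rather than something RADICAL-Pilot is documented to provide.

```python
# Attributes marked [mandatory] in the attribute list above.
MANDATORY_ATTRIBUTES = ("resource", "runtime", "cores")


def missing_mandatory(description):
    """Return the mandatory attribute names that are absent or None.

    `description` is a plain dict standing in for a ComputePilotDescription.
    """
    return [name for name in MANDATORY_ATTRIBUTES
            if description.get(name) is None]


complete = {"resource": "local.localhost", "cores": 16, "runtime": 5}
incomplete = {"resource": "local.localhost", "queue": "private"}

print(missing_mandatory(complete))
print(missing_mandatory(incomplete))
```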

12.2.3. Pilots

class radical.pilot.ComputePilot[source]
A ComputePilot represents a resource overlay on a local or remote resource.

Note

A ComputePilot cannot be created directly. The factory method radical.pilot.PilotManager.submit_pilots() has to be used instead.

Example:

pm = radical.pilot.PilotManager(session=s)

pd = radical.pilot.ComputePilotDescription()
pd.resource = "local.localhost"
pd.cores    = 2
pd.runtime  = 5 # minutes

pilot = pm.submit_pilots(pd)
as_dict()[source]

Returns a Python dictionary representation of the ComputePilot object.

uid

Returns the Pilot’s unique identifier.

The uid identifies the Pilot within the PilotManager and can be used to retrieve an existing Pilot.

Returns:
  • A unique identifier (string).
description

Returns the pilot description the pilot was started with.

sandbox

Returns the Pilot’s ‘sandbox’ / working directory URL.

Returns:
  • A URL string.
state

Returns the current state of the pilot.

state_history

Returns the complete state history of the pilot.

stdout

Returns the stdout of the pilot.

stderr

Returns the stderr of the pilot.

logfile

Returns the logfile of the pilot.

log

Returns the log of the pilot.

resource_detail

Returns the names of the nodes managed by the pilot.

pilot_manager

Returns the pilot manager object for this pilot.

unit_managers

Returns the unit manager object UIDs for this pilot.

units

Returns the units scheduled for this pilot.

submission_time

Returns the time the pilot was submitted.

start_time

Returns the time the pilot was started on the backend.

stop_time

Returns the time the pilot was stopped.

resource

Returns the resource.

register_callback(callback_func, callback_data=None)[source]

Registers a callback function that is triggered every time the ComputePilot’s state changes.

All callback functions need to have the same signature:

def callback_func(obj, state, data)

where obj is a handle to the object that triggered the callback, state is the new state of that object, and data is the data passed on callback registration.

wait(state=['Done', 'Failed', 'Canceled'], timeout=None)[source]

Returns when the pilot reaches a specific state or when an optional timeout is reached.

Arguments:

  • state [list of strings] The state(s) that the Pilot has to reach in order for the call to return.

    By default wait waits for the Pilot to reach a terminal state, which can be one of the following:

    • radical.pilot.states.DONE
    • radical.pilot.states.FAILED
    • radical.pilot.states.CANCELED
  • timeout [float] Optional timeout in seconds before the call returns regardless of whether the Pilot has reached the desired state or not. The default value None never times out.

Raises:

  • radical.pilot.PilotException if the state of the pilot cannot be determined.
cancel()[source]

Sends a termination request to the pilot.

Raises:

  • radical.pilot.PilotException if the termination request cannot be fulfilled.
stage_in(directives)[source]

Stages the content of the staging directive into the pilot’s staging area.

12.3. ComputeUnits and UnitManagers

12.3.1. UnitManager

class radical.pilot.UnitManager(session, scheduler=None, input_transfer_workers=2, output_transfer_workers=2, _reconnect=False)[source]

A UnitManager manages radical.pilot.ComputeUnit instances which represent the executable workload in RADICAL-Pilot. A UnitManager connects the ComputeUnits with one or more Pilot instances (which represent the workload executors in RADICAL-Pilot) and a scheduler which determines which ComputeUnit gets executed on which Pilot.

Each UnitManager has a unique identifier radical.pilot.UnitManager.uid that can be used to re-connect to a previously created UnitManager in a given radical.pilot.Session.

Example:

s = radical.pilot.Session(database_url=DBURL)

pm = radical.pilot.PilotManager(session=s)

pd = radical.pilot.ComputePilotDescription()
pd.resource = "futuregrid.alamo"
pd.cores = 16

p1 = pm.submit_pilots(pd) # create first pilot with 16 cores
p2 = pm.submit_pilots(pd) # create second pilot with 16 cores

# Create a workload of 128 '/bin/sleep' compute units
compute_units = []
for unit_count in range(0, 128):
    cu = radical.pilot.ComputeUnitDescription()
    cu.executable = "/bin/sleep"
    cu.arguments = ['60']
    compute_units.append(cu)

# Combine the two pilots, the workload and a scheduler via
# a UnitManager.
um = radical.pilot.UnitManager(session=s,
                               scheduler=radical.pilot.SCHED_ROUND_ROBIN)
um.add_pilots([p1, p2])
um.submit_units(compute_units)
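The example selects a round-robin scheduler. As a general illustration of what round-robin assignment means (units are handed to the pilots in turn), here is a minimal sketch. It is not RADICAL-Pilot's scheduler code: round_robin is a hypothetical function, and the unit and pilot identifiers are made up.

```python
from itertools import cycle


def round_robin(unit_ids, pilot_ids):
    """Assign each unit to the next pilot in turn; returns {unit_id: pilot_id}."""
    if not pilot_ids:
        raise ValueError("at least one pilot is required")
    assignment = {}
    # cycle() repeats the pilot list endlessly; zip() stops with the last unit.
    for unit_id, pilot_id in zip(unit_ids, cycle(pilot_ids)):
        assignment[unit_id] = pilot_id
    return assignment


units = ["unit.%04d" % i for i in range(5)]
plan = round_robin(units, ["pilot.0000", "pilot.0001"])
for unit_id, pilot_id in plan.items():
    print(unit_id, "->", pilot_id)
```

Round-robin ignores how busy each pilot is, which keeps it simple but can leave pilots unevenly loaded when unit run times differ.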
__init__(session, scheduler=None, input_transfer_workers=2, output_transfer_workers=2, _reconnect=False)[source]

Creates a new UnitManager and attaches it to the session.

Args:

  • session (radical.pilot.Session): The session instance to use.
  • scheduler (string): The name of the scheduler plug-in to use.
  • input_transfer_workers (int): The number of input file transfer worker processes to launch in the background.
  • output_transfer_workers (int): The number of output file transfer worker processes to launch in the background.

Note

input_transfer_workers and output_transfer_workers can be used to tune RADICAL-Pilot’s file transfer performance. However, you should only change the default values if you know what you are doing.

Raises:
close()[source]

Shuts down the UnitManager and its background workers in a coordinated fashion.

as_dict()[source]

Returns a Python dictionary representation of the UnitManager object.

uid

Returns the unique id.

scheduler

Returns the scheduler name.

scheduler_details

Returns the scheduler logs.

add_pilots(pilots)[source]

Associates one or more pilots with the unit manager.

Arguments:

Raises:

list_pilots()[source]

Lists the UIDs of the pilots currently associated with the unit manager.

Returns:

Raises:

get_pilots()[source]

Gets the pilot instances currently associated with the unit manager.

Returns:

Raises:

remove_pilots(pilot_ids, drain=True)[source]

Disassociates one or more pilots from the unit manager.

TODO: Implement ‘drain’.

After a pilot has been removed from a unit manager, it won’t process any of the unit manager’s units anymore. Calling remove_pilots doesn’t stop the pilot itself.

Arguments:

  • drain [boolean]: Determines what happens to the units which are managed by the removed pilot(s). If True (the default), all units currently assigned to the pilot are allowed to finish execution. If False, ACTIVE units will be canceled.

Raises:

list_units()[source]

Returns the UIDs of the radical.pilot.ComputeUnit instances managed by this unit manager.

Returns:

submit_units(unit_descriptions)[source]

Submits one or more radical.pilot.ComputeUnit instances to the unit manager.

Arguments:

Returns:

Raises:

get_units(unit_ids=None)[source]

Returns one or more compute units identified by their IDs.

Arguments:

  • unit_ids [string or list of strings]: The IDs of the compute unit objects to return.

Returns:

Raises:

wait_units(unit_ids=None, state=['Done', 'Failed', 'Canceled'], timeout=None)[source]

Returns when one or more radical.pilot.ComputeUnits reach a specific state.

If unit_ids is None, wait_units returns when all ComputeUnits reach the state defined in state.

Example:

# TODO -- add example

Arguments:

  • unit_ids [string or list of strings] If unit_ids is set, only the ComputeUnits with the specified IDs are considered. If unit_ids is None (default), all ComputeUnits are considered.

  • state [string or list of strings] The state(s) that ComputeUnits have to reach in order for the call to return.

    By default wait_units waits for the ComputeUnits to reach a terminal state, which can be one of the following:

    • radical.pilot.DONE
    • radical.pilot.FAILED
    • radical.pilot.CANCELED
  • timeout [float] Timeout in seconds before the call returns regardless of ComputeUnit state changes. The default value None waits forever.

Raises:

cancel_units(unit_ids=None)[source]

Cancels one or more radical.pilot.ComputeUnits.

Arguments:

  • unit_ids [string or list of strings]: The IDs of the compute unit objects to cancel.

Raises:

register_callback(callback_function, metric='UNIT_STATE', callback_data=None)[source]

Registers a new callback function with the UnitManager. Manager-level callbacks get called if the specified metric changes. The default metric UNIT_STATE fires the callback if any of the ComputeUnits managed by the UnitManager change their state.

All callback functions need to have the same signature:

def callback_func(obj, value, data)

where obj is a handle to the object that triggered the callback, value is the new value of the metric, and data is the data provided on callback registration. In the example of UNIT_STATE above, obj would be the unit in question, and value would be the new state of the unit.

Available metrics are:

  • UNIT_STATE: fires when the state of any of the units managed by this unit manager instance changes. It communicates the unit object instance and the unit’s new state.
  • WAIT_QUEUE_SIZE: fires when the number of unscheduled units (i.e. of units which have not been assigned to a pilot for execution) changes.
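The two metrics can be pictured as separate callback lists keyed by metric name. The sketch below shows that dispatch pattern on a stand-in class. FakeUnitManager and its fire method are hypothetical, and passing the manager itself as obj for WAIT_QUEUE_SIZE is an assumption, because the reference only specifies obj for UNIT_STATE.

```python
from collections import defaultdict

UNIT_STATE = "UNIT_STATE"
WAIT_QUEUE_SIZE = "WAIT_QUEUE_SIZE"


class FakeUnitManager:
    """Hypothetical stand-in that dispatches callbacks per metric."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def register_callback(self, callback_function, metric=UNIT_STATE,
                          callback_data=None):
        if metric not in (UNIT_STATE, WAIT_QUEUE_SIZE):
            raise ValueError("unknown metric: %s" % metric)
        self._callbacks[metric].append((callback_function, callback_data))

    def fire(self, metric, obj, value):
        # Only callbacks registered for this metric are invoked.
        for func, data in self._callbacks[metric]:
            func(obj, value, data)


events = []


def on_unit_state(obj, value, data):
    events.append(("state", obj, value, data))


def on_queue_size(obj, value, data):
    events.append(("queue", value, data))


umgr = FakeUnitManager()
umgr.register_callback(on_unit_state, callback_data="demo")
umgr.register_callback(on_queue_size, metric=WAIT_QUEUE_SIZE)

umgr.fire(UNIT_STATE, "unit.0000", "Done")
umgr.fire(WAIT_QUEUE_SIZE, umgr, 127)   # obj for this metric is an assumption
print(events)
```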

12.3.2. ComputeUnitDescription

class radical.pilot.ComputeUnitDescription[source]

A ComputeUnitDescription object describes the requirements and properties of a radical.pilot.ComputeUnit and is passed as a parameter to radical.pilot.UnitManager.submit_units() to instantiate and run a new ComputeUnit.

Note

A ComputeUnitDescription MUST define at least an executable.

Example:

# TODO 
executable

(Attribute) The executable to launch (string) [mandatory].

cores

(Attribute) The number of cores required by the executable (int) [mandatory].

mpi

(Attribute) Set to true if the task is an MPI task. (bool) [optional].

name

(Attribute) A descriptive name for the compute unit (string) [optional].

arguments

(Attribute) The arguments for executable (list of strings) [optional].

environment

(Attribute) Environment variables to set in the execution environment (dict) [optional].

stdout

(Attribute) The name of the file to store stdout in.

stderr

(Attribute) The name of the file to store stderr in.

input_staging

(Attribute) The files that need to be staged before execution (list of staging directives) [optional].

Note

TODO: Explain input staging.

output_staging

(Attribute) The files that need to be staged after execution (list of staging directives) [optional].

Note

TODO: Explain output staging.

pre_exec

(Attribute) Actions to perform before this task starts (list of strings) [optional].

post_exec

(Attribute) Actions to perform after this task finishes (list of strings) [optional].

Note

Before the BigBang, there was nothing ...

kernel

(Attribute) Name of a simulation kernel which expands to description attributes once the unit is scheduled to a pilot (and resource).

Note

TODO: explain in detail, reference ENMDTK.

restartable

(Attribute) If the unit starts to execute on a pilot, but cannot finish because the pilot fails or is canceled, can the unit be restarted on a different pilot / resource? (default: False)

Note

TODO: explain in detail, reference ENMDTK.

cleanup

[Type: bool] [optional] If cleanup is set to True, the pilot will delete the entire unit sandbox upon termination. This includes all generated output data in that sandbox. Output staging will be performed before cleanup.

12.3.3. ComputeUnit

class radical.pilot.ComputeUnit[source]

A ComputeUnit represents a ‘task’ that is executed on a ComputePilot. ComputeUnits allow you to control and query the state of this task.

Note

A ComputeUnit cannot be created directly. The factory method radical.pilot.UnitManager.submit_units() has to be used instead.

Example:

umgr = radical.pilot.UnitManager(session=s)

ud = radical.pilot.ComputeUnitDescription()
ud.executable = "/bin/date"
ud.cores      = 1

unit = umgr.submit_units(ud)
as_dict()[source]

Returns a Python dictionary representation of the object.

uid

Returns the unit’s unique identifier.

The uid identifies the ComputeUnit within a UnitManager and can be used to retrieve an existing ComputeUnit.

Returns:
  • A unique identifier (string).
name

Returns the unit’s application-specified name.

Returns:
  • A name (string).
working_directory

Returns the full working directory URL of this ComputeUnit.

pilot_id

Returns the pilot_id of this ComputeUnit.

stdout

Returns a snapshot of the executable’s STDOUT stream.

If this property is queried before the ComputeUnit has reached ‘DONE’ or ‘FAILED’ state it will return None.

stderr

Returns a snapshot of the executable’s STDERR stream.

If this property is queried before the ComputeUnit has reached ‘DONE’ or ‘FAILED’ state it will return None.

description

Returns the ComputeUnitDescription the ComputeUnit was started with.

state

Returns the current state of the ComputeUnit.

state_history

Returns the complete state history of the ComputeUnit.

exit_code

Returns the exit code of the ComputeUnit.

If this property is queried before the ComputeUnit has reached ‘DONE’ or ‘FAILED’ state it will return None.
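Because exit_code is None until the unit has finished, callers should test for None explicitly rather than for truthiness: an exit code of 0 is also falsy in Python. The sketch below shows the distinction on stand-in objects. describe_exit is a hypothetical helper, the SimpleNamespace objects only imitate the uid, state and exit_code properties, and the state string 'Executing' is an assumption.

```python
from types import SimpleNamespace


def describe_exit(unit):
    """Summarize a unit's exit status without confusing 0 and None."""
    code = unit.exit_code
    if code is None:
        # Not `if not code`: a successful exit code of 0 is falsy too.
        return "%s: no exit code yet (state: %s)" % (unit.uid, unit.state)
    return "%s: exited with code %d" % (unit.uid, code)


# Stand-ins for ComputeUnit objects; the real class cannot be created directly.
running = SimpleNamespace(uid="unit.0000", state="Executing", exit_code=None)
finished = SimpleNamespace(uid="unit.0001", state="Done", exit_code=0)

print(describe_exit(running))
print(describe_exit(finished))
```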

log

Returns the logs of the ComputeUnit.

execution_details

Returns the execution location(s) of the ComputeUnit.

execution_locations

Returns the execution location(s) of the ComputeUnit. This is just an alias for execution_details.

submission_time

Returns the time the ComputeUnit was submitted.

start_time

Returns the time the ComputeUnit was started on the backend.

stop_time

Returns the time the ComputeUnit was stopped.

register_callback(callback_func, callback_data=None)[source]

Registers a callback function that is triggered every time the ComputeUnit’s state changes.

All callback functions need to have the same signature:

def callback_func(obj, state)

where obj is a handle to the object that triggered the callback and state is the new state of that object.

wait(state=['Done', 'Failed', 'Canceled'], timeout=None)[source]

Returns when the ComputeUnit reaches a specific state or when an optional timeout is reached.

Arguments:

  • state [list of strings] The state(s) that the compute unit has to reach in order for the call to return.

    By default wait waits for the compute unit to reach a terminal state, which can be one of the following:

    • radical.pilot.states.DONE
    • radical.pilot.states.FAILED
    • radical.pilot.states.CANCELED
  • timeout [float] Optional timeout in seconds before the call returns regardless of whether the compute unit has reached the desired state or not. The default value None never times out.

Raises:

cancel()[source]

Cancel the ComputeUnit.

Raises:

  • radical.pilot.PilotException

12.4. Exceptions

class radical.pilot.PilotException(msg, obj=None)[source]
Parameters:
  • msg (string) – Error message, indicating the cause for the exception being raised.
  • obj (object) – RADICAL-Pilot object on whose activity the exception was raised.
Raises:

The base class for all RADICAL-Pilot Exception classes – this exception type is never raised directly, but can be used to catch all RADICAL-Pilot exceptions within a single except clause.

The exception message and originating object are also accessible as class attributes (e.object() and e.message()). The __str__() operator redirects to get_message().

get_object()[source]

Return the object instance on whose activity the exception was raised.

get_message()[source]

Return the error message associated with the exception.
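The point of a common base class is that a single except clause can catch every library-specific error. The sketch below demonstrates the pattern with local stand-in classes that reuse the documented names and accessor methods. They are not the real radical.pilot classes, and treating DatabaseError as a subclass of PilotException is an assumption based on the statement that PilotException is the base class for all RADICAL-Pilot exceptions.

```python
class PilotException(Exception):
    """Local stand-in mirroring the documented constructor and accessors."""

    def __init__(self, msg, obj=None):
        super().__init__(msg)
        self._message = msg
        self._object = obj

    def get_object(self):
        return self._object

    def get_message(self):
        return self._message

    def __str__(self):
        # As documented, __str__() redirects to get_message().
        return self.get_message()


class DatabaseError(PilotException):
    """Local stand-in; the subclass relationship is an assumption."""


def connect():
    # Hypothetical operation that fails with a specific exception type.
    raise DatabaseError("cannot reach database", obj="session.0000")


try:
    connect()
except PilotException as e:
    # One clause catches the base class and all of its subclasses.
    caught = e
    print("RADICAL-Pilot error: %s (raised by %s)" % (e, e.get_object()))
```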

class radical.pilot.DatabaseError(msg, obj=None)[source]

TODO: Document me!

12.5. State Models

12.5.1. ComputeUnit State Model

_images/cu_state_model.png

12.5.2. ComputePilot State Model

_images/pilot_state_model.png
  1. A new compute pilot is launched via radical.pilot.PilotManager.submit_pilots().
  2. The pilot is submitted to the remote resource and enters LAUNCHING state.
  3. The pilot has been successfully launched on the remote machine and is now waiting to become ACTIVE.
  4. The pilot has been launched by the queueing system and is now in ACTIVE state.
  5. The pilot has finished execution regularly and enters DONE state.
  6. An error has occurred during preparation for pilot launching and the pilot enters FAILED state.
  7. An error has occurred during pilot launching and the pilot enters FAILED state.
  8. An error has occurred on the backend, the pilot couldn’t become active, and it enters FAILED state.
  9. An error has occurred during pilot runtime and the pilot enters FAILED state.
  10. The active pilot has been canceled via the radical.pilot.ComputePilot.cancel() call and enters CANCELED state.
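The transitions listed above can be written down as a lookup table, which makes it easy to check whether a recorded state history is plausible. This is a sketch under assumptions: the list names only LAUNCHING, ACTIVE, DONE, FAILED and CANCELED, so PENDING_LAUNCH and PENDING_ACTIVE are placeholder names chosen here for the two unnamed waiting states, and is_valid_history is a hypothetical helper.

```python
# Allowed successor states, following the numbered transitions above.
# PENDING_LAUNCH and PENDING_ACTIVE are placeholder names (assumptions).
TRANSITIONS = {
    "PENDING_LAUNCH": {"LAUNCHING", "FAILED"},       # steps 2 and 6
    "LAUNCHING":      {"PENDING_ACTIVE", "FAILED"},  # steps 3 and 7
    "PENDING_ACTIVE": {"ACTIVE", "FAILED"},          # steps 4 and 8
    "ACTIVE":         {"DONE", "FAILED", "CANCELED"},  # steps 5, 9 and 10
    "DONE":           set(),   # final states have no successors
    "FAILED":         set(),
    "CANCELED":       set(),
}


def is_valid_history(history):
    """Return True if every consecutive pair of states is an allowed transition."""
    return all(nxt in TRANSITIONS.get(cur, set())
               for cur, nxt in zip(history, history[1:]))


ok = ["PENDING_LAUNCH", "LAUNCHING", "PENDING_ACTIVE", "ACTIVE", "DONE"]
bad = ["PENDING_LAUNCH", "ACTIVE", "DONE"]   # skips LAUNCHING

print(is_valid_history(ok))
print(is_valid_history(bad))
```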