12. API Reference¶

12.1. Sessions and Security Contexts¶

12.1.1. Sessions¶

class radical.pilot.Session(database_url=None, database_name='radicalpilot', session_uid=None)[source]¶

A Session encapsulates a RADICAL-Pilot instance and is the root object for all other RADICAL-Pilot objects.

A Session holds radical.pilot.PilotManager and radical.pilot.UnitManager instances which in turn hold radical.pilot.Pilot and radical.pilot.ComputeUnit instances.

Each Session has a unique identifier radical.pilot.Session.uid that can be used to re-connect to a RADICAL-Pilot instance in the database.

Example:

s1 = radical.pilot.Session(database_url=DBURL)
s2 = radical.pilot.Session(database_url=DBURL, session_uid=s1.uid)

# s1 and s2 are pointing to the same session
assert s1.uid == s2.uid

__init__(database_url=None, database_name='radicalpilot', session_uid=None)[source]¶

Creates a new session or reconnects to an existing one.

If called without a session_uid, a new Session instance is created and stored in the database. If session_uid is set, an existing session is retrieved from the database.

Arguments:

database_url (string): The MongoDB URL. If none is given, RP uses the environment variable RADICAL_PILOT_DBURL. If that is not set, an error will be raised.
database_name (string): An alternative database name (default: ‘radicalpilot’).
session_uid (string): If session_uid is set, we try to re-connect to an existing session instead of creating a new one.

Returns:

A new Session instance.

Raises:

radical.pilot.DatabaseError
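
The lookup order for database_url described above can be sketched as a small helper. This is only an illustration of the documented fallback, not RADICAL-Pilot's own code: resolve_database_url and its environ parameter are hypothetical names, and ValueError stands in for whatever error the library actually raises.

```python
import os


def resolve_database_url(database_url=None, environ=None):
    """Return the MongoDB URL to use, following the documented lookup order.

    Hypothetical helper: an explicit argument wins, then the
    RADICAL_PILOT_DBURL environment variable; otherwise an error is raised.
    """
    env = os.environ if environ is None else environ
    if database_url:
        return database_url
    url = env.get("RADICAL_PILOT_DBURL")
    if not url:
        # ValueError is a stand-in for the library's own error type.
        raise ValueError("no database_url given and RADICAL_PILOT_DBURL is not set")
    return url
```

The result could then be passed on as radical.pilot.Session(database_url=resolve_database_url()).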

close(cleanup=True, terminate=True, delete=None)[source]¶

Closes the session.

All subsequent attempts to access objects attached to the session will result in an error. If cleanup is set to True (default), the session data is removed from the database.

Arguments:

cleanup (bool): Remove session from MongoDB (implies terminate).
terminate (bool): Shut down all pilots associated with the session.

Raises:

radical.pilot.IncorrectState if the session is closed or doesn’t exist.
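
Because a session should be closed even when the surrounding code fails, one option is to wrap it in a context manager. The sketch below is an assumption-laden illustration: closing_session is a hypothetical helper, and it relies only on the close(cleanup=..., terminate=...) signature documented above.

```python
from contextlib import contextmanager


@contextmanager
def closing_session(session, cleanup=True, terminate=True):
    """Yield the session and always close it afterwards.

    Hypothetical helper; it only uses the documented
    close(cleanup=..., terminate=...) method of the session.
    """
    try:
        yield session
    finally:
        session.close(cleanup=cleanup, terminate=terminate)

# Possible usage, assuming DBURL is defined:
#   with closing_session(radical.pilot.Session(database_url=DBURL)) as s:
#       ...  # work with the session
```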

as_dict()[source]¶: Returns a Python dictionary representation of the object.

created¶: Returns the UTC date and time the session was created.

last_reconnect¶: Returns the most recent UTC date and time the session was reconnected to.

list_pilot_managers()[source]¶

Lists the unique identifiers of all radical.pilot.PilotManager instances associated with this session.

Example:

s = radical.pilot.Session(database_url=DBURL)
for pm_uid in s.list_pilot_managers():
    pm = s.get_pilot_managers(pilot_manager_ids=pm_uid)

Returns:

A list of radical.pilot.PilotManager uids (list of strings).

Raises:

radical.pilot.IncorrectState if the session is closed or doesn’t exist.

get_pilot_managers(pilot_manager_ids=None)[source]¶

Re-connects to and returns one or more existing PilotManager(s).

Arguments:

pilot_manager_ids [string or list of strings]: The unique identifier(s) of the PilotManager(s) we want to re-connect to.

Returns:

One or more new [radical.pilot.PilotManager] objects.

Raises:

radical.pilot.PilotException if a PilotManager with the given uid doesn’t exist in the database.

list_unit_managers()[source]¶

Lists the unique identifiers of all radical.pilot.UnitManager instances associated with this session.

Example:

s = radical.pilot.Session(database_url=DBURL)
for um_uid in s.list_unit_managers():
    um = s.get_unit_managers(unit_manager_ids=um_uid)

Returns:

A list of radical.pilot.UnitManager uids (list of strings).

Raises:

radical.pilot.IncorrectState if the session is closed or doesn’t exist.

get_unit_managers(unit_manager_ids=None)[source]¶

Re-connects to and returns one or more existing UnitManager(s).

Arguments:

unit_manager_ids [string or list of strings]: The unique identifier(s) of the UnitManager(s) we want to re-connect to.

Returns:

One or more new [radical.pilot.UnitManager] objects.

Raises:

radical.pilot.PilotException if a UnitManager with the given uid doesn’t exist in the database.

add_resource_config(resource_config)[source]¶

Adds a new radical.pilot.ResourceConfig to the PilotManager’s dictionary of known resources, or accepts a string which points to a configuration file.

For example:

rc = radical.pilot.ResourceConfig
rc.name                 = "mycluster"
rc.job_manager_endpoint = "ssh+pbs://mycluster"
rc.filesystem_endpoint  = "sftp://mycluster"
rc.default_queue        = "private"
rc.bootstrapper         = "default_bootstrapper.sh"

pm = radical.pilot.PilotManager(session=s)
pm.add_resource_config(rc)

pd = radical.pilot.ComputePilotDescription()
pd.resource = "mycluster"
pd.cores    = 16
pd.runtime  = 5 # minutes

pilot = pm.submit_pilots(pd)

get_resource_config(resource_key)[source]¶: Returns a dictionary of the requested resource config.

12.1.2. Security Contexts¶

class radical.pilot.Context(ctype, thedict=None)[source]¶

classmethod from_dict(thedict)[source]¶: Creates a new object instance from a dictionary, such that c.from_dict(x.as_dict()) == x.

12.2. Pilots and PilotManagers¶

12.2.1. PilotManagers¶

class radical.pilot.PilotManager(session, pilot_launcher_workers=1, _reconnect=False)[source]¶

A PilotManager holds radical.pilot.ComputePilot instances that are submitted via the radical.pilot.PilotManager.submit_pilots() method.

It is possible to attach one or more resource configurations (see Using Local and Remote HPC Resources) to a PilotManager to outsource machine-specific configuration parameters to an external configuration file.

Each PilotManager has a unique identifier radical.pilot.PilotManager.uid that can be used to re-connect to a previously created PilotManager in a given radical.pilot.Session.

Example:

s = radical.pilot.Session(database_url=DBURL)

pm1 = radical.pilot.PilotManager(session=s, resource_configurations=RESCONF)
# Re-connect via the 'get()' method.
pm2 = radical.pilot.PilotManager.get(session=s, pilot_manager_id=pm1.uid)

# pm1 and pm2 are pointing to the same PilotManager
assert pm1.uid == pm2.uid

__init__(session, pilot_launcher_workers=1, _reconnect=False)[source]¶

Creates a new PilotManager and attaches it to the session.

Note

The resource_configurations (see Using Local and Remote HPC Resources) parameter is currently mandatory for creating a new PilotManager instance.

Arguments:

session [radical.pilot.Session]: The session instance to use.
resource_configurations [string or list of strings]: A list of URLs pointing to resource configuration files (see Using Local and Remote HPC Resources). Currently file://, http:// and https:// URLs are supported.

If one or more resource_configurations are provided, Pilots submitted via this PilotManager can access the configuration entries in the files via the ComputePilotDescription. For example:

pm = radical.pilot.PilotManager(session=s)

pd = radical.pilot.ComputePilotDescription()
pd.resource = "futuregrid.india"  # defined in futuregrid.json
pd.cores    = 16
pd.runtime  = 5 # minutes

pilot = pm.submit_pilots(pd)

pilot_launcher_workers (int): The number of pilot launcher worker processes to start in the background.

Note

pilot_launcher_workers can be used to tune RADICAL-Pilot’s performance. However, you should only change the default values if you know what you are doing.

Returns:

A new PilotManager object [radical.pilot.PilotManager].

Raises:

radical.pilot.PilotException

close(terminate=True)[source]¶

Shuts down the PilotManager and its background workers in a coordinated fashion.

Arguments:

terminate [bool]: If set to True, all active pilots will get canceled (default: True).

as_dict()[source]¶: Returns a Python dictionary representation of the object.

submit_pilots(pilot_descriptions)[source]¶

Submits one or more new radical.pilot.ComputePilot instances to a resource.

Returns:

One or more radical.pilot.ComputePilot instances [list of radical.pilot.ComputePilot].

Raises:

radical.pilot.PilotException

list_pilots()[source]¶

Lists the unique identifiers of all radical.pilot.ComputePilot instances associated with this PilotManager.

Returns:

A list of radical.pilot.ComputePilot uids [string].

Raises:

radical.pilot.PilotException

get_pilots(pilot_ids=None)[source]¶

Returns one or more radical.pilot.ComputePilot instances.

Arguments:

pilot_ids [list of strings]: If pilot_ids is set, only the Pilots with the specified uids are returned. If pilot_ids is None, all Pilots are returned.

Returns:

A list of radical.pilot.ComputePilot objects [list of radical.pilot.ComputePilot].

Raises:

radical.pilot.PilotException

wait_pilots(pilot_ids=None, state=['Done', 'Failed', 'Canceled'], timeout=None)[source]¶

Returns when one or more radical.pilot.ComputePilots reach a specific state or when an optional timeout is reached.

If pilot_ids is None, wait_pilots returns when all Pilots reach the state defined in state.

Arguments:

pilot_ids [string or list of strings] If pilot_ids is set, only the Pilots with the specified uids are considered. If pilot_ids is None (default), all Pilots are considered.

state [list of strings] The state(s) that Pilots have to reach in order for the call to return.

By default wait_pilots waits for the Pilots to reach a terminal state, which can be one of the following:

radical.pilot.DONE

radical.pilot.FAILED

radical.pilot.CANCELED

timeout [float] Optional timeout in seconds before the call returns regardless of whether the Pilots have reached the desired state or not. The default value None never times out.

Raises:

radical.pilot.PilotException
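
The blocking behaviour of wait_pilots (return once all considered Pilots are in one of the requested states, or once the timeout expires) can be illustrated with a generic polling loop. This is a sketch of the semantics only, not the library's implementation: wait_for_states, get_states, poll_interval and the boolean return value are all assumptions made for the example.

```python
import time


def wait_for_states(get_states, target_states, timeout=None, poll_interval=0.1):
    """Block until every state returned by get_states() is in target_states.

    get_states is a caller-supplied function returning the current states.
    Returns True if all states matched, False if the timeout (in seconds)
    expired first. timeout=None waits indefinitely.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        if all(state in target_states for state in get_states()):
            return True
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```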

cancel_pilots(pilot_ids=None)[source]¶

Cancels one or more ComputePilots.

Arguments:

pilot_ids [string or list of strings] If pilot_ids is set, only the Pilots with the specified uids are canceled. If pilot_ids is None, all Pilots are canceled.

Raises:

radical.pilot.PilotException

register_callback(callback_function, callback_data=None)[source]¶

Registers a new callback function with the PilotManager. Manager-level callbacks get called if any of the ComputePilots managed by the PilotManager change their state.

All callback functions need to have the same signature:

def callback_func(obj, state, data)

where obj is a handle to the object that triggered the callback, state is the new state of that object, and data is the data passed on callback registration.
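
A callback with the signature above might look like the following sketch. It assumes that the triggering object exposes a uid attribute (as ComputePilot does) and that a plain list is passed as callback_data; pilot_state_cb is a hypothetical name.

```python
def pilot_state_cb(obj, state, data):
    """Record every state change in the list passed as callback_data."""
    # obj is the object that triggered the callback; only its uid is used here.
    data.append((obj.uid, state))

# Registration, assuming `pm` is an existing PilotManager:
#   history = []
#   pm.register_callback(pilot_state_cb, callback_data=history)
```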

12.2.2. ComputePilotDescription¶

class radical.pilot.ComputePilotDescription[source]¶

A ComputePilotDescription object describes the requirements and properties of a radical.pilot.Pilot and is passed as a parameter to radical.pilot.PilotManager.submit_pilots() to instantiate a new pilot.

Note

A ComputePilotDescription MUST define at least resource and the number of cores to allocate on the target resource.

Example:

pm = radical.pilot.PilotManager(session=s)

pd = radical.pilot.ComputePilotDescription()
pd.resource = "local.localhost"  # defined in futuregrid.json
pd.cores    = 16
pd.runtime  = 5 # minutes

pilot = pm.submit_pilots(pd)

resource¶: [Type: string] [mandatory] The key of a resource configuration entry (see Using Local and Remote HPC Resources). If the key exists, the machine-specific configuration is loaded from the configuration once the ComputePilotDescription is passed to radical.pilot.PilotManager.submit_pilots(). If the key doesn’t exist, a radical.pilot.PilotException is thrown.

access_schema¶: [Type: string] [optional] The key of an access mechanism to use. The valid access mechanisms are defined in the resource configurations, see Using Local and Remote HPC Resources. The first one defined there is used by default if no other is specified.

runtime¶: [Type: int] [mandatory] The maximum run time (wall-clock time) in minutes of the ComputePilot.

sandbox¶: [Type: string] [optional] The working (“sandbox”) directory of the ComputePilot agent. If not set, it defaults to radical.pilot.sandbox in your home or login directory.

Warning

If you define a ComputePilot on an HPC cluster and you want to set sandbox manually, make sure that it points to a directory on a shared filesystem that can be reached from all compute nodes.

cores¶: [Type: int] [mandatory] The number of cores the pilot should allocate on the target resource.

memory¶: [Type: int] [optional] The amount of memory (in MB) the pilot should allocate on the target resource.

queue¶: [Type: string] [optional] The name of the job queue the pilot should get submitted to. If queue is defined in the resource configuration (resource), defining queue will override it explicitly.

project¶: [Type: string] [optional] The name of the project / allocation to charge for used CPU time. If project is defined in the machine configuration (resource), defining project will override it explicitly.

cleanup¶: [Type: bool] [optional] If cleanup is set to True, the pilot will delete its entire sandbox upon termination. This includes individual ComputeUnit sandboxes and all generated output data. Only log files will remain in the sandbox directory.
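
Since resource, cores and runtime are marked mandatory above, it can be convenient to check a description before submitting it. The helper below is a hypothetical sketch: it assumes that unset attributes read as None (or are absent), which may not match how the library stores its defaults, and the library may perform its own validation.

```python
MANDATORY_PILOT_ATTRIBUTES = ("resource", "cores", "runtime")


def missing_pilot_attributes(description):
    """Return the names of mandatory attributes that are unset (None or absent)."""
    return [name for name in MANDATORY_PILOT_ATTRIBUTES
            if getattr(description, name, None) is None]
```

An empty list means all three attributes are set; otherwise the returned names can be reported to the user before calling submit_pilots().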

12.2.3. Pilots¶

class radical.pilot.ComputePilot[source]¶

A ComputePilot represents a resource overlay on a local or remote resource.

Note

A ComputePilot cannot be created directly. The factory method radical.pilot.PilotManager.submit_pilots() has to be used instead.

Example:

pm = radical.pilot.PilotManager(session=s)

pd = radical.pilot.ComputePilotDescription()
pd.resource = "local.localhost"
pd.cores    = 2
pd.runtime  = 5 # minutes

pilot = pm.submit_pilots(pd)

as_dict()[source]¶: Returns a Python dictionary representation of the ComputePilot object.

uid¶

Returns the Pilot’s unique identifier.

The uid identifies the Pilot within the PilotManager and can be used to retrieve an existing Pilot.

Returns:

A unique identifier (string).

description¶: Returns the pilot description the pilot was started with.

sandbox¶

Returns the Pilot’s ‘sandbox’ / working directory URL.

Returns:

A URL string.

state¶: Returns the current state of the pilot.

state_history¶: Returns the complete state history of the pilot.

stdout¶: Returns the stdout of the pilot.

stderr¶: Returns the stderr of the pilot.

logfile¶: Returns the logfile of the pilot.

log¶: Returns the log of the pilot.

resource_detail¶: Returns the names of the nodes managed by the pilot.

pilot_manager¶: Returns the pilot manager object for this pilot.

unit_managers¶: Returns the unit manager object UIDs for this pilot.

units¶: Returns the units scheduled for this pilot.

submission_time¶: Returns the time the pilot was submitted.

start_time¶: Returns the time the pilot was started on the backend.

stop_time¶: Returns the time the pilot was stopped.

resource¶: Returns the resource.

register_callback(callback_func, callback_data=None)[source]¶

Registers a callback function that is triggered every time the ComputePilot’s state changes.

All callback functions need to have the same signature:

def callback_func(obj, state, data)

where obj is a handle to the object that triggered the callback, state is the new state of that object, and data is the data passed on callback registration.

wait(state=['Done', 'Failed', 'Canceled'], timeout=None)[source]¶

Returns when the pilot reaches a specific state or when an optional timeout is reached.

Arguments:

state [list of strings] The state(s) that the Pilot has to reach in order for the call to return.

By default wait waits for the Pilot to reach a terminal state, which can be one of the following:

radical.pilot.states.DONE

radical.pilot.states.FAILED

radical.pilot.states.CANCELED

timeout [float] Optional timeout in seconds before the call returns regardless of whether the Pilot has reached the desired state or not. The default value None never times out.

Raises:

radical.pilot.PilotException if the state of the pilot cannot be determined.

cancel()[source]¶

Sends a termination request to the pilot.

Raises:

radical.pilot.PilotException if the termination request cannot be fulfilled.

stage_in(directives)[source]¶: Stages the content of the staging directive into the pilot’s staging area.

12.3. ComputeUnits and UnitManagers¶

12.3.1. UnitManager¶

class radical.pilot.UnitManager(session, scheduler=None, input_transfer_workers=2, output_transfer_workers=2, _reconnect=False)[source]¶

A UnitManager manages radical.pilot.ComputeUnit instances which represent the executable workload in RADICAL-Pilot. A UnitManager connects the ComputeUnits with one or more Pilot instances (which represent the workload executors in RADICAL-Pilot) and a scheduler which determines which ComputeUnit gets executed on which Pilot.

Each UnitManager has a unique identifier radical.pilot.UnitManager.uid that can be used to re-connect to a previously created UnitManager in a given radical.pilot.Session.

Example:

s = radical.pilot.Session(database_url=DBURL)

pm = radical.pilot.PilotManager(session=s)

pd = radical.pilot.ComputePilotDescription()
pd.resource = "futuregrid.alamo"
pd.cores = 16

p1 = pm.submit_pilots(pd) # create first pilot with 16 cores
p2 = pm.submit_pilots(pd) # create second pilot with 16 cores

# Create a workload of 128 '/bin/sleep' compute units
compute_units = []
for unit_count in range(0, 128):
    cu = radical.pilot.ComputeUnitDescription()
    cu.executable = "/bin/sleep"
    cu.arguments = ['60']
    compute_units.append(cu)

# Combine the two pilots, the workload and a scheduler via
# a UnitManager.
um = radical.pilot.UnitManager(session=s,
                               scheduler=radical.pilot.SCHED_ROUND_ROBIN)
um.add_pilots([p1, p2])
um.submit_units(compute_units)
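
To make the role of the scheduler more concrete, the following sketch shows what a round-robin assignment of units to pilots could look like. It is an assumption based on the scheduler's name, not the code behind SCHED_ROUND_ROBIN; round_robin_assign and the id values are hypothetical.

```python
from itertools import cycle


def round_robin_assign(unit_ids, pilot_ids):
    """Map each unit id to a pilot id, cycling through the pilots in order."""
    if not pilot_ids:
        raise ValueError("at least one pilot is required")
    pilots = cycle(pilot_ids)
    return {unit_id: next(pilots) for unit_id in unit_ids}
```

With two pilots, consecutive units alternate between them, so a workload of 128 units would be split evenly, 64 per pilot.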

__init__(session, scheduler=None, input_transfer_workers=2, output_transfer_workers=2, _reconnect=False)[source]¶

Creates a new UnitManager and attaches it to the session.

Args:

session (radical.pilot.Session): The session instance to use.

scheduler (string): The name of the scheduler plug-in to use.

input_transfer_workers (int): The number of input file transfer worker processes to launch in the background.

output_transfer_workers (int): The number of output file transfer worker processes to launch in the background.

Note

input_transfer_workers and output_transfer_workers can be used to tune RADICAL-Pilot’s file transfer performance. However, you should only change the default values if you know what you are doing.

Raises:

radical.pilot.PilotException

close()[source]¶: Shuts down the UnitManager and its background workers in a coordinated fashion.

as_dict()[source]¶: Returns a Python dictionary representation of the UnitManager object.

uid¶: Returns the unique id.

scheduler¶: Returns the scheduler name.

scheduler_details¶: Returns the scheduler logs.

add_pilots(pilots)[source]¶

Associates one or more pilots with the unit manager.

Arguments:

pilots [radical.pilot.ComputePilot or list of radical.pilot.ComputePilot]: The pilot objects that will be added to the unit manager.

Raises:

radical.pilot.PilotException

list_pilots()[source]¶

Lists the UIDs of the pilots currently associated with the unit manager.

Returns:

A list of radical.pilot.ComputePilot UIDs [string].

Raises:

radical.pilot.PilotException

get_pilots()[source]¶

Gets the pilot instances currently associated with the unit manager.

Returns:

A list of radical.pilot.ComputePilot instances.

Raises:

radical.pilot.PilotException

remove_pilots(pilot_ids, drain=True)[source]¶

Disassociates one or more pilots from the unit manager.

TODO: Implement ‘drain’.

After a pilot has been removed from a unit manager, it won’t process any of the unit manager’s units anymore. Calling remove_pilots doesn’t stop the pilot itself.

Arguments:

drain [boolean]: Drain determines what happens to the units which are managed by the removed pilot(s). If True (the default), all units currently assigned to the pilot are allowed to finish execution. If False, ACTIVE units will be canceled.

Raises:

radical.pilot.PilotException

list_units()[source]¶

Returns the UIDs of the radical.pilot.ComputeUnit managed by this unit manager.

Returns:

A list of radical.pilot.ComputeUnit UIDs [string].

submit_units(unit_descriptions)[source]¶

Submits one or more radical.pilot.ComputeUnit instances to the unit manager.

Arguments:

unit_descriptions [radical.pilot.ComputeUnitDescription or list of radical.pilot.ComputeUnitDescription]: The description of the compute unit instance(s) to create.

Returns:

A list of radical.pilot.ComputeUnit objects.

Raises:

radical.pilot.PilotException

get_units(unit_ids=None)[source]¶

Returns one or more compute units identified by their IDs.

Arguments:

unit_ids [string or list of strings]: The IDs of the compute unit objects to return.

Returns:

A list of radical.pilot.ComputeUnit objects.

Raises:

radical.pilot.PilotException

wait_units(unit_ids=None, state=['Done', 'Failed', 'Canceled'], timeout=None)[source]¶

Returns when one or more radical.pilot.ComputeUnits reach a specific state.

If unit_ids is None, wait_units returns when all ComputeUnits reach the state defined in state.

Example:

# TODO -- add example

Arguments:

unit_ids [string or list of strings] If unit_ids is set, only the ComputeUnits with the specified uids are considered. If unit_ids is None (default), all ComputeUnits are considered.

state [list of strings] The state(s) that ComputeUnits have to reach in order for the call to return.

By default wait_units waits for the ComputeUnits to reach a terminal state, which can be one of the following:

radical.pilot.DONE

radical.pilot.FAILED

radical.pilot.CANCELED

timeout [float] Timeout in seconds before the call returns regardless of ComputeUnit state changes. The default value None waits forever.

Raises:

radical.pilot.PilotException

cancel_units(unit_ids=None)[source]¶

Cancel one or more radical.pilot.ComputeUnits.

Arguments:

unit_ids [string or list of strings]: The IDs of the compute unit objects to cancel.

Raises:

radical.pilot.PilotException

register_callback(callback_function, metric='UNIT_STATE', callback_data=None)[source]¶

Registers a new callback function with the UnitManager. Manager-level callbacks get called if the specified metric changes. The default metric UNIT_STATE fires the callback if any of the ComputeUnits managed by the UnitManager change their state.

All callback functions need to have the same signature:

def callback_func(obj, value, data)

where obj is a handle to the object that triggered the callback, value is the new value of the metric, and data is the data provided on callback registration. In the example of UNIT_STATE above, the object would be the unit in question, and the value would be the new state of the unit.

Available metrics are:

UNIT_STATE: fires when the state of any of the units which are managed by this unit manager instance is changing. It communicates the unit object instance and the unit’s new state.

WAIT_QUEUE_SIZE: fires when the number of unscheduled units (i.e. of units which have not been assigned to a pilot for execution) changes.
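
Callbacks for the two metrics could be written as in the sketch below. The function names are hypothetical, a dict is assumed as callback_data, and for WAIT_QUEUE_SIZE it is an assumption that value carries the number of unscheduled units and that the metric is selected with the string 'WAIT_QUEUE_SIZE'.

```python
def unit_state_cb(obj, value, data):
    """UNIT_STATE callback: count how often each state was reported."""
    # value is the unit's new state; data is the dict given at registration.
    data[value] = data.get(value, 0) + 1


def wait_queue_cb(obj, value, data):
    """WAIT_QUEUE_SIZE callback: keep the most recently reported queue size."""
    # Assumption: value is the current number of unscheduled units.
    data["wait_queue_size"] = value

# Registration, assuming `um` is an existing UnitManager:
#   counts = {}
#   um.register_callback(unit_state_cb, metric='UNIT_STATE', callback_data=counts)
```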

12.3.2. ComputeUnitDescription¶

class radical.pilot.ComputeUnitDescription[source]¶

A ComputeUnitDescription object describes the requirements and properties of a radical.pilot.ComputeUnit and is passed as a parameter to radical.pilot.UnitManager.submit_units() to instantiate and run a new ComputeUnit.

Note

A ComputeUnitDescription MUST define at least an executable.

Example:

# TODO

executable¶: (Attribute) The executable to launch (string) [mandatory].

cores¶: (Attribute) The number of cores required by the executable (int) [mandatory].

mpi¶: (Attribute) Set to true if the task is an MPI task. (bool) [optional].

name¶: (Attribute) A descriptive name for the compute unit (string) [optional].

arguments¶: (Attribute) The arguments for executable (list of strings) [optional].

environment¶: (Attribute) Environment variables to set in the execution environment (dict) [optional].

stdout¶: (Attribute) the name of the file to store stdout in.

stderr¶: (Attribute) the name of the file to store stderr in.

input_staging¶: (Attribute) The files that need to be staged before execution (list of staging directives) [optional].

Note

TODO: Explain input staging.

output_staging¶: (Attribute) The files that need to be staged after execution (list of staging directives) [optional].

Note

TODO: Explain output staging.

pre_exec¶: (Attribute) Actions to perform before this task starts (list of strings) [optional].

post_exec¶: (Attribute) Actions to perform after this task finishes (list of strings) [optional].

Note

Before the BigBang, there was nothing ...

kernel¶: (Attribute) Name of a simulation kernel which expands to description attributes once the unit is scheduled to a pilot (and resource).

Note

TODO: explain in detail, reference ENMDTK.

restartable¶: (Attribute) If the unit starts to execute on a pilot, but cannot finish because the pilot fails or is canceled, can the unit be restarted on a different pilot / resource? (default: False)

Note

TODO: explain in detail, reference ENMDTK.

cleanup¶: [Type: bool] [optional] If cleanup is set to True, the pilot will delete the entire unit sandbox upon termination. This includes all generated output data in that sandbox. Output staging will be performed before cleanup.
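
As an illustration of how the attributes above fit together, the following sketch builds a batch of descriptions for a simple /bin/sleep workload. build_sleep_descriptions is a hypothetical helper; the description class is passed in as a parameter so the function does not depend on how radical.pilot is imported.

```python
def build_sleep_descriptions(description_cls, count, seconds):
    """Create `count` descriptions that each run /bin/sleep for `seconds`."""
    descriptions = []
    for _ in range(count):
        cud = description_cls()
        cud.executable = "/bin/sleep"   # mandatory
        cud.arguments = [str(seconds)]  # list of strings
        cud.cores = 1
        descriptions.append(cud)
    return descriptions

# Possible usage, assuming `umgr` is an existing UnitManager:
#   cuds = build_sleep_descriptions(radical.pilot.ComputeUnitDescription, 128, 60)
#   units = umgr.submit_units(cuds)
```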

12.3.3. ComputeUnit¶

class radical.pilot.ComputeUnit[source]¶

A ComputeUnit represents a ‘task’ that is executed on a ComputePilot. ComputeUnits allow you to control and query the state of this task.

Note

A ComputeUnit cannot be created directly. The factory method radical.pilot.UnitManager.submit_units() has to be used instead.

Example:

umgr = radical.pilot.UnitManager(session=s)

ud = radical.pilot.ComputeUnitDescription()
ud.executable = "/bin/date"
ud.cores      = 1

unit = umgr.submit_units(ud)

as_dict()[source]¶: Returns a Python dictionary representation of the object.

uid¶

Returns the unit’s unique identifier.

The uid identifies the ComputeUnit within a UnitManager and can be used to retrieve an existing ComputeUnit.

Returns:

A unique identifier (string).

name¶

Returns the unit’s application specified name.

Returns:

A name (string).

working_directory¶: Returns the full working directory URL of this ComputeUnit.

pilot_id¶: Returns the pilot_id of this ComputeUnit.

stdout¶

Returns a snapshot of the executable’s STDOUT stream.

If this property is queried before the ComputeUnit has reached ‘DONE’ or ‘FAILED’ state it will return None.

stderr¶

Returns a snapshot of the executable’s STDERR stream.

If this property is queried before the ComputeUnit has reached ‘DONE’ or ‘FAILED’ state it will return None.

description¶: Returns the ComputeUnitDescription the ComputeUnit was started with.

state¶: Returns the current state of the ComputeUnit.

state_history¶: Returns the complete state history of the ComputeUnit.

exit_code¶

Returns the exit code of the ComputeUnit.

If this property is queried before the ComputeUnit has reached ‘DONE’ or ‘FAILED’ state it will return None.
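
Because stdout, stderr and exit_code return None until the unit is final, code that collects results may want to check the state first. The sketch below assumes that the state property compares equal to the strings 'Done' and 'Failed' used in the wait() signature; collect_result is a hypothetical helper.

```python
FINAL_STATES = ("Done", "Failed")


def collect_result(unit):
    """Return (exit_code, stdout, stderr) once the unit is final, else None."""
    if unit.state not in FINAL_STATES:
        return None
    return unit.exit_code, unit.stdout, unit.stderr
```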

log¶: Returns the logs of the ComputeUnit.

execution_details¶: Returns the execution location(s) of the ComputeUnit.

execution_locations¶: Returns the execution location(s) of the ComputeUnit. This is just an alias for execution_details.

submission_time¶: Returns the time the ComputeUnit was submitted.

start_time¶: Returns the time the ComputeUnit was started on the backend.

stop_time¶: Returns the time the ComputeUnit was stopped.

register_callback(callback_func, callback_data=None)[source]¶

Registers a callback function that is triggered every time the ComputeUnit’s state changes.

All callback functions need to have the same signature:

def callback_func(obj, state)

where obj is a handle to the object that triggered the callback and state is the new state of that object.

wait(state=['Done', 'Failed', 'Canceled'], timeout=None)[source]¶

Returns when the ComputeUnit reaches a specific state or when an optional timeout is reached.

Arguments:

state [list of strings] The state(s) that the compute unit has to reach in order for the call to return.

By default wait waits for the compute unit to reach a terminal state, which can be one of the following:

radical.pilot.states.DONE

radical.pilot.states.FAILED

radical.pilot.states.CANCELED

timeout [float] Optional timeout in seconds before the call returns regardless of whether the compute unit has reached the desired state or not. The default value None never times out.

Raises:

cancel()[source]¶

Cancel the ComputeUnit.

Raises:

radical.pilot.PilotException

12.4. Exceptions¶

class radical.pilot.PilotException(msg, obj=None)[source]¶

Parameters:

msg (string): Error message, indicating the cause for the exception being raised.
obj (object): RADICAL-Pilot object on whose activity the exception was raised.

The base class for all RADICAL-Pilot Exception classes – this exception type is never raised directly, but can be used to catch all RADICAL-Pilot exceptions within a single except clause.

The exception message and originating object are also accessible as class attributes (e.object() and e.message()). The __str__() operator redirects to get_message().

get_object()[source]¶: Return the object instance on whose activity the exception was raised.

get_message()[source]¶: Return the error message associated with the exception.
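
The pattern of catching every RADICAL-Pilot exception through the base class can be sketched with local stand-in classes. The classes below are illustrations, not the library's definitions: they mirror only the documented constructor and accessors, and it is an assumption that DatabaseError derives from PilotException. describe_failure is a hypothetical helper.

```python
class PilotException(Exception):
    """Stand-in for radical.pilot.PilotException (illustration only)."""

    def __init__(self, msg, obj=None):
        super().__init__(msg)
        self._msg = msg
        self._obj = obj

    def get_message(self):
        return self._msg

    def get_object(self):
        return self._obj

    def __str__(self):
        return self.get_message()


class DatabaseError(PilotException):
    """Stand-in for radical.pilot.DatabaseError (assumed subclass)."""


def describe_failure(action):
    """Run action() and turn any PilotException subclass into a message."""
    try:
        action()
    except PilotException as e:
        return "%s: %s" % (type(e).__name__, e.get_message())
    return "ok"
```

A single except clause on the base class is enough here because every more specific exception is caught by it as well.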

class radical.pilot.DatabaseError(msg, obj=None)[source]¶: TODO: Document me!

12.5. State Models¶

12.5.1. ComputeUnit State Model¶

12.5.2. ComputePilot State Model¶

A new compute pilot is launched via radical.pilot.PilotManager.submit_pilots().
The pilot is submitted to the remote resource and enters LAUNCHING state.
The pilot has been successfully launched on the remote machine and is now waiting to become ACTIVE.
The pilot has been launched by the queueing system and is now in ACTIVE state.
The pilot has finished execution regularly and enters DONE state.
An error has occurred during preparation for pilot launching and the pilot enters FAILED state.
An error has occurred during pilot launching and the pilot enters FAILED state.
An error has occurred on the backend, the pilot couldn’t become active, and the pilot enters FAILED state.
An error has occurred during pilot runtime and the pilot enters FAILED state.
The active pilot has been canceled via the radical.pilot.ComputePilot.cancel() call and enters CANCELED state.
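
The transitions listed above can be summarised as a small lookup table. This is a simplified sketch, not the library's state model: it contains only the states that the list names explicitly, so the initial state and the intermediate waiting state before ACTIVE are left out, and PILOT_TRANSITIONS, is_terminal and can_transition are hypothetical names.

```python
# Only the states named in the list above; the unnamed initial and
# intermediate states are deliberately left out of this sketch.
PILOT_TRANSITIONS = {
    "LAUNCHING": {"ACTIVE", "FAILED"},
    "ACTIVE": {"DONE", "FAILED", "CANCELED"},
    "DONE": set(),
    "FAILED": set(),
    "CANCELED": set(),
}


def is_terminal(state):
    """A state is terminal if no further transition leaves it."""
    return not PILOT_TRANSITIONS[state]


def can_transition(current, new):
    """Check whether the sketch allows moving from `current` to `new`."""
    return new in PILOT_TRANSITIONS.get(current, set())
```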
