6. API Reference¶

6.1. Sessions and Security Contexts¶

6.1.1. Sessions¶

class radical.pilot.Session(dburl=None, uid=None, cfg=None, _connect=True)[source]¶

A Session encapsulates a RADICAL-Pilot instance and is the root object of the RADICAL-Pilot object hierarchy.

A Session holds radical.pilot.PilotManager and radical.pilot.UnitManager instances which in turn hold radical.pilot.ComputePilot and radical.pilot.ComputeUnit instances.

__init__(dburl=None, uid=None, cfg=None, _connect=True)[source]¶

Creates a new session. A new Session instance is created and stored in the database.

Arguments:

dburl (string): The MongoDB URL. If none is given, RP uses the environment variable RADICAL_PILOT_DBURL. If that is not set, an error will be raised.
uid (string): Create a session with this UID. Only use this when you know what you are doing!

Returns:

A new Session instance.

Raises:

radical.pilot.DatabaseError
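
The lookup order documented for dburl can be sketched in plain Python. This only illustrates the documented behavior and is not RP's implementation: the helper name resolve_dburl is hypothetical, and ValueError stands in for whatever error RP actually raises.

```python
import os

def resolve_dburl(dburl=None):
    """Hypothetical helper mirroring the documented lookup order for dburl."""
    # 1. An explicitly passed URL wins.
    if dburl:
        return dburl
    # 2. Otherwise fall back to the RADICAL_PILOT_DBURL environment variable.
    env_url = os.environ.get("RADICAL_PILOT_DBURL")
    if env_url:
        return env_url
    # 3. Neither is available: signal an error (exception type is a stand-in).
    raise ValueError("no MongoDB URL given and RADICAL_PILOT_DBURL is not set")
```

A caller that relies on the environment variable would therefore export RADICAL_PILOT_DBURL before creating the session and could omit dburl altogether.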

close(cleanup=False, terminate=True, download=False)[source]¶

Closes the session.

All subsequent attempts to access objects attached to the session will result in an error. If cleanup is set to True, the session data is removed from the database. Note that the signature above gives cleanup=False as the default.

Arguments:

cleanup (bool): Remove session from MongoDB (implies terminate)
terminate (bool): Shut down all pilots associated with the session.

Raises:

radical.pilot.IncorrectState if the session is closed or doesn’t exist.

as_dict()[source]¶: Returns a Python dictionary representation of the object.

created¶: Returns the UTC date and time the session was created.

connected¶: Returns the most recent UTC date and time the session was reconnected to.

closed¶: Returns the time of closing.

inject_metadata(metadata)[source]¶: Inserts (experiment) metadata into an active session. RP stack version information is always added.

list_pilot_managers()[source]¶

Lists the unique identifiers of all radical.pilot.PilotManager instances associated with this session.

Returns:

A list of radical.pilot.PilotManager uids (list of strings).

get_pilot_managers(pmgr_uids=None)[source]¶

Returns known PilotManager(s).

Arguments:

pmgr_uids [string]: unique identifier of the PilotManager we want

Returns:

One or more [radical.pilot.PilotManager] objects.

list_unit_managers()[source]¶

Lists the unique identifiers of all radical.pilot.UnitManager instances associated with this session.

Returns:

A list of radical.pilot.UnitManager uids (list of strings).

get_unit_managers(umgr_uids=None)[source]¶

Returns known UnitManager(s).

Arguments:

umgr_uids [string]: unique identifier of the UnitManager we want

Returns:

One or more [radical.pilot.UnitManager] objects.

list_resources()[source]¶: Returns a list of known resource labels which can be used in a pilot description. Note that resource aliases won’t be listed.

add_resource_config(resource_config)[source]¶

Adds a new radical.pilot.ResourceConfig to the PilotManager’s dictionary of known resources, or accepts a string which points to a configuration file.

For example:

rc = radical.pilot.ResourceConfig(label="mycluster")
rc.job_manager_endpoint = "ssh+pbs://mycluster"
rc.filesystem_endpoint  = "sftp://mycluster"
rc.default_queue        = "private"
rc.bootstrapper         = "default_bootstrapper.sh"

pm = radical.pilot.PilotManager(session=s)
pm.add_resource_config(rc)

pd = radical.pilot.ComputePilotDescription()
pd.resource = "mycluster"
pd.cores    = 16
pd.runtime  = 5 # minutes

pilot = pm.submit_pilots(pd)

get_resource_config(resource, schema=None)[source]¶: Returns a dictionary of the requested resource config

6.1.2. Security Contexts¶

class radical.pilot.Context(ctype, thedict=None)[source]¶

__init__(ctype, thedict=None)[source]¶: Takes ctype (string); returns None.

classmethod from_dict(thedict)[source]¶: Creates a new object instance from a dictionary (thedict), such that c.from_dict(x.as_dict()) == x.

6.2. Pilots and PilotManagers¶

6.2.1. PilotManagers¶

class radical.pilot.PilotManager(session)[source]¶

A PilotManager manages radical.pilot.ComputePilot instances that are submitted via the radical.pilot.PilotManager.submit_pilots() method.

It is possible to attach one or more resource configurations (see Using Local and Remote HPC Resources) to a PilotManager to move machine-specific configuration parameters into an external configuration file.

Example:

import radical.pilot

# DBURL is the MongoDB URL of the session database, defined by the caller.
s = radical.pilot.Session(dburl=DBURL)

pm = radical.pilot.PilotManager(session=s)

pd = radical.pilot.ComputePilotDescription()
pd.resource = "futuregrid.alamo"
pd.cores    = 16
pd.runtime  = 5  # minutes

p1 = pm.submit_pilots(pd)  # create first  pilot with 16 cores
p2 = pm.submit_pilots(pd)  # create second pilot with 16 cores

# Create a workload of 128 '/bin/sleep' compute units
compute_units = []
for unit_count in range(0, 128):
    cu = radical.pilot.ComputeUnitDescription()
    cu.executable = "/bin/sleep"
    cu.arguments = ['60']
    compute_units.append(cu)

# Combine the two pilots, the workload and a scheduler via
# a UnitManager.
um = radical.pilot.UnitManager(session=s,
                               scheduler=radical.pilot.SCHEDULER_ROUND_ROBIN)
um.add_pilots(p1)
um.add_pilots(p2)
um.submit_units(compute_units)

The pilot manager can issue notifications on pilot state changes. Whenever a state notification arrives, any callback registered for that notification is fired.

NOTE: State notifications can arrive out of order with respect to the pilot state model!

__init__(session)[source]¶

Creates a new PilotManager and attaches it to the session.

Arguments:

session [radical.pilot.Session]: The session instance to use.

Returns:

A new PilotManager object [radical.pilot.PilotManager].

close(terminate=True)[source]¶

Shuts down the PilotManager.

Arguments:

terminate [bool]: cancel non-final pilots if True (default)

is_valid(term=True)[source]¶: As with the Process’ is_valid() call, this checks that the component is still viable and raises an exception if it is not. In addition to the health of the component’s child process, it also checks the health of any sub-components and communication bridges.

as_dict()[source]¶: Returns a dictionary representation of the PilotManager object.

uid¶: Returns the unique id.

list_pilots()[source]¶

Returns the UIDs of the radical.pilot.ComputePilots managed by this pilot manager.

Returns:

A list of radical.pilot.ComputePilot UIDs [string].

submit_pilots(descriptions)[source]¶

Submits one or more radical.pilot.ComputePilot instances to the pilot manager.

Arguments:

descriptions [radical.pilot.ComputePilotDescription or list of radical.pilot.ComputePilotDescription]: The description of the compute pilot instance(s) to create.

Returns:

A list of radical.pilot.ComputePilot objects.

get_pilots(uids=None)[source]¶

Returns one or more compute pilots identified by their IDs.

Arguments:

uids [string or list of strings]: The IDs of the compute pilot objects to return.

Returns:

A list of radical.pilot.ComputePilot objects.

wait_pilots(uids=None, state=None, timeout=None)[source]¶

Returns when one or more radical.pilot.ComputePilots reach a specific state.

If uids is None, wait_pilots returns when all ComputePilots reach the state defined in state. This may include pilots which have previously terminated or have already been waited upon.

Example:

# TODO -- add example

Arguments:

uids [string or list of strings] If uids is set, only the ComputePilots with the specified uids are considered. If uids is None (default), all ComputePilots are considered.

state [string] The state that ComputePilots have to reach in order for the call to return.

By default wait_pilots waits for the ComputePilots to reach a terminal state, which can be one of the following:

radical.pilot.states.DONE

radical.pilot.states.FAILED

radical.pilot.states.CANCELED

timeout [float] Timeout in seconds before the call returns regardless of Pilot state changes. The default value None waits forever.

cancel_pilots(uids=None, _timeout=None)[source]¶

Cancel one or more radical.pilot.ComputePilots.

Arguments:

uids [string or list of strings]: The IDs of the compute pilot objects to cancel.

register_callback(cb, metric='PILOT_STATE', cb_data=None)[source]¶

Registers a new callback function with the PilotManager. Manager-level callbacks get called if the specified metric changes. The default metric PILOT_STATE fires the callback if any of the ComputePilots managed by the PilotManager change their state.

All callback functions need to have the same signature:

def cb(obj, value, cb_data)

where obj is a handle to the object that triggered the callback, value is the new value of the metric, and cb_data is the data provided on callback registration. In the example of PILOT_STATE above, obj would be the pilot in question, and value would be the new state of the pilot.

Available metrics are:

PILOT_STATE: fires when the state of any of the pilots managed by this pilot manager instance changes. It communicates the pilot object instance and the pilot’s new state.
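
A callback with the documented (obj, value, cb_data) signature might look like the sketch below. It also shows one possible way to cope with out-of-order notifications: once a final state has been recorded, later non-final notifications are ignored. The pilot object is a stand-in built with SimpleNamespace, the use of a dict as cb_data is an assumption, and "SOME_NON_FINAL_STATE" is a placeholder rather than a real RP state name.

```python
from types import SimpleNamespace

# States named as terminal in this reference; used here as plain strings.
FINAL_STATES = {"DONE", "FAILED", "CANCELED"}

def pilot_state_cb(obj, value, cb_data):
    """Callback with the documented (obj, value, cb_data) signature.

    cb_data is assumed to be a dict mapping pilot uid -> last recorded state.
    """
    last = cb_data.get(obj.uid)
    # Notifications may arrive out of order: once a final state has been
    # recorded, ignore any late notification for a non-final state.
    if last in FINAL_STATES and value not in FINAL_STATES:
        return
    cb_data[obj.uid] = value

# Stand-in for a pilot object; a real callback would receive a ComputePilot.
fake_pilot = SimpleNamespace(uid="pilot.0000")
seen = {}
pilot_state_cb(fake_pilot, "DONE", seen)
pilot_state_cb(fake_pilot, "SOME_NON_FINAL_STATE", seen)  # late notification
print(seen)
```

With a real PilotManager pm, such a function would be registered through the signature documented above, for example pm.register_callback(pilot_state_cb, cb_data=seen).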

6.2.2. ComputePilotDescription¶

class radical.pilot.ComputePilotDescription(from_dict=None)[source]¶

A ComputePilotDescription object describes the requirements and properties of a radical.pilot.ComputePilot and is passed as a parameter to radical.pilot.PilotManager.submit_pilots() to instantiate and run a new pilot.

Note

A ComputePilotDescription MUST define at least resource, cores and runtime.

Example:

pm = radical.pilot.PilotManager(session=s)

pd = radical.pilot.ComputePilotDescription()
pd.resource = "local.localhost"  # defined in futuregrid.json
pd.cores    = 16
pd.runtime  = 5 # minutes

pilot = pm.submit_pilots(pd)

resource¶: [Type: string] [mandatory] The key of a resource configuration entry (see Using Local and Remote HPC Resources). If the key exists, the machine-specific configuration is loaded from the configuration once the ComputePilotDescription is passed to radical.pilot.PilotManager.submit_pilots(). If the key doesn’t exist, a radical.pilot.PilotException is thrown.

access_schema¶: [Type: string] [optional] The key of an access mechanism to use. The valid access mechanisms are defined in the resource configurations, see Using Local and Remote HPC Resources. The first one defined there is used by default if no other is specified.

runtime¶: [Type: int] [mandatory] The maximum run time (wall-clock time) in minutes of the ComputePilot.

sandbox¶: [Type: string] [optional] The working (“sandbox”) directory of the ComputePilot agent. If not set, it defaults to radical.pilot.sandbox in your home or login directory.

Warning

If you define a ComputePilot on an HPC cluster and you want to set sandbox manually, make sure that it points to a directory on a shared filesystem that can be reached from all compute nodes.

cores¶

[Type: int] [mandatory] The number of cores the pilot should allocate on the target resource.

NOTE: for local pilots, you can set a number larger than the physical machine limit if RADICAL_PILOT_PROFILE is set in your environment.

memory¶: [Type: int] [optional] The amount of memory (in MB) the pilot should allocate on the target resource.

queue¶: [Type: string] [optional] The name of the job queue the pilot should get submitted to. If queue is defined in the resource configuration (resource), defining queue will override it explicitly.

project¶: [Type: string] [optional] The name of the project / allocation to charge for used CPU time. If project is defined in the machine configuration (resource), defining project will override it explicitly.

candidate_hosts¶: [Type: list] [optional] The list of names of hosts on which this pilot is allowed to start.

cleanup¶: [Type: bool] [optional] If cleanup is set to True, the pilot will delete its entire sandbox upon termination. This includes individual ComputeUnit sandboxes and all generated output data. Only log files will remain in the sandbox directory.

6.2.3. Pilots¶

class radical.pilot.ComputePilot(pmgr, descr)[source]¶

A ComputePilot represents a resource overlay on a local or remote resource.

Note

A ComputePilot cannot be created directly. The factory method radical.pilot.PilotManager.submit_pilots() has to be used instead.

Example:

pm = radical.pilot.PilotManager(session=s)

pd = radical.pilot.ComputePilotDescription()
pd.resource = "local.localhost"
pd.cores    = 2
pd.runtime  = 5 # minutes

pilot = pm.submit_pilots(pd)

as_dict()[source]¶: Returns a Python dictionary representation of the object.

session¶

Returns the pilot’s session.

Returns:

A Session.

pmgr¶

Returns the pilot’s manager.

Returns:

A PilotManager.

resource_details¶: Returns agent level resource information

uid¶

Returns the pilot’s unique identifier.

The uid identifies the pilot within a PilotManager.

Returns:

A unique identifier (string).

state¶

Returns the current state of the pilot.

Returns:

state (string enum)

log¶

Returns a list of human readable [timestamp, string] tuples describing various events during the pilot’s lifetime. Those strings are not normative, only informative!

Returns:

log (list of [timestamp, string] tuples)

stdout¶

Returns a snapshot of the pilot’s STDOUT stream.

If this property is queried before the pilot has reached ‘DONE’ or ‘FAILED’ state it will return None.

Returns:

stdout (string)

stderr¶

Returns a snapshot of the pilot’s STDERR stream.

If this property is queried before the pilot has reached ‘DONE’ or ‘FAILED’ state it will return None.

Returns:

stderr (string)

resource¶

Returns the resource tag of this pilot.

Returns:

A resource tag (string)

pilot_sandbox¶

Returns the full sandbox URL of this pilot, if that is already known, or ‘None’ otherwise.

Returns:

A string

description¶

Returns the description the pilot was started with, as a dictionary.

Returns:

description (dict)

register_callback(cb, metric='PILOT_STATE', cb_data=None)[source]¶

Registers a callback function that is triggered every time the pilot’s state changes.

All callback functions need to have the same signature:

def cb(obj, state)

where obj is a handle to the object that triggered the callback and state is the new state of that object. If ‘cb_data’ is given, then the ‘cb’ signature changes to

def cb(obj, state, cb_data)

and ‘cb_data’ is passed along.

wait(state=None, timeout=None)[source]¶

Returns when the pilot reaches a specific state or when an optional timeout is reached.

Arguments:

state [list of strings] The state(s) that the pilot has to reach in order for the call to return.

By default wait waits for the pilot to reach a final state, which can be one of the following:

radical.pilot.states.DONE

radical.pilot.states.FAILED

radical.pilot.states.CANCELED

timeout [float] Optional timeout in seconds before the call returns regardless of whether the pilot has reached the desired state or not. The default value None never times out.

cancel()[source]¶: Cancel the pilot.

stage_in(directives)[source]¶: Stages the content of the staging directive into the pilot’s staging area

6.3. ComputeUnits and UnitManagers¶

6.3.1. UnitManager¶

class radical.pilot.UnitManager(session, scheduler=None)[source]¶

A UnitManager manages radical.pilot.ComputeUnit instances which represent the executable workload in RADICAL-Pilot. A UnitManager connects the ComputeUnits with one or more Pilot instances (which represent the workload executors in RADICAL-Pilot) and a scheduler which determines which ComputeUnit gets executed on which Pilot.

Example:

import radical.pilot as rp

# DBURL is the MongoDB URL of the session database, defined by the caller.
s = rp.Session(dburl=DBURL)

pm = rp.PilotManager(session=s)

pd = rp.ComputePilotDescription()
pd.resource = "futuregrid.alamo"
pd.cores    = 16
pd.runtime  = 5  # minutes

p1 = pm.submit_pilots(pd) # create first pilot with 16 cores
p2 = pm.submit_pilots(pd) # create second pilot with 16 cores

# Create a workload of 128 '/bin/sleep' compute units
compute_units = []
for unit_count in range(0, 128):
    cu = rp.ComputeUnitDescription()
    cu.executable = "/bin/sleep"
    cu.arguments = ['60']
    compute_units.append(cu)

# Combine the two pilots, the workload and a scheduler via
# a UnitManager.
um = rp.UnitManager(session=s,
                    scheduler=rp.SCHEDULER_ROUND_ROBIN)
um.add_pilots(p1)
um.add_pilots(p2)
um.submit_units(compute_units)

The unit manager can issue notifications on unit state changes. Whenever a state notification arrives, any callback registered for that notification is fired.

NOTE: State notifications can arrive out of order with respect to the unit state model!

__init__(session, scheduler=None)[source]¶

Creates a new UnitManager and attaches it to the session.

Arguments:

session [radical.pilot.Session]: The session instance to use.
scheduler (string): The name of the scheduler plug-in to use.

Returns:

A new UnitManager object [radical.pilot.UnitManager].

close()[source]¶: Shut down the UnitManager, and all umgr components.

is_valid(term=True)[source]¶: As with the Process’ is_valid() call, this checks that the component is still viable and raises an exception if it is not. In addition to the health of the component’s child process, it also checks the health of any sub-components and communication bridges.

as_dict()[source]¶: Returns a dictionary representation of the UnitManager object.

uid¶: Returns the unique id.

scheduler¶: Returns the scheduler name.

add_pilots(pilots)[source]¶

Associates one or more pilots with the unit manager.

Arguments:

pilots [radical.pilot.ComputePilot or list of radical.pilot.ComputePilot]: The pilot objects that will be added to the unit manager.

list_pilots()[source]¶

Lists the UIDs of the pilots currently associated with the unit manager.

Returns:

A list of radical.pilot.ComputePilot UIDs [string].

get_pilots()[source]¶

Gets the pilot instances currently associated with the unit manager.

Returns:

A list of radical.pilot.ComputePilot instances.

remove_pilots(pilot_ids, drain=False)[source]¶

Disassociates one or more pilots from the unit manager.

After a pilot has been removed from a unit manager, it won’t process any of the unit manager’s units anymore. Calling remove_pilots doesn’t stop the pilot itself.

Arguments:

drain [boolean]: Drain determines what happens to the units which are managed by the removed pilot(s). If True, all units currently assigned to the pilot are allowed to finish execution. If False (the default), then non-final units will be canceled.

list_units()[source]¶

Returns the UIDs of the radical.pilot.ComputeUnit managed by this unit manager.

Returns:

A list of radical.pilot.ComputeUnit UIDs [string].

submit_units(descriptions)[source]¶

Submits one or more radical.pilot.ComputeUnit instances to the unit manager.

Arguments:

descriptions [radical.pilot.ComputeUnitDescription or list of radical.pilot.ComputeUnitDescription]: The description of the compute unit instance(s) to create.

Returns:

A list of radical.pilot.ComputeUnit objects.

get_units(uids=None)[source]¶

Returns one or more compute units identified by their IDs.

Arguments:

uids [string or list of strings]: The IDs of the compute unit objects to return.

Returns:

A list of radical.pilot.ComputeUnit objects.

wait_units(uids=None, state=None, timeout=None)[source]¶

Returns when one or more radical.pilot.ComputeUnits reach a specific state.

If uids is None, wait_units returns when all ComputeUnits reach the state defined in state. This may include units which have previously terminated or have already been waited upon.

Example:

# TODO -- add example

Arguments:

uids [string or list of strings] If uids is set, only the ComputeUnits with the specified uids are considered. If uids is None (default), all ComputeUnits are considered.

state [string] The state that ComputeUnits have to reach in order for the call to return.

By default wait_units waits for the ComputeUnits to reach a terminal state, which can be one of the following:

radical.pilot.states.DONE

radical.pilot.states.FAILED

radical.pilot.states.CANCELED

timeout [float] Timeout in seconds before the call returns regardless of unit state changes. The default value None waits forever.

cancel_units(uids=None)[source]¶

Cancel one or more radical.pilot.ComputeUnits.

Note that cancellation of units is immediate, i.e. their state is immediately set to CANCELED, even if some RP component may still operate on the units. Specifically, other state transitions, including other final states (DONE, FAILED), can occur after cancellation. This is a side effect of an optimization; we consider it an acceptable tradeoff in the sense of “Oh, that unit was DONE at point of cancellation – ok, we can use the results, sure!”.

If that behavior is not wanted, set the environment variable:

export RADICAL_PILOT_STRICT_CANCEL=True

Arguments:

uids [string or list of strings]: The IDs of the compute unit objects to cancel.

register_callback(cb, metric='UNIT_STATE', cb_data=None)[source]¶

Registers a new callback function with the UnitManager. Manager-level callbacks get called if the specified metric changes. The default metric UNIT_STATE fires the callback if any of the ComputeUnits managed by the UnitManager change their state.

All callback functions need to have the same signature:

def cb(obj, value, cb_data)

where obj is a handle to the object that triggered the callback, value is the new value of the metric, and cb_data is the data provided on callback registration. In the example of UNIT_STATE above, obj would be the unit in question, and value would be the new state of the unit.

Available metrics are:

UNIT_STATE: fires when the state of any of the units managed by this unit manager instance changes. It communicates the unit object instance and the unit’s new state.

WAIT_QUEUE_SIZE: fires when the number of unscheduled units (i.e. of units which have not been assigned to a pilot for execution) changes.

6.3.2. ComputeUnitDescription¶

class radical.pilot.ComputeUnitDescription(from_dict=None)[source]¶

A ComputeUnitDescription object describes the requirements and properties of a radical.pilot.ComputeUnit and is passed as a parameter to radical.pilot.UnitManager.submit_units() to instantiate and run a new unit.

Note

A ComputeUnitDescription MUST define at least an executable or kernel – all other elements are optional.

Example:

# TODO

executable¶

The executable to launch (string). The executable is expected to be either available via $PATH on the target resource, or to be an absolute path.

default: None

cpu_processes¶
number of application processes to start on CPU cores
default: 0

cpu_threads¶
number of threads each process will start on CPU cores
default: 1

cpu_process_type¶
process type, determines startup method (POSIX, MPI)
default: POSIX

cpu_thread_type¶
thread type, influences startup and environment (POSIX, OpenMP)
default: POSIX

gpu_processes¶
number of application processes to start on GPU cores
default: 0

gpu_threads¶
number of threads each process will start on GPU cores
default: 1

gpu_process_type¶
process type, determines startup method (POSIX, MPI)
default: POSIX

gpu_thread_type¶
thread type, influences startup and environment (POSIX, OpenMP, CUDA)
default: POSIX

lfs(local file storage)¶
amount of data (MB) required on the local file system of the node
default: 0
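
As a rough illustration of how cpu_processes and cpu_threads combine, the sketch below estimates the number of CPU cores a unit would occupy. It assumes one core per thread, which this reference does not state explicitly; the function name is hypothetical.

```python
def cpu_cores_requested(cpu_processes, cpu_threads):
    """Estimate of CPU cores a unit occupies, assuming one core per thread."""
    return cpu_processes * cpu_threads

# A hypothetical unit with 4 processes of 2 threads each.
print(cpu_cores_requested(4, 2))
```

Under that assumption, a unit whose estimate exceeds the cores value of the pilot's ComputePilotDescription would not fit on that pilot.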

name¶

A descriptive name for the compute unit (string). This attribute can be used to map individual units back to application level workloads.

default: None

arguments¶

The command line arguments for the given executable (list of strings).

default: []

environment¶

Environment variables to set in the environment before execution (dict).

default: {}

stdout¶

The name of the file to store stdout in (string).

default: STDOUT

stderr¶

The name of the file to store stderr in (string).

default: STDERR

input_staging¶

The files that need to be staged before execution (list of staging directives, see below).

default: {}

output_staging¶

The files that need to be staged after execution (list of staging directives, see below).

default: {}

pre_exec¶

Actions (shell commands) to perform before this task starts (list of strings). Note that the set of shell commands given here are expected to load environments, check for work directories and data, etc. They are not expected to consume any significant amount of CPU time or other resources! Deviating from that rule will likely result in reduced overall throughput.

No assumption should be made as to where these commands are executed (although RP attempts to perform them in the unit’s execution environment).

No assumption should be made on the specific shell environment the commands are executed in.

Errors in executing these commands will cause the unit to enter the FAILED state, and no execution of the actual workload will be attempted.

default: []

post_exec¶

Actions (shell commands) to perform after this task finishes (list of strings). The same remarks as on pre_exec apply, including the point on error handling, which again will cause the unit to fail, even if the actual execution was successful.

default: []

kernel¶

Name of a simulation kernel which expands to description attributes once the unit is scheduled to a pilot (and resource).

Note

TODO: explain in detail, reference ENMDTK.

default: None

restartable¶

If the unit starts to execute on a pilot, but cannot finish because the pilot fails or is canceled, can the unit be restarted on a different pilot / resource?

default: False

metadata¶

user defined metadata

default: None

cleanup¶

If cleanup (a bool) is set to True, the pilot will delete the entire unit sandbox upon termination. This includes all generated output data in that sandbox. Output staging will be performed before cleanup.

Note that unit sandboxes are also deleted if the pilot’s own cleanup flag is set.

default: False

pilot¶: If specified as string (pilot uid), the unit is submitted to the pilot with the given ID. If that pilot is not known to the unit manager, an exception is raised.

The Staging Directives are specified using a dict in the following form:

staging_directive = {
    'source'  : None, # see 'Location' below
    'target'  : None, # see 'Location' below
    'action'  : None, # see 'Action operators' below
    'flags'   : None, # see 'Flags' below
    'priority': 0     # control ordering of actions (unused)
}

source and target locations can be given as strings or ru.URL instances. Strings containing :// are converted into URLs immediately. Otherwise they are considered absolute or relative paths and are then interpreted in the context of the client’s working directory.

RP accepts the following special URL schemas:

client:// : relative to the client’s working directory

resource://: relative to the RP sandbox on the target resource

pilot:// : relative to the pilot sandbox on the target resource

unit:// : relative to the unit sandbox on the target resource

In all these cases, the hostname element of the URL is expected to be empty, and the path is always considered relative to the locations specified above (even though URLs usually don’t have a notion of relative paths).
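
The location rules above can be illustrated with the standard library's URL parser. This is a sketch of the described interpretation, not RP's own parsing code; describe_location is a hypothetical helper and the example paths and host name are made up.

```python
from urllib.parse import urlparse

# The special schemes listed above.
RP_SCHEMES = {"client", "resource", "pilot", "unit"}

def describe_location(location):
    """Hypothetical helper: classify a staging source/target string."""
    if "://" not in location:
        # No URL marker: an absolute path, or a path relative to the
        # client's working directory.
        return ("path", location)
    url = urlparse(location)
    if url.scheme in RP_SCHEMES:
        # The host part is expected to be empty; the path is taken as
        # relative to the directory the scheme stands for.
        return (url.scheme, url.path.lstrip("/"))
    return ("url", location)

print(describe_location("input.dat"))
print(describe_location("pilot:///data/input.dat"))
print(describe_location("sftp://host.example.org/tmp/input.dat"))
```

Note the three slashes in pilot:///data/input.dat: the empty segment between the second and third slash is the (empty) hostname.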

RP accepts the following action operators:

rp.TRANSFER: remote file transfer from source URL to target URL.

rp.COPY : local file copy, i.e. not crossing host boundaries

rp.MOVE : local file move

rp.LINK : local file symlink

RP accepts the following flags:

rp.CREATE_PARENTS: create the directory hierarchy for targets on the fly

rp.RECURSIVE : if source is a directory, handle it recursively
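
A list of staging directives in the dict form described above could be assembled as in the following sketch. The helper make_directive is hypothetical, the file names are made up, and the action values are placeholder strings: real code would use the rp.TRANSFER, rp.COPY, rp.MOVE and rp.LINK constants from radical.pilot.

```python
# Placeholder action names; real code would use the rp.TRANSFER, rp.COPY,
# rp.MOVE and rp.LINK constants from radical.pilot instead of these strings.
TRANSFER, COPY, MOVE, LINK = "TRANSFER", "COPY", "MOVE", "LINK"

def make_directive(source, target, action, flags=None, priority=0):
    """Hypothetical helper that builds a dict in the documented form."""
    return {
        "source":   source,
        "target":   target,
        "action":   action,
        "flags":    flags,
        "priority": priority,
    }

# Transfer a client-side file into the unit sandbox, then link a file that
# already sits in the pilot sandbox.
input_staging = [
    make_directive("client:///input.dat", "unit:///input.dat", TRANSFER),
    make_directive("pilot:///shared/ref.db", "unit:///ref.db", LINK),
]
```

Such a list is what the input_staging and output_staging attributes of a ComputeUnitDescription are documented to hold.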

verify()[source]¶: Verify that the description is syntactically and semantically correct. This method encapsulates checks beyond the SAGA attribute level checks.

6.3.3. ComputeUnit¶

class radical.pilot.ComputeUnit(umgr, descr)[source]¶

A ComputeUnit represents a ‘task’ that is executed on a ComputePilot. A ComputeUnit lets you control and query the state of this task.

Note

A unit cannot be created directly. The factory method radical.pilot.UnitManager.submit_units() has to be used instead.

Example:

umgr = radical.pilot.UnitManager(session=s)

ud = radical.pilot.ComputeUnitDescription()
ud.executable = "/bin/date"

unit = umgr.submit_units(ud)

as_dict()[source]¶: Returns a Python dictionary representation of the object.

session¶

Returns the unit’s session.

Returns:

A Session.

umgr¶

Returns the unit’s manager.

Returns:

A UnitManager.

uid¶

Returns the unit’s unique identifier.

The uid identifies the unit within a UnitManager.

Returns:

A unique identifier (string).

name¶

Returns the unit’s application specified name.

Returns:

A name (string).

state¶

Returns the current state of the unit.

Returns:

state (string enum)

exit_code¶

Returns the exit code of the unit, if that is already known, or ‘None’ otherwise.

Returns:

exit code (int)

stdout¶

Returns a snapshot of the executable’s STDOUT stream.

If this property is queried before the unit has reached ‘DONE’ or ‘FAILED’ state it will return None.

Returns:

stdout (string)

stderr¶

Returns a snapshot of the executable’s STDERR stream.

If this property is queried before the unit has reached ‘DONE’ or ‘FAILED’ state it will return None.

Returns:

stderr (string)

pilot¶

Returns the pilot ID of this unit, if that is already known, or ‘None’ otherwise.

Returns:

A pilot ID (string)

unit_sandbox¶

Returns the full sandbox URL of this unit, if that is already known, or ‘None’ otherwise.

Returns:

A URL (radical.utils.Url).

description¶

Returns the description the unit was started with, as a dictionary.

Returns:

description (dict)

metadata¶: Returns the metadata field of the unit’s description

register_callback(cb, cb_data=None)[source]¶

Registers a callback function that is triggered every time the unit’s state changes.

All callback functions need to have the same signature:

def cb(obj, state)

where obj is a handle to the object that triggered the callback and state is the new state of that object. If ‘cb_data’ is given, then the ‘cb’ signature changes to

def cb(obj, state, cb_data)

and ‘cb_data’ is passed along.

wait(state=None, timeout=None)[source]¶

Returns when the unit reaches a specific state or when an optional timeout is reached.

Arguments:

state [list of strings] The state(s) that the unit has to reach in order for the call to return.

By default wait waits for the unit to reach a final state, which can be one of the following:

radical.pilot.states.DONE

radical.pilot.states.FAILED

radical.pilot.states.CANCELED

timeout [float] Optional timeout in seconds before the call returns regardless of whether the unit has reached the desired state or not. The default value None never times out.

cancel()[source]¶: Cancel the unit.

6.4. Exceptions¶

class radical.pilot.PilotException(msg, obj=None)[source]¶

Parameters:

msg (string): Error message indicating the cause for the exception being raised.
obj (object): RADICAL-Pilot object on whose activity the exception was raised.

The base class for all RADICAL-Pilot Exception classes – this exception type is never raised directly, but can be used to catch all RADICAL-Pilot exceptions within a single except clause.

The exception message and originating object are also accessible as attributes (e.object and e.message). The __str__() operator redirects to get_message().

get_object()[source]¶: Return the object instance on whose activity the exception was raised.

get_message()[source]¶: Return the error message associated with the exception.

class radical.pilot.DatabaseError(msg, obj=None)[source]¶: TODO: Document me!
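
Catching all RADICAL-Pilot exceptions through the base class can be sketched as follows. The two classes below are local stand-ins written for this example, not the real radical.pilot classes; they assume that DatabaseError derives from PilotException, as the description of the base class suggests, and connect() is a hypothetical failing operation.

```python
class PilotException(Exception):
    """Stand-in for radical.pilot.PilotException (not the real class)."""

    def __init__(self, msg, obj=None):
        super().__init__(msg)
        self._msg = msg
        self._obj = obj

    def get_message(self):
        return self._msg

    def get_object(self):
        return self._obj

    def __str__(self):
        # Mirrors the documented behavior: __str__ redirects to get_message().
        return self.get_message()


class DatabaseError(PilotException):
    """Stand-in for radical.pilot.DatabaseError."""


def connect():
    # Hypothetical operation that fails with a specific RP-style exception.
    raise DatabaseError("cannot reach MongoDB", obj="session.0000")


try:
    connect()
except PilotException as e:  # the base class catches every subclass
    caught = (type(e).__name__, e.get_message(), e.get_object())

print(caught)
```

With the real library, the same except clause would be written against radical.pilot.PilotException.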
