.. _chapter_user_guide_00:

***************
Getting Started
***************

In this section we will walk you through the basics of using RP. After you have
worked through this chapter, you will understand how to launch a local
``ComputePilot`` and use a ``UnitManager`` to schedule and run ``ComputeUnits``
(tasks) on it.

.. note:: The reader is assumed to be familiar with the general RP concepts
          described in :ref:`chapter_overview`.

.. note:: This chapter assumes that you have successfully installed
          RADICAL-Pilot, and also configured access to the resources you
          intend to use for the examples (see chapter
          :ref:`chapter_installation`).

.. note:: We colloquially refer to ``ComputePilot`` as `pilot`, and to
          ``ComputeUnit`` as `unit`.

You can download the basic
:download:`00_getting_started.py <../../../examples/00_getting_started.py>`.
The text below explains the most important code sections, and at the end shows
the expected output from the execution of the example. Please look carefully at
the code comments, as they explain some aspects of the code which are not
explicitly covered in the text below.

Loading the RP Module, Follow the Application Execution
-------------------------------------------------------

In order to use RADICAL-Pilot, you need to import the ``radical.pilot`` module
(we use the `rp` abbreviation for the module name) in your Python script or
application:

.. code-block:: python

    import radical.pilot as rp

All example scripts used in this user guide use the ``LogReporter`` facility
(of RADICAL-Utils) to print runtime and progress information. You can control
that output with the ``RADICAL_PILOT_VERBOSE`` environment variable, which can
be set to the normal Python logging levels, or to the value ``REPORT`` to
obtain well-formatted output. We assume the ``REPORT`` setting to be used when
referencing any output in this chapter.

.. code-block:: python

    import os

    # select the reporter output before the RP modules are imported
    os.environ['RADICAL_PILOT_VERBOSE'] = 'REPORT'

    import radical.pilot as rp
    import radical.utils as ru

    report = ru.LogReporter(name='radical.pilot')
    report.title('Getting Started (RP version %s)' % rp.version)

Creating a Session
------------------

A :class:`radical.pilot.Session` is the root object for all other objects in
RADICAL-Pilot. :class:`radical.pilot.PilotManager` and
:class:`radical.pilot.UnitManager` instances are always attached to a Session,
and their lifetime is controlled by the session.

A Session also encapsulates the connection(s) to a backend MongoDB server,
which facilitates the communication between the RP application and the remote
pilot jobs. More information about how RADICAL-Pilot uses MongoDB can be found
in the :ref:`chapter_overview` section.

To create a new Session, the only thing you need to provide is the URL of a
MongoDB server. If no MongoDB URL is specified on session creation, RP attempts
to use the value specified via the ``RADICAL_PILOT_DBURL`` environment
variable.

.. code-block:: python

    os.environ['RADICAL_PILOT_DBURL'] = 'mongodb://db.host.net:27017/'

    session = rp.Session()

.. warning:: Always call :func:`radical.pilot.Session.close` before your
             application terminates. This will terminate all lingering pilots
             and clean out the database entries of the session.

Creating ComputePilots
----------------------

A :class:`radical.pilot.ComputePilot` is responsible for ``ComputeUnit``
execution. Pilots can be launched either locally or remotely, and they can
manage a single node or a large number of nodes on a cluster.

Pilots are created via a :class:`radical.pilot.PilotManager`, by passing a
:class:`radical.pilot.ComputePilotDescription`. The most important elements of
the ``ComputePilotDescription`` are

* `resource`: a label which specifies the target resource to run the pilot on,
  i.e. the location of the pilot;
* `cores`: the number of CPU cores the pilot is expected to manage, i.e.
  the size of the pilot;
* `runtime`: the number of minutes the pilot is expected to be active, i.e.
  the runtime of the pilot.

Depending on the specific target resource and use case, other properties need
to be specified. In our user guide examples, we use a separate `config.json`
file to store a number of properties per resource label, to simplify the
example code. The examples themselves then accept one or more resource labels,
and create the pilots on those resources:

.. code-block:: python

    # use the resource specified as argument, fall back to localhost
    try:
        resource = sys.argv[1]
    except IndexError:
        resource = 'local.localhost'

    # create a pilot manager in the session
    pmgr = rp.PilotManager(session=session)

    # define an [n]-core local pilot that runs for [x] minutes
    pdesc = rp.ComputePilotDescription({
            'resource'      : resource,
            'cores'         : 64,  # pilot size
            'runtime'       : 10,  # pilot runtime (min)
            'project'       : config[resource]['project'],
            'queue'         : config[resource]['queue'],
            'access_schema' : config[resource]['schema']
            })

    # submit the pilot for launching
    pilot = pmgr.submit_pilots(pdesc)

For a list of available resource labels, see :ref:`chapter_resources` (not all
of those resources are configured for the user guide examples). For further
details on the pilot description, please check the API documentation of
:class:`radical.pilot.ComputePilotDescription`.

.. warning:: Note that the submitted pilot agent **will not terminate** when
             your Python script finishes. Pilot agents terminate only after
             they have reached their ``runtime`` limit, are killed by the
             target system, or if you explicitly cancel them via
             :func:`radical.pilot.Pilot.cancel`,
             :func:`radical.pilot.PilotManager.cancel_pilots`, or
             :func:`radical.pilot.Session.close(terminate=True)`.

Submitting ComputeUnits
-----------------------

After you have launched a pilot, you can now generate
:class:`radical.pilot.ComputeUnit` objects for the pilot to execute.
You can think of a ``ComputeUnit`` as something very similar to an operating
system process that consists of an ``executable``, a list of ``arguments``, and
an ``environment`` along with some runtime requirements.

Analogous to pilots, a unit is described via a
:class:`radical.pilot.ComputeUnitDescription` object. The mandatory properties
that you need to define are:

* ``executable`` - the executable to launch
* ``cores`` - the number of cores required by the executable

Our basic example creates 128 units which each run `/bin/date`:

.. code-block:: python

    n    = 128   # number of units to run
    cuds = list()
    for i in range(0, n):
        # create a new CU description, and fill it.
        cud = rp.ComputeUnitDescription()
        cud.executable = '/bin/date'
        cuds.append(cud)

Units are executed by pilots. The :class:`radical.pilot.UnitManager` class is
responsible for routing those units from the application to the available
pilots. The ``UnitManager`` accepts ``ComputeUnitDescriptions`` as we created
above and assigns them, according to some scheduling algorithm, to the set of
available pilots for execution (pilots are made available to a ``UnitManager``
via the ``add_pilots`` call):

.. code-block:: python

    # create a unit manager, submit units, and wait for their completion
    umgr = rp.UnitManager(session=session)
    umgr.add_pilots(pilot)
    umgr.submit_units(cuds)
    umgr.wait_units()

Running the Example
-------------------

.. note:: Remember to set `RADICAL_PILOT_DBURL` in your environment (see
          chapter :ref:`chapter_installation`).

Running the example will result in an output similar to the one shown below:

.. image:: 00_getting_started.png

The runtime can vary significantly, and typically the first run on any resource
will be the longest. This is because the first time RP is used on a new
resource for a specific user, it will set up a Python virtualenv for the pilot
to use.
Subsequent runs may update that virtualenv, or may install additional
components as needed, but that should take less time than its creation. So
please allow for a couple of minutes on the first execution (depending on your
network connectivity, the connectivity of the target resource, and the location
of the MongoDB service).

What's Next?
------------

The next user guide section (:ref:`chapter_user_guide_01`) will describe how an
application can inspect completed units for more detailed information, such as
exit codes and stdout/stderr.
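
As a closing aside, the resource handling used for the pilot description in
this chapter can be tried out without launching a pilot. The following is a
minimal sketch, not the code of the shipped example: the helper names
``pick_resource`` and ``pilot_settings``, the ``example.cluster`` label, and
all values in the JSON text are assumptions made for illustration. Only the
per-label keys ``project``, ``queue`` and ``schema`` are taken from the example
code in this chapter.

```python
import json

# Hypothetical stand-in for config.json: one entry per resource label, with
# the three keys read by the pilot description. All values are placeholders.
CONFIG_TEXT = """
{
    "local.localhost": {"project": null, "queue": null, "schema": null},
    "example.cluster": {"project": "my-allocation",
                        "queue"  : "normal",
                        "schema" : "ssh"}
}
"""


def pick_resource(argv, default='local.localhost'):
    """Return the resource label passed as first argument, else the default."""
    return argv[1] if len(argv) > 1 else default


def pilot_settings(config, resource, cores=64, runtime=10):
    """Collect the values for one pilot description in a plain dict."""
    try:
        entry = config[resource]
    except KeyError:
        # fail early with a readable message for unknown labels
        raise ValueError('no config entry for resource %r' % resource)

    return {
        'resource'      : resource,
        'cores'         : cores,     # pilot size
        'runtime'       : runtime,   # pilot runtime (min)
        'project'       : entry['project'],
        'queue'         : entry['queue'],
        'access_schema' : entry['schema'],
    }


config   = json.loads(CONFIG_TEXT)
settings = pilot_settings(config, pick_resource(['script.py']))
print(settings['resource'], settings['cores'])
```

In a real script one would presumably pass ``sys.argv`` to ``pick_resource``
and hand the resulting dict to ``rp.ComputePilotDescription``, as in the pilot
example of this chapter. Looking up the label before the pilot is submitted
means that a mistyped resource name is reported right away.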