.. _chapter_user_guide_07: *********************** Sharing Unit Input Data *********************** RP aims to support the concurrent execution of many tasks, and for many workloads which fit that broad description, those tasks share (some or all) input data. We have seen :ref:`earlier ` that input staging can incur a significant runtime overhead -- but that can be significantly reduced by avoiding redundant staging operations. For this purpose, each RP `pilot` manages a spaces of shared data, and any data put into that space by the application can later be symlinked into the unit's workdir, for consumption: .. code-block:: python # stage shared data from `pwd` to the pilot's shared staging space pilot.stage_in({'source': 'file://%s/input.dat' % os.getcwd(), 'target': 'staging:///input.dat', 'action': rp.TRANSFER}) [...] for i in range(0, n): cud = rp.ComputeUnitDescription() cud.executable = '/usr/bin/wc' cud.arguments = ['-c', 'input.dat'] cud.input_staging = {'source': 'staging:///input.dat', 'target': 'input.dat', 'action': rp.LINK } The `rp.LINK` staging action requests a symlink to be created by RP, instead of the copy operation used on the default `rp.TRANSFER` action. The full example can be found here: :download:`07_shared_unit_data.py <../../../examples/07_shared_unit_data.py>`. .. note:: Unlike many other methods in RP, the `pilot.stage_in` option is *synchronous*, ie. it will only return once the transfer has been completed. That semantics may change in a future version of RP. Running the Example ------------------- The result of this example's execution is the very same as we saw in the :ref:`previous `, but it will now run significantly faster due to the removed staging redundancy (at least for non-local pilots): .. image:: 07_shared_unit_data.png What's Next? ------------ This completes the discussion on data staging -- the next sections will go into more details of the units execution: :ref:`environment setup `, :ref:`pre- and post- execution `, and :ref:`MPI applications `.