bpo-35813: Tests and docs for shared_memory #11816

Merged
merged 44 commits into from
Feb 24, 2019
Commits
720a8ea
Added tests for shared_memory submodule.
applio Feb 2, 2019
29a7f80
Added tests for ShareableList.
applio Feb 3, 2019
c56e29c
Fix bug in allocationn size during creation of empty ShareableList il…
applio Feb 3, 2019
c36de70
Initial set of docs for shared_memory module.
applio Feb 7, 2019
3c89c7c
Added docs for ShareableList, added doctree entry for shared_memory s…
applio Feb 8, 2019
5f4ba8f
Added examples to SharedMemoryManager docs, for ease of documentation…
applio Feb 9, 2019
f9aaa11
Wording tweaks to docs.
applio Feb 9, 2019
2377cfd
Fix test failures on Windows.
applio Feb 9, 2019
6bfa560
Added tests around SharedMemoryManager.
applio Feb 9, 2019
eaf7888
Documentation tweaks.
applio Feb 9, 2019
e166ed9
Fix inappropriate test on Windows.
applio Feb 9, 2019
0f18511
Further documentation tweaks.
applio Feb 11, 2019
a097dbb
Fix bare exception.
applio Feb 11, 2019
7c65017
Removed __copyright__.
applio Feb 11, 2019
da7731d
Fixed typo in doc, removed comment.
applio Feb 11, 2019
242a5e9
Merge remote-tracking branch 'upstream/master' into enh-tests-shmem
applio Feb 11, 2019
7bdfbbb
Updated SharedMemoryManager preliminary tests to reflect change of no…
applio Feb 11, 2019
eec4bb1
Added Sphinx doctest run controls.
applio Feb 11, 2019
1076567
CloseHandle should be in a finally block in case MapViewOfFile fails.
applio Feb 12, 2019
0be0531
Missed opportunity to use with statement.
applio Feb 12, 2019
1e5341e
Switch to self.addCleanup to spare long try/finally blocks and save o…
applio Feb 12, 2019
a5800a9
Simplify the posixshmem extension module.
nascheme Feb 13, 2019
34f1e9a
Added to doc around size parameter of SharedMemory.
applio Feb 16, 2019
9846290
Changed PosixSharedMemory.size to use os.fstat.
applio Feb 16, 2019
1f9bbf2
Change SharedMemory.buf to a read-only property as well as NamedShare…
applio Feb 17, 2019
69dd8a9
Marked as provisional per PEP411 in docstring.
applio Feb 17, 2019
8cf9ba3
Merge branch 'enh-tests-neilsimplify-shmem' into enh-tests-shmem
applio Feb 17, 2019
594140a
Changed SharedMemoryTracker to be private.
applio Feb 17, 2019
395709b
Removed registered Proxy Objects from SharedMemoryManager.
applio Feb 17, 2019
aa4a887
Removed shareable_wrap().
applio Feb 17, 2019
885592b
Removed shareable_wrap() and dangling references to it.
applio Feb 17, 2019
9001b76
Merge remote and local branches regarding elimination of
applio Feb 17, 2019
5848ec4
For consistency added __reduce__ to key classes.
applio Feb 17, 2019
6ff8eed
Fix for potential race condition on Windows for O_CREX.
applio Feb 18, 2019
06620e2
Remove unused imports.
applio Feb 18, 2019
868b83d
Update access to kernel32 on Windows per feedback from eryksun.
applio Feb 19, 2019
9d83b06
Moved kernel32 calls to _winapi.
applio Feb 20, 2019
715ded9
Removed ShareableList.copy as redundant.
applio Feb 20, 2019
6878533
Changes to _winapi use from eryksun feedback.
applio Feb 20, 2019
0d3d06f
Adopt simpler SharedMemory API, collapsing PosixSharedMemory and Wind…
applio Feb 21, 2019
05e26dd
Fix missing docstring on class, add test for ignoring size when attac…
applio Feb 21, 2019
7a3c7e5
Moved SharedMemoryManager to managers module, tweak to fragile test.
applio Feb 21, 2019
caf0a5d
Tweak to exception in OpenFileMapping suggested by eryksun.
applio Feb 21, 2019
12c097d
Mark a few dangling bits as private as suggested by Giampaolo.
applio Feb 22, 2019
1 change: 1 addition & 0 deletions Doc/library/concurrency.rst
@@ -15,6 +15,7 @@ multitasking). Here's an overview:

threading.rst
multiprocessing.rst
multiprocessing.shared_memory.rst
concurrent.rst
concurrent.futures.rst
subprocess.rst
343 changes: 343 additions & 0 deletions Doc/library/multiprocessing.shared_memory.rst
@@ -0,0 +1,343 @@
:mod:`multiprocessing.shared_memory` --- Provides shared memory for direct access across processes
===================================================================================================

.. module:: multiprocessing.shared_memory
:synopsis: Provides shared memory for direct access across processes.

**Source code:** :source:`Lib/multiprocessing/shared_memory.py`

.. versionadded:: 3.8

.. index::
single: Shared Memory
single: POSIX Shared Memory
single: Named Shared Memory

--------------

This module provides a class, :class:`SharedMemory`, for the allocation
and management of shared memory to be accessed by one or more processes
on a multicore or symmetric multiprocessor (SMP) machine. To assist with
Contributor

Is the double whitespace in "machine.  To assist" there to make it more readable?

Member Author

In the resulting html, whether one or two spaces are in the source only one space is presented in the browser. The convention of using two spaces predates patterns of using only one space after a period. As far as I know, both are "correct". Thankfully the output is the same either way.

Member Author

I did not go looking for this but I accidentally stumbled across it in the dev guide (see devguide.python.org for lots more such goodies):

A sentence-ending period may be followed by one or two spaces;
while reST ignores the second space, it is customarily put in
by some users, for example to aid Emacs’ auto-fill mode.

Contributor

Oh great, thanks for the clarification. I had the idea that double spaces were used only in docstrings.

the life-cycle management of shared memory especially across distinct
processes, a :class:`~multiprocessing.managers.BaseManager` subclass,
:class:`SharedMemoryManager`, is also provided in the
``multiprocessing.managers`` module.

In this module, shared memory refers to "System V style" shared memory blocks
(though it is not necessarily implemented explicitly as such) and does not refer
to "distributed shared memory". This style of shared memory permits distinct
processes to potentially read and write to a common (or shared) region of
volatile memory. Processes are conventionally limited to accessing only
their own process memory space, but shared memory permits the sharing
of data between processes, avoiding the need to send messages containing
that data between processes. Sharing data directly via memory can provide
significant performance benefits compared to sharing data via disk or socket
or other communications requiring the serialization/deserialization and
copying of data.
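
To make the contrast in the paragraph above concrete, here is a minimal sketch that compares a pickle round trip (which yields an independent copy) with two handles on one shared memory block. For simplicity both handles live in a single process; the variable names are illustrative only and not part of the module.

```python
import pickle
from multiprocessing import shared_memory

payload = bytearray(b"abcdefgh")

# Message-passing style: serialize, then deserialize into an independent copy.
received = pickle.loads(pickle.dumps(payload))
received[0] = ord("X")
copy_is_independent = payload[0] == ord("a")  # the original is untouched

# Shared-memory style: both handles map the same bytes; nothing is serialized.
writer = shared_memory.SharedMemory(create=True, size=len(payload))
try:
    reader = shared_memory.SharedMemory(name=writer.name)
    try:
        writer.buf[:len(payload)] = payload
        writer.buf[0] = ord("X")
        seen_by_reader = bytes(reader.buf[:len(payload)])
    finally:
        reader.close()
finally:
    writer.close()
    writer.unlink()
```

The write made through `writer` is visible through `reader` without any copying step, whereas the change to the unpickled object never reaches `payload`.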


.. class:: SharedMemory(name=None, create=False, size=0)

Creates a new shared memory block or attaches to an existing shared
memory block. Each shared memory block is assigned a unique name.
In this way, one process can create a shared memory block with a
particular name and a different process can attach to that same shared
memory block using that same name.

As a resource for sharing data across processes, shared memory blocks
may outlive the original process that created them. When one process
no longer needs access to a shared memory block that might still be
needed by other processes, the :meth:`close()` method should be called.
When a shared memory block is no longer needed by any process, the
:meth:`unlink()` method should be called to ensure proper cleanup.

*name* is the unique name for the requested shared memory, specified as
a string. When creating a new shared memory block, if ``None`` (the
default) is supplied for the name, a novel name will be generated.

*create* controls whether a new shared memory block is created (``True``)
or an existing shared memory block is attached (``False``).

*size* specifies the requested number of bytes when creating a new shared
memory block. Because some platforms choose to allocate chunks of memory
based upon that platform's memory page size, the exact size of the shared
memory block may be larger than or equal to the size requested. When attaching
to an existing shared memory block, the ``size`` parameter is ignored.
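
As a rough illustration of the *size* behavior described above, the sketch below requests 10 bytes and then inspects what was actually allocated. It assumes nothing beyond the documented API; the exact numbers vary by platform, so only the lower bound is relied upon.

```python
from multiprocessing import shared_memory

requested = 10
shm = shared_memory.SharedMemory(create=True, size=requested)
try:
    # May exceed `requested` on platforms that round up to the page size.
    actual = shm.size
    buffer_length = len(shm.buf)

    # When attaching, size is left at its default; the size of the
    # existing block is discovered instead.
    attached = shared_memory.SharedMemory(name=shm.name)
    try:
        attached_size = attached.size
    finally:
        attached.close()
finally:
    shm.close()
    shm.unlink()
```

Code that needs exactly `requested` bytes should slice `shm.buf[:requested]` rather than assume the buffer ends there.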

.. method:: close()

Closes access to the shared memory from this instance. In order to
ensure proper cleanup of resources, all instances should call
``close()`` once the instance is no longer needed. Note that calling
``close()`` does not cause the shared memory block itself to be
destroyed.

.. method:: unlink()

Requests that the underlying shared memory block be destroyed. In
order to ensure proper cleanup of resources, ``unlink()`` should be
called once (and only once) across all processes which have need
for the shared memory block. After requesting its destruction, a
shared memory block may or may not be immediately destroyed and
this behavior may differ across platforms. Attempts to access data
inside the shared memory block after ``unlink()`` has been called may
result in memory access errors. Note: the last process relinquishing
its hold on a shared memory block may call ``unlink()`` and
:meth:`close()` in either order.
Contributor

Mmm... what's the point of having both close and unlink? Why not simply having close() also destroy the memory blocks?

Member Author

After all instances have called close(), there are still reasons to potentially want to have a shared memory block persist. An existing technique that some services use to preserve data even when the service dies unexpectedly is to store critical data in a shared memory block with an established name -- when the service restarts, it looks for the shared memory to still be there by name, thereby reducing the startup time of the service. Other use cases likely helped motivate the mentality behind the Windows implementation of shared memory which always preserves the shared memory block until the last process with a handle on it terminates -- Windows offers no way to perform an unlink() action ahead of time. Closest thing you can do on Windows is truncate its size to near zero but even that might not trigger a partial release of memory space. Across platforms, the common mentality and conscious design is to preserve shared memory until its release is explicitly demanded.
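
The distinction discussed in this thread can be sketched as follows. This assumes a POSIX platform, where a block persists by name after every handle has been closed; on Windows the block goes away once the last handle is released, so the re-attach step would not work there. The "restarted service" framing is only an illustration.

```python
from multiprocessing import shared_memory

creator = shared_memory.SharedMemory(create=True, size=16)
block_name = creator.name
creator.buf[:5] = b"state"
creator.close()  # this handle is done, but the block itself remains (POSIX)

# Later, for example after a service restart: find the block again by name.
revived = shared_memory.SharedMemory(name=block_name)
try:
    recovered = bytes(revived.buf[:5])
finally:
    revived.close()
    revived.unlink()  # explicit request to destroy the block

# After unlink(), the name no longer refers to a block.
try:
    shared_memory.SharedMemory(name=block_name)
    gone = False
except FileNotFoundError:
    gone = True
```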


.. attribute:: buf

A memoryview of the contents of the shared memory block.

.. attribute:: name

Read-only access to the unique name of the shared memory block.

.. attribute:: size

Read-only access to the size in bytes of the shared memory block.
Contributor

@giampaolo giampaolo Feb 11, 2019

Is this the size previously passed as an argument, or the actual size occupied by all memory blocks (i.e., the whole object's size)?

Member Author

It is the requested size in bytes of the current (only one) shared memory block. It is possible to attach to a larger shared memory block but only request access to the first n bytes. One use case for this would be to avoid any potential for processes to access rapidly changing parts of the shared memory block (since two threads or processes reading and writing to the same location in memory is dangerous...)

The actual size of the whole shared memory block is obtainable. When requesting to attach to an existing shared memory segment, supply size=0 and the actual size of the existing shared memory block is discovered and made available. This is probably the dominant use case when attaching to existing shared memory blocks.



The following example demonstrates low-level use of :class:`SharedMemory`
instances::

>>> from multiprocessing import shared_memory
>>> shm_a = shared_memory.SharedMemory(create=True, size=10)
>>> type(shm_a.buf)
<class 'memoryview'>
>>> buffer = shm_a.buf
>>> len(buffer)
10
>>> buffer[:4] = bytearray([22, 33, 44, 55]) # Modify multiple at once
>>> buffer[4] = 100 # Modify single byte at a time
Contributor

I don't know if the PEP 8 check will fail here, because it says you need to have two spaces.

Member Author

I believe this physical alignment of two related comments helps readability significantly. To not do so would make the example much less readable. Pep8 encourages considering this in particular.

>>> # Attach to an existing shared memory block
>>> shm_b = shared_memory.SharedMemory(shm_a.name)
>>> import array
>>> array.array('b', shm_b.buf[:5]) # Copy the data into a new array.array
array('b', [22, 33, 44, 55, 100])
>>> shm_b.buf[:5] = b'howdy' # Modify via shm_b using bytes
>>> bytes(shm_a.buf[:5]) # Access via shm_a
Contributor

same here

Member Author

Same comment as on the prior; I believe this significantly helps readability in a situation where such help is needed.

b'howdy'
>>> shm_b.close() # Close each SharedMemory instance
>>> shm_a.close()
>>> shm_a.unlink() # Call unlink only once to release the shared memory



The following example demonstrates a practical use of the :class:`SharedMemory`
class with `NumPy arrays <https://www.numpy.org/>`_, accessing the
Contributor

I don't know if it is necessary to show a sample with NumPy; it's just an opinion.

Member Author

The use of NumPy arrays with shared memory is anticipated to be one of the most popular use cases for working with shared memory. Demonstrating this combination is likely very important to a large number of people.

same ``numpy.ndarray`` from two distinct Python shells:

.. doctest::
:options: +SKIP

>>> # In the first Python interactive shell
>>> import numpy as np
>>> a = np.array([1, 1, 2, 3, 5, 8]) # Start with an existing NumPy array
>>> from multiprocessing import shared_memory
>>> shm = shared_memory.SharedMemory(create=True, size=a.nbytes)
>>> # Now create a NumPy array backed by shared memory
>>> b = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
>>> b[:] = a[:] # Copy the original data into shared memory
>>> b
array([1, 1, 2, 3, 5, 8])
>>> type(b)
<class 'numpy.ndarray'>
>>> type(a)
<class 'numpy.ndarray'>
>>> shm.name # We did not specify a name so one was chosen for us
'psm_21467_46075'

>>> # In either the same shell or a new Python shell on the same machine
>>> import numpy as np
>>> from multiprocessing import shared_memory
>>> # Attach to the existing shared memory block
>>> existing_shm = shared_memory.SharedMemory(name='psm_21467_46075')
>>> # Note that a.shape is (6,) and a.dtype is np.int64 in this example
>>> c = np.ndarray((6,), dtype=np.int64, buffer=existing_shm.buf)
>>> c
array([1, 1, 2, 3, 5, 8])
>>> c[-1] = 888
>>> c
array([ 1, 1, 2, 3, 5, 888])

>>> # Back in the first Python interactive shell, b reflects this change
>>> b
array([ 1, 1, 2, 3, 5, 888])

>>> # Clean up from within the second Python shell
>>> del c # Unnecessary; merely emphasizing the array is no longer used
>>> existing_shm.close()

>>> # Clean up from within the first Python shell
>>> del b # Unnecessary; merely emphasizing the array is no longer used
>>> shm.close()
>>> shm.unlink() # Free and release the shared memory block at the very end
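
For readers without NumPy, a rough standard-library counterpart is to reinterpret the block's bytes with `memoryview.cast`. This is only a sketch under the assumption that `'q'` (a signed 8-byte integer) is an acceptable element type; it is not part of the documentation under review. Each cast view is released before `close()` is called, since a view that is still alive would keep the underlying buffer exported.

```python
from multiprocessing import shared_memory

values = [1, 1, 2, 3, 5, 8]
ITEM = "q"        # struct code for a signed 8-byte integer
ITEM_SIZE = 8

shm = shared_memory.SharedMemory(create=True, size=len(values) * ITEM_SIZE)
try:
    # Reinterpret the raw bytes as 8-byte integers and fill them in.
    with shm.buf.cast(ITEM) as ints:
        for i, v in enumerate(values):
            ints[i] = v

    # A second handle, attached by name, modifies the last stored value.
    other = shared_memory.SharedMemory(name=shm.name)
    try:
        with other.buf.cast(ITEM) as ints:
            ints[len(values) - 1] = 888
    finally:
        other.close()

    # The first handle observes the change. The block may be larger than
    # requested, so only the first len(values) items are read back.
    with shm.buf.cast(ITEM) as ints:
        result = ints.tolist()[:len(values)]
finally:
    shm.close()
    shm.unlink()
```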


.. class:: SharedMemoryManager([address[, authkey]])

A subclass of :class:`~multiprocessing.managers.BaseManager` which can be
used for the management of shared memory blocks across processes.

A call to :meth:`~multiprocessing.managers.BaseManager.start` on a
:class:`SharedMemoryManager` instance causes a new process to be started.
This new process's sole purpose is to manage the life cycle
Contributor

process' ?

Member Author

After double-checking with a technical editor just now, I have been assured that either is correct.

Contributor

Great!

of all shared memory blocks created through it. To trigger the release
of all shared memory blocks managed by that process, call
:meth:`~multiprocessing.managers.BaseManager.shutdown()` on the instance.
This triggers a :meth:`SharedMemory.unlink()` call on all of the
:class:`SharedMemory` objects managed by that process and then
stops the process itself. By creating ``SharedMemory`` instances
through a ``SharedMemoryManager``, we avoid the need to manually track
and trigger the freeing of shared memory resources.

This class provides methods for creating and returning :class:`SharedMemory`
instances and for creating a list-like object (:class:`ShareableList`)
backed by shared memory.

Refer to :class:`multiprocessing.managers.BaseManager` for a description
of the inherited *address* and *authkey* optional input arguments and how
they may be used to connect to an existing ``SharedMemoryManager`` service
from other processes.

.. method:: SharedMemory(size)

Create and return a new :class:`SharedMemory` object with the
specified ``size`` in bytes.

.. method:: ShareableList(sequence)

Create and return a new :class:`ShareableList` object, initialized
by the values from the input ``sequence``.


The following example demonstrates the basic mechanisms of a
:class:`SharedMemoryManager`:

.. doctest::
:options: +SKIP

>>> from multiprocessing.managers import SharedMemoryManager
>>> smm = SharedMemoryManager()
>>> smm.start() # Start the process that manages the shared memory blocks
>>> sl = smm.ShareableList(range(4))
>>> sl
ShareableList([0, 1, 2, 3], name='psm_6572_7512')
>>> raw_shm = smm.SharedMemory(size=128)
>>> another_sl = smm.ShareableList('alpha')
>>> another_sl
ShareableList(['a', 'l', 'p', 'h', 'a'], name='psm_6572_12221')
>>> smm.shutdown() # Calls unlink() on sl, raw_shm, and another_sl

The following example depicts a potentially more convenient pattern for using
:class:`SharedMemoryManager` objects via the :keyword:`with` statement to
ensure that all shared memory blocks are released after they are no longer
needed:

.. doctest::
:options: +SKIP

>>> from multiprocessing import Process
>>> from multiprocessing.managers import SharedMemoryManager
>>> with SharedMemoryManager() as smm:
...     sl = smm.ShareableList(range(2000))
...     # Divide the work among two processes, storing partial results in sl
...     # (do_work is a user-supplied function, not defined here)
...     p1 = Process(target=do_work, args=(sl, 0, 1000))
...     p2 = Process(target=do_work, args=(sl, 1000, 2000))
...     p1.start()
...     p2.start()  # A multiprocessing.Pool might be more efficient
...     p1.join()
...     p2.join()   # Wait for all work to complete in both processes
...     total_result = sum(sl)  # Consolidate the partial results now in sl

When using a :class:`SharedMemoryManager` in a :keyword:`with` statement, the
shared memory blocks created using that manager are all released when the
:keyword:`with` statement's code block finishes execution.
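
A self-contained version of the `with` pattern might look like the sketch below. The `do_work` function is hypothetical (the documentation leaves it undefined); here it simply doubles each value in its own index range. The `__main__` guard is included because start methods other than fork re-import the main module in child processes.

```python
from multiprocessing import Process
from multiprocessing.managers import SharedMemoryManager


def do_work(shared_list, start, stop):
    """Hypothetical worker: double each value in its own slice of the list."""
    for i in range(start, stop):
        shared_list[i] = shared_list[i] * 2


def main():
    with SharedMemoryManager() as smm:
        sl = smm.ShareableList(range(2000))
        workers = [
            Process(target=do_work, args=(sl, 0, 1000)),
            Process(target=do_work, args=(sl, 1000, 2000)),
        ]
        for p in workers:
            p.start()
        for p in workers:
            p.join()
        # Read the results before the manager releases the block.
        total = sum(sl)
        sl.shm.close()
    return total


if __name__ == "__main__":
    print(main())
```

The two workers write to disjoint index ranges, so no locking is used here; overlapping writes would need explicit synchronization.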
Contributor

Again, having this in multiprocessing.managers would allow the repetition of the with statement usage/clarification.

Member Author

In all my years of working with multiprocessing even before becoming a core dev, it has been extraordinarily rare to meet anyone who knew of multiprocessing.managers and even fewer people have used one. I believe this is because (1) multiprocessing.managers are not commonly needed for the vast majority of use cases, and (2) they are not being discovered because there is too much information and too many non-trivial concepts covered in the multiprocessing main documentation.

When a user wishes to use the zero-copy shared memory functionality, they will very commonly also want to use SharedMemoryManager. We should help users make this mental connection right away.

Contributor

I agree the preferred way should be using SharedMemoryManager because it's more high level.

In all my years of working with multiprocessing even before becoming a core dev, it has been extraordinarily rare to meet anyone who knew of multiprocessing.managers and even fewer people have used one.

Mmm... apparently it's not so rare:
https://www.programcreek.com/python/example/8456/multiprocessing.Manager



.. class:: ShareableList(sequence=None, *, name=None)

Provides a mutable list-like object where all values stored within are
stored in a shared memory block. This constrains storable values to
only the ``int``, ``float``, ``bool``, ``str`` (less than 10M bytes each),
``bytes`` (less than 10M bytes each), and ``None`` built-in data types.
It also notably differs from the built-in ``list`` type in that these
lists cannot change their overall length (i.e. no append, insert, etc.)
and do not support the dynamic creation of new :class:`ShareableList`
instances via slicing.

*sequence* is used in populating a new ``ShareableList`` full of values.
Set to ``None`` to instead attach to an already existing
``ShareableList`` by its unique shared memory name.

*name* is the unique name for the requested shared memory, as described
in the definition for :class:`SharedMemory`. When attaching to an
existing ``ShareableList``, specify its shared memory block's unique
name while leaving ``sequence`` set to ``None``.

.. method:: count(value)

Returns the number of occurrences of ``value``.

.. method:: index(value)

Returns first index position of ``value``. Raises :exc:`ValueError` if
``value`` is not present.

.. attribute:: format

Read-only attribute containing the :mod:`struct` packing format used by
all currently stored values.

.. attribute:: shm

The :class:`SharedMemory` instance where the values are stored.


The following example demonstrates basic use of a :class:`ShareableList`
instance:

>>> from multiprocessing import shared_memory
>>> a = shared_memory.ShareableList(['howdy', b'HoWdY', -273.154, 100, None, True, 42])
>>> [ type(entry) for entry in a ]
[<class 'str'>, <class 'bytes'>, <class 'float'>, <class 'int'>, <class 'NoneType'>, <class 'bool'>, <class 'int'>]
>>> a[2]
-273.154
>>> a[2] = -78.5
>>> a[2]
-78.5
>>> a[2] = 'dry ice' # Changing data types is supported as well
>>> a[2]
'dry ice'
>>> a[2] = 'larger than previously allocated storage space'
Traceback (most recent call last):
...
ValueError: exceeds available storage for existing str
>>> a[2]
'dry ice'
>>> len(a)
7
>>> a.index(42)
6
>>> a.count(b'howdy')
0
>>> a.count(b'HoWdY')
1
>>> a.shm.close()
>>> a.shm.unlink()
>>> del a # Use of a ShareableList after call to unlink() is unsupported

The following example depicts how one, two, or many processes may access the
same :class:`ShareableList` by supplying the name of the shared memory block
behind it:

>>> b = shared_memory.ShareableList(range(5)) # In a first process
>>> c = shared_memory.ShareableList(name=b.shm.name) # In a second process
>>> c
ShareableList([0, 1, 2, 3, 4], name='...')
>>> c[-1] = -999
>>> b[-1]
-999
>>> b.shm.close()
>>> c.shm.close()
>>> c.shm.unlink()
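
Attaching by name is also what happens when a :class:`ShareableList` is pickled, for instance when it is passed as an argument to another process. The sketch below simulates that hand-off inside one process with a pickle round trip; it relies on the `__reduce__` support added in this pull request, and the variable names are illustrative.

```python
import pickle
from multiprocessing import shared_memory

original = shared_memory.ShareableList([10, 20, 30])
try:
    # Pickling records the name of the block, not the values themselves.
    attached = pickle.loads(pickle.dumps(original))
    try:
        same_block = attached.shm.name == original.shm.name
        attached[0] = 99
        seen_by_original = original[0]
    finally:
        attached.shm.close()
finally:
    original.shm.close()
    original.shm.unlink()
```

Because the unpickled object refers to the same block, the assignment made through `attached` is visible through `original`.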
