	"""
	=============================
	Subclassing ndarray in python
	=============================

	Credits
	-------

	This page is based with thanks on the wiki page on subclassing by Pierre
	Gerard-Marchant - http://www.scipy.org/Subclasses.

	Introduction
	------------

	Subclassing ndarray is relatively simple, but it has some complications
	compared to other Python objects. On this page we explain the machinery
	that allows you to subclass ndarray, and the implications for
	implementing a subclass.

	ndarrays and object creation
	============================

	Subclassing ndarray is complicated by the fact that new instances of
	ndarray classes can come about in three different ways. These are:

	#. Explicit constructor call - as in ``MySubClass(params)``. This is
	   the usual route to Python instance creation.
	#. View casting - casting an existing ndarray as a given subclass.
	#. New from template - creating a new instance from a template
	   instance. Examples include returning slices from a subclassed array,
	   creating return types from ufuncs, and copying arrays. See
	   :ref:`new-from-template` for more details.

	The last two are specific to ndarrays; they exist to support things
	like array slicing. The complications of subclassing ndarray are due to
	the mechanisms numpy has to support these latter two routes of
	instance creation.

	.. _view-casting:

	View casting
	------------

	View casting is the standard ndarray mechanism by which you take an
	ndarray of any subclass, and return a view of the array as another
	(specified) subclass:

	>>> import numpy as np
	>>> # create a completely useless ndarray subclass
	>>> class C(np.ndarray): pass
	>>> # create a standard ndarray
	>>> arr = np.zeros((3,))
	>>> # take a view of it, as our useless subclass
	>>> c_arr = arr.view(C)
	>>> type(c_arr)
	<class 'C'>
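
As a small sketch separate from the doctest above, the following checks two consequences of view casting: the result has the requested class, and it shares memory with the original array rather than copying it. The class name ``Plain`` is made up for illustration.

```python
import numpy as np


class Plain(np.ndarray):
    # Hypothetical do-nothing subclass, used only for illustration.
    pass


arr = np.zeros((3,))
plain_view = arr.view(Plain)

# The view has the requested class but no data of its own.
print(type(plain_view).__name__)
print(np.shares_memory(arr, plain_view))

# Writing through the view changes the original array.
plain_view[0] = 5.0
print(arr[0])
```

Because no data is copied, view casting is cheap, but any change made through the view is visible in the original array.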

	.. _new-from-template:

	Creating new from template
	--------------------------

	New instances of an ndarray subclass can also come about by a very
	similar mechanism to :ref:`view-casting`, when numpy finds it needs to
	create a new instance from a template instance. The most obvious place
	this has to happen is when you are taking slices of subclassed arrays.
	For example:

	>>> v = c_arr[1:]
	>>> type(v) # the view is of type 'C'
	<class 'C'>
	>>> v is c_arr # but it's a new instance
	False

	The slice is a view onto the original ``c_arr`` data. So, when we
	take a view from the ndarray, we return a new ndarray, of the same
	class, that points to the data in the original.

	There are other points in the use of ndarrays where we need such views,
	such as copying arrays (``c_arr.copy()``), creating ufunc output arrays
	(see also :ref:`array-wrap`), and reducing methods (like
	``c_arr.mean()``).
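
The following sketch illustrates three of these routes for a throwaway subclass (the name ``Plain`` is an assumption made for this example). Slicing, copying, and a ufunc call each return a new instance of the subclass, but only the slice shares memory with the original.

```python
import numpy as np


class Plain(np.ndarray):
    # Hypothetical do-nothing subclass, used only for illustration.
    pass


p = np.arange(4.0).view(Plain)

sliced = p[1:]              # slicing
copied = p.copy()           # copying
doubled = np.multiply(p, 2) # ufunc output

for result in (sliced, copied, doubled):
    print(type(result).__name__)

# Only the slice is a view onto the original data.
print(np.shares_memory(p, sliced))
print(np.shares_memory(p, copied))
```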

	Relationship of view casting and new-from-template
	--------------------------------------------------

	These paths both use the same machinery. We make the distinction here,
	because they result in different input to your methods. Specifically,
	:ref:`view-casting` means you have created a new instance of your array
	type from any potential subclass of ndarray. :ref:`new-from-template`
	means you have created a new instance of your class from a pre-existing
	instance, allowing you - for example - to copy across attributes that
	are particular to your subclass.

	Implications for subclassing
	----------------------------

	If we subclass ndarray, we need to deal not only with explicit
	construction of our array type, but also :ref:`view-casting` or
	:ref:`new-from-template`. Numpy has the machinery to do this, and it is
	this machinery that makes subclassing slightly non-standard.

	There are two aspects to the machinery that ndarray uses to support
	views and new-from-template in subclasses.

	The first is the use of the ``ndarray.__new__`` method for the main work
	of object initialization, rather than the more usual ``__init__``
	method. The second is the use of the ``__array_finalize__`` method to
	allow subclasses to clean up after the creation of views and new
	instances from templates.

	A brief Python primer on ``__new__`` and ``__init__``
	=====================================================

	``__new__`` is a standard Python method, and, if present, is called
	before ``__init__`` when we create a class instance. See the `python
	__new__ documentation
	<http://docs.python.org/reference/datamodel.html#object.__new__>`_ for more detail.

	For example, consider the following Python code:

	.. testcode::

	  class C(object):
	      def __new__(cls, *args):
	          print('Cls in __new__:', cls)
	          print('Args in __new__:', args)
	          # On Python 3, object.__new__ accepts only the class
	          return object.__new__(cls)

	      def __init__(self, *args):
	          print('type(self) in __init__:', type(self))
	          print('Args in __init__:', args)

	meaning that we get:

	>>> c = C('hello')
	Cls in __new__: <class 'C'>
	Args in __new__: ('hello',)
	type(self) in __init__: <class 'C'>
	Args in __init__: ('hello',)

	When we call ``C('hello')``, the ``__new__`` method gets its own class
	as first argument, and the passed argument, which is the string
	``'hello'``. After python calls ``__new__``, it usually (see below)
	calls our ``__init__`` method, with the output of ``__new__`` as the
	first argument (now a class instance), and the passed arguments
	following.

	As you can see, the object can be initialized in the ``__new__``
	method or the ``__init__`` method, or both, and in fact ndarray does
	not have an ``__init__`` method, because all the initialization is
	done in the ``__new__`` method.

	Why use ``__new__`` rather than just the usual ``__init__``? Because
	in some cases, as for ndarray, we want to be able to return an object
	of some other class. Consider the following:

	.. testcode::

	  class D(C):
	      def __new__(cls, *args):
	          print('D cls is:', cls)
	          print('D args in __new__:', args)
	          return C.__new__(C, *args)

	      def __init__(self, *args):
	          # we never get here
	          print('In D __init__')

	meaning that:

	>>> obj = D('hello')
	D cls is: <class 'D'>
	D args in __new__: ('hello',)
	Cls in __new__: <class 'C'>
	Args in __new__: ('hello',)
	>>> type(obj)
	<class 'C'>

	The definition of ``C`` is the same as before, but for ``D``, the
	``__new__`` method returns an instance of class ``C`` rather than
	``D``. Note that the ``__init__`` method of ``D`` does not get
	called. In general, when the ``__new__`` method returns an object of a
	class other than the class in which it is defined, the ``__init__``
	method of that class is not called.

	This is how subclasses of the ndarray class are able to return views
	that preserve the class type. When taking a view, the standard
	ndarray machinery creates the new ndarray object with something
	like::

	    obj = ndarray.__new__(subtype, shape, ...

	where ``subtype`` is the subclass. Thus the returned view is of the
	same class as the subclass, rather than being of class ``ndarray``.

	That solves the problem of returning views of the same type, but now
	we have a new problem. The machinery of ndarray can set the class
	this way, in its standard methods for taking views, but the ndarray
	``__new__`` method knows nothing of what we have done in our own
	``__new__`` method in order to set attributes, and so on. (Aside -
	why not call ``obj = subtype.__new__(...`` then? Because we may not
	have a ``__new__`` method with the same call signature.)
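
A minimal sketch of this problem, using a hypothetical subclass ``Tagged`` that sets an attribute only in its own ``__new__``: the slice has the right class, but because it was created by the ndarray machinery rather than by ``Tagged.__new__``, the attribute is missing.

```python
import numpy as np


class Tagged(np.ndarray):
    # Hypothetical subclass that sets an attribute only in __new__.
    def __new__(cls, shape, tag=None):
        obj = np.ndarray.__new__(cls, shape)
        obj.tag = tag
        return obj


t = Tagged((4,), tag='label')
print(t.tag)                 # set by Tagged.__new__

view = t[1:]                 # built by ndarray, not by Tagged.__new__
print(type(view).__name__)   # the class is preserved
print(hasattr(view, 'tag'))  # but the attribute was never set
```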

	The role of ``__array_finalize__``
	==================================

	``__array_finalize__`` is the mechanism that numpy provides to allow
	subclasses to handle the various ways that new instances get created.

	Remember that subclass instances can come about in these three ways:

	#. explicit constructor call (``obj = MySubClass(params)``). This will
	   call the usual sequence of ``MySubClass.__new__`` then (if it exists)
	   ``MySubClass.__init__``.
	#. :ref:`view-casting`
	#. :ref:`new-from-template`

	Our ``MySubClass.__new__`` method only gets called in the case of the
	explicit constructor call, so we can't rely on ``MySubClass.__new__`` or
	``MySubClass.__init__`` to deal with the view casting and
	new-from-template. It turns out that ``MySubClass.__array_finalize__``
	does get called for all three methods of object creation, so this is
	where our object creation housekeeping usually goes.

	* For the explicit constructor call, our subclass will need to create a
	  new ndarray instance of its own class. In practice this means that
	  we, the authors of the code, will need to make a call to
	  ``ndarray.__new__(MySubClass,...)``, or do view casting of an existing
	  array (see below).
	* For view casting and new-from-template, the equivalent of
	  ``ndarray.__new__(MySubClass,...)`` is called, at the C level.

	The arguments that ``__array_finalize__`` receives differ for the three
	methods of instance creation above.

	The following code allows us to look at the call sequences and arguments:

	.. testcode::

	  import numpy as np

	  class C(np.ndarray):
	      def __new__(cls, *args, **kwargs):
	          print('In __new__ with class %s' % cls)
	          return np.ndarray.__new__(cls, *args, **kwargs)

	      def __init__(self, *args, **kwargs):
	          # in practice you probably will not need or want an __init__
	          # method for your subclass
	          print('In __init__ with class %s' % self.__class__)

	      def __array_finalize__(self, obj):
	          print('In array_finalize:')
	          print('   self type is %s' % type(self))
	          print('   obj type is %s' % type(obj))


	Now:

	>>> # Explicit constructor
	>>> c = C((10,))
	In __new__ with class <class 'C'>
	In array_finalize:
	   self type is <class 'C'>
	   obj type is <class 'NoneType'>
	In __init__ with class <class 'C'>
	>>> # View casting
	>>> a = np.arange(10)
	>>> cast_a = a.view(C)
	In array_finalize:
	   self type is <class 'C'>
	   obj type is <class 'numpy.ndarray'>
	>>> # Slicing (example of new-from-template)
	>>> cv = c[:1]
	In array_finalize:
	   self type is <class 'C'>
	   obj type is <class 'C'>

	The signature of ``__array_finalize__`` is::

	    def __array_finalize__(self, obj):

	``ndarray.__new__`` passes ``__array_finalize__`` the new object, of our
	own class (``self``) as well as the object from which the view has been
	taken (``obj``). As you can see from the output above, the ``self`` is
	always a newly created instance of our subclass, and the type of ``obj``
	differs for the three instance creation methods:

	* When called from the explicit constructor, ``obj`` is ``None``
	* When called from view casting, ``obj`` can be an instance of any
	subclass of ndarray, including our own.
	* When called in new-from-template, ``obj`` is another instance of our
	own subclass, that we might use to update the new ``self`` instance.

	Because ``__array_finalize__`` is the only method that always sees new
	instances being created, it is the sensible place to fill in instance
	defaults for new object attributes, among other tasks.

	This may be clearer with an example.

	Simple example - adding an extra attribute to ndarray
	-----------------------------------------------------

	.. testcode::

	  import numpy as np

	  class InfoArray(np.ndarray):

	      def __new__(subtype, shape, dtype=float, buffer=None, offset=0,
	                  strides=None, order=None, info=None):
	          # Create the ndarray instance of our type, given the usual
	          # ndarray input arguments. This will call the standard
	          # ndarray constructor, but return an object of our type.
	          # It also triggers a call to InfoArray.__array_finalize__
	          obj = np.ndarray.__new__(subtype, shape, dtype, buffer, offset,
	                                   strides, order)
	          # set the new 'info' attribute to the value passed
	          obj.info = info
	          # Finally, we must return the newly created object:
	          return obj

	      def __array_finalize__(self, obj):
	          # ``self`` is a new object resulting from
	          # ndarray.__new__(InfoArray, ...), therefore it only has
	          # attributes that the ndarray.__new__ constructor gave it -
	          # i.e. those of a standard ndarray.
	          #
	          # We could have got to the ndarray.__new__ call in 3 ways:
	          # From an explicit constructor - e.g. InfoArray():
	          #    obj is None
	          #    (we're in the middle of the InfoArray.__new__
	          #    constructor, and self.info will be set when we return to
	          #    InfoArray.__new__)
	          if obj is None: return
	          # From view casting - e.g arr.view(InfoArray):
	          #    obj is arr
	          #    (type(obj) can be InfoArray)
	          # From new-from-template - e.g infoarr[:3]
	          #    type(obj) is InfoArray
	          #
	          # Note that it is here, rather than in the __new__ method,
	          # that we set the default value for 'info', because this
	          # method sees all creation of default objects - with the
	          # InfoArray.__new__ constructor, but also with
	          # arr.view(InfoArray).
	          self.info = getattr(obj, 'info', None)
	          # We do not need to return anything


	Using the object looks like this:

	>>> obj = InfoArray(shape=(3,)) # explicit constructor
	>>> type(obj)
	<class 'InfoArray'>
	>>> obj.info is None
	True
	>>> obj = InfoArray(shape=(3,), info='information')
	>>> obj.info
	'information'
	>>> v = obj[1:] # new-from-template - here - slicing
	>>> type(v)
	<class 'InfoArray'>
	>>> v.info
	'information'
	>>> arr = np.arange(10)
	>>> cast_arr = arr.view(InfoArray) # view casting
	>>> type(cast_arr)
	<class 'InfoArray'>
	>>> cast_arr.info is None
	True

	This class isn't very useful, because it has the same constructor as the
	bare ndarray object, including passing in buffers and shapes and so on.
	We would probably prefer the constructor to be able to take an already
	formed ndarray from the usual numpy calls to ``np.array`` and return an
	object.

	Slightly more realistic example - attribute added to existing array
	-------------------------------------------------------------------

	Here is a class that takes a standard ndarray that already exists, casts
	as our type, and adds an extra attribute.

	.. testcode::

	  import numpy as np

	  class RealisticInfoArray(np.ndarray):

	      def __new__(cls, input_array, info=None):
	          # Input array is an already formed ndarray instance
	          # We first cast to be our class type
	          obj = np.asarray(input_array).view(cls)
	          # add the new attribute to the created instance
	          obj.info = info
	          # Finally, we must return the newly created object:
	          return obj

	      def __array_finalize__(self, obj):
	          # see InfoArray.__array_finalize__ for comments
	          if obj is None: return
	          self.info = getattr(obj, 'info', None)


	So:

	>>> arr = np.arange(5)
	>>> obj = RealisticInfoArray(arr, info='information')
	>>> type(obj)
	<class 'RealisticInfoArray'>
	>>> obj.info
	'information'
	>>> v = obj[1:]
	>>> type(v)
	<class 'RealisticInfoArray'>
	>>> v.info
	'information'
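
The same pattern can be sketched for a different attribute. The example below assumes a made-up subclass ``UnitArray`` carrying a ``unit`` string. It is only an illustration of the technique, showing that the attribute survives slicing, copying, and a ufunc call, and that it falls back to the default of ``None`` when a plain ndarray is view cast.

```python
import numpy as np


class UnitArray(np.ndarray):
    # Hypothetical subclass following the RealisticInfoArray pattern.

    def __new__(cls, input_array, unit=None):
        # View cast the input, then attach the extra attribute.
        obj = np.asarray(input_array).view(cls)
        obj.unit = unit
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        # Copy the attribute from the template, or use the default.
        self.unit = getattr(obj, 'unit', None)


metres = UnitArray([1.0, 2.0, 3.0], unit='m')
tail = metres[1:]                    # new-from-template (slice)
dup = metres.copy()                  # new-from-template (copy)
scaled = metres * 2                  # ufunc output
cast = np.arange(3).view(UnitArray)  # view casting a plain ndarray

print(tail.unit, dup.unit, scaled.unit, cast.unit)
```

Note that copying the attribute blindly is a design choice: whether a unit, or any other metadata, should still apply to the result of an arbitrary ufunc depends on the subclass.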

	.. _array-wrap:

	``__array_wrap__`` for ufuncs
	-------------------------------------------------------

	``__array_wrap__`` gets called at the end of numpy ufuncs and other numpy
	functions, to allow a subclass to set the type of the return value
	and update attributes and metadata. Let's show how this works with an example.
	First we make the same subclass as above, but with a different name and
	some print statements:

	.. testcode::

	  import numpy as np

	  class MySubClass(np.ndarray):

	      def __new__(cls, input_array, info=None):
	          obj = np.asarray(input_array).view(cls)
	          obj.info = info
	          return obj

	      def __array_finalize__(self, obj):
	          print('In __array_finalize__:')
	          print('   self is %s' % repr(self))
	          print('   obj is %s' % repr(obj))
	          if obj is None: return
	          self.info = getattr(obj, 'info', None)

	      def __array_wrap__(self, out_arr, context=None):
	          print('In __array_wrap__:')
	          print('   self is %s' % repr(self))
	          print('   arr is %s' % repr(out_arr))
	          # then just call the parent
	          return np.ndarray.__array_wrap__(self, out_arr, context)

	We run a ufunc on an instance of our new array:

	>>> obj = MySubClass(np.arange(5), info='spam')
	In __array_finalize__:
	   self is MySubClass([0, 1, 2, 3, 4])
	   obj is array([0, 1, 2, 3, 4])
	>>> arr2 = np.arange(5)+1
	>>> ret = np.add(arr2, obj)
	In __array_wrap__:
	   self is MySubClass([0, 1, 2, 3, 4])
	   arr is array([1, 3, 5, 7, 9])
	In __array_finalize__:
	   self is MySubClass([1, 3, 5, 7, 9])
	   obj is MySubClass([0, 1, 2, 3, 4])
	>>> ret
	MySubClass([1, 3, 5, 7, 9])
	>>> ret.info
	'spam'

	Note that the ufunc (``np.add``) has called the ``__array_wrap__`` method of the
	input with the highest ``__array_priority__`` value, in this case
	``MySubClass.__array_wrap__``, with arguments ``self`` as ``obj``, and
	``out_arr`` as the (ndarray) result of the addition. In turn, the
	default ``__array_wrap__`` (``ndarray.__array_wrap__``) has cast the
	result to class ``MySubClass``, and called ``__array_finalize__`` -
	hence the copying of the ``info`` attribute. This has all happened at the C level.

	But, we could do anything we wanted:

	.. testcode::

	  class SillySubClass(np.ndarray):

	      def __array_wrap__(self, arr, context=None):
	          return 'I lost your data'

	>>> arr1 = np.arange(5)
	>>> obj = arr1.view(SillySubClass)
	>>> arr2 = np.arange(5)
	>>> ret = np.multiply(obj, arr2)
	>>> ret
	'I lost your data'

	So, by defining a specific ``__array_wrap__`` method for our subclass,
	we can tweak the output from ufuncs. The ``__array_wrap__`` method
	requires ``self``, then an argument - which is the result of the ufunc -
	and an optional parameter ``context``. This parameter is passed by some
	ufuncs as a 3-element tuple: (name of the ufunc, argument of the ufunc,
	domain of the ufunc). ``__array_wrap__`` should return an instance of
	its containing class. See the masked array subclass for an
	implementation.

	In addition to ``__array_wrap__``, which is called on the way out of the
	ufunc, there is also an ``__array_prepare__`` method which is called on
	the way into the ufunc, after the output arrays are created but before any
	computation has been performed. The default implementation does nothing
	but pass through the array. ``__array_prepare__`` should not attempt to
	access the array data or resize the array; it is intended for setting the
	output array type, updating attributes and metadata, and performing any
	checks based on the input that may be desired before computation begins.
	Like ``__array_wrap__``, ``__array_prepare__`` must return an ndarray or
	subclass thereof or raise an error.

	Extra gotchas - custom ``__del__`` methods and ndarray.base
	-----------------------------------------------------------

	One of the problems that ndarray solves is keeping track of memory
	ownership of ndarrays and their views. Consider the case where we have
	created an ndarray, ``arr``, and have taken a slice with ``v = arr[1:]``.
	The two objects are looking at the same memory. Numpy keeps track of
	where the data came from for a particular array or view, with the
	``base`` attribute:

	>>> # A normal ndarray, that owns its own data
	>>> arr = np.zeros((4,))
	>>> # In this case, base is None
	>>> arr.base is None
	True
	>>> # We take a view
	>>> v1 = arr[1:]
	>>> # base now points to the array that it derived from
	>>> v1.base is arr
	True
	>>> # Take a view of a view
	>>> v2 = v1[1:]
	>>> # base points to the view it derived from
	>>> v2.base is v1
	True

	In general, if the array owns its own memory, as for ``arr`` in this
	case, then ``arr.base`` will be ``None``. There are some exceptions to
	this; see the numpy book for more details.

	The ``base`` attribute is useful in being able to tell whether we have
	a view or the original array. This in turn can be useful if we need
	to know whether or not to do some specific cleanup when the subclassed
	array is deleted. For example, we may only want to do the cleanup if
	the original array is deleted, but not the views. For an example of
	how this can work, have a look at the ``memmap`` class in
	``numpy.core``.
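
As a rough sketch of such a check (the helper ``is_view`` and the class ``Plain`` are hypothetical names, and the test ignores the exceptions mentioned above), one might write:

```python
import numpy as np


def is_view(a):
    # Hypothetical helper: treat any array whose ``base`` is set as a view.
    return a.base is not None


class Plain(np.ndarray):
    # Hypothetical do-nothing subclass, used only for illustration.
    pass


owner = np.zeros((4,))
window = owner[1:]
cast = owner.view(Plain)

print(is_view(owner))      # the array that owns its data
print(is_view(window))     # a slice of it
print(cast.base is owner)  # a view cast instance also points at its source
```

One consequence worth keeping in mind: a subclass instance built by view casting an existing array inside ``__new__`` is itself a view, so its ``base`` is not ``None`` even though it may be the "original" object from the user's point of view.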


	"""
	from __future__ import division, absolute_import, print_function