Module Flatten.Flatten
Flatten.py
==========
Overview
--------
Flatten is a serialization scheme designed for network serialization of
Python data. The serialization output is a structure of tuples and
strings, which can be converted to and from a byte stream by the
functions in the Serialize.py file.
Flatten tries to work around the security shortcomings of pickle by taking
a stricter approach to the serialization of user classes: a class must be
registered with this module before its instances can be serialized. The
scheme used to serialize user classes is described in more detail below.
Flatten also makes it easier to check the data of unserialized class
instances. Unlike pickle, it does not create an empty instance of the
serialized class and then set the instance dictionary directly. Instead,
after creating an empty instance, it calls the attribute-setting function
of each base class, which allows you to validate parameters on creation,
or even to change the parameters that were passed in. The scheme Flatten
uses guarantees that by the time the unserialization function of a
subclass is called, all base classes have had their turn to unserialize,
so you can rely on all bases functioning correctly. This also remedies
problems that occur every so often when dealing with private members of
base classes during serialization/unserialization.
Flatten does not choke on circular structures, but there are two minor
problems associated with handling this kind of data which are noted below.
Flatten was developed using Python 2.3, but with minor changes should be
able to run on Python as far back as Python 2.1.3. As I use Python 2.3 for
my development and production environment, I did not deem developing
Flatten for older versions necessary, but if someone out there wants to
backport it, I'd surely be grateful for a diff.
Serializable data
-----------------
Flatten by default only knows how to flatten the following kinds of
objects:
  None    - Serialized as None, and read as None.
  bool    - True or False are serialized as strings.
  complex - Data is stored as real and imaginary part, as network doubles.
  dict    - Dictionaries are fully supported for serialization, keys and
            values being serialized by the flatten function separately.
  float   - Floats are serialized as network doubles.
  int     - Ints are serialized as network longs.
  list    - Lists are serialized as lists of values, flatten being applied
            to all objects in the list.
  long    - Longs are serialized as strings, which can be read regardless
            of destination machine.
  str     - Strings are serialized as is.
  tuple   - Tuples are serialized just like lists. There is a problem with
            circular references in tuples, as described below.
  unicode - Unicode strings are serialized as UTF-8 encoded strings.
Classes derived from one of these are not serialized by default, but
additionally need to derive from the IFlatten interface and implement
explicit serialization support.
Instances of user classes can also be serialized; how this works is
described below.
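To make the terms "network double" and "network long" used above concrete,
here is a minimal sketch based on the standard struct module. The actual
byte-level conversion lives in Serialize.py, and it is an assumption that
it uses these exact format codes; the helper names are hypothetical. The
sketch only shows what a big-endian ("network order") encoding looks like.

```python
import struct

# Hypothetical helpers; Serialize.py may use different format codes.

def pack_network_double(value):
    # "!" selects network (big-endian) byte order, "d" an 8-byte double.
    return struct.pack("!d", value)

def unpack_network_double(raw):
    return struct.unpack("!d", raw)[0]

def pack_network_long(value):
    # With "!", the "l" code is always a 4-byte signed integer.
    return struct.pack("!l", value)

def unpack_network_long(raw):
    return struct.unpack("!l", raw)[0]

def pack_complex(value):
    # Real and imaginary part, each stored as a network double.
    return struct.pack("!dd", value.real, value.imag)

def unpack_complex(raw):
    real, imag = struct.unpack("!dd", raw)
    return complex(real, imag)
```

Because the byte order is fixed by the format string, both ends of a
connection decode the same bytes to the same value regardless of their
native byte order.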
User classes
------------
For a class to be serializable, it must derive from the class IFlatten in
this file, which in turn derives from object. The IFlatten interface shows
the methods the class needs to implement; they are described here in more
detail.
_new():
A classmethod which is called by the unserializer to construct a new
instance of the class as it is being unserialized. By default this
function calls object.__new__(<classtype>), which should be sufficient for
most classes. If you need to do extra initialization in the __new__ method
of your class, you should, in addition to overriding __new__ for instance
creation by other sources, override this method, so that you can catch
instance creation by the serializer. Be aware, however, that no stored
data is available during this call, so you'll have to move most logic to
the __unserialize function.
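The default behaviour described above can be illustrated with plain
Python, without Flatten itself. The sketch below shows that object.__new__
creates an instance without running __init__, which is why all state has
to be restored later. The class Point and its _new classmethod are
hypothetical stand-ins that mirror the description, not the library's
code.

```python
class Point(object):
    def __init__(self, x, y):
        # Normal construction path; not taken when _new() is used.
        self.x = x
        self.y = y

    def _new(cls):
        # Mirrors the described default: an empty instance, no __init__.
        return object.__new__(cls)
    # Written without decorator syntax so it also works on old Pythons.
    _new = classmethod(_new)

empty = Point._new()
has_x = hasattr(empty, "x")
```

Here has_x is False, because __init__ never ran and no attributes were
set on the new instance.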
__serialize(*args,**kwargs):
This private function is called when serialization of this instance is
requested. It must return a dictionary which contains the state of the
class. The keys of the state dictionary must be strings, whereas the
values are serialized using the flattening functions. The __serialize
method may raise NotImplementedError when the class does not wish to
serialize itself or to appear in the output stream (no unserialization is
done for it in this case). When the class of the actual instance being
serialized raises this exception, its immediate base becomes the class
type that is serialized, so that to remote hosts it appears as though the
base class had been serialized. Of course this only works with linear
polymorphism. From top to bottom, the __serialize methods of all bases of
the instance are called, so that each class has the chance to store its
private data members, or to change them before serialization. The
positional and keyword arguments passed in are the extra parameters passed
to the flatten function. Every __serialize function should accept unknown
parameters, as these may be relevant for a sub or base class during
serialization. This is why it is almost always best to use keyword
arguments.
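Because __serialize is a private name, Python mangles it per class (inside
class A it becomes _A__serialize), so every class in a hierarchy can carry
its own method without overriding the others. The sketch below shows one
way a serializer could call them from the most derived class down to the
base. collect_state is a hypothetical helper, not Flatten's actual
implementation, and it leaves out the NotImplementedError handling
described above.

```python
class Base(object):
    def __serialize(self, *args, **kwargs):
        return {"name": "base"}

class Derived(Base):
    def __serialize(self, *args, **kwargs):
        # Unknown keyword arguments are accepted; only "verbose" is used.
        return {"name": "derived", "verbose": kwargs.get("verbose", False)}

def collect_state(instance, *args, **kwargs):
    """Call each class's own mangled __serialize, most derived first."""
    states = []
    for cls in type(instance).__mro__:
        # Look in the class's own namespace only, never in its bases.
        method = cls.__dict__.get("_%s__serialize" % cls.__name__)
        if method is not None:
            states.append((cls.__name__, method(instance, *args, **kwargs)))
    return states

states = collect_state(Derived(), verbose=True)
```

Passing the extra parameters as keywords means a class that does not know
a parameter can simply ignore it, which is the reason given above for
preferring keyword arguments.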
__unserialize(serdata,*args,**kwargs):
This private function is called when unserialization of a class instance
is requested. serdata contains the data that was returned by the
__serialize function; args and kwargs are the optional arguments passed to
the unflatten function. In principle __unserialize should work as the
reverse of __serialize, and all notes given there also apply here.
__unserialize is called in the reverse of the order in which __serialize
was called, so the instance is constructed from bottom to top. The
__unserialize function must return a value which indicates whether
unserialization was successful. If it returns False, the serializer
breaks off unserialization and raises an exception. This method may also
raise NotImplementedError, but if it does, an exception will be
propagated.
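The ordering and the failure rule can be sketched with plain Python. In
the hypothetical restore_state helper below, each class's own (name
mangled) __unserialize is looked up and called base class first, and a
False return value aborts with an exception. The exception type
ValueError and the shape of the states mapping are assumptions made for
illustration; they are not taken from Flatten.

```python
class Account(object):
    def __unserialize(self, serdata, *args, **kwargs):
        # The base validates its own data and may reject it.
        if serdata["balance"] < 0:
            return False
        self.balance = serdata["balance"]
        return True

class Savings(Account):
    def __unserialize(self, serdata, *args, **kwargs):
        # The base has already run, so self.balance can be relied upon.
        self.rate = serdata["rate"] if self.balance > 0 else 0.0
        return True

def restore_state(cls, states, *args, **kwargs):
    """Build an empty instance, then unserialize from base to subclass."""
    instance = object.__new__(cls)
    for klass in reversed(cls.__mro__):
        method = klass.__dict__.get("_%s__unserialize" % klass.__name__)
        if method is None:
            continue  # e.g. object, which defines no such method
        if method(instance, states[klass.__name__], *args, **kwargs) is False:
            raise ValueError("unserialization of %s failed" % klass.__name__)
    return instance

good = restore_state(
    Savings, {"Account": {"balance": 10}, "Savings": {"rate": 0.02}})
```

Calling restore_state with a negative balance makes the base class return
False, so the subclass method is never reached.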
Data members of IFlatten:
__typeMap
This private class data member is used by the unserialization functions
for classes to screen the input data before actually calling the
__unserialize function. Each entry has as its key a name used in the
dictionary returned by the __serialize function, and as its value a type;
the reconstructed value stored under that name is checked to be of that
type. Only members which appear in __typeMap are checked by the
unserialization functions; if such a member is not present in the
serialized data, an exception is raised as well. When you want to do
parameter validation for a certain data member yourself, just leave it
out of __typeMap.
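A minimal sketch of this screening step follows. It assumes that the type
check is an isinstance test (the text above does not say whether
subclasses are accepted) and that KeyError and TypeError are the
exceptions raised; screen is a hypothetical helper, not a Flatten
function.

```python
def screen(type_map, serdata):
    """Check serdata against type_map before __unserialize would run."""
    for key, expected in type_map.items():
        if key not in serdata:
            raise KeyError("missing member %r" % key)
        if not isinstance(serdata[key], expected):
            raise TypeError("member %r is not of type %s"
                            % (key, expected.__name__))
    # Keys that are absent from type_map are deliberately not checked.
    return True

ok = screen({"pim": str}, {"pim": "pom", "extra": 1})
```

The "extra" member passes unchecked because it is not listed in the map,
which corresponds to leaving a member out of __typeMap in order to
validate it yourself.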
Class registration:
registerClass(classtype,name):
This is a public function of the Flatten module. It registers the given
class as serializable and assigns it the given name for the serialization
stream. name must be a string. When using the Flatten module to transport
data over a network, you should make sure that all hosts have registered
compatible classes under this name (they need not be identical, which
makes "version upgrades" easier).
Examples of user classes
========================
from Flatten import IFlatten, registerClass, flatten, unflatten

class A(IFlatten):
    __typeMap = {"pim":str,"pam":str}
    def __serialize(self,*args,**kwargs):
        return {"pim":"pom","pam":"pum"}
    def __unserialize(self,serdata,*args,**kwargs):
        assert serdata["pim"] == "pom"
        assert serdata["pam"] == "pum"
        return True # Signal successful unserialization.

class B(A):
    __typeMap = {"pim":int,"pam":int}
    def __serialize(self,*args,**kwargs):
        return {"pim":1,"pam":2}
    def __unserialize(self,serdata,*args,**kwargs):
        assert serdata["pim"] == 1
        assert serdata["pam"] == 2
        return True # Signal successful unserialization.

class C(B):
    def __serialize(self,*args,**kwargs):
        raise NotImplementedError # C will not appear in the stream.
    def __unserialize(self,serdata,*args,**kwargs):
        raise NotImplementedError

registerClass(A,"A")
registerClass(B,"B")
registerClass(C,"C")

binst = B() # New instance.
data = flatten(binst) # Flatten instance.
newbinst = unflatten(data) # Reconstructs instance, no exception raised.
print isinstance(newbinst,B) # Prints True.

cinst = C() # New instance.
data = flatten(cinst) # Flatten instance; serialized as its base B.
newcinst = unflatten(data) # Reconstructs instance, no exception raised.
print isinstance(newcinst,C) # Prints False.
print isinstance(newcinst,B) # Prints True.
Problems with recursive structures
==================================
This section is currently pending a rewrite. Suffice it to say that for
tuples which contain themselves as data members (which can happen when,
say, a tuple contains a dictionary which itself contains a reference to
the outer tuple), unserialization works, but is not done correctly.
I will document the exact nature of the problem (and why I don't plan on
changing this in any future release) later on, when I have rewritten the
original section.
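For readers who want to reproduce the kind of structure meant here, the
following builds a tuple that contains itself indirectly. It uses only
built-in types, and the variable names are arbitrary. A tuple is
immutable, so it cannot be created empty and filled in later the way a
list or dictionary can; this is one plausible reason such structures are
awkward to rebuild, although the author's own explanation is not given
above.

```python
inner = {}
outer = (inner, "payload")
# Close the cycle: the dictionary now refers back to the tuple holding it.
inner["owner"] = outer

is_circular = outer[0]["owner"] is outer
```

A structure like this makes a useful test case when comparing the result
of unflatten against the original data.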
Copyright
=========
Flatten is copyright (C) 2002-3 by Heiko Wundram
<heiko@asta.uni-saarland.de>.
This library is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2.1 of the License, or (at
your option) any later version.
This library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
License for more details.
You should have received a copy of the GNU Lesser General Public License
along with this library in the file "COPYLEFT"; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
02111-1307 USA
Classes
-------
IFlatten - Interface from which all serializable classes need to be
           derived.
Function Summary
----------------
flatten(data, *args, **kwargs) - Main call point for data flattening.
registerClass(classtype, name) - Register a class for serialization.
unflatten(data, *args, **kwargs) - Main call point for data unflattening.
flatten(data, *args, **kwargs):
Main call point for data flattening. This function takes the data item
to flatten as argument data. Any additional arguments are passed to
instance serialization methods.
registerClass(classtype, name):
Register a class for serialization. classtype is the type of the
class, name is the name that this class is given in serialization.
unflatten(data, *args, **kwargs):
Main call point for data unflattening. This function takes the data
item to unflatten (the structure returned by flatten) as argument data.
Any additional arguments are passed to instance unserialization
methods.
Module variables:
__date__ - Type: str
__version__ - Type: str
UNICODE_ENCODING - Type: str