sets and bags (multisets) in Python

Peter Dobcsanyi p.dobcsanyi at designtheory.org
Sat May 1 21:01:00 BST 2004


Dear All,

In the upcoming version of Python, "set" will be a builtin data type and
there is chance that multisets (named "bags") will also be implemented
in a C extension module. So sets and multisets will have high speed
implementation. I sent the attached mail to the Python developers list
and received two kind replies. One from the developer of 'set' and 'bag'
(also attached), and another one from somebody else offering a
combinatorial module also written in C for speed.

It is a rare opportunity for us to influence Python's development to fit
to our needs so I would like to ask all of you to comment on my planned
reply to the developer. My planned reply comes as the follow-up to this
email on our list.

On Thu, Apr 29, 2004 at 08:01:33PM +0100, Peter Dobcsanyi wrote:
> Dear All,
> 
> I have just read PEP-320 and noticed the following comments regarding
> the "collections" package:
> 
>     ...
>     - ? bag (only if use cases established)
>     ...
> 
> I would like to argue for such a use case. We at designtheory.org are
> working on a (maths / stats / comp. sci.) project in the area of Design
> Theory. Design theory intersects with many fields, to mention a few:
> combinatorics, graph theory, finite geometry, coding theory, and the
> design of statistical experiments (from which then name of the field
> comes).  Most of our software development will be in Python, although
> the released python software so far is very modest.
> 
> To show why bags are important for us and, in general, for discrete
> mathematicians, here is the definition of the most important type of
> designs. A binary block design is a multiset (that is a 'bag') of
> subsets of a 'base' set; the elements of the base set are called
> 'points' and its subsets are called 'blocks'. (Personally, I prefer the
> name bag but most mathematicians use multiset.) A non-binary block
> design is one whose blocks can also be multisets.
> 
> The computations we are facing are very much combinatorial in nature so
> we are happy to see that Sets became a builtin and, of course, would
> like to see a C implementation for bags too. Our pydesign package (will)
> heavily use C extensions. So far we use numarray to compute statistical
> properties of designs, but many more combinatorial functionalities will
> be implemented in C.
> 
> In particular, we are planning to implement a basic permutation group
> package whose core will eventually be a C extension. We would like to
> deal with automorphism groups and isomorphisms of designs on at least a
> basic level within the pydesign package without having to resort to
> specialized mathematical packages like GAP. These functionalities can be
> useful not only for design theorist, but for anybody using Python to
> deal with combinatorial structures, like graphs for example.
> 
> In all of these areas the use of multisets (bags) is pervasive.
> Please, implement it  --  so we don't have to :-)
> 
> For more details on our project, please visit http://designtheory.org/

Here is the reply I received from the developer:

On Thu, Apr 29, 2004 at 03:56:03PM -0400, Raymond Hettinger wrote:
> I'm interested in knowing which bag features are the most useful to
> you.
> 
> For high-speed counting and count lookups, perhaps a dictionary class
> method would suffice:
...
> Other ideas were tossed around.  Which of the following methods arise
> in
> your use cases:
> 
> * total number of elements in a bag 
> * number of unique elements
> * pop() the most numerous element (or arbitrary if more than one)
> * hash() and eq() as if the bag were stored as:
> tuple(sorted(d.items()))
> * iterate over unique elements
> * iterate over all elements
> 
> I appreciate your note.  Other developers strongly questioned whether
> there were valid use cases for bags.
> 
> 
> Raymond

--             ,
    Peter Dobcsanyi



More information about the Developers mailing list