sets and bags (multisets) in Python

Peter Dobcsanyi p.dobcsanyi at designtheory.org
Sat May 1 21:09:44 BST 2004


Here is my planned reply to the Python developers regarding bags. I
wrote this after a quick discussion with Leonard about what features we
would like to see for bags.

In my planned reply I make references to the "set builtin data type",
you can find the related documentation at:

    http://www.python.org/dev/doc/devel/lib/types-set.html

You may also want to take a look at the "collections" module's doc of
which "bag" would be part:

    http://www.python.org/dev/doc/devel/lib/module-collections.html

Please note, our online Python documentation on the internal pages does
not contain these details and/or is out of date, you should look the
development versions for the new Python release at the above URLs.

.........................................................................

Here are the bag (multiset) features we thought would be useful.

Regardless of actual implementation, let us consider a bag as a set of
ordered pairs (x, m) where x is an element of the bag and m >= 0 is the
multiplicity of x.

A = {(x, m_A), ...}
B = {(x, m_B), ...}

union           A | B = { (x, m_A + m_B), ...}
intersection    A & B = { (x, min(m_A, m_B), ... }
difference      A - B = { (x, max(m_A - m_B, 0)), ... }
symmetric diff  A ^ B = { (x, m_A + m_B - 2 * min(m_A, m_B)), ... }

A.multiplicity(x) returns a non-negative integer

    special cases:
        x in A  iff  A.multiplicity(x) > 0
        x not in A  iff  A.multiplicity(x) == 0

len(A)  returns the sum of multiplicities

A.underlying_set()      return the underlying set of A

A.underlying_size()  (== len(A.underlying_set())

A.no_repeats()          return True if for all x in A.underlying_set()
                            A.multiplicity(x) == 1

A <= B  iff  for all x in A.underlying_set()
                        A.multiplicity(x) <= B.multiplicity(x)

    A == B  iff  A<=B and B<=A


A.add(x, [m])       add x to A with multiplicity m; if m is omitted it
                    defaults to 1

A.remove(x, [m])    remove x from A m-times, if m is omitted it defaults
                    to 1; raise BagError (or whatever) if x is not present.
                    (I don't like KeyError for this, neither here nor in
                    the case of sets.)

A.discard(x, [m])   remove x from A m-times if it is present; if m is
                    omitted it defaults to 1

A.pop([selection_mode]) if selection_mode is omitted remove and return
                        an arbitrary element; raise BagError if A is empty

    selection modes     UNIFORM: make a uniformly distributed random
                        choice over the list of elements
                        
                        UNDERLYING_UNIFORM: make a uniformly distributed
                        random choice over the list of underlying elements.

                        MAJORITY: the most numerous element (or
                        uniform among them if more than one)

                        MINORITY: similarly, as the name suggests


The mutable-immutable distinction: as it is with the builtin set.

xxx_update() functions, operator, non-operator variants, using iterable
arguments: similarly to the builtin set versions.

Conversions:

    Most of them provided by the xxx_update() functions

    A.items()   return A as a list

Aside: could we have S.items() if S is a set?


Iteration:

    for x in A:       as it was         for x in A.items():


hash(A) as it was tuple(sorted(A.items()))

    (or more precisely, a recursive "deep-freezing sorting" -- ?)


...........................................................................

Note, how the xxx_update() functions and releated operators provides the
in-place modification; that is the nice "pythonic way".

--             ,
    Peter Dobcsanyi



More information about the Developers mailing list