13th Jun 2009

Short experiences migrating programs to Python 3

Yesterday I migrated a short program, optmatch, to Python 3.0, and decided to write about the experience. The goal is not so much to list the problems I faced as to remember the subtle changes later on. I will extend the list as I migrate other programs. Currently, it just includes:

[].sort does not accept cmp argument
filter and map return an iterator, not a list
map's function cannot be None

My first stop was Dive Into Python 3, especially its porting document. I do not repeat here anything already covered there, such as print now being a function. If I mention such a change at all, it is only to describe my own experience with it.

The program I migrated was quite short, around 1300 lines, and its unit tests come to just under 2000 lines. So I am pretty sure the problems I solved are a subset of the usual headaches of a larger migration.

I was more interested in getting to know Python 3 than in just migrating my program. So I did use 2to3.py, but mostly out of curiosity at first. Later I used it to see how it would handle some idioms that are no longer valid in Python 3.

[].sort does not accept cmp argument

In Python 2.x, [].sort supports three arguments: cmp, key and reverse. The cmp argument is gone in Python 3.
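The following is a quick way to see the change, as a minimal sketch assuming Python 3. The helper sort_with is hypothetical and exists only for the demonstration. Both Python 2 spellings of a cmp sort now raise TypeError, while key and reverse keep working.

```python
def sort_with(call):
    """Run a sorting call on a fresh list; return the list or 'TypeError'."""
    lst = [3, 1, 2]
    try:
        call(lst)
    except TypeError:
        return "TypeError"
    return lst

# Python 2 style: both of these raise TypeError in Python 3
old_keyword = sort_with(lambda items: items.sort(cmp=lambda x, y: x - y))
old_positional = sort_with(lambda items: items.sort(lambda x, y: x - y))
# Python 3 style: key and reverse are the only (keyword-only) arguments
new_style = sort_with(lambda items: items.sort(key=lambda x: x, reverse=True))
```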

In this thread I read the reasoning behind it. People normally use cmp, but key produces better and shorter solutions. In most cases it is also very easy to transform a sort based on the cmp argument into one based on the key argument.

The same thread proposes the following function to convert a cmp-based sort:

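def cmp2key(mycmp):
    "Converts a cmp= function into a key= function"
    class K:
        def __init__(self, obj, *args):
            self.obj = obj
        # Python 3 never calls __cmp__; sort() relies on __lt__
        def __lt__(self, other):
            return mycmp(self.obj, other.obj) < 0
        def __eq__(self, other):
            return mycmp(self.obj, other.obj) == 0
    return K

# The built-in cmp() is gone in Python 3 too, so pass an explicit function
s = [3, 1, 2]
s.sort(key=cmp2key(lambda x, y: (x > y) - (x < y)))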

This struck me as odd, both for its zeal in forcing a solution seen as better and for the need to add the new function to the code.
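Python 2.7 and 3.2 later added this same conversion to the standard library as functools.cmp_to_key, so the helper no longer has to be pasted into each program. Below is a minimal sketch of that function. Here by_length is a hypothetical old-style comparison function that returns a negative, zero or positive number.

```python
from functools import cmp_to_key

def by_length(x, y):
    """Hypothetical old-style comparison: negative, zero or positive."""
    return len(x) - len(y)

words = ["banana", "fig", "apple"]
# cmp_to_key wraps each element in an object whose rich comparisons
# call by_length, which is what list.sort needs from a key
words.sort(key=cmp_to_key(by_length))
```

In this case key=len would of course be simpler. cmp_to_key is mainly useful when an existing comparison function is too tangled to rewrite right away.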

I had some code such as:

functionsAndPriorities.sort(lambda x, y: y[0] - x[0])

This is the kind of code the thread referred to, since it could have been written as:

functionsAndPriorities.sort(key=lambda x: -x[0])
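The following sketch checks that the two versions agree, using hypothetical (priority, name) pairs. The old comparison is wrapped with functools.cmp_to_key (Python 2.7/3.2 and later) only so that it runs under Python 3.

```python
from functools import cmp_to_key

# Hypothetical (priority, function name) pairs standing in for the real data
functionsAndPriorities = [(1, "cleanup"), (5, "parse"), (3, "validate")]

# The Python 2 comparison, wrapped so that it still runs on Python 3
by_cmp = sorted(functionsAndPriorities,
                key=cmp_to_key(lambda x, y: y[0] - x[0]))
# The key-based rewrite: negate the priority to get descending order
by_key = sorted(functionsAndPriorities, key=lambda x: -x[0])
# Same intent without negation, which also works for non-numeric keys
by_reverse = sorted(functionsAndPriorities, key=lambda x: x[0], reverse=True)
```

Using reverse=True instead of negating the key expresses the same intent and also works when the key is not a number. In all three cases the sort keeps equal elements in their original order.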

That is nicer, shorter and probably faster. But I had this other code, which did not seem so easy to convert:

   def compare(a, b):
       # first: give less weight to options than to flags
       noflag = isinstance(a, OptionInfo)
       if noflag != isinstance(b, OptionInfo):
           return (noflag and 1) or -1
       # then: alphabetical order, case insensitive
       A = a.name.lower()
       B = b.name.lower()
       return (A < B and -1) or (A > B and 1) or 0

   ret.sort(compare)

2to3.py was no help here. It left unchanged any code related to the sort + cmp problem.

But, in fact, the change is quite easy, so I had to agree (so far, at least) with the reasoning behind removing the argument:

ret.sort(key=lambda x: (isinstance(x, OptionInfo), x.name.lower()))
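The following sketch compares the orders produced by the one-liner and by the old comparison. FlagInfo and this OptionInfo are hypothetical stand-ins with only a name attribute, not optmatch's real classes. The old compare runs through functools.cmp_to_key so it works on Python 3.

```python
from functools import cmp_to_key

class FlagInfo:
    """Hypothetical stand-in for a flag; only .name is used."""
    def __init__(self, name):
        self.name = name

class OptionInfo:
    """Hypothetical stand-in for an option; only .name is used."""
    def __init__(self, name):
        self.name = name

def compare(a, b):
    # first: give less weight to options than to flags
    noflag = isinstance(a, OptionInfo)
    if noflag != isinstance(b, OptionInfo):
        return (noflag and 1) or -1
    # then: alphabetical order, case insensitive
    A = a.name.lower()
    B = b.name.lower()
    return (A < B and -1) or (A > B and 1) or 0

def sort_key(x):
    # False (flags) sorts before True (options), then case-insensitive name
    return (isinstance(x, OptionInfo), x.name.lower())

items = [OptionInfo("Output"), FlagInfo("verbose"),
         OptionInfo("input"), FlagInfo("Debug")]
old_order = [x.name for x in sorted(items, key=cmp_to_key(compare))]
new_order = [x.name for x in sorted(items, key=sort_key)]
```

The tuple key works because tuples compare element by element, and False sorts before True.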

Conclusion: better to accept that [].sort no longer has a cmp argument, and be ready to spend some time replacing comparison functions, because the change seems to be for the better.

filter and map return an iterator, not a list

This change seems quite harmless, and it mostly is.

The easiest way to adapt old code that expects a list is to build the list explicitly:

filter(...) -> list(filter(...))
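Below is a minimal sketch of the difference, assuming Python 3. The results of filter and map are lazy iterators, and unlike a list they can be consumed only once. That is another reason to wrap them in list(...) when old code walks the result more than once.

```python
# In Python 3 filter() and map() return lazy iterators
evens = filter(lambda x: x % 2 == 0, range(10))
squares = map(lambda x: x * x, [1, 2, 3])

evens_list = list(evens)      # materialise the results, as Python 2 did
again = list(evens)           # the iterator is now exhausted: []
squares_list = list(squares)  # [1, 4, 9]
```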

Interestingly, 2to3.py replaced my filter calls with list comprehensions. For example:

return self.providedPars or filter(lambda x: x != [], self.provided.values())

It is replaced by:

return self.providedPars or [x for x in list(self.provided.values()) if x != []]

The only problem I have experienced with filter returning an iterator is in idioms where I evaluated the returned value directly as a boolean expression. For example, to check whether any element in a list is an odd number, I would have written:

if filter(lambda x: x%2, lst): #do anything

This code is broken in Python 3. The returned value is an iterator, which always evaluates as True, even when no element passes the filter. 2to3.py replaces the previous code with:

if [x for x in lst if x%2]:

But this means the whole list is computed, which defeats the benefit of filter returning an iterator. As soon as one element is found to be odd, there is no need to examine the rest of the list. That can be written as:

if any(filter(lambda x: x%2, lst)): #do anything
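The following sketch shows both points. The function check_odd is hypothetical. A bare filter object is truthy regardless of its contents, while any stops at the first odd element. The sketch also shows one caveat of any(filter(...)). It tests the matching elements themselves, which is fine for odd numbers (never zero) but not for predicates whose matches can be falsy.

```python
def check_odd(lst):
    """Return (truthiness of bare filter, any() result, elements examined)."""
    examined = []

    def is_odd(x):
        examined.append(x)  # record every element the predicate sees
        return x % 2

    bare = bool(filter(is_odd, lst))  # a filter object is always truthy
    found = any(filter(is_odd, lst))  # stops at the first odd element
    return bare, found, examined

# Caveat: any(filter(...)) tests the matching elements themselves, so a
# match that is falsy (here 0) is missed; any(pred(x) for x in ...) is not
zero_via_filter = any(filter(lambda x: x == 0, [0, 1]))  # False
zero_via_generator = any(x == 0 for x in [0, 1])         # True
```

For example, check_odd([2, 4, 5, 6, 8]) examines only 2, 4 and 5 before any returns True.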

Conclusion: learn the new behaviour and try to take advantage of its performance benefits. And get to know built-in functions like any (available since Python 2.5).

map's function cannot be None

I haven't yet understood the reason for this change. If you have several lists with different numbers of elements, map(None, lst1, ...) behaves like zip(lst1, ...). The difference is that it returns as many tuples as there are elements in the longest list, padding the shorter lists with None.

For example:

a=[1, 2, 3]
b=[10, 20]
map(None, a, b)

In Python 2.x, this returns:

[(1, 10), (2, 20), (3, None)]

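In Python 3, map(None, a, b) itself just returns a lazy map object. You get a TypeError exception as soon as it is iterated, for instance with list(...):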

TypeError: 'NoneType' object is not callable

I found no built-in alternative for this behaviour, so I coded it explicitly:

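   def oldMapNone(*a):
       '''A replacement for map(None, ...), invalid in 3.0 :-( '''
       # length of the longest list
       m = max([len(each) for each in a])
       # pad every list with None up to that length, then zip them together
       return list(zip(* [each + [None] * (m - len(each)) for each in a]))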

Conclusion: why? I presume there must be a reason somewhere, but so far I haven't found it.

Update: I posted this question on the Python forums and quickly got an answer. It was not about the philosophical grounds for the change, but about how to code it now:

Use itertools.zip_longest().

(Thanks to Raymond Hettinger for the answer.)
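The following sketch compares the hand-written helper with itertools.zip_longest on the same example lists. zip_longest pads with None by default and accepts a fillvalue keyword for other padding.

```python
from itertools import zip_longest

def oldMapNone(*a):
    """The hand-written replacement for Python 2's map(None, ...)."""
    m = max([len(each) for each in a])
    return list(zip(*[each + [None] * (m - len(each)) for each in a]))

a = [1, 2, 3]
b = [10, 20]
by_hand = oldMapNone(a, b)
by_itertools = list(zip_longest(a, b))              # pads with None
padded_zero = list(zip_longest(a, b, fillvalue=0))  # custom padding value
```

Unlike oldMapNone, zip_longest accepts arbitrary iterables, not just lists. It also returns an iterator, so it needs list(...) when a list is required.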