Python and tricky function arguments

Python has allowed me to write several useful pieces of software in drastically fewer lines than some other languages would require. My first programming steps were in Java, during the Introduction to Programming course at the Faculty of Mathematics and Physics, Ljubljana. Java and I didn't get along well. I enjoyed the course, but writing Java code didn't really inspire me. In the second year, however, our tutors switched to Python. Now that felt pretty good. I felt much more comfortable writing small scripts in Python than in Java, and I used Python and its ecosystem in other courses as well. Python code, however, sometimes behaves in a way that seems buggy at first, unless you know what is happening.

A "bug"

Just yesterday, two colleagues looked into the strange behaviour of a piece of code that had been written, in a hurry, by our best programmer at the time. It went something like this:

In [1]:
class Pipeline(object):
    def __init__(self, callable_list):
        self.callable_list = callable_list
        
    def __call__(self, text, metadata={}):
        for func in self.callable_list:
            text, metadata = func(text, metadata)
        
        return text, metadata

It is a callable class that we instantiate with a list of callables. Calling the instance applies them in turn to some text and to a metadata dictionary that carries additional info about the text.

Now we need a function we can put into the Pipeline. One such function could be:

In [2]:
from collections import defaultdict
def character_counter(text, metadata):
    """
    Counts the characters in the text and 
    stores them into metadata
    """
    characters = defaultdict(int)
    for char in text:
        characters[char] += 1
    for key, value in characters.items():
        metadata[key] = value
    return text, metadata

Great, we are all set now to use our pipeline.

In [3]:
pipeline = Pipeline([
    character_counter
])
In [4]:
pipeline('aaa')
Out[4]:
('aaa', {'a': 3})

Great, this seems OK. What about another example:

In [5]:
pipeline('bbb')
Out[5]:
('bbb', {'a': 3, 'b': 3})

This seems strange. Where have the three 'a's come from?

A new empty dictionary is created once, when the Pipeline class (and with it the __call__ method) is defined, and that same dictionary is used in each successive call that does not pass metadata explicitly.

Python’s default arguments are evaluated once when the function/method is defined and not each time the function/method is called. This means that if you use a mutable default argument and mutate it, you end up with the mutated object for all future function/method calls as well.
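To see this in isolation, here is a minimal sketch. The function `append_item` is a made-up name for illustration and is not part of the pipeline code above. The default list lives on the function object itself, in its `__defaults__` attribute, so every call without a second argument touches the same list.

```python
def append_item(item, bucket=[]):
    # The default list is created once, when the def statement runs,
    # and not on every call.
    bucket.append(item)
    return bucket


first = append_item(1)
second = append_item(2)

print(first is second)           # True: both calls returned the same list
print(append_item.__defaults__)  # ([1, 2],)
```

Passing a list explicitly, as in `append_item(3, [])`, sidesteps the shared default, which is why such bugs often stay hidden until somebody relies on the default.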

Mutable vs. immutable objects in Python

The value of a mutable object can change, while the value of an immutable object cannot.

In [6]:
mutable_dict = {"a": 1}
In [7]:
id(mutable_dict)
Out[7]:
139653346954880
In [8]:
mutable_dict['b'] = 2
In [9]:
id(mutable_dict)
Out[9]:
139653346954880

So the object stayed the same, but its value has changed. Now for the immutable:

In [10]:
immutable_number = 3.14
id(immutable_number)
Out[10]:
139653347182224
In [11]:
immutable_number = 2.7
id(immutable_number)
Out[11]:
139653347183472

The name now refers to the new value, but that value is a different object; the original float was never changed. Common mutable data types include dictionaries, lists and sets, while floats, integers, strings, tuples and None belong to the immutable camp.
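The difference between mutating an object and rebinding a name is easiest to see with two names that point at the same object. The following is a small sketch with variable names chosen only for illustration.

```python
numbers = [1, 2]
alias = numbers        # two names, one list object
alias.append(3)        # mutation: visible through both names
print(numbers)         # [1, 2, 3]

pi = 3.14
other = pi             # two names, one float object
other = 2.7            # rebinding: only `other` now points elsewhere
print(pi)              # 3.14

point = (1, 2)
try:
    point[0] = 10      # tuples do not support item assignment
except TypeError:
    tuple_is_immutable = True
print(tuple_is_immutable)  # True
```

This is what happens with the default `metadata` dictionary: the functions in the pipeline mutate it in place, so the change is visible to whoever holds a reference to it, including the next call.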

Solution

Now back to our bug.

The lesson is to always use immutable objects as default values for function/method arguments and to create the mutable object later, inside the body. None is often used to signal an empty default value. We should therefore rewrite our Pipeline class in the following way:

In [12]:
class Pipeline(object):
    def __init__(self, callable_list):
        self.callable_list = callable_list
        
    def __call__(self, text, metadata=None):
        # - metadata={}
        # + metadata=None
        if metadata is None:   # new line
            metadata = {}      # new line
        for func in self.callable_list:
            text, metadata = func(text, metadata)
        
        return text, metadata

Does it work?

In [13]:
pipeline = Pipeline([
    character_counter
])
In [14]:
pipeline('aaa')
Out[14]:
('aaa', {'a': 3})
In [15]:
pipeline('bbb')
Out[15]:
('bbb', {'b': 3})

It does!

Exceptions

While this is usually what you want, sometimes you may want to keep the previous state between calls, for example when you want to use memoization (caching). A full treatment is a story for some other time.
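Just as a taste, here is a minimal sketch of deliberately using a mutable default as a cache. The function `fibonacci` and the parameter name `_cache` are my own choices for illustration; the leading underscore is only a convention hinting that callers should not pass it.

```python
def fibonacci(n, _cache={0: 0, 1: 1}):
    # The same dictionary is shared by all calls, so results computed
    # once are remembered for every later call.
    if n not in _cache:
        _cache[n] = fibonacci(n - 1) + fibonacci(n - 2)
    return _cache[n]


print(fibonacci(30))  # 832040
```

In real code I would probably reach for `functools.lru_cache` from the standard library instead, since it makes the caching explicit rather than hiding it in the signature.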
