Friday, August 6, 2010

Python annoyance - scoping

I recently tried writing a small program in Python, and I have to give it credit here - it was pretty easy to get started, using the online language reference and standard library reference for help.

One thing I stumbled over was a subtlety in Python's scoping rules that I didn't expect. Here's an example:
foo = 123
def fun():
    print(foo)
fun()
print(foo)
This prints 123 and 123, as you would expect. This implies that function blocks can access variables in outer blocks. Let's try another example:
foo = 123
def fun():
    foo = 456
    print(foo)
fun()
print(foo)
This prints 456 and 123. This means that assigning to a variable inside a function doesn't change the outer variable. Instead, it creates a new variable local to the function block. Let's look at the final example:
foo = 123
def fun():
    print(foo)
    foo = 456
    print(foo)
fun()
print(foo)
I can think of two reasonable ways to interpret this:
  1. First print 123 (from the outer block), then 456 (creating a new variable in the function block), then 123 (outer block variable unchanged)
  2. First print 123 (from the outer block), then 456 (modifying the outer variable), then 456.
Instead, calling fun() raises an error at its very first line:
UnboundLocalError: local variable 'foo' referenced before assignment
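This means that Python inspects the entire function body when it compiles the function, looking for assignments. If it finds an assignment to a name anywhere in the function, that name is local for the whole function, including the lines before the assignment. If it doesn't, the name refers to the outer variable.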
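You can watch this decision being made before anything runs. In CPython, each function's code object lists the names the compiler classified as local in __code__.co_varnames. The sketch below assumes CPython 3; the function names reads_only and reads_then_assigns are made up for the example.

```python
foo = 123

def reads_only():
    print(foo)          # no assignment anywhere, so foo is looked up in the outer scope

def reads_then_assigns():
    print(foo)          # foo is already classified as local here
    foo = 456

# The local/outer decision is recorded at compile time, before either call.
print(reads_only.__code__.co_varnames)          # ()
print(reads_then_assigns.__code__.co_varnames)  # ('foo',)

reads_only()            # prints 123

try:
    reads_then_assigns()
except UnboundLocalError as err:
    # The exact wording of the message differs between Python versions.
    print("UnboundLocalError:", err)
```

UnboundLocalError is a subclass of NameError, so code that catches NameError will also catch it.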

You can override this behaviour by doing this:
foo = 123
def fun():
    global foo
    print(foo)
    foo = 456
    print(foo)
fun()
print(foo)
This forces Python to refer to the outer (module-level) variable instead of creating a new local one. This prints 123, 456 and 456.
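global only reaches module-level variables. When the outer variable belongs to an enclosing function instead, the same assignment rule applies, and Python 3 provides the nonlocal keyword (PEP 3104) as the counterpart to global. A minimal sketch; outer, bump and outer_without_nonlocal are hypothetical names chosen for illustration.

```python
def outer():
    count = 0

    def bump():
        nonlocal count   # rebind outer()'s count instead of creating a local one
        count += 1

    bump()
    bump()
    return count


def outer_without_nonlocal():
    count = 0

    def bump():
        count += 1       # the assignment makes count local to bump()

    bump()               # raises UnboundLocalError
    return count


print(outer())  # 2

try:
    outer_without_nonlocal()
except UnboundLocalError as err:
    print("UnboundLocalError:", err)
```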

The annoyance is that Python's automatic choice between "read outer" and "create new local" comes from inspecting the entire function. An assignment at the end of the function can therefore cause an error at the first line of the function, which can be very confusing for a beginner.
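The assignment does not even have to execute. In this sketch, which assumes CPython 3 and uses a made-up parameter named update, the assignment sits in a branch that is skipped on this call, yet it still makes foo local, so the first line fails.

```python
foo = 123

def fun(update=False):
    print(foo)        # raises UnboundLocalError on every call
    if update:
        foo = 456     # skipped when update is False, but foo is still local


try:
    fun()
except UnboundLocalError:
    print("fun() failed although the assignment never ran")
```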

I think a better solution would have been to refer to the outer variable by default, except for loop variables and function parameters. You could then use a keyword such as "local", "my", "var" or "val" when you really want to create a new local variable (as Lua, Perl, JavaScript, Scala, etc. do).
