Reload Basics

{Rewrite of pp. 267-269 in Learning Python, 2nd ed.}

 

Unlike import and from, reload is a function that is passed an existing module object, not a statement that is given a name.

Because reload expects an object, a module must have been imported successfully before you can reload it.  In fact, if the import failed because of a syntax or other error, you may need to repeat the import before you can reload.  The syntax of import statements and reload() calls also differs: reloads require parentheses, but imports do not.  Here is a simple example:

 

import module     # Initial import

... use module attributes ...

...               # Now, go change the module file.

...

reload(module)    # Get updated attributes.

... use module attributes ...

You typically import a module, then change its source code in a text editor and reload.  When you call reload(), Python executes the updated module file's source code in the module's own namespace [1], adding to memory any new or updated objects, but not removing any objects that were already there.  The old objects are destroyed only when all references to them are gone; they are garbage-collected just like any other objects in Python.  This is an important point, because you may have created references to the old objects without realizing it.  The result is two versions of the same object in memory at once, which can lead to problems.  Here is a more detailed example:

 

## module M1.py

a = 'abc'

b = 'def'

c = 'ghi'

# d = 'xyz'

print 'Hello from M1:', a, b

 

>>> import M1

Hello from M1: abc def

>>> dir()         # see what's here

['M1', '__builtins__', '__doc__', '__name__']

>>> dir(M1)       # see what's there

['__builtins__', '__doc__', '__file__', '__name__', 'a', 'b', 'c']

>>> M1.a, M1.b, M1.c

('abc', 'def', 'ghi')

>>>

>>> # get and set module attributes:

>>> a = M1.a      # new reference in the current namespace

>>> M1.b = a + a

>>> a, M1.b

('abc', 'abcabc')

>>>

>>> # change a = 'ABC', remove c, add d = 'xyz' in module M1.

>>> reload(M1)          # rerun M1 in its own namespace

Hello from M1: ABC def

<module 'M1' from 'M1.py'>

>>> dir(M1)       # surprise – M1.c is still there

['__builtins__', '__doc__', '__file__', '__name__', 'a', 'b', 'c', 'd']

>>> a, M1.a, M1.b, M1.c, M1.d       # surprise – 'a' didn't change

('abc', 'ABC', 'def', 'ghi', 'xyz') # M1.b is no longer 'abcabc'

>>> from M1 import a

>>> a                               # now it is changed

'ABC'

>>>

Perhaps the most confusing thing about reload is that it rebinds only the names in the reloaded module's own namespace (names like "M1.a" above).  It does not change names in other namespaces that still refer to the old objects (names like the plain "a" above, which is not the same name as "a" in module "M1").  You must find every such reference and rebind it with a new assignment in each namespace, using a statement like "a = M1.a" or "from M1 import a".  Each of these statements rebinds the name "a" in the current namespace to the new object, but does not change any objects or references in other namespaces.
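The stale-reference behavior described above can be reproduced in a single script.  This is only a sketch under some assumptions: it uses Python 3, where reload lives in importlib rather than being a builtin, the module name m1_demo is hypothetical, and bytecode caching is switched off so that two quick edits in a row are never served from a stale .pyc file.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True      # avoid stale .pyc files between quick edits

# Create a throwaway module (m1_demo is a made-up name) and import it.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "m1_demo.py")
with open(path, "w") as f:
    f.write("a = 'abc'\n")
sys.path.insert(0, workdir)
importlib.invalidate_caches()
import m1_demo

a = m1_demo.a                       # direct reference in the current namespace

# "Edit" the module source, then reload it.
with open(path, "w") as f:
    f.write("a = 'ABC'\n")
importlib.reload(m1_demo)

stale, fresh = a, m1_demo.a         # the plain name still sees the old object
a = m1_demo.a                       # rebinding by hand picks up the new one
print(stale, fresh, a)              # abc ABC ABC
```

The plain name a only changes when it is assigned again; the reload itself never touches it.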

To summarize what happens with reload(M1):

The module M1 must already have been imported into the current namespace.

The objects defined in M1 are compiled and loaded into memory as new objects. 

The old objects remain in memory until all references to them are gone, and they are removed by the normal garbage-collection process.

The names in the M1 namespace are rebound to the objects produced by re-executing the source.  (An unchanged simple constant may turn out to be the very same object as before; see footnote [2].)  Names that no longer appear in the new source are not deleted; they remain bound to the old objects.

Names in other modules that refer directly to the old objects (without the module-name qualifier) remain unchanged and must be updated in each namespace where they occur.
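The summary above can be checked with a short script.  As before, this is a sketch that assumes Python 3 and importlib.reload; the module name m1_names_demo and the helper write_source are invented for the illustration.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True      # keep reloads reading the source file

def write_source(path, text):
    """Hypothetical helper: overwrite the module file with new source."""
    with open(path, "w") as f:
        f.write(text)

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "m1_names_demo.py")
write_source(path, "a = 'abc'\nb = 'def'\nc = 'ghi'\n")
sys.path.insert(0, workdir)
importlib.invalidate_caches()
mod = importlib.import_module("m1_names_demo")
mod.b = mod.a + mod.a               # runtime change, lost at the next reload

# New source: a is changed, c is removed, d is added.
write_source(path, "a = 'ABC'\nb = 'def'\nd = 'xyz'\n")
result = importlib.reload(mod)

public = sorted(n for n in vars(mod) if not n.startswith("_"))
print(result is mod)                # True: same module object, updated in place
print(public)                       # ['a', 'b', 'c', 'd']: c is still there
print(mod.a, mod.b, mod.c, mod.d)   # ABC def ghi xyz
```

The name c survives because re-executing the source only assigns names; nothing in that process deletes them.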

Use reload() with great caution.  Changing all references to reloaded objects throughout a large program can be tedious and error-prone, and the errors from leaving just one reference to an old object can be subtle.  The safe but slow method is to restart the whole program so as to re-initialize everything after each edit.  Development environments like IDLE do this automatically.

If you are designing a multi-module program in which users may need to reload certain modules, and restarting everything is impractical, then you should avoid any direct references to objects within the modules to be reloaded.  Direct references are created by statements like 'a = M1.a' or 'from M1 import *'.  Always access these variables via their fully-qualified names, like M1.a, and you will avoid leftover references to objects in the old M1.

Background on Reload

Users often ask why reload doesn't just "do what we expect" and update everything.  The fundamental problem is that the current state of objects in a running program can depend on conditions that existed when the object was created, and those conditions may have changed. [2]  Say you have in your reloaded module:

 

class C1:

    def __init__(self, x, y):

        ...

Say you have an object I1 created from an earlier version of class C1.  The current state of I1 depends on the values of x and y at the time I1 was created.  Asking reload to "do what we expect" in this case is asking it to put the object I1 into the state it would be in now, had we made the changes in C1 earlier.  Using a fully-qualified name, you can access objects in the new M1.C1, but this won't solve the "old-instances-still-there" problem.  I1 remains exactly as it was before the reload, including any attributes it might have inherited from a superclass of C1.
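The "old-instances-still-there" problem can be seen directly.  The sketch below assumes Python 3 and importlib.reload; the module name c1_demo and the describe() method are invented for the illustration and are not part of the example above.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True      # keep reloads reading the source file

V1 = '''\
class C1:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def describe(self):
        return "old C1"
'''
V2 = V1.replace("old C1", "new C1")  # the "edited" version of the class

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "c1_demo.py")
with open(path, "w") as f:
    f.write(V1)
sys.path.insert(0, workdir)
importlib.invalidate_caches()
mod = importlib.import_module("c1_demo")

i1 = mod.C1(1, 2)                   # instance of the original class
old_class = mod.C1

with open(path, "w") as f:
    f.write(V2)
importlib.reload(mod)

print(i1.describe())                # old C1: i1 still uses the old class
print(type(i1) is old_class)        # True
print(isinstance(i1, mod.C1))       # False: mod.C1 is a different class now
print(mod.C1(1, 2).describe())      # new C1: only new instances see the edit
```

Note that isinstance() checks against the qualified name start failing for old instances, which is one of the subtler symptoms of a partial update.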

Gotchas

Here are some more examples of subtle problems that can occur after a reload.

 

from math import pi

import Constants

# These copies are made once, when this section of code runs.

h0 = Constants.h0

z1 = Constants.z1; z2 = Constants.z2

p1 = Constants.p1; p2 = Constants.p2; p3 = Constants.p3

def h23(freq):

    s = complex(0, 2*pi*freq)    # s = j*2*pi*freq

    return h0*(s-z1)*(s-z2)/((s-p1)*(s-p2)*(s-p3))

Good programming says we have done the right thing by pre-computing the values of all constants outside of this frequently-called function.  The use of simple local variables also makes the standard "poles and zeroes" expression a lot easier to read.  However, if we change any of the above values and reload the module Constants, we will find our changes have no effect on the result computed by h23().  We need to access these new values inside the function, using their full names, or we need to re-execute this whole section of code after each change in Constants.
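A sketch of the first remedy, using full names inside the function, is shown next to the copied-constants version for comparison.  It assumes Python 3 and importlib.reload; the module name constants_demo and all the numeric values are made up, and s is taken to be j*2*pi*freq.

```python
import importlib
import math
import os
import sys
import tempfile

sys.dont_write_bytecode = True      # keep reloads reading the source file

TEMPLATE = "h0 = {h0}\nz1 = -1.0\nz2 = -2.0\np1 = -3.0\np2 = -4.0\np3 = -5.0\n"

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "constants_demo.py")
with open(path, "w") as f:
    f.write(TEMPLATE.format(h0=1.0))
sys.path.insert(0, workdir)
importlib.invalidate_caches()
consts = importlib.import_module("constants_demo")

# Copies taken once; a later reload does not update them.
h0, z1, z2 = consts.h0, consts.z1, consts.z2
p1, p2, p3 = consts.p1, consts.p2, consts.p3

def h23_copied(freq):
    s = complex(0, 2 * math.pi * freq)
    return h0 * (s - z1) * (s - z2) / ((s - p1) * (s - p2) * (s - p3))

def h23_qualified(freq):
    c = consts                      # attributes are looked up at call time
    s = complex(0, 2 * math.pi * freq)
    return c.h0 * (s - c.z1) * (s - c.z2) / ((s - c.p1) * (s - c.p2) * (s - c.p3))

before = h23_copied(1.0)

with open(path, "w") as f:
    f.write(TEMPLATE.format(h0=2.0))   # double the gain constant
importlib.reload(consts)

copied_after = h23_copied(1.0)         # unchanged: still uses the old h0
qualified_after = h23_qualified(1.0)   # reflects the new h0
print(copied_after == before, abs(qualified_after - 2 * before) < 1e-12)
```

The qualified version pays for a few attribute lookups on every call; whether that cost matters depends on how often the function is called.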

 

## module M1.py

a = 'First Message'

def printer(): print 'Hello from M1:', a

 

>>> import M1

>>> M1pr = M1.printer

>>> M1pr()

Hello from M1: First Message

>>> # Change printer and message ...

>>> reload(M1)

<module 'M1' from 'M1.py'>

>>> M1pr()                     # still using old printer

Hello from M1: Second Message  # surprise – message is changed

>>> M1pr = M1.printer

>>> M1pr()

Hello from second printer: Second Message     # as expected

>>> M1.a = 'Third Message'

>>> M1pr()

Hello from second printer: Third Message

The surprise here is that we didn't expect any change after the reload, because we are still using the old M1pr().  We get the new message because the old printer(), which was defined in module M1, looks up the name a in the M1 dictionary each time it runs; it does not hold on to a particular object in memory.  A reload of M1 rebinds M1.a, so the old printer function prints the new message.  The third test above makes this clear.
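The link between the old function and the module dictionary can be inspected through the function's __globals__ attribute.  This sketch assumes Python 3 and importlib.reload; the module name printer_demo is hypothetical, and printer() returns its message instead of printing it so the results are easy to compare.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True      # keep reloads reading the source file

V1 = "a = 'First Message'\ndef printer():\n    return 'Hello from M1: ' + a\n"
V2 = "a = 'Second Message'\ndef printer():\n    return 'Hello from second printer: ' + a\n"

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "printer_demo.py")
with open(path, "w") as f:
    f.write(V1)
sys.path.insert(0, workdir)
importlib.invalidate_caches()
mod = importlib.import_module("printer_demo")

old_printer = mod.printer           # direct reference to the first function
first = old_printer()

with open(path, "w") as f:
    f.write(V2)
importlib.reload(mod)

second = old_printer()              # old code, but new value of a
newest = mod.printer()              # new code, new value of a

print(first)                        # Hello from M1: First Message
print(second)                       # Hello from M1: Second Message
print(newest)                       # Hello from second printer: Second Message
print(old_printer.__globals__ is vars(mod))   # True: shared module dictionary
print(old_printer is mod.printer)             # False: two function objects
```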

Exercises

Use the id() function to investigate which objects are added to memory, and which are left as-is after a reload.  To set things up, run the 'detailed example' under Reload Basics.  Does id(a) change when you reload(M1)?  What happens to id(M1.a) if you change M1.b and then reload?  Does id(M1) ever change?  Try del sys.modules['M1'] and then reload(M1).  Does id(M1) change now?  Hint: You may have to import M1 again after this last move, since you have fooled the system into thinking M1 does not exist.  Did either id(M1) or id(M1.a) change after the second import?  What must you do to get id(M1.a) to change?

Solutions

id(a) remains the same during a reload of M1.  id(M1.a) also remains the same, as long as you don't change a in module M1.  id(M1) doesn't change unless you brute-force delete M1 from sys.modules and then import it again.  After this radical surgery, id(M1) has changed, but id(M1.a) has not!  You have to actually change a in module M1 to get a new object in memory.
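The module-identity part of this solution can be checked as follows.  The sketch assumes Python 3, where importlib.reload refuses to reload a module that has been removed from sys.modules; the module name id_demo is hypothetical.  It deliberately makes no claim about id(M1.a), since sharing of simple constants is an implementation detail.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True      # keep reloads reading the source file

workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "id_demo.py"), "w") as f:
    f.write("a = 'abc'\n")
sys.path.insert(0, workdir)
importlib.invalidate_caches()
mod = importlib.import_module("id_demo")

same_after_reload = importlib.reload(mod) is mod   # reload updates in place

del sys.modules["id_demo"]          # the "radical surgery"
try:
    importlib.reload(mod)
    reload_error = None
except ImportError as exc:          # the module is no longer registered
    reload_error = exc

fresh = importlib.import_module("id_demo")   # a second import builds a new module
print(same_after_reload)            # True
print(reload_error is not None)     # True
print(fresh is mod)                 # False: a different module object
```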

Footnotes

[1]  reload(M1) is roughly equivalent to the following:

>>> file = open(M1.__file__.rstrip('c'), 'r')

>>> file.seek(0)  # needed if this is not the first reload

>>> exec file in M1.__dict__  # repeat from line 2

>>> file.close()

[2]  There is another, more subtle problem having to do with the way Python stores simple immutable objects.  Rather than keep dozens of copies of every integer or short string, the compiler will often keep just one copy in memory, and point all references from any module to that one copy.  Since these objects are immutable, there is no danger of changes in one module affecting another.  When you change an immutable object by *editing* the source code, you simply get a new pointer to a different object when the module is reloaded.  This works for the one module you edited, but what about references to the old object in other modules?  Many of those need to stay exactly as they are.  Rather than try to figure out which references should or should not be updated after a reload, Python takes a simpler approach: the only references that get updated are the ones in the namespace of the reloaded module.
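The idea in footnote [1] can be sketched in Python 3, where exec is a function and the source is compiled explicitly.  This is only an approximation: the real importlib.reload also locates the module again through the import system and refreshes import-related attributes.  The module name exec_demo is hypothetical.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True      # keep imports reading the source file

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "exec_demo.py")
with open(path, "w") as f:
    f.write("a = 'abc'\nc = 'ghi'\n")
sys.path.insert(0, workdir)
importlib.invalidate_caches()
mod = importlib.import_module("exec_demo")

# "Edit" the source: change a, drop c, add d.
with open(path, "w") as f:
    f.write("a = 'ABC'\nd = 'xyz'\n")

# Re-run the updated source inside the existing module namespace.
with open(mod.__file__) as f:
    source = f.read()
exec(compile(source, mod.__file__, "exec"), mod.__dict__)

print(mod.a, mod.c, mod.d)          # ABC ghi xyz
```

Because the code runs in the existing dictionary, names are only added or rebound, which is why c is still present afterward.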