Time to once again follow in the footsteps of those greater[1] than myself. A quick Google search will turn up all the other blog posts that have been written since Brian D Foy inspired Titus Brown. I made my initial list of five shortly after PyCon, when I first read the post and commentary on Titus’ genesis post. Work and other projects have been taking up all my time, so I never got around to posting it. In the past two weeks each of these five things has come up at work, to the point that I have started working on a patch and a PEP.

1. Decorators are lossy.

UPDATE: Peter Fein mentions his decorator module in the comments, which solves this problem. See that page for a better description of the problem and how to fix it. We need a Py3K PEP to include this functionality in the distribution and to make it function-annotation aware.

More to the point, it is extremely hard to get the function or method declaration information onto the new wrapping function.

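import threading

def threadsafe(func):
    locallock = threading.Lock()
    def _threadsafe(*args, **kwdargs):
        # hold the lock for the whole call and release it even if func raises
        locallock.acquire()
        try:
            return func(*args, **kwdargs)
        finally:
            locallock.release()
    # Copy the docstring by hand. Leave this line out and help() shows no docstring at all.
    _threadsafe.__doc__ = func.__doc__
    return _threadsafe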

@threadsafe
def myfunc(fileobj, nslices,  throw_on_error=True):
    """myfunc does something
    fileobj - must be an actual file object, not a file like object
    nslices  - the number of columns in tabular form. If throw_on_error is True,
                 then this must match exactly the columns in the file.
    A bunch of doctest code
    """
    pass

>>> help(myfunc)

Help on function _threadsafe in module __main__:
_threadsafe(*args, **kwdargs)

>>>

Doc strings are the most obvious problem as they break doctest, but they are the easiest to solve. There was discussion about this on the dev list about a year ago, and a preliminary patch to add a decorator-decorator to deal with most of this, but it was shot down primarily because doc strings were considered the most important trait to transfer and no one really cared about the rest of it.

>>> help(myfunc)

Help on function _threadsafe in module __main__:
_threadsafe(*args, **kwdargs)
    myfunc does something
    fileobj - must be an actual file object, not a file like object
    nslices  - the number of columns in tabular form. If throw_on_error is True,
                 then this must match exactly the columns in the file.
    A bunch of doctest code

>>>

Great... that cleared up everything. The name and the argument list are still those of the wrapper. I have been known to go to extremes to get around this limitation. I have a 900-line utility module just dealing with decorators and issues like this. As I have workarounds I have not raised a big stink about it, but Python 3000 changes everything with function annotations. Imagine now you have something like:

def __import_ex__(name:Sequence[str], caller__name__:str,
                            caller__path__:(Sequence[str]|None)=None) -> object

In Py3K, there is a lot more being lost, and convoluted workarounds will not be acceptable. Once people really start using annotations to get real work done, the need to ‘clone’ a definition will become apparent and someone will come up with a proper solution. Who knows, it might be me.
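
For comparison, here is a minimal sketch of how the same decorator can be written against a present-day Python 3 standard library. It assumes functools.wraps and inspect.signature, neither of which is the 900-line module or the decorator module mentioned above, and the annotations on myfunc are made up for the illustration.

```python
import functools
import inspect
import threading

def threadsafe(func):
    lock = threading.Lock()

    # wraps copies __name__, __doc__ and friends, and records func as __wrapped__
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with lock:  # released automatically, even if func raises
            return func(*args, **kwargs)
    return wrapper

@threadsafe
def myfunc(fileobj, nslices: int, throw_on_error: bool = True):
    """myfunc does something"""
    return nslices

# inspect.signature follows __wrapped__, so the original declaration is reported
print(myfunc.__name__)
print(inspect.signature(myfunc))
```

The wrapper itself still takes *args and **kwargs. Only the introspection tools see the original declaration, so this narrows the gap rather than truly cloning the definition.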

2. sys.exit, Py_Exit, and SystemExit

This is the one which pissed me off for the last time and got me finally writing a patch for what I considered to be a bug. On further reflection it is a PEP. sys.exit and Py_Exit should either be something which terminates the process or be a special python exception which does not result in an error stack if unhandled. Currently it is both and neither. “Py_Exit(3)”, “sys.exit(3)” and “raise SystemExit, 3” are all equivalent. In truth all that sys.exit() does is set the python exception. If there is an unhandled exception and it is the SystemExit exception, all the PyRun_* code will omit the exception printing and call the C library’s exit(). This allows atexit to do its work, garbage collection and some of the extension module stuff to clean up properly, and Py_Finalize to occur. What this also means is that code can catch the SystemExit exception and ignore it. When this happens, nothing happens. You do not have a guaranteed means of, from python, terminating the python process with an exit code, even from the python C/API. Calling Py_Exit() could result in the exception being caught and the interpreter not being shut down. Worse yet, if the exception goes unhandled, control is not passed back to Py_Main and the ‘-i’ option is ignored!

print "sorry... you have no clue where or why the script ended..."
print "No interactive mode for you!"
print "importing this as a module will terminate even IDLE!"
raise SystemExit, 5

Mya@miyu ~
$ python -i exit.py
sorry... you have no clue where or why the script ended...
No interactive mode for you!
importing this as a module will terminate even IDLE!

Mya@miyu ~
$

This really messes up some IDEs and debuggers, and completely screws over people embedding python in another application. None of the PyRun_* calls return if they catch SystemExit; they call exit() instead. Thanks a lot! Either provide a way to call exit() and terminate the process, or have a means of stopping the interpreter cleanly. What we have currently is sometimes one, sometimes the other, many times neither. The PyRun_* calls need to return the exit code and Py_Main needs to deal with it properly. I do not believe there is any reason to call exit() ever.
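
To make the ‘sometimes one, sometimes the other’ complaint concrete, here is a small sketch that runs two child interpreters and compares their exit codes. It assumes a present-day Python 3 (hence raise SystemExit(5) rather than the older comma form), and the SWALLOWED and UNHANDLED snippets are hypothetical stand-ins for real scripts.

```python
import subprocess
import sys

# The child asks to exit with code 3, but a handler swallows the request.
SWALLOWED = """
import sys
try:
    sys.exit(3)
except SystemExit:
    pass
print("still running")
"""

# The child lets SystemExit propagate all the way out.
UNHANDLED = "raise SystemExit(5)"

def run(source):
    """Run source in a fresh interpreter; return (exit code, stripped stdout)."""
    proc = subprocess.run([sys.executable, "-c", source],
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()

swallowed = run(SWALLOWED)
unhandled = run(UNHANDLED)
print(swallowed)   # (0, 'still running'): the requested exit code 3 is lost
print(unhandled)   # (5, ''): the process ends with the requested code
```

In the first child the request to exit simply vanishes and the script carries on, which is the ‘nothing happens’ case described above.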

There is never any time to write patches or get into discussions on these things. Thankfully some of them have caused us enough problems that they are being put on the schedule at work, so they are no longer things I need to spend my ‘spare personal time’ on. Once I get the preliminary Py_Exit PEP done, I will pass it first by the python-users and then the python-dev lists to get feedback and support. If it’s not in before 3.1 that is fine, but I will be in shock if people believe the current behavior is the proper behavior.

3. No Py_Initialize hook

For Py_Finalize we have Py_AtExit and the atexit module, but for people embedding python in another application, we have no Py_AtInit. This might seem silly, but we need a better bootstrap system. To get around this we embed the site module. That’s right, we have the site module as a builtin, and part of what it does is go out to disk and import the real one after our system initialization is done. This is actually a nice way of dealing with things, especially for debugging our code, as the ‘-S’ option will stop the site import. For all its nice side effects, it is still an ugly, ugly hack. There should be an approved and supported way of doing this.

4. .pyc .pyo compile location inflexibility

The compiled python files must be in the same directory as the .py files. You cannot compile them to another directory and have them still refer back to the .py file for proper exception stacks and debugging. This is a major problem for people who like clean build systems. We have taken to doing a custom compile, then moving the file into the proper build directory and running from the .pyc/.pyo files. This requires custom tools, leaves cruft in your source directory if you abort a build at the wrong time, and in general is an ugly hack. For certain network resources it would be nice to have one version of the source files and different .pyc/.pyo files in different directories depending on the version of python. Some custom import hooks can achieve this to a point, but in order for it to work, once again, you need to first compile the file in the .py directory and then move it, so that the .pyc/.pyo has the proper .py file listed in its header. This means you can’t have your source release directory read-only. And no, precompiling is not always an option. Jython had this ability back in 2000, but we don’t have it in any other python implementation that I know of.

A simple change to compile() should do the trick. Between that and the work Brett Cannon is doing on the import system, it should be trivial to implement any inane import/compile customization one could ever want to hang themselves with.
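
As a rough illustration of the goal, here is a sketch using py_compile.compile with its cfile argument in a present-day Python 3 standard library. The src/build layout and the module name mod are made up for the example, and it only covers the compile step, not teaching the import system where to look.

```python
import importlib.machinery
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as root:
    src_dir = os.path.join(root, "src")      # hypothetical source tree
    build_dir = os.path.join(root, "build")  # hypothetical build output
    os.makedirs(src_dir)
    os.makedirs(build_dir)

    source_path = os.path.join(src_dir, "mod.py")
    with open(source_path, "w") as handle:
        handle.write("def answer():\n    return 42\n")

    # cfile sends the bytecode to the build directory, not next to the source
    compiled_path = py_compile.compile(
        source_path, cfile=os.path.join(build_dir, "mod.pyc"), doraise=True)

    # read the bytecode back and check which source file it points at
    loader = importlib.machinery.SourcelessFileLoader("mod", compiled_path)
    recorded_source = loader.get_code("mod").co_filename
    left_in_src = os.listdir(src_dir)

print(recorded_source == source_path)  # the .pyc still refers back to the .py
print(left_in_src)                     # nothing but the source is left in src
```

Getting the interpreter to import from the build directory is a separate step that this sketch leaves out.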

5. No .so/.dll zip import

This is not really true: there was .so importing from zips at one point, but it had issues. There was a patch for .dll imports from a ‘compressed directory’, but again there were issues, and no Mac support. Being able to package everything up into a single zip file would be very, very nice, and would help enforce that the proper library was loaded with the proper source. Currently if you use a zip with a private interface (_sre.so is an example of this), you need two entries on the python path, one for the zip and one for the shared libraries. This can cause problems. I can’t count the number of times I have had to add ‘-vv’ to the command line options of some experiment to diagnose some bizarre behavior that was due to loading the wrong .so for the parent .py.

Relative imports will solve the vast majority of the problems, but it would still be nice to have it all in one package, as sometimes people forget to copy the .so’s as well.
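
For reference, the half that does work can be sketched as follows: a pure python module imported straight out of a zip placed on sys.path. The archive name bundle.zip and the module zipped_mod are made up for the example, and the sketch assumes a present-day Python 3. A compiled .so or .dll dropped into the same archive would not be picked up this way, which is why the second path entry is needed.

```python
import importlib
import os
import sys
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as root:
    archive = os.path.join(root, "bundle.zip")  # hypothetical archive name
    with zipfile.ZipFile(archive, "w") as bundle:
        bundle.writestr("zipped_mod.py", "VALUE = 'from the zip'\n")

    # the zip itself goes on the path; shared libraries would need their own entry
    sys.path.insert(0, archive)
    try:
        module = importlib.import_module("zipped_mod")
        origin = module.__file__
        value = module.VALUE
    finally:
        # undo the path and module cache changes made for the demonstration
        sys.path.remove(archive)
        sys.modules.pop("zipped_mod", None)

print(value)
print(os.path.basename(origin))
```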

Conclusion

Everything here has to do with embedding and extending python for custom environments. I would be interested in hearing from other people who do this. This is a small part of my day job and, oddly enough, one of the least interesting. These are the things that are painful; everything else python related (except SWIG) just works and thus goes unnoticed. The really interesting stuff I can’t talk about. It sucks working on mind-blowing technology and not being able to talk about it, but it’s better than the alternative: talking about working on uninteresting, boring junk. I guess I just have to give up on sleep.

1. This is not self-deprecation. There is no doubt that their personal work has had a greater impact on the wider open source community.