Code Comments

It is widely believed that code without comments is shoddy and irresponsible for the long-term maintainability of the code. A good engineer documents their code faithfully so that future engineers can fully understand the code and its intentions.

It is also widely believed that writing comments is irrelevant because the code itself should be well written and the names should be clear. A good engineer must be competent at both reading and writing code, understanding common idioms and translating them into a complete picture of the design requirements.

The premise of both views is that clarity of design is important to the future maintenance of the code. I agree with that premise, and I believe it can be argued well in support of either claim. But I have a question: if a developer cannot write code with clear method names, variable names, and database identifiers that elucidate the reasons behind the code, why on earth would we believe that the same developer could accurately describe the intent of the code in natural language prose? Writing is writing; the skills for clear code are indistinguishable from the skills for clear prose.

Author: Joel B. Mohler
Published on: Aug 15, 2015, 12:40:27 PM
Tags: code, engineering
Comments - Permalink - Source code

Python Services from IIS

With a new job came some different technologies for me. While I have plenty of latitude to make my own decisions about technologies, I chose to run with some existing technologies within the company. One of those choices was to use IIS on the Windows server. The immediate question was how to serve a Python web service from IIS. Sure enough, I found the isapi-wsgi library fairly quickly. I had no experience whatsoever with ISAPI, but a bit of reading revealed that it solved the problem of CGI script startup time by keeping the server components in memory (well, OK, that last sentence may reveal how antiquated my web server authoring experience really is).

Anyhow, I used the opportunity to consider the fact that here was a technology I knew basically nothing about, but about which I needed to become an expert. Specifically, I needed to know enough to put it in place in an environment that would have in excess of 100 users and become mission critical. I documented a few questions I had right off the bat.

A first major hurdle was that the project at https://code.google.com/p/isapi-wsgi/ seems to be simply dead. Nobody is on the mailing list and the last release is about four years old. It's ludicrous to even think of using that in production! Well, I am. Perhaps it is ludicrous, but it's been working well.

I read up on WSGI from the perspective of a long-time Python developer. It's a very simple protocol for how a web service should interact with the environment in which it is served. That is, it basically provides one function with a simple signature which is called whenever there is an incoming HTTP request. It turns out that all the major Python web frameworks support it -- bottle, flask, django, and cherrypy all expose the WSGI API.
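To make that concrete, here is a minimal sketch of a WSGI application -- the single function with a simple signature that the protocol requires. The greeting text is just an illustration:

```python
# A minimal WSGI application.  Any WSGI-compliant server (including an
# ISAPI adapter) can serve a callable with exactly this signature.
def application(environ, start_response):
    # environ is a dict of CGI-style request variables (PATH_INFO, etc.);
    # start_response communicates the status line and response headers.
    body = b"Hello from WSGI"
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    # The return value is an iterable of bytes chunks.
    return [body]
```

Frameworks like bottle and flask ultimately hand the server a callable of exactly this shape, which is why any of them can sit behind any WSGI server.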

On (now older) IIS, ISAPI was the protocol that defined a service to be added to IIS. So all that is necessary is to wrap a WSGI service with an adapter that allows it to be called from ISAPI. Well, it turns out that Mark Hammond already did that (seriously, if you are writing Python on Windows, that is probably true every single day!) -- http://sourceforge.net/projects/pywin32/.

So, if pywin32 has an ISAPI module (it does), what good is the isapi-wsgi project? Not much, actually. After reading the pretty small amount of code, it turns out to be a good example usage of the pywin32 ISAPI module plus an installation script.

There were two main obstacles to turning this into a production setup. Troubleshooting this stuff is horrible. The error messages were cryptic ... if you got a message at all. It was a pretty miserable one to two days of thinking this was never going to work. It turns out that most of that came from the second obstacle: isapi-wsgi wasn't ready for Python 3.4; the last release seems to target 2.7. Since the code is small and pywin32 exists for Python 3.4, this was easily resolved and I published my work at https://github.com/jbmohler/isapi-wsgi . The documentation should be updated and improved. Maybe I'll get to that.

The issue of cryptic error messages has been resolved. The tricky things are all configured correctly and I've learned a few more details about the installation process, but I'm still not looking forward to configuring this on a new server. The main thing I finally caught on to was:

python -m win32traceutil

Run that in a console and you'll get tracebacks every time a web request call fails.

Is IIS overkill? Well, it was already running on that server, it now works, and the performance is good. Just this week I found an issue and resolved it using IIS diagnostics, but that's a story for another post!


Author: Joel B. Mohler
Published on: Mar 7, 2015, 9:39:54 AM - Modified on: Mar 7, 2015, 9:41:54 AM
Tags: python, windows, web-servers
Comments - Permalink - Source code

Parse & Evaluator

Around the time I was a senior in high school, I had my first fascination with language design and found a neat algorithm by Edsger Dijkstra -- the shunting-yard algorithm -- to convert a mathematical expression to reverse polish notation. This post gives a very basic implementation of this algorithm in Python. It does not do complete error checking for malformed expressions, nor does it handle unary negation.

The algorithm transforms strings like 3+5*2 into token lists like (3 5 2 * +). Such a token list can be evaluated very simply with a stack-based algorithm. The code below shows the tokenization of the string, the shunting-yard conversion, and finally the evaluation of the reverse polish token list.

Here is some front matter for the main program.

import re
import operator

class ParseError(Exception):
    def __init__(self, msg, offset):
        super(ParseError, self).__init__(msg)
        self.offset = offset

The first step is to tokenize the input string. Tokenizing with regular expressions is not particularly efficient; a more performant mechanism is to use a tool like flex to generate a tokenizer function from a description of the tokens. For this illustration, though, regular expressions keep the code short.

RE_OPERATOR = re.compile(r'[-+*/^]')
RE_OPEN = re.compile(r'[(]')
RE_CLOSE = re.compile(r'[)]')
RE_NUMBER = re.compile(r'[0-9]+(\.[0-9]+|)')

TOKENS = [
        (RE_OPERATOR, 'operator', lambda x: x),
        (RE_OPEN, 'open', lambda x: x),
        (RE_CLOSE, 'close', lambda x: x),
        (RE_NUMBER, 'constant', float)]

def tokenize(s):
    index = 0
    # Bracket the whole expression in an implicit pair of parentheses so
    # the conversion step always has an 'open' token on its stack.
    yield '', 'open', index
    while index < len(s):
        for t, kind, converter in TOKENS:
            m = t.match(s, index)
            if m is not None:
                index += len(m.group(0))
                yield converter(m.group(0)), kind, index
                break
        else:
            raise ParseError('invalid token', index)
    yield '', 'close', index

The next step is Dijkstra's shunting-yard algorithm, which re-orders the tokens into reverse polish notation. This removes all parentheses.

PRIORITY = {
        '^': 3,
        '*': 2,
        '/': 2,
        '+': 1,
        '-': 1}

def to_reverse_polish(infix):
    # In Dijkstra's conversion algorithm, the train goes from New York to
    # California.  Cars (tokens) which are not constants take a detour to
    # Texas.
    new_york = infix
    california = []
    texas = []
    for n in new_york:
        if n[1] == 'constant':
            california.append(n)
        elif n[1] == 'open':
            texas.append(n)
        elif n[1] == 'close':
            while True:
                t = texas.pop()
                if t[1] == 'open':
                    break
                california.append(t)
        elif n[1] == 'operator':
            # Pop operators of strictly higher priority; equal priority
            # also pops for the left-associative operators (everything
            # except the right-associative '^').
            while True:
                t = texas[-1]
                if t[1] == 'operator' and (
                        PRIORITY[t[0]] > PRIORITY[n[0]] or
                        (PRIORITY[t[0]] == PRIORITY[n[0]] and n[0] != '^')):
                    california.append(texas.pop())
                else:
                    break
            texas.append(n)
        else:
            raise ParseError('unrecognized token', n[2])
    return california

Finally, the reverse polish evaluator itself is very simple.

OPERATORS = {
        '^': (2, operator.pow),
        '/': (2, operator.truediv),
        '*': (2, operator.mul),
        '+': (2, operator.add),
        '-': (2, operator.sub)}

def evaluate(postfix):
    stack = []

    for v in postfix:
        if v[1] == 'operator':
            o = OPERATORS[v[0]]
            portion = stack[-o[0]:]
            args = tuple([s[0] for s in portion])
            del stack[-o[0]:]
            result = o[1](*args)
            stack.append((result, 'constant', portion[0][2]))
        else:
            stack.append(v)
    return stack[0][0]

All this can be put together by:

assert 8.0 == evaluate(to_reverse_polish(tokenize('3+5')))
assert 13.0 == evaluate(to_reverse_polish(tokenize('3+5*2')))

Of course, in Python this could all be replaced by a one-liner.

assert 8.0 == eval('3+5')

The implementation shown here is in the source file.

Author: Joel B. Mohler
Published on: Jan 17, 2015, 2:06:30 PM
Tags: algorithms, python
Comments - Permalink - Source code

Virtual Environments

Virtual environments in Python provide a Python-based approach to sandboxing the development and deployment of Python applications. I've always had mixed feelings about them because sandboxing applications shouldn't be a function of the user space of a programming language -- it seems obvious it should be a feature of the operating system. However, the integration with Python's tooling means that virtual environments seem to me to be feature rich compared to their minimal complexity. Very little of the infrastructure is magic -- it's just Python packaging and path work.
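That "just path work" claim is easy to verify from inside an activated environment. This sketch simply prints the interpreter paths; the ~/venvblog path shown is the environment created later in this post:

```python
import sys

# Inside an activated virtualenv, sys.prefix points at the environment
# directory (e.g. /home/joel/venvblog) rather than the system Python,
# and sys.executable is the environment's own python binary.
print(sys.prefix)
print(sys.executable)
```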

As I begin a new job in what will probably be a fairly pure Python environment (since I get to choose), I'm fiddling with virtual environments in more earnest than ever before. Here's my experience getting them going on Ubuntu 14.04. One of my frustrations with managing Ubuntu is figuring out whether it is better to use Python's pip installer or apt-get for packages available both on PyPI and in Ubuntu's package manager. My ideal answer so far: use apt-get for everything going into the system Python, use a virtual environment for every specific Python application I care about, and use pip inside the virtual environments.

The following transcript shows the way from a vanilla Ubuntu install to using a virtual environment. The transcripts shown here may not be exactly complete, especially in regards to Ubuntu packaged dependencies (so I might update this or write a new post which is totally, pedantically correct). One beautiful feature of the snippets below is that nothing requires root access aside from the once-and-done installation of python-virtualenv.

$ sudo apt-get install python-virtualenv
< ... output snipped ... >
$ virtualenv ~/venvblog
New python executable in /home/joel/venvblog/bin/python
Installing setuptools, pip...done.
$ source ~/venvblog/bin/activate
(venvblog)$ pip install <desired-package-name-here>

The applications I cared about (at the moment, my qtalchemy based applications) have numerous dependencies. This new virtual environment has nothing, so I have a lot of packages to pip install to get a running application. This is where pip's dependency resolution and requirements files become relevant. I'll show an example with my personal finance tracker pyhacc.

(venvblog)$ cat pyhacc-requirements.txt
sphinx
pyside
sqlalchemy
pyhacc
(venvblog)$ pip install -r pyhacc-requirements.txt
< ... long install output snipped ... >
(venvblog)$ pyhaccgui --conn=sqlite://
< ... view gui application with demo data ... >

To exit the virtual environment, either close the shell directly with exit or run deactivate to return to the system Python.

(venvblog)$ deactivate
$
Author: Joel B. Mohler
Published on: Jan 16, 2015, 2:55:54 PM
Tags: development, python
Comments - Permalink - Source code

A Million Dollar Program

Some people use MS Word to write documents; some people use a markup language such as Markdown, LaTeX, or HTML. Both approaches have advantages, but they don't integrate well. In my experience, the people who use the respective tools would prefer not to use the other, with varying degrees of refusal. How can we integrate people with varying degrees of comfort with markup languages? Why are the WYSIWYG editors for the formatting languages all broken?

In this post, I'm going to focus simply on the pure text editing crowd and their markup languages.

We need a program that will edit markup-formatted documents in a simple tool. I claim such a thing would cost a million dollars. In fact, I claim such a thing will never exist. Why not? To claim a thing will never exist requires a fairly good description of what that thing is. Therein lies a great deal of complication, and I'm not going to live up to that high bar of description in this post.

To me, some of the hallmarks of markup languages are:

One can conceive of a simple editor that writes complex documents to plain text file formats. For LaTeX, the LyX editor is a good example. However, such editors often introduce spurious differences in the plain text files. Furthermore, if such a program is to deal with embedded media like pictures, it is impossible to hide from the end user that multiple files are necessary for the document. It is my opinion that LyX failed the test of crucial simplicity compared to (say) MS Word, with its one file embedding pictures and text. I love the appeal of simply plopping pictures into a document in a GUI, saving the file, and knowing it is internally consistent. It is clearly impossible to embed pictures in plain text. However, it's just as plainly impossible to have a simple revision control program display the differences between two versions of an MS Word document.

There are, in my opinion, other crucial things that are impossible with binary file formats. There's probably an API for whatever format, but it is unquestionably more complicated than writing text to standard out. This means that once you know a markup language, it is trivial to write software in any language on any platform that generates documents in that format. This extends to the use of templating languages -- e.g. generating a LaTeX document with mako templates. But if I do any of these generation tricks, it becomes downright impossible to conceive of a GUI that edits the markup and deals with the templating correctly.
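As a tiny illustration of how trivial document generation becomes with plain-text formats, here is a sketch that builds a Markdown fragment by writing plain text (the inventory data is hypothetical):

```python
# Generate a Markdown document from plain data by writing text --
# no format-specific API required, just strings to standard out.
rows = [("apples", 12), ("pears", 7)]

lines = ["# Inventory", ""]
for name, count in rows:
    lines.append("- %s: %d" % (name, count))
doc = "\n".join(lines)
print(doc)
```

Doing the equivalent against a binary word-processor format would require a format-specific library; here it is three lines of string handling in any language.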

Confession: I use office software like MS Office or LibreOffice only as a last resort, and I feel constrained and dirty every time I do. Such content always feels trapped to me. Is it ever the right tool for the job? Some days it is, and I need to get over feeling 'dirty' ... how juvenile! Generally, I think it is appropriate as a terminal format: I use it once for presentation as a throw-away document. If a document is going to last more than a month, I'm better served by a document that I can remix, version control, and template.

Confession 2: This blog post has been ridiculously hard to write. I don't believe you can write a program or file format that has all the benefits of plain text and all the ease and convenience of MS Office, but giving definition to the vagueness of this spec was difficult.

Author: Joel B. Mohler
Published on: Jan 2, 2015, 4:38:22 PM
Tags: documents
Comments - Permalink - Source code

Why PEP8?

In the Python world, the PEP8 style guide is the canonical way to format code. It specifies such ridiculously detailed things as spacing around operators, conventions for function names and even argument names, and import order. Why would such pedantry be respected? And what would anyone do with it?

Everybody has opinions about the way their code should be formatted. I find that mine vary from language to language, and I try to fit in with the code around the new code I'm writing. In spite of trying to fit the ambient style, my opinions about code formatting often sneak in. In a project with multiple coders (e.g. long-term colleagues), I find that it's often not too difficult to tell who wrote what code. I believe that software projects should be concerned about gross lack of uniformity.

Why? One of my favorite pet reasons for concern is that non-uniform code leads to spurious changes in revision control. (And, yes, I care very much about whitespace changes in a commit log!) When developers make formatting changes alongside other real changes, they must make a choice: either do a double commit, with one commit for formatting and one for the actual functionality change, or make a single commit mixing formatting and functionality changes. Either choice makes revision annotations more cumbersome than they need to be. A code standard that speaks to pedantry means everybody is using the same format, and that means fewer purely format-related changes.

There are other obvious reasons to care about uniformity -- that consistent name choices make libraries much more discoverable is one of the most crucial.

I've finally found a way to make PEP8 a reality in my day-to-day code. I dabbled with pylint various times in the past, but it produced way too much chatter to be usable. If a check like this isn't fairly thoughtless, it won't happen like it should -- and it should happen before every commit. My pylint strategy is to fine-tune the style rules in the rc file and quiet down some of the most obnoxious warnings. The next part of my strategy is a wrapper script which reads the output of pylint and further prunes errors in a more context-sensitive way.

One example of context-sensitive pruning is that Python's '*' imports are to be avoided. However, I actually use them a lot in one specific context -- __init__ files. With my wrapper script I can still warn about '*' imports where they shouldn't be, while allowing them in __init__ modules without pylint pragmas.
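A minimal sketch of that kind of pruning might look like the following. The function name is hypothetical, and it assumes pylint's default 'path:line:...' message format, with W0401 being pylint's code for a wildcard import:

```python
import re

def prune_pylint_output(lines):
    # Keep every pylint message except wildcard-import warnings (W0401)
    # that occur in __init__.py files, where '*' imports are deliberate.
    keep = []
    for line in lines:
        m = re.match(r'([^:]+):\d+', line)
        if m and m.group(1).endswith('__init__.py') and 'W0401' in line:
            continue
        keep.append(line)
    return keep
```

The real script would read pylint's output from a pipe and exit non-zero if anything survives the pruning, making it suitable as a pre-commit check.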

Maybe someday I'll be able to post my codecheck.py wrapper script.

Author: Joel B. Mohler
Published on: Dec 11, 2014, 4:16:52 AM
Tags: code, python
Comments - Permalink - Source code

Inplace Swapping Variables

Long ago, my high school computer science teacher posed the problem of swapping two variables without the use of a temporary variable. I figured it out on the ride home from school. One of the primary sticking points was that many of the interim solutions I found imposed ridiculous constraints on the values. The solution I post here clearly works for "small enough" integers.

So long as your data type has a well behaved addition and the values do not overflow, the following snippet of Python code illustrates the technique:

# swap the values in variables x & y
x = x + y
y = x - y # x+y - y = x
x = x - y # x+y - (x+y-y) = y

The 'x' and 'y' used in the trailing comments are to be understood as the original values, and the spacing is suggestive.
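Wrapped in a function (the name is mine, purely for illustration), the trick can be sanity-checked directly:

```python
def swap_by_addition(x, y):
    # Swap two numeric values without a temporary variable.
    x = x + y   # x now holds the sum
    y = x - y   # sum minus original y leaves original x
    x = x - y   # sum minus original x leaves original y
    return x, y

print(swap_by_addition(3, 7))  # prints (7, 3)
```

Since Python integers are arbitrary precision, the overflow caveat only bites in languages with fixed-width integer types.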

Author: Joel B. Mohler
Published on: Dec 5, 2014, 3:11:47 AM
Comments - Permalink - Source code