Friday, November 27, 2009

Injecting attributes into Python modules

One of the things I don't like about frameworks like Django and Pyramid is the amount of boilerplate imports you end up having at the beginning of modules, especially those modules that are more like configuration files. It spoils the DSL-like nature of them, and I was interested in finding a way to be able to import a module with certain attributes already defined. The __import__ function doesn't allow you to do this, because the locals argument is ignored, so I looked for another way. Before I describe the method I came up with, here's how you might use it:
import elixir
from inject import module_inject
module_inject('myapp.models', elixir)
import myapp.models
PEP 302 describes the import hooks that have been available since Python 2.3, and defines an import protocol. By adding an object with find_module and load_module methods to sys.meta_path, you can get hooked into the import process. find_module is called with the module name to see if an object knows how to load it. load_module is then called to do the actual loading. The class below implements both of those methods.
import imp
import sys

class InjectionLoader(object):
    def __init__(self, name, dicts):
        self.name = name
        self.dicts = dicts

    def find_module(self, fullname, path=None):
        # Only claim the one module we were asked to inject into
        if fullname == self.name:
            return self

    def load_module(self, fullname):
        # Get the leaf module name and the directory it should be found in
        if '.' in fullname:
            package, leaf = fullname.rsplit('.', 1)
            path = sys.modules[package].__path__
        else:
            leaf = fullname
            path = None

        # Open the module file
        file, filename, description = imp.find_module(leaf, path)

        # Get the existing module or create a new one (for reload to work)
        module = sys.modules.setdefault(fullname, imp.new_module(fullname))
        module.__file__ = filename
        module.__loader__ = self
        try:
            code = compile(file.read(), filename, 'exec')
        finally:
            file.close()

        # Populate the module namespace with the injected attributes
        for d in self.dicts:
            module.__dict__.update(d)

        # Finally execute the module with its injected attributes
        eval(code, module.__dict__)
        return module
It's instantiated with the module name it's injecting to, and the dicts it is injecting. To make it easier to use, I wrote a helper function, module_inject. It takes a module name, and one or more dicts or modules. Dicts are injected as-is. Modules have their __dict__s injected, but only those attributes listed in the module's __all__ attribute, or if that isn't present then only those that don't begin with a double underscore, are used. This is like doing a from module import * at the beginning of the imported module. Here is its implementation:
import sys
import types

def module_inject(name, *args):
    """Set a hook so that when module 'name' is imported, it is executed with
    the attributes in 'args' already in module scope. The arguments can be
    dictionaries or modules (see 'normalize_dict')."""
    args = [normalize_dict(arg) for arg in args]
    sys.meta_path.append(InjectionLoader(name, args))

def normalize_dict(d):
    """If the argument is a module, return the module's dictionary filtered
    by the module's __all__ attribute, otherwise return the argument as-is.
    If the module doesn't have an __all__ attribute, use all the attributes
    that don't begin with a double underscore."""
    if isinstance(d, types.ModuleType):
        keys = getattr(d, '__all__', None)
        if keys is None:
            keys = [k for k in d.__dict__ if not k.startswith('__')]
        d = dict((key, d.__dict__[key]) for key in keys)
    return d
It's something to be used with caution, though. In general, the Python mantra of *explicit is better than implicit* is a good guideline to follow.
Update: somebody asked me about the use of file as a local variable. I'm actually torn on the issue. Yes, it does shadow the built-in file function, but on the other hand it's concise, and it's the same name used in the Python documentation.
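The imp module and the find_module/load_module protocol used above belong to the Python 2 era. As a rough sketch only, here is how the same idea might look on Python 3 with importlib. The names InjectingLoader and InjectingFinder are made up for this illustration, and the sketch assumes the target module is found by one of the standard finders. Note that the hook has to go at the front of sys.meta_path, because on Python 3 the default finders are already in that list.

```python
import importlib.abc
import sys


class InjectingLoader(importlib.abc.Loader):
    """Wraps the real loader and seeds the module namespace first."""

    def __init__(self, wrapped, dicts):
        self.wrapped = wrapped
        self.dicts = dicts

    def create_module(self, spec):
        # Let the real loader decide how the module object is created
        return self.wrapped.create_module(spec)

    def exec_module(self, module):
        # Inject the attributes, then run the module's own code
        for d in self.dicts:
            module.__dict__.update(d)
        self.wrapped.exec_module(module)


class InjectingFinder(importlib.abc.MetaPathFinder):
    """Claims a single module name and swaps in an InjectingLoader."""

    def __init__(self, name, dicts):
        self.name = name
        self.dicts = dicts

    def find_spec(self, fullname, path=None, target=None):
        if fullname != self.name:
            return None
        # Ask the other finders for the real spec, skipping ourselves
        for finder in sys.meta_path:
            if finder is self or not hasattr(finder, "find_spec"):
                continue
            spec = finder.find_spec(fullname, path, target)
            if spec is not None and spec.loader is not None:
                spec.loader = InjectingLoader(spec.loader, self.dicts)
                return spec
        return None


def module_inject(name, *dicts):
    # Must come before the default finders, hence insert(0, ...)
    sys.meta_path.insert(0, InjectingFinder(name, list(dicts)))
```

Calling module_inject('myapp.models', {'answer': 42}) before the first import of myapp.models would let that module refer to answer without importing it. The same caution applies as with the original version.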

Saturday, October 3, 2009

Initializing attributes from __init__ arguments

Every once in a while, I get fed up of having to do lots of self.foo = foo in Python __init__ methods, and wonder if it couldn't be done automatically. I came up with the following function to do just that, but I doubt I'll ever use it myself, because it goes against the *explicit is better than implicit* philosophy of Python.

#!/usr/bin/env python
import inspect

def init_from_args():
    frame = inspect.stack()[1][0]
    code = frame.f_code
    var_names = code.co_varnames # __init__'s parameters and locals
    init_locals = frame.f_locals # __init__'s dict of locals
    num_args = code.co_argcount # Number of arguments
    arg_names = var_names[1:num_args] # Positional argument names

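    # If there's a **kwargs parameter, get the name of it. It comes after
    # the positional parameters, and after *args if that is present too.
    kw_name = None
    if code.co_flags & inspect.CO_VARKEYWORDS:
        if code.co_flags & inspect.CO_VARARGS:
            kw_name = var_names[num_args + 1]
        else:
            kw_name = var_names[num_args]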

    # Copy the positional arguments
    for name in arg_names:
        setattr(init_locals[var_names[0]], name, init_locals[name])

    # If there was a **kwargs parameter, copy the keyword arguments.
    if kw_name:
        for name, value in init_locals[kw_name].items():
            setattr(init_locals[var_names[0]], name, value)

class Foo:
    def __init__(self, a, b, *args, **kwargs):
        init_from_args()
        # Plain locals like these are not copied onto the instance
        bar = 123
        baz = "hello"
        quux = "foo"

if __name__ == "__main__":
    foo = Foo(1, 2, 3, something="something else")
    print(foo.__dict__)
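For comparison, the same frame-inspection trick can probably be written more compactly on Python 3 by letting inspect.getargvalues work out the parameter names. This is only a sketch under that assumption, and the same reservations about implicit behaviour apply.

```python
import inspect


def init_from_args():
    # Look at the caller's frame, which is expected to be an __init__
    frame = inspect.currentframe().f_back
    try:
        info = inspect.getargvalues(frame)
        # info.args holds the named parameters, starting with self
        obj = info.locals[info.args[0]]
        for name in info.args[1:]:
            setattr(obj, name, info.locals[name])
        # info.keywords is the name of the **kwargs parameter, if any
        if info.keywords:
            for name, value in info.locals[info.keywords].items():
                setattr(obj, name, value)
    finally:
        # Avoid keeping a reference cycle through the frame alive
        del frame


class Foo:
    def __init__(self, a, b, *args, **kwargs):
        init_from_args()


foo = Foo(1, 2, 3, something="something else")
```

Here foo ends up with attributes a, b and something. The extra positional argument 3 is collected in args and is not copied.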

Tuesday, January 20, 2009

Towards talker standards

A few years ago I wrote an article about the desire to bring talkers out of their strictly console-based world. I'm reproducing it here so it has a permanent home.

Despite the rapidly rising popularity of instant messaging on the Internet, talkers have maintained a loyal following due to the unrivaled sense of presence and community they offer. However, their implementations have remained largely unchanged since their inception, and they have failed to take advantage of the past decade of developments in Internet technologies. This article presents a case for the collaborative development of standard talker protocols.

The state of talker development

Browsing the source code of any current popular talker implementation will reveal signs of a long heritage of modification upon modification. The most popular talkers have long departed from the stock implementations they began with, each adding a rich diversity of new features. More recent talkers such as Amnuts and PG+ are derived from talker code written in 1992. In software development terms, this is a long time. To put it into historical perspective, when Talkserv and Elsewhere were first released, Microsoft had just released Windows 3.1. These talkers were conceptually based on MUDs implemented as far back as 1978.

Talkers are based upon simple client/server architecture. While it is often claimed that they are TELNET servers, in fact most do not adhere to the TELNET protocol, and TELNET clients are just used in their capacity as terminal emulators. Both talker clients and talker servers have limited terminal functionality, and because of this they are limited to line-based input processing. This results in a non-intuitive and often off-putting user interface. For example, most talkers offer the facility to send messages similar to e-mails between users, but there is very little message-editing functionality. While it would be possible to provide a curses-based interface for such operations, the added complexity of doing so has prevented its adoption.

Furthermore, the look and feel of the user interface is defined in the code of a talker server. When establishing a new talker, a sysop first chooses a talker base code to start from. This is usually based on personal preference for style, with the biggest decision being EWToo-style versus NUTS-style. This decision will usually have a large impact on which users will use the talker. The sysop must then customise the talker to give it unique characteristics. Some of these customisations simply involve changing text files supplied in the talker distribution package, but most involve changing or adding to the talker's source code. This means the sysop must be a programmer, or at least have available a programmer who is willing to donate his time. This is accepted practice, but it's easy to see how absurd it is by imagining having to recompile your web server in order to update your web site! The customisations come in two forms: modifications or additions to the behaviour of the talker, and modifications to the appearance of the talker. The latter, while relatively easy, is tedious and error prone because each talker's output is intermingled with its control logic, meaning that many disparate functions must be modified in order to create a new unified visual appearance.

Because of the ad-hoc nature of the additions and the lack of separation between logic and presentation, changes are rarely returned to the stock implementation they derived from, and features are often reimplemented afresh in other talkers. This leads to problems later; when the original code base is updated with important fixes the author of the derivative must then decide whether to attempt to isolate and integrate those fixes, or to abandon his code and begin again with the new code base. As a large proportion of the modifications are customisations that provide the derived talker with its uniqueness, this can be a difficult decision to take. It also means that additional code is not reviewed, increasing the risk of introducing security problems.

The problems

The existing problems identified above can be summarised as follows:

  • Non-intuitive UI. Because the talker emulates a text terminal, the server—not the clients—defines the UI, and this UI is restrictive.
  • Output interleaved with logic. A developer must change many disparate functions to obtain a new unified visual appearance, even though those functions may contain logic which is unchanged. This is tedious, leads to errors, and prevents code merging.
  • EWToo versus NUTS. Choosing one style of talker over another restricts the userbase. Others, such as Nilex-style talkers, have even more limited appeal.
  • A sysop must be, or must have, a programmer. Very little of the talker can be changed without changing its source code and recompiling it.
  • Ad-hoc design. Talker code has evolved over many years, under different developers, with no common design goals.
  • Fork and forget. When a new talker is developed the base code is forked and rarely merged. Fixes in the base code are difficult to isolate and integrate.
  • No code review. Single developers develop most talker code. This provides little opportunity for them to receive feedback about code quality.
  • Features are reimplemented. Because no widely used talker supports the notion of plug-in components, it is difficult for developers to release packaged features.

User agents

When MUDs and talkers were first developed Internet access was uncommon, and mostly limited to academic users with Unix accounts. These users were quite used to text-based interfaces driven by abbreviated command names, but most of today's talker users are more familiar with GUIs, multimedia, the World Wide Web and instant messaging software. It's therefore unsurprising that there are a number of MUD and talker clients, such as Pueblo and Z-MUD, that offer enhancements over basic terminal emulation. Talkers can use Pueblo's protocol, while MUDs have even more extensions such as MUD Sound Protocol, MUD eXtension Protocol and MUD Client Protocol. These protocols differ in their design, but they all have a common goal: to allow the client to provide a richer user experience while maintaining compatibility with non-enhanced clients.

To help illustrate the kind of experience an enhanced user agent might provide, the following scenarios suggest some likely interactions.

  • Alice is idly chatting on a talker. She sees a message telling her that Bob is requesting a game of Connect 4 with her. She clicks on the message and a small window containing a playing board opens up. She hears the familiar sound as Bob places his first piece, then clicks on another column to make her own move. She then returns to chatting while Bob ponders his next move.
  • Charlie logs into a talker for the first time in a couple of weeks. He clicks the "who's online?" button on the toolbar and looks at the list of users. He doesn't recognise Dan, so he clicks on Dan's profile icon. He sees that Dan is a new user who joined today, so he clicks on his name to open a private chat with him, and welcomes him to the talker.
  • Emma is chatting in the main room, but is getting annoyed with Fred. She clicks on his name, and chooses "ignore" from the menu. She no longer sees anything Fred says.
  • As Gini logs into her usual talker, a message pops up telling her there are new news items. She chooses to read them now, so a message reading window opens with two messages in it. She reads the messages and deletes them. While she has the message reading window open, she looks back at a couple of old talker mail messages and decides to reply to one of them, before closing the window and returning to chatting.
  • Hayley is chatting in the main room, but also having a private conversation with Ian. It's busy in the main room, and she keeps missing messages from Ian, so she opens up a "conversation with Ian" window. Her messages from Ian now appear in there instead of in the main window, so she can keep track of both conversations.

These are the sort of interactions that users are familiar with from using other GUI applications. UI design is complex and, to some extent, subjective, so no restrictions on how such a client should behave are given here. Instead, it is anticipated several clients would be developed independently, catering for people with differing tastes.

A client/server protocol

If, instead of the current terminal emulation approach, talkers and their clients communicated using a domain-specific protocol, a number of possibilities would open up. Most importantly, it would allow for a radically different kind of user agent that would be able to present information in a much clearer way.

It would also allow other software to communicate with the talker. Bots are a common example of software agents that need to do this. Most bots currently parse the human-readable output from a talker and respond with the same commands a user would use. This is not a foolproof strategy, and depends on the talker's output for a given event not changing as the talker is developed.

Another recent trend is that of embedding other services into the talker. Examples include the HTTP and SMTP servers in Anthony Biacco's Ncohafmuta talker code. These are only partial implementations of the respective protocols, and are likely to introduce bugs that, if exploited, may crash the entire talker process. A talker protocol would allow for software agents acting as gateways between the talker and web servers and mail servers respectively.

Finally, a standardised protocol would allow for talker-to-talker links between talkers using different implementations, so long as each talker adhered to a common standard.

The diagram here shows the structure of the client, gateway and server tiers. Because of the vastly increased flexibility such a protocol would bring, it would be the cornerstone of new talker developments, and careful design would be vital. Any protocol to be considered would need to satisfy certain requirements:

  • The session layer must support the transfer of arbitrary data, including binary types. This does not preclude the use of textual data such as XML documents at the presentation layer.
  • It must provide an inline mechanism for negotiating encrypted connections so that such connections would not require a separate port.
  • It must support authentication methods including, but not limited to, plain passwords and asymmetric keys.
  • It must provide a facility for bi-directional delivery of asynchronous events, and for bi-directional request/response pairs. (It would be acceptable for the latter to be implemented in terms of the former.)

In addition, the application layer would need to not only support the features found in a large subset of current talkers, but also be extensible enough to support future features.


A small number of talker developers have expressed a desire to enable end-to-end encryption between their talker and its clients. This relatively straightforward application of cryptography could be implemented without too much difficulty on the server side, using a free Transport Layer Security implementation, and similarly on the client side if clients such as those described above were used.


However, once cryptography has been introduced, it opens up a number of interesting possibilities. The first of these is asymmetric key authentication. Asymmetric key algorithms use a pair of keys: one public, and one private. The two are mathematically related, but to derive one from the other is considered computationally infeasible. Such algorithms are now widespread, and used extensively in protocols such as PGP, SSL/TLS and SSH. This authentication scheme has significant security advantages, because the server need only ever know a user's public key. This public key can be used on every talker the user connects to, and as long as the corresponding private key is never revealed no security is compromised. Typically, private keys themselves are encrypted using a passphrase. It is this passphrase that a user would type when connecting to a talker, and the passphrase never leaves the user's computer. In fact, a client could be designed so that once the user enters their passphrase to connect to one talker, they don't need to enter it again until they restart the client, no matter how many talkers they connect to. This is functionality similar to that provided by SSH agents such as ssh-agent and Pageant.

Trust networks

A problem that talker sysops face regularly is that of user identity. If a sysop wishes to ban a malevolent user, there are no real ways to ensure he stays gone. The user may reconnect at any time using a different name and from a different address, or from an address the sysop knows is used by many users (such as that of a shell server). Because of the high value the Internet places on users' right to anonymity there is unlikely to be a complete solution to this problem, but it can be approached from a different angle. Instead of trying to track users as they change their identity, we can persuade them to use only a single identity.

Suppose that Alice is a user who wants to use a particular talker called Foo Hills. She uses several other talkers, and she knows Bob, who is already a user of Foo Hills. Bob has used the talker for several months, and the talker's sysop has indicated his trust in Bob by using the talker's private key to sign Bob's public key. Bob knows Alice is also trustworthy, so he similarly indicates this by using his own private key to sign her public key. Now Alice becomes a user of the talker and a chain of trust exists from the sysop, to Bob, to Alice.

Now suppose that Carl wants to join the talker. He has a public key that has been signed by the private key of a small, little-known talker called The Bar. However, no trust relationship exists between Foo Hills and The Bar, so Carl is considered an untrusted user. Depending on the policy chosen by the Foo Hills sysop, he may be denied access, or be allowed to connect as an untrusted user. This concept of trusted and untrusted users could form the basis of what many talkers refer to as citizenship.
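The chain-of-trust check in the Alice and Carl examples amounts to a reachability search over signatures. The sketch below assumes the signatures have already been cryptographically verified and reduces each key to a plain string. The function name and the data layout are hypothetical.

```python
from collections import deque


def is_trusted(signatures, root, candidate):
    """Return True if a chain of signatures leads from root to candidate.

    signatures maps a signer's key to the set of keys it has signed.
    Real signature verification is assumed to have happened already.
    """
    seen = {root}
    queue = deque([root])
    while queue:
        signer = queue.popleft()
        if signer == candidate:
            return True
        for signed in signatures.get(signer, ()):
            if signed not in seen:
                seen.add(signed)
                queue.append(signed)
    return False


# The relationships from the text, with strings standing in for keys
signatures = {
    "foo-hills": {"bob"},
    "bob": {"alice"},
    "the-bar": {"carl"},
}
alice_ok = is_trusted(signatures, "foo-hills", "alice")
carl_ok = is_trusted(signatures, "foo-hills", "carl")
```

With these relationships alice_ok is True and carl_ok is False. A sysop's policy might also cap the length of the chain, which this sketch does not attempt.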

Portable objects

Trust networks require that the talker have its own private key, which can be used to sign users' public keys. One interesting possibility that arises from this is the ability to export signed data from the talker, such that anything else with access to that talker's public key can assert two things about that data: that it was indeed exported from that talker, and that it hasn't been modified since it was exported from that talker. Objects (items that users carry, wear, use etc) could be exported in this fashion, and then used in another talker, providing that the importing talker understood the nature of the objects, and had a trust relationship with the exporting talker. The same is true of the 'currency' used on talkers, which suggests that ideas regarding simple economics could be explored. Curiously, there have been several instances of items from MUDs being auctioned off on eBay (for real money). This does suggest that such a feature might have some appeal.


The information stored about a user on a talker can be divided into three categories: transient state, local profile, and user information.

Transient state is implementation-specific data regarding the user's session. This data is discarded when the user disconnects. Local profile includes the user's description, how much currency they have, and which room they connect in. This information is saved when the user disconnects.

Most users use more than one talker. Some users use many talkers, often using the same identity on each. There are many pieces of information associated with users that they must enter manually into each talker. These include name, e-mail address, sex, age or date of birth, IM handles and homepage URL. It would make sense for this information to be kept in one location. An LDAP directory would be one possible solution, even though it leaves a number of details to be considered, such as who would keep the directory online, and what would happen in the event of a failure.


Talkers don't currently attempt to deal with internationalisation (i18n) issues. This is understandable; it's a complex issue. For example, should the sequence of bytes EF BB BF E4 BD A0 E5 A5 BD look like "ï»¿ä½ å¥½" or "你好"? It's clear to us which one is correct, but not to either the talker or the clients, because the answer depends on the character set being used. Talkers make little attempt to interpret input characters other than the few they directly act upon, instead passing them straight to the clients. If the two conversing parties are using non-ASCII characters (i.e. those with code points above 127) but are using the same character set, then this isn't a problem. However, if they're using incompatible character sets then the non-ASCII characters will be displayed wrongly. The TELNET protocol allows the discovery of a client's character set using a sub-option, but this isn't currently used. Even if it were, the talker would have to perform complex conversions between character sets. A new talker architecture could overcome this problem by storing and transferring all text in Unicode, a character coding system that assigns a single unique number to every character used by modern languages today (and then some). When the talker and all its clients know that they're exchanging Unicode text, agreeing on which characters are being exchanged is no longer a problem. Other issues, such as directionality and normalisation, must still be addressed by client software, but the Unicode Consortium gives clear guidelines on this.
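The byte sequence in the example can be tried directly. This short sketch decodes the same nine bytes under two assumptions about the character set, using only codecs from the Python standard library.

```python
raw = bytes.fromhex("EF BB BF E4 BD A0 E5 A5 BD")

# Treated as UTF-8 with a leading byte order mark: two characters,
# U+4F60 and U+597D
as_utf8 = raw.decode("utf-8-sig")

# Treated as Latin-1: every byte becomes its own character, giving
# nine characters of mojibake
as_latin1 = raw.decode("latin-1")

# A talker that stores text as Unicode and sends it as UTF-8 recovers
# the original payload bytes, minus the byte order mark
round_trip = as_utf8.encode("utf-8")
```

Neither decoding raises an error, which is exactly the problem: the bytes alone don't say which interpretation was intended.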



Most long-term talkers are hosted on servers that also host several other talkers. The server's DNS name might be, but, as each talker requires a unique rendezvous point, a TCP port number must also be specified. Many talkers can be partially referenced by their own DNS name, such as, but as this still resolves to the same IP address it must still be disambiguated with a port number. We use names for addresses because they're easier to remember than numbers, but while we don't have to remember an IP address, we still have to remember a port number.

A similar problem existed in web hosting. Before HTTP 1.1, the solution was to give the network interface of a server multiple IP addresses, and allow one web server to bind to each of these addresses. The extremely rapid expansion of the Web and the limited supply of IP addresses meant a better solution was needed, so the current HTTP protocol requires that the DNS name used to access the web site be specified in each request to the web server. This was a great improvement, but because it required a change to the protocol it was impossible to retrospectively apply the same principle to other services.

DNS SRV resource records offer an even more flexible solution. These records are similar to A records in that they locate a server for a particular DNS label, but instead of an IP address they name a target host, and they also provide a port number, and weighting and priority indicators. The practical result of this is that specifying would be sufficient to direct a next-generation talker client to the desired server. The weight field is intended for use in load balancing situations, and is unlikely to be used by talkers. The priority field, which has the same purpose as the priority field in MX records, could be used to allow clients to automatically fail over to a backup server.
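As a sketch of how a client might use the priority and weight fields once it has the SRV records in hand, the function below orders candidate servers so that lower priority values are tried first, with a weighted random choice among records of equal priority. The DNS lookup itself is not shown, the host names are made up, and the weighting is a simplification of the selection procedure in the SRV specification.

```python
import random
from collections import namedtuple

SrvRecord = namedtuple("SrvRecord", "priority weight port target")


def order_srv_records(records, rng=random):
    """Return records in the order a client should try them."""
    ordered = []
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        while group:
            weights = [r.weight for r in group]
            if sum(weights) == 0:
                pick = rng.choice(group)  # all weights zero: pick evenly
            else:
                pick = rng.choices(group, weights=weights)[0]
            ordered.append(pick)
            group.remove(pick)
    return ordered


# Hypothetical records: a main server and a backup on different ports
records = [
    SrvRecord(priority=20, weight=0, port=4000, target="backup.example.org"),
    SrvRecord(priority=10, weight=0, port=5000, target="main.example.org"),
]
attempt_order = [r.target for r in order_srv_records(records)]
```

A client would connect to each target and port in attempt_order in turn, moving on to the next entry only when a connection fails.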


A possible extension to addressing talkers solely by name is to address entities (or resources) within the talker. URIs provide a natural facility for doing this. For example, a talker might have a room called entrance, which could be identified by the URI talker:// (Note that this is an example only, and talker is not being suggested as a URI scheme.) An 'advanced' option in a talker client might allow such a URI as an indication of which room the user wanted to be in after connecting. URIs might also identify users, groups of users, objects, message boards, messages and administrative controls.
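A client could take such a URI apart with the standard library. In this sketch the talker scheme, the host name and the path layout (a resource kind followed by a resource name) are all hypothetical and chosen only for illustration.

```python
from urllib.parse import unquote, urlsplit


def parse_talker_uri(uri):
    """Split a hypothetical talker URI into (host, port, path segments)."""
    parts = urlsplit(uri)
    if parts.scheme != "talker":
        raise ValueError("not a talker URI: %r" % uri)
    # Drop empty segments and undo any percent-encoding
    segments = [unquote(s) for s in parts.path.split("/") if s]
    return parts.hostname, parts.port, segments


room = parse_talker_uri("talker://talker.example.org/rooms/entrance")
user = parse_talker_uri("talker://talker.example.org:5000/users/alice")
```

Here room is ("talker.example.org", None, ["rooms", "entrance"]). The port is None when the URI omits it, which is where a lookup such as the SRV one would supply the real value.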

Towards a solution

If a co-ordinated effort is made to develop the next generation of talkers, the problems mentioned can be overcome, and the new features introduced. However, doing so is a delicate task; it is essential to ensure that even if talkers move forwards technologically they still retain the distinct character that separates them from the instant messengers, MUDs and—perhaps most importantly—IRC that they're competing with.

I think the primary goal is to effect a paradigm shift where the talker stops being a program that people interact with directly, and becomes a service that people use. The Web is a good model of this, having user agents (web browsers such as IE and Mozilla), servers (such as Apache and IIS) and resources (primarily HTML pages, but also graphical and interactive content). The server delivers the resources to the user via the user agent. In the case of talkers the user agent would be the client software and the resource would be something that defines the unique characteristics of a talker.

Providing a way in which a talker is defined separately from the code which delivers it yields a number of strong advantages:

  • the talker is no longer cluttered with boilerplate code for networking, logging, authentication, loading and saving resources, error handling, etc
  • the server can be replaced independently of the talker definition
  • while it would be important to create a reference implementation of the server, independent implementations could be created by others, giving a choice to those creating a talker definition
The last point also applies to user agents where the freedom of a user to choose an implementation that suits him or her is even more important.

This is only possible with standardisation. Just as HTML describes web pages, a method of defining talkers would have to be devised—a mixture of static and scripted content. There would be many issues to address here. For example, would a single scripting language be chosen to aid interoperability? Popular contenders would no doubt be Python, Ruby, Lua and ECMAScript (also known as JavaScript), but each has its champions and critics.

The protocol used for communication between clients and servers would also need to be defined. The previous section placed some requirements on this, but still leaves much open for discussion. While I think a purely XML-based protocol such as XMPP (developed for use by Jabber) is inappropriate, there are other possible starting points such as BEEP.

Once these were defined with reference implementations, all that would be left is the hurdle of persuading people to adopt the new technology. Hopefully, those talkers run by the people involved in creating the new specifications would be compelling demonstrations of the way forward.