4.9.1.1 Codec Objects

The Codec class defines these methods which also define the function interfaces of the stateless encoder and decoder:

encode( input[, errors])

Encodes the object input and returns a tuple (output object, length consumed). While codecs are not restricted to use with Unicode, in a Unicode context, encoding converts a Unicode object to a plain string using a particular character set encoding (e.g., cp1252 or iso-8859-1).

errors defines the error handling to apply. It defaults to 'strict' handling.

The method may not store state in the Codec instance. Use StreamCodec for codecs which have to keep state in order to make encoding/decoding efficient.

The encoder must be able to handle zero length input and return an empty object of the output object type in this situation.

decode( input[, errors])

Decodes the object input and returns a tuple (output object, length consumed). In a Unicode context, decoding converts a plain string encoded using a particular character set encoding to a Unicode object.

input must be an object which provides the bf_getreadbuf buffer slot. Python strings, buffer objects and memory mapped files are examples of objects providing this slot.

errors defines the error handling to apply. It defaults to 'strict' handling.

The method may not store state in the Codec instance. Use StreamCodec for codecs which have to keep state in order to make encoding/decoding efficient.

The decoder must be able to handle zero length input and return an empty object of the output object type in this situation.

The StreamWriter and StreamReader classes provide generic working interfaces which can be used to implement new encodings submodules very easily. See encodings.utf_8 for an example on how this is done.

See About this document... for information on suggesting changes.