Numcodecs

Numcodecs is a Python package providing buffer compression and transformation codecs for use in data storage and communication applications. These include:

  • Compression codecs, e.g., Zlib, BZ2, LZMA and Blosc.
  • Pre-compression filters, e.g., Delta, Quantize, FixedScaleOffset, PackBits, Categorize.
  • Integrity checks, e.g., CRC32, Adler32.

All codecs implement the same API, allowing codecs to be organized into pipelines in a variety of ways.
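Because every codec exposes the same encode/decode pair, a pipeline can be as simple as a list of codecs applied in order when encoding and in reverse order when decoding. The sketch below only illustrates that idea: `ByteDeltaCodec`, `ZlibCodec`, `encode_pipeline` and `decode_pipeline` are hypothetical stand-ins built on the standard library, not Numcodecs classes or functions.

```python
import zlib


class ByteDeltaCodec:
    """Hypothetical pre-compression filter: byte-wise differences modulo 256."""

    def encode(self, buf):
        data = bytes(buf)
        # Pair each byte with its predecessor (0 for the first byte).
        return bytes((b - a) % 256 for a, b in zip(b"\x00" + data, data))

    def decode(self, buf):
        out, prev = bytearray(), 0
        for d in bytes(buf):
            prev = (prev + d) % 256
            out.append(prev)
        return bytes(out)


class ZlibCodec:
    """Hypothetical compression codec wrapping the stdlib zlib module."""

    def __init__(self, level=1):
        self.level = level

    def encode(self, buf):
        return zlib.compress(bytes(buf), self.level)

    def decode(self, buf):
        return zlib.decompress(bytes(buf))


def encode_pipeline(codecs, buf):
    # Apply each codec in order.
    for codec in codecs:
        buf = codec.encode(buf)
    return buf


def decode_pipeline(codecs, buf):
    # Undo the codecs in reverse order.
    for codec in reversed(codecs):
        buf = codec.decode(buf)
    return buf


pipeline = [ByteDeltaCodec(), ZlibCodec(level=1)]
raw = bytes(range(200))
packed = encode_pipeline(pipeline, raw)
restored = decode_pipeline(pipeline, packed)
```

The filter runs before the compressor so that the compressor sees the more repetitive, transformed bytes.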

If you have a question, find a bug, would like to make a suggestion or contribute code, please raise an issue on GitHub.

Installation

Numcodecs depends on NumPy. It is generally best to install NumPy first using whatever method is most appropriate for your operating system and Python distribution.

Install from PyPI:

$ pip install numcodecs

Alternatively, install via conda:

$ conda install -c conda-forge numcodecs

Numcodecs includes a C extension providing integration with the Blosc library. Installing via conda will install a pre-compiled binary distribution. However, if you have a newer CPU that supports the AVX2 instruction set (e.g., Intel Haswell, Broadwell or Skylake) then installing via pip is preferable, because this will compile the Blosc library from source with optimisations for AVX2.

Note that if you compile the C extensions on a machine with AVX2 support, the resulting binaries probably cannot be used on a machine without AVX2. To disable compilation with AVX2 support regardless of the machine architecture:

$ export DISABLE_NUMCODECS_AVX2=
$ pip install -v --no-cache-dir numcodecs

To work with Numcodecs source code in development, install from GitHub:

$ git clone --recursive https://github.com/alimanfoo/numcodecs.git
$ cd numcodecs
$ python setup.py install

To verify that Numcodecs has been fully installed (including the Blosc extension) run the test suite:

$ pip install nose
$ python -m nose -v numcodecs

Contents

Codec API

This module defines the Codec base class, a common interface for all codec classes.

Codec classes must implement Codec.encode() and Codec.decode() methods. Inputs to and outputs from these methods may be any Python object exporting a contiguous buffer via the new-style Python buffer protocol, or an array.array under Python 2.

Codec classes must implement a Codec.get_config() method, which must return a dictionary holding all configuration parameters required to enable encoding and decoding of data. The expectation is that these configuration parameters will be stored or communicated separately from encoded data, and thus the codecs do not need to store all encoding parameters within the encoded data. For broad compatibility, the configuration object must contain only JSON-serializable values. The configuration object must also contain an ‘id’ field storing the codec identifier (see below).

Codec classes must implement a Codec.from_config() class method, which will return an instance of the class initialized from a configuration object.

Finally, codec classes must set a codec_id class-level attribute. This must be a string. Two different codec classes may set the same value for the codec_id attribute if and only if they are fully compatible, meaning that (1) configuration parameters are the same, and (2) given the same configuration, one class could correctly decode data encoded by the other and vice versa.
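The requirements above can be sketched with a small class. This is a minimal illustration, assuming a made-up `ToyZlib` class and a made-up identifier `'toy-zlib'`. It does not subclass numcodecs.abc.Codec and is not part of the package.

```python
import json
import zlib


class ToyZlib:
    """Hypothetical codec following the contract described above."""

    codec_id = "toy-zlib"  # made-up identifier, for illustration only

    def __init__(self, level=1):
        self.level = level

    def encode(self, buf):
        return zlib.compress(bytes(buf), self.level)

    def decode(self, buf, out=None):
        dec = zlib.decompress(bytes(buf))
        if out is not None:
            # Slice assignment fails unless out is exactly the right size.
            memoryview(out)[:] = dec
            return out
        return dec

    def get_config(self):
        # JSON-serializable values only, plus the mandatory 'id' field.
        return {"id": self.codec_id, "level": self.level}

    @classmethod
    def from_config(cls, config):
        params = {k: v for k, v in config.items() if k != "id"}
        return cls(**params)


codec = ToyZlib(level=5)
stored = json.dumps(codec.get_config())       # kept apart from the data
payload = codec.encode(b"abc" * 100)

rebuilt = ToyZlib.from_config(json.loads(stored))
decoded = rebuilt.decode(payload)

out = bytearray(300)                          # exactly the decoded size
rebuilt.decode(payload, out)
```

Because the configuration travels separately as JSON, the encoded payload itself does not need to carry the compression level.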

class numcodecs.abc.Codec

Codec abstract base class.

codec_id = None
encode(buf)

Encode data in buf.

Parameters:

buf : buffer-like

Data to be encoded. May be any object supporting the new-style buffer protocol or array.array under Python 2.

Returns:

enc : buffer-like

Encoded data. May be any object supporting the new-style buffer protocol or array.array under Python 2.

decode(buf, out=None)

Decode data in buf.

Parameters:

buf : buffer-like

Encoded data. May be any object supporting the new-style buffer protocol or array.array under Python 2.

out : buffer-like, optional

Writeable buffer to store decoded data. N.B. if provided, this buffer must be exactly the right size to store the decoded data.

Returns:

dec : buffer-like

Decoded data. May be any object supporting the new-style buffer protocol or array.array under Python 2.

get_config()

Return a dictionary holding configuration parameters for this codec. Must include an ‘id’ field with the codec identifier. All values must be compatible with JSON encoding.

classmethod from_config(config)

Instantiate codec from a configuration object.

Codec registry

The registry module provides some simple convenience functions to enable applications to dynamically register and look-up codec classes.

numcodecs.registry.get_codec(config)

Obtain a codec for the given configuration.

Parameters:

config : dict-like

Configuration object.

Returns:

codec : Codec

Examples

>>> import numcodecs as codecs
>>> codec = codecs.get_codec(dict(id='zlib', level=1))
>>> codec
Zlib(level=1)

numcodecs.registry.register_codec(cls, codec_id=None)

Register a codec class.

Parameters:

cls : Codec class

Notes

This function maintains a mapping from codec identifiers to codec classes. When a codec class is registered, it will replace any class previously registered under the same codec identifier, if present.
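The replace-on-register behaviour described in the note can be pictured with a plain dictionary. The sketch below is an assumption-laden illustration, not the registry module itself: `toy_registry`, `register`, `lookup`, `Identity` and `OtherIdentity` are hypothetical names.

```python
toy_registry = {}  # hypothetical mapping: codec identifier -> codec class


def register(cls, codec_id=None):
    # Fall back to the class-level identifier when none is given.
    if codec_id is None:
        codec_id = cls.codec_id
    toy_registry[codec_id] = cls  # silently replaces an earlier entry


def lookup(config):
    config = dict(config)         # copy, so the caller's dict is untouched
    cls = toy_registry[config.pop("id")]
    return cls.from_config(config)


class Identity:
    """Hypothetical codec that passes data through unchanged."""

    codec_id = "identity"

    def encode(self, buf):
        return buf

    def decode(self, buf, out=None):
        return buf

    def get_config(self):
        return {"id": self.codec_id}

    @classmethod
    def from_config(cls, config):
        return cls()


class OtherIdentity(Identity):
    """Second hypothetical class registered under the same identifier."""


register(Identity)
first = lookup({"id": "identity"})

register(OtherIdentity, codec_id="identity")
second = lookup({"id": "identity"})
```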

Blosc

class numcodecs.blosc.Blosc

Codec providing compression using the Blosc meta-compressor.

Parameters:

cname : string, optional

A string naming one of the compression algorithms available within blosc, e.g., ‘zstd’, ‘blosclz’, ‘lz4’, ‘lz4hc’, ‘zlib’ or ‘snappy’.

clevel : integer, optional

An integer between 0 and 9 specifying the compression level.

shuffle : integer, optional

Either NOSHUFFLE (0), SHUFFLE (1), BITSHUFFLE (2) or AUTOSHUFFLE (-1). If -1 (default), bit-shuffle will be used for buffers with itemsize 1, and byte-shuffle will be used otherwise.

blocksize : int

The requested size of the compressed blocks. If 0 (default), an automatic blocksize will be used.

codec_id = 'blosc'
NOSHUFFLE = 0
SHUFFLE = 1
BITSHUFFLE = 2
AUTOSHUFFLE = -1
encode(self, buf)
decode(self, buf, out=None)
get_config()

Return a dictionary holding configuration parameters for this codec. Must include an ‘id’ field with the codec identifier. All values must be compatible with JSON encoding.

from_config(config)

Instantiate codec from a configuration object.
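To give a feel for what the shuffle parameter does, the sketch below implements a byte-shuffle in pure Python: byte 0 of every item is grouped first, then byte 1, and so on. It illustrates the general idea only and is not Blosc's implementation. `byte_shuffle` and `byte_unshuffle` are hypothetical helpers, and zlib stands in for the compressor.

```python
import struct
import zlib


def byte_shuffle(data, itemsize):
    # Gather byte 0 of every item, then byte 1, and so on.
    return b"".join(data[i::itemsize] for i in range(itemsize))


def byte_unshuffle(data, itemsize):
    nitems = len(data) // itemsize
    out = bytearray(len(data))
    for i in range(itemsize):
        # Scatter each group of bytes back to its position in every item.
        out[i::itemsize] = data[i * nitems:(i + 1) * nitems]
    return bytes(out)


values = list(range(1000, 2000))
raw = struct.pack("<%dq" % len(values), *values)  # 1000 little-endian int64
shuffled = byte_shuffle(raw, 8)

plain_size = len(zlib.compress(raw))
shuffled_size = len(zlib.compress(shuffled))
```

For slowly varying integers like these, the high-order bytes are mostly identical, so grouping them produces long runs that compress well.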

Helper functions

numcodecs.blosc.init()

Initialize the Blosc library environment.

numcodecs.blosc.destroy()

Destroy the Blosc library environment.

numcodecs.blosc.compname_to_compcode(cname)

Return the compressor code associated with the compressor name. If the compressor name is not recognized, or there is not support for it in this build, -1 is returned instead.

numcodecs.blosc.list_compressors()

Get a list of compressors supported in the current build.

numcodecs.blosc.get_nthreads()

Get the number of threads that Blosc uses internally for compression and decompression.

numcodecs.blosc.set_nthreads(int nthreads)

Set the number of threads that Blosc uses internally for compression and decompression.

numcodecs.blosc.cbuffer_sizes(source)

Return information about a compressed buffer, namely the number of uncompressed bytes (nbytes) and the number of compressed bytes (cbytes). It also returns the blocksize (which is used internally for doing the compression by blocks).

Returns:

nbytes : int

cbytes : int

blocksize : int

numcodecs.blosc.cbuffer_complib(source)

Return the name of the compression library used to compress source.

numcodecs.blosc.cbuffer_metainfo(source)

Return some meta-information about the compressed buffer in source, including the typesize, whether the shuffle or bit-shuffle filters were used, and whether the buffer was memcpyed.

Returns:

typesize

shuffle

memcpyed
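The cbuffer_* helpers report values that are stored at the start of a Blosc-compressed buffer. The sketch below shows how such values could be read with the standard library. It assumes the 16-byte Blosc 1 header layout (version, versionlz, flags, typesize, then nbytes, blocksize and cbytes as little-endian 32-bit integers) and the flag bits 0x1 (byte-shuffle), 0x2 (memcpyed) and 0x4 (bit-shuffle). `parse_blosc_header` is a hypothetical name, and the header bytes are built by hand for the demonstration rather than produced by Blosc.

```python
import struct

NOSHUFFLE, SHUFFLE, BITSHUFFLE = 0, 1, 2


def parse_blosc_header(source):
    """Hypothetical helper: read the fields the cbuffer_* functions report."""
    (version, versionlz, flags, typesize,
     nbytes, blocksize, cbytes) = struct.unpack_from("<BBBBIII", source, 0)
    if flags & 0x1:
        shuffle = SHUFFLE
    elif flags & 0x4:
        shuffle = BITSHUFFLE
    else:
        shuffle = NOSHUFFLE
    return {
        "nbytes": nbytes,
        "cbytes": cbytes,
        "blocksize": blocksize,
        "typesize": typesize,
        "shuffle": shuffle,
        "memcpyed": bool(flags & 0x2),
    }


# Hand-built header (assumed layout): byte-shuffled, typesize 8,
# 8000 uncompressed bytes, blocksize 8000, 1234 compressed bytes.
header = struct.pack("<BBBBIII", 2, 1, 0x1, 8, 8000, 8000, 1234)
info = parse_blosc_header(header)
```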

numcodecs.blosc.compress(source, char *cname, int clevel, int shuffle=SHUFFLE, int blocksize=AUTOBLOCKS)

Compress data.

Parameters:

source : bytes-like

Data to be compressed. Can be any object supporting the buffer protocol.

cname : bytes

Name of compression library to use.

clevel : int

Compression level.

shuffle : int

Either NOSHUFFLE (0), SHUFFLE (1), BITSHUFFLE (2) or AUTOSHUFFLE (-1). If -1 (default), bit-shuffle will be used for buffers with itemsize 1, and byte-shuffle will be used otherwise.

blocksize : int

The requested size of the compressed blocks. If 0, an automatic blocksize will be used.

Returns:

dest : bytes

Compressed data.

numcodecs.blosc.decompress(source, dest=None)

Decompress data.

Parameters:

source : bytes-like

Compressed data, including blosc header. Can be any object supporting the buffer protocol.

dest : array-like, optional

Object to decompress into.

Returns:

dest : bytes

Object containing decompressed data.

LZ4

class numcodecs.lz4.LZ4

Codec providing compression using LZ4.

Parameters:

acceleration : int

Acceleration level. The larger the acceleration value, the faster the algorithm, but the lower the compression ratio.

codec_id = 'lz4'
encode(self, buf)
decode(self, buf, out=None)
get_config()

Return a dictionary holding configuration parameters for this codec. Must include an ‘id’ field with the codec identifier. All values must be compatible with JSON encoding.

from_config(config)

Instantiate codec from a configuration object.

Helper functions

numcodecs.lz4.compress(source, int acceleration=DEFAULT_ACCELERATION)

Compress data.

Parameters:

source : bytes-like

Data to be compressed. Can be any object supporting the buffer protocol.

acceleration : int

Acceleration level. The larger the acceleration value, the faster the algorithm, but the lower the compression ratio.

Returns:

dest : bytes

Compressed data.

Notes

The compressed output includes a 4-byte header storing the original size of the decompressed data as a little-endian 32-bit integer.
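The header described in the note can be written and read with plain integer conversions. The sketch below assumes nothing beyond that note; `add_size_header` and `read_original_size` are hypothetical helpers, and the payload is a placeholder rather than real LZ4 output.

```python
def add_size_header(compressed, original_size):
    # 4-byte little-endian size, followed by the compressed block.
    return original_size.to_bytes(4, "little") + bytes(compressed)


def read_original_size(source):
    # Decode the first four bytes as a little-endian integer.
    return int.from_bytes(bytes(source[:4]), "little")


placeholder_block = b"<compressed bytes would go here>"
framed = add_size_header(placeholder_block, 1000)
size = read_original_size(framed)
```

Storing the size up front lets a decompressor allocate a destination buffer of the right length before decoding.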

numcodecs.lz4.decompress(source, dest=None)

Decompress data.

Parameters:

source : bytes-like

Compressed data. Can be any object supporting the buffer protocol.

dest : array-like, optional

Object to decompress into.

Returns:

dest : bytes

Object containing decompressed data.

Zstd

class numcodecs.zstd.Zstd

Codec providing compression using Zstandard.

Parameters:

level : int

Compression level (1-22).

codec_id = 'zstd'
encode(self, buf)
decode(self, buf, out=None)
get_config()

Return a dictionary holding configuration parameters for this codec. Must include an ‘id’ field with the codec identifier. All values must be compatible with JSON encoding.

from_config(config)

Instantiate codec from a configuration object.

Helper functions

numcodecs.zstd.compress(source, int level=DEFAULT_CLEVEL)

Compress data.

Parameters:

source : bytes-like

Data to be compressed. Can be any object supporting the buffer protocol.

level : int

Compression level (1-22).

Returns:

dest : bytes

Compressed data.

numcodecs.zstd.decompress(source, dest=None)

Decompress data.

Parameters:

source : bytes-like

Compressed data. Can be any object supporting the buffer protocol.

dest : array-like, optional

Object to decompress into.

Returns:

dest : bytes

Object containing decompressed data.

Zlib

class numcodecs.zlib.Zlib(level=1)

Codec providing compression using zlib via the Python standard library.

Parameters:

level : int

Compression level.

codec_id = 'zlib'
encode(buf)
decode(buf, out=None)
get_config()

Return a dictionary holding configuration parameters for this codec. Must include an ‘id’ field with the codec identifier. All values must be compatible with JSON encoding.

from_config(config)

Instantiate codec from a configuration object.

BZ2

class numcodecs.bz2.BZ2(level=1)

Codec providing compression using bzip2 via the Python standard library.

Parameters:

level : int

Compression level.

codec_id = 'bz2'
encode(buf)
decode(buf, out=None)
get_config()

Return a dictionary holding configuration parameters for this codec. Must include an ‘id’ field with the codec identifier. All values must be compatible with JSON encoding.

from_config(config)

Instantiate codec from a configuration object.

LZMA

class numcodecs.lzma.LZMA(format=1, check=-1, preset=None, filters=None)

Codec providing compression using lzma via the Python standard library (only available under Python 3).

Parameters:

format : integer, optional

One of the lzma format codes, e.g., lzma.FORMAT_XZ.

check : integer, optional

One of the lzma check codes, e.g., lzma.CHECK_NONE.

preset : integer, optional

An integer between 0 and 9 inclusive, specifying the compression level.

filters : list, optional

A list of dictionaries specifying compression filters. If filters are provided, ‘preset’ must be None.

codec_id = 'lzma'
encode(buf)
decode(buf, out=None)
get_config()

Return a dictionary holding configuration parameters for this codec. Must include an ‘id’ field with the codec identifier. All values must be compatible with JSON encoding.

from_config(config)

Instantiate codec from a configuration object.
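The parameters above mirror those of the standard library's lzma module, so their interaction can be tried directly with lzma.compress. The sketch below uses only the standard library, not the LZMA codec class. It shows one call with a preset and one with an explicit filter chain, which takes the place of the preset. The particular filter chain is an arbitrary example.

```python
import lzma

data = b"numcodecs " * 1000

# A preset selects the compression level; filters are left unset.
with_preset = lzma.compress(
    data, format=lzma.FORMAT_XZ, check=lzma.CHECK_NONE, preset=1
)

# An explicit filter chain is given instead of a preset.
chain = [
    {"id": lzma.FILTER_DELTA, "dist": 4},
    {"id": lzma.FILTER_LZMA2, "preset": 1},
]
with_filters = lzma.compress(data, format=lzma.FORMAT_XZ, filters=chain)

restored_preset = lzma.decompress(with_preset)
restored_filters = lzma.decompress(with_filters)
```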

Delta

class numcodecs.delta.Delta(dtype, astype=None)

Codec to encode data as the difference between adjacent values.

Parameters:

dtype : dtype

Data type to use for decoded data.

astype : dtype, optional

Data type to use for encoded data.

Notes

If astype is an integer data type, please ensure that it is sufficiently large to store encoded values. No checks are made and data may become corrupted due to integer overflow if astype is too small. Note also that the encoded data for each chunk includes the absolute value of the first element in the chunk, and so the encoded data type in general needs to be large enough to store absolute values from the array.

Examples

>>> import numcodecs
>>> import numpy as np
>>> x = np.arange(100, 120, 2, dtype='i8')
>>> codec = numcodecs.Delta(dtype='i8', astype='i1')
>>> y = codec.encode(x)
>>> y
array([100,   2,   2,   2,   2,   2,   2,   2,   2,   2], dtype=int8)
>>> z = codec.decode(y)
>>> z
array([100, 102, 104, 106, 108, 110, 112, 114, 116, 118])

codec_id = 'delta'
encode(buf)
decode(buf, out=None)
get_config()
from_config(config)

Instantiate codec from a configuration object.
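The overflow caveat in the notes is easier to see with the arithmetic written out. The sketch below is a pure-Python illustration of delta encoding on lists, not the Delta codec's implementation. `delta_encode`, `delta_decode` and `fits_int8` are hypothetical helpers, and the input is assumed to be non-empty.

```python
def delta_encode(values):
    # Keep the first value as-is, then store successive differences.
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    out, total = [], 0
    for d in deltas:
        total += d          # running sum restores the original values
        out.append(total)
    return out


def fits_int8(values):
    return all(-128 <= v <= 127 for v in values)


x = list(range(100, 120, 2))
y = delta_encode(x)
z = delta_decode(y)

# The differences are small, but the first element is stored unchanged,
# so it must also fit in the encoded type.
too_big = delta_encode([200, 202, 204])
```

Here `y` fits in an 8-bit signed integer, while `too_big` does not, because its first element is 200 even though every difference is 2.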

FixedScaleOffset

class numcodecs.fixedscaleoffset.FixedScaleOffset(offset, scale, dtype, astype=None)

Simplified version of the scale-offset filter available in HDF5. Applies the transformation (x - offset) * scale to all chunks. Results are rounded to the nearest integer but are not packed according to the minimum number of bits.

Parameters:

offset : float

Value to subtract from data.

scale : int

Value to multiply data by.

dtype : dtype

Data type to use for decoded data.

astype : dtype, optional

Data type to use for encoded data.

Notes

If astype is an integer data type, please ensure that it is sufficiently large to store encoded values. No checks are made and data may become corrupted due to integer overflow if astype is too small.

Examples

>>> import numcodecs
>>> import numpy as np
>>> x = np.linspace(1000, 1001, 10, dtype='f8')
>>> x
array([ 1000.        ,  1000.11111111,  1000.22222222,  1000.33333333,
        1000.44444444,  1000.55555556,  1000.66666667,  1000.77777778,
        1000.88888889,  1001.        ])
>>> codec = numcodecs.FixedScaleOffset(offset=1000, scale=10, dtype='f8', astype='u1')
>>> y1 = codec.encode(x)
>>> y1
array([ 0,  1,  2,  3,  4,  6,  7,  8,  9, 10], dtype=uint8)
>>> z1 = codec.decode(y1)
>>> z1
array([ 1000. ,  1000.1,  1000.2,  1000.3,  1000.4,  1000.6,  1000.7,
        1000.8,  1000.9,  1001. ])
>>> codec = numcodecs.FixedScaleOffset(offset=1000, scale=10**2, dtype='f8', astype='u1')
>>> y2 = codec.encode(x)
>>> y2
array([  0,  11,  22,  33,  44,  56,  67,  78,  89, 100], dtype=uint8)
>>> z2 = codec.decode(y2)
>>> z2
array([ 1000.  ,  1000.11,  1000.22,  1000.33,  1000.44,  1000.56,
        1000.67,  1000.78,  1000.89,  1001.  ])
>>> codec = numcodecs.FixedScaleOffset(offset=1000, scale=10**3, dtype='f8', astype='u2')
>>> y3 = codec.encode(x)
>>> y3
array([   0,  111,  222,  333,  444,  556,  667,  778,  889, 1000], dtype=uint16)
>>> z3 = codec.decode(y3)
>>> z3
array([ 1000.   ,  1000.111,  1000.222,  1000.333,  1000.444,  1000.556,
        1000.667,  1000.778,  1000.889,  1001.   ])

codec_id = 'fixedscaleoffset'
encode(buf)
decode(buf, out=None)
get_config()
from_config(config)

Instantiate codec from a configuration object.
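The transformation (x - offset) * scale and its inverse can be written out in a few lines. The sketch below works on plain Python lists and is an illustration of the arithmetic only. `fso_encode` and `fso_decode` are hypothetical names, and the inverse (code / scale + offset) is inferred from the forward formula.

```python
def fso_encode(values, offset, scale):
    # Subtract the offset, scale, and round to the nearest integer.
    return [round((v - offset) * scale) for v in values]


def fso_decode(codes, offset, scale):
    # Inverse of the transformation above, up to the rounding error.
    return [c / scale + offset for c in codes]


x = [1000 + i / 9 for i in range(10)]

y1 = fso_encode(x, offset=1000, scale=10)
z1 = fso_decode(y1, offset=1000, scale=10)

y2 = fso_encode(x, offset=1000, scale=100)
y3 = fso_encode(x, offset=1000, scale=1000)

# Largest absolute error after a round trip with scale=10.
max_error = max(abs(a - b) for a, b in zip(x, z1))
```

The rounding error is at most 0.5 / scale, and the largest code grows with scale: `max(y3)` is 1000, which no longer fits in an unsigned 8-bit integer.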

Quantize

class numcodecs.quantize.Quantize(digits, dtype, astype=None)

Lossy filter to reduce the precision of floating point data.

Parameters:

digits : int

Desired precision (number of decimal digits).

dtype : dtype

Data type to use for decoded data.

astype : dtype, optional

Data type to use for encoded data.

Examples

>>> import numcodecs
>>> import numpy as np
>>> x = np.linspace(0, 1, 10, dtype='f8')
>>> x
array([ 0.        ,  0.11111111,  0.22222222,  0.33333333,  0.44444444,
        0.55555556,  0.66666667,  0.77777778,  0.88888889,  1.        ])
>>> codec = numcodecs.Quantize(digits=1, dtype='f8')
>>> codec.encode(x)
array([ 0.    ,  0.125 ,  0.25  ,  0.3125,  0.4375,  0.5625,  0.6875,
        0.75  ,  0.875 ,  1.    ])
>>> codec = numcodecs.Quantize(digits=2, dtype='f8')
>>> codec.encode(x)
array([ 0.       ,  0.109375 ,  0.21875  ,  0.3359375,  0.4453125,
        0.5546875,  0.6640625,  0.78125  ,  0.890625 ,  1.       ])
>>> codec = numcodecs.Quantize(digits=3, dtype='f8')
>>> codec.encode(x)
array([ 0.        ,  0.11132812,  0.22265625,  0.33300781,  0.44433594,
        0.55566406,  0.66699219,  0.77734375,  0.88867188,  1.        ])

codec_id = 'quantize'
encode(buf)
decode(buf, out=None)
get_config()
from_config(config)

Instantiate codec from a configuration object.
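The example outputs above are all multiples of a power of two (1/16 for digits=1, 1/128 for digits=2). The sketch below reproduces them under the assumption that values are rounded to the nearest multiple of 2**-bits, where bits is the smallest integer with 2**bits >= 10**digits. This rule is inferred from the printed output rather than taken from the codec's source, and `quantize` is a hypothetical function.

```python
import math


def quantize(values, digits):
    # Smallest power of two that resolves the requested decimal digits.
    bits = math.ceil(math.log2(10 ** digits))
    scale = 2 ** bits
    return [round(v * scale) / scale for v in values]


x = [i / 9 for i in range(10)]
q1 = quantize(x, digits=1)
q2 = quantize(x, digits=2)

# Largest absolute error introduced for digits=1.
max_error = max(abs(a - b) for a, b in zip(x, q1))
```

Rounding to a power-of-two grid leaves trailing zero bits in each value's mantissa, which is what makes the output easier for a later compressor to handle.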

PackBits

class numcodecs.packbits.PackBits

Codec to pack elements of a boolean array into bits in a uint8 array.

Notes

The first element of the encoded array stores the number of bits that were padded to complete the final byte.

Examples

>>> import numcodecs
>>> import numpy as np
>>> codec = numcodecs.PackBits()
>>> x = np.array([True, False, False, True], dtype=bool)
>>> y = codec.encode(x)
>>> y
array([  4, 144], dtype=uint8)
>>> z = codec.decode(y)
>>> z
array([ True, False, False,  True], dtype=bool)

codec_id = 'packbits'
encode(buf)
decode(buf, out=None)
get_config()

Return a dictionary holding configuration parameters for this codec. Must include an ‘id’ field with the codec identifier. All values must be compatible with JSON encoding.

from_config(config)

Instantiate codec from a configuration object.
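The layout described in the notes (a leading padding count followed by the packed bytes) can be sketched in pure Python. `pack_bits` and `unpack_bits` are hypothetical helpers working on lists. Bits are packed most significant first, which is an assumption consistent with the example output above.

```python
def pack_bits(flags):
    flags = list(flags)
    n_pad = (-len(flags)) % 8            # bits needed to fill the last byte
    bits = flags + [False] * n_pad
    packed = [n_pad]                     # first element: padding count
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | int(bit)  # most significant bit first
        packed.append(byte)
    return packed


def unpack_bits(packed):
    n_pad, body = packed[0], packed[1:]
    bits = [bool((byte >> shift) & 1)
            for byte in body
            for shift in range(7, -1, -1)]
    return bits[:len(bits) - n_pad]      # drop the padding bits


x = [True, False, False, True]
y = pack_bits(x)
z = unpack_bits(y)
```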

Categorize

class numcodecs.categorize.Categorize(labels, dtype, astype='u1')

Filter encoding categorical string data as integers.

Parameters:

labels : sequence of strings

Category labels.

dtype : dtype

Data type to use for decoded data.

astype : dtype, optional

Data type to use for encoded data.

Examples

>>> import numcodecs
>>> import numpy as np
>>> x = np.array(['male', 'female', 'female', 'male', 'unexpected'], dtype=object)
>>> x
array(['male', 'female', 'female', 'male', 'unexpected'],
      dtype=object)
>>> codec = numcodecs.Categorize(labels=['female', 'male'], dtype=object)
>>> y = codec.encode(x)
>>> y
array([2, 1, 1, 2, 0], dtype=uint8)
>>> z = codec.decode(y)
>>> z
array(['male', 'female', 'female', 'male', ''],
      dtype=object)

codec_id = 'categorize'
encode(buf)
decode(buf, out=None)
get_config()
from_config(config)

Instantiate codec from a configuration object.
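The mapping shown in the example (labels numbered from 1, with 0 reserved for values not in the label list) can be sketched with a dictionary. `categorize_encode` and `categorize_decode` are hypothetical helpers operating on lists, not the codec's implementation.

```python
def categorize_encode(values, labels):
    # Labels are numbered from 1; 0 marks anything not in the list.
    codes = {label: i + 1 for i, label in enumerate(labels)}
    return [codes.get(v, 0) for v in values]


def categorize_decode(codes, labels):
    # Code 0 decodes to an empty string, as in the example above.
    return [labels[c - 1] if c > 0 else "" for c in codes]


labels = ["female", "male"]
x = ["male", "female", "female", "male", "unexpected"]
y = categorize_encode(x, labels)
z = categorize_decode(y, labels)
```

The round trip is lossy for unexpected values, so the label list needs to cover every category that must survive decoding.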

32-bit checksums

CRC32

class numcodecs.checksum32.CRC32
codec_id = 'crc32'
encode(buf)
decode(buf, out=None)
get_config()

Return a dictionary holding configuration parameters for this codec. Must include an ‘id’ field with the codec identifier. All values must be compatible with JSON encoding.

from_config(config)

Instantiate codec from a configuration object.

Adler32

class numcodecs.checksum32.Adler32
codec_id = 'adler32'
encode(buf)
decode(buf, out=None)
get_config()

Return a dictionary holding configuration parameters for this codec. Must include an ‘id’ field with the codec identifier. All values must be compatible with JSON encoding.

from_config(config)

Instantiate codec from a configuration object.
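Neither checksum class is described further here. As a rough idea of how a checksum codec can work, the sketch below prepends a CRC32 of the data when encoding and verifies it when decoding. The layout (a 4-byte little-endian checksum stored as a prefix), the helper names `crc32_encode` and `crc32_decode`, and the exception type are all assumptions made for illustration.

```python
import zlib


def crc32_encode(buf):
    data = bytes(buf)
    # Assumed layout: 4-byte little-endian checksum, then the data.
    return zlib.crc32(data).to_bytes(4, "little") + data


def crc32_decode(buf):
    raw = bytes(buf)
    expected = int.from_bytes(raw[:4], "little")
    data = raw[4:]
    if zlib.crc32(data) != expected:
        raise ValueError("checksum mismatch")
    return data


encoded = crc32_encode(b"some chunk of data")
decoded = crc32_decode(encoded)

# Flip one bit in the payload and check that decoding notices.
corrupted = bytearray(encoded)
corrupted[-1] ^= 0x01
try:
    crc32_decode(corrupted)
    detected = False
except ValueError:
    detected = True
```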

AsType

class numcodecs.astype.AsType(encode_dtype, decode_dtype)

Filter to convert data between different types.

Parameters:

encode_dtype : dtype

Data type to use for encoded data.

decode_dtype : dtype, optional

Data type to use for decoded data.

Notes

If encode_dtype is of lower precision than decode_dtype, please be aware that data loss can occur when writing data to disk using this filter. No checks are made to ensure the cast is safe in that direction, so data may be silently corrupted.

Examples

>>> import numcodecs
>>> import numpy as np
>>> x = np.arange(100, 120, 2, dtype=np.int8)
>>> x
array([100, 102, 104, 106, 108, 110, 112, 114, 116, 118], dtype=int8)
>>> f = numcodecs.AsType(encode_dtype=x.dtype, decode_dtype=np.int64)
>>> y = f.decode(x)
>>> y
array([100, 102, 104, 106, 108, 110, 112, 114, 116, 118])
>>> z = f.encode(y)
>>> z
array([100, 102, 104, 106, 108, 110, 112, 114, 116, 118], dtype=int8)

codec_id = 'astype'
encode(buf)
decode(buf, out=None)
get_config()
from_config(config)

Instantiate codec from a configuration object.
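The precision caveat in the notes can be demonstrated without NumPy by storing a 64-bit float in 32 bits and reading it back. The sketch below uses the standard library's struct module. `as_float32` is a hypothetical helper standing in for an encode step to a lower-precision type followed by a decode step.

```python
import struct


def as_float32(value):
    # Store as a 32-bit float, then read it back as a Python float.
    return struct.unpack("<f", struct.pack("<f", value))[0]


original = 0.1
stored = as_float32(original)
difference = abs(stored - original)

# Values exactly representable in 32 bits survive unchanged.
exact = as_float32(0.5)
```

The difference is small but non-zero, and nothing in the round trip signals that precision was lost.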

JSON

class numcodecs.json.JSON(encoding='utf-8', skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=True, indent=None, separators=None, strict=True)

Codec to encode data as JSON. Useful for encoding an array of Python string objects.

Examples

>>> import numcodecs
>>> import numpy as np
>>> x = np.array(['foo', 'bar', 'baz'], dtype='object')
>>> codec = numcodecs.JSON()
>>> codec.decode(codec.encode(x))
array(['foo', 'bar', 'baz'], dtype=object)

codec_id = 'json'
encode(buf)
decode(buf, out=None)
get_config()
from_config(config)

Instantiate codec from a configuration object.

Pickle

class numcodecs.pickles.Pickle(protocol=2)

Codec to encode data as pickled bytes. Useful for encoding an array of Python string objects.

Parameters:

protocol : int, defaults to pickle.HIGHEST_PROTOCOL

The protocol used to pickle data.

Examples

>>> import numcodecs as codecs
>>> import numpy as np
>>> x = np.array(['foo', 'bar', 'baz'], dtype='object')
>>> f = codecs.Pickle()
>>> f.decode(f.encode(x))
array(['foo', 'bar', 'baz'], dtype=object)

codec_id = 'pickle'
encode(buf)
decode(buf, out=None)
get_config()
from_config(config)

Instantiate codec from a configuration object.

MsgPack

class numcodecs.msgpacks.MsgPack(encoding='utf-8')

Codec to encode data as msgpacked bytes. Useful for encoding an array of Python string objects.

Notes

Requires msgpack-python to be installed.

Examples

>>> import numcodecs
>>> import numpy as np
>>> x = np.array(['foo', 'bar', 'baz'], dtype='object')
>>> codec = numcodecs.MsgPack()
>>> codec.decode(codec.encode(x))
array(['foo', 'bar', 'baz'], dtype=object)

codec_id = 'msgpack'
encode(buf)
decode(buf, out=None)
get_config()
from_config(config)

Instantiate codec from a configuration object.

Codecs for variable-length objects

VLenUTF8

class numcodecs.vlen.VLenUTF8

Encode variable-length unicode string objects via UTF-8.

Notes

The encoded bytes values for each string are packed into a parquet-style byte array.

Examples

>>> import numcodecs
>>> import numpy as np
>>> x = np.array(['foo', 'bar', 'baz'], dtype='object')
>>> codec = numcodecs.VLenUTF8()
>>> codec.decode(codec.encode(x))
array(['foo', 'bar', 'baz'], dtype=object)

codec_id = 'vlen-utf8'
encode(self, buf)
decode(self, buf, out=None)
get_config()

Return a dictionary holding configuration parameters for this codec. Must include an ‘id’ field with the codec identifier. All values must be compatible with JSON encoding.

from_config(config)

Instantiate codec from a configuration object.
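A length-prefixed layout in the style of Parquet byte arrays can be sketched as follows. The exact framing used here is an assumption of the sketch: a 4-byte little-endian item count, then each value as a 4-byte little-endian length followed by its UTF-8 bytes. `vlen_utf8_encode` and `vlen_utf8_decode` are hypothetical helpers, not the codec's implementation.

```python
import struct


def vlen_utf8_encode(strings):
    parts = [struct.pack("<I", len(strings))]      # assumed item count
    for s in strings:
        raw = s.encode("utf-8")
        parts.append(struct.pack("<I", len(raw)))  # length prefix
        parts.append(raw)
    return b"".join(parts)


def vlen_utf8_decode(buf):
    (count,) = struct.unpack_from("<I", buf, 0)
    pos, out = 4, []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        out.append(bytes(buf[pos:pos + length]).decode("utf-8"))
        pos += length
    return out


x = ["foo", "bar", "baz"]
encoded = vlen_utf8_encode(x)
decoded = vlen_utf8_decode(encoded)

# Lengths count encoded bytes, so non-ASCII text also round-trips.
mixed = vlen_utf8_decode(vlen_utf8_encode(["h\u00e9llo", ""]))
```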

VLenBytes

class numcodecs.vlen.VLenBytes

Encode variable-length byte string objects.

Notes

The bytes values for each string are packed into a parquet-style byte array.

Examples

>>> import numcodecs
>>> import numpy as np
>>> x = np.array([b'foo', b'bar', b'baz'], dtype='object')
>>> codec = numcodecs.VLenBytes()
>>> codec.decode(codec.encode(x))
array([b'foo', b'bar', b'baz'], dtype=object)

codec_id = 'vlen-bytes'
encode(self, buf)
decode(self, buf, out=None)
get_config()

Return a dictionary holding configuration parameters for this codec. Must include an ‘id’ field with the codec identifier. All values must be compatible with JSON encoding.

from_config(config)

Instantiate codec from a configuration object.

VLenArray

class numcodecs.vlen.VLenArray

Encode variable-length 1-dimensional arrays.

Notes

The binary data for each array are packed into a parquet-style byte array.

Examples

>>> import numcodecs
>>> import numpy as np
>>> x = np.array([[1, 3, 5], [4], [7, 9]], dtype='object')
>>> codec = numcodecs.VLenArray('<i4')
>>> codec.decode(codec.encode(x))
array([array([1, 3, 5], dtype=int32), array([4], dtype=int32),
       array([7, 9], dtype=int32)], dtype=object)

codec_id = 'vlen-array'
encode(self, buf)
decode(self, buf, out=None)
get_config(self)
from_config(config)

Instantiate codec from a configuration object.

Release notes

0.5.2

  • Add support for encoding None values in VLen... codecs (#59).

0.5.1

  • Fixed a compatibility issue with the Zlib codec to ensure it can handle bytearray objects under Python 2.7 (#57).
  • Restricted the numcodecs.categorize.Categorize codec to object (‘O’) and unicode (‘U’) dtypes and disallowed bytes (‘S’) dtypes because these do not round-trip through JSON configuration.

0.5.0

0.4.1

  • Resolved an issue where providing an array with dtype object as the destination when decoding could cause segfaults with some codecs (#55).

0.4.0

0.3.1

  • Revert the default shuffle argument to SHUFFLE (byte shuffle) for the numcodecs.blosc.Blosc codec for compatibility and consistency with previous code.

0.3.0

  • The numcodecs.blosc.Blosc codec has been made robust for usage in both multithreading and multiprocessing programs, regardless of whether Blosc has been configured to use multiple threads internally or not (#41, #42).
  • The numcodecs.blosc.Blosc codec now supports an AUTOSHUFFLE argument when encoding (compressing) which activates bit- or byte-shuffle depending on the itemsize of the incoming buffer (#37, #42). This is also now the default.
  • The numcodecs.blosc.Blosc codec now raises an exception when an invalid compressor name is provided under all circumstances (#40, #42).
  • The bundled version of the c-blosc library has been upgraded to version 1.12.1 (#45, #42).
  • An improvement has been made to the system detection capabilities during compilation of C extensions (by Prakhar Goel; #36, #38).
  • Arrays with datetime64 or timedelta64 can now be passed directly to compressor codecs (#39, #46).

0.2.1

The bundled c-blosc library has been upgraded to version 1.11.3 (#34, #35).

0.2.0

New codecs:

Other changes:

Maintenance work:

  • A data fixture has been added to the test suite to add some protection against changes to codecs that break backwards-compatibility with data encoded using a previous release of numcodecs (#30, #33).

0.1.1

This release includes a small modification to the setup.py script to provide greater control over how compiler options for different instruction sets are configured (#24, #27).

0.1.0

New codecs:

Other new features:

  • The numcodecs.lzma.LZMA codec is now supported on Python 2.7 if backports.lzma is installed (John Kirkham; #11, #13).
  • The bundled c-blosc library has been upgraded to version 1.11.2 (#10, #18).
  • An option has been added to the numcodecs.blosc.Blosc codec to allow the block size to be manually configured (#9, #19).
  • The representation string for the numcodecs.blosc.Blosc codec has been tweaked to help with understanding the shuffle option (#4, #19).
  • Options have been added to manually control how the C extensions are built regardless of the architecture of the system on which the build is run. To disable support for AVX2 set the environment variable “DISABLE_NUMCODECS_AVX2”. To disable support for SSE2 set the environment variable “DISABLE_NUMCODECS_SSE2”. To disable C extensions altogether set the environment variable “DISABLE_NUMCODECS_CEXT” (#24, #26).

Maintenance work:

  • CI tests now run under Python 3.6 as well as 2.7, 3.4, 3.5 (#16, #17).
  • Test coverage is now monitored via coveralls (#15, #20).

0.0.1

Fixed project description in setup.py.

0.0.0

First release. This version is a port of the codecs module from Zarr 2.1.0. The following changes have been made from the original Zarr module:

  • Codec classes have been re-organized into separate modules, mostly one per codec class, for ease of maintenance.
  • Two new codec classes have been added based on 32-bit checksums: numcodecs.checksum32.CRC32 and numcodecs.checksum32.Adler32.
  • The Blosc extension has been refactored to remove code duplications related to handling of buffer compatibility.

Acknowledgments

Numcodecs bundles the c-blosc library.

Development of this package is supported by the MRC Centre for Genomics and Global Health.
