Categorize

class numcodecs.categorize.Categorize(labels, dtype, astype='u1')

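Filter encoding categorical string data as integers. Each label is encoded as its 1-based position in labels; any value not found in labels is encoded as 0 and decodes to an empty string.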

Parameters:
labels : sequence of strings

Category labels.

dtype : dtype

Data type to use for decoded data.

astype : dtype, optional

Data type to use for encoded data. Defaults to 'u1' (unsigned 8-bit integer).

Examples

>>> import numcodecs
>>> import numpy as np
>>> x = np.array(['male', 'female', 'female', 'male', 'unexpected'], dtype=object)
>>> x
array(['male', 'female', 'female', 'male', 'unexpected'],
      dtype=object)
>>> codec = numcodecs.Categorize(labels=['female', 'male'], dtype=object)
>>> y = codec.encode(x)
>>> y
array([2, 1, 1, 2, 0], dtype=uint8)
>>> z = codec.decode(y)
>>> z
array(['male', 'female', 'female', 'male', ''],
      dtype=object)

codec_id = 'categorize'

encode(buf)

Encode data in buf.

Parameters:
buf : buffer-like

Data to be encoded. May be any object supporting the new-style buffer protocol or array.array under Python 2.

Returns:
enc : buffer-like

Encoded data. May be any object supporting the new-style buffer protocol or array.array under Python 2.

decode(buf, out=None)

Decode data in buf.

Parameters:
buf : buffer-like

Encoded data. May be any object supporting the new-style buffer protocol or array.array under Python 2.

out : buffer-like, optional

Writeable buffer to store decoded data. N.B. if provided, this buffer must be exactly the right size to store the decoded data.

Returns:
dec : buffer-like

Decoded data. May be any object supporting the new-style buffer protocol or array.array under Python 2.

get_config()

Return a dictionary holding configuration parameters for this codec. Must include an ‘id’ field with the codec identifier. All values must be compatible with JSON encoding.

classmethod from_config(config)

Instantiate codec from a configuration object.