Categorize

class numcodecs.categorize.Categorize(labels, dtype, astype='u1')

Filter that encodes categorical string data as integers.

Parameters:
labels : sequence of strings

Category labels.

dtype : dtype

Data type to use for decoded data.

astype : dtype, optional

Data type to use for encoded data.

Examples

>>> import numcodecs
>>> import numpy as np
>>> x = np.array(['male', 'female', 'female', 'male', 'unexpected'], dtype=object)
>>> x
array(['male', 'female', 'female', 'male', 'unexpected'],
      dtype=object)
>>> codec = numcodecs.Categorize(labels=['female', 'male'], dtype=object)
>>> y = codec.encode(x)
>>> y
array([2, 1, 1, 2, 0], dtype=uint8)
>>> z = codec.decode(y)
>>> z
array(['male', 'female', 'female', 'male', ''],
      dtype=object)
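
The output above suggests the mapping rule: each label receives a 1-based code in the order given, and 0 is reserved for values that are not in labels, which then decode to an empty string. The pure-Python sketch below mimics that rule for illustration only. The helpers encode_labels and decode_labels are hypothetical and are not part of numcodecs; the real codec operates on NumPy arrays.

```python
def encode_labels(values, labels):
    # Hypothetical helper: 1-based codes in label order, 0 for anything unlisted.
    codes = {label: i + 1 for i, label in enumerate(labels)}
    return [codes.get(v, 0) for v in values]


def decode_labels(codes, labels):
    # Hypothetical helper: invert the mapping; code 0 (or any code outside
    # the label range) becomes the empty string.
    return [labels[c - 1] if 1 <= c <= len(labels) else '' for c in codes]


labels = ['female', 'male']
values = ['male', 'female', 'female', 'male', 'unexpected']
encoded = encode_labels(values, labels)
decoded = decode_labels(encoded, labels)
```

Because every unlisted value collapses to 0, the round trip is lossy for those values: 'unexpected' comes back as ''.
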
codec_id = 'categorize'

Codec identifier.

encode(buf)

Encode data in buf.

Parameters:
buf : buffer-like

Data to be encoded. May be any object supporting the new-style buffer protocol.

Returns:
enc : buffer-like

Encoded data. May be any object supporting the new-style buffer protocol.

decode(buf, out=None)

Decode data in buf.

Parameters:
buf : buffer-like

Encoded data. May be any object supporting the new-style buffer protocol.

out : buffer-like, optional

Writeable buffer to store decoded data. N.B. if provided, this buffer must be exactly the right size to store the decoded data.

Returns:
dec : buffer-like

Decoded data. May be any object supporting the new-style buffer protocol.

get_config()

Return a dictionary holding configuration parameters for this codec. The dictionary must include an ‘id’ field with the codec identifier, and all values must be compatible with JSON encoding.
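
A configuration dictionary that meets these requirements might look like the sketch below. Only the 'id' key is stated by the description above; the other keys and their values are assumptions chosen to mirror the constructor parameters, and the actual output of get_config() may differ.

```python
import json

# Illustrative config; only 'id' is required by the description above.
# The remaining keys are assumed to mirror the constructor arguments.
config = {
    'id': 'categorize',
    'labels': ['female', 'male'],
    'dtype': '|O',
    'astype': '|u1',
}

# JSON compatibility means the dictionary survives a dump/load round trip.
restored = json.loads(json.dumps(config))
```

Storing data types as strings rather than dtype objects is what keeps such a dictionary serializable with the standard json module.
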

classmethod from_config(config)

Instantiate codec from a configuration object.
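
The relationship between get_config() and from_config() can be pictured with a toy class. ToyCategorize below is a hypothetical stand-in, not the numcodecs class, and the way it handles the 'id' entry is an assumption made for this sketch.

```python
class ToyCategorize:
    # Hypothetical stand-in for a codec; it only stores its parameters.
    codec_id = 'categorize'

    def __init__(self, labels, dtype, astype='u1'):
        self.labels = list(labels)
        self.dtype = dtype
        self.astype = astype

    def get_config(self):
        # Include the identifier plus every constructor parameter.
        return {'id': self.codec_id, 'labels': self.labels,
                'dtype': self.dtype, 'astype': self.astype}

    @classmethod
    def from_config(cls, config):
        # Assumption: 'id' only selects the class, so drop it and pass
        # the remaining entries to the constructor.
        params = {k: v for k, v in config.items() if k != 'id'}
        return cls(**params)


codec = ToyCategorize(labels=['female', 'male'], dtype='O')
clone = ToyCategorize.from_config(codec.get_config())
```

In this sketch the clone reports the same configuration as the original, which is the property that lets a codec be rebuilt from stored metadata.
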