Categorize

class numcodecs.categorize.Categorize(labels, dtype, astype='u1')

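Filter encoding categorical string data as integers. As the example below shows, each label is encoded by its 1-based position in labels; values not listed in labels are encoded as 0 and decode to an empty string.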

Parameters
labels : sequence of strings

Category labels.

dtype : dtype

Data type to use for decoded data.

astype : dtype, optional

Data type to use for encoded data.

Examples

>>> import numcodecs
>>> import numpy as np
>>> x = np.array(['male', 'female', 'female', 'male', 'unexpected'], dtype=object)
>>> x
array(['male', 'female', 'female', 'male', 'unexpected'],
      dtype=object)
>>> codec = numcodecs.Categorize(labels=['female', 'male'], dtype=object)
>>> y = codec.encode(x)
>>> y
array([2, 1, 1, 2, 0], dtype=uint8)
>>> z = codec.decode(y)
>>> z
array(['male', 'female', 'female', 'male', ''],
      dtype=object)
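
The mapping that the example output suggests can be sketched in plain Python. This is an illustration only, not the numcodecs implementation: the function names `encode_labels` and `decode_labels` are hypothetical, and the scheme (1-based label positions, 0 for anything not in the label list, empty string on decode) is inferred from the output shown above.

```python
# Sketch of the label-to-integer mapping suggested by the example above.
# NOT the numcodecs implementation; both function names are hypothetical.

def encode_labels(values, labels):
    """Map each value to its 1-based position in labels, or 0 if absent."""
    codes = {label: i + 1 for i, label in enumerate(labels)}
    return [codes.get(v, 0) for v in values]


def decode_labels(codes, labels):
    """Invert encode_labels; code 0 (or any unknown code) becomes ''."""
    lookup = {i + 1: label for i, label in enumerate(labels)}
    return [lookup.get(c, "") for c in codes]


labels = ["female", "male"]
x = ["male", "female", "female", "male", "unexpected"]
y = encode_labels(x, labels)
z = decode_labels(y, labels)
print(y)  # [2, 1, 1, 2, 0]
print(z)  # ['male', 'female', 'female', 'male', '']
```

Because 0 is taken by unlisted values, the round trip is lossy for them: 'unexpected' comes back as ''. An unsigned 8-bit integer holds 0 to 255, so with the default astype='u1' a wider astype is presumably needed once there are more than 255 labels.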
codec_id = 'categorize'

Codec identifier.

encode(buf)

Encode data in buf.

Parameters
buf : buffer-like

Data to be encoded. May be any object supporting the new-style buffer protocol.

Returns
enc : buffer-like

Encoded data. May be any object supporting the new-style buffer protocol.

decode(buf, out=None)

Decode data in buf.

Parameters
buf : buffer-like

Encoded data. May be any object supporting the new-style buffer protocol.

out : buffer-like, optional

Writeable buffer in which to store the decoded data. N.B. If provided, this buffer must be exactly the right size to hold the decoded data.

Returns
dec : buffer-like

Decoded data. May be any object supporting the new-style buffer protocol.

get_config()

Return a dictionary holding configuration parameters for this codec. Must include an ‘id’ field with the codec identifier. All values must be compatible with JSON encoding.

classmethod from_config(config)

Instantiate codec from a configuration object.
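
The configuration protocol described by get_config() and from_config() can be sketched as a JSON round trip. The class below is a hypothetical stand-in, not the numcodecs class: `SketchCodec`, its 'sketch' identifier, and the assumption that every key other than 'id' maps onto a constructor argument are all illustrative.

```python
import json


# Hypothetical stand-in for a codec class; it only mirrors the
# get_config / from_config protocol described above.
class SketchCodec:
    codec_id = "sketch"  # hypothetical identifier

    def __init__(self, labels, astype="u1"):
        self.labels = list(labels)
        self.astype = astype

    def get_config(self):
        # Must include 'id'; every value is JSON-compatible (str or list).
        return {"id": self.codec_id, "labels": self.labels, "astype": self.astype}

    @classmethod
    def from_config(cls, config):
        # Assumption: every key except 'id' is a constructor argument.
        kwargs = {k: v for k, v in config.items() if k != "id"}
        return cls(**kwargs)


codec = SketchCodec(labels=["female", "male"])
stored = json.dumps(codec.get_config())  # e.g. saved alongside the data
restored = SketchCodec.from_config(json.loads(stored))
print(restored.get_config() == codec.get_config())  # True
```

Restricting the configuration to JSON-compatible values is what lets it be stored as text and later used to rebuild an equivalent codec.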