Categorize
- class numcodecs.categorize.Categorize(labels, dtype, astype='u1')
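Filter encoding categorical string data as integers. A value equal to the label at position i in labels is encoded as i + 1; any value not found in labels is encoded as 0 and decodes to an empty string.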
- Parameters:
- labels : sequence of strings
Category labels.
- dtype : dtype
Data type to use for decoded data; must be a unicode ('U') or object ('O') dtype.
- astype : dtype, optional
Data type to use for encoded data. Defaults to 'u1' (uint8).
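Because codes run from 1 to len(labels), with 0 reserved for values not found in labels (in the Examples, 'unexpected' encodes to 0), astype limits how many labels fit: the default 'u1' leaves room for at most 255. The helper below, smallest_code_dtype, is a hypothetical sketch (not part of numcodecs) that picks the narrowest unsigned dtype for a given label count under that assumption.

```python
import numpy as np


def smallest_code_dtype(n_labels):
    """Hypothetical helper: narrowest unsigned dtype code that holds codes 0..n_labels.

    Assumes code 0 is reserved for unknown values and labels use codes 1..n_labels.
    """
    for code in ("u1", "u2", "u4", "u8"):
        if n_labels <= np.iinfo(code).max:
            return code
    raise ValueError("too many labels for any unsigned integer dtype")


labels = [f"category-{i}" for i in range(300)]
astype = smallest_code_dtype(len(labels))  # 300 > 255, so this is 'u2'
```

The returned string could then be passed as astype when constructing the codec.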
Examples
>>> import numcodecs
>>> import numpy as np
>>> x = np.array(['male', 'female', 'female', 'male', 'unexpected'], dtype=object)
>>> x
array(['male', 'female', 'female', 'male', 'unexpected'], dtype=object)
>>> codec = numcodecs.Categorize(labels=['female', 'male'], dtype=object)
>>> y = codec.encode(x)
>>> y
array([2, 1, 1, 2, 0], dtype=uint8)
>>> z = codec.decode(y)
>>> z
array(['male', 'female', 'female', 'male', ''], dtype=object)
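The output above implies the mapping: 'female' (position 0) becomes 1, 'male' (position 1) becomes 2, and the unknown 'unexpected' becomes 0, which decodes back to ''. The following pure-NumPy sketch reproduces that mapping for illustration only; encode_categories and decode_categories are hypothetical names, and this is not the numcodecs implementation.

```python
import numpy as np


def encode_categories(values, labels, code_dtype="u1"):
    """Sketch: label at position i -> code i + 1; anything else -> 0."""
    values = np.asarray(values, dtype=object).reshape(-1)
    codes = np.zeros(values.shape, dtype=code_dtype)
    for i, label in enumerate(labels):
        codes[values == label] = i + 1
    return codes


def decode_categories(codes, labels, out_dtype=object):
    """Sketch: code i + 1 -> label at position i; code 0 -> empty string."""
    codes = np.asarray(codes).reshape(-1)
    values = np.full(codes.shape, "", dtype=out_dtype)
    for i, label in enumerate(labels):
        values[codes == i + 1] = label
    return values


labels = ["female", "male"]
codes = encode_categories(["male", "female", "female", "male", "unexpected"], labels)
print(codes.tolist())                             # [2, 1, 1, 2, 0]
print(decode_categories(codes, labels).tolist())  # ['male', 'female', 'female', 'male', '']
```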
- codec_id: str | None = 'categorize'
Codec identifier.
- encode(buf)
Encode data in buf.
- Parameters:
- buf : buffer-like
Data to be encoded. May be any object supporting the new-style buffer protocol.
- Returns:
- enc : buffer-like
Encoded data. May be any object supporting the new-style buffer protocol.
- decode(buf, out=None)
Decode data in buf.
- Parameters:
- buf : buffer-like
Encoded data. May be any object supporting the new-style buffer protocol.
- out : buffer-like, optional
Writeable buffer to store decoded data. N.B. if provided, this buffer must be exactly the right size to store the decoded data.
- Returns:
- dec : buffer-like
Decoded data. May be any object supporting the new-style buffer protocol.
- get_config()
Return a dictionary holding configuration parameters for this codec. Must include an ‘id’ field with the codec identifier. All values must be compatible with JSON encoding.
- classmethod from_config(config)
Instantiate codec from a configuration object.