Categorize
class numcodecs.categorize.Categorize(labels, dtype, astype='u1')
Filter encoding categorical string data as integers.
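Parameters:
- labels : sequence of strings
Category labels. Each label is encoded as its 1-based position in this sequence; 0 is reserved for values that match no label.
- dtype : dtype
Data type to use for decoded data.
- astype : dtype, optional
Data type to use for encoded data. Defaults to 'u1' (unsigned 8-bit integer).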
Examples
>>> import numcodecs
>>> import numpy as np
>>> x = np.array(['male', 'female', 'female', 'male', 'unexpected'], dtype=object)
>>> x
array(['male', 'female', 'female', 'male', 'unexpected'], dtype=object)
>>> codec = numcodecs.Categorize(labels=['female', 'male'], dtype=object)
>>> y = codec.encode(x)
>>> y
array([2, 1, 1, 2, 0], dtype=uint8)
>>> z = codec.decode(y)
>>> z
array(['male', 'female', 'female', 'male', ''], dtype=object)
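The mapping visible in the example (labels numbered from 1, with 0 standing in for any value not in labels, which then decodes to an empty string) can be mimicked in plain Python. The helpers encode_labels and decode_codes below are hypothetical names used only for illustration; they are not part of numcodecs, which works on arrays rather than lists.

```python
def encode_labels(values, labels):
    """Map each value to its 1-based position in labels, or 0 if absent."""
    codes = {label: i + 1 for i, label in enumerate(labels)}
    return [codes.get(v, 0) for v in values]


def decode_codes(codes, labels):
    """Map 1-based codes back to labels; any other code becomes ''."""
    lookup = {i + 1: label for i, label in enumerate(labels)}
    return [lookup.get(c, '') for c in codes]


labels = ['female', 'male']
values = ['male', 'female', 'female', 'male', 'unexpected']

codes = encode_labels(values, labels)
print(codes)                        # [2, 1, 1, 2, 0]
print(decode_codes(codes, labels))  # ['male', 'female', 'female', 'male', '']
```

Note that the round trip is lossy for values outside labels: 'unexpected' comes back as ''.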
codec_id = 'categorize'
encode(buf)
Encode data in buf.
Parameters:
- buf : buffer-like
Data to be encoded. May be any object supporting the new-style buffer protocol.
Returns:
- enc : buffer-like
Encoded data. May be any object supporting the new-style buffer protocol.
decode(buf, out=None)
Decode data in buf.
Parameters:
- buf : buffer-like
Encoded data. May be any object supporting the new-style buffer protocol.
- out : buffer-like, optional
Writeable buffer to store decoded data. N.B., if provided, this buffer must be exactly the right size to store the decoded data.
Returns:
- dec : buffer-like
Decoded data. May be any object supporting the new-style buffer protocol.
get_config()
Return a dictionary holding configuration parameters for this codec. Must include an ‘id’ field with the codec identifier. All values must be compatible with JSON encoding.
classmethod from_config(config)
Instantiate codec from a configuration object.