Preprocessing Datasets
Dealing with categorical/one-hot values
pykitml.onehot(input_array)
Converts the input array to a one-hot array.
Parameters: input_array (numpy.array) – The input numpy array.
Returns: one_hot – The converted one-hot array.
Return type: numpy.array
Example
>>> import numpy as np
>>> import pykitml as pk
>>> a = np.array([0, 1, 2])
>>> pk.onehot(a)
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
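To make the conversion concrete, here is a minimal NumPy sketch of one-hot encoding. It is not pykitml's implementation: `onehot_sketch` is a hypothetical helper, and it assumes the labels are non-negative integers and that the row width is the largest label plus one, which is consistent with the example above.

```python
import numpy as np

def onehot_sketch(labels):
    """Hypothetical helper (not part of pykitml): one-hot encode integer labels."""
    labels = np.asarray(labels, dtype=int)
    # Assumption: the number of categories is the largest label plus one.
    width = labels.max() + 1
    # Row i of the identity matrix is the one-hot vector for label i,
    # so indexing the identity matrix by the labels encodes all of them at once.
    return np.eye(width)[labels]

result = onehot_sketch(np.array([0, 1, 2]))
print(result)
```

Each label selects one row of the identity matrix, so every output row contains exactly one 1.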
pykitml.onehot_cols(dataset, cols)
Converts/replaces columns of dataset to one-hot values.
Parameters:
- dataset (numpy.array) – The input dataset.
- cols (list) – The columns to replace with one-hot values.
Returns: dataset_new – The new dataset with replaced columns.
Return type: numpy.array
Example
>>> import pykitml as pk
>>> import numpy as np
>>> a = np.array([[0, 1, 2.2], [1, 2, 3.4], [0, 0, 1.1]])
>>> a
array([[0. , 1. , 2.2],
       [1. , 2. , 3.4],
       [0. , 0. , 1.1]])
>>> pk.onehot_cols(a, cols=[0, 1])
array([[1. , 0. , 0. , 1. , 0. , 2.2],
       [0. , 1. , 0. , 0. , 1. , 3.4],
       [1. , 0. , 1. , 0. , 0. , 1.1]])
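The column replacement can be sketched in plain NumPy as follows. This is an illustration only, not pykitml's code: `onehot_cols_sketch` is a hypothetical name, and the sketch assumes the selected columns hold non-negative integer codes and that each column's width is its largest code plus one.

```python
import numpy as np

def onehot_cols_sketch(dataset, cols):
    """Hypothetical helper (not part of pykitml): expand selected columns in place."""
    dataset = np.asarray(dataset, dtype=float)
    pieces = []
    for j in range(dataset.shape[1]):
        column = dataset[:, j]
        if j in cols:
            # Assumption: the column stores integer category codes 0..k.
            codes = column.astype(int)
            pieces.append(np.eye(codes.max() + 1)[codes])
        else:
            # Columns that are not selected pass through unchanged.
            pieces.append(column.reshape(-1, 1))
    # Concatenate the pieces left to right, preserving the original column order.
    return np.hstack(pieces)

a = np.array([[0, 1, 2.2], [1, 2, 3.4], [0, 0, 1.1]])
expanded = onehot_cols_sketch(a, cols=[0, 1])
print(expanded)
```

Column 0 has two categories and column 1 has three, so the three input columns become 2 + 3 + 1 = 6 output columns.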
pykitml.onehot_cols_traintest(dataset_train, dataset_test, cols)
Converts/replaces columns of dataset_train and dataset_test to one-hot values.
Parameters:
- dataset_train (numpy.array) – The training dataset.
- dataset_test (numpy.array) – The testing dataset.
- cols (list) – The columns to replace with one-hot values.
Returns:
- dataset_train_new (numpy.array) – The new training dataset with replaced columns.
- dataset_test_new (numpy.array) – The new testing dataset with replaced columns.
Example
>>> import pykitml as pk
>>> import numpy as np
>>> a_train = np.array([[0, 1, 3.2], [1, 2, 3.5], [0, 0, 3.4]])
>>> a_test = np.array([[0, 3, 3.2], [1, 2, 4.5], [1, 3, 4.5]])
>>> a_train_onehot, a_test_onehot = pk.onehot_cols_traintest(a_train, a_test, cols=[0,1])
>>> a_train_onehot
array([[1. , 0. , 0. , 1. , 0. , 0. , 3.2],
       [0. , 1. , 0. , 0. , 1. , 0. , 3.5],
       [1. , 0. , 1. , 0. , 0. , 0. , 3.4]])
>>> a_test_onehot
array([[1. , 0. , 0. , 0. , 0. , 1. , 3.2],
       [0. , 1. , 0. , 0. , 1. , 0. , 4.5],
       [0. , 1. , 0. , 0. , 0. , 1. , 4.5]])
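The example above suggests why the two datasets are converted together: column 1 of the test set contains the value 3, which never appears in the training set. The following NumPy sketch (an illustration under the assumption that the width is the largest code plus one, not pykitml's implementation) shows how encoding each set on its own yields mismatched widths, while a shared width keeps them aligned.

```python
import numpy as np

# Column 1 of a_train and a_test from the example above.
train_col = np.array([1, 2, 0])
test_col = np.array([3, 2, 3])

# Encoding each set separately: widths differ because the test set
# contains a category (3) that the training set never sees.
separate_train = np.eye(train_col.max() + 1)[train_col]
separate_test = np.eye(test_col.max() + 1)[test_col]
print(separate_train.shape, separate_test.shape)

# Encoding with a width shared across both sets keeps the columns aligned.
shared_width = max(train_col.max(), test_col.max()) + 1
shared_train = np.eye(shared_width)[train_col]
shared_test = np.eye(shared_width)[test_col]
print(shared_train.shape, shared_test.shape)
```

With the shared width, both encodings have four columns, matching the four one-hot columns that column 1 occupies in `a_train_onehot` and `a_test_onehot`.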
Generating Polynomial Features
pykitml.polynomial(dataset_inputs, degree=3, cols=[])
Generates polynomial features from the input dataset. For example, if an input sample is two-dimensional and of the form [a, b], the degree-2 polynomial features are [a, b, a^2, ab, b^2], and the degree-3 polynomial features are [a, b, a^2, ab, b^2, a^3, (a^2)*b, a*(b^2), b^3].
Parameters:
- dataset_inputs (numpy.array) – The input dataset to generate the polynomials from.
- degree (int) – The degree of the polynomial.
- cols (list) – The columns to use to generate polynomial features. Columns not in this list are not used in the higher-degree terms but, as the last example shows, are still kept in the output. If empty (default), all columns will be used to generate polynomial features.
Returns: The new dataset with polynomial features.
Return type: numpy.array
Example
>>> import numpy as np
>>> import pykitml as pk
>>> pk.polynomial(np.array([[1, 2], [2, 3]]), degree=2)
array([[1., 2., 1., 2., 4.],
       [2., 3., 4., 6., 9.]])
>>> pk.polynomial(np.array([[1, 2], [2, 3]]), degree=3)
array([[ 1.,  2.,  1.,  2.,  4.,  1.,  2.,  4.,  8.],
       [ 2.,  3.,  4.,  6.,  9.,  8., 12., 18., 27.]])
>>> pk.polynomial(np.array([[1, 4, 5, 2], [2, 5, 6, 3]]), degree=2, cols=[0, 3])
array([[1., 4., 5., 2., 1., 2., 4.],
       [2., 5., 6., 3., 4., 6., 9.]])
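The feature ordering described above can be reproduced with a short sketch built on `itertools.combinations_with_replacement`. This is one possible way to generate the terms, not necessarily how pykitml does it. `polynomial_sketch` is a hypothetical helper, and it always uses every column (it does not model the `cols` parameter).

```python
from itertools import combinations_with_replacement

import numpy as np

def polynomial_sketch(inputs, degree):
    """Hypothetical helper (not part of pykitml): all monomials up to `degree`."""
    inputs = np.asarray(inputs, dtype=float)
    n_cols = inputs.shape[1]
    features = []
    for d in range(1, degree + 1):
        # Each index tuple picks d columns (repeats allowed) to multiply,
        # e.g. (0, 0) -> a^2, (0, 1) -> ab, (1, 1) -> b^2 for d = 2.
        for idx in combinations_with_replacement(range(n_cols), d):
            features.append(np.prod(inputs[:, list(idx)], axis=1))
    # One output column per monomial, ordered by degree.
    return np.column_stack(features)

samples = np.array([[1, 2], [2, 3]])
degree2 = polynomial_sketch(samples, degree=2)
degree3 = polynomial_sketch(samples, degree=3)
print(degree2)
print(degree3)
```

For two input columns this yields 5 features at degree 2 and 9 at degree 3, in the same order as the lists in the description.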