Preprocessing Datasets
Dealing with categorical/one-hot values
- pykitml.onehot(input_array)
Converts input array to one-hot array.
- Parameters:
input_array (numpy.array) – The input numpy array.
- Returns:
one_hot – The converted onehot array.
- Return type:
numpy.array
Example
>>> import numpy as np
>>> import pykitml as pk
>>> a = np.array([0, 1, 2])
>>> pk.onehot(a)
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
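To make the conversion concrete, the following is a minimal NumPy sketch of one-hot encoding. It is not pykitml's implementation: `onehot_sketch` is a hypothetical helper, and it assumes the labels are non-negative integers and that the number of classes is the largest label plus one.

```python
import numpy as np

def onehot_sketch(labels):
    """Hypothetical helper (not part of pykitml): one-hot encode integer labels."""
    labels = np.asarray(labels, dtype=int)
    # Assumption: classes are numbered 0..max, so the width is max + 1.
    n_classes = labels.max() + 1
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(n_classes)[labels]

encoded = onehot_sketch([0, 1, 2])
```

Indexing the identity matrix with the label array selects one row per sample, which avoids an explicit loop.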
- pykitml.onehot_cols(dataset, cols)
Converts/replaces columns of dataset to one-hot values.
- Parameters:
dataset (numpy.array) – The input dataset.
cols (list) – The columns to be replaced with one-hot values.
- Returns:
dataset_new – The new dataset with replaced columns.
- Return type:
numpy.array
Example
>>> import pykitml as pk
>>> import numpy as np
>>> a = np.array([[0, 1, 2.2], [1, 2, 3.4], [0, 0, 1.1]])
>>> a
array([[0. , 1. , 2.2],
       [1. , 2. , 3.4],
       [0. , 0. , 1.1]])
>>> pk.onehot_cols(a, cols=[0, 1])
array([[1. , 0. , 0. , 1. , 0. , 2.2],
       [0. , 1. , 0. , 0. , 1. , 3.4],
       [1. , 0. , 1. , 0. , 0. , 1.1]])
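The column-wise replacement shown in the example can be sketched in plain NumPy as follows. This is an illustration under assumptions, not pykitml's code: `onehot_cols_sketch` is a hypothetical name, and each selected column is assumed to hold non-negative integer codes whose one-hot width is the column maximum plus one.

```python
import numpy as np

def onehot_cols_sketch(dataset, cols):
    """Hypothetical helper (not part of pykitml): one-hot encode selected columns."""
    dataset = np.asarray(dataset, dtype=float)
    blocks = []
    for col in range(dataset.shape[1]):
        values = dataset[:, col]
        if col in cols:
            # Assumption: the column holds integer codes 0..max.
            labels = values.astype(int)
            blocks.append(np.eye(labels.max() + 1)[labels])
        else:
            # Untouched columns are kept as a single-column block.
            blocks.append(values[:, np.newaxis])
    # Concatenate the blocks side by side, preserving column order.
    return np.hstack(blocks)

a = np.array([[0, 1, 2.2], [1, 2, 3.4], [0, 0, 1.1]])
a_onehot = onehot_cols_sketch(a, cols=[0, 1])
```

Column 0 takes two values and column 1 takes three, so the result has 2 + 3 + 1 = 6 columns, the same width as the output in the example above.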
- pykitml.onehot_cols_traintest(dataset_train, dataset_test, cols)
Converts/replaces columns of dataset_train and dataset_test to one-hot values.
- Parameters:
dataset_train (numpy.array) – The training dataset.
dataset_test (numpy.array) – The testing dataset.
cols (list) – The columns to be replaced with one-hot values.
- Returns:
dataset_train_new (numpy.array) – The new training dataset with replaced columns.
dataset_test_new (numpy.array) – The new testing dataset with replaced columns.
Example
>>> import pykitml as pk
>>> import numpy as np
>>> a_train = np.array([[0, 1, 3.2], [1, 2, 3.5], [0, 0, 3.4]])
>>> a_test = np.array([[0, 3, 3.2], [1, 2, 4.5], [1, 3, 4.5]])
>>> a_train_onehot, a_test_onehot = pk.onehot_cols_traintest(a_train, a_test, cols=[0,1])
>>> a_train_onehot
array([[1. , 0. , 0. , 1. , 0. , 0. , 3.2],
       [0. , 1. , 0. , 0. , 1. , 0. , 3.5],
       [1. , 0. , 1. , 0. , 0. , 0. , 3.4]])
>>> a_test_onehot
array([[1. , 0. , 0. , 0. , 0. , 1. , 3.2],
       [0. , 1. , 0. , 0. , 1. , 0. , 4.5],
       [0. , 1. , 0. , 0. , 0. , 1. , 4.5]])
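In the example, the value 3 in column 1 appears only in the test set, yet both outputs have the same width. One way to get that behaviour, sketched below, is to encode the two datasets together and split them afterwards. This is an assumption about the approach, not pykitml's actual code, and `onehot_cols_traintest_sketch` is a hypothetical name.

```python
import numpy as np

def onehot_cols_traintest_sketch(train, test, cols):
    """Hypothetical helper (not part of pykitml): shared one-hot encoding."""
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    # Stack both sets so every category seen in either one gets a slot.
    combined = np.vstack([train, test])
    blocks = []
    for col in range(combined.shape[1]):
        values = combined[:, col]
        if col in cols:
            # Assumption: the column holds integer codes 0..max.
            labels = values.astype(int)
            blocks.append(np.eye(labels.max() + 1)[labels])
        else:
            blocks.append(values[:, np.newaxis])
    encoded = np.hstack(blocks)
    # Split back into the original train and test rows.
    return encoded[:len(train)], encoded[len(train):]

a_train = np.array([[0, 1, 3.2], [1, 2, 3.5], [0, 0, 3.4]])
a_test = np.array([[0, 3, 3.2], [1, 2, 4.5], [1, 3, 4.5]])
train_onehot, test_onehot = onehot_cols_traintest_sketch(a_train, a_test, cols=[0, 1])
```

Encoding each set separately would give the training set three columns for column 1 and the test set four, so a model trained on one could not be evaluated on the other.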
Generating Polynomial Features
- pykitml.polynomial(dataset_inputs, degree=3, cols=[])
Generates polynomial features from the input dataset. For example, if an input sample is two-dimensional and of the form [a, b], the degree-2 polynomial features are [a, b, a^2, ab, b^2], and the degree-3 polynomial features are [a, b, a^2, ab, b^2, a^3, (a^2)*b, a*(b^2), b^3].
- Parameters:
dataset_inputs (numpy.array) – The input dataset to generate the polynomials from.
degree (int) – The degree of the polynomial.
cols (list) – The columns used to generate polynomial features. Columns not in this list contribute no polynomial terms; in the example, they still appear unchanged in the output. If empty (default), all columns are used to generate polynomial features.
- Returns:
The new dataset with polynomial features.
- Return type:
numpy.array
Example
>>> import numpy as np
>>> import pykitml as pk
>>> pk.polynomial(np.array([[1, 2], [2, 3]]), degree=2)
array([[1., 2., 1., 2., 4.],
       [2., 3., 4., 6., 9.]])
>>> pk.polynomial(np.array([[1, 2], [2, 3]]), degree=3)
array([[ 1.,  2.,  1.,  2.,  4.,  1.,  2.,  4.,  8.],
       [ 2.,  3.,  4.,  6.,  9.,  8., 12., 18., 27.]])
>>> pk.polynomial(np.array([[1, 4, 5, 2], [2, 5, 6, 3]]), degree=2, cols=[0, 3])
array([[1., 4., 5., 2., 1., 2., 4.],
       [2., 5., 6., 3., 4., 6., 9.]])
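The term ordering in these examples can be reproduced with `itertools.combinations_with_replacement`, as in the sketch below. It is an illustration only, not pykitml's implementation: `polynomial_sketch` is a hypothetical name, and the handling of `cols` (original columns kept, higher-degree terms built only from the selected columns) is inferred from the last example above.

```python
from itertools import combinations_with_replacement

import numpy as np

def polynomial_sketch(inputs, degree=3, cols=None):
    """Hypothetical helper (not part of pykitml): append polynomial terms."""
    inputs = np.asarray(inputs, dtype=float)
    # Assumption: an empty or missing cols list means "use every column".
    cols = list(cols) if cols else list(range(inputs.shape[1]))
    # Degree-1 terms are the original columns, kept unchanged.
    features = [inputs]
    for d in range(2, degree + 1):
        # Each combination with repetition is one monomial of degree d,
        # e.g. (0, 0) -> a^2, (0, 1) -> a*b, (1, 1) -> b^2.
        for combo in combinations_with_replacement(cols, d):
            term = np.prod(inputs[:, list(combo)], axis=1, keepdims=True)
            features.append(term)
    return np.hstack(features)

deg2 = polynomial_sketch([[1, 2], [2, 3]], degree=2)
subset = polynomial_sketch([[1, 4, 5, 2], [2, 5, 6, 3]], degree=2, cols=[0, 3])
```

With all n columns selected, the output has C(n + d, d) - 1 columns for degree d, which gives 5 for n = 2, d = 2 and 9 for n = 2, d = 3, so high degrees on wide datasets grow quickly.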