Normalization/Feature-scaling

Min-Max Normalization

pykitml.get_minmax(array)

Returns two row arrays: one containing the minimum value of each column and the other containing the maximum value of each column.

Parameters: array (numpy.array) – The array to get minimum and maximum values for.
Returns:
  • array_min (numpy.array) – Array containing minimum values of each column.
  • array_max (numpy.array) – Array containing maximum values of each column.
pykitml.normalize_minmax(array, array_min, array_max, cols=[])

Scales columns of the array to the range 0 to 1 using min-max normalization.

Parameters:
  • array (numpy.array) – The array to normalize.
  • array_min (numpy.array) – Array containing minimum values of each column.
  • array_max (numpy.array) – Array containing maximum values of each column.
  • cols (list) – The columns to normalize. If the list is empty (default), all columns will be normalized.
Returns:

The normalized array.

Return type:

numpy.array

Note

You can use the get_minmax() function to obtain the array_min and array_max parameters.
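
As a rough illustration of the arithmetic behind min-max normalization, the sketch below reproduces it in plain NumPy. It assumes the usual per-column formula (x - min) / (max - min). The helper name minmax_sketch is hypothetical, and this is not necessarily how pykitml implements normalize_minmax().

```python
import numpy as np

def minmax_sketch(array, array_min, array_max, cols=None):
    """Hypothetical helper: scale the chosen columns to the range 0..1."""
    out = np.array(array, dtype=float)  # float copy; the input is left untouched
    # An empty or missing list means "all columns", mirroring the documented default.
    idx = list(cols) if cols else list(range(out.shape[1]))
    out[:, idx] = (out[:, idx] - array_min[idx]) / (array_max[idx] - array_min[idx])
    return out

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
col_min, col_max = a.min(axis=0), a.max(axis=0)  # per-column extremes
scaled = minmax_sketch(a, col_min, col_max, cols=[0, 2])
print(scaled)
```

In this sketch a column whose minimum equals its maximum would divide by zero; how pykitml treats that case is not covered here.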

pykitml.denormalize_minmax(array, array_min, array_max, cols=[])

Denormalizes columns of a min-max normalized array.

Parameters:
  • array (numpy.array) – The array to denormalize.
  • array_min (numpy.array) – Array containing minimum values of each column.
  • array_max (numpy.array) – Array containing maximum values of each column.
  • cols (list) – The columns to denormalize. If the list is empty (default), all columns will be denormalized.
Returns:

The denormalized array.

Return type:

numpy.array

Note

You can use the get_minmax() function to obtain the array_min and array_max parameters.
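
Denormalization is the inverse mapping. A minimal sketch, assuming the standard inverse x * (max - min) + min applied per column, is shown below. The helper name minmax_inverse_sketch and the small sample arrays are hypothetical and only illustrate the arithmetic, not pykitml's internals.

```python
import numpy as np

def minmax_inverse_sketch(array, array_min, array_max, cols=None):
    """Hypothetical helper: map the chosen columns from 0..1 back to their original range."""
    out = np.array(array, dtype=float)  # float copy; the input is left untouched
    idx = list(cols) if cols else list(range(out.shape[1]))
    out[:, idx] = out[:, idx] * (array_max[idx] - array_min[idx]) + array_min[idx]
    return out

# Only column 0 of this array was scaled; column 1 still holds raw values.
normalized = np.array([[0.0, 2.0], [0.5, 6.0], [1.0, 10.0]])
col_min = np.array([1.0, 2.0])
col_max = np.array([9.0, 10.0])
restored = minmax_inverse_sketch(normalized, col_min, col_max, cols=[0])
print(restored)
```

The same cols list has to be passed to both directions; otherwise columns that were never scaled would be stretched by the inverse step.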

Example

>>> import numpy as np
>>> import pykitml as pk
>>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
>>> min_array, max_array = pk.get_minmax(a)
>>> normalized_a = pk.normalize_minmax(a, min_array, max_array)
>>> normalized_a
array([[0.        , 0.        , 0.        , 0.        ],
       [0.33333333, 0.33333333, 0.33333333, 0.33333333],
       [0.66666667, 0.66666667, 0.66666667, 0.66666667],
       [1.        , 1.        , 1.        , 1.        ]])
>>> pk.denormalize_minmax(normalized_a, min_array, max_array)
array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.],
       [ 9., 10., 11., 12.],
       [13., 14., 15., 16.]])

You can also normalize/denormalize only specific columns:

>>> import numpy as np
>>> import pykitml as pk
>>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
>>> min_array, max_array = pk.get_minmax(a)
>>> normalized_a = pk.normalize_minmax(a, min_array, max_array, cols=[0, 2])
>>> normalized_a
array([[ 0.        ,  2.        ,  0.        ,  4.        ],
       [ 0.33333333,  6.        ,  0.33333333,  8.        ],
       [ 0.66666667, 10.        ,  0.66666667, 12.        ],
       [ 1.        , 14.        ,  1.        , 16.        ]])
>>> pk.denormalize_minmax(normalized_a, min_array, max_array, cols=[0, 2])
array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.],
       [ 9., 10., 11., 12.],
       [13., 14., 15., 16.]])

Mean Normalization

pykitml.get_meanstd(array)

Returns two row arrays: one containing the mean of each column and the other containing the standard deviation of each column.

Parameters: array (numpy.array) – The array to get mean and standard deviation values for.
Returns:
  • array_mean (numpy.array) – Array containing mean values of each column.
  • array_stddev (numpy.array) – Array containing standard deviation values of each column.
pykitml.normalize_mean(array, array_mean, array_stddev, cols=[])

Normalizes columns of the array using mean normalization: each value has its column mean subtracted and is then divided by the column standard deviation.

Parameters:
  • array (numpy.array) – The array to normalize.
  • array_mean (numpy.array) – Array containing mean values of each column.
  • array_stddev (numpy.array) – Array containing standard deviation values of each column.
  • cols (list) – The columns to normalize. If the list is empty (default), all columns will be normalized.
Returns:

The normalized array.

Return type:

numpy.array

Note

You can use the get_meanstd() function to obtain the array_mean and array_stddev parameters.
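
The arithmetic can be sketched in plain NumPy as follows. This assumes the formula (x - mean) / stddev with the population standard deviation (ddof=0), which is consistent with the values shown in the example for this section. The helper name mean_norm_sketch is hypothetical and is not part of pykitml.

```python
import numpy as np

def mean_norm_sketch(array, array_mean, array_stddev, cols=None):
    """Hypothetical helper: centre the chosen columns and scale them to unit variance."""
    out = np.array(array, dtype=float)  # float copy; the input is left untouched
    idx = list(cols) if cols else list(range(out.shape[1]))
    out[:, idx] = (out[:, idx] - array_mean[idx]) / array_stddev[idx]
    return out

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
col_mean = a.mean(axis=0)
col_std = a.std(axis=0)  # NumPy's default is the population standard deviation (ddof=0)
standardized = mean_norm_sketch(a, col_mean, col_std)
print(standardized[:, 0])
```

After this transformation each processed column has a mean of approximately 0 and a standard deviation of approximately 1.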

pykitml.denormalize_mean(array, array_mean, array_stddev, cols=[])

Denormalizes columns of a mean-normalized array.

Parameters:
  • array (numpy.array) – The array to denormalize.
  • array_mean (numpy.array) – Array containing mean values of each column.
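  • array_stddev (numpy.array) – Array containing standard deviation values of each column.
  • cols (list) – The columns to denormalize. If the list is empty (default), all columns will be denormalized.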
Returns:

The denormalized array.

Return type:

numpy.array

Note

You can use the get_meanstd() function to obtain the array_mean and array_stddev parameters.
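
To show why the same mean and standard deviation arrays must be reused, the sketch below performs a round trip on selected columns in plain NumPy. It assumes the forward step (x - mean) / stddev and the inverse x * stddev + mean. The variable names are illustrative and the code does not call pykitml.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
col_mean, col_std = a.mean(axis=0), a.std(axis=0)
cols = [0, 2]  # columns to transform; the others are left as they are

# Forward step: centre and scale the selected columns.
standardized = a.astype(float)
standardized[:, cols] = (standardized[:, cols] - col_mean[cols]) / col_std[cols]

# Inverse step: multiply by the standard deviation and add the mean back.
restored = standardized.copy()
restored[:, cols] = restored[:, cols] * col_std[cols] + col_mean[cols]

print(np.allclose(restored, a))
```

Storing col_mean and col_std alongside the transformed data is what makes the inverse step possible later.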

Example

>>> import numpy as np
>>> import pykitml as pk
>>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
>>> array_mean, array_stddev = pk.get_meanstd(a)
>>> normalized_a = pk.normalize_mean(a, array_mean, array_stddev)
>>> normalized_a
array([[-1.34164079, -1.34164079, -1.34164079, -1.34164079],
       [-0.4472136 , -0.4472136 , -0.4472136 , -0.4472136 ],
       [ 0.4472136 ,  0.4472136 ,  0.4472136 ,  0.4472136 ],
       [ 1.34164079,  1.34164079,  1.34164079,  1.34164079]])
>>> pk.denormalize_mean(normalized_a, array_mean, array_stddev)
array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.],
       [ 9., 10., 11., 12.],
       [13., 14., 15., 16.]])

You can also normalize/denormalize only specific columns:

>>> import numpy as np
>>> import pykitml as pk
>>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
>>> array_mean, array_stddev = pk.get_meanstd(a)
>>> normalized_a = pk.normalize_mean(a, array_mean, array_stddev, cols=[0, 2])
>>> normalized_a
array([[-1.34164079,  2.        , -1.34164079,  4.        ],
       [-0.4472136 ,  6.        , -0.4472136 ,  8.        ],
       [ 0.4472136 , 10.        ,  0.4472136 , 12.        ],
       [ 1.34164079, 14.        ,  1.34164079, 16.        ]])
>>> pk.denormalize_mean(normalized_a, array_mean, array_stddev, cols=[0, 2])
array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.],
       [ 9., 10., 11., 12.],
       [13., 14., 15., 16.]])