Normalization/Feature-scaling
Min-Max Normalization
pykitml.get_minmax(array)
Returns two row arrays, one containing the minimum value of each column and the other containing the maximum value of each column.
Parameters: array (numpy.array) – The array to get minimum and maximum values for.
Returns: - array_min (numpy.array) – Array containing minimum values of each column.
- array_max (numpy.array) – Array containing maximum values of each column.
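As a rough illustration of what "minimum and maximum of each column" means, the same quantities can be computed directly with NumPy. The helper name `get_minmax_sketch` is hypothetical and is not pykitml's implementation; in particular, it returns 1-D arrays, which may differ from the exact shape pykitml returns.

```python
import numpy as np

def get_minmax_sketch(array):
    """Hypothetical stand-in: column-wise minimum and maximum."""
    # axis=0 reduces over the rows, leaving one value per column
    return array.min(axis=0), array.max(axis=0)

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
col_min, col_max = get_minmax_sketch(a)
print(col_min)  # smallest value in each column: 1, 2, 3, 4
print(col_max)  # largest value in each column: 13, 14, 15, 16
```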
pykitml.normalize_minmax(array, array_min, array_max, cols=[])
Normalizes columns of the array to between 0 and 1 using min-max normalization.
Parameters: - array (numpy.array) – The array to normalize.
- array_min (numpy.array) – Array containing minimum values of each column.
- array_max (numpy.array) – Array containing maximum values of each column.
- cols (list) – The columns to normalize. If the list is empty (default), all columns will be normalized.
Returns: The normalized array.
Return type: numpy.array
Note
You can use the get_minmax() function to get the array_min and array_max parameters.
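Min-max normalization is conventionally defined as (x - min) / (max - min), applied per column. The sketch below shows that formula in plain NumPy, including the "empty cols means all columns" behaviour described above. The function name `normalize_minmax_sketch` is hypothetical, and the code is an assumption about the technique, not a copy of pykitml's source.

```python
import numpy as np

def normalize_minmax_sketch(array, array_min, array_max, cols=None):
    """Hypothetical sketch: (x - min) / (max - min), column by column."""
    out = np.array(array, dtype=float)  # float copy; the input is not modified
    array_min, array_max = np.ravel(array_min), np.ravel(array_max)
    # An empty or omitted cols selects every column, like the documented default
    cols = list(cols) if cols else list(range(out.shape[1]))
    span = array_max[cols] - array_min[cols]
    out[:, cols] = (out[:, cols] - array_min[cols]) / span
    return out

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
scaled = normalize_minmax_sketch(a, a.min(axis=0), a.max(axis=0), cols=[0, 2])
print(scaled[:, 0])  # column 0 rescaled to the range 0..1
print(scaled[:, 1])  # column 1 left unchanged: 2, 6, 10, 14
```

With this formula, a column whose minimum equals its maximum has a span of zero, so the division is undefined for that column; such columns may need to be excluded through `cols`.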
pykitml.denormalize_minmax(array, array_min, array_max, cols=[])
Denormalizes columns of a min-max normalized array.
Parameters: - array (numpy.array) – The array to denormalize.
- array_min (numpy.array) – Array containing minimum values of each column.
- array_max (numpy.array) – Array containing maximum values of each column.
- cols (list) – The columns to denormalize. If the list is empty (default), all columns will be denormalized.
Returns: The denormalized array.
Return type: numpy.array
Note
You can use the get_minmax() function to get the array_min and array_max parameters.
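Denormalization is the algebraic inverse of min-max scaling: x = x_norm * (max - min) + min. The following sketch, using the hypothetical name `denormalize_minmax_sketch`, illustrates that inverse and checks that a round trip recovers the original values. It is an assumption about the underlying arithmetic rather than pykitml's actual code.

```python
import numpy as np

def denormalize_minmax_sketch(array, array_min, array_max, cols=None):
    """Hypothetical sketch: x_norm * (max - min) + min, column by column."""
    out = np.array(array, dtype=float)  # float copy; the input is not modified
    array_min, array_max = np.ravel(array_min), np.ravel(array_max)
    # An empty or omitted cols selects every column
    cols = list(cols) if cols else list(range(out.shape[1]))
    span = array_max[cols] - array_min[cols]
    out[:, cols] = out[:, cols] * span + array_min[cols]
    return out

a = np.array([[1.0, 2.0], [5.0, 6.0], [9.0, 10.0]])
lo, hi = a.min(axis=0), a.max(axis=0)
scaled = (a - lo) / (hi - lo)                # forward min-max scaling
restored = denormalize_minmax_sketch(scaled, lo, hi)
print(np.allclose(restored, a))              # round trip recovers the input
```

The same array_min and array_max used for normalization must be passed back in; values computed from a different array would not undo the scaling.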
Example
>>> import numpy as np
>>> import pykitml as pk
>>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
>>> min_array, max_array = pk.get_minmax(a)
>>> normalized_a = pk.normalize_minmax(a, min_array, max_array)
>>> normalized_a
array([[0.        , 0.        , 0.        , 0.        ],
       [0.33333333, 0.33333333, 0.33333333, 0.33333333],
       [0.66666667, 0.66666667, 0.66666667, 0.66666667],
       [1.        , 1.        , 1.        , 1.        ]])
>>> pk.denormalize_minmax(normalized_a, min_array, max_array)
array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.],
       [ 9., 10., 11., 12.],
       [13., 14., 15., 16.]])
You can also normalize/denormalize only specific columns:
>>> import numpy as np
>>> import pykitml as pk
>>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
>>> min_array, max_array = pk.get_minmax(a)
>>> normalized_a = pk.normalize_minmax(a, min_array, max_array, cols=[0, 2])
>>> normalized_a
array([[ 0.        ,  2.        ,  0.        ,  4.        ],
       [ 0.33333333,  6.        ,  0.33333333,  8.        ],
       [ 0.66666667, 10.        ,  0.66666667, 12.        ],
       [ 1.        , 14.        ,  1.        , 16.        ]])
>>> pk.denormalize_minmax(normalized_a, min_array, max_array, cols=[0, 2])
array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.],
       [ 9., 10., 11., 12.],
       [13., 14., 15., 16.]])
Mean Normalization
pykitml.get_meanstd(array)
Returns two row arrays, one containing the mean of each column and the other containing the standard deviation of each column.
Parameters: array (numpy.array) – The array to get mean and standard deviation values for.
Returns: - array_mean (numpy.array) – Array containing mean values of each column.
- array_stddev (numpy.array) – Array containing standard deviation values of each column.
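The column statistics can be reproduced with NumPy to see which flavour of standard deviation fits the documented numbers. The first normalized value in the example output further down, -1.34164079, matches the population standard deviation (NumPy's default, `ddof=0`) rather than the sample one (`ddof=1`). This is inferred from the printed values only, not from pykitml's source.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])

col_mean = a.mean(axis=0)             # one mean per column: 7, 8, 9, 10
col_std = a.std(axis=0)               # population std (ddof=0), NumPy's default
sample_std = a.std(axis=0, ddof=1)    # sample std, shown for comparison

# First element of column 0 under each convention
z_population = (a[0, 0] - col_mean[0]) / col_std[0]
z_sample = (a[0, 0] - col_mean[0]) / sample_std[0]
print(z_population)  # approximately -1.3416, matching the documented output
print(z_sample)      # approximately -1.1619, which does not match
```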
pykitml.normalize_mean(array, array_mean, array_stddev, cols=[])
Normalizes columns of the array with mean normalization.
Parameters: - array (numpy.array) – The array to normalize.
- array_mean (numpy.array) – Array containing mean values of each column.
- array_stddev (numpy.array) – Array containing standard deviation values of each column.
- cols (list) – The columns to normalize. If the list is empty (default), all columns will be normalized.
Returns: The normalized array.
Return type: numpy.array
Note
You can use the get_meanstd() function to get the array_mean and array_stddev parameters.
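Subtracting the column mean and dividing by the column standard deviation, (x - mean) / stddev, is often called standardization or z-scoring. The sketch below shows that formula with the same optional column selection. The name `normalize_mean_sketch` is hypothetical, and the code illustrates the technique under the assumption that this is the formula in use; it is not taken from pykitml.

```python
import numpy as np

def normalize_mean_sketch(array, array_mean, array_stddev, cols=None):
    """Hypothetical sketch: (x - mean) / stddev, column by column."""
    out = np.array(array, dtype=float)  # float copy; the input is not modified
    array_mean, array_stddev = np.ravel(array_mean), np.ravel(array_stddev)
    # An empty or omitted cols selects every column, like the documented default
    cols = list(cols) if cols else list(range(out.shape[1]))
    out[:, cols] = (out[:, cols] - array_mean[cols]) / array_stddev[cols]
    return out

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
z = normalize_mean_sketch(a, a.mean(axis=0), a.std(axis=0), cols=[0, 2])
print(z[:, 0])  # column 0 now has mean 0 and standard deviation 1
print(z[:, 1])  # column 1 left unchanged: 2, 6, 10, 14
```

Unlike min-max scaling, the result is not confined to the range 0 to 1; values are expressed as the number of standard deviations from the column mean.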
pykitml.denormalize_mean(array, array_mean, array_stddev, cols=[])
Denormalizes a mean normalized array.
Parameters: - array (numpy.array) – The array to denormalize.
- array_mean (numpy.array) – Array containing mean values of each column.
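- array_stddev (numpy.array) – Array containing standard deviation values of each column.
- cols (list) – The columns to denormalize. If the list is empty (default), all columns will be denormalized.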
Returns: The denormalized array.
Return type: numpy.array
Note
You can use the get_meanstd() function to get the array_mean and array_stddev parameters.
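Undoing the scaling amounts to x = z * stddev + mean. A minimal sketch of that inverse, under the hypothetical name `denormalize_mean_sketch`, is shown here with a round-trip check. It assumes the forward step was (x - mean) / stddev and is not pykitml's implementation.

```python
import numpy as np

def denormalize_mean_sketch(array, array_mean, array_stddev, cols=None):
    """Hypothetical sketch: z * stddev + mean, column by column."""
    out = np.array(array, dtype=float)  # float copy; the input is not modified
    array_mean, array_stddev = np.ravel(array_mean), np.ravel(array_stddev)
    # An empty or omitted cols selects every column
    cols = list(cols) if cols else list(range(out.shape[1]))
    out[:, cols] = out[:, cols] * array_stddev[cols] + array_mean[cols]
    return out

a = np.array([[1.0, 2.0], [5.0, 6.0], [9.0, 10.0]])
mean, std = a.mean(axis=0), a.std(axis=0)
z = (a - mean) / std                          # forward standardization
restored = denormalize_mean_sketch(z, mean, std)
print(np.allclose(restored, a))               # round trip recovers the input
```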
Example
>>> import numpy as np
>>> import pykitml as pk
>>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
>>> array_mean, array_stddev = pk.get_meanstd(a)
>>> normalized_a = pk.normalize_mean(a, array_mean, array_stddev)
>>> normalized_a
array([[-1.34164079, -1.34164079, -1.34164079, -1.34164079],
       [-0.4472136 , -0.4472136 , -0.4472136 , -0.4472136 ],
       [ 0.4472136 ,  0.4472136 ,  0.4472136 ,  0.4472136 ],
       [ 1.34164079,  1.34164079,  1.34164079,  1.34164079]])
>>> pk.denormalize_mean(normalized_a, array_mean, array_stddev)
array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.],
       [ 9., 10., 11., 12.],
       [13., 14., 15., 16.]])
You can also normalize/denormalize only specific columns:
>>> import numpy as np
>>> import pykitml as pk
>>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
>>> array_mean, array_stddev = pk.get_meanstd(a)
>>> normalized_a = pk.normalize_mean(a, array_mean, array_stddev, cols=[0,2])
>>> normalized_a
array([[-1.34164079,  2.        , -1.34164079,  4.        ],
       [-0.4472136 ,  6.        , -0.4472136 ,  8.        ],
       [ 0.4472136 , 10.        ,  0.4472136 , 12.        ],
       [ 1.34164079, 14.        ,  1.34164079, 16.        ]])
>>> pk.denormalize_mean(normalized_a, array_mean, array_stddev, cols=[0,2])
array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.],
       [ 9., 10., 11., 12.],
       [13., 14., 15., 16.]])