Welcome to the documentation of KNNOR, a novel data augmentation technique!
KNNOR
A generic Python library for data augmentation, in particular for oversampling the minority class in imbalanced datasets.
Features
- Helps balance imbalanced datasets by upsampling the minority class
- Can improve the accuracy of models trained on imbalanced data
Installation
Install knnor by running:
pip install knnor
Using the Library
Convert your dataset to a NumPy array. All values must be numeric.
Separate the data into X and y, where X holds the features and y holds the class labels.
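The main example below starts from an array that is already numeric. If your raw data has string class labels, one minimal way to encode them with plain NumPy is np.unique with return_inverse=True, as in this sketch. The rows and label names are made up for illustration, and nothing here is part of the KNNOR API.

```python
import numpy as np

# Made-up raw rows: numeric features followed by a string class label
rows = [
    [1.0, 2.0, 1.0, "churn"],
    [1.0, 3.0, 4.0, "stay"],
    [2.0, 3.0, 3.0, "stay"],
    [3.0, 1.0, 1.0, "churn"],
]

# Features: every column except the last, stored as floats
X = np.array([r[:-1] for r in rows], dtype=float)

# Labels: map each distinct string to an integer code (in sorted order)
class_names, y = np.unique([r[-1] for r in rows], return_inverse=True)

print(X.shape)      # (4, 3)
print(class_names)  # ['churn' 'stay']
print(y)            # [0 1 1 0]
```

class_names[y] recovers the original string labels, which helps when reading augmented labels back.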
import numpy as np
from knnor import data_augment
l = [
[1.0,2.0,1.0,0],
[1,3,1,0],
[2,1,1,0],
[3,2,1,0],
[3,1,1,0],
[1,3,4,1],
[1,4,3,1],
[1,4,4,1],
[2,3,3,1],
[2,3,4,1],
[2,4,3,1],
[2,4,4,1],
[3,2,2,1],
[3,3,2,1],
[3,3,2,1],
[3,4,2,1],
[4,3,1,1]
]
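l = np.array(l)
print("Original Data:")
print(l)
# Features are every column except the last; labels are the last column
X = l[:, :-1]
y = l[:, -1]
# Oversample the minority class using the default parameters
knnor = data_augment.KNNOR()
X_new, y_new, X_aug_min, y_aug_min = knnor.fit_resample(X, y)
# Turn the labels into a column so they can be appended to the features
y_new = y_new.reshape(-1, 1)
print("KNNOR Data:")
new_data = np.append(X_new, y_new, axis=1)
print(new_data)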
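new_data contains the original rows followed by the augmented minority rows. With the default parameters, 7 synthetic class-0 points are added. This raises the minority count from 5 to 12, matching the 12 class-1 points. Running the example prints output like the following, though the generated values may differ between runs: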
Original Data:
[[1. 2. 1. 0.]
[1. 3. 1. 0.]
[2. 1. 1. 0.]
[3. 2. 1. 0.]
[3. 1. 1. 0.]
[1. 3. 4. 1.]
[1. 4. 3. 1.]
[1. 4. 4. 1.]
[2. 3. 3. 1.]
[2. 3. 4. 1.]
[2. 4. 3. 1.]
[2. 4. 4. 1.]
[3. 2. 2. 1.]
[3. 3. 2. 1.]
[3. 3. 2. 1.]
[3. 4. 2. 1.]
[4. 3. 1. 1.]]
KNNOR Data:
[[1. 2. 1. 0. ]
[1. 3. 1. 0. ]
[2. 1. 1. 0. ]
[3. 2. 1. 0. ]
[3. 1. 1. 0. ]
[1. 3. 4. 1. ]
[1. 4. 3. 1. ]
[1. 4. 4. 1. ]
[2. 3. 3. 1. ]
[2. 3. 4. 1. ]
[2. 4. 3. 1. ]
[2. 4. 4. 1. ]
[3. 2. 2. 1. ]
[3. 3. 2. 1. ]
[3. 3. 2. 1. ]
[3. 4. 2. 1. ]
[4. 3. 1. 1. ]
[1. 2.8596414 1. 0. ]
[3. 1.89795961 1. 0. ]
[2.76031358 1. 1. 0. ]
[1. 2.95194388 1. 0. ]
[3. 1.72737314 1. 0. ]
[2.712059 1. 1. 0. ]
[1. 2.94970565 1. 0. ]]
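Each synthetic class-0 row in the output above lies on the line segment between two original class-0 rows. For example, [1. 2.8596414 1.] falls between [1. 2. 1.] and [1. 3. 1.]. The sketch below illustrates that kind of placement with a simplified, SMOTE-style interpolation toward a single nearest minority neighbor. It is not the KNNOR implementation; KNNOR's neighbor selection and placement rules are described in the paper cited below. The helper name interpolate_point is hypothetical. Treating max_dist_point as an upper bound on the interpolation fraction is also an assumption made for illustration.

```python
import numpy as np

def interpolate_point(minority, index, max_dist_point, rng):
    """Hypothetical helper: place one synthetic point between minority[index]
    and its nearest minority neighbor (SMOTE-style, not KNNOR itself)."""
    point = minority[index]
    # Euclidean distance from the chosen point to every minority point
    dists = np.linalg.norm(minority - point, axis=1)
    dists[index] = np.inf  # exclude the point itself
    neighbor = minority[np.argmin(dists)]
    # Random fraction of the way toward the neighbor, below max_dist_point (assumed meaning)
    frac = rng.uniform(0.0, max_dist_point)
    return point + frac * (neighbor - point)

# Minority (class 0) features from the example above
minority = np.array([
    [1.0, 2.0, 1.0],
    [1.0, 3.0, 1.0],
    [2.0, 1.0, 1.0],
    [3.0, 2.0, 1.0],
    [3.0, 1.0, 1.0],
])

rng = np.random.default_rng(0)
new_point = interpolate_point(minority, 0, max_dist_point=0.5, rng=rng)
print(new_point)  # lies on the segment from [1, 2, 1] toward [1, 3, 1]
```

Interpolating only between existing minority points keeps the synthetic data inside the region the minority class already occupies. This matches what the example output shows.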
Input Parameters
The example above uses the default parameters of the KNNOR algorithm. The following parameters of fit_resample can be set to tune how the algorithm works:
X_new, y_new, X_aug_min, y_aug_min = knnor.fit_resample(
    X, y,
    num_neighbors=10,         # number of neighbors used to generate each artificial point
    max_dist_point=0.01,      # maximum distance at which a new point is placed
    proportion_minority=0.3,  # proportion of the minority population used to generate artificial points
    final_proportion=2,       # target ratio of minority to majority count after augmentation
                              # e.g. with 15 majority and 5 minority points,
                              # final_proportion=1 adds 10 artificial minority points
)
- num_neighbors: the number of neighbors used to generate each artificial point.
- max_dist_point: the maximum distance at which a new point is placed. The highest allowed value is 1.
- proportion_minority: the proportion of the minority population used to generate artificial points.
- final_proportion: the target ratio of minority count to majority count after augmentation. For example, with 15 majority and 5 minority datapoints, final_proportion=1 adds 10 artificial minority points, so the minority count equals the majority count.
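The arithmetic behind final_proportion can be written as a small helper. This sketch assumes the target minority count is final_proportion times the majority count, rounded down. The function name num_points_to_add is hypothetical, and the library's own rounding may differ.

```python
import math

def num_points_to_add(n_majority, n_minority, final_proportion):
    """Hypothetical helper: synthetic minority points needed so that
    minority / majority reaches final_proportion (assumes rounding down)."""
    target_minority = math.floor(final_proportion * n_majority)
    # Never return a negative count if the minority is already large enough
    return max(0, target_minority - n_minority)

print(num_points_to_add(15, 5, 1))  # 10, the example from the parameter list
print(num_points_to_add(12, 5, 1))  # 7, the same number of rows added in the example output
print(num_points_to_add(15, 5, 2))  # 25, minority ends up twice the majority size
```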
Outputs
- X_new: the complete feature data, including the augmented datapoints
- y_new: the labels, including those of the augmented datapoints
- X_aug_min: only the augmented minority points
- y_aug_min: the labels of the augmented minority points only
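In the example output, new_data lists the original rows first, followed by the augmented ones. The sketch below assumes that layout holds in general. It checks the layout and reports per-class counts before and after resampling. The helpers class_counts and summarize_resampling are hypothetical, and small made-up arrays stand in for the four fit_resample outputs.

```python
import numpy as np

def class_counts(labels):
    """Map each class label to the number of rows carrying it."""
    values, counts = np.unique(labels, return_counts=True)
    return {float(v): int(c) for v, c in zip(values, counts)}

def summarize_resampling(X, y, X_new, y_new, X_aug_min, y_aug_min):
    """Hypothetical helper: per-class counts before and after resampling.
    Assumes X_new/y_new are the originals followed by the augmented rows,
    which is the layout seen in the example output."""
    n = len(X)
    if not (np.array_equal(X_new[:n], X) and np.array_equal(X_new[n:], X_aug_min)):
        raise ValueError("X_new is not X followed by X_aug_min")
    if not (np.array_equal(y_new[:n], y) and np.array_equal(y_new[n:], y_aug_min)):
        raise ValueError("y_new is not y followed by y_aug_min")
    return class_counts(y), class_counts(y_new)

# Made-up stand-ins for the inputs and the fit_resample outputs
X = np.array([[1.0, 2.0], [1.0, 3.0], [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0.0, 0.0, 1.0, 1.0, 1.0])
X_aug_min = np.array([[1.0, 2.5]])
y_aug_min = np.array([0.0])
X_new = np.vstack([X, X_aug_min])
y_new = np.concatenate([y, y_aug_min])

before, after = summarize_resampling(X, y, X_new, y_new, X_aug_min, y_aug_min)
print(before)  # {0.0: 2, 1.0: 3}
print(after)   # {0.0: 3, 1.0: 3}
```

Pass y_new before the reshape(-1, 1) step from the main example, because the helper compares it against the one-dimensional y.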
Support
If you are having issues, please let us know at ashhadulislam@gmail.com or samir.brahim@gmail.com
Cite
If you use this library in your research, please cite the following paper:
Ashhadul Islam, Samir Brahim Belhaouari, Atiq Ur Rahman, Halima Bensmail, KNNOR: An oversampling technique for imbalanced datasets, Applied Soft Computing, 2021, 108288, ISSN 1568-4946, https://doi.org/10.1016/j.asoc.2021.108288.
(https://www.sciencedirect.com/science/article/pii/S1568494621010942)
License
The project is licensed under the MIT license.
Code
The source code is available on GitHub:
https://github.com/ashhadulislam/augmentdatalib_source
More examples can be found in the example notebook:
https://github.com/ashhadulislam/augmentdatalib_source/blob/master/example/Example.ipynb