Welcome to the documentation of KNNOR - a novel data augmentation technique!

KNNOR

A generic Python library for data augmentation.

Features

  • Helps balance datasets by upsampling the minority class
  • Can improve the accuracy of models trained on imbalanced data

Installation

Install knnor by running:

    pip install knnor

Using the Library

Convert your dataset to a NumPy array. All values in the data must be numeric.

Separate the data into X and y, where X holds the features and y holds the class labels.
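If the raw data stores class labels as strings, they need to be turned into numbers before the split. The following is a minimal sketch of one way to do that with NumPy alone; the rows and the names `raw_rows` and `classes` are made up for illustration and are not part of knnor.

```python
import numpy as np

# Hypothetical raw rows: two numeric features followed by a string class label.
raw_rows = [
    [1.0, 2.0, "cat"],
    [1.5, 1.8, "cat"],
    [5.0, 8.0, "dog"],
    [6.0, 9.0, "dog"],
    [5.5, 8.5, "dog"],
]

# Features: every column except the last, converted to floats.
X = np.array([row[:-1] for row in raw_rows], dtype=float)

# Labels: np.unique with return_inverse=True maps each distinct label
# to an integer code (codes follow the sorted order of the labels).
classes, y = np.unique([row[-1] for row in raw_rows], return_inverse=True)

print(X.shape)   # (5, 2)
print(classes)   # the distinct labels, sorted
print(y)         # integer code for each row
```

Keeping `classes` around makes it possible to map the integer codes back to the original labels later with `classes[y]`.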

import numpy as np
from knnor import data_augment
l = [

  [1.0,2.0,1.0,0],
  [1,3,1,0],
  [2,1,1,0],
  [3,2,1,0],
  [3,1,1,0],

  [1,3,4,1],
  [1,4,3,1],
  [1,4,4,1],

  [2,3,3,1],
  [2,3,4,1],
  [2,4,3,1],
  [2,4,4,1],

  [3,2,2,1],
  [3,3,2,1],
  [3,3,2,1],
  [3,4,2,1],

  [4,3,1,1]
]

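l = np.array(l)
print("Original Data:")
print(l)

# The last column holds the class labels; the remaining columns are the features
X = l[:, :-1]
y = l[:, -1]

# Oversample the minority class with the default parameters
knnor = data_augment.KNNOR()
X_new, y_new, X_aug_min, y_aug_min = knnor.fit_resample(X, y)

# Reshape the labels into a column so they can be appended to the features
y_new = y_new.reshape(-1, 1)

print("KNNOR Data:")
new_data = np.append(X_new, y_new, axis=1)
print(new_data)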

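new_data contains the original data followed by the augmented points. In the sample output below, the last seven rows of the KNNOR Data array are the generated minority (class 0) points.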

Original Data:
[[1. 2. 1. 0.]
 [1. 3. 1. 0.]
 [2. 1. 1. 0.]
 [3. 2. 1. 0.]
 [3. 1. 1. 0.]
 [1. 3. 4. 1.]
 [1. 4. 3. 1.]
 [1. 4. 4. 1.]
 [2. 3. 3. 1.]
 [2. 3. 4. 1.]
 [2. 4. 3. 1.]
 [2. 4. 4. 1.]
 [3. 2. 2. 1.]
 [3. 3. 2. 1.]
 [3. 3. 2. 1.]
 [3. 4. 2. 1.]
 [4. 3. 1. 1.]]

KNNOR Data:
[[1.         2.         1.         0.        ]
 [1.         3.         1.         0.        ]
 [2.         1.         1.         0.        ]
 [3.         2.         1.         0.        ]
 [3.         1.         1.         0.        ]
 [1.         3.         4.         1.        ]
 [1.         4.         3.         1.        ]
 [1.         4.         4.         1.        ]
 [2.         3.         3.         1.        ]
 [2.         3.         4.         1.        ]
 [2.         4.         3.         1.        ]
 [2.         4.         4.         1.        ]
 [3.         2.         2.         1.        ]
 [3.         3.         2.         1.        ]
 [3.         3.         2.         1.        ]
 [3.         4.         2.         1.        ]
 [4.         3.         1.         1.        ]
 [1.         2.8596414  1.         0.        ]
 [3.         1.89795961 1.         0.        ]
 [2.76031358 1.         1.         0.        ]
 [1.         2.95194388 1.         0.        ]
 [3.         1.72737314 1.         0.        ]
 [2.712059   1.         1.         0.        ]
 [1.         2.94970565 1.         0.        ]]
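A quick way to confirm that the classes are balanced is to count the labels before and after resampling. The sketch below does not call knnor; it rebuilds the label columns from the printed output above (5 rows of class 0 and 12 of class 1 originally, plus 7 generated class 0 rows) so that it runs on its own. With real data, the `y` and `y_new` returned by `fit_resample` would be used instead.

```python
import numpy as np

# Label column from the "Original Data" output: 5 of class 0, 12 of class 1.
y = np.array([0] * 5 + [1] * 12)

# Label column from the "KNNOR Data" output: the original labels
# followed by 7 generated class-0 rows.
y_new = np.concatenate([y, np.zeros(7, dtype=int)])

for name, labels in (("before", y), ("after", y_new)):
    classes, counts = np.unique(labels, return_counts=True)
    print(name, dict(zip(classes.tolist(), counts.tolist())))

# before {0: 5, 1: 12}
# after {0: 12, 1: 12}
```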

Input Parameters

The example above uses the default parameters of the KNNOR algorithm. The following parameters can be used to adjust its behavior:

X_new, y_new, X_aug_min, y_aug_min = knnor.fit_resample(
    X, y,
    num_neighbors=10,         # number of neighbors used to generate each artificial point
    max_dist_point=0.01,      # maximum distance at which a new point is placed
    proportion_minority=0.3,  # proportion of the minority population used to generate artificial points
    final_proportion=2,       # target ratio of minority to majority counts after augmentation;
                              # e.g. with 15 majority and 5 minority points,
                              # final_proportion=1 adds 10 artificial minority points
)
  • num_neighbors is the number of neighbors used to generate each artificial point.
  • max_dist_point is the maximum distance at which a new point is placed, with 1 being the highest value.
  • proportion_minority is the proportion of the minority population used to generate the artificial points.
  • final_proportion sets the target ratio of minority to majority datapoints after augmentation. For example, if there are 15 majority datapoints and 5 minority datapoints, then final_proportion=1 adds 10 artificial minority points so that the ratio of minority to majority count is 1.
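The arithmetic behind final_proportion can be sketched as follows. `points_to_add` is a hypothetical helper written for this illustration, not a knnor function, and the rounding of the target count is an assumption; the library may handle fractional targets differently.

```python
def points_to_add(num_majority, num_minority, final_proportion=1.0):
    """Hypothetical helper: artificial minority points needed to reach the target ratio."""
    # Target minority count implied by the requested minority/majority ratio.
    # Rounding to the nearest integer is an assumption made for this sketch.
    target_minority = round(final_proportion * num_majority)
    # Never return a negative number if the minority class is already large enough.
    return max(0, target_minority - num_minority)


# The example from the parameter description: 15 majority, 5 minority.
print(points_to_add(15, 5, final_proportion=1))  # 10

# The dataset from the usage example: 12 majority, 5 minority.
print(points_to_add(12, 5, final_proportion=1))  # 7
```

The second call agrees with the sample output, where seven generated rows follow the original seventeen.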

Outputs

  • X_new: the complete feature data, including the augmented datapoints
  • y_new: labels for the complete data, including the augmented ones
  • X_aug_min: only the augmented minority points
  • y_aug_min: labels for only the augmented minority points

Support

If you are having issues, please let us know at ashhadulislam@gmail.com or samir.brahim@gmail.com

Cite

If you use this library in your research, please cite the following:

Ashhadul Islam, Samir Brahim Belhaouari, Atiq Ur Rahman, Halima Bensmail, KNNOR: An oversampling technique for imbalanced datasets, Applied Soft Computing, 2021, 108288, ISSN 1568-4946, https://doi.org/10.1016/j.asoc.2021.108288.

(https://www.sciencedirect.com/science/article/pii/S1568494621010942)

License

The project is licensed under the MIT license.