Proven Precision

Using three publicly available and well-respected datasets, Integral Regularization © was compared against standard neural network regularization techniques in the fields of finance, image processing, and education.  For each dataset, Integral Regularization © outperformed all seven unique combinations of the three standard regularization techniques: activity, dropout, and kernel.

Credit Card Fraud

Regularization Type           Incorrect Predictions   F1 Score   Single-Pass Run Time
Best of Standard Techniques   167                     0.7067     85
Integral Regularization       26                      0.9089     157
Percent Improvement           +642%                   +28.6%     -46%

The Credit Card Fraud Data Set is taken from real-world European credit card fraud cases.  The data set contains 284,807 credit card transactions, 30 features per sample, and highly imbalanced classes.

The data was standardized and fed into a fully connected (dense) neural network.  Cross-entropy loss was used to classify each transaction as either "fraud" or "not fraud".
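The pipeline above can be sketched as follows. This is an illustrative, minimal reconstruction only: the layer sizes, random stand-in data, and weight initialization are assumptions, not the published configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(1000, 30))   # 30 features per sample
y = (rng.random(1000) < 0.002).astype(float)          # highly imbalanced labels

# Standardize each feature to zero mean, unit variance
X = (X - X.mean(axis=0)) / X.std(axis=0)

def dense(x, w, b, activation):
    """One fully connected layer: affine transform plus nonlinearity."""
    return activation(x @ w + b)

relu = lambda z: np.maximum(z, 0.0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Two hidden layers feeding a single sigmoid output ("fraud" probability);
# widths 16 and 8 are arbitrary choices for the sketch
w1, b1 = rng.normal(size=(30, 16)) * 0.1, np.zeros(16)
w2, b2 = rng.normal(size=(16, 8)) * 0.1, np.zeros(8)
w3, b3 = rng.normal(size=(8, 1)) * 0.1, np.zeros(1)

h = dense(X, w1, b1, relu)
h = dense(h, w2, b2, relu)
p = dense(h, w3, b3, sigmoid).ravel()

# Binary cross-entropy loss over the batch
eps = 1e-7
bce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
print(round(float(bce), 4))
```

A real training loop would then minimize this loss with gradient descent, with the regularizer under comparison added to the objective.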

https://www.kaggle.com/mlg-ulb/creditcardfraud 

MNIST Handwritten Digits

Regularization Type           Incorrect Predictions   F1 Score   Single-Pass Run Time
Best of Standard Techniques   43                      0.9932     1083
Integral Regularization       34                      0.9945     1200
Percent Improvement           +27%                    +0.1%      -10%

The MNIST Handwritten Digits Data Set comprises 42,000 handwritten digits from zero through nine.  Each sample is a 28-by-28 grayscale image.

The data was preprocessed by rescaling pixel values into the range of minus three to three.  Each image was augmented every epoch.  The neural network architecture had initial convolution layers that fed into dense layers.  Cross-entropy loss was used to classify images into ten categories, one for each digit.
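The rescaling and loss computation can be sketched as below. This is a hedged illustration with random stand-in images and logits; the actual convolution-plus-dense model that produces real logits is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
images = rng.integers(0, 256, size=(64, 28, 28)).astype(float)

# Map the pixel range [0, 255] linearly onto [-3, 3]
scaled = images / 255.0 * 6.0 - 3.0

labels = rng.integers(0, 10, size=64)

# Stand-in logits; a real model would produce these from conv + dense layers
logits = rng.normal(size=(64, 10))
z = logits - logits.max(axis=1, keepdims=True)   # shift for numerical stability
probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)

# Categorical cross-entropy: negative log-probability of the true class
loss = -np.mean(np.log(probs[np.arange(64), labels] + 1e-7))
print(round(float(loss), 4))
```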

https://www.kaggle.com/c/digit-recognizer

U.S. Graduate Admissions

Regularization Type           MSE Loss    Single-Pass Run Time
Best of Standard Techniques   0.005469    190
Integral Regularization       0.005231    39
Percent Improvement           +4.5%       +487%

The U.S. Graduate Admissions Data Set comprises 500 samples of real-world student data with seven features per sample.  The features include GPA, test scores, and research experience.

The data was preprocessed with principal component analysis.  The neural network architecture had seven fully connected (dense) layers.  Mean squared error loss was used to fit the samples to their targets.
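The PCA step and the loss can be sketched as follows. This is an assumed minimal version using random stand-in data; the seven-layer network itself is omitted and replaced by placeholder predictions.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 7))          # 500 samples, 7 features each

# PCA via singular value decomposition of the centered data
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Xc @ Vt.T                 # samples expressed in PC coordinates

# Mean squared error between (placeholder) predictions and targets;
# a real run would take preds from the dense network's output
targets = rng.random(500)
preds = rng.random(500)
mse = np.mean((preds - targets) ** 2)
print(round(float(mse), 6))
```

A useful property to check: the projected columns are mutually uncorrelated, which is what makes PCA a convenient preprocessing step before a dense network.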


https://www.kaggle.com/tanmoyie/us-graduate-schools-admission-parameters