An Adaptive Benchmark Testing Method for Evaluating Arithmetic Precision

Authors

Denys Deineko, Yuriy Fedkovych Chernivtsi National University

DOI:

https://doi.org/10.31861/sisiot2025.2.02003

Keywords:

arithmetic verification, floating-point arithmetic, precision evaluation, numerical accuracy, adaptive testing

Abstract

Floating-point arithmetic is inherently prone to rounding errors, which can accumulate and significantly influence the outcomes of numerical computations. This work presents a method for systematically assessing and comparing the accuracy of arithmetic implementations by adaptively refining test inputs in response to observed computational inaccuracies. In contrast to conventional approaches that rely on fixed sets of values or random sampling, the proposed method continuously updates the test set by identifying regions of the numerical domain where errors are most pronounced. The refinement process is iterative and guided by statistical analysis of previous results, so that regions with elevated error levels receive more focused attention in subsequent testing phases.

At the heart of the method is an adaptive procedure for deciding which numerical values require further examination. The distribution of previously recorded errors is analyzed and the decision criterion is updated accordingly: thresholds for acceptable accuracy are recalculated using statistical measures such as quantiles, which reflect the severity and frequency of the errors encountered. Test-input refinement is therefore driven by actual data rather than predetermined heuristics.

The method begins by generating a diverse collection of inputs spanning a broad spectrum of floating-point values, including those known to cause instability in calculations, such as extremely small or large magnitudes and values at the boundaries of numerical precision. These inputs are then used to perform the basic arithmetic operations: addition, subtraction, multiplication, and division.
Two arithmetic implementations are evaluated: the standard arithmetic of a widely adopted programming language and a custom-developed alternative designed to enhance numerical accuracy. For each operation, the results produced by the two systems are compared, and accuracy measures are derived from their differences using both absolute and relative error estimates. These differences are then analyzed statistically to detect patterns in the occurrence and magnitude of errors. When the error associated with a particular input proves higher than expected, additional test values are generated in its vicinity through carefully controlled variations, allowing the method to explore neighboring regions where similar errors might occur. In this way, the test suite evolves over time, becoming increasingly focused on the numerical situations most likely to expose weaknesses in arithmetic implementations.

By uncovering patterns in how errors emerge and accumulate, the method provides a structured and repeatable process for evaluating the reliability of floating-point arithmetic under varying conditions. Its targeted nature makes it especially useful for scientific and engineering applications, where computational precision is critical. In summary, the approach improves on traditional benchmarking by introducing an adaptive, data-driven strategy that concentrates on the most challenging areas of numerical computation, offering a practical tool for the verification and validation of arithmetic systems and supporting both development and quality assurance in software that relies heavily on floating-point calculations.
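The workflow described in the abstract — seed a diverse input set, measure absolute and relative errors, pick a quantile-based threshold, and refine around high-error inputs — can be sketched in Python. This is a minimal illustration only: the host language is not stated in the abstract, and the seed values, the 90th-percentile threshold, and the ulp-stepping neighborhood below are this sketch's assumptions, not the authors' implementation.

```python
import math
import sys
from statistics import quantiles

def seed_inputs():
    """Diverse float seeds, including values known to destabilize
    computations: subnormals, extremes, and precision boundaries.
    (Illustrative choices, not the paper's actual generator.)"""
    f = sys.float_info
    base = [
        0.0, 1.0, f.min, f.max,    # zero, unit, normal extremes
        f.epsilon,                 # machine epsilon at 1.0
        5e-324,                    # smallest positive subnormal
        1.0 + f.epsilon,           # just past a precision boundary
        math.nextafter(1.0, 0.0),  # just below 1.0
    ]
    values = {v for v in base} | {-v for v in base}
    values |= {10.0 ** e for e in range(-300, 301, 60)}  # magnitude sweep
    return sorted(values)

def error_metrics(reference, candidate):
    """Absolute and relative error between two finite results."""
    abs_err = abs(reference - candidate)
    rel_err = abs_err / abs(reference) if reference != 0.0 else abs_err
    return abs_err, rel_err

def refine(inputs, errors, q=90, spread=4):
    """Data-driven refinement step: take the q-th percentile of the
    observed errors as the acceptability threshold and, for each input
    whose error exceeds it, emit neighbours a few ulps away."""
    cut = quantiles(errors, n=100)[q - 1]
    neighbours = []
    for x, e in zip(inputs, errors):
        if e > cut:
            lo = hi = x
            for _ in range(spread):
                lo = math.nextafter(lo, -math.inf)
                hi = math.nextafter(hi, math.inf)
                neighbours.extend((lo, hi))
    return cut, neighbours
```

In a full run, each round would evaluate the two arithmetic implementations on the current test set, feed the resulting error list into `refine`, and merge the returned neighbours into the next round's inputs, so the suite progressively concentrates on high-error regions.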


Author Biography

  • Denys Deineko, Yuriy Fedkovych Chernivtsi National University

    PhD student at Computer Systems Software Department, Yuriy Fedkovych Chernivtsi National University. Master of Computer Science. Research Interests: instrumentation and profiling of CPython, compiler-level optimizations for numerical correctness.

References

G. Dahlquist and Å. Björck, Numerical Methods in Scientific Computing. SIAM, 2008.

N. J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd ed. SIAM, 2002.

T. Nelson, E. Rivera, S. Soucie, T. D. Vecchio, J. Wrenn, and S. Krishnamurthi, “Automated, Targeted Testing of Property-Based Testing Predicates,” The Art, Science, and Engineering of Programming, vol. 6, no. 2, Nov. 2021, doi: https://doi.org/10.22152/programming-journal.org/2022/6/10.

N. Metropolis and S. Ulam, “The Monte Carlo Method,” Journal of the American Statistical Association, vol. 44, no. 247, pp. 335–341, Sep. 1949, doi: https://doi.org/10.1080/01621459.1949.10483310.

G. S. Fishman, Monte Carlo: Concepts, Algorithms, and Applications. New York: Springer, 1996.

J. R. D’Errico, “An Adaptive Quadrature Routine,” ACM SIGAPL APL Quote Quad, vol. 17, no. 2, pp. 19–20, Dec. 1986, doi: https://doi.org/10.1145/9327.9331.

M. Redmann and S. Riedel, “Runge-Kutta Methods for Rough Differential Equations,” Journal of Stochastic Analysis, vol. 3, no. 4, Dec. 2022, doi: https://doi.org/10.31390/josa.3.4.06.

P. Godefroid, “Fuzzing,” Communications of the ACM, vol. 63, no. 2, pp. 70–76, Jan. 2020, doi: https://doi.org/10.1145/3363824.

B. Beizer, Black Box Testing: Techniques for Functional Testing of Software and Systems. New York: John Wiley & Sons, 1995.

E. Hairer et al., Solving Ordinary Differential Equations I, 2nd ed. Springer Series in Computational Mathematics. Berlin, Heidelberg: Springer Berlin Heidelberg, 1993.

L. F. Shampine and M. W. Reichelt, “The MATLAB ODE Suite,” SIAM Journal on Scientific Computing, vol. 18, no. 1, pp. 1–22, Jan. 1997, doi: https://doi.org/10.1137/s1064827594276424.

P. Virtanen et al., “SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python,” Nature Methods, vol. 17, no. 3, pp. 261–272, Feb. 2020, doi: https://doi.org/10.1038/s41592-019-0686-2.

“Defining Extension Types: Tutorial,” Python Documentation, 2025. [Online]. Available: https://docs.python.org/3/extending/newtypes_tutorial.html#adding-data-and-methods-to-the-basic-example (accessed Dec. 08, 2025).

S. S. Shapiro and M. B. Wilk, “An Analysis of Variance Test for Normality (Complete Samples),” Biometrika, vol. 52, no. 3/4, pp. 591–611, 1965, doi: https://doi.org/10.2307/2333709.

B. L. Welch, “The Generalization of ‘Student’s’ Problem When Several Different Population Variances Are Involved,” Biometrika, vol. 34, no. 1/2, p. 28, Jan. 1947, doi: https://doi.org/10.2307/2332510.

J. Cohen, Statistical Power Analysis for the Behavioral Sciences, 2nd ed. Routledge, 1988.

R. E. Moore, Interval Analysis. Englewood Cliffs, NJ: Prentice-Hall, 1966.

“Adapting strategies - Hypothesis 6.148.7 documentation,” Readthedocs.io, 2025. [Online]. Available: https://hypothesis.readthedocs.io/en/latest/tutorial/adapting-strategies.html (accessed Dec. 08, 2025).


Published

2025-12-30


How to Cite

D. Deineko, “An Adaptive Benchmark Testing Method for Evaluating Arithmetic Precision”, SISIOT, vol. 3, no. 2, p. 02003, Dec. 2025, doi: 10.31861/sisiot2025.2.02003.
