I have just released the source code of the BIGNUM Library for Multiple Precision Arithmetic v7.61, which I have implemented as part of my diploma thesis and a subsequent paper.

More can be found here: http://wrogn.com/bignum/

“Fast Multiplication of Large Integers: Implementation and Analysis of the DKSS Algorithm”, that is my diploma thesis. I just uploaded it to arXiv: http://arxiv.org/abs/1503.04955

Abstract: The Schönhage-Strassen algorithm (SSA) is the de-facto standard for multiplication of large integers. For N-bit numbers it has a time bound of \(O(N \cdot \log N \cdot \log \log N)\). De, Kurur, Saha and Saptharishi (DKSS) presented an asymptotically faster algorithm with a better time bound of \(N \cdot \log N \cdot 2^{O(\log^* N)}\). In this diploma thesis, results of an implementation of DKSS multiplication are presented: run-time is about 30 times larger than SSA, while memory requirements are about 3.75 times higher than SSA. A possible crossover point is estimated to be out of reach even if we utilized the whole universe for computer memory.

It contains not only what the title promises, but also a long presentation of my own endeavors regarding fast multiplication, from ordinary multiplication through Karatsuba and Toom-Cook 3-way to Schönhage-Strassen, with theory and some code examples.

I’m happy to hear any comments.

I wrote a C++ function to multiply two large positive integers of the same length, say \(n\) 64-bit words, with the grade-school method. Let’s call that function `omul_n()`. Then, I wrote extensive benchmarks to assess the speed of my efforts. The resulting run-times for the multiplication of two numbers with \(n\) words look like this:

Words | Cycles |
---|---|
1 | 18 |
9 | 281 |
17 | 915 |
25 | 1959 |
33 | 3421 |
41 | 5207 |
49 | 7392 |
57 | 10093 |
65 | 13000 |
73 | 16397 |
81 | 20224 |
89 | 24326 |
97 | 28800 |
105 | 33941 |
113 | 39764 |
121 | 45487 |
129 | 51212 |
137 | 57453 |
145 | 64142 |
153 | 71778 |
Now I wanted to find a closed-form function that most accurately describes the run-time of `omul_n()`. We know that to multiply two numbers of \(n\) digits each, we need to do \(n^2\) digit-multiplications. So, most likely, the desired function will look something like $$ T(n) = c_0 + c_1 n + c_2 n^2. $$
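To make the \(n^2\) count concrete, here is a minimal Python sketch of the grade-school method on lists of 64-bit words. It is only an illustration and not the C++ `omul_n()`; the names `to_words`, `from_words` and `grade_school_mul` are made up for this example.

```python
# Hypothetical Python stand-in for the grade-school method; NOT the author's omul_n().
BASE = 1 << 64  # one "digit" is a 64-bit word


def to_words(value, n):
    """Split a non-negative integer into n little-endian 64-bit words."""
    return [(value >> (64 * i)) & (BASE - 1) for i in range(n)]


def from_words(words):
    """Reassemble little-endian 64-bit words into one Python integer."""
    return sum(w << (64 * i) for i, w in enumerate(words))


def grade_school_mul(a, b):
    """Multiply two equal-length word lists.

    Returns (product as 2n words, number of word-multiplications).
    """
    n = len(a)
    result = [0] * (2 * n)
    mul_count = 0
    for i in range(n):
        carry = 0
        for j in range(n):
            # one word-multiplication plus accumulation of earlier partial sums
            t = result[i + j] + a[i] * b[j] + carry
            mul_count += 1
            result[i + j] = t % BASE
            carry = t // BASE
        result[i + n] = carry  # this position has not been written before
    return result, mul_count


x = 2**127 - 1
y = 2**128 - 159
product_words, count = grade_school_mul(to_words(x, 2), to_words(y, 2))
print(from_words(product_words) == x * y, count)
```

The two nested loops each run \(n\) times, which is where the \(c_2 n^2\) term comes from. The per-row work (carry handling, storing the top word) plausibly contributes to the \(c_1 n\) term, and call overhead to \(c_0\).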

The only question is: what values to use for \(c_0\), \(c_1\) and \(c_2\)? I like linear regression, but in its simplest form it only fits straight lines, like \(T(n) = c_0 + c_1 n\). That is not enough here.

The solution to my question is curve-fitting. I used Python functions to do so, namely `scipy.optimize.curve_fit` from the SciPy package (a good starter article that inspired my use of curve-fitting is here).

The program is really simple. You input your data plus the describing function (like \(T(n)\) above) into the curve-fitting function and out pop the coefficients \(c_i\) that yield the \(T(n)\) with the least squared error.

The Python script:

    import numpy as np                              # imports
    from scipy.optimize import curve_fit            # the magic function

    omul_str = open("omul-speed.txt", "r").read()   # read measured values
    o = [float(i) for i in omul_str.split()]        # make one big list
    os = o[0::2]                                    # slice out first column
    ot = o[1::2]                                    # slice out second column

    xdata = np.array(os)                            # convert lists to np.array
    ydata = np.array(ot)

    def func(x, c0, c1, c2):                        # the modeled function
        return c0 + c1*x + c2*x*x

    popt, pcov = curve_fit(func, xdata, ydata)      # and fit it!
    print(popt)                                     # print optimized parameters

If you’re not used to NumPy, `array` features an unfamiliar usage:

    Python 3.4.1 |Anaconda 2.1.0 (64-bit)| ...
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import numpy as np
    >>> a = np.array([1,2,3])
    >>> a
    array([1, 2, 3])
    >>> import math
    >>> math.log(a)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: only length-1 arrays can be converted to Python scalars
    >>> np.log(a)
    array([ 0.        ,  0.69314718,  1.09861229])

NumPy functions applied to an `array` return a new array that holds the function's value for every element of the input. That comes in pretty handy when handling larger sets of data.

Back to our curve-fitting. The above listed script generates this output:

[-60.37910437 5.09798716 3.03566267]

That means that the best fitting function is about

$$ T_\text{abs}(n) = -60 + 5.1 \cdot n + 3.04 \cdot n^2. $$

Pretty neat, eh? Plotted it looks like this. The red line is not the connection of the dots, but our model:

So far, so very cool. An issue arises when we look at the relative errors between data points and model, that is, \(|1 - T_n / T(n)|\), where \(T(n)\) is our model and \(T_n\) is the measured run-time. In contrast, the above curve-fitting minimized the absolute error \(|T(n) - T_n|\). (Actually, it minimized the sum of the *squared* absolute errors, but I let that slide here and focus on absolute vs. relative error.)
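To see the difference between the two error measures on real numbers, here is a small plain-Python sketch. It uses the fitted coefficients printed above and the first and last rows of the table, and takes the relative error as \(|1 - T_n / T(n)|\). The names `model`, `samples` and `errors` are only chosen for this illustration.

```python
# Coefficients copied from the curve_fit output quoted in the article.
c0, c1, c2 = -60.37910437, 5.09798716, 3.03566267


def model(n):
    """Fitted run-time model T(n) in cycles."""
    return c0 + c1 * n + c2 * n * n


# (words, measured cycles): first and last row of the table
samples = [(1, 18.0), (153, 71778.0)]

errors = {}
for n, measured in samples:
    predicted = model(n)
    abs_err = abs(predicted - measured)       # absolute error in cycles
    rel_err = abs(1 - measured / predicted)   # relative error as a fraction
    errors[n] = (abs_err, rel_err)
    print(f"n={n:3d}  model={predicted:10.2f}  "
          f"abs={abs_err:7.2f}  rel={rel_err * 100:9.4f} %")
```

For \(n = 1\) the model even predicts a negative run-time. A miss of a few dozen cycles would hardly matter next to roughly 72,000 cycles at \(n = 153\), but it is several times the 18 cycles measured at \(n = 1\).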

Some additional lines of Python code added to the end of our script will print the relative errors and their average:

    relerr = abs(1 - ydata / func(xdata, *popt))    # relative errors
    np.set_printoptions(suppress=True)              # switch off sci. notation
    print(relerr * 100)
    avgrel = sum(relerr) / len(ydata) * 100         # calc average
    print("avgrel:", avgrel)

This produces the following extra output:

    [ 134.45275796   21.43922899    1.26238363    0.27284922    0.21410507
        0.84902538    1.15067892    0.00073479    0.73808731    0.55686398
        0.22467506    0.46166589    0.6782697     0.00615863    1.23715341
        1.07859592    0.19226999    0.28013432    0.56064524    0.00479269]
    avgrel: 8.28305380503

So, we have an average relative error of 8 %, which seems rather high to me. Obviously, the relative error is extremely high for the first two values: 134 % and 21 %. Can we improve that? That is, can we find a model for which both the average and the maximum relative error are lower?

Least squares optimization with minimized absolute error is used very widely, but unfortunately, there is no easy way to switch the functions performing this to minimize the relative error. But I found this forum post that was very helpful. It’s on some other math software system, but we can borrow the idea: *“Usually the best way to do relative error is to log your model. This changes a proportional error structure into an additive one, which is exactly what you want”* (with “*log*” as in logarithm).
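Why taking the log helps can be checked with a few lines of plain Python. The sketch below assumes a hypothetical model that overshoots the measurement by 5 % at both ends of the table. The function `residuals` and the 5 % figure are made up for this illustration.

```python
import math


def residuals(measured, predicted):
    """Return (absolute residual, log residual) for one data point."""
    absolute = predicted - measured
    logarithmic = math.log(predicted) - math.log(measured)
    return absolute, logarithmic


# Hypothetical model that overshoots by 5 % at both ends of the table.
small = residuals(18.0, 18.0 * 1.05)
large = residuals(71778.0, 71778.0 * 1.05)

print("absolute residuals:", small[0], large[0])
print("log residuals:     ", small[1], large[1])
```

The absolute residuals differ by a factor of several thousand, so a least squares fit on the raw values cares almost only about the large data point. The log residuals are both \(\log 1.05 \approx 0.0488\), because \(\log T(n) - \log T_n = \log(T(n) / T_n)\) depends only on the ratio. For small deviations that value is close to the relative error itself.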

Luckily, that is very easy to accomplish in Python. This is a changed version of the earlier script:

    import numpy as np                              # imports
    from scipy.optimize import curve_fit            # the magic function

    omul_str = open("omul-speed.txt", "r").read()   # read measured values
    o = [float(i) for i in omul_str.split()]        # make one big list
    os = o[0::2]                                    # slice out first column
    ot = o[1::2]                                    # slice out second column

    xdata = np.array(os)                            # convert lists to np.array
    ydata = np.array(ot)

    def func(x, c0, c1, c2):                        # the modeled function
        return c0 + c1*x + c2*x*x

    def logfunc(x, c0, c1, c2):                     # ... and the log of it
        return np.log(func(x, c0, c1, c2))

    popt, pcov = curve_fit(logfunc, xdata, np.log(ydata))   # and fit it!
    print(popt)                                     # print optimized parameters

    relerr = abs(1 - ydata / func(xdata, *popt))    # relative errors
    np.set_printoptions(suppress=True)              # switch off sci. notation
    print(relerr * 100)
    avgrel = sum(relerr) / len(ydata) * 100         # calc average
    print("avgrel:", avgrel)

And now the output looks like this:

    [ 12.98237958   1.9705695    3.05332744]
    [ 0.03485745  1.06567529  1.49572472  0.58745607  0.52644119  0.37155771
      0.65289914  0.47219135  0.31728127  0.18879885  0.09165894  0.19598836
      0.45928895  0.171688    1.37775523  1.18298158  0.26311364  0.23936649
      0.54721279  0.01646704]
    avgrel: 0.512920201737

Awesome! The average relative error is down to 0.5 % with a maximum of 1.5 %.

The linear plot looks largely the same, because the absolute differences are too small to see. But if we switch to a double-logarithmic plot, we can see them clearly:

Clearly, the smaller the values are, the larger the difference is between the red graph (minimized absolute errors model) and the data points, whereas the green graph (minimized relative errors) is much closer to the data points for small \(n\).
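Without the plot at hand, the same comparison can be made numerically. This is a small sketch in plain Python that evaluates both fitted models at a few rows of the table. The coefficients are copied from the two outputs above, and the names `abs_fit`, `rel_fit` and `comparison` are only chosen for this illustration.

```python
# Coefficients copied from the two curve_fit outputs quoted in the article.
abs_fit = (-60.37910437, 5.09798716, 3.03566267)   # minimized absolute errors
rel_fit = (12.98237958, 1.9705695, 3.05332744)     # minimized relative errors


def model(n, c0, c1, c2):
    """Quadratic run-time model T(n) in cycles."""
    return c0 + c1 * n + c2 * n * n


# words -> measured cycles, a few rows taken from the table
measured = {1: 18.0, 9: 281.0, 17: 915.0, 153: 71778.0}

comparison = {}
for n, t in measured.items():
    rel_a = abs(1 - t / model(n, *abs_fit))   # relative error, first model
    rel_r = abs(1 - t / model(n, *rel_fit))   # relative error, log-fitted model
    comparison[n] = (rel_a, rel_r)
    print(f"n={n:3d}  abs-fit: {rel_a * 100:9.4f} %   rel-fit: {rel_r * 100:7.4f} %")
```

For the smallest \(n\), the first model is off by a large factor while the log-fitted model stays within about a percent. At \(n = 153\) both are very close to the measurement.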

There is a nicely typeset PDF of this article available here.