I have just released PtCut 3.5.0. You can download it here. Full PtCut page here. I know, I tend to gossip. Sorry.

# Tag Archives: Python

# Finally! PPLpy for Windows (Python 3.7)!

Jonas Weinz has finally managed to compiled a version of PPLpy (together with gmpy2) that works under Windows and can be installed without compiling the whole lot. Many thanks again! đź™‚ This version works for Python 3.7 only.

His compiled binaries can be found on his Github page, together with his recipe how to compile it yourself.

I mirror the files here as well:

# PtCut v3.0.0 released!

I am happy to announce the latest release of PtCut, my software to calculate Tropical Prevarieties and Tropical Equilibria.

The news for version 3.0.0 is that it now supports pplpy instead of SageMath. That makes it much smaller, faster to start, slightly faster in execution and allows Python 3.x to be used. If you work natively on Linux, you can start it easily from the commandline.

Jonas Weinz has produced a first set of pplpy wheels for 64-bit Linux and Python 3.5 & 3.6, see here.

Share and enjoy!

# First version of Python PPLpy wheel available

Jonas Weinz has produced the first version of Python wheels of the **pplpy** library. Great work Jonas, thanks!

PPLpy, by Vincent Delecroix, is a Python wrapper for PPL, the Parma Polyhedral Library. It lets you create and intersect high-dimensional polyhedra. This is what I need for tropical geometry and especially for PtCut, my program to find tropical equilibria and prevarieties.

To use pplpy, you usually need to compile PPL and some other libraries from their C sources. This is long and cumbersome, so Jonas made the effort to write build scripts for that and provide Python wheels. A *wheel* is a pre-compiled library that you can just install without compiling anything.

On Jonas’ github page you can find three wheels that should work any x64 Linux and for Python 3.5 and 3.6. You need all three wheels (gmpy2, pplpy and pplpy_dependencies). And yes, you need exactly **this** gmpy2, even if you already have another version of gmpy2 installed. Furthermore, you need to install cython and cysignals as well.

To install, follow the instructions on Jonas’ page. Jonas didn’t mention this, but you might need to set their location (likely /usr/local/lib) in LD_LIBRARY_PATH.

I mirror Jonas files here as well:

# Simple Top-Down Parsing in Python 3.x

I found this nice top-down parser on Fredrik Lundh’s site http://effbot.org/ (thanks, Fredrik!), but unfortunately, it was in Python 2.x.Â To make it run under Python 3.x, you only need some small changes, but *where* took me some time…

So here is the result for everyone to download: topdownparse-py3k.zip.

# Curve-fitting With Minimized Relative Error

### The Problem

I wrote a C++ function to multiply two large positive integers of the same length, say \(n\) 64-bit words, with the grade-school method. Let’s call that function `omul_n()`. Then, I wrote extensive benchmarking to assess the speed of my efforts. The resulting run-times for the multiplication of two numbers with \(n\) words look like this:

Words | Cycles |
---|---|

1 | 18 |

9 | 281 |

17 | 915 |

25 | 1959 |

33 | 3421 |

41 | 5207 |

49 | 7392 |

57 | 10093 |

65 | 13000 |

73 | 16397 |

81 | 20224 |

89 | 24326 |

97 | 28800 |

105 | 33941 |

113 | 39764 |

121 | 45487 |

129 | 51212 |

137 | 57453 |

145 | 64142 |

153 | 71778 |

Now I wanted to find a closed function to most accurately describe the run-time ofÂ `omul_n()` We know that to multiply two numbers of \(n\) digits each, we need to do \(n^2\) digit-multiplications. So, most likely, the desired function will look something like $$Â T(n) = c_0 + c_1 n + c_2 n^2.Â $$

The only question is: what values to use for \(c_0\), \(c_1\) and \(c_2\)? I like linear regression, but it only works for linear relationships, like \(T(n) = c_0 + c_1 n\). We cannot use that here.

### The First Solution

The solution to my question is curve-fitting. I used Python functions to do so, namely `scipy.optimize.curve_fit` from the SciPy package (a good starter article that inspired my use of curve-fittings is here.)

The program is really simple. You input your data plus the describing function (like \(T(n)\) above) into the curve-fitting function and out pop the coefficients \(c_i\) that yield the \(T(n)\) with the least squared error.

The Python script:

omul_str = open("omul-speed.txt", "r").read() # read measured values o = [float(i) for i in omul_str.split()] # make one big list os = o[0::2] # slice out first column ot = o[1::2] # slice out second column import numpy as np # imports from scipy.optimize import curve_fit # the magic function xdata = np.array(os) # convert lists to np.array ydata = np.array(ot) def func(x, c0, c1, c2): # the modeled function return c0 + c1*x + c2*x*x popt, pcov = curve_fit(func, xdata, ydata) # and fit it! print(popt) # print optimized parameters

If you’re not used to NumPy, `array` features an unfamiliar usage:

Python 3.4.1 |Anaconda 2.1.0 (64-bit)| ... Type "help", "copyright", "credits" or "license" for more information. >>> import numpy as np >>> a = np.array([1,2,3]) >>> a array([1, 2, 3]) >>> import math >>> math.log(a) Traceback (most recent call last): File "", line 1, in TypeError: only length-1 arrays can be converted to Python scalars >>> np.log(a) array([ 0. , 0.69314718, 1.09861229])

NumPy functions that are applied to an `array` again return an array with values of said function applied to every `array` element. That comes in pretty handy when handling larger sets of data.

Back to our curve-fitting. The above listed script generates this output:

[-60.37910437 5.09798716 3.03566267]

That means that the best fitting function is about

$$ T_\text{abs}(n) = -60 + 5.1 \cdot n + 3.04 \cdot n^2. $$

Pretty neat, eh? Plotted it looks like this. The red line is not the connection of the dots, but our model:

### My Discontent

So far, so very cool. An issue arises when we look at the relative errors between data points and model. That is, \(|T(n) / T_n|\), where \(T(n)\) is our model and \(T_n\) is the measured run-time. In contrast, the above curve-fitting minimized the absolute error \(|T(n) – T_n|\). (Actually, it minimized the *squared* absolute error, but I let that slide here and focus on absolute vs. relative error.)

Some additional lines of Python code added to the end of our script will print the relative errors and their average:

relerr = abs(1 - ydata / func(xdata, *popt)) # relative errors np.set_printoptions(suppress=True) # switch off sci. notation print(relerr * 100) avgrel = sum(relerr) / len(ydata) * 100 # calc average print("avgrel:", avgrel)

Which does produce this extra output:

[ 134.45275796 21.43922899 1.26238363 0.27284922 0.21410507 0.84902538 1.15067892 0.00073479 0.73808731 0.55686398 0.22467506 0.46166589 0.6782697 0.00615863 1.23715341 1.07859592 0.19226999 0.28013432 0.56064524 0.00479269] avgrel: 8.28305380503

So, we have an average relative error of 8 %, which seems rather high for me. Obviously, the relative error is extremely high with the two starting values: 134 % and 21 %. Can we improve that? That is, can we model so that the average and maximum relative error is lower?

### The Improved Solution

Least squares optimization with minimized absolute error is used very widely, but unfortunately, there is no easy way to switch the functions performing this to minimize the relative error. But I found this forum post that was very helpful. It’s on some other math software system, but we can borrow the idea: *“Usually the best way to do relative error is to log your model. This changes a proportional error structure into an additive one, which is exactly what you want”* (with “*log*” as in

logarithm).

Luckily, that is very easy to accomplish in Python. This is a changed version of the earlier script:

omul_str = open("omul-speed.txt", "r").read() # read measured values o = [float(i) for i in omul_str.split()] # make one big list os = o[0::2] # slice out first column ot = o[1::2] # slice out second column import numpy as np # imports from scipy.optimize import curve_fit # the magic function xdata = np.array(os) # convert lists to np.array ydata = np.array(ot) def func(x, c0, c1, c2): # the modeled function return c0 + c1*x + c2*x*x def logfunc(x, c0, c1, c2): # ... and the log of it return np.log(func(x, c0, c1, c2)) popt, pcov = curve_fit(logfunc, xdata, np.log(ydata)) # and fit it! print(popt) # print optimized parameters relerr = abs(1 - ydata / func(xdata, *popt)) # relative errors np.set_printoptions(suppress=True) # switch off sci. notation print(relerr * 100) avgrel = sum(relerr) / len(ydata) * 100 # calc average print("avgrel:", avgrel)

And now the output looks like this:

[ 12.98237958 1.9705695 3.05332744] [ 0.03485745 1.06567529 1.49572472 0.58745607 0.52644119 0.37155771 0.65289914 0.47219135 0.31728127 0.18879885 0.09165894 0.19598836 0.45928895 0.171688 1.37775523 1.18298158 0.26311364 0.23936649 0.54721279 0.01646704] avgrel: 0.512920201737

Awesome! The average relative error is down to 0.5 % with a maximum of 1.5 %.

The linear plot looks largely the same, because the absolute differences are too small to see. But if we switch to a double-logarithmic plot, we can see them clearly:

Clearly, the smaller the values are, the larger the difference is between the red graph (minimized absolute errors model) and the data points, whereas the green graph (minimized relative errors) is much closer to the data points for small \(n\).

There is a nicely typeset PDF of this article available here.