Preparatory Course “Formale Methoden der Informatik”, Winter Semester 2016/17

Once again in the 2016/17 winter semester, I am teaching the preparatory course “Formale Methoden der Informatik” (Formal Methods of Computer Science) at the University of Bonn. The course is aimed at students who are starting their computer science studies this winter semester.

The course is voluntary and ungraded, but registration is required here.

The course runs from Monday, 26 September 2016 to Friday, 7 October 2016, every day from 10 am to 12 noon, in Hörsaal 2, Institut für Informatik, Römerstr. 164, Bonn. On 3 October (a public holiday) there is no class, and no make-up session will be offered.

In addition, exercise sessions take place every day from 12 to 2 pm. We advise you to attend them, even if you think you already have all of this down pat. 😉 In the exercise sessions you have the opportunity to ask questions, problems relating to the lecture are worked through, and you get to know your new fellow students. There is no homework.

Lecture notes are available here (version 3.2 of 7 October 2016). Since there will be updates during the course, it is better to print out only parts at a time, if you print them at all. 🙂 I will always offer the latest version of the notes here. I am very keen to correct errors in the notes: please talk to me during or after the lecture, or contact me online.

Getting a kernel mode driver signed for Windows 10

In this article I want to describe my experiences with the new (as of August 2016) driver signing requirements of Windows 10.

Since the Anniversary Update of Windows 10 (version 1607, also called Redstone 1), Microsoft requires new signatures on your kernel mode drivers under certain circumstances. This is called “attestation signing”. In other words, drivers signed the “old” way (called “cross-certification”) will only run on Windows 10 if certain conditions are met, and I guess these “loopholes” will get smaller over the years.

The “new” way brings certain big changes:

  • The “old” way meant that you, as the software developer, use your certificate to sign your software. Since your certificate is cross-signed by a certification authority (CA) that is (ultimately) trusted by Microsoft, this ensures that no one can tamper with the file and that Microsoft trusts that you are who you claim to be.
  • The “new” way means that you submit your software to Microsoft and they add their signature, provided all the requirements are met.
  • For that, you need an EV code signing certificate; there is no way around it. This basically means it’s more expensive and it comes on a USB hardware token (so you cannot copy it).

To give an idea of what has to be done, I will describe what I did. My company, cFos Software GmbH, needs cFosSpeed, our traffic shaping driver, signed for Windows 10.

Getting an EV certificate

I got the certificate from Globalsign Germany, as I always have, only this time I ordered the “extended validation” (EV) certificate. This increased the price from 429€ (about $480 as of 2016-09-02) for 3 years to 709€ (about $793)! (Note to self: in my next life, run a certification authority!)

The process is kinda smooth: they require some (updated) documents (an HRB printout, i.e. an excerpt from the German commercial register), check that the provided contact data (email address and phone number) is correct, and have you sign the agreement. They were very helpful, and the whole process took some three days, plus another two until I got the USB token by mail. If this is the first time you order a certificate from a CA, it might take longer, since they have to check more data.

You have to download the certificate from their website directly onto that USB token. You can do that only once, so there is no easy way to share the same certificate between several people (two teams in two offices, for example). You cannot simply copy the USB token, of course.

The actual signing works as always, only now a password box pops up every time you sign. Luckily, copy-and-paste works for the password entry.

On the plus side, Microsoft SmartScreen instantly trusts files that were signed with an EV certificate. So that might ease the roll-out of new software a bit.

Here is what Microsoft says about getting a code signing certificate.

Getting the Microsoft Signature

Prerequisites

It’s harder than I imagined (of course). Firstly, you need a Microsoft sysdev account. I don’t recall how I created that, but I gather it wasn’t that hard.

Secondly, you need to download a file (called winqual.exe) from Microsoft (under Administration / manage certificates), sign it with your EV certificate and upload it again. This merely proves that your company owns an EV certificate. The same certificate has to be used later to sign your submission.

Thirdly, you need to “sign” some legal documents. “Signing” luckily only means typing your name and the date, and it’s almost instantly countersigned. They don’t say beforehand which documents need signing, so I guessed we might need the “Windows Compatibility Program and Driver Quality Attestment Testing Agreement” and signed it. But that wasn’t enough, so I signed the “Windows Certification Program Testing Agreement v1.0” as well, which seemed to be almost the same. After that, I was able to proceed.

Submitting

Now I was able to upload our driver for signing. There are two ways of submitting:

  • Use the Hardware Lab Kit (HLK) to test your submission against Windows 10 and the Hardware Certification Kit (HCK) to test against earlier versions of Windows, then merge the results and upload them to Microsoft.
  • Cross-sign the drivers yourself and upload them to Microsoft for attestation signing. That is what I did. Microsoft mentions that drivers signed this way won’t run on Windows Server 2016 Technical Preview.

Firstly, I sign the files myself in the usual way using signtool.exe:

signtool sign /a /ac GlobalSign_Root_CA.crt /s my /n "Company name" /fd sha256 /td sha256 /tr "http://timestamp.globalsign.com/?signature=sha2" /du "http://www.example.com" cabfilename.cab

You need the “sha256” options, since SHA-1 has been deprecated since 2016-01-01 and will work less and less in the future. Make sure that you use the same certificate you used earlier to sign winqual.exe.
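
To double-check that the signature and the timestamp really were applied, a verification along these lines should work (/pa selects the default Authenticode verification policy, /v prints details):

signtool verify /pa /v cabfilename.cab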

Secondly, you need to put the .SYS driver file and its .INF file into a subdirectory and pack that into one .CAB archive. Here is a description from Microsoft on how to actually pack the files and sign them yourself. Note that I used CABARC instead of MAKECAB to pack the files:

rem create a subdirectory and copy the driver files into it
md driver
copy driver.sys driver.inf driver
rem pack a new archive (N), preserving path names (-p) and recursing (-r)
cabarc -p -r N cabfilename.cab driver\*

I chose to include our .CAT file as well, even though the submission process will create one for you. I don’t know yet if that was a good or bad choice; both ways seem to work.

Thirdly, sign that .CAB file like you signed the .SYS file (see above).

Fourthly, I uploaded that signed .CAB archive to sysdev.microsoft.com. You will have to choose which versions of Windows 10 your driver qualifies for. I checked all versions, but only one architecture (x64 or x86). You can only have one architecture per submission!

… and waiting

The submission takes a while and goes through a ten step process, namely:

  1. (no idea)
  2. Transferring CAB File
  3. Scanning CAB file for Viruses
  4. Decompressing CAB File
  5. Validating HCK/HLK Submission Package
  6. Creating Catalog Files
  7. Archiving Files
  8. Parsing Driver Data
  9. Signing Catalog Files
  10. Transferring Catalog File to Server

OSR describes in this post that the whole process took them 30 minutes. But the first time I submitted, step 5 took six days and hadn’t completed! Then I wrote an email to sysdev@microsoft.com, only to find out that I had built my .CAB file the wrong way (I had put all files into the root folder instead of a “driver” sub-folder) and that had apparently hung the process. (Thanks again for a timely and succinct answer, Jack!) So if you have to wait for a long period of time, it’s probably best to contact Microsoft by mail and solve the issue.

Fixing that problem and re-submitting the driver package got me an approved driver within 10 minutes! 🙂 🙂

Victory!

After that, you can download the signed .SYS and .CAT files packed into a .ZIP file. When you install that driver, the box asking “Do you trust Company X?” no longer pops up. So, that’s nice.

What’s less nice is that drivers signed like this, even though they retain our old EV signature, don’t load under Windows 8.1, and under Windows 7 they load as if they had no signature at all (and thus look kinda bogus, since they show no reference to the publisher). So far, I have found no way to have a single driver file that loads under all operating systems from Windows 7 on.

Resources


Preparatory Course “Formale Methoden der Informatik”, Winter Semester 2015/16

This winter semester 2015/16 I am again teaching the preparatory course “Formale Methoden der Informatik” at the University of Bonn. The course is intended for students who are starting their computer science studies this winter semester.

The course is voluntary and ungraded, but registration is required here.

The course runs from Monday, 28 September 2015 to Friday, 9 October 2015, every day from 10 am to 12 noon (except on 2 and 5 October: 1 to 3 pm), in Hörsaal 2, Institut für Informatik, Römerstr. 164, Bonn.

In addition, exercise sessions take place every day from 1 to 3 pm (except on 2 and 5 October: 3 to 5 pm). We advise you to attend them, even if you think you already have all of this down pat. 😉 In the exercise sessions you have the opportunity to ask questions, and problems relating to the lecture are worked through. There is no homework.

I have prepared lecture notes, which are available here (version 2.0.6). Since there will be updates during the course, it is better to print out only parts at a time, if you print them at all. 🙂 I will always offer the latest version of the notes here.

Update: the course is over by now, but the lecture notes are still available here. Comments remain welcome.

Slides and Links to my ISSAC ’15 Talk

I gave my talk at ISSAC 2015 in Bath, UK on Wednesday, July 8th, about “Implementation of the DKSS Algorithm for Multiplication of Large Numbers”. My paper is now available from the ACM digital library (or here locally) and the slides are available on the ISSAC website (or here locally).

To the ISSAC organizers and everyone present: the conference was awesome, thank you very much! I had a great time.

BibTeX for the paper:

@Conference{Lueders2015,
  Title                    = {Implementation of the DKSS Algorithm for Multiplication of Large Numbers},
  Author                   = {Christoph L{\"u}ders},
  Booktitle                = {ISSAC 2015 --- Proceedings of the 2015 ACM on International Symposium on Symbolic and Algebraic Computation},
  Year                     = {2015},
  Pages                    = {267--274},
  Abstract                 = {The Sch{\"o}nhage-Strassen algorithm (SSA) is the de-facto standard for multiplication of large integers. For N-bit numbers it has a time bound of $O(N \log N \log \log N)$. De, Kurur, Saha and Saptharishi (DKSS) presented an asymptotically faster algorithm with a better time bound of $N \log N 2^{O(\log^* N)}$. For this paper, a simplified DKSS multiplication was implemented. Assuming a sensible upper limit on the input size, some required constants could be precomputed. This allowed to simplify the algorithm to save some complexity and run-time. Still, run-time is about 30 times larger than SSA, while memory requirements are about 2.3 times higher than SSA. A possible crossover point is estimated to be out of reach even if we utilized the whole universe for computer memory.},
  Doi                      = {10.1145/2755996.2756643}
}

Implementation of the DKSS Algorithm for Multiplication of Large Numbers — ISSAC’15

My paper “Implementation of the DKSS Algorithm for Multiplication of Large Numbers” was accepted for the ISSAC ’15 conference, held on 6-9 July 2015 at the University of Bath, UK!

Abstract: The Schönhage-Strassen algorithm (SSA) is the de-facto standard for multiplication of large integers. For \(N\)-bit numbers it has a time bound of \(O(N \cdot \log N \cdot \log \log N)\). De, Kurur, Saha and Saptharishi (DKSS) presented an asymptotically faster algorithm with a better time bound of \(N \cdot \log N \cdot 2^{O(\log^* N)}\). For this paper, a simplified DKSS multiplication was implemented. Assuming a sensible upper limit on the input size, some required constants could be precomputed. This allowed to simplify the algorithm to save some complexity and run-time. Still, run-time is about 30 times larger than SSA, while memory requirements are about 2.3 times higher than SSA. A possible crossover point is estimated to be out of reach even if we utilized the whole universe for computer memory.

This is an improved version of what I wrote about in my diploma thesis.

My source code that was used for the tests is available here and is licensed under the LGPL.

Fast Multiplication of Large Integers: Implementation and Analysis of the DKSS Algorithm — Diploma Thesis

“Fast Multiplication of Large Integers: Implementation and Analysis of the DKSS Algorithm”: that is my diploma thesis. I just uploaded it to arXiv: http://arxiv.org/abs/1503.04955

Abstract: The Schönhage-Strassen algorithm (SSA) is the de-facto standard for multiplication of large integers. For N-bit numbers it has a time bound of \(O(N \cdot \log N \cdot \log \log N)\). De, Kurur, Saha and Saptharishi (DKSS) presented an asymptotically faster algorithm with a better time bound of \(N \cdot \log N \cdot 2^{O(\log^* N)}\). In this diploma thesis, results of an implementation of DKSS multiplication are presented: run-time is about 30 times larger than SSA, while memory requirements are about 3.75 times higher than SSA. A possible crossover point is estimated to be out of reach even if we utilized the whole universe for computer memory.

It contains not only what the title promises, but also a long presentation of my own endeavors regarding fast multiplication: from ordinary multiplication through Karatsuba and Toom-Cook 3-way to Schönhage-Strassen, with theory and some code examples.

I’d be happy about any comments.

Curve-fitting With Minimized Relative Error

The Problem

I wrote a C++ function to multiply two large positive integers of the same length, say \(n\) 64-bit words, with the grade-school method. Let’s call that function omul_n(). Then I wrote an extensive benchmark to assess the speed of my efforts. The resulting run-times for the multiplication of two numbers with \(n\) words look like this:

Words   Cycles
    1       18
    9      281
   17      915
   25     1959
   33     3421
   41     5207
   49     7392
   57    10093
   65    13000
   73    16397
   81    20224
   89    24326
   97    28800
  105    33941
  113    39764
  121    45487
  129    51212
  137    57453
  145    64142
  153    71778

[Figure: run-time of omul_n() in cycles plotted against the number of words]
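
For context, here is a minimal Python sketch of the grade-school method (an illustration only, not the actual C++ omul_n(); the function name and the little-endian word representation are made up for this example). It shows where the quadratic growth comes from: every word of one factor is multiplied by every word of the other.

def omul_schoolbook(a, b, base=2**64):
    # multiply two equal-length numbers given as little-endian word lists
    n = len(a)
    result = [0] * (2 * n)
    for i in range(n):                  # n * n word multiplications in total
        carry = 0
        for j in range(n):
            t = result[i + j] + a[i] * b[j] + carry
            result[i + j] = t % base    # low word stays in place
            carry = t // base           # high word is carried on
        result[i + n] = carry           # top word of this partial product
    return result

For example, omul_schoolbook([0, 1], [0, 1]) returns [0, 0, 1, 0], i.e. base · base = base².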

Now I wanted to find a closed-form function that most accurately describes the run-time of omul_n(). We know that to multiply two numbers of \(n\) digits each, we need to do \(n^2\) digit-multiplications. So, most likely, the desired function will look something like $$  T(n) = c_0 + c_1 n + c_2 n^2.  $$

The only question is: what values to use for \(c_0\), \(c_1\) and \(c_2\)? I like linear regression, but it only fits straight-line relationships like \(T(n) = c_0 + c_1 n\), so we cannot use it directly here.

The First Solution

The solution to my question is curve fitting. I used Python functions to do so, namely scipy.optimize.curve_fit from the SciPy package (a good starter article that inspired my use of curve fitting is here).

The program is really simple. You feed your data plus the describing function (like \(T(n)\) above) into the curve-fitting function, and out pop the coefficients \(c_i\) that yield the \(T(n)\) with the least squared error.

The Python script:

omul_str = open("omul-speed.txt", "r").read() # read measured values
o = [float(i) for i in omul_str.split()] # make one big list
os = o[0::2]                             # slice out first column
ot = o[1::2]                             # slice out second column

import numpy as np                       # imports
from scipy.optimize import curve_fit     # the magic function

xdata = np.array(os)                     # convert lists to np.array
ydata = np.array(ot)
def func(x, c0, c1, c2):                 # the modeled function
   return c0 + c1*x + c2*x*x

popt, pcov = curve_fit(func, xdata, ydata) # and fit it!
print(popt)                              # print optimized parameters

If you’re not used to NumPy, arrays feature an unfamiliar behavior:

Python 3.4.1 |Anaconda 2.1.0 (64-bit)| ...
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> a = np.array([1,2,3])
>>> a
array([1, 2, 3])
>>> import math
>>> math.log(a)
Traceback (most recent call last):
  File "", line 1, in 
TypeError: only length-1 arrays can be converted to Python scalars
>>> np.log(a)
array([ 0.        ,  0.69314718,  1.09861229])

NumPy functions applied to an array return an array with the values of said function applied to every array element. That comes in pretty handy when handling larger sets of data.

Back to our curve-fitting. The above listed script generates this output:

[-60.37910437   5.09798716   3.03566267]

That means that the best fitting function is about
$$ T_\text{abs}(n) = -60 + 5.1 \cdot n + 3.04 \cdot n^2. $$

Pretty neat, eh? Plotted it looks like this. The red line is not the connection of the dots, but our model:

[Figure: data points and the fitted model \(T_\text{abs}(n)\) (red line)]

My Discontent

So far, so very cool. An issue arises when we look at the relative errors between data points and model, that is, \(|1 - T_n / T(n)|\), where \(T(n)\) is our model and \(T_n\) is the measured run-time. In contrast, the above curve-fitting minimized the absolute error \(|T(n) - T_n|\). (Actually, it minimized the squared absolute error, but I let that slide here and focus on absolute vs. relative error.)

Some additional lines of Python code added to the end of our script will print the relative errors and their average:

relerr = abs(1 - ydata / func(xdata, *popt))    # relative errors
np.set_printoptions(suppress=True)              # switch off sci. notation
print(relerr * 100)
avgrel = sum(relerr) / len(ydata) * 100         # calc average
print("avgrel:", avgrel)

Which does produce this extra output:

[ 134.45275796   21.43922899    1.26238363    0.27284922    0.21410507
    0.84902538    1.15067892    0.00073479    0.73808731    0.55686398
    0.22467506    0.46166589    0.6782697     0.00615863    1.23715341
    1.07859592    0.19226999    0.28013432    0.56064524    0.00479269]
avgrel: 8.28305380503
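
To see where the extreme first value comes from, check it by hand: for \(n = 1\) the fitted model predicts
$$ T_\text{abs}(1) \approx -60.38 + 5.10 + 3.04 \approx -52.2 \text{ cycles}, $$
so the relative error is \(|1 - 18 / (-52.2)| \approx 1.34\), which is the 134 % in the first entry. A model fitted for small absolute errors can be wildly off in relative terms where the measured values are tiny.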

So, we have an average relative error of 8 %, which seems rather high to me. Obviously, the relative error is extremely high for the first two values: 134 % and 21 %. Can we improve on that? That is, can we build a model so that the average and maximum relative errors are lower?

The Improved Solution

Least squares optimization with minimized absolute error is used very widely, but unfortunately, there is no easy way to switch the functions that perform it to minimizing the relative error. But I found this forum post, which was very helpful. It’s about some other math software system, but we can borrow the idea: “Usually the best way to do relative error is to log your model. This changes a proportional error structure into an additive one, which is exactly what you want” (with “log” as in logarithm).
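
A one-line calculation shows why this works: if a measurement deviates from the model by the relative error \(\varepsilon_n\), i.e. \(T_n = T(n)(1 + \varepsilon_n)\), then
$$ \log T_n - \log T(n) = \log(1 + \varepsilon_n) \approx \varepsilon_n \quad \text{for small } \varepsilon_n, $$
so least squares on the logged model minimizes (approximately) the sum of the squared relative errors.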

Luckily, that is very easy to accomplish in Python. This is a changed version of the earlier script:

omul_str = open("omul-speed.txt", "r").read()   # read measured values
o = [float(i) for i in omul_str.split()]        # make one big list
os = o[0::2]                                    # slice out first column
ot = o[1::2]                                    # slice out second column

import numpy as np                              # imports
from scipy.optimize import curve_fit            # the magic function

xdata = np.array(os)                            # convert lists to np.array
ydata = np.array(ot)
def func(x, c0, c1, c2):                        # the modeled function
   return c0 + c1*x + c2*x*x
def logfunc(x, c0, c1, c2):                     # ... and the log of it
   return np.log(func(x, c0, c1, c2))

popt, pcov = curve_fit(logfunc, xdata, np.log(ydata))  # and fit it!
print(popt)                                     # print optimized parameters

relerr = abs(1 - ydata / func(xdata, *popt))    # relative errors
np.set_printoptions(suppress=True)              # switch off sci. notation
print(relerr * 100)
avgrel = sum(relerr) / len(ydata) * 100         # calc average
print("avgrel:", avgrel)

And now the output looks like this:

[ 12.98237958   1.9705695    3.05332744]
[ 0.03485745  1.06567529  1.49572472  0.58745607  0.52644119  0.37155771
  0.65289914  0.47219135  0.31728127  0.18879885  0.09165894  0.19598836
  0.45928895  0.171688    1.37775523  1.18298158  0.26311364  0.23936649
  0.54721279  0.01646704]
avgrel: 0.512920201737

Awesome! The average relative error is down to 0.5 % with a maximum of 1.5 %.
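
As an aside, curve_fit can get much the same effect without logging the model: its sigma parameter weights the residuals, and passing the measured values themselves turns the minimized quantity into a sum of squared relative errors. A minimal sketch, reusing func, xdata, ydata and the imports from the script above:

# residual i is divided by ydata[i], so curve_fit minimizes
#   sum(((ydata - func(xdata, *p)) / ydata)**2)
popt_w, pcov_w = curve_fit(func, xdata, ydata, sigma=ydata)
print(popt_w)    # coefficients, similar in spirit to the logged-model fit

(sigma only sets relative weights for the fit here; it does not need to contain true measurement uncertainties.)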

The linear plot looks largely the same, because the absolute differences are too small to see. But if we switch to a double-logarithmic plot, we can see them clearly:

[Figure: double-logarithmic plot of the data points, the minimized-absolute-error model (red) and the minimized-relative-error model (green)]

Clearly, the smaller the values are, the larger the difference is between the red graph (minimized absolute errors model) and the data points, whereas the green graph (minimized relative errors) is much closer to the data points for small \(n\).

There is a nicely typeset PDF of this article available here.