Sniff is a "Scratch-like" programming language that's designed to help Scratchers move gently from Scratch to more conventional languages. They can start writing programs, without having to learn a new language because Sniff is based on Scratch. They learn a little more about variables, compiling, syntax errors (!), and they can have fun controlling real hardware while they're doing it.

Saturday, 4 April 2015

Beyond Newton-Raphson

Normally I blog about fun programming exercises you can do using Sniff. Often this involves robots, flashing lights, and physics experiments. They're usually appropriate for kids somewhere in the age range 10-16. Recently however we've got a little off track. It started innocently enough with a simple Monte Carlo simulation, but then moved on to root finding. It got a bit out of hand with Taylor approximations, and then Newton-Raphson. If you're looking for a fun exercise for your KS4 physics class, check the older posts on the right!

In the meantime here's one last maths-heavy post (promise!).

Newton's method for root finding (or strictly the secant method, as we're approximating the derivative numerically rather than analytically) works pretty well. It fits a straight line to a function, then solves for where the straight line crosses zero. Then we repeat, using the straight-line solution as a starting point to fit a new straight line.
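In Python rather than Sniff (and with a stand-in cubic of my own choosing, not the "random cubic" from the earlier posts), one step of that idea looks like this:

```python
def newton_step(f, x, dx=0.001):
    """One step of Newton's method, with the gradient approximated
    by a forward difference (which makes it really the secant method)."""
    fx = f(x)
    m = (f(x + dx) - fx) / dx   # slope of the fitted straight line
    return x - fx / m           # where that straight line crosses zero

# Stand-in cubic with roots at 2, -1 and -3 (my example, not the post's)
f = lambda x: (x - 2) * (x + 1) * (x + 3)

x = 5.0
for _ in range(20):
    x = newton_step(f, x)
# x is now very close to the root at 2
```

Each step solves the straight line exactly; repeating the step is what chases the curve.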

We're about to leave behind anything that's taught in schools - so let's abandon any pretence this isn't about calculus. I'll also abandon any claim that I actually know this stuff: my maths is pretty good, but I don't have a maths degree, so there may be some inaccuracies in here! This is stuff I picked up while fact-checking the previous post!

Newton-Raphson takes the derivative of a function to find its gradient at a point, and fits a straight line. But what if we could do better? The function isn't a straight line - it's a curve, which means its gradient is changing, so we only get an approximate answer. What if we could include information about not just the gradient, but how the gradient is changing? Well, if we plotted a graph of the gradient, what would be its gradient? In other words, the second derivative!

A straight line has a constant gradient, so f'(x)=m, which is a constant. f''(x) is how the gradient changes, and (for a straight line) it doesn't, so f''(x)=0. But if we had a quadratic (say x*x), then the gradient does change - it's a curve! In fact in this case f'(x)=2x. That of course is a straight line with gradient 2, so f''(x)=2. Newton assumes that f''(x)=0, which is why it only approximates the curve rather than getting it right.
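You can check those claims numerically with the same finite-difference trick we use later in the post (Python sketch, not Sniff):

```python
dx = 0.001
f = lambda x: x * x          # the quadratic from the text

# gradient of f, and then the gradient of that gradient
fprime  = lambda x: (f(x + dx) - f(x)) / dx
fprime2 = lambda x: (fprime(x + dx) - fprime(x)) / dx

print(fprime(3))    # close to 2*x = 6
print(fprime2(3))   # close to the constant 2
```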

It turns out Newton's method is only the most basic approach to this problem, and there are a whole bunch of more advanced solutions which incorporate higher-order derivatives and fit ever more complex curves. After Newton, the next simplest method is Halley's method. It's just like Newton-Raphson except that we use:

   x' = x - (2*f(x)*f'(x)) / (2*f'(x)*f'(x) - f(x)*f''(x))

That's a bit more complex, but most of it is just an equation that we need to type in, in place of the Newton equation. As you can see it has f''(x) in the denominator, so it's accounting for the curvature.
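Halley's update is x' = x - 2f(x)f'(x)/(2f'(x)² - f(x)f''(x)). Here's a quick Python check of that step using analytic derivatives (the cube-root example is mine, not the post's):

```python
def halley_step(x, f, fp, fpp):
    """One step of Halley's method; fp and fpp are f' and f''."""
    fx, m, c = f(x), fp(x), fpp(x)
    return x - (2 * fx * m) / (2 * m * m - fx * c)

# Example: find the cube root of 2, where the derivatives are easy to write down
f   = lambda x: x**3 - 2
fp  = lambda x: 3 * x**2
fpp = lambda x: 6 * x

x = 1.0
for _ in range(5):
    x = halley_step(x, f, fp, fpp)
# x is now very close to 2 ** (1/3)
```

With analytic derivatives there's no finite-difference noise; the rest of the post is about approximating f' and f'' when we only have f.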

We approximate f'(x) as:
   f'(x) = (f(x+dx)-f(x))/dx
so we can approximate f''(x) as:
   f''(x) = (f'(x+dx)-f'(x))/dx

However, using our equations so far, this would approximate f'(x+dx) using f(x+dx) and f(x+dx+dx), which is a bit far from the point we're actually interested in. But we could just as easily have used:
   f'(x) = (f(x)-f(x-dx))/dx
(this is called backward differencing, rather than forward differencing). We can use these two approximations - one backward and one forward - to get two derivatives close to x, and use those to find f''(x):
   f''(x) = ((f(x+dx)-f(x))/dx - (f(x)-f(x-dx))/dx)/dx
It's the forward difference, minus the backward difference, divided by the distance between them. Rearranging this gives:

   f''(x) = (f(x+dx) - 2*f(x) + f(x-dx))/(dx*dx)

Also while we're at it we can get a better approximation for f'(x) by combining both the forward and backward differencing equations to get:
   f'(x) = (f(x+dx)-f(x-dx))/(2*dx)
In fact we can drop this into Newton's method for slightly better results.
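Translated out of Sniff into Python (with x**3 as a stand-in function I've chosen), you can see how much the central difference helps:

```python
f  = lambda x: x**3
dx = 0.001
x  = 1.0                     # true f'(1) = 3, true f''(1) = 6

forward = (f(x + dx) - f(x)) / dx                    # forward difference
central = (f(x + dx) - f(x - dx)) / (2 * dx)         # central difference
second  = (f(x + dx) - 2 * f(x) + f(x - dx)) / (dx * dx)  # second derivative

print(abs(forward - 3))   # error around 3e-3
print(abs(central - 3))   # error around 1e-6 - much closer
print(abs(second - 6))    # second derivative comes out almost exactly
```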

Now that we can find f''(x), and a better approximation to f'(x), we can drop these into the previous code, and off we go:

when start
.set x to 45
.set dx to 0.005
.repeat 10
..broadcast calcF and wait
..set fx to f
..say join join [x] ":" [fx]
..change x by dx
..broadcast calcF and wait
..set fpdx to f
..change x by -dx
..change x by -dx
..broadcast calcF and wait
..set fmdx to f
..change x by dx
..set m to (fpdx-fmdx)/(dx+dx)
..set d2 to (fpdx+fmdx-(fx+fx))/(dx*dx)
..set x to x-((2*fx*m)/(2*m*m-fx*d2))
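For anyone without a Sniff setup, here's the same loop line-for-line in Python. The cubic `f` is a stand-in with roots at 2, -1 and -3, since the post's "random cubic" isn't reproduced here:

```python
def halley(f, x, dx=0.005, steps=10):
    """Halley's method with finite-difference derivatives,
    mirroring the Sniff script above."""
    for _ in range(steps):
        fx   = f(x)
        fpdx = f(x + dx)                             # forward sample
        fmdx = f(x - dx)                             # backward sample
        m  = (fpdx - fmdx) / (dx + dx)               # central first derivative
        d2 = (fpdx + fmdx - (fx + fx)) / (dx * dx)   # central second derivative
        x  = x - (2 * fx * m) / (2 * m * m - fx * d2)
    return x

# Stand-in cubic (my example, not the post's)
f = lambda x: (x - 2) * (x + 1) * (x + 3)
print(halley(f, 45))    # converges to the root at 2
```

The `broadcast calcF and wait` dance in the Sniff version is just how Sniff evaluates `f` at the current `x`; in Python it collapses to a function call.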

Running that on our random cubic prints out:

Compare that to the results from Newton:

Including the information about the curvature gets us to the first answer a lot quicker! Using -1 and -2 as starting points to find the other roots produces:



And for Newton:


Both of these converge pretty quickly, so the results are a little inconclusive: for -2 Halley converges one step faster. For the root near -1 it looks like Newton gets there first... 4 steps rather than 5, but look more closely: on the first three iterations Halley is closer, and on the fourth it's actually printing the right answer, just with an error of 1e-7 rather than the 1e-8 it reaches when it converges on the final step.

Overall Halley definitely converges faster. However there are problems.

The first is numerical instability. In the Newton code I used a value of dx=0.001, and it worked great. In this code the second derivative requires dividing by dx squared, which would be 0.000001. That's small. In fact it didn't work - I had to increase dx to at least 0.005, otherwise the whole thing went crazy. Larger values of dx mean less accurate derivatives, and things start to go wrong.
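You can see the instability directly (Python sketch, with x**3 as a stand-in function): shrinking dx makes the second-derivative estimate worse, not better, because the numerator cancels down to almost nothing before being divided by a tiny dx squared:

```python
f = lambda x: x**3          # true f''(1) = 6

def d2(x, dx):
    # central second difference: the numerator is divided by dx squared
    return (f(x + dx) - 2 * f(x) + f(x - dx)) / (dx * dx)

print(abs(d2(1.0, 1e-3) - 6))   # small error: the estimate works
print(abs(d2(1.0, 1e-8) - 6))   # large error: dx*dx is ~1e-16, roundoff dominates
```

With dx=1e-8 the three samples of f agree to nearly all their significant digits, so the subtraction leaves mostly rounding noise, and dividing by 1e-16 amplifies it enormously.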

The second is that we're relying on more maths. Both Newton and Halley rely on the functions we're testing being "well behaved", but there's more to go wrong on that front for Halley than for Newton. There's also more code to go wrong - we're approximating more things, and that means more errors can creep in. In both cases a lot of things need to go right to produce versions of the code that work reliably (the code here is for experimenting with, and isn't designed to handle things going wrong), but again there's more to go wrong in the more complex code.

The real problem however is that Halley converges faster PER STEP. But look at the work we're doing per step: it evaluates the function three times rather than two, and then does a lot more maths on those values. Overall, although Halley uses fewer steps, it probably uses about the same, or more, CPU time. Computers are really good at doing lots of simple steps, so giving them the simpler code and letting them run it fast, with lots of iterations, is probably the better solution in the real world.

Even if this doesn't necessarily produce better results, it was still fun to try it, and there are a lot of good things to be learnt about implementing numerical analysis in code. We now return to our normal programming.
