
# Scientia potestas est. - Knowledge is power.

## Friday, 20 November 2015

### Prime Spirals in Python

I watched this really fascinating video on prime spirals recently and decided to see if I could write my own code to generate them. It's not terribly efficient but it works! Code is on github! :D
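The full code is in the repo, but the core idea fits in a few lines. Below is a minimal from-scratch sketch (not the exact code from my GitHub - the names and the text rendering are just for illustration): walk outwards from the centre of a grid in a square spiral, numbering the cells 1, 2, 3, ..., and mark the cells whose number is prime.

```python
def is_prime(n):
    # Simple trial division - slow, but fine for small grids.
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def ulam_spiral(size):
    """Return a size x size grid of booleans: True where the number at
    that spiral position is prime. Use an odd size so the spiral (legs
    of length 1, 1, 2, 2, 3, 3, ...) exactly fills the grid."""
    grid = [[False] * size for _ in range(size)]
    x = y = size // 2          # 1 sits at the centre
    dx, dy = 1, 0              # start by moving right
    n, leg_len = 1, 1
    while n <= size * size:
        for _ in range(2):     # two legs per leg length
            for _ in range(leg_len):
                if n > size * size:
                    return grid
                grid[y][x] = is_prime(n)
                x, y = x + dx, y + dy
                n += 1
            dx, dy = dy, -dx   # turn 90 degrees
        leg_len += 1
    return grid

# Print a small spiral: '#' for primes, '.' otherwise.
for row in ulam_spiral(21):
    print(''.join('#' if cell else '.' for cell in row))
```

Even at this small size, the diagonal streaks that make prime spirals so striking start to show up.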

### Machine Learning: Implementing Linear Regression and Gradient Descent in Python

It's been a while since I went through Andrew Ng's Machine Learning course on Coursera. The course is very good at introducing the basic concepts, but it lacks the depth I need if I really want to understand how to implement and use some of the algorithms on my own. So I decided to work through a more difficult textbook on machine learning, and one of the first things I decided to do was implement the machine learning algorithms from scratch in my language of choice: Python.

The simplest idea out there that can be called machine 'learning' is linear regression. I first learned about linear regression in high school math class, so when I saw that it was the first topic in the machine learning course, I was a bit skeptical. I felt that linear regression was too simple to be called machine learning. But it turns out that linear regression is quite a powerful tool for making predictions from data. The term 'linear' is a bit misleading because it makes you think of straight-line graphs and gives you the impression that fitting straight lines to data can't really be that useful. The trick to levelling up the power of linear regression lies in how you choose your variables (or features)! You can have features that are crazy functions of your initial variables.

To understand this, think about how we can use a straight line to fit a set of data points that you suspect might be exponential in nature. Say you suspect that the output $y$ is related to the variable $x$ by $y = ke^{cx}$. Taking the natural logarithm of both sides rearranges this to $\log(y) = cx + k_1$, where $k_1 = \log(k)$. Now we can plot $\log(y)$ vs $x$ on a graph and try to fit a straight line to it. Here the parameters that we 'learn' are $c$ and $k_1$. The variable $x$ is called a feature.
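As a tiny concrete example (with made-up numbers), here's a plain least-squares line fit to $(x, \log y)$ that recovers the parameters of an exponential:

```python
import math

# Made-up exponential data: y = k * e^(c x) with k = 2 and c = 0.5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# Transform, then fit a straight line to (x, log y) by ordinary least squares.
logs = [math.log(y) for y in ys]
n = len(xs)
mean_x = sum(xs) / n
mean_l = sum(logs) / n
c = (sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
     / sum((x - mean_x) ** 2 for x in xs))
k1 = mean_l - c * mean_x          # intercept, i.e. log(k)

print(c, math.exp(k1))            # recovers c = 0.5 and k = 2 (up to rounding)
```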

In a similar way, you can add features that are functions of the existing features if you think that the relationship is not linear. In a sense we are choosing a 'basis' of functions which we can mix together in different amounts to (hopefully) accurately model the output $y$. The way a Fourier series builds up a periodic function using sine waves of different frequencies is exactly how linear regression works with non-linear basis functions.

So I decided to make the system 'learn' the square wave in my first implementation of linear regression. First some math:

The output prediction $\hat{y}$ is represented by the linear model below. Here $n$ is the number of features.

$$ \hat{y} = \sum_{i=1}^{n} \theta_i x_i$$

To use gradient descent to train this model we need two things: a cost function and a gradient function.

The cost function takes the parameters $\theta$, the features, and the training samples, and calculates a cost based on the error between the model's predictions and the actual values. The most commonly used cost function is the mean squared error.

In the equations below, $m$ is the total number of training samples and $x^{(i)}$ is the feature vector of the $i$th training example. The cost function sums over all the training examples to give one real number that represents the cost of the current parameter values.

$$
J = \frac{1}{m} \sum_{i=1}^{m}(\hat{y}^{(i)} - y^{(i)})^2 \\
\frac{\partial J}{\partial \theta_j} = \frac{2}{m} \sum_{i=1}^{m}(\hat{y}^{(i)} - y^{(i)})x^{(i)}_j
$$

The gradient of a function is a vector that points in the direction in which the function increases fastest. So if I want to find where the cost is at a minimum, all I have to do is start at some point, find the gradient there, and move in the opposite direction, repeating until the gradient is zero (or very close to it). Think of it like a ball rolling down slopes until it comes to rest where there is no slope. So in Python I make a loop, and on each iteration I update the parameters $\theta$ using the rule below. Here $\alpha$ is the learning rate.

$$ \theta_j := \theta_j - \alpha \frac{2}{m} \sum_{i=1}^{m}(\hat{y}^{(i)} - y^{(i)})x^{(i)}_j$$

So I got a large set of points in the shape of a square wave and used sine waves as basis functions for the features. After running gradient descent for a thousand iterations I got coefficients that were really close to those of the actual Fourier series expansion of a square wave!
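The sketch below is a self-contained reconstruction of the same experiment in plain Python (the variable names and constants here are illustrative, not necessarily what's in the repo). The features are $\sin(kx)$ for odd $k$, the cost is the mean squared error, and the update is the rule above.

```python
import math

m = 400                                  # number of training samples
ks = [1, 3, 5, 7]                        # odd harmonics as basis functions
xs = [2 * math.pi * (i + 0.5) / m for i in range(m)]
ys = [1.0 if math.sin(x) > 0 else -1.0 for x in xs]   # square wave targets

# Feature matrix: one row per sample, one column per feature sin(k x).
X = [[math.sin(k * x) for k in ks] for x in xs]
theta = [0.0] * len(ks)
alpha = 0.1                              # learning rate

for _ in range(1000):
    preds = [sum(t * f for t, f in zip(theta, row)) for row in X]
    errs = [p - y for p, y in zip(preds, ys)]
    # Gradient of J = (1/m) sum (y_hat - y)^2 with respect to theta_j.
    grads = [(2.0 / m) * sum(e * row[j] for e, row in zip(errs, X))
             for j in range(len(ks))]
    theta = [t - alpha * g for t, g in zip(theta, grads)]

# The Fourier series of this square wave has coefficients 4 / (pi k).
print([round(t, 3) for t in theta])
```

The learned $\theta$ come out very close to $4/(\pi k)$ for $k = 1, 3, 5, 7$, which is exactly the Fourier series of the square wave.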


The learned function superimposed over the original square wave.

How the error decreases over time.

You can find my implementation on my github page.

## Sunday, 15 November 2015

### The One Thing that my College Gets Wrong About Exams

I'll be honest. Despite its shortcomings, many of the courses that are offered actually do contain material that I find interesting. In fact, when you're learning something at the level of a college course, there are few subjects that *aren't* interesting. It's like Richard Feynman said: everything is interesting if you go deep enough into it. My biggest complaint about the system isn't the lack of depth in the courses, or for that matter the professors' depth of knowledge (although I have had my fair share of instructors with a less than optimal grasp of the course material). My biggest complaint is the *focus* of the examinations that are supposed to evaluate how well the students know the material. The exams - in addition to correlating poorly with the students' grasp of the material - also actively discourage them from gaining a true understanding of it.
Typically, the exams I've written (particularly exams of courses that are highly mathematical) test only the ability of the students to memorize two things: derivations and formulae. I'm not saying that people should be exempted from memorizing formulae. There are plenty of formulae that are so important that they must be memorized. The problem is when you have to memorize formulae that look like this:

$L_F = -10 \log(\cos{\theta} \{\frac{1}{2} - \frac{1}{\pi} p(1-p^2)^{\frac{1}{2}} - \frac{1}{\pi}\arcsin(p) - q[\frac{1}{\pi}y(1-y^2)^{\frac{1}{2}} + \frac{1}{\pi}\arcsin(y) + \frac{1}{2}]\})$

$p = \frac{\cos{(\theta_c)}(1-\cos{(\theta)})}{\sin{(\theta_c)}\sin{(\theta)}}$

$q = \frac{\cos^3(\theta_c)}{(\cos^2(\theta_c) - \sin^2(\theta_c))^\frac{1}{2}}$

$y = \frac{\cos^2(\theta_c)(1 - \cos \theta) - \sin^2(\theta)}{\sin{(\theta_c)} \cos{(\theta_c)} \sin{(\theta)}}$

For anyone interested, it's a formula that tells you the coupling loss between two fiber optic cables when they're misaligned by an angle theta. The book doesn't even attempt to derive this formula, because the authors *know* that the derivation on its own doesn't really add to a person's understanding of fiber optics. It may be a nice exercise in algebraic manipulation, but that's pretty much it. In the real world, if there is a situation in which this formula needs to be used, it's pretty much guaranteed that the engineer will just look it up. It's completely pointless to memorize it.
Why is this focus bad? Firstly, it does not actually test understanding, and it leads to students not getting the scores they deserve. There have been countless instances where I *knew* how to solve a problem but didn't get any credit because I had to remember some stupidly long equation. Secondly, it forces examiners to limit their questions to really simple ones, because they know that remembering the formula or its contrived derivation is half the question. So the number of questions that test my understanding of the material is next to zero. Thirdly, it actively discourages students from trying to understand the material. Once people figure out that they can get scores by memorization alone, they will start focusing on memorization. Sometimes there is so much to memorize that there isn't enough time to try to understand even if I wanted to.
If I were in charge of setting the examinations, I would do the sensible thing and give every student a booklet of the formulae needed for the questions. This allows the questions to be far more interesting and to test how well the student can use the information at their disposal to solve the kinds of problems practicing engineers actually face. The brain is a processor with a limited amount of cache! It's not a hard disk! This simple change of focus would go a long way towards making courses actually useful. It would also give professors the freedom to ask tougher questions!

Now I know that this approach actually works! Before college I did my Cambridge IGCSEs and A-Levels. Both of these exams were very focused on understanding rather than recall. Exams either had all the required formulae included as part of the questions or (as was the case for Further Mathematics at A-Level) a large booklet of formulae. This didn't affect the difficulty of the exams at all. It's no use knowing formulae if you don't know how to use them. This might sound a bit strange, but I actually **enjoyed** writing my IGCSEs and my A-Levels. The exams were designed well enough that sometimes I came out of the exam hall with a new understanding of what I had learned. I still feel that I understood and learned more in class during high school than I did in most of my college courses. Any understanding I have of my college course material is the result of my independent reading and effort rather than the course instruction.
Now, I'm not the first person to talk about this. People have been complaining about the education system for a very, very long time. And many have even tried to change it by approaching the administration. Invariably though, I've seen that - at least in my college - the people in charge are very reluctant to bring about any of these changes. The biggest roadblock is probably the fact that a lot of the professors in my college disagree with what I have said above. They believe that memorization is very important, perhaps because that is how *they* were taught. I've seen that things are changing, but not nearly fast enough. I don't really think that the changes I've suggested will be implemented any time soon. So why am I writing all this down? I just really needed to vent. XP

## Thursday, 15 October 2015

### Simulating a Double Pendulum in Mathematica

I've been playing around with Mathematica's Non-Linear Control Systems toolbox over the past few days and it's been brilliant! One of the first systems that I tried to simulate is the double pendulum since it's such a commonly used sample problem in non-linear controls.

The first step is to write down the total kinetic and potential energies of the system and find the Lagrangian. Once you have that, you just need to run the Lagrangian through Mathematica's EulerEquations function and then set up a non-linear state space model using the differential equations.

Once you have the system model it's really simple to get the response and make a simple simulation.

And out pops a wonderful demonstration of chaos! Here are three simulations with slightly different initial conditions showing diverging trajectories.

Initial conditions: {160, -160}.
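My simulation was in Mathematica, but the same experiment is easy to sketch in plain Python. The equations of motion below are the standard closed-form double-pendulum ones (written out by hand rather than generated from the Lagrangian as in the notebook), and the masses, lengths, and integrator settings are my own illustrative choices:

```python
import math

G = 9.81
M1 = M2 = 1.0   # masses (kg) - arbitrary choices for this sketch
L1 = L2 = 1.0   # rod lengths (m)

def derivs(s):
    """s = (theta1, omega1, theta2, omega2); angles measured from vertical."""
    t1, w1, t2, w2 = s
    d = t1 - t2
    den = 2 * M1 + M2 - M2 * math.cos(2 * d)
    a1 = (-G * (2 * M1 + M2) * math.sin(t1)
          - M2 * G * math.sin(t1 - 2 * t2)
          - 2 * math.sin(d) * M2 * (w2 * w2 * L2 + w1 * w1 * L1 * math.cos(d))
          ) / (L1 * den)
    a2 = (2 * math.sin(d) * (w1 * w1 * L1 * (M1 + M2)
                             + G * (M1 + M2) * math.cos(t1)
                             + w2 * w2 * L2 * M2 * math.cos(d))
          ) / (L2 * den)
    return (w1, a1, w2, a2)

def rk4_step(s, dt):
    # Classic fourth-order Runge-Kutta step.
    add = lambda st, k, h: tuple(si + h * ki for si, ki in zip(st, k))
    k1 = derivs(s)
    k2 = derivs(add(s, k1, dt / 2))
    k3 = derivs(add(s, k2, dt / 2))
    k4 = derivs(add(s, k3, dt))
    return tuple(si + dt / 6 * (a + 2 * b + 2 * c + e)
                 for si, a, b, c, e in zip(s, k1, k2, k3, k4))

def energy(s):
    # Total energy; it should stay constant - a good sanity check.
    t1, w1, t2, w2 = s
    T = (0.5 * M1 * (L1 * w1) ** 2
         + 0.5 * M2 * ((L1 * w1) ** 2 + (L2 * w2) ** 2
                       + 2 * L1 * L2 * w1 * w2 * math.cos(t1 - t2)))
    V = -(M1 + M2) * G * L1 * math.cos(t1) - M2 * G * L2 * math.cos(t2)
    return T + V

def simulate(t1_deg, t2_deg, steps=4000, dt=0.001):
    s = (math.radians(t1_deg), 0.0, math.radians(t2_deg), 0.0)
    for _ in range(steps):
        s = rk4_step(s, dt)
    return s

a = simulate(160.0, -160.0)
b = simulate(160.1, -160.0)    # a 0.1 degree nudge...
print(a[0] - b[0])             # ...and the trajectories drift apart
```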

The next thing I tested was whether I could make a simple controller to stabilize this double pendulum pointing straight up. I went with a standard Linear Quadratic Regulator (LQR). In this simulation there is only one actuator, at the elbow joint, so this is also an example of an underactuated robotic system. Works great! :D

## Sunday, 11 October 2015

### Simulating Mechanical Systems in Mathematica

I've been trying a lot of different software for simulating mechanical systems for my project. By far, the most fun I've had is when I was simulating them in Mathematica. Mathematica is a seriously cool piece of software. I got a simulation of the simple pendulum up and running almost three times faster than I did when I was working with Python and Matplotlib. Although, to be fair, the Python attempt was also my first time building such a simulation, so I was learning how to do it while trying to make it, which might have affected how long I took.

## Friday, 9 October 2015

### Movie Review - Interstellar

Interstellar is a movie that I had been looking forward to for quite a while. After watching the movie I started going through a lot of the movie reviews that people started posting. I was quite disappointed to see so much negativity and nitpicking directed towards the movie.

The movie is set in a dreary post-apocalyptic Earth ravaged by dust storms. Food is scarce and blight is creeping in on the last of humanity's food supplies. To survive, people are forced to take up farming. On one of the farms, Cooper - a NASA engineer turned unwilling farmer - reminisces about the time humanity used to look up at the stars and wonder about our place in the universe.

Cooper soon discovers that all hope is not lost when a gravitational anomaly leads him to a hidden NASA base full of people working on one last mission to save humanity. A wormhole has appeared near Saturn that puts a new galaxy full of potentially habitable planets within the reach of humans. Cooper is asked to help NASA visit the most promising of these worlds to investigate. They visit a world with tidal waves the size of mountains and a world of ice and snow, all the while having to deal with the time dilation of a supermassive black hole.

The movie excelled in its depiction of space travel and the realities of gravitational time dilation, which it portrayed very realistically. The graphics of the wormhole were brilliant, dwarfed only by those of the absolutely gorgeous black hole.

The AI of the robots TARS and CASE was an unexpected delight! The movie did not try to make the AI look humanoid or impossibly advanced. The shape and movements of TARS were practical and realistic. The highly advanced natural language processing and fluid intelligence that the robots displayed was truly remarkable. As a person who is reasonably familiar with robotics, I know that these are areas of active research. In fact, quite recently robotics research has started shifting from conservative control systems - which are very slow and careful - to control systems that make full use of the dynamics of the robot's body - very similar to how TARS controls itself - to create robots that can make quick and acrobatic movements. The humor settings were a nice touch.

The harshest criticisms were leveled at the slightly unusual finale with Cooper floating around in the black hole and sending messages back in time. Many say that this ending was unsatisfactory and ruined what was otherwise a good movie. I disagree. Science fiction movies and novels can be characterized as a spectrum with hard science fiction on one end (like the work of Arthur C. Clarke) and soft science fiction on the other (like Doctor Who). Maybe people were complaining because most of the movie can be classified as hard science fiction while the black hole scene is more 'soft'. However, I find the idea of an advanced species manipulating a black hole to send messages back in time no more outlandish than the idea of them creating a wormhole at will. If there are species that are capable of the kind of engineering required to construct a stable wormhole, I think it's safe to assume that they have also figured out the mechanics of a black hole.

Also, a lot of the reviews of Interstellar that I've seen keep looking at it from the point of view of a movie critic: as though the movie was designed to give them some sort of intense entertainment experience and fell short. I think it's unfair to expect Interstellar to be like that. Movies like Interstellar are not good because they are packed full of action or witty dialog. They are good because they make us dream about a better future. Just like Carl Sagan's COSMOS and Neil deGrasse Tyson's awe-inspiring monologues on space, this movie has the power to inspire us. Perhaps one day in the future we'll become a space-faring civilization, and some of the people working on it will talk about how this really old movie about space travel captured their imagination.

So Interstellar is not a movie to watch if you're looking for action or comedy (although the movie has its fair share of both). It's a philosophical movie that is intended to make us think and wonder; and at least in my case, I think it succeeded in doing exactly that judging by the unusually long post-film trance I was in.

## Thursday, 8 October 2015

### The Magical Euler-Lagrange Equation and the Calculus of Variations

I've been learning a lot about simulating and controlling mechanical systems for one of my projects. Of all the math that I learned, the most amazing was the Euler-Lagrange equation.

$\frac{d}{dt} \frac{\partial L}{\partial \dot{q_i}} - \frac{\partial L}{\partial q_i} = Q_i$


The overarching theme of the control systems that I've been reading about is optimality. And wherever we need optimization, the Calculus of Variations tends to come into the picture. Calculus of Variations is a branch of mathematics that studies the problem of finding functions that minimize the value of a functional (or cost function) over a particular range. As an example, take the range $x \in [0,1]$ on the real line and assume that $f(x)$ is a function defined on this range. There are uncountably many possible functions $f(x)$. A functional is an operation that maps the function $f(x)$ to a real number. Think of this as a "cost" for the path that the function draws on the graph. The main problem that calculus of variations tries to solve is finding the function $f(x)$ that minimizes the "cost".

This area of mathematics has applications in a *huge* variety of fields, arguably the most important of which is physics. Long ago, some physicists found that the Newtonian mechanics that everyone knows and loves can be reformulated as an optimization problem. They found that there is a quantity known as the "action" (the time integral of kinetic energy minus potential energy) that is minimized as particles and other things in the real world go about their daily business. This is commonly known as the "Principle of Least Action". Coming back to the Euler-Lagrange equation: mathematicians figured out that plugging a functional into this equation gives the constraints that the minimizing function must satisfy. This means that every single dynamics problem in Newtonian physics can be reformulated as an optimization problem! Pretty neat.

So say you have a simple system like a simple pendulum and you want to figure out the equations of motion of the system. The way people are used to doing it is drawing a diagram that shows the net forces acting on the system and then deriving the equations of motion using Newton's laws of motion and energy constraints. Using the Euler-Lagrange equation, the procedure is slightly different and arguably simpler. The first step is to write down the kinetic and potential energies of the system.

$T = \frac{1}{2} m l^2 \dot{\theta}^2$

$V = -m g l \cos{\theta}$

$L = \frac{1}{2} m l^2 \dot{\theta}^2 + m g l \cos{\theta}$

To get the equations of motion of the system,

$\frac{d}{dt}\frac{\partial L}{\partial \dot{\theta}} - \frac{\partial L}{\partial \theta} = -b\dot{\theta} + u$

where $u$ is the torque that an actuator (motor) at the base of the pendulum applies to it and $b$ is a damping coefficient.

This gives (with $I = m l^2$):

$I\ddot{\theta} + b\dot{\theta} + m g l \sin{\theta} = u$

No fiddling around with forces and no need to figure out whether the net force is zero or not. It works with pure energies! And it works every time, as long as you write down the Lagrangian correctly. The first time I saw this, I couldn't believe it worked! For the following few weeks I'd take Lagrangians of random things for fun and check whether it gave the correct answers.
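It's easy to check the resulting equation of motion numerically. Here's a quick plain-Python sketch (the constants are arbitrary choices of mine) that integrates $I\ddot{\theta} + b\dot{\theta} + m g l \sin{\theta} = u$ with no actuation ($u = 0$); the damping term should make the pendulum settle at the bottom:

```python
import math

m, l, g, b = 1.0, 1.0, 9.81, 0.5   # arbitrary constants for this sketch
I = m * l * l                       # moment of inertia about the pivot

def step(theta, omega, dt, u=0.0):
    # Solve the equation of motion for the angular acceleration.
    alpha = (u - b * omega - m * g * l * math.sin(theta)) / I
    # Semi-implicit Euler: crude, but fine for a qualitative check.
    omega += alpha * dt
    theta += omega * dt
    return theta, omega

theta, omega = math.radians(90.0), 0.0   # released horizontally, at rest
for _ in range(300000):                  # 30 seconds at dt = 1e-4
    theta, omega = step(theta, omega, 1e-4)

print(theta, omega)   # both decay towards 0: hanging straight down
```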

This also makes me think about how much information we need to know 'everything' about a system. Before I learned about the Lagrangian, I had a gut feeling that in physics, fields were everything. Maybe this is because school- and engineering-level physics education focuses a lot on field theories. However, the fact that we can get the complete behaviour of a system from an expression that just encodes its total energy implies that, somehow, the energy of the system encodes all the information necessary to describe it completely. The equation suggests to me that potentials and energies are, in a sense, more 'fundamental' than fields.

## Monday, 5 October 2015

### Movie Review - Real Steel

Two weeks ago I discovered pure awesomeness in the form of Battle Bots. I spent my whole evening watching every episode of the 2015 season. Seeing my excitement, my friend suggested that I watch this movie called Real Steel which was released in 2011.

The movie is set in a future where human boxing is outlawed (Good!). Instead, people watch insanely cool robots go all out against each other for entertainment (Fantastic!). A retired human boxer, Charlie Kenton, who is trying to make it big in robot boxing (and failing miserably at it), has to spend two months looking after his son, whom he hasn't seen in years. Though they are initially cold to each other, the duo eventually learn to work together with a little scrapyard bot called Atom and make it to the Real Steel Tournament, repairing their relationship along the way. The movie is surprisingly heartwarming despite the fact that it is about robots fighting violently to their deaths while people cheer (or maybe it's heartwarming *because* of that :D ).

Of course, what I liked most about the movie were the robots! Being a robotics enthusiast and having built quite a few robots myself, it's really nice to see more realistic robots being depicted in film. Far too many movies get robots wrong. Most of them vastly overestimate their abilities or completely ignore the laws of physics. I liked that the robot fighters in the movie were controlled via joysticks and had the sort of jerky, non-human movements that characterize real robots. I get the feeling that the movie was aiming for realism - a goal that I really appreciate, especially when the movie is about a subject like robotics that I'm really passionate about.

The fight scenes between the robots were brilliantly choreographed and genuinely exciting. The urge to cheer on little Atom during the final fight was completely overwhelming.

The movie also gave me a lot to think about. It set my mind a-buzz with ideas about how to implement a robot boxing tournament like that in real life. It also got me thinking about how I would go about designing robots like that. The problem is harder than most people think. This amusing compilation of robots attempting the DARPA Challenge gives you a good idea of how good our robots are in real life. The movie was a satisfying blend of science fiction, robots and very human, emotional moments. Definitely a worthwhile watch.
