Choosing Your Language
The Advantages of Python and MATLAB
You are here for one of two reasons: either you want a quick start in ML, or you have already taken your first steps and want to see whether they were the right ones. Either way, this article should prove an interesting read. Unlike most other texts on the subject, it doesn’t propose using Python or MATLAB. Instead, it focuses on explaining how to use .NET technologies efficiently, and why doing so may remove the need for other programming languages.
Don’t get me wrong: it’s a never-ending discussion, and there’s a reason why Python and MATLAB are usually favoured. Both are easy to learn and just as easy to use afterwards, especially if you’re starting from scratch. Furthermore, they both use your CPU resources rather sparingly while providing comparatively fast performance.
However – and particularly if you are already programming in .NET – there’s really no reason for you to switch.
Why .NET, Then?
The most popular .NET language (and one of the most popular programming languages in general), C#, is not only one of the best choices for developing Windows apps and games, but it also works great for ML. Here are the three main reasons:
- If you know the basics of C# syntax, you will quickly find your way around its ML libraries (such as Accord.NET and AForge.NET);
- C# is typically faster than Python because C# code is compiled (to intermediate language, then just-in-time compiled to native code), whereas CPython, the reference Python implementation, interprets bytecode;
- Microsoft’s tools will make your work significantly easier and more efficient. You can create and manage Azure databases and publish your program to Azure directly from your integrated development environment (IDE). In addition, you can use all of Azure’s machine learning services, which will help you get the most out of cloud computing where data manipulation for ML is concerned.
Python and MATLAB work great if you are just starting out and have no programming background, whether in .NET or at all. In any other case, however, I’d recommend giving .NET a try. I promise: exploring the many opportunities it offers will be just the beginning.
Leaving the theory behind, it’s time to focus on practising the basics of ML using C# and the Accord.NET library. Maybe that will do the trick.
Practising Supervised Machine Learning
This article assumes that you already have a basic understanding of how AI, and ML in particular, works. If that’s not the case, first take your time to get familiar with the basic concepts and terminology, or even with ML’s objectives (there are a few other relevant articles as well, more philosophical and less scientific). Afterwards, come back here and, step by step, I’ll lead you through some of the coding basics.
Hopefully, now that you have reached this paragraph, you have some basic grasp of ML, or you heeded my warning and read a few of this blog’s other articles. Anyway, it’s time we began. In supervised ML, each task falls into one of two categories: regression or classification. Since this article concerns only the practical side of the most basic ML concepts, we’ll focus on regression for now. You may know something about regression from your linear algebra classes. According to a relevant course at Khan Academy (which I strongly advise you to take if you want to strengthen your mathematical skills):
“Regression is fitting a line or curve to a pattern we see in a scatter plot.”
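For a single input, the “line” in this definition is usually written as a linear function of that input. In the house-price example, x would be the size and ŷ the predicted price. The intercept β₀ and the slope β₁ are what the learning algorithm has to find. These symbols are this sketch’s notation, not the article’s.

```latex
\hat{y} = \beta_0 + \beta_1 x
```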
But how will regression help you solve actual tasks? Although the technique is pretty straightforward, you can use it to make all kinds of predictions. Consider one of the most popular examples: predicting house prices based on their size (taken from Andrew Ng’s Machine Learning course at Stanford University).
Now, let’s write some code.
We’ll be working with Microsoft Visual Studio. If you don’t have Microsoft’s IDE yet, the free Community edition is available for Windows. On macOS and Linux, Visual Studio Code is a free, cross-platform alternative.
We’ll start by creating a new project: Project → Console Application.
Now let’s use NuGet to get the Accord.NET library. Keep in mind that, for this application, you will only need a handful of using directives.
In Visual Studio, there are always a few different ways to achieve what you want. In this case, you can simply use the NuGet Package Manager window. Alternatively, go directly to the Package Manager Console and add the packages you need from there, for example: PM> Install-Package Accord (the regression classes live in the Accord.Statistics package, which can be installed the same way).
Once the library is added, we’re ready to return to our task. First and foremost, we need to create a training set, i.e. a set of sample input-output pairs. Basically, we’re feeding the program a “scatter plot” (remember the definition of regression?). From it, the program should extract the underlying pattern and apply that pattern to future inputs to make the predictions we need.
The logic of the solution lives in a method called “ApplyRegression”, while the Program’s “Main” method is used for testing only. ApplyRegression accepts two parameters, the two arrays of type “double” that form the training set, and returns void.
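A training set for the house-price task can be as simple as two parallel arrays: one holds the inputs (sizes) and the other holds the outputs (prices). The values below are hypothetical and chosen only for illustration. They are not the data behind the article’s results. The sketch uses C# 9 top-level statements.

```csharp
using System;

// Hypothetical training set: each sizes[i] (square feet) is paired with
// prices[i] (price in thousands). Illustrative values only.
double[] sizes  = { 1000, 1500, 2000, 2500, 3000 };
double[] prices = {  200,  260,  340,  400,  470 };

// Print the input-output pairs that together form the "scatter plot".
for (int i = 0; i < sizes.Length; i++)
{
    Console.WriteLine($"{sizes[i]} sq ft -> {prices[i]}");
}
```

Keeping inputs and outputs in separate arrays of equal length mirrors the two-parameter shape that ApplyRegression expects.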
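A minimal skeleton of that structure might look like the following. It uses C# 9 top-level statements, so the code at the top plays the role of Main. The length check and the sample values are this sketch’s own assumptions, not the article’s code.

```csharp
using System;

// Test code (the role Main plays in the article): hypothetical training data.
double[] sizes  = { 1000, 1500, 2000, 2500, 3000 };
double[] prices = {  200,  260,  340,  400,  470 };

ApplyRegression(sizes, prices);

// Holds the regression logic: takes the training set as two double arrays
// and returns void.
static void ApplyRegression(double[] inputs, double[] outputs)
{
    // Both arrays describe the same samples, so their lengths must match.
    if (inputs.Length != outputs.Length)
    {
        throw new ArgumentException("inputs and outputs must have the same length");
    }

    Console.WriteLine($"Training on {inputs.Length} samples...");
    // The learning algorithm (e.g. ordinary least squares) goes here.
}
```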
Now let’s fill the “ApplyRegression” method with the appropriate code. We’ll use the ordinary least squares (OLS) algorithm, which, to quote Wikipedia:
“…is a method for estimating the unknown parameters in a linear regression model, with the goal of minimizing the sum of the squares of the differences between the observed responses (values of the variable being predicted) in the given dataset and those predicted by a linear function of a set of explanatory variables.”
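For the single-input case, the quoted definition means choosing the intercept β₀ and the slope β₁ so that the sum of squared differences between the observed values yᵢ and the predicted values is as small as possible. Setting the derivatives to zero gives the familiar closed form. Here x̄ and ȳ are the sample means of the n training pairs.

```latex
\min_{\beta_0,\,\beta_1} \; \sum_{i=1}^{n} \bigl(y_i - (\beta_0 + \beta_1 x_i)\bigr)^2
\qquad\Longrightarrow\qquad
\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\beta_0 = \bar{y} - \beta_1 \bar{x}
```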
The method is widely used and is one of the most popular approaches to this kind of task. Let’s set it up and make it learn from our data.
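In Accord.NET, this step is handled by the library’s OrdinaryLeastSquares learner. To show what such a learner computes for a single input, here is a dependency-free sketch of the closed-form OLS solution. FitOls is a hypothetical helper name, not an Accord.NET API, and the training values are illustrative only.

```csharp
using System;

// Fits y = intercept + slope * x by ordinary least squares (single input).
static (double Slope, double Intercept) FitOls(double[] xs, double[] ys)
{
    if (xs.Length != ys.Length || xs.Length < 2)
        throw new ArgumentException("Need at least two paired samples.");

    // Sample means of inputs and outputs.
    double meanX = 0, meanY = 0;
    for (int i = 0; i < xs.Length; i++) { meanX += xs[i]; meanY += ys[i]; }
    meanX /= xs.Length;
    meanY /= ys.Length;

    // Slope = sum of co-deviations / sum of squared x-deviations.
    double sxy = 0, sxx = 0;
    for (int i = 0; i < xs.Length; i++)
    {
        sxy += (xs[i] - meanX) * (ys[i] - meanY);
        sxx += (xs[i] - meanX) * (xs[i] - meanX);
    }
    if (sxx == 0)
        throw new ArgumentException("All inputs are equal; the slope is undefined.");

    double slope = sxy / sxx;
    return (slope, meanY - slope * meanX);
}

// Hypothetical training data (sizes in square feet, prices in thousands).
double[] sizes  = { 1000, 1500, 2000, 2500, 3000 };
double[] prices = {  200,  260,  340,  400,  470 };

var (slope, intercept) = FitOls(sizes, prices);
Console.WriteLine($"price = {intercept:F2} + {slope:F4} * size");
```

With a single input, the closed form needs no matrix algebra. A library learner generalizes the same least-squares idea to many inputs.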
If you want to, you can also render a graphical representation of your training set in the Console.
Press Debug → Start Without Debugging (or Ctrl + F5 on Windows) and voilà: there’s your data, simply visualized.
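If you’d rather keep everything as plain text in the console window, a crude ASCII scatter plot is enough to eyeball the pattern. This is a swapped-in, dependency-free technique and not necessarily what the article’s code does. RenderScatter, Scale, the 40×10 grid and the training values are all hypothetical choices.

```csharp
using System;
using System.Linq;

// Hypothetical training data (sizes in square feet, prices in thousands).
double[] sizes  = { 1000, 1500, 2000, 2500, 3000 };
double[] prices = {  200,  260,  340,  400,  470 };

string[] rows = RenderScatter(sizes, prices, width: 40, height: 10);
foreach (string line in rows)
{
    Console.WriteLine("|" + line);
}
Console.WriteLine("+" + new string('-', 40));

// Maps each (x, y) pair onto a character grid and marks it with '*'.
static string[] RenderScatter(double[] xs, double[] ys, int width, int height)
{
    double minX = xs.Min(), maxX = xs.Max();
    double minY = ys.Min(), maxY = ys.Max();

    var grid = new char[height][];
    for (int r = 0; r < height; r++)
    {
        grid[r] = new string(' ', width).ToCharArray();
    }

    for (int i = 0; i < xs.Length; i++)
    {
        int col = Scale(xs[i], minX, maxX, width);
        // Row 0 is printed first, so larger y values go nearer the top.
        int row = height - 1 - Scale(ys[i], minY, maxY, height);
        grid[row][col] = '*';
    }

    return grid.Select(chars => new string(chars)).ToArray();
}

// Linearly maps v from [min, max] onto a cell index in [0, cells - 1].
static int Scale(double v, double min, double max, int cells) =>
    max == min ? 0 : (int)Math.Round((v - min) / (max - min) * (cells - 1));
```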
Now comment out that line and add a few more lines of code to get our predictions and display them in the Console.
In this example, we have predicted the price of a 3,100-square-foot house. In the Console, you’ll see 538,78, that is, 538.78 printed with a comma as the decimal separator, because .NET formats numbers according to your system’s regional (culture) settings.
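Once the slope and intercept are known, predicting is a single evaluation of the fitted line. The sketch below hard-codes hypothetical coefficients, not the ones the article’s model learned. It also shows why the same number can appear with either a comma or a point: .NET formats a double with the current culture’s decimal separator unless you pass a format provider explicitly.

```csharp
using System;
using System.Globalization;

// Hypothetical coefficients of a fitted line: price = 62 + 0.136 * size.
const double Intercept = 62.0;
const double Slope = 0.136;

// The model: one evaluation of the fitted line per input.
double Predict(double size) => Intercept + Slope * size;

double predicted = Predict(3100);

// Culture-neutral output always uses '.' as the decimal separator.
string invariant = predicted.ToString("F2", CultureInfo.InvariantCulture);

// A format that uses ',' (as many European cultures do) prints the same value differently.
var commaFormat = new NumberFormatInfo { NumberDecimalSeparator = "," };
string withComma = predicted.ToString("F2", commaFormat);

Console.WriteLine(invariant);  // 483.60
Console.WriteLine(withComma);  // 483,60
```

Passing CultureInfo.InvariantCulture is a simple way to get the same output on every machine.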
Feel free to train your program with more data. In general, more representative data tends to produce more accurate predictions.
Supervised ML works by combining two ingredients: a training set and a learning algorithm. The algorithm learns from the training set and produces a model, and the model then turns new inputs into predictions.
In this article, to solve a simple regression task, we used a small training set (so small, indeed, that it is suitable for explanatory purposes only) and the ordinary least squares algorithm. Combining them produced a model that gives us a prediction for any given input. Enlarging the dataset generally improves the model’s accuracy.