What You Should Know About Machine Learning

Well, the topic of machine learning sounds strange to non-techies. So I decided to put together experts' explanations and views on the topic: what you should know about machine learning, in layman's language. The aim of this article is simply to bring you closer to machine learning and help you understand what it really is.

Anyhow, many experts mention that they often find themselves explaining machine learning to non-experts, and some have offered to keep sharing their thoughts on public platforms as a way to engage the community.

So, let’s dig deep…


Overview: Machine Learning And Artificial Intelligence

Recognizing patterns requires an abundance of data. That is why machine learning has been empowered in the information age by the Internet of Things (IoT) and big data, which make it possible to generate, store, and process large volumes of data in real time at low cost.

This is going to change the way software is created and used, so it is important to embrace machine learning as an opportunity to do better rather than to see it as a threat.

Machine learning is part of Artificial Intelligence, which in turn is a discipline of Computer Science. Since the advent of Artificial Intelligence as a discipline, there have been efforts to solve problems with algorithms whose solution had to be programmed explicitly. Unlike these approaches, machine learning does not require programming the solution; instead, it relies on programs that learn the solution from data.

Machine learning will be increasingly present in our lives, not only as consumers but also in the workplace. Last but not least, machine learning creates solutions for specific problems and is still very far from reaching the capacity of the human brain.


What You Should Know About Machine Learning

  1. Machine learning means learning from data
    Machine learning lives up to the hype: there is an incredible number of problems that you can solve by providing the right training data to the right learning algorithms. Call it Artificial Intelligence if that helps you sell it, but know that AI, at least as used outside of academia, is often a buzzword that can mean whatever people want it to mean.
  2. Machine learning is about data and algorithms, but mostly data
    There’s a lot of excitement about advances in machine learning algorithms, and particularly about deep learning. But data is the key ingredient that makes machine learning possible. You can have machine learning without sophisticated algorithms, but not without good data.
  3. Unless you have a lot of data, you should stick to simple models
    Machine learning trains a model from patterns in your data, exploring a space of possible models defined by parameters. If your parameter space is too big, you’ll overfit to your training data and train a model that doesn’t generalize beyond it. A detailed explanation requires more math, but as a rule, you should keep your models as simple as possible.
  4. Machine learning can only be as good as the data you use to train it
    The phrase “garbage in, garbage out” predates machine learning, but it aptly characterizes a key limitation of machine learning. Machine learning can only discover patterns that are present in your training data. For supervised machine learning tasks like classification, you’ll need a robust collection of correctly labeled, richly featured training data.
  5. Machine learning only works if your training data is representative
    Just as a fund prospectus warns that “past performance is no guarantee of future results”, machine learning should warn that it’s only guaranteed to work for data generated by the same distribution that generated its training data. Be vigilant about skew between training data and production data, and retrain your models frequently so they don’t become stale.
  6. Most of the hard work for machine learning is data transformation
    From reading the hype about new machine learning techniques, you might think that machine learning is mostly about selecting and tuning algorithms. The reality is more prosaic: most of your time and effort goes into data cleansing and feature engineering — that is, transforming raw features into features that better represent the signal in your data.
  7. Deep learning is a revolutionary advance, but it isn’t a magic bullet
    Deep learning has earned its hype by delivering advances across a broad range of machine learning application areas. Moreover, deep learning automates some of the work traditionally performed through feature engineering, especially for image and video data. But deep learning isn’t a silver bullet. You can’t just use it out of the box, and you’ll still need to invest significant effort in data cleansing and transformation.
  8. Machine learning systems are highly vulnerable to operator error
    With apologies to the NRA, “Machine learning algorithms don’t kill people; people kill people.” When machine learning systems fail, it’s rarely because of problems with the machine learning algorithm. More likely, you’ve introduced human error into the training data, creating bias or some other systematic error. Always be skeptical, and approach machine learning with the discipline you apply to software engineering.
  9. Machine learning can inadvertently create a self-fulfilling prophecy
    In many applications of machine learning, the decisions you make today affect the training data you collect tomorrow. Once your machine learning system embeds biases into its model, it can continue generating new training data that reinforce those biases. And some biases can ruin people’s lives. Be responsible: don’t create self-fulfilling prophecies.
  10. AI is not going to become self-aware, rise up, and destroy humanity
    A surprising number of people seem to be getting their ideas about artificial intelligence from science fiction movies. We should be inspired by science fiction, but not so credulous that we mistake it for reality. There are enough real and present dangers to worry about, from consciously evil human beings to unconsciously biased machine learning models. So you can stop worrying about SkyNet and “superintelligence”.
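To make point 3 concrete, here is a minimal sketch in Python with NumPy. The toy data, the chosen degrees, and all names are assumptions for illustration. It fits a 2-parameter straight line and a 10-parameter polynomial to the same ten noisy points. The bigger model matches the training points almost perfectly but should do worse on new points from the same range, which is what overfitting looks like.

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_relationship(x):
    # Assumed "real" pattern behind the data: a straight line.
    return 2.0 * x + 1.0


# Ten training points. A fixed zig-zag stands in for measurement noise
# so the example gives the same result every time it runs.
x_train = np.linspace(0.0, 1.0, 10)
y_train = true_relationship(x_train) + 0.3 * (-1.0) ** np.arange(10)

# New points from the same range, with noise-free targets.
x_test = np.linspace(0.0, 1.0, 201)
y_test = true_relationship(x_test)


def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))


results = {}
for degree in (1, 9):  # degree 1: 2 parameters; degree 9: 10 parameters
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree}: train MSE {results[degree][0]:.4f}, "
          f"test MSE {results[degree][1]:.4f}")
```

With ten points and ten parameters, the degree-9 polynomial can pass through every training point, noise included, so its training error is near zero while its error on unseen points grows.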
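For point 5, one simple way to watch for skew between training data and production data is to compare summary statistics of a feature. The sketch below is an illustrative assumption, not a standard method: it measures how far the production mean has moved, in units of the training standard deviation, and flags the model for retraining past an arbitrary threshold. The function names and data are hypothetical.

```python
from statistics import mean, stdev


def mean_shift(train_values, live_values):
    """How many training standard deviations the live mean has moved."""
    return abs(mean(live_values) - mean(train_values)) / stdev(train_values)


def needs_retraining(train_values, live_values, threshold=0.5):
    # The 0.5 threshold is an arbitrary, illustrative choice.
    return mean_shift(train_values, live_values) > threshold


# Hypothetical values of one numeric input feature.
train_feature = [200, 210, 190, 205, 195]
live_feature_similar = [198, 207, 193, 202, 200]
live_feature_shifted = [260, 270, 255, 265, 250]

print(needs_retraining(train_feature, live_feature_similar))  # False
print(needs_retraining(train_feature, live_feature_shifted))  # True
```

Real monitoring usually compares whole distributions and many features at once, but the idea is the same: production data that no longer looks like training data is a signal to retrain.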
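Point 6 says most of the work is data transformation. This hypothetical sketch shows what that often looks like: raw records arrive as messy strings, rows with a missing target are dropped, numbers are parsed, and new features are derived. The field names and the derived features are invented for illustration.

```python
from datetime import datetime

# Hypothetical raw listings, as they might arrive from a scraper or a CSV file.
raw_rows = [
    {"price": "250,000", "area_m2": "80", "rooms": "4", "listed": "2023-03-04"},
    {"price": "", "area_m2": "65", "rooms": "3", "listed": "2023-05-11"},
    {"price": "180000", "area_m2": " 60 ", "rooms": "2", "listed": "2023-07-22"},
]


def clean(row):
    """Turn one raw row into numeric features plus a target, or None if unusable."""
    price_text = row["price"].replace(",", "").strip()
    if not price_text:
        return None  # drop rows whose target is missing rather than guess it
    area = float(row["area_m2"].strip())
    rooms = int(row["rooms"].strip())
    listed = datetime.strptime(row["listed"].strip(), "%Y-%m-%d")
    return {
        "area_m2": area,
        "area_per_room": area / rooms,  # derived feature
        "listed_month": listed.month,   # may capture seasonality
        "price": float(price_text),     # the target
    }


dataset = [r for r in (clean(row) for row in raw_rows) if r is not None]
for record in dataset:
    print(record)
```

None of this is a learning algorithm, yet each decision here (drop or fill, which features to derive) shapes what any model can learn.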


Further Notes On Machine Learning For Non-Experts

  • Machine learning is most decidedly NOT a branch of statistics, although I can understand the motivation of those who wish it were so. Certainly, that would simplify the field and make it more coherent, but it would also eliminate many aspects that make it unique. If anything, I would argue that statistics is a subfield of ML, so the inclusion goes the other way.
  • Machine learning is, simply put, the study of learning machines (reverse the words). It’s the (engineering) attempt to build artifacts that “learn”. Part of the problem is that we haven’t defined what “learning” is, so a good chunk of the past 30+ years has been spent trying to define it. We haven’t finished figuring that out yet!
  • The people who do ML are a curious amalgam of computer scientists, neuroscientists, statisticians, biologists, philosophers, physicists, engineers, and thinkers of various sorts. Just attend a NIPS meeting, and you’ll see why. If anything, this defines the field. As Michael Jordan (U.C. Berkeley) put it once, ML is simply defined as the set of activities of this curious mixture of people who come to meetings like NIPS. That is what makes it special.
  • ML has gained significant traction in recent years thanks to advances in deep learning. Deep learning techniques, in turn, have improved thanks to the explosion of available computing resources and data, and to new engineering techniques (most notably the use of GPUs).

All this improvement has had a great impact on real-world problems: image recognition and translation are two applications that have improved beyond imagination.


Summary Details:

  • Despite the recent hype, ML isn’t a new field. Its origins can be traced back to pattern recognition and computational learning theory. What is recent, as stated in the introduction, is its success, mostly thanks to the rise of huge data collections and the availability of cheap computing power.
  • In the past, ML experienced “winter” periods where funding was scarce. Hopefully, no more winters to come.
  • ML is a computer science discipline that consists of making computers “learn” from data rather than from explicitly programmed instructions. For example, imagine you had to implement gender (male vs. female) recognition software. In the traditional way, you would need to extract features that help you decide, and then write a lot of code instructing the computer how to use these features. Unfortunately, this approach is tedious and not robust enough. The ML approach, on the other hand, consists of collecting lots of images and labeling them, then running an ML algorithm that learns the task by observing the data. This approach is called supervised learning.
  • ML algorithms come in various flavors. One of the most widely used is supervised learning, as in the example above. It consists of providing the computer with data describing the thing to learn (the features) together with the correct answer (the target). In the previous example, a feature would be the height of the person (for traditional algorithms) or the edges in the image (for a deep learning model), and the target would be the gender. In unsupervised learning, by contrast, no target is provided, and the algorithm has to find structure in the data on its own, for example by grouping similar examples.
  • Within supervised learning, there is a further distinction between classification and regression. The recognition example above is a classification problem, since it consists of assigning each instance to one of a set of discrete classes (male or female). A regression problem would be learning how apartment prices (a continuous target) vary according to location.
  • Deep learning is a sub-field of ML. It is dubbed “deep” because it uses (artificial) neural network structures with many layers.
  • ML is an evolving and exciting field. Many jobs exist and many more will. It is the modern form of literacy in our technological and data-driven society. Learn about it as much as you can.
  • Finally, current ML models won’t kill you: they aren’t sentient, and the road to making them so is uncertain and distant. However, ML built into weapons and trained for that purpose could. This is why it is crucial to inform as many people as possible about what ML really is (and isn’t), and how to use it for great things while avoiding unethical applications.
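The supervised learning idea in the summary (features in, target out, learned from labeled examples instead of hand-written rules) can be sketched with one of the simplest classifiers, nearest centroid. This is a swapped-in illustration, not the recognition system described above: the fruit data and labels are hypothetical, and real image recognition would use far richer features.

```python
from math import dist
from statistics import mean

# Hypothetical labeled examples: (features, target).
# Features are (weight in grams, diameter in cm); the target is the fruit type.
training_data = [
    ((150.0, 7.0), "apple"),
    ((170.0, 7.5), "apple"),
    ((130.0, 6.5), "apple"),
    ((1200.0, 15.0), "melon"),
    ((1500.0, 17.0), "melon"),
]


def train(examples):
    """'Learning' here means computing the average feature vector per label."""
    rows_by_label = {}
    for features, label in examples:
        rows_by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in rows_by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the new example."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


model = train(training_data)
print(predict(model, (160.0, 7.2)))    # apple
print(predict(model, (1350.0, 16.0)))  # melon
```

No rule about weights or diameters is written by hand; changing the labeled examples changes the model. In practice, features on very different scales (grams vs. centimeters) are usually normalized first.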
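Regression works the same way, except that the target is a continuous number. Below is a minimal ordinary least-squares line fit, assuming "location" is reduced to a single hypothetical feature, distance from the city center in km, with made-up prices in thousands.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: distance from the city center (km) and price (thousands).
distance_km = [1, 2, 4, 6, 8]
price_k = [500, 460, 380, 300, 220]

slope, intercept = fit_line(distance_km, price_k)
predicted = slope * 5 + intercept  # a continuous prediction for a 5 km apartment
print(f"slope {slope:.1f} per km, intercept {intercept:.1f}, 5 km -> {predicted:.1f}")
```

The output is a number on a continuous scale rather than one of a few classes, which is exactly the classification vs. regression distinction.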
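In unsupervised learning there is no target at all. A common example, chosen here as an illustration since the summary does not name a specific method, is k-means clustering, which groups examples by similarity. This sketch runs k-means with two clusters on hypothetical one-dimensional, unlabeled values.

```python
def two_means(values, iterations=10):
    """k-means with k=2 on one-dimensional data."""
    # Start the two centers at the extremes of the data.
    centers = [min(values), max(values)]
    for _ in range(iterations):
        # Assignment step: put each value in the group of its nearest center.
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical unlabeled measurements; no target is given.
readings = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centers, groups = two_means(readings)
print(centers, groups)
```

The algorithm is never told that two groups exist in the data "correctly"; it only finds that the values fall into two tight clusters, which is the kind of structure unsupervised learning relies on.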

Technology can be our best friend, and technology can also be the biggest party pooper of our lives. It interrupts our own story, interrupts our ability to have a thought or a daydream, to imagine something wonderful, because we're too busy bridging the walk from the cafeteria back to the office on the cell phone.

GO Tech UG: Bridging your Way to Tech