Is data science over-hyped? Or is the hype justified? Is there a risk of another AI winter in the near future? Milton Lim assesses the risks (both upside and downside) of the data science revolution and considers how best to ride the “Big Data” wave.
This article is the first in a series that explores the breakthroughs, history and philosophy of artificial intelligence (AI), computer science, machine learning, statistics, and high dimensional problems. It will also examine the issues of data quality, modelling, consulting, ethics and professionalism in data science.
Part 1: AI Breakthrough
“Any sufficiently advanced technology is indistinguishable from magic.”
- Arthur C. Clarke
On 15 March 2016, a breakthrough in AI history occurred in Seoul, South Korea. AlphaGo, an AI computer program developed by Google DeepMind, beat Lee Sedol, the world’s best player over the last decade, at the game of Go with a score of 4 games to 1. The ancient Chinese board game Go, translated as “surrounding chess”, was invented over 2500 years ago. It has only a few simple rules, and players aim to acquire more territory by surrounding the opponent’s stones. In Go, the number of potential options at each move is about 200, much larger than in chess, where it is about 20. The number of possible Go board positions is about 10^170 and the number of possible Go games is at least 10^(10^48), much larger than the 10^120 possible chess games or the 10^80 atoms in the observable universe.
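To get a feel for these magnitudes, the short Python sketch below computes a simple upper bound on Go board configurations and compares how quickly move sequences multiply in the two games. It is only a back-of-the-envelope illustration: the bound 3^361 treats every intersection as empty, black or white and therefore also counts illegal positions, and the choice of 10 plies and the helper name `log10_sequences` are assumptions made for this example.

```python
import math

BOARD_POINTS = 19 * 19  # 361 intersections on a full-size Go board

# Each intersection is empty, black or white, so 3**361 bounds the number of
# board configurations from above (it also counts illegal positions).
upper_bound = 3 ** BOARD_POINTS
go_log10 = BOARD_POINTS * math.log10(3)

ATOMS_LOG10 = 80  # figure quoted in the text: about 10^80 atoms


def log10_sequences(branching_factor, plies):
    """Return log10 of branching_factor ** plies (move sequences of that length)."""
    return plies * math.log10(branching_factor)


print(f"3^361 has {len(str(upper_bound))} digits (about 10^{go_log10:.1f})")
print(f"Go, 10 plies at ~200 options: about 10^{log10_sequences(200, 10):.1f}")
print(f"Chess, 10 plies at ~20 options: about 10^{log10_sequences(20, 10):.1f}")
```

The bound comes out slightly above the 10^170 figure for board positions, as expected, and both dwarf the 10^80 atoms. With the branching factors quoted above, every extra move widens the gap between Go and chess by another factor of ten.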
Mastering the game of Go has long been considered the holy grail of AI research due to its complexity, which requires human intuition, creativity and strategic thinking. London-based DeepMind was acquired by Google in 2014 for £400 million. Its mission is to “solve intelligence”, which is based on the principles of general “strong” AI (learning like a human), rather than narrow “weak” AI (pre-programmed with rules). AlphaGo’s victory came about a decade earlier than anticipated, and could be viewed as an “Apollo mission” that starts the AI race.
AlphaGo was built from deep neural networks trained with supervised learning and reinforcement learning. It used supervised learning to identify the patterns in a database of 30 million moves from 100,000 historical games (labelled as winning or losing) played by Go experts. It then applied reinforcement learning by playing against different instances of itself repeatedly to refine its strategies. During such a game, AlphaGo operated with the following components: 
- Policy neural network to search for promising regions to place a move based on the training data by analysing popular human playing moves
- Value neural network to evaluate board positions to determine the probability of winning
- Monte Carlo tree search algorithm to simulate a probabilistic range of game outcomes for each potential move
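To make the tree search component more concrete, here is a minimal Python sketch of Monte Carlo tree search on a toy game. It is not AlphaGo’s implementation. As assumptions for illustration, the game is a simple version of Nim (players alternately take 1 to 3 stones, and whoever takes the last stone wins), random playouts stand in for the value network, and untried moves are expanded in a fixed order rather than being ranked by a policy network. All names (`Node`, `rollout`, `mcts`) are hypothetical.

```python
import math
import random


def legal_moves(stones):
    """Moves available in toy Nim: take 1, 2 or 3 stones."""
    return [m for m in (1, 2, 3) if m <= stones]


class Node:
    """One state in the search tree."""

    def __init__(self, stones, parent=None, move=None):
        self.stones = stones
        self.parent = parent
        self.move = move                  # move that led here from the parent
        self.children = []
        self.untried = legal_moves(stones)
        self.visits = 0
        self.wins = 0.0                   # wins for the player who moved into this node


def rollout(stones, rng):
    """Random playout; True if the player to move at `stones` ends up winning."""
    player = 0
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return player == 0
        player = 1 - player
    return False  # no stones left: the player to move has already lost


def uct_child(node, c=1.4):
    """Pick the child balancing observed win rate and exploration (UCT)."""
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def mcts(root_stones, iterations=2000, seed=0):
    """Search from a position with at least one stone; return (move, visit counts)."""
    rng = random.Random(seed)
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: add one new child if any move is untried.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: result from the view of the player who moved into `node`.
        mover_won = not rollout(node.stones, rng)
        # 4. Backpropagation: flip the perspective at each level up the tree.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if mover_won else 0.0
            mover_won = not mover_won
            node = node.parent
    best = max(root.children, key=lambda ch: ch.visits)
    return best.move, {ch.move: ch.visits for ch in root.children}


move, visits = mcts(3)
print("Suggested move from 3 stones:", move, visits)
```

From 3 stones the search settles on taking all three, since that wins immediately. In AlphaGo, a learned policy would roughly take the place of the fixed expansion order and a learned value estimate would supplement the random playouts, which is what makes the search tractable on a board as large as Go’s.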
Can machines “think”?
During the games, AlphaGo even produced a genius move (Game 2, Move 37) described as “creative and beautiful”, which connected all the previous stones played to form a network of influence around the board. This move had a 1 in 10,000 probability of being played by a human. AlphaGo transcended the limitations of its human creators to form its own original creations. Ke Jie, the current world No. 1 ranked Go player, commented: “After humanity spent thousands of years improving our tactics, computers tell us that humans are completely wrong… I would go as far as to say not a single human has touched the edge of the truth of Go.”
After losing the first 3 games, Sedol managed to stage a comeback in Game 4. He responded by playing a type of extreme strategy known as amashi, by aggressively forcing the opponent into an “all or nothing” situation (like going “all-in” in poker). Sedol found a wedge move that allowed him to control the centre and turn the game around, which was praised by commentators as the “hand of God”. AlphaGo evaluated that move at 0.007% probability of being played by a typical human, which reflects Sedol’s genius in identifying that move. Conceptually, Sedol forced AlphaGo into a “blind spot” in the high dimensional search space, which was not calibrated with enough training data to secure a win. Had Sedol won at least 3 games by successfully “stress testing” AlphaGo’s algorithm, he could have picked up the US$1m winner’s prize rather than just US$170k (a handsome reward for one week of work though).
AlphaGo was programmed to maximise its probability of winning, regardless of margin, over the long-term by searching on average 50-60 moves ahead. Hence, it focussed more on likely wins by a small margin (as a win by one stone is still a win), than unlikely wins by high margins. As a result, AlphaGo tended to play quite conservatively with a lot of “slack” moves that appeared to be suboptimal for maximising territory, in contrast to humans who intuitively claim as much territory as possible at every opportunity. It seemed that AlphaGo was able to navigate very close games “down to the wire” with great precision, better than humans relying on intuition. AlphaGo could easily “interpolate” strategies amongst existing training data and its simulations with great accuracy, but it was terrible at “extrapolating” into unknown territory without domain-specific knowledge, where it appeared to make “delusional” decisions.
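The difference between the two objectives can be shown with a small Python sketch. The candidate moves and their numbers below are entirely hypothetical and chosen only to illustrate the point; they do not come from any AlphaGo game.

```python
# Hypothetical candidate moves: (name, estimated win probability, expected margin in points).
candidates = [
    ("solid", 0.85, 1.5),
    ("balanced", 0.75, 5.0),
    ("greedy", 0.60, 12.0),
]


def pick_by_win_probability(moves):
    """Choose the move most likely to win, ignoring the size of the win."""
    return max(moves, key=lambda m: m[1])[0]


def pick_by_expected_margin(moves):
    """Choose the move with the largest expected territory margin."""
    return max(moves, key=lambda m: m[2])[0]


print("Maximise win probability:", pick_by_win_probability(candidates))
print("Maximise expected margin:", pick_by_expected_margin(candidates))
```

Under these made-up numbers, a player maximising win probability picks the “solid” move even though it gains the least territory, which mirrors the conservative, “slack” style described above.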
Alchemy or chemistry?
Historically, artificial neural networks have been perceived as “black boxes” with little understanding of how they truly operate. Because they are non-linear statistical models, the best way to look inside currently seems to be simple sensitivity testing of the weights of the component neurons. Some critics have described the current state of research in neural networks as “more alchemy than chemistry” due to the lack of mathematical rigour and solid theoretical understanding behind them.
The success of AlphaGo showed that deep neural networks appear to be capable of capturing the complex non-linearities in the very high dimensional search space of Go games. However, it has not “solved” the game of Go exhaustively in a precise mathematical sense, but merely consolidated all the information contained in the human styles of playing into an effective machine learning algorithm. This is similar to the statistical strategy of “boosting” weak learners (in this case human data) into a strong learner (AlphaGo). Reinforcement learning allowed it to learn from all its past mistakes by rehearsing with itself (imagine if humans could learn to do this too!).
Google DeepMind has subsequently produced superior versions of AlphaGo:
- AlphaGo Zero: by playing against itself and without any human data, AlphaGo Zero surpassed the strength of AlphaGo Lee (the version that beat Lee Sedol) in 3 days, winning 100 games to 0
- AlphaZero: generalised AlphaGo Zero’s approach into a single algorithm that plays different games such as chess and shogi, achieving a superhuman level of play within 24 hours of training
Into industry: the future of society
Google DeepMind is also applying its research to the healthcare and energy industries. AlphaGo’s algorithm has even been used as the basis for a new means of computing potential pharmaceutical drug molecules and protein folding. With the recent success and publicity, the prospect of an upcoming AI winter (a period of reduced funding and interest in AI research) seems unlikely. The implications of large-scale machine learning and AI for the future of society are very promising.
For interested readers, I would highly recommend watching the fascinating documentary “AlphaGo” (available on Netflix).
In the next article in this series, we will examine the history of AI since the 1950s, with its booms and busts (AI summers and AI winters), to get an idea of where it might be heading in the future.
 Silver et al. (2016) “Mastering the game of Go with deep neural networks and tree search” Nature
 “Humans mourn loss after Google is unmasked as China’s Go master” Wall Street Journal, 5 January 2017
 Rahimi, Ali (2017) NIPS 2017 Test-of-Time Award
 Silver et al. (2017) “Mastering the game of Go without human knowledge” Nature
 Silver et al. (2017) “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm” preprint
 “Go and make some drugs” The Engineer, 3 April 2018
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivatives CC BY-NC-ND Version 3.0 (CC Australia ported licence).