Let’s define an AI as a computer system that interacts with the world through capabilities (including visual perception and speech recognition) and behaviours that we would think of as intelligent. This is where the difficulty starts, because there are multiple definitions of intelligence. For example, is it the ability to acquire and apply knowledge and skills, or is it wisdom, or the ability to speak and write in Latin, Greek and Hebrew (as was required for entry to Harvard 150 years ago), or the ability to handle criticism without denial, blame, excuses or anxiety? Then we need to add the problem of general intelligence as opposed to domain-specific intelligence. In AI terms, general intelligence is associated with the singularity: the point at which an AI-powered computer or robot becomes capable of redesigning and improving itself, or of designing AI more advanced than itself. A general AI would have to perform successfully any intellectual task that a human being could. Domain-specific intelligence is what all current AI exhibits, for example in chess, Go and car driving.
In general, AI can seem like magic until we understand how it works, and its use has become so widespread that we often don’t even think of it as being AI. If we take away the physical manifestation of AI in robots, androids and so on, and think about the essence of AI and how it is done, then the most important aspect of AI is the design process through which a clear representation of the problem that an AI is to solve is developed. Without this, there is no chance of developing a good solution using AI. How knowledge is represented, and how information is represented for an AI to process, will determine whether or not a particular problem is solved.
The AI designer must have a good understanding of the problem an AI is supposed to solve, as well as the type of AI technique that might be appropriate. The features of the problem must be specified along with the features of the environment in which the AI must operate. The AI’s intelligence or knowledge will enable it to find a solution to the problem within the constraints of its environment.
Rule-based systems
Much early AI used rule-based knowledge systems for search, planning, decision making and game playing. Knowledge was represented as a series of logical if-then rules that enabled the AI to map the features of a problem onto a rule appropriate for the environment and fire that rule to reach an answer or solution. The rule base could be complex and inter-linked, allowing these systems to solve complex problems. The challenges of representing knowledge using rules include choosing the right feature representation, having an adequate number of rules, keeping the rules consistent, and handling special cases and situations. To be successful, rule-based systems (also known as production rule systems) must have the right features representing the problem, sufficient knowledge, consistent knowledge and the ability to deal with exceptions, because in the real world there are almost always exceptions. It is hard to capture knowledge using rules.
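To make the idea concrete, here is a minimal sketch, in Python, of a production rule system: a handful of invented if-then rules about animals are repeatedly matched against a set of known facts, and any rule whose conditions are satisfied is fired to add its conclusion. The rules and facts are illustrative only.

```python
# A minimal sketch of a production rule system; the rule base and the
# starting facts are invented for illustration.

rules = [
    # (name, conditions that must all hold, fact to assert when fired)
    ("R1", {"has_feathers"}, "is_bird"),
    ("R2", {"is_bird", "can_fly"}, "is_flying_bird"),
    ("R3", {"has_fur"}, "is_mammal"),
]

def forward_chain(facts):
    """Fire any rule whose conditions are satisfied until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for name, conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)   # "fire" the rule
                changed = True
    return facts

print(forward_chain({"has_feathers", "can_fly"}))
# adds 'is_bird' and then 'is_flying_bird' to the starting facts
```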
Several ways to deal with the uncertainty of the real world were developed for rule-based systems, including Fuzzy Set Theory. The essential idea of Fuzzy Set Theory is that it acknowledges that some problems or problem features are more typical than others: membership of a set is a matter of degree. Using Fuzzy Set Theory, one can develop fuzzy logic, in which knowledge is represented using fuzzy logical rules. The terms “fuzzy set theory” and “fuzzy logic” recognize that in the real world things are not simply black and white, or true and false; they can be partially true or partially false. Production rule systems are fixed at the time of design; they do not learn as they solve more and more problems.
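As a rough illustration, the sketch below defines an invented fuzzy set “tall”: instead of a person being in or out of the set, they belong to it to a degree between 0 and 1, which is exactly the “partially true” idea that fuzzy rules build on. The membership function and its thresholds are assumptions made for the example.

```python
# A minimal sketch of fuzzy set membership using a made-up "tall" set:
# a person is "tall" to a degree between 0 (not at all) and 1 (fully).

def tall(height_cm):
    """Membership of 'tall': 0 below 160 cm, 1 above 190 cm, linear in between."""
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30.0

# A fuzzy rule such as "IF tall THEN suited_to_basketball" would fire to the
# degree that its antecedent is true, rather than simply firing or not firing.
for h in (155, 175, 195):
    print(h, round(tall(h), 2))   # 0.0, 0.5, 1.0
```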
Probability and statistical systems
Probability systems rely on a set of probability-based rules about the environment in which the system is operating. The probability parameters in these rules can be changed as the machine learns. The system must find the right rules to match the observed data. Statistical learning algorithms are used for voice recognition; they assume large amounts of data from which the likelihood of something happening in the environment can be estimated. A set of prior probabilities is used to bootstrap the process, and then the system can be trained using the available data sets. Once again the extent of the learning depends on the quality of the training data and of the subsequent data encountered by the system. The system is limited by the rules that describe the ‘world’ of problems that it can tackle. Put another way, these systems can only work with a finite collection of input types.
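As a small illustration of the “prior probabilities plus training data” idea, the sketch below applies Bayes’ rule to decide between two spoken words given a single acoustic feature. The words, the feature and all of the probability values are invented for the example; a real recognizer would estimate them from large data sets.

```python
# A minimal sketch of Bayesian updating: bootstrap with priors, then combine
# them with likelihoods (assumed to come from training data) via Bayes' rule.
# All numbers are invented for illustration.

priors = {"yes": 0.5, "no": 0.5}          # prior probabilities of each word
likelihood = {                            # P(feature | word)
    "yes": {"high_pitch": 0.7, "low_pitch": 0.3},
    "no":  {"high_pitch": 0.2, "low_pitch": 0.8},
}

def posterior(feature):
    """Bayes' rule: P(word | feature) is proportional to P(feature | word) * P(word)."""
    unnormalised = {w: likelihood[w][feature] * priors[w] for w in priors}
    total = sum(unnormalised.values())
    return {w: p / total for w, p in unnormalised.items()}

print(posterior("high_pitch"))   # roughly {'yes': 0.78, 'no': 0.22}
```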
Neural networks
The original neural net theory was developed in the middle of the last century in the shape of the McCulloch-Pitts (MP) analysis. This was a logical rule-based system that was likened to the firing of a neuron. When the artificial (logic) MP neuron fired, it was indicating that an assertion was positive/true; when it did not fire, the assertion was deemed false. An MP neuron with a high threshold value would fire only when all of its inputs were active and so represented a logical “and”, while an MP neuron with a low threshold value represented a logical “or”. These were the building blocks of logic in a neural network: any logical expression can be represented by a network of these special MP neurons. This theory was used to build machines that could implement any arbitrary logical function, but such systems are restricted to fixed patterns of true/false inputs and cannot process an infinite array of inputs. This limitation stimulated the use of the concept of functions: a type of machine whose inputs are a collection of numerical measurements and whose outputs are also represented as a pattern of numbers. This sort of situation is quite common in areas such as speech signal processing, image processing, or problems involving the control of physical devices and systems.
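A McCulloch-Pitts unit is simple enough to sketch directly: with two binary inputs, a threshold of 2 means the unit fires only when both inputs are active (logical “and”), while a threshold of 1 means it fires when either input is active (logical “or”). The sketch below is a minimal illustration of that behaviour.

```python
# A minimal sketch of a McCulloch-Pitts threshold unit with binary inputs:
# the unit fires (outputs 1) when the number of active inputs reaches its threshold.

def mp_neuron(inputs, threshold):
    return 1 if sum(inputs) >= threshold else 0

# With two inputs, threshold 2 behaves as AND and threshold 1 as OR.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", mp_neuron((x1, x2), 2), "OR:", mp_neuron((x1, x2), 1))
```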
Rosenblatt – perceptron learning theorem
The MP neurons discussed in the previous section were used by the psychologist Frank Rosenblatt to build a system called the Perceptron. It consisted of three distinct groups of MP neurons. The first group of neurons were called the “sensor” or “input” units. The outputs of the input units were connected to the second group of units, called “association” or “hidden” units. Finally, the outputs of the hidden units were connected to the third group of units, called “response” or “output” units. The connections from the input units to the hidden units were fixed at random, by analogy with an innate genetic disposition, and were not modifiable, while the connections from the hidden units to the output units were modifiable and changed their values depending upon the Perceptron’s experiences. The Perceptron Learning Theorem guaranteed that, if a set of connection values exists that correctly classifies the training examples, the learning procedure will find one: the system would remember what it had been trained on, but it was not always able to deal with inputs on which it had not been trained.
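The learning rule itself can be sketched in a few lines: on each training example, the modifiable weights are nudged in proportion to the error between the target output and the unit’s current output. The toy problem (logical “or”), learning rate and number of passes below are illustrative assumptions, not Rosenblatt’s original settings.

```python
# A minimal sketch of the perceptron learning rule on a toy, linearly
# separable problem (logical OR); only these weights are adjusted,
# mirroring the modifiable hidden-to-output connections described above.

training = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = [0.0, 0.0]   # modifiable connection weights
b = 0.0          # bias (acts as a threshold)
rate = 0.1

def predict(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

for epoch in range(20):
    for x, target in training:
        error = target - predict(x)
        # nudge each weight in proportion to the error on this example
        w[0] += rate * error * x[0]
        w[1] += rate * error * x[1]
        b += rate * error

print([predict(x) for x, _ in training])   # [0, 1, 1, 1]
```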
The perceptron work was done in the late 1950s and led to the development of Support Vector Machines, which address two limitations of the Perceptron: the guarantee of memorization but not of generalization, and the inability to handle learning problems where no entirely correct classification exists. Support Vector Machines remain popular for machine learning today. Another very popular machine learning algorithm is logistic regression modeling, which also uses a version of the McCulloch-Pitts formal neuron.
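The connection to the McCulloch-Pitts formal neuron can be made concrete with a small sketch of logistic regression: a weighted sum of the inputs is passed through a sigmoid rather than a hard threshold, and the weights are adjusted by gradient descent. The tiny data set and the learning settings below are invented for illustration.

```python
# A minimal sketch of logistic regression as a single "soft" formal neuron:
# weighted sum -> sigmoid, trained by gradient descent on an invented data set.

import math

data = [((0.0, 1.0), 0), ((1.0, 1.0), 0), ((2.0, 2.5), 1), ((3.0, 2.0), 1)]
w = [0.0, 0.0]
b = 0.0
rate = 0.5

def prob(x):
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z))     # sigmoid instead of a hard threshold

for step in range(2000):
    for x, y in data:
        p = prob(x)
        # gradient of the log-loss for one example is (p - y) times the input
        w[0] -= rate * (p - y) * x[0]
        w[1] -= rate * (p - y) * x[1]
        b -= rate * (p - y)

print([round(prob(x), 2) for x, _ in data])   # close to [0, 0, 1, 1]
```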
Latent Semantic Analysis
A large class of machine learning algorithms learn the “semantic meaning” of a verbal code such as “dog” in a similar manner. However, the key idea for this specific class of machine learning algorithms is that a “situation” is defined in terms of the frequency with which the “verbal code” is used across a collection of documents. This collection of documents is called a “corpus” and can be interpreted in this context as an abstract model of the collection of situations that a language learner might experience.
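A minimal sketch of this idea is shown below: each word is represented by its counts across a tiny, invented corpus of documents, and a truncated singular value decomposition compresses those counts so that words used in similar documents end up with similar vectors. It assumes the numpy library; the vocabulary and counts are made up.

```python
# A minimal sketch of the Latent Semantic Analysis idea: word-by-document
# counts compressed by a truncated SVD, so that words with similar usage
# patterns get similar vectors. The corpus counts are invented.

import numpy as np

vocabulary = ["dog", "puppy", "car", "engine"]
counts = np.array([          # rows = words, columns = documents
    [2, 3, 0, 0],            # "dog"
    [1, 2, 0, 0],            # "puppy"
    [0, 0, 3, 1],            # "car"
    [0, 0, 1, 2],            # "engine"
], dtype=float)

# keep the 2 largest singular values: each word becomes a 2-dimensional vector
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
word_vectors = U[:, :2] * S[:2]

def similarity(a, b):
    va = word_vectors[vocabulary.index(a)]
    vb = word_vectors[vocabulary.index(b)]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

print(round(similarity("dog", "puppy"), 2))   # high: similar usage pattern
print(round(similarity("dog", "engine"), 2))  # low: different usage pattern
```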
Deep Learning[1]
The essential goal of most common deep learning methods is to learn a useful set of features for representing the learning machine’s environment, with the purpose of improving the learning and generalization performance of the learning machine. Deep learning methods focus on learning machines that have multiple layers of processing units, that is, multi-layer neural networks. The inputs to the system are processed by the “input units”, whose features are chosen by the artificial intelligence engineer. The outputs of the input units are fed into the first layer of “hidden units”, the outputs of the first layer of hidden units are fed into the second layer of hidden units, and so on, with each layer feeding the next, until the final layer of hidden units is reached. At that point the output of the final layer of hidden units is used to make a prediction, if the learning machine is a supervised learning machine.
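A minimal sketch of that layered forward pass is given below: each layer multiplies its inputs by a weight matrix and applies a nonlinearity, and the output of one layer becomes the input to the next. The layer sizes are arbitrary and the weights are random placeholders; in a real system they would be learned from data. The sketch assumes numpy.

```python
# A minimal sketch of a multi-layer forward pass: input units, three hidden
# layers, and an output layer. Weights are random placeholders standing in
# for values that would normally be learned from data.

import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [8, 16, 16, 16, 4]   # input units, three hidden layers, output units
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    activation = x
    for i, W in enumerate(weights):
        z = activation @ W
        # hidden layers apply a nonlinearity; the final layer is left linear
        activation = np.tanh(z) if i < len(weights) - 1 else z
    return activation

features = rng.normal(size=8)   # engineer-chosen input features (placeholder values)
print(forward(features))        # the final layer's output, used for the prediction
```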