NETtalk (artificial neural network)
NETtalk is best known as a program that learns to pronounce written English text. The network receives text as input and is given the matching phonetic transcriptions for comparison. NETtalk is an artificial neural network (ANN). It grew out of a research project carried out in the mid-1980s by Terry Sejnowski and Charles R. Rosenberg. The aim of the project was to construct a simplified ANN that would shed light on the complexity of learning human-level cognitive tasks, and to implement it as a connectionist model that could learn to perform a comparable task.
Connectionist approach
ANNs can be viewed as computer simulations of how neurons in the human brain perform different tasks. There are various reasons for using these networks. First, they are intrinsically learning systems, so they are able to acquire knowledge efficiently. Second, since they are simulations of the natural brain, it is relatively easy to take ideas from neurobiology and apply them to the network. One of the main ideas is parallelism, which allows the system to activate a large number of neurons simultaneously and so process many things at once. Lastly, ANNs have gained popularity because they can be embedded in physical robots [1].
NETtalk is a connectionist model, which is a type of ANN. The core principle behind connectionism is that mental phenomena can be described by interconnected networks of simple and often uniform units. NETtalk has been described as a first-order connectionist model, that is, a model that lacks genuine knowledge of abstract categories (such as the idea of a vowel) [2]. This type of model provides the basis for an alternative model of human cognition. NETtalk is one of the best-known examples of a first-order connectionist system [2]. The core goal of the model is to transform text into spoken language. It does so from a connectionist perspective, which takes information about how the mind works and transfers it into a computational model.
Network layers
ANNs are composed of layers. NETtalk is a multi-layer network consisting of three main layers: an input layer, a hidden layer, and an output layer (refer to figure). The input layer receives information; the nodes in this layer activate the nodes in the hidden layer, which in turn activate the nodes in the output layer [3]. What makes this type of network special is the hidden layer. This layer performs intermediate processing, and it is the layer that allows the network to perform complex computations. The network learns by the back-propagation learning rule. The system contains a “teacher” which compares the response given by the output layer to a target response. The difference between the two is known as the error signal. The links between the layers carry weights. When the error signal is sent back through the system, it causes the system to adjust the connection weights in such a way that the output moves toward the correct one [4]. This procedure is repeated many thousands of times, allowing the system to learn the correct output.
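The learning loop described above can be sketched in a few lines of Python. This is a minimal illustration, not NETtalk's actual code: the layer sizes, the learning rate, the sigmoid units, the random initial weights, and the toy input/target patterns are all assumptions chosen to keep the example small.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    """Propagate one input pattern through the hidden and output layers."""
    xb = x + [1.0]  # a constant extra input acts as a bias
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    hb = hidden + [1.0]
    output = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_out]
    return hidden, output

def train_step(x, target, w_hidden, w_out, rate=0.5):
    """Apply one back-propagation update for a single pattern, in place."""
    hidden, output = forward(x, w_hidden, w_out)
    xb, hb = x + [1.0], hidden + [1.0]
    # Error signal at the output: (target - response) times the sigmoid slope.
    delta_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, output)]
    # Error signal passed back to each hidden unit through the output weights.
    delta_hid = [
        h * (1.0 - h) * sum(d * w_out[k][j] for k, d in enumerate(delta_out))
        for j, h in enumerate(hidden)
    ]
    # Adjust the connection weights so the output moves toward the target.
    for k, d in enumerate(delta_out):
        for j, v in enumerate(hb):
            w_out[k][j] += rate * d * v
    for j, d in enumerate(delta_hid):
        for i, v in enumerate(xb):
            w_hidden[j][i] += rate * d * v

def init_weights(n_units, n_inputs, rng):
    """Small random weights; one extra column per unit for the bias input."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
            for _ in range(n_units)]

def total_error(patterns, w_hidden, w_out):
    """Summed squared difference between targets and responses."""
    return sum(
        sum((t - o) ** 2 for t, o in zip(target, forward(x, w_hidden, w_out)[1]))
        for x, target in patterns
    )

rng = random.Random(0)
# Toy patterns (hypothetical): two inputs, one target value each.
patterns = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
            ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
w_hidden = init_weights(3, 2, rng)   # 3 hidden units, 2 inputs
w_out = init_weights(1, 3, rng)      # 1 output unit, 3 hidden units

error_before = total_error(patterns, w_hidden, w_out)
for _ in range(2000):                # repeated presentation of the patterns
    for x, target in patterns:
        train_step(x, target, w_hidden, w_out)
error_after = total_error(patterns, w_hidden, w_out)
```

The “teacher” here is simply the list of target values: each update compares the response with its target and nudges the weights, and comparing `error_before` with `error_after` shows the effect of repeating that procedure.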
NETtalk
NETtalk is a parallel network that was originally designed to read English text aloud. English spelling is inconsistent, but it does follow some rules. This structure makes the language well suited to study with an ANN, since these networks are good at picking up statistical regularities and are not bothered by occasional inconsistencies [5]. The system is first presented with a set of written letters that make up a word, and the final output is the pronunciation of the word presented. The output is then passed to a speech synthesizer, which is responsible for producing sounds [3]. It is important to keep in mind that while the network has been shown to produce correct pronunciations, it does not understand what it is reading, so it is not a form of artificial intelligence.
The input layer contains seven groups, and each group contains 29 units (each unit representing a letter). NETtalk can only process seven letters at a time. When it is processing a string of letters, NETtalk focuses on the fourth (middle) letter of the seven presented, and this is the letter that the network aims to pronounce [5]. The remaining letters serve as context that helps the network to settle on the correct pronunciation, because in English the sound of a letter depends heavily on the adjacent letters [6]. Once the input units have received their corresponding letters, they pass their activation on to the hidden nodes. There are 80 nodes in the hidden layer, each of which encodes part of the input data. The hidden layer connects to 26 nodes in the output layer. The pattern of activation of these 26 nodes represents the system’s initial response to the input letters. This pronunciation is then compared to the correct response, which is specified by a teacher. An error signal is sent back through the system, and the weights are then adjusted using the back-propagation algorithm.
The network shows a 95% accuracy rate on the text it was trained on, and a 78% accuracy rate when new text is tested. This demonstrates that it has a capacity for generalization. Lastly, when the system is damaged by researchers (by randomly changing the weights), the network is still fairly accurate, showing that it is resistant to degradation [5].
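The seven-letter input window can be sketched as follows. This is an illustrative encoding only: the section gives the sizes (seven groups of 29 units), but the exact symbol set is not listed here, so the three non-letter symbols in `ALPHABET`, the space padding at the edges of the text, and the function names are assumptions.

```python
import string

# 26 letters plus three assumed extra symbols, giving 29 units per group.
ALPHABET = string.ascii_lowercase + " ,."
WINDOW = 7              # letters presented to the network at once
CENTER = WINDOW // 2    # index 3: the fourth (middle) letter

def encode_window(window):
    """One-hot encode a 7-character window into 7 groups of 29 units."""
    if len(window) != WINDOW:
        raise ValueError("window must contain exactly 7 characters")
    units = []
    for ch in window:
        group = [0] * len(ALPHABET)
        group[ALPHABET.index(ch)] = 1   # exactly one active unit per group
        units.extend(group)
    return units

def windows(text):
    """Slide the window over the text, yielding (window, middle letter) pairs."""
    padded = " " * CENTER + text.lower() + " " * CENTER
    for i in range(len(text)):
        w = padded[i:i + WINDOW]
        yield w, w[CENTER]

pairs = list(windows("a cat"))
vector = encode_window(pairs[2][0])   # window centred on the letter "c"
```

Each window produces a vector of 7 × 29 = 203 input values, of which exactly seven are active. The middle letter is the one to be pronounced, and the three letters on either side supply the context.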
Advantages of NETtalk
Generalization
An advantage of neural networks such as NETtalk is their ability to generalize. This property gives the network the ability to apply a learned rule to a novel situation. NETtalk can respond appropriately to a novel set of words once it has learned the internal structure of the English language [7]. This is a major advantage, as it allows programmers to train the network with a fixed number of items and leave it to the network to generalize to a novel set of items, saving time and energy. For those who use NETtalk as a model to study human cognition, generalization is an important property because it resembles the behavior of biological systems [8], making the network a good model for studying how the human brain processes information.
Graceful Degradation
Graceful degradation refers to the idea that increased damage to the network results in a gradual decrease in performance [3]. If some nodes are destroyed, others can take over. Thus, slight damage to the network produces a small reduction in performance, while greater damage produces larger deficits. Graceful degradation allows two important things. First, it allows for relearning after damage: by retaining some of the weights associated with appropriate pronunciation, the network is able to relearn what it already knew. Second, a distinct pattern of activation may still occur even if the system has lost part of a set of units [8], allowing the system to still produce the desired output. Human patients who have suffered brain damage show a phenomenon similar to graceful degradation [3], giving researchers the opportunity to use NETtalk as a way to understand graceful degradation (as it relates to human language), and possibly to treat losses that occur after brain damage.
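The kind of damage described above, randomly changing the weights, can be illustrated with a short sketch. A single linear unit stands in for the full network here, and the weight values, the noise scales, and the function names `damage` and `response` are all hypothetical; the point is only that the response drifts further from the intact one as the damage grows, rather than failing all at once.

```python
import random

def damage(weights, scale, seed=0):
    """Return a copy of the weights with uniform noise in [-scale, scale] added."""
    rng = random.Random(seed)
    return [w + rng.uniform(-1.0, 1.0) * scale for w in weights]

def response(weights, inputs):
    """Output of a single linear unit (a stand-in for a full network)."""
    return sum(w * x for w, x in zip(weights, inputs))

weights = [0.8, -0.3, 0.5, 0.1]   # hypothetical trained weights
inputs = [1.0, 1.0, 0.0, 1.0]
intact = response(weights, inputs)

# How far the response moves from the intact one at each level of damage.
drift = {scale: abs(response(damage(weights, scale), inputs) - intact)
         for scale in (0.0, 0.1, 0.5)}
```

Because the same seed is used for every scale, the three runs differ only in how strongly the weights are perturbed, so `drift` grows with the amount of damage.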
Disadvantages of NETtalk
Stability-plasticity dilemma
NETtalk encounters a problem known as the stability-plasticity dilemma, which was identified by Stephen Grossberg. It states that a network should be plastic enough to store novel input patterns and, at the same time, stable enough to prevent previously encoded patterns from being erased [9]. NETtalk seems to be caught in this dilemma, and while this can help provide insights into interference in human memory, it can also be problematic. The problem arises when the network is unable to remember previously learned information because new information is being stored.
Catastrophic interference
Another disadvantage of NETtalk, and of neural networks in general, is what has been labelled catastrophic interference. This happens when a network that has already been trained on one set of patterns is then trained on a new set of information [10]. The learning of the new information modifies the weights in such a way that the original knowledge is forgotten. Although some solutions have been proposed (refer to French (1992) study[10]), if the problem is not fixed, the loss of the original knowledge can be catastrophic.
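A toy example makes the effect concrete. The sketch below is not NETtalk: it uses a single linear unit trained with the delta rule, and the two patterns, the learning rate, and the number of steps are assumptions picked so the arithmetic is easy to follow. The unit first learns pattern A, then is trained only on pattern B using the same weights.

```python
def output(weights, x):
    """Response of one linear unit."""
    return sum(w * v for w, v in zip(weights, x))

def train(weights, x, target, steps=200, rate=0.1):
    """Delta-rule training of the unit on a single pattern, in place."""
    for _ in range(steps):
        error = target - output(weights, x)
        for i, v in enumerate(x):
            weights[i] += rate * error * v

pattern_a = ([1.0, 0.5], 1.0)    # learned first
pattern_b = ([0.5, 1.0], -1.0)   # learned second, through the same weights

weights = [0.0, 0.0]
train(weights, *pattern_a)
error_a_before = (pattern_a[1] - output(weights, pattern_a[0])) ** 2

train(weights, *pattern_b)       # no further exposure to pattern A
error_a_after = (pattern_a[1] - output(weights, pattern_a[0])) ** 2
```

After the first phase the unit reproduces pattern A almost exactly. After the second phase the weights have been pulled toward pattern B, and the error on pattern A is large again even though nothing about A itself changed.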
What the network does NOT do
NETtalk does not understand what it is saying. For example, the network cannot tell whether a presented letter is a vowel: if it is given an input and asked to output “yes” when the input is a vowel and “no” when it is not, NETtalk is unable to make this response [4]. Researchers have also identified another shortcoming: the network lacks the ability to make what have been labeled “structure-transforming (ST) generalizations.” It is able to perform structure-preserving (SP) generalizations, which take acquired knowledge (such as adding -ed endings to form past-tense words) and transfer it to new cases [4]. An ST generalization, by contrast, would take what the network had learned about past tenses and apply it in reverse, removing the ending from a past-tense word to recover the stem. A network would demonstrate ST generalization if, presented with the word talked, it could remove the “ed” ending to recover the stem “talk.” NETtalk lacks this ability. Lastly, the network does not always accurately model how the human brain works. Because NETtalk is a supervised network, it has a “teacher” telling it what the correct output is, whereas humans do not always have somebody providing them with the right answer [11]. In addition, there is currently no evidence that human brains feed an error signal back through the system to modify the connections between neurons, as back-propagation does [12].
NETtalk and memory
As an ANN, NETtalk was biologically inspired. This allows the network to simulate processes in the human brain, helping researchers use the model as a way to understand how humans use their memory. From a neuroscience perspective, there is an easy analogy between networks that display parallelism and neuronal circuitry. Nodes and the connections between them can be thought of as neurons and synapses, respectively. The changes that occur in neuronal structures due to experience can be thought of as changes in the weights of the connections between the nodes in the NETtalk network [13]. NETtalk has been used to study what memory researchers call the spacing effect [14]. In NETtalk, memory representations are shared among many nodes, and the representations are learned by continuous practice. Similarly, in humans, distributed practice has been shown to be more effective for long-term retention than massed practice. It has been found that NETtalk also benefits from distributed practice, with results strikingly similar to those of human experiments [14]. This suggests that the spacing effect can be explained using distributed representations in massively parallel network architectures. The network serves as a physiological model (in how the nodes resemble neurons) and as a psychological model (in how the model learns). While NETtalk has been successful at helping researchers study the spacing effect, it fails to mimic other aspects of memory. For example, NETtalk learns more quickly than humans do [13]. To date, NETtalk has only been successful at modeling the spacing effect of memory. This may be because the network is too powerful [15]: it models human processing to the extreme and captures a much larger class of processing, human and nonhuman alike. The power of the model could be used in future memory research, once more sophisticated theories are developed that the system could model.
Due to its high generalization performance, NETtalk has also been used as a point of comparison for new networks designed to research human memory. For example, in 2003 NETtalk helped in the development of memory-based reasoning (MBR) models [16].
NETtalk, speech and language
Speech and language in humans
The text-to-speech conversion used by NETtalk has served as an analogy for some aspects of human speech development. One can examine the output errors the network makes in relation to its input and compare them with the errors that appear in young children (for example, in identifying phonemes and in planning the series of steps needed to produce language [17]). It has been found that the network tends to make the kinds of errors observed in children’s speech. For example, the network would sometimes substitute a more neutral unstressed vowel for a stressed vowel, just as humans do [17]. Researchers have also used NETtalk to study second language acquisition. When the network learned Spanish as a second language, with English as the first, its performance was as good as when it was trained with Spanish as the first language. However, when the network was first trained with Spanish and then with English, it performed poorly [17]. This could give insight into how individuals from different backgrounds might learn a second language.
Lasting Impact
Although not a lot of research has been conducted on NETtalk, the research that has been done suggests that NETtalk might not be a good generalizer after all. A generalizer named HERBIE (Heuristic Binary Engine) was built from a mathematical theory of generalization and used by researchers as a benchmark for measuring the generalization efficacy of ANNs on real-world tasks [18]. Researchers found that the back-propagation algorithm used by NETtalk does not perform outstandingly well on the task of reading English text aloud. They recommend that those conducting research on NETtalk direct their efforts at creating systems that generalize well, as the results show that NETtalk is in fact a poor generalizer [18]. NETtalk is also slow (because of the number of trials it needs before it starts to learn associations), and new learning can override old knowledge (which sometimes makes it hard to use as a psychological model of human memory). Even so, NETtalk is easy to use and can be applied to a wide range of data, which explains its popularity among researchers. Research seems to be moving toward developing other networks similar to NETtalk. NETtalk is likely to be used as a benchmark for future projects, and it will serve as a means of creating better programs for pronouncing English text. However, the research does not seem to be pointing toward improving the network itself.
References
1. Pfeifer, R., & Scheier, C. (1999). Understanding intelligence. (pp. 139-177). United States of America: Massachusetts Institute of Technology
2. Clark, A. (1995). Connectionism and cognitive flexibility. In Graham, G., & Stephens, G. (Eds.), Philosophical Psychopathology. United States of America: Massachusetts Institute of Technology
3. Friedenberg, J., & Silverman, G. (2012). Cognitive Science: An introduction to the study of mind. (pp. 187-230). United States of America: SAGE Publications
4. Clark, A. (1993). Associative Engines. (pp. 41-86). United States of America: Massachusetts Institute of Technology
5. Sejnowski, T., & Rosenberg, C. (1986). NETtalk: A parallel network that learns to read aloud (Technical Report JHU/EECS-86/01). Baltimore, MD: Johns Hopkins University Press
6. Spitzer, M. (1999). The mind within the Net. (pp. 19-38). United States of America: Massachusetts Institute of Technology
7. Long, D., Parks, R., & Levine, D. (1998). An introduction to neural network modeling: Merit, limitations and controversies. In Parks, R., Levine, D., & Long, D. (Eds.), Fundamentals of neural network modeling. United States of America: Massachusetts Institute of Technology
8. Pfeifer, R., & Scheier, C. (1999). Understanding intelligence. (pp. 139-177). United States of America: Massachusetts Institute of Technology
9. Grossberg, S. (1987). Competitive learning: From interactive activation to adaptive resonance. Cognitive Science, 11, 23-63
10. French, R. (2001). Catastrophic interference in connectionist networks. In Macmillan Encyclopedia of the Cognitive Sciences (pp. 611-615). London: Macmillan
11. Barto, A., Sutton, R., & Anderson, C. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man & Cybernetics, 13, 834-846
12. Bechtel, W., & Abrahamsen, A. (1991). Connectionism and the mind: An introduction to parallel processing in networks. Cambridge, MA: Basil Blackwell
13. Hirst, W., & Gazzaniga, M. S. (1988). Present and future memory research and its applications. In Gazzaniga, M. S. (Ed.), Perspectives in memory research. United States of America: Massachusetts Institute of Technology
14. Rosenberg, C., & Sejnowski, T. (1986). The spacing effect on NETtalk, a massive-parallel network. Proceedings of the Eighth Annual Conference of the Cognitive Science Society (Hillsdale, New Jersey: Lawrence Erlbaum Associates), 72-89
15. Peters, S., & Ritchie, R. (1973). On the generative power of transformational grammars. Information Sciences, 6, 49-83
16. Waltz, D. L. (2003). Memory-Based Reasoning. In Arbib, M. (Ed.), The handbook of brain theory and neural networks. United States of America: Massachusetts Institute of Technology
17. Tenorio, M. F., & Tom, M. D. (1990). Adaptive networks as a model for human speech development. Computers in Human Behavior, 6, 291-313
18. Wolpert, D. (1990). Constructing a Generalizer Superior to NETtalk via a Mathematical Theory of Generalization. Neural Networks, 3, 445-452