Over the last decade, improvements in optical and infrared sensor technology, laser generation and control technologies, and most areas of electro-optics have increased the need for better control, detection, and recognition technologies. One promising emerging technology is artificial neural networking.
The first neural networking attempts date back to the mid-1960s, some software products have been available since the late 1970s, and many companies now offer products. A neural network is a collection of "cells" of information called neurons, in which each neuron receives and then holds certain information. When presented with a problem, each neuron can respond that it recognizes the information or, if it has not yet learned anything, can take that opportunity to learn some part of the problem. In more sophisticated algorithms, the neuron can even refine information that it has already learned.
Because each neuron "knows" only part of the problem, every neuron must be able to "talk" to all the others and share its information so that the network can make recognition decisions. The more complex the problem, the more neurons are needed.
Artificial neural networks as implemented in software, however, have been disappointing. Although software neural networks have become more practical as computer processor speeds and memory have increased, there are fundamental reasons why they have not met the needs of real-time applications: processor-speed limitations, interneuron associations, and the need to program complex recognition algorithms. Simply put, even a 200-MHz Pentium processor is still at least two orders of magnitude too slow to address real-time problems such as video, voice, and most signal-recognition applications. And as the size of the neural network increases, the network slows even more. One solution is to use parallel processors to multiply the available computing power; however, this exacerbates the problem of interneuron associations. Thus, running the software on conventional computers creates a "vicious circle" in which solving one problem only aggravates another.
In 1993, Neuroptics Technologies (NTI, Santa Rosa, CA) began development of a silicon-based solution, addressing speed, the ability to increase the neural network size with no decrease in speed, algorithms maximizing performance, and adaptive learning to eliminate problem characterization and complex software programs. In 1994, NTI entered into a partnership with IBM (Paris, France) to develop the zero instruction computer (ZISC), which NTI calls programmable adaptive learning memory (PALM). The PALM is a 144-pin surface-mount semiconductor chip furnished on boards containing one to 16 of the devices. The boards, in turn, can be linked to up to nine other boards, increasing the neural network size with no decrease in recognition speed and with full interneuron association.
There are currently two silicon-based neural network devices commercially available. The PALM-36 is a digital implementation with 36 neurons and 2304 synapses in each chip. Operating at 20 MHz, it compares to biological systems. An analog implementation is the Intel ETANN 80170, which has 64 neurons and 10,240 synapse weights. The digital solution theoretically offers neural network expansion without limit and with no decrease in recognition speed. Whether a single chip, a full-board configuration, or 10 linked boards with a network of 5760 neurons is implemented, the recognition time stays constant at approximately 30 µs because each neuroprocessor and its timing are independent of the others.
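The network sizes quoted above follow from simple multiplication, which the short sketch below reproduces. The per-chip, per-board, and linked-board figures come from the text; the function name `network_size` is only an illustrative choice.

```python
NEURONS_PER_CHIP = 36     # PALM-36, from the text
SYNAPSES_PER_CHIP = 2304  # from the text
CHIPS_PER_BOARD = 16      # a fully populated board
MAX_LINKED_BOARDS = 10    # one board linked to up to nine others


def network_size(boards, chips_per_board=CHIPS_PER_BOARD):
    """Total neurons available for a given number of boards."""
    return boards * chips_per_board * NEURONS_PER_CHIP


print(network_size(1, chips_per_board=1))     # single chip: 36
print(network_size(1))                        # full board: 576
print(network_size(MAX_LINKED_BOARDS))        # ten linked boards: 5760
print(SYNAPSES_PER_CHIP // NEURONS_PER_CHIP)  # synapses per neuron: 64
```

Only the neuron count scales with the configuration; according to the text, the recognition time of roughly 30 µs does not.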
The instructionless learning capability of digital neural networks reduces the engineering analysis required for characterization tasks such as event detection and allows adaptive learning of variations of an event or feature. One such learning method is the radial basis function (RBF), a compound classifier that permits the neural network to automatically shrink its recognition criteria as it is presented with events that could otherwise be mistaken for the event or feature of interest.
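The shrinking of recognition criteria can be illustrated with a minimal sketch. It assumes that each neuron stores a prototype vector and a radius (here called its "field"), and that distance is measured as a sum of absolute differences; the class, the function names, and the distance metric are illustrative assumptions, not a description of the PALM internals.

```python
def l1_distance(a, b):
    """Sum of absolute component differences (assumed distance metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


class Neuron:
    def __init__(self, prototype, category, field):
        self.prototype = list(prototype)
        self.category = category
        self.field = field  # radius of the region this neuron recognizes

    def fires(self, vector):
        return l1_distance(self.prototype, vector) < self.field


def learn_counterexample(neurons, vector, category, min_field=1):
    """Shrink every neuron of another category that wrongly fires on vector."""
    for neuron in neurons:
        distance = l1_distance(neuron.prototype, vector)
        if neuron.category != category and distance < neuron.field:
            neuron.field = max(distance, min_field)


event = Neuron([10, 10, 10], "event", field=12)
look_alike = [14, 12, 11]              # distance 7 from the prototype

print(event.fires(look_alike))         # True: falls inside the field
learn_counterexample([event], look_alike, "background")
print(event.field)                     # 7: field shrunk to the look-alike
print(event.fires(look_alike))         # False: now excluded
print(event.fires([11, 10, 10]))       # True: close vectors still recognized
```

The neuron keeps recognizing vectors near its prototype while excluding the look-alike, with no change to the stored prototype itself.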
Detection is based on histogram analysis of the entire field of regard, whether that is a screen of video imagery, a signal monitored over a period of time to detect an event, or the measurement and classification of a response. This method is unique in that once the network has been trained to look for a certain response or pattern, it can then measure the entire field of regard and produce a "distance" measurement from the prototype data (learned data) stored as a vector.
There may be more than one candidate event or pattern, each of which will produce a distance measurement output. The output is therefore not limited to a simple digital-type ("1" or "0") deterministic result. The user is able to classify a response to an event or pattern as either target or background information, which can then further refine the network's learning and accuracy.
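A small sketch shows how a graded distance output differs from a yes/no answer. The four-bin histograms, the labels, and the sum-of-absolute-differences metric are made-up illustrations under stated assumptions, not values or methods taken from the PALM hardware.

```python
def l1_distance(a, b):
    """Sum of absolute component differences (assumed distance metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_candidates(prototypes, vector):
    """Return (distance, label) for every stored prototype, nearest first."""
    return sorted((l1_distance(proto, vector), label)
                  for label, proto in prototypes)


# Hypothetical learned histograms, one per labelled prototype.
prototypes = [
    ("target", [8, 2, 0, 6]),
    ("background", [1, 1, 7, 7]),
]

measured = [7, 3, 1, 5]
for distance, label in rank_candidates(prototypes, measured):
    print(label, distance)
# target 4
# background 16
```

Because every candidate reports a distance, the user can see how close a near miss was and label it as target or background for further learning.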
One application ideal for neural computing is learning and recognizing a generalized section of an image, such as part of a face. The technique is to extract features of a fixed-size component of the face (an eye, for instance) and retrieve this part in a sequence of images undergoing limited translation, scale, and rotation effects. The network is shown which face component is desired by dragging a square 16 × 16-pixel gate to the region of interest in an image.
Thereafter, the program will extract salient features from this small pixel matrix (see Fig. 1). These key features are a gray-level distribution histogram and vertical and horizontal gray-level accumulations, with each of the three allocated a separate neural network. But these features can be very weak or indistinct and may be similar to other features as well.
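The three feature vectors can be sketched as follows. This is an illustration under assumptions: the 16-bin histogram over 256 gray levels, and the reading of "accumulations" as per-column and per-row sums, are guesses at details the text does not specify, and `extract_features` is a hypothetical name.

```python
def extract_features(window, bins=16, levels=256):
    """Compute three feature vectors from a square gray-level window.

    window is a list of rows, each a list of integers in [0, levels).
    Returns (histogram, vertical, horizontal).
    """
    histogram = [0] * bins
    for row in window:
        for gray in row:
            histogram[gray * bins // levels] += 1
    vertical = [sum(column) for column in zip(*window)]  # one sum per column
    horizontal = [sum(row) for row in window]            # one sum per row
    return histogram, vertical, horizontal


# Synthetic 16 x 16 window in which every gray level 0..255 appears once.
window = [[16 * r + c for c in range(16)] for r in range(16)]
histogram, vertical, horizontal = extract_features(window)

print(histogram)       # sixteen bins of 16 pixels each
print(vertical[:3])    # [1920, 1936, 1952]
print(horizontal[:3])  # [120, 376, 632]
```

Each of the three vectors has only 16 components here, which is part of why such features can be weak: different image patches can produce similar histograms or sums.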
Learning occurs by showing the three networks the feature set describing the object in the pointing window. If this object is unknown, the network automatically commits a new neuron that "records" the target feature vector. In this case there are three different feature vectors, and each is learned by a different logical neural network.
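The commit-on-unknown behavior can be sketched as below. The `Network` class, the fixed initial field of 20, the sum-of-absolute-differences metric, and the three short feature vectors are all illustrative assumptions rather than details of the actual system.

```python
def l1_distance(a, b):
    """Sum of absolute component differences (assumed distance metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


class Network:
    def __init__(self, initial_field):
        self.neurons = []  # committed neurons: (prototype, category, field)
        self.initial_field = initial_field

    def recognize(self, vector):
        """Return the category of the nearest firing neuron, or None."""
        best = None
        for prototype, category, field in self.neurons:
            distance = l1_distance(prototype, vector)
            if distance < field and (best is None or distance < best[0]):
                best = (distance, category)
        return None if best is None else best[1]

    def learn(self, vector, category):
        """Commit a new neuron only if the vector is not yet recognized."""
        if self.recognize(vector) == category:
            return False
        self.neurons.append((list(vector), category, self.initial_field))
        return True


# One logical network per feature type; the vectors are made-up examples.
names = ("histogram", "vertical", "horizontal")
networks = {name: Network(initial_field=20) for name in names}
features = {"histogram": [4, 0, 12],
            "vertical": [30, 32, 31],
            "horizontal": [29, 35, 29]}

first = {name: networks[name].learn(features[name], "eye") for name in names}
second = {name: networks[name].learn(features[name], "eye") for name in names}
print(first)   # every network commits a neuron for the unknown object
print(second)  # nothing new is committed for an already known object
```

Showing the same object twice leaves each network with a single neuron, so the network grows only when it meets something it does not already recognize.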
When a new image is presented to the software, the pointing window scans the full image in increments of four pixels in both the vertical and horizontal directions. The software extracts features at every location, and recognition is performed by feeding each feature vector to the corresponding network. If all three networks find sufficient similarity with the learned feature set, a "hypothesis" is generated. After the entire screen has been scanned, all hypotheses are compared, and the one with the closest distance to the prototype is taken as the best guess for the recognized position.
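The scan-and-select procedure can be sketched end to end on a synthetic image. Several choices here are assumptions: the feature definitions, the per-network thresholds in `fields`, and ranking hypotheses by the sum of the three distances (the text does not say how the three distances are combined).

```python
WINDOW = 16  # size of the pointing window, from the text
STEP = 4     # scan increment in pixels, from the text


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def extract_features(image, top, left, bins=16):
    """Histogram, column sums, and row sums of one window (assumed features)."""
    rows = [row[left:left + WINDOW] for row in image[top:top + WINDOW]]
    histogram = [0] * bins
    for row in rows:
        for gray in row:
            histogram[gray * bins // 256] += 1
    vertical = [sum(column) for column in zip(*rows)]
    horizontal = [sum(row) for row in rows]
    return histogram, vertical, horizontal


def scan(image, prototypes, fields):
    """Return (total_distance, top, left) of the best hypothesis, or None."""
    hypotheses = []
    height, width = len(image), len(image[0])
    for top in range(0, height - WINDOW + 1, STEP):
        for left in range(0, width - WINDOW + 1, STEP):
            features = extract_features(image, top, left)
            distances = [l1_distance(f, p)
                         for f, p in zip(features, prototypes)]
            # A hypothesis requires agreement from all three networks.
            if all(d < limit for d, limit in zip(distances, fields)):
                hypotheses.append((sum(distances), top, left))
    return min(hypotheses) if hypotheses else None


# Synthetic 32 x 32 image: dark background with one textured 16 x 16 patch.
image = [[0] * 32 for _ in range(32)]
for r in range(WINDOW):
    for c in range(WINDOW):
        image[8 + r][12 + c] = 100 + 8 * r + c

prototypes = extract_features(image, 8, 12)  # "learned" from the patch
print(scan(image, prototypes, fields=(64, 2000, 2000)))  # (0, 8, 12)
```

With a four-pixel step, a 32 × 32 image yields only 25 window positions, so the step size trades positional precision against the number of recognitions per frame.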
Other applications
Digital neural networking is able to deal with almost any kind of vector classification; any signal and data recognition application is straightforward because the technology decouples the application solution from complex software and associated technologies. Board systems have been used in predictive maintenance applications, endpoint and event detection, and video applications (see Fig. 2).
In addition to the histogram algorithms used for the noted PALM applications, there are other emerging feature-extraction methods such as the wavelet transform (see Laser Focus World, Dec. 1995, p. 155). With recently announced wavelet processors, it will be possible to run applications at much higher speeds. Another technology growth area may be image frame-grabber boards, which could enhance applications in real-time image recognition and tracking. Neural networking relieves computers of tasks at which they are less efficient and frees programmers and engineers to better understand and solve the higher-level nature of the tasks they face.