AI - playground or industrial tool?

This question is being discussed a lot at the moment. The answer, as so often, lies in the eye of the beholder, because every user has different expectations of what a technology must achieve or bring to the table before it is accepted and ultimately used. In any case, the hardware needed for productive and efficient use is available. Many manufacturers of machine vision hardware have recognized this, and the range of AI platforms in different performance classes is growing steadily. But there are still teething problems. Providing hardware alone is not enough - a change in thinking is necessary!

What is AI missing?

It doesn't help that AI, or machine learning (ML), works quite differently from rule-based image processing, so the way vision tasks are approached and solved changes as well. The quality of the results is no longer the product of manually developed programme code, but is determined by the learning process and the image data it is fed with. What sounds simple only leads to the desired goal with sufficient expertise and experience. Without a trained eye for the right data, errors creep in, and ML methods end up being applied incorrectly. Tests have shown that different users achieve very different training quality of the artificial neural networks (ANN) for the same task, because in some cases images with too much irrelevant content, poor exposure, blur or even wrong labels were used for training.
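
Many of these data problems can be spotted with very simple checks before training even starts. The following is a minimal sketch, not part of any IDS tooling, that uses OpenCV to flag obviously blurred or poorly exposed images in a hypothetical "training_images" folder; the thresholds are illustrative assumptions and would need tuning per application.

# Minimal sketch: flag obviously blurred or badly exposed training images
# before they reach the labeling and training stage.
from pathlib import Path

import cv2  # OpenCV

BLUR_THRESHOLD = 100.0              # variance of the Laplacian below this -> likely blurred (assumed value)
DARK_LIMIT, BRIGHT_LIMIT = 40, 215  # mean gray values outside this band -> poor exposure (assumed values)

def check_image(path: Path) -> list[str]:
    """Return a list of quality issues found for one training image."""
    issues = []
    image = cv2.imread(str(path))
    if image is None:
        return ["unreadable file"]
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    if cv2.Laplacian(gray, cv2.CV_64F).var() < BLUR_THRESHOLD:
        issues.append("image appears blurred")
    mean_brightness = gray.mean()
    if not DARK_LIMIT <= mean_brightness <= BRIGHT_LIMIT:
        issues.append(f"poor exposure (mean gray value {mean_brightness:.0f})")
    return issues

if __name__ == "__main__":
    for image_path in sorted(Path("training_images").glob("*.png")):
        problems = check_image(image_path)
        if problems:
            print(f"{image_path.name}: {', '.join(problems)}")

Checks like these do not replace a trained eye, but they catch the most common flaws before they quietly degrade the training result.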

The key competences for working with ML methods are no longer the same as for rule-based image processing and must therefore be built up specifically. Those who have the time and resources to work, test and play with it now are gaining exactly that experience and learning the pitfalls. That is probably why it is mostly new and young companies working with it at the moment. They have no legacy, are not bound to existing processes and, sometimes playfully and with plenty of exploratory drive, venture into tasks for which classical image processing has so far offered no solution. But as long as the big players do not introduce the new technology to customers across the board and make a strong case for it, knowledge and trust will remain scarce - on the customer side as well. To get the "old hands" out of their comfort zone, something has to change. AI is up against an established system for which favourable conditions have been created over many years: knowledge, documentation, training, hardware, software, development environments, customer acceptance and demand all had a long time to mature. AI, by contrast, still comes across as raw and spartan. Those who master it earn admiration and recognition - but also puzzled faces and incomprehension.

Another important aspect for the future is new target groups. Dr.-Ing. Peter Ebert, editor-in-chief of the trade journal inVISION and an expert on the scene, says: "The vision community of the future is not only made up of classic image processing experts, but is also gaining new members from the IoT sector." New user groups inevitably bring different use cases and new requirements for existing technologies. The classic programming SDK is not always enough. Old rules have to be broken!

Software as a pioneer

There is no lack of suitable hardware! Efficient AI accelerators make ML particularly interesting for small, low-power embedded vision systems and fully integrated inference camera platforms, such as some manufacturers already offer. But that alone does not cure the new technology's teething problems in industry. The AI must be tested, validated, re-trained and finally integrated into a productive workflow - the application. But who does that? Who can? These are essentially the same recurring tasks, yet they call for competences that go beyond prototype development. In most cases they still require a system programmer who can implement these steps for a specific platform.

IDS is taking a different approach with the IDS NXT platform and is confident that, with the right, coordinated tools, every user group can fully exploit the potential of an AI vision platform without spending a lot of time and money building up core competencies that are not strictly necessary for implementing their own AI applications. What does that mean? For many simple AI workflows, the specialist knowledge needed to train neural networks and program applications can be packed into the tools themselves. In this way, every user can implement their individual requirements without having to build up their own team of specialists. The software lets each user group work with the tools that suit their tasks and working methods.

An image processing application is the framework that arranges several specific individual tasks into an application-specific sequence and executes them reliably and efficiently. Classically, developers program this sequence in code suited to the platform, such as C++. The situation is not so different for the new AI platforms coming onto the market. An SDK (Software Development Kit) provides the software interfaces for hardware-related programming of the respective platform and its AI accelerators. In most cases this gives application developers a largely freely programmable platform for their own process solutions. For those who know what they are doing, only hardware performance and the SDK limit their creativity. The IDS NXT inference camera is also an open platform for developers who want to design their own image processing applications in the form of vision apps with the deep ocean AI accelerator, thanks to a corresponding SDK with many C++ source code examples.

Application Wizard

However, the majority of image processing applications follow relatively simple processes: capture image → analyze image or extract features (image processing) → make process decisions → initiate action. This can be simple recognition and classification of products, followed by signaling or forwarding the information via various interfaces to a machine control or sorting system. These are basic functionalities that differ only in a few details and therefore would not have to be reprogrammed each time. Yet having to select a Deep Learning (DL) use case such as "classification" or "object detection" is often already too abstract an entry point for a project, because the further steps for data acquisition and vision app configuration cannot easily be derived from it.
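
To make the recurring pattern concrete, here is a minimal, hypothetical sketch of such a capture → analyze → decide → act loop. All classes and names are placeholders invented for illustration and are not the IDS NXT API.

# Minimal sketch of the recurring capture -> analyze -> decide -> act pattern
# behind many vision applications. All classes are hypothetical placeholders.
from dataclasses import dataclass
import random

@dataclass
class ClassificationResult:
    label: str
    confidence: float

class DummyCamera:
    """Stands in for an image source; returns a fake frame."""
    def capture(self):
        return [[random.random() for _ in range(64)] for _ in range(64)]

class DummyClassifier:
    """Stands in for a trained ANN running on an AI accelerator."""
    def predict(self, image) -> ClassificationResult:
        return ClassificationResult(random.choice(["good", "scratched"]), random.uniform(0.7, 1.0))

def inspect_once(camera, classifier, min_confidence=0.9) -> str:
    image = camera.capture()               # 1. capture image
    result = classifier.predict(image)     # 2. analyze image / extract features
    if result.label == "good" and result.confidence >= min_confidence:
        return "ACCEPT"                    # 3. make the process decision ...
    return "REJECT"                        # ... 4. which triggers an action, e.g. sorting

if __name__ == "__main__":
    decision = inspect_once(DummyCamera(), DummyClassifier())
    print("Sorting signal:", decision)     # would be forwarded to a machine control or sorting system

It is exactly this kind of boilerplate that an assistant can generate, leaving only the application-specific decisions to the user.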

IDS is therefore taking the path of making AI vision accessible and easy to use for the masses with IDS NXT inference cameras. App creation should be possible for every user group, whether programmer, image processing professional, machine operator or skilled worker. To this end, the cloud-based ANN (Artificial Neural Network) training software IDS lighthouse will gain a wizard with the next update that is oriented even more strongly to the user's actual problem and supports him or her with concrete recommendations for action. The wizard noticeably extends the target groups' scope of action and now accompanies every individual step of a machine vision application. Starting with the question "What do you want to do?", IDS lighthouse offers a selection of very application-oriented problems, such as "Count objects", "Check presence or absence" or "Check inspection points". In the background, the assistant selects the app base with the appropriate DL use case and suggests further actions to the user in order to gather the necessary information. In addition, useful tips, videos and instructions give the user the necessary background knowledge. Such "guided application creation" feels more like a tutorial than like classic app development. In the end, a fully customized vision app is available for download, which the user only needs to activate and launch on an IDS NXT camera.

"Puzzling" instead of programming

If you want to create more complex processes, you do not necessarily have to resort to C++ or other text-based programming languages. If the function library is packaged in visual blocks and a visual editor is added, processes can be put together like pieces of a puzzle, without having to worry about the exact syntax of the individual programming languages. Blockly is a Google project created precisely for this purpose. IDS uses Blockly and adds its own function blocks so that inference tasks of the camera can be assembled, construction-kit style, into sequences of any complexity.

With Blockly, multi-stage inspections with several ANNs can also be plugged together very easily in one programme sequence.

Thanks to the intuitive user interface of the Blockly editor, even beginners and laypeople quickly achieve results. The advantage of visual programming with this modular system over the application wizard is the ability to create your own sequences. Variables, parameters and AI results can easily be combined with mathematical calculations, conditional if/else statements and loops for repetitive actions. This also enables more complex workflows, such as a two-stage inspection of objects with multiple neural networks: an object detector provides a basic pre-sorting of different parts, followed by a detailed defect analysis by a second classifier that categorizes the parts even more specifically. Such processes are otherwise only possible with the VAC (Vision App Creator) and programming knowledge in C++.
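
Expressed in conventional code rather than blocks, such a two-stage workflow might look like the following sketch. The detector and classifier objects, part types and defect labels are illustrative assumptions, not IDS NXT functionality.

# Sketch of a two-stage inspection: an object detector pre-sorts parts,
# then a second classifier grades each detected part in detail.
from dataclasses import dataclass

@dataclass
class Detection:
    part_type: str      # e.g. "screw", "washer"
    crop: object        # image region containing the detected part

def two_stage_inspection(image, detector, defect_classifiers):
    """Return one (part_type, defect_label) entry per detected part."""
    results = []
    for detection in detector.detect(image):                 # stage 1: locate and pre-sort parts
        classifier = defect_classifiers.get(detection.part_type)
        if classifier is None:
            results.append((detection.part_type, "no detailed check configured"))
            continue
        defect_label = classifier.classify(detection.crop)   # stage 2: detailed defect analysis
        results.append((detection.part_type, defect_label))
    return results

class _StubDetector:
    """Placeholder for a trained object detection ANN."""
    def detect(self, image):
        return [Detection("screw", crop=image), Detection("washer", crop=image)]

class _StubDefectClassifier:
    """Placeholder for a trained classification ANN."""
    def classify(self, crop):
        return "ok"

if __name__ == "__main__":
    classifiers = {"screw": _StubDefectClassifier(), "washer": _StubDefectClassifier()}
    print(two_stage_inspection(object(), _StubDetector(), classifiers))

In the Blockly editor, the same logic is simply a loop block over the detector's results with an if/else block feeding each crop into the matching classifier.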

Another advantage of apps "puzzled together" in this way is how dynamically they can be used. Similar to Python, vision app programming with Blockly allows the "code" to be executed directly, as no time-consuming cross-compiling is necessary. An application created in IDS lighthouse can easily be refined interactively after initial tests - directly in the camera. Vision apps can even be designed there from scratch. This makes the visual app editor the ideal tool from the test and trial phase through to operational use.

From a fully automated application configurator, to a vision app construction kit with an intuitive visual interface, to completely free programming with a classic SDK, IDS NXT offers the right tool for every level of knowledge. This saves time and costs when commissioning and setting up individual image processing applications with artificial intelligence.

Explainable results

Despite the well-known advantages of AI vision and the high accuracy of the ANNs used, diagnosis in the event of an error is often difficult. Lack of insight into how they work and seemingly inexplicable results are the other side of the coin and inhibit the spread of the algorithms. ANNs are often wrongly perceived as black boxes whose decisions cannot be comprehended. "Although DL models are undoubtedly complex, they are not black boxes. In fact, it would be more accurate to call them glass boxes, because we can literally look inside and see what each component is doing." [Quote from "The black box metaphor in machine learning"]. Inference decisions of neural networks are not based on classical logical rules, and the complex interactions of their artificial neurons may not be easy for humans to follow, but they are nevertheless the results of a mathematical system and thus reproducible and analyzable. We simply lack the right tools to support us. It is precisely in this area that AI still has a lot of room for improvement, and it is precisely here that it becomes apparent how well the various AI systems on the market support the user in this endeavor.

IDS researches and works in this field together with institutes and universities to develop precisely these tools. The IDS NXT Experience Kit software already contains results of this collaboration. A visualization in the form of so-called attention maps (heat maps) makes critical decisions of the AI easier to understand, ultimately increasing the acceptance of neural networks in the industrial environment. It can also be used to recognize and avoid data biases learned during training (see figure "Attention Maps"). Statistical analyses using a so-called confusion matrix will also soon be possible, both in the cloud-based training software IDS lighthouse and in the IDS NXT camera itself, to make it easier to determine and understand the quality of the trained ANN. With the help of these software tools, users can trace the behavior and results of their IDS NXT AI back to weaknesses in the training data set and correct them specifically. This makes AI explainable and comprehensible for everyone.
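
As an illustration of what such a confusion matrix reveals, the following sketch compares the ground-truth labels of a small, made-up validation set with the predictions of a trained classifier using scikit-learn. The class names and label lists are invented example data, not output of IDS lighthouse.

# Sketch: a confusion matrix shows which classes a trained ANN mixes up.
from sklearn.metrics import confusion_matrix

classes = ["good", "scratched", "chipped"]

# ground truth vs. predictions of the trained network on a validation set (example data)
y_true = ["good", "good", "scratched", "chipped", "scratched", "good", "chipped"]
y_pred = ["good", "good", "scratched", "scratched", "scratched", "good", "chipped"]

matrix = confusion_matrix(y_true, y_pred, labels=classes)

# rows = actual class, columns = predicted class
print("actual \\ predicted:", classes)
for class_name, row in zip(classes, matrix):
    print(f"{class_name:>10}: {row.tolist()}")
# Off-diagonal entries (here: one "chipped" part predicted as "scratched")
# point to classes that need more or better training images.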

Attention Maps

Suitable for industrial use as a complete package

The enormous potential of artificial intelligence is beyond question. Available hardware in the form of inference cameras with AI accelerators also shows how efficiently it can already be used. But providing hardware alone is not enough to spread AI across the industry. Manufacturers are challenged to support users by sharing their expertise in the form of user-friendly software and integrated processes. Compared with classical image processing, whose best practices have matured over years and built up a loyal customer base with extensive documentation, knowledge transfer and many software tools, AI still has a lot of catching up to do - but the groundwork is being laid. Standards and certifications are also being worked on to further increase acceptance and understanding and to give AI a seat at the big table. Ultimately, everyone should familiarize themselves with the new technology so as not to miss the boat. IDS helps with this: with the IDS NXT Experience Kit, an embedded AI system is already available that any user group can quickly and easily operate as an industrial tool and use beneficially thanks to many user-friendly software tools - even without in-depth knowledge of machine learning, image processing or application programming.