Having been intimately involved in industrial and laboratory automation for the better part of 15 years, I have come to understand a few undeniable truths. A good developer can create well-behaved software systems if he or she does a little planning and adheres to that plan. The problem with PC-based automation control systems is that software is only half of the equation. These systems also have one or more real-world components (sensors, cameras, motors, solenoids, escapements) that require software control. This all seems relatively straightforward in theory, but a number of things tend to occur during the development of such systems that one might not always expect. I will try to break these issues down into two categories: timing and window of opportunity.
Timing
Anyone who has ever created a software system that communicates with a device understands the complexities of timing as they relate to software execution, threading, and serial and Ethernet communication. Let's start with the software. For one thing, not all operating systems are created equal. Windows in all its flavors uses a priority-based, preemptive scheduler that shares the CPU round-robin among threads of equal priority, real-time threads included. This means that you never really know when a particular thread will receive CPU time, so any attempt at real-time manipulation of external devices is basically a crapshoot. Fortunately, for most purposes the speed of execution and the availability of multiple CPUs make this issue moot. In most cases, if you have a margin of error of more than 100 ms you are probably OK. Anything less than this and you risk missing a critical real-world interaction in the system.
Linux, on the other hand, has two real-time scheduling policies: round-robin (SCHED_RR) and FIFO (SCHED_FIFO). Under FIFO a running thread keeps the CPU until it blocks, yields, or is preempted by a higher-priority thread, so it is far less likely to be interrupted at a bad moment. This brings up another important distinction: Linux has 99 separate real-time thread priority levels where Windows has only 16.
This leads us into a discussion about operating system determinism (a very hotly debated topic in control engineering). Suffice it to say that no operating system is completely deterministic; however, a system can be designed to deterministically satisfy its operating requirements. Determinism also applies to communication methods and protocols. It is measured in much the same way as operating system determinism: a communication link can never be completely deterministic, but it can have a known and predictable rate of transfer. As many of you might have guessed, this means that a system that relies on both an operating system and a communication method is by its very nature non-deterministic, especially when it runs software that dynamically creates processes and threads.
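
One rough way to see this scheduling uncertainty on your own machine is to run a fixed-period loop and record how late each wake-up actually is. The sketch below is only an illustration, not a benchmark: the function names `measure_lateness_ms` and `realtime_priority_range` are my own, the 10 ms period is an arbitrary assumption, and the priority query relies on Python's `os.sched_get_priority_min` and `os.sched_get_priority_max`, which are only available on some Unix-like systems (hence the guard).

```python
import os
import time


def measure_lateness_ms(period_s=0.01, iterations=50):
    """Run a fixed-period loop and record how late each wake-up was, in ms."""
    lateness = []
    next_deadline = time.monotonic() + period_s
    for _ in range(iterations):
        delay = next_deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # the OS decides when we actually resume
        lateness.append((time.monotonic() - next_deadline) * 1000.0)
        next_deadline += period_s  # fixed schedule, so errors do not accumulate
    return lateness


def realtime_priority_range():
    """Return (min, max) SCHED_FIFO priority, or None where the OS lacks it."""
    if not hasattr(os, "SCHED_FIFO"):
        return None  # e.g. Windows: no POSIX scheduling interface in Python
    return (os.sched_get_priority_min(os.SCHED_FIFO),
            os.sched_get_priority_max(os.SCHED_FIFO))


if __name__ == "__main__":
    samples = measure_lateness_ms()
    print(f"worst wake-up lateness: {max(samples):.3f} ms")
    print("SCHED_FIFO priority range:", realtime_priority_range())
```

The number worth watching is the worst lateness rather than the average, and it is worth running the loop while the machine is busy with other work, since that is when a control thread is most likely to be kept waiting.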
Window of Opportunity
All this talk about timing and determinism does not bode well for our PC-based control system. In fact, this type of control does have some limitations related to timing. The real problem is this: when software controls a device, and that device has to manipulate some material or part within a fixed time frame, we need to ensure that the combination of software execution, communication, and physical action happens within our window of opportunity. If the system cannot do this reliably and repeatably, we simply have to devise another way.
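
The window-of-opportunity test can be sketched as a simple worst-case budget check. This is a minimal illustration under my own assumptions: the helper names `worst_case_total_ms` and `fits_window` are hypothetical, and the timing samples are invented for the example rather than measured on a real system.

```python
def worst_case_total_ms(samples_by_stage):
    """Sum the slowest observed time of every stage, in milliseconds."""
    return sum(max(samples) for samples in samples_by_stage.values())


def fits_window(window_ms, samples_by_stage):
    """True only if even the worst observed case finishes inside the window."""
    return worst_case_total_ms(samples_by_stage) <= window_ms


# Hypothetical measurements in milliseconds, invented for illustration only.
measured = {
    "software": [12, 15, 48],
    "communication": [20, 22, 35],
    "action": [180, 185, 190],
}

print(worst_case_total_ms(measured))  # 48 + 35 + 190 = 273
print(fits_window(300, measured))     # True
print(fits_window(250, measured))     # False
```

Note that the averages of these made-up samples add up to less than 250 ms, so a check based on typical timings would pass a 250 ms window that the worst case actually misses. Budgeting against the slowest observed behavior of each stage is what "reliably and repeatably" demands.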
So let's look at an example:
Say we have a system that sorts bolts by size. As the bolts pass by on a conveyor, a high-speed vision system determines the size of each one (A or B). A subsequent process diverts only the B-size bolts onto another conveyor using a pusher. We have no more than 3 seconds between inspection and diversion. The inspection produces a result in 1 second, so that leaves us 2 seconds. This seems like a lot of time in cyberspace, and it is. The twist here is that we cannot assume that only one bolt or result will be in the queue at any given time. So now we have to introduce a sensor into our system to detect the presence of a bolt, read the result queue, and actuate the pusher.
Say we have two bolts in the queue spaced only 0.4 seconds apart. Now our system is put to the test, because we have approximately 100 ms to read the sensor, 100 ms to read the queue, and 200 ms to activate the pusher. I give the lion's share of the time to the pusher because it actually has to move. Not only that, but it has to reset its position from the previous bolt. In this case I would bet money that a Windows OS would be very unreliable. For one thing, we are polling the sensor, which takes time, and Windows may not give our polling thread priority, so our thread stalls for 30 ms. Oops! We missed the bolt. I have seen this very thing happen. In fact, the only definitive way to remedy this situation is to augment the system with a PLC (Programmable Logic Controller) and port as much of the logic as possible to ladder logic. This is not the end of the world, mind you, but it can add to the overall project budget.
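
The arithmetic behind that missed bolt can be sketched in a few lines. This is a simplified model, not the real control code: the function `handle_bolt` and the constant names are my own, the stage times are the approximate budgets from the example above, and the model assumes the stall simply adds to the cycle time.

```python
from collections import deque

GAP_MS = 400          # spacing between two queued bolts (0.4 s)
SENSOR_READ_MS = 100  # budget to read the presence sensor
QUEUE_READ_MS = 100   # budget to read the inspection result queue
PUSHER_MS = 200       # budget for the pusher to actuate and reset


def handle_bolt(results, stall_ms=0):
    """Pop the oldest inspection result and report what happens to that bolt.

    Returns "passed" for size A, "diverted" for a size B bolt handled in
    time, and "missed" when the cycle overruns the gap to the next bolt.
    """
    size = results.popleft()  # FIFO: oldest result belongs to the bolt at the sensor
    if size != "B":
        return "passed"       # size A bolts stay on the main conveyor
    elapsed = SENSOR_READ_MS + stall_ms + QUEUE_READ_MS + PUSHER_MS
    return "diverted" if elapsed <= GAP_MS else "missed"


results = deque(["B", "A", "B"])
print(handle_bolt(results))               # diverted: 400 ms fits exactly
print(handle_bolt(results))               # passed: size A
print(handle_bolt(results, stall_ms=30))  # missed: 430 ms exceeds the 400 ms gap
```

With the budget fully consumed by the three stages, there is no slack at all, so even a 30 ms scheduling stall is enough to push the cycle past the gap.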
Learning by Bad Examples
So what did I learn from all this? Timing really is everything, and sometimes you can write bulletproof software and still come up short. In the world of real-time computing as it applies to control engineering, you have to pay close attention to the requirements of the system and to the capability and responsiveness of each system component. I have also learned to keep developing my knowledge of real-time scheduling algorithms, thread management, and synchronization, as these concepts can help optimize system timings.