Re: VFDs, Noise, and RS-485



Hi Charlie,

Charlie Edmondson wrote:
Terry Given wrote:



hear hear. I've had some hilarious hardware/software arguments too - me and the s/w guy basically saying "it's your fault". Usually it's s/w, perhaps 75% of the time - I suspect because it's so easy to be careless with s/w, and so hard to spot (not looking is also the most common technique used for s/w peer reviews and testing).



Cheers Terry


I have been in some 'interesting' sw/hw fights myself! I was working on toll roads a few years back, and they had an interesting 'hardware' problem. Every weekend, around 2-3 in the morning, the system went crazy. It started a network cascade and every toll system on the road went down. It was 'obviously' a hardware fault, so they called in some network experts to solve the problem. They put a network sniffer on the system, and watched the traffic...

Sure enough, early Sunday morning, there was nothing, nothing, nothing, and then a huge cascade! What was the cascade? A bunch of 'Where are you?' messages! The programmers had built in an "If I don't hear from a system for 'x' minutes, check to see if they are still there..." routine. They had NOT built in the stuff to QUIT asking once it got an answer...
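For what it's worth, the missing "quit asking" piece is small. Here is a minimal sketch in Python, not the toll system's actual code: the class name, the method names and the two timing parameters are all made up for illustration. The point is only that any message from the peer clears the outstanding probe, and that probes are rate-limited while the peer stays silent.

```python
class PeerMonitor:
    """Hypothetical liveness tracker for one peer (illustrative only)."""

    def __init__(self, silence_limit_s, probe_interval_s):
        self.silence_limit_s = silence_limit_s    # the 'x' minutes of silence
        self.probe_interval_s = probe_interval_s  # minimum gap between probes
        self.last_heard = 0.0
        self.last_probe = None  # None means no probe is outstanding

    def on_message(self, now):
        # Any traffic from the peer counts as an answer, so stop asking.
        self.last_heard = now
        self.last_probe = None

    def should_probe(self, now):
        # Peer has been heard recently enough: nothing to ask.
        if now - self.last_heard < self.silence_limit_s:
            return False
        # A probe went out recently: wait rather than flood the network.
        if (self.last_probe is not None
                and now - self.last_probe < self.probe_interval_s):
            return False
        self.last_probe = now
        return True
```

Without `on_message` resetting the state (and without the interval check), every node keeps sending 'Where are you?' to every other node, which is roughly the cascade the sniffer saw.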

Charlie

Beautiful. And fits nicely with my theory that many programmers never test anything, and seldom if ever think about failure modes. I bet you've seen numerous systems hang when no response is received...


the funniest one I have had was a drive we designed using an existing micro & software, with all new hardware. We re-arranged all the ports, and vandalised the software to suit. All worked spiffingly well, until about 3 days after the product launch, when a customer complained to the service dept his drive kept tripping when he used the 4-20mA input. After the inevitable argument with Mark, I did some tests and found that at about 10mA the drive tripped. I configured the input as +/-10V, same problem at about 0V. It happened when the uC ADC pin reached about 2V or so, coincidentally the logic threshold. So I looked at the original product, where that pin was the emergency stop switch input. Armed with this ammo I re-started and easily won the argument. Mark went and had a nosey, and came back a few minutes later looking sheepish, with the problem solved. He'd re-written the I/O code, but had forgotten to remove the old E-stop code (all the rest was gone). The 80C196 was set to use that pin as an ADC, but the digital input SFR still worked :)
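A toy model of that failure, in Python rather than 80C196 code, and with every name invented for illustration: the digital input register follows the pin voltage no matter what the new design intends the pin to be, so a leftover E-stop check trips once the analog signal crosses the logic threshold. The 2.0 V figure is just the "about 2V or so" above. The guarded accessor is one possible way to make stale code fail loudly at the bench instead of at the customer's.

```python
LOGIC_THRESHOLD_V = 2.0  # assumed from "about 2V or so"


class Pin:
    """Hypothetical pin model: one physical pin, one intended role."""

    def __init__(self, role):
        self.role = role   # "adc" or "digital": what the new design intends
        self.volts = 0.0

    def read_digital_raw(self):
        # The hardware ignores intent: the digital register still
        # reflects the pin voltage even when the pin feeds the ADC.
        return self.volts >= LOGIC_THRESHOLD_V

    def read_digital(self):
        # Guarded accessor: refuses to read a pin that was reassigned.
        if self.role != "digital":
            raise RuntimeError("pin is configured as " + self.role)
        return self.read_digital_raw()


def stale_estop_tripped(pin):
    # Leftover check from the old product, reading the register directly.
    return pin.read_digital_raw()
```

With the pin set up as `Pin("adc")`, `stale_estop_tripped` stays false at 1.0 V and goes true just above 2.0 V, while the same check routed through `read_digital` raises immediately.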

(best not to think about the 3 goes I had my tech have at getting a LED to light up a few weeks later)


the worst one was when a programmer set up a command to shut down the comms link, and save that state to EEPROM so it wouldn't turn back on again. The unit is gooped and screwed into an IP68 box with only the comms link & power coming out, and to get it to talk again the lid had to be removed (12 screws) so we could access the diagnostic serial port. That wouldn't be so bad, but another programmer was upgrading a customer's screen (1000 units) and sent out an (untested) broadcast "shut up forever" command. Luckily the broadcast didn't work so well, and he only "killed" a few hundred modules. But a guy in a climbing harness had to undo a thousand screws to remove the crippled modules, then a few thousand more to revive them on the ground, and of course put it all back together again.


We made ourselves look like complete idiots to the customer - so much for the "image quality upgrade". We had the screen back and running that day, which kept him happy, but if it had been a game day we would have been in the ***, as it took hours to fix. When asked why he saved the state to EEPROM, the programmer's reply was that he had a choice (to save or not to save) and "just" chose to save it. The server was updated that day (about 15 minutes after we tracked down the root cause) to never allow a broadcast of that command - until then it was transparent: you type it, it does it. The firmware was also hurriedly modified to remove the built-in self-destruct command :). And the programmers got an abrupt lesson in "thou shalt not *** with the customer's equipment".
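The server-side fix amounts to a check like the one below. This is a sketch under assumptions, not what was actually deployed: the command name `COMMS_OFF`, the broadcast address of 0 and the function name are all hypothetical.

```python
BROADCAST_ADDR = 0            # hypothetical broadcast address
NO_BROADCAST = {"COMMS_OFF"}  # hypothetical name for the "shut up forever" command


def check_command(command, address):
    """Return (allowed, reason) instead of passing commands through blindly."""
    if address == BROADCAST_ADDR and command in NO_BROADCAST:
        return False, "refused: " + command + " may not be broadcast"
    return True, "ok"
```

So `check_command("COMMS_OFF", 0)` is refused, while the same command to a single unit, or an ordinary broadcast, still goes through. A deny list keeps the server transparent for everything else, which seems to match the "never allow a broadcast of that command" change.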


Cheers Terry