Monday 14 September 2015

Six Hundred And Seventy-Three Days

The prediction made in the last blog entry was a little wide of the mark, and having been busy with other projects I didn't get round to making the small change to the software until today.

At about 4 o'clock this afternoon I shut down the Polly system, at which point it was showing an uptime of 673 days, 4 hours and 27 minutes.

I soon had the new version installed and I also took the opportunity to upgrade all the other software on the Raspberry Pi.

How long will this version run, before I think of another change?  Or perhaps a hardware failure or a power cut will stop it first.


Saturday 22 November 2014

No News Is Good News

Not much happening on the blog at the moment, mainly because there's nothing to report.  The system continues to run reliably, and the uptime is currently showing 377 days.

I permitted myself a small pat on the back as the uptime passed one year, but the record remains 550 days, and I don't think we're going to break that on this run because I have a couple of software improvements in mind which will require a restart to implement.

Still, this certainly shows that the Raspberry Pi is much more than just the cheap educational toy that some people have characterised it as, and is well capable of "professional" performance.  I like to think it also says something about the quality of my software.

Wednesday 28 May 2014

"Hardware" Failure

The Polly system needs to know whether it's dark or light, of course, and this is achieved by a little circuit with an ORP12 light dependent resistor measuring the light level.  This is mounted in a plastic box with a translucent lid, fixed to the inside of a window.

This week the plastic succumbed to the effects of 21 years of sunlight and burst into a shower of dust.  A true hardware failure!

With the aid of a hot glue gun I've installed the board in a new box and remounted it on the window.  I suppose I ought to make a note in my diary to replace it again in the 2030s.

The electronic and software parts of the system continue to perform well, and as I type the uptime is 199 days, and counting.

P.S. You might think that "a little circuit" above seems a bit vague.  Well, to be honest I've no idea what's in there, and there are no surviving notes from 1993.  Probably a 741 op-amp as a voltage comparator, I guess.

Tuesday 7 January 2014

Update

I haven't posted anything here for some time, so I think an update is in order:

I had some problems back in October which turned out to be due to the cheapo SD card I was using:  On a couple of occasions I found the system working but reporting errors when it tried to write a file.  I ssh'd in to find the entire file system was read-only - presumably as a result of some sort of error, but I couldn't find out what the error was because, of course, it couldn't write to syslog.  A reboot restored normal service but a few days later the problem occurred again.  The third time, the system refused to reboot.  I put the SD card into my main PC and it wasn't detected.  So, having already bought a new card after the first failure, I quickly installed the latest Raspbian build on it, installed Polly, and copied all the config files from the most recent backup.  (Fortunately I've got daily backups running.)  Normal service was resumed.

Since then the system has been much more reliable, and it's currently showing 58 days of uptime.  (The all-time record uptime was 550 days, ending in December 2001 when I accidentally unplugged the server!  This was the DOS-based version.)


Saturday 29 June 2013

Power Cut!

Forty-seven days of faultless running were brought to an abrupt halt yesterday by a power outage, when quite a large area of Liverpool was blacked out at just after 13:00.

The aftermath revealed a small flaw in the Polly-Pi system:  The Raspberry Pi has no real time clock on board.  Normally this is no problem, and ntp is used to set and maintain the correct time.  But there's a snag after a power outage:  At the time the Polly-Pi server rebooted on restoration of power, the broadband modem and the local server running ntpd were both still in the process of starting up, so no time was available.  Consequently, everything came up with the wrong clock setting:  The clock was showing 12:18, a time about forty minutes before the outage, while the correct time was 14:40.  I watched Polly come up and connect to all the PICNET and ETHPIC nodes and then relaxed, not noticing the clock was wrong.  A couple of minutes later ntp service kicked in and suddenly the time was 14:42.  This led to all the communications protocol's timeouts expiring, resulting in the loss of communication with all the nodes.  Once normal timing was resumed, all the nodes came back on line, of course, and everything has been fine since then.
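
To illustrate why a clock step does this, here is a small C sketch.  It is not the Polly code:  It assumes the timeouts are measured against wall-clock time, and the names seconds_of_day and has_timed_out, the 30 second limit and the "last heard at about 12:20" figure are all made up for the example.

```c
#include <stdbool.h>

/* Wall-clock time of day expressed as seconds since midnight. */
static long seconds_of_day(int hour, int minute)
{
    return hour * 3600L + minute * 60L;
}

/* A timeout check that trusts the wall clock: the node is considered
 * lost if more than timeout_s seconds appear to have passed since it
 * was last heard from. */
static bool has_timed_out(long last_heard_s, long now_s, long timeout_s)
{
    return now_s - last_heard_s > timeout_s;
}
```

With a node last heard at about 12:20 and the clock then stepped to 14:42, has_timed_out(seconds_of_day(12, 20), seconds_of_day(14, 42), 30) is true, because 8520 seconds appear to have elapsed even though only a moment has really passed.  A clock that is never stepped, such as POSIX CLOCK_MONOTONIC, would not show this behaviour.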

So, some kind of change is needed to improve this.  What to do?

1.  The obvious solution is to provide some RTC (Real Time Clock) hardware.  This would completely avoid the problem, with the late restoration of ntp service merely tweaking the time to compensate for inaccuracies in the RTC.  However, this seems like a significant effort for what should hopefully be a rare occurrence - perhaps I can do things more easily in software?

2.  How about just waiting a bit longer before starting?  Polly already has a 60 second pause at startup to allow everything in the house to stabilise after a mains failure.  If this pause had been a couple of minutes longer the problem wouldn't have arisen.

3.  Or, better, what about a more intelligent start-up pause?  Could I check for network connectivity and/or ntp status and wait until things are OK?  How long should I wait?  What if ntp never comes up (local ntp server fails to reboot, and internet is down)?  I need to 'give up' eventually and make do with the time I've got.

Decision:  I'm going to implement option 2 immediately, and consider option 3 later.

The new build is installed and running, and I also took the opportunity to perform a load of debian updates.
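
For the record, option 3 might look something like the following C sketch.  It is not what is running in Polly.  It assumes Linux, where calling adjtimex() with modes set to 0 only reads the kernel's clock state, and the names startup_decision, decide_startup, clock_is_synchronised and wait_for_clock are invented for illustration.

```c
#include <stdbool.h>
#include <sys/timex.h>   /* adjtimex(), TIME_ERROR, STA_UNSYNC (Linux) */
#include <unistd.h>      /* sleep() */

/* What the start-up pause should do next (hypothetical names). */
enum startup_decision { KEEP_WAITING, START_WITH_GOOD_CLOCK, START_ANYWAY };

/* Pure decision rule: start once the clock is synchronised, or give up
 * and make do with the time we've got once max_wait_s has elapsed. */
static enum startup_decision decide_startup(bool synchronised,
                                            unsigned waited_s,
                                            unsigned max_wait_s)
{
    if (synchronised)
        return START_WITH_GOOD_CLOCK;
    if (waited_s >= max_wait_s)
        return START_ANYWAY;
    return KEEP_WAITING;
}

/* Ask the kernel whether the clock is currently synchronised.
 * modes = 0 makes this a read-only query, so no privileges are needed. */
static bool clock_is_synchronised(void)
{
    struct timex tx = { .modes = 0 };
    int state = adjtimex(&tx);

    return state != -1 && state != TIME_ERROR && !(tx.status & STA_UNSYNC);
}

/* Poll until the clock is synchronised or max_wait_s seconds have passed.
 * Returns true if the clock was synchronised when we stopped waiting. */
static bool wait_for_clock(unsigned max_wait_s, unsigned poll_s)
{
    unsigned waited_s = 0;

    if (poll_s == 0)
        poll_s = 1;                 /* avoid a busy loop */

    for (;;) {
        switch (decide_startup(clock_is_synchronised(), waited_s, max_wait_s)) {
        case START_WITH_GOOD_CLOCK:
            return true;
        case START_ANYWAY:
            return false;
        case KEEP_WAITING:
            break;
        }
        sleep(poll_s);
        waited_s += poll_s;
    }
}
```

A call such as wait_for_clock(600, 5) would wait for up to ten minutes, checking every five seconds.  Both numbers are guesses, and the "give up" branch is what covers the case where ntp never comes up at all.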

Tuesday 7 May 2013

Local syslog using rsyslogd

Phase one of the new logging system is to use the local syslog daemon.  This turned out to be pretty easy to set up, as glibc has the appropriate functions, so the line

syslog(LOG_MAKEPRI(LOG_LOCAL0, LOG_NOTICE), "Some message");

was all that was needed, and suddenly the log reports from Polly were appearing in the syslog file at /var/log/syslog

That was easy, wasn't it.  But I want to put the log messages in a separate file (or two) so now I need to learn how to configure rsyslogd in order to filter them out.  A brief study of the interweb and I'd added the command

local0.*    /var/log/polly.log

into rsyslogd's configs.  But it didn't work.  I fiddled.  I tweaked.  I learned how to validate the config by typing rsyslogd -N1  but no joy.  I tried other syntaxes.  (Just to confuse matters, rsyslogd accepts configuration commands in three totally different formats, and you are free to mix them in the same configuration file.)  For example:

if $syslogfacility-text == 'local0' then /var/log/polly.log

But my new log file remained steadfastly empty.

Eventually I spotted a log message, hidden in auth.log, complaining I was providing an "unknown facility/priority: 405".  Aha, does that tell me anything?

The facility/priority number in syslog is defined as priority plus 8 * facility, and that's what LOG_MAKEPRI is supposed to be doing for me.  Facility LOCAL0 is 16, and priority NOTICE is 5, so the combined value should be 133 (decimal).  A couple of printf lines soon revealed that LOG_MAKEPRI(LOG_LOCAL0, LOG_NOTICE) was returning 1029 (0x405).  It was soon obvious that the LOG_LOCAL0 macro had already got the multiplication by 8 included.  I changed the call to syslog to

syslog( (LOG_LOCAL0 | LOG_NOTICE), "message");

and my filter was suddenly working.
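
The arithmetic can be checked with a few lines of C.  This is only a sketch, assuming glibc's <syslog.h> on Linux, where LOG_LOCAL0 is 16 << 3 (128) and LOG_NOTICE is 5.  The helpers pri_from_numbers and double_shifted_pri are hypothetical; the second one merely reproduces the extra shift that would explain the 1029.

```c
#include <syslog.h>

/* The combined value as defined for syslog: a facility *number*
 * (16 for LOCAL0) times 8, plus the priority (5 for NOTICE). */
static int pri_from_numbers(int facility_number, int priority)
{
    return facility_number * 8 + priority;
}

/* Shifting a facility *constant* such as LOG_LOCAL0 left by 3 a second
 * time, which is consistent with the 0x405 seen in auth.log. */
static int double_shifted_pri(int facility_constant, int priority)
{
    return (facility_constant << 3) | priority;
}
```

On such a system pri_from_numbers(16, 5) and (LOG_LOCAL0 | LOG_NOTICE) both give 133, while double_shifted_pri(LOG_LOCAL0, LOG_NOTICE) gives 1029.  The LOG_FAC() and LOG_PRI() macros from the same header take a combined value apart again.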

As far as I can see this is a bug in the glibc documentation.  Hmph!  Anyone know how I report that?  Before doing so, I downloaded the latest development source for glibc and they've already fixed it!  Personally I would have altered the documentation to match the code, but there you are.

Anyway, I was now able to create a proper rsyslogd configuration file, located in /etc/rsyslog.d/polly.conf containing

if $syslogfacility-text == 'local0' and ($msg startswith ' ETHPIC') then /var/log/polly-ethpic.log
if $syslogfacility-text == 'local0' and not ($msg startswith ' ETHPIC') then /var/log/polly.log

# Don't pass it on
local0.*    ~

This splits out the rather busy ETHPIC log entries into a different file.  The last line discards all LOCAL0 messages, so they don't get passed on to the later rule which puts everything into the messages file.
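
A quick way to exercise rules like these is to send one message of each kind and see where it lands.  The C sketch below assumes the same LOCAL0 facility and NOTICE priority as above.  The function names and any message text passed in are invented for the test, and expected_log_file is just a restatement of the two rules in C, not anything rsyslogd itself runs.

```c
#include <string.h>
#include <syslog.h>

/* Mirror of the two rsyslogd rules: messages whose text starts with
 * "ETHPIC" belong in the ETHPIC file, everything else in polly.log.
 * The rule matches ' ETHPIC' with a leading space, presumably because
 * rsyslogd's $msg keeps the space that follows the tag; here we compare
 * the text as the program sends it. */
static const char *expected_log_file(const char *text)
{
    if (strncmp(text, "ETHPIC", strlen("ETHPIC")) == 0)
        return "/var/log/polly-ethpic.log";
    return "/var/log/polly.log";
}

/* Send one test message on LOCAL0 at NOTICE priority. */
static void send_test_message(const char *text)
{
    syslog(LOG_LOCAL0 | LOG_NOTICE, "%s", text);
}
```

Calling send_test_message("ETHPIC test entry") and send_test_message("Polly test entry") and then looking at the file named by expected_log_file() for each should show whether the split is working.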

I am still weighing up the pros and cons of phase two of the syslog project.  This would involve reconfiguring rsyslogd to pass the log messages to another server, with the aim of reducing wear and tear on the Raspberry Pi's SD card.  Questions which need to be considered include:
  • Is SD card wear a problem?
  • What happens if the logging system is down?



Monday 6 May 2013

Result!

Well, my guess turned out to be correct, and the temporary version of the Polly server with no log files is responding instantly to all button presses and commands.  All I need to do now is create some kind of non-blocking log system.  That sounds like an awful lot of coding, but wait, someone's already been there:  Unix's syslog system already has the capability to log to a remote system and would appear to do all I need.  So all that remains is to find out how to drive it...