Measuring Core, WiFi and Cloud Uptime

I wanted to know how stable the WiFi and Particle Cloud connectivity was from my Core’s point of view.

My first attempt was to write an “Alive” event, which my Core publishes soon after it powers up:

  • An example event is “AL:V44:1438229397:Thu Jul 30 16:09:57 2015”
  • The event contains “AL:V”, FirmwareCodeVersion, Time.now(), Time.timeStr()
  • In practice, the “Alive” event is not always visible. For example, if the house loses power, then when the power comes back on the Core is very likely to finish powering up before the house WiFi router has powered up and connected to the internet, so the published event does not reach the Particle Cloud.
  • If the “Alive” event is somehow lost (e.g. the WiFi is down, the Particle Cloud is down, or my cloud is down when the event is published) then it is gone forever, as is all evidence that my Core rebooted.
  • Overall this event was not as useful as I thought it would be.
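
For anyone who does want to consume the “Alive” event, here is a minimal sketch of how its payload might be split up in Node JS. The function name parseAliveEvent is hypothetical, and the field layout is taken only from the example event above. The time string contains colons of its own, so the sketch matches the first three separators rather than splitting on every colon.

```javascript
// Hypothetical helper (not part of the original post): split an "Alive"
// payload of the form "AL:V<version>:<epoch seconds>:<time string>".
function parseAliveEvent(data) {
    // Only the first three colons are separators; the rest belong to the time string.
    var match = /^AL:V(\d+):(\d+):(.*)$/.exec(data);
    if (!match) {
        return null; // Not an Alive event
    }
    return {
        firmwareVersion: match[1],
        epochSeconds: Number(match[2]), // Time.now() is seconds since 1 Jan 1970 (UTC)
        timeString: match[3]            // Time.timeStr() as formatted by the Core
    };
}

var alive = parseAliveEvent("AL:V44:1438229397:Thu Jul 30 16:09:57 2015");
console.log(alive.firmwareVersion, new Date(alive.epochSeconds * 1000).toISOString());
```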

My second attempt was to write an “AnalogReadings” event, which my Core publishes regularly:

  • An example event is “AR:2562,0,2518,6,1,384,16,1347,160,145,145,0”
  • The event contains “AR:”, A0Reading, A1Reading, A2Reading, …, A7Reading, DigitalPinState, Sundry data
  • The event is published every 15 seconds (assuming the WiFi and Particle Cloud are up).
  • I store the event in a database every 15 seconds (assuming the WiFi, Particle Cloud and my cloud are up).
  • If the Core, the WiFi router, the Particle Cloud or my cloud goes down then there are “gaps” in the sequence of AR events. But I don’t know which link in the chain broke and caused the gap in the data.
  • While I find this event very useful for other purposes, it didn’t help me understand the Core’s view of the world as much as I had expected. I really wanted to know what fraction of the time the Core can send events to the Particle Cloud. And I wanted to do this without publishing lots of events.
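
To make the “gaps” idea concrete, here is a rough sketch of how they could be found from the times at which AR events were stored. The function name findGaps and the tolerance parameter are assumptions for illustration. As noted above, this only shows that a gap happened, not which link in the chain caused it.

```javascript
// Hypothetical helper: given the times (in milliseconds, oldest first) at which
// AR events were stored, report every interval noticeably longer than expected.
function findGaps(timestampsMs, expectedMs, toleranceMs) {
    var gaps = [];
    for (var i = 1; i < timestampsMs.length; i++) {
        var delta = timestampsMs[i] - timestampsMs[i - 1];
        if (delta > expectedMs + toleranceMs) {
            gaps.push({
                from: timestampsMs[i - 1],
                to: timestampsMs[i],
                // Rough count of events that should have arrived in between
                missed: Math.round(delta / expectedMs) - 1
            });
        }
    }
    return gaps;
}

// Events every 15 seconds, with one minute of silence after the third event
console.log(findGaps([0, 15000, 30000, 90000, 105000], 15000, 5000));
```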

My third attempt, the “Uptime” event, worked best:

  • An example event is “UT:56845,3600,3600,3600,44”
  • The event contains “UT:”, AliveTicks, SpanTicks, WiFiTicks, CloudTicks, FirmwareCodeVersion where:
  • Alive Ticks = Number of ticks since Core powered up. A tick is approximately a second.
  • Span Ticks = Number of ticks since the last published UT event. Normally Span Ticks = 3600 = 1 hour
  • WiFi Ticks = Number of ticks that WiFi was available since last published event. WiFi Ticks <= Span Ticks
  • Cloud Ticks = Number of ticks that Particle Cloud was available since last published event. Cloud Ticks <= Span Ticks
  • The Core publishes the event about once an hour.
  • Under perfect conditions Span Ticks = WiFi Ticks = Cloud Ticks = 3600 (approx one hour)
  • But the publishing of the event will be delayed if WiFi or Cloud are down at the end of the hour. In this case Span Ticks will be greater than 3600
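
As an illustration of how these fields can be used, here is a minimal Node JS sketch that parses a UT payload and works out the fraction of the span during which WiFi and the Particle Cloud were available. The function name parseUptimeEvent and the returned property names are hypothetical; the field order comes from the description above.

```javascript
// Hypothetical helper: parse "UT:alive,span,wifi,cloud,firmware" and derive
// the availability fractions for the span the event covers.
function parseUptimeEvent(data) {
    if (data.indexOf("UT:") !== 0) {
        return null; // Not an Uptime event
    }
    var fields = data.substring(3).split(",");
    if (fields.length !== 5) {
        return null; // Unexpected number of fields
    }
    var ticks = fields.slice(0, 4).map(Number);
    if (ticks.some(isNaN)) {
        return null; // A tick field was not numeric
    }
    var spanTicks = ticks[1];
    return {
        aliveTicks: ticks[0],
        spanTicks: spanTicks,
        wifiTicks: ticks[2],
        cloudTicks: ticks[3],
        firmwareVersion: fields[4],
        // Guard against dividing by zero if a span of 0 ever turns up
        wifiFraction: spanTicks > 0 ? ticks[2] / spanTicks : null,
        cloudFraction: spanTicks > 0 ? ticks[3] / spanTicks : null
    };
}

console.log(parseUptimeEvent("UT:56845,3600,3600,3600,44"));
```

Under perfect conditions both fractions are 1. A made-up event such as “UT:3635,5400,2700,2700,44” would give 0.5 for both, meaning WiFi and Cloud were available for half of a 90 minute span.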

I use these Uptime events to generate a Node JS page - I find this page very useful:

The above screen shot comes from a site where the WiFi is not reliable. Points to note:

  • The Cloud Ticks values are about the same as the WiFi Ticks values. This means that when the WiFi is available, the Particle Cloud is almost always available.
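  • Even for entries where the Span Ticks is exactly 3600, the duration between the “Created At” values on two successive lines is a few seconds over the expected one hour. I am not certain how to interpret this. I assume it is Core overhead: the firmware schedules each tick one second after the previous tick was processed, so any delay in loop() makes a tick slightly longer than a second, and the error accumulates over 3600 ticks.
  • If Span Ticks is greater than 3600 then the WiFi was unreliable. The Core published the event as soon as the WiFi came back up and the Particle Cloud was available.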
  • If my cloud goes down, I might miss an Uptime event, but an hour later I get another event.
  • If the Alive Ticks drops between successive lines (e.g. in the above picture the drop from 81751 to 3635) then either the Core was powered off and on or the Core firmware was reflashed between the two lines.
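
The last point can be checked automatically. Here is a small sketch, assuming the rows are passed oldest first (the page lists them newest first) and that each row has an alive_ticks property. The function name findRestarts is hypothetical.

```javascript
// Hypothetical helper: find the rows where Alive Ticks dropped compared with the
// previous row, which means the Core restarted (power cycle or reflash).
// Rows must be in chronological order, oldest first.
function findRestarts(rows) {
    var restarts = [];
    for (var i = 1; i < rows.length; i++) {
        if (rows[i].alive_ticks < rows[i - 1].alive_ticks) {
            restarts.push({
                index: i,
                before: rows[i - 1].alive_ticks,
                after: rows[i].alive_ticks
            });
        }
    }
    return restarts;
}

// The 81751 to 3635 drop is taken from the example in the text; the other values are made up
var rows = [
    { alive_ticks: 78151 },
    { alive_ticks: 81751 },
    { alive_ticks: 3635 },
    { alive_ticks: 7235 }
];
console.log(findRestarts(rows));
```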

The rest of this post describes how I generate, publish, listen to, summarise and display these Uptime events. I am using Heroku, Node JS and MySQL, so the code fragments relate to those tools.

The firmware code that my Core (and my Photons) uses to generate Uptime events is:

class EvaluateConnectivity {
  
public:
    const String firmwareVersion = "45";
     
    // Variables to measure the health of the Core over time. A tick is approximately a second.
    // Published approximately once an hour. But will be delayed if WiFi or Cloud are down - in which case theSpanTicks exceeds 3600
    int aliveTicks = 0;     // Number of ticks since Core powered up
    int theSpanTicks = 0;   // Number of ticks since last published event. Normally theSpanTicks <= 3600
    int theWiFiTicks = 0;   // Number of ticks that WiFi was available since last published event. theWiFiTicks <= theSpanTicks
    int theCloudTicks = 0;  // Number of ticks that Particle Cloud was available since last published event. theCloudTicks <= theSpanTicks
     
    // The interval between successive 'processing' ticks
    const unsigned long msTickInterval = 1000;
    // The time at which we will do our next 'processing' tick
    unsigned long msNextTick = 0;
    void init( )
    {
        // When is our next 'processing' tick?
        msNextTick = millis() + msTickInterval;
        // NOTE: millis() returns the number of milliseconds since the Particle Core began running the current program.
        // NOTE: millis() does NOT rely on the Core having connected to the Particle Cloud.    
    }
     
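    // Is it time for the next 'processing' tick?
    boolean processTickNow( )
    {
        // Compare using a signed difference so the test still works
        // when millis() wraps around to zero (after about 49 days)
        if( (long)( millis() - msNextTick ) < 0 )
            return false;
    
        // When is the next tick?
        msNextTick = millis() + msTickInterval;
        return true;
    }       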
     
   // Record and publish how long we have been alive and how good our connectivity is.
    void incrementAndPublishUptimeData( )
    {
        aliveTicks++;
        theSpanTicks++;
         
        if( WiFi.ready( ) )
            theWiFiTicks++;
         
        if( Spark.connected( ) )
        {
            theCloudTicks++;
            if( theSpanTicks >= 60 * 60 )
            {
                // Tell the Cloud our view of the reliability of the connection
                Spark.publish( "Status",
                    "UT:" +
                    String( aliveTicks ) + "," +
                    String( theSpanTicks ) + "," +
                    String( theWiFiTicks ) + "," +
                    String( theCloudTicks ) + "," +
                    firmwareVersion,
                    60, PRIVATE );
     
                theSpanTicks = 0;
                theWiFiTicks = 0;
                theCloudTicks = 0;
            }
        }
    }
};
 
EvaluateConnectivity EC;
void setup() {
    // Initialise the object
    EC.init( );
}
void loop() {
     
    // Is it time for the next 'processing' tick?
    if( EC.processTickNow( ) )
        EC.incrementAndPublishUptimeData( );
    // Place other processing code here
}
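
To see how the counters behave without flashing a device, here is a rough Node JS simulation of the same counting logic. It is not firmware code. The function name simulateUptimeEvents and the two callbacks, which say whether WiFi and the Cloud are up at a given tick, are assumptions for illustration.

```javascript
// Simulation only: mirrors the counters in incrementAndPublishUptimeData( ).
// isWifiUp and isCloudUp are hypothetical callbacks: tick index -> boolean.
function simulateUptimeEvents(totalTicks, isWifiUp, isCloudUp) {
    var firmwareVersion = "45";
    var aliveTicks = 0, spanTicks = 0, wifiTicks = 0, cloudTicks = 0;
    var published = [];

    for (var tick = 0; tick < totalTicks; tick++) {
        aliveTicks++;
        spanTicks++;
        if (isWifiUp(tick)) {
            wifiTicks++;
        }
        if (isCloudUp(tick)) {
            cloudTicks++;
            // Publish only when an hour has passed AND the Cloud is reachable
            if (spanTicks >= 60 * 60) {
                published.push("UT:" +
                    [aliveTicks, spanTicks, wifiTicks, cloudTicks, firmwareVersion].join(","));
                spanTicks = 0;
                wifiTicks = 0;
                cloudTicks = 0;
            }
        }
    }
    return published;
}

// WiFi and Cloud both drop out for 200 ticks, spanning the end of the first hour
function upExceptOutage(tick) {
    return tick < 3500 || tick >= 3700;
}
console.log(simulateUptimeEvents(7400, upExceptOutage, upExceptOutage));
```

In this run the first event is delayed until the connection returns, so it reports a span of 3701 ticks with 3501 WiFi and Cloud ticks. The second event is back to 3600 for all three.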

I capture the UT events and store them in the MySQL database table core_events_ut defined as follows:

CREATE TABLE `core_events_ut` (
  `row_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `core_id` varchar(25) NOT NULL,
  `alive_ticks` int(10) unsigned NOT NULL,
  `span_ticks` int(11) unsigned NOT NULL,
  `wifi_ticks` int(11) unsigned NOT NULL,
  `cloud_ticks` int(11) unsigned NOT NULL,
  -- Written by insert_core_event; the column type here is an assumption
  `firmware_version` varchar(10) NOT NULL,
  `published_at` datetime NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`row_id`),
  UNIQUE KEY `row_id_UNIQUE` (`row_id`),
  UNIQUE KEY `CoreTimeUnique` (`core_id`,`published_at`)
);

My cloud listens to the event and inserts it into core_events_ut using this MySQL stored procedure:

CREATE DEFINER=`DevUser`@`%` PROCEDURE `insert_core_event`( the_data varchar(63), the_published_at datetime, the_core_id varchar(25) )
BEGIN
  declare the_data_extn varchar(69);
 
  if( SUBSTRING(the_data,1,3) = 'UT:' )then
 
      /* Peel off the "UT:" from the data */
      SET the_data_extn = SUBSTRING(the_data,4);
 
      /* Insert the event into the uptime event table.
       * Store information about the WiFi and Particle Cloud connectivity.
       * The "ignore" keyword makes MySQL skip inserts where a duplicate key is found.
       */
      insert ignore into core_events_ut (core_id, alive_ticks, span_ticks, wifi_ticks, cloud_ticks, firmware_version, published_at )
        values (
            the_core_id,
            SUBSTRING_INDEX(the_data_extn, ',', 1),
            SUBSTRING_INDEX(SUBSTRING_INDEX(the_data_extn, ',', 2), ',', -1 ),
            SUBSTRING_INDEX(SUBSTRING_INDEX(the_data_extn, ',', 3), ',', -1 ),
            SUBSTRING_INDEX(SUBSTRING_INDEX(the_data_extn, ',', 4), ',', -1 ),
            SUBSTRING_INDEX(SUBSTRING_INDEX(the_data_extn, ',', 5), ',', -1 ),
            the_published_at);    
  end if;
 
END
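
The listener that calls this procedure is not shown in this post. As a rough sketch of the glue, assuming the incoming event object carries data, published_at (an ISO 8601 string) and coreid properties, the call could be prepared like this. The function names toMysqlDatetime and buildInsertCall are hypothetical.

```javascript
// Hypothetical helper: convert an ISO 8601 time such as "2015-07-30T04:09:57.000Z"
// into the "YYYY-MM-DD HH:MM:SS" form MySQL expects for a datetime (kept in UTC).
function toMysqlDatetime(isoString) {
    return new Date(isoString).toISOString().slice(0, 19).replace("T", " ");
}

// Hypothetical helper: build the stored procedure call for one event.
// The property names on the event object are assumptions.
function buildInsertCall(event) {
    return {
        sql: "call insert_core_event( ?, ?, ? )",
        params: [event.data, toMysqlDatetime(event.published_at), event.coreid]
    };
}

var call = buildInsertCall({
    data: "UT:56845,3600,3600,3600,44",
    published_at: "2015-07-30T04:09:57.000Z",
    coreid: "example-core-id" // Placeholder, not a real device id
});
console.log(call.sql, call.params);
```

The result can then be passed to connection.query( call.sql, call.params, callback ), which lets the MySQL driver escape the values through the ? placeholders.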

To read the data collected above, I use the MySQL stored procedure recent_core_uptime:

CREATE DEFINER=`DevUser`@`%` PROCEDURE `recent_core_uptime`( the_core_id varchar(25) )
BEGIN
     
    SELECT *
    FROM core_events_ut
    WHERE core_id= the_core_id
    and created_at > DATE_ADD( UTC_TIMESTAMP( ), INTERVAL -7 DAY )
    order by row_id desc;
   
END

I use Node JS to display the data returned by the stored procedure recent_core_uptime:

//Display core uptime events
exports.core_recent_uptime = function(req, res)
{
    var the_page_title = 'Recent Core Uptime Events';
     
    global.mysqlPool.getConnection(function(err,connection) {
        if(err){
            console.error( "Error:Failed to get database connection", err );
            // Always answer the request, otherwise the browser waits until it times out
            res.status(500).send( "Database unavailable" );
            return;
        }
       
        var the_core_id = req.params.core_id;
        var call_proc = "call recent_core_uptime( " + connection.escape(the_core_id) + " )";
        // console.log(call_proc);
         
        connection.query( call_proc, function(err,the_data) {
            // Hand the connection back to the pool whether or not the query worked
            connection.release( );
         
            if(err) {
                console.error("Error:Calling recent_core_uptime", err );
                res.status(500).send( "Query failed" );
                return;
            }
         
            // A stored procedure call returns an array of result sets; the rows are in the first one
            if( the_data && the_data[0] )            
                res.render('core_recent_uptime',{
                    page_title:the_page_title,
                    data:the_data[0]});
            else {
                console.error("Error:No core data to display" );
                res.status(500).send( "No core data to display" );
            }
        });
    });
};

The Node JS code uses the resource (layout) core_recent_uptime.ejs:

<html>
<body>
 
<table border="1" cellpadding="7" cellspacing="7">
<tr>
<th>Created at</th>
<th>Alive ticks</th>
<th>Span ticks</th>
<th>WiFi ticks</th>
<th>Cloud ticks</th>
</tr>
   
<% if(data.length){ for(var i = 0;i < data.length;i++) { %>
  
<tr>
<td><%=data[i].created_at%></td>
<td><%=data[i].alive_ticks%></td>
<td><%=data[i].span_ticks%></td>
<td><%=data[i].wifi_ticks%></td>
<td><%=data[i].cloud_ticks%></td>
</tr>
<% }  }else{ %>
  
<tr>
<td colspan="5">No activity found</td>
</tr>
<% } %>
  
</table>
 
 
</body>
</html>

And that’s it, end to end. I hope you find this useful.
