Cloud + Machine-to-Machine = Disruption of Things: Part 2

Big Data Processing

Having data stored and available in the cloud makes it far easier to analyze. Distributing data on a sensor-by-sensor basis means simpler access for individual nodes or collections of nodes. Data can be distributed quickly to different entities, much like the way Facebook photos or Twitter posts are distributed to each friend or follower. Flatter data formats mean applications can be simpler and take less time to develop. But big challenges arise when trying to analyze massively growing data streams.

Given the large amount of incoming data and the wide range of queries on the data, significant effort needs to go into parsing and processing it so that queries are as responsive as possible. Data-management use cases therefore include optimizing near-term storage for immediate queries, setting up frameworks for analytics across massive data sets, and deciding what data to archive and in what raw, processed and derivative forms.

With Plaster Networks, for example, Appoxy keeps near-term data available for queries on the current performance of adapters. Appoxy also structured the data model and data flow to respond quickly to queries based on common time frames – performance for the last day, week or month, for example – and node groupings (adapter to adapter and adapter to device). Detecting and flagging unusual performance within a network is also an inherent pattern in M2M architecture.
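The time-frame queries and performance flagging described above can be sketched in a few lines of Python. This is a minimal illustration, not Appoxy's actual data model: the reading layout (adapter ID, timestamp, throughput in Mbps), the window lengths, the function names and the 50 percent threshold are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed time frames; a "month" is approximated as 30 days.
WINDOWS = {
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),
}

def summarize(readings, now):
    """Average throughput per (adapter, time frame).

    readings: iterable of (adapter_id, timestamp, mbps) tuples (hypothetical layout).
    """
    sums = defaultdict(lambda: [0.0, 0])  # (adapter, window) -> [total, count]
    for adapter_id, ts, mbps in readings:
        age = now - ts
        for name, span in WINDOWS.items():
            if timedelta(0) <= age <= span:
                acc = sums[(adapter_id, name)]
                acc[0] += mbps
                acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

def flag_unusual(summary, ratio=0.5):
    """Flag adapters whose last-day average fell below ratio * monthly average."""
    flagged = []
    for (adapter_id, window), avg in summary.items():
        if window != "day":
            continue
        baseline = summary.get((adapter_id, "month"))
        if baseline and avg < ratio * baseline:
            flagged.append(adapter_id)
    return sorted(flagged)
```

Precomputing one small summary per adapter and time frame is what keeps the common queries fast; the raw readings only need to be touched again for less common questions.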

In addition to near-term data handling challenges come issues with analyzing large data sets. An example is running queries that extend across an entire set of nodes in a system to obtain insight into aggregate behavior and device performance. There could be thousands of sensors, each with hundreds of thousands of data points. This is not unlike data processing queries within social network sites or other large consumer websites with millions or hundreds of millions of users.

Just as NoSQL storage options are largely derived from consumer Web needs, big-data architectures and analytical methods are also making their way from these same efforts. Hadoop, for example, is a set of frameworks used for distributed data analysis. One of its projects, MapReduce, implements a basic approach pioneered by Google to process data across many clusters and then propagate the findings up from the nodes.

Hadoop MapReduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.

A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file-system. The framework takes care of scheduling tasks, monitoring them and re-executes the failed tasks.
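The split, map, sort and reduce flow can be illustrated with a single-process Python sketch. It does not use the Hadoop API and runs nothing in parallel; it only mirrors the shape of a job. The input format (lines of "sensor_id, value") and the function names are assumptions made for the example.

```python
from itertools import groupby
from operator import itemgetter

def map_reading(line):
    """Map step: parse one raw line into a (sensor_id, value) pair."""
    sensor_id, value = line.split(",")
    yield sensor_id.strip(), float(value)

def reduce_sensor(sensor_id, values):
    """Reduce step: collapse all values for one sensor into summary statistics."""
    values = list(values)
    stats = {
        "count": len(values),
        "mean": sum(values) / len(values),
        "max": max(values),
    }
    return sensor_id, stats

def run_job(chunks):
    # Map: each chunk is independent, so a real framework could process
    # the chunks on different machines at the same time.
    mapped = [pair for chunk in chunks for line in chunk for pair in map_reading(line)]
    # Sort: order the map output by key so that equal keys are adjacent.
    mapped.sort(key=itemgetter(0))
    # Reduce: one call per distinct key.
    return dict(
        reduce_sensor(key, (value for _, value in group))
        for key, group in groupby(mapped, key=itemgetter(0))
    )
```

In a real cluster the framework, not the application, would handle the chunking, the sort between the two phases and the retrying of failed tasks; the application supplies only the map and reduce functions.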

Using these big data approaches, large sets of M2M data can be analyzed across an elastic and scalable cloud infrastructure. The virtual nature of the cloud along with distributed data models means that jobs can be queued up by the thousands and run in parallel if needed.
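As a rough illustration of queuing many independent jobs and running them in parallel, the sketch below uses a local thread pool from Python's standard library as a stand-in for elastic cloud workers. The per-node job (`analyze_node`), the data layout and the worker count are hypothetical; a production system would more likely hand jobs to a message queue or a cluster scheduler.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_node(node_id, readings):
    """Hypothetical per-node job: return the node's average reading."""
    return node_id, sum(readings) / len(readings)

def run_jobs(data_by_node, max_workers=8):
    """Queue one job per node and collect the results as they finish."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(analyze_node, node_id, readings)
            for node_id, readings in data_by_node.items()
        ]
        return dict(future.result() for future in futures)
```

Because each job depends only on its own node's data, adding workers shortens the total run time without changing the code that defines the job.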

Once results have been reduced and answers assembled, they’ll need effective visualization tools in order to have meaning. This requires charts, graphs, tables and maps, and might require a level of optimization similar to that of the analysis at the start of the process. The good part here is that the elastic nature of cloud processing and the flexibility and agility of cloud development mean that these interfaces can be created and extended, and control options added, in the same manner and with the same speed as other parts of applications.

Summary

M2M applications across many devices and industries will share many of the same patterns for data collection, processing, visualization and control. Just as there are patterns to social applications and Web 2.0 cloud services, there are patterns to M2M applications. Smart devices for building efficiency have needs similar to those of medical devices, mining and agriculture sensors, truck and automobile diagnostics and most other electronic devices and machines.

The particular types of sensors used, the data collected, the way it’s transmitted to the cloud, the views and reports generated and the actions triggered may be different, but many core application needs, process flows and data approaches will be the same.

The advantages of the cloud, in data storage and application development alone, present a significant inflection point in the cost of operation and the speed of development for M2M applications. Sensor, device and equipment makers are only just beginning to leverage these capabilities and are nowhere near the levels where they could be.

And the opportunities aren’t just limited to M2M applications surrounding devices. There are big opportunities to create new M2M platforms and platform services. Just as smart phone service companies and cloud industries have formed, overtaking established mobile and IT leaders, new M2M companies will arise to supply important cloud-based components to this emerging set of smart-enabled devices, vehicles and machines.

The mobile and cloud industries have made people acutely aware of the power that open platforms, in combination with rich development tools and a vibrant support community, have with regard to technology adoption (see Apple iPhone and Google Android vs. RIM, Palm, Symbian and Microsoft). These same insights – and the strategic value of creating and leveraging cloud-based platforms – can be employed to serve makers of devices and machines.

Great products can no longer be great products in isolation. They’ll need to be cloud aware in order to be viable in this changing ecosystem, one where data matters and where monitoring and control of devices can happen from anywhere. Cloud computing and the way it impacts M2M applications will touch and transform every industry in the same manner as the Internet, email and the Web have.
