The Role of Relays in Big Data Integration

The very nature of big data integration requires an organization to become more flexible, particularly when gathering input and metrics from sources as varied as mobile apps, browser heuristics, A/V input, CRMs, and software logs. The number of different methodologies, protocols, and formats that your organization needs to ingest while complying with both internal and government-mandated standards can be staggering.

Is there a clean and discrete way to achieve fast, heterogeneous data integration while still reaping all of the benefits of big data analytics?

Data Integration via Distributed Relay Architecture

What if, instead of simply allowing all of that data to flow in from dozens of information silos, you introduced a set of intelligent buffers? Imagine that each of these buffers was purpose-built for the kind of input you need to receive at any given time: shell scripts, REST APIs, federated DBs, hashed log files, and the like.

Let’s call these intelligent buffers what they really are: relays. They ingest SSL-encrypted data, send out additional queries as needed, and provide fault-tolerant data access according to ACLs specific to the team and server-side apps managing that dataset.
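To make the idea concrete, here is a minimal sketch of a relay's core responsibilities: buffer incoming records for fault-tolerant delivery and enforce a team-managed ACL on every read. All class and field names here are illustrative, not part of any real product's API; in practice the records would arrive over a TLS-encrypted connection.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Relay:
    """A hypothetical intelligent buffer sitting between producers and consumers."""
    acl: set                                  # principals entitled to this dataset
    buffer: deque = field(default_factory=deque)

    def ingest(self, record: dict) -> None:
        # In a real deployment this record would arrive over TLS; here we
        # simply queue it so a downstream outage cannot lose the data.
        self.buffer.append(record)

    def read(self, principal: str) -> list:
        # Entitlements are checked at the relay, close to the data owner.
        if principal not in self.acl:
            raise PermissionError(f"{principal} is not entitled to this dataset")
        drained = list(self.buffer)
        self.buffer.clear()
        return drained

relay = Relay(acl={"quant-team"})
relay.ingest({"symbol": "XYZ", "price": 101.5})
print(relay.read("quant-team"))   # [{'symbol': 'XYZ', 'price': 101.5}]
```

The key design point is that access control lives in the relay itself rather than in a central gateway, which is what allows entitlements to be managed by the team that actually owns the dataset.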

If you could access such a distributed relay architecture to deal with your big data integration chain, it might look something like this:

<img alt="Relays in big data integration" src="relays-in-big-data-integration.jpg">

Now you have options. For applications that require rapid updating, such as stocks, commodities, and currency trading, your relays can provide a reliable real-time stream. For slower consumers, you can make use of the journaling system, which acts as a kind of centralized mirror for your collected data.
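The two consumption modes can be sketched together: every update fans out immediately to real-time subscribers while also being appended to a journal that slower consumers replay at their own pace. This is an invented illustration of the pattern, not 3Forge's actual API.

```python
class StreamingRelay:
    """Hypothetical relay offering both a live stream and a replayable journal."""

    def __init__(self):
        self.subscribers = []   # callbacks for low-latency, real-time consumers
        self.journal = []       # centralized mirror for slower consumers

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, update: dict) -> None:
        self.journal.append(update)        # durable, replayable copy first
        for notify in self.subscribers:    # then immediate fan-out
            notify(update)

    def replay(self, since: int = 0) -> list:
        # A slow consumer catches up from any point in the journal.
        return self.journal[since:]

ticks = []
relay = StreamingRelay()
relay.subscribe(ticks.append)              # a fast, real-time consumer
relay.publish({"symbol": "XYZ", "price": 101.5})
late_copy = relay.replay()                 # a slow consumer reads the mirror
```

Writing to the journal before notifying subscribers means a crash mid-publish never leaves the mirror behind the live stream.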

Distributed Relay Architecture’s Role in Big Data Analytics

It is important, particularly in fields such as predictive analytics and user behavior analytics, to avoid introducing noise into the system. Anything that relies on statistical modeling and machine learning abhors garbage data that might force the entire system to be rolled back. So it is vitally important that data integration take place only after the data has been sanitized and presented in a proven-interoperable format.

Enter distributed relay architecture. Each relay ensures that the information flowing into the data analytics pipeline comes from a reliable, authenticated source and arrives in digestible chunks that the servers already understand.
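A sanitize-then-integrate step of this kind might look like the following sketch, where a relay normalizes heterogeneous inputs into one agreed schema and silently drops anything it cannot validate. The field names and schema are assumptions for illustration only.

```python
def sanitize(raw: dict):
    """Return a record normalized to the pipeline's schema, or None for garbage."""
    try:
        return {
            "symbol": str(raw["symbol"]).upper(),
            "price": float(raw["price"]),
        }
    except (KeyError, TypeError, ValueError):
        return None   # noisy or malformed records never reach the pipeline

incoming = [
    {"symbol": "xyz", "price": "101.5"},   # fixable: gets normalized
    {"symbol": "abc"},                      # missing price: gets dropped
]
clean = [r for r in map(sanitize, incoming) if r is not None]
```

Because validation happens at the relay, the analytics servers only ever see digestible, schema-conformant chunks, which is exactly what keeps noise out of the statistical models downstream.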

Relays need to have five main attributes in order to perform their tasks successfully.

  1. Fast, Reliable Data Consumption: Real-time data communication is error-prone. A chief concern is slow consumption, which forces the mission-critical producer to back up. Placing a high-speed relay near the producer of real-time data shortens the distance and provides a fault-tolerant buffer.
  2. Portability: Interfacing with multiple incompatible database versions, operating systems, or non-interchangeable protocols is solved with individually configured relays that map the turbulent environment into a consistent, accessible protocol. This allows heterogeneous data sources to be combined seamlessly and makes data integration far easier on the back end.
  3. Localized Auditing and Entitlements: The logical rules for who should have access to data are often best understood by the team managing that data. An additional localized entitlement layer allows sophisticated, granular data permissions to be assigned in a distributed fashion.
  4. Immunization: Server processes go down, network connections are lost, third-party adapters crash, and hardware fails. But the otherwise systemic impact stops at the relay, keeping outages localized and limited.
  5. Full Access: Files, executables, and OS functionality are often not readily reachable from outside a host. By installing a relay locally, these resources become remotely accessible and monitorable. This is key to the kind of monitoring required for real-time UX events, for example, and to other big data analytics applications that require process-level monitoring, such as combined system and network performance testing. Relays allow an incredibly granular level of monitoring for any of your data.
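The portability attribute in point 2 is worth illustrating: each relay wraps one incompatible source behind the same minimal interface, so the back end consumes a single record shape regardless of origin. The source classes below are invented stand-ins for, say, a CSV log and a legacy database driver.

```python
class CsvLogSource:
    """Stand-in for a log file that yields comma-separated lines."""
    def rows(self):
        yield "XYZ,101.5"

class LegacyDbSource:
    """Stand-in for an old database driver that returns tuples."""
    def fetch_all(self):
        return [("XYZ", 101.5)]

def csv_relay(source):
    # Maps the CSV dialect into the pipeline's uniform record shape.
    for line in source.rows():
        symbol, price = line.split(",")
        yield {"symbol": symbol, "price": float(price)}

def db_relay(source):
    # Maps the legacy tuple format into the same uniform shape.
    for symbol, price in source.fetch_all():
        yield {"symbol": symbol, "price": float(price)}

# Downstream code never sees the incompatible originals.
records = list(csv_relay(CsvLogSource())) + list(db_relay(LegacyDbSource()))
```

Adding a new source type then means configuring one more relay, rather than teaching every consumer a new protocol.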

A Real-Life Example of Data Integration via 3Forge Relays

One company currently implementing a successful data relay platform is 3Forge, whose system is built around its own proprietary relays.

Their platform is a comprehensive suite of components that provides a consolidated view of the various applications and IT resources within a large information technology environment. Unsurprisingly, it relies on highly customized relays to accomplish this. The suite allows companies to perform large-scale data integration and the kind of monitoring and analysis required for even the most demanding enterprise implementations of real-time big data analytics.

The impact of such a system can be seen in the Citi Client Connection Manager project. 3Forge implemented custom relays for dozens of proprietary order management systems. This allowed Citi to rebuild the exact dashboard they had previously, but with massively improved performance characteristics. Thanks to the browser-based interface, new features can be rapidly deployed to clients without installing new desktop applications. The hardware requirements of Citi's order management systems were reduced by 90%, dramatically improving stability and reducing costs.

In Conclusion

A distributed relay architecture may be one of the quickest, most cost-effective solutions to some of the more complex data integration issues faced by big data applications. The ability to place discrete, purpose-built relays into the architecture circumvents many of the issues commonly encountered when pulling from heterogeneous sources.

The ability to scale both vertically and horizontally means these relays remain viable even when dealing with a large number of outside applications and resources. And a distributed, browser-based back end can serve a massive number of project teams and data analysts across multiple regions without requiring proprietary client software to be installed.

Contact

If you would like to contact 3Forge for more details on their relay implementation for data integration and big data analytics applications, you can call their New York City office at +1 646 490 3733 or their London office at +44 020 3950 8528. They can also be reached by email at info@3forge.com.