Cloudera announced today it has added to its portfolio a Cloudera SQL Stream Builder tool based on technology it gained with the acquisition of Eventador that makes it possible to employ SQL to query streams of data in real time.
That Eventador tool is now integrated with the Cloudera DataFlow (CDF) streaming platform, which provides a common framework for processing streaming data using the open source Apache Flink, Kafka Streams, or Spark Structured Streaming engines. Previously, the only way to query that data was with programming tools based on Java or Scala. Data analysts can now query CDF data without having to know how to write code, said Dinesh Chandrasekhar, head of product marketing for Cloudera.
SQL Stream Builder also enables analysts to create views of query results that can be exposed to other applications via REST application programming interfaces (APIs). It has also been integrated with the Shared Data Experience (SDX) framework Cloudera created to enforce governance and security policies across CDF.
Despite the rise of a wide range of programming languages employed to analyze data, the dominant lingua franca for querying data in the enterprise remains SQL. However, as the need to query data as it streams in real time grows, organizations want to be able to extend SQL to, for example, identify anomalies in processes that could indicate fraud, Chandrasekhar said.
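To make the fraud example concrete, the following is a minimal Python sketch of the kind of windowed aggregation a streaming SQL query might express. It is not Cloudera's implementation or API: the event shape, the `flag_bursts` function, the 60-second window, and the threshold are all hypothetical, and the SQL in the docstring is illustrative pseudo-SQL rather than syntax taken from SQL Stream Builder. For simplicity it runs over a finite list of events rather than a live stream.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (timestamp in seconds, account id, amount).
Event = Tuple[int, str, float]


def flag_bursts(
    events: Iterable[Event], window_s: int = 60, max_count: int = 3
) -> List[Tuple[int, str, int]]:
    """Count events per account in fixed (tumbling) windows and flag bursts.

    Roughly what this illustrative pseudo-SQL would ask for:
        SELECT account_id, COUNT(*)
        FROM payments
        GROUP BY account_id, <60-second tumbling window on ts>
        HAVING COUNT(*) > 3
    """
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, account, _amount in events:
        # Align each timestamp to the start of its window.
        window_start = ts - ts % window_s
        counts[(window_start, account)] += 1
    # Keep only (window, account) pairs that exceed the threshold.
    return sorted((w, a, n) for (w, a), n in counts.items() if n > max_count)


# Made-up sample data: account "a" makes four payments in the first minute.
sample = [
    (0, "a", 10.0),
    (5, "a", 12.0),
    (10, "a", 9.0),
    (20, "a", 11.0),
    (30, "b", 50.0),
    (65, "a", 5.0),
]
flagged = flag_bursts(sample)
print(flagged)  # [(0, 'a', 4)]
```

A real streaming engine would emit results continuously as each window closes instead of waiting for the input to end, but the grouping and threshold logic is the same idea an analyst would write in SQL.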
Much of the increased need to query streaming data is being driven by digital business transformation initiatives that process and analyze data in real time using platforms such as Spark and Kafka. At some point, an analyst is going to need to launch an ad hoc query against that data to resolve a pressing issue long before the data is eventually stored in a relational database. “Data has a shelf life,” said Chandrasekhar.
Rather than having to find a developer to write that query in Java or some other programming language, an analyst can now launch a SQL query immediately on their own. Previously, that query might never have been launched simply because it would have taken too much time and effort to find a developer to write the code.
In general, more data than ever is being processed and analyzed in real time, both at the points where it is created and consumed and as it moves between applications. Cloudera is betting much of that data will ultimately land in a data warehouse based on the open source distribution of Hadoop that it provides. However, in the last few years, rival SQL-compatible data lakes based on proprietary platforms managed by cloud service providers have been gaining traction at the expense of providers of platforms based on Hadoop.
Cloudera, with the launch of Cloudera SQL Stream Builder, is adding one more SQL-compatible tool to a portfolio that makes it possible to query data residing in Hadoop and other frameworks such as Apache Spark that are typically deployed on top of Hadoop. It’s not clear just yet to what degree those capabilities will enable Cloudera to counter the recent successes of its rivals. However, as a provider of a data warehouse platform based on open source software, Cloudera does appeal to IT organizations that have decided to avoid proprietary software whenever possible.
Regardless of what tool is employed to analyze data, more of it than ever is being generated, and at a faster rate. The degree to which humans will be able to analyze data that is generated in real time remains to be seen. Many of the digital processes that organizations are trying to analyze occur in milliseconds, which is too fast for a human being to catch without help from some form of AI. Nevertheless, there’s a lot of data residing in streaming platforms that can be queried. The challenge now is knowing first how to structure those SQL queries and, just as importantly, when to launch them.