Real-Time Analytics from Your Data Lake Teaching the Elephant to Dance

One of the biggest challenges with data lakes in general, and Hadoop in particular, is speed. How do you get real-time analytics performance out of a technology like Hadoop that was designed to trade off performance for scalability? While technologies like Hive, Presto, Parquet, ORC and others have delivered improvements, none of them provide near real-time, sub-second performance at scale.

Technologies like Apache Druid are used today alongside Hadoop to deliver real-time queries using the data from the data lake. Druid has also helped these same companies implement end-to-end real-time analytics using message buses like Kafka or Kinesis.

This whitepaper from Imply Data Inc. explains why delivering real-time analytics on a data lake is so hard, approaches companies have taken to accelerate their data lakes, and how they leveraged the same technology to create end-to-end real-time analytics architectures.




We use cookies to optimize your experience, enhance site navigation, analyze site usage, assist in our marketing efforts. Privacy Policy