We use Kafka in Cloudera DataFlow (CDF) for streaming analytics. CDF gives us real-time data, which is critical for producing live dashboards, and the volume of data it can stream (petabytes) makes it a one-stop shop for live data analysis. Review collected by and hosted on G2.com.
Although Kafka on CDF is scalable, it has a lot of lag problems and needs complex tuning. When lag occurs, that is, when the consumer's current offset falls behind the partition's end offset, the lag can reach 6-7 figures, meaning around 1 million records are waiting to be consumed at times. The dashboard then waits for the latest data, which sometimes takes hours to arrive, and sometimes a restart of the service is also required to fix it.
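To make the lag figure concrete, here is a minimal sketch of how consumer lag is usually computed: per partition, the log end offset minus the consumer group's committed offset. The function name `consumer_lag` and the offset values are hypothetical, chosen only to illustrate a lag of about 1 million records; they are not measurements from a real cluster.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Return per-partition lag: log end offset minus committed consumer offset.

    Both arguments are plain dicts mapping partition number -> offset.
    A partition with no committed offset is treated as starting from 0
    (an assumption made for this illustration).
    """
    lag = {}
    for partition, end in end_offsets.items():
        committed = committed_offsets.get(partition, 0)
        # Clamp at zero so a slightly stale end offset never yields negative lag.
        lag[partition] = max(end - committed, 0)
    return lag


# Hypothetical offsets for a two-partition topic.
end_offsets = {0: 1_250_000, 1: 980_000}
committed_offsets = {0: 250_000, 1: 975_000}

lag = consumer_lag(end_offsets, committed_offsets)
total_lag = sum(lag.values())
print(lag, total_lag)
```

In this made-up case partition 0 alone is one million records behind, which is the kind of 7-figure lag described above.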
Cloudera DataFlow (CDF) gives us a single platform for analyzing real-time streaming data. We mostly use CFM and CEM to push agent data and Kafka to push live data, which is then consumed by Spark; after cleaning, the financial reports are created.
Kafka, which was earlier part of CDP (Cloudera Data Platform), has been moved to CDF, which forces us to buy a separate subscription and adds cost to the project. This was a smart move by Cloudera to make more money, but it hurts us: a service we used along with CDP now has to be purchased separately because it falls under the CDF umbrella.
Hortonworks' two main pillars are HDP (Hortonworks Data Platform) and HDF (Hortonworks Data Flow). The former covers the infrastructure required for building and deploying a data lake, and the latter is about ingestion, in batch or real time.
Both HDP and HDF rely entirely on open source projects, which is a distinctive point about Hortonworks.
As an open source project collection, it relies strongly on community activity. You still have the option to contract premium consulting or training services.
Although it is quickly evolving toward data science tooling (e.g. TensorFlow incorporated in HDP 3), it can be cumbersome for a developer transitioning from a traditional IDE to the notebook and data lake metaphor.