JavaProspect: Use Case: How to Implement Hive Hooks to Optimize a Data Lake

Tuesday, October 30, 2018

Data About Data

The key difference between a data lake and a data swamp is organization: prudently organized data makes for an efficient lake, while a swamp is just data that is either over-replicated or siloed by its users. Knowing how production data is used across the organization not only helps in building a well-organized data lake but also helps data engineers fine-tune the data pipelines or the data itself.

To understand how data is consumed, we need to answer some basic questions, such as:

from DZone.com Feed https://ift.tt/2CQ43dl
