Tuesday, October 30, 2018

Use Case: How to Implement Hive Hooks to Optimize a Data Lake

Data About Data

The important difference between a data lake and a data swamp is organization: prudently organized data makes for an efficient lake, while a swamp is just data that is either over-replicated or siloed by its users. Knowing how production data is used across the organization not only helps in building a well-organized data lake but also helps data engineers fine-tune the data pipelines or the data itself.

To understand how data is consumed, we need to figure out answers to some basic questions like:



from DZone.com Feed https://ift.tt/2CQ43dl
