Monday, June 1, 2020

Part 1: Deploying a Distributed SQL Backend for Apache Airflow on Google Cloud

Apache Airflow is a popular platform for programmatically authoring, scheduling, and monitoring workflows. It has been deployed by companies like Adobe, Airbnb, Etsy, Instacart, and Square. The advantage of defining workflows as code is that they become more maintainable, versionable, testable, and collaborative. In Airflow, workflows are authored as directed acyclic graphs (DAGs) of tasks. Airflow’s scheduler executes those tasks on an array of workers while respecting the specified dependencies, and rich command line utilities make it easy to perform complex operations on DAGs. Its browser-based UI makes it easy to visualize pipelines running in production, monitor their progress, and troubleshoot issues when needed.
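To make the DAG-of-tasks idea concrete, here is a minimal sketch of an Airflow DAG, assuming Airflow 1.10-style imports; the DAG id, schedule, and task commands are illustrative placeholders, not part of the deployment described in this series.

```python
# Minimal illustrative Airflow DAG: two bash tasks with one dependency.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

with DAG(
    dag_id="example_pipeline",          # hypothetical DAG name
    start_date=datetime(2020, 6, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    load = BashOperator(task_id="load", bash_command="echo load")

    # The dependency below forms the directed acyclic graph:
    # the scheduler runs "extract" before "load".
    extract >> load
```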

Why Airflow With a YugabyteDB Backend?

By default, Airflow uses a SQLite database for its metadata store, which both the scheduler and the web UI rely on. In production, the SQLite backend is typically replaced with a traditional RDBMS like PostgreSQL. However, to keep PostgreSQL from becoming a single point of failure in the Airflow deployment, administrators still need to devise high-availability and failover strategies for it. There is a simpler option: YugabyteDB. Airflow can interact with it just as it does with PostgreSQL, but it adds built-in high availability, high performance, and support for multiple cloud and topology deployment options.
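As a rough sketch of what "interact with it just like PostgreSQL" means in practice: YugabyteDB's YSQL API speaks the PostgreSQL wire protocol (by default on port 5433), so the same SQLAlchemy connection URI that Airflow would use for PostgreSQL can point at a YugabyteDB node instead. The host, credentials, and database name below are illustrative placeholders, not values from this article.

```python
# Sketch: verifying connectivity to YugabyteDB through the standard
# PostgreSQL driver/URI that Airflow's metadata store also uses.
from sqlalchemy import create_engine, text

# Hypothetical connection string; YSQL listens on port 5433 by default.
YB_CONN = "postgresql+psycopg2://airflow:airflow@yb-tserver.example.com:5433/airflow"

engine = create_engine(YB_CONN)
with engine.connect() as conn:
    print(conn.execute(text("SELECT version()")).scalar())

# Airflow itself would pick up the same URI from airflow.cfg
# ([core] sql_alchemy_conn) or the AIRFLOW__CORE__SQL_ALCHEMY_CONN
# environment variable.
```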



