Thank you for purchasing the MEAP of Data Pipelines with Apache Airflow, Second Edition.
When we released the first edition of this book, it was based on Airflow 2.0. Since then, Airflow has evolved significantly, gaining many new features and much new functionality. On top of that, Airflow 3.0 is just around the corner, so it is important to be up to date and ready for its arrival.
Despite Airflow’s maturity, new users are still coming on board every day. With its continuously growing ecosystem and vast feature set, it can be a daunting tool for those just starting out. While Airflow’s versatility is an asset most of the time, it can sometimes lead users to build suboptimal workflows, as it can be hard to figure out the right way to use its features. For this reason, we decided to update the book to guide both new and experienced users in making the most of what Airflow has to offer.

Incremental loading and backfilling.
One powerful feature of Airflow’s scheduling semantics is that a schedule not only triggers DAGs at specific points in time (as Cron does, for example), but also provides details about the last and (expected) next schedule intervals. This essentially allows you to divide time into discrete intervals (e.g., every day or every week) and run your DAG once for each of these intervals.
This property of Airflow’s schedule intervals is invaluable for implementing efficient data pipelines, as it allows you to build incremental pipelines. In an incremental pipeline, each DAG run processes only the data for its corresponding time slot (the data’s delta, i.e., the data that changed since the previous interval) instead of reprocessing the entire data set every time. Especially for larger data sets, this can provide significant time and cost benefits by avoiding expensive reprocessing of existing results.
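The idea of dividing time into discrete intervals can be illustrated in plain Python. The sketch below is not Airflow code itself; it mimics what Airflow exposes to a task through its `data_interval_start` and `data_interval_end` template variables, for a daily schedule. The `events` table and `ts` column are hypothetical names used only for illustration.

```python
from datetime import date, datetime, timedelta


def daily_interval(logical_date: date) -> tuple[datetime, datetime]:
    """Return the (start, end) bounds of the daily data interval
    that a run with this logical date would process."""
    start = datetime(logical_date.year, logical_date.month, logical_date.day)
    return start, start + timedelta(days=1)


def incremental_query(logical_date: date) -> str:
    """Build a query that touches only this interval's delta,
    instead of reprocessing the entire table."""
    start, end = daily_interval(logical_date)
    return (
        "SELECT * FROM events "  # 'events' is a hypothetical table
        f"WHERE ts >= '{start:%Y-%m-%d %H:%M:%S}' "
        f"AND ts < '{end:%Y-%m-%d %H:%M:%S}'"
    )


print(incremental_query(date(2024, 1, 1)))
```

Each run selects a half-open interval (start inclusive, end exclusive), so consecutive runs never overlap and never leave gaps; running the function for every past date is exactly what a backfill does.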
Contents.
Welcome.
1 Meet Apache Airflow.
2 Anatomy of an Airflow DAG.
3 Scheduling in Airflow.
4 Templating tasks using the Airflow context.
5 Defining dependencies between tasks.
6 Triggering workflows.
7 Communicating with external systems.
8 Extending Airflow with custom operators and sensors.
9 Testing.
10 Running tasks in containers.
11 Best practices.
12 Project: finding the fastest way to get around NYC.
13 Project: keeping family traditions alive with Airflow and Generative AI.
14 Operating Airflow in production.
Appendix A. Running code samples.
Data Pipelines with Apache Airflow, by De Ruiter J., Cabral I., Geusebroek K., Harenslak B.