The Cloudcast

Data Pipelines with Apache Airflow

Massive Studios

Julian LaNeve (@JulianLaneve, CTO @astronomerio) discusses data pipelines, Apache Airflow, Astronomer’s managed offering, and the benefits of data pipelines for both developers and operations.

SHOW: 939

SHOW TRANSCRIPT: The Cloudcast #939 Transcript

SHOW VIDEO: https://youtube.com/@TheCloudcastNET 

NEW TO CLOUD? CHECK OUT OUR OTHER PODCAST: "CLOUDCAST BASICS" 

SPONSORS:

  • [VASION] Vasion Print eliminates the need for print servers by enabling secure, cloud-based printing from any device, anywhere. Get a custom demo to see the difference for yourself.
  • [FCTR] Try FCTR.io (that's F-C-T-R dot io) free for 60 days. Modern security demands modern solutions. Check out Fctr's Tako AI, the first AI agent for Okta, on their website.

SHOW NOTES:


Topic 1 - Welcome to the show, Julian. Give everyone a quick introduction.

Topic 2 - Our topic today is Data Pipelines with Apache Airflow. For those unfamiliar, provide an introduction to Apache Airflow and how Airflow manages data pipelines.

Topic 3 - What are the advantages of Apache Airflow vs. others in the space? What are the downsides? How does Airflow fit in with other Apache projects?

Topic 4 - I would imagine this is where Astronomer potentially comes into play. What makes Astronomer different from Airflow? What problems are you trying to solve for both developers and operations folks?

Topic 5 - What does a typical implementation look like? What growing pains do developers typically face when they need to introduce pipelining tools and begin standardization? Is it a scale issue? A complexity of tools issue? Integrations with infrastructure?

Topic 6 - One aspect I typically see with automation is security, especially at scale. What recommendations do you have for developers regarding security, particularly in the context of multi-tenancy, for data pipelines?


FEEDBACK?
