Apache Airflow — Sensors

Sai Prabhanj Turaga
Nov 4, 2023

In Apache Airflow, a Sensor is a type of operator that waits for a specific external condition to be met before allowing the workflow to continue. Sensors are particularly useful when part of your Directed Acyclic Graph (DAG) must hold until some external event occurs.

Here’s an in-depth explanation of Airflow Sensors.

Key Characteristics of Sensors

Waiting for a Condition

Sensors are designed to wait for some external condition or event to occur. This condition can be related to files, databases, APIs, or any other external systems.

Polling Mechanism

Sensors typically use a polling mechanism, where they check for the condition at regular intervals until it is met. Polling intervals can be configured.
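The polling idea can be sketched in plain Python. This is only an illustration of the check-sleep-repeat loop, not Airflow's actual implementation; the function name `wait_until` and the demo condition are made up for this example, and the parameter names simply mirror the terms used in this article.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], poke_interval: float, timeout: float) -> bool:
    """Check `condition` every `poke_interval` seconds until it holds or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True  # condition met: the workflow may continue
        if time.monotonic() + poke_interval > deadline:
            return False  # the next check would land past the deadline
        time.sleep(poke_interval)


# Demo: a hypothetical condition that becomes true on the third check.
attempts = []


def ready_on_third_check() -> bool:
    attempts.append(1)
    return len(attempts) >= 3


result = wait_until(ready_on_third_check, poke_interval=0.01, timeout=1.0)
```

A real sensor wraps this same pattern in a task, so the scheduler can track it, retry it, and time it out.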

Dynamic Task Execution

A sensor is itself a task in the DAG, but it does no processing of its own. It succeeds once the external condition is met, and the tasks downstream of it then run through Airflow's normal dependency rules. Until that happens, the downstream tasks wait.

Timeouts and Poke Intervals

Sensors can be configured with a timeout, the maximum amount of time to wait for the condition, and a poke interval, the time between checks. If the condition is not met within the timeout, the sensor task fails by default; with soft_fail=True it is marked as skipped instead.

Common Sensor Operators in Airflow

FileSensor

  • Waits for the existence of a file or files in a specified directory.

TimeDeltaSensor

  • Waits until a specified time delta has passed after the end of the run's data interval.

ExternalTaskSensor

  • Waits for the completion of another task in the same or a different DAG.

HttpSensor

  • Polls an HTTP endpoint until the request succeeds and, optionally, a custom check on the response passes.

HdfsSensor

  • Waits for the existence of a file or files in Hadoop Distributed File System (HDFS).

S3KeySensor

  • Waits for a specific key to appear in an Amazon S3 bucket.

SqlSensor

  • Runs a SQL query repeatedly until it returns a row whose first cell is truthy (that is, not 0, '0', empty, or NULL).

RedisPubSubSensor

  • Waits for a message to arrive on a Redis Pub/Sub channel.

JiraSensor

  • Monitors a Jira issue and waits for it to transition to a specific status.

NamedHivePartitionSensor

  • Waits for the existence of a Hive partition with a specified name.

Use Cases and Best Practices

Data Arrival

Use FileSensors or S3KeySensors to wait for data files to arrive in a directory or an S3 bucket before processing.

External Task Dependencies

Use ExternalTaskSensors to create dependencies between tasks in different DAGs, waiting for a prerequisite task to complete before proceeding.

API Availability

Use HttpSensors to wait for the availability of an API or a web service before making API requests.

Database Query

Use SqlSensors to wait for the result of a specific database query before continuing the workflow.

Service Availability

Use custom sensors to monitor the availability of external services like message queues, databases, or third-party services.

Sensors are valuable for creating reliable and resilient workflows. They help ensure that your workflow proceeds only when the necessary conditions are met, making them suitable for scenarios where data dependencies, external services, or specific timing requirements need to be satisfied before task execution.
