Manage data pipeline
Create data pipeline lets us define ETL pipelines that create or update event logs on a defined schedule. We can save the pipeline at any point while creating it, which allows us to edit it later. To save a pipeline, click the save pipeline icon in the top-right corner of the Data Pipeline Creation window.
We are prompted to enter the pipeline name.
We can also save a pipeline and exit in one step by clicking the save & exit pipeline icon.
After saving a pipeline or successfully scheduling it, we can manage the pipeline to discover its status, edit the pipeline, stop the pipeline schedule, share the pipeline, or delete the pipeline.
To manage saved or scheduled pipelines, click the Manage Data Pipelines icon in Portal.
The Data pipeline Management window appears, displaying all the scheduling-related details next to the pipeline name: time/frequency of loads, status, last run.
Note
By default, the window shows all the pipelines. To see only Scheduled or Unscheduled ones, click on the corresponding section.
We can unschedule a scheduled pipeline by moving the pipeline activation slider next to the pipeline name. When the slider is in the off position, the pipeline does not run when it reaches its scheduled time.
Similarly, we can schedule the pipeline by toggling on the switch.
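The slider behaviour and the Scheduled/Unscheduled sections can be pictured with a small model. This is only a sketch for intuition, not the product's API: the `Pipeline` class, its `active` and `next_run` fields, and the `is_due` and `filter_pipelines` helpers are all hypothetical names, and treating "Unscheduled" as "slider off" is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Pipeline:
    """Hypothetical stand-in for one row of the management window."""
    name: str
    next_run: datetime
    active: bool = True  # assumed to mirror the activation slider position


def is_due(pipeline: Pipeline, now: datetime) -> bool:
    """A pipeline runs only if its slider is on and its scheduled time has arrived."""
    return pipeline.active and now >= pipeline.next_run


def filter_pipelines(pipelines, scheduled: bool):
    """Mimic the Scheduled / Unscheduled sections of the window."""
    return [p for p in pipelines if p.active == scheduled]


pipelines = [
    Pipeline("orders", datetime(2024, 1, 1, 8, 0)),
    Pipeline("invoices", datetime(2024, 1, 1, 8, 0)),
]
pipelines[1].active = False  # toggle the slider off for "invoices"

now = datetime(2024, 1, 1, 9, 0)
due_names = [p.name for p in pipelines if is_due(p, now)]
unscheduled_names = [p.name for p in filter_pipelines(pipelines, scheduled=False)]
```

In this model, both pipelines have reached their scheduled time, but only the one whose slider is on is considered due.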
The Data pipeline Management window also informs us about the last run status. The run status can be SUCCESS, FAILED, NOT INITIALIZED, QUEUED or RUNNING.
SUCCESS: The scheduled pipeline run was successful.
FAILED: The system could not successfully run the scheduled pipeline.
NOT INITIALIZED: The pipeline was saved but not scheduled.
QUEUED: The pipeline job is currently in a queue waiting to run.
RUNNING: The pipeline is currently running.
Note
When a pipeline is scheduled from an S3 folder, and there are no unprocessed micro-batches in the S3 folder at the start of a pipeline run, the pipeline is marked as “FAILED”. The hover-over message in the pipeline manager indicates that there were no micro-batches to be processed by this pipeline run.
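The status values and the S3 rule in the note above can be summarised as follows. This is a minimal sketch under stated assumptions: `RunStatus` and `s3_run_outcome` are hypothetical names, the hover message wording is illustrative rather than the exact text shown in the pipeline manager, and the sketch assumes that a run with at least one unprocessed micro-batch succeeds.

```python
from enum import Enum


class RunStatus(Enum):
    """The five last-run statuses listed above."""
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"
    NOT_INITIALIZED = "NOT INITIALIZED"
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"


def s3_run_outcome(unprocessed_batches):
    """Return (status, hover message) for a hypothetical S3-folder run."""
    if not unprocessed_batches:
        # No unprocessed micro-batches at the start of the run: marked FAILED.
        # The message wording here is illustrative only.
        return RunStatus.FAILED, "No micro-batches to be processed by this pipeline run."
    # Actual processing is out of scope; this sketch assumes it succeeds.
    return RunStatus.SUCCESS, ""


empty_status, empty_message = s3_run_outcome([])
loaded_status, _ = s3_run_outcome(["batch-001.csv"])
```

A FAILED status on an S3-based pipeline therefore does not always indicate an error in the pipeline itself, so it is worth checking the hover-over message first.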
To view the details of all previous runs, click
Note
The status of a previous pipeline run is either SUCCESS or FAILED. When the status is FAILED, the system could not run the scheduled pipeline successfully.
If we wish to trigger a pipeline to run now, we can click
We can also edit a pipeline by clicking the edit icon.
This redirects us to the Data Pipeline Creation window where we can edit the Extract, Merge, Transform or Load tab of the pipeline.
To delete a scheduled pipeline, click
We can share a data pipeline with other users by clicking on the share icon.
Note
When we share a pipeline and the shared user is editing it, we cannot edit the pipeline at the same time. To learn more about sharing a data pipeline, see Share data pipeline.
We can also refresh the window to get the latest pipeline management information. To refresh the window, click