Manage data pipeline

Create data pipeline allows us to define ETL pipelines that create or update event logs on a defined schedule. We can save the pipeline at any point while creating it, which allows us to edit it later. To save a pipeline, click the save pipeline icon in the top-right corner of the Data Pipeline Creation window.

../_images/Manage0021.png

We are prompted to enter the pipeline name.

../_images/Manage0031.png

We can also save the pipeline and exit in one step by clicking the save & exit pipeline icon.

../_images/Manage0041.png

After saving a pipeline or successfully scheduling it, we can manage the pipeline to discover its status, edit the pipeline, stop the pipeline schedule, share the pipeline, or delete the pipeline.

To manage saved or scheduled pipelines, click the Manage Data Pipelines icon in the Portal.

../_images/Manage0051.png

The Data Pipeline Management window appears, displaying the scheduling-related details next to each pipeline name: time/frequency of loads, status, and last run.

../_images/Manage0061.png

Note

By default, the window shows all pipelines. To see only the Scheduled or Unscheduled ones, click the corresponding section.

../_images/Manage0071.png

We can unschedule a scheduled pipeline by moving the pipeline activation slider next to the pipeline name. When the slider is toggled to the off position, the pipeline does not run when it reaches its scheduled time.

../_images/Manage0081.png

Similarly, we can schedule the pipeline again by toggling the slider to the on position.
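
The effect of the activation slider and of the Scheduled/Unscheduled sections can be summarized in a few lines of Python. This is only an illustrative model of the behavior described above, not the product's implementation or API: the `Pipeline` class, its fields, and the `filter_view` helper are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Pipeline:
    """Hypothetical model of one row in the Data Pipeline Management window."""
    name: str
    scheduled: bool      # position of the activation slider (True = on)
    next_run: datetime   # the pipeline's scheduled time

    def is_due(self, now: datetime) -> bool:
        # With the slider off, the pipeline does not run even when
        # its scheduled time has been reached.
        return self.scheduled and now >= self.next_run


def filter_view(pipelines: List[Pipeline], view: str = "All") -> List[Pipeline]:
    """Mimic the All / Scheduled / Unscheduled sections of the window."""
    if view == "Scheduled":
        return [p for p in pipelines if p.scheduled]
    if view == "Unscheduled":
        return [p for p in pipelines if not p.scheduled]
    return list(pipelines)  # default: show every pipeline
```

In this model, setting `scheduled = False` corresponds to toggling the slider off: `is_due` then returns `False` regardless of the time, and the pipeline moves from the Scheduled view to the Unscheduled view.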

The Data Pipeline Management window also shows the status of the last run. The status can be SUCCESS, FAILED, NOT INITIALIZED, QUEUED, or RUNNING.

  • SUCCESS: The scheduled pipeline run was successful.

  • FAILED: The system could not successfully run the scheduled pipeline.

  • NOT INITIALIZED: The pipeline was saved but not scheduled.

  • QUEUED: The pipeline job is currently in a queue waiting to run.

  • RUNNING: The pipeline is currently running.
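
If we track these statuses outside the Portal, for example in a script that records what the window shows, the five values listed above map naturally onto an enumeration. The sketch below is an assumption-laden illustration: `RunStatus` and `parse_status` are hypothetical names, and nothing here is an interface exposed by the product.

```python
from enum import Enum


class RunStatus(Enum):
    """Last-run statuses as listed in the Data Pipeline Management window."""
    SUCCESS = "The scheduled pipeline run was successful."
    FAILED = "The system could not successfully run the scheduled pipeline."
    NOT_INITIALIZED = "The pipeline was saved but not scheduled."
    QUEUED = "The pipeline job is currently in a queue waiting to run."
    RUNNING = "The pipeline is currently running."

    @property
    def label(self) -> str:
        # The label as displayed in the window, e.g. "NOT INITIALIZED".
        return self.name.replace("_", " ")


def parse_status(label: str) -> RunStatus:
    """Convert a displayed label back into a RunStatus member."""
    return RunStatus[label.strip().upper().replace(" ", "_")]
```

For example, `parse_status("NOT INITIALIZED")` yields `RunStatus.NOT_INITIALIZED`, whose `value` is the description given in the list above.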

Note

When a pipeline is scheduled from an S3 folder, and there are no unprocessed micro-batches in the S3 folder at the start of a pipeline run, the pipeline is marked as “FAILED”. The hover-over message in the pipeline manager indicates that there were no micro-batches to be processed by this pipeline run.

../_images/Manage0091.png
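
Because an empty S3 folder also produces a FAILED status, it can help to separate this case from genuine failures when reviewing runs. The following is a minimal sketch of that rule under stated assumptions: the function name, its parameters, and the `"s3"` source label are hypothetical, and the product itself reports this condition only through the hover-over message.

```python
def failed_for_lack_of_batches(status: str, source: str,
                               unprocessed_at_start: int) -> bool:
    """Return True when a FAILED run is explained by an empty S3 folder.

    status               -- last-run status shown in the window
    source               -- where the pipeline is scheduled from (assumed label)
    unprocessed_at_start -- unprocessed micro-batches present when the run began
    """
    return (
        status == "FAILED"
        and source == "s3"
        and unprocessed_at_start == 0
    )
```

A run for which this returns `True` had nothing to process, whereas a FAILED run with unprocessed micro-batches present at the start points to a different cause.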

To view the details of all previous runs, click the Manage017 icon.

Note

The status of the pipeline run can be either SUCCESS or FAILED. When the status is FAILED, the system could not run the scheduled pipeline successfully.

../_images/Manage0101.png

To trigger a pipeline to run immediately, click the Manage018 icon.

../_images/Manage0111.png

We can also edit a pipeline by clicking the edit icon.

../_images/Manage0121.png

This redirects us to the Data Pipeline Creation window, where we can edit the Extract, Merge, Transform, or Load tab of the pipeline.

To delete a scheduled pipeline, click the Manage019 icon.

../_images/Manage0131.png

We can share a data pipeline with other users by clicking on the share icon.

../_images/Manage0141.png

Note

While a user we have shared a pipeline with is editing it, we cannot edit the pipeline. To learn more about sharing a data pipeline, see Share data pipeline.

We can also refresh the window to get the latest pipeline management information. To refresh the window, click the Manage020 icon.

../_images/Manage0151.png