Upload task mining data to the Portal

The raw data collected by the digital agent is not suitable for direct use as an event log in Apromore, because many events may lack a case identifier or an activity label. To address this shortcoming, the Apromore task mining cockpit performs computations internally to convert the raw task mining data into an event log that can be uploaded to the Portal.

The task mining cockpit performs the following processing steps:

  • Merging all raw data CSVs: Data from all four raw event CSV files (window events, element focus events, data items, and copy-paste events) is merged to enrich event attributes.

  • Cleaning the data: Depending on the rules specified in the configuration, some events may not have a Case ID. To ensure all events have a Case ID, Apromore populates the null Case IDs with the Case ID value of the preceding or subsequent events (see the sketch after this list).

  • Separating event logs by identifier: Apromore groups the events by the identifier defined in the configuration. For instance, if events are identified as either Salesforce or GitHub in the configuration file, the final event log will be organized into two separate folders: the ‘Salesforce’ folder, containing only Salesforce events, and the ‘GitHub’ folder, containing only GitHub events.
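Conceptually, these steps amount to something like the following minimal pandas sketch. The file names, the Timestamp, CaseID, and Identifier columns, and the forward/backward fill strategy are assumptions for illustration only, not Apromore's internal implementation.

```python
import pandas as pd
from pathlib import Path

# Hypothetical raw CSVs produced by the digital agent; names are illustrative.
raw_files = ["window_events.csv", "element_focus_events.csv",
             "data_items.csv", "copy_paste_events.csv"]

# 1. Merge all raw event data into a single frame, keeping every attribute.
events = pd.concat((pd.read_csv(f) for f in raw_files), ignore_index=True)
events = events.sort_values("Timestamp")  # assumed timestamp column

# 2. Clean: fill missing Case IDs from the prior event, then from the next one.
events["CaseID"] = events["CaseID"].ffill().bfill()

# 3. Separate: write one event log per identifier (e.g. 'Salesforce', 'GitHub').
for identifier, log in events.groupby("Identifier"):
    out_dir = Path(identifier)
    out_dir.mkdir(exist_ok=True)
    log.to_csv(out_dir / "event_log.csv", index=False)
```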

The resulting processed task mining data can be uploaded to the Apromore Portal as an event log in two ways:

  • Manually by downloading the processed task mining data from the project page and uploading it via the Log Importer.

  • By creating an ETL pipeline to load the processed data as an event log.

Upload via the Log Importer

To access the file from the task mining cockpit, go to the project and click Download processed data.

Get002

This downloads a ZIP file containing the processed task mining data for each identifier. This data can be uploaded to Apromore as an event log.

The task mining data produced by the digital agent typically contains the following types of attributes, among others (sample rows are sketched after this list):

  • Activity: Activities captured by the digital agent.

  • Resource: Email address associated with the digital agent that registered the event.

  • Step: Steps captured by the digital agent.

  • Window: Applications opened by the user.

  • Window element: UI elements in focus captured by the digital agent. They include elements within a window, such as buttons, form fields, and dropdowns.

  • Screenshot: Images captured by the digital agent showing the contents displayed on the user’s screen.
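For orientation, rows in such a log might look like this (all values are invented for illustration):

```
Activity,Resource,Step,Window,Window element,Screenshot
Create ticket,agent@example.com,Click Submit,Salesforce - Google Chrome,Submit button,screenshot_0001.png
Review issue,agent@example.com,Open issue page,GitHub - Google Chrome,Issue link,screenshot_0002.png
```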

When the task mining log is uploaded to Apromore, these columns are tagged automatically.

Upload002

The same holds for the screenshot column if present.

Upload003

Note

  • If column names have been changed during preprocessing (in a data pipeline), we may need to manually map each column to its type (e.g., activity, step, window, window element, or screenshot).

  • The activity, step, window, and window element columns should not contain empty values. If they do, Apromore will display an error during log import, allowing us to skip rows with missing values. However, the screenshot column may contain empty values (a quick check is sketched below).
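If in doubt, a quick pre-upload check along these lines can flag problem rows before import (a pandas sketch; the file and column names are illustrative):

```python
import pandas as pd

log = pd.read_csv("processed_task_mining_log.csv")  # illustrative file name

# These columns must be fully populated; Screenshot is allowed to be empty.
required = ["Activity", "Step", "Window", "Window element"]
incomplete = log[required].isna().any(axis=1)
print(f"{incomplete.sum()} rows are missing required values")

log = log[~incomplete]  # or let Apromore skip these rows during import
```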

Upload via an ETL pipeline

The processed task mining data can also be accessed using an ETL pipeline. The logs are saved as Parquet files in a dedicated S3 folder, and we can create an ETL pipeline that appends them into a single log.
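Outside of Apromore, the same append can be sketched with pyarrow, which treats a folder of Parquet files as a single dataset. The bucket name and path below are illustrative; the actual folder layout is shown in the steps that follow.

```python
import pyarrow.dataset as ds

# Illustrative S3 location following the taskmining/event_logs layout.
dataset = ds.dataset(
    "s3://my-bucket/taskmining/event_logs/61/standalone/Apromore/",
    format="parquet",
)
log = dataset.to_table().to_pandas()  # all Parquet files appended into one log
```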

In the Extract phase of the ETL pipeline, select Amazon S3.

Get003

Click the folder icon to select the folder.

Get004

To ensure the pipeline always picks up new logs in the folder, we use S3 micro-batching. Go to the folder of the task mining project. This folder is in the path “taskmining > event_logs > {project_number} > standalone”.

Get005
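Micro-batching here means that each pipeline run picks up only the objects that have appeared since the previous run. As a rough illustration of the idea (not Apromore's implementation; the bucket name and the stored last-run timestamp are assumptions):

```python
import boto3
from datetime import datetime, timezone

s3 = boto3.client("s3")
last_run = datetime(2024, 1, 1, tzinfo=timezone.utc)  # persisted between runs

resp = s3.list_objects_v2(
    Bucket="my-bucket",  # illustrative bucket name
    Prefix="taskmining/event_logs/61/standalone/",
)
new_files = [
    obj["Key"]
    for obj in resp.get("Contents", [])
    if obj["LastModified"] > last_run and obj["Key"].endswith(".parquet")
]
```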

Note

For every event tagged as an identifier in the configuration file, Apromore creates a separate folder and stores its event log there. This keeps the extracted data modular.

In the screenshot below, Project 61 had only Apromore events tagged as an identifier while Project 17 had Apromore, ExcelProcess, and StackOverflow tagged as identifiers.

Note

If the identifiers are interrelated and we wish to have all extracted data in a single event log, we can create a new pipeline that merges the results of the individual processes (a sketch follows the screenshot below).

Get006
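A merge pipeline like the one suggested in the note above could look roughly like the sketch below, using project 17's identifiers as an example. The bucket name is illustrative, and the added Identifier column is an assumption that preserves which process each event came from.

```python
import pandas as pd
import s3fs

fs = s3fs.S3FileSystem()  # credentials taken from the environment
base = "my-bucket/taskmining/event_logs/17/standalone"  # illustrative bucket

logs = []
for identifier in ["Apromore", "ExcelProcess", "StackOverflow"]:
    for key in fs.ls(f"{base}/{identifier}/"):
        if key.endswith(".parquet"):
            df = pd.read_parquet(f"s3://{key}")
            df["Identifier"] = identifier  # remember the source process
            logs.append(df)

merged = pd.concat(logs, ignore_index=True)
```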

To extract the “Apromore” process task mining data for project 61, select Apromore and click Extract.

Get007

Change the file type to PARQUET.

Get008

Once done, click Extract.

Get009

Apromore extracts the Parquet files in the folder using S3 micro-batch ingestion. In the Load step, Apromore automatically detects and assigns appropriate tags for task mining–specific columns. Ensure that:

  • The column representing a step is tagged as Step.

  • The column representing a focus element name is tagged as Window Element.

  • The column representing the application window is tagged as Window.

  • The column containing screenshot references is tagged as Screenshot.

Get010
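If automatic detection misses a column, the mapping to apply manually can be thought of as a simple column-to-tag table. The column names on the left are hypothetical; the tags on the right are the ones listed above.

```python
# Hypothetical source columns mapped to the Apromore tags listed above.
column_tags = {
    "step": "Step",
    "focus_element_name": "Window Element",
    "application_window": "Window",
    "screenshot": "Screenshot",
}

# Sanity check: every required tag has a column assigned.
required = {"Step", "Window Element", "Window", "Screenshot"}
assert required <= set(column_tags.values()), "a required tag is unmapped"
```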

To learn more about the tags, see Upload an event log.

Click Schedule Pipeline to specify how frequently the pipeline will run.

Note

Because an S3 folder was selected at the Extract stage, each pipeline run automatically updates the log with any new files that have arrived in the folder.

Once the data is in the Portal, it can be analyzed using any of the Apromore tools, such as Process Discoverer and Dashboards.