AI Alerts Dashboard (Legacy)

The AI Alerts feature offers proactive anomaly detection and instant alerts to help shorten time to resolution. This advanced feature eliminates the need to determine thresholds, identify the relevant dimensions to monitor, and set alerts to notify the appropriate individuals and groups.

The Conviva AI Alerting system continuously checks for anomalies. It computes a baseline and a range of variation for each metric, based on the mean and standard deviation derived from historical data, and uses that range to evaluate the traffic in the past few minutes. If the upper boundary of the range (the threshold) is exceeded, an anomaly is detected, which triggers the diagnosis process to determine whether an alert should fire based on the sensitivity control settings and the root cause of the event.
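As a rough sketch of this approach (all sample values and the sensitivity parameter `k` are illustrative; the production detection algorithm is not described here):

```python
from statistics import mean, stdev

def detect_anomaly(history, recent, k=3.0):
    """Flag an anomaly when recent traffic exceeds the upper boundary
    (baseline + k standard deviations) derived from historical data.
    k plays the role of a sensitivity control and is illustrative."""
    baseline = mean(history)
    upper = baseline + k * stdev(history)
    current = mean(recent)  # metric over the past few minutes
    return current > upper, baseline, upper

# e.g. rebuffering-ratio samples (%) captured under normal conditions
history = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1]
anomalous, baseline, upper = detect_anomaly(history, [4.5, 5.1])
```

A value inside the band is treated as normal variation; only a sustained excursion above the upper boundary proceeds to the diagnosis stage.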

For each generated alert, the system provides the sessions attributed to the dimension (or set of dimensions) associated with the root cause of the alert, and the views that were impacted by the alert condition.

Each AI alert is assigned an alert severity (Info, Warning, Critical) that can help you determine the impact of the alert condition. By default, all alerts fire with the Info severity until the thresholds for Warning and Critical are set. As alerts fire, the AI Alerts page displays the alert totals for each severity level and related metric, and continuously updates the totals as the alert conditions change. When an alert condition returns to the accepted range of variation and sensitivity control limits, the AI alert ends and is listed in the Alert Ended totals to indicate that the alert condition cleared. The specific threshold for each severity level is configured in the sensitivity controls. For more details on setting the sensitivity controls, see AI Alert Sensitivity.
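Conceptually, the severity classification against configured thresholds can be sketched as follows (the function and its threshold parameters are illustrative, not the actual sensitivity-control implementation):

```python
def severity(value, warning=None, critical=None):
    """Classify an alert's metric value against hypothetical
    sensitivity-control thresholds. Until the Warning and Critical
    thresholds are set, every alert fires as Info."""
    if critical is not None and value >= critical:
        return "Critical"
    if warning is not None and value >= warning:
        return "Warning"
    return "Info"
```

For example, with a Warning threshold of 5.0 and a Critical threshold of 9.0, a metric value of 6.0 would classify as Warning; with no thresholds set, the same value classifies as Info.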

The alert totals show the alerts for the last 30 days.

This feature offers automatic anomaly detection and fault isolation across the following key metrics and key dimensions:

Metrics
  • Video Start Failures Technical

  • Exits Before Video Start

  • Video Startup Time

  • Rebuffering Ratio

  • Video Playback Failures Technical

Dimensions (Root-cause)
  • All Traffic

  • Live

  • VoD

  • Device

  • CDN

  • Asset

  • ISP

  • Channel (Beta)

  • City

  • OS Family

  • Player Name

  • Streaming Protocol (Beta)

AI Alerts fired for the regional ISP and city dimensions improve root cause diagnosis, for example by tracing the root cause to a specific regional ISP and city rather than only the DMA group. As a result, you can see alerts associated with root causes from city only; regional ISP only; regional ISP and city; CDN and regional ISP; and CDN, regional ISP, and city. The DMA group dimension is no longer supported.

The Channel and Streaming Protocol dimensions support the following root cause combinations:

  • CDN: Channel: Protocol

  • CDN: Channel

  • CDN: Protocol

  • Channel: Protocol

  • Channel

  • Protocol

For a detailed definition of these metrics, see the Metric Dictionary.

Viewing AI Alerts List

To view the AI Alerts summary page:

  • Log in to Conviva Video and select AI Alerts in the Video feature selection


    Or

  • Click on the Alert Bell in the menu and click an AI alert to display the AI alert summary.

The AI Alerts page shows the total number of fired alerts along with the alert totals by severity level and metric for up to the last 30 days.

Click a severity level or metric total to filter the alert display for that combination of severity and metric. For example, filter alerts for specific severity levels (Critical and Alert Ended) and metrics (Exits Before Video Start).

Each alert instance provides the following AI alert information:

  • Customer: For CDN Partners, displays the customer impacted by the alert.

  • Account: Name of the account impacted by the alert.

  • Metrics: Name of the metric for which the AI alert fired.

  • Metric value: Metric value that caused the AI alert to fire.

  • Root cause: Dimension(s) that caused the AI alert to fire.

  • Severity: Critical, Warning, or Info.

  • Cumulative Impacted Unique Devices: The cumulative total of devices impacted by the issue at the time the AI alert fired.

  • Time alert fired/ended: Date and time that the AI alert fired, linked to the detailed Diagnostics page for the alert. When an alert ends, this field indicates the date and time the alert ended.

By default, the alert instances are sorted by time the alert fired, with the most recent instances at the top. You can also customize the alert display by clicking on a column name, severity level summary or metric summary.

As the AI alert conditions change, the alert severity updates to show the latest severity level. When an alert condition returns to the accepted range of variation and sensitivity control limits, the AI alert ends and is listed in the Alert Ended totals to indicate that the alert condition cleared.

Click an alert to toggle the alert history and display the previous alert states.

Root Cause Analysis

Diagnosing an AI alert starts with examining the information in the initial AI alert message and interpreting the dimensions in the alert title. If necessary, you can then inspect the diagnostic details of the alert to determine whether the root cause can be further isolated to a specific player, operating system, or browser. For alerts with only a CDN or Asset as the root cause, no further diagnosis is required. In other cases, further diagnosis, such as narrowing the OS, browser, and player levels, or analyzing the time series, can often help to narrow the root cause and troubleshoot the progression of the alert condition.

For each generated AI alert, the system displays the dimensions attributed to the root cause, and for further analysis provides a diagnostic time series of the metric variations leading up to the alert firing and continuing past the end of the alert. The AI alert system also displays a list of sessions that were impacted due to the alert and the device metadata associated with sessions to help you further isolate the root cause levels.

Issue Convergence

AI alerts that fire within several minutes of each other for the same metric are possibly related to the same root cause. As an alert condition progresses, the AI alert system performs continuous fault isolation and root cause analysis to detect whether the initial dimensions have converged to further isolate the source of the alert. In the example below, after the initial Video Startup Time AI alert fired on August 1st at 02:57 with the root cause dimensions of Roku and Live, another Video Startup Time AI alert fired two minutes later at 02:59 with more impacted unique devices, as the root cause dimension converged to only Live traffic. This convergence indicates that within several minutes the AI alert system determined that the initial Video Startup Time alert was not limited to Roku players and Live traffic, but was isolated to all players with Live traffic.

Initial AI Alert: Roku Live

Converged AI Alert: Live

AI alert convergence may take up to several minutes depending on the complexity of the AI alert condition and scope of the impact across OTT system levels. When initially interpreting an AI alert, check for neighboring alerts to determine if alerts have converged. If convergence occurred, use the latest alert and root cause for further diagnosis.
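One way to think about convergence is that the later alert's root cause dimensions form a strict subset of the initial alert's dimensions (a hypothetical helper for reasoning about alert pairs, not a Conviva API):

```python
def has_converged(initial_dims, later_dims):
    """A later alert for the same metric has converged when its root
    cause dimensions are a strict subset of the initial alert's,
    e.g. {"Roku", "Live"} narrowing to {"Live"}.
    Use the later (converged) alert for further diagnosis."""
    return set(later_dims) < set(initial_dims)
```

In the example above, `has_converged({"Roku", "Live"}, {"Live"})` holds, so the 02:59 alert is the one to diagnose.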

Searching AI Alerts

The AI Alerts page allows you to perform searches by the Root Cause, Metrics and Time Alert Fired fields.

To search the alerts:

Select the desired date range.

Type a search term such as iPhone or Akamai in the search box, and press the Enter key.

To clear your search and see all alerts diagnostics again, clear the search box and press the Enter key.

Note: If your search yields no results, click the search box and press the Enter key. This will re-populate the list with all alerts.

Viewing AI Alerts Diagnostics

The AI alerts diagnostic page enables you to drill into the alert details with a data snapshot at the time the alert fired, a time series chart depicting the alert firing sequence expanded to one-minute intervals, and data illustrating the alert conditions, such as the metric baseline and range of metric variance. When available, a second time series is shown for a related metric.

To diagnose an AI alert from the Diagnostics page, click the Time Alert Fired column for one of the alerts. The Diagnostics page appears with the alert details.

The Data snapshot section displays data at the time the alert fired:

  • Average metric value and severity or status

  • Time issue started

  • Time the AI alert fired

  • Cumulative number of impacted unique devices

  • Root cause of the alert

Because this data is a snapshot at the time the alert fired, the metric value, severity, and snapshot data do not reflect any subsequently impacted sessions. The AI alert percentage is the percentage of sessions within the root cause group that were impacted by the alert. For example, in the sample data snapshot, a Video Start Failures percentage of 9.3% indicates that 9.3% of all the sessions using the INHOUSE CDN and VOD (the root cause group) were impacted.
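That percentage is simply the share of impacted sessions within the root cause group; for example (the session counts here are hypothetical):

```python
def alert_percentage(impacted, total_in_group):
    """Percentage of sessions within the root cause group
    (e.g. INHOUSE CDN + VOD) that were impacted by the alert."""
    return 100.0 * impacted / total_in_group

# e.g. 930 impacted sessions out of 10,000 in the root cause group
pct = alert_percentage(930, 10_000)  # 9.3
```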

The time series chart shows the metric that caused the alert to fire, with one-minute data intervals for a two-hour window, and severity-colored vertical bars showing when the alert reached each severity level. Drag the timestamps at the ends of the time series to zoom in on a more granular time period.

  • The horizontal blue dashed line shows the typical value for the metric, based on over 40 hours of recent historical data captured during normal conditions.

  • The horizontal blue band shows the typical range of variation for the metric. Values within the band are considered normal variation, based on recent historical data. A deviation outside the range is considered abnormal and may lead to an alert being fired if the sensitivity conditions are met.

  • The solid blue chart line shows the metric value for the dimension related to the root cause.

  • The solid yellow chart line shows the metric value for all other traffic not related to the root cause dimension.

Second Time Series

When available, a lower time series shows the progression of one or more related metrics over an identical period.

For Video Start Failures Technical and Exits Before Video Start, this chart shows a time series of Attempts for traffic in the alert root cause dimensions and all traffic excluding the root cause dimensions.

For Video Startup Time, this chart shows a time series of Plays for traffic in the alert root cause dimensions and all traffic excluding the root cause dimensions.

For Rebuffering Ratio, this chart shows a time series of Active Plays for traffic in the alert root cause dimensions and all traffic excluding the root cause dimensions.

For Video Playback Failures Technical, this chart shows a time series of Ended Plays for traffic in the alert root cause dimensions and all traffic excluding the root cause dimensions.

Use the time sliders to zoom in and out of the time series to display the progression of the alert. The example shows that the issue started at 14:49 and subsequent alerts fired as Info at 14:53, Warning at 15:01, and Critical at 15:07.

The second time series shows the progression of Attempts during the same period for comparative analysis with Video Start Failures Technical. For example, an increase in the Video Start Failures Technical percentage while the number of Attempts decreased indicates an increasing impact on video engagement.

Hover over the time series to display detailed data at a specific 1-minute interval, such as the threshold value and number of impacted unique devices, with the respective totals within the AI alert dimension group:

  • Impacted Unique Devices: the unique devices impacted in the current interval.

  • Total Unique Devices: the unique devices in the current interval.

  • Cumulative Impacted Unique Devices: the unique devices impacted from when the issue started.

  • Cumulative Total Unique Devices: the total unique devices from when the issue started.

Impacted unique devices are counted inclusively during each interval. As a result, the same unique device may be impacted in different intervals and counted as an impacted device in each of those intervals, while it is added to the Cumulative Impacted Unique Devices total only in the first interval in which it was impacted.
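This counting rule can be sketched as follows (the device IDs are hypothetical placeholders):

```python
def device_counts(intervals):
    """intervals: one set of impacted device IDs per 1-minute interval.
    Returns (per-interval impacted counts, cumulative impacted count).
    A device impacted in several intervals contributes to each of those
    intervals' counts, but is added to the cumulative total only once,
    in the first interval in which it was impacted."""
    seen = set()
    per_interval = []
    for devices in intervals:
        per_interval.append(len(devices))
        seen |= devices
    return per_interval, len(seen)

# device "b" is impacted in minutes 1 and 2 but counted once cumulatively
per_minute, cumulative = device_counts([{"a", "b"}, {"b", "c"}, {"a", "d"}])
```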

Impacted Sessions Due To Alert

The Diagnostics page also shows a list of up to 500 sessions that were impacted by the detected issue.

The system starts to compute impacted sessions for an alert as early as several minutes before the alert is fired. At each minute, the system computes all the sessions that are considered impacted within that minute and appends those sessions to the list of impacted sessions. This process repeats each minute until the total number of impacted sessions exceeds 500 or the detected anomaly ends.

The criteria for a session being impacted are defined per metric. For VSF-T alerts, all the VSF-T sessions are considered impacted. Similarly, all the EBVS sessions are considered impacted for EBVS alerts. For Rebuffering Ratio and VST alerts, only the sessions whose corresponding metrics are higher than the expected variances are considered impacted, and only the most impacted ones are picked if there are more than 500 impacted sessions within a minute.
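A sketch of this per-minute collection loop for Rebuffering Ratio and VST alerts (the session tuples and threshold value are illustrative; the real impact criteria are defined per metric as described above):

```python
MAX_IMPACTED = 500  # display cap from the documentation

def collect_impacted_sessions(minute_batches, threshold):
    """minute_batches: per-minute lists of (session_id, metric_value).
    Each minute, sessions whose metric exceeds the expected-variance
    threshold are considered impacted, most impacted first, and are
    appended to the running list. Collection stops once the total
    exceeds MAX_IMPACTED or the batches (the anomaly) end."""
    impacted = []
    for batch in minute_batches:
        hits = sorted((s for s in batch if s[1] > threshold),
                      key=lambda s: s[1], reverse=True)
        impacted.extend(hits[:MAX_IMPACTED])
        if len(impacted) > MAX_IMPACTED:
            break
    return impacted

sessions = collect_impacted_sessions(
    [[("s1", 5.0), ("s2", 1.0)], [("s3", 7.0)]], threshold=2.0)
```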

For all AI alert metrics, the Subset of Impacted Sessions due to Alert table provides the following information:

  • IP address: The IP address of the device streaming this session. If the IP address is not made available by the publisher, this field is blank.

  • ViewerID: The viewer/customer ID passed to Conviva from the publisher. If the viewer ID was not passed during device integration, this field is blank.

  • CDN: This is the CDN used for playback.

  • ISP: This is the ISP that the end user device is connected to.

  • ASN: This is the ASN that the end user device is connected to.

  • City: This is the city location of the end user device.

  • State: This is the state location of the end user device.

  • Country: This is the country location of the end user device.

  • Device: The device used to view this video session.

  • Asset: The name of the impacted video asset.

  • Stream URL: The URL of the video stream.

  • Error Code: The error codes related to the alert.

  • Error Message: The error messages related to the alert.

  • Session Start Time: Time the session started.

  • Time Session Impacted: The timestamp for when the session was impacted.

  • Severity: Severity level of the alert.

  • Device Marketing Name: The marketing name of the device.

  • Player Framework: The name of the media player framework.

  • Browser Name and Version: The name of the browser on the device with the browser version number.

  • App Name and Version: The name of the player application on the device with the application version number.

  • OS Name and Version: The name of the operating system on the device with the OS version number.

For Rebuffering Ratio and Video Startup Time metric-based alerts, this table sorts the sessions with the highest metric values at the top and provides the following additional information:

  • Metric value

  • StreamURL: The URL of the video asset. It represents the manifest URL for adaptive streaming protocols, such as HLS and DASH.

For Video Start Failures Technical metric-based alerts, this table sorts the sessions by CDN (unsorted for a single CDN) and provides the failure error code. This is the error code that resulted in the video start failure for this session. Note that this information is shown only when the AI alert is related to a video start failure.

For Exits Before Video Start, this table sorts the sessions by CDN (unsorted for a single CDN).

To display a list of impacted sessions for specific severity levels, click the Severity drop-down and select the severity levels to display in the time series and impacted sessions table.

To expand the fields for a session to view the full session data (for example complete streamURLs and error codes), click the row for that session. This toggles between the list and table views of the session.

Click Customize to toggle which columns appear in the table.


AI Alert Email Subscription

Once an alert is fired, an email is immediately sent to the subscribed users. Individual users can subscribe to AI alert emails by selecting specific metrics or all available metrics. The system guarantees that an email is sent to each recipient within 10 seconds after an issue is detected. Within the alert email, an alert summary is provided, which includes the metric, the root cause of the issue, and the number of impacted sessions.

Note: The AI alert email subscription settings do not impact the manual alert settings. For manual alert settings, see Alerts.

To subscribe to AI alert email notifications:

Click the Notifications menu and select the AI Alerts Email Subscription tab to open the AI Alerts Email Subscription settings.

The AI Alerts Email Subscription page appears with a list of the subscribed email addresses.

Click Subscribe to select individual metrics (or Select all metrics) and severity levels for alert notifications. Click Update to start to receive notifications. Otherwise, click Cancel.

To update an existing user's settings, hover over a current user email and select the Edit option.

Once you receive the AI alert email, click the Diagnostic Report link to view details about why the alert was fired and the impact of the issue.

Webhook Integrations

Admin users can also set webhooks to integrate Conviva AI alerts into external applications, such as JIRA or other support ticket applications.

To set up webhooks, see Setting Webhook Notifications.
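For illustration, a webhook consumer might map an incoming alert to a ticket summary like this (all payload field names here are hypothetical; see Setting Webhook Notifications for the actual payload schema):

```python
def alert_to_ticket(payload):
    """Map an AI alert webhook payload (hypothetical field names) to a
    support-ticket summary suitable for JIRA or a similar system."""
    return {
        "title": f"[{payload['severity']}] {payload['metric']} alert",
        "description": (f"Root cause: {payload['root_cause']}; "
                        f"fired at {payload['fired_at']}"),
    }

ticket = alert_to_ticket({
    "severity": "Critical",
    "metric": "Video Start Failures Technical",
    "root_cause": "CDN: Live",
    "fired_at": "2024-08-01 02:57",
})
```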

Providing Feedback

We would like your input on which AI alerts are the most useful in helping you proactively detect anomalies and shorten resolution times. To help us better understand AI alert usage, click the thumbs up/down icons to like or dislike the AI alert displayed on the AI Alert Details page.

If you think an AI alert is not relevant, please provide additional feedback to help us understand why it is not relevant.