What is DSPM?

Data Security Posture Management (DSPM) is a methodology that allows organisations to locate and classify sensitive data across their data stores, assess the risks associated with this data’s exposure based on its sensitivity, and implement access controls for this data.


Key Components of DSPM Tools

  • Data Discovery and Classification: Automatically identifying and categorising data across an organisation’s data stores, particularly those in the cloud, based on sensitivity and regulatory requirements.

  • Risk Assessment: Evaluating the security risks associated with both structured and unstructured data, considering factors such as exposure, access patterns and potential impact if breached.

  • Access and Permissions Management: Ensuring that access to data is strictly controlled and that permissions are granted based on the principle of least privilege.

  • Data Protection: Recommending security measures such as encryption, data masking and access controls to protect sensitive data.

  • Monitoring and Alerting: Continuously monitoring data access and usage to detect suspicious activities or policy violations, with real-time alerts to facilitate rapid response.

Getting started

GQL Quick Guide

Basic information on the Getvisibility Query Language

What is GQL

  • GQL: Query language

  • Based on: Apache Lucene

  • Supports: Boolean, term, and range queries

  • Use: For custom queries without hard coding

Querying:

  • Choose terms from specific dataset: Files, Trustees, Activity, Management

  • Apply operations like AND, OR, =, !=, >, <, >=, <= to filter data

  • Form queries, e.g., flow=classification AND risk>=1.

Examples:

  • Simple: dataAttributeName=HR

  • Complex: complianceTag=PII AND dataAttributeName=HR AND (dataAttributeName=Record OR dataAttributeName=Legal) AND (detectorHits="Health Insurance" OR detectorHits="Compliance report")
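  The comparison operators listed above can also be combined with range filters; a hypothetical example (field values are illustrative, not taken from a live deployment):

  • Range with negation: complianceTag=PII AND risk>=3 AND dataAttributeName!=HR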

Aggregation (Analytics):

  • Use in widgets for counters, charts, maps

  • Aggregate terms for complex visualisations

Choose how to run DSPM

Mark to create

Use DSPM in your company

Mark to create

Administration guide

Scan with Getvisibility

Networking

K3s uses Flannel to allow pod-to-pod communication between different hosts. Flannel is a lightweight provider of layer 3 network fabric that implements the Kubernetes Container Network Interface (CNI); it is what is commonly referred to as a CNI plugin.

Flannel supports multiple backends for encapsulating packets. By default K3s uses Virtual Extensible LAN (VXLAN), which runs a Layer 2 network on top of a Layer 3 infrastructure. VXLAN uses in-kernel VXLAN to encapsulate the packets using UDP on port 8472.
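To verify that VXLAN traffic is actually arriving on a node, a packet capture filtered on that UDP port can help; this is a minimal sketch using the same tcpdump invocation referenced in the HA anecdote below:

    # Capture VXLAN traffic (UDP port 8472) on all interfaces and write it to a file for inspection
    tcpdump -leni any -w output.pcap udp port 8472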

During one of our HA setups we noticed, after running tcpdump -leni any -w output.pcap, that the UDP packets were not arriving at the destination host, and we had to change the Flannel backend from VXLAN to host-gw, which uses IP routes to pod subnets via node IPs.

To use the host-gw backend you need to execute the following steps on all nodes:

Enable 2FA

Two-factor authentication (2FA) enhances security by requiring users to provide two forms of identification before they are granted access. This method adds a layer of protection to the standard username and password method, making it significantly more challenging for potential intruders to gain unauthorised access.

Implementing 2FA in Keycloak helps organizations bolster their defences against data breaches and unauthorized access, which is crucial for protecting sensitive data in today’s digital landscape.

How to configure it?

  1. Go to the 'Authentication' tab and click on the 'browser' flow

Estimate hardware capacity needs

A VM or server with the following specifications:

  • 16 x CPU cores (x86_64 processor with a speed of 2.2 GHz or more). The CPU must support the SSE4.1, SSE4.2, AVX, AVX2 and FMA instruction sets.

  • 64GB RAM

  • 700 GB free SSD disk. K3s will be installed in /var/lib/rancher, so space should be allocated there. We also need 10-20 GB of free space at / and /var.
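A quick way to confirm that a candidate host meets these requirements is to check the CPU flags and available disk space; a minimal sketch (flag names as reported by Linux in /proc/cpuinfo):

    # The output should list all of: sse4_1, sse4_2, avx, avx2, fma
    grep -o -w -E 'sse4_1|sse4_2|avx|avx2|fma' /proc/cpuinfo | sort -u

    # Confirm core count, memory and free space on / and /var (K3s data lives under /var/lib/rancher)
    nproc
    free -g
    df -h / /var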

Customer Support Portal

How to access the Customer Support Portal and submit a ticket

To access and use the Portal, please follow the below steps:

  1. Access the Portal by visiting

  2. If an account has not yet been created (this is usually sent via email upon first contact with ), select the 'Sign Up' option located in the top right corner of the screen.

  3. When the email is received, use the URL provided in the email to set a new password. After setting your password, the 'Login

Critical & Sensitive Classification Attribute Modification

Where to find the risk calculation rules.

To view the Critical and Sensitive Classification rule configuration from the Dashboard, click on Administration > Detectors > Attributes Detectors.

Here the Critical and Sensitive attributes configuration can be viewed.

The Critical & Sensitive rules for Risk calculation can be re-configured by clicking the pencil icon beside the rule.

The Risk rules are based on a GQL query which can be changed by clicking on the pencil icon on the right or by importing a JSON file using the “Import from file” function.

Once the edits are saved, the Sensitive and Critical Flags will update automatically. For the Risks to be recalculated, a rescan is needed.

Ubuntu 20.04 LTS Server OS is recommended. RHEL 8.6, 8.7, 8.8, & 9.2, and Suse Linux 15.3 are also supported but may need extra configuration.

  • Port 443/TCP open

  • Outbound internet access to download application artefacts. 100 Mbps download speed recommended

  • Domain Name Service (DNS) with public name resolution enabled

  • Network Time Protocol (NTP)

  • Software Update Service - access to a network-based repository for software update packages.

  • Fixed private IPv4 address

  • Unique static hostname

  • Understand installation requirements

    Getvisibility products use Kubernetes under the hood, and we have very specific hardware requirements. It's crucial to meet the minimum resource requirements defined for containers, as failing to do so can lead to various problems:

    1. Resource Starvation: If a container requests more CPU or memory resources than are actually available on the cluster, it can lead to resource starvation. This means other containers may not get the resources they need to run correctly, causing performance degradation or even crashes.

    2. Throttling: Kubernetes imposes resource limits on containers, and if a container's resource usage exceeds its limits, Kubernetes may throttle or terminate the container to prevent it from consuming excessive resources, resulting in performance degradation.

    3. Out of Memory or CrashLoopBackOff Errors: Oversubscribing memory resources can lead to containers running out of memory, causing them to terminate abruptly or enter a constant restart loop, commonly referred to as a "CrashLoopBackOff" error.

    4. Performance Degradation: When requested CPU resources are larger than allocated, it can lead to performance issues as containers compete for CPU time, potentially slowing down critical processes and making the application unresponsive.

    5. Difficult Troubleshooting: Misallocation of resources, whether it's too little or too much, can be challenging to identify and correct. This can lead to extended troubleshooting efforts and downtime as administrators attempt to resolve resource-related issues.

    To ensure a stable and efficient Kubernetes deployment of our product, it's essential to accurately configure resource requests and limits for containers based on their actual requirements. This prevents resource-related problems and ensures smooth operation within the Kubernetes cluster.
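    As a quick sanity check before and after installation, the allocatable capacity and current usage of each node can be inspected with kubectl; this is a sketch that assumes kubectl is pointed at the K3s cluster and that the metrics-server bundled with K3s is running:

    # Show how much CPU and memory each node has committed to container requests and limits
    kubectl describe nodes | grep -A 8 'Allocated resources'

    # Show live CPU and memory usage per node (requires metrics-server, included with K3s by default)
    kubectl top nodes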

    Installation

    Deployment guide

    The deployment guide will help you install, configure, and manage your deployment environment. It covers everything from initial setup to advanced configurations and troubleshooting.

    It includes:

    • Prerequisites for a successful installation.

    • Step-by-step instructions for installing and upgrading K3S.

    • Setting up Rancher and Fleet agents with an HTTP proxy.

    • Guide to installing Synergy/Focus/Enterprise using Helm without Rancher.

    • Guide for configuring Keycloak.

    • Troubleshooting guide.

    Implementation

    TODO

    Configure data sources

    Troubleshooting

    Integrations

    Configure detectors

    Authentication

    Uninstall K3s
  • Restart VM

  • You will need to slightly change the command you use to configure K3s on master nodes (you don’t need to change the command for worker nodes as they’ll read the configuration from the master ones) by appending --flannel-backend=host-gw, for example:

    curl -sfL https://$URL/k3s.sh | INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" sh -s - server --node-name=master1 --cluster-init --flannel-backend=host-gw
    2. In the 'Browser - Conditional OTP' section select 'Required'

    3. Select the user that needs to use 2FA and add 'Configure OTP' under the 'Required user actions' section

    4. When logging in with that user, a screen will guide them through the configuration

    5. Please ensure that 'Configure OTP' under the 'Required user actions' section is then removed (otherwise the user will be asked to configure OTP every time as if it were the first time).

    6. After this is configured, every time the user logs in they will be required to enter a one-time code

    7. (Optional) OTP settings can be found here

    ' button at the top right corner of the screen allows access to the account.
  • Once logged into the Customer Portal, it is possible to view and respond to tickets submitted by the organisation's other members.

  • If access needs to be restricted exclusively to tickets submitted by the logged in user, please reach out to Support.

    Once logged in the following options are at the top of the screen:

    My Tickets - This will show all tickets that have been created by the logged in account.

    Other Tickets - This will show all tickets that the logged in account has been CC’d on.

    Archived Tickets - This will show any ticket older than 120 days.

    Company Tickets - This will show all tickets submitted by members of the organisation.

    From the ticket view, all updates to the ticket are visible and there is the option to mark the ticket as solved. This will close the ticket, and a ‘Happiness Rating’ can then be added to rate the level of support that was received.

    To close the ticket click on ‘Please consider this request solved', then add a final message and click 'Submit’.

    If there are issues or feedback about the Support Portal, please send an email to [email protected], and we will make every effort to address any concerns.

    Customer Support
    [email protected]

    Install Synergy/Focus/Enterprise using Rancher

    Make sure you have already configured a license through the License Manager and that the end user has installed K3s and run the registration command as described in K3S installation.

    Prerequisites

    Please check K3S installation for installation requirements.

    Installation

    1. Go to Rancher dashboard and click on the customer cluster that by now should be Active:

    2. Go to Apps > Charts and install the GetVisibility Essentials Helm chart:

    2.1. Click on Enable ElasticSearch:

    2.2. Configure the UTC hour (0-23) that backups should be performed at:

    3. Go to Apps > Charts and install the GetVisibility Monitoring Helm chart and Install into Project: Default.

    4. Go to the global menu Continuous Delivery > Clusters and click on Edit config for the cluster:

    a. For Synergy: add 2 labels product=synergy environment=prod and press Save.

    b. For Focus: add 2 labels product=focus environment=prod and press Save.

    c. For Enterprise: add 2 labels product=enterprise environment=prod and press Save.

    d. For DSPM with the Agent: add 2 labels product=ultimate environment=prod and press Save.

    e. For DSPM without the Agent: add 2 labels product=dspm environment=prod and press Save.

    Configuring Rancher and Fleet agent to run behind an HTTP proxy

    How to configure Rancher and Fleet agent.

    This is applicable when there is a cluster showing as “unavailable“ after the user configured a proxy on the server.

    Replace $PROXY_IP with the IP:PORT of the corporate proxy server and $NODE_IP with the IP or CIDR of the server running Kubernetes.

    1. Run env on the user’s server to determine the proxy IP. Check for the following line:

    2. Open the file /etc/systemd/system/k3s.service.env and append the following lines:

    It is important to use correct IP addresses in the place of placeholders $PROXY_IP and $NODE_IP below.

    3. Restart k3s:

    4. Go to the Rancher dashboard Cluster Management > Clusters and click on Edit Config for the cluster:

    a. Go to Advanced Options:

    b. Configure the following Agent Environment Variables and press Save:

    Remember to use correct IP addresses in the place of placeholders $PROXY_IP and $NODE_IP below.

    5. Run the command:

    6. Type the letter “i“ to insert text and, in the env section, add the following lines:

    Example:

    Save by pressing Esc and then typing "wq"

    7. Do the same on the fleet-agent by running the command:

    8. Repeat step 6.

    9. After applying all the changes, wait for the cluster to show as Online on Rancher.

    Configure Dashboard

    In order for the connectors to support proxy settings, you will need to enable the proxy in the configuration page:

    How to Check AI Mesh Version

    To check the AI Mesh version from the Dashboard click on Administration > AI Mesh

    In the top right of this screen the AI Mesh Version can be seen.

    If more information on the AI Mesh is required or if tailoring is needed please contact Support.

    Rescan Files

    When a targeted rescan is needed it is possible to scan individual files or a specific selection.

    Reasons for a rescan can include:

    Ensuring that recent changes to files are reflected in the UI.

    If new patterns have been added to Pattern Matching.

    If new rules have been added in Controls Orchestration.

    Files can be sent for rescan individually by clicking on the hamburger menu for that file and clicking on “Send to classification pipeline”.

    There is also an option to reclassify multiple files at once by selecting them using the tickboxes on the left of the screen.

    Once the required files are selected the option to rescan appears on the bottom right of the screen.

    Data Source Permissions

    How to find the list of permissions granted for a Data Source

    The required permissions for scanning are documented by Data Source.

    For more information please review the list here.

    To check the configured permissions for a Data Source, navigate to Administration > Data Sources and click on the hamburger menu.

    In the dropdown, click Permissions:

    The example below shows the permissions for SharePoint Online.

    Scan Scheduler

    How to set a specific schedule for a scan.

    When a Data Source is added to Getvisibility for scanning, the scan begins automatically.

    If a rescan is needed this can be configured by clicking on Administration > Data Source > (the Data Source that needs Rescan e.g. One Drive) > Hamburger menu > Rescan Scheduler.

    The default configuration is Does Not Repeat.

    By clicking the drop-down menu other options can be chosen:

    Daily

    In this option both the time zone and time of day can be chosen

    Weekly

    With this option, as well as the above configuration, a specific day or multiple days of the week can be chosen.

    Monthly

    This gives the option to pick a specific day or days each month to run the rescan.

    Configuring Rancher and Fleet agent to run behind a HTTP proxy if cluster was previously registered

    This article is applicable when there is a cluster showing as “unavailable“ after the user configured a proxy on the server.

    If you have a cluster which hasn’t been registered yet (registration command has not been run yet), then refer to .

    Replace $PROXY_IP with the IP:PORT of the corporate proxy server and $NODE_IP with the IP or CIDR of the server running Kubernetes.

    Scan Analytics

    Scan Analytics shows in-depth information gathered during the scan.

    There are two ways to access Scan Analytics, either via the main Analytics Dashboard or via the Data Sources page.

    Analytics Dashboards

    To access the Analytics Dashboards click on the link on the Getvisibility homepage.

    SMB

    How to configure SMB/CIFS connection for scanning

    Configuring SMB connector in Dashboard

    • Navigate to Administration -> Data Sources -> SMB -> New scan

    Quick start

    • Keycloak configuration (each part ends with a what's next) + Log in to an existing account

    • Set up an integration (configure a data source)

    • Configure taxonomy → link to a full taxonomy setup

    Monitoring New Files via DDR Streaming

    Getvisibility DDR continuously monitors new files generated through streaming and provides real-time insights

    1. Filter by Streaming: Under Enterprise Search, use the filter scanTrigger=streaming.

    2. View File Details: DDR displays:

      1. File Path:

    Single Sign-on (SSO)

    The platform supports a wide range of Single Sign-On (SSO) protocols and providers, enabling seamless authentication across various services and applications. Here are the primary SSO protocols and some of the identity providers that Keycloak can integrate with:

    SSO Protocols

    1. OpenID Connect (OIDC): A modern, widely adopted protocol based on OAuth 2.0 for client authentication and authorization. It's used by many identity providers for secure and flexible user authentication.

    Supported Data Sources

    Below is a list of Data Sources that Getvisibility DDR (Streaming) currently supports:

    • AWS IAM

    • AWS S3

    • Azure AD

    Check Detectors → link to a full detectors setup
  • activate pattern matching → link to a full pattern matching setup

  • import dashboards → link to a full dashboard configuration guide

  • view scan results (explain dashboards, use cases, enterprise search, navigation, remediations)

  • configure policies (dataguard)

  • SAML 2.0: a commonly used protocol for exchanging authentication and authorization data between parties, particularly in enterprise environments.

    Identity Providers

    The platform allows integration with a variety of identity providers using these protocols, including:

    • Google

    • Amazon

    • OpenShift v3 / v4

    • GitHub

    • LinkedIn

    • Microsoft

    • Apple

    • Facebook

    This flexibility ensures that SSO can be implemented using preferred systems and protocols, facilitating easy and secure access to multiple applications with a single set of credentials.

    Azure Blob
  • Azure Files

  • Exchange Online

  • OneDrive

  • SharePoint Online

  • Box

  • Confluence Cloud

  • Gmail

  • Google Drive

  • Google IAM

  • SMB

  • LDAP (Windows AD)

  • Types of Scan

    The two types of scan are Trustee Scan and File Scan

    Trustee Scan

    This scan provides the list of Users and Groups on a Data Source

    File Scan

    This scan provides information about files and folders on a Data Source including structure and metadata.

    Once both scans are completed the data is processed and the two sets are combined to show who has access to what files.

    Enterprise setup

    Is Customer Data Saved by Getvisibility?

    No, file content is never saved. The classification server maintains a registry of file names and their properties but not the content. There is also an anonymization mechanism built into the Classification software that reduces file content to a mathematical number that is used throughout the platform.

    More specifically, when a Data Source is added to the platform the following occurs:

    • The data source is scanned and general metadata is read.

    • This provides file path and permissions on the files.

    • The files are then sent to the OCR service to read the content.

    • The read content is then passed through the AI Mesh.

    • Through the process, customer data is not stored on disk and is only ever held in memory.

    • There is no long-term storage of data.

    http_proxy=http://X.X.X.X
    http_proxy="$PROXY_IP"
    https_proxy="$PROXY_IP"
    no_proxy="$NODE_IP,localhost,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local"
    systemctl restart k3s.service
    HTTP_PROXY: $PROXY_IP
    HTTPS_PROXY: $PROXY_IP
    NO_PROXY: $NODE_IP,localhost,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local
    kubectl edit deployment -n cattle-system cattle-cluster-agent -o yaml
    - name: HTTP_PROXY       
       value: $PROXY_IP

    - name: HTTPS_PROXY       
       value: $PROXY_IP
    - name: NO_PROXY     
       value: $NODE_IP,localhost,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local
    
    kubectl edit deployment -n cattle-fleet-system fleet-agent -o yaml

    Run env on the user’s server to determine the proxy IP. Check for the following line:

    2. Open the file /etc/systemd/system/k3s.service.env and append the following lines:

    3. Restart k3s: systemctl restart k3s.service

    4. Go to the Rancher dashboard Cluster Management > Clusters and click on Edit Config for your cluster:

    a. Go to Advanced Options:

    b. Configure the following Agent Environment Variables and press Save:

    5. Run the command kubectl edit deployment -n cattle-system cattle-cluster-agent -o yaml

    6. Type the letter “i“ to insert text and, in the env section, add the following lines:

    Example:

    Save by pressing ESC and then typing “wq”

    7. Now do the same on the fleet-agent by running the command kubectl edit deployment -n cattle-fleet-system fleet-agent -o yaml

    8. Repeat step 6.

    9. After applying all the changes, wait a few minutes for the cluster to become Online on Rancher again.


    Analytics by Data Source

    To access a per-Data-Source drill-down, click on Administration > Data Sources > Data Source of Choice (e.g. OneDrive) > Analytics Icon.

    In the side bar detailed information regarding the scan of the chosen Data Source can be reviewed.

    Clicking on any of the fields in the Sidebar brings up a more detailed view of the data as well as giving the option to Remediate any issues that have been found.

    For a more detailed breakdown of Analytics please see here.

    Enter the details of the SMB server to scan

    • Name: Give a name to the scan to identify it later

    • Username: The user must be an admin-level account with access to all the SMB/CIFS shares to be scanned

    • Password: Password for the admin user

    • Host IP Address: The IP Address of the SMB/CIFS server

    • Domain/Workgroup: The domain or workgroup to which the CIFS/SMB server belongs

    • Port: 445 is the default port, however if the default port is not used, input the correct port number for the SMB protocol

  • Click on the Folder icon in Path to select a particular share/folder to scan, or leave the path as empty to scan all shares

    • Save the configuration

    • Once the configuration is saved, click on the icon on the right and select Start file scan to begin scanning

    • The scan results can be viewed under Dashboard -> Enterprise Search

    Supported Protocols:

    The connector supports all SMB dialects up to SMB 3.1.1
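    Before starting a scan it can be useful to confirm that the SMB port is reachable and that the supplied credentials can list the shares; a sketch assuming the nc and smbclient utilities are available on a host that can reach the server (host, domain and username are placeholders):

    # Check that TCP port 445 is reachable from the scanning host
    nc -zv <smb-host> 445

    # List the shares visible to the scan account
    smbclient -L //<smb-host> -U '<domain>\<username>'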

    The location of the file in the data source.
  • Classification: Sensitivity level (Confidential, Highly Confidential, etc.).

  • Risk Level: Based on context and user activity.

  • Compliance Tags: Indicators for GDPR, HIPAA, PCI, and other regulations.

  • Detection Rules: The specific DDR rules triggered by the file.

  • Incident Response: If a high-risk file is detected, DDR generates an alert and suggests remediation steps, such as quarantining the file or revoking access.

    Using Azure AD as Keycloak Identity Provider

    You need Azure Admin permission to complete this integration.

    Azure app configuration

    Create new Azure app

    Create a new App registration, selecting support for Multiple organizations when asked.

    Find App registration in search.

    Click New registration.

    Fill in details as shown below.

    Give the application a name and write down Application (client) ID as it will be needed later.

    Configure a new secret

    Next, go to your App Registration’s Certificates & secrets to create a New client secret. Copy the Value of the secret to somewhere at hand as it is needed later in the configuration.

    Adding Keycloak IdP

    In Keycloak, create a new IdP by selecting Microsoft from the drop down

    Populate Client ID (this is Application (client) ID in Azure) and Client Secret (this is Value from Azure) using values obtained in previous steps.

    Finally, copy the Redirect URI from Keycloak and add it as a Redirect URI in the Azure app.

    Test the functionality

    Open up a new Incognito mode in a browser and use

    Gmail

    This document provides information on how to configure a Gmail connection for the Focus product.

    Create OAuth2 Credentials

    • Create a Project in Google Cloud Console:

      • Go to the

      • Create a new project or select an existing project

    • Enable the Gmail API:

      • In the Google Cloud Console, navigate to the "APIs & Services" > "Library"

      • Search for "Gmail API" and click on it

    • Create OAuth 2.0 Credentials:

      • In the Google Cloud Console, navigate to the "APIs & Services" > "Credentials" tab

      • Click "Create credentials" and select "Service account"

    Delegate domain-wide authority to your service account

    • From your domain's Admin console, go to Main menu > Security > Access and data control > API controls

    • In the Domain wide delegation pane, select Manage Domain Wide Delegation

    • Click Add new

    • In the Client ID field, enter the client ID obtained from the service account creation steps above

    • In the OAuth Scopes field, enter a comma-delimited list of the scopes required for the application

    • Use the below scopes:

      For scanning

      • https://www.googleapis.com/auth/admin.directory.user.readonly

      • https://www.googleapis.com/auth/gmail.readonly

      For tagging

    Scan Configuration Fields

    Description of the fields in the Scan Configuration popup

    The below screenshot shows the fields that appear in the Scan Configuration screen.

    Please note that not all of these fields are available for all Data Sources.

    Name

    Set a unique name so that the Data Source is easy to identify.

    Credentials

    This is a dropdown to select the credentials that have already been configured for the Data Source.

    Geographic Location

    This is to indicate the physical location of the server the data sits on.

    Path

    This only needs to be defined if a specific location needs to be scanned.

    If left blank the entire Data Source will be scanned.

    Data Owner

    This is the person responsible for the data.

    This setting is optional.

    If the Data streaming check box is not visible it may be because the license for DDR is not present.

    To learn more about getting a license for DDR please reach out to the Getvisibility Enablement Team.

    LDAP

    How to configure LDAP connection to gather permissions and access rights for groups, users, and other entities (Trustees) on an LDAP server.

    Configuring LDAP connector in Dashboard

    • Navigate to Administration -> Data Sources -> LDAP -> New scan

    • Enter the details of the LDAP server to scan

      • Name: Give a name to the scan to identify it later

      • Username: The user must be an admin level and have access to all the LDAP utilities to be scanned. The username should be entered in the format [email protected]

      • Password: Password for the admin user

    • Save the configuration

    • Once the configuration is saved, click on the icon on the right and select Start trustee scan to begin scanning

    • The scan results can be viewed under Dashboard -> Access Governance

    Confluence Cloud

    How to configure Atlassian Confluence Cloud connection to scan it.

    URLS for whitelisting if a proxy is in place

    https://{your-domain}/wiki/api/v2
    https://{your-domain}/wiki/rest/api
    *.atlassian.com
    *.atlassian.net

    Generating an API token

    • Log in to

    • Click Create API token

    • From the dialog that appears, enter a memorable and concise Label for the token and click Create

    • Click Copy to clipboard, and save it somewhere secure. It isn't possible to view the token after closing the creation dialog

    Configuring Confluence Cloud connector in Dashboard

    • Navigate to Administration -> Data Sources -> Confluence Cloud -> New scan

    • Enter the details

      • Name: Give a name to the scan to identify it later

      • Username: The email address for the Atlassian account you used to create the token

    • Save the configuration

    • Once the configuration is saved, click on the icon on the right and select Start trustee scan to begin the trustee scanning

    • The scan results can be viewed under Dashboard -> Access Governance

    • Click on the icon on the right and select Start file scan to begin the files scanning

    • The results can be viewed under Dashboard -> Enterprise Search

    Uploads to Rancher

    Rancher manages clusters through its control plane. Managed clusters send data to Rancher's central management servers. This includes "always-on" data, exchanged with Rancher whenever the cluster has Internet access, and "on-demand" data, which should be explicitly requested by GetVisibility Support via the Rancher UI.

    Always-On Data sent to Rancher includes:

    1. Cluster Metadata:

      • Information about the cluster

      • Nodes list and metadata (IP address, hostname, cluster role, etc.)

      • K3s version

    2. Health and Monitoring Data:

      • CPU and RAM usage on each cluster node

      • Current Metrics (via Prometheus)

      • Fleet agent heartbeat

    On-Demand Data:

    1. Cluster Metadata:

      • Resource allocation (which Kubernetes resource runs on which node)

      • Current cluster-level Alerts

      • Current cluster-level Events

    kubectl Commands Output:

    Rancher allows running kubectl exec into running containers, but this feature is blocked by our WAF. Support needs SSH access or screen-sharing with the customer to execute these commands.

    None of those categories are critical for operation, and access to Rancher can be disabled after deployment.

    Google Drive Auto-labelling

    Cloud Data classification with Google Drive (GDrive) Auto-labelling

    At Getvisibility, we understand the modern challenges of data management. With our leadership in Data Security Posture Management (DSPM), we're transforming the way organisations comprehend, classify, and protect their data.

    Getvisibility is a DSPM solution that can conveniently connect, discover, classify, and enable the protection of unstructured data in an organisation's data repositories. Our latest update includes connectors for GDrive, a real-time file storage and synchronisation service within Google Workspace, which has over 9 million paying organisations. Our GDrive connectors provide an easy setup for file scanning to begin and for insights on an organisation's data to be delivered at speed.

    Utilising Getvisibility's cutting-edge Machine Learning (ML) classification to label files in GDrive represents a significant step in managing your sensitive data, regardless of its origin. Step into the future of data protection with Getvisibility by applying high precision tailored artificial intelligence (AI) coupled with Google Drive's native file labelling to significantly enhance the security of your Google Drive data, automatically and at scale.

    With our Google Drive Auto-Labelling feature, you no longer need to manually tag your files. Let our high precision, bespoke artificial intelligence (AI) mechanisms, integrated with Google Drive's native file labelling, classify and protect every document in your GDrive, automatically and at scale.

    Addressing the modern Data Security challenges

    Remote working, regulatory compliance and the constant pressure of cyber attacks bring challenges encompassing interoperability, scalability, and governance. These complications can escalate to severe security breaches, including threats like intellectual property theft, both from internal and external sources. It's essential to counter these data security concerns with a robust DSPM solution. Getvisibility's Tailored & Narrow AI, powered by Large Language Models (LLM), aligns perfectly with distinct business needs for precise data analysis. Our state-of-the-art AI system can:

    • Minimise data handling costs by pinpointing only essential data to keep.

    • Provide reports on data at risk.

    • Seamlessly integrate with DLP platforms.

    • Automatically tag files.

    Benefits tailored for you:

    • Enhanced Data Security: Every file, irrespective of size, is labelled, solidifying its traceability and protection.

    • Time-saving Mechanism: Move past the era of manual classification. Entrust our machine learning and witness your files being labelled in no time.

    Dive deeper with Getvisibility's GDrive auto-labelling

    Why settle for the ordinary? Experience unmatched efficiency and security with our innovative solution. For a comprehensive understanding of how Getvisibility can redefine your organisation's data security landscape, reach out to us or explore our website.

    For More Information:

    What is DDR?

    A brief description of DDR

    Getvisibility's Data Detection and Response (DDR) solution is designed to protect sensitive data by providing near real-time detection and response capabilities. It ensures that data across user environments is constantly monitored and any potential threats are flagged immediately. DDR focuses on data-centric security, ensuring organisations have visibility and control over their critical information assets.

    Key Features of DDR:

    1. Real-Time Monitoring: DDR continuously monitors data activities, including access, modification, sharing and deletion, to identify suspicious and malicious events.

    2. Automated Response: DDR sends instant alerts for quick remediation.

    3. Risk Mitigation: It supports regulatory compliance with privacy standards such as GDPR, HIPAA, PCI-DSS and CCPA.

    4. AI-Powered Insights: DDR leverages Getvisibility’s proprietary AI Mesh models to analyse data context for the best accuracy.

    5. Data Intelligence: It provides dashboards with visibility into sensitive data and risks to your data.

    How DDR Works:

    1. Data Analysis: DDR identifies all data across unstructured data environments and then classifies the data based on its content and context.

    2. Risks Analysis: It evaluates user access, permissions, sharing and data location to identify risks related to your data.

    3. Policy Enforcement: DDR applies predefined and custom security policies to protect data based on its classification and sensitivity.

    4. Incident Response:

    How to Configure DDR Rules

    Create Scan Configuration

    To configure DDR rules, follow these steps:

    1. Access the Getvisibility DDR dashboard using your credentials.

    2. Under the DDR tab, select Create Scan Configuration to connect to the data sources to be monitored.

    3. Define Scopes: Specify the data sources that will be connected to.

    4. Verify Configuration: Ensure that at least one data source is successfully connected. A green checkmark will confirm the completion.

    Check for Incoming Events

    Once the scan configuration is complete:

    1. Go to Administration > Live Events Streaming to view real-time events.

    2. Monitor Event Activity: Filter events by source, user name, action type (create, update, delete), and event type.

    Overview Page

    The Overview Page provides a comprehensive view of DDR's performance:

    1. Event Statistics: Displays the number of events by source, such as Google Drive, SharePoint, OneDrive, and Box.

    2. Data Source Activity: Visualizes active data sources and the volume of events generated by each.

    3. Event Timeline: Shows when events occurred, helping identify peak activity periods and anomalies.

    Open Risks

    The Open Risks section highlights detected threats, categorised by risk type:

    • Public Exposure: Identifies sensitive files accessible to external users via public links.

    • External Sharing: Detects files shared outside the organisation, potentially exposing sensitive information.

    • Internal Over-Sharing: Flags data with excessive permissions within the organisation.

    For each risk, DDR provides detailed insights, including the file path, user activity, and recommended remediation steps.

    Import custom TLS certificate

    Using GV-Essentials chart

    You can configure a custom TLS certificate during gv-essentials chart installation.

    Go to TLS Certificate tab, click on Use Custom TLS Certificate and paste in the content of the certificate and private key in PEM format:

    Use Self-signed Certificate

    If you don’t want to import a custom certificate, leave Use Custom TLS Certificate disabled; a self-signed certificate will then be auto-generated instead.
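    Before pasting in the certificate and private key, it is worth confirming that both are valid PEM and that they match; a sketch using standard openssl commands (file names are illustrative, and the modulus check applies to RSA keys):

    # Inspect the certificate subject and validity dates
    openssl x509 -in tls.crt -noout -subject -dates

    # The two hashes below should be identical if the private key matches the certificate
    openssl x509 -noout -modulus -in tls.crt | openssl md5
    openssl rsa -noout -modulus -in tls.key | openssl md5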

    Sources

    Scanning

    Scanning process and statuses

    To review Scans and their status go to Data Sources in the Administration drop-down.

    The scanning process discovers and analyses files across all configured data sources. It operates in three steps:

    1) Discovery

    Supported Languages for ML Classifiers

    Listed below are the languages supported by the ML (Machine Learning) classifiers, grouped by language pack.

    Upgrade K3s

    This document outlines the steps to install and update K3s servers and how to deploy and backup Focus services.

    K3s Installation - Client

    Please refer to for the installation details.


    Exchange Online

    This document provides information about creating an Exchange Connector app, which is required for the Focus product to connect to a customer's Exchange Online accounts.

    Registering an Azure App

    • Login to

    Webhooks

    How to configure a Webhook.

    A webhook is a way for one application to automatically send information to another application when something specific happens, for example getting an instant message when a new email is received. It helps different apps talk to each other in real time.

    In DSPM+, the webhook service makes it possible to subscribe to documents after the cataloguing/classification stages. When a document passes Cataloguing or Classification, a callback is sent to the target (client) system based on the GQL and callback URL provided in the webhook. Similarly, in EDC a webhook can be used to send information to the client system based on user activity.

    Flow:

    Pattern matching

    The pattern matching functionality allows users to identify particular pieces of information in a document. This is implemented by using Regular Expressions (RegEx) that will try to match content.

    Pattern Matching page

    Select Administration and then Pattern Matching:

    A selection of pre-configured patterns is available:

    If these patterns are detected during a scan, they will be presented to the user as “Keyword Hits”. On the Pattern Matching page, users have access to various Filters and Options:
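    For example, a custom pattern could use a RegEx such as the following to match email addresses (illustrative only, not one of the pre-configured patterns):

    [A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}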

    Multi-Language Machine Learning

    Access global data security excellence with multi-language machine learning (ML) from GetVisibility

    Getvisibility tackles unclassified data protection across multiple languages. We do this using a cutting-edge in-house Data Science team who are forging a global AI-driven solution. Here at Getvisibility we have introduced eleven new ML languages, including Arabic, Chinese, Spanish, and more, providing comprehensive multilingual data insights to our customers. Elevate your Data Security Posture Management (DSPM) with tailored AI that breaks language barriers for informed decisions when fortifying an organisation's data security defences.

    K3s on RHEL/CentOS/Oracle Linux

    firewalld/fapolicyd

    It is recommended to disable firewalld and fapolicyd:

    nm-cloud-setup

    If enabled, it is required to disable nm-cloud-setup.
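    A minimal sketch of the commands involved, assuming systemd-managed services on a RHEL-family host (reboot after disabling nm-cloud-setup):

    # Disable the host firewall and the file access policy daemon
    systemctl disable --now firewalld
    systemctl disable --now fapolicyd

    # Disable nm-cloud-setup and its timer, then reboot
    systemctl disable --now nm-cloud-setup.service nm-cloud-setup.timer
    reboot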

    ChatGPT

    How to configure ChatGPT connection for scanning.

    API key

    Owners can generate an API key in the . Note that the correct Organization must be selected when creating a key, corresponding to the administered workspace. Do not select the owner's personal organization.

    • Create a new API key:

    Configuring Taxonomies & Labels

    How to define a custom taxonomy

    To access the Taxonomy screen click on Policy Centre > Compliance Hub.

    Once in the screen the default Tags are visible.

    To add a label click on the + on the top left of the list of Tags.

    In the pop up enter a name for the new Label and optionally a Tag alias.

    The Tag alias is usually used for multi-lingual deployments if not all users speak English.

    http_proxy=http://X.X.X.X
    http_proxy="$PROXY_IP"
    https_proxy="$PROXY_IP"
    no_proxy="$NODE_IP,localhost,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local"
    HTTP_PROXY: $PROXY_IP
    HTTPS_PROXY: $PROXY_IP
    NO_PROXY: $NODE_IP,localhost,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local
    - name: HTTP_PROXY       
       value: $PROXY_IP

    - name: HTTPS_PROXY       
       value: $PROXY_IP
    - name: NO_PROXY     
       value: $NODE_IP,localhost,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local

    Fleet bundle synchronization data

  • Current cluster status (healthy/unhealthy)

  • Kubernetes Objects:

    • List Kubernetes objects (usually Pods and Configurations)

    • Delete one or more objects

    • Create a new Kubernetes definition

    • Update existing definition (limited)

  • Kubernetes Container Logs:

    • Current logs via Rancher UI

    • Historical logs via Loki and Prometheus

  • Health and Monitoring Data:

    • Historical metrics via Grafana

  • Upon detecting a threat, DDR generates alerts and enables users to take remediation actions, such as moving files or revoking access.
    Analysis of data and content in various languages

    Securing unclassified data through discovery, classification and protection is not just an English language problem. At Getvisibility we recognise this and have been developing our own cutting edge Data Science team to deliver a truly global product in the world of artificial intelligence (AI) and data security.

    Many organisations around the world struggle to adapt to the rising problem of Data Security Posture Management (DSPM), let alone when dealing with multilingual and multicultural datasets.

    There are many data security problems in today's business world and they are growing at an exponential rate. Some of the data problems facing organisations, in any language, are double extortion ransomware, intellectual property theft, and data loss prevention (DLP) software which poses some of its own challenges with interoperability, scalability, and governance.

    This is why Getvisibility is proud to announce the launch of eleven new ML languages that allow our platform to understand multiple-language data files simultaneously. These eleven new languages are:

    • Arabic

    • Chinese

    • French

    • German

    • Hebrew

    • Italian

    • Polish

    • Portuguese

    • Romanian

    • Spanish

    • Thai

    Our multilingual data discovery ML gives organisations a strategic competitive advantage by delivering increased data knowledge from multiple language data sources within an organisation. This allows better informed data security decisions, at speed and with precision, with the Getvisibility platform.

    Getvisibility’s industry-leading performance is due to our Tailored & Narrow AI, now across eleven new languages. This approach supercharges an organisation's DSPM solutions with our customisable AI, specifically trained with Large Language Models (LLM), that aligns with specific business needs for accurate and optimised data analysis.

    Our approach of having an in-house data science operation to develop our own language models is atypical and allows Getvisibility to develop our own model library, pushing the boundaries of data discovery in the industry, month by month. We pride ourselves on continually evolving with the data security industry, as this allows our customers to have cutting-edge ML to accelerate their organisation’s DSPM and risk management. Another advantage of our approach is the development of the native Getvisibility Query Language (GQL) that further provides efficiencies when producing results on sets of files.

    The new additions to our innovative ML provide several advantages that can greatly enhance the process of analysing and extracting insights from multilingual and multicultural datasets. These include:

    • Richer Data Analysis

    Multilingual machine learning enables the analysis of data and content in various languages, allowing organisations to access a wider range of data. This enables them to understand patterns across different languages, leading to a more comprehensive and accurate understanding of data use in the company, and a more complete view of data which might not be possible if analysis is limited to a single language.

    • Efficient Information Retrieval

    Multilingual ML improves search and information retrieval processes by accurately understanding and interpreting queries in various languages. This is done by enabling GQL searches on our tags in multiple languages, ensuring that users can find relevant information regardless of their language preference, providing a complete understanding of an organisation's data environment and security through comprehensive multi-language reporting.

    • Language-agnostic Insights

    Some data insights might not be specific to a single language. Multi-lingual ML can help identify and analyse data security trends that transcend linguistic boundaries.

    Getvisibility’s cutting-edge ML has a language model that does not need language detection, as its inputs can be documents in any mixture of the languages it supports. This integration of Getvisibility’s multilingual ML in data discovery builds on the best-in-class five pillars of data security: know where the data is stored, know the sensitivity of the data, know who has access to the data, know the flow of the data through the ecosystem, and know how well the data is protected. This will lead to more accurate insights, improved decision-making, and a deeper understanding of potential security threats across languages.

    GetVisibility's DSPM Solution

    Improve the competitive advantage of your multinational or multilingual organisation with the latest innovative addition to Getvisibility’s DSPM solution. Get in touch and let's start your journey to a multilanguage DSPM solution today. For More Information: www.getvisibility.com

    portal.azure.com
    www.getvisibility.com

    IP Address: The IP Address of the server where the LDAP is installed

  • Certificate (Optional): If the server to be scanned uses LDAPS (LDAP over SSL/TLS) enter the certificate text here. Otherwise leave it blank

  • Port: 389 is the default port for LDAP, however for Secure LDAP 636 is used

    • Use Global Catalog ports at 3268 (LDAP) and 3269 (LDAPS), in case the standard ports don't allow us to traverse the whole LDAP tree

  • Inactivity: This defines inactive users. Default is 90 days

  • Search base: This is the point in the LDAP directory where Focus will start searching from. In this example:

    • DC stands for Domain Component. An attribute used to represent domain levels

    • aws-gv is the name of the first-level domain

    • local is the top-level domain

    Together, DC=aws-gv,DC=local represents the domain aws-gv.local
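    For example, to limit the scan to a single organisational unit rather than the whole tree, a more specific search base could be used (illustrative value):

    OU=Employees,DC=aws-gv,DC=local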

  • https://github.com/traefik/traefik-helm-chart/issues/187

    Name: Languages in Pack

    • Chinese: English, Chinese (Simplified, Traditional)

    • Finnish: English, Finnish

    • West-Slavic-3: English, Polish, Czech, Slovak

    • German-Dutch: English, German, Dutch

    • Nordic-3: English, Danish, Swedish, Norwegian

    • Hebrew: English, Hebrew

    • Greek: English, Greek

    • Korean: English, Korean

    • Thai: English, Thai

    • Arabic: English, Arabic

    • Turkish: English, Turkish

    • Hindi: English, Hindi

    • Latin-5: English, French, Spanish, Portuguese, Italian, Romanian

    • Japanese: English, Japanese

    If additional language packs are needed after the initial setup please reach out to support for assistance, as each additional pack is a separate AI model that needs to be added.

    Click the "Enable" button to enable the Goolge Drive Activity API for your project

    Enter a name in the Service account name field and click CREATE AND CONTINUE

    • Under Grant this service account access to the project, select role as Owner and click DONE

    • Select the newly created service account and click Keys > Add Key > Create new key

  • Make sure the key type is set to JSON and click Create

    • The new private key pair is generated and downloaded to the machine. Note the values of private_key, client_email and client_id

    https://www.googleapis.com/auth/gmail.modify

  • https://www.googleapis.com/auth/gmail.labels

  • https://www.googleapis.com/auth/gmail.metadata

  • Click Authorize

  • Google Cloud Console
    Admin console
    API Token: The API token created in previous steps
  • Domain: The Atlassian domain

  • Click on the Folder icon in Path to select a particular space to scan, or leave the path as empty to scan all spaces

  • https://id.atlassian.com/manage-profile/security/api-tokens
    The system searches through all files and folders.
    • If a specific path has not been set, the entire Data Source will be scanned.

    • Metadata (path, size, format, etc.) and permissions are extracted and recorded for each file.

  • This step ensures that every file and folder is identified and that access permissions are understood.

  • The scan discovery process can have the following statuses, reflecting its progress:

    Not Started: Data Source has been added but the scan has not started.

    Queued: Scan has been put into the queue for the execution.

    Failed To Start: Scan was unable to start, usually due to issues with permissions or the network.

    In Progress: Scan is actively running and processing data discovery.

    Cancelled: Scan was manually stopped or automatically aborted.

    Incomplete: Scan is partially completed but permissions to files were changed during scan.

    Completed: Scan has successfully finished Discovery phase.

    These statuses can be seen in the Last Scan Status column.

    2) Metadata classification

    This is the continuation of the Discovery process where:

    • Metadata information is processed for each file that has been collected as part of the Discovery step.

    • A detailed analysis of each file's metadata is performed.

    3) Content Classification

    • Permissions are analysed and the shared level is identified.

    • A detailed analysis of each file's content is performed.

    • Content is extracted and the sensitivity level and risk of each file is determined for classification.

      • This is determined by the Patterns/Detector setting and the AI Mesh

    • This ensures that sensitive information is properly identified and protected.

    Trustee Scan

    This is a scan to determine the Users and Groups present in a Data Source.

    • Metadata is extracted for each user, with specific fields depending on the data source. Some of the fields that will be picked up by the scan include Enabled, Last Login, Last Modified, etc.

    The statuses for these scans are the same as for files but there are two additional ones.

    Completed Only Users: The scan has been completed only for user-specific policies.

    Completed Only Groups: The scan has been completed only for group-specific policies.

    To see additional information on a running or completed scan click on the Scan Analytics Icon.

    This will pop out the Analytics sidebar where there is information such as scan duration, how many files have been scanned, classification insights, etc.

    Deploy Focus – Reseller
    1. Go to Rancher dashboard and wait for the new cluster to become Active:

    2. Select the cluster name and go to Apps > Charts and install the GetVisibility Essentials Helm chart:

    3. After Getvisibility Essentials make sure to Enable Elastic Search

    4. Go to Apps > Charts and install the GetVisibility Monitoring Helm chart and install into Project: Default:

    5. Go to the global menu Continuous Delivery > Clusters and click on Edit config for the cluster:

    6. Add 2 labels product=focus environment=prod and press Save.


    Update – Client

    Focus backend services

    Updates and custom settings are automatically applied to all Focus backend services as long as the cluster has access to the public internet and can connect to the management server.

    In case there’s no internet connection or the management server is down, the cluster agent will keep trying to reach the management server until a connection can be established.

    K3s cluster

    To upgrade K3s from an older version to a specific version you can run the following command:
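    A sketch of the upgrade command, assuming the same install script and arguments used during the original installation (see K3S installation); only INSTALL_K3S_VERSION changes, and the version shown is a placeholder:

    # Re-run the installer with the original server/agent flags, bumping only the version
    curl -sfL https://$URL/k3s.sh | INSTALL_K3S_VERSION="vX.Y.Z+k3s1" K3S_KUBECONFIG_MODE="644" sh -s - server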

    Stop the old k3s binary (e.g. systemctl stop k3s) and start it again (e.g. systemctl start k3s). For more details please refer to the official documentation.

    Certificates

    By default, certificates in K3s expire in 12 months. If the certificates are expired or have fewer than 90 days remaining before they expire, the certificates are rotated when K3s is restarted.


    Backup - Client

    Consul

    Find the IP of the server where Consul is running (in case you have a multi-node cluster):

    Log into the server using SSH and execute the following command to take a snapshot of Consul:

    Find the path where the snapshot has been saved to:

    Copy the snapshot file to a safe place.
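    A minimal sketch of these steps, assuming Consul runs as a pod in the cluster (pod and namespace names are placeholders):

    # Find which node the Consul server pod is running on
    kubectl get pods -A -o wide | grep consul

    # Take a snapshot of the Consul state from inside the pod
    kubectl exec -n <namespace> <consul-pod> -- consul snapshot save /tmp/consul-backup.snap

    # Copy the snapshot out of the pod and keep it somewhere safe
    kubectl cp <namespace>/<consul-pod>:/tmp/consul-backup.snap ./consul-backup.snap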

    PostgreSQL

    Find the IP of the server where the PostgreSQL master is running (in case you have a multi-node cluster):

    Log into the server using SSH and execute the following command to backup all databases:

    Find the path where the backup has been saved to:

    Copy the backup file to a safe place.
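    A minimal sketch, assuming PostgreSQL runs as a pod and pg_dumpall is available inside it (pod, namespace and user names are placeholders):

    # Find which node the PostgreSQL master pod is running on
    kubectl get pods -A -o wide | grep postgres

    # Dump all databases to a single SQL file on the local machine
    kubectl exec -n <namespace> <postgres-pod> -- pg_dumpall -U postgres > all-databases.sql

    # Keep the backup file somewhere safe
    ls -lh all-databases.sql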

    K3S installation
    If you have access to multiple tenants, use the Settings icon in the top menu to switch to the tenant in which you want to register the application from the Directories + subscriptions menu.
    • Browse to App Registration and select New registration

    • On the App Registration page enter below information and click Register button

      • Name: (Enter a meaningful application name that will be displayed to users of the app)

      • Supported account types:

        • Select which accounts you would like your application to support. You should see the options similar to below. You can select “Accounts in this organizational directory only”:

• Leave the Redirect URI empty and click Register

    • Note the Application (client) ID, Directory (tenant) ID values

    • Navigate to Manage -> Certificates and secrets on the left menu, to create a new client secret

    • Provide a meaningful description and expiry to the secret, and click on Add

    • Once a client secret is created, note its Value and store it somewhere safe. NOTE: this value cannot be viewed once you leave this page

    • Navigate to Manage -> API permissions on the left menu, and Add a permission

    • Select Microsoft APIs -> Microsoft Graph

    • Select Application permissions

    • Permissions required

      • For scanning

        • Microsoft Graph > Application permissions > Mail > Mail.Read

        • Microsoft Graph > Application permissions > User > User.Read.All

        • Microsoft Graph > Application permissions > DeviceManagementApps > DeviceManagementApps.Read.All

        • Microsoft Graph > Application permissions > MailboxSettings > MailboxSettings.Read

      • For tagging

        • Microsoft Graph > Application permissions > Mail > Mail.ReadWrite

    • Once all the required permissions are added, Grant admin consent to them


    Go to the dashboard: Administration -> Webhooks -> Create webhook.

1. Provide the URL from 'Your unique URL' on webhook.site; input the GQL, name and webhook status. Please see below for the description of all the options.

    2. Provide the Callback URL from 'Your unique URL' on webhook.site.

If there are scans in progress and data is flowing through cataloguing/classification-result, you will see requests coming to webhook.site within moments (if not, manually scan a folder).

    Webhooks dataset:

    A collection of data that serves as the content triggering the webhook or being sent by the webhook to a specified endpoint. When an event occurs that matches certain conditions, the webhook system will package relevant data from the dataset and send it to a predefined URL.

    Files

This contains information about files, such as their ID, type, path, version, etc. This information is based on content related to DSPM.

    Activity Dataset

    This dataset captures activities or actions performed by users or systems, such as logins, updates, deletions, or other significant events. This is mainly from endpoint Agents.

    Streaming

    This dataset relates to live events and the content connected to DDR.

    GQL query

This query language is designed to enhance the flexibility and efficiency of querying data through the DSPM+, DDC and EDC platforms. It enables users to craft custom queries without the need for hard coding, significantly simplifying the process of filtering through and analyzing the data. On the webhook page it can be used to select which information is sent, using the datasets available.
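For example, a webhook that should only receive classification results could use the same query that is used in the SIEM integration example later in this guide:

    flow=CLASSIFICATION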

    Callback URL

    A callback URL is a specific URL endpoint provided by a user or a system to receive data from another system when a certain event occurs. When the event is triggered, the webhook sends an HTTP POST request to the callback URL, delivering the relevant data payload. This mechanism allows real-time communication between systems, enabling automated workflows and immediate data synchronization
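As an illustration, a delivery to a callback URL can be simulated with curl; the URL and JSON body below are placeholders and do not reflect the platform's actual payload format:

    # Simulate a webhook delivery to a callback URL (placeholder URL and payload)
    curl -X POST "https://webhook.site/your-unique-id" \
      -H "Content-Type: application/json" \
      -d '{"event": "example", "file": "example.docx"}'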

    Name

This field allows the user to give a unique and meaningful name to the webhook.

    Webhooks status

Users can create multiple webhooks and choose whether to keep each one active or deactivated, based on their requirements.

    Users also have the option to edit and delete a webhook.

    Filters and options explained:

• Search: Enter text here to filter patterns based on name

    • Classification: Filter by classification tags associated with patterns

    • Compliance: Filter by compliance tags associated with patterns

    • Distribution: Filter by distribution tags associated with patterns

    • Categories: Filter by file categories associated with patterns

    • Subcategories: Filter by file subcategories associated with patterns

    • Enabled: Filter by patterns that have been enabled or disabled.

    • Published: Filter by patterns that have been published or unpublished

    • Add New Pattern: Create a custom pattern

• Publish: Push changes to the pattern matching system so they start being used

    • Clear filters: Remove all previously selected filters


    Create a New Pattern

The options for creating a pattern are explained below:

    • Pattern Name: identifies the RegEx when it is found by the software

    • Regular Expression: the sequence to be matched

    • Enabled: whether the pattern will be searched for by the software

    • Hide RegEx in UI: obfuscates the regular expression

    • Tag Overrides: when the RegEx is found these tags will be written to the file

    • Classifications: security levels

    • Compliance: regulations that apply to data

• Distribution: policies on how data should be distributed

    • Category: data grouping

    • Subcategory: data subgrouping

    • Cancel: exit without saving

    • Create: save pattern information and exit


    Glossary of Pattern Matching terms

    RegEx: Regular Expression, a sequence or pattern that is searched for in text. Ex-ID uses Java RegEx notation.

    Rules: Instructions for Ex-ID about what to do when a RegEx is detected in a file.

    Pattern: The RegEx and rules associated with its detection.

    Pattern Name: Used to identify the pattern when it is detected.

    Classification: Tags that help secure documents and other files. e.g. Public, Internal, and Confidential.

    Compliance: Tags that help organisations conform to certain regulatory regimes. By applying compliance tags such as GDPR/PII to RegEx such as Social Security number, organisations can identify all related documents.

Distribution: Tags that specify how files should be moved either within or outside an organisation.

    Category: From Getvisibility’s ML model. These are groupings of information based on their use. e.g. Finance, HR, or Technical Documents.

    Subcategory: From Getvisibility’s ML model. These are sub-groupings of information based on their particular use. e.g. CV (resume), Code, or Sales Agreement.

Publish: The action of pushing the enabled patterns to be used. As some parts of the system need to be restarted in order to take on a new pattern matching configuration, we allow users to choose when to enact the configuration so as not to impact the workflow of others.

    Unpublished: A pattern that has been created, changed, or edited but has not been pushed to the pattern matching system.

    Published: A pattern that is currently part of the pattern matching configuration.

    Disabled: A pattern that is currently part of the pattern matching configuration but is not to be detected.

    Enabled: An active pattern. One that is part of the configuration and will be used by the pattern matching system.

    Hide RegEx: Ex-ID allows for RegEx notations to be obfuscated for security and intellectual property reasons.

Disable nm-cloud-setup and reboot the node:

    systemctl disable nm-cloud-setup.service nm-cloud-setup.timer
    reboot

    make sure noexec is not used for dedicated rancher partition

    If you are using a dedicated partition (/var/lib/rancher) to run K3s make sure to NOT have mounted it using noexec flag inside /etc/fstab file.

    disable fips mode

If you have FIPS mode enabled, it is necessary to disable it, otherwise some of our workloads running in K3s will crash at startup. To check if FIPS is enabled run:

    sysctl crypto.fips_enabled

In order to disable it, please refer to the instruction below:

    fips-mode-setup --disable

    Please visit this KB Article if you want to know more.

    iptables

RHEL-like systems have a buggy version of iptables (1.8.4) which causes issues with the firewall, service routing and external network reachability, as well as performance issues. It is required to configure K3s to use the bundled version by modifying the k3s service file (the same applies to the k3s-agent service on worker nodes in HA deployments), adding the --prefer-bundled-bin option to the service's command and restarting the service:

    ~$ cat  /etc/systemd/system/k3s.service
    ExecStart=/usr/local/bin/k3s \
        server \
    	'--node-name=local-01' \
    	'--prefer-bundled-bin' \
    ~$ sudo systemctl daemon-reload
    ~$ sudo systemctl stop k3s
    ~$ sudo systemctl start k3s
    ~$
    ~$ sudo reboot

    If this change is made on an existing system, a reboot is recommended to clear duplicate iptables rules.

    More details can be found here - Known Issues | K3s.

firewalld

    systemctl disable firewalld --now

    fapolicyd

    systemctl disable fapolicyd.service

    Settings: Default Project | All Permissions

  • Note that this must be a new key. Once the Compliance API scopes are granted, all other scopes are revoked.

  • Reminder: This key can only be viewed/copied once. Store it securely.

  • Send an email to [email protected] with:

    • The last 4 digits of the API key

    • The Key Name

    • The Created By Name

    • The requested scope (read and delete)

  • The OpenAI team will verify the key and grant the requested Compliance API scopes.

  • Administrators may then use this key or pass it to a partner for use with the Compliance API.

  • Workspace IDs can be found on the Admin dashboard

  • Configuring ChatGPT connector in Dashboard

    • Navigate to Administration -> Data Sources -> ChatGPT -> New scan

• Provide the Workspace ID and the API key value obtained in the above steps

    • Click on the Folder icon in Path to select a particular user or GPT to scan, or leave the path empty to scan all

    • Save the configuration

    • Once the configuration is saved, click on the icon on the right and select Start trustee scan to begin the trustee scanning

    • The scan results can be viewed under Dashboard -> Access Governance

    • Click on the icon on the right and select Start file scan to begin the files scanning

    • The results can be viewed under Dashboard -> Enterprise Search


    Once the names have been added click on ACCEPT.

    The New Tag will now appear at the top of the list but it will not be created until Save has been clicked.

The tags can be re-ordered by risk level by clicking on the six dots, with the riskiest being at the bottom.

    Below, the Top Secret tag that was just created has been moved to the bottom and therefore the riskiest position.

    To have new tags incorporated into the AI Mesh please reach out to Support.

    Data Controls

    About

Data Controls enable organisations to apply security and compliance conditions to the data assets in their systems, and to apply actions when those conditions are identified.

    They are important for security and regulatory compliance as they help orchestrate the data handling within an organisation while ensuring stakeholders and data owners are involved.

    They are set up during the configuration of the system and refined as the DSPM journey proceeds. They are used by data owners, CISOs, and other stakeholders throughout an organisation.

The data control rules are set using GQL, which can granularly define the files, users, or other assets that exist within the organisation and specify under which conditions the rule should activate.

    A graphical display of any recent condition-activations can be viewed as well. Automated actions can be applied to the rule where users can choose to alert using messaging apps or webhooks.

    The rules are configured in the DSPM platform under Data Controls. Simply select Create New Rule and follow the below instructions. The rules will be triggered during a scan of the particular dataset the rule applies to.

    How to set a rule

    In this example we will create a rule to find HR related data that is at high risk. We will assign ownership and set up a slack message to alert a specific channel.

    1. On the Data Controls page of DSPM, select Create new rule


2. Enter the following data to create the rule

    • Name: To identify the rule amongst many that can be created

    • Description: Useful for others to understand the intention of the rule

    • Ownership: The person who is responsible for the rule and its consequences

    • Based on group: The data asset that this rule is associated with. These are granularly defined in the Data Asset Registry.

    • Select Accept

3. This screen allows you to further refine the rule and set the actions


4. At the top of the screen: the name, description, and owner are visible, as well as the creation date. The option to assign rule severity is also available. As this rule, if it were breached, has the potential to incur severe consequences such as legal and financial penalties, we will set it as High.

5. In the select dataset dropdown, we need to define the entity types we are setting our conditions for. (In the backend this relates to separate databases.) The choices are files, trustees, and activities.

    • Files: unstructured data classified during discovery

    • Trustees: the users and groups discovered during IAM scans

    • Activities: the usage statistics of the endpoint agents (FDC)

    We will select files in this example.

The condition section will be pre-loaded with a GQL query if you have selected a Data Asset Group. Here it is simply path=HR and we can see that there are some recent files that match this condition.

6. We will refine the search further by adding the condition that the HR files found must be high risk: AND risk=2

    The platform has three levels of risk: low, medium, and high. Their respective values in GQL are: 0, 1, and 2
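Putting the pre-loaded condition and the refinement together, the complete GQL condition for this example rule is:

    path=HR AND risk=2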

As can be seen, no files have yet fallen under this rule.

    We can create an action so that we can catch high risk HR files going forward.

7. Scroll to below the condition and select Create Action. In the Action type dropdown you can choose a simple Webhook or a Slack Webhook. Here we will add a Slack Webhook that will notify a Slack channel when the data control is activated.

    Multiple actions can be created for the same data control.

8. Select UPDATE to save the control, and that’s it! Once scanning commences we will get notified in Slack, as well as on the Incidents page.


    Install Synergy/Focus/Enterprise using Helm without Rancher

    Prerequisites

Please refer to the K3S installation section for installation requirements.


    Installation

    All the commands in this runbook were tested against a VM with Canonical, Ubuntu, 20.04 LTS, amd64 as root.

1. Install K3s and wait for the pods to become ready:

    curl -sfL https://assets.master.k3s.getvisibility.com/k3s/k3s.sh | INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" sh -s - server --node-name=local-01

    kubectl get deploy -n kube-system --output name | xargs -n1 -t kubectl rollout status -n kube-system

2. Install Helm 3:

    wget --quiet https://raw.githubusercontent.com/helm/helm/master/scripts/get -O /tmp/get_helm.sh \
    && chmod 0755 /tmp/get_helm.sh \
    && /tmp/get_helm.sh -v v3.8.2

    3. Configure the GetVisibility private Docker image registry (replace $USERNAME and $PASSWORD with the Docker credentials provided):

    kubectl create secret docker-registry gv-docker-registry \
      --docker-server=https://images.master.k3s.getvisibility.com \
      --docker-username=$USERNAME \
      --docker-password=$PASSWORD \
      [email protected]

    4. Configure the GetVisibility private Helm repository (replace $USERNAME and $PASSWORD with the Helm credentials provided):

    helm repo add gv_stable https://charts.master.k3s.getvisibility.com/stable --username $USERNAME --password $PASSWORD

    helm repo update

    5. Install the chart GetVisibility Essentials and wait for the pods to become ready. For Synergy (32 GB RAM machines), replace keycloak.url=IPADDRESS/DNS/FQDN with the IP address, DNS name or FQDN for Keycloak, in a format such as https://192.168.10.1, https://gv.domain.local or https://gv.getvisibility.com:

    helm upgrade --install gv-essentials gv_stable/gv-essentials --wait \
    --timeout=10m0s --kubeconfig /etc/rancher/k3s/k3s.yaml \
    --set eck-operator.enabled=true --set eck-operator.settings.cpu=1 \
    --set eck-operator.settings.memory=1 --set eck-operator.settings.storage=40 \
    --set updateclusterid.enabled=false --set keycloak.url= IPADDRESS/DNS/FQDN

    kubectl get deploy --output name | xargs -n1 -t kubectl rollout status

    For Focus and Enterprise (48+ GB RAM machines), replace keycloak.url=IPADDRESS/DNS/FQDN with the IP address, DNS name or FQDN for Keycloak, in a format such as https://192.168.10.1, https://gv.domain.local or https://gv.getvisibility.com:

    helm upgrade --install gv-essentials gv_stable/gv-essentials --wait \
    --timeout=10m0s --kubeconfig /etc/rancher/k3s/k3s.yaml \
    --set eck-operator.enabled=true --set eck-operator.settings.cpu=8 \
    --set eck-operator.settings.memory=20 --set eck-operator.settings.storage=160 \
    --set updateclusterid.enabled=false --set keycloak.url= IPADDRESS/DNS/FQDN

    kubectl get deploy --output name | xargs -n1 -t kubectl rollout status

    6. Depending on the theme you want to use, run one of the commands below to install the chart GV Platform (don’t forget to replace $PRODUCT with either synergy or focus or enterprise):

    a. GetVisibility theme:

    helm upgrade --install gv-platform gv_stable/gv-platform --wait \
    --timeout=10m0s --kubeconfig /etc/rancher/k3s/k3s.yaml \
    --set-string clusterLabels.environment=prod \
    --set-string clusterLabels.cluster_reseller=getvisibility \
    --set-string clusterLabels.cluster_name=mycluster \
    --set-string clusterLabels.product=$PRODUCT

    b. Forcepoint theme:

    helm upgrade --install gv-platform gv_stable/gv-platform --wait \
    --timeout=10m0s --kubeconfig /etc/rancher/k3s/k3s.yaml \
    --set-string clusterLabels.environment=prod \
    --set-string clusterLabels.cluster_reseller=forcepoint \
    --set-string clusterLabels.cluster_name=mycluster \
    --set-string clusterLabels.product=$PRODUCT


    Upgrade

    GV Essentials

    In order to upgrade the essential services (e.g. Postgres, Kafka, MongoDB, Consul) run helm repo update and then the same command you used in step 5.

    Synergy/Focus/Enterprise

    In order to upgrade Synergy/Focus/Enterprise run helm repo update and then the same command you used in step 6.


    Monitoring

    You can optionally also install monitoring tools to collect metrics and logs of all services that are running inside the cluster:

1. Install the CRDs:

    $ helm upgrade --install rancher-monitoring-crd gv_stable/rancher-monitoring-crd --wait \
    --kubeconfig /etc/rancher/k3s/k3s.yaml \
    --namespace=cattle-monitoring-system \
    --version=100.1.2+up19.0.5 \
    --create-namespace

    2. Install the monitoring tools:

    $ helm upgrade --install rancher-monitoring gv_stable/rancher-monitoring --wait \
    --kubeconfig /etc/rancher/k3s/k3s.yaml \
    --namespace=cattle-monitoring-system \
    --version=100.1.2+up19.0.5 \
    --set k3sServer.enabled=true \
    --set k3sControllerManager.enabled=true \
    --set k3sScheduler.enabled=true \
    --set k3sProxy.enabled=true

    To access the Grafana dashboard run the command kubectl port-forward svc/rancher-monitoring-grafana -n cattle-monitoring-system 3001:80 --address='0.0.0.0' and in your browser type the address http://$SERVER_IP:3001

    To access the Prometheus dashboard run the command kubectl port-forward svc/rancher-monitoring-prometheus -n cattle-monitoring-system 3001:9090 --address='0.0.0.0' and in your browser type the address http://$SERVER_IP:3001

    Keycloak User Federation Configuration (LDAP/AD)

    There are various authentication protocols that can be used depending on use case. This guide outlines the steps to configure User Federation in Keycloak.

1. To authorize users for the GetVisibility dashboard (not Keycloak itself), ensure that the gv realm is selected in the top left, not master (unless the aim is to authorize LDAP users to use Keycloak):

2. Click on the User Federation menu item on the left pane. This should load a list of configured user federations (none at first).

    3. Click on Add Ldap providers to load the LDAP (Lightweight Directory Access Protocol) configuration.

    4. Update the Connection URL field to reflect the LDAP server address where the Active Directory is hosted.

    5. Click on the button Test connection to test the connection from the Keycloak instance to the LDAP server address. This should succeed quickly. If it hangs, the LDAP server (i.e. a domain controller) may be blocking connections from the Keycloak server address (i.e. the IP of the server running the GetVisibility product). The Public IP address of the LDAP server may need to be used.

6. Update the Bind DN field to reflect the user used to access the LDAP server. In this case, the user with username “admin” from the domain “domain.com”.

    For Active Directory, the value for the Bind DN field could be admin@domain.com.

    7. Update the Bind credentials field (see the above image) to contain the password used to access the LDAP server.

    8. Click “Test authentication” to confirm that the provided credentials work as expected:

    9. Update the Users DN field to contain the Full DN of the LDAP tree where your users are.

    The above value for the “Users DN” field will import all users to the gv realm. All users within the “domain.com” domain will get full administrative access for the GetVisibility dashboard.

    If this is not desired, make restrictions to which users are imported. Often, just restricting by OU is not granular enough.

In this scenario, use a standard LDAP query in the User LDAP filter field, like so: (memberOf=cn=My Group,dc=domain,dc=com)

    Combining (“AND”) with other criteria: (&(theAttribute=theValue)(memberOf=cn=My Group,dc=domain,dc=com))

For AD Server federation, some may prefer to configure the Username LDAP attribute as sAMAccountName or userPrincipalName. See User Naming Attributes - Win32 apps and Active Directory LDAP Field Mappings.

    Optional Steps

    1. Within Synchronization settings, set up automatic synchronization of users from the LDAP Active Directory to Keycloak. Here the auto-synchronisation settings can be configured.

2. Click the Save button at the bottom of the screen.

    Synchronizing the Users to Keycloak DB

To get the users into the Keycloak DB, the users need to be synchronised for the first time (before the automatic synchronization happens, if applicable).

    Click the button Synchronize all users to immediately fetch all of the LDAP Active Directory users and load them into the Keycloak instance DB

    Synchronizing all users may take some time.

    Troubleshooting Keycloak LDAP integration

    Usually, any issues that occur during the LDAP Active Directory configuration process above will be related to Network accessibility concerns or authentication credentials being incorrect.

However, if additional support is needed or the problem is not easily resolved by troubleshooting Network communications and authentication details, please reach out to Support.

    Template Language

    Template Language (GTL) allows using variables, functions and conditions in any report text input fields and compiles those expressions into resulting values.

Example

    File Totals
    {{ count('files') }} were discovered in the {{ connector.name }} cloud.
    Of this number {{ count('files', 'fileType=doc OR fileType=txt') }}
    were classified as they contain text, or are recognisable file types or data.

    Example with actual data

    File Totals
    1000 were discovered in the Confluence cloud.
    Of this number 800 were classified as they contain text, or are recognisable file types or data.

    Syntax

    Two possible syntaxes parsed by GTL:

    • Expressions

    • Conditions

    Expressions

    Any text placed between {{ and }} is considered an Expression. Expression is a mathematical or logical operation that returns a value. For example, this {{ 10 }} is an expression that returns the number 10. It is possible to use operators, {{ 10 + 5 }} will return 15. Logical operators are also supported, {{ 10 > 5 }} will return true. Here is a list of supported operators:

    + - * / % = != > < >= <=

    Expression can also contain variables, that are defined in the current context. For example, {{ file.name }} will return the name of the file, if the file object is defined.

    Expression Functions

    But the most powerful feature of the expressions is the ability to call functions. These are predefined aggregation functions, that fetch data from the database and return the result. For example, {{ count('files') }} will return the number of files in the database.

    Here is a list of supported functions:

    • count

    • sum

    • avg

• max

    • min

    • median

    Those functions support the following parameters:

    • Dataset name - the name of the dataset to fetch data from. Possible values are: files, trustees, connectors, agents, activities.

• GQL - the GQL query to filter the data. For example, fileType=doc OR fileType=txt will return only files with the doc or txt file type.

    • Attribute - the attribute to aggregate. All functions, except count, require this parameter. For example: sum('files', 'fileType=doc OR fileType=txt', 'contentLength') will return the sum of the sizes of all files with the doc or txt file type.

    Conditions

Conditions are useful when you want to display different text based on some condition. Example:

    {{ if (count('files', 'sensitive=true') > 0) }}
      Sensitive files have been detected!
    {{ else }}
      You are safe!
    {{ endif }}

    The else clause is optional and can be omitted:

    {{ if (count('files', 'sensitive=true') > 0) }}
      Sensitive files have been detected!
    {{ endif }}

    The if statement is followed by a condition in parentheses. The condition must be any expression that returns a boolean value.

    SIEM Integration

    Webhooks

    A webhook is a method used in web development to enhance or modify the behavior of a web page or application through custom callbacks. These callbacks are automated messages sent by applications when specific events occur. Triggered by events in a source system, webhooks generate HTTP requests with payload data, which are sent to a destination system. Webhooks enable real-time communication between different applications, allowing them to exchange data seamlessly and synchronize processes. Developers, even if not affiliated with the originating application, can manage and modify these callbacks. This event-driven communication approach finds applications in various scenarios, enhancing automation and integration between different software systems.

    Using Webhooks in SIEM software

    Webhooks are used by Security Information and Event Management (SIEM) software to enhance security monitoring and incident response. SIEM tools integrate with webhooks to receive real-time event notifications from various sources, such as authentication systems, cloud services, or other security tools. These notifications trigger automated actions in the SIEM, allowing it to detect and respond to potential security threats promptly. Webhooks provide a seamless way to feed event data into SIEM systems, enhancing threat detection, analysis, and reporting capabilities. This integration enables organizations to achieve more effective and efficient security operations, as SIEM software can aggregate and correlate data from diverse sources to provide a comprehensive view of the security landscape. The result is improved incident response and better protection against cyber threats.

    Example: logging Focus classification events with Pipedream into Google Sheets

    Pipedream is an integration platform designed for developers to connect APIs rapidly using a low-code approach. It allows users to create workflows that integrate different applications, data sources, and APIs, without the need for extensive coding. Pipedream facilitates event-driven automations by providing a hosted platform where users can develop and execute workflows that streamline processes and automate tasks. With Pipedream, developers can build efficient connections between various services and systems, reducing the need for manual intervention and accelerating development cycles. The platform offers open source connectors and supports multiple programming languages like Node.js, Python, Go, and Bash. Pipedream simplifies the integration of disparate apps and enables developers to create effective workflows with ease, contributing to enhanced efficiency and productivity in software development.

    Workflow

In Pipedream, a workflow is a sequence of steps that automate processes and connect APIs. Workflows make it easy to create and manage integrations, allowing developers to connect different applications, services, and data sources. Workflows consist of steps that are executed in order, and they can include actions, code, and triggers. Triggers define when a workflow is initiated, such as through HTTP requests or scheduled intervals. Each step in a workflow can perform actions like connecting to APIs, manipulating data, and more. Pipedream enables users to create workflows with code-level control when needed, and even offers a no-code approach for automation. Workflows in Pipedream simplify the automation of complex tasks, integration of APIs, and the creation of event-driven processes.

    Create first Trigger in Pipedream

    Trigger is a fundamental concept that defines the initiation of a workflow. Triggers specify the type of event or condition that starts the execution of a workflow. These events can include HTTP requests, data from external apps or services, scheduled intervals, and more. When a trigger event occurs, the associated workflow is automatically initiated, and the defined steps within the workflow are executed sequentially. For instance, you can set up a trigger to activate a workflow when an HTTP request is received at a specific URL, allowing you to automate actions based on external events. Pipedream's triggers enable developers to create dynamic and event-driven workflows that respond to various inputs and conditions, enhancing automation and integration capabilities.

Create your first trigger by using the New HTTP/Webhook Requests option.

    No need to configure anything here.

We are going to use the newly created URL later when configuring a webhook in Focus.

    Create Webhook in Focus

Please refer to Webhooks for guidance on how to set up webhooks.

    To capture classification events we are using “flow=CLASSIFICATION” query.

    Start a scan

Please refer to Scan with Getvisibility for guidance on how to start a scan.

An event reached Pipedream

    We can continue with our workflow after the first event reaches the Pipedream workflow.

After configuring Pipedream to add rows to our test spreadsheet in Google Sheets, our workflow is complete.

    We can now Deploy it and head over to the Sheet to see it in action.

    Result

The workflow is now complete and as a result our Sheet is being populated with classification events.

    AWS IAM

How to configure an IAM connection to gather permissions and access rights for groups and users in AWS IAM.

    Create a policy

    • Sign in to the AWS Management Console and open the IAM console with the appropriate admin level account

    • In the navigation pane on the left, choose Policies and then choose Create policy

    • In the Policy editor section, find the Select a service section, then choose IAM service, and select Next

    • In Actions allowed, choose the below actions to add to the policy:

      • Read > GetUser

      • Read > GetPolicyVersion

• Read > GetPolicy

      • Read > GetUserPolicy

      • List > ListUserPolicies

      • List > ListAttachedGroupPolicies

      • List > ListAttachedUserPolicies

      • List > ListGroups

      • List > ListUsers

      • List > ListGroupsForUser

    • For Resources, choose All and select Create policy to save the new policy

    Create a user

• Sign in to the AWS Management Console and open the IAM console with the appropriate admin level account

    • In the navigation pane on the left, choose Users and then choose Create user

    • On the Specify user details page, under User details, in User name, enter the name for the new user, example iam-connector-user and select Next

    • On the Set permissions page, select Attach policies directly and choose the policy created in above steps

    • Select Next

    • Once the user is created, select it, and from the user page, choose Create access key

    • Select Other then Next

    • Enter a description if you wish and select Create access key

    • The Access and Secret Access Keys have now been created. These can be downloaded as a CSV, and also copied from this section. NOTE: the secret access key cannot be viewed once you leave this page

    Configuring AWS IAM connector in Dashboard

    • Navigate to Administration -> Data Sources -> AWS IAM -> New scan

    • Provide the access key and secret access key values generated in the above steps

    • Save the configuration

    • Once the configuration is saved, click on the icon on the right and select Start trustee scan to begin the scanning

    • The scan results can be viewed under Dashboard -> Access Governance

    Lineage

    Overview of Lineage

    Data Lineage in Getvisibility provides a comprehensive view of a file's lifecycle, tracking its origin, movement, transformation, and usage. This enhances security, compliance, and forensic investigations by offering end-to-end visibility into data activities.

    Traditional data monitoring provides static snapshots, which quickly become outdated, especially for large datasets. Real-time lineage addresses this by:

    1. Reducing Dependency on Rescans: Once streaming is enabled, changes are captured instantly.

    2. Improving Visibility: Organizations can see data movements in near real-time.

    3. Enabling Faster Incident Response: Security teams can quickly assess and respond to threats.

    Use Cases

    Data Lineage was developed to enable forensic investigations, ensuring organisations can:

    1. Investigate Incidents: Identify the root cause of security incidents, such as data breaches or unauthorised sharing.

    2. Enhance Compliance: Maintain audit trails for regulatory requirements.

    3. Support Risk Mitigation: Quickly respond to suspicious activities and apply appropriate remediation actions.

    Pre-Requisites to See Lineage

    1. Connection to Each Data Source: Ensure that each Data Source to be monitored has been configured in Getvisibility.

    2. Enabling Streaming: Activate real-time event streaming for each connector.

    Navigation to Lineage

1. From Enterprise Search: Select a file and click on "Lineage" in the dropdown.

    2. From Open Risks: Identify a flagged file and expand the side menu.

    Lineage UI Explanation

    Filters:

    • Event Type (Create, Modify, Delete, Share, Move, etc.)

    • Data Source

    • User Activity

    Export:

    • Export lineage details to CSV for auditing and reporting.

    Color Scheme:

    • Green: Normal activity

    • Yellow: Medium-risk events (e.g., permission changes)

    • Red: High-risk events (e.g., external sharing)

    Description of the Lineage Screen

    Lifecycle: Displays the complete lifecycle of a file from creation to current state.

    Event Timeline: Chronological list of all file-related actions.

    User & Device: Shows which users and devices interacted with the file.

    File Path: Original and current locations of the file.

    List of Events Supported by Each Data Source

    Common Events:

    • Create

    • Modify

    • Delete

    Extended Events (via Audit Logs)

    • Change Permissions

    • Share

    • Move

• Copy

    • Rename

    • Upload

    • Download

    Data Source Specifics:

    • Google Drive: Audit log events available.

    • Azure (SharePoint Online, OneDrive, Blob, Files): Audit log events supported.

    • Box & Confluence: Extended events available in regular logs.

    • AWS S3, SMB, Dropbox: Limited to Create, Modify, and Delete.

    Use Case for Lineage

    Lineage supports forensic investigations, such as:

    1. External Sharing Investigation: When a file is shared externally, security analysts can trace its history to determine if the action was intentional or accidental.

    2. Suspicious Activity Investigation: If a user accesses and downloads sensitive information after a password reset, lineage provides detailed insights.

    3. Incident Response: Analysts can determine what actions to take, such as revoking access, quarantining files, or addressing user behaviour.

    How to Access Lineage

    1. Enterprise Search: Select the file, click the dropdown, and choose "Lineage."

    2. File View: Expand the file details and navigate to the "Lineage" tab.

    Hover and Export Options

    1. Event Description: Hovering over event icons shows a brief description.

    2. Export: Export the entire lineage history, including metadata, to CSV for audit trails and reporting.

    Data Lineage empowers organisations with real-time visibility, advanced threat detection, and comprehensive forensic capabilities, ensuring sensitive data remains secure and traceable.

    SharePoint on-premise

    How to configure SharePoint On-Premise connection to scan it.

    Configuring SharePoint On-Premise connector in Dashboard

    • Navigate to Administration -> Data Sources -> SharePoint On-Premise -> New scan

    • Provide the Domain URL, an admin username and its password

    • Click on the Folder icon in Site and path to select a particular site to scan, or leave the path as empty to scan all sites

    • Save the configuration

    • Once the configuration is saved, click on the icon on the right and select Start file scan to begin the scanning

    • The results can be viewed under Dashboard -> Enterprise Search

    File tagging

    Prerequisites

    • An admin level user is required to scan and tag files in SharePoint On-Premise. The user must be a member of Site Owners Group where they have full control permissions to the SharePoint site.

    • The default Getvisibility tags need to be created as a new column in their SharePoint. This process is described below:

      • In SharePoint, navigate to Documents

      • In the files view, select + Add column

    Supported SharePoint On-Premise versions:

    The connector supports SharePoint 2013, 2016, 2019.

    Import Data Controls

    Getvisibility DDR offers a Quick Start option for enabling out-of-the-box data controls

    1. Go to Administration > Quick Start.

    2. Under the Data Controls section, enable predefined DDR rules, such as:

      1. Public Exposure of Personal Identifiable Information (PII).

      2. Detection of Protected Health Information (PHI).

      3. Monitoring of Payment Card Industry (PCI) data.

    3. Import the desired Control Rules to start monitoring immediately.

    Sizing

    Server sizing to utilise Getvisibility products.

    Note that any sizing provided can be subject to change due to environmental variables. Below we outline some of the variables that can affect the sizing:

    • Server resources. e.g. physical vs virtual, physical underlying CPU, RAM, and disc specs/speeds and whether they are shared or dedicated.

    • Network (speed, latency and throughput)

    • Geolocation

    Azure Files

How to configure an Azure Files connection for scanning.

    Registering an Azure App

• Login to the Azure Portal

• If there are multiple tenants to choose from, use the Settings icon in the top menu to switch to the tenant in which the application needs to be registered, from the Directories + subscriptions menu.

    AWS S3

    How to create an AWS S3 user with policies, to connect to S3 accounts.

    Create a policy

• Sign in to the AWS Management Console and open the IAM console with the appropriate admin level account

    • In the navigation pane on the left, choose Policies and then choose Create policy

    iManage On-Premise

    This guide details how to create and configure an iManage connector to scan an on-premise iManage Work Server.

    To connect Forcepoint DSPM to your iManage server, you will need to gather three key pieces of information:

    1. Your Server's URL: The fully qualified domain name of your iManage server (e.g., imanage.mycompany.com).

    2. An Application Client ID: A unique ID from your iManage Control Center that identifies the Getvisibility application.

    SMB Streaming Configuration

This guide walks you through enabling real-time data streaming for an SMB connection and how to monitor live streaming events within the Getvisibility platform.

    Steps to Enable Data Streaming for SMB

    OneDrive Streaming Configuration

This guide walks you through enabling real-time data streaming for a OneDrive connection and how to monitor live streaming events within the Getvisibility platform.

    Configuring permissions for an Azure App

• Login to the Azure Portal

    Detectors

    How to set up a Detector

What is a Detector and how does it work?

    Detectors are features that allow users to set up alerts for certain parameters during a classification search. A user can set up a Detector to search for keywords within the entire contents of a document or file, as well as search for keyword hits within the file's pathname. It uses advanced AI and ML search techniques such as Fuzzy Word Search and Percolation to search through documents much more quickly than a traditional pattern-matching search, such as using Regular Expressions.

    An example of a Detector that a user could set up is “Employee Salary”. A user might want to ensure that documents that contain this information are not publicly shared or shared internally throughout an organisation.

    iManage Cloud

    How to create an iManage Connector app to connect to iManage accounts for the cloud.

    Registering an iManage App

    • To register iManage App you need to contact iManage support by sending an email to

    • Once an account is created login to

    • Click on username in the upper right corner and click Control Center

    Role-Based Access Control (RBAC)

    Available roles



    {
       "roles":{
          "client":{
             "dashboard":[
                {
                   "name":"ADMIN"
                },
                {
                   "name":"AGENT_CONFIGURATION_WRITE"
                },
                {
                   "name":"ANALYTICS_READ_ONLY"
                },
                {
                   "name":"ANALYTICS_WRITE"
                },
                {
                   "name":"COMPLIANCE_HUB_READ"
                },
                {
                   "name":"COMPLIANCE_HUB_WRITE"
                },
                {
                   "name":"CONNECTIONS_READ_ONLY"
                },
                {
                   "name":"CONNECTIONS_WRITE"
                },
                {
                   "name":"DATA_REGISTER_READ"
                },
                {
                   "name":"DATA_REGISTER_WRITE"
                },
                {
                   "name":"DATA_RISK_READ_ONLY"
                },
                {
                   "name":"DATA_RISK_WRITE"
                },
                {
                   "name":"DEPARTMENTS_FULL_READ"
                },
                {
                   "name":"DEPARTMENTS_FULL_WRITE"
                },
                {
                   "name":"DEPARTMENTS_PARTIAL_READ"
                },
                {
                   "name":"DEPARTMENTS_PARTIAL_WRITE"
                },
                {
                   "name":"EXPLORE_FILES_PAGE_READ_ONLY"
                },
                {
                   "name":"EXPLORE_FILES_PAGE_WRITE"
                },
                {
                   "name":"EXPLORE_TRUSTEES_PAGE_READ_ONLY"
                },
                {
                   "name":"EXPLORE_TRUSTEES_PAGE_WRITE"
                },
                {
                   "name":"LANGUAGE_SETTINGS_WRITE"
                },
                {
                   "name":"PATTERN_MATCHING_READ_ONLY"
                },
                {
                   "name":"PATTERN_MATCHING_WRITE"
                },
                {
                   "name":"TAGGING_READ_ONLY"
                },
                {
                   "name":"USER"
                },
                {
                   "name":"TAGGING_WRITE"
                },
                {
                   "name":"USER_MANAGEMENT_WRITE"
                },
                {
                   "name":"PERMISSIONS_READ"
                },
                {
                   "name":"WEBHOOKS_READ_ONLY"
                },
                {
                   "name":"WEBHOOKS_WRITE"
                },
                {
                   "name":"CLUSTERING_WRITE"
                },
                {
                   "name":"REVOKE_PERMISSIONS_WRITE"
                }
             ]
          }
       }
    }

    Select Choice and then Next

    • Give the name as Classification and the choices as: Public, Internal, Confidential, Highly-Confidential. Select Save

• Similarly create Compliance and Distribution columns (if required)

    • Getvisibility and SharePoint's tags are now aligned

  • When tags are written to SharePoint files automatically over the API, as the tags are added by Getvisibility, Modified By changes to System Account.

    • Getvisibility preserves the Modified date where applicable.

  • Specific location of server. (data center or Azure, AWS, GCP, etc)

  • Amount of Data

  • Data info (type, size, number of files, etc)

  • vendor throttling

  • Sizing may require adjusting once the platform is operational to meet data classification scanning speed requirements/expectations. Please note that scaling is not linear - you do not need to double the size of the server to double the speed. To double the classification throughput, you may only need to add an additional 2 CPUs and 6GB RAM.

If data/users are spread across multiple geolocations, a server per location is needed, and each server must then be sized accordingly.

    The below tables are for environments up to 25,000 users. If a sizing is needed for a larger environment please reach out to Getvisibility.

Basic Sizing

    Synergy (Endpoint agent only) deployment specs: 8 CPU sockets, 48 GB RAM, 1.0 TB disk

    DSPM (DSPM only) deployment specs: 16 CPU sockets, 80 GB RAM, 1.2 TB disk

    Ultimate (DSPM + agent) deployment specs: 20 CPU sockets, 96 GB RAM

Synergy (Endpoint agent only) single server deployment

    5000 Users or Less: 8 CPU sockets, 48 GB RAM, 1.0 TB disk

    10,000 Users or Less: 16 CPU sockets, 80 GB RAM, 1.5 TB disk

    15,000 Users or Less: 24 CPU sockets, 112 GB RAM

DSPM (DSPM only) single server deployment

    5000 Users or Less: 16 CPU sockets, 80 GB RAM, 1.2 TB disk

    10,000 Users or Less: 32 CPU sockets, 144 GB RAM, 2.4 TB disk

    15,000 Users or Less: 48 CPU sockets, 208 GB RAM

Ultimate (DSPM + agent) single server deployment

    5000 Users or Less: 20 CPU sockets, 96 GB RAM, 2.2 TB disk

    10,000 Users or Less: 40 CPU sockets, 176 GB RAM, 3.9 TB disk

    15,000 Users or Less: 60 CPU sockets, 256 GB RAM

DSPM + agent + Streaming single server deployment

    2000 Users or Less: 20 CPU sockets, 96 GB RAM, 2.2 TB disk

    5000 Users or Less: 40 CPU sockets, 176 GB RAM, 3.4 TB disk

    10,000 Users or Less: 80 CPU sockets, 336 GB RAM

    Additional information: The CPU must support the instructions SSE4.1, SSE4.2, AVX, AVX2, FMA.

    Only x86_64 architecture is supported. Minimum CPU speed is 2.2 GHz
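To verify these CPU requirements on a Linux host, one way is to check the flags reported in /proc/cpuinfo (the kernel reports these extensions as the lowercase flags sse4_1, sse4_2, avx, avx2 and fma):

    # List which of the required instruction set extensions the CPU reports
    grep -m1 -o -w -E 'sse4_1|sse4_2|avx|avx2|fma' /proc/cpuinfo | sort -u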

• If there are multiple tenants to choose from, use the Settings icon in the top menu to switch to the tenant in which the application needs to be registered, from the Directories + subscriptions menu.
    • Browse to App Registration and select New registration

    • On the App Registration page enter below information and click Register button

      • Name: (Enter a meaningful application name that will be displayed to users of the app)

      • Supported account types:

• Select which accounts the application will support. The options should be similar to those below. Select “Accounts in this organizational directory only”:

• Leave the Redirect URI empty and click Register

    • Note the Application (client) ID, Directory (tenant) ID values

    • Navigate to Manage -> Certificates and secrets on the left menu, to create a new client secret

    • Provide a meaningful description and expiry to the secret, and click on Add

    • Once a client secret is created, note its Value and store it somewhere safe. NOTE: this value cannot be viewed once you leave this page

    • Navigate to Manage -> API permissions on the left menu, and Add a permission

    • Select Microsoft APIs -> Microsoft Graph

    • Select Application permissions

    • Permissions required

      • Microsoft Graph > Application permissions > Device > Device.Read.All

      • Microsoft Graph > Application permissions > Directory > Directory.Read.All

      • Microsoft Graph > Application permissions > Group > Group.Read.All

      • Microsoft Graph > Application permissions > User > User.Read.All

    • Once all the required permissions are added, click "Grant admin consent"

    Azure Storage Subscription ID, Resource group and connection strings

    A connection string is needed for the storage account you wish to scan.

    • Login to Azure Portal

    • If there are multiple tenants to choose from, use the Settings icon in the top menu to switch to the tenant in which needs to be registered to the application from the Directories + subscriptions menu.

    • Browse to Storage accounts and select the account to be scanned

• Once the storage account is selected, note the Resource group and Subscription ID values in the Overview page

    • Navigate to Security + networking -> Access keys on the left menu, and click on Show on the Connection string

    • Copy this Connection string value

    • Access Control (IAM) Role assignment

      • In the storage account, go to Access Control (IAM) and assign Reader role to the azure app created in the first step

      • Save the changes.

    Configuring Azure Files connector in Dashboard

    • Navigate to Administration -> Data Sources -> Azure Files -> New scan

    • Provide the Connection string value obtained from above steps

    • Click on the Folder icon in Path to select a particular share to scan, or leave the path as empty to scan all shares

    • Save the configuration

    • Once the configuration is saved, click on the icon on the right and select Start file scan to begin scanning

    • The results can be viewed under Dashboard -> Enterprise Search

• In the Policy editor section, find the Select a service section, then choose the S3 service, and select Next. Once the S3 service permissions are added, move on to the IAM service

    • In Actions allowed, choose the below actions to add to the policy:

      • For scanning

        • IAM service

          • Read > GetUser

          • Read > GetPolicyVersion

          • Read > GetPolicy

          • Read > GetUserPolicy

          • List > ListUserPolicies

          • List > ListAttachedUserPolicies

        • S3 service

          • Read > GetBucketAcl

          • Read > GetBucketLocation

          • Read > GetObject

        • EC2 service

          • List > DescribeRegions

      • For revoke permissions (S3 service)

        • Permission Management > PutBucketAcl

        • Permission Management > PutObjectAcl

      • For tagging (S3 service)

        • Write > DeleteObject

        • Write > PutObject

        • Tagging > DeleteObjectTagging

    • For Resources, choose All and select Create policy to save the new policy

    Create a user

    • Sign in to the AWS Management Console and open the IAM console with the appropriate admin level account

    • In the navigation pane on the left, choose Users and then choose Create user

    • On the Specify user details page, under User details, in User name, enter the name for the new user, example S3-connector-user and select Next

    • On the Set permissions page, select Attach policies directly and choose the policy created in above steps

    • Select Next

    • Once the user is created, select it, and from the user page, choose Create access key

    • Select Other then Next

    • Enter a description if you wish and select Create access key

    • The Access and Secret Access Keys have now been created. These can be downloaded as a CSV, and also copied from this section. NOTE: the secret access key cannot be viewed once you leave this page

    Configuring AWS S3 connector in Dashboard

    • Navigate to Administration -> Data Sources -> AWS S3 -> New scan

    • Provide the access key and secret access key values generated in the above steps

    • Click on the Folder icon in Path to select a particular bucket to scan, or leave the path as empty to scan all buckets

    • Save the configuration

    • Once the configuration is saved, click on the icon on the right and select Start file scan to begin the scanning

    • The results can be viewed under Dashboard -> Enterprise Search

3. A Service Account: A dedicated iManage user account with specific permissions for scanning.

    This guide will walk you through the steps for your iManage administrator to find this information and how to use it to configure the connector.

    Network Access Requirement

    Before you begin, ensure the Forcepoint DSPM server has network access to your on-premise iManage server's API. You may need to configure internal firewall rules to allow this connection.

    Prerequisites

    Before you begin, ensure you have the following:

    • Administrative access to your on-premise iManage Control Center.

    • The fully qualified domain name (hostname) of your on-premise iManage server (e.g., imanage.mycompany.com).

    • A dedicated iManage service account with a username and password.

    Part 1: Obtain Your Client ID (via iManage Control Center UI)

    This step must be performed by your internal iManage administrator.

    1. Log in to your on-premise iManage server.

    2. Click on your username in the upper-right corner and select Control Center.

    3. From the side menu, navigate to Applications.

    4. Select Desktop Auth Client from the list.

    5. Copy the Client ID value. This ID is used to identify the Forcepoint DSPM application to your iManage server. You will need this for Part 2 and Part 4.

    Part 2: Get Access Token and Customer ID (via API)

    You can use a command-line tool like curl to perform these one-time steps. Replace your.imanage.server.com with your on-premise server's actual hostname in the commands below.

    A. Get Access Token

    Run the following command in your terminal. Be sure to replace the placeholder values (YOUR_USERNAME, YOUR_PASSWORD, YOUR_CLIENT_ID) with your actual service account credentials and the Client ID from Part 1.

    The JSON response will contain your access_token.

    B. Get Customer ID

    Run the next command, replacing YOUR_ACCESS_TOKEN with the access_token value you received from the previous step.

    The JSON response will contain your customer_id.

    Part 3: Configure iManage Roles & Permissions (via iManage Control Center UI)

    This is performed in the iManage Control Center to grant the service account the necessary permissions.

    1. Navigate to Control Center > Roles.

    2. Create or edit the role assigned to your service account.

    3. Grant the following privileges:

      • For Scanning: System Access > Read-only

      • For Tagging: Document > Import / Create

      • For Moving Files: Document > Delete

      • For Revoking Permissions: System Access > Not Read-only

    Part 4: Configure the iManage Connector in Forcepoint DSPM

    1. In the Forcepoint DSPM, navigate to Administration > Data Sources.

    2. Find iManage in the list and click New Scan.

    3. Fill in the connector configuration fields:

    • Name: a friendly name for this connection, e.g. My On-Prem iManage

    • Customer Id: the numeric Customer ID for your instance (the ID from Part 2B)

    • Username: the iManage service account username

    • Password: the iManage service account password

    4. Click Save.

    Part 5: Run the Scan

    1. Find your newly configured iManage connection in the list.

    2. Click the ... (three-dot) menu on the right.

    3. Select Start trustee scan to scan permissions (Optional).

    4. Once the trustee scan is complete (optional), click the ... menu again and select Start file scan to scan content.

    Part 6: View Results

    • Permission and access issues can be viewed in Dashboard > Access Governance (if you ran the trustee scan).

    • File classification and content results can be viewed in Dashboard > Enterprise Search.

    1. Create a New Scan Configuration

      1. From the Data Sources page, select SMB from the list of available data sources. In the Scan Configurations list, select Create New Configuration.

      2. Make sure the connection has a Name and Credentials set. Then select the SMB share Path that is going to be monitored.

    2. After selecting the folder, select the Data streaming checkbox.

    3. Follow the Download tab link and the installation instructions for the SMB streaming agent (see SMB Agent Installation below).

    SMB Agent Installation

    This section addresses the different methods to install the SMB Connector on a single machine.

    SMB Connector Pre-requisites:

    • OS: Windows Server 2016 or later.

    • Processor: 2 GHz or faster, 2 cores (64-bit processor recommended).

    • Memory: 4GB RAM.

    • Hard Disk: 1GB free space.

    • Administrator Privileges: the user needs admin permissions to install.

    • .NET 8 must be installed.
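
    To confirm this prerequisite, the installed .NET runtimes can be listed from a command prompt on the server, for example:

    dotnet --list-runtimes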

    Installation config

    The SMB Connector supports various configuration options which can be specified via smb_connector_application_config.json

    Manual Installation

    Pre-requisites:

    1. The ZIP of the installer files.

    2. smb_connector_application_config.json file.

    3. Windows Server machine access.

    4. Admin access to install the connector.

    Steps

    1. Download the SMB Connector ZIP File: Obtain the ZIP file and save it to the Windows machine.

    2. Prepare for Installation:

      • Unzip the contents of the ZIP file

      • Place the smb_connector_application_config.json file in the same directory as the unzipped contents.

    3. Configure the Installer:

      • Edit the smb_connector_application_config.json file as needed. Use the smb_connector_application_config.json.example file in the unzipped folder if creating the configuration from scratch.

    Create a folder mapping for every SMB share on the server that is to be scanned. WatchFolder should be the root directory of the share, and WebhookUrl should be the Webhook URL shown on the scan configuration page for the SMB share in the GV dashboard.

    • Keep useDefaultFileFilters set to false if you want all files in the share to be scanned. If set to true, the connector will only scan files supported by the GV Synergy agent for classification.

    • IncludedExtensions and AdditionalFileFilters can be used if you wish to apply filters other than the defaults. IncludedExtensions supports file extensions in the format .txt, etc., while AdditionalFileFilters allows any custom file filter, including * as a wildcard (an illustrative configuration sketch is shown below).
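
    The authoritative schema is the smb_connector_application_config.json.example file in the unzipped folder; purely as an illustration of the fields described above, a configuration might look roughly like this (the folder-mapping key name and overall layout shown here are assumptions, and the WatchFolder and WebhookUrl values are placeholders):

    {
      "FolderMappings": [
        {
          "WatchFolder": "D:\\Shares\\Finance",
          "WebhookUrl": "<Webhook URL copied from the SMB scan configuration page>"
        }
      ],
      "useDefaultFileFilters": false,
      "IncludedExtensions": [ ".txt", ".docx", ".pdf" ],
      "AdditionalFileFilters": [ "*report*" ]
    }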

    4. Start the Installation:

      • Execute the install.ps1 script by right clicking and choosing Run with PowerShell

    5. Complete the Installation:

      • After the installation completes, the PowerShell window can be closed.

    4. Save the Streaming configuration

    Monitoring Real-Time Events

    After the subscription is activated (green magnifying glass icon), real-time events will start flowing into the platform, and you will be able to monitor them from various sections of Getvisibility.

    Viewing Events in the Live Events Section

    1. Navigate to the Live Events section under Administration to view a detailed audit log of all streaming events (you may specify a source filter to focus only on SMB events):

    If there are multiple tenants to choose from, use the Settings icon in the top menu to switch, via the Directories + subscriptions menu, to the tenant in which the application needs to be registered

  • Browse to App Registration and select the application that was created for the scanning

  • Navigate to Manage -> API permissions on the left menu, and Add a permission

  • Select Microsoft APIs -> Office 365 Management API

    • Select Application permission

    • Select ActivityFeed.Read permission

    • Permissions required

      • All the scanning permissions (https://docs.getvisibility.com/scan-with-getvisibility/configure-data-sources/sharepoint-online)

      • Office 365 Management API ⇒ Application Permissions ⇒ ActivityFeed.Read

    • Once all the required permissions are added, click "Grant admin consent"

    Enabling Auditing

    • Sign into the Microsoft Purview portal using Microsoft Edge browser

    • Select the Audit solution card. If the Audit solution card isn't displayed, select View all solutions and then select Audit from the Core section

    • If auditing isn't turned on for your organization, a banner is displayed prompting you to start recording user and admin activity. Select the Start recording user and admin activity banner.

    • In certain cases, recording cannot be enabled immediately and requires additional configuration. If this applies, users will be prompted to enable the customization setting. Select OK, and a new banner will appear, informing you that the process may take 24 to 48 hours to complete. After this waiting period, repeat the previous step to proceed with enabling recording.

    Steps to Enable Data Streaming for OneDrive

    1. Create a New Scan Configuration

    1. From the Data Sources page, select OneDrive from the list of available data sources. In the Scan Configurations list, create a New Configuration.

    2. Make sure the connection has a Name and Credentials set. Then select the Path icon.

    2. Pick a Folder for Real-Time Events

    1. Click on the Folder icon in the Path field to select the folder you want to monitor for real-time events.

      • Magnifying glass icon: Folders with this icon next to them indicate that real-time events can be subscribed to from this directory.

    2. After selecting the folder, click Save & Close to finalize the changes.

    3. Clock icon: When data streaming is being activated, the clock icon will appear, indicating that the subscription is being processed. Once the subscription is activated, this icon will change to a green magnifying glass.

    4. After enabling Data Streaming, the system will automatically handle the subscription to OneDrive’s real-time events. There is no need to manually configure Webhooks.

    Monitoring Real-Time Events

    After the subscription is activated (green magnifying glass icon), real-time events will start flowing into the platform, and you will be able to monitor them from various sections of Getvisibility.

    Viewing Events in the Live Events Section

    1. Navigate to the Live Events section under Administration to view a detailed audit log of all streaming events.

    2. In this section, you can filter and view event details.

    In order to set this up, click on Administration → Detectors to open the Detectors page.

    Here there is a list of pre-defined common Detectors that can be used.

    To begin the setup click on the Create button on the top right corner of the screen.

    This brings up the Detector Creation Screen.

    1. Provide a Query Name. For this example "Employee Salaries".

    2. Define where the Search Base of the Detector will look (i.e. search through the contents of a file or the file path). For example to search through the full document contents to look for certain salary-related keywords, select Content.

    3. In the Contain field, set the relevant salary-related keywords that might trigger a detector hit in a potentially sensitive document, for example: “Salary”, “Compensation Package”, “Payslip”, “Payroll”, “Compensation Structure”, “OTE”.

    4. If there are terms that the Detector is to ignore, set them in the Not Contain field.

    5. Click the Enabled button to turn the Detector on, and then Save it.

    The new Detector named Employee Salaries should now be visible in the list of Detectors

    A new scan will be needed to detect for Employee Salaries.

    Operations

    Each token that is added to a detector is related to the other tokens like an OR condition. AND conditions are not available in detectors, but this functionality can be configured indirectly through the data asset registry or directly through RegEx pattern matching.

    Why it is different to Pattern Matching

    Detectors work differently to Pattern Matching in a number of ways. Firstly, they can scan the entire contents of a document and its path name for keywords, while a traditional regex search is limited to searching through only the first X words of each document. Detectors leverage advanced AI and ML techniques such as Fuzzy Word Search and Percolation Search in order to search for phrases across an entire document in a fraction of the time it would take with Pattern Matching.

    Usages

    Defining Data Asset

    An important feature of DSPM is the ability to identify data assets that are important to the organisation and assign those assets in the inventory. Detectors are a powerful method that work in conjunction with the AI Mesh to find critical, sensitive, and regulated data during scans.

    GQL queries

    Once Detectors are configured and scans are underway, users can reference them in GQL queries. Use the detectorHits value as shown below. GQL will give suggestions to help speed up filtering.
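
    For example, once the detector created above has been included in a scan, the following query (which can be combined with other terms using AND/OR as usual) returns every file that triggered it:

    detectorHits="Employee Salaries"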

    Analytics

    Detectors are used along with the AI Mesh to analyse data and visually present findings in the Analytics Dashboard. Detectors associated with various data assets and types can be found through the out-of-the-box widgets and play a crucial role in helping to identify specific important data.

    Create Employee Lists

    In order to identify employee data during scans, it can be useful to add all employee names to a detector. The result is a detector that helps identify HR data located throughout the data estate.

    Overall, detectors give a better understanding of the data and help define very specific attributes as well as broad categories of data assets.

    • Note: Only users with admin role have access to Control Center

    • Go to the Applications menu item, click Desktop Auth Client and find Client ID

    • Customer ID should be provided by iManage admins, but if it is not provided, it can be retrieved from the /api response

      • Get Access Token

      • Get Customer ID

    • Go to the Roles menu item and set the following:

      • Select Global Management to set up admin roles. Enable the necessary options.

      • Select Library-level Management to set up library roles

    • Permissions required

      • For scanning

        • System Access > Read-only

      • To move files

    • Click on the Folder icon in Path to select a particular path to scan, or leave the path as empty to scan all

    • Save the configuration

    • Once the configuration is saved, click on the icon on the right and select Start trustee scan to begin Trustee scanning

    • The scan results can be viewed under Dashboard -> Access Governance

    • Click on the icon on the right and select Start file scan to begin file scanning

    • The results can be viewed under Dashboard -> Enterprise Search

    iManage

    Azure AD

    How to create an Azure AD Connector app to connect to Azure Active Directory (Microsoft Entra ID).

    Registering an Azure App

    • Login to Azure Portal

    • If there are multiple tenants to choose from, use the Settings icon in the top menu to switch, via the Directories + subscriptions menu, to the tenant in which the application needs to be registered

    • Browse to App Registration and select New registration

    • On the App Registration page enter the below information and click the Register button.

      • Name: (Enter a meaningful application name that will be displayed to users of the app)

      • Supported account types:

        • Select which accounts the application will support. The options should be similar to the below screenshot.

    • Navigate to Manage -> Certificates and secrets on the left menu, to create a new client secret

    • Provide a meaningful description and expiry to the secret, and click on Add

    • Once a client secret is created, note its Value and store it somewhere safe. NOTE: this value cannot be viewed once this page is closed.

    • Navigate to Manage -> API permissions on the left menu, and Add a permission

    • Select Microsoft APIs -> Microsoft Graph

    • Select Application permissions

    • Permissions required

      • Scanning only:

        • Microsoft Graph > Application permissions > AuditLog > AuditLog.Read.All

        • Microsoft Graph > Application permissions > Directory > Directory.Read.All

    Configuring Azure AD connector in Dashboard

    • Navigate to Administration -> Data Sources -> Azure AD -> New scan

    • Provide the Directory (tenant) ID, Application (client) ID and Client Secret value generated in the above steps from the Azure application

    • Save the configuration

    • Once the configuration is saved, click on the icon on the right and select Start trustee scan to begin scanning

    • The scan results can be viewed under Dashboard -> Access Governance

    Analytics

    Detailed description on Dashboard Widgets

    About

    The Analytics page and its boards showcase various metrics, charts, and graphs that detail the findings from data scans, including overexposed files, sensitive data, and data at risk.

    These are critical for understanding and managing the organisation's data security and compliance posture. It identifies potential vulnerabilities, risks, and compliance issues, enabling informed decision-making to mitigate threats and enhance data protection strategies.

    The primary users of the Analytics page are CISOs (Chief Information Security Officers), security analysts, data protection officers, and IT administrators who are responsible for the organisation's data security and compliance. It provides these stakeholders with a comprehensive overview of the data security health of the organisation.

    The Analytics page is fully populated after DSPM scans have been completed, but it can be accessed during a scan to view live information.

    OneDrive

    How to create a OneDrive Connector app to scan OneDrive accounts.

    Required Whitelisting

    The following URLs need to be whitelisted:

    • Microsoft Graph API: https://graph.microsoft.com

    Confluence Cloud Streaming Configuration

    This document provides information on how to configure Confluence Cloud connection with real-time events monitoring and data streaming.

    Overview

    Follow this guide to integrate Confluence Cloud with your system for real-time events monitoring.

    To enable DDR (Streaming) for an existing Confluence Cloud instance, follow these steps:

    Box

    How to create a Box Connector app to scan Box accounts.

    Creating a Box app

    • Log in to the relevant Box account.

    • Navigate to the Dev Console.

    Keycloak User Federation using LDAP over SSL

    This feature is supported from GV Essentials version 4.5.51

    The CLI sections are exclusive to Helm and Air-gapped systems - these steps are not required for clusters connected to Rancher. There are no line breaks in the commands, unless otherwise noted.

    Scan History

    How to find the history of Scans performed on a Data Source

    1. Go to Administration > Data Sources

    2. Click on a Data Source

    3. Click on the “Last Scan Status” symbol
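
    # A. Get Access Token (Part 2A): exchange the iManage service account credentials for an access token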

    curl -X POST "https://your.imanage.server.com/auth/oauth2/token" \
    -d "username=YOUR_USERNAME" \
    -d "password=YOUR_PASSWORD" \
    -d "grant_type=password" \
    -d "client_id=YOUR_CLIENT_ID"
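
    # B. Get Customer ID (Part 2B): call the /api endpoint using the access token returned above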
    curl -X GET "https://your.imanage.server.com/api" \
    -H "X-Auth-Token: YOUR_ACCESS_TOKEN"

    1.4

    2.0

    20,000 Users or Less

    32

    144

    2.5

    25,000 Users or Less

    40

    176

    3.0

    3.6

    20,000 Users or Less

    64

    272

    4.8

    25,000 Users or Less

    80

    336

    6

    5.6

    20,000 Users or Less

    80

    336

    7.3

    25,000 Users or Less

    100

    416

    9.0

    6.3

    15,000 Users or Less

    120

    496

    9.2

    20,000 Users or Less

    160

    656

    12.1

    25,000 Users or Less

    200

    816

    15

    • Client Id: the application Client ID (the ID from Part 1)

    • Domain: your on-premise server's hostname (crucial), e.g. your.imanage.server.com

    • Path: optional; leave blank to scan all content, or click the folder icon to select a specific path

  • Read > GetObjectAcl

  • List > ListAllMyBuckets

  • List > ListBucket

  • Tagging > PutObjectTagging


    Document > Delete

  • To revoke permissions

    • System Access > Not Read-only

  • For tagging

    • Document > Import / Create

  • Configuring iManage connector in Dashboard

    • Navigate to Administration -> Data Sources -> iManage -> New scan

    • Provide the customer id, client id, username, password and domain value

  • “Accounts in this organizational directory only” can be selected:

  • Leave the Redirect URI as empty and Click Register

  • Note the Application (client) ID, Directory (tenant) ID values

  • Once all the required permissions are added, click Grant admin consent

    The Analytics page gathers its information through the DSPM platform's data discovery, classification, and risk assessment processes. The platform’s connectors are set up to scan the organisation's digital environment, identifying and classifying data across systems and repositories, and evaluating the risks based on various factors such as sensitivity, exposure, and compliance requirements. This data is then aggregated, analysed, and presented on the Analytics Boards in an easily digestible format.

    The Analytics page is found within the DSPM platform's user interface under the dedicated "Analytics" section.

    The boards

    DSPM comes with 22 preconfigured boards out-of-the-box. Here are brief descriptions of the use cases they cover.

    • Financial Data At Risk: Focuses on identifying and mitigating risks associated with financial data, essential for preventing fraud and ensuring regulatory compliance.

    • Data Exposure: See potential data exposure risks, including ransomware impact, sensitive data distribution, and high-risk data locations across various assets and attributes.

    • Classification Overview: Provides a snapshot of data classification across the organisation, aiding in the identification of sensitive data and ensuring compliance with data protection regulations.

    • Key Data Overview: Highlights critical data assets within the organisation, enabling focused protection efforts on the most valuable and sensitive information.

    • Cataloged Files: Offers a detailed inventory of all catalogued files. These are files that have not passed through the ML pipeline. This helps identify any data issues.

    • Shadow Data: Reveals unmanaged or unknown data residing outside of controlled environments, reducing risks associated with data sprawl and exposure.

    • HR Data At Risk: Highlights vulnerabilities within human resources data, protecting sensitive employee information from breaches and unauthorised access.

    • Data Risk Assessment: Offers a detailed view of data risk factors, highlighting high-risk files, ownership gaps, and critical exposures to aid in mitigating security threats and ensuring compliance.

    • Unprotected Data: Identifies data lacking adequate security controls, allowing for quick remediation and the strengthening of data protection measures.

    • Data Ownership: Clarifies data stewardship within the organisation, promoting accountability and facilitating effective data management and security practices.

    • Duplicate Files: Identifies and addresses issues of data redundancy, improving storage efficiency and data management practices.

    • Data Risk Management: Identify and assess risks related to PHI, medical data, and PII exposure across various files and categories using the Data Risk Management dashboard.

    • Ransomware Exposure: Evaluates the organisation's vulnerability to ransomware attacks, facilitating proactive measures to protect critical data assets.

    • ROT Data: Identifies redundant, obsolete, or trivial (ROT) data that clutters systems and poses unnecessary risk, enabling effective data clean-up and policy enforcement.

    • Executive Data at Risk: Targets the specific data risks associated with executive-level information, ensuring high-profile data receives adequate security measures.

    • High Risk Users: Identifies users with excessive permissions or abnormal access patterns, enabling organisations to mitigate insider threats and enforce least privilege access policies.

    • Classification Distribution: This dashboard provides a comprehensive overview of data classification, distribution, and storage locations across different sensitivity levels and data sources.

    • Scan Status: The Scan Status board provides real-time insights into the progress of ongoing data scans, allowing organisations to monitor the coverage and completeness of their data discovery and security efforts.

    • Gen-AI Oversight: Monitors GPT deployments and user activities, providing insights into file uploads, chat usage, and potential risks associated with AI-driven operations.

    • Data Compression Schedules: Provides insights into data compression activities, optimising storage utilisation and enhancing data management efficiency.

    • Data Incidents: Summarises past and present data security incidents, providing insights from past incidents and enhancing organisational resilience against future threats.

    • Gen-AI Readiness: Evaluates a company's readiness for adopting GenAI by analysing the availability and sensitivity of data that could be used for AI training or RAG applications.

    While the default boards provide excellent coverage for the most frequent data security and compliance use cases, it can be beneficial to edit some of the input parameters to suit some specific customer requirements.

    The interface for editing the boards' widgets is designed for ease of use, incorporating GQL (Getvisibility Query Language) and graphical elements.

    See the GQL Reference Guide for full information.

    Editing Widgets Graphically

    There are a number of widgets available and each of them have their own unique customisation options.

    Counter

    The widget's design aims to provide a customisable and at-a-glance view of specific data metrics, which can be particularly useful for quickly accessing the volume of data that matches certain criteria, such as sensitive files or risk levels.

    Users can choose the dataset they wish to count from, like files, trustees, or agent activities. They can also employ GQL to refine their search and set the aggregation function (e.g., count, sum, average).

    This section allows users to add a descriptive title, position it accordingly, select an icon to represent the data visually, and choose primary and secondary colours for the widget's theme. Users can also toggle the compact mode to change the widget's display size.

    Chart

    These widgets are designed to help users tailor the display of data analytics to their preferences for better interpretation and presentation of data insights. They can have multiple types: Horizontal Bar, Vertical Bar, Line, Area, or Pie.

    This tab allows users to select the type of dataset to visualise (e.g., files or trustees) and use GQL for specific queries. The 'Field to group by' feature is used to categorise data, with adjustable limits on the results displayed and thresholds for inclusion in the visualisation.

    Users can adjust general settings like chart type, add a title, adjust margins for clarity, and choose a colour palette for the chart. Options for additional customisations such as enabling grid lines or flipping colours for visual differentiation are also present.

    The Legend section has toggles for enabling a legend display and showing percentages, with adjustments for size and positioning on the chart.

    Map

    The map widget is an interactive element that displays geographical data. It is configurable to show specific information based on user-defined criteria. Geographic location can be added during scan set up and is crucial in discovering data sovereignty violations.

    This interface enables the use of GQL to query and filter the data that will be displayed on the map. Enter the query in the search bar and click "SAVE" to apply the filters or "CANCEL" to exit without making changes.

    Here the map's appearance can be customised. Adjust the data, area, border, and background colours using the colour selection tools, and add a title or subtitle as needed.

    Text

    The text widget allows for rich text creation and editing. Users can format the text with the various styling options provided.

    The toolbar has standard text editing options. Users can enter and format their text in the area below the toolbar.

    Table

    The table widget displays data in a structured format.

    This interface shows the selection of a data source (SharePoint Online) and the path to specific files within that source. The use of GQL is available to further query and refine the data. Options to export the data as a CSV file or view the table on the page will be provided. Users can set the result limit, PDF export limit, sorting field and order, and select which columns to display before saving.

    In this settings panel, you can add a title and subtitle for the table, and choose their alignment on the page (e.g., left, center, right).

    Incidents

    This widget is designed to monitor and report on pre-configured data compliance issues, focusing on various data security and management rules.

    Users can select a specific DSPM+ rule from a dropdown menu to focus on. The widget displays the count of rule violations and the corresponding files affected.

    Dual Data Grouping

    The Dual Data Grouping widget is used to organise and visualise complex datasets by multiple attributes simultaneously. It enables detailed analysis of complex data sets by allowing an examination of two separate data attributes concurrently. This enhances the understanding of the relationships within data.

    The settings allow you to define the 'Label' and the 'Field to group by', which in this case is 'Data Attribute Name', and set a display limit for these groupings. Use the GQL search bar to refine the data set. After setting up, click "SAVE" to update the widget or "CANCEL" to discard changes.

    The widget title and colour palette can be customised to visually distinguish the different groupings.

    There is also the option to 'Flip Colours' for the display, to improve visual contrast or accessibility.

    The 'Position' dropdown allows the title and subtitle to be aligned.

    Multi Counter

    The Multi Counter widget is designed to track and display counts for multiple items or categories within a dataset, useful for monitoring and comparing quantities at a glance.

    Each counter can be set to track a different field. Users can customise the criteria for each counter using the search fields provided and add additional counters if needed.

    DSPM Analytics presents essential data insights through its interface, offering a practical snapshot of data security and compliance statuses. This straightforward overview assists those in charge of data security with the necessary information to make quick, informed decisions to protect their organisation’s data.

    • Azure Authentication: https://login.microsoftonline.com

    Registering an Azure App

    • Login to Azure Portal

    • If there are multiple tenants to choose from, use the Settings icon in the top menu to switch, via the Directories + subscriptions menu, to the tenant in which the application needs to be registered.

    • Browse to App Registration and select New registration

    • On the App Registration page enter the below information and click the Register button

      • Name: (Enter a meaningful application name that will be displayed to users of the app)

      • Supported account types:

        • Select which accounts the application will support. The options should be similar to those below. Select “Accounts in this organizational directory only”:

        • Leave the Redirect URI as empty and Click Register

    • Note the Application (client) ID, Directory (tenant) ID values

    • Navigate to Manage -> Certificates and secrets on the left menu, to create a new client secret

    • Provide a meaningful description and expiry to the secret, and click on Add

    • Once a client secret is created, note its Value and store it somewhere safe. NOTE: this value cannot be viewed once you leave this page

    • Navigate to Manage -> API permissions on the left menu, and Add a permission

    • Select Microsoft APIs -> Microsoft Graph

    • Select Application permissions

    • For UnifiedPolicy.Tenant.Read

      • Navigate to Manage -> API permissions on the left menu, and Add a permission

      • Select APIs my organization uses tab

      • Search for Microsoft Information Protection Sync Service

      • Select Application permissions > UnifiedPolicy.Tenant.Read

    • For InformationProtectionPolicy.Read.All

      • Navigate to Manage -> API permissions on the left menu, and Add a permission

      • Select APIs my organization uses tab

    • Permissions required

      • For scanning

        • Microsoft Graph > Application permissions > Sites > Sites.Read.All

        • Microsoft Graph > Application permissions > Directory > Directory.Read.All

    • Once all the required permissions are added, click "Grant admin consent"

    Configuring OneDrive connector in Dashboard

    • Navigate to Administration -> Data Sources -> OneDrive -> New scan

    • Provide the Directory (tenant) ID, Application (client) ID and Client Secret value generated in the above steps from the Azure application

    • Click on the Folder icon in Path to select a particular user's OneDrive to scan, or leave the path as empty to scan all users

    • Save the configuration

    • Once the configuration is saved, click on the icon on the right and select Start file scan to begin the scanning

    • The results can be viewed under Dashboard -> Enterprise Search

    Prerequisites

    Ensure the following prerequisites are met:

    1. Existing Confluence Cloud Instance: There needs to be an active Confluence Cloud instance.

    2. Enable Development Mode: Activate Development Mode on the Confluence Cloud site to be monitored. Refer to the official Confluence documentation.

    3. Deploy Proxy Container: Set up the Getvisibility container with a public proxy to allow integration with Confluence Cloud.

    Steps to Enable Data Streaming

    Step 1: Configure Confluence Cloud Data Streaming

    1. In the product UI, go to the Data Sources > Confluence Cloud page.

    2. Locate the existing Confluence Cloud scan configuration and select Edit Configuration.

    3. Within the Edit Confluence Cloud Configuration page, toggle Data Streaming to ON.

    4. Copy the Webhook URL provided, as it will be used later.

    5. Click Save & Close to apply changes.

    To enable data streaming, the confluence-cloud-streaming-proxy container will need to be deployed in the infrastructure e.g. using Docker or Kubernetes. This step involves configuring environment variables and setting up Docker for integration with Confluence Cloud.

    Step 2: Set Up confluence-cloud-streaming-proxy Application

    Deployment Instructions

    1. Download Docker image parts: Please download all files listed below:

    2. Merge Docker image parts:

    3. Load Docker image:

    4. Prepare a Docker Environment: Ensure that Docker is installed and configured on the infrastructure where the confluence-cloud-streaming-proxy application will be hosted. This will be the user environment.

    5. Set Environment Variables: Configure the following environment variables to allow the Confluence Cloud instance to communicate with the proxy application:

    • APP_LISTENER_PUBLIC_ACCESSIBLE_URL: the publicly accessible URL at which the app can be reached. It is used in communication between the Confluence Cloud webhook mechanism and the app, e.g. https://5977-88-156-142-22.ngrok-free.app

    • APP_WEBHOOK_URL: the Webhook URL taken from the Getvisibility UI Confluence Cloud connector configuration form, e.g. https://tenantabc.getvisibility.com/scan-manager/external/webhooks/notification/71ccab3d56980a2d9c766f42c86d36ffedc34258a0f226aaf56a628f06e9d89d

    6. Map Persistent Volume: Map a persistent volume to the /app/db/ directory within the container to ensure data retention across sessions.

    Example docker-compose.yml Configuration

    Use the following example to help set up the Docker configuration. Update the values as needed for the specific environment:

    Once configured, start the container by running docker-compose up -d or an equivalent command based on configured setup.

    Step 3: Expose the Application

    To expose the application publicly, consult with the relevant internal team, such as IT or DevOps. For testing, ngrok's free plan can be used to expose the app port as needed.

    1. Start the Application: Ensure the application runs before proceeding with the integration setup.
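
    As a quick illustration (assuming the proxy is published on port 8080, as in the example docker-compose.yml), a temporary test tunnel could be opened with:

    ngrok http 8080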

    Step 4: Install the Integration in Confluence Cloud

    To install the integration, follow these steps:

    1. Go to the Manage apps page in Confluence Cloud.

    2. Select Upload app.

    3. Paste the publicly accessible address into the form and press Upload.

    4. The application will install, and the integration will be ready in a few seconds.


    Uninstall integration from Confluence Cloud

    To uninstall the integration, follow these steps:

    1. Go to the Manage apps page in Confluence Cloud.

    2. Find Getvisibility Confluence Cloud Streaming Proxy and click Uninstall.

    3. Confirm by selecting Uninstall app.

    4. Delete any associated containers and settings from your organization’s infrastructure

    • confluence-cloud-streaming-proxy.tar.gz.partaa (90MB)
    • confluence-cloud-streaming-proxy.tar.gz.partab (46MB)
    • Select Create New App and then Custom App

    • Select Server Authentication (with JWT) and enter app name, then click Create App

    • In the Configuration tab, change App Access Level to App + Enterprise Access, then, enable Generate user access tokens and Make API calls using the as-user header.

    • Click on Save changes

    • Make sure the below Application Scopes are selected

      • Content Actions > Read all files and folders stored in Box

      • Content Actions > Write all files and folders stored in Box

      • Administrative Actions > Manage users

      • Administrative Actions > Manage groups

    • In the same Configuration tab, scroll down to Generate a Public/Private Keypair

    • This will result in a JSON file being downloaded by the browser

    • In the Authorization tab, click Review and Submit, add a description, and submit the app for review

    • Make note of User ID and Enterprise ID of the App in General Settings tab

    • Exit Dev Console and switch to the Admin Console

    • In Admin Console, go to Apps > Custom Apps Manager and locate the newly created app and click View button

    • Review the information and Authorize the app

    Configuring Box connector in Dashboard

    • Navigate to Administration -> Data Sources -> Box -> New scan

    • Provide the values generated in the above steps from the Box application

    • Click on the Folder icon in Path to select a particular folder to scan, or leave the path as empty to scan all folders

    • Save the configuration

    • Once the configuration is saved, click on the icon on the right and select Start trustee scan to begin the trustee scanning

    • The scan results can be viewed under Dashboard > Access Governance

    • Click on the icon on the right and select Start file scan to begin the files scanning

    • The results can be viewed under Dashboard > Enterprise Search

    File tagging

    Prerequisites

    • The Box Pricing Plans required for metadata writing are Business Plus, Enterprise, or Enterprise Plus. The basic Business plan does not include custom metadata and metadata templates.

    • A metadata template must be created to support Getvisibility's tags. Please follow the below steps to achieve this.

      • In the Admin Console, in the lefthand navigation click Content

      • Toward the top of the page, click Metadata

      • Click Create New

      • Click Name Your Template and enter the name getvisibility

      • Create a new attribute named Classification with the options: Public, General Business, Confidential, Highly-Confidential

    • Similarly, create two more attributes:

      • Distribution with the options: Internal, External

      • Compliance with the options: PCI, PII, PHI

    • Use the Status drop down to indicate this template is Visible

    • Click Save

    1. Get CA certificate from customer

    The customer is asked to provide the CA certificate that was used to sign the end-entity certificate of the LDAP server. We add this to Keycloak's trust store to make sure the LDAP server's certificate validates successfully. The provided CA certificate must be an X.509 v3 certificate in the ASCII PEM format (Base64-encoded). The file extension is usually .crt, .cer or .pem. Its content looks like this:

    Optional Checks

    1. The following command confirms that the cert is in the expected format and that it is, in fact, a CA certificate. Validity is not checked here.

    # openssl x509 -in /path/to/ca.crt -text -noout | grep CA

    1. The next command validates the LDAP server’s certificate against the provided CA certificate in the customer’s environment, where the LDAP server is accessible on port 636:

    # echo "q" | openssl s_client -connect dc.mycompany.com:636 -CAfile /path/to/ca.crt | grep -i verif

    2. Create JKS truststore file

    This command will create a JKS truststore file (ca.jks) and add the certificate (ca.crt) to it, protecting it with a password (STR0ngPA55).

    # keytool -importcert -file ca.crt -keystore ca.jks -alias rootca -storepass STR0ngPA55 -noprompt

    3. Encode truststore file with Base64

    # base64 -w 0 ca.jks

    4. Push trust store file to Keycloak

    Rancher

    Apps > Installed Apps > gv-essentials > Keycloak:

    • Leave the Keycloak URL settings field unchanged.

    • Check Enable Keycloak for LDAP over SSL.

    • Enter password for truststore.

    • Paste value from Step 3 into the Base64 .jks file field.

    • Click Upgrade in bottom right corner.

    CLI

    • Set KUBECONFIG environment variable: # export KUBECONFIG=/etc/rancher/k3s/k3s.yaml

    • Save the truststore file with base64 encoding to a variable: # export cert=$(base64 -w 0 ca.jks)

    • Print the variable and confirm it has the right value: # echo "$cert"

      MIIHggIBAzCCBywGCSqGSIb3DQEHAaCCBx0EggcZMIIHFTCCBxEGCSqGSIb3DQEHBqCCBwIwggb+A(...)

    • Upgrade or reinstall the gv-essentials chart, adding new values while reusing existing ones: # helm upgrade --install gv-essentials gv_stable/gv-essentials --wait --debug --timeout=10m0s --kubeconfig /etc/rancher/k3s/k3s.yaml --reuse-values --set keycloak.ldaps.enabled=true --set keycloak.ldaps.truststorecert="$cert" --set keycloak.ldaps.truststorepass="STR0ngPA55"

    5. Restart Keycloak

    This ensures the new truststore is loaded.

    Rancher

    Workloads > StatefulSets > gv-keycloak:

    • Choose the Redeploy option.

    CLI

    # kubectl rollout restart statefulset gv-keycloak

    6. Configure User federation in Keycloak correctly

    Enable StartTLS: OFF (default) - This must be turned OFF; communication will still be encrypted regardless, since LDAPS is used.

    Use Truststore SPI: Always (default) - This ensures that the imported certificate store is used to validate the LDAP server’s certificate.

    For further information, see here.

    7. Test connectivity, credentials, and user sync

    Use the Test connection and Test authentication buttons to make sure both connection and authentication to the LDAP server is successful.

    Ideally, both tests return success. (The exact same message is printed for both the connection and the authentication test.)

    8. Troubleshooting

    Below is an example of a message seen on the Keycloak side (web) and its counterpart in the logs of the Keycloak service (pods named gv-keycloak-…).

    1.

    2024-07-16 09:17:02,557 ERROR [org.keycloak.services] (executor-thread-6) KC-SERVICES0055: Error when authenticating to LDAP: Cannot invoke "org.keycloak.truststore.TruststoreProvider.getSSLSocketFactory()" because "provider" is null: java.lang.NullPointerException: Cannot invoke "org.keycloak.truststore.TruststoreProvider.getSSLSocketFactory()" because "provider" is null

    Keycloak is trying to read the trust store to validate a certificate, but the trust store has not been loaded; it could be missing from inside the Keycloak pod (/opt/keycloak/certs/rootCA.jks).

    • Trust store has not been installed via the GetVisibility Essentials Helm chart?

      • Redo step 4.

    • Keycloak has not been restarted after installing certificate?

      • Redo step 5.

  • Go to Administration > Data Sources

  • Click on a Data Source

  • Find the required Hamburger Menu

  • Click on Scan History

  • Either of the above options will show the history of scans performed on the relevant Data Source

    Azure Blob

    How to configure Azure Blob connection for scanning.

    Registering an Azure App

    • Login to Azure Portal

    • If there are multiple tenants to choose from, use the Settings icon in the top menu to switch, via the Directories + subscriptions menu, to the tenant in which the application needs to be registered.

    • Browse to App Registration and select New registration

    • On the App Registration page enter the below information and click the Register button

      • Name: (Enter a meaningful application name that will be displayed to users of the app)

      • Supported account types:

        • Select which accounts the application will support. The options should be similar to those below. Select “Accounts in this organizational directory only”:

    • Navigate to Manage -> Certificates and secrets on the left menu, to create a new client secret

    • Provide a meaningful description and expiry to the secret, and click on Add

    • Once a client secret is created, note its Value and store it somewhere safe. NOTE: this value cannot be viewed once you leave this page

    • Navigate to Manage -> API permissions on the left menu, and Add a permission

    • Select Microsoft APIs -> Microsoft Graph

    • Select Application permissions

    • Permissions required

      • Microsoft Graph > Application permissions > Device > Device.Read.All

      • Microsoft Graph > Application permissions > Directory > Directory.Read.All

      • Microsoft Graph > Application permissions > Group > Group.Read.All

    Azure Storage Subscription ID, Resource group and connection strings

    A connection string is needed for the storage account that is to be scanned.

    • Login to the Azure Portal

    • If there are multiple tenants to choose from, use the Settings icon in the top menu to switch, via the Directories + subscriptions menu, to the tenant in which the application needs to be registered

    • Browse to Storage accounts and select the account to be scanned

    • Once the storage account is selected, note the Resource group and Subscription ID values in the Overview page

    • Navigate to Security + networking -> Access keys on the left menu, and click on Show on the Connection string

    • Copy this Connection string value

    • Access Control (IAM) Role assignment - there are 2 options: one is to assign a built-in role, the other is to create and assign a custom role. Using a built-in role is the easier option to configure, while a custom role may be preferred to ensure least-privilege assignment for increased security.

      • Option 1: In the storage account, go to Access Control (IAM) and select either the Storage Blob Data Owner or Data Contributor role to assign it to the blob storage. (Per Microsoft's documentation, the Data Contributor role is the least-privileged built-in role for listing containers.)

    *** Firewall rules must also be in place to allow the DSPM server to connect to https://(mystorageaccount).blob.core.windows.net

    Configuring Azure Blob connector in Dashboard

    • Navigate to Administration -> Data Sources -> Azure Blob -> New scan

    • Provide the Connection string value obtained from above steps

    • Click on the Folder icon in Path to select a particular share to scan, or leave the path as empty to scan all shares

    • Save the configuration

    • Once the configuration is saved, click on the icon on the right and select Start file scan to begin the scanning

    • The results can be viewed under Dashboard -> Enterprise Search

    K3S installation

    We use Kubernetes, an open-source container orchestration system, to manage our applications.

    At the moment the only Kubernetes distribution supported is K3s (see the official K3s documentation) by Rancher, for both on-premise and cloud deployments.

    Why K3s?

    Kubernetes has been widely adopted in modern software development as it offers a powerful, portable and open-source platform that automates the management of containerized applications.

    When setting up a Kubernetes environment, it comes in two flavours: vanilla Kubernetes and managed Kubernetes. With vanilla Kubernetes, a software development team has to pull the Kubernetes source code binaries, follow the code path, and build the environment on the machine. On the other hand, managed Kubernetes comes pre-compiled and pre-configured with tools that improve features to enhance a certain focus area, such as storage, security, deployment, monitoring, etc. Managed Kubernetes versions are also known as Kubernetes distributions. Some popular Kubernetes distributions are Rancher, Red Hat OpenShift, Mirantis, VMware Tanzu, EKS, GKE and AKS.

    AI Mesh Overview

    Getvisibility offers cybersecurity AI products, specifically aimed at Data Security Posture Management (DSPM). In a broader sense, we also provide solutions for Data Governance. Our flagship product, DSPM+, is a sophisticated file classification pipeline. It seamlessly integrates with various data sources through a range of connectors including, but not limited to, Samba for network file servers, Windows file servers, Google Cloud, AWS, Dropbox, and SharePoint. The process involves downloading all the files from these sources, putting them through a pipeline that includes implementing our cutting-edge artificial intelligence technology to analyse the context of each file, and then classifying them under multiple criteria.

    The AI Mesh

    At the heart of this classification pipeline lies an artificial intelligence classification service designed to work on unstructured text. Once the text is extracted from files sourced through various connectors, it undergoes classification by diverse machine learning algorithms.

    Keycloak configuration

    How to complete the Keycloak installation setup.

    Introduction

    Keycloak is an Open-source product which allows Single Sign-On (SSO) and enables Identity and Access Management integration to allow for a quick, safe, and secure integration of authentication within modern applications.

    Below are the steps involved in configuring Keycloak, and you may choose to skip the Optional steps.


    K3s - Upgrade

    Prerequisites

    1. Access to all nodes of the cluster through one of the following methods:

      • Rancher

      • SSH protocol

      • AWS Session Manager

    SharePoint Online

    How to create a SharePoint Connector app to scan SharePoint Online (SPO) accounts.

    Registering an Azure App

    • Login to the Azure Portal

    • If there are multiple tenants to choose from, use the Settings icon in the top menu to switch, via the Directories + subscriptions menu, to the tenant in which the application needs to be registered

    Streaming

    The integration of Data Streaming and File Lineage into the DSPM platform provides a comprehensive solution for real-time data monitoring and tracking across both cloud and on-premises data sources. This enhancement enables organizations to dynamically track file origins, data transformations and movements, and end-usage in real time, strengthening security, compliance, and auditability. By introducing these functionalities, businesses can seamlessly monitor data activities and movements across various data sources, providing up-to-date visibility over data estate and offering deeper insights into file history for e-forensics use cases and risk mitigation.

    By implementing Streaming, we unlock crucial use cases such as File Lineage tracking, and Data Detection and Response capabilities, enabling real-time visibility into data activities. This also builds the foundation for anomaly detection capabilities, frequently requested by customers. For instance, scenarios like a user resetting their password, accessing confidential data, and downloading it can be quickly identified. By providing almost real-time updates and visibility into the data estate, businesses can seamlessly monitor data activities, mitigating risks and improving security.

    PRECONDITION:

    During cluster installation, network administrators need to add a firewall exclusion for incoming requests to the following path:

    where ${HOST_DOMAIN} is the host domain of the DSPM platform installation.
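
    # Merge the downloaded confluence-cloud-streaming-proxy image parts and load the image into Docker (steps 2 and 3 of the deployment instructions above):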

    cat confluence-cloud-streaming-proxy.tar.gz.part* > \
    confluence-cloud-streaming-proxy.tar.gz.joined
    docker load --input confluence-cloud-streaming-proxy.tar.gz.joined
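
    # Example docker-compose.yml for the confluence-cloud-streaming-proxy (placeholder values from the deployment steps above; adjust for your environment)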
    services:
      app:
        image: getvisibility/confluence-cloud-streaming-proxy:v0.3.2
        ports:
          - "8080:8080"
        environment:
          APP_LISTENER_PUBLIC_ACCESSIBLE_URL: https://5977-88-156-142-22.ngrok-free.app
          APP_WEBHOOK_URL: https://tenantabc.getvisibility.com/scan-manager/external/webhooks/notification/71ccab3d56980a2d9c766f42c86d36ffedc34258a0f226aaf56a628f06e9d89d
        volumes:
          - ./app-db/:/app/db/
    -----BEGIN CERTIFICATE-----
    MIIGBTCCA+2gAwIBAgIUaIGnTiJx27iBiIF+4jIkb7o5miswDQYJKoZIhvcNAQEL
    ...
    -----END CERTIFICATE-----
                        CA:TRUE
    depth=1 C = IE, ST = Ireland, L = Cork, O = MyCompany Ltd, CN = mycompany.com, emailAddress = [email protected]
    verify return:1
    depth=0 CN = mycompany.com
    verify return:1
    DONE
    Verification: OK
    Verify return code: 0 (ok)
    MIIHggIBAzCCBywGCSqGSIb3DQEHAaCCBx0EggcZMIIHFTCCBxEGCSqGSIb3DQEHBqCCBwIwggb+A(...)
    https://your-ngrok-url.app
    https://5977-88-156-142-22.ngrok-free.app
    https://tenantabc.getvisibility.com/...
    https://tenantabc.getvisibility.com/scan-manager/external/webhooks/notification/71ccab3d56980a2d9c766f42c86d36ffedc34258a0f226aaf56a628f06e9d89d

    The host domain needs to be publicly available on the web.

    Ensure that the certificate used is one that is trusted by the Data Source provider. For example with Microsoft services more information on the certificates that they accept can be found here.

    Multitenancy Setup

    For the multitenancy setup, we need to specify ${HOST_DOMAIN} as

    For Data Detection and Response (DDR) to function effectively, the callback endpoint URL must remain open and accessible beyond just the initial setup phase. DDR relies on real-time event notifications and data stream updates, continuously sent to the callback URL. If the callback endpoint is closed or restricted after setup, DDR will fail to receive critical updates, which may result in:

    • Delayed or missing alerts on data access, movement, or security threats.

    • Incomplete monitoring of file lineage and activities, impacting compliance and forensic investigations.

    To ensure uninterrupted functionality, organisations must configure their network to allow incoming requests to the callback URL from all necessary data sources.

    Additionally, for on-premise deployments, it is critical that the webhook URL is accessible by external resources to receive notifications. If external services cannot reach the callback URL, DDR will not function correctly, leading to missed event detections and security blind spots. Network administrators must ensure the necessary firewall rules and routing configurations are in place to allow external communication with the webhook.

    https://${HOST_DOMAIN}/scan-manager/external/webhooks/notification
    {{ .Values.clusterLabels.cluster_name }}.{{.Values.clusterLabels.rancher}}.app.getvisibility.com 

    Search for Microsoft Information Protection API

    • Select Application permissions > InformationProtectionPolicy.Read.All

    • For Azure Rights Management Services > Content.Writer

      • Navigate to Manage -> API permissions on the left menu, and Add a permission

      • Select Azure Rights Management Services tab

      • Select Application permissions

      • Select Content > Content.Writer

    Microsoft Graph > Application permissions > Files > Files.Read.All

  • Microsoft Graph > Application permissions > User > User.Read.All

  • For reading Sensitivity labels

    • Microsoft Graph > Application permissions > InformationProtectionPolicy > InformationProtectionPolicy.Read.All

    • APIs my organization uses > Microsoft Information Protection Sync Service > Application permissions > UnifiedPolicy.Tenant.Read

  • For revoke permissions

    • Microsoft Graph > Application permissions > Files > Files.ReadWrite.All

  • For tagging

    • Microsoft Graph > Application permissions > Sites > Sites.Manage.All

  • For MIP tagging

    • Azure Rights Management Services > Application permissions > Content.Writer

    • Microsoft Graph > Application permissions > Directory > Directory.Read.All

    • Microsoft Graph > Application permissions > Files > Files.ReadWrite.All

    • Microsoft Graph > Application permissions > Sites > Sites.Manage.All

    • Microsoft Graph > Application permissions > InformationProtectionPolicy > InformationProtectionPolicy.Read.All

    • APIs my organization uses > Microsoft Information Protection API > Application permissions > InformationProtectionPolicy.Read.All

  • Leave the Redirect URI as empty and Click Register

  • Note the Application (client) ID, Directory (tenant) ID values

  • Microsoft Graph > Application permissions > User > User.Read.All

  • Once all the required permissions are added, click "Grant admin consent"

  • We also need to assign Reader role to the azure app created in the first step

    • Save the changes.

    • Option 2: This option creates a custom role and assigns the same permissions as the Data Contributor role, except for the delete permissions. In the Blob storage account, go to Access Control (IAM) and click Add to create a new role. Name the role with a preferred name, and choose the following actions below to assign to this custom role. Select this custom role for the blob and save changes.

    • We also need to assign Reader role to the azure app created in the first step

  • Real Time Events Monitoring (Streaming) Permissions: To enable "Real Time Events Monitoring (Streaming)", the following additional Azure permission roles are required:

    • EventGrid Data Contributor

    • EventGrid EventSubscription Contributor

    • EventGrid TopicSpaces Publisher

    Assign these roles using Access Control (IAM) in the Blob storage account, similar to the steps mentioned above for assigning the Storage Blob Data Owner or Data Contributor role; an Azure CLI sketch is provided after this list.

  • Next, in the Networking tab, under Public network access, select "Enabled from all networks" or "Enabled from select virtual networks and IP addresses". If the latter is chosen, then under the Firewall section add the IP address range of the DSPM server.

  • Enable "Allow trusted Microsoft services to access this storage account" and Save the changes.

    https://(mystorageaccount).blob.core.windows.net

    Kubernetes distributions can have different components that may cause applications that work in one distribution to not necessarily work or even crash into another. Some of the most important components that differ between distributions are:

    • Container Runtime: The container runtime is the software responsible for running containers. Each Kubernetes distribution may offer support for different container runtimes. Some popular container runtimes include containerd, Docker, CRI-O, CoreOS rkt, Canonical LXC and frakti, among others.

    • Storage: Storage is important for Kubernetes applications as it offers a way to persist their data. Kubernetes' Container Storage Interface (CSI) allows third-party vendors to easily create storage solutions for containerized applications. Some Kubernetes distributions build their own storage solutions while others integrate with existing third-party solutions. Popular storage solutions for Kubernetes include Amazon Elastic Block Store (EBS), GlusterFS, Portworx, Rook and OpenEBS, among others.

    • Networking: Kubernetes applications are typically broken down into container-based microservices which are hosted in different pods, running on different machines. Networking implementations allow for the seamless communication and interaction between different containerized components. Networking in Kubernetes is a herculean task, and each distribution may rely on a networking solution to facilitate communication between pods, services and the internet. Popular networking implementations include Flannel, Weave Net, Calico and Canal, among others.

    In order to offer our customers a better and more seamless experience while configuring, running, upgrading and troubleshooting our products while also avoiding compatibility issues between different distributions we decided to officially support ONLY ONE Kubernetes distribution: K3s. The main reasons for choosing K3s are:

    1. Costs — K3s is 100% open source and there’s no need to pay for any expensive licenses.

    2. Less setup overhead — a lot of time is saved when setting up a new environment because you don’t need to go through a lengthy process of acquiring extra licenses based on how many CPU cores you have. Also, K3s can be installed using only one command.

    3. It supports many Linux distros — K3s supports popular Linux distributions including open source ones, it can also run both on-premise and in the cloud (AWS, Azure, GCP).

    4. It’s fast and lightweight — K3s is packaged as a single <100MB binary and its lightweight architecture makes it faster than stock Kubernetes for the workloads that it runs.

    5. Easy to update — Thanks to its reduced dependencies.

    6. Batteries included — CRI, CNI, service load balancer, and ingress controller are included.

    7. Smaller attack surface — Thanks to its small size and reduced amount of dependencies.

    8. Certified — K3s is an official project that delivers a powerful certified Kubernetes distribution.

    9. Flexible — you can run K3s using single-node or multi-node cluster setup.

    K3s and Antivirus

    Antivirus software adds an unknown variable to the existing complexity of Kubernetes. Most antivirus products have not yet kept up with newer technologies such as Kubernetes and have not reached a CNCF-certified status. In environments where antivirus software has been enabled, we have seen issues stemming from interfering actions of such software. As an example, there have been incidents where the antivirus software incorrectly pruned files in the containerd filesystem, corrupting the containerd mounts and causing data loss.

    Issues resulting from third-party tools, such as antivirus and intrusion detection software, interfering with containerd or other necessary system calls are deemed resolved if disabling such tools restores functionality.

    Also, having antivirus software running on the nodes can result in poor performance. Some antivirus software may continue to impact performance even if all paths our products rely on are excluded. While scanning and inspecting files, disk I/O and CPU usage can increase, slowing our product's performance. In some cases, this reduction in performance can be dramatic and can even render our products impossible to use.

    Lastly, where a customer environment has deviated from our supported configurations, we always reserve the right to recommend that the customer revert to a supported configuration to resolve the reported issue.

    Secure mode K3s

    If the cluster was deployed before January 2023, it will most probably come across this issue at some point.

    If the cluster runs a pod that uses a public image that hasn't yet been copied to our public repo, an ErrImagePull error with the message Failed to pull image will arise.

    To address this issue, the following registries configuration needs to be added (the standard location is /etc/rancher/k3s/registries.yaml on each node):

    mirrors:
      "*":
        endpoint:
          - "https://images.master.k3s.getvisibility.com"
        rewrite:
          "(.*)": "gv-public/$1"
      "images.master.k3s.getvisibility.com":
        endpoint:
          - "https://images.master.k3s.getvisibility.com"

    This file instructs K3s to rewrite all images (except images.master.k3s.getvisibility.com) and redirect them to Getvisibility’s public repository.
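
    A sketch of applying the change on a node is shown below; the file path is the standard K3s registries configuration location, and on worker nodes the service name is k3s-agent instead of k3s.

    sudo mkdir -p /etc/rancher/k3s
    sudo vi /etc/rancher/k3s/registries.yaml    # add the mirrors configuration shown above
    sudo systemctl restart k3s                  # on worker nodes: sudo systemctl restart k3s-agent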

    Requirements

    The minimum requirement for the Kubernetes cluster is a single node (1 virtual machine) with the following specs:

    Spec         EDC / Synergy            DDS / DSPM               Ultimate
    CPU cores    8                        16                       20
    Memory       32GB                     64GB                     80GB
    Storage      500GB, min 32M inodes    600GB, min 39M inodes

    Please also refer to Estimate hardware capacity needs.


    Installation

    Ensure the following items are in place and configured:

    • Domain Name Service (DNS) with public name resolution enabled

    • Network Time Protocol (NTP)

    • Software Update Service - access to a network-based repository for software update packages.

    • Fixed private IPv4 address

    • Unique static hostname

    For details on how to configure Rancher behind a proxy, refer to the official Rancher documentation and ensure the network settings mentioned above are applied.

    If using a proxy, please run the following before starting the k3s.sh installation:
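
    For example, the standard proxy environment variables can be exported first; the proxy address below is a placeholder and the NO_PROXY list may need to be adapted to your network.

    export HTTP_PROXY=http://proxy.example.com:8080
    export HTTPS_PROXY=http://proxy.example.com:8080
    export NO_PROXY=localhost,127.0.0.1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local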

    When running the k3s.sh script you need to provide the product name in the form of a PRODUCT_NAME argument. This instructs the installer to test your current environment against the product's requirements, which differ between products.

    Allowed product names are:

    • synergy

    • focus

    • dspm

    • enterprise

    • ultimate


    Capitalization of the name is important. If you provide a name that is not recognized, or you don't provide a product name at all, the script will default to PRODUCT_NAME="dspm".

    For example, PRODUCT_NAME=enterprise runs a regular K3s installation for the Getvisibility Enterprise product.

    If you only want to check whether your environment meets the requirements, use ONLY_PRECHECK=true.

    SKIP_SYSTEM_CHECKS=true lets you skip checking how the installed memory size, number of CPU cores and storage compare against the product requirements.

    If you want to skip both hardware and connectivity checks, use SKIP_PRECHECK=true.

    Be cautious when skipping the built-in checks - we built them so that the product achieves optimal performance with minimal required maintenance. Example invocations are shown after the table below.

    Argument                    Description
    SKIP_PRECHECK=true          Skip all built-in checks
    SKIP_SYSTEM_CHECKS=true     Skip hardware checks
    SKIP_NETWORK_CHECKS=true    Skip connectivity checks
    ONLY_PRECHECK=true          Run the precheck only and stop after that
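
    For illustration, the arguments can be combined as in the sketch below; this assumes the installer script is invoked as k3s.sh with the arguments passed as environment variables, and the exact command for your deployment is the one provided in your Welcome Email.

    # Run the precheck only for the DSPM product, without installing anything:
    PRODUCT_NAME=dspm ONLY_PRECHECK=true bash k3s.sh

    # Install the Enterprise product, skipping only the hardware checks:
    PRODUCT_NAME=enterprise SKIP_SYSTEM_CHECKS=true bash k3s.sh

    # Install the DSPM product, skipping all built-in checks (not recommended):
    PRODUCT_NAME=dspm SKIP_PRECHECK=true bash k3s.sh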

    Depending on the combination of arguments provided, the installer will perform a set of actions before returning to the command line.

    Provided not all checks have been skipped, the installer may abort the installation process. If that happens, please review the output, paying special attention to any WARN messages. Should you have any concerns or questions, please contact Support with the result screen attached.

    An example output of installer showing some tests failed
    A successful installation will result in a screen containing no warnings or errors

    Run the kubectl registration command:

    The command below is just an example; it will not work for an actual deployment. For direct customers, the Customer Support Team will provide the registration command; otherwise you should have received the registration command in the Welcome Email.

    For security reasons the registration command can be used only a single time, the command becomes invalid after the first use. In case you need to run it again you must contact the support team for a new registration command.

    Monitor the progress of the installation:

    • The K3s deployment is complete when elements of all the deployments (coredns, local-path-provisioner, metrics-server, traefik and cattle-cluster-agent) show at least "1" as "AVAILABLE"

    • In case of errors you can inspect the logs of a pod using kubectl logs, as shown in the sketch below.
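
    A sketch of the commands typically used for this; the namespace and pod name in the logs command are placeholders.

    # Watch the deployments until they show at least "1" as AVAILABLE:
    kubectl get deployments --all-namespaces

    # List pods and inspect the logs of a failing one (namespace and pod name are placeholders):
    kubectl get pods --all-namespaces
    kubectl logs -n cattle-system <pod-name>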


    K3s support matrix

    Please note that we don’t use Docker as the container runtime, instead we use containerd.


    Network settings

    Your network should be configured so that the following public URLs are accessible over port 443 (HTTPS) and HTTPS traffic to them is bypassed (NOT intercepted):

    For more details on how to configure Rancher behind a proxy, see Configuring Rancher and Fleet agent to run behind an HTTP proxy.

    Rancher might try to reach git.rancher.io since it is a default hard-coded repository, but we have our own private repo with all our charts, so it is OK to block it; we cannot disable it.

    K3s support matrix: EN-2.6.5SupportMatrix-300422-0116-26_.pdf
    The classification process utilizes an AI mesh - a network composed of different AI components.
    Representative Image of the Getvisibility AI Mesh

    The typical mesh deployment is heterogeneous and contains the following types of nodes:

    • LLM-like miniature language models (10-30 million parameters) that transform text into salient document vectors;

    • deep neural network classifiers for sentiment analysis (under 100,000 parameters) that use the document vectors to produce classification outcomes;

    • bag-of-words models for topic detection;

    • filters based on regular expressions or fuzzy text searches;

    • other types of evaluators (e.g. text complexity), implemented as Python code segments;

    • mappings that combine the outputs of multiple input models into output signals.

    Size of models

    Our deployments are 10 times smaller than even the smallest and most efficient Large Language Model (LLM)-based classifier deployments. This scale allows us to classify a file within 200 milliseconds, relying solely on a normal CPU without the need for specific GPU deployment. Additionally, given that our models are 10,000 times smaller than typical large AI deployments, we are not subject to regulations that apply to large AI deployments, such as the EU AI Act.

    Output signals

    This network typically generates a multitude of classification outcomes, or signals. Each classification decision is generally binary—true or false—indicating whether the text viewed by the AI mesh is related to a specific signal. Furthermore, each outcome is accompanied by a confidence value, which is a number between zero and one. In rare instances, constituting less than 5% of the cases, the mesh outputs a categorical signal. Unlike the binary true/false, it classifies the text into one of three, four, or possibly even five mutually exclusive categories.

    Accuracy SLA

    The Service Level Agreement (SLA) for the accuracy of the ML components used in the AI mesh stipulates no less than 80% accuracy on a balanced dataset—comprising 50% positive examples and 50% negative examples. This accuracy rate is measured on an out-of-sample basis, meaning the data used for this accuracy assessment is not employed in training the machine learning model. This approach provides insights into the model's ability to generalize.

    Interpreting confidence

    The confidence level associated with each classification outcome in machine learning models, whether binary or categorical, varies between zero and one and indicates the certainty of the prediction. A confidence of 0 suggests that the classifier views the prediction as no better than a random guess, implying a 50% probability of accuracy. On the other hand, a confidence of 1 indicates maximum certainty, meaning the input data closely matches the training data for the given classification. Confidence levels between 0 and 1 are linear and uniformly distributed based on the training data, with a confidence of 0.5 representing a median level of certainty.

    The AI mesh as a Bayesian Network

    The AI Mesh functions as a Bayesian network, where results are propagated forward. This process involves using machine learning models, such as a Continuous Bag Of Words (CBOW) model and various filters, to determine whether a file is confidential. Both outcomes (true and false) are considered with their respective probabilities, which are then propagated forward to influence the confidence score. Users utilizing this confidence score will take into account its value, leading to situations where a strong classification signal might be overshadowed by other signals if, collectively, they provide stronger evidence. In Bayesian networks, this sampling technique is known as forward sampling or ancestral sampling. The AI mesh employs a highly efficient implementation of this technique by constraining the distributions of the internal nodes to either categorical or binary distributions.

    Typical usage

    The typical token window analysed is 512 tokens, which corresponds to roughly a page of text. For larger texts, the results from multiple passes are integrated with the mesh. For shorter texts, the mesh composition can be adjusted to accommodate.

    For example, in order to determine if a document is confidential, in a rudimentary setting, a machine learning model that works on document vectors is involved. This model performs sentiment analysis on the original document to understand if it sounds confidential. Additionally, a simpler model searches for words like "confidential" or words similar to "confidential" syntactically as part of topic detection. There are filters and detectors designed to pick up specific keywords, such as the word "confidential" itself, which may be stamped by another application, included as part of a watermark, or in the context of certification and compliance policies. Finally, a Bayesian network of all these models is used to infer the outcome and associated confidence level.

    Bayesian network using 3 signals to determine an outcome. An AND between weak true and weak false favors false, because it passes forward the probability that both signals are true (which is low). An OR between weak false signals is undecided, because it passes forward the probability that either signal is true (which is higher). In this case, the filter is considered fuzzy.

    Classification Pipeline Diagram

    We list below the functional diagram of the classification pipeline around the AI Mesh.

    Structure of the AI Mesh

    The AI mesh features a stereotypical structure designed to facilitate easy reasoning and training for individuals involved in proofing, training, and selling the mesh. Since the mesh is a directed acyclic graph, it allows for the definition of inputs, intermediary nodes, and outputs.

    Network inputs

    The inputs or entry points take in raw information about the file, which is then analysed and produces some sort of signal. This signal is interpreted by other nodes in the mesh. Inputs include various forms of transforming the input text into document vectors or word vectors, elements collecting statistical information about the input text, or processing it for other types of statistical information collectors. Additionally, filters provide a signal indicating whether certain keywords or patterns of keywords are present in the input text.

    In an effort to streamline the deployment of the AI mesh and make it more user-friendly, there is an emphasis on reducing the number of filters that are directly relevant to the AI mesh. For example, when detecting banking information, a straightforward approach might involve creating detectors for words like "bank" or "account." However, such words' relevance to declassification can vary significantly between use-cases, making it challenging to establish a universally understandable policy for managing these detectors to meet expectations.

    To overcome this challenge, information is organized within the network using CBOW models. This allows for ongoing tweaking of signal sensitivity based on user feedback. The strategy also involves restricting filters to use case-specific information. For instance, to identify confidential information on a specific premises, CBOW models are deployed to detect text indicating confidentiality or secrecy. Machine learning models assess the likelihood of text containing trade secrets or intellectual property. Users are encouraged to input filters relevant to the confidential signal, using specific keywords related to their technology, such as internal product names, codewords, or internal product IDs, which would not be known externally.

    Intermediary nodes

    Intermediary nodes function by utilizing information provided by the inputs or other intermediary nodes, yet they are not visible in the user interface (UI). This can be attributed either to the irrelevance of the information processed by these nodes to the user—such as computation of reading ease scores and document complexity, which could clutter the user's view—or to the inaccuracy of intermediary signals. Efforts are made to furnish a more accurate signal by combining various intermediary signals.

    Examples of intermediary nodes comprise machine learning classifiers that employ document level vectors to determine if the text aligns with a certain type of signal, CBOW classifiers that ascertain whether a specific topic is being discussed in the document, and Bayesian mappings that integrate several signals into a conclusive output signal.

    The Standard Mapping

    Intermediary or output mappings often exhibit a stereotypical structure where multiple input signals are consolidated to create a more robust and accurate output signal. For instance, to determine whether a file is an HR document, input signals might include a machine learning model that assesses whether the file reads as an HR document, a CBOW model that detects topics relevant to the HR sector present in the file, and several filters searching for HR-specific terminology. While there are numerous methods to combine these signals into an output signal, a standardized approach, referred to as the "standard mapping," is typically employed to ensure consistency and efficiency in the process.

    The standard mapping process outputs a true or false value based on inputs from three types of true/false signals, which can either be filters or machine learning models.

    1. Hard Signals: These are decisive signals that set the standard mapping to true whenever any one of them is true, regardless of the status of other signals. For instance, the detection of a highly specific and unique identifier like a Social Security number in certain contexts immediately indicates the presence of private identifiable information, irrespective of other detectors' output.

    2. Soft Signals: These signals set the standard mapping to true only if one of them is true and is also supported by other true signals. This is used in cases where broad criteria need further verification. For example, detecting the word "account" may flag a text potentially as financial information. However, it requires additional corroborative evidence from other sources or models to be classified definitively as financial information.

    3. Supporting Evidence: These signals influence the standard mapping's truth value either if all are true with high confidence, providing strong evidence that the mapping should be true, or if they are true with low confidence but a soft signal is also true. This layered approach ensures a nuanced decision-making process that accounts for evidence strength and relevance.

    This structured approach to output mapping ensures accurate and reliable determinations based on the nature and strength of the input signals. This approach is outlined in pseudocode below:

    Output nodes

    Output nodes utilize information from intermediary nodes to generate signals that are directly presented in the user interface (UI). These signals encompass:

    1. Data Attributes: Important for characterizing the data or data asset attributes, such as whether the data is financial, HR-related, etc.

    2. Compliance Labels: These labels indicate whether the data may be subject to specific compliance regulations, including PII (Personal Identifiable Information), PHI (Protected Health Information), etc.

    3. Classifications: Define the kind of actionable results that should be derived after classifying the file, providing a clear directive for subsequent actions.

    Notably, especially in the context of compliance and classification, these output nodes can also be used to stamp information directly onto the file. This ensures that important data about compliance and classification is visibly and immediately associated with the file, facilitating easy access to this critical information through the UI.

    The typical classification system categorizes the level of sensitivity of a file. This can range from a binary flag indicating whether the file is sensitive or not, to a more nuanced classification with three to five labels, such as:

    • Public

    • Internal

    • Confidential

    • Highly Confidential

    • Secret/Top Secret

    However, it is recommended to avoid using more than four or five mutually exclusive outcomes for classifying a file. This is because having too many categories can complicate implementation on the customer's side and pose challenges in verifying the accuracy of the classifier. Simplifying the classification spectrum helps both in ease of use and ensuring a more straightforward validation of classification results.

    Visualizing the Mesh

    We offer below the visualization of a large AI mesh (80 nodes) with input nodes at the bottom and output nodes at the top.

    Nodes are colour coded as follows:

    • yellow - document and word vectors

    • blue - ML classifiers

    • green - light ML (CBOW) classifiers

    • red - Python / engineered signals

    • black - forward mappings

    Notice how few input filters are entangled with knowledge collected by ML models, and how the classification output node (top) integrates information from all these nodes.

    Multilingual Support

    The AI mesh is designed to be multilingual, catering to the requirements of machine learning models that depend on word vectors or document factors derived from unstructured text. The strategy to achieve multilingual capability involves generating the same document or word vectors for the same text translated into multiple languages (language-agnostic representations). This approach compresses the text into sentence or document vectors, and the language model itself has a certain capability to translate between the languages it supports.

    For the sake of classification speed and accuracy, the deployment is typically restricted to bilingual models, where one of the languages is English and the other could be Arabic, French, or any other language. Although the solution has been tested with up to 12 different languages, in practice, a more focused bilingual approach is preferred.

    For other types of nodes within the mesh, such as filter nodes or complexity detectors, adequate adjustments are necessary to account for language-specific differences. This ensures that the AI mesh can efficiently and accurately process information across different languages, maintaining its effectiveness and utility in multilingual environments.

    Explainable AI

    The design of the AI mesh carefully balances exposing a reasonable number of signals and accurately characterizing a block of text of a certain size. Limiting the number of relevant signals to no more than 100 is very important for maintaining the explainability of the mesh in relation to the analysed content. This approach ensures that users can understand how and why certain analytical outcomes were reached without being overwhelmed by too much information.

    When the AI mesh produces a classification outcome, we also store to the database the prerequisites for that outcome within the mesh. This includes which models contributed, in what way, and the confidence scores that contributed to the ancestral sampling of that classification outcome with a specific confidence score. This rich signal provides substantial information about the unstructured text that the mesh processes.

    These prerequisite signals are essential for explaining the classification outcome that the user observes. Explanations can be provided on a per-file basis by examining the outputs of intermediary nodes in the mesh or on a population basis by identifying which factors lead to particular decisions for specific file populations. Natural language synthesis can be employed to translate these intermediary outcomes into understandable natural language, further enhancing the explainability of the mesh's analytical processes.

    Example explainable AI pipeline. Information produced by the mesh is mixed with keywords and summaries to produce a natural language explanation of why the file was classified in a certain way.

    Using the AI Mesh

    The target quality for the user experience with the AI mesh aims to mirror the Service Level Agreement (SLA) for the ML classifiers, where around 80% of the predictions are expected to be perceived as accurate by the user. Adjustments to the mesh will be made if the user's perception significantly deviates from this standard. Specifically, for any given file analysed by the mesh, approximately 8 out of 10 data attributes collected should be correct or flagged with low confidence. Similarly, for any specific data attribute, about 8 out of 10 files should yield a correct prediction or a prediction marked with low confidence.

    After a file is evaluated, the per-file outcomes from the classification network within the AI mesh are stored in a database, making them accessible to GQL enabled filters and reports. This approach leverages the rich signal derived from the unstructured content to generate a wide array of actionable reports. Moreover, the classification pipeline incorporates Active Directory information about who has access to the files. This integration is important for assessing the risk associated with highly confidential files being accessed by trustees, as part of the DSPM+ suite.

    Summary: Advantages of the AI Mesh

    Characterization of data (static or in-flight) with an AI mesh of narrow models has a series of advantages compared to using Large Language Model (LLM) AI technology.

    Speed and Compute Efficiency

    The overall compute required to run the AI mesh is 100x-1000x less than that of a classification LLM with similar accuracy. Because of this, it can be successfully productized without requiring specialized hardware such as GPUs.

    Low Cost of Maintenance

    Owing to the way the AI mesh is constructed, tweaking it towards providing expected outcomes for different use cases entails modifying a small number of nodes, which lowers the cost of adapting the mesh to expectations.

    Explainability

    Since the mesh relies on specialized detectors which are associated with intuitive concepts, it can be used natively to build robust explanations regarding the classification outcomes, with or without language synthesis by LLM.

    Trust

    The mesh uses narrow AI classifiers which are trained on synthetic datasets which are small (1-10M tokens) compared to LLM corpora (trillions of tokens). These datasets are available for review and audit, and can be used to completely characterize the behaviour of the AI system, and to ascertain its regulatory liability.

    Extensibility

    The layout of the mesh natively allows integration with any sources or 3rd party signals via its mapping mechanism.

    Logging into Keycloak admin panel

    The Keycloak admin URL will consist of the following components:

    • The domain that has been configured for the reseller to access the application (e.g. my-reseller.net or 10.10.121.127)

    • The service path (e.g. auth for Keycloak)

    • The keycloak admin path /admin/master/console

    An example of the above might look something like this: https://my-reseller.net/auth/admin/master/console

    Once the correct address has been entered for the cluster Keycloak instance following the above guidelines, it should be possible to login to the Keycloak admin dashboard using the following details:

    This is the default username and password for the initial login to Keycloak. Please ensure that it is changed!

    Username: admin

    Password: admin

    The access protocol should always be https

    The domain in the example above (E.g. my-reseller.net) might not be applicable if a domain is not configured, in which case the server IP address needs to be used (e.g. 10.10.121.127)

    Once logged into the portal, there are some additional steps to complete in order to configure Keycloak.


    Completing the Realm Configuration

    In Keycloak, a Realm is a top level authentication domain which contains an isolated authentication configuration. For example, each separate Keycloak Realm might represent a different environment.

    A Realm needs to be created to manage the cluster authentication:

    1. Click on the left-side menu item Realm Settings. Make sure that the gv realm is selected in the top left, not master.

    Keycloak main screen.
    2. This will load the Gv Realm Settings → General tab; enter the desired user-friendly reseller name into both the Display name and HTML Display name fields.

    Keycloak Realm Settings.
    3. Click the Save button to commit these changes to the Realm Settings.

    Do not change the content of Realm ID field, it has to be gv.


    Completing the Dashboard Client Configuration

    1. Click on the Clients menu item on the left-side menu; this should load a list of authentication clients.

    Keycloak clients dashboard.
    2. Click on the name link of the item labeled dashboard to navigate to its client configuration page.

    3. Open the dropdown for Login Theme and select the theme created for the reseller (e.g. my-reseller-theme).

    4. Update the Valid Redirect URIs to include the URL that has been configured for the Dashboard UI (remember to click the + plus icon after entering the value). This will allow Keycloak to redirect back to the Dashboard UI after authenticating.

    5. Update the Web Origins to include the URL that has been configured for the Dashboard UI (remember to click the + plus icon after entering the value). This will allow CORS endpoint calls to Keycloak from the Dashboard UI.

    6. Clear the Front-channel logout URL field's content. This way, instead of the "you are getting logged out" screen, it will go straight to the login page upon logout. Alternatively, you can enter the Front-channel logout URL in the following format: https://my-dashboard.com/auth/realms/gv/protocol/openid-connect/logout.

    7. Click the Save button at the bottom of the screen.

    Required for Synergy Settings

    Setting up a default Agent user

    This step is important and required for the agent to work correctly. This user is only used internally by agents on endpoints to authenticate with the server. This user cannot be used to log in to the dashboard. For dashboard login, you must create your user in the gv realm.

    1. Make sure it's still the gv realm selected in the top left, not master.

    2. Click on the Users menu item on the left-side menu; this should load the Users list.

    3. Click the Add user button in the top right to open the Add user screen.

    4. It's only necessary to complete two fields on this form: the Username field should contain agent, and the Email field should contain [email protected].

    5. Click the Save button at the bottom of the screen.

    Optional Settings

    Completing the Agent Client Configuration

    This step is optional if Synergy is being used and not required if Focus is being used.

    1. Click on the Clients menu item on the left-side menu, this should load a list of authentication clients.

    2. Click on Edit (or click on the name link) on the item labelled agent in order to load the client.

    3. Update the Valid Redirect URIs value (default is https://localhost:80) to a secure address that you know is not vulnerable or exposed. This is a required field and requires at least one value, so while we have set it to a temporary value, it's encouraged to change this to something internal.

    4. Click the Save button at the bottom of the screen.

    Completing the User Federation Configuration

    The authentication protocol that the customer decides to use is different per use case. Below is some guidance on how to configure a User Federation in Keycloak.

    Configuring the User Federation

    1. Click on the User Federation menu item on the left-side menu; this should load a list of configured user federations.

    2. Click on Edit (or click on the name link) on the item labelled ldap in order to load the LDAP (Lightweight Directory Access Protocol) configuration.

    3. Update the Connection URL field to reflect the LDAP server address where the Active Directory is hosted.

    4. Update the Users DN field (see the above image) to contain the Full DN of the LDAP tree where your users are.

    5. Click on the button Test connection to test the connection from the Keycloak instance to the LDAP server address. This should succeed quickly; if it hangs, there is a possibility that the LDAP server is not allowing access from the Keycloak instance server address, or you may need to use the Public IP address of the LDAP server.

    6. Update the Bind DN field to reflect the relevant username used to access the LDAP server.

    7. Update the Bind Credential field (see the above image) to contain the relevant password used to access the LDAP server.

    8. (Optional) Click on the Accordion option Sync Settings in order to set up automatic synchronization of users from the LDAP Active Directory to Keycloak. It is also possible to configure the auto-synchronization settings here.

    9. Click the Save button at the bottom of the screen.

    Synchronizing the Users to Keycloak DB

    In order to get the users into the Keycloak DB, the users need to be synchronized for the first time (before the automatic synchronization happens, if applicable).

    This is one simple step:

    1. Click the button Synchronize all users in order to immediately fetch all of the LDAP Active Directory users and load them into the Keycloak instance DB.

    Synchronizing all users may take some time.

    Creating a user to access the GetVisibility dashboard

    By default, there are no users in the gv realm, meaning that nobody can access the dashboard to view agent activity, use analytics, run scans or create reports.

    Users must either be created manually as described below, or imported, e.g. via LDAP user federation.

    Users created in the gv realm will have full administrative access to the GetVisibility web console.

    RBAC implementation for granular management of dashboard user permissions is on our roadmap.

    1. Make sure that it's still the gv realm selected in the top left, not master:

    2. Click on the Users menu item on the left-side menu; this should load the (empty) Users list.

    3. Click the Add user button at the top to open the Add user screen.

    4. There is only one mandatory field here: the Username field should contain your desired username, e.g. admin.

    5. Click Create. This will then load the User Details page for the user that was just created.

    6. Here, click Set password.

    7. Next, choose a strong password for the user. Leave the "Temporary" option on if the user should change their password on the first login.

    8. Click Save.

    9. Navigate to the /ui endpoint of the IP of the server or the domain if you configured any, e.g. https://my-dashboard.com/ui or https://10.10.121.127/ui

    GetVisibility login page.

    Confirm that the credentials are working as expected.


    Troubleshooting Keycloak LDAP integration

    Usually any issues which occur during the LDAP Active Directory configuration process above will be related to Network accessibility concerns or authentication credentials being incorrect.

    However, if any additional assistance is required or the problem is not easily resolved by troubleshooting Network communications and authentication details, please reach out to Support following the steps here.

  • The K3s version tag you wish to upgrade to: https://github.com/k3s-io/k3s/releases
  • The system-upgrade-controller file that will be used to upgrade the K3s cluster: https://assets.master.k3s.getvisibility.com/system-upgrade-controller/v0.10.0/system-upgrade-controller.yaml

  • The Bundle file for the K3s upgrade in the Air-Gap Environment

  • Make sure you push all new docker images to the ECR gv-public docker registry that you need to install the new k3s version.


  • Focus/Synergy services

    Updates and custom settings are automatically applied to all backend services using Fleet as long as the cluster has access to the public internet and can connect to the management server.

    In case there’s no internet connection or the management server is down, the cluster agent will keep trying to reach the management server until a connection can be established.

    Upgrading K3s to 1.24

    1. Log in to Rancher or one of the master nodes of the cluster to use the kubectl CLI.

    2. List the node name and the K3s version (a command sketch for these steps is provided after this list).

    3. Add the label k3s-upgrade=true to the nodes. Note: in the case of a multi-node cluster, each node must be labelled as above.

    4. Deploy the system-upgrade-controller.

    5. Create the upgrade-plan.yaml file. Note: the key version holds the version of K3s that the cluster will be upgraded to.

    6. Run the upgrade plan. The upgrade controller should watch for this plan and execute the upgrade on the labelled nodes.

    7. Once the plan is executed, all pods will restart and will take a few minutes to recover. Check the status of all the pods.

    8. Check if the K3s version has been upgraded.

    9. Delete the system-upgrade-controller.
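
    The sketch below illustrates steps 2 to 9 with kubectl. The plan manifest follows the upstream system-upgrade-controller format as a simplified single plan, and the version shown is only an example; replace it with the K3s release tag you chose.

    # Step 2: list nodes and their current K3s version
    kubectl get nodes -o wide

    # Step 3: label every node that should be upgraded
    kubectl label node <node-name> k3s-upgrade=true

    # Step 4: deploy the system-upgrade-controller
    kubectl apply -f https://assets.master.k3s.getvisibility.com/system-upgrade-controller/v0.10.0/system-upgrade-controller.yaml

    # Steps 5-6: create and apply the upgrade plan (version is an example - use your chosen release tag)
    cat <<'EOF' > upgrade-plan.yaml
    apiVersion: upgrade.cattle.io/v1
    kind: Plan
    metadata:
      name: k3s-upgrade
      namespace: system-upgrade
    spec:
      concurrency: 1
      cordon: true
      nodeSelector:
        matchExpressions:
          - {key: k3s-upgrade, operator: Exists}
      serviceAccountName: system-upgrade
      upgrade:
        image: rancher/k3s-upgrade
      version: v1.24.17+k3s1
    EOF
    kubectl apply -f upgrade-plan.yaml

    # Steps 7-8: wait for pods to recover, then confirm the new version
    kubectl get pods --all-namespaces
    kubectl get nodes -o wide

    # Step 9: remove the controller once the upgrade is complete
    kubectl delete -f https://assets.master.k3s.getvisibility.com/system-upgrade-controller/v0.10.0/system-upgrade-controller.yaml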

    Demo Video

    Here is the demo video that showcases the steps that need to be performed to upgrade K3s:



    Upgrading K3s - AirGap (Manual Approach)

    1. Take a shell session to each of the cluster nodes (VMs).

    2. Download and extract the bundle file on all the VMs: tar -xf gv-platform-$VERSION.tar

    3. Perform the following steps on each of the VMs to upgrade K3s.

    4. Restart the k3s service on each of the nodes.

    Master nodes:

    Worker nodes:

    5. Wait for a few minutes for the pods to recover.

    6. Check the k3s version across the nodes.

    Demo Video

    Here is the demo video that showcases the steps that need to be performed to upgrade K3s in the Air Gap environment:



    Upgrading K3s to 1.26

    For the Platform Team: Local Cluster K3s Upgrade

    If you are upgrading K3s of the local cluster, you would need to remove the existing PodSecurityPolicy resources.

    We have only one of them under the chart aws-node-termination-handler

    1. Patch the helm Chart to disable the psp resource.

      kubectl patch helmchart aws-node-termination-handler -n kube-system --type='json' -p='[{"op": "add", "path": "/spec/set/rbac.pspEnabled", "value": "false"}]'

    2. This will trigger the removal of the PSP resource

    Traefik is deployed as a DaemonSet in the local clusters. You would need to restart the DaemonSet instead when following the steps given in (broken link).

    • Deploy the system-upgrade-controller:

    • Create the upgrade plan. Note: the key version holds the version of K3s that the cluster will be upgraded to.

    If you are also running a worker node then execute this too:

    • Run the upgrade plan:

    In the case of a Worker node execute this too:

    • Once the plan is executed, all pods will restart and take a few minutes to recover. Check the status of all the pods:

    • Check if the K3s version has been upgraded:

    • Delete the system-upgrade-controller:

    Reference: Apply upgrade: https://docs.k3s.io/upgrades/automated#install-the-system-upgrade-controller

    Post Upgrade Patch

    We have seen an issue where Traefik is not able to access any resources after the upgrade is implemented. Follow these steps to implement the fix (a command sketch follows this list):

    • Run this patch to add traefik.io to the apiGroup of the ClusterRole traefik-kube-system

    • Add the missing CRDs

    • Restart traefik deployment
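
    A sketch of what this fix typically looks like with kubectl; the JSON patch assumes traefik.io should be appended to the apiGroups of the first rule of the ClusterRole, so inspect the rules first and adjust the index if your layout differs. The missing CRD manifests are not reproduced here.

    # Inspect the current rules of the ClusterRole:
    kubectl get clusterrole traefik-kube-system -o yaml

    # Append traefik.io to the apiGroups of the first rule (adjust the rule index if needed):
    kubectl patch clusterrole traefik-kube-system --type='json' \
      -p='[{"op": "add", "path": "/rules/0/apiGroups/-", "value": "traefik.io"}]'

    # Restart the Traefik deployment so it picks up the change:
    kubectl rollout restart deployment traefik -n kube-system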

    If you are unable to access Keycloak or the Product UI, it might be a cache issue. Try a private window in the browser you are using.

    Reference: https://github.com/k3s-io/k3s/issues/8755#issuecomment-1789526830

    Upgrading K3s - AirGap (Manual Approach)

    Follow these steps to upgrade k3s: Upgrading K3s - AirGap (Manual Approach)

    Post Upgrade Patch

    • Run this patch to add traefik.io to the apiGroup of the ClusterRole traefik-kube-system

    • Add the missing CRDs

    • Restart traefik deployment

    Reference: https://github.com/k3s-io/k3s/issues/8755#issuecomment-1789526830

    If you are unable to access Keycloak or the Product UI, it might be a cache issue. Try a private window in the browser you are using.

    Certificates

    By default, certificates in K3s expire in 12 months. If the certificates are expired or have fewer than 90 days remaining before they expire, the certificates are rotated when K3s is restarted.
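
    To check when the current certificates expire, something like the sketch below can be used on a server node; the certificate path is the default K3s location and may differ in customized installations.

    # Show the expiry date of the API server serving certificate (default K3s path):
    sudo openssl x509 -enddate -noout -in /var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt

    # Restarting K3s rotates certificates that are expired or within 90 days of expiry:
    sudo systemctl restart k3s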

    • Click the Settings icon in the top menu to switch to the tenant in which the application needs to be registered, from the Directories + subscriptions menu.
    • Browse to App Registration and select New registration

    • On the App Registration page enter below information and click Register button

      • Name: (Enter a meaningful application name that will be displayed to users of the app)

      • Supported account types:

        • Select which accounts the application will support. The options should be similar to those below. Select “Accounts in this organizational directory only”:

        • Leave the Redirect URI as empty and Click Register

    • Note the Application (client) ID, Directory (tenant) ID values

    • Navigate to Manage -> Certificates and secrets on the left menu, to create a new client secret

    • Provide a meaningful description and expiry to the secret, and click on Add

    • Once a client secret is created, note its Value and store it somewhere safe. NOTE: this value cannot be viewed once the page is closed.

    • Navigate to Manage -> API permissions on the left menu, and Add a permission

    • Select Microsoft APIs -> Microsoft Graph

    • Select Application permissions

    • For UnifiedPolicy.Tenant.Read

      • Navigate to Manage -> API permissions on the left menu, and Add a permission

      • Select APIs my organization uses tab

      • Search for Microsoft Information Protection Sync Service

      • Select Application permissions > UnifiedPolicy.Tenant.Read

    • For InformationProtectionPolicy.Read.All

      • Navigate to Manage -> API permissions on the left menu, and Add a permission

      • Select APIs my organization uses tab

    • For Azure Rights Management Services > Content.Writer

      • Navigate to Manage -> API permissions on the left menu, and Add a permission

      • Select Azure Rights Management Services tab

    • Permissions required

      • For scanning

        • Microsoft Graph > Application permissions > Sites > Sites.Read.All

      • For reading Sensitivity labels

        • Microsoft Graph > Application permissions > InformationProtectionPolicy > InformationProtectionPolicy.Read.All

        • APIs my organization uses > Microsoft Information Protection Sync Service > Application permissions > UnifiedPolicy.Tenant.Read

      • For revoke permissions

        • Microsoft Graph > Application permissions > Files > Files.ReadWrite.All

      • For tagging

        • Microsoft Graph > Application permissions > Sites > Sites.Manage.All

      • For MIP tagging

        • Azure Rights Management Services > Application permissions > Content.Writer

        • Microsoft Graph > Application permissions > Directory > Directory.Read.All

        • Microsoft Graph > Application permissions > Sites > Sites.Manage.All

    • Once all the required permissions are added, click "Grant admin consent"

    Configuring SharePoint Online connector in Dashboard

    • Navigate to Administration -> Data Sources -> SharePoint Online -> New scan

    • Provide the Directory (tenant) ID, Application (client) ID and Client Secret value generated in the above steps from the azure application

    • Click on the Folder icon in Site and path to select a particular site to scan, or leave the path as empty to scan all sites

    • Save the configuration

    • Once the configuration is saved, click on the icon on the right and select Start file scan to begin the scanning

    • The results can be viewed under Dashboard -> Enterprise Search

    File tagging

    Prerequisites

    • First create the default Getvisibility tags as a new column in SharePoint. This process is described below:

      • In SharePoint, navigate to Documents

      • In the files view, select + Add column

      • Select Choice and then Next

      • Set the name to Classification and the choices as: Public, Internal, Confidential, Highly-Confidential. Select.

      • Then click Save

      • Similarly create Compliance and Distribution columns (if required)

      • Getvisibility and SharePoint's tags are now aligned

    • When tags are written to SharePoint files automatically over the API, as the tags are added by Getvisibility, Modified By changes to System Account.

      • Getvisibility preserves the Modified date where applicable.


    Google Drive

    How to configure a Google Drive connection to scan files and folders.

    Create OAuth2 Credentials

    • Create a Project in Google Cloud Console:

      • Go to the Google Cloud Console

      • Create a new project or select an existing project

    • Enable the Google Drive, Drive Labels and Admin SDK API:

      • In the Google Cloud Console, navigate to APIs & Services > Library

      • Search for "Google Drive API" and click on it

    • Create OAuth 2.0 Credentials:

      • In the Google Cloud Console, navigate to the APIs & Services > Credentials

      • Click "Create credentials" and select "Service account"

    Delegate domain-wide authority to your service account

    • From your domain's Admin console, go to Main menu > Security > Access and data control > API controls

    • In the Domain wide delegation pane, select "MANAGE DOMAIN-WIDE DELEGATION"

    • Click Add new

    • In the Client ID field, enter the client ID obtained from the service account creation steps above

    • In the OAuth Scopes field, enter a comma-delimited list of the scopes required for the application

    • Use the below scopes:

      For scanning

      • https://www.googleapis.com/auth/admin.directory.user.readonly

      • https://www.googleapis.com/auth/admin.directory.group.readonly

    Provide required Admin roles to a user

    In order to perform a scan using Google Drive connector, it needs a user with the below Admin roles assigned:

    • Services Admin

    • User Management

    • Groups Reader

    They can be added/checked in the Google Admin Console for the UserID which will be used for impersonation: Directory > Users > Assign roles > add the Services Admin, User Management, and Groups Reader roles, as follows:

    • Navigate to the Google Admin Console (admin.google.com)

    • Select Users under Directory from the left menu

    • Select a user you want to use for scanning

    • Navigate to User details -> Admin roles and privileges

    • Edit the roles, and enable:

      • Services Admin

      • User Management

      • Groups Reader

    Note: It might take a few minutes before the changes take effect.

    Configuring Google Drive connector in Dashboard

    • Navigate to Administration -> Data Sources -> Google Drive -> New scan

    • Enter the details of the OAuth2 credentials obtained previously, and also add the user ID (in the form of [email protected]) of the user you assigned roles to in the above steps

    • Click on the Folder icon in Path to select a particular user's drive to scan, or leave the path as empty to scan all users

    • Save the configuration

    • Once the configuration is saved, click on the icon on the right and select Start file scan to begin scanning

    • The scan results can be viewed under Dashboard -> Enterprise Search

    File tagging

    Prerequisites

    Default Getvisibility labels need to be created in Google Drive. This process is described below:

    • Turn on Drive labels for the organization

      1. In the Google Admin Console (at admin.google.com)

      2. Go to Menu

      3. Click Labels

    Configuring a HA K3s cluster

    Our K3s HA setup consists of 4 homogeneous nodes (3 master nodes + 1 worker node) and can withstand a single-node failure with a very short failover disruption (between 3 and 6 minutes).

    With our HA setup we can achieve a monthly uptime of 99.9% (a maximum of 43m of downtime every month).

    Prerequisites

    Please refer to K3S installation for the node specs of the product you’ll be installing.

    The minimum spec allowed for a HA node is 8 CPUs, 32GB of RAM and 500GB of free SSD disk space. All nodes should also have the same spec and OS.

    Networking

    Internal

    We recommend running the K3s nodes in a 10Gb low latency private network for the maximum security and performance.

    K3s needs the following ports to be accessible by all other nodes running in the same cluster:

    The ports above should not be publicly exposed as they will open up your cluster to be accessed by anyone. Make sure to always run your nodes behind a firewall/security group/private network that disables external access to the ports mentioned above.

    All nodes in the cluster must have:

    1. Domain Name Service (DNS) configured

    2. Network Time Protocol (NTP) configured

    3. Software Update Service - access to a network-based repository for software update packages

    4. Fixed private IPv4 address

    External

    The following port must be publicly exposed in order to allow users to access Synergy or Focus product:

    The user must not access the K3s nodes directly, instead, there should be a load balancer sitting between the end user and all the K3s nodes (master and worker nodes):

    The load balancer must operate at Layer 4 of the OSI model and listen for connections on port 443. After the load balancer receives a connection request, it selects a target from the target group (which can be any of the master or worker nodes in the cluster) and then attempts to open a TCP connection to the selected target (node) on port 443.

    The load balancer must have health checks enabled which are used to monitor the health of the registered targets (nodes in the cluster) so that the load balancer can send requests to healthy nodes only.

    The recommended health check configuration is:

    • Timeout: 10 seconds

    • Healthy threshold: 3 consecutive health check successes

    • Unhealthy threshold: 3 consecutive health check failures

    • Interval: 30 seconds

    Public

    Please refer to the Network settings section for the list of URLs you need to enable in your corporate proxy in order to connect to our private registries.


    Configuring K3s nodes

    We need 3 master nodes and at least 1 worker node to run K3s in HA mode.

    The nodes must be homogeneous, having the same number of CPUs, RAM and disk space.

    1st master node

    To get started launch a server node using the cluster-init flag:

    Check your first master node status; it should be in the Ready state:

    Use the following command to copy the TOKEN that will be used to join the other nodes to the cluster:

    Also, don't forget to copy the private IP address of the 1st master node, which will be used by the other nodes to join the cluster.
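
    For illustration, a sketch of these first-master steps using the upstream K3s installer is shown below; the Getvisibility-provided installer or air-gap bundle may wrap these commands differently, and the node name is only an example.

    # Launch the first server (master) node with embedded etcd:
    curl -sfL https://get.k3s.io | sh -s - server --cluster-init --node-name master1

    # Check the node status (it should reach the Ready state):
    sudo kubectl get nodes

    # Copy the join token for the remaining nodes:
    sudo cat /var/lib/rancher/k3s/server/node-token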

    2nd master node

    SSH into the 2nd server to join it to the cluster:

    1. Replace K3S_TOKEN with the contents of the file /var/lib/rancher/k3s/server/node-token from the 1st master node installation.

    2. Set --node-name to master2

    3. Set --server to the private static IP address of the 1st master node.

    Check the node status:

    3rd master node

    SSH into the 3rd server to join it to the cluster:

    1. Replace K3S_TOKEN with the contents of the file /var/lib/rancher/k3s/server/node-token from the 1st master node installation.

    2. Set --node-name to master3

    3. Set --server to the private static IP address of the 1st master node.

    Check the node status:

    1st worker node

    SSH into the 4th server to join it to the cluster:

    1. Replace K3S_TOKEN with the contents of the file /var/lib/rancher/k3s/server/node-token from the 1st master node installation.

    2. Set --node-name to worker1

    3. Set --server to the private static IP address of the 1st master node.

    Joining additional worker nodes

    You may create as many additional worker nodes as you want.

    SSH into the server to join it to the cluster:

    1. Replace K3S_TOKEN with the contents of the file /var/lib/rancher/k3s/server/node-token from the 1st master node installation.

    2. Update --node-name with your worker node name (e.g. worker2, worker3, etc.)

    3. Set --server to the private static IP address of the 1st master node.

    Check the node status:

    Register HA K3s Cluster to Rancher

    You may run the registration command that you generated using Rancher UI or through license manager. You should see all master and worker nodes in your cluster through the Machine Pools on the Rancher dashboard:


    Install Helm charts

    GV Essentials

    1. Go to Apps > Charts and install the GetVisibility Essentials Helm chart:

    1. If you are installing Focus or Enterprise click on Enable ElasticSearch.

    2. Configure the UTC hour (0-23) that backups should be performed at:

    1. Click on High Available and set:

      1. MinIO Replicas to 4

      2. MinIO Mode to distributed

    GV Monitoring

    1. Go to Apps > Charts, install the GetVisibility Monitoring Helm chart, and select Install into Project: Default.

    1. Click on High Available and set:

      1. Prometheus replicas to 2

      2. Loki replicas to 2

    Configure Fleet labels

    1. Go to the global menu Continuous Delivery > Clusters and click on Edit config for the cluster:

    1. For Synergy: add 3 labels product=synergy environment=prod high_available=true and press Save.

    2. For Focus: add 3 labels product=focus environment=prod high_available=true

    Enabling Microsoft O365 Streaming with on-premise or private cloud DDR deployments

    Overview

    This guide outlines how to configure Microsoft O365 Streaming in environments where Getvisibility’s Data Detection and Response (DDR) platform is deployed on-premise or in a private cloud. The integration enables DDR to receive and act upon real-time Microsoft 365 activity notifications.


    Prerequisites

    Ensure the following prerequisites are in place before starting the integration:

    • A deployed and operational DDR instance.

    • A public DNS record pointing to the DDR listener endpoint.

    • A valid SSL/TLS certificate from a trusted Certificate Authority.

    • An internet-accessible port 443 (HTTPS) endpoint.


    Deployment Steps

    Step 1: Expose DDR Webhook Endpoint

    Make sure the DDR webhook endpoint is:

    • Publicly accessible via a fully qualified domain name (FQDN).

    • Protected with a valid SSL/TLS certificate.

    • Accessible on port 443 (HTTPS).

    Note: You can use a reverse proxy (e.g., NGROK, NGINX) to securely expose internal services if needed.
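    As a minimal sketch of the NGINX option, assuming ddr.example.com is the public FQDN, the certificate files are already in place, and the internal DDR listener is reachable at 10.0.0.50:8080 (all of these are placeholders to replace with your own values):

    cat > /etc/nginx/conf.d/ddr-webhook.conf << 'EOF'
    server {
        listen 443 ssl;
        server_name ddr.example.com;                            # public FQDN for the DDR listener
        ssl_certificate     /etc/ssl/certs/ddr.example.com.crt;
        ssl_certificate_key /etc/ssl/private/ddr.example.com.key;

        location / {
            proxy_pass http://10.0.0.50:8080;                   # internal DDR listener (placeholder)
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto https;
        }
    }
    EOF
    nginx -t && systemctl reload nginx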


    Step 2: Configure Firewall for Microsoft Graph

    Microsoft recommends restricting webhook traffic to only allow inbound requests from Microsoft Graph servers. This reduces the attack surface and prevents spoofed webhook messages.

    Allowlist Required Endpoints:

    More info at

    ⚠️ Action Required: Your firewall or reverse proxy must allow inbound HTTPS traffic from all IP addresses Microsoft uses to deliver change notifications. Regularly update your rules using Microsoft’s published IP ranges.

    Azure Files Streaming Configuration

    This document provides information on how to configure Azure Files connection with real-time events monitoring and data streaming.

    Prerequisites

    1. Existing Azure Files connection: An Azure Files scan configuration must already exist.

    | Permission | Description |
    | --- | --- |
    | Microsoft.Storage/storageAccounts/blobServices/containers/read | Return a container or a list of containers |
    | Microsoft.Storage/storageAccounts/blobServices/containers/write | Modify a container's metadata or properties |
    | Microsoft.Storage/storageAccounts/blobServices/generateUserDelegationKey/action | Returns a user delegation key for the Blob service |
    | Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read | Return a blob or a list of blobs |
    | Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write | Write to a blob |
    | Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action | Moves the blob from one path to another |
    | Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action | Returns the result of adding blob content |
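    If you prefer to grant the storage actions listed above through a custom role, the sketch below shows one possible approach using the Azure CLI; the role name, description and subscription ID are placeholders, and your organisation may instead rely on built-in roles.

    cat > gv-streaming-role.json << 'EOF'
    {
      "Name": "GV Streaming Storage Access (example)",
      "IsCustom": true,
      "Description": "Example custom role covering the storage actions listed above",
      "Actions": [
        "Microsoft.Storage/storageAccounts/blobServices/containers/read",
        "Microsoft.Storage/storageAccounts/blobServices/containers/write",
        "Microsoft.Storage/storageAccounts/blobServices/generateUserDelegationKey/action",
        "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
        "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write",
        "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action",
        "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action"
      ],
      "AssignableScopes": [ "/subscriptions/<subscription-id>" ]
    }
    EOF
    az role definition create --role-definition @gv-streaming-role.json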
    curl -sfL https://assets.master.k3s.getvisibility.com/k3s/k3s.sh | \
    INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" PRODUCT_NAME=enterprise sh -s - server --node-name=local-01
    curl -sfL https://assets.master.k3s.getvisibility.com/k3s/k3s.sh | \
    INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" PRODUCT_NAME=ultimate ONLY_PRECHECK=true sh -s - server --node-name=local-01
    curl -sfL https://assets.master.k3s.getvisibility.com/k3s/k3s.sh | \
    INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" SKIP_SYSTEM_CHECKS=true sh -s - server --node-name=local-01
    curl -sfL https://assets.master.k3s.getvisibility.com/k3s/k3s.sh | \
    INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" SKIP_PRECHECK=true sh -s - server --node-name=local-01
    export http_proxy="$PROXY_IP"
    export https_proxy="$PROXY_IP"
    export no_proxy="$NODE_IP,localhost,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local"
    kubectl apply -f https://....k3s.getvisibility.com/v3/import/dxslsxcf84....yaml
    watch -c "kubectl get deployments -A" 
     kubectl logs cattle-cluster-agent-d96d648d8-wjvl9 -n cattle-system
    https://assets.master.k3s.getvisibility.com (Custom K3s installation files)
    https://images.master.k3s.getvisibility.com (Private Docker registry)
    https://charts.master.k3s.getvisibility.com (Private Helm registry)
    https://prod-eu-west-1-starport-layer-bucket.s3.eu-west-1.amazonaws.com (Docker registry AWS CDN)
    https://rpm.rancher.io (Rancher RPM repo for configuring SELinux packages on RHEL or CentOS)
    https://api.master.k3s.getvisibility.com (Private API server)
    https://rancher.master.k3s.getvisibility.com (Rancher management server)
    https://rancher.$RESELLER_NAME.k3s.getvisibility.com (Rancher management server, where $RESELLER_NAME is Getvisibility for direct customers)
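    A simple way to verify that the corporate proxy allows these endpoints is to request each of them from a node, for example:

    for url in \
      https://assets.master.k3s.getvisibility.com \
      https://images.master.k3s.getvisibility.com \
      https://charts.master.k3s.getvisibility.com \
      https://api.master.k3s.getvisibility.com \
      https://rancher.master.k3s.getvisibility.com
    do
      # Any HTTP status code means the endpoint is reachable through the proxy;
      # a curl error means the proxy (or firewall) is blocking it
      curl -sS -o /dev/null -w "%{http_code}  $url\n" "$url" || echo "BLOCKED  $url"
    done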
    if ANY hard signals True:
      return True
    
    if ANY soft signals True AND ALL supporting signals True even at low confidence:
      return True
    
    # by default this is disabled
    if ALL supporting signals True at high confidence:
      return True
      
    otherwise return False
    https://my-reseller.net/auth/admin/master/console
    kubectl get nodes
    kubectl label node --all k3s-upgrade=true
    kubectl apply -f https://assets.master.k3s.getvisibility.com/system-upgrade-controller/v0.10.0/system-upgrade-controller.yaml
    cat > upgrade-plan.yaml << EOF
    ---
    apiVersion: upgrade.cattle.io/v1
    kind: Plan
    metadata:
      name: k3s-latest
      namespace: system-upgrade
    spec:
      concurrency: 1
      version: v1.24.9+k3s2
      nodeSelector:
        matchExpressions:
          - {key: k3s-upgrade, operator: Exists}
      serviceAccountName: system-upgrade
      upgrade:
        image: docker.io/rancher/k3s-upgrade
    EOF
    kubectl apply -f upgrade-plan.yaml
    watch kubectl get pods -A
    kubectl get nodes
    kubectl delete -f https://assets.master.k3s.getvisibility.com/system-upgrade-controller/v0.10.0/system-upgrade-controller.yaml
    $ mkdir -p /var/lib/rancher/k3s/agent/images/
    $ gunzip -c assets/k3s-airgap-images-amd64.tar.gz > /var/lib/rancher/k3s/agent/images/airgap-images.tar
    $ cp assets/k3s /usr/local/bin && chmod +x /usr/local/bin/k3s
    $ systemctl restart  k3s.service
    $ systemctl restart k3s-agent.service
    watch kubectl get pods -A
    kubectl get nodes
    kubectl apply -f https://assets.master.k3s.getvisibility.com/system-upgrade-controller/v0.13.1/system-upgrade-controller.yaml
    cat > upgrade-plan-server.yaml << EOF
    ---
    # Server plan
    apiVersion: upgrade.cattle.io/v1
    kind: Plan
    metadata:
      name: server-plan
      namespace: system-upgrade
    spec:
      concurrency: 1
      cordon: true
      nodeSelector:
        matchExpressions:
        - key: node-role.kubernetes.io/control-plane
          operator: In
          values:
          - "true"
      serviceAccountName: system-upgrade
      upgrade:
        image: rancher/k3s-upgrade
      version: v1.26.10+k3s1
    EOF
    cat > upgrade-plan-agent.yaml << EOF
    ---
    # Agent plan
    apiVersion: upgrade.cattle.io/v1
    kind: Plan
    metadata:
      name: agent-plan
      namespace: system-upgrade
    spec:
      concurrency: 1
      cordon: true
      nodeSelector:
        matchExpressions:
        - key: node-role.kubernetes.io/control-plane
          operator: DoesNotExist
      prepare:
        args:
        - prepare
        - server-plan
        image: rancher/k3s-upgrade
      serviceAccountName: system-upgrade
      upgrade:
        image: rancher/k3s-upgrade
      version: v1.26.10+k3s1
    EOF
    kubectl apply -f upgrade-plan-server.yaml
    kubectl apply -f upgrade-plan-agent.yaml
    watch kubectl get pods -A
    kubectl get nodes
    kubectl delete -f https://assets.master.k3s.getvisibility.com/system-upgrade-controller/v0.13.1/system-upgrade-controller.yaml
    kubectl patch clusterrole traefik-kube-system -n kube-system --type='json' -p='[{"op": "add", "path": "/rules/-1/apiGroups/-", "value": "traefik.io"}]'
    kubectl apply -f https://assets.master.k3s.getvisibility.com/k3s/v1.26.10+k3s1/traefik-patch.yaml
    kubectl rollout restart deployment traefik -n kube-system
    kubectl patch clusterrole traefik-kube-system -n kube-system --type='json' -p='[{"op": "add", "path": "/rules/-1/apiGroups/-", "value": "traefik.io"}]'
    kubectl apply -f assets/traefik-patch.yaml
    kubectl rollout restart deployment traefik -n kube-system
    Post Upgrade Patch

    Firewall rules allowing inbound traffic from Microsoft Graph servers.

    Microsoft 365 URLs and IP Address Ranges
    Additional Microsoft 365 IP Addresses and URLs
    Graph Change Notification Delivery – Firewall Configuration

    700GB min 32M inodes

    CNCF
    Bayesian inference

    Search for Microsoft Information Protection API

    • Select Application permissions > InformationProtectionPolicy.Read.All

    Select Application permissions

    • Select Content > Content.Writer

    Microsoft Graph > Application permissions > InformationProtectionPolicy > InformationProtectionPolicy.Read.All

  • APIs my organization uses > Microsoft Information Protection API > Application permissions > InformationProtectionPolicy.Read.All

  • 9796

    Prometheus node exporter

    Globally unique node name (use --node-name when installing K3s in a VM to set a static node name)
    Balance mode: round robin

    Consul Server replicas to 3

    and press
    Save.
  • For Enterprise: add 3 labels product=enterprise environment=prod high_available=true and press Save.

    | Protocol | Port | Description |
    | --- | --- | --- |
    | TCP | 6443 | Kubernetes API Server |
    | UDP | 8472 | Required for Flannel VXLAN |
    | TCP | 2379-2380 | embedded etcd |
    | TCP | 10250 | metrics-server for HPA |

    | Protocol | Port | Description |
    | --- | --- | --- |
    | TCP | 443 | Focus/Synergy backend |

    Estimate hardware capacity needs

    TCP

    curl -sfL https://assets.master.k3s.getvisibility.com/k3s/k3s.sh | INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" sh -s - server --node-name=master1 --cluster-init
    kubectl get nodes
    cat /var/lib/rancher/k3s/server/node-token
    curl -sfL https://assets.master.k3s.getvisibility.com/k3s/k3s.sh | K3S_TOKEN=SHARED_SECRET INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" sh -s - server --node-name=master2 --server https://<ip or hostname of master1>:6443
    kubectl get nodes
    curl -sfL https://assets.master.k3s.getvisibility.com/k3s/k3s.sh | K3S_TOKEN=SHARED_SECRET INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" sh -s - server --node-name=master3 --server https://<ip or hostname of master1>:6443
    kubectl get nodes
    curl -sfL https://assets.master.k3s.getvisibility.com/k3s/k3s.sh | K3S_TOKEN=SHARED_SECRET INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" sh -s - agent --node-name=worker1 --server https://<ip or hostname of any master node>:6443
    curl -sfL https://assets.master.k3s.getvisibility.com/k3s/k3s.sh | K3S_TOKEN=SHARED_SECRET INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" sh -s - agent --node-name=workerX --server https://<ip or hostname of any master node>:6443
    kubectl get nodes

    Click the "Enable" button to enable the Google Drive API for the project

  • Search for "Admin SDK API" and click on it

    • Click the "Enable" button to enable the Admin SDK API for the project

    • Search for "Drive Labels API" and click on it

    • Click the "Enable" button to enable Drive Labels API for the project

    Enter a name in the Service account name field and click CREATE AND CONTINUE

    • Under Grant this service account access to the project, select role as Owner and click DONE

    • Select the newly created service account and click Keys > Add Key > Create new key

    • Make sure the key type is set to json and click Create

    • The new private key pair is generated and downloaded to the machine. Note the values of private_key, client_email and client_id
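    If jq is installed, the three values can be read directly from the downloaded key file; the file name service-account-key.json below is a placeholder for the actual name of the downloaded file.

    # Print client_email, client_id and private_key from the downloaded service account key
    jq -r '.client_email, .client_id, .private_key' service-account-key.json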

    https://www.googleapis.com/auth/drive.readonly

    For revoke permissions

    • https://www.googleapis.com/auth/drive

    For tagging

    • https://www.googleapis.com/auth/drive.file

    • https://www.googleapis.com/auth/drive

    • https://www.googleapis.com/auth/drive.admin.labels

    • https://www.googleapis.com/auth/drive.metadata

    • https://www.googleapis.com/auth/drive.labels

    For Extended Streaming Events

    • https://www.googleapis.com/auth/admin.reports.audit.readonly

  • Click Authorize

  • Click on Save

    Select Turn Labels On

  • Click Save

  • Create Drive labels:

    1. Go to the labels manager at https://drive.google.com/labels.

      Requires having the Manage Labels privilege.

    2. Click New label.

    3. To create one badged label:

      1. Choose a badged label

      2. Choose to start from an example, or from scratch.

      3. Update the title as Classification.

      4. (Optional) Add a description or a learn more URL that points to internal documentation about the label.

    4. To create a standard label:

      1. Two standard labels need to be created: Distribution and Compliance

      2. Click a standard label template or click Create New.

      3. Enter or update the label name.

  • Publish the labels

    1. If it’s not open already, open the labels manager (https://drive.google.com/labels) and click the label.

    2. Review the label and any fields.

    3. Click Publish.

    4. Confirm that the label will be published by clicking Publish.

  • Google Cloud Console
    Admin console
    admin.google.com
    Admin console
    Apps > Google Workspace > Drive and Docs

    If an Azure Files scan does not already exist, follow this guide to create a new Azure Files scan and ensure the necessary credentials are set up.

    Steps to Enable Data Streaming

    1. Select an Existing Scan Configuration

    1. Go to the Scan configurations page in the product UI.

    2. Locate your existing Azure Files scan configuration and select Edit Configuration from the options menu. Note the configured path (folder) and save it, as it will be used in step 9 to replace {FolderPath}.

    2. Enable Data Streaming

    1. Within the Edit Azure Files Scan Configuration page, toggle Data Streaming to ON.

    2. Copy the Webhook URL provided, as you will use it later in the Azure Portal. Save this Webhook URL, as it will be used in step 9 to replace {WebhookUrl}.

    1. Click the Save & Close button to save the configuration.

    3. Create Azure Event Hub

    1. Navigate to Azure Portal Event hubs and click Create

    1. In Create Namespace Window fill in the details

      1. Give it a Name

      2. Select your subscription and resource group

      3. Select location

      4. Pricing tier - standard

      5. Throughput Units - 1

    1. Click on Review + Create and then Create after validation

    1. After namespace is created, click on + Event Hub button

    2. In Create Event Hub Window fill in the name and click Review + Create, then Create after validation. Save the name of the Event Hub you created in this step, as it will be used later in step 9 to replace {eventHubName}.

    1. Configure access policy

      1. In the event hubs namespace window click on Settings/Shared access policies and then +Add button

      2. Fill in the details in the new tab, set LogicAppsListenerPolicy as name, select Listen policy, and click Save.

      3. Click on the newly created policy, then copy and save the Connection string–primary key. This will be needed later in step 8b.
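    For reference, the same connection string can also be retrieved with the Azure CLI; the resource group and namespace names below are placeholders, while LogicAppsListenerPolicy matches the policy created above.

    az eventhubs namespace authorization-rule keys list \
      --resource-group <resource-group> \
      --namespace-name <eventhub-namespace> \
      --name LogicAppsListenerPolicy \
      --query primaryConnectionString -o tsv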

    4. Configure Azure Storage Diagnostic settings

    1. Navigate to Azure Portal and open your Storage Account.

    2. Select the needed account from the Storage Accounts list

    3. In the left-hand menu, select Monitoring/Diagnostic settings and click on file

    4. In Diagnostic settings Window click on "+ Add diagnostic setting" button

    5. In Create Diagnostic setting Window fill in the details:

      1. Give it a Name

      2. Under Category groups, select allLogs

      3. Select Destination details Stream to an event hub and select newly created Event Hub Namespace and Event Hub

    5. Configure Azure Logic Apps

    1. Go to Azure logic apps and click "Add" button

    2. In Create Logic App Window select Workflow Service Plan

    3. In Create Logic App (Workflow Service Plan) Window fill in the details and click Review + Create:

      1. Select your subscription and resource group

      2. Give logic app name

      3. Select region

      4. Pricing plan should be WS1

      5. In the monitoring tab select No for the application insights

      6. Click Review + create button

    4. Click Create after validation

    5. In newly created logic app click on Workflows/Workflows and then +Add button

    6. In new workflow tab fill in name, select State type: Stateful and click Create

    7. In created workflow go to Developer/Designer and click on Add a trigger, then in search type "Event hub" and select "When events are available in Event Hub"

    8. Configure API connection

      1. Click on the trigger, set "Temp" for Event Hub Name and then click on Change connection.

      2. Then click Add New and fill in the details. Enter any name for the connection name and use the connection string {Connection string–primary key} from step 3.6.c.

    9. In the workflow navigation tab, go to Developer/Code, paste the provided code, and click Save:

      1. Replace {FolderPath} with the path to the streaming folder. For example, if you want to receive events from the folder "StreamingFolder", which is located in the file share "DocumentsShare" under the folder "Personal", the path should be "DocumentsShare/Personal/StreamingFolder"

      2. Replace {WebhookUrl} with the webhook URL provided by the application in the scan configuration window

    Next Steps

    After configuring the event subscription:

    • You may upload documents to the configured path.

    • The events triggered by these uploads will be processed by the Data Streaming setup, and the results will appear in your Getvisibility dashboard.

    Troubleshooting

    If you experience any issues with the configuration, ensure that:

    1. The Webhook URL is correct and matches the configuration in Azure.

    2. Steps 5.8 and 5.9 were properly executed and all the variables were replaced with real values.

    3. You can also check whether the trigger was unsuccessful by navigating to the Logic App configured in the previous steps, then to Workflow and Trigger History. If you see any failed triggers, inspect the error details to identify the issue.

    Dropbox

    How to configure Dropbox connection to scan it.

    Initial creation of Dropbox connector in Dashboard

    • Navigate to Administration -> Data Sources -> Dropbox

    • Then go to Credentials tab and click New credentials

    • Create a credentials name and copy the Redirect URL - it will be needed later. The App Key and App Secret fields will be filled in later, once we register a Dropbox App.

    Registering a Dropbox App

    • Log in to Dropbox

    • Go to the Dropbox App Console and click Create app

    • On the App Creation page enter below information and click Create app button

      • Choose an API: Most applications will use "Dropbox API"

      • Choose Access Type: Select "Full Dropbox" for complete access.

    Finishing creation of Dropbox connector in Dashboard

    • Once done click Authorize with Dropbox button as below:

    • Then you'll be redirected to 1st page to trust your application - click Continue

    • Then you'll see a list of permissions app will be granted - click Allow

    Azure Blob Streaming Configuration

    This document provides information on how to configure Azure Blob connection with real-time events monitoring and data streaming.

    To enable DDR (Streaming) for an existing Azure Blob scan, follow these steps:

    Prerequisites

    1. Existing Azure Blob connection: An Azure Blob scan configuration must already exist.

    Sharepoint Online Streaming Configuration

    This guide walks you through enabling real-time data streaming for a Sharepoint Online connection and monitoring live streaming events within the Getvisibility platform.

    Configuring permissions for an Azure App

    • Log in to the Azure Portal

    Customize options, and assign a colour.

    (Optional) Add a description.

  • Choose whether the label is copied when the file is copied.

  • Add a field.

  • and then
  • Click Save.

  • On the Change Connection tab, click Details and copy the Name from the connection details. Save this Name, as it will be used later in step 9 to replace {connectionName}.

  • Click save on workflow designer window

  • Replace {eventHubName} with the Azure Event Hub name that was created previously
  • Replace {connectionName} with the connection name from the previous step

  • {FolderPath}
    {WebhookUrl}

    Name Your App and click Create app: Enter a name that will be visible to users.

  • Go to the Settings tab and find app key and secret above the OAuth 2 section

  • We need to set proper permissions for Dropbox app. Below you can find a list of required permissions:

    • For scanning

      • Files and Folders > files.metadata.read, files.content.read

      • Collaboration > sharing.read

      • Team Data > team_data.member

      • Members > members.read, groups.read

    • For remediations

      • Collaboration > sharing.write

      • Files and Folders > files.content.write

    • For tagging

      • Files and Folders > files.content.write, files.metadata.write

  • Go to the Permissions tab of the newly created App and set the following:

    • Account Info: account_info.read

    • Files and Folders: files.metadata.write, files.metadata.read, files.content.write, files.content.read

    • Collaboration: sharing.read, sharing.write

    • Team: team_info.read

    • Team Data: team_data.member, team_data.content.write, team_data.content.read, files.team_metadata.write, files.team_metadata.read, files.permanent_delete

    • Members: members.read, groups.read

  • Once permissions are set click Save button located on the black snackbar at the bottom of the window.

  • Go back to the Settings tab and scroll to the Redirect URI section. Paste the Redirect URL copied from the Dashboard and click Add

  • Then copy the App key from the Dropbox App settings page and paste it into the App key field in the Dashboard Create connection form. Do the same for the App secret.

  • Once done you'll be redirected back to Dashboard page with success message as below:

  • Connection has been configured successfully

  • Register a Dropbox App
    Dropbox
    Dropbox App Console

    If an Azure Blob scan has not yet been created, follow this guide to create a new Azure Blob scan and ensure the necessary credentials are configured.

    Steps to Enable Data Streaming

    1. Select an Existing Scan Configuration

    1. Go to the Scan configurations page in the product UI.

    2. Find the existing Azure Blob scan configuration and select Edit Configuration from the options menu.

    2. Enable Data Streaming

    1. Within the Edit Azure Blob Scan Configuration page, toggle Data Streaming to ON.

    2. Copy the Webhook URL provided, as you will use it later in the Azure Portal.

    3. Configure Azure Event Grid Subscription

    1. Navigate to Azure Portal and open the Storage Account.

    2. Select the needed storage account from the Storage Accounts list

    3. In the left-hand menu, select Events and click Create Event Subscription

    4. In Create Event Subscription Window fill in the details:

      1. Give it a Name

      2. Select endpoint type Web Hook

      3. Click Configure an endpoint and set the Subscriber Endpoint to the Webhook URL from step 2

    5. Go to Filters Menu on top

    6. In the Subject Filters section, enter the correct path format for the subscription:

      • Use the following pattern: /blobServices/default/containers/{connectionDetails.ContainerName}/blobs/{connectionDetails.FolderPath}

      • For example, if the container is mycontainer and the folder path is accuracy test/repository1, the path will look like: /blobServices/default/containers/mycontainer/blobs/accuracy test/repository1

    7. Click Create to complete the Event Subscription setup.
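    As an alternative to the portal, an equivalent event subscription can be sketched with the Azure CLI; the subscription name, resource IDs and the webhook URL below are placeholders, and the subject filter reuses the mycontainer example above.

    az eventgrid event-subscription create \
      --name gv-blob-streaming \
      --source-resource-id "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>" \
      --endpoint-type webhook \
      --endpoint "<Webhook URL from step 2>" \
      --subject-begins-with "/blobServices/default/containers/mycontainer/blobs/accuracy test/repository1"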

    4. Assign Required Azure Permissions

    Ensure the following permissions are assigned to the Azure Storage Account:

    • EventGrid Data Contributor

    • EventGrid EventSubscription Contributor

    • EventGrid TopicSpaces Publisher

    For details on assigning these roles, refer to this documentation.
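    As an illustration, one of these roles could be assigned with the Azure CLI as follows; the principal ID and storage account resource ID are placeholders, and the same command would be repeated for each of the three roles above.

    az role assignment create \
      --assignee <principal-or-app-id> \
      --role "EventGrid EventSubscription Contributor" \
      --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"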

    5. Create Azure Event Hub

    1. Navigate to Azure Portal Event hubs and click Create

    1. In Create Namespace Window fill in the details

      1. Give it a Name

      2. Select your subscription and resource group

      3. Select location

      4. Pricing tier - standard

      5. Throughput Units - 1

    1. Click on Review + Create and then Create after validation

    1. After namespace is created, click on + Event Hub button

    2. In Create Event Hub Window fill in the name and click Review + Create, then Create after validation. Save the name of the Event Hub you created in this step, as it will be used later in step 9 to replace {eventHubName}.

    1. Configure access policy

      1. In the event hubs namespace window click on Settings/Shared access policies and then +Add button

      2. Fill in the details in the new tab, set LogicAppsListenerPolicy as name, select Listen policy, and click Save.

      3. Click on the newly created policy, then copy and save the Connection string–primary key. This will be needed later in step 8b.

    6. Configure Azure Storage Diagnostic settings

    1. Navigate to Azure Portal and open your Storage Account.

    2. Select the needed account from the Storage Accounts list

    3. In the left-hand menu, select Monitoring/Diagnostic settings and click on blob

    4. In Diagnostic settings Window click on "+ Add diagnostic setting" button

    5. In Create Diagnostic setting Window fill in the details:

      1. Give it a Name

      2. Under Category groups, select allLogs

      3. Select Destination details Stream to an event hub and select newly created Event Hub Namespace and Event Hub

    7. Configure Azure Logic Apps

    1. Go to Azure logic apps and click "Add" button

    2. In Create Logic App Window select Workflow Service Plan

    3. In Create Logic App (Workflow Service Plan) Window fill in the details and click Review + Create:

      1. Select your subscription and resource group

      2. Give logic app name

      3. Select region

      4. Pricing plan should be WS1

      5. In the monitoring tab select No for the application insights

      6. Click Review + create button

    4. Click Create after validation

    5. In newly created logic app click on Workflows/Workflows and then +Add button

    6. In new workflow tab fill in name, select State type: Stateful and click Create

    7. In created workflow go to Developer/Designer and click on Add a trigger, then in search type "Event hub" and select "When events are available in Event Hub"

    8. Configure API connection

      1. Click on the trigger, set "Temp" for Event Hub Name and then click on Change connection.

      2. Then click Add New and fill in the details. Enter any name for the connection name and use the connection string {Connection string–primary key} from step 3.6.c.

    9. In the workflow navigation tab, go to Developer/Code, paste the provided code, and click Save:

      1. Replace {FolderPath} with the path to the streaming folder. For example, if you want to receive events from the folder "StreamingFolder", which is located in the file share "DocumentsShare" under the folder "Personal", the path should be "DocumentsShare/Personal/StreamingFolder"

      2. Replace {WebhookUrl} with the webhook URL provided by the application in the scan configuration window


    Next Steps

    After configuring the event subscription:

    • Documents may be uploaded to the configured path.

    • The events triggered by these uploads will be processed by the Data Streaming setup, and the results will appear in the Getvisibility dashboard.

    Troubleshooting

    If there are any issues with the configuration, ensure that:

    1. The Webhook URL is correct and matches the configuration in Azure.

    2. The required Azure permissions are correctly assigned.

    3. Steps 5.8 and 5.9 were properly executed and all the variables were replaced with real values.

    4. You can also check whether the trigger was unsuccessful by navigating to the Logic App configured in the previous steps, then to Workflow and Trigger History. If you see any failed triggers, inspect the error details to identify the issue.

    If there are multiple tenants to choose from, use the Settings icon in the top menu to switch to the tenant in which the application needs to be registered, using the Directories + subscriptions menu

  • Browse to App Registration and select your application that was created for the scanning

  • Navigate to Manage -> API permissions on the left menu, and Add a permission

  • Select Microsoft APIs -> Office 365 Management API

    • Select Application permission

    • Select ActivityFeed.Read permission

    • Permissions required

      • All the scanning permissions (https://docs.getvisibility.com/scan-with-getvisibility/configure-data-sources/onedrive)

      • Office 365 Management API ⇒ Application Permissions ⇒ ActivityFeed.Read

    • Once all the required permission is added, click "Grant admin consent"

    Enabling Auditing

    • Sign into the Microsoft Purview portal using Microsoft Edge browser

    • Select the Audit solution card. If the Audit solution card isn't displayed, select View all solutions and then select Audit from the Core section

    • If auditing isn't turned on for your organization, a banner is displayed prompting you to start recording user and admin activity. Select the Start recording user and admin activity banner.

    • In certain cases, recording cannot be enabled immediately and requires additional configuration. If this applies, users will be prompted to enable the customization setting. Select OK, and a new banner will appear, informing you that the process may take 24 to 48 hours to complete. After this waiting period, repeat the previous step to proceed with enabling recording.

    Steps to Enable Data Streaming for Sharepoint Online

    1. Create a New Scan Configuration

    1. From the Data Sources page, select Sharepoint Online from the list of available data sources. In the Scan Configurations list, create a New Configuration.

    2. Make sure the connection has a Name and that Credentials are set. Then select the Path icon.

    2. Pick a Folder for Real-Time Events

    1. Click on the Folder icon in the Path field to select the folder you want to monitor for real-time events.

      • Magnifying glass icon: Folders with this icon next to them indicate that real-time events can be subscribed to from this directory.

    2. After selecting the folder, click Save & Close to finalize the changes.

    3. Clock icon: When data streaming is being activated, the clock icon will appear, indicating that the subscription is being processed. Once the subscription is activated, this icon will change to a green magnifying glass.

    4. After enabling Data Streaming, the system will automatically handle the subscription to Sharepoint Online’s real-time events. There is no need to manually configure Webhooks.

    Monitoring Real-Time Events

    After the subscription is activated (green magnifying glass icon), real-time events will start flowing into the platform, and you will be able to monitor them from various sections of Getvisibility.

    Viewing Events in the Live Events Section

    1. Navigate to the Live Events section under Administration to view a detailed audit log of all streaming events.

    2. In this section, you can filter and view event details

    Azure Portal
    {
        "definition": {
            "$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
            "actions": {
                "Filter_Records": {
                    "type": "Query",
                    "inputs": {
                        "from": "@triggerBody()?['ContentData']?['records']",
                        "where": "@and(not(empty(item()?['uri'])),or(contains(item()?['uri'], '{FolderPath}/'),contains(item()?['uri'], '{FolderPath}?')))"
                    },
                    "runAfter": {}
                },
                "Condition": {
                    "type": "If",
                    "expression": "@greater(length(body('Filter_Records')), 0)",
                    "actions": {
                        "HTTP-copy": {
                            "type": "Http",
                            "inputs": {
                                "uri": "{WebhookUrl}",
                                "method": "POST",
                                "headers": {
                                    "Content-Type": "application/json"
                                },
                                "body": {
                                    "event": "@setProperty(triggerBody(),'ContentData',setProperty(triggerBody()?['ContentData'],'records',body('Filter_Records')))"
                                }
                            },
                            "runAfter": {}
                        }
                    },
                    "else": {},
                    "runAfter": {
                        "Filter_Records": [
                            "Succeeded"
                        ]
                    }
                }
            },
            "contentVersion": "1.0.0.0",
            "outputs": {},
            "triggers": {
                "When_events_are_available_in_Event_Hub": {
                    "type": "ApiConnection",
                    "inputs": {
                        "host": {
                            "connection": {
                                "referenceName": "{connectionName}"
                            }
                        },
                        "method": "get",
                        "path": "/@{encodeURIComponent('{eventHubName}')}/events/batch/head",
                        "queries": {
                            "contentType": "application/json",
                            "consumerGroupName": "$Default",
                            "maximumEventsCount": 50
                        }
                    },
                    "recurrence": {
                        "interval": 30,
                        "frequency": "Second"
                    },
                    "splitOn": "@triggerBody()"
                }
            }
        },
        "kind": "Stateful"
    }
    

    Use the Webhook URL provided in step 2 as the Subscriber Endpoint and confirm the selection.

    Make sure to replace {connectionDetails.ContainerName} and {connectionDetails.FolderPath} with the actual container name and folder path from the scan configuration.

  • Click Save.

  • On the Change Connection tab, click Details and copy the Name from the connection details. Save this Name, as it will be used later in step 9 to replace {connectionName}.

  • Click save on workflow designer window

  • Replace {eventHubName} with the Azure Event Hub name that was created previously
  • Replace {connectionName} with the connection name from the previous step

  • {FolderPath}
    {WebhookUrl}

    Google IAM

    How to configure Google IAM connection to gather permissions and access rights for trustees.

    Create OAuth2 Credentials

    • Create a Project in Google Cloud Console:

    {
        "definition": {
            "$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
            "actions": {
                "Filter_Records": {
                    "type": "Query",
                    "inputs": {
                        "from": "@triggerBody()?['ContentData']?['records']",
                        "where": "@and(not(empty(item()?['uri'])),or(contains(item()?['uri'], '{FolderPath}/'),contains(item()?['uri'], '{FolderPath}?')))"
                    },
                    "runAfter": {}
                },
                "Condition": {
                    "type": "If",
                    "expression": "@greater(length(body('Filter_Records')), 0)",
                    "actions": {
                        "HTTP-copy": {
                            "type": "Http",
                            "inputs": {
                                "uri": "{WebhookUrl}",
                                "method": "POST",
                                "headers": {
                                    "Content-Type": "application/json"
                                },
                                "body": {
                                    "event": "@setProperty(triggerBody(),'ContentData',setProperty(triggerBody()?['ContentData'],'records',body('Filter_Records')))"
                                }
                            },
                            "runAfter": {}
                        }
                    },
                    "else": {},
                    "runAfter": {
                        "Filter_Records": [
                            "Succeeded"
                        ]
                    }
                }
            },
            "contentVersion": "1.0.0.0",
            "outputs": {},
            "triggers": {
                "When_events_are_available_in_Event_Hub": {
                    "type": "ApiConnection",
                    "inputs": {
                        "host": {
                            "connection": {
                                "referenceName": "{connectionName}"
                            }
                        },
                        "method": "get",
                        "path": "/@{encodeURIComponent('{eventHubName}')}/events/batch/head",
                        "queries": {
                            "contentType": "application/json",
                            "consumerGroupName": "$Default",
                            "maximumEventsCount": 50
                        }
                    },
                    "recurrence": {
                        "interval": 30,
                        "frequency": "Second"
                    },
                    "splitOn": "@triggerBody()"
                }
            }
        },
        "kind": "Stateful"
    }
    
    Go to the Google Cloud Console
  • Create a new project or select an existing project

  • Enable the Admin SDK:

    • In the Google Cloud Console, navigate to the "APIs & Services" > "Library"

    • Search for "Admin SDK" and click on it

    • Click the "Enable" button to enable the Admin SDK API for your project

  • Create OAuth 2.0 Credentials:

    • In the Google Cloud Console, go to APIs & Services > Credentials

    • Click "Create credentials" and select "Service account"

    • Enter a name in the Service account name field and click CREATE AND CONTINUE

    • Under "Grant this service account access to the project," select role as Owner and click DONE

    • Select the newly created service account and click Keys > Add Key > Create new key

    • Make sure the key type is set to json and click CREATE

    • The new private key pair is generated and downloaded to the machine. Note the values of private_key, client_email and client_id

  • Delegate domain-wide authority to your service account

    • From your domain's Admin console, go to Main menu menu > Security > Access and data control > API controls

    • In the Domain wide delegation pane, select Manage Domain Wide Delegation

    • Click Add new

    • In the Client ID field, enter the client ID obtained from the service account creation steps above

    • In the OAuth Scopes field, enter a comma-delimited list of the scopes required for the application

    • Use the below scopes:

      • https://www.googleapis.com/auth/admin.directory.user.readonly

      • https://www.googleapis.com/auth/admin.directory.domain.readonly

      • https://www.googleapis.com/auth/admin.directory.group.readonly

      • https://www.googleapis.com/auth/admin.directory.rolemanagement.readonly

    • Click Authorize

    Required scopes

    • DirectoryService.Scope.AdminDirectoryUserReadonly

    • DirectoryService.Scope.AdminDirectoryDomainReadonly

    • DirectoryService.Scope.AdminDirectoryGroupReadonly

    • DirectoryService.Scope.AdminDirectoryRolemanagementReadonly

    Configuring Google IAM connector in Dashboard

    • Navigate to Administration -> Data Sources -> Google IAM -> New scan

    • Enter the details of the OAuth2 credentials obtained previously

    • Save the configuration

    • Once the configuration is saved, click on the icon on the right and select Start trustee scan to begin scanning

    • The scan results can be viewed under Dashboard -> Access Governance

    Air Gap Installation

    Single Node Installation

    Install K3s

    Make sure you have /usr/local/bin configured in your PATH (export PATH=$PATH:/usr/local/bin). All the commands must be executed as the root user.

    The commands have been tested on Ubuntu Server 20.04 LTS, SUSE Linux Enterprise Server 15 SP4 and RHEL 8.6.

    For RHEL, K3s needs the following package to be installed: k3s-selinux (repo rancher-k3s-common-stable) and its dependencies container-selinux (repo rhel-8-appstream-rhui-rpms) and policycoreutils-python-utils (repo rhel-8-baseos-rhui-rpms). Also, firewalld, nm-cloud-setup.service and nm-cloud-setup.timer must be disabled and the server restarted before the installation; refer to the K3s documentation for more information.

    The steps below guide you through the air-gap installation of K3s, a lightweight Kubernetes distribution created by Rancher Labs:

    1. Extract the downloaded file: tar -xf gv-platform-$VERSION.tar

    2. Prepare K3s for air-gap installation:

    1. Install K3s:

    1. Wait for 30 seconds and check that K3s is running with the commands kubectl get pods -A and systemctl status k3s.service

    Import Docker images

    The steps below will manually deploy the necessary images to the cluster.

    1. Import Docker images locally:

    Install Helm charts

    The following steps guide you through the installation of the dependencies required by Focus and Synergy.

    Replace $VERSION with the version that is present in the bundle that has been downloaded. To check all the charts that have been downloaded, run ls charts.

    1. Install Getvisibility Essentials and set the daily UTC backup hour (0-23) for performing backups.

    1. Install Monitoring CRD:

    1. Install Monitoring:

    1. Check all pods are Running with the command: kubectl get pods -A

    Install Focus/Synergy Helm Chart

    Replace the following variables:

    • $VERSION with the version that is present in the bundle that has been downloaded

    • $RESELLER with the reseller code (either getvisibility or forcepoint)

    • $PRODUCT with the product being installed (synergy, focus, or enterprise)

    If you experience a 404 error when accessing Keycloak or the UI and are using the 1.26 (default) version of K3s, ensure that the Traefik patch is applied

    Install custom artifact bundles

    Models and other artifacts, like custom agent versions or custom consul configuration, can be shipped inside auto-deployable bundles. These bundles are Docker images that contain the artifacts to be deployed alongside scripts to deploy them. To create a new bundle or modify an existing one, follow this guide first (link to an internal Confluence). The list of all the available bundles is inside the bundles/ directory of the models-ci project on GitHub.

    After the model bundle is published, for example images.master.k3s.getvisibility.com/models:company-1.0.1, you’ll have to generate a public link to this image by running the k3s-air-gap Publish ML models GitHub CI task. The task will ask you for the Docker image URL.

    We are still using the models repo because the bundles were only used to deploy custom models at first.

    Once the task is complete you’ll get a public URL to download the artifact on the summary of the task. After that you have to execute the following commands.

    Replace the following variables:

    • $URL with the URL to the model bundle provided by the task

    • $BUNDLE with the name of the artifact, in this case company-1.0.1

    Now you’ll need to execute the artifact deployment job. This job will unpack the artifacts from the docker image into a MinIO bucket inside the on premise cluster and restart any services that use them.

    Replace the following variables:

    • $GV_DEPLOYER_VERSION with the version of the model deployer available under charts/

    • $BUNDLE_VERSION with the version of the artifact, in this case company-1.0.1

    You should be able to verify that everything went alright by locating the ml-model job that was launched. The logs should look like this:

    In addition you can enter the different services that consume these artifacts to check if they have been correctly deployed. For example for the models you can open a shell inside the classifier containers and check the /models directory or check the models-data bucket inside MinIO. Both should contain the expected models.
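    A hypothetical spot check from the command line, assuming the classifier pods carry an app=classifier label (adjust the selector to match your deployment):

    # List the deployed model files inside the first classifier pod
    kubectl exec $(kubectl get pods -l app=classifier -o name | head -n1) -- ls /models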


    Multiple Node Installation (High Availability)

    Prerequisites

    Firewall Rules for Internal Communication

    We recommend running the K3s nodes in a 10Gb low-latency private network for maximum security and performance.

    K3s needs the following ports to be accessible (Inbound and Outbound) by all other nodes running in the same cluster:

    The ports above should not be publicly exposed, as doing so would allow anyone to access your cluster. Make sure to always run your nodes behind a firewall/security group/private network that disables external access to the ports mentioned above.

    All nodes in the cluster must have:

    1. Domain Name Service (DNS) configured

    2. Network Time Protocol (NTP) configured

    3. Fixed private IPv4 address

    4. Globally unique node name (use --node-name when installing K3s in a VM to set a static node name)

    Firewall Rules for External Communication

    The following port must be publicly exposed in order to allow users to access the Synergy or Focus product:

    Users must not access the K3s nodes directly; instead, there should be a load balancer sitting between the end user and all the K3s nodes (master and worker nodes):

    The load balancer must operate at Layer 4 of the OSI model and listen for connections on port 443. After the load balancer receives a connection request, it selects a target from the target group (which can be any of the master or worker nodes in the cluster) and then attempts to open a TCP connection to the selected target (node) on port 443.

    The load balancer must have health checks enabled which are used to monitor the health of the registered targets (nodes in the cluster) so that the load balancer can send requests to healthy nodes only.

    The recommended health check configuration is:

    • Timeout: 10 seconds

    • Healthy threshold: 3 consecutive health check successes

    • Unhealthy threshold: 3 consecutive health check failures

    • Interval: 30 seconds

    VM Count

    At least 4 machines are required to provide high availability of the Getvisibility platform. The HA setup supports a single-node failure.

    Install K3s

    Make sure you have /usr/local/bin configured in your PATH (export PATH=$PATH:/usr/local/bin). All the commands must be executed as the root user.

    The commands have been tested on Ubuntu Server 20.04 LTS, SUSE Linux Enterprise Server 15 SP4 and RHEL 8.6.

    For RHEL, K3s needs the following package to be installed: k3s-selinux (repo rancher-k3s-common-stable) and its dependencies container-selinux (repo rhel-8-appstream-rhui-rpms) and policycoreutils-python-utils (repo rhel-8-baseos-rhui-rpms). Also, firewalld, nm-cloud-setup.service and nm-cloud-setup.timer must be disabled and the server restarted before the installation; refer to the K3s documentation for more information.

    The steps below guide you through the air-gap installation of K3s, a lightweight Kubernetes distribution created by Rancher Labs:

    1. Create at least 4 VMs with the same specs

    2. Extract the downloaded file: tar -xf gv-platform-$VERSION.tar to all the VMs

    3. Create a local DNS entry private-docker-registry.local across all the nodes resolving to the master1 node:
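    One simple way to do this, assuming 10.0.0.11 is the private IP address of master1, is to add an /etc/hosts entry on every node:

    echo "10.0.0.11  private-docker-registry.local" >> /etc/hosts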

    1. Prepare the K3s for air-gap installation files:

    1. Update the registries.yaml file across all the nodes.

    1. Install K3s in the 1st master node: To get started launch a server node using the cluster-init flag:

    Check for your first master node status, it should have the Ready state:

    Use the following command to copy the TOKEN from this node that will be used to join the other nodes to the cluster:

    Also, copy the IP address of the 1st master node which will be used by the other nodes to join the cluster.

    1. Install K3s in the 2nd master node:

    Run the following command and assign the contents of the file: /var/lib/rancher/k3s/server/node-token from the 1st master node to the K3S_TOKEN variable.

    Set --node-name to “master2”

    Set --server to the IP address of the 1st master node

    Check the node status:

    1. Install K3s in the 3rd master node:

    Run the following command and assign the contents of the file: /var/lib/rancher/k3s/server/node-token from the 1st master node to the K3S_TOKEN variable.

    Set --node-name to “master3”

    Set --server to the IP address of the 1st master node

    Check the node status:

    1. Install K3s in the 1st worker node: Use the same approach to install K3s and connect the worker node to the cluster. The installation parameters are different in this case. Run the following command and set --node-name to “worker1” (or “workerN” for the Nth worker node)

    Check the node status:

    Deploy Private Docker Registry and Import Docker images

    1. Extract and Import the Docker images locally to the master1 node

    1. Install gv-private-registry helm chart in the master1 node: Replace $VERSION with the version that is present in the bundle that has been downloaded. To check all the charts that have been downloaded, run ls charts.

    1. Tag and push the docker images to the local private docker registry deployed in the master1 node:
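    The exact image names depend on the bundle contents; as a hypothetical example for a single image, tagging and pushing with ctr could look like the following (repeat for each imported image):

    ctr -n=k8s.io images tag docker.io/getvisibility/example-service:1.0.0 private-docker-registry.local/getvisibility/example-service:1.0.0
    ctr -n=k8s.io images push --plain-http private-docker-registry.local/getvisibility/example-service:1.0.0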

    Install Helm charts

    The following steps guide you through the installation of the dependencies required by Focus and Synergy.

    Perform the following steps in the master1 Node

    Replace $VERSION with the version that is present in the bundle that has been downloaded. To check all the charts that have been downloaded, run ls charts.

    1. Install Getvisibility Essentials and set the daily UTC backup hour (0-23) for performing backups. If you are installing Focus or Enterprise, append --set eck-operator.enabled=true to the command in order to enable Elasticsearch.

    1. Install Monitoring CRD:

    1. Install Monitoring:

    1. Check all pods are Running with the command:

    Install Focus/Synergy Helm Chart

    Replace the following variables:

    • $VERSION with the version that is present in the bundle that has been downloaded

    • $RESELLER with the reseller code (either getvisibility or forcepoint)

    • $PRODUCT with the product being installed (synergy, focus, or enterprise)

    Install Kube-fledged

    Perform the following steps in the master1 node

    1. Install gv-kube-fledged helm chart. Replace $VERSION with the version that is present in the bundle that has been downloaded. To check all the charts that have been downloaded, run ls charts.

    1. Create and deploy imagecache.yaml

    Install custom artifacts

    Models and other artifacts, like custom agent versions or custom consul configuration can be shipped inside auto deployable bundles. The procedure to install custom artifact bundles on an HA cluster is the same as in the single node cluster case. Take a look at the guide for single-node clusters above.

    Upgrade

    View current values in config file for each chart

    • Before upgrading each chart, you can check the settings used in the current installation with helm get values <chartname>.

    • If the current values are different from the defaults, you will need to change the parameters of the helm upgrade command for the chart in question.

    • For example, if the backup is currently set to run at 2 AM instead of the 1 AM default, change --set backup.hour=1 to --set backup.hour=2
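    For example, to review the values currently set on the essentials chart before upgrading it:

    helm get values gv-essentials --kubeconfig /etc/rancher/k3s/k3s.yaml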

    Focus/Synergy/Enterprise Helm Chart

    To upgrade Focus/Synergy/Enterprise you must:

    1. Download the new bundle

    2. Import Docker images

    3. Install Focus/Synergy/Enterprise Helm Chart

    LINK TO INTERNAL CONFLUENCE

    1. Import Docker images only to the Master1 node

    2. In the case of an HA deployment, recreate and redeploy the imagecache.yaml file: perform the 2nd step of the Install Kube-fledged section

    GetVisibility Essentials Helm Chart

    To upgrade the GV Essential chart you must:

    1. Download the new bundle

    2. Import Docker images

    3. Run the command from Install Getvisibility Essentials under Install Helm charts section

    1. Import Docker images only to the Master1 node

    2. In the case of an HA deployment, recreate and redeploy the imagecache.yaml file: perform the 2nd step of the Install Kube-fledged section

    Install custom artifacts

    Models and other artifacts, like custom agent versions or custom consul configuration can be shipped inside auto deployable bundles. The procedure to upgrade custom artifact bundles is the same as the installation one, take a look at the guides above for single-node and multi-node installations.


    9796

    Prometheus node exporter

    TCP

    80

    Private Docker Registry

    Balance mode: round-robin
  • Below is a mostly default config.

    | Protocol | Port | Description |
    | --- | --- | --- |
    | TCP | 6443 | Kubernetes API Server |
    | UDP | 8472 | Required for Flannel VXLAN |
    | TCP | 2379-2380 | embedded etcd |
    | TCP | 10250 | metrics-server for HPA |

    | Protocol | Port | Description |
    | --- | --- | --- |
    | TCP | 443 | Focus/Synergy backend |


    # mkdir -p /var/lib/rancher/k3s/agent/images/
    # gunzip -c assets/k3s-airgap-images-amd64.tar.gz > /var/lib/rancher/k3s/agent/images/airgap-images.tar
    # cp assets/k3s /usr/local/bin && chmod +x /usr/local/bin/k3s
    # tar -xzf assets/helm-v3.8.2-linux-amd64.tar.gz
    # cp linux-amd64/helm /usr/local/bin
    # cat scripts/k3s.sh | INSTALL_K3S_SKIP_DOWNLOAD=true SKIP_PRECHECK=true K3S_KUBECONFIG_MODE="644" sh -s - server --node-name=local-01
    # mkdir /tmp/import
    # for f in images/*.gz; do IMG=$(basename "${f}" .gz); gunzip -c "${f}" > /tmp/import/"${IMG}"; done
    # for f in /tmp/import/*.tar; do ctr -n=k8s.io images import "${f}"; done
    # helm upgrade --install gv-essentials charts/gv-essentials-$VERSION.tgz --wait \
    --timeout=10m0s --kubeconfig /etc/rancher/k3s/k3s.yaml \
    --set backup.hour=1 \
    --set eck-operator.enabled=true \
    --set updateclusterid.enabled=false \
    --set eck-operator.settings.cpu=4 \
    --set eck-operator.settings.memory=20 \
    --set eck-operator.settings.storage=160
    # helm upgrade --install rancher-monitoring-crd charts/rancher-monitoring-crd-$VERSION.tgz --wait \
    --kubeconfig /etc/rancher/k3s/k3s.yaml \
    --namespace=cattle-monitoring-system \
    --create-namespace
    # helm upgrade --install rancher-monitoring charts/rancher-monitoring-$VERSION.tgz --wait \
    --kubeconfig /etc/rancher/k3s/k3s.yaml \
    --namespace=cattle-monitoring-system \
    --set k3sServer.enabled=true \
    --set k3sControllerManager.enabled=true \
    --set k3sScheduler.enabled=true \
    --set k3sProxy.enabled=true \
    --set prometheus.retention=5
    # helm upgrade --install gv-platform charts/gv-platform-$VERSION.tgz --wait \
    --timeout=10m0s --kubeconfig /etc/rancher/k3s/k3s.yaml \
    --set-string clusterLabels.environment=prod \
    --set-string clusterLabels.cluster_reseller=$RESELLER \
    --set-string clusterLabels.cluster_name=mycluster \
    --set-string clusterLabels.product=$PRODUCT
    # kubectl patch clusterrole traefik-kube-system -n kube-system --type='json' -p='[{"op": "add", "path": "/rules/-1/apiGroups/-", "value": "traefik.io"}]'
    # kubectl apply -f assets/traefik-patch.yaml
    # kubectl rollout restart deployment traefik -n kube-system
    mkdir custom
    wget -O custom/$BUNDLE.tar.gz $URL
    gunzip custom/$BUNDLE.tar.gz
    ctr -n=k8s.io images import models/$BUNDLE.tar
     helm upgrade \
     --install gv-model-deployer charts/gv-model-deployer-$GV_DEPLOYER_VERSION.tgz \
     --wait --timeout=10m0s --kubeconfig /etc/rancher/k3s/k3s.yaml \
     --set models.version="$BUNDLE_VERSION"
    root@ip-172-31-9-140:~# kubectl logs -f ml-model-0jvaycku9prx-84nbf
    Uploading models
    Added `myminio` successfully.
    `/models/AIP-1.0.0.zip` -> `myminio/models-data/AIP-1.0.0.zip`
    `/models/Commercial-1.0.0.zip` -> `myminio/models-data/Commercial-1.0.0.zip`
    `/models/Default-1.0.0.zip` -> `myminio/models-data/Default-1.0.0.zip`
    `/models/classifier-6.1.2.zip` -> `myminio/models-data/classifier-6.1.2.zip`
    `/models/lm-full-en-2.1.2.zip` -> `myminio/models-data/lm-full-en-2.1.2.zip`
    `/models/sec-mapped-1.0.0.zip` -> `myminio/models-data/sec-mapped-1.0.0.zip`
    Total: 0 B, Transferred: 297.38 MiB, Speed: 684.36 MiB/s
    Restart classifier
    deployment.apps/classifier-focus restarted
    root@ip-172-31-9-140:~# 
    cat >> /etc/hosts  << EOF
    <Master1_node_VM_IP>  private-docker-registry.local
    EOF
    $ mkdir -p /var/lib/rancher/k3s/agent/images/
    $ gunzip -c assets/k3s-airgap-images-amd64.tar.gz > /var/lib/rancher/k3s/agent/images/airgap-images.tar
    $ cp assets/k3s /usr/local/bin && chmod +x /usr/local/bin/k3s
    $ tar -xzf assets/helm-v3.8.2-linux-amd64.tar.gz && cp linux-amd64/helm /usr/local/bin
    $ mkdir -p /etc/rancher/k3s
    $ cp assets/registries.yaml  /etc/rancher/k3s/
    cat scripts/k3s.sh | INSTALL_K3S_SKIP_DOWNLOAD=true K3S_KUBECONFIG_MODE="644" sh -s - server --node-name=master1 --cluster-init
    kubectl get nodes
    cat /var/lib/rancher/k3s/server/node-token
    cat scripts/k3s.sh | K3S_TOKEN=$K3S_TOKEN INSTALL_K3S_SKIP_DOWNLOAD=true K3S_KUBECONFIG_MODE="644" sh -s - server --node-name=master2 --server https://<ip or hostname of any master node>:6443
    kubectl get nodes
    cat scripts/k3s.sh | K3S_TOKEN=$K3S_TOKEN INSTALL_K3S_SKIP_DOWNLOAD=true K3S_KUBECONFIG_MODE="644" sh -s - server --node-name=master3 --server https://<ip or hostname of any master node>:6443
cat scripts/k3s.sh | K3S_TOKEN=$K3S_TOKEN INSTALL_K3S_SKIP_DOWNLOAD=true K3S_KUBECONFIG_MODE="644" sh -s - agent --node-name=worker1 --server https://<ip or hostname of any master node>:6443
    kubectl get nodes
    $ mkdir /tmp/import
    $ for f in images/*.gz; do IMG=$(basename "${f}" .gz); gunzip -c "${f}" > /tmp/import/"${IMG}"; done
    $ for f in /tmp/import/*.tar; do ctr -n=k8s.io images import "${f}"; done
    $ helm upgrade --install  gv-private-registry charts/gv-private-registry-$VERSION.tgz --wait \
      --timeout=10m0s \
      --kubeconfig /etc/rancher/k3s/k3s.yaml
    $ sh scripts/push-docker-images.sh
    $ helm upgrade --install gv-essentials charts/gv-essentials-$VERSION.tgz --wait \
    --timeout=10m0s --kubeconfig /etc/rancher/k3s/k3s.yaml \
    --set global.high_available=true \
    --set eck-operator.enabled=true  \
    --set minio.replicas=4 \
    --set minio.mode=distributed \
    --set consul.server.replicas=3 \
    --set updateclusterid.enabled=false \
    --set backup.hour=1
    $ helm upgrade --install rancher-monitoring-crd charts/rancher-monitoring-crd-$VERSION.tgz --wait \
    --kubeconfig /etc/rancher/k3s/k3s.yaml \
    --namespace=cattle-monitoring-system \
    --create-namespace
    $ helm upgrade --install rancher-monitoring charts/rancher-monitoring-$VERSION.tgz --wait \
    --kubeconfig /etc/rancher/k3s/k3s.yaml \
    --set global.high_available=true \
    --namespace=cattle-monitoring-system \
    --set loki-stack.loki.replicas=2 \
    --set prometheus.prometheusSpec.replicas=2
    kubectl get pods -A
    $ helm upgrade --install gv-platform charts/gv-platform-$VERSION.tgz --wait \
    --timeout=10m0s --kubeconfig /etc/rancher/k3s/k3s.yaml \
    --set high_available=true \
    --set-string clusterLabels.environment=prod \
    --set-string clusterLabels.cluster_reseller=$RESELLER \
    --set-string clusterLabels.cluster_name=mycluster \
    --set-string clusterLabels.product=$PRODUCT
    $ helm upgrade --install gv-kube-fledged charts/gv-kube-fledged-$VERSION.tgz -n kube-fledged \
    --timeout=10m0s \
    --kubeconfig /etc/rancher/k3s/k3s.yaml \
    --create-namespace
    $ sh scripts/create-imagecache-file.sh
    $ kubectl apply -f scripts/imagecache.yaml

    DDR Supported Events

    A comprehensive list of the supported event types by Data Source for DDR

When DDR (also known as streaming) is enabled and events start coming in from the data source, there are two types of events:

    Informational Events

    Examples would be Read, View, etc.

    No actions are taken when these events are detected.

    Modification Events

    These are events that alter the file or the file permissions. Examples would include creating a file or user, changing a file name, etc.

    When these types of events are detected, a scan or rescan of the item will occur so that it can be classified.

    AWS IAM

    Events that Trigger (Re)Scan:

    Create Events:

    • CreateUser - A new user account is created.

    • CreateGroup - A new user group is created.

    • CreateRole - A new role is created with specific permissions.

    Update Events:

    • UpdateUser - Modifications are made to an existing user.

    • UpdateGroup - Changes are made to a group, such as adding or removing members.

    • UpdateRole - A role is updated with new permissions or settings.

    • AttachUserPolicy - A policy is attached to a user, modifying access rights.

    Delete Events:

    • DeleteUser - A user account is deleted.

    • DeleteGroup - A group is deleted along with its associated permissions.

    • DeleteRole - A role is deleted from IAM.

    Other Processed Events:

    Informational Events:

    • ConsoleLogin - A user logs in through the AWS console.

    • SignInFailure - A login attempt fails.

    • SignInSuccess - A login attempt is successful.

    • FederatedLogin - A user logs in via federated authentication.

    List of Processed AWS S3 Events

    Events that Trigger (Re)Scan:

    Create Events:

• s3:ObjectCreated:Put – A new object is uploaded to an S3 bucket.

    • s3:ObjectCreated:Post – A new object is uploaded to an S3 bucket by an HTTP POST operation.

    • s3:ObjectCreated:CompleteMultipartUpload – An object was created after a multipart upload operation.

    • s3:ObjectCreated:Copy – A new object is created by an S3 copy operation.

    Update Events:

    • s3:ObjectRestore:Post – A restore request for an archived object is initiated.

    • s3:ObjectRestore:Delete – A restore request for an archived object is deleted.

    • s3:ObjectAcl:Put – Access control settings for an object are updated.

    • s3:ObjectTagging:Put – Tags for an object are added or modified.

    Delete Events:

    • s3:ObjectRemoved:Delete – An object is deleted from an S3 bucket.

    • s3:ObjectRemoved:DeleteMarkerCreated – A delete marker is created for an object, marking it as deleted.

    • s3:LifecycleExpiration:Delete – An object is removed due to lifecycle rules.

    • s3:LifecycleExpiration:DeleteMarkerCreated – A delete marker is created due to lifecycle rules.

    Other Processed Events:

    Informational Events:

    • s3:ReducedRedundancyLostObject - An object stored in Reduced Redundancy Storage is lost.

    • s3:LifecycleTransition – An object is transitioned to a different storage class based on lifecycle rules.

    • s3:Replication:OperationFailedReplication – The replication operation for an object failed.

    • s3:Replication:OperationNotTracked – The replication operation for an object is not tracked.

    Azure Blob

    Events that Trigger (Re)Scan:

    Create Events:

    • Microsoft.Storage.BlobCreated - A new blob is created or content is updated in a storage container.

    • Microsoft.Storage.DirectoryCreated - A new directory is created in a storage container.

    Update Events:

    • Microsoft.Storage.BlobRenamed - A blob is renamed within a container.

    • Microsoft.Storage.DirectoryRenamed - A directory is renamed within a container.

    Delete Events:

    • Microsoft.Storage.BlobDeleted - A blob is deleted from a storage container.

    • Microsoft.Storage.DirectoryDeleted - A directory is deleted from a storage container.

    Other Processed Events:

    • Microsoft.EventGrid.SubscriptionValidationEvent - A subscription validation event.

    • Microsoft.Storage.BlobTierChanged - The storage tier of a blob is modified.

    • GetBlobServiceProperties - Retrieves properties of the Blob service.

    • GetContainerProperties - Retrieves properties of a storage container.

    List of Processed Azure Files Events

    Events that Trigger (Re)Scan:

    Create Events:

    • CreateFile - A new file is created in an Azure Files share.

    • CreateDirectory - A new directory is created in an Azure Files share.

    • CopyFile - A file is copied to a new location.

    Update Events:

    • SetFileProperties - The properties of a file are updated.

    • SetFileMetadata - Metadata of a file is updated.

    Delete Events:

    • DeleteFile - A file is deleted from an Azure Files share.

    • DeleteDirectory - A directory is deleted from an Azure Files share.

    Other Processed Events:

    • ListShares - Lists file shares in an account.

    • GetShareProperties - Retrieves properties of a file share.

    • GetShareMetadata - Retrieves metadata of a file share.

    • GetDirectoryProperties - Retrieves properties of a directory.

    Box

    Events that Trigger (Re)Scan:

    Create Events:

    • FILE.UPLOADED - A new file is uploaded.

    • FOLDER.CREATED - A new folder is created.

    • FILE.RESTORED - A previously deleted file is restored.

    • FOLDER.RESTORED - A previously deleted folder is restored.

    Update Events:

    • FILE.MOVED - A file is moved to a new location.

    • FILE.RENAMED - A file is renamed.

    • FOLDER.RENAMED - A folder is renamed.

    • FOLDER.MOVED - A folder is moved to a new location.

    Delete Events:

    • FILE.TRASHED - A file is moved to the trash.

    • FILE.DELETED - A file is permanently deleted.

    • FOLDER.TRASHED - A folder is moved to the trash.

    • FOLDER.DELETED - A folder is permanently deleted.

    Other Processed Events:

    • FILE.DOWNLOADED - A file is downloaded.

    • FOLDER.DOWNLOADED - A folder is downloaded.

    • FILE.COPIED - A file is copied to another location.

    • FOLDER.COPIED - A folder is copied to another location.

    Confluence Cloud

    Events that Trigger (Re)Scan:

    Create Events:

    • page_created - A new page is created in Confluence.

    • blogpost_created - A new blog post is created.

    • attachment_created - A new attachment is uploaded.

    Update Events:

    • page_updated - An existing page is modified.

    • blogpost_updated - A blog post is updated.

    • attachment_updated - An attachment is updated.

    Delete Events:

    • page_deleted - A page is deleted from Confluence.

    • blogpost_deleted - A blog post is deleted.

    • attachment_deleted - An attachment is removed.

    Other Processed Events:

    • All other events are categorized as informational.

    Gmail Events

    Events that Trigger (Re)Scan:

    Create Events:

    • MessagesAdded - A new email message is added.

    Update Events:

    • LabelsAdded - A label is added to an email.

    • LabelsRemoved - A label is removed from an email.

    Delete Events:

    • MessagesDeleted - An email message is deleted.

    Google Drive Events

    Events that Trigger (Re)Scan:

    Create Events:

    • create - A new file or folder is created.

    • upload - A new file is uploaded.

    Update Events:

    • edit - A file or folder is modified.

    • rename - A file or folder is renamed.

    • move - An item is moved to a different location.

    Delete Events:

    • delete - An item is permanently removed.

    • trash - An item is moved to the trash.

    Other Processed Events:

    • view - A file or folder is viewed.

    • download - A file is downloaded.

    • preview - A file is previewed.

    • print - A file is printed.

    Google IAM Events

    Events that Trigger (Re)Scan:

    Create Events:

    • create_group - A new group is created.

    • create_user - A new user is created.

    Update Events:

    • 2sv_disable - Two-step verification is disabled.

    • 2sv_enroll - Two-step verification is enrolled.

    • password_edit - A user's password is modified.

    • recovery_email_edit - A recovery email is changed.

    Delete Events:

    • delete_group - A group is deleted.

    • delete_user - A user is deleted.

    • archive_user - A user is archived.

    • unarchive_user - A user is unarchived.


    Other Processed Events:

    • login_success - A user successfully logs in.

    • login_failure - A login attempt fails.

    • login_challenge - A login challenge occurs.

    • application_login_failure - An application login fails.

    OneDrive and SharePoint Online Events

    Events that Trigger (Re)Scan:

    Create Events:

    • FileUploaded - A new file is uploaded.

    • FolderCreated - A new folder is created.

    • FileRestored - A previously deleted file is restored.

    • FolderRestored - A previously deleted folder is restored.

    Update Events:

    • FileModified - A file is modified.

    • FileMoved - A file is moved to a new location.

    • FileRenamed - A file is renamed.

    • FolderModified - A folder is modified.

    Delete Events:

    • FileDeleted - A file is permanently deleted.

    • FolderDeleted - A folder is permanently deleted.

    • FileRecycled - A file is moved to the recycle bin.

    • FolderRecycled - A folder is moved to the recycle bin.

    Other Processed Events:

    • FileAccessed - A file is accessed.

    • FileDownloaded - A file is downloaded.

    • FilePreviewed - A file is previewed.

    • FolderCopied - A folder is copied.

• DetachUserPolicy - A policy is removed from a user, altering permissions.

  • PutUserPolicy - A new policy is assigned to a user.

  • AttachGroupPolicy - A policy is attached to a group, affecting all its members.

  • DetachGroupPolicy - A policy is removed from a group.

  • PutGroupPolicy - A policy is assigned to a group.

  • AttachRolePolicy - A policy is attached to a role, modifying access rights.

  • DetachRolePolicy - A policy is removed from a role.

  • PutRolePolicy - A new policy is assigned to a role.

  • ChangePassword - A user changes their password.

  • AddUserToGroup - A user is added to a group, changing their access permissions.

  • RemoveUserFromGroup - A user is removed from a group.

  • SessionStart - A session begins.

  • SessionEnd - A session ends.

  • GenerateCredentialReport - A report on credentials is generated.

  • GetCredentialReport - A credential report is retrieved.

  • ListAccessKeys - Access keys for a user are listed.

  • ListUserTags - Tags associated with a user are retrieved.

  • ListUsers - Users within an AWS account are listed.

  • ListGroups - Groups within an AWS account are listed.

  • ListRoles - Roles within an AWS account are listed.

  • GetUser - Information about a specific user is retrieved.

  • GetGroup - Information about a specific group is retrieved.

  • GetRole - Information about a specific role is retrieved.

  • s3:ObjectRestore:Completed – An archived object has been fully restored and is now available.

• s3:ObjectTagging:Delete – Tags for an object are removed.

  • s3:Replication:OperationMissedThreshold – The replication operation did not meet its threshold requirements.

  • s3:Replication:OperationReplicatedAfterThreshold – The replication operation succeeded after surpassing the threshold.

  • s3:IntelligentTiering – An object is moved between storage tiers.

  • GetContainerServiceMetadata - Retrieves metadata for a storage container.

  • ListContainers - Lists storage containers in an account.

  • BlobPreflightRequest - A request to verify blob upload conditions.

  • ListBlobs - Lists blobs in a container.

  • GetBlobProperties - Retrieves properties of a blob.

  • GetBlobMetadata - Retrieves metadata associated with a blob.

  • GetBlockList - Retrieves the list of blocks in a blob.

  • GetContainerACL - Retrieves the access control list of a container.

  • GetContainerMetadata - Retrieves metadata for a container.

  • CopyBlob - Copies a blob from one location to another.

  • CopyBlobSource - Identifies the source blob for a copy operation.

  • CopyBlobDestination - Identifies the destination blob for a copy operation.

  • DeleteBlob - Deletes a blob from a container.

  • DeleteBlobSnapshot - Deletes a snapshot of a blob.

  • DeleteContainer - Deletes a storage container.

  • PutBlob - Uploads a new blob to a container.

  • PutBlock - Uploads a block for a blob.

  • PutBlockList - Commits a set of uploaded blocks as a blob.

  • CreateBlobSnapshot - Creates a snapshot of an existing blob.

  • CreateBlockBlob - Creates a new block blob.

  • CreateContainer - Creates a new storage container.

  • SetBlobMetadata - Updates metadata for a blob.

  • SetBlobProperties - Updates properties of a blob.

  • SetContainerMetadata - Updates metadata for a storage container.

  • SetContainerACL - Modifies the access control list of a container.

  • AcquireBlobLease - Acquires a lease on a blob.

  • ReleaseBlobLease - Releases a lease on a blob.

  • RenewBlobLease - Renews a lease on a blob.

  • BreakBlobLease - Breaks an active lease on a blob.

  • AcquireContainerLease - Acquires a lease on a container.

  • BreakContainerLease - Breaks an active lease on a container.

  • ChangeBlobLease - Changes an active lease on a blob.

  • ChangeContainerLease - Changes an active lease on a container.

  • RenewContainerLease - Renews a lease on a container.

  • UndeleteBlob - Restores a deleted blob.

  • GetFileProperties - Retrieves properties of a file.

  • ListDirectoriesAndFiles - Lists directories and files in a share.

  • GetFile - Retrieves a file from a share.

  • GetFileRangeList - Retrieves the range list of a file.

  • GetShareStats - Retrieves statistics for a file share.

  • CreateShare - Creates a new file share.

  • PutRange - Uploads a range of data to a file.

  • SetShareMetadata - Updates metadata for a file share.

  • SetShareProperties - Updates properties of a file share.

  • SetDirectoryMetadata - Updates metadata of a directory.

  • SetDirectoryProperties - Updates properties of a directory.

  • ResizeFile - Resizes an existing file.

  • SetFileTier - Sets the tier of a file.

  • SetShareQuota - Updates the quota of a file share.

  • SetShareACL - Updates the access control list of a file share.

  • SetDirectoryACL - Updates the access control list of a directory.

  • SetFileACL - Updates the access control list of a file.

  • DeleteShare - Deletes a file share.

  • AcquireShareLease - Acquires a lease on a file share.

  • ReleaseShareLease - Releases a lease on a file share.

  • RenewShareLease - Renews a lease on a file share.

  • BreakShareLease - Breaks an active lease on a file share.

  • ChangeShareLease - Changes an active lease on a file share.

  • StartCopyFile - Initiates a file copy operation.

  • AbortCopyFile - Cancels an ongoing file copy operation.

  • CopyFileSource - Specifies the source file in a copy operation.

  • CopyFileDestination - Specifies the destination file in a copy operation.

  • CreateShareSnapshot - Creates a snapshot of a file share.

  • DeleteShareSnapshot - Deletes a snapshot of a file share.

  • UndeleteShare - Restores a deleted file share.

  • UndeleteFile - Restores a deleted file.

  • UndeleteDirectory - Restores a deleted directory.

  • RenameFile - Renames a file within a share.

  • RenameFileSource - Specifies the source file in a rename operation.

  • RenameFileDestination - Specifies the destination file in a rename operation.

  • RenameDirectory - Renames a directory within a share.

  • RenameDirectorySource - Specifies the source directory in a rename operation.

  • RenameDirectoryDestination - Specifies the destination directory in a rename operation.

  • COLLABORATION.CREATED - A collaboration event is created.

  • COLLABORATION.REMOVED - A collaboration is removed.

  • COLLABORATION.UPDATED - A collaboration is updated.

  • SHARED_LINK.CREATED - A shared link is created.

  • SHARED_LINK.UPDATED - A shared link is updated.

  • SHARED_LINK.DELETED - A shared link is deleted.

  • FILE.LOCKED - A file is locked for editing.

  • FILE.UNLOCKED - A file is unlocked for editing.

  • COMMENT.CREATED - A comment is added to a file.

  • COMMENT.UPDATED - A comment is updated.

  • COMMENT.DELETED - A comment is deleted.

  • METADATA_INSTANCE.CREATED - A metadata instance is created.

  • METADATA_INSTANCE.UPDATED - A metadata instance is updated.

  • METADATA_INSTANCE.DELETED - A metadata instance is deleted.

  • TASK_ASSIGNMENT.CREATED - A task is assigned.

  • TASK_ASSIGNMENT.UPDATED - A task assignment is updated.

  • SIGN_REQUEST.COMPLETED - A signature request is completed.

  • SIGN_REQUEST.DECLINED - A signature request is declined.

  • SIGN_REQUEST.EXPIRED - A signature request expired.

  • SIGN_REQUEST.SIGNER_EMAIL_BOUNCED - A signature request email bounced.

• access_item_content - An item’s content is accessed.

  • sync - A file or folder is synced.

  • request_access - Access to an item is requested.

  • approval_requested - An approval request is sent.

  • approval_completed - An approval request is completed.

  • approval_canceled - An approval request is cancelled.

  • approval_comment_added - A comment is added to an approval request.

  • approval_due_time_change - The due time for an approval request is changed.

  • approval_reviewer_change - The reviewer of an approval request is changed.

  • approval_reviewer_responded - A reviewer responds to an approval request.

  • deny_access_request - An access request is denied.

  • expire_access_request - An access request expires.

  • change_owner - The owner of an item is changed.

  • change_document_access_scope - The access scope of a document is changed.

  • change_document_visibility - The visibility of a document is changed.

  • change_acl_editors - The list of editors for a document is modified.

  • change_user_access - User access permissions are modified.

  • shared_drive_membership_change - Membership in a shared drive is changed.

  • shared_drive_settings_change - Shared drive settings are modified.

  • apply_security_update - Security updates are applied.

  • shared_drive_apply_security_update - A security update is applied to a shared drive.

  • shared_drive_remove_security_update - A security update is removed from a shared drive.

  • remove_security_update - A security update is removed.

  • enable_inherited_permissions - Inherited permissions are enabled.

  • disable_inherited_permissions - Inherited permissions are disabled.

  • recovery_phone_edit - A recovery phone number is changed.

  • recovery_secret_qa_edit - A recovery question or answer is changed.

  • account_disabled_password_leak - A user account is disabled due to a password leak.

  • account_disabled_generic - A user account is disabled.

  • account_disabled_spamming - A user account is disabled due to spamming.

  • account_disabled_spamming_through_relay - A user account is disabled for spamming via relay.

  • accept_invitation - A user accepts an invitation.

  • add_info_setting - An informational setting is added.

  • add_member - A new member is added to a group.

  • add_member_role - A role is assigned to a member.

  • add_security_setting - A security setting is added.

  • add_service_account_permission - A permission is assigned to a service account.

  • approve_join_request - A join request is approved.

  • ban_member_with_moderation - A member is banned.

  • change_info_setting - An informational setting is modified.

  • change_security_setting - A security setting is changed.

  • change_group_setting - A group setting is modified.

  • change_group_name - A group's name is changed.

  • change_first_name - A user's first name is changed.

  • change_password - A user's password is changed.

  • suspend_user - A user is suspended.

  • unsuspend_user - A user is unsuspended.

  • update_group_settings - A group's settings are updated.

  • user_license_assignment - A license is assigned to a user.

  • user_license_revoke - A license is revoked from a user.

  • add_group_member - A member is added to a group.

  • remove_group_member - A member is removed from a group.

  • change_user_access - User access permissions are changed.

  • change_acl_editors - The list of editors for a document is changed.

  • application_login_success - An application login succeeds.

  • alert_center_view - The alert center is accessed.

  • request_to_join - A request to join a group is sent.

  • request_to_join_via_mail - A request to join a group via email is sent.

  • approval_requested - An approval request is made.

  • approval_canceled - An approval request is canceled.

  • approval_comment_added - A comment is added to an approval request.

  • approval_completed - An approval request is completed.

  • approval_due_time_change - The due time of an approval request is changed.

  • approval_reviewer_change - The reviewer of an approval request is changed.

  • approval_reviewer_responded - A reviewer responds to an approval request.

  • deny_access_request - An access request is denied.

  • expire_access_request - An access request expires.

  • shared_drive_membership_change - Membership in a shared drive is changed.

  • shared_drive_settings_change - Shared drive settings are changed.

  • apply_security_update - A security update is applied.

  • remove_security_update - A security update is removed.

  • shared_drive_apply_security_update - A security update is applied to a shared drive.

  • shared_drive_remove_security_update - A security update is removed from a shared drive.

  • suspicious_login - A suspicious login is detected.

  • suspicious_login_less_secure_app - A suspicious login from a less secure app is detected.

  • suspicious_programmatic_login - A suspicious programmatic login is detected.

  • user_signed_out_due_to_suspicious_session_cookie - A user is signed out due to a suspicious session cookie.

• FolderMoved - A folder is moved to a new location.

  • FolderRenamed - A folder is renamed.

  • FileSensitivityLabelChanged - A file's sensitivity label is modified.

  • FileSensitivityLabelApplied - A sensitivity label is applied to a file.

  • SharingSet - Sharing permissions are updated.

  • AddedToGroup - A user is added to a group.

  • SiteDeleted - A SharePoint site is deleted.

  • GroupRemoved - A group is removed.

• FileCopied - A file is copied.

  • SharedLinkCreated - A shared link is created.

  • SharedLinkDisabled - A shared link is disabled.

  • SharingInvitationAccepted - A sharing invitation is accepted.

  • SharingRevoked - A sharing invitation is revoked.

  • AnonymousLinkCreated - An anonymous link is created.

  • SecureLinkCreated - A secure link is created.

  • SecureLinkUpdated - A secure link is updated.

  • SecureLinkDeleted - A secure link is deleted.

  • AccessInvitationAccepted - An access invitation is accepted.

  • AccessInvitationRevoked - An access invitation is revoked.

  • AccessRequestApproved - An access request is approved.

  • AccessRequestRejected - An access request is rejected.

  • FileCheckOutDiscarded - A file checkout is discarded.

  • FileCheckedIn - A file is checked in.

  • FileCheckedOut - A file is checked out.

  • SharingInheritanceBroken - Sharing inheritance is broken.

  • AddedToSecureLink - A user is added to a secure link.

  • RemovedFromSecureLink - A user is removed from a secure link.

  • SiteCollectionCreated - A new SharePoint site collection is created.

GQL

    GQL (Getvisibility Query Language) is a query language designed to enhance the flexibility and efficiency of querying data through the DSPM+, DDC, and EDC platforms. It enables the creation of custom queries without the need for hard coding, significantly simplifying the process of filtering through and analysing data.

    Based on Apache Lucene query language, GQL supports boolean, term, and range queries. This flexibility allows the language to seamlessly integrate with the platform’s Analytics software to produce elegant and insightful visualisations.

    Once mastered, GQL offers maximum flexibility, enabling both broad and precise data analysis.

    Usage

    Terms

There are separate sets of terms used for the different datasets within the DSPM+, DDC, and EDC platforms. Each dataset allows for unique GQL terms relating to its data:

    • Files: Unstructured data discovered and classified on-prem and in the cloud file storage locations. GQL term examples: path, ingestedAt, flow

• Trustees: Users and groups that are discovered in on-prem and in cloud IAM systems. GQL term examples: type, isAdmin, outdatedPassword

    • Activity: User activities tracked by the endpoint classification platform. GQL term examples: recipients, operation, agentId

    • Management: Administrative data from individual classification endpoints. GQL term examples: lastSeen, status, os

    • Remediation: Actions initiated by users to remediate issues. GQL term examples: actionType, errorReason, updatedPermissions

    • Streaming: Real-time events from DDR functionality. GQL term examples: source, eventTime, eventId

    • Extended Streaming: Similar to Streaming, but with a longer delay before events appear in the UI; some event types may also contain more detail than in the Streaming dataset. GQL term examples: tenantId, rawEventType, streamingEventType

    • File Audit Log: List of events that are associated with the scanning of files. GQL term examples: source, functionalityName, moduleName

    For the full sets of terms, see the tables below.
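    For example, a Trustees query for enabled administrator accounts whose passwords are outdated could be written as in the sketch below. This is only illustrative: it assumes boolean terms are filtered with =true / =false, and the real-time suggestions in the UI will confirm the exact form.

    isAdmin=true AND outdatedPassword=true AND isEnabled=true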

    Operations

    Operations are performed on or between terms to help filter data. The available operations are:

• AND Combines queries to match items meeting all conditions

    • OR Matches items meeting any listed conditions

    • () Groups queries to clarify operation order

    • = Equal to

    • != Not equal to

    • > Greater than

    • < Less than

    • >= Greater than or equal to

    • <= Less than or equal to

    • EXISTS

    • NOT_EXISTS

    Formation

Queries are formed using terms, their values, and operations. They can be as simple as a query looking for high-risk HR data:

    dataAttributeName=HR AND risk=2

    Or as complex as a query specifying Health, Safety, and Compliance documents as a data asset in DSPM:

    complianceTag=PII AND dataAttributeName=HR AND (dataAttributeName=Record OR
    dataAttributeName=Legal OR dataAttributeName=Safety) AND
    (detectorHits="Health Insurance" OR detectorHits="Risk assessment" OR
    detectorHits="Policy and Procedure" OR detectorHits="Compliance report" OR
    detectorHits="Safety Policies" OR detectorHits="Security Policies")

    The UI will give suggestions as you type to help out.

    You should experiment with GQL queries across various platform interfaces. See what works and what doesn't. Get creative and let the real-time suggestions assist you. Remember, you can save the queries you create as bookmarks for future use.

To save a bookmark:

    1. Click on the star

    2. Enter a description, select Accept

    3. The bookmark is saved

    4. Scroll down to see saved bookmarks

    Dates

    Queries can be created that incorporate dates. These can include exact dates and times or ranges. Date types include: createdAt, lastModifiedAt, and ingestedAt.

    GQL will provide suggestions for common time intervals such as minutes, days, months, and years.

    Once a date type has been selected and an operation associated with it, a date interface will be presented to the user. Simply search for and select the appropriate date to create the query.

    Date ranges

If a specific range of dates is needed, for example all files created in May 2022, the following method should be used.

    This method will search for files whose creation dates are greater than or equal to midnight on the 1st of May 2022 and less than midnight on the 1st of June 2022.

1. Type createdAt>= and select the first date

    2. Select AND

    3. Type createdAt< and select the closing date

    4. Hit enter or the search icon and the query will filter the results

    This method can be used with any date data type. It can be as granular as seconds or as broad as years.
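    As a sketch, the completed query for the May 2022 example above would look something like the following; the exact date literal is produced by the date picker, so the format shown here is only indicative.

    createdAt>=2022-05-01 AND createdAt<2022-06-01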

    Aggregation

    When creating or editing widgets such as counters, charts, or maps in the Analytics boards you will have the ability to aggregate some of the terms in the datasets. For example: you can use counts to show critical shared files, group by file type when displaying classification results, or use multiple groupings to create more complex visualisations.

While aggregations are not strictly part of GQL yet, they are useful to know, as they help in constructing more descriptive visualisations.

    GQL glossaries

    • GQL Term: Used in the query

    • Label: Displayed in the interface

    • Type: Data type of the term

    • Aggregation: Grouping types that are available to that term, only in the Analytics boards

    Files Dataset

    Unstructured data discovered and classified from on-prem and cloud file storage locations.

    GQL Term
    Label
    Type
    Description
    Aggregation

    Trustees dataset

    Users and groups that are discovered in on-prem and in cloud IAM systems

    GQL Term
    Label
    Type
    Description
    Aggregation

    Activity dataset

    User activities tracked by endpoint classification.

    GQL Term
    Label
    Type
    Description
    Aggregation

    Management dataset

    Administrative data from individual classification endpoints

    GQL Term
    Label
    Type
    Description
    Aggregation

    Remediation Dataset

    List of all remediation events initiated by users.

    GQL Term
    Label
    Type
    Description
    Aggregation

    Streaming Data Set

Real-time events from DDR

    GQL Term
    Label
    Type
    Description
    Aggregation

    Extending Streaming

Similar to the Streaming events dataset but usually with more detail; however, these events do not appear in the UI as quickly.

    GQL Term
    Label
    Type
    Description
    Aggregation

    File Audit Log

    Events associated with scanning of files.

    GQL Term
    Label
    Type
    Description
    Aggregation



path

    Path

    STRING

    The path of the document

    contentLength

    Content length

    LONG

    The size of the document in bytes

    count,

    sum,

    average,

    min, max,

    median,

    Can be grouped

    risk

    Risk

    NUMBER

    The document risk factor. low=0,

    medium=1,

    high=2

    source

    Source

    STRING

    The source of the document

    Can be grouped

    createdAt

    Created at

    DATE

    The document creation date

    min, max, median,

    Can be grouped

    lastModifiedAt

    Last modified at

    DATE

    The document last modified date

    min, max, median,

    Can be grouped

    ingestedAt

    Ingested at

    DATE

    The document ingested date

    min, max, median,

    Can be grouped

    lastAccessedAt

    Last accessed date

    DATE

    The document last accessed date

    min, max, median, Can be grouped

    flow

    Flow

    STRING

    The document current flow stage

    Can be grouped

    classification

    Classification

    STRING

    The classification of the document

    Can be grouped

    classificationConfidence

    Classification confidence

    DOUBLE

    The classification confidence of the document

    configurationIds

    Configuration Id

    STRING

    The configuration id of the document

    connectorId

    Connector name

    STRING

    Name of the scan connector

    Can be grouped

    sensitive

    Scan Trigger

    BOOLEAN

    The document sensitive flag

    scanTrigger

    Manual Classification

    BOOLEAN

    The trigger of the file scanning

    critical

    Critical

    BOOLEAN

    The document critical flag

    md5

    Document hash

    STRING

    The hash value of the document

    Can be grouped

    keywordHits

    Keyword Hits

    STRING

    The keyword hits of the document

    Can be grouped

    detectorHits

    Detector Hits

    STRING

    The detector hits of the document

    Can be grouped

    directPermissionsId

    Direct permissions ids

    STRING

    Direct permissions ids of the document

    Can be grouped

    indirectPermissionsId

    Indirect Permission Id

    STRING

(Experimental) The Id of a trustee that has access to the document, either directly or indirectly

    indirectPermissions

    Indirect Permissions

    STRING

(Experimental) The name of a trustee that has access to the document, either directly or indirectly

    trusteeName

    Trustee Name

    STRING

    The name of an owner of the document

    Can be grouped

    trusteeLoginName

    Trustee Login Name

    STRING

    The login name of the owner of the document

    dataAttributeName

    Data Attribute Name

    STRING

    The data attribute of the document

    Can be grouped

    distributionTags

    Distribution Tag Name

    STRING

    The distribution tag of the document

    Can be grouped

    keyword

    Keyword

    STRING

    Keyword of the document

    Can be grouped

    complianceTag

    Compliance Tag

    STRING

    Compliance Tag of the document

    Can be grouped

    location

    Location

    STRING

    To get Documents by connection location

    Can be grouped

    language

    Language

    STRING

    The document language

    externalSharedLink

    External Shared Link

    BOOLEAN

    The document sharing status

    ownerId

    Owner

    Identifier

    STRING

    The document owner identifier

    Can be grouped

    downloadUrl

    Download URL

    STRING

    The download URL of the document

    machineName

    Machine Name

    STRING

    The machine name of the endpoint where the document originated

    Can be grouped

    cloudLabels

    Cloud Labels

    STRING

    The document cloud labels

    Can be grouped

    dataAssets

    Data asset

    STRING

    The document data asset

    Can be grouped

    departments

    Department

    STRING

    The document department

    Can be grouped

    dataOwners

    Data Owner

    STRING

    Data Owner Name

    Can be grouped

    dynamicAttributes

    Dynamic attribute

    STRING

    The document dynamic attribute

    Can be grouped

    sourceEntityId

    Source entity identifier

    STRING

    The document id as reported from the source system

    Can be grouped

type

    Trustee Type

    STRING

    Indicates whether the trustee is a user or a group

    Can be grouped

    source

    Source

    STRING

    The type of the connector

    Can be grouped

    name

    Login Name

    STRING

    Login name of the trustee

    Can be grouped

    displayName

    Display Name

    STRING

    The display name of the trustee (user or group)

    Can be grouped

    isEnabled

    Enabled Status

    BOOLEAN

    Indicates if the trustee is enabled

    isAdmin

    Admin Privileges

    BOOLEAN

    Indicates if the trustee has administrator privileges

    outdatedPassword

    Outdated Password

    BOOLEAN

    Indicates if the trustee's password is outdated

    min, max, median, Can be grouped

    lastLoginAt

    Last Login Date

    DATE

    The timestamp when the trustee (user or group) logged in

    min,

    max,

    median,

    average,

    Can be grouped

    lastModifiedAt

    Last Modified Date

    DATE

    The timestamp when the trustee's (user or group) record was modified

    min,

    max,

    median,

    average

    createdAt

    Creation Date

    DATE

    The time trustee was created

    min,

    max,

    median,

    average

    ingestedAt

    Ingestion Date

    DATE

    The timestamp when the trustee (user or group) was ingested

    min, max, median, Can be grouped

    configurationId

    Configuration ID

    STRING

    Configuration ID associated with the trustee (user or group)

    Can be grouped

    isActive

    Active Status

    BOOLEAN

    Indicates if the trustee is active

recipients

    Email Recipients

    STRING

    The recipients of the email

    Can be grouped

    senderEmail

    Email Sender

    STRING

    The sender of the email

    Can be grouped

    subject

    Email Subject

    STRING

    The subject of the email

    Can be grouped

    operation

    Operation Type

    STRING

    The type of the operation performed

    Can be grouped

    eventTime

    Event Time

    DATE

    The time when the event occurred

    min, max,

    median,

    Can be grouped

    ipAddress

    IP Address

    STRING

    The IP address of the machine where the activity was performed

    Can be grouped

    eventType

    Event Type

    STRING

    The event type of the activity performed (i.e. Activity with File, Email)

    Can be grouped

    hostName

    Host Name

    STRING

    The hostname of the machine where the activity was performed

    Can be grouped

    department

    Department

    STRING

    The department of the user who performed the activity

    Can be grouped

    agentId

    Agent Id

    STRING

    The identification of the agent who performed the activity

    Can be grouped

    entityId

    Agent

    STRING

    Unique identifier of the machine

    Can be grouped

    user

    User

    STRING

    The username of the individual who performed the activity

    Can be grouped

    contentLength

    File Size

    BYTES

    The size of the file involved in the activity

    sum, average, min, max, median,

    Can be grouped

    mimeType

    File Type

    STRING

    The MIME type of the file

    Can be grouped

    fileName

    File Name

    STRING

    The name of the file

    Can be grouped

    filePath

    File Path

    STRING

    The path of the file

    Can be grouped

    creationTime

    Created At

    DATE

    The time when the file involved in the activity was created

    min, max,

    median,

    Can be grouped

    lastModificationTime

    Last Modified At

    DATE

    The last time the file involved in the activity was changed

    min, max, median,

    Can be grouped

    tags

    Tags

    STRING

    Classification tags

    Can be grouped

    classificationTag

    Classification Tag

    STRING

    Classification Tag of the document

    Can be grouped

    distributionTag

    Distribution Tag

    STRING

    The distribution tag of the document

    Can be grouped

    complianceTag

    Compliance Tag

    STRING

    Compliance Tag of the document

    Can be grouped

    senderDomain

    Sender Domain

    STRING

    Sender Domain of email

    Can be grouped

    recipientDomain

    Recipient Domain

    STRING

    Recipient Domain of email

    Can be grouped

domain

    Domain

    STRING

    Shows the Active Directory domain name, if applicable

    Can be grouped

    ipAddress

    IP Address

    STRING

    Shows the IP address last recorded when the device was active

    Can be grouped

    status

    Online Status

    STRING

    Shows whether the device is currently online or offline

    Can be grouped

    user

    User Name

    STRING

    Displays the name of the last user who logged into the device

    Can be grouped

    version

    Agent Version

    STRING

    The version of the agent software currently installed on the device

    Can be grouped

    os

    OS

    STRING

    Indicates the operating system of the device, either Windows or Mac

    Can be grouped

    deviceId

    Device ID

    STRING

    Displays the ID of the device

    department

    Department

    STRING

    Displays the department the agent belongs to

    Can be grouped

actionType

    Action type

    STRING

    Action type

    Can be grouped

    errorReason

    Error Reason

    STRING

    Error Reason

    Can be grouped

    updatedAt

    Updated At

    STRING

    Modification date of the file

    min,

    max,

    median,

    Can be grouped

    updatedPermissions

    Updated permission

    STRING

    Permissions that were revoked

Can be grouped

    updatedTrustees

    Updated trustees

    STRING

    Updated trustees

    Can be grouped

    createdAt

    Created At

    DATE

    Creation date of remediation request

    min,

    max,

    median,

    Can be grouped

    path

    Source file path

    STRING

    Path of the source file

    Can be grouped

    errorReasonFull

    Full Error Reason

    STRING

    Full error reason message

    Can be grouped

    targetSource

    Target connector type

    STRING

    Connector type of the target file

    Can be grouped

    targetPath

    Target file path

    STRING

    Path of the target file

    Can be grouped

    createdBy

    Created By - ID

    STRING

    ID of the user who initiated remediation

    Can be grouped

    createdByName

    Created By - Username

    STRING

    Name of the user who initiated remediation

    Can be grouped

    fileId

    File ID

    STRING

    The Id of the file

    Can be grouped

    configurationIds

    Source configuration id

    STRING

    Configuration id of the source file

    Can be grouped

    batchRequestId

    ID of a batch request

    STRING

    ID of a batch request related to the event

    Can be grouped

    targetConfigurationIds

    Target configuration id

    STRING

    Configuration id of the target file

    Can be grouped

actionType

    Action type

    STRING

    Type of the action

    Can be grouped

    eventTime

    Event time

    DATE

    The time of the event

    min,

    max,

    median,

    Can be grouped

    userName

    Username

    STRING

    Name of the user

    Can be grouped

    userId

    User ID

    STRING

    Id of the user

Can be grouped

    fileName

    File name

    STRING

    File name

    Can be grouped

    eventId

    Id

    STRING

    Id of a file

    Can be grouped

    path

    File path

    STRING

    File path

    Can be grouped

    configurationIds

    Scan configuration ID

    STRING

    Scan configuration ID

    Can be grouped

    fileId

    fileId

    STRING

    Id of the file

    Can be grouped

scanConfigurationId

    Scan Configuration ID

    STRING

    The identifier of the scan configuration

    Can be grouped

    rawEventType

    Raw Event Type

    STRING

    The type of the raw event

    Can be grouped

    streamingEventType

    Streaming Event Type

    STRING

    The type of the streaming event

    Can be grouped

    actionType

    Action Type

    STRING

    The type of action performed

Can be grouped

    entityId

    Entity ID

    STRING

    The identifier of the entity

    Can be grouped

    entityName

    Entity Name

    STRING

    The name of the entity

    Can be grouped

    fileId

    File ID

    STRING

    The identifier of the file

    Can be grouped

    sourcePath

    Source Path

    STRING

    The source path of the file

    path

    Path

    STRING

    The path of the file

    connectorType

    Connector Type

    STRING

    The type of connector used

    Can be grouped

    userId

    User ID

    STRING

    The identifier of the user

    Can be grouped

    userName

    User Name

    STRING

    The name of the user

    Can be grouped

    timestamp

    Timestamp

    DATE

    The time when the event occurred

    min, max, median,

    Can be grouped

functionalityName

    Functionality Name

    STRING

    The name of the functionality associated with the event

    Can be grouped

    moduleName

    Module Name

    STRING

    Name of the module associated with the event

    Can be grouped

    createdAt

    Created At

    STRING

    Creation date of the file

    min,

    max,

    median,

    Can be grouped

    message

    Message

    STRING

    Message

Can be grouped

    path

    File path

    STRING

    Path of the file

    Can be grouped

    fileId

    File ID

    STRING

    The Id of the file associated with the event

    Can be grouped

    scanId

    Scan ID

    STRING

    Id of the scan associated with the event

    Can be grouped

    configurationId

    Configuration ID

    STRING

    ID of the configuration associated with the event

    Can be grouped

    fileId

    Id

    STRING

    The internal Id of the document

    fileType

    File Type

    STRING

    The type of the document

    Can be grouped

    trusteeId

    Trustee ID

    STRING

    The Id of the trustee

    Can be grouped

    groups

    Groups

    STRING

    The groups that the trustee is a member of

    Can be grouped

    spanId

    Span ID

    STRING

    The span ID of the activity performed

    Can be grouped

    traceId

    Trace ID

    STRING

    The trace ID of the activity performed

    Can be grouped

    lastSeen

    Last Seen

    DATE

    The last time the device was observed to be online

    min, max, median,

    Can be grouped

    hostName

    Host Name

    STRING

    The identification of the agent who performed the activity

    Can be grouped

    source

    Source connector type

    STRING

    Connector type of the source file

    Can be grouped

    status

    Status

    STRING

    Status of the remediation request

    Can be grouped

    source

    Source

    STRING

    Source of the event

    Can be grouped

    eventType

    Event type

    STRING

    Type of the event

    Can be grouped

    id

    ID

    STRING

    The unique identifier of the streaming event

    Can be grouped

    tenantId

    Tenant ID

    STRING

    The identifier of the tenant

    Can be grouped

    source

    Source connector type

    STRING

    Connector type associated with the file

    Can be grouped

    eventType

    Event Type

    STRING

    Type of the event

    Can be grouped

