Data Security Posture Management (DSPM) is a methodology that allows organisations to locate and classify sensitive data across their data stores, assess the risks associated with this data’s exposure based on its sensitivity, and implement access controls for this data.
Data Discovery and Classification: Automatically identifying and categorising data across an organisation’s data stores, particularly those in the cloud, based on sensitivity and regulatory requirements.
Risk Assessment: Evaluating the security risks associated with both structured and unstructured data, considering factors such as exposure, access patterns and potential impact if breached.
Access and Permissions Management: Ensuring that access to data is strictly controlled and that permissions are granted based on the principle of least privilege.
Data Protection: Recommending security measures such as encryption, data masking and access controls to protect sensitive data.
Monitoring and Alerting: Continuously monitoring data access and usage to detect suspicious activities or policy violations, with real-time alerts to facilitate rapid response.
Basic information on the Getvisibility Query Language
GQL: Query language
Based on: Apache Lucene
Supports: Boolean, term, and range queries
Use: For custom queries without hard coding
Choose terms from specific dataset: Files, Trustees, Activity, Management
Apply operations like AND, OR, =, !=, >, <, >=, <= to filter data
Form queries, e.g., flow=classification AND risk>=1.
Simple: dataAttributeName=HR
Complex: complianceTag=PII AND dataAttributeName=HR AND (dataAttributeName=Record OR dataAttributeName=Legal) AND (detectorHits="Health Insurance" OR detectorHits="Compliance report")
Use in widgets for counters, charts, maps
Aggregate terms for complex visualisations
K3s uses Flannel to allow pod-to-pod communication between different hosts. Flannel is a lightweight provider of layer 3 network fabric that implements the Kubernetes Container Network Interface (CNI); it is what is commonly referred to as a CNI plugin.
Flannel supports multiple backends for encapsulating packets. By default K3s uses Virtual Extensible LAN (VXLAN), which runs a Layer 2 network on top of a Layer 3 infrastructure. VXLAN uses in-kernel VXLAN to encapsulate the packets using UDP on port 8472.
During one of our HA setups we noticed, after running tcpdump -leni any -w output.pcap, that the UDP packets were not arriving at the destination host, and we had to change the Flannel backend from VXLAN to host-gw, which uses IP routes to pod subnets via node IPs.
To use the host-gw backend you need to execute the following steps on all the nodes:
Two-factor authentication (2FA) enhances security by requiring users to provide two forms of identification before they are granted access. This method adds a layer of protection to the standard username and password method, making it significantly more challenging for potential intruders to gain unauthorised access.
Implementing 2FA in Keycloak helps organizations bolster their defences against data breaches and unauthorized access, which is crucial for protecting sensitive data in today’s digital landscape.
Go to the 'Authentication' tab and click on the 'browser' flow
A VM or server with the following specifications:
16 x CPU cores (x86_64 processor with a speed of 2.2 GHz or more). The CPU must support the SSE4.1, SSE4.2, AVX, AVX2 and FMA instructions
64GB RAM
700GB Free SSD disk. K3s will be installed in /var/lib/rancher so space should be allocated there. We also need 10-20 GB free space at / and /var.
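Where needed, these requirements can be verified up front with standard Linux tools; a brief sketch (assuming the partitions already exist on the host):
# The output should list all five required CPU flags
grep -o -w -E 'sse4_1|sse4_2|avx|avx2|fma' /proc/cpuinfo | sort -u
# Check free space on the partitions used by K3s and the OS
df -h /var/lib/rancher /var /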
How to access the Customer Support Portal and submit a ticket
To access and use the Portal, please follow the below steps:
Access the Portal by visiting
If an account has not yet been created (this is usually sent via email upon first contact with ), select the 'Sign Up' option located in the top right corner of the screen.
When the email is received, use the URL provided in the email to set a new password. After setting your password, the 'Login
Where to find the risk calculation rules.
To view the Critical and Sensitive Classification rule configuration from the Dashboard, click on Administration > Detectors > Attributes Detectors
Here the Critical and Sensitive attributes configuration can be viewed.
The Critical & Sensitive rules for Risk calculation can be re-configured by clicking the pencil icon beside the rule.
The Risk rules are based on a GQL query which can be changed by clicking on the pencil icon on the right or by importing a JSON file using the “Import from file” function.
Once the edits are saved, the Sensitive and Critical Flags will update automatically. For the Risks to be recalculated, a rescan is needed.
Ubuntu 20.04 LTS Server OS is recommended. RHEL 8.6, 8.7, 8.8, & 9.2, and Suse Linux 15.3 are also supported but may need extra configuration.
Port 443/TCP open
Outbound internet access to download application artefacts. 100 Mbps download speed recommended
Domain Name Service (DNS) with public name resolution enabled
Network Time Protocol (NTP)
Software Update Service - access to a network-based repository for software update packages.
Fixed private IPv4 address
Unique static hostname



Getvisibility products use Kubernetes under the hood, and we have very specific hardware requirements. It's crucial to meet the minimum resource requirements defined for containers, as failing to do so can lead to various problems:
Resource Starvation: If a container requests more CPU or memory resources than are actually available on the cluster, it can lead to resource starvation. This means other containers may not get the resources they need to run correctly, causing performance degradation or even crashes.
Throttling: Kubernetes imposes resource limits for containers, and if a container's requested resources exceed its limits, Kubernetes may throttle or terminate the container to prevent it from consuming excessive resources, resulting in performance degradation.
Out of Memory or CrashLoopBackOff Errors: Oversubscribing memory resources can lead to containers running out of memory, causing them to terminate abruptly or enter a constant restart loop, commonly referred to as a "CrashLoopBackOff" error.
Performance Degradation: When requested CPU resources are larger than allocated, it can lead to performance issues as containers compete for CPU time, potentially slowing down critical processes and making the application unresponsive.
Difficult Troubleshooting: Misallocation of resources, whether it's too little or too much, can be challenging to identify and correct. This can lead to extended troubleshooting efforts and downtime as administrators attempt to resolve resource-related issues.
To ensure a stable and efficient Kubernetes deployment of our product, it's essential to accurately configure resource requests and limits for containers based on their actual requirements. This prevents resource-related problems and ensures smooth operation within the Kubernetes cluster.
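When resource problems like the ones above are suspected, standard kubectl commands can help confirm them. A minimal diagnostic sketch (kubectl only; pod and namespace names are placeholders):
# List pods that are not in a Running or Completed state (e.g. CrashLoopBackOff, Pending)
kubectl get pods -A | grep -vE 'Running|Completed'
# Inspect a suspect pod's events, restarts, and configured resource requests/limits
kubectl describe pod <pod-name> -n <namespace>
# Show how much CPU/memory is requested and allocatable on each node
kubectl describe nodes | grep -A 7 'Allocated resources'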
The deployment guide will help you install, configure, and manage your deployment environment. It covers everything from initial setup to advanced configurations and troubleshooting.
It includes:
Prerequisites for a successful installation.
Step-by-step instructions for installing and upgrading K3S.
Setting up Rancher and Fleet agents with an HTTP proxy.
Guide to installing Synergy/Focus/Enterprise using Helm without Rancher.
Guide for configuring Keycloak.
Troubleshooting guide.
TODO
Restart VM
You will need to slightly change the command you use to configure K3s on master nodes (you don’t need to change the command for worker nodes as they’ll read the configuration from the master ones) by appending --flannel-backend=host-gw, example:
curl -sfL https://$URL/k3s.sh | INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" sh -s - server --node-name=master1 --cluster-init --flannel-backend=host-gw
In the 'Browser - Conditional OTP' section select 'Required'
Select the user needed to be added to use 2FA and add 'Configure OTP' under the 'Required user actions' section
When logging in with that user, a screen will guide them through the configuration
Please ensure that 'Configure OTP' under the 'Required user actions' section is then removed (otherwise it will ask to configure OTP every time as if it were the first time).
After configuring this, every time the user logs in they will be required to enter a one-time code
(Optional) OTP settings can be found here
Once logged into the Customer Portal, it is possible to view and respond to tickets submitted by the organisation's other members.
Once logged in the following options are at the top of the screen:
My Tickets - This will show all tickets that have been created by the logged in account.
Other Tickets - This will show all tickets that the logged in account has been CC’d on.
Archived Tickets - This will show any ticket older than 120 days.
Company Tickets - This will show all tickets submitted by members of the organisation.
From the ticket view, all updates to the ticket are visible and there is the option to mark the ticket as solved. This will close the ticket and then a ‘Happiness Rating’ can be added to rate the level of support that was received.
To close the ticket click on ‘Please consider this request solved', then add a final message and click 'Submit’.
If there are issues or feedback about the Support Portal, please send an email to [email protected], and we will make every effort to address any concerns.
Make sure you have already configured a license through the License Manager and that the end user has installed K3s and run the registration command as described in K3S installation.
Please check K3S installation for installation requirements.
Go to Rancher dashboard and click on the customer cluster that by now should be Active:
Go to Apps > Charts and install the GetVisibility Essentials Helm chart:
2.1. Click on Enable ElasticSearch:
2.2. Configure the UTC hour (0-23) that backups should be performed at:
Go to Apps > Charts and install the GetVisibility Monitoring Helm chart and Install into Project: Default.
Go to the global menu Continuous Delivery > Clusters and click on Edit config for the cluster:
a. For Synergy: add 2 labels product=synergy environment=prod and press Save.
b. For Focus: add 2 labels product=focus environment=prod and press Save.
c. For Enterprise: add 2 labels product=enterprise environment=prod and press Save.
d. For DSPM with the Agent: add 2 labels product=ultimate environment=prod and press Save.
e. For DSPM without the Agent: add 2 labels product=dspm environment=prod and press Save.
How to configure Rancher and Fleet agent.
This is applicable when there is a cluster showing as “unavailable“ after the user configured a proxy on the server.
Run env on the user’s server to determine the proxy IP. Ensure that the following line is present:
Open the file /etc/systemd/system/k3s.service.env and append the following lines:
It is important to use correct IP addresses in the place of placeholders $PROXY_IP and $NODE_IP below.
Restart k3s:
Go to the Rancher dashboard Cluster Management > Clusters and click on Edit Config for the cluster:
a. Go to Advanced Options:
b. Configure the following Agent Environment Variables and press Save:
Remember to use correct IP addresses in the place of placeholders $PROXY_IP and $NODE_IP below.
Run the command:
6. Type the letter “i“ to insert text and, in the env section, type the following lines:
Example:
Save by pressing Esc and then typing "wq"
Do the same on the fleet-agent by running the command:
Repeat Step 6.
After applying all the changes, wait for the cluster to show as Online on Rancher.
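To confirm that the agents picked up the proxy variables and can reach the management server, the deployments can be inspected directly; a hedged sketch using standard kubectl commands:
# Verify the proxy variables are present on the cluster agent and check its recent logs
kubectl -n cattle-system get deployment cattle-cluster-agent -o yaml | grep -A1 -E 'HTTP_PROXY|HTTPS_PROXY|NO_PROXY'
kubectl -n cattle-system logs deployment/cattle-cluster-agent --tail=50
# Same check for the fleet agent
kubectl -n cattle-fleet-system logs deployment/fleet-agent --tail=50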
In order for the connectors to support proxy settings, you will need to enable it in the configuration page:
To check the AI Mesh version from the Dashboard click on Administration > AI Mesh
In the top right of this screen the AI Mesh Version can be seen.
If more information on the AI Mesh is required or if tailoring is needed please contact Support.
When a targeted rescan is needed it is possible to scan individual files or a specific selection.
Ensuring that recent changes to files are reflected in the UI.
If new patterns have been added to Pattern Matching.
If new rules have been added in Controls Orchestration.
Files can be sent for rescan individually by clicking on the hamburger menu for that file and clicking on “Send to classification pipeline”.
There is also an option to reclassify multiple files at once by selecting them using the tickboxes on the left of the screen.
Once the required files are selected the option to rescan appears on the bottom right of the screen.
How to find the list of permissions granted for a Data Source
The required permissions for scanning are documented by Data Source.
For more information please review the list here.
To check the configured permissions for a Data Source, navigate to Administration > Data Sources and click on the hamburger menu.
In the dropdown, click Permissions:
The example below shows the permissions for SharePoint Online.
How to set a specific schedule for a scan.
When a Data Source is added to Getvisibility for scanning, the scan begins automatically.
If a rescan is needed this can be configured by clicking on Administration > Data Source > (the Data Source that needs Rescan e.g. One Drive) > Hamburger menu > Rescan Scheduler.
The default configuration is Does Not Repeat.
By clicking the drop-down menu, other options can be chosen:
In this option both the time zone and time of day can be chosen
With this option, as well as the above configuration, a specific day or multiple days of the week can be chosen
This gives the option to pick a specific day or days each month to run the rescan.
This article is applicable when there is a cluster showing as “unavailable“ after the user configured a proxy on the server.
If you have a cluster which hasn’t been registered yet (registration command has not been run yet), then refer to .
Keycloak configuration (each part ends with a whats next) + Log in to an existing account
Set up an integration (configure a data source)
Configure taxonomy → link to a full taxonomy setup
Getvisibility DDR continuously monitors new files generated through streaming and provides real-time insights
Filter by Streaming: Under Enterprise Search, use the filter scanTrigger=streaming.
View File Details: DDR displays:
File Path:
The platform supports a wide range of Single Sign-On (SSO) protocols and providers, enabling seamless authentication across various services and applications. Here are the primary SSO protocols and some of the identity providers that Keycloak can integrate with:
OpenID Connect (OIDC): A modern, widely adopted protocol based on OAuth 2.0 for client authentication and authorization. It's used by many identity providers for secure and flexible user authentication.
Below is a list of Data Sources that Getvisibility DDR (Streaming) currently supports:
AWS IAM
AWS S3
Azure AD
activate pattern matching → link to a full pattern matching setup
import dashboards → link to a full dashboard configuration guide
view scan results (explain dashboards, use cases, enterprise search, navigation, remediations)
configure policies (dataguard)
SAML 2.0: a commonly used protocol for exchanging authentication and authorization data between parties, particularly in enterprise environments.
The platform allows integration with a variety of identity providers using these protocols, including:
Amazon
OpenShift v3 / v4
GitHub
Microsoft
Apple
This flexibility ensures that SSO can be implemented using preferred systems and protocols, facilitating easy and secure access to multiple applications with a single set of credentials.
Azure Files
Exchange Online
OneDrive
SharePoint Online
Box
Confluence Cloud
Gmail
Google Drive
Google IAM
SMB
LDAP (Windows AD)


























The two types of scan are Trustee Scan and File Scan
This scan provides the list of Users and Groups on a Data Source
This scan provides information about files and folders on a Data Source including structure and metadata.
Once both scans are completed the data is processed and the two sets are combined to show who has access to what files.
No, file content is never saved. The classification server maintains a registry of file names and their properties but not the content. There is also an anonymization mechanism built into the Classification software that reduces file content to a mathematical number that is used throughout the platform.
More specifically, when a Data Source is added to the platform the following occurs:
The data source is scanned and general metadata is read.
This provides file path and permissions on the files.
The files are then sent to the OCR service to read the content.
The read content is then passed through the AI Mesh.
Through the process, customer data is not stored on disk and is only ever held in memory.
There is no long-term storage of data.




http_proxy=http://X.X.X.X
http_proxy="$PROXY_IP"
https_proxy="$PROXY_IP"
no_proxy="$NODE_IP,localhost,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local"systemctl restart k3s.serviceHTTP_PROXY: $PROXY_IP
HTTPS_PROXY: $PROXY_IP
NO_PROXY: $NODE_IP,localhost,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local
kubectl edit deployment -n cattle-system cattle-cluster-agent -o yaml
- name: HTTP_PROXY
value: $PROXY_IP
- name: HTTPS_PROXY
value: $PROXY_IP
- name: NO_PROXY
value: $NODE_IP,localhost,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local
kubectl edit deployment -n cattle-fleet-system fleet-agent -o yaml
Run env on the user’s server to determine the proxy IP. You should check the following line:
Open the file /etc/systemd/system/k3s.service.env and append the following lines:
Restart k3s: systemctl restart k3s.service
Go to the Rancher dashboard Cluster Management > Clusters and click on Edit Config for your cluster:
a. Go to Advanced Options:
b. Configure the following Agent Environment Variables and press Save:
Run the command kubectl edit deployment -n cattle-system cattle-cluster-agent -o yaml and
Type the letter “i“ to insert text and, in the env section, type the following lines:
Example:
Save by pressing ESC and then typing “wq”
Now do the same on the fleet-agent by running the command kubectl edit deployment -n cattle-fleet-system fleet-agent -o yaml
Repeat step 6.
After applying all the changes, wait a few minutes for the cluster to become Online on Rancher again.
In the side bar detailed information regarding the scan of the chosen Data Source can be reviewed.
Clicking on any of the fields in the Sidebar brings up a more detailed view of the data as well as giving the option to Remediate any issues that have been found.
For a more detailed breakdown of Analytics please see here.

Enter the details of the SMB server to scan
Name: Give a name to the scan to identify it later
Username: The user must have admin-level access and access to all the SMB/CIFS shares to be scanned
Password: Password for the admin user
Host IP Address: The IP Address of the SMB/CIFS server
Domain/Workgroup: The domain or workgroup to which the CIFS/SMB server belongs
Port: 445 is the default port, however if the default port is not used, input the correct port number for the SMB protocol
Click on the Folder icon in Path to select a particular share/folder to scan, or leave the path as empty to scan all shares
Save the configuration
Once the configuration is saved, click on the icon on the right and select Start file scan to begin scanning
The scan results can be viewed under Dashboard -> Enterprise Search
The connector supports all SMB dialects up to SMB 3.1.1

Classification: Sensitivity level (Confidential, Highly Confidential, etc.).
Risk Level: Based on context and user activity.
Compliance Tags: Indicators for GDPR, HIPAA, PCI, and other regulations.
Detection Rules: The specific DDR rules triggered by the file.
Incident Response: If a high-risk file is detected, DDR generates an alert and suggests remediation steps, such as quarantining the file or revoking access.
You need Azure Admin permission to complete this integration.
Create a new App registration, selecting support for Multiple organizations when asked.
Find App registration in search.
Click New registration.
Fill in details as shown below.
Give the application a name and write down Application (client) ID as it will be needed later.
Next, go to your App Registration’s Certificates & secrets to create a New client secret. Copy the Value of the secret to somewhere at hand as it is needed later in the configuration.
In Keycloak, create a new IdP by selecting Microsoft from the drop down
Populate Client ID (this is Application (client) ID in Azure) and Client Secret (this is Value from Azure) using values obtained in previous steps.
Finally, copy the Redirect URI from Keycloak and add it as a Redirect URI in the Azure App.
Open up a new Incognito mode in a browser and use
This document provides information on how to configure the Gmail connection for the Focus product.
Create a Project in Google Cloud Console:
Go to the
Create a new project or select an existing project
Enable the Gmail API:
In the Google Cloud Console, navigate to the "APIs & Services" > "Library"
Search for "Gmail API" and click on it
Create OAuth 2.0 Credentials:
In the Google Cloud Console, navigate to the "APIs & Services" > "Credentials" tab
Click "Create credentials" and select "Service account"
From your domain's Admin console, go to Main menu > Security > Access and data control > API controls
In the Domain wide delegation pane, select Manage Domain Wide Delegation
Click Add new
In the Client ID field, enter the client ID obtained from the service account creation steps above
In the OAuth Scopes field, enter a comma-delimited list of the scopes required for the application
Use the below scopes:
For scanning
https://www.googleapis.com/auth/admin.directory.user.readonly
https://www.googleapis.com/auth/gmail.readonly
For tagging
Description of the fields in the Scan Configuration popup
The below screenshot shows the fields that appear in the Scan Configuration screen.
Please note that not all of these fields are available for all Data Sources.
Set a unique name so that the Data Source is easy to identify.
Credentials
This is a dropdown to select the credentials that have already been configured for the Data Source.
Geographic Location
This is to indicate the physical location of the server the data sits on.
Path
This only needs to be defined if a specific location needs to be scanned.
If left blank the entire Data Source will be scanned.
Data Owner
This is the person who is responsible for the data.
This setting is optional.
If the Data streaming check box is not visible it may be because the license for DDR is not present.
To learn more about getting a license for DDR please reach out to the Getvisibility Enablement Team.
How to configure LDAP connection to gather permissions and access rights for groups, users, and other entities (Trustees) on an LDAP server.
Navigate to Administration -> Data Sources -> LDAP -> New scan
Enter the details of the LDAP server to scan
Name: Give a name to the scan to identify it later
Username: The user must be an admin level and have access to all the LDAP utilities to be scanned. The username should be entered in the format [email protected]
Password: Password for the admin user
Save the configuration
Once the configuration is saved, click on the icon on the right and select Start trustee scan to begin scanning
The scan results can be viewed under Dashboard -> Access Governance
How to configure Atlassian Confluence Cloud connection to scan it.
Log in to
Click Create API token
From the dialog that appears, enter a memorable and concise Label for the token and click Create
Click Copy to clipboard, and save it somewhere secure. It isn't possible to view the token after closing the creation dialog
Navigate to Administration -> Data Sources -> Confluence Cloud -> New scan
Enter the details
Name: Give a name to the scan to identify it later
Username: The email address for the Atlassian account you used to create the token
Save the configuration
Once the configuration is saved, click on the icon on the right and select Start trustee scan to begin the trustee scanning
The scan results can be viewed under Dashboard -> Access Governance
Click on the icon on the right and select Start file scan to begin the files scanning
The results can be viewed under Dashboard -> Enterprise Search
Rancher manages clusters through its control plane. Managed clusters send data to Rancher's central management servers. This includes "always-on" data, exchanged with Rancher whenever the cluster has Internet access, and "on-demand" data, which should be explicitly requested by GetVisibility Support via the Rancher UI.
Cluster Metadata:
Information about the cluster
Nodes list and metadata (IP address, hostname, cluster role, etc.)
K3s version
Health and Monitoring Data:
CPU and RAM usage on each cluster node
Current Metrics (via Prometheus)
Fleet agent heartbeat
Cluster Metadata:
Resource allocation (which Kubernetes resource runs on which node)
Current cluster-level Alerts
Current cluster-level Events
Rancher allows running kubectl exec into running containers, but this feature is blocked by our WAF. Support needs SSH access or screen-sharing with the customer to execute these commands.
None of those categories are critical for operation, and access to Rancher can be disabled after deployment.
At Getvisibility, we understand the modern challenges of data management. With our leadership in Data Security Posture Management (DSPM), we're transforming the way organisations comprehend, classify, and protect their data.
Getvisibility is a DSPM solution that can conveniently connect, discover, classify, and enable the protection of unstructured data in an organisation's data repositories. Our latest update includes connectors for GDrive, a real-time file storage and synchronisation service that is a product within Google Workspace, which has over 9 million paying organisations. Our GDrive connectors provide an easy setup for file scanning to begin and for insights on an organisation's data to be delivered at speed.
Utilising Getvisibility's cutting-edge Machine Learning (ML) classification to label files in GDrive represents a significant step in managing your sensitive data, regardless of its origin. Step into the future of data protection with Getvisibility by applying high precision tailored artificial intelligence (AI) coupled with Google Drive's native file labelling to significantly enhance the security of your Google Drive data, automatically and at scale.
With our Google Drive Auto-Labelling feature, you no longer need to manually tag your files. Let our high precision, bespoke artificial intelligence (AI) mechanisms, integrated with Google Drive's native file labelling, classify and protect every document in your GDrive, automatically and at scale.
Remote working, regulatory compliance and the constant pressure of cyber attacks bring forward challenges encompassing interoperability, scalability, and governance. These complications can escalate to severe security breaches, including threats like intellectual property theft, both from internal and external sources. It's essential to counter these data security concerns with a robust DSPM solution. Getvisibility's Tailored & Narrow AI, powered by Large Language Models (LLM), aligns perfectly with distinct business needs for precise data analysis. Our state-of-the-art AI system can:
Minimise data handling costs by pinpointing only essential data to keep.
Provide reports on data at risk.
Seamlessly integrate with DLP platforms.
Automatically tag files.
Benefits tailored for you:
Enhanced Data Security: Every file, irrespective of size, is labelled, solidifying its traceability and protection.
Time-saving Mechanism: Move past the era of manual classification. Entrust our machine learning and witness your files being labelled in no time.
Dive deeper with Getvisibility's GDrive auto-labelling
Why settle for the ordinary? Experience unmatched efficiency and security with our innovative solution. For a comprehensive understanding of how Getvisibility can redefine your organisation's data security landscape, reach out to us or explore our website.
For More Information:
A brief description of DDR
Getvisibility's Data Detection and Response (DDR) solution is designed to protect sensitive data by providing near real-time detection and response capabilities. It ensures that data across user environments are constantly monitored and any potential threats are flagged immediately. DDR focuses on data-centric security, ensuring organisations have visibility and control over their critical information assets.
Real-Time Monitoring: DDR continuously monitors data activities, including access, modification, sharing, and deletion, to identify suspicious and malicious events.
Automated Response: DDR sends instant alerts for quick remediation.
Risk Mitigation: It helps ensure regulatory compliance with privacy standards such as GDPR, HIPAA, PCI-DSS, CCPA and others.
AI-Powered Insights: DDR leverages Getvisibility’s proprietary AI Mesh models to analyse data context for the best accuracy.
Data Intelligence: It provides dashboards with visibility into sensitive data and risks to your data.
Data Analysis: DDR identifies all data across unstructured data environments and then classifies the data based on its content and context.
Risks Analysis: It evaluates user access, permissions, sharing and data location to identify risks related to your data.
Policy Enforcement: DDR applies predefined and custom security policies to protect data based on its classification and sensitivity.
Incident Response:
To configure DDR rules, follow these steps:
Access the Getvisibility DDR dashboard using your credentials.
Under the DDR tab, select Create Scan Configuration to connect to the data sources to be monitored.
Define Scopes: Specify the data sources that will be connected to.
Verify Configuration: Ensure that at least one data source is successfully connected. A green checkmark will confirm the completion.
Once the scan configuration is complete:
Go to Administration > Live Events Streaming to view real-time events.
Monitor Event Activity: Filter events by source, user name, action type (create, update, delete), and event type.
The Overview Page provides a comprehensive view of DDR's performance:
Event Statistics: Displays the number of events by source, such as Google Drive, SharePoint, OneDrive, and Box.
Data Source Activity: Visualizes active data sources and the volume of events generated by each.
Event Timeline: Shows when events occurred, helping identify peak activity periods and anomalies.
The Open Risks section highlights detected threats, categorised by risk type:
Public Exposure: Identifies sensitive files accessible to external users via public links.
External Sharing: Detects files shared outside the organisation, potentially exposing sensitive information.
Internal Over-Sharing: Flags data with excessive permissions within the organisation.
For each risk, DDR provides detailed insights, including the file path, user activity, and recommended remediation steps.
Listed below are the languages supported by the ML (Machine Learning) classifiers, grouped by language pack.
How to configure a Webhook.
A webhook is a way for one application to automatically send information to another application when something specific happens, for example getting an instant message when a new email is received. It helps different apps talk to each other in real time.
In DSPM+, the webhook service makes it possible to subscribe to documents after the cataloguing/classification stages. When a document passes Cataloguing or Classification and matches the GQL provided in the webhook, a callback is sent to the target (client) system URL. Similarly, in EDC a webhook can be used to send information to the client system based on the activity of users.
The pattern matching functionality allows users to identify particular pieces of information in a document. This is implemented by using Regular Expressions (RegEx) that will try to match content.
Select Administration and then Pattern Matching:
A selection of pre-configured patterns is available:
If these patterns are detected during a scan, they will be presented to the user as “Keyword Hits”. On the Pattern Matching page, users have access to various Filters and Options:
Getvisibility tackles unclassified data protection across multiple languages. We do this using a cutting-edge in-house Data Science team who are forging a global AI-driven solution. Here at Getvisibility we have introduced eleven new ML languages, including Arabic, Chinese, Spanish, and more, providing comprehensive multilingual data insights to our customers. Elevate your Data Security Posture Management (DSPM) with tailored AI that breaks language barriers for informed decisions when fortifying an organisation's data security defences.
How to define a custom taxonomy
To access the Taxonomy screen click on Policy Centre > Compliance Hub.
Once in the screen the default Tags are visible.
To add a label click on the + on the top left of the list of Tags.
In the pop up enter a name for the new Label and optionally a Tag alias.
http_proxy=http://X.X.X.X
http_proxy="$PROXY_IP"
https_proxy="$PROXY_IP"
no_proxy="$NODE_IP,localhost,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local"HTTP_PROXY: $PROXY_IP
HTTPS_PROXY: $PROXY_IP
NO_PROXY: $NODE_IP,localhost,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local
- name: HTTP_PROXY
value: $PROXY_IP
- name: HTTPS_PROXY
value: $PROXY_IP
- name: NO_PROXY
value: $NODE_IP,localhost,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local
Fleet bundle synchronization data
Current cluster status (healthy/unhealthy)
Kubernetes Objects:
List Kubernetes objects (usually Pods and Configurations)
Delete one or more objects
Create a new Kubernetes definition
Update existing definition (limited)
Kubernetes Container Logs:
Current logs via Rancher UI
Historical logs via Loki and Prometheus
Health and Monitoring Data:
Historical metrics via Grafana
Securing unclassified data through discovery, classification and protection is not just an English language problem. At Getvisibility we recognise this and have been developing our own cutting edge Data Science team to deliver a truly global product in the world of artificial intelligence (AI) and data security.
Many organisations around the world struggle to adapt to the rising problem of Data Security Posture Management (DSPM), let alone when dealing with multilingual and multicultural datasets.
There are many data security problems in today's business world and they are growing at an exponential rate. Some of the data problems facing organisations, in any language, are double extortion ransomware, intellectual property theft, and data loss prevention (DLP) software, which poses some of its own challenges with interoperability, scalability, and governance.
This is why Getvisibility is proud to announce the launch of eleven new ML languages that allow our platform to understand multiple language data files simultaneously. These eleven new languages are:
Arabic
Chinese
French
German
Hebrew
Italian
Polish
Portuguese
Romanian
Spanish
Thai
Our multilingual data discovery ML gives organisations a strategic competitive advantage by delivering increased data knowledge from multiple language data sources within an organisation. This allows better informed data security decisions, at speed and with precision, with the Getvisibility platform.
Getvisibility’s industry leading performance is due to our Tailored & Narrow AI, now across eleven new languages. This approach supercharges an organisation's DSPM solutions with our customisable AI, specifically trained with Large Language Models (LLM), that aligns with specific business needs for accurate and optimised data analysis.
Our approach of having an in-house data science operation to develop our own language models is atypical and allows Getvisibility to develop our own model library, pushing the boundaries of data discovery in the industry, month by month. We pride ourselves on continually evolving with the data security industry as this allows our customers to have cutting edge ML to accelerate their organisation’s DSPM and risk management. Another advantage of our approach is the development of the native Getvisibility Query Language (GQL) that further provides efficiencies when producing results on sets of files.
The new additions to our innovative ML provide several advantages that can greatly enhance the process of analysing and extracting insights from multilingual and multicultural datasets. These include:
Richer Data Analysis
Multilingual machine learning enables the analysis of data and content in various languages, allowing organisations to access a wider range of data. This enables them to understand patterns across different languages, leading to a more comprehensive and accurate understanding of data use in the company and a more complete view of data, which might not be possible if analysis is limited to a single language.
Efficient Information Retrieval
Multilingual ML improves search and information retrieval processes by accurately understanding and interpreting queries in various languages. This is done by enabling GQL searches on our tags across multiple languages, ensuring that users can find relevant information regardless of their language preference and providing a complete understanding of an organisation's data environment and security through comprehensive multi-language reporting.
Language-agnostic Insights
Some data insights might not be specific to a single language. Multi-lingual ML can help identify and analyse data security trends that transcend linguistic boundaries.
Getvisibility’s cutting-edge ML has a language model that does not need language detection, as its inputs can read documents in any mixture of the languages it supports. This integration of Getvisibility’s multilingual ML in data discovery builds on the best-in-class five pillars of data security: know where the data is stored, know the sensitivity of the data, know who has access to the data, know the flow of the data through the ecosystem, and know how well the data is protected. This will lead to more accurate insights, improved decision-making, and a deeper understanding of potential security threats across languages.
Improve the competitive advantage of your multinational or multilingual organisation with the latest innovative addition to Getvisibility’s DSPM solution. Get in touch and let's start your journey to a multilanguage DSPM solution today. For More Information: www.getvisibility.com




































IP Address: The IP Address of the server where the LDAP is installed
Certificate (Optional): If the server to be scanned uses LDAPS (LDAP over SSL/TLS), enter the certificate text here. Otherwise leave it blank
Port: 389 is the default port for LDAP, however for Secure LDAP 636 is used
Use the Global Catalog ports 3268 (LDAP) and 3269 (LDAPS) in case the standard ports don't allow traversal of the whole LDAP tree
Inactivity: This defines inactive users. Default is 90 days
Search base: This is the point in the LDAP directory where Focus will start searching from. In this example:
DC stands for Domain Component. An attribute used to represent domain levels
aws-gv is the name of the first-level domain
local is the top-level domain
Together, DC=aws-gv,DC=local represents the domain aws-gv.local





The language packs and the languages included in each are listed below (Name: Languages in Pack):
Arabic: English, Arabic
Turkish: English, Turkish
Hindi: English, Hindi
Latin-5: English, French, Spanish, Portuguese, Italian, Romanian
Japanese: English, Japanese
Chinese: English, Chinese (Simplified, Traditional)
Finnish: English, Finnish
West-Slavic-3: English, Polish, Czech, Slovak
German-Dutch: English, German, Dutch
Nordic-3: English, Danish, Swedish, Norwegian
Hebrew: English, Hebrew
Greek: English, Greek
Korean: English, Korean
Thai: English, Thai
If additional language packs are needed after the initial setup please reach out to support for assistance, as each additional pack is a separate AI model that needs to be added.
Click the "Enable" button to enable the Google Drive Activity API for your project
Enter a name in the Service account name field and CREATE AND CONTINUE
Under Grant this service account access to the project, select role as Owner and click DONE
Select the newly created service account and click Keys > Add Key > Create new key
Make sure the key type is set to json and click Create
The new private key pair is generated and downloaded to the machine. Note the values of private_key, client_email and client_id
https://www.googleapis.com/auth/gmail.modify
https://www.googleapis.com/auth/gmail.labels
https://www.googleapis.com/auth/gmail.metadata
Click Authorize







Domain: The Atlassian domain
Click on the Folder icon in Path to select a particular space to scan, or leave the path as empty to scan all spaces







If a specific path has not been set, the entire Data Source will be scanned.
Metadata (path, size, format, etc.) and permissions are extracted and recorded for each file.
This step ensures that every file and folder is identified and that access permissions are understood.
The scan discovery process can have the following statuses, reflecting its progress:
Not Started: The Data Source has been added but the scan has not started.
Queued: The scan has been put into the queue for execution.
Failed To Start: The scan was unable to start, usually due to issues with permissions or network.
In Progress: The scan is actively running and processing data discovery.
Cancelled: The scan was manually stopped or automatically aborted.
Incomplete: The scan partially completed but permissions to files were changed during the scan.
Completed: The scan has successfully finished the Discovery phase.
These statuses can be seen in the Last Scan Status column.
Metadata information is processed for each file that has been collected as part of the Discovery step.
A detailed analysis of each file's metadata is performed.
Permissions are analysed and the shared level is identified.
A detailed analysis of each file's content is performed.
Content is extracted and the sensitivity level and risk of each file is determined for classification.
This is determined by the Patterns/Detector setting and the AI Mesh
This ensures that sensitive information is properly identified and protected.
This is a scan to determine the Users and Groups present in a Data Source.
Metadata is extracted for each user, with specific fields depending on the data source. Some of the fields that will be picked up by the scan include Enabled, Last Login, Last Modified, etc.
The statuses for these scans are the same as for files but there are two additional ones.
Completed Only Users: The scan has been completed only for user-specific policies.
Completed Only Groups: The scan has been completed only for group-specific policies.
To see additional information on a running or completed scan click on the Scan Analytics Icon.
This will pop out the Analytics sidebar where there is information such as scan duration, how many files have been scanned, classification insights, etc.

Go to Rancher dashboard and wait for the new cluster to become Active:
Select the cluster name and go to Apps > Charts and install the GetVisibility Essentials Helm chart:
After installing Getvisibility Essentials, make sure to enable ElasticSearch
4. Go to Apps > Charts and install the GetVisibility Monitoring Helm chart and install into Project: Default:
5. Go to the global menu Continuous Delivery > Clusters and click on Edit config for the cluster:
6. Add 2 labels product=Focus environment=prod and press Save.
Updates and custom settings are automatically applied to all Focus backend services as long as the cluster has access to the public internet and can connect to the management server.
In case there’s no internet connection or the management server is down, the cluster agent will keep trying to reach the management server until a connection can be established.
To upgrade K3s from an older version to a specific version you can run the following command:
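The exact command depends on the environment; as a hedged sketch, assuming the same installer endpoint ($URL) used at install time and that the server arguments match the original installation (for example --cluster-init or --flannel-backend on HA masters), with $TARGET_VERSION as a placeholder for the K3s release to upgrade to:
curl -sfL https://$URL/k3s.sh | INSTALL_K3S_VERSION="$TARGET_VERSION" K3S_KUBECONFIG_MODE="644" sh -s - server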
Stop the old k3s binary (e.g. systemctl stop k3s) and start it again (e.g. systemctl start k3s). For more details please refer to the official documentation.
By default, certificates in K3s expire in 12 months. If the certificates are expired or have fewer than 90 days remaining before they expire, the certificates are rotated when K3s is restarted.
Find the IP of the server where Consul is running (in case you have a multi-node cluster):
Log into the server using SSH and execute the following command to take a snapshot of Consul:
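The snapshot command itself is environment-specific; a hedged sketch, assuming Consul runs as a pod named consul-server-0 in the default namespace (adjust the pod name and namespace to match the cluster):
# Take a Consul snapshot inside the pod
kubectl exec consul-server-0 -- consul snapshot save /tmp/consul-backup.snap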
Find the path where the snapshot has been saved to:
Copy the snapshot file to a safe place.
Find the IP of the server where the PostgreSQL master is running (in case you have a multi-node cluster):
Log into the server using SSH and execute the following command to backup all databases:
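A hedged sketch, assuming PostgreSQL runs as a pod named postgresql-0 and the postgres superuser can be used (adjust the pod name, namespace and credentials to match the deployment):
# Dump all databases to a file on the node
kubectl exec postgresql-0 -- pg_dumpall -U postgres > all-databases.sql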
Find the path where the backup has been saved to:
Copy the backup file to a safe place.
Browse to App Registration and select New registration
On the App Registration page, enter the below information and click the Register button
Name: (Enter a meaningful application name that will be displayed to users of the app)
Supported account types:
Select which accounts you would like your application to support. You should see options similar to the below. You can select “Accounts in this organizational directory only”:
Leave the Redirect URI empty and click Register
Note the Application (client) ID, Directory (tenant) ID values
Navigate to Manage -> Certificates and secrets on the left menu, to create a new client secret
Provide a meaningful description and expiry to the secret, and click on Add
Once a client secret is created, note its Value and store it somewhere safe. NOTE: this value cannot be viewed once you leave this page
Navigate to Manage -> API permissions on the left menu, and Add a permission
Select Microsoft APIs -> Microsoft Graph
Select Application permissions
Permissions required
For scanning
Microsoft Graph > Application permissions > Mail > Mail.Read
Microsoft Graph > Application permissions > User > User.Read.All
Microsoft Graph > Application permissions > DeviceManagementApps > DeviceManagementApps.Read.All
Microsoft Graph > Application permissions > MailboxSettings > MailboxSettings.Read
For tagging
Microsoft Graph > Application permissions > Mail > Mail.ReadWrite
Once all the required permissions are added, Grant admin consent to them
Go to the dashboard: Administration -> Webhooks -> Create webhook.
Provide the URL from 'Your unique URL' on webhook.site; input the GQL, a name and the webhook status. Please see below for the description of all the options.
Provide the Callback URL from 'Your unique URL' on webhook.site
If there are scans in progress and data is moving through cataloguing/classification, you will soon see requests coming to webhook.site (if not, manually scan a folder).
A collection of data that serves as the content triggering the webhook or being sent by the webhook to a specified endpoint. When an event occurs that matches certain conditions, the webhook system will package relevant data from the dataset and send it to a predefined URL.
This contains information about files, such as their ID, types, path, version, etc. This information is based on content related to DSPM
This dataset captures activities or actions performed by users or systems, such as logins, updates, deletions, or other significant events. This is mainly from endpoint Agents.
This dataset relates to live events and the content connected to DDR.
This query language is designed to enhance the flexibility and efficiency of querying data through the DSPM+, DDC and EDC platforms. It enables users to craft custom queries without the need for hard coding, significantly simplifying the process of filtering through and analysing the data. On the webhook page it can be used to set up information using the datasets available.
A callback URL is a specific URL endpoint provided by a user or a system to receive data from another system when a certain event occurs. When the event is triggered, the webhook sends an HTTP POST request to the callback URL, delivering the relevant data payload. This mechanism allows real-time communication between systems, enabling automated workflows and immediate data synchronization
This field allows the user to give a unique and meaningful name to the webhook.
Users can create multiple webhooks and choose whether to keep each one active or deactivate it based on the requirement.
The user also has the option to edit or delete a webhook
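Before scans run, the receiving endpoint can be tested by simulating the callback with curl. A hedged sketch: the JSON fields shown (id, path, type) are illustrative only, based on the file dataset description above, and do not represent a documented payload schema; replace the URL with the configured callback URL.
# Simulate a webhook callback POST to the target endpoint (illustrative payload)
curl -X POST "https://example.com/webhook-callback" \
  -H "Content-Type: application/json" \
  -d '{"id": "12345", "path": "/HR/contract.docx", "type": "docx"}'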
Filters and options explained:
Search: Enter text here to filter patterns based on name
Classification: Filter by classification tags associated with patterns
Compliance: Filter by compliance tags associated with patterns
Distribution: Filter by distribution tags associated with patterns
Categories: Filter by file categories associated with patterns
Subcategories: Filter by file subcategories associated with patterns
Enabled: Filter by patterns that have been enabled or disabled.
Published: Filter by patterns that have been published or unpublished
Add New Pattern: Create a custom pattern
Publish: Push changes to the pattern matching system so that they start being used
Clear filters: Remove all previously selected filters
Options to create pattern matching explained:
Pattern Name: identifies the RegEx when it is found by the software
Regular Expression: the sequence to be matched
Enabled: whether the pattern will be searched for by the software
Hide RegEx in UI: obfuscates the regular expression
Tag Overrides: when the RegEx is found these tags will be written to the file
Classifications: security levels
Compliance: regulations that apply to data
Distribution: policies on how data should be distributed
Category: data grouping
Subcategory: data subgrouping
Cancel: exit without saving
Create: save pattern information and exit
RegEx: Regular Expression, a sequence or pattern that is searched for in text. Ex-ID uses Java RegEx notation.
Rules: Instructions for Ex-ID about what to do when a RegEx is detected in a file.
Pattern: The RegEx and rules associated with its detection.
Pattern Name: Used to identify the pattern when it is detected.
Classification: Tags that help secure documents and other files. e.g. Public, Internal, and Confidential.
Compliance: Tags that help organisations conform to certain regulatory regimes. By applying compliance tags such as GDPR/PII to RegEx such as Social Security number, organisations can identify all related documents.
Distribution: Tags that specify how files should be moved either within or outside an organisation.
Category: From Getvisibility’s ML model. These are groupings of information based on their use. e.g. Finance, HR, or Technical Documents.
Subcategory: From Getvisibility’s ML model. These are sub-groupings of information based on their particular use. e.g. CV (resume), Code, or Sales Agreement.
Publish: The action of pushing the enabled patterns to be used. As some parts of the system need to be restarted in order to take on a new pattern matching configuration, we allow users to choose when to enact the configuration so as not to impact the workflow of others.
Unpublished: A pattern that has been created, changed, or edited but has not been pushed to the pattern matching system.
Published: A pattern that is currently part of the pattern matching configuration.
Disabled: A pattern that is currently part of the pattern matching configuration but is not to be detected.
Enabled: An active pattern. One that is part of the configuration and will be used by the pattern matching system.
Hide RegEx: Ex-ID allows for RegEx notations to be obfuscated for security and intellectual property reasons.
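As a quick illustration of the terms above, a hypothetical custom pattern (not one shipped with the product) might match IBAN-style account numbers. Ex-ID uses Java RegEx notation; the same expression can be sanity-checked locally with grep -E:
# Hypothetical pattern: two letters, two digits, then 11-30 alphanumeric characters
echo "Payment to IE29AIBK93115212345678" | grep -E -o '[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}'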


If you are using a dedicated partition (/var/lib/rancher) to run K3s, make sure NOT to mount it with the noexec flag in the /etc/fstab file.
If you have FIPS mode enabled, it is necessary to disable it, otherwise some of our workloads running in K3s will crash at startup. To check if FIPS is enabled run:
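A brief sketch using standard system checks (the second command applies to RHEL-like systems):
# Prints 1 if the kernel is running in FIPS mode, 0 if not (the file may be absent on non-FIPS kernels)
cat /proc/sys/crypto/fips_enabled
# On RHEL-like systems
fips-mode-setup --check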
In order to disable, please refer to the instruction below:
Please visit this KB Article if you want to know more.
RHEL-like systems have a buggy version of iptables (1.8.4) which causes issues with the firewall, service routing and external network reachability, as well as performance issues. It is required to configure K3s to use the bundled version by modifying the k3s service file (the same applies to the k3s-agent service on worker nodes in HA deployments), adding the --prefer-bundled-bin option to the service's command and restarting the service.
If this change is done on an existing system, a reboot is recommended to clear duplicate iptables rules.
More details can be found here - Known Issues | K3s.
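A hedged sketch of the change described above (the ExecStart line shown is illustrative; the exact arguments in the unit file vary by installation):
# Edit the K3s unit file and append --prefer-bundled-bin to the ExecStart command, e.g.
#   ExecStart=/usr/local/bin/k3s server --prefer-bundled-bin
vi /etc/systemd/system/k3s.service
# Reload systemd and restart the service (repeat for k3s-agent on worker nodes in HA deployments)
systemctl daemon-reload
systemctl restart k3s.service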
Settings: Default Project | All Permissions
Note that this must be a new key. Once the Compliance API scopes are granted, all other scopes are revoked.
Reminder: This key can only be viewed/copied once. Store it securely.
Send an email to [email protected] with:
The last 4 digits of the API key
The Key Name
The Created By Name
The requested scope (read and delete)
The OpenAI team will verify the key and grant the requested Compliance API scopes.
Administrators may then use this key or pass it to a partner for use with the Compliance API.
Workspace IDs can be found on the Admin dashboard
Navigate to Administration -> Data Sources -> ChatGPT -> New scan
Provide the workspace id and the api key value obtained from above steps
Click on the Folder icon in Path to select a particular user or gpt to scan, or leave the path as empty to scan all
Save the configuration
Once the configuration is saved, click on the icon on the right and select Start trustee scan to begin the trustee scanning
The scan results can be viewed under Dashboard -> Access Governance
Click on the icon on the right and select Start file scan to begin the files scanning
The results can be viewed under Dashboard -> Enterprise Search
Once the names have been added click on ACCEPT.
The New Tag will now appear at the top of the list but it will not be created until Save has been clicked.
The tags can be re-ordered by risk level by clicking on the six dots, with the riskiest being on the bottom.
Below, the Top Secret tag that was just created has been moved to the bottom and therefore the riskiest position.
To have new tags incorporated into the AI Mesh please reach out to Support.


Data Controls enable organizations to apply security and compliance conditions to the data assets in their systems, and to apply actions when those conditions are identified.
They are important for security and regulatory compliance as they help orchestrate the data handling within an organisation while ensuring stakeholders and data owners are involved.
They are set up during the configuration of the system and refined as the DSPM journey proceeds. They are used by data owners, CISOs, and other stakeholders throughout an organisation.
The data control rules are set using GQL, which can granularly define the files, users, or other assets that exist within the organisation and specify under which conditions the rule should activate.
A graphical display of any recent condition-activations can be viewed as well. Automated actions can be applied to the rule where users can choose to alert using messaging apps or webhooks.
The rules are configured in the DSPM platform under Data Controls. Simply select Create New Rule and follow the below instructions. The rules will be triggered during a scan of the particular dataset the rule applies to.
In this example we will create a rule to find HR-related data that is at high risk. We will assign ownership and set up a Slack message to alert a specific channel.
1. On the Data Controls page of DSPM, select Create new rule
Enter the following data to create the rule
Name: To identify the rule amongst many that can be created
Description: Useful for others to understand the intention of the rule
Ownership: The person who is responsible for the rule and its consequences
Based on group: The data asset that this rule is associated with. These are granularly defined in the Data Asset Registry.
Select Accept
This screen allows you to further refine the rule and set the actions
At the top of the screen the name, description, and owner are visible, as well as the creation date. The option to assign rule severity is also available. As a breach of this rule could incur severe consequences such as legal and financial penalties, we will set the severity to High.
In the select dataset dropdown, we need to define the entity type we are setting our conditions for (in the backend this relates to separate databases). The choices are files, trustees, and activities:
Files: unstructured data classified during discovery
Trustees: the users and groups discovered during IAM scans
Activities: the usage statistics of the endpoint agents (FDC)
We will select files in this example.
The condition section will be pre-loaded with a GQL query if you have selected a Data Asset Group. Here it is simply path=HR, and we can see that there are some recent files that match this criterion.
We will refine the search further by adding the condition that the HR files found must be high risk: AND risk=2
The platform has three levels of risk: low, medium, and high. Their respective values in GQL are: 0, 1, and 2
As can be seen, no files have fallen under this rule yet.
We can create an action so that we can catch high risk HR files going forward.
Scroll to below the condition and select Create Action. In the Action type dropdown you can choose a simple Webhook or a Slack Webhook. Here we will add a Slack Webhook that will notify a Slack channel when the data control is activated.
Multiple actions can be created for the same data control.
Select UPDATE to save the control, and that’s it! Once scanning commences we will get notified in Slack, as well as on the Incidents page.
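As background, a Slack Webhook action simply posts JSON to a Slack incoming-webhook URL. You can sanity-check the URL you intend to use with a manual test before attaching it to the control; the URL below is a placeholder and the message body is only illustrative (the payload Getvisibility sends when the control fires is defined by the platform, not by this example):

curl -X POST "https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX" \
  -H "Content-Type: application/json" \
  -d '{"text": "Data control triggered: high risk HR file detected"}'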
Please refer to the K3s installation guide here for installation requirements.
All the commands in this runbook were tested against a VM with Canonical, Ubuntu, 20.04 LTS, amd64 as root.
Install K3s and wait for the pods to become ready:
Install Helm 3:
Configure GetVisibility private Docker image registry (replace $USERNAME and $PASSWORD with the Docker credentials provided):
Configure GetVisibility private Helm repository (replace $USERNAME and $PASSWORD with the Helm credentials provided):
Install chart GetVisibility Essentials and wait for the pods to become ready.
For Synergy (32 GB RAM machines), please replace keycloak.url=IPADDRESS/DNS/FQDN with the IP address, FQDN or DNS name for Keycloak.
For Focus and Enterprise (48+ GB RAM machines), please replace keycloak.url=IPADDRESS/DNS/FQDN with the IP address, FQDN or DNS name for Keycloak.
Depending on the theme you want to use, run one of the commands below to install the chart GV Platform (don’t forget to replace $PRODUCT with either synergy or focus or enterprise):
a. GetVisibility theme:
b. Forcepoint theme:
In order to upgrade the essential services (e.g. Postgres, Kafka, MongoDB, Consul) run helm repo update and then the same command you used in step 5.
In order to upgrade Synergy/Focus/Enterprise run helm repo update and then the same command you used in step 6.
You can optionally also install monitoring tools to collect metrics and logs of all services that are running inside the cluster:
Install the CRDs:
Install the monitoring tools:
To access the Grafana dashboard run the command:
To access Prometheus dashboard run the command:
There are various authentication protocols that can be used depending on use case. This guide outlines the steps to configure User Federation in Keycloak.
To authorize users for the GetVisibility dashboard (not Keycloak itself), ensure that the gv realm is selected in the top left, not master (unless the aim is to authorize LDAP users to use Keycloak):
Click on the User Federation menu item on the left pane. This should load a list of configured user federations (none at first).
Click on Add Ldap providers to load the LDAP (Lightweight Directory Access Protocol) configuration
Update the Connection URL field to reflect the LDAP server address where the Active Directory is hosted
Click on the button Test connection to test the connection from the Keycloak instance to the LDAP server address. This should succeed quickly. If it hangs, the LDAP server (i.e. a domain controller) may be blocking connections from the Keycloak server address (i.e. the IP of the server running the GetVisibility product). The Public IP address of the LDAP server may need to be used.
Update the Bind DN field to reflect the user used to access the LDAP server. In this case, the user with username “admin” from the domain “”.
Update the Bind credentials field (see the above image) to contain the password used to access the LDAP server
Click “Test authentication” to confirm that the provided credentials work as expected:
Update the Users DN field to contain the Full DN of the LDAP tree where your users are.
The above value for the “Users DN” field will import all users to the gv realm. All users within the “” domain will get full administrative access to the GetVisibility dashboard.
If this is not desired, make restrictions to which users are imported. Often, just restricting by OU is not granular enough.
In this scenario, add a group membership filter in the User LDAP filter field, like so: (memberOf=cn=My Group,dc=domain,dc=com)
Combining (“AND”) with other criteria: (&(theAttribute=theValue)(memberOf=cn=My Group,dc=domain,dc=com))
Within Synchronization settings, set up automatic synchronization of users from the LDAP Active Directory to Keycloak. Here the auto-synchronisation settings can be configured.
Click the Save button at the bottom of the screen.
To get the users into the Keycloak DB, the users need to be synchronised for the first time (before the automatic synchronization happens, if applicable).
Click the button Synchronize all users to immediately fetch all of the LDAP Active Directory users and load them into the Keycloak instance DB
Usually, any issues that occur during the LDAP Active Directory configuration process above will be related to Network accessibility concerns or authentication credentials being incorrect.
However, if additional support is needed or the problem is not easily resolved by troubleshooting Network communications and authentication details, please reach out to .
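Both classes of problem can usually be narrowed down from the command line on the host running Keycloak before retrying the configuration. A minimal sketch, assuming example values for the hostname, bind DN, users DN and group (use port 636 if connecting over LDAPS):

# confirm the LDAP port is reachable from the Keycloak host
nc -vz dc.domain.com 389

# confirm the bind credentials and the User LDAP filter return the expected users
ldapsearch -x -H ldap://dc.domain.com \
  -D "cn=admin,dc=domain,dc=com" -W \
  -b "ou=Users,dc=domain,dc=com" \
  "(memberOf=cn=My Group,dc=domain,dc=com)" cn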
The Getvisibility Template Language (GTL) allows the use of variables, functions and conditions in any report text input field and compiles those expressions into resulting values.
Example
Example with actual data
Two possible syntaxes parsed by GTL:
Expressions
Conditions
Any text placed between {{ and }} is considered an Expression. Expression is a mathematical or logical operation that returns a value. For example, this {{ 10 }} is an expression that returns the number 10. It is possible to use operators, {{ 10 + 5 }} will return 15. Logical operators are also supported, {{ 10 > 5 }} will return true. Here is a list of supported operators:
+ - * / % = != > < >= <=
An expression can also contain variables that are defined in the current context. For example,
{{ file.name }} will return the name of the file, if the file object is defined.
Expression Functions
But the most powerful feature of expressions is the ability to call functions. These are predefined aggregation functions that fetch data from the database and return the result. For example,
{{ count('files') }} will return the number of files in the database.
Here is a list of supported functions:
count
sum
avg
max
Those functions support the following parameters:
Dataset name - the name of the dataset to fetch data from. Possible values are: files, trustees, connectors, agents, activities.
GQL - the GQL query to filter the data. For example, fileType=doc OR fileType=txt will return only files with the doc or txt file type.
Conditions are useful when you want to display different text based on some condition. Example:
else clause is optional and can be omitted:
The if statement is followed by a condition in parentheses. The condition must be any expression that returns a boolean value.
A webhook is a method used in web development to enhance or modify the behavior of a web page or application through custom callbacks. These callbacks are automated messages sent by applications when specific events occur. Triggered by events in a source system, webhooks generate HTTP requests with payload data, which are sent to a destination system. Webhooks enable real-time communication between different applications, allowing them to exchange data seamlessly and synchronize processes. Developers, even if not affiliated with the originating application, can manage and modify these callbacks. This event-driven communication approach finds applications in various scenarios, enhancing automation and integration between different software systems.
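Concretely, a webhook delivery is just an HTTP request, typically a POST with a JSON payload, sent to a URL exposed by the destination system. A minimal illustration with a made-up endpoint and payload:

curl -X POST "https://receiver.example.com/hooks/file-events" \
  -H "Content-Type: application/json" \
  -d '{"event": "file.classified", "file": "report.docx", "classification": "Confidential"}'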
Webhooks are used by Security Information and Event Management (SIEM) software to enhance security monitoring and incident response. SIEM tools integrate with webhooks to receive real-time event notifications from various sources, such as authentication systems, cloud services, or other security tools. These notifications trigger automated actions in the SIEM, allowing it to detect and respond to potential security threats promptly. Webhooks provide a seamless way to feed event data into SIEM systems, enhancing threat detection, analysis, and reporting capabilities. This integration enables organizations to achieve more effective and efficient security operations, as SIEM software can aggregate and correlate data from diverse sources to provide a comprehensive view of the security landscape. The result is improved incident response and better protection against cyber threats.
Pipedream is an integration platform designed for developers to connect APIs rapidly using a low-code approach. It allows users to create workflows that integrate different applications, data sources, and APIs, without the need for extensive coding. Pipedream facilitates event-driven automations by providing a hosted platform where users can develop and execute workflows that streamline processes and automate tasks. With Pipedream, developers can build efficient connections between various services and systems, reducing the need for manual intervention and accelerating development cycles. The platform offers open source connectors and supports multiple programming languages like Node.js, Python, Go, and Bash. Pipedream simplifies the integration of disparate apps and enables developers to create effective workflows with ease, contributing to enhanced efficiency and productivity in software development.
In Pipedream, a workflow is a sequence of steps that automate processes and connect APIs. Workflows make it easy to create and manage integrations, allowing developers to connect different applications, services, and data sources. Workflows consist of steps that are executed in order, and they can include actions, code, and triggers. Triggers define when a workflow is initiated, such as through HTTP requests or scheduled intervals. Each step in a workflow can perform actions like connecting to APIs, manipulating data, and more. Pipedream enables users to create workflows with code-level control when needed, and even offers a no-code approach for automation. Workflows in Pipedream simplify the automation of complex tasks, integration of APIs, and the creation of event-driven processes.
Trigger is a fundamental concept that defines the initiation of a workflow. Triggers specify the type of event or condition that starts the execution of a workflow. These events can include HTTP requests, data from external apps or services, scheduled intervals, and more. When a trigger event occurs, the associated workflow is automatically initiated, and the defined steps within the workflow are executed sequentially. For instance, you can set up a trigger to activate a workflow when an HTTP request is received at a specific URL, allowing you to automate actions based on external events. Pipedream's triggers enable developers to create dynamic and event-driven workflows that respond to various inputs and conditions, enhancing automation and integration capabilities.
Create your first trigger by using the [New HTTP/Webhook Requests] option.
No need to configure anything here.
We are going to use the newly created URL later when configuring a webhook in Focus.
Please refer to the guidance on how to set up webhooks.
To capture classification events we are using “flow=CLASSIFICATION” query.
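Before starting a scan, the trigger can be exercised with a manual test request to confirm events reach Pipedream. The URL is the one generated by the trigger, and the body below is only a stand-in; the real classification event schema is defined by the platform:

curl -X POST "https://YOUR-ENDPOINT.m.pipedream.net" \
  -H "Content-Type: application/json" \
  -d '{"flow": "CLASSIFICATION", "file": "test.docx", "classification": "Internal"}'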
Please refer to the guidance on how to start a scan.
We can continue with our workflow after the first event reaches the Pipedream workflow.
After configuring Pipedream to add rows to our test spreadsheet in Google Sheets, our workflow is complete.
We can now Deploy it and head over to the Sheet to see it in action.
The workflow is now complete and, as a result, our Sheet is being populated with classification events.
How to configure IAM connection to gather permissions and access rights for groups and users on an AWS IAM.
Sign in to the AWS Management Console and open the IAM console with the appropriate admin level account
In the navigation pane on the left, choose Policies and then choose Create policy
In the Policy editor section, find the Select a service section, then choose IAM service, and select Next
In Actions allowed, choose the below actions to add to the policy:
Read > GetUser
Read > GetPolicyVersion
Read > GetPolicy
For Resources, choose All and select Create policy to save the new policy
Sign in to the AWS Management Console and open the IAM console with the appropriate admin level account
In the navigation pane on the left, choose Users and then choose Create user
On the Specify user details page, under User details, in User name, enter the name for the new user, for example iam-connector-user, and select Next
On the Set permissions page, select Attach policies directly and choose the policy created in the above steps
Select Next
Once the user is created, select it, and from the user page, choose Create access key
Select Other then Next
Enter a description if you wish and select Create access key
The Access and Secret Access Keys have now been created. These can be downloaded as a CSV, and also copied from this section. NOTE: the secret access key cannot be viewed once you leave this page
Navigate to Administration -> Data Sources -> AWS IAM -> New scan
Provide the access key and secret access key values generated in the above steps
Save the configuration
Once the configuration is saved, click on the icon on the right and select Start trustee scan to begin the scanning
The scan results can be viewed under Dashboard -> Access Governance
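For reference, the same policy, user and access key can also be created with the AWS CLI instead of the console. A sketch under the assumption that iam-connector-policy.json contains the Read and List actions listed above (names and the account ID are placeholders):

# create the policy from a JSON policy document
aws iam create-policy --policy-name iam-connector-policy \
  --policy-document file://iam-connector-policy.json

# create the user, attach the policy, and generate an access key
aws iam create-user --user-name iam-connector-user
aws iam attach-user-policy --user-name iam-connector-user \
  --policy-arn arn:aws:iam::ACCOUNT_ID:policy/iam-connector-policy
aws iam create-access-key --user-name iam-connector-user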
Overview of Lineage
Data Lineage in Getvisibility provides a comprehensive view of a file's lifecycle, tracking its origin, movement, transformation, and usage. This enhances security, compliance, and forensic investigations by offering end-to-end visibility into data activities.
Traditional data monitoring provides static snapshots, which quickly become outdated, especially for large datasets. Real-time lineage addresses this by:
Reducing Dependency on Rescans: Once streaming is enabled, changes are captured instantly.
Improving Visibility: Organizations can see data movements in near real-time.
Enabling Faster Incident Response: Security teams can quickly assess and respond to threats.
Data Lineage was developed to enable forensic investigations, ensuring organisations can:
Investigate Incidents: Identify the root cause of security incidents, such as data breaches or unauthorised sharing.
Enhance Compliance: Maintain audit trails for regulatory requirements.
Support Risk Mitigation: Quickly respond to suspicious activities and apply appropriate remediation actions.
Connection to Each Data Source: Ensure that each Data Source to be monitored has been configured in Getvisibility.
Enabling Streaming: Activate real-time event streaming for each connector.
From Enterprise Search: Select a file and click on "Lineage" in the dropdown.
From Open Risks: Identify a flagged file and expand the side menu.
Event Type (Create, Modify, Delete, Share, Move, etc.)
Data Source
User Activity
Export lineage details to CSV for auditing and reporting.
Green: Normal activity
Yellow: Medium-risk events (e.g., permission changes)
Red: High-risk events (e.g., external sharing)
Lifecycle: Displays the complete lifecycle of a file from creation to current state.
Event Timeline: Chronological list of all file-related actions.
User & Device: Shows which users and devices interacted with the file.
File Path: Original and current locations of the file.
Create
Modify
Delete
Change Permissions
Share
Move
Copy
Google Drive: Audit log events available.
Azure (SharePoint Online, OneDrive, Blob, Files): Audit log events supported.
Box & Confluence: Extended events available in regular logs.
AWS S3, SMB, Dropbox: Limited to Create, Modify, and Delete.
Lineage supports forensic investigations, such as:
External Sharing Investigation: When a file is shared externally, security analysts can trace its history to determine if the action was intentional or accidental.
Suspicious Activity Investigation: If a user accesses and downloads sensitive information after a password reset, lineage provides detailed insights.
Incident Response: Analysts can determine what actions to take, such as revoking access, quarantining files, or addressing user behaviour.
Enterprise Search: Select the file, click the dropdown, and choose "Lineage."
File View: Expand the file details and navigate to the "Lineage" tab.
Event Description: Hovering over event icons shows a brief description.
Export: Export the entire lineage history, including metadata, to CSV for audit trails and reporting.
Data Lineage empowers organisations with real-time visibility, advanced threat detection, and comprehensive forensic capabilities, ensuring sensitive data remains secure and traceable.
How to configure SharePoint On-Premise connection to scan it.
Navigate to Administration -> Data Sources -> SharePoint On-Premise -> New scan
Provide the Domain URL, an admin username and its password
Click on the Folder icon in Site and path to select a particular site to scan, or leave the path as empty to scan all sites
Save the configuration
Once the configuration is saved, click on the icon on the right and select Start file scan to begin the scanning
The results can be viewed under Dashboard -> Enterprise Search
An admin level user is required to scan and tag files in SharePoint On-Premise. The user must be a member of Site Owners Group where they have full control permissions to the SharePoint site.
The default Getvisibility tags need to be created as a new column in their SharePoint. This process is described below:
In SharePoint, navigate to Documents
In the files view, select + Add column
The connector supports SharePoint 2013, 2016, 2019.
Getvisibility DDR offers a Quick Start option for enabling out-of-the-box data controls.
Go to Administration > Quick Start.
Under the Data Controls section, enable predefined DDR rules, such as:
Public Exposure of Personally Identifiable Information (PII).
Detection of Protected Health Information (PHI).
Monitoring of Payment Card Industry (PCI) data.
Import the desired Control Rules to start monitoring immediately.
Server sizing to utilise Getvisibility products.
Note that any sizing provided can be subject to change due to environmental variables. Below we outline some of the variables that can affect the sizing:
Server resources, e.g. physical vs virtual, the underlying physical CPU, RAM, and disk specs/speeds, and whether they are shared or dedicated.
Network (speed, latency and throughput)
Geolocation
This guide details how to create and configure an iManage connector to scan an on-premise iManage Work Server.
To connect Forcepoint DSPM to your iManage server, you will need to gather three key pieces of information:
Your Server's URL: The fully qualified domain name of your iManage server (e.g., imanage.mycompany.com).
An Application Client ID: A unique ID from your iManage Control Center that identifies the Getvisibility application.
A Service Account: A dedicated iManage service account username and password.
This guide provides steps on how to enable real-time data streaming for a SMB connection and monitor streaming events within the Getvisibility platform.
This guide provides steps on how to enable real-time data streaming for a OneDrive connection and monitor streaming events within the Getvisibility platform.
How to set up a Detector
Detectors are features that allow users to set up alerts for certain parameters during a classification search. A user can set up a Detector to search for keywords within the entire contents of a document or file, as well as search for keyword hits within the file's pathname. It uses advanced AI and ML search techniques such as Fuzzy Word Search and Percolation to search through documents much more quickly than a traditional pattern-matching search, such as using Regular Expressions.
An example of a Detector that a user could set up is “Employee Salary”. A user might want to ensure that documents that contain this information are not publicly shared or shared internally throughout an organisation.
How to create an iManage Connector app to connect to iManage accounts for the cloud.
Registering an iManage App
To register an iManage App you need to contact iManage support by sending an email to
Once an account is created, log in to
Click on username in the upper right corner and click Control Center
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=vX.Y.Z-rc1 sh -
kubectl get pod/gv-essentials-consul-server-0 -o jsonpath='{.spec.nodeName}'
kubectl exec -it gv-essentials-consul-server-0 -- consul snapshot save /consul/data/backup.snap
kubectl get pvc/data-default-gv-essentials-consul-server-0 -o jsonpath='{.spec.volumeName}' | xargs -I{} kubectl get pv/{} -o jsonpath='{.spec.hostPath.path}'
kubectl get pod/gv-postgresql-0 -o jsonpath='{.spec.nodeName}'
kubectl exec -it gv-postgresql-0 -- bash -c "pg_dumpall -U gv | gzip > /home/postgres/pgdata/backup.sql.gz"
kubectl get pvc/pgdata-gv-postgresql-0 -o jsonpath='{.spec.volumeName}' | xargs -I{} kubectl get pv/{} -o jsonpath='{.spec.hostPath.path}'
systemctl disable firewalld --now
systemctl disable fapolicyd.service
systemctl disable nm-cloud-setup.service nm-cloud-setup.timer
reboot
sysctl crypto.fips_enabled
fips-mode-setup --disable
~$ cat /etc/systemd/system/k3s.service
ExecStart=/usr/local/bin/k3s \
server \
'--node-name=local-01' \
'--prefer-bundled-bin' \
~$ sudo systemctl daemon-reload
~$ sudo systemctl stop k3s
~$ sudo systemctl start k3s
~$ sudo reboot
File Totals
{{ count('files') }} were discovered in the {{ connector.name }} cloud.
Of this number {{ count('files', 'fileType=doc OR fileType=txt') }}
were classified as they contain text, or are recognisable file types or data.
File Totals
1000 were discovered in the Confluence cloud.
Of this number 800 were classified as they contain text, or are recognisable file types or data.

















Read > GetUserPolicy
List > ListUserPolicies
List > ListAttachedGroupPolicies
List > ListAttachedUserPolicies
List > ListGroups
List > ListUsers
List > ListGroupsForUser














Upload
Download






























































median
Attribute - the attribute to aggregate. All functions, except count, require this parameter. For example: sum('files', 'fileType=doc OR fileType=txt', 'contentLength') will return the sum of the sizes of all files with the doc or txt file type.
curl -sfL https://assets.master.k3s.getvisibility.com/k3s/k3s.sh | INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" sh -s - server --node-name=local-01
kubectl get deploy -n kube-system --output name | xargs -n1 -t kubectl rollout status -n kube-system
wget --quiet https://raw.githubusercontent.com/helm/helm/master/scripts/get -O /tmp/get_helm.sh \
&& chmod 0755 /tmp/get_helm.sh \
&& /tmp/get_helm.sh -v v3.8.2
kubectl create secret docker-registry gv-docker-registry \
--docker-server=https://images.master.k3s.getvisibility.com \
--docker-username=$USERNAME \
--docker-password=$PASSWORD \
--docker-email=[email protected]
helm repo add gv_stable https://charts.master.k3s.getvisibility.com/stable --username $USERNAME --password $PASSWORD
helm repo update
helm upgrade --install gv-essentials gv_stable/gv-essentials --wait \
--timeout=10m0s --kubeconfig /etc/rancher/k3s/k3s.yaml \
--set eck-operator.enabled=true --set eck-operator.settings.cpu=1 \
--set eck-operator.settings.memory=1 --set eck-operator.settings.storage=40 \
--set updateclusterid.enabled=false --set keycloak.url=IPADDRESS/DNS/FQDN
kubectl get deploy --output name | xargs -n1 -t kubectl rollout status
helm upgrade --install gv-essentials gv_stable/gv-essentials --wait \
--timeout=10m0s --kubeconfig /etc/rancher/k3s/k3s.yaml \
--set eck-operator.enabled=true --set eck-operator.settings.cpu=8 \
--set eck-operator.settings.memory=20 --set eck-operator.settings.storage=160 \
--set updateclusterid.enabled=false --set keycloak.url= IPADDRESS/DNS/FQDN
kubectl get deploy --output name | xargs -n1 -t kubectl rollout status
helm upgrade --install gv-platform gv_stable/gv-platform --wait \
--timeout=10m0s --kubeconfig /etc/rancher/k3s/k3s.yaml \
--set-string clusterLabels.environment=prod \
--set-string clusterLabels.cluster_reseller=getvisibility \
--set-string clusterLabels.cluster_name=mycluster \
--set-string clusterLabels.product=$PRODUCT
helm upgrade --install gv-platform gv_stable/gv-platform --wait \
--timeout=10m0s --kubeconfig /etc/rancher/k3s/k3s.yaml \
--set-string clusterLabels.environment=prod \
--set-string clusterLabels.cluster_reseller=forcepoint \
--set-string clusterLabels.cluster_name=mycluster \
--set-string clusterLabels.product=$PRODUCT
helm upgrade --install rancher-monitoring-crd gv_stable/rancher-monitoring-crd --wait \
--kubeconfig /etc/rancher/k3s/k3s.yaml \
--namespace=cattle-monitoring-system \
--version=100.1.2+up19.0.5 \
--create-namespace
helm upgrade --install rancher-monitoring gv_stable/rancher-monitoring --wait \
--kubeconfig /etc/rancher/k3s/k3s.yaml \
--namespace=cattle-monitoring-system \
--version=100.1.2+up19.0.5 \
--set k3sServer.enabled=true \
--set k3sControllerManager.enabled=true \
--set k3sScheduler.enabled=true \
--set k3sProxy.enabled=true
kubectl port-forward svc/rancher-monitoring-grafana -n cattle-monitoring-system 3001:80 --address='0.0.0.0'
and in your browser type the address http://$SERVER_IP:3001
kubectl port-forward svc/rancher-monitoring-prometheus -n cattle-monitoring-system 3001:9090 --address='0.0.0.0'
and in your browser type the address http://$SERVER_IP:3001
{{ if (count('files', 'sensitive=true') > 0) }}
Sensitive files have been detected!
{{ else }}
You are safe!
{{ endif }}
{{ if (count('files', 'sensitive=true') > 0) }}
Sensitive files have been detected!
{{ endif }}
{
"roles":{
"client":{
"dashboard":[
{
"name":"ADMIN"
},
{
"name":"AGENT_CONFIGURATION_WRITE"
},
{
"name":"ANALYTICS_READ_ONLY"
},
{
"name":"ANALYTICS_WRITE"
},
{
"name":"COMPLIANCE_HUB_READ"
},
{
"name":"COMPLIANCE_HUB_WRITE"
},
{
"name":"CONNECTIONS_READ_ONLY"
},
{
"name":"CONNECTIONS_WRITE"
},
{
"name":"DATA_REGISTER_READ"
},
{
"name":"DATA_REGISTER_WRITE"
},
{
"name":"DATA_RISK_READ_ONLY"
},
{
"name":"DATA_RISK_WRITE"
},
{
"name":"DEPARTMENTS_FULL_READ"
},
{
"name":"DEPARTMENTS_FULL_WRITE"
},
{
"name":"DEPARTMENTS_PARTIAL_READ"
},
{
"name":"DEPARTMENTS_PARTIAL_WRITE"
},
{
"name":"EXPLORE_FILES_PAGE_READ_ONLY"
},
{
"name":"EXPLORE_FILES_PAGE_WRITE"
},
{
"name":"EXPLORE_TRUSTEES_PAGE_READ_ONLY"
},
{
"name":"EXPLORE_TRUSTEES_PAGE_WRITE"
},
{
"name":"LANGUAGE_SETTINGS_WRITE"
},
{
"name":"PATTERN_MATCHING_READ_ONLY"
},
{
"name":"PATTERN_MATCHING_WRITE"
},
{
"name":"TAGGING_READ_ONLY"
},
{
"name":"USER"
},
{
"name":"TAGGING_WRITE"
},
{
"name":"USER_MANAGEMENT_WRITE"
},
{
"name":"PERMISSIONS_READ"
},
{
"name":"WEBHOOKS_READ_ONLY"
},
{
"name":"WEBHOOKS_WRITE"
},
{
"name":"CLUSTERING_WRITE"
},
{
"name":"REVOKE_PERMISSIONS_WRITE"
}
]
}
}
}
Select Choice and then Next
Give the name as Classification and the choices as: Public, Internal, Confidential, Highly-Confidential. Select Save
Similarly create Compliance and Distribution columns (if required)
Getvisibility and SharePoint's tags are now aligned
When tags are written to SharePoint files automatically over the API, as the tags are added by Getvisibility, Modified By changes to System Account.
Getvisibility preserves the Modified date where applicable.







Specific location of server. (data center or Azure, AWS, GCP, etc)
Amount of Data
Data info (type, size, number of files, etc)
vendor throttling
Sizing may require adjusting once the platform is operational to meet data classification scanning speed requirements/expectations. Please note that scaling is not linear - you do not need to double the size of the server to double the speed. To double the classification throughput, you may only need to add an additional 2 CPUs and 6GB RAM.
If data/users are spread across multiple geolocations, a server per location is needed, and each server must then be sized accordingly.
The below tables are for environments up to 25,000 users. If a sizing is needed for a larger environment please reach out to Getvisibility.
Synergy (Endpoint agent only) deployment specs:
8
48
1.0
DSPM (DSPM only) deployment specs:
16
80
1.2
Ultimate (DSPM + agent) deployment specs:
20
96
5000 Users or Less
8
48
1.0
10,000 Users or Less
16
80
1.5
15,000 Users or Less
24
112
DSPM (DSPM only) single server deployment
5000 Users or Less
16
80
1.2
10,000 Users or Less
32
144
2.4
15,000 Users or Less
48
208
Ultimate (DSPM + agent) single server deployment
5000 Users or Less
20
96
2.2
10,000 Users or Less
40
176
3.9
15,000 Users or Less
60
256
DSPM + agent + Streaming single server deployment
2000 Users or Less
20
96
2.2
5000 Users or Less
40
176
3.4
10,000 Users or Less
80
336
Additional information: The CPU must support the instructions SSE4.1, SSE4.2, AVX, AVX2, FMA.
Only x86_64 architecture is supported. Minimum CPU speed is 2.2 GHz
Browse to App Registration and select New registration
On the App Registration page enter below information and click Register button
Name: (Enter a meaningful application name that will be displayed to users of the app)
Supported account types:
Select which accounts the application will support. The options should be similar to those below. Select “Accounts in this organizational directory only”:
Leave the Redirect URI empty and click Register
Note the Application (client) ID, Directory (tenant) ID values
Navigate to Manage -> Certificates and secrets on the left menu, to create a new client secret
Provide a meaningful description and expiry to the secret, and click on Add
Once a client secret is created, note its Value and store it somewhere safe. NOTE: this value cannot be viewed once you leave this page
Navigate to Manage -> API permissions on the left menu, and Add a permission
Select Microsoft APIs -> Microsoft Graph
Select Application permissions
Permissions required
Microsoft Graph > Application permissions > Device > Device.Read.All
Microsoft Graph > Application permissions > Directory > Directory.Read.All
Microsoft Graph > Application permissions > Group > Group.Read.All
Microsoft Graph > Application permissions > User > User.Read.All
Once all the required permissions are added, click "Grant admin consent"
A connection string is needed for the storage account you wish to scan.
Login to Azure Portal
If there are multiple tenants to choose from, use the Settings icon in the top menu to switch, via the Directories + subscriptions menu, to the tenant in which the application needs to be registered.
Browse to Storage accounts and select the account to be scanned
Once the storage account is selected, note the Resource group and Subscription ID values in the Overview page
Navigate to Security + networking -> Access keys on the left menu, and click on Show on the Connection string
Copy this Connection string value
Access Control (IAM) Role assignment
In the storage account, go to Access Control (IAM) and assign the Reader role to the Azure app created in the first step
Save the changes.
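If preferred, the same values can be retrieved and the role assigned with the Azure CLI instead of the portal; a sketch in which the account name, resource group and application (client) ID are placeholders:

# fetch the storage account connection string
az storage account show-connection-string \
  --name MYSTORAGEACCOUNT --resource-group MY-RESOURCE-GROUP

# grant the app registration the Reader role on the storage account
az role assignment create --assignee APPLICATION_CLIENT_ID --role "Reader" \
  --scope $(az storage account show --name MYSTORAGEACCOUNT \
            --resource-group MY-RESOURCE-GROUP --query id -o tsv)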
Navigate to Administration -> Data Sources -> Azure Files -> New scan
Provide the Connection string value obtained from above steps
Click on the Folder icon in Path to select a particular share to scan, or leave the path as empty to scan all shares
Save the configuration
Once the configuration is saved, click on the icon on the right and select Start file scan to begin scanning
The results can be viewed under Dashboard -> Enterprise Search
In the Policy editor section, find the Select a service section, then choose S3 service, and select Next. Once S3 service permissions are added, next, move on to IAM service
In Actions allowed, choose the below actions to add to the policy:
For scanning
IAM service
Read > GetUser
Read > GetPolicyVersion
Read > GetPolicy
Read > GetUserPolicy
List > ListUserPolicies
List > ListAttachedUserPolicies
S3 service
Read > GetBucketAcl
Read > GetBucketLocation
Read > GetObject
EC2 service
List > DescribeRegions
For revoke permissions (S3 service)
Permission Management > PutBucketAcl
Permission Management > PutObjectAcl
For tagging (S3 service)
Write > DeleteObject
Write > PutObject
Tagging > DeleteObjectTagging
For Resources, choose All and select Create policy to save the new policy
Sign in to the AWS Management Console and open the IAM console with the appropriate admin level account
In the navigation pane on the left, choose Users and then choose Create user
On the Specify user details page, under User details, in User name, enter the name for the new user, for example S3-connector-user, and select Next
On the Set permissions page, select Attach policies directly and choose the policy created in the above steps
Select Next
Once the user is created, select it, and from the user page, choose Create access key
Select Other then Next
Enter a description if you wish and select Create access key
The Access and Secret Access Keys have now been created. These can be downloaded as a CSV, and also copied from this section. NOTE: the secret access key cannot be viewed once you leave this page
Navigate to Administration -> Data Sources -> AWS S3 -> New scan
Provide the access key and secret access key values generated in the above steps
Click on the Folder icon in Path to select a particular bucket to scan, or leave the path as empty to scan all buckets
Save the configuration
Once the configuration is saved, click on the icon on the right and select Start file scan to begin the scanning
The results can be viewed under Dashboard -> Enterprise Search
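If the scan fails to list or read buckets, the new access key can be sanity-checked outside the platform with the AWS CLI (a sketch; the profile name is an example, and the key and secret are entered when prompted):

aws configure --profile s3-connector-user       # enter the access key ID and secret access key
aws sts get-caller-identity --profile s3-connector-user
aws s3api list-buckets --profile s3-connector-user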
This guide will walk you through the steps for your iManage administrator to find this information and how to use it to configure the connector.
Before you begin, ensure the Forcepoint DSPM server has network access to your on-premise iManage server's API. You may need to configure internal firewall rules to allow this connection.
Before you begin, ensure you have the following:
Administrative access to your on-premise iManage Control Center.
The fully qualified domain name (hostname) of your on-premise iManage server (e.g., imanage.mycompany.com).
A dedicated iManage service account with a username and password.
This step must be performed by your internal iManage administrator.
Log in to your on-premise iManage server.
Click on your username in the upper-right corner and select Control Center.
From the side menu, navigate to Applications.
Select Desktop Auth Client from the list.
Copy the Client ID value. This ID is used to identify the Forcepoint DSPM application to your iManage server. You will need this for Part 2 and Part 4.
You can use a command-line tool like curl to perform these one-time steps. Replace your.imanage.server.com with your on-premise server's actual hostname in the commands below.
A. Get Access Token
Run the following command in your terminal. Be sure to replace the placeholder values (YOUR_USERNAME, YOUR_PASSWORD, YOUR_CLIENT_ID) with your actual service account credentials and the Client ID from Part 1.
The JSON response will contain your access_token.
B. Get Customer ID
Run the next command, replacing YOUR_ACCESS_TOKEN with the access_token value you received from the previous step.
The JSON response will contain your customer_id.
This is performed in the iManage Control Center to grant the service account the necessary permissions.
Navigate to Control Center > Roles.
Create or edit the role assigned to your service account.
Grant the following privileges:
For Scanning: System Access > Read-only
For Tagging: Document > Import / Create
For Moving Files: Document > Delete
For Revoking Permissions: System Access > Not Read-only
In the Forcepoint DSPM, navigate to Administration > Data Sources.
Find iManage in the list and click New Scan.
Fill in the connector configuration fields:
Field
Value
Description
Name
My On-Prem iManage
A friendly name for this connection.
Customer Id
(ID from Part 2B)
The numeric Customer ID for your instance.
Username
(Service Account)
The iManage service account username.
Password
(Service Account)
The service account password.
Click Save.
Find your newly configured iManage connection in the list.
Click the ... (three-dot) menu on the right.
Select Start trustee scan to scan permissions (Optional).
Once the trustee scan is complete (optional), click the ... menu again and select Start file scan to scan content.
Permission and access issues can be viewed in Dashboard > Access Governance (if you ran the trustee scan).
File classification and content results can be viewed in Dashboard > Enterprise Search.
From the Data Sources page, select SMB from the list of available data sources. In the Scan Configurations list Create New Configuration
Make sure the connection has a Name and Credentials set. Then select the SMB share Path that is going to be monitored.
After selecting the folder, Select the Data streaming checkbox:
Follow the download tab link and the installation instructions for the SMB agent:
Follow the installation instructions for the SMB streaming agent:
This section addresses the different methods to install the SMB Connector on a single machine.
OS: Windows Server 2016 or later.
Processor: 2 GHz or faster, 2 cores (64-bit processor recommended).
Memory: 4GB RAM.
Hard Disk: 1GB free space.
Administrator Privileges: user needs admin permissions to install.
must be installed.
The SMB Connector supports various configuration options which can be specified via smb_connector_application_config.json
Pre-requisites:
The ZIP of the installer files.
smb_connector_application_config.json file.
Windows Server machine access.
Admin access to install the connector.
Steps
Download the SMB Connector ZIP File: Obtain the ZIP file and save it to the Windows machine.
Prepare for Installation:
Unzip the contents of the ZIP file
Place the smb_connector_application_config.json file in the same directory as the unzipped contents.
Configure the Installer:
Edit the smb_connector_application_config.json file as needed. Use the smb_connector_application_config.json.example file in the unzipped folder if creating the configuration from scratch.
Create a folder mapping for every SMB share on the server that is to be scanned. WatchFolder should be the root directory of the share, and WebhookUrl should be from the scan configuration page for the SMB share on the GV dashboard (shown below).
Keep useDefaultFileFilters set to false if you want all files in the share to be scanned. If set to true, the connector will only scan files supported by the GV Synergy agent for classification.
IncludedExtensions and AdditionalFileFilters can be used if you wish to apply filters other than the defaults. IncludedExtensions supports file extensions in the format .txt, etc. AdditionalFileFilters allows for any custom file filter, including * as a wildcard
Start the Installation:
Execute the install.ps1 script by right clicking and choosing Run with PowerShell
Complete the Installation:
After the installation completes, the PowerShell window can be closed.
Save Streaming configuration
After the subscription is activated (green magnifying glass icon), real-time events will start flowing into the platform, and you will be able to monitor them from various sections of Getvisibility.
Navigate to the Live Events section under Administration to view a detailed audit log of all streaming events (you may specify source filter to focus only on SMB events):
If there are multiple tenants to choose from, use the Settings icon in the top menu to switch, via the Directories + subscriptions menu, to the tenant in which the application is registered
Browse to App Registration and select your application that was created for the scanning
Navigate to Manage -> API permissions on the left menu, and Add a permission
Select Microsoft APIs -> Office 365 Management API
Select Application permission
Select ActivityFeed.Read permission
Permissions required
All the scanning permissions (https://docs.getvisibility.com/scan-with-getvisibility/configure-data-sources/sharepoint-online)
Office 365 Management API ⇒ Application Permissions ⇒ ActivityFeed.Read
Once all the required permissions are added, click "Grant admin consent"
Sign into the Microsoft Purview portal using Microsoft Edge browser
Select the Audit solution card. If the Audit solution card isn't displayed, select View all solutions and then select Audit from the Core section
If auditing isn't turned on for your organization, a banner is displayed prompting you to start recording user and admin activity. Select the Start recording user and admin activity banner.
In certain cases, recording cannot be enabled immediately and requires additional configuration. If this applies, users will be prompted to enable the customization setting. Select OK, and a new banner will appear, informing you that the process may take 24 to 48 hours to complete. After this waiting period, repeat the previous step to proceed with enabling recording.
From the Data Sources page, select OneDrive from the list of available data sources. In the Scan Configurations list, create a New Configuration.
Make sure the connection has a Name and Credentials set. Then select the Path icon.
Click on the Folder icon in the Path field to select the folder you want to monitor for real-time events.
Magnifying glass icon: Folders with this icon next to them indicate that real-time events can be subscribed to from this directory.
After selecting the folder, click Save & Close to finalize the changes.
Clock icon: When data streaming is being activated, the clock icon will appear, indicating that the subscription is being processed. Once the subscription is activated, this icon will change to a green magnifying glass.
After enabling Data Streaming, the system will automatically handle the subscription to OneDrive’s real-time events. There is no need to manually configure Webhooks.
After the subscription is activated (green magnifying glass icon), real-time events will start flowing into the platform, and you will be able to monitor them from various sections of Getvisibility.
Navigate to the Live Events section under Administration to view a detailed audit log of all streaming events.
In this section, you can filter and view event details.
Here is a list of pre-defined common Detectors that can be used.
To begin the setup click on the Create button on the top right corner of the screen.
This brings up the Detector Creation Screen.
Provide a Query Name. For this example "Employee Salaries".
Define where the Search Base of the Detector will look (i.e. search through the contents of a file or the file path). For example to search through the full document contents to look for certain salary-related keywords, select Content.
In the Contain field set the relevant salary-related keywords that might trigger a detector hit in a potentially sensitive document. “Salary” “Compensation Package” “Payslip” “Payroll” “Compensation Structure” “OTE”
If there are terms that the Detector is to ignore, set them in the Not Contain field.
Click the Enabled button to turn on and then Save the Detector.
The new Detector named Employee Salaries should now be visible in the list of Detectors
A new scan will be needed to detect Employee Salaries.
Each token that is added to a detector is related to the other tokens like an OR condition. AND conditions are not available in detectors, but this functionality can be configured indirectly through the data asset registry or directly through RegEx pattern matching.
An important feature of DSPM is the ability to identify data assets that are important to the organisation and assign those assets in the inventory. Detectors are a powerful method that work in conjunction with the AI Mesh to find critical, sensitive, and regulated data during scans.
Once Detectors are configured and scans are underway, users can reference them when writing queries in GQL. Use the detectorHits value as shown below. GQL will give suggestions to help speed up filtering.
Detectors are used along with the AI Mesh to analyse data and visually present findings in the Analytics Dashboard. Detectors associated with various data assets and types can be found through the out-of-the-box widgets and play a crucial role in helping to identify specific important data.
In order to identify employee data during scans it can be useful to add all employee names to a detector. This creates a detector that helps identify HR data located throughout the data estate.
Overall, detectors give a better understanding of the data and help define very specific attributes as well as broad categories of data assets.
Note: Only users with admin role have access to Control Center
Go to the Applications menu item, click Desktop Auth Client and find Client ID
Customer ID should be provided by iManage admins, but if it is not provided, it can be retrieved from the /api response
Get Access Token
Get Customer ID
Go to the Roles menu item and set the following:
Select Global Management to set up admin roles. Enable the necessary options.
Select Library-level Management to set up library roles
Permissions required
For scanning
System Access > Read-only
To move files
Click on the Folder icon in Path to select a particular path to scan, or leave the path as empty to scan all
Save the configuration
Once the configuration is saved, click on the icon on the right and select Start trustee scan to begin Trustee scanning
The scan results can be viewed under Dashboard -> Access Governance
Click on the icon on the right and select Start file scan to begin file scanning
The results can be viewed under Dashboard -> Enterprise Search




How to create an Azure AD Connector app to connect to Azure Active Directory (Microsoft Entra ID).
Login to Azure Portal
If there are multiple tenants to choose from, use the Settings icon in the top menu to switch, via the Directories + subscriptions menu, to the tenant in which the application needs to be registered
Browse to App Registration and select New registration
On the App Registration page enter the below information and click the Register button.
Name: (Enter a meaningful application name that will be displayed to users of the app)
Supported account types:
Select which accounts the application will support. The options should be similar to the below screenshot.
Navigate to Manage -> Certificates and secrets on the left menu, to create a new client secret
Provide a meaningful description and expiry to the secret, and click on Add
Once a client secret is created, note its Value and store it somewhere safe. NOTE: this value cannot be viewed once this page is closed.
Navigate to Manage -> API permissions on the left menu, and Add a permission
Select Microsoft APIs -> Microsoft Graph
Select Application permissions
Permissions required
Scanning only:
Microsoft Graph > Application permissions > AuditLog > AuditLog.Read.All
Microsoft Graph > Application permissions > Directory > Directory.Read.All
Navigate to Administration -> Data Sources -> Azure AD -> New scan
Provide the Directory (tenant) ID, Application (client) ID and Client Secret value generated in the above steps from the Azure application
Save the configuration
Once the configuration is saved, click on the icon on the right and select Start trustee scan to begin scanning
The scan results can be viewed under Dashboard -> Access Governance
Detailed description on Dashboard Widgets
The Analytics page and its boards showcase various metrics, charts, and graphs that detail the findings from data scans, including overexposed files, sensitive data, and data at risk.
These are critical for understanding and managing the organisation's data security and compliance posture. It identifies potential vulnerabilities, risks, and compliance issues, enabling informed decision-making to mitigate threats and enhance data protection strategies.
The primary users of the Analytics page are CISOs (Chief Information Security Officers), security analysts, data protection officers, and IT administrators who are responsible for the organisation's data security and compliance. It provides these stakeholders with a comprehensive overview of the data security health of the organisation.
The Analytics page is fully populated after DSPM scans have been completed, but it can be accessed during a scan to view live information.
This document provides information on how to configure Confluence Cloud connection with real-time events monitoring and data streaming.
The CLI sections are exclusive to Helm and Air-gapped systems - these steps are not required for clusters connected to Rancher. There are no line breaks in the commands, unless otherwise noted.
How to find the history of Scans performed on a Data Source
Go to Administration > Data Sources
Click on a Data Source
Click on the “Last Scan Status” symbol
curl -X POST "https://your.imanage.server.com/auth/oauth2/token" \
-d "username=YOUR_USERNAME" \
-d "password=YOUR_PASSWORD" \
-d "grant_type=password" \
-d "client_id=YOUR_CLIENT_ID"curl -X GET "https://your.imanage.server.com/api" \
-H "X-Auth-Token: YOUR_ACCESS_TOKEN"1.4
2.0
20,000 Users or Less
32
144
2.5
25,000 Users or Less
40
176
3.0
3.6
20,000 Users or Less
64
272
4.8
25,000 Users or Less
80
336
6
5.6
20,000 Users or Less
80
336
7.3
25,000 Users or Less
100
416
9.0
6.3
15,000 Users or Less
120
496
9.2
20,000 Users or Less
160
656
12.1
25,000 Users or Less
200
816
15
Client Id
(ID from Part 1)
The application Client ID.
Domain
your.imanage.server.com
Crucial: Your on-premise server's hostname.
Path
(Optional)
Leave blank to scan all content, or click the folder icon to select a specific path.























List > ListAllMyBuckets
List > ListBucket
Tagging > PutObjectTagging














































Document > Delete
To revoke permissions
System Access > Not Read-only
For tagging
Document > Import / Create
Navigate to Administration -> Data Sources -> iManage -> New scan
Provide the customer id, client id, username, password and domain value














“Accounts in this organizational directory only” can be selected:
Leave the Redirect URI empty and click Register
Note the Application (client) ID, Directory (tenant) ID values
Once all the required permissions are added, click Grant admin consent














The Analytics page gathers its information through the DSPM platform's data discovery, classification, and risk assessment processes. The platform’s connectors are set up to scan the organisation's digital environment, identifying and classifying data across systems and repositories, and evaluating the risks based on various factors such as sensitivity, exposure, and compliance requirements. This data is then aggregated, analysed, and presented on the Analytics Boards in an easily digestible format.
The Analytics page is found within the DSPM platform's user interface under the dedicated "Analytics" section.
DSPM comes with 22 preconfigured boards out-of-the-box. Here are brief descriptions of the use cases they cover.
Financial Data At Risk: Focuses on identifying and mitigating risks associated with financial data, essential for preventing fraud and ensuring regulatory compliance.
Data Exposure: See potential data exposure risks, including ransomware impact, sensitive data distribution, and high-risk data locations across various assets and attributes.
Classification Overview: Provides a snapshot of data classification across the organisation, aiding in the identification of sensitive data and ensuring compliance with data protection regulations.
Key Data Overview: Highlights critical data assets within the organisation, enabling focused protection efforts on the most valuable and sensitive information.
Cataloged Files: Offers a detailed inventory of all catalogued files. These are files that have not passed through the ML pipeline. This helps identify any data issues.
Shadow Data: Reveals unmanaged or unknown data residing outside of controlled environments, reducing risks associated with data sprawl and exposure.
HR Data At Risk: Highlights vulnerabilities within human resources data, protecting sensitive employee information from breaches and unauthorised access.
Data Risk Assessment: Offers a detailed view of data risk factors, highlighting high-risk files, ownership gaps, and critical exposures to aid in mitigating security threats and ensuring compliance.
Unprotected Data: Identifies data lacking adequate security controls, allowing for quick remediation and the strengthening of data protection measures.
Data Ownership: Clarifies data stewardship within the organisation, promoting accountability and facilitating effective data management and security practices.
Duplicate Files: Identifies and addresses issues of data redundancy, improving storage efficiency and data management practices.
Data Risk Management: Identify and assess risks related to PHI, medical data, and PII exposure across various files and categories using the Data Risk Management dashboard.
Ransomware Exposure: Evaluates the organisation's vulnerability to ransomware attacks, facilitating proactive measures to protect critical data assets.
ROT Data: Identifies redundant, obsolete, or trivial (ROT) data that clutters systems and poses unnecessary risk, enabling effective data clean-up and policy enforcement.
Executive Data at Risk: Targets the specific data risks associated with executive-level information, ensuring high-profile data receives adequate security measures.
High Risk Users: Identifies users with excessive permissions or abnormal access patterns, enabling organisations to mitigate insider threats and enforce least privilege access policies.
Classification Distribution: This dashboard provides a comprehensive overview of data classification, distribution, and storage locations across different sensitivity levels and data sources.
Scan Status: The Scan Status board provides real-time insights into the progress of ongoing data scans, allowing organisations to monitor the coverage and completeness of their data discovery and security efforts.
Gen-AI Oversight: Monitors GPT deployments and user activities, providing insights into file uploads, chat usage, and potential risks associated with AI-driven operations.
Data Compression Schedules: Provides insights into data compression activities, optimising storage utilisation and enhancing data management efficiency.
Data Incidents: Summarises past and present data security incidents, providing insights from past incidents and enhancing organisational resilience against future threats.
Gen-AI Readiness: Evaluates a company's readiness for adopting GenAI by analysing the availability and sensitivity of data that could be used for AI training or RAG applications.
While the default boards provide excellent coverage for the most frequent data security and compliance use cases, it can be beneficial to edit some of the input parameters to suit specific customer requirements.
The interface for editing the boards' widgets is designed for ease of use, incorporating GQL (Getvisibility Query Language) and graphical elements.
See the GQL Reference Guide for full information.
There are a number of widgets available, and each has its own unique customisation options.
The widget's design aims to provide a customisable, at-a-glance view of specific data metrics, which is particularly useful for quickly assessing the volume of data that matches certain criteria, such as sensitive files or risk levels.
Users can choose the dataset they wish to count from, like files, trustees, or agent activities. They can also employ GQL to refine their search and set the aggregation function (e.g., count, sum, average).
This section allows users to add a descriptive title, position it accordingly, select an icon to represent the data visually, and choose primary and secondary colours for the widget's theme. Users can also toggle the compact mode to change the widget's display size.
These widgets are designed to help users tailor the display of data analytics to their preferences for better interpretation and presentation of data insights. They can be one of several types: Horizontal Bar, Vertical Bar, Line, Area, or Pie.
This tab allows users to select the type of dataset to visualise (e.g., files or trustees) and use GQL for specific queries. The 'Field to group by' feature is used to categorise data, with adjustable limits on the results displayed and thresholds for inclusion in the visualisation.
Users can adjust general settings like chart type, add a title, adjust margins for clarity, and choose a colour palette for the chart. Options for additional customisations such as enabling grid lines or flipping colours for visual differentiation are also present.
The Legend section has toggles for enabling a legend display and showing percentages, with adjustments for size and positioning on the chart.
The map widget is an interactive element that displays geographical data. It is configurable to show specific information based on user-defined criteria. Geographic location can be added during scan set up and is crucial in discovering data sovereignty violations.
This interface enables the use of GQL to query and filter the data that will be displayed on the map. Enter the query in the search bar and click "SAVE" to apply the filters or "CANCEL" to exit without making changes.
Here the map's appearance can be customised. Adjust the data, area, border, and background colours using the colour selection tools, and add a title or subtitle as needed.
The text widget allows for rich text creation and editing. Users can format the text with the various styling options provided.
The toolbar has standard text editing options. Users can enter and format their text in the area below the toolbar.
The table widget displays data in a structured format.
This interface shows the selection of a data source (SharePoint Online) and the path to specific files within that source. The use of GQL is available to further query and refine the data. Options to export the data as a CSV file or view the table on the page are provided. Users can set the result limit, PDF export limit, sorting field and order, and select which columns to display before saving.
In this settings panel, you can add a title and subtitle for the table, and choose their alignment on the page (e.g., left, center, right).
This widget is designed to monitor and report on pre-configured data compliance issues, focusing on various data security and management rules.
Users can select a specific DSPM+ rule from a dropdown menu to focus on. The widget displays the count of rule violations and the corresponding files affected.
The Dual Data Grouping widget is used to organise and visualise complex datasets by multiple attributes simultaneously. It enables detailed analysis of complex data sets by allowing an examination of two separate data attributes concurrently. This enhances the understanding of the relationships within data.
The settings allow you to define the 'Label' and the 'Field to group by', which in this case is 'Data Attribute Name', and set a display limit for these groupings. Use the GQL search bar to refine the data set. After setting up, click "SAVE" to update the widget or "CANCEL" to discard changes.
The widget title and colour palette can be customised to visually distinguish the different groupings.
There is also the option to 'Flip Colours' for the display, to improve visual contrast or accessibility.
The 'Position' dropdown allows the title and subtitle to be aligned.
The Multi Counter widget is designed to track and display counts for multiple items or categories within a dataset, useful for monitoring and comparing quantities at a glance.
Each counter can be set to track a different field. Users can customise the criteria for each counter using the search fields provided and add additional counters if needed.
DSPM Analytics presents essential data insights through its interface, offering a practical snapshot of data security and compliance statuses. This straightforward overview assists those in charge of data security with the necessary information to make quick, informed decisions to protect their organisation’s data.
Login to Azure Portal
If there are multiple tenants to choose from, use the Settings icon in the top menu to switch to the tenant in which the application needs to be registered, via the Directories + subscriptions menu.
Browse to App Registration and select New registration
On the App Registration page, enter the information below and click the Register button
Name: (Enter a meaningful application name that will be displayed to users of the app)
Supported account types:
Select which accounts the application will support. The options should be similar to those below. Select “Accounts in this organizational directory only”:
Leave the Redirect URI empty and click Register
Note the Application (client) ID, Directory (tenant) ID values
Navigate to Manage -> Certificates and secrets on the left menu, to create a new client secret
Provide a meaningful description and expiry to the secret, and click on Add
Once a client secret is created, note its Value and store it somewhere safe. NOTE: this value cannot be viewed once you leave this page
Navigate to Manage -> API permissions on the left menu, and Add a permission
Select Microsoft APIs -> Microsoft Graph
Select Application permissions
For UnifiedPolicy.Tenant.Read
Navigate to Manage -> API permissions on the left menu, and Add a permission
Select APIs my organization uses tab
Search for Microsoft Information Protection Sync Service
Select Application permissions > UnifiedPolicy.Tenant.Read
For InformationProtectionPolicy.Read.All
Navigate to Manage -> API permissions on the left menu, and Add a permission
Select APIs my organization uses tab
Permissions required
For scanning
Microsoft Graph > Application permissions > Sites > Sites.Read.All
Microsoft Graph > Application permissions > Directory > Directory.Read.All
Once all the required permissions are added, click "Grant admin consent"
Navigate to Administration -> Data Sources -> OneDrive -> New scan
Provide the Directory (tenant) ID, Application (client) ID and Client Secret value generated in the above steps from the Azure application
Click on the Folder icon in Path to select a particular user's OneDrive to scan, or leave the path as empty to scan all users
Save the configuration
Once the configuration is saved, click on the icon on the right and select Start file scan to begin the scanning
The results can be viewed under Dashboard -> Enterprise Search
Ensure the following prerequisites are met:
Existing Confluence Cloud Instance: There needs to be an active Confluence Cloud instance.
Enable Development Mode: Activate Development Mode on the Confluence Cloud site to be monitored. Refer to the official Confluence documentation.
Deploy Proxy Container: Set up the Getvisibility container with a public proxy to allow integration with Confluence Cloud.
In the product UI, go to the Data Sources > Confluence Cloud page.
Locate the existing Confluence Cloud scan configuration and select Edit Configuration.
Within the Edit Confluence Cloud Configuration page, toggle Data Streaming to ON.
Copy the Webhook URL provided, as it will be used later.
Click Save & Close to apply changes.
To enable data streaming, the confluence-cloud-streaming-proxy container needs to be deployed in the infrastructure, e.g. using Docker or Kubernetes. This step involves configuring environment variables and setting up Docker for integration with Confluence Cloud.
Deployment Instructions
Download Docker image parts: Please download all files listed below:
Merge Docker image parts:
Load Docker image:
Prepare a Docker Environment: Ensure that Docker is installed and configured on the infrastructure where the confluence-cloud-streaming-proxy application will be hosted. This will be the user environment.
Set Environment Variables: Configure the following environment variables to allow the Confluence Cloud instance to communicate with the proxy application:
APP_LISTENER_PUBLIC_ACCESSIBLE_URL
Publicly accessible URL at which the app can be reached. It is used for communication between the Confluence Cloud webhook mechanism and the app.
e.g.
APP_WEBHOOK_URL
Webhook URL (taken from Getvisibility UI Confluence Cloud connector configuration form)
e.g.
Map Persistent Volume: Map a persistent volume to the /app/db/ directory within the container to ensure data retention across sessions.
Example docker-compose.yml Configuration
Use the following example to help set up the Docker configuration. Update the values as needed for the specific environment:
Once configured, start the container by running docker-compose up -d or an equivalent command based on configured setup.
To expose the application publicly, consult the relevant internal team, such as IT or DevOps. For testing, ngrok's free plan can be used to expose the app port as needed.
Start the Application: Ensure the application runs before proceeding with the integration setup.
To install the integration, follow the steps:
Go to the Manage apps page in Confluence Cloud.
Select the Upload app
Paste the publicly accessible address in the form and press Upload.
The application will install, and the integration will be ready in a few seconds.
To uninstall the integration follow the steps:
Go to the Manage apps page in Confluence Cloud.
Find Getvisibility Confluence Cloud Streaming Proxy and click Uninstall.
Confirm by selecting Uninstall app.
Delete any associated containers and settings from your organization’s infrastructure
Select Create New App and then Custom App
Select Server Authentication (with JWT) and enter app name, then click Create App
In the Configuration tab, change App Access Level to App + Enterprise Access, then, enable Generate user access tokens and Make API calls using the as-user header.
Click on Save changes
Make sure the below Application Scopes are selected
Content Actions > Read all files and folders stored in Box
Content Actions > Write all files and folders stored in Box
Administrative Actions > Manage users
Administrative Actions > Manage groups
In the same Configuration tab, scroll down to Generate a Public/Private Keypair
This will result in a JSON file being downloaded by the browser
In the Authorization tab, click Review and Submit, add a description, and submit the app for review
Make note of User ID and Enterprise ID of the App in General Settings tab
Exit Dev Console and switch to the Admin Console
In Admin Console, go to Apps > Custom Apps Manager and locate the newly created app and click View button
Review the information and Authorize the app
Navigate to Administration -> Data Sources -> Box -> New scan
Provide the values generated in the above steps from the Box application
Click on the Folder icon in Path to select a particular folder to scan, or leave the path as empty to scan all folders
Save the configuration
Once the configuration is saved, click on the icon on the right and select Start trustee scan to begin the trustee scanning
The scan results can be viewed under Dashboard > Access Governance
Click on the icon on the right and select Start file scan to begin the files scanning
The results can be viewed under Dashboard > Enterprise Search
The Box Pricing Plans required for metadata writing are Business Plus, Enterprise, or Enterprise Plus. The basic Business plan does not include custom metadata and metadata templates.
A metadata template must be created to support Getvisibility's tags. Please follow the steps below to achieve this.
In the Admin Console, in the lefthand navigation click Content
Toward the top of the page, click Metadata
Click Create New
Click Name Your Template and enter name as getvisibility
Create a new attribute named as Classification with options as: Public, General Business, Confidential, Highly-Confidential
Similarly, create two more attributes:
Distribution with options as: Internal, External
Compliance with options as: PCI, PII, PHI
Use the Status drop down to indicate this template is Visible
Click Save
The customer is asked to provide the CA certificate that was used to sign the end-entity certificate of the LDAP server. We add this certificate to Keycloak's trust store to make sure the LDAP server's certificate validates successfully.
The provided CA certificate must be an X.509 v3 certificate in the ASCII PEM format (Base64-encoded). The file extension is usually .crt or .cer or .pem. Its content looks like this:
The following command confirms that the cert is in the expected format and that it is, in fact, a CA certificate. Validity is not checked here.
# openssl x509 -in /path/to/ca.crt -text -noout | grep CA
The next command validates the LDAP server’s certificate against the provided CA certificate in the customer’s environment, where the LDAP server is accessible on port 636:
# echo "q" | openssl s_client -connect dc.mycompany.com:636 -CAfile /path/to/ca.crt | grep -i verif
This command will create a JKS truststore file (ca.jks) and add the certificate (ca.crt) to it, protecting it with a password (STR0ngPA55).
# keytool -importcert -file ca.crt -keystore ca.jks -alias rootca -storepass STR0ngPA55 -noprompt
Encode the truststore file with base64; the output is used when configuring Keycloak in the next step:
# base64 -w 0 ca.jks
Rancher
Apps > Installed Apps > gv-essentials > Keycloak:
Leave the Keycloak URL settings field unchanged.
Check Enable Keycloak for LDAP over SSL.
Enter password for truststore.
Paste value from Step 3 into the Base64 .jks file field.
Click Upgrade in bottom right corner.
CLI
Set KUBECONFIG environment variable:
# export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
Save the truststore file with base64 encoding to a variable:
# export cert=$(base64 -w 0 ca.jks)
Print the variable and confirm it has the right value:
# echo "$cert"
MIIHggIBAzCCBywGCSqGSIb3DQEHAaCCBx0EggcZMIIHFTCCBxEGCSqGSIb3DQEHBqCCBwIwggb+A(...)
Upgrade or reinstall the gv-essentials chart, adding new values while reusing existing ones:
# helm upgrade --install gv-essentials gv_stable/gv-essentials --wait --debug --timeout=10m0s --kubeconfig /etc/rancher/k3s/k3s.yaml --reuse-values --set keycloak.ldaps.enabled=true --set keycloak.ldaps.truststorecert="$cert" --set keycloak.ldaps.truststorepass="STR0ngPA55"
This ensures the new truststore is loaded.
Rancher
Workloads > StatefulSets > gv-keycloak:
Choose the Redeploy option.
CLI
# kubectl rollout restart statefulset gv-keycloak
Enable StartTLS: OFF (default) - this must remain turned OFF; communication will still be encrypted, since the connection uses LDAPS.
Use Truststore SPI: Always (default) - This ensures that the imported certificate store is used to validate the LDAP server’s certificate.
Use the Test connection and Test authentication buttons to make sure both connection and authentication to the LDAP server is successful.
Ideally, both tests return success. (The exact same message is printed for both the connection and the authentication test.)
Below is an example of a message seen on the Keycloak side (web) and its counterpart in the logs of the Keycloak service (pods named gv-keycloak-…).
1.
2024-07-16 09:17:02,557 ERROR [org.keycloak.services] (executor-thread-6) KC-SERVICES0055: Error when authenticating to LDAP: Cannot invoke "org.keycloak.truststore.TruststoreProvider.getSSLSocketFactory()" because "provider" is null: java.lang.NullPointerException: Cannot invoke "org.keycloak.truststore.TruststoreProvider.getSSLSocketFactory()" because "provider" is null
Keycloak is trying to read the trust store to validate a certificate, but the trust store has not been loaded; it could be missing from inside the Keycloak pod (/opt/keycloak/certs/rootCA.jks).
Trust store has not been installed via the GetVisibility Essentials Helm chart?
Redo step 4.
Keycloak has not been restarted after installing certificate?
Redo step 5.
Go to Administration > Data Sources
Click on a Data Source
Find the required Hamburger Menu
Click on Scan History
Either of the above options will show the history of scans performed on the relevant Data Source


How to configure Azure Blob connection for scanning.
Login to Azure Portal
If there are multiple tenants to choose from, use the Settings icon in the top menu to switch to the tenant in which the application needs to be registered, via the Directories + subscriptions menu.
Browse to App Registration and select New registration
On the App Registration page, enter the information below and click the Register button
Name: (Enter a meaningful application name that will be displayed to users of the app)
Supported account types:
Select which accounts the application will support. The options should be similar to those below. Select "Accounts in this organizational directory only":
Navigate to Manage -> Certificates and secrets on the left menu, to create a new client secret
Provide a meaningful description and expiry to the secret, and click on Add
Once a client secret is created, note its Value and store it somewhere safe. NOTE: this value cannot be viewed once you leave this page
Navigate to Manage -> API permissions on the left menu, and Add a permission
Select Microsoft APIs -> Microsoft Graph
Select Application permissions
Permissions required
Microsoft Graph > Application permissions > Device > Device.Read.All
Microsoft Graph > Application permissions > Directory > Directory.Read.All
Microsoft Graph > Application permissions > Group > Group.Read.All
A connection string and a role assignment are needed for the storage account that is to be scanned.
Login to the Azure Portal
If there are multiple tenants to choose from, use the Settings icon in the top menu to switch to the tenant in which the application needs to be registered, via the Directories + subscriptions menu
Browse to Storage accounts and select the account to be scanned
Once the storage account is selected, note the Resource group and Subscription ID values on the Overview page
Navigate to Security + networking -> Access keys on the left menu, and click on Show on the Connection string
Copy this Connection string value
Access Control (IAM) Role assignment - there are 2 options, one is to assign a built-in role, the other is to create and assign a custom role. Using a built-in role is an easier option to configure, while a custom role may be preferred to ensure least privileges assignment for increased security.
Option 1: In the storage account, go to Access Control (IAM) and assign either the Storage Blob Data Owner or Data Contributor role to the blob storage. (Per the Microsoft documentation, the Data Contributor role is the least-privileged built-in role for listing containers.)
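Equivalently, the role can be assigned with the Azure CLI; an illustrative sketch, where the subscription ID, resource group, storage account name and application (client) ID are placeholders:
az role assignment create \
  --assignee <application-client-id> \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"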
*** Firewall rules must also be in place to allow the DSPM server to connect to the storage account
Navigate to Administration -> Data Sources -> Azure Blob -> New scan
Provide the Connection string value obtained from above steps
Click on the Folder icon in Path to select a particular share to scan, or leave the path as empty to scan all shares
Save the configuration
Once the configuration is saved, click on the icon on the right and select Start file scan to begin the scanning
The results can be viewed under Dashboard -> Enterprise Search
We use Kubernetes, an open-source container orchestration system, to manage our applications.
At the moment the only Kubernetes distribution supported is K3s (see the official K3s documentation), by Rancher, for both on-premise and cloud deployments.
Getvisibility offers cybersecurity AI products, specifically aimed at Data Security Posture Management (DSPM). In a broader sense, we also provide solutions for Data Governance. Our flagship product, DSPM+, is a sophisticated file classification pipeline. It seamlessly integrates with various data sources through a range of connectors including, but not limited to, Samba for network file servers, Windows file servers, Google Cloud, AWS, Dropbox, and SharePoint. The process involves downloading all the files from these sources, putting them through a pipeline that applies our cutting-edge artificial intelligence technology to analyse the context of each file, and then classifying them under multiple criteria.
At the heart of this classification pipeline lies an artificial intelligence classification service designed to work on unstructured text. Once the text is extracted from files sourced through various connectors, it undergoes classification by diverse machine learning algorithms.
How to complete the Keycloak installation setup.
Keycloak is an Open-source product which allows Single Sign-On (SSO) and enables Identity and Access Management integration to allow for a quick, safe, and secure integration of authentication within modern applications.
Below are the steps involved in configuring Keycloak, and you may choose to skip the Optional steps.
The integration of Data Streaming and File Lineage into the DSPM platform provides a comprehensive solution for real-time data monitoring and tracking across both cloud and on-premises data sources. This enhancement enables organizations to dynamically track file origins, data transformations and movements, and end-usage in real time, strengthening security, compliance, and auditability. By introducing these functionalities, businesses can seamlessly monitor data activities and movements across various data sources, providing up-to-date visibility over data estate and offering deeper insights into file history for e-forensics use cases and risk mitigation.
By implementing Streaming, we unlock crucial use cases such as File Lineage tracking, and Data Detection and Response capabilities, enabling real-time visibility into data activities. This also builds the foundation for anomaly detection capabilities, frequently requested by customers. For instance, scenarios like a user resetting their password, accessing confidential data, and downloading it can be quickly identified. By providing almost real-time updates and visibility into the data estate, businesses can seamlessly monitor data activities, mitigating risks and improving security.
PRECONDITION:
During cluster installation, network administrators need to open a firewall exclusion for incoming requests to the following path:
https://${HOST_DOMAIN}/scan-manager/external/webhooks/notification
where ${HOST_DOMAIN} is the host domain of the DSPM platform installation.
Merge the Docker image parts and load the resulting image:

cat confluence-cloud-streaming-proxy.tar.gz.part* > confluence-cloud-streaming-proxy.tar.gz.joined
docker load --input confluence-cloud-streaming-proxy.tar.gz.joined

Example docker-compose.yml configuration:

services:
  app:
    image: getvisibility/confluence-cloud-streaming-proxy:v0.3.2
    ports:
      - "8080:8080"
    environment:
      APP_LISTENER_PUBLIC_ACCESSIBLE_URL: https://5977-88-156-142-22.ngrok-free.app
      APP_WEBHOOK_URL: https://tenantabc.getvisibility.com/scan-manager/external/webhooks/notification/71ccab3d56980a2d9c766f42c86d36ffedc34258a0f226aaf56a628f06e9d89d
    volumes:
      - ./app-db/:/app/db/

Example CA certificate in PEM format:

-----BEGIN CERTIFICATE-----
MIIGBTCCA+2gAwIBAgIUaIGnTiJx27iBiIF+4jIkb7o5miswDQYJKoZIhvcNAQEL
...
-----END CERTIFICATE-----

Expected output of the openssl x509 check (confirming the certificate is a CA):

CA:TRUE

Expected output of the openssl s_client verification against the provided CA certificate:

depth=1 C = IE, ST = Ireland, L = Cork, O = MyCompany Ltd, CN = mycompany.com, emailAddress = [email protected]
verify return:1
depth=0 CN = mycompany.com
verify return:1
DONE
Verification: OK
Verify return code: 0 (ok)

Example output of base64 -w 0 ca.jks:

MIIHggIBAzCCBywGCSqGSIb3DQEHAaCCBx0EggcZMIIHFTCCBxEGCSqGSIb3DQEHBqCCBwIwggb+A(...)
The host domain needs to be publicly available on the web.
Ensure that the certificate used is one that is trusted by the Data Source provider. For example, with Microsoft services, more information on the certificates that they accept can be found here.
Multitenancy Setup
For the multitenancy setup, we need to specify ${HOST_DOMAIN} as {{ .Values.clusterLabels.cluster_name }}.{{.Values.clusterLabels.rancher}}.app.getvisibility.com
For Data Detection and Response (DDR) to function effectively, the callback endpoint URL must remain open and accessible beyond just the initial setup phase. DDR relies on real-time event notifications and data stream updates, continuously sent to the callback URL. If the callback endpoint is closed or restricted after setup, DDR will fail to receive critical updates, which may result in:
Delayed or missing alerts on data access, movement, or security threats.
Incomplete monitoring of file lineage and activities, impacting compliance and forensic investigations.
To ensure uninterrupted functionality, organisations must configure their network to allow incoming requests to the callback URL from all necessary data sources.
Additionally, for on-premise deployments, it is critical that the webhook URL is accessible by external resources to receive notifications. If external services cannot reach the callback URL, DDR will not function correctly, leading to missed event detections and security blind spots. Network administrators must ensure the necessary firewall rules and routing configurations are in place to allow external communication with the webhook.
Search for Microsoft Information Protection API
Select Application permissions > InformationProtectionPolicy.Read.All
For Azure Rights Management Services > Content.Writer
Navigate to Manage -> API permissions on the left menu, and Add a permission
Select Azure Rights Management Services tab
Select Application permissions
Select Content > Content.Writer
Microsoft Graph > Application permissions > Files > Files.Read.All
Microsoft Graph > Application permissions > User > User.Read.All
For reading Sensitivity labels
Microsoft Graph > Application permissions > InformationProtectionPolicy > InformationProtectionPolicy.Read.All
APIs my organization uses > Microsoft Information Protection Sync Service > Application permissions > UnifiedPolicy.Tenant.Read
For revoke permissions
Microsoft Graph > Application permissions > Files > Files.ReadWrite.All
For tagging
Microsoft Graph > Application permissions > Sites > Sites.Manage.All
For MIP tagging
Azure Rights Management Services > Application permissions > Content.Writer
Microsoft Graph > Application permissions > Directory > Directory.Read.All
Microsoft Graph > Application permissions > Files > Files.ReadWrite.All
Microsoft Graph > Application permissions > Sites > Sites.Manage.All
Microsoft Graph > Application permissions > InformationProtectionPolicy > InformationProtectionPolicy.Read.All
APIs my organization uses > Microsoft Information Protection API > Application permissions > InformationProtectionPolicy.Read.All
Leave the Redirect URI empty and click Register
Note the Application (client) ID, Directory (tenant) ID values
Microsoft Graph > Application permissions > User > User.Read.All
Once all the required permissions are added, click "Grant admin consent"
We also need to assign Reader role to the azure app created in the first step
Save the changes.
Option 2: This option creates a custom role and assigns the same permissions as the Data Contributor role, except for the delete permissions. In the Blob storage account, go to Access Control (IAM) and click Add to create a new role. Name the role with a preferred name, and choose the following actions below to assign to this custom role. Select this custom role for the blob and save changes.
We also need to assign Reader role to the azure app created in the first step
Real Time Events Monitoring (Streaming) Permissions: To enable "Real Time Events Monitoring (Streaming)", the following additional Azure permission roles are required:
EventGrid Data Contributor
EventGrid EventSubscription Contributor
EventGrid TopicSpaces Publisher
Assign these roles using Access Control (IAM) in the Blob storage account, similar to the steps mentioned above for assigning the Storage Blob Data Owner or Data Contributor role.
Next, in the Networking tab, under Public network access, select "Enabled from all networks" or "Enabled from select virtual networks and IP addresses". If the latter is chosen, then under the Firewall section add the IP address range for the DSPM server.
Enable "Allow trusted Microsoft services to access this storage account" and Save the changes.
Kubernetes distributions can have different components that may cause applications that work in one distribution to not necessarily work or even crash into another. Some of the most important components that differ between distributions are:
Container Runtime: The container runtime is the software that is responsible for running containers. Each Kubernetes distribution may offer support for different container runtimes. Some popular container runtimes include Docker, containerd, CRI-O, CoreOS rkt, Canonical LXC and frakti, among others.
Storage: Storage is important for Kubernetes applications as it offers a way to persist data. Kubernetes' Container Storage Interface (CSI) allows third-party vendors to easily create storage solutions for containerized applications. Some Kubernetes distributions build their own storage solutions while others integrate with existing third-party solutions. Popular storage solutions for Kubernetes include Amazon Elastic Block Store (EBS), GlusterFS, Portworx, Rook and OpenEBS, among others.
Networking: Kubernetes applications are typically broken down into container-based microservices which are hosted in different pods, running on different machines. Networking implementations allow for the seamless communication and interaction between the different containerized components. Networking in Kubernetes is a herculean task, and each distribution may rely on a networking solution to facilitate communication between pods, services and the internet. Popular networking implementations include Flannel, Weave Net, Calico and Canal, among others.
In order to offer our customers a better and more seamless experience when configuring, running, upgrading and troubleshooting our products, while also avoiding compatibility issues between different distributions, we decided to officially support ONLY ONE Kubernetes distribution: K3s. The main reasons for choosing K3s are:
Costs — K3s is 100% open source and there’s no need to pay for any expensive licenses.
Less setup overhead — a lot of time is saved when setting up a new environment because you don’t need to go through a lengthy process of acquiring extra licenses based on how many CPU cores you have. Also, K3s can be installed using only one command.
It supports many Linux distros — K3s supports popular Linux distributions including open source ones, it can also run both on-premise and in the cloud (AWS, Azure, GCP).
It’s fast and lightweight — K3s is packaged as a single <100MB binary and its lightweight architecture makes it faster than stock Kubernetes for the workloads that it runs.
Easy to update — Thanks to its reduced dependencies.
Batteries included — CRI, CNI, service load balancer, and ingress controller are included.
Smaller attack surface — Thanks to its small size and reduced amount of dependencies.
Certified — K3s is an official project that delivers a powerful certified Kubernetes distribution.
Flexible — you can run K3s using single-node or multi-node cluster setup.
The minimum requirement for the Kubernetes cluster is a single node (1 virtual machine) with the following specs:
CPU cores: 8 / 16 / 20
Memory: 32GB / 64GB / 80GB
Storage: 500GB (min 32M inodes) / 600GB (min 39M inodes)
Please also refer to Estimate hardware capacity needs.
Ensure the following items are in place and configured:
Domain Name Service (DNS) with public name resolution enabled
Network Time Protocol (NTP)
Software Update Service - access to a network-based repository for software update packages.
Fixed private IPv4 address
Unique static hostname
For details on how to configure Rancher behind a proxy, refer to the official Rancher documentation and ensure the network settings mentioned above are applied.
If using a proxy, please run the following before starting the k3s.sh installation:
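The exact command depends on the environment; a typical sketch, assuming a standard HTTP proxy and the usual proxy environment variables (replace the host, port and exclusion list with your own values):
export HTTP_PROXY=http://proxy.mycompany.com:3128
export HTTPS_PROXY=http://proxy.mycompany.com:3128
export NO_PROXY=localhost,127.0.0.1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local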
When running the k3s.sh script you need to provide the product name in the form of a PRODUCT_NAME argument. This instructs the installer to test your current environment against the product's requirements, which differ between products.
Allowed product names are:
synergy
focus
dspm
enterprise
ultimate
Capitalization of the name is important. If you provide a name that cannot be recognized, or if you don't provide a product name at all, the script will default to PRODUCT_NAME="dspm".
This is just a regular K3s installation command for when you want to install the Getvisibility Enterprise product.
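A sketch of the expected invocation, assuming the installer script is the k3s.sh referenced above and that the product name is passed as an environment variable:
PRODUCT_NAME=enterprise ./k3s.sh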
If you just want to check whether your environment meets all the requirements, use the ONLY_PRECHECK argument.
The SKIP_SYSTEM_CHECKS argument allows you to skip checking how the installed memory size, number of CPU cores and storage fare against the product requirements.
If you want to skip both the hardware and connectivity checks, use the SKIP_PRECHECK argument.
Be cautious when skipping built-in checks - we built them for the product to achieve optimal performance with minimal required maintenance.
SKIP_PRECHECK=true: skip all built-in checks
SKIP_SYSTEM_CHECKS=true: skip hardware checks
SKIP_NETWORK_CHECKS=true: skip connectivity checks
ONLY_PRECHECK=true: run the precheck only and stop after that
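For example, to skip the hardware checks when installing the DSPM product (an illustrative combination, assuming the same invocation style as above):
SKIP_SYSTEM_CHECKS=true PRODUCT_NAME=dspm ./k3s.sh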
The installer, depending on the combination of arguments provided, will perform a set of actions before returning to the command line.
Provided not all the checks have been skipped, it is possible for the installer to abort the installation process. If that happens, please review the output, paying special attention to any WARN messages. Should you have any concerns or questions, please contact Support with the result screen attached.
Run the kubectl registration command:
The command below is just an example, it will not work during deployment. For direct customers, Customer Support Team will provide the registration command, otherwise you should have been provided registration command in the Welcome Email.
For security reasons the registration command can be used only a single time, the command becomes invalid after the first use. In case you need to run it again you must contact the support team for a new registration command.
Monitor the progress of the installation:
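For example, the deployments and their availability can be checked with standard kubectl commands:
kubectl get deployments --all-namespaces
kubectl get pods --all-namespaces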
The K3s deployment is complete when elements of all the deployments (coredns, local-path-provisioner, metrics-server, traefik and cattle-cluster-agent) show at least "1" as "AVAILABLE"
In case of errors you can inspect the logs of a pod using kubectl logs, e.g. kubectl logs <pod-name> -n <namespace>
Please note that we don’t use Docker as the container runtime, instead we use containerd.
Your network should be configured to allow the following public URLs to be accessible over port 443 (HTTPS), and HTTPS traffic to these URLs must be bypassed (NOT intercepted):
For more details on how to configure Rancher behind a proxy, see Configuring Rancher and Fleet agent to run behind an HTTP proxy.
Rancher might be trying to reach git.rancher.io since it's a default hard-coded repository, but we have our own private repo with all our charts, so it's OK to block it as we can't disable it.
The typical mesh deployment is inhomogeneous, and contains the following types of nodes:
LLM-like miniature language models transforming text into salient document vectors, with between 10 and 30 million parameters;
deep neural network classifiers for sentiment analysis, with fewer than 100,000 parameters, which use the document vectors to produce classification outcomes;
bag-of-word models for topic detection;
filters based on regular expressions or fuzzy text searches;
other types of evaluators (e.g. complexity of text), implemented as Python code segments;
nodes mapping multiple input models into outputs with Bayesian (forward) mappings.
Our deployments are 10 times smaller than even the smallest and most efficient Large Language Model (LLM)-based classifier deployments. This scale allows us to classify a file within 200 milliseconds, relying solely on a normal CPU without the need for specific GPU deployment. Additionally, given that our models are 10,000 times smaller than typical large AI deployments, we are not subject to regulations that apply to large AI deployments, such as the EU AI Act.
This network typically generates a multitude of classification outcomes, or signals. Each classification decision is generally binary—true or false—indicating whether the text viewed by the AI mesh is related to a specific signal. Furthermore, each outcome is accompanied by a confidence value, which is a number between zero and one. In rare instances, constituting less than 5% of the cases, the mesh outputs a categorical signal. Unlike the binary true/false, it classifies the text into one of three, four, or possibly even five mutually exclusive categories.
The Service Level Agreement (SLA) for the accuracy of the ML components used in the AI mesh stipulates no less than 80% accuracy on a balanced dataset—comprising 50% positive examples and 50% negative examples. This accuracy rate is measured on an out-of-sample basis, meaning the data used for this accuracy assessment is not employed in training the machine learning model. This approach provides insights into the model's ability to generalize.
The confidence level associated with each classification outcome in machine learning models, whether binary or categorical, varies between zero and one and indicates the certainty of the prediction. A confidence of 0 suggests that the classifier views the prediction as no better than a random guess, implying a 50% probability of accuracy. On the other hand, a confidence of 1 indicates maximum certainty, meaning the input data closely matches the training data for the given classification. Confidence levels between 0 and 1 are linear and uniformly distributed based on the training data, with a confidence of 0.5 representing a median level of certainty.
The AI Mesh functions as a Bayesian network, where results are propagated forward. This process involves using machine learning models, such as a Continuous Bag Of Words (CBOW) model and various filters, to determine whether a file is confidential. Both outcomes (true and false) are considered with their respective probabilities, which are then propagated forward to influence the confidence score. Users utilizing this confidence score will take into account its value, leading to situations where a strong classification signal might be overshadowed by other signals if, collectively, they provide stronger evidence. In Bayesian networks, this sampling technique is known as forward sampling or ancestral sampling. The AI mesh employs a highly efficient implementation of this technique by constraining the distributions of the internal nodes to either categorical or binary distributions.
The typical token window analysed is 512 tokens, which corresponds to roughly a page of text. For larger texts, the results from multiple passes are integrated with the mesh. For shorter texts, the mesh composition can be adjusted to accommodate.
For example, in order to determine if a document is confidential, in a rudimentary setting, a machine learning model that works on document vectors is involved. This model performs sentiment analysis on the original document to understand if it sounds confidential. Additionally, a simpler model searches for words like "confidential" or words similar to "confidential" syntactically as part of topic detection. There are filters and detectors designed to pick up specific keywords, such as the word "confidential" itself, which may be stamped by another application, included as part of a watermark, or in the context of certification and compliance policies. Finally, a Bayesian network of all these models is used to infer the outcome and associated confidence level.
The functional diagram of the classification pipeline around the AI Mesh is shown below.
The AI mesh features a stereotypical structure designed to facilitate easy reasoning and training for individuals involved in proofing, training, and selling the mesh. Since the mesh is a directed acyclic graph, it allows for the definition of inputs, intermediary nodes, and outputs.
The inputs or entry points take in raw information about the file, which is then analysed and produces some sort of signal. This signal is interpreted by other nodes in the mesh. Inputs include various forms of transforming the input text into document vectors or word vectors, elements collecting statistical information about the input text, or processing it for other types of statistical information collectors. Additionally, filters provide a signal indicating whether certain keywords or patterns of keywords are present in the input text.
In an effort to streamline the deployment of the AI mesh and make it more user-friendly, there is an emphasis on reducing the number of filters that are directly relevant to the AI mesh. For example, when detecting banking information, a straightforward approach might involve creating detectors for words like "bank" or "account." However, such words' relevance to the classification can vary significantly between use cases, making it challenging to establish a universally understandable policy for managing these detectors to meet expectations.
To overcome this challenge, information is organized within the network using CBOW models. This allows for ongoing tweaking of signal sensitivity based on user feedback. The strategy also involves restricting filters to use case-specific information. For instance, to identify confidential information on a specific premises, CBOW models are deployed to detect text indicating confidentiality or secrecy. Machine learning models assess the likelihood of text containing trade secrets or intellectual property. Users are encouraged to input filters relevant to the confidential signal, using specific keywords related to their technology, such as internal product names, codewords, or internal product IDs, which would not be known externally.
Intermediary nodes function by utilizing information provided by the inputs or other intermediary nodes, yet they are not visible in the user interface (UI). This can be attributed either to the irrelevance of the information processed by these nodes to the user—such as computation of reading ease scores and document complexity, which could clutter the user's view—or to the inaccuracy of intermediary signals. Efforts are made to furnish a more accurate signal by combining various intermediary signals.
Examples of intermediary nodes comprise machine learning classifiers that employ document level vectors to determine if the text aligns with a certain type of signal, CBOW classifiers that ascertain whether a specific topic is being discussed in the document, and Bayesian mappings that integrate several signals into a conclusive output signal.
Intermediary or output mappings often exhibit a stereotypical structure where multiple input signals are consolidated to create a more robust and accurate output signal. For instance, to determine whether a file is an HR document, input signals might include a machine learning model that assesses whether the file reads as an HR document, a CBOW model that detects topics relevant to the HR sector present in the file, and several filters searching for HR-specific terminology. While there are numerous methods to combine these signals into an output signal, a standardized approach, referred to as the "standard mapping," is typically employed to ensure consistency and efficiency in the process.
The standard mapping process outputs a true or false value based on inputs from three types of true/false signals, which can either be filters or machine learning models.
Hard Signals: These are decisive signals that set the standard mapping to true whenever any one of them is true, regardless of the status of other signals. For instance, the detection of a highly specific and unique identifier like a Social Security number in certain contexts immediately indicates the presence of private identifiable information, irrespective of other detectors' output.
Soft Signals: These signals set the standard mapping to true only if one of them is true and is also supported by other true signals. This is used in cases where broad criteria need further verification. For example, detecting the word "account" may flag a text potentially as financial information. However, it requires additional corroborative evidence from other sources or models to be classified definitively as financial information.
Supporting Evidence: These signals influence the standard mapping's truth value either if all are true with high confidence, providing strong evidence that the mapping should be true, or if they are true with low confidence but a soft signal is also true. This layered approach ensures a nuanced decision-making process that accounts for evidence strength and relevance.
This structured approach to output mapping ensures accurate and reliable determinations based on the nature and strength of the input signals. The combination logic is outlined below:
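A minimal illustrative sketch of this combination logic, written here as a small shell function; the argument layout and the 0.8 confidence threshold are assumptions for illustration, not the production pseudocode:

# $1: "true" if any hard signal fired
# $2: "true" if any soft signal fired
# $3: "true" if at least one other signal corroborates the soft signal
# $4: "true" if all supporting signals fired
# $5: average confidence of the supporting signals (0..1)
standard_mapping() {
  local hard="$1" soft="$2" corroborated="$3" supporting_all="$4" supporting_conf="$5"
  # Hard signals are decisive: any true hard signal makes the mapping true.
  if [ "$hard" = "true" ]; then echo "true"; return; fi
  # Soft signals need corroboration from at least one other true signal.
  if [ "$soft" = "true" ] && [ "$corroborated" = "true" ]; then echo "true"; return; fi
  # Supporting evidence alone must be unanimous and high-confidence.
  if [ "$supporting_all" = "true" ] && awk -v c="$supporting_conf" 'BEGIN { exit !(c >= 0.8) }'; then
    echo "true"; return
  fi
  echo "false"
}

# Example: a soft signal corroborated by another true signal evaluates to true.
standard_mapping false true true false 0.4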
Output nodes utilize information from intermediary nodes to generate signals that are directly presented in the user interface (UI). These signals encompass:
Data Attributes: Important for characterizing the data or data asset attributes, such as whether the data is financial, HR-related, etc.
Compliance Labels: These labels indicate whether the data may be subject to specific compliance regulations, including PII (Personal Identifiable Information), PHI (Protected Health Information), etc.
Classifications: Define the kind of actionable results that should be derived after classifying the file, providing a clear directive for subsequent actions.
Notably, especially in the context of compliance and classification, these output nodes can also be used to stamp information directly onto the file. This ensures that important data about compliance and classification is visibly and immediately associated with the file, facilitating easy access to this critical information through the UI.
The typical classification system categorizes the level of sensitivity of a file. This can range from a binary flag indicating whether the file is sensitive or not, to a more nuanced classification with three to five labels, such as:
Public
Internal
Confidential
Highly Confidential
Secret/Top Secret
However, it is recommended to avoid using more than four or five mutually exclusive outcomes for classifying a file. This is because having too many categories can complicate implementation on the customer's side and pose challenges in verifying the accuracy of the classifier. Simplifying the classification spectrum helps both in ease of use and ensuring a more straightforward validation of classification results.
We offer below the visualization of a large AI mesh (80 nodes) with input nodes at the bottom and output nodes at the top.
Nodes are colour coded as follows:
yellow - document and word vectors
blue - ML classifiers
green - light ML (CBOW) classifiers
red - Python / engineered signals
black - forward mappings
Notice how few input filters are entangled with knowledge collected by ML models, and how the classification output node (top) integrates information from all these nodes.
The AI mesh is designed to be multilingual, catering to the requirements of machine learning models that depend on word vectors or document factors derived from unstructured text. The strategy to achieve multilingual capability involves generating the same document or word vectors for the same text translated into multiple languages (language-agnostic representations). This approach compresses the text into sentence or document vectors, and the language model itself has a certain capability to translate between the languages it supports.
For the sake of classification speed and accuracy, the deployment is typically restricted to bilingual models, where one of the languages is English and the other could be Arabic, French, or any other language. Although the solution has been tested with up to 12 different languages, in practice, a more focused bilingual approach is preferred.
For other types of nodes within the mesh, such as filter nodes or complexity detectors, adequate adjustments are necessary to account for language-specific differences. This ensures that the AI mesh can efficiently and accurately process information across different languages, maintaining its effectiveness and utility in multilingual environments.
The design of the AI mesh carefully balances exposing a reasonable number of signals and accurately characterizing a block of text of a certain size. Limiting the number of relevant signals to no more than 100 is very important for maintaining the explainability of the mesh in relation to the analysed content. This approach ensures that users can understand how and why certain analytical outcomes were reached without being overwhelmed by too much information.
When the AI mesh produces a classification outcome, we also store to the database the prerequisites for that outcome within the mesh. This includes which models contributed, in what way, and the confidence scores that contributed to the ancestral sampling of that classification outcome with a specific confidence score. This rich signal provides substantial information about the unstructured text that the mesh processes.
These prerequisite signals are essential for explaining the classification outcome that the user observes. Explanations can be provided on a per-file basis by examining the outputs of intermediary nodes in the mesh or on a population basis by identifying which factors lead to particular decisions for specific file populations. Natural language synthesis can be employed to translate these intermediary outcomes into understandable natural language, further enhancing the explainability of the mesh's analytical processes.
The target quality for the user experience with the AI mesh aims to mirror the Service Level Agreement (SLA) for the ML classifiers, where around 80% of the predictions are expected to be perceived as accurate by the user. Adjustments to the mesh will be made if the user's perception significantly deviates from this standard. Specifically, for any given file analysed by the mesh, approximately 8 out of 10 data attributes collected should be correct or flagged with low confidence. Similarly, for any specific data attribute, about 8 out of 10 files should yield a correct prediction or a prediction marked with low confidence.
After a file is evaluated, the per-file outcomes from the classification network within the AI mesh are stored in a database, making them accessible to GQL enabled filters and reports. This approach leverages the rich signal derived from the unstructured content to generate a wide array of actionable reports. Moreover, the classification pipeline incorporates Active Directory information about who has access to the files. This integration is important for assessing the risk associated with highly confidential files being accessed by trustees, as part of the DSPM+ suite.
Characterization of data (static or in-flight) with an AI mesh of narrow models has a series of advantages compared to using Large Language Model (LLM) AI technology.
The overall compute required to run the AI mesh is 100x-1000x less than that of a classification LLM with similar accuracy. Due to that, it can be successfully productized without requiring specialized hardware such as GPUs.
Owing to the way the AI mesh is constructed, tweaking it towards providing expected outcomes for different use cases entails modifying a small number of nodes, which lowers the cost of adapting the mesh to expectations.
Since the mesh relies on specialized detectors which are associated with intuitive concepts, it can be used natively to build robust explanations regarding the classification outcomes, with or without language synthesis by LLM.
The mesh uses narrow AI classifiers which are trained on synthetic datasets which are small (1-10M tokens) compared to LLM corpora (trillions of tokens). These datasets are available for review and audit, and can be used to completely characterize the behaviour of the AI system, and to ascertain its regulatory liability.
The layout of the mesh natively allows integration with any sources or 3rd party signals via its mapping mechanism.
The Keycloak admin URL will consist of the following components:
The domain that has been configured for the reseller to access the application (e.g. my-reseller.net or 10.10.121.127)
The service path (e.g. auth for Keycloak)
The keycloak admin path /admin/master/console
An example of the above might look something like this:
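https://my-reseller.net/auth/admin/master/console (illustrative, using the example domain above; substitute the configured domain or, if no domain is configured, the server IP, e.g. https://10.10.121.127/auth/admin/master/console)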
Once the correct address has been entered for the cluster Keycloak instance following the above guidelines, it should be possible to login to the Keycloak admin dashboard using the following details:
This is the default username and password for the initial login to Keycloak. Please ensure that it is changed!
Username: admin
Password: admin
The access protocol should always be https
The domain in the example above (E.g. my-reseller.net) might not be applicable if a domain is not configured, in which case the server IP address needs to be used (e.g. 10.10.121.127)
Once logged into the portal, there are some additional steps to complete in order to configure Keycloak.
In Keycloak, a Realm is a top level authentication domain which contains an isolated authentication configuration. For example, each separate Keycloak Realm might represent a different environment.
A Realm needs to be created to manage the cluster authentication:
Click on the left-side menu item Realm Settings. Make sure that the gv realm is selected in the top left, not master.
This will load the Gv Realm Settings → General tab. Enter the desired user-friendly reseller name into both the Display name and HTML Display name fields.
Click the Save button to commit these changes to the Realm Settings.
Do not change the content of Realm ID field, it has to be gv.
Click on the Clients menu item on the left-side menu, this should load a list of authentication clients.
Click on the name link of the item labeled dashboard to navigate to its client configuration page.
Open the dropdown for Login Theme and select the theme created for the reseller (E.g. my-reseller-theme).
Update the Valid Redirect URIs to include the URL that has been configured for the Dashboard UI (remember to click the + plus icon after entering the value). This will allow Keycloak to redirect back to the Dashboard UI after authenticating.
Update the Web Origins to include the URL that has been configured for the Dashboard UI (remember to click the + plus icon after entering the value). This will allow CORS endpoint calls to Keycloak from the Dashboard UI.
Clear the Front-channel logout URL field’s content. This way, instead of the “you are getting logged out” screen, it will go straight to the login page upon logout.
Alternatively, you can enter the Front-channel logout URL in the following format: https://my-dashboard.com/auth/realms/gv/protocol/openid-connect/logout.
Click the Save button at the bottom of the screen.
This step is important and required for the agent to work correctly. This user is only used internally by agents on endpoints to authenticate with the server. This user cannot be used to log in to the dashboard. For dashboard login, you must create your user in the gv realm.
Make sure it’s still the gv realm selected in the top left, not master.
Click on the Users menu item on the left-side menu, this should load the Users list.
Click the Add user button in the top right to open the Add user screen.
It’s only necessary to complete two fields on this form; The Username field should contain agent, and the Email field should contain [email protected].
Click the Save button at the bottom of the screen.
By default, there are no users in the gv realm, meaning that nobody can access the dashboard to view agent activity, use analytics, run scans or create reports.
Users must either be created manually as described below, or imported, e.g. via LDAP user federation.
Users created in the gv realm will have full administrative access to the GetVisibility web console.
RBAC implementation for granular management of dashboard user permissions is on our roadmap.
Make sure that it’s still the gv realm selected in the top left, not master:
Click on the Users menu item on the left-side menu, this should load the (empty) Users list.
Click the Add user button at the top to open the Add user screen.
There is only one mandatory field here: the Username field should contain your desired username, e.g. admin.
Click Create. This will then load the User Details page for the user that was just created.
Here, click Set password.
Next, choose a strong password for the user. Leave the “Temporary” option on if the user should change their password on the first login.
Click Save.
Navigate to the /ui endpoint of the IP of the server or the domain if you configured any. E.g. https://my-dashboard.com/ui or https://10.10.121.127/ui
Confirm that the credentials are working as expected.
Any issues that occur during the LDAP Active Directory configuration process above are usually related to network accessibility or incorrect authentication credentials.
However, if any additional assistance is required or the problem is not easily resolved by troubleshooting network communications and authentication details, please reach out to Support following the steps here.
The system-upgrade-controller file that will be used to upgrade the K3s cluster:
https://assets.master.k3s.getvisibility.com/system-upgrade-controller/v0.10.0/system-upgrade-controller.yaml
The Bundle file for the K3s upgrade in the Air-Gap Environment
Make sure you push all new Docker images required to install the new K3s version to the ECR gv-public Docker registry.
Updates and custom settings are automatically applied to all backend services using Fleet as long as the cluster has access to the public internet and can connect to the management server.
In case there’s no internet connection or the management server is down, the cluster agent will keep trying to reach the management server until a connection can be established.
Log in to Rancher or one of the master nodes of the cluster to use kubectl CLI
List the node name and the K3s version:
Add the label k3s-upgrade=true to the nodes:
Note: In the case of a multi-node cluster, each node will be updated with the label mentioned above
Deploy the system-upgrade-controller :
Create the upgrade-plan.yaml file.
Note: the version key specifies the K3s version that the cluster will be upgraded to.
Run the upgrade plan. The upgrade controller will watch for this plan and execute the upgrade on the labeled nodes.
Once the plan is executed, all pods will restart and will take a few minutes to recover. Check the status of all the pods:
Check if the K3s version has been upgraded:
Delete the system-upgrade-controller
Here is the demo video that showcases the steps that need to be performed to upgrade K3s:
video
Take a shell session to each of the cluster nodes (VMs)
Download and extract the bundle file on all the VMs: tar -xf gv-platform-$VERSION.tar
Perform the following steps on each of the VMs to upgrade K3s:
Restart the K3s service across the nodes.
Master nodes:
Worker nodes:
Wait for a few minutes for the pods to recover.
Check the k3s version across the nodes
Here is the demo video that showcases the steps that need to be performed to upgrade K3s in the Air Gap environment:
video
For the Platform Team: Local Cluster K3s Upgrade
If you are upgrading K3s of the local cluster, you would need to remove the existing PodSecurityPolicy resources.
We have only one of them under the chart aws-node-termination-handler
Patch the helm Chart to disable the psp resource.
kubectl patch helmchart aws-node-termination-handler -n kube-system --type='json' -p='[{"op": "add", "path": "/spec/set/rbac.pspEnabled", "value": "false"}]'
This will trigger the removal of the PSP resource
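To confirm the change, a quick check such as the one below can be run before upgrading (the PodSecurityPolicy API only exists on clusters still running Kubernetes 1.24 or older):
# List any remaining PodSecurityPolicy objects; the aws-node-termination-handler PSP should be gone
kubectl get podsecuritypolicy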
Traefik is deployed as a DaemonSet in the local clusters. You would need to restart the DaemonSet instead when following the steps given in (broken link)
Deploy the system-upgrade-controller:
Create the upgrade plan
Note: the version key specifies the K3s version that the cluster will be upgraded to.
If you are also running a worker node then execute this too:
Run the upgrade plan:
In the case of a Worker node execute this too:
Once the plan is executed, all pods will restart and take a few minutes to recover. Check the status of all the pods:
Check if the K3s version has been upgraded:
Delete the system-upgrade-controller:
Reference: Apply upgrade: https://docs.k3s.io/upgrades/automated#install-the-system-upgrade-controller
We have seen an issue where Traefik is unable to access any resources after the upgrade. Follow these steps to apply the fix:
Run this patch to add traefik.io to the apiGroup of the ClusterRole traefik-kube-system
Add the missing CRDs
Restart traefik deployment
Follow these steps to upgrade k3s: Upgrading K3s - AirGap (Manual Approach)
Run this patch to add traefik.io to the apiGroup of the ClusterRole traefik-kube-system
Add the missing CRDs
Restart traefik deployment
By default, certificates in K3s expire in 12 months. If the certificates are expired or have fewer than 90 days remaining before they expire, the certificates are rotated when K3s is restarted.
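As a quick way to see how close the certificates are to expiry, something like the following can be run on a server node (the paths assume a default K3s install; the exact certificate set may differ):
# Print the expiry date of each K3s server certificate
for crt in /var/lib/rancher/k3s/server/tls/*.crt; do
  echo "$crt: $(openssl x509 -enddate -noout -in "$crt")"
done
# Restarting K3s rotates certificates that are expired or expire within 90 days
systemctl restart k3s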
Browse to App Registration and select New registration
On the App Registration page, enter the information below, then click the Register button
Name: (Enter a meaningful application name that will be displayed to users of the app)
Supported account types:
Select which accounts the application will support. The options should be similar to those below. Select “Accounts in this organizational directory only”:
Leave the Redirect URI empty and click Register
Note the Application (client) ID, Directory (tenant) ID values
Navigate to Manage -> Certificates and secrets on the left menu to create a new client secret
Provide a meaningful description and expiry for the secret, and click Add
Once a client secret is created, note its Value and store it somewhere safe. NOTE: this value cannot be viewed once the page is closed.
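For reference, a roughly equivalent Azure CLI sketch is shown below; the display names are placeholders and the portal flow above remains the documented path:
# Register the application (equivalent to the App Registration portal steps)
az ad app create --display-name "GV-Scanner" --sign-in-audience AzureADMyOrg
# Create a client secret for it; note the returned password value immediately
az ad app credential reset --id <application-client-id> --display-name "scanner-secret" --years 1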
Navigate to Manage -> API permissions on the left menu, and Add a permission
Select Microsoft APIs -> Microsoft Graph
Select Application permissions
For UnifiedPolicy.Tenant.Read
Navigate to Manage -> API permissions on the left menu, and Add a permission
Select APIs my organization uses tab
Search for Microsoft Information Protection Sync Service
Select Application permissions > UnifiedPolicy.Tenant.Read
For InformationProtectionPolicy.Read.All
Navigate to Manage -> API permissions on the left menu, and Add a permission
Select APIs my organization uses tab
For Azure Rights Management Services > Content.Writer
Navigate to Manage -> API permissions on the left menu, and Add a permission
Select Azure Rights Management Services tab
Permissions required
For scanning
Microsoft Graph > Application permissions > Sites > Sites.Read.All
For reading Sensitivity labels
Microsoft Graph > Application permissions > InformationProtectionPolicy > InformationProtectionPolicy.Read.All
APIs my organization uses > Microsoft Information Protection Sync Service > Application permissions > UnifiedPolicy.Tenant.Read
For revoke permissions
Microsoft Graph > Application permissions > Files > Files.ReadWrite.All
For tagging
Microsoft Graph > Application permissions > Sites > Sites.Manage.All
For MIP tagging
Azure Rights Management Services > Application permissions > Content.Writer
Microsoft Graph > Application permissions > Directory > Directory.Read.All
Microsoft Graph > Application permissions > Sites > Sites.Manage.All
Once all the required permissions are added, click "Grant admin consent"
Navigate to Administration -> Data Sources -> SharePoint Online -> New scan
Provide the Directory (tenant) ID, Application (client) ID and Client Secret value generated in the steps above from the Azure application
Click on the Folder icon in Site and path to select a particular site to scan, or leave the path empty to scan all sites
Save the configuration
Once the configuration is saved, click on the icon on the right and select Start file scan to begin scanning
The results can be viewed under Dashboard -> Enterprise Search
First create the default Getvisibility tags as a new column in SharePoint. This process is described below:
In SharePoint, navigate to Documents
In the files view, select + Add column
Select Choice and then Next
Set the name to Classification and the choices to: Public, Internal, Confidential, Highly-Confidential.
Then click Save
Similarly, create Compliance and Distribution columns (if required)
Getvisibility and SharePoint's tags are now aligned
When tags are written to SharePoint files automatically over the API, the Modified By field changes to System Account because the tags are added by Getvisibility.
Getvisibility preserves the Modified date where applicable.
How to configure a Google Drive connection to scan files and folders.
Create a Project in Google Cloud Console:
Go to the
Create a new project or select an existing project
Enable the Google Drive, Drive Labels and Admin SDK API:
In the Google Cloud Console, navigate to APIs & Services > Library
Search for "Google Drive API" and click on it
Create OAuth 2.0 Credentials:
In the Google Cloud Console, navigate to the APIs & Services > Credentials
Click "Create credentials" and select "Service account"
From your domain's Admin console, go to Main menu > Security > Access and data control > API controls
In the Domain wide delegation pane, select "MANAGE DOMAIN-WIDE DELEGATION"
Click Add new
In the Client ID field, enter the client ID obtained from the service account creation steps above
In the OAuth Scopes field, enter a comma-delimited list of the scopes required for the application
Use the below scopes:
For scanning
https://www.googleapis.com/auth/admin.directory.user.readonly
https://www.googleapis.com/auth/admin.directory.group.readonly
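For illustration, the OAuth Scopes field expects a single comma-delimited value; assuming only the two scanning scopes above plus full Drive access were needed, it might look like this:
https://www.googleapis.com/auth/admin.directory.user.readonly,https://www.googleapis.com/auth/admin.directory.group.readonly,https://www.googleapis.com/auth/drive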
In order to perform a scan using Google Drive connector, it needs a user with the below Admin roles assigned:
Services Admin
User Management
Groups Reader
These roles can be added/checked in the Admin console for the user ID that will be used for impersonation: Directory > Users > Assign roles > add the Services Admin, User Management, and Groups Reader roles, as follows:
Navigate to
Select Users under Directory from the left menu
Select a user you want to use for scanning
Navigate to User details -> Admin roles and privileges
Edit the roles, and enable:
Services Admin
User Management
Groups Reader
Note: It might take a few minutes before the changes take effect.
Navigate to Administration -> Data Sources -> Google Drive -> New scan
Enter the details of the OAuth2 credentials obtained previously, and add the user ID (in the form of [email protected]) of the user to whom you assigned roles in the steps above
Click on the Folder icon in Path to select a particular user's drive to scan, or leave the path empty to scan all users
Save the configuration
Once the configuration is saved, click on the icon on the right and select Start file scan to begin scanning
The scan results can be viewed under Dashboard -> Enterprise Search
Default Getvisibility labels need to be created in Google Drive. This process is described below:
Turn on Drive labels for the organization
In the Google Admin Console (at admin.google.com)
Go to Menu
Click Labels
Our K3s HA setup consists of 4 homogeneous nodes (3 master nodes + 1 worker node) and can withstand a single-node failure with a very short failover disruption (between 3 to 6 minutes).
With our HA setup we can achieve a monthly uptime of 99.9% (a maximum of 43m of downtime every month).
Please refer to K3S installation for the node specs of the product you’ll be installing.
The minimum spec allowed for a HA node is 8 CPUs, 32GB of RAM and 500GB of free SSD disk space. All nodes should also have the same spec and OS.
K3s needs the following ports to be accessible by all other nodes running in the same cluster:
The ports above should not be publicly exposed as they will open up your cluster to be accessed by anyone. Make sure to always run your nodes behind a firewall/security group/private network that disables external access to the ports mentioned above.
All nodes in the cluster must have:
Domain Name Service (DNS) configured
Network Time Protocol (NTP) configured
Software Update Service - access to a network-based repository for software update packages
Fixed private IPv4 address
The following port must be publicly exposed in order to allow users to access Synergy or Focus product:
The user must not access the K3s nodes directly, instead, there should be a load balancer sitting between the end user and all the K3s nodes (master and worker nodes):
The load balancer must operate at Layer 4 of the OSI model and listen for connections on port 443. After the load balancer receives a connection request, it selects a target from the target group (which can be any of the master or worker nodes in the cluster) and then attempts to open a TCP connection to the selected target (node) on port 443.
The load balancer must have health checks enabled which are used to monitor the health of the registered targets (nodes in the cluster) so that the load balancer can send requests to healthy nodes only.
The recommended health check configuration is:
Timeout: 10 seconds
Healthy threshold: 3 consecutive health check successes
Unhealthy threshold: 3 consecutive health check failures
Interval: 30 seconds
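The semantics of such a Layer 4 health check can be sanity-checked from any machine that can reach the nodes; the sketch below simply opens a TCP connection to port 443 on each node (the node IPs are placeholders):
# Emulate an L4 health check: a node is "healthy" if a TCP connection to 443 succeeds within the timeout
for node in 10.0.0.11 10.0.0.12 10.0.0.13 10.0.0.14; do
  if timeout 10 bash -c "exec 3<>/dev/tcp/$node/443"; then
    echo "$node healthy"
  else
    echo "$node unhealthy"
  fi
done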
Please refer to for the list of urls you need to enable in your corporate proxy in order to connect to our private registries.
We need 3 master nodes and at least 1 worker node to run K3s in HA mode.
The nodes must be homogeneous, having the same number of CPUs, RAM and disk space.
To get started launch a server node using the cluster-init flag:
Check for your first master node status, it should have the Ready state:
Use the following command to copy the TOKEN that will be used to join the other nodes to the cluster:
Also, don't forget to copy the private IP address of the 1st master node, which will be used by the other nodes to join the cluster.
SSH into the 2nd server to join it to the cluster:
Replace K3S_TOKEN with the contents of the file /var/lib/rancher/k3s/server/node-token from the 1st master node installation.
Set --node-name to master2
Set --server to the private static IP address of the 1st master node.
Check the node status:
SSH into the 3rd server to join it to the cluster:
Replace K3S_TOKEN with the contents of the file /var/lib/rancher/k3s/server/node-token from the 1st master node installation.
Set --node-name to master3
Set --server to the private static IP address of the 1st master node.
Check the node status:
SSH into the 4th server to join it to the cluster:
Replace K3S_TOKEN with the contents of the file /var/lib/rancher/k3s/server/node-token from the 1st master node installation.
Set --node-name to worker1
Set --server to the private static IP address of the 1st master node.
You may create as many additional worker nodes as you want.
SSH into the server to join it to the cluster:
Replace K3S_TOKEN with the contents of the file /var/lib/rancher/k3s/server/node-token from the 1st master node installation.
Update --node-name with your worker node name (e.g. worker2, worker3, etc.)
Set --server to the private static IP address of the 1st master node.
Check the node status:
You may run the registration command that you generated using Rancher UI or through license manager. You should see all master and worker nodes in your cluster through the Machine Pools on the Rancher dashboard:
Go to Apps > Charts and install the GetVisibility Essentials Helm chart:
If you are installing Focus or Enterprise click on Enable ElasticSearch.
Configure the UTC hour (0-23) that backups should be performed at:
Click on High Available and set:
MinIO Replicas to 4
MinIO Mode to distributed
Go to Apps > Charts and install the GetVisibility Monitoring Helm chart and Install into Project: Default.
Click on High Available and set:
Prometheus replicas to 2
Loki replicas to 2
Go to the global menu Continuous Delivery > Clusters and click on Edit config for the cluster:
For Synergy: add 3 labels product=synergy environment=prod high_available=true and press Save.
For Focus: add 3 labels product=focus environment=prod high_available=true
This guide outlines how to configure Microsoft O365 Streaming in environments where Getvisibility’s Data Detection and Response (DDR) platform is deployed on-premise or in a private cloud. The integration enables DDR to receive and act upon real-time Microsoft 365 activity notifications.
Ensure the following prerequisites are in place before starting the integration:
A deployed and operational DDR instance.
A public DNS record pointing to the DDR listener endpoint.
A valid SSL/TLS certificate from a trusted Certificate Authority.
An internet-accessible port 443 (HTTPS) endpoint.
Make sure the DDR webhook endpoint is:
Publicly accessible via a fully qualified domain name (FQDN).
Protected with a valid SSL/TLS certificate.
Accessible on port 443 (HTTPS).
Note: You can use a reverse proxy (e.g., NGROK, NGINX) to securely expose internal services if needed.
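Before moving on, it is worth verifying the three points above from outside your network; a simple check along these lines (the hostname is a placeholder) confirms DNS resolution, the certificate chain and port 443 in one go:
# Fails if the FQDN does not resolve, the certificate is untrusted/expired, or 443 is unreachable
curl -sS -o /dev/null -w 'HTTP %{http_code}\n' https://ddr.example.com/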
Microsoft recommends restricting webhook traffic to only allow inbound requests from Microsoft Graph servers. This reduces the attack surface and prevents spoofed webhook messages.
Allowlist Required Endpoints:
More info at
⚠️ Action Required: Your firewall or reverse proxy must allow inbound HTTPS traffic from all IP addresses Microsoft uses to deliver change notifications. Regularly update your rules using Microsoft’s published IP ranges.
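One way to keep such rules current is to pull the ranges from Microsoft's published endpoints web service and feed them into your firewall tooling; a rough sketch (the filtering and service areas are assumptions to adapt):
# Fetch the current Microsoft 365 endpoint definitions and print the unique IP ranges
curl -s "https://endpoints.office.com/endpoints/worldwide?ClientRequestId=$(uuidgen)" \
  | jq -r '.[] | .ips[]?' | sort -u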
Microsoft.Storage/storageAccounts/blobServices/containers/read (Return a container or a list of containers)
Microsoft.Storage/storageAccounts/blobServices/containers/write (Modify a container's metadata or properties)
Microsoft.Storage/storageAccounts/blobServices/generateUserDelegationKey/action (Returns a user delegation key for the Blob service)
Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read (Return a blob or a list of blobs)
Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write (Write to a blob)
Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action (Moves the blob from one path to another)
Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action (Returns the result of adding blob content)

curl -sfL https://assets.master.k3s.getvisibility.com/k3s/k3s.sh | \
  INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" PRODUCT_NAME=enterprise sh -s - server --node-name=local-01

curl -sfL https://assets.master.k3s.getvisibility.com/k3s/k3s.sh | \
  INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" PRODUCT_NAME=ultimate ONLY_PRECHECK=true sh -s - server --node-name=local-01

curl -sfL https://assets.master.k3s.getvisibility.com/k3s/k3s.sh | \
  INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" SKIP_SYSTEM_CHECKS=true sh -s - server --node-name=local-01

curl -sfL https://assets.master.k3s.getvisibility.com/k3s/k3s.sh | \
  INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" SKIP_PRECHECK=true sh -s - server --node-name=local-01

export http_proxy="$PROXY_IP"
export https_proxy="$PROXY_IP"
no_proxy="$NODE_IP,localhost,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local"

kubectl apply -f https://....k3s.getvisibility.com/v3/import/dxslsxcf84....yaml
watch -c "kubectl get deployments -A"
kubectl logs cattle-cluster-agent-d96d648d8-wjvl9 -n cattle-system

https://assets.master.k3s.getvisibility.com (Custom K3s installation files)
https://images.master.k3s.getvisibility.com (Private Docker registry)
https://charts.master.k3s.getvisibility.com (Private Helm registry)
https://prod-eu-west-1-starport-layer-bucket.s3.eu-west-1.amazonaws.com (Docker registry AWS CDN)
https://rpm.rancher.io (Rancher RPM repo for configuring SELinux packages on RHEL or CentOS)
https://api.master.k3s.getvisibility.com (Private API server)
https://rancher.master.k3s.getvisibility.com (Rancher management server)
https://rancher.$RESELLER_NAME.k3s.getvisibility.com (Rancher management server, where $RESELLER_NAME is Getvisibility for direct customers)

if ANY hard signals True:
    return True
if ANY soft signals True AND ALL supporting signals True even at low confidence:
    return True
# by default this is disabled
if ALL supporting signals True at high confidence:
    return True
otherwise return False

https://my-reseller.net/auth/admin/master/console

kubectl get nodes
kubectl label node --all k3s-upgrade=true
kubectl apply -f https://assets.master.k3s.getvisibility.com/system-upgrade-controller/v0.10.0/system-upgrade-controller.yaml

cat > upgrade-plan.yaml << EOF
---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
name: k3s-latest
namespace: system-upgrade
spec:
concurrency: 1
version: v1.24.9+k3s2
nodeSelector:
matchExpressions:
- {key: k3s-upgrade, operator: Exists}
serviceAccountName: system-upgrade
upgrade:
image: docker.io/rancher/k3s-upgrade
EOF

kubectl apply -f upgrade-plan.yaml
watch kubectl get pods -A
kubectl get nodes
kubectl delete -f https://assets.master.k3s.getvisibility.com/system-upgrade-controller/v0.10.0/system-upgrade-controller.yaml

$ mkdir -p /var/lib/rancher/k3s/agent/images/
$ gunzip -c assets/k3s-airgap-images-amd64.tar.gz > /var/lib/rancher/k3s/agent/images/airgap-images.tar
$ cp assets/k3s /usr/local/bin && chmod +x /usr/local/bin/k3s

$ systemctl restart k3s.service
$ systemctl restart k3s-agent.service

watch kubectl get pods -A
kubectl get nodes

kubectl apply -f https://assets.master.k3s.getvisibility.com/system-upgrade-controller/v0.13.1/system-upgrade-controller.yaml

cat > upgrade-plan-server.yaml << EOF
---
# Server plan
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
name: server-plan
namespace: system-upgrade
spec:
concurrency: 1
cordon: true
nodeSelector:
matchExpressions:
- key: node-role.kubernetes.io/control-plane
operator: In
values:
- "true"
serviceAccountName: system-upgrade
upgrade:
image: rancher/k3s-upgrade
version: v1.26.10+k3s1
EOF

cat > upgrade-plan-agent.yaml << EOF
---
# Agent plan
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
name: agent-plan
namespace: system-upgrade
spec:
concurrency: 1
cordon: true
nodeSelector:
matchExpressions:
- key: node-role.kubernetes.io/control-plane
operator: DoesNotExist
prepare:
args:
- prepare
- server-plan
image: rancher/k3s-upgrade
serviceAccountName: system-upgrade
upgrade:
image: rancher/k3s-upgrade
version: v1.26.10+k3s1
EOF

kubectl apply -f upgrade-plan-server.yaml
kubectl apply -f upgrade-plan-agent.yaml
watch kubectl get pods -A
kubectl get nodes
kubectl delete -f https://assets.master.k3s.getvisibility.com/system-upgrade-controller/v0.13.1/system-upgrade-controller.yaml

kubectl patch clusterrole traefik-kube-system -n kube-system --type='json' -p='[{"op": "add", "path": "/rules/-1/apiGroups/-", "value": "traefik.io"}]'
kubectl apply -f https://assets.master.k3s.getvisibility.com/k3s/v1.26.10+k3s1/traefik-patch.yaml
kubectl rollout restart deployment traefik -n kube-system

kubectl patch clusterrole traefik-kube-system -n kube-system --type='json' -p='[{"op": "add", "path": "/rules/-1/apiGroups/-", "value": "traefik.io"}]'
kubectl apply -f assets/traefik-patch.yaml
kubectl rollout restart deployment traefik -n kube-system

Firewall rules allowing inbound traffic from Microsoft Graph servers.






700GB min 32M inodes


































Search for Microsoft Information Protection API
Select Application permissions > InformationProtectionPolicy.Read.All
Select Application permissions
Select Content > Content.Writer
Microsoft Graph > Application permissions > InformationProtectionPolicy > InformationProtectionPolicy.Read.All
APIs my organization uses > Microsoft Information Protection API > Application permissions > InformationProtectionPolicy.Read.All




























Globally unique node name (use --node-name when installing K3s in a VM to set a static node name)
Consul Server replicas to 3
For Enterprise: add 3 labels product=enterprise environment=prod high_available=true and press Save.

| Protocol | Port | Description |
| --- | --- | --- |
| TCP | 6443 | Kubernetes API Server |
| UDP | 8472 | Required for Flannel VXLAN |
| TCP | 2379-2380 | embedded etcd |
| TCP | 10250 | metrics-server for HPA |
| TCP | 9796 | Prometheus node exporter |

| Protocol | Port | Description |
| --- | --- | --- |
| TCP | 443 | Focus/Synergy backend |







curl -sfL https://assets.master.k3s.getvisibility.com/k3s/k3s.sh | INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" sh -s - server --node-name=master1 --cluster-init

kubectl get nodes

cat /var/lib/rancher/k3s/server/node-token

curl -sfL https://assets.master.k3s.getvisibility.com/k3s/k3s.sh | K3S_TOKEN=SHARED_SECRET INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" sh -s - server --node-name=master2 --server https://<ip or hostname of master1>:6443

kubectl get nodes

curl -sfL https://assets.master.k3s.getvisibility.com/k3s/k3s.sh | K3S_TOKEN=SHARED_SECRET INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" sh -s - server --node-name=master3 --server https://<ip or hostname of master1>:6443

kubectl get nodes

curl -sfL https://assets.master.k3s.getvisibility.com/k3s/k3s.sh | K3S_TOKEN=SHARED_SECRET INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" sh -s - agent --node-name=worker1 --server https://<ip or hostname of any master node>:6443

curl -sfL https://assets.master.k3s.getvisibility.com/k3s/k3s.sh | K3S_TOKEN=SHARED_SECRET INSTALL_K3S_VERSION="v1.26.10+k3s1" K3S_KUBECONFIG_MODE="644" sh -s - agent --node-name=workerX --server https://<ip or hostname of any master node>:6443

kubectl get nodes

Click the "Enable" button to enable the Google Drive API for the project
Search for "Admin SDK API" and click on it
Click the "Enable" button to enable the Admin SDK API for the project
Search for "Drive Labels API" and click on it
Click the "Enable" button to enable Drive Labels API for the project
Enter a name in the Service account name field and click CREATE AND CONTINUE
Under Grant this service account access to the project, select role as Owner and click DONE
Select the newly created service account and click Keys > Add Key > Create new key
Make sure the key type is set to json and click Create
The new private key pair is generated and downloaded to the machine. Note the values of private_key, client_email and client_id
For revoke permissions
https://www.googleapis.com/auth/drive
For tagging
https://www.googleapis.com/auth/drive.file
https://www.googleapis.com/auth/drive
https://www.googleapis.com/auth/drive.admin.labels
https://www.googleapis.com/auth/drive.metadata
https://www.googleapis.com/auth/drive.labels
For Extended Streaming Events
https://www.googleapis.com/auth/admin.reports.audit.readonly
Click Authorize
Click on Save
Select Turn Labels On
Click Save
Create Drive labels:
Go to the labels manager at https://drive.google.com/labels.
Requires having the Manage Labels privilege.
Click New label.
To create one badged label:
Choose a badged label
Choose to start from an example, or from scratch.
Update the title as Classification.
(Optional) Add a description or a learn more URL that points to internal documentation about the label.
To create a standard label:
Two standard labels need to be created: Distribution and Compliance
Click a standard label template or click Create New.
Enter or update the label name.
Publish the labels
If it’s not open already, open the labels manager (https://drive.google.com/labels) and click the label.
Review the label and any fields.
Click Publish.
Confirm that the label will be published by clicking Publish.
















If an Azure Files scan does not already exist, follow this guide to create a new Azure Files scan and ensure the necessary credentials are set up.
Go to the Scan configurations page in the product UI.
Locate your existing Azure Files scan configuration and select Edit Configuration from the options menu. Note the configured path (folder) and save it, as it will be used in step 9 to replace {FolderPath}.
Within the Edit Azure Files Scan Configuration page, toggle Data Streaming to ON.
Copy the Webhook URL provided, as you will use it later in the Azure Portal. Save this Webhook URL, as it will be used in step 9 to replace {WebhookUrl}.
Click Save & Close button to save configuration.
Navigate to Azure Portal Event hubs and click Create
In Create Namespace Window fill in the details
Give it a Name
Select your subscription and resource group
Select location
Pricing tier - standard
Throughput Units - 1
Click on Review + Create and then Create after validation
After namespace is created, click on + Event Hub button
In the Create Event Hub window, fill in the name and click Review + Create, then Create after validation. Save the name of the Event Hub you created in this step, as it will be used later in step 9 to replace {eventHubName}.
Configure access policy
In the event hubs namespace window click on Settings/Shared access policies and then +Add button
Fill in the details in the new tab, set LogicAppsListenerPolicy as name, select Listen policy, and click Save.
Click on the newly created policy, then copy and save the Connection string–primary key. This will be needed later in step 8b.
Navigate to Azure Portal and open your Storage Account.
Select needed account from the Storage Accounts
In the left-hand menu, select Monitoring/Diagnostic settings and click file
In Diagnostic settings Window click on "+ Add diagnostic setting" button
In Create Diagnostic setting Window fill in the details:
Give it a Name
Select Category groups allLogs
Select Destination details Stream to an event hub and select newly created Event Hub Namespace and Event Hub
Go to Azure logic apps and click "Add" button
In Create Logic App Window select Workflow Service Plan
In Create Logic App (Workflow Service Plan) Window fill in the details and click "Create + Review":
Select your subscription and resource group
Give logic app name
Select region
Pricing plan should be WS1
In the monitoring tab select No for the application insights
Click Review + create button
Click Create after validation
In newly created logic app click on Workflows/Workflows and then +Add button
In new workflow tab fill in name, select State type: Stateful and click Create
In created workflow go to Developer/Designer and click on Add a trigger, then in search type "Event hub" and select "When events are available in Event Hub"
Configure API connection
Click on the trigger, set "Temp" for Event Hub Name and then click on Change connection.
Then click Add New and fill in the details. Enter any name for the connection name and use the connection string {Connection string–primary key} from step 3.6.c.
In workflow navigation tab go to Developer/Code and set the provided code, then click save:
Replace {FolderPath} with the path to the streaming folder. For example, if you want to get events from the folder "StreamingFolder", which is located in the file share "DocumentsShare" under the folder "Personal", the path should be "DocumentsShare/Personal/StreamingFolder"
Replace {WebhookUrl} with the webhook URL provided by the application in the scan configuration window
After configuring the event subscription:
You may upload documents to the configured path.
The events triggered by these uploads will be processed by the Data Streaming setup, and the results will appear in your Getvisibility dashboard.
If you experience any issues with the configuration, ensure that:
The Webhook URL is correct and matches the configuration in Azure.
Steps 5.8 and 5.9 were properly executed and all the variables were replaced with real values.
You can also check whether the trigger was unsuccessful by navigating to the Logic App configured in the previous steps, then to the Workflow and its Trigger History. If you see any failed triggers, you can inspect the error details to identify the issue.
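As an additional check, a minimal test request can be sent to the webhook endpoint from outside your network (the payload below is an arbitrary placeholder, not a real Event Hub event, so it will not create activity in the dashboard):
# Confirms the webhook URL is reachable and accepts HTTPS POSTs; replace {WebhookUrl} with your value
curl -sS -o /dev/null -w 'HTTP %{http_code}\n' -X POST \
  -H 'Content-Type: application/json' -d '{"event":"connectivity-test"}' '{WebhookUrl}'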
How to configure Dropbox connection to scan it.
Navigate to Administration -> Data Sources -> Dropbox
Then go to Credentials tab and click New credentials
Create a credentials name and copy the Redirect URL - it will be needed later. The App Key and App Secret fields will be filled in later, once the Dropbox app has been created.
Log in to Dropbox
Go to and click Create app
On the App Creation page, enter the information below and click the Create app button
Choose an API: Most applications will use "Dropbox API"
Choose Access Type: Select "Full Dropbox" for complete access.
Once done click Authorize with Dropbox button as below:
Then you'll be redirected to 1st page to trust your application - click Continue
Then you'll see a list of permissions app will be granted - click Allow
This document provides information on how to configure Azure Blob connection with real-time events monitoring and data streaming.
This guide provides steps on how to enable real-time data streaming for a Sharepoint Online connection and monitor streaming events within the Getvisibility platform.


(Optional) Add a description.
Choose whether the label is copied when the file is copied.
Add a field.










Click Save.
On the Change Connection tab, click Details and copy the Name from the connection details. Save this Name, as it will be used later in step 9 to replace {connectionName}.
Click save on workflow designer window
Replace {eventHubName} with the Azure Event Hub name that was created previously. Replace {connectionName} with the connection name from the previous step.

















Name Your App and click Create app: Enter a name that will be visible to users.
Go to the Settings tab and find app key and secret above the OAuth 2 section
We need to set proper permissions for Dropbox app. Below you can find a list of required permissions:
For scanning
Files and Folders > files.metadata.read, files.content.read
Collaboration > sharing.read
Team Data > team_data.member
Members > members.read, groups.read
For remediations
Collaboration > sharing.write
Files and Folders > files.content.write
For tagging
Files and Folders > files.content.write, files.metadata.write
Go to the Permissions tab of the newly created App and set the following:
Account Info: account_info.read
Files and Folders: files.metadata.write, files.metadata.read, files.content.write, files.content.read
Collaboration: sharing.read, sharing.write
Team: team_info.read
Team Data: team_data.member, team_data.content.write, team_data.content.read, files.team_metadata.write, files.team_metadata.read, files.permanent_delete
Members: members.read, groups.read
Once permissions are set click Save button located on the black snackbar at the bottom of the window.
Go back to the Settings tab and scroll to the Redirect URIs section. Paste the Redirect URL copied from the Dashboard and click Add
Then copy the App key from the Dropbox App settings page and paste it into the App key field in the Dashboard Create connection form. Do the same for the App secret.
Once done you'll be redirected back to Dashboard page with success message as below:
Connection has been configured successfully








If an Azure Blob scan has not yet been created, follow this guide to create a new Azure Blob scan and ensure the necessary credentials are configured.
Go to the Scan configurations page in the product UI.
Find the existing Azure Blob scan configuration and select Edit Configuration from the options menu.
Within the Edit Azure Blob Scan Configuration page, toggle Data Streaming to ON.
Copy the Webhook URL provided, as you will use it later in the Azure Portal.
Navigate to Azure Portal and open the Storage Account.
Select the required account from the Storage Accounts
In the left-hand menu, select Events and click Create Event Subscription
In Create Event Subscription Window fill in the details:
Give it a Name
Select endpoint type Web Hook
Set configure an endpoint
Go to Filters Menu on top
In the Subject Filters section, enter the correct path format for the subscription:
Use the following pattern:
/blobServices/default/containers/{connectionDetails.ContainerName}/blobs/{connectionDetails.FolderPath}
For example, if the container is mycontainer and the folder path is accuracy test/repository1, the path will look like:
/blobServices/default/containers/mycontainer/blobs/accuracy test/repository1
Click Create to complete the Event Subscription setup.
Ensure the following permissions are assigned to the Azure Storage Account:
EventGrid Data Contributor
EventGrid EventSubscription Contributor
EventGrid TopicSpaces Publisher
For details on assigning these roles, refer to this documentation.
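If you prefer the CLI, these role assignments can also be created with the Azure CLI; a sketch, assuming the relevant principal's object ID and the storage account's resource ID are at hand (both are placeholders):
# Assign the Event Grid roles listed above on the storage account scope
for role in "EventGrid Data Contributor" "EventGrid EventSubscription Contributor" "EventGrid TopicSpaces Publisher"; do
  az role assignment create --assignee <principal-object-id> --role "$role" \
    --scope /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>
done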
Navigate to Azure Portal Event hubs and click Create
In Create Namespace Window fill in the details
Give it a Name
Select your subscription and resource group
Select location
Pricing tier - standard
Throughput Units - 1
Click on Review + Create and then Create after validation
After namespace is created, click on + Event Hub button
In the Create Event Hub window, fill in the name and click Review + Create, then Create after validation. Save the name of the Event Hub you created in this step, as it will be used later in step 9 to replace {eventHubName}.
Configure access policy
In the event hubs namespace window click on Settings/Shared access policies and then +Add button
Fill in the details in the new tab, set LogicAppsListenerPolicy as name, select Listen policy, and click Save.
Click on the newly created policy, then copy and save the Connection string–primary key. This will be needed later in step 8b.
Navigate to Azure Portal and open your Storage Account.
Select needed account from the Storage Accounts
In the left-hand menu, select Monitoring/Diagnostic settings and click blob
In Diagnostic settings Window click on "+ Add diagnostic setting" button
In Create Diagnostic setting Window fill in the details:
Give it a Name
Select Category groups allLogs
Select Destination details Stream to an event hub and select newly created Event Hub Namespace and Event Hub
Go to Azure logic apps and click "Add" button
In Create Logic App Window select Workflow Service Plan
In Create Logic App (Workflow Service Plan) Window fill in the details and click "Create + Review":
Select your subscription and resource group
Give logic app name
Select region
Pricing plan should be WS1
In the monitoring tab select No for the application insights
Click Review + create button
Click Create after validation
In newly created logic app click on Workflows/Workflows and then +Add button
In new workflow tab fill in name, select State type: Stateful and click Create
In created workflow go to Developer/Designer and click on Add a trigger, then in search type "Event hub" and select "When events are available in Event Hub"
Configure API connection
Click on the trigger, set "Temp" for Event Hub Name and then click on Change connection.
Then click Add New and fill in the details. Enter any name for the connection name and use the connection string {Connection string–primary key} from step 3.6.c.
In workflow navigation tab go to Developer/Code and set the provided code, then click save:
Replace {FolderPath} with the path to the streaming folder. For example, if you want to get events from the folder "StreamingFolder", which is located in the file share "DocumentsShare" under the folder "Personal", the path should be "DocumentsShare/Personal/StreamingFolder"
Replace {WebhookUrl} with the webhook URL provided by the application in the scan configuration window
If you experience any issues with the configuration, ensure that:
The Webhook URL is correct and matches the configuration in Azure.
Steps 5.8 and 5.9 were properly executed and all the variables were replaced with real values.
You can also check whether the trigger was unsuccessful by navigating to the Logic App configured in the previous steps, then to the Workflow and its Trigger History. If you see any failed triggers, you can inspect the error details to identify the issue.
After configuring the event subscription:
Documents may be uploaded to the configured path.
The events triggered by these uploads will be processed by the Data Streaming setup, and the results will appear in the Getvisibility dashboard.
If there are any issues with the configuration, ensure that:
The Webhook URL is correct and matches the configuration in Azure.
The required Azure permissions are correctly assigned.
Steps 5.8 and 5.9 were properly executed and all the variables were replaced with real values.
You can also check whether the trigger was unsuccessful by navigating to the Logic App configured in the previous steps, then to the Workflow and its Trigger History. If you see any failed triggers, you can inspect the error details to identify the issue.
If there are multiple tenants to choose from, use the Settings icon in the top menu to switch to the tenant in which the application needs to be registered, via the Directories + subscriptions menu
Browse to App Registration and select your application that was created for the scanning
Navigate to Manage -> API permissions on the left menu, and Add a permission
Select Microsoft APIs -> Office 365 Management API
Select Application permissions
Select ActivityFeed.Read permission
Permissions required
All the scanning permissions(https://docs.getvisibility.com/scan-with-getvisibility/configure-data-sources/onedrive)
Office 365 Management API ⇒ Application Permissions ⇒ ActivityFeed.Read
Once all the required permissions are added, click "Grant admin consent"
Sign into the Microsoft Purview portal using Microsoft Edge browser
Select the Audit solution card. If the Audit solution card isn't displayed, select View all solutions and then select Audit from the Core section
If auditing isn't turned on for your organization, a banner is displayed prompting you to start recording user and admin activity. Select the Start recording user and admin activity banner.
In certain cases, recording cannot be enabled immediately and requires additional configuration. If this applies, users will be prompted to enable the customization setting. Select OK, and a new banner will appear, informing you that the process may take 24 to 48 hours to complete. After this waiting period, repeat the previous step to proceed with enabling recording.
From the Data Sources page, select SharePoint Online from the list of available data sources. In the Scan Configurations list, create a New Configuration
Make sure the connection has a Name and the Credentials are set, then select the Path icon.
Click on the Folder icon in the Path field to select the folder you want to monitor for real-time events.
Magnifying glass icon: Folders with this icon next to them indicate that real-time events can be subscribed to from this directory.
After selecting the folder, click Save & Close to finalize the changes.
Clock icon: When data streaming is being activated, the clock icon will appear, indicating that the subscription is being processed. Once the subscription is activated, this icon will change to a green magnifying glass.
After enabling Data Streaming, the system will automatically handle the subscription to Sharepoint Online’s real-time events. There is no need to manually configure Webhooks.
After the subscription is activated (green magnifying glass icon), real-time events will start flowing into the platform, and you will be able to monitor them from various sections of Getvisibility.
Navigate to the Live Events section under Administration to view a detailed audit log of all streaming events.
In this section, you can filter and view event details
{
"definition": {
"$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
"actions": {
"Filter_Records": {
"type": "Query",
"inputs": {
"from": "@triggerBody()?['ContentData']?['records']",
"where": "@and(not(empty(item()?['uri'])),or(contains(item()?['uri'], '{FolderPath}/'),contains(item()?['uri'], '{FolderPath}?')))"
},
"runAfter": {}
},
"Condition": {
"type": "If",
"expression": "@greater(length(body('Filter_Records')), 0)",
"actions": {
"HTTP-copy": {
"type": "Http",
"inputs": {
"uri": "{WebhookUrl}",
"method": "POST",
"headers": {
"Content-Type": "application/json"
},
"body": {
"event": "@setProperty(triggerBody(),'ContentData',setProperty(triggerBody()?['ContentData'],'records',body('Filter_Records')))"
}
},
"runAfter": {}
}
},
"else": {},
"runAfter": {
"Filter_Records": [
"Succeeded"
]
}
}
},
"contentVersion": "1.0.0.0",
"outputs": {},
"triggers": {
"When_events_are_available_in_Event_Hub": {
"type": "ApiConnection",
"inputs": {
"host": {
"connection": {
"referenceName": "{connectionName}"
}
},
"method": "get",
"path": "/@{encodeURIComponent('{eventHubName}')}/events/batch/head",
"queries": {
"contentType": "application/json",
"consumerGroupName": "$Default",
"maximumEventsCount": 50
}
},
"recurrence": {
"interval": 30,
"frequency": "Second"
},
"splitOn": "@triggerBody()"
}
}
},
"kind": "Stateful"
}































Use the Webhook URL provided in step 2 as the Subscriber endpoint and confirm the selection.
Make sure to replace {connectionDetails.ContainerName} and {connectionDetails.FolderPath} with the actual container name and folder path from the scan configuration.
Click Save.
On the Change Connection tab, click Details and copy the Name from the connection details. Save this Name, as it will be used later in step 9 to replace {connectionName}.
Click save on workflow designer window
Replace {eventHubName} with the Azure Event Hub name that was created previously. Replace {connectionName} with the connection name from the previous step.






















{
"definition": {
"$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
"actions": {
"Filter_Records": {
"type": "Query",
"inputs": {
"from": "@triggerBody()?['ContentData']?['records']",
"where": "@and(not(empty(item()?['uri'])),or(contains(item()?['uri'], '{FolderPath}/'),contains(item()?['uri'], '{FolderPath}?')))"
},
"runAfter": {}
},
"Condition": {
"type": "If",
"expression": "@greater(length(body('Filter_Records')), 0)",
"actions": {
"HTTP-copy": {
"type": "Http",
"inputs": {
"uri": "{WebhookUrl}",
"method": "POST",
"headers": {
"Content-Type": "application/json"
},
"body": {
"event": "@setProperty(triggerBody(),'ContentData',setProperty(triggerBody()?['ContentData'],'records',body('Filter_Records')))"
}
},
"runAfter": {}
}
},
"else": {},
"runAfter": {
"Filter_Records": [
"Succeeded"
]
}
}
},
"contentVersion": "1.0.0.0",
"outputs": {},
"triggers": {
"When_events_are_available_in_Event_Hub": {
"type": "ApiConnection",
"inputs": {
"host": {
"connection": {
"referenceName": "{connectionName}"
}
},
"method": "get",
"path": "/@{encodeURIComponent('{eventHubName}')}/events/batch/head",
"queries": {
"contentType": "application/json",
"consumerGroupName": "$Default",
"maximumEventsCount": 50
}
},
"recurrence": {
"interval": 30,
"frequency": "Second"
},
"splitOn": "@triggerBody()"
}
}
},
"kind": "Stateful"
}



Create a new project or select an existing project
Enable the Admin SDK:
In the Google Cloud Console, navigate to the "APIs & Services" > "Library"
Search for "Admin SDK" and click on it
Click the "Enable" button to enable the Admin SDK API for your project
Create OAuth 2.0 Credentials:
In the Google Cloud Console, go to APIs & Services > Credentials
Click "Create credentials" and select "Service account"
Enter a name in the Service account name field and click CREATE AND CONTINUE
Under "Grant this service account access to the project," select role as Owner and click DONE
Select the newly created service account and click Keys > Add Key > Create new key
Make sure the key type is set to json and click CREATE
The new private key pair is generated and downloaded to the machine. Note the values of private_key, client_email and client_id
From your domain's Admin console, go to Main menu menu > Security > Access and data control > API controls
In the Domain wide delegation pane, select Manage Domain Wide Delegation
Click Add new
In the Client ID field, enter the client ID obtained from the service account creation steps above
In the OAuth Scopes field, enter a comma-delimited list of the scopes required for the application
Use the below scopes:
https://www.googleapis.com/auth/admin.directory.user.readonly
https://www.googleapis.com/auth/admin.directory.domain.readonly
https://www.googleapis.com/auth/admin.directory.group.readonly
https://www.googleapis.com/auth/admin.directory.rolemanagement.readonly
Click Authorize
DirectoryService.Scope.AdminDirectoryUserReadonly
DirectoryService.Scope.AdminDirectoryDomainReadonly
DirectoryService.Scope.AdminDirectoryGroupReadonly
DirectoryService.Scope.AdminDirectoryRolemanagementReadonly
Navigate to Administration -> Data Sources -> Google IAM -> New scan
Enter the details of the OAuth2 credentials obtained previously
Save the configuration
Once the configuration is saved, click on the icon on the right and select Start trustee scan to begin scanning
The scan results can be viewed under Dashboard -> Access Governance
Make sure you have /usr/local/bin configured in your PATH (export PATH=$PATH:/usr/local/bin). All the commands must be executed as the root user.
For RHEL, K3s needs the following package to be installed: k3s-selinux (repo rancher-k3s-common-stable) and its dependencies container-selinux (repo rhel-8-appstream-rhui-rpms) and policycoreutils-python-utils (repo rhel-8-baseos-rhui-rpms).
Also, firewalld, nm-cloud-setup.service and nm-cloud-setup.timer must be disabled and the server restarted before the installation; see for more information.
The steps below guide you through the air-gap installation of K3s, a lightweight Kubernetes distribution created by Rancher Labs:
Extract the downloaded file: tar -xf gv-platform-$VERSION.tar
Prepare K3s for air-gap installation:
Install K3s:
Wait for 30 seconds and check that K3s is running with the commands: kubectl get pods -A and systemctl status k3s.service
The steps below will manually deploy the necessary images to the cluster.
Import Docker images locally:
The following steps guide you through the installation of the dependencies required by Focus and Synergy.
Install Getvisibility Essentials and set the daily UTC backup hour (0-23) for performing backups.
Install Monitoring CRD:
Install Monitoring:
Check all pods are Running with the command: kubectl get pods -A
Replace the following variables:
$VERSION with the version that is present in the bundle that has been downloaded
$RESELLER with the reseller code (either getvisibility or forcepoint)
$PRODUCT with the product being installed (synergy, focus or enterprise)
Models and other artifacts, like custom agent versions or custom consul configuration can be shipped inside auto deployable bundles. These bundles are docker images that contain the artifacts to be deployed alongside scripts to deploy them. To create a new bundle or modify an existing one follow this guide first: . The list of all the available bundles is inside the bundles/ directory on the models-ci project on github.
link to an internal Confluence
After the model bundle is published, for example images.master.k3s.getvisibility.com/models:company-1.0.1, you’ll have to generate a public link to this image by running the k3s-air-gap Publish ML models GitHub CI task. The task will ask you for the Docker image URL.
Once the task is complete you’ll get a public URL to download the artifact on the summary of the task. After that you have to execute the following commands.
Replace the following variables:
$URL with the URL to the model bundle provided by the task
$BUNDLE with the name of the artifact, in this case company-1.0.1
Now you’ll need to execute the artifact deployment job. This job will unpack the artifacts from the docker image into a MinIO bucket inside the on premise cluster and restart any services that use them.
Replace the following variables:
$GV_DEPLOYER_VERSION with the version of the model deployer available under charts/
$BUNDLE_VERSION with the version of the artifact, in this case company-1.0.1
You should be able to verify that everything went alright by locating the ml-model job that was launched. The logs should look like this:
In addition you can enter the different services that consume these artifacts to check if they have been correctly deployed. For example for the models you can open a shell inside the classifier containers and check the /models directory or check the models-data bucket inside MinIO. Both should contain the expected models.
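A quick way to do that from the CLI, assuming the job and Deployment names below (they are illustrative; adjust the namespace and names to your cluster):
# Locate the artifact deployment job and inspect its logs
kubectl get jobs -A | grep ml-model
kubectl logs -n default job/<ml-model-job-name>
# Check that the models landed inside a classifier pod (the Deployment name is an assumption)
kubectl exec -n default deploy/classifier -- ls /models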
K3s needs the following ports to be accessible (Inbound and Outbound) by all other nodes running in the same cluster:
The ports above should not be publicly exposed as they will open up your cluster to be accessed by anyone. Make sure to always run your nodes behind a firewall/security group/private network that disables external access to the ports mentioned above.
All nodes in the cluster must have:
Domain Name Service (DNS) configured
Network Time Protocol (NTP) configured
Fixed private IPv4 address
Globally unique node name (use --node-name when installing K3s in a VM to set a static node name)
The following port must be publicly exposed in order to allow users to access Synergy or Focus product:
The user must not access the K3s nodes directly, instead, there should be a load balancer sitting between the end user and all the K3s nodes (master and worker nodes):
The load balancer must operate at Layer 4 of the OSI model and listen for connections on port 443. After the load balancer receives a connection request, it selects a target from the target group (which can be any of the master or worker nodes in the cluster) and then attempts to open a TCP connection to the selected target (node) on port 443.
The load balancer must have health checks enabled which are used to monitor the health of the registered targets (nodes in the cluster) so that the load balancer can send requests to healthy nodes only.
The recommended health check configuration is:
Timeout: 10 seconds
Healthy threshold: 3 consecutive health check successes
Unhealthy threshold: 3 consecutive health check failures
Interval: 30 seconds
At least 4 machines are required to provide high availability of the Getvisibility platform. The HA setup supports a single-node failure.
Make sure you have /usr/local/bin configured in your PATH (export PATH=$PATH:/usr/local/bin). All the commands must be executed as the root user.
For RHEL, K3s needs the following package to be installed: k3s-selinux (repo rancher-k3s-common-stable) and its dependencies container-selinux (repo rhel-8-appstream-rhui-rpms) and policycoreutils-python-utils (repo rhel-8-baseos-rhui-rpms).
Also, firewalld, nm-cloud-setup.service and nm-cloud-setup.timer must be disabled and the server restarted before the installation; see for more information.
The steps below guide you through the air-gap installation of K3s, a lightweight Kubernetes distribution created by Rancher Labs:
Create at least 4 VMs with the same specs
Extract the downloaded file: tar -xf gv-platform-$VERSION.tar to all the VMs
Create a local DNS entry private-docker-registry.local across all the nodes resolving to the master1 node:
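A minimal way to do this, assuming 10.0.0.11 is the private IP of master1 and no internal DNS server is available, is an /etc/hosts entry on every node:
# Run on every node (master and worker); replace 10.0.0.11 with master1's private IP
echo "10.0.0.11 private-docker-registry.local" >> /etc/hosts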
Prepare the K3s air-gap installation files:
Update the registries.yaml file across all the nodes.
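For reference, a minimal registries.yaml along these lines tells K3s to pull images from the local registry on master1 (the bundle may ship its own version of this file, in which case use that one; the port matches the Private Docker Registry port listed below):
cat > /etc/rancher/k3s/registries.yaml << 'EOF'
mirrors:
  "private-docker-registry.local":
    endpoint:
      - "http://private-docker-registry.local:80"
EOF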
Install K3s in the 1st master node:
To get started launch a server node using the cluster-init flag:
Check for your first master node status, it should have the Ready state:
Use the following command to copy the TOKEN from this node that will be used to join the other nodes to the cluster:
Also, copy the IP address of the 1st master node which will be used by the other nodes to join the cluster.
Install K3s in the 2nd master node:
Run the following command and assign the contents of the file: /var/lib/rancher/k3s/server/node-token from the 1st master node to the K3S_TOKEN variable.
Set --node-name to “master2”
Set --server to the IP address of the 1st master node
Check the node status:
Install K3s in the 3rd master node:
Run the following command and assign the contents of the file: /var/lib/rancher/k3s/server/node-token from the 1st master node to the K3S_TOKEN variable.
Set --node-name to “master3”
Set --server to the IP address of the 1st master node
Check the node status:
Install K3s in the 1st worker node:
Use the same approach to install K3s and to connect the worker node to the cluster group.
The installation parameter would be different in this case. Run the following command:
Set --node-name to “workerN” (where N is the number of the worker node, e.g. worker1)
Check the node status:
Extract and Import the Docker images locally to the master1 node
Install gv-private-registry helm chart in the master1 node:
Replace $VERSION with the version that is present in the bundle that has been downloaded.
To check all the charts that have been downloaded, run ls charts.
Tag and push the docker images to the local private docker registry deployed in the master1 node:
The following steps guide you through the installation of the dependencies required by Focus and Synergy.
Perform the following steps in the master1 Node
Install Getvisibility Essentials and set the daily UTC backup hour (0-23) for performing backups.
If you are installing Focus or Enterprise, append --set eck-operator.enabled=true to the command in order to enable Elasticsearch (the ECK operator).
Install Monitoring CRD:
Install Monitoring:
Check all pods are Running with the command:
Replace the following variables:
$VERSION with the version that is present in the bundle that has been downloaded
$RESELLER with the reseller code (either getvisibility or forcepoint)
$PRODUCT with the product being installed (e.g. synergy, focus or enterprise)
Perform the following steps in the master1 node
Install gv-kube-fledged helm chart.
Replace $VERSION with the version that is present in the bundle that has been downloaded.
To check all the charts that have been downloaded, run ls charts.
Create and deploy imagecache.yaml
Models and other artifacts, like custom agent versions or custom consul configuration can be shipped inside auto deployable bundles. The procedure to install custom artifact bundles on an HA cluster is the same as in the single node cluster case. Take a look at the guide for single-node clusters above.
Before upgrading each chart, you can check the settings used in the current installation with
helm get values <chartname>.
If the current values are different from the defaults, you will need to change the parameters of the
helm upgrade command for the chart in question.
For example, if the backup is currently set to run at 2 AM instead of the 1 AM default, change
--set backup.hour=1 to --set backup.hour=2
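For example, a quick check against the gv-essentials chart from this guide could look like this:
# Show the values currently set on the installed release
helm get values gv-essentials --kubeconfig /etc/rancher/k3s/k3s.yaml
# Example output (illustrative):
#   backup:
#     hour: 2
# In that case, use --set backup.hour=2 in the gv-essentials upgrade command instead of the default.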
To upgrade Focus/Synergy/Enterprise you must:
Download the new bundle
Import Docker images
Install Focus/Synergy/Enterprise Helm Chart
To upgrade the GV Essential chart you must:
Download the new bundle
Import Docker images
Run the command from Install Getvisibility Essentials under Install Helm charts section
Models and other artifacts, like custom agent versions or custom consul configuration can be shipped inside auto deployable bundles. The procedure to upgrade custom artifact bundles is the same as the installation one, take a look at the guides above for single-node and multi-node installations.
The ports used by the cluster are listed in the tables below.

Ports that must be open between all K3s nodes (node-to-node traffic):

Protocol   Port        Description
TCP        6443        Kubernetes API Server
UDP        8472        Required for Flannel VXLAN
TCP        2379-2380   embedded etcd
TCP        10250       metrics-server for HPA
TCP        9796        Prometheus node exporter
TCP        80          Private Docker Registry

Port that must be publicly exposed (end-user access to Focus/Synergy):

Protocol   Port        Description
TCP        443         Focus/Synergy backend
# mkdir -p /var/lib/rancher/k3s/agent/images/
# gunzip -c assets/k3s-airgap-images-amd64.tar.gz > /var/lib/rancher/k3s/agent/images/airgap-images.tar
# cp assets/k3s /usr/local/bin && chmod +x /usr/local/bin/k3s
# tar -xzf assets/helm-v3.8.2-linux-amd64.tar.gz
# cp linux-amd64/helm /usr/local/bin
# cat scripts/k3s.sh | INSTALL_K3S_SKIP_DOWNLOAD=true SKIP_PRECHECK=true K3S_KUBECONFIG_MODE="644" sh -s - server --node-name=local-01
# mkdir /tmp/import
# for f in images/*.gz; do IMG=$(basename "${f}" .gz); gunzip -c "${f}" > /tmp/import/"${IMG}"; done
# for f in /tmp/import/*.tar; do ctr -n=k8s.io images import "${f}"; done
# helm upgrade --install gv-essentials charts/gv-essentials-$VERSION.tgz --wait \
--timeout=10m0s --kubeconfig /etc/rancher/k3s/k3s.yaml \
--set backup.hour=1 \
--set eck-operator.enabled=true \
--set updateclusterid.enabled=false \
--set eck-operator.settings.cpu=4 \
--set eck-operator.settings.memory=20 \
--set eck-operator.settings.storage=160
# helm upgrade --install rancher-monitoring-crd charts/rancher-monitoring-crd-$VERSION.tgz --wait \
--kubeconfig /etc/rancher/k3s/k3s.yaml \
--namespace=cattle-monitoring-system \
--create-namespace
# helm upgrade --install rancher-monitoring charts/rancher-monitoring-$VERSION.tgz --wait \
--kubeconfig /etc/rancher/k3s/k3s.yaml \
--namespace=cattle-monitoring-system \
--set k3sServer.enabled=true \
--set k3sControllerManager.enabled=true \
--set k3sScheduler.enabled=true \
--set k3sProxy.enabled=true \
--set prometheus.retention=5
# helm upgrade --install gv-platform charts/gv-platform-$VERSION.tgz --wait \
--timeout=10m0s --kubeconfig /etc/rancher/k3s/k3s.yaml \
--set-string clusterLabels.environment=prod \
--set-string clusterLabels.cluster_reseller=$RESELLER \
--set-string clusterLabels.cluster_name=mycluster \
--set-string clusterLabels.product=$PRODUCT
# kubectl patch clusterrole traefik-kube-system -n kube-system --type='json' -p='[{"op": "add", "path": "/rules/-1/apiGroups/-", "value": "traefik.io"}]'
# kubectl apply -f assets/traefik-patch.yaml
# kubectl rollout restart deployment traefik -n kube-system
mkdir custom
wget -O custom/$BUNDLE.tar.gz $URL
gunzip custom/$BUNDLE.tar.gz
ctr -n=k8s.io images import models/$BUNDLE.tar
helm upgrade \
--install gv-model-deployer charts/gv-model-deployer-$GV_DEPLOYER_VERSION.tgz \
--wait --timeout=10m0s --kubeconfig /etc/rancher/k3s/k3s.yaml \
--set models.version="$BUNDLE_VERSION"
root@ip-172-31-9-140:~# kubectl logs -f ml-model-0jvaycku9prx-84nbf
Uploading models
Added `myminio` successfully.
`/models/AIP-1.0.0.zip` -> `myminio/models-data/AIP-1.0.0.zip`
`/models/Commercial-1.0.0.zip` -> `myminio/models-data/Commercial-1.0.0.zip`
`/models/Default-1.0.0.zip` -> `myminio/models-data/Default-1.0.0.zip`
`/models/classifier-6.1.2.zip` -> `myminio/models-data/classifier-6.1.2.zip`
`/models/lm-full-en-2.1.2.zip` -> `myminio/models-data/lm-full-en-2.1.2.zip`
`/models/sec-mapped-1.0.0.zip` -> `myminio/models-data/sec-mapped-1.0.0.zip`
Total: 0 B, Transferred: 297.38 MiB, Speed: 684.36 MiB/s
Restart classifier
deployment.apps/classifier-focus restarted
root@ip-172-31-9-140:~# cat >> /etc/hosts << EOF
<Master1_node_VM_IP> private-docker-registry.local
EOF
$ mkdir -p /var/lib/rancher/k3s/agent/images/
$ gunzip -c assets/k3s-airgap-images-amd64.tar.gz > /var/lib/rancher/k3s/agent/images/airgap-images.tar
$ cp assets/k3s /usr/local/bin && chmod +x /usr/local/bin/k3s
$ tar -xzf assets/helm-v3.8.2-linux-amd64.tar.gz && cp linux-amd64/helm /usr/local/bin
$ mkdir -p /etc/rancher/k3s
$ cp assets/registries.yaml /etc/rancher/k3s/
$ cat scripts/k3s.sh | INSTALL_K3S_SKIP_DOWNLOAD=true K3S_KUBECONFIG_MODE="644" sh -s - server --node-name=master1 --cluster-init
$ kubectl get nodes
$ cat /var/lib/rancher/k3s/server/node-token
$ cat scripts/k3s.sh | K3S_TOKEN=$K3S_TOKEN INSTALL_K3S_SKIP_DOWNLOAD=true K3S_KUBECONFIG_MODE="644" sh -s - server --node-name=master2 --server https://<ip or hostname of any master node>:6443
$ kubectl get nodes
$ cat scripts/k3s.sh | K3S_TOKEN=$K3S_TOKEN INSTALL_K3S_SKIP_DOWNLOAD=true K3S_KUBECONFIG_MODE="644" sh -s - server --node-name=master3 --server https://<ip or hostname of any master node>:6443
$ cat scripts/k3s.sh | K3S_TOKEN=$K3S_TOKEN INSTALL_K3S_SKIP_DOWNLOAD=true K3S_KUBECONFIG_MODE="644" sh -s - agent --node-name=worker1 --server https://<ip or hostname of any master node>:6443
$ kubectl get nodes
$ mkdir /tmp/import
$ for f in images/*.gz; do IMG=$(basename "${f}" .gz); gunzip -c "${f}" > /tmp/import/"${IMG}"; done
$ for f in /tmp/import/*.tar; do ctr -n=k8s.io images import "${f}"; done
$ helm upgrade --install gv-private-registry charts/gv-private-registry-$VERSION.tgz --wait \
--timeout=10m0s \
--kubeconfig /etc/rancher/k3s/k3s.yaml
$ sh scripts/push-docker-images.sh
$ helm upgrade --install gv-essentials charts/gv-essentials-$VERSION.tgz --wait \
--timeout=10m0s --kubeconfig /etc/rancher/k3s/k3s.yaml \
--set global.high_available=true \
--set eck-operator.enabled=true \
--set minio.replicas=4 \
--set minio.mode=distributed \
--set consul.server.replicas=3 \
--set updateclusterid.enabled=false \
--set backup.hour=1
$ helm upgrade --install rancher-monitoring-crd charts/rancher-monitoring-crd-$VERSION.tgz --wait \
--kubeconfig /etc/rancher/k3s/k3s.yaml \
--namespace=cattle-monitoring-system \
--create-namespace
$ helm upgrade --install rancher-monitoring charts/rancher-monitoring-$VERSION.tgz --wait \
--kubeconfig /etc/rancher/k3s/k3s.yaml \
--set global.high_available=true \
--namespace=cattle-monitoring-system \
--set loki-stack.loki.replicas=2 \
--set prometheus.prometheusSpec.replicas=2
$ kubectl get pods -A
$ helm upgrade --install gv-platform charts/gv-platform-$VERSION.tgz --wait \
--timeout=10m0s --kubeconfig /etc/rancher/k3s/k3s.yaml \
--set high_available=true \
--set-string clusterLabels.environment=prod \
--set-string clusterLabels.cluster_reseller=$RESELLER \
--set-string clusterLabels.cluster_name=mycluster \
--set-string clusterLabels.product=$PRODUCT
$ helm upgrade --install gv-kube-fledged charts/gv-kube-fledged-$VERSION.tgz -n kube-fledged \
--timeout=10m0s \
--kubeconfig /etc/rancher/k3s/k3s.yaml \
--create-namespace
$ sh scripts/create-imagecache-file.sh
$ kubectl apply -f scripts/imagecache.yaml
A comprehensive list of the supported event types by Data Source for DDR
When DDR (aka streaming) is enabled and events start coming in from the data source, there are two types of events:
The first type covers read-only actions, for example Read, View, etc. No actions are taken when these events are detected.
The second type covers events that alter the file or the file permissions, for example creating a file or user, or changing a file name. When these types of events are detected, a scan or rescan of the item will occur so that it can be classified.
CreateUser - A new user account is created.
CreateGroup - A new user group is created.
CreateRole - A new role is created with specific permissions.
UpdateUser - Modifications are made to an existing user.
UpdateGroup - Changes are made to a group, such as adding or removing members.
UpdateRole - A role is updated with new permissions or settings.
AttachUserPolicy - A policy is attached to a user, modifying access rights.
DeleteUser - A user account is deleted.
DeleteGroup - A group is deleted along with its associated permissions.
DeleteRole - A role is deleted from IAM.
ConsoleLogin - A user logs in through the AWS console.
SignInFailure - A login attempt fails.
SignInSuccess - A login attempt is successful.
FederatedLogin - A user logs in via federated authentication.
s3:ObjectCreated:Put – A new object is uploaded to an S3 bucket.
s3:ObjectCreated:Post – A new object is uploaded to an S3 bucket by an HTTP POST operation.
s3:ObjectCreated:CompleteMultipartUpload – An object was created after a multipart upload operation.
s3:ObjectCreated:Copy – A new object is created by an S3 copy operation.
s3:ObjectRestore:Post – A restore request for an archived object is initiated.
s3:ObjectRestore:Delete – A restore request for an archived object is deleted.
s3:ObjectAcl:Put – Access control settings for an object are updated.
s3:ObjectTagging:Put – Tags for an object are added or modified.
s3:ObjectRemoved:Delete – An object is deleted from an S3 bucket.
s3:ObjectRemoved:DeleteMarkerCreated – A delete marker is created for an object, marking it as deleted.
s3:LifecycleExpiration:Delete – An object is removed due to lifecycle rules.
s3:LifecycleExpiration:DeleteMarkerCreated – A delete marker is created due to lifecycle rules.
s3:ReducedRedundancyLostObject - An object stored in Reduced Redundancy Storage is lost.
s3:LifecycleTransition – An object is transitioned to a different storage class based on lifecycle rules.
s3:Replication:OperationFailedReplication – The replication operation for an object failed.
s3:Replication:OperationNotTracked – The replication operation for an object is not tracked.
Microsoft.Storage.BlobCreated - A new blob is created or content is updated in a storage container.
Microsoft.Storage.DirectoryCreated - A new directory is created in a storage container.
Microsoft.Storage.BlobRenamed - A blob is renamed within a container.
Microsoft.Storage.DirectoryRenamed - A directory is renamed within a container.
Microsoft.Storage.BlobDeleted - A blob is deleted from a storage container.
Microsoft.Storage.DirectoryDeleted - A directory is deleted from a storage container.
Microsoft.EventGrid.SubscriptionValidationEvent - A subscription validation event.
Microsoft.Storage.BlobTierChanged - The storage tier of a blob is modified.
GetBlobServiceProperties - Retrieves properties of the Blob service.
GetContainerProperties - Retrieves properties of a storage container.
CreateFile - A new file is created in an Azure Files share.
CreateDirectory - A new directory is created in an Azure Files share.
CopyFile - A file is copied to a new location.
SetFileProperties - The properties of a file are updated.
SetFileMetadata - Metadata of a file is updated.
DeleteFile - A file is deleted from an Azure Files share.
DeleteDirectory - A directory is deleted from an Azure Files share.
ListShares - Lists file shares in an account.
GetShareProperties - Retrieves properties of a file share.
GetShareMetadata - Retrieves metadata of a file share.
GetDirectoryProperties - Retrieves properties of a directory.
FILE.UPLOADED - A new file is uploaded.
FOLDER.CREATED - A new folder is created.
FILE.RESTORED - A previously deleted file is restored.
FOLDER.RESTORED - A previously deleted folder is restored.
FILE.MOVED - A file is moved to a new location.
FILE.RENAMED - A file is renamed.
FOLDER.RENAMED - A folder is renamed.
FOLDER.MOVED - A folder is moved to a new location.
FILE.TRASHED - A file is moved to the trash.
FILE.DELETED - A file is permanently deleted.
FOLDER.TRASHED - A folder is moved to the trash.
FOLDER.DELETED - A folder is permanently deleted.
FILE.DOWNLOADED - A file is downloaded.
FOLDER.DOWNLOADED - A folder is downloaded.
FILE.COPIED - A file is copied to another location.
FOLDER.COPIED - A folder is copied to another location.
page_created - A new page is created in Confluence.
blogpost_created - A new blog post is created.
attachment_created - A new attachment is uploaded.
page_updated - An existing page is modified.
blogpost_updated - A blog post is updated.
attachment_updated - An attachment is updated.
page_deleted - A page is deleted from Confluence.
blogpost_deleted - A blog post is deleted.
attachment_deleted - An attachment is removed.
All other events are categorized as informational.
MessagesAdded - A new email message is added.
LabelsAdded - A label is added to an email.
LabelsRemoved - A label is removed from an email.
MessagesDeleted - An email message is deleted.
create - A new file or folder is created.
upload - A new file is uploaded.
edit - A file or folder is modified.
rename - A file or folder is renamed.
move - An item is moved to a different location.
delete - An item is permanently removed.
trash - An item is moved to the trash.
view - A file or folder is viewed.
download - A file is downloaded.
preview - A file is previewed.
print - A file is printed.
create_group - A new group is created.
create_user - A new user is created.
2sv_disable - Two-step verification is disabled.
2sv_enroll - Two-step verification is enrolled.
password_edit - A user's password is modified.
recovery_email_edit - A recovery email is changed.
delete_group - A group is deleted.
delete_user - A user is deleted.
archive_user - A user is archived.
unarchive_user - A user is unarchived.
login_success - A user successfully logs in.
login_failure - A login attempt fails.
login_challenge - A login challenge occurs.
application_login_failure - An application login fails.
FileUploaded - A new file is uploaded.
FolderCreated - A new folder is created.
FileRestored - A previously deleted file is restored.
FolderRestored - A previously deleted folder is restored.
FileModified - A file is modified.
FileMoved - A file is moved to a new location.
FileRenamed - A file is renamed.
FolderModified - A folder is modified.
FileDeleted - A file is permanently deleted.
FolderDeleted - A folder is permanently deleted.
FileRecycled - A file is moved to the recycle bin.
FolderRecycled - A folder is moved to the recycle bin.
FileAccessed - A file is accessed.
FileDownloaded - A file is downloaded.
FilePreviewed - A file is previewed.
FolderCopied - A folder is copied.
DetachUserPolicy - A policy is removed from a user, altering permissions.
PutUserPolicy - A new policy is assigned to a user.
AttachGroupPolicy - A policy is attached to a group, affecting all its members.
DetachGroupPolicy - A policy is removed from a group.
PutGroupPolicy - A policy is assigned to a group.
AttachRolePolicy - A policy is attached to a role, modifying access rights.
DetachRolePolicy - A policy is removed from a role.
PutRolePolicy - A new policy is assigned to a role.
ChangePassword - A user changes their password.
AddUserToGroup - A user is added to a group, changing their access permissions.
RemoveUserFromGroup - A user is removed from a group.
SessionStart - A session begins.
SessionEnd - A session ends.
GenerateCredentialReport - A report on credentials is generated.
GetCredentialReport - A credential report is retrieved.
ListAccessKeys - Access keys for a user are listed.
ListUserTags - Tags associated with a user are retrieved.
ListUsers - Users within an AWS account are listed.
ListGroups - Groups within an AWS account are listed.
ListRoles - Roles within an AWS account are listed.
GetUser - Information about a specific user is retrieved.
GetGroup - Information about a specific group is retrieved.
GetRole - Information about a specific role is retrieved.
s3:ObjectRestore:Completed – An archived object has been fully restored and is now available.
s3:ObjectTagging:Delete – Tags for an object are removed.
s3:Replication:OperationMissedThreshold – The replication operation did not meet its threshold requirements.
s3:Replication:OperationReplicatedAfterThreshold – The replication operation succeeded after surpassing the threshold.
s3:IntelligentTiering – An object is moved between storage tiers.
GetContainerServiceMetadata - Retrieves metadata for a storage container.
ListContainers - Lists storage containers in an account.
BlobPreflightRequest - A request to verify blob upload conditions.
ListBlobs - Lists blobs in a container.
GetBlobProperties - Retrieves properties of a blob.
GetBlobMetadata - Retrieves metadata associated with a blob.
GetBlockList - Retrieves the list of blocks in a blob.
GetContainerACL - Retrieves the access control list of a container.
GetContainerMetadata - Retrieves metadata for a container.
CopyBlob - Copies a blob from one location to another.
CopyBlobSource - Identifies the source blob for a copy operation.
CopyBlobDestination - Identifies the destination blob for a copy operation.
DeleteBlob - Deletes a blob from a container.
DeleteBlobSnapshot - Deletes a snapshot of a blob.
DeleteContainer - Deletes a storage container.
PutBlob - Uploads a new blob to a container.
PutBlock - Uploads a block for a blob.
PutBlockList - Commits a set of uploaded blocks as a blob.
CreateBlobSnapshot - Creates a snapshot of an existing blob.
CreateBlockBlob - Creates a new block blob.
CreateContainer - Creates a new storage container.
SetBlobMetadata - Updates metadata for a blob.
SetBlobProperties - Updates properties of a blob.
SetContainerMetadata - Updates metadata for a storage container.
SetContainerACL - Modifies the access control list of a container.
AcquireBlobLease - Acquires a lease on a blob.
ReleaseBlobLease - Releases a lease on a blob.
RenewBlobLease - Renews a lease on a blob.
BreakBlobLease - Breaks an active lease on a blob.
AcquireContainerLease - Acquires a lease on a container.
BreakContainerLease - Breaks an active lease on a container.
ChangeBlobLease - Changes an active lease on a blob.
ChangeContainerLease - Changes an active lease on a container.
RenewContainerLease - Renews a lease on a container.
UndeleteBlob - Restores a deleted blob.
GetFileProperties - Retrieves properties of a file.
ListDirectoriesAndFiles - Lists directories and files in a share.
GetFile - Retrieves a file from a share.
GetFileRangeList - Retrieves the range list of a file.
GetShareStats - Retrieves statistics for a file share.
CreateShare - Creates a new file share.
PutRange - Uploads a range of data to a file.
SetShareMetadata - Updates metadata for a file share.
SetShareProperties - Updates properties of a file share.
SetDirectoryMetadata - Updates metadata of a directory.
SetDirectoryProperties - Updates properties of a directory.
ResizeFile - Resizes an existing file.
SetFileTier - Sets the tier of a file.
SetShareQuota - Updates the quota of a file share.
SetShareACL - Updates the access control list of a file share.
SetDirectoryACL - Updates the access control list of a directory.
SetFileACL - Updates the access control list of a file.
DeleteShare - Deletes a file share.
AcquireShareLease - Acquires a lease on a file share.
ReleaseShareLease - Releases a lease on a file share.
RenewShareLease - Renews a lease on a file share.
BreakShareLease - Breaks an active lease on a file share.
ChangeShareLease - Changes an active lease on a file share.
StartCopyFile - Initiates a file copy operation.
AbortCopyFile - Cancels an ongoing file copy operation.
CopyFileSource - Specifies the source file in a copy operation.
CopyFileDestination - Specifies the destination file in a copy operation.
CreateShareSnapshot - Creates a snapshot of a file share.
DeleteShareSnapshot - Deletes a snapshot of a file share.
UndeleteShare - Restores a deleted file share.
UndeleteFile - Restores a deleted file.
UndeleteDirectory - Restores a deleted directory.
RenameFile - Renames a file within a share.
RenameFileSource - Specifies the source file in a rename operation.
RenameFileDestination - Specifies the destination file in a rename operation.
RenameDirectory - Renames a directory within a share.
RenameDirectorySource - Specifies the source directory in a rename operation.
RenameDirectoryDestination - Specifies the destination directory in a rename operation.
COLLABORATION.CREATED - A collaboration event is created.
COLLABORATION.REMOVED - A collaboration is removed.
COLLABORATION.UPDATED - A collaboration is updated.
SHARED_LINK.CREATED - A shared link is created.
SHARED_LINK.UPDATED - A shared link is updated.
SHARED_LINK.DELETED - A shared link is deleted.
FILE.LOCKED - A file is locked for editing.
FILE.UNLOCKED - A file is unlocked for editing.
COMMENT.CREATED - A comment is added to a file.
COMMENT.UPDATED - A comment is updated.
COMMENT.DELETED - A comment is deleted.
METADATA_INSTANCE.CREATED - A metadata instance is created.
METADATA_INSTANCE.UPDATED - A metadata instance is updated.
METADATA_INSTANCE.DELETED - A metadata instance is deleted.
TASK_ASSIGNMENT.CREATED - A task is assigned.
TASK_ASSIGNMENT.UPDATED - A task assignment is updated.
SIGN_REQUEST.COMPLETED - A signature request is completed.
SIGN_REQUEST.DECLINED - A signature request is declined.
SIGN_REQUEST.EXPIRED - A signature request expired.
SIGN_REQUEST.SIGNER_EMAIL_BOUNCED - A signature request email bounced.
sync - A file or folder is synced.
request_access - Access to an item is requested.
approval_requested - An approval request is sent.
approval_completed - An approval request is completed.
approval_canceled - An approval request is cancelled.
approval_comment_added - A comment is added to an approval request.
approval_due_time_change - The due time for an approval request is changed.
approval_reviewer_change - The reviewer of an approval request is changed.
approval_reviewer_responded - A reviewer responds to an approval request.
deny_access_request - An access request is denied.
expire_access_request - An access request expires.
change_owner - The owner of an item is changed.
change_document_access_scope - The access scope of a document is changed.
change_document_visibility - The visibility of a document is changed.
change_acl_editors - The list of editors for a document is modified.
change_user_access - User access permissions are modified.
shared_drive_membership_change - Membership in a shared drive is changed.
shared_drive_settings_change - Shared drive settings are modified.
apply_security_update - Security updates are applied.
shared_drive_apply_security_update - A security update is applied to a shared drive.
shared_drive_remove_security_update - A security update is removed from a shared drive.
remove_security_update - A security update is removed.
enable_inherited_permissions - Inherited permissions are enabled.
disable_inherited_permissions - Inherited permissions are disabled.
recovery_phone_edit - A recovery phone number is changed.
recovery_secret_qa_edit - A recovery question or answer is changed.
account_disabled_password_leak - A user account is disabled due to a password leak.
account_disabled_generic - A user account is disabled.
account_disabled_spamming - A user account is disabled due to spamming.
account_disabled_spamming_through_relay - A user account is disabled for spamming via relay.
accept_invitation - A user accepts an invitation.
add_info_setting - An informational setting is added.
add_member - A new member is added to a group.
add_member_role - A role is assigned to a member.
add_security_setting - A security setting is added.
add_service_account_permission - A permission is assigned to a service account.
approve_join_request - A join request is approved.
ban_member_with_moderation - A member is banned.
change_info_setting - An informational setting is modified.
change_security_setting - A security setting is changed.
change_group_setting - A group setting is modified.
change_group_name - A group's name is changed.
change_first_name - A user's first name is changed.
change_password - A user's password is changed.
suspend_user - A user is suspended.
unsuspend_user - A user is unsuspended.
update_group_settings - A group's settings are updated.
user_license_assignment - A license is assigned to a user.
user_license_revoke - A license is revoked from a user.
add_group_member - A member is added to a group.
remove_group_member - A member is removed from a group.
change_user_access - User access permissions are changed.
change_acl_editors - The list of editors for a document is changed.
application_login_success - An application login succeeds.
alert_center_view - The alert center is accessed.
request_to_join - A request to join a group is sent.
request_to_join_via_mail - A request to join a group via email is sent.
approval_requested - An approval request is made.
approval_canceled - An approval request is canceled.
approval_comment_added - A comment is added to an approval request.
approval_completed - An approval request is completed.
approval_due_time_change - The due time of an approval request is changed.
approval_reviewer_change - The reviewer of an approval request is changed.
approval_reviewer_responded - A reviewer responds to an approval request.
deny_access_request - An access request is denied.
expire_access_request - An access request expires.
shared_drive_membership_change - Membership in a shared drive is changed.
shared_drive_settings_change - Shared drive settings are changed.
apply_security_update - A security update is applied.
remove_security_update - A security update is removed.
shared_drive_apply_security_update - A security update is applied to a shared drive.
shared_drive_remove_security_update - A security update is removed from a shared drive.
suspicious_login - A suspicious login is detected.
suspicious_login_less_secure_app - A suspicious login from a less secure app is detected.
suspicious_programmatic_login - A suspicious programmatic login is detected.
user_signed_out_due_to_suspicious_session_cookie - A user is signed out due to a suspicious session cookie.
FolderRenamed - A folder is renamed.
FileSensitivityLabelChanged - A file's sensitivity label is modified.
FileSensitivityLabelApplied - A sensitivity label is applied to a file.
SharingSet - Sharing permissions are updated.
AddedToGroup - A user is added to a group.
SiteDeleted - A SharePoint site is deleted.
GroupRemoved - A group is removed.
SharedLinkCreated - A shared link is created.
SharedLinkDisabled - A shared link is disabled.
SharingInvitationAccepted - A sharing invitation is accepted.
SharingRevoked - A sharing invitation is revoked.
AnonymousLinkCreated - An anonymous link is created.
SecureLinkCreated - A secure link is created.
SecureLinkUpdated - A secure link is updated.
SecureLinkDeleted - A secure link is deleted.
AccessInvitationAccepted - An access invitation is accepted.
AccessInvitationRevoked - An access invitation is revoked.
AccessRequestApproved - An access request is approved.
AccessRequestRejected - An access request is rejected.
FileCheckOutDiscarded - A file checkout is discarded.
FileCheckedIn - A file is checked in.
FileCheckedOut - A file is checked out.
SharingInheritanceBroken - Sharing inheritance is broken.
AddedToSecureLink - A user is added to a secure link.
RemovedFromSecureLink - A user is removed from a secure link.
SiteCollectionCreated - A new SharePoint site collection is created.
GQL (Getvisibility Query Language) is a query language designed to enhance the flexibility and efficiency of querying data through the DSPM+, DDC, and EDC platforms. It enables the creation of custom queries without the need for hard coding, significantly simplifying the process of filtering through and analysing data.
Based on Apache Lucene query language, GQL supports boolean, term, and range queries. This flexibility allows the language to seamlessly integrate with the platform’s Analytics software to produce elegant and insightful visualisations.
Once mastered, GQL offers maximum flexibility, enabling both broad and precise data analysis.
There are separate sets of terms used for the different datasets within the DSPM+, DDC, and EDC platforms. Each dataset allows for unique GQL terms relating to its data:
Files: Unstructured data discovered and classified on-prem and in the cloud file storage locations. GQL term examples: path, ingestedAt, flow
Trustees: Users and groups that are discovered in on-prem and in cloud IAM systems. GQL term examples: type, isAdmin, outdatedPassword
Activity: User activities tracked by the endpoint classification platform. GQL term examples: recipients, operation, agentId
Management: Administrative data from individual classification endpoints. GQL term examples: lastSeen, status, os
Remediation: Actions initiated by users to remediate issues. GQL term examples: actionType, errorReason, updatedPermissions
Streaming: Real-time events from DDR functionality. GQL term examples: source, eventTime, eventId
Extended Streaming: Similar to Streaming but with a longer delay before events appear in the UI; some event types may also contain more detail than the Streaming dataset. GQL term examples: tenantId, rawEventType, streamingEventType
File Audit Log: List of events that are associated with the scanning of files. GQL term examples: source, functionalityName, moduleName
For the full sets of terms, see tables below.
Operations are performed on or between terms to help filter data. The available operations are:
AND Combines queries to match items meeting all conditions
OR Matches items meeting any listed conditions
() Groups queries to clarify operation order
= Equal to
!= Not equal to
> Greater than
< Less than
>= Greater than or equal to
<= Less than or equal to
EXISTS
NOT_EXISTS
Queries are formed using terms, their values, and operations. They can be as simple as a query looking for High Risk HR Data:
dataAttributeName=HR AND risk=2
Or as complex as a query specifying Health, Safety, and Compliance Documents as a data asset in DSPM:
complianceTag=PII AND dataAttributeName=HR AND (dataAttributeName=Record OR
dataAttributeName=Legal OR dataAttributeName=Safety) AND
(detectorHits="Health Insurance" OR detectorHits="Risk assessment" OR
detectorHits="Policy and Procedure" OR detectorHits="Compliance report" OR
detectorHits="Safety Policies" OR detectorHits="Security Policies")
The UI will give suggestions as you type to help out.
You should experiment with GQL queries across various platform interfaces. See what works and what doesn't. Get creative and let the real-time suggestions assist you. Remember, you can save the queries you create as bookmarks for future use.
Click on the star
Enter a description, select Accept
The bookmark is saved
Scroll down to see saved bookmarks
Queries can be created that incorporate dates. These can include exact dates and times or ranges. Date types include: createdAt, lastModifiedAt, and ingestedAt.
GQL will provide suggestions for common time intervals such as minutes, days, months, and years.
Once a date type has been selected and an operation associated with it, a date interface will be presented to the user. Simply search for and select the appropriate date to create the query.
If a specific range of dates is needed, for example, all files created in May 2022, the following method should be used.
This method will search for files whose creation dates are greater than or equal to midnight on the 1st May 2022 and less than midnight on the 1st of June 2022.
Type createdAt>= and select the first date
Select AND
Type createdAt< and select the closing date
Hit enter or the search icon and the query will filter the results
This method can be used with any date data type. It can be as granular as seconds or as broad as years.
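As a sketch, the May 2022 example above results in a query of the following shape (the exact date literal is filled in by the date picker, so treat the format as illustrative):
createdAt>=2022-05-01 AND createdAt<2022-06-01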
When creating or editing widgets such as counters, charts, or maps in the Analytics boards you will have the ability to aggregate some of the terms in the datasets. For example: you can use counts to show critical shared files, group by file type when displaying classification results, or use multiple groupings to create more complex visualisations.
While aggregations are not strictly part of GQL yet, they are useful to know as they help in constructing more descriptive visualisations.
GQL Term: Used in the query
Label: Displayed in the interface
Type: Data type of the term
Aggregation: Grouping types that are available to that term, only in the Analytics boards
path
Path
STRING
The path of the document
contentLength
Content length
LONG
The size of the document in bytes
count,
sum,
average,
min, max,
median,
Can be grouped
risk
Risk
NUMBER
The document risk factor. low=0,
medium=1,
high=2
source
Source
STRING
The source of the document
Can be grouped
createdAt
Created at
DATE
The document creation date
min, max, median,
Can be grouped
lastModifiedAt
Last modified at
DATE
The document last modified date
min, max, median,
Can be grouped
ingestedAt
Ingested at
DATE
The document ingested date
min, max, median,
Can be grouped
lastAccessedAt
Last accessed date
DATE
The document last accessed date
min, max, median, Can be grouped
flow
Flow
STRING
The document current flow stage
Can be grouped
classification
Classification
STRING
The classification of the document
Can be grouped
classificationConfidence
Classification confidence
DOUBLE
The classification confidence of the document
configurationIds
Configuration Id
STRING
The configuration id of the document
connectorId
Connector name
STRING
Name of the scan connector
Can be grouped
sensitive
Scan Trigger
BOOLEAN
The document sensitive flag
scanTrigger
Manual Classification
BOOLEAN
The trigger of the file scanning
critical
Critical
BOOLEAN
The document critical flag
md5
Document hash
STRING
The hash value of the document
Can be grouped
keywordHits
Keyword Hits
STRING
The keyword hits of the document
Can be grouped
detectorHits
Detector Hits
STRING
The detector hits of the document
Can be grouped
directPermissionsId
Direct permissions ids
STRING
Direct permissions ids of the document
Can be grouped
indirectPermissionsId
Indirect Permission Id
STRING
(Experimental) The trustee Id of the document that has access either directly or indirectly
indirectPermissions
Indirect Permissions
STRING
(Experimental) The trustee name of the document that has access either directly or indirectly
trusteeName
Trustee Name
STRING
The name of an owner of the document
Can be grouped
trusteeLoginName
Trustee Login Name
STRING
The login name of the owner of the document
dataAttributeName
Data Attribute Name
STRING
The data attribute of the document
Can be grouped
distributionTags
Distribution Tag Name
STRING
The distribution tag of the document
Can be grouped
keyword
Keyword
STRING
Keyword of the document
Can be grouped
complianceTag
Compliance Tag
STRING
Compliance Tag of the document
Can be grouped
location
Location
STRING
To get Documents by connection location
Can be grouped
language
Language
STRING
The document language
externalSharedLink
External Shared Link
BOOLEAN
The document sharing status
ownerId
Owner
Identifier
STRING
The document owner identifier
Can be grouped
downloadUrl
Download URL
STRING
The download URL of the document
machineName
Machine Name
STRING
The machine name of the endpoint where the document originated
Can be grouped
cloudLabels
Cloud Labels
STRING
The document cloud labels
Can be grouped
dataAssets
Data asset
STRING
The document data asset
Can be grouped
departments
Department
STRING
The document department
Can be grouped
dataOwners
Data Owner
STRING
Data Owner Name
Can be grouped
dynamicAttributes
Dynamic attribute
STRING
The document dynamic attribute
Can be grouped
sourceEntityId
Source entity identifier
STRING
The document id as reported from the source system
Can be grouped
type
Trustee Type
STRING
Indicates the trustee is a user or a group
Can be grouped
source
Source
STRING
The type of the connector
Can be grouped
name
Login Name
STRING
Login name of the trustee
Can be grouped
displayName
Display Name
STRING
The display name of the trustee (user or group)
Can be grouped
isEnabled
Enabled Status
BOOLEAN
Indicates if the trustee is enabled
isAdmin
Admin Privileges
BOOLEAN
Indicates if the trustee has administrator privileges
outdatedPassword
Outdated Password
BOOLEAN
Indicates if the trustee's password is outdated
min, max, median, Can be grouped
lastLoginAt
Last Login Date
DATE
The timestamp when the trustee (user or group) logged in
min,
max,
median,
average,
Can be grouped
lastModifiedAt
Last Modified Date
DATE
The timestamp when the trustee's (user or group) record was modified
min,
max,
median,
average
createdAt
Creation Date
DATE
The time trustee was created
min,
max,
median,
average
ingestedAt
Ingestion Date
DATE
The timestamp when the trustee (user or group) was ingested
min, max, median, Can be grouped
configurationId
Configuration ID
STRING
Configuration ID associated with the trustee (user or group)
Can be grouped
isActive
Active Status
BOOLEAN
Indicates if the trustee is active
recipients
Email Recipients
STRING
The recipients of the email
Can be grouped
senderEmail
Email Sender
STRING
The sender of the email
Can be grouped
subject
Email Subject
STRING
The subject of the email
Can be grouped
operation
Operation Type
STRING
The type of the operation performed
Can be grouped
eventTime
Event Time
DATE
The time when the event occurred
min, max,
median,
Can be grouped
ipAddress
IP Address
STRING
The IP address of the machine where the activity was performed
Can be grouped
eventType
Event Type
STRING
The event type of the activity performed (i.e. Activity with File, Email)
Can be grouped
hostName
Host Name
STRING
The hostname of the machine where the activity was performed
Can be grouped
department
Department
STRING
The department of the user who performed the activity
Can be grouped
agentId
Agent Id
STRING
The identification of the agent who performed the activity
Can be grouped
entityId
Agent
STRING
Unique identifier of the machine
Can be grouped
user
User
STRING
The username of the individual who performed the activity
Can be grouped
contentLength
File Size
BYTES
The size of the file involved in the activity
sum, average, min, max, median,
Can be grouped
mimeType
File Type
STRING
The MIME type of the file
Can be grouped
fileName
File Name
STRING
The name of the file
Can be grouped
filePath
File Path
STRING
The path of the file
Can be grouped
creationTime
Created At
DATE
The time when the file involved in the activity was created
min, max,
median,
Can be grouped
lastModificationTime
Last Modified At
DATE
The last time the file involved in the activity was changed
min, max, median,
Can be grouped
tags
Tags
STRING
Classification tags
Can be grouped
classificationTag
Classification Tag
STRING
Classification Tag of the document
Can be grouped
distributionTag
Distribution Tag
STRING
The distribution tag of the document
Can be grouped
complianceTag
Compliance Tag
STRING
Compliance Tag of the document
Can be grouped
senderDomain
Sender Domain
STRING
Sender Domain of email
Can be grouped
recipientDomain
Recipient Domain
STRING
Recipient Domain of email
Can be grouped
domain
Domain
STRING
Shows the Active Directory domain name, if applicable
Can be grouped
ipAddress
IP Address
STRING
Shows the IP address last recorded when the device was active
Can be grouped
status
Online Status
STRING
Shows whether the device is currently online or offline
Can be grouped
user
User Name
STRING
Displays the name of the last user who logged into the device
Can be grouped
version
Agent Version
STRING
The version of the agent software currently installed on the device
Can be grouped
os
OS
STRING
Indicates the operating system of the device, either Windows or Mac
Can be grouped
deviceId
Device ID
STRING
Displays the ID of the device
department
Department
STRING
Displays the department the agent belongs to
Can be grouped
actionType
Action type
STRING
Action type
Can be grouped
errorReason
Error Reason
STRING
Error Reason
Can be grouped
updatedAt
Updated At
STRING
Modification date of the file
min,
max,
median,
Can be grouped
updatedPermissions
Updated permission
STRING
Permissions that were revoked
Can be grouped
updatedTrustees
Updated trustees
STRING
Updated trustees
Can be grouped
createdAt
Created At
DATE
Creation date of remediation request
min,
max,
median,
Can be grouped
path
Source file path
STRING
Path of the source file
Can be grouped
errorReasonFull
Full Error Reason
STRING
Full error reason message
Can be grouped
targetSource
Target connector type
STRING
Connector type of the target file
Can be grouped
targetPath
Target file path
STRING
Path of the target file
Can be grouped
createdBy
Created By - ID
STRING
ID of the user who initiated remediation
Can be grouped
createdByName
Created By - Username
STRING
Name of the user who initiated remediation
Can be grouped
fileId
File ID
STRING
The Id of the file
Can be grouped
configurationIds
Source configuration id
STRING
Configuration id of the source file
Can be grouped
batchRequestId
ID of a batch request
STRING
ID of a batch request related to the event
Can be grouped
targetConfigurationIds
Target configuration id
STRING
Configuration id of the target file
Can be grouped
actionType
Action type
STRING
Type of the action
Can be grouped
eventTime
Event time
DATE
The time of the event
min,
max,
median,
Can be grouped
userName
Username
STRING
Name of the user
Can be grouped
userId
User ID
STRING
Id of the user
Can be grouped
fileName
File name
STRING
File name
Can be grouped
eventId
Id
STRING
Id of a file
Can be grouped
path
File path
STRING
File path
Can be grouped
configurationIds
Scan configuration ID
STRING
Scan configuration ID
Can be grouped
fileId
fileId
STRING
Id of the file
Can be grouped
scanConfigurationId
Scan Configuration ID
STRING
The identifier of the scan configuration
Can be grouped
rawEventType
Raw Event Type
STRING
The type of the raw event
Can be grouped
streamingEventType
Streaming Event Type
STRING
The type of the streaming event
Can be grouped
actionType
Action Type
STRING
The type of action performed
Can be grouped
entityId
Entity ID
STRING
The identifier of the entity
Can be grouped
entityName
Entity Name
STRING
The name of the entity
Can be grouped
fileId
File ID
STRING
The identifier of the file
Can be grouped
sourcePath
Source Path
STRING
The source path of the file
path
Path
STRING
The path of the file
connectorType
Connector Type
STRING
The type of connector used
Can be grouped
userId
User ID
STRING
The identifier of the user
Can be grouped
userName
User Name
STRING
The name of the user
Can be grouped
timestamp
Timestamp
DATE
The time when the event occurred
min, max, median,
Can be grouped
functionalityName
Functionality Name
STRING
The name of the functionality associated with the event
Can be grouped
moduleName
Module Name
STRING
Name of the module associated with the event
Can be grouped
createdAt
Created At
STRING
Creation date of the file
min,
max,
median,
Can be grouped
message
Message
STRING
Message
Can be grouped
path
File path
STRING
Path of the file
Can be grouped
fileId
File ID
STRING
The Id of the file associated with the event
Can be grouped
scanId
Scan ID
STRING
Id of the scan associated with the event
Can be grouped
configurationId
Configuration ID
STRING
ID of the configuration associated with the event
Can be grouped
fileId
Id
STRING
The internal Id of the document
fileType
File Type
STRING
The type of the document
Can be grouped
trusteeId
Trustee ID
STRING
The Id of the trustee
Can be grouped
groups
Groups
STRING
The groups that the trustee is a member of
Can be grouped
spanId
Span ID
STRING
The span ID of the activity performed
Can be grouped
traceId
Trace ID
STRING
The trace ID of the activity performed
Can be grouped
lastSeen
Last Seen
DATE
The last time the device was observed to be online
min, max, median,
Can be grouped
hostName
Host Name
STRING
The identification of the agent who performed the activity
Can be grouped
source
Source connector type
STRING
Connector type of the source file
Can be grouped
status
Status
STRING
Status of the remediation request
Can be grouped
source
Source
STRING
Source of the event
Can be grouped
eventType
Event type
STRING
Type of the event
Can be grouped
id
ID
STRING
The unique identifier of the streaming event
Can be grouped
tenantId
Tenant ID
STRING
The identifier of the tenant
Can be grouped
source
Source connector type
STRING
Connector type associated with the file
Can be grouped
eventType
Event Type
STRING
Type of the event
Can be grouped