Data Management

Semantics

How to Connect Data

1. Supported Data Sources

You can connect the following types of data sources:

JDBC (Java Database Connectivity):
- MySQL
- PostgreSQL
Files:
- Local files: Upload .csv, .xlsx, .xls formats from your computer.
- Google Drive: Connect your Google Drive and upload files in .csv, .xlsx, .xls, or Google Sheets format.
- S3 Data Source: Configure S3 connection with your Region, Endpoint, Bucket Name, Access Key Type, Access Key ID, Secret, and File Path.
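
As a rough illustration, the S3 fields listed above can be gathered into one record and checked for gaps before you fill in the connection form. This is a sketch only: the class name S3ConnectionSettings, its field names, and all sample values are hypothetical and are not part of the product's API.

```python
from dataclasses import dataclass, fields


@dataclass
class S3ConnectionSettings:
    """Hypothetical container mirroring the S3 form fields listed above."""
    region: str
    endpoint: str
    bucket_name: str
    access_key_type: str
    access_key_id: str
    secret: str
    file_path: str

    def missing_fields(self) -> list[str]:
        """Return the names of fields that were left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Placeholder values only; replace them with your own before connecting.
settings = S3ConnectionSettings(
    region="us-east-1",
    endpoint="https://s3.us-east-1.amazonaws.com",
    bucket_name="example-bucket",
    access_key_type="",  # left empty on purpose to show the check
    access_key_id="EXAMPLE_KEY_ID",
    secret="EXAMPLE_SECRET",
    file_path="reports/sales.csv",
)
print(settings.missing_fields())
```

Running the check before opening the form makes it easy to spot a field you still need to collect.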

2. Add a New Data Source

There are three ways to connect new data:

Quick upload in chatbox:

Click "+" in the chatbox to upload local files.
In quick mode, your file is available for only one chat session.

Connect and build semantic models:

Open Data Workbench and click + Browse Data Model to add a new data model to the scope.
Upload your data here; it will be automatically structured into a semantic model.
Once a semantic model has been built, you can click Add to Scope so that it will appear in the side panel Recent Scope list.
Select the model you want to focus on. You can now ask any question or explore AI-generated suggestions in Recommended Questions Based on Scope.

Manually connect JDBC databases or files:

Open Data Connection and click Connect Data or "+".
Connect any supported type of data here: databases (MySQL, PostgreSQL via JDBC) or files (Local, Google Drive, S3).
You can later use the data to build a semantic model, either with smart semantics or manually.
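
For JDBC sources, it may help to know the shape of a connection URL. The sketch below builds URLs in the standard MySQL and PostgreSQL JDBC formats. The function name build_jdbc_url, the host, and the database names are made up for illustration, and whether the connection form asks for a full URL or for separate host and port fields is not stated in this guide.

```python
# Standard JDBC URL prefixes for the two supported database types.
JDBC_PREFIXES = {"mysql": "jdbc:mysql", "postgresql": "jdbc:postgresql"}


def build_jdbc_url(db_type: str, host: str, port: int, database: str) -> str:
    """Build a JDBC URL of the form jdbc:<type>://<host>:<port>/<database>."""
    if db_type not in JDBC_PREFIXES:
        raise ValueError(f"unsupported database type: {db_type}")
    return f"{JDBC_PREFIXES[db_type]}://{host}:{port}/{database}"


# 3306 and 5432 are the default ports for MySQL and PostgreSQL.
print(build_jdbc_url("mysql", "db.example.com", 3306, "sales"))
print(build_jdbc_url("postgresql", "db.example.com", 5432, "analytics"))
```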

3. View & Manage Data Sources

All connected data sources appear in Data Connection.
From here, you can:
- Search: Search by Table Name
- View Details: View the data schema and sample rows
- Delete or Disable: Remove inactive or outdated sources

How to Build Semantics

1. Overview

Semantics translates raw tables and fields into reusable dimensions and metrics, making it easy for Ada to assist you with data analysis and natural language queries.

Semantic Model: A virtual data model built from one or more connected tables. It defines how your data is joined, how business concepts are mapped to fields, and how metrics are calculated.
Dimension: An attribute you can use to break down or filter data (e.g., Time, Category, Status).
Metric: Numerical values used to measure business performance. You can create:
- Basic Metrics: direct aggregations like SUM(Sales) or COUNT(Users)
- Composite Metrics: calculated from other metrics, e.g., Profit Margin = (Revenue - Cost) / Revenue
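
The difference between the two metric types can be sketched in a few lines. The sample rows and column names below are invented for illustration; the point is only that basic metrics aggregate a field directly, while a composite metric is computed from other metrics.

```python
# Invented sample rows standing in for a connected table.
rows = [
    {"user": "u1", "revenue": 120.0, "cost": 80.0},
    {"user": "u2", "revenue": 200.0, "cost": 150.0},
    {"user": "u3", "revenue": 80.0, "cost": 50.0},
]

# Basic metrics: direct aggregations over one field.
total_revenue = sum(row["revenue"] for row in rows)                # SUM(revenue)
total_cost = sum(row["cost"] for row in rows)                      # SUM(cost)
user_count = sum(1 for row in rows if row["user"] is not None)     # COUNT(user)

# Composite metric: a formula over existing metrics.
profit_margin = (total_revenue - total_cost) / total_revenue

print(total_revenue, total_cost, user_count, profit_margin)
```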

2. Create a Semantic Model

Click “New Model” to begin building your semantic layer manually.
Select one or more tables from your connected data sources.
Click “+Associate Table” to join another table. You can choose the join type and the dimensions to join on here.
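
The effect of the join type can be sketched as follows. This guide does not list which join types are offered, so inner and left joins are used here as common examples; the tables, column names, and the join_orders helper are hypothetical.

```python
# Invented tables: orders reference customers through customer_id.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 50},
    {"order_id": 2, "customer_id": "c2", "amount": 70},
    {"order_id": 3, "customer_id": "c9", "amount": 20},  # no matching customer
]
customer_region = {"c1": "North", "c2": "South"}


def join_orders(join_type: str) -> list[dict]:
    """Join orders to customers on customer_id using an inner or left join."""
    joined = []
    for order in orders:
        region = customer_region.get(order["customer_id"])
        if region is None and join_type == "inner":
            continue  # an inner join drops rows without a match
        joined.append({**order, "region": region})  # a left join keeps them
    return joined


print(len(join_orders("inner")))  # rows with a matching customer only
print(len(join_orders("left")))   # every order, matched or not
```

An inner join keeps only matched rows, so totals computed on it can be lower than totals computed on a left join of the same tables.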

3. Create Dimension

Click “+Create Dimension” to add dimensions.
Select the Data Source and Dimension Calculation. To add more dimensions, click "Add Custom Dimension".

4. Create Metric

Click “Create Metric” to add a metric. Enter its Name, Description, and Category. You can also adjust the Number Format Settings: format type, unit, decimal places, and thousand separator.
Choose the configuration mode between Basic Metric and Composite Metric:
- To create a Basic Metric, select a saved data source, a calculation field, and its aggregation mode. You can also enable the Point Time Attribute to make the metric time-aware and restrict it to data available at that specific point in time.
- To create a Composite Metric, define a formula using one or more existing metrics. You can apply arithmetic operations (e.g. +, -, *, /) to combine metrics into a new calculated result.
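
To see what the number format settings change, here is a minimal sketch that applies decimal places, a thousand separator, and a unit to a metric value. The function format_metric and its parameters are hypothetical and only mimic the settings named above; the product's own formatting may differ in detail.

```python
def format_metric(value: float, decimals: int = 2,
                  thousand_separator: bool = True, unit: str = "") -> str:
    """Format a metric value with the chosen decimals, separator, and unit."""
    spec = f"{',' if thousand_separator else ''}.{decimals}f"
    text = format(value, spec)
    return f"{text} {unit}".strip()


print(format_metric(1234567.891, decimals=2, thousand_separator=True, unit="USD"))
print(format_metric(1234567.891, decimals=0, thousand_separator=False))
```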

5. Manage Models/Dimensions/Metrics

All created models, dimensions, and metrics are stored under Semantics in "Semantics Factory", where you can:
- Search, view and edit existing models/dimensions/metrics.
- Categorize metrics by adding "New Category" or editing existing categories.

Smart Semantics automates the creation of a semantic layer (tables, data models, dimensions, metrics) from your uploaded data files (Excel/CSV) using large language models (LLMs). Once processing finishes, you can query the data in natural language right away to uncover insights, with no manual modeling required.

Key Benefits

Time Efficiency: Skip manual schema design.
Zero Modeling Expertise Needed: LLMs handle data interpretation.
Instant Querying: Ask questions immediately after processing.
Broad Use Cases: Analyze sales data, survey results, logs, etc.

Workflow

Upload Files
- Supported Formats: .xlsx, .xls, .csv (max 100 MB per file).
- Requirements:
  - Clear headers (Row 1 must contain column names).
  - Consistent data types per column (e.g., avoid mixing text/numbers in one column).
- How to Upload:
  - Option 1: Chat Interface Upload. Click the upload icon in the chat input section, then select the files to upload.
  - Option 2: Semantic Studio Module. Navigate to Semantic Studio, click the "Smart Semantics" button, and upload files.
⚠️ Note: Sensitive data? Use pseudonymization before uploading.
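
One common way to pseudonymize a column before uploading is keyed hashing, sketched below with Python's standard hmac and hashlib modules. The key, the column names, and the choice to keep 16 hex characters are assumptions made for this example; pick a method that meets your own compliance requirements.

```python
import hashlib
import hmac

# Hypothetical key: keep the real one secret and never put it in the uploaded file.
SECRET_KEY = b"replace-with-your-own-secret"


def pseudonymize(value: str) -> str:
    """Replace a value with a stable keyed hash so rows can still be grouped."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


rows = [
    {"email": "ana@example.com", "amount": "120"},
    {"email": "ana@example.com", "amount": "80"},
    {"email": "bob@example.com", "amount": "45"},
]
masked = [{**row, "email": pseudonymize(row["email"])} for row in rows]
print(masked)
```

Because the same input always maps to the same token, counts and groupings per customer still work after masking.
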
Submit for Processing
- Click Submit after uploading.
- The system validates file structure (e.g., checks for empty cells, encoding issues).
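
You can catch many problems locally before submitting. The sketch below checks a CSV for missing headers, empty cells, and columns that mix text and numbers, following the requirements listed earlier. It is not the product's validator, whose exact rules are not documented here; the function names and sample data are hypothetical.

```python
import csv
import io


def is_number(value: str) -> bool:
    """Return True if the cell text parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def check_csv(text: str) -> list[str]:
    """Return a list of human-readable problems found in the CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    problems = []
    for index, name in enumerate(header):
        if not name.strip():
            problems.append(f"column {index + 1} has no header")
        values = [row[index] for row in data if index < len(row)]
        if any(not value.strip() for value in values):
            problems.append(f"column '{name}' has empty cells")
        kinds = {is_number(value) for value in values if value.strip()}
        if len(kinds) > 1:
            problems.append(f"column '{name}' mixes text and numbers")
    return problems


sample = "order_id,amount,region\n1,100,North\n2,n/a,South\n3,250,\n"
for problem in check_csv(sample):
    print(problem)
```
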
Wait for Results
- Progress Tracking: Hover over the "AI Model Creation Progress" button to check the status.
- What Happens:
  - LLMs analyze headers, sample rows, and relationships.
  - Output includes:
    - Tables & Data Models: Logical groupings (e.g., Orders, Customers).
    - Dimensions: Categorical fields (e.g., Product Category, Region).
    - Metrics: Aggregations (e.g., Total Sales = SUM(Revenue)).
⏱️ Processing Time: Scales with file size.
Use the Semantic Layer

Once ready:
- Explore Semantics:
  - Navigate to Semantics to review auto-generated tables/dimensions/metrics.
  - Edit names, data types, or relationships if needed (optional).
- Ask Questions:
  - Type a natural language query in the Chat, e.g., "Show total sales by region last quarter". The query is converted to SQL and visualizations.
  - For supported question formats, please refer to the chapter "For knowledge users".
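
To give a feel for what such a question might translate to, here is a sketch that runs an equivalent query against an in-memory SQLite database. The orders table, its columns, the sample rows, and the fixed date range standing in for "last quarter" are all assumptions; the SQL the product actually generates is not documented in this guide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, order_date TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("North", "2024-01-15", 100.0),
        ("North", "2024-02-20", 150.0),
        ("South", "2024-03-05", 200.0),
        ("South", "2024-04-10", 999.0),  # outside the quarter, filtered out below
    ],
)

# "Total sales by region last quarter", with the quarter fixed to Q1 2024.
query = """
    SELECT region, SUM(revenue) AS total_sales
    FROM orders
    WHERE order_date >= ? AND order_date < ?
    GROUP BY region
    ORDER BY region
"""
result = conn.execute(query, ("2024-01-01", "2024-04-01")).fetchall()
conn.close()
print(result)
```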

Example Use Case

File: customer_feedback.csv

Automatically Generated Semantics:
- Data Model: Feedback
- Dimensions: Product_ID, Response_Date, Sentiment (auto-detected as text categories).
- Metrics: Complaint_Count = COUNTIF(Response_Type = 'Complaint').
Query: "Trend of complaints by product for 2023" → Line chart + table output.
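
The Complaint_Count metric and the trend query above can be sketched as a conditional count grouped by product and month. The sample rows are invented, and the monthly granularity is an assumption, since the example does not say how the trend is bucketed.

```python
from collections import Counter

# Invented rows standing in for customer_feedback.csv.
feedback = [
    {"Product_ID": "P1", "Response_Date": "2023-01-12", "Response_Type": "Complaint"},
    {"Product_ID": "P1", "Response_Date": "2023-01-30", "Response_Type": "Praise"},
    {"Product_ID": "P1", "Response_Date": "2023-02-03", "Response_Type": "Complaint"},
    {"Product_ID": "P2", "Response_Date": "2023-01-20", "Response_Type": "Complaint"},
    {"Product_ID": "P2", "Response_Date": "2022-12-31", "Response_Type": "Complaint"},  # not in 2023
]

# COUNTIF(Response_Type = 'Complaint'), grouped by product and month, for 2023 only.
complaint_count = Counter(
    (row["Product_ID"], row["Response_Date"][:7])
    for row in feedback
    if row["Response_Type"] == "Complaint" and row["Response_Date"].startswith("2023")
)

for (product, month), count in sorted(complaint_count.items()):
    print(product, month, count)
```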

Troubleshooting

Issue	Solution
Processing failed	Re-upload with standardized headers and no formula errors.
Incorrect metrics	Edit the metric logic in *Data Models* > *Metrics*.
Slow queries	Use filters (e.g., date ranges) to narrow large datasets.
Unrecognized columns	Ensure headers are descriptive (avoid "Column A").

Configuration

Use these settings to align the system with your analytical workflow:

1. Enable Follow-Up Questions

Control whether the LLM suggests next-step queries after each response. Toggle states:

ON (Default): After answering a query, the system generates relevant follow-up questions.
OFF: Only the direct answer is provided.

2. Semantic Recall Sensitivity

Adjust how strictly semantic elements (metrics/dimensions/filters) are recalled during queries.

Match Sensitivity: Controls how strictly elements are matched in queries. Four levels are available: Blur, Medium, High, and Exact.
Max Returned Matches:
- Sets a cap on potential matches per element type.
- Example: If set to 3 for "sales", returns only the top 3 matches: Sales_Amount, Online_Sales, Store_Sales
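
How the two settings interact can be sketched with a simple string-similarity stand-in. The product's real matching algorithm is not documented here: the use of Python's difflib, the cutoff assigned to each sensitivity level, the recall function, and the candidate names are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical similarity cutoffs for the four sensitivity levels.
CUTOFFS = {"Blur": 0.3, "Medium": 0.5, "High": 0.7, "Exact": 1.0}


def recall(term: str, candidates: list[str],
           sensitivity: str = "Medium", max_matches: int = 3) -> list[str]:
    """Return up to max_matches candidates whose similarity meets the cutoff."""
    cutoff = CUTOFFS[sensitivity]
    scored = []
    for name in candidates:
        score = SequenceMatcher(None, term.lower(), name.lower()).ratio()
        if score >= cutoff:
            scored.append((score, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best matches first
    return [name for _, name in scored[:max_matches]]


candidates = ["Sales_Amount", "Online_Sales", "Store_Sales", "Customer_Name", "Sales_Tax"]
print(recall("sales", candidates, sensitivity="Medium", max_matches=3))
print(recall("sales", candidates, sensitivity="Exact"))
```

A stricter level shrinks the pool of candidates, and the cap then limits how many of the remaining matches are passed on.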