Introducing Dataset Q&A: Expanding natural language querying for structured datasets in Amazon Quick

Every BI team knows this bottleneck: a business user has a question that falls outside existing dashboards, so they file a ticket. An analyst writes the query, validates the results, and delivers them hours or days later. Multiply that by hundreds of ad hoc requests per month, and the backlog becomes the single largest constraint on data team productivity.

Amazon Quick now adds a powerful new natural language query capability, Dataset Q&A, to remove this bottleneck. Your question is translated into SQL, run against the full dataset, and the results are returned in seconds, with no row sampling, topic curation, or pre-configured calculated fields required.

Quick already offers two natural language querying modes. Dashboard Q&A is intended for questions about data visualized in published dashboards, drawing on the business context that authors have built into each view. Topic Q&A goes further: authors enrich the data model with business-friendly field names and synonyms, so users can query a curated set of fields in plain language. Dataset Q&A now completes the picture. Users can explore any dataset directly, going beyond what an author has pre-configured, while all the security, permissions, and governance that enterprises expect from Quick remain fully enforced.

While the industry has raced to ship text-to-SQL demos, the real challenge in enterprise BI has never been generating SQL. The challenge is grounding ambiguous business language against complex schemas, enforcing security at every step, and explaining what the system did and why. The agentic system in Quick is purpose-built for this. The model must resolve lexical ambiguity (does “quantity” mean row count, revenue, or units shipped?) and map colloquial business language to the exact column names and calculations in the dataset, without a predefined dictionary. Before any query runs, the system searches across all your structured assets (dashboards, datasets, and topics) using a semantic graph that understands how your assets relate to each other. This lets it find the right source even when your question doesn’t use the exact name of a dataset or column. After the source is identified, the system peeks into the data for context such as sample values and distributions, and uses author-provided field descriptions and business context to disambiguate before using one of the three capabilities available for generating SQL.

This launch also introduces Dataset Enrichment, a streamlined way for authors to ground the system in business context for a single dataset with no topic configuration required. If the business context already exists outside of Quick (in a data catalog, a modeling tool, or a team wiki), authors can add it directly as a file against the dataset. Field descriptions, intended relationships across fields, and custom instructions about specific columns or the dataset as a whole can all be provided in industry-standard formats (YAML, JSON) or as plain-text instructions. The system applies this context automatically to every query, so an author defines it once and every user benefits at scale.

Trust requires transparency. With this launch, we also introduce Chat Explainability. For any intermediate step involved in answering a natural language question, the system now gives users mechanisms to explore what happened under the hood. When structured data capabilities are invoked, users see step-by-step reasoning behind each answer: the generated SQL, the assumptions the agent made, the filters it applied, and a plain-language explanation for non-technical stakeholders. There is no black box.

In this post, you learn how to get started with Dataset Q&A, explore real-world use cases with hands-on examples, and discover advanced capabilities like auto-discovery across all your data assets and multi-dataset querying in a single conversation.

Solution overview

Dataset Q&A lets any user ask a question in plain natural language; the system generates SQL, executes it against the full dataset, and returns an answer in seconds. Results are aggregated by design, and every query automatically respects the row-level security (RLS) and column-level security (CLS) you have already configured, with no additional setup required.

Key benefits include:

Analyze millions of rows – Query the complete dataset without row sampling or data caps.
Query beyond dashboards – Ask about fields and dimensions that aren’t in any existing dashboard.
Start querying immediately – No setup overhead required. Begin exploring your data without creating topics or dashboards.
Explore multi-part questions – Combine filters, calculations, and aggregations in a single natural language query.
Inspect the generated SQL – Verify query logic, validate accuracy, or learn how the system interpreted your question.
Understand how questions are interpreted – Review step-by-step reasoning behind each answer, including the assumptions made and filters applied, before sharing results with stakeholders.

Walkthrough

In the following walkthrough, we demonstrate Dataset Q&A using a real-world dataset of bicycle rental trips from a city bike-sharing network. To follow along and replicate the steps in your own environment, make sure that you have the following in place:

An AWS account. For setup instructions, see Getting Started with AWS.
Amazon Quick Enterprise Edition enabled in your account with at least one Enterprise user and one Professional user. For details, see Amazon Quick Sight editions and pricing.
Familiarity with Amazon Quick Sight concepts such as datasets and the chat interface. See the Amazon Quick Sight documentation to get started.

For a sample dataset, this walkthrough uses the publicly available last four months of the 2025 Divvy bike trip dataset, which contains bike-sharing trip data from Chicago. Download the files and create a Quick Sight dataset. You can use the append option to combine multiple files. For more details, see the new data preparation experience in the Quick Sight documentation or this YouTube video.

Note: Because the underlying model might phrase or format responses differently across sessions, the exact wording and visual format of answers may differ from what is shown here. However, the data values and query results should be consistent when using the same question and dataset.

Step 1: Connect to your data

To use Dataset Q&A in the chat experience, complete the following steps:

In Amazon Quick, choose the Open chat icon in the top-right navigation.
My Assistant appears as the default system chat agent.
Access the data picker from the chat footer and choose Add within Specific data and apps.
In Add Quick assets, choose Datasets and select the Divvy_Bike_Trips dataset.
Choose Save.
With the Divvy_Bike_Trips dataset selected, enter questions in the chat interface.
To begin, try a dataset discovery question: Can you describe the structure of this dataset?

Quick chat responds with a detailed breakdown of the dataset structure, explaining what information is captured in each column and describing the available fields and their purpose.

Dataset Q&A capabilities can be invoked for both SPICE and direct query datasets, including Amazon Redshift, Amazon Athena, Amazon Aurora PostgreSQL, and Amazon Simple Storage Service (Amazon S3) Tables.

Step 2: Explore the dataset

After connecting to the Divvy_Bike_Trips dataset, you can explore the data through a series of natural language questions. The following examples show how Dataset Q&A handles increasing complexity while maintaining conversational context.

Example 1: Analyze trip patterns

Start with a general exploration of trip patterns across months:

“How many rides do we have for each month in 2025 from September until December?”

Your question is translated into a structured SQL query. Results appear in a table visual, along with a key observations section and suggested next steps. This query analyzed all 1,857,960 rides in the dataset. Dataset Q&A has no row limits for direct query datasets, so aggregations reflect the complete dataset. For SPICE datasets, the aggregations are subject to SPICE capacity.
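As a rough illustration of the kind of SQL a question like this could translate to, the following sketch runs a monthly ride count against a tiny in-memory SQLite table. The table name divvy_bike_trips, the sample rows, and the use of SQLite's strftime function are assumptions made for this example; the SQL that Quick generates depends on the data source and may look different.

```python
import sqlite3

# Tiny in-memory stand-in for the Divvy dataset (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE divvy_bike_trips (ride_id TEXT, started_at TEXT, ended_at TEXT)"
)
conn.executemany(
    "INSERT INTO divvy_bike_trips VALUES (?, ?, ?)",
    [
        ("r1", "2025-09-03 08:00:00", "2025-09-03 08:20:00"),
        ("r2", "2025-09-30 23:50:00", "2025-10-01 00:10:00"),
        ("r3", "2025-10-12 17:05:00", "2025-10-12 17:30:00"),
        ("r4", "2025-12-24 10:00:00", "2025-12-24 10:15:00"),
    ],
)

# Count rides per calendar month, grouping on the start timestamp and
# restricting the range to September through December 2025.
MONTHLY_RIDES_SQL = """
SELECT strftime('%Y-%m', started_at) AS ride_month,
       COUNT(*) AS ride_count
FROM divvy_bike_trips
WHERE started_at >= '2025-09-01 00:00:00'
  AND started_at <  '2026-01-01 00:00:00'
GROUP BY ride_month
ORDER BY ride_month
"""

monthly_rides = conn.execute(MONTHLY_RIDES_SQL).fetchall()
for ride_month, ride_count in monthly_rides:
    print(ride_month, ride_count)
```

Months with no rides (November in this sample) simply do not appear in the result, which is worth keeping in mind when reading an aggregated table.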

Example 2: Provide context to guide the model

The dataset contains two timestamp fields: started_at (when the trip began) and ended_at (when the trip concluded). When no context is provided, Quick Chat uses started_at as the logical default for grouping trips by month. To analyze by end time instead, add context to your question:

“How many rides do we have for each month in 2025 from September until December? Use the ended_at timestamp to determine the month.”

Quick Chat understands the context and uses ended_at for the month grouping in the response.
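To see why the choice of timestamp matters, consider a ride that starts late on September 30 and ends after midnight. The sketch below is a minimal illustration using SQLite; the helper rides_per_month, the table name, and the sample rows are hypothetical and are not part of Quick.

```python
import sqlite3

# Only these column names may be interpolated into the SQL text.
ALLOWED_TIMESTAMP_COLUMNS = {"started_at", "ended_at"}


def rides_per_month(conn, timestamp_column):
    """Count rides per month, grouping on the chosen timestamp column."""
    if timestamp_column not in ALLOWED_TIMESTAMP_COLUMNS:
        raise ValueError(f"unsupported column: {timestamp_column}")
    sql = f"""
        SELECT strftime('%Y-%m', {timestamp_column}) AS ride_month,
               COUNT(*) AS ride_count
        FROM divvy_bike_trips
        GROUP BY ride_month
        ORDER BY ride_month
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE divvy_bike_trips (ride_id TEXT, started_at TEXT, ended_at TEXT)"
)
conn.executemany(
    "INSERT INTO divvy_bike_trips VALUES (?, ?, ?)",
    [
        ("r1", "2025-09-03 08:00:00", "2025-09-03 08:20:00"),
        # Starts in September, ends in October.
        ("r2", "2025-09-30 23:50:00", "2025-10-01 00:10:00"),
        ("r3", "2025-10-12 17:05:00", "2025-10-12 17:30:00"),
    ],
)

by_start = rides_per_month(conn, "started_at")
by_end = rides_per_month(conn, "ended_at")
print("grouped by started_at:", by_start)
print("grouped by ended_at:  ", by_end)
```

The ride that crosses midnight is counted in September under started_at and in October under ended_at, so the monthly totals shift even though the total number of rides is unchanged.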

Example 3: Inspect the generated SQL

To inspect the SQL that Quick Sight generates, use the Explainability feature available in the chat response. This displays step-by-step reasoning behind each answer, including the generated SQL, so you can verify how the system interpreted your question.

“How many rides do we have for each month in 2025 from September until December?”

The SQL query appears in the response, showing ended_at carried over from the previous context, so you can verify that the interpretation is correct.

Example 4: Ask multiple questions at once

You can explore the data with multiple questions in a single prompt:

How many bike rides are there?

How many trips by bike type?

How many trips by members?

Individual SQL queries are run for each question, and a combined summary is returned.
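The idea of one query per question, followed by a combined summary, can be sketched as follows. This is only an illustration under stated assumptions: the table name, the sample rows, and the rideable_type column name are assumed for the example, and the way Quick splits and summarizes a prompt internally is not described in this post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE divvy_bike_trips "
    "(ride_id TEXT, rideable_type TEXT, member_casual TEXT)"
)
conn.executemany(
    "INSERT INTO divvy_bike_trips VALUES (?, ?, ?)",
    [
        ("r1", "classic_bike", "member"),
        ("r2", "electric_bike", "member"),
        ("r3", "electric_bike", "casual"),
        ("r4", "classic_bike", "member"),
    ],
)

# One SQL statement per question in the prompt.
queries = {
    "total_rides": "SELECT COUNT(*) FROM divvy_bike_trips",
    "rides_by_bike_type": (
        "SELECT rideable_type, COUNT(*) FROM divvy_bike_trips "
        "GROUP BY rideable_type ORDER BY rideable_type"
    ),
    "rides_by_member_type": (
        "SELECT member_casual, COUNT(*) FROM divvy_bike_trips "
        "GROUP BY member_casual ORDER BY member_casual"
    ),
}

# Run each query independently and collect the results into one summary.
summary = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in summary.items():
    print(name, rows)
```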

Example 5: Combine advanced calculations

The next query asks two questions at once, both requiring metrics computed at runtime rather than stored in the dataset.

“What percentage of total trips does each member type account for in September 2025, and what is the average trip duration in minutes? Use a dual axis visual with the axis starting at 0.”

In the preceding response, avg_duration_minutes and percentage_of_total_trips are runtime calculations that don’t exist in the underlying dataset. You can also instruct Quick on the visual type and axis configuration to use for representing the results.

The following SQL query is automatically generated by Quick in response to the natural language question above. It calculates the percentage of total trips and average trip duration for each rider type in September 2025, using window functions and date arithmetic:

SELECT
    "member_casual",
    COUNT(*) AS trip_count,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS percentage_of_total_trips,
    ROUND(AVG(date_diff('second', "started_at", "ended_at")) / 60.0, 2) AS avg_duration_minutes
FROM "Divvy_Bike_Trips"
WHERE "ended_at" >= '2025-09-01 00:00:00'
    AND "ended_at" < '2025-10-01 00:00:00'
GROUP BY "member_casual"
ORDER BY trip_count DESC

Key elements of this query:

Window function: SUM(COUNT(*)) OVER () calculates total trips across all rider types for the percentage calculation.
Percentage calculation: COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () computes each group’s share of total trips.
Duration calculation: AVG(date_diff('second', started_at, ended_at)) / 60.0 calculates the average trip duration in minutes from the difference in seconds.
Filtering: Limits data to September 2025 (from September 1 to before October 1), based on ended_at.
Grouping: Groups by member_casual to separate member and casual riders.
Ordering: Sorts by trip count in descending order.
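To experiment with the same pattern locally, the sketch below adapts the query to SQLite and runs it on a handful of made-up rows. Two things are assumptions of this adaptation rather than part of the generated query: SQLite has no date_diff function, so the duration is computed by subtracting epoch seconds obtained with strftime('%s', ...), and the table is a small in-memory stand-in. Window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE divvy_bike_trips "
    "(ride_id TEXT, member_casual TEXT, started_at TEXT, ended_at TEXT)"
)
conn.executemany(
    "INSERT INTO divvy_bike_trips VALUES (?, ?, ?, ?)",
    [
        ("r1", "member", "2025-09-01 08:00:00", "2025-09-01 08:10:00"),
        ("r2", "member", "2025-09-10 09:00:00", "2025-09-10 09:20:00"),
        ("r3", "member", "2025-09-20 18:00:00", "2025-09-20 18:30:00"),
        ("r4", "casual", "2025-09-15 12:00:00", "2025-09-15 12:40:00"),
        # Ends in October, so the September filter excludes it.
        ("r5", "casual", "2025-10-02 12:00:00", "2025-10-02 12:15:00"),
    ],
)

SHARE_AND_DURATION_SQL = """
SELECT
    member_casual,
    COUNT(*) AS trip_count,
    -- Window over the grouped rows: each group's count divided by the grand total.
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS percentage_of_total_trips,
    -- Duration in seconds via epoch arithmetic, then converted to minutes.
    ROUND(AVG(CAST(strftime('%s', ended_at) AS INTEGER)
            - CAST(strftime('%s', started_at) AS INTEGER)) / 60.0, 2)
        AS avg_duration_minutes
FROM divvy_bike_trips
WHERE ended_at >= '2025-09-01 00:00:00'
  AND ended_at <  '2025-10-01 00:00:00'
GROUP BY member_casual
ORDER BY trip_count DESC
"""

rows = conn.execute(SHARE_AND_DURATION_SQL).fetchall()
for member_type, trips, pct, avg_minutes in rows:
    print(member_type, trips, pct, avg_minutes)
```

Because the window function is evaluated after GROUP BY, SUM(COUNT(*)) OVER () sees one row per rider type and sums their counts, which is what makes the single-pass percentage possible.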

Working with multiple datasets and spaces

Dataset Q&A isn’t limited to a single dataset. Whether you manually select a dataset, add multiple datasets, or curate a Space with mixed asset types, the built-in enterprise knowledge graph identifies the right source of data based on its interpretation of your question.

Adding a single dataset

The previous walkthrough demonstrated how to connect a single dataset through the data picker and explore it with natural language questions. This is the most straightforward starting point for Dataset Q&A.

Adding multiple datasets

You can add multiple datasets to the data picker and ask questions that span your data landscape. When multiple datasets are selected, Quick Chat automatically routes each question to the most relevant dataset based on the question context and available schema.

Example scenario: A transportation analyst has access to both the Divvy bike trip dataset and a Chicago weather dataset. By selecting both datasets in the data picker, they can ask:

“What was the total number of bike trips in September 2025?” (routes to the Divvy dataset)

“What were the average temperatures in September 2025?” (routes to the weather dataset)

“Show me bike trip volumes and weather patterns for each month” (analyzes both datasets separately and presents combined insights)

Auto-discovery with All data and apps

You don’t even need to know which datasets are available. In Quick Chat, the data picker provides an option to select All data and apps. When it is selected, you can ask a question and the system discovers the relevant datasets automatically, runs queries across them, and generates a unified response.

Curating a Space for cross-asset analysis

For the most comprehensive experience, organize related assets together using Amazon Quick Spaces. A Space is a collection of files, datasets, dashboards, and knowledge bases.

Example scenario: A “Transportation Analytics” Space might contain the Quick Sight Divvy bike trips dataset, a Chicago weather dataset, city infrastructure reports in PDF format, an event calendar in Word format, and existing Quick Sight transportation dashboards.

After this Space is selected in the data picker, you can ask questions that draw from all assets within it:

“How did weather patterns affect bike ridership in September?” (combines the Divvy bike trip dataset with the Chicago weather dataset)

“What major events occurred during peak ridership weeks?” (references event calendar documents)

“Compare bike-sharing usage with public transit ridership trends” (analyzes multiple datasets)

Quick Chat automatically identifies which assets contain relevant information and synthesizes insights across structured data (datasets) and unstructured content (documents).

Use cases

The following examples represent four common patterns where Dataset Q&A delivers the most value.

Pattern 1: Progressive complexity without reconfiguration

What we demonstrated: Starting with monthly aggregations, the walkthrough showed progressively more complex questions, from defining custom metrics (average trip duration) to performing nested aggregations (percentage by member type), all without any setup or configuration changes.

Real-world scenario: A business analyst exploring sales data can start by asking “What were total sales last quarter?” and naturally move to “What percentage of revenue came from repeat customers in each region, and how did their average order value compare to new customers?” without waiting for a dashboard update.

Why this matters: Dataset Q&A supports iterative exploration where each question builds on the previous one, with context maintained throughout the conversation.

Benefit: Natural analytical workflow that matches how analysts think through problems.

Pattern 2: SQL transparency with explainability for technical validation

What we demonstrated: For every query in the walkthrough, the generated SQL was available on demand, from simple aggregations to nested aggregations with window functions. With this transparency, we can verify that natural language was correctly interpreted before sharing results.

Real-world scenario: A data engineer must confirm that “What is the average order value for repeat customers who made purchases in both Q3 and Q4 2025?” correctly identifies repeat customers (those with orders in both quarters, not just either quarter) before sharing the result with executives.

With Dataset Q&A, technical users can:

Understand how natural language questions are interpreted and executed through the Explainability feature.
Review the generated query logic.
Verify complex conditions such as AND vs. OR logic, date ranges, and aggregation levels.
Request adjustments if the interpretation doesn’t match the intent.
Validate the approach before sharing results with stakeholders.

Benefit: Confidence in results, ability to explain methodology, and technical credibility.
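The AND vs. OR distinction in the repeat-customer scenario is easy to check once the SQL is visible. The following sketch contrasts the two interpretations on a small, made-up orders table in SQLite; the table, column names, and values are hypothetical and serve only to show how the two readings produce different answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("A", "2025-08-05", 100.0),  # Q3
        ("A", "2025-11-02", 200.0),  # Q4
        ("B", "2025-07-19", 50.0),   # Q3 only
        ("C", "2025-12-01", 70.0),   # Q4 only
        ("D", "2025-09-30", 300.0),  # Q3
        ("D", "2025-10-01", 100.0),  # Q4
    ],
)

# Intended reading: customers with at least one order in Q3 AND one in Q4.
BOTH_QUARTERS_SQL = """
WITH both_quarters AS (
    SELECT customer_id
    FROM orders
    GROUP BY customer_id
    HAVING SUM(CASE WHEN order_date >= '2025-07-01'
                     AND order_date <  '2025-10-01' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN order_date >= '2025-10-01'
                     AND order_date <  '2026-01-01' THEN 1 ELSE 0 END) > 0
)
SELECT AVG(amount)
FROM orders
WHERE customer_id IN (SELECT customer_id FROM both_quarters)
  AND order_date >= '2025-07-01'
  AND order_date <  '2026-01-01'
"""

# Mistaken reading: any order placed in Q3 OR Q4, regardless of customer.
EITHER_QUARTER_SQL = """
SELECT AVG(amount)
FROM orders
WHERE order_date >= '2025-07-01'
  AND order_date <  '2026-01-01'
"""

avg_both = conn.execute(BOTH_QUARTERS_SQL).fetchone()[0]
avg_either = conn.execute(EITHER_QUARTER_SQL).fetchone()[0]
print("both quarters: ", avg_both)
print("either quarter:", round(avg_either, 2))
```

Only customers A and D qualify under the intended reading, so the two averages differ. A reviewer who sees a plain date-range filter without the per-customer HAVING condition can tell that the question was read as "either quarter" and ask for an adjustment.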

Pattern 3: Full dataset analysis

What we demonstrated: Every query accessed the complete underlying dataset. The monthly analysis processed all 1,857,960 rides. The September percentage calculations aggregated across 714,562 rides. No sampling or truncation occurred.

Real-world scenario: An operations manager analyzing customer support tickets needs resolution patterns across all tickets from the past year. A question like “What percentage of tickets were resolved within SLA by priority level and support tier?” requires complete data for accurate insights.

Dataset Q&A queries the complete underlying dataset with SQL, delivering accurate aggregations across millions of records without sampling or truncation.

Benefit: Complete, accurate results for data-driven decision-making.

Pattern 4: Multi-asset analysis

What this demonstrates: Dataset Q&A works when multiple datasets or a Space with mixed assets (datasets and documents) are in scope, enabling holistic analysis across organizational data.

Real-world scenario: A transportation planner must understand how bike-sharing usage correlates with public transit ridership and city events. They created a “Transportation Analytics” Space containing:

Divvy bike trip dataset (structured data)
CTA transit ridership dataset (structured data)
City events calendar (PDF document)
Weather data (CSV file)

With this Space selected, they can ask: “What was the impact of major events on bike and transit usage in October 2025?”

The conversational assistant:

Identifies relevant structured data from the bike trip and transit datasets
Extracts event information from the PDF calendar
Correlates weather patterns from the CSV file
Synthesizes insights across all sources

Why this matters: Organizations rarely make decisions based on a single dataset. Dataset Q&A with Spaces enables analysis across data silos without manual data integration or complex ETL processes.

Benefit: Holistic, context-aware insights that reflect the full complexity of business operations.

Key distinctions

Dataset Q&A opens up ad hoc exploration beyond pre-configured boundaries. It provides access to any field with custom runtime calculations in natural language, plus full SQL transparency for technical validation.
Dashboard Q&A works well when exploring insights within the boundaries of what dashboard authors have configured, including specific visuals, fields, filters, and curated business logic with calculations.
Topic Q&A shines when authors have created and maintained topic configurations with curated field definitions, synonyms, and custom instructions.

Supported data sources

At this time, the supported data sources for Dataset Q&A in direct query mode are Amazon Athena, Amazon Redshift, Amazon Aurora PostgreSQL, and Amazon S3 Tables.

Current limitations

Composite datasets aren’t supported when the parent datasets use SPICE and the child dataset is in direct query mode.
Custom SQL datasets with parameters are currently not supported.

Cleaning up

To avoid incurring ongoing charges, delete the Divvy_Bike_Trips dataset that you created as part of this walkthrough. For instructions, see Deleting a dataset in the Amazon Quick documentation.

Conclusion

Dataset Q&A for datasets in Quick Sight within Amazon Quick removes the barriers between business questions and data insights. It gives analysts the flexibility to go beyond pre-configured dashboard boundaries, gives technical users the SQL transparency to validate complex logic, and gives everyone access to complete datasets without row limits.

This capability complements the existing Dashboard Q&A and Topic Q&A features, giving you the right tool for each analytical scenario: curated insights when you need guardrails, and flexible exploration when your questions extend beyond pre-configured visualizations.

About the authors

Koushik Muthanna Koravanda Ganapathy

Koushik Muthanna Koravanda Ganapathy is a Specialist Solutions Architect for Amazon Quick at AWS. He helps customers design, implement, and scale Quick across their organization, from architecture to everyday use.

Emily Zhu

Emily Zhu is a Senior Product Manager at Amazon Quick, responsible for the full structured data stack, spanning governed and enterprise-scale data architecture, high-performance analytical and conversational query engines, and the semantic and ontology layer that gives data real meaning at scale. She’s passionate about how a strong data strategy unlocks AI strategy, and is on a mission to make the structured data stack the foundation for conversational and analytical experiences across Quick Suite.

Suren Raju

Suren Raju is a Senior Specialist Solutions Architect for GenAI at AWS, where he architects cutting-edge AI solutions with a focus on Amazon Quick. He brings deep expertise in structured data connectors, data prep, datasets, and data modeling, alongside his work with Amazon Quick’s multi-agentic workflows, orchestrations, and unstructured data integration through knowledge bases and action connectors. His innovative approach to AI-driven solutions helps organizations democratize data access and unlock transformative business value across the full spectrum of their data landscape.

Priya Mysore

Priya Mysore is a Senior Worldwide GenAI Specialist at AWS, with over 20 years of experience in data and analytics. She is passionate about helping customers unlock the true potential of their data using AI/ML and agentic capabilities in Amazon Quick. Priya excels at empowering business users to harness data through self-service analytics and intelligent automation. She guides organizations in implementing AI-driven solutions that democratize data access and automate complex workflows, enabling users to uncover actionable insights and drive business value. Her deep expertise in business intelligence and agentic AI drives innovative solutions that meet the evolving needs of AWS customers.
