Text-to-SQL remains a persistent challenge in enterprise AI applications, particularly when working with custom SQL dialects or domain-specific database schemas. While foundation models (FMs) demonstrate strong performance on standard SQL, achieving production-grade accuracy for specialized dialects requires fine-tuning. However, fine-tuning introduces an operational trade-off: hosting custom models on persistent infrastructure incurs continuous costs, even during periods of zero usage.

On-demand inference in Amazon Bedrock with fine-tuned Amazon Nova Micro models offers an alternative. By combining the efficiency of LoRA (Low-Rank Adaptation) fine-tuning with serverless, pay-per-token inference, organizations can achieve custom text-to-SQL capabilities without the overhead cost of persistent model hosting. Despite the additional inference-time overhead of applying LoRA adapters, testing demonstrated latency suitable for interactive text-to-SQL applications, with costs scaling with usage rather than provisioned capacity.

In this post, we demonstrate two approaches to fine-tuning Amazon Nova Micro for custom SQL dialect generation that deliver both cost efficiency and production-ready performance. Our example workload maintained a cost of $0.80 per month with sample traffic of 22,000 queries per month, which resulted in cost savings compared to persistently hosted model infrastructure.
Prerequisites

To deploy these solutions, you'll need the following:

- An AWS account with billing enabled
- Standard IAM permissions and a role configured for the required service access
- Quota for an ml.g5.48xlarge instance for Amazon SageMaker AI training
Solution overview

The solution consists of the following high-level steps:

- Prepare your custom SQL training dataset with input/output pairs specific to your organization's SQL dialect and business requirements.
- Start the fine-tuning process on the Amazon Nova Micro model using your prepared dataset and one of two fine-tuning approaches:
  - Amazon Bedrock model customization for streamlined deployment
  - Amazon SageMaker AI for fine-grained training customization and control
- Deploy the custom model on Amazon Bedrock to use on-demand inference, removing infrastructure management while paying only for token usage.
- Validate model performance with test queries specific to your custom SQL dialect and business use cases.
To demonstrate this approach in practice, we provide two complete implementation paths that address different organizational needs. The first uses managed model customization in Amazon Bedrock for teams prioritizing simplicity and rapid deployment. The second uses Amazon SageMaker AI training jobs for organizations requiring more granular control over hyperparameters and training infrastructure. Both implementations share the same data preparation pipeline and deploy to Amazon Bedrock for on-demand inference. The following are links to each GitHub code sample:

- Amazon Bedrock managed model customization
- Amazon SageMaker AI training jobs

The following architecture diagram illustrates the end-to-end workflow, which encompasses data preparation, both fine-tuning approaches, and the Amazon Bedrock deployment path that enables serverless inference.
1. Dataset preparation

Our demonstration uses the sql-create-context dataset, a curated combination of the WikiSQL and Spider datasets containing over 78,000 examples of natural language questions paired with SQL queries across numerous database schemas. It provides a good foundation for text-to-SQL fine-tuning due to its variety in query complexity, from simple SELECT statements to complex multi-table joins with aggregations.

Data formatting and structure

The training data is structured as outlined in the documentation. This involves creating JSONL files that contain system prompt instructions paired with user queries and corresponding SQL responses of varying complexity. The formatted dataset is then split into training and validation sets, saved as JSONL files, and uploaded to Amazon Simple Storage Service (Amazon S3) for the fine-tuning process.

Sample converted record:
```json
{
  "schemaVersion": "bedrock-conversation-2024",
  "system": [
    {
      "text": "You are a powerful text-to-SQL model. Your job is to answer questions about a database. You can use the following table schema for context: CREATE TABLE head (age INTEGER)"
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "Return the SQL query that answers the following question: How many heads of the departments are older than 56 ?"
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "text": "SELECT COUNT(*) FROM head WHERE age > 56"
        }
      ]
    }
  ]
}
```
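The conversion from raw dataset rows to this format can be scripted. The following is a minimal sketch (our own helper, not taken from the sample repository) that maps sql-create-context rows, which use `question`, `context`, and `answer` fields, to the schema above and writes a JSONL file:

```python
import json

# System prompt template matching the sample record above.
SYSTEM_TEMPLATE = (
    "You are a powerful text-to-SQL model. Your job is to answer questions "
    "about a database. You can use the following table schema for context: {context}"
)

def to_bedrock_record(example: dict) -> dict:
    """Map one sql-create-context row to the bedrock-conversation-2024 schema.
    Adjust the field names if your dataset uses different keys."""
    return {
        "schemaVersion": "bedrock-conversation-2024",
        "system": [{"text": SYSTEM_TEMPLATE.format(context=example["context"])}],
        "messages": [
            {
                "role": "user",
                "content": [{
                    "text": "Return the SQL query that answers the following "
                            "question: " + example["question"]
                }],
            },
            {
                "role": "assistant",
                "content": [{"text": example["answer"]}],
            },
        ],
    }

def write_jsonl(examples, path):
    """Write converted records as one JSON object per line (JSONL)."""
    with open(path, "w") as f:
        for ex in examples:
            f.write(json.dumps(to_bedrock_record(ex)) + "\n")
```

The resulting file can then be split into training and validation sets before uploading to Amazon S3.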
Amazon Bedrock fine-tuning approach

Model customization in Amazon Bedrock provides a streamlined, fully managed approach to fine-tuning Amazon Nova models without the need to provision or manage training infrastructure. This method is ideal for teams seeking rapid iteration and minimal operational overhead while achieving custom model performance tailored to their text-to-SQL use case.

With Amazon Bedrock customization, training data is uploaded to Amazon S3, fine-tuning jobs are configured through the AWS Management Console or API, and AWS handles the underlying training infrastructure. The resulting custom model can be deployed using on-demand inference, maintaining the same token-based pricing as the base Nova Micro model with no additional markup, which makes it cost-effective for variable workloads. This approach is well suited when you need to quickly customize a model for custom SQL dialects without managing ML infrastructure, want minimal operational complexity, or need serverless inference with automatic scaling.
2a. Creating a fine-tuning job using Amazon Bedrock

Amazon Bedrock supports fine-tuning through both the AWS Management Console and the AWS SDK for Python (Boto3). The AWS documentation contains general guidance on how to submit a training job with each approach. In our implementation, we used the AWS SDK for Python (Boto3); refer to the sample notebook in our GitHub samples repository for the step-by-step implementation.

Configure hyperparameters

After selecting the model to fine-tune, we configure the hyperparameters for our use case. For Amazon Nova Micro fine-tuning on Amazon Bedrock, the following hyperparameters can be customized to optimize the text-to-SQL model:
| Parameter | Range/Constraints | Purpose | What we used |
| --- | --- | --- | --- |
| Epochs | 1–5 | Number of full passes through the training dataset | 5 epochs |
| Batch size | Fixed at 1 | Number of samples processed before updating model weights | 1 (fixed for Nova Micro) |
| Learning rate | 0.000001–0.0001 | Step size for gradient descent optimization | 0.00001 for stable convergence |
| Learning rate warmup steps | 0–100 | Number of steps to gradually increase the learning rate | 10 |
Note: These hyperparameters were optimized for our specific dataset and use case; optimal values may vary with dataset size and complexity. On the sample dataset, this configuration provided a good balance between model accuracy and training time, completing in roughly 2–3 hours.
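With those values chosen, the job can be submitted with Boto3. The sketch below builds the request and calls the `CreateModelCustomizationJob` API; the base-model identifier, role ARN, S3 URIs, and hyperparameter key names are illustrative assumptions — verify them against the current Amazon Bedrock model customization documentation for your Region.

```python
def build_customization_request(job_name, model_name, role_arn,
                                train_s3_uri, output_s3_uri):
    """Assemble the CreateModelCustomizationJob request body.
    Hyperparameter values mirror the table above and are passed as strings;
    the key names shown here should be checked against the Bedrock docs."""
    return {
        "jobName": job_name,
        "customModelName": model_name,
        "roleArn": role_arn,
        # Placeholder base-model identifier -- use the Nova Micro
        # customization identifier listed for your Region.
        "baseModelIdentifier": "amazon.nova-micro-v1:0:128k",
        "trainingDataConfig": {"s3Uri": train_s3_uri},
        "outputDataConfig": {"s3Uri": output_s3_uri},
        "hyperParameters": {
            "epochCount": "5",
            "batchSize": "1",
            "learningRate": "0.00001",
            "learningRateWarmupSteps": "10",
        },
    }

def start_finetune_job(**kwargs):
    """Submit the fine-tuning job to Amazon Bedrock."""
    import boto3  # imported lazily so the request builder stays dependency-free
    bedrock = boto3.client("bedrock")
    return bedrock.create_model_customization_job(
        customizationType="FINE_TUNING",
        **build_customization_request(**kwargs),
    )
```

You can then poll the job with `get_model_customization_job` until it completes.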
Analyzing training metrics

Amazon Bedrock automatically generates training and validation metrics, which are saved to your specified S3 output location. These metrics include:

- Training loss: Measures how well the model fits the training data
- Validation loss: Indicates generalization performance on unseen data

The training and validation loss curves indicate successful training: both decrease consistently, follow similar patterns, and converge to similar final values.
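These checks can be automated against the metrics files in S3. The helper below (our own sketch) parses a loss column from CSV text and applies a crude convergence test; the column names are illustrative, so check them against the files Bedrock actually writes to your output location.

```python
import csv
import io

def loss_curve(csv_text: str, loss_column: str):
    """Extract one loss column from a metrics CSV as a list of floats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[loss_column]) for row in reader]

def converged(losses, tolerance=0.05):
    """Crude convergence check: the final loss is within `tolerance`
    of the minimum observed, i.e. the curve has flattened out."""
    return losses[-1] <= min(losses) + tolerance

# Illustrative metrics content with assumed column names.
sample = "step,training_loss\n1,1.90\n2,1.10\n3,0.62\n4,0.55\n5,0.53\n"
print(converged(loss_curve(sample, "training_loss")))  # True
```

Comparing `converged(...)` for both the training and validation columns is a quick proxy for the visual inspection described above.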
3a. Deploy with on-demand inference

After your fine-tuning job completes successfully, you can deploy your custom Nova Micro model using on-demand inference. This deployment option provides automatic scaling and pay-per-token pricing, making it ideal for variable workloads without the need to provision dedicated compute resources.
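Programmatically, the deployment uses the create_custom_model_deployment API mentioned later in this post. The sketch below is an assumption-laden outline — the exact parameter names, response fields, and status values should be verified against the Bedrock API reference for your SDK version.

```python
import time

def deploy_custom_model(model_arn: str, deployment_name: str) -> str:
    """Create an on-demand deployment for a fine-tuned custom model and
    wait for it to leave the creating state. Field names here are
    illustrative -- check the Bedrock custom model deployment docs."""
    import boto3  # imported lazily so the sketch imports cleanly without boto3
    bedrock = boto3.client("bedrock")
    resp = bedrock.create_custom_model_deployment(
        modelDeploymentName=deployment_name,
        modelArn=model_arn,
    )
    arn = resp["customModelDeploymentArn"]
    while True:
        status = bedrock.get_custom_model_deployment(
            customModelDeploymentIdentifier=arn)["status"]
        if status != "Creating":
            return arn
        time.sleep(30)
```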
Invoking the custom Nova Micro model

After deployment, you can invoke your custom text-to-SQL model by using the deployment ARN as the model ID in the Amazon Bedrock Converse API.
```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Use the deployment ARN as the model ID
deployment_arn = "arn:aws:bedrock:us-east-1::deployment/"

# Prepare the inference request
response = bedrock_runtime.converse(
    modelId=deployment_arn,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "text": """Database schema:
CREATE TABLE sales (
    id INT,
    product_name VARCHAR(100),
    category VARCHAR(50),
    revenue DECIMAL(10,2),
    sale_date DATE
);
Question: What are the top 5 products by revenue in the Electronics category?"""
                }
            ]
        }
    ],
    inferenceConfig={
        "maxTokens": 512,
        "temperature": 0.1,  # Low temperature for deterministic SQL generation
        "topP": 0.9
    }
)

# Extract the generated SQL query
sql_query = response["output"]["message"]["content"][0]["text"]
print(f"Generated SQL:\n{sql_query}")
```
Amazon SageMaker AI fine-tuning approach

While the Amazon Bedrock approach streamlines model customization through a managed training experience, organizations seeking deeper optimization control may benefit from the SageMaker AI approach. SageMaker AI provides extensive control over training parameters that can significantly impact efficiency and model performance. You can adjust the batch size for speed and memory optimization, fine-tune dropout settings across layers to prevent overfitting, and configure learning rate schedules for training stability. For LoRA fine-tuning specifically, you can use SageMaker AI to customize scaling factors and regularization parameters, with different settings optimized for multimodal versus text-only datasets. Additionally, you can adjust the context window size and optimizer settings to match your specific use case requirements. See the following notebook for the complete code sample.

1b. Data preparation and upload

The data preparation and upload process for the SageMaker AI fine-tuning approach is identical to the Amazon Bedrock implementation. Both approaches convert the SQL dataset to the bedrock-conversation-2024 schema format, split the data into training and test sets, and upload the JSONL files directly to S3.
```python
# S3 prefix for training data
training_input_path = f"s3://{sess.default_bucket()}/datasets/nova-sql-context"

# Upload datasets to S3
train_s3_path = sess.upload_data(
    path="data/train_dataset.jsonl",
    bucket=bucket_name,
    key_prefix=training_input_path
)
test_s3_path = sess.upload_data(
    path="data/test_dataset.jsonl",
    bucket=bucket_name,
    key_prefix=training_input_path
)

print(f"Training data uploaded to: {train_s3_path}")
print(f"Test data uploaded to: {test_s3_path}")
```
2b. Creating a fine-tuning job using Amazon SageMaker AI

Select the model ID, recipe, and image URI:

```python
# Nova configuration
model_id = "nova-micro/prod"
recipe = "https://raw.githubusercontent.com/aws/sagemaker-hyperpod-recipes/refs/heads/main/recipes_collection/recipes/fine-tuning/nova/nova_1_0/nova_micro/SFT/nova_micro_1_0_g5_g6_48x_gpu_lora_sft.yaml"
instance_type = "ml.g5.48xlarge"
instance_count = 1

# Nova-specific image URI
image_uri = f"708977205387.dkr.ecr.{sess.boto_region_name}.amazonaws.com/nova-fine-tune-repo:SM-TJ-SFT-latest"

print(f"Model ID: {model_id}")
print(f"Recipe: {recipe}")
print(f"Instance type: {instance_type}")
print(f"Instance count: {instance_count}")
print(f"Image URI: {image_uri}")
```
Configuring custom training recipes

A key differentiator when using Amazon SageMaker AI for Nova model fine-tuning is the ability to customize the training recipe. Recipes are preconfigured training stacks provided by AWS that help you quickly start training and fine-tuning. While maintaining compatibility with the standard Amazon Bedrock hyperparameter set (epochs, batch size, learning rate, and warmup steps), the recipes extend the hyperparameter options with:

- Regularization parameters: hidden_dropout, attention_dropout, and ffn_dropout to prevent overfitting.
- Optimizer settings: Customizable beta coefficients and weight decay.
- Architecture controls: Adapter rank and scaling factors for LoRA training.
- Advanced scheduling: Custom learning rate schedules and warmup strategies.

The recommended approach is to start with the default settings to establish a baseline, then optimize based on your specific needs. The following are some of the additional parameters you can tune:
| Parameter | Range/Constraints | Purpose |
| --- | --- | --- |
| max_length | 1024–8192 | Controls the maximum context window size for input sequences |
| global_batch_size | 16, 32, 64 | Number of samples processed before updating model weights |
| hidden_dropout | 0.0–1.0 | Regularization for hidden layer states to prevent overfitting |
| attention_dropout | 0.0–1.0 | Regularization for attention mechanism weights |
| ffn_dropout | 0.0–1.0 | Regularization for feed-forward network layers |
| weight_decay | 0.0–1.0 | L2 regularization strength for model weights |
| adapter_dropout | 0.0–1.0 | Regularization for LoRA adapter parameters |
The complete recipe that we used can be found here.
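Overrides for these parameters are passed to the trainer as a nested dictionary mirroring the recipe YAML. The sketch below is illustrative only — the exact key nesting varies by recipe version, so verify each path against the recipe file before submitting a job.

```python
# Illustrative recipe_overrides for the Nova Micro LoRA SFT recipe.
# Key paths mirror the parameter table above; confirm the nesting
# against the recipe YAML, since it differs between recipe versions.
recipe_overrides = {
    "training_config": {
        "max_length": 4096,        # context window for input sequences
        "global_batch_size": 64,   # samples per weight update
        "model": {
            "hidden_dropout": 0.0,
            "attention_dropout": 0.0,
            "ffn_dropout": 0.0,
            "optim": {"weight_decay": 0.0},
            "peft": {
                "peft_scheme": "lora",
                "lora_tuning": {"adapter_dropout": 0.01},
            },
        },
    },
}
```

This dictionary is the `recipe_overrides` argument passed to `ModelTrainer.from_recipe` in the next step.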
Creating and executing a SageMaker AI training job

After configuring your model and recipe, initialize the ModelTrainer object and begin training:
```python
from sagemaker.train import ModelTrainer

trainer = ModelTrainer.from_recipe(
    training_recipe=recipe,
    recipe_overrides=recipe_overrides,
    compute=compute_config,
    stopping_condition=stopping_condition,
    output_data_config=output_config,
    role=role,
    base_job_name=job_name,
    sagemaker_session=sess,
    training_image=image_uri
)

# Configure data channels
from sagemaker.train.configs import InputData, S3DataSource

train_input = InputData(
    channel_name="train",
    data_source=S3DataSource(
        s3_uri=train_s3_path,
        s3_data_type="Converse",
        s3_data_distribution_type="FullyReplicated"
    )
)
val_input = InputData(
    channel_name="val",
    data_source=S3DataSource(
        s3_uri=test_s3_path,
        s3_data_type="Converse",
        s3_data_distribution_type="FullyReplicated"
    )
)

# Begin training
training_job = trainer.train(
    input_data_config=[train_input, val_input],
    wait=False
)
```
After training, we register the model with Amazon Bedrock through the create_custom_model_deployment Amazon Bedrock API, which enables on-demand inference through the Converse API using the deployed model ARN, system prompts, and user messages.

In our SageMaker AI training job, we used the default recipe parameters, including 2 epochs and a global batch size of 64. Our dataset contained 20,000 lines, so the full training job took 4 hours. With our ml.g5.48xlarge instance, the total cost of fine-tuning our Nova Micro model was $65.
4. Testing and evaluation

We evaluated the model with both operational and accuracy testing. For accuracy, we implemented an LLM-as-a-judge approach: we collected questions and SQL responses from our fine-tuned model and used a judge model to score them against the ground truth responses.
```python
import re

def get_score(system, user, assistant, generated):
    formatted_prompt = (
        "You are a data science teacher that is introducing students to SQL. "
        f"Consider the following question and schema:"
        f"{user}"
        f"{system}"
        "Here is the correct answer:"
        f"{assistant}"
        f"Here is the student's answer:"
        f"{generated}"
        "Please provide a numeric score from 0 to 100 on how well the student's "
        "answer matches the correct answer. Put the score in <score> XML tags."
    )
    _, result = ask_claude(formatted_prompt)
    pattern = r"<score>(.*?)</score>"
    match = re.search(pattern, result)
    return match.group(1) if match else "0"
```
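To turn per-query judge scores into a dataset-level result, a small aggregation helper (our own addition, not part of the sample notebook) can summarize the strings returned by get_score:

```python
def summarize_scores(raw_scores):
    """Aggregate judge scores ('0'..'100' strings) into summary statistics.
    Out-of-range values are clamped defensively."""
    scores = [max(0.0, min(100.0, float(s))) for s in raw_scores]
    return {
        "mean": sum(scores) / len(scores),
        # Fraction of answers the judge rated 90 or higher.
        "pass_rate": sum(s >= 90 for s in scores) / len(scores),
    }

print(summarize_scores(["100", "85", "100", "0"]))
# {'mean': 71.25, 'pass_rate': 0.5}
```

In a full run you would call `get_score` for every test example and feed the collected strings to `summarize_scores`.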
For operational testing, we gathered metrics including TTFT (time to first token) and OTPS (output tokens per second). Compared to the base Nova Micro model, we experienced a cold-start time to first token averaging 639 ms across 5 runs (a 34% increase). This latency increase stems from applying LoRA adapters at inference time rather than baking them into the model weights. However, this architectural choice delivers substantial cost benefits, because the fine-tuned Nova Micro model costs the same as the base model, enabling on-demand pricing with pay-per-use flexibility and no minimum commitments. During normal operation, time to first token averages 380 ms across 50 calls (a 7% increase), and end-to-end latency totals roughly 477 ms for full response generation. Token generation maintains a rate of roughly 183 tokens per second, only a 27% decrease from the base model while remaining highly suitable for interactive applications.
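Both figures can be derived from a timed Converse stream. The helper below is our own instrumentation sketch: it takes (elapsed-seconds, token-count) pairs recorded as each stream event arrives and computes TTFT and OTPS; the event values shown are illustrative, not our measured data.

```python
def stream_metrics(events):
    """Compute (TTFT in ms, output tokens/sec) from a timed event stream.
    `events` is a list of (seconds_since_request_start, n_tokens) pairs,
    one per streamed chunk, in arrival order."""
    first_ts, last_ts = events[0][0], events[-1][0]
    total_tokens = sum(n for _, n in events)
    ttft_ms = first_ts * 1000
    # Generation rate measured from first to last chunk.
    otps = total_tokens / (last_ts - first_ts) if last_ts > first_ts else float("inf")
    return ttft_ms, otps

# Illustrative stream: first chunk at 380 ms, last at 477 ms, 18 tokens total.
ttft, otps = stream_metrics([(0.380, 1), (0.430, 9), (0.477, 8)])
print(round(ttft), round(otps))
```

Wrapping a `bedrock_runtime.converse_stream` call with `time.monotonic()` timestamps per event yields the input for this helper.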
Cost summary

One-time costs:

- Amazon Bedrock model training: $0.001 per 1,000 tokens × number of epochs
  - For 2,000 examples at roughly 800 tokens each, trained for 5 epochs = $8.00
- SageMaker AI model training: We used the ml.g5.48xlarge instance, which costs $16.288/hour
  - Training lasted 4 hours with a 20,000-line dataset = $65.15

Ongoing costs:

- Storage: $1.95 per month per custom model
- On-demand inference: Same per-token pricing as the base Nova Micro model
  - Input tokens: $0.000035 per 1,000 tokens
  - Output tokens: $0.00014 per 1,000 tokens

Example calculation for a production workload:

For 22,000 queries per month (100 users × 10 queries/day × 22 business days), averaging 800 input tokens and 60 output tokens per query:

- Input cost: (22,000 × 800 / 1,000) × $0.000035 = $0.616
- Output cost: (22,000 × 60 / 1,000) × $0.00014 = $0.185
- Total monthly inference cost: $0.80
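As a sanity check, the arithmetic behind this estimate:

```python
# Monthly inference cost for the example workload above.
QUERIES = 100 * 10 * 22              # users x queries/day x business days = 22,000
IN_TOK, OUT_TOK = 800, 60            # average tokens per query
IN_PRICE, OUT_PRICE = 0.000035, 0.00014  # USD per 1,000 tokens (Nova Micro)

input_cost = QUERIES * IN_TOK / 1000 * IN_PRICE     # ~$0.616
output_cost = QUERIES * OUT_TOK / 1000 * OUT_PRICE  # ~$0.185
print(round(input_cost + output_cost, 2))           # 0.8
```

Changing the query volume or token averages in this snippet lets you re-estimate for your own traffic profile.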
This analysis validates that for custom dialect text-to-SQL use cases, fine-tuning a Nova model using PEFT LoRA on Amazon Bedrock is significantly more cost-effective than self-hosting custom models on persistent infrastructure. Self-hosted approaches may suit use cases requiring maximum control over infrastructure, security configurations, or integration requirements, but the Amazon Bedrock on-demand cost model offers significant cost savings for most production text-to-SQL workloads.
Conclusion

These implementation options demonstrate how Amazon Nova fine-tuning can be tailored to organizational needs and technical requirements. We explored two distinct approaches that serve different audiences and use cases. Whether you choose the managed simplicity of Amazon Bedrock or greater control through SageMaker AI training, the serverless deployment model and on-demand pricing mean that you only pay for what you use, while removing infrastructure management.

The Amazon Bedrock model customization approach provides a streamlined, managed solution that eliminates infrastructure complexity. Data scientists can focus on data preparation and model evaluation without managing training infrastructure, making it ideal for rapid experimentation and development.

The SageMaker AI training approach offers increased control over every aspect of the fine-tuning process. Machine learning (ML) engineers gain granular control over training parameters, infrastructure selection, and integration with existing MLOps workflows, which enables optimization for performance, cost, and operational requirements. For example, you can adjust batch sizes and instance types to optimize training speed, or modify learning rates and LoRA parameters to balance model quality with training time based on your specific operational needs.

Choose Amazon Bedrock model customization when: You need rapid iteration, have limited ML infrastructure expertise, or want to minimize operational overhead while still achieving custom model performance.

Choose SageMaker AI training when: You require fine-grained parameter control, have specific infrastructure or compliance requirements, need integration with existing MLOps pipelines, or want to optimize every aspect of the training process.
Get started

Ready to build your own cost-effective text-to-SQL solution? Access our complete implementations:

- Amazon Bedrock managed model customization
- Amazon SageMaker AI training jobs

Both approaches use the same cost-efficient deployment model, so you can choose based on your team's expertise and requirements rather than cost constraints.
About the authors

Zeek Granston

Zeek is an Associate AI/ML Solutions Architect focused on building effective artificial intelligence and machine learning solutions. He stays current with industry trends to deliver practical results for clients. Outside of work, Zeek enjoys building AI applications and playing basketball.

Felipe Lopez

Felipe Lopez is a Senior AI/ML Specialist Solutions Architect at AWS. Prior to joining AWS, Felipe worked with GE Digital and SLB, where he focused on modeling and optimization products for industrial applications.

