This put up was co-written with Yash Munsadwala, Adam Hood, Justin Guse, and Hector Hernandez from PwC.
Contract evaluation usually consumes vital time for authorized, compliance, and procurement groups, particularly when necessary insights are buried in prolonged, unstructured agreements. As contract volumes develop, discovering particular clauses and assessing extracted phrases can grow to be more and more tough to scale.
In the present day, many groups rely totally on key phrase and pattern-based extraction or contract administration methods to research contracts. Whereas these strategies can work, they usually fall in need of offering constant insights at a scale. Because of this, many groups are exploring AI-based approaches that may mix massive language fashions (LLMs) with automated extraction workflows.
PwC’s AI-driven annotation (AIDA) answer, constructed on AWS, can extract structured insights from contracts via rule-based extraction and pure language queries. Utilizing LLMs, AIDA can interpret advanced authorized language and extracts insights primarily based on outlined guidelines. Customers can ask pure language questions on particular person contracts or throughout a number of paperwork inside a challenge and obtain context-specific solutions supported by linked citations. By decreasing the necessity to manually search and interpret contract language, these capabilities assist streamline evaluate workflows. In buyer implementations, AIDA has helped cut back handbook contract evaluate time by as much as 90%, serving to groups to retrieve key info extra rapidly and shorten evaluate cycles. On this put up, you will note how AIDA addresses these challenges. We stroll via the structure behind AIDA and display three core capabilities: template-based extraction, document-level chat, and world chat throughout paperwork.
Answer overview
AIDA is designed to transform unstructured paperwork into structured, searchable insights, streamlining the method to entry and reuse crucial contract info throughout methods. AIDA makes use of LLMs and a mix of AWS cloud-native and built-in providers to assist extract insights from contracts extra successfully. The answer supplies capabilities that may help organizational safety, compliance, and danger administration necessities, although clients stay liable for configuring and working the answer to fulfill their particular compliance obligations. As AIDA processes doubtlessly delicate contractual knowledge, applicable safeguards and human evaluate workflows ought to be utilized previous to enterprise or authorized reliance on AI-generated outputs. AIDA supplies a holistic suite of capabilities designed to handle current challenges. The next key options spotlight core performance, which we discover intimately within the subsequent sections:
- Personalized Knowledge Extraction: Extract scalable knowledge enabled by user-defined guidelines and customized templates. Use the customized extraction subject and logic per doc and extract insights from 1000’s of contracts in parallel with constant accuracy.
- Pure Language Q&A Throughout Paperwork: Ask pure language questions and obtain context-specific responses with linked citations to the supply paperwork.
- Integration with Mannequin Programs: Combine with mannequin methods (for instance, contract administration methods and doc repositories) that you should utilize to retrieve supply knowledge and ship extracted insights.
AIDA can help scalable contract evaluation throughout a variety of industries, together with Media & Leisure (M&E) and Actual Property—and competencies like Procurement, Authorized, and Compliance. As an example, within the M&E sector, AIDA helps content material producers and distributors unlock the general worth of their IP by extracting and analyzing rights info from license agreements. It summarizes rights comparable to broadcast, streaming, theatrical, and spinoff enabling quicker, knowledgeable selections on spin-offs, sequels, and world distribution. One main movie and TV studio lowered rights analysis time by 90%.
AIDA’s structure overview
The structure illustrates how AIDA’s parts work collectively to securely course of, analyze, and ship insights from advanced contracts utilizing the scalable, cloud-native providers of AWS. Every element is designed to assist course of contracts at scale whereas sustaining safety, traceability, and efficiency.
1. Edge safety and entry
AIDA’s edge layer allows authenticated entry and managed routing for consumer visitors. Requests go via AWS WAF for risk filtering, then via a Community Load Balancer to the reverse proxy server (NGINX), which manages SSL termination, routing, and coverage enforcement earlier than forwarding to Amazon Elastic Container Service (Amazon ECS). Knowledge in transit is encrypted utilizing TLS 1.2 or greater, together with consumer connections via HTTPS, and inside service-to-service communication between Amazon ECS, Amazon Relational Database Service (Amazon RDS), Amazon Easy Storage Service (Amazon S3), Amazon Bedrock, and different AWS providers.
Authentication is dealt with via Amazon Cognito, built-in with enterprise identification suppliers (for instance, Microsoft Entra ID, Okta) to safe entry at scale. AIDA applies fine-grained entry management via each application-level and project-level roles, so directors can handle consumer entry and permissions centrally. Challenge-level roles assist directors to manage consumer permissions and outline what actions every consumer can carry out inside a challenge, offering safe and ruled entry to knowledge and performance.
2. Knowledge storage
After authentication, AIDA shops uploaded paperwork, Optical Character Recognition (OCR) outputs, and related metadata in Amazon S3 offering a sturdy and cost-effective method to handle massive volumes of contract knowledge. Structured knowledge, configurations, and extracted insights persist in Amazon RDS, so customers can question and retrieve insights successfully for analytics and integration.
Amazon S3 buckets are encrypted at relaxation utilizing Amazon S3-managed encryption keys (SSE-S3), and Amazon RDS cases are encrypted at relaxation utilizing AWS KMS-managed keys. Moreover, S3 bucket setup follows Amazon S3 finest practices together with: Block Public Entry enabled on the bucket degree and enabling entry logging for safety evaluation and audit functions.
3. OCR and prediction processing
OCR and extraction workflows run asynchronously on Amazon ECS utilizing AWS Fargate, with duties coordinated via Amazon Easy Queue Service (Amazon SQS). With this method, customers can course of massive volumes of contracts in parallel with out blocking consumer interactions.
Extraction guidelines information how related content material is recognized and despatched to basis fashions (FMs) hosted on Amazon Bedrock, the place LLMs can interpret the contract textual content and extract structured values. Outcomes are written again to Amazon RDS, the place they’re accessible for evaluate, dashboards, and integrations.
4. Retrieval Augmented Era (RAG)
When analyzing contracts, it’s crucial that solutions are correct and traceable again to the unique supply textual content. RAG assist handle this by grounding mannequin responses within the underlying contract content material, fairly than relying solely on the mannequin’s data. AIDA makes use of RAG to assist confirm that responses are grounded within the underlying contract textual content. Paperwork saved in Amazon S3 are embedded utilizing Amazon Bedrock Embeddings Fashions, with vectors listed in Amazon OpenSearch Serverless for semantic search. Throughout inference, related knowledge is retrieved from Amazon Bedrock Information Bases and mixed with consumer enter, producing correct, context-aware, and explainable outcomes.
As well as, AIDA makes use of Amazon Bedrock Guardrails to use content material filtering, delicate info (PII) safety, and immediate security controls, additional confirming that responses stay safe and aligned with enterprise and authorized requirements.
5. Visualization
To indicate how contracts are being processed, AIDA integrates with Amazon Fast Sight to visualise metrics comparable to doc volumes, OCR accuracy, extraction throughput, and processing standing.
This dashboard can provide visibility into system efficiency and helps determine bottlenecks or alternatives to enhance effectivity over time.
6. System integrations throughout inside, vendor, and third-party methods
AIDA integrates with downstream methods utilizing AWS Lambda, Amazon EventBridge, and Amazon SQS. These integrations ship extracted insights to contract lifecycle administration instruments, knowledge methods, or different operational methods. A configurable human-in-the-loop evaluate queue can validate and approve extracted outputs earlier than they’re forwarded downstream.
By pushing structured contract knowledge into instruments in use, organizations can cut back handbook knowledge dealing with and reuse contract insights throughout compliance, reporting, and analytics workflows.
7. Ancillary and system providers
A variety of ancillary AWS providers help AIDA’s core system offering safety, observability, and automation. AWS Identification and Entry Administration (AWS IAM) and AWS Key Administration Service (AWS KMS) handle entry and encryption, with IAM insurance policies carried out following the precept of least privilege; Amazon CloudWatch and AWS X-Ray present monitoring; whereas AWS CodeBuild, AWS CodePipeline, and AWS CloudTrail allow steady deployment and auditability by enabling entry logging for knowledge operations.
Let’s discover how Amazon Bedrock particularly allows the clever options that drive these effectivity features.
How Amazon Bedrock allows AIDA’s clever options
Amazon Bedrock allows AIDA’s clever insights, extraction and conversational capabilities. By integrating superior FMs into AIDA’s processing pipeline, Amazon Bedrock allows context-aware knowledge extraction, semantic retrieval, and interactive chat functionalities. AIDA orchestrates doc processing, OCR, semantic retrieval, and LLM reasoning in a unified workflow retrieving related sections primarily based on queries or predefined guidelines and utilizing Amazon Bedrock to help RAG and supply responses with clear citations to the supply paperwork.
To showcase the important thing options, we uploaded pattern contracts to AIDA from the Contract Understanding Atticus Dataset (CUAD), an open authorized contract evaluate dataset created with dozens of authorized consultants from The Atticus Challenge. The CUAD dataset is publicly accessible below the Inventive Commons Attribution 4.0 (CC BY 4.0) license, allowing use and distribution for analysis and analysis functions.
1. Smarter, quicker insights extraction via reusable templates
Reusable templates can extract constant contract attributes at scale by serving to customers to outline extraction logic as soon as and apply it throughout a number of paperwork. Every template teams collectively labels that signify key contract parts comparable to termination discover intervals, renewal phrases, or rights clauses that authorized and compliance groups often evaluate.
When a template is utilized to a set of contracts, the identical extraction guidelines are used persistently throughout paperwork. This helps cut back handbook evaluate effort whereas enhancing accuracy and consistency, particularly when working with massive contract volumes. Behind the scenes, AIDA processes every contract utilizing a structured illustration that preserves web page and part context. Extraction guidelines information how related content material is recognized, and LLMs interpret that context to extract the right values. Outcomes are returned with citations that hyperlink again to the unique contract textual content, enabling you to confirm the place every perception got here from.
For instance, the Termination Discover Interval label extracts timelines instantly from the contract proven within the following screenshot, whereas the correct panel shows the extracted reply (highlighted in inexperienced) with clickable references to the precise supply textual content inside the contract.
2. Doc-level chat
You should utilize document-level chat to ask pure language questions on a single contract and obtain solutions grounded instantly in that doc. This functionality is especially helpful when fast clarification on particular phrases, dates, or obligations is required, stopping you from manually scanning prolonged and complicated agreements.
When questions are submitted, AIDA can determine probably the most related sections of the contract by evaluating queries in opposition to a semantic illustration of the doc’s content material. These sections are then offered as context to an LLM that’s hosted on Amazon Bedrock, which generates a response primarily based on the contract textual content.
3. International chat
International chat extends the document-level chat function to help questions throughout a number of contracts inside a challenge. This function is helpful when a broader view is required, comparable to figuring out frequent clauses, evaluating obligations, or summarizing phrases throughout a group of associated agreements.
International chat can be utilized in two methods. In a single state of affairs, questions are evaluated throughout the contracts in a challenge to supply a consolidated, project-wide view. In one other state of affairs, questions might be scoped to a particular set of contracts, so customers can deal with particular agreements whereas utilizing the identical conversational interface.
AIDA helps construct a semantic data base utilizing Amazon Bedrock from the underlying contracts by extracting and embedding doc content material for search. These embeddings are listed in Amazon OpenSearch Serverless, making a scalable semantic layer that may help queries throughout massive and numerous contract collections.
When submitting a query, AIDA can retrieve related passages utilizing a mix of implicit and specific filtering. Implicit filtering depends on semantic similarity between queries and the contract content material to floor contextually related sections. Express filtering applies metadata constraints comparable to contract sort, creation date, enterprise unit, or jurisdiction to slender outcomes to probably the most related subset. The chosen context is then offered to an LLM hosted on Amazon Bedrock, which generates a consolidated response with citations linking again to the unique supply paperwork.
Supporting capabilities constructed on AIDA’s system
The next part describes the supporting capabilities which are constructed on AIDA’s system: operational dashboard and exterior system integrations.
Operational dashboard
The operational dashboard supplies a consolidated view of contract evaluate efficiency on the challenge degree monitoring file volumes, OCR and perception extraction completion charges, errors, and extraction accuracy. It helps groups rapidly spot bottlenecks and monitor reviewer’s productiveness.
Exterior System Integrations
The structured extracted insights generated by AIDA might be rapidly pushed to downstream methods comparable to Contract Lifecycle Administration (CLM) instruments, ERP methods, CRMs, or knowledge warehouses. This integration helps enrich inside or exterior methods with high-quality, machine-readable contract knowledge, decreasing handbook knowledge re-entry and reconciliation throughout methods. By embedding these insights instantly into these methods, organizations can enhance compliance monitoring and help quicker, data-driven selections.
PwC’s AI-driven annotation (AIDA) answer, enabled by AWS, helps transfer organizations past handbook contract evaluate to a quicker, extra dependable, and scalable method. By bringing collectively OCR, user-defined extraction guidelines, and Retrieval Augmented Era via Amazon Bedrock, AIDA helps rapidly determine key phrases, obligations, and insights buried inside advanced contracts.
The answer helps streamline authorized and operational workflows, cut back evaluate time, and enhance consistency throughout massive volumes of paperwork. This answer was constructed on the cloud-native providers of AWS and designed to be safe like Amazon ECS, Amazon S3, Amazon RDS, and Amazon OpenSearch Serverless. AIDA can present the flexibleness and resilience wanted for enterprise deployment. Collectively, PwC and AWS can flip contract knowledge into actionable intelligence, enabling smarter selections and higher effectivity throughout their operations.
In regards to the authors
Ariana Lopez
Ariana Lopez is a Senior Companion Answer Architect at AWS. She has 15 years of business expertise spending the vast majority of her profession in cloud. She has expertise in cloud automation, technique, and answer architecting. In the present day, she is targeted on serving to Companions architect finest observe options.
Yash Munsadwala
Yash Munsadwala is a Lead Engineer in PwC’s Cloud, Engineering, Knowledge and AI (CEDA) observe. He makes a speciality of architecting and delivering knowledge and AI, net structure, and cloud modernization initiatives that assist enterprises speed up digital transformation. Yash leverages his experience in software program engineering, knowledge structure, and AWS-native providers to construct scalable, safe, and resilient options throughout industries.
Adam Hood
Adam Hood is a Principal and AWS Knowledge and AI Chief at PwC US. As a strategic and results-oriented know-how chief, Adam makes a speciality of driving enterprise-wide transformation and unlocking enterprise worth via the strategic software of digital methods, knowledge, and GenAI/AI/ML. He has guided organizations via advanced digital, finance, and ERP modernizations.
Justin Guse
Justin Guse is a Director in PwC’s Cloud Engineering observe targeted on serving to shoppers clear up enterprise challenges with AWS options. He brings over 11 years of expertise in cloud structure, with a deal with cloud migrations, greenfield deployments, and safety. Justin is an AWS Ambassador and an lively member of the AWS Certification Topic Matter Skilled program serving as a Lead SME.
Hector Hernandez
Hector Hernandez is a Senior Engineering Supervisor in PwC’s Cloud, Engineering, Knowledge and AI (CEDA) observe primarily based in Rochester, NY, with over 25 years of expertise in structure, design, growth, and supply of enterprise know-how options. He’s an skilled software program engineer, architect, and staff chief with deep experience in enterprise structure, software modernization, integration, and large-scale cloud transformations. Hector has led the design and implementation of enterprise-level Extract, Remodel, Load (ETL) methods, e-commerce platforms, and complicated knowledge migration initiatives, together with on-premises to cloud and cloud to cloud migrations.

