AIP-C01 Exam Dumps - Amazon Web Services AWS Certified Professional Questions and Answers

Question # 14

A financial services company uses multiple foundation models (FMs) through Amazon Bedrock for its generative AI (GenAI) applications. To comply with a new regulation for GenAI use with sensitive financial data, the company needs a token management solution.

The token management solution must proactively alert when applications approach model-specific token limits. The solution must also process more than 5,000 requests each minute and maintain token usage metrics to allocate costs across business units.

Which solution will meet these requirements?

Options:

Develop model-specific tokenizers in an AWS Lambda function. Configure the Lambda function to estimate token usage before sending requests to Amazon Bedrock. Configure the Lambda function to publish metrics to Amazon CloudWatch and trigger alarms when requests approach thresholds. Store detailed token usage in Amazon DynamoDB to report costs.

Implement Amazon Bedrock Guardrails with token quota policies. Capture metrics on rejected requests. Configure Amazon EventBridge rules to trigger notifications based on Amazon Bedrock Guardrails metrics. Use Amazon CloudWatch dashboards to visualize token usage trends across models.

Deploy an Amazon SQS dead-letter queue for failed requests. Configure an AWS Lambda function to analyze token-related failures. Use Amazon CloudWatch Logs Insights to generate reports on token usage patterns based on error logs from Amazon Bedrock API responses.

Use Amazon API Gateway to create a proxy for all Amazon Bedrock API calls. Configure request throttling based on custom usage plans with predefined token quotas. Configure API Gateway to reject requests that will exceed token limits.

Buy Now

Question # 15

A university recently digitized a collection of archival documents, academic journals, and manuscripts. The university stores the digital files in an AWS Lake Formation data lake.

The university hires a GenAI developer to build a solution to allow users to search the digital files by using text queries. The solution must return journal abstracts that are semantically similar to a user's query. Users must be able to search the digitized collection based on text and metadata that is associated with the journal abstracts. The metadata of the digitized files does not contain keywords. The solution must match similar abstracts to one another based on the similarity of their text. The data lake contains fewer than 1 million files.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

Use Amazon Titan Embeddings in Amazon Bedrock to create vector representations of the digitized files. Store embeddings in the OpenSearch Neural plugin for Amazon OpenSearch Service.

Use Amazon Comprehend to extract topics from the digitized files. Store the topics and file metadata in an Amazon Aurora PostgreSQL database. Query the abstract metadata against the data in the Aurora database.

Use Amazon SageMaker AI to deploy a sentence-transformer model. Use the model to create vector representations of the digitized files. Store embeddings in an Amazon Aurora PostgreSQL database that has the pgvector extension.

Use Amazon Titan Embeddings in Amazon Bedrock to create vector representations of the digitized files. Store embeddings in an Amazon Aurora PostgreSQL Serverless database that has the pgvector extension.

Buy Now

Answer:

Explanation:

Option D is the best choice because it delivers true semantic search with the smallest operational footprint by combining a fully managed embedding service with an automatically scaling vector-capable database. The university’s requirement is explicitly semantic: the metadata has no keywords, and the system must match abstracts based on similarity of meaning. This is a direct fit for an embeddings-based approach, where each abstract is converted into a vector representation and searched using vector similarity. Amazon Titan Embeddings in Amazon Bedrock provides a managed way to generate these vectors without hosting or maintaining an ML model, eliminating the operational work of model provisioning, patching, scaling, and lifecycle management.

For storage and retrieval, Amazon Aurora PostgreSQL Serverless with the pgvector extension supports vector storage and similarity search while minimizing infrastructure operations. Aurora Serverless reduces capacity planning and scaling tasks because it can automatically adjust to changes in workload, which is valuable for a university search application with variable usage patterns. With fewer than 1 million files, a PostgreSQL-based vector store is commonly operationally simpler than running a dedicated search cluster, while still meeting the requirement to query using both text-derived similarity and associated metadata filters stored alongside the vectors.

Option A can also enable vector search, but operating an OpenSearch domain typically introduces additional concerns such as domain sizing, shard strategy, cluster scaling, and performance tuning for k-NN workloads. Option C increases operational overhead the most because it requires deploying and operating a sentence-transformer model endpoint in SageMaker AI, including scaling, monitoring, and model management. Option B does not meet the semantic similarity requirement reliably because topic extraction is not equivalent to embedding-based semantic matching, especially when the metadata lacks keywords and the system must compare abstracts by meaning.

Therefore, D best satisfies semantic search needs with the least operational overhead.

Question # 16

A company is building a generative AI (GenAI) application that processes financial reports and provides summaries for analysts. The application must run two compute environments. In one environment, AWS Lambda functions must use the Python SDK to analyze reports on demand. In the second environment, Amazon EKS containers must use the JavaScript SDK to batch process multiple reports on a schedule. The application must maintain conversational context throughout multi-turn interactions, use the same foundation model (FM) across environments, and ensure consistent authentication.

Which solution will meet these requirements?

Options:

Use the Amazon Bedrock InvokeModel API with a separate authentication method for each environment. Store conversation states in Amazon DynamoDB. Use custom I/O formatting logic for each programming language.

Use the Amazon Bedrock Converse API directly in both environments with a common authentication mechanism that uses IAM roles. Store conversation states in Amazon ElastiCache. Create programming language-specific wrappers for model parameters.

Create a centralized Amazon API Gateway REST API endpoint that handles all model interactions by using the InvokeModel API. Store interaction history in application process memory in each Lambda function or EKS container. Use environment variables to configure model parameters.

Use the Amazon Bedrock Converse API and IAM roles for authentication. Pass previous messages in the request messages array to maintain conversational context. Use programming language-specific SDKs to establish consistent API interfaces.

Buy Now

Question # 17

A healthcare company uses Amazon Bedrock to deploy an application that generates summaries of clinical documents. The application experiences inconsistent response quality with occasional factual hallucinations. Monthly costs exceed the company’s projections by 40%. A GenAI developer must implement a near real-time monitoring solution to detect hallucinations, identify abnormal token consumption, and provide early warnings of cost anomalies. The solution must require minimal custom development work and maintenance overhead.

Which solution will meet these requirements?

Options:

Configure Amazon CloudWatch alarms to monitor InputTokenCount and OutputTokenCount metrics to detect anomalies. Store model invocation logs in an Amazon S3 bucket. Use AWS Glue and Amazon Athena to identify potential hallucinations.

Run Amazon Bedrock evaluation jobs that use LLM-based judgments to detect hallucinations. Configure Amazon CloudWatch to track token usage. Create an AWS Lambda function to process CloudWatch metrics. Configure the Lambda function to send usage pattern notifications.

Configure Amazon Bedrock to store model invocation logs in an Amazon S3 bucket. Enable text output logging. Configure Amazon Bedrock guardrails to run contextual grounding checks to detect hallucinations. Create Amazon CloudWatch anomaly detection alarms for token usage metrics.

Use AWS CloudTrail to log all Amazon Bedrock API calls. Create a custom dashboard in Amazon QuickSight to visualize token usage patterns. Use Amazon SageMaker Model Monitor to detect quality drift in generated summaries.

Buy Now

Question # 18

A company is building a generative AI (GenAI) application that uses Amazon Bedrock APIs to process complex customer inquiries. During peak usage periods, the application experiences intermittent API timeouts that cause issues such as broken response chunks and delayed data delivery. The application struggles to ensure that prompts remain within token limits when handling complex customer inquiries of varying lengths. Users have reported truncated inputs and incomplete responses. The company has also observed foundation model (FM) invocation failures.

The company needs a retry strategy that automatically handles transient service errors and prevents overwhelming Amazon Bedrock during peak usage periods. The strategy must also adapt to changing service availability and support response streaming and token-aware request handling.

Which solution will meet these requirements?

Options:

Implement a standard retry strategy that uses a 1-second fixed delay between attempts and a 3-retry maximum for all errors. Handle streaming response timeouts by restarting streams. Cap token usage for each session.

Implement an adaptive retry strategy that uses exponential backoff with jitter and a circuit breaker pattern that temporarily disables retries when error rates exceed a predefined threshold. Implement a streaming response handler that monitors for chunk delivery timeouts. Configure the handler to buffer successfully received chunks and intelligently resume streaming from the last received chunk when connections are re-established.

Use the AWS SDK to configure a retry strategy in standard mode. Wrap Amazon Bedrock API calls in try-catch blocks that handle timeout exceptions. Return cached completions for failed streaming requests. Enforce a global token limit for all users. Add jitter-based retry logic and lightweight token trimming for each request. Resume broken streams by requesting only missing chunks from the point of failure. Maintain a small in-memory buffer of

Set Amazon Bedrock client request timeouts to 30 seconds. Implement client-side load shedding. Buffer partial results and stop new requests when application performance degrades. Set static token usage caps for all requests. Configure exponential backoff retries, dynamic chunk sizing, and context-aware token limits.

Buy Now

Answer:

Explanation:

Option B best meets all requirements because it combines AWS-recommended resiliency patterns for transient failures with streaming-aware handling and adaptive protection against cascading retries during peak load. When timeouts and throttling occur, naïve retries can amplify traffic and worsen outages. Exponential backoff with jitter is the standard AWS best practice because it spreads retry attempts over time, reduces synchronized retry storms, and lowers the probability of repeatedly colliding with service limits.

The requirement also states the strategy must “adapt to changing service availability” and “prevent overwhelming Amazon Bedrock.” A circuit breaker pattern directly addresses this by temporarily stopping or reducing retries when failure rates exceed a threshold, allowing the system to degrade gracefully instead of continually hammering the service. This is a key mechanism to prevent cascading failures during throttling events.

Because the application uses response streaming and experiences broken chunks, the retry strategy must be streaming-aware. A streaming response handler that detects chunk delivery timeouts and buffers already received chunks prevents the user from losing progress when a connection drops. Resuming from the last successfully received chunk minimizes redundant generation and reduces additional load on the model compared with restarting the entire stream. This supports better user experience and better service efficiency during intermittent failures.

Token-aware request handling is supported in this architecture because the application can apply token budgeting before invoking the model (for example, trimming or summarizing excessive context) while still preserving streaming output behavior. Option B provides the correct backbone for this by focusing on adaptive control and robust streaming recovery.

Option A is too simplistic and risks retry storms. Option C combines conflicting elements (global token limit, cached completions for streaming) and includes impractical “request only missing chunks” behavior that is not a reliable property of streamed generative output. Option D includes useful ideas (load shedding) but relies on static caps and does not provide as strong adaptive retry control as circuit breaking.

Therefore, Option B is the most correct and operationally safe strategy for peak-load Bedrock streaming workloads.

Question # 19

A healthcare company is using Amazon Bedrock to develop a real-time patient care AI assistant to respond to queries for separate departments that handle clinical inquiries, insurance verification, appointment scheduling, and insurance claims. The company wants to use a multi-agent architecture.

The company must ensure that the AI assistant is scalable and can onboard new features for patients. The AI assistant must be able to handle thousands of parallel patient interactions. The company must ensure that patients receive appropriate domain-specific responses to queries.

Which solution will meet these requirements?

Options:

Isolate data for each agent by using separate knowledge bases. Use IAM filtering to control access to each knowledge base. Deploy a supervisor agent to perform natural language intent classification on patient inquiries. Configure the supervisor agent to route queries to specialized collaborator agents to respond to department-specific queries. Configure each specialized collaborator agent to use Retrieval Augmented Generation (RAG) with th

Create a separate supervisor agent for each department. Configure individual collaborator agents to perform natural language intent classification for each specialty domain within each department. Integrate each collaborator agent with department-specific knowledge bases only. Implement manual handoff processes between the supervisor agents.

Isolate data for each department in separate knowledge bases. Use IAM filtering to control access to each knowledge base. Deploy a single general-purpose agent. Configure multiple action groups within the general-purpose agent to perform specific department functions. Implement rule-based routing logic in the general-purpose agent instructions.

Implement multiple independent supervisor agents that run in parallel to respond to patient inquiries for each department. Configure multiple collaborator agents for each supervisor agent. Integrate all agents with the same knowledge base. Use external routing logic to merge responses from multiple supervisor agents.

Buy Now

Answer:

Explanation:

Option A best meets the requirements because it applies an AWS-aligned multi-agent pattern that cleanly separates responsibilities: a supervisor agent performs intent classification and orchestration, while specialized collaborator agents handle domain-specific tasks using the right knowledge sources. This structure is well suited for healthcare workflows where clinical questions, scheduling, and insurance processes require different policies, terminology, and data access boundaries.

The requirement for appropriate domain-specific responses is addressed by routing each user query to a department-focused collaborator agent that is grounded with its own department-specific knowledge base. Using Retrieval Augmented Generation with the correct knowledge base improves factual alignment and reduces cross-department leakage (for example, avoiding claims content in a clinical answer). It also supports better prompt grounding and more consistent tone and constraints per department.

The requirement to isolate data maps to using separate knowledge bases per agent and enforcing access through IAM controls, ensuring that each agent can retrieve only from the authorized datasets. This is important for minimizing unintended exposure of sensitive or irrelevant departmental data and supports governance and compliance needs.

For scalability and thousands of parallel interactions, this architecture minimizes contention and bottlenecks. Each collaborator agent can scale independently because requests are distributed across multiple agents and multiple retrieval backends. Operationally, onboarding new features is also simpler: the company can add a new collaborator agent (for example, “billing disputes” or “pharmacy refills”) with its own knowledge base and policies without redesigning the entire assistant.

Option B introduces unnecessary complexity with multiple supervisors and manual handoffs. Option C overloads a single agent with broad instructions and rule-based routing, which increases prompt complexity and reduces maintainability as features grow. Option D creates high operational complexity and risks inconsistent outputs when merging responses from parallel supervisors, and it weakens data isolation by using a shared knowledge base across agents.

Question # 20

A financial services company is creating a Retrieval Augmented Generation (RAG) application that uses Amazon Bedrock to generate summaries of market activities. The application relies on a vector database that stores a small proprietary dataset with a low index count. The application must perform similarity searches. The Amazon Bedrock model’s responses must maximize accuracy and maintain high performance.

The company needs to configure the vector database and integrate it with the application.

Which solution will meet these requirements?

Options:

Launch an Amazon MemoryDB cluster and configure the index by using the Flat algorithm. Configure a horizontal scaling policy based on performance metrics.

Launch an Amazon MemoryDB cluster and configure the index by using the Hierarchical Navigable Small World (HNSW) algorithm. Configure a vertical scaling policy based on performance metrics.

Launch an Amazon Aurora PostgreSQL cluster and configure the index by using the Inverted File with Flat Compression (IVFFlat) algorithm. Configure the instance class to scale to a larger size when the load increases.

Launch an Amazon DocumentDB cluster that has an IVFFlat index and a high probe value. Configure connections to the cluster as a replica set. Distribute reads to replica instances.

Buy Now

Question # 21

A retail company has a generative AI (GenAI) product recommendation application that uses Amazon Bedrock. The application suggests products to customers based on browsing history and demographics. The company needs to implement fairness evaluation across multiple demographic groups to detect and measure bias in recommendations between two prompt approaches. The company wants to collect and monitor fairness metrics in real time. The company must receive an alert if the fairness metrics show a discrepancy of more than 15% between demographic groups. The company must receive weekly reports that compare the performance of the two prompt approaches.

Which solution will meet these requirements with the LEAST custom development effort?

Options:

Configure an Amazon CloudWatch dashboard to display default metrics from Amazon Bedrock API calls. Create custom metrics based on model outputs. Set up Amazon EventBridge rules to invoke AWS Lambda functions that perform post-processing analysis on model responses and publish custom fairness metrics.

Create the two prompt variants in Amazon Bedrock Prompt Management. Use Amazon Bedrock Flows to deploy the prompt variants with defined traffic allocation. Configure Amazon Bedrock guardrails to monitor demographic fairness. Set up Amazon CloudWatch alarms on the GuardrailContentSource dimension by using InvocationsIntervened metrics to detect recommendation discrepancy threshold violations.

Set up Amazon SageMaker Clarify to analyze model outputs. Publish fairness metrics to Amazon CloudWatch. Create CloudWatch composite alarms that combine SageMaker Clarify bias metrics with Amazon Bedrock latency metrics.

Create an Amazon Bedrock model evaluation job to compare fairness between the two prompt variants. Enable model invocation logging in Amazon CloudWatch. Set up CloudWatch alarms for InvocationsIntervened metrics with a dimension for each demographic group.

Buy Now

Question # 22

A publishing company is developing a chat assistant that uses a containerized large language model (LLM) that runs on Amazon SageMaker AI. The architecture consists of an Amazon API Gateway REST API that routes user requests to an AWS Lambda function. The Lambda function invokes a SageMaker AI real-time endpoint that hosts the LLM.

Users report uneven response times. Analytics show that a high number of chats are abandoned after 2 seconds of waiting for the first token. The company wants a solution to ensure that p95 latency is under 800 ms for interactive requests to the chat assistant.

Which combination of solutions will meet this requirement? (Select TWO.)

Options:

Enable model preload upon container startup. Implement dynamic batching to process multiple user requests together in a single inference pass.

Select a larger GPU instance type for the SageMaker AI endpoint. Set the minimum number of instances to 0. Continue to perform per-request processing. Lazily load model weights on the first request.

Switch to a multi-model endpoint. Use lazy loading without request batching.

Set the minimum number of instances to greater than 0. Enable response streaming.

Switch to Amazon SageMaker Asynchronous Inference for all requests. Store requests in an Amazon S3 bucket. Set the minimum number of instances to 0.

Buy Now

Answer:

A, D

Explanation:

The correct answers are A and D because they directly reduce time-to-first-token and stabilize p95 latency for interactive, real-time chat workloads hosted on Amazon SageMaker AI real-time endpoints.

Option D addresses the biggest driver of uneven latency: cold starts and scale-to-zero behavior. By setting the minimum number of instances to greater than 0, the endpoint always has warm capacity and loaded runtime resources, eliminating the first-request penalty that causes users to wait multiple seconds. Enabling response streaming improves perceived latency by returning the first tokens as soon as they are generated rather than waiting for the complete response. This directly targets the abandonment problem described (users leaving after waiting for the first token).

Option A further improves p95 latency and throughput by removing model loading overhead during inference and improving GPU utilization. Preloading model weights during container startup ensures the model is ready before traffic arrives and avoids unpredictable on-demand weight loading. Dynamic batching increases efficiency by grouping compatible requests into a single inference pass, reducing per-request overhead and improving GPU saturation. When tuned properly for interactive workloads, batching can reduce tail latency while preserving responsiveness by enforcing small batch windows.

Option B makes latency worse because setting minimum instances to 0 and lazily loading weights guarantees cold-start delays and unpredictable first-token performance. Option C similarly increases cold-start behavior through lazy loading and offers no batching benefits. Option E is designed for non-interactive workloads and introduces queueing and storage latency, which conflicts with the 800 ms p95 requirement for interactive chat.

Therefore, A and D are the best combination to achieve consistently low p95 latency and fast first-token streaming for a SageMaker-hosted chat assistant.

Question # 23

A company is using Amazon Bedrock to design an application to help researchers apply for grants. The application is based on an Amazon Nova Pro foundation model (FM). The application contains four required inputs and must provide responses in a consistent text format. The company wants to receive a notification in Amazon Bedrock if a response contains bullying language. However, the company does not want to block all flagged responses.

The company creates an Amazon Bedrock flow that takes an input prompt and sends it to the Amazon Nova Pro FM. The Amazon Nova Pro FM provides a response.

Which additional steps must the company take to meet these requirements? (Select TWO.)

Options:

Use Amazon Bedrock Prompt Management to specify the required inputs as variables. Select an Amazon Nova Pro FM. Specify the output format for the response. Add the prompt to the prompts node of the flow.

Create an Amazon Bedrock guardrail that applies the hate content filter. Set the filter response to block. Add the guardrail to the prompts node of the flow.

Create an Amazon Bedrock prompt router. Specify an Amazon Nova Pro FM. Add the required inputs as variables to the input node of the flow. Add the prompt router to the prompts node. Add the output format to the output node.

Create an Amazon Bedrock guardrail that applies the insults content filter. Set the filter response to detect. Add the guardrail to the prompts node of the flow.

Create an Amazon Bedrock application inference profile that specifies an Amazon Nova Pro FM. Specify the output format for the response in the description. Include a tag for each of the input variables. Add the profile to the prompts node of the flow.

Buy Now

Exam Code: AIP-C01

Exam Name: AWS Certified Generative AI Developer - Professional

Last Update: Feb 22, 2026

Questions: 107

AIP-C01 PDF

$25.5 ~~$84.99~~

Add to Cart

AIP-C01 Testing Engine

$28.5 ~~$94.99~~

Add to Cart

AIP-C01 PDF + Testing Engine

$40.5 ~~$134.99~~

Add to Cart

Spring Sale 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: Board70

certsboard certification exams

Navigation:

AIP-C01 Exam Dumps - Amazon Web Services AWS Certified Professional Questions and Answers

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

AIP-C01 PDF

AIP-C01 Testing Engine

AIP-C01 PDF + Testing Engine

Quick Links

Recently New Released Certification Exams

Site Secure