Amazon Data-Engineer-Associate Sample Questions

Question # 51

A company stores data in an Amazon Redshift cluster. A data engineer needs to optimize the performance of SQL queries that run against the cluster's tables. The data engineer cannot increase the size of the cluster because of budget constraints. The company stores the data in multiple tables and loads the data by using the EVEN distribution style. Some tables are hundreds of gigabytes in size. Other tables are less than 10 MB in size. Which solution will meet these requirements? 

A. Keep using the EVEN distribution style for all tables. Specify primary and foreign keys for all tables.
B. Use the ALL distribution style for large tables. Specify primary and foreign keys for all tables.
C. Use the ALL distribution style for rarely updated small tables. Specify primary and foreign keys for all tables.
D. Specify a combination of distribution, sort, and partition keys for all tables.
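
For reference, a minimal sketch of how a Redshift distribution style is declared in table DDL, the mechanism that options A through C refer to. The cluster, database, user, and table names are hypothetical.

    # A minimal sketch (hypothetical names) of declaring DISTSTYLE ALL for a
    # small, rarely updated lookup table so a full copy lives on every node
    # and joins avoid data redistribution; large fact tables would keep EVEN
    # or KEY distribution.
    import boto3

    redshift_data = boto3.client("redshift-data")

    ddl = """
    CREATE TABLE dim_region (          -- hypothetical small lookup table (< 10 MB)
        region_id   INTEGER PRIMARY KEY,
        region_name VARCHAR(64)
    )
    DISTSTYLE ALL;                     -- replicate the table to every node
    """

    redshift_data.execute_statement(
        ClusterIdentifier="etl-cluster",   # hypothetical cluster name
        Database="analytics",              # hypothetical database
        DbUser="data_engineer",            # hypothetical database user
        Sql=ddl,
    )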


Question # 52

A data engineer is configuring Amazon SageMaker Studio to use AWS Glue interactive sessions to prepare data for machine learning (ML) models. The data engineer receives an access denied error when the data engineer tries to prepare the data by using SageMaker Studio. Which change should the engineer make to gain access to SageMaker Studio? 

A. Add the AWSGlueServiceRole managed policy to the data engineer's IAM user.
B. Add a policy to the data engineer's IAM user that includes the sts:AssumeRole action for the AWS Glue and SageMaker service principals in the trust policy.
C. Add the AmazonSageMakerFullAccess managed policy to the data engineer's IAM user.
D. Add a policy to the data engineer's IAM user that allows the sts:AddAssociation action for the AWS Glue and SageMaker service principals in the trust policy.
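
For reference, a minimal sketch of the IAM change that option B describes: granting the data engineer's principal sts:AssumeRole on a Glue interactive session execution role whose trust policy trusts the Glue and SageMaker service principals. The user name, role ARN, and account ID are hypothetical.

    # A minimal sketch (hypothetical names) of attaching an inline policy that
    # lets the data engineer's IAM user assume the Glue interactive session
    # execution role used by SageMaker Studio.
    import json
    import boto3

    iam = boto3.client("iam")

    iam.put_user_policy(
        UserName="data-engineer",                      # hypothetical IAM user
        PolicyName="AllowGlueInteractiveSessions",
        PolicyDocument=json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                # Hypothetical execution role trusted by glue.amazonaws.com
                # and sagemaker.amazonaws.com.
                "Resource": "arn:aws:iam::111122223333:role/GlueInteractiveSessionRole",
            }],
        }),
    )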


Question # 53

A company stores petabytes of data in thousands of Amazon S3 buckets in the S3 Standard storage class. The data supports analytics workloads that have unpredictable and variable data access patterns. The company does not access some data for months. However, the company must be able to retrieve all data within milliseconds. The company needs to optimize S3 storage costs. Which solution will meet these requirements with the LEAST operational overhead? 

A. Use S3 Storage Lens standard metrics to determine when to move objects to more cost-optimized storage classes. Create S3 Lifecycle policies for the S3 buckets to move objects to cost-optimized storage classes. Continue to refine the S3 Lifecycle policies in the future to optimize storage costs.
B. Use S3 Storage Lens activity metrics to identify S3 buckets that the company accesses infrequently. Configure S3 Lifecycle rules to move objects from S3 Standard to the S3 Standard-Infrequent Access (S3 Standard-IA) and S3 Glacier storage classes based on the age of the data.
C. Use S3 Intelligent-Tiering. Activate the Deep Archive Access tier.
D. Use S3 Intelligent-Tiering. Use the default access tier.
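
For reference, a minimal sketch of moving existing objects into S3 Intelligent-Tiering with a lifecycle rule, the storage class that options C and D refer to. The bucket name is hypothetical.

    # A minimal sketch (hypothetical bucket name) of a lifecycle rule that
    # transitions objects from S3 Standard into S3 Intelligent-Tiering, which
    # then tiers objects automatically based on access patterns while the
    # default tiers keep millisecond retrieval.
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="analytics-data-bucket",               # hypothetical bucket
        LifecycleConfiguration={
            "Rules": [{
                "ID": "move-to-intelligent-tiering",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},             # apply to all objects
                "Transitions": [{
                    "Days": 0,
                    "StorageClass": "INTELLIGENT_TIERING",
                }],
            }],
        },
    )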


Question # 54

A company uses AWS Step Functions to orchestrate a data pipeline. The pipeline consists of Amazon EMR jobs that ingest data from data sources and store the data in an Amazon S3 bucket. The pipeline also includes EMR jobs that load the data to Amazon Redshift. The company's cloud infrastructure team manually built a Step Functions state machine. The cloud infrastructure team launched an EMR cluster into a VPC to support the EMR jobs. However, the deployed Step Functions state machine is not able to run the EMR jobs. Which combination of steps should the company take to identify the reason the Step Functions state machine is not able to run the EMR jobs? (Choose two.) 

A. Use AWS CloudFormation to automate the Step Functions state machine deployment. Create a step to pause the state machine during the EMR jobs that fail. Configure the step to wait for a human user to send approval through an email message. Include details of the EMR task in the email message for further analysis.
B. Verify that the Step Functions state machine code has all IAM permissions that are necessary to create and run the EMR jobs. Verify that the Step Functions state machine code also includes IAM permissions to access the Amazon S3 buckets that the EMR jobs use. Use Access Analyzer for S3 to check the S3 access properties.
C. Check for entries in Amazon CloudWatch for the newly created EMR cluster. Change the AWS Step Functions state machine code to use Amazon EMR on EKS. Change the IAM access policies and the security group configuration for the Step Functions state machine code to reflect inclusion of Amazon Elastic Kubernetes Service (Amazon EKS).
D. Query the flow logs for the VPC. Determine whether the traffic that originates from the EMR cluster can successfully reach the data providers. Determine whether any security group that might be attached to the Amazon EMR cluster allows connections to the data source servers on the informed ports.
E. Check the retry scenarios that the company configured for the EMR jobs. Increase the number of seconds in the interval between each EMR task. Validate that each fallback state has the appropriate catch for each decision state. Configure an Amazon Simple Notification Service (Amazon SNS) topic to store the error messages.
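
For reference, a minimal sketch of the Amazon EMR service integration that a Step Functions state machine uses to submit a step; the state machine's execution role permissions and the cluster's VPC networking (the areas that options B and D examine) both act on this integration. The cluster ID and script path are hypothetical.

    # A minimal sketch (hypothetical cluster ID and script path) of an Amazon
    # States Language task state that submits an EMR step. The execution role
    # must allow the EMR actions this integration makes, and the EMR cluster's
    # security groups and VPC routing must allow traffic to the data sources.
    import json

    emr_step_state = {
        "Type": "Task",
        "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
        "Parameters": {
            "ClusterId": "j-EXAMPLE12345",                 # hypothetical EMR cluster ID
            "Step": {
                "Name": "ingest-to-s3",
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    # Hypothetical Spark job location.
                    "Args": ["spark-submit", "s3://example-bucket/jobs/ingest.py"],
                },
            },
        },
        "End": True,
    }

    print(json.dumps({"StartAt": "RunEmrStep",
                      "States": {"RunEmrStep": emr_step_state}}, indent=2))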


Question # 55

A company maintains an Amazon Redshift provisioned cluster that the company uses for extract, transform, and load (ETL) operations to support critical analysis tasks. A sales team within the company maintains a Redshift cluster that the sales team uses for business intelligence (BI) tasks. The sales team recently requested access to the data that is in the ETL Redshift cluster so the team can perform weekly summary analysis tasks. The sales team needs to join data from the ETL cluster with data that is in the sales team's BI cluster. The company needs a solution that will share the ETL cluster data with the sales team without interrupting the critical analysis tasks. The solution must minimize usage of the computing resources of the ETL cluster. Which solution will meet these requirements? 

A. Set up the sales team BI cluster as a consumer of the ETL cluster by using Redshift data sharing.
B. Create materialized views based on the sales team's requirements. Grant the sales team direct access to the ETL cluster.
C. Create database views based on the sales team's requirements. Grant the sales team direct access to the ETL cluster.
D. Unload a copy of the data from the ETL cluster to an Amazon S3 bucket every week. Create an Amazon Redshift Spectrum table based on the content of the ETL cluster.
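
For reference, a minimal sketch of Redshift data sharing from the producer side, the mechanism that option A refers to: the ETL (producer) cluster exposes tables to a consumer cluster without the consumer using the producer's compute. The share, cluster, database, user, and consumer namespace values are hypothetical.

    # A minimal sketch (hypothetical names) of creating a datashare on the ETL
    # cluster and granting it to the BI cluster's namespace.
    import boto3

    redshift_data = boto3.client("redshift-data")

    producer_sql = [
        "CREATE DATASHARE etl_share;",
        "ALTER DATASHARE etl_share ADD SCHEMA public;",
        "ALTER DATASHARE etl_share ADD ALL TABLES IN SCHEMA public;",
        # Consumer namespace GUID below is a hypothetical placeholder.
        "GRANT USAGE ON DATASHARE etl_share TO NAMESPACE 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee';",
    ]

    redshift_data.batch_execute_statement(
        ClusterIdentifier="etl-cluster",      # hypothetical producer cluster
        Database="etl",                       # hypothetical database
        DbUser="admin",                       # hypothetical database user
        Sqls=producer_sql,
    )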


Question # 56

A data engineer maintains custom Python scripts that perform a data formatting process that many AWS Lambda functions use. When the data engineer needs to modify the Python scripts, the data engineer must manually update all the Lambda functions. The data engineer requires a less manual way to update the Lambda functions. Which solution will meet this requirement? 

A. Store a pointer to the custom Python scripts in the execution context object in a shared Amazon S3 bucket.
B. Package the custom Python scripts into Lambda layers. Apply the Lambda layers to the Lambda functions.
C. Store a pointer to the custom Python scripts in environment variables in a shared Amazon S3 bucket.
D. Assign the same alias to each Lambda function. Call each Lambda function by specifying the function's alias.
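
For reference, a minimal sketch of packaging shared scripts as a Lambda layer and attaching it to a function, the mechanism that option B refers to. The layer name, function name, and archive path are hypothetical.

    # A minimal sketch (hypothetical names) of publishing a layer version that
    # contains the shared formatting scripts and pointing a function at it, so
    # one layer update replaces per-function manual edits.
    import boto3

    lambda_client = boto3.client("lambda")

    # Publish a new layer version from a local zip of the shared scripts.
    with open("formatting_scripts.zip", "rb") as f:          # hypothetical archive
        layer = lambda_client.publish_layer_version(
            LayerName="data-formatting",                      # hypothetical layer name
            Content={"ZipFile": f.read()},
            CompatibleRuntimes=["python3.12"],
        )

    # Attach the new layer version to a consuming function.
    lambda_client.update_function_configuration(
        FunctionName="ingest-orders",                         # hypothetical function name
        Layers=[layer["LayerVersionArn"]],
    )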


Question # 57

A company maintains multiple extract, transform, and load (ETL) workflows that ingest data from the company's operational databases into an Amazon S3-based data lake. The ETL workflows use AWS Glue and Amazon EMR to process data. The company wants to improve the existing architecture to provide automated orchestration and to require minimal manual effort. Which solution will meet these requirements with the LEAST operational overhead?  

A. AWS Glue workflows
B. AWS Step Functions tasks
C. AWS Lambda functions
D. Amazon Managed Workflows for Apache Airflow (Amazon MWAA) workflows
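
For reference, a minimal sketch of an AWS Glue workflow with a scheduled trigger, the orchestration mechanism that option A refers to. The workflow, trigger, and job names are hypothetical.

    # A minimal sketch (hypothetical names) of a Glue workflow and a scheduled
    # trigger that starts an existing Glue ETL job each day.
    import boto3

    glue = boto3.client("glue")

    glue.create_workflow(
        Name="daily-ingest-workflow",                 # hypothetical workflow name
        Description="Orchestrates crawl and ETL jobs for the S3 data lake",
    )

    glue.create_trigger(
        Name="start-daily-ingest",                    # hypothetical trigger name
        WorkflowName="daily-ingest-workflow",
        Type="SCHEDULED",
        Schedule="cron(0 3 * * ? *)",                 # run daily at 03:00 UTC
        Actions=[{"JobName": "ingest-operational-db"}],   # hypothetical Glue job
        StartOnCreation=True,
    )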


Question # 58

A company has multiple applications that use datasets that are stored in an Amazon S3 bucket. The company has an ecommerce application that generates a dataset that contains personally identifiable information (PII). The company has an internal analytics application that does not require access to the PII. To comply with regulations, the company must not share PII unnecessarily. A data engineer needs to implement a solution that will redact PII dynamically, based on the needs of each application that accesses the dataset. Which solution will meet the requirements with the LEAST operational overhead?  

A. Create an S3 bucket policy to limit the access each application has. Create multiple copies of the dataset. Give each dataset copy the appropriate level of redaction for the needs of the application that accesses the copy.
B. Create an S3 Object Lambda endpoint. Use the S3 Object Lambda endpoint to read data from the S3 bucket. Implement redaction logic within an S3 Object Lambda function to dynamically redact PII based on the needs of each application that accesses the data.
C. Use AWS Glue to transform the data for each application. Create multiple copies of the dataset. Give each dataset copy the appropriate level of redaction for the needs of the application that accesses the copy.
D. Create an API Gateway endpoint that has custom authorizers. Use the API Gateway endpoint to read data from the S3 bucket. Initiate a REST API call to dynamically redact PII based on the needs of each application that accesses the data.
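
For reference, a minimal sketch of an S3 Object Lambda handler that redacts data as it is read, the mechanism that option B refers to. The redaction pattern is a hypothetical placeholder for real PII-detection logic.

    # A minimal sketch of an S3 Object Lambda function: fetch the original
    # object through the presigned URL supplied in the event, redact it, and
    # return the transformed bytes to the caller.
    import re
    import urllib.request

    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        ctx = event["getObjectContext"]
        original = urllib.request.urlopen(ctx["inputS3Url"]).read().decode("utf-8")

        # Hypothetical redaction: mask anything that looks like an email address.
        redacted = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[REDACTED]", original)

        s3.write_get_object_response(
            RequestRoute=ctx["outputRoute"],
            RequestToken=ctx["outputToken"],
            Body=redacted.encode("utf-8"),
        )
        return {"statusCode": 200}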


Question # 59

A company stores daily records of the financial performance of investment portfolios in .csv format in an Amazon S3 bucket. A data engineer uses AWS Glue crawlers to crawl the S3 data. The data engineer must make the S3 data accessible daily in the AWS Glue Data Catalog. Which solution will meet these requirements?  

A. Create an IAM role that includes the AmazonS3FullAccess policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler's data store. Create a daily schedule to run the crawler. Configure the output destination to a new path in the existing S3 bucket.
B. Create an IAM role that includes the AWSGlueServiceRole policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler's data store. Create a daily schedule to run the crawler. Specify a database name for the output.
C. Create an IAM role that includes the AmazonS3FullAccess policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler's data store. Allocate data processing units (DPUs) to run the crawler every day. Specify a database name for the output.
D. Create an IAM role that includes the AWSGlueServiceRole policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler's data store. Allocate data processing units (DPUs) to run the crawler every day. Configure the output destination to a new path in the existing S3 bucket.
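
For reference, a minimal sketch of a scheduled Glue crawler configured along the lines option B describes. The crawler name, role ARN, database name, and S3 path are hypothetical.

    # A minimal sketch (hypothetical names) of a crawler that uses a Glue
    # service role, reads the S3 source path, runs on a daily schedule, and
    # writes its tables to a Data Catalog database.
    import boto3

    glue = boto3.client("glue")

    glue.create_crawler(
        Name="daily-portfolio-crawler",                        # hypothetical crawler name
        Role="arn:aws:iam::111122223333:role/GlueCrawlerRole", # hypothetical role with AWSGlueServiceRole attached
        DatabaseName="portfolio_performance",                  # Data Catalog database for the output
        Targets={"S3Targets": [{"Path": "s3://finance-data/portfolios/"}]},  # hypothetical path
        Schedule="cron(0 1 * * ? *)",                          # crawl daily at 01:00 UTC
    )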


Question # 60

A company uses Amazon Athena for one-time queries against data that is in Amazon S3. The company has several use cases. The company must implement permission controls to separate query processes and access to query history among users, teams, and applications that are in the same AWS account. Which solution will meet these requirements?  

A. Create an S3 bucket for each use case. Create an S3 bucket policy that grants permissions to appropriate individual IAM users. Apply the S3 bucket policy to the S3 bucket.
B. Create an Athena workgroup for each use case. Apply tags to the workgroup. Create an IAM policy that uses the tags to apply appropriate permissions to the workgroup.
C. Create an IAM role for each use case. Assign appropriate permissions to the role for each use case. Associate the role with Athena.
D. Create an AWS Glue Data Catalog resource policy that grants permissions to appropriate individual IAM users for each use case. Apply the resource policy to the specific tables that Athena uses.
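
For reference, a minimal sketch of a tagged Athena workgroup plus a tag-based IAM policy condition, the mechanism that option B refers to. The workgroup name, results bucket, and tag values are hypothetical.

    # A minimal sketch (hypothetical names) of creating a per-use-case Athena
    # workgroup with a tag, then a policy fragment that restricts principals
    # to workgroups carrying the matching tag.
    import json
    import boto3

    athena = boto3.client("athena")

    athena.create_work_group(
        Name="sales-analytics",                                       # hypothetical workgroup
        Configuration={
            "ResultConfiguration": {
                "OutputLocation": "s3://example-athena-results/sales/",  # hypothetical results bucket
            },
        },
        Tags=[{"Key": "team", "Value": "sales"}],
    )

    # IAM policy condition for the sales team's principals: allow query
    # actions only on resources tagged team=sales.
    policy_fragment = {
        "Effect": "Allow",
        "Action": ["athena:StartQueryExecution", "athena:GetQueryResults"],
        "Resource": "*",
        "Condition": {"StringEquals": {"aws:ResourceTag/team": "sales"}},
    }
    print(json.dumps(policy_fragment, indent=2))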

