Amazon Data-Engineer-Associate Sample Questions

Question # 21

A company has a production AWS account that runs company workloads. The company's security team created a security AWS account to store and analyze security logs from the production AWS account. The security logs in the production AWS account are stored in Amazon CloudWatch Logs. The company needs to use Amazon Kinesis Data Streams to deliver the security logs to the security AWS account. Which solution will meet these requirements? 

A. Create a destination data stream in the production AWS account. In the security AWS account, create an IAM role that has cross-account permissions to Kinesis Data Streams in the production AWS account.
B. Create a destination data stream in the security AWS account. Create an IAM role and a trust policy to grant CloudWatch Logs the permission to put data into the stream. Create a subscription filter in the security AWS account.
C. Create a destination data stream in the production AWS account. In the production AWS account, create an IAM role that has cross-account permissions to Kinesis Data Streams in the security AWS account.
D. Create a destination data stream in the security AWS account. Create an IAM role and a trust policy to grant CloudWatch Logs the permission to put data into the stream. Create a subscription filter in the production AWS account.
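The cross-account pattern these options describe can be sketched with boto3. This is a minimal, hedged sketch rather than a statement of the correct answer: all account IDs, ARNs, log group names, and resource names below are hypothetical placeholders, and it assumes the Kinesis data stream and the IAM role that CloudWatch Logs assumes already exist in the receiving account.

```python
import boto3
import json

# --- Run in the account that will RECEIVE the logs (hypothetical IDs/ARNs) ---
logs_receiver = boto3.client("logs", region_name="us-east-1")

# A CloudWatch Logs destination wraps the Kinesis stream and the IAM role
# that CloudWatch Logs assumes in order to put records into the stream.
logs_receiver.put_destination(
    destinationName="SecurityLogsDestination",
    targetArn="arn:aws:kinesis:us-east-1:222222222222:stream/security-logs",
    roleArn="arn:aws:iam::222222222222:role/CWLtoKinesisRole",
)

# The destination policy authorizes the sending account to subscribe to it.
access_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "111111111111"},
        "Action": "logs:PutSubscriptionFilter",
        "Resource": "arn:aws:logs:us-east-1:222222222222:destination:SecurityLogsDestination",
    }],
}
logs_receiver.put_destination_policy(
    destinationName="SecurityLogsDestination",
    accessPolicy=json.dumps(access_policy),
)

# --- Run in the account that SENDS the logs ---
logs_sender = boto3.client("logs", region_name="us-east-1")
logs_sender.put_subscription_filter(
    logGroupName="/workloads/security",   # hypothetical log group
    filterName="ForwardToSecurityAccount",
    filterPattern="",                     # empty pattern forwards everything
    destinationArn="arn:aws:logs:us-east-1:222222222222:destination:SecurityLogsDestination",
)
```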


Question # 22

A company is migrating on-premises workloads to AWS. The company wants to reduce overall operational overhead. The company also wants to explore serverless options. The company's current workloads use Apache Pig, Apache Oozie, Apache Spark, Apache HBase, and Apache Flink. The on-premises workloads process petabytes of data in seconds. The company must maintain similar or better performance after the migration to AWS. Which extract, transform, and load (ETL) service will meet these requirements? 

A. AWS Glue
B. Amazon EMR
C. AWS Lambda
D. Amazon Redshift
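For orientation, the Hadoop-ecosystem tools named in this question (Pig, Oozie, Spark, HBase, Flink) can all be requested as applications when an Amazon EMR cluster is launched. The sketch below is illustrative only; the cluster name, instance types, and roles are placeholders, and it does not assert which option answers the question.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Hypothetical cluster showing how the question's Hadoop-ecosystem tools
# map onto EMR applications; instance types and roles are placeholders.
response = emr.run_job_flow(
    Name="migrated-hadoop-workloads",
    ReleaseLabel="emr-6.15.0",
    Applications=[
        {"Name": "Spark"},
        {"Name": "Pig"},
        {"Name": "Oozie"},
        {"Name": "HBase"},
        {"Name": "Flink"},
    ],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])
```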


Question # 23

A data engineering team is using an Amazon Redshift data warehouse for operational reporting. The team wants to prevent performance issues that might result from long-running queries. A data engineer must choose a system table in Amazon Redshift to record anomalies when the query optimizer identifies conditions that might indicate performance issues. Which system table should the data engineer use to meet this requirement? 

A. STL_USAGE_CONTROL
B. STL_ALERT_EVENT_LOG
C. STL_QUERY_METRICS
D. STL_PLAN_INFO
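To make the option names concrete: Redshift system views such as STL_ALERT_EVENT_LOG can be queried like ordinary tables. The sketch below uses the Redshift Data API from Python; the cluster identifier, database, and secret ARN are hypothetical, and the query simply shows what inspecting one of these views looks like rather than confirming which option is correct.

```python
import boto3

redshift_data = boto3.client("redshift-data", region_name="us-east-1")

# Hypothetical cluster/database/secret; the SQL just inspects one system view.
response = redshift_data.execute_statement(
    ClusterIdentifier="reporting-cluster",
    Database="analytics",
    SecretArn="arn:aws:secretsmanager:us-east-1:111111111111:secret:redshift-creds",
    Sql="""
        SELECT query, event, solution, event_time
        FROM stl_alert_event_log
        ORDER BY event_time DESC
        LIMIT 20;
    """,
)
print(response["Id"])  # statement ID; results are fetched with get_statement_result
```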


Question # 24

A media company wants to improve a system that recommends media content to customers based on user behavior and preferences. To improve the recommendation system, the company needs to incorporate insights from third-party datasets into the company's existing analytics platform. The company wants to minimize the effort and time required to incorporate third-party datasets. Which solution will meet these requirements with the LEAST operational overhead? 

A. Use API calls to access and integrate third-party datasets from AWS Data Exchange.
B. Use API calls to access and integrate third-party datasets from AWS
C. Use Amazon Kinesis Data Streams to access and integrate third-party datasets from AWS CodeCommit repositories.
D. Use Amazon Kinesis Data Streams to access and integrate third-party datasets from Amazon Elastic Container Registry (Amazon ECR).
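As background on one of the services named here, AWS Data Exchange exposes subscribed (entitled) datasets through a standard API. The hedged boto3 sketch below lists entitled data sets; it assumes an active subscription exists and is not an endorsement of any particular option.

```python
import boto3

dx = boto3.client("dataexchange", region_name="us-east-1")

# List the data sets this account is entitled to through existing subscriptions.
paginator = dx.get_paginator("list_data_sets")
for page in paginator.paginate(Origin="ENTITLED"):
    for data_set in page["DataSets"]:
        print(data_set["Id"], data_set["Name"])
```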


Question # 25

A company uses an on-premises Microsoft SQL Server database to store financial transaction data. The company migrates the transaction data from the on-premises database to AWS at the end of each month. The company has noticed that the cost to migrate data from the on-premises database to an Amazon RDS for SQL Server database has increased recently. The company requires a cost-effective solution to migrate the data to AWS. The solution must cause minimal downtime for the applications that access the database. Which AWS service should the company use to meet these requirements? 

A. AWS Lambda
B. AWS Database Migration Service (AWS DMS)
C. AWS Direct Connect
D. AWS DataSync
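For context on how an ongoing, low-downtime migration is usually expressed in AWS DMS, the sketch below creates a replication task with change data capture. The endpoint and replication-instance ARNs, schema names, and task name are hypothetical placeholders; this illustrates the mechanism rather than asserting the answer.

```python
import boto3
import json

dms = boto3.client("dms", region_name="us-east-1")

# "full-load-and-cdc" performs an initial copy and then streams ongoing
# changes, which keeps application downtime to a minimum during cutover.
table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-transactions",
        "object-locator": {"schema-name": "dbo", "table-name": "%"},
        "rule-action": "include",
    }]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="sqlserver-to-rds-monthly",
    SourceEndpointArn="arn:aws:dms:us-east-1:111111111111:endpoint:source-sqlserver",
    TargetEndpointArn="arn:aws:dms:us-east-1:111111111111:endpoint:target-rds",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:111111111111:rep:replication-instance",
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps(table_mappings),
)
```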


Question # 26

A company has used an Amazon Redshift table that is named Orders for 6 months. The company performs weekly updates and deletes on the table. The table has an interleaved sort key on a column that contains AWS Regions. The company wants to reclaim disk space so that the company will not run out of storage space. The company also wants to analyze the sort key column. Which Amazon Redshift command will meet these requirements? 

A. VACUUM FULL Orders
B. VACUUM DELETE ONLY Orders
C. VACUUM REINDEX Orders
D. VACUUM SORT ONLY Orders
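The four options are variants of the same Redshift maintenance command, so a brief sketch of issuing one of them through the Data API may help. The cluster, database, and secret names are hypothetical, and the REINDEX variant is shown purely as an example of the syntax; which variant the question calls for is left to the reader.

```python
import boto3

redshift_data = boto3.client("redshift-data", region_name="us-east-1")

# Any of the four VACUUM variants from the options can be run the same way;
# REINDEX is shown only to illustrate the syntax (all names are hypothetical).
redshift_data.execute_statement(
    ClusterIdentifier="reporting-cluster",
    Database="sales",
    SecretArn="arn:aws:secretsmanager:us-east-1:111111111111:secret:redshift-creds",
    Sql="VACUUM REINDEX orders;",
)
```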


Question # 27

A company extracts approximately 1 TB of data every day from data sources such as SAP HANA, Microsoft SQL Server, MongoDB, Apache Kafka, and Amazon DynamoDB. Some of the data sources have undefined data schemas or data schemas that change. A data engineer must implement a solution that can detect the schema for these data sources. The solution must extract, transform, and load the data to an Amazon S3 bucket. The company has a service level agreement (SLA) to load the data into the S3 bucket within 15 minutes of data creation. Which solution will meet these requirements with the LEAST operational overhead? 

A. Use Amazon EMR to detect the schema and to extract, transform, and load the data into the S3 bucket. Create a pipeline in Apache Spark.
B. Use AWS Glue to detect the schema and to extract, transform, and load the data into the S3 bucket. Create a pipeline in Apache Spark.
C. Create a PySpark program in AWS Lambda to extract, transform, and load the data into the S3 bucket.
D. Create a stored procedure in Amazon Redshift to detect the schema and to extract, transform, and load the data into a Redshift Spectrum table. Access the table from Amazon S3.
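As a concrete reference for schema detection, AWS Glue crawlers infer schemas from data stores and register them in the Glue Data Catalog. The sketch below creates and starts a crawler over a hypothetical S3 path; the role, database name, schedule, and path are placeholders, and the example only illustrates the schema-detection mechanism rather than the full answer.

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# A crawler infers the schema of whatever it finds under the target path
# and records it as tables in the Glue Data Catalog (names are hypothetical).
glue.create_crawler(
    Name="landing-zone-crawler",
    Role="arn:aws:iam::111111111111:role/GlueCrawlerRole",
    DatabaseName="landing_zone",
    Targets={"S3Targets": [{"Path": "s3://example-landing-bucket/raw/"}]},
    Schedule="cron(0/15 * * * ? *)",   # run every 15 minutes
)
glue.start_crawler(Name="landing-zone-crawler")
```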


Question # 28

A company stores data from an application in an Amazon DynamoDB table that operates in provisioned capacity mode. The workloads of the application have predictable throughput load on a regular schedule. Every Monday, there is an immediate increase in activity early in the morning. The application has very low usage during weekends. The company must ensure that the application performs consistently during peak usage times. Which solution will meet these requirements in the MOST cost-effective way? 

A. Increase the provisioned capacity to the maximum capacity that is currently present during peak load times.
B. Divide the table into two tables. Provision each table with half of the provisioned capacity of the original table. Spread queries evenly across both tables.
C. Use AWS Application Auto Scaling to schedule higher provisioned capacity for peak usage times. Schedule lower capacity during off-peak times.
D. Change the capacity mode from provisioned to on-demand. Configure the table to scale up and scale down based on the load on the table.
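One mechanism mentioned in the options, scheduled scaling of a provisioned DynamoDB table through Application Auto Scaling, can be sketched as follows. The table name, capacity numbers, and cron schedules are hypothetical; the sketch only shows what registering a scalable target and scheduled actions looks like, not which option is correct.

```python
import boto3

autoscaling = boto3.client("application-autoscaling", region_name="us-east-1")

# Register the table's write capacity as a scalable target (hypothetical numbers).
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/app-data",
    ScalableDimension="dynamodb:table:WriteCapacityUnits",
    MinCapacity=50,
    MaxCapacity=2000,
)

# Raise capacity ahead of the Monday-morning peak...
autoscaling.put_scheduled_action(
    ServiceNamespace="dynamodb",
    ScheduledActionName="monday-morning-peak",
    ResourceId="table/app-data",
    ScalableDimension="dynamodb:table:WriteCapacityUnits",
    Schedule="cron(0 6 ? * MON *)",
    ScalableTargetAction={"MinCapacity": 1000, "MaxCapacity": 2000},
)

# ...and lower it again for the weekend.
autoscaling.put_scheduled_action(
    ServiceNamespace="dynamodb",
    ScheduledActionName="weekend-low",
    ResourceId="table/app-data",
    ScalableDimension="dynamodb:table:WriteCapacityUnits",
    Schedule="cron(0 0 ? * SAT *)",
    ScalableTargetAction={"MinCapacity": 50, "MaxCapacity": 200},
)
```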


Question # 29

A company is planning to migrate on-premises Apache Hadoop clusters to Amazon EMR. The company also needs to migrate a data catalog into a persistent storage solution. The company currently stores the data catalog in an on-premises Apache Hive metastore on the Hadoop clusters. The company requires a serverless solution to migrate the data catalog. Which solution will meet these requirements MOST cost-effectively? 

A. Use AWS Database Migration Service (AWS DMS) to migrate the Hive metastore into Amazon S3. Configure AWS Glue Data Catalog to scan Amazon S3 to produce the data catalog.
B. Configure a Hive metastore in Amazon EMR. Migrate the existing on-premises Hive metastore into Amazon EMR. Use AWS Glue Data Catalog to store the company's data catalog as an external data catalog.
C. Configure an external Hive metastore in Amazon EMR. Migrate the existing on-premises Hive metastore into Amazon EMR. Use Amazon Aurora MySQL to store the company's data catalog.
D. Configure a new Hive metastore in Amazon EMR. Migrate the existing on-premises Hive metastore into Amazon EMR. Use the new metastore as the company's data catalog.
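For reference, pointing an EMR cluster's Hive at the AWS Glue Data Catalog is a cluster-configuration setting rather than a code change. The snippet below is a hedged sketch with placeholder names, instance types, and roles, and it does not indicate which option is correct.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# The hive-site classification switches Hive's metastore client to the
# Glue Data Catalog implementation (all other values are placeholders).
emr.run_job_flow(
    Name="hive-on-glue-catalog",
    ReleaseLabel="emr-6.15.0",
    Applications=[{"Name": "Hive"}],
    Configurations=[{
        "Classification": "hive-site",
        "Properties": {
            "hive.metastore.client.factory.class":
                "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
        },
    }],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
```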


Question # 30

A company loads transaction data for each day into Amazon Redshift tables at the end of each day. The company wants to have the ability to track which tables have been loaded and which tables still need to be loaded. A data engineer wants to store the load statuses of Redshift tables in an Amazon DynamoDB table. The data engineer creates an AWS Lambda function to publish the details of the load statuses to DynamoDB. How should the data engineer invoke the Lambda function to write load statuses to the DynamoDB table? 

A. Use a second Lambda function to invoke the first Lambda function based on Amazon CloudWatch events.
B. Use the Amazon Redshift Data API to publish an event to Amazon EventBridge. Configure an EventBridge rule to invoke the Lambda function.
C. Use the Amazon Redshift Data API to publish a message to an Amazon Simple Queue Service (Amazon SQS) queue. Configure the SQS queue to invoke the Lambda function.
D. Use a second Lambda function to invoke the first Lambda function based on AWS CloudTrail events.
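Two building blocks mentioned in these options can be illustrated briefly: the Redshift Data API can emit an event to EventBridge when a statement finishes (the WithEvent flag), and a Lambda function can write a status item to DynamoDB. Both snippets below are hedged sketches with hypothetical names; the event detail field names are assumptions, and the code does not assert which option is correct.

```python
import boto3

# 1) Submit the nightly load through the Data API; WithEvent=True publishes
#    a completion event to EventBridge that a rule could route to Lambda.
redshift_data = boto3.client("redshift-data", region_name="us-east-1")
redshift_data.execute_statement(
    ClusterIdentifier="warehouse",                # hypothetical
    Database="sales",
    SecretArn="arn:aws:secretsmanager:us-east-1:111111111111:secret:redshift-creds",
    Sql="COPY daily_transactions FROM 's3://example-bucket/2024-01-31/' IAM_ROLE DEFAULT;",
    WithEvent=True,
)

# 2) A Lambda handler (deployed separately) that records a load status item.
dynamodb = boto3.resource("dynamodb")

def lambda_handler(event, context):
    table = dynamodb.Table("redshift-load-status")  # hypothetical table name
    detail = event.get("detail", {})
    # Field names in the event detail are illustrative assumptions.
    table.put_item(Item={
        "statement_id": detail.get("statementId", "unknown"),
        "state": detail.get("state", "unknown"),
    })
    return {"recorded": True}
```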


