A company has five offices in different AWS Regions. Each office has its own human resources (HR) department that uses a unique IAM role. The company stores employee records in a data lake that is based on Amazon S3 storage. A data engineering team needs to limit access to the records. Each HR department should be able to access records for only employees who are within the HR department's Region. Which combination of steps should the data engineering team take to meet this requirement with the LEAST operational overhead? (Choose two.)
A. Use data filters for each Region to register the S3 paths as data locations.
B. Register the S3 path as an AWS Lake Formation location.
C. Modify the IAM roles of the HR departments to add a data filter for each department's Region.
D. Enable fine-grained access control in AWS Lake Formation. Add a data filter for each Region.
E. Create a separate S3 bucket for each Region. Configure an IAM policy to allow S3 access. Restrict access based on Region.
Answer: B,D
Explanation: AWS Lake Formation is a service that helps you build, secure, and manage data lakes on Amazon S3. You can register the S3 path as a data lake location and enable fine-grained access control to limit access to the records based on each HR department's Region. You can then define data filters that restrict access to specific rows (for example, rows whose Region column matches the department's Region) or columns, and grant permissions on those filters to the departments' existing IAM roles. This solution meets the requirement with the least operational overhead because it centralizes data lake security in Lake Formation and reuses the existing IAM roles [1][2].
The other options are not optimal for the following reasons:
A. Use data filters for each Region to register the S3 paths as data locations. Data filters are not used to register S3 paths as data locations; they grant access to specific rows or columns within a table that belongs to an already registered location. This option also does not explain how access would be limited by Region.
C. Modify the IAM roles of the HR departments to add a data filter for each department's Region. Data filters are not attached to IAM roles; they are part of permissions granted through AWS Lake Formation. This option also does not register the S3 path as a data lake location or enable fine-grained access control.
E. Create a separate S3 bucket for each Region. Configure an IAM policy to allow S3 access. Restrict access based on Region. This option requires more operational overhead to create and manage multiple S3 buckets and to configure and maintain IAM policies for each HR department. It also forgoes the benefits of AWS Lake Formation, such as data cataloging and centralized data governance.
References:
1: AWS Lake Formation
2: AWS Lake Formation Permissions
AWS Identity and Access Management
Amazon S3
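For reference, once the S3 path is registered as a Lake Formation location and cataloged in AWS Glue, a Region-scoped data filter can be created with the AWS SDK. The sketch below is a minimal example, not part of the question; the account ID, database, table, filter name, and region column are assumptions:

```python
import boto3

lf = boto3.client("lakeformation")

lf.create_data_cells_filter(
    TableData={
        "TableCatalogId": "111122223333",      # AWS account ID that owns the Data Catalog (placeholder)
        "DatabaseName": "hr_data_lake",        # assumed Glue database over the registered S3 path
        "TableName": "employee_records",       # assumed Glue table
        "Name": "hr_us_east_1_only",
        # Row filter limits the HR role to rows for its own Region.
        "RowFilter": {"FilterExpression": "region = 'us-east-1'"},
        # All columns stay visible; only rows are restricted.
        "ColumnWildcard": {"ExcludedColumnNames": []},
    }
)
```

A filter like this would then be granted to the corresponding HR department's IAM role, one filter per Region.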
Question # 12
A healthcare company uses Amazon Kinesis Data Streams to stream real-time health data from wearable devices, hospital equipment, and patient records. A data engineer needs to find a solution to process the streaming data. The data engineer needs to store the data in an Amazon Redshift Serverless warehouse. The solution must support near real-time analytics of the streaming data and the previous day's data. Which solution will meet these requirements with the LEAST operational overhead?
A. Load data into Amazon Kinesis Data Firehose. Load the data into Amazon Redshift.
B. Use the streaming ingestion feature of Amazon Redshift.
C. Load the data into Amazon S3. Use the COPY command to load the data into Amazon Redshift.
D. Use the Amazon Aurora zero-ETL integration with Amazon Redshift.
Answer: B
Explanation: The streaming ingestion feature of Amazon Redshift lets you ingest data from streaming sources, such as Amazon Kinesis Data Streams, directly into Amazon Redshift in near real time. The stream is exposed through a materialized view that Amazon Redshift refreshes automatically, so new records from the wearable devices, hospital equipment, and patient records become queryable within seconds alongside the previous day's data already stored in the warehouse. Because streaming ingestion is built into Amazon Redshift, including Redshift Serverless, this solution meets the requirements with the least operational overhead: no additional services or components are needed to ingest and process the stream. The other options are either not feasible or not optimal. Loading data into Amazon Kinesis Data Firehose and then into Amazon Redshift (option A) introduces additional latency and cost and requires additional configuration and management. Loading the data into Amazon S3 and then using the COPY command (option C) also introduces additional latency and cost and requires additional storage and ETL logic. The Amazon Aurora zero-ETL integration with Amazon Redshift (option D) does not apply, because it requires the data to be stored in Amazon Aurora first, which is not the case for this streaming data.
References:
Using streaming ingestion with Amazon Redshift
AWS Certified Data Engineer - Associate DEA-C01 Complete Study Guide, Chapter 3: Data Ingestion and Transformation, Section 3.5: Amazon Redshift Streaming Ingestion
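As an illustration of how little setup streaming ingestion needs, the sketch below runs the two DDL statements through the Amazon Redshift Data API against a Serverless workgroup. The workgroup, database, IAM role, and stream names are assumptions, not values from the question:

```python
import boto3

rsd = boto3.client("redshift-data")

ddl_statements = [
    # Map the Kinesis data stream into Redshift through an external schema.
    """
    CREATE EXTERNAL SCHEMA kinesis_schema
    FROM KINESIS
    IAM_ROLE 'arn:aws:iam::111122223333:role/redshift-kinesis-role'
    """,
    # An auto-refreshed materialized view makes the stream queryable in near real time.
    """
    CREATE MATERIALIZED VIEW health_events_mv AUTO REFRESH YES AS
    SELECT approximate_arrival_timestamp,
           JSON_PARSE(kinesis_data) AS payload
    FROM kinesis_schema."health-data-stream"
    """,
]

for sql in ddl_statements:
    rsd.execute_statement(
        WorkgroupName="health-analytics",   # Redshift Serverless workgroup (placeholder)
        Database="dev",
        Sql=sql,
    )
```

After the materialized view exists, Redshift keeps it refreshed automatically, so near real-time queries can join it with the previous day's tables in the same warehouse.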
Question # 13
A company is migrating a legacy application to an Amazon S3 based data lake. A data engineer reviewed data that is associated with the legacy application. The data engineer found that the legacy data contained some duplicate information. The data engineer must identify and remove duplicate information from the legacy application data. Which solution will meet these requirements with the LEAST operational overhead?
A. Write a custom extract, transform, and load (ETL) job in Python. Use the DataFrame drop_duplicates() function from the Pandas library to perform data deduplication.
B. Write an AWS Glue extract, transform, and load (ETL) job. Use the FindMatches machine learning (ML) transform to perform data deduplication.
C. Write a custom extract, transform, and load (ETL) job in Python. Import the Python dedupe library. Use the dedupe library to perform data deduplication.
D. Write an AWS Glue extract, transform, and load (ETL) job. Import the Python dedupe library. Use the dedupe library to perform data deduplication.
Answer: B
Explanation: AWS Glue is a fully managed, serverless ETL service that can handle data deduplication with minimal operational overhead. AWS Glue provides a built-in ML transform called FindMatches, which can automatically identify and group similar or duplicate records in a dataset, even when the records do not share an exact key. FindMatches assigns a match ID to each group of matching records, which can then be used to remove the duplicates. It does not require coding an ML model or prior ML experience, because it learns from a small sample of labeled data provided by the user, and it scales to large datasets while letting you tune the trade-off between cost, precision, and recall. The custom Python and dedupe-library options (A, C, and D) require writing and maintaining deduplication logic yourself, which adds operational overhead.
References:
AWS Glue
FindMatches ML Transform
AWS Certified Data Engineer - Associate DEA-C01 Complete Study Guide
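A minimal AWS Glue PySpark sketch of applying an already-trained FindMatches transform is shown below; the Data Catalog names, transform ID, and S3 path are placeholders, and the exact FindMatches.apply signature should be checked against the AWS Glue documentation for your Glue version:

```python
from awsglue.context import GlueContext
from awsglue.ml import FindMatches
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the legacy data from the Glue Data Catalog (database/table names are placeholders).
legacy = glue_context.create_dynamic_frame.from_catalog(
    database="legacy_app", table_name="customer_records"
)

# Apply a previously trained FindMatches ML transform; the transform ID is a placeholder.
# Matching records receive the same match_id, which can then be used to drop duplicates.
matched = FindMatches.apply(frame=legacy, transformId="tfm-0123456789abcdef")

glue_context.write_dynamic_frame.from_options(
    frame=matched,
    connection_type="s3",
    connection_options={"path": "s3://example-data-lake/deduplicated/"},
    format="parquet",
)
```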
Question # 14
A company needs to build a data lake in AWS. The company must provide row-level data access and column-level data access to specific teams. The teams will access the data by using Amazon Athena, Amazon Redshift Spectrum, and Apache Hive from Amazon EMR. Which solution will meet these requirements with the LEAST operational overhead?
A. Use Amazon S3 for data lake storage. Use S3 access policies to restrict data access by rows and columns. Provide data access through Amazon S3.
B. Use Amazon S3 for data lake storage. Use Apache Ranger through Amazon EMR to restrict data access by rows and columns. Provide data access by using Apache Pig.
C. Use Amazon Redshift for data lake storage. Use Redshift security policies to restrict data access by rows and columns. Provide data access by using Apache Spark and Amazon Athena federated queries.
D. Use Amazon S3 for data lake storage. Use AWS Lake Formation to restrict data access by rows and columns. Provide data access through AWS Lake Formation.
Answer: D
Explanation: Option D is the best solution to meet the requirements with the least operational overhead because AWS Lake Formation is a fully managed service that simplifies the process of building, securing, and managing data lakes. AWS Lake Formation lets you define granular data access policies at the row and column level for different users and groups. It also integrates with Amazon Athena, Amazon Redshift Spectrum, and Apache Hive on Amazon EMR, so these services can access the data in the data lake through the permissions defined in Lake Formation.
Option A is not a good solution because S3 access policies cannot restrict data access by rows and columns. S3 access policies are based on the identity and permissions of the requester, the bucket and object ownership, and the object prefix and tags; they cannot enforce fine-grained access control at the row and column level.
Option B is not a good solution because it relies on Apache Ranger and Apache Pig, which are not fully managed services and require additional configuration and maintenance. Apache Ranger is a framework that provides centralized security administration for data stored in Hadoop clusters such as Amazon EMR and can enforce row-level and column-level access policies for Apache Hive tables. However, it is not a native AWS service and requires installation and configuration on the EMR clusters. Apache Pig is a platform for analyzing large datasets by using a high-level scripting language called Pig Latin; it can read data stored in Amazon S3, but it also requires installation and configuration on the EMR clusters and is not one of the access methods the teams plan to use.
Option C is not a good solution because Amazon Redshift is not a suitable service for data lake storage. Amazon Redshift is a fully managed data warehouse service for running complex analytical queries with standard SQL, and it can enforce row-level and column-level access policies for different users and groups. However, it is not designed to store and process large volumes of unstructured or semi-structured data, which are typical of data lakes, and it is more expensive and less scalable than Amazon S3 for data lake storage.
References:
AWS Certified Data Engineer - Associate DEA-C01 Complete Study Guide
What Is AWS Lake Formation? - AWS Lake Formation
Using AWS Lake Formation with Amazon Athena - AWS Lake Formation
Using AWS Lake Formation with Amazon Redshift Spectrum - AWS Lake Formation
Using AWS Lake Formation with Apache Hive on Amazon EMR - AWS Lake Formation
Using Bucket Policies and User Policies - Amazon Simple Storage Service
Apache Ranger
Apache Pig
What Is Amazon Redshift? - Amazon Redshift
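For illustration, after a row- and column-level data filter has been created in Lake Formation, access for a team is granted with a single permissions call that then applies across Athena, Redshift Spectrum, and Hive on EMR. The sketch below assumes hypothetical account, role, database, table, and filter names:

```python
import boto3

lf = boto3.client("lakeformation")

lf.grant_permissions(
    Principal={
        # IAM role used by the team (placeholder ARN)
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/analytics-team"
    },
    Resource={
        "DataCellsFilter": {
            "TableCatalogId": "111122223333",   # account that owns the Data Catalog
            "DatabaseName": "sales_db",
            "TableName": "orders",
            "Name": "analytics_team_filter",    # row/column filter created beforehand
        }
    },
    Permissions=["SELECT"],
)
```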
Question # 15
A company uses an Amazon Redshift provisioned cluster as its database. The Redshift cluster has five reserved ra3.4xlarge nodes and uses key distribution. A data engineer notices that one of the nodes frequently has a CPU load over 90%. SQL queries that run on the node are queued. The other four nodes usually have a CPU load under 15% during daily operations. The data engineer wants to maintain the current number of compute nodes. The data engineer also wants to balance the load more evenly across all five compute nodes. Which solution will meet these requirements?
A. Change the sort key to be the data column that is most often used in a WHERE clause of the SQL SELECT statement.
B. Change the distribution key to the table column that has the largest dimension.
C. Upgrade the reserved node from ra3.4xlarge to ra3.16xlarge.
D. Change the primary key to be the data column that is most often used in a WHERE clause of the SQL SELECT statement.
Answer: B
Explanation: Changing the distribution key to the table column that has the largest dimension helps balance the load more evenly across all five compute nodes. The distribution key determines how the rows of a table are distributed among the slices of the cluster. If the distribution key is chosen poorly, it can cause data skew, meaning some slices hold more data than others, which results in uneven CPU load and queued queries on the overloaded node. Choosing the column with the largest dimension, that is, the column with the most distinct and evenly spread values, distributes the rows more uniformly across the slices, reducing skew and improving query performance.
The other options do not meet the requirements. Option A, changing the sort key to the column most often used in a WHERE clause, does not affect the data distribution or the CPU load; the sort key only determines the order in which rows are stored on disk, which can speed up range-restricted queries but does not balance the load. Option C, upgrading the reserved nodes from ra3.4xlarge to ra3.16xlarge, increases the cost and capacity of the cluster without addressing the data skew that causes the uneven load. Option D, changing the primary key to the column most often used in a WHERE clause, also does not affect the data distribution or the CPU load; the primary key is a constraint that declares the uniqueness of rows but does not influence the data layout or query optimization.
References:
Choosing a data distribution style
Choosing a data sort key
Working with primary keys
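As a sketch of how the change could be applied, the snippet below first checks slice-level skew in svv_table_info and then alters the distribution key. It uses the open-source redshift_connector driver; the endpoint, credentials, and the sales/customer_id names are placeholders:

```python
import redshift_connector

# Connection details are placeholders; in practice use IAM or Secrets Manager credentials.
conn = redshift_connector.connect(
    host="example-cluster.abc123xyz.us-east-1.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="example-password",
)
conn.autocommit = True
cur = conn.cursor()

# Inspect skew before changing anything: skew_rows shows how unevenly rows are spread.
cur.execute(
    'SELECT "table", diststyle, skew_rows FROM svv_table_info ORDER BY skew_rows DESC;'
)
print(cur.fetchall())

# Redistribute on a high-cardinality column to spread rows (and CPU load) across slices.
cur.execute("ALTER TABLE sales ALTER DISTKEY customer_id;")
```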
Question # 16
A company is developing an application that runs on Amazon EC2 instances. Currently, the data that the application generates is temporary. However, the company needs to persist the data, even if the EC2 instances are terminated. A data engineer must launch new EC2 instances from an Amazon Machine Image (AMI) and configure the instances to preserve the data. Which solution will meet this requirement?
A. Launch new EC2 instances by using an AMI that is backed by an EC2 instance store volume that contains the application data. Apply the default settings to the EC2 instances.
B. Launch new EC2 instances by using an AMI that is backed by a root Amazon Elastic Block Store (Amazon EBS) volume that contains the application data. Apply the default settings to the EC2 instances.
C. Launch new EC2 instances by using an AMI that is backed by an EC2 instance store volume. Attach an Amazon Elastic Block Store (Amazon EBS) volume to contain the application data. Apply the default settings to the EC2 instances.
D. Launch new EC2 instances by using an AMI that is backed by an Amazon Elastic Block Store (Amazon EBS) volume. Attach an additional EC2 instance store volume to contain the application data. Apply the default settings to the EC2 instances.
Answer: C
Explanation: Amazon EC2 instances can use two types of storage volumes: instance store volumes and Amazon EBS volumes. Instance store volumes are ephemeral; they exist only for the life of the instance, so if the instance is stopped, terminated, or fails, the data on the instance store volume is lost. Amazon EBS volumes are persistent; they can be detached from one instance and attached to another, and the data on the volume is preserved. To persist the data even if the EC2 instances are terminated, the data engineer must store the application data on an Amazon EBS volume. The solution is to launch the new EC2 instances from the instance store-backed AMI, attach an Amazon EBS volume to each instance, and configure the application to write its data to the EBS volume. The data is then saved on the EBS volume and can be attached to another instance if needed. The default settings can be applied to the instances, because no changes to the instance type, security group, or IAM role are required. The other options are either not feasible or not optimal. Launching new EC2 instances from an AMI backed by an EC2 instance store volume that contains the application data (option A) does not persist new data, because instance store data is lost when the instance is terminated. Launching from an AMI backed by a root Amazon EBS volume that contains the application data (option B) does not work either: the data baked into the AMI is static, and with default settings the root EBS volume is deleted when the instance is terminated. Attaching an additional EC2 instance store volume to contain the application data (option D) does not work, because the data on the instance store volume is lost if the instance is terminated.
References:
Amazon EC2 Instance Store
Amazon EBS Volumes
AWS Certified Data Engineer - Associate DEA-C01 Complete Study Guide, Chapter 2: Data Store Management, Section 2.1: Amazon EC2
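A minimal sketch of option C with the AWS SDK is shown below; the AMI ID, instance type, key name, and device name are placeholders. The important detail is the separate EBS data volume with DeleteOnTermination set to False so the data survives instance termination:

```python
import boto3

ec2 = boto3.client("ec2")

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",       # instance store-backed AMI (placeholder)
    InstanceType="m5d.large",              # instance type with instance store (assumption)
    MinCount=1,
    MaxCount=1,
    KeyName="example-key",
    BlockDeviceMappings=[
        {
            "DeviceName": "/dev/sdf",      # data volume the application writes to
            "Ebs": {
                "VolumeSize": 100,         # GiB
                "VolumeType": "gp3",
                "DeleteOnTermination": False,  # keep the data if the instance is terminated
            },
        }
    ],
)
```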
Question # 17
A data engineer must ingest a source of structured data that is in .csv format into an Amazon S3 data lake. The .csv files contain 15 columns. Data analysts need to run Amazon Athena queries on one or two columns of the dataset. The data analysts rarely query the entire file. Which solution will meet these requirements MOST cost-effectively?
A. Use an AWS Glue PySpark job to ingest the source data into the data lake in .csv format.
B. Create an AWS Glue extract, transform, and load (ETL) job to read from the .csv structured data source. Configure the job to ingest the data into the data lake in JSON format.
C. Use an AWS Glue PySpark job to ingest the source data into the data lake in Apache Avro format.
D. Create an AWS Glue extract, transform, and load (ETL) job to read from the .csv structured data source. Configure the job to write the data into the data lake in Apache Parquet format.
Answer: D
Explanation: Amazon Athena is a serverless interactive query service that allows you to analyze data in Amazon S3 using standard SQL. Athena supports various data formats, such as CSV, JSON, ORC, Avro, and Parquet, but not all formats are equally efficient for querying. Formats such as CSV and JSON are row-oriented: they store data as a sequence of records, each with the same fields. Row-oriented formats are suitable for loading and exporting data, but they are not optimal for analytical queries that access only a subset of columns, and they do not support the columnar compression and encoding techniques that reduce data size and improve query performance.
Column-oriented formats, such as ORC and Parquet, store data as a collection of columns, each with a specific data type. They are ideal for analytical queries that filter, aggregate, or join data by column, and they support compression and encoding techniques that reduce data size and improve query performance. For example, Parquet supports dictionary encoding, which replaces repeated values with numeric codes, and run-length encoding, which replaces consecutive identical values with a single value and a count. Parquet also supports compression algorithms such as Snappy, GZIP, and ZSTD that further reduce data size.
Therefore, creating an AWS Glue extract, transform, and load (ETL) job that reads the .csv source and writes the data into the data lake in Apache Parquet format meets the requirements most cost-effectively. AWS Glue is a fully managed, serverless data integration service for data preparation, data cataloging, and data loading. AWS Glue ETL jobs can transform and load data from various sources into various targets, using either a graphical interface (AWS Glue Studio) or a code-based interface (the AWS Glue console or API), so you can convert the data from CSV to Parquet without managing any infrastructure. Because Parquet is column-oriented, Athena can scan only the one or two columns the analysts query and skip the rest, reducing the amount of data read from S3 and, since Athena charges by the amount of data scanned, reducing query cost as well.
The other options are not as cost-effective. Using an AWS Glue PySpark job to ingest the source data in .csv format (option A) does not improve query performance or reduce query cost, because .csv is a row-oriented format that supports neither columnar access nor efficient compression. Creating an AWS Glue ETL job to ingest the data in JSON format (option B) likewise does not improve query performance or reduce query cost, because JSON is also row-oriented. Ingesting the data in Apache Avro format (option C) is not optimal either: Avro is a row-oriented format, so it does not provide the columnar access that benefits queries on only one or two columns, and the PySpark job would require writing and maintaining conversion code, adding operational effort.
References:
Amazon Athena
Choosing the Right Data Format
AWS Glue
AWS Certified Data Engineer - Associate DEA-C01 Complete Study Guide, Chapter 5: Data Analysis and Visualization, Section 5.1: Amazon Athena
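A minimal AWS Glue PySpark sketch of the CSV-to-Parquet conversion described above is shown below; the S3 paths are placeholders:

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the 15-column .csv source (bucket paths are placeholders).
csv_frame = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-source-bucket/csv/"]},
    format="csv",
    format_options={"withHeader": True},
)

# Write the same data as Parquet so Athena scans only the queried columns.
glue_context.write_dynamic_frame.from_options(
    frame=csv_frame,
    connection_type="s3",
    connection_options={"path": "s3://example-data-lake/parquet/"},
    format="parquet",
)
```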
Question # 18
A data engineer uses Amazon Redshift to run resource-intensive analytics processes once every month. Every month, the data engineer creates a new Redshift provisioned cluster. The data engineer deletes the Redshift provisioned cluster after the analytics processes are complete every month. Before the data engineer deletes the cluster each month, the data engineer unloads backup data from the cluster to an Amazon S3 bucket. The data engineer needs a solution to run the monthly analytics processes that does not require the data engineer to manage the infrastructure manually. Which solution will meet these requirements with the LEAST operational overhead?
A. Use AWS Step Functions to pause the Redshift cluster when the analytics processes are complete and to resume the cluster to run new processes every month.
B. Use Amazon Redshift Serverless to automatically process the analytics workload.
C. Use the AWS CLI to automatically process the analytics workload.
D. Use AWS CloudFormation templates to automatically process the analytics workload.
Answer: B
Explanation: Amazon Redshift Serverless lets you run and scale analytics workloads without provisioning or managing Redshift clusters. It automatically provisions and scales compute capacity based on the workload and charges only for the capacity consumed, so the data engineer can run the monthly analytics processes without creating, deleting, pausing, or resuming any clusters or managing any infrastructure manually. Queries can be run from the Redshift query editor or programmatically through the Amazon Redshift Data API from the AWS CLI, an AWS SDK, or AWS Lambda functions [1][2].
The other options are not optimal for the following reasons:
A. Use AWS Step Functions to pause the Redshift cluster when the analytics processes are complete and to resume the cluster to run new processes every month. This option still requires a provisioned Redshift cluster, which incurs cost and management effort, and it adds the complexity of building and maintaining a Step Functions workflow to pause and resume the cluster.
C. Use the AWS CLI to automatically process the analytics workload. This option is vague and does not specify how the AWS CLI would process the workload. The CLI can invoke services such as Amazon Redshift Serverless, Amazon Athena, or Amazon EMR, but on its own it does not remove the need to provision and manage the underlying infrastructure.
D. Use AWS CloudFormation templates to automatically process the analytics workload. AWS CloudFormation lets you model and provision AWS resources from templates, for example creating and deleting a provisioned Redshift cluster every month. However, the data engineer would still need to write and maintain the templates and monitor the resources, so this option does not eliminate manual infrastructure management.
References:
1: Amazon Redshift Serverless
2: Amazon Redshift Data API
AWS Step Functions
AWS CLI
AWS CloudFormation
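To illustrate the reduced overhead, the one-time setup below creates a Redshift Serverless namespace and workgroup with the AWS SDK; after that there is no monthly cluster to create or delete. The names and base capacity are assumptions:

```python
import boto3

rs = boto3.client("redshift-serverless")

# One-time creation of the namespace (databases, users) and workgroup (compute).
rs.create_namespace(
    namespaceName="monthly-analytics",
    dbName="analytics",
)

rs.create_workgroup(
    workgroupName="monthly-analytics-wg",
    namespaceName="monthly-analytics",
    baseCapacity=32,   # base Redshift Processing Units (RPUs); scales up with demand
)
```

The monthly analytics SQL can then be submitted to this workgroup through the Redshift Data API or the query editor, with capacity scaling handled automatically.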
Question # 19
A financial company wants to use Amazon Athena to run on-demand SQL queries on a petabyte-scale dataset to support a business intelligence (BI) application. An AWS Glue job that runs during non-business hours updates the dataset once every day. The BI application has a standard data refresh frequency of 1 hour to comply with company policies. A data engineer wants to cost optimize the company's use of Amazon Athena without adding any additional infrastructure costs. Which solution will meet these requirements with the LEAST operational overhead?
A. Configure an Amazon S3 Lifecycle policy to move data to the S3 Glacier Deep Archive storage class after 1 day.
B. Use the query result reuse feature of Amazon Athena for the SQL queries.
C. Add an Amazon ElastiCache cluster between the BI application and Athena.
D. Change the format of the files that are in the dataset to Apache Parquet.
Answer: B
Explanation: The best way to cost optimize the company's use of Amazon Athena without adding infrastructure is to use the query result reuse feature for the SQL queries. When result reuse is enabled, Athena returns the cached results of a previous run of the same query from the query result location in Amazon S3, as long as the cached results are within the maximum reuse age that you configure, instead of scanning the data again [1]. This fits a dataset that is updated only once a day by the AWS Glue job and a BI application that repeats the same queries on a 1-hour refresh cycle: setting the reuse age to about 1 hour means most refreshes are served from cached results, greatly reducing the amount of data scanned and therefore the Athena cost. The feature can be enabled at the workgroup level or per query, so it adds no infrastructure and little operational overhead [1].
Option A is not the best solution. An Amazon S3 Lifecycle policy that moves data to the S3 Glacier Deep Archive storage class after 1 day would reduce storage cost but increase retrieval cost and latency, because restoring data from S3 Glacier Deep Archive can take up to 12 hours [2][3]. Moreover, Athena cannot query objects stored in the S3 Glacier or S3 Glacier Deep Archive storage classes [4], so the on-demand SQL queries would no longer work.
Option C is not the best solution. Adding an Amazon ElastiCache cluster between the BI application and Athena would introduce new infrastructure cost and operational overhead to provision, manage, and integrate the cache, and it would not reduce the amount of data that Athena scans, which is what determines the Athena cost.
Option D is not the best solution for this requirement. Converting the dataset to Apache Parquet would reduce the data scanned by Athena, but it requires additional processing steps, for example an AWS Glue or Amazon EMR job to convert the files and a separate S3 location for the converted data, which adds complexity, operational overhead, and additional processing cost.
References:
1: Query result reuse
2: Amazon S3 Lifecycle
3: S3 Glacier Deep Archive
4: Storage classes supported by Athena
What is Amazon ElastiCache?
Amazon Athena pricing
Columnar Storage Formats
AWS Certified Data Engineer - Associate DEA-C01 Complete Study Guide
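As an illustration, result reuse can be requested per query through the Athena API; the sketch below assumes hypothetical workgroup, database, table, and output-location names and sets the reuse age to match the 1-hour BI refresh:

```python
import boto3

athena = boto3.client("athena")

athena.start_query_execution(
    QueryString="SELECT region, SUM(amount) FROM trades GROUP BY region",  # placeholder query
    QueryExecutionContext={"Database": "finance_db"},
    WorkGroup="bi-workgroup",
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    # Reuse cached results for up to 60 minutes, matching the BI refresh frequency.
    ResultReuseConfiguration={
        "ResultReuseByAgeConfiguration": {"Enabled": True, "MaxAgeInMinutes": 60}
    },
)
```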
Question # 20
A company uses an Amazon Redshift cluster that runs on RA3 nodes. The company wants to scale read and write capacity to meet demand. A data engineer needs to identify a solution that will turn on concurrency scaling. Which solution will meet this requirement?
A. Turn on concurrency scaling in workload management (WLM) for Redshift Serverless workgroups.
B. Turn on concurrency scaling at the workload management (WLM) queue level in the Redshift cluster.
C. Turn on concurrency scaling in the settings during the creation of a new Redshift cluster.
D. Turn on concurrency scaling for the daily usage quota for the Redshift cluster.
Answer: B
Explanation: Concurrency scaling is a feature that lets an Amazon Redshift cluster support thousands of concurrent users and queries with consistently fast query performance. When concurrency scaling is turned on, Amazon Redshift automatically adds query-processing capacity in seconds to process queries without delays, including both read and write queries on RA3 nodes. You control which queries are sent to the concurrency scaling clusters by configuring WLM queues: to turn on concurrency scaling for a queue, set the queue's Concurrency Scaling mode to auto. The other options are incorrect or irrelevant because they do not enable concurrency scaling for the existing provisioned Redshift cluster on RA3 nodes: Redshift Serverless workgroups (option A) do not apply to a provisioned cluster, concurrency scaling is not a setting chosen during cluster creation (option C), and the daily usage quota (option D) only limits how much concurrency scaling time can be used rather than turning the feature on.
References:
Working with concurrency scaling - Amazon Redshift
Amazon Redshift Concurrency Scaling - Amazon Web Services
Configuring concurrency scaling queues - Amazon Redshift
AWS Certified Data Engineer - Associate DEA-C01 Complete Study Guide (Chapter 6, page 163)
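A sketch of setting the Concurrency Scaling mode to auto on a manual WLM queue through the cluster's parameter group is shown below; the parameter group name and queue layout are assumptions, and the same setting can also be changed in the Redshift console:

```python
import json

import boto3

redshift = boto3.client("redshift")

# Manual WLM configuration with one queue that routes eligible queries to
# concurrency scaling clusters ("concurrency_scaling": "auto").
wlm_config = [
    {
        "query_group": [],
        "user_group": [],
        "query_concurrency": 5,
        "concurrency_scaling": "auto",
    },
    {"short_query_queue": True},
]

redshift.modify_cluster_parameter_group(
    ParameterGroupName="custom-ra3-params",   # parameter group attached to the cluster (placeholder)
    Parameters=[
        {
            "ParameterName": "wlm_json_configuration",
            "ParameterValue": json.dumps(wlm_config),
            "ApplyType": "dynamic",
        }
    ],
)
```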