A data scientist wants to use Amazon Forecast to build a forecasting model for inventory demand for a retail company. The company has provided a dataset of historic inventory demand for its products as a .csv file stored in an Amazon S3 bucket. The table below shows a sample of the dataset.
How should the data scientist transform the data?
A. Use ETL jobs in AWS Glue to separate the dataset into a target time series dataset and an item metadata dataset. Upload both datasets as .csv files to Amazon S3. B. Use a Jupyter notebook in Amazon SageMaker to separate the dataset into a related time series dataset and an item metadata dataset. Upload both datasets as tables in Amazon Aurora. C. Use AWS Batch jobs to separate the dataset into a target time series dataset, a related time series dataset, and an item metadata dataset. Upload them directly to Forecast from a local machine. D. Use a Jupyter notebook in Amazon SageMaker to transform the data into the optimized protobuf recordIO format. Upload the dataset in this format to Amazon S3.
Answer: A
Explanation: Amazon Forecast requires the input data to be in a specific format. The data scientist should use ETL jobs in AWS Glue to separate the dataset into a target time series dataset and an item metadata dataset. The target time series dataset should contain the timestamp, item_id, and demand columns, while the item metadata dataset should contain the item_id, category, and lead_time columns. Both datasets should be uploaded as .csv files to Amazon S3.
References:
How Amazon Forecast Works - Amazon Forecast
Choosing Datasets - Amazon Forecast
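To make option A concrete, here is a minimal pandas sketch of the split (not an actual Glue job); the column names come from the explanation above, while the bucket and file paths are placeholders.

```python
# Minimal sketch: split the combined inventory file into a target time series
# dataset and an item metadata dataset. Bucket and paths are placeholders;
# reading/writing s3:// paths assumes the s3fs package is installed.
import pandas as pd

df = pd.read_csv("s3://my-bucket/raw/inventory_demand.csv")

# Target time series: timestamp, item_id, demand
target_ts = df[["timestamp", "item_id", "demand"]]
target_ts.to_csv("s3://my-bucket/forecast/target_time_series.csv", index=False)

# Item metadata: one row per item with its static attributes
item_metadata = df[["item_id", "category", "lead_time"]].drop_duplicates("item_id")
item_metadata.to_csv("s3://my-bucket/forecast/item_metadata.csv", index=False)
```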
Question # 52
The chief editor for a product catalog wants the research and development team to build a machine learning system that can be used to detect whether or not individuals in a collection of images are wearing the company's retail brand. The team has a set of training data. Which machine learning algorithm should the researchers use that BEST meets their requirements?
A. Latent Dirichlet Allocation (LDA) B. Recurrent neural network (RNN) C. K-means D. Convolutional neural network (CNN)
Answer: D
Explanation: The problem of detecting whether or not individuals in a collection of images are wearing the company's retail brand is an example of image recognition, which is a type of machine learning task that identifies and classifies objects in an image. Convolutional neural networks (CNNs) are a type of machine learning algorithm that are well-suited for image recognition, as they can learn to extract features from images and handle variations in size, shape, color, and orientation of the objects. CNNs consist of multiple layers that perform convolution, pooling, and activation operations on the input images, resulting in a high-level representation that can be used for classification or detection. Therefore, option D is the best choice for the machine learning algorithm that meets the requirements of the chief editor.
Option A is incorrect because latent Dirichlet allocation (LDA) is a type of machine learning algorithm that is used for topic modeling, which is a task that discovers the hidden themes or topics in a collection of text documents. LDA is not suitable for image recognition, as it does not preserve the spatial information of the pixels.
Option B is incorrect because recurrent neural networks (RNNs) are a type of machine learning algorithm that are used for sequential data, such as text, speech, or time series. RNNs can learn from the temporal dependencies and patterns in the input data, and generate outputs that depend on the previous states. RNNs are not suitable for image recognition, as they do not capture the spatial dependencies and patterns in the input images.
Option C is incorrect because k-means is a type of machine learning algorithm that is used for clustering, which is a task that groups similar data points together based on their features. K-means is not suitable for image recognition, as it does not perform classification or detection of the objects in the images.
References:
Image Recognition Software - ML Image & Video Analysis - Amazon …
Image classification and object detection using Amazon Rekognition …
AWS Amazon Rekognition - Deep Learning Face and Image Recognition …
GitHub - awslabs/aws-ai-solution-kit: Machine Learning APIs for common …
Meet iNaturalist, an AWS-powered nature app that helps you identify …
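As a concrete illustration of the convolution, pooling, and activation structure described above, the following is a minimal Keras sketch of a binary brand/no-brand classifier; the layer sizes, input shape, and optimizer are illustrative assumptions, not a recommended architecture.

```python
# Minimal sketch of a small CNN for binary image classification
# ("wearing the brand" vs. "not wearing the brand").
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),          # RGB input images (assumed size)
    layers.Conv2D(32, 3, activation="relu"),    # convolution extracts local features
    layers.MaxPooling2D(),                      # pooling reduces spatial resolution
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),      # binary output: brand present or not
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```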
Question # 53
A wildlife research company has a set of images of lions and cheetahs. The company created a dataset of the images. The company labeled each image with a binary label that indicates whether an image contains a lion or cheetah. The company wants to train a model to identify whether new images contain a lion or cheetah. Which Amazon SageMaker algorithm will meet this requirement?
A. XGBoost B. Image Classification - TensorFlow C. Object Detection - TensorFlow D. Semantic segmentation - MXNet
Answer: B
Explanation: The best Amazon SageMaker algorithm for this task is Image Classification - TensorFlow. This algorithm is a supervised learning algorithm that supports transfer learning with many pretrained models from the TensorFlow Hub. Transfer learning allows the company to fine-tune one of the available pretrained models on their own dataset, even if a large amount of image data is not available. The image classification algorithm takes an image as input and outputs a probability for each provided class label. The company can choose from a variety of models, such as MobileNet, ResNet, or Inception, depending on their accuracy and speed requirements. The algorithm also supports distributed training, data augmentation, and hyperparameter tuning.
References:
Image Classification - TensorFlow - Amazon SageMaker
Amazon SageMaker Provides New Built-in TensorFlow Image Classification Algorithm
Image Classification with ResNet :: Amazon SageMaker Workshop
Image classification on Amazon SageMaker | by Julien Simon - Medium
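For orientation only, a minimal sketch of fine-tuning one of these pretrained models with the SageMaker Python SDK's JumpStart estimator; the model_id, S3 path, and instance type shown here are assumptions and should be replaced with values from the SageMaker documentation.

```python
# Minimal sketch (model_id, S3 path, and instance type are assumptions) of
# fine-tuning a pretrained TensorFlow image-classification model via transfer learning.
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="tensorflow-ic-imagenet-mobilenet-v2-100-224-classification-4",  # assumed ID
    instance_type="ml.p3.2xlarge",
)

# Training data laid out as one S3 prefix per class label (e.g. lion/, cheetah/).
estimator.fit({"training": "s3://my-bucket/lion-cheetah/train/"})
```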
Question # 54
A company’s data scientist has trained a new machine learning model that performs better on test data than the company’s existing model performs in the production environment. The data scientist wants to replace the existing model that runs on an Amazon SageMaker endpoint in the production environment. However, the company is concerned that the new model might not work well on the production environment data. The data scientist needs to perform A/B testing in the production environment to evaluate whether the new model performs well on production environment data. Which combination of steps must the data scientist take to perform the A/B testing? (Choose two.)
A. Create a new endpoint configuration that includes a production variant for each of the two models. B. Create a new endpoint configuration that includes two target variants that point to different endpoints. C. Deploy the new model to the existing endpoint. D. Update the existing endpoint to activate the new model. E. Update the existing endpoint to use the new endpoint configuration.
Answer: A,E
Explanation: The combination of steps that the data scientist must take to perform the A/B testing are to create a new endpoint configuration that includes a production variant for each of the two models, and update the existing endpoint to use the new endpoint configuration. This approach will allow the data scientist to deploy both models on the same endpoint and split the inference traffic between them based on a specified distribution.
Amazon SageMaker is a fully managed service that provides developers and data scientists the ability to quickly build, train, and deploy machine learning models. Amazon SageMaker supports A/B testing on machine learning models by allowing the data scientist to run multiple production variants on an endpoint. A production variant is a version of a model that is deployed on an endpoint. Each production variant has a name, a machine learning model, an instance type, an initial instance count, and an initial weight. The initial weight determines the percentage of inference requests that the variant will handle. For example, if there are two variants with weights of 0.5 and 0.5, each variant will handle 50% of the requests. The data scientist can use production variants to test models that have been trained using different training datasets, algorithms, and machine learning frameworks; test how they perform on different instance types; or a combination of all of the above1.
To perform A/B testing on machine learning models, the data scientist needs to create a new endpoint configuration that includes a production variant for each of the two models. An endpoint configuration is a collection of settings that define the properties of an endpoint, such as the name, the production variants, and the data capture configuration. The data scientist can use the Amazon SageMaker console, the AWS CLI, or the AWS SDKs to create a new endpoint configuration. The data scientist needs to specify the name, model name, instance type, initial instance count, and initial variant weight for each production variant in the endpoint configuration2.
After creating the new endpoint configuration, the data scientist needs to update the existing endpoint to use the new endpoint configuration. Updating an endpoint is the process of deploying a new endpoint configuration to an existing endpoint. Updating an endpoint does not affect the availability or scalability of the endpoint, as Amazon SageMaker creates a new endpoint instance with the new configuration and switches the DNS record to point to the new instance when it is ready. The data scientist can use the Amazon SageMaker console, the AWS CLI, or the AWS SDKs to update an endpoint. The data scientist needs to specify the name of the endpoint and the name of the new endpoint configuration to update the endpoint3.
The other options are either incorrect or unnecessary. Creating a new endpoint configuration that includes two target variants that point to different endpoints is not possible, as target variants are only used to invoke a specific variant on an endpoint, not to define an endpoint configuration. Deploying the new model to the existing endpoint would replace the existing model, not run it side-by-side with the new model. Updating the existing endpoint to activate the new model is not a valid operation, as there is no activation parameter for an endpoint.
References:
1: A/B Testing ML models in production using Amazon SageMaker | AWS Machine Learning Blog
2: Create an Endpoint Configuration - Amazon SageMaker
3: Update an Endpoint - Amazon SageMaker
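A minimal Boto3 sketch of these two steps (model, endpoint, and configuration names are placeholders): one endpoint configuration with two production variants splitting traffic 50/50, followed by an update of the existing endpoint.

```python
# Minimal sketch of A/B testing with two production variants on one endpoint,
# then switching the existing endpoint to the new configuration.
import boto3

sm = boto3.client("sagemaker")

sm.create_endpoint_config(
    EndpointConfigName="ab-test-config",
    ProductionVariants=[
        {
            "VariantName": "existing-model",
            "ModelName": "existing-model",      # assumed name of the current model
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.5,        # 50% of inference traffic
        },
        {
            "VariantName": "new-model",
            "ModelName": "new-model",           # assumed name of the candidate model
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.5,
        },
    ],
)

# Point the existing endpoint at the new configuration without downtime.
sm.update_endpoint(EndpointName="prod-endpoint", EndpointConfigName="ab-test-config")
```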
Question # 55
A data science team is working with a tabular dataset that the team stores in Amazon S3. The team wants to experiment with different feature transformations such as categorical feature encoding. Then the team wants to visualize the resulting distribution of the dataset. After the team finds an appropriate set of feature transformations, the team wants to automate the workflow for feature transformations. Which solution will meet these requirements with the MOST operational efficiency?
A. Use Amazon SageMaker Data Wrangler preconfigured transformations to explore feature transformations. Use SageMaker Data Wrangler templates for visualization. Export the feature processing workflow to a SageMaker pipeline for automation. B. Use an Amazon SageMaker notebook instance to experiment with different feature transformations. Save the transformations to Amazon S3. Use Amazon QuickSight for visualization. Package the feature processing steps into an AWS Lambda function for automation. C. Use AWS Glue Studio with custom code to experiment with different feature transformations. Save the transformations to Amazon S3. Use Amazon QuickSight for visualization. Package the feature processing steps into an AWS Lambda function for automation. D. Use Amazon SageMaker Data Wrangler preconfigured transformations to experiment with different feature transformations. Save the transformations to Amazon S3. Use Amazon QuickSight for visualization. Package each feature transformation step into a separate AWS Lambda function. Use AWS Step Functions for workflow automation.
Answer: A
Explanation: The solution A will meet the requirements with the most operational efficiency because it uses Amazon SageMaker Data Wrangler, which is a service that simplifies the process of data preparation and feature engineering for machine learning. The solution A involves the following steps:
Use Amazon SageMaker Data Wrangler preconfigured transformations to explore feature transformations. Amazon SageMaker Data Wrangler provides a visual interface that allows data scientists to apply various transformations to their tabular data, such as encoding categorical features, scaling numerical features, imputing missing values, and more. Amazon SageMaker Data Wrangler also supports custom transformations using Python code or SQL queries1.
Use SageMaker Data Wrangler templates for visualization. Amazon SageMaker Data Wrangler also provides a set of templates that can generate visualizations of the data, such as histograms, scatter plots, box plots, and more. These visualizations can help data scientists to understand the distribution and characteristics of the data, and to compare the effects of different feature transformations1.
Export the feature processing workflow to a SageMaker pipeline for automation. Amazon SageMaker Data Wrangler can export the feature processing workflow as a SageMaker pipeline, which is a service that orchestrates and automates machine learning workflows. A SageMaker pipeline can run the feature processing steps as a preprocessing step, and then feed the output to a training step or an inference step. This can reduce the operational overhead of managing the feature processing workflow and ensure its consistency and reproducibility2.
The other options are not suitable because:
Option B: Using an Amazon SageMaker notebook instance to experiment with different feature transformations, saving the transformations to Amazon S3, using Amazon QuickSight for visualization, and packaging the feature processing steps into an AWS Lambda function for automation will incur more operational overhead than using Amazon SageMaker Data Wrangler. The data scientist will have to write the code for the feature transformations, the data storage, the data visualization, and the Lambda function. Moreover, AWS Lambda has limitations on the execution time, memory size, and package size, which may not be sufficient for complex feature processing tasks3.
Option C: Using AWS Glue Studio with custom code to experiment with different feature transformations, saving the transformations to Amazon S3, using Amazon QuickSight for visualization, and packaging the feature processing steps into an AWS Lambda function for automation will incur more operational overhead than using Amazon SageMaker Data Wrangler. AWS Glue Studio is a visual interface that allows data engineers to create and run extract, transform, and load (ETL) jobs on AWS Glue. However, AWS Glue Studio does not provide preconfigured transformations or templates for feature engineering or data visualization. The data scientist will have to write custom code for these tasks, as well as for the Lambda function. Moreover, AWS Glue Studio is not integrated with SageMaker pipelines, and it may not be optimized for machine learning workflows4.
Option D: Using Amazon SageMaker Data Wrangler preconfigured transformations to experiment with different feature transformations, saving the transformations to Amazon S3, using Amazon QuickSight for visualization, packaging each feature transformation step into a separate AWS Lambda function, and using AWS Step Functions for workflow automation will incur more operational overhead than using Amazon SageMaker Data Wrangler. The data scientist will have to create and manage multiple AWS Lambda functions and AWS Step Functions, which can increase the complexity and cost of the solution. Moreover, AWS Lambda and AWS Step Functions may not be compatible with SageMaker pipelines, and they may not be optimized for machine learning workflows5.
References:
1: Amazon SageMaker Data Wrangler
2: Amazon SageMaker Pipelines
3: AWS Lambda
4: AWS Glue Studio
5: AWS Step Functions
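For reference, exporting a Data Wrangler flow generates pipeline code along these lines. The sketch below only shows the general shape of a SageMaker pipeline with a single feature-processing step; the bucket, script name, and processor settings are assumptions, not what Data Wrangler would emit verbatim.

```python
# Minimal sketch of wrapping a feature-transformation script in a SageMaker
# Pipelines processing step. Role, bucket, and transform.py are placeholders.
import sagemaker
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep
from sagemaker.workflow.pipeline import Pipeline

role = sagemaker.get_execution_role()           # assumes a SageMaker execution role

processor = SKLearnProcessor(
    framework_version="1.2-1",
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

step = ProcessingStep(
    name="FeatureTransform",
    processor=processor,
    code="transform.py",                         # assumed script holding the transformations
    inputs=[ProcessingInput(source="s3://my-bucket/raw/",
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                              destination="s3://my-bucket/features/")],
)

pipeline = Pipeline(name="feature-transform-pipeline", steps=[step])
pipeline.upsert(role_arn=role)                   # create or update; pipeline.start() runs it
```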
Question # 56
A Machine Learning Specialist is training a model to identify the make and model of vehicles in images. The Specialist wants to use transfer learning and an existing model trained on images of general objects. The Specialist collated a large custom dataset of pictures containing different vehicle makes and models. What should the Specialist do to initialize the model to re-train it with the custom data?
A. Initialize the model with random weights in all layers including the last fully connected layer. B. Initialize the model with pre-trained weights in all layers and replace the last fully connected layer. C. Initialize the model with random weights in all layers and replace the last fully connected layer. D. Initialize the model with pre-trained weights in all layers including the last fully connected layer.
Answer: B
Explanation: Transfer learning is a technique that allows us to use a model trained for a certain task as a starting point for a machine learning model for a different task. For image classification, a common practice is to use a pre-trained model that was trained on a large and general dataset, such as ImageNet, and then customize it for the specific task. One way to customize the model is to replace the last fully connected layer, which is responsible for the final classification, with a new layer that has the same number of units as the number of classes in the new task. This way, the model can leverage the features learned by the previous layers, which are generic and useful for many image recognition tasks, and learn to map them to the new classes. The new layer can be initialized with random weights, and the rest of the model can be initialized with the pre-trained weights. This method is also known as feature extraction, as it extracts meaningful features from the pre-trained model and uses them for the new task.
References:
Transfer learning and fine-tuning
Deep transfer learning for image classification: a survey
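A minimal Keras sketch of option B (the class count, base network, and input size are assumptions): all earlier layers keep their pre-trained ImageNet weights, and only the final fully connected layer is replaced with a new, randomly initialized one.

```python
# Minimal sketch: initialize with pre-trained weights and replace the last
# fully connected layer for the new vehicle make/model classes.
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 50  # assumed number of vehicle make/model classes

base = tf.keras.applications.ResNet50(
    weights="imagenet",        # pre-trained weights in all layers
    include_top=False,         # drop the original fully connected classifier
    input_shape=(224, 224, 3),
)
base.trainable = False         # optional: freeze the base for pure feature extraction

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(num_classes, activation="softmax"),  # new randomly initialized head
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```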
Question # 57
A retail company is ingesting purchasing records from its network of 20,000 stores to Amazon S3 by using Amazon Kinesis Data Firehose. The company uses a small, server-based application in each store to send the data to AWS over the internet. The company uses this data to train a machine learning model that is retrained each day. The company's data science team has identified existing attributes on these records that could be combined to create an improved model. Which change will create the required transformed records with the LEAST operational overhead?
A. Create an AWS Lambda function that can transform the incoming records. Enable data transformation on the ingestion Kinesis Data Firehose delivery stream. Use the Lambda function as the invocation target. B. Deploy an Amazon EMR cluster that runs Apache Spark and includes the transformation logic. Use Amazon EventBridge (Amazon CloudWatch Events) to schedule an AWS Lambda function to launch the cluster each day and transform the records that accumulate in Amazon S3. Deliver the transformed records to Amazon S3. C. Deploy an Amazon S3 File Gateway in the stores. Update the in-store software to deliver data to the S3 File Gateway. Use a scheduled daily AWS Glue job to transform the data that the S3 File Gateway delivers to Amazon S3. D. Launch a fleet of Amazon EC2 instances that include the transformation logic. Configure the EC2 instances with a daily cron job to transform the records that accumulate in Amazon S3. Deliver the transformed records to Amazon S3.
Answer: A
Explanation: The solution A will create the required transformed records with the least operational overhead because it uses AWS Lambda and Amazon Kinesis Data Firehose, which are fully managed services that can provide the desired functionality. The solution A involves the following steps:
Create an AWS Lambda function that can transform the incoming records. AWS Lambda is a service that can run code without provisioning or managing servers. AWS Lambda can execute the transformation logic on the purchasing records and add the new attributes to the records1.
Enable data transformation on the ingestion Kinesis Data Firehose delivery stream. Use the Lambda function as the invocation target. Amazon Kinesis Data Firehose is a service that can capture, transform, and load streaming data into AWS data stores. Amazon Kinesis Data Firehose can enable data transformation and invoke the Lambda function to process the incoming records before delivering them to Amazon S3. This can reduce the operational overhead of managing the transformation process and the data storage2.
The other options are not suitable because:
Option B: Deploying an Amazon EMR cluster that runs Apache Spark and includes the transformation logic, using Amazon EventBridge (Amazon CloudWatch Events) to schedule an AWS Lambda function to launch the cluster each day and transform the records that accumulate in Amazon S3, and delivering the transformed records to Amazon S3 will incur more operational overhead than using AWS Lambda and Amazon Kinesis Data Firehose. The company will have to manage the Amazon EMR cluster, the Apache Spark application, the AWS Lambda function, and the Amazon EventBridge rule. Moreover, this solution will introduce a delay in the transformation process, as it will run only once a day3.
Option C: Deploying an Amazon S3 File Gateway in the stores, updating the in-store software to deliver data to the S3 File Gateway, and using a scheduled daily AWS Glue job to transform the data that the S3 File Gateway delivers to Amazon S3 will incur more operational overhead than using AWS Lambda and Amazon Kinesis Data Firehose. The company will have to manage the S3 File Gateway, the in-store software, and the AWS Glue job. Moreover, this solution will introduce a delay in the transformation process, as it will run only once a day4.
Option D: Launching a fleet of Amazon EC2 instances that include the transformation logic, configuring the EC2 instances with a daily cron job to transform the records that accumulate in Amazon S3, and delivering the transformed records to Amazon S3 will incur more operational overhead than using AWS Lambda and Amazon Kinesis Data Firehose. The company will have to manage the EC2 instances, the transformation code, and the cron job. Moreover, this solution will introduce a delay in the transformation process, as it will run only once a day5.
References:
1: AWS Lambda
2: Amazon Kinesis Data Firehose
3: Amazon EMR
4: Amazon S3 File Gateway
5: Amazon EC2
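A minimal sketch of the transformation Lambda function for option A. The record shape (base64-encoded data, recordId, result) is the standard Kinesis Data Firehose transformation contract; the derived attribute itself is a made-up example, since the question does not say which attributes are combined.

```python
# Minimal sketch of a Firehose data-transformation Lambda: decode each record,
# add a derived attribute, and re-encode it for delivery to Amazon S3.
import base64
import json

def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))

        # Assumed example of combining existing attributes into a new one.
        payload["revenue"] = payload.get("quantity", 0) * payload.get("unit_price", 0.0)

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(json.dumps(payload).encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```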
Question # 58
A company wants to enhance audits for its machine learning (ML) systems. The auditing system must be able to perform metadata analysis on the features that the ML models use. The audit solution must generate a report that analyzes the metadata. The solution also must be able to set the data sensitivity and authorship of features. Which solution will meet these requirements with the LEAST development effort?
A. Use Amazon SageMaker Feature Store to select the features. Create a data flow to perform feature-level metadata analysis. Create an Amazon DynamoDB table to store feature-level metadata. Use Amazon QuickSight to analyze the metadata. B. Use Amazon SageMaker Feature Store to set feature groups for the current features that the ML models use. Assign the required metadata for each feature. Use SageMaker Studio to analyze the metadata. C. Use Amazon SageMaker Feature Store to apply custom algorithms to analyze the feature-level metadata that the company requires. Create an Amazon DynamoDB table to store feature-level metadata. Use Amazon QuickSight to analyze the metadata. D. Use Amazon SageMaker Feature Store to set feature groups for the current features that the ML models use. Assign the required metadata for each feature. Use Amazon QuickSight to analyze the metadata.
Answer: D
Explanation: The solution that will meet the requirements with the least development effort is to use Amazon SageMaker Feature Store to set feature groups for the current features that the ML models use, assign the required metadata for each feature, and use Amazon QuickSight to analyze the metadata. This solution can leverage the existing AWS services and features to perform feature-level metadata analysis and reporting.
Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, update, search, and share machine learning (ML) features. The service provides feature management capabilities such as enabling easy feature reuse, low latency serving, time travel, and ensuring consistency between features used in training and inference workflows. A feature group is a logical grouping of ML features whose organization and structure is defined by a feature group schema. A feature group schema consists of a list of feature definitions, each of which specifies the name, type, and metadata of a feature. The metadata can include information such as data sensitivity, authorship, description, and parameters. The metadata can help make features discoverable, understandable, and traceable. Amazon SageMaker Feature Store allows users to set feature groups for the current features that the ML models use, and assign the required metadata for each feature using the AWS SDK for Python (Boto3), AWS Command Line Interface (AWS CLI), or Amazon SageMaker Studio1.
Amazon QuickSight is a fully managed, serverless business intelligence service that makes it easy to create and publish interactive dashboards that include ML insights. Amazon QuickSight can connect to various data sources, such as Amazon S3, Amazon Athena, Amazon Redshift, and Amazon SageMaker Feature Store, and analyze the data using standard SQL or built-in ML-powered analytics. Amazon QuickSight can also create rich visualizations and reports that can be accessed from any device, and securely shared with anyone inside or outside an organization. Amazon QuickSight can be used to analyze the metadata of the features stored in Amazon SageMaker Feature Store, and generate a report that summarizes the metadata analysis2.
The other options are either more complex or less effective than the proposed solution. Using Amazon SageMaker Data Wrangler to select the features and create a data flow to perform feature-level metadata analysis would require additional steps and resources, and may not capture all the metadata attributes that the company requires. Creating an Amazon DynamoDB table to store feature-level metadata would introduce redundancy and inconsistency, as the metadata is already stored in Amazon SageMaker Feature Store. Using SageMaker Studio to analyze the metadata would not generate a report that can be easily shared and accessed by the company.
References:
1: Amazon SageMaker Feature Store – Amazon Web Services
2: Amazon QuickSight – Business Intelligence Service - Amazon Web Services
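A minimal Boto3 sketch (feature group, feature name, and metadata values are placeholders) of assigning data-sensitivity and authorship metadata to a feature in an existing feature group, as option D describes.

```python
# Minimal sketch: attach searchable metadata (sensitivity, authorship) to a
# feature that already exists in a SageMaker Feature Store feature group.
import boto3

sm = boto3.client("sagemaker")

sm.update_feature_metadata(
    FeatureGroupName="orders-features",          # assumed existing feature group
    FeatureName="customer_income",               # assumed feature name
    Description="Annual customer income used by the propensity model",
    ParameterAdditions=[
        {"Key": "data-sensitivity", "Value": "confidential"},
        {"Key": "author", "Value": "data-science-team"},
    ],
)

# The metadata can later be queried (for example with the SageMaker Search API)
# and exported for reporting in Amazon QuickSight.
```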
Question # 59
A company's machine learning (ML) specialist is building a computer vision model to classify 10 different traffic signs. The company has stored 100 images of each class in Amazon S3, and the company has another 10,000 unlabeled images. All the images come from dash cameras and are a size of 224 x 224 pixels. After several training runs, the model is overfitting on the training data. Which actions should the ML specialist take to address this problem? (Select TWO.)
A. Use Amazon SageMaker Ground Truth to label the unlabeled images. B. Use image preprocessing to transform the images into grayscale images. C. Use data augmentation to rotate and translate the labeled images. D. Replace the activation of the last layer with a sigmoid. E. Use the Amazon SageMaker k-nearest neighbors (k-NN) algorithm to label the unlabeled images.
Answer: C,E
Explanation:
Data augmentation is a technique to increase the size and diversity of the training data by applying random transformations such as rotation, translation, scaling, flipping, etc. This can help reduce overfitting and improve the generalization of the model. Data augmentation can be done using the Amazon SageMaker image classification algorithm, which supports various augmentation options such as horizontal_flip, vertical_flip, rotate, brightness, contrast, etc1.
The Amazon SageMaker k-nearest neighbors (k-NN) algorithm is a supervised learning algorithm that can be used to label unlabeled data based on the similarity to the labeled data. The k-NN algorithm assigns a label to an unlabeled instance by finding the k closest labeled instances in the feature space and taking a majority vote among their labels. This can help increase the size and diversity of the training data and reduce overfitting. The k-NN algorithm can be used with the Amazon SageMaker image classification algorithm by extracting features from the images using a pre-trained model and then applying the k-NN algorithm on the feature vectors2.
Using Amazon SageMaker Ground Truth to label the unlabeled images is not a good option because it is a manual and costly process that requires human annotators. Moreover, it does not address the issue of overfitting on the existing labeled data.
Using image preprocessing to transform the images into grayscale images is not a good option because it reduces the amount of information and variation in the images, which can degrade the performance of the model. Moreover, it does not address the issue of overfitting on the existing labeled data.
Replacing the activation of the last layer with a sigmoid is not a good option because it is not suitable for a multi-class classification problem. A sigmoid activation function outputs a value between 0 and 1, which can be interpreted as a probability of belonging to a single class. However, for a multi-class classification problem, the output should be a vector of probabilities that sum up to 1, which can be achieved by using a softmax activation function.
References:
1: Image classification algorithm - Amazon SageMaker
2: k-nearest neighbors (k-NN) algorithm - Amazon SageMaker
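A minimal Keras sketch of option C using preprocessing layers rather than the built-in SageMaker algorithm's augmentation options; the rotation and translation factors are illustrative assumptions.

```python
# Minimal sketch: augment the labeled images with random rotations and
# translations at training time to reduce overfitting.
import tensorflow as tf
from tensorflow.keras import layers

augmentation = tf.keras.Sequential([
    layers.RandomRotation(0.1),            # rotate by up to ±10% of a full turn
    layers.RandomTranslation(0.1, 0.1),    # shift up to 10% vertically and horizontally
    layers.RandomFlip("horizontal"),
])

# Applied on the fly to a tf.data pipeline of (image, label) batches, e.g.:
# train_ds = train_ds.map(lambda x, y: (augmentation(x, training=True), y))
```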
Question # 60
An online retailer collects the following data on customer orders: demographics, behaviors, location, shipment progress, and delivery time. A data scientist joins all the collected datasets. The result is a single dataset that includes 980 variables. The data scientist must develop a machine learning (ML) model to identify groups of customers who are likely to respond to a marketing campaign. Which combination of algorithms should the data scientist use to meet this requirement? (Select TWO.)
A. Latent Dirichlet Allocation (LDA) B. K-means C. Semantic segmentation D. Principal component analysis (PCA) E. Factorization machines (FM)
Answer: B,D
Explanation: The data scientist should use K-means and principal component analysis (PCA) to meet this requirement. K-means is a clustering algorithm that can group customers based on their similarity in the feature space. PCA is a dimensionality reduction technique that can transform the original 980 variables into a smaller set of uncorrelated variables that capture most of the variance in the data. This can help reduce the computational cost and noise in the data, and improve the performance of the clustering algorithm.
References:
Clustering - Amazon SageMaker
Dimensionality Reduction - Amazon SageMaker
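A minimal scikit-learn sketch of combining options D and B (the number of components and clusters are assumptions): PCA first compresses the 980 variables, then K-means groups the customers in the reduced space.

```python
# Minimal sketch: dimensionality reduction with PCA followed by K-means clustering.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X = np.random.rand(5000, 980)                  # placeholder for the joined customer dataset

X_scaled = StandardScaler().fit_transform(X)   # scale features before PCA
X_reduced = PCA(n_components=50).fit_transform(X_scaled)

clusters = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X_reduced)
# Each customer now has a cluster label that can feed the marketing campaign analysis.
```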