Amazon MLS-C01 Sample Questions

Question # 91

A Data Scientist is building a model to predict customer churn using a dataset of 100 continuous numerical features. The Marketing team has not provided any insight about which features are relevant for churn prediction. The Marketing team wants to interpret the model and see the direct impact of relevant features on the model outcome. While training a logistic regression model, the Data Scientist observes that there is a wide gap between the training and validation set accuracy. Which methods can the Data Scientist use to improve the model performance and satisfy the Marketing team’s needs? (Choose two.) 
 

A. Add L1 regularization to the classifier 
B. Add features to the dataset 
C. Perform recursive feature elimination 
D. Perform t-distributed stochastic neighbor embedding (t-SNE) 
E. Perform linear discriminant analysis 
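
For readers who want to see the two techniques named in options A and C in code, here is a minimal scikit-learn sketch; the synthetic data and parameter values are illustrative assumptions, not part of the exam question.

```python
# Illustrative sketch only: L1-regularized logistic regression and recursive
# feature elimination. Synthetic data stands in for the 100-feature churn set.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=100,
                           n_informative=10, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,
                                                  random_state=42)

# L1 penalty drives irrelevant coefficients to exactly zero, which both
# reduces overfitting and keeps the linear model interpretable.
l1_clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
l1_clf.fit(X_train, y_train)
print("Validation accuracy with L1:", l1_clf.score(X_val, y_val))

# Recursive feature elimination keeps only the strongest predictors.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=20)
rfe.fit(X_train, y_train)
print("Features kept by RFE:", rfe.support_.sum())
```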


Question # 92


 A bank wants to launch a low-rate credit promotion. The bank is located in a town that recently experienced economic hardship. Only some of the bank's customers were affected by the crisis, so the bank's credit team must identify which customers to target with the promotion. However, the credit team wants to make sure that loyal customers' full credit history is considered when the decision is made. The bank's data science team developed a model that classifies account transactions and understands credit eligibility. The data science team used the XGBoost algorithm to train the model. The team used 7 years of bank transaction historical data for training and hyperparameter tuning over the course of several days. The accuracy of the model is sufficient, but the credit team is struggling to explain accurately why the model denies credit to some customers. The credit team has almost no skill in data science. What should the data science team do to address this issue in the MOST operationally efficient manner? 
 

A. Use Amazon SageMaker Studio to rebuild the model. Create a notebook that uses the XGBoost training container to perform model training. Deploy the model at an endpoint. Enable Amazon SageMaker Model Monitor to store inferences. Use the inferences to create Shapley values that help explain model behavior. Create a chart that shows features and SHapley Additive exPlanations (SHAP) values to explain to the credit team how the features affect the model outcomes. 
B. Use Amazon SageMaker Studio to rebuild the model. Create a notebook that uses the XGBoost training container to perform model training. Activate Amazon SageMaker Debugger, and configure it to calculate and collect Shapley values. Create a chart that shows features and SHapley Additive exPlanations (SHAP) values to explain to the credit team how the features affect the model outcomes. 
C. Create an Amazon SageMaker notebook instance. Use the notebook instance and the XGBoost library to locally retrain the model. Use the plot_importance() method in the Python XGBoost interface to create a feature importance chart. Use that chart to explain to the credit team how the features affect the model outcomes. 
D. Use Amazon SageMaker Studio to rebuild the model. Create a notebook that uses the XGBoost training container to perform model training. Deploy the model at an endpoint. Use Amazon SageMaker Processing to post-analyze the model and create a feature importance explainability chart automatically for the credit team. 
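
For context, here is a minimal local sketch of the SHAP idea referenced in options A and B, using the open-source xgboost and shap packages rather than the SageMaker Debugger configuration itself; the synthetic data is an assumption standing in for the bank's transaction features.

```python
# Illustrative local sketch of SHAP-based explainability for an XGBoost model.
import shap
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
model = xgb.XGBClassifier(n_estimators=100, max_depth=4).fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Summary chart of which features push predictions toward approval or denial.
shap.summary_plot(shap_values, X)
```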


Question # 93


 A company supplies wholesale clothing to thousands of retail stores. A data scientist must create a model that predicts the daily sales volume for each item for each store. The data scientist discovers that more than half of the stores have been in business for less than 6 months. Sales data is highly consistent from week to week. Daily data from the database has been aggregated weekly, and weeks with no sales are omitted from the current dataset. Five years (100 MB) of sales data is available in Amazon S3. Which factors will adversely impact the performance of the forecast model to be developed, and which actions should the data scientist take to mitigate them? (Choose two.) 
 

A. Detecting seasonality for the majority of stores will be an issue. Request categorical data to relate new stores with similar stores that have more historical data. 
B. The sales data does not have enough variance. Request external sales data from other industries to improve the model's ability to generalize. 
C. Sales data is aggregated by week. Request daily sales data from the source database to enable building a daily model. 
D. The sales data is missing zero entries for item sales. Request that item sales data from the source database include zero entries to enable building the model. 
E. Only 100 MB of sales data is available in Amazon S3. Request 10 years of sales data, which would provide 200 MB of training data for the model. 
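
As a small illustration of the data issue behind options C and D, the pandas sketch below reindexes a sparse series at daily frequency and makes the omitted periods explicit zeros; the column names and dates are hypothetical.

```python
# Illustrative sketch: periods with no sales were dropped from the dataset,
# so the series is reindexed and the gaps are filled with explicit zeros.
import pandas as pd

sales = pd.DataFrame(
    {"date": pd.to_datetime(["2023-01-02", "2023-01-09", "2023-01-23"]),
     "units_sold": [40, 25, 31]}
).set_index("date")

# Build a complete daily index so missing days become explicit zero rows.
full_index = pd.date_range(sales.index.min(), sales.index.max(), freq="D")
daily = sales.reindex(full_index, fill_value=0)
print(daily.head(10))
```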


Question # 94

A credit card company wants to build a credit scoring model to help predict whether a new credit card applicant will default on a credit card payment. The company has collected data from a large number of sources with thousands of raw attributes. Early experiments to train a classification model revealed that many attributes are highly correlated, that the large number of features slows down the training speed significantly, and that there are some overfitting issues. The Data Scientist on this project would like to speed up the model training time without losing a lot of information from the original dataset. Which feature engineering technique should the Data Scientist use to meet the objectives? 
 

A. Run self-correlation on all features and remove highly correlated features 
B. Normalize all numerical values to be between 0 and 1 
C. Use an autoencoder or principal component analysis (PCA) to replace original features with new features 
D. Cluster raw data using k-means and use sample data from each cluster to build a new dataset 
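
For readers who want to see option C in code, here is a minimal PCA sketch with scikit-learn; the synthetic data and the 95% variance threshold are illustrative assumptions.

```python
# Illustrative sketch: replace thousands of correlated raw attributes with a
# smaller set of principal components that retain most of the variance.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=10000, n_features=500,
                           n_informative=30, random_state=1)

# Standardize first so no single attribute dominates the components.
X_scaled = StandardScaler().fit_transform(X)

# Keep enough components to retain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)
print(X_reduced.shape, round(pca.explained_variance_ratio_.sum(), 3))
```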


Question # 95

A machine learning specialist works for a fruit processing company and needs to build a system that categorizes apples into three types. The specialist has collected a dataset that contains 150 images for each type of apple and applied transfer learning, using this dataset, to a neural network that was pretrained on ImageNet. The company requires at least 85% accuracy to make use of the model. After an exhaustive grid search, the optimal hyperparameters produced the following results:
- 68% accuracy on the training set
- 67% accuracy on the validation set
What can the machine learning specialist do to improve the system’s accuracy? 
 

A. Upload the model to an Amazon SageMaker notebook instance and use the Amazon SageMaker HPO feature to optimize the model’s hyperparameters. 
B. Add more data to the training set and retrain the model using transfer learning to reduce the bias. 
C. Use a neural network model with more layers that are pretrained on ImageNet and apply transfer learning to increase the variance. 
D. Train a new model using the current neural network architecture. 
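
For context, the sketch below shows one common transfer-learning setup with torchvision, freezing an ImageNet-pretrained backbone and training a new 3-class head; the model choice, hyperparameters, and omitted training loop are illustrative assumptions, not the question's prescribed method.

```python
# Illustrative transfer-learning sketch: freeze a pretrained backbone and
# train only a new classification head on the (expanded) apple dataset.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for param in model.parameters():
    param.requires_grad = False            # keep pretrained ImageNet features

model.fc = nn.Linear(model.fc.in_features, 3)   # new head: 3 apple types

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
# ...train with a standard loop over the labeled apple images...
```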


Question # 96

A telecommunications company is developing a mobile app for its customers. The company is using an Amazon SageMaker hosted endpoint for machine learning model inferences. Developers want to introduce a new version of the model for a limited number of users who subscribed to a preview feature of the app. After the new version of the model is tested as a preview, developers will evaluate its accuracy. If a new version of the model has better accuracy, developers need to be able to gradually release the new version for all users over a fixed period of time. How can the company implement the testing model with the LEAST amount of operational overhead? 
 

A. Update the ProductionVariant data type with the new version of the model by using the CreateEndpointConfig operation with the InitialVariantWeight parameter set to 0. Specify the TargetVariant parameter for InvokeEndpoint calls for users who subscribed to the preview feature. When the new version of the model is ready for release, gradually increase InitialVariantWeight until all users have the updated version. 
B. Configure two SageMaker hosted endpoints that serve the different versions of the model. Create an Application Load Balancer (ALB) to route traffic to both endpoints based on the TargetVariant query string parameter. Reconfigure the app to send the TargetVariant query string parameter for users who subscribed to the preview feature. When the new version of the model is ready for release, change the ALB's routing algorithm to weighted until all users have the updated version. 
C. Update the DesiredWeightsAndCapacity data type with the new version of the model by using the UpdateEndpointWeightsAndCapacities operation with the DesiredWeight parameter set to 0. Specify the TargetVariant parameter for InvokeEndpoint calls for users who subscribed to the preview feature. When the new version of the model is ready for release, gradually increase DesiredWeight until all users have the updated version. 
D. Configure two SageMaker hosted endpoints that serve the different versions of the model. Create an Amazon Route 53 record that is configured with a simple routing policy and that points to the current version of the model. Configure the mobile app to use the endpoint URL for users who subscribed to the preview feature and to use the Route 53 record for other users. When the new version of the model is ready for release, add a new model version endpoint to Route 53, and switch the policy to weighted until all users have the updated version. 
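
As a reference, here is a minimal boto3 sketch of the variant-weight approach described in option C; the endpoint name, variant names, payload, and weights are hypothetical placeholders.

```python
# Illustrative boto3 sketch: the new variant starts with weight 0 and is
# reachable only via TargetVariant; its weight is then raised gradually.
import boto3

sm = boto3.client("sagemaker")
runtime = boto3.client("sagemaker-runtime")

# Preview users: route calls explicitly to the new variant.
runtime.invoke_endpoint(
    EndpointName="mobile-app-endpoint",
    TargetVariant="model-v2",
    ContentType="text/csv",
    Body="0.5,1.2,3.4",
)

# Gradual rollout: shift traffic weight toward the new variant over time.
sm.update_endpoint_weights_and_capacities(
    EndpointName="mobile-app-endpoint",
    DesiredWeightsAndCapacities=[
        {"VariantName": "model-v1", "DesiredWeight": 80.0},
        {"VariantName": "model-v2", "DesiredWeight": 20.0},
    ],
)
```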


Question # 97

An e-commerce company wants to launch a new cloud-based product recommendation feature for its web application. Due to data localization regulations, any sensitive data must not leave its on-premises data center, and the product recommendation model must be trained and tested using nonsensitive data only. Data transfer to the cloud must use IPsec. The web application is hosted on premises with a PostgreSQL database that contains all the data. The company wants the data to be uploaded securely to Amazon S3 each day for model retraining. How should a machine learning specialist meet these requirements? 
 

A. Create an AWS Glue job to connect to the PostgreSQL DB instance. Ingest tables without sensitive data through an AWS Site-to-Site VPN connection directly into Amazon S3. 
B. Create an AWS Glue job to connect to the PostgreSQL DB instance. Ingest all data through an AWS Site-to-Site VPN connection into Amazon S3 while removing sensitive data using a PySpark job. 
C. Use AWS Database Migration Service (AWS DMS) with table mapping to select PostgreSQL tables with no sensitive data through an SSL connection. Replicate data directly into Amazon S3. 
D. Use PostgreSQL logical replication to replicate all data to PostgreSQL in Amazon EC2 through AWS Direct Connect with a VPN connection. Use AWS Glue to move data from Amazon EC2 to Amazon S3. 
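
For context, here is a minimal boto3 sketch of the DMS table-mapping idea in option C; all ARNs, schema names, and table names are hypothetical placeholders.

```python
# Illustrative sketch: a DMS selection rule that replicates only tables
# without sensitive data to an S3 target endpoint.
import json
import boto3

dms = boto3.client("dms")

table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-nonsensitive-tables",
        "object-locator": {"schema-name": "public", "table-name": "orders"},
        "rule-action": "include",
    }]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="daily-s3-load",
    SourceEndpointArn="arn:aws:dms:region:acct:endpoint:source-postgresql",
    TargetEndpointArn="arn:aws:dms:region:acct:endpoint:target-s3",
    ReplicationInstanceArn="arn:aws:dms:region:acct:rep:replication-instance",
    MigrationType="full-load",
    TableMappings=json.dumps(table_mappings),
)
```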


Question # 98

A retail company is using Amazon Personalize to provide personalized product recommendations for its customers during a marketing campaign. The company sees a significant increase in sales of recommended items to existing customers immediately after deploying a new solution version, but these sales decrease a short time after deployment. Only historical data from before the marketing campaign is available for training. How should a data scientist adjust the solution? 
 

A. Use the event tracker in Amazon Personalize to include real-time user interactions. 
B. Add user metadata and use the HRNN-Metadata recipe in Amazon Personalize. 
C. Implement a new solution using the built-in factorization machines (FM) algorithm in Amazon SageMaker. 
D. Add event type and event value fields to the interactions dataset in Amazon Personalize. 
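
As a reference for option A, the sketch below sends a real-time interaction to an Amazon Personalize event tracker with boto3; the tracking ID, user, session, and item values are hypothetical.

```python
# Illustrative sketch: stream live interactions to Personalize so the
# solution adapts to behavior observed after deployment.
import json
from datetime import datetime, timezone

import boto3

personalize_events = boto3.client("personalize-events")

personalize_events.put_events(
    trackingId="example-tracking-id",
    userId="user-123",
    sessionId="session-456",
    eventList=[{
        "eventType": "purchase",
        "properties": json.dumps({"itemId": "item-789"}),
        "sentAt": datetime.now(timezone.utc),
    }],
)
```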


Question # 99

A data scientist must build a custom recommendation model in Amazon SageMaker for an online retail company. Due to the nature of the company's products, customers buy only 4-5 products every 5-10 years. So, the company relies on a steady stream of new customers. When a new customer signs up, the company collects data on the customer's preferences. Below is a sample of the data available to the data scientist.

 
How should the data scientist split the dataset into a training and test set for this use case?

A. Shuffle all interaction data. Split off the last 10% of the interaction data for the test set. 
B. Identify the most recent 10% of interactions for each user. Split off these interactions for the test set. 
C. Identify the 10% of users with the least interaction data. Split off all interaction data from these users for the test set. 
D. Randomly select 10% of the users. Split off all interaction data from these users for the test set. 
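
For readers who want to see a user-level holdout (the approach described in option D) in code, here is a minimal pandas sketch; the column names and sample rows are hypothetical.

```python
# Illustrative sketch: hold out whole users rather than individual
# interactions, so the test set mimics brand-new customers.
import pandas as pd

interactions = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3", "u3", "u4", "u5", "u5", "u6", "u7"],
    "item_id": ["a", "b", "c", "a", "d", "b", "c", "e", "a", "d"],
})

shuffled_users = (interactions["user_id"].drop_duplicates()
                  .sample(frac=1.0, random_state=42))
n_test = max(1, int(0.1 * len(shuffled_users)))
test_users = set(shuffled_users.iloc[:n_test])

test_set = interactions[interactions["user_id"].isin(test_users)]
train_set = interactions[~interactions["user_id"].isin(test_users)]
print(len(train_set), "train rows,", len(test_set), "test rows")
```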


Question # 100

A media company with a very large archive of unlabeled images, text, audio, and video footage wishes to index its assets to allow rapid identification of relevant content by the Research team. The company wants to use machine learning to accelerate the efforts of its in-house researchers who have limited machine learning expertise. Which is the FASTEST route to index the assets? 
 

A. Use Amazon Rekognition, Amazon Comprehend, and Amazon Transcribe to tag data into distinct categories/classes. 
B. Create a set of Amazon Mechanical Turk Human Intelligence Tasks to label all footage. 
C. Use Amazon Transcribe to convert speech to text. Use the Amazon SageMaker Neural Topic Model (NTM) and Object Detection algorithms to tag data into distinct categories/classes. 
D. Use the AWS Deep Learning AMI and Amazon EC2 GPU instances to create custom models for audio transcription and topic modeling, and use object detection to tag data into distinct categories/classes. 
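
For context, here is a minimal boto3 sketch of the managed-service approach in option A; the bucket, object keys, input text, and job name are hypothetical placeholders.

```python
# Illustrative sketch: tag archive assets with managed AI services instead of
# building custom models.
import boto3

rekognition = boto3.client("rekognition")
comprehend = boto3.client("comprehend")
transcribe = boto3.client("transcribe")

# Label objects and scenes in an archived image.
labels = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "media-archive", "Name": "images/photo1.jpg"}},
    MaxLabels=10,
)

# Extract entities from a text document for indexing.
entities = comprehend.detect_entities(
    Text="Interview transcript about the 2019 product launch.",
    LanguageCode="en",
)

# Convert an audio clip to text so it can be indexed and searched.
transcribe.start_transcription_job(
    TranscriptionJobName="archive-audio-001",
    Media={"MediaFileUri": "s3://media-archive/audio/clip1.mp3"},
    MediaFormat="mp3",
    LanguageCode="en-US",
)
```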

