Last Updated on November 4, 2022 by InfraExam

DP-100 : Designing and Implementing a Data Science Solution on Azure : Part 04

  1. DRAG DROP

You create a multi-class image classification deep learning experiment by using the PyTorch framework. You plan to run the experiment on an Azure Compute cluster that has nodes with GPUs.

    You need to define an Azure Machine Learning service pipeline to perform the monthly retraining of the image classification model. The pipeline must run with minimal cost and minimize the time required to train the model.

    Which three pipeline steps should you run in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q01 049 Question
DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q01 049 Answer
    Explanation:

Step 1: Configure a DataTransferStep() to fetch new image data…

Step 2: Configure a PythonScriptStep() to run image_resize.py on the cpu-compute compute target.

Step 3: Configure an EstimatorStep() to run the training script on the gpu_compute compute target.

    The PyTorch estimator provides a simple way of launching a PyTorch training job on a compute target.
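
A hedged sketch of that three-step pipeline follows; the step and script names track the explanation above, while the data references and the data-movement compute are placeholders:

from azureml.core import Workspace
from azureml.data.data_reference import DataReference
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import DataTransferStep, EstimatorStep, PythonScriptStep
from azureml.train.dnn import PyTorch

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Placeholder references for the incoming images and the training store.
source_ref = DataReference(datastore, data_reference_name='incoming_images',
                           path_on_datastore='incoming')
dest_ref = DataReference(datastore, data_reference_name='training_images',
                         path_on_datastore='images')

# Step 1: fetch the new image data (data movement only, no GPU cost).
fetch_step = DataTransferStep(name='fetch-new-images',
                              source_data_reference=source_ref,
                              destination_data_reference=dest_ref,
                              compute_target=ws.compute_targets['data-factory'])  # placeholder name

# Step 2: resize the images on the inexpensive CPU cluster.
resize_step = PythonScriptStep(name='resize-images', script_name='image_resize.py',
                               source_directory='scripts',
                               compute_target=ws.compute_targets['cpu-compute'])

# Step 3: train on the GPU cluster, which is only billed while this step runs.
pytorch_estimator = PyTorch(source_directory='scripts', entry_script='train.py',
                            compute_target=ws.compute_targets['gpu_compute'],
                            use_gpu=True)
train_step = EstimatorStep(name='train-model', estimator=pytorch_estimator,
                           estimator_entry_script_arguments=[],
                           compute_target=ws.compute_targets['gpu_compute'])

pipeline = Pipeline(workspace=ws, steps=[fetch_step, resize_step, train_step])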

  2. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

    After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

    An IT department creates the following Azure resource groups and resources:

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q02 050

    The IT department creates an Azure Kubernetes Service (AKS)-based inference compute target named aks-cluster in the Azure Machine Learning workspace.

    You have a Microsoft Surface Book computer with a GPU. Python 3.6 and Visual Studio Code are installed.

    You need to run a script that trains a deep neural network (DNN) model and logs the loss and accuracy metrics.

    Solution: Attach the mlvm virtual machine as a compute target in the Azure Machine Learning workspace. Install the Azure ML SDK on the Surface Book and run Python code to connect to the workspace. Run the training script as an experiment on the mlvm remote compute resource.

    Does the solution meet the goal?

    • Yes
    • No
    Explanation:

    Use the VM as a compute target.

    Note: A compute target is a designated compute resource/environment where you run your training script or host your service deployment. This location may be your local machine or a cloud-based compute resource.
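
A minimal sketch of this solution, assuming placeholder values for the mlvm resource ID and SSH credentials:

from azureml.core import Experiment, ScriptRunConfig, Workspace
from azureml.core.compute import ComputeTarget, RemoteCompute

ws = Workspace.from_config()  # run on the Surface Book after installing the SDK

# Attach the mlvm virtual machine as a compute target (placeholder credentials).
attach_config = RemoteCompute.attach_configuration(
    resource_id='<mlvm-resource-id>', ssh_port=22,
    username='azureuser', password='<password>')
mlvm_target = ComputeTarget.attach(ws, 'mlvm', attach_config)
mlvm_target.wait_for_completion(show_output=True)

# Run the training script as an experiment on the remote compute resource.
src = ScriptRunConfig(source_directory='.', script='train.py', compute_target='mlvm')
run = Experiment(ws, 'dnn-training').submit(src)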

  3. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

    After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

    An IT department creates the following Azure resource groups and resources:

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q03 051

    The IT department creates an Azure Kubernetes Service (AKS)-based inference compute target named aks-cluster in the Azure Machine Learning workspace.

    You have a Microsoft Surface Book computer with a GPU. Python 3.6 and Visual Studio Code are installed.

    You need to run a script that trains a deep neural network (DNN) model and logs the loss and accuracy metrics.

    Solution: Install the Azure ML SDK on the Surface Book. Run Python code to connect to the workspace and then run the training script as an experiment on local compute.

    Does the solution meet the goal?

    • Yes
    • No
    Explanation:
    Need to attach the mlvm virtual machine as a compute target in the Azure Machine Learning workspace.
  4. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

    After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

    An IT department creates the following Azure resource groups and resources:

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q04 052

    The IT department creates an Azure Kubernetes Service (AKS)-based inference compute target named aks-cluster in the Azure Machine Learning workspace.

    You have a Microsoft Surface Book computer with a GPU. Python 3.6 and Visual Studio Code are installed.

    You need to run a script that trains a deep neural network (DNN) model and logs the loss and accuracy metrics.

    Solution: Install the Azure ML SDK on the Surface Book. Run Python code to connect to the workspace. Run the training script as an experiment on the aks-cluster compute target.

    Does the solution meet the goal?

    • Yes
    • No
    Explanation:
    Need to attach the mlvm virtual machine as a compute target in the Azure Machine Learning workspace.
  5. HOTSPOT

    You plan to use Hyperdrive to optimize the hyperparameters selected when training a model. You create the following code to define options for the hyperparameter experiment:

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q05 053

    For each of the following statements, select Yes if the statement is true. Otherwise, select No.

    NOTE: Each correct selection is worth one point.

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q05 054 Question
DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q05 054 Answer
    Explanation:

    Box 1: No
    max_total_runs (50 here)
    The maximum total number of runs to create. This is the upper bound; there may be fewer runs when the sample space is smaller than this value.

Box 2: Yes
The policy parameter specifies the early termination policy to use. If None (the default), no early termination policy is used.

Box 3: No
Discrete hyperparameters are specified as a choice among discrete values. choice can be:
– one or more comma-separated values
– a range object
– any arbitrary list object
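
A minimal sketch, assuming a ScriptRunConfig named src and hypothetical hyperparameter names, showing how max_total_runs, the policy parameter, and choice fit together in a HyperDriveConfig:

from azureml.core import ScriptRunConfig
from azureml.train.hyperdrive import (BanditPolicy, HyperDriveConfig,
                                      PrimaryMetricGoal, RandomParameterSampling,
                                      choice, uniform)

# Hypothetical run configuration for the training script.
src = ScriptRunConfig(source_directory='.', script='train.py')

# choice() defines a discrete search space; uniform() a continuous one.
sampling = RandomParameterSampling({
    '--batch_size': choice(16, 32, 64),
    '--learning_rate': uniform(0.01, 0.1)})

hd_config = HyperDriveConfig(
    run_config=src,
    hyperparameter_sampling=sampling,
    policy=BanditPolicy(slack_factor=0.1, evaluation_interval=1),  # early termination
    primary_metric_name='accuracy',
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=50,        # upper bound; fewer runs if the sample space is smaller
    max_concurrent_runs=4)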

  6. HOTSPOT

    You are using Azure Machine Learning to train machine learning models. You need a compute target on which to remotely run the training script.

    You run the following Python code:

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q06 055

    For each of the following statements, select Yes if the statement is true. Otherwise, select No.

    NOTE: Each correct selection is worth one point.

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q06 056 Question
DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q06 056 Answer
    Explanation:

    Box 1: Yes
    The compute is created within your workspace region as a resource that can be shared with other users.

    Box 2: Yes
    It is displayed as a compute cluster.
View compute targets
To see all compute targets for your workspace, use the following steps:
1. Navigate to Azure Machine Learning studio.
2. Under Manage, select Compute.
3. Select tabs at the top to show each type of compute target.

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q06 057

    Box 3: Yes
    min_nodes is not specified, so it defaults to 0.
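
For reference, a minimal provisioning sketch consistent with these answers; the cluster name and VM size are placeholders, and min_nodes is deliberately omitted so it defaults to 0:

from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()

# min_nodes is not specified, so it defaults to 0 and the cluster
# scales down to zero nodes (and zero compute cost) when idle.
config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS2_V2',
                                               max_nodes=4)
cluster = ComputeTarget.create(ws, 'aml-cluster', config)
cluster.wait_for_completion(show_output=True)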

  7. HOTSPOT

    You have an Azure blob container that contains a set of TSV files. The Azure blob container is registered as a datastore for an Azure Machine Learning service workspace. Each TSV file uses the same data schema.

    You plan to aggregate data for all of the TSV files together and then register the aggregated data as a dataset in an Azure Machine Learning workspace by using the Azure Machine Learning SDK for Python.

    You run the following code.

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q07 058

    For each of the following statements, select Yes if the statement is true. Otherwise, select No.

    NOTE: Each correct selection is worth one point.

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q07 059 Question
DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q07 059 Answer
    Explanation:

    Box 1: No
    FileDataset references single or multiple files in datastores or from public URLs. The TSV files need to be parsed.

    Box 2: Yes
    to_path() gets a list of file paths for each file stream defined by the dataset.

Box 3: Yes
TabularDataset.to_pandas_dataframe loads all records from the dataset into a pandas DataFrame.

TabularDataset represents data in a tabular format created by parsing the provided file or list of files.

    Note: TSV is a file extension for a tab-delimited file used with spreadsheet software. TSV stands for Tab Separated Values. TSV files are used for raw data and can be imported into and exported from spreadsheet software. TSV files are essentially text files, and the raw data can be viewed by text editors, though they are often used when moving raw data between spreadsheets.
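
A short sketch of the TabularDataset route described above; the wildcard path and dataset name are placeholders:

from azureml.core import Dataset, Workspace

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Parse all TSV files that share the schema into one TabularDataset.
tsv_dataset = Dataset.Tabular.from_delimited_files(
    path=(datastore, 'data/*.tsv'), separator='\t')

tsv_dataset = tsv_dataset.register(workspace=ws, name='aggregated-tsv',
                                   create_new_version=True)

df = tsv_dataset.to_pandas_dataframe()   # loads all records into a pandas DataFrame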

  8. You create a batch inference pipeline by using the Azure ML SDK. You configure the pipeline parameters by executing the following code:

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q08 059

    You need to obtain the output from the pipeline execution.

    Where will you find the output?

    • the digit_identification.py script
    • the debug log
    • the Activity Log in the Azure portal for the Machine Learning workspace
    • the Inference Clusters tab in Machine Learning studio
    • a file named parallel_run_step.txt located in the output folder
    Explanation:
output_action (str): How the output is to be organized. Currently supported values are 'append_row' and 'summary_only'.
'append_row' – All values output by run() method invocations will be aggregated into one unique file named parallel_run_step.txt that is created in the output location.
'summary_only' – The user script is expected to store the output itself; an output row is still expected for each successfully processed input item.
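
A sketch of a ParallelRunConfig that uses append_row; the environment, compute target, and batch size are placeholder values:

from azureml.core import Environment
from azureml.pipeline.steps import ParallelRunConfig

batch_env = Environment('batch-env')      # placeholder environment

parallel_run_config = ParallelRunConfig(
    source_directory='scripts',
    entry_script='digit_identification.py',
    mini_batch_size='5',
    error_threshold=10,
    output_action='append_row',           # aggregates results into parallel_run_step.txt
    environment=batch_env,
    compute_target='aml-cluster',         # placeholder compute target name
    node_count=2)
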
  9. DRAG DROP

    You create a multi-class image classification deep learning model.

    The model must be retrained monthly with the new image data fetched from a public web portal. You create an Azure Machine Learning pipeline to fetch new data, standardize the size of images, and retrain the model.

    You need to use the Azure Machine Learning SDK to configure the schedule for the pipeline.

    Which four actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q09 060 Question
DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q09 060 Answer
    Explanation:

    Step 1: Publish the pipeline.
    To schedule a pipeline, you’ll need a reference to your workspace, the identifier of your published pipeline, and the name of the experiment in which you wish to create the schedule.

    Step 2: Retrieve the pipeline ID.
    Needed for the schedule.

Step 3: Create a ScheduleRecurrence.
To run a pipeline on a recurring basis, you’ll create a schedule. A Schedule associates a pipeline, an experiment, and a trigger.
First create a schedule recurrence. Example: create a recurrence that begins a run every 15 minutes:
recurrence = ScheduleRecurrence(frequency="Minute", interval=15)

Step 4: Define an Azure Machine Learning pipeline schedule.
Example, continued:
recurring_schedule = Schedule.create(ws, name="MyRecurringSchedule",
                                     description="Based on time",
                                     pipeline_id=pipeline_id,
                                     experiment_name=experiment_name,
                                     recurrence=recurrence)
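
Putting the four steps together in one hedged sketch; the pipeline object, the workspace reference ws, and all names are assumed:

from azureml.pipeline.core import Schedule, ScheduleRecurrence

# Step 1: publish the pipeline (assumes an existing Pipeline object named pipeline).
published_pipeline = pipeline.publish(name='image-retraining-pipeline')

# Step 2: retrieve the pipeline ID needed by the schedule.
pipeline_id = published_pipeline.id

# Step 3: create a recurrence matching the monthly retraining requirement.
recurrence = ScheduleRecurrence(frequency='Month', interval=1)

# Step 4: define the Azure Machine Learning pipeline schedule.
schedule = Schedule.create(ws, name='monthly-retraining',
                           pipeline_id=pipeline_id,
                           experiment_name='image-retraining',
                           recurrence=recurrence)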

  10. HOTSPOT

    You create a script for training a machine learning model in Azure Machine Learning service.

    You create an estimator by running the following code:

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q10 061

    For each of the following statements, select Yes if the statement is true. Otherwise, select No.

    NOTE: Each correct selection is worth one point.

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q10 062 Question
DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q10 062 Answer
    Explanation:

    Box 1: Yes
    Parameter source_directory is a local directory containing experiment configuration and code files needed for a training job.

    Box 2: Yes
    script_params is a dictionary of command-line arguments to pass to the training script specified in entry_script.

    Box 3: No

    Box 4: Yes
    The conda_packages parameter is a list of strings representing conda packages to be added to the Python environment for the experiment.
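
A sketch consistent with these answers; folder, script, parameter, and package names are placeholders:

from azureml.train.estimator import Estimator

estimator = Estimator(
    source_directory='training',          # local folder with code and config files
    entry_script='train.py',
    script_params={'--reg-rate': 0.1},    # command-line arguments for the script
    compute_target='aml-cluster',         # placeholder compute target name
    conda_packages=['scikit-learn'])      # conda packages added to the environment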

  11. HOTSPOT

You have a Python data frame named salesData in the following format:

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q11 063

    The data frame must be unpivoted to a long data format as follows:

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q11 064

    You need to use the pandas.melt() function in Python to perform the transformation.

    How should you complete the code segment? To answer, select the appropriate options in the answer area.

    NOTE: Each correct selection is worth one point.

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q11 065 Question
DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q11 065 Answer
    Explanation:

Box 1: dataFrame
Syntax: pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)

Where frame is a DataFrame.

Box 2: shop
Parameter id_vars: tuple, list, or ndarray, optional.
Column(s) to use as identifier variables.

Box 3: ['2017', '2018']
value_vars: tuple, list, or ndarray, optional.
Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.

Example:
df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
                   'B': {0: 1, 1: 3, 2: 5},
                   'C': {0: 2, 1: 4, 2: 6}})

pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
3  a        C      2
4  b        C      4
5  c        C      6
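
Applied to this question’s frame, with made-up values for illustration:

import pandas as pd

# Hypothetical reconstruction of the salesData frame from the question.
salesData = pd.DataFrame({'shop': ['S1', 'S2'],
                          '2017': [100, 150],
                          '2018': [120, 175]})

pd.melt(salesData, id_vars='shop', value_vars=['2017', '2018'])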

  12. HOTSPOT

    You are working on a classification task. You have a dataset indicating whether a student would like to play soccer and associated attributes. The dataset includes the following columns:

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q12 066

    You need to classify variables by type.

    Which variable should you add to each category? To answer, select the appropriate options in the answer area.

    NOTE: Each correct selection is worth one point.

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q12 067 Question
DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q12 067 Answer
  13. HOTSPOT

    You plan to preprocess text from CSV files. You load the Azure Machine Learning Studio default stop words list.

    You need to configure the Preprocess Text module to meet the following requirements:

– Ensure that multiple related words map to a single canonical form.
    – Remove pipe characters from text.
    – Remove words to optimize information retrieval.

    Which three options should you select? To answer, select the appropriate options in the answer area.

    NOTE: Each correct selection is worth one point.

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q13 068 Question
DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q13 068 Answer
    Explanation:

    Box 1: Remove stop words
    Remove words to optimize information retrieval.
    Remove stop words: Select this option if you want to apply a predefined stopword list to the text column. Stop word removal is performed before any other processes.

Box 2: Lemmatization
Ensure that multiple related words map to a single canonical form.
Lemmatization converts multiple related words to a single canonical form.

    Box 3: Remove special characters
    Remove special characters: Use this option to replace any non-alphanumeric special characters with the pipe | character.

  14. You plan to run a script as an experiment using a Script Run Configuration. The script uses modules from the scipy library as well as several Python packages that are not typically installed in a default conda environment.

    You plan to run the experiment on your local workstation for small datasets and scale out the experiment by running it on more powerful remote compute clusters for larger datasets.

    You need to ensure that the experiment runs successfully on local and remote compute with the least administrative effort.

    What should you do?

    • Do not specify an environment in the run configuration for the experiment. Run the experiment by using the default environment.
    • Create a virtual machine (VM) with the required Python configuration and attach the VM as a compute target. Use this compute target for all experiment runs.
    • Create and register an Environment that includes the required packages. Use this Environment for all experiment runs.
    • Create a config.yaml file defining the conda packages that are required and save the file in the experiment folder.
    • Always run the experiment with an Estimator by using the default packages.
    Explanation:
    If you have an existing Conda environment on your local computer, then you can use the service to create an environment object. By using this strategy, you can reuse your local interactive environment on remote runs.
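
A minimal sketch of that strategy; the specification file and experiment names are placeholders:

from azureml.core import Environment, Experiment, ScriptRunConfig, Workspace

ws = Workspace.from_config()

# Create and register an environment from a conda specification that lists
# scipy and the other required packages.
env = Environment.from_conda_specification(name='experiment-env',
                                           file_path='environment.yml')
env.register(workspace=ws)

# The same registered environment works for local and remote runs; only the
# compute_target argument changes between the two.
src = ScriptRunConfig(source_directory='.', script='train.py', environment=env)
run = Experiment(ws, 'scipy-experiment').submit(src)
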
  15. You write a Python script that processes data in a comma-separated values (CSV) file.

    You plan to run this script as an Azure Machine Learning experiment.

    The script loads the data and determines the number of rows it contains using the following code:

DP-100 Designing and Implementing a Data Science Solution on Azure Part 04 Q15 069

    You need to record the row count as a metric named row_count that can be returned using the get_metrics method of the Run object after the experiment run completes.

    Which code should you use?

• run.upload_file('row_count', './data.csv')
    • run.log('row_count', rows)
    • run.tag('row_count', rows)
    • run.log_table('row_count', rows)
    • run.log_row('row_count', rows)
    Explanation:

Log a numerical or string value to the run with the given name using log(name, value, description=''). Logging a metric to a run causes that metric to be stored in the run record in the experiment. You can log the same metric multiple times within a run, the result being considered a vector of that metric.

Example: run.log("accuracy", 0.95)

    Incorrect Answers:
    E: Using log_row(name, description=None, **kwargs) creates a metric with multiple columns as described in kwargs. Each named parameter generates a column with the value specified. log_row can be called once to log an arbitrary tuple, or multiple times in a loop to generate a complete table.

    Example: run.log_row(“Y over X”, x=1, y=0.4)
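
For this scenario, a minimal version of the training script might log the metric as follows (file name assumed from the question):

import pandas as pd
from azureml.core import Run

run = Run.get_context()          # the experiment run this script executes in

data = pd.read_csv('data.csv')   # load the data as in the question
rows = len(data)

run.log('row_count', rows)       # later retrievable via run.get_metrics()
run.complete()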

  16. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

    After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

    You are creating a new experiment in Azure Machine Learning Studio.

    One class has a much smaller number of observations than the other classes in the training set.

    You need to select an appropriate data sampling strategy to compensate for the class imbalance.

    Solution: You use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode.

    Does the solution meet the goal?

    • Yes
    • No
    Explanation:
SMOTE is used to increase the number of underrepresented cases in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases than simply duplicating existing cases.
  17. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

    After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

    You are creating a new experiment in Azure Machine Learning Studio.

    One class has a much smaller number of observations than the other classes in the training set.

    You need to select an appropriate data sampling strategy to compensate for the class imbalance.

    Solution: You use the Stratified split for the sampling mode.

    Does the solution meet the goal?

    • Yes
    • No
    Explanation:

    Instead use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode.

Note: SMOTE is used to increase the number of underrepresented cases in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases than simply duplicating existing cases.
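
The question concerns the Studio SMOTE sampling mode, but the same idea can be sketched in Python with the imbalanced-learn library; this is a substitution for illustration, not the Studio module itself:

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Synthetic dataset where one class is heavily underrepresented.
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
print('before:', Counter(y))

# SMOTE synthesizes new minority-class samples instead of duplicating existing ones.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print('after: ', Counter(y_res))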

  18. You are creating a machine learning model.

    You need to identify outliers in the data.

    Which two visualizations can you use? Each correct answer presents a complete solution.

    NOTE: Each correct selection is worth one point.

    • Venn diagram
    • Box plot
    • ROC curve
    • Random forest diagram
    • Scatter plot
    Explanation:

A box plot can be used to display outliers; they appear as points beyond the whiskers.

Another way to quickly identify outliers visually is to create a scatter plot.
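
A quick illustration with synthetic data; the values and injected outliers are made up:

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = np.append(rng.normal(50, 5, 100), [95, 110])    # two injected outliers

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.boxplot(values)                       # outliers plotted beyond the whiskers
ax2.scatter(range(len(values)), values)   # outliers stand apart from the cluster
plt.show()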

  19. You are evaluating a completed binary classification machine learning model.

    You need to use the precision as the evaluation metric.

    Which visualization should you use?

    • Violin plot
    • Gradient descent
    • Box plot
    • Binary classification confusion matrix
    Explanation:

    Incorrect Answers:
    A: A violin plot is a visual that traditionally combines a box plot and a kernel density plot.

    B: Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or approximate gradient) of the function at the current point.

    C: A box plot lets you see basic distribution information about your data, such as median, mean, range and quartiles but doesn’t show you how your data looks throughout its range.
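
As a short illustration with made-up labels, precision can be read straight off a binary confusion matrix:

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # hypothetical actual labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # hypothetical predicted labels

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)          # precision = TP / (TP + FP)
print(precision)                    # 0.75 for these labels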

  20. You create a multi-class image classification deep learning model that uses the PyTorch deep learning framework.

    You must configure Azure Machine Learning Hyperdrive to optimize the hyperparameters for the classification model.

    You need to define a primary metric to determine the hyperparameter values that result in the model with the best accuracy score.

    Which three actions must you perform? Each correct answer presents part of the solution.

    NOTE: Each correct selection is worth one point.

    • Set the primary_metric_goal of the estimator used to run the bird_classifier_train.py script to maximize.
    • Add code to the bird_classifier_train.py script to calculate the validation loss of the model and log it as a float value with the key loss.
    • Set the primary_metric_goal of the estimator used to run the bird_classifier_train.py script to minimize.
    • Set the primary_metric_name of the estimator used to run the bird_classifier_train.py script to accuracy.
    • Set the primary_metric_name of the estimator used to run the bird_classifier_train.py script to loss.
    • Add code to the bird_classifier_train.py script to calculate the validation accuracy of the model and log it as a float value with the key accuracy.
    Explanation:

A, D:
primary_metric_name="accuracy",
primary_metric_goal=PrimaryMetricGoal.MAXIMIZE

Optimize the runs to maximize "accuracy". Make sure to log this value in your training script.
    Note:
    primary_metric_name: The name of the primary metric to optimize. The name of the primary metric needs to exactly match the name of the metric logged by the training script.

    primary_metric_goal: It can be either PrimaryMetricGoal.MAXIMIZE or PrimaryMetricGoal.MINIMIZE and determines whether the primary metric will be maximized or minimized when evaluating the runs.

F: The training script calculates the val_accuracy and logs it as "accuracy", which is used as the primary metric.
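
A minimal sketch combining these answers; the sampling space and the script’s internals are assumptions:

from azureml.core import ScriptRunConfig
from azureml.train.hyperdrive import (HyperDriveConfig, PrimaryMetricGoal,
                                      RandomParameterSampling, uniform)

# Inside bird_classifier_train.py, the script must log the validation
# accuracy under the exact key named below, e.g.:
#   Run.get_context().log('accuracy', float(val_accuracy))

src = ScriptRunConfig(source_directory='.', script='bird_classifier_train.py')
sampling = RandomParameterSampling({'--learning_rate': uniform(0.001, 0.1)})

hd_config = HyperDriveConfig(
    run_config=src,
    hyperparameter_sampling=sampling,
    primary_metric_name='accuracy',                   # must match the logged key
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,   # best accuracy = highest value
    max_total_runs=20)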