DP-100 : Designing and Implementing a Data Science Solution on Azure : Part 03

  1. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

    After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

    You are analyzing a numerical dataset which contains missing values in several columns.

    You must clean the missing values using an appropriate operation without affecting the dimensionality of the feature set.

    You need to analyze a full dataset to include all values.

    Solution: Calculate the column median value and use the median value as the replacement for any missing value in the column.

    Does the solution meet the goal?

    • Yes
    • No

    Explanation:
    Use the Multiple Imputation by Chained Equations (MICE) method.
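
For comparison, here is a minimal sketch of what the proposed median replacement does, using only the Python standard library. The sample column and the use of None as the missing-value marker are assumptions made for illustration. The number of columns is unchanged, but each gap is filled from that single column alone, whereas MICE models each incomplete column from the other columns.

```python
from statistics import median

def impute_median(column):
    """Return a copy of `column` with None entries replaced by the median
    of the observed (non-missing) values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]

# Hypothetical numeric column with two missing entries.
ages = [22, None, 35, 41, None, 29]
print(impute_median(ages))  # [22, 32.0, 35, 41, 32.0, 29]
```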

  2. You create an Azure Machine Learning workspace.

    You must create a custom role named DataScientist that meets the following requirements:

    – Role members must not be able to delete the workspace.
    – Role members must not be able to create, update, or delete compute resources in the workspace.
    – Role members must not be able to add new users to the workspace.

    You need to create a JSON file for the DataScientist role in the Azure Machine Learning workspace.

    The custom role must enforce the restrictions specified by the IT Operations team.

    Which JSON code segment should you use?

    • DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q02 020
    • DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q02 021
    • DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q02 022
    • DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q02 023
      Explanation:

      The following custom role can do everything in the workspace except for the following actions:
      – It can’t create or update a compute resource.
      – It can’t delete a compute resource.
      – It can’t add, delete, or alter role assignments.
      – It can’t delete the workspace.

      To create a custom role, first construct a role definition JSON file that specifies the permission and scope for the role. The following example defines a custom role named “Data Scientist Custom” scoped at a specific workspace level:

      data_scientist_custom_role.json:
      {
          "Name": "Data Scientist Custom",
          "IsCustom": true,
          "Description": "Can run experiment but can't create or delete compute.",
          "Actions": ["*"],
          "NotActions": [
              "Microsoft.MachineLearningServices/workspaces/*/delete",
              "Microsoft.MachineLearningServices/workspaces/write",
              "Microsoft.MachineLearningServices/workspaces/computes/*/write",
              "Microsoft.MachineLearningServices/workspaces/computes/*/delete",
              "Microsoft.Authorization/*/write"
          ],
          "AssignableScopes": [
              "/subscriptions/<subscription_id>/resourceGroups/<resource_group_name>/providers/Microsoft.MachineLearningServices/workspaces/<workspace_name>"
          ]
      }
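
The role definition above can be explored with a short sketch that treats Actions as an allow list and NotActions as exclusions. The wildcard matching here uses Python's fnmatch as an approximation, which is an assumption and not necessarily how Azure evaluates patterns; the two sample action strings are illustrative only.

```python
import json
from fnmatch import fnmatchcase

# Same role definition as above; placeholders in AssignableScopes are left as-is.
ROLE = {
    "Name": "Data Scientist Custom",
    "IsCustom": True,
    "Description": "Can run experiment but can't create or delete compute.",
    "Actions": ["*"],
    "NotActions": [
        "Microsoft.MachineLearningServices/workspaces/*/delete",
        "Microsoft.MachineLearningServices/workspaces/write",
        "Microsoft.MachineLearningServices/workspaces/computes/*/write",
        "Microsoft.MachineLearningServices/workspaces/computes/*/delete",
        "Microsoft.Authorization/*/write",
    ],
    "AssignableScopes": [
        "/subscriptions/<subscription_id>/resourceGroups/<resource_group_name>"
        "/providers/Microsoft.MachineLearningServices/workspaces/<workspace_name>"
    ],
}

def is_permitted(action, role=ROLE):
    """Approximate check: allowed by Actions and not excluded by NotActions."""
    name = action.lower()
    allowed = any(fnmatchcase(name, p.lower()) for p in role["Actions"])
    denied = any(fnmatchcase(name, p.lower()) for p in role["NotActions"])
    return allowed and not denied

# Write the definition out as JSON text, e.g. for data_scientist_custom_role.json.
role_json = json.dumps(ROLE, indent=4)

print(is_permitted("Microsoft.Authorization/roleAssignments/write"))
print(is_permitted("Microsoft.MachineLearningServices/workspaces/experiments/runs/read"))
```

Under these assumptions the first call reports that changing role assignments is blocked, and the second reports that reading experiment runs is still allowed.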

  3. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

    After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

    You are a data scientist using Azure Machine Learning Studio.

    You need to normalize values to produce an output column into bins to predict a target column.

    Solution: Apply an Equal Width with Custom Start and Stop binning mode.

    Does the solution meet the goal?

    • Yes
    • No
    Explanation:
    Use the Entropy MDL binning mode which has a target column.
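
To see why an equal-width mode does not satisfy the requirement, consider this minimal pure-Python sketch of equal-width binning with a custom start and stop. It is an illustration under assumed inputs, not the Studio module's implementation. The function never receives a target column, so the bin edges cannot depend on it.

```python
def equal_width_bins(values, n_bins, start, stop):
    """Assign each value to one of n_bins equally wide intervals on [start, stop].

    Values outside the range are clamped into the first or last bin.
    """
    width = (stop - start) / n_bins
    assigned = []
    for v in values:
        index = int((v - start) / width)
        index = max(0, min(n_bins - 1, index))  # clamp out-of-range values
        assigned.append(index)
    return assigned

# Hypothetical feature values; no label column is involved anywhere.
print(equal_width_bins([1, 4, 5, 9, 10], n_bins=3, start=0, stop=9))  # [0, 1, 1, 2, 2]
```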
  4. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

    After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

    You are a data scientist using Azure Machine Learning Studio.

    You need to normalize values to produce an output column into bins to predict a target column.

    Solution: Apply a Quantiles binning mode with a PQuantile normalization.

    Does the solution meet the goal?

    • Yes
    • No
    Explanation:
    Use the Entropy MDL binning mode which has a target column.
  5. HOTSPOT

    You are evaluating a Python NumPy array that contains six data points defined as follows:

    data = [10, 20, 30, 40, 50, 60]

    You must generate the following output by using the k-fold algorithm implementation in the Python Scikit-learn machine learning library:

    train: [10 40 50 60], test: [20 30]
    train: [20 30 40 60], test: [10 50]
    train: [10 20 30 50], test: [40 60]

    You need to implement a cross-validation to generate the output.

    How should you complete the code segment? To answer, select the appropriate code segment in the dialog box in the answer area.

    NOTE: Each correct selection is worth one point.

    DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q05 024 Question
    DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q05 024 Answer
    Explanation:

    Box 1: k-fold

    Box 2: 3
    K-Folds cross-validator provides train/test indices to split data in train/test sets. Split dataset into k consecutive folds (without shuffling by default).
    The parameter n_splits (int, default=5 since scikit-learn 0.22; 3 in earlier releases) is the number of folds. It must be at least 2.

    Box 3: data

    Example:

    >>> import numpy as np
    >>> from sklearn.model_selection import KFold
    >>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
    >>> y = np.array([1, 2, 3, 4])
    >>> kf = KFold(n_splits=2)
    >>> kf.get_n_splits(X)
    2
    >>> print(kf)
    KFold(n_splits=2, random_state=None, shuffle=False)
    >>> for train_index, test_index in kf.split(X):
    ...     print("TRAIN:", train_index, "TEST:", test_index)
    ...     X_train, X_test = X[train_index], X[test_index]
    ...     y_train, y_test = y[train_index], y[test_index]
    TRAIN: [2 3] TEST: [0 1]
    TRAIN: [0 1] TEST: [2 3]
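
The consecutive-fold behaviour described above can be sketched without any dependencies. This is a simplified illustration, not scikit-learn's code, and the helper name kfold_indices is hypothetical. Applied to the six data points from the question, it produces consecutive test sets, which suggests that the non-consecutive output shown in the question would require shuffling to be enabled.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for consecutive, unshuffled folds.

    The first n_samples % n_splits folds get one extra sample.
    """
    if n_splits < 2 or n_splits > n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop

data = [10, 20, 30, 40, 50, 60]
for train, test in kfold_indices(len(data), 3):
    print("train:", [data[i] for i in train], "test:", [data[i] for i in test])
# train: [30, 40, 50, 60] test: [10, 20]
# train: [10, 20, 50, 60] test: [30, 40]
# train: [10, 20, 30, 40] test: [50, 60]
```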

  6. You are working with a time series dataset in Azure Machine Learning Studio.

    You need to split your dataset into training and testing subsets by using the Split Data module.

    Which splitting mode should you use?

    • Recommender Split
    • Regular Expression Split
    • Relative Expression Split
    • Split Rows with the Randomized split parameter set to true
    Explanation:

    Split Rows: Use this option if you just want to divide the data into two parts. You can specify the percentage of data to put in each split, but by default, the data is divided 50-50.

    Incorrect Answers:
    B: Regular Expression Split: Choose this option when you want to divide your dataset by testing a single column for a value.
    C: Relative Expression Split: Use this option whenever you want to apply a condition to a number column.

  7. HOTSPOT

    You are preparing to build a deep learning convolutional neural network model for image classification. You create a script to train the model using CUDA devices.

    You must submit an experiment that runs this script in the Azure Machine Learning workspace.

    The following compute resources are available:

    – a Microsoft Surface device on which Microsoft Office has been installed. Corporate IT policies prevent the installation of additional software
    – a Compute Instance named ds-workstation in the workspace with 2 CPUs and 8 GB of memory
    – an Azure Machine Learning compute target named cpu-cluster with eight CPU-based nodes
    – an Azure Machine Learning compute target named gpu-cluster with four CPU and GPU-based nodes

    You need to specify the compute resources to be used for running the code to submit the experiment, and for running the script in order to minimize model training time.

    Which resources should the data scientist use? To answer, select the appropriate options in the answer area.

    NOTE: Each correct selection is worth one point.

    DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q07 025 Question
    DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q07 025 Answer
    Explanation:

    Box 1: the ds-workstation compute instance
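    The ds-workstation compute instance is sufficient for running the code that submits the experiment. The Surface device cannot be used because corporate policy prevents installing additional software on it.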

    Box 2: the gpu-cluster compute target
    Just as GPUs revolutionized deep learning through unprecedented training and inferencing performance, RAPIDS enables traditional machine learning practitioners to unlock game-changing performance with GPUs. With RAPIDS on Azure Machine Learning service, users can accelerate the entire machine learning pipeline, including data processing, training and inferencing, with GPUs from the NC_v3, NC_v2, ND or ND_v2 families. Users can unlock performance gains of more than 20X (with 4 GPUs), slashing training times from hours to minutes and dramatically reducing time-to-insight.

  8. You create an Azure Machine Learning workspace. You are preparing a local Python environment on a laptop computer. You want to use the laptop to connect to the workspace and run experiments.

    You create the following config.json file.

    {
    "workspace_name" : "ml-workspace"
    }

    You must use the Azure Machine Learning SDK to interact with data and experiments in the workspace.

    You need to configure the config.json file to connect to the workspace from the Python environment.

    Which two additional parameters must you add to the config.json file in order to connect to the workspace? Each correct answer presents part of the solution.

    NOTE: Each correct selection is worth one point.

    • login
    • resource_group
    • subscription_id
    • key
    • region
    Explanation:

    To use the same workspace in multiple environments, create a JSON configuration file. The configuration file saves your subscription (subscription_id), resource group (resource_group), and workspace name (workspace_name) so that it can be easily loaded.

    The following sample shows how to create a workspace.
    from azureml.core import Workspace
    ws = Workspace.create(name='myworkspace',
                          subscription_id='<azure-subscription-id>',
                          resource_group='myresourcegroup',
                          create_resource_group=True,
                          location='eastus2'
                          )
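
As a minimal sketch of the completed configuration file, the snippet below writes a config.json with the three keys named in the explanation and checks which keys are missing. The helper names and the resource group value are assumptions for illustration; the subscription ID is left as a placeholder.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")

def missing_keys(config):
    """Return the required keys that are absent from a config dictionary."""
    return [key for key in REQUIRED_KEYS if key not in config]

def write_config(path, subscription_id, resource_group, workspace_name):
    """Write a workspace configuration file containing all three required keys."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=4)

# The file from the question only has workspace_name.
print(missing_keys({"workspace_name": "ml-workspace"}))

# Write a complete file to a temporary folder and read it back.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "config.json")
    write_config(path, "<azure-subscription-id>", "myresourcegroup", "ml-workspace")
    with open(path, encoding="utf-8") as handle:
        loaded = json.load(handle)

print(missing_keys(loaded))  # []
```

With a complete file in place, the SDK's Workspace.from_config() can load the workspace without repeating these values in code.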

  9. HOTSPOT

    You are performing a classification task in Azure Machine Learning Studio.

    You must prepare balanced testing and training samples based on a provided data set.

    You need to split the data with a 0.75:0.25 ratio.

    Which value should you use for each parameter? To answer, select the appropriate options in the answer area.

    NOTE: Each correct selection is worth one point.

    DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q09 026 Question
    DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q09 026 Answer
    Explanation:

    Box 1: Split rows
    Use the Split Rows option if you just want to divide the data into two parts. You can specify the percentage of data to put in each split, but by default, the data is divided 50-50.

    You can also randomize the selection of rows in each group, and use stratified sampling. In stratified sampling, you must select a single column of data for which you want values to be apportioned equally among the two result datasets.

    Box 2: 0.75
    If you specify a number as a percentage, or if you use a string that contains the “%” character, the value is interpreted as a percentage. All percentage values must be within the range (0, 100), not including the values 0 and 100.

    Box 3: Yes
    To ensure splits are balanced.

    Box 4: No
    If you use the option for a stratified split, the output datasets can be further divided by subgroups, by selecting a strata column.
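
The settings above (Split Rows, a 0.75 fraction, randomized, no stratification) can be mimicked with a small standard-library sketch. It is an approximation for illustration only, not the Split Data module itself; the seed parameter and the sample rows are assumptions.

```python
import random

def split_rows(rows, fraction=0.75, randomized=True, seed=0):
    """Split rows into two lists, the first holding `fraction` of the rows."""
    rows = list(rows)
    if randomized:
        # Shuffle a copy so the caller's data is left untouched.
        random.Random(seed).shuffle(rows)
    cut = int(round(len(rows) * fraction))
    return rows[:cut], rows[cut:]

first, second = split_rows(range(8))
print(len(first), len(second))  # 6 2
```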

  10. You create an Azure Machine Learning compute resource to train models. The compute resource is configured as follows:

    – Minimum nodes: 2
    – Maximum nodes: 4

    You must decrease the minimum number of nodes and increase the maximum number of nodes to the following values:

    – Minimum nodes: 0
    – Maximum nodes: 8

    You need to reconfigure the compute resource.

    What are three possible ways to achieve this goal? Each correct answer presents a complete solution.

    NOTE: Each correct selection is worth one point.

    • Use the Azure Machine Learning studio.
    • Run the update method of the AmlCompute class in the Python SDK.
    • Use the Azure portal.
    • Use the Azure Machine Learning designer.
    • Run the refresh_state() method of the BatchCompute class in the Python SDK.
    Explanation:

    A: You can manage assets and resources in the Azure Machine Learning studio.

    B: The update(min_nodes=None, max_nodes=None, idle_seconds_before_scaledown=None) of the AmlCompute class updates the ScaleSettings for this AmlCompute target.

    C: To change the nodes in the cluster, use the UI for your cluster in the Azure portal.
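
Option B can be sketched as a small helper around the update method quoted above. To keep the example runnable without the SDK, it uses a hypothetical RecordingCluster stand-in that only records the call; with the real SDK the cluster object would be retrieved from the workspace, as indicated in the comments.

```python
class RecordingCluster:
    """Hypothetical stand-in for an AmlCompute object; it only records calls."""

    def __init__(self):
        self.calls = []

    def update(self, min_nodes=None, max_nodes=None, idle_seconds_before_scaledown=None):
        self.calls.append({"min_nodes": min_nodes, "max_nodes": max_nodes})

def rescale_cluster(cluster, min_nodes=0, max_nodes=8):
    """Apply new scale settings to an object exposing AmlCompute's update method."""
    if min_nodes < 0 or max_nodes < min_nodes:
        raise ValueError("expected 0 <= min_nodes <= max_nodes")
    cluster.update(min_nodes=min_nodes, max_nodes=max_nodes)

# With the real SDK (assuming azureml-core is installed and `ws` is a Workspace):
#   from azureml.core.compute import ComputeTarget
#   cluster = ComputeTarget(workspace=ws, name="<cluster-name>")
#   rescale_cluster(cluster)
demo = RecordingCluster()
rescale_cluster(demo)
print(demo.calls)  # [{'min_nodes': 0, 'max_nodes': 8}]
```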

  11. HOTSPOT

    You have a dataset that contains 2,000 rows. You are building a machine learning classification model by using Azure Machine Learning Studio. You add a Partition and Sample module to the experiment.

    You need to configure the module. You must meet the following requirements:

    – Divide the data into subsets
    – Assign the rows into folds using a round-robin method
    – Allow rows in the dataset to be reused

    How should you configure the module? To answer, select the appropriate options in the dialog box in the answer area.

    NOTE: Each correct selection is worth one point.

    DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q11 027 Question
    DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q11 027 Answer
    Explanation: Use the Split data into partitions option when you want to divide the dataset into subsets of the data. This option is also useful when you want to create a custom number of folds for cross-validation, or to split rows into several groups.
    1. Add the Partition and Sample module to your experiment in Studio (classic), and connect the dataset.
    2. For Partition or sample mode, select Assign to Folds.
    3. Use replacement in the partitioning: Select this option if you want the sampled row to be put back into the pool of rows for potential reuse. As a result, the same row might be assigned to several folds.
    4. If you do not use replacement (the default option), the sampled row is not put back into the pool of rows for potential reuse. As a result, each row can be assigned to only one fold.
    5. Randomized split: Select this option if you want rows to be randomly assigned to folds.
    If you do not select this option, rows are assigned to folds using the round-robin method.
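
The round-robin assignment mentioned in the last step can be pictured with a short sketch. This is a plain-Python illustration, not the module's implementation, and it does not model the replacement option; the function name and row counts are assumptions.

```python
def assign_to_folds(n_rows, n_folds):
    """Deal row indices into folds in turn: row 0 to fold 0, row 1 to fold 1, and so on."""
    folds = [[] for _ in range(n_folds)]
    for row in range(n_rows):
        folds[row % n_folds].append(row)
    return folds

print(assign_to_folds(7, 3))  # [[0, 3, 6], [1, 4], [2, 5]]
```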
  12. You create a new Azure subscription. No resources are provisioned in the subscription.

    You need to create an Azure Machine Learning workspace.

    What are three possible ways to achieve this goal? Each correct answer presents a complete solution.

    NOTE: Each correct selection is worth one point.

    • Run Python code that uses the Azure ML SDK library and calls the Workspace.get method with name, subscription_id, and resource_group parameters.
    • Navigate to Azure Machine Learning studio and create a workspace.
    • Use the Azure Command Line Interface (CLI) with the Azure Machine Learning extension to call the az group create function with --name and --location parameters, and then the az ml workspace create function, specifying -w and -g parameters for the workspace name and resource group.
    • Navigate to Azure Machine Learning studio and create a workspace.
    • Run Python code that uses the Azure ML SDK library and calls the Workspace.get method with name, subscription_id, and resource_group parameters.
    Explanation:

    B: You can create a workspace in the Azure Machine Learning studio

    C: You can create a workspace for Azure Machine Learning with Azure CLI
    Install the machine learning extension.

    Create a resource group: az group create --name <resource-group-name> --location <location>

    To create a new workspace where the services are automatically created, use the following command: az ml workspace create -w <workspace-name> -g <resource-group-name>

    D: You can create and manage Azure Machine Learning workspaces in the Azure portal.
    Sign in to the Azure portal by using the credentials for your Azure subscription.
    In the upper-left corner of Azure portal, select + Create a resource.
    Use the search bar to find Machine Learning.
    Select Machine Learning.
    In the Machine Learning pane, select Create to begin.

    DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q12 028
  13. HOTSPOT

    You create an Azure Machine Learning workspace and set up a development environment. You plan to train a deep neural network (DNN) by using the TensorFlow framework and by using estimators to submit training scripts.

    You must optimize computation speed for training runs.

    You need to choose the appropriate estimator to use as well as the appropriate training compute target configuration.

    Which values should you use? To answer, select the appropriate options in the answer area.

    NOTE: Each correct selection is worth one point.

    DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q13 029 Question
    DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q13 029 Answer
    Explanation:

    Box 1: TensorFlow
    TensorFlow represents an estimator for training in TensorFlow experiments.

    Box 2: 12 vCPU, 112 GB memory..,2 GPU,..
    Use GPUs for the deep neural network.

  14. You are analyzing a dataset containing historical data from a local taxi company. You are developing a regression model.

    You must predict the fare of a taxi trip.

    You need to select performance metrics to correctly evaluate the regression model.

    Which two metrics can you use? Each correct answer presents a complete solution.

    NOTE: Each correct selection is worth one point.

    • a Root Mean Square Error value that is low
    • an R-Squared value close to 0
    • an F1 score that is low
    • an R-Squared value close to 1
    • an F1 score that is high
    • a Root Mean Square Error value that is high
    Explanation:

    RMSE and R² are both metrics for regression models.

    A: Root mean squared error (RMSE) creates a single value that summarizes the error in the model. By squaring the difference, the metric disregards the difference between over-prediction and under-prediction.

    D: Coefficient of determination, often referred to as R², represents the predictive power of the model as a value between 0 and 1. Zero means the model is random (explains nothing); 1 means there is a perfect fit. However, caution should be used in interpreting R² values, as low values can be entirely normal and high values can be suspect.

    Incorrect Answers:
    C, E: F-score is used for classification models, not for regression models.
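
Both regression metrics are straightforward to compute by hand. The sketch below uses only the standard library; the fare values are made up for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: lower is better, 0 means a perfect fit."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 means a perfect fit."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical taxi fares and model predictions.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 30.0]
print(round(rmse(actual, predicted), 3))       # 1.633
print(round(r_squared(actual, predicted), 3))  # 0.96
```

Here the squared errors sum to 8, so RMSE is the square root of 8/3, and R² is 1 - 8/200.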

  15. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

    After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

    You are using Azure Machine Learning to run an experiment that trains a classification model.

    You want to use Hyperdrive to find parameters that optimize the AUC metric for the model. You configure a HyperDriveConfig for the experiment by running the following code:

    DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q15 030

    You plan to use this configuration to run a script that trains a random forest model and then tests it with validation data. The label values for the validation data are stored in a variable named y_test, and the predicted probabilities from the model are stored in a variable named y_predicted.

    You need to add logging to the script to allow Hyperdrive to optimize hyperparameters for the AUC metric.

    Solution: Run the following code:

    DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q15 031

    Does the solution meet the goal?

    • Yes
    • No
    Explanation:

    Python printing/logging example:
    logging.info(message)

    Destination: Driver logs, Azure Machine Learning designer

  16. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

    After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

    You are using Azure Machine Learning to run an experiment that trains a classification model.

    You want to use Hyperdrive to find parameters that optimize the AUC metric for the model. You configure a HyperDriveConfig for the experiment by running the following code:

    DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q16 032

    You plan to use this configuration to run a script that trains a random forest model and then tests it with validation data. The label values for the validation data are stored in a variable named y_test, and the predicted probabilities from the model are stored in a variable named y_predicted.

    You need to add logging to the script to allow Hyperdrive to optimize hyperparameters for the AUC metric.

    Solution: Run the following code:

    DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q16 033

    Does the solution meet the goal?

    •  Yes
    • No
    Explanation:

    Use a solution with logging.info(message) instead.

    Note: Python printing/logging example:
    logging.info(message)

    Destination: Driver logs, Azure Machine Learning designer

  17. Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

    After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

    You are using Azure Machine Learning to run an experiment that trains a classification model.

    You want to use Hyperdrive to find parameters that optimize the AUC metric for the model. You configure a HyperDriveConfig for the experiment by running the following code:

    DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q17 034

    You plan to use this configuration to run a script that trains a random forest model and then tests it with validation data. The label values for the validation data are stored in a variable named y_test, and the predicted probabilities from the model are stored in a variable named y_predicted.

    You need to add logging to the script to allow Hyperdrive to optimize hyperparameters for the AUC metric.

    Solution: Run the following code:

    DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q17 035

    Does the solution meet the goal?

    •  Yes
    • No
    Explanation:

    Use a solution with logging.info(message) instead.

    Note: Python printing/logging example:
    logging.info(message)

    Destination: Driver logs, Azure Machine Learning designer

  18. You use the following code to run a script as an experiment in Azure Machine Learning:

    DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q18 036

    You must identify the output files that are generated by the experiment run.

    You need to add code to retrieve the output file names.

    Which code segment should you add to the script?

    • files = run.get_properties()
    • files= run.get_file_names()
    • files = run.get_details_with_logs()
    • files = run.get_metrics()
    • files = run.get_details()
    Explanation:
    You can list all of the files that are associated with this run record by calling run.get_file_names().
  19. You write five Python scripts that must be processed in the order specified in Exhibit A – which allows the same modules to run in parallel, but will wait for modules with dependencies.

    You must create an Azure Machine Learning pipeline using the Python SDK, because you want the script that creates the pipeline to be tracked in your version control system. You have created five PythonScriptSteps and have named the variables to match the module names.

    DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q19 037

    You need to create the pipeline shown. Assume all relevant imports have been done.

    Which Python code segment should you use?

    • DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q19 038
    • DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q19 039
    • DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q19 040
    • DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q19 041
    Explanation:
    The steps parameter is an array of steps. To build pipelines that have multiple steps, place the steps in order in this array.
  20. You create a datastore named training_data that references a blob container in an Azure Storage account. The blob container contains a folder named csv_files in which multiple comma-separated values (CSV) files are stored.

    You have a script named train.py in a local folder named ./script that you plan to run as an experiment using an estimator. The script includes the following code to read data from the csv_files folder:

    DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q20 042

    You have the following script.

    DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q20 043

    You need to configure the estimator for the experiment so that the script can read the data from a data reference named data_ref that references the csv_files folder in the training_data datastore.

    Which code should you use to configure the estimator?

    • DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q20 044
    • DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q20 045
    • DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q20 046
    • DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q20 047
    • DP-100 Designing and Implementing a Data Science Solution on Azure Part 03 Q20 048
    Explanation:

    Besides passing the dataset through the input parameters in the estimator, you can also pass the dataset through script_params and get the data path (mounting point) in your training script via arguments. This way, you can keep your training script independent of azureml-sdk. In other words, you will be able to use the same training script for local debugging and remote training on any cloud platform.

    Example:
    from azureml.train.sklearn import SKLearn

    script_params = {
        # mount the dataset on the remote compute and pass the mounted path as an argument to the training script
        '--data-folder': mnist_ds.as_named_input('mnist').as_mount(),
        '--regularization': 0.5
    }

    est = SKLearn(source_directory=script_folder,
                  script_params=script_params,
                  compute_target=compute_target,
                  environment_definition=env,
                  entry_script='train_mnist.py')

    # Run the experiment
    run = experiment.submit(est)
    run.wait_for_completion(show_output=True)
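
On the receiving side, the training script can read those arguments with argparse, which is what keeps it independent of azureml-sdk. The sketch below is one possible shape for that parsing code; the parse_args helper and the sample path are assumptions, while the argument names match the script_params keys in the example.

```python
import argparse

def parse_args(argv=None):
    """Parse the arguments that the estimator passes to the training script."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-folder", type=str, dest="data_folder",
                        required=True, help="mount point of the training data")
    parser.add_argument("--regularization", type=float, default=0.5,
                        help="regularization rate")
    return parser.parse_args(argv)

# Simulate the command line that a run would supply (hypothetical path).
args = parse_args(["--data-folder", "/tmp/mnist", "--regularization", "0.5"])
print(args.data_folder, args.regularization)  # /tmp/mnist 0.5
```

Calling parse_args() with no argument reads sys.argv, so the same script works for local debugging and for a remote run.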

    Incorrect Answers:
    A: Pandas DataFrame not used.
