DOP-C01 : AWS DevOps Engineer Professional : Part 21




  1. What is required to achieve 10 gigabit network throughput on EC2? You already selected cluster-compute, 10 Gbps instances with enhanced networking, and your workload is already network-bound, but you are not seeing 10 gigabit speeds.

    • Enable biplex networking on your servers, so packets are non-blocking in both directions and there’s no switching overhead.
    • Ensure the instances are in different VPCs so you don’t saturate the Internet Gateway on any one VPC.
    • Select PIOPS for your drives and mount several, so you can provision sufficient disk throughput.
    • Use a placement group for your instances so the instances are physically near each other in the same Availability Zone.
    Explanation:

    You are not guaranteed 10 gigabit performance, except within a placement group. A placement group is a logical grouping of instances within a single Availability Zone. Using placement groups enables applications to participate in a low-latency, 10 Gbps network. Placement groups are recommended for applications that benefit from low network latency, high network throughput, or both.

  2. If you want CloudFormation stack status updates to show up in a continuous delivery system in as close to real time as possible, how should you achieve this?

    • Use a long-poll on the Resources object in your CloudFormation stack and display those state changes in the UI for the system.
    • Use a long-poll on the ListStacks API call for your CloudFormation stack and display those state changes in the UI for the system.
    • Subscribe your continuous delivery system to an SNS topic that you also tell your CloudFormation stack to publish events into.
    • Subscribe your continuous delivery system to an SQS queue that you also tell your CloudFormation stack to publish events into.
    Explanation:

    Use NotificationARNs.member.N when making a CreateStack call to push stack events into SNS in nearly real-time.
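
    As a rough illustration of the approach above, the sketch below builds the arguments for a CreateStack call that carries an SNS topic in NotificationARNs, and parses a delivered event on the subscriber side. The helper names, the topic ARN, and the assumption that the SNS message body is a set of newline-separated Key='value' pairs are illustrative, not taken from the question.

```python
def build_create_stack_request(stack_name, template_body, topic_arn):
    """Build keyword arguments for a CloudFormation CreateStack call.

    NotificationARNs is the CreateStack parameter named in the explanation;
    the query API serializes each list entry as NotificationARNs.member.N.
    """
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "NotificationARNs": [topic_arn],
    }


def parse_stack_event(message_body):
    """Parse a stack event delivered through SNS.

    Assumption: the message body is newline-separated Key='value' pairs.
    """
    fields = {}
    for line in message_body.splitlines():
        key, separator, value = line.partition("=")
        if separator:
            fields[key.strip()] = value.strip().strip("'")
    return fields


# Hypothetical values, for illustration only.
request = build_create_stack_request(
    "demo-stack",
    '{"Resources": {}}',
    "arn:aws:sns:us-east-1:123456789012:stack-events",
)
# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("cloudformation").create_stack(**request)
```

    The continuous delivery system would subscribe an HTTPS endpoint (or similar) to the topic and feed each message through a parser like parse_stack_event to update its UI.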

  3. What does it mean if you have zero IOPS and a non-empty I/O queue for all EBS volumes attached to a running EC2 instance?

    • The I/O queue is buffer flushing.
    • Your EBS disk head(s) is/are seeking magnetic stripes.
    • The EBS volume is unavailable.
    • You need to re-mount the EBS volume in the OS.
    Explanation:
    This is the definition of Unavailable from the EC2 and EBS SLA. “Unavailable” and “Unavailability” mean… For Amazon EBS, when all of your attached volumes perform zero read write IO, with pending IO in the queue.
  4. From a compliance and security perspective, which of these statements is true?

    • You do not ever need to rotate access keys for AWS IAM Users.
    • You do not ever need to rotate access keys for AWS IAM Roles, nor AWS IAM Users.
    • None of the other statements is true.
    • You do not ever need to rotate access keys for AWS IAM Roles.
    Explanation:

    IAM Role Access Keys are auto-rotated by AWS on your behalf; you do not need to rotate them. The application is granted the permissions for the actions and resources that you have defined for the role through the security credentials associated with the role. These security credentials are temporary and we rotate them automatically. We make new credentials available at least five minutes prior to the expiration of the old credentials.

  5. Which of these configuration or deployment practices is a security risk for RDS?

    • Storing SQL function code in plaintext
    • Non-Multi-AZ RDS instance
    • Having RDS and EC2 instances exist in the same subnet
    • RDS in a public subnet
    Explanation:
    Making RDS accessible to the public internet in a public subnet poses a security risk, by making your database directly addressable and spammable. DB instances deployed within a VPC can be configured to be accessible from the Internet or from EC2 instances outside the VPC. If a VPC security group specifies a port access such as TCP port 22, you would not be able to access the DB instance because the firewall for the DB instance provides access only via the IP addresses specified by the DB security groups the instance is a member of and the port defined when the DB instance was created.
  6. Which of these is not a reason a Multi-AZ RDS instance will failover?

    • An Availability Zone outage
    • A manual failover of the DB instance was initiated using Reboot with failover
    • To autoscale to a higher instance class
    • The primary DB instance fails
    Explanation:

    The primary DB instance switches over automatically to the standby replica if any of the following conditions occur: an Availability Zone outage, the primary DB instance fails, the DB instance’s server type is changed, the operating system of the DB instance is undergoing software patching, or a manual failover of the DB instance was initiated using Reboot with failover.

  7. You need to create an audit log of all changes to customer banking data. You use DynamoDB to store this customer banking data. It is important not to lose any information due to server failures.

    What is an elegant way to accomplish this?

    • Use a DynamoDB StreamSpecification and stream all changes to AWS Lambda. Log the changes to AWS CloudWatch Logs, removing sensitive information before logging.
    • Before writing to DynamoDB, do a pre-write acknowledgment to disk on the application server, removing sensitive information before logging. Periodically rotate these log files into S3.
    • Use a DynamoDB StreamSpecification and periodically flush to an EC2 instance store, removing sensitive information before putting the objects. Periodically flush these batches to S3.
    • Before writing to DynamoDB, do a pre-write acknowledgment to disk on the application server, removing sensitive information before logging. Periodically pipe these files into CloudWatch Logs.
    Explanation:
    All suggested periodic options are sensitive to server failure during or between periodic flushes. Streaming to Lambda and then logging to CloudWatch Logs will make the system resilient to instance and Availability Zone failures.
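
    A minimal sketch of the streaming option, assuming a Python Lambda function subscribed to the table's stream. The field names in SENSITIVE_FIELDS and the shape of the audit entry are hypothetical; anything written through the logging module in Lambda ends up in CloudWatch Logs.

```python
import json
import logging

logger = logging.getLogger("audit")
logger.setLevel(logging.INFO)

# Hypothetical attribute names that must never reach the log.
SENSITIVE_FIELDS = {"account_number", "ssn"}


def redact(image):
    """Replace sensitive attributes in a DynamoDB item image with a marker."""
    if image is None:
        return None
    return {
        name: ({"S": "REDACTED"} if name in SENSITIVE_FIELDS else value)
        for name, value in image.items()
    }


def build_audit_entries(event):
    """Turn a DynamoDB Streams event into a list of redacted audit entries."""
    entries = []
    for record in event.get("Records", []):
        change = record.get("dynamodb", {})
        entries.append({
            "event": record.get("eventName"),
            "keys": change.get("Keys"),
            "old": redact(change.get("OldImage")),
            "new": redact(change.get("NewImage")),
        })
    return entries


def handler(event, context):
    """Lambda entry point: log one JSON line per change record."""
    entries = build_audit_entries(event)
    for entry in entries:
        logger.info(json.dumps(entry, sort_keys=True))
    return {"logged": len(entries)}
```

    Because the function holds no state between invocations, losing any single server does not lose audit records; unprocessed stream records are simply delivered again.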
  8. You need your API backed by DynamoDB to stay online during a total regional AWS failure. You can tolerate a couple minutes of lag or slowness during a large failure event, but the system should recover with normal operation after those few minutes.

    What is a good approach?

    • Set up DynamoDB cross-region replication in a master-standby configuration, with a single standby in another region. Create an Auto Scaling Group behind an ELB in each of the two regions DynamoDB is running in. Add a Route53 Latency DNS Record with DNS Failover, using the ELBs in the two regions as the resource records.
    • Set up a DynamoDB Multi-Region table. Create an Auto Scaling Group behind an ELB in each of the two regions DynamoDB is running in. Add a Route53 Latency DNS Record with DNS Failover, using the ELBs in the two regions as the resource records.
    • Set up a DynamoDB Multi-Region table. Create a cross-region ELB pointing to a cross-region Auto Scaling Group, and direct a Route53 Latency DNS Record with DNS Failover to the cross-region ELB.
    • Set up DynamoDB cross-region replication in a master-standby configuration, with a single standby in another region. Create a cross-region ELB pointing to a cross-region Auto Scaling Group, and direct a Route53 Latency DNS Record with DNS Failover to the cross-region ELB.
    Explanation:

    There is no such thing as a cross-region ELB, nor such thing as a cross-region Auto Scaling Group, nor such thing as a DynamoDB Multi-Region Table. The only option that makes sense is the cross-regional replication version with two ELBs and ASGs with Route53 Failover and Latency DNS.

  9. You have an asynchronous processing application using an Auto Scaling Group and an SQS Queue. The Auto Scaling Group scales according to the depth of the job queue. The completion velocity of the jobs has gone down, the Auto Scaling Group size has maxed out, but the inbound job velocity did not increase.

    What is a possible issue?

    • Some of the new jobs coming in are malformed and unprocessable.
    • The routing tables changed and none of the workers can process events anymore.
    • Someone changed the IAM Role Policy on the instances in the worker group and broke permissions to access the queue.
    • The scaling metric is not functioning correctly.
    Explanation:
    The IAM Role must be fine: if it were broken, no jobs at all would be processed, because the workers would never be able to receive any queue messages. The same reasoning applies to the routing table change. The scaling metric is fine, because the instance count increased when the queue depth increased due to more messages entering than exiting. Thus, the only reasonable option is that some of the recent messages are malformed and unprocessable.
  10. Your company wants to understand where cost is coming from in the company’s production AWS account. There are a number of applications and services running at any given time. Without expending too much initial development time, how best can you give the business a good understanding of which applications cost the most per month to operate?

    • Create an automation script which periodically creates AWS Support tickets requesting detailed intra-month information about your bill.
    • Use custom CloudWatch Metrics in your system, and put a metric data point whenever cost is incurred.
    • Use AWS Cost Allocation Tagging for all resources which support it. Use the Cost Explorer to analyze costs throughout the month.
    • Use the AWS Price API and constantly running resource inventory scripts to calculate total price based on multiplication of consumed resources over time.
    Explanation:

    Cost Allocation Tagging is a built-in feature of AWS, and when coupled with the Cost Explorer, provides a simple and robust way to track expenses. You can also use tags to filter views in Cost Explorer. Note that before you can filter views by tags in Cost Explorer, you must have applied tags to your resources and activated them. For more information about Cost Explorer, see Analyzing Your Costs with Cost Explorer.

  11. There is a very serious outage at AWS. EC2 is not affected, but your EC2 instance deployment scripts stopped working in the region with the outage.

    What might be the issue?

    • The AWS Console is down, so your CLI commands do not work.
    • S3 is unavailable, so you can’t create EBS volumes from a snapshot you use to deploy new volumes.
    • AWS turns off the DeployCode API call when there are major outages, to protect from system floods.
    • None of the other answers make sense. If EC2 is not affected, it must be some other issue.
    Explanation:
    S3 stores all snapshots. If S3 is unavailable, snapshots are unavailable. Amazon EC2 also uses Amazon S3 to store snapshots (backup copies) of the data volumes. You can use snapshots for recovering data quickly and reliably in case of application or system failures. You can also use snapshots as a baseline to create multiple new data volumes, expand the size of an existing data volume, or move data volumes across multiple Availability Zones, thereby making your data usage highly scalable. For more information about using data volumes and snapshots, see Amazon Elastic Block Store.
  12. Which of the following tools does not directly support AWS OpsWorks, for monitoring your stacks?

    • AWS Config
    • Amazon CloudWatch Metrics
    • AWS CloudTrail
    • Amazon CloudWatch Logs
    Explanation:
    You can monitor your stacks in the following ways: AWS OpsWorks uses Amazon CloudWatch to provide thirteen custom metrics with detailed monitoring for each instance in the stack; AWS OpsWorks integrates with AWS CloudTrail to log every AWS OpsWorks API call and store the data in an Amazon S3 bucket; You can use Amazon CloudWatch Logs to monitor your stack’s system, application, and custom logs.
  13. What is a circular dependency in AWS CloudFormation?

    • When a Template references an earlier version of itself.
    • When Nested Stacks depend on each other.
    • When Resources form a DependsOn loop.
    • When a Template references a region, which references the original Template.
    Explanation:

    To resolve a dependency error, add a DependsOn attribute to resources that depend on other resources in your template. In some cases, you must explicitly declare dependencies so that AWS CloudFormation can create or delete resources in the correct order. For example, if you create an Elastic IP and a VPC with an Internet gateway in the same stack, the Elastic IP must depend on the Internet gateway attachment. For additional information, see DependsOn Attribute.
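
    To make the idea of a DependsOn loop concrete, here is a small sketch that walks the Resources section of a template (already loaded as a Python dict) and reports the first loop it finds. It only looks at explicit DependsOn attributes; CloudFormation also infers dependencies from references such as Ref and Fn::GetAtt, which this sketch ignores. The function names are illustrative.

```python
def depends_on(resource):
    """Return the DependsOn targets of a resource as a list."""
    deps = resource.get("DependsOn", [])
    return [deps] if isinstance(deps, str) else list(deps)


def find_cycle(resources):
    """Return logical IDs forming a DependsOn loop, or None if there is none."""
    unvisited, in_progress, done = 0, 1, 2
    state = {name: unvisited for name in resources}
    path = []

    def visit(name):
        state[name] = in_progress
        path.append(name)
        for dep in depends_on(resources[name]):
            if dep not in resources:
                continue  # dangling reference, not a loop
            if state[dep] == in_progress:
                # dep is already on the current path, so we closed a loop
                return path[path.index(dep):] + [dep]
            if state[dep] == unvisited:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[name] = done
        return None

    for name in resources:
        if state[name] == unvisited:
            cycle = visit(name)
            if cycle:
                return cycle
    return None
```

    For a template where A depends on B, B on C, and C on A, find_cycle returns the loop starting and ending at the same logical ID; for the Elastic IP and gateway attachment example above it returns None.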

  14. You need to run a very large batch data processing job one time per day. The source data exists entirely in S3, and the output of the processing job should also be written to S3 when finished. If you need to version control this processing job and all setup and teardown logic for the system, what approach should you use?

    • Model an AWS EMR job in AWS Elastic Beanstalk.
    • Model an AWS EMR job in AWS CloudFormation.
    • Model an AWS EMR job in AWS OpsWorks.
    • Model an AWS EMR job in AWS CLI Composer.
    Explanation:
    To declaratively model build and destroy of a cluster, you need to use AWS CloudFormation. OpsWorks and Elastic Beanstalk cannot directly model EMR Clusters. The CLI is not declarative, and CLI Composer does not exist.
  15. What is true of the way that encryption works with EBS?

    • Snapshotting an encrypted volume makes an encrypted snapshot; restoring an encrypted snapshot creates an encrypted volume when specified / requested.
    • Snapshotting an encrypted volume makes an encrypted snapshot when specified / requested; restoring an encrypted snapshot creates an encrypted volume when specified / requested.
    • Snapshotting an encrypted volume makes an encrypted snapshot; restoring an encrypted snapshot always creates an encrypted volume.
    • Snapshotting an encrypted volume makes an encrypted snapshot when specified / requested; restoring an encrypted snapshot always creates an encrypted volume.
    Explanation:

    Snapshots that are taken from encrypted volumes are automatically encrypted. Volumes that are created from encrypted snapshots are also automatically encrypted. Your encrypted volumes and any associated snapshots always remain protected. For more information, see Amazon EBS Encryption.

  16. When thinking of AWS OpsWorks, which of the following is true?

    • Stacks have many layers, layers have many instances.
    • Instances have many stacks, stacks have many layers.
    • Layers have many stacks, stacks have many instances.
    • Layers have many instances, instances have many stacks.
    Explanation:

    The stack is the core AWS OpsWorks component. It is basically a container for AWS resources – Amazon EC2 instances, Amazon RDS database instances, and so on – that have a common purpose and should be logically managed together. You define the stack’s constituents by adding one or more layers. A layer represents a set of Amazon EC2 instances that serve a particular purpose, such as serving applications or hosting a database server. An instance represents a single computing resource, such as an Amazon EC2 instance.
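
    The containment described above can be pictured as a small data model. This is only a simplified sketch with made-up class and instance names: it models each instance under a single layer and says nothing about how OpsWorks itself stores these objects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    hostname: str


@dataclass
class Layer:
    name: str
    instances: List[Instance] = field(default_factory=list)


@dataclass
class Stack:
    name: str
    layers: List[Layer] = field(default_factory=list)

    def all_instances(self):
        """Flatten the stack -> layers -> instances hierarchy."""
        return [inst for layer in self.layers for inst in layer.instances]


# Hypothetical stack: two layers, three instances.
stack = Stack("shop", [
    Layer("app-server", [Instance("app1"), Instance("app2")]),
    Layer("database", [Instance("db1")]),
])
```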

  17. When thinking of AWS Elastic Beanstalk, which statement is true?

    • Worker tiers pull jobs from SNS.
    • Worker tiers pull jobs from SNS.
    • Worker tiers pull jobs from JSON.
    • Worker tiers pull jobs from SQS.
    Explanation:
    Elastic Beanstalk installs a daemon on each Amazon EC2 instance in the Auto Scaling group to process Amazon SQS messages in the worker environment. The daemon pulls data off the Amazon SQS queue, inserts it into the message body of an HTTP POST request, and sends it to a user-configurable URL path on the local host. The content type for the message body within an HTTP POST request is application/json by default.
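
    From the application's point of view, a worker therefore only has to accept HTTP POST requests on localhost. The sketch below shows one way such an endpoint might look using only the standard library; the port, the handler names, and the response bodies are assumptions for illustration, and a real worker would do actual processing in process_job.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def process_job(raw_body):
    """Decode the JSON message body posted by the daemon.

    Returns (status_code, result). A 200 signals the job was handled.
    """
    try:
        job = json.loads(raw_body)
    except ValueError:
        return 400, {"error": "body is not valid JSON"}
    return 200, {"processed": job}


class WorkerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, result = process_job(self.rfile.read(length))
        payload = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Run the worker endpoint on localhost (port is a hypothetical choice)."""
    HTTPServer(("127.0.0.1", port), WorkerHandler).serve_forever()
```

    Note that the application never talks to SQS directly; the daemon does the pulling and hands each message over as a POST request.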
  18. Your company needs to automate 3 layers of a large cloud deployment. You want to be able to track this deployment’s evolution as it changes over time, and carefully control any alterations. What is a good way to automate a stack to meet these requirements?

    • Use OpsWorks Stacks with three layers to model the layering in your stack.
    • Use CloudFormation Nested Stack Templates, with three child stacks to represent the three logical layers of your cloud.
    • Use AWS Config to declare a configuration set that AWS should roll out to your cloud.
    • Use Elastic Beanstalk Linked Applications, passing the important DNS entries between layers using the metadata interface.
    Explanation:

    Only CloudFormation allows source controlled, declarative templates as the basis for stack automation. Nested Stacks help achieve clean separation of layers while simultaneously providing a method to control all layers at once when needed.
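
    As a sketch of what the nested layout might look like, the snippet below generates a parent template whose only resources are three child stacks of type AWS::CloudFormation::Stack. The layer names and template URLs are placeholders; in practice the parent and child templates would live in source control and be uploaded to S3 by the delivery pipeline.

```python
import json


def build_parent_template(layer_template_urls):
    """Build a parent template with one nested stack per logical layer."""
    resources = {}
    for layer_name, url in layer_template_urls.items():
        resources[layer_name + "Stack"] = {
            "Type": "AWS::CloudFormation::Stack",
            "Properties": {"TemplateURL": url},
        }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


# Hypothetical layer names and template locations.
parent = build_parent_template({
    "Network": "https://example-bucket.s3.amazonaws.com/network.json",
    "Data": "https://example-bucket.s3.amazonaws.com/data.json",
    "Application": "https://example-bucket.s3.amazonaws.com/application.json",
})
template_body = json.dumps(parent, indent=2)
```

    Updating the parent stack updates all three layers together, while each child template can still be reviewed and versioned on its own.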

  19. Your application’s Auto Scaling Group scales up too quickly, too much, and stays scaled when traffic decreases.

    What should you do to fix this?

    • Set a longer cooldown period on the Group, so the system stops overshooting the target capacity. The issue is that the scaling system does not allow enough time for new instances to begin servicing requests before measuring aggregate load again.
    • Calculate the bottleneck or constraint on the compute layer, then select that as the new metric, and set the metric thresholds to the bounding values that begin to affect response latency.
    • Raise the CloudWatch Alarms threshold associated with your autoscaling group, so the scaling takes more of an increase in demand before beginning.
    • Use larger instances instead of many smaller ones, so the Group stops scaling out so much and wasting resources at the OS level, since the OS uses a higher proportion of resources on smaller instances.
    Explanation:

    Systems will always over-scale unless you choose the metric that runs out first and becomes constrained first. You also need to set the thresholds of the metric based on whether or not latency is affected by the change, to justify adding capacity instead of wasting money.

  20. You need the absolute highest possible network performance for a cluster computing application. You already selected homogeneous instance types supporting 10 gigabit enhanced networking, made sure that your workload was network bound, and put the instances in a placement group. What is the last optimization you can make?

    • Use 9001 MTU instead of 1500 for Jumbo Frames, to raise packet body to packet overhead ratios.
    • Segregate the instances into different peered VPCs while keeping them all in a placement group, so each one has its own Internet Gateway.
    • Bake an AMI for the instances and relaunch, so the instances are fresh in the placement group and do not have noisy neighbors.
    • Turn off SYN/ACK on your TCP stack or begin using UDP for higher throughput.
    Explanation:

    For instances that are collocated inside a placement group, jumbo frames help to achieve the maximum network throughput possible, and they are recommended in this case. For more information, see Placement Groups.
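
    The benefit of jumbo frames can be seen with some quick arithmetic. The sketch below assumes plain TCP over IPv4 with no header options (40 bytes of headers per packet) and ignores link-layer framing, so the numbers are indicative rather than exact.

```python
# 20-byte IPv4 header + 20-byte TCP header, no options (assumption).
TCP_IP_HEADER_BYTES = 40


def payload_ratio(mtu):
    """Fraction of each packet that carries application data."""
    return (mtu - TCP_IP_HEADER_BYTES) / mtu


def packets_needed(total_bytes, mtu):
    """Number of packets required to carry total_bytes of payload."""
    payload = mtu - TCP_IP_HEADER_BYTES
    return -(-total_bytes // payload)  # ceiling division


standard = payload_ratio(1500)
jumbo = payload_ratio(9001)
```

    Under these assumptions a 1500-byte MTU carries 1460 bytes of data per packet and a 9001-byte MTU carries 8961, so the same transfer needs roughly one sixth as many packets, with correspondingly less per-packet overhead.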