mirror of
https://github.com/nickpoida/og-aws.git
synced 2025-02-15 03:11:57 +00:00
Make seciton names unique.
Better internal linking re EMR costs. Fixes #52.
This commit is contained in:
parent
b41c5b1ad4
commit
d14ee5d5f9
1 changed files with 68 additions and 68 deletions
136
README.md
136
README.md
|
@ -452,7 +452,7 @@ Security and IAM
|
|||
|
||||
We cover security basics first, since configuring user accounts is something you usually have to do early on when setting up your system.
|
||||
|
||||
### Basics
|
||||
### Security and IAM Basics
|
||||
|
||||
- 📒 [Homepage](https://aws.amazon.com/iam/) ∙ [User guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html) ∙ [FAQ](https://aws.amazon.com/iam/faqs/)
|
||||
- **IAM** is the service you use to manage accounts and permissioning for AWS.
|
||||
|
@ -465,9 +465,9 @@ We cover security basics first, since configuring user accounts is something you
|
|||
- 🔸The policy language has a complex and error-prone JSON syntax that’s quite confusing, so unless you are an expert, it is wise to base yours off trusted examples or AWS’ own pre-defined [managed policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_managed-vs-inline.html).
|
||||
- At the beginning, IAM policy may be very simple, but for large systems, it will grow in complexity, and need to be managed with care.
|
||||
- 🔹Make sure one person (perhaps with a backup) in your organization is formally assigned ownership of managing IAM policies, make sure every administrator works with that person to have changes reviewed. This goes a long way to avoiding accidental and serious misconfigurations.
|
||||
- It is best to give each user or service the minimum privileges needed to perform their duties. This is the[principle of least privilege](https://en.wikipedia.org/wiki/Principle_of_least_privilege), one of the foundations of good security. Organize all IAM users and groups according to levels of access they need.
|
||||
- It is best to give each user or service the minimum privileges needed to perform their duties. This is the [principle of least privilege](https://en.wikipedia.org/wiki/Principle_of_least_privilege), one of the foundations of good security. Organize all IAM users and groups according to levels of access they need.
|
||||
|
||||
### Tips
|
||||
### Security and IAM Tips
|
||||
|
||||
- 🔹Use IAM to create individual user accounts and **use IAM accounts for all users from the beginning**. This is slightly more work, but not that much.
|
||||
- That way, you define different users, and groups with different levels of privilege (if you want, choose from Amazon’s default suggestions, of administrator, power user, etc.).
|
||||
|
@ -499,7 +499,7 @@ We cover security basics first, since configuring user accounts is something you
|
|||
- [**AWS WAF**](https://aws.amazon.com/waf) is a web application firewall to help you protect your applications from common attack patterns.
|
||||
- 🔹**Export and audit security settings:** You can audit security policies simply by exporting settings using AWS APIs, e.g. using a Boto script like [SecConfig.py](https://gist.github.com/jlevy/cce1b44fc24f94599d0a4b3e613cc15d) (from [this 2013 talk](http://www.slideshare.net/AmazonWebServices/intrusion-detection-in-the-cloud-sec402-aws-reinvent-2013)) and then reviewing and monitoring changes manually or automatically.
|
||||
|
||||
### Gotchas and Limitations
|
||||
### Security and IAM Gotchas and Limitations
|
||||
|
||||
- ❗**Don’t share user credentials.** It’s remarkably common for first-time AWS users create one account and one set of credentials (access key or password), and then use them for a while, sharing among engineers and others within a company. This is easy. But *don’t do this*. This is an insecure practice for many reasons, but in particular, if you do, you will have reduced ability to revoke credentials on a per-user or per-service basis (for example, if an employee leaves or a key is compromised), which can lead to serious complications.
|
||||
- ❗**Instance metadata throttling:** The [instance metadata service](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html) has rate limiting on API calls. If you deploy IAM roles widely (as you should!) and have lots of services, you may hit global account limits easily. One solution is to store have code or scripts cache and reuse the credentials locally (for example, in ~/.aws/credentials).
|
||||
|
@ -509,7 +509,7 @@ We cover security basics first, since configuring user accounts is something you
|
|||
S3
|
||||
--
|
||||
|
||||
### Basics
|
||||
### S3 Basics
|
||||
|
||||
- 📒 [Homepage](https://aws.amazon.com/s3/) ∙ [Developer guide](https://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html) ∙ [FAQ](https://aws.amazon.com/s3/faqs/) ∙ [Pricing](https://aws.amazon.com/s3/pricing/)
|
||||
- **S3** (Simple Storage Service) is AWS’ standard cloud storage service, offering file (opaque “blob”) storage of arbitrary numbers of files of almost any size, from 0 to 5 TB. (Prior to [2011](https://aws.amazon.com/releasenotes/Amazon-S3/1917932037969964) the maximum size was 5 GB; larger sizes are now well supported via [multipart support](https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html).)
|
||||
|
@ -519,7 +519,7 @@ S3
|
|||
- Although often bucket and key names are provided in APIs individually, it’s also common practice to write an S3 location in the form 's3://bucket-name/path/to/key' (where the key here is 'path/to/key'). (You’ll also see 's3n://' and 's3a://' prefixes [in Hadoop systems](https://wiki.apache.org/hadoop/AmazonS3).)
|
||||
- AWS offers many storage services, and several besides S3 offer file-type abstractions. [Glacier](#glacier) is for cheaper and infrequently accessed archival storage. [EBS](#ebs), unlike S3, allows random access to file contents via a traditional filesystem, but can only be attached to one EC2 instance at a time.
|
||||
|
||||
### Tips
|
||||
### S3 Tips
|
||||
|
||||
- For most practical purposes, you can consider S3 capacity unlimited, both in total size of files and number of objects.
|
||||
- **Bucket naming:** Buckets are chosen from a global namespace (across all regions, even though S3 itself stores data in [whichever S3 region](https://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region) you select), so you’ll find many bucket names are already taken. Creating a bucket means taking ownership of the name until you delete it. Bucket names have [a few restrictions](https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html) on them.
|
||||
|
@ -594,7 +594,7 @@ S3
|
|||
- If you are primarily using a VPC, consider setting up a [VPC Endpoint](http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-endpoints.html) for S3 in order to allow your VPC-hosted resources to easily access it without the need for extra network configuration or hops.
|
||||
- **Cross-region replication:** S3 has [a feature](https://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html) for replicating a bucket between one region and a another. Note that S3 is already highly replicated within one region, so usually this isn’t necessary, except for compliance or latency reasons.
|
||||
|
||||
### Gotchas and Limitations
|
||||
### S3 Gotchas and Limitations
|
||||
|
||||
- 🔸For many years, there was a notorious [**100-bucket limit**](https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html#limits_s3) per account, which could not be raised and caused many companies significant pain. As of 2015, you can [request increases](https://aws.amazon.com/about-aws/whats-new/2015/08/amazon-s3-introduces-new-usability-enhancements/). You can ask for a raise in the number of buckets but it will still be capped (generally below ~1000 per account).
|
||||
- 🔸Be careful not to make implicit assumptions about transactionality or sequencing of updates to objects. Never assume that if you modify a sequence of objects, the clients will see the same modifications in the same sequence, or if you upload a whole bunch of files, that they will all appear at once to all clients.
|
||||
|
@ -620,19 +620,19 @@ As an illustration of comparative features and price, the table below gives S3 S
|
|||
EC2
|
||||
---
|
||||
|
||||
### Basics
|
||||
### EC2 Basics
|
||||
|
||||
- 📒 [Homepage](https://aws.amazon.com/ec2/) ∙ [Documentation](https://aws.amazon.com/documentation/ec2/) ∙ [FAQ](https://aws.amazon.com/ec2/faqs/) ∙ [Pricing](https://aws.amazon.com/ec2/pricing/) (see also [ec2instances.info](http://www.ec2instances.info/)\)
|
||||
- **EC2** (Elastic Compute Cloud) is AWS’ offering of the most fundamental piece of cloud computing: A [virtual private server](https://en.wikipedia.org/wiki/Virtual_private_server). These “instances” and can run [most Linux, BSD, and Windows operating systems](https://aws.amazon.com/ec2/faqs/#What_operating_system_environments_are_supported). Internally, they use [Xen](https://en.wikipedia.org/wiki/Xen) virtualization.
|
||||
- The term “EC2” is sometimes used to refer to the servers themselves, but technically refers more broadly to a whole collection of supporting services, too, like load balancing (ELBs/ALBs), IP addresses (EIPs), bootable images (AMIs), security groups, and network drives (EBS) (which we discuss individually in this guide).
|
||||
|
||||
### Alternatives and Lock-In
|
||||
### EC2 Alternatives and Lock-In
|
||||
|
||||
- Running EC2 is akin to running a set of physical servers, as long as you don’t do automatic scaling or tooled cluster setup. If you just run a set of static instances, migrating to another VPS or dedicated server provider should not be too hard.
|
||||
- 🚪**Alternatives to EC2:** The direct alternatives are Google Cloud, Microsoft Azure, Rackspace, DigitalOcean and other VPS providers, some of which offer similar API for setting up and removing instances. (See the comparisons [above](#when-to-use-aws).)
|
||||
- **Should you use Amazon Linux?** AWS encourages use of their own [Amazon Linux](https://aws.amazon.com/amazon-linux-ami/), which is evolved from from [Red Hat Enterprise Linux (RHEL)](https://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux) and [CentOS](https://en.wikipedia.org/wiki/CentOS). It’s used by many, but [others are skeptical](https://www.exratione.com/2014/08/do-not-use-amazon-linux/). Whatever you do, think this decision through carefully. It’s true Amazon Linux is heavily tested and better supported in the unlikely event you have deeper issues with OS and virtualization on EC2. But in general, many companies do just fine using a standard, non-Amazon Linux distribution, such as Ubuntu or CentOS. Using a standard Linux distribution means you have an exactly replicable environment should you use another hosting provider instead of (or in addition to) AWS. It’s also helpful if you wish to test deployments on local developer machines running the same standard Linux distribution (a practice that’s getting more common with Docker, too, and not currently possible with Amazon Linux).
|
||||
|
||||
### Tips
|
||||
### EC2 Tips
|
||||
|
||||
- 🔹**Picking regions:** When you first set up, consider which [regions](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions) you want to use first. Many people in North America just automatically set up in the us-east-1 (N. Virginia) region, which is the default, but it’s worth considering if this is best up front. For example, you might find it preferable to start in us-west-1 (N. California) or us-west-2 (Oregon) if you’re in California and latency matters. Some services [are not available in all regions](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/). Baseline costs also [vary by region](https://aws.amazon.com/ec2/pricing/), up to 10-30% (generally lowest in us-east-1).
|
||||
- **Instance types:** EC2 instances come in many types, corresponding to the capabilities of the virtual machine in CPU architecture and speed, RAM, disk sizes and types (SSD or magnetic), and network bandwidth.
|
||||
|
@ -655,9 +655,9 @@ EC2
|
|||
|
||||
- **GPU support:** You can rent GPU-enabled instances on EC2. There are [two instance types](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using_cluster_computing.html). Both sport an NVIDIA card (K520, 1536 CUDA cores and M2050, 448 CUDA cores).
|
||||
|
||||
### 💸 Cost Management
|
||||
### EC2 Cost Management
|
||||
|
||||
This section covers EC2 cost management; be sure to read the [general cost management section](#billing-and-cost-management) as well.
|
||||
💸 This section covers EC2 cost management; be sure to read the [general cost management section](#billing-and-cost-management) as well.
|
||||
|
||||
- With EC2, there is a trade-off between engineering effort (more analysis, more tools, more complex architectures) and spend rate on AWS. If your EC2 costs are small, many of the efforts here are not worth the engineering time required to make them work. But once you know your costs will be growing in excess of an engineer’s salary, serious investment is often worthwhile.
|
||||
- 🔹**Spot instances:**
|
||||
|
@ -682,7 +682,7 @@ This section covers EC2 cost management; be sure to read the [general cost manag
|
|||
- If you have multiple AWS accounts that are linked with Consolidated Billing, plan on using reservations, and want unused reservation capacity to be able to apply to compute hours from other accounts, you’ll need to create your instances in the availability zone with the same *name* across accounts. Keep in mind that when you have done this, your instances may not end up in the same *physical* data center across accounts - Amazon shuffles availability zones names across accounts in order to equalize resource utilization.
|
||||
- Make use of dynamic [Auto Scaling](#auto-scaling), where possible, in order to better match your cluster size (and cost) to the current resource requirements of your service.
|
||||
|
||||
### Gotchas and Limitations
|
||||
### EC2 Gotchas and Limitations
|
||||
|
||||
- ❗Never use ssh passwords. Just don’t do it; they are too insecure, and consequences of compromise too severe. Use keys instead. [Read up on this](https://www.digitalocean.com/community/tutorials/how-to-set-up-ssh-keys--2) and fully disable ssh password access to your ssh server by making sure 'PasswordAuthentication no' is in your /etc/ssh/sshd_config file. If you’re careful about managing ssh private keys everywhere they are stored, it is a major improvement on security over password-based authentication.
|
||||
- 🔸For all [newer instance types](https://aws.amazon.com/amazon-linux-ami/instance-type-matrix/), when selecting the AMI to use, be sure you select the HVM AMI, or it just won’t work.
|
||||
|
@ -695,7 +695,7 @@ This section covers EC2 cost management; be sure to read the [general cost manag
|
|||
AMIs
|
||||
----
|
||||
|
||||
### Tips
|
||||
### AMI Tips
|
||||
|
||||
- 📒 [User guide](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html)
|
||||
- **AMIs** (Amazon Machine Images) are immutable images that are used to launch preconfigured EC2 instances. They come in both public and private flavors. Access to public AMIs is either freely available (shared/community AMIs) or bought and sold in the [**AWS Marketplace**](http://aws.amazon.com/marketplace).
|
||||
|
@ -723,7 +723,7 @@ AMIs
|
|||
Auto Scaling
|
||||
------------
|
||||
|
||||
### Basics
|
||||
### Auto Scaling Basics
|
||||
|
||||
- 📒 [Homepage](https://aws.amazon.com/autoscaling/) ∙ [User guide](http://docs.aws.amazon.com/autoscaling/latest/userguide/) ∙ [FAQ](https://aws.amazon.com/ec2/faqs/#Auto_Scaling) ∙ [Pricing](https://aws.amazon.com/autoscaling/pricing/) at no additional charge
|
||||
- [**Auto Scaling Groups (ASGs)**](https://aws.amazon.com/autoscaling/) are used to control the number of instances in a service, reducing manual effort to provision or deprovision EC2 instances.
|
||||
|
@ -731,7 +731,7 @@ Auto Scaling
|
|||
- There are three common ways of using ASGs - dynamic (automatically adjust instance count based on metrics for things like CPU utilization), static (maintain a specific instance count at all times), scheduled (maintain different instance counts at different times of day or on days of the week).
|
||||
- 💸ASGs [have no additional charge](https://aws.amazon.com/autoscaling/pricing/) themselves; you pay for underlying EC2 and CloudWatch services.
|
||||
|
||||
### Tips
|
||||
### Auto Scaling Tips
|
||||
|
||||
- 💸 Better matching your cluster size to your current resource requirements through use of ASGs can result in significant cost savings for many types of workloads.
|
||||
- Pairing ASGs with ELBs is a common pattern used to deal with changes in the amount of traffic a service receives.
|
||||
|
@ -742,13 +742,13 @@ Auto Scaling
|
|||
EBS
|
||||
---
|
||||
|
||||
### Basics
|
||||
### EBS Basics
|
||||
|
||||
- 📒 [Homepage](https://aws.amazon.com/ebs/) ∙ [User guide](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AmazonEBS.html) ∙ [FAQ](https://aws.amazon.com/ebs/faqs/) ∙ [Pricing](https://aws.amazon.com/ebs/pricing/)
|
||||
- **EBS** (Elastic Block Store) provides block level storage. That is, it offers storage volumes that can be attached as filesystems, like traditional network drives.
|
||||
- EBS volumes can only be attached to one EC2 instance at a time. In contrast, EFS can be shared but has a much higher price point ([a comparison](http://stackoverflow.com/questions/29575877/aws-efs-vs-ebs-vs-s3-differences-when-to-use)).
|
||||
|
||||
### Tips
|
||||
### EBS Tips
|
||||
|
||||
- ⏱**RAID:** Use [RAID drives](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/raid-config.html) for [increased performance](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSPerformance.html).
|
||||
- ⏱A worthy read is AWS’ [post on EBS IO characteristics](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-io-characteristics.html) as well as their [performance tips](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSPerformance.html#d0e86148).
|
||||
|
@ -756,7 +756,7 @@ EBS
|
|||
- ⏱A single EBS volume allows 10k IOPS max. To get the maximum performance out of an EBS volume, it has to be of a maximum size and attached to an EBS-optimized EC2 instance.
|
||||
- A standard block size for an EBS volume is 16kb.
|
||||
|
||||
### Gotchas and Limitations
|
||||
### EBS Gotchas and Limitations
|
||||
|
||||
- ❗EBS durability is reasonably good for a regular hardware drive (annual failure rate of [between 0.1% - 0.2%](http://aws.amazon.com/ebs/details/#availabilityanddurability)). On the other hand, that is very poor if you don’t have backups! By contrast, S3 durability is extremely high. *If you care about your data, back it up S3 with snapshots.*
|
||||
- 🔸EBS has an [**SLA**](http://aws.amazon.com/ec2/sla/) with **99.95%** uptime. See notes on high availability below.
|
||||
|
@ -826,12 +826,12 @@ Load Balancers
|
|||
Elastic IPs
|
||||
-----------
|
||||
|
||||
### Basics
|
||||
### Elastic IP Basics
|
||||
|
||||
- 📒 [Documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html) ∙ [FAQ](https://aws.amazon.com/ec2/faqs/#Elastic_IP) ∙ [Pricing](https://aws.amazon.com/ec2/pricing/#Elastic_IP_Addresses)
|
||||
- **Elastic IPs** are static IP addresses you can rent from AWS to assign to EC2 instances.
|
||||
|
||||
### Tips
|
||||
### Elastic IP Tips
|
||||
|
||||
- 🔹**Prefer load balancers to elastic IPs:** For single-instance deployments, you could just assign elastic IP to an instance, give that IP a DNS name, and consider that your deployment. Most of the time, you should provision a [load balancer](#load-balancers) instead:
|
||||
- It’s easy to add and remove instances from load balancers. It's also quicker to add or remove instances from a load balancer than to reassign an elastic IP.
|
||||
|
@ -841,25 +841,25 @@ Elastic IPs
|
|||
- If an Elastic IP is not attached to an active resource there is a small [hourly fee](https://aws.amazon.com/ec2/pricing/#Elastic_IP_Addresses).
|
||||
- Elastic IPs are [no extra charge](https://aws.amazon.com/ec2/pricing/#Elastic_IP_Addresses) as long as you’re using them. They have a (small) cost when not in use, which is a mechanism to prevent people from squatting on excessive numbers of IP addresses.
|
||||
|
||||
### Gotchas and Limitations
|
||||
### Elastic IP Gotchas and Limitations
|
||||
|
||||
- 🔸There is [officially no way](https://forums.aws.amazon.com/thread.jspa?threadID=171550) to allocate a contiguous block of IP addresses, something you may desire when giving IPs to external users. Though when allocating at once, you may get lucky and have some be part of the same CIDR block.
|
||||
|
||||
Glacier
|
||||
-------
|
||||
|
||||
### Basics
|
||||
### Glacier Basics
|
||||
|
||||
- 📒 [Homepage](https://aws.amazon.com/glacier/) ∙ [Developer guide](http://docs.aws.amazon.com/amazonglacier/latest/dev/) ∙ [FAQ](https://aws.amazon.com/glacier/faqs/) ∙ [Pricing](https://aws.amazon.com/glacier/pricing/)
|
||||
- **Glacier** is a lower-cost alternative to S3 when data is infrequently accessed, such as for archival purposes.
|
||||
- It’s only useful for data that is rarely accessed. It generally takes [3-5 hours](https://aws.amazon.com/glacier/faqs/#dataretrievals) to fulfill a retrieval request.
|
||||
- AWS [has not officially revealed](https://en.wikipedia.org/wiki/Amazon_Glacier#Storage) the storage media used by Glacier; it may be low-spin hard drives or even tapes.
|
||||
|
||||
### Tips
|
||||
### Glacier Tips
|
||||
|
||||
- You can physically [ship](https://aws.amazon.com/blogs/aws/send-us-that-data/) your data to Amazon to put on Glacier on a USB or eSATA HDD.
|
||||
|
||||
### Gotchas and Limitations
|
||||
### Glacier Gotchas and Limitations
|
||||
|
||||
- 🔸Getting files off Glacier is glacially slow (typically 3-5 hours or more).
|
||||
- 🔸Due to a fixed overhead per file (you pay per PUT or GET operation), uploading and downloading many small files on/to Glacier might be very expensive. There is also a 32k storage overhead per file. Hence it’s a good idea is to archive files before upload.
|
||||
|
@ -868,18 +868,18 @@ Glacier
|
|||
RDS
|
||||
---
|
||||
|
||||
### Basics
|
||||
### RDS Basics
|
||||
|
||||
- 📒 [Homepage](https://aws.amazon.com/rds/) ∙ [User guide](http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/) ∙ [FAQ](https://aws.amazon.com/rds/faqs/) ∙ [Pricing](https://aws.amazon.com/rds/pricing/)
|
||||
- **RDS** is a managed relational database service, allowing you to deploy and scale databases more easily. It supports Amazon Aurora, Oracle, Microsoft SQL Server, PostgreSQL, MySQL and MariaDB.
|
||||
|
||||
### Tips
|
||||
### RDS Tips
|
||||
|
||||
- If you’re looking for the managed convenience of RDS for MongoDB, this isn’t offered by AWS directly, but you may wish to consider a provider such as [**mLab**](https://mlab.com/).
|
||||
- MySQL RDS allows access to [binary logs](http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_LogAccess.Concepts.MySQL.html#USER_LogAccess.MySQL.BinaryFormat).
|
||||
- 🔸**MySQL vs MariaDB vs Aurora:** If you prefer a MySQL-style database but are starting something new, you probably should consider Aurora and MariaDB as well. **Aurora** has increased availability and is the next-generation solution. That said, Aurora [may not be](http://blog.takipi.com/benchmarking-aurora-vs-mysql-is-amazons-new-db-really-5x-faster/) as fast relative to MySQL as is sometimes reported, and is more complex to administer. **MariaDB**, the modern [community fork](https://en.wikipedia.org/wiki/MariaDB) of MySQL, [likely now has the edge over MySQL](http://cloudacademy.com/blog/mariadb-vs-mysql-aws-rds/) for many purposes and is supported by RDS.
|
||||
|
||||
### Gotchas and Limitations
|
||||
### RDS Gotchas and Limitations
|
||||
|
||||
- RDS instances run on EBS volumes, and hence are constrained by the EBS performance.
|
||||
- ⏱RDS instances run on EBS volumes, and hence are constrained by the EBS performance.
|
||||
|
@ -888,24 +888,24 @@ RDS
|
|||
DynamoDB
|
||||
--------
|
||||
|
||||
### Basics
|
||||
### DynamoDB Basics
|
||||
|
||||
- 📒 [Homepage](https://aws.amazon.com/dynamodb/) ∙ [Developer guide](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/) ∙ [FAQ](https://aws.amazon.com/dynamodb/faqs/) ∙ [Pricing](https://aws.amazon.com/dynamodb/pricing/)
|
||||
- **DynamoDB** is a [NoSQL](https://en.wikipedia.org/wiki/NoSQL) database with focuses on speed, flexibility, and scalability.
|
||||
- DynamoDB is priced on a combination of throughput and storage.
|
||||
|
||||
### Alternatives and Lock-in
|
||||
### DynamoDB Alternatives and Lock-in
|
||||
|
||||
- ⛓ Unlike the technologies behind many other Amazon products, DynamoDB is a proprietary AWS product with no interface-compatible alternative available as an open source project. If you tightly couple your application to its API and featureset, it will take significant effort to replace.
|
||||
- The most commonly used alternative to DynamoDB is [Cassandra](http://cassandra.apache.org/).
|
||||
|
||||
### Tips
|
||||
### DynamoDB Tips
|
||||
|
||||
- There is a [local version](https://aws.amazon.com/blogs/aws/dynamodb-local-for-desktop-development/) of DynamoDB provided for developer use.
|
||||
- [DynamoDB Streams](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html) provides an ordered stream of changes to a table. Use it to replicate, back up, or drive events off of data
|
||||
- DynamoDB can be used [as a simple locking service](https://gist.github.com/ryandotsmith/c95fd21fab91b0823328).
|
||||
|
||||
### Gotchas and Limitations
|
||||
### DynamoDB Gotchas and Limitations
|
||||
|
||||
- 🔸 DynamoDB doesn’t provide an easy way to bulk-load data (it is possible through [Data Pipeline](http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-part1.html), and this has some [unfortunate consequences](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.AvoidExcessivePTIncreases). Since you need to use the regular service APIs to update existing or create new rows, it is common to temporarily turn up a destination table’s write throughput to speed import. But when the table’s write capacity is increased, DynamoDB may do an irreversible split of the partitions underlying the table, spreading the total table capacity evenly across the new generation of tables. Later, if the capacity is reduced, the capacity for each partition is also reduced but the total number of partitions is not, leaving less capacity for each partition. This leaves the table in a state where it much easier for hotspots to overwhelm individual partitions.
|
||||
- It is important to make sure that DynamoDB [resource limits](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html#limits-data-types) are compatible with your dataset and workload. For example, the maximum size value that can be added to a DynamoDB table is 400 KB (larger items can be stored in S3 and a URL stored in DynamoDB).
|
||||
|
@ -913,7 +913,7 @@ DynamoDB
|
|||
ECS
|
||||
---
|
||||
|
||||
### Basics
|
||||
### ECS Basics
|
||||
|
||||
- 📒 [Homepage](https://aws.amazon.com/ecs/) ∙ [Developer guide](http://docs.aws.amazon.com/AmazonECS/latest/developerguide/) ∙ [FAQ](https://aws.amazon.com/ecs/faqs/) ∙ [Pricing](https://aws.amazon.com/ecs/pricing/)
|
||||
- **ECS** (EC2 Container Service) is a relatively new service (launched end of 2014) that manages clusters of services deployed via Docker.
|
||||
|
@ -927,7 +927,7 @@ ECS
|
|||
- If you want fast fleet-wide pulls of large images, you’ll need to push your image into a region-local registry.
|
||||
- Doesn’t support custom domains / certificates.
|
||||
|
||||
### Tips
|
||||
### ECS Tips
|
||||
|
||||
- [This blog from Convox](https://convox.com/blog/ecs-challenges/) (and [commentary](https://news.ycombinator.com/item?id=11598058)) lists a number of common challenges with ECS as of early 2016.
|
||||
|
||||
|
@ -936,12 +936,12 @@ ECS
|
|||
Lambda
|
||||
------
|
||||
|
||||
### Basics
|
||||
### Lambda Basics
|
||||
|
||||
- 📒 [Homepage](https://aws.amazon.com/lambda/) ∙ [Developer guide](http://docs.aws.amazon.com/lambda/latest/dg/) ∙ [FAQ](https://aws.amazon.com/lambda/faqs/) ∙ [Pricing](https://aws.amazon.com/lambda/pricing/)
|
||||
- **Lambda** is a relatively new service (launched at end of 2014) that offers a different type of compute abstraction: A user-defined function that can perform a small operation, where AWS manages provisioning and scheduling how it is run.
|
||||
|
||||
### Tips
|
||||
### Lambda Tips
|
||||
|
||||
- **What does “serverless” mean?** This idea of using Lambda for application logic has grown to be called **serverless** since you don't explicitly manage any server instances, as you would with EC2. This term is a bit confusing since the functions themselves do of course run on servers managed by AWS. [Serverless, Inc.](http://serverless.com/) also uses this word for the name of their company and [their own open source framework](https://github.com/serverless/serverless), but the term is usually meant more generally.
|
||||
- The release of Lambda and [API Gateway](#api-gateway) in 2015 triggered a startlingly rapid adoption in 2016, with many people writing about [serverless architectures](http://martinfowler.com/articles/serverless.html) in which many applications traditionally solved by managing EC2 servers can be built without explicitly managing servers at all.
|
||||
|
@ -949,11 +949,11 @@ Lambda
|
|||
- The [Awesome Serverless](https://github.com/anaibol/awesome-serverless) list gives a good set of examples of the relatively new set of tools and frameworks around Lambda.
|
||||
- The [**Serverless framework**](https://github.com/serverless/serverless) is a leading new approach designed to help group and manage Lambda functions. It’s approaching version 1 as of August 2016) and is popular among a small number of users.
|
||||
|
||||
### Alternatives and Lock-in
|
||||
### Lambda Alternatives and Lock-in
|
||||
|
||||
- 🚪Other clouds offer similar services with different names, including [Google Cloud Functions](https://cloud.google.com/functions/), [Azure Functions](https://azure.microsoft.com/en-us/services/functions/), and [IBM OpenWhisk](http://www.ibm.com/cloud-computing/bluemix/openwhisk/).
|
||||
|
||||
### Gotchas and Limitations
|
||||
### Lambda Gotchas and Limitations
|
||||
|
||||
- 🔸Lambda is a new technology. As of mid 2016, only a few companies are using it for large-scale production applications.
|
||||
- 🔸Managing lots of Lambda functions is a workflow challenge, and tooling to manage Lambda deployments is still immature.
|
||||
|
@ -964,14 +964,14 @@ Lambda
|
|||
API Gateway
|
||||
-----------
|
||||
|
||||
### Basics
|
||||
### API Gateway Basics
|
||||
|
||||
- 📒 [Homepage](https://aws.amazon.com/api-gateway/) ∙ [Developer guide](http://docs.aws.amazon.com/apigateway/latest/developerguide/) ∙ [FAQ](https://aws.amazon.com/api-gateway/faqs/) ∙ [Pricing](https://aws.amazon.com/api-gateway/pricing/)
|
||||
- **API Gateway** provides a scalable, secured front-end for service APIs, and can work with Lambda, Elastic Beanstalk, or regular EC2 services.
|
||||
- It allows “serverless” deployment of applications built with Lambda.
|
||||
- 🔸Switching over deployments after upgrades can be tricky. There are no built-in mechanisms to have a single domain name migrate from one API gateway to another one. So it may be necessary to build an additional layer in front (even another API Gateway) to allow smooth migration from one deployment to another.
|
||||
|
||||
### Gotchas and Limitations
|
||||
### API Gateway Gotchas and Limitations
|
||||
|
||||
- 🔸API Gateway only supports encrypted (https) endpoints, and does not support unencrypted HTTP. (This is probably a good thing.)
|
||||
- 🔸API Gateway endpoints are public — there is no mechanism to build private endpoints, e.g. for internal use.
|
||||
|
@ -981,19 +981,19 @@ API Gateway
|
|||
Route 53
|
||||
--------
|
||||
|
||||
### Basics
|
||||
### Route 53 Basics
|
||||
|
||||
- 📒 [Homepage](https://aws.amazon.com/route53/) ∙ [Developer guide](http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/) ∙ [FAQ](https://aws.amazon.com/route53/faqs/) ∙ [Pricing](https://aws.amazon.com/route53/pricing/)
|
||||
- **Route 53** is AWS’ DNS service.
|
||||
|
||||
### Alternatives and Lock-In
|
||||
### Route 53 Alternatives and Lock-In
|
||||
|
||||
- Historically, AWS was slow to penetrate the DNS market (as it is often driven by perceived reliability and long-term vendor relationships) but Route 53 has matured and [is becoming the standard option](https://www.datanyze.com/market-share/dns/) for many companies. Route 53 is cheap by historic DNS standards, as it has a fairly large global network with geographic DNS and other formerly “premium” features. It’s convenient if you are already using AWS.
|
||||
- ⛓Generally you don’t get locked into a DNS provider for simple use cases, but increasingly become tied in once you use specific features like geographic routing or Route 53’s alias records.
|
||||
- 🚪Many alternative DNS providers exist, ranging from long-standing premium brands like [UltraDNS](https://www.neustar.biz/services/dns-services) and [Dyn](http://dyn.com/managed-dns/) to less well known, more modestly priced brands like [DNSMadeEasy](http://www.dnsmadeeasy.com/). Most DNS experts will tell you that the market is opaque enough that reliability and performance don’t really correlate well with price.
|
||||
- ⏱Route 53 is usually somewhere in the middle of the pack on performance tests, e.g. the [SolveDNS reports](http://www.solvedns.com/dns-comparison/).
|
||||
|
||||
### Tips
|
||||
### Route 53 Tips
|
||||
|
||||
- 🔹Know about Route 53’s “alias” records:
|
||||
- Route 53 supports all the standard DNS record types, but note that [**alias resource record sets**](http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/resource-record-sets-choosing-alias-non-alias.html) are not standard part of DNS, but a specific Route 53 feature. (It’s available from other DNS providers too, but each provider has a different name for it.)
|
||||
|
@ -1007,21 +1007,21 @@ Route 53
|
|||
CloudFormation
|
||||
--------------
|
||||
|
||||
### Basics
|
||||
### CloudFormation Basics
|
||||
|
||||
- 📒 [Homepage](https://aws.amazon.com/cloudformation/) ∙ [Developer guide](http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/) ∙ [FAQ](https://aws.amazon.com/cloudformation/faqs/) ∙ [Pricing](https://aws.amazon.com/cloudformation/pricing/) at no additional charge
|
||||
- **CloudFormation** offers mechanisms to create and manage entire configurations of many types of AWS resources, using a JSON-based templating language.
|
||||
- 💸CloudFormation itself has [no additional charge](https://aws.amazon.com/cloudformation/pricing/) itself; you pay for the underlying resources.
|
||||
|
||||
### Alternatives and Lock-In
|
||||
### CloudFormation Alternatives and Lock-In
|
||||
|
||||
- Hashicorp’s [Terraform](https://www.terraform.io/intro/vs/cloudformation.html) is a third-party alternative.
|
||||
|
||||
### Tips
|
||||
### CloudFormation Tips
|
||||
|
||||
- [Troposphere](https://github.com/cloudtools/troposphere) is a Python library that makes it much easier to create CloudFormation templates.
|
||||
|
||||
### Gotchas and Limitations
|
||||
### CloudFormation Gotchas and Limitations
|
||||
|
||||
- 🔸CloudFormation is useful but complex and with a variety of pain points. Many companies find alternate solutions, and many companies use it, but only with significant additional tooling.
|
||||
- 🔸CloudFormation syntax is an awkward JSON format that makes both reading and debugging difficult.
|
||||
|
@ -1033,14 +1033,14 @@ CloudFormation
|
|||
VPCs, Network Security, and Security Groups
|
||||
-------------------------------------------
|
||||
|
||||
### Basics
|
||||
### VPC Basics
|
||||
|
||||
- 📒 [Homepage](https://aws.amazon.com/vpc/) ∙ [User guide](http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide) ∙ [FAQ](https://aws.amazon.com/vpc/faqs/) ∙ [Security groups](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html) ∙ [Pricing](https://aws.amazon.com/vpc/pricing/)
|
||||
- **VPC** (Virtual Private Cloud) is the virtualized networking layer of your AWS systems.
|
||||
- Most AWS users should have a basic understanding of VPC concepts, but few need to get into all the details. VPC configurations can be trivial or extremely complex, depending on the extent of your network and security needs.
|
||||
- All modern AWS accounts (those created [after 2013-12-04](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-vpc.html)) are “EC2-VPC” accounts that support VPCs, and all instances will be in a default VPC. Older accounts may still be using “EC2-Classic” mode. Some features don’t work without VPCs, so you probably will want to [migrate](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/vpc-migrate.html).
|
||||
|
||||
### Tips
|
||||
### VPC and Network Security Tips
|
||||
|
||||
- ❗**Security groups** are your first line of defense for your servers. Be extremely restrictive of what ports are open to all incoming connections. In general, if you use ELBs, ALBs or other load balancing, the only ports that need to be open to incoming traffic would be port 22 and whatever port your application uses.
|
||||
- **Port hygiene:** A good habit is to pick unique ports within an unusual range for each different kind of production service. For example, your web fronted might use 3010, your backend services 3020 and 3021, and your Postgres instances the usual 5432. Then make sure you have fine-grained security groups for each set of servers. This makes you disciplined about listing out your services, but also is more error-proof. For example, should you accidentally have an extra Apache server running on the default port 80 on a backend server, it will not be exposed.
|
||||
|
@ -1052,7 +1052,7 @@ VPCs, Network Security, and Security Groups
|
|||
- e.g. A bug in the YAML parser used by the Ruby on Rails admin site is much less serious when the admin site is only visible to the private network and accessed through VPN.
|
||||
- Another common pattern (especially as deployments get larger, security or regulatory requirements get more stringent, or team sizes increase) is to provide a [bastion host](https://www.pandastrike.com/posts/20141113-bastion-hosts) behind a VPN through which all SSH connections need to transit.
|
||||
|
||||
### Gotchas and Limitations
|
||||
### VPC and Netork Security Gotchas and Limitations
|
||||
|
||||
- 🔸Security groups are not shared across data centers, so if you have infrastructure in multiple data centers, you should make sure your configuration/deployment tools take that into account.
|
||||
- ❗Be careful when choosing your VPC IP CIDR block: If you are going to need to make use of [ClassicLink](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/vpc-classiclink.html), make sure that your private IP range [doesn’t overlap](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/vpc-classiclink.html#classiclink-limitations) with that of EC2 Classic.
|
||||
|
@ -1061,12 +1061,12 @@ VPCs, Network Security, and Security Groups
|
|||
KMS
|
||||
---
|
||||
|
||||
### Basics
|
||||
### KMS Basics
|
||||
|
||||
- 📒 [Homepage](https://aws.amazon.com/kms/) ∙ [Developer guide](http://docs.aws.amazon.com/kms/latest/developerguide/) ∙ [FAQ](https://aws.amazon.com/kms/faqs/) ∙ [Pricing](https://aws.amazon.com/kms/pricing/)
|
||||
- **KMS** (Key Management Service) is secure service for storing keys, such encryption keys for [EBS](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSEncryption.html) and [S3](http://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html).
|
||||
|
||||
### Tips
|
||||
### KMS Tips
|
||||
|
||||
- 🔹It’s very common for companies to manage keys completely via home-grown mechanisms, but it’s far preferable to use a service such as KMS from the beginning, as it encourages more secure design and improves policies and processes around managing keys.
|
||||
- A good motivation and overview is in [this AWS presentation](http://www.slideshare.net/AmazonWebServices/encryption-and-key-management-in-aws).
|
||||
|
@ -1077,17 +1077,17 @@ KMS
|
|||
CloudFront
|
||||
----------
|
||||
|
||||
### Basics
|
||||
### CloudFront Basics
|
||||
|
||||
- 📒 [Homepage](https://aws.amazon.com/cloudfront/) ∙ [Developer guide](http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/) ∙ [FAQ](https://aws.amazon.com/cloudfront/faqs/) ∙ [Pricing](https://aws.amazon.com/cloudfront/pricing/)
|
||||
- **CloudFront** is AWS’ [content delivery network (CDN)](https://en.wikipedia.org/wiki/Content_delivery_network).
|
||||
- Its primary use is improving latency for end users in to accessing cacheable content by hosting it at [about 40 global edge locations](http://aws.amazon.com/cloudfront/details/).
|
||||
|
||||
### Alternatives and Lock-in
|
||||
### CloudFront Alternatives and Lock-in
|
||||
|
||||
- 🚪CDNs are [a highly fragmented market](https://www.datanyze.com/market-share/cdn/). CloudFront has grown to be a leader, but many alternatives that might better suit specific needs.
|
||||
|
||||
### Tips
|
||||
### CloudFront Tips
|
||||
|
||||
- In its basic version, CloudFront [supports SSL](http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/SecureConnections.html) via the [SNI extension to TLS](https://en.wikipedia.org/wiki/Server_Name_Indication), which is supported by all modern web browsers. If you need to support older browsers, you need to pay a few hundred dollars a month for dedicated IPs.
|
||||
- 💸⏱Consider invalidation needs carefully. CloudFront [does support invalidation](http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Invalidation.html) of objects from edge locations, but this typically takes many minutes to propagate to edge locations, and costs $0.005 per request after the first 1000 requests. (Some other CDNs support this better.)
|
||||
|
@ -1095,7 +1095,7 @@ CloudFront
|
|||
- An alternative to invalidation that is often easier to manage, and instant, is to configure the distribution to [cache with query strings](http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/QueryStringParameters.html) and then append unique query strings with versions onto assets that are updated frequently.
|
||||
- ⏱For good web performance, it’s important turn on the option to [enable compression](http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html) on CloudFront distributions if the origin is S3 or another source that does not already compress.
|
||||
|
||||
### Gotchas and Limitations
|
||||
### CloudFront Gotchas and Limitations
|
||||
|
||||
- HTTP/2 is not yet supported.
|
||||
- If using S3 as a backing store, remember that the endpoints for website hosting and for general S3 are different. Example: “bucketname.s3.amazonaws.com” is a standard S3 serving endpoint, but to have redirect and error page support, you need to use the website hosting endpoint listed for that bucket, e.g. “bucketname.s3-website-us-east-1.amazonaws.com” (or the appropriate region).
|
||||
|
@ -1103,12 +1103,12 @@ CloudFront
|
|||
DirectConnect
|
||||
-------------
|
||||
|
||||
### Basics
|
||||
### DirectConnect Basics
|
||||
|
||||
- 📒 [Homepage](https://aws.amazon.com/directconnect/) ∙ [User guide](http://docs.aws.amazon.com/directconnect/latest/UserGuide/) ∙ [FAQ](https://aws.amazon.com/directconnect/faqs/) ∙ [Pricing](https://aws.amazon.com/directconnect/pricing/)
|
||||
- **Direct Connect** is a private, dedicated connection from your network(s) to AWS.
|
||||
|
||||
### Tips
|
||||
### DirectConnect Tips
|
||||
|
||||
- If your data center has [a partnering relationship](https://aws.amazon.com/directconnect/partners/) with AWS, setup is streamlined.
|
||||
- Use for more consistent predictable network performance guarantees.
|
||||
|
@ -1120,16 +1120,16 @@ DirectConnect
|
|||
Redshift
|
||||
--------
|
||||
|
||||
### Basics
|
||||
### Redshift Basics
|
||||
|
||||
- 📒 [Homepage](https://aws.amazon.com/redshift/) ∙ [Developer guide](http://docs.aws.amazon.com/redshift/latest/dg/) ∙ [FAQ](https://aws.amazon.com/redshift/faqs/) ∙ [Pricing](https://aws.amazon.com/redshift/pricing/)
|
||||
- **Redshift** is AWS’ [data warehouse](https://en.wikipedia.org/wiki/Data_warehouse) solution, which is highly parallel, share-nothing, and columnar. It is very widely used. It [was built](https://en.wikipedia.org/wiki/Amazon_Redshift) with [ParAccel](https://en.wikipedia.org/wiki/ParAccel) and [Postgres](https://en.wikipedia.org/wiki/PostgreSQL).
|
||||
|
||||
### Alternatives and Lock-in
|
||||
### Redshift Alternatives and Lock-in
|
||||
|
||||
- ⛓🚪Whatever data warehouse you select, your business will likely be locked in for a long time. Also (and not coincidentally) the data warehouse market is highly fragmented. Selecting a data warehouse is a choice to be made carefully, with research and awareness of [the market landscape](https://www.datanami.com/2016/03/14/data-warehouse-market-ripe-disruption-gartner-says/) and what [business intelligence](https://en.wikipedia.org/wiki/Business_intelligence) tools you’ll be using.
|
||||
|
||||
### Tips
|
||||
### Redshift Tips
|
||||
|
||||
- 🔸Although Redshift is based on Postgres, its SQL dialect and performance profile are different.
|
||||
- Redshift supports only [11 primitive data types](https://docs.aws.amazon.com/redshift/latest/dg/c_Supported_data_types.html). ([List of unsupported Postgres types](https://docs.aws.amazon.com/redshift/latest/dg/c_unsupported-postgresql-datatypes.html)\)
|
||||
|
@ -1137,7 +1137,7 @@ Redshift
|
|||
- 🔸Redshift does not support many Postgres functions, most notable date/time related or aggregates. See the [full list here](https://docs.aws.amazon.com/redshift/latest/dg/c_unsupported-postgresql-functions.html).
|
||||
- Major 3rd-party BI tools support Redshift integration (see [Quora](https://www.quora.com/Which-BI-visualisation-solution-goes-best-with-Redshift)).
|
||||
|
||||
### Gotchas and Limitations
|
||||
### Redshift Gotchas and Limitations
|
||||
|
||||
- 🔸While Redshift can handle heavy queries well, it does not scale horizontally, i.e. does not handle multiple queries in parallel. Therefore, if you expect a high parallel load, consider replicating or (if possible) sharding your data across multiple clusters.
|
||||
- Redshift data commit transactions are very expensive and serialized at the cluster level. Therefore, consider grouping multiple COPY commands into a single transaction whenever possible.
|
||||
|
@ -1147,19 +1147,19 @@ Redshift
|
|||
EMR
|
||||
---
|
||||
|
||||
### Basics
|
||||
### EMR Basics
|
||||
|
||||
- 📒 [Homepage](https://aws.amazon.com/emr/) ∙ [Release guide](http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/) ∙ [FAQ](https://aws.amazon.com/emr/faqs/) ∙ [Pricing](https://aws.amazon.com/emr/pricing/)
|
||||
- **EMR** (which used to stand for Elastic Map Reduce, but not anymore, since it now extends beyond map-reduce) is a service that offers managed deployment of [Hadoop](https://en.wikipedia.org/wiki/Apache_Hadoop), [HBase](https://en.wikipedia.org/wiki/Apache_HBase) and [Spark](https://en.wikipedia.org/wiki/Apache_Spark). It reduces reduces the management burden of setting up and maintaining these services yourself.
|
||||
|
||||
### Alternatives and Lock-in
|
||||
### EMR Alternatives and Lock-in
|
||||
|
||||
- ⛓Most of EMR is based open source technology that you can in principle deploy yourself. However, the job workflows and much other tooling is AWS-specific. Migrating from EMR to your own clusters is possible but not always trivial.
|
||||
|
||||
### Tips
|
||||
### EMR Tips
|
||||
|
||||
- EMR relies on many versions of Hadoop and other supporting software. Be sure to check [which versions are in use](https://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-release-components.html).
|
||||
- **EMR costs** can pile up quickly. [This blog post](http://engineering.bloomreach.com/strategies-for-reducing-your-amazon-emr-costs/) has some tips.
|
||||
- 💸❗**EMR costs** can pile up quickly since it involves lots of instances, and efficiency can be poor depending on cluster configuration and choice of workload, and accidents like hung jobs are costly. See the [section on EC2 cost management](#ec2-cost-management), especially the tips there about spot instances and avoiding hourly billing. [This blog post](http://engineering.bloomreach.com/strategies-for-reducing-your-amazon-emr-costs/) has additional tips.
|
||||
- ⏱Off-the-shelf EMR and Hadoop can have significant overhead when compared with efficient processing on a single machine. If your data is small and performance matters, you may wish to consider alternatives, as [this post](http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html) illustrates.
|
||||
- Python programmers may want to take a look at Yelp’s [mrjob](https://github.com/Yelp/mrjob).
|
||||
- It takes time to tune performance of EMR jobs, which is why third-party services such as [Qubole’s data service](https://www.qubole.com/mapreduce-as-a-service/) are gaining popularity as ways to improve performance or reduce costs.
|
||||
|
@ -1169,7 +1169,7 @@ High Availability
|
|||
|
||||
This section covers tips and information on achieving [high availability](https://en.wikipedia.org/wiki/High_availability).
|
||||
|
||||
### Tips
|
||||
### High Availability Tips
|
||||
|
||||
- AWS offers two levels of redundancy, [regions and availability zones (AZs)](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-regions-availability-zones).
|
||||
- When used correctly, regions and zones do allow for high availability. You may want to use non-AWS providers for larger business risk mitigation (i.e. not tying your company to one vendor), but reliability of AWS across regions is very high.
|
||||
|
@ -1185,7 +1185,7 @@ This section covers tips and information on achieving [high availability](https:
|
|||
- **EBS vs instance storage:** For a number of years, EBSs had a poorer track record for availability than instance storage. For systems where individual instances can be killed and restarted easily, instance storage with sufficient redundancy could give higher availability overall. EBS has improved, and modern instance types (since 2015) are now EBS-only, so this approach, while helpful at one time, may be increasingly archaic.
|
||||
- Be sure to [use and understand ELBs/ALBs](#load-balancers) appropriately. Many outages are due to not using load balancers, or misunderstandings or misconfigurations of ELBs.
|
||||
|
||||
### Gotchas and Limitations
|
||||
### High Availability Gotchas and Limitations
|
||||
|
||||
- **AZ naming** differs from one customer account to the next. Your “us-west-1a” is not the same as another customer’s “us-west-1a” — the letters are assigned to physical AZs randomly per account. This can also be a gotcha if you have multiple AWS accounts.
|
||||
- **Cross-AZ traffic** is not free. At large scale, the costs add up to a significant amount of money. If possible, optimize your traffic to stay within the same AZ as much as possible.
|
||||
|
|
Loading…
Reference in a new issue