From 001e50488fa8246e988f4b4caf2359f302d1bd19 Mon Sep 17 00:00:00 2001
From: Hai Dang
Date: Tue, 18 Oct 2016 00:27:35 -0700
Subject: [PATCH] add sections for IAM, Security groups and EMR

---
 README.md | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index a696fa7..f9e4995 100644
--- a/README.md
+++ b/README.md
@@ -526,6 +526,10 @@ We cover security basics first, since configuring user accounts is something you
 - At the beginning, IAM policy may be very simple, but for large systems, it will grow in complexity, and need to be managed with care.
 - 🔹Make sure one person (perhaps with a backup) in your organization is formally assigned ownership of managing IAM policies, make sure every administrator works with that person to have changes reviewed. This goes a long way to avoiding accidental and serious misconfigurations.
 - It is best to give each user or service the minimum privileges needed to perform their duties. This is the [principle of least privilege](https://en.wikipedia.org/wiki/Principle_of_least_privilege), one of the foundations of good security. Organize all IAM users and groups according to levels of access they need.
+- IAM evaluates permissions in the following order of precedence:
+    1. Explicit deny: An explicit deny overrides any allow, so the most restrictive policy wins.
+    2. Explicit allow: Access to any resource has to be explicitly granted.
+    3. Implicit deny: All permissions are implicitly denied by default.

 ### Security and IAM Tips

@@ -659,7 +663,7 @@ S3
 - If you are primarily using a VPC, consider setting up a [VPC Endpoint](http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-endpoints.html) for S3 in order to allow your VPC-hosted resources to easily access it without the need for extra network configuration or hops.
 - **Cross-region replication:** S3 has [a feature](https://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html) for replicating a bucket between one region and a another. Note that S3 is already highly replicated within one region, so usually this isn’t necessary for durability, but it could be useful for compliance (geographically distributed data storage), lower latency, or as a strategy to reduce region-to-region bandwidth costs by mirroring heavily used data in a second region.
 - **IPv4 vs IPv6:** For a long time S3 only supported IPv4 at the default endpoint `https://BUCKET.s3.amazonaws.com`. However, [as of Aug 11, 2016](https://aws.amazon.com/blogs/aws/now-available-ipv6-support-for-amazon-s3/) it now supports both IPv4 & IPv6! To use both, you have to [enable dualstack](http://docs.aws.amazon.com/AmazonS3/latest/dev/dual-stack-endpoints.html) either in your preferred API client or by directly using this url scheme `https://BUCKET.s3.dualstack.REGION.amazonaws.com`.
-
+- **S3 Event Notifications:** S3 can be configured to publish an SNS notification, send an SQS message, or invoke an AWS Lambda function on bucket events. [doc](http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html)
 ### S3 Gotchas and Limitations

 - 🔸For many years, there was a notorious [**100-bucket limit**](https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html#limits_s3) per account, which could not be raised and caused many companies significant pain. As of 2015, you can [request increases](https://aws.amazon.com/about-aws/whats-new/2015/08/amazon-s3-introduces-new-usability-enhancements/). You can ask to increase the limit, but it will still be capped (generally below ~1000 per account).
@@ -1128,7 +1132,7 @@ VPCs, Network Security, and Security Groups

 ### VPC and Network Security Tips

-- ❗**Security groups** are your first line of defense for your servers. Be extremely restrictive of what ports are open to all incoming connections. In general, if you use CLBs, ALBs or other load balancing, the only ports that need to be open to incoming traffic would be port 22 and whatever port your application uses.
+- ❗**Security groups** are your first line of defense for your servers. Be extremely restrictive of what ports are open to all incoming connections. In general, if you use CLBs, ALBs or other load balancing, the only ports that need to be open to incoming traffic would be port 22 and whatever port your application uses. Security groups follow a 'deny by default' access policy: incoming traffic is blocked unless a rule explicitly allows it.
 - **Port hygiene:** A good habit is to pick unique ports within an unusual range for each different kind of production service. For example, your web fronted might use 3010, your backend services 3020 and 3021, and your Postgres instances the usual 5432. Then make sure you have fine-grained security groups for each set of servers. This makes you disciplined about listing out your services, but also is more error-proof. For example, should you accidentally have an extra Apache server running on the default port 80 on a backend server, it will not be exposed.
 - **Migrating from Classic**: For migrating from older EC2-Classic deployments to modern EC2-VPC setup, [this article](http://blog.kiip.me/engineering/ec2-to-vpc-executing-a-zero-downtime-migration/) may be of help.
 - For basic AWS use, one default VPC may be sufficient. But as you scale up, you should consider mapping out network topology more thoroughly. A good overview of best practices is [here](http://blog.flux7.com/blogs/aws/vpc-best-configuration-practices).
@@ -1265,6 +1269,9 @@ EMR
 - EMR relies on many versions of Hadoop and other supporting software. Be sure to check [which versions are in use](https://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-release-components.html).
 - ⏱Off-the-shelf EMR and Hadoop can have significant overhead when compared with efficient processing on a single machine. If your data is small and performance matters, you may wish to consider alternatives, as [this post](http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html) illustrates.
 - Python programmers may want to take a look at Yelp’s [mrjob](https://github.com/Yelp/mrjob).
+- Larger instances don't necessarily cost more on the spot market, so compare the available options and decide which instance types to bid on for your jobs.
+- Since you are billed at hourly granularity, it is usually beneficial to increase the number of instances and/or use larger instance types to keep the running time of your EMR jobs under one hour.
+- Bootstrap actions ([doc](http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-bootstrap.html)) are used to configure the number of mappers and reducers and the amount of memory for your EMR jobs. They are also used to install extra software on your cluster before your EMR jobs run.
 - It takes time to tune performance of EMR jobs, which is why third-party services such as [Qubole’s data service](https://www.qubole.com/mapreduce-as-a-service/) are gaining popularity as ways to improve performance or reduce costs.

 ### EMR Gotchas and Limitations