From 71b3cbc0fd03b01389ffdf166fe65a8ccdc8d226 Mon Sep 17 00:00:00 2001
From: Halil Duygulu
Date: Fri, 14 Oct 2016 10:24:07 +0300
Subject: [PATCH 1/4] Added some Redshift tips.

---
 README.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/README.md b/README.md
index 2fac26f..599203a 100644
--- a/README.md
+++ b/README.md
@@ -1252,6 +1252,9 @@ Redshift
 - ❗ Never resize a live cluster. The resize operation takes hours depending on the dataset size. In rare cases, the operation may also get stuck and you'll end up having a non-functional cluster. The safer approach is to create a new cluster from a snapshot, resize the new cluster and shut down the old one.
 - Redshift has reserved keywords which are not present in Postgres (see full list [here](https://docs.aws.amazon.com/redshift/latest/dg/r_pg_keywords.html)). Watch out for DELTA ([Delta Encodings](https://docs.aws.amazon.com/redshift/latest/dg/c_Delta_encoding.html)).
 - Redshift does not support many Postgres functions, most notably several date/time-related and aggregation functions. See the [full list here](https://docs.aws.amazon.com/redshift/latest/dg/c_unsupported-postgresql-functions.html).
+- 🔹 If you need to change sort key or dist key of a table, you need to create a new table with new key and move your data to new table with insert into new_table select * from old_table. [Choosing Sort Key](http://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html)
+- ❗🚪 When moving data with insert into x select from y; you need to have at least twice of y table size free space on your harddisk. Redshift first copies data to disk then to new table.
+- 🔹 Vacuum delete only does not block copy commands, but vacuum reindex can block if you execute copy command every minute or so.
 
 EMR
 ---

From 12f149dfdef9813e1b0ca361e7a27685514fe3ba Mon Sep 17 00:00:00 2001
From: Halil Duygulu
Date: Mon, 17 Oct 2016 14:00:44 +0300
Subject: [PATCH 2/4] Changes requested for pr

---
 README.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 599203a..aa8ebc8 100644
--- a/README.md
+++ b/README.md
@@ -1252,9 +1252,8 @@ Redshift
 - ❗ Never resize a live cluster. The resize operation takes hours depending on the dataset size. In rare cases, the operation may also get stuck and you'll end up having a non-functional cluster. The safer approach is to create a new cluster from a snapshot, resize the new cluster and shut down the old one.
 - Redshift has reserved keywords which are not present in Postgres (see full list [here](https://docs.aws.amazon.com/redshift/latest/dg/r_pg_keywords.html)). Watch out for DELTA ([Delta Encodings](https://docs.aws.amazon.com/redshift/latest/dg/c_Delta_encoding.html)).
 - Redshift does not support many Postgres functions, most notably several date/time-related and aggregation functions. See the [full list here](https://docs.aws.amazon.com/redshift/latest/dg/c_unsupported-postgresql-functions.html).
-- 🔹 If you need to change sort key or dist key of a table, you need to create a new table with new key and move your data to new table with insert into new_table select * from old_table. [Choosing Sort Key](http://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html)
-- ❗🚪 When moving data with insert into x select from y; you need to have at least twice of y table size free space on your harddisk. Redshift first copies data to disk then to new table.
-- 🔹 Vacuum delete only does not block copy commands, but vacuum reindex can block if you execute copy command every minute or so.
+- 🔹 [Choosing Sort Key](http://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html) is very important since you can not change sort key after table created. If you need to change sort key or distribution key of a table, you need to create a new table with new key and move your data into it with a query like "insert into new_table select * from old_table".
+- ❗🚪 When moving data with a query that looks like “insert into x select from y;” you need to have as twice as much space as table “y” takes up available on your cluster's disk. Redshift first copies the data to disk and then to the new table. A good [article](https://www.periscopedata.com/blog/changing-dist-and-sort-keys-in-redshift.html) about how to this for big tables.
 
 EMR
 ---

From 7d6c2d5558cb695215e639a70673a8bf21b2e8fe Mon Sep 17 00:00:00 2001
From: Halil Duygulu
Date: Tue, 18 Oct 2016 10:02:46 +0300
Subject: [PATCH 3/4] Made changes requested for pr

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index aa8ebc8..dea09f8 100644
--- a/README.md
+++ b/README.md
@@ -1252,8 +1252,8 @@ Redshift
 - ❗ Never resize a live cluster. The resize operation takes hours depending on the dataset size. In rare cases, the operation may also get stuck and you'll end up having a non-functional cluster. The safer approach is to create a new cluster from a snapshot, resize the new cluster and shut down the old one.
 - Redshift has reserved keywords which are not present in Postgres (see full list [here](https://docs.aws.amazon.com/redshift/latest/dg/r_pg_keywords.html)). Watch out for DELTA ([Delta Encodings](https://docs.aws.amazon.com/redshift/latest/dg/c_Delta_encoding.html)).
 - Redshift does not support many Postgres functions, most notably several date/time-related and aggregation functions. See the [full list here](https://docs.aws.amazon.com/redshift/latest/dg/c_unsupported-postgresql-functions.html).
-- 🔹 [Choosing Sort Key](http://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html) is very important since you can not change sort key after table created. If you need to change sort key or distribution key of a table, you need to create a new table with new key and move your data into it with a query like "insert into new_table select * from old_table".
-- ❗🚪 When moving data with a query that looks like “insert into x select from y;” you need to have as twice as much space as table “y” takes up available on your cluster's disk. Redshift first copies the data to disk and then to the new table. A good [article](https://www.periscopedata.com/blog/changing-dist-and-sort-keys-in-redshift.html) about how to this for big tables.
+- 🔹 [Choosing a sort key](http://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html) is very important since you can not change a table‘s sort key after it is created. If you need to change the sort or distribution key of a table, you need to create a new table with the new key and move your data into it with a query like “insert into new_table select * from old_table“.
+- ❗🚪 When moving data with a query that looks like “insert into x select from y“ you need to have as twice as much space as table “y“ takes up available on your cluster‘s disk. Redshift first copies the data to disk and then to the new table. [Here](https://www.periscopedata.com/blog/changing-dist-and-sort-keys-in-redshift.html) is a good article on how to this for big tables.
 
 EMR
 ---

From 51a5671f8a2ee903681cf0790b520360669e7489 Mon Sep 17 00:00:00 2001
From: Halil Duygulu
Date: Wed, 19 Oct 2016 11:40:10 +0300
Subject: [PATCH 4/4] Changes for " and '

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index dea09f8..7dd67df 100644
--- a/README.md
+++ b/README.md
@@ -1252,8 +1252,8 @@ Redshift
 - ❗ Never resize a live cluster. The resize operation takes hours depending on the dataset size. In rare cases, the operation may also get stuck and you'll end up having a non-functional cluster. The safer approach is to create a new cluster from a snapshot, resize the new cluster and shut down the old one.
 - Redshift has reserved keywords which are not present in Postgres (see full list [here](https://docs.aws.amazon.com/redshift/latest/dg/r_pg_keywords.html)). Watch out for DELTA ([Delta Encodings](https://docs.aws.amazon.com/redshift/latest/dg/c_Delta_encoding.html)).
 - Redshift does not support many Postgres functions, most notably several date/time-related and aggregation functions. See the [full list here](https://docs.aws.amazon.com/redshift/latest/dg/c_unsupported-postgresql-functions.html).
-- 🔹 [Choosing a sort key](http://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html) is very important since you can not change a table‘s sort key after it is created. If you need to change the sort or distribution key of a table, you need to create a new table with the new key and move your data into it with a query like “insert into new_table select * from old_table“.
-- ❗🚪 When moving data with a query that looks like “insert into x select from y“ you need to have as twice as much space as table “y“ takes up available on your cluster‘s disk. Redshift first copies the data to disk and then to the new table. [Here](https://www.periscopedata.com/blog/changing-dist-and-sort-keys-in-redshift.html) is a good article on how to this for big tables.
+- 🔹 [Choosing a sort key](http://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html) is very important since you cannot change a table’s sort key after it is created. If you need to change the sort or distribution key of a table, you need to create a new table with the new key and move your data into it with a query like “insert into new_table select * from old_table”.
+- ❗🚪 When moving data with a query that looks like “insert into x select from y”, you need to have twice as much disk space available as table “y” takes up on the cluster’s disks. Redshift first copies the data to disk and then to the new table. [Here](https://www.periscopedata.com/blog/changing-dist-and-sort-keys-in-redshift.html) is a good article on how to do this for big tables.
 
 EMR
 ---
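The two tips that survive this series (recreate a table to change its sort or distribution key, and keep roughly twice the source table's size free before copying) can be sketched as below. This is a minimal Python sketch under stated assumptions, not part of the patch: `TableSpec`, `deep_copy_statements` and `has_room_for_copy` are hypothetical helpers, the table and column names are placeholders, and the code only builds SQL strings and applies the 2x rule of thumb; it does not connect to a cluster.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TableSpec:
    """Hypothetical description of the new table's layout."""
    name: str
    columns_ddl: str  # placeholder column list, e.g. "id BIGINT, created_at TIMESTAMP"
    distkey: str
    sortkey: tuple[str, ...]


def deep_copy_statements(old_table: str, new: TableSpec) -> list[str]:
    """Return SQL that creates a table with the new keys and copies the rows into it.

    Following the tip, the sort key is treated as fixed once a table exists,
    so a new table is created with the desired DISTKEY/SORTKEY and filled
    with INSERT INTO ... SELECT * FROM the old table.
    """
    sortkey = ", ".join(new.sortkey)
    return [
        f"CREATE TABLE {new.name} ({new.columns_ddl}) "
        f"DISTKEY({new.distkey}) SORTKEY({sortkey});",
        f"INSERT INTO {new.name} SELECT * FROM {old_table};",
    ]


def has_room_for_copy(table_size_mb: int, free_space_mb: int) -> bool:
    """Rule of thumb from the tip: free space must be at least 2x the source table."""
    return free_space_mb >= 2 * table_size_mb


if __name__ == "__main__":
    spec = TableSpec(
        name="events_new",  # hypothetical table name
        columns_ddl="id BIGINT, created_at TIMESTAMP",
        distkey="id",
        sortkey=("created_at",),
    )
    # Only print the statements when the 2x free-space check passes.
    if has_room_for_copy(table_size_mb=10_000, free_space_mb=25_000):
        for stmt in deep_copy_statements("events", spec):
            print(stmt)
```

The size inputs are left to the caller on purpose; one possible source for the table size is the `size` column of Redshift's `SVV_TABLE_INFO` view, which is reported in 1 MB blocks.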