How Vacuum and Analyze will work in Redshift
Since redshift is designed by Postgres most of the features and commands will work in redshift too.First In order to find any tables which is out of stats we need to find in a way how many dead tuples are there in a schema
DEAD TUPLES :
If We have 10k records in a table,it will occupies 10k locations. if We deleted 5k records Now 5k memory location should become free but by default it will not become free, it will just marked as record deleted, so these type of records we will call it as dead tuples,you can't reclaim space at the same time you can't reuse those memory locations.
VACUUM :
VACUUM operation will go remove and the dead tuples and make the location as ready for use, it will not reclaim any space to os.but location will be free, anyone can use.
ANALYZE:
It will Keep stats up to date to Generate Better execution Plan
VACUUM FULL:
VACUUM will not reclaim any space, It will just removes dead tuples, now if you think data not distributed proper, huge % of bloats are there, so if you are seeing any performance impact, we will run this vacuum full to remove bloats and distribute data properly.
Query to find unsorting value to do Vacuum or analyze or vacuum full
--svv_table_info
SELECT "schema", table_id,
"table",diststyle,sortkey1,size,tbl_rows,unsorted,stats_off,skew_sortkey1,skew_rows
FROM svv_table_info
WHERE "schema" IN ('Schemaname')
AND (unsorted > 0 OR stats_off > 0)
ORDER BY stats_off DESC;
- By default, Redshift can skip the tables from vacuum Sort if the table is already at least 95 percent sorted. So we have to care about this if the tables have billions(or any huge numbers) of rows, then just 5% is a huge number of rows. So while running a vacuum, make you are defining the threshold percentage.
- Similarly, Analyze also will skip the tables from Analyzing if the out of stats is up to 10%. So we need run the following command to set this value to very low and the analyze will not skip any tables
No comments:
Post a Comment