Sunday, February 2, 2025

 

Redshift User Administration

1.1        Add New Users 

Amazon Redshift user accounts can only be created and dropped by a database superuser.

 

CREATE USER username PASSWORD '#######';

 
Connection limit:
 

CREATE USER username PASSWORD '#######' connection limit 10;

 
Syslog access:
CREATE USER username PASSWORD '#######' syslog access unrestricted;

Syslog access specifies the user's access level to the Amazon Redshift system tables and views.
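The access level can also be changed later for an existing user with ALTER USER; a minimal sketch (the user name is a placeholder):

ALTER USER username SYSLOG ACCESS RESTRICTED;   -- user sees only its own rows in system tables and views
ALTER USER username SYSLOG ACCESS UNRESTRICTED; -- user sees all rows in user-visible system tables and views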

 

Superuser:

CREATE USER adminuser CREATEUSER PASSWORD '#######';
 
Show all users:
 

select * from pg_user

 
Users can be assigned ownership of tables:
 
alter table schemaname.tablename owner to user001;

1.2        Adding Groups

Groups are used to simplify the management of permissions and resources.

 

CREATE GROUP groupname;

CREATE USER username PASSWORD '#######' in group group001;
 
Show all groups:
 

select * from pg_group

1.3        Remove Users 

drop user username;

 
Before a user can be dropped, all dependent objects must be reassigned or dropped and all granted privileges revoked. 
The Redshift Utilities GitHub repository provides example statements 
and views for accessing Redshift metadata.
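A minimal sketch of the typical cleanup before the drop succeeds; the schema, table, and user names are placeholders:

-- transfer ownership of objects the user owns
alter table schemaname.tablename owner to another_user;

-- revoke privileges that were granted to the user
revoke all on all tables in schema schemaname from username;
revoke usage on schema schemaname from username;

-- now the user can be dropped
drop user username;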
 

1.4        Set user max connection limit

alter user username connection limit 10;

1.5        Set user query group (needs to tie into setting up a queue in WLM section above)

If a user group name is listed in a WLM queue definition, queries run by members of that user group will be assigned to the corresponding WLM queue.

 

alter group groupname add user username;

 

You can assign a query to a queue at run time by assigning your query to the appropriate query group. Use the SET command to begin a query group.

 

set query_group to groupname;
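A minimal run-time usage sketch, assuming groupname is the query group label attached to a WLM queue and the table name is a placeholder:

set query_group to groupname;
select count(*) from schemaname.tablename;  -- runs in the queue mapped to groupname
reset query_group;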

1.6        Adding database

It is recommended to use a single database within a Redshift cluster for all your application-related schemas. A database can be created during the cluster creation process.
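If an additional database is still needed, it can be created with CREATE DATABASE; a minimal sketch (the name and connection limit are placeholders):

CREATE DATABASE dbname CONNECTION LIMIT 20;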

 

View databases:

select datname, datdba, datconnlimit from pg_database_info where datdba > 1;

1.7        Adding schemas

create schema schemaname authorization user001;
 

select nspname as schema, usename as owner from pg_namespace, pg_user where pg_namespace.nspowner = pg_user.usesysid and pg_user.usename ='user001';

 
You can set a schema search path for a user. Unqualified table names are resolved using the schemas listed in search_path, in order.

alter user user001 set search_path to schemaname, public;
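To verify the search path for the current session:

show search_path;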

 

Grant select on a table to a group:

 

GRANT SELECT ON "database".schema.tablename TO group groupname;
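Note that SELECT on the table alone is usually not enough for the group to query it; the group also needs USAGE on the schema. A minimal sketch with placeholder names:

GRANT USAGE ON SCHEMA schemaname TO GROUP groupname;
GRANT SELECT ON schemaname.tablename TO GROUP groupname;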

 

Query to list each user name and its associated group name

 

SELECT usename AS user_name, groname AS group_name
FROM pg_user, pg_group
WHERE pg_user.usesysid = ANY(pg_group.grolist)
AND pg_group.groname IN (SELECT DISTINCT pg_group.groname FROM pg_group);

 

Query to list the permissions granted to a group (generates the equivalent GRANT statements)

 

select relacl,
'grant ' || substring(
            case when charindex('r',split_part(split_part(array_to_string(relacl, '|'),pu.groname,2 ) ,'/',1)) > 0 then ',select ' else '' end
          ||case when charindex('w',split_part(split_part(array_to_string(relacl, '|'),pu.groname,2 ) ,'/',1)) > 0 then ',update ' else '' end
          ||case when charindex('a',split_part(split_part(array_to_string(relacl, '|'),pu.groname,2 ) ,'/',1)) > 0 then ',insert ' else '' end
          ||case when charindex('d',split_part(split_part(array_to_string(relacl, '|'),pu.groname,2 ) ,'/',1)) > 0 then ',delete ' else '' end
          ||case when charindex('R',split_part(split_part(array_to_string(relacl, '|'),pu.groname,2 ) ,'/',1)) > 0 then ',rule ' else '' end
          ||case when charindex('x',split_part(split_part(array_to_string(relacl, '|'),pu.groname,2 ) ,'/',1)) > 0 then ',references ' else '' end
          ||case when charindex('t',split_part(split_part(array_to_string(relacl, '|'),pu.groname,2 ) ,'/',1)) > 0 then ',trigger ' else '' end
          ||case when charindex('X',split_part(split_part(array_to_string(relacl, '|'),pu.groname,2 ) ,'/',1)) > 0 then ',execute ' else '' end
          ||case when charindex('U',split_part(split_part(array_to_string(relacl, '|'),pu.groname,2 ) ,'/',1)) > 0 then ',usage ' else '' end
          ||case when charindex('C',split_part(split_part(array_to_string(relacl, '|'),pu.groname,2 ) ,'/',1)) > 0 then ',create ' else '' end
          ||case when charindex('T',split_part(split_part(array_to_string(relacl, '|'),pu.groname,2 ) ,'/',1)) > 0 then ',temporary ' else '' end
       , 2,10000)
|| ' on '||namespace||'.'||item ||' to "'||pu.groname||'";' as grantsql
from
(SELECT
 use.usename as subject,
 nsp.nspname as namespace,
 c.relname as item,
 c.relkind as type,
 use2.usename as owner,
 c.relacl
FROM
 pg_user use
 cross join pg_class c
 left join pg_namespace nsp on (c.relnamespace = nsp.oid)
 left join pg_user use2 on (c.relowner = use2.usesysid)
WHERE
 c.relowner = use.usesysid
 and nsp.nspname NOT IN ('pg_catalog', 'pg_toast', 'information_schema')
ORDER BY
 subject, namespace, item
) join pg_group pu on array_to_string(relacl, '|') like '%'||pu.groname||'%'
where relacl is not null
  and pu.groname = 'groupname'
order by 2;

 

Query to find a table's creation date

 

SELECT
  TRIM(nspname) AS schema_name,
  TRIM(relname) AS table_name,
  relcreationtime AS creation_time
FROM pg_class_info
LEFT JOIN pg_namespace ON pg_class_info.relnamespace = pg_namespace.oid
WHERE reltype != 0
  AND TRIM(relname) LIKE '%f_mobiler%'
  AND TRIM(nspname) = 'global'
ORDER BY relcreationtime DESC;

Monday, September 9, 2024





Description:

Amazon DocumentDB is a document database service that supports MongoDB workloads and provides virtually unlimited storage with real-time scalability. Functionality and capabilities that enable Amazon DocumentDB to deliver MongoDB-level performance include:

  •  Apache 2.0 open-source MongoDB 3.6 API that lets you use existing MongoDB drivers and tools with Amazon DocumentDB to facilitate a smooth transition.
  •  Decoupled storage and compute, allowing independent scaling to meet dynamic workload demands.
  •  Ability to add as many as 15 low-latency read replicas to handle millions of requests per second.
  •  Guaranteed 99.99% availability by replicating six copies of data across three AWS Availability Zones.
  •  Rapid (less than 30 seconds) automatic failover to a read replica in the event of failure.

 

Advantages:

Amazon DocumentDB integrates deeply with AWS services, and when combined with these services offers advantages that provide compelling reasons to migrate from MongoDB to Amazon DocumentDB:

  •  Multiple levels of database security: network isolation using Amazon VPC, encryption at rest via AWS Key Management Service (KMS), auditing, TLS for encryption in transit, and encrypted automated backups, snapshots, and replicas.
  •  Automated monitoring and backups to Amazon S3 that allow point-in-time recovery.
  •  Compliance with industry standards such as PCI DSS, ISO 9001, 27001, 27017, and 27018, as well as SOC 1, SOC 2, SOC 3, and HIPAA.
  •  Reduced operational overhead.

 Migrating to Amazon DocumentDB

We can migrate data from any MongoDB database, either on-premises or in the cloud (e.g., a MongoDB database running on Amazon EC2), to Amazon DocumentDB. There are three primary approaches for migrating your data to Amazon DocumentDB:

 Offline

 Online

 Hybrid

 Offline migration:

The offline approach uses the mongodump and mongorestore tools to migrate data from the source MongoDB deployment to the Amazon DocumentDB cluster. The offline method is the simplest migration approach, but it also incurs the most downtime for your cluster.

 The basic process for offline migration is as follows:

 1. Quiesce writes to your MongoDB source.

 2. Dump collection data and indexes from the source MongoDB deployment.

 3. Restore indexes to the Amazon DocumentDB cluster.

 4. Restore collection data to the Amazon DocumentDB cluster.

 5. Change your application endpoint to write to the Amazon DocumentDB cluster.




 Online Migration:

For migration of production workloads with minimal downtime, we can use the online approach or the hybrid approach. With the online migration approach, use AWS Database Migration Service (DMS) to migrate the data from MongoDB to Amazon DocumentDB. DMS performs an initial full load of the data from the MongoDB source to Amazon DocumentDB. During the full load, the source database remains available for operations. Once the full load is complete, DMS switches to change data capture (CDC) mode to keep the source (MongoDB) and destination (Amazon DocumentDB) in sync. Once the databases are in sync, switch your applications to point to Amazon DocumentDB with near zero downtime.



Hybrid Approach:

The hybrid approach is a combination of the offline and online migration approaches. The hybrid approach is useful in a scenario where you need minimal downtime during migration, but the size of the source database is large or sufficient bandwidth is not available to migrate the data in a reasonable amount of time. The hybrid approach has two phases. In the first phase, you export the data from the source MongoDB using the mongodump tool, transfer it to AWS (if the source is on premises), and restore it to Amazon DocumentDB. You can use AWS Direct Connect or AWS Snowball to transfer the export dump to AWS. During this phase, the source (MongoDB) is available for operations and the data restored to Amazon DocumentDB does not contain the latest changes. In the second phase, you use DMS in CDC mode to copy the changes from the source (MongoDB) to Amazon DocumentDB and keep them in sync. Once the databases are in sync, you can switch your applications to point to Amazon DocumentDB with near zero downtime.

 



For the migration, use the tools below:

https://github.com/awslabs/amazon-documentdb-tools

 1) Export indexes:

    python migrationtools/documentdb_index_tool.py --host 127.0.0.1 --port 2700 --username sysAdmin --password sysadm1298 --auth-db admin --dump-indexes --dir C:\Junk\mongoindexes

 2) Copy the index dump to a backup folder on the MongoDB node using WinSCP.

 3) Import indexes (log in to the MongoDB node):

cd /imdb001/sw/mongo-3.4.17/bin

    ./mongorestore --dryRun --ssl --host='docdb.cluster-ccr6zchg1pp7.us-east-1.docdb.amazonaws.com' --sslCAFile=rds-combined-ca-bundle.pem --username=docdbadmin --password='<password>' --db=qstaflsdb01 --port=27017 --numParallelCollections=4 --maintainInsertionOrder /imdb001/data/backup/mongoindexes/qstaflsdb01

 4) Run mongorestore for the data from the mongodump (with --noIndexRestore):

 ./mongorestore --gzip --ssl --host='docdb.cluster.us-east-1.docdb.amazonaws.com' --sslCAFile=rds-combined-ca-bundle.pem --username=docdbadmin --password='<password>' --db=test --port=27017 --numParallelCollections=8 --numInsertionWorkersPerCollection=4 --maintainInsertionOrder --noIndexRestore /imdb001/data/backup/

5) Database dump and restore (including users and roles):

 ./mongodump --gzip --db=test --port=27101 --username=sysAdmin --authenticationDatabase=admin --out=/imdb001/data/qausersroles --numParallelCollections=4 --dumpDbUsersAndRoles --excludeCollectionsWithPrefix Account --excludeCollectionsWithPrefix Fin --excludeCollectionsWithPrefix TEST --excludeCollectionsWithPrefix Gring --excludeCollectionsWithPrefix fin --excludeCollectionsWithPrefix users --password=  --sslAllowInvalidHostnames

 

 ./mongorestore --gzip --ssl --host='docdb.cluster-ccr6zchg1pp7.us-east-1.docdb.amazonaws.com' --sslCAFile=rds-combined-ca-bundle.pem --username=docdbadmin --password='<password>' --db=test --port=27017 --numParallelCollections=4 --numInsertionWorkersPerCollection=4 --maintainInsertionOrder --noIndexRestore --restoreDbUsersAndRoles /imdb001/data/qausersroles/

FULL BACKUP DUMP

[root@ip-10.0.0.0 bin]# ./mongodump --host=hostname:27101 --username=sysAdmin --password=password --out=/backupvolume


INDEX DUMP

[root@ip-10.0.0.0 tools]# python3 documentdb_index_tool.py --host hostname:27101 --port 27107 --username sysAdmin --password password --auth-db admin --dump-indexes --dir /backupvolume/main_back/indexdump

INDEX RESTORE

./mongorestore --dryRun --ssl --host=docdb.cluster.us-east-1.docdb.amazonaws.com --sslCAFile=rds-combined-ca-bundle.pem --username=username --password='password' --db=test --port=27017 --numParallelCollections=8 --numInsertionWorkersPerCollection=8 --maintainInsertionOrder /backupvolume/indexdump/










Tuesday, September 3, 2024

             How Vacuum and Analyze Work in Redshift


Since Redshift is derived from PostgreSQL, most of the same features and commands work in Redshift too. First, in order to find tables that are out of stats, we need a way to determine how many dead tuples exist in a schema.

DEAD TUPLES:

If we have 10k records in a table, they occupy 10k locations. If we delete 5k records, those 5k locations should become free, but by default they do not: the rows are only marked as deleted. Such rows are called dead tuples; the space cannot be reclaimed, and at the same time those locations cannot be reused.
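A rough way to see how many deleted (ghost) rows a table is carrying is to compare total rows with visible rows; a minimal sketch, assuming svv_table_info on the cluster exposes estimated_visible_rows and the schema name is a placeholder:

-- tbl_rows counts all rows including rows marked deleted;
-- estimated_visible_rows excludes them, so the difference approximates dead tuples
SELECT "table",
       tbl_rows,
       estimated_visible_rows,
       tbl_rows - estimated_visible_rows AS approx_deleted_rows
FROM svv_table_info
WHERE "schema" = 'schemaname'
ORDER BY approx_deleted_rows DESC;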

VACUUM:

The VACUUM operation removes dead tuples and marks their locations as ready for reuse. It does not return any space to the OS, but the freed locations become available for anyone to use.


ANALYZE:

ANALYZE keeps table statistics up to date so that the optimizer can generate better execution plans.

VACUUM FULL:

A plain VACUUM does not reclaim any space; it just removes dead tuples. If the data is not distributed properly and a huge percentage of bloat has accumulated, and you are seeing a performance impact, run VACUUM FULL to remove the bloat and distribute the data properly.
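A minimal sketch of the commands involved, with placeholder schema and table names:

-- reclaim space, re-sort rows, and update the table's sort state
VACUUM FULL schemaname.tablename;

-- refresh planner statistics after large data changes
ANALYZE schemaname.tablename;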


Query to find unsorted and stale-statistics values to decide whether to run VACUUM, ANALYZE, or VACUUM FULL

--svv_table_info
SELECT "schema", table_id, "table", diststyle, sortkey1, size, tbl_rows, unsorted, stats_off, skew_sortkey1, skew_rows
FROM svv_table_info
WHERE "schema" IN ('Schemaname')
AND   (unsorted > 0 OR stats_off > 0)
ORDER BY stats_off DESC;

  • By default, Redshift can skip a table during a vacuum sort if the table is already at least 95 percent sorted. We have to be careful with this when a table has billions (or any huge number) of rows, because even the remaining 5% is then a huge number of rows. So while running a vacuum, make sure you define the threshold percentage (see the sketch after this list).
  • Similarly, ANALYZE will skip a table if its statistics are off by up to 10%. We need to run the following command to set this threshold very low so that ANALYZE does not skip any tables.
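A minimal sketch of both thresholds; the table name and the percentages are placeholders:

-- force ANALYZE to refresh stats even when they are only slightly stale
SET analyze_threshold_percent TO 0.01;
ANALYZE schemaname.tablename;

-- vacuum until the table is at least 99 percent sorted instead of the 95 percent default
VACUUM FULL schemaname.tablename TO 99 PERCENT;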

How to find vacuum summary and progress information

SELECT * FROM SVV_VACUUM_SUMMARY;
SELECT * FROM SVV_VACUUM_PROGRESS;


