Monday, September 9, 2024





Description:

Amazon DocumentDB is a document database service that supports MongoDB workloads and provides virtually unlimited storage with real-time scalability. Functionality and capabilities that enable Amazon DocumentDB to deliver MongoDB-level performance include:

  •  Support for the Apache 2.0 open source MongoDB 3.6 API, which lets you use existing MongoDB drivers and tools with Amazon DocumentDB to facilitate a smooth transition.
  •  Decoupled storage and compute, allowing independent scaling to meet dynamic workload demands.
  •  Ability to add as many as 15 low-latency read replicas to handle millions of requests per second.
  •  Guaranteed 99.99% availability by replicating six copies of data across three AWS Availability Zones.
  •  Rapid (less than 30 seconds) automatic failover to a read replica in the event of failure.

 

Advantages:

Amazon DocumentDB integrates deeply with other AWS services, and when combined with these services it offers advantages that provide compelling reasons to migrate from MongoDB to Amazon DocumentDB:

  •  Multiple levels of database security: network isolation using Amazon VPC, encryption at rest via AWS Key Management Service (KMS), auditing, TLS for encryption in transit, and encrypted automated backups, snapshots, and replicas.
  •  Automated monitoring and backups to Amazon S3 that allow point-in-time recovery.
  •  Compliance with industry standards such as PCI DSS, ISO 9001, 27001, 27017, and 27018, as well as SOC 1, SOC 2, and SOC 3, and HIPAA eligibility.
  •  Reduced operational overhead compared to running MongoDB yourself.

 Migrating to Amazon DocumentDB

We can migrate data from any MongoDB database, either on-premises or in the cloud (e.g. a MongoDB database running on Amazon EC2), to Amazon DocumentDB. There are three primary approaches for migrating your data to Amazon DocumentDB.

 Offline

 Online

 Hybrid

 Offline migration:

The offline approach uses the mongodump and mongorestore tools to migrate data from the source MongoDB deployment to the Amazon DocumentDB cluster. The offline method is the simplest migration approach, but it also incurs the most downtime for your cluster.

 The basic process for offline migration is as follows (a minimal command sketch follows the steps):

 1. Quiesce writes to your MongoDB source.

 2. Dump collection data and indexes from the source MongoDB deployment.

 3. Restore indexes to the Amazon DocumentDB cluster.

 4. Restore collection data to the Amazon DocumentDB cluster.

 5. Change your application endpoint to write to the Amazon DocumentDB cluster.
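
A minimal shell sketch of steps 2 through 4, assuming a placeholder source host, database name (mydb), user, and dump directory; the exact commands used in our environment appear later in this post.

 # Step 2: dump collection data from the source MongoDB (writes already quiesced).
 mongodump --host <source-mongodb-host> --port 27017 \
   --username <user> --password '<password>' --authenticationDatabase admin \
   --db mydb --gzip --out /backup/mydb

 # Step 3: restore indexes to the Amazon DocumentDB cluster first, using the
 # documentdb_index_tool from the amazon-documentdb-tools repository (see below).

 # Step 4: restore collection data to Amazon DocumentDB, skipping index creation.
 mongorestore --host <docdb-cluster-endpoint> --port 27017 --ssl \
   --sslCAFile rds-combined-ca-bundle.pem \
   --username <user> --password '<password>' \
   --db mydb --gzip --noIndexRestore /backup/mydb/mydb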




 Online Migration:

For migration of production workloads with minimal downtime, we can use the online approach or the hybrid approach. With the online migration approach, use AWS Database Migration Service (DMS) to migrate the data from MongoDB to Amazon DocumentDB. DMS performs an initial full load of the data from the MongoDB source to Amazon DocumentDB; during the full load, the source database remains available for operations. Once the full load is completed, DMS switches to change data capture (CDC) mode to keep the source (MongoDB) and destination (Amazon DocumentDB) in sync. Once the databases are in sync, switch your applications to point to Amazon DocumentDB with near-zero downtime.
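
As a rough sketch, the online approach maps to a single DMS replication task that does a full load and then stays in change data capture mode. The ARNs, task name, and table-mapping file below are placeholders, not values from our environment. (CDC from MongoDB requires the source to run as a replica set so DMS can read the oplog.)

 # Full load from MongoDB to Amazon DocumentDB, followed by ongoing replication (CDC).
 aws dms create-replication-task \
   --replication-task-identifier mongodb-to-docdb \
   --source-endpoint-arn arn:aws:dms:us-east-1:<account-id>:endpoint:<mongodb-endpoint> \
   --target-endpoint-arn arn:aws:dms:us-east-1:<account-id>:endpoint:<docdb-endpoint> \
   --replication-instance-arn arn:aws:dms:us-east-1:<account-id>:rep:<replication-instance> \
   --migration-type full-load-and-cdc \
   --table-mappings file://table-mappings.json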



Hybrid Approach:

The hybrid approach is a combination of the offline and online migration approaches. It is useful when you need minimal downtime during migration but the source database is large, or sufficient bandwidth is not available to migrate the data in a reasonable amount of time. The hybrid approach has two phases.

In the first phase, you export the data from the source MongoDB using the mongodump tool, transfer it to AWS (if the source is on premises), and restore it to Amazon DocumentDB. You can use AWS Direct Connect or AWS Snowball to transfer the export dump to AWS. During this phase, the source (MongoDB) is available for operations, and the data restored to Amazon DocumentDB does not yet contain the latest changes.

In the second phase, you use DMS in CDC mode to copy the changes from the source (MongoDB) to Amazon DocumentDB and keep them in sync. Once the databases are in sync, you can switch your applications to point to Amazon DocumentDB with near-zero downtime.
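
For the second phase, a CDC-only DMS task can be used; as with the previous sketch, the ARNs, start time, and table-mapping file are placeholders rather than values from our environment.

 # Replicate only the changes made after the mongodump was taken.
 aws dms create-replication-task \
   --replication-task-identifier mongodb-to-docdb-cdc \
   --source-endpoint-arn arn:aws:dms:us-east-1:<account-id>:endpoint:<mongodb-endpoint> \
   --target-endpoint-arn arn:aws:dms:us-east-1:<account-id>:endpoint:<docdb-endpoint> \
   --replication-instance-arn arn:aws:dms:us-east-1:<account-id>:rep:<replication-instance> \
   --migration-type cdc \
   --cdc-start-time 2024-09-01T00:00:00Z \
   --table-mappings file://table-mappings.json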

 



For the migration, use the tools from the amazon-documentdb-tools repository:

https://github.com/awslabs/amazon-documentdb-tools

 1) Export indexes from the source MongoDB:

    python migrationtools/documentdb_index_tool.py --host 127.0.0.1 --port 2700 --username sysAdmin --password sysadm1298 --auth-db admin --dump-indexes --dir C:\Junk\mongoindexes

 2) Copy the index dump to a backup folder on the MongoDB node using WinSCP.

 3) Import the indexes (log in to the MongoDB node):

cd /imdb001/sw/mongo-3.4.17/bin

    ./mongorestore --dryRun --ssl --host='docdb.cluster-ccr6zchg1pp7.us-east-1.docdb.amazonaws.com' --sslCAFile=rds-combined-ca-bundle.pem --username=docdbadmin --password='<password>' --db=qstaflsdb01 --port=27017 --numParallelCollections=4 --maintainInsertionOrder /imdb001/data/backup/mongoindexes/qstaflsdb01

 4) Run mongorestore for the data from the mongodump (--noIndexRestore):

 ./mongorestore --gzip --ssl --host='docdb.cluster.us-east-1.docdb.amazonaws.com' --sslCAFile=rds-combined-ca-bundle.pem --username=docdbadmin --password='<password>' --db=test --port=27017 --numParallelCollections=8 --numInsertionWorkersPerCollection=4 --maintainInsertionOrder --noIndexRestore /imdb001/data/backup/

5) Selective database dump and restore, including users and roles:

 ./mongodump --gzip --db=test --port=27101 --username=sysAdmin --authenticationDatabase=admin --out=/imdb001/data/qausersroles --numParallelCollections=4 --dumpDbUsersAndRoles --excludeCollectionsWithPrefix Account --excludeCollectionsWithPrefix Fin --excludeCollectionsWithPrefix TEST --excludeCollectionsWithPrefix Gring --excludeCollectionsWithPrefix fin --excludeCollectionsWithPrefix users --password= --sslAllowInvalidHostnames

 

 ./mongorestore --gzip --ssl --host='docdb.cluster-ccr6zchg1pp7.us-east-1.docdb.amazonaws.com' --sslCAFile=rds-combined-ca-bundle.pem --username=docdbadmin --password='<password>' --db=test --port=27017 --numParallelCollections=4 --numInsertionWorkersPerCollection=4 --maintainInsertionOrder --noIndexRestore --restoreDbUsersAndRoles /imdb001/data/qausersroles/

FULL BACKUP DUMP

[root@ip-10.0.0.0 bin]# ./mongodump --host=hostname:27101 --username=sysAdmin --password='<password>' --out=/backupvolume


INDEX DUMP

[root@ip-10.0.0.0 tools]# python3 documentdb_index_tool.py --host hostname:27101 --port 27107 --username sysAdmin --password '<password>' --auth-db admin --dump-indexes --dir /backupvolume/main_back/indexdump

INDEX RESTORE

./mongorestore --dryRun --ssl --host=docdb.cluster.us-east-1.docdb.amazonaws.com --sslCAFile=rds-combined-ca-bundle.pem --username='<username>' --password='<password>' --db=test --port=27017 --numParallelCollections=8 --numInsertionWorkersPerCollection=8 --maintainInsertionOrder /backupvolume/indexdump/










Tuesday, September 3, 2024

             How Vacuum and Analyze Work in Redshift


Since Redshift is derived from PostgreSQL, most PostgreSQL features and commands work in Redshift too. First, in order to find tables that are out of stats, we need a way to see how many dead tuples there are in a schema.

DEAD TUPLES :

If we have 10k records in a table, they occupy 10k locations. If we delete 5k records, those 5k memory locations should become free, but by default they do not; the rows are just marked as deleted. Records like these are called dead tuples: you can't reclaim the space, and at the same time you can't reuse those memory locations.
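
One rough way to see this effect in Redshift is to compare the total row count, which still includes deleted but not yet vacuumed rows, with the estimated visible rows in svv_table_info; 'Schemaname' is a placeholder.

-- tbl_rows counts all rows, including deleted rows that VACUUM has not yet removed;
-- estimated_visible_rows approximates the rows a query can actually see.
SELECT "schema", "table", tbl_rows, estimated_visible_rows,
       tbl_rows - estimated_visible_rows AS approx_dead_rows
FROM svv_table_info
WHERE "schema" IN ('Schemaname')
ORDER BY approx_dead_rows DESC;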

VACUUM :

The VACUUM operation removes the dead tuples and marks their locations as ready for reuse. It does not reclaim any space back to the OS, but the locations become free and can be used again.


ANALYZE:

It keeps table statistics up to date so the optimizer can generate a better execution plan.

VACUUM FULL:

A plain VACUUM does not reclaim space; it just removes dead tuples. If you think the data is not distributed properly and a huge percentage of the table is bloat, and you are seeing a performance impact, run VACUUM FULL to remove the bloat and distribute the data properly.
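
As a quick reference, the Redshift forms of these commands look like the following; schema_name.table_name is a placeholder.

-- Re-sort rows and reclaim space from deleted rows (Redshift's default VACUUM).
VACUUM FULL schema_name.table_name;

-- Only reclaim space from deleted rows, without re-sorting.
VACUUM DELETE ONLY schema_name.table_name;

-- Only re-sort rows, without reclaiming space from deleted rows.
VACUUM SORT ONLY schema_name.table_name;

-- Refresh optimizer statistics for the table.
ANALYZE schema_name.table_name;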


Query to find unsorted and out-of-stats tables, to decide whether to run VACUUM, ANALYZE, or VACUUM FULL:

-- svv_table_info
SELECT "schema", table_id, "table", diststyle, sortkey1, size, tbl_rows,
       unsorted, stats_off, skew_sortkey1, skew_rows
FROM svv_table_info
WHERE "schema" IN ('Schemaname')
AND   (unsorted > 0 OR stats_off > 0)
ORDER BY stats_off DESC;

  • By default, Redshift skips the vacuum sort phase for any table that is already at least 95 percent sorted. We have to care about this when a table has billions (or any huge number) of rows, because even the remaining 5 percent is a huge number of rows. So while running a vacuum, make sure you define the threshold percentage, as shown below.
  • Similarly, ANALYZE will skip a table if less than about 10 percent of its rows have changed since the last analyze. We need to set this threshold very low so that analyze does not skip any tables, as shown below.
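
A minimal sketch of both overrides, assuming a placeholder schema_name.table_name; the threshold values are illustrative.

-- Run the sort phase until the table is fully sorted,
-- instead of stopping at the default 95 percent threshold.
VACUUM FULL schema_name.table_name TO 100 PERCENT;

-- Lower the analyze skip threshold for this session so ANALYZE
-- does not skip tables with only small amounts of change.
SET analyze_threshold_percent TO 0.01;
ANALYZE schema_name.table_name;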

How to find vacuum summary and progress information

SELECT * FROM SVV_VACUUM_SUMMARY;
SELECT * FROM SVV_VACUUM_PROGRESS;



Sunday, September 1, 2024

 

Redshift Workload Management (WLM)


Amazon Redshift workload management (WLM) enables flexible management of priorities within workloads so that short, fast-running queries don't get stuck in queues behind long-running queries. Amazon Redshift creates query queues at runtime according to service classes, which define the configuration parameters for various types of queues, including internal system queues and user-accessible queues. From a user perspective, a user-accessible service class and a queue are functionally equivalent. For consistency, this post uses the term queue to mean a user-accessible service class as well as a runtime queue.
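
As a quick way to see the queues (service classes) that WLM has defined on a cluster, you can query the WLM system tables; this is just an illustrative starting point.

-- Current WLM queue (service class) configuration.
SELECT service_class, num_query_tasks, query_working_mem, max_execution_time
FROM stv_wlm_service_class_config
ORDER BY service_class;

-- Queries currently executing or waiting in each queue.
SELECT service_class, state, COUNT(*) AS queries
FROM stv_wlm_query_state
GROUP BY service_class, state
ORDER BY service_class;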

(Console screenshots: Query priorities, Custom rules, Add rule from template.)