Thursday, October 27, 2016

mysql replication and verification steps

Imp things to remember:

#### slave entries:
relay-log=mysqld-relay-bin
report-host=hostname

### ssh tunnel with keys:

back01# mysql

    mysql> show databases;

    +--------------------+
    | Database           |
    +--------------------+
    | information_schema |
    | mysql              |
    | test               |
    | mywork_ops_backups  |
    +--------------------+

    4 rows in set (0.00 sec)



backup/dump

mysqldump mywork_ops_backups < /database_backups/myworkopsdb-$(date +%m-%d-%Y).sql


restore

mysql> create database ops_backup;
[root@pgvmdc ~]# mysql ops_backup < /root/pg-myworkopsdb-20141107.sql

mysql> describe backup_job;
+-----------------+-----------------------------------------------------+------+-----+---------+----------------+
| Field           | Type                                                | Null | Key | Default | Extra          |
+-----------------+-----------------------------------------------------+------+-----+---------+----------------+
| backup_id       | int(10) unsigned                                    | NO   | PRI | NULL    | auto_increment |
| job_start_dt_tm | datetime                                            | NO   |     | NULL    |                |
| job_end_dt_tm   | datetime                                            | NO   |     | NULL    |                |
| pod_name        | varchar(64)                                         | NO   |     | NULL    |                |
| status          | enum('Success','Error','Running','Rerun','md5fail') | NO   |     | NULL    |                |
| type            | enum('inc','full')                                  | NO   |     | NULL    |                |
| dateb           | varchar(64)                                         | NO   |     | NULL    |                |
| filename        | varchar(64)                                         | NO   |     | NULL    |                |
| filesize        | varchar(64)                                         | YES  |     | NULL    |                |
| vault           | varchar(64)                                         | NO   |     | NULL    |                |
+-----------------+-----------------------------------------------------+------+-----+---------+----------------+
10 rows in set (0.00 sec)




select * from backup_job where status ='Error';



mysql> select count(*) from backup_job;
+----------+
| count(*) |
+----------+
|    75977 |
+----------+

1 row in set (0.01 sec)


edit master /etc/my.cnf and restart


##replication master##

server-id=1

log-bin=mysql-bin

report-host=hostname

##replication master##

auto_increment_increment=2

auto_increment_offset=1


edit slave /etc/my.cnf and restart


##replication slave##

server-id=2

relay-log=mysqld-relay-bin

report-host=hostname

##replication slave##

auto_increment_increment=2

auto_increment_offset=2



replication user on master

mysql> create user vrep identified by 'vrepvrep';
Query OK, 0 rows affected (0.00 sec)


mysql> create user vrep identified by '2014#vrep';
Query OK, 0 rows affected (0.03 sec)

mysql> select host, user, password from mysql.user;

+---------------------+------+-------------------------------------------+
| host                | user | password                                  |
+---------------------+------+-------------------------------------------+
| localhost           | root |                                           |
| pgvmdc.myworkdev.com | root |                                           |
| 127.0.0.1           | root |                                           |
| localhost           |      |                                           |
| pgvmdc.myworkdev.com |      |                                           |
| %                   | vrep | *63536F52A6C2FC15C6CBECB7ADB3F9299D86D6B3 |
+---------------------+------+-------------------------------------------+

6 rows in set (0.00 sec)


mysql> grant replication slave on *.* to 'vrep'@'%' identified by 'vrepvrep';
Query OK, 0 rows affected (0.00 sec)


or


mysql> grant replication slave on *.* to vrep identified by 'vrepvrep';
Query OK, 0 rows affected (0.00 sec)


mysql> set password for 'vrep' = password('vrep');
Query OK, 0 rows affected (0.00 sec)


find master co-ordinates

mysql> flush tables with read lock;
mysql> show master status;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 |      320 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)



Setting the Master Configuration on the Slave:
mysql> change master to master_host='mysqlmaster';
Query OK, 0 rows affected (0.02 sec)

mysql> change master to master_user='vrep';
Query OK, 0 rows affected (0.01 sec)

mysql> change master to master_password='vrep';
Query OK, 0 rows affected (0.01 sec)

mysql> change master to master_log_file='vrep.log';
Query OK, 0 rows affected (0.00 sec)

mysql> change master to master_log_pos=320;
Query OK, 0 rows affected (0.00 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec



##################################registration error###################################

mysql> show slave status\G

*************************** 1. row ***************************

               Slave_IO_State:
              Master_Host: mysqlmaster

                  Master_User: vrep

                  Master_Port: 3306

                Connect_Retry: 60

             Master_Log_File: vrep.log

          Read_Master_Log_Pos: 1147

               Relay_Log_File: mysqld-relay-bin.000001

                Relay_Log_Pos: 4

        Relay_Master_Log_File: vrep.log

             Slave_IO_Running: No

            Slave_SQL_Running: No

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 0

                   Last_Error:

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 1147
              Relay_Log_Space: 106

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: NULL

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 1597

                Last_IO_Error: Master command COM_REGISTER_SLAVE failed: Access denied for user 'vrep'@'mysqlslave' (using password: YES) (Errno: 1045)

               Last_SQL_Errno: 0

               Last_SQL_Error:

1 row in set (0.00 sec)





##################################connection error#####################################

mysql> show slave status\G

*************************** 1. row ***************************

               Slave_IO_State:

                  Master_Host: mysqlmaster

                  Master_User: vrep

                  Master_Port: 3306

                Connect_Retry: 60

              Master_Log_File: mysql-bin.000004

          Read_Master_Log_Pos: 366

               Relay_Log_File: mysqld-relay-bin.000001

                Relay_Log_Pos: 4

        Relay_Master_Log_File: mysql-bin.000004

             Slave_IO_Running: No

            Slave_SQL_Running: No

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 0

                   Last_Error:

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 366

              Relay_Log_Space: 106

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: NULL

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 1045

                Last_IO_Error: error connecting to master 'vrep@mysqlmaster:3306' - retry-time: 60  retries: 86400

               Last_SQL_Errno: 0

               Last_SQL_Error:

1 row in set (0.00 sec)


################################after slave connects:##########################################

mysql>>; show slave status\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: mysqlmaster

                  Master_User: vrep

                  Master_Port: 3306

                Connect_Retry: 60

              Master_Log_File: mysql-bin.000004

          Read_Master_Log_Pos: 988

               Relay_Log_File: mysqld-relay-bin.000002

                Relay_Log_Pos: 251

        Relay_Master_Log_File: mysql-bin.000004

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 0

                  Last_Error:

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 988

             Relay_Log_Space: 407

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

          Master_SSL_CA_Path:
              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error:

               Last_SQL_Errno: 0

               Last_SQL_Error:

1 row in set (0.00 sec)




################################after slave connects:##########################################


mysql>>; show processlist;

+----+------+------------------+------------+-------------+------+----------------------------------------------------------------+------------------+
| Id | User | Host             | db         | Command     | Time | State                                                          | Info             |
+----+------+------------------+------------+-------------+------+----------------------------------------------------------------+------------------+
|  2 | root | localhost        | ops_backup | Query       |    0 | NULL                                                           | show processlist |
| 34 | vrep | mysqlslave:35958 | NULL       | Binlog Dump |  282 | Has sent all binlog to slave; waiting for binlog to be updated | NULL             |
+----+------+------------------+------------+-------------+------+----------------------------------------------------------------+------------------+
2 rows in set (0.00 sec)






ssh tunnel,make sure ssh keys are set
ssh -N -f -L 5007:localhost:3306 root2@209.207.232.136



Errors:
Slave I/O: Master command COM_REGISTER_SLAVE failed: failed registering on master, reconnecting to try again


soln:
[ERROR] Slave I/O: error connecting to master, Error code 1045

The MySQL troubleshooting page gives a few hints, talking about credentials for the replication user. However, attempting a connection via the command line client works, so the password is OK and the slave can reach the master. As it turns out, for replication purposes, the maximum password length is 32 characters.

max is 32
min is 8



Errors
Can't execute the query because you have a conflicting read lock

Soln
show processlist;


check

on master

mysql>>; update backup_job set filesize=40000 where backup_id=131710;

Query OK, 1 row affected (0.04 sec)
Rows matched: 1  Changed: 1  Warnings: 0


mysql>>; update backup_job set filesize=44800 where backup_id=131710;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0


mysql>>; update backup_job set filesize=44900 where backup_id=131710;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0



on slave

mysql>>; select * from backup_job where backup_id='131710';
+-----------+---------------------+---------------------+----------+--------+------+----------+------------------------------------+----------+-------+
| backup_id | job_start_dt_tm     | job_end_dt_tm       | pod_name | status | type | dateb    | filename                           | filesize | vault |
+-----------+---------------------+---------------------+----------+--------+------+----------+------------------------------------+----------+-------+
|    131710 | 2014-11-07 07:15:15 | 2014-11-07 07:15:17 | drvv12   | Error  | inc  | 20141107 | inc-vault-3971-201411070715.tar.gz | 40000    | 3971  |
+-----------+---------------------+---------------------+----------+--------+------+----------+------------------------------------+----------+-------+

1 row in set (0.00 sec)


mysql>>; select * from backup_job where backup_id='131710';
+-----------+---------------------+---------------------+----------+--------+------+----------+------------------------------------+----------+-------+
| backup_id | job_start_dt_tm     | job_end_dt_tm       | pod_name | status | type | dateb    | filename                           | filesize | vault |
+-----------+---------------------+---------------------+----------+--------+------+----------+------------------------------------+----------+-------+
|    131710 | 2014-11-07 07:15:15 | 2014-11-07 07:15:17 | drvv12   | Error  | inc  | 20141107 | inc-vault-3971-201411070715.tar.gz | 44800    | 3971  |
+-----------+---------------------+---------------------+----------+--------+------+----------+------------------------------------+----------+-------+
1 row in set (0.00 sec)



mysql>>; select * from backup_job where backup_id='131710';
+-----------+---------------------+---------------------+----------+--------+------+----------+------------------------------------+----------+-------+
| backup_id | job_start_dt_tm     | job_end_dt_tm       | pod_name | status | type | dateb    | filename                           | filesize | vault |
+-----------+---------------------+---------------------+----------+--------+------+----------+------------------------------------+----------+-------+
|    131710 | 2014-11-07 07:15:15 | 2014-11-07 07:15:17 | drvv12   | Error  | inc  | 20141107 | inc-vault-3971-201411070715.tar.gz | 44900    | 3971  |
+-----------+---------------------+---------------------+----------+--------+------+----------+------------------------------------+----------+-------+

1 row in set (0.00 sec)

Application performance is slow

Scenario:

App login time has increased. Some one of the Sales and application support folks reported/complained that login takes login time.

We did confirm that login page sometimes just sits there and times out.

This was reported form all the PODs intermittently and all customers on the POD in general are showing higher login time.

Engineer Team mentioned there is a production_234234xx.js which sometimes takes longer than expected, its about 2MB file which show max of almost 5-6mins to load

Local Dublin Office speed = 70.63mbps download, 71.24mpbs upload
Download 8MB/sec

PA Office = download 17.87mbps
Download speed = 2.1MB/sec


I did few more scenario tests yesterday using the same curl command syntax

1. Download of production.js 3.1MB from (San Jose POD), twice every min (time Oct 9 14:30 - Oct 10 11:04 CDT)

2. Download of production.js 2.6MB from (Sterling POD), twice every min (time Oct 9 19:35 - Oct 10 11:04 CDT)

3. Download of 3.2MB tar file from stage4(ST), this test was without login into the application, twice every min (time Oct 9 19:11 - Oct 10 11:04 CDT)

All these tests were from Softlayer DC Datacenter (198.11.247.15)


1. We did 2470 downloads from San Jose, 87 of them took 6.5 sec rest shows 1.5 sec or under. About 3% with slow response.

2. We did 1789 downloads from Sterling, 49 of them took 5.2 sec rest shows 0.2 sec or under. About 3% again with slow response.

3. We did 1886 downloads from stage4 without using the application, 42 of them took 5.1 sec rest shows around 0.1 sec. About 2% with slow response.


All the output files are attached

This time I traced all the packets initiated from softlayer to stage4. This includes all the packets from test #2 and test #3


Surprisingly/Unfortunately the trace shows that all these are transactions took between 0.12 sec - 2.27 sec. It doesn't have any downloads showing 5 sec duration


Here are the trace files for review

https://drive.google.com/a/mywork.com/file/d/0B7-LikX8hKvKSFd3bjREbmk0Rzg/view?usp=sharing

https://drive.google.com/a/mywork.com/file/d/0B7-LikX8hKvKd0hlN080eGFTOG8/view?usp=sharing


Now, I'm trying to get into the curl results. Here is one example from vv1 downloads

time_namelookup: 5.006
time_connect: 5.076
time_appconnect: 5.273
time_pretransfer: 5.273
time_redirect: 0.000
time_starttransfer: 5.375
num_connects: 1
--------
time_total: 6.338

In all the cases, when we see the time 6.0+ sec.


The "time_namelookup: 5.006" is the one which makes it off the regular trend of 1.5 sec. If we simply remove this namelookup time we have a stable trend.


Now I'm not sure why curl is reporting this huge namelookup time, I will have to do some more digging on this ..


The 3 curl commands are as below:

1. curl "https://mywork-qms.myworkvault.com/ui/resources/scripts/minified/vault.main.production.js?1412277409" -H "Pragma: no-cache" -H "Accept-Encoding: gzip,deflate" -H "Accept-Language: en-US,en;q=0.8" -H "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36" -H "Accept: /" -H "Referer: https://mywork-qms.myworkvault.com/ui/" -H "Cookie: showNewsAlert=true; timestamp=; tx_regionMarkStyle=B1:#000000:3:0:#ffffff:1:2:1:0; tx_freeMode=region; tx_textMarkStyle=h103; tx_colorIndex=103; prodTS=1412277409; myworkUserSsoSettings=""sp#https://myworksys.okta.com/app/template_saml_2_0/kbzmgd97JZKLWKPZYTZJ/sso/saml""; JSESSIONID=EB1C5CA4AAAAC1C57CBEF9921166C69A.2; TK=07161E0AE282F531970F42BB3214F706ACA7EC30E22A3F03FB3936B093023A780C9D378E5CA012E6017C4DB142D1A421; TS=1412875533682; LT=sso; STS=""1412875533742,1412875534024""" -H "Connection: keep-alive" -H "Cache-Control: no-cache" --compressed -o prod.js -v --trace-time -w "@curltime.txt"


2. curl "https://stage4.myworkvault.com/resources/JMVC/mywork_vault/production_1407268926.js" -H "Pragma: no-cache" -H "Accept-Encoding: gzip,deflate" -H "Accept-Language: en-US,en;q=0.8" -H "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36" -H "Accept: /" -H "Referer: https://stage4.myworkvault.com/ui/" -H "Cookie: selectedLang=en; myworkUserSsoSettings=""""; JSESSIONID=A2F83783D8F70A357FA8E5B7150F7469.2; TK=9A7FF56D81BE2D50D1114479657B3A43A6B1CC259C7FB5A1F279ECFBF307E94F4711A09495E25A3ACDF9EBC7C3661F14; prodTS=1407268926; showNewsAlert=true; timestamp=" -H "Connection: keep-alive" -H "Cache-Control: no-cache" --compressed -v --trace-time -o stage4app.js -w "@curltime.txt"


3. curl -o curlstage4.tar https://downloadstage4.myworkvault.com/curltest.tar -w "@curltime.txt"



==========

working on stage4 first, changed ttl from 13 mins to 24hrs.

------------

QUESTIONS:

stage4.myworkvault.com, type = A, class = IN

ANSWERS:

 stage4.myworkvault.com

internet address = 309.20.232.135

ttl = 86400

AUTHORITY RECORDS:

ADDITIONAL RECORDS:

------------

Non-authoritative answer:

Name: stage4.myworkvault.com

Address: 309.20.232.135
stage4.myworkvault.com

Server: 10.0.80.11

Address: 10.0.80.11#53

============

We have some good results with higher TTL

As stated before, stage4.vvevavault.com TTL was set for 24hrs. and we also did additional test with mywork-qms.myworkvault.com setting the TTL for 1hr.

#1. Stage4 test ran from Oct 14 11:30 - Oct 15 18:05, about 5508 curl downloads, 3 times every min. It came with 0 slow responses.

#2. QMS test ran from Oct 14 17:03 Oct 15 18:05, about 4508 curl downloads, 3 times every min. It came with 23 slow responses. (around 0.5%)

At this point we can suggest setting up most of the active domain TTL to 24hrs to fix this problem.

However since we do domain migration more occasionally then traditionally done with a url, we should decide what is the best optimal TTL to set so that there is less administrative work on these domain.

=============