Better high availability: MySQL and Percona XtraDB Cluster with good application design

Source:
https://www.percona.com/blog/2015/12/21/mysql-and-percona-xtradb-cluster-even-higher-availability-with-correct-application-design/

High Availability

Have you ever wondered if your application should be able to work in read-only mode? How important is that question?

MySQL seems to be the most popular database solution for web-based products. Most typical Internet application workloads consist of many reads and usually few writes. There are exceptions of course, MMO games for instance, but most often the number of reads is much bigger than the number of writes. So when your database infrastructure loses its ability to accept writes, either because a traditional MySQL replication topology lost its master or a Galera cluster lost its quorum, why would you want the application to declare total downtime? During this scenario, imagine all the users who are just browsing the application (not contributing content): they don't care if the database cannot accept new data. People also prefer to have access to an application, even if its functionality is notably reduced, rather than see 500 error pages. In some disaster scenarios it is a seriously time-consuming task to perform PITR or recover some valuable data: it is better to at least give users read access to a recent backup.

My advice: design your application with a possible read-only partial outage in mind, and test how the application works in that mode during its development life cycle. I think it will pay off greatly and increase the perceived availability of your product. As an example, check out how some of the big open source projects implement this concept, like MediaWiki or Drupal (and also some commercial products).
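As a minimal sketch of this pattern (all names and error codes here are illustrative, not taken from MediaWiki, Drupal, or any specific framework), the data access layer can translate "writes refused" errors into graceful degradation instead of a 500 page:

```python
class DatabaseError(Exception):
    """Stand-in for a driver exception carrying a MySQL error code."""
    def __init__(self, errno, msg):
        super().__init__(msg)
        self.errno = errno

# Error codes meaning "the database currently refuses writes":
# 1290 = server running with --read-only, 1047 = WSREP node not ready.
WRITE_REFUSED = {1290, 1047}

def save_comment(db, text):
    """Try to persist a comment; degrade to read-only instead of failing hard."""
    try:
        db.execute("INSERT INTO comments (body) VALUES (%s)", (text,))
        return "saved"
    except DatabaseError as exc:
        if exc.errno in WRITE_REFUSED:
            # Keep serving pages; just tell the user writes are paused.
            return "read-only: comment not saved, please retry later"
        raise
```

With something like this in place, browsing users never notice the outage; only contributors get a soft notice.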
PXC

Having said that, I want to highlight a pretty new and, IMHO, important improvement in this regard, introduced in Galera replication as of PXC version 5.6.24. It was already mentioned by my colleague Stéphane in his blog post earlier this year.
Focus on data consistency

As you probably know, one of Galera's key advantages is its strict care for data consistency and its data-centric approach. No matter where you write in the cluster, all nodes must have the same data. This matters when you realize what happens when a data inconsistency is detected between the nodes. An inconsistent node, one that cannot apply a writeset due to missing rows or duplicate unique key values for instance, has to abort and perform an emergency shutdown. This happens in order to remove the contaminated member from the cluster and not spread the data "illness" further. If the majority of nodes perform an emergency abort, the remaining minority may lose the cluster quorum and will stop serving further client requests. So the price for data consistency protection is availability.
Anti-split-brain

Sometimes a node, or several cluster members, lose connectivity to the others in such a way that they can no longer communicate with more than 50% of the nodes. Connectivity is lost all of a sudden, without a proper "goodbye" message from the "disappeared" nodes. The remaining nodes don't know the reason for the lost connection: were the peers killed, or was the network split? In that situation, they declare a non-Primary cluster state and go into SQL-disabled mode. This is because a member of a cluster without a quorum (majority), hence not acting as the Primary Component, is not trusted, as it may have inconsistent or old data. Because of this state, it won't allow clients to access it.

This is for two reasons. First and unquestionably, we don't want to allow writes when there is a risk of a network split in which the other part of the cluster still forms the Primary Component and keeps operating. We may also want to disallow reads of stale data when there is a possibility that the other part of the infrastructure already has much newer information.
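The quorum rule itself is simple majority arithmetic; a sketch (ignoring Galera's weighted votes via pc.weight and arbitrator setups):

```python
def has_quorum(reachable, cluster_size):
    """A partition may stay Primary Component only if it can see a strict
    majority (more than 50%) of the last known cluster membership."""
    return 2 * reachable > cluster_size
```

Note that an even split (2 of 4, 3 of 6) leaves no side with quorum, which is one reason an odd number of nodes, or an arbitrator (garbd), is recommended.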

In the standard MySQL replication process there are no such precautions: if replication is broken in a master-master topology, both masters can still accept writes, and clients can also read anything from the slaves regardless of how much they lag or whether they are connected to their masters at all. In Galera though, if too much lag is detected in the apply queue (similar to the replication lag concept), the cluster will pause writes using the Flow Control mechanism. And if replication is broken as described above, it will even stop reads.
Dirty reads from Galera cluster

This behavior may seem too strict, especially if you just migrated from MySQL replication to PXC and are used to database "slave" nodes serving read traffic even when they are separated from the "master." Or if your application does not rely on writes but mostly on access to existing content. In that case, you can either enable the new wsrep_dirty_reads variable dynamically (per session, only when needed), or set up your cluster to run with this option by default by placing wsrep_dirty_reads = ON in my.cnf (setting the global value in the config file is available since PXC 5.6.26).
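For the per-session variant, application code can retry a failed read with dirty reads enabled. A sketch against a generic DB-API style cursor (the DatabaseError shape is illustrative; the SET statement uses the actual variable name):

```python
class DatabaseError(Exception):
    """Stand-in for a driver exception carrying a MySQL error code."""
    def __init__(self, errno, msg):
        super().__init__(msg)
        self.errno = errno

WSREP_NOT_READY = 1047  # "WSREP has not yet prepared node for application use"

def read_with_dirty_fallback(cursor, query):
    """Run a read; if the node is non-Primary, enable wsrep_dirty_reads
    for this session only and retry."""
    try:
        cursor.execute(query)
    except DatabaseError as exc:
        if exc.errno != WSREP_NOT_READY:
            raise
        cursor.execute("SET SESSION wsrep_dirty_reads = ON")
        cursor.execute(query)
    return cursor.fetchall()
```

Keeping the variable session-scoped limits stale reads to the code paths that explicitly opt in.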

The Galera topology below is something we very often see at customer sites, where WAN locations are configured to communicate via VPN:


I think this failure scenario is a perfect use case for wsrep_dirty_reads, where neither part of the cluster can work at full functionality alone, but each could successfully keep serving read queries to the clients.

So let's quickly see how a cluster member behaves with the wsrep_dirty_reads option disabled and enabled (for the test I blocked network communication on port 4567):

percona3 mysql> show status like 'wsrep_cluster_status';
+----------------------+-------------+
| Variable_name        | Value       |
+----------------------+-------------+
| wsrep_cluster_status | non-Primary |
+----------------------+-------------+
1 row in set (0.00 sec)
percona3 mysql> show variables like '%dirty_reads';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| wsrep_dirty_reads | OFF   |
+-------------------+-------+
1 row in set (0.01 sec)
percona3 mysql>  select * from test.g1;
ERROR 1047 (08S01): WSREP has not yet prepared node for application use

And when enabled:
percona2 mysql> show status like 'wsrep_cluster_status';
+----------------------+-------------+
| Variable_name        | Value       |
+----------------------+-------------+
| wsrep_cluster_status | non-Primary |
+----------------------+-------------+
1 row in set (0.00 sec)
percona2 mysql> show variables like '%dirty_reads';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| wsrep_dirty_reads | ON    |
+-------------------+-------+
1 row in set (0.00 sec)
percona2 mysql> select * from test.g1;
+----+-------+
| id | a     |
+----+-------+
|  1 | dasda |
|  2 | dasda |
+----+-------+
2 rows in set (0.00 sec)
percona2 mysql> insert into test.g1 set a="bb";
ERROR 1047 (08S01): WSREP has not yet prepared node for application use


In traditional replication, you are probably using the slaves for reads anyway. So if the master crashes, and for some reason a failover toolkit like MHA or PRM is not configured or also fails, then in order to keep the application working you should direct new connections meant for the master to one of the slaves. If you use a load balancer, you might just have the slaves as backups for the master in the write pool. This may help achieve a better user experience during the downtime, where everyone can at least use existing information. As noted above, however, the application must be prepared to work that way.

There are caveats to this implementation, as the "read_only" mode usually used on slaves is not 100% read-only: it makes an exception for users with the SUPER privilege. Here the new super_read_only variable comes to the rescue, available in Percona Server 5.6 as well as in stock MySQL 5.7. With this feature, there is no risk that after pointing database connections to one of the slaves, some privileged users will change the data.

If a disaster is severe enough, it may be necessary to recover data from a huge SQL dump, and it's often hard to find enough spare servers to serve the traffic from an old binary snapshot. It's worth noting that InnoDB has a special read-only mode, meant for data on read-only media, that is lightweight compared to full InnoDB mode.

–EOF–

oracle password file issue

An old colleague asked for help online: his Oracle RAC environment hit an error rooted in a corrupted orapwd file. This kind of issue can be fixed by following this guide:

source from:http://www.adp-gmbh.ch/ora/admin/password_file.html

Oracle’s password file

If a DBA wants to start up an Oracle instance, there must be a way for Oracle to authenticate the DBA, that is, to check whether he or she is allowed to do so. Obviously, the password cannot be stored in the database, because Oracle cannot access the database before the instance is started up. Therefore, the authentication of the DBA must happen outside of the database. There are two distinct mechanisms to authenticate the DBA: using the password file or through the operating system.
The init parameter remote_login_passwordfile specifies whether a password file is used to authenticate the DBA. If it is set to either shared or exclusive, a password file will be used.
Default location and file name
The default location for the password file is: $ORACLE_HOME/dbs/orapw$ORACLE_SID on Unix and %ORACLE_HOME%\database\PWD%ORACLE_SID%.ora on Windows.
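A small helper illustrating the rule above (hypothetical convenience code, not an Oracle utility):

```python
import ntpath
import posixpath

def default_password_file(oracle_home, oracle_sid, windows=False):
    r"""Build the default password file location described above:
    Unix:    $ORACLE_HOME/dbs/orapw$ORACLE_SID
    Windows: %ORACLE_HOME%\database\PWD%ORACLE_SID%.ora
    """
    if windows:
        return ntpath.join(oracle_home, "database", "PWD%s.ora" % oracle_sid)
    return posixpath.join(oracle_home, "dbs", "orapw" + oracle_sid)
```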
Deleting a password file
If password file authentication is no longer needed, the password file can be deleted and the init parameter remote_login_passwordfile set to none.
Password file state
Whether a password file is shared or exclusive is also stored in the password file itself. After its creation, the state is shared. The state can be changed by setting remote_login_passwordfile and restarting the database. That is, the database overwrites the state in the password file when it is started up.
A password file whose state is shared can only contain SYS.
Creating a password file
Password files are created with the orapwd tool.
Adding Users to the password file
Users are added to the password file when they're granted the SYSDBA or SYSOPER privilege.

SYS@ora10> show user;
USER is "SYS"
SYS@ora10> select * from v$pwfile_users;

USERNAME                       SYSDB SYSOP
------------------------------ ----- -----
SYS                            TRUE  TRUE

SYS@ora10> grant SYSDBA to rene;

Grant succeeded.

SYS@ora10> select * from v$pwfile_users;

USERNAME                       SYSDB SYSOP
------------------------------ ----- -----
SYS                            TRUE  TRUE
RENE                           TRUE  FALSE

SYS@ora10> grant SYSOPER to rene;

Grant succeeded.

SYS@ora10> select * from v$pwfile_users;

USERNAME                       SYSDB SYSOP
------------------------------ ----- -----
SYS                            TRUE  TRUE
RENE                           TRUE  TRUE

SYS@ora10> revoke SYSDBA from rene;

Revoke succeeded.

SYS@ora10> select * from v$pwfile_users;

USERNAME                       SYSDB SYSOP
------------------------------ ----- -----
SYS                            TRUE  TRUE
RENE                           FALSE TRUE

SYS@ora10> revoke SYSOPER from rene;

Revoke succeeded.

SYS@ora10> select * from v$pwfile_users;

USERNAME                       SYSDB SYSOP
------------------------------ ----- -----
SYS                            TRUE  TRUE

–EOF–

Firewall, DCD, and TCP Keepalive

from :http://blogread.cn/it/article/5198?f=wb

In an earlier article of mine, "Oracle and Firewalls" (《Oracle与防火墙》), I mentioned that a network firewall will cut off TCP connections that have been idle for a long time; the exact idle timeout can be configured inside the firewall. After the firewall cuts a connection, the following can happen:

  • Before the connection is cut, the Oracle session on it is executing a particularly long-running SQL statement, for example a stored procedure, and during this time no data is sent to the client. When the SQL finishes and the server tries to return the result to the client, the TCP connection has already been broken by the firewall, so an error obviously occurs, the connection is broken, and the session terminates. The client, however, does not know this yet and keeps waiting for the server to return results.

Judging from the first two cases above, the consequences of a firewall cutting database TCP connections include: Continue reading "Firewall、DCD、TCP Keep alive"
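On the database side, Oracle's DCD (SQLNET.EXPIRE_TIME in sqlnet.ora) sends periodic probes that keep the connection from looking idle to the firewall. An application can get a similar effect with TCP keepalive; a sketch using Python's socket module (the TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT constants are Linux-specific, and the default values here are only examples):

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Send keepalive probes after `idle` seconds of silence, every
    `interval` seconds, declaring the peer dead after `probes` failures.
    Pick `idle` well below the firewall's idle-connection timeout."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):  # Linux-only tuning knobs
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```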

FW:SQL Server and the “Lock pages in memory” Right in Windows Server

Here is a nice post about the "Lock Pages in Memory" right for Windows Server; I'm just forwarding it from the source:
http://sqlserverperformance.wordpress.com/2011/02/14/sql-server-and-the-lock-pages-in-memory-right-in-windows-server/

It is pretty important to make sure you set the Max Server memory setting for SQL Server 2005/2008 to something besides the default setting (which allows SQL Server to use as much memory as it wants, subject to signals from the operating system that it is under memory pressure). This is especially important with larger, busier systems that may be under memory pressure.

This setting controls how much memory can be used by the SQL Server buffer pool. If you don't set an upper limit for this value, other parts of SQL Server and the operating system can be starved for memory, which can cause instability and performance problems. It is even more important to set this correctly if you have "Lock Pages in Memory" enabled for the SQL Server service account (which I always do for x64 systems with more than 4GB of memory)…

Thanks to the author for the nice post.

–EOF–

How to determine the version and edition of SQL Server and its components

How do you determine the version and edition of SQL Server and its components? This page from the MS website lists the version information:

https://support.microsoft.com/kb/321185/en-us

SQL Server 2012 version information

The following table lists the major releases of SQL Server 2012.
Release	Product Version
SQL Server 2012 Service Pack 1	11.00.3000.00
SQL Server 2012 RTM	        11.00.2100.60

SQL Server 2008 R2 version information

The following table lists the major releases of SQL Server 2008 R2.
Release	Product version
SQL Server 2008 R2 Service Pack 2	10.50.4000.0
SQL Server 2008 R2 Service Pack 1	10.50.2500.0
SQL Server 2008 R2 RTM	                10.50.1600.1

SQL Server 2008 version information

The following table lists the major releases of SQL Server 2008.
Release	Product version
SQL Server 2008 Service Pack 3	10.00.5500.00
SQL Server 2008 Service Pack 2	10.00.4000.00
SQL Server 2008 Service Pack 1	10.00.2531.00
SQL Server 2008 RTM	        10.00.1600.22

SQL Server 2005 version information

The following table lists the major releases of SQL Server 2005.
Release	Product version
SQL Server 2005 Service Pack 4	9.00.5000.00
SQL Server 2005 Service Pack 3	9.00.4035
SQL Server 2005 Service Pack 2	9.00.3042
SQL Server 2005 Service Pack 1	9.00.2047
SQL Server 2005 RTM	        9.00.1399
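The tables above can be collapsed into a small lookup. A sketch that maps a product version string (for example the result of SELECT SERVERPROPERTY('ProductVersion')) to its major release; only the releases listed above are covered:

```python
def release_name(product_version):
    """Map a version string like '10.50.2500.0' to its SQL Server release."""
    major, minor = product_version.split(".")[:2]
    if major == "11":
        return "SQL Server 2012"
    if major == "10":
        # 10.50.x is 2008 R2; 10.00.x is 2008
        return "SQL Server 2008 R2" if minor == "50" else "SQL Server 2008"
    if major == "9":
        return "SQL Server 2005"
    return "unknown"
```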

–EOF–

Shrink Transaction Logfile for SQL Server 2008 R2

Check the recovery model of the database:

use master
SELECT NAME, recovery_model_desc FROM sys.databases;

Change the recovery model to simple:

use master
alter database EDM_DEV set recovery simple;

Locate the name and current size of the log file you want to shrink:

use EDM_DEV
select name,size from sys.database_files;

The log file is named "EDM_DEV_log"; shrink it to 1000 MB (DBCC SHRINKFILE takes a target size in MB):

dbcc shrinkfile (N'EDM_DEV_log',1000)

Check the log file size again to confirm that it decreased.

Change the EDM_DEV database back to the full recovery model:

use master
alter database EDM_DEV set recovery full;
SELECT NAME, recovery_model_desc FROM sys.databases;
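The steps above can be assembled into one script; a sketch that only builds the T-SQL text, using the database and logical file names from this example. Note that switching to the simple recovery model breaks the log backup chain, so take a full backup after returning to full recovery:

```python
def shrink_log_script(db, log_name, target_mb):
    """Assemble the recovery-model switch and shrink into one T-SQL batch.
    DBCC SHRINKFILE's target size argument is in MB."""
    return "\n".join([
        "USE master;",
        f"ALTER DATABASE {db} SET RECOVERY SIMPLE;",
        f"USE {db};",
        f"DBCC SHRINKFILE (N'{log_name}', {target_mb});",
        "USE master;",
        f"ALTER DATABASE {db} SET RECOVERY FULL;",
    ])
```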

–EOF–

SQL Server 2008 R2 backup to a UNC share directory

I was trying to back up SQL Server 2008 R2 to a share directory on the LAN, but I kept running into errors.
I always got the error message below when running:

backup database testdb to disk = '\\192.168.56.101\backup\testdb.bak'

BackupDiskFile::CreateMedia: backup device 'z:\backup\edm.bak' cannot be created. Operating system error 3 (the system cannot find the path specified).

I thought there must be some difference between a local disk and a UNC mapped disk.

The permissions on the UNC share were fine, so I could rule them out.
It turned out that the root cause is the account used to start the SQL Server instance.
The SQL Server services were started by "NT AUTHORITY\SYSTEM", which has no right to access remote resources on the LAN. If you use a domain user, or any local OS account that has the privilege to access remote resources, the problem goes away.
Finally, I changed the startup account to administrator, and the problem was solved.

Besides, it's not recommended to use the local administrator account to run the SQL Server service, because when the OS admin changes the administrator password, SQL Server will fail to start up since its stored password is now incorrect.


–EOF–

upgrade oracle 11.2.0.3 RAC to 11.2.0.4

1. Environment information

A two-node RAC on ASM, running on Oracle Enterprise Linux 6

CRS version: 11.2.0.3
RDBMS version: 11.2.0.3

2. Precheck work
Run the script "runcluvfy.sh" to verify that the two nodes are ready for the upgrade.
You can find runcluvfy.sh in the Grid Infrastructure software package.

3. Prepare the directories
I will use an out-of-place upgrade, so I need to prepare the directories in advance.
An out-of-place upgrade is recommended by Oracle; prepare separate directories for the new GI home and RDBMS home. Continue reading "upgrade oracle 11.2.0.3 RAC to 11.2.0.4"

srvctl: line xxx: /bin/java: No such file or directory

It's quite strange: after applying a patch with OPatch, srvctl went wrong:

[oracle@node1 bin]$ srvctl
/opt/app/oracle/product/10.2.0/db_1/bin/srvctl: line 188: /bin/java: No such file or directory

It's different from the situation in bug 3937317.

After comparing the content of srvctl with a healthy copy, I found some missing values:

//there should be values in “JREDIR” and “JLIBDIR”

------------------------------------------------
CHOME=/local/oracle/app/OraHome_1/crs
OHOME=/local/oracle/app/OraHome_1/db_1
if [ "X$CHOME" != "X$OHOME" ]
then
    case $ORACLE_HOME in
        "") echo "****ORACLE_HOME environment variable not set!"
            echo "    ORACLE_HOME should be set to the main"
            echo "    directory that contains Oracle products."
            echo "    Set and export ORACLE_HOME, then re-run."
            exit 1;;
    esac
else
    ORACLE_HOME=/local/oracle/app/OraHome_1/crs
    export ORACLE_HOME
fi

# External Directory Variables set by the Installer
JREDIR=/local/oracle/app/OraHome_1/db_1/jdk/jre
JLIBDIR=/local/oracle/app/OraHome_1/db_1/jlib

# jar files
-------------------------------------------------

But in the srvctl where the issue happened, these variables were empty:

# External Directory Variables set by the Installer
JREDIR=
JLIBDIR=

// Add the values back. Note that they differ between $ORACLE_HOME/bin/srvctl and $ORACLE_CRS_HOME/bin/srvctl!

For $ORACLE_HOME/bin/srvctl:

# External Directory Variables set by the Installer
JREDIR=/opt/app/oracle/product/10.2.0/db_1/jdk/jre
JLIBDIR=/opt/app/oracle/product/10.2.0/db_1/jlib

For $ORACLE_CRS_HOME/bin/srvctl:

# External Directory Variables set by the Installer
JREDIR=/opt/app/oracle/product/10.2.0/crs/jdk/jre
JLIBDIR=/opt/app/oracle/product/10.2.0/crs/jlib

//problem solved
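To spot the same symptom quickly on other homes, one can scan each srvctl script for installer-set variables that ended up empty; a small hypothetical helper:

```python
import re

def empty_installer_vars(srvctl_text, names=("JREDIR", "JLIBDIR")):
    """Return the installer-set variables that are assigned an empty value
    in the given srvctl script text."""
    empty = []
    for name in names:
        match = re.search(rf"^{name}=(.*)$", srvctl_text, re.MULTILINE)
        if match and match.group(1).strip() == "":
            empty.append(name)
    return empty
```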
–EOF–

GSD offline in 10g RAC

The function of GSD (in 10g and above) is to service requests from 9i RAC management clients, so when there are no 9i databases present, there is nothing for GSD to do. Consequently, there is no impact on a RAC cluster if GSD is offline and 9i is not used.

You may meet a situation where the GSD resource is offline in the CRS stack; you can leave it alone. But if you hit the problem of CRS daemon restarts taking a long time, you can speed them up with the following trick, as described in MOS note "GSD Is Used Only if 9i RAC Database is Present (Doc ID 429966.1)".

For 10g, you can follow this:

Disable GSD (pre 11.2)
After confirming that there are no 9i databases being used you can disable GSD by adding 'exit 0' after the initial comments in the script $ORACLE_HOME/bin/gsdctl where $ORACLE_HOME is the home from which nodeapps are running (i.e. crs home).

#case $ORACLE_HOME in  
# "") echo "****ORACLE_HOME environment variable not set!"  
# echo " ORACLE_HOME should be set to the main"  
# echo " directory that contains Oracle products."  
# echo " Set and export ORACLE_HOME, then re-run."  
# exit 1;;  
#esac  
exit 0 ## Manually added as a temporary workaround 

Finally, after you apply this fix, CRS startup will complete much more quickly.

–EOF–

How To Enable Archive Mode In RAC

This page describes how to enable archive log mode in a three-node RAC environment.

Reference doc: How To Enable/Disable Archive Log Mode in Real Application Cluster Environment (Doc ID 235158.1)

1. The current state of the RAC

[oracle@node3 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....a1.inst application    ONLINE    ONLINE    node1
ora....a2.inst application    ONLINE    ONLINE    node2
ora....a3.inst application    ONLINE    ONLINE    node3
ora.aaa.db     application    ONLINE    ONLINE    node1
ora....E1.lsnr application    ONLINE    ONLINE    node1
ora.node1.gsd  application    ONLINE    ONLINE    node1
ora.node1.ons  application    ONLINE    ONLINE    node1
ora.node1.vip  application    ONLINE    ONLINE    node1
ora....E2.lsnr application    ONLINE    ONLINE    node2
ora.node2.gsd  application    ONLINE    ONLINE    node2
ora.node2.ons  application    ONLINE    ONLINE    node2
ora.node2.vip  application    ONLINE    ONLINE    node2
ora....E3.lsnr application    ONLINE    ONLINE    node3
ora.node3.gsd  application    ONLINE    ONLINE    node3
ora.node3.ons  application    ONLINE    ONLINE    node3
ora.node3.vip  application    ONLINE    ONLINE    node3

2. Prepare the archive directory on each node (3 directories, one per node). Continue reading "How To Enable Archive Mode In RAC"