Introduction
DNS (Domain Name System) is one of the primary Internet services. It maps human-friendly domain names to machine-friendly IP addresses. If many people use the same DNS service (for example, subscribers using their ISP's DNS server), a single DNS server can become a bottleneck, and it may fail.
A scalable DNS cluster helps provide both scalability and availability for the DNS service.
The example below sets up a cluster for recursive DNS, but the same method works just as well for authoritative DNS. Just remember that clients who use your cluster as a secondary name service need to also-notify{} each of your real servers, not just the service IP.
Architecture
DNS is a simple service: there is no affinity between requests from the same client. DNS usually listens for queries on UDP port 53 and TCP port 53.
LVS can simply load balance UDP port 53 and TCP port 53 among a set of DNS servers, and there is no need to set up any persistence options.
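For illustration, the balancing described above can also be written directly as ipvsadm commands. The sketch below only prints the commands so they can be reviewed first. The function name print_ipvs_dns is made up for this example, and the scheduler (wrr), direct routing (-g) and weight 1 are assumptions taken from the keepalived.conf on this page. Unlike keepalived, it adds no health checks.

```shell
#!/bin/sh
# Print ipvsadm commands that balance UDP and TCP port 53 of one service IP
# across the given real servers (weighted round robin, direct routing).
# Usage: print_ipvs_dns <service-ip> <real-server-ip>...
print_ipvs_dns() {
    vip=$1
    shift
    for flag in -u -t; do
        # -A adds the virtual service, -u = UDP, -t = TCP
        echo "ipvsadm -A $flag $vip:53 -s wrr"
        for rip in "$@"; do
            # -a adds a real server, -g = direct routing, -w = weight
            echo "ipvsadm -a $flag $vip:53 -r $rip:53 -g -w 1"
        done
    done
}
```

Calling print_ipvs_dns 194.97.173.124 10.1.53.2 10.3.53.2 prints one service line and two real server lines per protocol. No persistence option (-p) appears, which matches the point that DNS needs none.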
Configuration Example
keepalived.conf:
! Balancer-Set for udp/53
virtual_server 194.97.173.124 53 {
    delay_loop 10
    lb_algo wrr
    lb_kind DR
    protocol UDP
    ! persistence_timeout 1
    ! persistence_granularity 255.255.255.255
    ! eth1.105 -> kai eth1.105
    real_server 10.1.53.2 53 {
        weight 1
        MISC_CHECK {
            misc_path "/usr/bin/dig -b 10.1.53.1 a resolve.test.roka.net @10.1.53.2 +time=1 +tries=5 +fail > /dev/null"
            misc_timeout 6
        }
    }
    ! eth1.109 -> kai eth1.109
    real_server 10.3.53.2 53 {
        weight 1
        MISC_CHECK {
            misc_path "/usr/bin/dig -b 10.3.53.1 a resolve.test.roka.net @10.3.53.2 +time=1 +tries=5 +fail > /dev/null"
            misc_timeout 6
        }
    }
}
As you can dig (;-) we are using an A record with a low TTL to test the service, because this setup is a recursive DNS cluster. So far dig works fine with 44 real_servers configured on an idle dual PIII 800.
On real_server kai we use the following netfilter setup to direct the traffic to different BIND processes on the same machine/MAC:
#DNAT 194.97.173.124->10.1.53.2 eth1.105
iptables -t nat -A PREROUTING -i eth1.105 -s $net -d 194.97.173.124 -p tcp --dport 53 -j DNAT --to-destination 10.1.53.2:53
iptables -t nat -A PREROUTING -i eth1.105 -s $net -d 194.97.173.124 -p udp --dport 53 -j DNAT --to-destination 10.1.53.2:53
#DNAT 194.97.173.124->10.3.53.2 eth1.109
iptables -t nat -A PREROUTING -i eth1.109 -s $net -d 194.97.173.124 -p tcp --dport 53 -j DNAT --to-destination 10.3.53.2:53
iptables -t nat -A PREROUTING -i eth1.109 -s $net -d 194.97.173.124 -p udp --dport 53 -j DNAT --to-destination 10.3.53.2:53
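The four rules above differ only in interface, protocol and real server address, so with many BIND processes they can be generated in a loop. The following is a minimal sketch, not part of the original setup: the function name print_dnat_rules and the "interface=real_ip" argument format are made up for this example, and the rules are printed rather than executed so they can be checked before use.

```shell
#!/bin/sh
# Print the DNAT rules for one service IP.
# Usage: print_dnat_rules <service-ip> <source-net> <interface=real_ip>...
print_dnat_rules() {
    vip=$1
    net=$2
    shift 2
    for pair in "$@"; do
        iface=${pair%%=*}   # part before the first "="
        rip=${pair#*=}      # part after the first "="
        for proto in tcp udp; do
            echo "iptables -t nat -A PREROUTING -i $iface -s $net -d $vip -p $proto --dport 53 -j DNAT --to-destination $rip:53"
        done
    done
}
```

For the two processes on kai this would be called as print_dnat_rules 194.97.173.124 "$net" eth1.105=10.1.53.2 eth1.109=10.3.53.2, which prints the same four rules in the same order.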
BIND9
When I wrote this example we were using two BIND processes on the same machine, because BIND9 currently just runs faster when it is not threading. Here is something JINMEI Tatuya told me on the bind9-workers mailing list, which turned out to be very true:
If you go with disabling threads, you may also want to enable
"internal memory allocation". (I hear that) it should use memory more
efficiently (and can make the server faster) but is disabled by
default due to response-performance reasons in the threaded case. You
can enable this feature by adding the following line
#define ISC_MEM_USE_INTERNAL_MALLOC 1
just before the following part of bind9/lib/isc/mem.c:
#ifndef ISC_MEM_USE_INTERNAL_MALLOC
#define ISC_MEM_USE_INTERNAL_MALLOC 0
#endif
Try it and you will keep it. ;)
The BIND 9.4 line now uses this internal malloc library by default, but disabling threading will probably free you from the hiccups some BIND9 users are experiencing.
PowerDNS recursor
This one is a recursive-only name server with very limited authoritative DNS capabilities. The author of this example now uses PowerDNS recursor (v3.1.4) exclusively for his caching-only DNS cluster. He is glad that, at roughly the same queries-per-second performance, it generates fewer SERVFAIL answers and is generally several times more robust than BIND9.
Added redundancy via iBGP
If you have more than one load balancer at different locations, and you can convince your local networker to let you speak BGP4+ to his routers, you can use quagga with something like the following configuration to fail over the service IP to the second LB if the first one goes down:
!
router bgp 5430
 no synchronization
 bgp router-id a.b.c.d
 redistribute connected route-map benice
 neighbor c.d.e.f remote-as 5430
 neighbor c.d.e.f description ffm4-j2
 neighbor c.d.e.f send-community both
 neighbor c.d.e.f soft-reconfiguration inbound
 neighbor c.d.e.f route-map nixda in
 neighbor c.d.e.f route-map benice out
 neighbor d.c.f.e remote-as 5430
 neighbor d.c.f.e description ffm4-j
 neighbor d.c.f.e send-community both
 neighbor d.c.f.e soft-reconfiguration inbound
 neighbor d.c.f.e route-map nixda in
 neighbor d.c.f.e route-map benice out
 no auto-summary
!
access-list line permit 127.0.0.1/32 exact-match
access-list line deny any
!
ip prefix-list cns-dus2 description dus2 high-metric eq low-preference
ip prefix-list cns-dus2 seq 5 permit 194.97.173.125/32
ip prefix-list cns-dus2 seq 10 deny any
ip prefix-list cns-ffm4 description ffm4 low-metric eq high-preference
ip prefix-list cns-ffm4 seq 5 permit 194.97.173.124/32
ip prefix-list cns-ffm4 seq 10 deny any
!
route-map benice permit 10
 match ip address prefix-list cns-ffm4
 set local-preference 100
 set metric 0
!
route-map benice permit 20
 match ip address prefix-list cns-dus2
 set local-preference 100
 set metric 1
!
route-map nixda deny 10
!
This is the LB at FFM4. Note that the metric at the DUS2 LB is just the other way around.
Here we fancy talking to two core routers from each LB for extra redundancy.
You can also have an internal anycast service IP if you use the same metric at both LBs and make sure they are attached to the same level of the router network, topology-wise. This way traffic is shared between the two load balancers according to your network topology, which is of course most interesting for large dial-in ISPs.
Problem
dig does not return a non-zero exit code when it receives a SERVFAIL, but there are situations in which some BIND9 versions return SERVFAIL for any query, for example when they are out of memory. In a recursive DNS cluster we would want to take such BIND processes out of service.
Workaround
Use the following Perl script as a wrapper for dig. This is quite ugly, because Perl is an interpreted language and forking it is not much fun, so it consumes a lot of user CPU when executed every 6 seconds.
#!/usr/bin/perl
use strict;
use warnings;
# cmdline arguments: <FromIP> <Class> <QTYPE> <QNAME> <ToIP> <Times> <Tries> <ErrorMatch> <Transport>
# resulting call: /usr/bin/dig -b 10.5.53.1 IN A 2.0.0.127.my.test @10.5.53.2 +time=1 +tries=5 +fail
# every argument is validated before it is passed to the shell
if (
    ((defined $ARGV[0]) && ($ARGV[0] =~ /^\d+\.\d+\.\d+\.\d+$/))
    && ((defined $ARGV[1]) && ($ARGV[1] =~ /^(IN|CHAOS)$/))
    && ((defined $ARGV[2]) && ($ARGV[2] =~ /^(A|ANY|MX|PTR|SRV|TXT|AAAA|NS|CNAME|SOA)$/))
    && ((defined $ARGV[3]) && ($ARGV[3] =~ /^[A-Za-z0-9\-\.]+$/))
    && ((defined $ARGV[4]) && ($ARGV[4] =~ /^\d+\.\d+\.\d+\.\d+$/))
    && ((defined $ARGV[5]) && ($ARGV[5] =~ /^\d+$/))
    && ((defined $ARGV[6]) && ($ARGV[6] =~ /^\d+$/))
    && ((defined $ARGV[7]) && ($ARGV[7] =~ /^\S+$/))
) {
    # default to UDP (+notcp) unless tcp is requested explicitly
    my $transport = "notcp";
    if ((defined $ARGV[8]) && ($ARGV[8] =~ /^tcp$/i)) {
        $transport = "tcp";
    } elsif ((defined $ARGV[8]) && ($ARGV[8] =~ /^udp$/i)) {
        $transport = "notcp";
    }
    my (@res) = `/usr/bin/dig -b $ARGV[0] $ARGV[1] $ARGV[2] $ARGV[3] \@$ARGV[4] +time=$ARGV[5] +tries=$ARGV[6] +fail +$transport 2>&1`;
    my $return = $?;
    # fail if the status field of the answer matches <ErrorMatch>, e.g. SERVFAIL
    if (my $error = (map { /status:\s*($ARGV[7])/ ? $1 : () } @res)[0]) {
        die("$error");
    } elsif ($return != 0) {
        die("dig returned: \"$return\"");
    } elsif ($return == 0) {
        exit 0;
    } else {
        die("error: \"$return\" HAS BAD VALUE!");
    }
} else {
    die("dig-wrapper.pl <FromIP> <Class> <QTYPE> <QNAME> <ToIP> <Times> <Tries> <ErrorMatch> <Transport>");
}
Ah yes, forgot to say: the dual PIII 800 is not idling around anymore. It is busy running this script 44 times every 6 seconds, which accounts for roughly 12% user CPU and 5% system CPU at a query rate of ~3600 q/s.
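The core of what the wrapper does, reading the status field from dig's header line, can also be sketched in a few lines of shell. This is only an illustration and not the script used on the cluster: the function names dig_status and dig_answer_ok are made up, and the sketch assumes the usual dig header line of the form ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: ...". It has not been measured, so it may well cost as much CPU as the Perl version.

```shell
#!/bin/sh
# Print the status field (e.g. NOERROR, SERVFAIL) found in dig output on stdin.
dig_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Succeed only if the dig output on stdin reports status NOERROR.
# Output without a status line (for example after a timeout) counts as a failure.
dig_answer_ok() {
    [ "$(dig_status)" = "NOERROR" ]
}
```

A check would then pipe dig into it, for example /usr/bin/dig -b 10.1.53.1 a resolve.test.roka.net @10.1.53.2 +time=1 +tries=5 +fail | dig_answer_ok, and use the exit status of the pipeline.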
Solution
Use a patched version of dig?
Conclusion
It still just works.
The article mentions one idea, using BGP for HA, and it is a good one :)
Original article: http://kb.linuxvirtualserver.org ... S_Cluster_using_LVS