git.8kb.co.uk Git - slony-i/slony_failover/blobdiff - README.md
index 5898ee1c17dafc71b2cd611eb0b9a714c20d10a1..4b68f20287173adb3bd414328ca71d07ca9df8fc 100644 (file)
--- a/README.md
+++ b/README.md
@@ -7,13 +7,13 @@ The script can be run in interactive mode to suggest switchover or failover and
 will create a slonik script to perform the suggested action.
 
 It's hard to put together a script for all situations as different Slony 
-configurations can have different complexities (hence the existance of slonik), 
+configurations can have different complexities (hence the existence of slonik), 
 but this script is intended to be used for building and running slonik scripts 
 to move all sets from one node to another.
 
 There is also an autofailover mode which will sit and poll each node and perform
 a failover of failed nodes.  This mode should be considered experimental, as 
-there can be guite a few decisions to be made when failing over different setups.
+there can be quite a few decisions to be made when failing over different setups.
 
 ##Example usage
 
@@ -36,7 +36,7 @@ Run as a daemon in debian:
 
 ```bash
 $ sudo cp init.debian /etc/init.d/slony_failover
-$ cp slony_failover.conf /var/slony/failover/slony_failover.conf
+$ cp slony_failover.conf /var/slony/slony_failover/slony_failover.conf
 $ sudo chmod +x /etc/init.d/slony_failover
 $ sudo update-rc.d slony_failover start 99 2 3 4 5 . stop 24 0 1 6 
 $ sudo invoke-rc.d slony_failover start
@@ -45,7 +45,7 @@ $ sudo invoke-rc.d slony_failover start
 ##Command line parameters
 
 ```bash
-$ ./failover.pl [options]
+$ ./slony_failover.pl [options]
 ```
 
 |Switch    | Description
@@ -68,10 +68,11 @@ $ ./failover.pl [options]
 | General     |**separate_working_directory**                | boolean                       | *'true'*                        | Append a separate working directory to the prefix_directory for each run
 | General     |**slonik_path**                               | /full/path/to/bin/directory   | *null*                          | Slonik binary if not in current path
 | General     |**pid_filename**                              | /path/to/pidfile              | *'/var/run/slony_failover.pid'* | Pid file to use when running in autofailover mode
-| General     |**enable_try_blocks**                         | boolean                       | *false*                         |    Write slonik script with try blocks where possible to aid error handling
+| General     |**enable_try_blocks**                         | boolean                       | *false*                         | Write slonik script with try blocks where possible to aid error handling
 | General     |**lockset_method**                            | single/multiple               | *'multiple'*                    | Write slonik script that locks all sets
-| General     |**pull_aliases_from_comments**                | boolean                       | *false*                         | If true, script will pull text from between parentheses in comments fields
-|             |                                              |                               |                                 | and use to generate (possibly) meaningful aliases for nodes and sets.
+| General     |**pull_aliases_from_comments**                | boolean                       | *false*                         | If true, the script will pull text from comment fields and use it to generate
+|             |                                              |                               |                                 | possibly meaningful aliases for nodes and sets.
+|             |                                              |                               |                                 | For sl_set this uses the entire comment; for sl_node, the text in parentheses.
 | General     |**log_line_prefix**                           | text                          | *null*                          | Prefix to add to log lines, special values:
 |             |                                              |                               |                                 |     %p = process ID
 |             |                                              |                               |                                 |     %t = timestamp without milliseconds
@@ -101,9 +102,12 @@ $ ./failover.pl [options]
 | Autofailover|**autofailover_forwarding_providers**         | boolean                       | *'false'*                       | If true a failure of a pure forwarding provider will also trigger failover
 | Autofailover|**autofailover_config_any_node**              | boolean                       | *'true'*                        | After reading the initial cluster configuration, subsequent reads of the configuration 
 |             |                                              |                               |                                 | will use conninfo read from sl_subscribe to read from any node.
-| Autofailover|**autofailover_poll_interval**                | integer                       | 500                             | How often to check for failure of nodes (milliseconds)
-| Autofailover|**autofailover_node_retry**                   | integer                       | 2                               | When failure is detected, retry this many times before initiating failover
-| Autofailover|**autofailover_sleep_time**                   | integer                       | 1000                            | Interval between retries (milliseconds)
+| Autofailover|**autofailover_poll_interval**                | integer                       | *500*                           | How often to check for failure of nodes (milliseconds)
+| Autofailover|**autofailover_node_retry**                   | integer                       | *2*                             | When failure is detected, retry this many times before initiating failover
+| Autofailover|**autofailover_sleep_time**                   | integer                       | *1000*                          | Interval between retries (milliseconds)
+| Autofailover|**autofailover_perspective_sleep_time**       | integer                       | *20000*                         | Interval between lag reads for failed nodes from surviving nodes. If greater than zero, any observation that nodes have failed is checked from the surviving nodes' perspective by checking whether lag times are extending.  This does not guarantee that the nodes are down, but if set to a large enough interval (at least sync_interval_timeout) it can back up our observation.
+| Autofailover|**autofailover_majority_only**                | boolean                       | *false*                         | Only fail over if the quantity of surviving nodes is greater than the quantity of failed nodes.  Intended to be used to prevent a split-brain scenario in conjunction with some other logic to monitor and fence off the old origin if it is in the minority.
+| Autofailover|**autofailover_is_quorum**                    | boolean                       | *false*                         | If this script is running on a separate host, set to true to treat it as a quorum server. Effectively increments the count of surviving nodes when calculating the majority above.
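
The interaction between **autofailover_majority_only** and **autofailover_is_quorum** can be sketched as below. This is only an illustration of the rule as described in the table, written in Python rather than taken from the script itself; the function name `should_fail_over` and its arguments are hypothetical, and returning false when no nodes have failed is an assumption.

```python
def should_fail_over(surviving: int, failed: int,
                     majority_only: bool = False,
                     is_quorum: bool = False) -> bool:
    """Decide whether a failover should go ahead (hypothetical helper).

    surviving / failed are node counts as observed by the monitoring host.
    """
    if failed == 0:
        # Assumption: with no failed nodes there is nothing to fail over.
        return False
    if not majority_only:
        # Without the majority check, any detected failure triggers failover.
        return True
    # A quorum server counts as one extra vote on the surviving side.
    votes = surviving + (1 if is_quorum else 0)
    # Proceed only if survivors strictly outnumber the failed nodes.
    return votes > failed
```

In a two-node cluster with one failed node, `should_fail_over(1, 1, majority_only=True)` is false, while adding `is_quorum=True` tips the count to two against one and allows the failover.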
 
 Changes
 -------
@@ -112,6 +116,7 @@ Changes
 * 04/11/2012 - Experiment with different use of try blocks (currently can't use multiple lock sets inside try)
 * 13/04/2014 - Update to work differently for Slony 2.2+
 * 05/05/2014 - Experiment with autofailover ideas
+* 10/09/2014 - Add some logic to autofailover for doing extra checks from the perspective of other nodes. Still a naive autofailover implementation imho.
 
 Licence
 -------