git.8kb.co.uk Git - pgpool-ii/pgpool-ii_2.2.5/blob - doc/tutorial-en.html

   1 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
   2
   3 <html>
   4
   5 <head>
   6 <title>pgpool-II Tutorial</title>
   7 <meta http-equiv="Content-Type" content="text/html" />
   8 <link href="pgpool.css" rel="stylesheet" type="text/css" />
   9 </head>
  10
  11 <body>
  12 <h1>pgpool-II Tutorial</h1>
  13 <p>Welcome to the Tutorial for pgpool-II. From here, you can learn how
  14 to install, setup, and run parallel queries or do replication using
  15 pgpool-II. We assume that you already know basic PostreSQL operations,
  16 so please refer to the PostgreSQL document if needed.</p>
  17
  18 <dl>
  19   <dt><em>Table of Contents</em></dt>
  20   <dt>1. <a href="#start">Let's Begin!</a></dt>
  21   <dd>1.1. <a href="#install">Installing pgpool-II</a></dd>
  22   <dd>1.2. <a href="#config">Configuration Files</a></dd>
  23   <dd>1.3. <a href="#pcp-config">Configuring PCP commands</a></dd>
  24   <dd>1.4. <a href="#db-node">Preparing Database Nodes</a></dd>
  25   <dd>1.5. <a href="#start-shutdown">Starting/Stopping pgpool-II</a></dd>
  26   <dt>2. <a href="#replication">Your First Replication</a></dt>
  27   <dd>2.1. <a href="#replication-config">Configuring Replication</a></dd>
  28   <dd>2.2. <a href="#replication-check">Checking Replication</a></dd>
  29   <dt>3. <a href="#parallel">Your First Parallel Query</a></dt>
  30   <dd>3.1. <a href="#parallel-config">Configuring Parallel Query</a></dd>
  31   <dd>3.2. <a href="#system-db">Configuring the System Database</a></dd>
  32   <dd>3.3. <a href="#dist-def">Partitioning Rule Definition</a></dd>
  33   <dd>3.4. <a href="#replicate-def">Replication Rule Definition</a></dd>
  34   <dd>3.5. <a href="#parallel-check">Checking Parallel Query</a></dd>
  35 </dl>
  36
  37
  38 <h2>1. <a name="start">Let's Begin!</a></h2>
  39
  40 <p>First, we must learn how to install, configure pgpool-II and
  41 database nodes before using replication or parallel query.</p>
  42
  43 <h3>1.1. <a name="install">Installing pgpool-II</a></h3>
  44
  45 <p>Installing pgpool-II is very easy. In the directory which you have
  46 extracted the source tar ball, execute the following commands.</p>
  47
  48 <pre>
  49 $ ./configure
  50 $ make
  51 $ make install
  52 </pre>
  53
  54 <p><code>configure</code> script collects your system information
  55 and use it for the compilation procedure. You can pass command line
  56 arguments to <code>configure</code> script to change the default
  57 behavior, such as the installation directory. pgpool-II will be
  58 installed to <code>/usr/local</code> directory by default.</p>
  59
  60 <p><code>make</code> command compiles the source code, and <code>make
  61 install</code> will install the executables. You must have write
  62 permission on the installation directory.</p>
  63
  64 <p>In this tutorial, we will install pgpool-II in the default
  65 <code>/usr/local</code> directory.</p>
  66
  67 <p><em>Note</em>: pgpool-II requires libpq library in PostgreSQL 7.4
  68 or later (version 3 protocol). If <code>configure</code> script
  69 displays the following error message, libpq library may not be
  70 installed, or it is not of version 3.</p>
  71
  72 <pre>configure: error: libpq is not installed or libpq is old</pre>
  73
  74 <p>If the library is version 3, but above message is still displayed,
  75 your libpq library is probably not recognized by
  76 <code>configure</code> script.</p>
  77
  78 <p><code>configure</code> script searches for libpq library under
  79 <code>/usr/local/pgsql</code> libaray. If you have installed
  80 PostgreSQL to a directory other than <code>/usr/local/pgsql</code>,
  81 use <code>--with-pgsql</code>, or <code>--with-pgsql-includedir</code>
  82 and <code>--with-pgsql-libdir</code> command line options when you
  83 execute <code>configure</code>.</p>
  84
  85 <h3>1.2. <a name="config">Configuration Files</a></h3>
  86
  87 <p>pgpool-II configuration parameters are saved in
  88 <code>pgpool.conf</code> file. The file is in "<code>parameter =
  89 value</code>" per line format. When you install pgpool-II,
  90 <code>pgpool.conf.sample</code> is automatically created. We recommend
  91 copying and renaming it to <code>pgpool.conf</code>, and edit it as
  92 you like.</p>
  93
  94 <pre>$ cp /usr/local/etc/pgpool.conf.sample /usr/local/etc/pgpool.conf</pre>
  95
  96 <p>pgpool-II only accepts connections from the local host using port
  97 9999. If you wish to receive conenctions from other hosts, set
  98 <code>listen_addresses</code> to '*'.</code></p>
  99
 100 <pre>
 101 listen_addresses = 'localhost'
 102 port = 9999
 103 </pre>
 104
 105 <p>We will use the default parameters in thie tutorial.</p>
 106
 107 <h3>1.3. <a name="pcp-config">Configuring PCP Commands</a></h3>
 108
 109 <p>pgpool-II has an interface for administration purpose to retrieve
 110 information on database nodes, shutdown pgpool-II, etc. via
 111 network. To use PCP commands, user authentication is required. This
 112 authentication is different from PostgreSQL's user authentication. A
 113 username and password need to be defined in <code>pcp.conf</code>
 114 file. In the file, a username and password are listed as a pair on
 115 each line, and they are separated by a colon (:). Passwords are
 116 encrypted in md5 hash format.</p>
 117
 118 <pre>postgres:e8a48653851e28c69d0506508fb27fc5</pre>
 119 <p>
 120 When you install pgpool-II, <code>pcp.conf.sample</code> is
 121 automatically created. We recommend copying and renaming it to
 122 <code>pcp.conf</code>, and edit it.
 123 </p>
 124 <pre>$ cp /usr/local/etc/pcp.conf.sample /usr/local/etc/pcp.conf</pre>
 125
 126 <p>To encrypt your password into md5 hash format, use pg_md5 command,
 127 which is installed as a part of pgpool-II
 128 executables. <code>pg_md5</code> takes text as an command line
 129 argument, and displays its md5-hashed text.</p>
 130
 131 <p>For example, give "postgres" as the command line argument, at
 132 <code>pg_md5</code> displays md5-hashed text to the standard
 133 output.</p>
 134
 135 <pre>
 136 $ /usr/bin/pg_md5 postgres
 137 e8a48653851e28c69d0506508fb27fc5
 138 </pre>
 139
 140 <p>PCP commands are executed via network, so the port number must be
 141 configured with <code>pcp_port</code> parameter in
 142 <code>pgpool.conf</code> file.</p>
 143
 144 <p>We will use the default 9898 for <code>pcp_port</code> in this tutorial.</p>
 145
 146 <pre>pcp_port = 9898</pre>
 147
 148 <h3>1.4. <a name="db-node">Preparing Database Nodes</a></h3>
 149
 150 <p>Now, we need to set up backend PostgreSQL servers for
 151 pgpool-II. These servers can be placed within the same host as
 152 pgpool-II, or on separate machines. If you decide to place the servers
 153 on the same host, different port numbers must be assigned for each
 154 server. If the servers are placed on separate machines, they must be
 155 configured properly so that they can accept network connections from
 156 pgpool-II.</p>
 157
 158 <p>In this tutorial, we will place three servers within the same host
 159 as pgpool-II, and assign 5432, 5433, 5434 port numbers
 160 respectively. To configure pgpool-II, edit <code>pgpool.conf</code> as
 161 follows.
 162
 163 <pre>
 164 backend_hostname0 = 'localhost'
 165 backend_port0 = 5432
 166 backend_weight0 = 1
 167 backend_hostname1 = 'localhost'
 168 backend_port1 = 5433
 169 backend_weight1 = 1
 170 backend_hostname2 = 'localhost'
 171 backend_port2 = 5434
 172 backend_weight2 = 1
 173 </pre>
 174
 175 <p>For <code>backend_hostname</code>, <code>backend_port</code>,
 176 <code>backend_weight</code>, set the node's hostname, port number, and
 177 ratio for load balancing. At the end of each parameter string, node ID
 178 must be specified by adding positive integers starting with 0 (i.e. 0,
 179 1, 2, &hellip;).</p>
 180
 181 <p><code>backend_weight</code> parameters are all 1, meaning that
 182 SELECT queries are equally distributed among three servers.</p>
 183
 184 <h3>1.5. <a name="start-shutdown">Starting/Stopping pgpool-II</a></h3>
 185
 186 <p>To fire up pgpool-II, execute the following command on a terminal.</p>
 187
 188 <pre>$ pgpool</pre>
 189
 190 <p>The above command, however, prints no log messages because pgpool
 191 detaches the terminal. If you want to show pgpool log messages, you
 192 pass <code>-n</code> option to pgpool command. pgpool-II is executed
 193 as non-daemon process, and the terminal will not be detached.
 194 </p>
 195
 196 <pre>
 197 $ pgpool -n &
 198 </pre>
 199
 200 <p>
 201 The log messages are printed on the terminal, so the recommended
 202 options to use are like the following.
 203 </p>
 204
 205 <pre>$ pgpool -n -d &gt; /tmp/pgpool.log 2&gt;&1 &</pre>
 206
 207 <p><code>-d</code> option enables debug messages to be generated.</p>
 208
 209 <p>
 210 The above command keeps appending log messages to /tmp/pgpool.log. If
 211 you need to rotate log files, pass the logs to a external command
 212 which have log rotation function.
 213
 214 For example, you can use rotatelogs coming with Apache2:
 215 <pre>
 216 $ pgpool -n 2>&1 | /usr/local/apache2/bin/rotatelogs \
 217   -l -f /var/log/pgpool/pgpool.log.%A 86400 &
 218 </pre>
 219 This will generate a log file named "pgpool.log.Thursday" everyday then
 220 rotate it 00:00 at midnight. Rotatelogs adds log to a file if it already
 221 exists. To delete old log files before rotation, you could use cron:
 222 <pre>
 223 55 23 * * * /usr/bin/find /var/log/pgpool -type f -mtime +5 -exec /bin/rm -f '{}' \;
 224 </pre>
 225 Please note that rotaelogs may exist as /usr/sbin/rotatelogs2 in some
 226 distributions.
 227 -f option generates a log file as soon as rotatelogs starts and is
 228 available apache2 2.2.9 or greater.
 229 </p>
 230 <p>
 231 Also <a
 232 href="http://www.cronolog.org"><code>cronolog</code></a> helps you.
 233 </p>
 234
 235 <pre>
 236 $ pgpool -n 2>&1 | /usr/sbin/cronolog \
 237   --hardlink=/var/log/pgsql/pgpool.log \
 238   '/var/log/pgsql/%Y-%m-%d-pgpool.log' &
 239 </pre>
 240
 241 <p>To stop pgpool-II process, execute the following command.</p>
 242
 243 <pre>$ pgpool stop</pre>
 244
 245 <p>If any client is still connected, pgpool-II waits for them to
 246 disconnect, and then terminate itself. Execute the following command
 247 instead if you want to shutdown pgpool-II forcibly.
 248
 249 <pre>$ pgpool -m fast stop</pre>
 250
 251 <h2>2. <a name="replication">Your First Replication</a></h2>
 252
 253 <p>Replication enables the same data to be copied to multiple database
 254 nodes.</p>
 255
 256 <p>In this section, we'll use three database nodes, which we have
 257 already set up in section "1. <a href="#start">Let's Begin!</a>", and
 258 takes you step by step to create a database replication system. Sample
 259 data to be replicated will be generated by pgbench benchmark
 260 program.</p>
 261
 262 <h3>2.1. <a name="replication-config">Configuring Replication</a></h3>
 263
 264 <p>To enable the database replication function, set
 265 <code>replication_mode</code> to true in <code>pgpool.conf</code>
 266 file.</p>
 267
 268 <pre>replication_mode = true</pre>
 269
 270 <p>When <code>replication_mode</code> is set to true, pgpool-II will send a
 271 copy of a received query to all the database nodes.</p>
 272
 273 <p>When <code>load_balance_mode</code> is set to true, pgpool-II will
 274 distribute SELECT queries among the database nodes.</p>
 275
 276 <pre>load_balance_mode = true</pre>
 277
 278 <p>In this section, we enable both <code>replication_mode</code> and
 279 <code>load_balance_mode</code>.</p>
 280
 281 <h3>2.2. <a name="replication-check">Checking Replication</a></h3>
 282
 283 <p>To reflect the changes in <code>pgpool.conf</code>, pgpool-II must
 284 be restarted. Please refer to section "1.5 <a
 285 href="#start-shutdown">Starting/Stopping pgpool-II</a>".</p>
 286
 287 <p>After configuring <code>pgpool.conf</code> and restarting
 288 pgpool-II, let's try the actual replication and see if everything is
 289 working OK.</p>
 290
 291 <p>First, we need to create a database to be replicated. We will name
 292 it "bench_replication". This database needs to be created on all the
 293 nodes. Use <code>createdb</code> commands through pgpool-II, and the
 294 database will be created on all the nodes.</p>
 295
 296 <pre>$ createdb -p 9999 bench_replication</pre>
 297
 298 <p>Then, we'll execute pgbench with <code>-i</code>
 299 option. <code>-i</code> option initializes the database with
 300 pre-defined tables and data.</p>
 301
 302 <pre>$ pgbench -i -p 9999 bench_replication</pre>
 303
 304 <p>The following table is the summary of tables and data, which will
 305 be created by <code>pgbench -i</code>. If, on all the nodes, the
 306 listed tables and data are created, replication is working correctly.
 307
 308 <table border="1" align="center">
 309 <tr>
 310 <th>Table Name</th>
 311 <th>Number of Rows</th>
 312 </tr>
 313 <tr>
 314 <td>branches</td>
 315 <td>1</td>
 316 </tr>
 317 <tr>
 318 <td>tellers</td>
 319 <td>10</td>
 320 </tr>
 321 <tr>
 322 <td>accounts</td>
 323 <td>100000</td>
 324 </tr>
 325 <tr>
 326 <td>history</td>
 327 <td>0</td>
 328 </tr>
 329 </table>
 330
 331 <p>Let's use a simple shell script to check the above on all the
 332 nodes. The following script will display the number of rows in
 333 branches, tellers, accounts, and history tables on all the nodes (5432,
 334 5433, 5434).</p>
 335
 336 <pre>$ for port in 5432 5433 5434; do
 337 &gt;     echo $port
 338 &gt;     for table_name in branches tellers accounts history; do
 339 &gt;         echo $table_name
 340 &gt;         psql -c &quot;SELECT count(*) FROM $table_name&quot; -p $port bench_replication
 341 &gt;     done
 342 &gt; done
 343 </pre>
 344
 345 <h2>3. <a name="parallel">Your First Parallel Query</a></h2>
 346 <p>
 347 Data within the different range is stored in two or more data base nodes in a parallel Query. This is called a partitioning. Moreover, the same data as two or more data base nodes can be reproduced with partitioning.
 348 </p>
 349
 350 <p>To enable parallel query in pgpool-II, you must set up another
 351 database called "System Database" (we will denote it as SystemDB from
 352 this point).</p>
 353
 354 <p>SystemDB holds the user-defined rules to decide what data will be
 355 saved in which database node. Another use of SystemDB is to merge
 356 results sent back from the database nodes using dblink.</p>
 357
 358 <p>In this section, we will use three database nodes which we have set
 359 up in section "1. <a href="#start">Let's Begin!</a>", and takes you
 360 step by step to create a parallel query database system. We will use
 361 pgbench again to create sample data.</p>
 362
 363 <h3>3.1. <a name="parallel-config">Configuring Parallel Query</a></h3>
 364
 365 <p>To enable the parallel query function, set <code>parallel_mode</code> to true in <code>pgpool.conf</code> file.</p>
 366
 367 <pre>parallel_mode = true</pre>
 368
 369 <p>Setting <code>paralle_mode</code> to true does not start parallel
 370 query automatically. pgpool-II needs SystemDB and the rules
 371 to know how to distribute data to the database nodes.</p>
 372
 373 <p>Also, dblink used by SystemDB makes connections to
 374 pgpool-II. Therefore, <code>listen_addresses</code> needs to be
 375 configured so that pgpool-II accepts those connections.</p>
 376
 377 <pre>listen_addresses = '*'</pre>
 378
 379 <p>
 380 Attention: The replication is not done for the table that does the partitioning
 381 though a parallel Query and the replication can be made effective at the same time.
 382 The data base made by Moreover, because the composition of the data stored in
 383 the data base is different in a parallel Query and the replication, that is,
 384 "bench_replication" created in section "2. <a href="#replication">Your
 385 First Replication</a>" cannot be reused.</p>
 386
 387 <pre>
 388 replication_mode = true
 389 load_balance_mode = false
 390 </pre>
 391 <p>
 392 OR
 393 </p>
 394 <pre>
 395 replication_mode = false
 396 load_balance_mode = true
 397 </pre>
 398
 399 <p>In this section, we will set <code>parallel_mode and load_balance_mode</code> to true,
 400 <code>listen_addresses</code> to '*', <code>replication_mode</code>to false.</p>
 401
 402 <h3>3.2. <a name="system-db">Configuring SystemDB</a></h3>
 403
 404 <p>
 405 There is no difference in the data base the system data base and usually.
 406 However, the function of dblink is defined in the system data base, and
 407 the table that stores a distribution rule.. It is necessary to define dist_def. Moreover,
 408 the data base node One can make the system data base, and pgpool-II can be
 409 distributed in the load by connecting the cascade.
 410 </p>
 411
 412 <p>In this section, we will create SystemDB within the 5432 port
 413 node. The following list is the configuration parameters for
 414 SystemDB</p>
 415
 416 <pre>
 417 system_db_hostname = 'localhost'
 418 system_db_port = 5432
 419 system_db_dbname = 'pgpool'
 420 system_db_schema = 'pgpool_catalog'
 421 system_db_user = 'pgpool'
 422 system_db_password = ''
 423 </pre>
 424
 425 <p>Actually, the above are the default settings of
 426 <code>pgpool.conf</code>. Now, we must create a user called "pgpool",
 427 and a database called "pgpool" owned by user "pgpool".</p>
 428
 429 <pre>
 430 $ createuser -p 5432 pgpool
 431 $ createdb -p 5432 -O pgpool pgpool
 432 </pre>
 433
 434 <h4><p>3.2.1. Installing dblink</p></h4>
 435
 436 <p>Next, we must install dblink into "pgpool" database. dblink is one
 437 of the tools included in <code>contrib</code> directory in the
 438 PostgreSQL source code.</p>
 439
 440 <p>To install dblink to your system, execute the following commands.</p>
 441
 442 <pre>
 443 $ USE_PGXS=1 make -C contrib/dblink
 444 $ USE_PGXS=1 make -C contrib/dblink install
 445 </pre>
 446
 447 <p>After dblink has been installed into your system, we will define
 448 dblink functions in "pgpool" database. If PostgreSQL is installed in
 449 <code>/usr/local/pgsql</code>, <code>dblink.sql</code> (a file with
 450 function definitions) should have been installed in
 451 <code>/usr/local/pgsql/share/contrib</code>. Now, execute the
 452 following command to define dblink functions.</p>
 453
 454 <pre>$ psql -f /usr/local/pgsql/share/contrib/dblink.sql -p 5432 pgpool</pre>
 455
 456 <h4><p>3.2.2. Defining dist_def table</p></h4>
 457
 458 <p>Next, we will define a table called "dist_def" to hold the
 459 distribution rules. When pgpool-II was installed, a file called
 460 <code>system_db.sql</code> should have been installed in
 461 <code>/usr/local/share/system_db.sql</code> (note that in this
 462 tutorial, we are using the default installation directory,
 463 <code>/usr/local</code>). <code>systeym_db.sql</code> contains
 464 definitions to create special tables including "dist_def"
 465 table. Execute the following command to define "dist_def" table.</p>
 466
 467 <pre>$ psql -f /usr/local/share/system_db.sql -p 5432 -U pgpool pgpool</pre>
 468
 469 <p>In <code>system_db.sql</code>, tables including "dist_def" are
 470 installed in "pgpool_catalog" schema. If you have configured
 471 <code>system_db_schema</code> to use other schema, you need to edit
 472 <code>system_db.sql</code> accordingly.</p>
 473
 474 <p>The definition for "dist_def" is as shown here, and the table name
 475 cannot be changed.</p>
 476
 477 <pre>
 478 CREATE TABLE pgpool_catalog.dist_def (
 479     dbname text, -- database name
 480     schema_name text, -- schema name
 481     table_name text, -- table name
 482     col_name text NOT NULL CHECK (col_name = ANY (col_list)), -- distribution key-column
 483     col_list text[] NOT NULL, -- list of column names
 484     type_list text[] NOT NULL, -- list of column types
 485     dist_def_func text NOT NULL, -- distribution function name
 486     PRIMARY KEY (dbname, schema_name, table_name)
 487 );
 488 </pre>
 489
 490 <p>A tuple stored in "dist_def" can be classified into two types.</p>
 491
 492 <ul>
 493 <li>Distribution Rule (col_name, dist_def_func)</li>
 494 <li>Table's meta-information (dbname, schema_name, table_name, col_list, type_list)</li>
 495 </ul>
 496
 497 <p>A distribution rule decides how to distribute data to a
 498 particular node. Data will be distributed depending on the value of
 499 "col_name" column. "dist_def_func" is a function that takes the value
 500 of "col_name" as its argument, and returns an integer which
 501 points to the appropriate database node ID where the data should be
 502 stored.</p>
 503
 504 <p>A meta-information is used to rewrite queries. Parallel query
 505 must rewrite queries so that the results sent back from the backend
 506 nodes can be merged into one result.</p>
 507
 508 <h4><p>3.2.2. Defining replicate_def table</p></h4>
 509 <p>
 510 When the table that does the replication to one SQL sentence with the table
 511 registered in dist_def by uniting tables is specified, information on the
 512 table that does the replication (replication rule) is registered in the table
 513 named replicate_def beforehand.
 514 The replicate_def table has already been made when making it from the
 515 system_db.sql file when dist_def is defined.
 516 The replicate_def table is defined as follows.
 517 </p>
 518 <pre>
 519 CREATE TABLE pgpool_catalog.replicate_def (
 520     dbname text, -- database name
 521     schema_name text, -- schema name
 522     table_name text, -- table name
 523     col_list text[] NOT NULL, -- list of column names
 524     type_list text[] NOT NULL, -- list of column types
 525     PRIMARY KEY (dbname, schema_name, table_name)
 526 );
 527 </pre>
 528
 529
 530 <h3>3.3. <a name="dist-def">Defining Distribution Rules</a></h3>
 531
 532 <p>In this tutorial, we will define rules to distribute pgbench's
 533 sample data into three database nodes. The sample data will be created
 534 by "pgbench -i -s 3" (i.e. scale factor of 3). We will create a new
 535 database called "bench_parallel" for this section.</p>
 536
 537 <p>In pgpool-II's source code, you can find
 538 <code>dist_def_pgbench.sql</code> file in <code>sample</code>
 539 directoy. We will use this sample file here to create distribution
 540 rules for pgbench. Execute the following command in extracted
 541 pgpool-II source code directory.</p>
 542
 543 <pre>$ psql -f sample/dist_def_pgbench.sql -p 5432 pgpool</pre>
 544
 545 <p>Here is the explanation of <code>dist_def_pgbench.sql</code>.</p>
 546
 547 <p>Inside <code>dist_def_pgbench.sql</code>, we are inserting one
 548 row into "dist_def" table. There is a distribution
 549 function for accounts table.
 550 For key-column, aid is defined for accounts respectively (which is primary keys)</p>
 551
 552 <pre>
 553 INSERT INTO pgpool_catalog.dist_def VALUES (
 554     'bench_parallel',
 555     'public',
 556     'accounts',
 557     'aid',
 558     ARRAY['aid', 'bid', 'abalance', 'filler'],
 559     ARRAY['integer', 'integer', 'integer', 'character(84)'],
 560     'pgpool_catalog.dist_def_accounts'
 561 );
 562 </pre>
 563
 564 <p>Now, we must define the distribution function for accounts table. Note
 565 that you can use the same function from different tables. Also, you
 566 can define functions using languages other than SQL (e.g. PL/pgSQL,
 567 PL/Tcl, etc.).</p>
 568
 569 <p>
 570 The accounts table when data is initialized specifying 3 scale factor,
 571 The value of the aid is 1 to 300000.
 572 The function is defined so that data is evenly distributed to three data base nodes.
 573 </p>
 574 <p>
 575 SQL function will be defined as the return of the number of the data base node.
 576 </p>
 577
 578 </p>
 579 <pre>CREATE OR REPLACE FUNCTION pgpool_catalog.dist_def_branches(anyelement)
 580 RETURNS integer AS $$
 581     SELECT CASE WHEN $1 &gt; 0 AND $1 &lt;= 1 THEN 0
 582         WHEN $1 &gt; 1 AND $1 &lt;= 2 THEN 1
 583         ELSE 2
 584     END;
 585 $$ LANGUAGE sql;
 586 </pre>
 587
 588 <h3>3.4. <a name="replicate-def">Defining Replication Rules</a></h3>
 589 <p>
 590 The replication rule is the one that which table decides the replication whether to be done.
 591 </p>
 592
 593 <p>
 594 Here, it is made with pgbench With the branches table and  tellers  table are registered.
 595
 596 As a result, the accounts table and the inquiry that uses the branches table
 597 and the tellers table become possible.
 598 </p>
 599 <pre>
 600 INSERT INTO pgpool_catalog.replicate_def VALUES (
 601     'bench_parallel',
 602     'public',
 603     'branches',
 604     ARRAY['bid', 'bbalance', 'filler'],
 605     ARRAY['integer', 'integer', 'character(88)']
 606 );
 607
 608 INSERT INTO pgpool_catalog.replicate_def VALUES (
 609     'bench_parallel',
 610     'public',
 611     'tellers',
 612     ARRAY['tid', 'bid', 'tbalance', 'filler'],
 613     ARRAY['integer', 'integer', 'integer', 'character(84)']
 614 );
 615 </pre>
 616 <p>
 617 Replicate_def_pgbench.sql is prepared in sample directory.
 618
 619 In the directory that progresses the source code to define a replicate rule by using this as follows The psql command is executed.
 620 </p>
 621 <pre>
 622 $ psql -f sample/replicate_def_pgbench.sql -p 5432 pgpool
 623 </pre>
 624
 625 <h3>3.5. <a name="parallel-check">Checking Parallel Query</a></h3>
 626
 627 <p>To reflect the changes in <code>pgpool.conf</code>, pgpool-II must
 628 be restarted. Please refer to section "1.5 <a
 629 href="#start-shutdown">Starting/Stopping pgpool-II</a>".</p>
 630
 631 <p>After configuring <code>pgpool.conf</code> and restarting
 632 pgpool-II, let's try and see if parallel query is working OK.</p>
 633
 634 <p>First, we need to create a database to be distributed. We will name
 635 it "bench_parallel". This database needs to be created on all the
 636 nodes. Use <code>createdb</code> commands through pgpool-II, and the
 637 database will be created on all the nodes.</p>
 638
 639 <pre>$ createdb -p 9999 bench_parallel</pre>
 640
 641 <p>Then, we'll execute pgbench with <code>-i -s 3</code>
 642 options. <code>-i</code> option initializes the database with
 643 pre-defined tables and data. <code>-s</code> option specifies the
 644 scale factor for initialization.</p>
 645
 646 <pre>$ pgbench -i -s 3 -p 9999 bench_parallel</pre>
 647
 648 <p>The tables and data created are shown in "3.3. <a
 649 href="#dist-def">Defining Distribution Rules</a>".</p>
 650
 651 <p>One way to check if the data have been distributed correctly is to
 652 execute a SELECT query via pgpool-II and directly on the backend, and
 653 compare two results. If everything is configured right,
 654 "bench_parallel" should be distributed as follows.</p>
 655
 656 <table border="1" align ="center">
 657 <tr>
 658 <th >Table Name</th>
 659 <th >the number of lines</th>
 660 </tr>
 661 <tr>
 662                 <td>branches</td>
 663                 <td>3</td>
 664 </tr>
 665 <tr>
 666                 <td>tellers</td>
 667                 <td>30</td>
 668 </tr>
 669 <tr>
 670                 <td>accounts</td>
 671                 <td>300000</td>
 672 </tr>
 673 <tr>
 674                 <td>history</td>
 675                 <td>0</td>
 676 </tr>
 677 </table>
 678
 679 <p>Let's use a simple shell script to check the above on all the nodes
 680 and via pgpool-II. The following script will display the minimum and
 681 maximum values in accounts table using port 5432, 5433, 5434, and
 682 9999.</p>
 683
 684 <pre>$ for port in 5432 5433 5434 9999; do
 685 &gt;     echo $port
 686 &gt;     psql -c &quot;SELECT min(aid), max(aid) FROM accounts&quot; -p $port bench_parallel
 687 &gt; done
 688 </pre>
 689
 690 <div class="copyright">
 691 <hr>
 692 <copyright>
 693 Copyright &copy; 2003 &ndash; 2008 PgPool Global Development Group
 694 </copyright>
 695 </div>
 696 </body>
 697 </html>