PostgreSQL 7.1 中未实现功能的补偿

John Pagakis，Toolsmythe 软件服务公司总裁
Todd Gauthier，DevelopOnline 公司数据库管理员

2001 年 8 月 21 日

摘要

在大多数领域，PostgreSQL 7.1 与任何其他 SQL 数据库一样健壮。不幸的是，有一些未实现的功能，使得在事后更改架构成为一项艰巨的任务。此版本中特别缺少以下功能：

支持从表中删除列
将列从 NULL 更改为 NOT NULL（反之亦然）
添加主键
更改列类型

其他功能，例如在事后定义 ON DELETE/UPDATE 行为，需要了解 PostgreSQL 内部管理表。

此外，还有一些功能已经实现，但并不像它们需要的那样健壮。例如，ALTER TABLE 语法用于添加列，但该语法不能用于将列指定为主键，或将其指定为 NOT NULL。为了实现这一点，您再次被迫进入 PostgreSQL 的内部。

本文提出了一种替代方法，可以替代黑客攻击管理表或编辑 pg_dump 的输出，并确定了一些您无疑会遇到的陷阱。提议的替代方法使用熟悉的 SQL 语法，并且不需要了解 PostgreSQL 的内部结构。最棒的是，它对您有用，并且即使您的老板站在您的肩膀上，也能让您处于舒适区！

致谢

本文中提供的信息是在亚利桑那州坦佩阳光明媚的 DevelopOnline 公司的三个月中发现的。我在公司经费的支持下进行了研究，并捐赠了撰写本文的时间，但正是 DevelopOnline 对开源社区的承诺，才让您读到了本文。他们不必允许我与您分享这些信息，他们选择这样做。请务必访问 www.developonline.com 表示感谢！

我还要感谢 Joel Burton 撰写的出色论文“引用完整性教程和黑客攻击引用完整性表”（Burton，J. 2001a）——关于该主题的权威著作。

最后，我要感谢 DevelopOnline 公司的数据库管理员 Todd Gauthier，他在一个我们俩都很紧张的时期帮助我进行调试和测试‹笑›。

本文中的信息尽我所能准确，但与所有涉及计算机的事情一样，您的里程可能会有所不同。我和 PostgreSQL 社区的任何成员都不能对使用本文中介绍的知识或工具造成的任何损失或损害负责。买方自负！

-- John Pagakis，Toolsmythe 软件服务公司总裁

问题领域

如果您正在阅读本文，您可能已经发现，在 PostgreSQL 7.1 中实现架构后，很难进行更改。通常，您首先会在某个迫在眉睫的紧迫期限中了解到这一点。本文适合您。

首先，我敦促您，不要将本文作为逃避学习 PostgreSQL 内部结构的借口。如果您有时间，没有任何东西可以替代知识。但是，如果您明天的架构修改已经到期，并且三个部门主管都在您耳边吹气，现在不是尝试黑客攻击引用完整性的来龙去脉的时候。

解决方案

因此，让我们直接进入主题。假设以下表格

	CREATE TABLE
		user_table( user_id char( 8 ) NOT NULL PRIMARY KEY,
			first_name text,
			last_name text,
			social_security_number text NULL,
			unneeded_column text );

例如，如果您想删除不必要的 unneeded_column 列并将 social_security_number 更改为 NOT NULL UNIQUE，您需要执行以下步骤

保留数据
删除旧的表 table
使用新的定义重新创建表
从保存的数据中重新填充新表

完成此操作的 SQL 如附件 A 所示。

附件 A

	DROP TABLE
		temp_user_table;

	/* Preserve the data */
	SELECT 
		user_id,
		first_name,
		last_name,
		social_security_number,
		unneeded_column
	INTO
		temp_user_table
	FROM
		user_table;


	/* Drop the old table */
	DROP TABLE
		user_table;




	/* Recreate the table with your new definition */
	CREATE TABLE
		user_table( user_id char( 8 )NOT NULL UNIQUE PRIMARY KEY,
				first_name text,
				last_name text,
				social_security_number text NOT NULL UNIQUE );


	/* Re-populate the new table from the data you saved */
	INSERT INTO
		user_table( user_id,
			    first_name,
			    last_name,
			    social_security_number )
	SELECT
		user_id,
		first_name,
		last_name,
		social_security_number 
	FROM
		temp_user_table;

请注意，此脚本开头 temp_user_table 的 DROP 操作如果该表不存在，可能会失败。没关系，重要的是该表必须不存在才能使下一条语句正常工作。SELECT INTO FROM 语法实际上会为您创建一个表并用源表中的数据填充它。请注意，此表仅用作存放数据的地方——它不会复制约束、默认值或任何其他内容。您不应该尝试将此表用于任何其他目的。此脚本的其余部分应该相当简单。

这里有一些重要的注意事项。首先，切勿对具有主键的表应用附件 B 中所示的技术。

附件 B

	/* Dangerous!!!  Avoid!! */
	CREATE TABLE
   		temp_user_table( user_id char( 8 ) NOT NULL UNIQUE PRIMARY KEY,
				 first_name text,
				 last_name text,
				 social_security_number text );


	INSERT INTO
		temp_user_table( user_id,
				 first_name,
				 last_name,
				 social_security_number )
	SELECT
		user_id,
		first_name,
		last_name,
		social_security_number 
	FROM
		user_table;


	DROP TABLE
		user_table;



	ALTER TABLE
		temp_user_table
	RENAME TO
		user_table;

通过在具有主键的表上使用附件 B 中的技术，您将在下次必须删除和重新创建该表时发现，您在主键上有一个错误命名的索引。在我们的示例中，您第二次执行此操作时，CREATE TABLE 将失败，因为索引 temp_user_table_pkey 已经存在。

PostgreSQL 为表的索引生成一个名为‹表名›_pkey 的名称。当您执行 CREATE TABLE temp_user_table 时，在幕后创建了一个名为 temp_user_table_pkey 的索引。当您执行 ALTER TABLE RENAME 将 temp_user_table 重命名为 user_table 时，表被重命名了——但索引没有被重命名。ALTER TABLE RENAME 足够聪明，可以重命名表，但不足够聪明，无法重命名主键索引。

几个月后您可能会发现这一点，并且您会花费大量时间试图找出问题所在。为了避免这个问题，始终使用附件 A 中概述的技术，绝不使用附件 B 中的技术。

请注意，附件 B 中的技术是在“有效地升级 PostgreSQL 表中的列”中提出的（Burton，J. 2001b）。Burton 先生的论文中的示例在没有上述复杂性的情况下运行，但仅仅是因为它是对没有主键的表执行的。

关于下一个注意事项——如果您有任何表格保存对您刚刚删除的表格的引用，您会发现这些表格之间不再强制执行引用完整性。

考虑以下架构

鉴于此架构，equipment 表将保存对 user_table 表的外部键引用。删除和重新创建 user_table 表的行为将破坏这两个表之间的约束。这意味着您可以删除 equipment 表中存在引用的 user_table 条目；您可以添加包含对 user_table 表中不存在的用户的引用的 equipment（这两个操作都不会在引用完整性起作用的情况下被允许）。

要解决此问题，您需要使用一些 SQL 语句将外部键约束添加回来。

附件 D

	ALTER TABLE	
		equipment
	ADD CONSTRAINT
		fk__equipment__user_table
	FOREIGN KEY
		( fk_user )
	REFERENCES
		user_table( user_id )
	MATCH FULL;

附件 D 假设 equipment 对 user_table 的外部键引用的列名名为 fk_user。

请记住，添加回约束的行为会导致该约束在添加过程中被强制执行。换句话说，如果子表（在我们的示例中为 equipment）中存在孤立引用，则 ALTER TABLE ADD CONSTRAINT 将因引用完整性违规而失败。然后，您必须找到并删除这些孤立条目，然后再尝试添加约束。

确保所有这些操作都正确非常重要，因为如果您错过了外部键引用，这将是一个相当乏味的问题。并不是说会有一个大红色标志弹出，上面写着“警告！引用完整性已破坏！”。您可能会在很长一段时间后才发现您存在引用完整性问题，并花费更多时间将其追溯到表格的删除和重新创建。

为此，我们开发了一个 Perl 脚本——genConstraints.pl（见附录 A）——来自动创建将外部键约束添加回来的 SQL 语句。此脚本将为引用您指定的所有表的表生成 ALTER TABLE ADD CONSTRAINT 语句。要使用此脚本，请键入以下内容

	./genConstraints.pl dbName username password [table1 [table2 ?]]

dbName、username 和 password 参数是必需的（如果没有密码，请使用一对引号——作为占位符）。如果您没有指定任何表名，则将编写所有外部键约束。您可以提供一个表列表——以空格分隔——在这种情况下，只有对这些表的引用将被编写。请查看 Perl 脚本中的常规注释以获取更多信息。

在 Joel Burton 的论文“引用完整性教程和黑客攻击引用完整性表”（Burton，J. 2001a）中定义的系统目录视图也很值得实施。这将为您提供一种相当简单的查看有哪些约束到位的方法。

在将列从 NULL 更改为 NOT NULL 时，必须注意另一个注意事项。请记住，一旦您重新创建了表，您更改为 NOT NULL 的列中的任何 NULL 数据都会导致表重新加载失败，并显示一条消息，指示在不允许 NULL 的字段中发现了 NULL。如果可能，在切换到 NULL 时，也定义一个默认值。如果不可行，请记住您必须修改 temp 表中的这些 NULL 值，然后再重新填充真实表。

总之，为了解决 PostgreSQL 7.1 中的一些未实现功能，以下过程效果很好

创建 SQL 以保留引用您将修改的表（s）的约束。genConstraints.pl 实用程序可以在这里帮助您。如果您使用它，请记住将输出重定向到文件。
使用附件 A 中概述的技术创建 SQL 脚本。
执行该脚本。请务必在 psql 中使用“s”参数，以便在执行下一步之前验证每个步骤的成功。此外，请注意，当您使用“s”执行保存在文件中的脚本时，必须使用“f”来指示该文件。如果您将文件重定向到 psql，则会忽略 -s。
重新应用引用您刚刚操作的表的约束。执行在步骤 1 中创建的 SQL 脚本。您必须验证添加外部键约束的成功——确保添加没有生成外部键违规。

附录 A

GenConstraints.pl

#!/usr/bin/perl -w
################################################################
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 2 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU Library General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program; if not, write to the Free Software
## Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307,
## USA.
################################################################
## genConstraints.pl
##
## Input:
##        databaseName - name of the database you wish to process
##        user - valid user name in postgres
##        password - user's password (or "" for nothing)
##        tableName(s) - optional.  One or more tables in the
##                       database.  Omitted, it will process all
##                       tables.
##
## Output:
##        SQL script to regenerate foreign key constraints on the
##        tables in the tableName list (or, if no tableName list
##        all foreign key constraints).  Output goes to standard
##        output - redirect to file if you want to save it.
##
## Description:
##        If you drop and rebuild a table that has foreign key
##        references from other tables, those references will be
##        lost.  This script allows you to create the SQL
##        statements to add those foreign key constraints back
##        once you have recreated and repopulated the parent
##        table.
##
## Requirements:
##        This program must be run before you drop the table.
##        Once you drop the table, the constraints are lost.
##
##        When you try to add the constraints back, the rules of
##        referential integrity will be enforced.  This means that
##        there must be an entry in the parent table (the one you
##        dropped and re-added) for every reference in the child
##        table.
############
## Modification History
############
## 2000.08.16 - Genisis
## By John Pagakis, Toolsmythe Software Services, Inc. for
## DevelopOnline, Inc.
################################################################

use DBI;

################################################################
##  main
##
##  Input:
##        See program input in general comments above
##
##  Output:
##        See program output in general comments above
##
##  Description:
##        00) Check for valid command line args.  If wrong number of
##            args, show usage.
##
##        01) Attempt to connect to the database.  If successful,
##            select all non-postgres-internal triggers from
##            pg_trigger.
##
##        02) Initialization of control variables.
##
##        03) For each trigger ....
##                00) Parse out the trigger arguments.
##                    Postgres 7.x stores constraints as triggers.
##                    The relationship between the parent and child
##                    table is held by postgres in the tgargs column
##                    of the pg_trigger table.  The targs column is
##                    a byte array, the arguments are separated by
##                    the string "\000".  For a foreign key constraint
##                    there are six arguments:
##                        0) Trigger name (or  if no trigger
##                           name was defined.
##                        1) Referencing table name.
##                        2) Referenced table name.
##                        3) Constraint type (or "UNSPECIFIED" if none).
##                        4) Referencing column name.
##                        5) Referenced column name.
##                    We are going to use this information to generate
##                    the ALTER TABLE ADD CONSTRAINT syntax needed to
##                    put the constraint back after it has been dropped.
##                    This step parses out the arguments and holds them
##                    in the @splitArgs array.
##
##                01) Now that we have the arguments parsed out, we need
##                    to see if this is a constraint we need to generate
##                    SQL for.  The answer is yes if no table list was
##                    included in the command line args, or if the
##                    referenced table name (@splitArgs[2]) is in the
##                    command line table list.  If either of these
##                    conditions is met ....
##
##                        00) Have we hit a control break?
##                            There will be duplicate triggers in the
##                            trigger table.  This is because constraints
##                            must be checked on inserts, mods, and
##                            deletes.  Each one of these is a different
##                            trigger, but the information (arguments ) is the same,
##                            so we don't want to process what for our
##                            purposes are duplicates.  On a control
##                            break ....
##
##                                00) Save off the new control info.
##
##                                01) Generate the first part of the
##                                    SQL statement.
##
##                                02) If the trigger is unnamed, generate
##                                    a name using the following format:
##                                        fk__referencingTableName__referencedTableName
##                                    If a name exists, use it.
##
##                                03) Generate the next part of the SQL
##                                    statement.
##
##                                04) If a constraint type is specified, use it,
##                                    otherwise do nothing.
##
##                                05) Finsh off the SQL statement.
##
##        04) Finish the result set.
##
##        05) Close the database connection
##
##
##  Side effects:
##        Any unnamed constraint will be given a name matching
##                fk__referencingTableName__referencedTableName
################################################################


######
# 00 #
if ( @ARGV < 3 )
{
        print "USAGE ....\n\tgenConstraints dbName user password [table1 [table2 ...]]\n";
        exit;
}

######
# 01 #
my $dbase = shift;
my $user = shift;
my $password = shift;
$host = "localhost";

######
# 02 #
my $driver = "dbi:Pg:dbname=" . $dbase . ";host=" . $host;

my $dbh = DBI->connect( $driver, $user, $password ) or die "\nError ($DBI::err):$DBI::errstr\n";

my $targResultSet = $dbh->prepare( "SELECT tgargs FROM pg_trigger WHERE tgname NOT LIKE 'pg_%';" );

$targResultSet->execute;


######
# 03 #
$saveReferedTable = "x#";
$saveReferingTable = "x#";
$saveReferedKey = "x#";
$saveReferingKey = "x#";

$numberOfArgs = @ARGV;


######
# 04 #
while ( @targs = $targResultSet->fetchrow_array )
  {

        #########
        # 04.00 #
        my @arg = $targs[0];

        my $argAsStr = pack "a*", $arg[0];

        my @splitArgs = split /\\000/, $argAsStr;


        #########
        # 04.01 #
        if ( $numberOfArgs == 0
        ||   isInTableList( $splitArgs[2], @ARGV ) )
        {
                ############
                # 04.01.00 #
                if ( $saveReferedTable ne $splitArgs[2]
                ||   $saveReferingTable ne $splitArgs[1]
                ||   $saveReferedKey ne $splitArgs[4]
                ||   $saveReferingKey ne $splitArgs[3] )
                {
                        ###############
                        # 04.01.00.00 #
                        $saveReferingTable = $splitArgs[1];
                        $saveReferedTable = $splitArgs[2];
                        $saveReferingKey = $splitArgs[3];
                        $saveReferedKey = $splitArgs[4];

                        ###############
                        # 04.01.00.01 #
                        print "ALTER TABLE\n\t$splitArgs[1]\nADD CONSTRAINT";

                        ###############
                        # 04.01.00.02 #
                        if ( $splitArgs[0] eq "" )
                        {
                                print "\n\tfk__";
                                print $splitArgs[1];
                                print "__";
                                print $splitArgs[2];
                        }
                        else
                       {
                                print "\n\t$splitArgs[0]";
                        }

                        ###############
                        # 04.01.00.03 #
                        print "\nFOREIGN KEY\n\t( $splitArgs[4] )\nREFERENCES\n\t$splitArgs[2]( $splitArgs[5] )";

                        ###############
                        # 04.01.00.04 #
                        if ( $splitArgs[3] ne "UNSPECIFIED" )
                        {
                                print "\nMATCH $splitArgs[3]";
                        }

                        ###############
                        # 04.01.00.05 #
                        print ";\n\n\n";
                }
        }

  }


######
# 05 #
$targResultSet->finish;


######
# 06 #
$dbh->disconnect;


######################## end of main ############################

sub isInTableList
################################################################
##  isInTableList
##
##  Input:
##        $candidate - This is the table name that we are testing
##                     to see if there is a matching entry in the
##                     table array.
##
##        @tableArray - This is a list of tables (from @ARGV) that
##                      we must generate SQL statements for.
##
##  Output:
##        True ( 1 ) if $candidate is in @tableArray.
##        False ( 0 ) if $candidate is not in @tableArray.
##
##  Description:
##        See output.
##
##  Side effects:
##        None
################################################################
{
        my $candidate = $_[0];
        my @tableArray = $_[1];

        foreach $tableName ( @tableArray )
        {
                if ( $tableName eq $candidate )
                {
                        return 1;
                }
        }

        return 0;
}

参考文献

关于 Postgres 系统目录 (2001)。从万维网检索，2001 年 8 月 22 日：https://postgresql.ac.cn/docs/7.1/static/pg-system-catalogs.html

Burton，J. (2001a)。引用完整性教程和黑客攻击引用完整性表。从万维网检索，2001 年 8 月 22 日：引用完整性教程和黑客攻击引用完整性表

Burton，J. (2001b)。有效地升级 PostgreSQL 表中的列。从万维网检索，2001 年 8 月 22 日。

PostgreSQL 全球开发组，The (2001)。PostgreSQL 7.1 开发人员指南，第 3 章：系统目录（第 9-24 页）。从万维网检索，2001 年 8 月 22 日：https://postgresql.ac.cn/docs/7.1/static/catalogs.html

John Pagakis，Toolsmythe 软件服务公司总裁
Todd Gauthier，DBA，DevelopOnline，Inc。
更新于 2003 年 9 月 28 日
Marc Stosberg（marc at summersault dot com）发来以下邮件

用户补丁

我们有一个 Postgres 7.1 安装，我们/仍然/在运行。:)

相关脚本中存在一些错误，我已经在下面包含了补丁。如果您能发布它以避免像我这样的其他可怜的灵魂遇到麻烦，我们将不胜感激。

解决的两个问题是

不支持远程主机。我添加了对 PGHOST 环境变量的基本支持。

解析来自 Postgres 的 bytea 结果已损坏。我假设这是因为更新版本的 DBD::Pg 的工作方式不同。我更新了代码以使用现代 DBD::Pg，1.20。我不知道这对旧版本意味着什么。

- 马克

#####


--- GenConstraints.pl   Wed Nov  7 06:37:53 2001
+++ GenConstraints.pl.new       Sun Sep 28 16:01:53 2003
@@ -159,7 +159,7 @@
 my $dbase = shift( @ARGV );
 my $user = shift( @ARGV );
 my $password = shift( @ARGV );
-$host = "localhost";
+$host = $ENV ||  "localhost";
 
 ######
 # 02 #
@@ -193,8 +193,7 @@
 
         my $argAsStr = pack "a*", $arg[0];
 
-        my @splitArgs = split /\\000/, $argAsStr;
-
+        my @splitArgs = split /\0/, $argAsStr;
 
         #########
         # 04.01 #
@@ -300,4 +299,4 @@
         }
 
         return 0;
-}
\ No newline at end of file
+}