Updated on 2024-06-07 GMT+08:00

Troubleshooting

Low Connection Performance

  • log_hostname is enabled, but DNS is incorrect.

    Connect to the database, and run show log_hostname to check whether log_hostname is enabled in the database.

    If it is enabled, the database kernel uses DNS to resolve the name of the host where the client is deployed. If the host where the database resides is configured with an incorrect or unreachable DNS server, setting up a database connection will take a long time. For details about this parameter, see the description of log_hostname in section "GUC Parameter Description > Error Reports and Logs > Log Content" in the Developer Guide.
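    One way to check for a DNS problem is to time a reverse lookup of the client IP address on the database host. The following is a minimal sketch under stated assumptions: the function name reverse_lookup_seconds is hypothetical, and getent goes through the system resolver, which may not be exactly the path the database kernel uses.

```shell
# Hypothetical helper: time a reverse lookup of a client IP address through
# the system resolver and print the elapsed time in whole seconds.
reverse_lookup_seconds() {
    local ip="$1" start end
    start=$(date +%s)
    # The lookup result is ignored; only the duration matters here.
    getent hosts "$ip" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $(( end - start ))
}

# Example: run on the database host with the client's IP address.
# reverse_lookup_seconds 10.10.10.2
```

    A result of several seconds suggests that the configured DNS server is slow or unreachable.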

  • The database kernel runs the initialization statement slowly.

    Problems in this scenario are difficult to locate. Try tracing the connection with the Linux strace command:

    strace gsql -U MyUserName -d gaussdb -h 127.0.0.1 -p 23508 -r -c '\q'
    Password for MyUserName:

    The database connection process will be printed on the screen. If the following statement takes a long time to run:

    sendto(3, "Q\0\0\0\25SELECT VERSION()\0", 22, MSG_NOSIGNAL, NULL, 0) = 22
    poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])

    It can be determined that the database executes the SELECT VERSION() statement slowly.
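    Rather than watching the screen, you can let strace record the time spent in each system call with its -T option, which appends the duration as <seconds> to every line. The following sketch filters a saved trace for slow calls. The function name slow_syscalls and the trace file path are hypothetical.

```shell
# Hypothetical helper: read strace -T output on stdin and print the system
# calls that took at least the given number of seconds (default: 1).
slow_syscalls() {
    awk -v limit="${1:-1}" '
        # strace -T appends the time spent in the call as "<seconds>".
        match($0, /<[0-9]+\.[0-9]+>$/) {
            secs = substr($0, RSTART + 1, RLENGTH - 2)
            if (secs + 0 >= limit + 0) print
        }'
}

# Example (same connection options as in the command above):
# strace -T -o /tmp/gsql.trace gsql -U MyUserName -d gaussdb -h 127.0.0.1 -p 23508 -r -c '\q'
# slow_syscalls 1 < /tmp/gsql.trace
```

    If the poll call that follows the SELECT VERSION() request appears in the output, the wait is on the database side.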

    After the database is connected, you can run the explain performance select version() statement to find out why the initialization statement runs slowly. For more information, see "SQL Optimization > Introduction to the SQL Execution Plan" in the Developer Guide.

    In an uncommon scenario, the disk of the machine where the DN resides is full or faulty, which affects queries and causes user authentication to fail. As a result, the connection process hangs. To solve this problem, free up space on the data disk of the DN.
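    To check whether the data disk is full, you can query the usage of the file system that holds the DN data directory. This is a minimal sketch. The function name disk_usage_percent and the example path are hypothetical placeholders.

```shell
# Hypothetical helper: print the usage percentage (without "%") of the file
# system that holds the given directory.
disk_usage_percent() {
    # -P forces the POSIX single-line output format, so column 5 is the capacity.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example: /path/to/dn/data is a placeholder for the DN data directory.
# disk_usage_percent /path/to/dn/data
```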

  • The TCP connection is set up slowly.

    Follow the same approach as for troubleshooting slow initialization statement execution and use strace. If either of the following calls runs slowly:

    connect(3, {sa_family=AF_FILE, path="/home/test/tmp/gaussdb_llt1/.s.PGSQL.61052"}, 110) = 0

    Or,

    connect(3, {sa_family=AF_INET, sin_port=htons(61052), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)

    This indicates that the physical connection between the client and the database is set up slowly. In this case, check whether the network is unstable or heavily loaded.

Problems in Setting Up Connections

  • gsql: could not connect to server: No route to host

    This problem generally occurs because an unreachable IP address or port number was specified. Check whether the values of the -h and -p parameters are correct.
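    Before retrying gsql, you can test whether the host and port accept TCP connections at all. The sketch below assumes bash (for its /dev/tcp redirection) and the coreutils timeout command are available. The function name port_reachable is hypothetical.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be opened
# within 3 seconds. Relies on bash's /dev/tcp redirection and coreutils timeout.
port_reachable() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example with the values passed to -h and -p:
# port_reachable 127.0.0.1 23508 && echo reachable || echo unreachable
```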

  • gsql: FATAL: Invalid username/password,login denied.

    This problem generally occurs because an incorrect username or password was entered. Contact the database administrator to check whether the username and password are correct.

  • gsql: FATAL: Forbid remote connection with trust method!

    For security purposes, remote login in trust mode is forbidden. In this case, you need to modify the connection authentication information in the pg_hba.conf file. For details, contact the administrator.

    Do not modify the configurations of database hosts in the pg_hba.conf file. Otherwise, the database may become faulty. It is recommended that service applications be deployed outside the database instead of inside the database.

  • On the DN, the connection succeeds if -h 127.0.0.1 is specified but fails if -h 127.0.0.1 is removed.

    Run the SQL statement show unix_socket_directory to check whether the Unix socket directory used by the DN is the same as that specified by the environment variable $PGHOST in the shell.

    If they are different, set $PGHOST to the directory specified by unix_socket_directory.

    For more information about unix_socket_directory, see "GUC Parameter Description > Connection and Authentication > Connection Settings" in the Developer Guide.
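    The comparison can be scripted as follows. This is a sketch only: the function name check_pghost is hypothetical, and the directory in the usage example is an illustrative value that you replace with the output of show unix_socket_directory.

```shell
# Hypothetical helper: compare $PGHOST with the socket directory reported by
# the database (passed as the first argument).
check_pghost() {
    if [ "${PGHOST:-}" = "$1" ]; then
        echo "match"
    else
        echo "mismatch: PGHOST='${PGHOST:-}', unix_socket_directory='$1'"
        return 1
    fi
}

# Example: pass the value returned by "show unix_socket_directory".
# check_pghost /home/test/tmp/gaussdb_llt1 || export PGHOST=/home/test/tmp/gaussdb_llt1
```

    The comparison is a plain string match, so a trailing slash on either side is reported as a mismatch.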

  • The "libpq.so" loaded mismatch the version of gsql, please check it.

    This problem occurs because the version of libpq.so used in the environment does not match that of gsql. Run the ldd gsql command to check the version of the loaded libpq.so, and then load the correct libpq.so by modifying the environment variable LD_LIBRARY_PATH.

  • gsql: symbol lookup error: xxx/gsql: undefined symbol: libpqVersionString

    This problem occurs because the version of libpq.so used in the environment does not match that of gsql (or the PostgreSQL libpq.so exists in the environment). Run the ldd gsql command to check the version of the loaded libpq.so, and then load the correct libpq.so by modifying the environment variable LD_LIBRARY_PATH.
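    To see at a glance which libpq.so the dynamic loader picks up, you can filter the ldd output. The following is a minimal sketch. The function name loaded_libpq is hypothetical.

```shell
# Hypothetical helper: read ldd output on stdin and print the path that
# libpq.so resolves to (or "not found" if the loader cannot locate it).
loaded_libpq() {
    awk '$1 ~ /^libpq\.so/ { if ($3 == "not") print "not found"; else print $3 }'
}

# Example:
# ldd "$(command -v gsql)" | loaded_libpq
```

    If the printed path is outside the lib directory shipped with gsql, adjust LD_LIBRARY_PATH so that the matching library is found first.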

  • gsql: connect to server failed: Connection timed out

    Is the server running on host "xx.xxx.xxx.xxx" and accepting TCP/IP connections on port xxxx?

    This problem is caused by network connection faults. Check the network connection between the client and the database server. If you cannot ping from the client to the database server, the network connection is abnormal. Contact network management personnel for troubleshooting.

    ping -c 4 10.10.10.1
    PING 10.10.10.1 (10.10.10.1) 56(84) bytes of data.
    From 10.10.10.1: icmp_seq=2 Destination Host Unreachable
    From 10.10.10.1 icmp_seq=2 Destination Host Unreachable
    From 10.10.10.1 icmp_seq=3 Destination Host Unreachable
    From 10.10.10.1 icmp_seq=4 Destination Host Unreachable
    --- 10.10.10.1 ping statistics ---
    4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 2999ms

  • gsql: FATAL: permission denied for database "gaussdb"

    DETAIL: User does not have CONNECT privilege.

    This problem occurs because the user does not have the permission to access the database. To solve this problem, perform the following steps:

    1. Connect to the database as the system administrator dbadmin.
      gsql -d gaussdb -U dbadmin -p 8000
    2. Grant the user the permission to access the database.
      GRANT CONNECT ON DATABASE gaussdb TO user1;

      Common misoperations, such as entering an incorrect database name, username, or password, can also cause a database connection failure. In this case, the client tool displays the corresponding error message.

      gsql -d gaussdb -p 8000
      gsql: FATAL:  database "gaussdb" does not exist
      
      gsql -d gaussdb -U user1 -p 8000
      Password for user user1:
      gsql: FATAL:  Invalid username/password, login denied.

  • gsql: FATAL: sorry, too many clients already, active/non-active: 197/3.

    This problem occurs because the number of system connections exceeds the allowed maximum. Contact the database administrator (DBA) to release unnecessary sessions.

    You can check the number of connections as described in Table 1.

    You can view the session status in the PG_STAT_ACTIVITY view. To release unnecessary sessions, use the pg_terminate_backend function.

    select datid,pid,state from pg_stat_activity;
     datid |       pid       | state  
    -------+-----------------+--------
     13205 | 139834762094352 | active
     13205 | 139834759993104 | idle
    (2 rows)

    The value of pid is the thread ID of the session. Terminate the session using its thread ID.

    SELECT PG_TERMINATE_BACKEND(139834759993104);

    If a command output similar to the following is displayed, the session is successfully terminated.

    PG_TERMINATE_BACKEND
    ----------------------
     t
    (1 row)

    Table 1 Viewing the number of session connections

    Description: View the maximum number of session connections allowed for a specific user.

    Command: Run the following command to view the upper limit of the number of session connections for user1. -1 indicates that no upper limit is set.

    SELECT ROLNAME,ROLCONNLIMIT FROM PG_ROLES WHERE ROLNAME='user1';
     rolname | rolconnlimit
    ---------+--------------
     user1   |           -1
    (1 row)

    Description: View the number of session connections that have been used by a specific user.

    Command: Run the following command to view the number of session connections used by user1. In this example, the result 1 is the number of connections in use.

    SELECT COUNT(*) FROM dv_sessions WHERE USERNAME='user1';
     count
    -------
         1
    (1 row)

    Description: View the maximum number of session connections allowed for a specific database.

    Command: Run the following command to view the upper limit of the number of session connections for the gaussdb database. -1 indicates that no upper limit is set.

    SELECT DATNAME,DATCONNLIMIT FROM PG_DATABASE WHERE DATNAME='gaussdb';
     datname | datconnlimit
    ---------+--------------
     gaussdb |           -1
    (1 row)

    Description: View the number of session connections that have been used by a specific database.

    Command: Run the following command to view the number of session connections used by the gaussdb database. In this example, the result 1 is the number of connections in use.

    SELECT COUNT(*) FROM PG_STAT_ACTIVITY WHERE DATNAME='gaussdb';
     count
    -------
         1
    (1 row)

    Description: View the number of session connections that have been used by all users.

    Command: Run the following command to view the number of session connections used by all users.

    SELECT COUNT(*) FROM dv_sessions;
     count
    -------
        10
    (1 row)

  • gsql: wait xxx.xxx.xxx.xxx:xxxx timeout expired

    When gsql initiates a connection request to the database, it applies a 5-minute timeout period. If the database cannot authenticate the client request and client identity within this period, gsql exits the connection process for the current session and reports the above error.

    Generally, this problem is caused by the incorrect host and port (that is, the xxx part in the error information) specified by the -h and -p parameters. As a result, the communication fails. Occasionally, this problem is caused by network faults. To resolve this problem, check whether the host name and port number of the database are correct.

  • gsql: could not receive data from server: Connection reset by peer.

    Check whether DN logs contain information similar to "FATAL: cipher file "/data/coordinator/server.key.cipher" has group or world access". This error is usually caused by incorrect changes to the permissions of data directories or some key files. To correct the permissions, refer to the permissions of the same files on other normal instances.
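    To find out which files are affected, you can test whether a file grants any permission to group or others. The sketch below assumes GNU stat. The function name has_group_or_world_access is hypothetical, and the sketch only reports the condition without changing any permissions.

```shell
# Hypothetical helper: succeed if the file grants any permission to group or
# others, which is the condition reported in the FATAL message above.
has_group_or_world_access() {
    local mode
    mode=$(stat -c '%a' "$1") || return 2   # octal mode, for example 600 (GNU stat)
    # The last two octal digits are the group and other permissions.
    # A mode of 000 is printed as a single "0".
    case "$mode" in
        0|*00) return 1 ;;
        *)     return 0 ;;
    esac
}

# Example with the file named in the log message:
# has_group_or_world_access /data/coordinator/server.key.cipher && echo "permissions too open"
```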

  • gsql: FATAL: GSS authentication method is not allowed because XXXX user password is not disabled.

    In pg_hba.conf of the target DN, the authentication mode is set to gss for authenticating the IP address of the current client. However, this authentication algorithm cannot authenticate clients. Change the authentication algorithm to sha256 and try again. For details, contact the administrator.

    • Do not modify the configurations of database hosts in the pg_hba.conf file. Otherwise, the database may become faulty.
    • It is recommended that service applications be deployed outside the database instead of inside the database.

Other Faults

  • There is a core dump or abnormal exit due to the bus error.

    Generally, this problem is caused by a change to a shared dynamic library (.so file in Linux) that is loaded while the process is running. Alternatively, if the process binary file changes, the machine code that the OS loads for execution or the entry for loading a dependent library changes accordingly. In this case, the OS kills the process for protection purposes, generating a core dump file.

    To resolve this problem, try again. In addition, do not run service programs in a database during O&M operations such as an upgrade. This prevents the problem from being triggered by file replacement during the upgrade.

    A possible stack of the core dump file contains dl_main and the functions it calls. dl_main is used by the OS to initialize a process and load the shared dynamic library. If the process has been initialized but the shared dynamic library has not been loaded, the process cannot be considered completely started.