Informix won't start, logical log files full

Good afternoon. This is happening on the university's Production instance, a few days before course enrollment (as usual, at exactly the wrong moment).

I'm writing about a problem that came up when we restarted Informix because it had hung under heavy concurrency.
We could not bring it back up: oninit -v hangs at ‘Checking for temporary tables to drop…’
Here are the logs:

onstat -m

16:19:46 Physical Recovery Started at Page (1:136363).
16:20:14 Dynamically allocated new virtual shared memory segment (size 8192KB)
16:20:56 Dynamically allocated new virtual shared memory segment (size 8192KB)
16:21:56 Dynamically allocated new virtual shared memory segment (size 8192KB)
16:22:31 Dynamically allocated new virtual shared memory segment (size 15168KB)
16:22:31 Physical Recovery Complete: 404128 Pages Examined, 403530 Pages Restored.
16:22:31 Logical Recovery Started.
16:22:31 10 recovery worker threads will be started.
16:22:31 Dynamically allocated new virtual shared memory segment (size 8192KB)
16:22:35 Logical Recovery has reached the transaction cleanup phase.
16:22:35 Logical Recovery Complete.
774 Committed, 2 Rolled Back, 0 Open, 0 Bad Locks

16:22:36 Dataskip is now OFF for all dbspaces
16:22:38 Logical Log Files are Full -- Backup is Needed

onstat -l

IBM Informix Dynamic Server Version 10.00.UC2E -- Quiescent (CKPT REQ) -- Up 00:04:53 -- 67252 Kbytes
Blocked:CKPT

Physical Logging
Buffer bufused bufsize numpages numwrits pages/io
P-2 0 16 0 0 0.00
phybegin physize phypos phyused %used
1:263 1024000 567170 431070 42.10

Logical Logging
Buffer bufused bufsize numrecs numpages numwrits recs/pages pages/io
L-1 1 16 0 0 0 0.0 0.0
Subsystem numrecs Log Space used

Buffer Waiting
Buffer ioproc flags
L-1 444f9018 0x1 0

address number flags uniqid begin size used %used
445330b0 2 U-B---- 2348 1:1074263 50000 50000 100.00
445330f8 3 U-B---- 2349 1:1124263 50000 50000 100.00
44533140 4 U-B---- 2350 1:1174263 50000 50000 100.00
44533188 5 U-B---- 2351 1:1224263 50000 50000 100.00
445331d0 6 U-B-C-L 2352 1:1274263 50000 50000 100.00
44533218 7 U---C-- 2353 1:1324263 50000 4461 8.92
44533260 8 U-B---- 2340 1:1374263 50000 50000 100.00
445332a8 9 U-B---- 2341 1:1424263 50000 50000 100.00
445332f0 1 U-B---- 2342 3:992903 18750 18750 100.00
44533338 12 U-B---- 2343 1:1041394 17187 17187 100.00
44533380 13 U-B---- 2344 1:1805163 16796 16796 100.00
445333c8 14 U-B---- 2345 1:1854048 16699 16699 100.00
44533410 10 U-B---- 2346 1:1474263 50000 50000 100.00
44533458 11 U-B---- 2347 1:1581049 25000 25000 100.00
14 active, 14 total

I'm attaching the onconfig.

I'm trying to back up the logs with ontape -a, but it hangs, or at least I can't tell whether it finishes. I'm letting it run right now, but it shows no sign of progress.
I understand this is related to the checkpoint when the server tries to start again, but how could I fix it?

Regards


onconfig.ol_desarrollo.txt (11.2 KB)

See if it lets you run this command:

onmode -c unblock

so that it runs the checkpoint and unblocks the server.

It hangs without returning anything; I check the logs while it runs and there is nothing.
Before that I run oninit -v to start the server, but it only gets to quiescent mode.

Yesterday I left an ontape -a running to see whether it would get through, but nothing.

Is there anything I can look at? I tried changing the LTAPEDEV path and running ontape -a, but nothing.
We tried adding logs with onparams, but it won't do it.

We don't know what else to try :S

Can you attach the onconfig file?
I think Informix version 10 adds logical logs automatically when it needs them, doesn't it? (DYNAMIC_LOGS parameter)

Try the following:

  1. Set LTAPEDEV to the null device.
    On Windows, set the value: NUL
    On Linux, set the value: /dev/null

  2. Run the manual backup:
    ontape -a

  3. Force the current logical log (flag C) to switch to the next one, which is already backed up:
    Run the command onmode -l


Looking at the “onstat -l” output, I see that every logical log except number 7 is backed up. The odd thing is that both 6 and 7 have the C flag (current logical log), when only one should be the logical log currently in use.
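
To make that reading of the flags easier to check, here is a small Python sketch that decodes the flags column of the “onstat -l” output posted earlier. The (log number, flags) pairs are copied by hand from that output, and log 7 is entered as U---C-- on the assumption that the dashes were mangled when the output was pasted. The helper names are made up for this illustration.

```python
# Flag positions in the `onstat -l` log-file table (7 characters):
#   1: A = added, F = free, U = used
#   3: B = backed up
#   5: C = current log
#   7: L = holds the last checkpoint
# Rows are (log number, flags) copied from the output in the first post.
# Log 7 is written as U---C--, assuming its dashes were mangled in the paste.
LOGS = [
    (2, "U-B----"), (3, "U-B----"), (4, "U-B----"), (5, "U-B----"),
    (6, "U-B-C-L"), (7, "U---C--"), (8, "U-B----"), (9, "U-B----"),
    (1, "U-B----"), (12, "U-B----"), (13, "U-B----"), (14, "U-B----"),
    (10, "U-B----"), (11, "U-B----"),
]


def logs_with(flag, logs=LOGS):
    """Return the numbers of the logs whose flag string contains `flag`."""
    return [number for number, flags in logs if flag in flags]


def logs_without(flag, logs=LOGS):
    """Return the numbers of the logs whose flag string lacks `flag`."""
    return [number for number, flags in logs if flag not in flags]


not_backed_up = logs_without("B")
current = logs_with("C")
last_checkpoint = logs_with("L")

print("not backed up:", not_backed_up)
print("marked current:", current)
print("last checkpoint in:", last_checkpoint)
```

With the rows as posted, only log 7 lacks the B flag, logs 6 and 7 both carry C, and the last checkpoint sits in log 6.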

I'm not sure it really adds them automatically, because it has never happened!

It returns:
Logbackup failed - Log backup to device ‘/dev/null’ not allowed

I had tried that, and I think the error is because it requires a file. Instead of /dev/null I pointed it at a file in one of Informix's temporary folders and it accepted that, but ontape -a never finished and returns nothing!

As I understand it, neither onmode nor ontape works because the server is waiting for a checkpoint, right? Is there no way to skip these last checkpoints that block it when the instance starts?

onconfig


#**************************************************************************
#
#  Licensed Material - Property Of IBM
#
#  "Restricted Materials of IBM"
#
#  IBM Informix Dynamic Server
#  (c) Copyright IBM Corporation 1996, 2004 All rights reserved.
#
#  Title:	onconfig.std
#  Description: IBM Informix Dynamic Server Configuration Parameters
#
#**************************************************************************

# Root Dbspace Configuration

ROOTNAME	rootdbs		# Root dbspace name
#ROOTPATH	/dev/online_root # Path for device containing root dbspace
ROOTPATH	/opt/IBM/informix/ol_desarrollo/server/online_root	# Path for device containing root dbspace
MSGPATH /opt/IBM/informix/ol_desarrollo/server/online.log
ROOTOFFSET	0		# Offset of root dbspace into device (Kbytes)
ROOTSIZE	4096000		# Size of root dbspace (Kbytes)

# Disk Mirroring Configuration Parameters

MIRROR		0		# Mirroring flag (Yes = 1, No = 0)
MIRRORPATH			# Path for device containing mirrored root
MIRROROFFSET	0		# Offset into mirrored device (Kbytes)

# Physical Log Configuration

PHYSDBS		rootdbs 	# Location (dbspace) of physical log
#PHYSDBS		phydbs 	# Location (dbspace) of physical log
PHYSFILE        2048000          # Physical log file size (Kbytes)

# Logical Log Configuration

LOGFILES        14              # Number of logical log files
LOGSIZE		10000		# Logical log size (Kbytes)

# Tablespace Tablespace Configuration in Root Dbspace

TBLTBLFIRST	0		# First extent size (Kbytes) (0 = default)
TBLTBLNEXT	0		# Next extent size (Kbytes) (0 = default)

# Diagnostics 

#MSGPATH		/opt/IBM/informix/online.log # System message log file path
MSGPATH /opt/IBM/informix/ol_desarrollo/server/online.log
CONSOLE		/dev/console	# System console message path

# To automatically backup logical logs, edit alarmprogram.sh and set
# BACKUPLOGS=Y
ALARMPROGRAM /opt/IBM/informix/etc/alarmprogram.sh
#ALARMPROGRAM    /opt/IBM/informix/etc/alarmprogram.sh # Alarm program path
ALRM_ALL_EVENTS 0               # Triggers ALARMPROGRAM for any event occur
TBLSPACE_STATS  1		# Maintain tblspace statistics

# System Archive Tape Device

#TAPEDEV		/dev/tapedev	# Tape device path	
#TAPEDEV		/home/dba/backup/backup.raw	# Tape device path
TAPEDEV		/home/informix/backupInformix/backup.bkp
TAPEBLK		32		# Tape block size (Kbytes)
#TAPESIZE	10240		# Maximum amount of data to put on tape (Kbytes)
#TAPESIZE	4096000		# Maximum amount of data to put on tape (Kbytes)
TAPESIZE	8192000		# Increased so the backup would not be cut off, 30-05-14
# Log Archive Tape Device

#LTAPEDEV	/dev/tapedev	# Log tape device path
#LTAPEDEV	/opt/IBM/informix/ol_desarrollo/backup/backup.log	# Log tape device path
LTAPEDEV	/home/informix/backupInformix/logs.bkp
#LTAPEDEV	/opt/IBM/informix/tmp
LTAPEBLK	32		# Log tape block size (Kbytes)
#LTAPESIZE	10240		# Max amount of data to put on log tape (Kbytes)
LTAPESIZE	4096000		# Max amount of data to put on log tape (Kbytes)

# Optical

STAGEBLOB                       # Informix Dynamic Server staging area 

# System Configuration

SERVERNUM	1		# Unique id corresponding to a OnLine instance
DBSERVERNAME		ol_desarrollo	# Name of default database server
DBSERVERALIASES		ol_guarani_web,ol_ypf_hec	# List of alternate dbservernames
NETTYPE				# Configure poll thread(s) for nettype
DEADLOCK_TIMEOUT	60	# Max time to wait of lock in distributed env.
RESIDENT	0		# Forced residency flag (Yes = 1, No = 0)

MULTIPROCESSOR  1               # 0 for single-processor, 1 for multi-processor
NUMCPUVPS	2		# Number of user (cpu) vps
SINGLE_CPU_VP   0               # If non-zero, limit number of cpu vps to one

NOAGE		0		# Process aging
AFF_SPROC	0		# Affinity start processor
AFF_NPROCS	0		# Affinity number of processors

# Shared Memory Parameters

LOCKS		2000		# Maximum number of locks
NUMAIOVPS	  		# Number of IO vps
PHYSBUFF	32		# Physical log buffer size (Kbytes)
LOGBUFF		32		# Logical log buffer size (Kbytes)
CLEANERS        1               # Number of buffer cleaner processes
SHMBASE         0x44000000L	# Shared memory base address
SHMVIRTSIZE	8000	        # initial virtual shared memory segment size
SHMADD          8192            # Size of new shared memory segments (Kbytes)
SHMTOTAL        0               # Total shared memory (Kbytes). 0=>unlimited
CKPTINTVL       300             # Check point interval (in sec)
TXTIMEOUT	300		# Transaction timeout (in sec)
STACKSIZE	32		# Stack size (Kbytes)

# Dynamic Logging
# DYNAMIC_LOGS:
#    2 : server automatically add a new logical log when necessary. (ON)
#    1 : notify DBA to add new logical logs when necessary. (ON)
#    0 : cannot add logical log on the fly. (OFF)
#
# When dynamic logging is on, we can have higher values for LTXHWM/LTXEHWM,
# because the server can add new logical logs during long transaction rollback.
# However, to limit the number of new logical logs being added, LTXHWM/LTXEHWM
# can be set to smaller values.
#
# If dynamic logging is off, LTXHWM/LTXEHWM need to be set to smaller values
# to avoid long transaction rollback hanging the server due to lack of logical
# log space, i.e. 50/60 or lower.

DYNAMIC_LOGS    2
LTXHWM          70
LTXEHWM         80

# System Page Size
# BUFFSIZE - OnLine no longer supports this configuration parameter.
#            To determine the page size used by OnLine on your platform
#            see the last line of output from the command, 'onstat -b'.


# Recovery Variables
# OFF_RECVRY_THREADS:
# Number of parallel worker threads during fast recovery or an offline restore.
# ON_RECVRY_THREADS:
# Number of parallel worker threads during an online restore.

OFF_RECVRY_THREADS	10	# Default number of offline worker threads
ON_RECVRY_THREADS	1	# Default number of online worker threads

# Backup/Restore variables
BAR_ACT_LOG   /usr/informix/bar_act.log  # ON-Bar Log file - not in /tmp please
BAR_DEBUG_LOG /usr/informix/bar_dbug.log # ON-Bar Debug Log - not in /tmp please
BAR_MAX_BACKUP	0
BAR_RETRY	1
BAR_NB_XPORT_COUNT 20
BAR_XFER_BUF_SIZE 31
RESTARTABLE_RESTORE	ON
BAR_PROGRESS_FREQ	0

# Informix Storage Manager variables
ISM_DATA_POOL   ISMData
ISM_LOG_POOL    ISMLogs

# Read Ahead Variables
RA_PAGES	  	        # Number of pages to attempt to read ahead
RA_THRESHOLD	  	        # Number of pages left before next group

# DBSPACETEMP:
# OnLine equivalent of DBTEMP for SE. This is the list of dbspaces
# that the OnLine SQL Engine will use to create temp tables etc.
# If specified it must be a colon separated list of dbspaces that exist
# when the OnLine system is brought online.  If not specified, or if
# all dbspaces specified are invalid, various ad hoc queries will create
# temporary files in /tmp instead.

DBSPACETEMP			# Default temp dbspaces

# DUMP*:
# The following parameters control the type of diagnostics information which
# is preserved when an unanticipated error condition (assertion failure) occurs 
# during OnLine operations.  
# For DUMPSHMEM, DUMPGCORE and DUMPCORE 1 means Yes, 0 means No.

DUMPDIR		/tmp		# Preserve diagnostics in this directory
DUMPSHMEM	1		# Dump a copy of shared memory
DUMPGCORE	0		# Dump a core image using 'gcore'
DUMPCORE	0		# Dump a core image (Warning:this aborts OnLine)
DUMPCNT		1		# Number of shared memory or gcore dumps for 
				# a single user's session

FILLFACTOR	90		# Fill factor for building indexes

# method for OnLine to use when determining current time
USEOSTIME	0	# 0: use internal time(fast), 1: get time from OS(slow)

# Parallel Database Queries (pdq)
MAX_PDQPRIORITY	100    # Maximum allowed pdqpriority
DS_MAX_QUERIES         # Maximum number of decision support queries 
DS_TOTAL_MEMORY        # Decision support memory (Kbytes) 
DS_MAX_SCANS 1048576   # Maximum number of decision support scans	
DS_NONPDQ_QUERY_MEM 128		# Non PDQ query memory (Kbytes)
DATASKIP	       # List of dbspaces to skip

# OPTCOMPIND
# 0 => Nested loop joins will be preferred (where 
#      possible) over sortmerge joins and hash joins. 
# 1 => If the transaction isolation mode is not  
#      "repeatable read", optimizer behaves as in (2) 
#      below.  Otherwise it behaves as in (0) above. 
# 2 => Use costs regardless of the transaction isolation
#      mode.  Nested loop joins are not necessarily  
#      preferred.  Optimizer bases its decision purely  
#      on costs. 
OPTCOMPIND      2      # To hint the optimizer

DIRECTIVES	1    # Optimizer DIRECTIVES ON (1/Default) or OFF (0) 

ONDBSPACEDOWN   2      # Dbspace down option: 0 = CONTINUE, 1 = ABORT, 2 = WAIT
OPCACHEMAX      0      # Maximum optical cache size (Kbytes)

# HETERO_COMMIT (Gateway participation in distributed transactions)
# 1 => Heterogeneous Commit is enabled
# 0 (or any other value) => Heterogeneous Commit is disabled
HETERO_COMMIT   0

#
# IFX_EXTEND_ROLE
#
# 0 =>	Disable use of EXTEND role to control who can register
#		external routines. This is the default behaviour.
# 1 =>	Enable use of EXTEND role to control who can register
#		external routines.  
#
IFX_EXTEND_ROLE	0	# To control the usage of EXTEND role.

SBSPACENAME            # Default smartblob space name - this is where blobs
		       # go if no sbspace is specified when the smartblob is 
		       # created. It is also used by some datablades as
		       # the location to put their smartblobs. 
SYSSBSPACENAME         # Default smartblob space for use by the Informix 
		       # Server. This is used primarily for Informix Server
		       # system statistics collection.

BLOCKTIMEOUT    3600   # Default timeout for system block
SYSALARMPROGRAM    /usr/informix/etc/evidence.sh    # System Alarm program path

# Optimization goal: -1 = ALL_ROWS(Default), 0 = FIRST_ROWS
OPT_GOAL	-1

ALLOW_NEWLINE   0       # embedded newlines(Yes = 1, No = 0 or anything but 1)


#Create Index Online Shared Memory usage limitation 
ONLIDX_MAXMEM		5120		# Per pool per index (Kbytes)

#Timeout for client connection request
LISTEN_TIMEOUT          10              # Timeout (in Seconds)

#Following are the deprecated configuration parameters, instead of these
#use BUFFERPOOL configuration parameter
#BUFFERS, LRUS, LRU_MIN_DIRTY, LRU_MAX_DIRTY
#
# The following are default settings for enabling Java in the database.
# Replace all occurrences of /usr/informix with the value of $INFORMIXDIR.

#VPCLASS        jvp,num=1       # Number of JVPs to start with

JVPJAVAHOME     /usr/informix/extend/krakatoa/jre       # JRE installation root directory
JVPHOME         /usr/informix/extend/krakatoa # Krakatoa installation directory

JVPPROPFILE     /usr/informix/extend/krakatoa/.jvpprops # JVP property file
JVPLOGFILE      /usr/informix/jvp.log   # JVP log file.

JDKVERSION    1.3              # JDK version supported by this server

# The path to the JRE libraries relative to JVPJAVAHOME
JVPJAVALIB    /bin

# The JRE libraries to use for the Java VM

JVPJAVAVM     jsig:hpi:jvm:java:net:zip:jpeg

# use JVPARGS to change Java VM configuration
#To display jni call
#JVPARGS	-verbose:jni

# Classpath to use upon Java VM start-up (use _g version for debugging)

#JVPCLASSPATH  /usr/informix/extend/krakatoa/krakatoa_g.jar:/usr/informix/extend/krakatoa/jdbc_g.jar
JVPCLASSPATH  /usr/informix/extend/krakatoa/krakatoa.jar:/usr/informix/extend/krakatoa/jdbc.jar

# The following parameters are related to the buffer pool
BUFFERPOOL	default,buffers=1000,lrus=8,lru_min_dirty=50.000000,lru_max_dirty=60.000000
BUFFERPOOL	size=2K,buffers=1000,lrus=8,lru_min_dirty=50.000000,lru_max_dirty=60.000000

You can force the checkpoint and unblock the engine with the command:
onmode -c unblock

See: http://www-01.ibm.com/support/knowledgecenter/SSGU8G_11.50.0/com.ibm.adref.doc/ids_adr_0411.htm

Yes, I'm letting it run, but there is no sign that it is doing anything; for now I'm just leaving it.

There are some values in the onconfig you should change, for example:

Increase the number of locks:
LOCKS 40000 # Maximum number of locks

Increase the virtual memory, since from what I saw in the first message the engine kept adding memory segments:
SHMVIRTSIZE 131072 # initial virtual shared memory segment size → 128 MB
SHMADD 32768 # Size of new shared memory segments (Kbytes) → 32 MB
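
For reference, a rough Python sketch that totals the segments the engine added during recovery. The lines are copied from the onstat -m output in the first post; the regular expression is just one way to pull out the sizes and is not an Informix tool.

```python
import re

# Lines copied from the `onstat -m` output in the first post.
MESSAGE_LOG = """\
16:20:14 Dynamically allocated new virtual shared memory segment (size 8192KB)
16:20:56 Dynamically allocated new virtual shared memory segment (size 8192KB)
16:21:56 Dynamically allocated new virtual shared memory segment (size 8192KB)
16:22:31 Dynamically allocated new virtual shared memory segment (size 15168KB)
16:22:31 Dynamically allocated new virtual shared memory segment (size 8192KB)
"""

# Capture the size in Kbytes from each "Dynamically allocated" message.
SEGMENT = re.compile(r"new virtual shared memory segment \(size (\d+)KB\)")

sizes_kb = [int(match.group(1)) for match in SEGMENT.finditer(MESSAGE_LOG)]
segments = len(sizes_kb)
added_kb = sum(sizes_kb)

print(f"{segments} segments added, {added_kb} KB in total")
```

That is five extra segments in a little over two minutes, on top of the initial SHMVIRTSIZE of 8000, which is why a larger initial segment seems worth trying.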

The physical log size is out of proportion to the total size of the logical logs. The physical log should be around 25 to 30% of the total size of the logical logs. In this case you have:

PHYSFILE 2048000 # Physical log file size (Kbytes)

# Logical Log Configuration

LOGFILES 14 # Number of logical log files
LOGSIZE 10000 # Logical log size (Kbytes)

The physical log is much larger than all the logical logs combined. You should add logical logs when you can and shrink the physical log. Note that by default the logical logs are created in the rootdbs, and in this case the physical log takes up half of the root dbspace.
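
The arithmetic behind that, as a short Python sketch. The first part uses only the onconfig values quoted here; the 25 to 30% range is the rule of thumb from this reply, not an official figure. The second part uses the sizes in pages from the onstat -l output in the first post, which reports larger logical logs than LOGSIZE in the onconfig, so it is probably the better picture of the running instance.

```python
# Values copied from the attached onconfig, all in Kbytes.
PHYSFILE = 2048000   # physical log size
LOGFILES = 14        # number of logical log files
LOGSIZE = 10000      # size of each logical log
ROOTSIZE = 4096000   # root dbspace size

logical_total_kb = LOGFILES * LOGSIZE
onconfig_ratio = PHYSFILE / logical_total_kb
root_share = PHYSFILE / ROOTSIZE

# Range suggested in this thread: physical log at 25 to 30% of the logical logs.
target_low_kb = logical_total_kb * 25 // 100
target_high_kb = logical_total_kb * 30 // 100

# Sizes in pages as reported by `onstat -l` in the first post
# (physize, then the size column of the 14 logical logs).
PHYSIZE_PAGES = 1024000
LOG_PAGES = [50000] * 9 + [18750, 17187, 16796, 16699, 25000]
onstat_ratio = PHYSIZE_PAGES / sum(LOG_PAGES)

print(f"onconfig: logical logs total {logical_total_kb} KB, "
      f"physical log is {onconfig_ratio:.1f}x that")
print(f"physical log uses {root_share:.0%} of the root dbspace")
print(f"25-30% target from onconfig values: {target_low_kb}-{target_high_kb} KB")
print(f"onstat -l: logical logs total {sum(LOG_PAGES)} pages, "
      f"physical log is {onstat_ratio:.2f}x that")
```

Either way the physical log is larger than the logical logs put together, which is the opposite of the proportion suggested above.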

I think the following values are set very high; consider lowering them to the default (50 and 60):
LTXHWM 70
LTXEHWM 80

Yes, you do have it configured to add logical logs dynamically when it needs them: DYNAMIC_LOGS 2

I don't see a temporary dbspace configured; if you have one, add it to this parameter:
DBSPACETEMP <temp_dbspace_name>

What does onstat -m show if you run it again?
Also run onstat with no arguments.

Were you able to solve the problem, or is the engine still in Quiescent mode?

Javier, how are you doing with this? Still having the same problem?

Hi, sorry for the late reply; Ismael has been helping us too.

To be honest, we couldn't solve it. We tried to force the checkpoint and back up the logs, but nothing worked, so we ended up restoring to an earlier point!

We changed the configuration as you suggested, Alejandro.

Thanks, everyone :)
Regards