Saturday, October 24, 2009

Links revision...

Some links were changed and, in particular, several Informix-related blogs were added. It's really nice to see new Informix-related sites and authors publishing info. I'm especially glad to add several Brazilian sites, which is really important for the Portuguese-speaking audience. The links added are:
  • Advanced DataTools
    This was a big miss. Lester Knutsen is probably the most visible member of this IBM partner, which recently organized another edition of the Fastest Informix DBA contest.
  • IMartins Tudo Sobre Informix
    A Brazilian site with articles and information related to IBM Informix
  • AskPontes
    A blog from my Brazilian colleague Vagner Pontes
  • InformixBr
    Another blog from another Brazilian colleague Pedro HT Rodrigues
  • Informix DBA
    A blog from the well known Andrew Ford, a member of the International Informix Users Group board of directors
  • Informix Tips
    A blog from Jeff Filippi from Integrated Data Consulting
  • Informix Administrators Blog
    A blog from Markus Holzbauer, a German colleague and a long time Informix user (started on v4... same as me)
So, as you can see, I've been away from a lot of new activity. I hope I've now closed the gap. If you have any info about an Informix-related site that I should list, please don't hesitate to leave a comment or send me an email.

Friday, October 2, 2009

IDS Roadshow em Lisboa

This post covers an IDS Roadshow in Lisbon and was originally written in Portuguese.
IBM will hold the IDS Roadshow 2009 in Lisbon on October 13, 14 and 15.
The event will take place at IBM's Lisbon offices (IBM Forum Lisboa) in Parque das Nações.
The event agenda includes:

  1. Latest features in version 11.50
  2. IDS Data Warehouse (technical hands-on)
  3. Storage Compression (technical hands-on)
  4. Open Admin Tool
  5. Enterprise Replication
  6. SolidDB/Change Data Capture

Participation is free, but seats are limited. To register, send an email to, mentioning the event.

I will be giving the presentation.

Friday, September 25, 2009

IIUG 2009 vNext survey: Speak for yourself

Although the IIUG site is quiet about this, an email note was sent to IIUG members advertising this year's vNext survey.

The survey polls users' opinions about new features we would like to see in future Informix versions. The results will be (and I quote) "passed on to the powers that be", meaning the R&D team will take them into consideration while planning future versions.

Some of the features proposed are very interesting, but if you have some ideas not mentioned in the survey you can add your own.

I won't waste time listing the proposed features here. Just take a look at:

and take the opportunity to fill it in. It's fairly quick, and your comments are important.
IBM has been improving Informix, and this is a continuous process where user input is precious.

Thursday, August 27, 2009

Where do you stop as a DBA? Thoughts about problem solving....

This is one of my few posts that are not about Informix features. I will write about a real life situation and use it for developing some thoughts about problem solving ability (or lack of it).

I will not use real names from either customers or suppliers except of course Informix and my employer. The scenario is this:

Several IBM Informix Dynamic Servers will be upgraded from version 10 on platform P1 to version 11.50 on platform P2. P2 is from a third party supplier, let's call it S1.
The customer used the ontape utility for backups, but wanted to take advantage of OnBar. So tests were made using OnBar, IDS 11.50 and a storage manager from another supplier, S2, on platform P2. After some issues the tests ran fine and some development instances were migrated. Suddenly we started having issues: OnBar reported a BAR_TIMEOUT in the BAR_ACT_LOG file. This message tells us that OnBar executed a BSA (X/Open standard backup API) call which failed to return (successfully or unsuccessfully) within a specific amount of time. OnBar will not give up, and will keep logging messages in the BAR_ACT_LOG file. The Informix error message description is this:

"-43296 Warning: BAR_TIMEOUT Storage Manager Progress may be stalled. The current command has exceeded the BAR_TIMEOUT value. This might indicate a normal transfer of a large data set or it can mean the storage manager needs attention. Check to make sure the storage manager is still running and that data is still being transferred to/from the storage manager. Refer to the storage manager message log (or equivalent) for more information. If the cause of the error is still unclear, contact your storage manager Technical Support. If you are using IBM Informix Storage Manager, contact Technical Support at"

The message looks explicit and clear, but the storage manager operators decided it must be a problem with OnBar. Let's pause here... this is the first point I'd like to raise. What would you do? Trust the Informix error message, which explicitly says to contact the storage manager's technical support, or do your own investigation?
Well, as an onsite consultant I decided to do some investigation. The first things I did were:
  1. Running truss against onbar_d processes
  2. Running dbx/gdb/adb against the onbar_d processes
  3. Running lsof against the onbar_d processes
These simple steps revealed that:

  1. truss showed OnBar was "stuck" on send(). This is an OS function used to send data through a socket
  2. gdb showed that onbar_d had called the BSASendData() function which in turn had called send(). BSASendData() is part of the dynamic library we have to configure in $ONCONFIG (BSALIBPATH)
  3. lsof showed that the file descriptor (fd) used in the send() call was a socket opened to the storage manager "media server". This is a host which has direct access to the storage manager allocated tape drives
So, it was clear (only to me, as we shall see...) that there was a problem in the storage manager while it tried to send data from the database server to the media server. The next logical step would be to analyse the status of the process receiving the data on the media server. Since I had no access to that server I asked for this to be done, but without success... In the following days I received all kinds of evasive replies and useless requests, both from the storage manager team and from the storage manager supplier's (S2) technical support. Some of these were:

  • You cannot run a whole system backup with more than one concurrent process
    True for versions earlier than 11.10, but incorrect for version 11.50 (and this suggestion ignored all the debug info above)
  • We were trying to use more onbar_d processes than what the storage manager was configured to support, leading to waits on the OnBar side
    This could make sense, but when this happens the onbar_d processes wait for a free "slot" at the beginning of a dbspace backup... they don't get "stuck" in the middle (usually it happened around 90% of a dbspace backup)
  • They suggested that we change the BAR_TIMEOUT parameter (it exists only in XPS, not in IDS...)
  • They even suggested that I consult the OnBar manual ;)
Finally the tech support guy from S2 came on site, and I managed to have him collect the same info (truss, gdb, lsof) from the media server processes. At that moment I understood why they had ignored the info above: he didn't know what these utilities were or how to use them...
But I got a big surprise when I saw the outputs: the process receiving the data was "stuck" on recv(). This is an OS function used to get data from a socket. So, in short, we had an ESTABLISHED socket between two servers, one of them blocked on send() and the other on recv()... but no data was being transferred.

This made me think about network or TCP layer issues. But usually these are the last places to look. Note that I don't consider network problems a rarity, but they usually translate into broken connections, not hangs...
This observation was a turning point. I immediately assumed that the platform provider (S1) would have to be involved. The situation was even more complex because the media server was a Linux machine. So the problem could be in either platform...

During the next two months (!) I went through a real nightmare, because the customer insisted that there could be something wrong with OnBar. I spent several hours in useless conference calls (customer and S2) and meetings. They insisted on running all kinds of odd tests like:
  • Running onbar with fewer and with more processes
  • Reconfiguring their whole backup infrastructure
  • Reconfiguring their storage manager parameters
At some point they asked me the following:
  • Do you have customers running with this setup?
    They seemed desperate to prove OnBar was doing something wrong...
    They didn't understand that we don't keep any kind of tracking of customer environments. We do keep an incredibly useful internal knowledge base where customer problems are recorded, but as expected I was tired of making exhaustive searches on that KB without finding any similar case...
  • They proposed to use another storage manager (interestingly from vendor S1)
    The point was more or less this: if it didn't work with that one either, surely the problem was not the storage manager (which again would imply the problem was IDS...)
  • They insisted in having a tech support case opened on IBM
    This seemed reasonable, but I had already explained OnBar's behavior completely and in detail
At this point I was rather tired of the whole issue. Interestingly the tests with the second storage manager revealed two things:
  1. The new environment hit another problem
    This problem was already reported in IBM's knowledge base and it was a problem in that storage manager's BSA library. I had to wait one week to find out whether the version they were using included the necessary fix... it turned out it didn't ;)
  2. After solving that first issue, the backups aborted with some sort of storage manager timeout. The real difference was in the storage managers' behavior: the first one waited indefinitely, while this one enforces a timeout (much like OnBar does, but since it's the one calling the low-level OS function it aborts the process instead of just warning the administrator like OnBar does)
After about two months of outstanding time waste, I understood the only way to solve this in a timely manner was to reproduce the issue without any Informix or storage manager layer. Let me point out that some people on the customer team had made exhaustive tests sending data across servers, using scp for example. All these tests were successful, and they concluded it couldn't be anything in the TCP layer. I warned them (and showed them, using the tools above) that scp didn't use the same functions, block size etc. If the tests had failed it would have been meaningful, but a working scp session didn't prove that there was nothing wrong with the TCP stack.

After a few more days and a lot of GB sent, I was able to create a test case with just C programming: a simple client/server program that kept sending data through a socket. I made sure I set the socket options and block size to match what the storage manager's BSA library used.
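The original test case was written in C; purely as an illustration of the idea (block size, buffer option and byte counts below are made-up values, not the real BSA settings), a sender pushing fixed-size blocks through a TCP socket to a receiver can be sketched like this:

```python
# Sketch of a socket throughput test case: a sender pushes fixed-size
# blocks through a TCP loopback connection while a receiver drains them.
# Block size and SO_SNDBUF values are illustrative assumptions only.
import socket
import threading

BLOCK_SIZE = 64 * 1024        # assumed transfer block size
NUM_BLOCKS = 256              # 16 MB total

def receiver(listener, total_out):
    conn, _ = listener.accept()
    total = 0
    while True:
        data = conn.recv(BLOCK_SIZE)   # the call the media server was blocked in
        if not data:                   # empty read: sender closed the socket
            break
        total += len(data)
    conn.close()
    total_out.append(total)

listener = socket.socket()
listener.bind(("127.0.0.1", 0))        # let the OS pick a free port
listener.listen(1)
total_out = []
t = threading.Thread(target=receiver, args=(listener, total_out))
t.start()

sender = socket.socket()
# setting an explicit send buffer, as a BSA library might
sender.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 256 * 1024)
sender.connect(listener.getsockname())
block = b"x" * BLOCK_SIZE
for _ in range(NUM_BLOCKS):
    sender.sendall(block)              # wraps send(); the buggy OS stalled here
sender.close()
t.join()
print("received", total_out[0], "bytes")  # → received 16777216 bytes
```

On a healthy TCP stack this finishes in moments; on the faulty platform the equivalent C sender would simply stop making progress inside send().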

After this, they finally accepted that something was wrong with the vendor's OS functions (specifically send()). The next day we had a conference call with all the parties involved, and in less than a week we had a (newly coded) fix for the OS. After installing it we never saw another similar hang on the send() function.

The purpose of this post is to show how important it is to know how to debug a problem. Tools like truss, dbx/gdb/adb and lsof are invaluable for this. I personally think anybody working on IT problem solving, independently of their role (DBA, programmer, system administrator, network administrator etc.), should have a minimal knowledge of how they work and what kind of information they can collect. There are obviously other tools equally useful in some scenarios, like tcpdump, netstat etc. This whole case revealed that many people working in this area don't have a clue about how these tools work and the info they can collect.
Everybody involved had a theory about the problem, but none of those theories were based on facts or real observations. They were mere speculation about what could be happening, and all of them tended to pass the problem to a different team...
Some facts about the situation:
  • lots of emails, including polished accusations, were exchanged
  • over two months I spent a considerable amount of time on this, trying to lead people in the right direction (mostly without success)
  • Vendor S2 had a very large team involved. They even sent a consultant from another country to this customer's site (when he arrived we were about to receive the fix, so apart from politics there was not much this guy could do about the problem)
  • The problem could have been solved in around two weeks (one for debugging and another for the vendor investigation and code fixing)
  • No one admitted that they didn't understand the output of the tools above, and no one (even after the conclusion) took the opportunity to ask me to explain how to use these kinds of tools
    I don't claim to know everything (in fact my knowledge of these tools is somewhat superficial), but I only learned how to use this kind of stuff because at some point in my past I came across references to them and took the time to experiment and read about them. In other words, we should use problems as opportunities to gather more knowledge and learn new things.
I keep receiving reports about problems without any useful information. My favorite ones are:

  • I tried it and it doesn't work!
    Surely I believe it doesn't work... but usually when something "doesn't work" it raises an error. Only rarely do people include the error code/description. More often they include possible causes (usually unrelated to the error) instead of the error code or message
  • The process was hang! It was probably in some kind of loop! So I killed it
    Well... processes don't "hang". They wait on something, or they effectively sit in some loop. The way to see which is by using the tools... and the tools don't work on a process that no longer exists...
  • I've found a bug!
    Sure... A bug is a malfunction: something that works differently from what's documented or expected. Most of the time people find a problem; after some analysis (often involving the supplier's technical support) it may be mapped to a bug. People tend to expect something, and if it doesn't happen they've "found a bug". Usually they don't bother to read the documentation and try to understand the reasons for the unexpected behavior.
    Obviously people do hit bugs. Most of the cases I open with IBM end up as bugs. But these are just a very small portion of the problems that customers report.
In short, I feel that in general, people's ability to study a problem in the IT world is very limited. Usually people spend more time trying alternatives than collecting and understanding problem data. Error messages and codes are frequently ignored. All this translates into a big waste of time, and obviously money... and it directly impacts the quality and availability of IT systems.

Monday, August 17, 2009

Current time? Really?!

"Let S be an <SQL procedure statement> that is not generally contained in a <triggered action>. All <datetime
value function>s that are contained in <value expression>s that are generally contained, without an intervening
<routine invocation> whose subject routines do not include an SQL function, either in S without
an intervening <SQL procedure statement> or in an <SQL procedure statement> contained in the <triggered
action> of a trigger activated as a consequence of executing S, are effectively evaluated simultaneously.
The time of evaluation of a <datetime value function> during the execution of S and its activated triggers
is implementation-dependent."

The above is a citation from the SQL standard (1999/2003/2008, part 2: SQL/Foundation). It's a little nightmare to read, follow and understand, but it's the cause of a behavior generally considered an annoyance in Informix: inside a stored procedure, all CURRENT references will return the same value.

Let's check the Informix SQL Syntax guide to see how the above ANSI rule is explained in the fine manual:

"SQL is not a procedural language, and CURRENT might not execute in the lexical order of its position in a statement.
You should not use CURRENT to mark the start, the end, nor a specific point in the execution of an SQL statement.

"If you use the CURRENT operator more than once in a single statement,
identical values might be returned by each instance of CURRENT. You cannot rely
on CURRENT to return distinct values each time it executes.

The returned value is based on the system clock and is fixed when the SQL
statement that specifies CURRENT starts execution. For example, any call to
CURRENT from inside the SPL function that an EXECUTE FUNCTION (or
EXECUTE PROCEDURE) statement invokes returns the value of the system clock
when the SPL function starts."

This looks like a friendlier explanation, but it doesn't point out the reason why this is implemented as it is, and that reason is mainly ANSI compliance. In practice, what I usually hear from customers is that this is inconvenient. Many times they are trying to use CURRENT YEAR TO SECOND/FRACTION to find out the time spent in a procedure or parts of it, and it simply doesn't work, because all the values they get are the same. Is there a solution? Yes, if you only need precision to the second:

DBINFO('utc_to_datetime', sh_curtime)

The sh_curtime field of sysmaster:sysshmvals contains the current unix time (the number of seconds since January 1, 1970). The DBINFO function with the 'utc_to_datetime' option converts it into a DATETIME YEAR TO SECOND value.
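Conceptually, the conversion is just unix time to a calendar datetime. A rough sketch of what 'utc_to_datetime' does (UTC is used here for a deterministic result; the real server applies its own timezone handling, so this is an approximation, not the server's implementation):

```python
# Convert a unix epoch value (seconds since January 1, 1970 UTC) into a
# DATETIME YEAR TO SECOND style string, as DBINFO('utc_to_datetime', n)
# does conceptually.
from datetime import datetime, timezone

def utc_to_datetime(epoch_seconds):
    """Return a 'YYYY-MM-DD hh:mm:ss' string for a unix time value (UTC)."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc) \
                   .strftime("%Y-%m-%d %H:%M:%S")

print(utc_to_datetime(0))       # → 1970-01-01 00:00:00
print(utc_to_datetime(86400))   # → 1970-01-02 00:00:00
```

Since sh_curtime advances every second, reading it through DBINFO gives a genuinely current timestamp even in the middle of a procedure, which is exactly what CURRENT cannot do there.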

Sunday, August 16, 2009

IDS 11.50.xC5

It's been too long and I have a lot of catching up to do... Let's start with the latest version of IBM Informix Dynamic Server: 11.50.xC5, the latest fixpack in the 11.50 family, available since the end of July 2009.

Keeping up with what can now be considered usual, this fixpack brings important features. Let's list them:

  • High availability
    • Configuring RS Secondary Server Latency for Disaster Recovery
      This allows the system administrator to configure a time delay between the primary server and an RSS (remote standalone secondary) server. This means that the secondary server will lag behind the primary server by the specified amount of time.
      Associated to this we have the ability to stop and restart the roll forward of the logical logs on the secondary server.
      Note that for integrity purposes the primary will still send the logical logs as soon as possible. It's up to the secondary to hold the logs without applying them. This means that if your primary server disappears you're still able to recover up to the most current image of the primary server.
      The configuration of this feature involves three new parameters: DELAY_APPLY, STOP_APPLY and LOG_STAGING_DIR.
      DELAY_APPLY can be used to configure a delay between primary and secondary.
      STOP_APPLY can be used to specify a specific time to stop applying logs
      LOG_STAGING_DIR is a secondary locally writable directory where the server will keep the logs before applying them.
      All this can be changed dynamically with the onmode -wf/-wm commands
  • Administration
    • Forcing the Database Server to Shut Down
      Two new utilities were made available to handle situations where things go wrong. If, for example, your IDS server is hung, or it was not shut down properly, you may have trouble stopping and/or restarting it (it can leave shared memory segments behind, for example).
      The new utilities are onclean and onshutdown. onclean can be used to force the IDS server down (it can kill the engine processes) and it will try to clean up all the shared memory segments that were left behind. Without options, its purpose is to remove the shared memory segments of a server that was already stopped. With "-k" it will kill the server and then make sure the shared memory is freed.
      onshutdown attempts to stop the server with the normal "onmode -ky", but if that fails it calls onclean.
      This is not a big feature, but it can be handy for scripting and for new users.
    • Enhanced Support for Multibyte Character Strings
      By using a new environment variable, SQL_LOGICAL_CHAR, the server will interpret the size of character fields in terms of characters instead of bytes as usual
    • New default for the INFORMIXTERM environment variable for terminal set up on UNIX
      Simply a new default for INFORMIXTERM variable
    • onconfig Portal: Configuration Parameters Listed by Functional Categories
    • onstat Portal: onstat Commands Listed by Functional Categories
      These two are important documentation enhancements available at IDS Infocenter
    • Enhancements to the Enterprise Replication plug-in for the OpenAdmin Tool for IDS
      Open Admin Tool keeps being expanded and improved. And free...
  • Enterprise Replication
    • Enterprise Replication Stops if Memory Allocation Fails
      Better handling of low memory conditions. Replication will stop and raise an alarm. After solving the memory issues just run cdr start
    • Notification of an Incorrect Log Position When Enterprise Replication Restarts
      A new alarm is fired if the replay position is invalid (too old or later than current log position)
    • Workaround: Reclaiming Space from Enterprise Replication Paging Smart Large Objects
      Earlier versions could allocate space for replication of smart large objects (SLOBs) that was not necessary. This space can now be reclaimed.
    • Improving the Performance of Consistency Checking with an Index
      A new shadow column can be added to replicated tables. This column (ifx_replcheck) can be used in a composite index with the primary key in order to speed up the checking of replicates.
    • Specifying the Range of Data Sync Threads to Apply Replicated Transactions
      An ONCONFIG parameter can control the number of replication threads
  • Warehousing
    These features are presented under the Warehousing category, but I personally think they can be used in any kind of environment
    • Loading Data into a Warehouse with the MERGE Statement
      This IDS version introduces support for the MERGE SQL statement. We specify a destination table, a source table/query and a match condition. If the match condition is true, the row in the destination table is updated with the data from the source; if it is false, a new row is inserted into the destination table
    • Retrieving Data by Using Hierarchical Queries
      Introduces support for the CONNECT BY SQL construct, which allows the use of so-called hierarchical queries. Please note that the most efficient way to handle this in Informix would be to use the Node DataBlade, but for compatibility reasons IBM decided to introduce this feature. It's a nice feature for application portability
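As a toy illustration of the matched/not-matched logic behind MERGE (this is not Informix syntax; the table contents and column names are invented for the example):

```python
# Sketch of MERGE semantics: for each source row, update the matching
# destination row when the match condition holds, otherwise insert it.
def merge(target, source, key):
    """target/source: lists of dicts (rows); key: column used as the match condition."""
    index = {row[key]: row for row in target}
    for row in source:
        if row[key] in index:
            index[row[key]].update(row)   # match condition true: UPDATE
        else:
            target.append(dict(row))      # match condition false: INSERT

stock = [{"sku": 1, "qty": 10}, {"sku": 2, "qty": 5}]
delta = [{"sku": 2, "qty": 7}, {"sku": 3, "qty": 1}]
merge(stock, delta, "sku")
print(stock)
# → [{'sku': 1, 'qty': 10}, {'sku': 2, 'qty': 7}, {'sku': 3, 'qty': 1}]
```

In SQL this whole loop collapses into a single MERGE statement, which is precisely why it is attractive for warehouse load jobs.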
So, that's all for xC5. Let's use the new features and keep an eye on xC6 that should appear before the end of the year.

Friday, May 8, 2009

11.50.xC4: Another mark in Informix history

IBM Informix Dynamic Server 11.50.xC4 is available to customers on Passport Advantage.
It's another fixpack that comes with significant improvements. Traditionally we had to wait for full releases to get significant features, but we're getting used to seeing great improvements in fixpacks.
This one is no exception and from the release notes we can see an overview of the new functionality:

  • High Availability
    • Availability Enabled on Windows 64-bit Operating System
      Some of the HA features were previously not available on 64-bit MS Windows versions

  • Administration
    • Save Disk Space by Compressing Data
      Data compression for tables. This deserves a few more words... Check the end of the article
    • Improved Options for Specifying Processor Affinity
      New options for binding virtual CPUs to physical CPUs. We can now specify a list of physical CPUs, one or more ranges, and ranges with increments
    • Disable IPv6 Support
      If you prefer to disable the IPv6 protocol you can create an empty file named $INFORMIXDIR/etc/IFX_DISABLE_IPV6 (readable by informix) or set an environment variable: IFX_DISABLE_IPV6=yes
    • Enhancements to the OpenAdmin Tool for IDS
      These include: Control of table compression, query by example for all the usual table operations (query, insert, delete and update), SQL explorer history, managing of external directives and export/import of OAT connection definitions
    • Generating a Customized Database Server Configuration File
      A new utility called genoncfg was introduced. It takes the default onconfig.std and a text file with some directives (number of CPUs, memory, rootdbs location) and adjusts the instance configuration. The result is saved in a file called onconfig in the current directory. This could be the beginning of a great tool

  • Enterprise Replication
    • Generate ATS and RIS Files in XML Format for Custom Repair Script
      The Aborted Transaction Spooling and Row Information Spooling files can be generated in XML format to facilitate the implementation of recovery processes
    • Prevent Upserts During Conflict Resolution
      An upsert happens when an UPDATE is sent to the target and the row does not exist there. This may not be desirable if the row was previously deleted. Now we can make the DELETE "win"
    • New Event Alarms for Enterprise Replication
      Three new situations will trigger an alarm: When a server connection is dropped, when a server, replicate or replicate set changes state (stopping, starting, modifying, deleting, or changing the connection status) and when a transaction fails (generating ATS/RIS files)
    • Monitor the Progress of a Consistency Check or a Direct synchronization
      New cdr commands to see the status of check and sync processes (cdr stats check and cdr stats sync)
    • Specify the Duration of a Consistency Check
      By default, inconsistent rows are rechecked once after two seconds, which might not be enough time for replicated transactions to be applied on the target server. You can now specify the number of seconds to spend on rechecking the consistency of inconsistent rows. Rechecking prevents transactions that are in progress from being listed as inconsistent in the consistency report. You can use the --inprogress option of the cdr check replicate and cdr check replicateset commands to specify the maximum number of seconds to perform rechecking.
    • Schedule Synchronization or Consistency Checking Operations
      cdr check and cdr sync allow for a new flag ( --background ) that will use the scheduler to make periodic runs of those processes
    • Improve Consistency Checking Performance
      Several options allow the execution of parallel threads for check operations. You can specify which data or intervals to check
    • Improve the Performance of Checking or Synchronizing Replicate sets
      A new option ( --process ) defines how many threads to use in parallel for check and synchronization jobs
    • Role separation for Enterprise Replication (UNIX)
      If you're using role separation, your DBSAs can control Enterprise Replication. Previously only user informix could
    • Enhancements to the Enterprise Replication plug-in for the OpenAdmin Tool for IDS
      A lot of new features in the ER plugin for Open Admin Tool (OAT)
  • Performance
    • Enable Concurrent I/O to Improve Performance on AIX Operating Systems
      IDS supports DIRECT I/O since version 11.10. Now it also supports Concurrent I/O (CIO) on AIX. It's a needed step to really get to the performance of raw devices. I have plans for an article on this subject...
    • IFX_LARGE_PAGES Environment Variable (AIX, Solaris)
      Allows IDS to take advantage of a pool of large memory pages configured by the system administrator. The shared memory virtual segments must be set to RESIDENT by the database system administrator

  • Application Development
    • DataBlade Module Registration through SQL
      Datablades can now be registered using plain old SQL. A new function, SYSBldPrepare is used for the registration process. So you can do it from any SQL tool or interface
    • Control the Results of a Fuzzy Search with the Basic Text Search DataBlade Module
      You can now specify the degree of similarity of search results in fuzzy searches when using the Basic Text Search DataBlade module. Specify a number between 0 and 1, where a higher value results in a higher degree of similarity. To limit results, specify a higher number. To maximize results, specify a lower number. The default degree of similarity is 0.5.
    • Map Characters for Indexing with the Basic Text Search DataBlade Module
      This allows us to replace special characters with other characters for indexing purposes
    • Default Boolean Operator in Basic Text Search Queries
      When we query a BTS index, the space between words is considered a default operator. Now we can define that default operator at index creation time
    • Storage for Temporary Basic Text Search Files
      It's now possible to define a separate sbspace for the temporary files used in BTS searches. This can increase performance. Up to now they were created in the same space as the index.
    • Track Basic Text Search Query Trends
      The possibility to save the queries made using a BTS index. This information can be used to provide hints to the end users in order to improve their own queries. It's like using past experiences to improve future queries
    • Fragment bts Indexes by Expressions
      It's now possible to split a BTS index into several sbspaces by using fragmentation
    • Monitor Change Data Capture Sessions
      A new onstat option ( onstat -g cdc ) that allows monitoring on data capture sessions. Change Data Capture API was introduced in 11.50.xC3 and can be used to capture data change and propagate it into other databases, web services etc. It's also one basic component to use SolidDB as a frontend cache for IDS.
    • View Change Data Capture Error Message Text
      A new function ( cdc_errortext() ) that retrieves an error message from an error generated by CDC

  • Security
    • Automatic Check for Installation Path Security (UNIX, Linux, Mac OS X)
      The installation process checks that the directory structure is secured
    • Utility to Enhance the Security of the Dynamic Server Installation Path (UNIX, Linux, Mac OS X)
      A new utility ( onsecurity ) that can check the integrity of the product installation. It can detect if the installation was compromised and in that case forbids the use of the product
As we've seen in previous fixpacks, IBM introduces another major feature in this one: compression. There is a lot to say about IDS compression, but the subject is already well covered in several places:
I'd like to highlight a few aspects of IDS compression:
  • It's a dictionary-based compression, similar to DB2's deep compression. This means you don't have to rely on pattern repetition within each page
  • Since the dictionary is created by sampling the table it's very fast to get a compression estimate
  • It works for normal INSERTs and UPDATEs. It doesn't depend on load operations
  • It works online (without blocking the tables or rows)
  • It comes with two other features: repack and shrink. Repack regroups the data rows sequentially on the table, leaving the free space at the end. Shrink can return this free space to the dbspace/chunk where the table resides
But if compression is the brightest feature we should not let the others go by without notice:
  • From the list above we can see that the Basic Text Search datablade has seen a lot of nice improvements. I'd bet that if IBM keeps improving it like this it will have to change its name to something like "Not so Basic Text Search Datablade" :)
  • And again, Enterprise Replication (ER) sees a lot of improvements
  • A new version of Open Admin Tool (OAT) with several major improvements
  • The new utility to generate a configuration file (genoncfg). It's still a bit simple, but I believe it can be the start of a great utility to help new Informix users to get a basic working configuration
  • The introduction of CIO for AIX. This is a very important step after the introduction of DIRECT_IO. I would love to write a post about this. Stay tuned.
And that's it for now. If you're an IBM customer you can download 11.50.xC4 from the usual sites. If you're not an IDS user, you can try it by downloading the IBM Informix Dynamic Server Developer Edition from:

Sunday, April 26, 2009

Informix community news

International Informix User Group is hosting the annual Informix User Conference. It's taking place in Kansas City, not far from the Lenexa development labs where most of the IDS development is done.
This allows the conference participants to meet with some of the more influential people in Informix development. Most of the product architects will be at the conference.

It should be starting, and it will end on Wednesday. You can check the schedule on the IIUG site.
The conference is packed with a lot of interesting sessions and workshops, and you can even take the exams for IDS v11 certification. If you're not going to the conference you can also take a free online test just to get some feedback on your knowledge level. The exams are available on the Prometric website and you can get a promotional code which makes the test exam free (usually it costs $10). Please check
and go to the Prometric Website. Use the promotional code IIUG2009.

Taking advantage of being at the conference, Cathy Elliot, Informix Software Marketing Manager, has just created a new blog, called Informix Voices ( ) where she intends to post some interviews with relevant people from the Informix community. It should be worth keeping an eye on it.

Finally, we are all waiting for IDS 11.50.xC4 which we know by now that will bring table compression to the Informix world. I will post about it when the version is available. Meanwhile if you're going to the IIUG conference, enjoy it. If not, keep an eye on the Informix blogs for updates.

Saturday, April 25, 2009

Returning to null...

In a recent article I talked about the way to return a NULL from a query. That reminded me of how "picky" Informix is with NULL. Did this sound like me saying Informix does something wrong?! That would be strange in an Informix dedicated blog...
Well... I said "picky" because that's the feeling I get from people who face the way Informix deals with NULL. Usually those people have a background in other RDBMSs, which many times behave differently. Let me show some situations common in Informix:

  1. The query below returns NULL. Most of the time people expect it to return "SomeString"
    SELECT "SomeString" || NULL::CHAR FROM systables WHERE tabid = 1;

  2. The query below returns no rows. People would expect it to return some rows
    SELECT * FROM systables WHERE "A" != NULL::CHAR;

  3. The query below also returns NULL. People usually expect it to return 10
    SELECT 10 + NULL::INTEGER FROM systables WHERE tabid = 1;

Queries 1) and 3) are similar. Above I wrote that Informix is "picky". I could say it in another way: Informix is very strict in the way it treats NULL. By definition (SQL standards), NULL is the absence of value. That means that you can't do anything with it... Any operation (concatenation in query 1 and addition in query 3, for example) will return NULL. And any comparison with NULL is neither true nor false: in standard terms it evaluates to UNKNOWN, which a WHERE clause treats like FALSE.

In order to understand this we must avoid looking at the question with a practical approach and instead use a formal, conceptual approach. In other words, it would be intuitive that if we concatenate a string with "nothing" we would still get the string. But, conceptually, we cannot concatenate a string with the absence of a string... And we cannot compare anything with NULL, because it does not exist. "A" is not equal to something that does not exist, but conceptually you can't compare these two operands, so the comparison doesn't succeed and the row doesn't qualify.

Also note that in Informix NULL cannot be represented by "", for example. "" is an empty string; it at least has a defined value. NULL doesn't.

Many developers argue that the way Informix deals with NULL does not make sense and causes inconveniences. I argue that this makes a lot of sense (mathematically), and I concede that it can be inconvenient sometimes from a practical perspective. But in reality, what those developers or I think is pretty irrelevant. What is relevant is that this is the way the SQL standard defines NULL, and Informix follows it.

In order to overcome the inconveniences we can use the NVL() function, which returns its second argument when the first one is NULL.
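As a quick sketch, reusing the systables-based examples above, NVL() lets us choose the fallback value ourselves:

```sql
-- Instead of returning NULL, the concatenation falls back to the string itself
SELECT "SomeString" || NVL(NULL::CHAR, "") FROM systables WHERE tabid = 1;

-- And an addition can treat NULL as zero, when that's the semantics we want
SELECT 10 + NVL(NULL::INTEGER, 0) FROM systables WHERE tabid = 1;
```

The point is that the "treat NULL as something" decision is made explicitly in the query, instead of silently by the database engine.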

Thursday, April 23, 2009

Compress and free....

Guy Bowerman posts about two new Informix white papers. One of them covers the new compression feature of IDS 11.50.xC4. Be sure to check it if you feel your data is growing beyond acceptable limits.
The other paper gives an extensive and detailed overview of managing IDS on the Windows platform.

Here's the article:

Saturday, April 18, 2009

Informix authentication and connections

The idea for this post comes from a recent discussion on c.d.i and also from checking the URL referrers that bring people here.
Most of this blog's visits come from Google or other search engines. And apparently there are a lot of questions about how Informix authenticates users and establishes connections.

I will try to give a generic overview of these subjects. I'll point out a few less known aspects of Informix authentication.
This article in no way tries to substitute the official documentation, but I hope this can be used as a concentrated information resource about the subject.

Authentication versus privileges

First, we have to create a distinction between two steps involved in a connection establishment.
As we all know, IDS and other databases have several privileges required for doing any action in the database.
At the first level we have database level privileges. These are CONNECT, RESOURCE and DBA. Then we have object level privileges. For tables we have, for example, INSERT, DELETE, ALTER and REFERENCES, and at column level we have SELECT and UPDATE.
The list above is not exhaustive. The privileges relevant for this article are the database level ones. In particular the CONNECT privilege. Without it we will not be able to connect.
Privileges can be given using the SQL statement GRANT and removed with REVOKE. Object level privileges can be GRANT'ed to individual users or to ROLEs. You can imagine ROLEs as groups, and you can GRANT a role to individual users.

But let's get back on track. The purpose here is not to explain the privilege infra-structure, but to explain how do we connect to the database. I just explained the privileges because we need to go through two steps for establishing a connection:
  1. First, Informix has to make sure we are who we claim to be. This is called the authentication phase
  2. Second, Informix will check if the user we are defining in the connection has the required privileges to establish the connection.
    It cannot do the second step without first completing the authentication.
    To be able to connect we need at least CONNECT privilege on the database level. This privilege can be granted specifically to the user or to the special role PUBLIC which means "anyone"
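As a minimal sketch of the second step (the user name here is hypothetical), the database-level privilege is managed with plain SQL:

```sql
-- Allow a specific user to connect to the current database
GRANT CONNECT TO fnunes;

-- Or allow anyone recognized by the server to connect
GRANT CONNECT TO PUBLIC;

-- Remove the privilege again
REVOKE CONNECT FROM fnunes;
```

A user who authenticates successfully but lacks CONNECT (directly or via PUBLIC) will still fail at the second step.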

User repositories

One very important aspect of Informix is that it doesn't use the concept of database-only users. Any user in an Informix database must be recognized by the underlying OS (up to the current version at the time of writing, which is IDS 11.50.xC3).
It's essential to know this in order to understand how it works. For example, if you want to create a user for Informix Dynamic Server usage, you have to create it in the OS, or in another repository, but in that case you must still make the OS aware of that user. And for users created in the OS, if you need to change their password you must do it using the OS tools.

Some people may consider this a big limitation, but the subject is a bit more complex than what it may look at first.
There are reasons for this requirement:
  • In an Informix stored procedure, you can call an OS command or script. This is done using the SYSTEM() SPL statement. This command will be run with the user identity that was used to establish the database connection.
    Other databases execute these commands as a specifically configured user or as the user running the database software. The way Informix works can give you a lot of flexibility, but the price for that is the need to have the user recognized at the OS level
  • When we ask the database server to print out the explain of a query using the SET EXPLAIN SQL instruction, Informix will create a file on the database server. The information is written by the database instance processes, so it has to be written on the server running the database.
    These files are written with the user ID used for the connection. The file is written in the current client directory for connections local to the database server, and in the user's $HOME for remote connections. Again, this requires a user created in the OS
  • The SET DEBUG FILE/TRACE ON SPL instructions have the same behavior and requirements

The above doesn't necessarily mean we need to have the users in /etc/passwd (Unix/Linux). It means that getpwnam() must be able to retrieve the user info. This will happen transparently if you configure your system to use LDAP or NIS authentication.
So you can have a central user management infra-structure like LDAP, MS Active Directory or NIS. Informix also doesn't require the user to be able to establish a session (ssh or telnet for example) on the database server at the OS level. You can, for example, set the users' shell to /bin/false or use other means to prevent the users from logging in to the OS.
To complicate things a little bit more, we should not confuse the need to have the user id recognized in the OS with the authentication mechanism. Since 9.40 Informix can use PAM for authentication. This gives you complete freedom to implement complex and sophisticated authentication methods. You can use whatever PAM modules you desire to implement the authentication, which means your Informix authentication doesn't have to be the same as your OS authentication.
But the user has to be known by the OS due to the reasons presented before. As a quick example, you can create your users with random passwords in the OS (not known to anyone) and configure IDS to authenticate your users based on files, or any remote process using PAM.

The fact that the user has to be known in the OS is generally accepted as an inconvenience, and as such I believe it's safe to assume that in a future version Informix will relax these requirements. It could probably accept connections from users not recognized by the OS (either authenticated internally or with PAM or both) and have some parameter to define a default user for the situations where it needs a user id.

Types of connections

Informix implements the following types of connections:
  • Trusted connections
    These connections are based on trusted relations between hosts. Basically the database server host is configured to trust the user if he connects from a specific host or list of hosts.
    As such, these connections can be established without providing a password or any other authentication token. The way it works is exactly the same as the "r" services (rsh, rexec, rlogin).
    The configuration is done using the network connectivity files, /etc/hosts.equiv and ~/.rhosts

  • Non trusted connections
    These connections are the standard type of connections. Typically we provide a user and a password that is used to check the identity. The password is checked against the user's stored password

  • Challenge/response based authentication using PAM modules
    Connections made to PAM configured ports (DBSERVERALIAS). PAM stands for Pluggable Authentication Modules and can be used in any Unix/Linux system (AIX, HP-UX, Linux and Solaris)

  • Distributed query connections
    These connections can be considered trusted connections, but they're established implicitly when a client connected to instance "A" sends a query referencing instance "B".
    In this situation the IDS server "A" will make an implicit connection, on behalf of the user, to IDS server "B". The connection authentication will follow the rules for trusted connections, or will use a different mechanism if server "B" is set up with PAM authentication (more on this later)

Non trusted connections

We use non-trusted connections when we give a user id and a token (typically a password) that guarantees our identity (only the user should know their personal authentication secret or password). Meaning we provide not only the user id, but also its secret key.
These connections are used mostly in applications that use JDBC, ODBC, .NET, PHP (PDO), Perl (DBI) etc. These APIs require a connection string or URL, or use specific API connection functions. So we define the password in the string, or we provide it as an argument to the API connection function.
Here's an example of a JDBC URL to connect to an Informix instance (host, port and names here are illustrative):

jdbc:informix-sqli://myhost:9088/stores_demo:INFORMIXSERVER=ids_instance;user=fnunes;password=mysecret

In ESQL/C (embedded SQL/C - which is basically C language with SQL commands) or Informix 4GL we can use the CONNECT instruction:

CONNECT TO database@ids_instance USER <"username"|variable_user> USING <variable_password>

One important note, that allows me to show the two steps in a connection establishment: authentication and database open:
Typically in 4GL we use a "DATABASE test_auth" instruction. This, by default, performs both steps by trying to make a trusted connection using the owner of the process.
But let's create a simple 4GL program to show the difference between the authentication and the database opening phase. Here's the code (don't worry if you don't know 4GL, because these lines are self-explanatory):

MAIN
DEFINE username, password CHAR(20)
DEFINE c CHAR

PROMPT "Insert your username: " FOR username
PROMPT "Insert your password: " FOR password ATTRIBUTE ( INVISIBLE )
CONNECT TO "@cheetah2" USER username USING password
PROMPT "We are authenticated, but not yet connected to a database..." FOR CHAR c
DATABASE test_auth
PROMPT "Now we have an opened database!" FOR CHAR c
END MAIN

So, we're asking the user's name and password. Then we're CONNECTing to the server. Note that I didn't use a database name. This is a supported syntax, although not very frequently used.
After the connect, we then issue the usual "DATABASE" instruction. Let's see what happens when we run the program above. First the status of the IDS instance, showing the existing sessions:

> onstat -u

IBM Informix Dynamic Server Version 11.50.UC2 -- On-Line -- Up 09:52:57 -- 88064 Kbytes

address flags sessid user tty wait tout locks nreads nwrites
4780c018 ---P--D 1 informix - 0 0 0 69 559
4780c5f0 ---P--F 0 informix - 0 0 0 0 1032
4780cbc8 ---P--F 0 informix - 0 0 0 0 596
4780d1a0 ---P--- 5 informix - 0 0 0 0 0
4780d778 ---P--B 6 informix - 0 0 0 48 0
4780e328 ---P--- 17 informix - 0 0 1 315 224
4780e900 ---P--D 9 informix - 0 0 0 3 0
4780eed8 ---P--- 16 informix - 0 0 1 1286 1093
4780f4b0 ---P--- 15 informix - 0 0 1 189 3
47810060 Y--P--D 21 informix - 440cfe28 0 0 0 0
10 active, 128 total, 18 maximum concurrent

So... Only informix user system sessions. Now let's run the program (I'll do it as root):

> ./test.4ge
Insert your username: fnunes

Insert your password: [......][ENTER]

We are authenticated, but not yet connected to a database...

The program is now holding at the first "PROMPT" instruction after the connect. Let's see the list of sessions:

> onstat -u

IBM Informix Dynamic Server Version 11.50.UC2 -- On-Line -- Up 09:57:30 -- 88064 Kbytes

address flags sessid user tty wait tout locks nreads nwrites
4780c018 ---P--D 1 informix - 0 0 0 69 559
4780c5f0 ---P--F 0 informix - 0 0 0 0 1032
4780cbc8 ---P--F 0 informix - 0 0 0 0 596
4780d1a0 ---P--- 5 informix - 0 0 0 0 0
4780d778 ---P--B 6 informix - 0 0 0 48 0
4780e328 ---P--- 17 informix - 0 0 1 315 224
4780e900 ---P--D 9 informix - 0 0 0 3 0
4780eed8 ---P--- 16 informix - 0 0 1 1286 1093
4780f4b0 ---P--- 15 informix - 0 0 1 189 3
4780fa88 Y--P--- 61 fnunes 4 485bff38 0 0 0 0
47810060 Y--P--D 21 informix - 440cfe28 0 0 0 0
11 active, 128 total, 18 maximum concurrent

> onstat -g ses 61

IBM Informix Dynamic Server Version 11.50.UC2 -- On-Line -- Up 09:57:39 -- 88064 Kbytes

session effective #RSAM total used dynamic
id user user tty pid hostname threads memory memory explain
61 fnunes - 4 15267 pacman.o 1 40960 37280 off

tid name rstcb flags curstk status
85 sqlexec 4780fa88 Y--P--- 5728 cond wait netnorm -

Memory pools count 1
name class addr totalsize freesize #allocfrag #freefrag
61 V 483c3028 40960 3680 68 7

name free used name free used
overhead 0 1672 scb 0 96
opentable 0 784 filetable 0 192
misc 0 64 log 0 16512
temprec 0 1608 gentcb 0 1232
ostcb 0 2632 sqscb 0 8192
sql 0 40 hashfiletab 0 280
osenv 0 1720 sqtcb 0 2208
fragman 0 48

sqscb info
scb sqscb optofc pdqpriority sqlstats optcompind directives
488a18a0 48558018 0 0 0 2 1

Sess SQL Current Iso Lock SQL ISAM F.E.
Id Stmt type Database Lvl Mode ERR ERR Vers Explain
61 - - - Not Wait 0 0 9.29 Off

So what do we see? We have a session, on behalf of user "fnunes" but the "Current Database" is not defined. So we've just gone through the authentication process with success.
Now, if I press ENTER on the program I'll force the program to execute the DATABASE statement and wait on the second PROMPT:

> ./test.4ge
Insert your username: fnunes

Insert your password: [...] [ENTER]

We are authenticated, but not yet connected to a database... [ENTER]
Now we have an opened database!

ok... now let's look at the database session again:

IBM Informix Dynamic Server Version 11.50.UC2     -- On-Line -- Up 10:09:32 -- 88064 Kbytes

session effective #RSAM total used dynamic
id user user tty pid hostname threads memory memory explain
61 fnunes - 4 15267 pacman.o 1 49152 45544 off

tid name rstcb flags curstk status
85 sqlexec 4780fa88 Y--P--- 5728 cond wait netnorm -

Memory pools count 1
name class addr totalsize freesize #allocfrag #freefrag
61 V 483c3028 49152 3608 83 6

name free used name free used
overhead 0 1672 scb 0 96
opentable 0 1792 filetable 0 352
misc 0 64 log 0 16512
temprec 0 1608 gentcb 0 1232
ostcb 0 2632 sqscb 0 13712
sql 0 40 rdahead 0 832
hashfiletab 0 280 osenv 0 1720
sqtcb 0 2792 fragman 0 208

sqscb info
scb sqscb optofc pdqpriority sqlstats optcompind directives
488a18a0 48558018 0 0 0 2 1

Sess SQL Current Iso Lock SQL ISAM F.E.
Id Stmt type Database Lvl Mode ERR ERR Vers Explain
61 - test_auth CR Not Wait 0 0 9.29 Off

So, now we have a current database, in the same session.
This ends the description of non trusted connections using username and passwords. Pretty simple, just like in any other database server.

Challenge/response connections using PAM

Another kind of non trusted connection is one established through a PAM configured DBSERVERALIAS (a specific port where the instance will listen for connections).
I will not go into details about this, because I already wrote an article about it, where I explain how to set it up and give examples of challenge/response scenarios.
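For reference, a PAM-enabled port is declared in the sqlhosts options field; a minimal sketch could look like the line below (the alias, host, service and PAM service names are hypothetical, and the exact option syntax should be checked against the documentation):

```
# sqlhosts: alias     protocol  host     service       options
ids_pam  onsoctcp  idshost  ids_pam_svc  s=4,pam_serv=(ids_pam_service),pamauth=(challenge)
```

The PAM service name then maps to a stack of PAM modules configured on the database server (e.g. under /etc/pam.d on Linux).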

In summary, a non-trusted connection in Informix is similar to connections in other RDBMS. You supply a username and a token that certifies you're entitled to connect as that user.
Typically this token is a password. But with PAM this can be a much more complex item.
In IDS 11.50 we can also use single sign-on with Kerberos. Due to the complexity of this scenario I've left it out of this article.
Non trusted connections are used mostly in more recent applications, like Web based applications, Windows client applications, or script languages (PHP, Perl, Ruby etc.).

Trusted connections

These connections are made without any token to confirm the user identity. It's assumed that the user is already identified on the client system and we configure the database server to trust that identification. Trusted connections are mostly used in applications created using Informix 4GL and Informix ESQL/C.
The definition of the trusts is done by configuring the so called network security files. These files are /etc/hosts.equiv and ~/.rhosts (.rhosts is created in each user's home dir).
Before we dig into these files' content and what it means, I have to make a parenthesis to deal with something I'd almost call a myth surrounding Informix trusted connections. If you look around on the Internet you'll easily find a lot of information about how insecure it is to use these files. In particular, you may find explicit instructions not to use the ~/.rhosts file.
The main reasons for this are two facts:
  1. These files are not specific to Informix. In fact they were created to configure a group of services, sometimes called the "r" services. These services are rexec (allows remote execution of a command) on TCP port 512, rlogin (allows remote login) on TCP port 513 and rcmd (allows remote shell execution) on TCP port 514.
  2. Many of these services are considered insecure and have a long history of security flaws. The most famous worm also took advantage of some of these security flaws, and also of the very relaxed security setup on many of the networks connected to the Internet at the time.

Why did I refer to a "myth" before? Well, many customers have the idea that since IDS uses the same files for trusted connection authentication, you need to be running the above services for it to work. This is completely false. I cannot stress this enough. Even in the official documentation, the administrator guide, it is written:

To determine whether a client is trusted, execute the following statement on the client computer:

rlogin hostname

If you log in successfully without receiving a password prompt, the client is a trusted computer.

Although this is not wrong, it misses the point, and gives the idea that the rlogin service should be running. The above sentence, saying that if rlogin works the user is trusted, is true. But you can create a trust for an Informix connection without having the service(s) running. In this case you'll get a "can't connect error" instead of the password prompt when you run rlogin. And it doesn't mean it won't work with Informix.
I strongly believe Informix should not use these same files for defining its trust relationships. But the concern this usage raises is not really an issue. If you're concerned about security, you probably will not use the "r" services. Most of the sites which take security seriously have replaced them with ssh or something similar. So the truth is that these files should be useless in a modern system. And in this scenario, the fact that they're used by Informix should not be considered too critical, since nothing else should depend on them.

I hope this contributes to the disappearance of the "myth". Now let's see how we have to change the network security files in order to create the trusts.
The contents of the files should be similar to how you would configure "r" services trusts. Let's see the two files, because there are some slight but very important differences.
  • /etc/hosts.equiv
    Each line in this file defines a trust relation. You can specify a host only, or a host and a user. This file is a system file. No user other than the system administrator should have permission to change it.
    If you specify only the host name, you're saying that any user from the remote host is trusted to connect as the same user in the local host. If you specify a "+" (plus sign) this means "any host" and you should never do that...
    There is a significant difference between how Informix and "r" services interpret the file if you specify a hostname followed by a user name. For "r" services, this means that you trust the remote user on the remote system to connect as ANY user in the local system (except root). This is obviously a very serious issue... And by the way, "+ +" means any user from anywhere can login locally as any user! (don't try this even at home :) )
    Informix on the other hand interprets this as a way to specify user by user and not all the users. The remote user cannot connect as another local user without providing a username and password.
  • ~/.rhosts
    This file also defines a trust relation in each non-comment line. This file, being in the user's home directory, means that you allow the user on the local host to define who he trusts. Many systems verify that the file has no write permission for "world"; if it does, the settings are ignored.
    A line with a host and a username means that the specified username on the specified remote host can connect as the local user owning the .rhosts file. So, for "r" services, if you specify a host and user in /etc/hosts.equiv it means that user can connect as any local user. If you do the same in ~/.rhosts you're reducing the scope of local users that the remote user can authenticate as, to only the file owner
    For Informix, the interpretation is the same: the remote user can authenticate on the local machine, but without changing identity. It means that, for example, if the local user is luser and its ~/.rhosts contains "remotehost ruser", then this entry is useless, because Informix doesn't have a way to specify an identity change in a connection attempt.

Some additional notes on these files:
  • You can specify negative entries. For example you could use the following in /etc/hosts.equiv:

    pacman -informix
    pacman

    This would allow all users from host "pacman" to connect, except informix
  • An entry with a host name means "trust all users except root". A trust relation for root user has to be defined in it's .rhosts file.
  • The order of the entries is relevant. If we exchange the order of the entries in the example above, even informix will be allowed to connect. That's because a positive entry will be found first

Let's see some examples. I have two hosts (pacman and ids1150srvr). I'm running IDS on the ids1150srvr host. If I want to allow the user informix to connect to this instance from the pacman host I can do it in the following ways (these file entries are on the ids1150srvr host):

  • /etc/hosts.equiv:

    pacman
    pacman informix

    The first entry allows all users (except root) to connect from pacman. The second will allow only informix. But be aware that if you're running "r" services you're saying that informix on pacman can connect as any user (again, except root) on ids1150srvr.

  • ~informix/.rhosts

    pacman

    A single line with the host name means that user informix on pacman is trusted to connect as informix on ids1150srvr.

Distributed query connections

We've seen a client/server trusted connection. But we have another situation where we must establish trusts between systems. That's the case when you need to join data from two Informix instances. The SQL syntax to do this will look like this:

SELECT local_table.column, remote_table.column
FROM table1 local_table, remote_database@remote_ids_instance:table2 remote_table
WHERE local_table.join_column = remote_table.join_column;

Assuming we are connected to a "local_ids_instance" on ServerA, and that "remote_ids_instance" is running on ServerB, this query will establish a connection on "remote_ids_instance" from the "local_ids_instance". Informix will establish this connection on behalf of our user on "local_ids_instance". As such, our user must be trusted on ServerB when connecting from ServerA. Note that the trust is not done between our client and the ServerB. It's between ServerA and ServerB. If the "remote_ids_instance" is accessed through a normal Informix port, the authentication mechanism used is the same as for trusted connections.

The situation will be different if the "remote_ids_instance" port is configured with PAM. In this situation the trust configuration is done entirely through SQL instructions. IDS versions that support PAM also have a new system database. It's called sysuser. In it we have a table called sysauth with the following schema:

{ TABLE "informix".sysauth row size = 322 number of columns = 4 index size = 295 }
create table "informix".sysauth
  (
    username char(32) not null ,
    groupname char(32),
    servers varchar(128) not null ,
    hosts varchar(128) not null ,
    check (groupname IS NULL )
  );

revoke all on "informix".sysauth from "public" as "informix";

create unique index "informix".sysauth_idx on "informix".sysauth (username,servers,hosts) using btree ;

This table is used to define trusts for distributed queries when the destination Informix instance is configured for PAM authentication.
We must specify the username, the hostname and the IDS server from where the connection is made. Note that this not only avoids the usage of the network security files but also provides more control. With the network security files we cannot specify the Informix instance originating the connection.

So, picking up the query above, and assuming the username is "fnunes" we would need the following configuration on ServerB for it to work:
  • Using /etc/hosts.equiv one of the following lines:

    # this would create a trust for all users from ServerA
    ServerA
    # This would restrict the trust in Informix to user fnunes.
    # But for "r" services it would allow fnunes to login as any user except root on ServerB
    ServerA fnunes

  • Using ~fnunes/.rhosts the following line:

    ServerA

  • Using the sysuser:sysauth table (if the port was configured for PAM):
    INSERT INTO sysauth ( username, groupname, servers, hosts) VALUES ('fnunes', NULL, 'local_ids_instance', 'ServerA');
Additional notes about sysauth:
  • Although the field names "servers" and "hosts" suggest you could use a list of servers or hosts, the documentation says we should create one entry per user/server/host
  • The field "groupname" is currently not used, as can be seen from the check constraint

Additional notes

There are a few less known configuration details about connections in Informix. Let's check them.
  • There are some options that we can specify in the sqlhosts file that control the way IDS deals with trusted connection attempts (for the specific ALIAS of the sqlhosts line). The options I'm referring to are "s=X". "s" stands for security on the server side. The following values are supported:
    • s=0
      Disables both hosts.equiv and rhosts lookup from the database server side (only incoming connections with passwords are accepted).
    • s=1
      Enables only the hosts.equiv lookup from the database server side.
    • s=2
      Enables only the rhosts lookup from the database server side.
    • s=3
      Enables both hosts.equiv and rhosts lookup on the database server side (default setting for the database server side).
    • s=6
      Marks the port/ALIAS as a replication only (HDR or ER) port
      If you use the option s=6 in the sqlhosts options field, then you can create a file called hosts.equiv in $INFORMIXDIR/etc, owned by informix:informix and only modifiable by user informix. This makes the informix user independent of the system administrator for configuring the trusts, and additionally doesn't interfere with "r" services if they're in use
  • There is another security option that defines whether the client uses the .netrc file. I haven't yet mentioned the .netrc file... You can use this file to specify a user and password for each host you want to connect to. If you set up the file and try to make a trusted connection you'll instead make a non-trusted connection. An example of this file's content would be:

    ids1150srvr login fnunes password mysecret

    This would connect you as user "fnunes" using password "mysecret" if you attempt to make a trusted connection to an Informix instance running on host "ids1150srvr". You can have several lines, one for each host you connect to.
    There are obvious security issues derived from having a password in a file...
    The option that specifies whether the client libraries will look for and use this file is the "r=X" setting in the sqlhosts options (client side):
    • r=0: the client ignores the .netrc file
    • r=1: the client looks at the .netrc file when a program tries to make a connection without supplying a username and password
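On the client side the "r" value goes in the same options field of the client's sqlhosts. A sketch, reusing the hypothetical host name from the .netrc example above:

```
# client-side sqlhosts entry: r=1 tells the client libraries to
# consult ~/.netrc when no username/password is supplied
ids1150srvr_tcp  onsoctcp  ids1150srvr  9088  r=1
```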

Windows specific information

In MS Windows, the network security files are used in a similar way, but they're in different locations. /etc/hosts.equiv becomes %SYSTEMROOT%\system32\drivers\etc\hosts.equiv.
The .rhosts and .netrc files should exist in the user's HOME directory.

Common error messages

The following is not an exhaustive list of connection errors, but a small list of the more common ones with a very brief comment. For more information about each one consult the documentation or use "finderr errorcode":
  • -908 Attempt to connect to database server (servername) failed.
    Usually means a problem at the TCP level. Either your server is not running, or your client SQLHOSTS entry points to the wrong host or port
  • -930 Cannot connect to database server servername.
    Typically the client cannot obtain the server IP address. Check your DNS configuration.
  • -951 Incorrect password or user user@host is not known on the database
    You can get this error on the server side (online.log) and on the client side.
    On the server side it means the user does not exist.
    On the client side you'll see this error in three situations. The same error is sent to the client in all of them, so that someone trying to guess a user or password gets no hint; the database system administrator can see the real problem. The situations where the error is sent to the client are:
    • The user does not exist on the database server
    • The user's password cannot be validated, although the user exists
    • The user exists, but you're trying to make a trusted connection and the trust is not setup correctly
  • -952 User ()'s password is not correct for the database server.
    This is the error given on the server side when the user's password did not validate successfully
  • -956 Client client-name or user is not trusted by the database server.
    Server side error when a trusted connection fails because it's not properly setup
  • -1809 Server rejected the connection.
    This is the only error you'll see when a connection to a PAM-configured server fails. In order to understand what went wrong you have to check the messages from the PAM stack modules
  • -25596 The INFORMIXSERVER value is not listed in the sqlhosts file or the Registry
    The INFORMIXSERVER you're trying to connect to is not properly defined in your configured SQLHOSTS
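As mentioned, the finderr utility shipped with Informix prints the full description of any error code. A sketch of its use (output abbreviated):

```
$ finderr -908
-908    Attempt to connect to database server (servername) failed.
...
```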

I really hope this article covers most practical aspects of connection establishment in Informix. There are several aspects I left out for simplicity. These include:

  • Single sign-on (introduced in IDS 11.50)
  • Password encryption
  • SSL connections
These topics may be covered in future articles. You can find more information about Informix connection setup in the Administrator Guide and the Security Guide.

Wednesday, April 15, 2009

Returning null....

Once in a while I'm asked how to return NULL in a SELECT. This can be a strange request, and it usually relates to two equally strange situations:

  1. UNLOAD TO file.unl SELECT "" FROM table
    This will create the unload file with "|\ |" (note that there's a space following the backslash)
  2. SELECT NULL FROM table
    This simple SQL instruction raises a 201 (syntax error) exception
Let's explain the first issue. Unload files (typically pipe separated) generated by the UNLOAD instruction were created as a way to export/import data into an Informix database table. This means Informix has to understand, by looking at what's between two consecutive pipes ("|"), what the original value was. Given this: "||" (pipe followed immediately by pipe), and assuming we're dealing with a VARCHAR value, what would be the original value? I could give you two options:
  1. NULL
  2. "" (empty string)
Having two options is not really an option. When we LOAD the file, we want to be sure that the inserted value is exactly the original value. Given the two options what would we INSERT? A NULL value or an empty string? This is why we need to identify the NULL differently. And this is done by using "|\ |" (pipe, backslash, space, pipe).
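A quick sketch of the difference (the temp table and file name are illustrative; UNLOAD is a dbaccess feature, not server-side SQL):

```sql
CREATE TEMP TABLE t1 (c1 VARCHAR(10));
INSERT INTO t1 VALUES (NULL);  -- a real NULL
INSERT INTO t1 VALUES ('');    -- an empty string
UNLOAD TO 't1.unl' SELECT c1 FROM t1;
-- Per the convention above, the NULL row carries the backslash+space
-- marker between its pipes, while the empty string row does not
```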
This is fine for Informix, but people tend to use UNLOAD files to move data from Informix into other applications. And other applications don't usually like this notation. So how can we create an unload file with a "NULL" column represented as "||" (pipe, pipe)? The first obvious answer leads to problem 2):

Why does Informix raise an error if we use NULL in the projection clause (SELECT list)? It seems like a bug, especially if you're used to other databases where this works.
Well, the answer is simple. Informix is very strict with the concept of NULL. NULL is the absence of value. So, if we're instructing the SELECT to return an absence of value, it raises an unsolvable problem: what is the data type of an absence of value?! Of course Informix could assume some data type but, again, the engine is very strict. It won't take decisions of this kind for you, and consequently the use of a bare NULL in this context is not acceptable (it raises a syntax error). So, what is the solution? Simple: cast the NULL to a type:

UNLOAD TO file.unl SELECT NULL::CHAR FROM table

This simply defines the data type of the result set to return to the client.

Another option would be to create a stored procedure that returns a CHAR and simply returns the NULL value. Something like:

CREATE PROCEDURE ret_null() RETURNING CHAR;
    RETURN NULL;
END PROCEDURE;

SELECT ret_null() FROM table

But this is more complex and less efficient.

There are much more things to say about NULL, but I'll leave that for future articles.

Monday, March 23, 2009

Green: further thoughts

This is my second post about the green topic. If you're a frequent visitor you're probably used to more technical posts, or at least Informix oriented posts (that's what the blog is about!), but I couldn't avoid thinking about these issues and I'd like to share some thoughts with you.

In my last post I referenced an online radio show dedicated to how you can save money by implementing green initiatives. In other words, a green initiative can be described as any action we take to make our companies, and our presence on the planet, more environmentally friendly. I couldn't help noticing that lately we see a lot of activity in this field. A lot of companies are implementing green initiatives. I see this at a global and local level. Although I'm not a specialist in these issues, and I certainly haven't studied them, I think I won't be very far from the truth if I say that there are several reasons for this:

  1. Companies can really save money by going green
  2. It "looks good" to be green. Most of us are worried about the future of our planet. We probably feel better if we use green products. It's very difficult to change our daily routines, but it's very easy to buy products or use services from companies that show some care for the environment. So it makes us feel comfortable with our conscience
  3. Government regulations or actions favor companies with a green attitude
  4. Some of the reasons above make the green thing a market opportunity, especially in the current times of economic recession

A few examples of green initiatives from several companies:
  • IBM's smarter planet campaign (and blog)
    IBM is pushing not only the green initiatives, but also moving one step forward, and speaking of smarter resource usage. I'll pick this again below...
  • A local energy company launched a website promoting car pooling
    This company is the biggest Oil and Gas company on the Portuguese market
  • The biggest Portuguese bank launched a project called "Carbon zero"
    This initiative, in their words, involves the bank's staff, the customers and society in general. The proposed idea is to reduce the environmental impact of the bank's activities
  • The more than 4M results on Google when searching for "carbon capture" (see Wikipedia)
I could fill a large web page with examples, but I'm sure you know a lot of them. This has become a bit more than a buzz word. It's a trend. And it's probably a need for us as a society.
When I was in school, during my childhood, the idea of green was something like "don't throw papers on the street", "keep the beaches clean" etc. Currently kids are taught how to separate and recycle garbage etc. In the future, hopefully, all this will be taken for granted and we'll be teaching our kids that everything we do has an impact on the environment. So all the projects, all the designs, all the plans will take this into careful consideration. As IBM puts it, we must be smarter in our natural resource usage.
To give you a personal perspective on these issues I can tell you about two situations I've faced:
  • Last year I had the opportunity to spend a few days working in Toronto, Canada. For someone from a small country, Toronto is really big. I was impressed by the number of lanes in the highways. And the number of highways... and the length of highways, most of them apparently converging to the city center... Most of the highways had a lane reserved for cars with at least 2 (or more?) people in it... And most of these lanes were more or less empty...
  • A few days ago I took around one hour to get from my workplace to home. This may not look too bad to many of us, but on a normal day it would take me about 15 minutes. That's why I use a car and not public transportation... usually it's so much quicker... But on that specific day I passed three accidents (small car crashes...). I felt really desperate to be stuck in traffic, just a few kilometers from home...
These two situations definitely didn't feel "smart". How can we waste so much time and so many resources? There are better ways to move around. And we have so much technology that we could use to avoid situations like this... Cars could be smarter in order to avoid some crashes. We could be smarter and have detailed information about the road conditions. We could plan our itineraries in order to avoid traffic. We could use other means of transportation. We could use local systems in our companies to organize car pooling etc.

This obviously isn't directly related to Informix... But we really need to use the technology we have available today to make better use of the environment. And we will save money by doing so. This applies to companies and to individuals. And we, IT professionals, will probably have a say in this. We have to make our hardware run better with fewer resources. Our software systems should be optimized so that they don't consume so many resources (hardware). We have to design and implement systems that make our daily lives "smarter" and easier. And this will require information, and as such, databases...

I'll end this post with a brief overview of some Informix characteristics that make it a good choice for green systems. This is my best attempt to give some sense to this post ;) :
  • Informix is very light. Its thread-based architecture and efficient code make it run fast with few resources
  • Informix's thread-based architecture makes it a very good choice for current multi-threaded processors.
  • Informix is very easy to setup. You can save time and use it elsewhere like in application design and tuning
  • Informix is embeddable. It has a very small footprint. It makes it a good choice for applications that need a local database
  • Informix is extensible. It allows anybody with the right skills to extend the base functionality. We can create new functions, new datatypes, new aggregate functions, new index mechanisms etc. As an example, the Spatial datablade can be used in a lot of modern applications that require geo-spatial information
  • Informix works well in virtualized environments. Virtualization is one of the most used technologies to save resources and make better use of the hardware.
And this ends this non-technical post. The next one should be about virtualization. How does Informix fit into these environments? And what has IBM done to help you with it...

Saturday, March 7, 2009

Green: good for the environment and your wallet

Jacques Roy, an IBMer who also maintains a blog, participated in an internet radio show organized by the Information Management magazine. The topic of the show was how you can save money by going green. In other words, how you can make your company's carbon footprint as light as possible and at the same time save money on energy and hardware costs.

Several people from different companies and areas joined the show, and each one provided their view on the subject. As you can imagine, Jacques talked about the subject from a database perspective, pointing out several aspects of Informix that can help you in the task of going green.

The debate was very interesting, covering topics like virtualization, multi-tier storage, information lifecycle management etc.

Please find more details in Jacques's blog.

Thursday, March 5, 2009

Informix event in Lisbon

This post was originally written in Portuguese due to its local nature. Its purpose is just to announce a local half-day session covering several recent news and announcements about Informix.

IBM Portugal will hold a half-day session on March 24 at the IBM Forum Lisboa, located on IBM's premises. This session will present to customers and partners the latest news and next steps regarding Informix.

The topics covered will include the recent announcement of the Informix Warehouse Feature, as well as the upcoming features to be included in version 11.50.xC4. Besides the more technical aspects, integration with other components of the IBM portfolio, namely Cognos and Optim, will also be addressed.

The session will feature international speakers, members of IBM's Information Management team.

You can access the event's agenda and description, and register online, at:

Besides the event dedicated to Informix, other sessions about Lotus, Rational and Maximo (asset management) will be held during March. Information and registration for these events can be found on the same page.
You can get more information about the IBM Forum Lisboa (location, how to get there etc.) at

If you use Informix, or want to know more about this IBM database, don't miss it. You'll get information about what is being delivered and planned, and you'll have the opportunity to meet and exchange views with the local and regional IBM team, partners and other customers.

Wednesday, March 4, 2009

Informix Warehouse feature

IBM has announced today the availability of Informix Warehouse Feature v11.50.
This is a set of tools that helps customers take advantage of their Informix infrastructure to transform operational data into information useful for business optimization.
This bundle includes the SQL Warehouse component which offers the following:
  • SQW Warehouse Design Studio and SQW Client for data modeling, schema design, data transformation design, and data flow design
  • SQW Warehouse Server with Administration Console to schedule and manage data flows
  • SQL Warehouse runtime to perform data transformation within the IDS data server
This infrastructure can be integrated with business intelligence tools like Cognos (now part of the IBM Software portfolio), with Optim (formerly Princeton Softech) for management of data growth, and with ETL tools like Information Server (formerly Ascential DataStage).
This feature is available for Informix Dynamic Server v11.50.xC3.

If you're an Informix customer, or if you've been paying attention to the traditional positioning of Informix as an OLTP database you may wonder how this fits the traditional Informix usage.
Well, this does not mean that Informix became the perfect choice for a large-scale, enterprise-level data warehouse. But the fact is, a lot of Informix customers need to extract more information from the data they manage in their Informix instances: information organized in a different manner, not in the usual OLTP transaction-oriented schema. Many of these customers can take advantage of their existing infrastructure to build some data-warehouse functionality into their information systems. This package attempts to provide them with tools that can help set up these new capabilities.
Although Informix is not especially oriented toward data-warehouse usage, it includes a lot of functionality that can help:
  • Dynamic Scalable Architecture (DSA)
    The internal engine architecture permits scalability and excellent performance. It also allows for extremely efficient usage of your hardware
  • Decision Support Queries
    IDS has several parameters that control the type of queries made in a data-warehouse-type environment (decision support queries). You can control resources like memory, the number of parallel scans etc. Also, these queries are split into parts that can run concurrently (scans, sorts, orders etc.)
  • Read ahead capabilities and light scans
    These allow for more efficient batch reads, like the ones occurring in sequential scans of big tables
  • Configurable page sizes
    This can help improve performance for large batch reads/writes
  • High performance loader
    A very efficient way to load data into your tables
  • Flexible table fragmentation
    Can be used for data lifecycle management, parallel queries, fragment elimination in queries etc.
And what about the future? Well, if you take a look at the recent webcast about the IDS roadmap you can see features like "automatic storage provisioning", "warehouse" and "external table support" (currently possible, but not built in). In the very near future, according to the early validation program, IDS 11.50.xC4 will support table compression.
So, there are old features, new features, and better integration between Informix and other IBM products that can make it a good choice to build a BI supporting structure.

You can find more information about this announcement in the following resources:

Monday, March 2, 2009

FUD for thought

If you're a frequent visitor here, or to any other Informix-related site, there's a good chance you also check the Informix usenet group. If you don't know what I'm writing about, you can check it at

I'm a usual reader and poster on that group, and recently one of the topics got my attention. I think it deserves a bit of our time in order to understand a few things...
One person posted a message on the group asking for an "IBM 10 year plan for Informix". The reason he needed it, as far as the post says, relates to the fact that, allegedly, a sales representative from an IBM competitor had such a plan for its database, and the consultants were saying IBM didn't have a similar documented plan for Informix.

Let me tell you that usually the technical posts get more participation than the "commercially oriented" ones. But in this case we could see very quick answers from IBM business partners, IBM Informix architects and even from Oracle employees... All of them agreed on something: neither IBM nor Oracle (and probably no other software vendor) has a plan for 10 years. And why not? Well, simply because in 10 years everything can change in the IT industry. New trends, new needs, new paradigms. So what would be the point of making a plan that would almost surely be changed along the way?
This leads us back to the initial situation... so why would someone try to raise Fear, Uncertainty and Doubt (FUD) about IBM and Informix? The simple answer, provided by some, was that it's a usual sales tactic when the customer is leaning towards the competition.
Why not use technical or other commercial arguments in that situation? Probably because it's harder, or simply because there are no other arguments strong enough...
This kind of situation, according to some of the posters, is relatively common. I've seen it happen, or at least I've seen its effects, when people who are simply not informed ask me if IBM is trying to move Informix customers to DB2. So, let me try to raise a few questions and propose some answers:
  1. Does IBM have a plan for Informix?
    Yes. You can check a roadmap in a recent presentation which you can access at:
  2. Does IBM have a ten year plan for Informix?
    No. It doesn't have a ten year plan for any product, simply because it wouldn't be realistic. If such a plan existed it would ignore all the future changes, trends and shifts in the IT industry.
    On the other hand, IBM has a nearly ten year past with Informix. Think about that when you read the next questions
  3. Has Informix been improved in the hands of IBM?
    Sure. IDS releases since 9.40 have been done by IBM. I prefer to exclude 9.40 because it happened almost immediately after the acquisition, so the development phase was done by Informix Corporation. But think about IDS 10, IDS 11 and IDS 11.5. Think about all the new features. Think about MACH 11. Think about all the improvements in security (LBAC, default roles, SSL, single sign-on, column level encryption, encryption expert, encrypted replication, PAM authentication...). Think about compression (11.50.xC4, as already announced in the early validation program). Think about Unicode. Think about the common client (DRDA). Think about performance (shared statement cache, non-blocking checkpoints, index self join, CPUVP cache, direct I/O...). Think about the Open Admin Tool. Think about all the new features in enterprise replication. Think about dynamic SQL in stored procedures. Think about ontape and OnBAR improvements. Think about SQL trace. Think about the SQL admin API. Think about the DB scheduler...
  4. Does IBM talk about Informix?
    Yes. Have you noticed the recent announcements about Informix availability in the "cloud"? Have you noticed that we have regular webcasts? Have you read all the IBM redbooks about Informix? Have you been to any of the proof of technology events all around the globe? Have you noticed which "brands" appear at the bottom of the 2009 EMEA IOD Conference page?
  5. Did Informix lose any of its strengths in the hands of IBM?
    No... It's still stable, fast, easy to set up and manage, and "light" (both in installation size and in resource consumption)
  6. Is it harder to upgrade than before?
    No... it's still the same process to move from IDS 7.31 to IDS 11.50 as it was from 7.30 to 7.31. Simple and quick...
  7. Is it more expensive?
    No. It's still competitive. You get more features included in the base price than with most of its competitors. And you can get a fair deal if you run it in virtualized environments.
  8. Do we have more or less information about Informix?
    More. Currently we have a lot of blogs and sites with information about Informix. The IBM site, although complex, has all the manuals, product lifecycle information, product downloads etc.
  9. Is it harder to test Informix?
    No. You can download a developer version from the IBM site. You can even download a virtual appliance with the software and some demos (MACH 11 for example)
  10. Did we lose Informix technical support quality?
    No. In recent surveys, Informix customer satisfaction was shown to be high. You don't get that with lousy technical support.
  11. Do we still have a great international user group?
    Yes! IIUG continues to do a terrific job. Even in the current difficult times, IIUG continues to promote Informix and works closely with IBM. Just check some recent announcements, the IIUG 2009 user conference, the surveys, the monthly insider etc.
  12. What about Informix training?
    IBM has been updating the Informix training courses. I had the pleasure to try some of them in the ILO (Instructor Led Online) environment, so I can assure that the new features are covered in the courses.
So... what the mentioned comp.databases.informix thread talked about is FUD (Fear, Uncertainty and Doubt). What I presented above are verifiable facts. The decision between the two should be made by customers. Do you prefer to trust those who have been providing good, fast, stable, scalable and easy-to-manage software, or those who try to scare you with statements that are contradicted by facts and by a history of almost 10 years?