Thursday, March 4, 2010

DataStage Performance Tuning Tips

Some of the key factors to consider:

  1. Stage the data coming from ODBC/OCI/DB2UDB stages (or any database) on the server using hash/sequential files, both for optimum performance and for data recovery in case the job aborts.
  2. Tune the OCI stage's 'Array Size' and 'Rows per Transaction' values for faster inserts, updates and selects.
  3. Tune the 'Project Tunables' in the Administrator for better performance.
  4. Use sorted data for the Aggregator.
  5. Sort the data as much as possible in the database and reduce the use of DS sorts for better job performance.
  6. Remove data that is not used from the source as early as possible in the job.
  7. Work with the DB admin to create appropriate indexes on tables for better performance of DS queries.
  8. Convert some of the complex joins/business logic in DS to stored procedures for faster execution of the jobs.
  9. If an input file has an excessive number of rows and can be split up, use standard logic to run jobs in parallel.
  10. Before writing a routine or a transform, make sure the functionality you need is not already provided by one of the standard routines supplied in the SDK or DS utilities categories.
    Constraints are generally CPU intensive and take a significant amount of time to process. This is especially true if the constraint calls routines or external macros; if it is inline code the overhead will be minimal.
  11. Try to apply the constraints in the 'Selection' criteria of the source stage itself. This eliminates unnecessary records before joins are made (see the SQL sketch after this list).
  12. Tuning should occur on a job-by-job basis.
  13. Use the power of DBMS.
  14. Try not to use a sort stage when you can use an ORDER BY clause in the database.
  15. Using a constraint to filter a record set is much slower than performing a SELECT … WHERE….
  16. Make every attempt to use the bulk loader for your particular database. Bulk loaders are generally faster than using ODBC or OLE.
  17. Minimise the use of the Transformer stage; where possible use Copy, Modify, Filter or Row Generator stages instead.
  18. Use SQL code while extracting the data.
  19. Handle nulls explicitly.
  20. Minimise the warnings.
  21. Reduce the number of lookups in a job design.
  22. Use no more than 20 stages in a job.
  23. Use an IPC stage between two passive stages; this reduces processing time.
  24. Drop indexes before loading data and recreate them after the data is loaded into the tables.
  25. Generally we cannot avoid lookups if the requirements make them compulsory.
  26. There is no hard limit on the number of stages (such as 20 or 30), but a large job can be broken into smaller jobs, using Data Set stages to store the data between them.
  27. The IPC stage is provided in server jobs, not in parallel jobs.
  28. Check the write cache of the hash file. If the same hash file is used both for lookup and as a target, disable this option.
  29. If the hash file is used only for lookup, enable 'Preload to memory'. This will improve performance. Also check the order of execution of the routines.
  30. Don't use more than 7 lookups in the same Transformer; introduce new Transformers if you exceed 7 lookups.
  31. Use the 'Preload to memory' option on the hash file output.
  32. Use 'Write to cache' on the hash file input.
  33. Write into the error tables only after all the Transformer stages.
  34. Reduce the width of the input record: remove the columns you will not use.
  35. Cache the hash files you are reading from and writing into. Make sure your cache is big enough to hold the hash files.
  36. Use ANALYZE.FILE or HASH.HELP to determine the optimal settings for your hash files.


This would also minimize overflow on the hash file. (This is needed for server jobs.)
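
To illustrate tips 11, 15 and 16, here is a minimal SQL sketch of pushing the filter and the sort into the source database query instead of using a downstream constraint or Sort stage (the table and column names are made up for illustration):

-- Filter and sort in the database rather than in DataStage
SELECT   order_id,
         customer_id,
         order_amount
FROM     sales_orders
WHERE    order_status = 'OPEN'        -- replaces a Transformer constraint
ORDER BY customer_id                  -- replaces a downstream Sort stage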

One further tip: if possible, break the input into multiple threads and run multiple instances of the job.
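
One way to do this is to compile the job as multi-instance and start several invocations from the dsjob command line, each handling its own slice of the input. This is only a rough sketch; the project, job, parameter and file names below are hypothetical and the exact dsjob options vary by release:

# Start three instances of a multi-instance job, one per input slice
dsjob -run -param InputFile=/data/part1.txt MyProject LoadSales.part1
dsjob -run -param InputFile=/data/part2.txt MyProject LoadSales.part2
dsjob -run -param InputFile=/data/part3.txt MyProject LoadSales.part3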


Monday, February 22, 2010

DataStage Director Can't open (IS 8.1)

Problem:
The job can be created and compiled using DS Designer, but when opening it in the Director the following error is thrown:
Error calling subroutine DSR_EXEJOB (Action=5); check DataStage is set up correctly in project SBIDEMO (The connection is broken (81002))

Cause:
The Director crashes with an 81002 error when a job is invoked and the log view is selected, even though you can log in to the Designer and create and compile a job. The issue occurs if the jobmon (PX engine job monitor) application crashes or is not running (in our case it crashed, with nothing in its log file, after running several jobs successfully), and it is due to the changes introduced with 8.1 that direct DataStage logging to xmeta instead of the local DSEngine log files.

Resolution:
It is possible to revert to the old 8.0.1 functionality by setting the following project-level parameters in the DSParams file:

RTLogging=1
ORLogging=0

After making the switch back to DSEngine-level logging, an error will be logged in the job log indicating that the job monitor is not running, but the job will run to completion: main_program: Failed to initialize job monitoring. Monitor information will not be generated.
The job monitor can be started manually via the following steps while in the InformationServer/Server directory:

# . ./DSEngine/dsenv
# ./PXEngine/java/jobmoninit start
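
To check whether the job monitor is actually running, one simple approach (a rough sketch; on many installations the monitor shows up as a Java process whose command line contains 'JobMon', but the exact name can vary by release):

# ps -ef | grep -i jobmon | grep -v grep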


Tuesday, January 12, 2010

DataStage logging changes introduced at 8.1

Abstract
DataStage logging was changed at release 8.1 to log job run detail records into the operational repository (xmeta) rather than the local project-level log files (RT_LOGxxx) that were used for each job in prior releases. As a result of this change we have seen the following issues:

Job run times can increase and Information Server client applications may not be as responsive, depending on the amount of log data that is generated by the jobs.

In some cases no job log entries are written to the log as viewed via the DataStage Director client or Web Console, even though jobs appear to run; in other cases job log detail entries may be missing or may not show up until later.

Job log purge operations may run slowly or fail, depending on the number of log entries.


Content
These issues can be worked around by reverting to the logging mechanism that was in place prior to release 8.1 by implementing the following project level changes on the DataStage engine server.
Edit the project-level DSParams file, typically located in /opt/IBM/InformationServer/Server/Projects/%projectName% for Linux/UNIX and C:\IBM\InformationServer\Server\Projects\%projectName% for Windows, and modify the following two lines as shown to revert to pre-8.1 logging:

RTLogging=1
ORLogging=0

Keep in mind that newly created projects inherit their settings from the template DSParams file, located by default in /opt/IBM/InformationServer/Server/Template/DSParams for Linux/UNIX and C:\IBM\InformationServer\Server\Template\DSParams for Windows; that file should also be modified to ensure that new projects use the pre-8.1 logging mechanism.

After switching to RTLogging, the existing log detail entries in the repository can still be viewed via the Web Console or Server Console, but they will not be accessible from the DataStage Director. These log entries should be purged when they are no longer required by scheduling a background purge of the log entries; purging up to 10,000 at a time is suggested to minimize the memory requirements and to avoid WebSphere Application Server out-of-memory issues that can occur when trying to purge all the log entries as one task.

The following commands will schedule a task to delete up to 10,000 log entries at a time and will loop until all the DataStage entries are purged. This could take some time depending on how many log entries you have to purge. This method has the advantage that it can be run without stopping other activities.

Use the following command line (from the …InformationServer/ASBServer/bin directory) to create the scheduled auto purge task (replace <suiteAdminUser> and <password> with a suite administrator user and password):

For Windows:

LoggingAdmin -user <suiteAdminUser> -password <password> -create -schedule -name "DS job event purge task" -frequency -minutes 30 -threshold 10000 -percentage 100 -includeCategories IIS-DSTAGE-RUN

For Linux/UNIX:

./LoggingAdmin.sh -user <suiteAdminUser> -password <password> -create -schedule -name "DS job event purge task" -frequency -minutes 30 -threshold 10000 -percentage 100 -includeCategories IIS-DSTAGE-RUN

After all DS log events have been purged, the scheduled task can be deleted with the following command line:

For Windows:

LoggingAdmin -user <suiteAdminUser> -password <password> -delete -schedule -name "DS job event purge task"

For Linux/UNIX:

./LoggingAdmin.sh -user <suiteAdminUser> -password <password> -delete -schedule -name "DS job event purge task"

APAR JR31806 has been submitted for this issue.

Tuesday, December 8, 2009

METASTAGE_LOC_INFO in CAHETLCPCA type *FILE not found

Error message (from the DataStage level):
FPCONDTL,0: Fatal Error: Fatal: [IBM][CLI Driver][AS] SQL0913N Unsuccessful execution caused by deadlock or timeout. Reason code "CCAREPDATA.FPCONDTL". SQLSTATE=57033
DSTAGE-DB2CLI-0026`:`SQLFetch: Error retrieving results from server.

Error message (from the AS/400 level):

Target job assigned to handle DDM connection started by source system over
TCP/IP.
Local relational database accessed by LTOH0010ISPHN01.
METASTAGE_LOC_INFO in CAHETLCPCA type *FILE not found.
Prepared statement STMT0005 not found.

Cause:
This problem happens when a job is trying to access a table while a parallel insert/update is being processed against the same table. In this case a SHARED lock is applied to the particular record that is under transaction, as it has not yet been committed.

Resolution:
At the stage level there is an option to set the isolation level, which can be changed from Cursor Stability to Read Uncommitted. With this setting we can also fetch records that have not yet been committed, ignoring the SHARED lock held on them.
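
For reference, Read Uncommitted corresponds to the WITH UR clause in plain DB2 SQL; a quick sketch against the table named in the error above (this only illustrates the isolation level, not necessarily the SQL the stage generates):

SELECT *
FROM CCAREPDATA.FPCONDTL
WITH UR    -- read rows that are not yet committed, ignoring the SHARED lock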

APT_Decimal::operator=: the source decimal has even precision, but non-zero in the leading nybble, or is too large for the destination decimal

Error Message:
APT_CombinedOperatorController,0: APT_Decimal::operator=: the source decimal has even precision, but non-zero in the leading nybble, or is too large for the destination decimal.

Resolution:
The error message is not very informative as far as the DataStage logs are concerned. So:

Set APT_CHECK_DECIMAL_VALIDITY=1 at the project level.
Once you rerun the job, you will see a more detailed log message such as:
APT_CombinedOperatorController,0: Fatal Error: APT_Decimal::assignFrom: the source decimal (precision = 38, scale = 10) is too large for the destination decimal (precision = 3, scale = 0).

Now, looking at this log, you can identify which target field of length Decimal(3,0) is being mapped from an Integer (default precision 38:10). Then you can go back and modify the job accordingly.
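
If the source value genuinely fits in the target, one option is to constrain the precision explicitly in the extraction SQL so that the column metadata already matches the Decimal(3,0) target; a hedged sketch with made-up table and column names:

SELECT CAST(status_code AS DECIMAL(3,0)) AS status_code
FROM   source_table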

Some other points (For Numeric / Decimal):
The following table specifies the storage size required based on the precision specified for the NUMERIC or DECIMAL data type:

Precision    Storage size
1 - 9        5 bytes
10 - 19      9 bytes
20 - 28      13 bytes
29 - 38      17 bytes

IS v8.1 - installation of IA 8.1.1 - how to keep it silent

Question:
Trying to automate the client upgrade of Information Analyzer from v8.1 to v8.1.1 (installation of IA-811-upgrade-V2-windows.ispkg)

- Is there a way to avoid the need to enter '1' and 'Enter' to accept the license agreement during a console installation?

For the main installation a response file can be used, but is there a way for updates?

My installation command:

C:\IBM\InformationServer\ASBNode\apps\jre\bin\java.exe -jar updater.jar -i C:\IBM\InformationServer -p C:\patch\81-critical-patch.ispkg -console

Answer:
InstallPatches.sh:
# Wrapper around the Information Server updater; passes any extra arguments straight through.
"\IBM\InformationServer\ASBNode\apps\jre\bin\java" -jar updater.jar -i "/IBM/InformationServer" "$@"

TestInstall.sh:
# Report the installed version before and after, and feed '1' to the license
# prompt through a here-document so the console install needs no manual input.
./InstallUpdate.sh -version -console
./InstallUpdate.sh -p IA-811-upgrade-V2-windows.ispkg -console -force << EOF
1
EOF
./InstallUpdate.sh -version -console

Lost DataStage Designer menu

For several Designer questions on the techlist, especially those dealing with windows that show up off screen due to resolution / dual-monitor changes, it has been recommended to delete the registry keys under HKEY_CURRENT_USER\Software\Ascential Software\DSDesign to reset the Designer window positions. So you may want to back up that registry key and then clear it out. I believe one of the DSDesign-Bar? entries under it is the position of the menu, but I couldn't tell which one. If something bad happens, you should be able to restore your backup of the registry key (see the command sketch below).
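
A minimal sketch of backing up and then clearing the key from a Windows command prompt, using the standard reg.exe tool (pick any backup file name you like):

reg export "HKCU\Software\Ascential Software\DSDesign" "%USERPROFILE%\DSDesign-backup.reg"
reg delete "HKCU\Software\Ascential Software\DSDesign" /f
rem To restore it later: reg import "%USERPROFILE%\DSDesign-backup.reg"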

I deleted the key on my desktop after docking the menu toolbar at the bottom of the screen. The registry key was recreated the first time I started the Designer client, and my menu toolbar was back at the top as normal. So this would be my first step in solving the problem.