My_DataStage_Tips: Select Max for surrogate key in EE

The surrogate key stage in 8.0 can be used with DB Sequences or using a Flat File to manage the keys (set the 'Source Type' option to Flat File). Fortunately, seeding the flat file with the max value of the table is easy, unfortunately it can't be done in the same job as the job that generates the keys.

There are 2 ways to do it --

1. Create a job that does a 'select max(..)' from the target table, connected to a surrogate key stage (set the 'Update Action' to 'Update'). This will seed the state file with the max value. In the job that needs the keys, use the surrogate key stage and this state file to generate keys (SK stage takes 1 input link, and produces 1 output link with the SK column appended). If this is the only thing that inserts rows into the table, you don't need run the 'Update' job again, the state file will remember where it left off for next time.

2. When using a Flat File to manage keys, you can supply an initial value in the surrogate key stage. This value can be a job parameter, so you can hook this together with something that does the select max() and sets the job param in a job sequencer.

BTW -- the new surrogate key operator was designed for the parallel execution environment, so key generation is handled without the use of @PartitionNum or any of that. It also supports multiple jobs (running in parallel) that are getting keys from the same state file.

And finally... you can also use the surrogate key generation functionality directly in a transformer (rather than using the SKG stage). It requires a little set up in the transformer stage properties, then you can use the utility function 'NextSurrogateKey()' as the derivation for the SK column.

If the any of this sounds like something that you want to try, let me know and I can set you up with some simple examples.

My_DataStage_Tips

Wednesday, April 7, 2010

Select Max for surrogate key in EE

No comments:

Post a Comment

Followers

About Me

Search This Blog

Blog Archive