Wednesday, April 7, 2010

Sequential file stage File Name Column option

Question:
In the Sequential File stage on the parallel canvas there is an option to specify a File Name column. In theory, this allows you to read from multiple files (if the Read Method is set to "File Pattern") and to populate the specified outbound column with the name of the file each record came from.

However, if I specify a wildcard in the File Pattern, such as C:\Test_Data\*, I don't get the individual file names in the outbound column; I get C:\Test_Data\* for every record. That's rather lame, and a defect if you ask me. Has anyone else experienced this? This is on a Windows implementation.

Answer:
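By default, the Sequential File stage takes all the files matched by the pattern and concatenates them, reading them as one continuous stream of data. Because the boundaries between files are lost in that stream, the stage cannot determine which file an individual record came from, so the only name it can report is the pattern itself.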
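To make the default behaviour concrete, here is a minimal Python analogy, assuming a simple one-record-per-line text format. It is not DataStage code and not how the stage is actually implemented; `read_pattern_concatenated` is a hypothetical name used only for illustration. It chains every matched file into a single stream, so the only "file name" it can attach to a record is the pattern string.

```python
import glob
import itertools


def read_pattern_concatenated(pattern):
    """Hypothetical analogy of the default pattern read: all matched files
    are chained into one stream, so each record can only be tagged with the
    pattern, not with the file it actually came from."""
    paths = sorted(glob.glob(pattern))
    handles = [open(p, encoding="utf-8") for p in paths]
    try:
        # itertools.chain joins the files end to end; the consumer of the
        # stream never sees where one file stops and the next begins.
        for line in itertools.chain.from_iterable(handles):
            yield pattern, line.rstrip("\n")
    finally:
        # Close every file once the stream is exhausted or abandoned.
        for h in handles:
            h.close()
```

Every record comes back paired with the same pattern string (for example C:\Test_Data\*), which mirrors what the question describes seeing in the outbound column.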

You can get the individual file names by setting the environment variable APT_IMPORT_PATTERN_USES_FILESET. This changes how the Sequential File stage handles file patterns: instead of concatenating the matched files, it creates a file set from them. Depending on your configuration, this can give better parallelism, and because each file stays a separate member of the file set, the file names remain available to populate a File Name column.
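For contrast, a minimal Python analogy of the file set approach, under the same assumption of one record per line. Again this is only an illustration of the idea, not DataStage's implementation, and `read_pattern_as_fileset` is a hypothetical name. Each matched file is opened and read on its own, so the reader always knows which file the current record belongs to.

```python
import glob


def read_pattern_as_fileset(pattern):
    """Hypothetical analogy of a file-set read: each matched file is treated
    as a separate member, so every record can carry its source file name."""
    for path in sorted(glob.glob(pattern)):
        # Reading one file at a time keeps the file boundary visible,
        # which is what makes a per-record file name possible.
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                yield path, line.rstrip("\n")
```

This sketch reads the members one after another; the parallelism gain mentioned above would come from the real stage distributing file set members across processing nodes, which the sketch does not model.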
