Wednesday, May 15, 2013

DB2 for z/OS Performance Tuning: the Rising Tide and the Home Run

I've been working with DB2 for z/OS for about 25 years. During most of that time, I've been actively engaged in performance tuning work -- specifically, performance tuning as it pertains to the CPU and elapsed time associated with application access to data in DB2-managed databases. I've learned that the best results, in terms of efficiency and throughput gains, are achieved via a two-pronged approach that I conceptualize as a rising tide and home runs. In this blog entry, I'll explain these concepts and try to provide you with a road map for achieving your own DB2 for z/OS performance tuning successes.

The rising tide

Here in the USA (and perhaps in some other countries), there's an old saying: "A rising tide lifts all boats." The DB2 tuning actions that I think of as being in the "rising tide" category are those that positively affect the performance of all SQL statements (or at least many statements) that execute in the target DB2 subsystem. Examples of these actions include the following (and I'm providing here a few links to blog posts and articles of mine that provide additional information):

"Rising tide" DB2 performance tuning actions are attractive for several reasons. Chief among these:
  1. They tend to have a wide-ranging impact on the performance of DB2 data access operations. That's the "lifts all boats" effect to which I alluded above. In taking one or more of these actions, you can expect to see improved performance (particularly in terms of CPU efficiency) for lots of the SQL statements that execute in your DB2 subsystem -- not just one or two statements.
  2. They're often pretty easy to implement. Want to page-fix buffer pool BP2 (for example)? OK -- issue an -ALTER BUFFERPOOL(BP2) PGFIX(YES) command (the change will take effect when the pool is next deallocated and reallocated -- usually as a result of "bouncing" the DB2 subsystem). Want to increase the size of your DB2 dynamic statement cache? Change the value of the EDMSTMTC parameter in your ZPARM module. Want to boost CICS-DB2 thread reuse? Change PROTECTNUM to a non-zero value in the DB2ENTRY resources associated with your most frequently executed CICS transactions. In these and other cases, no database design changes are needed, and no application code changes are required.
  3. They can be implemented quickly. This is directly related to item 2, above. When a tuning action requires a change to an SQL statement issued by a program, that change is not going to go into production overnight. Even if it's a pretty simple change, a change request has to be generated and approved, the change has to be assigned to a programmer, it has to be tested, etc. All this could take weeks. Because "rising tide" tuning actions generally don't necessitate modification of application code, they can often be put into effect in relatively short order.

All good; and yet, there is a pretty significant "yeah, but" associated with most "rising tide" DB2 performance tuning actions (as in, "Yeah, that's nice, but..."). The big "yeah, but" to which I'm referring is the typically modest magnitude of the performance gain that usually results from implementation of a "rising tide" tuning step. Very often, what you'd expect to see is a reduction in CPU consumption of maybe a few percentage points for SQL statements impacted by a "rising tide" change. Basically, you're improving a lot of DB2-access operations by a little bit each. Saving a little bit of CPU for a lot of SQL statements is a good thing, of course, but along with the "lots of little" improvements, you want some big scores. That's where "home run" tuning comes into play.

The home run: narrow focus, big payback

Concentrate your tuning efforts on a single SQL statement, with all performance-improving options on the table, and you might see the CPU time (and, quite possibly, the elapsed time) of that statement go down by 90% or more. I've seen that happen time and time again over the past couple of decades, with query elapsed and CPU times slashed as a result of statement-focused tuning actions. Examples of such actions include:
  • Defining a new index on a table to provide a better-performing access path for an expensive query.
  • Using Data Studio Statistics Advisor recommendations to update DB2 catalog statistics via RUNSTATS to reflect the non-uniform distribution of duplicate values in a column of a table, thereby enabling DB2 to choose a much better access path for a query.
  • Adding a non-result-set-altering predicate to a query to provide DB2 with more information about the characteristics of qualifying rows, enabling selection of a much superior access path for the query.
  • Changing a "between" predicate in a query to an in-list predicate, and changing another predicate in the same query to be indexable (by removing an arithmetic column expression in the predicate), resulting in an increase in the number of index key columns on which DB2 can match in processing the predicates, and a much better-performing access path.
  • Changing a join query with OR'ed predicates referencing columns in two different tables (WHERE TABLE_A.COL1 = 'XYZ' OR TABLE_B.COL2 = 123) to a union of two selects without such OR'ed predicates, leading to vastly improved pre-join row filtering.
  • Creating an index on a column expression to make a formerly non-indexable predicate indexable and stage 1.

In these and other cases, I've observed that chances of success are improved when:
  1. You do a good job of target selection. Because statement-level tuning work might improve performance only for the one query on which you're focused, choose your target query wisely. Make sure that the query you've selected for tuning is one for which a big performance improvement would be truly valuable for your organization.
  2. You don't develop tunnel vision at the outset of a statement-level tuning effort. By this I mean that you don't want to settle prematurely on a particular technique for reducing the query's CPU and/or elapsed time, as doing so could blind you to other means of accomplishing your goal. Consider all available tuning options. What worked for one query may or may not work for another.
  3. You get the right people involved. Not only DBAs and application developers, but business-knowledgeable people and data domain experts, as well. Different people will view a query that's up for tuning in different ways, and that can spur creative and effective thinking in terms of identifying actions that boost query performance.

One other thing: working to reduce CPU and elapsed time for one particular query might in fact improve the performance of other similar queries. For example, an index on an expression created to cut the response time of one query might similarly shorten run times for other queries that also have predicates with the column expression in question. Think of that -- getting better performance for several queries as a result of an action taken to drive down the CPU cost of a single query -- as hitting a home run with men on base (sticking with the baseball analogy).

It's not an either/or thing

By all means, you should avoid getting into arguments over the relative value of the "rising tide" versus the "home run" approach to DB2 for z/OS performance tuning. One approach produces low-cost (particularly in terms of required effort) wins that have wide-ranging -- if generally modest -- cost-reducing effects on query execution. The other approach is typically more labor-intensive and can require more time for implementation, but often delivers big performance improvement results for queries singled out for tuning. What you want is to use both approaches in an ongoing, systematic way: make your query execution environment as conducive to good performance as possible, and look for ways to make your most-costly queries much less costly.

Pursuing better DB2 application performance on these two fronts brings to my mind one more baseball-related image. The San Francisco Giants play their home games in a baseball stadium that's right on the water. Hitters who can drive the ball to right field with some power can land a ball in San Francisco Bay. That's where the rising tide meets the home runs, and that's where you want to be with respect to tuning the performance of your mainframe DB2 system.

Monday, April 29, 2013

Application Performance: When Reducing Concurrency can Improve Throughput

There's an interesting application performance situation that I've encountered several times over the past twenty years or so, most recently a couple of weeks ago. I was talking with a DBA over lunch at a regional DB2 users group meeting. He mentioned that he and some of his colleagues had been working to address a performance problem involving a purchased application that accessed a DB2 for z/OS-managed database. A particular data-changing application process was not performing well, and the cause was clear: lock contention -- especially deadlocks. Row-level locking was in use, so that contention-reducing card had already been played. One of the moves being contemplated was a further reduction in the already-low deadlock detection cycle time. I had a different suggestion: reduce the level of execution concurrency for the process. In other words, reduce the degree of multi-threading for the process. I said that serious consideration should be given to single-threading the application process. That recommendation ended up putting the DBA's team on the path toward success.

The advice I gave to my DBA friend may seem contradictory at first blush. I mean, increasing the level of multi-threading for a process is supposed to boost application performance, right? Well, yeah, if the process is designed for concurrency. Some applications are so designed, and some aren't. Design for concurrency is, of course, chiefly applicable to processes that change data, because such processes acquire X-locks on DB2 data pages or rows. Most especially, design for concurrency applies to processes that update or delete data rows, since inserts are generally not blocked by locks (I say "generally" because an insert will wait for an X-lock held by another process on the target page to be released, if the table is hash-organized -- I described hash-organized tables in an entry I posted to this blog a few months ago). If a process is such that the same set of rows can be changed by overlapping units of work (given a multi-threading scenario), it is possible that deadlocks will negatively impact application throughput (of course, if concurrently executing data-changing processes act on different sets of rows -- as is very often the case -- then the level of execution concurrency can go very high with no lock contention problems whatsoever).

If an application process driving a lot of deadlocking is one that was developed in-house by your organization's programmers, it may be that a design modification could improve concurrency by changing the order in which data-changing operations occur (for example, row-changing actions might be moved to the "end" of a unit of work, or a sequence of change A / change B / change C might be altered to B-A-C or otherwise re-sequenced). If, on the other hand, the application in question is a product purchased from a vendor (as was the case for the DBA to whom I spoke at the user group meeting), code modification likely won't be an option. Even for an in-house-developed application, changing a program's design to alleviate a deadlock problem may not be feasible or fruitful. What then?

That's when your thoughts should turn to reducing the level of multi-threading for the process -- perhaps as far as going the single-thread route. I know for a fact that reducing concurrency of execution for a process can improve performance, even if you go all the way to one-at-a-time. Here's how: if your deadlock detection time (set via the DEADLOK parameter in your IRLM PROC) is at the default of 1 second, two deadlocked transactions could wait for a second (that is, 1000 milliseconds) before one is selected by DB2 to be the "loser." That transaction's work then gets rolled back, and it has to be re-tried (if the associated program has retry logic). Now, suppose that the process is single-threaded. If the average transaction time is, say, 50 milliseconds, and a transaction gets there "second" (i.e., arrives when another instance of the process is in the midst of execution, requiring a wait because of the single-thread situation), it will have to wait a maximum of 50 milliseconds before it can get going, AND there won't be any rollbacks or required retries. Much better, right?

"Wait," you might say, "'Better' depends on the transaction arrival rate," and you'd be right in focusing on that as a key factor -- that, and average transaction elapsed time. If the average elapsed time for a transaction that you want to single-thread is 50 milliseconds, then an arrival rate of up to 20 transactions per second should be OK in terms of avoiding big back-ups in a single-threaded scenario. That may not sound like a very high transaction rate to you, but I've seen that some processes that have had deadlocking problems when the level of multi-threading is high in fact have transaction arrival rates that make single-threading a very viable throughput-boosting technique. And, keep in mind that reducing the level of multi-threading for a process to two or three -- as opposed to one -- might be sufficient to alleviate a deadlock problem while providing support for a higher transaction arrival rate than could be handled with single-threading in effect.

How might one go about reducing multi-threading for a process in order to boost throughput through elimination (or near-elimination) of deadlocks? There are several alternatives, depending on the nature of the process and other database-accessing applications running in the system. If the process with deadlock issues is the only one accessing a table or set of tables when it executes, single-threading could be achieved through LOCK TABLE IN EXCLUSIVE MODE statements. If other processes require access to the target table or tables while the deadlocking transactions execute, one-at-a-time locking can be effected in a more granular fashion. Suppose, for example, that a process changes some rows in a set that have a common identifier, such as a customer number. If, at the beginning of a transaction, a column in a row with a given customer number in a "master" or "control" table is updated to equal itself (e.g., UPDATE CUST_MASTER SET COL_A = COL_A WHERE CUSTNO = :hvar), a transaction can change data in rows with that customer number without deadlocking with another instance of the process -- this is a very low-cost way to get a serializing X-lock on a row or page.

If the process is executed as a CICS transaction, specifying THREADLIMIT=1 and THREADWAIT=YES in the associated DB2ENTRY CICS RESOURCE will result in one-at-a-time execution.

For a DRDA-using client-server process, the number of connections to a DB2 for z/OS system for an application can be controlled on the client side. DB2 10 for z/OS provided a server-side means of controlling the number of connections from an application server: profile tables (something about which I blogged a few weeks ago). DB2 server-side control of client connections is even more dynamic and comprehensive if you have IBM InfoSphere Optim Configuration Manager.

Whatever your preferred means of achieving single-threading (or just a reduced level of multi-threading), make sure that this is in your application tuning tool box. Under certain circumstances, you really can get more throughput for an application process by having less of it executing in the system at one time.

Tuesday, April 23, 2013

DB2 for z/OS: Goodbye "Security" Views, Hello Row Permissions and Column Masks

It's an age-old requirement -- one that's been around as long as DB2 itself: ensure that a particular group of users can see only certain rows in a given table, and/or ensure that values in a certain column of the table are transformed before being returned to users in the group. For example, employees who work in a branch office of an organization might be permitted to see information in a CUSTOMER table for customers assigned to that branch, but prohibited from seeing information for customers assigned to other branches; furthermore, these employees are permitted to see an indication of the relative magnitude of the income of a customer assigned to their branch, but not actual income values.

For a long time, such needs were commonly addressed through the creation of so-called "security" views. The mechanism could be implemented in something like the following manner:
  1. A view that filters out rows that do not have the identifier of an employee's assigned branch in the BRANCH column is defined on the CUSTOMER table. The view might be distinguished by a high-level qualifier (e.g., BRANCH.CUSTOMER, versus PROD.CUSTOMER for the underlying table), or it might have a different unqualified name (e.g., BRANCH_CUSTOMER). Such a view could also transform and abstract information in the table's INCOME column as required.
  2. The SELECT privilege on the view is granted to an appropriate secondary authorization ID (or set of secondary IDs), and/or (in the static SQL case) the EXECUTE privilege on packages of programs that access the view is granted to that secondary authorization ID (or set of secondary IDs).

That seems simple enough, but as many DBAs and security administrators will attest, it's not an ideal solution. Downsides of the view approach to row- and column-level data security include:
  • The name of view must be different from the name of the table on which it's defined, and different from any other existing view in the catalog of the DB2 system. This can complicate things from an application perspective. If the unqualified name of the view is what makes the name unique (e.g., a view BRANCH_CUSTOMER is defined on the table named CUSTOMER), SQL statements targeting the view will have to explicitly reference the unique unqualified view name, or they will have to reference an alias that has the same unqualified name as the underlying table but which resolves to the different unqualified name of the view (and that alias will of course have to have a high-level qualifier that's different from the underlying table's high-level qualifier, since the unqualified names of the alias and the table are identical). If the view's high-level qualifier is what makes its name unique (e.g., a view BRANCH.CUSTOMER is defined on the table PROD.CUSTOMER), packages of programs with static SQL statements referencing unqualified object names will have to be bound into different collections using different bind-time qualifiers, and the right collection will have to be selected (likely via SET CURRENT PACKAGESET) when users need access to the view versus access to the underlying table. Also for a view distinguished from an underlying table by way of a different high-level qualifier, users issuing dynamic SQL statements (perhaps generated by an application or a tool) will have to utilize the correct fully qualified object name to access the view versus the table, or a SET CURRENT SQLID = 'BRANCH' will have to be issued to ensure that the right high-level qualifier is added to unqualified object names (and that may or may not be feasible, depending on the application in question). Any way you slice it, the requirement for a unique name for the view creates a hassle factor when it comes to object access and management.
  • As the number of different row and column restrictions increases, the number of security views can become rather large. In some cases, security views can proliferate like crazy. Not only can this be a database administration headache, it can also get in the way of making database design changes that could improve application performance or functionality. I recall a situation, a few years back, in which a potentially performance-enhancing database design change was vetoed at a certain DB2 for z/OS site because it would require making a change to a view. There were views defined on other views at this site (primarily security views), going up several layers, and the view that would be changed was at the ground level, so to speak. There was so much concern about the effect that a modification of this foundational security view might have on other views that we had to go with a less desirable database change that would leave the view definition intact.
  • The "super user" loophole. Security views can do a pretty good job of restricting data access at the row and/or column level for most users, but what about "super users," such as those who have SYSADM authority on the DB2 subsystem? You can have all the security views you want, but the underlying table will continue to exist as before, and someone with SYSADM authority can view all the data in that table. Knowing who your SYSADMs are, you may be comfortable with that, but are your auditors?

Kind of reminds you of a lot of product-pitching TV commercials you've seen, right? You know, the ones in which a flustered individual exclaims...

"There must be a better way!"

Well, Mr. or Ms. DB2 person, there is a better way. It's called row permissions and column masks, and it was introduced with DB2 10 (new-function mode). It's beautiful in its simplicity. In the remainder of this blog entry I'll briefly describe the feature and how you can put it to work, and I'll provide some simple usage examples. You can read more about row permissions and column masks in the DB2 10 for z/OS Information Center on the Web.

A row permission is implemented by way of the SQL statement CREATE PERMISSION. Through this statement, you specify column-filtering predicates for a table and associate these with a primary or secondary authorization ID, or with a role (the association is accomplished via new scalar functions that include VERIFY_GROUP_FOR_USER and VERIFY_ROLE_FOR_USER). Similarly, you can use the statement CREATE MASK to specify column-value transformation actions in the form of CASE expressions, and associate these with authorization IDs or roles through the aforementioned VERIFY functions.

To show you how permissions and masks can be utilized to implement data security safeguards at the row and column level in a DB2 database, I'll return to the example CUSTOMER table that includes, among other things, a column containing the identity of a branch to which a customer is assigned, and a column containing that customer's income. This is a very small and very simple table, as indicated by the result set of a SELECT * statement targeting the table:

CUSTNO  CUSTNAME    BRANCH       INCOME
---------+---------+---------+---------
111222  JONES       A             45000
222333  SMITH       A             57000
333444  TERRY       B            110000
444555  WILSON      B            230000


If we want to make sure that employees in the respective branch offices can only see data for customers assigned to their branch, we can create a permission like the one below. It's assumed here that employees' individual authorization IDs have been connected to the RACF (or equivalent) groups STAFF and MGR (the former for professional staff and the other for managers), and that columns EMP_BRANCH and EMP_AUTHID of table EMP_INFO contain employees' branch office assignments and system authorization IDs, respectively). 

CREATE PERMISSION BRANCH_EMP ON CUSTOMER              
 FOR ROWS WHERE                                       
  VERIFY_GROUP_FOR_USER(SESSION_USER,'STAFF','MGR') = 1
  AND                                                 
  BRANCH = (SELECT EMP_BRANCH FROM EMP_INFO           
  WHERE EMP_AUTHID = SESSION_USER)                    
 ENFORCED FOR ALL ACCESS                              
 ENABLE;


This row permission is activated through an ALTER TABLE statement, as follows:

ALTER TABLE CUSTOMER       
ACTIVATE ROW ACCESS CONTROL;


With the row permission defined and activated, a user who is assigned to branch A and who has 'STAFF' (or 'MGR') as a secondary authorization ID will get the following result set if the statement SELECT * FROM CUSTOMER is executed:

CUSTNO  CUSTNAME    BRANCH       INCOME
---------+---------+---------+---------
111222  JONES       A             45000
222333  SMITH       A             57000
 


Now, suppose there is also a requirement that non-management employees in branches not see actual customer income figures. Suppose that instead they are to see a number 1, 2, 3, or 4 in the INCOME column of a SELECT FROM CUSTOMER result set, with the number indicating a relative income level for a customer. If branch managers are allowed to see actual customer income figures for customers assigned to their branch, the data transformation requirement could be addressed through a column mask defined as follows:

CREATE MASK INCOME_MASK ON CUSTOMER                     
 FOR COLUMN INCOME RETURN                               
  CASE                                                  
   WHEN (VERIFY_GROUP_FOR_USER(SESSION_USER,'MGR') = 1) 
    THEN INCOME                                         
   WHEN (VERIFY_GROUP_FOR_USER(SESSION_USER,'STAFF') = 1)
     THEN CASE                                          
      WHEN INCOME >= 200000 THEN 4                      
      WHEN INCOME BETWEEN 100000 AND 199999 THEN 3      
      WHEN INCOME BETWEEN 50000 AND 99999 THEN 2        
      WHEN INCOME < 50000 THEN 1                        
     END                                                
    ELSE NULL                                           
  END                                                   
 ENABLE;


An ALTER TABLE statement activates column masking for the table:

ALTER TABLE CUSTOMER     
ACTIVATE COLUMN ACCESS CONTROL;


And now, when a non-management employee assigned to branch B executes the statement SELECT * FROM CUSTOMER, the following result set will be returned:

CUSTNO  CUSTNAME    BRANCH       INCOME
---------+---------+---------+---------
333444  TERRY       B                 3
444555  WILSON      B                 4


But when a manager in branch B issues the same statement, the result set will look like this:

CUSTNO  CUSTNAME    BRANCH       INCOME
---------+---------+---------+---------
333444  TERRY       B            110000
444555  WILSON      B            230000


Here's why I so prefer permissions and masks over views when it comes to implementing row- and column-level data security in a DB2 for z/OS database:
  • No more "name game" headaches. Creating and activating row permissions and column masks does NOT require users and programmers to reference anything other than the actual target table. Instead of some access being to a security view and other access being to the underlying table as necessitated by data protection requirements, access is always to the CUSTOMER table (in my example) -- the permissions and masks defined on the table take care of filtering rows and transforming data values as needed.
  • Expect a less-tangled data security set-up. The flexibility and power of the CREATE PERMISSION and CREATE MASK statements should enable you to implement the row- and column-level data access controls that your organization needs without the proliferation of related database constructs that you might have on your hands were you to go the security view route. Your permission- and mask-controlled data protection scheme is likely to be simpler in construction, easier to understand, and easier to manage than one that relies on security views.
  • Super users get no special treatment, if that's what you want. When you create and activate a row permission or a column mask for a DB2 table, it applies to ALL access to that table by ALL users. Again referring to the example I've used throughout this blog entry, with the permission I called BRANCH_EMP defined and activated for the CUSTOMER table, will the table's creator be able to retrieve rows from CUSTOMER? How about someone with SYSADM authority? No and no -- unless 'STAFF' or 'MGR' happens to be an authorization ID (either primary or secondary) of the table creator or the person with SYSADM authority. Ah, but can't a user with SYSADM authority gain access to CUSTOMER data simply by executing the statement SET CURRENT SQLID = 'MGR' (or 'STAFF')? Yes, but only if the ZPARM called SEPARATE_SECURITY (new with DB2 10) is set to NO (which happens to be the default). If the value of SEPARATE_SECURITY is YES, someone with SYSADM authority can only change his or her current SQL ID to an ID that is among his or her primary and secondary IDs. Another way to tighten things up is to associate row permissions and column masks with roles, as opposed to authorization IDs -- SET CURRENT SQLID cannot be used to change one's secondary authorization ID to the name of a role.

A lot of DB2 people are drawn to new DB2 features that enhance system performance. I get that. It's always fun to tune a DB2 system and applications. Data security may not be as exciting to you as performance tuning, but it's more important now to your organization than it's ever been before. I'm telling you, DB2 10 row permissions and column masks are a big deal -- a major advance in DB2's already-strong data protection capabilities. Get to know this technology, and leverage it at your site. Your auditors (and, maybe, even your CEO) will appreciate your efforts.

Monday, March 25, 2013

Bringing Control to a DB2 for z/OS Client-Server Workload (Part 2)

In part one of this two-part blog entry, I pointed out that the notion that one cannot control a DB2 for z/OS client-server workload as one can a local-to-DB2 application is quite outdated, and I described a couple of the mechanisms (native SQL procedures, and roles and trusted contexts) by which a DB2 DDF workload can be controlled. In this part two entry, I'll cover some of the other available options for conforming DB2 client-server activity to organizational requirements.

The z/OS Workload Manager, for application priority control

WLM is nothing new -- it's been around for a long time. Still, there are people who seem to feel that WLM cannot be used for precision control of a DB2 client-server workload. What really galls me is the belief -- still, apparently, held by some individuals -- that all work processed through DDF executes at the priority of the DDF address space. That was once the case, long ago. What's true now, and has been for years, is that DDF's dispatching priority applies only to work done by what you might call the address space's "main" or "system" tasks -- and that work is likely to account for only a very small percentage of the total CPU consumption associated with a DB2 client-server workload (as I pointed out in a blog entry on this topic that I posted last year). The vast majority of DDF-related CPU consumption is directly tied to the execution of SQL statements executed via database access threads (DBATs), and THAT work can most definitely be controlled, in a dispatching priority sense, by way of a WLM policy. In fact, it's important to explicitly assign priorities to components of your DDF workload, because if you don't then your DB2 client-server transactions will execute, by default, not at the priority of the DDF address space but as "discretionary" work. That's a low priority, and not at all good for performance in a busy system (and z/OS LPARs are commonly quite busy, typically running with relatively high levels of CPU utilization).

If you have a "mixed" DDF application environment, with some high-volume, structured transactions, some lower-volume, less-structured decision-support queries, some long-running report-generating processes, etc., can you distinguish these different types of work from a z/OS dispatching priority perspective? Of course you can. Through a WLM policy you have the ability to set up multiple service classes for your DB2 client-server workload, and to map various components of the workload to appropriate service classes using a variety of identifiers. These identifiers, which can be provided by client information APIs or through client-side data source specifications, include -- but are not limited to -- the following:
  • Accounting information
  • Collection name (i.e., the name of the collection for the first package used by a DRDA requester in a unit of work)
  • Stored procedure name (the name of the stored procedure initially called in a unit of work)
  • Application process name
  • User ID (the primary DB2 authorization ID associated with an application)

A broad range of WLM options are available for managing DDF workload prioritization, including "period aging" for longer-running processes. A really good write-up of DB2 client-server application priority control using WLM can be found in the IBM "red book" titled, "DB2 9 for z/OS: Distributed Functions" (this information is applicable to DB2 10 systems, as well). Check out section 3.3.2 in this document: "Managing DDF work with WLM."

Profiles for application-level control of DB2 connection resources

Some mainframe DB2 people have wished for a while for a more granular, server-side means of managing client connections to a DB2 for z/OS system -- something more refined than the ZPARM parameters CONDBAT (maximum connections to a DB2 subsystem from network-attached clients), MAXDBAT (maximum number of concurrently active database access threads for a DB2 subsystem), and IDTHTOIN (the maximum amount of time that an active DBAT can be idle before being cancelled). Those folks got what they wanted when, with DB2 10, the two tables SYSIBM.DSN_PROFILE_TABLE and SYSIBM.DSN_PROFILE_ATTRIBUTES, originally introduced with DB2 9 for SQL statement monitoring purposes, were made the enablers of an application-level connection management capability. Using these tables, one can define multiple scopes of DB2 connection control, based on criteria such as the IP address of an application server, the DB2 authorization ID or role associated with an application, or the name of a collection or a package used by an application. For a client-server workload component identified by one of these criteria, one can define and manage controls in relation to several thresholds:
  • The maximum number of connections (including both active and inactive connections) to the DB2 subsystem from an IP address or domain name.
  • The maximum number of concurrently active DBATs used for a DDF workload component identified by IP address, authorization ID, role, collection, or client application name, among other filtering criteria.
  • The active DBAT idle timeout limit for a DDF workload component (as for concurrent active DBAT limits, there are multiple criteria available for defining a particular workload component). Note that the specified active DBAT idle timeout limit can be greater or less than the subsystem-wide active DBAT idle timeout limit indicated by the value of IDTHTOIN in ZPARM.

For any of these connection control profiles, you can instruct DB2 to just inform you (via messages) when thresholds have been exceeded, or take action when limits are reached (e.g., reject new incoming remote connection requests from an IP address when the connection limit for that IP address has been reached). You can find a lot more information about managing DDF connections and threads using profile tables in the DB2 10 for z/OS Information Center.

The query repository, for access path and execution option control at the SQL statement level

As I mentioned in the part one entry that preceded this part two, DB2 client-server applications very often involve the execution of dynamic SQL statements. This fact presented some control challenges that were, formerly, particularly acute in situations in which the dynamic SQL statements in question could not be modified (think about client-server applications purchased from vendors, or queries generated by business intelligence tools). I say "formerly" because DB2 10 delivered a significant enhancement in the area of dynamic SQL statement control: the query repository (more formally known as the access path repository). That term refers to a set of new catalog tables that can be used for two important purposes: access path stabilization and statement execution option selectivity. In both cases, one begins by binding the individual SQL statement in question, using a DSN_USERQUERY_TABLE (the schema of which would be your DB2 authorization ID) and the new BIND QUERY command. With that action, information is inserted into the new SYSIBM.SYSQUERY catalog table. Two other new catalog tables SYSIBM.SYSQUERYPLAN and SYSIBM.SYSQUERYOPTS, can also be populated with information when you issue the BIND QUERY command, depending on what you want to do:
  • Influence a query's access path using a hint, WITHOUT having to modify the query itself. I refer to this as "hintless hints," to distinguish the capability from the more traditional DB2 hint. The problem with the latter is that it often requires modification of the SQL statement in question to add a QUERYNO clause, and as I've noted that is frequently not possible. With DB2 10, a hint can be applied to a query based on a matching of that query's text with that of a query in the SYSQUERY catalog table; thus, no modification of the query for which you want the hint to apply is required. What's more, you can specify that a hint of this type is to apply based only on query text-matching, or on a combination of text-matching and a particular schema or collection or package.
  • Specify certain execution options for a particular SQL statement. Sometimes, you want a particular dynamic SQL statement to be a candidate for query parallelization, without making ALL dynamic queries candidates for parallelization through a specification of ANY for the CDSSRDEF parameter in ZPARM. Similarly, you might want a REOPT option (such as ONCE or AUTO) to apply to a certain statement, versus all statements associated with a given package (a useful capability when a whole lot of dynamic statements are associated with, say, one of the IBM Data Server Driver or DB2 Connect packages). The query repository lets you exert that kind of specific control for query parallelization, reoptimization, and other execution options (such as those pertaining to star join processing).

The DB2 10 Information Center is a great source of additional information on creating statement-level optimization hints and parameters.

And there's more...

How about control over retention of certain resources needed for execution of statements associated with a given package? Prior to DB2 10, RELEASE(COMMIT) was the only option for packages executed through DDF (a package been bound with RELEASE(DEALLOCATE) would be treated as though bound with RELEASE(COMMIT), if executed via a DBAT). DB2 10 high-performance DBATs changed that. Do you run DB2 in data sharing mode? If so, do you want to control the data sharing group members on which requests from certain client-server applications can be processed? If you do, member subsetting is your ticket. Want even more control over the execution of a client-server workload in a DB2 data sharing system? Check out the IBM's InfoSphere Optim Configuration Manager.

Client-server applications are a major driver -- perhaps the major driver -- of DB2 for z/OS workload growth these days. Be a part of that story, and know that this is work that you CAN control.

Saturday, March 16, 2013

Bringing Control to a DB2 for z/OS Client-Server Workload (Part 1)

At System z sites all over the world, DB2 for z/OS workloads are growing -- often at a brisk clip. You'd think that this would be unqualified good news for the folks who support mainframe DB2 subsystems, but in fact some people view a lot of this new DB2 activity with trepidation. Why? Because the lion's share of application work being implemented these days on a DB2 for z/OS foundation is of the client-server variety. In other words, it involves data access through DB2's distributed data facility, aka DDF. OK, so what's the problem with that? The problem has to do with the belief of some individuals that a DB2 DDF workload is one that they cannot control.

I'll tell you up front that this notion of a DB2 client-server workload as being a chaotic swirl of data requests that execute without regard to rules is NOT grounded in facts -- at least, not in today's facts. And there you have a root cause of a lot of misunderstanding: knowledge of DDF that was current 20 years ago but which hasn't kept pace with enhancements that have been delivered with each succeeding release of DB2. In this blog post and in the part 2 post that will follow, I will provide information on several mechanisms that can be used to establish various controls on a DB2 for z/OS client-server application workload.

Let me start by saying that I don't consider people who are concerned about managing a DB2 DDF workload to be wound too tightly, as we say. No, they're being good mainframers. They support a platform that is known for rock-solid reliability, and a pillar of that reliability is control. In a DB2 sense, that means control over data access privileges, over SQL statement access paths, over workload classification and and management, and over DB2 connection resources provided to particular applications. It's true that in the early years of DDF (back in the 1990s), some controls that you'd like to have weren't there. Now they are. In the paragraphs below I'll describe a few of the techniques that can be utilized to bring control to a DB2 for z/OS client-server workload. More information will be provided in another week or so, in the companion to this part 1 entry.

Native SQL procedures, for data access control and SQL statement access path stability

Now, in truth, the SQL statement access path and data access control benefits of stored procedures are available whether external or native SQL procedures are utilized; however, in a client-server application context I'm particularly keen on native SQL procedures (about which I first started blogging while working as an independent DB2 consultant), for a couple of reasons:
  • All kinds of people can write them. This includes people who have worked predominantly with DB2 for LUW, or even with relational database management systems other than DB2. If you can write SQL, you can code a native SQL procedure (and it's even easier when you use Data Studio).
  • They give you lots of zIIP engine offload when invoked through DDF. Unlike an external DB2 stored procedure, which runs under a TCB in a WLM-managed stored procedure address space, a native SQL procedure runs under the task of the calling program. When the CALL comes from a DRDA requester, that task is a preemptible SRB in the DB2 DDF address space, and so the processing associated with the native SQL procedure's execution can be largely offloaded (to the tune of about 60%) to an available zIIP engine. Native SQL procedures are pretty CPU-efficient anyway (and more so in a DB2 10 environment). Toss in the high degree of zIIP eligibility in a client-server setting, and native SQL procedures look even more attractive from a cost-of-computing standpoint.

The data access control and access path stability benefits of native SQL procedures owe to the fact that the SQL statements issued by such a procedure are static (or typically so -- one can, of course, code PREPARE statements in native SQL procedures). Data security is enhanced when static SQL is used because the only authorization required for successful execution of SQL statements in that case is the EXECUTE privilege on the package into which the statements were bound (plus, in the case of a stored procedure, the EXECUTE privilege on the stored procedure). By contrast, successful execution of dynamic SQL DML statements requires that the authorization ID of the process issuing the statements have the requisite access privileges (SELECT and/or INSERT and/or UPDATE and/or DELETE) on the tables targeted by the statements. Stored procedures further enhance data security by abstracting from client-side developers the details of the database schema (table names, column names, etc.).

The stable access paths that characterize static SQL statements help to ensure consistent performance when the statements are executed. Static SQL also minimizes CPU consumption: even with a high "hit" percentage in the DB2 dynamic statement cache, dynamic SQL might approach -- but will not equal -- the CPU efficiency of static SQL.

So, static SQL is good, but do you need to utilize stored procedures to get the benefits of static SQL for DB2 client-server applications? Not necessarily. If your client-side code is written in Java, those programs can issue static SQL statements by way of SQLJ; however, while SQLJ is a standard at some sites, it is often the case that Java programmers will have a strong preference for using JDBC, and JDBC means dynamic SQL at the DB2 data server. That's what makes stored procedures a great fit in a DB2 client-server application environment: they provide a means whereby static SQL can be dynamically invoked. If your client-side developers prefer a database interface such as JDBC or ODBC, fine -- let them use JDBC or ODBC to call DB2 stored procedures and to process output parameters and/or result sets returned by those stored procedures.

Roles and trusted contexts for data access control

I won't repeat here everything that I wrote in a blog entry on roles and trusted contexts that I posted a couple of years ago. Let me instead focus on setting the stage as to why these features, introduced with DB2 9 for z/OS, are so valuable when it comes to controlling data access in a DB2 client-server context.

Sometimes, DB2 client-server applications are going to issue dynamic SQL DML statements, whether you like it or not. That will very often be case when the application in question is vendor-supplied (e.g., a purchased enterprise resource planning or human resources application). You'll be dealing with dynamic SQL, as well, if statements are generated by an end-user query or reporting tool. You could have dynamic SQL statements coming from an in-house-developed application, if the aforementioned stored procedure route (or the SQLJ route, for Java programs) is not taken. Whatever the reason for dynamic SQL DML statements, they create a potential data security exposure. As mentioned previously, successful execution of a dynamic SELECT, INSERT, UPDATE, or DELETE statement requires that the application's DB2 authorization ID have the corresponding privilege (i.e., SELECT, INSERT, UPDATE, DELETE) on the targeted table. Now, it's likely that an application issuing dynamic SQL DML statements will connect to DB2 using an authorization ID and password that are unique to the application, but anyone who knows that authorization ID and password would seem to be free to use the table access privileges granted to the ID to access data -- maybe very sensitive data -- outside of the application. That's kind of a scary thought.

Enter roles and trusted contexts. In essence, here's how these two DB2 features work together to tighten data access control for client-server applications that issue dynamic SQL DML statements: first, the table access privileges needed for an application to work are granted not to an authorization ID, but to a role (something created by way of the CREATE ROLE statement). Next, the conditions of the use of the role's privilege set are defined through the creation of a trusted context. In plain English, the CREATE TRUSTED CONTEXT statement says something like this: "OK, I will allow the privileges granted to role XYZ to be used by an application that connects to the DB2 subsystem using authorization ID ABC, but only when that connection is established from the application server with IP address 9.30.131.203." If someone who knows the application's authorization ID and password attempts to use these credentials to connect to the DB2 subsystem from his personal workstation, the connection might be successful but no security exposure will exist because no DB2 privileges were granted to the authorization ID -- they were granted to role XYZ. The unauthorized user of the application's ID will not be able to use the table privileges granted to role XYZ, because he has connected to the DB2 subsystem from an IP address other than the one specified in the CREATE TRUSTED CONTEXT statement. If the rogue user isn't permitted to log in to the app server at (in this example) IP address 9.30.131.203, he can't do anything, DB2-wise, with the application's authorization ID and password. If needs be, a trusted context can further tighten data access control by specifying that only user IDs SMITH, JONES, and MILLER can use the privileges granted to role XYZ, and again only when the "come from" IP address is 9.30.131.203. With roles and trusted contexts, people responsible for safeguarding DB2 data can rest easier, even when client-server applications are issuing dynamic SQL DML statements.

And more to come...

In my part 2 entry, I'll provide information about additional mechanisms -- namely, the z/OS workload manager, DB2 10 profiles, and the DB2 10 query repository -- that you can use to control a DB2 client-server workload.

Wednesday, February 27, 2013

DB2 for z/OS: What Might "Template" DDL for CREATE TABLESPACE and CREATE INDEX Look Like?

Recently, a DB2 for z/OS DBA sent me a question about a particular clause of the SQL DDL (data definition language) statements CREATE TABLESPACE and CREATE INDEX. In his note containing the question he also sent what you might call his "template" CREATE TABLESPACE and CREATE INDEX statements -- the statements that serve as a pattern that he uses when creating these objects in his DB2 environment. Having such template DDL can be handy, in that it gives you a starting point for coding the statement that will be used to create a particular object. You might vary a clause specification or two to suit a given need, but the general pattern will be followed -- and that to your organization's benefit, if you've chosen the specifications in your template DDL thoughtfully. In this blog entry I'll give you my thoughts on what you might want to have in your template CREATE TABLESPACE and CREATE INDEX statements.

[Note that my use of the word template as it pertains to DDL should not be confused with the TEMPLATE control statement that can be used to facilitate data set allocation when executing some IBM DB2 utilities. Note, too, that I'm writing about application table spaces and indexes, not work file or system (e.g., catalog and directory) objects.]

Let me tell you up front that I'm not going to comment here on every single clause of the CREATE TABLESPACE and CREATE INDEX statements -- I don't have time for that, and even if I did it wouldn't add a lot of value. I'm fine with the default values for many of these clauses (e.g., LOGGED and TRACKMOD YES for CREATE TABLESPACE, and CLOSE YES and GBPCACHE CHANGED for both CREATE TABLESPACE and CREATE INDEX). If you are also OK with the default values for these clauses, you can leave them out of your template DDL (in order to accept the default values), or you can specify some or all of them if you want to see in the DDL a reminder of just what the default values for the clauses are. What I'll provide from here on out are my thoughts concerning some CREATE TABLESPACE and CREATE INDEX clauses about which there is some apparent misunderstanding within the mainframe DB2 community, and/or for which I have some opinions regarding specified values.

For CREATE TABLESPACE and CREATE INDEX
  • USING STOGROUP -- You might think that this would go without saying -- that the use of DB2-managed data sets (indicated by USING STOGROUP) versus user-defined data sets (indicated by USING VCAT) is understood by everyone as being the way to go. I hope that's the case, but I'm including this recommendation anyway, just to make sure. Who wants to mess with z/OS Access Method Services statements to create the data sets that DB2 will use for table spaces and indexes? Let DB2 do it. Note, by the way, that this convention has moved beyond application-related objects: starting with DB2 9, work file table spaces can be DB2-managed, and with DB2 10 the catalog and directory objects become DB2-managed (I have some information about DB2-managed work file table spaces in an entry that I posted to this blog last spring, and information about DB2- and SMS-management of catalog and directory objects in a DB2 10 environment in another entry, posted in the fall of 2011). Note also that, starting with DB2 9 (in new-function mode), you can specify SMS data class, management class, and storage class names in a CREATE STOGROUP statement. That could reduce the need for automatic class selection (ACS) routines used for DB2 data sets at your site. 
  • PRIQTY and SECQTY -- Leave them off of your CREATE TABLESPACE and CREATE INDEX statements, or specify them with a value of -1 (i.e., PRIQTY -1 and SECQTY -1). Either way, what this means is that you're going with a default primary quantity for the object's underlying data set(s), and you're letting DB2 handle allocation of secondary space as needed for the data set(s). The default primary quantity is specified via two ZPARM parameters (TSQTY for table spaces, and IXQTY for indexes -- both with a default value of one cylinder), and DB2 manages secondary space allocation by way of what's known as the "sliding scale" algorithm (this when the ZPARM parameter MGEXTSZ is set to YES). Letting DB2 manage space allocation for table space and index data sets makes all kinds of sense: it works, it helps to keep you out of trouble (in terms of avoiding data set extend failures), and it frees up DBA time for more high-value tasks. I wrote about DB2 management of data set space allocation in an entry that I posted to the blog that I maintained while working as an independent DB2 consultant.

For CREATE TABLESPACE
  • MAXPARTITIONS -- What I'm getting at here is the idea of a default table space type for your environment. Your first decision is universal versus non-universal. You might want to give serious consideration to making universal partition-by-growth your default table space type (that would be indicated by having the MAXPARTITIONS and SEGSIZE clauses in the CREATE TABLESPACE statement). The primary reason for this recommendation: to an increasing extent, exploitation of new DB2 features will require the use of universal table spaces. With DB2 9 (the version with which universal table spaces were introduced), it was clone tables. With DB2 10, it's hash-organized tables, LOB-inlining, and the CURRENTLY COMMITTED mode of data access. I fully expect that this trend will continue with future versions of DB2. Using universal table spaces gives you maximum flexibility when it comes to leveraging DB2 technology. If you're going to go universal, the right choice for most table spaces is likely to be partition-by-growth (PBG) versus partition-by-range (PBR). Partition-by-range has its advantages, but I believe that PBR is best used for certain larger tables in a database (e.g., tables holding at least one million rows) that have columns or column sets that would make for good partitioning keys. Is non-universal ever the right choice for a table space? It could be if you want to assign several tables (perhaps a number of small-ish reference tables) to a single table space -- something you might do to keep a lid on the number of data sets in a DB2 system, and to simplify the backing up of said set of tables (via the COPY utility, which operates at the table space level). A traditional segmented table space can hold several tables, whereas one universal table space can hold one table. Bear in mind that the MEMBER CLUSTER table space definition option, required at some sites to support very high volumes of insert processing, is valid for universal table spaces in a DB2 10 new-function mode environment (it could not be used with universal table spaces in a DB2 9 system).
  • SEGSIZE -- Most all of your tables should be segmented -- either traditional segmented table spaces or universal table spaces (the latter are also segmented). An exception to this rule: in a DB2 9 environment, you might choose a traditional range-partitioned table space over a PBR universal table space if the volume of inserts into the associated table will be very high, as the former table space type will likely deliver a performance advantage over the latter (with DB2 10, high-volume insert performance for a universal PBR table space is pretty much equivalent to what you get with a traditional range-partitioned table space). Aside from that case, segmented -- whether universal or non-universal -- is the way to go. The default SEGSIZE value is 4, and that might make sense for a non-universal segmented table space to which several really small tables are assigned. For single-table table spaces, the defualt SEGSIZE of 4 is probably not the best choice. I'd be more inclined to go with something like SEGSIZE 32, with an eye towards boosting prefetch efficiency.
  • LOCKSIZE -- The default value here is ANY, which allows DB2 to use any type of locking granularity when accessing the table space. Yes, DB2 will typically use page-level locking for a table space that is defined with LOCKSIZE ANY, but I like to make the locking granularity choice myself. I prefer to go with LOCKSIZE PAGE as a default for table spaces. I'd change that to LOCKSIZE ROW for a table space only if page-level locking resulted in an unacceptable level of lock contention. Some people believe that you can't use row-level locking in a DB2 data sharing environment, or that doing so would be a big mistake. In fact, selective use of row-level locking in a DB2 data sharing system can be quite beneficial in relieving lock contention issues, as I pointed out in a blog entry that I wrote a few years ago. Another thing about a change to row-level locking from page-level locking for a table space: doing that could cause some data-changing programs that target the table space to acquire considerably more locks between commits, and THAT could conceivably cause said programs to hit the limits you've established (via ZPARMs) on the number of locks held by one process across all the tables that it is accessing, and/or the number of locks acquired on a single table space (the latter put into effect for a table space by way of the LOCKMAX SYSTEM clause of CREATE TABLESPACE). To avoid a surprise in this department, consider raising the values of the ZPARM parameters NUMLKUS and NUMLKTS if you change from page-level to row-level locking for one or more of your table spaces. Also keep in mind that if you change the locking granularity for a table space from page-level to row-level (or vice versa), and you want the new locking granularity to be utilized for static SQL statements that target the table (or tables) in the table space, you'll need to rebind the associated packages after altering the LOCKSIZE specification.
  • COMPRESS -- DB2 for z/OS data compression is very CPU-efficient, thanks to a hardware assist, and it is therefore widely used. I'm a big fan of COMPRESS YES myself, but I don't think that it should be used for every table in a database. A compression dictionary takes up 64 KB of space in memory, so it's possible that compressing a really small table space could lead to that table space actually having a somewhat larger virtual and real storage "footprint" than it would have were it not compressed. I'd certainly lean towards compressing table spaces with 1000 or more pages.
  • PCTFREE -- For a table space, the default value for PCTFREE is 5 (that's 5% of the space in a page). If a table's rows are not clustered by a continuously-ascending key, it would be good to have some "holes" in pages (following a REORG or a LOAD) into which newly-inserted rows could go -- the better to maintain good data organization between REORGs. In that case, PCTFREE 5 might be too small (5% of a 4K page is about 200 bytes, and that may or may not be enough to hold a table row -- and remember to consider compressed row length if you specified COMPRESS YES for the table space). A PCTFREE value of 10 might be better for maintaining data organization than PCTFREE 5. Of course, if a table's rows are clustered by a continuously-ascending key, a non-zero PCTFREE value would probably result in wasted space.
For CREATE INDEX
  • FREEPAGE -- The default value of FREEPAGE is 0. Many indexes are defined on keys that do not have continuously-ascending values. For these indexes, new entries will go into the "middle" of the index. Some index leaf page split activity has to be expected, and when that happens it would be nice to be able to put the "split out" index entries into a page that is not too far from the page that was split. For that to happen you need to have some empty pages throughout the index (following an index REORG or a LOAD of the underlying table), because index entries that are relocated as a result of a leaf page split have to go into an empty page. If FREEPAGE 0 is specified for an index, the only empty pages will be at the end of the index, and that's where "split out" entries will go when page splits occur. The result could be fairly rapid disorganization for an index defined on a non-continuously-ascending key, if a lot of rows are inserted into the underlying table. For such an index, consider a FREEPAGE value of between 5 and 10 (FREEPAGE 0 is appropriate for an index on a continuously-ascending key).
  • BUFFERPOOL -- Starting with DB2 9 in new-function mode, indexes could be assigned to buffer pools used for pages larger than 4K in size. Going with 8K, 16K, or 32K pages for an index is required if you want to utilize the index compression capability introduced with DB2 9, but even if you aren't interested in compressing an index, a larger page size could be a good choice. In particular, if the index in question is defined on a non-continuously-ascending key, and the volume of inserts into the underlying table is high, a large index page size (32K, for example) could substantially reduce the occurrence of index page splits, and that could deliver a nice performance dividend.

Do you have template DDL that you use as a starting point for CREATE TABLESPACE and CREATE INDEX statements in your DB2 environment? If not, perhaps you should. If you already have DDL that you use as a pattern for CREATE TABLESPACE and CREATE INDEX statements, how long ago was that pattern set up? Has the template DDL kept pace with DB2 changes that provide new options for table spaces and indexes? If not, a review and an update of the template DDL could be in order. It's all part of being ready to put the latest DB2 technology to the best use at your site.

Monday, February 11, 2013

DB2 for z/OS: Boosting UNLOAD and LOAD Performance with Pipes and FORMAT INTERNAL

My colleague Jorn Thyssen, who is based in Denmark, recently let me know of some nice work that he had done in setting up a high-performance process that gets data from one DB2 for z/OS table into another via the UNLOAD and LOAD utilities. What Jorn shared with me I'll share with you in this blog post.

Here's how this got started: Jorn, an IBM System z Information Management Technical Specialist, had delivered presentations on the DB2 Analytics Accelerator for z/OS to a number of organizations. In so doing, he would describe how the copying of DB2 tables into the Analytics Accelerator was performance-optimized in a couple of ways: 1) data is unloaded from the source DB2 tables in internal format, saving the CPU cycles that would otherwise be consumed in converting the data to external format; and 2) the unloaded DB2 data is transferred to the Analytics Accelerator by way of z/OS UNIX System Services (USS) pipes. [Pipes, also known as FIFO (first-in, first-out) files, are commonly used in UNIX environments. One process can read from a pipe as another process is writing data to the pipe, enabling a data load operation to run simultaneously with the unload operation that provides its input.] Following one of these presentations, someone asked Jorn a question: could data be moved from one DB2 for z/OS table to another in a similar fashion, with no data conversion and without the use of an intermediary data set that would force serialization of the load and unload tasks?

In considering this question, Jorn thought first about the cross-loader function of the DB2 LOAD utility. That could be used to get data from one table to another without the need for an "in-between" data set that would first be the output an UNLOAD and then the input to a LOAD. The cross-loader, however, can't be used with the FORMAT INTERNAL option, so while it would address the "no intermediate data set" stipulation, it would leave the "no data conversion" requirement unsatisfied.

It then occurred to Jorn that one could utilize, for a table-to-table data move in a DB2 for z/OS context, the same technique employed for copies of data into a DB2 Analytics Accelerator: combine the FORMAT INTERNAL option with a transference of data through a USS pipe. Jorn went looking for examples of this approach, and when he didn't find any that precisely fit the bill, he created his own UNLOAD and LOAD jobs to show how FORMAT INTERNAL and USS pipes can be used to avoid data conversion and the serializing effect of a traditional "in-between" data set. He ran these jobs on a DB2 subsystem that he uses for testing purposes, and they worked as expected. The jobs were submitted at the same time. The UNLOAD process waited for the LOAD process to open the pipe for reading, whereupon it commenced writing unloaded records to the "back" of the pipe, while the LOAD process read records from the "front" of the pipe. This is all done in memory -- there is no physical I/O involved.

Here is Jorn's UNLOAD job:

//UNLD1 EXEC DSNUPROC,SYSTEM=DB2A,
//             LIB='DB2.V9R1.SDSNLOAD',
//             UID=''
//SYSPUNCH  DD DSN=ABC1234.DB2A.CNTL.DSN8D91A.DSN8S91E.PTALL,
//             DISP=(,CATLG,DELETE),
//             DCB=(LRECL=80,BLKSIZE=0,RECFM=FB,DSORG=PS),
//             SPACE=(TRK,(5,5),RLSE),
//             UNIT=SYSDA
//SYSABC  DD PATH='/tmp/unload.pipe1',DSNTYPE=PIPE,
//        LRECL=107,BLKSIZE=27998,RECFM=VB,
//        PATHOPTS=(OCREAT),
//        PATHMODE=(SIWUSR,SIRUSR),
//        PATHDISP=(DELETE,DELETE)
//SYSTSPRT  DD SYSOUT=*
//SYSPRINT  DD SYSOUT=*
//SYSIN     DD *
 TEMPLATE UD
      PATH('/tmp/unload.pipe1')
      PATHDISP(KEEP,KEEP)
      RECFM(VB)   LRECL(00000049)
      FILEDATA(BINARY)
                         
UNLOAD TABLESPACE DSN06697.TEST UNLDDN(UD)
  FORMAT INTERNAL
/*


And here is the control statement for the complementary LOAD job:

//DSNUPROC.SYSIN DD *
 TEMPLATE D3RYKD6Z
      PATH('/tmp/unload.pipe1')
      PATHDISP(KEEP,KEEP)    PATHOPTS(ORDONLY)
      RECFM(VB)   LRECL(00000049)
      FILEDATA(BINARY)
 LOAD DATA INDDN D3RYKD6Z LOG NO  REPLACE
   FORMAT INTERNAL
  INTO TABLE ABC1234.TESTKOPI


In addition to unloading data from one DB2 for z/OS table and loading the data into another table, Jorn successfully tested some variations on the technique:
  • He unloaded data from a table to a USS pipe, and sent that pipe to another system via FTP.
  • He sent a file to a USS pipe via FTP, and loaded data into a table from the pipe.

In exploring the use of USS pipes in your DB2 for z/OS environment, you might find the following sources of additional information to be useful:
  • DB2 for z/OS APAR PK70269. The fix for this APAR introduced DB2 for z/OS TEMPLATE support for USS files (these are associated PTFs for DB2 for z/OS versions 8 and 9 -- the functionality is part of the DB2 10 base code). The text of this APAR is quite informative.
  • The IBM "red book" titled, "DB2 9 for z/OS: Using the Utilities Suite." Section 3.2.8 of this document explains the use of TEMPLATE with USS pipes. Section 7.17 covers unloading and loading data using USS pipes.
  • The DB2 for z/OS Utility Guide and Reference contains supporting information in the section on TEMPLATE. Refer to the DB2 9 or the DB2 10 manual, depending on the DB2 release you're running.  

We often think of DB2 for z/OS utilities as workhorses, and they are, but they are workhorses that are constantly being enhanced with new functionality, one example of which I've written about here (with, again, a tip of the hat to Jorn Thyssen). Consider how the combination of FORMAT INTERNAL and USS pipes could enhance the performance of UNLOAD and LOAD operations at your site.