Friday, January 29, 2016

DB2 for z/OS: Thoughts on History and Archive Tables

I'll state up front that I'm using the terms "history table" and "archive table" not in a generic sense, but as they have technical meaning in a DB2 for z/OS context. A history table is paired with a "base" table that has been enabled for system-time temporal support (introduced with DB2 10 for z/OS), and an archive table goes with a base table that has been enabled for DB2-managed archiving (also known as "transparent archiving" -- a feature delivered with DB2 11). Last week, I posted to this blog an entry that explored the use of system-time temporal and DB2-managed archiving as alternative solutions for different data archiving scenarios. Today I want to draw attention to some considerations related to history and archive tables.

Actually, it's not so much history and archive tables themselves that are on my mind. Tables are an embodiment of logical database design. What I'm thinking about is physical database design. You see, history tables and archive tables have to be logically identical to their associated base tables -- same columns, in the same order and with the same data types. Must history and archive tables be identical to the associated base table in a physical sense? NO. That physical-difference flexibility is something that a DB2 DBA should definitely have in mind as he or she contemplates enabling system-time temporal support or DB2-managed archiving for a base table.

Here's an example of what I'm talking about: suppose that a table you want to enable for system-time temporal support or DB2-managed archiving resides in a traditional segmented table space. Does that mean that the associated history or archive table has to reside in such a table space? NO. It could be that a universal range-partitioned or partition-by-growth table space would be a better choice for the base table's history or archive table (and that 'or' is an exclusive one: a single table cannot be enabled for both system-time temporal support and DB2-managed archiving -- it's one or the other, and my previous blog entry was intended to help you make that choice). Think about it. There's a good chance that a history or an archive table will end up being a lot larger than its associated base table, and partitioned table spaces can hold a very large amount of data (up to 128 TB).
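To make that concrete, here's a minimal sketch (database, table space, table, and column names are all made up, and I'm assuming the PRODUCTS base table already has the timestamp columns and SYSTEM_TIME period required for system-time temporal support) of pairing a base table in a segmented table space with a history table in a partition-by-growth universal table space:

CREATE TABLESPACE HISTTS IN PRODDB
  MAXPARTITIONS 256      -- partition-by-growth universal table space
  BUFFERPOOL BP2;

CREATE TABLE PRODUCTS_HIST
  (PRODUCT_ID CHAR(8)       NOT NULL,
   PRICE      DECIMAL(9,2)  NOT NULL,
   SYS_START  TIMESTAMP(12) NOT NULL,   -- plain columns here; GENERATED
   SYS_END    TIMESTAMP(12) NOT NULL,   -- only in the base table
   TRANS_ID   TIMESTAMP(12))
  IN PRODDB.HISTTS;

ALTER TABLE PRODUCTS
  ADD VERSIONING USE HISTORY TABLE PRODUCTS_HIST;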

Indexes are another aspect of physical database design that can be different, in a number of ways, for history or archive tables versus their associated base tables. Is there an index on a base table that would not be so useful (i.e., that would not have benefits in excess of its CPU and disk space and DBA-time costs) for an associated history or archive table? Fine -- don't define that index for the history or archive table. Similarly, would an index NOT defined on a base table be useful for an associated history or archive table? Fine -- define it for the history or archive table. Even for an index you want on both a base table and its associated history or archive table, should the index page size be different? Should index compression be used or not used? Should data rows in a history or archive table be clustered differently versus rows in the associated base table?
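For instance, an index wanted only on the history table might be created along these lines (index, table, and column names are hypothetical) -- note that a page size larger than 4KB, established via the buffer pool specification, is a prerequisite for index compression:

CREATE INDEX XPRODHIST1
  ON PRODUCTS_HIST
     (PRODUCT_ID, SYS_END)
  CLUSTER            -- history rows clustered differently vs. the base table
  COMPRESS YES       -- compression that might not pay off for the base table's indexes
  BUFFERPOOL BP16K0; -- page size larger than 4KB, required for COMPRESS YES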

And you can go right on down the line with other aspects of physical database design. It might make sense to assign a base table's table space to a PGSTEAL(NONE) buffer pool, but would that be a good choice for the table space of an associated history or archive table? If LOCKSIZE(ROW) is needed for a base table's table space, would page-level locking be more appropriate for an associated history or archive table? If hash organization of data delivers performance benefits for a base table, would cluster-organized data work out better for the associated history or archive table?
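Expressed as DDL, a couple of those choices for the history table's table space might look like this (hypothetical names again -- a sketch, not a recommendation for every environment):

ALTER TABLESPACE PRODDB.HISTTS
  BUFFERPOOL BP3    -- an ordinary LRU pool, versus a PGSTEAL(NONE) pool for the base table
  LOCKSIZE PAGE;    -- versus LOCKSIZE ROW for the base table's table space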

You get the idea. What you want to do, in considering enablement of system-time temporal support or DB2-managed archiving for a base table, is keep in mind that LOGICAL equivalence of the base table and its history table, or the base table and its archive table, does not require PHYSICAL equivalence between these tables. Think about the physical database design characteristics that would be optimal for the base table, and determine whether those or DIFFERENT physical database design characteristics would be right for the associated history or archive table.

Thursday, January 21, 2016

DB2 for z/OS-Managed Archiving, or System-Time Temporal?

Recently I had the opportunity to talk to a group of application developers and DBAs about the temporal data support that was introduced with DB2 10 for z/OS. One of the meeting attendees asked an interesting question: for data-archive purposes, should one use system-time temporal functionality (by which DB2 retains "before" images of rows that are affected by update or delete operations), or the DB2-managed archiving capability delivered with DB2 11 for z/OS?

At first glance, that might look like a question with an obvious answer: because the question is about data archiving, the answer should be, "Use DB2-managed archiving," right? In actuality, one really could go either way, depending on an organization's requirements regarding the data in question. In certain circumstances, system-time temporal functionality can be an appropriate data archiving mechanism, despite the fact that the word "archiving" is not part of the formal name of that DB2 feature.

Before proceeding further, I'll provide a brief description of DB2's system-time temporal support (DB2-managed archiving is described in the blog entry to which I provided a hyperlink, above). System-time refers to one of two "flavors" of DB2 temporal data functionality (the other flavor, known as business-time temporal, provides a means whereby future data changes -- such as a price adjustment for a product or service -- can be reflected in a DB2 table's data without affecting programs that, by default, access data that is in effect, from a business perspective, now). When a DB2 table is enabled for system-time temporal support (either when it is created, or, for an existing table, via ALTER operations), it has a couple of extra columns, both of the timestamp variety (there is actually a third added timestamp column, called a "transaction-start ID" column, that appears to be largely unused at present but may get a role to play in a future release of DB2 for z/OS). These two columns indicate the time at which a row became current (by being inserted into the table, or by way of an update operation targeting an existing row), and the time at which the row became non-current (by being deleted, or changed via update -- rows that are "currently current" have a high value of midnight, December 31, 9999 in the "time the row became non-current" column).

A table enabled for system-time temporal functionality, in addition to having "row became current" and "row became non-current" timestamp columns, also has an associated table called a history table. The history table holds the "before" images of rows that were either deleted from the "base" table or were changed in the base table via update; so, all the rows in the history table are non-current, and all the rows in the base table are "currently current." If a row for product XYZ is inserted into a system-time temporal-enabled PRODUCTS table at 9:27 AM on January 22, 2016, that timestamp (which will in fact have picosecond precision) will go into the row's "became current" column (which is actually called the "row begin" column), and the "became non-current" column (officially, the "row end" column) will contain the high value for a timestamp (as previously mentioned, that's midnight on December 31, 9999). If the price of product XYZ is changed from $10 to $15 via an UPDATE at 3:30 PM on March 10, 2016, the row in the base table will show the $15 price, a "row begin" time of 3:30 PM on March 10, 2016 (the time of the UPDATE), and a "row end" time of the previously noted high-timestamp value. DB2 will place in the base table's history table the "before" image of the row (before the update, that is), with a price of $10 (the pre-UPDATE price), a "row begin" time of 9:27 AM on January 22, 2016 (the time the row was first inserted in the base table with the $10 price), and a "row end" time of 3:30 PM on March 10, 2016 (the time of the update that changed the price for product XYZ to $15).
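For reference, here is roughly what enabling an existing table for system-time temporal support looks like in DDL terms (a sketch, with hypothetical table and column names):

ALTER TABLE PRODUCTS
  ADD COLUMN SYS_START TIMESTAMP(12) NOT NULL
    GENERATED ALWAYS AS ROW BEGIN;
ALTER TABLE PRODUCTS
  ADD COLUMN SYS_END TIMESTAMP(12) NOT NULL
    GENERATED ALWAYS AS ROW END;
ALTER TABLE PRODUCTS
  ADD COLUMN TRANS_ID TIMESTAMP(12)
    GENERATED ALWAYS AS TRANSACTION START ID;
ALTER TABLE PRODUCTS
  ADD PERIOD SYSTEM_TIME (SYS_START, SYS_END);
-- after creating a logically identical history table:
ALTER TABLE PRODUCTS
  ADD VERSIONING USE HISTORY TABLE PRODUCTS_HIST;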

Here's what's really cool about this system-time temporal thing: through extensions to SQL that accompanied the introduction of temporal functionality, I can (or a program can) ask DB2 this question: "What was the price of product XYZ on February 17, 2016?" DB2 will retrieve information from the row that was current as of the date specified, and provide the answer: $10. Importantly, that temporal query does not have to reference the history table associated with the system-time temporal-enabled base table -- it just references the PRODUCTS table (the base table in this example), and DB2 takes care of looking for the row, if need be, in the history table that has been paired with the PRODUCTS table. Besides allowing data retrieval on a prior POINT in time basis, DB2 system-time temporal support allows data retrieval on a prior RANGE of time basis -- a program or user can ask DB2, "Show me any and all rows for product XYZ that were current for at least some time between December 1, 2015 and April 30, 2016." DB2 will retrieve that row or rows (in our example, it would be two rows for product XYZ -- one from the base table and one from the history table), with again no need for the history table to be referenced in the query. By examining the values in the "row begin" and "row end" columns of the retrieved rows, one can see how data for (in this example) a product changed, and when those changes were effected.
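In SQL terms, those two kinds of temporal query look something like the following (hypothetical table and column names, and note that only the base table is referenced):

-- prior point in time: what was the price on February 17, 2016?
SELECT PRICE
FROM PRODUCTS FOR SYSTEM_TIME AS OF TIMESTAMP('2016-02-17-00.00.00')
WHERE PRODUCT_ID = 'XYZ';

-- prior range of time: all rows current at some time in the range
SELECT PRICE, SYS_START, SYS_END
FROM PRODUCTS FOR SYSTEM_TIME
     BETWEEN TIMESTAMP('2015-12-01-00.00.00')
         AND TIMESTAMP('2016-04-30-24.00.00')
WHERE PRODUCT_ID = 'XYZ';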

What system-time temporal support and DB2-managed archiving have in common is the notion of a "single logical table" that in fact consists of two physical tables (a base and a history table, in the case of system-time temporal, and for DB2-managed archiving a base and an archive table). In both cases, application programs do not have to reference the associated table. Data access references are to the base table, and DB2 takes care of pulling in data from the associated table, as needed.

Back now to data archiving. Various DB2 for z/OS-using organizations have seen system-time temporal support as a data archiving solution, and they are not wrong in looking at it that way. The choice between system-time temporal and DB2-managed archiving (and an either-or choice it is, because these capabilities cannot both be used with a single base table) comes down to the nature of data in a table that an organization wishes to retain over some long period of time (and long-term retention of historical data is my simple conception of archiving). It may be that data in a table, once inserted, is never updated, and that rows deleted are either discarded completely (thrown in "the bit bucket," if you will) or moved to offline media. In that case, if interest is only in rows that are "currently current" (i.e., still valid and true, even if quite old), and if rows have to be kept for a long time, and there is a desire to physically separate older from newer rows to boost access performance for the newer rows (the assumption here being that newer rows are the ones most frequently accessed), DB2-managed archiving offers a nice solution. It combines optimal performance for newer-row retrieval with large-capacity historical data retention, without complicating SQL coding for queries that might need to access older as well as newer rows (thanks to the fact that, as noted, programs need only reference the base table, even if the associated archive table needs to be searched for data). Even if rows in the table are updated after having been inserted, DB2-managed archiving can be a good historical data retention solution if there is interest only in currently-valid rows -- if there is no interest in what updated rows "used to look like" before they were updated.
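And for comparison's sake, a sketch of what enabling DB2-managed archiving looks like (table names hypothetical):

-- create a logically identical archive table, then:
ALTER TABLE ORDERS
  ENABLE ARCHIVE USE ORDERS_AR;

-- built-in global variables then govern the behavior:
SET SYSIBMADM.MOVE_TO_ARCHIVE = 'Y'; -- rows deleted from ORDERS are moved to ORDERS_AR
SET SYSIBMADM.GET_ARCHIVE = 'Y';     -- queries of ORDERS also search ORDERS_AR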

What if, on the other hand, there is an interest in what rows looked like at some prior point in time, even if the rows look different now because of updates, or even if the rows were deleted from the base table? If the need to retain large amounts of historical data includes a requirement -- or at least a desire -- to maintain accessibility to non-current (i.e., not in effect now), "prior versions" of rows made non-current through DELETEs and/or UPDATEs, DB2's system-time temporal functionality (sometimes referred to as "row versioning") could be just the ticket.

System-time temporal data support and DB2-managed archiving can both be thought of as data archiving solutions, even though only the latter feature has "archiving" in its name. The decision to use one or the other will depend on an organization's requirements for data in a given table, and one DB2-using company might well make use of both solutions for different tables (keeping in mind that a single table cannot be both archive-enabled and system-time temporal-enabled). Remember that system-time temporal functionality is about holding onto "what was true" data associated with a table, even as "what is true" is changed by way of update and/or delete operations. DB2-managed archiving is the right tool for the job when the only data to be retained is of the "is true now" variety, and when a large quantity of historical data and programs' propensity to access fairly "new" data make physical separation of older and newer rows -- without complicating data access logic -- advantageous from a performance perspective.

In about a week or so, I'll post a companion piece to this entry, with some matters for DB2 DBAs to consider as they ponder physical database design options for history tables (used with system-time temporal-enabled base tables) and archive tables (used with archive-enabled base tables).

Tuesday, December 29, 2015

Thoughts on DB2 for z/OS Index Compression

Not long ago, a DB2 for z/OS DBA requested of me some guidance in planning for his organization's initial use of index compression (a capability introduced with DB2 9 for z/OS). He asked some good questions, and with the thought that my responses might be useful to others in the DB2 community, I offer this blog entry.

First, strive to get good "bang" from your index compression "buck." "Bang for the buck" is an expression here in the USA (and perhaps elsewhere in the English-speaking world) that refers to getting a good return on investment. At this DBA's site, there were many thousands of indexes defined on DB2 tables (in two large production environments and multiple development and test systems). The DBA understandably viewed compression of every last one of these indexes as a daunting prospect. My advice: don't compress all the indexes. Generally speaking, at a DB2 for z/OS site a relatively small number of really big tables will hold the lion's share of data rows, and the indexes defined on these really big tables will be the ones that occupy the large majority of disk space used for indexes. Compress this proportionally small number of indexes, and you'll achieve maximum disk space savings with minimal effort and overhead.

I told the DBA to identify the largest indexes in a given DB2 system (and by "system," I'm referring to the objects for which information can be found in a particular DB2 catalog). He did that with a query similar to this one (the grouping and summing aggregates information pertaining to the partitions of a partitioned index):

SELECT
 DBNAME, NAME
,SUM(TOTALENTRIES) AS TOT_ENT
,SUM(SPACE) AS TOT_SPC_KB
FROM SYSIBM.SYSINDEXSPACESTATS
GROUP BY DBNAME, NAME
ORDER BY TOT_SPC_KB DESC
WITH UR;


We speak sometimes of the "80/20" rule, whereby 80% of results are tied to 20% of things acted upon. In the case of the DB2 systems examined by the DBA with whom I was working, 80/20 would have understated the outcome: in one of the production DB2 environments, 1% of the indexes (the top 1% in terms of number of entries) occupied 85% of the total disk space used for indexes (and results were similar for the other DB2 systems at this site). By compressing a relatively small number of indexes, then, you can achieve a large decrease in disk space consumption (index compression can often deliver a 50-70% reduction in the disk space occupied by a given index). That's a very good bang for the index compression buck.

Second, choose the right page size for an index that is to be compressed. When a DB2 for z/OS table space is compressed, data is in compressed form in memory as well as on disk (data is compressed when inserted into a page, and decompressed when accessed on behalf of a program), so a 4KB page (for example) of compressed data in memory also occupies 4KB of space on disk.

Not so with index compression. Index pages in memory are always uncompressed -- they are in compressed form only on disk. That being the case, index compression is achieved by squeezing the contents of a 32KB, 16KB, or 8KB index page in memory into a 4KB page on disk. Compressing an index, then, involves two actions: 1) change the index's page size to something larger than 4KB (if the current page size is 4KB) via ALTER INDEX with a BUFFERPOOL specification that references an 8K, 16K, or 32K buffer pool; and 2) change the index again to be compressed via a second ALTER INDEX with a COMPRESS YES specification. [The first ALTER INDEX, specifying an 8K, 16K, or 32K buffer pool, will place the index in AREOR status, indicating a pending DDL change -- a subsequent online REORG of the index will change the index's page size. The second ALTER INDEX, with the COMPRESS YES specification, will place the index in rebuild-pending status.] Knowing this, you might think, "Hey, I'll change the page size for all the indexes I want to compress to 32KB. That way, I'll get an 87% reduction in disk space for these indexes!"

That would not be a good move. Here's why: if a compressed index has 32KB-sized pages, DB2 will stop inserting entries into a page in memory if it determines that the page already holds the maximum amount of information that can be squeezed into a 4KB-sized page on disk; thus, going with 32KB pages for an index could result in space being wasted in the in-memory pages of the index. Does that mean that 8KB would be a better page-size choice for indexes that you want to compress? Not necessarily -- with 8KB-size pages, you might be missing out on disk space consumption reduction that could be achieved with a larger page size. What to do?

Thankfully, DB2 offers a utility, DSN1COMP, that provides a nice assist with respect to choosing the right page size for an index that you want to compress. When you run DSN1COMP for an index, the output will show you, for each page size (8, 16, and 32 KB), the estimated amount of disk space savings that could be achieved with compression, and the estimated amount of space that would be wasted in in-memory pages of the index. For a given index that you want to compress, the right page size will be the one -- per DSN1COMP output -- that will deliver the most disk space savings with the least amount of in-memory space wastage.
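Once DSN1COMP has pointed you to a page size, the two ALTERs described above might look like this for a hypothetical index:

ALTER INDEX PRODSCHM.XORDERS1
  BUFFERPOOL BP16K0;  -- pending change: the index goes into AREOR status
-- an online REORG of the index materializes the new page size; then:
ALTER INDEX PRODSCHM.XORDERS1
  COMPRESS YES;       -- the index goes into rebuild-pending status
-- a REBUILD INDEX (or REORG) then gets the index compressed on disk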

Finally, leverage z/OS LPAR memory resources to minimize the CPU overhead of index compression. As noted above, pages of a compressed DB2 for z/OS table space are compressed in memory as well as on disk. Consequently, data compression CPU overhead is incurred when a row is inserted into a table in a compressed table space (or an existing row is updated), and when a row is retrieved from a compressed page in memory. Because pages of compressed indexes are compressed only on disk, index compression CPU overhead is incurred when a compressed page is decompressed upon being read into memory, and when an in-memory index page is compressed as it is written to disk. In other words, the CPU cost of index compression is paid at read and write I/O time -- once an index page is in memory, there is no compression-related cost to access that page, because the page is in memory in uncompressed form.

Since the cost of index compression is associated with index I/O operations, it stands to reason that as these operations decrease, the CPU cost of index compression will go down. And what makes index I/O rates go down? A larger buffer pool. If you compress index XYZ, a subsequent enlargement of the buffer pool to which XYZ is assigned should reduce the rate of I/Os (especially read I/Os) for the index, and that should translate to reduced compression overhead for index XYZ. Where would you see this overhead reduction? That depends on the type of I/Os that are being reduced via buffer pool enlargement. To the extent that the rate of synchronous read I/Os is lowered by way of buffer pool enlargement, the in-DB2 (aka class 2) CPU time for programs using a compressed index should be positively impacted. Reduced prefetch read and database write activity should reduce the CPU consumption of the DB2 database services address space (aka DBM1) -- and recall that prefetch reads and database writes became 100% zIIP eligible starting with DB2 10, so a lower volume of that activity will cut down on associated zIIP engine consumption.
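Enlarging a pool, by the way, is a single-command affair -- for example (hypothetical pool name and buffer count):

-ALTER BUFFERPOOL(BP8K1) VPSIZE(100000)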

Index compression has been an option for some years now in a DB2 for z/OS environment, but I have the impression that it's a DB2 feature that is not yet used as often as it should be. If you're not yet using index compression, think about putting the capability to work at your site; and, if you decide to compress some of your indexes, do it the right way.

Monday, November 30, 2015

DB2 for z/OS: How I Use -DISPLAY GROUPBUFFERPOOL Output

Having recently posted a blog entry on my use of output from the DB2 for z/OS -DISPLAY BUFFERPOOL command, it seems a good time to post a companion entry focused on the -DISPLAY GROUPBUFFERPOOL command. Obviously, this latter command is relevant to people who work with DB2 operating in data sharing mode on a Parallel Sysplex. If you are such a person, read on.

First, enter the command the right way

Actually, you'll want to enter the command in two forms. First, from any one member of the DB2 data sharing group of interest, issue the following:

-DISPLAY GROUPBUFFERPOOL(*) TYPE(GCONN) GDETAIL(INTERVAL)

More specifically, issue that command twice, one hour apart, and retain the output of the second issuance of the command. Why? Because that command's output will show activity, from a group perspective, for each group buffer pool for a one-hour period of time (i.e., for the period of time between the first and second issuances of the command). In the command output you'll see, for each group buffer pool (GBP), a DSNB782I message, the text of which will include the phrase "INCREMENTAL GROUP DETAIL STATISTICS SINCE timestamp-value." You can check that timestamp value to verify that the output of the second of a pair of hour-separated -DISPLAY GROUPBUFFERPOOL commands captures one hour of activity.

Consider issuing pairs of hour-separated -DISPLAY GROUPBUFFERPOOL commands, in the form shown above, at two different times of day -- maybe once during a busy online transaction time, and once during a period of heavy batch activity.

The other form of the command you'll want to use is as follows:

-DISPLAY GROUPBUFFERPOOL(*) TYPE(MCONN) MDETAIL(INTERVAL)

Since this command provides member-view information, you might want to issue it for more than one member of the data sharing group. As mentioned above for the GDETAIL form of the command, issue the MDETAIL form, on a given DB2 data sharing group member, twice, with an hour between each command issuance, so as to capture (in the output of the second issuance of the command) one hour of member-scope activity for each GBP (note that for the MDETAIL form of the command, the "STATISTICS SINCE timestamp-value" information is in a DSNB771I message, versus the DSNB782I message associated with the GDETAIL form of the command). Also as mentioned previously, consider issuing pairs of hour-separated commands at different times of the day (maybe heavy online and heavy batch).

If you or one of your colleagues has some REXX programming skills, you can do as some have done and have a program issue pairs of hour-separated -DISPLAY GROUPBUFFERPOOL commands, and then have a REXX routine parse the output of the second of those commands and insert useful information (as described below) into a DB2 table or maybe a file. If such a program and associated REXX routine are executed on a daily basis, you will have information that you can use to track, and adjust to, trends in GBP usage.

How I use information in the output of -DISPLAY GROUPBUFFERPOOL(*) TYPE(GCONN) GDETAIL(INTERVAL)
  • Look for the "double zeros" -- referring here to zero write failures due to lack of storage (see the DSNB786I message text that's part of the command output), and zero cross-invalidations due to directory entry reclaims (part of the DSNB788I message text in the command output), for each GBP. If GBP write failures occur due to lack of storage, more than likely the GBP is too small (or the ratio of directory entries to data entries for the GBP is too large -- more on this below). You don't want GBP write failures, because that can land pages on the logical page list (LPL), and that can lead to failures for programs that subsequently attempt to access those pages prior to their being recovered from the LPL. [Note that a DB2 11 enhancement called group buffer pool write-around was introduced to help reduce the incidence of GBP write failures due to lack of storage. You can read about this enhancement in section 5.1 of the IBM "redbook" titled DB2 11 for z/OS Technical Overview (downloadable from http://www.redbooks.ibm.com/abstracts/sg248180.html?Open).] Cross-invalidations due to directory entry reclaims occur when a directory entry (used to track inter-DB2 interest in table space and index pages) has to be stolen (because all are in use and a new page has to be registered). Because the directory entry reclaim results in the GBP losing the "pointer" to a page, from a GBP-dependent data set, that is cached locally in a buffer pool of at least one group member, the page has to be preemptively invalidated wherever it is locally cached. You'd rather avoid that situation, because when a page invalidated for this reason is next accessed by a DB2 member subsystem, it will have to be read back into memory from disk (or from the GBP, but a read "hit" in this case tends to be quite unlikely). The page read in from disk will probably look just like the previously cached page that was marked invalid (because the invalidation was necessitated by a directory entry reclaim, not by a change of the page by a program on another group member); thus, a disk read that would be otherwise unnecessary is driven by a directory entry reclaim. The key to avoiding directory entry reclaims is to have a number of directory entries in a GBP that is at least as large as the number of data entries in the GBP plus the number of buffers in the corresponding local buffer pools in each member DB2 subsystem (as described in further detail in an entry I posted to this blog a couple of years ago).
  • Check out the specifications of each GBP. In particular, look at the ratio of directory entries to data entries. The default for this ratio is 5:1. If you see a directory-to-data-entry ratio that is significantly higher than 5:1, check to see if ALLOWAUTOALT(YES) is specified for the GBP in the CFRM policy for the Sysplex. If the directory-to-data-entry ratio for a GBP has been automatically increased by the system in order to avoid directory entry reclaims, that's OK, but in my experience this kind of adjustment can be an indication that the GBP is smaller than it should be (I wrote about this in a blog entry I posted a few months ago). Note that if you decide to make a GBP larger, make sure that all structures defined for both coupling facilities used by a DB2 data sharing group can fit in one coupling facility LPAR (except for secondary group buffer pools, when GBPs are -- as they should be -- duplexed), in case the other CF LPAR is down for maintenance purposes or due to a failure (if you temporarily had all structures in one CF LPAR, you would not during that time be duplexing the GBPs).
How I use information in the output of -DISPLAY GROUPBUFFERPOOL(*) TYPE(MCONN) MDETAIL(INTERVAL)
  • Basically, I use the output of this command to get one item of information that is not available in the output of the GDETAIL form of the command: the XI GBP hit ratio. That ratio (described in more detail in the blog entry referenced in the item above about directory-to-data-entry ratios) is so named because it has to do with synchronous GBP read requests that are driven by cross-invalidations (or XIs, in DB2 data sharing shorthand). Why does this ratio merit attention? Because if there are no cross-invalidations due to directory entry reclaims (see the first item above pertaining to the output of the GDETAIL form of the -DISPLAY GROUPBUFFERPOOL command) then what cross-invalidations did occur pretty much had to be caused by data-change activity (when a program connected to member DB2A changes a page cached locally in a buffer pool of member DB2B, the DB2B copy of the page has to be invalidated so that DB2B will know to retrieve the current version of the page). Data-change activity associated with GBP-dependent data sets results in changed pages being written to the appropriate GBP, and if pages were written to a GBP and the GBP has enough data entries to keep changed pages cached for a while, it is reasonable to expect that a good percentage of GBP reads driven by data-change-caused cross-invalidations will result in GBP read "hits." The XI read hit ratio is easily calculated using the SYNCHRONOUS READS DUE TO BUFFER INVALIDATION information in the DSNB773I message that is part of the -DISPLAY GROUPBUFFERPOOL MDETAIL output for each GBP: just divide the number in the DATA RETURNED field by the sum of the DATA RETURNED and DATA NOT RETURNED fields. I regularly see this ratio in excess of 80% for well-sized GBPs. If you see an XI read hit ratio that is substantially below 80% for one of your GBPs, consider making that GBP larger (if available CF LPAR memory permits).
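To make the arithmetic of the XI read hit ratio concrete, with made-up numbers: if the DSNB773I message for a GBP showed DATA RETURNED = 8500 and DATA NOT RETURNED = 1500, the XI read hit ratio would be 8500 / (8500 + 1500) = 85% -- a good figure for a well-sized GBP.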
There you have it. I hope that this information will be of use to you.

Sunday, November 29, 2015

DB2 for z/OS: How I Use -DISPLAY BUFFERPOOL Output

A couple of weeks ago, I was going over output of the DB2 for z/OS command -DISPLAY BUFFERPOOL with a group of people, and one of the meeting participants said words to this effect: "You've got some useful formulas and rules of thumb here. Could you write them down for us?" I did that, and it occurred to me that the information would be useful to others in the larger DB2 for z/OS community; so, I've packaged it in this blog entry.

First, enter the command the right way

Based on my experience, the most useful form of the -DISPLAY BUFFERPOOL command is as follows:

-DISPLAY BUFFERPOOL(ACTIVE) DETAIL

You actually want to enter the command twice, one hour apart. Why? Because the command output shows activity since the buffer pools were last allocated (which was probably when the target DB2 subsystem was last "bounced"), or since the command was last issued. If you issue the command once, then issue it again one hour later, the output of the second issuance of the command will show the preceding hour's worth of activity. That being the case, you can divide the activity counters by 3600 to get per-second figures, and those figures are very useful for performance monitoring and tuning work. [Note that the command output includes a DSNB409I message for each buffer pool, the text of which reads, "INCREMENTAL STATISTICS SINCE timestamp-value." Check that timestamp in the output of the second of the two issuances of the command, to verify that the command captured approximately one hour of activity.]

It might be a good idea to use -DISPLAY BUFFERPOOL to capture activity for a busy "online" hour (often a mid-morning or mid-afternoon hour) and for an hour during which a batch workload is particularly heavy -- one period might see more of a load on your buffer pool configuration than the other.

Something else to consider: some organizations have written programs that issue a pair of -DISPLAY BUFFERPOOL commands (again, separated by one hour) one or more times per day, and have a REXX routine that parses the output of the second issuance of a pair of commands and extracts useful information and inserts these values into a DB2 table or maybe a file. In that way, trends can be tracked and responded to with adjustments to the buffer pool configuration.

OK, with the output of the second issuance of a pair of one-hour-apart -DISPLAY BUFFERPOOL commands in hand, here's how you can put the information to good use:

  • Add up the size of all the buffer pools (in megabytes), to get the total size of the buffer pool configuration for the DB2 subsystem. That's easily done by multiplying the number of buffers allocated for a pool by the pool's page size; so, a pool with 20,000 buffers of 4KB apiece is sized at 80 MB. If you have a single production DB2 subsystem running in a z/OS LPAR, I would look to move towards a situation in which the total size of the subsystem's buffer pool configuration is 30-40% of the size of the z/OS LPAR's memory resource. For example, if a z/OS LPAR with 100 GB of central storage holds one production DB2 subsystem, I'd aim for a buffer pool configuration size (that is, the aggregate size of all buffer pools allocated for the subsystem) that is in the range of 30-40 GB (if a z/OS LPAR holds more than one production DB2 subsystem, I'd want the total size of all the subsystems' buffer pool configurations to be not much more than 50% of the LPAR's real storage resource). If you're not in that 30-40% of memory "zone" at present, there's no need to get there in one giant step from where you are. A series of steps (but pretty good-sized steps, not a lot of baby steps) would be fine. You should be most aggressive in growing the pools that have the highest total read I/O rates (see the next item). By the way, the aim here is to have a buffer pool configuration that makes really good use of an LPAR's memory for enhanced performance, while avoiding a situation in which LPAR memory is under too much pressure -- if the LPAR's demand paging rate, which can be obtained from an RMF (or equivalent z/OS monitor) CPU activity report, is in the low single digits or less per second, the memory resource is not under too much pressure.
  • For each pool, calculate the total read I/O rate. That's the sum of five numbers (for each pool) from the command output -- random synchronous reads, sequential synchronous reads, sequential prefetch reads, list prefetch reads, and dynamic prefetch reads -- divided by the number of seconds in the interval between the first and second issuance of the -DISPLAY BUFFERPOOL command (if that interval was, as suggested, an hour, the number of seconds would be 3600). Focus your attention first on the pools (if any) with read I/O rates in excess of 1000 per second, and see if you can get the I/O rate for those pools below 1000 per second by making the buffer pools larger. You can then turn your attention to pools with a read I/O rate in the hundreds per second, to see if the read I/O rate for those pools can be brought below 100 per second. If the read I/O rate for each of your buffer pools is below 100 per second, you are making exceedingly good use of LPAR memory. (A worked example of this calculation, with made-up numbers, follows this list.)
  • The -DISPLAY BUFFERPOOL output shows the specifications for each buffer pool, and you should examine those. All high-I/O pools (total read I/O rate > 100 per second) should be defined with PGFIX(YES), and those page-fixed pools should be backed by 1 MB page frames (check to see that the value of the LFAREA parameter in the IEASYSxx member of the system's PARMLIB data set is large enough for this purpose). If your demand paging rate is very low (as noted previously, that would be low single digits per second or less), consider going with PGFIX(YES) even for low-I/O pools that have a lot of GETPAGE activity (more than 1000 GETPAGEs per second). Page-fixing these pools will deliver a CPU benefit if the pools are backed by 1 MB page frames.
  • Are any of the DB2 subsystem's pools used to "pin" objects in memory (i.e., to cache table spaces and/or indexes in memory in their entirety)? If yes, such pools should be defined with PGSTEAL(NONE), so that DB2 will know that they are intended to be "pinning" pools (the PGSTEAL(NONE) specification was introduced with DB2 10 for z/OS). Additionally, the target read I/O rate for a "pinning" pool is zero. If a "pinning" pool has a read I/O rate that is greater than zero, consider whether the pool needs to be enlarged so that all pages of all objects assigned to the pool can be kept in memory.
  • The counter for the number of times that the data manager threshold (DMTH) was reached should be zero for all pools. If that threshold is hit, either the buffer pool is way too small or the deferred write thresholds are way too high. [For a pool dedicated to 4KB-page or 32KB-page work file table spaces, the deferred write thresholds -- DWQT and VDWQT -- can be set to values higher than the respective defaults of 30 and 5, and doing this can result in some CPU savings, but don't take that idea too far. DWQT/VDWQT settings of 70/40 or even 80/50 should be OK for a work file-dedicated pool, but some sites have gone to 90 for DWQT for such pools, and have ended up hitting the data manager threshold as a result. Hitting DMTH causes the CPU cost of accessing data in a pool to jump, so make sure that you're staying under that threshold.]
  • The "prefetch disabled" counters -- no buffer, and no read engine -- should ideally be zero for all pools. If either is non-zero for a pool, either the pool is too small, or the VPSEQT threshold (default value is 80) is too low (so there are too few buffers available to hold pages read via prefetch, leading to elevated levels of prefetch read activity), or the deferred write queue threshold (DWQT) is too high (this would generally apply to a work file-dedicated pool, as mentioned in the preceding item).
  • For each pool, the number of asynchronous write operations should be significantly larger than the number of synchronous write operations. If this is not the case, the vertical deferred write queue threshold (VDWQT), or the horizontal deferred write queue threshold (DWQT), or both, should be lowered.
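Here's the read I/O rate calculation mentioned above, worked through with made-up numbers: if, over a one-hour interval, a pool showed 1,080,000 random synchronous reads, 360,000 sequential synchronous reads, 720,000 sequential prefetch reads, 180,000 list prefetch reads, and 1,260,000 dynamic prefetch reads, its total read I/O rate would be (1,080,000 + 360,000 + 720,000 + 180,000 + 1,260,000) / 3600 = 1,000 per second -- a pool that would merit enlargement.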

I hope that you find this information to be useful. Stay tuned for a companion entry in which I'll document the ways in which I use -DISPLAY GROUPBUFFERPOOL information (relevant to DB2 data sharing systems).

Thursday, October 29, 2015

DB2 for z/OS: DDF and Accounting Trace Records

I recently encountered a situation that served to illuminate a number of important aspects of application processing in the context of the DB2 for z/OS distributed data facility (DDF). I'll go through the scenario here, in hopes that you will find the information to be useful.

A systems programmer contacted me about a problem his organization had run into with a new application that accesses DB2 for z/OS by way of a network connection (and so uses DDF). The application drives large numbers of row-insert operations, and was generating huge numbers of DB2 accounting trace records (written to SMF data sets). In one test, executed in a pre-production environment, the application issued 140 million INSERT statements, and that resulted in DB2 cutting 140 million accounting trace records. The trace records filled the SMF data sets faster than they could be offloaded, and as a consequence some of the records were lost.

To reduce the torrent of trace records flooding the SMF data sets, the systems programmer changed the value of the ZPARM parameter ACCUMACC in the target DB2 subsystem to 100. In doing that, he made it possible for DB2 to "roll up" activity for as many as 100 DDF units of work into a single accounting trace record. That action addressed the immediate problem, but only for the test system: in the production environment, the organization had a requirement for one accounting trace record per DDF unit of work (ACCUMACC on the production DB2 subsystem was set to NO).

With roll-up of activity for multiple DDF transactions into a single accounting record not an option in production, the systems programmer looked into multi-row INSERTs (also referred to as "bulk inserts") as a means of reducing the volume of trace records that would be generated when the new application executed. He put this question to me: Could multi-row INSERT be utilized for a Java application accessing DB2 for z/OS-managed data via DDF? I responded affirmatively, and pointed out that information on multi-row INSERT for Java programs can be found in the DB2 for z/OS Application Programming Guide and Reference for Java (the DB2 10 manual can be downloaded from http://www-01.ibm.com/support/docview.wss?uid=swg27019288#manuals, and the DB2 11 manual from http://www-01.ibm.com/support/docview.wss?uid=swg27039165#manuals).
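For what it's worth, the embedded-SQL (e.g., COBOL) form of a multi-row INSERT uses host-variable arrays and a FOR n ROWS clause, along these lines (hypothetical table, columns, and host variables -- a Java program gets the same effect through the JDBC driver's batching support):

INSERT INTO ORDERS
  (ORDER_ID, AMOUNT)
VALUES (:HVA-ORDER-ID, :HVA-AMOUNT)  -- host-variable arrays
  FOR :N ROWS                        -- number of rows to insert
  ATOMIC;                            -- all of the rows are inserted, or none are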

The systems programmer got together with one of the developers of the new application, and they ran a test in which 1000 rows were bulk-inserted into a DB2 table, in chunks of 100 rows. Instead of 1000 accounting trace records, execution of the test program generated 10 trace records. Here's the interesting part: the program was not issuing any explicit commits. That led the systems programmer to surmise that perhaps accounting record volume was being driven by the volume of INSERT statements issued by the application program. After all, when 140 million single-row INSERTs had been issued by the program, 140 million accounting records had been written to SMF. When 1000 rows were inserted by way of 10 bulk inserts of 100 rows apiece, 10 trace records had been cut. I explained that accounting records are generated not by issuance of data-changing SQL DML statements, but by commits issued following execution of one or more SQL DML statements. [That's the way it works for application processes, such as DDF-using programs, that are transactional in nature -- a z/OS batch program is different, in that one accounting record will be generated when the program completes, even if the program issued hundreds of commits while executing.] As the Java application developer was not issuing explicit commits from his program, I speculated that AutoCommit was enabled on the client side of this client-server application. Not so. The systems programmer verified that AutoCommit was set to FALSE for the new application. This information left us scratching our heads.

The sysprog next suggested that perhaps the DBAT used by the application was going inactive after each INSERT, and the accounting records were being generated as a result of DBATs going inactive (DBAT is short for database access thread -- the kind of thread used by applications that connect to DB2 via DDF). Not the case, I told him. DBATs don't go inactive in the usual sense. A connection (from a client application to DB2) will go inactive when a DDF transaction completes. Yes, the DBAT that had been used to service the transaction will go back to the DBAT pool at that time, but I'd say that "disconnected" is a better term than "inactive" for the pooled DBATs (I'm talking here about "regular" DBATs -- a high-performance DBAT will remain associated with the connection through which it was instantiated, and that connection will not go inactive until the high performance DBAT is terminated after having been reused 200 times). In any case, I said, the key is transactions completing, not connections going inactive. Connections going inactive, like accounting records getting cut (with ACCUMACC set to NO), are a response to DDF transactions completing, and transaction completion involves commit processing.

Asked the sysprog: Could packages bound with RELEASE(COMMIT) be the cause of the large volume of accounting trace records associated with execution of the new application? My response: No. Accounting trace records are generated at DDF transaction boundaries, and RELEASE(COMMIT) and RELEASE(DEALLOCATE) do not affect transactional boundaries. Commits establish transaction boundaries. Could DB2 itself be driving the commit activity that was in turn driving trace record generation? No. DB2 does not cause commits. DB2 responds to commits (an exception to that rule is the commit driven upon completion of a DB2 stored procedure defined with COMMIT ON RETURN YES).

Finally, in searching for the elusive commits that we knew were coming from somewhere, the systems programmer and his application development colleagues found the answer: the new application was using the open-source Spring Framework, and in response to the application indicating completion of a transaction, the Spring Framework would drive a commit that in turn would drive generation of a DB2 accounting trace record. With this now clarified, the application team can vary commit frequency as desired to balance volume of trace records generated (don't want too many commits) against accumulation of DB2 child X-locks associated with INSERT processing (don't want too few commits).

The key take-away here: for a lot of reasons, issuance of commits is really important for DB2-accessing application processes. Sometimes, developers of DDF-using applications have direct visibility of commit-issuing logic. Other times, use of something like a framework between a client program and a DB2 for z/OS data server (and the Spring Framework is just one example of this kind of thing) abstracts issuance of commits in a way that removes commit visibility from an application developer. When that is the case, commit frequency can still be controlled and changed as needed, but that requires an understanding of how interaction by a program with the framework drives the commits that flow to DB2. If your DB2-accessing client programs interface with an application framework, take the time to understand how commits are made to flow to DB2. In doing that, you'll avoid surprises and you'll be less likely to spend time scratching your head and spinning your wheels.

Tuesday, October 20, 2015

DB2 for z/OS: Getting a Handle on a CPU-Constrained Environment

z/OS systems are famous for their reliability, and part of that story is the ability of z/OS to accommodate surges of application activity: you can pile more and more work onto a z/OS system, and it will keep on trucking, slicing the "pie" of available processing capacity into smaller and smaller slices to keep a workload moving along. That's highly preferable to failing as a result of being overloaded, but there is a point at which applications running on a z/OS system will perform in a sub-optimal way if the system's processing capacity is undersized relative to the workload that the system is being asked to execute. If you work with DB2 for z/OS, you want to be able to spot a situation in which DB2 performance is being negatively impacted by a shortage of CPU capacity, so that your organization can take corrective action. Through this blog entry, I want to help you in that endeavor.

Recognizing indications of CPU overload in DB2 monitor reports and displays

First, know the signs of strained CPU capacity that are reflected in DB2 monitor data. With respect to overloaded general-purpose engines, the key indicator is in-DB2 not-accounted-for time. You can get that metric from a DB2 monitor-generated accounting long report (also known as an accounting detail report) or an online display of DB2 application workload activity. My preference is to use a report, and what I particularly like to use is an accounting long report with information ordered by (or the term may be "grouped by," depending on the DB2 monitor product in use at your site) connection type. That kind of report will contain a sub-report for each connection type used with the target DB2 subsystem: one for the CICS-DB2 workload (if you use CICS with DB2), one for the IMS-DB2 workload (if that's present in your environment), one for the DRDA workload (i.e., the DDF workload -- again, if you have such a workload), one for batch jobs that connect to DB2 via the call attach facility, etc. Check the in-DB2 not-accounted-for times for your higher-priority transactional workloads, such as CICS-DB2, IMS-DB2, and DRDA. Average in-DB2 not-accounted-for time should be reported by your DB2 monitor, but if you don't see a field called not-accounted-for time among the in-DB2 (also known as class 2) times, you can easily derive the value yourself: just subtract average in-DB2 CPU time (that's general-purpose engine CPU time plus zIIP, or "specialty engine," CPU time) from average in-DB2 elapsed time, and then subtract from that figure the total class 3 suspend time (class 3 times include wait for synchronous read time, wait for other read time, wait for lock/latch time, etc.). What's left is average in-DB2 not-accounted-for time. If that time is more than 10% of in-DB2 elapsed time for a higher-priority transactional workload (CICS-DB2, IMS-DB2, DRDA), you have an indication that overloaded general-purpose engines are having a detrimental effect on application throughput. [An in-DB2 not-accounted-for time that is more than 10% of a batch workload's in-DB2 elapsed time is less of an issue in my eyes, owing to the fact that 1) batch programs often run at a lower priority than transactional programs, and 2) organizations sometimes intentionally push general-purpose engine utilization close to 100% during periods of heavy batch processing.]
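To illustrate with made-up numbers: if average in-DB2 elapsed time for a CICS-DB2 workload were 20 milliseconds, with 6 ms of average in-DB2 CPU time (general-purpose plus zIIP) and 10 ms of total class 3 suspend time, average in-DB2 not-accounted-for time would be 20 - 6 - 10 = 4 ms. That's 20% of in-DB2 elapsed time -- well above the 10% guideline, and thus a sign that overloaded general-purpose engines are throttling throughput.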

Don't stop with an assessment of CPU constraint related to general-purpose engines. You need to check out the situation regarding a mainframe's zIIP engines (if any), as well. Here, the most useful indicator is a field in the "class 1" column of a DB2 monitor-generated accounting report (this field might be available via online displays as well, depending on the DB2 monitor product in use at your site). The field to which I'm referring might have a label (depending on the monitor) like SECP CPU, with the "SE" being short for "specialty engine" (that would be zIIP engine) and the "CP" standing for "central processor," which is a reference to a general-purpose engine. The value of the SECP CPU field shows the average class 1 CPU time that is associated with zIIP-eligible work that ended up being dispatched to a general-purpose engine. If you're looking at a DB2 monitor-generated accounting long report with data ordered by connection type, look at the sub-report for the DRDA connection type (SQL from DRDA requesters tends to be the main DB2-related driver of zIIP engine utilization). Referring to class 1 times (versus class 2), call the value in the SECP CPU field A, and the value in the SE CPU TIME field B (this is CPU time for zIIP-eligible work that was dispatched to a zIIP engine). Perform the simple calculation A / (A + B), and what you get is the percentage of zIIP-eligible work that ran on a general-purpose engine. I call that the "zIIP spill-over percentage," and if it is higher than 5% for a workload that drives a lot of zIIP usage (and again, the prime example there, from a DB2 perspective, is the DRDA workload), that's an indicator that your zIIP engines are over-used to the point of reducing overall throughput for DB2-accessing applications.
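Again with made-up numbers: if the DRDA sub-report showed a class 1 SECP CPU value (A) of 0.002 seconds and a class 1 SE CPU TIME value (B) of 0.030 seconds, the zIIP spill-over percentage would be 0.002 / (0.002 + 0.030) = 6.25% -- above the 5% guideline, suggesting that more zIIP capacity is in order.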

Why does zIIP-eligible work get dispatched to general-purpose engines, leading to a non-zero value in the SECP CPU field for your DRDA workload? That happens when a zIIP-eligible piece of work is ready for dispatch but no zIIP engine is available to process that work. Why can more than a small amount of zIIP-eligible-on-general-purpose time be a performance-impacting issue? Because the system will wait a few milliseconds before dispatching a piece of zIIP-eligible work to a general-purpose engine (this to, presumably, give a zIIP engine a chance to become available in that few-millisecond window). Why is that a problem? Because prefetch processing is 100% zIIP-eligible starting with DB2 10 for z/OS, and if zIIP engines are busy enough to cause more than a small amount of zIIP-eligible work to be redirected to general purpose engines, that can slow prefetch processing. That, in turn, can reduce throughput for applications that drive a lot of prefetch read activity (and don't think that's just batch -- plenty of online transactional applications are characterized by significant prefetch activity in addition to synchronous database reads).

Check your z/OS monitor for engine-busy information

If your DB2 monitor is pointing to a shortage of general-purpose and/or zIIP engine capacity, use your z/OS monitor to check on the utilization of the engines assigned to the LPAR in which the DB2 subsystem is running. In doing this, use the right monitor-generated report for the task at hand. Take IBM's RMF (Resource Measurement Facility) monitor, for example. RMF can be used to generate a number of different reports pertaining to processor utilization. All are useful, but for different purposes. If you want to confirm a general purpose engine-utilization problem suggested by elevated in-DB2 not-accounted-for times as reported by your DB2 monitor, look at an RMF CPU Activity Report (information about that report -- and others -- can be found in the IBM Knowledge Center on the Web). In that report (generated for the LPAR of interest), if the TOTAL/AVERAGE value for the general-purpose engines (identified as CPs, short for central processors) in the MVS BUSY column is greater than 85% (for a transactional workload) AND the value in the % column of the IN READY row for "number of address spaces" <= N in the SYSTEM ADDRESS SPACE ANALYSIS section of the report is less than 80%, you likely have work delays that are caused by CPU contention (an average CP utilization greater than 85% can be OK during periods of heavy batch processing).

You'll also see in an RMF CPU Activity Report a TOTAL/AVERAGE value in the MVS BUSY column for the specialty engines (zIIP engines, identified as IIPs) assigned to the LPAR. How high can this value be before performance problems arise? That depends very much on the number of zIIP engines available. As I mentioned previously in this entry, you can get from your DB2 monitor (or should be able to get) the numbers (from an accounting detail report or online display of accounting information) that you need to calculate the "zIIP spill-over percentage" for a zIIP-driving workload, such as the DRDA workload (that ratio indicates the percentage of zIIP-eligible work that ends up being executed on general-purpose processors). I noted that you want the "zIIP spill-over ratio" to be not more than 5%. If the z/OS LPAR in which the DB2 subsystem of interest is running has a single zIIP engine, you could see an undesirable zIIP spill-over percentage if that zIIP engine's utilization is in the 30-40% range. Contrast that with a real-world situation I encountered recently in which average utilization of an LPAR's zIIP engines was 78%, while the zIIP spill-over percentage was only 1.6%. How could that be? The LPAR has 9 zIIP engines, that's how. It's queuing theory, folks. Even though these zIIPs are running at a relatively high level of utilization, because there are so many of them a zIIP engine is almost always available on this system when a zIIP-eligible piece of work is ready for dispatch. Note that starting with the zEC12 servers (and continuing with the z13), a mainframe can be configured with up to two zIIP engines per general-purpose engine (that ratio previously couldn't go above 1:1). The more zIIPs you have, the hotter you can run them while still avoiding an overly large degree of zIIP spill-over.

Summing it up

The fact that z Systems remain reliable servers even when running at very high levels of utilization is a very good thing, but even mainframes can be loaded with work to an extent that application performance will be sub-optimal. Using DB2 and z/OS monitor data, keep an eye on your system to make sure that a heavily-loaded server doesn't become an overloaded server. With effective capacity planning, you can keep the supply of MIPS sufficiently ahead of demand to deliver topflight performance for your mainframe-based applications.