Use chunky iteration for performance

If you have an iterator over some collection of data, but have performance problems from paying a function call for accessing each member of the collection, you might be tempted to make the internal data structures of that collection public for direct access.

Instead, to preserve data abstraction and still improve performance, amortize the function call over multiple data members, by introducing a chunky iterator. A chunky iterator returns multiple data elements at one time. The iterator, not the client, determines the number of elements to return based on the internal structure of the data collection, which the client knows nothing about. The iterator also indicates how many elements were returned with each call.

For example, consider an iterator over a string of characters. Making a function call for each character is very expensive. But exporting the data structure directly precludes using some other data structure in the future. By using a chunky iterator, you can get the best performance possible:

Where the string is just an array of characters, the chunky iterator returns a pointer to the array and the length of the array, thus returning the whole string at once.

For a string that consists of several such arrays (such as text stored in a recursive run array or other noncontiguous chunks), the iterator returns each array successively, indicating how many characters are in each one.

If a string does not use an array of characters internally at all, the iterator has an internal array of characters of some predetermined size (large enough to amortize the costs of the iterator function calls). For each iteration, it copies characters out of the string into its internal array, then returns a pointer to the array and indicates the number of characters.

In each case, the client knows nothing about the internal data structures, and still gets good performance. It's even possible to preserve the easier one-at-a-time interface by using inline functions that turn around and call the chunky interface (in fact, it is preferable because it is simpler for clients). This is the technique used by TStream and the C stdio library functions, such as getchar().


[Contents] [Previous] [Next]
Click the icon to mail questions or corrections about this material to Taligent personnel.
Copyright©1995 Taligent,Inc. All rights reserved.

Generated with WebMaker