Document Numbering and Tracking
Phillip A. Covington


While other components of the document management process are important, none would be possible without a central database that holds all of the critical information about every document in the system, including the keywords used to describe each file and the location where it is stored. Most important is that a unique ID gets assigned to each document and to the file containing that document. In most systems the name assigned to the document itself and the filename are one and the same. Often the ID is simply called the "Document Number." However, other popular terms include "Document ID," "Document Source Number," "Document Tracking Number," and "Document Control Number." As for the digits and characters that actually make up a document number, and what it looks like, that is determined by the number and type of documents that need to be managed. Though numbering systems vary greatly from one organization to another, nearly all of them use one of two basic formats: the plain number or the coded number.

This page last updated
29-Jan-1998



The "Plain" Document Number Format

The plain number format simply uses a straightforward number for each document. The only difference between these and other kinds of numbers you might encounter on a daily basis is that they often contain letters, underscores, hyphens, or possibly one of the few other special characters computers allow in filenames. Most commonly such numbers are preceded by one or more capital letters. An excellent example of this type of system can be seen in use at the Web sites of several large software companies for managing the documents in their technical support Knowledge Bases (just another way of saying "database"). Each document in the Knowledge Base (KB for short) contains helpful instructions, tips, technical specifications, or information about possible or confirmed bugs. If you are using any Microsoft product, for instance, you can access these types of documents (usually called articles) on their KB at this address:

http://www.microsoft.com/kb/default.asp

In the case of the Microsoft KB all article numbers are in the format of a 6-digit number preceded by a capital letter "Q" (although as the number of documents grows they may go to the next letter in the alphabet or add a 7th digit for new articles). So a complete number might look something like this: "Q999999." This number is just used as an example and is not a valid KB article to the best of my knowledge. However, if there is something you've needed to know about a Microsoft product, their KB is an excellent resource.

Pros and Cons of The Plain Number Format

The main advantage of the plain number format is its simplicity and compactness. The format is best applied to situations like the one above, where it is used to manage similar or identical kinds of documents. Even so, it is still widely used for general applications where the purpose and type of documents vary. A second possible advantage is that IDs using the plain number format are usually easy for employees or customers to remember and type.

The disadvantage of the plain number format is that, while the numbers may be easy to remember and type, they contain little information about the nature of the document, so a wrongly entered number is likely to go unnoticed. In the example of the KB article numbers above, the only distinctive identifying information is that the number must begin with a capital letter "Q" followed by 6 digits. If you saw a number without the "Q," or one that had more or fewer than 6 digits, you could probably safely assume it isn't a Knowledge Base document. However, any other errors would be hard to catch. We'll use "E" (for our example here) to create a document number example below that illustrates a common potential error.

Let's assume that we have two documents. The first has the number "E589839" and is an example help document for instructing new employees. The second, "E598939," is a memo that was circulated last week concerning an upcoming increase in employee contributions for the company's health care program. Notice how easy it would be to mix up the "89" and "98" sequences and enter one number in place of the other? And because the number contains no information which serves as a clue to its contents, the mistake might not be caught until the wrong document had actually been circulated. This is often what happens when you get the "wrong" junk mail in your mailbox, or the wrong kind of letter or statement from your bank. With the plain number format virtually all information concerning the nature of the document is stored within the document management database only.
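To make the point concrete, here is a minimal Python sketch. It assumes the hypothetical "E" numbers follow the same shape as the "Q" numbers (one capital letter followed by 6 digits); the pattern and the function name are illustrative assumptions, not part of any real system. A format check can reject malformed numbers, but it accepts both of the look-alike numbers above, because nothing in a plain number describes the document.

```python
import re

# Assumed shape of the hypothetical "E" numbers: a capital "E" plus 6 digits.
PLAIN_NUMBER = re.compile(r"E\d{6}")


def looks_like_plain_number(text: str) -> bool:
    """Return True if text has the assumed shape of a plain document number."""
    return PLAIN_NUMBER.fullmatch(text) is not None


# Malformed numbers are easy to reject...
print(looks_like_plain_number("589839"))   # missing the "E" prefix
print(looks_like_plain_number("E58983"))   # too few digits

# ...but the help document and the health care memo both pass the check,
# so entering one in place of the other raises no warning at all.
print(looks_like_plain_number("E589839"))
print(looks_like_plain_number("E598939"))
```

Only a lookup in the document management database can tell the two apart.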

The "Coded" Number Format

As the name implies, within this number format is coded information describing details about the nature of the document it references. Unlike the plain number format, which looks more or less like an ordinary number, coded numbers can take on almost any appearance depending upon the application. Even though it is not a regular "document," an example almost everyone is familiar with is that of credit-card numbers. More information is built into these and other numbers than some realize. However, most know that the leading digits of a credit-card number identify the type of card. Visa numbers, for instance, start with 4 (4387 is one such prefix), while MasterCard numbers have traditionally started with 5 (5407, for example). Discover, American Express, and all of the others are similarly distinct. Have you ever noticed how if you order something over the phone you are hardly ever asked anymore what type of card you are using? That's because experienced order takers have memorized which cards start with which numbers, so they know almost as soon as you start talking.
Another excellent example, though not directly related to documents, is that of telephone numbers. All US telephone numbers are coded exactly the same way: a 3-digit area code (which identifies which part of the state), a 3-digit prefix (which identifies which part of which city), and a 4-digit number which identifies the individual or business. Lastly, another common example, and one which IS very directly document related: zip codes! Zip codes form the generic document management system the US Postal Service utilizes to manage YOUR documents. The 9-digit Zip+4 number encodes enough information that, even without your street address, mail can be sorted as close as your specific residential block or an individual commercial building. In both the examples of credit-card numbers and zip codes you can see how the coded format of a number can provide helpful information even apart from the computer that manages those numbers.
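As a small illustration of how such coded numbers can be read by position alone, the following Python sketch splits a 10-digit phone number and a Zip+4 code into the parts described above. The function names, the dictionary keys, and the sample values are made up for this example.

```python
import re


def split_phone_number(digits: str) -> dict:
    """Split a 10-digit phone number into area code, prefix, and line number."""
    if re.fullmatch(r"[0-9]{10}", digits) is None:
        raise ValueError("expected exactly 10 digits")
    return {
        "area_code": digits[0:3],     # which part of the state
        "prefix": digits[3:6],        # which part of which city
        "line_number": digits[6:10],  # the individual or business
    }


def split_zip_plus_4(code: str) -> dict:
    """Split a Zip+4 code written as NNNNN-NNNN into its two parts."""
    match = re.fullmatch(r"([0-9]{5})-([0-9]{4})", code)
    if match is None:
        raise ValueError("expected the form NNNNN-NNNN")
    return {"zip": match.group(1), "plus4": match.group(2)}


# Sample values invented for illustration only.
print(split_phone_number("5555550123"))
print(split_zip_plus_4("12345-6789"))
```

No database lookup is needed to pull these parts out, which is exactly what distinguishes a coded number from a plain one.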

The pros and cons of coded format numbers will be covered shortly. However, it is worth noting the major drawback here: coded format numbers require MUCH more careful planning from the very beginning because once a format is decided upon it can be VERY difficult to modify or expand upon. With plain format numbers, such as the "Q" numbers above, when the numbers' limits are reached, Q999999 in this case, a new prefix letter (or letters) can simply be added to the front, or, more likely, more digits can simply be added to the end. It is easy to see, though, how each of the coded format numbers provided as examples above would be difficult to modify or expand. The Year 2000 problem is the result of a poorly planned coded number format.
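The ease of extending a plain number can be shown in a few lines of Python. This is only a sketch under the assumption that a plain number is a run of capital letters followed by a zero-padded counter; it is not how Microsoft or anyone else actually assigns numbers, and the function name is hypothetical.

```python
import re


def next_plain_number(current: str) -> str:
    """Return the number after `current`, growing by a digit on overflow."""
    match = re.fullmatch(r"([A-Z]+)([0-9]+)", current)
    if match is None:
        raise ValueError(f"not a plain document number: {current!r}")
    prefix, digits = match.groups()
    # zfill keeps the existing zero padding but never truncates, so going
    # past the last number of the current width simply adds one more digit.
    return prefix + str(int(digits) + 1).zfill(len(digits))


print(next_plain_number("Q000041"))
print(next_plain_number("Q999999"))
```

Nothing else in the number has to move to make room for the extra digit, which is precisely what a fixed-position coded format cannot offer.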

Another looming problem, which I spotted years before hearing anyone else talk about it, is one that I can hopefully draw more attention to by using it here as an example: phone numbers! I call it simply the "ACPS" (Area Code Prefix Shortage) problem. In my opinion the ACPS problem is potentially as serious as Year 2000, the only difference being that there is no specific deadline; the problem is simply ongoing and keeps getting worse. Just as poorly (though not intentionally) planned coded number systems were responsible for limiting PCs and other small computers to an inefficient 640K-based memory system that is still with us today, and likewise for the Year 2000 date format problem, so it is with our country's phone numbering system.

The current 10-digit phone numbering system can only accommodate so many numbers within each area code. When the phone companies run out of numbers, which is frequently these days, the only option is to add a new area code and disrupt existing customers by changing the area codes they already have. Think of the costs associated with reprinting business cards, letterhead, brochures, etc., and updating directories each time this happens. What is really needed is a new coded number format. By putting this problem off (as most organizations did with Y2K) the phone companies are also denying all phone customers something many have wanted for a long time: the Universal Phone Number.
With a UPN customers could be assigned a phone number one time which they could keep for a lifetime, no matter where in the country they might move! To accomplish this the current format of the phone number would have to be changed, and possibly a digit added. However, most wouldn't mind memorizing an extra digit for the convenience of a lifetime phone number. In addition, the cost savings from not having to reprint and/or update items or databases because of changed phone numbers would be enormous. Although many phones would have to be replaced, as with the Year 2000 problem, ACPS is not a technology issue. Instead it is just a matter of people not wanting to make the changes because they don't have to.

While I did just take a paragraph or two to editorialize a bit, my purpose in offering my perspective on the ACPS issue was to drive home the importance of thoroughly planning any new system before you implement it. If you are leaning toward a coded format numbering system because of its advantages, it simply cannot be overstated that its major disadvantage is how unforgiving it is of poor planning.

Again, the details of any coded number format you might adopt will depend upon your specific application and needs. The coded format number is the one I personally use for my own document management system. An example of an actual plain number format system, Microsoft's Knowledge Base, was given above. Below you can look at the structure of the most flexible of several coded format systems I've personally designed over the years. A number using this system looks like this:

S__00563300_0137_IBM010_V001_01JAN2000_
20709_PAC_STD_SO3-rct3

In case you're thinking: "that's a long number!" You're right, it is! However, coded format numbers by nature tend to be longer. How long depends on how much information you feel it is important for the number to hold, the size and potential future growth of your organization, and how flexible you want the system to be into the foreseeable future. The above number has only wrapped to the next line because of the narrow margin used here to make text easier to follow and to fit within a browser's window. Also, smaller type can easily be used to reduce the size of the number when it is used on printed correspondence. However, I know of several organizations with coded format numbering systems that take up two or even three lines when placed at the bottom of printed correspondence!
The format of the above number is divided into 11 sections which are defined below. Remember that most computer numbering systems, including this one, start at 0, not 1. Note also that this system is not in use at IBM. I just used IBM in this example because it happened to neatly match the 3-letter format of the code. For most organizations, which have longer names, the name would be abbreviated to form a 3-letter name code.



S__00563300_0137_IBM010_V001_01JAN2000_
20709_PAC_STD_SO3-rct3


S__          Routing Status Code: Sent, Received, etc.
00563300     The actual document number
0137         The ID of the Doc. Mgmt. network server
IBM          The organization's name or dept. code
010          The office's country code
V001         Document version number
01JAN2000    Date (Year 2000 compliant)
20709_PAC    Employee ID (me in this case)
STD          Type of document: Standard, Form, etc.
SO3          Security level (organization dependent)
rct3         Document revision information




Note that revision information, "rct3," is not an actual part of the main document number, but is suffixed, where needed, within the body of each individual document.
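The sections listed above can be pulled apart mechanically. The Python sketch below is one possible reading of the layout, inferred only from the single example number: the field widths, the character classes, the underscore padding of the status code, and every name in the code are assumptions rather than a specification of the system.

```python
import re
from datetime import date

# Layout inferred from the one example number; widths are assumptions.
CODED_NUMBER = re.compile(
    r"(?P<status>[A-Z][A-Z_]{2})"         # routing status, padded with "_"
    r"(?P<doc_number>[0-9]{8})_"
    r"(?P<server>[0-9]{4})_"
    r"(?P<org>[A-Z]{3})(?P<country>[0-9]{3})_"
    r"(?P<version>V[0-9]{3})_"
    r"(?P<date>[0-9]{2}[A-Z]{3}[0-9]{4})_"
    r"(?P<employee>[0-9]{5}_[A-Z]{3})_"
    r"(?P<doc_type>[A-Z]{3})_"
    r"(?P<security>[A-Z0-9]{3})"
    r"(?:-(?P<revision>[a-z0-9]+))?"      # optional revision suffix
)

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def parse_coded_number(number: str) -> dict:
    """Split a coded document number into its named sections."""
    match = CODED_NUMBER.fullmatch(number)
    if match is None:
        raise ValueError(f"not a valid coded number: {number!r}")
    fields = match.groupdict()
    raw = fields["date"]
    if raw[2:5] not in MONTHS:
        raise ValueError(f"unknown month in date: {raw!r}")
    # date() itself raises ValueError for impossible days such as 31FEB.
    fields["date"] = date(int(raw[5:]), MONTHS.index(raw[2:5]) + 1, int(raw[:2]))
    return fields


# The example number, with its two wrapped halves joined into one string.
example = "S__00563300_0137_IBM010_V001_01JAN2000_" + "20709_PAC_STD_SO3-rct3"
for name, value in parse_coded_number(example).items():
    print(f"{name:10} {value}")
```

Because every section has a fixed position and shape, a mistyped or misplaced character tends to make the whole number fail to parse instead of silently pointing at a different document.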

The above coded format numbering system is flexible, generic in that it can be used by different organizations, and has enough built-in capacity for growth that it is unlikely to be outgrown. It can handle any size organization, from the smallest to the largest, with many offices spread over many locations and even overseas. I provided the above number as an example only, since each organization must determine what will work for them; so I won't go into further detail about how various sections of the number are coded. However, the above system can accommodate in excess of 1 billion documents being generated per year, which is more than all but the largest of organizations might produce and sufficient to avoid a Year 2000 type of limitation.

As we've already seen, coded format numbers are less prone to human data entry error because it's easier to spot digits or characters that look out of place within the sequence. Other advantages include the ability to locate many documents manually when appropriate, or in the event of a systems failure should the document management server go down temporarily, and the ability to easily ascertain, when talking with a recipient of one of your documents, which version they have even if you don't happen to be at your computer.

Pros and Cons of The Coded Number Format

We've already covered the major disadvantage of coded format numbers: that they are rigid and hard to change if not thoroughly planned from the beginning. Another disadvantage is that, without software capable of generating the numbers and/or managing filenames, it can become tedious, error prone, or even impractical for users to work with such numbers manually. One other possible disadvantage is fairly minor: the filenames may sometimes be too long to easily be seen when browsing files, etc. Overall, however, the advantages of the coded format number, if properly planned and implemented, largely outweigh the disadvantages.
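To give a sense of what such number-generating software has to do, here is a minimal Python sketch that assembles a number in the layout of the example shown earlier. The function name, its parameters, and the field widths are assumptions inferred from that one example, and the validation is deliberately minimal.

```python
from datetime import date

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def build_coded_number(status: str, doc_number: int, server: int, org: str,
                       country: int, version: int, issued: date,
                       employee: str, doc_type: str, security: str,
                       revision: str = "") -> str:
    """Assemble a coded number; widths are assumed from the single example."""
    if not 0 <= doc_number < 10**8:
        raise ValueError("document number must fit in 8 digits")
    stamp = f"{issued.day:02d}{MONTHS[issued.month - 1]}{issued.year:04d}"
    parts = [
        # The status code is padded to 3 characters with underscores and is
        # followed directly by the 8-digit document number.
        f"{status.ljust(3, '_')}{doc_number:08d}",
        f"{server:04d}",
        f"{org}{country:03d}",
        f"V{version:03d}",
        stamp,
        employee,
        doc_type,
        security,
    ]
    number = "_".join(parts)
    # Revision information is a suffix added only where it is needed.
    return f"{number}-{revision}" if revision else number


print(build_coded_number("S", 563300, 137, "IBM", 10, 1, date(2000, 1, 1),
                         "20709_PAC", "STD", "SO3", revision="rct3"))
```

Letting software do the padding and ordering removes the tedious, error-prone part of working with long coded numbers by hand.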




Copyright © 1998 Phillip A. Covington