"duri" and "tdb" URN namespaces based on dated URIs
Adobe Systems
345 Park Ave
San Jose, CA 95110
US
Phone: +1 408 536 3024
Email: LMM@acm.org
URI: http://larry.masinter.net
Applications
This document defines two namespaces of URNs, based on using
a timestamp with an (encoded) URI. The results are namespaces in
which names are readily assigned, offer the persistence of reference
that is required by URNs, but do not require a stable authority to
assign the name. The first namespace ("duri") is used to refer to
URI-identified resources as they appeared at a particular time. The
second namespace ("tdb") is useful as a way of creating URNs that
refer to physical objects or even abstractions that are not
themselves networked resources.
The definition of these namespaces may reduce the need to
define new URN namespaces merely for the purpose of creating stable
identifiers. In addition, they provide a ready means for identifying
"non-information resources" by semantic indirection.
This document is not a product of any working group. Many of
the ideas here have been discussed since 2001. This document has
been discussed on the mailing list <uri@w3.org>.
The URN namespaces defined here solve several related problems.
The URN specification allows for many
URN namespaces, and many have been registered. However, obtaining an
appropriate URN in any of the currently defined URN namespaces may
be difficult: a number of URN namespace registrations have been
accompanied by comments that no other URN namespace was available
for the class of documents for which identifiers were wanted.
RFC 1737 defines several requirements for Uniform Resource
Names. In particular, it requires "persistence":
Persistence: It is intended that the lifetime of a URN be
permanent. That is, the URN will be globally unique forever, and
may well be used as a reference to a resource well beyond the
lifetime of the resource it identifies or of any naming authority
involved in the assignment of its name.
Many people have wondered how to create globally unique and
persistent identifiers. There are a number of URI schemes and URN
namespaces already registered. However, an absolute guarantee of
both uniqueness and persistence is very difficult.
In some cases, the guarantee of persistence comes through a promise
of good management practice, such as is encouraged in "Cool URIs
don't change". However, relying on a promise of good
management practice is not the same as having a design that
guarantees reliability independent of actual administrative
practice.
A primary design goal for URIs is that they are intended to mean the
same thing, no matter in what context they appear: a "Uniform" way
to Identify a Resource. However, even when URIs have Uniform meaning
from the point of view of the source of the reference, they don't
guarantee stability over time. Despite best efforts and intentions,
identifying information can change in unpredictable ways: domain
names can disappear or be reassigned, and name-assigning organizations
can change structure or responsibility, disappear, merge, or
otherwise change in unpredictable ways.
The interpretation of many URNs depends significantly on the
concept of a "naming authority". The authority is presumably
some individual or organization that both ensures uniqueness of
assignment and helps with understanding the meaning of the
link between the name and the named.
However, authorities, whether individuals or organizations, have a
lifetime, and must be consulted at some point to understand the
bindings. The functioning of names as unique identifiers and holders
of meaning depends on having a reliable infrastructure for consulting
the authority or the authority's records to determine the thing
referenced.
The URI specification describes a
range for 'Resource' that is quite broad:
This specification does not limit the scope of what might be a
resource; rather, the term "resource" is used in a general sense
for whatever might be identified by a URI. Familiar examples
include an electronic document, an image, a source of information
with a consistent purpose (e.g., "today's weather report for Los
Angeles"), a service (e.g., an HTTP-to-SMS gateway), and a
collection of other resources.
A resource is not necessarily
accessible via the Internet; e.g., human beings, corporations, and
bound books in a library can also be resources. Likewise,
abstract concepts can be resources, such as the operators and
operands of a mathematical equation, the types of a relationship
(e.g., "parent" or "employee"), or numeric values (e.g., zero,
one, and infinity).
One might use a URI such as a "mailto:" URI to identify
a person, or an "http:" URI to identify an abstract concept.
However, this leaves the question of how one might identify, within
the same context, both the system mailbox and the person to whom
it is assigned, or the web page at an http URI and the person or
concept it describes. The "tdb" URN scheme allows ready assignment
of URIs for abstractions that are distinguished from the media
content that describes them.
The goal, then, of the "tdb" URN scheme proposed below is to
provide a mechanism which is, at the same time:
permanent: The identity of the resource identified
   is not subject to reinterpretation over time.
explicitly bound: The mechanism by which the identified
   resource can be determined is explicitly included in
   the URI.
useful for non-networked items: The mechanism allows
   identification of resources outside the network:
   people, organizations, abstract concepts.
no administration: The mechanism does not depend on reliable
   administrative processes of authorities for either assignment
   or interpretation.
It is traditional in conventional references and citations in printed
works to include the date of publication; this practice serves the
important purpose that the context of the naming can be determined.
The "duri" URN namespace takes the form:
urn:duri:<date>:<encoded-URI>
where <date> is a digit string corresponding to a date,
and an <encoded-URI> is an absolute URI-reference in
which any character excluded from URN syntax has been escaped.
The meaning of a duri is "the resource (or fragment) that was
identified by the <encoded-URI> (after hex decoding) at the very
last instant of the date(time) given".
For example, 'urn:duri:2001:http://www.ietf.org' is a persistent
identifier for 'http://www.ietf.org' as of the very last instant of
the year 2001. A duri may not be a resource locator in a practical
sense if the time of location has not yet arrived or has
passed. However, it is an acceptable resource identifier, and fulfills
all of the requirements for URNs.
The second URN namespace defined is a parallel space which is useful
for describing entities, concepts, abstractions, and other items
which are not themselves network accessible resources, but have been
at some point described by network accessible resources.
The "tdb" namespace designates the "thing described by" a resource at
a given URI at the given time. This URN namespace is identified by
'tdb', e.g.,
urn:tdb:<date>:<encoded-URI>
with the same syntactic rules as 'duri'.
The intent is to use the inversion of "is a document about". It is
common practice to give a reference for a concept by including a
pointer to a document, segment, or phrase that defines the concept.
"tdb" attempts to capture this practice in URI space.
For example, "urn:tdb:2001:http://www.ietf.org" can be used to
designate the Internet Engineering Task Force organization, at least
as it was described by or referenced by its home page at the last
instant of 2001.
The "tdb" namespace differs from most other URN methods for
identifying abstractions because the designation of what is actually
identified by the tdb doesn't depend on knowing the intention of the
"assigner" of the identifier. Unlike many of the alternatives
proposed, the identification is not dependent on the context of use.
The "tdb" namespace can be thought of as adding a level of
semantic indirection to URI resolution. While one could imagine using 'tdb'
without a date, it would leave the possibility that a reference that
is unambiguous at one time might become ambiguous at some other
time. There are two ways that the date is useful for "tdb" URNs:
it fixes the time of access of the resource, for variable descriptions,
and it fixes the time of interpretation, for descriptions whose
meaning (in natural language) might vary.
Both "duri" and "tdb" URN namespaces require that some characters in
the URI references be encoded.
The characters that must be encoded are:
All characters marked excluded in
RFC 2141, section 2.4:
"&<>[]^`{|}~
These are excluded because they are not allowed in URNs.
The character "#"
Note that <encoded-URI> can include a
fragment identifier; the "#" character used to delimit it must
be encoded. This feature is intended for use with "tdb", where
the fragment identified might contain the description. Including
an encoded "#" with a "duri" is not as useful, since the fragment
identifier might as well be applied to the duri itself.
The character "%"
The encoded-URI can itself contain encoded characters, which are
encoded with the same method. To ensure that decoding happens at
the right level of processing, the "%" itself must be encoded.
Unfortunately, there may well be cases where there is a double
encoding of characters, first to construct the embedded URI
itself and second to then embed the URI within the tdb or
duri URN.
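The encoding rules above can be illustrated with a short sketch. It is
not part of the namespace definition: the names encode_uri and make_urn
are hypothetical, and only the characters listed in the text (plus "#"
and "%") are escaped; other octets that URN syntax excludes, such as
spaces, control characters, and non-ASCII octets, are assumed to be
already escaped in the URI being embedded.

```python
# Characters that the text above says must be escaped in <encoded-URI>:
# the excluded characters as listed, plus "#" and "%".
MUST_ENCODE = set('"&<>[]^`{|}~') | {"#", "%"}


def encode_uri(uri: str) -> str:
    """Hex-encode each listed character; leave everything else alone."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in MUST_ENCODE else ch
        for ch in uri
    )


def make_urn(namespace: str, date: str, uri: str) -> str:
    """Build urn:<namespace>:<date>:<encoded-URI> (hypothetical helper)."""
    if namespace not in ("duri", "tdb"):
        raise ValueError("namespace must be 'duri' or 'tdb'")
    if not (date.isascii() and date.isdigit()):
        raise ValueError("date must be a digit string")
    return "urn:{}:{}:{}".format(namespace, date, encode_uri(uri))


if __name__ == "__main__":
    # An already-escaped space ("%20") is escaped a second time.
    print(make_urn("tdb", "2001", "http://example.org/a%20b#sec"))
```

Because each character is examined once, a "%" that was already part of
the embedded URI becomes "%25" exactly once, which is the double
encoding described above.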
The URN specification
discourages the use of "/" in URNs because,
in general, there is no good interpretation of hierarchy and
relative URIs for assigned names. However, for the particular case
of duris (at least), there seems to be no good reason to avoid the
"/" because it corresponds fairly naturally (in many cases) to the
hierarchy of the original space.
Note that because of this, "duri" URNs can actually be used with
relative URI references with some amount of reliability. (The double
encoding of previously encoded URI characters causes some problems.)
A <date> is a simple expression of date, optional time, with
arbitrary precision. The goal is to allow relatively short
expressions of dates with no ambiguity, and with arbitrary
precision.
The representation of a date or time refers to the very last instant
of the given date/time range at the resolution supplied, so that
1999 and 1999123123595999999 are equivalent. If necessary, "dates" can
include times and even fractional times, so that a generator of
duris can be arbitrarily precise.
Dates are interpreted relative to International Atomic Time (TAI).
The syntax and semantics are similar to
those in RFC 2550; in particular,
using TAI avoids ambiguity about time zones and difficulties with
leap seconds.
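As an illustration of the "very last instant" rule, the following
sketch expands a date string to the end of the range it names and
compares two dates on that basis. It rests on assumptions not stated
in this document: a four-digit year, two-digit month, day, hour,
minute, and second fields, and any further digits read as a decimal
fraction of a second. The function names are hypothetical.

```python
import calendar


def last_instant(date: str, width: int = 14) -> str:
    """Expand a <date> to the last instant of its range.

    Missing fields take their largest value; the result is padded
    with 9s (fractional seconds) to at least `width` digits.
    """
    if not (date.isascii() and date.isdigit()) or len(date) < 4:
        raise ValueError("date must be a digit string of at least YYYY")
    if len(date) < 14 and len(date) % 2:
        raise ValueError("date ends in the middle of a field")
    if len(date) < 6:
        date += "12"                      # last month of the year
    if len(date) < 8:
        year, month = int(date[:4]), int(date[4:6])
        date += "{:02d}".format(calendar.monthrange(year, month)[1])
    for end, largest in ((10, "23"), (12, "59"), (14, "59")):
        if len(date) < end:               # hh, then mm, then ss
            date += largest
    return date.ljust(width, "9")


def same_instant(a: str, b: str) -> bool:
    """True if two dates name the same last instant."""
    width = max(len(a), len(b), 14)
    return last_instant(a, width) == last_instant(b, width)


if __name__ == "__main__":
    print(same_instant("1999", "1999123123595999999"))
```

Padding to the longer of the two strings is a design choice of the
sketch; it lets a short date such as "1999" be compared with a date of
any stated precision.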
The intent of "duri" and "tdb" is to use them with embedded URI
references that identify documents or document fragments. That is,
they are most useful for information resources.
For example, either namespace can be used with an "http" URI to refer
to a web page or the subject of a web site at a given time. This can
be a way of referring to a web site at some time in the past, or an
organization that has changed or merged.
Local systems that have known-to-be unique host names can use "file" URIs
with "tdb", for example, since this use is primarily focused on providing a unique way of
identifying an abstraction, even if the referent of the abstraction
is not widely known. (Using 'file:' URIs in this way without a fully
qualified domain name is not recommended, because the interpretation
is not uniform.)
Some URI schemes are more problematic. For example, using tdb
or duri with an embedded 'urn:' might not seem to be too useful.
But it might be useful where the assignment of names in a URN
namespace is not, in practice, permanent, or where one might want to
refer to the assignment as of a given date. In this case, it is
possible to use a "urn" within a "duri", e.g.,
'urn:duri:2000:urn:ietf:std:50' might be used to refer to
"the document that was STD 50 in effect as
of the last instant of 2000".
One might consider using "tdb" with "data" to designate concepts
that can be described uniquely and briefly inline. For example,
'urn:tdb:2001:data:,The%2520US%2520president' names the concept
described by the (text/plain) string "The US
president" at the very last instant of 2001. (Note the awkward
double quoting of space as "%20" and then the "%" as "%25".) Of
course, this practice is only useful if the referent of the data is
(or was at the time) completely unique. Since "data" does not
contain a way to designate content-language, the string in question
would have to be unambiguous as to its language. In the case of
'data', there is no assigning authority at all; the interpretation
of the 'tdb' URN depends on the interpreting community.
Some have suggested that perhaps only one of "tdb" or "duri" might
have a date parameter, e.g., allow "tdb:<encoded-URI>" as a URI
scheme, and "urn:duri:<date>:tdb:<encoded-URI>" if one
really wanted to fix the date of interpretation. While doing this
might be possible, it seems that the usage pattern which allows the
date to be omitted when needed satisfies the instances where the
resource doing the description is permanent to stand.
There are actually two dates to consider, with "tdb". There is
the date that the resource is obtained, and there is the date
that the description it makes is read, understood, and used to denote.
Normally in a literary work in natural language which makes
a reference to another work, both the reference itself and the
work referenced are dated, e.g., a footnote in an article
written in 1967 might talk about a "private communication" which
itself had a date. The difference between a URI and a conventional
literary reference is the desire to be able to extract the URI
from its context and still retain its meaning.
Dates far in the future are suspect, because the meaning of the duri
or tdb cannot readily be determined in advance reliably. Dates
whose range ends before assignment of the resource to the embedded
URI SHOULD NOT be used, because the meaning of the reference is left
in question. For example, using http URIs with a date range which
ends before a web service was available at the given URI doesn't
make much sense.
However, although these practices are NOT RECOMMENDED, there is no
assurance that they haven't been used; by itself, a duri/tdb does not
constitute an assertion that the encoded-URI was available or
assigned at the date specified.
Note that the use of the "very last instant" allows for the
bibliographic convention that a work published
in 2009 can use "2009" as the date string, to refer to the
work in the year of publication.
Because of the many possible schemes that can be used in the
<encoded-URI> portion, there should be no difficulty in almost any
computational process being able to assign duris or tdbs at will. Of
course, it is necessary for there to be some resource which is
available at some point in time, and to have a clock which is
accurate to the granularity of the frequency of assignment.
There are no accurate resolution servers for duri or tdb URNs. However,
a duri might be "resolvable" in the sense that a resource that was
accessed at a point in time might have the result of that access
cached or archived in an Internet archive service. See, for example,
the "Internet Archive" project. A
"tdb" is only resolvable in the sense that if the corresponding duri
can be resolved, it may be possible that the result can be accessed
and interpreted.
Clients without access to an Internet archive service might take the
decoded <encoded-URI> of a duri and attempt resolution of
*that* identifier. This will give an approximation whose reliability
depends on the amount of time elapsed since the date indicated.
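A client following this approach first has to recover the embedded
URI. The sketch below shows one way to do so; the function name
parse_dated_urn is hypothetical, and the standard-library unquote is
used on the assumption that undoing exactly one level of hex encoding
is what "decoded" means here.

```python
from urllib.parse import unquote


def parse_dated_urn(urn: str):
    """Split a duri or tdb URN into (namespace, date, decoded URI)."""
    parts = urn.split(":", 3)
    if len(parts) != 4 or parts[0].lower() != "urn":
        raise ValueError("not a URN of the form urn:<nid>:<date>:<uri>")
    namespace, date, encoded = parts[1].lower(), parts[2], parts[3]
    if namespace not in ("duri", "tdb"):
        raise ValueError("namespace is neither 'duri' nor 'tdb'")
    if not (date.isascii() and date.isdigit()):
        raise ValueError("date must be a digit string")
    # unquote() makes a single pass, so "%2520" becomes "%20" and the
    # embedded URI keeps its own escapes.
    return namespace, date, unquote(encoded)


if __name__ == "__main__":
    print(parse_dated_urn("urn:duri:2001:http://www.ietf.org"))
```

The decoded URI can then be handed to an ordinary resolver, or paired
with the date when querying an archive service; neither step is shown
here.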
There are a number of proposals for URN schemes that create otherwise
unbound "names", where the URN scheme only provides for uniqueness.
Neither "duri" nor "tdb" intrinsically has the property that the
names assigned are without any resolution semantics. This is
intentional; it's difficult to create names that carry no semantics
whatsoever about the authority that assigned the name and the
intention of the authority for what the name should designate.
One might consider the date in a duri/tdb to be just one piece of
additional metadata about the encoded-URI, and consider adding other
pieces of metadata as annotation.
However, the use of the date in a duri/tdb is intended primarily as a
mechanism of accomplishing uniqueness over time. No other bit of
metadata or description readily fills that purpose. Further, the date
is not descriptive (an assertion about the encoded-URI) but merely
refining.
Many applications of URIs already provide a context of date. For
example, one could imagine a hypertext system where the URIs contained
within a document were intended to refer to the resources as of the
date of the enclosing document. This would be a reasonable
interpretation of URIs within an Internet archive system, for example.
And some applications of URIs arguably already contain the level of
interpretive indirection that is explicit with "tdb". For example,
one might consider the use of URIs as namespace names within XML
as a reference to the "thing
described by" the URI used.
The "tdb" scheme introduces a level of semantic indirection. The
puzzles and confusions about use and mention, name and reference,
and levels of indirection have been puzzling and amusing for quite a
while.
"It's long," said the Knight, "but it's very, very beautiful. Everybody that hears me sing it--either it brings tears into their eyes, or else--"
"Or else what?" said Alice, for the Knight had made a sudden pause.
"Or else it doesn't, you know. The name of the song is called 'Haddock's Eyes.'"
"Oh, that's the name of the song, is it?" Alice said, trying to
feel interested.
"No, you don't understand," the Knight said, looking a little
vexed. "That's what the name is called. The name really is 'The
Aged Aged Man.'"
"Then I ought to have said 'That's what the song is called'?" Alice corrected herself.
"No, you oughtn't: that's quite another thing! The song is called 'Ways and Means': but that's only what it's called, you know!"
"Well, what is the song, then?" said Alice, who was by this time completely bewildered.
"I was coming to that," the Knight said. "The song really is 'A-sitting On A Gate': and the tune's my own invention."
"duri" requested.
Registration Version: 1
Registration Date: 2001-08-19
Larry Masinter
Briefly, the syntax is
urn:duri:<date>:<encoded-URI>
The syntax is described in this document.
(See References of this document)
Uniqueness is guaranteed by the structure of adding
a designation of a specific instant to a URI. However,
URIs with ambiguous interpretation at any given
instant (e.g., "file" URIs without a given host name)
will not be unique.
The designation of a dated URI is completely persistent
for all time.
Any date can be used with any URI independently
by anyone.
Identifiers can only be resolved approximately, as discussed
earlier in this document.
Note that the use of "/" for hierarchy, while discouraged
in the URN specification, is allowed in duris.
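For dates, a date is equivalent to its expansion to the very last
instant of the range it names: YYYY is equivalent to YYYY12, YYYYMM
is equivalent to YYYYMMDD where DD is the last day of that month, and
YYYYMMDD is equivalent to YYYYMMDD235959 followed by any number of 9's.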
In considering equivalence of the encoded URI, if two duris with
equivalent dates contain lexically equivalent URIs, the duris
are equivalent.
Dates should be reasonable and meet the syntactic requirements.
The URI encoded within should meet the syntactic requirements of
the URI scheme used.
Global.
"tdb" requested.
Registration Version: 1
Registration Date: 2002-04-01
Larry Masinter
Briefly, the syntax is
urn:tdb:<date>:<encoded-URI>
The syntax is described in this document.
(See References of this document)
Uniqueness is guaranteed by the structure of adding
a designation of a specific instant to a URI. However,
URIs with ambiguous interpretation at any given
instant (e.g., "file" URIs without a given host name)
will not be unique.
The designation of a dated URI is completely persistent
for all time, although the intent of a resource that
is no longer available will be hard to discern.
Any date can be used with any URI independently
by anyone. However, assigning an identifier to
a non-networked resource such as a person or
abstract concept requires (at least conceptually)
first creating a networked resource that uniquely
describes the target, and then constructing the duri
using the URI of the description.
Resolution of "tdb" identifiers requires interpreting
the resource identified by the corresponding "duri".
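See the discussion of resolution earlier in this document.
As with "duri"; see the "duri" registration above.
As with "duri"; see the "duri" registration above.
As with "duri"; see the "duri" registration above.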
Global.
This document includes two URN NID registrations ("duri" and "tdb")
that should be entered into the IANA registry of URN NIDs.
"duri" and "tdb" identifiers are not any more reliable because they
have dates. URIs don't contain enough information to supply the
authority for deciding what was or wasn't at a given URI at a given
date.
There have been many discussions over several years on the relationship of URLs, URNs, URIs, resources and resource identifiers, with many contributions.
Particular thanks to Aaron Swartz, Brian McBride,
Stuart Williams, Michael Mealling, and many others.
URN Syntax, AT&T
A URN Namespace for IETF Documents
Uniform Resource Identifiers (URI): Generic Syntax
International Atomic Time, Bureau International des Poids et Mesures
Namespaces in XML
Cool URIs don't change, W3C
Functional Requirements for Uniform Resource Names
Y10K and Beyond
Preserving the Internet, Alexa Internet
Through the Looking Glass