Network Working Group                                      M. Nottingham
Internet-Draft                                           E. Hammer-Lahav
Intended status: Informational                         February 10, 2009
Expires: August 14, 2009

                       Host Metadata for the Web
                     draft-nottingham-site-meta-01

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on August 14, 2009.

Copyright Notice

Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Abstract

This memo describes a method for locating host-specific metadata for the Web.

Nottingham & Hammer-Lahav    Expires August 14, 2009    [Page 1]
Internet-Draft    Host Metadata for the Web    February 2009

Table of Contents

   1.  Introduction
   2.  Notational Conventions
   3.  The host-meta File Format
     3.1.  The Link host-meta Field
   4.  Discovering host-meta Files
   5.  Minting New meta-fields
   6.  Security Considerations
   7.  IANA Considerations
     7.1.  application/host-meta Media Type Registration
     7.2.  The host-meta Field Registry
       7.2.1.  Registration Template
       7.2.2.  The Link host-meta field
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Appendix A.  Acknowledgements
   Appendix B.  Frequently Asked Questions
     B.1.  Is this mechanism appropriate for all kinds of metadata?
     B.2.  Why not use OPTIONS * with content negotiation to discover different types of metadata directly?
     B.3.  Why not use a META tag or microformat in the root resource?
     B.4.  Why not use response headers on the root resource, and have clients use HEAD?
     B.5.  Why scope metadata to an authority?
     B.6.  Why /host-meta?
     B.7.  Aren't you concerned about pre-empting an authority's URI namespace?
     B.8.  Why use link relations instead of media types to identify kinds of metadata?
     B.9.  What impact does this have on existing mechanisms, such as P3P and robots.txt?
     B.10. Why not (insert existing similar mechanism here)?
   Appendix C.  Document History
   Authors' Addresses

1. Introduction

It is increasingly common for Web-based protocols to require the discovery of policy or metadata before making a request. For example, the Robots Exclusion Protocol specifies a way for automated processes to obtain permission to access resources; likewise, the Platform for Privacy Preferences [W3C.REC-P3P-20020416] tells user-agents how to discover privacy policy beforehand.

While there are several ways to access per-resource metadata (e.g., HTTP headers, WebDAV's PROPFIND [RFC4918]), the overhead associated with them often precludes their use in these scenarios.

When this happens, it is common to designate a "well-known location" for such metadata, so that it can be easily located. However, this approach has the drawback of risking collisions, both with other such designated "well-known locations" and with pre-existing resources.

To address this, this memo proposes a single (and hopefully last) "well-known location", /host-meta, which acts as a directory to the interesting metadata about a particular authority. Future mechanisms that require authority-wide metadata can easily include an entry in the host-meta resource, thereby making their metadata cheaply available (indeed, because it can be cached, the more mechanisms that use it, the more efficient it becomes) without impinging on others' URI space.

Note that the metadata provided by a host-meta resource is explicitly scoped to apply to the entire authority (in the URI [RFC3986] sense) associated with it (using the process described in Section 4); it does not apply to a subset, nor does it apply to other authorities (e.g., using another port, or a different hostname in the same domain). However, individual mechanisms (e.g., a relation type in the Link field) MAY reduce or expand this scope.
This should only be done after careful consideration of the consequences upon security, administration, interoperability and network load.

Please discuss this draft on the www-talk@w3.org [1] mailing list.

2. Notational Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

This document uses the Augmented Backus-Naur Form (ABNF) notation of [RFC5234], and explicitly includes the following rules from it: CRLF (CR LF), OCTET (any 8-bit sequence of data), DIGIT, ALPHA, and WSP (white space).

3. The host-meta File Format

The host-meta file format is an extremely simple textual language that allows an authority to convey metadata about itself and its resources. Its syntax is similar to that of HTTP header-fields [RFC2616], but has a few differences:

o  White space is permissible both before and after the block of fields, and

o  fields MUST NOT be folded across multiple lines.

Furthermore, this format's use diverges from HTTP header-fields in a number of ways:

o  The fields are transferred as the message body, not as headers, and

o  rather than being related to a message, the fields in host-meta pertain to the entire associated authority (see Section 4), and

o  the permissible field-names are constrained by the host-meta field registry. This specification defines one such field, Link.

   host-meta     = *( WSP / CRLF ) *( meta-field CRLF ) *( WSP / CRLF )
   meta-field    = field-name ":" [ field-value ]
   field-name    = 1*tchar
   field-value   = *( field-content / WSP )
   field-content =
   tchar         = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "."
                   / "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA

For example,

   Link: ; rel="robots"
   Link: ; rel="privacy"; type="application/p3p.xml"
   Link: ; rel="http://example.com/rel"

As with HTTP headers, field-names are not case-sensitive, unrecognised field-names SHOULD be silently ignored when parsing this format, and ordering of fields SHOULD NOT be considered significant unless specified otherwise. Additionally, although the syntax does not explicitly allow empty lines between fields, parsers SHOULD silently discard them (i.e., be permissive in what they accept).

Field content is constrained by the specification indicated by its associated field-name.

3.1. The Link host-meta Field

The "Link" host-meta field uses the syntax of the Link HTTP header-field [I-D.nottingham-http-link-header] to convey links whose context is the entire authority, rather than a single resource.

For example,

   Link: </terms>; rel="license"

indicates that the URI "/terms" refers to a license for all resources associated with the authority.

The Link host-meta field differs from the Link header in the following respects:

o  Its context is defined as all resources that share its authority, by default (although this MAY be overridden by a representation obtained from the indicated resource), and

o  When the link URI is relative, its base URI is the root resource of the authority. For example, in the example above, if the authority is "example.com", the full link URI would be "http://example.com/terms".

4. Discovering host-meta Files

The metadata for a given authority can be discovered by dereferencing the path /host-meta on the same authority.

For example, for an HTTP URI [RFC2616], the following request would obtain metadata for the authority "www.example.com:80":

   GET /host-meta HTTP/1.1
   Host: www.example.com

The semantics of the protocol used for access to the resource apply.
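As a non-normative illustration, the following Python sketch shows one way a client might derive the /host-meta location for a URI's authority, parse a retrieved body according to the format in Section 3, and resolve a relative link URI as described in Section 3.1. The function names (host_meta_url, parse_host_meta, resolve_link_uri) and the choice to return None for a body that is not in the host-meta format are assumptions made for this example; they are not defined by this memo.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

# tchar characters from the ABNF in Section 3.
TCHAR = set("!#$%&'*+-.^_`|~0123456789"
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")


def host_meta_url(uri):
    """Return the /host-meta location on the same authority as `uri`."""
    parts = urlsplit(uri)
    # Keep the scheme and authority; replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/host-meta", "", ""))


def parse_host_meta(body):
    """Return a list of (lowercased field-name, field-value) pairs.

    Returns None when a non-empty line is not a meta-field, so that the
    caller can decline to process the representation as a host-meta file.
    """
    fields = []
    for line in body.strip().split("\n"):
        line = line.rstrip("\r")
        if not line.strip():
            continue  # permissively discard empty lines between fields
        name, colon, value = line.partition(":")
        if not colon or not name or any(ch not in TCHAR for ch in name):
            return None
        # Field-names are not case-sensitive, so normalise to lowercase.
        fields.append((name.lower(), value.strip()))
    return fields


def resolve_link_uri(uri, reference):
    """Resolve a link URI against the root resource of the authority of `uri`."""
    parts = urlsplit(uri)
    root = urlunsplit((parts.scheme, parts.netloc, "/", "", ""))
    return urljoin(root, reference)
```

A caller would typically skip any returned field-name it does not recognise. The sketch deliberately leaves the request itself to the application's client, since redirects and error responses are governed by the access protocol.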
Therefore, if the resource indicates the client should try a different request (in HTTP, the 301, 302, 303 or 307 response status code), the client SHOULD attempt to do so; note that this implies that the host-meta file for one authority MAY be retrieved from a different authority. Likewise, if the resource is not available or does not exist (in HTTP, the 404 or 410 status code), the client SHOULD infer that metadata is not available via this mechanism.

If a representation is successfully obtained, but is not in the format described above, clients SHOULD infer that the authority is using this URI for other purposes, and not process it as a host-meta file. To aid in this process, authorities using this mechanism SHOULD correctly label host-meta responses with the "application/host-meta" internet media type.

5. Minting New meta-fields

Applications that wish to mint new meta-fields for use in the host-meta format MUST register them in the host-meta field registry, following the procedures in Section 7.2.

Field-names MUST conform to the field-name ABNF (Section 3), and field-value syntax MUST be well-defined (e.g., using ABNF, or a reference to the syntax of an existing header field-value). Field-values SHOULD use the ISO-8859-1 character encoding.

If a field-value applies to a scope other than the entire authority, that scope MUST be well-defined.

6. Security Considerations

The metadata returned by the /host-meta resource is presumed to be under the control of the appropriate authority and representative of all resources contained by it. If this resource is compromised or otherwise under the control of another party, it may represent a risk to the security of the server and data served by it, depending on what mechanisms use /host-meta.

Scoping metadata to a single authority is the default in host-meta.
Thus "http://example.com/", "https://example.com" and "http://www.example.com/" all have different host-meta files with separate and non-overlapping scopes of applicability. Applications that change the scope of metadata without careful consideration can incur security risks.

7. IANA Considerations

7.1. application/host-meta Media Type Registration

The host-meta format can be identified with the following media type:

   MIME media type name:  application
   MIME subtype name:  host-meta
   Mandatory parameters:  None.
   Optional parameters:  None.
   Encoding considerations:  field-values may specify any encoding for their contents, although it is expected that most will use ISO-8859-1 or a subset thereof (for both historic and interoperability purposes).
   Security considerations:  As defined in this specification. [[update upon publication]]
   Interoperability considerations:  There are no known interoperability issues.
   Published specification:  This specification. [[update upon publication]]
   Applications which use this media type:  No known applications currently use this media type.
   Additional information:
      Magic number(s):
      File extension:  None.
      Fragment identifiers:  None.
      Base URI:  None.
      Macintosh File Type code:  TEXT
   Person and email address to contact for further information:  Mark Nottingham
   Intended usage:  COMMON
   Author/Change controller:  This specification's author(s). [[update upon publication]]

7.2. The host-meta Field Registry

This document establishes the host-meta field registry as the namespace of field-names for use in meta-fields. Although some meta-fields may be similar to message headers, both syntactically and semantically, the host-meta field registry is separate from the message header field registry [RFC3864]. See Section 5 for details and requirements for registered meta-fields.
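As a non-normative illustration of the requirement that registered field-names conform to the field-name ABNF in Section 3, the following Python sketch checks a candidate name and looks it up in a registry. The function names and the in-memory REGISTRY set are hypothetical, introduced only for this example; the registry contents shown reflect only the single field (Link) that this document registers.

```python
import re

# field-name = 1*tchar, with tchar as listed in Section 3.
FIELD_NAME = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

# Hypothetical in-memory copy of the registry; this document registers only Link.
REGISTRY = {"link"}


def is_valid_field_name(name):
    """Return True if `name` consists of one or more tchar characters."""
    return FIELD_NAME.fullmatch(name) is not None


def is_registered(name):
    """Return True if `name` is syntactically valid and present in REGISTRY."""
    # Field-names are not case-sensitive, so compare in lowercase.
    return is_valid_field_name(name) and name.lower() in REGISTRY
```

A check of this kind covers only the syntax of the name; the remaining registration requirements in Section 5 still need review of the specification itself.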
Meta-fields may be registered on the advice of a Designated Expert (appointed by the IESG or their delegate), with a Specification Required (using terminology from [RFC5226]).

Registration requests consist of the completed registration template (Section 7.2.1), typically published in an RFC or Open Standard (in the sense described by [RFC2026], section 7). However, to allow for the allocation of values prior to publication, the Designated Expert may approve registration once they are satisfied that an RFC (or other Open Standard) will be published.

Upon receiving a registration request (usually via IANA), the Designated Expert should request review and comment from the apps-discuss mailing list (or a successor designated by the APPS Area Directors). Before a period of 30 days has passed, the Designated Expert will either approve or deny the registration request, communicating this decision both to the review list and to IANA. Denials should include an explanation and, if applicable, suggestions as to how to make the request successful.

7.2.1. Registration Template

   Field name:  The name requested for the new meta-field. This MUST conform to the host-meta field specification details noted in Section 3.

   Change controller:  For RFCs, state "IETF". For other open standards, give the name of the publishing body (e.g., ANSI, ISO, ITU, W3C, etc.). A postal address, home page URI, telephone and fax numbers may also be included.

   Specification document(s):  Reference to document that specifies the field, preferably including a URI that can be used to retrieve a copy of the document. An indication of the relevant sections may also be included, but is not required.

   Related information:  Optionally, citations to additional documents containing further relevant information.

7.2.2. The Link host-meta field

This specification registers one host-meta field.
   Field name:  Link
   Change controller:  IETF
   Specification document(s):  [[this document]]
   Related information:  [I-D.nottingham-http-link-header]

8. References

8.1. Normative References

   [I-D.nottingham-http-link-header]  Nottingham, M., "Link Relations and HTTP Header Linking", draft-nottingham-http-link-header-03 (work in progress), November 2008.

   [RFC2026]  Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 5226, May 2008.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, January 2008.

8.2. Informative References

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3864]  Klyne, G., Nottingham, M., and J. Mogul, "Registration Procedures for Message Header Fields", BCP 90, RFC 3864, September 2004.

   [RFC4918]  Dusseault, L., "HTTP Extensions for Web Distributed Authoring and Versioning (WebDAV)", RFC 4918, June 2007.

   [W3C.REC-P3P-20020416]  Marchiori, M., "The Platform for Privacy Preferences 1.0 (P3P1.0) Specification", W3C REC REC-P3P-20020416, April 2002.

URIs

   [1]

Appendix A. Acknowledgements

We would like to acknowledge the contributions of everyone who provided feedback and use cases for this draft; in particular, Phil Archer, Dirk Balfanz, Tim Bray, Paul Hoffman, Barry Leiba, Ashok Malhotra, Breno de Medeiros, and John Panzer.
The authors take all responsibility for errors and omissions.

Appendix B. Frequently Asked Questions

B.1. Is this mechanism appropriate for all kinds of metadata?

No. The primary use cases are described in the introduction; when it's necessary to discover metadata or policy before a resource is accessed, and/or it's necessary to describe metadata for a whole authority (or large portions of it), host-meta is appropriate. In other cases (e.g., fine-grained metadata that doesn't need to be known ahead of time), other mechanisms are more appropriate.

B.2. Why not use OPTIONS * with content negotiation to discover different types of metadata directly?

Two reasons: a) OPTIONS is not cacheable -- a severe problem for scaling -- and b) it is not well-supported in browsers, and difficult to configure in servers.

B.3. Why not use a META tag or microformat in the root resource?

This constrains the format of an authority's root resource to be HTML or similar. While extremely common, it isn't universal (e.g., mobile sites, machine-to-machine communication, etc.). Also, some root resources are very large, which would place additional overhead on clients and intervening networks.

B.4. Why not use response headers on the root resource, and have clients use HEAD?

The headers on a root resource pertain to that resource, not the whole site. While it is possible to mint new message headers that apply to the whole site, such a header would need to be sent on every response for the root resource, whether it was useful or not, with the potential for substantially increasing the size of those responses (which are often popular, and not very cacheable).

B.5. Why scope metadata to an authority?
The alternative is to allow scoping to be dynamic and determined locally, but this has its own issues, which usually come down to a) an unreasonable number of requests to determine authoritative metadata, and b) increased complexity, with a higher likelihood of implementation and interoperability (or even security) problems. Besides, many mechanisms on the Web already presume a single authority scope (e.g., robots.txt, P3P, cookies, javascript security), and the effort and cost required to mint a new URI authority is small and shrinking.

B.6. Why /host-meta?

It's short, descriptive and, according to search indices, not widely used.

B.7. Aren't you concerned about pre-empting an authority's URI namespace?

Yes, but it's unfortunately a necessary (and already present) evil; this proposal tries to minimise future abuses.

B.8. Why use link relations instead of media types to identify kinds of metadata?

A link relation declares the intent and use of the link (or inline content, when present); a media type defines the format and processing model for those bits.

B.9. What impact does this have on existing mechanisms, such as P3P and robots.txt?

None, until they choose to use this mechanism.

B.10. Why not (insert existing similar mechanism here)?

We are aware that there are several existing proposals with similar functionality. In our estimation, none have gained sufficient traction. This may be because they were perceived to be too complex, or tied too closely to one use case.

Appendix C. Document History

[[RFC Editor: please remove this section before publication.]]

o  -01

   *  Changed "site-meta" to "host-meta" after feedback.
   *  Changed from XML to text-based header-like format.
   *  Removed capability for generic inline content.
   *  Added registry for host-meta fields.
   *  Clarified scope of metadata application.
   *  Added security consideration about HTTP vs. HTTPS, expanding scope.

Authors' Addresses

   Mark Nottingham
   Email: mnot@mnot.net
   URI:   http://www.mnot.net/

   Eran Hammer-Lahav
   Email: eran@hueniverse.com
   URI:   http://hueniverse.com/