• Skip to main content
  • Select language
  • Skip to search
MDN Web Docs
  • Technologies
    • HTML
    • CSS
    • JavaScript
    • Graphics
    • HTTP
    • APIs / DOM
    • WebExtensions
    • MathML
  • References & Guides
    • Learn web development
    • Tutorials
    • References
    • Developer Guides
    • Accessibility
    • Game development
    • ...more docs
Archive of obsolete content
  1. MDN
  2. Archive of obsolete content
  3. Archived Mozilla and build documentation
  4. URIs and URLs

URIs and URLs

In This Article
    1. Overview
    2. nsIURI and nsIURL
      1. nsSimpleURI
      2. nsStandardURL
    3. Escaping
    4. Parsing URLs
      1. Noteable Differences
    5. References
  1. Original Document Information

Warning: The content of this article may be out of date. It was last updated in 2002.

Overview

Handling network and locally retrievable resources is a central part of Necko. Resources are identified by URI "Uniform Resource Identifier" (Taken from RFC 2396):

Uniform
Uniformity provides several benefits: it allows different types of resource identifiers to be used in the same context, even when the mechanisms used to access those resources may differ; it allows uniform semantic interpretation of common syntactic conventions across different types of resource identifiers; it allows introduction of new types of resource identifiers without interfering with the way that existing identifiers are used; and, it allows the identifiers to be reused in many different contexts, thus permitting new applications or protocols to leverage a pre-existing, large, and widely-used set of resource identifiers.
Resource
A resource can be anything that has identity. Familiar examples include an electronic document, an image, a service (e.g., "today's weather report for Los Angeles"), and a collection of other resources. Not all resources are network "retrievable"; e.g., human beings, corporations, and bound books in a library can also be considered resources. The resource is the conceptual mapping to an entity or set of entities, not necessarily the entity which corresponds to that mapping at any particular instance in time. Thus, a resource can remain constant even when its content---the entities to which it currently corresponds---changes over time, provided that the conceptual mapping is not changed in the process.
Identifier
An identifier is an object that can act as a reference to something that has identity. In the case of URI, the object is a sequence of characters with a restricted syntax.
A URI can be further classified as a locator, a name, or both. The term "Uniform Resource Locator" (URL) refers to the subset of URI that identify resources via a representation of their primary access mechanism (e.g., their network "location"), rather than identifying the resource by name or by some other attribute(s) of that resource. ... The URI scheme defines the namespace of the URI, and thus may further restrict the syntax and semantics of identifiers using that scheme. Although many URL schemes are named after protocols, this does not imply that the only way to access the URL's resource is via the named protocol. Gateways, proxies, caches, and name resolution services might be used to access some resources, independent of the protocol of their origin, and the resolution of some URL may require the use of more than one protocol (e.g., both DNS and HTTP are typically used to access an "http" URL's resource when it can't be found in a local cache).

In Necko every URI scheme is represented by a protocol handler. Sometimes a protocol handler represents more than one scheme. The protocol handler provides scheme specific information and methods to create new URIs of the schemes it supports. One of the main Necko goals is to provide a "plug able" protocol support. This means that it should be possible to add new protocols to Necko just by implementing nsIProtocolHandler and nsIChannel. It also might be necessary to implement a new urlparser for a new protocol but that might not be necessary because Necko already provides URI implementations that can deal with a number of schemes, by implementing the generic urlparser defined in RFC 2396.

nsIURI and nsIURL

In a strict sense Necko does only know URLs, URIs by the above definition are much too generic to be properly represented inside a library.

There are however two interfaces which loosely relate to the distinction between URI and URL as per the above definition: nsIURI and nsIURL.

nsIURI represents access to a very simple, very generic form of an URL. Simply speaking it's scheme and non-scheme, separated by a colon, like "about". nsIURL inherits from nsIURI and represents access to typical URLs with schemes like "http", "ftp", ...

nsSimpleURI

One implementation of nsIURI is nsSimpleURI which is the basis for protocols like "about". nsSimpleURI contains setters and getters for the URI and the components of an URI: scheme and path (non-scheme). There are no pre written urlparsers for simple URIs, because of it's simple structure.

nsStandardURL

The most important implementation of nsIURL is nsStandardURL which is the basis for protocols like http, ftp, ...

These schemes support a hierarchical naming system, where the hierarchy of the name is denoted by a "/" delimiter separating the components in the path. nsStandardURL also contains the facilities to parse these typ of urls, to break the specification of the URL "spec" down into the most basic segments.

The spec consists of prepath and path. The prepath consists of scheme and authority. The authority consists of prehost, host and port. The prehost consists of username and password. The path consists of directory, filename, param, query and ref. The filename consists of filebasename and fileextension.

If the spec is completly broken down, it consists of: scheme, username, password, host, port, directory, filebasename, fileextension, param, query and ref. Together these segments form the URL spec with the following syntax:

scheme://username:password@host:port/directory/filebasename.fileextension;param?query#ref

For performance reasons the complete spec is stored in escaped form in the nsStandardURL object with pointers (position and length) to each basic segment and for the more global segments like path and prehost for example.

Necko provides pre written urlparsers for schemes based on hierachical naming systems.

 

Escaping

To be able to parse an URL safely it is sometimes necessary to "escape" certain characters, to hide them from the parser. An escaped character is encoded as a character triplet, consisting of the percent character "%" followed by the two hexadecimal digits representing the octet code. For example, "%20" is the escaped encoding for the US-ASCII space character.

Another quote from RFC 2396:

A URI is always in an "escaped" form, since escaping or unescaping a completed URI might change its semantics. Normally, the only time escape encodings can safely be made is when the URI is being created from its component parts; each component may have its own set of characters that are reserved, so only the mechanism responsible for generating or interpreting that component can determine whether or not escaping a character will change its semantics. Likewise, a URI must be separated into its components before the escaped characters within those components can be safely decoded.

This implies that the segments of urls are escaped differently. This is done by NS_EscapeURL which is now part of xpcom, but started as part of Necko. The information how to escape each segment is stored in a matrix.

Also a string should not be escaped more than once. Necko will not escape an already escaped character unless forced by a special mask that can be used if it is known that a string is not escaped.

Parsing URLs

RFC 2396 defines an URL parser that can deal with the syntax that is common to most URL schemes that are currently in existence.

Sometimes scheme specific parsing is required. Also to be somewhat tolerant to syntax errors the parser has to know more about the specific syntax of the URLs for that scheme. To stay almost generic Necko contains three parsers for the main classes of standard URLs. Which one has to be used is defined by the implementation of nsIProtocolhandler for the scheme in question.

The three main classes are:

Authority
The URLs have an authority segment, like "http".
NoAuthority
These URLs have no or a degenerated authority segment, like the "file" scheme. Also this parser can identify drives if possible depending on the platform.
Standard
It is not known if an authority segment exists or not, less syntax correction can be applied in this case.

Noteable Differences

  1. Necko does not support certain deprecated forms of relative URLs, based on the following part of RFC 2396:
    If the scheme component is defined, indicating that the reference starts with a scheme name, then the reference is interpreted as an absolute URI and we are done. Otherwise, the reference URI's scheme is inherited from the base URI's scheme component. Due to a loophole in prior specifications (RFC1630), some parsers allow the scheme name to be present in a relative URI if it is the same as the base URI scheme. Unfortunately, this can conflict with the correct parsing of non-hierarchical URI. For backwards compatibility, an implementation may work around such references by removing the scheme if it matches that of the base URI and the scheme is known to always use the "hier_part" syntax. The parser can then continue with the steps below for the remainder of the reference components. Validating parsers should mark such a misformed relative reference as an error.

    The decision was made against backwards compatibility. This means that URLs like "http:page.html" or "http:/directory/page.html" are interpreted as absolute urls and "corrected" by the parser.

  2. Also the handling of query segments is different from the examples given in RFC 2396:
    Within an object with a well-defined base URI of http://a/b/c/d;p?q the relative URI would be resolved as follows: ... ?y = http://a/b/c/?y ...

    Instead

    ?y = http://a/b/c/d;p?y

    was implemented as suggested by the older RFC 1808. This decision is based on an email by Roy T. Fielding, one of the authors of RFC 2396, stating that the given example is wrong. Details can be found at bug 90439.

  3. Registry-based authoritys Currently Necko's url-objects only support host based authoritys or urls with no authoritys. Registry-based authoritys as defined in RFC 2396
    Many URI schemes include a top hierarchical element for a naming authority, such that the namespace defined by the remainder of the URI is governed by that authority. This authority component is typically defined by an Internet-based server or a scheme-specific registry of naming authorities. ... The structure of a registry-based naming authority is specific to the URI scheme, but constrained to the allowed characters for an authority component.

    are not supported.

References

Main reference for URIs, URLs and URL-parsing is RFC 2396.

 

Original Document Information

  • Author(s): Andreas Otte
  • Last Updated Date: January 2, 2002
  • Copyright Information: Portions of this content are © 1998–2007 by individual mozilla.org contributors; content available under a Creative Commons license | Details.

 

Document Tags and Contributors

Tags: 
  • Guide
  • Mozilla
  • Necko
  • NeedsUpdate
 Contributors to this page: teoli, Sheppy, ratcliffe_mike, kohei.yoshino
 Last updated by: Sheppy, Feb 19, 2014, 8:27:52 AM

  1. .htaccess ( hypertext access )
  2. <input> archive
  3. Add-ons
    1. Add-ons
    2. Firefox addons developer guide
    3. Interaction between privileged and non-privileged pages
    4. Tabbed browser
    5. bookmarks.export()
    6. bookmarks.import()
  4. Adding preferences to an extension
  5. An Interview With Douglas Bowman of Wired News
  6. Apps
    1. Apps
    2. App Development API Reference
    3. Designing Open Web Apps
    4. Graphics and UX
    5. Open web app architecture
    6. Tools and frameworks
    7. Validating web apps with the App Validator
  7. Archived Mozilla and build documentation
    1. Archived Mozilla and build documentation
    2. ActiveX Control for Hosting Netscape Plug-ins in IE
    3. Archived SpiderMonkey docs
    4. Autodial for Windows NT
    5. Automated testing tips and tricks
    6. Automatic Mozilla Configurator
    7. Automatically Handle Failed Asserts in Debug Builds
    8. BlackConnect
    9. Blackwood
    10. Bonsai
    11. Bookmark Keywords
    12. Building TransforMiiX standalone
    13. Chromeless
    14. Creating a Firefox sidebar extension
    15. Creating a Microsummary
    16. Creating a Mozilla Extension
    17. Creating a Release Tag
    18. Creating a Skin for Firefox/Getting Started
    19. Creating a Skin for Mozilla
    20. Creating a Skin for SeaMonkey 2.x
    21. Creating a hybrid CD
    22. Creating regular expressions for a microsummary generator
    23. DTrace
    24. Dehydra
    25. Developing New Mozilla Features
    26. Devmo 1.0 Launch Roadmap
    27. Download Manager improvements in Firefox 3
    28. Download Manager preferences
    29. Drag and Drop
    30. Embedding FAQ
    31. Embedding Mozilla in a Java Application using JavaXPCOM
    32. Error Console
    33. Exception logging in JavaScript
    34. Existing Content
    35. Extension Frequently Asked Questions
    36. Fighting Junk Mail with Netscape 7.1
    37. Firefox Sync
    38. Force RTL
    39. GRE
    40. Gecko Coding Help Wanted
    41. HTTP Class Overview
    42. Hacking wiki
    43. Help Viewer
    44. Helper Apps (and a bit of Save As)
    45. Hidden prefs
    46. How to Write and Land Nanojit Patches
    47. Introducing the Audio API extension
    48. Java in Firefox Extensions
    49. JavaScript crypto
    50. Jetpack
    51. Litmus tests
    52. Makefile.mozextension.2
    53. Microsummary topics
    54. Migrate apps from Internet Explorer to Mozilla
    55. Monitoring downloads
    56. Mozilla Application Framework
    57. Mozilla Crypto FAQ
    58. Mozilla Modules and Module Ownership
    59. Mozprocess
    60. Mozprofile
    61. Mozrunner
    62. Nanojit
    63. New Skin Notes
    64. Persona
    65. Plug-n-Hack
    66. Plugin Architecture
    67. Porting NSPR to Unix Platforms
    68. Priority Content
    69. Prism
    70. Proxy UI
    71. Remote XUL
    72. SXSW 2007 presentations
    73. Space Manager Detailed Design
    74. Space Manager High Level Design
    75. Standalone XPCOM
    76. Stress testing
    77. Structure of an installable bundle
    78. Supporting private browsing mode
    79. Table Cellmap
    80. Table Cellmap - Border Collapse
    81. Table Layout Regression Tests
    82. Table Layout Strategy
    83. Tamarin
    84. The Download Manager schema
    85. The life of an HTML HTTP request
    86. The new nsString class implementation (1999)
    87. TraceVis
    88. Treehydra
    89. URIScheme
    90. URIs and URLs
    91. Using Monotone With Mozilla CVS
    92. Using SVK With Mozilla CVS
    93. Using addresses of stack variables with NSPR threads on win16
    94. Venkman
    95. Video presentations
    96. Why Embed Gecko
    97. XML in Mozilla
    98. XPInstall
    99. XPJS Components Proposal
    100. XRE
    101. XTech 2005 Presentations
    102. XTech 2006 Presentations
    103. XUL Explorer
    104. XULRunner
    105. ant script to assemble an extension
    106. calICalendarView
    107. calICalendarViewController
    108. calIFileType
    109. xbDesignMode.js
  8. Archived open Web documentation
    1. Archived open Web documentation
    2. Browser Detection and Cross Browser Support
    3. Browser Feature Detection
    4. Displaying notifications (deprecated)
    5. E4X
    6. E4X Tutorial
    7. LiveConnect
    8. MSX Emulator (jsMSX)
    9. Old Proxy API
    10. Properly Using CSS and JavaScript in XHTML Documents
    11. Reference
    12. Scope Cheatsheet
    13. Server-Side JavaScript
    14. Sharp variables in JavaScript
    15. Standards-Compliant Authoring Tools
    16. Using JavaScript Generators in Firefox
    17. Window.importDialog()
    18. Writing JavaScript for XHTML
    19. XForms
    20. background-size
    21. forEach
  9. B2G OS
    1. B2G OS
    2. Automated Testing of B2G OS
    3. B2G OS APIs
    4. B2G OS add-ons
    5. B2G OS architecture
    6. B2G OS build prerequisites
    7. B2G OS phone guide
    8. Building B2G OS
    9. Building and installing B2G OS
    10. Building the B2G OS Simulator
    11. Choosing how to run Gaia or B2G
    12. Customization with the .userconfig file
    13. Debugging on Firefox OS
    14. Developer Mode
    15. Developing Firefox OS
    16. Firefox OS Simulator
    17. Firefox OS apps
    18. Firefox OS board guide
    19. Firefox OS developer release notes
    20. Firefox OS security
    21. Firefox OS usage tips
    22. Gaia
    23. Installing B2G OS on a mobile device
    24. Introduction to Firefox OS
    25. Mulet
    26. Open web apps quickstart
    27. Pandaboard
    28. PasscodeHelper Internals
    29. Porting B2G OS
    30. Preparing for your first B2G build
    31. Resources
    32. Running tests on Firefox OS: A guide for developers
    33. The B2G OS platform
    34. Troubleshooting B2G OS
    35. Using the App Manager
    36. Using the B2G emulators
    37. Web Bluetooth API (Firefox OS)
    38. Web Telephony API
    39. Web applications
  10. Beginner tutorials
    1. Beginner tutorials
    2. Creating reusable content with CSS and XBL
    3. Underscores in class and ID Names
    4. XML data
    5. XUL user interfaces
  11. Case Sensitivity in class and id Names
  12. Creating a dynamic status bar extension
  13. Creating a status bar extension
  14. Gecko Compatibility Handbook
  15. Getting the page URL in NPAPI plugin
  16. Index
  17. Inner-browsing extending the browser navigation paradigm
  18. Install.js
  19. JXON
  20. List of Former Mozilla-Based Applications
  21. List of Mozilla-Based Applications
  22. Localizing an extension
  23. MDN
    1. MDN
    2. Content kits
  24. MDN "meta-documentation" archive
    1. MDN "meta-documentation" archive
    2. Article page layout guide
    3. Blog posts to integrate into documentation
    4. Current events
    5. Custom CSS classes for MDN
    6. Design Document
    7. DevEdge
    8. Developer documentation process
    9. Disambiguation
    10. Documentation Wishlist
    11. Documentation planning and tracking
    12. Editing MDN pages
    13. Examples
    14. Existing Content/DOM in Mozilla
    15. External Redirects
    16. Finding the right place to document bugs
    17. Getting started as a new MDN contributor
    18. Landing page layout guide
    19. MDN content on WebPlatform.org
    20. MDN page layout guide
    21. MDN subproject list
    22. Needs Redirect
    23. Page types
    24. RecRoom documentation plan
    25. Remove in-content iframes
    26. Team status board
    27. Trello
    28. Using the Mozilla Developer Center
    29. Welcome to the Mozilla Developer Network
    30. Writing chrome code documentation plan
    31. Writing content
  25. MMgc
  26. Makefile - .mk files
  27. Marketplace
    1. Marketplace
    2. API
    3. Monetization
    4. Options
    5. Publishing
  28. Mozilla release FAQ
  29. Newsgroup summaries
    1. Newsgroup summaries
    2. Format
    3. Mozilla.dev.apps.firefox-2006-09-29
    4. Mozilla.dev.apps.firefox-2006-10-06
    5. mozilla-dev-accessibility
    6. mozilla-dev-apps-calendar
    7. mozilla-dev-apps-firefox
    8. mozilla-dev-apps-thunderbird
    9. mozilla-dev-builds
    10. mozilla-dev-embedding
    11. mozilla-dev-extensions
    12. mozilla-dev-i18n
    13. mozilla-dev-l10n
    14. mozilla-dev-planning
    15. mozilla-dev-platform
    16. mozilla-dev-quality
    17. mozilla-dev-security
    18. mozilla-dev-tech-js-engine
    19. mozilla-dev-tech-layout
    20. mozilla-dev-tech-xpcom
    21. mozilla-dev-tech-xul
    22. mozilla.dev.apps.calendar
    23. mozilla.dev.tech.js-engine
  30. Obsolete: XPCOM-based scripting for NPAPI plugins
  31. Plugins
    1. Plugins
    2. Adobe Flash
    3. External resources for plugin creation
    4. Logging Multi-Process Plugins
    5. Monitoring plugins
    6. Multi-process plugin architecture
    7. NPAPI plugin developer guide
    8. NPAPI plugin reference
    9. Samples and Test Cases
    10. Shipping a plugin as a Toolkit bundle
    11. Supporting private browsing in plugins
    12. The First Install Problem
    13. Writing a plugin for Mac OS X
    14. XEmbed Extension for Mozilla Plugins
  32. SAX
  33. Security
    1. Security
    2. Digital Signatures
    3. Encryption and Decryption
    4. Introduction to Public-Key Cryptography
    5. Introduction to SSL
    6. NSPR Release Engineering Guide
    7. SSL and TLS
  34. Solaris 10 Build Prerequisites
  35. Sunbird Theme Tutorial
  36. Table Reflow Internals
  37. Tamarin Tracing Build Documentation
  38. The Basics of Web Services
  39. Themes
    1. Themes
    2. Building a Theme
    3. Common Firefox theme issues and solutions
    4. Creating a Skin for Firefox
    5. Making sure your theme works with RTL locales
    6. Theme changes in Firefox 2
    7. Theme changes in Firefox 3
    8. Theme changes in Firefox 3.5
    9. Theme changes in Firefox 4
  40. Updating an extension to support multiple Mozilla applications
  41. Using IO Timeout And Interrupt On NT
  42. Using SSH to connect to CVS
  43. Using workers in extensions
  44. WebVR
    1. WebVR
    2. WebVR environment setup
  45. XQuery
  46. XUL Booster
  47. XUL Parser in Python