XPIDL Syntax

Status of this document

This is a partial reverse-engineering of the libIDL source code's parser, limited mostly to the subset of functionality that is supported by the Mozilla xpidl binary.

Purpose of this document

This document is not an introduction to XPIDL or IDL in general. It is more focused on XPIDL syntax and grammar. See XPIDL Main Page for more links and introductory content.

Simplifications, conventions and notation

The syntax is specified according to ABNF as defined by RFC 5234, although a few productions use prose for clarity of understanding.

Lexically, tokens are delimited by whitespace (defined here as spaces, tabs, vertical tabs, form feeds, line feeds, and carriage returns, or [ \t\v\f\r\n] in regular expression form). LibIDL treats only a single line feed as a newline, not a carriage return (although xpidl begs to differ). Additionally, both C-style (/* ... */) and C++-style (// ... end-of-line) comments are permitted between any two tokens.
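The whitespace and comment rules above can be illustrated with a rough Python sketch. This is not how libIDL or xpidl tokenizes; the names COMMENT, WHITESPACE and rough_tokens are made up for illustration, and the sketch assumes that no string or character literal contains comment-like text. It also only splits on whitespace, so punctuation that touches an identifier (as in `nsIFoo;') stays attached to it.

```python
import re

# Comments may appear between any two tokens: /* ... */ or // ... end-of-line.
# Assumption: no string or character literal contains comment-like text.
COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)

# The whitespace set given in the text: space, \t, \v, \f, \r, \n.
WHITESPACE = re.compile(r"[ \t\v\f\r\n]+")


def rough_tokens(source):
    """Replace each comment with a space, then split on whitespace."""
    without_comments = COMMENT.sub(" ", source)
    return [tok for tok in WHITESPACE.split(without_comments) if tok]
```

Replacing a comment with a single space rather than deleting it keeps the tokens on either side of it apart, so `a/*x*/b' yields two tokens rather than one.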

Some productions can only occur at the beginning of lines; to simplify the grammar, I will not mention them in the grammar, especially since they are handled as a preprocessing step before the IDL source code is actually parsed.

A `%{' that appears at the beginning of a line is the start of a raw code fragment, which extends until the end of a line that begins with `%}'. The opening `%{' may be followed by a language name, as in `%{C++', to output the raw fragment only in the specified language. Text inside raw code fragments is not otherwise parsed by xpidl directly.
A `#include "file"' line instructs the xpidl processor to include that file in the same sense that the C preprocessor includes a file. Note that includes within comments or raw code fragments are not processed by xpidl. Unlike the C preprocessor, when a file is included multiple times, it acts as if the subsequent includes did not happen; this removes the need for include guards.
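To make the include-once behaviour concrete, here is a minimal Python sketch of such a preprocessing pass. It is an illustration under stated assumptions, not xpidl's implementation: the function name expand_includes is hypothetical, files are modelled as an in-memory dict from file name to text instead of being read from disk, and multi-line /* ... */ comments are not tracked, so an include line inside one would still be expanded here.

```python
import re

# A line of the form: #include "file"
INCLUDE_LINE = re.compile(r'#include\s+"([^"\n]+)"')


def expand_includes(name, files, seen=None):
    """Return the lines of `name`, expanding each included file at most once.

    `files` is a hypothetical in-memory mapping of file name to contents.
    """
    if seen is None:
        seen = set()
    if name in seen:
        return []  # a repeated include acts as if it did not happen
    seen.add(name)

    output = []
    in_fragment = False
    for line in files[name].splitlines():
        if in_fragment:
            # Raw code fragments are copied through untouched.
            output.append(line)
            if line.startswith("%}"):
                in_fragment = False
            continue
        if line.startswith("%{"):
            in_fragment = True
            output.append(line)
            continue
        match = INCLUDE_LINE.match(line)
        if match:
            output.extend(expand_includes(match.group(1), files, seen))
        else:
            output.append(line)
    return output
```

Because the set of already-seen names is shared across the recursion, a file that is reachable through several include chains is emitted only the first time, and an `#include' inside a `%{ ... %}' fragment is left as plain text.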

XPIDL Syntax (ABNF)

The root production here is idl_file.

idl_file = 1*definition
definition = [type_decl / const_decl / interface] ";"
interface = [prop_list] "interface" ident [[inheritance] "{" *(ifacebody) "}"]
inheritance = ":" *(scoped_name ",") scoped_name
ifacebody = [type_decl / op_decl / attr_decl / const_decl] ";" / codefrag

type_decl = [prop_list] "typedef" type_spec *(ident ",") ident
type_decl /= [prop_list] "native" ident [parens]
const_decl = "const" type_spec ident "=" expr
op_decl = [prop_list] (type_spec / "void") ident parameter_decls [raise_list]
parameter_decls = "(" [*(param_decl ",") param_decl] ")"
param_decl = [prop_list] ("in" / "out" / "inout") type_spec ident
attr_decl = [prop_list] ["readonly"] "attribute" type_spec *(ident ",") ident

; Descending order of precedence
expr = expr ("|" / "^" / "&") expr ; Unequal precedence; "|" is lowest
expr /= expr ("<<" / ">>") expr
expr /= expr ("+" / "-") expr
expr /= expr ("*" / "/" / "%") expr
expr /= ["-" / "+" / "~"] (scoped_name / literal / "(" expr ")")

; Numeric literals: quite frankly, I'm sure you know how these kinds of
; literals work, and these are annoying to specify in ABNF.
literal = octal_literal / decimal_literal / hex_literal / floating_literal
literal /= string_literal / char_literal
literal /= "TRUE" / "FALSE"
; In regex: /"[^"\n]*["\n]/. Yes, newline terminates.
string_literal = 1*(%x22 *(any char except %x22 or %x0a) (%x22 / %x0a))
; Same as above, but s/"/'/g
char_literal = 1*(%x27 *(any char except %x27 or %x0a) (%x27 / %x0a))

type_spec = "float" / "double" / "string" / "wstring"
type_spec /= ["unsigned"] ("short" / "long" / "long" "long")
type_spec /= "char" / "wchar" / "boolean" / "octet"
type_spec /= scoped_name

prop_list = "[" *(property ",") property "]"
property = ident [parens]
raise_list = "raises" "(" *(scoped_name ",") scoped_name ")"

scoped_name = *(ident "::") ident / "::" ident
; In regex: [A-Za-z_][A-Za-z0-9_]*; identifiers beginning with _ cause warnings
ident = (%x41-5a / %x61-7a / "_") *(%x41-5a / %x61-7a / %x30-39 / "_")
parens = "(" 1*(any char except ")") ")"
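The two regular expressions quoted in the grammar comments, for identifiers and for string literals, can be tried out directly. The following Python sketch is only a way to experiment with those patterns; the helper names check_ident and first_string_literal are invented for this example and do not correspond to anything in libIDL.

```python
import re

# Regexes copied from the comments in the grammar.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
STRING_LITERAL = re.compile(r'"[^"\n]*["\n]')


def check_ident(text):
    """Return (is_valid, warning) for a candidate identifier."""
    if not IDENT.fullmatch(text):
        return (False, None)
    if text.startswith("_"):
        # Accepted by the grammar, but reported with a warning.
        return (True, "identifier begins with an underscore")
    return (True, None)


def first_string_literal(text):
    """Return the first string literal in `text`, or None if there is none."""
    match = STRING_LITERAL.search(text)
    return match.group(0) if match else None
```

Note the second case a string literal can end in: since a newline also terminates it, an unterminated string such as `"abc' followed by a line break is matched up to and including that line break instead of running on into the next line.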

Functionality not used in xpidl

The libIDL parser we use is more powerful than xpidl itself can understand. The following is a list of potential features which are parseable but may not result in expected code:

Struct, union, and enumerated types
Array declarators (these appear to be supported in xpidl_header.c but not xpidl_typelib.c)
Exception declarations
Module declarations
Variable arguments (that makes the ABNF get more wonky)
Sequence types
Max-length strings
Fixed-point numbers
"any" and "long double" types.

Pyxpidl syntax

idlfile = *(CDATA / INCLUDE / interface / typedef / native)
typedef = "typedef" IDENTIFIER IDENTIFIER ";"
native = [attributes] "native" IDENTIFIER "(" NATIVEID ")"
interface = [attributes] "interface" IDENTIFIER [ifacebase] [ifacebody] ";"
ifacebase = ":" IDENTIFIER
ifacebody = "{" *(member) "}"
member = CDATA / "const" IDENTIFIER IDENTIFIER "=" number ";"
member /= [attributes] ["readonly"] "attribute" IDENTIFIER IDENTIFIER ";"
member /= [attributes] IDENTIFIER IDENTIFIER "(" paramlist ")" raises ";"
paramlist = [param *("," param)]
raises = ["raises" "(" IDENTIFIER *("," IDENTIFIER) ")"]
attributes = "[" attribute *("," attribute) "]"
attribute = (IDENTIFIER / CONST) ["(" (IDENTIFIER / IID) ")"]
param = [attributes] ("in" / "out" / "inout") IDENTIFIER IDENTIFIER
number = NUMBER / IDENTIFIER
number /= "(" number ")"
number /= "-" number
number /= number ("+" / "-" / "*") number
number /= number ("<<" / ">>") number
number /= number "|" number

; Lexical tokens, I'm going to specify these in regex form
NUMBER = /-?\d+|0x[0-9A-Fa-f]+/
CDATA = /%{[ ]*C\+\+[ ]*\n(.*?\n?)%}[ ]*(C\+\+)?/s
INCLUDE = /\#include[ \t]+"[^"\n]+"/
NATIVEID = /[^()\n]+(?=\))/
IID = /[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}/
IDENTIFIER = /unsigned long long|unsigned short|unsigned long|long long|[A-Za-z][A-Za-z_0-9]*/
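A few of the lexical token patterns above can be checked in Python, which is the language pyxpidl itself is written in. This is a sketch for experimenting with the regexes as written here, not a copy of pyxpidl's lexer: the classify helper is hypothetical, and it tests whole strings with fullmatch rather than scanning input the way a real lexer would.

```python
import re

# Patterns copied from the lexical token list.
NUMBER = re.compile(r"-?\d+|0x[0-9A-Fa-f]+")
INCLUDE = re.compile(r'\#include[ \t]+"[^"\n]+"')
IDENTIFIER = re.compile(
    r"unsigned long long|unsigned short|unsigned long|long long"
    r"|[A-Za-z][A-Za-z_0-9]*"
)

TOKEN_PATTERNS = (
    ("NUMBER", NUMBER),
    ("INCLUDE", INCLUDE),
    ("IDENTIFIER", IDENTIFIER),
)


def classify(text):
    """Return the name of the first token pattern matching all of `text`."""
    for name, pattern in TOKEN_PATTERNS:
        if pattern.fullmatch(text):
            return name
    return None
```

Two things stand out when trying this. Multi-word built-in types such as `unsigned long' come out as a single IDENTIFIER, which is why the grammar can write a typedef as just two IDENTIFIER tokens. And, unlike the libIDL identifier rule, this IDENTIFIER pattern does not accept a leading underscore.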