Xml2rfc generates RFCs and IETF drafts from document source in XML according to the DTD in RFC2629.
The IETF uses a specific format for the standards and other documents it publishes as RFCs, and for the draft documents which are produced when developing documents for publication. A number of different tools exist to facilitate the formatting of drafts and RFCs according to the existing rules, and this tool, xml2rfc, is one of them. It takes as input an XML file which contains the text and meta-information such as author names, and transforms it into suitably formatted output. The input XML file should follow the DTD given in RFC2629 (or its unofficial successor).
The current incarnation of xml2rfc provides output in the following formats: paginated and unpaginated ASCII text, HTML, nroff, and expanded XML. Only the paginated text format is currently (January 2013) acceptable for draft submissions to the IETF.
To install a system-wide version of xml2rfc, download and unpack the latest release from the xml2rfc distribution packages, then cd into the resulting package directory and run:
$ python setup.py install
Alternatively, if you have the ‘pip’ command (‘Pip Installs Packages’) installed, you can run pip to download and install the package:
$ pip install xml2rfc
If you want to perform a local installation for a specific user, you have a couple of options. You may use python’s default location of user site-packages by specifying the flag --user. These locations are:
- UNIX: $HOME/.local/lib/python<ver>/site-packages
- OSX: $HOME/Library/Python/<ver>/lib/python/site-packages
- Windows: %APPDATA%/Python/Python<ver>/site-packages
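To see which of these directories applies on a given machine, Python's standard `site` module can report it. A minimal sketch (the printed path depends on the platform and Python version, so no output is shown):

```python
import site

# Directory that a `--user` installation targets for this interpreter.
user_site = site.getusersitepackages()
print(user_site)
```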
You can additionally combine the flag --install-scripts with --user to specify a directory on your PATH to install the xml2rfc executable to. For example, the following command:
$ python setup.py install --user --install-scripts=$HOME/bin
will install the xml2rfc library and data to your local site-packages directory, and an executable python script xml2rfc to $HOME/bin.
The option --prefix allows you to specify the base path for all installation files. The setup.py script will exit with an error if your PYTHONPATH is not correctly configured to contain the library path the script tries to install to.
The command is used as follows:
$ python setup.py install --prefix=<path>
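Before running the command, it may help to check whether the library directory under the chosen prefix is already listed in PYTHONPATH. The sketch below is illustrative only: the helper names are hypothetical, and the `lib/python<ver>/site-packages` layout is an assumption about a typical POSIX installation, not something setup.py guarantees.

```python
import os
import sys


def prefix_lib_dir(prefix, version=None):
    """Guess the library directory under `prefix` (assumed POSIX layout)."""
    if version is None:
        version = "%d.%d" % sys.version_info[:2]
    return os.path.join(prefix, "lib", "python" + version, "site-packages")


def on_pythonpath(path, pythonpath=None):
    """Return True if `path` is one of the entries of PYTHONPATH."""
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    entries = [os.path.normpath(p) for p in pythonpath.split(os.pathsep) if p]
    return os.path.normpath(path) in entries


# Hypothetical prefix, used only for illustration.
lib_dir = prefix_lib_dir("/opt/xml2rfc", version="2.7")
print(lib_dir, on_pythonpath(lib_dir))
```

If the check returns False, adding the printed directory to PYTHONPATH before installing should avoid the error described above.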
For further fine-tuning of the installation behavior, you can get a list of all available options by running:
$ python setup.py install --help
xml2rfc accepts a single XML document as input and outputs to one or more conversion formats.
Basic Usage: xml2rfc SOURCE [options] FORMATS...
The following parameters affect how xml2rfc behaves; none of them are required.
===========  ===================  ====================================================
Short        Long                 Description
===========  ===================  ====================================================
-h           --help               show the help message and exit
-v           --verbose            print extra information
-q           --quiet              don't print anything
-n           --no-dtd             disable DTD validation step
-c CACHE     --cache=CACHE        specify an alternate cache directory to write to
-d DTD       --dtd=DTD            specify an alternate dtd file
-b BASENAME  --basename=BASENAME  specify the base name for output files
-f FILENAME  --filename=FILENAME  specify an output filename
(none)       --date=DATE          run as if today's date is DATE (format: yyyy-mm-dd)
(none)       --clear-cache        purge the cache and exit
(none)       --version            display the version number and exit
===========  ===================  ====================================================
One or more of the following output formats can be specified. The destination file will be created according to the argument given to --filename. If no argument was given, the file(s) will be created as “output.format”. If no format is specified, xml2rfc defaults to paginated text (--text).
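=======  ====================================================
Command  Description
=======  ====================================================
--raw    outputs to a text file, unpaginated
--text   outputs to a text file with proper page breaks
--nroff  outputs to an nroff file
--html   outputs to an html file
--exp    outputs to an XML file with all references expanded
=======  ====================================================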
- xml2rfc draft.xml
- xml2rfc draft.xml --dtd=alt.dtd --basename=draft-1.0 --text --nroff --html
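When xml2rfc is driven from a script, the same command line can be assembled programmatically. The following is a minimal sketch that uses only the options documented above; `build_command` and `convert` are hypothetical helper names, not part of xml2rfc itself.

```python
import subprocess


def build_command(source, formats=(), dtd=None, basename=None):
    """Assemble an xml2rfc argument list from the documented options."""
    cmd = ["xml2rfc", source]
    if dtd:
        cmd.append("--dtd=" + dtd)
    if basename:
        cmd.append("--basename=" + basename)
    cmd.extend("--" + fmt for fmt in formats)
    return cmd


def convert(source, **options):
    """Run xml2rfc; requires the xml2rfc executable on PATH."""
    subprocess.check_call(build_command(source, **options))


cmd = build_command("draft.xml", formats=("text", "nroff", "html"),
                    dtd="alt.dtd", basename="draft-1.0")
print(" ".join(cmd))
# xml2rfc draft.xml --dtd=alt.dtd --basename=draft-1.0 --text --nroff --html
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.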
xml2rfc depends on the following packages:
- lxml (> 2.2.7)
Version 2.4.2 (26 May 2013)
This release fixes all major and critical issues registered in the issue tracker as of 26 May 2013. Details:
- Applied a patch from email@example.com to sort references (when PI sortrefs==yes), and added code to insert a link target if the reference has a ‘target’ attribute. Fixes issue #175.
- Added pre-installation requirements to the INSTALL file. Added code to scripts/xml2rfc in order to avoid problems if that file is renamed to scripts/xml2rfc.py. This fixes issue #152.
- Added a setup requirement for python <3.0, as things don’t currently work if trying to run setup.py or xml2rfc with python 3.X.
- Added special cases to avoid adding double spaces after many common abbreviations. Refined the sentence-end double-space fixup further, to look at whether what follows looks like the start of a new sentence. This fixes issue #115.
- Moved the get_initials() function to the BaseRfcWriter, as it now needs to look at a PI. Added code to return one initial only, or multiple, depending on the PI ‘multiple-initials’ setting. Fixes issue #138 (for now). It is possible that this resolution is too simpleminded, and a cleaner way is needed to differentiate the handling of initials in the current document versus initials in references.
- Added new undocumented PI multiple-initials to control whether multiple initials will be shown for an author, or not. The default is ‘no’, matching the xml2rfc v1.x behaviour.
- Fixed the code which determines when an author affiliation doesn’t need to be listed again in the front page author list, and removes the redundant affiliation (the old code would remove the first matching organization, rather than the immediately preceding organization name). Also fixed a buggy test of when an organization element is present. Fixes issue #135.
- Made appearance of ‘Authors Address’ (etc.) in ToC dependent on PI ‘rfcedstyle’ == ‘yes’. Fixes issue #125.
- Updated write_text() to handle long bullets that need to be wrapped across lines better. Fixes issue #124.
- Fixed two other cases of missing blank lines when PI ‘compact’ is ‘no’. Fixes issue #82 (some more).
- Disabled the iprnotified PI. See issue #123; closes #123.
- When protecting http: URLs from line-breaking in nroff output, place the % outside enclosing parentheses, if any. Fixes issue #120.
- Added a warning for incomplete and out-of-date <date/> elements. Fixed an issue with changeset .
- Issue a warning when the source file isn’t for an RFC, but doesn’t have a docName attribute in the <rfc/> element.
- Fixed the use of separating lines in table drawing, to match v1 for text and nroff output. (There is no specification for the meaning of the different styles though…). Fixes issue #113. Note that additional style definitions are needed to get the correct results for the html output.
- Refactored and re-wrote the paginated text writer and the nroff writer in order to generate a ToC in nroff by re-using the fairly complex post-rendering code which inserts the ToC (and iref entries) in the paginated text writer. As a side effect, the page-breaking calculations for the nroff writer become the same as for the paginated writer. Re-factored the line and page-break emitting code to be cleaner and more readable. Changed the code to not start inserting a ToC too close to the end of a page (currently hardcoded to require at least 10 lines), otherwise skip to a new page. Fixes issue #109.
- Changed the author list in first-page header to show a blank line if no organization has been given. Fixes issue #108.
- Changed the wrapping of nroff output to match text output closely, in order to minimize insertion of .bp in the middle of a line. Fixes issue #150 (mostly – line breaks on hyphens may still cause .bp to be emitted in the middle of a line in very rare cases).
- Changed nroff output for long titles (which will wrap) so that the wrapped title text will be indented appropriately. Fixes issue #128.
- Changed the handling of special characters (nbsp, nbhy) so as to emit the proper non-breaking escapes for nroff. Fixes issue #121.
- Changed start-of-line nroff escape handling, see issue #118.
- Changed the generation of xref text to use the same numeric indexes as in the references section when symrefs=’no’. Don’t start numbering over again when starting a new references section (i.e., when moving from normative to informative). Don’t re-sort numeric references alphabetically; they are already sorted numerically. Fixes issue #107.
- Changed os.linesep to ‘<NL>’ when writing lines to text files. The library takes care of doing the right thing on different platforms; writing os.linesep on the other hand will result in the file containing ‘<CR><CR><NL>’, which is wrong. Fixes issue #141.
- Changed handling of include PIs to replace the PI instead of just appending the included tree. Updated a test file to match updated test case. Fixes issue #136.
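The os.linesep change noted above (issue #141) can be reproduced without a Windows machine. The sketch below is an illustration, not xml2rfc's own code: it uses Python 3's `io` module and a text layer that translates each newline to CR LF on output, as a Windows text-mode file does, and the helper name `written_bytes` is made up for the example.

```python
import io


def written_bytes(text):
    """Write text through a layer that turns LF into CR LF; return the bytes."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="ascii", newline="\r\n")
    wrapper.write(text)
    wrapper.flush()
    data = raw.getvalue()
    wrapper.detach()  # leave the underlying buffer alone
    return data


print(written_bytes("line\n"))    # b'line\r\n'
print(written_bytes("line\r\n"))  # b'line\r\r\n'
```

Writing a plain newline lets the I/O layer produce the platform's line ending exactly once, while writing the already-expanded separator gets it expanded a second time.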
Version 2.4.1 (13 Feb 2013)
- Fixed a problem with very long hangindent bullet text followed by <vspace/>, which could make xml2rfc abort with a traceback for certain inputs.
- Fixed a mismatched argument count for string formatting which could make xml2rfc abort with a traceback for certain inputs.
Version 2.4.0 (27 Jan 2013)
With this release, all issues against the 2.x series of xml2rfc have been resolved. Without doubt there will be new issues in the issue tracker, but the current clean slate is nice to have.
For full details on all tickets, there’s always the issue tracker: http://trac.tools.ietf.org/tools/xml2rfc/trac/report/
An extract from the commit log is available below:
- In some cases, the error messages when validating an xml document are correct, but too obscure. If a required element is absent, the error message could say for instance ‘Element references content does not follow the DTD, expecting (reference)+, got ‘, which is correct – the DTD validator got nothing, when it required something, so it says ‘got ‘, with nothing after ‘got’. But for a regular user, we now add on ‘nothing.’ to make things clearer. Fixes issue #102.
- It seems there could be a bug in separate invocation of lxml.etree.DTD.validate(tree) after parsing, compared to doing parsing with dtd_validation=True. The former fails in a case when it shouldn’t, while the latter succeeds in validating a valid document. Declaring validation as successful if the dtd.error_log is empty, even if validation returned False. This resolves issue #103.
- Factored out the code which gets an author’s initials from the xml author element, and made the get_initials() utility function return initials fixed up with trailing spaces, if missing. The current code does not mangle initials by removing any initials but the first one. Fixes issue #63, closes issue #10.
- Added code to avoid breaking URLs in boilerplate across lines. Fixes issue #78.
- Added PI defaults for ‘figurecount’ and ‘tablecount’ (not listed in the xml2rfc readme…) Also removed coupling between explicitly set rfcedstyle, compact, and subcompact settings, to follow v1 practice.
- Refactored the PI defaults to appear all in the same place, rather than spread out throughout the code.
- Updated draw_table to insert blank rows when PI compact is ‘no’. Fixes issue #82.
- Added tests and special handling for the case when a hanging type list has less space left on the first line, after the bullet, than what’s needed for the first following word. In that case, start the list text on the following line. Fixes issue #85.
- Modified the page-breaking code to better keep section titles together with the section text, and keep figure preamble, figure, postamble and caption together. Updated tests. Fixes issue #100.
- Added handling of tocdepth to the html writer. Fixes issue #101.
- Modified how the --base switch to the xml2rfc script works, to make it easier to generate multiple output formats and place them all in the same target directory. Also changed the default extensions for two output formats (.raw.txt and .exp.xml).
- Tweaked the html template to not permit crazy wide pages.
- Rewrote parts of the parsing in order to get hold of the number attribute of the <rfc/> tag before the full parsing is done, in order to be able to later resolve the &rfc.number; entity (which, based on how convoluted it is to get that right, I’d like to deprecate.) Fixes issue #86.
- Numerous small fixes to indentation and wrapping of references. Avoid wrapping URLs in references if possible. Avoid wrapping ‘Section 3.14.’ if possible. Indent more like xml2rfc v1.
- Added reduction of doublespaces in regular text, except when they might be at the end of a sentence. Xml2rfc v1 would do this, v2 didn’t till now.
- Generalized the _format_counter() method to consistently handle list counter field-widths internally, and made it adjust the field-width to the max counter width based on the list length and counter type. Fixes a v1-to-v2 incompatibility for numbered lists with 10 items or more, and other similar cases.
- Added generic base conversion code, and used that to generate list letters which will work for lists with more than 26 items.
- Reworked code to render roman numerals in lists, to place whitespace correctly in justification field. Fixes issue #94.
- Added consensus vs. no-consensus options for IAB RFCs’ Status of This Memo section. Fixes issue #88.
- Made <t/> elements with an anchor attribute generate html with an <a name=’…’/> element, for linking. Closes issue #67.
- Applied boilerplate URL-splitting prevention only in the raw writer, where we later do paragraph line-wrapping, instead of generically. Fixes issue #62.
- Now permitting all versions of lxml >= 2.2.8, but notice that there may be missing build dependencies for lxml 3.x which may cause installation of lxml to fail. (That’s an lxml issue, rather than an xml2rfc issue, though…) This fixes issue #99.
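The list-letter entry above mentions generic base conversion for lists with more than 26 items. One common way to do this is a bijective base-26 numbering (a..z, aa, ab, ...). The sketch below shows that technique under the assumption that this is the intended labelling; it is not taken from the xml2rfc source, and `list_letter` is a hypothetical name.

```python
import string


def list_letter(n, alphabet=string.ascii_lowercase):
    """Return the label of the n-th list item (1-based): a..z, aa, ab, ..."""
    if n < 1:
        raise ValueError("item numbers start at 1")
    base = len(alphabet)
    label = ""
    while n > 0:
        # Subtracting 1 makes the numbering bijective (there is no zero digit).
        n, rem = divmod(n - 1, base)
        label = alphabet[rem] + label
    return label


print([list_letter(i) for i in (1, 26, 27, 28, 52, 53)])
# ['a', 'z', 'aa', 'ab', 'az', 'ba']
```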
Version 184.108.40.206 (18 Jan 2013)
- Tweaked the install_requires setting in setup.py to not pull down lxml 3.x (as it’s not been tested with xml2rfc) and bumped the version.
Version 2.3.11 (18 Jan 2013)
This release fixes all outstanding major bugs, details below. The issue tracker is at http://tools.ietf.org/tools/xml2rfc/trac/.
- Updated the nroff writer to do backslash escaping on source text, to avoid escaping nroff control characters. Fixes issue #77.
- Added a modified xref writer to the nroff output writer, in order to handle xref targets which should not be broken across lines. This, together with changeset , fixes issue #80.
- Added text to the section test case to trigger the second part of issue #79. It turns out that the changes in  fixed this, too; this closes issue #79.
- Tweaked the nroff generation to not break on hyphens, in order to avoid hyphenated words ending up with embedded spaces: ‘pre-processing’ becoming ‘pre- processing’ if ‘pre-‘ occurred at the end of an nroff text line. Also tweaked the line-width used in line-breaking to have matching line-breaks between .txt and .nroff output (with exception for lines ending in hyphens).
- Tweaked roman number list counter to output roman numbers in a field 5 spaces wide, instead of having varied widths. This is different from version 1, so may have to be reverted, depending on how people react.
- Added a warning for too long lines in figures and tables. No outdenting for now; I’d like to consult some about that. Fixes issue #76.
- Updated tests showing that all list format specifiers mentioned in issue #70 now work. Closes issue #70.
- Changed spanx emphasis back to _this_ instead of -this-, matching the v1 behaviour. Addresses issue #70.
- Make <vspace/> in a hangindent list reset the indentation to the hang-indent, even if the bullet text is longer than the hang-indent. Addresses issue #70.
- Refined the page-breaking to not insert an extra page break for artwork that won’t fit on a page anyway.
- Refined the page-breaking to avoid breaking artwork and tables across pages, if possible.
- Fixed a problem with centering of titles and labels. Fixes issue #73.
- Changed the leading and trailing whitespace lines of a page to better match legacy output. Fixed the autobreaking algorithm to correctly avoid orphans and widows; fixes issue #72. Removed an extra blank line at the top of the page following an early page break to avoid orphan or widow.
- Tweaked the generation of ToC dot-lines and page numbers to better match legacy xml2rfc. Fixed a bug in the generation of xref text where trailing whitespace could cause double spaces. Tweaked the output format to produce the correct number of leading blank lines on the first page of a document.
- Modified the handling of figure titles, so that given titles will be written also without anchor or figure counting. Fixes issue #75.
- Tweaked the html writer to have a buffer interface that provides a self.buf similar to the other writers, for test purposes.
- Reworked the WriterElementTest suite to test all the output formats, not only paginated text.
- Added a note about /usr/local/bin permissions. This closes issue #65.
- Added files describing possible install methods (INSTALL), and possible build commands (Makefile).
- The syntax that was used to specify the version of the lxml dependency (‘>=’) is not supported in Python distutils setup.py files, and caused setup to try to find an lxml version greater than =2.2.8, which couldn’t succeed. Fixed to say ‘>2.2.7’ instead. This was probably the cause of always reinstalling lxml even when it was present.
- Updated README.rst to cover the new --date option, and tweaked it a bit.
- Added some files to provide an enhanced source distribution package.
- Updated setup.py with maintainer and licence information.
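The roman-number list counter entry above describes counters written into a field 5 characters wide. A minimal sketch of that idea follows; the lowercase numerals, the trailing period, and the left justification are assumptions made for illustration, and both function names are hypothetical.

```python
def to_roman(n):
    """Convert a positive integer to a lowercase roman numeral."""
    numerals = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
                (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
                (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    result = ""
    for value, symbol in numerals:
        # Emit each symbol as many times as its value fits.
        count, n = divmod(n, value)
        result += symbol * count
    return result


def roman_counter(n, width=5):
    """Pad the counter to a fixed field width (assumed left-justified)."""
    return (to_roman(n) + ".").ljust(width)


for i in (1, 4, 8):
    print(repr(roman_counter(i)))
# 'i.   '
# 'iv.  '
# 'viii.'
```

Counters longer than the field, such as the one for item 18, are not truncated in this sketch; they simply push the item text to the right.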
Version 2.3.10 (03 Jan 2013)
- Changed the output text for Internet-Draft references to omit the series name, but add (work in progress). Updated the test case to match draft revision number.
- Updated all the rfc editor boilerplate in valid test facets to match the correct outcome (which is also what the code actually produces).
- Changed the diff test error message so that the valid text is output as the original, not as the changed text of a diff.
- Corrected test cases to match correct expiry using 185 days instead of 183 days from document date.
- Added missing attributes to the XmlRfcError Exception subclass, necessary in order to make it resemble lxml’s error class and provide consistent error messages to the user whether they come from lxml or our own code.
- Added a licence file, indicating the licencing used by the IETF for the xml2rfc code.
- Fixed up the xml2rfc cli script to provide better help texts by telling the option parser the appropriate option variable names.
- Fixed up the help text formatting by explicitly providing an appropriate help text formatter to the option parser.
- Added an option (--date=DATE) to provide the document date on the command line.
- Added an option (--no-dtd) to disable the DTD validation step.
- Added code to catch additional exceptions and provide appropriate user information, instead of an exception traceback.
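The 185-day expiry and the --date format (yyyy-mm-dd) mentioned above combine into a simple date calculation. The sketch below only illustrates the arithmetic; `expiry_date` is a hypothetical helper and not the function xml2rfc uses.

```python
from datetime import datetime, timedelta

EXPIRY_DAYS = 185  # expiry period stated in the test-case entry above


def expiry_date(date_string):
    """Parse a yyyy-mm-dd date and return the date 185 days later."""
    doc_date = datetime.strptime(date_string, "%Y-%m-%d").date()
    return doc_date + timedelta(days=EXPIRY_DAYS)


print(expiry_date("2013-01-03"))  # 2013-07-07
```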