The adparser library for Python provides powerful capabilities for working with AsciiDoc documents

These details have not been verified by PyPI

Project links

Development Status
- 4 - Beta
License
- OSI Approved :: MIT License
Operating System
Programming Language
- Python :: 3

Project description

The adparser library for Python provides powerful capabilities for working with AsciiDoc documents. In this Quick Start, you’ll learn how to use the library’s main functions to extract various elements from an AsciiDoc document.

Installation

Install the adparser library using pip:

pip install adparser

Asciidoctor must also be installed on the system; see https://asciidoctor.org/#installation for instructions. Before using the library, make sure the asciidoctor executable is on your PATH.
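Before creating a parser, it can help to confirm that the asciidoctor executable is actually discoverable. The check below is a minimal sketch using only the Python standard library; the helper name asciidoctor_available is illustrative and is not part of adparser.

```python
import shutil


def asciidoctor_available(executable: str = "asciidoctor") -> bool:
    """Return True if the given executable can be found on PATH."""
    # shutil.which returns the full path to the executable, or None if absent.
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if asciidoctor_available():
        print("asciidoctor found on PATH")
    else:
        print("asciidoctor not found; install it and update PATH first")
```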

Extracting Document Elements

The adparser library can extract the following elements from an AsciiDoc document:

text lines (the units that paragraph elements are made of)
links
paragraphs
headings
lists
source blocks
tables
audio, video, and images.

To access these elements, you can use the Parser object.

Parser object

To start parsing, create a Parser object:

from adparser import Parser
my_file = open("test.adoc")
parser = Parser(my_file)

Parser methods

To work with each of the document elements described above, the Parser object has its own methods:

text_lines()
links()
paragraphs()
headings()
lists()
source_blocks()
tables()
audios()
images()
videos()

Example

test.adoc

= Document Title

This is a paragraph.

== Section 1

This is another paragraph.

[source,python]
print("Hello, World!")

[NOTE]
This is a note.

image::image.png[]

>>> from adparser import Parser
>>> my_file = open("test.adoc")
>>> parser = Parser(my_file)

>>> for docelem in parser.headings():
...     print(docelem.data)
'Document Title'
'Section 1'
>>> for docelem in parser.source_blocks():
...     print(docelem.data)
...     print(docelem.styles)
'print("Hello, World!")'
['listingblock', 'python']

These methods return iterators over the element objects of the document. Each element has the following attributes:

data: The data associated with the element. Usually text, but in the case of tables, you can get a dictionary (see the example at the end of the readme).
section: List of sections of the document the element belongs to
styles: List of styles of the object
attribute (only for links): text of the link

List of styles:

text_line
- italic
- bold
- monospace
source
- source languages
admonition styles (for all elements)
- note
- tip
- caution
- warning
block styles (for all elements)
- sidebarblock
- exampleblock
- quoteblock
- listingblock
- literalblock

You can get the text from the paragraph object only through the get_text() method. It has a url_opt parameter.

url_opt can be:

'show_urls'
'hide_urls'

This option shows or hides the URL of a link, hyperlink, or media source (image, audio, video). The default is 'hide_urls'.

test.adoc

= Document Title

You can also use https://www.macports.org[MacPorts], another package manager for macOS, to install Asciidoctor.

If you dont have MacPorts on your computer, complete the https://www.macports.org/install.php[installation instructions] first.

>>> from adparser import Parser
>>> my_file = open("test.adoc")
>>> parser = Parser(my_file)

>>> for docelem in parser.paragraphs():
...     print(docelem.get_text())
'You can also use MacPorts, another package manager for macOS, to install Asciidoctor.'
'If you dont have MacPorts on your computer, complete the installation instructions first.'
>>> for docelem in parser.paragraphs():
...     print(docelem.get_text('show_urls'))
'You can also use https://www.macports.org[MacPorts], another package manager for macOS, to install Asciidoctor.'
'If you dont have MacPorts on your computer, complete the https://www.macports.org/install.php[installation instructions] first.'
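To make the effect of 'hide_urls' concrete, here is a minimal sketch that reproduces it on a plain string with a regular expression. It is not how adparser is implemented (the library works on Asciidoctor output); the function name hide_urls and the pattern LINK_MACRO are assumptions made for illustration only.

```python
import re

# Matches an AsciiDoc URL macro such as https://example.org[label].
# Illustrative pattern only; it does not cover every AsciiDoc link form.
LINK_MACRO = re.compile(r"https?://[^\s\[\]]+\[([^\]]*)\]")


def hide_urls(text: str) -> str:
    """Replace each URL macro with its label, dropping the URL itself."""
    return LINK_MACRO.sub(r"\1", text)


sample = (
    "You can also use https://www.macports.org[MacPorts], another package "
    "manager for macOS, to install Asciidoctor."
)
print(hide_urls(sample))
```

Applied to the first paragraph of test.adoc, the sketch yields the same sentence that get_text() returns by default in the example above.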

You can pass the named parameters style and section to Parser methods for more precise selection.

test.adoc

= Document Title

== Python

[source,python]
print("Hello, World!")

== C++

[source,cpp]
std::cout << "Hello, World!";

>>> from adparser import Parser
>>> my_file = open("test.adoc")
>>> parser = Parser(my_file)

>>> for docelem in parser.source_blocks(['cpp']):
...     print(docelem.data)
...     print(docelem.styles)
'std::cout << "Hello, World!";'
['listingblock', 'cpp']
>>> for docelem in parser.source_blocks([], ['Python']):
...     print(docelem.data)

'print("Hello, World!")'

Styles and sections are filtered by passing lists of the required styles or sections. An element is selected only when every item in the passed lists appears in its styles and section attributes; that is, the passed lists must be subsets of the element's attributes.

If you pass the section list ['C++', 'Python'] in the example above, nothing is output, because no source block belongs to both the C++ section and the Python section.
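The subset rule can be sketched in plain Python. The Element class and the select function below are hypothetical stand-ins that model only the attributes described above; they are not adparser's classes, and the assumption that the section list also contains the document title is made here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    # Hypothetical stand-in for an adparser element (not the library's class).
    data: str
    section: list = field(default_factory=list)
    styles: list = field(default_factory=list)


def select(elements, style=(), section=()):
    """Yield elements whose styles and sections contain every requested item."""
    for elem in elements:
        # set(a) <= set(b) is True when a is a subset of b.
        if set(style) <= set(elem.styles) and set(section) <= set(elem.section):
            yield elem


blocks = [
    Element('print("Hello, World!")', ["Document Title", "Python"], ["listingblock", "python"]),
    Element('std::cout << "Hello, World!";', ["Document Title", "C++"], ["listingblock", "cpp"]),
]

print([e.data for e in select(blocks, style=["cpp"])])
print([e.data for e in select(blocks, section=["Python"])])
# No block is in both sections, so this selection is empty.
print([e.data for e in select(blocks, section=["C++", "Python"])])
```

An empty filter list is a subset of everything, which is why passing [] for the style leaves the selection unrestricted on that axis.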

Limitations of the parser:

A document can contain only one level 0 section
Only text is extracted from tables and lists
Nested tables are not supported

How to work with block titles:

The title attribute returns the title of the following blocks:

table
lists
source
image
video
audio

test.adoc

= Document Title

== Python

[source,python]
.python
print("Hello, World!")

== Table 

.T1
[cols="1,1"]
|===
|Cell in column 1, row 1
|Cell in column 2, row 1

|Cell in column 1, row 2
|Cell in column 2, row 2

|Cell in column 1, row 3
|Cell in column 2, row 3
|===

>>> from adparser import Parser
>>> my_file = open("test.adoc")
>>> parser = Parser(my_file)
>>> elemiter = parser.tables()
>>> elemiter = next(elemiter)
>>> print(elemiter.title)
Table 1. T1
>>> elemiter = parser.source_blocks()
>>> elemiter = next(elemiter)
>>> print(elemiter.title)
python

How to get tables:

test.adoc

= Document Title

[cols="1,1"]
|===
|Cell in column 1, row 1
|Cell in column 2, row 1

|Cell in column 1, row 2
|Cell in column 2, row 2

|Cell in column 1, row 3
|Cell in column 2, row 3
|===

Table objects also have a data attribute, which stores the table as a dictionary.

>>> from adparser import Parser
>>> my_file = open("test.adoc")
>>> parser = Parser(my_file)
>>> elemiter = parser.tables()
>>> elemiter = next(elemiter)

>>> print(elemiter.data)
{'col1':['Cell in column 1, row 1', 'Cell in column 1, row 2', 'Cell in column 1, row 3'], 'col2':['Cell in column 2, row 1', 'Cell in column 2, row 2', 'Cell in column 2, row 3']}

The keys "col1" and "col2" were created automatically.

The to_dict() and to_matrix() methods convert the data attribute to a dictionary or a matrix, respectively.

test1.adoc

[cols="1,1,1,1"]
|===
|Column 1 |Column 2 |Column 3 |Column 4

|Cell in column 1
|Cell in column 2
|Cell in column 3
|Cell in column 4
|===

>>> from adparser import Parser
>>> my_file = open("test1.adoc")
>>> parser = Parser(my_file)
>>> elemiter = parser.tables()
>>> elemiter = next(elemiter)

>>> print(elemiter.data["Column 1"])
["Cell in column 1"]
>>> elemiter.to_matrix()
>>> print(elemiter.data[0][0])
'Column 1'
>>> print(elemiter.data[0][1])
'Cell in column 1'

In matrix form, the first element of each column is the column name.
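The relationship between the two representations can be sketched with two standalone functions. They only mimic the behavior shown in the example (a list of columns with the header as the first item); they are not the library's to_dict() and to_matrix() methods, which operate on the table object in place.

```python
def to_matrix(columns: dict) -> list:
    """Turn {header: [cells...]} into a list of columns, header first."""
    return [[header, *cells] for header, cells in columns.items()]


def to_dict(matrix: list) -> dict:
    """Inverse of to_matrix: the first item of each column becomes the key."""
    return {column[0]: list(column[1:]) for column in matrix}


table = {
    "Column 1": ["Cell in column 1"],
    "Column 2": ["Cell in column 2"],
}
matrix = to_matrix(table)
print(matrix[0][0])  # header of the first column
print(matrix[0][1])  # first cell of the first column
```

Because the matrix is stored column by column, matrix[i][0] is always a header and matrix[i][1:] holds that column's cells.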

Span Tables

This implementation currently supports storing tables with colspan and rowspan.

= Document Title

|===
|Column 1, header row |Column 2, header row |Column 3, header row |Column 4, header row

3+|This cell spans columns 1, 2, and 3 because its specifier contains a span of `3+`
|Cell in column 4, row 2

|Cell in column 1, row 3
|Cell in column 2, row 3
|Cell in column 3, row 3
|Cell in column 4, row 3
|===

>>> from adparser import Parser
>>> my_file = open("test1.adoc")
>>> parser = Parser(my_file)
>>> elemiter = parser.tables()
>>> elemiter = next(elemiter)

>>> print(elemiter.data)
{'Column 1, header row': 
    ['This cell spans columns 1, 2, and 3 because its specifier contains a span of 3+', 
     'Cell in column 1, row 3'], 
 'Column 2, header row': 
    ['This cell spans columns 1, 2, and 3 because its specifier contains a span of 3+', 
     'Cell in column 2, row 3'], 
 'Column 3, header row': 
    ['This cell spans columns 1, 2, and 3 because its specifier contains a span of 3+', 
     'Cell in column 3, row 3'], 
 'Column 4, header row': 
    ['Cell in column 4, row 2', 
     'Cell in column 4, row 3']}

= Document Title

|===
|Column 1, header row |Column 2, header row

.2+|This cell spans rows 2 and 3 because its specifier contains a span of `.2+`
|Cell in column 2, row 2

|Cell in column 2, row 3

|Cell in column 1, row 4
|Cell in column 2, row 4
|===

>>> from adparser import Parser
>>> my_file = open("test1.adoc")
>>> parser = Parser(my_file)
>>> elemiter = parser.tables()
>>> elemiter = next(elemiter)

>>> print(elemiter.data)
{'Column 1, header row': ['This cell spans rows 2 and 3 because its specifier contains a span of .2+', 
                          'This cell spans rows 2 and 3 because its specifier contains a span of .2+', 
                          'Cell in column 1, row 4'], 
 'Column 2, header row': ['Cell in column 2, row 2', 
                          'Cell in column 2, row 3', 
                          'Cell in column 2, row 4']}

= Document Title

|===
|Column 1, header row |Column 2, header row |Column 3, header row |Column 4, header row

4.1+|full width 4+

|===

>>> from adparser import Parser
>>> my_file = open("test1.adoc")
>>> parser = Parser(my_file)
>>> elemiter = parser.tables()
>>> elemiter = next(elemiter)

>>> print(elemiter.data)
{'Column 1, header row': ['full width 4+'], 
 'Column 2, header row': ['full width 4+'], 
 'Column 3, header row': ['full width 4+'], 
 'Column 4, header row': ['full width 4+']}

= Document Title

|===
|Column 1, header row |Column 2, header row |Column 3, header row |Column 4, header row

|Cell in column 1, row 2
2.3+|This cell spans columns 2 and 3 and rows 2, 3, and 4 because its specifier contains a span of `2.3+`
|Cell in column 4, row 2

|Cell in column 1, row 3
|Cell in column 4, row 3

|Cell in column 1, row 4
|Cell in column 4, row 4
|===

>>> from adparser import Parser
>>> my_file = open("test1.adoc")
>>> parser = Parser(my_file)
>>> elemiter = parser.tables()
>>> elemiter = next(elemiter)

>>> print(elemiter.data)
{'Column 1, header row': ['Cell in column 1, row 2', 
                            'Cell in column 1, row 3', 
                            'Cell in column 1, row 4'], 
    'Column 2, header row': ['This cell spans columns 2 and 3 and rows 2, 3, and 4 because its specifier contains a span of 2.3+', 
                            'This cell spans columns 2 and 3 and rows 2, 3, and 4 because its specifier contains a span of 2.3+', 
                            'This cell spans columns 2 and 3 and rows 2, 3, and 4 because its specifier contains a span of 2.3+'], 
    'Column 3, header row': ['This cell spans columns 2 and 3 and rows 2, 3, and 4 because its specifier contains a span of 2.3+', 
                            'This cell spans columns 2 and 3 and rows 2, 3, and 4 because its specifier contains a span of 2.3+', 
                            'This cell spans columns 2 and 3 and rows 2, 3, and 4 because its specifier contains a span of 2.3+'], 
    'Column 4, header row': ['Cell in column 4, row 2', 
                            'Cell in column 4, row 3', 
                            'Cell in column 4, row 4']}
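The outputs above suggest that a spanning cell's text is repeated in every column and row it covers. The following is a minimal sketch of that idea, assuming each cell is given as a hypothetical (text, colspan, rowspan) tuple; it illustrates the expansion only and is not adparser's implementation.

```python
def expand_spans(header, rows):
    """Expand spanning cells into a {column name: [cell texts]} dictionary.

    rows is a list of rows; each row is a list of (text, colspan, rowspan).
    """
    ncols = len(header)
    columns = {name: [] for name in header}
    # For each column: (text, rows still to fill) left by a rowspan, or None.
    carry = [None] * ncols
    for row in rows:
        line = [None] * ncols
        cells = iter(row)
        col = 0
        while col < ncols:
            if carry[col] is not None:
                # This position is covered by a cell spanning from a row above.
                text, remaining = carry[col]
                line[col] = text
                carry[col] = (text, remaining - 1) if remaining > 1 else None
                col += 1
                continue
            text, colspan, rowspan = next(cells)
            for offset in range(colspan):
                line[col + offset] = text
                if rowspan > 1:
                    carry[col + offset] = (text, rowspan - 1)
            col += colspan
        for name, value in zip(header, line):
            columns[name].append(value)
    return columns


header = ["Column 1", "Column 2", "Column 3", "Column 4"]
rows = [
    [("c1 r2", 1, 1), ("spans 2.3+", 2, 3), ("c4 r2", 1, 1)],
    [("c1 r3", 1, 1), ("c4 r3", 1, 1)],
    [("c1 r4", 1, 1), ("c4 r4", 1, 1)],
]
result = expand_spans(header, rows)
print(result["Column 2"])
```

The sketch assumes a well-formed table in which the header lists every column and carries no spans of its own.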

Warning

Span tables can be parsed only if the header is clearly defined:

The header should list all the columns
If the header is not explicitly specified, the first row is used instead
Header columns should not have colspan or rowspan attributes

get_near() method

To access the element closest to the current one, use the get_near() method. It accepts a string with the name of the required element and a string with the direction: 'up' or 'down'.

test.adoc

= Document Title

This is a paragraph.

== Section 1

This is another paragraph.

[source,python]
print("Hello, World!")

[NOTE]
This is a note.

image::image.png[]

>>> from adparser import Parser
>>> my_file = open("test.adoc")
>>> parser = Parser(my_file)
>>> for docelem in parser.source_blocks():
...     up_heading = docelem.get_near("heading", direction='up')
...     print(up_heading.data)
...     down_image = docelem.get_near("image", direction='down')
...     print(down_image.data)
'Section 1'
'image.png'
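The lookup in the example above can be pictured as a linear scan over the document's elements in order. The sketch below models the document as a list of hypothetical (kind, data) tuples and is only an illustration of the search direction; it is not the library's get_near() implementation.

```python
def get_near(elements, index, kind, direction="up"):
    """Return the closest element of the given kind above or below index."""
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    step = -1 if direction == "up" else 1
    position = index + step
    while 0 <= position < len(elements):
        if elements[position][0] == kind:
            return elements[position]
        position += step
    # Nothing of that kind exists in the requested direction.
    return None


# Flat model of test.adoc as (kind, data) pairs, in document order.
document = [
    ("heading", "Document Title"),
    ("paragraph", "This is a paragraph."),
    ("heading", "Section 1"),
    ("paragraph", "This is another paragraph."),
    ("source", 'print("Hello, World!")'),
    ("paragraph", "This is a note."),
    ("image", "image.png"),
]
source_index = 4
print(get_near(document, source_index, "heading", "up"))
print(get_near(document, source_index, "image", "down"))
```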

test2.adoc

= Document Title

=====
Here's a sample AsciiDoc document:

-----
= Document Title

Content goes here.
-----

The document header is useful, but not required.
=====

>>> from adparser import Parser
>>> my_file = open("test2.adoc")
>>> parser = Parser(my_file)
>>> for docelem in parser.paragraphs(style=['listingblock']):
...     up_paragraph = docelem.get_near("paragraph", direction='up')
...     print(up_paragraph.get_text())

'Here’s a sample AsciiDoc document:'

The get_near() method also accepts a named style parameter.


Release history

0.1.7

Mar 9, 2026

0.1.6

Oct 8, 2025

0.1.5

Oct 6, 2025

0.1.4

Oct 6, 2025

0.1.2

Mar 15, 2025

0.1.1

Nov 28, 2024

0.1.0 yanked

Aug 15, 2024

Reason this release was yanked:

<br> bug, exist adoc file bug

0.0.3 yanked

Jul 20, 2024

Reason this release was yanked:

bad tests

0.0.2 yanked

Jul 20, 2024

Reason this release was yanked:

no lxml

0.0.1 yanked

Jul 20, 2024

Reason this release was yanked:

no lxml no work

Download files

Source Distribution

adparser-0.1.7.tar.gz (14.2 kB)

Uploaded Mar 9, 2026 Source

Built Distribution

adparser-0.1.7-py3-none-any.whl (16.9 kB)

Uploaded Mar 9, 2026 Python 3

File details

Details for the file adparser-0.1.7.tar.gz.

File metadata

Download URL: adparser-0.1.7.tar.gz
Upload date: Mar 9, 2026
Size: 14.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.0

File hashes

Hashes for adparser-0.1.7.tar.gz
Algorithm	Hash digest
SHA256	`fef17a11dc9185865212a8958ff2ab048197a1bccaf268298e54d67172dbe14b`
MD5	`7dbb965a6cba25714243ae7e4cd6a0e2`
BLAKE2b-256	`39bcdd7acd680352ae9c90961d741a812d8b8b551be04cf619564d24fd046b5b`

File details

Details for the file adparser-0.1.7-py3-none-any.whl.

File metadata

Download URL: adparser-0.1.7-py3-none-any.whl
Upload date: Mar 9, 2026
Size: 16.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.0

File hashes

Hashes for adparser-0.1.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a0d0920ad609c40d15fa9205a8305dbbb0e61856ead9f5d8d37aaaea980e5d27`
MD5	`438f838ccf8987db73fb1776f26ee2eb`
BLAKE2b-256	`eaa3db672f31f2bfaedea97bcb2267f936193702562ce635973aaf0cb43ae47c`
