
A basic Python parser for the Voynich Manuscript


Voynich Manuscript Parser and Resources

Pure Python parser for the IVTFF-formatted ZL transliteration of the Voynich Manuscript.

Intended for NLP/ML/DL use on the Voynich Manuscript.

voynich.VoynichManuscript is the class you will most likely use. It contains voynich.Page objects, which in turn contain voynich.Line objects.

Example usage (subject to change):

>>> from voynich import VoynichManuscript

>>> vm = VoynichManuscript(path_to_txt, inline_comments=False)

>>> print(vm)
VoynichManuscript(num_pages=227, inline_comments=False)

>>> print(vm.pages["f1r"])
Page(page_name=f1r, quire_num=None, folio_num=None, num_lines=31, illust_type=None)

>>> print(vm.pages["f1r"][0])
Line(<%>fachys.ykal.ar.ataiin.shol.shory.[cth:oto]res.y.kor.sholdy)

>>> print(vm.pages["f1r"][0].text)
<%>fachys.ykal.ar.ataiin.shol.shory.[cth:oto]res.y.kor.sholdy

>>> print(vm.get_paragraphs()[0])
'fachys.ykal.ar.ataiin.shol.shory.cthres.y.kor.sholdy.sory.ckhar.or,y.kair.chtaiin.shar.ase.cthar.cthar,dansyaiir.sheky.or.ykaiin.shod.cthoary.cthes.daraiin.sysoiin.oteey.oteos,roloty.cthiar,daiin.okaiin.or.okansair,y.chear.cthaiin.cphar.cfhaiinydaraishy'
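For NLP use, a paragraph string like the one above usually needs to be split into word tokens first. The following is a minimal sketch, not part of the voynich package: `tokenize` and its `uncertain_as_break` parameter are hypothetical names, and the code assumes that `.` marks a word break and `,` an uncertain word break in the transliteration.

```python
import re
from collections import Counter


def tokenize(paragraph: str, uncertain_as_break: bool = True) -> list[str]:
    """Split a cleaned paragraph string into word tokens.

    Assumption: '.' is a word break and ',' is an uncertain word break.
    If uncertain_as_break is False, commas are dropped so the two
    neighbouring fragments are joined into a single token.
    """
    if uncertain_as_break:
        pieces = re.split(r"[.,]", paragraph)
    else:
        pieces = paragraph.replace(",", "").split(".")
    # Discard empty strings produced by leading, trailing, or doubled separators.
    return [piece for piece in pieces if piece]


# The start of the first paragraph shown in the example output above.
sample = "fachys.ykal.ar.ataiin.shol.shory.cthres.y.kor.sholdy.sory.ckhar.or,y.kair"

tokens = tokenize(sample)
counts = Counter(tokens)
print(len(tokens))
print(counts.most_common(3))
```

Whether an uncertain break should split or join words is a modelling decision, so the sketch exposes it as a flag instead of fixing one behaviour.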

Each Page object also contains a list of paragraphs, Page.paragraphs. These paragraphs receive some additional processing: paragraph markers (<%> and <$>) and gap indicators (<->) are removed, and (currently) the first possible interpretation of each ambiguous character is chosen (e.g. [o:a] -> o). In future work, the paragraph parser will be updated to produce one paragraph for every possible combination of ambiguous characters.
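The paragraph processing described above can be approximated outside the library as follows. This is a sketch under stated assumptions, not the package's actual implementation: `strip_markers`, `first_reading`, and `all_readings` are hypothetical helpers, and the regular expressions cover only the `<%>`, `<$>`, `<->`, and `[a:b]` notations mentioned in this README.

```python
import itertools
import re

# Matches an ambiguity group such as "[o:a]" or "[cth:oto]" (assumed non-nested).
ALTERNATIVES = re.compile(r"\[([^\[\]]*)\]")
# Paragraph markers <%> and <$>, and the gap indicator <->.
MARKERS = re.compile(r"<[%$-]>")


def strip_markers(text: str) -> str:
    """Remove paragraph markers and gap indicators."""
    return MARKERS.sub("", text)


def first_reading(text: str) -> str:
    """Replace each [a:b:...] group with its first alternative."""
    return ALTERNATIVES.sub(
        lambda match: match.group(1).split(":")[0], strip_markers(text)
    )


def all_readings(text: str) -> list[str]:
    """Return one string per combination of ambiguous alternatives."""
    # Splitting on a pattern with one capturing group yields literal text at
    # even indices and the group contents at odd indices.
    parts = ALTERNATIVES.split(strip_markers(text))
    choices = [
        part.split(":") if index % 2 else [part]
        for index, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]


line = "<%>fachys.ykal.ar.ataiin.shol.shory.[cth:oto]res.y.kor.sholdy"
print(first_reading(line))
for reading in all_readings(line):
    print(reading)
```

The number of readings is the product of the number of alternatives in each group, so a paragraph with many ambiguous characters can expand into a very large list. A generator over `itertools.product` may be preferable for whole paragraphs.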

