Convert Europass Candidate XML exports into Reactive Resume JSON Resume v5.

Europass XML to Reactive Resume JSON

A Python utility that converts Europass Candidate XML exports into JSON Resume v5 files for import into Reactive Resume.

The converter reads a Europass Candidate XML file, extracts CV/resume content, sanitises HTML fragments, maps the data into the JSON Resume v5 structure expected by Reactive Resume, and reuses a provided sample JSON file as the output template for layout, typography, design, and other non-content defaults.

Status

This project is in active development.

The current target is Reactive Resume JSON Resume v5 import compatibility, not human readability or plain-text editability. The converter prioritises the JSON shape accepted or emitted by Reactive Resume, using the application's own samples as templates.

Supported input

The converter currently targets the Europass XML dialect represented by files with a root element similar to:

<Candidate xmlns="http://www.europass.eu/1.0">
    ...
</Candidate>

The supported XML structure includes, among others:

  • CandidatePerson
  • CandidateProfile
  • EmploymentHistory
  • EducationHistory
  • PersonQualifications
  • ConferencesAndSeminars
  • Others
  • Attachment
  • RenderingInformation

The older LearnerInfoType Europass schema is not the primary target of this converter.
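
A quick pre-flight check for the supported dialect can be sketched with the standard library. The helper name is_candidate_xml is hypothetical and not part of the package; it only verifies the root element and namespace shown above, not the full structure.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the root element shown above.
EUROPASS_NS = "http://www.europass.eu/1.0"


def is_candidate_xml(xml_text: str) -> bool:
    """Return True if the root element is <Candidate> in the Europass namespace.

    Hypothetical helper for illustration; not part of europass_converter.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Not well-formed XML at all.
        return False
    # ElementTree encodes namespaced tags as "{namespace}localname".
    return root.tag == f"{{{EUROPASS_NS}}}Candidate"
```

A file that fails this check is likely either the older LearnerInfoType dialect or not a Europass export.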

Output

The converter produces JSON Resume v5 output intended for import into Reactive Resume.

The output is based on a provided Reactive Resume-compatible template file. The template is used to preserve non-content fields such as:

  • picture styling
  • section wrapper settings
  • template name
  • page settings
  • layout metadata
  • design colours
  • typography
  • custom CSS settings
  • notes

Existing resume content in the template is cleared and replaced with converted Europass content.

Repository structure

The repository uses a package layout. The Python source lives inside the europass_converter/ package directory, while project metadata and auxiliary files remain at the repository root.

.
├── 100_src
│   ├── install_dependencies.sh
│   ├── sample.json
│   ├── sample.pdf
│   └── sample.xml
├── 200_media
│   └── media.qrc
├── adv.ui
├── europass_converter
│   ├── cli.py
│   ├── contacts.py
│   ├── converter.py
│   ├── gui.py
│   ├── __init__.py
│   ├── languages.py
│   ├── map_resume.py
│   ├── parse_candidate.py
│   ├── sanitize_html.py
│   ├── template.py
│   ├── ui_adv.py
│   ├── ui_mainwindow.py
│   └── version.py
├── mainwindow.ui
├── media_rc.py
├── pyproject.toml
├── README.md
├── LICENSE-docs
└── LICENSE

Install CLI

Install the command-line converter from a release wheel:

python -m pip install europassxml_to_reactiveresumejson-*.whl

Show the available options:

europass-convert --help

Or:

python -m europass_converter.cli --help

Check the installed version:

europass-convert --version

Or:

python -m europass_converter.cli --version

Convert a file

europass-convert "./path/to/cv.xml" \
  --template "./100_src/sample.json" \
  --output "./out/resume.json" \
  --no-split-pages

The resulting file can then be imported into Reactive Resume as a JSON Resume v5 file.

Use a Europass PDF

Some Europass PDF files contain the original XML as an embedded attachment called attachment.xml.

The command-line converter expects an XML file as input. To extract it from a Europass PDF CV:

# sudo apt-get install poppler-utils   # install `pdfdetach` if needed
pdfdetach -savefile attachment.xml -o cv.xml cv.pdf

Then convert the extracted XML:

europass-convert "./cv.xml" \
  --template "./100_src/sample.json" \
  --output "./out/resume.json" \
  --no-split-pages

Install GUI

Install the package with GUI support:

python -m pip install "europassxml_to_reactiveresumejson-*.whl[gui]"

Launch the GUI:

europass-convert-gui

The GUI accepts either:

  • a Europass Candidate XML file; or
  • a Europass PDF containing an embedded attachment.xml.

If a PDF is selected, the GUI attempts to extract the embedded XML automatically. If extraction succeeds, the XML is saved beside the selected output JSON as Europass.xml. If the PDF does not contain attachment.xml, conversion is not started.

Advanced options available in the GUI include:

  • indentation level;
  • compact JSON output;
  • verbosity level;
  • debug traceback logging;
  • parsed intermediate representation logging.

When enabled, diagnostics can be written to files alongside the output JSON: XMLconv.log for stdout and debug.log for stderr.

Run from source

Clone the repository and prepare the virtual environment:

bash ./100_src/install_dependencies.sh

By default, this installs only the command-line converter dependencies.

To install the GUI dependencies:

bash ./100_src/install_dependencies.sh --gui

To install the development environment, including GUI and development extras:

bash ./100_src/install_dependencies.sh --dev

Run the CLI from the source tree:

.venv/bin/python -m europass_converter.cli --help

Or use the installed console script inside the virtual environment:

.venv/bin/europass-convert --help

Test the bundled sample

.venv/bin/python -m europass_converter.cli "./100_src/sample.xml" \
  --template "./100_src/sample.json" \
  --output "./out/resume2.json" \
  --no-split-pages \
  --debug \
  --debug-parsed \
  -v 2

Or:

.venv/bin/europass-convert "./100_src/sample.xml" \
  --template "./100_src/sample.json" \
  --output "./out/resume2.json" \
  --no-split-pages \
  --debug \
  --debug-parsed \
  -v 2

This writes the converted JSON file to ./out/resume2.json.

CLI options

positional arguments:
  xml_path                      Path to the Europass Candidate XML file.

required options:
  --template PATH               Path to the Reactive Resume JSON Resume v5 template.

optional output options:
  --output PATH                 Path to write the converted JSON.
  --indent N                    JSON indentation level. Default: 2.
  --compact                     Write compact single-line JSON.

optional contact-selection options:
  --preferred-email EMAIL       Preferred primary email if several are found.
  --preferred-phone PHONE       Preferred primary phone if several are found.
  --preferred-website URL       Preferred primary website if several are found.

optional layout options:
  --no-split-pages              Put all content into one metadata.layout.pages entry.

diagnostic options:
  -v, --verbose {0,1,2}         Verbosity level.
                                0 = suppress diagnostics
                                1 = print diagnostics if present
                                2 = print diagnostics and success summary

  --debug                       Print a full traceback on errors.
  --debug-parsed                Print the parsed intermediate representation.

Example with diagnostics and no converter-level page splitting:

python -m europass_converter.cli ./100_src/sample.xml \
  --template ./100_src/sample.json \
  --output ./out/resume.json \
  --no-split-pages \
  --debug \
  --debug-parsed \
  -v 2
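
For readers who want to script against the same options, the documented interface could be declared with argparse roughly as follows. This is a sketch reconstructed from the option list above, not the project's actual cli.py; the function name build_parser, the split_pages destination, and the default verbosity of 1 are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser matching the documented options (not the real cli.py)."""
    parser = argparse.ArgumentParser(prog="europass-convert")
    parser.add_argument("xml_path", help="Path to the Europass Candidate XML file.")
    parser.add_argument("--template", required=True, metavar="PATH")
    parser.add_argument("--output", metavar="PATH")
    parser.add_argument("--indent", type=int, default=2, metavar="N")
    parser.add_argument("--compact", action="store_true")
    parser.add_argument("--preferred-email", metavar="EMAIL")
    parser.add_argument("--preferred-phone", metavar="PHONE")
    parser.add_argument("--preferred-website", metavar="URL")
    # Stored inverted so downstream code can read args.split_pages directly.
    parser.add_argument("--no-split-pages", dest="split_pages", action="store_false")
    # Default verbosity is an assumption; the README does not state it.
    parser.add_argument("-v", "--verbose", type=int, choices=[0, 1, 2], default=1)
    parser.add_argument("--debug", action="store_true")
    parser.add_argument("--debug-parsed", action="store_true")
    return parser
```

Calling build_parser().parse_args([...]) with the arguments from the example above yields split_pages=False and verbose=2.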

Conversion policy

The converter follows these mapping rules.

Personal information

| Europass XML source | JSON target |
| --- | --- |
| CandidatePerson/PersonName | basics.name |
| email contact channels | basics.email, additional emails to basics.customFields |
| telephone contact channels | basics.phone, additional phones to basics.customFields |
| website contact channels | basics.website, additional websites to basics.customFields |
| social/academic profile links | sections.profiles.items[] |
| address locality | basics.location |
| nationality | basics.customFields |
| birth date/year | basics.customFields |

The current default contact priority is XML order unless the user provides --preferred-email, --preferred-phone, and/or --preferred-website.
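
The selection rule can be sketched as a small function. The name select_primary is hypothetical, and two behaviours are assumptions rather than documented facts: the preferred value is matched case-insensitively, and a preferred value that is not found in the XML falls back to XML order.

```python
from typing import List, Optional, Sequence, Tuple


def select_primary(
    found: Sequence[str], preferred: Optional[str] = None
) -> Tuple[Optional[str], List[str]]:
    """Pick the primary contact and return the rest as extras.

    `found` is assumed to be in XML order. Hypothetical helper, not the
    package's contacts.py.
    """
    values = list(found)
    if not values:
        return None, []
    index = 0  # default: first value in XML order
    if preferred is not None:
        wanted = preferred.strip().lower()
        for i, value in enumerate(values):
            if value.strip().lower() == wanted:
                index = i
                break
    primary = values[index]
    # Everything else would go to basics.customFields.
    extras = values[:index] + values[index + 1:]
    return primary, extras
```

The same function would serve emails, phones, and websites, since all three follow the same priority rule.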

Work experience

EmploymentHistory/EmployerHistory/PositionHistory entries become sections.experience.items[].

The mapper preserves:

  • organisation
  • position
  • city/country
  • date range
  • first organisation website
  • sanitised description

Career progression embedded in the Europass description is kept inside the description unless later parser versions can reliably extract structured roles.

Education

EducationHistory/EducationOrganizationAttendance entries become sections.education.items[].

The mapper preserves:

  • institution
  • degree
  • education period
  • location
  • website
  • grade
  • thesis
  • credits
  • sanitised description

Raw programme concentration codes are intentionally discarded.

Courses

Europass Others blocks with title Course become certification items:

sections.certifications.items[]

Publications

Europass Others blocks with title Pubblications or Publications are parsed as publication groups.

The mapper attempts to split only obvious HTML list items into individual publication records. If the group cannot be split safely, it is preserved as a custom Publications section.
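
One conservative way to decide whether a group is safe to split is sketched below. It assumes the fragment has already been sanitised so that list tags carry no attributes; the function name split_publications and the exact definition of "obvious" (a single flat list and nothing else) are assumptions, not the mapper's actual rule.

```python
import re
from typing import List, Optional

# A fragment that is exactly one <ul> or <ol> element and nothing else.
_LIST_RE = re.compile(r"^\s*<(ul|ol)>(.*)</\1>\s*$", re.DOTALL | re.IGNORECASE)
_ITEM_RE = re.compile(r"<li>(.*?)</li>", re.DOTALL | re.IGNORECASE)
_NESTED_RE = re.compile(r"<(ul|ol)\b", re.IGNORECASE)


def split_publications(fragment: str) -> Optional[List[str]]:
    """Return one string per publication, or None if splitting is not safe."""
    match = _LIST_RE.match(fragment)
    if not match:
        return None  # prose, mixed content, or no list at all
    body = match.group(2)
    if _NESTED_RE.search(body):
        return None  # nested or multiple lists: keep the group whole
    items = [item.strip() for item in _ITEM_RE.findall(body)]
    leftover = _ITEM_RE.sub("", body).strip()
    if leftover or not items or any(not item for item in items):
        return None  # stray text between items, or empty items
    return items
```

A None result corresponds to the fallback described above: the group stays intact as a custom Publications section.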

Conferences and seminars

ConferencesAndSeminars/ConferenceAndSeminar entries become a customSections[] entry with type = "projects".

Other blocks

Other Europass Others blocks are preserved under a custom section called Additional information.

The mapper does not guess whether miscellaneous blocks are awards, interests, skills, or projects until a specific parser rule is implemented.

Languages

Native languages from PrimaryLanguageCode are mapped as:

fluency = "Native"
level = 5

Foreign languages from PersonQualifications/PersonCompetency are mapped using CEFR scores:

  • the numeric level is based on the lowest available CEFR score;
  • the displayed fluency is based on the spoken CEFR level;
  • if both spoken interaction and spoken production exist, the lower of the two is used;
  • incomplete CEFR data is preserved during parsing and handled conservatively during mapping.
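
The compression rules above can be sketched as follows. The CEFR-to-number table, the skill key names, and the fallbacks for missing data (empty fluency, level 0) are illustrative assumptions; the converter's languages.py may use different values.

```python
from typing import Dict, Iterable, Optional

CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]
# Hypothetical mapping onto a 0..5 scale; the real table may differ.
CEFR_TO_LEVEL = {"A1": 1, "A2": 2, "B1": 3, "B2": 4, "C1": 5, "C2": 5}


def _lowest(codes: Iterable[Optional[str]]) -> Optional[str]:
    """Return the lowest valid CEFR code, ignoring missing or unknown values."""
    valid = [code for code in codes if code in CEFR_ORDER]
    return min(valid, key=CEFR_ORDER.index) if valid else None


def compress_cefr(scores: Dict[str, Optional[str]]) -> Dict[str, object]:
    """Collapse per-skill CEFR scores into one fluency label and one level."""
    overall = _lowest(scores.values())
    # Lower of spoken interaction and spoken production, when present.
    spoken = _lowest(
        [scores.get("spoken_interaction"), scores.get("spoken_production")]
    )
    return {
        "fluency": spoken or "",  # assumed fallback when no spoken score exists
        "level": CEFR_TO_LEVEL.get(overall, 0),  # assumed 0 when nothing is scored
    }
```

For example, a candidate with C1 listening, B2 spoken interaction, C1 spoken production, and B1 writing would be shown as fluency B2 with the numeric level derived from B1.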

References

If the XML does not provide references, the converter adds the 'Available upon request' placeholder.

HTML sanitisation

Europass XML often stores escaped HTML fragments. The converter decodes and sanitises these fragments before inserting them into JSON.

  • Allowed semantic tags: p, br, ul, ol, li, strong, em, u, a

  • Allowed link protocols: http, https, mailto, tel

  • Headings are converted to p/strong markup.

  • Plain text without HTML tags is converted into paragraph HTML.

  • The sanitizer also removes/normalises:

    • scripts
    • styles
    • iframes
    • embedded objects
    • unknown attributes
    • inline CSS
    • layout-only spans
    • empty paragraphs
    • empty list items
    • unsafe links
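
The rules above can be approximated with the standard library's html.parser. This is a minimal sketch, not the package's sanitize_html.py: the names _Sanitiser and sanitise_fragment are hypothetical, and the handling of edge cases (unsafe links keep their text but lose the anchor, empty elements are removed by simple string replacement) is an assumption.

```python
import html
from html.parser import HTMLParser
from urllib.parse import urlparse

ALLOWED_TAGS = {"p", "br", "ul", "ol", "li", "strong", "em", "u", "a"}
ALLOWED_SCHEMES = {"http", "https", "mailto", "tel"}
DROP_WITH_CONTENT = {"script", "style", "iframe", "object"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class _Sanitiser(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0   # > 0 while inside a dropped element
        self.link_stack = []  # True for each open <a> that was emitted

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in HEADINGS:
            self.out.append("<p><strong>")
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            safe = urlparse(href).scheme.lower() in ALLOWED_SCHEMES
            self.link_stack.append(safe)
            if safe:  # keep only href; all other attributes are dropped
                self.out.append(f'<a href="{html.escape(href, quote=True)}">')
        elif tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # attributes and inline CSS discarded

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag in HEADINGS:
            self.out.append("</strong></p>")
        elif tag == "a":
            if self.link_stack and self.link_stack.pop():
                self.out.append("</a>")
        elif tag in ALLOWED_TAGS and tag != "br":  # <br> is a void element
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(html.escape(data, quote=False))


def sanitise_fragment(raw: str) -> str:
    """Decode an escaped Europass fragment and return allow-listed HTML."""
    text = html.unescape(raw)  # fragments are stored escaped in the XML
    if "<" not in text:
        stripped = text.strip()
        return f"<p>{html.escape(stripped, quote=False)}</p>" if stripped else ""
    parser = _Sanitiser()
    parser.feed(text)
    parser.close()
    cleaned = "".join(parser.out)
    return cleaned.replace("<p></p>", "").replace("<li></li>", "")
```

An allow-list is used rather than a block-list so that unknown tags such as span or div lose their markup but keep their text.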

Layout behaviour

By default, the mapper builds metadata.layout.pages using a standard resume order and includes only sections that contain content.

To prevent the converter from creating multiple layout page entries, use:

--no-split-pages

This does not guarantee that the target rendering app will not paginate overflowing content when exported to PDF. It only prevents the converter from splitting metadata.layout.pages.
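
The single-page case can be sketched as below. The section order, the page entry shape with main and sidebar keys, and the helper names are all assumptions made for illustration; the actual order and shape come from map_resume.py and the Reactive Resume template.

```python
from typing import Dict, List

# Assumed order; the converter's "standard resume order" may differ.
STANDARD_ORDER = [
    "experience", "education", "certifications",
    "publications", "languages", "profiles", "references",
]


def non_empty_sections(sections: Dict[str, dict]) -> List[str]:
    """Return IDs of sections with at least one item, known ones first."""
    filled = [key for key, sec in sections.items() if sec.get("items")]
    known = [key for key in STANDARD_ORDER if key in filled]
    others = [key for key in filled if key not in STANDARD_ORDER]
    return known + others


def single_page_layout(sections: Dict[str, dict]) -> List[dict]:
    """Build one layout page holding every non-empty section.

    Mirrors the effect of --no-split-pages under the assumed page shape.
    """
    return [{"main": non_empty_sections(sections), "sidebar": []}]
```

Empty sections never appear in the layout, which matches the default behaviour described above.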

Template handling

The converter treats the provided JSON template as the source of Reactive Resume-compatible defaults.

It clears content fields such as:

  • basics.name
  • basics.email
  • basics.phone
  • basics.location
  • summary.content
  • sections.*.items
  • customSections

It preserves non-content fields such as:

  • metadata.template
  • metadata.css
  • metadata.page
  • metadata.design
  • metadata.typography
  • metadata.notes
  • metadata.layout.sidebarWidth
  • picture styling

It rebuilds only metadata.layout.pages.
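
The clear-and-preserve behaviour can be sketched on a plain dictionary. The function name clear_template is hypothetical, and the exact JSON shape (for example, that cleared scalar fields become empty strings) is an assumption; the field paths are the ones listed above.

```python
import copy


def clear_template(template: dict) -> dict:
    """Return a copy of the template with content cleared and styling kept.

    Illustrative sketch only; not the package's template.py.
    """
    doc = copy.deepcopy(template)  # never mutate the caller's template

    basics = doc.get("basics", {})
    for key in ("name", "email", "phone", "location"):
        if key in basics:
            basics[key] = ""

    if isinstance(doc.get("summary"), dict):
        doc["summary"]["content"] = ""

    # sections.*.items: empty the items, keep wrapper settings such as titles.
    for section in doc.get("sections", {}).values():
        if isinstance(section, dict) and "items" in section:
            section["items"] = []

    doc["customSections"] = []

    # Only metadata.layout.pages is reset; other metadata is left untouched.
    layout = doc.setdefault("metadata", {}).setdefault("layout", {})
    layout["pages"] = []
    return doc
```

Working on a deep copy means the same loaded template can be reused for several conversions.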

Python API

The public converter API is provided by converter.py.

Convert files

from europass_converter.converter import convert_files, resume_to_json

result = convert_files(
    "sample.xml",
    "sample.json",
    preferred_email=None,
    preferred_phone=None,
    preferred_website=None,
    split_pages=True,
)

json_text = resume_to_json(result.resume, indent=2)

Convert parsed data

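# Import from the installed package, consistent with the other examples.
from europass_converter.converter import convert_parsed
from europass_converter.parse_candidate import parse_candidate_file
from europass_converter.template import load_template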

parsed = parse_candidate_file("sample.xml")
template = load_template("sample.json")

result = convert_parsed(parsed, template)

Convert XML string

from europass_converter.converter import convert_xml_string
from europass_converter.template import load_template

template = load_template("sample.json")

result = convert_xml_string(xml_string, template)

Design principles

The converter is split into small modules with narrow responsibilities:

  • parse_candidate.py extracts neutral content from XML.
  • contacts.py classifies and selects contact channels.
  • languages.py handles CEFR/native language compression.
  • sanitize_html.py cleans rich text.
  • template.py prepares a Reactive Resume JSON Resume v5 template.
  • map_resume.py applies the mapping policy.
  • converter.py orchestrates parsing and mapping.
  • cli.py handles command-line arguments and file output.

This separation keeps the conversion policy testable and prevents XML parsing, template preparation, and JSON mapping from becoming tangled.

Limitations

Current limitations:

  • only the Europass Candidate XML dialect described under Supported input is targeted;
  • the older Europass LearnerInfoType schema is not the primary supported input;
  • strict generic JSON Resume compatibility is not the priority;
  • the output is optimised for Reactive Resume JSON Resume v5 import;
  • programme concentration codes are discarded;
  • employer emails are ignored;
  • detailed language CEFR sub-scores are compressed;
  • publication parsing is conservative;
  • unknown non-empty XML blocks are preserved as unhandled content rather than guessed into specific sections;
  • visual pagination may still occur in the target app even when --no-split-pages is used.

Suggested future work

Possible improvements:

  • add tests for each module;
  • add strict-schema output mode;
  • improve GUI packaging and release builds;
  • add an interactive contact selector;
  • improve publication splitting;
  • add richer country/language code resolution;
  • add importer validation against the target app;
  • add release automation and GitHub Actions packaging checks;
  • add CI checks for linting and tests.

Licensing

Source code in this repository is licensed under the GNU Affero General Public License v3 (AGPL-3.0), unless stated otherwise.

Documentation, including README files, manuals, and explanatory text, is licensed under Creative Commons Attribution-ShareAlike 4.0 International (CC-BY-SA-4.0), unless stated otherwise.

Sample files, test fixtures, and third-party assets may be subject to separate licensing terms as indicated in their respective files or directories.
