Convert Europass Candidate XML exports into Reactive Resume JSON Resume v5.
Europass XML to Reactive Resume JSON
A Python utility that converts Europass Candidate XML exports into JSON Resume v5 files for import into Reactive Resume.
The converter reads a Europass Candidate XML file, extracts the CV/resume content, sanitises HTML fragments, and maps the data into the JSON Resume v5 structure expected by Reactive Resume. A provided sample JSON file serves as the output template for layout, typography, design, and other non-content defaults.
Status
This project is in active development.
The current target is Reactive Resume JSON Resume v5 import compatibility, not human readability or plain-text editability. The converter prioritises the JSON shape accepted or emitted by Reactive Resume, using the application's own samples as templates.
Supported input
The converter currently targets the Europass XML dialect represented by files with a root element similar to:
<Candidate xmlns="http://www.europass.eu/1.0">
...
</Candidate>
The supported XML structure includes, among others:
- CandidatePerson
- CandidateProfile
- EmploymentHistory
- EducationHistory
- PersonQualifications
- ConferencesAndSeminars
- Others
- Attachment
- RenderingInformation
The older LearnerInfoType Europass schema is not the primary target of this converter.
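A quick pre-flight check can tell whether a file uses the targeted dialect before conversion is attempted. The sketch below is not part of the package: the function name is hypothetical, and it assumes a strict match on the root element and namespace shown above, whereas the converter itself may be more lenient.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the root element shown above.
EUROPASS_NS = "http://www.europass.eu/1.0"


def is_candidate_document(xml_text: str) -> bool:
    """Return True if the root element is Candidate in the Europass namespace."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not well-formed XML at all
    # ElementTree spells namespaced tags as "{namespace}localname".
    return root.tag == f"{{{EUROPASS_NS}}}Candidate"
```

A file that fails this check is probably an older Europass export or not a Europass file.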
Output
The converter produces JSON Resume v5 output intended for import into Reactive Resume.
The output is based on a provided Reactive Resume-compatible template file. The template is used to preserve non-content fields such as:
- picture styling
- section wrapper settings
- template name
- page settings
- layout metadata
- design colours
- typography
- custom CSS settings
- notes
Existing resume content in the template is cleared and replaced with converted Europass content.
Repository structure
The repository uses a package layout. The Python source lives inside the europass_converter/ package directory, while project metadata and auxiliary files remain at repository root.
.
├── 100_src
│ ├── install_dependencies.sh
│ ├── sample.json
│ ├── sample.pdf
│ └── sample.xml
├── 200_media
│ └── media.qrc
├── adv.ui
├── europass_converter
│ ├── cli.py
│ ├── contacts.py
│ ├── converter.py
│ ├── gui.py
│ ├── __init__.py
│ ├── languages.py
│ ├── map_resume.py
│ ├── parse_candidate.py
│ ├── sanitize_html.py
│ ├── template.py
│ ├── ui_adv.py
│ ├── ui_mainwindow.py
│ └── version.py
├── mainwindow.ui
├── media_rc.py
├── pyproject.toml
├── README.md
├── LICENSE-docs
└── LICENSE
Install CLI
Install the command-line converter from a release wheel:
python -m pip install europassxml_to_reactiveresumejson-*.whl
Show the available options:
europass-convert --help
Or:
python -m europass_converter.cli --help
Check the installed version:
europass-convert --version
Or:
python -m europass_converter.cli --version
Convert a file
europass-convert "./path/to/cv.xml" \
--template "./100_src/sample.json" \
--output "./out/resume.json" \
--no-split-pages
The resulting file can then be imported into Reactive Resume as a JSON Resume v5 file.
Use a Europass PDF
Some Europass PDF files contain the original XML as an embedded attachment called attachment.xml.
The command-line converter expects an XML file as input. To extract it from a Europass PDF CV:
# sudo apt-get install poppler-utils # install `pdfdetach` if needed
pdfdetach -savefile attachment.xml -o cv.xml cv.pdf
Then convert the extracted XML:
europass-convert "./cv.xml" \
--template "./100_src/sample.json" \
--output "./out/resume.json" \
--no-split-pages
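The extraction step can also be scripted from Python. This is a minimal sketch that shells out to the same pdfdetach command; the function names are hypothetical and not part of the package, and it assumes pdfdetach is on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdfdetach_command(pdf_path: str, xml_path: str) -> list[str]:
    """Build the same pdfdetach invocation as the shell command above."""
    return ["pdfdetach", "-savefile", "attachment.xml", "-o", xml_path, pdf_path]


def extract_embedded_xml(pdf_path: str, xml_path: str) -> Path:
    """Run pdfdetach and return the path the XML was written to."""
    if shutil.which("pdfdetach") is None:
        raise RuntimeError("pdfdetach not found; install poppler-utils first")
    # check=True raises CalledProcessError if pdfdetach exits with an error.
    subprocess.run(build_pdfdetach_command(pdf_path, xml_path), check=True)
    return Path(xml_path)
```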
Install GUI
Install the package with GUI support:
python -m pip install "europassxml_to_reactiveresumejson-*.whl[gui]"
Launch the GUI:
europass-convert-gui
The GUI accepts either:
- a Europass Candidate XML file; or
- a Europass PDF containing an embedded attachment.xml.
If a PDF is selected, the GUI attempts to extract the embedded XML automatically. If extraction succeeds, the XML is saved beside the selected output JSON as Europass.xml. If the PDF does not contain attachment.xml, conversion is not started.
Advanced options available in the GUI include:
- indentation level;
- compact JSON output;
- verbosity level;
- debug traceback logging;
- parsed intermediate representation logging.
When enabled, diagnostics can be written to files alongside the output JSON: XMLconv.log for stdout and debug.log for stderr.
Run from source
Clone the repository and prepare the virtual environment:
bash ./100_src/install_dependencies.sh
By default, this installs only the command-line converter dependencies.
To install the GUI dependencies:
bash ./100_src/install_dependencies.sh --gui
To install the development environment, including GUI and development extras:
bash ./100_src/install_dependencies.sh --dev
Run the CLI from the source tree:
.venv/bin/python -m europass_converter.cli --help
Or use the installed console script inside the virtual environment:
.venv/bin/europass-convert --help
Test the bundled sample
.venv/bin/python -m europass_converter.cli "./100_src/sample.xml" \
--template "./100_src/sample.json" \
--output "./out/resume2.json" \
--no-split-pages \
--debug \
--debug-parsed \
-v 2
Or:
.venv/bin/europass-convert "./100_src/sample.xml" \
--template "./100_src/sample.json" \
--output "./out/resume2.json" \
--no-split-pages \
--debug \
--debug-parsed \
-v 2
This writes the converted JSON file to ./out/resume2.json.
CLI options
positional arguments:
xml_path Path to the Europass Candidate XML file.
required options:
--template PATH Path to the Reactive Resume JSON Resume v5 template.
optional output options:
--output PATH Path to write the converted JSON.
--indent N JSON indentation level. Default: 2.
--compact Write compact single-line JSON.
optional contact-selection options:
--preferred-email EMAIL Preferred primary email if several are found.
--preferred-phone PHONE Preferred primary phone if several are found.
--preferred-website URL Preferred primary website if several are found.
optional layout options:
--no-split-pages Put all content into one metadata.layout.pages entry.
diagnostic options:
-v, --verbose {0,1,2} Verbosity level.
0 = suppress diagnostics
1 = print diagnostics if present
2 = print diagnostics and success summary
--debug Print a full traceback on errors.
--debug-parsed Print the parsed intermediate representation.
Example with diagnostics and no converter-level page splitting:
python -m europass_converter.cli ./100_src/sample.xml \
--template ./100_src/sample.json \
--output ./out/resume.json \
--no-split-pages \
--debug \
--debug-parsed \
-v 2
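For readers who want to wrap or script the CLI, the options listed in this section could be declared with argparse roughly as follows. This is a reconstruction from the option list, not the project's actual cli.py; the default verbosity in particular is an assumption, since it is not documented here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented options (illustrative reconstruction)."""
    parser = argparse.ArgumentParser(prog="europass-convert")
    parser.add_argument("xml_path", help="Path to the Europass Candidate XML file.")
    parser.add_argument("--template", required=True, metavar="PATH")
    parser.add_argument("--output", metavar="PATH")
    parser.add_argument("--indent", type=int, default=2, metavar="N")
    parser.add_argument("--compact", action="store_true")
    parser.add_argument("--preferred-email", metavar="EMAIL")
    parser.add_argument("--preferred-phone", metavar="PHONE")
    parser.add_argument("--preferred-website", metavar="URL")
    parser.add_argument("--no-split-pages", action="store_true")
    # Assumption: the real default verbosity is not stated in the option list.
    parser.add_argument("-v", "--verbose", type=int, choices=[0, 1, 2], default=1)
    parser.add_argument("--debug", action="store_true")
    parser.add_argument("--debug-parsed", action="store_true")
    return parser
```

argparse exposes dashed options as underscored attributes, so --no-split-pages becomes args.no_split_pages.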
Conversion policy
The converter follows these mapping rules.
Personal information
| Europass XML source | JSON target |
|---|---|
| CandidatePerson/PersonName | basics.name |
| email contact channels | basics.email, additional emails to basics.customFields |
| telephone contact channels | basics.phone, additional phones to basics.customFields |
| website contact channels | basics.website, additional websites to basics.customFields |
| social/academic profile links | sections.profiles.items[] |
| address locality | basics.location |
| nationality | basics.customFields |
| birth date/year | basics.customFields |
The current default contact priority is XML order unless the user provides --preferred-email, --preferred-phone, and/or --preferred-website.
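The selection rule can be sketched as a small helper. The function name is hypothetical, and one behaviour is assumed rather than documented: if the preferred value does not appear in the XML, the sketch falls back to XML order.

```python
def select_primary(values, preferred=None):
    """Split contact values into (primary, extras).

    XML order is kept; a preferred value that appears among the
    values is promoted to primary.
    """
    # Drop duplicates while preserving the original order.
    ordered = list(dict.fromkeys(values))
    if not ordered:
        return None, []
    # Assumption: an unknown preferred value is ignored, not inserted.
    primary = preferred if preferred in ordered else ordered[0]
    extras = [value for value in ordered if value != primary]
    return primary, extras
```

Under the mapping table above, the primary value would go to basics.email, basics.phone, or basics.website, and the extras to basics.customFields.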
Work experience
EmploymentHistory/EmployerHistory/PositionHistory entries become sections.experience.items[].
The mapper preserves:
- organisation
- position
- city/country
- date range
- first organisation website
- sanitised description
Career progression embedded in the Europass description is kept inside the description unless later parser versions can reliably extract structured roles.
Education
EducationHistory/EducationOrganizationAttendance entries become sections.education.items[].
The mapper preserves:
- institution
- degree
- education period
- location
- website
- grade
- thesis
- credits
- sanitised description
Raw programme concentration codes are intentionally discarded.
Courses
Europass Others blocks with title Course become certification items:
sections.certifications.items[]
Publications
Europass Others blocks with title Pubblications or Publications are parsed as publication groups.
The mapper attempts to split only obvious HTML list items into individual publication records. If the group cannot be split safely, it is preserved as a custom Publications section.
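One way to read "obvious" is: the fragment is a single flat list and nothing else. The following sketch implements that reading with regular expressions. It assumes the fragment has already been sanitised (so tags carry no attributes), and the function name and exact rules are illustrative rather than the mapper's real logic.

```python
import re

LIST_RE = re.compile(r"^\s*<(ul|ol)>(.*)</\1>\s*$", re.DOTALL | re.IGNORECASE)
ITEM_RE = re.compile(r"<li>(.*?)</li>", re.DOTALL | re.IGNORECASE)


def split_publications(fragment: str):
    """Return one string per <li>, or None when splitting is not clearly safe."""
    match = LIST_RE.match(fragment)
    if match is None:
        return None  # not a single list wrapping the whole fragment
    body = match.group(2)
    # Nested or multiple lists make item boundaries ambiguous: refuse to split.
    if re.search(r"<(ul|ol)\b", body, re.IGNORECASE):
        return None
    items = [item.strip() for item in ITEM_RE.findall(body)]
    # Text left over outside the <li> elements also means "do not split".
    leftover = ITEM_RE.sub("", body).strip()
    if leftover or not items:
        return None
    return items
```

A None result corresponds to the fallback described above, where the group is kept whole as a custom Publications section.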
Conferences and seminars
ConferencesAndSeminars/ConferenceAndSeminar entries become a customSections[] entry with type = "projects".
Other blocks
Other Europass Others blocks are preserved under a custom section called Additional information.
The mapper does not guess whether miscellaneous blocks are awards, interests, skills, or projects until a specific parser rule is implemented.
Languages
Native languages from PrimaryLanguageCode are mapped as:
fluency = "Native"
level = 5
Foreign languages from PersonQualifications/PersonCompetency are mapped using CEFR scores:
- the numeric level is based on the lowest available CEFR score;
- the displayed fluency is based on the spoken CEFR level;
- if both spoken interaction and spoken production exist, the lower of the two is used;
- incomplete CEFR data is preserved during parsing and handled conservatively during mapping.
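The compression rules above can be sketched as follows. The CEFR-to-number table, the key names for the sub-scores, and the treatment of missing scores (ignored, with level 0 when nothing usable remains) are all assumptions made for illustration; the project's languages.py may use a different scale.

```python
CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]
# Hypothetical CEFR -> numeric level table; the real scale may differ.
CEFR_TO_LEVEL = {"A1": 1, "A2": 1, "B1": 2, "B2": 3, "C1": 4, "C2": 5}


def lowest(scores):
    """Return the lowest valid CEFR score, or None if there is none."""
    valid = [score for score in scores if score in CEFR_ORDER]
    return min(valid, key=CEFR_ORDER.index) if valid else None


def map_language(scores: dict) -> dict:
    """Compress CEFR sub-scores into a fluency label and a numeric level."""
    # Numeric level: based on the lowest score across all skills.
    overall = lowest(scores.values())
    # Fluency: the lower of spoken interaction and spoken production.
    spoken = lowest([scores.get("spoken_interaction"), scores.get("spoken_production")])
    return {
        "fluency": spoken or "",
        "level": CEFR_TO_LEVEL.get(overall, 0),  # 0 when no score is usable
    }
```

For example, a speaker with B2 spoken interaction, C1 spoken production, and B1 writing is displayed as B2 but rated on the B1 score.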
References
If the XML does not provide references, the converter adds the 'Available upon request' placeholder.
HTML sanitisation
Europass XML often stores escaped HTML fragments. The converter decodes and sanitises these fragments before inserting them into JSON.
- Allowed semantic tags: p, br, ul, ol, li, strong, em, u, a
- Allowed link protocols: http, https, mailto, tel
- Headings are converted to p/strong markup.
- Plain text without HTML tags is converted into paragraph HTML.

The sanitizer also removes/normalises:
- scripts
- styles
- iframes
- embedded objects
- unknown attributes
- inline CSS
- layout-only spans
- empty paragraphs
- empty list items
- unsafe links
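An allowlist sanitiser along these lines can be built on the standard library's html.parser. This is a simplified sketch, not the project's sanitize_html.py: it assumes the XML parser has already decoded the escaped fragment, it strips the href from unsafe links rather than removing the link element, and it does not prune empty paragraphs or list items.

```python
import html
from html.parser import HTMLParser
from urllib.parse import urlparse

ALLOWED_TAGS = {"p", "br", "ul", "ol", "li", "strong", "em", "u", "a"}
ALLOWED_PROTOCOLS = {"http", "https", "mailto", "tel"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
DROP_WITH_CONTENT = {"script", "style", "iframe", "object"}


class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in HEADINGS:
            self.out.append("<p><strong>")
        elif tag in ALLOWED_TAGS:
            href = dict(attrs).get("href") if tag == "a" else None
            if href and urlparse(href).scheme.lower() in ALLOWED_PROTOCOLS:
                self.out.append(f'<a href="{html.escape(href)}">')
            else:
                self.out.append(f"<{tag}>")  # every other attribute is dropped

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag in HEADINGS:
            self.out.append("</strong></p>")
        elif tag in ALLOWED_TAGS and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(html.escape(data, quote=False))


def sanitize(fragment: str) -> str:
    parser = Sanitizer()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    result = "".join(parser.out)
    # Plain text without tags becomes a paragraph.
    return result if "<" in result else f"<p>{result}</p>"
```

Tags outside the allowlist, such as span, are unwrapped: the tag is dropped and its text is kept.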
Layout behaviour
By default, the mapper builds metadata.layout.pages using a standard resume order and includes only sections that contain content.
To prevent the converter from creating multiple layout page entries, use:
--no-split-pages
This does not guarantee that the target rendering app will not paginate overflowing content when exported to PDF. It only prevents the converter from splitting metadata.layout.pages.
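The difference between the two modes can be illustrated with a small page builder. Everything specific here is an assumption for illustration: the section order, the shape of a page entry (fullWidth, main, sidebar), and the splitting rule (a fixed number of sections per page), none of which is taken from the converter's source.

```python
# Assumed "standard resume order"; the mapper's real order may differ.
STANDARD_ORDER = ["experience", "education", "certifications",
                  "publications", "languages", "references"]


def build_pages(sections: dict, split_pages: bool = True, per_page: int = 3) -> list:
    """Build a metadata.layout.pages-style list from non-empty sections."""
    # Keep only sections that actually contain items, in the standard order.
    present = [name for name in STANDARD_ORDER
               if sections.get(name, {}).get("items")]
    if split_pages:
        # Hypothetical rule: at most per_page sections on each page entry.
        chunks = [present[i:i + per_page]
                  for i in range(0, len(present), per_page)] or [[]]
    else:
        chunks = [present]  # the --no-split-pages case: a single entry
    return [{"fullWidth": False, "main": chunk, "sidebar": []} for chunk in chunks]
```

With split_pages=False the result always has exactly one entry, which is what --no-split-pages promises; how the target app paginates that entry at render time is outside the converter's control.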
Template handling
The converter treats the provided JSON template as the source of Reactive Resume-compatible defaults.
It clears content fields such as:
- basics.name
- basics.email
- basics.phone
- basics.location
- summary.content
- sections.*.items
- customSections
It preserves non-content fields such as:
- metadata.template
- metadata.css
- metadata.page
- metadata.design
- metadata.typography
- metadata.notes
- metadata.layout.sidebarWidth
- picture styling
It rebuilds only metadata.layout.pages.
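The clear-and-preserve step can be sketched on a plain dictionary. The function name is hypothetical, and the sketch assumes that customSections is a top-level key and that the cleared basics fields are strings; the project's template.py may handle more fields or different types.

```python
import copy


def prepare_template(template: dict) -> dict:
    """Return a copy of the template with content cleared and defaults kept."""
    resume = copy.deepcopy(template)  # never mutate the caller's template

    basics = resume.setdefault("basics", {})
    for key in ("name", "email", "phone", "location"):
        basics[key] = ""
    resume.setdefault("summary", {})["content"] = ""

    # Empty every section's items but keep its wrapper settings.
    for section in resume.get("sections", {}).values():
        if isinstance(section, dict) and "items" in section:
            section["items"] = []
    resume["customSections"] = []

    # Only layout.pages is reset; sidebarWidth and the rest of metadata survive.
    resume.setdefault("metadata", {}).setdefault("layout", {})["pages"] = []
    return resume
```

Working on a deep copy means the same template object can be reused for several conversions.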
Python API
The public converter API is provided by converter.py.
Convert files
from europass_converter.converter import convert_files, resume_to_json
result = convert_files(
"sample.xml",
"sample.json",
preferred_email=None,
preferred_phone=None,
preferred_website=None,
split_pages=True,
)
json_text = resume_to_json(result.resume, indent=2)
Convert parsed data
from europass_converter.converter import convert_parsed
from europass_converter.parse_candidate import parse_candidate_file
from europass_converter.template import load_template
parsed = parse_candidate_file("sample.xml")
template = load_template("sample.json")
result = convert_parsed(parsed, template)
Convert XML string
from europass_converter.converter import convert_xml_string
from europass_converter.template import load_template
template = load_template("sample.json")
result = convert_xml_string(xml_string, template)
Design principles
The converter is split into small modules with narrow responsibilities:
- parse_candidate.py extracts neutral content from XML.
- contacts.py classifies and selects contact channels.
- languages.py handles CEFR/native language compression.
- sanitize_html.py cleans rich text.
- template.py prepares a Reactive Resume JSON Resume v5 template.
- map_resume.py applies the mapping policy.
- converter.py orchestrates parsing and mapping.
- cli.py handles command-line arguments and file output.
This separation keeps the conversion policy testable and prevents XML parsing, template preparation, and JSON mapping from becoming tangled.
Limitations
Current limitations:
- only the Europass Candidate XML dialect described under Supported input is targeted;
- the older Europass LearnerInfoType schema is not the primary supported input;
- strict generic JSON Resume compatibility is not the priority;
- the output is optimised for Reactive Resume JSON Resume v5 import;
- programme concentration codes are discarded;
- employer emails are ignored;
- detailed language CEFR sub-scores are compressed;
- publication parsing is conservative;
- unknown non-empty XML blocks are preserved as unhandled content rather than guessed into specific sections;
- visual pagination may still occur in the target app even when --no-split-pages is used.
Suggested future work
Possible improvements:
- add tests for each module;
- add strict-schema output mode;
- improve GUI packaging and release builds;
- add an interactive contact selector;
- improve publication splitting;
- add richer country/language code resolution;
- add importer validation against the target app;
- add release automation and GitHub Actions packaging checks;
- add CI checks for linting and tests.
Licensing
Source code in this repository is licensed under the GNU Affero General Public License v3 (AGPL-3.0), unless stated otherwise.
Documentation, including README files, manuals, and explanatory text, is licensed under Creative Commons Attribution-ShareAlike 4.0 International (CC-BY-SA-4.0), unless stated otherwise.
Sample files, test fixtures, and third-party assets may be subject to separate licensing terms as indicated in their respective files or directories.