Replace words and remove blocks inside a Word document without losing format

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

python-docx-replace

This library is built on top of python-docx. Its main purpose is to replace words inside a document without losing the formatting.

Let's explain the process behind the library:

First way, losing formatting

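One way to replace a key inside a document is the code below. Can you do this? YES! But you are going to lose the formatting inside every paragraph you touch: assigning to p.text keeps the paragraph-level style but removes all run-level formatting such as bold or italic.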

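# _get_all_paragraphs is a helper, not defined here, assumed to return every paragraph of the document
key = "${name}"
value = "Ivan"
for p in _get_all_paragraphs(doc):
    if key in p.text:
        # assigning p.text collapses the paragraph into a single run
        p.text = p.text.replace(key, value)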
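
The snippets in this description call _get_all_paragraphs(doc) without showing it. Below is a minimal sketch of what such a helper might look like, assuming it only needs body paragraphs and paragraphs inside top-level table cells. The name get_all_paragraphs_sketch is hypothetical, and the library's real helper may differ (for example by also covering headers, footers, or nested tables).

```python
from typing import Any, List


def get_all_paragraphs_sketch(doc: Any) -> List[Any]:
    """Hypothetical helper: collect body paragraphs and table-cell paragraphs.

    `doc` is expected to behave like a python-docx Document, i.e. expose
    `paragraphs` and `tables`, where each table has `rows`, each row has
    `cells`, and each cell has `paragraphs`.
    """
    paragraphs = list(doc.paragraphs)
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                # Merged cells can be reported more than once per row,
                # so their paragraphs may appear more than once here.
                paragraphs.extend(cell.paragraphs)
    return paragraphs
```

Because the sketch relies only on attribute access, it can be tried with simple stand-in objects before pointing it at a real document.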

Second way, not all keys

In python-docx, each paragraph is made up of runs; a run is a proxy object wrapping a <w:r> element. Runs are covered in more detail later, and the python-docx docs describe them fully.

You can try replacing the text inside the runs and if it works, then your job is done:

key = "${name}"
value = "Ivan"
for p in _get_all_paragraphs(doc):
    for run in p.runs:
        if key in run.text:
            run.text = run.text.replace(key, value)

The problem is that the key can be split across more than one run, in which case this approach cannot replace it. For example:

It's going to work:

Word Paragraph: "Hello ${name}, welcome!"
Run1: "Hello ${name}, w"
Run2: "elcome!"

It's NOT going to work:

Word Paragraph: "Hello ${name}, welcome!"
Run1: "Hello ${na"
Run2: "me}, welcome!"
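
The two cases above can be reproduced without a real document. The sketch below uses a hypothetical FakeRun class as a stand-in for a python-docx run (only its text attribute is modelled) and applies the same run-by-run replacement to both layouts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeRun:
    """Stand-in for a python-docx run; only the text attribute is modelled."""
    text: str


def replace_in_runs(runs: List[FakeRun], key: str, value: str) -> bool:
    """Run-by-run replacement; returns True if any run held the whole key."""
    replaced = False
    for run in runs:
        if key in run.text:
            run.text = run.text.replace(key, value)
            replaced = True
    return replaced


whole = [FakeRun("Hello ${name}, w"), FakeRun("elcome!")]
split = [FakeRun("Hello ${na"), FakeRun("me}, welcome!")]

print(replace_in_runs(whole, "${name}", "Ivan"))  # True
print("".join(run.text for run in whole))         # Hello Ivan, welcome!
print(replace_in_runs(split, "${name}", "Ivan"))  # False
print("".join(run.text for run in split))         # Hello ${name}, welcome!
```

In the second layout no single run contains the full key, so the check never matches and the paragraph is left unchanged.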

You are probably wondering: why is the paragraph text split this way? What is the purpose of a run?

Imagine a Word document with this format:

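(image: a key inside a Word document whose characters carry different formats, such as yellow highlight, bold and underline, and red in another font)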

Considering this, what would the format be after parsing the key? Highlighted yellow? Bold and underline? Red with another font? All of them?

That is the purpose of runs: each run holds its own set of formatting.

The final format is the format of the $ character. The formats of all the other characters of the key are discarded. In the example above, the final format will be highlighted yellow.

Solution

The solution adopted is quite simple. First we try to replace in the simplest way, run by run, as in the previous example. If it works, great, all done! If it does not, we build a table of indexes:

key = "${name}"
value = "Ivan"

Word Paragraph: "Hello ${name}, welcome!"
Run1: "Hello ${na"
Run2: "me}, welcome!"

Word Paragraph: 'H' 'e' 'l' 'l' 'o' ' ' '$' '{' 'n' 'a' 'm' 'e' '}' ',' ' ' 'w' 'e' 'l' 'c' 'o' 'm' 'e' '!'
Char Indexes:    0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22
Run Index:       0   0   0   0   0   0   0   0   0   0   1   1   1   1   1   1   1   1   1   1   1   1   1
Run Char Index:  0   1   2   3   4   5   6   7   8   9   0   1   2   3   4   5   6   7   8   9   10  11  12

The table has three rows: the index of each character in the paragraph, the index of the run that contains it, and the character's index inside that run. A little confusing, right?
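
As an illustration, the table can be built from the run texts alone. This is a minimal sketch, not the library's code; build_index_table is a hypothetical name, and the runs are represented as plain strings.

```python
from typing import List, Tuple


def build_index_table(run_texts: List[str]) -> List[Tuple[int, int]]:
    """Entry i is (run index, run char index) for paragraph char index i."""
    table = []
    for run_index, text in enumerate(run_texts):
        for run_char_index in range(len(text)):
            table.append((run_index, run_char_index))
    return table


table = build_index_table(["Hello ${na", "me}, welcome!"])
print(len(table))  # 23
print(table[6])    # (0, 6)  -> '$' is char 6 of run 0
print(table[9])    # (0, 9)  -> 'a' is the last char of run 0
print(table[10])   # (1, 0)  -> 'm' is the first char of run 1
```

The position in the list plays the role of the char index row, so only the other two rows need to be stored.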

With this table we can process and replace all the keys, getting the result:

# REPLACE PROCESS (one step per character of the key):
Char Index 6  -> p.runs[0], char 6: '$' becomes "Ivan"  # the value replaces '$'
Char Index 7  -> p.runs[0], char 7: '{' becomes ""      # every other character of the key is removed
Char Index 8  -> p.runs[0], char 8: 'n' becomes ""
Char Index 9  -> p.runs[0], char 9: 'a' becomes ""
Char Index 10 -> p.runs[1], char 0: 'm' becomes ""
Char Index 11 -> p.runs[1], char 1: 'e' becomes ""
Char Index 12 -> p.runs[1], char 2: '}' becomes ""

After that, we are going to have:

Word Paragraph: 'H' 'e' 'l' 'l' 'o' ' ' 'Ivan' '' '' '' '' '' '' ',' ' ' 'w' 'e' 'l' 'c' 'o' 'm' 'e' '!'
Char Indexes:    0   1   2   3   4   5   6      7  8  9 10 11 12  13  14  15  16  17  18  19  20  21  22
Run Index:       0   0   0   0   0   0   0      0  0  0 1  1  1   1   1   1   1   1   1   1   1   1   1
Run Char Index:  0   1   2   3   4   5   6      7  8  9 0  1  2   3   4   5   6   7   8   9   10  11  12

All done: the key in your Word document is now replaced, and the formatting is kept.
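
The whole fallback can be sketched in a few lines. This is an illustration under assumptions, not the library's implementation: FakeRun is a hypothetical stand-in for a python-docx run (text only), replace_across_runs is a made-up name, and the first, run-by-run attempt is left out. Because the value is written into the run that holds the first character of the key, it inherits that run's formatting.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeRun:
    """Stand-in for a python-docx run; only the text attribute is modelled."""
    text: str


def replace_across_runs(runs: List[FakeRun], key: str, value: str) -> None:
    """Replace every occurrence of key, even when it spans several runs."""
    if not key:
        return
    search_from = 0
    while True:
        full_text = "".join(run.text for run in runs)
        start = full_text.find(key, search_from)
        if start == -1:
            return
        # Table of indexes: paragraph char index -> (run index, run char index).
        table = [(ri, ci) for ri, run in enumerate(runs) for ci in range(len(run.text))]
        # Walk the key backwards so offsets of earlier characters stay valid.
        for char_index in range(start + len(key) - 1, start - 1, -1):
            ri, ci = table[char_index]
            piece = value if char_index == start else ""
            text = runs[ri].text
            runs[ri].text = text[:ci] + piece + text[ci + 1:]
        # Continue after the inserted value so it is never re-scanned.
        search_from = start + len(value)


runs = [FakeRun("Hello ${na"), FakeRun("me}, welcome!")]
replace_across_runs(runs, "${name}", "Ivan")
print([run.text for run in runs])  # ['Hello Ivan', ', welcome!']
```

The number of runs never changes; only their text does, which is why the formatting attached to each run survives.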

How to install

Via PyPI

pip3 install python-docx-replace

Vanilla

Grab the docx_replace.py file from the src folder and be happy!

How to use

from docx import Document
from python_docx_replace import docx_replace

# get your document using python-docx
doc = Document("document.docx")

# call the replace function with your key value pairs
docx_replace(doc, name="Ivan", phone="+55123456789")

# do whatever you want after that, usually save the document
doc.save("replaced.docx")

Release history

0.4.4

May 6, 2023

0.4.3

Apr 11, 2023

0.4.2

Jan 29, 2023

0.4.1

Nov 19, 2022

0.4.0

Oct 30, 2022

0.3.3b0 pre-release

Oct 30, 2022

0.3.2b0 pre-release

Oct 30, 2022

0.3.1b0 pre-release

Oct 30, 2022

0.3.0b0 pre-release

Oct 30, 2022

0.2.3b0 pre-release (this version)

Aug 27, 2022

0.2.2b0 pre-release

Aug 27, 2022

0.2.1b0 pre-release

Aug 26, 2022

0.2.0b0 pre-release

Aug 26, 2022

0.1.0

Jun 16, 2022

0.0.4

Jun 16, 2022

0.0.3

Jun 16, 2022

0.0.2

Jun 16, 2022

0.0.1

Jun 16, 2022

Download files

Source Distribution

python-docx-replace-0.2.3b0.tar.gz (5.5 kB)

Uploaded Aug 27, 2022 Source

Built Distribution

python_docx_replace-0.2.3b0-py3-none-any.whl (5.9 kB)

Uploaded Aug 27, 2022 Python 3

File details

Details for the file python-docx-replace-0.2.3b0.tar.gz.

File metadata

Download URL: python-docx-replace-0.2.3b0.tar.gz
Upload date: Aug 27, 2022
Size: 5.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.0 CPython/3.9.13

File hashes

Hashes for python-docx-replace-0.2.3b0.tar.gz
Algorithm	Hash digest
SHA256	`f5fa5698d999c8c6f6b4e42aa0f6e30d440666f1a134b2dbe1582ce9cef565eb`
MD5	`9d7411276639e4919065ac8467979b0f`
BLAKE2b-256	`b499557495348d3508f55c2448ae3c4c0574067dec33af9ec747bbad89689b40`

File details

Details for the file python_docx_replace-0.2.3b0-py3-none-any.whl.

File metadata

Download URL: python_docx_replace-0.2.3b0-py3-none-any.whl
Upload date: Aug 27, 2022
Size: 5.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.0 CPython/3.9.13

File hashes

Hashes for python_docx_replace-0.2.3b0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5834284a0e8d1a7d693584b55e4832bd3c2ae80b26423bdb2ec2baf83cdf8afe`
MD5	`e2ddcdab92b3c7bf50e81a765425c014`
BLAKE2b-256	`e3c6c4b3c5dd4a4d0c5d11c027d9c0e0b73514176446aa870c17fc2b4725e5bb`
