Python convert docx to pdf without word

I'm developing a web app, part of which converts uploaded docx files to PDF (after some processing). With python-docx and other methods, most of the processing needs neither a Windows machine with Word installed nor LibreOffice on Linux (my web server is PythonAnywhere: Linux, but without LibreOffice and without sudo or apt install permissions). But converting to PDF seems to require one of those. From exploring questions here and elsewhere, this is what I have so far:

import os
import subprocess

try:
    from comtypes import client
except ImportError:
    client = None

def doc2pdf(doc):
    """
    convert a doc/docx document to pdf format
    :param doc: path to document
    """
    doc = os.path.abspath(doc) # bugfix - searching files in windows/system32
    if client is None:
        return doc2pdf_linux(doc)
    name, ext = os.path.splitext(doc)
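    word = worddoc = None
    try:
        word = client.CreateObject('Word.Application')
        worddoc = word.Documents.Open(doc)
        worddoc.SaveAs(name + '.pdf', FileFormat=17)  # 17 = wdFormatPDF
    finally:
        # only clean up what was actually created, so a failure above
        # is not masked by a NameError here
        if worddoc is not None:
            worddoc.Close()
        if word is not None:
            word.Quit()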


def doc2pdf_linux(doc):
    """
    convert a doc/docx document to pdf format (linux only, requires libreoffice)
    :param doc: path to document
    """
    cmd = ['libreoffice', '--convert-to', 'pdf', doc]
    p = subprocess.Popen(cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
    # communicate() reads both pipes while waiting; wait() with PIPE can deadlock
    stdout, stderr = p.communicate(timeout=10)
    if stderr:
        raise subprocess.SubprocessError(stderr)

As you can see, one method requires comtypes, another requires libreoffice as a subprocess. Other than switching to a more sophisticated hosting server, is there any solution?

Python convert docx to pdf without word

Have you ever wanted to convert DOCX files to PDF in a batch? If so, these Python scripts will make your life much easier.
Two methods are covered: one without a GUI and one with a GUI. Note that docx2pdf requires Microsoft Word to be installed on your system.

The first one is without a GUI. To install docx2pdf:

pip install docx2pdf


Fig.1. Python Code

Explanation: The convert() function takes two arguments: the absolute path of the file you want to convert and the path where you want to save the result. You can batch-convert by providing a folder path instead.
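
The calls described here can be sketched roughly as follows. This assumes the docx2pdf package is installed and Microsoft Word is available on the machine. The helper names (`default_pdf_path`, `convert_single`, `convert_folder`) and the file and folder names are hypothetical and only for illustration; this is not the code from Fig.1.

```python
import os


def default_pdf_path(docx_path):
    """Hypothetical helper: same folder and name, with a .pdf extension."""
    root, _ext = os.path.splitext(os.path.abspath(docx_path))
    return root + ".pdf"


def convert_single(docx_path, pdf_path=None):
    """Convert one file. docx2pdf is imported lazily so this module loads without it."""
    from docx2pdf import convert  # requires Microsoft Word on the machine

    pdf_path = pdf_path or default_pdf_path(docx_path)
    convert(os.path.abspath(docx_path), pdf_path)
    return pdf_path


def convert_folder(folder):
    """Batch mode: passing a folder converts every .docx file inside it."""
    from docx2pdf import convert

    convert(os.path.abspath(folder))


if __name__ == "__main__":
    convert_single("report.docx")     # hypothetical file name
    convert_folder("my_docx_folder")  # hypothetical folder name
```

Passing absolute paths avoids surprises when Word resolves relative paths against its own working directory.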

This library also provides a CLI, so you can convert files without writing code in a separate file.


Fig.2. CLI commands

To implement this with a GUI, you must install PyQt5:

pip install PyQt5


Fig.3. Drag drop ft. to convert docx to pdf in batch

Output:


Fig.4. Main window where user has to drag and drop files


Fig.5. Processing window


Fig.6. Docx files get converted into PDF

Explanation: There is a lot of theory to cover all the functions used, but we will only see the important ones. You can get more information by checking out the references attached below.

QMimeData: belongs to the QtCore module. It holds the data that is placed on the clipboard or transferred during a drag-and-drop operation.

QDragEnterEvent: the event sent to the target widget when a drag action enters it.

QDragMoveEvent: the event sent while the drag-and-drop action is in progress.

QDropEvent: the event sent when the drop is completed.

hasUrls(): returns true if the object can return a list of URLs; otherwise returns false.

The basic idea is to use PyQt5 to create a GUI, PyQt5.QtWidgets for QListWidget and its functions, and PyQt5.QtCore to get the URI list of the files (basically, each file's location), and then to convert the file at that location.
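
As a rough illustration of this idea, here is a minimal sketch of a list widget that accepts dropped files. It is not the code from Fig.3: the names `docx_only`, `run_gui`, and `DropList` are made up for this example, and the conversion step itself is left out.

```python
import sys


def docx_only(paths):
    """Keep only the paths that end in .docx (case-insensitive)."""
    return [p for p in paths if p.lower().endswith(".docx")]


def run_gui():
    """Show a list widget that collects dropped .docx files (needs PyQt5)."""
    from PyQt5.QtCore import Qt
    from PyQt5.QtWidgets import QApplication, QListWidget

    class DropList(QListWidget):
        def __init__(self):
            super().__init__()
            self.setAcceptDrops(True)

        def dragEnterEvent(self, event):
            # Accept the drag only if it carries file URLs.
            if event.mimeData().hasUrls():
                event.accept()
            else:
                event.ignore()

        def dragMoveEvent(self, event):
            # Keep accepting while the cursor moves over the widget.
            if event.mimeData().hasUrls():
                event.setDropAction(Qt.CopyAction)
                event.accept()
            else:
                event.ignore()

        def dropEvent(self, event):
            if not event.mimeData().hasUrls():
                event.ignore()
                return
            event.setDropAction(Qt.CopyAction)
            event.accept()
            # toLocalFile() turns each URL into a plain file-system path.
            paths = [url.toLocalFile() for url in event.mimeData().urls()]
            self.addItems(docx_only(paths))

    app = QApplication(sys.argv)
    widget = DropList()
    widget.setWindowTitle("Drop .docx files here")
    widget.show()
    sys.exit(app.exec_())


if __name__ == "__main__":
    run_gui()
```

Each path collected in the list could then be handed to whichever converter is in use.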

There is also another way to convert DOCX files to PDF; it requires Microsoft Word to be installed on your system.

Install this first:

pip install comtypes


Fig.7. Another way to convert Docx file into PDF

How do I convert a DOCX to PDF in Python?

How to convert DOCX to PDF:
Install 'Aspose.Words for Python via .NET'.
Add a library reference (import the library) to your Python project.
Open the source DOCX file in Python.
Call the 'save()' method, passing an output filename with PDF extension.
Get the result of DOCX conversion as PDF.
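
These steps map onto just a few lines. The sketch below assumes the aspose-words package is installed; the function name `docx_to_pdf` and the file names are hypothetical.

```python
def docx_to_pdf(docx_path, pdf_path):
    """Open a DOCX file and save it as PDF with Aspose.Words."""
    if not pdf_path.lower().endswith(".pdf"):
        raise ValueError("output filename must end in .pdf")

    import aspose.words as aw  # imported lazily; needs the aspose-words package

    doc = aw.Document(docx_path)  # open the source DOCX
    doc.save(pdf_path)            # the .pdf extension selects the output format
    return pdf_path


if __name__ == "__main__":
    docx_to_pdf("input.docx", "output.pdf")  # hypothetical file names
```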

How do I convert to PDF in Python?

FPDF is a Python class that allows generating PDF files with Python code. It is free to use and it does not require any API keys.
Import the class FPDF from module fpdf.
Add a page.
Set the font.
Insert a cell and provide the text.
Save the pdf with “.pdf” extension.
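
A minimal sketch of these steps, assuming the fpdf package (or its successor, fpdf2) is installed. The helper names `with_pdf_extension` and `write_hello_pdf`, the sample text, and the output name are hypothetical.

```python
def with_pdf_extension(name):
    """Append .pdf unless the name already ends with it."""
    return name if name.lower().endswith(".pdf") else name + ".pdf"


def write_hello_pdf(name, text="Hello, world!"):
    """Create a one-page PDF containing a single line of text."""
    from fpdf import FPDF  # imported lazily; needs the fpdf (or fpdf2) package

    pdf = FPDF()
    pdf.add_page()                      # add a page
    pdf.set_font("Helvetica", size=12)  # set the font
    pdf.cell(0, 10, text)               # width 0 extends the cell to the right margin
    out = with_pdf_extension(name)
    pdf.output(out)                     # save with a .pdf extension
    return out


if __name__ == "__main__":
    write_hello_pdf("hello")  # hypothetical output name; writes hello.pdf
```

Note that this generates a new PDF from text; it does not convert an existing DOCX file.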

How do I convert DOCX file to PDF?

The Acrobat Word to PDF online tool lets you convert DOCX, DOC, RTF, and TXT files to PDF using a web browser on any operating system. Just drag and drop a file to convert it and save as PDF.

Can Python convert PDF text?

Powerful Python library allows converting PDF files to almost all TXT document formats.