Textract installation hints

Textract needs Poppler to extract text from PDFs.

Windows

  1. Download a binary release from github.com/oschwartz10612. This example uses Release-22.01.0-0.zip.
  2. Extract the archive Release-22.01.0-0.zip.
  3. Copy the folders from poppler-22.01.0\Library into C:\Program Files\Poppler.
  4. The directory structure should then look something like this:
C:\Program Files\Poppler
                        \bin
                        \include
                        \lib
                        \share
  5. Add C:\Program Files\Poppler\bin to your system PATH.
  6. Try it with a filecontent example rule.
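
Before trying a rule, it can help to confirm that the PATH change took effect. The following is a minimal sketch, not part of Textract itself: it assumes that pdftotext (one of the command-line tools shipped in Poppler's bin folder) is the executable that needs to be reachable, and the helper names find_poppler_tool and poppler_version are hypothetical, chosen only for this illustration.

```python
import shutil
import subprocess


def find_poppler_tool(name="pdftotext", path=None):
    """Return the full path of a Poppler tool found on PATH, or None.

    `name` defaults to pdftotext (an assumption about which tool is needed).
    `path` lets you test a specific search path instead of the system PATH.
    """
    return shutil.which(name, path=path)


def poppler_version(executable):
    """Run `<tool> -v` and return whatever version text it prints."""
    result = subprocess.run([executable, "-v"], capture_output=True, text=True)
    # The version banner may arrive on stderr or stdout, so check both.
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    tool = find_poppler_tool()
    if tool is None:
        print("pdftotext not found; check that the Poppler bin folder is on PATH")
    else:
        print("Found:", tool)
        print(poppler_version(tool))
```

Run the script from a newly opened terminal, since a terminal that was already open will not see the updated PATH. If it reports that the tool was not found, recheck step 5.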