Alright, here's a go at numbering lines in a PDF (or any other image format) without access to the source.

I wrote a little shell script that uses ImageMagick (at least version 6.6.9-4) to:

1. convert a given PDF into separate raster images, one per page, and split these into half pages,
2. shrink each half page to a width of one pixel (so, basically, take the horizontal average of every row),
3. turn this into a monochrome image with a given threshold (black = text, white = no text),
4. shrink every black run down to one pixel (= the middle of a text line),
5. output the result as text and pipe it to sed, which cleans it up and removes all the non-text rows,
6. write a txt file with the position of each line in units of 1/1000 of the text height.

findlines.sh:

```sh
# split each page of $1.pdf into a left and a right half: $1-0, $1-1, ...
convert "$1.pdf" -crop 50x100% png:"$1"
# write one txt file of line positions per half page
for f in "$1"-*; do
  convert "$f" -flatten -resize 1X1000! -black-threshold 99% -white-threshold 10% -negate -morphology Erode Diamond -morphology Thinning:-1 Skeleton -black-threshold 50% txt:- |
    sed -e '1d' -e '/#000000/d' -e 's/^[0-9]*,//' -e 's/:.*//' > "$f.txt"
done
```

Running the script takes about 1 second per page and produces one file per half page, basename-<number>.txt. The files alternate, starting with the left half of the first page: one holds the positions for the left line numbers, the next those for the right line numbers.

These files can then be read by pgfplotstable (at least v1.4) and used to typeset the line numbers on top of the imported PDF file.
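To see what the sed cleanup at the end of the pipeline does, here is a small sketch that feeds it a made-up excerpt of ImageMagick's `txt:` output. The sample rows and the helper name `clean_txt` are hypothetical, and your ImageMagick version may print different channel values or colour names. It assumes the cleanup drops the header, drops black rows, strips the x coordinate and keeps only the y coordinate in front of the colon.

```shell
#!/bin/sh
# Hypothetical excerpt of `convert ... txt:-` output for a 1x1000 image:
# a header line, then one "x,y: (values)  #hex  name" line per pixel.
sample='# ImageMagick pixel enumeration: 1,1000,255,gray
0,0: (0,0,0)  #000000  black
0,41: (255,255,255)  #FFFFFF  white
0,42: (0,0,0)  #000000  black
0,87: (255,255,255)  #FFFFFF  white
0,999: (0,0,0)  #000000  black'

# Hypothetical helper wrapping the cleanup:
#   1d            drop the header line
#   /#000000/d    drop black rows (no line centre there)
#   s/^[0-9]*,//  drop the x coordinate (always 0 for a 1-pixel-wide image)
#   s/:.*//       keep only the y coordinate, i.e. the position in 1/1000
clean_txt() {
    sed -e '1d' -e '/#000000/d' -e 's/^[0-9]*,//' -e 's/:.*//'
}

printf '%s\n' "$sample" | clean_txt   # prints 41 and 87
```

Each remaining line is a single number, which is what pgfplotstable then reads as one column.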
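The thresholding and thinning are done with ImageMagick morphology in the script. Purely to illustrate the idea, here is a plain awk stand-in (not what the script uses) that takes one already-averaged brightness value per row (0 = black, 255 = white), treats rows darker than a threshold as text and prints the middle row of each run of text rows. The function name `line_centres` and the default threshold of 128 are assumptions for this sketch.

```shell
#!/bin/sh
# Illustration only: awk stand-in for threshold + thinning.
# Input: one brightness value per row. Output: 0-based row index of the
# middle of each run of rows darker than the threshold ($1, default 128).
line_centres() {
    awk -v t="${1:-128}" '
        BEGIN { t += 0 }                          # make the threshold numeric
        { dark = ($1 + 0 < t) }                   # is this row "text"?
        dark && !inrun { start = NR - 1; inrun = 1 }
        !dark && inrun { print int((start + NR - 2) / 2); inrun = 0 }
        END { if (inrun) print int((start + NR - 1) / 2) }  # run at the bottom
    '
}

printf '255\n255\n100\n90\n80\n255\n255\n50\n255\n' | line_centres 128   # prints 3 and 7
```

Rows 2 to 4 form one text line (centre 3) and row 7 another, which mirrors what the skeleton step leaves behind as single white pixels.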