  1. #1
    Join Date
    Mar 2023
    Posts
    0

    SW to scan pdfs and generate report

    Hi all.

    I am trying to find out whether any existing software can scan PDF files and then generate reports based on the findings. If nothing already exists, I have no issue hiring someone to do this.

    Does anyone know of anything? If this is too vague, I can provide some more details if need be.

    Thanks in advance.

  2. #2
    Join Date
    Nov 2018
    Posts
    120

    Re: SW to scan pdfs and generate report

    What operating system are you using?
    What are you looking for in the PDFs? text, images, tables, urls?
    Are the PDFs machine generated (like the output of Word), or bundles of fuzzy images that need to be fed through OCR first?
    What format are the reports in?

    https://en.wikipedia.org/wiki/Pdftotext

  3. #3
    Join Date
    Mar 2023
    Posts
    0

    Re: SW to scan pdfs and generate report

    Thank you for the reply.

    Primarily using W7, but I have a W11 laptop I also use.

    Looking for text in the PDFs

    The PDFs are machine generated, I think.... the text is "raised" (for lack of a better way of putting it), and can be selected, copied and pasted.

    The created reports can be in any format... .doc .pdf .xlsx or other

    Cheers.

  4. #4
    Join Date
    Nov 2018
    Posts
    120

    Re: SW to scan pdfs and generate report

    The pdftotext program I linked should get you as far as generating a .txt file.
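    For a whole folder of PDFs, the conversion step could be scripted. The following is a minimal Python sketch, assuming the pdftotext command-line tool (shipped with Xpdf and Poppler) is installed and on PATH. The function names and folder arguments are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Build the argument list for the pdftotext command-line tool."""
    command = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical page layout,
        # which tends to help when the PDF contains tables.
        command.append("-layout")
    command += [str(pdf_path), str(txt_path)]
    return command


def convert_folder(pdf_dir, txt_dir):
    """Convert every *.pdf in pdf_dir to a same-named .txt file in txt_dir."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    txt_dir = Path(txt_dir)
    txt_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        txt_path = txt_dir / (pdf_path.stem + ".txt")
        # check=True raises CalledProcessError if pdftotext reports a failure.
        subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
        written.append(txt_path)
    return written
```

    Calling convert_folder("pdfs", "text") would leave one .txt file per PDF in the text folder, ready for whatever scanning comes next.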

  5. #5
    Join Date
    Mar 2023
    Posts
    0

    Re: SW to scan pdfs and generate report

    I saw that, thank you. So, if I were to do that, I would still be in the same boat; the question would then be, is there any existing software to scan the text files, and then generate reports based on the findings...

  6. #6
    Join Date
    Nov 2018
    Posts
    120

    Re: SW to scan pdfs and generate report

    You'd be in the boat with the plain text, which you can then do anything you like with.

    Word will read text files just fine, which you can then edit and format to your heart's desire, and save as a Word document.

  7. #7
    Join Date
    Mar 2023
    Posts
    0

    Re: SW to scan pdfs and generate report

    Right - but then I would need to auto-scan the plain text, and then generate a report automatically based on the findings.

    Is there a way to do this?

  8. #8
    Join Date
    Dec 2012
    Location
    England
    Posts
    7,822

    Re: SW to scan pdfs and generate report

    If you have a plain text file, then you can use Word (or similar) with macros, VBA etc. to do whatever you want. You can also write a program, in almost any programming language you know or want, to process the text file as required.
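    As an illustration of the "write a program" route, here is a minimal Python sketch that scans every .txt file in a folder for lines matching some patterns and writes the findings to a CSV file, which Excel will open. The patterns, labels and file names are assumptions; the real rules depend on what you are looking for.

```python
import csv
import re
from pathlib import Path


def scan_text(text, patterns):
    """Return one (label, line_number, line) tuple per matching line."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in patterns.items():
            if re.search(pattern, line, flags=re.IGNORECASE):
                findings.append((label, line_number, line.strip()))
    return findings


def write_report(txt_dir, report_path, patterns):
    """Scan every *.txt file in txt_dir and write all findings to one CSV."""
    with open(report_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "label", "line", "text"])
        for txt_path in sorted(Path(txt_dir).glob("*.txt")):
            # errors="replace" keeps the scan going if a file has odd bytes.
            text = txt_path.read_text(encoding="utf-8", errors="replace")
            for label, line_number, line in scan_text(text, patterns):
                writer.writerow([txt_path.name, label, line_number, line])
```

    For example, write_report("text", "report.csv", {"total": r"\btotal\b"}) would list every line containing the word "total", along with the file and line number it came from.
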
    All advice is offered in good faith only. All my code is tested (unless stated explicitly otherwise) with the latest version of Microsoft Visual Studio (using the supported features of the latest standard) and is offered as examples only - not as production quality. I cannot offer advice regarding any other c/c++ compiler/IDE or incompatibilities with VS. You are ultimately responsible for the effects of your programs and the integrity of the machines they run on. Anything I post, code snippets, advice, etc is licensed as Public Domain https://creativecommons.org/publicdomain/zero/1.0/ and can be used without reference or acknowledgement. Also note that I only provide advice and guidance via the forums - and not via private messages!

    C++23 Compiler: Microsoft VS2022 (17.6.5)

  9. #9
    Join Date
    Nov 2018
    Posts
    120

    Re: SW to scan pdfs and generate report

    > If nothing already exists, I have no issue hiring someone to do this.
    ...
    > Right - but then I would need to auto-scan the plain text, and then generate a report automatically based on the findings.
    > Is there a way to do this?
    Well, if you have zero programming knowledge, I guess the next step is to post in https://forums.codeguru.com/forumdis...sitions-(Jobs)

    Without a specific example of a representative(*) input PDF and the corresponding output document, all you're going to get is a hand-wavy "sure, it's possible".
    You'll also need to describe the rules for how you choose bits of the input PDF to begin with.

    E.g.
    "Each input PDF contains a table of itemised expenses and a table of itemised sales for each salesperson at a particular sales office.
    At the bottom of each table is a total.
    I want to extract those two totals and output to a new document in the form
    Last Name, First Name, salestotal, expensestotal"
    Basically, write down the steps you do manually.

    With that people can start to judge how big a job it might be for them.

    (*) representative:
    If your actual reports contain real names, then make a PDF with fictional names like "Fred Flintstone" and post that.
    It also needs to cover the majority of cases that you want to deal with, like say
    - people with hyphenated names
    - people with non-ASCII characters in their name
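
    To give a feel for the size of such a job, here is a minimal Python sketch of the sales/expenses example above. Everything about the input layout is an assumption: it supposes the extracted text has a "Salesperson: Last, First" line, section headings "Sales" and "Expenses", and a "Total" line at the bottom of each table. Real PDFs will differ, which is exactly why a representative sample is needed.

```python
import re

# Hypothetical layout: "Salesperson: Last, First" on a line of its own.
NAME_RE = re.compile(
    r"^Salesperson:[ \t]*(?P<last>[^,\n]+),[ \t]*(?P<first>.+?)[ \t]*$",
    re.MULTILINE,
)
SECTION_RE = re.compile(r"^(Sales|Expenses)$")
TOTAL_RE = re.compile(r"^Total\s+([\d,]+\.\d{2})$")

# Made-up sample text, including a hyphenated and a non-ASCII name.
SAMPLE = """\
Salesperson: Flintstone-Rubble, Zoë
Sales
Widget        1,000.00
Gadget          234.50
Total         1,234.50
Expenses
Fuel            200.00
Total           200.00
"""


def extract_totals(text):
    """Return 'Last Name, First Name, salestotal, expensestotal' for one report."""
    name = NAME_RE.search(text)
    if name is None:
        raise ValueError("no 'Salesperson:' line found")
    totals = {}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        header = SECTION_RE.match(stripped)
        if header:
            section = header.group(1)
            continue
        total = TOTAL_RE.match(stripped)
        if total and section is not None:
            # Drop thousands separators so the number is easy to reuse.
            totals[section] = total.group(1).replace(",", "")
    missing = {"Sales", "Expenses"} - totals.keys()
    if missing:
        raise ValueError(f"no total found for: {sorted(missing)}")
    return f"{name['last'].strip()}, {name['first']}, {totals['Sales']}, {totals['Expenses']}"


print(extract_totals(SAMPLE))  # Flintstone-Rubble, Zoë, 1234.50, 200.00
```

    The parsing itself is short; most of the effort in a real job goes into the cases the sample does not show, such as tables split across pages or totals labelled differently.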
