  1. #1
    Join Date
    Oct 2001
    Location
    lake of fire and brimstone
    Posts
    1,628

    [RESOLVED] Extracting time stamps from pdf

    Hi, I am trying to automatically extract all time stamps in a pdf file. These are typically in a line like:

    when="2010-07-30T15:20:30+04:00"

    For this I was thinking of using CStdioFile and its ReadString function, but somehow this doesn't work. My example code is below. Is this because a pdf is not a true text file, because the strings read can be longer than some maximum, or something else? What is my mistake, or does anyone have another quick way of reading the file and extracting the desired text between the quotes?

    Code:
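    CStdioFile InputFile;

    if (InputFile.Open(FileName, CFile::modeRead))
    {
        CString Line;
        const CString ToFind = _T("when");

        // ReadString returns FALSE once no more data can be read
        while (InputFile.ReadString(Line))
        {
            if (Line.Find(ToFind) != -1)
            {
                // Field 0 is the text before the first quote (when=); field 1 is
                // the quoted value, assuming it is the first quoted text on the line.
                CString Item;
                AfxExtractSubString(Item, Line, 1, _T('"'));
                AfxMessageBox(Item); // to be replaced with further processing
            }
        }

        InputFile.Close();
    }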
    Last edited by Simon666; April 26th, 2013 at 07:41 AM.

  2. #2
    VictorN's Avatar
    VictorN is offline Super Moderator Power Poster
    Join Date
    Jan 2003
    Location
    Hanover Germany
    Posts
    20,396

    Re: Extracting time stamps from pdf

    Well, open your .pdf file in Notepad and search for the text "when". Will it be found?
    Victor Nijegorodov

  3. #3
    Join Date
    Oct 2001
    Location
    lake of fire and brimstone
    Posts
    1,628

    Re: Extracting time stamps from pdf

    Quote Originally Posted by VictorN View Post
    Well, open your .pdf file in Notepad and search for the text "when". Will it be found?
    Already did that. What if there are a gazillion instances? I want to automate it. I thought the ten lines above could do it. Obviously I was wrong.

  4. #4
    VictorN's Avatar
    VictorN is offline Super Moderator Power Poster
    Join Date
    Jan 2003
    Location
    Hanover Germany
    Posts
    20,396

    Re: Extracting time stamps from pdf

    Quote Originally Posted by Simon666 View Post
    Already did that. What if there are a gazillion instances? I want to automate it. I thought the ten lines above could do it. Obviously I was wrong.
    Did what? Did you open your .pdf file (meaning a file that definitely contains the text you are looking for) in Notepad?
    If you did, what was the search result?
    Victor Nijegorodov

  5. #5
    Join Date
    Aug 2000
    Location
    West Virginia
    Posts
    7,721

    Re: Extracting time stamps from pdf

    I don't use CStdioFile, but in general, if you open a file up
    in text mode and read while looping, you can come across
    EOF early. Try opening in binary mode.
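For comparison outside MFC, here is a minimal sketch of reading the whole file in binary mode with the standard library. It is not the code from this thread: the helper name readWholeFile is made up for illustration, and it assumes the file is small enough to hold in memory. In binary mode the stream performs no text translation, so bytes such as 0x1A (which a Windows text-mode read can treat as end of file) or 0x00 come through unchanged.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: read an entire file into memory with no text-mode
// translation. std::ios::binary keeps bytes such as 0x1A and 0x00 as-is.
std::string readWholeFile(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return std::string();   // could not open: return an empty buffer

    // Copy every byte from the stream buffer into the string.
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

The returned std::string carries its own length, so the buffer can be searched afterwards even if it contains NUL bytes.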

  6. #6
    Join Date
    Oct 2001
    Location
    lake of fire and brimstone
    Posts
    1,628

    Re: Extracting time stamps from pdf

    InputFile.Open(FileName, CFile::modeRead | CFile::typeBinary)

    This lets me read a lot further, but it still exits way before the actual end of the file.

  7. #7
    Join Date
    Aug 2000
    Location
    West Virginia
    Posts
    7,721

    Re: Extracting time stamps from pdf

    Some things to consider:

    1) I do not think that CString::Find() will work if there are embedded NULs in the
    string before the time stamp.

    2) one "line" could contain multiple time stamps
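Both points can be sidestepped by searching a length-aware buffer rather than NUL-terminated lines. The following is only a sketch under assumptions: the function name extractWhenValues is hypothetical, the data is assumed to be single-byte text already loaded into a std::string, and every stamp is assumed to have the exact form when="...".

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collect every value that appears as when="..." in a
// raw byte buffer. std::string tracks its own length, so embedded NUL bytes
// do not end the search, and the loop picks up several stamps per "line".
std::vector<std::string> extractWhenValues(const std::string& data)
{
    const std::string key = "when=\"";
    std::vector<std::string> values;

    std::size_t pos = 0;
    while ((pos = data.find(key, pos)) != std::string::npos)
    {
        const std::size_t start = pos + key.size();
        const std::size_t end = data.find('"', start);
        if (end == std::string::npos)
            break;                       // unterminated value: stop scanning
        values.push_back(data.substr(start, end - start));
        pos = end + 1;                   // continue after the closing quote
    }
    return values;
}
```

Because the search restarts just after each closing quote, line boundaries play no role at all, which also avoids any limit on line length.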

  8. #8
    2kaud's Avatar
    2kaud is online now Super Moderator Power Poster
    Join Date
    Dec 2012
    Location
    England
    Posts
    7,822

    Re: Extracting time stamps from pdf

    I've looked at several .pdf files and none of them contain "when". The only time-related data I could find was /CreationDate, e.g. /CreationDate (D:20060208110100)
    All advice is offered in good faith only. All my code is tested (unless stated explicitly otherwise) with the latest version of Microsoft Visual Studio (using the supported features of the latest standard) and is offered as examples only - not as production quality. I cannot offer advice regarding any other c/c++ compiler/IDE or incompatibilities with VS. You are ultimately responsible for the effects of your programs and the integrity of the machines they run on. Anything I post, code snippets, advice, etc is licensed as Public Domain https://creativecommons.org/publicdomain/zero/1.0/ and can be used without reference or acknowledgement. Also note that I only provide advice and guidance via the forums - and not via private messages!

    C++23 Compiler: Microsoft VS2022 (17.6.5)

  9. #9
    VictorN's Avatar
    VictorN is offline Super Moderator Power Poster
    Join Date
    Jan 2003
    Location
    Hanover Germany
    Posts
    20,396

    Re: Extracting time stamps from pdf

    Quote Originally Posted by 2kaud View Post
    I've looked at several .pdf files and none of them contain "when". The only time-related data I could find was /CreationDate, e.g. /CreationDate (D:20060208110100)
    Yeah, it was the first thing I did before posting my first answer:
    Quote Originally Posted by VictorN View Post
    Well, open your .pdf file in Notepad and search for the text "when". Will it be found?
    Just because I couldn't find "when" (either as ANSI or UNICODE) in some of my .pdf files!
    However, the OP seems to ignore my opinion...
    Victor Nijegorodov

  10. #10
    Join Date
    Oct 2001
    Location
    lake of fire and brimstone
    Posts
    1,628

    Re: Extracting time stamps from pdf

    Quote Originally Posted by Philip Nicoletti View Post
    Some things to consider:

    1) I do not think that CString::Find() will work if there are embedded NULs in the
    string before the time stamp.

    2) one "line" could contain multiple time stamps
    1) I checked; that doesn't occur.
    2) Same.

    Quote Originally Posted by 2kaud View Post
    I've looked at several .pdf files and none of them contain "when". The only time-related data I could find was /CreationDate, e.g. /CreationDate (D:20060208110100)
    It is metadata of pictures from Adobe Photoshop.

    Quote Originally Posted by VictorN View Post
    Just because I couldn't find "when" (either as ANSI or UNICODE) in some of my .pdf files!
    However, the OP seems to ignore my opinion...
    Victor, I did not ignore you. I addressed that issue specifically. There are about 80 entries per pdf and 10 pdfs. I didn't want to go through all of the roughly 800 entries manually when roughly 10 lines of code could do it, but by now the time spent will be about equal.

  11. #11
    VictorN's Avatar
    VictorN is offline Super Moderator Power Poster
    Join Date
    Jan 2003
    Location
    Hanover Germany
    Posts
    20,396

    Re: Extracting time stamps from pdf

    Quote Originally Posted by Simon666 View Post
    Victor, I did not ignore you. I addressed that issue specifically. There are about 80 entries per pdf and 10 pdfs. I didn't want to go through all of the roughly 800 entries manually when roughly 10 lines of code could do it, but by now the time spent will be about equal.
    So could you find in notepad at least one of these "80 entries per pdf" or not?
    Victor Nijegorodov

  12. #12
    Join Date
    Oct 2001
    Location
    lake of fire and brimstone
    Posts
    1,628

    Re: Extracting time stamps from pdf

    Yes I could, why?

  13. #13
    VictorN's Avatar
    VictorN is offline Super Moderator Power Poster
    Join Date
    Jan 2003
    Location
    Hanover Germany
    Posts
    20,396

    Re: Extracting time stamps from pdf

    What "why"?
    Are they found as ANSI or UNICODE?
    Is your build ANSI or UNICODE?
    Victor Nijegorodov
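The ANSI-versus-UNICODE question can also be checked in code rather than by eye. The sketch below is an illustration only: the names asciiToUtf16LeBytes, Encoding and detectPatternEncoding are invented here, the pattern is assumed to be plain ASCII, and "UNICODE" is assumed to mean UTF-16 little-endian, as is usual on Windows. It searches the raw bytes for the pattern in both forms and reports which one is present.

```cpp
#include <cstddef>
#include <string>

// Widen an ASCII pattern to its UTF-16LE byte sequence: each character is
// followed by a zero byte (valid for plain ASCII patterns only).
std::string asciiToUtf16LeBytes(const std::string& ascii)
{
    std::string bytes;
    bytes.reserve(ascii.size() * 2);
    for (char c : ascii)
    {
        bytes.push_back(c);
        bytes.push_back('\0');
    }
    return bytes;
}

// How the pattern is stored in the buffer, if at all.
enum class Encoding { NotFound, SingleByte, Utf16Le };

// Look for the pattern as UTF-16LE bytes first, then as single-byte text.
Encoding detectPatternEncoding(const std::string& data, const std::string& pattern)
{
    if (data.find(asciiToUtf16LeBytes(pattern)) != std::string::npos)
        return Encoding::Utf16Le;
    if (data.find(pattern) != std::string::npos)
        return Encoding::SingleByte;
    return Encoding::NotFound;
}
```

If the stamps turn out to be stored as UTF-16LE, a single-byte search for "when" never matches, because every character in the file is separated by a zero byte.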

  14. #14
    Join Date
    Oct 2001
    Location
    lake of fire and brimstone
    Posts
    1,628

    Re: Extracting time stamps from pdf

    Anyway, I got an idea, I might use GetPosition and check if it is anywhere near the end of the file, if not just call ReadString again in a while loop. I'll first try that.

  15. #15
    Arjay's Avatar
    Arjay is offline Moderator / EX MS MVP Power Poster
    Join Date
    Aug 2004
    Posts
    13,490

    Re: Extracting time stamps from pdf

    Quote Originally Posted by Simon666 View Post
    Anyway, I got an idea, I might use GetPosition and check if it is anywhere near the end of the file, if not just call ReadString again in a while loop. I'll first try that.
    Or you could respond to Victor's questions.
