  1. #16
    Join Date
    Apr 1999
    Posts
    27,449

    Re: Extracting time stamps from pdf

    Originally posted by Simon666:
    Victor, I did not ignore you. I addressed that issue specifically. There are about 80 entries per PDF and 10 PDFs. I didn't want to do all of the roughly 800 entries manually when about 10 lines of code could do it, but by now the time spent is about equal.

    What is the significance of the file being a PDF?

    All you're really asking is "how to find text in a file quickly". It doesn't matter whether the file is a PDF or not. The only thing you need to know is whether the file can contain NULs or other control characters; if it can, open it in binary mode.

    Regards,

    Paul McKenzie
    Last edited by Paul McKenzie; April 26th, 2013 at 03:24 PM.
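
The "find text in a file" view can be sketched in standard C++ as follows. This is only a minimal illustration, not code from the thread: slurp and find_all are made-up helper names, and the approach assumes the file is small enough to hold in memory.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads every remaining byte of a stream into a std::string.
// The stream is assumed to have been opened with std::ios::binary, so that
// no newline translation alters the bytes on the way in.
std::string slurp(std::istream& in)
{
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Returns the byte offset of every occurrence of `token` in `data`.
// std::string carries its own length, so embedded NUL bytes do not stop the search.
std::vector<std::size_t> find_all(const std::string& data, const std::string& token)
{
    std::vector<std::size_t> offsets;
    if (token.empty())
        return offsets;   // an empty token would match at every position

    for (std::size_t pos = data.find(token); pos != std::string::npos;
         pos = data.find(token, pos + 1))
    {
        offsets.push_back(pos);
    }
    return offsets;
}
```

For a real file, the stream would be something like std::ifstream in("your pdf file", std::ios::binary), passed to slurp before calling find_all(data, "when").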

  2. #17
    Join Date
    Aug 2000
    Location
    West Virginia
    Posts
    7,721

    Re: Extracting time stamps from pdf

    Code:
    ToFind.Format("when");
    I don't have access to a VS compiler right now, but would
    that compile in a UNICODE build?
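
On the UNICODE question: in a UNICODE build CString is based on wchar_t, so Format expects a wide format string and a narrow literal such as "when" should be rejected by the compiler. The usual fix is to wrap the literal in _T(). The sketch below imitates that mechanism in portable C++ so it can be tried without MFC. DEMO_UNICODE, demo_tchar, DEMO_T, demo_tstring and make_token are made-up stand-ins, not part of MFC or <tchar.h>.

```cpp
#include <string>

// Hypothetical stand-ins for TCHAR and _T(): one macro picks the character
// type, the other turns a literal into the matching narrow or wide form.
#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
#define DEMO_T(s) L##s
#else
typedef char demo_tchar;
#define DEMO_T(s) s
#endif

typedef std::basic_string<demo_tchar> demo_tstring;

// Builds the search token with whichever character type the build selected.
// Writing demo_tstring("when") here instead would only compile in the narrow build.
demo_tstring make_token()
{
    return demo_tstring(DEMO_T("when"));
}
```

Compiling once with and once without -DDEMO_UNICODE shows the same source line working in both configurations, which is what _T("when") does for CString.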

  3. #18
    Join Date
    Oct 2001
    Location
    lake of fire and brimstone
    Posts
    1,628

    Re: Extracting time stamps from pdf

    Originally posted by VictorN:
    What "why"?
    Are they found as ANSI or UNICODE?
    Is your build ANSI or UNICODE?

    I use the Multi-Byte Character Set in my project configuration. As for the PDF, I have no clue how to find out.

  4. #19
    GCDEF
    Join Date
    Nov 2003
    Location
    Florida
    Posts
    12,635

    Re: Extracting time stamps from pdf

    I just opened a couple of PDF files in a binary editor. They appear to be a mixture of binary and text data, with lots of embedded nulls. Because of the binary data, you can't effectively use CString and CStdioFile to manipulate them.
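
Why embedded nulls matter can be shown with a small standard C++ sketch. It is an illustration only and does not reproduce what CString or CStdioFile do internally; the helper names are made up. It contrasts C-string functions, which stop at the first NUL byte, with std::string, which keeps an explicit length.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Length as a C-string API sees it: counting stops at the first NUL byte.
std::size_t c_string_length(const std::string& bytes)
{
    return std::strlen(bytes.c_str());
}

// Search that honours the stored length, so bytes after a NUL are still reachable.
bool contains_token(const std::string& bytes, const std::string& token)
{
    return bytes.find(token) != std::string::npos;
}

// The same search through C-string functions, which never look past the first NUL.
bool c_string_contains_token(const std::string& bytes, const char* token)
{
    return std::strstr(bytes.c_str(), token) != nullptr;
}
```

For a made-up 14-byte buffer such as std::string("%PDF\0\0when=\"x\"", 14), c_string_length reports 4 and c_string_contains_token misses "when", while contains_token finds it.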

  5. #20
    Join Date
    Aug 2000
    Location
    West Virginia
    Posts
    7,721

    Re: Extracting time stamps from pdf

    Can you run the following console code and see if the results
    are what you expect? Change the name of the input file
    and check the results.txt file when done.

    Code:
    #include <fstream>
    #include <string>
    #include <iostream>
    
    int main()
    {
        using namespace std;
    
        // binary mode: the PDF contains NULs and other control characters
        ifstream in("your pdf file",ios::binary);
    
        if (!in)
        {
            cout << "could not open file\n";
            cin.ignore(100,'\n');   // wait for Enter so the message stays visible
    
            return 1;
        }
    
        ofstream out("results.txt");
    
        string line;
    
        int count = 0;
    
        while (getline(in,line))
        {
            size_t start = 0;
            size_t pos;
    
            // look at every "when" on this line
            while ( (pos=line.find("when",start)) != string::npos )
            {
                // the value is the text between the next pair of double quotes
                size_t pos1 = line.find('"',pos);
                size_t pos2 = (pos1 != string::npos) ? line.find('"',pos1+1) : string::npos;
    
                if (pos1 != string::npos && pos2!=string::npos)
                {
                    ++count;
                    start = pos2 + 1;   // resume after the closing quote
                    out << line.substr(pos1+1,pos2-pos1-1) << "\n";
                }
                else
                {
                    // ill formed ... skip
                    start = pos + 1;
                }
            }
        }
    
        out << "number of occurrences = " << count << "\n";
        
        return 0;
    }
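
To check what the inner loop extracts without touching a file, the same logic can be pulled into a function that works on one in-memory line. This is a sketch under assumptions: extract_quoted_after is a made-up name, and the sample line in the note below is invented, not taken from a real PDF.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collects the text between the first pair of double quotes that follows
// each occurrence of `key` in `line`.
std::vector<std::string> extract_quoted_after(const std::string& line,
                                              const std::string& key)
{
    std::vector<std::string> values;
    if (key.empty())
        return values;   // an empty key would match everywhere

    std::size_t start = 0;
    std::size_t pos;
    while ((pos = line.find(key, start)) != std::string::npos)
    {
        const std::size_t open = line.find('"', pos);
        const std::size_t close =
            (open == std::string::npos) ? std::string::npos
                                        : line.find('"', open + 1);
        if (close == std::string::npos)
        {
            // no complete quoted value after this occurrence: skip it
            start = pos + 1;
            continue;
        }
        values.push_back(line.substr(open + 1, close - open - 1));
        start = close + 1;   // resume after the closing quote
    }
    return values;
}
```

Given the invented line x when="2013-04-26T15:24" y when="2013-04-26T18:28" when, the function returns the two quoted values and skips the trailing "when", which has no quoted value after it.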

  6. #21
    Join Date
    Oct 2001
    Location
    lake of fire and brimstone
    Posts
    1,628

    Re: Extracting time stamps from pdf

    Anyway, this did the trick:

    Code:
    CStdioFile InputFile;

    if (InputFile.Open(FileName,CFile::modeRead|CFile::typeBinary))
    {
        CString Line;
        CString ToFind(_T("when"));   // _T() lets this line compile in both MBCS and UNICODE builds

        ULONGLONG End = InputFile.SeekToEnd();

        InputFile.SeekToBegin();

        // keep going while a read succeeds or the file position is still short of the end
        while ( (InputFile.ReadString(Line)) || (InputFile.GetPosition()!=End) )
        {
            if (Line.Find(ToFind)!=-1)
            {
                //Further processing
            }
        }

        InputFile.Close();
    }

  7. #22
    Join Date
    Oct 2001
    Location
    lake of fire and brimstone
    Posts
    1,628

    Re: Extracting time stamps from pdf

    Originally posted by Philip Nicoletti:
    Can you run the following console code and see if the results
    are what you expect? Change the name of the input file
    and check the results.txt file when done.

    Wow, that works even better. Thanks a lot. There were 400+ occurrences in one file alone; I had seriously underestimated the number of occurrences and the manual work it would have taken. There are further time stamp entries with slightly different formatting, but I can handle things from here on. Thanks to everyone who weighed in.
    Last edited by Simon666; April 26th, 2013 at 06:28 PM.
