  1. #1
    Join Date
    Aug 2009
    Posts
    1

    pulling specific text from a notepad document

    Hi everyone,

    I have a folder full of notepad documents containing source code from a web page.

    I would like to pull certain text from the source code into an Excel file. The text I am trying to grab appears in virtually the same spot in every document; however, it is not always the same length.

    Here is a rough version of the code I'm working with. I've included the specific tags that surround the data I'm after (the <table> and <td> tags are not part of the specific tags; I just used them in the example code). These tags are consistent throughout all of my data. There are multiple products in each text file.

    The result I'm after is just an Excel spreadsheet with 4 columns: the product name, the price, the time sold and the date sold. I also need this to run on all the text documents in the folder. The text documents are named 1.txt, 2.txt, 3.txt, etc. For this, let's say they are all saved in the folder C:\webpages.

    Code:
    <html>
    <body>
    
    <table border="0" cellpadding="3px" cellspacing="3px">
    <tr>
    <td>
    	<span style="font-size: 1.1em; font-weight: bold;"><a href="tooth-brush.html">Tooth Brush</a>&nbsp;</span>
    </td>
    <td align="center">
    	<bold>05:00 PDT</bold>
    </td>
    <td>
    <td align="center">
    	<strong>$2.00</strong>
    </td>
    <td>
    	<div style="font-size: 0.7em;">08-06-2009</div>
    </td>
    </tr>
    </table>
    
    <table border="0" cellpadding="3px" cellspacing="3px">
    <tr>
    <td>
    	<span style="font-size: 1.1em; font-weight: bold;"><a href="cell-phone.html">Cell Phone</a>&nbsp;</span>
    </td>
    <td align="center">
    	<bold>06:02 PDT</bold>
    </td>
    <td>
    <td align="center">
    	<strong>$50.00</strong>
    </td>
    <td>
    	<div style="font-size: 0.7em;">08-06-2009</div>
    </td>
    </tr>
    </table>
    
    </body>
    </html>
    Thanks in advance
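A sketch for the "run on all the text documents in the folder" part, written in Python purely for illustration (the thread's own code is VB6). It assumes the files are named 1.txt, 2.txt, ... as described above and sorts them by their number, so that 10.txt comes after 9.txt instead of right after 1.txt. The function name numbered_txt_files is hypothetical.

```python
from pathlib import Path


def numbered_txt_files(folder):
    """Return the N.txt files in folder, sorted by N (1, 2, ..., 9, 10, 11).

    Hypothetical helper: assumes the naming scheme 1.txt, 2.txt, ...
    Files whose name is not a plain number (e.g. notes.txt) are skipped.
    """
    files = [p for p in Path(folder).glob("*.txt") if p.stem.isdecimal()]
    # Sort on the integer value of the stem, not on the string;
    # a plain string sort would put "10.txt" before "2.txt".
    return sorted(files, key=lambda p: int(p.stem))
```

Usage would be along the lines of: for path in numbered_txt_files(r"C:\webpages"): text = path.read_text(), then hand the text to whichever extraction approach is used.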

  2. #2
    Join Date
    Jan 2006
    Location
    Fox Lake, IL
    Posts
    15,007

    Re: pulling specific text from a notepad document

    What do you need help with? What have you done so far? I'd suggest that you read a few threads about splitting data; "Skin a Cat" is one that you'll find.
    David

    CodeGuru Article: Bound Controls are Evil-VB6
    2013 Samples: MS CODE Samples

    CodeGuru Reviewer
    2006 Dell CSP
    2006, 2007 & 2008 MVP Visual Basic

  3. #3
    Join Date
    Mar 2009
    Posts
    12

    Re: pulling specific text from a notepad document

    You can import the text file into Excel. After importing it, you can do a Find for your product and copy it to the required sheet.

  4. #4
    Join Date
    Jul 2006
    Location
    Germany
    Posts
    3,725

    Re: pulling specific text from a notepad document

    It seems the repeating and relevant parts of the files are these:
    Code:
    	<span style="font-size: 1.1em; font-weight: bold;"><a href="tooth-brush.html">Tooth Brush</a>&nbsp;</span>
    </td>
    <td align="center">
    	<bold>05:00 PDT</bold>
    </td>
    <td>
    <td align="center">
    	<strong>$2.00</strong>
    </td>
    <td>
    	<div style="font-size: 0.7em;">08-06-2009</div>
    What I'd do is:
    Read the complete file into a string buffer. There is no need to split it into lines.
    In a Do loop, move a pointer to the next occurrence of "<a href=", which marks the beginning of the relevant data within a block.
    Then move the pointer on to the next ">" and extract the data between there and the next "<", which gives you the product name.
    In much the same way, move on to "<bold>", which gives you the time (PDT),
    then to "<strong>" (or even the "$"), which gives you the price, and finally to the date.
    Loop until no more occurrences of "<a href=" are found.
    You can write the found data to a .csv file, which can be read by Excel.

    If you need more help using the string functions InStr() and Mid$() to find and extract the strings, come back here.
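The steps above can be sketched in Python, with str.find() playing the role of InStr() and string slicing the role of Mid$(). The markers are copied from the sample page in post #1; the names extract_products and MARKERS are hypothetical, and pages that deviate from the sample would need different markers. Each product comes back as a (name, time, price, date) tuple in page order.

```python
# Markers taken from the sample markup in post #1 (an assumption about
# what every page looks like). For each one, the value is the text
# between the end of the marker and the next "<".
MARKERS = [
    ">",                                 # end of <a href="..."> -> product name
    "<bold>",                            # -> time, e.g. 05:00 PDT
    "<strong>",                          # -> price, e.g. $2.00
    '<div style="font-size: 0.7em;">',   # -> date, e.g. 08-06-2009
]


def extract_products(text):
    """Scan the whole page text once, block by block, as post #4 describes."""
    products = []
    pos = text.find("<a href=")          # start of the first product block
    while pos != -1:
        fields = []
        for marker in MARKERS:
            start = text.find(marker, pos)
            if start == -1:
                return products          # incomplete block: stop scanning
            start += len(marker)
            end = text.find("<", start)
            if end == -1:
                return products
            fields.append(text[start:end].strip())
            pos = end                    # continue after the extracted value
        products.append(tuple(fields))
        pos = text.find("<a href=", pos)  # next product block, if any
    return products
```

Reading the file with open(path).read() and passing the string in mirrors the "read the complete file into a string buffer" step.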

  5. #5
    Join Date
    Apr 2009
    Posts
    394

    Re: pulling specific text from a notepad document

    Okay, here is a possible solution for you since these are actually HTML files...

    Start a new Standard EXE project and add a WebBrowser control (right-click the toolbox > Components > Microsoft Internet Controls > OK). Add it to your form and name it WB. Then add a reference via Project > References > Microsoft HTML Object Library > OK.

    Code:
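    ' Needs a WebBrowser control named WB on the form (Microsoft Internet Controls)
    ' and a reference to the Microsoft HTML Object Library.
    ' Example call: ListCells "C:\webpages\", "1", ".txt"
    Private Sub ListCells(ByVal SourcePath As String, ByVal SourceFileName As String, _
                          ByVal SourceExtension As String)
      Dim H As HTMLDocument, TD As Object, I As Object
      Dim TempFile As String
    
      ' Copy the .txt file to an .htm file so the control parses it as HTML
      TempFile = App.Path & "\" & SourceFileName & ".htm"
      FileCopy SourcePath & SourceFileName & SourceExtension, TempFile
    
      ' Load the page and wait until it has finished loading
      WB.Navigate TempFile
      Do While WB.ReadyState <> READYSTATE_COMPLETE
        DoEvents
      Loop
    
      Set H = WB.Document
    
      ' Print the text of every <td> cell
      Set TD = H.getElementsByTagName("td")
      For Each I In TD
        Debug.Print I.innerText
      Next I
    
      ' Unload the page, then delete the temporary copy
      WB.Navigate "about:blank"
      Kill TempFile
    End Sub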
    As for putting the information into Excel, there are plenty of tutorials and examples out there.



    Good Luck
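For comparison, here is a rough Python analogue of the getElementsByTagName("td") / innerText loop above, using the standard library's html.parser instead of the WebBrowser control. It is a sketch under assumptions: the class CellTextParser and the function td_texts are hypothetical, and the whitespace handling only approximates what a browser reports for innerText. Like an HTML parser in a browser, it treats a new <td> as closing an unclosed one, so the stray <td> in the sample markup shows up as an empty cell.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text inside each <td>, roughly like reading .innerText
    from every element returned by getElementsByTagName("td")."""

    def __init__(self):
        super().__init__()      # convert_charrefs=True turns &nbsp; into "\xa0"
        self.cells = []
        self._current = None    # text pieces of the open cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._flush()       # a new <td> also ends an unclosed one
            self._current = []

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self._flush()

    def handle_data(self, data):
        if self._current is not None:   # ignore text outside cells
            self._current.append(data)

    def _flush(self):
        if self._current is not None:
            text = "".join(self._current).replace("\xa0", " ")
            self.cells.append(" ".join(text.split()))  # collapse whitespace
            self._current = None


def td_texts(html):
    """Return the text of every <td> in document order."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return parser.cells
```

Because the cells come back as plain text in document order, the remaining work is the same as in the VB version: knowing which position in each table holds which value.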

  6. #6
    Join Date
    Jul 2006
    Location
    Germany
    Posts
    3,725

    Re: pulling specific text from a notepad document

    Well, that's a good one, actually.
    It is also straightforward, since it treats the text as what it is meant to be: elements within an HTML DOM structure. (I only handled it as plain text.)
    Still, while looping through all TD elements, you have to find the relevant text parts and extract them with some decent string functions.

  7. #7
    Join Date
    Apr 2009
    Posts
    394

    Re: pulling specific text from a notepad document

    No, not really. If all the files are structured like this one, then all the OP needs to do is keep track of which cell is being returned (1, 2, 3, 4): 4 TDs per table, always in the same structure.
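Keeping track of the cells by position could look like the Python sketch below, which also covers writing the .csv file for Excel. One caveat, assuming the markup from post #1: each table also contains a stray empty <td>, so a parser may report five cells per table rather than four. The sketch therefore drops empty cells before grouping, which assumes none of the four real values is ever blank. The function names and the CSV header are hypothetical; the columns follow the order the OP asked for (product, price, time, date).

```python
import csv


def rows_from_cells(cells, per_row=4):
    """Group cell texts into rows of per_row values, in page order.

    Empty cells (such as the stray <td> in the sample) are skipped first.
    """
    values = [c for c in cells if c.strip()]
    if len(values) % per_row:
        raise ValueError(f"{len(values)} non-empty cells is not a multiple of {per_row}")
    return [values[i:i + per_row] for i in range(0, len(values), per_row)]


def write_csv(rows, path):
    """Write (name, time, price, date) rows as Product, Price, Time, Date."""
    # utf-8-sig adds a BOM, which helps Excel recognise the file as UTF-8
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["Product", "Price", "Time", "Date"])
        for name, time, price, date in rows:
            writer.writerow([name, price, time, date])
```

Raising an error when the count does not divide evenly is a deliberate choice: it flags a page whose structure differs from the sample instead of silently shifting values into the wrong columns.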

  8. #8
    Join Date
    Jul 2006
    Location
    Germany
    Posts
    3,725

    Re: pulling specific text from a notepad document

    OK, I must admit I'm no expert in HTML.
    So the first TD of the list is this:
    Code:
    <td>
    	<span style="font-size: 1.1em; font-weight: bold;"><a href="tooth-brush.html">Tooth Brush</a>&nbsp;</span>
    </td>
    If only the words "Tooth Brush" are returned, then you are right, and you have described the simplest way to get to the information.
    I thought maybe all the other markup between <td> and </td> would be returned too, resulting in:
    <span style="font-size: 1.1em; font-weight: bold;"><a href="tooth-brush.html">Tooth Brush</a>&nbsp;</span>

  9. #9
    Join Date
    Apr 2009
    Posts
    394

    Re: pulling specific text from a notepad document

    Okay, I see what you were thinking, and no, that would be .innerHTML, not .innerText.
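To illustrate the difference with the first cell of the sample: .innerHTML returns the markup inside the <td>, while .innerText returns only its text. The Python sketch below approximates innerText by discarding the tags and decoding entities such as &nbsp;. The name approx_inner_text is hypothetical, and a real browser's innerText also takes layout and CSS into account, so treat this as a rough model only.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Keep only the character data; tags and attributes are dropped."""

    def __init__(self):
        super().__init__()   # convert_charrefs=True decodes &nbsp; and friends
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def approx_inner_text(inner_html):
    """Very rough stand-in for .innerText, given a cell's .innerHTML string."""
    parser = _TextOnly()
    parser.feed(inner_html)
    parser.close()
    # Collapse runs of whitespace (str.split() also treats "\xa0" as whitespace)
    return " ".join("".join(parser.parts).split())
```

Feeding it the span markup of the "Tooth Brush" cell shown above leaves just the product name.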

  10. #10
    Join Date
    Jul 2006
    Location
    Germany
    Posts
    3,725

    Re: pulling specific text from a notepad document

    I see. Thanks for making that clear.
    I think I have to learn more about the DOM and the possibilities of the WebBrowser control.

  11. #11
    Join Date
    Apr 2009
    Posts
    394

    Re: pulling specific text from a notepad document

    Well, if you are going to delve into these objects, I suggest you also look at the XML vX (3+) object, as it is similar to the DOM or HTML Object Library.



    Good Luck

  12. #12
    Join Date
    Jul 2006
    Location
    Germany
    Posts
    3,725

    Re: pulling specific text from a notepad document

    Not right away. But nevertheless, thanks for this hint.
