-
Finding strings in multiple files
Hi,
Can someone please show me, if possible, how to read several files, looking for a certain number?
For example, you might have:
text1.txt
text2.txt
text3.txt
One of these will contain a number, in this example 5. Is there a way of finding out which one of the files contains 5?
Thanks. If you have any questions, please ask.
-
Re: Finding strings in multiple files
If you look at some of the threads from today, you'll find most of your answer. All you have to do is look at more than one file, but that's simple enough.
Here are some relevant threads, all from today:
http://www.codeguru.com/forum/showthread.php?t=457987
http://www.codeguru.com/forum/showthread.php?t=457784
http://www.codeguru.com/forum/showthread.php?t=457952
-
Re: Finding strings in multiple files
It seems to me that you could store the file names in a string array and loop through them. Within that loop, you'd open each file, search for the text, and close the file. Searching could be one line at a time, or all at once. If the string being searched for is found, exit the loop.
-
Re: Finding strings in multiple files
Could you please give me some example code? I know how to read a .txt file, but I am new to arrays.
thanks
-
Re: Finding strings in multiple files
You do not even need an array. Just read each line using something like Line Input, or read the entire file into a variable, then use InStr() to search for the number. A result > 0 means it has been found.
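A minimal sketch of the whole-file variant, assuming the files are small enough to hold in memory. FileContains and FindFive are hypothetical names used only for illustration, and the three file names are the ones from the first post (they are resolved against the current directory unless full paths are given).

```vb
' Hypothetical helper: returns True if SearchText occurs anywhere in the file.
Public Function FileContains(ByVal FilePath As String, _
                             ByVal SearchText As String) As Boolean
    Dim intFile As Integer
    Dim strContents As String

    intFile = FreeFile
    Open FilePath For Binary Access Read As #intFile
    If LOF(intFile) > 0 Then
        ' Size the buffer to the file length, then read it in one call
        strContents = Space$(LOF(intFile))
        Get #intFile, , strContents
    End If
    Close #intFile

    ' InStr returns the 1-based position of the match, or 0 if not found
    FileContains = (InStr(1, strContents, SearchText) > 0)
End Function

' Example use: print the name of every file that contains "5"
Public Sub FindFive()
    Dim varName As Variant
    For Each varName In Array("text1.txt", "text2.txt", "text3.txt")
        If FileContains(CStr(varName), "5") Then Debug.Print varName
    Next varName
End Sub
```

Note that this is a plain substring search, so a file containing 15 or 50 would also count as containing 5.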
-
Re: Finding strings in multiple files
Something Like this?
Code:
Public Function SearchFiles(ByVal SearchString As String, ParamArray _
        FilePaths() As Variant) As Variant
    Dim strFilePathsWithString() As String
    Dim strFileContents As String 'To receive contents of file
    Dim strLineRead As String 'Used to read a line
    Dim intFileNum As Integer 'Used to reference open file
    Dim intCounter As Integer
    Dim i As Long
    'initialize counter
    intCounter = 0
    'loop through passed files
    For i = 0 To UBound(FilePaths)
        'clear contents left over from the previous file
        strFileContents = ""
        'set file number
        intFileNum = FreeFile
        'Open file at index i
        Open CStr(FilePaths(i)) For Input As #intFileNum
        'Loop through creating your full file string
        Do Until EOF(intFileNum)
            'get line from file
            Line Input #intFileNum, strLineRead
            'Add the line to your combined file contents, keeping the line break
            strFileContents = strFileContents & strLineRead & vbCrLf
        Loop
        'Close the file
        Close #intFileNum
        'Test to see if search string is in contents of file
        If InStr(strFileContents, SearchString) > 0 Then
            'resize strFilePathsWithString to hold one more entry
            ReDim Preserve strFilePathsWithString(intCounter)
            'add the path of the matching file to the array
            strFilePathsWithString(intCounter) = CStr(FilePaths(i))
            'increment the counter
            intCounter = intCounter + 1
        End If
    Next i
    'set function equal to created array of files with the searched-for string
    SearchFiles = strFilePathsWithString
End Function
There may be some errors in that; I have a hard time coding without my color-coded text editor, but I think this would be a good structure. In addition, there may be a more efficient way to do this; I am still kind of a noob myself, so one of the people with more experience may be able to clean it up.
-
Re: Finding strings in multiple files
Actually, if you are going to read line by line, the best thing to do is check each line for the search phrase and, if it is found, exit the loop and close the file. There is no need to save all the lines and then do the search unless you want to locate all the instances in the file.
Code:
filehandle = FreeFile
FoundIt = False
Open filename For Input As #filehandle
Do While Not EOF(filehandle)
    Line Input #filehandle, LineFromFile
    'InStr returns the match position, or 0 if not found
    If InStr(LineFromFile, SearchPhrase) > 0 Then
        FoundIt = True
        Exit Do
    End If
Loop
Close #filehandle
If FoundIt is False, move on to the next file and repeat the process, ideally using an outer loop with something like the above nested within.
-
Re: Finding strings in multiple files
then, in six months, they ask you to count how many number X in each file?
back to square one?
-
Re: Finding strings in multiple files
I like the line by line check with a break as well.
-
Re: Finding strings in multiple files
That's basically it VehementSoftware, though it isn't necessary to use Variants. It is always a good idea to avoid Variants whenever possible, as they are less efficient. In this case, a string array would work well.
-
Re: Finding strings in multiple files
Quote:
Originally Posted by dglienna
then, in six months, they ask you to count how many number X in each file?
back to square one?
No biggie, just a couple of minor tweaks. Change the one line from FoundIt=True to FoundIt=FoundIt+1 (with FoundIt declared as a numeric type such as Long), comment out the Exit Do, and now you have a counter for how many lines it occurs in. If it could occur more than once in a line, then another simple little loop around the InStr and FoundIt lines will check for all instances within the line.
However, if the intention was to find all instances of it within each specific file, then I would not use a line-by-line read, as this would not be very efficient and could be slow on a large file. Instead, I would read the entire file into memory in one shot and search the content.
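As a rough sketch of that "loop around the InStr" idea, the function below counts every non-overlapping occurrence of a phrase in a string, which could be a single line or the whole file contents. CountOccurrences and its parameter names are hypothetical, chosen only for this example.

```vb
' Hypothetical helper: counts non-overlapping occurrences of SearchPhrase.
Public Function CountOccurrences(ByVal Source As String, _
                                 ByVal SearchPhrase As String) As Long
    Dim lngPos As Long

    ' An empty phrase would match at every position, so treat it as zero hits
    If Len(SearchPhrase) = 0 Then Exit Function

    lngPos = InStr(1, Source, SearchPhrase)
    Do While lngPos > 0
        CountOccurrences = CountOccurrences + 1
        ' Resume the search just past the match that was found
        lngPos = InStr(lngPos + Len(SearchPhrase), Source, SearchPhrase)
    Loop
End Function
```

Called once per line and summed, or once on the whole file contents, this gives the total number of instances rather than just the number of lines that contain the phrase.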
-
Re: Finding strings in multiple files
Quote:
Originally Posted by DataMiser
Actually, if you are going to read line by line, the best thing to do is check each line for the search phrase and, if it is found, exit the loop and close the file. There is no need to save all the lines and then do the search unless you want to locate all the instances in the file.
Code:
filehandle = FreeFile
FoundIt = False
Open filename For Input As #filehandle
Do While Not EOF(filehandle)
    Line Input #filehandle, LineFromFile
    'InStr returns the match position, or 0 if not found
    If InStr(LineFromFile, SearchPhrase) > 0 Then
        FoundIt = True
        Exit Do
    End If
Loop
Close #filehandle
If FoundIt is False, move on to the next file and repeat the process, ideally using an outer loop with something like the above nested within.
Thank you very much for that. Unfortunately, I basically have a file like the following:
AAAAAA 1.1.1
AAAAAB 1.1.1
AAAAAC 1.1.1
AAAAAA 1.1.2
AAAAAB 1.1.2
AAAAAA 1.1.2
What I wish to do, though, is detect the newest time, in this example 1.1.2, and then find out how many different strings there are for it. For example, the newest time in my little example is 1.1.2, so I want it to say there were two different strings, AAAAAA and AAAAAB, without counting the second AAAAAA, resulting in a textbox on the .exe showing 2. Is this possible?
Thanks
-
Re: Finding strings in multiple files
Here's my very hastily thrown together solution. If you want anything explained, just ask. It only works for one file, but doubtless you can use the earlier advice to adapt it. It will also only work with the prototype as you gave it in your last post. It isn't a very good solution overall (feel free to laugh at my attempt at error checking), but it should get the job done!
I copied your sample data into a text file, then tried this with:
Code:
Debug.Print Join(Form1.GetLinesAtLastTime("C:\text.txt"), vbNewLine)
I used Join because the Debug window doesn't like arrays. It returned the correct data.
So, here we go!
Code:
Option Explicit
Private Type mTime
    p1 As Integer
    p2 As Integer
    p3 As Integer
End Type
'True when both times have identical parts
Private Function mTimeSame(ByRef mTime1 As mTime, ByRef mTime2 As mTime) As Boolean
    mTimeSame = ((mTime1.p1 = mTime2.p1) And (mTime1.p2 = mTime2.p2) And (mTime1.p3 = mTime2.p3))
End Function
'True when mTime1 is later than mTime2, comparing p1, then p2, then p3
Private Function mTimeOneIsBigger(ByRef mTime1 As mTime, ByRef mTime2 As mTime) As Boolean
    If mTime1.p1 <> mTime2.p1 Then
        mTimeOneIsBigger = (mTime1.p1 > mTime2.p1)
    ElseIf mTime1.p2 <> mTime2.p2 Then
        mTimeOneIsBigger = (mTime1.p2 > mTime2.p2)
    Else
        mTimeOneIsBigger = (mTime1.p3 > mTime2.p3)
    End If
End Function
Public Function GetLinesAtLastTime(ByVal FileName As String) As String()
    'strLine and LastTime avoid shadowing the built-in Line and Time keywords
    Dim mFile As Integer, strLine As String
    Dim LastTime As mTime, NewTime As mTime
    Dim Parts() As String, TimeParts() As String, Data As String
    'Prototype: AAAAAA 1.1.1
    mFile = FreeFile
    Open FileName For Input As #mFile
    Do While Not EOF(mFile)
        Line Input #mFile, strLine
        Parts = Split(strLine, " ")
        If UBound(Parts) = 1 Then
            TimeParts = Split(Parts(1), ".")
            If UBound(TimeParts) = 2 Then
                NewTime.p1 = CInt(TimeParts(0))
                NewTime.p2 = CInt(TimeParts(1))
                NewTime.p3 = CInt(TimeParts(2))
            Else
                Err.Raise 1342, "Some Code", "ASGEGWRH!"
            End If
        Else
            Err.Raise 1342, "Some Code", "ASGEGWRH!"
        End If
        If mTimeSame(LastTime, NewTime) Then
            'Same time as the newest so far: append this string
            Data = Data & ";" & Parts(0)
        ElseIf mTimeOneIsBigger(NewTime, LastTime) Then
            'Newer time found: start the list again
            LastTime = NewTime
            Data = Parts(0)
        End If
    Loop
    Close #mFile
    GetLinesAtLastTime = Split(Data, ";")
End Function
Hope that's of some help!
-
Re: Finding strings in multiple files
There is a problem with the line
'Private Function mTimeSame(ByRef mTime1 As mTime, ByRef mTime2 As mTime) As Boolean'
It comes up with the error 'User-defined type not defined'. Could you explain why?
thanks
-
Re: Finding strings in multiple files
Have you got that function in the same form / module as the definition of mTime? If you dump all of that code into one module, it should work.
The reason for the error is that the compiler couldn't find one of the types in the line. It won't be Boolean, as that is a core type which you couldn't program without if you tried, so it means it can't find the type mTime. The likely reason, as I sort of mentioned, is that if it's in a different form or module and declared as Private (as it was in my code, "Private Type mTime"), then it can't be used outside that form / module.
In fact, you can't have Types, Enums (enumerators), Declares and (if memory serves) constants as Public members of a form. That's why you'll find many projects have a module full of only them.
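To illustrate, here is a minimal sketch of such a module. The module name modTypes is an assumption; the point is only that the type lives in a standard (.bas) module and is declared Public, so code in any form can use it.

```vb
' modTypes.bas: a standard module (hypothetical name), not a form
Option Explicit

' Declared Public in a standard module, so every form and module
' in the project can declare variables and parameters As mTime
Public Type mTime
    p1 As Integer
    p2 As Integer
    p3 As Integer
End Type
```

With the type moved there, the "Private Type mTime" block would be deleted from the form, and the functions that take mTime parameters can stay wherever they are.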
-
Re: Finding strings in multiple files
Jeez, you make things tough...
Code:
Dim str() As String, st As String, x As Long
ReDim str(5)
str(0) = "AAAAAA 1.1.1"
str(1) = "AAAAAB 1.1.1"
str(2) = "AAAAAC 1.1.1"
str(3) = "AAAAAA 1.1.2"
str(4) = "AAAAAB 1.1.2"
str(5) = "AAAAAA 1.1.2"
'str() = Split(strBuff, vbCrLf)
' MsgBox "There are " & UBound(str) + 1 & " lines in the file"
Dim words() As String, y As Integer, max2 As String
max2 = "0.0.0"
For x = 0 To UBound(str)
    words() = Split(str(x), " ")
    For y = 0 To UBound(words)
        If y = 1 Then
            'Plain string comparison: only reliable while every
            'part of the time has the same number of digits
            If words(y) > max2 Then
                max2 = words(y)
                st = st & words(0) & " " & max2 & vbCrLf
            End If
        End If
    Next y
Next x
MsgBox st
-
Re: Finding strings in multiple files
My apologies - I had forgotten that . is considered < 1. If you have a separator which is greater than the numerical value, and ever hit a situation where the strings are of unequal length, things break down a bit...
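One way around the unequal-length problem is to compare the parts as numbers instead of comparing the strings. The following is only a sketch under the assumption that the times are dot-separated whole numbers, as in the sample data; CompareDotted is a hypothetical name.

```vb
' Hypothetical helper: compares two dotted times part by part, as numbers.
' Returns 1 if A is newer, -1 if B is newer, 0 if they are equal.
Public Function CompareDotted(ByVal A As String, ByVal B As String) As Long
    Dim partsA() As String, partsB() As String
    Dim i As Long, n As Long
    Dim valA As Long, valB As Long

    partsA = Split(A, ".")
    partsB = Split(B, ".")

    ' Walk as far as the longer of the two; missing parts count as 0
    n = UBound(partsA)
    If UBound(partsB) > n Then n = UBound(partsB)

    For i = 0 To n
        valA = 0
        valB = 0
        If i <= UBound(partsA) Then valA = CLng(Val(partsA(i)))
        If i <= UBound(partsB) Then valB = CLng(Val(partsB(i)))
        If valA <> valB Then
            If valA > valB Then CompareDotted = 1 Else CompareDotted = -1
            Exit Function
        End If
    Next i
    ' Falling out of the loop leaves the default return value of 0
End Function
```

For example, CompareDotted("1.1.10", "1.1.2") returns 1, whereas the string test "1.1.10" > "1.1.2" is False because the character "1" sorts before "2".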
-
Re: Finding strings in multiple files
Well, if the newer times are going to be at the end of the file, I think I'd start there. I'd also probably read the entire file into a string array, split on vbCrLf, unless the file was huge.
Code:
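Dim X() As String, Newest As String
Dim I As Long, Count As Long
X = Split("AAAAAA 1.1.1,AAAAAB 1.1.1,AAAAAC 1.1.1,AAAAAA 1.1.2,AAAAAB 1.1.2,AAAAAA 1.1.2", ",")
I = UBound(X)
'Take the time from the last line, including the leading space
Newest = Mid$(X(I), InStrRev(X(I), Chr$(32)))
'Keep only the lines that contain the newest time
X = Filter(X, Newest)
I = UBound(X)
Do
    Count = Count + 1
    'Remove every copy of the last entry, so each string is counted once
    X = Filter(X, X(I), False)
    I = UBound(X)
Loop While I >= 0
Debug.Print Count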
-
Re: Finding strings in multiple files
Any idea how many entries there might be for the newest time? If that isn't going to be too much, it may work to scan the file backwards until you hit an entry older than the last, then grab everything from that point up to the end.
-
Re: Finding strings in multiple files
Quote:
Originally Posted by Chrispy360
Thank you very much for that. Unfortunately, I basically have a file like the following:
AAAAAA 1.1.1
AAAAAB 1.1.1
AAAAAC 1.1.1
AAAAAA 1.1.2
AAAAAB 1.1.2
AAAAAA 1.1.2
What I wish to do, though, is detect the newest time, in this example 1.1.2, and then find out how many different strings there are for it. For example, the newest time in my little example is 1.1.2, so I want it to say there were two different strings, AAAAAA and AAAAAB, without counting the second AAAAAA, resulting in a textbox on the .exe showing 2. Is this possible?
Thanks
OK, but the first post said that you needed to search multiple files for a number and find out which file that number was in. What you are asking here is completely different.
In a later post you said the files are huge, so reading the whole file or reading line by line does not make much sense, as it would be very slow. You also mention something about the first 2 characters. So I have a few questions.
How often is data written to the file?
Do you need to find the newest entries for each prefix or just the newest entries?
Are all of the prefixes likely to be written daily?
About how much data would be written on a daily basis?
-
Re: Finding strings in multiple files
Quote:
Originally Posted by WizBang
Any idea how many entries there might be for the newest time? If that isn't going to be too much, it may work to scan the file backwards until you hit an entry older than the last, then grab everything from that point up to the end.
I'm thinking that it might be a good idea to do a binary read on the file: seek to somewhere near the end, read a block of data (say, maybe 1k) into memory, and then search that. This assumes, of course, that the data being looked for would always be found in the last 1k of data, and the block size should be adjusted to whatever size is likely to be needed.
Assuming this is an ongoing process that needs to be repeated, I would keep a separate file, perhaps an ini file, that contains the results from the last scan and the byte position of the last read.
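A rough sketch of the tail read, assuming the file uses vbCrLf line endings and that the interesting entries fit within the chosen block. ReadTail and its parameters are hypothetical names, and the 1024-byte default simply mirrors the 1k figure mentioned above.

```vb
' Hypothetical helper: returns roughly the last BlockSize bytes of a file,
' trimmed so that the result starts at the beginning of a line.
Public Function ReadTail(ByVal FilePath As String, _
                         Optional ByVal BlockSize As Long = 1024) As String
    Dim intFile As Integer
    Dim lngLength As Long, lngStart As Long, lngBreak As Long
    Dim strBlock As String

    intFile = FreeFile
    Open FilePath For Binary Access Read As #intFile
    lngLength = LOF(intFile)
    If lngLength > 0 Then
        ' Byte positions in Binary mode are 1-based
        lngStart = lngLength - BlockSize + 1
        If lngStart < 1 Then lngStart = 1
        ' Get fills the buffer, so its length decides how much is read
        strBlock = Space$(lngLength - lngStart + 1)
        Get #intFile, lngStart, strBlock
    End If
    Close #intFile

    ' When the read began mid-file, the first line is probably partial: drop it
    If lngStart > 1 Then
        lngBreak = InStr(strBlock, vbCrLf)
        If lngBreak > 0 Then strBlock = Mid$(strBlock, lngBreak + 2)
    End If

    ReadTail = strBlock
End Function
```

The returned text can then be passed to Split(..., vbCrLf) and searched like any other set of lines. Remembering the byte position of the last read between runs, as suggested above, would mean starting at that saved position instead of a fixed distance from the end.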
-
Re: Finding strings in multiple files
Yes DataMiser, that's basically what I had in mind for the searching. And in the case of an ongoing process, your suggestion to keep track of the byte position (and any other pertinent data) is a good one. Though we don't yet know if more than one timestamp is expected between each run.
I'm sure the OP can clear up these questions.
-
Re: Finding strings in multiple files
I have managed to work out a workaround for my problem, but have come across another, hopefully smaller, problem.
By opening the file for Input and a second file for Output, I can get the result I need, except there seems to be a limit on the size of file that VB can either read or write!
My original file has over 10,000 lines and each line contains about 120-140 characters, but VB will only copy the first 140 lines.
Is there a way of getting round this?
-
Re: Finding strings in multiple files
I am not aware of limits on file sizes in vb and have worked with files that are several megs in size. What code are you using?
-
Re: Finding strings in multiple files
There actually is a limit on file size (the same as in C++), at a little over 4 GB. It doesn't sound like that's the issue here, but GremlinSA has written an article on how to overcome the limitation.
-
Re: Finding strings in multiple files
Quote:
Originally Posted by WizBang
There actually is a limit on file size (the same as in C++), at a little over 4 GB. It doesn't sound like that's the issue here, but
GremlinSA has written an article on how to overcome the limitation.
That is also the file size limit on a FAT32 drive. I never thought about it being a limit in VB, but it makes sense; it must be using 4 bytes to hold the file size.
I agree, though, that it is not the issue here: 140 bytes x 10,000 is far short of 4 gigs, more like 1.4 megs.
-
Re: Finding strings in multiple files
WizBang -
Yeah, I forgot if ParamArrays could be declared a specific data type or just variant. I don't use ParamArrays that often, but felt one would be appropriate for this purpose. And I had the function return a variant since I was returning a full array, should I have used object or string?
-
Re: Finding strings in multiple files
Ignore the file size error statement. The limitation was caused by errors in the lines of text, not by file size. Sorry about that, I should have checked that first!
-
Re: Finding strings in multiple files
Quote:
Originally Posted by VehementSoftware
WizBang -
Yeah, I forgot if ParamArrays could be declared a specific data type or just variant. I don't use ParamArrays that often, but felt one would be appropriate for this purpose. And I had the function return a variant since I was returning a full array, should I have used object or string?
A function can both take and return an array (array parameters must be passed ByRef). In the case of a string array, the following would do both:
Code:
Private Function SearchFiles(ByVal SearchString As String, ByRef FilePaths() As String) As String()
I can honestly say I've never used ParamArray, and it seems that would require using a Variant.
-
Re: Finding strings in multiple files
Thanx wizbang, that is very helpful. Yeah, actually passing an array would be probably more effective. The calling procedure would look like this
Code:
Dim strFilePaths(2) As String
Dim strFindString As String
Dim strFoundPaths() As String
strFilePaths(0) = "path 1"
strFilePaths(1) = "path 2"
strFilePaths(2) = "path 3"
strFoundPaths = SearchFiles(strFindString, strFilePaths)
Or something similar...as opposed to
Code:
Dim strFile1 As String
Dim strFile2 As String
Dim strFile3 As String
Dim strFindString As String
Dim strFoundPaths() As String
strFile1 = "path 1"
strFile2 = "path 2"
strFile3 = "path 3"
strFoundPaths = SearchFiles(strFindString, strFile1, strFile2, strFile3)
I don't know if what I was trying to do makes sense there, but the ParamArray lets you list an unknown number of parameters. So really it would depend on how the file paths are stored in the calling procedure as to what the best function header would be, even though your way would overall be much more effective.
I've personally only used the ParamArray in an error logging object I created. I had to use it so I could pass all variable names and values through to the method that writes the error logs; other than that, I have never thought of another purpose for it.