-
March 20th, 2013, 03:04 PM
#1
Quickest way to determine if an array element exists in a string
I've been looking for most of the day so far and can find plenty of ways to determine whether a string is present in an array element, but that's not what I'm looking for. I need the reverse: whether any array element is present in a string.
I have a string array that contains authors' names, about 175,000 of them.
I have a string that contains (possibly) a book title, series and author name, though not necessarily in that order.
Currently I'm taking the brute-force approach (code follows), but there has to be a better way, and I'm too close to the problem to see it. Since I consult this array constantly, almost 35% of my run time is spent in this code, so any improvement would help.
Code:
Public Function FindAuthors(ByVal file As String) As String
    ' Getting the same name twice (or more?) possibly if the name appears in
    ' different forms on the same title? -- Fix 1
    If file = "" Then Return ""
    ' If the author is "*", it's a special case: possibly no author, or a magazine...
    If file.Substring(0, 1) = "*" Then Return file
    Dim loc As Integer
    Dim strTemp As String = CleanInput(file) ' regex: remove all except alphanumerics and white space
    Dim strTempAuthor As String = Nothing
    Dim myOptions As StringComparison = StringComparison.CurrentCultureIgnoreCase
    ' TODO: need to be able to eliminate names which are partials, eve adam <-> steve adams
    Dim strAuthorsOut As String = Nothing
    ' Brute force: one substring search of the title per dictionary entry.
    For Each kvp As KeyValuePair(Of String, Integer) In AuthorSearchDict
        If kvp.Key <> "" Then
            loc = strTemp.IndexOf(kvp.Key)
            If loc > -1 Then
                ' kvp.Value is the index into AuthorDict; pick first-name-first or last-name-first
                strTempAuthor = IIf(cbFNF.Checked, AuthorDict(kvp.Value).FNL, AuthorDict(kvp.Value).LNF).ToString
                If strAuthorsOut = Nothing Then
                    strAuthorsOut &= strTempAuthor & "; "
                ElseIf strAuthorsOut.IndexOf(strTempAuthor, myOptions) = -1 Then 'fix 1
                    strAuthorsOut &= strTempAuthor & "; "
                End If
            End If
        End If
    Next
    ' Drop the trailing "; "
    If Len(strAuthorsOut) > 0 Then strAuthorsOut = Microsoft.VisualBasic.Left(strAuthorsOut, Len(strAuthorsOut) - 2)
    Return strAuthorsOut
End Function
Any ideas, short of continuing with brute force?
-
March 20th, 2013, 09:18 PM
#2
I should have noted that this is being done in VB 2005.
What gets searched is a dictionary (not an array, though I treat it much like one); each entry's value (kvp.Value) is the index of the desired name in the master array (another dictionary).
-
March 20th, 2013, 10:07 PM
#3
I'd use SQL Server to store and search the data.
-
March 21st, 2013, 09:26 AM
#4
Remember, David, I'm looking to see whether any (one or more) element of the dictionary is in the string. A fellow who uses SQL daily said the same thing, but when I passed him the data he couldn't do it.
My last database work was back on an Hp300, and I haven't had the opportunity to learn or use SQL. If you feel it could be done, can you offer some pointers as to how?
-
March 21st, 2013, 03:20 PM
#5
or break names down into firstname/lastname as separate lists, and compare once.
-
March 21st, 2013, 03:24 PM
#6
Originally Posted by dglienna
or break names down into firstname/lastname as separate lists, and compare once.
Respectfully, David, either you don't understand what I'm trying to do or I'm too dense to understand what you're suggesting. As I see it, your one-line comments are worth nothing. //al
-
March 22nd, 2013, 04:45 PM
#7
What are the minimum and maximum numbers of "words" in the AuthorSearchDict keys (FirstName, MiddleName, LastName, etc.)? Maybe 2, 3 or 4? Take your input string and build sequential "names" from it. Say the minimum is two words (at least a first and last name) and the maximum is four. Then "Call of the Wild by Jack London" becomes these possible "names":
Call of
of the
the Wild
Wild by
by Jack
Jack London
Call of the
of the Wild
the Wild by
Wild by Jack
by Jack London
Call of the Wild
of the Wild by
the Wild by Jack
Wild by Jack London
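Then call Dictionary.ContainsKey on each of these possibilities. I don't know whether it's faster, but it is different. Because only whole words are compared, it would also fix your TODO about partial names ("eve adam" inside "steve adams").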
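A minimal sketch of the word-window idea above, in VB.NET to match the thread and VB 2005 compatible. It assumes the search keys are stored lower-case in a Dictionary(Of String, Integer) (as the later `.ToLower` lookup suggests) and that words are separated by single spaces. `FindNameWindows`, `minWords` and `maxWords` are hypothetical names, not code from the thread.

```vbnet
Imports System
Imports System.Collections.Generic

Module NameWindowDemo

    ' Hypothetical helper: returns each run of minWords..maxWords consecutive
    ' words in text that is a key of searchDict (keys assumed lower-case).
    Function FindNameWindows(ByVal text As String, _
                             ByVal searchDict As Dictionary(Of String, Integer), _
                             ByVal minWords As Integer, _
                             ByVal maxWords As Integer) As List(Of String)
        Dim found As New List(Of String)
        Dim words As String() = text.ToLower().Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
        Dim candidate As String

        ' One hash lookup per window, so the cost grows with the title length
        ' rather than with the number of dictionary entries.
        For size As Integer = minWords To Math.Min(maxWords, words.Length)
            For start As Integer = 0 To words.Length - size
                candidate = String.Join(" ", words, start, size)
                If searchDict.ContainsKey(candidate) AndAlso Not found.Contains(candidate) Then
                    found.Add(candidate)
                End If
            Next
        Next
        Return found
    End Function

    Sub Main()
        Dim dict As New Dictionary(Of String, Integer)
        dict.Add("jack london", 0)
        dict.Add("eve adam", 1)

        ' Reports "jack london".
        For Each hit As String In FindNameWindows("Call of the Wild by Jack London", dict, 2, 4)
            Console.WriteLine(hit)
        Next

        ' Whole words only: "eve adam" is not reported inside "Steve Adams",
        ' although a substring IndexOf search would have matched it.
        Console.WriteLine(FindNameWindows("A Tale by Steve Adams", dict, 2, 4).Count)
    End Sub

End Module
```

For the seven-word example with a 2 to 4 word window, that is 6 + 5 + 4 = 15 ContainsKey calls, exactly the list above. If some keys are much longer, `maxWords` could be the word count of the longest key, computed once when the dictionary is loaded.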
-
March 22nd, 2013, 05:35 PM
#8
Mur16, now that's an interesting thought... Hmmm, he says... this sequential search is taking entirely too long.
The minimum is two words, but some names are considerably longer. Most book titles, even long ones, could be broken down easily, and ContainsKey has got to be faster than my sequential scan... rambling...
That is a very interesting thought, much appreciated... as he wanders off to consider the ramifications...
Just to explain the odd logic already in place ... Authors.txt looks like:
<author lnf>, <author fnl>, <author name variations> for example:
le Carre, John, John le Carre, Carre, John le, John leCarre
These are extracted into a two-dimensional Author array of:
<author lnf>, <author fnl>
and a dict AuthorSearch of:
<name>, <Author Index>
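A minimal sketch of filling the two structures described above while also recording how many words the longest search key has (useful as the upper bound of the word window). It assumes each Authors.txt line has already been split into its LNF form, its FNL form and an array of variations; because "le Carre, John" itself contains a comma, that split depends on the file and is not shown. `AddAuthor`, `AuthorLnf`, `AuthorFnl` and `MaxNameWords` are hypothetical names; `AuthorSearch` mirrors the thread's dict.

```vbnet
Imports System
Imports System.Collections.Generic

Module AuthorIndexDemo

    ' Hypothetical stand-ins for the thread's Author array and AuthorSearch dict.
    Public AuthorLnf As New List(Of String)
    Public AuthorFnl As New List(Of String)
    Public AuthorSearch As New Dictionary(Of String, Integer)
    Public MaxNameWords As Integer = 0

    ' Registers one author: the canonical forms go into the author lists, and the
    ' FNL form plus every variation maps (lower-case) to the author's index.
    Sub AddAuthor(ByVal lnf As String, ByVal fnl As String, ByVal variations As String())
        Dim index As Integer = AuthorLnf.Count
        AuthorLnf.Add(lnf)
        AuthorFnl.Add(fnl)

        Dim forms As New List(Of String)(variations)
        forms.Add(fnl)
        For Each form As String In forms
            Dim key As String = form.Trim().ToLower()
            If key <> "" AndAlso Not AuthorSearch.ContainsKey(key) Then
                AuthorSearch.Add(key, index)
                ' Track the longest key so the word-window search knows where to stop.
                Dim wordCount As Integer = key.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries).Length
                If wordCount > MaxNameWords Then MaxNameWords = wordCount
            End If
        Next
    End Sub

    Sub Main()
        AddAuthor("le Carre, John", "John le Carre", New String() {"John leCarre"})
        Console.WriteLine(AuthorFnl(AuthorSearch("john lecarre")))  ' John le Carre
        Console.WriteLine(MaxNameWords)                             ' 3
    End Sub

End Module
```

Any variation, however it is spelled, resolves to the same index, so a title that mentions the author in two forms still yields one canonical name.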
Last edited by AlJones; March 22nd, 2013 at 05:44 PM.
-
March 22nd, 2013, 10:31 PM
#9
XML to SQL isn't that hard. You are searching for a one-to-many relationship, which is common DB talk.
-
March 23rd, 2013, 05:12 PM
#10
I've got to do a little house cleaning, but this works like a champ - loading one of the other tables, where I do an author look-up, went so fast I thought something was wrong!
Code:
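Public Function FindAuthors(ByVal file As String) As String
    If file = "" Then Return ""
    If file.Substring(0, 1) = "*" Then Return file
    ' Split the cleaned text (not the raw input) so punctuation such as "London,"
    ' does not block a match; RemoveEmptyEntries skips doubled spaces.
    Dim strTemp As String = CleanInput(file)
    Dim words As String() = strTemp.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
    Dim names As New List(Of String)
    Dim candidate As String
    Dim strTempAuthor As String
    ' Try every run of j consecutive words (j >= 2) as a dictionary key.
    For j As Integer = 2 To words.Length
        For i As Integer = 0 To words.Length - j
            ' String.Join(separator, array, start, count) replaces the fixed arr2(150)
            ' buffer, which made Array.Copy throw for inputs of more than 151 words.
            candidate = String.Join(" ", words, i, j).ToLower()
            If AuthorSearchDict.ContainsKey(candidate) Then
                ' Return the canonical name (first- or last-name-first) as the first
                ' version did, and skip duplicates from different name forms (fix 1).
                strTempAuthor = IIf(cbFNF.Checked, AuthorDict(AuthorSearchDict(candidate)).FNL, AuthorDict(AuthorSearchDict(candidate)).LNF).ToString
                If Not names.Contains(strTempAuthor) Then names.Add(strTempAuthor)
            End If
        Next
    Next
    Return String.Join("; ", names.ToArray())
End Function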
I'd be more than glad to take any other constructive suggestions you might have!
Last edited by AlJones; March 24th, 2013 at 09:31 AM.