CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 11 of 11
  1. #1
    Join Date
    Jun 2007
    Posts
    29

    Will using regexs in the primary key slow down searching?

    I am working on a postgresql database that will basically search for part numbers and put a lot of data together (what else). Most of these part numbers will be pretty standard, and will just be something like "4001". However, some manufacturers use lots of prefixes and suffixes, and I realized it would make sense to use something like "4001(A|B|C)" instead of making 3 additional part numbers. The database might eventually be something like 40-50 thousand parts, with maybe 25% or so of those containing regular expressions. Would that significantly slow down a search? If I don't use the regular expressions, it will increase the number of database entries significantly anyway, and make my life a lot tougher.

    What do you think?
    Last edited by Ixiterra; January 24th, 2009 at 07:35 PM.

  2. #2
    Join Date
    Jan 2006
    Location
    Fox Lake, IL
    Posts
    15,007

    Re: Will using regexs in the primary key slow down searching?

    You could use a GUID for the part number, then use two columns if needed for the different level numbers
    David

    CodeGuru Article: Bound Controls are Evil-VB6
    2013 Samples: MS CODE Samples

    CodeGuru Reviewer
    2006 Dell CSP
    2006, 2007 & 2008 MVP Visual Basic
    If your question has been answered satisfactorily, and it has been helpful, then, please, Rate this Post!

  3. #3
    Join Date
    Jun 2007
    Posts
    29

    Re: Will using regexs in the primary key slow down searching?

    An example part number could be "12-45-AL-8206 MPQ LC"
    Where "12-45-AL" are prefixes that are all optional, and could be from a range of 50 or so options, "8206" is the base part number, but it is similar to 10 other numbers, and then the suffixes "MPQ LC" MPQ which is a required option out of about 20, and LC just an extra option.

    I don't know if your idea will work. What I can regex that with is simply ".*82(06|07|08) ?(MPQ|MPL|MAQ).*" and work from there.


    edit: If this would be pretty slow, I had the idea of using one table for basic part numbers, then an extended one with expressions if the number was not found in the basic one.
    Last edited by Ixiterra; January 24th, 2009 at 08:42 PM.

  4. #4
    Join Date
    Jun 2007
    Posts
    29

    Re: Will using regexs in the primary key slow down searching?

    Well to answer my own question (I am new to SQL), it seems that doing a regex search would be really bad. I ran the statements from the commandline, and with regex even on a 3 or 4 line table it took 15ms to complete whereas the standard WHERE = took 0ms. So I'll have two separate tables and only go to the second one if it can't be found in the first.

  5. #5
    Join Date
    Feb 2005
    Location
    Denmark
    Posts
    742

    Re: Will using regexs in the primary key slow down searching?

    I can't fully understand your mention of "regex" in connection with a sql query but it might be something postgresql specific.

    Generally speaking, databases are optimized for those searching tasks especially with some indexes are used, and some 40-50K rows shouldn't concern you much. And the added data of the additional rows shouldn't be much of a concern either.

    However what you could do would be to normalize the data a bit depending on how your product /part information exists.
    Thus you could make a table with the shared fields of the parts and a "cleaned" version of the part number, and then make a connected table with all the aliases/prefixes and specialized part numbers.
    So essentially make a part table which have the part number 8206 and all the data which are generic for said part and then make a table which then have the part number 12-45-AL-8206 MPQ LC and all variations and point it to the 8026 via keys.
    However as always with normalization of data - it will be a tradeoff between keeping redundant data low and having high performance in queries.

    From what you've described, I'd not worry too much and just make sure there are proper indexes on your tables..... unless I've misunderstood you

  6. #6
    Join Date
    Jun 2007
    Posts
    29

    Re: Will using regexs in the primary key slow down searching?

    Postgres allows a query like so: SELECT * FROM xxx WHERE partnum ~ '<regular expression>'
    And it works like POSIX regular expressions.

    Alsvha, part of my problem was that I want it to be an intelligent search. There would be hundreds of thousands of combinations of ways the user could enter a search. I suppose I could make a table of just the prefixes, but it seems to me that I'd have to do a lot more processing on the program side to make everything line up correctly.

    The way I have it envisioned now is that 8206 MPQ will be in the standard table, as will 8206 MPL, and 8207 MPQ and so on. But if the user does not type it in this way, it will resort to using the regular expression table which will contain an entry like .*82__ ?M__.* which will match anything I could throw at it, and then the program can determine what is the base part number from the string, find 8206 MPQ, and then that row links to groups of prefixes and suffixes and their appropriate info such as price. From there the program could easily determine what is what, or even make some educated guesses if something was typed incorrectly.
    Last edited by Ixiterra; January 27th, 2009 at 02:54 AM.

  7. #7
    Join Date
    Feb 2005
    Location
    Denmark
    Posts
    742

    Re: Will using regexs in the primary key slow down searching?

    So let me see if I understand this now - if we take your example of the part number 12-45-AL-8206 MPQ LC. There you said that 8206 was the actual part number and 12-45-AL was a prefix (out of many which could be assigned to the 8206). Is that correctly understood?

    Then you want that one to be found if the user enters 8206, but also if they enter 8206 LC, or should they enter something akin to 8206 * LC or...... perhaps just 12-45 out of the prefix portion? Or perhaps even worse - searching 12 AL should also find it?

    Because from the mention of "hundreds of thousands" combinations of prefixes it looks like you expect the user to be able to search on pretty much any combination of the part number digits and still find it?
    Or is it a hundred thousands combinations of prefixes for the 8206 part and those prefixes are what you want to have searched?

  8. #8
    Join Date
    Oct 2003
    Location
    .NET2.0 / VS2005 Developer
    Posts
    7,104

    Re: Will using regexs in the primary key slow down searching?

    You seem to be implying that each user will know the part number they are searching for, with all relevant prefixes and suffixes, so what's wrong with a simple WHERE = .. ?

    If youre aiming to make a guided search, where the user enters a major 8206 and some minor prefixes and suffixes, then the major-minor mapping table is a good idea

    User wants an 8206 and knows that it has an LC and MPL variant

    SELECT * FROM product_families WHERE family = 8206 and variants like '&#37;LC%' and variants like '%MPL%'

    X rows selected.. these can be used to link straight into the main parts table

    Or your family table can be like:

    family, variant, partNum
    8206, LC, 8206-LC-MPL-MPQ
    8206, LC, 8206-LC-MPL-MPZ
    8206, MPL, 8206-LC-MPL-MPQ
    8206, MPL, 8206-LC-MPL-MPZ
    8206, MPQ, 8206-LC-MPL-MPQ
    8206, MPQ, 8206-LC-MPL-MPZ
    8206, MPZ, 8206-LC-MPL-MPQ
    8206, MPZ, 8206-LC-MPL-MPZ


    SELECT distinct partNum FROM product_families where family = 8206 and variant IN ('LC', 'MPL')
    "it's a fax from your dog, Mr Dansworth. It looks like your cat" - Gary Larson...DW1: Data Walkthroughs 1.1...DW2: Data Walkthroughs 2.0...DDS: The DataSet Designer Surface...ANO: ADO.NET2 Orientation...DAN: Deeper ADO.NET...DNU...PQ

  9. #9
    Join Date
    Jun 2007
    Posts
    29

    Re: Will using regexs in the primary key slow down searching?

    The problem stems from the fact that there are many manufacturers, and they all use different part numbering schemes. The user will not necessarily know the correct way to enter the part number either. If they enter 12-8206MPQ, the program itself has no way of knowing that 8206 is the base part number without some heavy duty pre-processing which defeats the point of the RDMS. It searches the first database which has the part listed as 8206 MPQ, and therefore is not a match, then searches the regex-type database, which contains the .*82__ ?M__.* and from there the program can easily determine that 8206 MPQ is the base part number. From that row, it has links to all of the base part number's prefixes and suffixes, so it finds 12- in the prefixes and the C++ program can accordingly convert the part number to the correct format and has all the additional information necessary. This also gives me the option to correctly identify the part, and identify that a prefix was typed incorrectly or not found.

    I could use LIKE, which is probably faster, but it does not give me as many options. I think with regex though, I could use very few actual rows. It's really just a way to identify the base part number, and keep the base part number database nice and small and avoids a lot of duplication.

    If it helps to visualize:

    part_list:
    text part_num
    int part_cost_id -- links to pricing information, as well as prefixes/suffixes
    int part_series_id -- links to common elements with this type of part and other parts by the same manufacturer, like 8206 and 8207 would have the same series id
    int* opt_base_incl -- basically declares what type of part this is, and what functions it has

    part_list_regex:
    text part_num -- and that's it

    Maybe I'll get yelled at for using an integer array in a database, but postgres has the option and I intend to take advantage of it.
    Last edited by Ixiterra; January 27th, 2009 at 05:53 PM.

  10. #10
    Join Date
    Oct 2003
    Location
    .NET2.0 / VS2005 Developer
    Posts
    7,104

    Re: Will using regexs in the primary key slow down searching?

    i dont understand your problem because you seem to know that you can solve it easily, and you also seem to be cocerned with having your program make up for every user stupidity.. coding nightmare but it's your call

    If your regex solution will work then I suggest you first try direct lookup, then use regex.. Hopefully the users will learn to enter the correct part number for fast access, or defer to the slower but more comprehensive search if needed
    "it's a fax from your dog, Mr Dansworth. It looks like your cat" - Gary Larson...DW1: Data Walkthroughs 1.1...DW2: Data Walkthroughs 2.0...DDS: The DataSet Designer Surface...ANO: ADO.NET2 Orientation...DAN: Deeper ADO.NET...DNU...PQ

  11. #11
    Join Date
    Jun 2007
    Posts
    29

    Re: Will using regexs in the primary key slow down searching?

    Cjard: My initial problem was I wasn't sure if using regular expressions would slow down a search. I mentioned in the 4th post that I figured it out. I'm always open to hear other ideas though. What I'm trying to do is create software that's user-friendly. There are several pieces of software out there that do similar things (it's a specific industry that I work in that I'm targeting), but they are all very flawed in that the database is very much in your face. Finding the part number that you're looking for is very much at the whim of the database entry person and how they entered it. I don't like that. Sure it works fine for someone like me who is very computer savvy, but for most people it is frustrating. So I started working on my own. I've been programming self-taught since I was 12, and that has given me a lot of insight on how to make software work better. I am very new to databases though and have spent a lot of time designing the process of this one, and a lightbulb turned on when I thought about using regex for part numbers.

    Hopefully sometime in the next couple of years it will be very functional and I'll be ready to start my own business with it. Hopefully. I've already completed one part of it that basically doubles the speed and increases the accuracy significantly of one part of the process. But now I'm trying to expand on that.

    Thanks for the help though.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured