CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 2 of 2

Threaded View

  1. #1
    Join Date
    Jun 2012
    Posts
    3

    Efficient way to do validation

    Hi all,

    Like many of you, sometimes I have to collaborate with others.
    I've been given the task of importing a .csv file with various customer information, and performing validation on some of the fields.

    Currently validation happens on specified fields of each record (there are multiple fields) within a foreach loop. Something along the lines of this:

    foreach item in records
    find item 1 within record
    validate item 1
    if item 1 passes validation
    add item 1 to db
    if item 1 doesn't pass
    add a validation message to a list

    find item 2 within record
    validate item 2
    if item 2 passes validation
    add item 2 to db
    if item 2 doesn't pass
    add a validation message to a list

    and so on, until the end of the text file

    Most of the validation can be done with a regex (number fields). Any records that don't pass validation don't get added to the db - but I will need to retain the original record, and a validation error message. Seems simple enough, right?

    The last part of the validation involves checking a db table to see if the item exists. This would happen after the other regex checks. Since the text file could potentially be tens of thousands of records, a concern is tens of thousands of trips to the database to run the validation per record. The db table itself contains 1 char(6) field and 1 int field and has approx 150,000 rows.

    One alternate suggestion is to import the file, and do the regex validation, record by record in the foreach loop, and add the records to the db *before* the last validation check. Perform the last validation check by using one query, (Select * items we just entered into the database that do not exist in this other database) to retrieve all of the records just added that don't exist in the db table (not existing means they're not valid). So that eliminates the round trips to the db. With this method, there is only one call to the db for validation on the entire list of items, instead of one item at a time. Then, of course, the offending records would have to be deleted from the db.

    The problem I'm having with that suggestion is matching the field from the last db table validation check with the original record it came from in the original text file -- remember, I have to maintain a list of the original offending records and a validation message (explaining why this record wasn't added).

    I'd really like to perform all of the validation checks before adding anything to the db. Since the repeated db calls are a concern, I wondered about reading in the entire table (the one I have to check existence in), holding on to it in memory, for the duration of the processing of the imported text file, and checking the items against it while traversing the records in the for each loop. Maybe use a list (can lists hold that many items?), and check if the list contains the item (or doesn't, in this case).

    The size of the table in sql uses 3 MB of storage space. I don't know what that translates to in terms of memory usage within the program this is happening in, but I'd like to have a valid argument to use this method instead of the suggested alternative I wrote about above.

    Will my idea (reading the entire table into memory) be more efficient than repeated calls to the db? And is it better than adding records, then deleting them because some items failed the validation check.

    Perhaps storing the table in cache?

    I'm open to any other suggestions.

    Thanks.
    Last edited by gdev; June 8th, 2012 at 07:42 AM. Reason: Additional idea

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured