Thread: Comparison of multiple files

July 25th, 2011, 02:54 PM #1
beschner

Comparison of multiple files

A text file that we used to receive electronically will now arrive as a printout (this is due to security changes, so I can't change it).

I can scan the file in and use OCR (optical character recognition), but it's not perfect. I scanned the same page several times on the same scanner, and the OCR gave me slightly different results.

The differences are fairly simple for a human to spot: a "d" (D) in one version is a "cl" (C & L) in another, or a space may be added (or skipped). The same applies to whole lines, where one version might have an extra blank line that another doesn't.

My idea is to scan it 3 times and compare the files. If a line is the same on 2 of the 3, then it is declared good. If different on all 3, then human intervention is required.
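The 2-of-3 rule above can be sketched as follows. This is only a minimal illustration in Python (the same logic maps onto a Perl hash of counts); the function name `vote_lines`, the `quorum` parameter, and the sample lines are made up for the example, and it assumes the three scans are already lined up one-to-one.

```python
from collections import Counter

def vote_lines(scans, quorum=2):
    """Return one entry per line: the agreed text, or None if no
    reading reaches the quorum (None means a human has to look)."""
    result = []
    # zip() pairs line 1 with line 1, line 2 with line 2, and so on.
    # It stops at the shortest scan, so misaligned scans need fixing first.
    for readings in zip(*scans):
        text, count = Counter(readings).most_common(1)[0]
        result.append(text if count >= quorum else None)
    return result

# Hypothetical sample data, not taken from a real scan.
scan_a = ["Invoice 1001", "Total due: 45.00", "Paid"]
scan_b = ["Invoice 1001", "Total clue: 45.00", "Paicl"]
scan_c = ["Invoice 1001", "Total due: 45.00", "Pald"]

print(vote_lines([scan_a, scan_b, scan_c]))
# ['Invoice 1001', 'Total due: 45.00', None]
```

The `None` entry marks the line where all three readings differ, which is the case that goes to a person.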

What if I expand this to scanning 4 times? Or 10 times? I'm not sure whether that would help or hurt...
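One rough way to reason about the number of scans is a binomial calculation, sketched below in Python. It rests on assumptions that may not hold here: each scan is treated as reading a line correctly with the same probability `p`, independently of the other scans, and two wrong readings are assumed never to agree with each other. Rescanning on the same scanner may repeat the same misreading, in which case the real benefit is smaller than this suggests. The function name `p_at_least` and the value 0.9 are illustrative only.

```python
from math import comb

def p_at_least(n, k, p):
    """Probability that at least k of n independent scans read a
    line correctly, when each scan is right with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Compare strict-majority voting for a few odd scan counts.
# p = 0.9 is an assumed per-line accuracy, not a measured one.
for n in (3, 5, 9):
    k = n // 2 + 1          # smallest strict majority
    print(n, k, round(p_at_least(n, k, 0.9), 4))
```

Note that an even count such as 4 can split 2 against 2, so an odd number of scans avoids that particular tie.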

I've searched the web and know that "diff3" is good for comparing 3 files, but usually one is the ancestor of the other 2. In this case, there is no "original" version, so that won't work very well. I couldn't find anything else about comparing multiple files.

I'm trying to come up with a good algorithm for comparing 3 (or more) files. It should be a 1-line to 1-line (to 1-line) comparison, with an occasional blank line thrown in.
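If the occasional blank line is the only thing that breaks the line-to-line pairing, one simple option is to drop blank lines from every scan before comparing. The Python sketch below assumes blank lines carry no meaning in the file (if they do, this loses them) and that OCR never merges or splits non-blank lines; `drop_blank_lines` and `align_scans` are names invented for the example.

```python
def drop_blank_lines(lines):
    """Keep only lines that contain something other than whitespace."""
    return [line for line in lines if line.strip()]

def align_scans(scans):
    """Pair up line N of every scan after removing blank lines.
    Raises ValueError if the scans still disagree on line count."""
    cleaned = [drop_blank_lines(s) for s in scans]
    lengths = {len(c) for c in cleaned}
    if len(lengths) != 1:
        raise ValueError(f"scans disagree on line count: {sorted(lengths)}")
    return list(zip(*cleaned))

# Hypothetical sample: the stray blank lines sit in different places.
scan_a = ["Name: Smith", "", "Dept: 12"]
scan_b = ["Name: Smith", "Dept: 12"]
scan_c = ["Name: Smith", "Dept: 12", ""]

for row in align_scans([scan_a, scan_b, scan_c]):
    print(row)
```

Each `row` is a tuple holding the same logical line from every scan, ready for a per-line comparison.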

Is there a good way to optimize the comparisons for each line, and/or for the individual text within a line?
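For the text within a line, one possibility is to narrow two disagreeing readings down to just the characters that differ, so a reviewer sees "d" versus "cl" instead of two whole lines. The sketch below uses Python's standard `difflib.SequenceMatcher`; the helper name `diff_spans` and the sample strings are made up, and this only compares two readings at a time rather than voting across all of them.

```python
from difflib import SequenceMatcher

def diff_spans(a, b):
    """Return (text_in_a, text_in_b) for every stretch where the two
    readings disagree. An empty string means an insertion or deletion."""
    # autojunk=False turns off the popularity heuristic so that
    # frequent characters are never ignored during matching.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [(a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

# Hypothetical readings of the same printed line.
print(diff_spans("hold the door", "holcl the door"))
# [('d', 'cl')]
```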

(My department is using Perl, which is great at comparisons. I can even compare the lines with all white-space removed.)
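The whitespace-removed comparison mentioned above could look roughly like this, shown in Python as a sketch (in Perl the key would typically come from a substitution that deletes `\s`). Lines are compared on a key with all whitespace stripped, and the first original reading that matches the winning key is returned. The names `squash` and `vote_ignoring_whitespace` are invented for the example, and the sketch does not decide which spacing is actually correct.

```python
from collections import Counter

def squash(line):
    """Comparison key: the line with every whitespace character removed."""
    return "".join(line.split())

def vote_ignoring_whitespace(readings, quorum=2):
    """Vote on whitespace-free keys. Return the first original reading
    whose key wins, or None if no key reaches the quorum."""
    keys = [squash(r) for r in readings]
    key, count = Counter(keys).most_common(1)[0]
    if count < quorum:
        return None
    return readings[keys.index(key)]

# Hypothetical readings: the second one lost a space, the third misread "d".
readings = ["Total due: 45.00", "Totaldue: 45.00", "Total clue: 45.00"]
print(vote_ignoring_whitespace(readings))
# Total due: 45.00
```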
