Optimizing jpeg recovery code
CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Page 1 of 2 12 LastLast
Results 1 to 15 of 17

Thread: Optimizing jpeg recovery code

  1. #1
    Join Date
    Apr 2012
    Posts
    15

    Optimizing jpeg recovery code

    Hello guys,

    I'm taking a CS course and we've been tasked with creating a program that recovers jpegs from a formatted CF card which uses the FAT file system with a block size of 512 bytes, the jpegs in the card are block aligned which means that the beginning of a jpeg marks the end of the former.

    I've wrote the program and it works nicely and recovers the 51 jpegs in the CF card (actually just an image of some 4-5 megabytes of the actual card which can be downloaded here) but I'm looking for ways to optimize/improve my code so I need a second look from experienced programmers.

    Here's my code:
    Code:
    /*
     * filename    : recover.c
     * author      : ####
     * username    : ####
     * course      : ####
     * description : Recovers jpegs from a forensic image
     */
     
     #include <stdio.h>
     #include <stdlib.h>
     #include <string.h>
     #include <stdbool.h>
     
     #define BLOCK 512      // fat file system block size
     
     int main(void)
     {
        // open CF card image
        FILE* img = fopen("card.raw", "r");
        if (img == NULL)
        {
            fprintf(stderr, "Could not open forensic image %s.\n", "card.raw");
            return 1;
        }
     
        int num = 0;        // keep track of retrieved files
        FILE* jpeg = NULL;  // output file
        char fileName[8];   // file name
        char sig1[4] = {0xff, 0xd8, 0xff, 0xe0};    // jpeg files #1 signature
        char sig2[4] = {0xff, 0xd8, 0xff, 0xe1};    // jpeg files #2 signature
     
        // make buffer to hold blocks of data, and read first block
        char* buffer = malloc(BLOCK);
        fread(buffer, BLOCK, 1, img);
     
        // traverse img for jpegs and retrieve them
        while (!feof(img))
        {
            // jpeg signatures not found yet
            bool found = false;
            
            if (num == 0) // find first jpeg
            {
                for (int i = 0; i < BLOCK; i += 4)
                {
                    if (memcmp(sig1, &buffer[i], 4) == 0 ||
                        memcmp(sig2, &buffer[i], 4) == 0)
                    {
                        // set file name and open file
                        sprintf(fileName, "%.3d.jpg", num);
                        found = true;
                        num += 1;
                    }
                }
            }
            else    // check first four bytes only
            {
                if (memcmp(sig1, buffer, 4) == 0 ||
                    memcmp(sig2, buffer, 4) == 0)
                    {
                        // set file name and open file
                        sprintf(fileName, "%.3d.jpg", num);
                        found = true;
                        num += 1;
                    }
            }
            
            /* if already writing to a jpeg continue,
             * else start writing to a new one
             */
             if (found == false && num != 0)
             {
                fwrite(buffer, BLOCK, 1, jpeg);
             }
             else if (found == true)
             {
                if (num == 1)
                {
                    jpeg = fopen(fileName, "w");
                    fwrite(buffer, BLOCK, 1, jpeg);
                }
                else
                {
                    fclose(jpeg);
                    jpeg = fopen(fileName, "w");
                    fwrite(buffer, BLOCK, 1, jpeg);
                }
             }
             
             // read next block from file
             fread(buffer, BLOCK, 1, img);
        }
        
        fclose(jpeg);
        fclose(img);
        free(buffer);
        return 0;
     }
    My questions are:
    1) What are the possible optimizations that can make my code faster/better/more concise?
    2) I'm not handling slack space, since trailing zeros at the end of a jpeg won't affect rendering it (we were told it won't contain any garbage values)... how may I approach the problem?
    3) Can I enhance my if else constructs? or even replace them with something more elegant?

    Much thanks in advance guys.
    Last edited by VeNiX; January 21st, 2013 at 03:00 PM.

  2. #2
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,723

    Re: Optimizing jpeg recovery code

    Your location for your data download forces the install of additional software to the web browser to perform the download or it won't allow the download. This is not good!

  3. #3
    Join Date
    Apr 2012
    Posts
    15

    Re: Optimizing jpeg recovery code

    Quote Originally Posted by 2kaud View Post
    Your location for your data download forces the install of additional software to the web browser to perform the download or it won't allow the download. This is not good!
    Looks like you clicked on an ad, just click the download button at the center of the page... the one with the big blue arrow pointing at it.

  4. #4
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,723

    Re: Optimizing jpeg recovery code

    Click Download as free user. This downloads card.raw.exe. Running this program then asks to click button next. This then says going to install babylon and other things so I exit. There is no big blue arrow. It also installs putdownlocker program onto computer which I have had to delete. If I can get copy of your data I'll look at your program - but you need to find another way of enabling someone to download the data.

  5. #5
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,723

    Re: Optimizing jpeg recovery code

    There are a couple of things which instantly come to mind. If found is false, is num likely to be not 0? Also the only difference between num == 1 and num != 1 is the fclose. You could do this

    Code:
    if (num != 1) fclose(jpeg);
    jpeg = fopen(fileName, "w");
    fwrite(buffer, BLOCK, 1, jpeg);
    I want a look at the data before I offer other suggestions but can't get it as per my previous posts.

  6. #6
    Join Date
    Apr 2012
    Posts
    15

    Re: Optimizing jpeg recovery code

    Quote Originally Posted by 2kaud View Post
    There are a couple of things which instantly come to mind. If found is false, is num likely to be not 0? Also the only difference between num == 1 and num != 1 is the fclose. You could do this

    Code:
    if (num != 1) fclose(jpeg);
    jpeg = fopen(fileName, "w");
    fwrite(buffer, BLOCK, 1, jpeg);
    I want a look at the data before I offer other suggestions but can't get it as per my previous posts.
    The first condition for found and num makes sure I don't write anything to a file that isn't open (because I haven't found the first jpeg yet).

    As to your second suggestion, that is definitely better than what I was doing.

    Here's the CF image on my dropbox: https://www.dropbox.com/s/offxj7sf1ql63ka/card.raw

    Thanks for your help and sorry for the headache m8

  7. #7
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,723

    Re: Optimizing jpeg recovery code

    OK, got the data file.

  8. #8
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,723

    Re: Optimizing jpeg recovery code

    But where do you independently set found or increment num? You seem to be incrementing num in only two places and in both places you also set found to true. So in your code

    Code:
    if (found == false && num != 0)
    surely this condition will never occur as num is only incremented after found is set to true?

  9. #9
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,723

    Re: Optimizing jpeg recovery code

    What compiler are you using? I've taken your code and tried to compile under Microsoft visual studio - but it correctly reports error for the malloc

    Code:
    char* buffer = malloc(BLOCK);
    Shouldn't this be

    Code:
    char* buffer = (char *)malloc(BLOCK);
    Also, why allocate at run time when you know size at compile time? Why not just

    Code:
    char buffer[BLOCK];
    However given the above code change, I tried to run your program using your card.raw data file - but it only writes out one file 000.jpg!!! What system are you using?

  10. #10
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,723

    Re: Optimizing jpeg recovery code

    OK, I got it working now. You need to open/write the files in binary mode. You use

    Code:
    FILE* img = fopen("card.raw", "r");
    On my system, the default mode for fopen is text-mode translation which involves carriage-return and linefeed characters.

    To be sure you always open file in binary mode it would be better to use

    Code:
    FILE* img = fopen("card.raw", "rb");
    The same applies to creating the output files. It would be better to use to make sure writing in binary mode without carriage-return and linefeed translation.

    Code:
    jpeg = fopen(fileName, "wb");

  11. #11
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,723

    Re: Optimizing jpeg recovery code

    Please ignore my earlier comment re the found and num test. I see now what you're doing.

    You say that the jpegs are block aligned - which looking at the card file seems to be correct. This also applies to the first one which starts at offset 0x1400. So why are you first searching through the blocks byte by byte for a signature which starts at the beginning of a block?

    You ask about slack space at the end of an image. An image ends with 0xffd9. Have you looked at images with a hex editor? When you are writing a block you could search for 0xffd9 and if found only write a partial block upto and including these. That means searching each block. However, you are writing many blocks which consist totally of 0x00. Before writing a block check if it is all 0x00 and only write if it is not. This won't get rid of all the 0x00 at the end of a file but it will ensure that only blocks containing data are written.

    Like the cat by the way!

    Hope this has been of help.

  12. #12
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,723

    Re: Optimizing jpeg recovery code

    A couple more comments. Your while loop tests for eof but does not test for an error on the fread. As fread is only used to read one record, it should return 1 if succeeded. If either eof or error occurred then it will not read and will return 0.

    You are also not testing for error on your fopen for write. Assuming something works is not good programming practice.

    You also asked about enhancing/replacing your if..else constructs and making them more elegant. Yes, these can be very much optimised. Your bool variable 'found' is not needed. Have a try yourself to change the logic of your while loop. Hint. You only need one case of fwrite not two as you have. I've re-coded your program and can do the whole while loop in 9 lines of code (excluding { } lines and comments) including testing for create error.

    Have a go and let's know how you get on.

  13. #13
    Join Date
    Apr 2012
    Posts
    15

    Re: Optimizing jpeg recovery code

    I'm using Linux as my primary system and clang as a compiler...

    Anyway, thanks for the knowledge about dealing with slack space. As for searching blocks byte by byte at first, in the class we were told that the first few blocks contain garbage values and that seems to have confused me in that regard.

    I'll start optimizing my code as per your advice and I'll post it back here.

  14. #14
    Join Date
    Apr 1999
    Posts
    27,444

    Re: Optimizing jpeg recovery code

    Quote Originally Posted by VeNiX View Post
    I'll start optimizing my code as per your advice and I'll post it back here.
    You may be surprised that doing this may be the bottleneck:
    Code:
       sprintf(fileName, "%.3d.jpg", num);
    If you have the source code to your C library's version of sprintf(), you will see the gauntlet the code has to go through just to convert a number to a string. It may be faster just to generate the names beforehand and stick them in an array, then you access the array using "num" as an index.

    Right now, you're factoring in library functions that really do nothing except to slow down the process. Functions such as sprintf() shouldn't appear in the middle of critical code.

    Regards,

    Paul McKenzie
    Last edited by Paul McKenzie; January 24th, 2013 at 05:56 PM.

  15. #15
    Join Date
    Apr 2012
    Posts
    15

    Re: Optimizing jpeg recovery code

    Quote Originally Posted by Paul McKenzie View Post
    You may be surprised that doing this may be the bottleneck:
    Code:
       sprintf(fileName, "%.3d.jpg", num);
    If you have the source code to your C library's version of sprintf(), you will see the gauntlet the code has to go through just to convert a number to a string. It may be faster just to generate the names beforehand and stick them in an array, then you access the array using "num" as an index.

    Right now, you're factoring library functions that really do nothing except to slow down the process. Functions such as sprintf() shouldn't appear in the middle of critical code.

    Regards,

    Paul McKenzie
    So I should use sprintf() outside of the while loop to initialize all filenames first? or did you mean I should use something else entirely?

    Thanks for the help.

Page 1 of 2 12 LastLast

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Windows Mobile Development Center


Click Here to Expand Forum to Full Width

This is a CodeGuru survey question.


Featured


HTML5 Development Center