-
January 21st, 2013, 02:56 PM
#1
Optimizing jpeg recovery code
Hello guys,
I'm taking a CS course and we've been tasked with creating a program that recovers jpegs from a formatted CF card which uses the FAT file system with a block size of 512 bytes, the jpegs in the card are block aligned which means that the beginning of a jpeg marks the end of the former.
I've wrote the program and it works nicely and recovers the 51 jpegs in the CF card (actually just an image of some 4-5 megabytes of the actual card which can be downloaded here) but I'm looking for ways to optimize/improve my code so I need a second look from experienced programmers.
Here's my code:
Code:
/*
* filename : recover.c
* author : ####
* username : ####
* course : ####
* description : Recovers jpegs from a forensic image
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#define BLOCK 512 // fat file system block size
int main(void)
{
// open CF card image
FILE* img = fopen("card.raw", "r");
if (img == NULL)
{
fprintf(stderr, "Could not open forensic image %s.\n", "card.raw");
return 1;
}
int num = 0; // keep track of retrieved files
FILE* jpeg = NULL; // output file
char fileName[8]; // file name
char sig1[4] = {0xff, 0xd8, 0xff, 0xe0}; // jpeg files #1 signature
char sig2[4] = {0xff, 0xd8, 0xff, 0xe1}; // jpeg files #2 signature
// make buffer to hold blocks of data, and read first block
char* buffer = malloc(BLOCK);
fread(buffer, BLOCK, 1, img);
// traverse img for jpegs and retrieve them
while (!feof(img))
{
// jpeg signatures not found yet
bool found = false;
if (num == 0) // find first jpeg
{
for (int i = 0; i < BLOCK; i += 4)
{
if (memcmp(sig1, &buffer[i], 4) == 0 ||
memcmp(sig2, &buffer[i], 4) == 0)
{
// set file name and open file
sprintf(fileName, "%.3d.jpg", num);
found = true;
num += 1;
}
}
}
else // check first four bytes only
{
if (memcmp(sig1, buffer, 4) == 0 ||
memcmp(sig2, buffer, 4) == 0)
{
// set file name and open file
sprintf(fileName, "%.3d.jpg", num);
found = true;
num += 1;
}
}
/* if already writing to a jpeg continue,
* else start writing to a new one
*/
if (found == false && num != 0)
{
fwrite(buffer, BLOCK, 1, jpeg);
}
else if (found == true)
{
if (num == 1)
{
jpeg = fopen(fileName, "w");
fwrite(buffer, BLOCK, 1, jpeg);
}
else
{
fclose(jpeg);
jpeg = fopen(fileName, "w");
fwrite(buffer, BLOCK, 1, jpeg);
}
}
// read next block from file
fread(buffer, BLOCK, 1, img);
}
fclose(jpeg);
fclose(img);
free(buffer);
return 0;
}
My questions are:
1) What are the possible optimizations that can make my code faster/better/more concise?
2) I'm not handling slack space, since trailing zeros at the end of a jpeg won't affect rendering it (we were told it won't contain any garbage values)... how may I approach the problem?
3) Can I enhance my if else constructs? or even replace them with something more elegant?
Much thanks in advance guys.
Last edited by VeNiX; January 21st, 2013 at 03:00 PM.
-
January 21st, 2013, 03:22 PM
#2
Re: Optimizing jpeg recovery code
Your location for your data download forces the install of additional software to the web browser to perform the download or it won't allow the download. This is not good!
-
January 21st, 2013, 03:24 PM
#3
Re: Optimizing jpeg recovery code
Originally Posted by 2kaud
Your location for your data download forces the install of additional software to the web browser to perform the download or it won't allow the download. This is not good!
Looks like you clicked on an ad, just click the download button at the center of the page... the one with the big blue arrow pointing at it.
-
January 21st, 2013, 03:50 PM
#4
Re: Optimizing jpeg recovery code
Click Download as free user. This downloads card.raw.exe. Running this program then asks to click button next. This then says going to install babylon and other things so I exit. There is no big blue arrow. It also installs putdownlocker program onto computer which I have had to delete. If I can get copy of your data I'll look at your program - but you need to find another way of enabling someone to download the data.
-
January 21st, 2013, 04:50 PM
#5
Re: Optimizing jpeg recovery code
There are a couple of things which instantly come to mind. If found is false, is num likely to be not 0? Also the only difference between num == 1 and num != 1 is the fclose. You could do this
Code:
if (num != 1) fclose(jpeg);
jpeg = fopen(fileName, "w");
fwrite(buffer, BLOCK, 1, jpeg);
I want a look at the data before I offer other suggestions but can't get it as per my previous posts.
-
January 22nd, 2013, 06:59 AM
#6
Re: Optimizing jpeg recovery code
Originally Posted by 2kaud
There are a couple of things which instantly come to mind. If found is false, is num likely to be not 0? Also the only difference between num == 1 and num != 1 is the fclose. You could do this
Code:
if (num != 1) fclose(jpeg);
jpeg = fopen(fileName, "w");
fwrite(buffer, BLOCK, 1, jpeg);
I want a look at the data before I offer other suggestions but can't get it as per my previous posts.
The first condition for found and num makes sure I don't write anything to a file that isn't open (because I haven't found the first jpeg yet).
As to your second suggestion, that is definitely better than what I was doing.
Here's the CF image on my dropbox: https://www.dropbox.com/s/offxj7sf1ql63ka/card.raw
Thanks for your help and sorry for the headache m8
-
January 22nd, 2013, 07:19 AM
#7
Re: Optimizing jpeg recovery code
-
January 22nd, 2013, 07:57 AM
#8
Re: Optimizing jpeg recovery code
But where do you independently set found or increment num? You seem to be incrementing num in only two places and in both places you also set found to true. So in your code
Code:
if (found == false && num != 0)
surely this condition will never occur as num is only incremented after found is set to true?
-
January 22nd, 2013, 08:30 AM
#9
Re: Optimizing jpeg recovery code
What compiler are you using? I've taken your code and tried to compile under Microsoft visual studio - but it correctly reports error for the malloc
Code:
char* buffer = malloc(BLOCK);
Shouldn't this be
Code:
char* buffer = (char *)malloc(BLOCK);
Also, why allocate at run time when you know size at compile time? Why not just
Code:
char buffer[BLOCK];
However given the above code change, I tried to run your program using your card.raw data file - but it only writes out one file 000.jpg!!! What system are you using?
-
January 22nd, 2013, 08:47 AM
#10
Re: Optimizing jpeg recovery code
OK, I got it working now. You need to open/write the files in binary mode. You use
Code:
FILE* img = fopen("card.raw", "r");
On my system, the default mode for fopen is text-mode translation which involves carriage-return and linefeed characters.
To be sure you always open file in binary mode it would be better to use
Code:
FILE* img = fopen("card.raw", "rb");
The same applies to creating the output files. It would be better to use to make sure writing in binary mode without carriage-return and linefeed translation.
Code:
jpeg = fopen(fileName, "wb");
-
January 22nd, 2013, 11:48 AM
#11
Re: Optimizing jpeg recovery code
Please ignore my earlier comment re the found and num test. I see now what you're doing.
You say that the jpegs are block aligned - which looking at the card file seems to be correct. This also applies to the first one which starts at offset 0x1400. So why are you first searching through the blocks byte by byte for a signature which starts at the beginning of a block?
You ask about slack space at the end of an image. An image ends with 0xffd9. Have you looked at images with a hex editor? When you are writing a block you could search for 0xffd9 and if found only write a partial block upto and including these. That means searching each block. However, you are writing many blocks which consist totally of 0x00. Before writing a block check if it is all 0x00 and only write if it is not. This won't get rid of all the 0x00 at the end of a file but it will ensure that only blocks containing data are written.
Like the cat by the way!
Hope this has been of help.
-
January 22nd, 2013, 03:03 PM
#12
Re: Optimizing jpeg recovery code
A couple more comments. Your while loop tests for eof but does not test for an error on the fread. As fread is only used to read one record, it should return 1 if succeeded. If either eof or error occurred then it will not read and will return 0.
You are also not testing for error on your fopen for write. Assuming something works is not good programming practice.
You also asked about enhancing/replacing your if..else constructs and making them more elegant. Yes, these can be very much optimised. Your bool variable 'found' is not needed. Have a try yourself to change the logic of your while loop. Hint. You only need one case of fwrite not two as you have. I've re-coded your program and can do the whole while loop in 9 lines of code (excluding { } lines and comments) including testing for create error.
Have a go and let's know how you get on.
-
January 22nd, 2013, 03:31 PM
#13
Re: Optimizing jpeg recovery code
I'm using Linux as my primary system and clang as a compiler...
Anyway, thanks for the knowledge about dealing with slack space. As for searching blocks byte by byte at first, in the class we were told that the first few blocks contain garbage values and that seems to have confused me in that regard.
I'll start optimizing my code as per your advice and I'll post it back here.
-
January 22nd, 2013, 05:06 PM
#14
Re: Optimizing jpeg recovery code
Originally Posted by VeNiX
I'll start optimizing my code as per your advice and I'll post it back here.
You may be surprised that doing this may be the bottleneck:
Code:
sprintf(fileName, "%.3d.jpg", num);
If you have the source code to your C library's version of sprintf(), you will see the gauntlet the code has to go through just to convert a number to a string. It may be faster just to generate the names beforehand and stick them in an array, then you access the array using "num" as an index.
Right now, you're factoring in library functions that really do nothing except to slow down the process. Functions such as sprintf() shouldn't appear in the middle of critical code.
Regards,
Paul McKenzie
Last edited by Paul McKenzie; January 24th, 2013 at 05:56 PM.
-
January 24th, 2013, 03:50 PM
#15
Re: Optimizing jpeg recovery code
Originally Posted by Paul McKenzie
You may be surprised that doing this may be the bottleneck:
Code:
sprintf(fileName, "%.3d.jpg", num);
If you have the source code to your C library's version of sprintf(), you will see the gauntlet the code has to go through just to convert a number to a string. It may be faster just to generate the names beforehand and stick them in an array, then you access the array using "num" as an index.
Right now, you're factoring library functions that really do nothing except to slow down the process. Functions such as sprintf() shouldn't appear in the middle of critical code.
Regards,
Paul McKenzie
So I should use sprintf() outside of the while loop to initialize all filenames first? or did you mean I should use something else entirely?
Thanks for the help.
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|