CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Page 1 of 2 12 LastLast
Results 1 to 15 of 16

Hybrid View

  1. #1
    Join Date
    Jul 2015
    Posts
    5

    [RESOLVED] Find all sentences or consecutive sequence of sentences where min<=length<=max

    I was given the following problem at a job interview and 2 hours to complete it:
    Write a program to find all the sentences, or consecutive sequence of sentences, in a text file where: min <= length <= max.
    Assume that a sentence ends in a period, question mark, or exclamation point. There is no fixed limit on the length of a line in the file or a practical max number of lines in the file. Count all blanks and punctuation. Include the trailing space after each sentence, but assume there is ever only one blank between sentences. (All EOL/null characters should be converted to blanks as well).

    The output of the program should be sent to another text file and be sorted in ascending order based on the ASCII value of the first character of each line in the output.
    Precondition: Min and Max will be positive integers less than 1000, and Min <= Max.
    The name of the input and output text file is to be provided as a command line argument (not read from Standard Input or written to Standard Output).

    Example Usage: program.exe <min> <max> <input file> <output file>

    For example, given this input text:
    Black is white. Day is night. Understanding is ignorance.
    Truth is fiction. Safety is danger.


    If min = max = 16 then the output is
    Black is white.
    If min = max = 18 then the output is
    Safety is danger.
    Truth is fiction.

    If min = 30 and max = 37 then the output is

    Black is white. Day is night.
    Truth is fiction. Safety is danger.

    because the two sentences are consecutive sentences with the desired length.


    I'm still having a tough time wrapping my head around what is being asked and how to approach it. This is what I came up with at the end of the 2 hours:
    Code:
    #include <iostream>
    
    #define MAXSIZE 1000
    
    
    int ParseSent(const char *str, int *lengths, char **sents);
    void initStr(char *str, int size);
    
    using namespace std;
    
    int _tmain(int argc, char* argv[])
    {
    	int lengths[MAXSIZE/2];
    	int numSents, running_count;
    	char string[MAXSIZE + 1];
    	char outstr[MAXSIZE + 1];
    	char *sents[MAXSIZE/2];
    
    	if (argc != 5){
    		printf("Usage: %s <min> <max> <input_file> <output_file>\n", argv[0]);
    		return 1;
    	}
    	int min = atoi(argv[1]);
    	int max = atoi(argv[2]);
    
    	FILE *infile = fopen(argv[3], "r");
    	if (infile == NULL){
    		printf("Error: Could not open input file %s\n", argv[3]);
    		return 1;
    	}
    
    	FILE *outfile = fopen(argv[4], "w");
    	if (outfile == NULL){
    		printf("Error: Could not create output file %s\n", argv[4]);
    		fclose(infile);
    		return 1;
    	}
        
    
    	while (!feof(infile)){
    		fread(string, MAXSIZE, 1, infile);
    		numSents = ParseSent(string, &lengths[0], sents);
    		running_count = lengths[0];
    		initStr(outstr, MAXSIZE + 1);
    		for (int i = 0; i < numSents; i++){
    			if ((running_count >= min) && (running_count <= max)){
    				sprintf(outstr, "%s ", sents[i]);
    				running_count += lengths[i+1];
    			}
    		}
    		if (outstr[0] != NULL)  // something to write?
    		    fputs(outstr, outfile);
    	}
       
    	fclose(infile);
    	fclose(outfile);
    	return 0;
    }
    
    
    
    int ParseSent(const char *str, int *lengths, char **sents){
    	int sent_count = 0;
    	int len_count = 0;
    	while (*str != '\0') {  // end of line, Null inserted by fgets()
    		if ((*str == '.') || (*str == '!') || (*str == '?')){
    			*lengths++ = len_count;
    			len_count = 0;
    			sent_count++;
    		}
    		sents[sent_count][len_count] = *str++;
    		len_count++;
    	}
    	*lengths++ = -1;
    	return sent_count;
    }
    
    
    void initStr(char *str, int size){
    	int count = 0;
    	while (count < size){
    		*str++ = '\0';
    	}
    }
    Was I even on the right track? I'm curious what approaches you would have taken to solve this (i.e. using a Sentence Class to store each sentence along with it's length and index).
    If anyone has the solution, could you kindly share it? I did find a very similar question (only missing the sort alphabetically part) over here under "Moderate Problems": http://users.csc.calpoly.edu/~jdalbe...gPractice.html

  2. #2
    Join Date
    Oct 2008
    Posts
    1,456

    Re: Find all sentences or consecutive sequence of sentences where min<=length<=max

    assuming this is c++ and we can use only the standard lib, something like ( off the top of my head, neither compiled nor tested ! )

    Code:
    #include <string>
    #include <vector>
    #include <fstream>
    #include <iostream>
    
    int main( int argc, char* argv[] )
    {
        try
        {
            // parse args
    
            auto min = argc > 1 ? std::stoi( argv[1] ) : throw std::runtime_error{ "missing min" };
            auto max = argc > 2 ? std::stoi( argv[2] ) : throw std::runtime_error{ "missing max" };
            auto infile = argc > 3 ? std::ifstream{ argv[3] } : throw std::runtime_error{ "missing input file" };
            auto outfile = argc > 4 ? std::ofstream{ argv[4] } : throw std::runtime_error{ "missing output file" };
    
            // validate args
    
            if( min <= 0 || max <= 0 || min >= 1000 || max >= 1000 || min > max  )
                throw std::runtime_error{ "wrong min/max" };
    
            if( !infile || !outfile )
                throw std::runtime_error{ "cannot open in/out files" };
    
            // read in file
    
            auto line = std::string{};
            auto running_sentences = std::vector<std::string>{};
            auto running_length = std::size_t{};
            auto filtered_sentences = std::vector<std::string>{};
    
            while( std::getline( infile, line ) )
            {
                // parse sentences
    
                auto pos = std::size_t{};
    
                line.push_back(' '); // add a blank at the end to simplify parsing
    
                while( pos < line.size() )
                {
                    auto nextpos = line.find_first_of( ".?!", pos );
    
                    if( nextpos != line.npos && nextpos + 1 < line.size() && line[ nextpos+1 ] == ' ' )
                    {
                        // found ! check ( accumulated ) length
    
                        running_sentences.push_back( line.substr( pos, nextpos - pos + 2 ) )
                        running_length += sentence.size();
                        pos += sentence.size();
    
                        while( running_length >= min && running_length > max )
                        {
                            // too big, discard front and continue
    
                            assert( !running_sentences.empty() );
                            assert( running_length >= running_sentences.front().size() );
    
                            running_length -= running_sentences.front();
                            swap( running_sentences.front(), running_sentences.back() );
                            running_sentences.pop_back();
                        }
    
                        if( running_length >= min && running_length <= max )
                        {
                            // ok, add to filtered sentences and reset for next
    
                            filtered_sentences.emplace_back();
    
                            for( auto const& s: running_sentences )
                                filtered_sentences.back() += s;
    
                            running_sentences.clear();
                            running_length = 0;
                        }
                    }
                    else
                        throw std::runtime_error{ "wrong input file format" };
                }
            }
            
            // finally, sort ( according to problem statement ) and output
    
            std::sort( filtered_sentences.begin(), filtered_sentences.end(), []( auto const& l, auto const& r )
                {
                    assert( !l.empty() && !r.empty() );
    
                    return l.front() < r.front();
                } );
                
            for( auto const& s: filtered_sentences )
            {
                outfile << s << '\n';
            }
        }
        catch( std::runtime_error const& err )
        {
            std::cout << "oops, " << err.what() << std::endl;
        }
    }
    Last edited by superbonzo; July 15th, 2015 at 04:24 AM.

  3. #3
    Join Date
    Oct 2008
    Posts
    1,456

    Re: Find all sentences or consecutive sequence of sentences where min<=length<=max

    Quote Originally Posted by bulle48 View Post
    Include the trailing space after each sentence, but assume there is ever only one blank between sentences. (All EOL/null characters should be converted to blanks as well).
    wait, do they mean that eol's can be in the middle of a sentence ? if yes, the inplace parsing can be further simplified by using a deque and char by char extraction ...

  4. #4
    Join Date
    Jul 2015
    Posts
    5

    Re: Find all sentences or consecutive sequence of sentences where min<=length<=max

    I don't think that's what they meant. I think they want the trailing '\n' or '\0' from reading a line to be converted to a space and counted in the length for that sentence.

  5. #5
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,626

    Re: Find all sentences or consecutive sequence of sentences where min<=length<=max

    Quote Originally Posted by bulle48 View Post
    I'm still having a tough time wrapping my head around what is being asked and how to approach it. This is what I came up with at the end of the 2 hours:
    not surprising given the strange formulation and ambiguous meaning of som of it.

    It starts out as an exercice in attempting to mindread the strange brain of whomever created that problem.

    basically what's being asked:
    Write a program to read a textfile that has ASCII encoding. The textfile contains sentences that may be split by EOL.
    Read the file, parse the tekst into individual sentenses and write the sentences that are between min and max characters in length to another textfile in sorted order.

    A sentence is defined to start with a non-witespace and ends in a dot. You can assume that every sentence will be ended in a dot, exclamation point or question mark. Ignore whitespace after the . ! or ? and compress whitespace into a single space character. Whitespace is any sequence of characters of the set: space, tab and EOL.
    The length of a sentence includes spaces and puctuation.

    You can assume that sentences in the textfile will be length 1000 at most.
    There is no limit to the line length in the tekst file.


    ALso:
    The stated problem is incomplete or ambiguous.
    it states: There is no fixed limit on the length of a line in the file or a practical max number of lines in the file. However it does require the resulting file is sorted.
    This either requires you to
    - spot the problem/incompleteness.
    - Or omits the fact that "for any given min/max you can assume there is enough memory to hold all sentences in memory"
    - Or it expects some elaborate/complex file-based or virtual memory based sorting implementation that is WAY too difficult for a simple 2 hour test.



    It's a good thing they included an example because at least that explains what they were asking for.

    as to your answer:
    1)
    You're assuming a maximum input size length (1000), where it's stated there is none "There is no fixed limit on the length of a line in the file... "
    several parts of your code will fail on such long lines.

    2)
    Your answer is a C program, you're posting this in a C++ forum. Were they expecting you to submit a C answer or a C++ answer.
    while your program would compile with a C++ compiler, if they're lookign for a C++ programmer it's an entirely wrong answer.


    3)
    You use _tmain() indicatinf you intend to make a program that can be compiled as either unicode or ansi
    but then proceed to not use the _t-versions of any other function in there as well as never using _T("") wrapped literal strings.

    4) you include <iostream>, a C++ header
    but then don't use any of it, and you aren't including any of the headers you do need

    5) learn to pre-increment (and decrement) instead of post-increment EVERYWHERE, except and only except where you need the specific different behaviour post-increment offers (if you don't know the difference, time to get back to learning), or if you are using a class that only has a post-increment (badly designed class is bad) operator.

    6)
    Even if your answer works (I haven't tried), the program organisation/flow shows a lack of understanding of what's being asked.

    I would more have expected an answer in the form. (rough outline, not actual code)

    Code:
    int main ()
    {
        open_input_file;
    
        while ( ! end_of_file )
        {
              read_sentence_from_input_file();
              size_t senlength = length(sentence);
              if (min<=senlength && senlength<=max)
                   add_to_output_array(sentence);
         }
         close_input_file;
        
         sort_output_array;
    
         write_output_array_to_output_file;
    }
    with the "read_sentence_from_input_file" being a separate function doing all internal processing, caching, parsing, whitespace compressing and returning a sentence, be it part of a line in the tekst or be it split over multiple lines.


    you could go further and make a function for all of the code from open_input_file to close_input_file, but that just makes an extra function that doesn't really make your code more readable/understandable.

    To be fair, for a 'quick and dirty' and small application like this, I probably wouldn't bother as much and put all/most of it just in main() if I was writing this for myself. But in these interview texts and course exams, they want to see you cut a program into nice chunks even if they wouldn't even do it themselves for such a small app. :-)
    "making a program that does the job" is only one aspect of what they're after. They also want to see how well your code management skills are (even if hard to evaluate in a program you only get 2 Hours for).
    Last edited by OReubens; July 15th, 2015 at 07:40 AM.

  6. #6
    Join Date
    Jul 2015
    Posts
    5

    Re: Find all sentences or consecutive sequence of sentences where min<=length<=max

    Thanks for your lengthy explanation and breakdown. I was given the option to use C or C++. I have 9 years experience writing embedded software in C but very little C++. The reason for the mixing in my code is because I was using VisualStudio which only gave me an option for c++ templates as I don't have a regular C compiler on my computer at home. To clarify your points:
    1- They say there is no fixed length for a line, but that min/max must be less than 1000, therefore I assume that any sentences (or sequence of sentences) in a line must also be less than 1000 chars.
    2&3- explained above
    4- i used cout to show some messages on the console when debugging but removed these lines before submitting the code, guess I forgot to remove the library header as well
    5- my new solution uses a bit of pre-incr where it made sense to me, but I am more comfortable with post incr as it helps me keep track of the pointers in my head better
    6- it didn't work, and only being given a short time to tackle the problem I dove right in before taking the time to understand what was being asked. It took me about 30mins re-reading the problem this morning to actually understand what is being asked.

    I do have a working solution now that I did on my own. Well at least it works for all the examples that were given. I sent it in to the interviewer with an explanation of my misunderstanding. I doubt they will consider it, but I had to do it anyway for my own piece of mind.

  7. #7
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,626

    Re: Find all sentences or consecutive sequence of sentences where min<=length<=max

    Quote Originally Posted by bulle48 View Post
    1- They say there is no fixed length for a line, but that min/max must be less than 1000, therefore I assume that any sentences (or sequence of sentences) in a line must also be less than 1000 chars.
    yes, a line could be 1Mb in size, but each individual sentense is at most 1000 bytes.

    5- my new solution uses a bit of pre-incr where it made sense to me, but I am more comfortable with post incr as it helps me keep track of the pointers in my head better
    regardless.
    learn to do it the other way. use preincrement Always UNLESS you need the different behaviour of post increment or if pre-increment isn't available.

    THis is mostly irrelevant for something like an integer. But it could be a big performance issue on other objects. Post increment requires a temporary object to be created where pre-increment does not require this.


    I do have a working solution now that I did on my own. Well at least it works for all the examples that were given. I sent it in to the interviewer with an explanation of my misunderstanding. I doubt they will consider it, but I had to do it anyway for my own piece of mind.
    A good idea anyway, it shows motivation and a desire to see things through to the end. Whether or not it will be considered is another matter entirely of course


    ALso, next time you're in this. if something is not clear. Ask. They may be used to different jargon or consider certain things Obvious that may not be when you're coming from a different background. So something that's obviously clear to them won't be to everyone else. And it's better to get things sorted out early rather than investing (company money and ) time working on a solution that isn't doing as intended. At least in my line of software dev it isn't unusual to discuss tasks several times to be sure things are done as intended.

  8. #8
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: Find all sentences or consecutive sequence of sentences where min<=length<=max

    Quote Originally Posted by bulle48 View Post
    And only being given a short time to tackle the problem I dove right in before taking the time to understand what was being asked.
    Arguably, that's the biggest mistake you could make.

    A rule of thumb is to never rush, no matter what. Understand the problem to the best of your abilities. If there are still points you are not sure you understand, then document these in your code as "unclear requirements" and chose a behavior that makes sense to you. Knowing that you *don't* understand something is also an important skill to have. If you can communicate with your interviewer, then they might actually be testing your communication abilities.

    Anyways, from there, start your problem from the ground up. Start with a very "low" bar, and implement cleanly. For example, start with tokenize the input file into lines, and write those to another file.

    MAKE SURE that code works. There is no point in going forward if the base code doesn't work. You'd just be coding yourself into a corner, when the errors show up later and your code is more complicated.

    Make sure your code is correctly split in functions. Don't over do it with classes, as you don't have that much time, but having clean functions helps show clarity, and more importantly, will make your code more flexible later. Think of it as an investment.

    At the same time, test your code and write tests as you go that prove your code is correct so far. This is also an important point, as it shows you place importance in correctness.

    Then repeat until you have finished your program, or you run out of time.



    My personal experience in these kind of tests is that no-one is looking for rockstar super programmers. The bar is placed high just to see how far you can go, but usually, they just want to see if you can create something clean, readable, modular and bug free. Well, that's my 0.02$. Hope it goes better next time.

    PS: Even if they say "C or C++", sticking to C like code tends to give that "newbie out of college" feeling. You want to prefer C++: Using the standard C++ libraries, you'll hugely improve your overall productivity and code robustness. I highly recommend you learn to use the STL's iostream, algorithm and collections libraries: Most of these "test programs" *need* you to be productive with these.
    Is your question related to IO?
    Read this C++ FAQ article at parashift by Marshall Cline. In particular points 1-6.
    It will explain how to correctly deal with IO, how to validate input, and why you shouldn't count on "while(!in.eof())". And it always makes for excellent reading.

  9. #9
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,626

    Re: Find all sentences or consecutive sequence of sentences where min<=length<=max

    Quote Originally Posted by monarch_dodra View Post
    PS: Even if they say "C or C++", sticking to C like code tends to give that "newbie out of college" feeling. You want to prefer C++: Using the standard C++ libraries, you'll hugely improve your overall productivity and code robustness. I highly recommend you learn to use the STL's iostream, algorithm and collections libraries: Most of these "test programs" *need* you to be productive with these.


    the standard library is extremely flexible and the fact you don't need to deal with a lot of details (and in particular nasty little bugs you'll get called out for such as forgetting a delete) means that particularly for smaller type apps you have more time to wory about the code that matters rather than the details that don't.
    I'll agree with monarch... when given the choice... prefer C++ just because of the fact you'll be on your way to get working.

    Your OP doesn't even have the sorting done. Consider the effort it would take to write a dynamic array of strings in C with all the associated memory management and a sorter that works with that (C-style qsort() ... shiver...)...
    You could decide to do that bit with C++, but then you're mixing your prefered C and a handfull of C++ where C falls short leading to a hybrid mix of some C and C++... Things like that don't help your assesment :-)

  10. #10
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,626

    Re: Find all sentences or consecutive sequence of sentences where min<=length<=max

    7) you aren't checking if each read from the input file or write to the output file succeeded

  11. #11
    Join Date
    Jul 2015
    Posts
    5

    Re: Find all sentences or consecutive sequence of sentences where min<=length<=max

    I was told I could write it in C or C++. I am most familiar with C and have only started getting back up to speed on my c++ skills. Also it was expressed that proper software design and code readability was more important than a working solution.

    After re-reading the problem (over and over again) I think I finally understand what is being asked and have attempted it again on my own time. Unfortunately I have come to realize how far off I was from the solution I submitted at the end of the 2 hours. I believe I now have a program that works as intended. The only aspect I'm unsure of is if multiple copies of sequence of sentences should be written. For example if the input file is that which is shown in the problem and the min=2, max=50, then should the output be:
    Black is white.
    Black is white. Day is night.
    Black is white. Day is night. Understanding is ignorance.
    Day is night.
    Day is night. Understanding is ignorance.
    Safety is danger.
    Truth is fiction.
    Truth is fiction. Safety is danger.
    Understanding is ignorance.

    ... and so on since each sentence and each sequence of sentences meets the criteria? This seems a little repetitive/recursive and I'm not sure if that is what is being asked or not since the question is ambiguous in that respect and no examples were given to illustrate this situation.

    Anyway, I'm happy I got it working! If anyone is interested in my solution feel free to PM me. I don't really want it posted publicly since I'm assuming they will be asking other candidates the same question.

    Thanks for all the help/input gurus! Much appreciated.

  12. #12
    Join Date
    Apr 2017
    Posts
    1

    Re: [RESOLVED] Find all sentences or consecutive sequence of sentences where min<=len

    Hi Dear,
    I was given this assignment that you mentioned , I would be more than happy if you send me the response. anyway I couldn't do that I missed my time.
    cheers

  13. #13
    2kaud's Avatar
    2kaud is offline Super Moderator Power Poster
    Join Date
    Dec 2012
    Location
    England
    Posts
    7,824

    Re: [RESOLVED] Find all sentences or consecutive sequence of sentences where min<=len

    Quote Originally Posted by MM2016 View Post
    Hi Dear,
    I was given this assignment that you mentioned , I would be more than happy if you send me the response. anyway I couldn't do that I missed my time.
    cheers
    What did you manage to produce in the time? If you post it here we'll provide further comment. I could post my solution, but then if someone reads this thread and then has this assignment to do it's rather like exam cheating isn't it?
    All advice is offered in good faith only. All my code is tested (unless stated explicitly otherwise) with the latest version of Microsoft Visual Studio (using the supported features of the latest standard) and is offered as examples only - not as production quality. I cannot offer advice regarding any other c/c++ compiler/IDE or incompatibilities with VS. You are ultimately responsible for the effects of your programs and the integrity of the machines they run on. Anything I post, code snippets, advice, etc is licensed as Public Domain https://creativecommons.org/publicdomain/zero/1.0/ and can be used without reference or acknowledgement. Also note that I only provide advice and guidance via the forums - and not via private messages!

    C++23 Compiler: Microsoft VS2022 (17.6.5)

  14. #14
    Join Date
    Jul 2015
    Posts
    5

    Re: [RESOLVED] Find all sentences or consecutive sequence of sentences where min<=len

    Ugh.. it's been a while since I've used forums and I can't seem to figure out how to edit my posts on here.
    I thought when clicking reply to someones response it would make a sub-thread, but instead it just put everything at the bottom.

    The last 2 replies, 1st one was meant for superbonzo, 2nd for OReubens.
    I've marked the thread as resolved since I have come up with a solution I am happy with. You can PM me if you're interested in what I've done and I'll send you the code. I don't want it shared publicly since I'm assuming they will be asking other candidates the same question and don't want to give away my answer.

  15. #15
    Join Date
    Jun 2015
    Posts
    208

    Re: [RESOLVED] Find all sentences or consecutive sequence of sentences where min<=len

    Quote Originally Posted by bulle48 View Post
    it was expressed that proper software design and code readability was more important than a working solution
    The problem may have been purposely underspecified to see how you handle that. For example in your interpretation this is possible,

    Black is white.
    Black is white. Day is night.

    But equally likely this may not be what is wanted. In any case it's important you acknowledge this disambiguity and that your code is flexible enough to be easily changed to handle either way.

    Another unclarity is the sorting. You can code it yourself or use a library/system utility. It can be done in-memory or on file. I'd say attempting to code a sort yourself shows naivety since it's a dire undertaking and sorting is considered a standard feature. If you use an in-memory sort this essentially implies the whole input file must be present in memory. This may be okay but it also may not. So you should not make the rest of the program depend on loading the whole input file into memory. Instead memory usage should be independent of input size and the sort part should be isolated so even if it's initially done in-memory it could later be replaced by an external file sort spawned by your program.

    I'm sure showing off in these kinds of regards is much more important than getting the minute details exactly right.
    Last edited by tiliavirga; July 16th, 2015 at 03:44 AM.

Page 1 of 2 12 LastLast

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured