CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 5 of 5
  1. #1
    Join Date
    Sep 2004
    Posts
    244

    Post SortNames Program

    I was asked to write a sorting program in c++

    Specification:
    A program to sort a list of names is required.
    The file also contains the person’s age and phone number.
    The main challenge is that the entire list can not fit into
    physical memory at any given time.
    There is no limit to the amount of physical disk space.

    The Input

    An ASCII TSV (Tab Separated values) file containing a list of names.
    Each line is of the format:
    <Last Name>, <First Name><tab><age><tab><Phone Number><CR>

    Sample:

    Kalou, Salomon 30 476-2769
    Ad****or, Emmanuel 31 310-4951
    Morais, Nuno 37 660-6290
    Poom, Mart 34 469-8379
    Shevchenko, Andriy 26 922-2033
    Obi, Mikel 35 888-9370
    Ljungberg, Fredrik 34 492-1755
    Davenport, Calum 28 240-7734
    O'Hara, Jamie 30 333-5168

    The Output

    A file in the same format sorted by Last Name.

    The Program

    The final program must take three arguments

    Maxnames – the maximum number of names which can be loaded
    into memory at any given time
    Input - name of the input file
    Output – name of the results output file

    For example: SortNames.exe 10 input.txt output.txt

    So what I manage to achieve so far is to read 10 record from file, sort them and output the result.

    CodeListing:

    <code>
    #include <iostream>
    #include <fstream>
    #include <string>
    #include <vector>
    #include <algorithm>


    using namespace std;

    ////////////////////////////////////
    //Header
    ////////////////////////////////////
    struct PersonInfo
    {
    string firstName;
    string secondName;
    int age;
    string phoneNumber;

    PersonInfo(const string& fName = "",
    const string& sName = "",
    int personsAge = 0,
    const string& number = ""):
    firstName(fName), secondName(sName), age(personsAge), phoneNumber(number)
    {}

    friend bool operator<(const PersonInfo& person1, const PersonInfo& person2);
    };

    bool operator<(const PersonInfo& person1, const PersonInfo& person2)
    {
    int c1 = person1.firstName.compare(person2.firstName);
    int c2 = person1.secondName.compare(person2.secondName);
    int c3 = person1.phoneNumber.compare(person2.phoneNumber);

    return (c1 < 0) ||
    (c1 == 0 && c2 < 0) ||
    (c1 == 0 && c2 == 0 && person1.age < person2.age) ||
    (c1 == 0 && c2 == 0 && person1.age == person2.age && c3 < 0);
    }

    vector<PersonInfo> PersonInfoslist;

    void Run(const char* infile, int* maxEntries);
    //void WriteToTempFile(const vector<PersonInfo>& p);

    ////////////////////////////////////
    //Source
    ////////////////////////////////////
    void Run(const char* infile,int maxEntries)
    {
    ifstream inputFile(infile, ios::in);

    if (!inputFile)
    {
    cerr << "Input file "<< infile << " could not be opened!"<<endl;
    exit(1);
    }

    for (int i = 0;i < maxEntries;i++)
    {
    PersonInfo pInfo;
    char* buffer;
    inputFile.getline( buffer,2048);
    string line = buffer;

    int itemNum = 0;
    string token;
    size_t p0 = 0, p1 = string::npos;

    while(p0 != string::npos)
    {
    p1 = line.find_first_of(",\t ", p0);
    if(p1 != p0)
    {
    token = line.substr(p0, p1 - p0);
    }
    p0 = line.find_first_not_of(",\t ", p1);

    switch( itemNum )
    {
    case 0:
    pInfo.firstName = token;
    itemNum++;
    break;
    case 1:
    pInfo.secondName = token;
    itemNum++;
    break;
    case 2:
    pInfo.age = atoi(token.c_str());
    itemNum++;
    break;
    case 3:
    pInfo.phoneNumber = token;
    itemNum=0;
    PersonInfoslist.push_back(pInfo);
    break;
    default:
    cout << "Error in tokenizing string!" << endl;
    break;
    }
    }
    }

    //Sort the items in the vector
    sort(PersonInfoslist.begin(), PersonInfoslist.end());

    for(vector<PersonInfo>::const_iterator it = PersonInfoslist.begin(); it != PersonInfoslist.end(); ++it)
    {
    cout << it->firstName << ", " << it->secondName << ", " << it->age << ", " << it->phoneNumber << endl;
    }
    }

    int main(int argc, char** argv)
    {
    if (argc != 4)
    {
    cout << "Usage: SortNames <Number of names to read> <Input file> <Output file>" << endl;
    exit(1);
    }

    //Getting the arguments from command line.
    int maxEntries = atoi(argv[1]);
    const char* inFile = argv[2];
    const string outFile = argv[3];

    Run(inFile,maxEntries);

    return 0;
    }
    </code>

    So I was wondering if a generous C++ programmer out there will be kind enough to help me out on this task. Thanks in advance for any suggestion on improving the current solution
    Last edited by inbugable; December 21st, 2006 at 04:03 AM.

  2. #2
    Join Date
    Nov 2006
    Location
    Essen, Germany
    Posts
    1,344

    Re: SortNames Program

    There´s an algorithm described on wikipedia:

    External sorting

    It´s not that difficult, give it a try.

    Regards,
    Guido
    Last edited by GNiewerth; December 19th, 2006 at 10:32 AM.
    - Guido

  3. #3
    Join Date
    Sep 2005
    Location
    United States
    Posts
    799

    Re: SortNames Program

    Also, please use [ code ][ /code ] tags when posting source code.

    It is almost impossible to read the code you have posted.
    Please rate my post if you felt it was helpful

  4. #4
    Join Date
    Sep 2004
    Posts
    244

    Post Re: SortNames Program

    Where I am confuse is who I can generate the temp files. I have to read 100 records to read by 10 at the time. I was thinking about the following:

    Read from file;
    If person starting with letter A or a
    Create a file A.txt and append the person data

    When whole finish I can open the entire files created and sort them one after the other, by appending them in the output.txt file. But there is a problem with this technique, it can result to an long switch or if statement.

    Any body can help ?

  5. #5
    Join Date
    Nov 2006
    Location
    Essen, Germany
    Posts
    1,344

    Re: SortNames Program

    Hi,

    you can´t bucket-sort your whole recordset like you stated, because you can´t assume that the records are equally distributed. Say if you have 100,000,000,000 records of person and all of them begin with an "A" you actually don´t split your temporary file and eventually run out of memory when loading it into memory. You must split the temporary files into a fixed number of records.

    What you have to do is following:

    1) determine a reasonable size of records that can be processed in memory, say 10,000 or 100,000, or maybe 1,000,000,000.

    2) read a chunk (size calculated in 1) from your input file, sort them and write them back into a temporary file. The last chunk won´t be complete, but that doesn´t matter.
    Count the number of temporary files you have created, you´ll need it later. Starting with 0 you can name your files like tmpsrt00001.dat, tmpsrt00002.dat,... Now you have a number of temporary files with their records sorted.

    3) Create a queue for each temporary file.

    4) Pick the smallest front record from all queues and write it into the final output file.
    Remove that record from the queue.
    If there are records left in the temporary file, read one record and append it to the queue.
    If there are no more records in the temporary file and the queue is empty, mark the queue as fully processed.

    5) Repeat step 4) until all queues are fully processed.

    6) delete the temporary files.

    That´s the basic algorithm, performance issues neglected. To improve performance you can read multiple records into your queue, like 1000, so you reduce slow I/O operations.

    Regards,
    Guido
    Last edited by GNiewerth; December 21st, 2006 at 08:03 AM.
    - Guido

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured