multiple writers and single read FIFO queue performance

Thread: multiple writers and single read FIFO queue performance

  1. #1
    Join Date
    Jun 2002
    Posts
    137

    multiple writers and single read FIFO queue performance

    Hi All,

We have designed a FIFO queue using a pthread mutex and condition wait. Inside the queue we use std::deque, which is quite efficient, and we only enqueue and dequeue pointers to objects, which is also as efficient as possible.

    However, enqueueing/dequeueing a pointer into/from the queue takes about 2-3 microseconds.

    I guess most of the time is taken by context switches. Do you have any idea how to achieve better performance by using different locks and avoiding context switching?

    We have tried atomic operations on a plain integer array, but it did not seem to help much.

    Thanks a lot in advance! Appreciate your replies.
    sandodo

  2. #2
    Join Date
    Nov 2003
    Posts
    1,786

    Re: multiple writers and single read FIFO queue performance

    "Faster locks" is not the way to go. Instead, think of how you can reduce lock contention.

For example, reader-writer contention could be reduced by having the reader dequeue the entire contents of the Q, instead of one element at a time.

    Another example: writer-writer contention could be removed completely by having each writer thread use a thread-local Q. Then the reader thread would iterate over all writer Q's, consuming an entire Q at a time.
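    The first idea can be sketched like this (a minimal sketch with made-up names - Message, drain() - not your actual code). The reader swaps out the whole deque in O(1) under the lock, then processes it lock-free:

    ```cpp
    #include <deque>
    #include <mutex>

    // Hypothetical message type standing in for the pointers being queued.
    struct Message { int id; };

    std::mutex g_lock;
    std::deque<Message*> g_queue;

    // Writer: hold the lock only long enough to push one pointer.
    void enqueue(Message* m) {
        std::lock_guard<std::mutex> guard(g_lock);
        g_queue.push_back(m);
    }

    // Reader: take the entire queue in one locked swap, then iterate
    // over the local copy without holding the lock at all.
    std::deque<Message*> drain() {
        std::deque<Message*> local;
        {
            std::lock_guard<std::mutex> guard(g_lock);
            local.swap(g_queue);   // O(1): exchanges internal pointers
        }
        return local;
    }
    ```

    One swap amortizes the lock cost over however many messages accumulated since the last drain.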

    gg

  3. #3
    Join Date
    Jun 2002
    Posts
    137

    Re: multiple writers and single read FIFO queue performance

Thanks a lot for pointing out a direction.

    One question about the one-local-queue-per-writer idea: how can we keep the reader thread from blocking on a writer that happens to be idle?
    Also, we would prefer to keep FIFO ordering as much as possible; how should we handle that?

Thanks & regards,
    sandodo

  4. #4
    Join Date
    Nov 2003
    Posts
    1,786

    Re: multiple writers and single read FIFO queue performance

>> how can we avoid the reader ... blocking by a writer
    No need to avoid it. The writer will only hold the lock long enough to enqueue, so the reader will only ever block for that long at most, if it blocks at all.

    >> we prefer to keep FIFO as much as possible
    If two writers enqueue "at the same time", which one is first? If all writers are forced through the same lock, then the answer is easy - whoever wins the lock is first. Otherwise you'll need to include a timestamp (or some other sort criterion) so that the reader thread knows how to sort the data from multiple writers.
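    The timestamp approach might look something like this (a sketch only; Stamped, g_seq, and restore_order are made-up names, and an atomic counter stands in for a real timestamp):

    ```cpp
    #include <algorithm>
    #include <atomic>
    #include <vector>

    // Each writer tags a message with a global sequence number at enqueue
    // time, so the reader can restore a total order after draining
    // multiple per-writer queues.
    struct Stamped {
        unsigned long seq;     // taken from g_seq when the writer enqueues
        void*         payload;
    };

    std::atomic<unsigned long> g_seq{0};

    // Writer side: stamp, then push to this writer's local queue (not shown).
    Stamped stamp(void* payload) {
        return Stamped{ g_seq.fetch_add(1, std::memory_order_relaxed), payload };
    }

    // Reader side: after draining all writer queues into one vector,
    // sort by sequence number to recover the enqueue order.
    void restore_order(std::vector<Stamped>& drained) {
        std::sort(drained.begin(), drained.end(),
                  [](const Stamped& a, const Stamped& b) { return a.seq < b.seq; });
    }
    ```

    Note the fetch_add itself serializes the writers on that cache line, so this trades one kind of contention for a much cheaper one.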

    gg

  5. #5
    Join Date
    Jun 2002
    Posts
    137

    Re: multiple writers and single read FIFO queue performance

We cannot have the reader thread keep running without waiting, and we want better throughput by reading any enqueued messages as soon as possible. Currently we use pthread_cond_wait() and pthread_cond_signal(). If we have one local queue per writer thread, one writer can block the reader if there are no messages enqueued by that writer thread.

    My second question is related to the above issue as well: if we force the reader to sleep in a while loop, then FIFO order cannot be maintained well.

    If we go back to one queue shared by all writers and the reader, then we are back to the issue we face now: it takes about 2-3 microseconds to enqueue/dequeue a message.

    Currently we call sched_yield() once a writer has enqueued a message, and the reader reads as many messages as possible once it gets the lock.
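    One variant we are considering to avoid the reader blocking on an idle writer: a single shared condition variable across all per-writer queues, so the reader sleeps until *any* writer enqueues (a sketch with assumed names, not our real code):

    ```cpp
    #include <condition_variable>
    #include <deque>
    #include <mutex>
    #include <vector>

    // Each writer owns one queue with its own lock.
    struct WriterQueue {
        std::mutex        lock;
        std::deque<void*> items;
    };

    std::mutex              g_cv_lock;
    std::condition_variable g_cv;
    long                    g_pending = 0;  // total items across all queues

    // Writer: push to its own queue, bump the shared count, wake the reader.
    void enqueue(WriterQueue& q, void* msg) {
        { std::lock_guard<std::mutex> g(q.lock); q.items.push_back(msg); }
        { std::lock_guard<std::mutex> g(g_cv_lock); ++g_pending; }
        g_cv.notify_one();
    }

    // Reader: wait until something is pending anywhere, then sweep
    // every writer queue, so no single idle writer can block it.
    std::vector<void*> drain_all(std::vector<WriterQueue*>& queues) {
        {
            std::unique_lock<std::mutex> g(g_cv_lock);
            g_cv.wait(g, [] { return g_pending > 0; });
            g_pending = 0;
        }
        std::vector<void*> out;
        for (WriterQueue* q : queues) {
            std::lock_guard<std::mutex> g(q->lock);
            while (!q->items.empty()) {
                out.push_back(q->items.front());
                q->items.pop_front();
            }
        }
        return out;
    }
    ```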

  6. #6
    Join Date
    Nov 2003
    Posts
    1,786

    Re: multiple writers and single read FIFO queue performance

It sounds like you may have fallen into premature optimization. Your throughput is only going to be as fast as the reader thread can process the data.

    It also sounds like the increased complexity of a thread-local writer's Q may not be worth the effort. This type of effort improves scalability more than throughput, for highly contended locks.

    You should not be calling sched_yield or sleep anywhere in your code. The OS will take care of context switching on its own.

    If you have a working MPSC (multi-producer, single-consumer) Q now, why do you feel it needs to be any faster? Are you profiling the code under real-world loads, and you don't like what it's giving you now? What is this Q for? What is a real-world load?

    gg

  7. #7
    Join Date
    Jun 2002
    Posts
    137

    Re: multiple writers and single read FIFO queue performance

We are using this queue in a trading platform's backend server.
    The queue itself costs about 6 microseconds to enqueue and dequeue a message; if we want to introduce a new layer into the design using this queue, that cost is a burden.

    If we can reduce the cost to less than 1 microsecond, we will have more freedom to improve the architecture design.

    thanks,
    sandodo

  8. #8
    Join Date
    Nov 2003
    Posts
    1,786

    Re: multiple writers and single read FIFO queue performance

    >> if we want to introduce a new layer into the design by using this queue, the cost is a burden.
    All this isn't really making sense yet. Are you just trying to reduce delays in the producer threads? Or increase end-to-end throughput of "message" processing? How does the code work today - with or without a Q? What is the end goal you are trying to achieve by introducing this Q into the code (or by making changes to how the existing Q works)?

    >> the queue itself will cost about 6 micro-seconds to enqueue and dequeue a message
Performance analysis and measurement is hard enough to get right on its own. What exactly did you measure to come up with "6 micro-seconds"? And how? If there is lock contention, it can easily take far longer than 6 us. Not to mention the cases where std::deque has to go to the heap for memory (while the lock is being held).
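    For example, a micro-benchmark of just the locked operations might look like this (a sketch; single-threaded and uncontended, so it only measures a lower bound - under real contention the tail will be far worse than the mean):

    ```cpp
    #include <chrono>
    #include <deque>
    #include <mutex>

    // Time n uncontended locked push/pop pairs and return the average
    // cost per pair in nanoseconds. A real test should also record the
    // maximum, and run with contending writer threads.
    double avg_pair_ns(int n) {
        std::mutex lock;
        std::deque<int*> q;
        int dummy = 0;

        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < n; ++i) {
            { std::lock_guard<std::mutex> g(lock); q.push_back(&dummy); }
            { std::lock_guard<std::mutex> g(lock); q.pop_front(); }
        }
        auto t1 = std::chrono::steady_clock::now();

        return std::chrono::duration<double, std::nano>(t1 - t0).count() / n;
    }
    ```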

    gg

  9. #9
    Join Date
    Jun 2002
    Posts
    137

    Re: multiple writers and single read FIFO queue performance

    thanks for your replies.

Currently the end-to-end throughput of one message is about 50 microseconds, and the two queues (input_q and outgoing_q) take more than 10 microseconds out of that 50.

    Our target is to maintain around the same per-message throughput while improving scalability when multiple messages are received at the same time.
    To improve scalability, we have to introduce another layer of queues, one for each processor; however, that incurs extra time spent in enqueue and dequeue.

    In our test, we start 8 threads to enqueue the messages and one reader thread to dequeue. It takes about 4 seconds to finish enqueuing 1.6M messages; every message is just a pointer. We are testing on a Red Hat Linux AS5 platform with 8 CPUs.

    std::deque is very fast since its underlying structure is actually an array. One enqueue or dequeue operation on std::deque takes just a few nanoseconds.

    thanks,
    sandodo

  10. #10
    Join Date
    Nov 2003
    Posts
    1,786

    Re: multiple writers and single read FIFO queue performance

    >> throughput of one message end to end is about 50 micro-seconds, two queues (input_q and outgoing_q)
These are global Q's used by all threads? Thread-local Q's? Is the current code even multi-threaded? What does "end to end" even mean? input_q::enqueue -> outgoing_q::enqueue? Or does the time include outgoing_q::dequeue + processing time?

>> we start 8 threads to enqueue the messages and one reader thread to dequeue.
    Why would you funnel all messages down to a single thread? Does the existing code do that today?

    >> it takes about 4 seconds to finish enqueuing 1.6M messages
    So the time it takes for the 1 consumer thread to dequeue and process all the messages doesn't matter?

>> std::deque is very fast since its underlying structure is actually an array.
    Only std::vector uses a contiguous array. std::deque typically uses a collection of same-sized blocks of memory. Any insertion operation may require the std::deque to allocate memory off the heap.

    gg

  11. #11
    Join Date
    Jun 2002
    Posts
    137

    Re: multiple writers and single read FIFO queue performance

End to end means from the time a message is received on the incoming socket until it is broadcast on the outgoing socket.

    And our test model simulates the existing way of using the queue. Of course, we do not process the messages in our test, which is not necessary.

    We found that ZeroMQ is actually faster than our queue, which is fascinating; we will examine the difference to see how. Thanks for all your patience and advice.

    deque is designed as a ring buffer, which is very efficient if there is no capacity change.

    see below:
    Deque sequences have the following properties:
    Individual elements can be accessed by their position index.
    Iteration over the elements can be performed in any order.
    Elements can be efficiently added and removed from any of its ends (either the beginning or the end of the sequence).


    Therefore they provide a similar functionality as the one provided by vectors, but with efficient insertion and deletion of elements also at the beginning of the sequence and not only at its end. On the drawback side, unlike vectors, deques are not guaranteed to have all its elements in contiguous storage locations, eliminating thus the possibility of safe access through pointer arithmetics.
    Vectors are good at:
    Accessing individual elements by their position index (constant time).
    Iterating over the elements in any order (linear time).
    Add and remove elements from its end (constant amortized time).
    vector only guarantees the performance of push_back() and pop_back(), which is not a FIFO.

  12. #12
    Join Date
    Nov 2003
    Posts
    1,786

    Re: multiple writers and single read FIFO queue performance

    >> deque is designed as a ring buffer
No, not really. But if the capacity never grows, then I would expect a high-quality implementation not to go to the heap. But there is nothing in the standard that says it can't.

    Looks like ØMQ may be a good candidate for your situation - since it's tailored to socket streams/messages.

    gg

  13. #13
    Join Date
    Jun 2002
    Posts
    137

    Re: multiple writers and single read FIFO queue performance

    thanks!

    sandodo
