  1. #1
    Join Date: Feb 2009 | Location: Portland, OR | Posts: 1,488

    Trying to resolve a locking thread

    I know this question may sound rather vague, but I don't know how else to put it. I have a process with several worker threads. When the process is closing, I need to shut it down gracefully. I use a well-established, tried-and-true mechanism: a signaling "stop event" that lets all worker threads know it's time to quit, while at the same time waiting for all thread handles to become signaled.

    Here's the code for the waiting part:
    Code:
    //Signal threads that it's time to stop
    SetEvent(hEventStopNow);
    
    //And wait for all threads to be signaled
    DWORD dwR = WaitForMultipleObjects(nThreadCount, pThreadHandles, TRUE, 5000);
    Most of the time the waiting API above returns almost immediately, but maybe 1% of the time it does not; it times out and returns 258 (WAIT_TIMEOUT). The situation is made worse by the following factors:

    1. The process in question is a screen saver
    2. It is running on a client's machine that I don't have access to.

    The worker threads are:

    A. One that does graphics rendering using GDI+
    B. Then another thread that preps graphics for the thread A
    (Access to these objects is synchronized between threads using mutexes.)


    OK, now my question. Is there any way to find out where a thread is locked up when or after a waiting API times out, or which API was called in it last?


    PS. Please note that I'm not asking to check if my process of synchronization between threads is correct, I've done that many times myself.

  2. #2
    Join Date: Nov 2000 | Location: Voronezh, Russia | Posts: 6,620

    Re: Trying to resolve a locking thread

    Is there any way to find out where a thread is locked up when or after a waiting API times out, or which API was called in it last?
    Exhaustive logging may help, but strictly speaking the answer is 'no' (simply because logging itself interferes with thread timing and ultimately distorts the picture).
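    A low-overhead middle ground between no logging and exhaustive logging is a per-thread "breadcrumb": each worker overwrites a single atomic pointer at every interesting step, and when the wait times out the waiting thread reads where the worker last was. This is a sketch under assumptions, in portable C++ (std::async/std::future instead of Win32 thread handles); Breadcrumb and WaitAndDiagnose are hypothetical names. Storing a pointer still perturbs timing a little, just far less than writing to a file.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <string>

// Hypothetical per-thread breadcrumb: the worker overwrites it at each step.
// Only string literals are stored, so the pointed-to text never goes away.
struct Breadcrumb {
    std::atomic<const char*> step{"not started"};
    void mark(const char* s) { step.store(s, std::memory_order_relaxed); }
    const char* last() const { return step.load(std::memory_order_relaxed); }
};

// Waits up to 'timeout' for the worker. Returns "" if it exited in time,
// otherwise the last step the worker recorded before it got stuck.
std::string WaitAndDiagnose(const std::future<void>& worker,
                            const Breadcrumb& crumb,
                            std::chrono::milliseconds timeout) {
    if (worker.wait_for(timeout) == std::future_status::ready)
        return "";
    return crumb.last();
}
```

    In the field, the returned string would go into the same event log entry that currently only records the timeout.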

    PS. Another approach would be running the app in a remote debugging session, but that surely cannot (and mustn't) be done at a client site.

    PPS. It looks as though the threads deadlock each other, so you should review your code for the proper order of freeing resources after signaling shutdown.

    PPPS. Oh, it just crossed my mind: you could also capture a minidump and analyze the thread states in the lab.

    I've done that many times myself.
    I've never trusted statements like this. You know why? It was you who did that many times, yes, but it is the same you who still has trouble with what you did. Please note, this is not criticism, just a little side note.
    Last edited by Igor Vartanov; November 6th, 2010 at 06:37 PM.
    Best regards,
    Igor

  3. #3
    Arjay (Moderator / EX MS MVP, Power Poster)
    Join Date: Aug 2004 | Posts: 13,490

    Re: Trying to resolve a locking thread

    Let's start in reverse order.
    Quote (originally posted by ahmd):
    PS. Please note that I'm not asking to check if my process of synchronization between threads is correct, I've done that many times myself.
    Why put restrictions on what we can address to help you? You say your synchronization is correct, but how do you know (after all, if everything was working you wouldn't be posting)?

    How about posting a snippet of the shared resources and showing how you use the mutex to protect them? Btw, why are you using a mutex anyway?

    Quote (originally posted by ahmd):
    The situation is made worse by the following factors:
    ...
    2. And it is running on a client's machine that I don't have access to.
    Consider logging some trace messages to a file or to the event log for the thread signalling/shutdown operations.

    Quote (originally posted by ahmd):
    OK, now my question. Is there any way to find out where a thread is locked up when or after a waiting API times out, or which API was called in it last?
    When you receive WAIT_TIMEOUT (258), call GetExitCodeThread on each thread handle and check whether it returns the thread's exit code or STILL_ACTIVE (259).
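    A portable analogue of that check, assuming standard C++ futures in place of thread handles (StillRunning is a hypothetical helper, not a Win32 API): wait for all workers against one shared deadline, then report the indexes of those that have not finished. These are the threads for which GetExitCodeThread would report STILL_ACTIVE.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <vector>

// Waits for every worker against a single deadline (like one timed
// WaitForMultipleObjects call) and returns the indexes still running.
std::vector<std::size_t> StillRunning(std::vector<std::future<int>>& workers,
                                      std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    std::vector<std::size_t> stuck;
    for (std::size_t i = 0; i < workers.size(); ++i) {
        if (workers[i].wait_until(deadline) != std::future_status::ready)
            stuck.push_back(i);  // analogous to STILL_ACTIVE
    }
    return stuck;
}
```

    Knowing which of the two threads (renderer or preparer) is the one still running already narrows the search considerably.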

    Quote (originally posted by ahmd):
    Most times the waiting API above returns almost immediately, but maybe 1% of the times it does not and times out with error code 258.
    Post the thread code so we can see how you are waiting for the stop event.

    As an idea of what can go wrong, consider the following code:
    Code:
    while( TRUE )
    {
      switch( WaitForMultipleObjects( .... ) )
      {
      case WAIT_OBJECT_0 + 0:   // Shutdown event
        return 1;
      case WAIT_OBJECT_0 + 1:
        DoGraphicsWork( );
        break;
      // ... more switch cases
      }
    }
    Keep in mind that in the above code, the thread will only return immediately when the stop event is set if it is currently waiting in the WaitFor call.

    If the thread is doing some work (like in the DoGraphicsWork call), it has to return out of that call before it will be able to respond to the stop event.
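    One way to bound that delay, sketched in portable C++ under the assumption that the work can be divided, is to split it into short slices and check a stop flag between them, so the thread reacts within one slice rather than after the whole job. DoWorkInSlices is a hypothetical stand-in for DoGraphicsWork. This does not help if a single slice blocks on a lock that is never released.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for DoGraphicsWork(): checks the stop flag before each
// slice, so shutdown latency is at most one slice. Returns slices completed.
int DoWorkInSlices(const std::atomic<bool>& stop, int totalSlices,
                   std::chrono::milliseconds sliceCost) {
    int done = 0;
    for (; done < totalSlices; ++done) {
        if (stop.load())
            break;                               // stop requested: bail out now
        std::this_thread::sleep_for(sliceCost);  // simulate one slice of work
    }
    return done;
}
```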

    If DoGraphicsWork accesses some shared resource that isn't synchronized properly, another thread may not release the lock (perhaps when it receives its shutdown event), causing this thread to wait indefinitely.

    Showing some code of the shared resource synchronization, and the thread procs will help us help you.

  4. #4
    Join Date: Jul 2002 | Posts: 2,543

    Re: Trying to resolve a locking thread

    Some general points.
    Does this happen only when the stop event is set? Without setting this event, do both threads run smoothly, without deadlocks? If so, check both threads for the case where only one thread is running and the other has already exited.
    hEventStopNow should be a manual-reset event; I guess it is.
    Every wait operation in both threads must include hEventStopNow. If there are some input operations, they should be overlapped.
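    Why manual reset matters can be shown with a small portable sketch. ManualResetEvent is a hypothetical class built on std::condition_variable, standing in for CreateEvent with bManualReset set to TRUE. Once set, it stays signaled, so every thread that waits on it, whether it started waiting before or after the Set, is released. An auto-reset event would release a single waiter and clear itself, leaving the other worker still waiting.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical portable manual-reset event: stays signaled until Reset().
class ManualResetEvent {
public:
    void Set() {
        { std::lock_guard<std::mutex> lk(m_); signaled_ = true; }
        cv_.notify_all();  // wake every waiter, not just one
    }
    void Reset() {
        std::lock_guard<std::mutex> lk(m_);
        signaled_ = false;
    }
    // Returns true if the event is (or becomes) signaled within the timeout.
    bool WaitFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(m_);
        return cv_.wait_for(lk, timeout, [this] { return signaled_; });
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;
};
```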

  5. #5
    Join Date: Apr 2004 | Location: England, Europe | Posts: 2,492

    Re: Trying to resolve a locking thread

    Quote (originally posted by ahmd):
    The worker threads are:

    A. One that does graphics rendering using GDI+
    B. Then another thread that preps graphics for the thread A
    (Access to these objects is synchronized between threads using mutexes.)
    Could using GDI+ from a worker thread be causing problems?
    My hobby projects:
    www.rclsoftware.org.uk

  6. #6
    Join Date: Feb 2009 | Location: Portland, OR | Posts: 1,488

    Re: Trying to resolve a locking thread

    Thank you all, guys, and I apologize for the delay. Let me address your points.

    Quote (originally posted by Igor Vartanov):
    Exhaustive logging operations may help...
    As you also pointed out, Igor, logging can interfere with thread timing. I don't know if I mentioned this, but I could not reproduce this behavior on my system. The only indication that it happens comes from a client's event logs. And as you also said, remote debugging would be quite a cumbersome and probably futile way to address this issue, since it would introduce its own delays. As for minidumps, I have always had a hard time analyzing them.

    Quote (originally posted by Igor Vartanov):
    Never trusted statements like this. You know why?
    I agree, and I came off too pompous there. I added it so that people wouldn't ask me to check the synchronization, which was quite simple and shouldn't have caused any issues by itself. [PS. I'm sure this phrase doesn't help the original statement either.]


    Quote (originally posted by Arjay):
    How about posting a snippet of the shared resources and how you are using the mutex to protect it? Btw, why are you using a mutex anyway?
    I was originally using a critical section, but then I had some doubts that it might be locking up on a possible repeated call or something, so I reworked the synchronization object to wait on a mutex and the stop event together. Here's a code snippet for a synchronized object:
    Code:
    enum CS_SYNCH_OBJECTS{
    	CAE_MUTEX,		//Synchronization mutex
    	CAE_STOP_EVENT,		//Stop event
    
    	CAE_COUNT		//MUST BE LAST!
    };
    
    
    //All calls are unfurled down to bare APIs for readability
    struct SYNCHED_V1_DATA{
    	SYNCHED_V1_DATA()
    	{
    		hCSMutex = ::CreateMutex(NULL, FALSE, NULL);
    		nCntCSObjs = hCSMutex ? 1 : 0;
    		hSynchObject[CAE_MUTEX] = hCSMutex;
    		hSynchObject[CAE_STOP_EVENT] = NULL;	//Set later by Initialize()
    
    		nV1 = 0;
    	}
    	~SYNCHED_V1_DATA()
    	{
    		UnloadAll();
    	}
    	void UnloadAll()
    	{
    		if(hCSMutex)
    			CloseHandle(hCSMutex);
    		hCSMutex = NULL;
    		nCntCSObjs = 0;
    	}
    	BOOL Initialize(HANDLE hStopEvent)
    	{
    		//Initialize the synchronization objects with a "stop event"
    		//RETURN: = TRUE if success
    		if(hStopEvent)
    		{
    			hSynchObject[CAE_STOP_EVENT] = hStopEvent;
    			nCntCSObjs = CAE_STOP_EVENT + 1;
    		}
    
    		return hSynchObject[CAE_STOP_EVENT] && hSynchObject[CAE_MUTEX] && nCntCSObjs == CAE_COUNT;
    	}
    	BOOL Get_nV1(int& v1)
    	{
    		//"GET" method
    		//RETURN: TRUE if stop event is set (thus, need to exit ASAP w/o any further processing)
    		BOOL bR = ::WaitForMultipleObjects(nCntCSObjs, hSynchObject, FALSE, INFINITE) == (WAIT_OBJECT_0 + CAE_STOP_EVENT);
    
    		v1 = nV1;
    
    		ReleaseMutex(hSynchObject[CAE_MUTEX]);
    		return bR;
    	}
    	BOOL Set_nV1(int V1)
    	{
    		//"SET" method
    		//RETURN: TRUE if stop event is set (thus, need to exit ASAP w/o any further processing)
    		BOOL bR = ::WaitForMultipleObjects(nCntCSObjs, hSynchObject, FALSE, INFINITE) == (WAIT_OBJECT_0 + CAE_STOP_EVENT);
    
    		nV1 = V1;
    
    		ReleaseMutex(hSynchObject[CAE_MUTEX]);
    		return bR;
    	}
    
    private:
    	HANDLE hCSMutex;			//Synchronization mutex
    	HANDLE hSynchObject[CAE_COUNT];		//Synchronization objects
    	int nCntCSObjs;				//Number of valid objects in 'hSynchObject'
    
    	int nV1;				//Object being synch'ed
    
    	//Copy constructor and assignments are NOT available!
    	//(Create new object instances instead)
    	//Declared private and never defined (pre-C++11 idiom): copying fails to compile or link
    	SYNCHED_V1_DATA(const SYNCHED_V1_DATA&);
    	SYNCHED_V1_DATA& operator = (const SYNCHED_V1_DATA&);
    };
    Quote (originally posted by Arjay):
    As an idea of what can go wrong, consider the following code:
    Code:
    while( TRUE )
    {
      switch( WaitForMultipleObjects( .... ) )
      {
      case WAIT_OBJECT_0 + 0:   // Shutdown event
        return 1;
      case WAIT_OBJECT_0 + 1:
        DoGraphicsWork( );
        break;
      // ... more switch cases
      }
    }
    Yes, that is exactly how it goes inside each worker thread, and I understand that if execution locks up somewhere within DoGraphicsWork(), it will be a problem. But since it's a graphics rendering thread, everything in DoGraphicsWork() should run fairly quickly, and it should definitely return within the 5-second margin.


    Quote (originally posted by Alex F):
    Does this happen only when stop event is set?
    Yes.
    Quote (originally posted by Alex F):
    hEventStopNow should be manual reset event
    Yes, of course.

    Quote (originally posted by Alex F):
    Every wait operation in both threads must include hEventStopNow
    Yes, of course.

    Quote (originally posted by Zaccheus):
    Could using GDI+ from a worker thread be causing problems?
    Thank you for bringing it up, because that was my main concern. In the back of my head I thought, "what if GDI+ does not support multithreading the way I use it here?" I still don't know the answer, by the way.


    But I think I found a possible culprit for this issue (a less mundane one). I found a SendMessage call in a worker thread that targets the main thread, and that is red flag #1. So I replaced it with PostMessage, and now I have to wait and see whether the glitch happens again. (Which is quite painful by itself.)
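    Why a cross-thread SendMessage is a red flag can be shown with a toy model, assuming the thread that receives the message is the one blocked in WaitForMultipleObjects: SendMessage to another thread does not return until that thread processes the message, and a thread sitting in WaitForMultipleObjects does not process sent messages, so the worker never exits and the wait times out. PostMessage only queues the message and returns. MessageQueue and ShutdownWorker are hypothetical names, and the sketch is portable C++, not the Win32 API.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <future>
#include <mutex>

// Toy model of a thread's message queue (not the Win32 API).
class MessageQueue {
public:
    void Post() {  // like PostMessage: enqueue and return immediately
        std::lock_guard<std::mutex> lk(m_);
        ++enqueued_;
    }
    void Send() {  // like SendMessage: block until the owner handles it
        std::unique_lock<std::mutex> lk(m_);
        const long long mine = ++enqueued_;
        cv_.wait(lk, [&] { return handled_ >= mine; });
    }
    void Pump() {  // owner thread handles everything queued so far
        { std::lock_guard<std::mutex> lk(m_); handled_ = enqueued_; }
        cv_.notify_all();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    long long enqueued_ = 0, handled_ = 0;
};

// The "main thread" waits for the worker without pumping messages.
// Returns true if the worker exited within the timeout.
bool ShutdownWorker(bool workerUsesSend, std::chrono::milliseconds timeout) {
    MessageQueue mainQueue;
    std::future<void> worker = std::async(std::launch::async, [&mainQueue, workerUsesSend] {
        if (workerUsesSend) mainQueue.Send();  // blocks: nobody is pumping
        else                mainQueue.Post();  // returns at once
    });
    const bool exited = worker.wait_for(timeout) == std::future_status::ready;
    // Pump until the worker finishes so the demo itself never hangs.
    while (worker.wait_for(std::chrono::milliseconds(10)) != std::future_status::ready)
        mainQueue.Pump();
    return exited;
}
```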

    On a side note, I want to ask you something that has always confused me, since some system APIs call SendMessage internally. One of my worker threads may call GetWindowLongPtr(GWL_STYLE), SetWindowLongPtr(GWL_STYLE) and SetWindowPos(). Can those APIs call SendMessage internally?

  7. #7
    Join Date: Nov 2000 | Location: Voronezh, Russia | Posts: 6,620

    Re: Trying to resolve a locking thread

    SendMessage can cause a problem only when the message handler gets blocked internally. Otherwise, SendMessage to the main thread is harmless most of the time, since the main thread is typically the last one to free its resources and quit the process. Of course, the main rule for the main thread is: no waiting in the main thread.
    Last edited by Igor Vartanov; November 8th, 2010 at 05:09 PM.
    Best regards,
    Igor

  8. #8
    Arjay (Moderator / EX MS MVP, Power Poster)
    Join Date: Aug 2004 | Posts: 13,490

    Re: Trying to resolve a locking thread

    I'm not sure whether what you posted is pseudo code or close to the real code. Consider:

    Code:
    	BOOL Get_nV1(int& v1)
    	{
    		//"GET" method
    		//RETURN: TRUE if stop event is set (thus, need to exit ASAP w/o any further processing)
    		BOOL bR = ::WaitForMultipleObjects(nCntCSObjs, hSynchObject, FALSE, INFINITE) == (WAIT_OBJECT_0 + CAE_STOP_EVENT);
    
    		v1 = nV1;
    
    		ReleaseMutex(hSynchObject[CAE_MUTEX]);
    		return bR;
    	}
    If the real code is close to what you have posted, then you might have a problem.

    The reason is that although you are checking whether the stop event has been set, you aren't doing anything different if it has. Usually you'll exit a function when the stop event has been set and skip any subsequent operations.

    If the line v1 = nV1; does anything substantial (and waits for other locks to be released), you could have an issue.
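    The early-exit pattern can be sketched portably as follows, under stated assumptions. StopAwareInt is hypothetical, and because standard C++ cannot wait on a mutex and an event at the same time, it polls std::timed_mutex in 10 ms steps and checks the stop flag first. That adds up to roughly 10 ms of latency but keeps the key property: when stopping, the data is not touched and no lock is released that was never acquired. (In the posted Get_nV1, when the wait is satisfied by the stop event alone, the mutex is not owned, yet ReleaseMutex is still called and the value is still read.)

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>

// Hypothetical stop-aware accessor: returns false (and does nothing) once
// shutdown has been requested; otherwise locks, accesses, unlocks via RAII.
class StopAwareInt {
public:
    explicit StopAwareInt(const std::atomic<bool>& stop) : stop_(stop) {}

    bool Get(int& out) {
        std::unique_lock<std::timed_mutex> lk(m_, std::defer_lock);
        if (!Acquire(lk)) return false;  // stopping: 'out' left untouched
        out = value_;
        return true;                     // lk unlocks here
    }
    bool Set(int v) {
        std::unique_lock<std::timed_mutex> lk(m_, std::defer_lock);
        if (!Acquire(lk)) return false;
        value_ = v;
        return true;
    }
private:
    // Checks the stop flag before every lock attempt, so stop takes priority.
    bool Acquire(std::unique_lock<std::timed_mutex>& lk) {
        while (!stop_.load()) {
            if (lk.try_lock_for(std::chrono::milliseconds(10)))
                return true;
        }
        return false;
    }
    const std::atomic<bool>& stop_;
    std::timed_mutex m_;
    int value_ = 0;
};
```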

    Btw, if v1 = nV1 isn't pseudo-code and is in fact a simple integer assignment, then consider getting rid of the mutex synchronization code and using the InterlockedExchange API.
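    For the plain-integer case, the portable counterpart of the Interlocked functions is std::atomic. A minimal sketch, where g_v1 and the accessor names are hypothetical: reads, writes, and an exchange that stores a new value and returns the old one, all without a mutex, so there is nothing to deadlock on.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> g_v1{0};  // hypothetical shared integer

void SetV1(int v) { g_v1.store(v); }
int  GetV1()      { return g_v1.load(); }
// Like InterlockedExchange: atomically stores 'v' and returns the previous value.
int  ExchangeV1(int v) { return g_v1.exchange(v); }
```

    This only covers values that fit in an atomic; compound data such as a CString or a COleDateTime still needs a lock.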

  9. #9
    Join Date: Feb 2009 | Location: Portland, OR | Posts: 1,488

    Re: Trying to resolve a locking thread

    Quote (originally posted by Igor Vartanov):
    And SendMessage itself is harmless most of the times calling main thread which typically is the latest thread freeing its resources and quitting the process...
    Igor, the main thread is where the waiting API is called from. (I probably didn't mention that.)

    Quote (originally posted by Arjay):
    Usually you'll exit a function when stop event has been set and skip any subsequent operations.
    The real code has two ints, one CString and one COleDateTime object; can they lock internally? I doubt the problem comes from me not returning in time. I wrote those API calls that way because the synchronization lives in its own struct and the calls are made in that order.

  10. #10
    Arjay (Moderator / EX MS MVP, Power Poster)
    Join Date: Aug 2004 | Posts: 13,490

    Re: Trying to resolve a locking thread

    Quote (originally posted by ahmd):
    The real code has two ints, one CString and one COleDateTime object; can they lock internally? I doubt the problem comes from me not returning in time. I wrote those API calls that way because the synchronization lives in its own struct and the calls are made in that order.
    The point of having a stop event is to force a thread to cleanly exit early. If you don't exit from the work method before doing the work, what's the point of having a stop event?

    Sure, it may seem trivial now, but what if you add something later that does depend on a shared resource? Why not write robust code that would handle changes?

    I'm actually surprised to see you not using RAII for thread synchronization after all the times that folks here have suggested that you use it.

    With regard to RAII, I'm not sure I've ever seen a threading problem posted on this forum where the OP used RAII for thread sync. On the other hand, I've seen many of the same types of posts where folks run into threading problems when not using RAII.

    I'm not sure, but I think there is a lesson in there somewhere.

    Lastly, I suspect newer parallel libraries will be popular because they hide the details of thread synchronization. The interesting thing is that using RAII with a couple of design patterns makes threading nearly as easy as a parallel library. I guess the trick for either is to get people to use them.
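    A minimal sketch of what RAII buys for thread sync, in portable C++. ScopedLock is a hypothetical hand-rolled guard; a Win32 version would wait on the mutex in the constructor and call ReleaseMutex in the destructor, and std::lock_guard already provides the same thing. The point is that every exit path, including an early return or an exception, releases the lock.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical RAII guard: lock in the constructor, unlock in the destructor.
class ScopedLock {
public:
    explicit ScopedLock(std::mutex& m) : m_(m) { m_.lock(); }
    ~ScopedLock() { m_.unlock(); }
    ScopedLock(const ScopedLock&) = delete;
    ScopedLock& operator=(const ScopedLock&) = delete;
private:
    std::mutex& m_;
};

std::mutex g_lock;
int g_shared = 0;

int UpdateShared(int v) {
    ScopedLock lock(g_lock);
    if (v < 0) throw std::invalid_argument("negative");  // unlocked during unwinding
    if (v == 0) return g_shared;                         // early return: still unlocked
    g_shared = v;
    return g_shared;
}
```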

  11. #11
    Join Date: Feb 2009 | Location: Portland, OR | Posts: 1,488

    Re: Trying to resolve a locking thread

    Arjay, I think I made it clear that I took the WaitForMultipleObjects() and ReleaseMutex() calls from a struct that is written with the RAII principles in mind. I put them this way in the example above to make it more readable. That is actually why ReleaseMutex() is called last (because it is called from a destructor in a real example.)

  12. #12
    Join Date: Oct 2008 | Posts: 1,456

    Re: Trying to resolve a locking thread

    If the stop event and the mutex become signaled during the same wait call, WaitForMultipleObjects will return WAIT_OBJECT_0 + CAE_MUTEX and not WAIT_OBJECT_0 + CAE_STOP_EVENT. From MSDN:

    Quote (from MSDN):
    If bWaitAll is FALSE, the return value minus WAIT_OBJECT_0 indicates the lpHandles array index of the object that satisfied the wait. If more than one object became signaled during the call, this is the array index of the signaled object with the smallest index value of all the signaled objects.
    Therefore, you should change your enum to something like:

    Code:
    enum CS_SYNCH_OBJECTS{
    	CAE_STOP_EVENT,		//Stop event - MUST BE FIRST!
    	CAE_MUTEX,		//Synchronization mutex
    
    	CAE_COUNT		//MUST BE LAST!
    };
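    The rule can be made concrete with a toy model. FirstSignaledIndex and the two enums are hypothetical, not the Win32 API; they mimic the bWaitAll == FALSE behavior quoted above. When both the mutex and the stop event are signaled, index 0 wins, so with the original order the caller sees the mutex and misses the stop request, while with the stop event first the stop request is reported.

```cpp
#include <cassert>
#include <vector>

// Toy model: returns the smallest index whose object is signaled, or -1.
int FirstSignaledIndex(const std::vector<bool>& signaled) {
    for (int i = 0; i < static_cast<int>(signaled.size()); ++i)
        if (signaled[i]) return i;
    return -1;
}

enum OldOrder { OLD_MUTEX, OLD_STOP };  // original: mutex first
enum NewOrder { NEW_STOP, NEW_MUTEX };  // suggested: stop event first
```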
    Last edited by superbonzo; November 9th, 2010 at 04:52 AM. Reason: added MSDN quote

  13. #13
    Join Date: Feb 2009 | Location: Portland, OR | Posts: 1,488

    Re: Trying to resolve a locking thread

    Good point, thanks.

    Still, can someone share their views on this:
    Quote (originally posted by ahmd):
    This was always confusing for me, since some system APIs can call SendMessage themselves internally. One of my worker threads may call GetWindowLongPtr(GWL_STYLE), SetWindowLongPtr(GWL_STYLE) and SetWindowPos(). Can those APIs call SendMessage internally?

  14. #14
    Join Date: Jul 2002 | Posts: 2,543

    Re: Trying to resolve a locking thread

    If an API uses SendMessage internally, MSDN says so; see SetWindowText, for example.

  15. #15
    Arjay (Moderator / EX MS MVP, Power Poster)
    Join Date: Aug 2004 | Posts: 13,490

    Re: Trying to resolve a locking thread

    Quote (originally posted by ahmd):
    Arjay, I think I made it clear that I took the WaitForMultipleObjects() and ReleaseMutex() calls from a struct that is written with the RAII principles in mind. I put them this way in the example above to make it more readable. That is actually why ReleaseMutex() is called last (because it is called from a destructor in a real example.)
    No you didn't make that clear at all.

    Next time, how about writing the pseudo code as
    Code:
    void fn( int nV1 )
    {
      //  RAII lock
      
      V1 = nV1;
    
    }
    As it stands, I'm not really sure that you understand RAII as it applies to thread sync, as I have yet to see an RAII implementation from you.

    I'm not trying to pick on you, but often your posts tend to go round and round because details such as these are left out.

    No worries though, if you understood this, I guess you wouldn't be posting.
