[RESOLVED] application crashes with "real-time signal 1"
I am working on a console application under Linux. It transmits and receives data streams via special communication hardware (developed here in house). I am recording all data in RAM buffers (the communication links are at GBit/sec speed) printing to log files any anomalies after the transmission.
I have a timer which wakes my program every 100 usec to fill / read data to/from DMA buffers.
Here is the problem:
The application works, but from time to time it would crash, just saying "real-time signal 1" on the terminal. This is very annoying and often ruins my test cycles. I am new to signal programming. How do I do this correctly? Do I have to employ some kind of locking within the timer function?
This is my application structure:
Code:
void bar(int sig)
{
}

main() /* well, not really main() but called by main() */
{
    siginfo_t info;
    timer_t m_id = 0;
    int m_signal_control = SIGRTMIN;
    int m_signal_timer = SIGRTMIN + 1;
    struct sigevent timer_info;
    struct itimerspec m_spec;
    struct sigaction foo;
    sigset_t set;

    install_sig_handlers();

    sigemptyset(&set);
    sigaddset(&set, m_signal_control);
    sigaddset(&set, m_signal_timer);

    memset(&timer_info, 0, sizeof(timer_info));
    timer_info.sigev_notify = SIGEV_SIGNAL;
    timer_info.sigev_signo = m_signal_timer;
    timer_create(CLOCK_REALTIME, &timer_info, &m_id);

    m_spec.it_value.tv_sec = 0;
    m_spec.it_value.tv_nsec = app_loop * 1000;
    m_spec.it_interval.tv_sec = m_spec.it_value.tv_sec;
    m_spec.it_interval.tv_nsec = m_spec.it_value.tv_nsec;

    /* initialize other stuff */

    timer_settime(m_id, 0, &m_spec, NULL);

    foo.sa_handler = bar;
    sigemptyset(&foo.sa_mask);
    sigaction(m_signal_timer, &foo, NULL);

    while (!terminate)
    {
        sigwaitinfo(&set, &info);
        /* do stuff */
    }

    sigwaitinfo(&set, &info);
    timer_delete(m_id);

    /* output results */
}
This is the "sig_handler.c" module
Code:
/* set from a signal handler registered to the INT/QUIT/HUP/TERM signals */
volatile int terminate = 0;

/* set from a signal handler registered to the CHLD/PIPE signals */
volatile int terminate_child = 0;

/* Signal handler for the INT/QUIT/HUP/TERM signals */
static void terminate_sig_handler(const int sig)
{
    terminate = 1;
    terminate_child = 1;
}

/* Signal handler for the CHLD/PIPE signals */
static void child_sig_handler(const int sig)
{
    terminate_child = 1;
}

/* Install the INT/HUP/QUIT/TERM and CHLD/PIPE signal handlers. */
void install_sig_handlers(void)
{
    /* register signal handlers for INT/HUP/QUIT/TERM */
    signal(SIGINT, terminate_sig_handler);
    signal(SIGHUP, terminate_sig_handler);
    signal(SIGQUIT, terminate_sig_handler);
    signal(SIGTERM, terminate_sig_handler);

    /* register signal handlers for PIPE/CHLD */
    signal(SIGPIPE, child_sig_handler);
    signal(SIGCHLD, child_sig_handler);
}
Interface to "sig_handler.c":
Code:
#ifndef __SIG_HANDLER_H_
#define __SIG_HANDLER_H_
/* set from a signal handler registered to the INT/QUIT/HUP signals */
extern volatile int terminate;
/* set from a signal handler registered to the CHLD signal */
extern volatile int terminate_child;
void install_sig_handlers(void);
#endif
I am not the originator of this code, so there are some uncertainties about details ...
Any help is appreciated - thank you!!
Johannes
Re: application crashes with "real-time signal 1"
Quote:
Originally Posted by johanneshau
I am working on a console application under Linux. It transmits and receives data streams via special communication hardware (developed here in house). I am recording all data in RAM buffers (the communication links are at GBit/sec speed) printing to log files any anomalies after the transmission.
I don't see any specific synchronization, i.e. no mutexes, semaphores, etc. in your code.
I haven't done real-time programming, but are you sure you can get away with coding something in a "serial" or "single-threaded" manner like you've done, given what you say you are trying to accomplish?
I would expect a program such as this to use some sort of synchronization objects to ensure there are no race conditions, or attempting to enter functions that are not re-entrant/thread-safe, etc. For example, how do you maintain this buffer that reads data every fraction of a second? How do you tell the "reader" when the buffer is ready to be read into? What if the buffer isn't emptied out from the previous cycle, and then you say "read x bytes of data into the buffer"?
Regards,
Paul McKenzie
Re: application crashes with "real-time signal 1"
Quote:
The application works, but from time to time it would crash, just saying "real-time signal 1" on the terminal.
What code is printing this message? I don't see it in the code you posted.
Re: application crashes with "real-time signal 1"
>> What code is printing this message
It's an unhandled signal (SIGRT_1) causing process termination.
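To illustrate the point about unhandled signals, here is a minimal sketch that forks a child, lets it raise a signal with the default disposition, and reports what terminated it. The helper name terminating_signal() is made up for this example and is not part of the original program.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: raise `signo` in a child process that neither
 * handles nor blocks it. Returns the signal number that terminated the
 * child, 0 if the child exited normally, or -1 on error. */
int terminating_signal(int signo)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        sigset_t set;

        /* Make sure the default action applies and the signal is not blocked. */
        signal(signo, SIG_DFL);
        sigemptyset(&set);
        sigaddset(&set, signo);
        sigprocmask(SIG_UNBLOCK, &set, NULL);

        raise(signo);
        _exit(0); /* only reached if the signal did not terminate us */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling terminating_signal(SIGRTMIN + 1) should report that the child was killed by that very signal, which is the situation the shell describes with its "real-time signal 1" message.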
Are you calling seteuid() or setegid()? Apparently, glibc uses SIGRT_1 for process wide credential changes: https://bugzilla.redhat.com/show_bug.cgi?id=473907
Are you using pthreads at all? If so, what kernel and glibc version are you using? How do you link with pthreads when compiling?
gg
Re: application crashes with "real-time signal 1"
Thanks for the replies so far.
Unfortunately I cannot post the complete code, since it is a commercial project.
The application is not (yet) multithreaded. The only other "thread" which is running is the timer function.
It is a test system consisting of several PCs with custom network cards (PCIe) and a custom switch. The networking hardware is developed here in-house and is not programmed with sockets, etc. (no TCP, UDP, IP). The network is a closed configuration which is specified beforehand, and the configuration information is loaded into each node and switch of the network. Each node can have one or more virtual connections to other nodes in the network. The application runs on each PC (Ubuntu Linux, AMD64, kernel 3.2).
The application is structured like this:
Code:
- load the configuration into the hardware
- set up the hardware
- set up the DMA transfers for each virtual connection
- set up the timer and signal handlers
main loop:
- wait for the timer signal
- loop through all Rx virtual connections and receive what is available in the HW buffers and copy to static arrays
- loop through all Tx virtual connections and send test data
When you press CTRL C in the terminal the main loop is terminated (terminate variable is set to 1 in the signal handler).
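For the CTRL C part, a minimal sketch of how the termination flag could be set up with sigaction() instead of signal(), using volatile sig_atomic_t for the flag. The function install_terminate_handler() is a hypothetical stand-in for install_sig_handlers() and handles a single signal only.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* sig_atomic_t is the integer type that may safely be written from a handler. */
volatile sig_atomic_t terminate = 0;

static void terminate_sig_handler(int sig)
{
    (void)sig;
    terminate = 1;
}

/* Hypothetical helper: install the termination handler for one signal.
 * Returns 0 on success, -1 on failure (errno is set by sigaction). */
int install_terminate_handler(int signo)
{
    struct sigaction sa;

    /* Zero the whole struct so sa_flags does not contain stack garbage. */
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = terminate_sig_handler;
    sigemptyset(&sa.sa_mask);

    return sigaction(signo, &sa, NULL);
}
```

Note that a sigwaitinfo() call interrupted by such a handler returns -1 with errno set to EINTR, so the main loop gets a chance to re-check terminate.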
After the main loop the following takes place:
Code:
- delete the timer
- scan through the received data and check for missed/wrong packets, etc.
- write data and status information to log files (one for each virtual connection) for further inspection
this happens with simple fprintf() statements.
- free allocated buffers
- de-initialize the hardware
Since the application is not multithreaded (no threads are created/started) I do not link with the pthreads library. But I do link with the "rt" library.
For communication with the hardware there exists an elaborate API, which provides all necessary functions for buffer management, status information, data transfer, etc.
I am logged in and running the program as root. Do you think I should use a different signal number (SIGRTMIN+3)?
The system is a test system to prove the functionality and reliability of the network. The program code evolved from former projects and customer input. I am an experienced C programmer but do not yet have any experience with signals and these timers in Linux.
Thanks again for your help!
Johannes
Re: application crashes with "real-time signal 1"
I don't see in your code where "timer_info.sigev_value.sival_ptr = &m_id;" is being set. See the sample code here: http://man7.org/linux/man-pages/man2..._create.2.html
You should also add error handling to every single system/function call that returns an error.
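As a rough sketch of what that could look like for the timer setup, with sigev_value.sival_ptr filled in and every call checked. The helper start_periodic_timer() and its parameters are invented for illustration. The caller is assumed to have blocked `signo` or installed a handler for it before calling, otherwise the first expiry terminates the process.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: create and arm a periodic timer that raises `signo`
 * every `period_usec` microseconds (must be > 0).
 * Returns 0 on success, -1 on failure. */
int start_periodic_timer(timer_t *id, int signo, long period_usec)
{
    struct sigevent ev;
    struct itimerspec spec;

    if (id == NULL || period_usec <= 0)
        return -1;

    memset(&ev, 0, sizeof ev);
    ev.sigev_notify = SIGEV_SIGNAL;
    ev.sigev_signo = signo;
    ev.sigev_value.sival_ptr = id; /* lets the receiver identify the timer */

    if (timer_create(CLOCK_REALTIME, &ev, id) == -1) {
        perror("timer_create");
        return -1;
    }

    memset(&spec, 0, sizeof spec);
    spec.it_value.tv_sec = period_usec / 1000000L;
    spec.it_value.tv_nsec = (period_usec % 1000000L) * 1000L;
    spec.it_interval = spec.it_value; /* periodic: reload with the same value */

    if (timer_settime(*id, 0, &spec, NULL) == -1) {
        perror("timer_settime");
        timer_delete(*id);
        return -1;
    }
    return 0;
}
```

On older glibc versions this needs to be linked with -lrt, as the original program already does.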
Have you tried to reproduce the issue using a debug build, running under a debugger?
gg
Re: [RESOLVED] application crashes with "real-time signal 1"
Thank you for the hint!
I made a stripped-down program for better testing. One problem is that the crash happens only sporadically.
After reading more about signals and timers I restructured the code (and added the initialisation as you suggested). It apparently works now, though I am not fully convinced that it really is solved (because I cannot see the real difference to the original code), but certainly the behaviour is improved.
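One possible explanation for why the restructuring helped, offered as a guess since the thread never confirms the root cause: in the listing from the first post, timer_settime() arms the 100 usec timer before sigaction() installs the handler, foo.sa_flags is never initialised, and neither signal is ever blocked although sigwaitinfo() expects the signals it waits for to be blocked. Any expiry that arrives while the default disposition is in effect terminates the process. A minimal sketch of an ordering that avoids this window follows; wait_for_ticks() and its parameters are made up for illustration and stand in for the real main loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Hypothetical demo: consume `n_ticks` timer expirations synchronously.
 * Returns the number of ticks consumed, or -1 on a setup error. */
int wait_for_ticks(int n_ticks, long period_usec)
{
    const int signo = SIGRTMIN + 1;
    const struct timespec zero = { 0, 0 };
    sigset_t set;
    struct sigevent ev;
    struct itimerspec spec;
    timer_t id;
    siginfo_t info;
    int ticks = 0;

    if (period_usec <= 0)
        return -1;

    /* 1. Block the signal first, so it stays pending until sigwaitinfo()
     *    picks it up and can never trigger the default action. */
    sigemptyset(&set);
    sigaddset(&set, signo);
    if (sigprocmask(SIG_BLOCK, &set, NULL) == -1)
        return -1;

    /* 2. Only now create and arm the timer. */
    memset(&ev, 0, sizeof ev);
    ev.sigev_notify = SIGEV_SIGNAL;
    ev.sigev_signo = signo;
    ev.sigev_value.sival_ptr = &id;
    if (timer_create(CLOCK_REALTIME, &ev, &id) == -1)
        return -1;

    memset(&spec, 0, sizeof spec);
    spec.it_value.tv_sec = period_usec / 1000000L;
    spec.it_value.tv_nsec = (period_usec % 1000000L) * 1000L;
    spec.it_interval = spec.it_value;
    if (timer_settime(id, 0, &spec, NULL) == -1) {
        timer_delete(id);
        return -1;
    }

    /* 3. Main loop: no handler is needed for the timer signal. */
    while (ticks < n_ticks) {
        if (sigwaitinfo(&set, &info) == -1) {
            if (errno == EINTR)
                continue; /* interrupted by a handler for another signal */
            break;
        }
        ticks++; /* "do stuff" would go here */
    }

    /* 4. Stop the timer and discard an expiry that may still be pending. */
    timer_delete(id);
    while (sigtimedwait(&set, &info, &zero) > 0)
        ;

    return ticks;
}
```

With this ordering the signal remains blocked for the whole lifetime of the timer, so there is no moment at which an expiry can hit an unprepared process.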
So I am setting this to "resolved" - thank you for your time!
Johannes