This is an archive of the comp.programming.threads FAQ, which used to be hosted by Bil Lewis at the now-defunct lambdacs.com. I believe this is up-to-date as of approximately 2001.
FAQ
This is a list of the questions which have come up on the newsgroup with any answers that were given. (Somewhat edited by yours truly.) In a few cases I have left in the names of the participants. If you'd like me to remove your name, let me know. If you have other comments/corrections, just drop me a line (Bil LambdaCS.com). (Of course I'll expect *you* to supply any corrections! :-)

This list is a bit of a hodge-podge, containing everything that I thought *might* be useful. Hence it is HUGE and not very well edited. It even has duplicates (or worse, near-duplicates). The MFAQ is much smaller and better maintained. You may wish to check there first.

-Bil

==================================================================
       F R E Q U E N T L Y   A S K E D   Q U E S T I O N S
==================================================================

Also see: Brian's FAQ: http://www.serpentine.com/~bos/threads-faq
(Sun's Threads page and FAQ is no more.)

Many of the most general questions can be answered by reading (a) the welcome message, (b) the general information on the other threads pages, and (c) any of the books on threads. References to all of these can be found in the welcome message.

Q1: How fast can context switching be?
Q2: What about special purpose processors?
Q3: What kinds of issues am I faced with in async cancellation?
Q4: When should I use these new thread-safe "_r" functions?
Q5: What benchmarks are there on POSIX threads?
Q6: Has anyone used the Sparc atomic swap instruction?
Q7: Are there MT-safe interfaces to DBMS libraries?
Q8: Why do we need re-entrant system calls?
Q9: Any "code-coverage" tools for MT applications?
Q10: How can POSIX join on any thread?
Q11: What is the UI equivalent for PTHREAD_MUTEX_INITIALIZER?
Q12: How many threads are too many in one heavyweight process?
Q13: Is there an atomic mutex_unlock_and_wait_for_event()?
Q14: Is there an archive of this newsgroup somewhere?
Q15: Can I copy pthread_mutex_t structures, etc.?
Q16: After 1800 calls to thr_create() the system freezes. ??
Q17: Compiling libraries which might be used in threaded or unthreaded apps?
Q18: What's the difference of signal handling for process and thread?
Q19: What about creating large numbers of threads?
Q20: What about using sigwaitinfo()?
Q21: How can I have an MT process communicate with many UP processes?
Q22: Writing Multithreaded code with Sybase CTlib ver 10.x?
Q23: Can we avoid preemption during spin locks?
Q24: What about using spin locks instead of adaptive spin locks?
Q25: Will thr_create(...,THR_NEW_LWP) fail if the new LWP cannot be added?
Q26: Is the LWP released upon bound thread termination?
Q27: What's the difference between pthread FIFO and the Solaris threads scheduling?
Q28: I really think I need time-sliced RR.
Q29: How important is it to call mutex_destroy() and cond_destroy()?
Q30: EAGAIN/ENOMEM etc. apparently aren't in?!
Q31: What can I do about TSD being so slow?
Q32: What happened to the pragma 'unshared' in Sun C?
Q33: Can I profile an MT-program with the debugger?
Q34: Sometimes the specified sleep time is SMALLER than what I want.
Q35: Any debugger that can single step a thread while the others are running?
Q36: Any DOS threads libraries?
Q37: Any Pthreads for Linux?
Q38: Any really basic C code example(s) to get us newbies started?
Q39: Please put some Ada references in the FAQ.
Q40: Which signals are synchronous, and which are asynchronous?
Q41: If we compile -D_REENTRANT, but without -lthread, will we have problems?
Q42: Can Borland C++ for OS/2 give up a TimeSlice?
Q43: Are there any VALID uses of suspension?
Q44: What's the status of pthreads on SGI machines?
Q45: Does the Gnu debugger support threads?
Q46: What is gang scheduling?
Q47: LinuxThreads linked with X11, calls to X11 seg fault.
Q48: Are there Pthreads on NT?
Q49: What about garbage collection?
Q50: Does anyone have any information on thread programming for VMS?
Q51: Any information on the DCE threads library?
Q52: Can I implement pthread_cleanup_push without a macro?
Q53: What switches should be passed to particular compilers?
Q54: How do I find Sun's bug database?
Q55: How do the various vendors' threads libraries compare?
Q56: Why don't I need to declare shared variables VOLATILE?
Q57: Do pthread_cleanup_push/pop HAVE to be macros (thus lexically scoped)?
Q58: Analyzer Fatal Error[0]: Slave communication failure ??
Q59: What is the status of Linux threads?
Q60: The Sunsoft debugger won't recognize my PThreads program!
Q61: How are blocking syscalls handled in a two-level system?
Q62: Can one thread read from a socket while another thread writes to it?
Q63: What's a good way of writing threaded C++ classes?
Q64: Can thread stacks be built in privately mapped memory?
Q66: I think I need a FIFO mutex for my program...
Q67: Why does my multi-threaded X11 app with LinuxThreads crash?
Q68: How would we put a C++ object into a thread?
Q69: How different are DEC threads and Pthreads?
Q70: How can I manipulate POSIX thread IDs?
Q71: I'd like a "write" that allowed a timeout value...
Q72: I couldn't get threads to work with glibc-2.0.
Q73: Can I do dead-owner-process recovery with POSIX mutexes?
Q74: Will IRIX distribute threads immediately to CPUs?
Q75: IRIX pthreads won't use both CPUs?
Q76: Are there thread mutexes, LWP mutexes *and* kernel mutexes?
Q77: Does anyone know of a MT-safe alternative to setjmp and longjmp?
Q78: How do I get more information inside a signal handler?
Q79: Is there a test suite for Pthreads?
Q80: Flushing the Store Buffer vs. Compare and Swap
Q81: How many threads CAN a POSIX process have?
Q82: Can Pthreads wait for combinations of conditions?
Q83: Shouldn't pthread_mutex_trylock() work even if it's NOT PTHREAD_PROCESS_SHARED?
Q84: What about having a NULL thread ID?
Q85: Explain Traps under Solaris
Q86: Is there anything similar to POSIX condition variables in the Win32 API?
Q87: What if a cond_timedwait() times out AND the condition is TRUE?
Q88: How can I recover from a dying thread?
Q89: How to implement POSIX Condition variables in Win32?
Q90: Linux pthreads and X11
Q91: One thread runs too much, then the next thread runs too much!
Q92: How do priority levels work?
Q93: C++ member function as the startup routine for pthread_create().
Q94: Spurious wakeups, absolute time, and pthread_cond_timedwait()
Q95: Conformance with POSIX 1003.1c vs. POSIX 1003.4a?
Q96: Cleaning up when kill signal is sent to the thread?
Q97: C++ new/delete replacement that is thread safe and fast?
Q98: beginthread() vs. endthread() vs. CreateThread? (Win32)
Q99: Using pthread_yield()?
Q100: Why does pthread_cond_wait() reacquire the mutex prior to being cancelled?
Q101: HP-UX 10.30 and threads?
Q102: Signals and threads are not suited to work together?
Q103: Patches in IRIX 6.2 for pthreads support?
Q104: Windows NT Fibers?
Q105: LWP migrating from one CPU to another in Solaris 2.5.1?
Q106: What conditions would cause that thread to disappear?
Q107: What parts, if any, of the STL are thread-safe?
Q108: Do pthreads libraries support cooperative threads?
Q109: Can I avoid mutexes by using globals?
Q110: Aborting an MT Sybase SQL?
Q111: Other MT tools?
Q112: That's not a book. That's a pamphlet!
Q114: How to cleanup TSD in Win32?
Q115: Onyx1 architecture has one problem
Q116: LinuxThreads linked with X11 seg faults.
Q117: Comments about Linux and Threads and X11
Q118: Memory barriers for synchronization
Q119: Recursive mutex debate
Q120: Calling fork() from a thread
Q121: Behavior of [pthread_yield()] sched_yield()
Q122: Behavior of pthread_setspecific()
Q123: Linking under OSF1 3.2: flags and library order
Q124: What is the TID during initialization?
Q125: TSD destructors run at exit time... and if it crashes?
Q126: Cancellation and condition variables
Q127: RedHat 4.2 and LinuxThreads?
Q128: How do I measure thread timings?
Q129: Contrasting Win32 and POSIX thread designs
Q130: What does POSIX say about putting stubs in libc?
Q131: MT GC Issues
Q132: Some details on using CMA threads on Digital UNIX
Q133: When do you need to know which CPU a thread is on?
Q134: Is there any difference between default and static mutex initialization?
Q135: Is there a timer for Multithreaded Programs?
Q136: Roll-your-own Semaphores
Q137: Solaris sockets don't like POSIX_C_SOURCE!
Q138: The Thread ID changes for my thread!
Q139: Does X11 support multithreading?
Q140: Solaris 2 bizarre behavior with usleep() and poll()
Q141: Why is POSIX.1c different w.r.t. errno usage?
Q142: printf() anywhere AFTER pthread_create() crashes on HPUX 10.x
Q143: Pthreads and Linux
Q144: DEC release/patch numbering
Q145: Pthreads (almost) on AS/400
Q146: Can pthreads & UI threads interoperate in one application?
Q147: Thread create timings
Q148: Timing Multithreaded Programs (Solaris)
Q149: A program which monitors CPU usage?
Q150: Standard library functions: what's safe and what's not?
Q151: Where are semaphores in POSIX threads?
Q152: Thread & sproc (on IRIX)
Q153: C++ Exceptions in Multi-threaded Solaris Process
Q154: SCHED_FIFO threads without root privileges?
Q155: "lock-free synchronization"
Q156: Changing single bytes without a mutex
Q157: Mixing threaded/non-threadsafe shared libraries on Digital Unix
Q158: VOLATILE instead of mutexes?
Q159: After pthread_cancel() destructors for local object do not get called?!
Q160: No pthread_exit() in Java.
Q161: Is there any way I can make my stacks red zone protected?
Q162: Cache Architectures, Word Tearing, and VOLATILE
Q163: Can ps display thread names?
Q164: (Not!) Blocking on select() in user-space pthreads.
Q165: Getting functional tests for UNIX98
Q166: To make gdb work with linuxthreads?
Q167: Using cancellation is *very* difficult to do right...
Q168: Why do pthreads implementations differ in error conditions?
Q169: Mixing threaded/non-threadsafe shared libraries on DU
Q170: sem_wait() and EINTR
Q171: pthreads and sprocs
Q172: Why are Win32 threads so odd?
Q173: What's the point of all the fancy 2-level scheduling??
Q174: Using the 2-level model, efficiency considerations, thread-per-X
Q175: Multi-platform threading api
Q176: Condition variables on Win32
Q177: When stack gets destroyed relative to TSD destructors?
Q178: Thousands of mutexes?
Q179: Threads and C++
Q180: Cheating on mutexes
Q181: Is it possible to share a pthread mutex between two distinct processes?
Q182: How should one implement reader/writer locks on files?
Q183: Are there standard reentrant versions of standard nonreentrant functions?
Q184: Detecting the number of cpus
Q185: Drawing to the Screen in more than one Thread (Win32)
Q186: Digital UNIX 4.0 POSIX contention scope
Q187: Dec pthreads under Windows 95/NT?
Q188: DEC current patch requirements
Q189: Is there a full online version of 1003.1c on the web somewhere?
Q190: Why is there no InterlockedGet?
Q191: Memory barrier for Solaris
Q192: pthread_cond_t vs pthread_mutex_t
Q193: Using DCE threads and java threads together on hpux(10.20)
Q194: My program returns enomem on about the 2nd create.
Q195: Does pthread_create set the thread ID before the new thread executes?
Q196: thr_suspend and thr_continue in pthread
Q197: Are there any opinions on the Netscape Portable Runtime?
Q198: Multithreaded Perl
Q199: What if a process terminates before mutex_destroy()?
Q200: If a thread performs an illegal instruction and gets killed by the system...
Q201: How to propagate an exception to the parent thread?
Q202: Discussion: "Synchronously stopping things" / Cheating on Mutexes
Q203: Discussion: Thread creation/switch times on Linux and NT.
Q204: Are there any problems with multiple threads writing to stdout?
Q205: How can I handle out-of-band communication to a remote client?
Q206: I need a timed mutex for POSIX
Q207: Does pthreads have an API for configuring the number of LWPs?
Q208: Why does Pthreads use void** rather than void*?
Q209: Should I use poll() or select()?
Q210: Where is the threads standard of POSIX ????
Q211: Is Solaris' unbound thread model braindamaged?
Q212: Releasing a mutex locked (owned) by another thread.
Q213: Any advice on using gethostbyname_r() in a portable manner?
Q214: Passing file descriptors when exec'ing a program.
Q215: Thread ID of thread getting stack overflow?
Q216: Why aren't my (p)threads preempted?
Q217: Can I compile some modules with and others without _POSIX_C_SOURCE?
Q218: timed wait on Solaris 2.6?
Q219: Signal delivery to Java via native interface
Q220: Concerning timedwait() and realtime behavior.
Q221: pthread_attr_getstacksize on Solaris 2.6
Q222: LinuxThreads: Problem running out of TIDs on pthread_create
Q223: Mutexes and the memory model
Q224: Poor performance of AIO in Solaris 2.5?
Q225: Strategies for testing multithreaded code?
Q226: Threads in multiplatform NT
Q227: Guarantee on condition variable predicate/pthreads?
Q228: Pthread API on NT?
Q229: Sockets & Java2 Threads
Q230: Emulating process shared threads
Q231: TLS in Win32 using MT run-time in dynamically loaded DLLs?
Q232: Multithreaded quicksort
Q233: When to unlock for using pthread_cond_signal()?
Q234: Multi-Read One-Write Locking problem on NT
Q235: Thread-safe version of flex scanner
Q236: POSIX standards, names, etc
Q237: Passing ownership of a mutex?
Q238: NT fibers
Q239: Linux (v.2.0.29 ? Caldera Base)/Threads/KDE
Q240: How to implement user space cooperative multithreading?
Q241: Tools for Java Programming
Q242: Solaris 2.6, pthread_cond_timedwait() wakes up early
Q243: AIX4.3 and PTHREAD problem
Q244: Readers-Writers Lock source for pthreads
Q245: Signal handlers in threads
Q246: Can a non-volatile C++ object be safely shared amongst POSIX threads?
Q247: Single UNIX Specification V2
Q248: Semantics of cancelled I/O (cf: Java)
Q249: Advice on using multithreading in C++?
Q250: Semaphores on Solaris 7 with GCC 2.8.1
Q251: Draft-4 condition variables (HELP)
Q252: gdb + linuxthreads + kernel 2.2.x = fixed :)
Q253: Real-time input thread question
Q254: How does Solaris implement nice()?
Q255: Re: destructors and pthread cancelation...
Q256: A slight inaccuracy WRT OS/2 in Threads Primer
Q257: Searching for an idea
Q258: Benchmark timings from "Multithreaded Programming with Pthreads"
Q259: Standard designs for multithreaded applications?
Q260: Threads and sockets: Stopping asynchronously
Q261: Casting integers to pointers, etc.
Q262: Thread models, scalability and performance
Q263: Write threaded programs while studying Japanese!
Q264: Catching SIGTERM - Linux v Solaris
Q265: pthread_kill() used to direct async signals to thread?
Q266: Don't create a thread per client
Q267: More thoughts on RWlocks
Q268: Is there a way to 'store' a reference to a Java thread?
Q269: Java's pthread_exit() equivalent?
Q270: What is a "Thread Pool"?
Q271: Where did "Thread" come from?
Q272: How do I create threads in a Solaris driver?
Q273: Synchronous signal behavior inconsistent?
Q274: Making FORTRAN libraries thread-safe?
Q275: What is the wakeup order for sleeping threads?
Q276: Upcalls in VMS?
Q277: How to design synchronization variables?
Q278: Thread local storage in DLL?
Q279: How can I tell what version of linux threads I've got?
Q280: C++ exceptions in a POSIX multithreaded application?
Q281: Problems with Solaris pthread_cond_timedwait()?
Q282: Benefits of threading on uni-process
Q283: What if two threads attempt to join the same thread?
Q284: Questions with regards to Linux OS?
Q285: I need to create about 5000 threads?
Q286: Can I catch an exception thrown by a sla
Q287: _beginthread() versus CreateThread()?
Q288: Is there a select() call in Java??
Q289: Comment on use of VOLATILE in the JLS.?
Q290: Should I try to avoid GC by pooling objects myself??
Q291: Does thr_X return errno values? What's errno set to???
Q292: How can I wait on more than one condition variable in one place?
Q293: Details on MT_hot malloc()?
Q294: Bug in Bil's condWait()?
Q295: Is STL considered thread safe??
Q296: To mutex or not to mutex an int global variable ??
Q297: Stack overflow problem ?
Q298: How would you allow the other threads to continue using a "forgotten" lock?
Q299: How unfair are mutexes allowed to be?
Q300: Additionally, what is the difference between -lpthread and -pthread? ?
Q301: Handling C++ exceptions in a multithreaded environment?
Q302: Pthreads on IRIX 6.4 question?
Q303: Threading library design question ?
Q304: Lock Free Queues?
Q305: Threading library design question ?
Q306: Stack size/overflow using threads ?
Q307: correct pthread termination?
Q308: volatile guarantees??
Q309: passing messages, newbie?
Q310: solaris mutexes?
Q311: Spin locks?
Q312: AIX pthread pool problems?
Q313: iostream library and multithreaded programs ?
Q314: Design document for MT appli?
Q315: SCHED_OTHER, and priorities?
Q316: problem with iostream on Solaris 2.6, Sparcworks 5.0?
Q317: pthread_mutex_lock() bug ???
Q318: mix using thread library?
Q319: Re: My agony continues (thread safe gethostbyaddr() on FreeBSD4.0) ?
Q320: OOP and Pthreads?
Q321: query on threading standards?
Q322: multiprocesses vs multithreaded..??
Q323: CGI & Threads?
Q324: Cancelling detached threads (posix threads)?
Q325: Solaris 8 recursive mutexes broken?
Q326: sem_wait bug in Linuxthreads (version included with glibc 2.1.3)?
Q327: pthread_atfork??
Q328: Does anybody know if the GNU Pth library supports process shared mutexes?
Q329: I am trying to make a thread in Solaris to get timer signals.
Q330: How do I time individual threads?
Q331: I'm running out of IPC semaphores under Linux!
Q332: Do I have to abandon the class structure when using threads in C++?
Q333: Questions about pthread_cond_timedwait in linux.
Q334: Questions about using pthread_cond_timedwait.
Q335: What is the relationship between C++ and the POSIX cleanup handlers?
Q336: Does select() work on calls recvfrom() and sendto()?
Q337: libc internal error: _rmutex_unlock: rmutex not held.
Q338: So how can I check whether the mutex is already owned by the calling thread?
Q339: I expected SIGPIPE to be a synchronous signal.
Q340: I have a problem between select() and pthread...
Q341: Mac has Posix threading support.
Q342: Just a few questions on Read/Write for linux.
Q343: The man pages for ioctl(), read(), etc. do not mention MT-safety.
Q344: Status of TSD after fork()?
Q345: Static member function vs. extern "C" global functions?
Q346: Can I kill a thread from the main thread that created it?
Q347: What does /proc expose vis-a-vis LWPs?
Q348: What mechanism can be used to take a record lock on a file?
Q349: Implementation of a Timed Mutex in C++
Q350: Effects that gradual underflow traps have on scaling.
Q351: LinuxThreads woes on SIGSEGV and no core dump.
Q352: On timer resolution in UNIX.
Q353: Starting a thread before main through dynamic initialization.
Q354: Using POSIX threads on mac X and solaris?
Q355: Comments on ccNUMA on SGI, etc.
Q356: Thread functions are NOT C++ functions! Use extern "C"
Q357: How many CPUs do I have?
Q358: Can malloc/free allocate from a specified memory range?
Q359: Can GNU libpth utilize multiple CPUs on an SMP box?
Q360: How does Linux pthreads identify the thread control structure?
Q361: Using gcc -kthread doesn't work?!
Q362: FAQ or tutorial for multithreading in 'C++'?
Q363: WRLocks & starvation.
Q364: Reference for threading on OS/390.
Q365: Timeouts for POSIX queues (mq_timedreceive())
Q366: A subroutine that gives cpu time used for the calling thread?
Q367: Documentation for threads on Linux
Q368: Destroying a mutex that was statically initialized.
Q369: Tools for debugging overwritten data.
Q370: POSIX synchronization is limited compared to win32.
Q371: Anyone recommend us a profiler for threaded programs?
Q372: Coordinating thread timeouts with drifting clocks.
Q373: Which OS has the most conforming POSIX threads implementation?
Q374: MT random number generator function.
Q375: Can the main thread sleep without causing all threads to sleep?
Q376: Is dynamic loading of the libpthread supported in Redhat?
Q377: Are reads and writes atomic?
Q378: More discussion on fork().
Q379: Performance differences: POSIX threads vs. ADA threads?
Q380: Maximum number of threads with RedHat 255?
Q381: Best MT debugger for Windows...
Q382: Thread library with source code ?
Q383: Async cancellation and cleanup handlers.
Q384: How easy is it to use pthreads on win32?
Q385: Does POSIX require two levels of contention scope?
Q386: Creating threadsafe containers under C++
Q387: Cancelling pthread_join() DOESN'T detach target thread?
Q388: Scheduling policies can have different ranges of priorities?
Q389: The entity life modeling approach to multi-threading.
Q390: Is there any (free) documentation?
Q391: Grafting POSIX APIs on Linux is tough!
Q392: Any companies using pthread-win32?
Q393: Async-cancel safe function: guidelines?
Q394: Some detailed discussion of implementations.
Q395: Cancelling a single thread in a signal handler?
Q396: Trouble debugging under gdb on Linux.
Q397: Global signal handler dispatching to threads.
Q398: Difference between the Posix and the Solaris Threads?
Q399: Recursive mutexes are broken in Solaris?
Q400: pthreads and floating point attributes?
Q401: Must SIGSEGV be sent to the thread which generated the signal?
Q402: Windows and C++: How?
Q403: I have blocked all signals and don't get SEGV!
Q404: AsynchronousInterruptedException (AIE) and POSIX cancellation

=================================TOP===============================
Q1: How fast can context switching be?

In general purpose processors (SPARC, MIPS, ALPHA, HP-PA, POWER, x86) a LOCAL thread context switch takes on the order of 50us. A GLOBAL thread context switch takes on the order of 100us. However...
[email protected] (Abdelsalam Heddaya) writes:
>- Certain multi-threaded processor architectures, with special support
>  for on-chip caching of thread contexts can switch contexts in,
>  typically, less than 10 cycles, down to as little as one cycle.

The Tera machine switches with 0 cycles of overhead.

>  Such processors still have to incur a high cost when they run out of
>  hardware contexts and need to perform a full "context swap" with
>  memory.

Hmmm. With 128 contexts/processors and 16 processors on the smallest machine, we may be talking about a rare situation. Many people doubt we'll be able to keep the machine busy, but you propose an embarrassment of riches/parallelism.

In any case, I disagree with the implication that a full context swap is a problem to worry about. We keep up to 2048 threads active at a time, with others confined to memory. The processors issue instructions for the active threads and completely ignore the inactive threads -- there's no swapping of threads between processor and memory in the normal course of execution. Instead, contexts are "swapped" when one thread finishes, or blocks too long, or is swapped to disk, etc. In other words, at fairly significant intervals.

Preston Briggs

=================================TOP===============================
Q2: What about special purpose processors?

What are the distinctions between these special purpose processors and the general purpose processors we're using?

??

=================================TOP===============================
Q3: What kinds of issues am I faced with in async cancellation?

Michael C. Cambria wrote:
>
> In article <[email protected]>, [email protected] (Spike White) wrote:
> [deleted]
> > thread2()
> > {
> >    ...
> >    while(1) {
> >       pthread_setasynccancel(CANCEL_ON);
> >       pthread_testcancel();  /* if there's a pending cancel */
> >       read(...);
> >       pthread_setasynccancel(CANCEL_OFF);
> >       ...process data...
> >    }
> > }
> >
> > Obviously, you shouldn't use any results from the read() call that was
> > cancelled -- God knows what state it was in when it left.
> >
> > That's the only main use I've ever found for async cancel.
>
> I used something quite similar to your example (quoted above) in my
> original question.
>
> Since the read() call itself is not async cancel safe according to Posix,
> is it even safe to do the above? In general for any posix call which is
> not async cancel safe, my guess (and many e-mails to me agree) is to
> just not use it.
>
> Using read() as an example, I'll bet everyone will agree with you not
> to use the results of the read() call. However, the motivation for
> my original question was, being as a call() is not async cancel safe,
> by canceling a thread when it is in one of these calls _may_ screw up
> other threads in general and other threads using the same fd in
> particular. This is why I asked why one would use it.
>
> In your example, if read() did anything with static data, the next read on
> that fd could have problems if a thread was cancelled while in the read().
> (Note: if you don't like the "static data" example, substitute whatever
> you like for the implementation reason for read(), or any call, not being
> async cancel safe. I used static data as an example only.)
>
> Mike

Specifically, NO, it is NOT safe to call read() with async cancel. On some implementations it may work, sometimes. In general, it *MAY* work if, on the particular release of your particular operating system, read() happens to be implemented with no user-mode code (aside from a syscall trap). In most cases, a user mode cancel will NOT be allowed to corrupt kernel data. However, no implementations make any guarantees about their implementation of read(). It may be a syscall in one version and be moved partly into libc in the next version.
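
[Editor's sketch, not part of the original post. A portable way to get the effect the quoted loop was after is to leave the thread in the default DEFERRED mode: POSIX lists read() as a cancellation point, so a pending cancel is acted on while the thread is blocked there. The names reader_thread and cancel_blocked_reader are made up for illustration, and the pipe merely stands in for whatever descriptor the real thread would read.]

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Reader thread: never enables async cancel. It stays in the default
 * DEFERRED mode and relies on read() being a cancellation point. */
static void *reader_thread(void *arg)
{
    int fd = *(int *)arg;
    char buf[256];

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);  /* cancel is acted on here */
        if (n <= 0)
            break;
        /* ...process data... */
    }
    return NULL;
}

/* Starts a reader on an empty pipe, cancels it, and reports whether the
 * thread ended through cancellation: 1 if so, 0 if not, -1 on setup error. */
int cancel_blocked_reader(void)
{
    int fds[2];
    pthread_t tid;
    void *result = NULL;
    int rc;

    if (pipe(fds) != 0)
        return -1;
    if (pthread_create(&tid, NULL, reader_thread, &fds[0]) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    pthread_cancel(tid);               /* request is held until read() */
    rc = pthread_join(tid, &result);
    assert(rc == 0);

    close(fds[0]);
    close(fds[1]);
    return result == PTHREAD_CANCELED;
}
```

[Calling cancel_blocked_reader() from a test program should return 1 on a conforming implementation. If the thread owned resources, pthread_cleanup_push() handlers would be the place to release them.]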
Unfortunately, the OSF DCE porting guide made reference to the possibility of using async cancel in place of synchronous system cancel capability on platforms that don't support the latter. That was really too bad, and it set a very dangerous precedent.

POSIX 1003.1c-1996 encourages all routines to document whether they are async cancel safe. (Luckily the advice is in rationale -- which is to say it's really just commentary and not part of the standard -- because it'd be horrendously difficult to change the documentation of every single routine in a UNIX system.) In practice, you should always assume that a function is NOT async cancel safe unless it says that it IS. And you won't see that very often.

Because, as has already been commented, async cancel really isn't very useful. There is a certain small class of application that can benefit dramatically from async cancel, for good response to shutdown requests in long-running compute-bound threads. In a long and tight loop it's not practical to call pthread_testcancel(). So in cma we provided async cancel for those cases. In retrospect I believe that's probably one of the bad parts of cma, which POSIX should have omitted. There may well have been "hard realtime" people in the room who wanted to use it, though (the POSIX threads standard was developed by roughly 10 "threads people" and 40 to 50 "realtime people").

------------------------------------------------------------------------
Dave Butenhof                              Digital Equipment Corporation
[email protected]                          110 Spit Brook Rd, ZKO2-3/Q18
Phone: 603.881.2218, FAX: 603.881.0120     Nashua, NH 03062-2711
                 "Better Living Through Concurrency"
------------------------------------------------------------------------

> In article <[email protected]>,
> Jose Luis Ramos Morán wrote:
> % pthread_setcancelstate(PTHREAD_CANCEL_ENABLE,NULL);
> % pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS,NULL);
>
> I would guess that your problem comes from this.
> Asynchronous cancellation is almost never a good idea, but if you do
> use it, you should be really careful about whether there's anything
> with possible side-effects in your code. For instance, the C++
> exception handler could be screwed up for your whole process if you
> cancel at a bad moment.
>
> Anyway, try taking out the asynchronous cancellation and see if the problem
> goes with it.

I'll put it a little more strongly than Patrick. The program is illegal. You CANNOT call any function with asynchronous cancel enabled unless that function is explicitly specified as "async-cancel safe". There are very few such functions, and sleep() is not one of them. In fact, within the scope of the POSIX and UNIX98 standards, with async cancel enabled you are allowed only to

  1. Disable asynchronous cancellation (set cancel type to DEFERRED)
  2. Disable cancellation entirely (set cancel state to DISABLE)
  3. Call pthread_cancel() [This is bizarre and pointless, but it is specified in the standard.]

If you call any other function defined by ANSI C, POSIX, or UNIX98 with async cancel enabled, then your program is nonportable and "non conforming". It MAY still be "correct", but only IF you are targeting your code to one specific implementation of the standard that makes the NON-portable and NON-standard guarantee, in writing, that the function you're calling actually is async-cancel safe on that implementation. Otherwise, the program is simply broken.

You can, of course, write your own async-cancel safe functions. It's not that hard to do. In general, like most correct implementations of pthread_cancel(), you simply DISABLE async cancellation on entry and restore the previous setting on exit. But it's silly to do that very often. And, of course, that's not the same as actually allowing async cancel. THAT is a much, much harder job, except for regions of code that own no resources of any kind.
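
[Editor's sketch, not part of the original post. The "disable on entry, restore on exit" pattern described above might look roughly like this. format_record is a hypothetical function invented for the example; the only pthread call used is pthread_setcanceltype(), which is one of the operations the post lists as permitted while async cancel is enabled.]

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "record-<id>" into buf and returns the
 * length snprintf() reports, or -1 if the cancel type cannot be changed.
 * The caller may have async cancel enabled; the function switches to
 * DEFERRED before touching anything and puts the old type back at the end. */
int format_record(char *buf, size_t len, int id)
{
    int old_type;
    int ignored;
    int n;

    /* First action on entry: no async cancel can land after this call. */
    if (pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &old_type) != 0)
        return -1;

    assert(buf != NULL);
    n = snprintf(buf, len, "record-%d", id);   /* not async-cancel safe */

    /* Last action on exit: restore whatever the caller had selected. */
    pthread_setcanceltype(old_type, &ignored);
    return n;
}
```

[With this shape, a caller running with PTHREAD_CANCEL_ASYNCHRONOUS gets that type back when format_record() returns. As the post says, this only makes the function safe to call; it does not make the work inside it interruptible.]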
Asynchronous cancellation was designed for tight CPU-bound loops that make no calls, and therefore would suffer from the need to call pthread_testcancel() on some regular basis in order to allow responsiveness to cancellation requests. That's the ONLY time or place you should EVER even consider using asynchronous cancellation.

/---------------------------[ Dave Butenhof ]--------------------------\
| Compaq Computer Corporation                        [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP===============================
Q4: When should I use these new thread-safe "_r" functions?

David Brownell wrote:
>
> If the "_r" versions are available at all times, use them but
> beware of portability issues. POSIX specifies a pretty minimal
> set and many implementations add more (e.g. gethostbyname_r).
> Some implementations only expose the "_r" versions if you
> compile in a threaded environment, too.
>
> - Dave

POSIX 1003.1c-1995 deliberately separates _POSIX_THREAD_SAFE_FUNCTIONS from _POSIX_THREADS so that they can be easily implemented by non-threaded systems. The "_r" versions aren't just thread-safe, they are also much "cleaner" and more modular than the traditional forms (for example, you can have several independent readdir_r or strtok_r streams active simultaneously).

The grand vision is that all UNIX systems, even those without threads, would of course want to pick up this wonderful new set of interfaces. I doubt you'll see them in any system without threads support, of course, but it would be nice.

=================================TOP===============================
Q5: What benchmarks are there on POSIX threads?
In the book on POSIX.4 by B. Gallmeister there are some very useful POSIX benchmark programs which allow you to measure the real-time performance of an operating system. However there is nothing on the threads of POSIX.4a! Does anybody know of a useful set of benchmark programs on these POSIX threads?

Any help is greatly appreciated.

Markus Joos
CERN ECP/ESS
([email protected])

??

=================================TOP===============================
Q6: Has anyone used the Sparc atomic swap instruction?

Has anyone used the Sparc atomic swap instruction to safely build lists in a multithreaded application? Any examples? Any references?

Yes, but it would not help you if you use sun4c machines. (No atomic instructions.) Thus you would be forced to use atomic in sun4m or later, and spl stuff in sun4c. Does not make a pretty picture. Why not use mutex_lock/unlock and let the libraries worry about that? mutex_lock uses atomic/spl stuff.

Sinan

[sun4c are SPARC v7 machines such as 4/110, SS1, SS1+, SS2, IPC, IPX, EPC, EPX. sun4m are v8 machines including SS10, SS20, SS4, SS5, 4/690, SS1000, SC2000. The UltraSPARC machines are SPARC v8+ (soon to be v9), but have the same instructions as the sun4ms.]

=================================TOP===============================
Q7: Are there MT-safe interfaces to DBMS libraries?

A: In general, no. My current understanding is that NO major DBMS vendor has an MT-safe client-side library. (1996)

Peter Sylvester wrote:
>
> In article <[email protected]>, Andreas Reichenberger
> wrote:
>
> > Richard Moulding wrote:
> > >
> > > I need to interface to an Oracle 7 DB from DCE (non-Oracle)
> > > clients. We are planning to build our own DCE RPC-stored
> > > procedure interface but someone must be selling something to do
> > > this, no?
>
> I have the same problem with an Informix application which uses a DCE
> interface, and currently limit it to 1 thread coming in.
> This works, but
> could be a bottleneck in busy environments, as other incoming RPCs are put
> in a queue (blocked) until the current one finishes.

... stuff deleted

> A potential way around this would be to fork off separate processes which
> then start their own connection to the database. The parent then acts as a
> dispatcher for requests coming in. I know the forking part works without
> DCE, but I suspect that you have to do all the forks before going into the
> DCE server listening mode.
>
> I also thought I heard something about Oracle providing a thread safe
> library, maybe in 7.3. Anyone know?
>
> --
> Peter Sylvester
> MITRE Corp.
> Bedford, MA
> ([email protected])

This is exactly the way we handled the problem. We wrote a tool that
generates the complete dispatcher from an IDL file. The dispatcher (which
is virtually invisible to the clients and to the developers) distributes
the requests from the clients to its 'backends', which are connected to the
DB. The backends are implemented as single-threaded DCE servers with the
interface specified in the IDL file.

We added some features that are not in DCE, like

 - asynchronous RPCs (the RPC returns immediately and the client can ask
   the dispatcher to return the state of the RPC (if it is done or still
   running) or request the RPC to be canceled)

 - dividing the backends into classes, i.e. it's possible to have one
   class of backends for querying the database and another class for
   updates, etc. By assigning 2 backends to the query class and the rest
   of the backends to other classes you can limit the number of concurrent
   queries to 2 (because they are time consuming). The client has to
   specify which class is to be used for an RPC (we currently support up
   to 10 classes).

Context handles are used to tie a client to one backend for transactions
which require more than one RPC to be handled by the same backend
(= DB session).
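
[Editorial sketch, not part of the original posts.] The fork-first,
dispatch-later arrangement discussed in this thread can be illustrated
without DCE. The following is only a minimal sketch under stated
assumptions: the names dispatch(), backend_loop(), handle_request() and
MAX_BACKENDS are hypothetical, requests are plain ints, and the "database
work" is a stand-in that doubles the request. All backends are forked
before any request is served, as the thread suggests.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_BACKENDS 16   /* arbitrary limit for this sketch */

/* Stand-in for real work: a real backend would open its own database
 * session once and run the request against it. */
static int handle_request(int request)
{
    return request * 2;
}

/* Backend process: answer requests until the dispatcher closes our socket. */
static void backend_loop(int fd)
{
    int request, reply;

    while (read(fd, &request, sizeof request) == (ssize_t)sizeof request) {
        reply = handle_request(request);
        if (write(fd, &reply, sizeof reply) != (ssize_t)sizeof reply)
            break;
    }
    _exit(0);
}

/* Fork every backend up front, then hand requests out round-robin and
 * collect one reply per request. Returns 0 on success, -1 on error. */
int dispatch(int nbackends, const int *requests, int nrequests, int *replies)
{
    int fds[MAX_BACKENDS];
    pid_t pids[MAX_BACKENDS];
    int started = 0, rc = 0, i, j;

    if (nbackends < 1 || nbackends > MAX_BACKENDS)
        return -1;

    /* Phase 1: all forks happen before any request is served. */
    for (i = 0; i < nbackends; i++) {
        int sv[2];

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
            rc = -1;
            break;
        }
        pids[i] = fork();
        if (pids[i] < 0) {
            close(sv[0]);
            close(sv[1]);
            rc = -1;
            break;
        }
        if (pids[i] == 0) {
            close(sv[0]);
            for (j = 0; j < i; j++)     /* drop sockets of earlier backends */
                close(fds[j]);
            backend_loop(sv[1]);        /* never returns */
        }
        close(sv[1]);
        fds[i] = sv[0];
        started++;
    }

    /* Phase 2: the parent acts purely as a dispatcher. */
    for (i = 0; rc == 0 && i < nrequests; i++) {
        int fd = fds[i % nbackends];

        if (write(fd, &requests[i], sizeof requests[i])
                != (ssize_t)sizeof requests[i] ||
            read(fd, &replies[i], sizeof replies[i])
                != (ssize_t)sizeof replies[i])
            rc = -1;
    }

    /* Closing a socket makes that backend's read() return 0, so it exits. */
    for (i = 0; i < started; i++) {
        close(fds[i]);
        waitpid(pids[i], NULL, 0);
    }
    return rc;
}
```

This sketch waits for each reply before sending the next request, so it
shows the structure only. A dispatcher that wants the backends to run
concurrently would multiplex the sockets (for example with select())
instead of blocking on one reply at a time.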
The reason why the hell we had to do this anyway was to limit the number of
backend processes necessary to support a few hundred PC clients.

We currently run it on AIX and Digital UNIX with Oracle and Ingres.
However, there's no reason why it shouldn't work on any UNIX platform which
supports OSF DCE (V1.1) and with any DB.

Feel free to contact me for more details...

See 'ya

=================================TOP===============================
Q8: Why do we need re-entrant system calls?

A: [email protected] (Jeffrey P Bradford) wrote:
>Why do we need re-entrant system calls? I know that it's so that
>system calls can be used in a multithreaded environment, but how often
>does one really have multiple threads executing the same system call?
>Do we really need system calls that can be executed by multiple
>threads, or would mutual exclusion be good enough?

Well, there have been some implementations that felt (feel?) that mutual
exclusion is good enough. And, in fact, that will "thread safe" the
functions. But it plays havoc with performance, and things like
cancelability. It turns out that real applications have multiple threads
executing the same system call all the time. read() and write() are
popular, as are send() and recv() (on UNIX).

>I'm assuming that system calls can be designed intelligently enough so
>that, for example, if a process wants to perform a disk read, the
>process performs a system call, exits the system call (so another
>thread can perform a disk read), and then is woken up when the disk
>read is done.
>
>Jeff

[I assume the behavior you reference, "leave the system call", means
"return to user space".]

That all depends on the OS. On UNIX, that is not the default system call
behavior. On VMS it is (just two examples).

Brian Silver.

=================================TOP===============================
Q9: Any "code-coverage" tools for MT applications?

Is there an application that can help me with "code-coverage" for MT
applications?
A: Upon which platform are you working? I did performance profiling last
   week on a MT app using prof & gprof on a Solaris 2.4 machine. For code
   coverage, I use tcov. I suspect that most OS's w/ kernel threads have
   thread-aware gprof and tcov commands.

--
Spike White          | [email protected]   | Biker Nerds
HaL Software Systems | '87 BMW K75S, DoD #1347 | From HaL
Austin, TX           | http://www.halsoft.com/users/spike/index.html
Disclaimer: HaL, want me to speak for you? No, Dave...

=================================TOP===============================
Q10: How can POSIX join on any thread?

The pthread_join() function will not allow you to wait for "any" thread,
like the UI function thr_join() will. How can I get this?

A:

> >: I want to create a number of threads and then wait for the first
> >: one to finish, not knowing which thread will finish first. But
> >: it appears pthread_join() forces me to specify exactly which of
> >: my threads I want to wait for. Am I missing something basic, or
> >: is this a fundamental flaw in pthread_join()?
> >
> >: Rich Stevens
> >
> >Good call. I notice Solaris native threads have this support and the
> >pthreads implementations I've seen don't. I wondered about this myself.
> >
>
> Same here. The situation I ran into was a case where once the main
> created the necessary threads and completed any work it was responsible
> for, it just needed to "hang-around" until all the threads completed
> their work before exiting. pthread_join() for "any" thread in loop using
> a counter for the number of threads seemed the logical choice. Then I
> realized Solaris threads supported this but POSIX didn't (along with
> reader/writer locks). Oh well.
>
> How about the Solaris SPLIT package. Does it support the "wait for any"
> thread join?

This "wait for any" stuff is highly misleading, and dangerous in most real
threaded applications. It is easy to compare with the traditional UNIX
"wait for any process", but there's no similarity.
Processes have a PARENT -- and when a process "waits for any" it is truly
waiting only for its own children. When your shell waits for your "make" it
CANNOT accidentally chomp down on the termination of the "cc" that make
forked off!

This is NOT true with threads, in most of the common industry threading
models (including POSIX 1003.1c-1995 and the "UNIX International" threads
model supported by Solaris). Your thr_join(NULL,...) call may grab the
termination status of a thread used to parallelize an array calculation
within the math library, and thus BREAK the entire application.

Without parent/child relationships, "wait for any" is not only totally
useless, it's outright dangerous. It's like the Win32 "terminate thread"
interface. It may seem "neat" on the surface, but it arbitrarily breaks all
shared data & synchronization invariants in ways that cannot be detected or
repaired, and thus CANNOT be used in anything but a very carefully
constructed "embedded system" type environment where every aspect of the
code is tightly controlled (no third-party libraries, and so forth). The
very limited environments where they are safe & useful are dramatically
outweighed by the danger that having them there (and usually very poorly
explained) encourages their use in inappropriate ways.

It really wouldn't have been hard to devise POSIX 1003.1c-1995 with
parent/child relationships. A relatively small overhead. It wasn't even
seriously considered, because it wasn't done in any of the reference
systems, and certainly wasn't common industry practice. Nevertheless, there
are clearly advantages to "family values" in some situations... among them
being the ability to usefully support "wait for any". But wishful thinking
and a dime gets you one dime...
------------------------------------------------------------------------
Dave Butenhof                              Digital Equipment Corporation
[email protected]                       110 Spit Brook Rd, ZKO2-3/Q18
Phone: 603.881.2218, FAX: 603.881.0120             Nashua, NH 03062-2711
                 "Better Living Through Concurrency"
------------------------------------------------------------------------

I find Dave's comments to be most insightful. He hits on a big point that I
have heard a number of people express confusion about. My 2-bits to add:

As programmers we should be thinking about the availability of resources --
when is something ready for use? "Is the matrix multiply complete?" "Has
the data request been satisfied?" etc. thr_join() is often used as a cheap
substitute for those questions, because we ASSUME that when all N threads
have exited, the computation is complete. (Generally accurate, as long as
we control the entire program. Should some lout get hired to maintain our
code, this assumption could become false in a hurry.)

The only instance where we REALLY care if a thread has exited is when the
resource in question IS that thread (e.g., we want to un-mmap pages we
reserved for the stack or other rare stuff).

So... the correct answer is "Don't do that." Don't use thr_join() to count
threads as they exit. Set up a barrier or a CV and have the threads count
down as they complete their work. I.e.:

worker threads:

        do_work();
        ...
        lock(M);
        running_threads--;
        if (running_threads == 0)
                cond_signal(CV);
        unlock(M);
        thr_exit();

"Master" thread:

        ...
        running_threads = N;
        create_workers(N);
        lock(M);
        while (running_threads != 0)
                cond_wait(CV, M);       /* releases M while waiting */
        unlock(M);
        ...

-Bil

=================================TOP===============================
Q11: What is the UI equivalent for PTHREAD_MUTEX_INITIALIZER?

A: From the man page (man mutex_init):

   Solaris Initialize

   The equivalent Solaris API used to initialize a mutex so that it has
   several different types of behavior is the type argument passed to
   mutex_init().
   No current type uses arg although a future type may specify additional
   behavior parameters via arg. type may be one of the following:

   USYNC_THREAD    The mutex can synchronize threads only in this process.
                   arg is ignored. The USYNC_THREAD Solaris mutex type for
                   process scope is equivalent to the POSIX mutex attribute
                   setting PTHREAD_PROCESS_PRIVATE.

   USYNC_PROCESS   The mutex can synchronize threads in this process and
                   other processes. Only one process should initialize the
                   mutex. arg is ignored. The USYNC_PROCESS Solaris mutex
                   type for process scope is equivalent to the POSIX mutex
                   attribute setting PTHREAD_PROCESS_SHARED. The object
                   initialized with this attribute must be allocated in
                   memory shared between processes, i.e. either in Sys V
                   shared memory (see shmop(2)) or in memory mapped to a
                   file (see mmap(2)). It is illegal to initialize the
                   object this way and to not allocate it in such shared
                   memory.

   Initializing mutexes can also be accomplished by allocating in zeroed
   memory (default), in which case a type of USYNC_THREAD is assumed. The
   same mutex must not be simultaneously initialized by multiple threads. A
   mutex lock must not be re-initialized while in use by other threads.

   If default mutex attributes are used, the macro DEFAULTMUTEX can be used
   to initialize mutexes that are statically allocated.

=================================TOP===============================
Q12: How many threads are too many in one heavyweight process?

How many are too many for a single machine?

A: The answer, of course, is "it depends".

Presumably, the number of threads you're considering far outstrips the
number of processors you have available, so it's not really important
whether you're running on a uni- or a multiprocessor, and it's not really
important (in this general case) whether the threads implementation has any
kernel support (presumably it doesn't on HP-UX, judging by your post from
14 Feb 1996 14:31:42 -0500).
So, it comes down to what these bazillion threads of yours are actually
doing. If, for the most part, they just sit there waiting for someone to
tickle the other end of a socket connection, then you can probably create
LOTS before you hit "too many". In this case it would depend on how much
memory is available to your process, in which to keep all of these sleeping
threads (and how many kernel resources are available to create sockets for
them ;-).

If, on the other hand, every one of these bazillion threads is hammering
away on the processor (trying to compute some fractal or something :-),
then creating any more threads than you have processors is too many. That
is, you waste time (performance, throughput, etc.) in switching back and
forth between the threads which you could be spending on something useful.
Life would be better if you just created a couple of threads and had them
make their way through all the work at hand.

Presumably, your application falls somewhere between the two extremes. The
idea is to design so that your "typical operating conditions" involve a
relatively small number of threads active at any one time. Having extra
ones running isn't a catastrophe, it just means that things aren't quite as
efficient as they otherwise might be.

--
------------------------------------------------------------------------
Webb Scales                                Digital Equipment Corporation
[email protected]                       110 Spit Brook Rd, ZKO2-3/Q18
Voice: 603.881.2196, FAX: 603.881.0120             Nashua, NH 03062-2711
         Rule #12: Be joyful -- seek the joy of being alive.
------------------------------------------------------------------------

=================================TOP===============================
Q13: Is there an atomic mutex_unlock_and_wait_for_event()?

Is it possible for a thread to release a mutex and begin waiting on an
"event" in one atomic operation? I can think of a few convoluted ways to
achieve or simulate this, but am wondering if there's an easy solution that
I'm missing.
A: This isn't how you'd really want to look at things (POSIX). Figure out
   what condition you're interested in and use a CV. (Waiting on a CV
   releases the associated mutex and begins the wait as one atomic step.)

=================================TOP===============

The NT4.0 beta has a new Win32 API, SignalObjectAndWait, that will do what
you want. Sorry, it is not available in 3.51 or earlier.

-John

Robert V. Head

=================================TOP===============================
Q14: Is there an archive of this newsgroup somewhere?

I believe http://www.dejanews.com keeps a 1 year record of every newsgroup
on the Usenet. You can search it by author to get your articles, then pick
out individual threads...

=================================TOP===============================
Q15: Can I copy pthread_mutex_t structures, etc.?

"Ian" == Ian Emmons writes:

In article <[email protected]> Ian Emmons writes:

Ian> Variables of the data type pthread_t are, semantically speaking, a sort of
Ian> reference, in the following sense:

Ian>     pthread_t tid1;
Ian>     pthread_t tid2;
Ian>     void* ret_val;

Ian>     pthread_create(&tid1, NULL, some_function, NULL);
Ian>     // Now tid1 references a new thread.
Ian>     tid2 = tid1;
Ian>     // Now tid2 references the same thread.
Ian>     pthread_join(tid2, &ret_val);

Ian> In other words, after creating the thread, I can assign from one pthread_t
Ian> to another, and they all reference the same thread. Pthread_key_t's (I
Ian> believe) behave the same way.

You should not copy one pthread_t structure to another pthread_t ... it may
not be portable. In some implementations the pthread_t is not simply a
structure containing only a pointer and some keys ... it is in fact the
REAL structure. Copying it would then create two independent structures,
each of which can be manipulated individually, wreaking havoc.

Ian> An attributes object, like pthread_attr_t (or an unnamed semaphore sem_t),
Ian> on the other hand does not behave this way. It has value semantics, because
Ian> you can't copy one into another and expect to have a second valid attribute
Ian> object.
Ian> My question is, do pthread_mutex_t's and pthread_cond_t's behave as
Ian> references or values?

Same statement ... I have seen enough problems where someone copied an
initialized lock and then continued to lock the two mutexes independently,
creating very unwanted behavior.

--
William E. Hannon Jr.                     internet:[email protected]
AIX/DCE Technical Lead                                     whannon@austin
Austin, Texas 78758                                  Department ATKS/9132
Phone:(512)838-3238 T/L(678)
'Confidence is what you had, before you understood the situation.'
                                                               Dr. Dobson

FOLLOWUP: For most programs, you should be passing pointers around, not
structures:

        pthread_mutex_t my_lock = PTHREAD_MUTEX_INITIALIZER;

        void foo(pthread_mutex_t *m)
        {
                pthread_mutex_lock(m);
                ...
                pthread_mutex_unlock(m);
        }

        int main(void)
        {
                ...
                foo(&my_lock);          /* pass the address, never a copy */
                ...
        }

=================================TOP===============================
Q16: After 1800 calls to thr_create() the system freezes. ??

My problem is that the thread does not get freed or released back to the
system for reuse. After 1800 calls to thr_create() the system freezes. ??

A: The default for threads in both UI and POSIX is for threads to be
   "undetached" -- meaning that they MUST be joined (thr_join()). Otherwise
   they will not be garbage collected. Either join them, or create them
   detached (THR_DETACHED in UI, PTHREAD_CREATE_DETACHED in POSIX). (This
   default is the wrong choice. Oh well.)

=================================TOP===============================
Q17: Compiling libraries which might be used in threaded or unthreaded apps?

What *is* the straight scoop on how to compile libraries which might be
used in threaded or unthreaded apps? Hopefully the "errno" and "putc()"
macros will continue to work even if libthread isn't pulled in, so that
vendors can make a single version of any particular library.

A: Always compile *all* libraries with the reentrancy flag (_REENTRANT for
   UI threads, _POSIX_C_SOURCE=199506L for POSIX threads). Otherwise some
   poor soul will try to use your library and get hammered.

   putc() and getc() WILL be slower, but you may use putc_unlocked() &
   getc_unlocked() if you know the I/O stream will be used safely.
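
[Editorial sketch, not part of the original answer.] One way to "know the
I/O stream will be used safely" is to hold the stream lock yourself around
a batch of unlocked calls. The sketch below assumes a POSIX 1003.1c stdio
that provides flockfile(), funlockfile() and putc_unlocked(); the helper
name put_line_locked() is hypothetical and exists only for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199506L   /* the POSIX reentrancy flag named above */
#endif
#include <stdio.h>

/* Hypothetical helper: write s followed by a newline to fp as one unit.
 * Returns the number of characters written, or EOF on error. */
int put_line_locked(FILE *fp, const char *s)
{
    int n = 0;

    flockfile(fp);                      /* take the stream lock once */
    for (; *s != '\0'; s++, n++) {
        if (putc_unlocked(*s, fp) == EOF) {
            funlockfile(fp);
            return EOF;
        }
    }
    if (putc_unlocked('\n', fp) == EOF) {
        funlockfile(fp);
        return EOF;
    }
    funlockfile(fp);                    /* let other threads use the stream */
    return n + 1;
}
```

The design point is that the lock is paid for once per line rather than
once per character, and output from two threads calling this helper on the
same stream cannot interleave within a line.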
   All Solaris libraries are compiled with the reentrancy flag.

=================================TOP===============================
Q18: What's the difference of signal handling for process and thread?

What's the difference in signal handling between a process and a thread?
Are the signals divided into process-based and thread-based types which are
treated differently in HP-RT? Are there any examples? I'd like to know how
to initiate, mask, block, wait for, catch, ... the signals.

How can I set the notification list (process or thread?) of SIGIO for both
socket and tty using fcntl or ioctl?

A: You probably want to buy one of the books that discuss this in detail.
   Here's the short answer:

   Signal masks are per-thread, but signal handlers are per-process. The
   synchronous signals like SIGSEGV, SIGILL etc. will be processed by the
   thread which caused the signal. The other signals will be handled by any
   ready thread which does not have the signal masked.

   There is no special thread library for signal handling.

=================================TOP===============================
Q19: What about creating large numbers of threads?

I've asked a question about creating 2500 unbound threads. During these
days, I have written some more testing programs. Hope you would help me to
solve some more problems.

1. I have written a program that creates 10 threads. Then the 10 threads
   each create 10 more threads. The 100 newly created threads each create
   10 more threads. On a SPARC 2000, if the concurrency level is 100, the
   program takes 7 seconds to terminate. In a paper, unbound thread
   creation is claimed to take only 56 usec. How come my testing program is
   so slow on a SPARC 2000 that has 20 CPUs? If I use a SPARC 10, the
   program only takes 1 second to terminate. Is the SPARC 2000 slower than
   a SPARC 10?

2. Instead of creating 2500 threads, I have written a program that creates
   200 threads and then kills them all and creates 200 threads and kills
   them all and .....
   After some while of creating and killing, the program hangs. I use
   sigaction to set a global signal handler for the whole process. As the
   program is so simple, I don't know where the problem is.

3. In addition, I have written a program that creates 1000 bound threads.
   Each thread has a simple loop:

        while (1) {
                randomly read an entry of an array
        }

   This time, not only does my program hang, the whole SPARC 2000 hangs. I
   can't reset the machine from the console. Finally, I have to power down
   the machine.

Thanks in advance.

A:

=================================TOP===============================
Q20: What about using sigwaitinfo()?

>Here is what I am doing. I am using the early access POSIX threads.
>My main program blocks SIGUSR1 and creates a number of threads.
>One of these threads is dedicated to this signal. All it does is a
>sigwaitinfo on this signal, sets a flag when it returns, and exits.
>If I send the SIGUSR1 signal to the process using the kill command
>from another window, it does not seem to get it and the other threads
>(which are doing a calculation in a loop) report that SIGUSR1 is not
>pending.
>
>An earlier version of the program which used a signal handler to set
>the flag worked perfectly.
>
>Do you have any ideas on this?

A: I assume you are using sigwaitinfo(3r) from libposix4. Unfortunately,
sigwaitinfo() is not MT-safe, i.e. does not work correctly in an MT
program, on 2.3/2.4. Use sigwait(2) - it should work on 2.3/2.4. On 2.5
beta, sigwaitinfo() works.

If you really need the siginfo on 2.3/2.4, it is going to be hard, and the
solution depends on whether you are running 2.3 or 2.4, but here is an
alternative suggestion:

Programmers have used signals between processes as an IPC mechanism. Sounds
like you are trying to do the same.
If this is the case, I would strongly suggest that you use shared memory
(see mmap(2)) between processes and shared memory synchronization (using
the SysV shared semaphores - see semop(2)), or POSIX synchronization
objects with the PTHREAD_PROCESS_SHARED attribute.

For example, you can set up a region of shared memory protected by a mutex
and condition variable. The mutex and condition variable would also be
allocated from the shared memory and would be initialized with the
PTHREAD_PROCESS_SHARED attribute. Now, processes which share this memory
can use the mutex and condition variable as IPC mechanisms - any
information that needs to be passed between them can be passed through the
shared memory (alternative to siginfo :-)).

To make this asynchronous, you can have a thread dedicated to monitoring
the shared memory area by waiting on the condition variable. Now, whenever
the signalling process wants to send a signal, it instead issues a
cond_signal on the condition variable. The thread sleeping on this in the
other (receiving) process wakes up now and processes the information.

In general, signal handlers and threads, even though the system might
support this correctly, should not be used together. Signal handlers could
be looked upon as "substitute threads" when threads were not around in
UNIX, and now that they are, the interactions between them can be
complicated. You should mix them together only if absolutely necessary.

=================================TOP===============================
Q21: How can I have an MT process communicate with many UP processes?

>I have a multithreaded process, each thread in the multithreaded
>process wants to communicate with another single-threaded process,
>what is the good way to do that?
>
>Assume each thread in the multithreaded process is identical, i.e.
>they are generated using the same function call and each thread
>creates a shared memory to do the communication, will the generated
>shared memories operate independently if no synchronization provided?

A: It sounds like you have the right idea. For each thread/process pair,
   build a shared memory segment and use that for communications. You'll
   need some sort of synchronization variable in that shared segment for
   coordination.

   There is no interaction between segments whatsoever.

=================================TOP===============================
Q22: Writing Multithreaded code with Sybase CTlib ver 10.x?

>A customer is trying to write a multi-threaded application that also
>uses Sybase CTlib ver 10.x, and he is facing some limitations due to
>the Sybase library.
>
>BOTTOM LINE: CTlib is reentrant, but according to Sybase is not usable
>in a multi-threaded context. That means it does NOT seem to be usable
>in an MT application.
>
>The purpose of this mail is NOT to get a fix for CTlib, but to try to
>find a workaround, if one exists...

A: The workaround for the moment is to use the XA library routines from
Sybase, which are, in turn, based upon the TransArc package pthread*
routines. We should be getting an alpha version of MT safe/hot CTlib
towards the first part of June 1995.

Also of potential interest: there will be an early version of
native-threaded OpenServer soon as well, which really opens up a lot of
possibilities.

Chris Nicholas
SunSoft Developer Engineering
--------------------------------------------------------------

=================================TOP===============================
Q23: Can we avoid preemption during spin locks?

> A while ago I asked you for information on preemption control
> interfaces (in-kernel) which might be available in Solaris2.x. I am
> looking for ways of lowering number of context switches taken as the
> result of adaptive mutex contention.
> We have a number of places a
> lock is taken and held for a few scant lines of C. It would be great
> to prevent preemption during these sections of code.

A: You're obviously writing a driver of some sort. (Video driver I'd
guess?) And you're VERY concerned with performance on *MP* machines (UPs be
damned). You have tested your code on a standardized, repeatable benchmark,
and you are running into a problem. You have solid numbers which you are
absolutely certain of. Right?

You'll have to excuse my playing the heavy here, but you're talking deep
do-do here, and I don't want to touch it unless I'm totally convinced I
(and you) have to.

You could set the SPL up to turn off all interrupts. It would slow your
code down quite a bit though.

The probability of preemption occurring over "a few scant lines of C"
(i.e., a few dozen instructions) approaches zero. Regularly suffering from
preemption during just these few instructions would be a VERY odd thing. I
am hard pressed to INVENT a situation like this. Are you absolutely,
totally, completely, 100% certain you're seeing this? Are you willing to
put $10 on it?

=================================TOP===============================
Q24: What about using spin locks instead of adaptive spin locks?

>
> I also would like to know more about something I saw in
> /usr/include/sys/mutex.h. It would appear that it is possible to
> create pure spinning locks (MUTEX_SPIN) as opposed to the default
> adaptive mutexes (MUTEX_ADAPTIVE_STAT). These might provide the kind
> of control I am looking for assuming that these are really supported
> and not some bastard orphan left over.

A: If I understand the question, the answer is "no". That's what an
adaptive mutex is for. It optimizes a spin lock to sleep if there's no
value in spinning. If you use a dumb spin lock instead, you are GUARANTEED
to run slower.

=================================TOP===============================
Q25: Will thr_create(...,THR_NEW_LWP) fail if the new LWP cannot be added?
>	Does Sun's implementation of thr_create(...,THR_NEW_LWP) fail
>to create the multiplexed thread if the new LWP cannot be added to the
>multiplexing pool? The unixware docs indicate Novell's implementation
>of thr_create() uses THR_NEW_LWP as a hint to the implementation to
>increase the pool size. They also do not state the behavior if the
>new lwp cannot be created. What is the official statement?

A: It should not create a new thread if it returns EAGAIN. Mind you, you're
fairly unlikely EVER to see this happen in a real program. (You'll see it
in bugs & in testing/design.)

=================================TOP===============================
Q26: Is the LWP released upon bound thread termination?

>	In the sun implementation, if you create a bound
>thread, and the thread eventually terminates, is the LWP released
>upon termination, or upon thr_join with the terminated thread?

A: Yes, a bound thread's LWP is released. This should not affect your
programming at all. Use thr_setconcurrency() & leave it at that.

=================================TOP===============================
Q27: What's the difference between pthread FIFO and the Solaris threads scheduling?

A: Very little.

=================================TOP===============================
Q28: I really think I need time-sliced RR.

>Well, I really think I need time-sliced RR, since I'm making a
>multithreaded implementation of a functional concurrent process-
>oriented language. MT support is needed to get good usage
>of multi CPU machines and better realtime. Today processes are custom
>user-level and the runtime system delivers the scheduling. And the
>language semantic is that processes are timesliced RR.
>Changing the semantic is not realistic. I really hope the pthreads
>will spec RR timeslicing, it would make things easier.

A: Think VERY carefully. When will you ever *REQUIRE* RR scheduling? And
why? Remember, you've never had it ever before, so why now? (There may be a
reason, but it had better be good.)
Scheduling should normally be invisible, and forcing it up to
user-awareness is generally a bad thing.

>For the moment, since this will only be a prototype, bound threads
>will do, but not in a real system with a couple of hundred
>threads/processes.
>
>Convince me I don't need RR timeslicing, that would make things easier.
>Or how do I make my own scheduler in solaris, or should I stay with
>bound threads?

OK. (Let me turn it around.) Give one example of your program which will
fail should thr 3 run before thr 2 where there is absolutely NO
synchronization involved. With arbitrary time-slicing of course. I can't
think of an example myself. (It's just such a weird dependency that I can't
come up with it. But I don't know everything...)

=================================TOP===============================
Q29: How important is it to call mutex_destroy() and cond_destroy()?

Here is how I init several of my threading variables:

        mutex_init( &lock, USYNC_PROCESS, 0 );
        cond_init( &notBusy, USYNC_PROCESS, 0 );

The storage for the variables is in a memory mapped file. Once I have
opened the file, I call unlink to make sure it will be automatically
cleaned up. How important is it to call mutex_destroy() and cond_destroy()?
Will I wind up leaking some space in the kernel if I do not call these
functions?

A:

=================================TOP===============================
Q30: EAGAIN/ENOMEM etc. apparently aren't in ?!

A: 'Course not. :-) They're in errno.h. pthread_create() will return them
   if something goes wrong. Be careful, errno is NOT used by the threads
   calls (they return the error number directly).

=================================TOP===============================
Q31: What can I do about TSD being so slow?
Q32: What happened to the pragma 'unshared' in Sun C?

I read about a pragma 'unshared' for the C-compiler in some Solaris-thread
papers. The new C-3.01 doesn't support the feature anymore, I think. There
is no hint in the Solaris 2.4 Multithread Programming Guide. But the new
TSD is very slow.
I tested a program with direct register allocation under gcc (asm "%g3")
instead of calling the thr_getspecific procedure and it was over three
times faster. Can I do the same thing or something else with the Sun
C-compiler to make the C-3.01 code also faster?

A: The "thread local storage" feature that was mentioned in early papers
about MT on Solaris, and the pragma "unshared", were never implemented. I
know what you mean about the performance of TSD. It isn't very fast. I
think the key here is to try to structure your program so that you don't
rely too much on thread specific data, if that's possible.

The SPARC specification reserves those %g registers for internal use. In
general, it's dangerous to rely on using them in your code. However,
SC3.0.1 does not use the %g registers in any user code. It does use them
internally, but never across function calls, and never in user code. (If
you do use the %g registers across function calls, be sure to save and
restore the registers.)

You can accomplish what gcc does with the "asm" statement by writing what
we call an "inline function template." Take a look at the math library
inline templates for an idea on how to do that, and see the inline() man
page.

You might also want to take a look at the AnswerBook for SPARC Assembly
Language Programming, which is found in the "Solaris 2.x Software Developer
Answerbook". The latest part number for that is

        801-6649-10 SPARC ASSEMBLY LANGUAGE REFERENCE MANUAL REV.A AUG 94

The libm templates are found in /opt/SUNWspro/SC3.0.1/lib/libm.il.

Inline templates are somewhat more work to write, as compared to using
gcc's "asm" feature, but they're safer. I don't know about the robustness
of code that uses "asm" - I like gcc, and I use it, but that particular
feature can lead to interesting bugs.

Our next compiler, SC4.0 (coming out in late 1995) will use the %g
registers more aggressively, for performance reasons.
(Having more registers available to the optimizer lets them do more optimizations.) There will be a documented switch, -noregs=global (or something like that) that you will use to tell the SC4.0 NOT to use the global registers. When you switch to SC4.0, be sure to read the cc(1) man page and look for that switch.

=================================TOP===============================
Q33: Can I profile an MT-program with the debugger?

Can I profile an MT-program with the debugger and a special MT-license or do I need the thread-analyser?

A: The only profiling you can do right now for an MT program is what you get with the ThreadAnalyzer. If you have the MT Debugger and SC3.0.1, then, you should also have a copy of the ThreadAnalyzer (it was first shipped on the same CD that had SC3.0.1). Look for the binary "tha" under /opt/SUNWspro/bin.

The "Collector" feature that you can use from inside the Debugger doesn't work with MT programs. Future MT-aware-profiling tools will be integrated with the Debugger - is that where you'd like to use profiling?

=================================TOP===============================
Q34: Sometimes the specified sleep time is SMALLER than what I want.

>I have a program that generates UDP datagrams at regular intervals.
>It uses real time scheduling for improved accuracy.
>(The code I used is from the Solaris realtime manual.)
>
>This helps, but once in a while I do not get the delay I wanted.
>The specified sleep time is SMALLER (i.e. faster) than what I want.
>
>I use the following procedure for microsecond delays
>
>void
>delay(int us) /* delay in microseconds */
>{
>    struct timeval tv;
>
>    tv.tv_sec = us / 1000000;
>    tv.tv_usec = us % 1000000;
>    (void)select( 0, (fd_set *)NULL, (fd_set *)NULL, (fd_set *)NULL, &tv );
>
>}
>
>
>As I said, when I select a delay, occasionally I get a much smaller delay.
> >examples: > Wanted: 19,776 microseconds, got: 10,379 microseconds > Wanted: 910 microseconds, got: 183 microseconds > > >As you can see, the error is significant when it happens. >It does not happen often. (0.5% of the time) > >I could use the usleep() function, but that's in the UCB library. >Anyone have any advice? A: First of all, you can not do a sleep implementation in any increments other than 10 milliseconds (or 1/HZ variable). Second, there is a bug in the scheduler (fixed in 2.5) that may mess up your scheduling in about 1 schedules around every 300,000 or so. Third, A much better timing interface will be available in Solaris 2.6 (or maybe earlier) thru posix interfaces. That should give you microsecond resolution with less than 50 microseconds latency. Sinan =================================TOP=============================== Q35: Any debugger that single step a thread while the others are running? |> Has anyone looked into the possibility of doing a MT debugger |> that will allow you to single step a thread while the others |> are running? This will probably require a debugger that attaches |> a debugger thread to each thread... A: This was the topic of my master's thesis. You might check: http://plg.uwaterloo.ca/~mkarsten and follow the link to the abstract or the full version. Martin =================================TOP================= We have used breakpoint debugging to debug threads programs. We have implemented a debugger that enables the user to write scripts to debug programs (not limited to threads programs). This is made possible by a Tcl interface atop gdb and hooks in gdb, that exports some basic debugger internals to the user domain. Thus allowing the user to essentially write his own Application Specific debugger. 
Please see the following web page for more info on the debugger http://www.eecs.ukans.edu/~halbhavi/debugger.html or http://www.tisl.ukans.edu/~halbhavi/debugger.html Cheers Sudhir Halbhavi [email protected] =================================TOP=============================== Q36: Any DOS threads libraries? > Is there any way or does anyone have a library that will allow to program > multitreads.. I need it for SVGA mouse functions.. I use both C++ and > Watcom C++, A: I use DesqView for my DOS based multi-thread programs. (Only they don't call them threads, they call them tasks....) I like the DesqView interface to threads better than the POSIX/Solaris interface, but putting up with DOS was almost more than I could stand. =================================TOP=============================== Q37: Any Pthreads for Linux? See: http://pauillac.inria.fr/~xleroy/linuxthreads/ http://sunsite.unc.edu/pub/Linux/docs/faqs/Threads-FAQ/html Linux has kernel-level threads now and has had a thread-safe libc for a while. With LinuxThreads, you don't have to worry about things like your errno, or blocking system calls. The few standard libc functions that are inherently not thread safe (due to using static data areas) have been augmented with thread-safe alternatives. LinuxThreads are not (fully) POSIX, however. ----------------- I'm quite familiar with Xavier's package. He's done an awesome job given what he had to work with. Unfortunately, the holes are large, and his valiant attempts to plug them result in a larger and more complicated user-mode library than should be necessary, without being able to completely resolve the problems. Linux uses clone() which is not "kernel-level threads", though, with some proposed (and possibly pending) extensions in a future version of the kernel, it could become that. Right now, it's just a way to create independent processes that share some resources. 
The most critical missing component is the ability to create multiple executable entities (threads) that share a single PID, thereby making those entities threads rather than processes. Linuxthreads, despite using the "pthread_" prefix, is NOT "POSIX threads" (aka "pthreads") because of the aforementioned substantial and severe shortcoming of the current implementation based on clone(). Without kernel extensions, a clone()-based thread package on Linux cannot come close to conforming to the POSIX specification. The common characterization of Linuxthreads as "POSIX threads" is incorrect and misleading. This most definitely is not "a true pthreads implementation", merely a nonstandard thread package that uses the "pthread" prefix. Note, I'm not saying that's necessarily bad. It supports much of the interface, and unlike user-mode implementations (which also tend to be far more buggy than Linuxthreads), allows the use of multiple processors. Linuxthreads is quite useful despite its substantial deficiencies, and many reasonable programs can be created and ported using it. But it's still not POSIX. =================================TOP=============================== Q38: Any really basic C code example(s) and get us newbies started? >Could one of you threads gods please post some really, really basic C code >example(s) and get us newbies started? There just doesn't seem to be any other >way for us to learn how to program using threads. A: The following is a compilation of all the generous help that was posted or mailed to me concerning the use of threads in introductory programs. I apologize for it not being edited very well... 
(Now I just need time to go through all of these)

Here are all of the URLs:

http://www.pilgrim.umass.edu/pub/osf_dce/contrib/contrib.html
http://www.sun.com/workshop/threads
http://www.Sun.COM/smi/ssoftpress/catalog/books_comingsoon.html
http://www.aa.net/~mtp/

--Carroll

=================================TOP===============================
Q39: Please put some Ada references in the FAQ.

A: Most Ada books will introduce threading concepts. Also, check out Windows Tech Journal, Nov. 95 for more info on this subject.

=================================TOP===============================
Q40: Which signals are synchronous, and which are asynchronous?

>I have another question. Since we must clearly distinguish the
>synchronous signals from the asynchronous ones for MT, is there any
>documentation on which is which? I could not find any.

A: In general, independent of MT, this is an often misunderstood area of signals. The adjectives "synchronous" and "asynchronous" cannot be applied to a signal. This is because any signal (including normally synchronously generated signals such as SIGSEGV) could be asynchronously generated using kill(2), _lwp_kill(2) or thr_kill(3t).

E.g. SIGSEGV, which is normally synchronously generated, can also be sent via kill(pid, SIGSEGV), in which case it is asynchronously generated. So labelling SIGSEGV as synchronous, and writing a program that assumes this, would be incorrect.

For MT, a question is: would a thread that caused the generation of a signal get this signal?

If this is posed for a trap (SIGSEGV, SIGBUS, SIGILL, etc.), the answer is: yes - the thread that caused the trap would get the signal. But the handler for the trap signal, e.g. a SIGSEGV handler, cannot assume that it was invoked for a synchronously generated SIGSEGV (unless the application knows that it could not have received a SIGSEGV via kill() or thr_kill()).
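To make the point about trap handlers concrete: on systems that provide the POSIX sigaction() interface with SA_SIGINFO, a handler can look at si_code to tell a SIGSEGV that was sent with kill() (SI_USER) from one raised by an actual fault. What follows is only a minimal sketch under that assumption, not something taken from the Solaris threads documentation; the names on_segv and segv_sent_by_kill_is_detected are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Written by the handler: 1 if the SIGSEGV was sent by kill(), 0 otherwise. */
static volatile sig_atomic_t last_segv_was_sent = -1;

/* Hypothetical handler: records how the signal was generated.
 * It only stores to a sig_atomic_t, which is async-signal-safe. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    /* SI_USER means "sent by kill()"; a real fault carries a code
     * such as SEGV_MAPERR or SEGV_ACCERR instead. */
    last_segv_was_sent = (info->si_code == SI_USER);
}

/* Installs the handler, sends this process a SIGSEGV with kill(),
 * restores the previous disposition, and returns what the handler saw
 * (1 = asynchronously generated, 0 = trap, -1 = handler did not run). */
int segv_sent_by_kill_is_detected(void)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    last_segv_was_sent = -1;
    kill(getpid(), SIGSEGV);   /* asynchronously generated SIGSEGV */

    sigaction(SIGSEGV, &old, NULL);
    return (int)last_segv_was_sent;
}
```

A handler written this way can refuse to treat a kill()-generated SIGSEGV as a memory fault, which is exactly the assumption the answer warns against making blindly.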
If this question is posed for any other signal (such as SIGPIPE, or the real-time signals) the answer should not really matter since the program should not depend on whether or not the thread that caused the signal to be generated, receives it. For traps, it does matter, but for any other signal, it should not matter. FYI: On 2.4 and earlier releases, SIGPIPE, and some other signals were sent to the thread that resulted in the generation of the signal, but on 2.5, any thread may get the signal. The only signals that are guaranteed to be sent to the thread that resulted in its generation, are the traps (SIGILL, SIGTRAP, SIGSEGV, SIGBUS, SIGFPE, etc.). This change should not matter since a correctly written MT application would not depend on the synchronicity of the signal generation for non-traps, given the above description of signal synchronicity that has always been true. -Devang =================================TOP=============================== Q41: If we compile -D_REENTRANT, but without -lthread, will we have problems? >Hi - > >I had posed a question here a few weeks ago and received a response. Since >then the customer had some follow-on questions. Can anyone address this >customer's questions: > >(note: '>' refers to previous answer we provided customer) > >> If only mutexes are needed to make the library mt-safe, the library writer >> can do the following to enable a single mt-safe library to be used by both >> MT and single-threaded programs: > >Actually, we are only using the *_r(3c) functions, such as strtok_r(3c), >getlogin_r(3c), and ctime_r(3c). We are not actually calling thr_*, >mutex_*, cond_*, etc. in the libraries. > >We want to use these *_r(3c) library functions instead of the normal >non-MT safe versions (such as strtok(), ctime(), etc.), but if we compile >the object files with -D_REENTRANT, but do not link with -lthread, will >we have problems? A: No - you will not have any problems, if you do not link with -lthread. 
But if your library is linked into a program which uses -lthread, then:

You might have problems in a threaded program because of how you allocate and use the buffers that are passed in to the *_r routines. The usage of the *_r routines has to be thread-safe, or re-entrant in the library. The *_r routines take a buffer as an argument. If the library uses a global buffer to be passed to these routines, and does not protect this buffer appropriately, the library would be unsafe in a threaded program.

Note that here, the customer's library has to do one of the following to ensure that their usage of these buffers is re-entrant:

- if possible, allocate the buffers off the stack - this would be per-thread storage and would not require the library to do different things depending on whether the library is linked into a threaded program or not.

- if the above is not possible:

  On any Solaris release, the following may be done (recommended solution):

  - use mutexes, assuming that threads are present, to protect the buffers. If the threads library is not linked in, there are dummy entry points in libc for mutexes which do nothing - and so this will compile correctly and still work. If the threads library is linked in, the mutexes will be real and the buffers will be appropriately protected.

  On Solaris 2.5 only:

  - if you do not want to use mutexes for some reason and want to use thread-specific data (TSD) if threads are present (say), then on 2.4 you cannot do anything. On 2.5, though, one of the following may be done:

    (a) on 2.5, you can use thr_main() to detect if threads are linked in or not. If they are, carry out appropriate TSD allocation of buffers.
    (b) If you are sure only POSIX threads will be used (if at all), and you do not like the non-portability of thr_main(), which is not a POSIX interface, then, on 2.5, you can use the following (hack) to detect if pthreads are linked in or not: you need the #pragma weak declaration so that you can check if a pthreads symbol is present or not. If it is, then pthreads are linked in, otherwise they are not.

Following is a code snippet which demonstrates this. You can compile it both with -lpthread and without. If compiled without -lpthread, it prints out the first print statement. If compiled with -lpthread, it prints out the second print statement. I am not sure if this usage of #pragma weak is any more portable than using thr_main().

    #include <stdio.h>      /* printf */
    #include <pthread.h>    /* declares pthread_create */

    #pragma weak pthread_create

    int main(void)
    {
        /* A weak symbol that was never linked in has address 0. */
        if (pthread_create == 0) {
            printf("libpthread not linked\n");
        } else {
            printf("libpthread is present\n");
            /*
             * In this case, use Thread Specific Data
             * or mutexes to protect access to the global
             * buffers passed to the *_r routines.
             */
        }
        return 0;
    }

-Devang

=================================TOP===============================
Q42: Can Borland C++ for OS/2 give up a TimeSlice?

Johan> Does anyone know if Borland C++ for OS/2 has a function that could be
Johan> used within a THREAD to give up a TimeSlice.

A: If all you want to do is give up your timeslice, use

    DosSleep(0)

However, if you are the highest priority thread, you will be immediately dispatched again, before other threads. Even when all the threads are the same priority, my understanding is that the OS/2 operating system has a degradation algorithm for the threads in a process ... so even if you DosSleep with the "same" priority, your thread still could be dispatched immediately --- depending on the degradation algorithm.

If you want to sleep to the next clock tick,

    DosSleep(1)

works, because the system rounds the 1 up to the next clock tick value. This should allow other threads in your process to be dispatched.
Both are valid semantics, depending on what you would prefer.

--
William E. Hannon Jr.        internet:[email protected]
DCE Threads Development      whannon@austin

=================================TOP===============================
Q43: Are there any VALID uses of suspension?

UI threads, OS/2 and NT all allow you to suspend a thread. I have yet to see a program which does not go below the API (ie debuggers, GCs, etc.), but still uses suspension. I don't BELIEVE there is a valid use. I could be wrong.

A: I'll bite. Whether we "go below the API" or not is for you to decide.

Our product, ObjectStore, is a client-server object-oriented database system. For the purpose of this discussion, it functions like a user-mode virtual memory system: We take a chunk of address space and use it as a window onto a database; if the user touches an address within our special range, we catch the page fault, figure out which database page "belongs" there, and read that page from the server. After putting the page into place, we continue the faulting instruction, which now succeeds, and the user's code need never know that it wasn't there all the time.

This is all fine for a single-threaded application. There's a potential problem for MT applications, however; consider reading a page from a read-only database. Thread A comes along and reads a non-existent page. It faults, the fault enters our handler, and we do the following:

    get data from server
    make page read-write        ;open window
    copy data to page
    make page read-only         ;close window

During the window between the two page operations, another thread can come along and read invalid data from the page, or in fact write the page, with potentially disastrous effect.

On Windows and OS/2, we do the following:

    get data from server
    suspend all other threads
    make page read-write
    copy data to page
    make page read-only
    resume all other threads

to prevent the "window" from opening. On OS/2, we use DosEnterCritSec, which suspends all other threads.
On NT, we use the DllMain routine to keep track of all the threads in the app, and we call SuspendThread on each. We're very careful to keep the interval during which threads are suspended as brief as possible, and on OS/2 we're careful not to call the C runtime while holding the critical section.

On most Unix systems, we don't have to do this, because mmap() has the flexibility to map a single physical page into two or more separate places in the address space. This enables us to do this:

    get data from server
    make read-write alias of page, hidden from user
    copy data to alias page
    make read-only page visible to user

The last operation here is atomic, so there's no opportunity for other threads to see bogus data. There's no equivalent atomic operation on NT or OS/2, at least not one that will operate at page granularity.

=================================TOP==============
Since you do not like Suspend/Resume to be available to user-level APIs, I thought the following set of functions (available to programs) in WinNT (Win32) might catch your interest :) :

CreateRemoteThread -- allows you to start a thread in another process's address space. The other process may not even know you've done it (depending on circumstances). Supposedly, with full security turned on (off by default!) this won't violate C2 security.

SetThreadContext/GetThreadContext - Just like it sounds. You can manipulate a thread's context (CPU registers, selectors, etc!).

Also, you can forcibly map a library (2-3 different ways: CreateRemoteThread can allow this as well) into another process's address space (that is, you can map a DLL of yours into a running process). Then, you can do things like spawn off threads, after you have invisibly mapped your DLL into the space. Yes, there is potential for abuse (and for interesting programs). But Microsoft has a use for these things. They can help you subclass a window on the desktop, for instance.
If you wanted to make, say, Netscape's main window beep twice every time it repaints, you could map a DLL into Netscape's address space, subclass the main window (subclass == "Send ME the window's messages, instead of sending them to the window -- I'll take care of everything!"), and watch for PAINTs to come through.

Anyway, don't mean to waste your time. Just thought you might find it interesting that a user can start additional threads in someone else's process, change thread context forcibly (to a decent degree), and even latch onto a running process in order to change its behavior, or just latch on, period, to run a thread you wrote in another process's address space.

=================================TOP===============================
Q44: What's the status of pthreads on SGI machines?

>> We are considering porting a large application from a Concurrent Computer
>> symmetrical multiprocessor running RTU-6 to one of the Silicon Graphics
>> multiprocessors running IRIX (5.3?).
>>
>> Our application uses threads heavily. Both so-called user threads and
>> kernel threads are required with a fair level of synchronization
>> primitives support and such.
>>
>> My question is: what kind of multi-threaded application programming
>> support is available in IRIX?
>>
>> Reading some of the SGI technical papers available on their WWW page
>> just confuses me. I know that Posix threads or Solaris-type
>> LWP/threads supports would be OK.

A: POSIX thread support in IRIX is more than a rumor - pthreads are currently scheduled to be available shortly after release of IRIX 6.2 (IRIX 6.2 is currently scheduled for release in Feb 96). If you are interested in obtaining pthreads under IRIX as soon as possible, I would recommend contacting your local SGI office.

--
Bruce Johnson, SGI ASD
Real-time Applications Engineering

=================================TOP===============================
Q45: Does the Gnu debugger support threads?
A: An engineer at Cygnus is implementing thread support in gdb for Solaris. No date for completion is given.

=================================TOP===============================
Q46: What is gang scheduling?

A: Gang Scheduling is described in a variety of ways. Generally the consistent thread is that a GS gives a process all the processors at the same time (or none for a time slice). This is most helpful for "scientific apps" because the most common setup is something like

    do i=1, bignum
        stuff
        more stuff
        lots more stuff
    end do

The obvious decomposition is bignum/nproc, statically allocated. Stuff and friends take very close to the same time per chunk, so if you get lucky it all happens in one chime (viz. one big clock). Else it takes precisely N chimes with no leftovers. When unlucky, it's N chimes + cleanup for stragglers.

Virtually all supercomputers do this; they may not even bother to give it a special name. SGI makes this explicit (and supported). On SPARC/Solaris there is no way for the compiler to know if we'll get the processors requested or when. So you can suffer multiple chime losses quite easily.

One can reallocate processor/code on the fly, but with increased overhead.

=================================TOP===============================
Q47: LinuxThreads linked with X11, calls to X11 seg fault.

You can't rely on libraries that are not at the very least compiled with -D_REENTRANT to do anything reasonable with threads. A vanilla X11 build (without -D_REENTRANT and without XTHREADS enabled) will likely behave badly with threads.

It's not terribly hard to build X with thread support these days, especially if you're using libc-6 with builtin LinuxThreads. Contact your Linux distribution maintainer and insist on it. Debian has just switched to a thread-enabled X11 for their libc6 release; has any other distribution?

Bill Gribble

=================================TOP===============================
Q48: Are there Pthreads on Win32?

Several answers here.
#1 is probably the best (most recent!). A: Yes, there is a GNU pthreads library for Win32. It is still under active development, but you can find out more by looking at http://sourceware.cygnus.com/pthreads-win32/ (This is a combination of Ben Elliston & John Bossom's work. & others?) Also: Well, Dave Butenhof will probably kill me for saying this, but Digital has a pthreads implementation for WIN32. I bug them occasionally about packaging up the header and dll and selling it separately (for a reasonable price, of course). I think it's a great idea. My company has products on NT and UNIX, so it would solve some painful portability issues for us. This implementation uses the same "threads engine" that Digital uses, rather than just some wrappers on NT system services. So, maybe if a few potential customers join me in asking Digital for this, we'll get somewhere. What say, Dave? ================ I have such a beast...sort of. I have a pthreads draft 4 wrapper that is (nearly) complete and has been in use for a while (so it seems to work!). About 6 weeks back I changed this code to provide a draft 10 interface. This code has however not yet been fully tested nor folded into my projects. Casting my mind back (a lot has happened in 6 weeks!) I seem to remember one or two small issues where I wasn't sure of the semantics; I was working from a document I picked up at my last job which showed how to migrate from pthreads 4 to pthreads 10, rather than a copy of the standard. If anyone wants this code, I can make it available. Ian [email protected] ================ > > As far as I know, there is no pthreads implementation for NT. However, > > ACE provides a C++ threads wrapper which works on pthreads, and on NT > > (and some others). > > Well, Dave Butenhof will probably kill me for saying this, but Digital has a > pthreads implementation for WIN32. 
I bug them occasionally about packaging up > the header and dll and selling it separately (for a reasonable price, of > course). I think it's a great idea. My company has products on NT and UNIX, > so it would solve some painful portability issues for us. This implementation > uses the same "threads engine" that Digital uses, rather than just some > wrappers on NT system services. Yes, DECthreads has been ported to Win32 for quite a while. It runs on Windows NT 3.51 and 4.0, on Alpha and Intel; and also on Windows 95 (though this was not quite as trivial as Microsoft might wish us to believe.) The main questions are: 1. What's the market? 2. How do we distribute the code, and at what cost? (Not so much "cost to the customer", as "cost to Digital".) The big issue is probably that, especially with existing free code such as ACE, it seems unlikely that there'd be much interest unless it was free or "dirt cheap". Yet, even if we disclaim support, there will still be costs associated, which means it'd be really tricky to avoid losing money. > So, maybe if a few potential customers join me in asking Digital for this, > we'll get somewhere. What say, Dave? We'd love to hear who wants this and why. Although I haven't felt comfortable actually advertising the possibility here, I have forwarded the requests I've seen here, and recieved via mail (including Jeff's) to my manager, who is the first (probably of several) who needs to make any decisions. I'd be glad to forward additional requests. Indications of what sort of product (e.g., in particular, things like "sold for profit" or "internal utility" distinctions), and, of course, whether (and how much) you'd be willing to pay, would be valuable information. /---------------------------[ Dave Butenhof ]--------------------------\ From: Ben Elliston Matthias Block writes: > is there someone who knows anything about a Pthread like library for > Windows NT. It would simplify the work here for me. 
I am involved with a free software project to implement POSIX threads on top of Win32. For the most part, it is complete, but it's still well and truly in alpha testing right now. I expect to be posting an announcement in a few weeks (say, 4?) to comp.programming.threads. The source code will be made available via anonymous CVS for those who want to keep up to date or submit patches. I'm looking forward to getting some net testing! Over the last several months I have seen some requests for a Win32 implementation of PThreads. I, too, had been looking for such an implementation but to no avail. Back in March, I decided to write my own. It is based upon the PThreads 1003.1c standard, however, I didn't implement everything. Missing is signal handling and real-time priority functions. I based the implementation on the description provided by Programming with POSIX Threads, by Dave R. Butenhof I've created a zipped file consisting of some header files, an implib, a DLL and a simple test program. I'm providing this implementation for free and as-is. You may download it from http://www.cyberus.ca/~jebossom/pthread1c.html Cheers, John -- John E. Bossom Cognos Inc. Voice: (613) 738-1338 x3386 O_o P.O. Box 9707, Stn. T FAX: (613) 738-0002 =( )= Ack! OTTAWA, ON K1G 4K9 INET: [email protected] U CANADA =================================TOP=============================== Q49: What about garbage collection? Please, please, please mention garbage collection when you come around to talking about making code multithreaded. A whole lot of heap-allocated data needs to be explicitly reference counted *even more* in a multithreaded program than in a single threaded program (since it is so much harder to determine whether data is live or not), and this leads to lots of bugs and worries and nonsense. With garbage collection, on the other hand, you get to throw away *all* of your worries over memory management. 
This is a tremendous win when your brain is already approaching meltdown due to the strain of debugging subtle race conditions. In addition, garbage collection can help to make the prepackaged libraries you link against safer to play with (although it obviously won't help to make them thread safe). Xt, for example, is very badly written and leaks like a sieve, but a conservative garbage collector will safely kill off those memory leaks. If you're linking against legacy libraries and you need to write a long-running multithreaded server, GC can make the difference between buying more RAM and killing your server every few days so that it doesn't thrash, and simply plugging in the threads-aware GC and sailing fairly happily along. Bryan O'Sullivan [Please see: Geodesic Systems (www.geodesic.com) -Bil] =================================TOP=============================== Q50: Does anyone have any information on thread programming for VMS? No ftp or web stuff, although we do have an HTML version of the Guide to DECthreads and we'll probably try to get it outside the firewall where it'll do y'all some good, one of these days. I've been very impressed with Sun's "thread web site", and I'd like to get Digital moving in that direction to help with the global work of evangelizing threads... but not until I've finished coding, and writing, and consulting, and all sorts of other things that seem to take 500% of my time. For general info, and some information (though not enough) on using POSIX threads, check Sun's library. (They need to start tapering off the UI threads.) If you've got VMS (anything since 5.5-2), you'll have a hardcopy of the Guide in your docset, and on the doc cdrom in Bookreader format. OpenVMS version 7.0 has POSIX 1003.1c-1995 threads -- anything earlier has only the old CMA and 1003.4a/D4 "DCE threads". 
Furthermore, OpenVMS Alpha 7.0 supports SMP threads (kernel support for dynamic "many to few" scheduling), although "binary compatibility paranoia" has set in and it may end up being nearly impossible to use. OpenVMS VAX 7.0 does not have SMP or kernel integration -- integration will probably happen "soon", but VAX will probably never have SMP threads.

------------------------------------------------------------------------
Dave Butenhof                              Digital Equipment Corporation
[email protected]                           110 Spit Brook Rd, ZKO2-3/Q18
Phone: 603.881.2218, FAX: 603.881.0120     Nashua, NH 03062-2711
                 "Better Living Through Concurrency"
------------------------------------------------------------------------

=================================TOP===============================
Q51: Any information on the DCE threads library?

http://www.osf.org/dce/

=================================TOP===============================
Q52: Can I implement pthread_cleanup_push without a macro?

I was about to use pthread_cleanup_push, when I noticed that it is implemented as a macro (on Solaris 2.5) which forces you to have the pthread_cleanup_pop in the same function by having an open brace { at the end of the first macro and closing it in the second...

Since I want to hide most of this stuff in something like a monitor (or a guard in ACE) in C++ by using the push in a constructor and the pop in the destructor, I'm wondering if there is something fundamental that would prevent me from doing so, or could I just re-implement the stuff done by the macros inside some class services?

POSIX 1003.1c-1995 specifies that pthread_cleanup_push and pthread_cleanup_pop must be used at the same lexical scope, "as if" the former were a macro that expands to include an opening brace ("{") and the latter were a macro that expands to include the matching closing brace ("}"). The Solaris 2.5 definition therefore conforms quite accurately to the intent of the standard. And so does the Digital UNIX definition, for that matter.
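For readers who have not used the C binding, here is a small sketch of what "same lexical scope" looks like in practice. It is only an illustration, assuming a POSIX 1003.1c system; release_buffer, worker and run_worker_once are hypothetical names, not part of any library. The push and the pop sit in the same block of the same function, and the nonzero argument to pthread_cleanup_pop asks for the handler to run on the normal (non-cancelled) path as well.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Counts how many times the cleanup handler has run (illustration only). */
static int cleanup_runs = 0;

/* Hypothetical cleanup handler: frees the buffer it is given. */
static void release_buffer(void *p)
{
    free(p);
    cleanup_runs++;
}

/* Thread body: push and pop appear in the same lexical scope,
 * as the standard requires. */
static void *worker(void *arg)
{
    void *buf = malloc(64);

    (void)arg;
    pthread_cleanup_push(release_buffer, buf);
    /* Work that may reach a cancellation point goes here. */
    pthread_testcancel();
    pthread_cleanup_pop(1);   /* nonzero: also run the handler now */
    return NULL;
}

/* Runs one worker to completion and returns the number of times
 * the cleanup handler ran, or -1 if the thread could not be run. */
int run_worker_once(void)
{
    pthread_t t;

    cleanup_runs = 0;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    return cleanup_runs;
}
```

Moving the push into one function and the pop into another (a constructor and a destructor, say) is exactly what the brace-matching macro expansion is meant to rule out.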
If you can get away with "reverse engineering" the contents of the macros, swell; but beware that this would NOT be a service to those using your C++ package, as the results will be extremely non-portable. In fact, no guarantees that it would work on later versions of Solaris, even assuming strict binary compatibility in their implementation -- because they could reasonably make "compatible" changes that would take advantage of various assumptions regarding how those macros are used that you would be violating.

What you want to do has merit, but you have to remember that you're writing in C++, not C. The pthread_cleanup_push and pthread_cleanup_pop macros are the C language binding to the POSIX 1003.1c cancellation cleanup capability. In C++, the correct implementation of this capability is already built into the language... destructors. That is, C++ and threads should be working together to ensure that C++ destructors are run when a thread is cancelled. If that is done, you've got no problem. If it's not done, you've got far worse problems, since you won't be "destructing" most of your objects anyway.

/---[ Dave Butenhof ]-----------------------[ [email protected] ]---\
| Digital Equipment Corporation           110 Spit Brook Rd ZKO2-3/Q18 |
| 603.881.2218, FAX 603.881.0120                  Nashua NH 03062-2698 |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP===============================
Q53: What switches should be passed to particular compilers?

> Does anyone have a list of what switches should be passed to particular
> compilers to have them generate thread-safe code? For example,
>
>   Solaris-2 & SunPro cc          : -D_REENTRANT
>   Solaris-2 & gcc                : ??
>   DEC Alpha OSF 3.2 & /bin/cc    : -threads
>   IRIX 5.x & /bin/cc             : ??
>
> Similarly, what libraries are passed to the linker to link in threads
> support?
>
>   Solaris-2 & Solaris threads    : -lthread
>   DEC Alpha OSF 3.2 threads      : -lpthreads
>   IRIX 5.x & IRIX threads        : (none)
>
> And so forth.
>
> I'm trying to get GNU autoconf to handle threads gracefully.
>
> Bill

That would be useful information in general, I suppose. I can supply the information for Digital UNIX (the operating system previously known as "DEC OSF/1"), at least.

For 3.x and earlier, the proper compiler switch is -threads, which (for cc) is effectively just -D_REENTRANT. For linking, the cc driver expands -threads to "-lpthreads -lmach -lc_r" -- you need all three, immediately preceding -lc (which must be at the end). -lpthreads alone isn't enough: it will pull in libmach and libc_r implicitly and in the wrong order (after libc, where they will fail to preempt symbols).

For 4.0, you can still use -threads if you're using the DCE threads (D4) or cma interfaces. If you don't use -threads, the link libraries should be changed to "-lpthreads -lpthread -lmach -lexc" (before -lc). If you use 1003.1c-1995 threads, you use "-pthread" instead of "-threads". cc still gets -D_REENTRANT, but ld gets -lpthread -lmach -lexc.

/---[ Dave Butenhof ]-----------------------[ [email protected] ]---\
| Digital Equipment Corporation           110 Spit Brook Rd ZKO2-3/Q18 |
| 603.881.2218, FAX 603.881.0120                  Nashua NH 03062-2698 |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP===============================
Q54: How do I find Sun's bug database?

>I am trying to use Thread Analyzer in Solaris 2.4 for performance
>tuning. But after loading the trace directory, tha exit with following
>error message:
>Thread Analyzer Fatal Error[0]: Slave communication failure

It always helps if you state which version of the application you are using, in this case the Thread Analyzer. There have been a number of bugs which result in this error message that have been fixed.
Please obtain the latest ThreadAnalyzer patch from your Authorized Service Provider (ASP) or from our Web page:

http://access1.sun.com/recpatches/DevPro.html

=================================TOP===============================
Q55: How do the various vendors' threads libraries compare?

Fundamentally, they are all based on the same paradigm, and everything you can do in one library you can (pretty much) do in any other. Ease of programming and efficiency will be the major distinctions.

OS                 Preferred Threads  POSIX Version    Kernel Support  Sched model
-----------------  -----------------  ---------------  --------------  -----------
Solaris 2.5        UI-threads         1003.1c-1995     yes             2 level(1)
SVR4.2MP/UW 2.0    UI-threads         No
IRIX 6.1           sproc              No
IRIX 6.2           sproc              1003.1c-1995(2)
Digital UNIX 3.2   cma                Draft 4          yes             1 to 1
Digital UNIX 4.0   1003.1c-1995       1003.1c-1995     yes             2 level
DGUX 5.4           ?                  Draft 6          yes
NEXTSTEP           (cthreads?)        No
AIX 4.1            AIX Threads(3)     Draft 7          yes             1 to 1
Plan 9             rfork()            No
OpenVMS 6.2        cma                Draft 4          no
OpenVMS Alpha 7.0  1003.1c-1995       1003.1c-1995     yes             2 level
OpenVMS VAX 7.0    1003.1c-1995       1003.1c-1995     no
WinNT              Win32 threads      No
OS/2               DosCreateThread()  Draft 4
Win32              Win32 threads      No               yes             1 to 1

Notes:

1) Solaris 2.5 blocks threads in kernel with LWP, but provides a signal to allow user level scheduler to create a new LWP if desired (and thr_setconcurrency() can create additional LWPs to minimize the chances of losing concurrency due to blocking.)

2) According to IRIX 6.2 info on SGI's web, 1003.1c-1995 threads will be provided only as part of the REACT/pro 3.0 Realtime Extensions kit, not in the base O/S.

3) Can anyone clarify this? My impression is that AIX 4.1 favors 1003.4a/D7 threads; but then I've never heard the term "AIX Threads".

=================================TOP===============================
Q56: Why don't I need to declare shared variables VOLATILE?

> I'm concerned, however, about cases where both the compiler and the
> threads library fulfill their respective specifications.
> A conforming
> C compiler can globally allocate some shared (nonvolatile) variable to
> a register that gets saved and restored as the CPU gets passed from
> thread to thread. Each thread will have its own private value for
> this shared variable, which is not what we want from a shared
> variable.

In some sense this is true, if the compiler knows enough about the respective scopes of the variable and the pthread_cond_wait (or pthread_mutex_lock) functions. In practice, most compilers will not try to keep register copies of global data across a call to an external function, because it's too hard to know whether the routine might somehow have access to the address of the data.

So yes, it's true that a compiler that conforms strictly (but very aggressively) to ANSI C might not work with multiple threads without volatile. But someone had better fix it. Because any SYSTEM (that is, pragmatically, a combination of kernel, libraries, and C compiler) that does not provide the POSIX memory coherency guarantees does not CONFORM to the POSIX standard. Period. The system CANNOT require you to use volatile on shared variables for correct behavior, because POSIX says that the POSIX synchronization functions are all that is necessary.

So if your program breaks because you didn't use volatile, that's a BUG. It may not be a bug in C, or a bug in the threads library, or a bug in the kernel. But it's a SYSTEM bug, and one or more of those components will have to work to fix it.

You don't want to use volatile, because, on any system where it makes any difference, it will be vastly more expensive than a proper nonvolatile variable. (ANSI C requires "sequence points" for volatile variables at each expression, whereas POSIX requires them only at synchronization operations -- a compute-intensive threaded application will see substantially more memory activity using volatile, and, after all, it's the memory activity that really slows you down.)
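As a small illustration of the point, the sketch below shares two plain, non-volatile variables between threads and relies only on a mutex and a condition variable for visibility. All names (`state_lock`, `ready_cv`, `ready`, `payload`, `producer`, `wait_for_payload`) are hypothetical, chosen for this example only.

```cpp
#include <cassert>
#include <pthread.h>

// Shared data: deliberately NOT declared volatile.
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready_cv   = PTHREAD_COND_INITIALIZER;
static bool ready   = false;
static int  payload = 0;

// Producer thread: publishes the payload while holding the mutex.
static void *producer(void *) {
    pthread_mutex_lock(&state_lock);
    payload = 42;
    ready = true;
    pthread_cond_signal(&ready_cv);
    pthread_mutex_unlock(&state_lock);
    return nullptr;
}

// Starts the producer, waits on the predicate, and returns what it saw.
int wait_for_payload() {
    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, producer, nullptr);
    assert(rc == 0);
    (void)rc;
    pthread_mutex_lock(&state_lock);
    while (!ready)                        // re-check the predicate on every wakeup
        pthread_cond_wait(&ready_cv, &state_lock);
    int seen = payload;
    pthread_mutex_unlock(&state_lock);
    pthread_join(tid, nullptr);
    return seen;
}
```

Every access to `ready` and `payload` happens with `state_lock` held, so the synchronization calls themselves are what make the producer's writes visible to the waiter.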
/---[ Dave Butenhof ]-----------------------[ [email protected] ]---\
| Digital Equipment Corporation           110 Spit Brook Rd ZKO2-3/Q18 |
| 603.881.2218, FAX 603.881.0120                  Nashua NH 03062-2698 |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP===============================
Q57: Do pthread_cleanup_push/pop HAVE to be macros (thus lexically scoped)?

Paul Pelletier wrote:

I was about to use pthread_cleanup_push, when I noticed that it is implemented as a macro (on Solaris 2.5) which forces you to have the pthread_cleanup_pop in the same function by having an open brace { at the end of the first macro and closing it in the second...

Since I want to hide most of this stuff in something like a monitor (or a guard in ACE) in C++ by using the push in a constructor and the pop in the destructor, I'm wondering if there is something fundamental that would prevent me from doing so, or could I just re-implement the stuff done by the macros inside some class services?

POSIX 1003.1c-1995 specifies that pthread_cleanup_push and pthread_cleanup_pop must be used at the same lexical scope, "as if" the former were a macro that expands to include an opening brace ("{") and the latter were a macro that expands to include the matching closing brace ("}"). The Solaris 2.5 definition therefore conforms quite accurately to the intent of the standard. And so does the Digital UNIX definition, for that matter.

If you can get away with "reverse engineering" the contents of the macros, swell; but beware that this would NOT be a service to those using your C++ package, as the results will be extremely non-portable. In fact, no guarantees that it would work on later versions of Solaris, even assuming strict binary compatibility in their implementation -- because they could reasonably make "compatible" changes that would take advantage of various assumptions regarding how those macros are used that you would be violating.
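One way to get the kind of guard the question describes without touching the macros at all is an ordinary C++ object whose destructor releases the lock. The following is only a sketch under that assumption; `MutexGuard`, `table_lock`, `table_size` and `add_entry` are made-up names. Whether the destructor also runs when the thread is cancelled depends on how the C++ runtime and the threads library cooperate on a given system, and this sketch cannot guarantee that.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical scope-bound guard: locks in the constructor, unlocks in the
// destructor, so the unlock happens on every normal exit path from the scope.
class MutexGuard {
public:
    explicit MutexGuard(pthread_mutex_t &m) : mutex_(m) {
        int rc = pthread_mutex_lock(&mutex_);
        assert(rc == 0);
        (void)rc;
    }
    ~MutexGuard() { pthread_mutex_unlock(&mutex_); }

private:
    MutexGuard(const MutexGuard &);             // not copyable
    MutexGuard &operator=(const MutexGuard &);  // not assignable
    pthread_mutex_t &mutex_;
};

// Hypothetical shared state protected by the guard.
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int table_size = 0;

int add_entry() {
    MutexGuard guard(table_lock);   // locked here
    return ++table_size;            // unlocked when guard goes out of scope
}
```

The guard uses only documented pthread calls, so it does not depend on what the cleanup macros expand to on any particular release.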
What you want to do has merit, but you have to remember that you're writing in C++, not C. The pthread_cleanup_push and pthread_cleanup_pop macros are the C language binding to the POSIX 1003.1c cancellation cleanup capability. In C++, the correct implementation of this capability is already built into the language... destructors. That is, C++ and threads should be working together to ensure that C++ destructors are run when a thread is cancelled. If that is done, you've got no problem. If it's not done, you've got far worse problems, since you won't be "destructing" most of your objects anyway.

/---[ Dave Butenhof ]-----------------------[ [email protected] ]---\

=================================TOP===============================
Q58: Thread Analyzer Fatal Error[0]: Slave communication failure ??

>I am trying to use Thread Analyzer in Solaris 2.4 for performance
>tuning. But after loading the trace directory, tha exit with following
>error message:
>Thread Analyzer Fatal Error[0]: Slave communication failure
>
>I do not know what happened.

It always helps if you state which version of the application you are using, in this case the Thread Analyzer. There have been a number of bugs which result in this error message that have been fixed. Please obtain the latest ThreadAnalyzer patch from your Authorized Service Provider (ASP) or from our Web page:

http://access1.sun.com/recpatches/DevPro.html

Chuck Fisher

=================================TOP===============================
Q59: What is the status of Linux threads?

=================================TOP===============================
Q60: The Sunsoft debugger won't recognize my PThreads program!

Nope. The 3.0.2 version was written before the release of Sun's pthread library. However, if you simply include -lthread on the compile line, it will come up and work. It's a little bit redundant, but works fine.
Hence:

    % cc -o one one.c -lpthread -lthread -lposix4 -g

=================================TOP===============================
Q61: How are blocking syscall handled in a two-level system?

> Martin Cracauer wrote:
> >
> > In a thread system that has both user threads and LWPs like Solaris,
> > how are blocking syscall handled?
>
> Well, do you mean "like Solaris", or do you mean "Solaris"? There's no
> one answer for all systems. LWP, by the way, isn't a very general term.
> Lately I've been using the more cumbersome, but generic and relatively
> meaningful "kernel execution contexts". A process is a KEC, an LWP is a
> KEC, a "virtual processor" is a KEC, a Mach thread is a KEC, an IRIX
> sproc is a KEC, etc.
>
> > By exchanging blocking syscalls to nonblocking like in a
> > pure-userlevel thread implementation?
>
> Generally, only "pure user-mode" implementations, without any kernel
> support at all, resort to turning I/O into "nonblocking". It's just not
> an effective mechanism -- there are too many limitations to the UNIX
> nonblocking I/O model.
>
> > Or by making sure a thread that calls a blocking syscall is on its own
> > LWP (the kernel is entered anyway, so what would be the cost to do
> > so)?
>
> Solaris 2.5 "latches" a user thread onto an LWP until it blocks in user
> mode -- on a mutex, a condition variable, or until it yields. User
> threads aren't timesliced, and they stick to the LWP across kernel
> blocks. If all LWPs in a process block in the kernel, a special signal
> allows the thread library to create a new one, but other than that you
> need to rely a lot on thr_setconcurrency.
>
> Digital UNIX 4.0 works very differently. The kernel delivers "upcalls"
> to the user mode scheduler to communicate various state changes. User
> threads, for example, are timesliced on our KECs (which are a special
> form of Mach thread).
> When a thread blocks in the kernel, the user mode
> scheduler is informed so that a new user thread can be scheduled on the
> virtual processor immediately. The nice thing about this model is that
> we don't need anything like thr_setconcurrency to keep things running.
> Compute-bound user threads can't lock each other out unless one is
> SCHED_FIFO policy. And instead of "fixing things up" by adding a new
> kernel execution context when the last one blocks (giving you a
> concurrency level of 1), we keep you running at the maximum level of
> concurrency supportable -- the number of runnable user threads, or the
> number of physical processors, whichever is less.
>
> Neither model (nor implementation) is perfect, and it would be safe to
> assume that both Digital and Sun are working on improving every aspect.
> The models may easily become very different in the future.
>
> /---[ Dave Butenhof ]-----------------------[ [email protected] ]---\
> | Digital Equipment Corporation           110 Spit Brook Rd ZKO2-3/Q18 |
> | 603.881.2218, FAX 603.881.0120                  Nashua NH 03062-2698 |
> \-----------------[ Better Living Through Concurrency ]----------------/

--

> Georges Brun-Cottan wrote:
> > So recursive mutex is far more than just a hack for lazy programmers or
> > just a way to incorporate non MT safe third party code. It is a tool
> > that you need in environments such as OOP, where you can not or you do
> > not want to depend on an execution context.
>
> Sorry, but I refuse to believe that good threaded design must end where
> OOP begins. There's no reason for two independently developed packages
> to share the same mutex. There's no reason for a package to be designed
> without awareness of where and when mutexes are locked. Therefore, in
> either case, recursive mutexes remain, at best, a convenience, and, at
> worst (and more commonly), a crutch.
>
> I created the recursive mutex for DCE threads because we were dealing
> with a brand-new world of threading.
> We had no support from operating
> systems or other libraries. Hardly anything was "thread safe". The DCE
> thread "global mutex" allowed any thread-safe code to lock everything
> around a call to any unsafe code. As an intellectual exercise, I chose
> to implement the global mutex by demonstrating why we'd created the
> concept of "mutex attributes" -- previously, there had been none. As a
> result of this intellectual exercise, it became possible for anyone to
> conveniently create their own recursive mutex, which is locked and
> unlocked using the standard POSIX functions. There really wasn't any
> point to removing the attribute, since it's not that hard to create your
> own recursive mutex.
>
> Remember that whenever you use recursive mutexes, you are losing
> performance -- recursive mutexes are more expensive to lock and unlock,
> even without mutex contention (and a recursive mutex created on top of
> POSIX thread synchronization is a lot more expensive than one using the
> mutex type attribute). You are also losing concurrency by keeping
> mutexes locked so long and across so much context that you become
> tempted to use recursive mutexes to deal with lock range conflicts.
>
> Yes, it may be harder to avoid recursive mutexes. Although I've never
> yet seen a valid case proving that recursive mutexes are NECESSARY, I
> won't deny that there may be one or two. None of that changes the fact
> that an implementation avoiding recursive mutexes will perform, and
> scale, far better than one relying on recursive mutexes. If you're
> trying to take advantage of multithreading, all the extra effort in
> analysis and design will pay off in increased concurrency.
>
> But, like any other aspect of performance analysis, you put the effort
> where the pay is big enough. There are non-critical areas of many
> libraries where avoiding recursive mutexes would be complicated and
> messy, and where the overhead of using them doesn't hurt performance
> significantly.
> Then, sure, use them. Just know what you're doing, and
> why.
>
> /---[ Dave Butenhof ]-----------------------[ [email protected] ]---\
> | Digital Equipment Corporation           110 Spit Brook Rd ZKO2-3/Q18 |
> | 603.881.2218, FAX 603.881.0120                  Nashua NH 03062-2698 |
> \-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP===============================
Q62: Can one thread read from a socket while another thread writes to it?

It's supposed to work! That's certainly how sockets are defined. It's an easy enough test on your own system.

=================================TOP===============================
Q63: What's a good way of writing threaded C++ classes?

> > Ian Emmons wrote:
> >
> > Baard Bugge wrote:
> > >
> > > >How would we put the whole object into a thread?
> > >
> > > Been there. Done that. Let the constructor create a thread before
> > > returning to the caller (another object). But beware, your OS will
> > > probably start the thread by calling a function (specified by you)
> > > C-style. You want this function to be a member function in your class,
> > > which is ok as long as you make it static. The thread function will
> > > also need the this-pointer to your newly created object. What you want
> > > will look something like this (in NT):
> > >
> > > // Thread callback function.
> > > // NOTE: Need to be written in C or be a static member function
> > > // because of C style calling convention (no hidden this pointer)
> > > DWORD WINAPI CThread::ThreadFunc(LPVOID inputparam)
> > > {
> > >     CThread *pseudo_this = (CThread *) inputparam;
> > >     ...
> > > }
> > >
> > > This function has access to all the members in the object through the
> > > pseudo this pointer. And all member functions called by this function
> > > will run in the same thread. You'll have to figure out how to
> > > communicate with the other objects in your system though. Be careful.
> > >
> > > --
> > > BaBu
> >
> > You can take this even a step further. Add a pure virtual to your generic
> > CThread class like so:
> >
> > class CThread
> > {
> >     ...
> > protected:
> >     // I don't remember what Win32 expects as the return value, here,
> >     // but you can fix this up as you wish:
> >     virtual unsigned entryPoint() = 0;
> >     ...
> > };
> >
> > Then have the static ThreadFunc call it like so:
> >
> > // Thread callback function.
> > // NOTE: Need to be written in C or be a static member function
> > // because of C style calling convention (no hidden this pointer)
> > DWORD WINAPI CThread::ThreadFunc(LPVOID inputparam)
> > {
> >     return ((CThread*) inputparam)->entryPoint();
> > }
> >
> > Now, to create a specific thread, derive from CThread, override entryPoint,
> > and you no longer have to mess around with a pseudo-this pointer, because
> > the real this pointer is available.
> >
> > One tricky issue: make sure you differentiate between methods that the
> > thread itself will call, and methods that other threads (such as the one
> > that created the thread object) will call -- you will need to do thread
> > synchronization on class members that are shared data.
> >
> > Ian
> >
> > ___________________________________________________________________________
> > Ian Emmons                                       Work phone: (415) 372-3623
> > [email protected]                                 Work fax:   (415) 341-8432
> > Persistence Software, 1720 S. Amphlett Blvd. Suite 300, San Mateo, CA 94402
>
------------------

OK, let me warn everyone this is a very long response, but I just came off of a large radar project on which I had to design multithreaded objects so this question jumped out at me.

Yousuf Khan wrote in article <[email protected]>...
> I got some hypothetical questions here, I'm not actually now trying to
> do any of this, but I can see myself attempting something in the near
> future.
>
> Okay, I'm thinking multithreading and OO design methodologies are
> tailor-made for each other, in theory at least. OO design mandates that
> all object instances are considered concurrent with each other. That
> seems like a perfect application of threading principles. However,
> current threading protocols (such the POSIX Pthreads, Microsoft/IBM
> threads, Sun UI threads, etc.) seem to be based around getting
> subprocedures threaded, rather than getting objects threaded.

First, let me state my own programming background so you can apply the appropriate grain of salt to what I say and understand my assumptions. I have programmed first for a few years in a DEC, VMS environment and then for several more in a Windows/Windows NT environment.

> Okay, I suppose we can get individual methods within an object to be
> threaded, because they are just like subprocedures anyways. But what if we
> wanted to be extremely pedantic, and we want the entire object to be in
> its own thread, in order to be true to OO design paradigms? How would we
> put the whole object into a thread? My feeling is that we should just
> call the object's constructor inside a thread wrapper, that way the entire
> object will go into a thread, including any other methods that are part of
> that object. What I guess I'm saying is that will calling the constructor
> inside a thread wrapper, only run the constructor inside that thread and
> then the thread will end, or will the entire object now run inside that
> thread from now on? Am I being oversimplistic in my speculation?

If you want to force an object to, as you say, "run in one thread", you would have to make every public member function perform a context switch to the desired thread upon entering the function and switch back upon exiting. You would have to protect all member variables and use Get/Set functions for them that performed context switches as well.
Under Windows NT, if you send a message to a window created by a different thread, that context switch is performed for you by the operating system. Your process waits until the ::SendMessage() call completes. Other than using SendMessage(), I do not know how you would accomplish such an operation. And SendMessage requires a window to which the message will be sent. Thus, under NT, you would have to make your object create some kind of hidden window in the context of the desired thread and then have every member function do a ::SendMessage() to that window. (There are variations -- e.g. SendMessageCallback(), PostMessage(), etc. for asynchronous function calls.)

Such a design is possible, and maybe workable, but seems to defeat the purpose of threads, doesn't it? If one thread is just going to have to wait for the special thread on every function call, why have the special thread at all? And I haven't even considered OLE and accessing objects across process boundaries, or thread-local storage. (Again, I'm speaking pretty exclusively about the NT threading model here. I've had enough VMS to last me a lifetime and know very little about Posix threads.)

It seems your reason for wanting the entire object to run in its own thread is to be true to the OO paradigm, but I think that's perhaps too much of a good thing. Why not make your objects completely thread-safe instead? Create some sort of a Single-Writer / Multiple-Reader resource locking object for all objects of the class. Make each member function use this resource guard, acquiring a read-lock if it's a const member function or write-lock if it is not const. There's nothing to prevent you from assigning specific threads to the objects to do background work on them, but as long as all access to the objects is through those safe member functions, they are completely thread safe.

I mention this because this is how I designed a large radar project I just finished working on.
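A minimal sketch of the read-lock/write-lock idea follows. The original poster worked on NT; this version instead uses the POSIX `pthread_rwlock_t` interface, which is an assumption here (it is a later addition than 1003.1c-1995 and may be absent on older systems). The class `Track` and its members are invented for illustration and have nothing to do with the radar project described.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical thread-safe object: const member functions take a read lock,
// non-const member functions take a write lock.
class Track {
public:
    Track() : range_(0.0) {
        int rc = pthread_rwlock_init(&lock_, nullptr);
        assert(rc == 0);
        (void)rc;
    }
    ~Track() { pthread_rwlock_destroy(&lock_); }

    // Reader: many threads may hold the read lock at the same time.
    double range() const {
        pthread_rwlock_rdlock(&lock_);
        double r = range_;
        pthread_rwlock_unlock(&lock_);
        return r;
    }

    // Writer: excludes all readers and other writers while it runs.
    void set_range(double r) {
        pthread_rwlock_wrlock(&lock_);
        range_ = r;
        pthread_rwlock_unlock(&lock_);
    }

private:
    Track(const Track &);              // not copyable
    Track &operator=(const Track &);   // not assignable
    mutable pthread_rwlock_t lock_;    // mutable so const readers can lock it
    double range_;
};
```

Because the lock is a `mutable` member, const accessors can still acquire it, which matches the const/non-const split described above.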
I used completely thread-safe, reference counted objects, read/write locks, and smart pointers in my design and the results were far better than my most optimistic hopes: a very fast workstation program with many dynamic displays showing an enormous amount of continuously changing data stored in a SQL server database.

I've gone on way too long here so I'll end this without saying half of what I want to say. Hope this gives you a few ideas.

=================================TOP===============================
Q64: Can thread stacks be built in privately mapped memory?

I've avoided any response to this long thread for a while because I'm not sure I want to confuse the issue with facts. And, despite the facts, I like the idea of people learning to treat thread stacks "as if they might be" private. Nevertheless, at some time I thought it might be helpful to point out what POSIX says about the matter... and I guess this is a good time.

POSIX very specifically disallows "non-shared" memory between threads. That is, it requires that the address space is associated with the PROCESS, not with the individual THREADS. All threads share a single virtual address space, and no memory address is private. Stacks, in particular, CANNOT be set up with private memory. Although, for safe programming, you should almost always pretend that it's private.

/---[ Dave Butenhof ]-----------------------[ [email protected] ]---\
| Digital Equipment Corporation           110 Spit Brook Rd ZKO2-3/Q18 |
| 603.881.2218, FAX 603.881.0120                  Nashua NH 03062-2698 |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP===============================
Q65: Has anyone implemented a mutex with a timeout?

Has anyone implemented a mutex locking function on top of Solaris or POSIX threads with a timeout?
The problem I'm trying to solve is if a thread is unable to obtain a mutex after a certain timeframe (say 30 seconds), then I want the thread to terminate and return an error. The Solaris and POSIX API's only allow the user to check if a mutex can be obtained.

Of course! Check out the code for pthread_np_timed_mutex_t at

http://www.lambdacs.com/jan-97/examples.html

=================================TOP===============================
Q66: I think I need a FIFO mutex for my program...

>There are VERY few cases where "lock ordering" is truly necessary. In
>general, when it may seem to be necessary, using a work queue to distribute
>the work across a pool of threads will be easier and more efficient. If
>you're convinced that you need lock ordering, rather than POSIX wakeup
>ordering, you have to code it yourself -- using, essentially, a work queue
>model where threads wishing to lock your "queued mutex" queue themselves in
>order. Use a condition variable and "next waiter" predicate to ensure
>proper locking order. It's not that hard.

Right, and you can find a freely available implementation of essentially a "FIFO Mutex" in ACE. Take a look at

http://www.cs.wustl.edu/~schmidt/ACE_wrappers/ace/Token.h
http://www.cs.wustl.edu/~schmidt/ACE_wrappers/ace/Token.i
http://www.cs.wustl.edu/~schmidt/ACE_wrappers/ace/Token.cpp

Doug
--
Dr. Douglas C. Schmidt ([email protected])
Department of Computer Science, Washington University
St. Louis, MO 63130. Work #: (314) 935-4215; FAX #: (314) 935-7302
http://www.cs.wustl.edu/~schmidt/

You can also find an implementation of FIFO mutexes in the file pthread_np.{h,c} at:

http://www.lambdacs.com/jan-97/examples.html

=================================TOP===============================
Q67: Why my multi-threaded X11 app with LinuxThreads crashes?

> Wolfram Gloger wrote:
> >
> > [email protected] (Jeff Noll) wrote:
> >
> > > I'm making an X client that connects to a tcp socket.
> > > I'm using a
> > > thread to continually read from that socket connection and a text
> > > widget to send to the socket. (an X telnet program that looks kind of
> > > like ncftp, separate input/output windows). When i run this at school
> > > under solaris it seems to be fine, but when i take it home and try it
> > > under linux using linuxthreads 0.5 it crashes when i start entering
> > > into the text window.
> >
> > Crash as in `fatal' X errors ? A while ago I had a similar experience
> > when trying to create a multi-threaded X11 app with LinuxThreads. It
> > was quite easy to debug though: the LinuxThreads libpthread library
> > lets all threads get individual errno values (like they should), as
> > long as all sources are compiled with _REENTRANT defined.
> >
> > The X11 libs (at least in XFree86-3.2) are by default not compiled in
> > this way, unfortunately (note I'm not talking about multiple thread
> > support in X11), and they break when using LinuxThreads, e.g. because
> > Xlib relies on read() returning with errno==EAGAIN at times. This is
> > a problem even when one restricts oneself to using X from a single
> > thread only.
> >
> > Once I recompiled all X11 libs with -D_REENTRANT (totally independent
> > of libpthread), everything works fine. I could put those libs up for
> > ftp if you're interested to check it out.
> >
> > Regards,
> > Wolfram.
>

=================================TOP===============================
Q68: How would we put a C++ object into a thread?

> > > Been there. Done that. Let the constructor create a thread before
> > > returning to the caller (another object). But beware, your OS will
> > > probably start the thread by calling a function (specified by you)
> > > C-style. You want this function to be a member function in your class,
> > > which is ok as long as you make it static. The thread function will
> > > also need the this-pointer to your newly created object.
> > > What you want
> > > will look something like this (in NT):
> > >
> > > // Thread callback function.
> > > // NOTE: Need to be written in C or be a static member function
> > > // because of C style calling convention (no hidden this pointer)
> > > DWORD WINAPI CThread::ThreadFunc(LPVOID inputparam)
> > > {
> > >     CThread *pseudo_this = (CThread *) inputparam;
> > >     ...
> > > }
> > >
> > > This function has access to all the members in the object through the
> > > pseudo this pointer. And all member functions called by this function
> > > will run in the same thread. You'll have to figure out how to
> > > communicate with the other objects in your system though. Be careful.
> > >
> > > --
> > > BaBu
> >
> > You can take this even a step further. Add a pure virtual to your generic
> > CThread class like so:
> >
> > class CThread
> > {
> >     ...
> > protected:
> >     // I don't remember what Win32 expects as the return value, here,
> >     // but you can fix this up as you wish:
> >     virtual unsigned entryPoint() = 0;
> >     ...
> > };
> >
> > Then have the static ThreadFunc call it like so:
> >
> > // Thread callback function.
> > // NOTE: Need to be written in C or be a static member function
> > // because of C style calling convention (no hidden this pointer)
> > DWORD WINAPI CThread::ThreadFunc(LPVOID inputparam)
> > {
> >     return ((CThread*) inputparam)->entryPoint();
> > }
> >
> > Now, to create a specific thread, derive from CThread, override entryPoint,
> > and you no longer have to mess around with a pseudo-this pointer, because
> > the real this pointer is available.
> >
> > One tricky issue: make sure you differentiate between methods that the
> > thread itself will call, and methods that other threads (such as the one
> > that created the thread object) will call -- you will need to do thread
> > synchronization on class members that are shared data.
> >
> > Ian
> >
> > ___________________________________________________________________________
> > Ian Emmons Work phone: (415) 372-3623
> > [email protected] Work fax: (415) 341-8432
> > Persistence Software, 1720 S. Amphlett Blvd. Suite 300, San Mateo, CA 94402
>
=================================TOP===============================
Q69: How different are DEC threads and Pthreads?

Mike.Lanni wrote:
>
> Baard Bugge wrote:
> >
> > According to the thread-faq, DCE threads (as in HPUX 10.10) is an
> > older version of Posix 1003.1c threads (as in Solaris 2.5).
> >
> > What are the differences? Are the two of them, fully or partly, source
> > code compatible?
> >
> > I want my multithreaded code to be cross-compilable on at least the
> > two platforms mentioned above, without too many ifdefs. Can I?
> >
> > --
> > BaBu
>
> Unfortunately, this is not black and white. If HPUX 10.10 is based on
> Draft 7 or higher, the Solaris and HP codes should be similar. However,
> if HP 10.10 is based on Draft 4, then there is quite a bit of work to be
> done. D4 became popular due to its usage with DCE. Assuming the worst,
> D4, here are some notes that I've put together based on some programming
> I've done. It is not complete by any means, but it should give you an
> idea of what you are up against.
>
> - signal handling is different
> - return codes from pthreads api's are now the real error, vs. -1
>   and errno
> - possibly no support for the "non-portable" apis and symbolic
>   constants
> - no support for DCE exception handling
> - Some of the pthread_attr_ apis have different types and arguments.
> - Some of the scheduling apis have changed.
> - Some thread specific api's have changed parameters.
>
> Below are some mappings that at one time were valid...
>
> #if defined(_D4_)
> #define PTHREAD_ONCE              pthread_once_init
> #define PTHREAD_ATTR_DEFAULT      pthread_attr_default
> #define PTHREAD_MUTEXATTR_DEFAULT pthread_mutexattr_default
> #define PTHREAD_CONDATTR_DEFAULT  pthread_condattr_default
> #define INITROUTINE               pthread_initroutine_t
> #define PTHREAD_ADDR_T            pthread_addr_t
> #define START_RTN                 pthread_startroutine_t
> #define PTHREAD_YIELD             pthread_yield
> #define PTHREAD_ATTR_DELETE       pthread_attr_delete
> #define PTHREAD_ATTR_CREATE       pthread_attr_create
> #define PTHREAD_MUTEXATTR_DELETE  pthread_mutexattr_delete
> #define PTHREAD_MUTEXATTR_CREATE  pthread_mutexattr_create
> #define PTHREAD_CONDATTR_DELETE   pthread_condattr_delete
> #define PTHREAD_CONDATTR_CREATE   pthread_condattr_create
> #define PTHREAD_KEYCREATE         pthread_keycreate
> #define ATFORK                    atfork
> #define SIGPROCMASK               sigprocmask
> #else
> #define PTHREAD_ONCE              PTHREAD_ONCE_INIT
> #define PTHREAD_ATTR_DEFAULT      NULL
> #define PTHREAD_MUTEXATTR_DEFAULT NULL
> #define PTHREAD_CONDATTR_DEFAULT  NULL
> #define INITROUTINE               void *
> #define PTHREAD_ADDR_T            void *
> #define START_RTN                 void *
> #define PTHREAD_YIELD             sched_yield
> #define PTHREAD_ATTR_DELETE       pthread_attr_destroy
> #define PTHREAD_ATTR_CREATE       pthread_attr_init
> #define PTHREAD_MUTEXATTR_DELETE  pthread_mutexattr_destroy
> #define PTHREAD_MUTEXATTR_CREATE  pthread_mutexattr_init
> #define PTHREAD_CONDATTR_DELETE   pthread_condattr_destroy
> #define PTHREAD_CONDATTR_CREATE   pthread_condattr_init
> #define PTHREAD_KEYCREATE         pthread_key_create
> #define ATFORK                    pthread_atfork
> #define SIGPROCMASK               pthread_sigmask
> #endif
> #if defined(_D4_)
>     rc = pthread_detach(&tid);
>     pthread_exit(status);
>     rc = pthread_join(tid, &status);
>     pthread_setcancel(CANCEL_OFF);
>     pthread_setcancel(CANCEL_ON);
>     (void) pthread_setscheduler(pthread_self(),SCHED_FIFO,PRI_FIFO_MAX);
> #else
>     rc = pthread_detach(tid);
>     pthread_exit(&status);
>     rc = pthread_join(tid, &status_p);
>     pthread_setcancelstate(PTHREAD_CANCEL_DISABLE,NULL);
>     pthread_setcancelstate(PTHREAD_CANCEL_ENABLE,NULL);
>     struct sched_param param;
>     param.sched_priority = 65535;
>     (void) pthread_setschedparam(pthread_self(),SCHED_FIFO,&param);
> #endif /* _D4_ */
>
> Hope this helps.
>
> Mike L.
> --------------------------------------------------------------------
> Michael J. Lanni
> NCR email: [email protected]
> 3325 Platt Springs Road phone: 803-939-2512
> West Columbia, SC 29170 fax: 803-939-7317
> http://www.columbiasc.ncr.com/home_pages/mlanni.html
=================================TOP===============================
Q70: How can I manipulate POSIX thread IDs?

Steven G. Townsend wrote:
>
> Jim Robinson wrote:
> >
> > In article <[email protected]>, Ian Emmons wrote:
> > >Robert Patrick wrote:
> > >>
> > >> Yes, you can copy one pthread_t to another. The part you have to be
> > >> careful about is that in some implementations pthread_t is a struct
> > >> and in others it is not. Therefore, setting two pthread_t's to be
> > >> equal by assignment will not be portable. However, memcpy(a, b,
> > >> sizeof(pthread_t)) should always work.
>
> As to the assignment issue, see Jim's comment below.
> As to the first point, assume for the moment that a part of
> the structure is an array (status, pointers to currently allocated
> keys, whatever) if anything in the array can change, the "copy"
> will not be updated. Assume a status flag/bit which indicates
> whether the thread is runnable, looking at the copy could easily
> produce different results than the actual value of the "true"
> pthread_t. This is just a bad thing to do. Other problems
> can occur as well...
> What happens if both the original and the copy are passed to
> pthread_destroy?
> What happens if as we are doing the '=' or memcpy operation the
> thread is currently executing on a different processor (i.e.
> the contents of the pthread_t object would need to be protected
> by a mutex)?
> When it comes to copying pthread_t s...
> Just say 'no'.
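
[Editor's sketch] Separately from the copying debate, POSIX does supply pthread_equal() for comparing two thread IDs, so code never has to apply == to something that might be a struct. The fragment below is only an illustration under assumptions: the Probe struct and the probe and ids_behave functions are hypothetical names. It stores the value returned by pthread_self() and checks that it compares equal in the calling thread and unequal in a second thread.

```cpp
#include <pthread.h>

// Hypothetical carrier for the parent's ID and the child's verdict.
struct Probe {
    pthread_t parent;
    int same_as_parent;
};

// Runs in the child thread: compare the parent's ID with our own.
static void *probe(void *arg) {
    Probe *p = static_cast<Probe *>(arg);
    p->same_as_parent = pthread_equal(p->parent, pthread_self()) != 0;
    return nullptr;
}

// Returns true when the stored ID matches pthread_self() in the caller
// and does not match pthread_self() in the child.
bool ids_behave() {
    Probe p;
    p.parent = pthread_self();   // store the value pthread_self() returned
    p.same_as_parent = 1;        // the child is expected to clear this

    pthread_t child;
    if (pthread_create(&child, nullptr, probe, &p) != 0) return false;
    if (pthread_join(child, nullptr) != 0) return false;

    return pthread_equal(p.parent, pthread_self()) != 0
        && p.same_as_parent == 0;
}
```

The comparison is done while both threads are still alive or joinable, because an ID is only meaningful during the lifetime of the thread it names.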
>
> > >>
> > >> Just my two cents,
> > >> Robert
> > >
> > >Since I work in C++ exclusively, this isn't an issue for me, and so I never thought
> > >about that. For C coders, you're right, of course.
> >
> > Structure assignment is defined in *ANSI* C. See page 127 of K&R, 2nd
> > edition. Since ANSI C has been standardized for quite some time now, it
> > should be a non-issue for C coders as well, no?
>
> > --
> > Jim Robinson
> > [email protected]
=================================TOP===============================
Q71: I'd like a "write" that allowed a timeout value...

Marc Peters wrote:
> What would be nice to have is a "write" that allowed a timeout value to be
> specified. A la:
>
>     write(fdSocket, bufferPtr, bufferLength, timeSpec);
>
> If the write doesn't succeed within the specified timeSpec, then errno
> should be set to ETIMEOUT or something. Obviously, this would be quite handy
> in network code.
>
> Due to other circumstances beyond my control, the fdSocket cannot be placed
> in non-blocking mode. Thus, the solution I'm left with is to start a POSIX
> timer, wait on the blocking write, and check for an errno of EINTR when it
> returns (if it timed out).
>
> I'm aware of the alternate technique of dedicating a thread to dispatching
> signal events. This dedicated thread merely does a sigwaitinfo() and
> dispatches accordingly. This technique, too, is offensive for such a simple
> requirement -- the "timed" write.

Why not just do these possibly long writes in separate threads? And, if some
"manager" decides they've gone too long, cancel the threads.

> I've ordered the POSIX 1003.1 standard to pursue this; however, it will be
> several days before it arrives. Can anyone fill me in with some details of
> SIGEV_THREAD in the meantime?

SIGEV_THREAD creates a thread instead of raising a signal in the more
conventional manner. You get to specify the start routine (instead of a
signal catching function), and the attributes.
The thread runs anonymously (by default it's detached, and if you use an
attributes object with detachstate set to PTHREAD_CREATE_JOINABLE, the
behavior is unspecified). The main advantage is that your "signal catching
function" can lock mutexes, signal condition variables, etc., instead of
being restricted to only the short list of async-signal safe functions.

In your case, a SIGEV_THREAD action would still constitute a "double
dispatch" for your signal. The code wouldn't look much different from your
current version.

Oh yeah, there's a major disadvantage, for you, in SIGEV_THREAD. Solaris 2.5
doesn't implement SIGEV_THREAD. So you'd have to wait for Solaris 2.6.

(Just to be fair, I'll also point out that Digital UNIX 4.0 didn't do
SIGEV_THREAD, either -- it is a minor and relatively obscure function, and
we all had more important things to worry about. We also caught up to it
later, and we'll be supporting SIGEV_THREAD in Digital UNIX 4.0D [or at
least, that's what we're calling it now, though these things are always
subject to change for various reasons].)

/---[ Dave Butenhof ]-----------------------[ [email protected] ]---\
| Digital Equipment Corporation 110 Spit Brook Rd ZKO2-3/Q18 |
| 603.881.2218, FAX 603.881.0120 Nashua NH 03062-2698 |
\-----------------[ Better Living Through Concurrency ]----------------/
=================================TOP===============================
Q72: I couldn't get threads to work with glibc-2.0.

>I finally got this to compile cleanly, but it stalls (sigsuspend) somewhere
>in pthread_create()...

If you are using glibc-2.0, you should upgrade to glibc-2.0.1. There is a
bug in 2.0 that makes all thread creation fail.

Yann Doussot
=================================TOP===============================
Q73: Can I do dead-owner-process recovery with POSIX mutexes?

el Gringo wrote:
>
> Hi.
>
> I am trying to create a mutex on a program that has to work on NT 4.0 and AIX.
> For NT, I use CreateMutex, etc., and in this case, if the process owning the
> mutex crashes, the system releases the mutex and returns WAIT_ABANDONED to the
> thread that is waiting for the mutex to be released. And if the mutex is
> opened several times by the same thread, the call succeeds and the mutex count
> is incremented.
>
> What I don't know is if the pthread mutexes do the same thing when a thread or
> process owning the mutex crashes...or when pthread_mutex_lock() is called
> several times by the same one. Could someone provide me with a doc or web site
> so I can find those answers ? Thanks. Riad

Riad,

Ah... You got a *big* set of problems to deal with. The simple answer is
that POSIX mutexes don't do that, but that you can create that kind of
behavior if you want to. The problems I refer to are those surrounding what
you do when the owner process crashes. There is a massive amount of work to
be done to ensure that you don't have corrupted data after you get
WAIT_ABANDONED. Unless you've already taken care of this, you've got a h***
of a lot of work to do.

So... the best answer is go find an expert in this area (or spend a month
becoming one) and hash out the issues. Building the mutex will be the
easiest part of it.

Good luck.

-Bil
=================================TOP===============================
Q74: Will IRIX distribute threads immediately to CPUs?

Michel Lesoinne wrote:
>The first question concerns new and delete under C++ as well as malloc.
>Are these functions thread-safe?
------------------------------------------------------------------------
Yes, these calls are thread-safe.

>The second question has to do with multi-CPU machines. I have noticed
>that POSIX threads do not get shipped to other CPU's immediately after
>being started. For example if you bring up gr_osview and watch the CPU
>usage as you start 4 parallel threads, it takes approximately 1 second
>for the 4 threads to run in parallel rather than on the same CPU.
>Worse,
------------------------------------------------------------------------
The current pthread implementation only creates additional sprocs as it
deems necessary given the application activity. Interaction between
pthreads which may require context switching can lower the requirement
while CPU-bound threads will raise it. There can be a short delay
"ramping-up" before the ideal number of sprocs are active. The kernel is
responsible for scheduling the sprocs on CPUs. Thus you may be seeing 2
effects (though 1 second seems a little long to me).

>Is there a way to force IRIX to distribute the threads immediately?
------------------------------------------------------------------------
Currently there is no way to force this behaviour though we expect to add
tuning interfaces in the future. You may try experimenting by setting the
environment variable PT_ITC as a hint to the library that your app is CPU
bound.

sincerely,
..jph...
=================================TOP===============================
Q75: IRIX pthreads won't use both CPUs?

Dirk Bartz wrote:
>I've written a parallel program using pthreads on two processors
>which consists of two parallel stages.
>The first stage reads jobs from a queue (protected by a mutex) and
>should process them in parallel (which it doesn't); the second
>stage works fine.
>
>Now, some debugger sessions of the first stage show that
>both pthreads are started, but the first one is being blocked
>most of the time. The cvd says:
>    0 _procblk() ["procblk.s":15, 0x0fab4d74]
>    1 _blockproc( ) ["blockproc.c":23, 0x0fab56e0]
>    2 vp_idle( ) ["vp.c":1702, 0x07fe61e8]
>
>It seems that the first pthread is only sharing one processor
>with the second thread. It is *not* blocked at the mutex!
>
>Does anyone have a clue what happened?
------------------------------------------------------------------------
Hi,

First of all that curious backtrace is from one of the underlying sprocs on
which the pthreads execute.
As you can see it is currently idle and blocked in the kernel waiting for
the library to activate it when more work (a pthread) is ready to run. If
you use the cvd "showthread all" command it will show you the pthread state
which should in your case be MUTEX-WAIT for the pthread of interest. If you
then backtrace that pthread you should see it blocked in the mutex locking
code.

A second point to note is that the pthreads library attempts to use an
appropriate number of sprocs for its scheduling. If your application
creates 2 CPU-bound threads then on an MP machine 2 sprocs will be created
to run the threads. On a UP only one sproc will be created and will switch
between the two pthreads. On an MP where the threads are not CPU-bound the
problem is more complex; when 2 pthreads are tightly synchronised then a
single sproc may be a better choice - this may be what you are seeing.

I hope the above explains what you are seeing.

sincerely,
..jph....
=================================TOP===============================
Q76: Are there thread mutexes, LWP mutexes *and* kernel mutexes?

> In a typical "two level" scheduling scheme, say solaris,
> synchronization primitives used at the thread level (POSIX or solaris)
> are provided by the user level scheduler library. At the LWP level,
> are there any synchronization primitives, and if so, where would one
> use those as opposed to using the user level library primitives?
> Of course, there would be some synchronization primitives for the
> kernel use. Does it mean that there are 3 distinct sets of primitives
> (user level, LWP level and kernel level)? Can anyone throw some light
> on the LWP level primitives (if any) and point out where these would be
> useful?

You may remember that scene in the Wizard of Oz, where Toto runs away in
panic at the sight of the Powerful Oz. He discovers a little man running
the machinery behind a curtain. The catch-line was "Pay no attention to
that man behind the curtain." Same thing here.
You call pthread_mutex_lock() and it does whatever it needs to so that
things work. End of story.

But if you *really* want to peek... If the mutex is locked, then the thread
knows that it needs to go to sleep, and it calls one set of routines if it's
an unbound thread, another if it's bound. (If you hack around inside the
library code, you'll be able to see the guts of the thing, and you'll find
calls to things like _lwp_mutex_lock(). You will NEVER call those!)

Now, as for kernel hacking, that's a different picture. If you are going to
go into the kernel and write a new device driver or fix the virtual memory
system, you'll be working with a different interface. It's similar to
pthreads, but unique to each kernel. The older OSs didn't even HAVE a
threads package!
=================================TOP===============================
Q77: Does anyone know of a MT-safe alternative to setjmp and longjmp?

> I am taking an operating systems class; therefore, my
> question will sound pretty trivial. Basically, I am
> trying to create thread_create, thread_yield, and thread_exit
> functions. Basically, I have two files. They compile fine and
> everything but whenever I try to run the program I get the error:
> "longjmp or siglongjmp function used outside of saved context
> abort process"
> All I know is that we are running this on an alpha machine at
> school [...]
>
> Anyway, I just want to know if anyone has ever tried to do a longjmp from a
> jmp_buf that was not the same as that used in setjmp.
>
> The runtime environment provided with some operating systems (e.g.,
> Ultrix or whatever DEC `Unix' is called these days) performs an
> explicit check that the destination stack frame is an ancestor of
> the current one. On these systems you cannot use setjmp/longjmp
> (as supplied) to implement threads.
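
[Editor's sketch] For contrast, here is the one pattern that even a checking longjmp accepts: jumping back to a frame that is still active, i.e. an ancestor of the frame doing the jump. It is a minimal illustration only, and g_env, unwind_from and jump_to_ancestor are hypothetical names. Switching between separate thread stacks, which is what the student is attempting, is exactly what falls outside this pattern.

```cpp
#include <csetjmp>

// Hypothetical saved context, filled in by setjmp() below.
static std::jmp_buf g_env;

// Recurse a few frames deep, then jump straight back to the setjmp() site.
static void unwind_from(int depth) {
    if (depth == 0)
        std::longjmp(g_env, 1);   // target frame is still on the stack
    unwind_from(depth - 1);
}

// Returns true if control came back through longjmp().
bool jump_to_ancestor() {
    if (setjmp(g_env) != 0)
        return true;              // second return, via longjmp
    unwind_from(3);               // the frame that called setjmp stays live
    return false;                 // not reached
}
```

The function that called setjmp() has not returned when longjmp() runs, so the saved context is still valid.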
>
> On systems whose longjmp is trusting, setjmp/longjmp is a very common
> way of building user-space threading libraries. This particular wheel
> has been reinvented many times.
>
> If you know the layout of a jmp_buf, you *can* use setjmp but you will
> have to implement a compatible longjmp yourself in order to change the
> processor context to that of the next task. If you have a
> disassembler you might be able to reverse engineer a copy of longjmp
> with the check disabled.
>
> *I* would consider this outside the scope of such an exercise but your
> professor may disagree.
>
> Steve
> --
> Stephen Crane, Dept of Computing, Imperial College of Science, Technology and
> Medicine, 180 Queen's Gate, London sw7 2bz, UK:jsc@{doc.ic.ac.uk, icdoc.uucp}
> Unix(tm): A great place to live, but a terrible place to visit.
=================================TOP===============================
Q78: How do I get more information inside a signal handler?

Mark Lindner wrote:
>
> I'm writing a multithreaded daemon that supports dynamic runtime loading
> of modules (.so files). I want it to be able to recover from signals such
> as SIGSEGV and SIGFPE that are generated by faulty module code. If a given
> module causes a fault, I want the daemon to unload that module so that
> it's not called again.
>
> My problem is that once a signal is delivered, I don't know which worker
> thread it came from, and hence I have no idea which module is faulty. The
> O'Reilly pthreads book conveniently skirts this issue. I poked around on
> the system and found the getcontext() call; I tried saving the context for
> each worker thread, and then using the ucontext_t structure passed as the
> 3rd argument to the signal handler registered by sigaction(), but
> unfortunately I can't find anything that matches...the contexts don't even
> appear to be the same.
>
> Since the behavior of pthreads calls is undefined within a signal handler,
> I can't use pthread_self() to figure out which thread it is either.
>
> All examples I've seen to date assume that either:
>
> a) only one thread can generate a given signal
>
> or
>
> b) two or more threads can generate a given signal, but the signal handler
> does the same thing regardless of which thread generated it.
>
> My situation doesn't fall into either of these categories.
>
> Any help would be appreciated.
>
> --
>
> Cheers!
> Mark
>
> ------------------------------------------------------------------------------
> [email protected] | http://love.geology.yale.edu/~markl/
> ------------------------------------------------------------------------------
> I looked up
> As if somehow I would grasp the heavens
> The universe
> Worlds beyond number...
> ------------------------------------------------------------------------------
=================================TOP===============================
Q79: Is there a test suite for Pthreads?

Re: COMMERCIAL: Pthreads Test Suite Available

The Open Group VSTH test suite for Threads implementations of POSIX
1003.1c-1995 and the X/Open System Interfaces (XSH5) Aspen threads
extensions is now generally available.

For further information on the test suite see
http://www.rdg.opengroup.org/testing/testsuites/vsthover.htm

For information on the Aspen threads extensions see
http://www.rdg.opengroup.org/unix/version2/whatsnew/threads.html

> Andrew Josey, Email: [email protected]
> #include
=================================TOP===============================
Q80: Flushing the Store Buffer vs. Compare and Swap

Just looking at the CAS and InterlockedXXX instructions... "Hey!" says I to
myself, "Nobody's minding the store buffer!" A couple of people have shown
some examples of using InterlockedXXX in Win32, but they never address
memory coherency!
So, if they implement a mutex with InterlockedExchange:

    lock(int *lock)
    {while (InterlockedExchange(lock, 1) == 1) sleep();}

    unlock(int *lock)
    {*lock = 0;}

at unlock time, some changed data might not be written out to main memory.
Hence we need this:

    unlock(int *lock)
    { FlushStoreBuffer();
      *lock = 0;
    }

Or is there something about x86 that I don't know about here?
=================================TOP===============================
Q81: How many threads CAN a POSIX process have?

Dave Butenhof wrote:
>
> Bryan O'Sullivan wrote:
> >
> > r> _POSIX_THREAD_THREADS_MAX that claims to be the maximum threads per
> > r> process.
> >
> > As I recall, this is a minimum requirement. Solaris certainly
> > supports far more than 64 threads in a single process, and I'm sure
> > that Irix does, too.
>
> POSIX specifies two compile-time constants, in <limits.h>, for each
> runtime limit. One is the MINIMUM value of that MAXIMUM which is
> required to conform to the standard. _POSIX_THREAD_THREADS_MAX must be
> defined to 64 on all conforming implementations, and all conforming
> implementations must not arbitrarily prevent you from creating at least
> that many threads.
>
> The symbol PTHREAD_THREADS_MAX may ALSO be defined, to the true limit
> allowed by the system, IF (and only if) that limit is fixed and can be
> predicted at compile time. (The value of PTHREAD_THREADS_MAX must be at
> least 64, of course.) I don't know of any systems that define this
> symbol, however, because we don't implement any fixed limit on the
> number of threads. The limit is dynamic, and dictated purely by a wide
> range of resource constraints within the system. In practice, the only
> way to predict how many threads you can create in any particular
> situation is to bring a program into precisely that situation and count
> how many threads it can create.
> Remember that the "situation" includes
> the total size of your program text and data, any additional dynamic
> memory used by the process (including all shared libraries), the virtual
> memory and swapfile limits of the current system, and, in some cases,
> the state of all other processes on the system.
>
> In short, the concept of "a limit" is a fiction. There's no such thing,
> without knowing the complete state of the system -- rarely practical in
> real life.
>
> Oh, and by the way, there's no guarantee (in POSIX or anywhere else)
> that you can create even 64 threads. That just means that the system
> cannot arbitrarily prevent you from creating that many. If you use up
> enough virtual memory, you may be unable to create even one thread.
> That's life.
>
> As Bryan said, you can normally rely on being able to create hundreds,
> and usually thousands, of threads on any of the current 2-level
> scheduling POSIX threads implementations. Kernel-level implementations
> are typically more limited due to kernel quotas on the number of kernel
> thread slots available for the system and often for each user.
=================================TOP===============================
Q82: Can Pthreads wait for combinations of conditions?

> Is there any way in Pthreads to wait for boolean combinations of conditions
> (i.e. wait for any one of a set of conditions or wait until all of a set of
> conditions have occurred). I'm looking for a feature similar to the VMS
> Wait for logical OR of event flags or the OS/2 multiplexed semaphores.
You mean something like this:

    void *consumer(void *arg)
    {request_t *request;

     while(1)
       {pthread_mutex_lock(&requests_lock);
        while ((length == 0) && (!stop))   /* While both are true, sleep */
          pthread_cond_wait(&requests_consumer, &requests_lock);
        if (stop) break;
        request = remove_request();
        length--;
        pthread_mutex_unlock(&requests_lock);
        pthread_cond_signal(&requests_producer);
        process_request(request);
       }
     pthread_mutex_unlock(&requests_lock);
     sem_post(&barrier);
     pthread_exit(NULL);
    }

Or perhaps:

    while ( ((length == 0) && (!stop)) ||
            (age_of(granny) > 100) ||
            (no_data_on_socket(fd) && still_alive(client)) ||
            (frozen_over(hell)) )
      pthread_cond_wait(&requests_consumer, &requests_lock);

Nope. Can't be done :-)

Now if you're thinking about something that involves blocking, it may be a
bit trickier. In OS/2 or Win32 you might think in terms of:

    WaitForMultipleObjects(Mutex1 and Mutex2)

you'll have to do a bit extra. Perhaps you'll have two different threads
blocking on the two mutexes:

    Thread 1
      pthread_mutex_lock(Mutex1);
      M1=TRUE;
      pthread_cond_signal(&requests_consumer);

    Thread 2
      pthread_mutex_lock(Mutex2);
      M2=TRUE;
      pthread_cond_signal(&requests_consumer);

    Thread 3
      while (!M1 || !M2)
        pthread_cond_wait(&requests_consumer, &requests_lock);

I think this looks sort of ugly. More likely you'll find a better way to
structure your code.
=================================TOP===============================
Q83: Shouldn't pthread_mutex_trylock() work even if it's NOT PTHREAD_PROCESS_SHARED?

Curt,

I infer you're trying to get around the lack of shared memory SVs in some of
the libraries by only using trylock? I can't say that I approve, but it
ought to work...

In the code example below I hacked up an example which does seem to do the
job. I can't tell you what you were seeing in your tests. Hmmm... Just
because this particular hack works on one OS does not mean that it will
necessarily work on another. (Let's say I wouldn't stake MY job on it!)

What about using shared semaphores?
Maybe SysV semaphores?

-Bil

> HI,
>
> I'm having a problem with pthread_mutex_unlock () on Solaris 2.5 for a
> pthread_mutex_t inited in a shared memory structure between 2 tasks.
>
> I get pthread_mutex_trylock (lockp) to return zero, and both tasks
> agree the mutex is locked.
>
> When the owning task calls pthread_mutex_unlock (lockp), it returns
> zero, but the other task's pthread_mutex_trylock (lockp) still believes
> the mutex is locked??
>
> FAQ location or help? Thanks.
>
> Here's how I initialized the pthread_mutex_t struct:
>
> In a shared memory struct:
>
>     pthread_mutex_t mutex_lock = PTHREAD_MUTEX_INITIALIZER;
>
> Then either task may call:
>
>     pthread_mutex_trylock (&mutex_lock)
>     ...work...
>     pthread_mutex_unlock (&mutex_lock)
>
> I've had little luck with pthread_mutexattr_setpshared () to init for
> the "shared" scenario (core dumped) and especially that this is a Sun'ism
> that doesn't exist in DEC Unix 4.0b, which is another requirement, that
> the solution be portable to all/most Unix'es with threads.
>
> Thanks.
>
> Curt Smith
> [email protected]

      sleep(4-i/2);   /* wait a second to make it more interesting*/
      if (!err)
        {pthread_mutex_unlock(&buf->lock2);
         printf("Unlocked by parent\n");
        }
    }
  printf("Parent PID(%d): exiting...\n", getpid());
  exit(0);
}

main(int argc, char *argv[])
{int fd;
 pthread_mutexattr_t mutex_attr;

 /* open a file to use in a memory mapping */
 fd = open("/dev/zero", O_RDWR);

 /* Create a shared memory map with the open file for the data
    structure which will be shared between processes. */
 buf=(buf_t *)mmap(NULL, sizeof(buf_t), PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

 /* Initialize the counter and SVs. PTHREAD_PROCESS_SHARED makes
    them visible to both processes.
 */
 pthread_mutexattr_init(&mutex_attr);
 /* pthread_mutexattr_setpshared(&mutex_attr, PTHREAD_PROCESS_SHARED); */
 pthread_mutex_init(&buf->lock2, &mutex_attr);

 if (fork() == 0)
   my_child();
 else
   my_parent();
}
=================================TOP===============================
Q84: What about having a NULL thread ID?

This is part of an on-going discussion. The POSIX committee decided not to
do this. It is, of course, possible to implement non-portable versions
yourself. You would have to have a DIFFERENT version for different OSs. -1L
works fine for Solaris 2.5, and IRIX 6.2, but not HP-UX 10.30, which
requires (I think!) {-1, -1, -1}. BE CAREFUL!!!

Now for the discussion...

Ben Self wrote:
>
> Ian Emmons wrote:
> >
> > So why not support a portable "null" value? This
> > could be done via a macro that can be used to initialize a pthread_t (just
> > like the macro that initializes a mutex), or it could be done via a couple
> > of functions to set and test for the null value. Or, the POSIX folks could
> > do as they are used to doing, and make us code around their omissions with
> > YAC (Yet Another Class).
> >
> > Ian
> >
> As I stated before I think that it is a very natural thing to want to
> do. In my experience POSIX's omissions are usually more interesting
> than simple oversights. Often an existing industry code base stands in
> the way or it was deemed too 'trivial' a matter of user code to merit
> imposing any restriction on implementation. in any case, for all
> intents and purposes, it is a done deal.
>
> As Dave Butenhof candidly exposes a few posts down:
>
> "due to
> overwhelming agreement that it was a bad (and unnecessary) complication.
> The Aspen committee added suspend/resume to UNIX98 -- but the functions
> were later removed with no significant objections.
>
> There simply is no significant industry consensus supporting these
> functions. (And that for many very good technical reasons as well as
> some silly political reasons."
>
> that is exactly how POSIX works. gotta love 'em
>
> --Ben
=================================TOP===============================
Q85: Explain Traps under Solaris

Jim Moore - Senior Engineer, SunSoft wrote:

Email : [email protected] | DNRC: The Day Cometh
SunSoft Technical Support (Europe) | "adb is your friend"
---------------------------------------------------------------

SPARC traps under SunOS (Solaris)
-=-=-=-=-=-=-=-=-=-=-=-

By: Jim Moore, SunSoft, Sun Microsystems Inc
Email: [email protected]

CONTENTS

1 Introduction
  1.1 Who should read this document?
2 What is a trap?
  2.1 How traps are caused
    2.1.1 Precise Traps
    2.1.2 Deferred Traps
    2.1.3 Disrupt/Interrupt Traps
  2.2 How traps are dispatched to the kernel
    2.2.1 SPARC v7/v8
    2.2.2 SPARC v9
      2.2.2.1 Processor states, normal and special traps
      2.2.2.2 Normal Trap (Processor in Execute State)
      2.2.2.3 Special Trap (Processor in RED State)
3 Traps - How SunOS Handles Them
  3.1 Generic Trap Handling
  3.2 Register Windows
  3.3 Interrupts

-=-=-=-=-=-=-=-=-=-=-=-=-

1 INTRODUCTION

This document describes what traps are, how they work and how they are
handled by the SunOS kernel. We will look at traps for SPARC versions v7,
v8 and v9 (v7 and v8 traps are essentially identical). In places, we will
have to differentiate between the v8 and v9 quite extensively as there are
significant differences between the two.

I assume that the readers are familiar with SPARC registers but I will give
some expansion on the more obscure ones as I go :-)

Finally, I have made every effort to make this accurate as well as
informative but at the same time without too lengthy descriptions. Even so,
in parts it may be heavy going and plain ascii doesn't leave much scope for
clear diagrams. Feel free to post or email questions and comments!

1.1 Who should read this document?

Anyone who wants to know more about traps in detail on the SPARC
architecture.
I strongly recommend that you refer to one of these two books for more information: The SPARC Architecture Manual, Version 8 ISBN 0-13-099227-5 The SPARC Architecture Manual, Version 9 ISBN 0-13-825001-4 as they contain more detail on some of these topics. 2 WHAT IS A TRAP? The design of SPARC as a RISC processor means that a lot of the functionality that is normally controlled by complex instructions has to be done by supervisor (kernel) software. Examples of these could be memory exception handling or interrupt handling. When a situation arises that requires special handling in this way, a trap occurs to cause the situation to be handled. We'll look at this mechanism in more detail in this section. 2.1 How Traps are Caused Traps can be generated for a number of reasons and you can see a list of traps under /usr/include/sys/v7/machtrap.h or under /usr/include/sys/v9/machtrap.h for SPARC v7/v8 and v9 respectively. A trap can be caused either by an exception brought about by the execution of an instruction or by the occurrence of some external interrupt request not directly related to the instruction. When the IU (Integer Unit, the part of the CPU that contains the general purpose registers, does integer math and executes the instructions) is about to execute an instruction it first checks to see if there are any exception or interrupt conditions pending and, if so, it selects the highest priority one and causes a trap. Traps are also used to signal hardware faults and malfunctions, for example a level15 asynchronous memory fault. In some fatal conditions execution cannot continue and the machine will halt or the supervisor software will handle the trap by panicing. Next, we'll take a generic look at the different trap categories but we'll go into the version specific details later on. 2.1.1 Precise Traps A precise trap is brought about by an exception directly caused by the executing instruction. 
This trap occurs before there is any tangible change in the program state of the program that contained the trapped instruction. 2.1.2 Deferred Traps A deferred trap is similar to a precise trap but in this case the program-visible state may have changed by the time the trap occurs. Such a trap may in theory occur one or more instructions after the trap inducing instruction has executed but it must occur before any subsequent instruction attempts to use any modified register or resource that the trap inducing instruction used. Did you get that? Hmm. Okay, here's an example. Imagine that a floating point operation is being executed. This does not happen synchronously with IU instructions and so it is possible that a floating point exception could occur as a deferred trap. 2.1.3 Disrupt/Interrupt Traps An interrupt trap, as you have probably guessed, is basically the assertion of an interrupt, either generated externally (from hardware) or internally (via software). The delivery of interrupts is controlled by the PIL (Processor Interrupt Level) field of the PSR (Processor State Register), which specifies the minimum interrupt level to allow, and also by the mask of asserted bits in the IE (Interrupt Enable register...architecture specific). Under SPARC v9, we have a concept of a disrupt trap. This is very similar to a deferred trap in that it could be related to an earlier instruction but in this case the trap is an unrecoverable error. 2.2 How Traps are Dispatched to the Kernel In this section we will look at the flow of execution into the kernel when a trap occurs. This is different for SPARC v7/v8 and v9 so we will split this section into two. 2.2.1 SPARC v7/v8 When a trap occurs, the flow of execution jumps to an address which is calculated from the TBR (Trap Base Register) and the Trap Type, hereafter referred to as TT. The sequence is as follows: 1. An exception/interrupt has been detected as pending by the IU 2. 
The IU multiplies the TT by 16 (TT << 0x4) as there are 4 instructions per trap table entry.
3. The IU loads the address of the trap table (from the TBR) and adds the offset calculated in (2).
4. The CWP (Current Window Pointer) is decremented, so that we are in a new register window.
5. The addresses of the trapped instruction (%pc) and the next instruction to be executed (%npc) are written into local registers %l1 and %l2.
6. Traps are disabled and the current processor mode is set to "supervisor". This is done by setting the ET bit to zero and the supervisor mode bit to one in the PSR (refer to the PSR description in /usr/include/v7/sys/psr.h).
7. Execution resumes at [TBR + (TT<<4)], as calculated in (3).

Part of the SunOS kernel code is a trap table, which contains 256 4-instruction entries, each entry corresponding to a trap type from 0 to 0xff. This structure is defined by the SPARC implementation. Each trap table entry basically contains a branch to a trap handling routine and may also load the PSR into a local register for use later. Here's an example of a trap table entry:

        sethi   %hi(trap_handler), %l3      ! Load trap handler
        jmp     %l3 + %lo(trap_handler)     ! address and jump
        mov     %psr, %l0                   ! Delay: load %psr
        nop

2.2.2 SPARC v9

The SPARC v9 case is quite different from the v7/v8 case mainly due to the concept of processor states and trap nesting. We still use a trap table concept under v9 but the destination address for the transfer of execution is calculated differently. Also, trap table entries for v9 are 8 instructions in size, except for spill/fill traps, in which case the entries are 32 instructions in size. Also, in a special state called the RED state (more on that later) we actually use a different trap table! Pretty different, huh?

The trap table is divided into three parts. The first half of the table is used for machine generated traps. The next quarter is reserved for software initiated traps and the final quarter is reserved for future use.
The displacement into the trap table is defined by Trap Level (TL) and the Trap Type (TT) together. Let's take a look at this in some more detail. I strongly advise that you obtain a copy of the version 9 SPARC architecture manual if you want to follow this in detail.

When a trap occurs, the action taken depends on the TT, the current level of trap nesting (contained in the TL) and the processor state at that time. Let's look at processor states and what we mean by normal and special traps so that the rest of this section has more chance of making sense!

2.2.2.1 Processor States, Normal and Special Traps

The SPARC v9 processor is always in one of three states and these are:

1. Execute state. This is the normal execution state.
2. RED state. RED = Reset, Error and Debug. This is a state that is reserved for handling traps when we are at the penultimate level of trap nesting or for handling hardware or software interrupts.
3. Error state. This is a state that is entered when we have a trap occur at a point in time when we are at our maximum level of trap nesting (MAXTL) or an unrecoverable fatal error has occurred.

Normal traps are traps that are processed when we are in the nice cosy execute state. If we trap in RED state, then this is a special trap. There is an implementation dependent address called RSTVaddr which contains the vector to the RED state trap table. This vector could be set to overlay the same one in the TBR. For a given trap in RED state we vector as follows:

    TT       Vector           Reason
    0        RSTVaddr|0x0     SPARC v8 style reset
    1        RSTVaddr|0x20    Power On Reset (POR)
    2        RSTVaddr|0x40    Watchdog Reset (WDR)
    3        RSTVaddr|0x60    Externally Initiated Reset (XIR)
    4        RSTVaddr|0x80    Software Initiated Reset (SIR)
    Others   RSTVaddr|0xa0    All other traps in RED state

A fatal exception that causes us to drop into error state will cause the processor to note the exception and either halt, reset or watchdog reset.
After the reset, the processor enters RED state with a TL appropriate to the type of reset (usually maximum). Also, the TT is set to the value of the original trap that caused the reset and NOT the TT value for the reset itself (ie. WDR - Watchdog Reset or XIR - Externally Initiated Reset).

Now that we have a concept of the different traps and processor states, let's look at the sequence of execution when a trap occurs to deliver the trap to the supervisor (kernel).

2.2.2.2 Normal Trap (Processor in Execute State)

1. If TL = MAXTL-1, the processor enters RED state (Goto 2.2.2.3).
2. TL = TL + 1
3. Processor state, %pc, %npc, CWP, CCR (Condition Code Register), TT and ASI (Address Space Identifier register) are saved.
4. The PSTATE (Processor State) register is updated as follows:
   a) RED field set to zero
   b) AM (Address Masking) disabled
   c) PRIV (Privileged Mode) enabled
   d) IE cleared, disabling interrupts
   e) AG set (Alternate Global Registers enabled)
   f) Endian mode set for traps (TLE)
   Refer to the architecture manual for a description of PSTATE.
5. If TT is a register window trap, CWP is set to point to the register window to be accessed by the trap handler code. Possibilities are:
   a) TT = 0x24 (Clean Window), CWP = CWP + 1
   b) 0x80 <= TT <= 0xbf (Window Spill), CWP = CWP + CANSAVE + 2. CANSAVE is a register that contains the number of register windows following the CWP that are NOT in use.
   c) 0xc0 <= TT <= 0xff (Window Fill), CWP = CWP - 1
   For non-register window traps, CWP remains unchanged.
6. Control is transferred to the trap table at an address calculated as follows:

       New %pc  = TBA | ((TL>0 ? 1 : 0) << 14) | (TT << 5)
       New %npc = TBA | ((TL>0 ? 1 : 0) << 14) | (TT << 5) | 0x4

   Here TL is the trap level before the increment in step 2 and TT is the trap type. Remember, TBA = Traptable Base Address, similar to the TBR in v8. Execution then resumes at the new %pc and %npc.

2.2.2.3 Special Trap (Processor in RED State)

1. TL = MAXTL
2. The existing state is preserved as in 2.2.2.2, step 3.
3.
The PSTATE is modified as per 2.2.2.2, step 4 except that the RED field is asserted.
4. If TT is a register window trap, CWP processing occurs as in 2.2.2.2, step 5.
5. Implementation specific state changes may occur. For example, the MMU may be disabled.
6. Control is transferred to the RED state trap table subject to the trap type. Look back to 2.2.2.1 for the RSTVaddr information to see how this vector is made.

This may seem rather complicated but once you have the picture built clearly it will all fall into place. Post or email if you need clarification.

3 TRAPS - HOW SUNOS HANDLES THEM

In this section we will look at how SunOS handles traps and look at some of the alternatives which were available. Despite all the differences between SPARC v8 and v9 traps I'll do a fairly generic description here as it really isn't necessary to describe in detail what SunOS does for v9 traps as you can see from the previous section what the differences in trap processing are. Suffice to say that the SunOS kernel adheres to those rules. Instead, we'll concentrate on the principles used by the kernel when handling various traps.

3.1 Generic Trap Handling

We'll look at some specifics in a moment but first we'll cover the generic trap handling algorithm. When traps are handled, the typical procedure is as follows:

1. Check CWP. If we need to handle the trap by jumping to 'C' (which would use save and restore instructions between function calls) then we must make sure we won't cause an overflow when we dive into 'C'. If we do detect that this would be a problem we do the overflow processing now.
2. Is this an interrupt? If so, jump to the interrupt handler. Refer to section 3.3 on interrupts.
3. Enable traps and dive into the standard trap handler. We enable traps so that we can catch any exceptions brought about by handling *this* trap without causing a watchdog reset.
4.
On return from the trap handler, we check the CWP with the CWP we came in with at the start to see if we have to undo the overflow processing we might have done before, so that we don't get an underflow when we return to the trapped instruction (or worse, execution continues in the WRONG window). 5. Before we actually return to the trapped instruction, we check to see if kprunrun is asserted (ie. a higher priority lightweight process is waiting to run). If so, we allow preemption to occur. Traps are used by SunOS for system calls as well as for machine generated exceptions. The parameters to the system call are placed in the output registers, the number of the system call required (see /usr/include/sys/syscall.h) is placed in %g1 and then it executes a "ta 0x8" instruction. This appears in the kernel as a trap with TT = 0x88 and the system trap handler determines this to be a system call and calls the relevant function as per the system call number in %g1. Occasionally, a process will attempt to execute from a page of VM that is not mapped in (ie. it is marked invalid in the MMU) and this will cause a text fault trap. The kernel will then attempt to map in the required text page and resume execution. However, if the process does not have the correct permissions or the mapping cannot be satisfied then the kernel will mark a pending SIGSEGV segmentation violation against that process and then resume execution of the process. A similar scenario applies to data faults; a process attempts to read or write to an address in a page marked invalid in the MMU and the kernel will attempt to map in the corresponding page for this address if possible (ie. maybe the page has been swapped out or this is the first attempt to read from that page and so we demand-page it in). I'll explain all this in detail in another text on process address spaces, paging and swapping which I plan to do as soon as I get time. 
A "bad trap" is simply a trap that cannot be handled (or isn't supported). Usually under SunOS a bad trap has a type of 9 or 2, for data or text fault respectively (maybe 7 for alignment in some cases). 3.2 Register Windows =================================TOP=============================== Q86: Is there anything similar to posix conditions variables in Win32 API ? Ian Emmons wrote: > > Dave Butenhof wrote: > > > > kumari wrote: > > > > > > Is there anything similar to posix conditions variables in Win32 API ? > > > Thanks in advance for any help. > > > -Kumari > > > > The answer depends very much upon what you mean by the question. Win32 > > has "events", which can be used to accomplish similar things, so the > > answer is clearly "yes". Win32 events, however, behave, in detail, very > > differently, and are used differently, so the answer is clearly "no". > > Which answer do you prefer? ;-) > > Good answer, Dave. This is one of the most frustrating things about the > Win32 threading API. CV's are incredibly powerful and fairly easy to use, > but Win32 unfortunately ommited them. > > In WinNT 4.0, there is a new API called SignalObjectAndWait which can be used > to implement a CV pretty easily. There are two problems: > > (1) This API is not available on WinNT 3.51 or Win95. Hopefully it will show > up in Win97, but I don't know for sure. > > (2) Using this API with a mutex and an auto-reset event, you can create a > CV-lookalike where PulseEvent will behave like pthread_cond_signal, but there > is no way to immitate pthread_cond_broadcast. If you use a mutex and a > manual event, PulseEvent will behave like pthread_cond_broadcast, but there > is no way to immitate pthread_cond_signal. (Sigh ...) > > I know ACE has a Win32 CV that works in general, but I seem to recall Doug > Schmidt saying that it's very complex and not very efficient. 
> > Ian =================================TOP=============================== Q87: What if a cond_timedwait() times out AND the condition is TRUE? [This comment is phrased in terms of the JAVA API, but the issues are the same. -Bil] > After thinking about this further even your simple example can fail. > Consider this situation. A number of threads are waiting on a condition, > some indefinitely, some with timeouts. Another thread changes the > condition, sets your state variable and does a notify(). One of the waiting > threads is removed from the wait-set and now vies for the object lock. At > about the same time a timeout expires on one of the other waiting threads > and it too is removed from the wait-set and vies for the lock - it gets the > lock! This timed-out thread now checks the state variable and wrongly > concludes that it received a notification. > > In more complex situations where there are multiple conditions and usage of > notifyAll() even more things can go wrong. You are correct with everything you say, right up until the very last word. The behavior is NOT wrong. It may not be what you *expected*, but it's not wrong. This is a point that's a bit difficult to get sometimes, and it drives the real-time crowd to distraction (as well it should), but for us time-shared folks, it's cool. When you get a time-out, you've got a choice to make. Depending upon what you want from your program, you may choose to say "Timed-out! Signal error!" or you may choose to check the condition and ignore the time out should it be true. You're the programmer. An important detail here... Everything works CORRECTLY. In particular, if a thread receives a wakeup, it is removed from the wait queue at that point and CANNOT subsequently receive a timeout. (Hence it may take another hour before it obtains the mutex, but that's OK.)
A thread which times out will also be removed from the sleep queue and a subsequent wakeup (pthread_cond_signal()) will be delivered to the next sleeping thread (if any). =================================TOP=============================== Q88: How can I recover from a dying thread? [OK. So that's not *exactly* the question being addressed here, but that's the most important issue. -Bil] David Preisler wrote: > > I wish to create an *efficient* and *reliable* multi process/multi threaded > algorithm that will allow many simultaneous readers (for efficiency) to access > a block of shared memory, but allows one and only one writer. > > How could a read counter be guaranteed to be always correct even if your read > thread or process dies??? Sorry to disillusion you, but this is impossible. Remember that you're talking about SHARED MEMORY, and assuming that SOMETHING HAS GONE WRONG with some party that has access to this shared memory. Therefore, there is no possibility of any guarantees about the shared memory -- including the state of the read-write lock. You can approximate the guarantees you want by having some third party record the identity of each party accessing the lock, and periodically validate their continued existence. You could then, (assuming you'd coded your own read-write lock that allowed this sort of manipulation by a third party), "unlock" on behalf of the deceased read lock holder. Just remember that the fact that the party had DECLARED read-only intent, by locking for read access, doesn't guarantee that, in the throes of death, it couldn't have somehow unintentionally modified the shared data. A read-lock really is nothing more than a statement of intent, after all. And how far do you wish to trust that statement from a thread or process that's (presumably equally unintentionally) blowing its cookies?
/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation                      [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698       http://www.awl.com/cp/butenhof/posix.html |
\-----------------[ Better Living Through Concurrency ]----------------/
=================================TOP===============================
Q89: How to implement POSIX Condition variables in Win32?

Subject: Re: How to implement POSIX Condition variables in Win32 (LONG)

Douglas C. Schmidt wrote:

The following function creates and initializes a condition variable.

int
pthread_cond_init (pthread_cond_t *cv, const pthread_condattr_t *)
{
  cv->waiters_ = 0;
  cv->generation_count_ = 0;
  cv->release_count_ = 0;

  // Create a manual-reset Event.
  cv->event_ = ::CreateEvent (NULL,   /* no security */
                              TRUE,   /* manual reset */
                              FALSE,  /* non-signalled */
                              NULL);  /* unnamed */
  return 0;
}

The following pthread_cond_wait function waits for a condition and atomically releases the associated external_mutex:

int
pthread_cond_wait (pthread_cond_t *cv, pthread_mutex_t *external_mutex)
{
  // Remember the generation in which we started waiting.
  int c = cv->generation_count_;
  cv->waiters_++;

  for (;;) {
    ::LeaveCriticalSection (external_mutex);

    // Wait until the event is signaled.
    ::WaitForSingleObject (cv->event_, INFINITE);

    ::EnterCriticalSection (external_mutex);

    // Exit the loop when the event_
    // is signaled and there are still waiting
    // threads from this generation that haven't
    // been released from this wait yet.
    if (cv->release_count_ > 0 && cv->generation_count_ != c)
      break;
  }

  --cv->waiters_;

  // If we're the last waiter to be notified
  // then reset the manual event.
  cv->release_count_--;
  if (cv->release_count_ == 0)
    ::ResetEvent (cv->event_);
  return 0;
}

This function loops until the event_ HANDLE is signaled and at least one thread from this ``generation'' hasn't been released from the wait yet. The generation_count_ field is incremented every time the event_ is signaled via pthread_cond_broadcast or pthread_cond_signal.
It tries to eliminate the fairness problems with Solution 1, so that we don't respond to notifications that have occurred in a previous ``generation,'' i.e., before the current group of threads started waiting.

The following function notifies a single thread waiting on a Condition Variable:

int
pthread_cond_signal (pthread_cond_t *cv)
{
  if (cv->waiters_ > cv->release_count_) {
    ::SetEvent (cv->event_);
    cv->release_count_++;
    cv->generation_count_++;
  }
  return 0;
}

Note that we only signal the Event if there are more waiters than threads currently being released. Finally, the following function notifies all threads waiting on a Condition Variable:

int
pthread_cond_broadcast (pthread_cond_t *cv)
{
  if (cv->waiters_ > 0) {
    ::SetEvent (cv->event_);
    cv->release_count_ = cv->waiters_;
    cv->generation_count_++;
  }
  return 0;
}

Unfortunately, this implementation has the following drawbacks:

1. Busy-waiting -- This solution can result in busy-waiting if the waiting thread has highest priority. The problem is that once pthread_cond_broadcast signals the manual reset event_ it remains signaled. Therefore, the highest priority thread may cycle endlessly through the for loop in pthread_cond_wait.

2. Unfairness -- The for loop in pthread_cond_wait leaves the critical section before calling WaitForSingleObject. Thus, it's possible that another thread can acquire the external_mutex and call pthread_cond_signal or pthread_cond_broadcast again during this unprotected region. Thus, the generation_count_ will increase, which may fool the waiting thread into breaking out of the loop prematurely and stealing a release that was intended for another thread.

3. Potential for race conditions -- This code is only correct provided that pthread_cond_signal and pthread_cond_broadcast are only ever called by a thread that holds the external_mutex. That is, code that uses the classic "condition variable signal" idiom shown above will work.

Doug
--
Dr. Douglas C.
Schmidt ([email protected]) Department of Computer Science, Washington University St. Louis, MO 63130. Work #: (314) 935-4215; FAX #: (314) 935-7302 http://www.cs.wustl.edu/~schmidt/ =================================TOP=============================== -- ================ Bil LambdaCS.com =================================TOP=============================== Q90: Linux pthreads and X11 Date: Mon, 16 Feb 1998 17:47:03 +0000 Organization: Visix Software Steve Cusack wrote: > I've just started using Linux pthreads and have immediately run into > the fact that the package appears to be incompatible with X. I've > read (via DejaNews) that X is "threads unsafe" and that others have > had similar problems (X refusing to start). Does anyone have X11 and > pthreads working together on a Linux system? If so, what did you have > to do? I ported Vibe (a multi-threaded Java IDE) to Linux, so I have made X11 and linuxThreads work together. It's not easy, unless you can target a glibc2 platform. You need to do one of two things: either get/create a recompiled -D_REENTRANT version of the X libraries, or patch linuxThreads to use the extern int errno as the errno for the initial thread. I chose to create thread aware Xlibs. (That was not fun. For some reason the build process would get into an infinite loop. I don't remember how I got it to work) You should be able to search Deja-news for pointers to patching LinuxThreads. Doug -- [email protected] Replace "xyzzy" with Victor India Sierra India X-Ray to email me. ================ Get yourself a copy of Redhat 5.0. It comes with pthreads and thread safe X libs. -Arun ================ For such-compiled binaries for ix86-libc5, see ftp://ftp.dent.med.uni-muenchen.de/pub/wmglo/XFree86-3.3-libs.tar.gz Strange, for me it was very easy. Just adding -D_REENTRANT next to -D_POSIX_SOURCE in linux.cf did the trick. But remember: These libs are still not thread-safe (you can make only one X11 call at a time -- I don't think this is too bad).
The better option at this stage really is glibc2 with X11 libs compiled with XTHREADS.

Regards, Wolfram.
=================================TOP===============================
Q91: One thread runs too much, then the next thread runs too much!

[I've seen variations on this concern often. Johann describes the problem very well (even if he can't find the "shift" key). -Bil]

=================== The Problem ================================

Johann Leichtl wrote:
> hi,
>
> i have a problem using pthreads in c++.
>
> basically what i want to do is have class that manages a ring buffer and
> say 2 threads, where one adds entries to the buffer and one that removes
> them.
> i have a global object that represents the buffer and the member
> functions make sure that adding and removing entries from the buffer
> work ok.
>
> the 2 functions:
>
> inline void ringbuf::put(int req)
> {
>   pthread_mutex_lock(&mBufLock);
>   while(elemNum == size)
>     pthread_cond_wait(&cNotFull, &mBufLock);
>   buf[next] = req;
>   next = (next + 1) % size;
>   elemNum++;
>   pthread_cond_signal(&cNotEmpty);
>   pthread_mutex_unlock(&mBufLock);
> }
>
> inline void ringbuf::get(int& req)
> {
>   pthread_mutex_lock(&mBufLock);
>   while(elemNum == 0)
>     pthread_cond_wait(&cNotEmpty, &mBufLock);
>   req = buf[last];
>   last = (last + 1) % size;
>   elemNum--;
>   pthread_cond_signal(&cNotFull);
>   pthread_mutex_unlock(&mBufLock);
> }
>
> now my problem is that my consumer thread only wakes up after the buffer
> is full. i tried different buffer sizes and simulated work in both
> producer and consumer.
> when i use a sleep() function in the producer (and or consumer) i can
> get the thing to look at the buffer earlier.
>
> i was wondering if anybody would have some input on what the problem
> could be here. i've done something similar with UI threads and not C++
> and it works fine.
>
> thanks a lot.
>
> hans
> [email protected]

=================== The Solutions ================================

Use either threads w/ system scope or call thr_setconcurrency to increase the concurrency level... Here's a handy thing to stick in a *Solaris* pthreads program:

    #include <thread.h>
    #include <unistd.h>

    thr_setconcurrency(sysconf(_SC_NPROCESSORS_ONLN)+1);

This will give you as much actual concurrency as there are processors on-line plus one. It's a starting point rather than a fix-all, but will cure some of the more obvious problems...

- Bart

--
Bart Smaalders                  Solaris Clustering      SunSoft
[email protected]               (415) 786-5335          MS UMPK17-201
http://playground.sun.com/~barts                        901 San Antonio Road
                                                        Palo Alto, CA 94303
=================================TOP===================
Johann,

No, actually you *don't* have a problem. Your program works correctly, it just happens to work in a slightly unexpected fashion. The functions that call put & get are probably unrealistically simple and you are using local scheduling. First the buffer fills up 100%, then it completely empties, then it fills up again, etc.

Make your threads system scoped and you'll get what you expect. [You'll notice Bart suggests a different method for obtaining the same results (ie. more LWPs). I like this method because I think it's a clearer statement of intention AND PROCTOOL will give me LWP statistics, but not thread statistics.]

(You can look on the web page below for exactly this program, written in C, one_queue_solution.c.)

-Bil
=================================TOP===============================
Q92: How do priority levels work?

Kamal Kapila wrote:
>
> Hi there,
>
> I'm working on an internal package to provide platform independent
> thread services (the initial platforms are DECUNIX 4.0 and Windows NT).
> The problem I'm having is understanding the thread scheduling on
> DECUNIX.
>
> It would seem to me logical that the threads of a process would have the
> same priority and policy of their associated process by default.
> However, when I check the process priority/policy I get completely
> different values from when I check the individual thread priorities and
> policies. In fact, the priority values do not seem to even follow the
> same scale (I can set process priorities from 0-63, while thread
> priorities go only from 0-31). In addition, setting the process
> priority does not seem to affect the thread priorities at all (!).

Basically, there are "interaction issues" in implementing a 2-level scheduling model (as in Digital UNIX 4.0), that POSIX didn't attempt to nail down. We deferred dealing with these issues until some form of industry consensus emerged. That industry consensus has, since, not merely "emerged", but has become a mandatory standard in the new Single UNIX Specification, Version 2 (UNIX98).

With 2-level scheduling, it really doesn't make much sense to "inherit" scheduling attributes from the process -- because those attributes MEAN entirely different things.

Digital UNIX, by the way, doesn't really have a "process" -- it has (modified) Mach tasks and threads. (There are a set of structures layered over tasks to let the traditional UNIX kernel code deal with "processes" in a more or less familiar way, but a process is really sheer illusion.) Since tasks aren't scheduled, they really have no scheduling attributes -- threads do. Since a non-threaded process has a task and a single thread, the 1003.1b (realtime) scheduling functions operate, in general, on the "initial thread" of the specified "process".

The kernel thread scheduling attributes control scheduling between various kernel threads. But a POSIX thread is really a user object, that we map onto one or more kernel threads (which we call "virtual processors"). Pretending to set the scheduling attributes of this thread to the "process" attributes makes no sense, because the scheduling domain is different.
POSIX threads are scheduled only against other threads within the process -- not against kernel threads in other processes. POSIX provides a way to create threads that you really want to be scheduled against other kernel threads -- essentially, forcing the POSIX thread to be "bound" to a kernel thread itself, at the expense of (often substantially) higher scheduling costs. This is called "system contention scope". Digital UNIX 4.0 didn't support system contention scope (which is an optional feature of POSIX), but we've added it for the next version (4.0D). Each system contention scope (SCS) thread has its own scheduling attributes, independent of the process. While it might make some intuitive sense to inherit the process priority, POSIX doesn't provide any such semantics. A newly created thread either has explicit scheduling attributes, or inherits the attributes of the thread that created it. Of course, since setting the "process" attributes affects the initial thread, threads that IT creates will inherit the "process" attributes by default. But changing the "process" attributes won't (and shouldn't) affect any SCS threads in the process. The ambiguity (and the only relevant question for the implementation you're using, which doesn't support SCS threads), is, what happens to the virtual processors that are used to execute POSIX threads, when the "process" scheduling attributes are changed? And with what attributes should they run initially? UNIX98 removes the (intentional) POSIX ambiguity by saying that setting the 1003.1b scheduling attributes of the "process" WILL affect all "kernel entities" (our virtual processors, Sun's LWPs) used to execute process contention scope (PCS, the opposite of SCS) threads. By extension, the virtual processors should initially run with the existing process scheduling attributes. This will be true of any UNIX98 branded system -- but until then, there's no portable rules. 
The fact that the POSIX thread interfaces don't use the same priority range as the system is a stupid oversight -- I just didn't think about it when we converted from DCE threads to POSIX threads for 4.0. This has been fixed for 4.0D, though it's a bit too substantial a change (and with some potential risk of binary incompatibilities) for a patch. > (BTW, I am using sched_getparam() and sched_getscheduler() to get the > process related values and pthread_getparam() to get the thread related > values). Right. > Specifically, I have the following questions : > > - What is the relationship between the process priority/policy and the > thread priority and policy ? There's very little relationship. Each POSIX thread (SCS or PCS) has its own scheduling attributes (priority and policy) that are completely independent of "process" attributes. UNIX98, however, says that the "kernel entities" used to execute PCS POSIX threads WILL be affected by changes to the "process" scheduling attributes -- but SCS threads will not (and should not) be affected by such changes. (Nor will the scheduling attributes of PCS threads, even though their "system scheduling attributes" effectively come from the virtual processor, which is affected.) > - Does the scheduler schedule individual threads independently, or are > processes scheduled, with a process's threads then sharing the process > CPU time? As I said, there's no such thing as a process, and the closest analog, the Mach task, isn't a schedulable entity. All threads are scheduled independently -- each has its own scheduling attributes, its own time slice quantum, etc. On Digital UNIX 4.0, with only PCS threads, the kernel schedules the virtual processor threads of all the processes (plus the single kernel threads associated with all non-threaded processes). Threaded processes also contain a user-mode scheduler, which assigns PCS threads to the various virtual processors, based on the PCS thread scheduling attributes. 
(A process has one virtual processor for each available physical processor on the system.) On Digital UNIX 4.0D, with SCS thread support added, each process may also have any number of SCS threads, which map directly to individual and independent kernel threads. SCS threads are scheduled the same as virtual processors -- each has its own scheduling attributes, time slice quantum, etc. (It might seem that managing CPU time by kernel threads rather than by processes allows users to monopolize the system by creating lots of kernel threads. But they could do that by creating lots of processes, too... and a kernel thread is cheaper for the system than a process, which is really a thread plus a task. The ability to create new kernel threads, as well as processes, is limited both by user and system quotas. And of course, in 4.0, users can't actually create new kernel threads -- only POSIX threads, which are mapped to the process' existing virtual processors.) So each process presents a set of runnable kernel threads to the kernel: A mix of SCS threads and the various PCS threads currently mapped on to one or more virtual processors. The kernel then determines which kernel threads to schedule on each processor. (That's why it's called "2-level scheduling".) > - Is the thread's overall priority a combination of the process priority > and the individual thread priority ? If so, how is this determined ? Currently, "process" priority is irrelevant for a threaded process. Virtual processors don't inherit the process priority. (Actually, they sort of do, and the first virtual processor is the initial process thread, which can be changed using the 1003.1b functions -- but the kernel generates "replacement" virtual processors at various times, and these currently are always set to the default scheduling attributes [timeshare policy and priority 19].) POSIX thread priority determines which threads the user-mode scheduler assigns to the various virtual processors. 
Because the virtual processor priority doesn't change (the whole point of 2-level scheduling is to avoid expensive kernel calls), the POSIX thread priority has no effect on the kernel scheduling. That's OK, except in rare cases where applications comprising multiple PROCESSES have threads (in different processes) that really need to directly preempt each other based on priority.

> I have read through all of the Digital documentation that I have but I
> have not been able to find any clear answers to my questions.

A description of the behavior (though in less technical/internal detail than the one in this posting) can be found in Appendix A (section A.3) of the Digital UNIX "Guide to DECthreads" manual.

/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation                      [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698       http://www.awl.com/cp/butenhof/posix.html |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP===============================
Q93: C++ member function as the startup routine for pthread_create().

Hi,

You need a static member function

    static void *function_name(void *);

in the class declaration. Usually you then pass the address of the class instance as the parameter, so the static function can access the non-static members of the class.

- Robert

On 19 Sep 1997 03:15:22 GMT, [email protected] (Phil Romig) wrote:
>
>
>I know I should be able to figure this out, but I'm missing something.
>
>I want to pass a member function as the startup routine for pthread_create().
>That is, I want to create an instance of a class and then pass one of
>the public member functions as the start routine to pthread_create().
>
>I believe the question comes down to how to describe the address of the
>member.
>Simple qualification (class::member) will not work because I
>need the address of the function that goes with the particular
>instance of the class.
>
>For the record I'm working on an HPUX 10.01 system so I'm using pthreads
>draft 4, rather than the current standard.
>
>Any advice, pointers and suggestions are welcome.
>thanks
>Phil
>[email protected]
>
>
>A quick example of what I want to try:
>
>   class foo {
>   public:
>     foo(int i)
>     void *go(void *arg);
>   }
>
>   main() {
>     foo *bar = new foo(1);
>
>     pthread_create(...,&(bar->go),....);
>}

=================================TOP===============================
Q94: Spurious wakeups, absolute time, and pthread_cond_timedwait()

[Bil: The summary is "Retest conditions with CVs." and "The time-out is an absolute time because it is." (NB: Deltas are a proposed extension to POSIX.) This is a nice exposition.]

Brian Silver wrote:
>
> Ben Self wrote:
> >
> [Snip]
> > The standard specifies that pthread_cond_wait() and
> > pthread_cond_timedwait() may have spurious wakeups. The reason for this
> > is that a completely reliable once and only once wake up protocol can be
> > excessively expensive for some asymmetric multiprocessor systems.
>
> Well, maybe I'm being a bit anal about this, but this
> really isn't the case. If it was, then you'd have the
> same issue for mutexes as well, and the standard does
> not allow for spurious wakes on mutexes.

[Bil: Actually, you DO get spurious (define "spurious"!) wakeups with mutexes, YOU just never see them.]

> The "while(predicate)/wait" construct is very common
> in concurrent environments (regardless of their symmetry).
> The reason is that since the environment is highly
> unpredictable, while you were coming back out of the
> wait, the state of the thing that you were waiting for
> may have changed.
>
> This construct is used to implement mutexes as well,
> it's just that you don't see it since the predicate
> is known; it is the state of the mutex lock.
> Cv's force
> the construct to the user code because the predicate
> is not known to the implementation of the cv itself. The
> warnings about spurious wakes are taken seriously when
> mutexes are implemented, and are accounted for in the
> exact same "while(predicate)/wait" construct.
>
> Wake-only-once doesn't really help. It will remove the
> addition of spurious wakes, but it won't account for
> "valid wake, but the predicate changed". Implementing
> wake-only-once is expensive when you consider that this
> solution solves both problems.
>
> Also note that the mutex lock around the predicate
> doesn't solve this problem either. There is a race
> that starts once you see the wake and ends once you
> reacquire the mutex. In that time, another thread can
> get the mutex and change the data (believe me, it
> happens - more often than you'd expect). When you
> reacquire the mutex, and exit the wait, the predicate
> has changed and you'll need to go back to waiting.
>
> Now, a wake-only-once,-and-gimme-that-mutex atomic
> operation might be nice.
>
> Brian.

I am not reposting to be defensive or argumentative. Upon reflection, however, I have come to the conclusion that neither I nor subsequent posters have really dealt with the original poster's question, let alone the new topics that we have thrown about.

Since this is a response largely to Brian Silver's post, a person I have a good deal of respect for, I have chosen to include some quotes from Dave Butenhof's book, Programming with POSIX Threads, because we both know it and have a mutual admiration for his work.

First off, and most importantly, the original question I attempted to answer was:

Fred A. Kulack wrote:
> A side question for the rest of the group...
> All the applications I've seen use a delta time for the wait and must
> calculate the delta each time a cond_timedwait is done. What's the rationale
> for the
> Posix functions using an ABSOLUTE time?
I was hoping for an uncomplicated answer and the spurious wakeup issue seemed to fit the bill. My writing and thinking, however, were too simplistic to provide any meaningful insight. So I will try again. Please realize that some of what you will read below is purely personal supposition. Chime in if I have misinformed.

1) Spurious wakeup is a reason for passing an absolute value to cond_timedwait. It is not the reason or even a particularly important reason. The standard (POSIX 1003.1c-1995) specifically states that a compliant implementation of pthread_cond_timedwait() may suffer from spurious wakeups. It therefore is reasonable to use an absolute timeout value instead of a delta to simplify the act of retry.

2) More importantly it is also very likely a performance issue. Most systems when scheduling a software interrupt use an absolute value that reflects an offset into the OS's epoch. To constantly be re-evaluating a delta in user code is excessively expensive, especially if most systems really want an absolute value anyway.

3) Also there is the reality that the structure timespec is the high resolution time value of choice in POSIX. And timespec happens to represent its time as absolute time. Add into that the needs of the powerful realtime group that had a great impact on the shape of POSIX 1003.1c. What integral unit would we use for a delta anyway? And would it be in nanoseconds? Eek!

4) Most importantly one would hope that the interface were constructed to promote good coding techniques. As Brian Silver stated, the "while(predicate)/wait" idiom is an important technique for far more reasons than just spurious wakeups. By using an absolute timeout value as opposed to a delta this idiom is directly supported by easing its use.

When I originally brought up the "while(predicate)/wait" idiom it was because spurious wakeups would necessitate retrying the predicate.
I did not intend to state that this was the only or even a particularly important reason for the pattern.

The "while(predicate)/wait" idiom or an equivalent is essential to programming with condition variables.

1) Most important is the reason Brian Silver stated, "There is a race that starts once you see the wake and ends once you reacquire the mutex." It would be difficult and detrimental to concurrency to construct through synchronization a situation that did not require re-testing of the predicate after a wakeup. This is why Brian's magic bullet "wake-only-once,-and-gimme-that-mutex atomic" does not exist. Although it would be nice.

2) Spurious wakeups do exist. Be consoled by the fact that "The race conditions that cause spurious wakeups should be considered rare. [Butenhof]"

3) Also, it enables a powerful technique that I have been using for several years with great success that Dave Butenhof refers to as "loose predicates". "For a lot of reasons it is often easy and convenient to use approximations of actual state. For example, 'there may be work' instead of 'there is work'." I will go one step beyond that: in my experience of coding distributed web servers, there are situations when the notification mechanism cannot know with certainty that there is work without actually having performed the entirety of the task itself. Often the best a distributed component has to work with is trends and potentialities.

Lastly, (whew ;) I believe that I have overstated the significance of the performance implications of only once wakeups. "Excessively expensive" is a bit strong without further qualification. If it were such a paramount issue, Brian Silver is right, mutexes would suffer from the same restrictions and they absolutely do not.
There is a performance issue that I have run across many times and have seen cited in many references including : "Spurious wakeups may sound strange but on some multiprocessor systems, making condition wakeup completely predictable might substantially slow all condition variable operations. [Butenhof]" Never-the-less, it is the fact that making wakeup completely predictable does not get you that much. You still need to retest your predicate. In the end it is such an easy and cheap thing when taken in the context of the overhead of the synchronization and latency of the wait. --ben ----------- Ben R. Self [email protected] www.opentext.com Open Text Corporation -- Home of Livelink Intranet More on spurious wakeups It is so because implementations can sometimes not avoid inserting these spurious wakeups; it might be costly to prevent them. Perhaps more importantly, your own program's logic can introduce spurious wakeups which cannot be eliminated. This can start happening as soon as there are more than two threads. You see, a condition waiting thread which has been signaled may have to compete with another thread in order to re-acquire the mutex. If that other thread gets the mutex first, it can change the predicate, so that when finally the original thread acquires it, the predicate is false. This is also a spurious wakeup, for all purposes. To make this form of spurious wakeup go away, the semantics of condition variables would have to change in troublesome ways, back to the original monitors and conditions concept introduced by Quicksort father C. A. R. Hoare. 
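The retest loop and the absolute timeout discussed above fit together naturally. The C sketch below is only an illustration: struct work_queue, take_work and add_work are hypothetical names, and the predicate is simply "pending > 0". The deadline is computed once, so a spurious or stolen wakeup just goes around the loop again without any recalculation of a delta.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical shared state for the illustration. */
struct work_queue {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    int             pending;   /* the predicate is "pending > 0" */
};

/*
 * Wait up to timeout_sec seconds for work. Returns true if an item
 * was taken, false if the deadline passed first.
 */
bool take_work(struct work_queue *q, time_t timeout_sec)
{
    struct timespec deadline;
    bool got = false;

    assert(q != NULL);

    /* Compute the absolute deadline once; it stays valid across retries. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&q->lock);
    while (q->pending == 0) {
        int rc = pthread_cond_timedwait(&q->nonempty, &q->lock, &deadline);
        if (rc != 0)
            break;             /* ETIMEDOUT (or an error): stop waiting */
        /* rc == 0 may be a real or a spurious wakeup: retest either way. */
    }
    if (q->pending > 0) {
        q->pending--;
        got = true;
    }
    pthread_mutex_unlock(&q->lock);
    return got;
}

/* Producer side: change the predicate under the mutex, then signal. */
void add_work(struct work_queue *q)
{
    assert(q != NULL);
    pthread_mutex_lock(&q->lock);
    q->pending++;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}
```

Note that the predicate is tested again after the loop, so a wakeup that arrives together with the timeout is not lost.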
Under Hoare's monitors and conditions, signaling a condition would atomically transfer the monitor to the first task waiting on the condition, so that woken task could just assume that the predicate is true:

    if (!predicate()) wait(&condition); /* okay */

The very useful broadcast operation does not quite fit into Hoare's model, for obvious reasons; the signaler can choose only one task to become the next monitor owner.

Also, such atomic transfers of lock ownership are wasteful, especially on a multiprocessor; the ownership transfer spans an entire context switch from one task to another, during which that lock is not available to other tasks. The switch can take thousands of cycles, inflating the length of a small critical region hundreds of times!

Lastly, a problem with Hoare's approach is that a ``clique'' of tasks can form which bounce ownership of the monitor among themselves, not allowing any other task entry into the monitor. No reliable provision can be made for priority-based entry into the monitor, because the signal operation implicitly ignores such priority; at best it can choose the highest priority thread that is waiting on the condition, which ignores tasks that are waiting to get into the monitor.

In the POSIX model, a condition variable signal merely wakes up a thread, making it runnable. The scheduling policy will effectively decide fairness, by selecting who gets to run from among runnable threads. Waking up of threads waiting on monitors and conditions is done in priority order also, depending on the scheduling policy.

> You know, I wonder if the designers of pthreads used logic like this:
> users of condition variables have to check the condition on exit anyway,
> so we will not be placing any additional burden on them if we allow
> spurious wakeups; and since it is conceivable that allowing spurious
> wakeups could make an implementation faster, it can only help if we
> allow them.
>
> They may not have had any particular implementation in mind.
You're actually not far off at all, except you didn't push it far enough. The intent was to force correct/robust code by requiring predicate loops. This was driven by the provably correct academic contingent among the "core threadies" in the working group, though I don't think anyone really disagreed with the intent once they understood what it meant. We followed that intent with several levels of justification. The first was that "religiously" using a loop protects the application against its own imperfect coding practices. The second was that it wasn't difficult to abstractly imagine machines and implementation code that could exploit this requirement to improve the performance of average condition wait operations through optimizing the synchronization mechanisms. /------------------[ [email protected] ]------------------\ | Compaq Computer Corporation POSIX Thread Architect | | My book: http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/ =================================TOP=============================== Q95: Conformance with POSIX 1003.1c vs. POSIX 1003.4a? christof ameye we41 xxxx wrote: > Some pthread libraries talk about conformance with POSIX 1003.1c and > others of conformance with POSIX 1003.4a. What are the > differences/similarities ? > I don't realy need the answer, but it might be interesting to know ... First off, "conformance" to 1003.4a is a completely meaningless statement. There's no such thing, because 1003.4a was never a standard. Furthermore, I strongly doubt that there ever *were* any implementations that could conform even if there was a way to apply meaning to the statement. 1003.4a was the original name of the thread standard -- named for the fact that it was developed by the realtime group (POSIX working group designation 1003.4). However, it, like the original realtime standard, was really an amendment to the "base standard", 1003.1. 
Eventually, POSIX decided to resolve the confusion by creating more -- and renaming all O/S API standards into the 1003.1 space. Thus, 1003.4 became 1003.1b, 1003.4a became 1003.1c, 1003.4b became 1003.1d, and so forth.

There were various draft versions of the standard while it was still named 1003.4a, but all are substantially different from the actual standard, and to none of them, technically, can any implementation "conform". The most common draft is 4, which was the (loose) basis for the "DCE thread" api, part of The Open Group's DCE suite. There's at least one freeware implementation that claimed to be roughly draft 6. IBM's AIX operating system provides a draft 7 implementation. The "pthread" interface on Solaris 2.4 was draft 8.

There is also at least one implementation which I've seen claiming to be "draft 10". Draft 10 was the final draft, which was accepted by the IEEE standards board and by ISO/IEC with only "minor" editorial changes. Nevertheless, draft 10 is NOT the standard, and, technically, one cannot "conform" to it. "Draft 10" and "1003.1c-1995" are NOT interchangeable terms.

Finally, because 1003.1c-1995 was never published as a separate document, the official reference is the 1003.1-1996 standard, which includes 1003.1b-1993 (realtime), 1003.1c-1995 (threads), and 1003.1i-1995 (corrections to the realtime amendment).

In terms of someone writing programs, a lot of that is irrelevant. But you need to be aware that there's no real definition of "conformance" for any drafts, so one vendor's "draft 4" is not necessarily the same as another's "draft 4", and, while that might be inconvenient for you, there's nothing "wrong" with it. (Although, from the POSIX standards point of view, it was foolish and irresponsible of "them" [by which I really mean "us", since I wrote the most common draft 4 implementation, the original DCE threads reference library ;-) ] to use the "pthread" prefix at all.)
There are MANY differences between "draft 4" and standard POSIX threads. There are many (though slightly fewer) differences between draft 7 or 8 and the standard. There are even some differences between draft 10 and the standard. Look at a move between any two drafts, or between any draft and the standard, as a PORT to an entirely new threading library that has some similarities. Be very careful of the details, especially where things "appear to be the same". And, if you're stuck with a draft implementation, lobby the vendor to provide a full conforming POSIX implementation! /---------------------------[ Dave Butenhof ]--------------------------\ | Digital Equipment Corporation [email protected] | | 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof | | Nashua NH 03062-2698 http://www.awl.com/cp/butenhof/posix.html | \-----------------[ Better Living Through Concurrency ]----------------/ =================================TOP=============================== Q96: Cleaning up when kill signal is sent to the thread.? > I'm writing a multi-threaded daemon, which requires some cleanup if a > kill signal is sent to the thread. I want just the thread that received > the signal to exit. > > The platform is Linux 2.0, libc 5.4.23, linuxthreads 0.6 (99% POSIX > threads). > > The docs indicate that threads share signal functions, but can > individually block or accept certain signals. This is workable -- but > how do I get the thread id of the thread that received the signal? > > And my next question, how portable are thread cleanup routines? > > Thanks, > > Jeff Garzik Quality news feeds > News Administrator INN Technical info, Consulting > Spinne, Inc. http://www.spinne.com/usenet/ Jeff, From the sounds of what you say, the answer is "No." :-) Meaning, don't do that. There's a better method, cancellation. If you really want the thread to exit asynchronously, that's the way to do it. 
Now, it is even more likely that a simple polling routine will do the job, and that would be even easier to write.

-Bil

(There's a nice little cancellation example on the web page below.)

=================================TOP===============================
Q97: C++ new/delete replacement that is thread safe and fast?

> [email protected] (Bob Pearson) writes:
>
> > Our platform is Solaris 2.5.1 and I am looking for a commercial, freeware
> > or shareware C++ new/delete replacement that is thread safe and uses more
> > than a single mutex. We have a multi-threaded application that is using a
> > tremendous amount of new operators and is huge (>200MB) and is constantly
> > running into very high mutex contention due to the single mutex for new in
> > libC.a from:
> >
> > SUNWSpro "CC: SC4.0 18 Oct 1995 C++ 4.1".
>
> You might want to check out ptmalloc:
>
> ftp://ftp.dent.med.uni-muenchen.de/pub/wmglo/ptmalloc.tar.gz
>
> I would hope that operator new somehow invokes malloc at a lower
> level. If not, you would have to write a small wrapper -- there
> should be one coming with gcc that you could use.
>
> Hope this helps,
> Wolfram.

Wolfram Gloger

=================================TOP===============================
Q98: beginthread() vs. endthread() vs. CreateThread? (Win32)

[Bil: Look at the description in "Multithreading Applications in Win32" (see books.html)]

Mark A. Crampton wrote:
>
> Juanra wrote:
> >
> > I'm a Windows 95 programmer and I'm developing a multithreaded
> > server-side application. I use the CreateThread API to create a new
> > thread whenever a connection request comes. I've read that it's better
> > to use beginthread() and endthread() instead of CreateThread because
> > they initialize the run time libraries. What happens with win32
> > CreateThread function?. Doesn't it work properly?. If not, I can't use
> > beginthread because I can't create my thread in a suspended mode and
> > release it after.
> >
> > Does the function beginthreadNT() work under win95?
>
> No
>
> >
> > Thanks in advance.
> > Juan Ra.
>
> Answer to beginthread - use _beginthreadex, which uses same args as
> CreateThread (you can create suspended). _beginthreadex works on 95 &
> NT but not Win32S. The privilege flags are ignored under 95.
>
> CreateThread _works_ OK - it just doesn't free memory allocated on the C
> run-time library stack when the thread exits. So you can attempt to
> clean up the runtime library stack, use _beginthreadex, or not use any C
> run time library calls.

=================================TOP===============================
Q99: Using pthread_yield()?

Johann Leichtl wrote:
>
> if i have some code like:
>
> ..
> pthread_yield()
> something(e.g. lock mutex)
> ..
>
> is it guaranteed that the thread will give up the cpu before getting the
> lock or not.

First off, to clarify, (you probably already know this, given the set of names in your subject line), "pthread_yield" is an obsolete DCE thread interface, not part of POSIX threads. As such, it is not covered by any formal standard, and has no real portability guarantees. The way it works on your particular DCE thread system is probably the way the developers wanted it to work on that system, and if you disagree there's no "higher authority" to which you might appeal.

POSIX specifies the behavior of sched_yield, (or, in fact, any scheduling operation), only with respect to the defined realtime scheduling policies, SCHED_FIFO and SCHED_RR. Threads running under one of these policies that call sched_yield will release the CPU to any thread (in SCHED_FIFO or SCHED_RR) running at the same priority. (There cannot be any at a higher priority, since they would have preempted the current thread immediately.)

Is that the same thing as "guaranteed [to] give up the cpu"? For one thing, sched_yield won't do anything at all if there are no other threads that are ready to run at the calling thread's priority; it'll just return.
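For readers who want to try the POSIX interface mentioned above, here is a small C sketch. It is an illustration only: yield_under_fifo is a hypothetical helper, and the attempt to switch the calling thread to SCHED_FIFO commonly fails with EPERM for unprivileged processes, in which case the yield simply happens under the thread's existing policy.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/*
 * Try to put the calling thread under SCHED_FIFO at the lowest realtime
 * priority, yield once, then restore the previous policy. The result of
 * the policy change is stored in *policy_rc (often EPERM without
 * privileges). Returns the result of sched_yield().
 */
int yield_under_fifo(int *policy_rc)
{
    struct sched_param old_param, fifo_param = {0};
    int old_policy;
    int rc;

    assert(policy_rc != NULL);

    /* Remember the current settings so they can be put back. */
    pthread_getschedparam(pthread_self(), &old_policy, &old_param);

    fifo_param.sched_priority = sched_get_priority_min(SCHED_FIFO);
    *policy_rc = pthread_setschedparam(pthread_self(), SCHED_FIFO,
                                       &fifo_param);

    /* With no other ready thread at this priority, this just returns. */
    rc = sched_yield();

    if (*policy_rc == 0)
        pthread_setschedparam(pthread_self(), old_policy, &old_param);
    return rc;
}
```

Nothing in this sketch makes the yield observable; it only shows where the call sits relative to the scheduling policy that gives it a defined meaning.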
If you have threads with non-standard scheduling policies, such as SCHED_OTHER, or a hypothetical SCHED_TIMESHARE, POSIX says nothing about the behavior of sched_yield. Most likely, (and at least in Digital's implementation), the function will do the same thing. It doesn't really worry about scheduling POLICY, only PRIORITY. Note that, because SCHED_OTHER doesn't necessarily imply preemptive scheduling, you might actually have a thread "ready to run" at a higher priority than the current thread's priority. Also, because non-realtime policies aren't necessarily "strictly priority ordered", and the system generally wants to simulate some sort of fairness in timeshare scheduling, it is possible (at least, "not ruled out by the standard") that a call to sched_yield from a non-realtime thread might yield to a thread with lower priority -- especially if that other thread is realtime.

/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation                      [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698       http://www.awl.com/cp/butenhof/posix.html |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP===============================
Q100: Why does pthread_cond_wait() reacquire the mutex prior to being cancelled?

Firstly, thanks to all of you who responded to my post in comp.programming.threads. As I alluded to in my posting, I felt quite sure the problem I was experiencing was one due to misunderstanding. I immediately suspected this when my program exhibited the same behaviour under HP-UX *and* Solaris.

Everyone told me the same thing: use a cleanup handler, pushed onto the cleanup handler stack for the active thread because pthread_cond_wait *reacquires* the mutex when it is cancelled.
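The advice summarized above (push a cleanup handler, because the mutex is held again by the time cancellation is acted upon) might be sketched in C as follows. The names data_lock, ready_cv, ready, unlock_mutex, waiter and cancel_waiter_demo are hypothetical; only the pthread calls are real. The demo cancels a waiting thread and then checks that the mutex was left unlocked.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t data_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready_cv  = PTHREAD_COND_INITIALIZER;
static int ready = 0;          /* never set here, so the waiter blocks */

/* Cleanup handler: runs with the mutex held if the wait is cancelled. */
static void unlock_mutex(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&data_lock);
    pthread_cleanup_push(unlock_mutex, &data_lock);
    while (!ready)
        pthread_cond_wait(&ready_cv, &data_lock);  /* cancellation point */
    pthread_cleanup_pop(1);    /* normal path: run the handler to unlock */
    return NULL;
}

/*
 * Cancel a waiter, then check that the mutex was released by the
 * cleanup handler. Returns 0 if the mutex could be locked afterwards.
 */
int cancel_waiter_demo(void)
{
    pthread_t tid;
    void *result = NULL;
    int rc;

    if (pthread_create(&tid, NULL, waiter, NULL) != 0)
        return -1;
    pthread_cancel(tid);
    pthread_join(tid, &result);
    assert(result == PTHREAD_CANCELED);

    rc = pthread_mutex_trylock(&data_lock);
    if (rc == 0)
        pthread_mutex_unlock(&data_lock);
    return rc;
}
```

Because the handler can rely on the mutex being locked whether cancellation arrives during the wait or later in the critical section, a single handler covers both cases.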
I can see how this causes other threads waiting on the same condition variable to fail to be cancelled, but for me, the $64,000 question is:

Why does pthread_cond_wait reacquire the mutex prior to being cancelled?

This seems like madness to me. We're _cancelling_ the thread, so we're no longer interested in the value of the data we're testing. Why acquire the lock and immediately require us to use a cleanup handler?

There must be something more to this ;-)

Thanks, Ben

---
Ben Elliston                   .====    E-mail: [email protected]
Compucat Research Pty Limited  / ====.  Web: http://www.compucat.com.au
Canberra ACT Australia         .==== / ====.

Ben,

You're not thinking hard enough! It *has* to be like this.

> Why does pthread_cond_wait reacquire the mutex prior to being cancelled?
>
> This seems like madness to me. We're _cancelling_ the thread, so we're no
> longer interested in the value of the data we're testing. Why acquire the
> lock and immediately require us to use a cleanup handler?

A: pthread_cond_wait(c, m);
B: do some work...
C: pthread_mutex_unlock(m);

If cancellation happened while sleeping (at A) or while running (at B), the same cleanup handler would run. If the state of the mutex was DIFFERENT at those locations, you'd be up the creek. Right?

-Bil

=================================TOP===============================
Q101: HP-UX 10.30 and threads?

Bryan Althaus wrote:
>
> Jim Thomas ([email protected]) wrote:
> : In article <[email protected]> [email protected] (Bryan Althaus) writes:
> :
> : Bryan> This is actually becoming a really bad joke. No HP people seem to want
> : Bryan> to talk about 10.30 though they will post on what compiler flags to
> : Bryan> use to compile an app using pthreads under 10.30!
> :
> : Bryan> And apparently once it does come out you must ask for it. 10.20 will be
> : Bryan> the shipping OS until HP-UX 11.0 comes out.
> :
> : Bryan> If anyone knows when 10.30 is shipping, please email me. I'll respect
> : Bryan> your privacy and not repost.
> : Bryan> We have a current need for threading,
> : Bryan> but since they must be spread out over multiple CPU's, kernel thread
> : Bryan> support is needed - hence HP-UX 10.30.
> :
> : I received an e-mail from "SOFTWARE UPDATE MANAGER" Saturday that says the
> : following. Note especially the part about not for workstations :-(
> :
> : Jim
> :
> Thanks for the info Jim. I received email from a kind soul at HP who
> basically explained the deal on 10.30, being used for people/ISV
> transitioning for HP-UX 11.0, but was not sure when it would be out.
> That was all I needed to know. Based on this we will use 10.20 for
> our product roll-out and when HP-UX 11.0 comes out maybe I revisit
> replacing the forking() code with threads. As it turned out, on
> a two CPU machine, the forking() code actually worked nicely and
> basically was written as if I were using pthreads and wasn't really
> the big hack I thought it was going to be. Of course each fork()
> costs an additional 70MB's of memory! :)
>
> Now if someone could let us know roughly the due date for HP-UX 11.0
> and maybe what we can look for in HP-UX 11.0. Obviously it will have
> kernel threads with pthreads API, NFS PV3, Streams TCP/IP, and support
> both 32 and 64 bit environments. Will HP-UX 11.0 ship more Net friendly
> with a Java Virtual Machine? Will the JVM be threaded? Will JRE be
> on all HP-UX 11.0 systems? WebNFS support? WebServer? Browser? Current
> OS's now come with all these goodies standard:

http://www.sun.com/solaris/new

--

=================================TOP===============================
Q102: Signals and threads are not suited to work together?

Keith Smith wrote:
>
> This is a question I posed to comp.realtime, but noticed that you have a
> discussion going on here.... can you offer me any assistance?
> Here's the excerpt:
>
> Shashi:
>
> Based on your previous email (below), I have a couple of questions:
>
> 1. If signals and threads are not suited to work together, what
> mechanism can/should be used to implement timing within a thread. If I
> have two threads that performed autonomous time-based functions, I want
> to be able to have a per-thread timing mechanism.
>
> 2. If the approach is "block all signals on all threads and send various
> signals to a process - putting the emphasis on the thread to unblock the
> appropriate signals", how do we deal with other threads which may be
> interrupted by a blocked signal (e.g. a read() call that returns EINTR
> even when its thread blocks the offending signal). Isn't this a flaw?
> This requires the need for a signal handler (wasteful) with the RESTART
> option specified.
>
> It seems like a per-thread mechanism is needed... how does NT
> accomplish this?
>
> ** I know I shouldn't be relying on a per-LWP signal, but how else can I
> accomplish what I am trying to do?
>
> In message " timer's interrupting system calls... HELP", [email protected]
> writes:
>
> >Hi,
> >Signals and threads are not suited to work together. Personally, I feel that
> >UNIX has a serious flaw in the sense that most blocking system calls (e.g.
> >read, semop, msgrcv etc) do not take a timeout parameter. This forces
> >programmers to use alarms and signals to interrupt system calls. I have
> >worked with operating systems such as Mach, NT and others which do not suffer
> >from this problem. This makes porting single threaded applications from UNIX
> >(which rely on signals) to a multithreaded process architecture difficult.
> >
> >Even though I come from a UNIX background (Bell Labs in late 80's) I have
> >learnt the hard way that signals make programs much more error prone. I have
> >worked extensively on Mach and NT and never saw a reason to use threads. As
> >far as POSIX.1c is concerned I think they did a favor to the users of threads
> >on UNIX by mandating that signals be a per-process resource. You have to
You have to
> >understand that LWP is more of a System V R4 concept (same on Solaris) and not
> >a POSIX concept. Two level scheduling is not common on UNIX systems (those
> >who implement it have yet to show a clear advantage of two level scheduling).
> >
> >I am sure that Dave Butenhof (frequent visitor to this newsgroup) would have
> >more insight as to why POSIX did not choose to implement signals on a
> >per-thread basis (or LWP as you say). I would advise that you should
> >rearchitect your application not to depend on per-thread (LWP) signals. I
> >feel you will be better off in the long run. Take care.
> >
> >Sincerely,
> >Shashi
> >

=================================TOP===============================
Q103: Patches in IRIX 6.2 for pthreads support?

Jeff A. Harrell wrote:
>
> radha subramanian wrote:
> >
> > I heard that a set of patches have to be applied in IRIX 6.2
> > for pthreads support. Could someone tell me which are these
> > patches ?
>
> 1404  Irix 6.2 Posix 1003.1b man pages           List  123Kb  07/01/97
> 1645  IRIX 6.2 & 6.3 POSIX header file updates   List   41Kb  07/01/97
> 2000  Irix 6.2 Posix 1003.1b support modules     List  164Kb  07/01/97
> 2161  Pthread library fixes                      List  481Kb  07/01/97
>
> The whole set is downloadable from:
>
>
>
> A SurfZone password is required.

=================================TOP===============================
Q104: Windows NT Fibers?

Ramesh Shankar wrote:
>
> Found some info. on Windows NT Fibers in "Advanced Windows." Just
> wanted to verify whether my understanding is correct.
>
> - Is a (primitive) "many to one" thread scheduling model.
> - Fibre corresponds to Solaris "threads" (NT threads then correspond
>   to Solaris LWP).
> - If a fibre blocks, the whole thread (LWP for us) blocks.
> - Not as sophisticated as Solaris threads.

Kinda-sorta. Certainly close enough. My understanding is that fibers were built especially for a couple of big clients and then snuck their way out. As such, I would avoid using them like the plague.
I've read the APIs and they scare me. -Bil ------------------------ Jeffrey Richter, Advanced Windows, 3rd Ed., p.971 states that "The fiber functions were added to the Win32 API to help companies quickly port their existing UNIX server applications to Windows NT." (blah) The following sentences say that fibers are targeted to the proprietary user level thread-like quirks some companies did for whatever reason (ease of programming, performance). To answer your question: fibers are not an integral part of any MS application, and I can't imagine that they use it internally anywhere, and thus won't achive the stability. Does this argument weigh a bit against their use in a new program? Joerg PS: Have you noticed that I managed to keep from flaming :-) ----------------- >> Fibers are BAD because they comprise a SECOND method of doing threading. >> If you want threads, use threads. (All that co-routine stuff was >> great. We don't need them any more.) > >There are two reasons for "threads" and things similar to threads. >First, they're smaller than full blown processes and with faster >context switching than with processes. Second, they allow more fine >grained concurrency. I don't think you hit it quite on the head. Threads, allow a computation to be decomposed into separately scheduled tasks, which has these advantages: - the tasks can be run on separate processors. - the tasks can be prioritized, so that a less important computation can, in response to an external event, be suspended to process that event. - computation can occur in one thread, while waiting for an event, such as the completion of I/O So it's all about improving performance parameters like overall run time, or average response time, or real time response, and maximizing the utilization of real resources like processors and peripherals. >Originally, coprocesses (and tasks and light-weight-processes and >threads) solved both goals quite well. 
Then in the last decade or >more, thread-like things started getting bigger and slower; ie, >letting the kernel handle the context switching, making them work well >with OS calls and standard libraries, signal handling, asynchronous >I/O, etc. > >Fibers seem like just a return to the efficient/small type of task. >The drawback to them seems just that they're only on Windows NT, so >that even if you have a valid need for them the code won't even be >portable to other Windows boxes. If you take a thread, and then hack it into smaller units that the operating system doesn't know about, these smaller units do not realize the advantages I listed above. They are not scheduled on separate processors, they cannot be dispatched in response to real-time inputs, they cannot wait for I/O while computation occurs. I did not list, as one of the advantages, the ability to change the logical structure of the program by decomposing it into threads, like eliminate the implementation of state machines by offloading some state information into individual program counters and stacks. To me, that is a purely internal program design matter that doesn't make any externally visible difference to parameter like the running time, througput, real-time response or average response. It's also a programming language matter as well; a language with continuations (e.g. Scheme) would have no need for these types of sub-threads. In Scheme, a function return is done using a function call to a previously saved continuation. It's permitted to jump into the continuation of a function that has already terminated; the environment captured by the continuation is still available. (Such captured environments are garbage collected when they become unreachable). To me, things like fibers seem like low-level hacks to provide platform-specific coroutines or continuations to the C language, whereas threads are a language-independent operating system feature. 
>If Fibers are unnecessary because Threads exist, then why not say that >Threads are unnecessary because Processes exist? (Threading comprises >a SECOND method of splitting work up into separate units of control) This argument assumes that threads are to processes what processes are to the system. However according to one popular system model, processes just become collections of resources that simply *have* one or more independently scheduled control units. In this model, threads are the only separate unit of control. A process that has one unit of control is said to be single-threaded, rather than non-threaded. Or, under an alternative model exemplified by Linux, threads are just collections of tasks that share resources in a certain way. Two tasks that don't share an address space, file table, etc are by convention said to be in different processes. Again, there is just one method of splitting work into units of control: the task. =================================TOP=============================== Q105: LWP migrating from one CPU to another in Solaris 2.5.1? Hej Magnus! Kanska... > Hi! > > I've got a question about threads in Solaris 2.5.1, that I hope You can > answer for me! > > Short version: > How does the algorithm work that causes an LWP to migrate from one CPU to > another CPU in Solaris 2.5.1? The LWP gets contexted switched off CPU 0. When a different CPU becomes available, the scheduler looks to see how many ticks have passed. Solaris 2.5: if less than 4, some other LWP (or none at all!) gets the CPU. If > 3, then just put the LWP on the new CPU. > Longer version: > I'm doing some research about a tool that I hope could be used by multi-thread > programmers in order to find and possibly correct perfomance bottlenecks. > Basically the tool works in three phases: > 1) By running the multi-threaded program on a single processor we create a > trace wich represent the behaviour of the program. 
> 2) By simulating (or re-schedule) the trace on a multi-processor we can tell > wether the program has the desired speed-up or not. > 3) The simulated "execution" is displayd graphically in order to show where > the performance bottlenecks are. This sounds good... > I've got a problem when simulating a program that hits a barrier. > Assume that we, for instance, have 8 bound threads hitting the same barrier > on a multiprocessor with 7 processors. Here the migration for an LWP from > one CPU to another is very important. If we have no migration at all the speed > up will be 4 compared to a single processor. > On the other hand, if we have full migration, the speed up will be (almost) 7 > if we neglect the impact of cache-misses. Of course said $ misses are a BIG deal. None-the-less... I *think* this will happen on Solaris 2.5: The first 7 wake up and run for 1 tick (10ms). The 7 drop 10 points of priority. T8 then gets CPU 7, while T1 - T6 run another tick. They drop 10 points. T7 wants CPU 7 and will get it from T8. Now the time slice increases because we're near the bottom of the priority scale. Everybody runs for 10 ticks. From here on out, one thread will migrate around while the others keep their CPUs. I think. Of course you'd avoid writing a program that put 8 CPU-bound threads on 7 CPUs... -Bil =================================TOP=============================== Q106: What conditions would cause that thread to disappear? William, > I have a service thread which enters a never-exiting service loop via > a while(1). What conditions would cause that thread to disappear? You tell it to. Either return(), pthread_exit(), or pthread_cancel(). That's the only way out. > It can't be just returning off the end because of the while(1). Past > experience has indicated to me that if a single thread causes a > exception such as a SEGV that the entire process is killed. 
Are there
> known conditions which cause just the thread to exit without
> interfering with the rest of the process?

You're right. SEGV etc. kill the process (unless you replace the signal handler).

> I suspect there's stack corruption in this thread, but I would have
> expected such corruption to take the form of a SEGV or something
> similar. I'm very surprised that just the thread exited leaving
> everything else (seemingly) intact.

So... you have a problem. I *expect* that you'll find the place where the thread's exiting and it'll be something you wrote. (The other option is a library bug. Always possible (if unlikely).)

I'm disappointed to see that a breakpoint in pthread_exit() doesn't get hit in the Sun debugger. Moreover, you don't even get to see the stack from the cleanup handlers! (I'm making this a bug report.) I notice that from TSD destructors you at least get to see a bit of the call stack.

So... I'd suggest this: Declare some TSD, put a breakpoint in the destructor, and see what happens when your thread exits. Try out the bit of code below. How does this work on other platforms?

/*
  cc -o tmp1 tmp1.c -g -lpthread
*/
#define _POSIX_C_SOURCE 199506L
#include <pthread.h>
#include <stdio.h>

pthread_attr_t attr;
pthread_t thread;
pthread_key_t key;

/* pthread_t is an opaque type; the casts below assume it is integral,
   as it is on Solaris and Linux. */

void destroyer(void *arg)
{
  pthread_t tid = pthread_self();
  printf("T@%lu in TSD destructor.\n", (unsigned long) tid);
}

void cleanup(void *arg)
{
  pthread_t tid = pthread_self();
  printf("T@%lu in cleanup handler.\n", (unsigned long) tid);
}

void search_sub2(void)
{
  pthread_exit(NULL);   /* Surprise exit -- the one you forgot about */
}

void search_sub1(void)
{
  search_sub2();        /* do work */
}

void *search(void *arg)
{
  /* NEED A VALUE! The destructor only runs for a non-NULL value. */
  pthread_setspecific(key, (void *) 1234);
  pthread_cleanup_push(cleanup, NULL);
  search_sub1();        /* do work */
  pthread_cleanup_pop(1);
  pthread_exit(NULL);
}

int main(void)
{
  pthread_key_create(&key, destroyer);
  pthread_attr_init(&attr);
  pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
  pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE); /* Also default */
  pthread_create(&thread, &attr, search, NULL);
  pthread_exit(NULL);
}

=================================TOP===============================
Q107: What parts, if any, of the STL are thread-safe?

Matt Austern wrote:
>
> Boris Goldberg writes:
>
> > > >I'm finding a memory leak in the string deallocate() (on the call to
> > > >impl_->deallocate()) under heavy thread load, and it brings up a
> > > >frightening question:
> > >
> > > >What parts, if any, of the STL are thread-safe?
> > >
> >
> > STL thread safety is implementation-dependent. Check with
> > your vendor. Many implementations are not thread-safe.
>
> One other important point: "thread safety" means different things to
> different people. Programming with threads always involves some
> cooperation between the language/library and the programmer; the
> crucial question is exactly what the programmer has to do in order to
> get well-defined behavior.
>
> See http://www.sgi.com/Technology/STL/thread_safety.html for an
> example of an STL thread-safety policy. It's not the only conceivable
> threading policy, but, as the document says, it is "what we believe to
> be the most useful form of thread-safety."

=================================TOP===============================
Q108: Do pthreads libraries support cooperative threads?

Paul Bandler wrote:
>
> Bryan O'Sullivan wrote:
> >
> > p> Thanks for those who have sent some interesting replies (although
> > p> no-one seems to think it's a good idea to not go all the way with
> > p> pre-emptive pthreads).
> >
> > This is because you can't go halfway.
Either you use pthreads in a > > fully safe manner, or your code breaks horribly at some point on some > > platform. > > OK, so you would disagree with the postings below from Frank Mueller and > David Butonhof in July that indicates it is possibe (if inadvisable)? > > Frank Mueller wrote: > > > >[email protected] (Schumacher Raphael, GD-FE64) >writes: > >[deleted...] > > > 1) Do pthreads libraries support cooperative threads? > > > > In a way, somewhat. Use FIFO_SCHED and create all threads at the same priority level, > > and a thread will only give up control on a blocking operation, e.g. yield, cond_wait, > > mutex_lock and (if thread-blocking is supported) maybe on blocking I/O (read, write, accept...) > > > > This may be close enough to what you want. Short of this, you probably need your coop_yield, yes. > > On HP-UX, at least until 10.30 (which introduces kernel thread support), > the SCHED_FIFO [note, not "FIFO_SCHED"] scheduling policy workaround > might work for you, because your threads won't face multiprocessor > scheduling. I wouldn't recommend it, though -- and of course it won't > come even close to working on any multiprocessor system that supports > SMP threads (Solaris, Digital UNIX, IRIX, or even the AIX draft 7 > threads). If you're interested in thread-safety, go for thread-safety. > While it might be nice to give yourself the early illusion that your > known unsafe code is running, that illusion could be dangerous later! If > you've got a real need to run the software sooner than you can convert > it, you're likely to run into other problems (such as the order in which > threads run?) If you don't have an immediate need, why look for > shortcuts that you know are only temporary? > > If you really want to build a "cooperative scheduling" package for your > threads, (and again, I don't recommend it!), build your own. It's not > that hard. 
Inactive threads just block themselves on a condition
> variable until requested to run by some other thread (which signals the
> condition variable and then blocks itself on the same, or another,
> condition variable).
>
> The "1)" in the original mail implies the first item of a list, but my
> news server has chosen, in its infinitesimal wisdom, not to reveal the
> original post to me. So perhaps I'll have more to say should it repent!
>
> /---------------------------[ Dave Butenhof ]--------------------------\
> | Digital Equipment Corporation                      [email protected] |
> | 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
> | Nashua NH 03062-2698       http://www.awl.com/cp/butenhof/posix.html |
> \-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP===============================
Q109: Can I avoid mutexes by using globals?

> > j> Now, I have implemented this without using a synchronization
> > j> mechanism for the integer. Since I have only one writer, and
> > j> multiple readers, and the datum is a simple integer, I believe I
> > j> can get away with this.

> But on the other hand why not do it correctly with locks? Locks
> will make the code easier to maintain because it will be somewhat
> self documenting (the call to rwlock() should give most programmers
> a clue) and it will be more robust. In my experience threaded
> programs are more fragile and more difficult to debug than single
> threaded programs. It is a good idea to keep thread synchronization
> as controlled as you can; this will make debugging simpler.

Remember that sign in "The Wizard of Oz"? "I'd go back if I were you." When you port your program to the next OS or platform and a new bug appears... Could it be caused by your hack? Won't it be fun trying to guess with each new bug? How will you prove to yourself that the bug is elsewhere? And that guy who maintains your code, won't he have fun?

That's three opinions...
"Just the place for a snark! I have said it three times. And what I say thrice is true!" -Bil =================================TOP=============================== Q110: Aborting an MT Sybase SQL? Bryan Althaus wrote: > > Jim Phillips wrote: > : We are using Sybase on Solaris for a database application. > > : We are trying to abort a query that turns out to be long running by > : using POSIX pthread() in a manner which allows a calling X-Windows > : program to interrupt a fetch loop and cancel the query. > > : At runtime, it works OK sometimes and sometimes it doesn't. If we > : attempt to start a new query too soon after canceling the long running > : query, we get the remains of the old query's result set. If we wait a > : couple of minutes before starting the new query, then it works fine and > : the new queries expected result set is returned. > > : We are using ESQL and Solaris 2.5 C compiler to build the Sybase SQL > : Server 11.0.2 program interface. > > : I have heard a rumor that you cannot use pthread with some X-11 > : versions? > > : Anybody out there have any ideas, thoughts, comments or critique(s). > > Having come from a Sybase OpenServer class, I can tell you that you can't > "cancel" a query. Once it is sent to Sybase, the query will run till > completion. There is no way to stop Sybase from executing the entire > query. > > I'm not saying that's your problem, just that I notice you say if you > wait a couple of minutes before starting the new query all is fine. > Obviously by then the old query has finished. > > I haven't used ESQL in years (use C++/DBtools.h++) so I don't know how your > connection to Sybase is done. When you "cancel" do you close the old connection, > and then open up a new connection when you start a new query? > > Just sounds like you may be using the same connection and getting back the > first query to finish, which would be the "cancelled" one. > > You might try comp.databases.sybase if this theory is at all likely. 
> > In any case I'd be interested in what the problem turns out to be. > > Good luck, > Bryan Althaus > =================================TOP=============================== Q111: Other MT tools? The Etch group at UW and Harvard is distributing some program development and analysis tools based on the Etch Binary Rewriting Engine for NT/x86 boxes. The first of these tools is a call graph profiler that helps programmers understand where time is going in their program. The Etch Call Graph Profiler works on Win32 apps built by most compilers, does not require source, works with multithreaded programs, understands program and system DLLs, and is designed to run reasonably fast even if your machine only has an 'average' configuration. We've used it on serious programs ranging from Microsoft's SQL server to Lotus Wordpro, and even on Direct Draw games like Monster Truck Madness. If you'd like to give our profiler a try, you can download it from: http://etch.cs.washington.edu Follow the link to Download Etch Tools. Part of our motivation for distributing these tools is to get some feedback about what we've built and what we should be building. So please give our tools a try, and send mail to [email protected] with your comments and suggestions. Enjoy! The Etch Group Ted Romer Wayne Wong Alec Wolman Geoff Voelker Dennis Lee Ville Aikas Brad Chen Hank Levy Brian Bershad =================================TOP=============================== Q112: That's not a book. That's a pamphlet! Brian Silver wrote: > Ben Self wrote: > > > [Snip] > > Since this is a response largely to Brian Silver's post, a person I have > > a good deal of respect for, I have chosen to include some quotes form > > Dave Butenhof's book, Programming with POSIX Threads, because we both > > know it and have a mutual admiration for his work. > > Flattery will get you everywhere! > > (But what makes you think I admire Dave's work!? His code sure is > formatted nice, though! 
He also puts useful comments in
> his code. Except for the thd_suspend implementation in his book.)

Brian, after all, likes to tell me that I've published a "pamphlet", because Addison-Wesley chose to use a soft cover, and "books" have hard covers. For those who may not have figured all this out, by the way, Brian's office is diagonally across from mine -- he's currently a contractor working on Digital's threads library.

As for the suspend example in the book... 'tis true, it is not well documented. Of course, Brian should have been a little more cautious, since, as a footnote documents for posterity, the example is Brian's code, essentially as he wrote it. (And, as Brian just admitted, he doesn't tend to comment as well as I do. ;-) )

Dave Butenhof

=================================TOP===============================
Q113: Using recursive mutexes and condition variables?

I have a question regarding recursive mutexes and condition variables. Given a mutex created with one of the following attributes:

DCE threads
  pthread_mutexattr_setkind_np( &attr, MUTEX_RECURSIVE_NP );

X/Open XSH5 (UNIX98)
  pthread_mutexattr_settype( &attr, PTHREAD_MUTEX_RECURSIVE );

What exactly is the behavior of a pthread_cond_wait() and what effect do "nested" locks have on this behavior? Does mixing recursive locks and condition variables make any sense?

This is largely academic. However, I (like everyone else in known space/time) maintain an OO abstraction of a portable subset of Pthreads and would like to know the appropriate semantics. Since the advent of the framework (about 3 years ago) I have managed to avoid using recursive mutexes. Unfortunately, my back may be against the wall on a few new projects and I may be forced to use them. They seem to be a real pain.

And yes I promise this concludes my postings on cvs for the foreseeable future.

thanks,
--ben

=================================TOP===============================
Q114: How to cleanup TSD in Win32?
>I am forced to use TSD in multithreading existing code. I just ran >into the problem that, while the destructor function argument to >pthread_key_create() and thr_keycreate() appears ideal, there is no >such facility with the NT TlsAlloc() and that related stuff. It's pretty easy to write the code to provide this facility. Basically you have to wrap TlsAlloc() and TlsFree() in some additional logic. This logic maintains a list of all the keys currently allocated for your process. For each allocated key it stores the address of the destructor routine for that key (supplied as an argument to the wrapped version of TlsAlloc()). When a thread exits it iterates through these records; for every key that references valid data (i.e. TlsGetValue() returns non-NULL) call the relevant destructor routine, supplying the address of the thread specific data as an argument. The tricky part to all this is figuring out when to call the routine that invokes the destructors. If you have complete control over all your threads then you can make sure that it happens in the right place. If, in the other hand, you are writing a DLL and you do not have control over thread creation/termination this whole approach gets rather messy. You can do most of the right stuff in the DLL_THREAD_DETACH section of DllMain() but the thread that attached your DLL will not take this route, and trying to clean up TLS from DLL_PROCESS_DETACH is dangerous at best. Good luck. Gilbert W. Pilz Jr. Systems Software Consultant [email protected] =================================TOP=============================== Q115: Onyx1 architecture has one problem Hi there, I made some parallel measurements on SGI's. It seemed that the Onyx1 architecture has one problem: the system bus, which introduces a communication bottleneck. Memory access and float point calculation introduces traffic on that bus. Measurements on the new Onyx2 crossbar-based architecture suggested that these problems would be solved. 
However, some early measurements suggested two thoughts:

1. Floating-point calculation scales better on the Onyx2 architecture, which suggests that this problem was really communication related (-> crossbar). Going beyond 4 processors (more than one crossbar), the scaling goes down.

2. Memory allocation: Memory allocation (basically a sequential operation) is *really* slow. Most time is spent at the locking mechanism. This surprises me, because the pthread mutexes I'm using in the code are called at least as much as the memory allocation, but they are much faster.

Does anybody at SGI have some hints to explain this behaviour?

Thanks,
Dirk
--
======== 8< ======= 8< ======= 8< ======= 8< =======
Dirk Bartz
University of Tuebingen

Dirk Bartz writes:
> 2. Memory allocation:
> Memory allocation (basically a sequential operation) is *really*
> slow. Most time is spent at the locking mechanism.

I have noticed this as well, albeit with the `old' sproc-threads on Irix-5.3. ptmalloc seems to be an order of magnitude faster in the presence of multiple threads on that platform:

ftp://ftp.dent.med.uni-muenchen.de/pub/wmglo/ptmalloc.tar.gz

However, for Irix-6 with pthreads, you have to use a modified ptmalloc/thread-m.h file, as I've recently discovered. I will send you that file by mail if you're interested; it will also be in the next ptmalloc release, due out RSN.

Regards,
Wolfram.

=================================TOP===============================
Q116: LinuxThreads linked with X11 seg faults.

Unfortunately the X11 libraries are not compiled with -D_REENTRANT, hence the problems. You can get the source for the X11 libraries and rebuild them with the -D_REENTRANT flag and that should help. If you are using Motif you are out of luck. I spoke to the folks who supply Motif for RedHat Linux. They refused to give me a version recompiled with the -D_REENTRANT flag. They gave me a load of crap about having to test it and so forth.
I tried using LessTif, but it seemed to be missing too much.

Neil

=================================TOP===============================
Q117: Comments about Linux and Threads and X11

> LinuxThreads linked with X11 by g++ causes calls to the X11 library to seg
> fault.

You can either use Proven's pthread package or LinuxThreads. Proven's is a giant replacement for the standard libraries that does user level threads in a single process. LinuxThreads uses the operating system's clone() call to implement threads as separate processes that share the same memory space.

LinuxThreads seems to be tricky to install as it requires new versions of the standard libraries in addition to a 2.x kernel and the pthread library. However, if you get the latest version of RedHat, you're all set.

I've found Proven's implementation to be much faster, though somewhat messier to compile and a bit incomplete in its system call implementation (remember, it has to provide a substitute for almost every system call). Unfortunately I had to switch to LinuxThreads because the signal handling under Proven's threads was not working properly.

In particular, disk performance seems to suffer under LinuxThreads. As far as I can tell, the OS level disk caching scheme gets confused by all the threads/processes that are created. It's also a bit unnerving typing "ps" and seeing forty copies of your application running!

...

Unfortunately the X11 libraries are not compiled with -D_REENTRANT, hence the problems. You can get the source for the X11 libraries and rebuild them with the -D_REENTRANT flag and that should help. If you are using Motif you are out of luck. I spoke to the folks who supply Motif for RedHat Linux. They refused to give me a version recompiled with the -D_REENTRANT flag. They gave me a load of crap about having to test it and so forth. I tried using LessTif, but it seemed to be missing too much.
Neil

=================================TOP===============================
Q118: Memory barriers for synchronization

Joe Seigh wrote:
>
> So there are memory barriers in mutexes, contrary to what has been stated
> before in this newsgroup. Furthermore, it appears from what you are saying
> that the mutex lock acts as a fetch memory barrier and the mutex unlock
> acts as a store memory barrier, much like Java's mutex definitions.
> Which is not surprising. Java appears to have carried over quite a bit of the
> POSIX thread semantics.

This is not QUITE correct. First off, the semantic of locking or unlocking a mutex makes no distinction regarding read or write. In an architecture that allows reordering reads and writes, neither reads nor writes may be allowed to migrate beyond the scope of the mutex lock, in either direction. That is, if the architecture supports both "fetch" and "store" barriers, you must apply the behavior of both to locking AND unlocking a mutex.

The Alpha, for example, uses MB to prevent reordering of both reads and writes across the "barrier". There's also a WMB that allows read reordering, but prevents write reordering. WMB, while tempting and faster, CANNOT be used to implement (either lock or unlock of) a POSIX mutex, because it doesn't provide the necessary level of protection against reordering.

Finally, let's be sure we're speaking the same language ("Gibberishese"). People use "memory barrier" to mean various things. For some, it means a full cache flush that ensures total main memory coherency with respect to the invoking processor. That's fine, but it's stronger than required for a mutex, and it's not what *I* mean. The actual required semantic (and that of the Alpha) is that a "memory barrier" controls how memory accesses by the processor may be reordered before reaching main memory. There's no "flush", nor is such a thing necessary.
Instead, you ensure that data (reads and writes) cannot be reordered past the lock, in either of the processors involved in some transaction. An MB preceding the unlock of a mutex guarantees that all data visible to the unlocking processor is consistent as of the unlock operation. An MB following the lock of the mutex guarantees that the data visible to the locking processor is consistent as of the lock operation.

Thus, unlocking a mutex in one thread does not guarantee consistent memory visibility to another thread that doesn't lock a mutex. Coherent memory visibility, in the POSIX model, for both readers and writers, is guaranteed only by calling specific POSIX functions, the most common of which are locking and unlocking a mutex. A "memory barrier", of any sort, is merely one possible hardware mechanism to implement the POSIX rules.

/---------------------------[ Dave Butenhof ]--------------------------\

=================================TOP===============================
Q119: Recursive mutex debate

Robert White wrote:
>
> I STRONGLY DISAGREE with the idea that recursive mutexes "are a bad idea".
>
> I have made and use a recursive mutex class in several key C++ endeavors. As a
> low-level tool recursive mutexes are "bad" in that they tend to lead the sloppy
> down dangerous roads. Conversely, in experienced hands a recursive mutex is a
> tool of simple elegance. The core thing, as always, is "knowing what you are
> doing".

Hey, look, recursive mutexes aren't illegal, they're not "morally perverse", and with XSH5 (UNIX98) they're even standard and portable. So, fine -- if you like them, you use them. Use them as much as you like, and in any way you like.

But remember that they're ALWAYS more expensive than "normal" mutexes (unless your normal mutexes are more expensive than they need to be for the platform!). And remember that WAITING on a condition variable using a recursively locked mutex simply won't work.
So, if you're using a condition variable to manage your queue states, you need to at least analyze your lock usage sufficiently to ensure that the wait will work. And, once you've done that, it's a simple step to dropping back to a normal mutex.

There are definitely cases where the expense is acceptable, especially when modifying existing code -- for example, to create a thread-safe stdio package. The performance isn't "extremely critical", and you don't need to worry about condition wait deadlocks (there's no reason to use them in stdio). Sorting out all of the interactions between the parts of the package is difficult, and requires a lot of new coding and reorganization -- and implementing some of the correct semantics gets really tricky.

Don't waste time optimizing code that's not on the critical path. If you've got code that's on your critical path, and uses recursive mutexes, then it's NOT optimized. If you care, you should remove the recursive mutexes. If you don't care, fine. If the use of recursive mutexes in non-critical-path code doesn't put it on the critical path, there's no reason to worry about them.

Still, I, personally, would use a recursive mutex in new code only with extreme reluctance and substantial consideration of the alternatives.

/---------------------------[ Dave Butenhof ]--------------------------\

[I echo Dave's "extreme reluctance and substantial consideration of the alternatives" -Bil]

=================================TOP===============================
Q120: Calling fork() from a thread

> Can I fork from within a thread?

Absolutely.

> If that is not explicitly forbidden, then what happens to the other threads in
> the child process?

There ARE no other threads in the child process. Just the one that forked. If your application/library has background threads that need to exist in a forked child, then you should set up an "atfork" child handler (by calling pthread_atfork) to recreate them.
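As a rough illustration of registering fork handlers, here is a minimal sketch. The names `state_lock`, `install_fork_handlers` and `fork_and_check` are hypothetical, and the library is assumed to own a single mutex. pthread_atfork() takes three handlers: one run before the fork, one run afterwards in the parent, and one run afterwards in the child. Here the "prepare" handler takes the lock and the other two release it, so the child cannot inherit a mutex held by a thread that no longer exists.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical library state guarded by a single mutex. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static int state = 0;

/* Runs in the forking thread just before fork(): take every lock we own. */
static void prepare(void) { pthread_mutex_lock(&state_lock); }

/* Runs in the parent after fork() returns: release what prepare() took. */
static void parent(void) { pthread_mutex_unlock(&state_lock); }

/* Runs in the child's only thread (a copy of the forking thread). */
static void child(void) { pthread_mutex_unlock(&state_lock); }

/* Call once, e.g. from library initialization. Returns 0 on success. */
int install_fork_handlers(void)
{
    return pthread_atfork(prepare, parent, child);
}

/* Fork, let the child use the lock, and return the child's exit status
 * (0 means the child could lock and unlock the mutex). */
int fork_and_check(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: child() already released the lock, so this cannot hang. */
        int rc = pthread_mutex_lock(&state_lock);
        if (rc == 0) {
            state++;
            pthread_mutex_unlock(&state_lock);
        }
        _exit(rc == 0 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A real library would lock all of its mutexes in the prepare handler, in a fixed order, and release them in the reverse order in the other two handlers.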
And if you use mutexes, and want your application/library to be "fork safe" at all, you also need to supply an atfork handler set to pre-lock all your mutexes in the parent, then release them in the parent and child handlers. Otherwise, ANOTHER thread might have a mutex locked when one thread forks -- and because the owning thread doesn't exist in the child, the mutex could never be released. (And, worse, whatever data is protected by the mutex is in an unknown and inconsistent state.)

One draft of the POSIX standard had included the UI thread notion of "forkall", where all threads were replicated in the child process. Some consider this model preferable. Unfortunately, there are a lot of problems with that, too, and they're harder to manage, because there's no reasonable way for the threads to know that they've been cloned. (UI threads allows that blocking kernel functions MAY fail with EINTR in the child... but that's not a very good basis for a recovery mechanism.) After much discussion and gnashing of teeth and tearing of hair, the following draft removed the option of forkall.

> Is there a restriction saying that it's OK provided the child immediately does
> an exec?

Actually, this is the ONLY way it's really safe, unless every "facility" in the process has proper and correct atfork handling to protect all of the process state across the fork.

In fact, despite the addition of atfork handlers in POSIX 1003.1c, the standard specifically says that the child process is allowed to call only async-signal-safe functions prior to exec. So, while the only real purpose of atfork handlers is to protect the user-mode state of the process, you're really not guaranteed that you can make any use of that state in the child.

> What if I do this on a multiprocessor machine?

No real difference. You're more likely to have "stranded" mutexes and predicates, of course, in a non-fork-safe process that forks, because other threads were doing things simultaneously.
But given timeslicing and preemption and other factors, you can have "other threads" with locked mutexes and inconsistent predicates even on a uniprocessor.

Just remember that, in a threaded process, it's not polite to say "fork you" ;-)

/---------------------------[ Dave Butenhof ]--------------------------\

> David Butenhof wrote:
>>
>> The "UI thread" version of fork() copies ALL threads in the child. The
>> more standard and reasonable POSIX version creates a child process with a
>> single thread -- a copy of the one that called fork().
>>
> Sorry to ask...what do you mean by `the "UI thread" version of fork()'?
> I'm a little confused here.

Alright, if you're only "a little confused", then we haven't done our jobs. We'll try for "very confused", OK? Let me know when we're there. ;-)

First, the reference to "UI threads" may have seemed to come out of the blue if you're new to this newsgroup and threads; so let's get that out of the way. "UI" was a committee that for a time controlled the direction and architecture of the System V UNIX specification. (UNIX International.) The thread interfaces and behavior they defined (which was essentially what Sun had devised for Solaris, modified somewhat along POSIX lines in places) are commonly known as "UI threads". (Or sometimes "Solaris threads" since they originated on Solaris and aren't widely available otherwise.)

The UI thread definition of fork() is that all threads exist, and continue execution, in the child process. Threads that are blocked, at the time of the fork(), in a function capable of returning EINTR *may* do so (but need not). The problem with this is that fork() in a process where threads work with external resources may corrupt those resources (e.g., writing duplicate records to a file) because neither thread may know that the fork() has occurred. UI threads also has fork1(), which creates a child containing only a copy of the calling thread.
This is equivalent to the POSIX fork() function, which provides a more controlled environment. (You can always use pthread_atfork() handlers to create daemon threads, or whatever else you want, in the child.)

=================================TOP===============================
Q121: Behavior of [pthread_yield()] sched_yield()

> > I have a question regarding POSIX threads on Linux and Solaris. The
> > program below compiles and links well on both systems, but instead of the
> > expected "100000, " it always prints out
> > "100000, 0", so the thread is not really ever started.
> >
>
> Well, both sets of output are legal and correct for the code you supplied.

Yes, this is correct.

> First, you see [p]thread_yield does not say "give control to another thread"
> it says "if there is another thread that can be run, now might be a good time
> to do that". The library is under no obligation to actually yield. (there
> is a good explanation of this elsewhere in this group, but it has to do with
> the fact that you are running under SCHED_OTHER semantics which are
> completely unspecified semantics, go figure.)

Just for clarity... pthread_yield is an artifact of the obsolete and crufty old DCE thread implementation (loose interpretation of the 1990 draft 4 of the POSIX thread standard). It doesn't exist in POSIX threads. thr_yield is an artifact of the UI threads interface, which is (effectively though not truly) Solaris proprietary. sched_yield is the equivalent POSIX function.

As Robert said, POSIX assigns no particular semantics to the SCHED_OTHER scheduling policy. It's just a convenient name. In the lexicon that we developed during the course of developing the realtime and thread POSIX standards, it is "a portable way to be nonportable". When you use SCHED_OTHER, which is the default scheduling policy, all bets are off. POSIX says nothing about the scheduling behavior of the thread.
(Although it does require a conforming implementation to DOCUMENT what the behavior will be.)

Because there's no definition of the behavior of SCHED_OTHER, it would be rather hard to provide any guarantees about the operation of the sched_yield function, wouldn't it?

If you want portable and guaranteed POSIX scheduling, you must use the SCHED_FIFO or SCHED_RR scheduling policies (exclusively). And, of course, you need to run on a system that supports them.

> Next, the number of threads in a (POSIX) program does not necessarily say
> anything about the number of actual lightweight processes that will be used to
> execute the program. In your example there is nothing that "forcibly" causes
> the main thread to give up the processor (you are 100% CPU related) so your
> first thread runs through to completion. An identically arranged ADA program
> (which wouldn't quite be possible 8-) would have equally unstable results.
> (I've seen students write essentially this exact program to "play with" tasks
> and threads in ADA and C, but the program is not valid in any predictable
> way.)

POSIX doesn't even say that there's any such thing as a "light weight process". It refers only obliquely to the hypothetical concept of a "kernel execution entity", which might be used as one possible implementation mechanism for Process Contention Scope thread scheduling.

> Finally, POSIX only says that there will be "enough" LWPs at any moment to
> ensure that the program as a whole "continues to make progress".

That's not strictly true. All POSIX says is that a thread that blocks must not indefinitely prevent other threads from making progress. It says nothing about LWPs, nor places any requirements upon how many there must be.

> When you do the SIGINT from the keyboard you are essentially causing the
> "current" thread to do a pthread_exit/abort. Now there is only one thread
> left, the "second" one, so to keep the program progressing that one gets the
> LWP from the main thread.
> That is why you see the second start up when you
> do a "^C"...

SIGINT shouldn't "do" anything to a thread, on a POSIX thread system. IF it is not handled by a sigaction or a sigwait somewhere in the process, the default signal action will be to terminate the process (NOT the thread).

It's not clear from the original posting exactly where the described results were seen: Linux or Solaris? My guess is that this is Linux, with the LinuxThreads package. Your threads are really cloned PROCESSES, and I believe that LinuxThreads still does nothing to properly implement the POSIX signal model among the threads that compose the "process". That may mean that, under some circumstances (and in contradiction to the POSIX standard), a signal may affect only one thread in the process. The LinuxThreads FAQ says that SIGSTOP/SIGCONT will affect only the targeted thread, for example. Although it also says that threads "dying" of a signal will replicate the signal to the other threads, that might not apply to SIGINT, or there might be a timing window or an outright hole where that's not happening in this case.

LinuxThreads is, after all, a freeware thread package that's from all reports done an excellent job of attacking a fairly ambitious goal. A few restrictions and nonconformancies are inevitable and apparently acceptable to those who use it (although it's gotta be a portability nightmare for those who use signals a lot, you're always best off avoiding signals in threaded programs anyway -- a little extra "incentive" isn't a bad thing).

If you see this behavior on Solaris, however, it's a serious BUG that you should report to Sun.

> The very same program with a single valid "operational yield" (say reading a
> character from the input device right after the pthread_create()) will run at
> 100% CPU forever because it will never switch *OUT* of the second thread.

At least, that's true on Solaris, where user threads aren't timesliced.
To get multiple threads to operate concurrently, you need to either manually create additional LWPs (thr_setconcurrency), or create the threads using system contention scope (pthread_attr_setscope) so that each has its own dedicated LWP. Solaris will timeslice the LWPs so that multiple compute-bound threads/processes can share a single processor.

LinuxThreads directly maps each "POSIX thread" to a "kernel thread" (cloned process), and should NOT suffer from the same problem. The kernel will timeslice the "POSIX threads" just as it timeslices all other processes in the system.

On Digital UNIX, the 2-level scheduler timeslices the user ("process contention scope") threads, so, if a compute-bound SCHED_OTHER thread runs for its full quantum, another thread will be given a chance to run.

> In essence there is no good "Hello World" program for (POSIX) threads (Which
> is essentially what you must have been trying to write 8-). If the threads
> don't interact with the real world, or at least each other, the overall
> program will not really run. The spec is written to be very responsive to
> real-world demands. That responsiveness in the spec has this example as a
> clear degenerate case.

That's not true. "Hello world" is easy. If the thread just printed "Hello world" and exited, and main either joined with it, or called pthread_exit to terminate without trashing the process, you'd see exactly the output you ought to expect, on any conforming POSIX implementation.

The problem is that the program in question is trying to execute two compute-bound threads concurrently in SCHED_OTHER policy: and the behavior of that case is simply "out of scope" for the standard. The translation of which is that there's no reasonable assumption of a portable behavior.
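The "Hello world" case described above might look like the following minimal sketch. The function names `hello` and `run_hello` are illustrative only. One thread prints and returns, and its creator joins with it, so the output does not depend on any scheduling policy.

```c
#include <pthread.h>
#include <stdio.h>

/* Thread start routine: print and exit voluntarily. */
static void *hello(void *arg)
{
    (void)arg;
    printf("Hello world\n");
    return NULL;
}

/* Create one thread and join with it. Returns 0 on success,
 * or the error number from pthread_create/pthread_join. */
int run_hello(void)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, hello, NULL);
    if (rc != 0)
        return rc;
    return pthread_join(tid, NULL);
}
```

Calling run_hello() from main and returning its result is enough. The join guarantees the thread has finished printing before the process exits.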
/---------------------------[ Dave Butenhof ]--------------------------\

=================================TOP===============================
Q122: Behavior of pthread_setspecific()

> Can you explain the discrepancy between your suggestion and the
> following warning, which I found in the SunOS 5.5.1 man page for
> "pthread_setspecific".
>
> ******************************************************************
> WARNINGS
> pthread_setspecific(), pthread_getspecific(),
> thr_setspecific(), and thr_getspecific(), may be called
> either explicitly, or implicitly from a thread-specific data
> destructor function. However, calling pthread_setspecific()
> or thr_setspecific() from a destructor may result in lost
> storage or infinite loops.
>
> SunOS 5.5.1 Last change: 30 Jun 1995 4
> ******************************************************************
>
> I'm not sure how an infinite loop might occur, while using
> "pthread_setspecific" in a destructor. Do you know the answer?

We're talking about two different things.

1) What the standard says, which is that the destructor is called, and may be called repeatedly (until a fixed, implementation specified limit, or forever), until the thread-specific data values for the thread become NULL. Because the standard doesn't say that the implementation is required to clear the value for each key as the destructor is called, that requirement is, implicitly, placed on the application. (This oversight will be corrected in a future update to the standard.) In order to set the value to NULL, you clearly must call the function pthread_setspecific() within the destructor. Note that setting the value to NULL within the destructor will work either with the current standard (and the current LinuxThreads literal implementation) AND with the fixed standard (and most other implementations, which have already implemented the correct semantics, figuring that an infinite loop usually is not desirable behavior).
2) The correct POSIX semantics, which are implemented by Solaris and Digital UNIX. (Probably also by IRIX, HP-UX, and AIX, although I haven't been able to verify that.)

The Solaris manpage warning is imprecise, however. There's no problem with a destructor explicitly setting the value to NULL. The warning SHOULD say that setting a thread-specific data value to any non-NULL value within a destructor could lead to an infinite loop. Or, alternately, to a memory leak, if the new value represents allocated heap storage, and the system has a limit to the number of times it will retry thread-specific data destruction.

/---------------------------[ Dave Butenhof ]--------------------------\

Q123: Linking under OSF1 3.2: flags and library order

Joerg Faschingbauer wrote:
> Hi,
>
> recently I posted a question about the correct linking order under
> Solaris 2.4. Got some valuable hints, thanks.
>
> I have a similar problem now, this time under OSF1 3.2. Can anybody
> tell me if the following is correct? I could not find any hints on
> that topic in the man pages.
>
> gcc ... -ldnet_stub -lm -lpthreads -lc_r -lmach
>
> Does pthreads need stuff from c_r, or the other way around? Do I need
> mach at all? Do I need dnet_stub at all?

In a threaded program prior to Digital UNIX 4.0, EVERYTHING needs libc_r, because libc is not thread-safe. Yes, the thread library requires libmach, and, because of bizarre symbol preemption requirements (which, for trivia junkies, were at one time required by OSF for "OSF/1" branding), if you don't include libmach explicitly things might not work out right. You must specify libmach BEFORE libc_r. You don't need -ldnet_stub unless YOU need it (or some other library you're including). We certainly don't use it.

The best way to build a threaded program on 3.2 is to use "cc -threads".
If you're going to use gcc, or an older cxx that doesn't support "-threads", or if you need to use ld to link, then the proper expansion of "-threads" is:

for compilation: -D_REENTRANT
for linkage: -lpthreads -lmach -lc_r

The linkage switches must be the LAST libraries, exclusive of libc. That is, if you were using ld to link,

... ld <.o files...> -lpthreads -lmach -lc_r -lc crt0.o

I don't believe the position of -lm with respect to the thread libraries will matter much, since it's pretty much independent. If you use -lm -threads, however, libm will precede the thread libraries, and that's a good standard to follow.

A side effect of "-threads" is that ld will automatically look for a reentrant variant of any library that you specify. That is, if you specify "-lfoo", and there's a "libfoo_r", ld will automatically use libfoo_r. If you don't use -threads, you'll need to check /usr/shlib (or /usr/lib if you're building non-shared) for reentrant variants.

Note that, to compile a DCE thread (draft 4) threaded program once you move to Digital UNIX 4.0 or higher, the compilation expansion of -threads will need to be changed to "-D_REENTRANT -D_PTHREAD_USE_D4", and the list of libraries should be "-lpthreads -lpthread -lmach -lexc". There's no libc_r on 4.0 (libc is fully thread-safe), and you need libexc since we've integrated with the standard O/S exception mechanism. Note the distinction between libpthread (the "core" library implementing POSIX threads), and libpthreads (the "legacy" library containing DCE thread and CMA wrapper functions on top of POSIX thread functions).

Minor additional notes: as of Digital UNIX 4.0D we've dropped the final dependencies on the mach interfaces, so libmach is no longer required (you'll get smaller binaries and faster activation by omitting it once you no longer need to support earlier versions).
And, of course, once you've moved to 4.0 or later, you should port to POSIX threads, in which case you can drop -lpthreads and -D_PTHREAD_USE_D4.

/---------------------------[ Dave Butenhof ]--------------------------\

=================================TOP===============================
Q124: What is the TID during initialization?

Lee Sailer wrote:
> In a program I am "maintaining", there is a
>
> foo = RWThreadID();
>
> call at global scope. Conceptually, this gets called before main().
> Does this seem OK? Can this code rely on the "thread" being used to do
> initial construction to be the same as the "main thread"?

[For example, the .init sections of libraries run before main() starts. -Bil]

While the assumption will likely be true, most of the time, it strikes me as an extremely dangerous and pointless assumption. There are a lot of reasons why it might NOT be true, sometimes, on some platforms, under some circumstances. There's no standard or rule of etiquette forbidding a difference. Even if the "thread" is the same, the "thread ID" might change as things get initialized. I recommend avoiding any such assumptions.

/---------------------------[ Dave Butenhof ]--------------------------\

=================================TOP===============================
Q125: TSD destructors run at exit time... and if it crashes?

Sebastien Marc wrote:
> On Solaris you can associate a function (called destructor) that will be
> called at the termination of the thread, even if it crashes.

Almost.

Both POSIX and UI threads interfaces include thread-specific data. When you create a thread-specific data (TSD) key, you can specify a destructor function that will be run when any thread with a non-NULL value for that key terminates due to cancellation [POSIX only] or voluntary thread exit (return from the thread's start routine, or a thread exit call -- pthread_exit or thr_exit).
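To make the destructor mechanism concrete, here is a minimal sketch using the POSIX interfaces. The names `destroy_value`, `worker` and `run_tsd_demo` are hypothetical. It assumes an implementation that clears the key's value before calling the destructor, as current POSIX systems do. The thread stores a non-NULL value for the key and then returns from its start routine, which is one of the voluntary exits that triggers the destructor.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;
static int destructor_runs = 0;   /* written by the worker, read after join */

/* TSD destructor: called with the thread's non-NULL value for the key. */
static void destroy_value(void *value)
{
    free(value);
    destructor_runs++;
}

/* Create the key exactly once, no matter how many threads use it. */
static void make_key(void)
{
    pthread_key_create(&key, destroy_value);
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_once(&key_once, make_key);

    int *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = 42;
        pthread_setspecific(key, p);   /* only this thread can set its value */
    }
    return NULL;   /* voluntary exit: the destructor runs for this thread */
}

/* Returns how many times the destructor ran for one worker thread,
 * or -1 if the thread could not be created or joined. */
int run_tsd_demo(void)
{
    pthread_t tid;
    destructor_runs = 0;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return destructor_runs;
}
```

A thread that never calls pthread_setspecific() keeps the default NULL value, so no destructor runs for it.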
Yes, you can use that as a sort of "atexit" for threads, if you make sure that each thread uses pthread_setspecific/thr_setspecific to SET a non-NULL value for the TSD key. (The default value is NULL, and only the thread itself can set a value.)

However, that doesn't help. There is simply no way that a thread can "crash" without taking the process with it. An unhandled signal will never terminate a thread -- either the signal is ignored, or it does something to the process (stop, continue, terminate).

TSD destructors are NOT run:

* on the child side of a fork
* in a call to exec
* in process termination, regardless of whether that termination is voluntary (e.g., a call to exit) or involuntary (an unhandled signal).

In all those cases, threads quietly "evaporate", leaving no trace of their existence. No TSD destructors, no cleanup handlers, nothing. Gone. Poof.

/---------------------------[ Dave Butenhof ]--------------------------\

=================================TOP===============================
Q126: Cancellation and condition variables

Marcel Bastiaans wrote:
> Anyone:
>
> I appear to be missing something in my understanding of how condition
> variables work. I am trying to write a multithreaded program which is
> portable to various platforms. I am unable to cancel a thread if it is
> waiting on a condition variable which another thread is waiting on also.
> The problem can easily be reproduced on both Solaris 2.5 and HP-UX 10.10. A
> simple program which demonstrates my problem is shown below. This program
> sample uses the HP-UX pthreads library but the problem also appears when
> using Solaris threads on Solaris 2.5.

In any case... yes, you are missing something. The program, as written, will hang on any conforming (or even reasonably correct) implementation of either DCE threads or POSIX threads. (To put it another way, any implementation on which it succeeds is completely broken.)

> Is there a problem in this program which I don't understand?
> I cannot use
> cleanup handlers because not all platforms support them. Any help would be
> greatly appreciated.

If you can use cancellation, you can use cleanup handlers. Both are part of both DCE threads (what you're using on HP-UX 10.10) and POSIX threads (what you probably are, and, at least, should be, using on Solaris 2.5.) If you've got cancellation, and you don't have cleanup handlers, you've got an awesomely broken implementation and you should immediately chuck it.

When you wait on a condition variable, and the thread may be cancelled, you MUST use a cleanup handler. The thread will wake from the condition wait with the associated mutex locked -- even if it was cancelled. If the thread doesn't then unlock the mutex before terminating, that mutex cannot be used again by the program... it will remain locked by the cancelled thread.

> #include
> #include
> #include
>
> pthread_cond_t cond;
> pthread_mutex_t mutex;
>
> void * func(void *)
> {
>   // Allow this thread to be cancelled at any time
>   pthread_setcancel(CANCEL_ON);
>   pthread_setasynccancel(CANCEL_ON);

Serious, SERIOUS bug alert!! DELETE the preceding line before proceeding with this or any other program. Never, ever, enable async cancellation except on small sections of straight-line code that do not make any external calls. Better yet, never use async cancel at all. In any case, you absolutely CANNOT call any POSIX (or DCE) thread function with async cancellation enabled except the ones that DISABLE async cancel. (For bizarre and absolutely unjustifiable reasons [because they're wrong], POSIX threads also allows you to call pthread_cancel -- but don't do it!)

>   // Wait forever on the condition var
>   pthread_mutex_lock(&mutex);
>   for(;;) {
>     pthread_cond_wait(&cond, &mutex);
>   }
>   pthread_mutex_unlock(&mutex);
>   return 0;
> }

I suspect your problem is in cancelling the second thread. As I said, cancellation terminates the condition wait with the associated mutex locked.
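For comparison with the quoted program, here is a minimal sketch of a cancellable condition wait written against the POSIX interfaces rather than the DCE calls the poster used. The names `unlock_mutex`, `waiter` and `cancel_two_waiters` are hypothetical. It uses deferred cancellation, which is the POSIX default, and a cleanup handler that releases the mutex, so cancelling one waiter does not strand the lock for the other.

```c
#include <pthread.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int ready = 0;   /* predicate; never set here, so waiters block until cancelled */

/* Cleanup handler: runs if the thread is cancelled inside the wait. */
static void unlock_mutex(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mutex);
    pthread_cleanup_push(unlock_mutex, &mutex);
    while (!ready)
        pthread_cond_wait(&cond, &mutex);   /* cancellation point */
    pthread_cleanup_pop(1);   /* normal path: pop and run the handler to unlock */
    return NULL;
}

/* Start two waiters, cancel both, and check the mutex is still usable.
 * Returns the number of threads that ended by cancellation, or -1 on error. */
int cancel_two_waiters(void)
{
    pthread_t t[2];
    int cancelled = 0;

    for (int i = 0; i < 2; i++)
        if (pthread_create(&t[i], NULL, waiter, NULL) != 0)
            return -1;

    for (int i = 0; i < 2; i++) {
        void *result = NULL;
        pthread_cancel(t[i]);
        if (pthread_join(t[i], &result) != 0)
            return -1;
        if (result == PTHREAD_CANCELED)
            cancelled++;
    }

    /* Both handlers ran, so nobody holds the mutex any more. */
    if (pthread_mutex_trylock(&mutex) != 0)
        return -1;
    pthread_mutex_unlock(&mutex);
    return cancelled;
}
```

Note that pthread_cleanup_push and pthread_cleanup_pop must appear as a matched pair in the same lexical scope, which is why the wait loop sits between them.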
You're just letting the thread terminate with the mutex still locked. That means, cancelled or not, the second thread can never awaken from the condition wait. (At a lower level, you could say that it HAS awakened from the condition wait, but is now waiting on the mutex... and a mutex wait isn't cancellable.)

The answer is... if you use cancellation, you must also use cleanup handlers. (Or other, non-portable equivalent mechanisms, such as exception handlers or C++ object destructors... on platforms where they're implemented to interoperate with cancellation. [Both Solaris and Digital UNIX, for example, run C++ destructors on cancellation.])

/---------------------------[ Dave Butenhof ]--------------------------\

=================================TOP===============================
Q127: RedHat 4.2 and LinuxThreads?

> > The Linux kernel has supported multithreading for a very long time.
>
> thank you for the info, Bill. the man page for clone that ships with
> Red Hat 4.2 states that clone does not work. here are my questions.
> they all relate to Red Hat 4.2:
>
> 1. does clone work for all defined parameter values?
> 2. where can i find a list of the c library api's that are not
> reentrant?
> 3. does RedHat 4.2 install LinuxThreads if "everything" is selected?
>
>
> > Until recently, the API definition for POSIX thread support was
> > contained in the LinuxThreads package, but that's just a wrapper
> > around the kernel's built-in functioning. With the release of libc6
> > (GNU libc) the LinuxThreads functionality is more tightly integrated
> > into the basic C library,
>
> do you mean that the POSIX thread api's are now in libc so that
> LinuxThreads is obsolete?

With glibc2 (2.0.5c is current, I think), LinuxThreads is obsolete. However, you have to get yourself the additional glibc-linuxthreads package, but that's a detail. AFAIK glibc2 is still in the beta stage, but it works quite well.
Moreover, it is recommended to use glibc2 for multithreading rather than libc5. As H.J.Lu, libc5's maintainer, once stated: "I'm surprised it works at all" (or so).

You can install a "beta" of the libc6 (aka glibc2) as a secondary C library against which you link your program, and keep the good old libc5 as the primary library which the system related programs use. Take a look at http://www.imaxx.net/~thrytis/glibc/ for HOWTOs etc.

Joerg
----------------------------------------------------------------------------
Joerg Faschingbauer [email protected]
Voice: ++43/316/820918-31 Fax: ++43/316/820918-99
----------------------------------------------------------------------------

=================================TOP===============================
Q128: How do I measure thread timings?

Andy Sunny wrote:
> I'm conducting some research to measure the following things about
> pthreads using a Multikron II Hardware Instrumentation Board from NIST
> 1) thread creation time (time to put thread on queue)
> 2) thread waiting time (time that thread waits on queue)
> 3) thread execution time (time that thread actually executes)
>
> Are there any decent papers that explain the pthreads run time system
> and scheduling policy in DETAIL? I have read Frank Mueller's (FSU) paper
> and am trying to obtain the standard from IEEE. What is the latest
> version of the standard and will it help me find the proper libraries
> and functions needed to measure the above items?

The standard is unlikely to be of any help to you. It says nothing at all about implementation. POSIX specifies SOURCE-LEVEL interfaces, and describes the required portable semantics of those interfaces. Implementation details are (deliberately, properly, and necessarily) left entirely to the creator of each implementation. For example, there's no mention of libraries -- an embedded system, for example, might include all interfaces in an integrated kernel; and that's fine.
What you need is a document describing the internal implementation details of the particular system you're using. If the vendor can't supply that, you'll need to create it yourself -- either by reading source, if you can get it, or by flailing around blindly in the dark and charting the walls you hit.

/---------------------------[ Dave Butenhof ]--------------------------\

=================================TOP===============================
Q129: Contrasting Win32 and POSIX thread designs

Arun Sharma wrote:
> On Mon, 24 Nov 1997 18:10:13 GMT, Christophe Beauregard wrote:
>
> c> while thread context is a Windows concept.
>
> How so ? pthreads don't have contexts ?

This looks like an interesting discussion, of which I've missed the beginning. (Perhaps only the followup was cross-posted to comp.programming.threads?) Anyway, some comments:

Anything has "context". A thread is an abstraction of the executable state traditionally attributed to a process. The "process" retains the non-executable state, including files and address space. Why would anyone contend that "thread context is a Windows concept"? I can't imagine. Maybe it's buried in the unquoted portion of the original message. And then again, some people think Microsoft invented the world.

> c> Generally, you'll find that pthreads gives you less control
> c> over how a thread runs. There are very good reasons for
> c> this (one being portability, another being safety).
>
> In other words, it has to be the least common denominator in the
> fragmented UNIX world. No wonder people love NT and Win32 threads.

POSIX threads gives you far more real control over threads than Win32 (for example, far superior realtime scheduling control). What it doesn't give you is suspend/resume and uncontrolled termination. Those aren't "control over how a thread runs". They are extraordinarily poor programming mechanisms that can almost never be used correctly.
Yes, to some people the key is "almost never", and one may argue that they should be provided anyway for that 0.001% of applications that "need" it. (But those of us who actually support threaded interfaces might also point out that these extremely dangerous functions are for some reason particularly tempting to beginners who don't know what they're doing -- resulting in very high maintenance costs, which mostly involves helping them debug problems in their code.) This isn't an example of "fragmented UNIX" -- it's UNIX unity, with a wide variety of different "UNIX camps" reaching a concensus on what's necessary and useful. While the Win32 interface comprises whatever the heck a few designers felt like tossing in, POSIX was carefully designed and reviewed by a large number of people, many of whom knew what they were doing. Omitting these functions was a carefully considered, extensively discussed, and quite deliberate decision. The Aspen committee that designed the thread extensions to POSIX for the Single UNIX Specification, Version 2, proposed suspend/resume -- they were later retracted by the original proposer (with no objections). A POSIX draft standard currently under development, 1003.1j, had proposed a mechanism for uncontrolled termination, with the explicit recognition that it could be used (and then only with extreme care) only in carefully constructed embedded systems. It, too, was later withdrawn as the complications became more obvious. (The notion that you can regain control of a process when you've lost control of any one thread in the process is faulty, because all threads depend completely on shared resources. If you've lost control of a thread, you don't know the state of the process -- how can you expect it to continue?) > c> Basically, using signals for dealing with threads is a Bad > c> Thing and people who try generally get screwed. > > It doesn't have to be so. That's an implementation problem. 
Yes, it does have to be so, because signals are a bad idea to begin with. Although there were enormous complications even before threads, the concept becomes all but unsupportable with the addition of full asynchronous execution contexts to the traditional process. The "synchronous" signals, including SIGSEGV, should be language exceptions. The other "asynchronous" signals should be handled synchronously in independent contexts (threads). If you think about it, that's what signals were attempting to do; the condition exists as a separate execution context (the signal handler). Unfortunately, a signal preempts the hardware context of the main execution context, asynchronously. That's a really, really bad idea. Although people have always casually done things like calling printf in signal handlers, too few people realize that's always been incorrect and dangerous -- only a small list of UNIX functions are "async-signal safe". The addition of threads, however, allowing the process to have multiple contexts at any time, increases the chances that some thread will be doing something that will conflict with improper use of non async-signal safe functions at signal level. > Portability doesn't necessarily have to cripple the API. And, in fact, it doesn't. It results in a well-designed and robust interface that can be efficiently implemented everywhere. I'm not arguing that the POSIX interface is perfect. There is room for additions, and the Single UNIX Specification, Version 2, makes a good start. Other areas to consider for future standardization would include debugging and analysis interfaces. There are POSIX standards in progress to improve support for "hard realtime" environments (for example, putting timeouts on all blocking functions to control latency and help diagnose failures). 
/---------------------------[ Dave Butenhof ]--------------------------\
=================================TOP===============================
Q130: What does POSIX say about putting stubs in libc?

Patrick TJ McPhee wrote:
> I'd like to know what Posix has to say about putting stubs in libc.
> Is it permitted? Is it required? What return code can we expect to
> receive from such a stub, and how can we portably ignore it?

POSIX doesn't have ANYTHING to say. POSIX 1003.1 doesn't really recognize the existence of the concept of a "library". It defines a set of SOURCE LEVEL interfaces that shall be provided by implementations and that may be used by applications to achieve certain portable semantics.

Now, 1003.2 says a little about libraries. That is, there's something with a ".a" file suffix, and there's a utility called "ar" to create them, and a utility called "c89" with a "-l <name>" switch that may read from a "lib<name>.a" (and may also read additional file suffixes, e.g., .so). 1003.2 doesn't say anything about which symbols may or should be resolved from which libraries, though, and hasn't been updated since 1003.1c-1995 (threads), in any case.

So, if your system's <unistd.h> provides a definition for _POSIX_THREADS, then 1003.1c-1995 tells you that you can potentially call pthread_create. It does not tell you which libraries you need to link against. UNIX98 does specify that, for c89, the proper incantation is "-lpthread". But even that's not the same as a requirement that the symbols must resolve from, and only from, a libpthread library: only that you're not allowed to build a portable threaded application without REQUESTING libpthread. (And if you use cc instead of c89, UNIX98 doesn't help you any more than POSIX, aside from the gentle SUGGESTION that an implementation provide the thread implementation in a libpthread -- which had, in any case, already become the de facto industry standard.)

So, yes, it's "permitted", and, no, it's not "required".
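As a rough illustration of the source-level check described above, the sketch below tests _POSIX_THREADS at compile time and only then calls pthread_create. The function names try_thread and set_flag are hypothetical, and which library the symbols actually resolve from is left to the platform's build rules (for example "c89 prog.c -lpthread" under UNIX98).

```c
#include <pthread.h>
#include <unistd.h>   /* defines _POSIX_THREADS where threads are supported */

/* Hypothetical thread body: sets the int it is pointed at to 1. */
static void *set_flag(void *arg)
{
    *(int *) arg = 1;
    return NULL;
}

/* Returns 1 if a thread was created and ran, 0 if thread creation or
 * join failed, and -1 if the headers do not advertise POSIX threads. */
int try_thread(void)
{
#if defined(_POSIX_THREADS)
    pthread_t tid;
    int flag = 0;

    /* pthread_create returns 0 on success, an error number otherwise. */
    if (pthread_create(&tid, NULL, set_flag, &flag) != 0)
        return 0;
    if (pthread_join(tid, NULL) != 0)
        return 0;
    return flag;
#else
    return -1;
#endif
}
```

A caller that can live without the thread simply carries on single threaded when this returns anything other than 1.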
If you're building an APPLICATION using threads, there's no confusion or problem. You build according to the rules of the platform, and you've got threads, wheresoever they might reside. If you try to use threads without building properly, all bets are off, because you blew it. If you're getting the interfaces accidentally from somewhere else, that's nobody's fault but your own. If you're trying to build thread-safe code that doesn't use threads, you've got a portability problem. No standard will help you accomplish this. That's too bad. Requiring libc "stubs" would be one way out -- but as I've already said, (and as I'll reiterate in the next paragraph!), the Solaris implementation has some serious limitations of which I don't approve. I would not consider that an acceptable standard. I'm not entirely happy with our own solution (a separate set of "tis" interfaces), either, because extra interfaces are nobody's friend. One might say that there is room here for innovation. ;-) If you're trying to build a library that uses threads, regardless of whether the main program uses threads -- well, you're in trouble again. You SHOULD be able to simply build it as if you were building a threaded application, and it should work. Unfortunately it won't work (portably) unless the main program is linked against the thread library(s), whether or not it needs them. Symbol preemption will work against you if there are "stubs" for any functions in a library that will be searched by direct dependencies of the main program. (Even if your library is searched first, ITS direct dependencies will go at the end of the list.) That's the problem with the Solaris libc stubs. (I'd like to say that Digital UNIX avoids this, and that's certainly the intent; but unfortunately it's not yet entirely true. Although there are no stubs conflicting with libpthread, we depend on the libexc exception library, which has a conflicting stub in libc. 
Luckily, this affects relatively few operations -- but, technically, it still means it doesn't work.) On the other hand, your final question is easy. There's no need to "portably ignore" the errors that a stub might generate. Look, if you try to create a thread, it either succeeds or it fails. You get back 0, and it worked. Anything else, and it failed. If the failure is EAGAIN, you might choose to try again later. Otherwise... hey, you're just not going to create that thread, so deal with it. The only question is: can you live with that? If you don't NEED to create a thread, go on with life, single threaded. If you NEED to create the thread, then you're done. (Whether you return a failure to your caller, or abort the process, probably depends on what you're trying to do, and how your interface is designed.) It really doesn't matter whether you got activated against libc stubs or a real thread library that for some reason refuses to create the thread. You're not going to do what you wanted to do, and that's that. /---------------------------[ Dave Butenhof ]--------------------------\ =================================TOP=============================== Q131: MT GC Issues [See Geodesic Systems (www.geodesic.com) -Bil] Sanjay Ghemawat wrote: > All collectors I have known of and read about control both the allocation > and the deallocation of objects. So it is fairly easy for them to grab > all of the locks required before suspending threads. The only problem > here might be locks held within the operating system on behalf of a thread > that is about to be suspended. Even here, if one is using a thread interface > like the one provided by Mach, a call to "thread_abort" will pop the > thread out of the OS. There is no general or portable mechanism equivalent to thread_abort, and it is pretty limited even on Mach. (First, you have to know it's in a blocking Mach call.) 
> >Furthermore, suspend/resume may possibly be necessary for concurrent garbage > >collection (I haven't yet been convinced of that -- but I haven't found a good > >alternative, either), but it's definitely far from desirable. It's an ugly and > >stupendously inefficient kludge. Remember, you have COMPLETELY STOPPED the > >application while garbage collecting. That's GOOD? Parallel applications want > > First of all, most incremental/concurrent collectors only stop the > application while they find all of the pointers sitting in thread > stacks/registers/etc. The collectors that provide better real-time > guarantees tend to make other operations (such as storing a pointer > onto a stack) expensive. I think there are two classes of systems > here: hard real-time and others. A good performance tradeoff for > systems that do not require hard real-time bounds is to use an > incremental/concurrent collector that may introduce pauses, but does > not slow down the mutator with lots of book-keeping work. So I think > the argument that suspend/resume are bad only applies to some systems, > not all. Probably not even the vast majority of systems that run on > desktops and commercial servers. Look, if you want concurrency, you DON'T want pauses. I already acknowledged that there might not be an alternative, and that we certainly don't know of any now. Maybe it's the best tradeoff we can get. Stopping all the threads is still bad. Necessity does not equate to desirability. If you really think that it is BENEFICIAL to stop the execution of a concurrent process, argue on. Otherwise, drop it. > >application while garbage collecting. That's GOOD? Parallel applications want > >concurrency... not to be stopped dead at various unpredictable intervals so > >the maintenance staff can check the trash cans. There has gotta be a better > >way. > > So we should all wait and not use garbage-collection in multi-threaded > programs until that better way is found? 
I don't see why you are so > vehemently set against suspend/resume. It solves real problems for > people implementing garbage collection. Yes there are tradeoffs > here: suspend/resume have their downside, but that doesn't mean we > should ignore them. Because suspend and resume are poor interfaces into the wrong part of a scheduler for external use. They are currently used (more or less) effectively for a small class of specialized applications (e.g., garbage collection). They are absolutely inappropriate for general use. The fact that a bad function "can be used effectively" doesn't mean it should be standardized. Standards do not, and should not, attempt to solve all possible problems. > So do that. I don't think there are an unbounded number of such > required operations. In fact I have implemented a collector for a > multi-threaded environment that requires just three such operations > suspend, resume, and reading registers from a suspended thread. > And folding in the register-state extraction into the suspend call > seems like a fine idea. Now, are we talking about "garbage collectors", or are we talking about suspend and resume? All this rationalization about garbage collection spills over naturally into a discussion of suspend and resume -- but that's happenstance. Sure, our concurrent GC systems, and especially Java, use suspend/resume. But that's "because it was there", and solved the problem of pinning down threads long enough to get their state. But the function required of concurrent garbage collection is not "suspend all threads and collect their registers, then resume them". The required function is "acquire a consistent set of live data root pointers within the process". Yes, there are a bounded set of operations required for GC -- and that has nothing at all to do with suspend or resume. 
If the argument for standardizing suspend and resume is to revolve entirely around the needs of today's semi-concurrent GC, then we should be designing an interface to support what GC really needs, not standardizing an ugly and dangerous overly-generalized scheduling function that can be (mis-)used to implement one small part of what GC needs. > >> Since the implementation basically needs to be there for any platform > >> that supports Java, why not standardize it? Alternatively, if there is > >> a really portable solution using signals, I would like to see it > >> advertised. > > > >Any "why not" can be rephrased as a "why". Or a "so what". > > Oh come on. By this argument, you could do away with all standards. > The reason for standards is so that multiple implementations with > the same interface can exist and be used without changing the > clients of the interface. And by the converse, perhaps we should standardize every whim that comes into anyone's head? Baloney. We should standardize interfaces that are "necessary and sufficient". Determining exactly of what that consists is not always easy -- but it's important because standards have far- and long-reaching consequences. > I apologize if this message has come across as a bit of a rant, but > I am tired of people assuming that everyone who asks for suspend/resume > must be an idiot who does not understand the available thread > synchronization mechanisms. There are legitimate uses for > suspend/resume, of course with some performance tradeoffs. By > making the decision to not implement them in a thread library, > you are taking away the ability of the clients of the library > to decide on the tradeoff according to their need. That makes > the thread library less useful. I guess we're even -- because I'm tired of hearing people insist that because they want suspend/resume, it must be universally accepted as "a cool thing" and forced down everyone's throat. It's not a cool thing. 
And, by the way, I have yet to hear of a single truly legitimizing use. The use of suspend and resume by GC is an expedient hack. It isn't really accomplishing what GC needs. It's far more heavyweight than required, (as you pointed out, a GC system suspends threads to get a consistent set of root pointers, NOT because it wants to suspend the threads), and it doesn't provide half the required capabilities (after all, the real goal is to get the pointers -- the registers). As for your final dig, I'm tempted to laugh. You know what makes a thread library less useful? Providing unsupportable functions that are nearly impossible to use safely and that therefore result in significant support costs, preventing the development team from doing work that would provide useful features and fixing real problems. /---------------------------[ Dave Butenhof ]--------------------------\ =================================TOP=============================== Q132: Some details on using CMA threads on Digital UNIX [email protected] wrote: > I'm trying to port code from and HP that used the cma threads package to > a DEC Alpha with the posix package. I've found that some of the standard > header files (DEC C++) have conflicting definitions (e.g., sys/types.h > and pthreads_exc.h). Has anyone encountered this porblem and is there > some simple conversion utility or a better library to use is such a port. A number of questions present themselves immediately, including: 1. What version of Digital UNIX are you using? 2. Are you trying to compile with CMA, DCE thread (draft 4 POSIX), or true POSIX? 3. What compiler/link options are you specifying? 4. What headers do you include? 5. What is actually happening? A few comments: 1. Digital UNIX provides both DCE thread (draft 4 POSIX, both "standard" and exception-raising) and CMA interfaces, as in any DCE implementation. To use these, compile with "cc -threads" or "cxx -threads", and link using the same. 
If you can't, compile with "-D_REENTRANT". Link depends on what version you're using -- specifically 3.2(x) or 4.0(x). (And for link, watch out for _r libraries, e.g., libfoo_r.so or libfoo_r.a if linking static -- "-threads" or "-pthread" will cause the linker to find and use them automatically; but if you roll your own you'll need to look for them yourself.)

2. Digital UNIX 4.0 (and higher) also provides true POSIX threads, if you're converting. Compile and link using "cc -pthread" or "cxx -pthread". If you can't, compile with "-D_REENTRANT" and link with "-lpthread -lexc -lc" (at the end of your list of files). (And, again, watch out for _r suffix libraries.)

3. You mentioned "pthreads_exc.h". Well, that header is used to define the exception-raising variant of the DCE thread (draft 4 POSIX) API. This conflicts with the implication of your statement "with the posix package", since DCE threads are NOT the same as POSIX threads. You cannot use it with POSIX threads.

/---------------------------[ Dave Butenhof ]--------------------------\
=================================TOP===============================
Q133: When do you need to know which CPU a thread is on?

[This is part of an ongoing series of unsolved problems where there is a lot of "We don't quite know WHY this is happening, but..." -Bil]

On Sun, 28 Dec 1997, Bil Lewis wrote:
> Jason,
>
> That sounds like a very interesting project. I'm curious about your decision
> to bind threads to CPUs. You SAY you need to do it, but you don't give any
> proof. Did you test your system without binding to CPUs? What kind of results
> did you get when you did?
>

The threaded version of the system has not been constructed yet; however, a non-threaded (i.e., forked) version has, and we have found significant performance differences between allowing the processes to arbitrarily migrate between processors and locking the processes to dedicated processors.
So from that experience it stands to reason that locking threads to processors would be preferable if we were to attempt to implement a fully threaded version of the system. > I infer from what you say that this is a computationally intensive task. > Which implies that the threads (or processes) would never migrate to different > CPUs anyway. DID they migrate? I'd very much like to know your experience and > the performance behavior. Yes the graphics processes are computationally intensive. It is a standard technique on multiprocessor SGI's to lock rendering processes to processors. If they are not locked they will migrate. The ability to lock threads to processors hasn't been fully implemented by SGI yet. Currently since threads are bound to their processes, when the process migrates the thread gets carried along with it. I'm guessing that pThreads on the SGI's are being implemented on top of sproc which is a superset of the capabilities of pthreads. Since sprocs can be locked to processors I'm hoping soon that the SGI implementation of pthreads will also inherit that capability. =================================TOP= Jason: > Actually in the work we do (Virtual Reality) we crucially need to know not > only which processor a thread is running on, but to be able to explicitly > assign a thread to the processor. Now I don't see any of that. You have a set of threads that you want to execute in parallel on an SMP. That's fine. Lots of people have the same need for all sorts of reasons. That, however, does NOT mean that you need to know on which processor each thread is running, much less be able to specify on which processor each thread runs. It just means you need to be sure that the O/S supports parallel computation. What you're saying is that you don't trust the O/S scheduling at all, and insist on controlling it yourself. 
There are cases where that's valid -- but that's quite different from saying that your application inherently requires processor identification or control. It doesn't. In nearly every case requiring concurrency/parallelism, you'll be best off trusting the O/S to schedule the processor resources. And if you find that it's not always trustworthy, tell the developers, and help them fix it! You, and everyone else, will end up with a better system.

/---------------------------[ Dave Butenhof ]--------------------------\
=================================TOP=============
Q134: Is any difference between default and static mutex initialization?

Robert White wrote:
> Venkat Ganti wrote:
>
> > I want to know whether there is any difference between the following
> > two mutex initializations using pthreads
> >
> > 1.
> >
> > pthread_mutex_t mp = PTHREAD_MUTEX_INITIALIZER;
> >
> > 2.
> >
> > pthread_mutex_t mp;
> > pthread_mutex_init (&mp, NULL);
> >
> > In this the allocated memory is zero.
> >
>
> Another way that these two may be different (in addition to the ones
> mentioned by Dave B. in his reply) is that the latter form can have
> different meaning as the program progresses because the default mutex
> behavior of a program can be changed with the set-attribute calls (I
> forget the exact call) when the attribute specified in the call is the
> NULL pointer.

You can't change the attribute values of the NULL attributes object. When you initialize a mutex using NULL, you're asking for default attributes -- those MUST ALWAYS be the same attributes that will be used by a statically initialized mutex. It doesn't (and can't) matter when the statically initialized mutex is first used.

> If you use variant 2, you know that the semantics are those in-force at
> the time the statement is executed. If you use variant 1, it will likely
> have the default semantics in force at the time the mutex is first used.
The only way this could be true is if an implementation provides some non-portable and non-standard mechanism for modifying the default attributes. You'd have a hard time convincing me that such an extension could conform, since pthread_mutex_init specifically requires that the mutex gain "default" attributes, and the standard requires that the default value of any attributes (for which the standard doesn't specify a default) must be specified in the conformance document. > The manual, if I recall correctly, "strongly suggests" that variant 1 > only be used to initalize staticly allocated mutexes only. I suspect > that the above ambiguity is the reason. Initializing a mutex on the stack is almost always bogus, and will usually lead to far more trouble than you ever might have imagined. Doesn't matter whether the mutex is statically initialized or dynamically initialized, though, except (as always), a static initialization has no choice but to use the default attributes. You can't statically initialize a heap mutex, because the language doesn't allow you to specify an initial value in that case. /---------------------------[ Dave Butenhof ]--------------------------\ =================================TOP============= Q135: Is there a timer for Multithreaded Programs? From: [email protected] (Richard Sullivan) Subject: Re: Timing Multithreaded Programs (Solaris) [email protected] (Bradley J. Marker) wrote: >I'm trying to time my multithreaded programs on Solaris with multiple >processors. I want the real world running time as opposed to the total >execution time of the programming because I want to measure speedup versus >sequential algorithms and home much faster the parallel program is for the user. Bradly, Here is what I wrote to solve this problem (for Solaris anyway). To use it just call iobench_start() after any setup that you don't want to measure. When you are done measuring call iobench_end(). When you want to see the statistics call iobench_report(). 
The output to stderr will look like this:

Process info:
 elapsed time 249.995
 CPU time 164.446
 user time 152.095
 system time 12.3507
 trap time 0.661235
 wait time 68.6506
 pfs major/minor   3379/     0
 blocks input/output      0/     0

65.8% CPU usage

>>>>>>>>>>>>>>>>>>>>> iobench.h

/*-----------------------------------------------------------------------------
 *
 * Library Name: UTIL
 * Module Name: iobench
 *
 * Designer: R. C. Sullivan
 * Programmer: R. C. Sullivan
 * Date: Sep 22, 1995
 *
 * History Of Changes:
 *   Name  Date          Description
 *   ----  ----          -----------
 *   RCS   Jan 17, 1996  Initial release
 *
 * Purpose:
 *   To report resource usage statistics that will be correct for
 *   programs using threads on a Solaris system.
 *
 * Notes:
 *
 *-----------------------------------------------------------------------------
 */

extern struct prusage prusagebuf_start, prusagebuf_end;
extern int procfd;
extern double real_time, user_time, system_time, trap_time, wait_time;
extern unsigned long minor_pfs, major_pfs, input_blocks, output_blocks, iochars;

void iobench_start();
void iobench_end();
void iobench_report();

>>>>>>>>>>>>>>>>>>>>> iobench.c

/*-----------------------------------------------------------------------------
 *
 * Library Name: UTIL
 * Module Name: iobench
 *
 * Designer: R. C. Sullivan
 * Programmer: R. C. Sullivan
 * Date: Sep 22, 1995
 *
 * History Of Changes:
 *   Name  Date          Description
 *   ----  ----          -----------
 *   RCS   Jan 17, 1996  Initial release
 *
 * Purpose:
 *   To report resource usage statistics that will be correct for
 *   programs using threads on a Solaris system.
 *
 * Notes:
 *
 *-----------------------------------------------------------------------------
 */

/* The header names were lost from this archived copy. The list below is
 * an assumption: it is the set the code appears to need on Solaris. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>       /* NANOSEC */
#include <sys/procfs.h>     /* struct prusage, PIOCUSAGE */
#include "iobench.h"

/* Convert a {tv_sec, tv_nsec} pair to seconds. */
#define TS_SEC(ts)  ((double) (ts).tv_sec + (double) (ts).tv_nsec / NANOSEC)
/* Seconds accumulated in prusage field f between the two samples. */
#define TS_DELTA(f) (TS_SEC(prusagebuf_end.f) - TS_SEC(prusagebuf_start.f))

struct stat statbuf;
struct prusage prusagebuf_start, prusagebuf_end;
int procfd;
double real_time, total_real_time, user_time, system_time, trap_time, wait_time;
unsigned long minor_pfs, major_pfs, input_blocks, output_blocks, iochars;

void iobench_start()
{
    char pfile[80];

    /* pid_t is not necessarily long, so cast to match the format. */
    sprintf(pfile, "/proc/%ld", (long) getpid());
    procfd = open(pfile, O_RDONLY);
    if (procfd < 0) {
        perror(pfile);
        return;
    }
    ioctl(procfd, PIOCUSAGE, &prusagebuf_start);
}

void iobench_end()
{
    ioctl(procfd, PIOCUSAGE, &prusagebuf_end);
    close(procfd);

    real_time       = TS_DELTA(pr_tstamp);
    /* The original subtracted real_time here, which looks like a slip. */
    total_real_time = TS_DELTA(pr_rtime);
    user_time       = TS_DELTA(pr_utime);
    system_time     = TS_DELTA(pr_stime);
    trap_time       = TS_DELTA(pr_ttime);
    wait_time       = TS_DELTA(pr_wtime);

    minor_pfs     = prusagebuf_end.pr_minf - prusagebuf_start.pr_minf;
    major_pfs     = prusagebuf_end.pr_majf - prusagebuf_start.pr_majf;
    input_blocks  = prusagebuf_end.pr_inblk - prusagebuf_start.pr_inblk;
    output_blocks = prusagebuf_end.pr_oublk - prusagebuf_start.pr_oublk;
/*  iochars = prusagebuf_end.pr_ioch - prusagebuf_start.pr_ioch;*/
}

void iobench_report()
{
    fprintf(stderr, "Process info:\n");
    fprintf(stderr, " elapsed time %g\n", real_time);
/*  fprintf(stderr, " total time %g\n", total_real_time);*/
    fprintf(stderr, " CPU time %g\n", user_time + system_time);
    fprintf(stderr, " user time %g\n", user_time);
    fprintf(stderr, " system time %g\n", system_time);
    fprintf(stderr, " trap time %g\n", trap_time);
    fprintf(stderr, " wait time %g\n", wait_time);
    fprintf(stderr, " pfs major/minor %6lu/%6lu\n", major_pfs, minor_pfs);
    fprintf(stderr, " blocks input/output %6lu/%6lu\n", input_blocks, output_blocks);
/*  fprintf(stderr, " char inp/out %lu\n", iochars);*/
    fprintf(stderr, "\n");
/*  fprintf(stderr, "%2.5g Mbytes/sec (real time)\n", iochars / real_time / 1e6);
    fprintf(stderr, "%2.5g Mbytes/sec (CPU time) \n",
            iochars / (user_time + system_time) / 1e6);*/
    fprintf(stderr, "%2.1f%% CPU usage\n",
            100 * (user_time + system_time) / real_time + .05);
}

=================================TOP=============
Q136: Roll-your-own Semaphores

[For systems that don't support the realtime extensions (where POSIX semaphores are defined -- they're NOT in Pthreads).]

In article , [email protected] says...
> [[ PLEASE DON'T SEND ME EMAIL COPIES OF POSTINGS ]]
>
> [email protected] (Bob Withers) writes:
>
> >Thanks much for this info. Unfortunately I need the semaphores for
> >inter-process mutual exclusion which makes sem_open important. I'll just
> >have to stick with SysV semaphores until we can move to 2.6.
>
>
> Well, you can mmap a semaphore in a file if you wish.
>

Well you sure can and, believe it or not, I actually thought of it before I read your post. My code has not been thoroughly tested but I'm posting it here in the hopes that it will be of help to someone else. Either that or I'm just a glutton for criticism. :-)

Casper, thanks much for your help.

Bob

/* The header names were lost from this archived copy. The list below is
 * an assumption: it is the set the code appears to need. */
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <errno.h>
#include <fcntl.h>
#include <semaphore.h>
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

sem_t *sem_open(const char *name, int oflag, ...)
{
    int    need_init = 0;
    int    val = 1;
    int    fd;
    sem_t *sem = (sem_t *) -1;

    /* -----------------2/11/98 2:12PM-------------------
     * open the memory mapped file backing the shared
     * semaphore to see if it exists.
     * --------------------------------------------------*/
    fd = open(name, O_RDWR);
    if (fd >= 0)
    {
        /* -----------------2/11/98 2:13PM-------------------
         * the semaphore already exists, if the caller
         * specified O_CREAT and O_EXCL we need to return
         * an error to advise them of this fact.
         * --------------------------------------------------*/
        if ((oflag & O_CREAT) && (oflag & O_EXCL))
        {
            close(fd);
            errno = EEXIST;
            return(sem);
        }
    }
    else
    {
        int     sem_mode;
        va_list ap;

        /* -----------------2/11/98 2:14PM-------------------
         * if we get here the semaphore doesn't exist. if
         * the caller did not request that it be created then
         * we need to return an error. note that errno has
         * already been set appropriately by open().
         * --------------------------------------------------*/
        if (0 == (oflag & O_CREAT))
            return(sem);

        /* -----------------2/11/98 2:15PM-------------------
         * ok, we're going to create a new semaphore. the
         * caller should've passed mode and initial value
         * arguments so we need to acquire that data.
         * --------------------------------------------------*/
        va_start(ap, oflag);
        sem_mode = va_arg(ap, int);
        val = va_arg(ap, int);
        va_end(ap);

        /* -----------------2/11/98 2:16PM-------------------
         * create the semaphore memory mapped file. if this
         * call returns an EEXIST error it means that another
         * process/thread snuck in and created the semaphore
         * since we discovered it doesn't exist above. we
         * don't handle this condition but rather return an
         * error.
         * --------------------------------------------------*/
        fd = open(name, O_RDWR | O_CREAT | O_EXCL, sem_mode);
        if (fd < 0)
            return(sem);

        /* -----------------2/11/98 2:18PM-------------------
         * set flag to remember that we need to init the
         * semaphore and set the memory mapped file size.
         * --------------------------------------------------*/
        need_init = 1;
        if (ftruncate(fd, sizeof(sem_t)))
        {
            close(fd);
            return(sem);
        }
    }

    /* -----------------2/11/98 2:19PM-------------------
     * map the semaphore file into shared memory.
     * --------------------------------------------------*/
    sem = (sem_t *) mmap(0, sizeof(sem_t), PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
    close(fd);

    /* mmap() reports failure with MAP_FAILED, not a null pointer. */
    if (sem == (sem_t *) MAP_FAILED)
        return((sem_t *) -1);

    /* -----------------2/11/98 2:19PM-------------------
     * if the mapping worked and we need to init the
     * semaphore, do it now.
     * --------------------------------------------------*/
    if (need_init && sem_init(sem, 1, val))
    {
        munmap((caddr_t) sem, sizeof(sem_t));
        return((sem_t *) -1);
    }

    return(sem);
}

int sem_close(sem_t *sem)
{
    return(munmap((caddr_t) sem, sizeof(sem_t)));
}

int sem_unlink(const char *name)
{
    return(remove(name));
}

=================================TOP=============
Q137: Solaris sockets don't like POSIX_C_SOURCE!

A little known requirement in Solaris is that when you define _POSIX_C_SOURCE, you must also define __EXTENSIONS__ when including sys/socket.h. Hence, your file should look like this:

#define _POSIX_C_SOURCE 199506L
#define __EXTENSIONS__
#include <sys/socket.h>
...

================

That's because _POSIX_C_SOURCE of 1995 vintage doesn't include socket calls.

The feature macros are *exclusion* macros, not *inclusion* macros. By default, you will get everything. When you define something, you get *only* that something.
(Unless you also define __EXTENSIONS__) This is slightly different in cases where the behaviour is modified by the macro as in some socket calls. Casper ====================== From: [email protected] (David Robinson) The gratuitous use of non-POSIX conforming typedefs in headers is the root cause. (Should use ushort_t not u_short and uint_t not u_int) When defining POSIX_C_SOURCE it says use only strictly POSIX conforming features, typedefs thus can only end in _t. Good news is that 90+% of the offending headers are fixed in 2.7. -David ================ % A question... should I use -mt or -D_POSIX_C_SOURCE=199506L to compile % a pthread program on Solaris 2.6? If I use the latter even the most simple % socket program won't compile. For example, Well, these do different things. -mt sets up the necessary macro definitions for multi-threading, and links with the appropriate libraries. _POSIX_C_SOURCE tells the compiler that your application is supposed to strictly conform to the POSIX standard, and that the use of any non-POSIX functions or types that might be available on the system should not be allowed. The advantage of this is that, when you move to another system which provides POSIX support, you are assured of your program compiling, however this requires some work up-front on your part. So the answer to your question is that you should use -mt to tell the compiler your application is multi-threaded, and use _POSIX_C_SOURCE only if your application is intended to conform strictly to POSIX. HP's compiler is quite frustrating in this regard, since it assumes by default that your application is k&r C. If you use the -Aa option to tell it your application is ANSI C, it doesn't allow any functions which aren't defined by ANSI. I always end up using -Ae to tell the compiler to get stuffed and just compile my program with whatever's on the system, and I port to HP last after a big change. 
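
For the Solaris socket case discussed above, here is a minimal sketch of a source file that asks for strict POSIX.1c and still pulls in the socket headers. The helper name make_loopback_addr is made up for illustration; __EXTENSIONS__ is the Solaris-specific macro and is assumed to be harmlessly ignored by other systems' headers.

```c
/* Feature-test macros must be defined before the first #include. */
#define _POSIX_C_SOURCE 199506L   /* request POSIX.1c-1995 declarations      */
#define __EXTENSIONS__            /* Solaris: keep the non-POSIX declarations */

#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper (not from the FAQ): fill in an IPv4 loopback
 * address for the given port, using only the socket headers above. */
void make_loopback_addr(struct sockaddr_in *addr, unsigned short port)
{
    assert(addr != NULL);
    memset(addr, 0, sizeof *addr);
    addr->sin_family      = AF_INET;
    addr->sin_port        = htons(port);            /* network byte order */
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* 127.0.0.1          */
}
```

If the file compiles without -D__EXTENSIONS__ on your system, its headers are already POSIX-clean in the sense David Robinson describes.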
=================================TOP=============
Q138: The Thread ID changes for my thread!

I'm using IRIX 6.4 threads and SGI MPI, and I'm having strange problems. To analyse and debug my program I began writing some very simple programs with similar behaviour, and I detected a strange thing. Can anybody tell me whether I'm making a mistake or whether this is a problem with IRIX 6.4?

The problem is: WHEN I CHANGE THE THREAD PRIORITY, THE THREAD ID ALSO CHANGES. As you can imagine, I have a lot of problems when I try joining the threads.

- If I use only thread calls (with the MPI calls commented out), the program works fine, even if I link with the MPI library. The program changes the main thread's priority and then creates 10 threads with other priorities. The thread IDs are sequential.
- If I use threads and MPI calls (only MPI_Init, Comm_size, Comm_rank and Finalize), the SAME program changes the main thread's ID after the priority change.
- Another thing: in the first case, in my run, the thread IDs begin at 10000 and the others follow sequentially. In the second case, the thread ID begins at 10000 and after the priority change it becomes 30000.

Can ANYBODY explain this? TIA.

THE CODE IS:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <mpi.h>      /* needed only when the MPI calls below are enabled */

pthread_attr_t  attr;
pthread_mutex_t mutex;
pthread_cond_t  cond;
int xcond=0;
int size,rank;

void *Slaves(void *arg)
{
    int i,j;

    pthread_mutex_lock(&mutex);
    while(xcond==0)
        pthread_cond_wait(&cond,&mutex);
    pthread_mutex_unlock(&mutex);
    printf("Size: %d Rank: %d Thread Id %x\n", size, rank, pthread_self());
    fflush(stdout);
    return NULL;
}

int main (int argc, char **argv)
{
    int i,k=10,r;
    pthread_t *ThreadId;
    struct sched_param params;
    int sched;

    printf("THREAD MAIN BEFORE MPI INIT %x\n", pthread_self());

    /*
     * These lines are commented out to see whether the MPI calls
     * influence system behavior.

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("THREAD MAIN AFTER MPI INIT %x\n", pthread_self());
    */

    /*
     * If I called MPI the main thread id will be changed after
     * these lines. When I just left MPI initialisation the main thread
     * has the same ID that it had before; the problem arises
     * from this point.
     */
    params.sched_priority=20;
    pthread_setschedparam(pthread_self(), SCHED_RR, &params);
    printf("THREAD MAIN AFTER PRIO CHG %x\n", pthread_self());

    if (argc==2)
        k=atoi(argv[1]);

    ThreadId= (pthread_t *) malloc(k*sizeof(pthread_t));
    pthread_attr_init(&attr);
    pthread_mutex_init(&mutex, NULL);
    pthread_cond_init(&cond, NULL);

    printf("\n Creating %d threads - Main thread is %x \n", k, pthread_self());
    for(i=0; i != k; i++) {
        pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
        params.sched_priority=21;
        pthread_attr_setschedparam(&attr, &params);
        r=pthread_create(&ThreadId[i], &attr, Slaves, NULL);
        if (r!=0) {
            printf("Error on thread creation! \n");
            exit(0);
        }
    }

    xcond=1;
    pthread_cond_broadcast(&cond);

    /*
     * It was to force threads execution, but this is not necessary
    for(;;) sched_yield();
    */

    for(i=0; i != k; i++) {
        r=pthread_join(ThreadId[i], NULL);
        if (r!=0) {
            printf("Error on joining threads...\n");
            exit(0);
        }
    }
    printf(" Thread MAIN with id %x terminating...\n", pthread_self());
    /* MPI_Finalize(); */
}

=================================TOP=============
Q139: Does X11 support multithreading ?

> > I am developing a multithreaded app under Solaris 2.5.1 (UltraSPARC),
> > using mixed Motif and X11, and i wonder if someone can help me
> > answering some question:
> >
> > Does X11 support multithreading ?
>
> Well, kinda. But...

Kinda? The X Consortium releases of R6.x can be built MT-safe. You can tell if you have R6 when you compile by checking the XlibSpecificationRelease or XtSpecificationRelease feature test macros. If they are >5 then your implementation may support threaded programming.
Call XInitThreads and/or XtToolkitInitializeThreads to find out if your system's Xlib and Toolkit Intrinsics (libXt) really do support threaded programming. > > > Does Motif do the same ? > > No. It's not thread-safe. Motif 2.1 IS MT-safe. > > > Can different threads open their own window and listen to their own > > XEvents ? How could they do that ? [XNextEvent() can't specify a window > > handle !]. > > You don't. Your main loop listens for any event, and then decides what > to do with it. Perhaps it hands off a task to another thread. You can. Each thread could open a separate Display connection and do precisely what Daniele asks. Even without separate Display connections, the first thread to call XNextEvent will lock the Display, and the second thread's call to XNextEvent will block until the first thread releases its lock. But you can't guarantee which thread will get a particular event, so in the trivial case you can't be assured that one thread will process events solely for one window. > > Take a look at the FAQ for the threads newsgroup (on the page below). That > will help a bit. You may also want to get "Multithreaded Programming with Pthreads" > which has a section on exactly this, along with some example code. (No one else > talks about this, but I thought it important.) I recommend reading the Xlib and Xt specifications, which are contained in each and every X Consortium release -- available at ftp://ftp.x.org/pub/, or you can get ready-to-print PostScript of just the Xlib and Xt specs from ftp://ftp.x.org/pub/R6.3/xc/doc/hardcopy. =================================TOP============= Q140: Solaris 2 bizzare behavior with usleep() and poll() >Jeff Denham wrote: >> >> Adam Twiss wrote: >> >> > You really really don't want to use usleep() in a threaded environment. >> > On some platforms it is thread safe, but on Solaris it isn't. 
The >> > affects are best described as "unpredictable", but I've seen a usleep() >> > call segv because it was in a theaded program on Solaris. >> > >> > You want to use nanosleep() instaed. >> > >> > Adam >> >> I've found that poll(0, 0, msec-timeout-value) >> works pretty well. Is there significant overhead calling >> poll in this manner? > >It's not uncommon to use poll() or select() for sleeping. Works >great. I've seen an occasional bug in Solaris 2.6 where poll() will fail to restore a pre-existing SIGALRM handler when it returns. The sequence is: sigaction(SIGALRM,...); alarm(); ... poll(0, 0, timeout); ... (program exits with "Alarm clock" error) Looking at the truss output, poll() appears to be the only place after the initial sigaction where the handler for SIGALRM is changed. The failure is difficult to reliably reproduce, but I've seen it happen about 10% of the time. It only happens on 2.6; this same code works fine on Solaris 2.5.1, AIX 3.2.5 and 4.2.1, HP-UX 9.04 and 10.10, and SCO OpenServer 5.0.4. The same thing happens with usleep. I haven't tried nanosleep. The program in question is single-threaded. I haven't had the chance to pursue this problem yet; there may be a fix for it, or it may be some really subtle application bug. Michael Wojcik [email protected] AAI Development, Micro Focus Inc. Department of English, Miami University Q: What is the derivation and meaning of the name Erwin? A: It is English from the Anglo-Saxon and means Tariff Act of 1909. -- Columbus (Ohio) Citizen =================================TOP============= Q141: Why is POSIX.1c different w.r.t. errno usage? Bryan O'Sullivan wrote: > d> It's an issue because that implementation is "klunky" and, more > d> precisely, inefficient. > > I must admit that optimising for uncommon error cases does not make > much sense to me. Sure. In my sentence, I would have to say that "klunky" was a more important consideration than "inefficient". 
However, use of errno is NOT strictly in "uncommon error cases". For example, pthread_mutex_trylock returns EBUSY when the mutex is locked. That's a normal informational status, not an "uncommon error". Similarly, pthread_cond_timedwait returns ETIMEDOUT as a normal informational status, not really an "error". There are plenty of "traditional" UNIX functions that are similar. It's certainly not universal, but "uncommon" is an overstatement. > d> Still, why propagate the arcane traditions, just because they > d> exist? > > Because they are traditions. I think there is some non-trivial value > in preserving interface consistency - principle of least surprise, and > all that - and 1003.1c violates this for no particularly good reason. Let's just say that the working group (a widely diverse and contentious bunch) and the balloting group (an even larger and more diverse group) were convinced that the reasons were "good enough". Arguing about it at this point serves no purpose. > d> Overloading return values with "-1 means look somewhere else for an > d> error" is silly. > > Sure, it's silly, but it's the standard way for library calls on Unix > systems to barf, and what 1003.1c does is force programmers to plant > yet another gratuitous red flag in their brains, with wording similar > to "hey! everything else works in such-and-such a way, but *this* is > *different*!". I have enough red flags planted in my brain at this > point that it resembles a pincushion, and I would gladly sacrifice a > few to ugliness if that ugliness were at least consistent. UNIX isn't even very consistent about that. Some return -1. Some have symbolic definitions that "are usually" -1 but needn't be (at least in terms of guaranteed portability and conformance). Some return NULL. Some set errno and some don't, requiring that YOU set errno before making the call if you care WHY it failed (e.g., sysconf). Hey, even if there was a "C" in "UNIX", it would most definitely NOT stand for "consistency". 
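
To make the two conventions concrete, here is a small sketch. The function names are invented for illustration; the library calls are the standard pthread_mutex_trylock and sysconf. The first function shows the POSIX.1c style, where the status is the return value and errno is not involved. The second shows the traditional style for a call like sysconf, where -1 is ambiguous unless the caller clears errno first.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

/* POSIX.1c style: try to lock a mutex this thread already holds and
 * hand back the status code that pthread_mutex_trylock() returns. */
int trylock_when_held(void)
{
    pthread_mutex_t m;
    int status;

    status = pthread_mutex_init(&m, NULL);   /* default (non-recursive) mutex */
    assert(status == 0);
    status = pthread_mutex_lock(&m);
    assert(status == 0);

    status = pthread_mutex_trylock(&m);      /* expected: EBUSY, an informational status */

    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return status;
}

/* Traditional style: sysconf() returns -1 both for "no limit" and for
 * "error", so the caller must clear errno first to tell them apart. */
int sysconf_really_failed(int name)
{
    long value;

    errno = 0;
    value = sysconf(name);
    return value == -1 && errno != 0;
}
```

Note that the first function never reads errno at all, which is the point of the 1003.1c convention.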
Adding threads to UNIX disturbed a lot of cherished traditions... far more than most people are willing to acknowledge until they stumble over the shards of the old landscape. There was good reason for this, though, and the net result is of substantial benefit to everyone. While the changes to errno may be one of the first differences people notice, "in the scheme of things", it's trivial. If it even raises your awareness that "something's different", maybe it'll save a few people from some bad mistakes, and to that extent it's valuable even merely as a psychological tool. Hey... count your mixed blessings, Bryan. I would have reported errors by raising exceptions, if there'd been any hope at all of getting that into POSIX. ;-) /---------------------------[ Dave Butenhof ]--------------------------\ | Digital Equipment Corporation [email protected] | | 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof | | Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----------------[ Better Living Through Concurrency ]----------------/ =================================TOP============= Q142: printf() anywhere AFTER pthread_create() crashes on HPUX 10.x >I've spend the last couple of days trying to track down an >annoying problem when running a multithreaded program on >HPUX build against DCE Threads... > >If I call printf anywhere AFTER pthread_create had been executed to start >a thread, when my application ends I get thrown out of my rlogin shell. We experienced a similar problem about a year ago. It was a csh bug, and a patch from HP fixed it. 
=================================TOP============= Q143: Pthreads and Linux Wolfram Gloger wrote: > You should always put `-lpthread' _after_ your source files: > > % gcc the_file.cpp -lpthread > > Antoni Gonzalez Ciria writes: > > > When I compile this code with gcc ( gcc -lpthread the_file.cpp) the > > program executes fine, but doing so with g++( g++ -lpthread the_file.cpp) > > the progran crashes, giving a Segmentation fault error. > > > > #include > > > > void main(){ > > FILE * the_file; > > char sBuffer[32]; > > > > the_file=fopen("/tmp/dummy","rb"); > > fread( sBuffer, 12, 1, the_file); > > fclose( the_file); > > > > } > > Using `g++' as the compiler driver always includes the `libg++' > library implicitly in the link. libg++ has a lot of problems, and is > no longer maintained (in particular, it has a global constructor > interfering with glibc2, if I remember correctly). If you really need > it, you must get a specially adapted version for Linux/egcs/gcc-2.8. > > If you don't need libg++, please use `c++' or `gcc' as your compiler > driver, and use the libstc++ library shipped with egcs or seperately > with gcc-2.8 (`c++' will link libstdc++ in implicitly). > > When I just tested your program with egcs-1.0.1 and glibc-2.0.6, it > crashed at first (it fails to check the fopen() result for being > NULL), but after creating a /tmp/dummy file it ran perfectly, even > after compiling it with `c++ the_file.cpp -lpthread'. > > Regards, > Wolfram. The -pthread option takes care of everything: adds -D_REENTRANT during the cpp pass, and adds -lpthread during the link-edit. This option has been around for a while. I'm not sure it's working for all ports. At least for the x86 AND glibc. You may want to take a look at the spec file (gcc -v) jms. =================================TOP============= Q144: DEC release/patch numbering It was after 4.0B, and 4.0C is just 4.0B with new hardware support. 
(If you install a "4.0C" kit on any hardware that doesn't need the new support, it will even announce itself as 4.0B.) Although this is not true of all components, DECthreads policy has been to keep 4.0 through 4.0C identical -- we have always submitted any patches to the 4.0 patch stream, and propagated the changes through the other patch streams, releasing "functionally identical" patches that are simply built in the appropriate stream's environment. (But note that all future patches for the 4.0 - 4.0C stream will be available only on 4.0A and later... 4.0 is no longer supported.) The changes are in source in 4.0D and later. /---------------------------[ Dave Butenhof ]--------------------------\ =================================TOP============= Q145: Pthreads (almost) on AS/400 Fred A. Kulack wrote: > Hi All. > You may or may not know, that new in the v4r2m0 release of OS/400 is support > for kernel threads. Most notably the support for threads is available via > native Java, but we've also implemented a Pthreads library that is 'based on' > Posix and Unix98 > Pthreads. Thank you very much for this update. I'm not likely to ever have occasion or need to use this, but I like to keep track of who's implemented threads (and how, and which threads). If nothing else, it provides me with more information to answer the many questions I get on threading. > The implementation claims no compliance because there are some differences > and we haven't implemented all of the APIs. We do however duplicate the > specification for the APIs that are provided, and we have quite a full set > of APIs. Yeah, I understand that one. Same deal on OpenVMS. Most of the APIs are there, and do more or less what you'd expect -- but VMS isn't POSIX, and some parts just don't fit. Congratulations on "doing your best". (At least, since you say "we", I'm making the liberal assumption that you bear some personal responsibility for this. 
;-) )

> Anyone who's interested can take a look at
> http://www.as400.ibm.com/developer/threads

=================================TOP=============
Q146: Can pthreads & UI threads interoperate in one application?

>Can Solaris pthread/UI thread (pthread_xxxx() versus thr_xxx())
>interoperate in one application ? Is solaris pthread implemented
>as user level threads ? I've read JNI book which says the thread
>model used in the native code must be interoperable with JVM thread
>model used. An example the book gives is that if the JVM using user
>level thread (Java green thread) and the native code using Solaris
>native thread, then it will have problem to interoperate. Does this
>apply to pthread & UI thread interoperability on Solaris, if pthread
>is kind of user level thread ?
>
>Also when people say Solaris native thread, does it mean the UI thread
>(thr_xxx() calls) only or does it also include Solaris pthread ?

Yes. They are built on the same underlying library. Indeed, many of the libraries you use every day are built using UI threads and they get linked into Pthreads programs all the time.

"Implemented at user level" isn't quite the right way of describing it. "Does the library use LWPs?" is the real question. Green threads don't, so you can't make JNI calls to pthreads or UI threads. Native threads do, and you can.

When folks say "Solaris native threads" they mean either pthreads or UI threads, NOT green threads.

For a more detailed discussion, see my *excellent* book on Java Threads: "Multithreaded Programming with Java".

-Bil

=================================TOP===============================
Q147: Thread create timings

Matthew Houseman writes:

Thought I'd throw this into the pyre. :) I ran the thread/process create stuff on a 166MHz Pentium (no pro, no mmx) under NT4 and Solaris x86 2.6:

NT spawn                  240s   24.0  ms/spawn
Solaris spawn (fork)      123s   12.3  ms/spawn  (incl. exec)
Solaris spawn (vfork)      95s    9.5  ms/spawn  (incl. exec)
Solaris fork               47s    4.7  ms/fork
Solaris vfork                     0.37 ms/vfork  (37s/100000)
NT thread create           12s    1.2  ms/create
Solaris thread create             0.11 ms/create (11s/100000)

As you can see, I tried both fork() and vfork(). When doing an immediate exec(), you'd normally use vfork(); when just forking, fork() is usually what you want to use (or have to use).

Note that I had to turn the number of creates up to 100000 for vfork and thread create to get better precision in the timings.

To remind you, here are greg's figures (on a Pentium MMX 200MHz):

>NT Spawner (spawnl):           120 Seconds (12.0 millisecond/spawn)
>Linux Spawner (fork+exec):      57 Seconds ( 6.0 millisecond/spawn)
>
>Linux Process Create (fork):    10 Seconds ( 1.0 millisecond/proc)
>
>NT Thread Create                 9 Seconds ( 0.9 millisecond/thread)
>Linux Thread Create              3 Seconds ( 0.3 millisecond/thread)

Just for fun, I tried the same thing on a 2 CPU 170MHz Ultrasparc. I leave it to someone else to figure out how much of this is due to the two CPUs... :)

Solaris spawn (fork)       84s    8.4  ms/spawn  (incl. exec)
Solaris spawn (vfork)      69s    6.9  ms/spawn  (incl. exec)
Solaris fork               21s    2.1  ms/fork
Solaris vfork                     0.17 ms/vfork  (17s/100000)
Solaris thread create             0.06 ms/create (6s/100000)

=================================TOP=============
Q148: Timing Multithreaded Programs (Solaris)

From: [email protected] (Richard Sullivan)

>I'm trying to time my multithreaded programs on Solaris with multiple
>processors. I want the real world running time as opposed to the total
>execution time of the program because I want to measure speedup versus
>sequential algorithms and how much faster the parallel program is for the user.

Bradly,

Here is what I wrote to solve this problem (for Solaris anyway). To use it just call iobench_start() after any setup that you don't want to measure. When you are done measuring call iobench_end(). When you want to see the statistics call iobench_report().
The output to stderr will look like this:

Process info:
  elapsed time  249.995
  CPU time      164.446
  user time     152.095
  system time   12.3507
  trap time     0.661235
  wait time     68.6506
  pfs    major/minor    3379/     0
  blocks input/output      0/     0

65.8% CPU usage

The iobench code is included in the program sources on: index.html.

=================================TOP=============
Q149: A program which monitors CPU usage?

> >Ok, I've tried some web searches and haven't found anything I like the
> >look of. What I'm after is a program which runs in the background and
> >monitors (primarily) CPU usage for our web server (an Ultra-1 running
> >Solaris 2.6). However, all the programs I've found are about 2 years
> >old and/or don't run on 2.6.
> >
> >I've seen top, but it doesn't really do what I want; I'd like to have
> >the output from the program as a %cpu usage for each hour (or some
> >other arbitrary time period) stored as a log file or, ideally, as a
> >graph (in some graphics format, eg, .gif).
>
> Sounds like what sar does, and it comes with 2.6 - to enable recording
> data for it, just uncomment the lines in /etc/init.d/perf and the
> crontab for the 'sys' account.

From what I've read on the product, sounds like 'spong' might be what you need. I've downloaded it, but haven't had time to install and set up yet. Try: http://strobe.weeg.uiowa.edu/~edhill/public/spong/

=================================TOP=============
Q150: standard library functions: whats safe and whats not?

From: [email protected] (W. Richard Stevens)
Subject: Re: standard library functions: whats safe and whats not?
Date: 17 Feb 1998 14:19:28 GMT

> 1. Which of the standard C library functions are thread-safe and
> which aren't? For example, I know that strtok() is un-safe, I can
> infer that from its functionality, but what about the thousands of
> other library calls? I don't want to examine each one individually
> and make guesses about thread safety.
>
> Is there a list somewhere of what's safe and whats not?
Page 32 of the 1996 Posix.1 standard says "All functions defined by Posix.1 and the C standard shall be thread-safe, except that the following functions need not be thread-safe:

    asctime()            getgrnam()           putc_unlocked()*
    ctime()              getlogin()           putchar_unlocked()*
    getc_unlocked()*     getpwnam()           rand()
    getchar_unlocked()*  getpwuid()           readdir()
    getgrgid()           gmtime()             strtok()
                         localtime()          ttyname()"

Note that thread-safe XXX_r() versions of the above are available, other than those with an asterisk. Also note that ctermid() and tmpnam() are only thread-safe if a nonnull pointer is used as an argument.

Rich Stevens

================

POSIX and ANSI C specify only a small part of the "traditional UNIX programming environment", though it's a start. The real danger in reading the POSIX list quoted by Rich is that most people don't really know what's included. While an inclusive list would be better than an exclusive list, that'd be awfully long and awkward.

The Open Group (OSF and X/Open) has extended the Single UNIX Specification (also known as "SPEC1170" for its 1,170 UNIX interfaces, or UNIX95) to include POSIX.1b-1993 realtime, POSIX.1c-1995 threads, and various extensions. It's called the Single UNIX Specification, Version 2; or UNIX98. Within this calendar year, it's safe to assume that most UNIX versions currently branded by The Open Group (as XPG3, UNIX93, UNIX95) will extend their brand validation to UNIX98.

The interfaces specification part of the Single UNIX Specification, Version 2 (known as XSH5), in section 2.8.2, "Thread-safety", specifies that all interfaces defined by THIS specification will be thread-safe, except for "the following". There are two explicit lists, and one implicit. One is the POSIX list already quoted by Rich Stevens. The second is an additional list of X/Open interfaces:

    basename       dbm_open     fcvt        getutxline    pututxline
    catgets        dbm_store    gamma       getw          setgrent
    dbm_clearerr   dirname      gcvt        l64a          setkey
    dbm_close      drand48      getdate     lgamma        setpwent
    dbm_delete     ecvt         getenv      lrand48       setutxent
    dbm_error      encrypt      getgrent    mrand48       strerror
    dbm_fetch      endgrent     getpwent    nl_langinfo
    dbm_firstkey   endpwent     getutxent   ptsname
    dbm_nextkey    endutxent    getutxid    putenv

The implicit list is a statement that all interfaces in the "Legacy" feature group need not be thread-safe. From another section, that list is:

    advance   gamma           putw      sbrk       wait3
    brk       getdtablesize   re_comp   sigstack
    chroot    getpagesize     re_exec   step
    compile   getpass         regcmp    ttyslot
    cuserid   getw            regex     valloc

    loc1      __loc1          loc2      locs

Obviously, this is still an exclusive list rather than inclusive. But then, if UNIX95 had 1,170 interfaces, and UNIX98 is bigger, an inclusive list would be rather awkward. (And don't expect ME to type it into the newsgroup!)

On the other hand... beware that if you've got a system that doesn't claim conformance to POSIX 1003.1c-1995 (or POSIX 1003.1-1996, which includes it), then you're not guaranteed to be able to rely even on the POSIX list, much less the X/Open list. It's reasonable to assume that any implementation's libpthread (or equivalent, though that name has become pretty much the de facto standard) is thread-safe. And it's probably reasonable to assume, unless specified otherwise, that "the most common" bits of libc are thread-safe. But without a formal statement of POSIX conformance, you're just dealing with "good will". And, even at that, POSIX conformance isn't validated -- so without validation by the UNIX98 branding test suite, you've got no real guarantee of anything.

/---------------------------[ Dave Butenhof ]--------------------------\

=================================TOP=============
Q151: Where are semaphores in POSIX threads?
David McCann wrote: > Jan Pechanec wrote: > > > > Hello, > > > > I have a summary of POSIX papers on threads, but there in no > > imformation about semaphores (just conditional vars, mutexes). *NO* > > pthread_semaphoreinit() etc. > > > > In some materials, there is information on sem_wait(), sem_send() (or > > sm. like that), but is it for threads (or just processes)? > > I think this whole discussion has digressed from Jan's original question > above. Yes, there are sem_* calls in Solaris 2.5 (and 2.4 IIRC); you > just need to link with -lposix4 or whatever to get them. But these are the > *POSIX.1b* semaphores, which are *process-based* semaphores. They have > nothing to do with threads. > > Now what Jan wants here is semaphore calls for *POSIX.1c*, i.e. POSIX > threads. Now, IIRC, the sem_* calls are NOT specified in POSIX.1c, but > rather their behaviour in MT programs has been clarified/refined in XPG5 > (Unix98) which allows you to use semaphores to synchronize threads and/or > processes, depending on how you use them. Not quite true. Yes, XSH5 (the "system interfaces" part of XPG5) says this; but it does so because POSIX 1003.1-1996 says so, not because it's added something to POSIX. In fact, POSIX 1003.1b semaphores were designed by the same working group that did 1003.1c, and while 1003.1b-1993 was approved and published first (and therefore couldn't mention threads), several aspects of 1003.1b were designed to work with threads. For example, there are "realtime" extended versions of the 1003.1c sigwait functions (sigtimedwait and sigwaitinfo). (The interfaces are slightly incompatible because they return -1 and set errno on errors, rather than returning an error code: that's because 1003.1c removed the use of errno AFTER 1003.1b was finalized.) Additionally, the sem_init function was designed with a parameter corresponding to the "pshared" attribute of mutexes and condition variables. For 1003.1b, the only supported value was 1, meaning "process shared". 
1003.1c amended the sem_init description to specify in addition that the value 0 meant "process private", for use only between threads within the process. (But also note that it's perfectly reasonable to create a "process shared" semaphore and then use it only between threads within the process -- it may be less efficient on some implementations, but it does the same thing.)

> Solaris 2.5 is not Unix98-conformant; the confusion arises because it
> *does* appear to be compliant with POSIX.1b and POSIX.1c (somebody at Sun can
> surely verify this). From what's been said here, I assume 2.6 is either
> Unix98-compliant, or at least contains the MT extensions to POSIX of Unix98.

Solaris 2.5 supports (most of) 1003.1b and 1003.1c, although there were a few omissions and a few interpretation errors. (Like any other implementation.) This, however, is not one of them. Solaris 2.5 does NOT define _POSIX_SEMAPHORES in <unistd.h>, which is the way an implementation should advertise support for POSIX semaphores. Therefore, while it may not implement all capabilities described by 1003.1b and 1003.1c, it doesn't (in this case, anyway) violate the standard. If you're using POSIX semaphores (even if they seem to work) on Solaris 2.5, then your application is not "strictly conforming", and if you're subject to any incompatibilities or porting problems, that's your fault, not the fault of Solaris. IT says they're not there.

(And, yes, POSIX specifically allows an implementation to provide interfaces while claiming they're not there; and if it does so, it's not obligated to provide strict conformance to the standard's description of those interfaces. This is what Solaris 2.5 should have done, also, with the _POSIX_THREAD_PRIORITY_SCHEDULING option, since it doesn't completely implement those interfaces.)

Presumably, Solaris 2.6 (though I don't have a system handy to check) DOES define _POSIX_SEMAPHORES.
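
As a rough sketch of what that check looks like in practice: the code below tests the _POSIX_SEMAPHORES option macro at compile time and, separately, creates a process-private semaphore by passing 0 as the pshared argument of sem_init. The two function names are made up for illustration, and the sketch assumes an implementation whose semaphore.h at least declares the sem_* calls.

```c
#include <semaphore.h>
#include <unistd.h>

/* Returns 1 if the implementation advertises POSIX semaphores through
 * the _POSIX_SEMAPHORES option macro from <unistd.h>, otherwise 0. */
int semaphores_advertised(void)
{
#if defined(_POSIX_SEMAPHORES) && _POSIX_SEMAPHORES > 0
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: create a process-private semaphore (pshared = 0),
 * post it once, wait on it once, and return the final count.
 * Returns -1 if any of the semaphore calls fails. */
int private_semaphore_roundtrip(void)
{
    sem_t s;
    int value = -1;

    if (sem_init(&s, 0, 0) != 0)        /* pshared 0: threads of this process only */
        return -1;

    if (sem_post(&s) != 0 ||            /* count goes 0 -> 1 */
        sem_wait(&s) != 0 ||            /* count goes 1 -> 0, does not block */
        sem_getvalue(&s, &value) != 0)
        value = -1;

    sem_destroy(&s);
    return value;
}
```

A strictly conforming program would call private_semaphore_roundtrip only where semaphores_advertised returns 1; on a system like the Solaris 2.5 described above, the calls may exist and still not behave as the standard specifies.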
> At any rate, you can't use the sem_* calls for thread synchronization in > 2.5; you get ENOSYS in MT programs. I know, I've tried it (on 2.5.1). > AFAIK, single-threaded programs linked with -lposix4 work fine, but as I > said above, they're only for process-based semaphores. So if you want to use the > > sem_* calls for thread-synchronization on Solaris, you have to go to 2.6. First off, other replies have indicated that it's actually libthread, not libposix4, that provides "working" (though not complete) POSIX semaphores. Most likely, these semaphores would work with the "pshared" parameter set to either 0 (process) or 1 (cross-process). However, in any case, if you've got something that can synchronize between processes, you should expect that it can synchronize between threads as well, though there may be alternatives that are more efficient on some implementations. (E.g., a pshared mutex will usually be more expensive to lock or unlock than a private mutex.) (Such a difference in efficiency is less likely for semaphores, since POSIX already requires that sem_post be async-signal safe, which means it's far better and easier to keep the implementation inside the kernel regardless of the pshared argument.) /---------------------------[ Dave Butenhof ]--------------------------\ | Digital Equipment Corporation [email protected] | | 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof | | Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----------------[ Better Living Through Concurrency ]----------------/ =================================TOP============= Q152: Thread & sproc (on IRIX) In article <[email protected]>, Yann Boniface wrote: >I'm having trouble while using threads and processus on a massive >parallel machine (SGI). >The processus creation is OK (sproc (myFunction, PR_SADDR, arg)) as long >as I don't use pthread library. 
If I compile the program with the flag
>-lpthread, processus creation didn't work any more, even if I don't
>explicitly use thread functions (errno is then ENOTSUP)

You shouldn't mix pthreads and sprocs. You should stick with one or the other (IMHO pthreads are preferable).

--
Planet Bog -- pools of toxic chemicals bubble under a choking atmosphere of poisonous gases... but aside from that, it's not much like Earth.

=================================TOP=============
Q153: C++ Exceptions in Multi-threaded Solaris Process

Jeff Gomsi writes:
> We are running a large multi-threaded C++ (C++ 4.2 patch
> 104631-03) application under Solaris (SunOS 5.5.1
> Generic_103640-14) on a 14 processor Ultra-Enterprise and
> observe the following problem.
>
> The application runs fine single-threaded, but when run
> multi-threaded, throwing a C++ exception can (evidently)
> cause memory corruption which leads to a SIGSEGV core
> dump. A diagnostic version of the new operator seems to
> reveal that C++ is unwinding things improperly and possibly
> calling destructors which should not be called.
>
> Does anyone have any ideas on this?

The last time I looked at the patch list for C++ 4.2, I noticed a mention of a bug stating that exceptions were not thread safe. There was no further description of this bug that I could find. However, it supposedly is addressed by one of the later patches. Try upgrading your patch to -04 or -05....

- Chris

Make sure you have the libC patch 101242-13.

=================================TOP=============
Q154: SCHED_FIFO threads without root privileges ?

Laurent Deniel wrote:
> Hi,
>
> Is there a way to create threads that have the SCHED_FIFO scheduling
> without root privileges (in system contention scope) ? by for instance
> changing a kernel parameter (Digital UNIX 4.0B & D or AIX 4.2) ?
>
> Thanks in advance,

In Digital UNIX 4.0, using process contention scope (PCS) threads, any thread can set FIFO policy; further, it can set any priority. Why?
Because the policies and priorities for PCS threads affect only the threads in the containing process. PCS FIFO/63 threads are really important in relation to other PCS threads in the process, but have no influence on the scheduling of other threads in other processes. That aspect is controlled by the policies and priorities of the kernel scheduling entities (VPs -- virtual processors) underlying the PCS threads, and those characteristics are unaffected by the POSIX scheduling interfaces.

On V4.0D, newly released, system contention scope (SCS) threads are supported. Their policies and priorities are by definition seen by the kernel scheduler and are therefore subject to privilege restrictions. In short, you can set SCS threads to FIFO or RR policy without privilege on V4.0D, but FIFO threads cannot exceed POSIX prio 18 and RR threads cannot exceed 19. Regardless of this "limitation," it gives you plenty of rope to hang yourself with!

__________________________________________________
Jeff Denham ([email protected])

=================================TOP=============
Q155: "lock-free synchronization"

> I recently came across a reference to "lock-free synchronization" (in
> Taligent's Guide to Designing Programs.) This document referred to
> research that was looking at using primitive atomic operations to build more
> complex structures in ways that did not require locking.
>
> I'm interested in exploring this topic further and would be grateful if
> anyone could supply references.
>
> Regards,
> Daniel Parker

Check out the following references --

M. Herlihy, "Wait-free Synchronization," ACM Transactions on Programming Languages and Systems, Vol 13, No 1, 1991, pp. 124-149.

M. Herlihy, "A Methodology for Implementing Highly Concurrent Data Objects," same journal as above, Vol 15, No. 5, 1993, pp. 745-770.

They should provide a starting point.
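For readers who want to see what "using primitive atomic operations" can look like in practice, here is a minimal sketch of a lock-free counter built on compare-and-swap. It relies on the C11 <stdatomic.h> interface, which is newer than this FAQ and is an assumption of the example, not something the posters used. The names counter, lockfree_add, worker and run_lockfree_demo are invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 4
#define NITER    10000

static atomic_long counter;    /* shared, never protected by a mutex */

/* Add delta to *target with a compare-and-swap retry loop. If another
   thread changed the value first, the exchange fails, 'seen' is
   refreshed with the current value, and we simply try again. */
static void lockfree_add(atomic_long *target, long delta)
{
    long seen = atomic_load(target);
    while (!atomic_compare_exchange_weak(target, &seen, seen + delta))
        ;   /* retry with the refreshed value in 'seen' */
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITER; i++)
        lockfree_add(&counter, 1);
    return NULL;
}

/* Runs NTHREADS workers and returns the final counter value. */
long run_lockfree_demo(void)
{
    pthread_t tid[NTHREADS];
    int rc;

    atomic_store(&counter, 0);
    for (int i = 0; i < NTHREADS; i++) {
        rc = pthread_create(&tid[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; i++) {
        rc = pthread_join(tid[i], NULL);
        assert(rc == 0);
    }
    return atomic_load(&counter);
}
```

No thread ever blocks holding a lock here, but an individual thread may retry an unbounded number of times under contention, so this counter is lock-free rather than wait-free in Herlihy's terminology.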
=================================TOP============= Q156: Changing single bytes without a mutex Tim Beckmann wrote: > David Holmes wrote: > > > > I thought about this after posting. An architecture such as Bil describes > > which requires non-atomic read/mask/write sequences to update variables of > > a smaller size than the natural word size, would be a multi-threading > > nightmare. As you note above two adjacent byte values would need a common > > mutex to protect access to them and this applies even if they were each > > used by only a single thread! On such a system I'd only want to program > > with a thread-aware language/compiler/run-time. > > > > David > > David, > > My thoughts exactly! > > Does anyone know of a mainstream architecture that does this sort of > thing? Oh, absolutely. SPARC, MIPS, and Alpha, for starters. I'll bet most other RISC systems do it, too, because it substantially simplifies the memory subsystem logic. And, after all, the whole point of RISC is that simplicity means speed. If you stick to int or long, you'll probably be safe. If you use anything smaller, be sure they're not allocated next to each other unless they're under the same lock. I wrote a long post on most of the issues brought up in this thread, which appears somewhere down the list due to the whims of news feeds, but I got interrupted and forgot to address this issue. If you've got pthread_mutex_t mutexA = PTHREAD_MUTEX_INITIALIZER; pthread_mutex_t mutexB = PTHREAD_MUTEX_INITIALIZER; char dataA; char dataB; And one thread locks mutexA and writes dataA while another locks mutexB and writes dataB, you risk word tearing, and incorrect results. That's a "platform issue", that, as someone else commented, POSIX doesn't (and can't) address. What do you do? I always advise that you keep a mutex and the data it protects closely associated. As well as making the code easier to understand, it also addresses problems like this. 
If the declarations were:

    typedef struct dataTag {
        pthread_mutex_t mutex;
        char data;
    } data_t;

    data_t dataA = {PTHREAD_MUTEX_INITIALIZER, 0};
    data_t dataB = {PTHREAD_MUTEX_INITIALIZER, 1};

You can now pretty much count on having the two data elements allocated in separate "memory access chunks". Not an absolute guarantee, since a pthread_mutex_t might be a char as well, and some C compilers might not align structures on natural memory boundaries. But most compilers on machines that care WILL align/pad structures to fit the natural data size, unless you override it with a switch or pragma (which is generally a bad idea even when it's possible). And, additionally, a pthread_mutex_t is unlikely to be less than an int, and is likely at least a couple of longs. (On Digital UNIX, for example, a pthread_mutex_t is 48 bytes, and on Solaris it's 24 bytes.)

There are, of course, no absolute guarantees. If you want to be safe and portable, you might do well to have a config header that typedefs "smallest_safe_data_unit_t" to whatever's appropriate for the platform. Then it's just a quick trip to the hardware reference manual when you start a port. On a CISC, you can probably use "char". On most RISC systems, you should use "int" or "long".

Yes, this is one more complication to the process of threading old code. But then, it's nothing compared to figuring out which data is shared and which is private, and then getting the locking protocols right.

/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation                      [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

> If I'm not mistaken, isn't that spelled:
>
> #include <signal.h>
>
> typedef sig_atomic_t smallest_safe_data_unit_t;

You are not mistaken, and thank you very much for pointing that out.
While I'd been aware at some point of the existence of that type, it was far from the top of my mind. If you have data that you intend to share without explicit synchronization, you should be safe in using sig_atomic_t. Additionally, using sig_atomic_t will protect you against word tearing in adjacent data protected by separate mutexes. There are additional performance considerations, such as "false sharing" effects in cache systems, that might dictate larger separations between two shared pieces of data: but those won't affect program CORRECTNESS, and are therefore more a matter of tuning for optimal performance on some particular platform. =================================TOP============= Q157: Mixing threaded/non-threadsafe shared libraries on Digital Unix claude vallee wrote: > Hi All. I have a question on building a mutli-threaded process (on > Digital Unix 4.0) which is linked with non thread safe shared libraries. > > Let's say: > > mymain.c has no calls to thread functions and none of its functions runs > in a secondary thread. I will compile this file with the -pthread > option. (I call secondary thread any but the main thread) > > liba.so contains non thread safe functions, but I know for a fact that > none of its functions will run in a secondary thread. This library was > not built using the -pthread option. > > libb.so is my multi-thread library. It creates threads and its > functions are thread safe or thread reentrant. All of its code was > compiled with the -pthread option. All the code executing in a > secondary thread is in this library. > > The questions are: > > 1. Will this work? If liba.so was not built with threads options, is it > all right if it runs only in the main thread? Which c runtime library > will be used at run time? libc or libc_r? On Digital UNIX 4.0D, this should be OK. On earlier versions, you need to be careful around use of errno. 
For various historical reasons I won't try to explain (much less justify), setting of errno through libc (using the "seterrno()" function or equivalent) would previously result in setting the process global errno ("extern int errno;"), not just the per-thread errno of the calling thread. For 4.0D, I was able to change the code so that seterrno() always sets the calling thread's errno, and also sets the global errno ONLY if called from the initial ("main") thread of the process. With this change, it's safe (at least as far as errno use) to run non-thread-aware libraries as long as you use them only in the initial thread.

To make this clear, prior to 4.0D, your liba code running in the main thread may see errno change at random. As long as liba doesn't read errno, this shouldn't be a problem. You do have to be aware of whether liba SETS the global errno -- because your thread-safe code won't see the global errno through any normal mechanisms.

> 2. I noticed that on my DU 4.0, the libc.so and libc_r.so are
> identical!! I assume this means that I am always using the thread safe
> version of the libc library. Is that correct?

Yes -- libc_r was another historical oddity. (Due to OSF/1 rules.) It no longer exists, as of Digital UNIX 4.0. The (hard) link provides binary compatibility for older programs that still reference libc_r.

> 3. What does -pthread do to my code? I saw that my objects are
> different (in size anyway), and that my executable point to libmach and
> libpthread. What is added to the object code?

There are two basic contributions of "-pthread":

* At compile-time, the definition -D_REENTRANT is provided
* At link-time, the proper libraries are added, to the end of the actual list of libraries, but immediately before the implicit reference to libc that's generated by the compile driver.

Additionally, -pthread causes the linker to search for "reentrant" versions of any library you specify.
(E.g., if you say "-lfoo" and there's a libfoo_r in your -L path, the linker will automatically use it.)

The primary effect of -D_REENTRANT is to change <errno.h> -- references to errno make a call into the thread library to get the thread's private errno address rather than the global errno. There are some other changes to various headers, though.

> 4. Does defining _THREAD_SAFE when compiling and linking with
> libpthread, libmach and libc_r equivalent to building with the -pthread
> option?

No, _THREAD_SAFE doesn't do anything. It's considered obsolete. You should use _REENTRANT. (Though I actually prefer the former term, I've never felt it was worth arguing, or making anyone change a ton of header files.)

> I did some tests, and everything works well... for the moment, but IMHO,
> it does not mean anything. Everyone knows that non thread safe code
> will work perfectly fine until your demo ;-)

Depends. If the demo is a critical requirement for a multi-million dollar sale, then, yeah, it can probably hurt you worst by failing then. Otherwise, though, it'll have a lot more fun by SUCCEEDING at the demo, and failing when the customer runs the code in their mission-critical environment. This is a corollary to a corollary to Murphy's Law, which stated something about the inherent perversity of inanimate objects...

Oh... and since liba is, presumably, a third-party library over which you've got no direct control... you should tell them immediately that you're running their code in a threaded application, and it would be to their long-term benefit to build a proper thread-safe version before you find another option. If liba is YOUR code, then please don't play this game: build it at least with -D_REENTRANT.

/---------------------------[ Dave Butenhof ]--------------------------\

=================================TOP=============
Q158: VOLATILE instead of mutexes?

> What about exception handlers ? I've always thought that when you had
> code like:
>
> int i;
>
> TRY
> {
>
> . .
. > proc(); > } > CATCH_ALL > { > if (i > 0) > { > . . . > } > . . . > } > > that you needed to declare "i" to be volatile least the code in the > catch block assume that "i" was stored in some register the contents > of which were overwritten by the call to "proc" (and not restored by > whatever mechanism was used to throw the exception). Since neither ANSI C nor POSIX has any concept remotely resembling "exceptions", this is all rather moot in the context of our general discussion, isn't it? I mean, it's got nothing to do with sharing data between threads -- and that's what I thought we were talking about. But sure, OK, let's digress. Since there's no standard covering the behavior of anything that uses exceptions, (at least, not if you use them from C, or even if you use the DCE exception syntax you've chosen from C++), there's no portable behavior. Your fate is in the hands of the whimsical (and hypothetically malicious ;-) ) implementation. This situation might lead a cautious programmer to be unusually careful when treading in these waters, and to wear boots with thick soles. (One might also say that it could depend on exactly HOW you "foodle with i", but I'll choose to disregard an entire spectrum of mostly amusing digressions down that fork.) Should you use volatile in this case? Sure, why not? It might not be necessary on many platforms. It might destroy your performance on any platform. And, where it is necessary, it might not do what you want. But yeah, what the heck -- use it anyway. It's more likely (by some small margin) to save you than kill you. Or, even better... don't code cases like this! /---------------------------[ Dave Butenhof ]--------------------------\ =================================TOP============= Q159: After pthread_cancel() destructors for local object do not get called?! > Hello, > I've run into a trouble when I found out that when I cancel a thread via > pthread_cancel() than destructors for local object do not get called. 
> Surprising :). But how to deal with this? With a simple thread code > it would not be a big problem, but in my case it's fairly complex code, > quite a few STL classes etc. Has someone dealt with such problem and is > willing to share his/her soltution with me ? I thought I could 'cancel' > thread via pthread_kill() and raise an exception within a signal handler > but it's probably NOT very good idead, is it?;) > Thank you, > Ales Pour > > Linux, egcs-1.0.3, glibc-2.0.7 with LinuxThreads Ales, Unfortunately, not surprising. C++ has not formally decided what to do with thread cancellation, so it becomes compiler-specific. The Sun compiler (for example) will run local object destructors upon pthread_exit() (hence cancellation also). Others may not. I suppose the best GENERAL C++ solution is: a) Don't use stack-allocated objects. b) Don't use cancellation. Otherwise you can simply insist on a C++ compiler that runs the destructors. -Bil =================================TOP============= Q160: No pthread_exit() in Java. > In POSIX, we have pthread_exit() to exit a thread. In Java we > *had* Thread.stop(), but now that's gone. Q: What's the best way > to accomplish this? > > I can (a) arrange for all the functions on the call stack to > return, all the way up to the top, finally returning from the > top-level function. I can (b) throw some special exception I > build for the purpose, TimeForThreadToExitException, up to the > top-level function. I can throw ThreadDeath. > > But what I really want is thread.exit(). > > Thoughts? > > -Bil > -- > ================ > Bil LambdaCS.com > > http://www.LambdaCS.com > Lambda Computer Science > 555 Bryant St. #194 > Palo Alto, CA, > 94301 > > Phone/FAX: (650) 328-8952 > Here's a real quick reply (from a slow connecction from Sydney AU (yes, visiting David among other things)). I'll send something more thorough later.... 
Throwing ThreadDeath yourself is a pretty good way to force the current thread to exit if you are sure it is in a state where it makes sense to do this.

But if you mean, how to stop other threads: This is one reason why they are extremely unlikely to actually remove Thread.stop(). The next best thing to do is to take some action that is guaranteed to cause the thread to hit a runtime exception. Possibilities range from the well-reasoned -- write a special SecurityManager that denies all resource-checked actions, to the sleazy -- like nulling out a pointer or closing a stream that you know the thread needs. See http://gee.cs.oswego.edu/dl/cpj/cancel.html for a discussion of some other alternatives.

-Doug

ThreadDeath is an Error (not a checked Exception, since apps routinely catch all checked Exceptions) which has just the semantics you are talking about: it is a Throwable that means "this thread should die". If you catch it (because you have cleanup to do), you are SUPPOSED to rethrow it. 1.2 only, though, I think. Thread.stop() uses it, but although stop() is deprecated, it appears that ThreadDeath is not. I think. :^)

Nicholas

There is *nothing* special about a ThreadDeath object. It does not mean "this thread should die" but rather it indicates that "this thread has been asked to die". The only reason it "should" be rethrown is that if you don't then the thread doesn't actually terminate. This has always been documented as such and is not specific to 1.2.

If a thread decides that for some reason it cannot continue with its work then it can simply throw new ThreadDeath() rather than calling stop() on itself. The only difference is that with stop() the Thread is immediately marked as no longer alive - which is a bug in itself.

Cheers,
David

=================================TOP=============
Q161: Is there any way I can make my stacks red zone protected?

Allocate your stack segments using mmap.
Use mprotect to make the page after the bottom of your stack read-only (I'm assuming the stack grows down on whatever system you're using), or leave a hole in your address space. If you get a segfault due to an attempted write at the top of a red zone, map in some more stack and build a new red zone. =================================TOP============= Q162: Cache Architectures, Word Tearing, and VOLATILE Tim Beckmann wrote: > Dave Butenhof wrote: > > > David, > > > > > > My thoughts exactly! > > > > > > Does anyone know of a mainstream architecture that does this sort of > > > thing? > > > > Oh, absolutely. SPARC, MIPS, and Alpha, for starters. I'll bet most other RISC > > systems do it, too, because it substantially simplifies the memory subsystem > > logic. And, after all, the whole point of RISC is that simplicity means speed. > > MIPS I know :) The latest MIPS processors R10K and R5K are byte addressable. > The whole point of RISC is simplicity of hardware, but if it makes the software > more complex it isn't worth it :) The whole idea of RISC is *exactly* to make software more complex. That is, by simplifying the hardware, hardware designers can produce more stable designs that can be produced more quickly and with more advanced technology to result in faster hardware. The cost of this is more complicated software. Most of the complexity is hidden by the compiler -- but you can't necessarily hide everything. Remember that POSIX took advantage of some loopholes in the ANSI C specification around external calls to proclaim that you can do threaded programming in C without requiring expensive and awkward hacks like "volatile". Still, the interpretation of ANSI C semantics is stretched to the limit. The situation would be far better if a future version of ANSI C (and C++) *did* explicitly recognize the requirements of threaded programming. > > If you stick to int or long, you'll probably be safe. 
If you use anything > > smaller, be sure they're not allocated next to each other unless they're under > > the same lock. > > Actually, you can be pretty sure that a compiler will split two declarations > like: > char dataA; > char dataB; > to be in two separate natural machine words. It is much faster and easier for > those RISC processors to digest. However if you declare something as: While that's certainly possible, that's just a compiler optimization strategy. You shouldn't rely on it unless you know FOR SURE that YOUR compiler does this. > char data[2]; /* or more than 2 */ > you have to be VERY concerned with the effects of word tearing since the > compiler will certainly pack them into a single word. Yes, this packing is required. You've declared an array of "char" sized data, so each array element had better be allocated exactly 1 char. > > I wrote a long post on most of the issues brought up in this thread, which > > appears somewhere down the list due to the whims of news feeds, but I got > > interrupted and forgot to address this issue. > > Yep, I saw it. It was helpful. So was the later post by someone else who > included a link to a DEC alpha document that explained what a memory barrier > was in this context. I've seen three different definitions over the years. > The definition you described in your previous post agreed with the DEC alpha > description... That a memory barrier basically doesn't allow out of order > memory accesses to cross the barrier. A very important issue if you are > implementing mutexes or semaphores :)[...] > > However, I really believe that dataA and dataB should both be declared as > "volatile" to prevent the compiler from being too aggressive on it's > optimization. The mutex still doesn't guarantee that the compiler hasn't > cached the data in an internal register across a function call. My memory > isn't perfect, but I do think this bit me on IRIX. 
The existence of the mutex doesn't require this, but the semantics of POSIX and ANSI C do require it. Remember that you lock a mutex by calling a function, passing an address. While an extraordinarily aggressive C compiler with a global analyzer might be able to determine reliably that there's no way that call could access the data you're trying to protect, such a compiler is unlikely -- and, if it existed, it would simply violate POSIX 1003.1-1996, failing to support threads. You do NOT need volatile for threaded programming. You do need it when you share data between "main code" and signal handlers, or when sharing hardware registers with a device. In certain restricted situations, it MIGHT help when sharing unsynchronized data between threads (but don't count on it -- the semantics of "volatile" are too fuzzy). If you need volatile to share data, protected by POSIX synchronization objects, between threads, then your implementation is busted. > > There are, of course, no absolute guarantees. If you want to be safe and > > portable, you might do well to have a config header that typedefs > > "smallest_safe_data_unit_t" to whatever's appropriate for the platform. Then > > it's just a quick trip to the hardware reference manual when you start a port. > > On a CISC, you can probably use "char". On most RISC systems, you should use > > "int" or "long". > > There never are guarantees are there :) To reiterate again one more time, ( ;-) ), the correct (ANSI C) portable type for atomic access is sig_atomic_t. > > Yes, this is one more complication to the process of threading old code. But > > then, it's nothing compared to figuring out which data is shared and which is > > private, and then getting the locking protocols right. > > But what fun would it be if it wasn't a challenge :) Well, yeah. That's my definition of "fun". But not everyone's. Sometimes "boring and predictable" can be quite comforting. 
> However, I would like to revist the original topic of whether it is "safe" to > change a single byte without a mutex. Although, instead of "byte" I'd like to > say "natural machine word" to eliminate the word tearing and non-atomic memory > access concerns. I'm not sure it's safe to go back to the original topic, but > what the heck ;) sig_atomic_t. > If you stick to a "natural machine word" that is declared as "volatile", > you do not absolutely need a mutex (in fact I've done it). Of course, there are > only certain cases where this works and shouldn't be done unless you really know > your hardware architecture and what you're doing! If you have a machine with a > lot of processors, unnecessarily locking mutexes can really kill parallelism. > > I'll give one example where this might be used: > > volatile int stop_flag = 0; /* assuming an int is atomic */ > > thread_1 > { > /* bunch of code */ > > if some condition exists such that we wish to stop thread_2 > stop_flag = 1; > > /* more code - or not :) */ > } > > thread_2 > { > while(1) > { > /* check if thread should stop */ > if (stop_flag) > break; > > /* do whatever is going on in this loop */ > } > } > > Of course, this assumes the hardware has some sort of cache coherency > mechanism. But I don't believe POSIX mutex's or memory barriers (as > defined for the DEC alpha) have any impact on cache coherency. If a machine has a cache, and has no mechanism for cache coherency, then it can't work as a multiprocessor. > The example is simplistic, but it should work on a vast majority of > systems. In fact the stop_flag could just as easily be a counter > of some sort as long as only one thread is modifying the counter... In some cases, yes, you can do this. But, especially with your "stop_flag", remember that, if you fail to use a mutex (or other POSIX-guaranteed memory coherence operation), a thread seeing stop_flag set CANNOT assume anything about other program state. 
Nor can you ensure that any thread will see the changed value of stop_flag in any particular bounded time -- because you've done nothing to ensure memory ordering, or coherency. And remember very carefully that bit about "as long as only one thread is modifying". You cannot assume that "volatile" will ever help you if two threads might modify the counter at the same time. On a RISC machine, "modify" still means load, modify, and store, and that's not atomic. You need special instructions to protect atomicity across that sequence (e.g., load-lock/store-conditional, or compare-and-swap). Am I trying to scare you? Yeah, sure, why not? If you really feel the need to do something like this, do yourself (and your project) the courtesy of being EXTREMELY frightened about it. Document it in extreme and deadly detail, and write that documentation as if you were competing with Stephen King for "best horror story of the year". I mean to the point that if someone takes over the project from you, and doesn't COMPLETELY understand the implications, they'll be so terrified of the risk that they'll rip out your optimizations and use real synchronization. Because this is just too dangerous to use without full understanding. There are ways to ensure memory ordering and coherency without using any POSIX synchronization mechanisms, on any machine that's capable of supporting POSIX semantics. It's just that you need to be really, really careful, and you need to be aware that you're writing extremely machine-specific (and therefore inherently non-portable) code. Some of this is "more portable" than others, but even the "fairly portable" variants (like your stop_flag) are subject to a wide range of risks. You need to be aware of them, and willing to accept them. Those who aren't willing to accept those risks, or don't feel inclined to study and fully understand the implications of each new platform to which they might wish to port, should stick with mutexes. 
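For those who take the advice to stick with mutexes, the stop_flag idea can be written with "real synchronization" roughly like this. It is a minimal sketch, not code from the thread: stop_lock, work_done, worker and run_stop_flag_demo are names invented for the illustration.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t stop_lock = PTHREAD_MUTEX_INITIALIZER;
static int  stop_flag = 0;    /* plain int, no volatile: the mutex    */
static long work_done = 0;    /* makes both variables visible together */

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        int stop;

        pthread_mutex_lock(&stop_lock);
        stop = stop_flag;         /* read the flag under the lock */
        if (!stop)
            work_done++;          /* stand-in for one unit of real work */
        pthread_mutex_unlock(&stop_lock);

        if (stop)
            break;
    }
    return NULL;
}

/* Starts the worker, asks it to stop, and returns 1 once it has
   actually terminated. */
int run_stop_flag_demo(void)
{
    pthread_t tid;
    int rc;

    rc = pthread_create(&tid, NULL, worker, NULL);
    assert(rc == 0);

    pthread_mutex_lock(&stop_lock);
    stop_flag = 1;                /* request the stop under the lock */
    pthread_mutex_unlock(&stop_lock);

    rc = pthread_join(tid, NULL); /* returns only after the worker exits */
    assert(rc == 0);
    return stop_flag;
}
```

Because the flag is always read and written under stop_lock, a thread that sees it set may also rely on any other state that was updated under the same lock, which is exactly what the unsynchronized version cannot promise.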
/---------------------------[ Dave Butenhof ]--------------------------\ | Digital Equipment Corporation [email protected] | | 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof | | Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----------------[ Better Living Through Concurrency ]----------------/ =================================TOP============= Q163: Can ps display thread names? > Is there a way to display the name of a thread (specified thanks > to the function pthread_setname_np) in commands such a ps ? > (in order to quickly see the behavior of a well-known thread). > If it is not possible with Digital UNIX's ps, someone may > have hacked some interesting similar utilities that display > such thread informations ? The ps command is a utility to show system information, and this would be getting into an entirely different level of process information. It would, arguably, be "inappropriate" to do this in ps. In any case, the decision was made long ago to not do as you suggest. The easiest way to get this information is to attach to the process with ladebug (or another suitable user-level-thread-enabled debugger) and ask for the information. (E.g., ladebug's "show thread" command.) While one certainly could create a standalone utility, you'd need to find the binary associated with the process, look up symbols, use /proc (or equivalent) to access process memory, and so forth -- sounds a lot like a debugger, doesn't it? The mechanism used to access this information is in libpthreaddebug.so. As of 4.0D, the associated header file, is available on the standard OS kit (with the other development header files). Although it's not externally documented, it's heavily commented, and reasonably self-describing. =================================TOP============= Q164: (Not!) Blocking on select() in user-space pthreads. 
Subject: Re: Blocking on select() in user-space pthreads under HP/UX 10.20

David Woodhouse wrote:

> HP/UX 10.20 pthreads are implemented in user space as opposed to
> kernel. I've heard rumors that a user-space thread that blocks on
> select() actually blocks all other threads within that process (ie the
> entire process). True or false?

The answer is an absolute, definite, unambiguous... maybe.

Or, to put it another way... the answer is true AND false.

However, being in a generous (though slightly offbeat) mood today, I'll go a little further and explain the answer. (Remember how the mice built Deep Thought to find the answer to "Life, the Universe, and Everything", and it came back with the answer "42", so they had to go off and build an entirely different computer to find the question, which was "what is 9 times 6", resulting in a third attempt, this time to find out what the question and answer MEANT?)

Anyway, any blocking kernel call, including select, will indeed block the process. However, if built correctly, a DCE thread (that's the origin of the thread library on 10.20) application will never directly call select. Instead, its calls will be redirected to the user-mode thread library's "thread-synchronous I/O" package. This package will attempt a NON-blocking select, and, if it would have needed to block (none of the files are currently ready), the thread will be blocked on a condition variable "until further notice". At various points, the library polls (with select) and awakens any thread waiting for a file that's now ready. When all application threads are blocked somewhere, the thread library blocks the process in select, with a mask that ORs all the masks for which any thread is waiting, and with a timeout representing the next user-mode timer (pthread_cond_timedwait, select, whatever).
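The "NON-blocking select" mentioned above is the ordinary select() call given a zero timeout, so it reports readiness and returns at once. The sketch below shows only that polling step. It is not the DCE threads library's actual code, and the helper names fd_ready_now and poll_pipe_demo are hypothetical.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd is readable right now, 0 if a
 * read would block, -1 on error. A zero timeout turns select() into a
 * poll that never blocks the calling process. */
int fd_ready_now(int fd)
{
    fd_set readfds;
    struct timeval no_wait = { 0, 0 };

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    int n = select(fd + 1, &readfds, NULL, NULL, &no_wait);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}

/* Demo on a pipe: not ready while empty, ready once a byte is written.
 * Returns 0 if both observations match that expectation. */
int poll_pipe_demo(void)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;
    int before = fd_ready_now(fds[0]);      /* pipe is empty */
    ssize_t written = write(fds[1], "x", 1);
    assert(written == 1);
    int after = fd_ready_now(fds[0]);       /* one byte is waiting */
    close(fds[0]);
    close(fds[1]);
    return (before == 0 && after == 1) ? 0 : -1;
}
```

A user-mode thread library in the style described would park the calling thread on a condition variable when the poll returns 0, and retry the poll later on behalf of all waiting threads.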
/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation                      [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP=============
Q165: Getting functional tests for UNIX98

> Dave Butenhof wrote somewhere that there were
> a set of functional tests for UNIX98, that could
> also work with POSIX. Any idea where I could find
> it?

The place to look is The Open Group. Start with http://www.xopen.org/. (Unfortunately I don't have a bookmark handy for the test suite, and I can't get to xopen.org right now; so you're on your own from here. ;-))

=================================TOP=============
Q166: To make gdb work with linuxthreads?

> Is there any ongoing work, or are there plans, to make gdb work with linuxthreads?
>
> - Erik

Yes, there's a project at the University of Kansas called SmartGDB that is working on support for user-level and kernel-level threads. There is already support for several user level thread packages and work is currently being done on the linuxthreads support. The URL is:

http://hegel.ittc.ukans.edu/projects/smartgdb/

We have done most of the kernel modifications required to support it and are working on the rest of the changes to gdb. At this point, I can't even guess on a release date, but you can check the web page for more information on what's been done so far. The email contact is [email protected].

Robert

=================================TOP=============
Q167: Using cancellation is *very* difficult to do right...

Bil Lewis wrote:

> Dave Butenhof wrote:
> > > Using cancellation is *very* difficult to do right, and you
> > > probably don't want to use it if there is any other way you can
> > > accomplish your goal. (Such as looking at a "finish" flag as you
> > > do below.)
> >
> > I don't agree that cancellation is "very" difficult, but it does
> > require understanding of the application, and some programming
> > discipline. You have to watch out for cancellation points, and be
> > sure that you've got a cleanup handler to restore shared data
> > invariants and release resources that would otherwise be stranded if
> > the thread "died" at that point. It's no worse than being sure you
> > free temporary heap storage, or unlock your mutexes, before
> > returning from a routine... but that's not to say that it's trivial
> > or automatic. (And I'm sure we've never gotten any of those things
> > wrong... ;-) )
>
> Dave has written 10x as much UNIX code as I have, so our definitions
> of "very difficult" are distinct. (I've probably been writing MP code
> longer tho... I built my first parallel processor using two PDP/8s
> back in '72. Now THERE was a RISC machine! DEC could have owned the
> world if they'd tried. I remember when...)

Yeah, PDP-8 was a pretty good RISC, for the time. Of course it needed virtual memory, and 12 bits would now be considered a rather "quirky" word size. But, yeah, those could have been fixed.

Oh yeah... and we DID own the world. We just let it slip out of our hands because we just laughed when little upstarts said they owned it. (Unfortunately, people listened, and believed them, and eventually it came to be true.) ;-) ;-)

> It's that bit "to restore shared data invariants". Sybase, Informix,
> Oracle, etc. spend years trying to get this right. And they don't
> always succeed. It's hard to do hard things.

Yeah, when you've got an extremely complicated and extremely large application, bookkeeping gets more difficult. This applies to everything, not just handling cancellation. Just as running a multinational corporation is harder than running a one-person home office.

The point is: the fact that the big job is hard doesn't mean the small job is hard. Or, you get out what you put in. Or "thermodynamics works".
Or whatever.

> And don't forget to wait for the dead threads. You can't do
> anything with the shared data until those have all been joined,
> because you can't be sure when they actually die.

That's untrue, as long as you use proper synchronization (or maybe "as long as you use synchronization properly"). That's exactly why the mutex associated with a condition wait is re-locked even when the wait is cancelled. Cleanup code needs (in general) to restore invariants before the mutex can be safely unlocked. (Note that while the data must be "consistent" at the time of the wait, because waiting has released the mutex, it's quite common to modify shared data in some way associated with the wait, for example, adding an item to a queue; and often that work must be undone if the wait terminates.)

You only need to wait for the cancelled thread if you care about its return value (not very interesting in this case, since it's always PTHREAD_CANCELED, no matter how many times you look), or if you really want to know that it's DONE cleaning up (not merely that the shared data is "consistent", but that it conforms to some specific consistency -- an attempt that I would find suspicious at best, at least if there might be more than the two threads wandering about), or if you haven't detached the thread and want to be sure it's "gone".

> Conclusion: Dave is right (of course). The definition of "very" is
> up for grabs.

The definition of the word "very" is always up for grabs. As Samuel Clemens once wrote, when you're inclined to use the word "very", write "damn" instead; your editor will remove it, and the result will be correct.

Sure, correct handling of cancellation doesn't happen automatically. Neither does correct use of threads, much less correct use of the arcane C language (and if C is "arcane", what about English!?) Somehow, we survive all this.
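As an illustration of the point about cleanup handlers and the re-locked mutex, here is a minimal sketch of a cancellable condition wait. All names (lock, cond, waiters, ready, undo_wait, waiter, cancel_demo) are hypothetical; the "work associated with the wait" is reduced to a counter so that the handler has an invariant to restore.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int waiters = 0;   /* shared state: number of registered waiters */
static int ready   = 0;   /* wait predicate; never set in this demo */

/* Cleanup handler. If the wait is cancelled, the mutex has been
 * re-locked before this runs, so the shared state can be repaired
 * safely before the mutex is released. */
static void undo_wait(void *arg)
{
    (void)arg;
    --waiters;
    pthread_mutex_unlock(&lock);
}

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    ++waiters;                            /* work associated with the wait */
    pthread_cleanup_push(undo_wait, NULL);
    while (!ready)
        pthread_cond_wait(&cond, &lock);  /* cancellation point */
    pthread_cleanup_pop(1);               /* normal path: same cleanup */
    return NULL;
}

/* Cancel a waiting thread and check that the invariant was restored.
 * Returns 0 on success. */
int cancel_demo(void)
{
    pthread_t tid;
    void *result = NULL;

    int rc = pthread_create(&tid, NULL, waiter, NULL);
    assert(rc == 0);

    /* Wait until the thread has registered itself as a waiter. */
    pthread_mutex_lock(&lock);
    while (waiters == 0) {
        pthread_mutex_unlock(&lock);
        sched_yield();
        pthread_mutex_lock(&lock);
    }
    pthread_mutex_unlock(&lock);

    pthread_cancel(tid);
    pthread_join(tid, &result);           /* joined here only to inspect it */

    pthread_mutex_lock(&lock);
    int left = waiters;
    pthread_mutex_unlock(&lock);
    return (result == PTHREAD_CANCELED && left == 0) ? 0 : -1;
}
```

The join in cancel_demo is there only so the demo can inspect the result. Any other thread that locks the mutex after the handler has run would already see a consistent waiters count.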
/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation                      [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP=============
Q168: Why do pthreads implementations differ in error conditions?

[email protected] wrote:

> I'd like to understand why pthreads implementations from different
> vendors define error conditions differently. For example, if
> pthread_mutex_unlock is called for a mutex that is not owned by the
> calling thread.
>
> Under Solaris 2.5: "If the calling thread is not the owner of the
> lock, no error status is returned, and the behavior of the program
> is undefined."
>
> Under AIX 4.2: It returns the EPERM error code.
>
> The problem may be that the AIX 4.2 implementation is based on draft 7
> of the pthreads spec, not the standard, but I certainly prefer the AIX
> approach.
>
> Another example: pthread_mutex_lock is called for a mutex that is
> already owned by the calling thread.
>
> Under Solaris 2.5: "If the current owner of a mutex tries to relock
> the mutex, it will result in deadlock." (The process hangs.)
>
> Under AIX 4.2: It returns the EDEADLK error code.
>
> Once again, the AIX approach certainly seems preferable.
>
> Aren't these issues clearly defined by the pthreads standard? If not,
> why not?

Yes, these issues are clearly defined by the POSIX standard. And they're defined in such a manner that implementations are not required to make the (possibly expensive) checks to report this sort of programmer error -- but so that implementations that do choose to detect and report the error must do so using a standard error code.

In this case, Solaris 2.5 chooses not to "waste" the time it would take to detect and report your error, while AIX 4.2 does.
Both are correct and conform to the standard. (Although, as you pointed out, AIX 4.2 implements an obsolete draft of the standard, in this respect it doesn't differ substantially from the standard.)

The POSIX philosophy is that errors that are not under programmer control MUST (or, in POSIX terms, "SHALL") be reported. Examples include ENOMEM, and other resource shortages. You can't reasonably know that there's not enough memory to create a thread, because you can't really know how much you're using, or how much additional is required. On the other hand, you can be expected to know that you've already locked the mutex, and shouldn't try to lock it again. POSIX does not require that an implementation deal gracefully with such programmer errors.

While it is nice to have a "debugging mode" where all programmer errors are detected, in the long run it's more important to have a "production mode" where such extremely critical functions as mutex lock and unlock execute as quickly as possible. In general, the only way to do both is to have two separate libraries. This complicates maintenance substantially, of course -- but it also complicates application development because the two libraries will have different timings, and will expose different problems in the application design. Which means you'll inevitably run into a case that only fails on the "production library", and can't be debugged using the "debug library". As a result, the development and maintenance costs of building and shipping two thread libraries usually aren't worthwhile. You're better off relying on other tools to detect this class of programming error. For example, the Solaris "lock_lint" program, or dynamic trace analysis tools that can check for incorrect usage at runtime in your real program.
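Later revisions of the standard (UNIX98 and POSIX.1-2001) added an opt-in middle ground that the answer above predates: a mutex created with the PTHREAD_MUTEX_ERRORCHECK type must report both of the programmer errors in the question, while default mutexes remain free to skip the checks. A minimal sketch, with hypothetical helper names (init_checked_mutex, relock_error, unlock_unowned_error):

```c
#define _XOPEN_SOURCE 700   /* expose pthread_mutexattr_settype() */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initialize *m as an error-checking mutex. Returns 0 or an errno value. */
int init_checked_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Relock a mutex the caller already owns; returns the reported error. */
int relock_error(void)
{
    pthread_mutex_t m;
    int rc = init_checked_mutex(&m);
    assert(rc == 0);
    pthread_mutex_lock(&m);
    rc = pthread_mutex_lock(&m);    /* programmer error: already owned */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}

/* Unlock a mutex the caller does not own; returns the reported error. */
int unlock_unowned_error(void)
{
    pthread_mutex_t m;
    int rc = init_checked_mutex(&m);
    assert(rc == 0);
    rc = pthread_mutex_unlock(&m);  /* programmer error: never locked */
    pthread_mutex_destroy(&m);
    return rc;
}
```

With this mutex type, relock_error() yields EDEADLK and unlock_unowned_error() yields EPERM, matching the AIX 4.2 behaviour quoted in the question. The checks cost time on every lock and unlock, which is the trade-off the answer describes.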
/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation                      [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP=============
Q169: Mixing threaded/non-threadsafe shared libraries on DU

Claude Vallee wrote:

> Thanks Dave Butenhof for your excellent answer. I just have a few
> complementary questions.
>
> + To make this clear, prior to 4.0D, your liba code running in the
> + main thread may see errno change at random. As long as liba doesn't
> + read errno, this shouldn't be a problem.
> +
>
> I found out I was using 4.0B. Is errno the only problem area of
> the c run time library? What about other libraries like librt?

The base kit libraries should be thread-safe. There are so many, though, that I'm afraid I can't claim personal knowledge that they all ARE thread-safe. I also know of at least one case where a base kit library was "almost" thread-safe, but had been compiled without -D_REENTRANT (making it subject to errno confusion).

Bugs, unfortunately, are always possible. While it's not that hard to code a library for thread-safety, it's nearly impossible to PROVE thread-safety -- because problems come in the form of race conditions that are difficult to predict or provoke.

> + You do have to be aware of whether liba SETS the global errno --
> + because your thread-safe code won't see the global errno through any
> + normal mechanisms.
>
> What do you mean by that? Yes, liba sets errno each time it calls a
> system service (the system service sets it actually). If you're
> asking if it explicitly sets it, then no. Are you asking if I am
> counting on setting errno from one thread and reading it from the
> other thread counting on the value to be the same?

Calling system services doesn't count.
The libc syscall stubs that actually make the kernel call DO handle errno correctly with threads. (On the other hand, if your library runs in a threaded application and isn't built correctly, you'll end up looking at the global errno while the system/libc just set your thread errno. That's the point of my 4.0D change -- at least if that non-thread-safe code runs in the initial thread, it'll get the right errno.)

> By the way, seterrno(), does not seem to be a public service (it
> doesn't have a man page anyway, (I found _Seterrno() in errno.h, but
> we're certainly not using it )).

You're right -- int _Geterrno() and _Seterrno(int) are the external interfaces. I'd recommend compiling for the threaded errno rather than using those calls, though.

> + No, _THREAD_SAFE doesn't do anything. It's considered obsolete. You
> + should use _REENTRANT. (Though I actually prefer the former term,
> + I've never felt it was worth arguing, or making anyone change a ton
> + of header files.)
>
> Ok, _THREAD_SAFE is out. Then, if I define _REENTRANT when compiling
> all my sources, and I explicitly link with libpthread, libmach,
> libc_r, and all the reentrant versions of my libraries, will this
> achieve the same thing as using the "-pthread" option? (Or am I
> playing with fire again?).

We document the equivalents, and it's perfectly legal to use them. However, the actual list of libraries may change from time to time, and you'll get the appropriate set for the current version, automatically, when you link with "cc -pthread". Over time, using the older set of libraries may leave you carrying around extra baggage. For example, your reference to libc_r is long-since obsolete; and on 4.0D, threaded applications no longer need libmach. While -pthread links automatically stop pulling in these useless references, with an explicit library list you'd still be carrying them around, costing you extra time at each program load, as well as extra virtual memory. Is it a big deal? That's up to you.
But if you're using a compiler that supports "-pthread", I'd recommend using it.

> + If liba is YOUR code, then please don't play this
> + game: build it at least with -D_REENTRANT.
>
> Yes, liba is our code... Actually, liba is a set of hundreds of
> libraries which take a weekend to build. And most of our processes
> are not multithread. What I was trying to achieve is to save on
> processing time (use non thread-safe libraries in single threaded
> processes), and to save on compile time (not building both single
> threaded and multi threaded versions of all the libraries).

If you just compile with -D_REENTRANT, you'll get thread-safe errno, but that's only a small part of "thread safety". As long as it's only called in one thread, that's probably OK.

For more general thread-safety, with very little performance impact on non-threaded processes, you might look into the TIS interfaces (tis_mutex_lock, tis_mutex_unlock, etc.). You can use these just like the equivalent POSIX functions; in a threaded process, they do the expected synchronization, but in a non-threaded process they avoid the cost of interlocked data access and memory barrier instructions, giving very low overhead. (TIS is better than having pthread stubs in libc, because it works even when the main program isn't linked against libpthread.)

> Thanks again for a thorough answer. By the way, for some reason, I
> could never get the answer to my question from my news server (I got
> it from searching usenet through altavista), so please reply by email
> as well as through the newsgroup.

Yeah, news servers can be annoying creatures, with their own strange ideas of what information you should be allowed to see. You really can't trust them!
;-)

/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation                      [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP=============
Q170: sem_wait() and EINTR

W. Richard Stevens wrote:

> Posix.1 says that sem_wait() can return EINTR. Tests on both Solaris 2.6
> and Digital Unix 4.0b show that both implementations do return EINTR when
> sem_wait() is interrupted by a caught signal. But if you look at the
> (albeit simplistic) implementation of semaphores in the Posix Rationale
> that uses mutexes and condition variables (pp. 517-520) sem_wait() is
> basically:
>
>     int
>     sem_wait(sem_t *sem)
>     {
>         pthread_mutex_lock(&sem->mutex);
>         while (sem->count == 0)
>             pthread_cond_wait(&sem->cond, &sem->mutex);
>         sem->count--;
>         pthread_mutex_unlock(&sem->mutex);
>         return(0);
>     }
>
> But pthread_cond_wait() does not return EINTR so this function will never
> return EINTR. So I was wondering how existing implementations actually
> implement sem_wait() to detect EINTR. Seems like it would be a mess ...
>
> Rich Stevens

On Digital UNIX, sem_wait() turns into a system call, with the usual behavior in regard to signals and EINTR.

You can only implement POSIX semaphores via pthreads interfaces if you have support for the process-shared (PTHREAD_PROCESS_SHARED) synchronization attribute. Digital UNIX won't have that support until an upcoming major release. At that point, it's entirely possible that the POSIX semaphores will be re-implemented to use the model you cite. I certainly encouraged the maintainer to do so before I left...
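On the caller's side, code that does not want to see EINTR from a system-call-based sem_wait() usually retries the call. The sketch below shows that idiom only. It says nothing about how any particular implementation detects the interruption, and the names sem_wait_retry and sem_retry_demo are hypothetical. It assumes a platform with unnamed POSIX semaphores (sem_init), such as Linux.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* Hypothetical wrapper: call sem_wait() again whenever a caught signal
 * interrupts it. Returns 0 on success, or -1 with errno set to an error
 * other than EINTR. */
int sem_wait_retry(sem_t *sem)
{
    int rc;
    do {
        rc = sem_wait(sem);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

/* Demo: take the only unit of a semaphore and report what is left. */
int sem_retry_demo(void)
{
    sem_t sem;
    int value = -1;

    int rc = sem_init(&sem, 0, 1);  /* not shared between processes, count 1 */
    assert(rc == 0);
    rc = sem_wait_retry(&sem);
    assert(rc == 0);
    sem_getvalue(&sem, &value);
    sem_destroy(&sem);
    return value;
}
```

Whether to retry is a design choice: a program that uses signals to break a thread out of a blocked sem_wait() wants the EINTR return and should call sem_wait() directly.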
__________________________________________________
Jeff Denham ([email protected])

Bright Tiger Technologies
SmartCluster Software for Distributed Web Servers
http://www.brighttiger.com

=================================TOP=============
Q171: pthreads and sprocs

Peter Shenkin wrote:
>
> Can someone out there say something about which is better to use
> for new code -- pthreads or sprocs -- and about what the tradeoffs
> are?

A good question. I've done little with sprocs, but have used pthreads on a 32p SGI Origin a good deal.

Sprocs are heavyweight processes; pthreads are MUCH lighter weight. Sprocs have a considerable amount of O/S and programmer tool support for high performance programming; as yet, pthreads has almost none. Sprocs lend themselves to good fine-grain control of resources (memory, CPU choice, etc); as yet these strengths are largely lacking in SGI pthreads.

The project on which I work has bet the farm on the present and future high performance of pthreads and the results so far have been good. However, we would dearly love for SGI and the rest of the parallel programming community to support pthreads as well as they have their former proprietary parallel programming models, so that we can control our threads as specifically as we could our sprocs and the like. Not really a complaint; more of a strong request.

The upshot is, IMHO, go ahead and program in pthreads on SGIs. The performance gain you would have gotten from better control of your sprocs is made up for in the "portability" of pthreads, the rosier future of pthreads, and their more modest system resource use.

Not my company's official position, I should add.

--
Randy Crawford [email protected]

=================================TOP=============
Q172: Why are Win32 threads so odd?

Bil Lewis wrote in article <[email protected]>...

> You must know the folks at MS who did Win32 threads (I assume).

Bad assumption. I know of them, but don't know them and they don't know me.
> Some of the design sounds so inefficient and awkward to use, while
> other bits look really nice.

My own opinion is that win32 grew from uni-thread to pseudo-multi-thread in a very haphazard manner; basically, features were added when they were found to be needed. I personally dislike the overall asymmetric properties of the API. Consider the current problem of providing POSIX condition variables: if you could release a single waiter on a manual reset event, or release all waiters on an autoreset event, then the problem would be much simpler to solve. Consider also the ability to SignalObjectAndWait() but no corresponding SignalObjectAndWaitMultiple() - another change that would make writing various forms of CVs easier.

> Are my views widely shared in the MS world? And why did they
> choose that design? Have they thought of adopting the simpler
> POSIX design (if not POSIX itself)?

No idea, sorry. Try asking Dave Cutler; he was one of the main thread architects AFAIK.

David

Bil Lewis wrote:

> The real question is: "What the heck was Cutler (I assume?) thinking when he made
> Win32 mutexes kernel objects?"
>

Well, Dave has a history in this department. Consider his Digital VAXELN realtime executive, a direct predecessor of WNT. It was designed and written by Cutler (and some other folks who later contributed to NT, like Darryl Havens) to run on the MicroVAX I waaaaay back in the early '80s at DECWest in Seattle. (Development soon moved to a dedicated realtime group at the Mill in Maynard.)

VAXELN had processes (called jobs) and threads (called processes) and kernel objects (PROCESS, DEVICE, SEMAPHORE, AREA, PORT, MESSAGE). It ran memory-resident on embedded VAXes (or VAX 9000s for that matter), and let you program device drivers (or whatever) in Pascal, C, FORTRAN, or Bliss even. (Pretty nifty little concurrent environment, a little bit too ahead of its time for its own good, I guess.)
The only synchronization object provided originally was the semaphore, which, like the NT mutex, required a trip into the kernel even for uncontested locking. This of course proved to be too expensive for real-world concurrent programming, so a library-based optimized synch. object was developed. It had a count and an embedded binary semaphore object that could be locked quickly in user space through the use of the VAX ADAWI interlocked-increment instruction. A system call occurred only for blocking and wakeup on a contested lock. Sounds just like an NT critical section, huh? Ironically, in VAXELN it was called a MUTEX!

History repeats itself, with only the names changed to protect the guilty...

Jeff

=================================TOP=============
Q173: What's the point of all the fancy 2-level scheduling??

[email protected] wrote:

> In article <[email protected]>,
> Jeff Denham wrote:
> >
> > Boris Goldberg wrote:
> >
> > Seriously, I've been around this Solaris threads package long enough to
> > be wondering how often anyone is using PROCESS scope threads.
> > With everyone just automatically setting SYSTEM scope threads
> > to get the expected behavior, what's the point of all the fancy 2-level
> > scheduling??
>
> I think, it is better to use thr_setconcurrency to create #of processors
> + some additional number (for I/O bound threads) of LWPs rather than
> creating LWP for each thread. Can Dave Butenhof or somebody from
> Sun thread designer team please comment on this?

That's kinda funny, since I've got no connection with Sun.

The real problem is that Sun's "2 level scheduling" really isn't at all like 2-level scheduling should be, or was intended to be. There's a famous paper from the University of Washington on "Scheduler Activations" (one of Jeff Denham's replies to this thread mentioned that term, so you may have noticed it), which provides the theoretical basis for modern attempts at "2 level scheduling".
Both Sun and Digital, for example, claim this paper as the basis for our respective 2-level scheduling models. However, while we (that's Digital, not Sun) began with a model of 2-way communication between kernel and user schedulers that closely approximates the intended BEHAVIOR (though not the detailed implementation) of scheduler activations, I have a hard time seeing anything usefully similar in Solaris. They have a signal to save the process from total starvation when the final thread blocks in the kernel (by giving the thread library a chance to create another LWP). We automatically generate a new "replacement VP" so that the process always retains the maximum level of concurrency to which it's entitled.

The advantages of 2-level scheduling are in performance and scaling.

1. Scaling. Kernel threads are kernel resources, and (as with processes), there are strict limits to how many a kernel can handle. The limits are almost always fixed by some configuration process, and additionally limited by user quotas. Why? Because they're expensive -- not just to the process that uses them, but to the system as a whole. User-mode threads, on the other hand, are "just" virtual memory, and, in comparison, "dirt cheap". So you can create a lot more user threads than kernel threads. Yeah, the user threads can't all run at the same time... but neither can the kernel threads, because the number of processors (being a relatively expensive HARDWARE resource) is even more limited. The point is to balance the number of kernel threads against the "potential parallelism" of the system (e.g., the number of processors), while balancing the number of user threads against the "potential concurrency" of the process (the maximum parallelism plus the maximum number of outstanding I/O operations the process might be able to undertake). [On Solaris, you do this manually by creating LWPs -- either by creating BOUND (SCS) threads, or by calling thr_setconcurrency.
On Digital UNIX, this is done for you automatically through the integration between user and kernel level schedulers.]

2. Performance. In many typical workloads, most of the communication is between threads within the process. Synchronization involves mutexes and condition variables. A 2-level scheduler can optimize these types of synchronization, and the resulting context switches, without involving the kernel at all. A kernel thread needs to call into the kernel to block -- and then another thread needs to call into the kernel again to unblock it. A user thread (or a 2-level thread blocking in user mode) only needs to call into the thread library. Because a call into the kernel is more expensive than a call within the process (and usually LOTS more expensive), this can save a lot of time over the life of a process.

/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation                      [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP=============
Q174: Using the 2-level model, efficiency considerations, thread-per-X

Bil Lewis wrote:

> My experience (which is fairly broad), and my opinion (which is
> extensive) is that THE way to do this is this:

Yes, Bil, your experience is extensive -- but unfortunately mostly limited to Solaris, which has a poor 2-level scheduling design. (Sorry, guys, but it's the truth.)

> Use only System scoped threads everywhere. On your selected
> release platforms, tune the number of threads for each configuration
> that's important. (Take educated guesses at the rest.) For servers,
> do everything as a producer/consumer model. Forget Thread-per-request,
> thread-per-client, etc. They are too hard to tune well, and too
> complex.
No, use SCS ("system scoped") threads only where absolutely necessary. But, yeah, in recognition of your extensive experience, I would acknowledge that on Solaris they are (currently) almost always necessary. (Solaris 2.6 is supposed to be better than 2.5, though I haven't been able to try it, and I know that Solaris developers have hopes to make substantial improvements in the future.) This is, however, a Solaris oddity, and not inherent in the actual differences between the PCS ("process scoped") and SCS scheduling models.

On the rest, though -- I agree with Bil that you should avoid "thread per request" in almost all cases. Although it seems like a simple extension into threads, you usually won't get what you want. This is especially true if your application relies on any form of "fairness" in managing the inevitable contention between clients, because "thread per request" will not behave fairly. You'll be tempted to blame the implementation when you discover this, but you'll be wrong -- the problem is in the application. The best solution is to switch to a client/server (or "producer/consumer") model, where you control the allocation and flow of resources directly.

> Process scoped threads are good for a very small number of unusual
> examples (and even there I'm not totally convinced.)

On the contrary, PCS threads are best except for the very few applications where cross-process realtime scheduling is essential to correct operation of the application. (E.g., for direct hardware access.)

> Simplicity rules.

Right. (Fully recognizing the irony of agreeing with a simplistic statement while disagreeing with most of the philosophy behind it.)

> Logic: Process scope gives some nice logical advantages in design,
> but most programs don't need that. Most programs want to run fast.
> Also, by using System scoped threads, you can monitor your LWPs,
> knowing which is running which thread.
SCS gives predictable realtime scheduling response across the system, but most programs don't need that. Most programs want to run fast, and you'll usually get the most efficient execution, and the best management of system resources, by using PCS threads.

"Monitoring" your LWPs might be comforting, but probably provides no useful understanding of the application performance. You need an analysis tool that understands the CONTENTION between your execution contexts, not merely the identity of the execution contexts. Such an analysis tool can understand PCS threads as well as SCS threads.

> Anywhere you're creating threads dynamically, you need to know
> how many threads you're creating and ensure you don't create too
> many. (Easy to mess up!) By using a P/C model, you create exactly
> the right number of threads (tuned to machine, CPUs, etc.) and don't
> have to think about them. If you run at lower than max capacity, having
> a few idle threads is of very little concern.

Remember that "concurrency" is much more useful to most applications than "parallelism", and is harder to tune without detailed knowledge of the actual workload. When you're doing I/O, your "concurrency width" is often far greater than your "execution width". It's often useful, for example, to dynamically increase the number of "servers" in a process to balance a large workload, because each server might be blocked on one client's request for a "long time". Dynamic creation isn't necessarily tied to a simplistic "thread per request" model.

> Opinions may be worth what you paid for 'em.

No, no. Opinions are hardly ever worth as much as you paid for them, and usually a good deal less. Information, however, may be worth far more. One might hope that in the process of airing our worthless opinions, we have incidentally exposed some information that might help someone!
;-)

/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation                      [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP=============
Q175: Multi-platform threading api

From: "Constantine Knizhnik"

I have also developed a threads class library, which provides three
system-dependent implementations: based on Posix threads, Win32, and
portable cooperative multitasking using setjmp() and longjmp(). This
library was used in OODBMS GOODS for implementing the server and client
parts. If somebody is interested in this library (and in the OODBMS
itself), it can be found at http://www.ispras.ru/~knizhnik

Jason Rosenberg wrote in message <[email protected]>...
>Hello,
>
>I am tasked with converting a set of large core C libraries
>to be thread-safe, and to use and implement a multi-platform
>api for synchronization. We will need solutions for
>Solaris 2.5.1, Digital Unix 4.0B(D?), Irix 6.2-6.5,
>HP-UX 10.20, AIX 4.1.2+, Windows NT 4.0(5.0) and Windows 95(98).
>
>I have built a basic wrapper api which implements a subset
>of pthreads, and have it working under Digital Unix 4.0B,
>Irix 6.2 (using sprocs), and Windows NT/95. I am in the
>process of getting it going on the other platforms.

=================================TOP=============
Q176: Condition variables on Win32

Hi,

> I don't see that this is justifiable.

Have you ever seen any of the tortuous attempts by bright fellows like
Jeffrey Richter to define relatively simple abstractions like
readers/writer locks using the Win32 synchronization primitives? It's
not pretty... In fact, the first several editions of his Advanced
Windows book were full of bugs (in contrast, we got that right in ACE
using CVs in about 20 minutes...).
If someone of his calibre can't get this stuff right IN PRINT (i.e.,
after extensive reviews) I don't have much faith that garden variety
Win32 programmers are going to have a clue...

> It might be harder if you think in terms of the POSIX facilities. I
> wouldn't say that the combination of a semaphore and mutex or
> critsec is hard though, and the inconvenience of having to acquire
> the mutex after waiting on the semaphore is balanced against
> checking for false wakeups and being unable to signal n resources.

Checking for false wakeups is completely trivial. I'm not sure what you
mean by "signal n resources". I assume you're referring to the fact
that condition variables can't be used directly in a
WaitForMultipleObjects()-like API. Though clearly you can broadcast to
n threads.

> I had in mind that it is allowed to notify more than one thread
> (which always seemed odd to me) but I don't have my POSIX spec
> handy. Just a nit, but it does stress that false wakeups must be
> handled.

I had it this way originally, but others (hi David ;-)) pointed out
that this was confusing, so I'm grudgingly omitting it from this
discussion. I suppose anyone who really wants to understand POSIX CVs
ought to read more comprehensive sources (e.g., Bil's book) than my
paper!

> I wish. In fact they store details about the owning thread and
> support recursive acquisition. I think this was a screwup by the NT
> designers - critical sections are needlessly expensive. For very
> basic requirements (I believe) you can get a performance gain using
> the InterlockedIncrement et al for spin locks and an auto reset
> event to release waiters. (Not sure I can prove it at the moment
> though. If it makes a measurable difference, you have bigger
> problems than the efficiency of mutexes)

BTW, there's been an interesting discussion of this on the
comp.programming.threads newsgroup recently. You might want to check
this out.

> This occurs in several examples from 3.1 on. (Most of them? ...).
> Clear copy-paste bug I'd guess. Better change to
> ReleaseMutex(external_mutex).

Hum, I'm not sure why you say this. In all these cases the
"pthread_mutex_t" is typedef'd to be a CRITICAL_SECTION. Am I missing
something here?

> I think this is rather optimistic. There is no guarantee that any
> of them will release the mutex in a timely fashion. My original
> objection was to a solution that used a semaphore or other count of
> tokens, and that one thread could loop quickly and steal several
> tokens, leaving threads still blocked.

Right, that was the original discussion that triggered this paper.

> > EnterCriticalSection (external_mutex);
> Another copy-paste bug. WaitForSingleObject?

Can you please point out where you think these problems are occurring?
As far as I can tell, everything is typedef'd to be a CRITICAL_SECTION
(except for the SignalObjectAndWait() solution).

Take care,

Doug

=================================TOP=============
Q177: When stack gets destroyed relative to TSD destructors?

Douglas C. Schmidt wrote:

> Can someone please let me know if POSIX pthreads specifies
> when a thread stack gets destroyed relative to the time at which the
> thread-specific storage destructors get run? In particular, if a
> thread-specific destructor accesses a pointer to a location on the
> run-time stack, will this memory still exist or will it be gone by
> the time the destructor runs?

Thread-specific data destructors must run in the context of the thread
that created the TSD value being destroyed. (This is clearly and
unambiguously implied by the standard. That is, while the standard
doesn't explicitly require this, an implementation that called
destructors in some other context would present a wide range of severe
restrictions in behavior that are not allowed by the standard.) Thus,
the stack must exist and remain valid at this point.
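
As a rough illustration of the point about destructor context, here is
a minimal C sketch that records whether a TSD destructor runs in the
thread that set the value. The names tsd_destructor_ran_in_owner,
worker, and on_thread_exit are made up for this example, and the
payload is deliberately static so the sketch does not itself depend on
the stack-lifetime question being discussed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_key_t key;
static pthread_t     worker_id;      /* written by the worker thread */
static int           ran_in_owner;   /* written by the destructor    */

/* Called at thread exit for each non-NULL value bound to `key`. */
static void on_thread_exit(void *value)
{
    (void)value;
    ran_in_owner = (pthread_equal(pthread_self(), worker_id) != 0);
}

static void *worker(void *arg)
{
    static int payload = 42;         /* static: not on the thread stack */
    (void)arg;
    worker_id = pthread_self();
    pthread_setspecific(key, &payload);
    return NULL;                     /* destructors run after this return */
}

/* Returns 1 if the destructor ran in the worker thread, else 0. */
int tsd_destructor_ran_in_owner(void)
{
    pthread_t t;
    int rc;

    ran_in_owner = 0;
    rc = pthread_key_create(&key, on_thread_exit);
    assert(rc == 0);
    rc = pthread_create(&t, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_join(t, NULL);      /* returns only after the thread ended */
    assert(rc == 0);
    (void)rc;
    pthread_key_delete(key);
    return ran_in_owner;
}
```

Calling tsd_destructor_ran_in_owner() from main() should return 1 on a
conforming implementation; the sketch only observes the behavior, it
does not prove what the standard requires.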
After a thread has terminated (having completed calling all cleanup
handlers and destructors), "the result of access to local (auto)
variables of the thread is undefined". E.g., at this point, (possibly
before a pthread_join or pthread_detach), the stack may have been
reclaimed.

/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation                      [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP=============
Q178: Thousands of mutexes?

Peter Chapin wrote:

> I'm considering a program design that would involve, potentially, a large
> number of mutexes. In particular, there could be thousands of mutexes
> "active" at any one time. Will this cause a problem for my hosting
> operating system or are all the resources associated with a mutex in my
> application's address space? For example, in the case of pthreads, are
> there any resources associated with a mutex other than those in the
> pthread_mutex_t object? Is the answer any different for Win32 using
> CRITICAL_SECTION objects? (I know that there are system and process limits
> on the number of mutexes that can be created under OS/2... last I knew it
> was in the 64K range).

POSIX mutexes are usually user space objects, so the limit is purely
based on your virtual memory quotas. Usually, they're fairly small.
Some obscure hardware may require a mutex to live in special memory
areas, in which case there'd be a system quota -- but that's not
relevant on any modern "mainstream" hardware.
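
To make the "purely virtual memory" point concrete, here is a minimal C
sketch that allocates and initializes an arbitrary number of default
(process-private) mutexes from the heap. The helper names make_mutexes
and free_mutexes are hypothetical, and the sketch assumes an
implementation where such mutexes are ordinary user-space objects, so
the cost is roughly count * sizeof(pthread_mutex_t).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Allocate and initialize `count` default mutexes.
 * Returns NULL if memory or any initialization fails. */
pthread_mutex_t *make_mutexes(size_t count)
{
    pthread_mutex_t *m = malloc(count * sizeof *m);
    size_t i;

    if (m == NULL)
        return NULL;
    for (i = 0; i < count; i++) {
        if (pthread_mutex_init(&m[i], NULL) != 0) {
            /* Undo the mutexes that were already initialized. */
            while (i > 0)
                pthread_mutex_destroy(&m[--i]);
            free(m);
            return NULL;
        }
    }
    return m;
}

/* Destroy and release an array obtained from make_mutexes(). */
void free_mutexes(pthread_mutex_t *m, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
        pthread_mutex_destroy(&m[i]);
    free(m);
}
```

For example, make_mutexes(10000) followed by locking and unlocking any
element exercises ten thousand independent mutexes; whether any kernel
resources are consumed depends on the implementation, as discussed in
the rest of this answer.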
On an implementation with 1-1 kernel threads (AIX 4.3, HP-UX 11.0,
Win32), there must be some "kernel component" of a mutex -- but this
may be no more than a generic blocking channel, with the actual
synchronization occurring in user-mode, so there may be no persistent
kernel resources involved. Win32 mutexes are pure kernel objects --
critical sections, I believe, are user objects with a kernel blocking
channel (but I don't know whether the kernel resource is persistent or
dynamic).

Similarly, even on a 2-level scheduling implementation (Solaris,
Digital UNIX, and IRIX), a "process shared" mutex (a POSIX option that
allows placing a mutex in shared memory and synchronizing between
processes) requires a kernel blocking channel: but again, the
persistent state may live completely in user-space. A process private
(default) mutex, on a 2-level scheduling implementation, is almost
certainly always a pure user-mode object.

Any more detailed answers will require knowing exactly what O/S (and
version) you intend to use.

/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation                      [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP=============
Q179: Threads and C++

In article <[email protected]>, Dave Butenhof wrote:
>
> Federico Fdez Cruz wrote:
>
> > When I create a thread using pthread_create(...), can I ask to the new
> > thread to execute a object method instead other "normal" function?
> > I have tried this, but it seems like the thread doesn't know anything
> > about the object. If I write code in the method that doesn't use any
> > member of the object, all goes well; but If I try to access a member
> > function or a member in the object, I get a segmentation fault.
> > I have seen that when the new thread is executing, "this" is NULL from
> > inside this thread.
>
> Yes, you can do this. But not portably. There's no portable C++ calling standard.

[Snipped]

> My advice would be to avoid the practice. POSIX 1003.1c-1995 is a C language
> standard, and there are, in general, few portability guarantees when crossing
> language boundaries. The fact that C and C++ appear "similar" is, in some ways,
> worse than crossing between "obviously dissimilar" languages.

In fact, there are significant advantages in doing MT programming in
C++. All the examples in Butenhof's book follow a model of cleaning up
(C++ destructor) at the end of a block or on error (C++ exception), and
can be written more elegantly in C++ using objects
(constructor/destructor) and exceptions. Of course, you will read
Butenhof's book for threads (and not C++).

> Use a static non-member function for pthread_create, and have it call your member
> function. There's no need to bend the rules for a minor convenience, even when
> you can depend on an implementation where the direct call happens to work.
>

You can use the following method in C++:

Thread.h
--------

class Thread {
    pthread_t thread_;
public:
    typedef void* (*StartFunction)(void* arg);
    enum DetachedState { joinable, detached };

    Thread(StartFunction sf, void* arg, DetachedState detached_state);
    void join();
    ...
};

// use this to create a thread with a global function entry.
inline Thread
Thr_createThread(Thread::StartFunction sf, void* arg,
                 Thread::DetachedState ds)
{
    return Thread(sf, arg, ds);
}

// This is for C++ class member function (non-static)
template <class T>
class ThrAction {
public:
    typedef void* (T::*MemberFunction)();
    ThrAction(T* object, MemberFunction mf)
        : object_(object), mf_(mf) {}
    void* run() { return (object_->*mf_)(); }
private:
    T* object_;
    MemberFunction mf_;
};

// an implementation class : notice the friend
template <class Action>
class ThreadImpl {
    static void* start_func(void* arg) {
        return ((Action*)arg)->run();
    }
    // befriend the function template defined below
    template <class A>
    friend Thread Thr_createThread(A* action, Thread::DetachedState ds);
};

// You use this to create a thread using a member function
template <class Action>
inline Thread
Thr_createThread(Action* action, Thread::DetachedState ds)
{
    return Thread(ThreadImpl<Action>::start_func, action, ds);
}

Now, you can use a C++ member function:

class MyClass {
    typedef ThrAction<MyClass> Action;
    Action monitor_;
public:
    void* monitor() {
        // this will be called in a new thread
        // ...
    }
    MyClass() : monitor_(this, &MyClass::monitor) {
        // start a thread that calls monitor() member function.
        Thread thr = Thr_createThread(&monitor_, Thread::detached);
        ...
    }
};

Thread.cc
---------

struct Detached {
    pthread_attr_t attr_;
    Detached() {
        pthread_attr_init(&attr_);
        // without this call the attribute object would still be joinable
        pthread_attr_setdetachstate(&attr_, PTHREAD_CREATE_DETACHED);
    }
    ~Detached() { pthread_attr_destroy(&attr_); }
};

static Detached detached_attr;

// depends on the static variable detached_attr; so make it
// out-of-line (in Thread.cc file).
Thread::Thread(StartFunction sf, void* arg, DetachedState ds)
{
    // pthread_create takes (thread, attributes, start routine, argument)
    pthread_create(&thread_,
                   ds == detached ? &detached_attr.attr_ : 0,
                   sf, arg);
}

I wrote it in a hurry. I hope it helps.

- Saroj Mahapatra

In article <[email protected]>, Kaz Kylheku wrote:

++ I second that! I also use the wrapper solution. Indeed, I haven't
++ found anything in the C++ language definition which suggests that static member
++ functions are like ordinary C functions.

You are correct.
Most C++ compilers DO treat static member functions like ordinary C
functions, so it's usually possible to pass static C++ member functions
as arguments to thread creation functions. However, some compilers
treat them differently, e.g., the OpenEdition MVS C++ compiler doesn't
allow static member functions to be used where ordinary C functions are
used, which is a PAIN.

BTW, if you program with ACE (http://www.cs.wustl.edu/~schmidt/ACE.html)
it hides all of this madness from you so you can write a single piece
of source code that'll work with most C++ compilers.

Take care,

Doug

I thought all those concerned with developing multi-threaded software
using the STL and C++ might be interested in the topic of STL and
thread safety. I just bought the July/August 1998 issue of C++ Report
and within there is an article concerning the testing for thread safety
of various popular implementations of STL. These are the published
results:

STL implementation    Thread-safe?
------------------    ------------
Hewlett-Packard       NO
Rogue Wave            NO
Microsoft             NO
Silicon Graphics      YES
ObjectSpace           YES

======================== But: ================

You've missed rather a lot of discussion on this topic in the
intervening months. A few supplemental facts:

1) The definition of ``thread safe'' while not unreasonable is also not
   universal. It is the working definition promulgated by SGI and
   -- surprise! -- they meet their own design requirement.

2) Hewlett-Packard provided the original STL years ago, at a time when
   Topic A was finding a C++ compiler with adequate support for
   templates. Thread safety was hardly a major design consideration.

3) Rogue Wave was quick to point out that the version of their code
   actually tested was missing a bug fix that had been released
   earlier. The corrected code passes the tests in the article.

4) Microsoft has been the unfortunate victim of some messy litigation.
   (See web link in footer of this message.)
If you apply the fixes from:

http://www.dinkumware.com/vc_fixes.html

then VC++ also passes the tests in the article. Its performance also
improves dramatically.

The C++ Report ain't Consumer Reports. Before you buy on the basis of
an oversimplified table:

a) Make sure your definition of ``thread safety'' agrees with what the
   vendor provides.

b) Make sure you get the latest version of the code.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com/hot_news.html

--

> and I would like to implement the functionality of the Java interface
> Runnable. Actually I suppose both of the following questions are
> really c++ questions, but I'm more optimistic to find the competent
> audience for my problems here...
>
> My first question is: Why does it make a difference in passing the
> data Argument to pthread_create, whether the function is a member
> function or not?

Because a member function is not a C function. A member function
pointer is a combination of a pointer to the class, and the index of
the function within the class. To make a member function "look" like a
C function, you must make it a static member function.

> In the following code I had to decrement the data pointer address
> (&arg) by one. This is not necessary, if I define run() outside of any
> class.

This is dangerous!! On another compiler, a different platform, or the
next release of the same compiler, this may or may not work. It is
really a happy accident that it does work in this case. The canonical
form for C++ is to pass the address of a static member function to
pthread_create(), and pass the address of the object as the argument
parameter to pthread_create(). The static member function then calls
the non-static member by casting the void *arg to the class type.

[...]

> The second question is: I've tried to use the constructor of Thread to
> start run() of the derived class as the thread. For this I've
> implemented run() in the base class as pure virtual.
> But I didn't
> succeed because the thread always tried to run the pure virtual base
> function. Why is this?

Because during the constructor of the base class, the object *is* a
base object. It doesn't know that you have derived something else from
it. It is not possible to call a derived class's virtual functions
during construction or destruction of a base class.

[...]

Here is a minimalist emulation of the Java Runnable and Thread
interface: no error checks, many routines left out, no thread groups
and so on.

-----------------------------------------------------------------------
#include <pthread.h>

// ----------------------------------------------------------------------

class Runnable {
public:
    virtual ~Runnable();
    virtual void run() = 0;
};

Runnable::~Runnable() { }

// ----------------------------------------------------------------------

class Thread {
public:
    Thread();
    Thread(Runnable *r);
    virtual ~Thread();

    void start();
    void stop();
    void join();

    virtual void run();

private:
    static void *startThread(void *object);
    void runThread();

private:
    Runnable *target;
    // 0=not started, 1=started, 2=finished
    int state;
    pthread_t thread;
};

Thread::Thread() : target(0), state(0) { }

Thread::Thread(Runnable *r) : target(r), state(0) { }

Thread::~Thread()
{
    if (state == 1)
        pthread_cancel(thread);
    pthread_detach(thread);
}

void Thread::start()
{
    pthread_create(&thread, 0, &Thread::startThread, this);
}

void Thread::stop() { }

void Thread::join()
{
    void *value = 0;
    pthread_join(thread, &value);
}

void Thread::run() { }

// static trampoline: recovers the object and calls the member function
void *Thread::startThread(void *object)
{
    Thread *t = (Thread *) object;
    t->runThread();
    return 0;
}

void Thread::runThread()
{
    state = 1;
    if (target)
        target->run();
    else
        run();
    state = 2;
}

// ----------------------------------------------------------------------

#include <stdio.h>

class Test : public Runnable {
public:
    void run();
};

void Test::run()
{
    printf("thread run called\n");
}

int main(int argc, char *argv[])
{
    Thread t(new Test);
    t.start();
    printf("thread started\n");
    t.join();
    return 0;
}

> I've run into trouble when I found out that when I cancel a thread via
> pthread_cancel() then destructors for local objects do not get called.
> Surprising :). But how to deal with this? With a simple thread code
> it would not be a big problem, but in my case it's fairly complex code,
> quite a few STL classes etc. Has someone dealt with such a problem and is
> willing to share his/her solution with me? I thought I could 'cancel'
> the thread via pthread_kill() and raise an exception within a signal handler
> but it's probably NOT a very good idea, is it? ;)
> Thank you,

Unfortunately, not surprising. C++ has not formally decided what to do
with thread cancellation, so it becomes compiler-specific. The Sun
compiler (for example) will run local object destructors upon
pthread_exit() (hence cancellation also). Others may not.

I suppose the best GENERAL C++ solution is:

a) Don't use stack-allocated objects.
b) Don't use cancellation.

Otherwise you can simply insist on a C++ compiler that runs the
destructors.

--
Ian Johnston, FX Architecture, UBS, Zurich

=================================TOP=============
Q180: Cheating on mutexes

Hi all!

Howz things goin? Just got back from the COOTS conference where I
learned all sorts of valuable lessons ("Don't try to match Steve
Vinoski drink for drink", "Snoring during dull presentations is not
appreciated").

As to this question...

[NOTE: SINCE THIS WAS WRITTEN, SOME THINGS HAVE CHANGED AND THE ADVICE
BELOW IS NO LONGER VALID. THE ISSUE IS THAT WITH SOME MODERN CPUS IT IS
POSSIBLE THAT A VARIABLE WHICH WAS SET IN CPU #1 WILL NOT BE VISIBLE TO
CPU #2. SO DON'T GET TRICKY AND USE THE MUTEX!]

Pretty much everybody's been largely correct, but a little excessive.
If we define the objective to be "I have a variable which will undergo
a single, atomic state change, can I test it without a mutex?" then the
answer is "yes, if you do things right."
In particular, if you want to go from FALSE to TRUE, and you don't care
if you see the change synchronously, then you're OK. This is how spin
locks work, this is how pthread_testcancel works (at least on Solaris),
and both Dave B & I talk about how to use this for pthread_once.

With spin locks, we test the ownership bit until it becomes "free".
Then we do a trylock on it. If somebody else gets it first, we go back
to spinning.

With pthread_testcancel() we test the cancellation flag for our thread
w/o a lock. If it ever becomes true, we exit. (The setter will set it
to true under mutex protection, so that upon mutex unlock, the value
will be quickly flushed to main memory.)

With pthread_once(), we'll insert a test BEFORE calling pthread_once,
testing a variable. If it's true, then we know pthread_once has
executed to completion and we can skip the call. If it's false, then we
need to run pthread_once(), which will grab the proper lock, and do the
testing under that lock, just in case someone else was changing it at
that instant.

So... If you're very, very, careful and you don't mind missing the
exact point of initial change... you can get away with it safely.

-Bil

> > ... The real
> > trouble is that if you don't use some kind of synchronisation
> > mechanism, the update may not be seen at other CPUs *at all*.
> ...
>
> Again donning my newbie hat with the point on top, why not?
>
> For example, might a pthreads implementation on a distributed-memory
> architecture not propagate global variables to the other
> CPUs at all, in the absence of something like a mutex?

=================================TOP=============
Q181: Is it possible to share a pthread mutex between two distinct processes?

>
> ie: some way to attach to one like you can attach to shared memory.
>
> Same question for condition variables.

The answer is (as often happens) both YES and NO. Over time, the
balance will shift strongly towards YES.
The POSIX standard provides an option known commonly as "pshared",
which, if supported on your implementation, allows you to allocate a
pthread_mutex_t (or pthread_cond_t) in shared memory, and initialize it
using an attributes object with a specific attribute value, such that
two processes with access to the shared memory can use the mutex or
condition variable for synchronization.

Because this is an OPTION in the POSIX standard, not all
implementations will provide it, and you cannot safely count on it.
However, the Single UNIX Specification, Version 2 (UNIX 98) requires
that this POSIX option be supported on any validated UNIX 98
implementation.

Implementations that provide the pshared option will define the
preprocessor symbol _POSIX_THREAD_PROCESS_SHARED in the <unistd.h>
header file.

For example,

    pthread_mutexattr_t mutattr;

    pthread_mutexattr_init (&mutattr);
    pthread_mutexattr_setpshared (&mutattr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init (&mutex, &mutattr);

/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation                      [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP=============
Q182: How should one implement reader/writer locks on files?

> How should one implement reader/writer locks on files?
> The locks should work across threads and processes.

The only way to lock "a file" is to use the fcntl() file locking
functions. Check your man page.

HOWEVER, there's a big IF... these locks are held by the PROCESS, not
by the THREAD. You can't use them to control access between multiple
threads within a process.
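
For reference, a minimal C sketch of whole-file advisory locking with
fcntl() might look like the following. The helper name lock_whole_file
is hypothetical; F_RDLCK gives a shared ("reader") lock, F_WRLCK an
exclusive ("writer") lock, and F_UNLCK releases. As the answer says,
these locks belong to the process, so they do not arbitrate between
threads of the same process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Apply an advisory lock covering the whole file behind `fd`.
 * `type` is F_RDLCK, F_WRLCK, or F_UNLCK.
 * Blocks until the lock is granted (F_SETLKW).
 * Returns 0 on success, -1 on error with errno set. */
int lock_whole_file(int fd, short type)
{
    struct flock fl = {0};

    fl.l_type   = type;
    fl.l_whence = SEEK_SET;   /* offsets are relative to start of file */
    fl.l_start  = 0;
    fl.l_len    = 0;          /* 0 means "through end of file" */
    return fcntl(fd, F_SETLKW, &fl);
}
```

A reader would call lock_whole_file(fd, F_RDLCK) on a descriptor opened
for reading, a writer lock_whole_file(fd, F_WRLCK) on one opened for
writing, and both release with F_UNLCK (or by closing the file).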
If you are interested in a mechanism outside the file system, you could
use UNIX98 read/write locks, with the pshared option to make them
useful between processes (when placed in shared memory accessible to
all the processes). However, UNIX98 read/write locks are not currently
available on most UNIX implementations, so you'd have to wait a while.
Of course you'd have to work out a way to communicate your shared
memory section and the address of the read/write lock(s) to all of the
processes interested in synchronizing. Also, while there are ways to
make fcntl() locking mandatory instead of advisory (at least, on most
systems), there's no way to do this with external locking.

/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation                      [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP=============
Q183: Are there standard reentrant versions of standard nonreentrant functions?

| Certain standard functions found in C programming environments,
| such as gethostbyname, are not reentrant and so are not safe
| for use by multithreaded programs. There appear to be two
| basic approaches to providing thread-safe versions of these
| functions:
|
| (1) Reimplement the functions to use thread-local storage.
|     This is the approach that Microsoft has taken. It's
|     nice because the interface is exactly the same so you
|     don't have to change existing code.

Can you cite documentation that Microsoft has done this consistently?
(I'd love to be able to rely on it, but haven't been able to pin this
down anywhere.)

| (2) Provide alternate reentrant interfaces. This is the
|     approach taken by (some/most/all?) Unix vendors. The
|     reentrant version of the function has the same name
|     as the non-reentrant version plus the suffix _r.
|     For example, the reentrant version of gethostbyname is
|     gethostbyname_r.
|
| The big problem I'm having with approach (2) is that the
| reentrant versions are not the same across different Unixes.
| For example, the AIX 4.2 and Solaris 2.5 gethostbyname_r
| interfaces are much different (the Solaris interface is
| horrendous, I must say).

FYI, having dealt with this on a couple of Unix systems: There's the
way Solaris does it, and the way everyone else does it. While
"everybody else" may not be 100% consistent, the differences are pretty
minor, and often can be worked around with an appropriate typedef.

To be fair to Sun, Solaris probably got there first, and others chose
to do things differently; but that's the end result. BTW, if you read
the man pages for things like gethostbyname_r, you'll find a notation
that says that the function is "provisional" or something like that,
and may go away in future releases. There's no change through Solaris
2.6, and some indication in the release notes that later Solaris
versions will play some games to support both the traditional Solaris
API, and the newer standards, wherever they are drawn from.

| While this is par for the Unix
| course, I'm somewhat surprised that these interfaces are
| not specified by POSIX. Or are they? Is there some
| attempt underway to standardize? Is there some set of
| _r functions that are specified by POSIX, and if so,
| where can I find this list?

*Some* of them were standardized. Dave Butenhof's "Programming with
POSIX Threads" lists the following:

    getlogin_r   readdir_r    strtok_r
    asctime_r    ctime_r      gmtime_r     localtime_r
    getgrgid_r   getgrnam_r   getpwuid_r   getpwnam_r

Also, a few functions (ctermid is an example) were declared thread-safe
if certain restrictions were followed.

None of the socket-related calls are on this list.
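
As a small illustration of how the standardized "_r" variants differ
from their classic counterparts, here is a C sketch using two functions
from the list above, strtok_r and gmtime_r. The helper names
count_words and utc_year are made up for this example. In both cases
the state that the non-reentrant version keeps in a hidden static
buffer is instead supplied by the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <time.h>

/* Count whitespace-separated words in `text`.
 * The string is modified in place, as with strtok(). */
int count_words(char *text)
{
    char *save = NULL;   /* caller-owned scan position (strtok keeps this static) */
    char *tok;
    int n = 0;

    for (tok = strtok_r(text, " \t\n", &save);
         tok != NULL;
         tok = strtok_r(NULL, " \t\n", &save))
        n++;
    return n;
}

/* Return the UTC calendar year of `when`, or -1 on failure. */
int utc_year(time_t when)
{
    struct tm result;    /* caller-owned result (gmtime returns a static one) */

    if (gmtime_r(&when, &result) == NULL)
        return -1;
    return result.tm_year + 1900;
}
```

Because each caller passes its own `save` pointer or `struct tm`, two
threads can run these helpers at the same time without sharing hidden
state.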
The problem, I suspect, is that they were not in any base standard:
They're part of the original BSD socket definition, and that hasn't
made it into any official standard until very recently. As I recall,
the latest Unix specifications, like the Single Unix Specification
(there really aren't all that *many* of them, but the names change so
fast I, for one, can't keep up), do standardize both the old BSD socket
interface, and the "_r" variants (pretty much as you see them in AIX).

BTW, standardization isn't always much help: localtime_r may be in the
Posix standard, but Microsoft doesn't provide it. (Then again,
Microsoft doesn't claim to provide support for the Posix threads API,
so why would you expect it to provide localtime_r....) You still have
to come up with system-dependent code.

-- Jerry

=================================TOP=============
Q184: Detecting the number of cpus

[email protected] wrote:
>
> I have responding to my own posts but I forgot that NT also defines an
> environment variable, NUMBER_OF_PROCESSORS. Win95/98 may do so as well.
>
> Bradley J. Marker
>
> In article <[email protected]>, [email protected] writes:
>
> >Win95 only used a single processor the last I looked and there were no plans
> >for SMP for Win98 that I've heard. I'd personally love SMP on 98 if it
> >didn't cost performance too much.
> >
> >On NT you can get the processor affinity mask and count the number of bits
> >that are on. Anybody have a better method?
> >
> >sysconf works on IRIX as well as Solaris. sysconf(_SC_NPROC_CONF) or
> >sysconf(_SC_NPROC_ONLN) (on Solaris it is NPROCESSORS instead of NPROC).
> >You probably want on-line. By the way, on an IRIX 6.4 Origin 2000 I am
> >getting sysconf(_SC_THREAD_THREAD_MAX) equal to 64. Just 64 threads max?
> >I have multithreaded test programs running with more threads than that
> >(or they seem to be working, anyway).
> >
> >Anybody know how to control the number of processors the threads run on for
> >IRIX?
> >I'd like both the non-specific run on N processors case and the
> >specifically binding to the Nth processor case. With Solaris I've been
> >using thr_setconcurrency and processor_bind.
> >
> >Sorry but I don't know for Digital Unix, IBM AIX, or HP-UX.
> >
> > Bradley J. Marker

In Win32, GetSystemInfo fills in a struct that contains a count of the
number of processors, among other things.

=================================TOP=============
Q185: Drawing to the Screen in more than one Thread (Win32)

Note: Followup-to: set to comp.os.ms-windows.programmer.win32

[long post removed, see the thread :-) ]

Maybe I'm wrong(*), but AFAIR:

You can only draw in a window from the thread which owns the window
(the one which created it). This is the thread which receives all the
messages targeted to the window in its thread message queue (each
thread receives the messages for the windows it creates).

From what I remember, it works with TextOut() because the second thread
sends a message to the first one (which owns the window), which then
does the job. So if you use a locking mechanism between the two threads
for accessing the window, you may end up in a deadlock (thread 2
waiting for thread 1 to paint, and thread 1 waiting for thread 2 to
release the access).

Maybe by defining a user message with adequate parameters, and posting
it to thread 1's message queue (the second thread is then immediately
ready to continue number crunching), you can achieve a good update with
minimal blocking of thread 2.

A+

Laurent.

=================================TOP=============
Q186: Digital UNIX 4.0 POSIX contention scope

> I recently found myself at the following website, which describes the
> use of pthreads under Digital Unix 4.0. It is dated March 1996, so
> I am wondering how up to date it is.
>
> http://www.unix.digital.com/faqs/pub
> ications/base_doc/DOCUMENTATION/HTML/AA-Q2DPC-TKT1_html/thrd.html
>
> It refers to several unimplemented optional functions from Posix
> 1003.1c 1995, including pthread_setscope. So I am wondering, then,
> what sort of "scope" do dec pthreads have, are they all system level,
> or all process level, etc.

Digital UNIX 4.0 (through 4.0C) did not support POSIX contention scope. It was just one of those things that "missed the cut". All POSIX threads are process contention scope (PCS). Digital UNIX 4.0D supports the scope attribute. (Since 4.0D has been shipping for some time, it appears that the web link you've found is not up to date.) On 4.0D, threads are PCS by default (as they should be), but you can create SCS (system contention scope) threads for the rare situations where they're necessary. (For example, to share realtime resources directly with hardware, or with OS threads, etc.)

=================================TOP=============
Q187: Dec pthreads under Windows 95/NT?

> Also, appendix C refers to dec pthreads under Windows 95/NT. Is that
> a reality?

Depends on what you mean by "reality". Yes, we have DECthreads running on Win32 "in the lab", and have for some time. In theory, given sufficient demand and certain management decisions regarding pricing and distribution mechanism, we could ship it. Those decisions haven't yet been made. (If you have input on any of this, send me mail; I'd be glad to forward it to the appropriate person. If you can say whether you're "just curious" or "want to buy" [and particularly if you can say how much you'd pay], that information would be useful.)

=================================TOP=============
Q188: DEC current patch requirements

> It also doesn't describe the current patch requirements, etc., for
> 4.0B.

The Guide to DECthreads is a reference manual, not "release notes", and is not updated routinely for patch releases.
The version you're reading is clearly for 4.0 through 4.0C, and there's a new version for 4.0D. We still haven't managed to find time to push through the sticky fibres of the bureaucracy to get a thread project web page, on which we could post up-to-date information like current patches and problems. In general, you should just keep up with the latest patch kit. You can always keep an eye on the patch FTP directory for your release, under ftp://ftp.service.digital.com/public/Digital_UNIX/ =================================TOP============= Q189: Is there a full online version of 1003.1c on the web somewhere? > Is there a full online version of 1003.1c on the web somewhere? No. The IEEE derives revenue from sale of its standards, and does not give them away. I understand this policy is "under review". It doesn't really matter, though, unless you intend to IMPLEMENT the standard. 1003.1c is not a reference manual, and if you want to learn how to use threads, check out a book that's actually written to be read; for example, my "Programming with POSIX Threads" (Addison-Wesley) or Bil Lewis' "Multithreaded Programming with Pthreads" (Prentice Hall) [which, I see, is so popular that someone has apparently stolen my copy from my office: well, after all, the spine IS somewhat more colorful than my book ;-) ]. On the other hand, what IS freely available on the web is the Single UNIX Specification, Version 2, including "CAE Specification: System Interfaces and Headers, Issue 5", which is the new UNIX98 brand specification that includes POSIX threads (plus some extensions). This document includes much of the text of POSIX 1003.1c, though in slightly altered form. 
Check it out at http://www.rdg.opengroup.org/onlinepubs/7908799/toc.htm /---------------------------[ Dave Butenhof ]--------------------------\ | Digital Equipment Corporation [email protected] | | 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof | | Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----------------[ Better Living Through Concurrency ]----------------/ =================================TOP============= Q190: Why is there no InterlockedGet? > ) My question is renewed, however. Why is there no InterlockedGet and > ) InterlockedSet. It seems under the present analysis, these would be > ) quite useful and necessary. Their absence was leading me to speculate > ) that Intel/Alpha/MS insure that any cache incoherence/lag is not > ) possible. > > InterlockedExchange and InterlockedCompareExchange give you > combined Get and Set operations, which are more useful. Unfortunately, InterlockedCompareExchange is not available under Windows '95, only NT. But yes, I agree with you.... =================================TOP============= Q191: Memory barrier for Solaris >I was wondering if anyone knew how to use memory barriers >in the Solaris environment. I believe that Dave B. >posted one for the DEC Alphas. I assume you use Solaris on a Sparc. First you should decide whether you are programming for Sparc V8 or Sparc V9. Buy the appropriate Architecture manual(s) from Sparc International (see: http://www.sparc.com/sparc.new/shop/docs.html ) I have not seen the Sparc Architecture manuals on-line. If someone has, I would be grateful for a pointer... The Sparc chips can be set in different modes regarding the memory model (RMO, PSO, TSO). You need to understand the concepts by reading the Architecture manual (chapters 6 and J in V8, chapters 8, D and J in V9). It is also helpful to know which ordering model Solaris uses for your process. In V8, the "barrier instruction" you are looking for is "stbar". 
You can use it by specifying asm(" stbar"); in your C code. In V9, the architecture manual says: "The STBAR instruction is deprecated; it is provided only for compatibility with previous versions of the architecture. It should not be used in new SPARC-V9 software. It is recommended that the MEMBAR instruction be used in its place." The deprecated stbar instruction is equivalent to MEMBAR #StoreStore. In V9, "memory barriers" are done with the membar instructions. As far as I can see, there are 12 different types of the instructions, depending on the type of memory barrier you want to have (check the architecture manual). =================================TOP============= Q192: pthread_cond_t vs pthread_mutex_t Jason Mancini wrote: > I wrote a small program that loops many times, > locking and unlocking 3 mutexes. The results are > 4.2 million mutex lock-unlocks per second. Doing the same > for two threads that wait and signal each other results > in 26,000 wait-signals per second using conditional > variables. Of course, this information is largely useless without knowing what hardware and software you're using. But nevermind that -- it probably doesn't matter right now that the numbers are meaningless. > Any explanations as to why conds are so much slower > than mutexes? There are no collisions in any of the > mutex acquisitions. Also it seems like the mutex rate > should be higher that it is. So, you're trying to compare the performance of UNCONTENDED (that is, non-blocking) mutex lock/unlock versus condition variable waits. Note that waiting on a condition variable requires a mutex lock and an unlock, PLUS the wait on a condition variable. Waking a thread that's waiting on a condition variable also requires locking and unlocking the same mutex (in order to reliably set the predicate that must be tested for a proper condition wait). 
(If you're not locking in the signalling thread, then you're doing it wrong and your measurements have no relevance to a real program.) Why, exactly, would you expect the performance of the condition variable protocol to be equivalent to the mutex protocol that consists of a small part of the condition variable protocol -- and, most importantly, that excludes the actual BLOCKING part of the condition variable protocol? As for the mutex rate -- 4.2 million per second means that each lock/unlock pair takes less than 1/4 of a microsecond. Given the inherent memory system costs of instructions intended to allow synchronization on a multiprocessor, you'd need to be running on a REALLY fast machine for that number to be "bad". > Is there anything faster available for putting many > threads to sleep and waking them up many times a > second? If your 26,000 per second rate isn't good enough, then the answer is "probably not". Still, by my count, that's way up in the range of "many times a second". What exactly are you attempting to accomplish by all this blocking and unblocking, anyway? If you're doing it as a consequence of some real work, then what's important is the performance of the WORK, not the cost of individual operations involved in the work. (You should really be trying to AVOID blocking, not worrying about blocking faster, because blocking will always be slower than not blocking.) Synchronization is not the goal of multithreaded programming. It's a necessary evil, that's required to make concurrent programming work. Synchronization is pure overhead, to be carefully minimized. Every program will run faster without synchronization... unfortunately, most concurrent programs won't run CORRECTLY without it. 
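To make the cost comparison above concrete, here is a minimal sketch of the full condition variable protocol: the waiter locks, tests a predicate in a loop, and waits; the signaller locks, sets the predicate, signals, and unlocks. It is not taken from the original post. The names `ready`, `waiter`, `wake_waiter` and `run_handoff` are hypothetical, and error checking is kept to a minimum.

```c
#include <pthread.h>

/* Hypothetical names for illustration only. `ready` is the predicate;
 * it is read and written only while `lock` is held. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int ready = 0;

/* Waiting side: one mutex lock, a predicate loop around the wait,
 * and one mutex unlock. */
static void *waiter(void *arg)
{
    int *seen = arg;

    pthread_mutex_lock(&lock);
    while (!ready)                       /* loop guards against spurious wakeups */
        pthread_cond_wait(&cond, &lock); /* releases lock while blocked */
    *seen = ready;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Signalling side: the predicate is changed under the same mutex. */
static void wake_waiter(void)
{
    pthread_mutex_lock(&lock);
    ready = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}

/* One wait/signal round trip. Returns the predicate value the waiter
 * observed, or -1 if thread creation or join failed. */
int run_handoff(void)
{
    pthread_t tid;
    int seen = 0;

    ready = 0;
    if (pthread_create(&tid, NULL, waiter, &seen) != 0)
        return -1;
    wake_waiter();
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return seen;
}
```

Calling `run_handoff()` from `main` performs a single round trip. Counting the calls on each side shows why a wait/signal pair cannot cost the same as a bare lock/unlock pair: it contains two of them, plus the block and wakeup.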
/---------------------------[ Dave Butenhof ]--------------------------\ | Compaq Computer Corporation [email protected] | | 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof | | Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----------------[ Better Living Through Concurrency ]----------------/ =================================TOP============= Q193: Using DCE threads and java threads together on hpux(10.20) >I was wondering if anyone here has had any experience with >using dce threads and java threads together on hpux(10.20) >platform. I'm presuming that you're using a JavaSoft jvm. DON'T DO THAT! DON'T EVEN TRY!!!!! The DCE threads on hpux 10.20 are a user-space threads package. The JVM uses a (different!) user-space threads package. Ne'er the two shall meet. If you have access to an hpux 11.x box, which has kernel threads, there is a better chance of it working (not great, but better). It's been quite a while since I looked inside the JavaSoft JVM, but I seem to recall that the thread API isn't too ugly; you should probably use those calls to do your C++ threads, but be warned that you're in some rocky, unexplored territory. I've even heard that there be dragons there... =================================TOP============= Q194: My program returns enomem on about the 2nd create. > We just upgraded our Alpha from a 250MHz something to a 500+MHz dual > processor running Digital Unix V4. My program which previously had no > problem creating hundreds of threads returns enomem on about the 2nd to > 4th thread create. DEC support advised increasing maxusers from 128 to > 512 but to no avail. We've got 2Gig of memory and some other sys The real question is, how much memory does the application use before that final thread is created? The VM subsystem has an "optimization" for tracking protected pages that simply doesn't work well with threads. The thread library always creates a protected page for each stack, to trap overflows. 
(You can run without this, by setting the guardsize attribute to 0... but you shouldn't do that unless you're willing to bet money that your thread won't ever, under any circumstances, overflow the stack; without the guard page, the results will be catastrophic, unpredictable, and nearly impossible to debug.) The problem is that the VM subsystem has a table for dealing with adjacent pages of differing protection, and it's based on the entire memory size of the process. If the vm-vpagemax parameter is set to 2048, and you have 2048 pages allocated in the process, and you try to protect one of them, the attempt will fail. If the protection was occurring as part of a stack creation, pthread_create will return ENOMEM. While most threaded programs will see this only when they create a lot of threads (so that the aggregate stack allocation brings the process up over the vm-vpagemax limit), any program that allocates lots of memory before thread creation can hit the same limit -- whether the allocation is mmap, malloc, or just a really big program text or data segment. So check your vm-vpagemax and make sure it's big enough.

/---------------------------[ Dave Butenhof ]--------------------------\
| Compaq Computer Corporation                        [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP=============
Q195: Does pthread_create set the thread ID before the new thread executes?

Wan-Teh Chang wrote:
> The first argument for pthread_create() is the address of a pthread_t
> variable in which pthread_create() will write the new thread's ID before
> it returns.
>
> But it's not clear whether the new thread's ID is written into the pthread_t
> variable before the new thread begins to run.

The POSIX standard implies that the ID will not be set before the thread is scheduled.
The actual text is "Upon successful completion, pthread_create shall store the ID of the created thread [...]." You always need to remember, in any case, that threads operate asynchronously, and one great way to hammer that message home is to prevent anyone from counting on basics like having the create's ID when the thread starts.

(Yeah, that sounds mean, and I guess it is. But way back in the early days of threading, when nobody knew much about using threads, and people yelled "I don't want to use synchronization, so you need to give me non-preemptive thread scheduling!", we faced a really big problem in education. I DELIBERATELY, and, if you like, maliciously, designed the CMA [and therefore the DCE thread] create routine to store the new thread's id AFTER the thread was scheduled for execution, specifically so that the thread would, at least sometimes, find it unset, dragging the reluctant programmer, kicking and screaming, into the world of asynchronous programming. This was a purely user-mode scheduler, with a coarse granularity timeslicer, and it was far too easy to write lazy code that wouldn't work on future systems with multiple kernel threads and SMP. I couldn't prevent people from getting away with bad habits that would kill their code later -- but I could at least make it inconvenient! When I converted to a native POSIX thread implementation for Digital UNIX 4.0, having battled the education problem for over half a decade and feeling some reasonable degree of success, I opted for convenience over forced education -- I set the ID before scheduling the new thread, and made sure it was documented that way.)
> I checked the pthread_create() man pages on all major commercial Unix > implementations, and only the pthread_create(3) man page on Digital Unix > > (V4.0D) addresses this issue (and gives an affirmative answer): > DECthreads assigns each new thread a thread identifier, which DECthreads > writes into the address specified as the pthread_create(3) routine's thread > argument. DECthreads writes the new thread's thread identifier before the > new thread executes. > > AIX 4.3, HP-UX 11.00, IRIX 6.3, and SunOS 5.6 do not specify the timing > of the writing of new thread's ID relative to the new thread's execution. > > Is this something not specified in the POSIX thread standard? I don't > have a copy of the IEEE POSIX thread standard document, so all I can do > is to read the man pages. For my application, I need DECthreads' > semantics that the new thread ID is written before the new thread > executes. I guess on other platforms, I will need to have use a mutex to > block the new thread until the pthread_create() call has returned. If you really need to code a thread that uses its own ID immediately, you have a few choices. One, yeah, it can lock a mutex. Just hold a mutex (or, better, use a condition variable and some predicate) around the pthread_create call, and treat the thread ID as a shared resource. (Which it is, although, since it's write-once, and thread create already guarantees a consistent view of the address space to the created thread, there's no need for additional synchronization if it's just written before the new thread is scheduled.) Two, forget about the shared state and just have the thread call pthread_self(), which will return the exact same ID that the creator has stored (or will store). 
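The two choices above can be sketched side by side. This is only an illustration, not code from the original post: `id_lock`, `created_id`, `worker` and `check_ids` are hypothetical names. The creator holds a mutex across pthread_create() so the shared ID is certainly stored before the new thread reads it, and the new thread also obtains its own ID directly from pthread_self().

```c
#include <pthread.h>

/* Hypothetical names for illustration only. `created_id` is treated as
 * a shared resource guarded by `id_lock`. */
static pthread_mutex_t id_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t created_id;

static void *worker(void *arg)
{
    int *match = arg;

    /* Choice two: ask for our own ID. Valid as soon as the thread runs,
     * whatever the creator has or has not stored yet. */
    pthread_t self = pthread_self();

    /* Choice one: block until the creator has released the mutex, which
     * it does only after pthread_create() has returned and stored the ID. */
    pthread_mutex_lock(&id_lock);
    *match = pthread_equal(self, created_id) != 0;
    pthread_mutex_unlock(&id_lock);
    return NULL;
}

/* Returns 1 if both routes yield the same ID, 0 if they differ,
 * or -1 if thread creation or join failed. */
int check_ids(void)
{
    int match = 0;
    int rc;

    pthread_mutex_lock(&id_lock);
    rc = pthread_create(&created_id, NULL, worker, &match);
    pthread_mutex_unlock(&id_lock);
    if (rc != 0)
        return -1;
    if (pthread_join(created_id, NULL) != 0)
        return -1;
    return match;
}
```

If the thread only ever needs its own ID, the pthread_self() route alone is enough and the mutex can be dropped entirely.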
/---------------------------[ Dave Butenhof ]--------------------------\ | Compaq Computer Corporation [email protected] | | 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof | | Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----------------[ Better Living Through Concurrency ]----------------/ =================================TOP============= Q196: thr_suspend and thr_continue in pthread Niko D. Barli wrote: > Is there anyway to implement, or to emulate > Solaris thr_suspend() and thr_continue() in > pthread ? Yes, there is. But it's ugly and the result is an inefficient and complicated wart that reproduces almost all of the severe problems inherent in asynchronous suspend and resume. If you check the Deja News archive for this newsgroup, you can probably dig up (much) earlier posts where I actually told someone where to find suspend and resume code. > This is the case why I need to use thr_suspend > and thr_continue. Think again! You don't need them, and you'll be better off if you don't use them. > I have 3 servers running on 3 hosts. > Each server have 2 threads, each listening to 1 of the other 2 servers. > Socket information is held in global area, so that every thread can > access it. > > For example, in Server 1 : > - thread 1 -> listening to socket a (connection to Server 2) > - thread 2 -> listening to socket b (connection to Server 3) > > In each thread, I use select to multiplex between socket > communication and standard input. > > .......... > > For example, I ask server 1, to read data from server 2 > (by inputing command from stdin). If my input from stdin > handled by thread 1, there will be no problem. > But if thread 2 handle it, thread 2 will send request for > data to server 2 and waiting. Server 2 will send back data, > but the data is now listened by BOTH thread 1 and thread 2. > > So what I want to do is to suspend thread 1, and let thread 2 > get the data. 
You do NOT want to use suspend and resume for this! What you're talking about is SYNCHRONIZATION between two threads sharing the same resource. Suspend and resume are NOT synchronization functions, and they won't do what you want. For example, if you simply depend on asynchronously suspending one thread "before" it reads from stdin, what if you're late? (Threads are asynchronous, and, without explicit synchronization, you cannot know what one is doing at any given time.) What if the thread you've suspended has already tried to read, and currently has a stdio mutex locked? Your other thread will simply block when it tries to read, until the first thread is eventually resumed to complete its read and unlock the mutex.

Suspend and resume are extremely dangerous and low-level scheduling functions. You need to know a lot about everything a thread might possibly be doing before you can safely suspend it -- otherwise you risk damaging the overall application. (Very likely causing a hang.) If you don't know every resource a thread might own when you suspend it, or you don't own every resource YOU might need to do whatever it is you'll do while the other thread is suspended, then you cannot use suspend and resume. Even if you do know, and control, all threads, there is always a better and less dangerous solution than suspend and resume. (Suspend and resume are often used because they seem convenient, and expedient; and there are even rare cases where they can be used successfully. But, far more often than not, you'll simply let your customer find out how badly you've broken your application instead of finding it yourself.)
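One possible shape for such a "better and less dangerous solution" is a cooperative pause: instead of being stopped at an arbitrary instruction, a thread parks itself at a point where it is known to hold no locks. The sketch below is an assumption about how that might look, not code from the original post; `pause_gate`, `gate_pause`, `gate_resume`, `gate_checkpoint` and `run_pause_demo` are all hypothetical names.

```c
#include <pthread.h>

/* Hypothetical helper: a gate that workers check at safe points. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  resume;
    int             paused;   /* predicate, protected by lock */
} pause_gate;

static pause_gate gate = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};

/* Controller: ask workers to stop at their next checkpoint. */
static void gate_pause(pause_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->paused = 1;
    pthread_mutex_unlock(&g->lock);
}

/* Controller: let parked workers continue. */
static void gate_resume(pause_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->paused = 0;
    pthread_cond_broadcast(&g->resume);
    pthread_mutex_unlock(&g->lock);
}

/* Worker: block here, and only here, while a pause is requested. */
static void gate_checkpoint(pause_gate *g)
{
    pthread_mutex_lock(&g->lock);
    while (g->paused)
        pthread_cond_wait(&g->resume, &g->lock);
    pthread_mutex_unlock(&g->lock);
}

static void *worker(void *arg)
{
    int *count = arg;
    int i;

    for (i = 0; i < 1000; i++) {
        gate_checkpoint(&gate);   /* safe point: no other resource is held */
        (*count)++;
    }
    return NULL;
}

/* Starts a worker, pauses and resumes it once, and returns the number
 * of iterations it completed (or -1 on a pthread error). */
int run_pause_demo(void)
{
    pthread_t tid;
    int count = 0;

    if (pthread_create(&tid, NULL, worker, &count) != 0)
        return -1;
    gate_pause(&gate);
    /* ... the controller would do its exclusive work here ... */
    gate_resume(&gate);
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return count;
}
```

Unlike an asynchronous suspend, the worker can never be stopped while it owns a stdio mutex or any other lock, because it chooses where it stops. Note that gate_pause() only requests a pause; a controller that must know the worker has actually parked would need a second predicate for the worker to acknowledge it.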
/---------------------------[ Dave Butenhof ]--------------------------\
| Compaq Computer Corporation                        [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP=============
Q197: Are there any opinions on the Netscape Portable Runtime?

I am working on the Netscape Portable Runtime (NSPR), so my opinions are obviously biased. I'd like to provide some info to help you evaluate this product.

First, NSPR is more than a thread library. It also includes functions that are tied to the thread subsystem, most notably I/O functions. I/O functions can block the caller, so they must know which thread library they are dealing with.

The thread API in NSPR is very similar to pthreads. The synchronization objects are locks and condition variables. The NSPR thread API does not have suspend, resume, and terminate thread functions. It also does not have the equivalent of pthread_exit(). NSPR has thread interrupt, but not thread cancel. The absence of these functions in the API is by design. Also, condition variables are associated with locks when they are created, and condition notification must be done while holding the lock, i.e., no "naked" notifies.

The implementation of NSPR is often merely a layer on top of the native thread library. Where there are no native threads available, we implement our own user-level threads. NSPR does do a few "value-added" things:

1. The condition variable notifies are moved outside of the critical section where possible. You must write code like this:

    PR_Lock(lock);
    PR_NotifyCondVar(queue_nonempty_cv);
    PR_Unlock(lock);

The actual pthread calls made by NSPR are:

    pthread_mutex_lock();
    pthread_mutex_unlock();
    pthread_cond_signal();

(We use reference count on the condition variables to deal with their destruction.)

2.
In some two-level thread implementations, a blocking I/O call incurs the creation of a kernel schedulable entity (i.e., LWP). To minimize the number of LWPs created this way, the NSPR I/O functions block all the callers on condition variables, except for one thread. A lucky thread is chosen to block in a poll() call on all the file descriptors on behalf of the other threads.

3. On NT, NSPR implements a two-level thread scheduler using NT fibers and native threads and uses NT's asynchronous I/O, while still presenting a blocking I/O API. This allows you to use lots of threads and program in the simpler blocking I/O model.

4. Where it is just too expensive to use the one-thread-per-client model but you don't want to give up the simplicity of the blocking I/O model, there is work in progress to implement a "multiwait receive" API. (See http://www.mozilla.org/docs/refList/refNSPR/prmwait.html.)

These are just some random thoughts that came to mind. Hope it helps.

Wan-Teh

=================================TOP=============
Q198: Multithreaded Perl

Hello All,

I have just finished updating my Win32 IProc Perl module, version 0.15; now I can really say that it's a complete module.
Here are all the methods that I have implemented:

    new() (Constructor)
    Create()
    CloseHandle()
    ExitProcess()
    ExitThread()
    FindWindow()
    GetAffinityMask() (WinNT only)
    GetCommandLine()
    GetCurrentHandle()
    GetCurrentId()
    GetExitCode()
    GetExitCodeThread()
    GetThreadStatus()
    GetWorkingSet() (WinNT only)
    GetStatus() (WinNT only)
    GetPriorityClass()
    GetPriorityBoost() (WinNT only)
    GetThreadPriority()
    GetThreadPriorityBoost() (WinNT only)
    Kill()
    LastError()
    Open()
    Resume()
    SetAffinityMask() (WinNT only)
    SetIdealProcessor() (WinNT only)
    SetPriorityBoost() (WinNT only)
    SetPriorityClass()
    SetWorkingSet() (WinNT only)
    SetThreadPriority()
    SetThreadPriorityBoost() (WinNT only)
    SetThrAffinityMask() (WinNT only)
    ShowWindow()
    Sleep()
    Suspend()
    SwitchToThread() (WinNT only)
    Wait()

With all those 35 methods you will be in complete control of your Threads and Processes.

Add to this my Win32 MemMap that comes with:

    o SysV like functions (shmget,shmread,shmwrite ...)
    o Memory mapped file functions ...

Plus my Win32 ISync module that comes with complete synchronisation mechanisms like:

    o Mutex
    o Semaphores
    o Events
    o Timers

and the sky will be the limit.

I have included a lot of examples with my modules and have also updated my IProc Perl documentation. You will find all the docs at:

http://www.generation.net/~cybersky/Perl/iprocess.htm

and all my modules at:

http://www.generation.net/~cybersky/Perl/perlmod.htm

Thank you for your time, and have a nice weekend.

Regards
Amine Moulay Ramdane.

"Long life to Perl and Larry Wall!"

=================================TOP=============
Q199: What if a process terminates before mutex_destroy()?

> File locks are released if a process terminates (as the files are closed),

Correct.

> while SYSV-IPC semaphores are persistent across processes,

Unless you specify the SEM_UNDO flag.

> What about (POSIX) mutex's?

There is no cleanup performed on them when a process terminates.
This could affect a mutex (or condition variable) with the process-shared attribute that is shared between processes.

Rich Stevens

One more point: the "persistence" of an IPC object is different from what you are asking about, which is whether an IPC object is "cleaned up" when a process terminates. For example, using System V semaphores, they always have kernel persistence (they remain in existence until explicitly deleted, or until the kernel is rebooted) but they may or may not be cleaned up automatically upon process termination, depending on whether the process sets the SEM_UNDO flag.

Realize that automatic cleanup is normally performed by the kernel (as in the System V semaphore case and for fcntl() record locks) but the Posix mutual exclusion primitives (mutexes, condition variables, and semaphores) can be (are normally?) implemented as user libraries, which makes automatic cleanup much harder.

And, as others have pointed out here, automatic cleanup of a locked synchronization primitive may not be desirable: if the primitive is locked while a linked list is being updated, and the updating process crashes, releasing the locked primitive does not help because the linked list could be in some intermediate state. But there are other scenarios (such as an fcntl() record lock being used by a daemon to make certain only one copy of the daemon is started) where the automatic cleanup is desired.

> What about (POSIX) mutex's? I don't see this documented anywhere.

It's hidden in the Posix specs--sometimes what is important is not what the Posix spec says, but what it doesn't say. "UNIX Network Programming, 2nd Edition, Volume 2: Interprocess Communications" (available in ~2 weeks) talks about all this.

Rich Stevens

=================================TOP=============
Q200: If a thread performs an illegal instruction and gets killed by the system...

> % threads should remain open for the life of the application.
However
> % they could perform an illegal instruction and get killed by the system.
> % I would like for the thread creator to post an error that a thread has
> % died, AND then restart the killed thread.
>
> You don't have to worry about this particular case, since the system will
> kill the entire process for you if this happens. Threads aren't processes.

I've answered many questions, here and in mail, from people who expect that illegal instructions or segmentation faults will terminate the threads. And even from people who realize that it will terminate the process, but think they WANT it to terminate only the thread. That would be really, really, bad.

A quick message to anyone who thinks they want the process to "recover" from a segv/etc. in some thread: DON'T TRY IT. At best, you'll just blow up your program later on. At worst, you'll corrupt permanent external data (such as a database file), and won't detect the error until much, much later.

Remember that a thread is just an "execution engine". Its only private data is in the hardware registers of the processor currently executing the thread. Everything else is a property of the ADDRESS SPACE, not of the thread. A SIGSEGV means the thread has read incorrect data from the address space. A SIGILL means the thread has read an illegal instruction from the address space. Either a pointer (the PC in the case of SIGILL) or DATA in the address space has been corrupted somehow. This corruption may have occurred within the execution context of ANY thread that has access to the address space, at any time during the execution of the program. It does NOT mean that there's anything wrong with the currently executing thread -- most often, it's an "innocent victim". The fault lies in the program's address space, and potentially affects all threads capable of executing in that address space. There's only one solution: shut down the address space, and all threads within it, as soon as possible.
That's why the default action is to save the address space and context to a core file and shut down. This is what you want to happen, and you shouldn't be satisfied with anything less. You can then analyze the core file to determine what went wrong, and try to fix it. Meanwhile, you've minimized the damage to any external invariants (files, etc.)... and, at the very least, you know something went wrong.

In theory, an embedded system might handle a SIGSEGV, determine exactly what happened, kill any "wayward" thread responsible for the corruption, repair all data, and continue. Don't even IMAGINE that you can do this on anything but a truly embedded system. You may be able to detect corruption in your threads, and in the data under control of your code -- but you link against libpthread, libc, and probably other libraries. They may have their own threads, and certainly have LOTS of their own data. You cannot analyze or reconstruct their data. The process is gone. Forget it and move on with life.

If you need to write a "failsafe" application, fork it from a monitor process. Do NOT share any memory between them! The parent simply forks a child, which exec*s the real application. The parent then wait*s for the child, and if it terminates abnormally, forks a replacement. Either the parent (before creating the replacement) or the replacement (on startup) should analyze and repair any files that might have been damaged. And then you're off and running. Safely.

> % I was going to use the posix call "pthread_join" to wait for thread
> % exits. However using "pthread_join" does not give the thread id of the
> % thread that has died. Is there a way to do this
> % using another thread command.
>
> Well, you say
> pthread_join(tid, &status);
>
> and if it returns with a 0 rc, the thread that died was the one with
> id _tid_.
Your real problem here is that pthread_join won't return
> until the thread formerly known as tid has gone away, so you can't really
> use it to wait for whatever thread goes away first.

My guess is that John is writing code for Solaris (he wrote the article on Solaris 2.6), and was planning to use the unfortunate Solaris join-any "extension". John, don't do that! It's a really, really, bad idea.

Unlike UNIX processes, there's no parental "line of descent" in threads. It's fine to have a "wait any" that waits for any CHILD of the calling process, and therefore it seems obvious to extend this concept to threads. But a thread has no children. There is just an amorphous set of threads within a process, all equals. You create threads, say, and a database program you're using creates threads, and a fast sort library it uses creates more threads. Maybe you're also using a thread-aware math library that creates more... and perhaps the thread library has its own internal threads that occasionally come and go.

Guess what? Your "join any" will intercept the termination of the NEXT THREAD IN THE PROCESS to terminate. It may be yours, or the thread library's, or anyone else's. If it's someone else's thread, and the creator CARED about the termination of that thread, you've broken the application. (Yeah, YOU broke it, because there's nothing the library developer could reasonably be expected to do about it.)

Generally true statement: Anyone who uses "join any" has a broken process. The only exception is when you're sure that there cannot possibly, ever, be any threads in the process you didn't create. And I don't believe anyone can ever reasonably be sure of that in a modular programming environment. That is, if you link against a library you didn't write, you don't know it can't ever use threads. And if it ever does, you lose.

The POSIX pthread_join() function was nearly eliminated from the standard at several points during the development of the standard.
It's a minimal "helper function" that does nothing of any particular value.
It's utterly trivial to implement pthread_join() yourself. Combine one part
mutex, one part condition variable, a dash of data; stir, and serve.

You want to join with ANY of your worker threads? No problem. Just add
another sprinkle of data to record (if you care) which thread terminated. You
don't even need to add more mutexes or condition variables, because they can
all share one set. (To code a "normal" pthread_join, you'd usually want each
thread to have its own set.)

Before terminating, each thread locks the mutex, stores its termination state
(whatever information you want from it) and (if you want) its ID, then
signals or broadcasts (depending on your desired semantics) the condition
variable. To "join", you just wait (in, of course, a correctly tested
predicated condition wait loop!) for someone to have terminated.

/---------------------------[ Dave Butenhof ]--------------------------\
| Compaq Computer Corporation                        [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP=============
Q201: How to propagate an exception to the parent thread?

> Does anyone have or know of an approach to mixing threads with C++
> exception handling? Specifically, how to propagate an exception to the
> parent thread. I can catch any exceptions thrown within a thread by way
> of a try block in the entry function. The entry being a static class
> member function. (I know, a "state of sin" wrt C++ and C function
> pointers, but it works.) Copying the exception to global (static or
> free store) memory comes to mind.

It was the original intention of the designers of the C++ language to allow
one to throw/catch across thread boundaries.
As far as I know the recently ratified ISO C++ standard makes no mention of
threads whatsoever. (The ISO C++ committee considered threads an OS and/or
implementation, not a language issue. BTW please don't send me a bunch of
replies agreeing or disagreeing, I am not on the ISO committee, I am just
reporting the facts. ;-)

However, I am unaware of ANY compiler that implements the ability to
throw/catch across thread boundaries. I have also discussed this issue with
some experienced C++ programmers, and they also are unaware of any compiler
that implements this.

I am told that CORBA allows this if you want to take that approach. You may
want to repost this in comp.lang.c++.moderated.

In general there are some problems with this approach anyway: simply killing
a thread does not cause the C++ destructors to be called. (Again I say
generally, because the ISO standard makes no mention of threads, there is no
portable behaviour upon which you may count.)

It is usually better to catch an exception within the thread that threw it
anyway.

Peace

Peter

NB: There is no such thing as a "parent" thread. All threads are created
equal. But we know what you mean.

RogueWave's threads.h++ does a rethrow of exceptions in the user-selected
thread (the assigned "parent"). You may wish to look at that.

-Bil

=================================TOP=============
Q202: Discussion: "Synchronously stopping things" / Cheating on Mutexes

William LeFebvre wrote:

> In article <[email protected]>, Bil Lewis wrote:
> > Practically speaking, the operation of EVERYBODY (?) is that a store
> > buffer flush (or barrier) is imposed upon unlocking, and nothing at all
> > done on locking.
>
> Well, except the guarantee that the lock won't be obtained until
> the flush is finished.

Actually, that's incorrect. There may be no "flush" involved. That's the
whole problem with this line of reasoning. One side changes data and unlocks
a mutex; the other side locks a mutex and reads the data.
That's not a discrete event, it's a protocol; and only the full protocol
guarantees visibility and ordering.

I dislike attempts to explain mutexes by talking about "flushes" because
while a flush will satisfy the requirements, it's not a minimal condition. A
flush is expensive and heavy-handed. All that's required for proper
implementation of a POSIX mutex is an Alpha-like (RISC) memory barrier that
prevents migration of reads and writes across a (conceptual) "barrier token".
This affects only the ordering of memory operations FROM THE INVOKING
PROCESSOR. With a similar memory barrier in the correct place in mutex lock,
the protocol is complete. But with only half the protocol you get
ordering/visibility on one side, but not on the other; which means you
haven't gotten much.

As implied by the quoted statement above, once you've GOTTEN the mutex, you
can be sure that any data written by the previous holder, while the mutex was
locked, has also made its way to the memory system. The barrier in unlock
ensures that, since the unlocked mutex value can't go out before the previous
data goes out; and the barrier in lock ensures that your reads can't be
issued before your mutex lock completes. But this assurance is not
necessarily because of a "flush", and the fact that someone else unlocked a
mutex after writing data is not enough to ensure that you can see it; much
less that you can see it in the correct order.

/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation                      [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

Subject: Re: synchronously stopping things

David Holmes wrote:

> [email protected] wrote in article <[email protected]>...
> > One place where I try to avoid a mutex (at the risk of being called a
> > fool) is in singletons:
> >
> > MyClass* singleton(Mutex* mutex, MyClass** instance)
> > {
> >     if (*instance == 0) {
> >         mutex->lock();
> >         if (*instance == 0)    // must check again
> >             *instance = new MyClass();
> >         mutex->unlock();
> >     }
> >     return *instance;
> > }
>
> This coding idiom is known as the "Double Checked Locking pattern" as
> documented by Doug Schmidt (see his website for a pointer to a paper
> describing the pattern in detail). It is an optimisation which can work but
> which requires an atomicity guarantee about the value being read/written.
>
> The pattern works as follows. The variable being tested must be a latched
> value - it starts out with one value and at some point will take on a
> second value. Once that occurs the value never changes again.
>
> When we test the value the first time we are assuming that we can read the
> value atomically and that it was written atomically. This is the
> fundamental assumption about the pattern. We are not concerned about
> ordering as nothing significant happens if the value is found to be in the
> latched condition, and if it's not in the latched condition then acquiring
> the mutex enforces ordering. Also we do not care about visibility or
> staleness.

The last part is critical, and maybe rather subtle. You have to not care
about visibility or latency.

So... the code in question is broken. It's unreliable, and not MP-safe at
all. That is, it's perfectly "MT" [multithread] safe, as long as you're on a
uniprocessor... but move to an aggressive MULTIPROCESSOR, and it's "game
over".

Why? Yeah, I thought maybe you'd ask. ;-)

The problem is that you're generating a POINTER to an object of class
MyClass. You're creating the object, and setting the pointer, under a mutex.
But when you read a non-NULL value of the pointer, you're assuming that you
also have access to the OBJECT to which that pointer refers.
That is not necessarily the case, unless you are using some form of explicit
synchronization protocol between the two threads, because having set the
value under a mutex does not guarantee VISIBILITY or ORDERING for another
thread that doesn't adhere to the synchronization protocol.

Yes, "visibility" might seem not to be an issue here -- either the other
thread sees the non-NULL value of "instance", and uses it, or it sees the
original NULL value, and rechecks under the mutex. But ORDERING is critical,
and it's really a subset of VISIBILITY.

The problem is that the processor that sees a non-NULL "instance" may not yet
see the MyClass data at that address. The result is that, on many modern SMP
systems, you'll read garbage data. If you're lucky, you'll SEGV, but you
might just accept bad data and run with it... into a brick wall.

The more aggressive your memory system is, the more likely this is to occur.
You wouldn't, for example, have any problem running on an Alpha EV4 chip...
but on an EV5 or EV6 SMP system, you'll probably end up with intermittent
failures that will be nearly impossible to debug, because they'll often
depend on nanosecond timing factors that you can't reproduce reliably even in
production code, much less under a debugger. (And if you slip by that
"probably" and miss the races, you can be sure that one of your customers
will run into one eventually... and that's even less fun.)

You can fix this problem very simply without a mutex, but that solution is
machine dependent. For example, using DEC C on a Digital UNIX Alpha system,
it could be as simple as changing your test to:

    #include [...]

    if (*instance == 0) {
        [...]
    } else
        asm("mb");

The "mb" (memory barrier) between the test for the non-NULL pointer, and any
later dereferences of the pointer, ensures that your memory reads occur in
the correct and safe order. But now your code isn't portable. And get it
wrong in one place, and your program is toast.
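For readers who want to see the same read-side ordering expressed without Alpha assembly, here is a minimal sketch. It assumes a C11 compiler with <stdatomic.h>, which postdates this discussion, and it is not the code from the thread: `struct my_class`, `get_instance()` and the value 42 are hypothetical stand-ins. The acquire load plays the role of the "mb" on the reading side, and the release store orders the object's initialization before the pointer becomes visible.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for MyClass in the quoted code. */
struct my_class { int value; };

static _Atomic(struct my_class *) instance = NULL;
static pthread_mutex_t instance_lock = PTHREAD_MUTEX_INITIALIZER;

struct my_class *get_instance(void)
{
    /* Acquire load: a thread that sees a non-NULL pointer here also
       sees the writes that initialized the object it points to. */
    struct my_class *p =
        atomic_load_explicit(&instance, memory_order_acquire);

    if (p == NULL) {
        pthread_mutex_lock(&instance_lock);
        /* Must check again: another thread may have created it while
           we waited. The mutex already orders this load. */
        p = atomic_load_explicit(&instance, memory_order_relaxed);
        if (p == NULL) {
            p = malloc(sizeof *p);
            assert(p != NULL);
            p->value = 42;   /* initialize BEFORE publishing */
            /* Release store: the object's contents are ordered before
               the pointer that refers to them. */
            atomic_store_explicit(&instance, p, memory_order_release);
        }
        pthread_mutex_unlock(&instance_lock);
    }
    return p;
}
```

Every caller has to go through `get_instance()`; a plain read of the pointer anywhere else brings back the ordering problem described above. Locking the mutex unconditionally is still the simpler choice.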
That's what I meant in a previous post about the risk and cost of such
optimizations. Is the cost of locking the mutex really so high that it's
worth sacrificing portability, and opening yourself up to the whims of those
ever-more-creative hardware designers? Sometimes, yes. Most of the time... no
way.

/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation                      [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

From - Sun May 10 01:31:31 1998
From: Achim Gratz
Newsgroups: comp.programming.threads
Subject: Re: Mutexes and memory systems (was: synchronously stopping things)

OK, I'm trying to wrap things up a bit, although I guess it will still become
rather long.

Forget caching, memory barriers, store buffers and whatever. These are
hardware implementation details that are out of your control. You don't want
to know and most of the time you don't know everything you'd need to anyway,
which is more dangerous than knowing nothing. Trying to infer the
specification from the implementation is what gets you into trouble.

When you lock mutex A, POSIX gives you a guarantee that all shared data
written by whatever thread under mutex A is visible to your thread, whatever
CPU it might run on, and has been completely written before any reads to it
can occur (this is the ordering part). When you unlock the mutex, it is best
to assume that the shared data vanishes in neverland. It is not guaranteed to
be up-to-date or visible at all by POSIX, nor can you infer any order in
which the writes may be performed or become visible. It is up to the
implementation to employ the hardware in the appropriate manner. Efficient
employment of the hardware is a quality of implementation issue that's to be
considered after the proof of correctness.
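The protocol described above (writer and reader both take the SAME mutex around every access) can be sketched in C with POSIX threads. This is only an illustration under assumed names: `writer_update()`, `reader_snapshot()` and `count_torn_snapshots()` are hypothetical. The writer keeps the invariant b == 2*a, and because the reader locks the same mutex it should never observe a half-finished update.

```c
#include <assert.h>
#include <pthread.h>

/* Shared data and the one mutex that protects it. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_a = 0;
static int shared_b = 0;          /* invariant under the lock: b == 2*a */

void writer_update(int a)
{
    pthread_mutex_lock(&lock);
    shared_a = a;
    shared_b = 2 * a;
    pthread_mutex_unlock(&lock);
}

void reader_snapshot(int *a, int *b)
{
    pthread_mutex_lock(&lock);    /* same mutex as the writer */
    *a = shared_a;
    *b = shared_b;
    pthread_mutex_unlock(&lock);
}

static void *writer_thread(void *arg)
{
    (void)arg;
    for (int i = 1; i <= 100000; i++)
        writer_update(i);
    return NULL;
}

/* Runs a writer thread against a reader loop and returns how many
   inconsistent snapshots the reader saw. */
int count_torn_snapshots(void)
{
    pthread_t tid;
    int torn = 0;
    int rc = pthread_create(&tid, NULL, writer_thread, NULL);
    assert(rc == 0);

    for (int i = 0; i < 100000; i++) {
        int a, b;
        reader_snapshot(&a, &b);
        if (b != 2 * a)
            torn++;
    }
    pthread_join(tid, NULL);
    return torn;
}
```

Dropping the lock/unlock pair from `reader_snapshot()` is exactly the kind of shortcut the rest of this post argues against: it may appear to work on a given machine, but POSIX no longer promises anything about what the reader sees.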
[ I don't have the standard, only Dave's book, but that is the definition
that would allow for the most aggressive memory implementations. It seems
sensible, to me at least, to assume this definition for utmost portability.
It would be interesting to know if the exact wording does indeed support my
interpretation and whether it was intended to be that strong. It would seem
that for multiple concurrent readers you'd need to lock multiple mutexes if
you want to stay within the bounds of the above definition. ]

The problem with this definition and the origin (I think) of the brouhaha in
various (usenet) threads here in this group is that implementation of
multiple reader situations seems overly expensive since a single mutex allows
only a single reader at a time, else you need a mutex for every reader. All
of the schemes presented so far that propose to avoid the mutex locking by
the readers rely on further properties of the hardware or pthread library
implementation. Fortunately or unfortunately these properties exist on most
if not all hardware implementations in use today or the library implementors
have taken care to slightly expand the guarantees made by POSIX because you
usually don't tell the customer that he is wrong.

Why do these hacks work?
  1) no hardware designer can design a system where it takes an unbounded
     time for a write to memory to propagate through the system, as that
     requires infinite resources

  2) no one consciously introduces longer delays than absolutely necessary,
     for reasons of efficiency and stability

  3) there is no implementation, AFAIK, of shared memory that is visible
     only under an associated mutex, although capability based architectures
     might have them

  4) in the absence of 3, keeping a directory of the data changed under each
     mutex is likely to be more expensive than making all data visible and
     introducing order with respect to any mutex except for those that are
     still under lock for writing

  5) on a shared memory system that lacks 3 and 4, once the data has been
     forced to memory by one processor, it is visible to all processors and
     no stray stale copies are in caches if the write occurred under mutex
     lock; even if not ordered on other processors, data in memory becomes
     visible after a bounded time because of 1 and 2

  6) the above holds for ccNUMA systems as well, although the time for
     propagation can be considerably longer

What does not work?

  a) any writes of shared data, without locking, atomic or not, with the
     exception of "one-shot-flags" (i.e.
     any value different from the initial one that has to be set before the
     threads are started signals that some event occurred and you never
     change the value back and you don't care about the exact value itself)

  b) multiple changes to a variable without unlocking/locking between each
     change may not become visible at all or may be seen to have different
     values in different threads

  c) any read of shared data, without locking, where accessing data after
     the writer has released the lock would be an error or getting old data
     can't be tolerated

  d) any read of shared data, without locking, that is not properly aligned
     or is larger than the size of memory transactions (there may be
     several); tearing may occur without notice

  e) porting to a system where strict POSIX semantics are implemented (e.g.
     NUMA systems with software coherency)

Summary: You can't do much safely without those pesky mutexes. The things you
can do aren't, IMHO, in the critical path (performance-wise) most of the
time. The potential payback is low and the original problem can be solved by
barriers just fine within POSIX (I think - it's been a long day).
POSIX1003.1j will even give you barriers that may be more efficient because
the library writers took the time to evaluate the hardware properties in
depth. That suggests that you could indeed wring some cycles out of these
hacks at the expense of portability and correctness. If you do, just make
sure you don't give this piece of code to anybody else.

[ If you think I'm paranoid, perhaps I am. But I'm sick of commercial
software that isn't even linked properly so that it breaks on new machines
and OS releases and sometimes even OS patches when it would be a simple
matter of controlling the build environment to do it right. If you have to
support more than one of these beauties you get a very big headache when you
find out that the intersection of system configurations that work is the null
set. Specifications and standards exist for a reason.
If you don't like them, get them changed. Don't break things gratuitously. ]

Achim Gratz.

=================================TOP=============
Q203: Discussion: Thread creation/switch times on Linux and NT

I'm so excited about this that I had to restate what I now think to be the
key design differences between NT and Linux 2.0 wrt. task switching:

  1. The design of the Linux scheduler appears to make the assumption that,
     at any time during "normal" operation, there will only be a small
     number of actually runnable processes.

  2. The Linux scheduler computes which of these runnable processes to run
     via a linear scan of the run queue - looking for the highest priority
     process.

  3. The Linux yield_cpu() function is EXTREMELY prejudicial towards the
     calling program. If you call yield_cpu() you are not only yielding the
     CPU, but you are also setting your priority to zero (the lowest)
     meaning that you will not run again (because of #2 above) until ALL
     other runnable processes have had a bite at the CPU.

  4. A process, under Linux, steadily has its priority lowered as a function
     of how long it has been scheduled to the CPU.

  5. The Linux scheduler re-sets everyone's priority to a "base" priority
     once all of them have had their priority lowered to zero (either
     through #3 or #4). This re-set entails another linear traversal of the
     run queue in schedule().

Comments:

  a: #1 is probably a very reasonable assumption.

  b: #2 causes task-switching time, on Linux, to degrade as more runnable
     processes are added. It was obviously a design decision driven by the
     assumption in #1.

  c: #3 is, to me, a contentious issue. Should you get penalized for
     voluntarily yielding the CPU - should it put you on the "back of the
     bus" or should it simply lower your priority by one? After all, most
     other voluntary yields (such as for I/O or to sleep for a time) usually
     raise your priority under other UNIXes (don't know if that's the case
     with Linux - haven't checked).
     In either case, Mingo's code changes this policy.

  d: #4 is standard, textbook, OS stuff.

  e: #5 is another reasonable behavior. The linear scan is, again, a
     function of the belief that #1 is true (or so I believe).

  f: Because of the combined effects of #1, #2, #3, and #5 my yield_cpu()
     benchmark was indeed extremely prejudicial to Linux since the
     assumptions that I was making were not the same as those of Linux's
     designers. That doesn't mean my benchmark is, or was, a "bad"
     benchmark. Quite the contrary - it illustrates in painful detail what
     happens when the designers of a system are using different criteria
     than those of the users of the system. It is up to the community to
     decide which criteria are more valid.

The net result is that Linux may well beat out NT for context switches where
the number of runnable processes is very small. On the other hand, NT appears
to degrade more gracefully as the runnable process count increases. Which one
is a "better" approach is open to debate.

For example, we could probably make Linux degrade gracefully (through
hashing, pre-sorting, etc.), as does NT, at the expense of more up-front work
with the resultant degradation in context-switch time where the # of
processes is very small.

On the other hand, the crossover point between Linux vs. NT appears to be
right around 20 runnable processes. On a heavily loaded web server (say) with
20-40 httpd daemons plus other code, does the "real world" prefer the NT way
or the Linux way? How about as more and more programs become multithreaded?

The great thing about Linux is that we have the source - thus these
observations can be made with some assurance as to their accuracy. As for NT,
I feel like the proverbial blind man trying to describe something I've never
seen. The other great thing is that we can change it in the manner that best
suits our needs. I love choice and I hate Microsoft.

greg

Let us pray: What a Great System. Please Do Not Crash.
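The behaviour in points 2, 3 and 5 above can be modelled with a toy run queue. This is a hedged sketch, not kernel code: `struct task`, `struct runqueue`, `BASE_COUNTER` and the function names are all invented for illustration, point 4 (time-slice decay) is left out, and the model holds only runnable tasks, whereas the posts in this thread say the real recalculation walks every process.

```c
#include <assert.h>
#include <stddef.h>

#define BASE_COUNTER 10            /* hypothetical "base" priority */

struct task { int counter; };      /* remaining priority of one task */

struct runqueue {
    struct task *tasks;            /* runnable tasks only */
    size_t n;
    unsigned long goodness_calls;  /* tasks examined while picking */
    unsigned long recalcs;         /* number of priority resets */
};

/* Point 2: linear scan for the task with the highest counter. */
static size_t pick_next(struct runqueue *rq)
{
    size_t best = 0;
    for (size_t i = 0; i < rq->n; i++) {
        rq->goodness_calls++;
        if (rq->tasks[i].counter > rq->tasks[best].counter)
            best = i;
    }
    return best;
}

size_t schedule(struct runqueue *rq)
{
    assert(rq->n > 0);
    size_t next = pick_next(rq);
    if (rq->tasks[next].counter == 0) {
        /* Point 5: everyone is at zero, so reset all counters and
           scan again. That is a second linear pass. */
        for (size_t i = 0; i < rq->n; i++)
            rq->tasks[i].counter = BASE_COUNTER;
        rq->recalcs++;
        next = pick_next(rq);
    }
    return next;
}

/* Point 3: yielding drops the caller's counter straight to zero. */
size_t yield_cpu(struct runqueue *rq, size_t current)
{
    assert(current < rq->n);
    rq->tasks[current].counter = 0;
    return schedule(rq);
}
```

With three tasks that all start at `BASE_COUNTER`, three successive yields visit tasks 1, 2 and then 0 again, and the third yield triggers one reset. Each pick examines every runnable task, which is the linear growth in switch cost that comment b: describes.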
From: [email protected] (Linus Torvalds)
Subject: Re: Thread creation/switch times on Linux and NT (was Re: Linux
         users working at Microsoft!)
Date: 8 Mar 1998 01:31:03 GMT
Organization: Transmeta Corporation, Santa Clara, CA

In article , Greg Alexander wrote:
>In article <[email protected]>, Gregory Travis wrote:
>>All process priorities were recomputed 99,834 times - or just
>>0.5% of the time. Furthermore, only 31,059,779 processes (total)
>>were examined during those recalcs as opposed to the 61,678,377 that
>>were examined by the much more expensive "goodness" function.
>>
>>From my perspective, this would tend to strongly favor the current
>>scheduling implementation (simple linear search as opposed to more
>>complex but robust hashed run queue) - at least for web serving (strong
>>emphasis on the latter). If I were to look for improvements, under this
>>scenario, I would focus on the "goodness" function since 4% of the time
>>we had to throw ten or more processes through it. Perhaps bringing it
>>inline with the sched() function.
>>
>>But even that may be overkill since we only called sched() 24,221,164
>>times over a 17-hour period - or about 400 times per second.
>>
>>Comments? I would be happy to make my modifications available (they are
>>trivial) to anyone who wants to instrument their own application.
>
>My biggest suggestion is to try kernel profiling. Check if any notable
>amount of time is actually spent in goodness before worrying about changing
>it.

Also, check out 2.1.x - there are some changes to various details of the
scheduler that were brought on by the finer locking granularity, but that
were sometimes also related to performance.

I do obviously agree with the basic points above - I wrote most of the
scheduler. Usually there aren't all that many runnable processes even under
heavy load, and having a very simple linear queue is a win for almost all
situations in my opinion.
For example, if there are lots of processes doing IO, the process list tends
to be fairly short and you really want a very simple scheduler for latency
reasons. In contrast, if there are lots of CPU-bound processes, there may be
lots of runnable processes, but it very seldom results in a re-schedule
(because they keep running until the timeslot ends), so again there is no
real reason to try to be complex.

So yes, under certain circumstances the current scheduler uses more CPU than
strictly necessary - and the "40 processes doing a sched_yield() all the
time" example is one of the worst (because it implies a lot of runnable
processes but still implies continuous thread switching).

Personally I don't think it's a very realistic benchmark (it tells you
_something_, but I don't think it tells you anything you need to know), which
is one reason why Linux isn't maybe the best system out there for that
particular benchmark. But it would be easy enough to make Linux perform
better on it, so I'll think about it.

[ Even when I don't find benchmarks very realistic I really hate arguing
against hard numbers: hard numbers are still usually better than just plain
"intuition". And I may well be wrong, and maybe there _are_ circumstances
where the benchmark has some real-world implications, which is why I wouldn't
just dismiss the thing out-of-hand. It's just too easy to ignore numbers you
don't like by saying that they aren't relevant, and I really try to avoid
falling into that trap.
]

The particular problem with "sched_yield()" is that the Linux scheduler
_really_ isn't able to handle it at all, which is why the Linux sched_yield()
implementation sets the counter to zero - I well know that it's not the best
thing to do for performance reasons, and I think it unduly penalizes people
who want to yield some CPU time, but as it stands the scheduler can't handle
it any other way (the "decrement counter by one" approach that Ingo suggested
is similarly broken - it just happens to not show it quite as easily as the
more drastic "zero the counter", and it has some other problems - mainly that
it doesn't guarantee that we select another process even if another one were
to be runnable).

I should probably add a "yield-queue" to the thing - it should be rather easy
to do, and it would get rid of the current scheduler wart with regard to
sched_yield(). My reluctance is purely due to the fact that I haven't heard
of any real applications that it would matter for, but I suspect we need it
for stuff like "wine" etc that need to get reasonable scheduling in threaded
environments that look different from pthreads().

Linus

From - Sun Mar 8 15:03:12 1998
From: [email protected] (Gregory Travis)
Subject: Re: Thread creation/switch times on Linux and NT (was Re: Linux
         users working at Microsoft!)

Here's some more data, using the latest version of my context switching
benchmark. Test machine is a 64MB 200MHz Pentium "classic".

Switch               Number of processes/Threads
Time               2     4     8    10    20    40
                ----  ----  ----  ----  ----  ----
Std. Procs      19us  13us  13us  14us  16us  27us
Std. Threads    16us  11us  10us  10us  15us  23us

Mingo Procs      4us   6us  11us  12us  15us  28us
Mingo Threads    3us   3us   5us   7us  12us  22us

NT Procs        10us  15us  15us  17us  16us  17us
NT Threads       5us   8us   8us   9us  10us  11us

Explanation:

The "Std." entries show the results of my yield_cpu() benchmark against the
standard Linux scheduler using either threads or processes.
The "Mingo" entries show the results of the same benchmark but after the
Linux yield_cpu() entry has been modified per Mingo's suggestion so that it
doesn't take the counter to zero. The "NT" entries show the results of the
benchmark under NT.

Each benchmark was run twice for each number (to promote accuracy). Thus the
above is the result of 72 individual runs.

Analysis:

The dramatic drop in context switch time, between the "Std." and "Mingo" runs
shows how expensive the priority recalc can be - for short run queues at
least. Note that it makes little or no difference as the run queue length
exceeds about 10 processes. This is almost certainly because the cost of the
"goodness" function begins to dominate the picture.

For a given number of iterations, the goodness function is much more
expensive than the priority recalc function. The goodness function must be
performed on each runnable process while the priority recalc must be
performed on all processes. Thus with a small # of runnable processes, the
expensive goodness function is not called much while the "cheap" priority
recalc is called for each process, runnable or not. As the run queue grows,
however, the goodness function is called more (while the priority recalc
function is essentially constant). Around ~15 processes, on my system, the
cost of "goodness" washes out the noise from the priority recalc.

Nevertheless, the context switch times shown in the "Mingo" series are
probably closest to the actual Linux context switch times. Note how the
series dramatically illustrates how context switch overhead, on Linux, grows
as a function of the run queue length.

It appears that the context-switch overhead for Linux is better than NT for
shortish run queues and, especially, where process/process switch time is
compared. With run queues longer than about 20 processes, though, NT's
scheduler starts to beat out the Linux scheduler.
Also note that NT's scheduler appears more robust than the Linux scheduler -
its degradation as the run queue grows is nowhere near as dramatic as
Linux's. NT's thread switch times doubled between 2 and 40 threads while
Linux's showed a >sevenfold< slowdown.

Does it matter? Quite probably not. From my earlier posting, with data from a
heavily loaded webserver, I saw an average run queue length of 2.5 processes.
The run queue exceeded 10 processes only about 4% of the time.

I've put my benchmarks, as well as the kernel changes to record run queue
length, on anonymous ftp at weasel.ciswired.com

greg

From - Sun Mar 8 15:04:37 1998
From: [email protected] (Gregory Travis)
Subject: Re: Thread creation/switch times on Linux and NT (was Re: Linux
         users working at Microsoft!)

In article , Greg Alexander wrote:
>In article <[email protected]>, Gregory Travis wrote:
>>Here's some more data, using the latest version of my context switching
>>benchmark. Test machine is a 64MB 200MHz Pentium "classic".
>>
>>Switch               Number of processes/Threads
>>Time               2     4     8    10    20    40
>>                ----  ----  ----  ----  ----  ----
>>Std. Procs      19us  13us  13us  14us  16us  27us
>>Std. Threads    16us  11us  10us  10us  15us  23us
>>
>>Mingo Procs      4us   6us  11us  12us  15us  28us
>>Mingo Threads    3us   3us   5us   7us  12us  22us
>>
>>NT Procs        10us  15us  15us  17us  16us  17us
>>NT Threads       5us   8us   8us   9us  10us  11us
>
>Does this look to you like NT maybe never traverses the tree and never
>updates priorities (assuming it even switches every time)? This indicates
>non-complexity, which is beautiful, but I bet that they didn't do it well.
>(NT being VMS's deranged nephew or something)

I don't know what NT's scheduling algorithm is. I'm very surprised, given
your comments below, that you are venturing an opinion on how NT works. It
may not even use a list (what you referred to as a "tree" which it is not in
Linux) at all.
>Please, please, /PLEASE/ use profiling when talking about "this is almost
>certainly because the cost of the goodness function begins to dominate the
>picture." It will tell you exactly which function dominates which picture
>quite clearly and simply. It's much easier to say "goodness takes so much
>time, the recalc takes this much time," than bothering to make appeals of
>logic "goodness should take more time because." Not that the latter is a
>bad idea in any case, just to explain why, but you should never explain why
>something is happening that you aren't certain is happening if you have an
>alternative.

Greg, so far you've contributed nothing positive to this venture other than
making most of us painfully aware that you don't even understand ulimit and
that your favorite way of showing how smart you are is by throwing out red
herrings at every opportunity.

I'll tell you what - why don't you try and reverse that impression? I spent
about five hours of my life last night running the above sequence (not to
mention all the rest of the time I've devoted to this). For the past twenty
years I've been paid to design and write software [including a UNIX kernel
release that used a scheduler I wrote] during the day so perhaps you'll
forgive me if I want to take this evening off and instead watch Bill Gates
lie on CSPAN.

So, here's something positive you can do: profile the kernel. All my sources
and kernel changes are at weasel.ciswired.com (anonymous ftp). Why don't you
take them and report back to us with your findings? That would be very nice,
thanks. Don't forget to do it with and without Mingo's very helpful changes.

>Note that there are variables here you are controlling unintentionally.
>Your statement would be better made as "With my benchmark and runqueues
>longer than about 20 processes, though, NT's..."
>or, to be specific, "When
>all runnable processes are calling sched_yield() in a loop and there are a
>minimal number of non-runnable processes and runqueues are longer than about
>20 processes..." and I'm sure there are plenty of other variables I've left
>out. Having only about 80 processes, with 40 of them in a loop calling
>sched_yield(), you will not get general purpose numbers. I'd almost expect
>more dormant processes to slow down linux more than NT in this case, but I
>don't know what would happen if the dormant processes were more like your
>"real life" example, i.e. many IO-bound programs that are awakened
>frequently, with an average of some number of them in the runqueue at once.

You have an awful lot of "I'd almost expect," "I don't know," and "I'm sure"
statements for a guy who earlier so soundly admonished me for stating what
was clearly my opinion.

>NO! Robust is the WRONG word! Robust implies it can handle many different
>situations. It is better at /THIS/ situation with large numbers of idling
>runnable processes. Your test does not show how NT runs in real life
>situations.

I can accept that. Where can I download your test?

>If NT's scheduler really were more "robust," it would matter a good deal.
>All you've shown is that its times don't appear to grow linearly as the
>number of runnable idling processes grows.

Thank you. That's all I claimed to show (along with the switch times).

greg

=================================TOP===============================
Q204: Are there any problems with multiple threads writing to stdout?

> > > However, even if there are no problems, you may be seeing interleaved
> > > output:
> > >
> > >   example:
> > >
> > >     printf("x=%d, y=%d\n", x, y);
> > >
> > > there is no guarantee that x and y will appear on the same line
> >
> > Surely, printf() will lock the stream object (if you use the MT safe
> > glibc2), no?
>
> Not on Linux, or any other UNIX variant I've dealt with. UNIX is used
> to it, even before threads.
stdout on NT doesn't make sense unless it's
> a console application.

For POSIX conformance, printf() must lock the process' stdio file stream.
That is, the output is "atomic". Thus, if two threads both call a single
printf() simultaneously, each output must be correct. E.g., for

    printf ("%d, %d\n", 1, 2);          printf ("%s, %s\n", "abc", "def");

you might get

    1, 2
    abc, def

or you might get

    abc, def
    1, 2

but no more "bizarre" variations. If you do, then the implementation you're
using is broken.

There is another level of complication, though, if you're talking about the
sequence of multiple printf()s, for example. E.g., if you have

    printf ("%d", 1);                   printf ("%s", "abc");
    printf (", %d\n", 2);               printf (", %s\n", "def");

Then you might indeed get something like

    abc1, def
    , 2

POSIX adds an explicit stdio stream lock to avoid this problem, which you
can acquire using flockfile() and release using funlockfile(). For example,
you could correct that second example by coding it as

    flockfile (stdout);                 flockfile (stdout);
    printf ("%d", 1);                   printf ("%s", "abc");
    printf (", %d\n", 2);               printf (", %s\n", "def");
    funlockfile (stdout);               funlockfile (stdout);

Of course, if you write to the same file using stdio from separate
processes, there's no synchronization between them unless there are some
guarantees about how stdio generates the actual file descriptor write()
calls from its internal buffering. (And I don't believe there are.)

/---------------------------[ Dave Butenhof ]--------------------------\
| Compaq Computer Corporation                          [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP===============================
Q205: How can I handle out-of-band communication to a remote client?
Stefan Rupp wrote:
> Good afternoon,
>
> we encountered a problem in the design of our client-server architecture.
> The situation is as follows:
>
> [1] the server runs as a daemon process on an arbitrary host
> [2] the client may connect to any number of servers
> [3] when connected, the client requests data from a server
>     through a TCP socket and waits for the server to deliver
>     the requested data
> [4] the server itself can send messages to the client at any
>     time without being asked by the client
>
> In a first step, I designed the client multithreaded, consisting of the
> 'main' thread and an 'I/O' thread, which handles the communication between
> the client and the server through a SOCK_STREAM socket with a select(2)
> call. The connection between the main thread and the I/O thread is made
> through a pair of pipes, so that the select call, which waits for
> incoming messages from the server as well as from the main thread,
> returns and handles the request. To open a new I/O thread for each server
> the client wants to connect to, is probably not a good idea, because I
> need two pipes for each thread to communicate with. So, only one I/O
> thread must handle the connection to any server the client connects to.
>
> Does anybody have a better idea how to design the client, so that it
> can handle unexpected callbacks from the server at any time? In the
> book "UNIX Network Programming" it is stated that signal driven I/O
> is not advisable for a communication link through stream sockets, so
> that is not an option.
>
> Thanks!
>
> Doei,
> struppi
>
> --
> Dipl.-Inform. Stefan H. Rupp
> Geodaetisches Institut der RWTH Aachen    Email: [email protected]
> Templergraben 55, D-52062 Aachen, Germany Tel.: +49 241 80-5295
>

Change the client a little. Have one thread that waits on the responses
from the socket. This is a blocking call, so it is VERY efficient (though
you will want a timeout in there, every few seconds, to do housekeeping and
to check for shutdown).
Have a second thread that sends messages to the server on the socket. This
is safe, because sockets are bidirectional async. devices.

If the receive thread knows how to deal with messages from the server, the
architecture is quite simple. You may need a queue of messages waiting to
be processed if processing time is long, or a queue of messages to send to
the server to prevent contention on SENDING to the server.

We have implemented a client server using such an architecture. It works
very well with full async. bidirectional messaging between client and
server. The server can deal with 1500 messages (total, not each) a second
from 200 clients.

Nick

=================================TOP===============================
Q206: I need a timed mutex for POSIX

[email protected] wrote:
> I am doing multi-platform development, and have got several very successful
> servers running on NT and on AIX. The problem is that NT is MUCH more
> efficient in its MUTEX calls than AIX because of the POSIX mutex int
> pthread_mutex_lock (mutex) does not have a timeout, for that reason I need
> to do a loop doing a pthread_mutex_trylock (mutex) and a 20 millisecond sleep
> until timeout ( usually 5 seconds )

Why? Or, more specifically, exactly what do you intend to do when the loop
times out?

* Which thread owns the mutex? (No way to tell, without additional
  information that cannot be used reliably except under control of a mutex;
  and you've already declared that, in your application, the mutex usage
  protocol is unreliable.)

* What is that thread doing? Is it hung? Broken? Did it get preempted and
  miss a deadline, but is "still ticking"? Unless you know that (not
  impossible, but EXTREMELY difficult to implement, much less to get
  right), you CANNOT "steal" the mutex, or know what to do once you've got
  it.

* You cannot force the owner of the mutex to unlock. You cannot unlock from
  your current thread. You can't assume you now own it.
  If you knew the owner, you could cancel it and join with it (as long as
  you know nobody else is already joining with it), hoping that "it's
  broken but not TOO broken". But then what happens if it doesn't
  terminate, or if it's sufficiently broken that it doesn't release the
  mutex on the way out?

This is the kind of thing that may sound "way cool" for reliable, fail-safe
servers. In practice, I doubt the value. That kind of fail-safety is almost
always complete illusion except in rigorously isolated embedded system
environments. And in such an environment, it's trivial to write your own
pthread_mutex_timedwait() or work out some alternate (and probably better)
method to recover your runaway state.

In a fully shared memory multithreaded server, when something's "gone
wrong" and you lose control (and that's what we're talking about), the ONLY
safe thing to do is to panic and crash the process, NOW. You can run the
server under a monitor parent that recognizes server exit and forks a new
copy to continue operation. You can keep operation logs to recover or roll
back. But you cannot make the process "fail safe".

> The problem is this is inefficient. NT has a Wait_for_MUTEX with timeout.
> this is good.
> (bummer, Bill got it right :-( )

No. Just another misleading and overly complicated function that looks
neato-keen on paper. Any code that really, truly DEPENDS on such a
capability is already busted, and just doesn't know it yet.

(Oh, and, yes, I say this with the explicit knowledge that all
generalizations are false, including this one. There is certainly code that
doesn't need to be 100% fail safe, and that may be able to productively use
such primitives as a timed mutex wait to slightly improve some failure
modes. Maybe, in a very few cases, maybe even yours, all of the time and
effort that went into it provides some real benefit. "The one absolute
statement I might make is that none of my statements are absolute."
;-) )

You can put together a "timed mutex" yourself, if you want, using a mutex
and a condition variable. Use the mutex to serialize access to control
information, such as your own ownership and waiter data, and use a
condition variable to wait for access. A waiter that times out can then
determine which thread (in your APPLICATION scheme) owns the "mutex". Of
course, if the application is really ill-behaved, then even the "control
mutex" might not be unlocked -- I doubt you could do much in that case,
anyway.

One final note. As I said, such "unusual" things as timed mutex waits CAN
make sense for carefully coded embedded application environments, and the
folks in the POSIX realtime working group worry about that sort of thing a
lot. While the concept of timed mutex waits was passed over for POSIX
1003.1c-1995 as too specialized, the "additional realtime features"
standard, 1003.1d (still in draft form), adds pthread_mutex_timedwait.

/---------------------------[ Dave Butenhof ]--------------------------\
| Compaq Computer Corporation                          [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

[If you *really* need a timed mutex, you can look at the sample code for
timed mutexes on this web page -- Bil]

=================================TOP===============================
Q207: Does pthreads has an API for configuring the number of LWPs?

"Hany Morcos (CS)" wrote:
> Hi, does pthreads has an API for configuring the number of
> lwps for a specific sets of threads. Or does most OS's assign
> an lwp per a group of thread.

The short answer: PThreads, no. But UNIX98 includes a
pthread_setconcurrency() extension to the POSIX thread API.

The long answer:

First, "LWP" is a Solaris-specific (actually, "UI thread" specific, but who
cares?)
term for a kernel thread used to allow a threaded process to exploit O/S
concurrency and hardware parallelism. So "most OS"s don't have LWPs, though
they do have some form of kernel threads.

(Note, this is all probably more than you want or need, but your question
is rather "fuzzy", I tend to prefer to give "too much" information rather
than "not enough", and for some reason I appear to be in a "talkative"
mood... ;-) )

POSIX 1003.1c-1995 ("pthreads") deliberately says very little about
implementation details, and provides few interfaces specifically to control
details of an implementation. It does allow for a two-level scheduler,
where multiple kernel threads and POSIX threads interact within a process,
but provides only broad definitions of the behavior. There is no way to
directly control the scheduling of PCS threads ("Process Contention Scope",
or "user mode") onto "kernel execution entities" (kernel threads). Although
there is a mechanism to avoid user-mode scheduling entirely, by creating
SCS ("System Contention Scope") threads, which must be directly scheduled
by the kernel. (Or at least must behave as if so scheduled, with respect to
threads in other processes.)

There's no form of "thread grouping" supported. Some systems have class
scheduling systems that allow you to specify relations between threads
and/or processes, but there's nothing of the sort in POSIX. (Nor, if there
were, would it necessarily group threads to an LWP as you suggest.)

POSIX threads requires that a thread blocking for I/O cannot indefinitely
prevent other user threads from making progress. In some cases, this may
require that the implementation provide a new kernel execution entity.
It can do so either as a "last ditch" effort to prevent completely stalling
the process (as Solaris generally does, by creating one additional LWP as
the last current LWP in the process blocks), or as a normal scheduling
operation (as Digital UNIX does) to always maintain a consistent level of
PCS thread concurrency in the process. (While I prefer the latter, and
experience has shown that this is what most people expect and desire, POSIX
doesn't say either is right or wrong; and in addition, there are costs to
our approach that aren't always repaid by the increased concurrency.)

UI threads was designed to allow/require the programmer to control the
level of process concurrency, and Sun's POSIX thread implementation uses
the same thread scheduler as their UI thread implementation. While the
"last ditch" LWP creation prevents indefinite stalls of I/O-bound
applications, it doesn't help applications with multiple compute-bound
threads (the implementation doesn't time-slice PCS threads). And, at best,
the model allows the process concurrency to be reduced to 1 before offering
any help. (Digital UNIX does time-slice PCS threads, so compute-bound
threads can coexist even on a uniprocessor [though this isn't the most
efficient application model, it's common and worth supporting].)

UI threads provides a thr_setconcurrency() call to allow a careful
programmer to dynamically "suggest" that additional LWPs would be useful.
Due to Sun influence (and various other vendors who had intended similarly
inflexible 2-level schedulers), the Single UNIX Specification, Version 2
(UNIX98) includes a pthread_setconcurrency() extension to the POSIX thread
API. Due to increasing cooperation between The Open Group and PASC (the
IEEE group that does POSIX), you can expect to see the UNIX98 extensions
appear in a future version of the POSIX standard.
Note that while this function is essential on Solaris, it has no purpose
(and does nothing) on Digital UNIX (or on Linux, which uses only kernel
threads). I expect other vendors to move away from hints like
pthread_setconcurrency() as they (and their users) get more experience with
threading. The need for such hackery is largely responsible for the
unsettlingly common advice of UI thread wizards to avoid the Solaris
default of PCS threads ("unbound", in UI terminology) and to use SCS
threads ("bound") instead.

In some ways this is much like the old Win32 vs. Mac OS debate on
preemptive vs. cooperative multitasking. While cooperative multitasking (or
the simplistic/efficient Solaris 2-level scheduling) can be much better for
some class of applications, it's a lot harder to write programs that scale
well and that work the way users expect with (unpredictable) concurrent
system load. While preemptive multitasking (or tightly integrated 2-level
scheduling) adds (system) implementation complexity and some unavoidable
application overhead, it's easier to program for, and, ultimately, provides
more predictable system scaling and user environment.

> Wouldn't make more sense if one lwp blocks for a disk I/O
> instead of the entire program, when using green threads.

"Green threads" is the user-level threading package for Java. It doesn't
use multiple kernel threads, and therefore cannot use hardware parallelism
or true I/O concurrency (although it has hooks to use non-blocking UNIX I/O
to, in many cases, schedule a new user thread while waiting for I/O).
Modern implementations of Java should use native threads rather than Green
threads. In the case of a Solaris Java using UI threads or POSIX threads
rather than Green threads, disk I/O WOULD block only the LWP assigned to
the calling thread. There's no reason to be using Green threads on any O/S
that has real thread support!
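
As a rough illustration of the two knobs discussed earlier in this answer
(the pthread_setconcurrency() hint and SCS, or "bound", threads), here is a
minimal sketch in C. The names run_bound_thread() and task() are made up
for the example, and the concurrency level of 4 is an arbitrary value. The
_XOPEN_SOURCE define is an assumption about what some C libraries need
before they declare pthread_setconcurrency(); your system may differ.

```c
#define _XOPEN_SOURCE 600   /* assumed necessary on some libcs to declare
                               pthread_setconcurrency() */
#include <assert.h>
#include <pthread.h>

/* Hypothetical start routine: just hands its argument back. */
static void *task(void *arg)
{
    return arg;
}

/* Create one SCS ("bound") thread and join with it.  Returns 0 on
   success, a pthread error number if a call fails, or -1 if the
   thread returned an unexpected value. */
int run_bound_thread(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    void *result = NULL;
    int token = 0;
    int rc;

    /* Only a hint.  As the text says, it may do nothing at all on
       systems that schedule every thread in the kernel. */
    (void)pthread_setconcurrency(4);

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Ask for System Contention Scope: scheduled directly by the kernel. */
    rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, task, &token);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_join(tid, &result);
    if (rc != 0)
        return rc;
    return result == &token ? 0 : -1;
}
```

An implementation is allowed to reject a contention scope it does not
support, so the return value of pthread_attr_setscope() is worth checking
rather than assuming.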
> I guess now it is very safe for multiple threads to directly
> write to a stream queue since write and read are thread safe.

Java I/O must be thread-safe. ANSI C I/O under a POSIX thread
implementation must be thread-safe. However, there's no standard anywhere
requiring that C++ I/O must be thread-safe -- nor is there for most other
languages. So you need to watch out HOW you write. If you're writing Java
or C, you're probably pretty safe. In any other language, watch out unless
you're using ANSI C/POSIX I/O functions directly.

/---------------------------[ Dave Butenhof ]--------------------------\
| Compaq Computer Corporation                          [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP===============================
Q208: Why does Pthreads use void** rather than void*?

Ben Elliston wrote:
> Wang Bin writes:
>
> > Today, when I was looking at thr_join(thread_t, thread_t*, void**),
> > I was suddenly confused by void* and void**. Why the third parameter
> > here is void** rather than void*?
>
> The third parameter is a void * so that the result can be anything you
> like--it's a matter of interpretation. However, you need to pass the
> address of a void * so that the function can modify the pointer.

The POSIX thread working group wanted to specify a way to pass "any value"
to and from a thread, without making the interface really bulky and
complicated. The chosen way (it's often been debated whether the approach
was right, or good, but that's all irrelevant now) was to use "void*" as a
universal (untyped) value. It's NOT necessarily a pointer (though of course
it may be)... it's just an untyped value. The UI thread interface (defined
by many of the same people) has the same logic.
So when you create a thread, you pass in a "void*" argument, which is
anything you want. When a thread terminates, either by returning from its
start routine or by calling pthread_exit (or thr_exit), it can specify a
"void*" return value. When you join with the thread, you can pass the
function a POINTER to some storage that will receive this thread return
value. The storage to which you point must, of course, be a "void*".

Beware ("be very, very ware", as Pooh was once warned), because this
mechanism, while often convenient, is not at all type-safe. It's really
easy to get yourself into trouble. Do not, EVER, pass the address of
something that's not "void*" into thr_join/pthread_join and simply cast the
pointer to (void**). For example, let's look at

    size_t TaskStatus;
    ......
    thr_join(..., ..., (void**)&TaskStatus);

(This is slightly different from Ben's example. He cast the pointer to
"void*"... that'll work, since ANSI C is willing to implicitly convert
between any pointer type and "void*", but the parameter type is actually
"void**".)

What is the SIZE of size_t? Well, on a conventional 32-bit system, size_t
and "void*" are probably both 32 bits. On a conventional 64-bit LLP system,
they're probably both 64 bits. But ANSI C doesn't require that conformity.
So what if size_t is a 32-bit "int", while "void*" is 64-bit? Well, now
you've got 32 bits of storage, and you're telling the thread library to
write 64 bits of data at that address. You've also told the compiler that
you really, really, for sure know what you're doing. But you really don't,
do you?

The construct is extremely common, but it's also extremely dangerous,
wrong, and completely non-portable! Do it like the following example,
instead. It's a little more complicated, but it's portable, and that may
save you a lot of trouble somewhere. Your compiler might warn you that
size_t is smaller than void*, in cases where you might have otherwise
experienced data corruption by overwriting adjacent storage.
If the original value passed by the thread really WAS a size_t, the extra
bits of the void* would have to be 0, [or redundant sign bits if "size_t"
is signed and the value is negative], and losing them won't hurt you.

    void *result;
    size_t TaskStatus;
    ......
    thr_join(..., ..., &result);
    TaskStatus = (size_t)result;

/---------------------------[ Dave Butenhof ]--------------------------\
| Compaq Computer Corporation                          [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP===============================
Q209: Should I use poll() or select()?

[email protected] (W. Richard Stevens) writes:

>Second, I used to advocate select() instead of poll(), mainly because
>of portability, but these days select() is becoming a problem for
>applications that need *lots* of descriptors. Some systems let you
>#define FD_SETSIZE to use more than their compiled-in limit (often
>256 or 1024), some require a recompile of the kernel for more than
>the default, and some require a recompile of the library function
>named select(). For these applications poll() is better, as there
>is no inherent limit (other than the per-process descriptor limit).

Indeed.

>Another complaint I had against poll() was how hard it was to remove
>a descriptor from the array (typical for network servers when a client
>terminates), but now you just set the descriptor to -1 and it's ignored.

But that was fixed a long, long time ago.

Which brings me to another advantage of poll(): you just specify the events
you are interested in once; poll uses a different field for the result
events. (So no resetting of bits in the select masks.)

Also, on Solaris, select() is a library routine implemented on top of
poll(); that costs too.
(Though on other systems it might be the reverse)

Casper
--

=================================TOP===============================
Q210: Where is the threads standard of POSIX ????

try

http://www.unix-systems.org/single_unix_specification_v2/xsh/threads.html

=================================TOP===============================
Q211: Is Solaris' unbound thread model braindamaged?

"Doug Royer [N6AAW]" wrote:
> Did you have a specific braindamaged bug to report?
>
> In article <[email protected]>, Boris Goldberg writes:
> >
> > I briefly browsed Solaris 7 docs at docs.sun.com and, regrettably,
> > it doesn't appear that they changed their braindamaged threading model.

Actually, I think Boris phrased that very well. In particular, he didn't
use the word "bug". He merely said "braindamaged". One might easily infer
(as I have) that he's making the assumption that the "braindamaged"
behavior is intentional, and simply expressing regret that the intent
hasn't changed.

Here's a few of the common problems with Solaris 2-level threading. I
believe one of them could accurately be described as a "bug" in Solaris
(and that's not confirmed). The others are merely poor design decisions.
Or, in common terms, "brain damage".

1. thr_setconcurrency() is a gross hack to avoid implementing most of the
   2-level scheduler. It means the scheduler puts responsibility for
   maintaining concurrency on the programmer. Nice for the Solaris thread
   subsystem maintainers -- not so nice for users. (Yes, UNIX has a long
   and distinguished history of avoiding kernel/system problems by
   complicating the life of all programmers. Not all of those decisions are
   even wrong. Still, I think this one is unnecessary and unjustifiable.)

2. Rumor has suggested that Solaris creates one LWP by default even on SMP
   systems -- if that rumor is true, this condition might shade over the
   line into "true bug". But then, having an SMP isn't necessarily the same
   as being able to use it, so maybe that's deliberate, too.

3.
   Blocking an LWP reduces the process concurrency. Yeah, sure the library
   will create a new one when the last LWP blocks, but that's not good.
   First, it means the process has been operating on fewer cylinders than
   it might think for some period of time. And, in many cases even worse,
   after the LWPs unblock, it will be operating on more cylinders than it
   can sustain until the LWPs go idle and time out. Running with more LWPs
   than processors is rarely a good idea unless most of them will always be
   blocked in the kernel. (I've heard unsubstantiated rumors that 2.6 did
   some work to improve on this, and 7 may do more; but I'm not inclined to
   let anyone "off the hook" without details.)

4. While timeslicing is not required by POSIX, it is the scheduling
   behavior all UNIX programmers (and most who are used to other systems,
   as well) EXPECT. The lack of timeslicing in Solaris 2-level scheduling
   is a constant source of complication and surprise to programmers. Again,
   this isn't a bug, because it's clearly intentional; it's still a bad
   idea, and goes against the best interests of application programmers.

/---------------------------[ Dave Butenhof ]--------------------------\
| Compaq Computer Corporation                          [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP===============================
Q212: Releasing a mutex locked (owned) by another thread.

Zoom wrote:
> Hello, I have inherited the maintenance of a multi-threaded application.
> The application uses pthreads and runs on multiple platforms including
> solaris. On solaris it seems to be somewhat squirrely (the technical
> term of course :-) and I get random core dumps or thread panics.
> Absolutely not consistently reproducible.
Sometimes it will go for
> hours or days cranking away and sometimes it will thread panic shortly
> after it starts up. In researching the book "Multi-threaded
> Programming with Pthreads" by Bil Lewis et al. I found on page 50 the
> statement to the effect that under posix it is illegal for one thread to
> release a mutex locked (owned) by another thread. Well, this application
> does that. In fact it does it quite extensively.
>
> Is there anyone willing to commit to the idea that this may be the
> source of the application's problems.

The answer is an absolutely, definite, unqualified "maybe".

It depends entirely on what the application is doing with those mutexes.

First, I want to be completely clear about this. Make no mistake, locking a
mutex from one thread and unlocking it from another thread is absolutely
illegal and incorrect. The application is seriously broken, and must be
fixed.

However, reality is a little more complicated than that. POSIX explicitly
requires that application programmers write correct applications. More
specifically, should someone write an incorrect application, it explicitly
and deliberately does NOT require that a correct implementation of the
POSIX standard either DETECT that error, or FAIL due to that error. The
results of programmer errors are "undefined". (This is the basis of the
POSIX standard wording on error returns -- there are "if occurs" errors,
which represent conditions that the programmer cannot reasonably
anticipate, such as insufficient resources; and there are "if detected"
errors, which are programmer errors that are not the responsibility of the
implementation. A friendly/robust implementation may choose to detect and
report some or all of the "if detected" errors -- but even when it fails to
detect the error, it's still the application's fault.)

The principal difference between a binary semaphore and a mutex is that a
mutex carries with it the concept of "ownership".
It is that characteristic that makes it illegal to unlock the mutex from
another thread. The locking thread OWNS the mutex, exclusively, until it
unlocks the mutex.

IF an implementation can (and chooses to) detect and report violations of
the ownership protocol, the erroneous attempt at unlock will result in an
EPERM return. However, this is a programmer error. It is often unreasonably
expensive to keep track of which thread owns a mutex: an instruction (or
kernel call) to determine the identity of the locking thread may take far
longer than the basic lock operation. And of course it would be equally
expensive to check for ownership during unlock.

Many implementations of POSIX threads, therefore, do not record, or check,
mutex ownership. However, because it's a mutex, it IS owned, even if the
ownership isn't recorded. The next patch to your operating system might add
checking, or it might be possible to run threaded applications in a
heavyweight debug environment where mutex ownership is recorded and
checked... and the erroneous code will break the application. It'll be the
application's (well, the application developer's) fault.

Anyway, IF the implementation you're using really doesn't record or check
ownership of mutexes. And IF that illegal unlock is done as part of a
carefully managed "handoff" protocol so that there's no chance that the
owner actually needs the mutex for anything. (And, of course, if this
bizarre and illegal protocol is actually "correct" and consistent.) THEN,
your application should work despite the inherent illegality.

You could switch to a binary semaphore, and do the same thing without the
illegality. The application still won't WORK if you're releasing a lock
that's actually in use.
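
To make the binary-semaphore alternative concrete, here is a minimal sketch
of a "handoff" in which one thread takes the lock and a different thread
releases it. It assumes a system with unnamed POSIX semaphores (sem_init);
the names handoff, shared_value, finisher() and run_handoff() are invented
for the example and do not come from the application being discussed.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t handoff;        /* binary semaphore: 1 = free, 0 = held */
static int shared_value;     /* hypothetical data guarded by the semaphore */

/* Second thread: finishes the update, then releases a "lock" that it
   never acquired itself. */
static void *finisher(void *arg)
{
    (void)arg;
    shared_value += 1;
    sem_post(&handoff);      /* fine for a semaphore: it has no owner */
    return NULL;
}

/* Take the semaphore in one thread and release it from another.
   Returns 0 if the handoff completed and the data is as expected. */
int run_handoff(void)
{
    pthread_t tid;
    int ok;

    if (sem_init(&handoff, 0, 1) != 0)
        return -1;

    sem_wait(&handoff);                 /* this thread now holds the lock */
    shared_value = 41;
    if (pthread_create(&tid, NULL, finisher, NULL) != 0) {
        sem_destroy(&handoff);
        return -1;
    }

    sem_wait(&handoff);                 /* returns only after finisher posts */
    ok = (shared_value == 42);
    sem_post(&handoff);

    pthread_join(tid, NULL);
    sem_destroy(&handoff);
    return ok ? 0 : -1;
}
```

The sketch is only legal, not automatically correct: the creating thread
must stop touching shared_value once it has handed the work off, which is
exactly the "carefully managed protocol" caveat above.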
/---------------------------[ Dave Butenhof ]--------------------------\
| Compaq Computer Corporation                          [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/

=================================TOP===============================
Q213: Any advice on using gethostbyname_r() in a portable manner?

>>>>> "Tony" == Tony Gale writes:

Tony> Anyone got any advice on using gethostbyname_r in a portable
Tony> manner? Its definition is completely different on the
Tony> three systems I have looked at. Autoconf rules would be nice
Tony> :-)

Sorry, no autoconf rules. Here's what I did in a similar situation:

    {
      struct hostent *hentp = NULL;
      int herrno;
      uint32 ipnum = (uint32)-1;

    #if defined(__GLIBC__)
      /* Linux, others if they are using GNU libc.  We could also use
         Posix.1g getaddrinfo(), which should eventually be more portable
         and is easier to use in a mixed IPv4/IPv6 environment. */
      struct hostent hent;
      char hbuf[8192];
      /* GNU libc returns 0 on success and a nonzero error number on
         failure, so test for nonzero rather than for a negative value. */
      if (gethostbyname_r(hostname, &hent,
                          hbuf, sizeof hbuf, &hentp, &herrno) != 0) {
        hentp = NULL;
      }
    #elif defined(sun)
      /* Solaris 2.[456]. */
      struct hostent hent;
      char hbuf[8192];
      hentp = gethostbyname_r(hostname, &hent, hbuf, sizeof hbuf, &herrno);
    #elif defined(__osf__)
      /* On Digital Unix 4.0 plain gethostbyname is thread-safe because it
         uses thread specific data (and a h_errno macro).  HPUX is rumoured
         to use this method as well.  This will go wrong on Digital Unix
         3.2, but this whole file is not going to compile there anyway
         because version 3.2 has DCE threads instead of Posix threads. */
      hentp = gethostbyname(hostname);
      herrno = h_errno;
    #else
    #error I do not know how to do reentrant hostname lookups on this system
    #endif

      if (hentp == NULL) {
        /* Digital Unix doesn't seem to have hstrerror :-(.
         */
        hmddns_logerror("gethostbyname(%s): %d", hostname, herrno);
      } else {
        memcpy(&ipnum, hentp->h_addr, sizeof ipnum);
      }
      return ipnum;
    }

Regards,
Bas.

From: David Arnold

we're using this ...

    OLDLIBS=$LIBS
    LIBS="$LIBS $LIB_GHBN_R"
    AC_CHECK_FUNC(gethostbyname_r, [
      AC_DEFINE(HAVE_GETHOSTBYNAME_R)
      AC_MSG_CHECKING([gethostbyname_r with 6 args])
      OLD_CFLAGS=$CFLAGS
      CFLAGS="$CFLAGS $MY_CPPFLAGS $MY_THREAD_CPPFLAGS $MY_CFLAGS"
      AC_TRY_COMPILE([
    #   include <netdb.h>
      ], [
        char *name;
        struct hostent *he, *res;
        char buffer[2048];
        int buflen = 2048;
        int h_errnop;

        (void) gethostbyname_r(name, he, buffer, buflen, &res, &h_errnop)
      ], [
        AC_DEFINE(HAVE_GETHOSTBYNAME_R_6_ARG)
        AC_MSG_RESULT(yes)
      ], [
        AC_MSG_RESULT(no)
        AC_MSG_CHECKING([gethostbyname_r with 5 args])
        AC_TRY_COMPILE([
    #     include <netdb.h>
        ], [
          char *name;
          struct hostent *he;
          char buffer[2048];
          int buflen = 2048;
          int h_errnop;

          (void) gethostbyname_r(name, he, buffer, buflen, &h_errnop)
        ], [
          AC_DEFINE(HAVE_GETHOSTBYNAME_R_5_ARG)
          AC_MSG_RESULT(yes)
        ], [
          AC_MSG_RESULT(no)
          AC_MSG_CHECKING([gethostbyname_r with 3 args])
          AC_TRY_COMPILE([
    #       include <netdb.h>
          ], [
            char *name;
            struct hostent *he;
            struct hostent_data data;

            (void) gethostbyname_r(name, he, &data);
          ], [
            AC_DEFINE(HAVE_GETHOSTBYNAME_R_3_ARG)
            AC_MSG_RESULT(yes)
          ], [
            AC_MSG_RESULT(no)
          ])
        ])
      ])
      CFLAGS=$OLD_CFLAGS
    ], [
      AC_CHECK_FUNC(gethostbyname, AC_DEFINE(HAVE_GETHOSTBYNAME))
    ])
    LIBS=$OLDLIBS

> Whom do I shoot?

take your pick :-(

--
David Arnold
CRC for Distributed Systems Technology    +617 33654311 (fax)
University of Queensland                  [email protected] (email)
Australia                                 (web)

=================================TOP===============================
Q214: Passing file descriptors when exec'ing a program.

Jeff Garzik wrote:
>
> My MT program must send data to the stdin of multiple processes.
> It also needs to read from the stdout of those _same_ processes.
>
> How can this be done?

use the dup() function to save your parent stdin and stdout (if needed).
For each child process do:

    create two pipe()'s
    close stdin
    dup() one end of the first pipe
    close stdout
    dup() the other end of the second pipe
    fork()
    exec() (in the child)
    close unused ends of pipes
    save the pipe fd's for later use
    restore parent's stdin and stdout (if needed)
    add pipe fd to an fd_set
    use select() call to detect when child input from pipe is available

From quick Web search for examples:

    http://www.esrf.fr/computing/bliss/css/spec/help/piper.html
    http://www1.gly.bris.ac.uk/~george/unix-procs/papif-nops.c
    http://www.mit.edu/afs/athena/user/g/h/ghudson/info/pty.c

A book? Hard to be a top notch Unix programmer without this one on your shelf:

    Advanced Programming in the Unix Environment
    W. Richard Stevens, Addison-Wesley Publishing
    ISBN 0-201-56317-7

Good luck!

% use the dup() function to save your parent stdin and stdout (if needed).

Good suggestion, although I'd suggest using dup2() to replace stdin and stdout with the pipe ends. If you do this, you have to be careful about any code that uses stdin and stdout in the rest of your program -- you've got to be sure you never try to use these handles while they're being set up for the child process.

-- Patrick TJ McPhee East York Canada
=================================TOP===============================
Q215: Thread ID of thread getting stack overflow?

Kurt Berg wrote:
> We are seeking a PORTABLE way of getting the thread ID
> of a thread experiencing a stack overflow. We have to do
> some post processing to try to determine, given the thread
> ID, what sort of thing to do.
>
> It is our understanding that pthread_self is NOT "async
> signal safe".
>
> Thanks in advance.

Umm, as I mentioned in my reply to your email, once you buy into the concept of doing "portable" things in a signal handler (which represents a serious ERROR within the process), you're climbing a steep slope with no equipment. Your fortune cookie says that a disastrous fall is in your future.
I also commented that, although pthread_self isn't required by the standard to be async-signal safe, it probably IS, (or, "close enough"), on most platforms. And in a program that's done something "naughty" and unpredictable, asynchronously, to its address space, that's as good as you're going to get regardless of any standard. However, you neglected to mention in your email that the SIGSEGV you wanted to handle was a stack overflow. Now this leads to all sorts of interesting little dilemmas that bring to mind, (among other things), Steven Wright's famous line "You can't have everything: where would you put it?" (Actually, the answer is that if you had everything, you could leave it right where it was, but that's beside the point.) Your system is telling you that you've got no stack left. While some systems might support a per-thread alternate signal stack, that's not required by the standards (and, in any case, it's kinda expensive since you need to allocate an alternate stack for each thread you create). So... you've used all your stack, and you want to handle the error. On what? The stack you've used up? Or the other stack that you can't even designate portably? Sure, on SOME systems, you may be able to determine (at least sometimes) that you're "near the end" of the stack, before you're actually there. The Alpha calling standard, for example, requires the compiler to "probe" the stack before changing the stack pointer to allocate a new frame. Thus, if the probe generates a SIGSEGV, you've still got whatever it was you were trying to allocate. MAYBE that's enough to run a signal handler. Unfortunately, "maybe", "sometimes", and "some systems" are not words that contribute to a portable solution. The answer is that you're as out of luck as your thread (even if you still have stack). What you want to do is DEBUGGING... so leave it to the debugger. Make sure that SIGSEGV is set to SIG_DFL. 
Let the ailing process pass away peacefully, and analyze the core file afterward. (And if you're faced with a system that doesn't support analysis of a threaded process core file, then find a patch... or turn around and face another system.)

And if you're just trying to leave a log entry to later trace the failure of some reliable networking system, remember that a thread ID is transient and local. It means absolutely nothing within another process, or even at some other time within the same process. Why would you want to log it? Without the core file, the information is useless; and with the core file, it's redundant.

/---------------------------[ Dave Butenhof ]--------------------------\
| Compaq Computer Corporation                        [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/
=================================TOP===============================
Q216: Why aren't my (p)threads preempted?

Lee Jung Wooc wrote:
> I have my opinion and question.
>
> IMO, the cases that showed up "thread 2" is not by cpu preemption, but by
> normal scheduling. The printf() call in thread function induces a system call
> write( ). but the printf is a library function and will not lose cpu until
> some amount of bytes are stored in the buffer and fflush () is called. The
> fflush() calls write() then switching occurs. Library buffer size may
> influences when the switching occurs and also the second pthread_create call
> may switch cpu to the first thread.

A thread MAY be timesliced at any time, on any system that supports timeslicing. As I said, while SCHED_RR requires timeslicing, SCHED_OTHER does not prohibit timeslicing. (Only SCHED_FIFO prohibits timeslicing.)

In addition to system calls, a thread might block on synchronization within the process.
For example, the buffer written to by printf() is shared among all threads using stdout, and has to be synchronized in some fashion. Usually, that means a mutex (or possibly a semaphore). If multiple threads use printf() simultaneously (and, especially with one of the predefined file streams, like stdout, it needn't be used by any of YOUR threads to have simultaneous access), one of them may BLOCK attempting to acquire the synchronization object. That would result in a context switch.

> I'm assuming the write() call , known to be none-blocking in normal cases,
> can lead to switching. IMHO, the word none-blocking means that the calling
> context is not scheduled after a context(thread or process, whatever) which
> is assumed to be waitng for an event infinitively.

That's "non-blocking", not "none-blocking". (I mean no disrespect for your English, which is far better than I could manage in your language, but while I can easily ignore many "foreign speaker" errors, this one, maybe especially because you chose to define it, stood out and made me uncomfortable.)

> Is my assumtion correct ?

I'm afraid I can't make much sense of your second sentence.

DIGRESSION ALERT (including slings and arrows not specifically targeted to, nor especially deserved by, the person who wrote the quoted phrase): I find it difficult to read anything that starts with "IMHO", an abbreviation that I despise, and which is almost always hypocritical because an opinion one takes such care to point out is almost never intended to be "humble".

It's quite sufficient to simply declare your opinion. I, and all other responsible readers, will assume that EVERYTHING we read is in fact the author's opinion, except (perhaps) when the author specifically claims a statement to be something else. And even then we'll question whether the author has in fact the authority and knowledge to make such a claim.
In the rare cases where it might actually be useful to explain that your opinion is your opinion, you might simply say so without the cloying abbreviation. With that out of the way, where were we? Oh yes, write(). It's true that most traditional UNIX systems have a bug by which I/O to a file oriented device is not considered capable of "blocking". That's unfortunate. It's particularly unfortunate in a threaded world, because some other thread might be capable of doing a lot of work even while an I/O works its way through to the unified buffer cache; much less if the data must actually be written to a remote NFS file system. In any case, this does not usually apply to other types of file system. If stdout is directed to a terminal, or to a network socket, the printf()'s occasional write() SHOULD result in blocking the thread. The write() syscall IS, technically, a "blocking function", despite the fact that some calls to it might not block. Being a "blocking function" does not necessarily require that every call to that function block the calling thread. > As far as, I know threre's no implemention of SCHED_RR in major unix > distributions. Neither I think the feature is on definite demand. I believe that Linux implements SCHED_RR fully. I know that Digital UNIX does, and always has. I have some reason to believe that Solaris 7 implements SCHED_RR, and I suspect that AIX 4.3 does as well. I'd be surprised if IRIX (known for its realtime support) didn't support SCHED_RR (not that I haven't already been surprised by such things). I don't have any idea whether HP-UX 11.0 has SCHED_RR... perhaps that's the "major unix distribution" you're talking about? As for demand. Oh yes, there's a very high (and growing) demand, especially for the sort of "soft realtime" scheduling (and not always all that "soft") that can be used to build reliable and "highly available" network server systems. 
Anybody who doesn't support SCHED_RR either has no customers interested in networking, or you can safely bet cash that they've already received numerous requests, bribes, and threats from customers with names that just about anyone would recognize. /---------------------------[ Dave Butenhof ]--------------------------\ | Compaq Computer Corporation [email protected] | | 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof | | Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----------------[ Better Living Through Concurrency ]----------------/ =================================TOP=============================== Q217: Can I compile some modules with and others without _POSIX_C_SOURCE? Keith Michaels wrote: > The _malloc_unlocked looks REAL suspicious now. Am I getting the wrong > malloc linked into my program? The program contains two modules compiled > separately: all the posix thread stuff is compiled with > -D_POSIX_C_SOURCE=199506L, and the non-posix module is compiled without it. > This is necessary because many system interfaces that I need are not > available in posix (resource.h, bitmap.h, sys/fs/ufs*.h do not compile > under posix). > > Is the traceback above evidence I have built the program incorrectly? I don't know whether the traceback is evidence, but, regardless, you HAVE built the program incorrectly. I don't know whether that incorrectness is relevant. It's hard to believe that source files compiled without "thread support" on Solaris would be linked to a non-thread-safe malloc() -- but, if so, that could be your problem. You don't need to define _POSIX_C_SOURCE=199506L to get thread support, though that is one way to do it. Unfortunately, as you've noted, defining that symbol has many other implications. 
You're telling the system that you intend to build a "strictly conforming POSIX 1003.1-1996 application", and therefore that you do not intend to use any functions or types that aren't defined by that standard -- and in addition that you reserve the right to define for your own use any symbols that are not specifically reserved by that standard for implementation use. Solaris, like Digital UNIX, (and probably others, though I don't know), has a development environment that, by default, supports a wide range of standard and non-standard functions and types. That's all fine, as long as they don't conflict and as long as the application hasn't required that the environment NOT do this, as by defining _POSIX_C_SOURCE. To compile threaded code on Solaris (or Digital UNIX) that is not intended to be "strictly conforming POSIX 1003.1-1996" you should define only the symbol _REENTRANT. You'll get the thread-safe versions of any functions or symbols (e.g., errno) where that's relevant, without restricting your use of non-POSIX capabilities of the system. DEC C on Digital UNIX provides the proper defines when you compile with "cc -pthread". I believe that Solaris supports "cc -mt", (though I didn't know about that the last time I tried to build threaded code on Solaris, so I haven't checked it). Don't use -D_POSIX_C_SOURCE=199506L unless you really MEAN it, or if the system you're using doesn't give you any alternative for access to thread functions. (As I said, you never need it for Digital UNIX or Solaris.) And always build ALL of the code that you expect to live in a threaded process with the correct compiler options for your system. Otherwise, at best, they may disagree on the definition of common things like errno; and, at worst, the application may not be thread-safe. 
/---------------------------[ Dave Butenhof ]--------------------------\ | Compaq Computer Corporation [email protected] | | 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof | | Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----------------[ Better Living Through Concurrency ]----------------/ -- Patrick TJ McPhee East York Canada =================================TOP=============================== Q218: timed wait on Solaris 2.6? [email protected] wrote: > I read from somewhere that pthread_cond_timedwait should only be > used "in realtime situations". Since Solaris doesn't support the > realtime option of pthread, does it mean pthread_cond_timedwait > should not be used on Solaris at all? Condition variables are a "communication channel" between threads that are sharing data. You can wait for some predicate condition to change, and you can inform a thread (or all threads) waiting for a condition that it has changed. There's nothing intrinsically "realtime" about them, at that level. You can also have a wait time out after some period of time, if the condition variable hasn't been signalled. That's not really "realtime", either, although the nanosecond precision of the datatype does originate in the needs of the realtime folk who developed 1003.1b (the realtime POSIX extensions). On a system that supports the POSIX thread realtime scheduling option (which, as you commented, Solaris 2.6 doesn't support -- though it erroneously claims to), multiple threads that have a realtime scheduling policy, and are waiting on a condition variable, must be awakened in strict priority order. That, of course, is obviously a realtime constraint -- but it doesn't apply unless you have (and are using) the realtime scheduling extensions. > I tried to use pthread_cond_timedwait in my application and got > various weird results. > > 1. Setting tv_nsec doesn't seem to block the thread at all. 
> I guess Solaris might just ignore this field (the value I gave
> was 25,000,000).

Define "at all". How did you attempt to measure it? By looking at the sweep second hand on your watch? Using time(1)? Calling gettimeofday() before and after? Querying a hardware cycle counter before and after?

Your 25000000 nanoseconds is just 25 milliseconds. However, what may have happened is that you specified tv_sec=0, or time(NULL), and then set tv_nsec to 25000000. With tv_sec=0, that's a long, long way in the past, and the wait would indeed time out immediately. Even with tv_sec=time(NULL), remember that the system time may well already be .026 seconds into the current second, while you're setting an absolute timeout of .025 seconds into it, which is again in the past.

You really shouldn't use time(NULL) to create a struct timespec timeout. You should use clock_gettime(). If you want to use small waits, you may also need to check clock_getres(), which returns the clock resolution. If your system supports a resolution of 0.1 second, for example, there's not much point to setting a wait of 0.025 seconds. (You'll get up to a 0.1 second wait anyway.)

> 2. The thread blocks and yields fine if I use "time(NULL) + 1" in
> the tv_sec field. However the thread eventually hangs in some
> totally irrelevant code (in the system code `close' when I try
> to close a socket descriptor).

There's no connection between condition waits and sockets, so most of this item seems completely irrelevant.

> We are thinking of using another thread that sleeps (with nanosleep)
> for a period of time and then wakes up and signals other threads
> as a timer now. Has anyone tried this approach before?

Depends on what you are really trying to accomplish. I don't see any application of this technique that has anything to do with the rest of your message.
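To make the clock_gettime() advice concrete, here is a minimal sketch in C of building the absolute timeout from the current time plus a relative delay, carrying any nanosecond overflow into tv_sec. The helper names timespec_add_ms() and deadline_after_ms() are made up for this illustration; they are not part of any standard API. The sketch also assumes the condition variable uses the default clock (CLOCK_REALTIME).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* exposes clock_gettime() under strict -std modes */
#endif
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: add a relative delay in milliseconds to a time,
   carrying nanosecond overflow into tv_sec so tv_nsec stays in range. */
struct timespec timespec_add_ms(struct timespec t, long ms)
{
    assert(ms >= 0);                      /* negative delays are not handled */
    t.tv_sec  += ms / 1000;
    t.tv_nsec += (ms % 1000) * 1000000L;
    if (t.tv_nsec >= NSEC_PER_SEC) {
        t.tv_sec  += 1;
        t.tv_nsec -= NSEC_PER_SEC;
    }
    return t;
}

/* Hypothetical helper: compute the absolute time "ms from now" on
   CLOCK_REALTIME. Returns 0 on success, -1 if the clock cannot be read. */
int deadline_after_ms(long ms, struct timespec *deadline)
{
    struct timespec now;

    if (clock_gettime(CLOCK_REALTIME, &now) != 0)
        return -1;
    *deadline = timespec_add_ms(now, ms);
    return 0;
}
```

The resulting struct timespec is what you would pass as the third argument of pthread_cond_timedwait(), inside the usual loop that holds the mutex and rechecks the predicate. Because the deadline is absolute, it can be reused unchanged if the wait returns early for a spurious wakeup.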
/---------------------------[ Dave Butenhof ]--------------------------\ | Compaq Computer Corporation [email protected] | | 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof | | Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----------------[ Better Living Through Concurrency ]----------------/ =================================TOP=============================== Q219: Signal delivery to Java via native interface "Stuart D. Gathman" wrote: > Dave Butenhof wrote: > > > I am trying to figure out a way to handle signals synchronously in a Java VM. > I have a thread calling sigwait() which reports supported signals > synchronously to Java. But I have no control over other threads in the VM - > so I can't get them to block the signals. The sneaky solution of blocking the > signal in a handler probably won't work in AIX - the man page says "Concurrent > use of sigaction and sigwait for the same signal is forbidden". It cannot legally say that, and it may not be saying what it seems to. There's no restriction in POSIX, or in UNIX98, against using both. However, POSIX does say that calling sigwait() for some signal MAY change the signal action for that signal. If you have a silly implementation that actually does this (there's no point except with a simple purely user-mode hack like the old DCE threads library), then trying to combine them may be pointless -- but it's not illegal. (And, by the way, if you're using any version of AIX prior to 4.3, then you ARE using that very "user-mode hack" version of DCE threads, and you're not really allowed to set signal actions for any "asynchronous" signal.) Of course, in practice, such distinctions between "forbidden" and "legal but meaningless" aren't useful, so one could argue that the incorrect statement "is forbidden" may not be entirely unjustified. ;-) > One idea is to have the handler notify the signal thread somehow - not > necessarily with a signal. 
> Is there some kind of event queue that could be
> called from a signal handler?

You can call sem_post() from a signal handler. Therefore, you could have a thread waiting on a semaphore (sem_wait()), and have the signal handler call sem_post() to awaken the waiter.

> Another idea is to have the signal thread call sigsuspend. Then, if the
> handler could determine whether the thread it interrupted is the signal
> thread, it could block the signal all threads except the signal thread.

I don't think I understand what you mean here. One thread cannot block a signal in other threads. And that "if" hiding in the phrase "if the handler could determine" is a much bigger word than it might seem. You cannot do that using any portable and reliable mechanism.

/---------------------------[ Dave Butenhof ]--------------------------\
| Compaq Computer Corporation                        [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/
=================================TOP===============================
Q220: Concerning timedwait() and realtime behavior.

Bil Lewis wrote:
> First, the definition I'm using of "realtime" is real time i.e.,
> wall clock time. In computer literature the term is not well-defined,
> though colloquially "soft realtime" means "within a few seconds" while
> "hard realtime" means "less than 100ms." (I've been quite upset
> with conversations with realtime programmers who talk about 100%
> probabilities, time limits etc. Ain't no such thing! This muddies
> the RT waters further, but we'll leave that for another time.)
>
> As such, anything that refers to the wall clock (eg pthread_cond_timedwait())
> is realtime.

"Realtime" means "wall clock time"? Wow.

> I think important that people using timed waits, etc. recognize this
> and write programs appropriately.
(I admit to some overkill here, but > think some overkill good.) Yeah, it's important to remember that you're dealing with an "absolute" time (relative to the UNIX Epoch) rather than a "relative" time (relative to an arbitrary point in time, especially the time at which the wait was initiated). The sleep(), usleep(), and nanosleep() functions are relative. The pthread_cond_timedwait() function is absolute. So if "realtime" means "absolute time" (which has some arbitrary correlation, one might assume, to "wall clock time"), then, yeah, it's realtime. > > Condition variables are a "communication channel" between threads that are > > sharing data. You can wait for some predicate condition to change, and you > > can inform a thread (or all threads) waiting for a condition that it has > > changed. > > > > There's nothing intrinsically "realtime" about them, at that level. > > > > You can also have a wait time out after some period of time, if the > > condition variable hasn't been signalled. That's not really "realtime", > > either, although the nanosecond precision of the datatype does originate > > in the needs of the realtime folk who developed 1003.1b (the realtime > > POSIX extensions). > > And here's the sticky point: 'That's not really "realtime"'. It sure > isn't hard realtime. (Many people don't refine their terms.) But it is > real time. Reality is overrated. It certainly has little to do with programming. No, it's not "realtime" by any common computer science/engineering usage. "Realtime" isn't a matter of the datatype an interface uses, but rather of the real world constraints placed on the interface! An interface that waits for "10 seconds and 25 milliseconds plus or minus 5 nanoseconds, guaranteed, every time" is realtime. An interface (like pthread_cond_timedwait()) that waits "until some time after 1998 Dec 07 13:08:59.025" is not realtime, because no (useful) real world (real TIME) constraints are placed on the behavior. > So what's my point? 
Maybe just that we need some well-defined terminology > here? We're talking about POSIX functions, so let's try out the POSIX definition: "Realtime in operating systems: the ability of the operating system to provide a required level of service in a bounded response time." Does pthread_cond_timedwait() "provide a required level of service in a bounded response time"? No, absolutely not, except in conjunction with the scheduling guarantees provided by the realtime scheduling option. Of course, in a sense, it is bounded -- pthread_cond_timedwait() isn't allowed to return BEFORE the specified time. But that's not a useful bound. What realtime people want is the other direction... "if I wait for 25 milliseconds, what, worst case, is the LONGEST interval that might pass before control is returned to my application". You're correct that "hard" and "soft" realtime aren't quite so firmly defined. In normal use, soft realtime usually means that it shouldn't be too long, most of the time, or someone'll get annoyed and write a firmly worded letter of protest. Hard realtime means the plane may crash if it's too long by more than nanoseconds. Hard realtime does not necessarily require fine granularity, or even 100% precision (though some applications do require this). The principal requirement is predictability. 
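As a rough illustration of that "other direction", the sketch below (in C) measures how LATE a relative sleep returns. The function name measure_sleep_overshoot() is invented for this example, and it assumes a system that provides CLOCK_MONOTONIC. A single measurement tells you nothing about the worst case, which is exactly the point: without realtime scheduling guarantees, the only promise is that the overshoot is not negative.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* exposes clock_gettime() and nanosleep() under strict -std modes */
#endif
#include <time.h>

/* Hypothetical helper: sleep for request_ns nanoseconds (a relative
   interval) and store in *overshoot_ns how many nanoseconds later than
   requested the call returned, measured on CLOCK_MONOTONIC.
   Returns 0 on success, -1 if a clock read fails or the sleep is
   interrupted. */
int measure_sleep_overshoot(long request_ns, long long *overshoot_ns)
{
    struct timespec req, start, end;
    long long elapsed_ns;

    req.tv_sec  = request_ns / 1000000000L;
    req.tv_nsec = request_ns % 1000000000L;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1;
    if (nanosleep(&req, NULL) != 0)
        return -1;
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1;

    elapsed_ns = (end.tv_sec - start.tv_sec) * 1000000000LL
               + (end.tv_nsec - start.tv_nsec);
    *overshoot_ns = elapsed_ns - request_ns;
    return 0;
}
```

Calling this in a loop on a loaded machine and recording the largest overshoot gives a feel for how unbounded the upper end is under SCHED_OTHER. A hard realtime system is one where that maximum is specified in advance rather than discovered by experiment.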
/---------------------------[ Dave Butenhof ]--------------------------\
| Compaq Computer Corporation                        [email protected] |
| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698  http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/
=================================TOP===============================
Q221: pthread_attr_getstacksize on Solaris 2.6

[email protected] wrote:
> I am trying to find out what default stack size each thread has by the
> following code (taken from Pthreads Programming) in non-main threads:
>
> size_t default_stack_size = -1;
>
> pthread_attr_t stack_size_custom_attr;
>
> pthread_attr_init( &stack_size_custom_attr );
>
> pthread_attr_getstacksize( &stack_size_custom_attr, &default_stack_size );
> printf( "Default stack size = %d\n", default_stack_size );
>
> The output is 0. Can anyone explain this? Thanks.

Yes, I can explain that. "0" is the default value of the stacksize attribute on Solaris. Any more questions? ;-)

POSIX says nothing about the stacksize attribute, except that you can set it to the size you require. It doesn't specify a default value, and it doesn't specify what that default means. It does say that any attempt to specify a value less than PTHREAD_STACK_MIN is an error. Therefore, it's perfectly reasonable (though odd and obscure) to have a default of 0, which is distinct from any possible value the user might set. When you actually create a thread, the Solaris thread library looks at the stacksize attribute, and, if it's 0, substitutes the actual runtime default. That's pretty simple.

I happen to prefer the way I implemented it. (I suppose that goes without saying.) When you create a thread attributes object, the stacksize attribute contains the actual default value, and the code you're using will work.

But the real point, and the lesson, is that what you're trying isn't portable. While it's not quite "illegal", it's close.
Another way of putting it is that you've successfully acquired the information you requested. The fact that it happens to be absolutely useless to you is completely irrelevant. /---------------------------[ Dave Butenhof ]--------------------------\ | Compaq Computer Corporation [email protected] | | 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof | | Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----------------[ Better Living Through Concurrency ]----------------/ =================================TOP=============================== Q222: LinuxThreads: Problem running out of TIDs on pthread_create Kaz Kylheku wrote: > ( The comp.programming.threads FAQ wrongfully suggests a technique of using a > suitably locked counter that is incremented when a detached thread is created > and decremented just before a detached thread terminates. The problem is that > this does not eliminate the race condition, because a thread continues to > exist after it has decremented the counter, so it's possible for the counter > to grossly underestimate the actual number of detached threads in existence. > I'm surprised at this *glaring* oversight. ) [This is true, but not very likely. Like never. Still, Kaz is right. -Bil] In most programs this is simple and reliable, because threads tend to execute the short amount of code at the end without blocking. That, of course, is not always true. And you're correct that in some cases, especially with a heavily loaded system, there can be delays, and they can be significant. > The only way to be sure that a detached thread has terminated is to poll it > using pthread_kill until that function returns an error, which is ridiculous. > That's what joinable threads and pthread_join are for. That won't work unless you know that no new threads are being created during the interval. (Anywhere in the process... and you can only know that if you're writing a monolithic application that calls no external code.) 
That's because a POSIX thread ID (pthread_t) may be reused as soon as a thread is both terminated and detached. (Which, for a detached thread, means as soon as it terminates.) This won't always happen, and, in some implementations, (almost certainly including Linux, which probably uses the process pid), may "almost never" happen. Still, code that uses this "trick" isn't portable, or even particularly reliable on an implementation where it happens to work most of the time. Your summary is absolutely correct: that's why join exists. > Because of this race, you should never create detached threads in an unbounded > way. Programs that use detached threads should be restricted to launching a > *fixed* number of such threads. > > I don't believe that detached threads have any practical use at all in the > vast majority of applications. An application developed in a disciplined > manner should be capable of an orderly shutdown during which it joins all of > its threads. I can't think of any circumstance in which one would absolutely > need to create detached threads, or in which detached threads would provide > some sort of persuasive advantage; it's likely that the POSIX interface for > creating them exists only for historic reasons. I believe that detached threads are far easier to use for the vast majority of programs. Joining is convenient (but not necessary) for any thread that must return a single scalar value to its creator. Joining is essential when you need to be "reasonably sure" that the thread has given up its system resources before going on to something else. In any other case, why bother? Let your threads run independently, exchange information as needed in any reasonable way, and then quietly "evaporate" in a puff of greasy black smoke. > (The FAQ, however, for some unexplained reason, suggests that detached threads > are preferred over joinable.) 
[Personal preference only -Bil] And I agree, though they're clearly not appropriate in situations where you're flooding the system with threads (which isn't a design I'd recommend anyway), and you really need to know when one is gone to avoid resource problems. /---------------------------[ Dave Butenhof ]--------------------------\ | Compaq Computer Corporation [email protected] | | 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof | | Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----------------[ Better Living Through Concurrency ]----------------/ =================================TOP=============================== Q223: Mutexes and the memory model Kaz Kylheku wrote: > In article , > Keith Michaels wrote: > >I know that mutexes serialize access to data structures and this > >can be used to enforce a strongly ordered memory model. But what > >if the data structure being locked contains pointers to other > >structures that were build outside of mutex control? > > The mutex object is not aware of the data that it is protecting; it is > only careful programming discipline that establishes what is protected. > If some pointers are protected by a mutex, it may be the case that the > pointed-at objects are also protected. Or it might be the case that such > objects are not protected by the mutex. > > Any object that is accessed only whenever a certain mutex is held is > implicitly protected by that mutex. This is a really good statement, but sometimes I like to go the opposite direction to explain this. The actual truth is that mutexes are selfish and greedy. They do NOT protect your data, or your code, or anything of the sort. They don't know or care a bit about your data. What they do, and very well, is protect themselves. Aside from mutual exclusion (the "bottleneck" function), the law says that when you lock a mutex, you have a "coherent view of (all) memory" with the thread that last unlocked the mutex. 
If you follow carefully follow the rules, that is enough. As Kaz says, you need to apply careful programming discipline in order to be protected by a mutex. First, never touch shared data when you don't have a mutex locked... and all the threads potentially touching any shared data must agree on a single mutex (or the same set of mutexes) for this purpose. (If one thread locks mutex A to touch a queue, while another locks mutex B to touch a queue, you've got no protection.) And, if you are moving a piece of data between "private" and "shared" scopes, you must agree on a single mutex for the transition. (You can modify private data as you wish, but you must always lock a mutex before making that private data shared, and before making shared data private again -- as in queueing or dequeueing a structure.) If your structure contains pointers to other structures, then they're in the same scope. If there may also be other pointers to the data, you need to make sure all threads agree, at every point in time, whether the "secondary" data is private or shared. /---------------------------[ Dave Butenhof ]--------------------------\ | Compaq Computer Corporation [email protected] | | 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof | | Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----------------[ Better Living Through Concurrency ]----------------/ =================================TOP=============================== Q224: Poor performance of AIO in Solaris 2.5? Bil Lewis wrote: > > Douglas C. Schmidt wrote: > > > > Hi Mike, > > > > ++ I have an application that needs to write files synchronously (i.e: a > > ++ database-like application). I figured I should try and use the "aio" > > ++ family of system calls so that several such writes can be in progress > > ++ simultaneously if the files are on different disks. (A synchronous write > > ++ takes around 12-16 msecs typically on my machine.) 
> > ++ > > ++ I would have expected that the lio_listio() would be no slower than 2 > > ++ write()'s in series, but it seems to be 4-5 times worse. > > > > Our ad hoc tests using quantify/purify seem to indicate that the > > aio*() calls on Solaris are implemented by spawning a thread PER-CALL. > > This is most likely to be responsible for the high overhead. I'm not > > sure how other operating systems implement the aio*() calls, but > > clearly spawning a thread for each call will be expensive. > > I never worked with the AIO stuff, but this does sound correct... AIO > was done with threads & creating a new thread per AIO call sounds likely. > But it's not terribly expensive. Depending upon the machine it should > add no more than 20-100us. You wouldn't even be able to MEASURE that. > > Something is rotten in the state of Denmark. And I suspect it's the scheduler. I've written a threaded external call-out/call-back system for our VisualWorks Smalltalk environment (http://www.parcplace.com/products/thapi/). It runs on Windows NT/95/98, OS/2, Intel Linux, Digital Unix, HPUX, AIX and Solaris. The scheme maintains a thread-farm, and threads in the farm are used to make external call-outs, and a rendezvous mechanism is used to respond to threaded call-ins. On all but Solaris the performance of a simple threaded call-out to a null routine is approximately 50 times slower than a non-threaded call-out (e.g. a simple threaded callout on Intel Linux using a 180 MHz Pentium Pro is about 85 usec). But on Solaris it is an order of magnitude worse (e.g. a simple threaded callout on an UltraSPARC 1 takes at least 800usecs). Since the system uses a thread farm, thread creation times aren't relevant in determining performance. Instead, the performance is determined by pthread_mutex_lock, pthread_cond_signal, pthread_mutex_unlock, pthread_cond_wait, pthread_cond_timedwait and the underlying scheduler. Dormant threads in the farm are waiting in pthread_cond_wait. 
When a call-out happens the main/Smalltalk thread marshalls the call into some memory, chooses a thread and does a {pthread_mutex_lock;pthread_cond_signal;pthread_mutex_unlock} to wake the thread and let it perform the call. On return the thread signals the main thread and enters a pthread_cond_timedwait (if it times out the main thread is resignalled and the wait reentered). The main/Smalltalk thread responds to the signal by collecting the result of the call. To ensure calls make progress against the main thread all threads in the farm have higher priority. On many pthreads platforms, Solaris included, the system has to use a non-realtime scheduling policy because of a lack of permissions, so on Solaris 2.5/2.6 the scheme is using SCHED_RR. My guess is that the scheduler is not prompt in deciding to wake up a thread, hence when the calling thread is signalled it isn't woken up immediately, even though the thread has a higher priority. One experiment I've yet to try is putting a thr_yield (as of 2.5 pthread_yield is unimplemented) after the {pthread_mutex_lock;pthread_cond_signal;pthread_mutex_unlock}. Although this is all conjecture it does fit with a scheduler that only makes scheduling decisions occasionally, e.g. at the end of a process's timeslice. Anyone have any supporting or contradictory information? =================================TOP=============================== Q225: Strategies for testing multithreaded code? Date: Tue, 12 Jan 1999 12:41:51 +0100 Organization: NETLAB plus - the best ISP in Slovakia >Subject says it all: are there any well known or widely used >methods for ensuring your multithreaded algorithms are threadsafe? >Any pointers to useful research on the topic? 
Let us suppose a program property is an attribute that is true of every possible history of that program (a history of a program being a concrete sequence of program states, where transformations from one state to another are carried out by atomic actions performed by one or multiple threads). Now what about being able to provide a proof that your program has safety (absence of deadlock, mutual exclusion, ...) and liveness (partial/complete correctness, ...) properties? To prove your program has the absence-of-deadlock property, you may define an invariant DEADLOCK that is true when all (cooperating) threads in your program block. Then proving your program will not deadlock is very simple - you need to assert that for every critical assertion C in the program proof: C => not DEADLOCK (C implies that the DEADLOCK invariant is false; in other words, when the preconditions of program statements are true, they exclude the possibility of a state where deadlock is possible). There is an excellent book covering this topic (the above is an awkward excerpt from it): Andrews, Gregory R. "Concurrent Programming, Principles and Practice" Addison Wesley 1991 ISBN 0-8053-0086-4 Applying propositions and predicates to your program (or rather its sensitive multithreaded parts) to assert the preconditions and postconditions required for executing atomic actions presents a complication, of course. You have to spend more time on annotating your algorithm, developing invariants that have to be kept by every statement in the algorithm (and if not, you have to guard against executing the statement until the invariant is true - and here you have conditional synchronization :), proving program properties. But I think it is worth it. Once you prove your program does not deadlock using programming logic, you may be sure it will not. So I would suggest you read the above book (if only to be aware of the techniques described there). 
It is more a theoretical discussion, but many very helpful parallel algorithms are described (and proved) there, starting with the very classical dining philosophers problem, up to distributed heartbeat algorithm, probe-echo algorithm and multiple-processor operating system kernel implementation. Hope this helps, Best regards, Milan Gardian =================================TOP=============================== Q226: Threads in multiplatform NT Yes, I have done this. Jason "Nilesh M." wrote: > > Can I write threaded programs for Win NT and just recompile for both Alpha > and i386 without any changes or minor changes? =================================TOP=============================== Q227: Guarantee on condition variable predicate/pthreads? Pete Sheridan wrote:
> Thread 2:
>     pthread_mutex_lock(&m);
>     if (n != 0)
>     {
>         pthread_cond_wait(&c, &m);
>         assert(n == 0);
>     }
>     pthread_mutex_unlock(&m);
>
> The idea here is that thread 2 wants to wake up when n is 0. Is the > assert() correct? i.e., will n always be 0 at that point? When the > condition is signalled, thread 2 has to reacquire the mutex. Thread 1 > may get the mutex first, however, and increment n before this happens. > Is this implementation dependent? Or does thread 2 have to use "while > (n != 0)" instead of "if (n != 0)"? The assert() is incorrect. The POSIX standard carefully allows for a condition wait to return "spuriously". I won't go into all the details, but allowing spurious wakeups is good for both the implementation and the application. (You can do a search on Deja News if you really want to know, because I've explained this several times before; or you can read my book, about which you may learn through the www.awl.com link below.) To correct Thread 2, change the "if" into a "while" and move the assertion out of the loop. But then, it becomes rather trivial. You hold the mutex and loop until n == 0, so, of course, it will be 0 when the loop terminates (with the mutex still locked). 
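The corrected Thread 2 described above can be sketched roughly as follows. This is only an illustration, not code from the original post: the names setter(), wait_for_zero() and run_demo() are hypothetical, and Thread 1 is reduced to a single assignment that sets n to 0 under the mutex.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
static int n = 1;                    /* shared predicate data, protected by m */

/* Hypothetical "Thread 1": drive n to zero and tell any waiter. */
static void *setter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&m);
    n = 0;                           /* change the predicate under the mutex */
    pthread_cond_signal(&c);
    pthread_mutex_unlock(&m);
    return NULL;
}

/* "Thread 2": block until n == 0, then return the value of n it observed. */
static int wait_for_zero(void)
{
    int seen;

    pthread_mutex_lock(&m);
    while (n != 0)                   /* "while", not "if": wakeups may be spurious */
        pthread_cond_wait(&c, &m);   /* condition variable first, mutex second */
    assert(n == 0);                  /* outside the loop, mutex still held */
    seen = n;
    pthread_mutex_unlock(&m);
    return seen;
}

/* Hypothetical driver: start the setter, wait in the calling thread. */
int run_demo(void)
{
    pthread_t t;
    int seen;

    n = 1;                           /* no other thread exists yet */
    if (pthread_create(&t, NULL, setter, NULL) != 0)
        return -1;
    seen = wait_for_zero();
    pthread_join(t, NULL);
    return seen;
}
```

Calling run_demo() from main() should return 0 whichever thread gets the mutex first: if the setter runs before the waiter, the waiter finds n == 0 and never blocks, so no wakeup is lost.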
/---------------------------[ Dave Butenhof ]--------------------------\ | Compaq Computer Corporation [email protected] | | 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof | | Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----------------[ Better Living Through Concurrency ]----------------/ =================================TOP=============================== Q228: Pthread API on NT? > I need to port a lot of code to NT that depends on pthreads. Has anyone > built a pthread library on NT using Win32 threads? > Scott sourceware.cygnus.com/pthreads-win32 "Darius S. Naqvi" wrote: > Dave Butenhof writes: > > > > > Lee Jung Wooc wrote: > > > > > Any one please help to redirect signal to the mainthread or > > > any idea on how to make the signal to process is handled in > > > main thread context ? > > > > As Patrick McPhee has already suggested, I recommend that you stop relying on > > a signal handler for this. To expand a little on his advice, begin by masking > > SIGUSR2 in main(), before creating any threads. Then create a special thread > > that loops on the sigwait() function, waiting for occurrences of SIGUSR2. (If > > the signal is not masked in ALL threads, then it may "slip through" somewhere > > while the signal thread is not waiting in sigwait() -- for example, while it's > > starting, or while it's responding to the previous SIGUSR2.) > > > > Does the signal become pending in the sigwaiter thread in that case? > To be clear: suppose that a given signal is blocked in all threads, > and one thread sigwait()'s on it. Suppose that while the > sigwait()ing thread is not actually in sigwait(), that signal is sent > to the process. Is the signal then pending in the sigwait() thread, > so that the next call to sigwait() notices the signal? If *all* threads have the signal blocked, then the signal remains pending against the process. 
The next thread that makes itself able to receive the signal, either by unblocking the pending signal in its signal mask or by calling sigwait, will receive the pending signal. > > > I've been assuming that since a signal is received by only one of the > threads in which it is not blocked, it is not made pending in the > blocking threads *if* there exists a thread that is not blocking it. > In order to not lose any signals, it must then be the case that if > every thread is blocking a signal, then when a signal is sent to the > process, it is made pending in *every* thread. I.e., either one > thread receives the signal and it is not made pending in any thread, > or the signal is pending in every thread. Is this true? (I don't > have a copy of the standard, but the book "Pthreads Programming" from > O'Reilly and Associates is silent on this matter.) Signals sent to a process never "pend" against a thread. They can only be pending against the process, meaning, as I explained above, that any qualified thread can eventually take the signal. Only per-thread signals, sent via pthread_kill(), can be pending against a thread that has the signal blocked. Externally, it's not that complicated. Internally, it can get interesting.... __________________________________________________ Jeff Denham ([email protected]) Bright Tiger Technologies: Resource-management software for building and managing fast, reliable web sites See us at http://www.brighttiger.com 125 Nagog Park Acton, MA 01720 Phone: (978) 263-5455 x177 Fax: (978) 263-5547 =================================TOP=============================== Q229: Sockets & Java2 Threads Nader Afshar wrote: > Part of a GateWay I am designing is composed of two threads. One thread > delivers messages to a server through a socket connection, the other > thread is always listening on the output-stream of the other server for > incoming messages. > > The problem I'd like to solve is How to stop the second thread. 
Since > that thread is blocked listening on the socket connection, I can not use > the wait() and notify() method to stop it. Furthermore since Thread.stop > is deprecated in Java2, I seem to be in a quandary!! > > Any suggestions, would be most appreciated. > > btw. I was thinking of using the socket time-out feature and then after > checking for some state variable indicating a "disconnect" request, > going back to listening on the socket again, but this approach just does > not seem very "clean" to me. > > Regards > Nader [For Java 2, this works just fine. See the Java code example ServerInterrupt on this web page. -Bil] Yes, we had the same problem. interrupt() doesn't work reliably if the thread is blocked reading from a socket. Setting a variable was also not very "clean", since you then also have to set a timeout for reading. I did it this way: I opened the socket in an upper thread and passed it to the receiving thread. When I want to stop the thread, I simply close the socket. This causes the blocking read method to throw an exception that can be caught. So the thread can end in a clean way. This is also the method suggested by SUN. It seems that there is no better solution. greetings Charly >This is also the method suggested by SUN. It seems that there is no >better solution. Despite being recommended by Sun (where do they recommend this?) it is not guaranteed to work on all platforms. On some systems closing the Java socket does not kick the blocked thread off. Such behaviour is not currently required by the API specs. David =================================TOP=============================== Q230: Emulating process shared threads "D. Emilio Grimaldo Tunon" wrote: > I was wondering if there is a way to emulate process shared > mutexes and condition variables when the OS supports Posix > threads but *not* process shared items? 
I know I can test > for _POSIX_THREAD_PROCESS_SHARED, but if it is not defined, > meaning that THAT is not implemented, then what are my > alternatives? of course assuming there WILL be two processes > wanting to share a mutex/condition variable. Depends on your requirements, and how much work you want to do. First, you could just use some existing cross-process mechanism to synchronize. You could use a POSIX (or SysV) semaphore. A message queue. You could use a pipe -- threads try to acquire the lock by reading, and "release" the lock by writing (unblocking one reader). You could even create a file with O_EXCL, and retry periodically until the owner releases the lock by deleting the file. You COULD emulate a mutex and condition variable yourself using (completely nonportable) synchronization instructions, in the shared memory region, and some arbitrary "blocking primitive" (a semaphore, reading from a pipe to block and write to unblock, etc.) It can be a lot of work, but it can be done. There are a million alternatives. You just need to decide how important the performance is, and how much time you're willing to spend on it. You might also keep in mind that a few systems already support UNIX98 (Single UNIX Specification, Version 2), and the others will as soon as the usual morass of overloaded and conflicting product requirements allows. UNIX98 REQUIRES implementation of the "pshared" option. /---------------------------[ Dave Butenhof ]--------------------------\ | Compaq Computer Corporation [email protected] | | 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof | | Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----------------[ Better Living Through Concurrency ]----------------/ =================================TOP=============================== Q231: TLS in Win32 using MT run-time in dynamically loaded DLLs? In article <[email protected]>, Mike Smith wrote: >That's a mouthful! :-) > >Let me try again. 
I'm writing a Win32 DLL that will be loaded dynamically >(i.e. via LoadLibrary()). This DLL will spawn multiple concurrent instances >of the same thread, each of which must have some local variables. I'd >prefer if possible to use the runtime library _beginthread() or >_beginthreadex() rather than the Win32 functions (CreateThread() etc.) >Meanwhile, the docs for LoadLibrary() that come with VC++6 seem to indicate >that dynamically loaded DLLs cannot have thread-local storage, at least not >provided by the run-time library. If you are talking about the Microsoft language extension __declspec(thread) // or however you spell it you should probably not be using it in the first place. It's best to get the job done using the standard language syntax as much as possible and stay away from compiler extensions. There is an alternative way to manage thread-local storage. Have a look at TlsAlloc() and friends. This API is a pale imitation of the POSIX thread-specific keys facility, but it gets the job done. You CAN use thread-specific storage in DLLs if you use TlsAlloc(). Even though TlsAlloc() lacks the cleanup facility that its POSIX counterpart offers, if you are writing your code strictly as a DLL you can hack in your own cleanup and destruction of thread-specific data, since your DllMain is called each time a thread is created or destroyed in the process. >Has anybody run across this situation before? How did you handle it? I was >thinking about allocating the worker thread's local storage prior to >starting the thread, then passing a pointer to the memory in the thread >function's (void *) parameter. Better ideas? I usually do this dynamically. If an object requires a thread-specific pointer to something, I will create the index (or key, in POSIX terminology) when that object is constructed. Then as threads use the object, they each initialize their corresponding slot when they detect that it's null. 
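The "initialize the slot when it is found to be null" pattern can be sketched with the POSIX thread-specific keys facility that the answer compares the Win32 API against. This is a minimal illustration using POSIX calls rather than the Win32 ones, and every name in it (counter_key, my_counter(), worker(), tsd_demo()) is hypothetical. On Win32 the same shape would be built on the TLS index functions, with cleanup done by hand as described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

static pthread_key_t  counter_key;                  /* the "index" or "key" */
static pthread_once_t counter_once = PTHREAD_ONCE_INIT;

/* Destructor: POSIX runs this at thread exit for each non-NULL slot. */
static void counter_destroy(void *p)
{
    free(p);
}

static void counter_make_key(void)
{
    int rc = pthread_key_create(&counter_key, counter_destroy);
    assert(rc == 0);
    (void)rc;
}

/* Return the calling thread's private counter, creating it on first use. */
static int *my_counter(void)
{
    int *slot;

    pthread_once(&counter_once, counter_make_key);  /* create the key once */
    slot = pthread_getspecific(counter_key);
    if (slot == NULL) {                             /* first use in this thread */
        slot = calloc(1, sizeof *slot);
        if (slot != NULL && pthread_setspecific(counter_key, slot) != 0) {
            free(slot);
            slot = NULL;
        }
    }
    return slot;
}

/* Hypothetical worker: bump the private counter "arg" times, report it. */
static void *worker(void *arg)
{
    intptr_t i, times = (intptr_t)arg;
    int *ctr = my_counter();

    if (ctr == NULL)
        return (void *)(intptr_t)-1;
    for (i = 0; i < times; i++)
        ++*ctr;
    return (void *)(intptr_t)*my_counter();         /* same slot on second lookup */
}

/* Run two workers; each should see only its own count. */
int tsd_demo(long a, long b, long *out_a, long *out_b)
{
    pthread_t ta, tb;
    void *ra, *rb;

    if (pthread_create(&ta, NULL, worker, (void *)(intptr_t)a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, worker, (void *)(intptr_t)b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, &ra);
    pthread_join(tb, &rb);
    *out_a = (long)(intptr_t)ra;
    *out_b = (long)(intptr_t)rb;
    return 0;
}
```

Because each thread gets its own slot, neither worker needs a mutex around its counter, and the destructor frees the slot when the thread exits.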
=================================TOP=============================== Q232: Multithreaded quicksort Gordon Mueller wrote in message <[email protected]>... > > I'm looking for a multi-threaded/parallel implementation > (source or detailed description) of the famous quicksort > algorithm. I'd like to specify a maximum number of k (moderate) > processors/threads and I'm looking for linear speed-up, of course. Have a look at Chap. 20 in my book, "C Interfaces and Implementations: Techniques for Creating Reusable Software "(Addison-Wesley Professional Computing Series, 1997, ISBN 0-201-49841-3); there's a multi-threaded implementation of quicksort in Sec. 20.2.1. The source code is available on line; see http://www.cs.princeton.edu/software/cii/. dave hanson ================ Parallel quicksort doesn't work all that well; I believe the speedup is limited to something like 5 or 6 regardless of the number of processors. You should be able to find a variety of parallel sorting algorithms using your favourite search engine. One you may want to look at is PSRS (Parallel Sorting by Regular Sampling), which works well on a variety of parallel architectures and isn't really difficult conceptually. You can find some paper describing it at http://www.cs.ualberta.ca/~jonathan/Papers/par.1993.html http://www.cs.ualberta.ca/~jonathan/Papers/par.1992.html Steve -- -- Steve MacDonald, Ph.D. Candidate | Department of Computing Science [email protected] | University of Alberta http://www.cs.ualberta.ca/~stevem | Edmonton, Alberta, CANADA T6G 2H1 =================================TOP=============================== Q233: When to unlock for using pthread_cond_signal()? POSIX specifically allows that a condition variable may be signalled or broadcast with the associated mutex either locked or unlocked. (Or even locked by someone else.) It simply doesn't matter. At least, signalling while not holding the mutex doesn't make the program in any way illegal. 
A condition variable is just a communication mechanism to inform waiters of changes in shared data "predicate" conditions. The predicate itself IS shared data, and must be changed in a way that's thread-safe. In most cases, this means that you must hold the mutex when you change the data. (But you could also have a predicate like "read() returns data", so that you could write data, signal the condition variable -- and the waiter(s) would simply loop on the condition wait until read() returns some data.) The signal doesn't need to be synchronized with the predicate value. What you DO need to synchronize is SETTING the predicate and TESTING the predicate. Given that basic and obvious requirement (it's shared data, after all), the condition variable wait protocol (testing the predicate in a loop, and holding the mutex until the thread is blocked on the condition variable) removes any chance of a dangerous race. However, your scheduling behavior may be "more predictable" if you signal a condition variable while holding the mutex. That may reduce some of the causes of "spurious wakeups", by ensuring that the waiter has a slightly better chance to get onto the mutex waiter list before you release the mutex. (That may reduce the chance that some other thread will get the mutex, and access to the predicate, first... though there are no guarantees.) (There's a lot more about this in my book, information on which can be found through the www.awl.com link in my signature way down there at the bottom.) > You see, pthread_cond_signal has no effect if nobody is actually waiting > on the condition variable. There is no ``memory'' inside a condition variable > that keeps track of whether the variable has been signalled. Signalling > a condition variable is like shouting. If nobody is around to hear the > scream, nothing happens. 
> > If you don't hold the lock, your signal could be delivered just after another > thread has decided that it must wait, but just before it has actually > performed the wait. In this case, the signal is lost and the thread will wait > for a signal that never comes. This would be true if you failed to hold the lock when SETTING the predicate. But that has nothing to do with SIGNALLING the condition variable. Either the predicate is changed before the waiter's predicate test, or it cannot be changed until after the waiter is "on" the condition variable, in position to be awakened by a future signal or broadcast. You are correct that signalling (or broadcasting) a condition variable with no waiters "does nothing". That's good -- there's nothing FOR it to do. /---------------------------[ Dave Butenhof ]--------------------------\ | Compaq Computer Corporation [email protected] | | 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof | | Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----------------[ Better Living Through Concurrency ]----------------/ =================================TOP=============================== Q234: Multi-Read One-Write Locking problem on NT Alok Tyagi wrote: > We are encountering a problem with MRSW (Multi-Read Single Write) or > SWMR/MROW Locks on Windows NT :- > > We have our own MRSW functionality implemented using mulitiple Semaphores. > We are experiencing a problem when > process holding a shared lock dies ungracefully and consequently, no other > processes requesting the exclusive access succeed until the MRSW resource is > removed and re-created. On Unix platforms, the OS SEM_UNDO mechanism can be > used. Are you aware of any solution to this problem on NT? > > TIA, > > --alok Hi, It turns out that ntdll.dll provides undocumented MRSW support which you might find of interest. There is an article on it in the Jan. 1999 edition of the Windows Developer's Journal (www.wdj.com). 
I have not used it myself, but it looks interesting. If you understandably feel a bit shaky about using an undocumented Microsoft feature, the article provides an insight into how the MRSW lock is implemented. Hope this is of help.. Kevin =================================TOP=============================== Q235: Thread-safe version of flex scanner In article <[email protected]>, Donald A. Thompson wrote: % I am looking for a version of the flex program that produces a thread-safe % lex.yy.c. The version on this system (2.5) has a -+ option, which produces a C++ scanner class which is said to be re-entrant. % Alternatively, I'd like some tips on how to make the lex.yy.c % thread-safe. You need to re-implement the input routines, and change the interface to yylex so that things like state variables, yyleng and yytext, and many of those other globals are passed on the stack. You don't have to worry about the character class tables, since they're read-only, but pretty much everything else needs to be put through the call stack. You then need to create a skeleton file with your changes and tell flex to use it instead of its default one. This is a big job, so you might think about either using the scanner class from the -+ option, or having only one thread run the scanner, and perhaps generate a byte-code which can be run by other threads. -- Patrick TJ McPhee East York Canada [email protected] =================================TOP=============================== Q236: POSIX standards, names, etc Jason L Reisman wrote: >Hi, >I am new to the world of POSIX and am interested in finding out all I >can before starting to code. >I have a few questions regarding the standard. Any help would be >greatly appreciated. > >(1) When looking up information on POSIX, I found POSIX.1, POSIX.4, etc. > What do the numbers mean? Are they indexes to different libraries or >different versions? Lessee... This is complex, due to the history of the thing. 
POSIX.1 is really POSIX 1003.1, which is *the* POSIX standard (i.e. for Portable Operating System Interfaces). POSIX 1003.1 comes in several flavors, which are dated. The original is 1003.1-1990. The realtime interface, which was known during its development as 1003.4 and then 1003.1b, was combined into 1003.1, and the resulting spec was 1003.1-1994. Then the threads interface, which was known during development as 1003.4a, was renamed to 1003.1c, and then combined (with a technical corrigendum to .1b) with 1003.1-1994 to produce 1003.1-1996. And yes, it's ugly. Here's a lousy attempt at a picture. Time increases from left to right. If you're viewing this in something that doesn't display news articles in a fixed-pitch font, it won't make sense.

1003.4 --+------ 1003.4a ---+
         |                  |
         +- 1003.1b ----+   +- 1003.1c ----+
                        |                  |
1003.1 -----------------+-- 1003.1 ----+---+-- 1003.1 --- . . . (+.1a? etc)
 1990                        1994      |        1996
                   1003.1i ------------+
                   (technical corrections to .1b)

1003.1 is the base.
1003.4 was "realtime extensions", and originally included threads. Threads were broken out to smooth the merges.
1003.1b is the realtime API amendment to 1003.1
1003.1c is the threads API amendment to 1003.1
1003.1a is the amendments for symbolic links, coming very soon.
And the lettering indicates only when the projects were started, nothing more. >(2) Do POSIX sockets exist? A better way to say this is there a >standard interface (either created or supported by POSIX) to open and >maintain a socket? There is (yet another) set of amendments to 1003.1, known as 1003.1g, for this. I haven't looked at the drafts to see what the interface looks like, though. >(3) How compatible are pthreads between NT and Solaris (or any flavor of >UNIX for that matter)? If you have an NT pthreads implementation, I would hope that they're quite similar. Note that POSIX makes no requirements that threads be preemptive, unless certain scheduling options are supported, and the application requests them. 
This is commonly known as "the Solaris problem." >(4) Are there any recommended books for POSIX beginners (who already >know how to program)? Dave Butenhof's book, Programming with POSIX Threads, ISBN 0-201-63392-2, is quite good. In fact, I'd call it excellent, and that's not said lightly. -- Steve Watt KD6GGD PP-ASEL-IA ICBM: 121W 56' 58.1" / 37N 20' 14.2" Internet: steve @ Watt.COM Whois: SW32 Free time? There's no such thing. It just comes in varying prices... =================================TOP=============================== Q237: Passing ownership of a mutex? [See the code example for FIFO Mutexes on this web page. They *may* do what you want. -Bil] "Fred A. Kulack" wrote: > You can't portably unlock a mutex in one thread that was locked by another > thread. Fred's absolutely correct, but, as this is a common problem, I'd like to expand and stress this warning. The principal attribute of a mutex is "exclusive ownership". A locked mutex is OWNED by the thread that locked it. To attempt to unlock that mutex from another thread is not merely "nonportable" -- it is a severe violation of POSIX semantics. Even if it appears to work on some platforms, it does not work. You may not be getting correct memory visibility, for example, on SMP RISC systems. While POSIX does not require that any implementation maintain the ID of the owning thread, any implementation may do so, and may check for and report the optional POSIX errors for illegal use of the mutex. The Single UNIX Specification, Version 2 (SUSV2, or UNIX98), adds support for various "types" of mutexes, and even POSIX provides for optional thread priority protection through mutex use. Most of these enhanced mutex attributes require that the implementation keep track of mutex ownership, and most implementations that track ownership will report violations. A program that attempts to lock a mutex in one thread and unlock it in another is incorrect, not just "potentially nonportable". 
It probably doesn't work even where the implementation fails to warn you of your error. Don't even think about doing that to yourself! If you really want to "hand off" a lock from one thread to another, you don't want to use a mutex. (This is almost always a sign that there's something wrong with your design, by the way, but to all rules there are exceptions.) Instead, use a binary semaphore. A semaphore doesn't have any concept of "ownership", and it's legal to "lock" a semaphore (e.g., with the POSIX operation sem_wait()) in one thread and then "unlock" the semaphore (sem_post()) in another thread. Legal... but is it correct? Well, remember that the synchronization operations are also memory visibility points. If you lock a semaphore in one thread, and then modify shared data, you need to be sure that something else (other than the unlock you're not doing) will ensure that the data is visible to another thread. For example, if you lock a semaphore, modify shared data, and then create another thread that will "own" the semaphore, the visibility rules ensure that the created thread will have coherent memory with its creator; and the semaphore unlock will pass on that coherency to the next thread to lock the semaphore. /---------------------------[ Dave Butenhof ]--------------------------\ | Compaq Computer Corporation [email protected] | | 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof | | Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----------------[ Better Living Through Concurrency ]----------------/ =================================TOP=============================== Q238: NT fibers =================================TOP=============================== Q239: Linux (v.2.0.29 ? Caldera Base)/Threads/KDE jimmy wrote: > I have Caldera linux (version, if I remember it right 2.0.29, or close > to that.) My questions: can I use threads with it? (It didn't come with > the package, that's for sure.) 
If so, where would I get it, and what > else do I need in order to start using them? Is there any problem with > using KDE with threads? Finally, I read somewhere that g++ doesn't like > threads--is that right? Am I limited to C if I use threads? You are certainly not going to be able to use POSIX threads with Caldera. The "linuxthreads" package comes with the glibc library. Caldera is still libc5 based. That having been said, there is no reason why you can't use threads with KDE. Do you plan on *writing* KDE applications? If so, you need to determine whether or not Qt is thread-safe. Lastly, there is no limitation on using only C when using POSIX threads. In fact there is already a small C++ threads package called TThreads which can provide some easier interface to the threading library. (I think they've implemented it wrong, but it's still quite usable.) Paul Braman [email protected] =================================TOP=============================== Q240: How to implement user space cooperative multithreading? wrote: >Thanks for the help! > >1. My goal is to find a way to implement user space cooperative >multithreading ( the reason is it should work with hardware description >language which is event driven and basically serial ). The attached file >context.c shows my basic understanding. Main() calls ping() and pong() and >these two functions have private stacks so they can run independently. My >question is how to avoid those stacks ( they are on heap not on the process's >stack so it's not under control of OS kernel ) get overflowed. Make the stacks sufficiently large, and watch your use of auto variables and recursion. You can also probably roll some special assertions that can be put at the start of a function that will go off when you are getting too close to collision. 
Write a function which uses some tricks to return the remaining stack space (for example, take the address of a local variable and then subtract the stack top, or vice versa, depending on whether stacks grow up or down). Assert if the stack space is down to a few K. Or you could allocate stacks in their own memory mapped regions which dynamically allocate pages as they are accessed, and therefore can be large without occupying virtual memory. >2. I don't understand how pre-emptive multithreading is implemented, >especially the implementation of pure user space multithreading. I understand >preemptive multitasking in process level -- kernel scheduler does the job. This requires some support from the operating system or from the bare hardware. I don't know about Win32. In UNIX, you can do it using alarm signals as the basis for pre-emption. Win32 doesn't implement signals properly; Microsoft has just paid some token attention to them, because the header and the signal() function are part of ANSI C. In UNIX, signal delivery does most of the work already for implementing pre-emptive multi-tasking within a single process. The signal mechanism already takes care of saving the machine context. To do the remainder of a context switch in a signal handler, all you really have to do is switch the procedure context only---usually just the stack pointer. so you need some simple context-switching function which takes a pointer to two stack context areas. It saves the current stack context into one area, and restores a previously stored context from the other area. (The other area is chosen by your scheduler routine, of course, because it determines which thread will run). The pre-emptive context switch essentially takes place when a signal handler occurs in one thread, saves its stack context information, restores the stack context info of another thread and executes a return. 
Later on, the thread's turn will come up again, and it will return from the same signal handler which pre-empted it earlier. The detailed machine context is actually stored in the signal stack; the precise state of the CPU registers is restored when you resume a thread by having it return from the signal handler. Also, you still have to take care of voluntary context switches, which don't go through the signal mechanism. The entry point to the voluntary reschedule context switch has to ensure that it saves at least those registers that the compiler's conventions require a called function to preserve (usually called ``callee saved''). When that thread is later restarted, it will execute a return from that voluntary switch function, and any registers that the compiler expects not to be clobbered must be restored. These could just be saved on the stack, or they could be in the same context area where the aforementioned stack context is put. (It's best to not cut corners and write an earnest routine that performs a complete machine context switch, even though this information is already in the signal stack, and even though some registers may be clobbered during a function call.) The context switch looks like this (in pseudocode) and must be written in assembly language, since C doesn't give you access to registers:

void context_switch(void *old_context, void *new_context)
{
    /* save all registers of the running thread into old_context */
    /* restore the registers previously saved in new_context */
}

The registers do not include the instruction pointer, because that is stored on the stack of the context_switch() function. When the stack pointer is switched, the function will return in a stack frame different from the one it was called for, and the return address in this stack frame will restore the correct IP. On some RISC platforms, this stuff is tricky to do! Not all modern computing platforms give you the neat layout where you have simple stack frames and CPU registers. 
For example, above, I have assumed that the function return address is in the stack frame. In a RISC system, the return address might actually be in a register. And what about the SPARC architecture with register windows that supply much of the functionality of a stack frame? Therefore, take this advice: even if you write your own user-level threading package, save yourself the headache and steal someone else's context switch code. It's easy enough to write for a things like 68000 or 80x86, but forget about SPARC or PA-RISC. :) >??#@!, I don't have a clue, help! > >Jian > >#include >#include >#include > >#define STACK_SIZE 4096 > >int max_iteration; >int iter; > >jmp_buf jmp_main; >jmp_buf jmp_ping; >jmp_buf jmp_pong; These three buffers clearly form an extremely primitive process table. In the general case, you need a whole array of these, one for each thread. >void ping(void) { > int i = 0; > int j = 1000; > > if (setjmp(jmp_ping) == 0) > longjmp(jmp_main, 1); Right, here you are setting up the ping ``thread'' by saving its initial context. Later, you will resume the thread by doing a longjmp() to this context. In a real threading package, you would have some code for creating a thread which would allocate a free context, and then ``prime'' it so that the thread is started as soon as the scheduler chooses it. This priming is often a hack which tries to set up the thread context so that it looks like the thread existed before and voluntarily rescheduled. In other words, you might have to write things into the thread's stack which will fool context_switch() into executing a ``fake'' return to the entry point of the thread! I remember with fondness writing this type of fun code. :) One fine afternoon I wrote a tiny pre-emptive multi-tasking kernel on a PC using the DEBUG.COM's built in interactive assembler as my only development tool. The whole thing occupied only 129 bytes. 
That same day, I wrote some sample ``applications'' which animated things on the screen, as well as a KILL command to terminate threads. The scheduling policy was round-robin with no priorities. There was only one system call interrupt, ``reschedule'', which was really just a faked clock interrupt to invoke the scheduler. :) The KILL command worked by locating the kernel in memory by scanning through the interrupt chain, moving the last process in the table over top of the one being deleted and decrementing the process count---all with interrupts disabled, of course. The argument to KILL was the process ID, which was just its position in the table. This feature made it fun to guess which number to use to kill a particular process, since the relocated process inherited the ID of the killed process. :) It was the day after exams and I suddenly had nothing to do, you see, and was eager to combine the knowledge gleaned from a third-year operating systems class with obfuscated programming. :) > while (1) { > i += 2; > j += 2; Oops! The variables i and j will no longer have their correct values after you come back here from the longjmp. That's because setjmp and longjmp don't (necessarily) save enough context information to preserve the values of auto variables. (Say, did you try running it after compiling with lots of optimizations turned on?) Declaring i and j volatile might help here, because that should force the compiler to generate code which re-loads them from memory. Of course, the ANSI C definition of volatile is a little wishy washy. :) In un-optimized code, volatile is often redundant, because variables are not aggressively assigned to registers, which explains why code like this often works until a release version of the program is built with optimizations. Or, by chance, i and j may have been assigned to those ``callee saved'' registers that got implicitly saved and restored in the call to setjmp(). 
On 80x86 trash, there is such a dearth of registers that many compilers mandate that most of the registers must not be clobbered by a function. With GCC, I think that only EAX and EDX may be clobbered, though I don't recall exactly. This allows the generated code to keep as many temporary values in registers as is reasonably possible, at the cost of forcing called code to save and restore. What you really need is to forget setjmp() and longjmp() and get a real context switch routine. It shouldn't be hard to roll your own on Intel. (If you aren't using floating point math, you can get away with saving just the integer registers, so you don't have to mess with that awful floating point ``stack'' thing that some junkies at Intel dreamed up during a shared hallucination.) Anyway, I've ranted long enough about things that are probably of no interest to anyone, so good night. =================================TOP=============================== Q241: Tools for Java Programming In article <[email protected]>, Bil Lewis writes: > I'm in the midst of finishing "Multithreaded Programming > with Java" and am working on a short section covering which > companies have interesting tools for debugging, testing, > analyzing, etc. MT programs. > > Obviously, I am covering Sun's JWS, Numega's JCheck, and > Symantec's Java Cafe. Are the other products that I should > know about? > > -Bil Bil - Parasoft has a Java analyzer. I haven't used it, but if it's as good as their C version (Insure++), it's probably worth writing about. I think they have a "free" trial period too. Look at www.parasoft.com for more information. Sue Gleeson > >What I was wondering was if there was a tool (a lint sorta thing) > >available that would go through code and flag trouble spots, like global > >data usage, and static local data, etc. I of course don't expect the > > That tool is your brain! If we are talking about C, you can look for > static data by using grep to look for the string ``static''. 
That > is easy enough. Unfortunately, "brains" are notoriously poor at analyzing concurrency, and this is exactly the kind of problem that automated analysis and testing tools are likely to handle better than people. Not as a substitute for reasoning of course, but as a significant aid (just like lint). I'm aware of at least two commercial tools that test for race conditions and/or deadlocks: AssureJ for Java (http://www.kai.com/assurej/) and Lock Lint for C (http://www.sun.com/workshop/threads/doc/locklint.ps). My understanding is that there will soon be other tools in this space as well. > >tool to fix any of the problems, nor really even know for sure when a > >problem is a problem, but just flag that there may be a problem. > > It's hard enough to automate the correctness verification of ordinary > single-threaded logic. The halting problem tells us that this isn't even > possible in the general case. Luckily, a tool doesn't have to verify the correctness of software to be useful. For example, it can just check that the observed locking pattern is consistent with a "safe" locking discipline (e.g. a particular piece of shared data is always protected by the same lock). Some folks at DEC SRC (now Compaq SRC) and I built a tool like this that was extremely effective at finding race conditions. See http://www.cs.washington.edu/homes/savage/papers/sosp97.ps for details. - Stefan =================================TOP=============================== Q242: Solaris 2.6, pthread_cond_timedwait() wakes up early This may not answer the question, but it could solve the problem ! You can change the timer resolution in Solaris 2.6 and 2.7 by putting this in /etc/system and rebooting:

    set hires_tick = 1

This sets the system hz value to 1000. Mark John Garate wrote in message <[email protected]>... >For PTHREAD_PROCESS_SHARED condition variables, pthread_cond_timedwait() >timeouts >occur up to about 20ms prior to the requested timeout time (sample code >below). I wasn't >expecting this. 
I realize clock ticks are at 10ms intervals, but I >expected my timeout to occur at >the soonest tick AFTER my requested timeout, not before. Were my >expectations out of line? > >cc -mt -lposix4 testwait.c >
>/* testwait.c */
>#define _POSIX_C_SOURCE 199506L
>#include <errno.h>
>#include <pthread.h>
>#include <stdio.h>
>#include <stdlib.h>
>#include <time.h>
>
>pthread_cond_t cv;
>pthread_mutex_t mutex;
>
>int main(int argc, char *argv[]) {
>  pthread_condattr_t cattr;
>  pthread_mutexattr_t mattr;
>  timespec_t ts_now, ts_then;
>  int timed_out;
>  int rc;
>
>  /* condition variable: wait awakes early if PROCESS_SHARED */
>  if(pthread_condattr_init(&cattr)) exit(-1);
>  if(pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED)) exit(-1);
>  if(pthread_cond_init(&cv, &cattr)) exit(-1);
>  if(pthread_condattr_destroy(&cattr)) exit(-1);
>
>  /* mutex: doesn't matter whether PROCESS_SHARED or not (only cv matters) */
>  if(pthread_mutexattr_init(&mattr)) exit(-1);
>  if(pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED)) exit(-1);
>  if(pthread_mutex_init(&mutex, &mattr)) exit(-1);
>  if(pthread_mutexattr_destroy(&mattr)) exit(-1);
>
>  /* calculate future timestamp */
>  clock_gettime(CLOCK_REALTIME,&ts_then);
>  ts_then.tv_sec+=1;
>
>  /* wait for that time */
>  timed_out = 0;
>  if(pthread_mutex_lock(&mutex)) exit(-1);
>
>  while(!timed_out) {
>    rc = pthread_cond_timedwait( &cv, &mutex, &ts_then );
>    clock_gettime(CLOCK_REALTIME,&ts_now);
>
>    switch(rc) {
>    case 0:
>      printf("spurious, in my case\n");
>      break;
>    case ETIMEDOUT:
>      timed_out=1;
>      break;
>    default:
>      printf("pthread_cond_timedwait failed, rc=%d\n",rc);
>      exit(-1);
>    } /* switch */
>  } /* while (!timed_out) */
>
>  pthread_mutex_unlock(&mutex);
>
>  /* did we wake-up before we wanted to? */
>  if (ts_now.tv_sec < ts_then.tv_sec ||
>      (ts_now.tv_sec == ts_then.tv_sec &&
>       ts_now.tv_nsec < ts_then.tv_nsec)) {
>    printf("ts_now  %10ld.%09ld\n", ts_now.tv_sec, ts_now.tv_nsec);
>    printf("ts_then %10ld.%09ld\n", ts_then.tv_sec, ts_then.tv_nsec);
>  }
>  return(0);
>} /* main */
>
=================================TOP=============================== Q243: AIX4.3 and PTHREAD problem In article <[email protected]>, Red Hat Linux User wrote: % After I sent this, I was talking to an IBM'er who was trying to convince me % to upgrade % our machines to AIX 4.3. He mentioned that 4.3 provides POSIX support at % level(?) 7 and % 4.2 at level 5. He said that he had to change some of his code because there Find him, and kick him. AIX 4.1 and 4.2 provide two thread libraries: one roughly implements a draft (7 if you must know) of the posix standard, and one implements DCE threads. AIX 4.3 implements the posix standard, and provides backwards compatibility with the other two libraries. There are slight changes from the draft support available in the earlier releases. If significant code changes were needed to compile on 4.3, the original code was probably written for DCE threads. It's not too difficult to keep track of this, and if you're selling the stuff you really have an obligation to at least try. Kick him hard. You _can_ run programs compiled on 4.1 without change on a 4.3 machine. -- Patrick TJ McPhee East York Canada =================================TOP=============================== Q244: Readers-Writers Lock source for pthreads In the hope someone may find it useful, here's an implementation of a readers-writers lock for PThreads. In this implementation writers are given priority. Compile with RWLOCK_DEBUG defined for verbose debugging output to stderr. This output can help track: 1. Mismatches (eg rwlock_ReadLock(); ... rwlock_WriteUnlock();) 2. Recursive locks (eg rwlock_ReadLock(); ... rwlock_ReadLock();) Amongst other things. 
The debugging output also includes the line numbers of where the lock was obtained (and released) for greater usefulness. The debugging mode has been implemented using thread specific data. Anyway, here's the source:

/* START: rwlock.h */
#ifndef __RWLOCK_H__
#define __RWLOCK_H__
/*
 * $Id: rwlock.h,v 1.8 1999/02/27 14:19:35 lk Exp $
 *
 * Copyright (C) 1998-99 Lee Kindness
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */

#include <pthread.h>

struct rwlock;
typedef struct rwlock *rwlock_t;

#define RWLOCK_DEBUG 2

void rwlock_Init(rwlock_t rwl);
rwlock_t rwlock_InitFull(void);
void rwlock_Destroy(rwlock_t rwl, int full);

#ifdef RWLOCK_DEBUG
void rwlock_ReadLockD(rwlock_t rwl, char *f, int l);
void rwlock_ReadUnlockD(rwlock_t rwl, char *f, int l);
void rwlock_WriteLockD(rwlock_t rwl, char *f, int l);
void rwlock_WriteUnlockD(rwlock_t rwl, char *f, int l);
# define rwlock_ReadLock(R) rwlock_ReadLockD(R, __FILE__, __LINE__)
# define rwlock_ReadUnlock(R) rwlock_ReadUnlockD(R, __FILE__, __LINE__)
# define rwlock_WriteLock(R) rwlock_WriteLockD(R, __FILE__, __LINE__)
# define rwlock_WriteUnlock(R) rwlock_WriteUnlockD(R, __FILE__, __LINE__)
#else
void rwlock_ReadLock(rwlock_t rwl);
void rwlock_ReadUnlock(rwlock_t rwl);
void rwlock_WriteLock(rwlock_t rwl);
void rwlock_WriteUnlock(rwlock_t rwl);
#endif

#endif /* __RWLOCK_H__ */
/* END: rwlock.h */

/* START rwlock.c */
/*
 * $Id: rwlock.c,v 1.9 1999/02/27 14:19:35 lk Exp $
 *
 * Routines to implement a read-write lock. Multiple readers or one writer
 * can hold the lock at once. Writers are given priority over readers.
 * When compiled with RWLOCK_DEBUG defined verbose debugging output is
 * produced which can help track problems such as mismatches and
 * recursive locks.
 *
 * Copyright (C) 1998-99 Lee Kindness
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */

#include <stdio.h>
#include <stdlib.h>
#include "rwlock.h"

struct rwlock
{
#ifdef RWLOCK_DEBUG
    pthread_key_t key;
#endif
    pthread_mutex_t lock;
    pthread_cond_t rcond;
    pthread_cond_t wcond;
    int lock_count;        /* > 0: number of readers, -1: one writer, 0: free */
    int waiting_writers;
};

#ifdef RWLOCK_DEBUG
struct LockPos
{
    int type;
    char *file;
    int line;
    pthread_key_t key;
};

static void rwlocki_WarnNoFree(void *arg);
#endif

static void rwlocki_WaitingReaderCleanup(void *arg);
static void rwlocki_WaitingWriterCleanup(void *arg);

/*
 * rwlock_InitFull()
 *
 * Allocate the memory for, and initialise, a read-write lock.
 */
rwlock_t rwlock_InitFull(void)
{
    rwlock_t ret;

    if( (ret = calloc(1, sizeof(struct rwlock))) )
        rwlock_Init(ret);
    return( ret );
}

/*
 * rwlock_Init()
 *
 * Initialise a static, or otherwise allocated, read-write lock.
 */
void rwlock_Init(rwlock_t rwl)
{
#ifdef RWLOCK_DEBUG
    pthread_key_create(&rwl->key, rwlocki_WarnNoFree);
#endif
    pthread_mutex_init(&rwl->lock, NULL);
    pthread_cond_init(&rwl->wcond, NULL);
    pthread_cond_init(&rwl->rcond, NULL);
    rwl->lock_count = 0;
    rwl->waiting_writers = 0;
}

/*
 * rwlock_Destroy()
 *
 * Free all memory associated with the read-write lock.
 */
void rwlock_Destroy(rwlock_t rwl, int full)
{
#ifdef RWLOCK_DEBUG
    pthread_key_delete(rwl->key);
#endif
    pthread_mutex_destroy(&rwl->lock);
    pthread_cond_destroy(&rwl->wcond);
    pthread_cond_destroy(&rwl->rcond);
    if( full )
        free(rwl);
}

/*
 * rwlock_ReadLock()
 *
 * Obtain a read lock.
 */
#ifdef RWLOCK_DEBUG
void rwlock_ReadLockD(rwlock_t rwl, char *f, int l)
{
    struct LockPos *d;

    if( (d = (struct LockPos *)pthread_getspecific(rwl->key)) )
    {
        fprintf(stderr, "RWL %p %s:%d already has %s lock from %s:%d\n",
                rwl, f, l, d->type ? "write" : "read", d->file, d->line);
        /* but we'll carry on anyway, and muck everything up... */
    }
    if( (d = malloc(sizeof(struct LockPos))) )
    {
        /* init the TSD */
        d->type = 0; /* read */
        d->file = f;
        d->line = l;
        d->key = rwl->key;
        /* and set it */
        pthread_setspecific(rwl->key, d);
#if RWLOCK_DEBUG == 2
        fprintf(stderr, "RWL %p %s:%d read lock pre\n", rwl, f, l);
#endif
    }
    else
        fprintf(stderr, "RWL %p %s:%d cannot alloc memory!\n", rwl, f, l);
#else
void rwlock_ReadLock(rwlock_t rwl)
{
#endif
    pthread_mutex_lock(&rwl->lock);
    pthread_cleanup_push(rwlocki_WaitingReaderCleanup, rwl);
    /* wait while a writer holds the lock OR any writer is waiting for it */
    while( (rwl->lock_count < 0) || (rwl->waiting_writers) )
        pthread_cond_wait(&rwl->rcond, &rwl->lock);
    rwl->lock_count++;
    /* Note that the pthread_cleanup_pop subroutine will
     * execute the rwlocki_WaitingReaderCleanup routine */
    pthread_cleanup_pop(1);
#ifdef RWLOCK_DEBUG
    fprintf(stderr, "RWL %p %s:%d read lock\n", rwl, f, l);
#endif
}

/*
 * rwlock_ReadUnlock()
 *
 * Release a read lock
 */
#ifdef RWLOCK_DEBUG
void rwlock_ReadUnlockD(rwlock_t rwl, char *f, int l)
{
    struct LockPos *d;
#else
void rwlock_ReadUnlock(rwlock_t rwl)
{
#endif
    pthread_mutex_lock(&rwl->lock);
    rwl->lock_count--;
    if( !rwl->lock_count )
        pthread_cond_signal(&rwl->wcond);
    pthread_mutex_unlock(&rwl->lock);
#ifdef RWLOCK_DEBUG
    if( (d = pthread_getspecific(rwl->key)) )
    {
        if( d->type == 0 )
            fprintf(stderr, "RWL %p %s:%d read unlock at %s:%d\n",
                    rwl, d->file, d->line, f, l);
        else
            fprintf(stderr, "RWL %p %s:%d mismatch unlock %s:%d\n",
                    rwl, d->file, d->line, f, l);
        free(d);
        pthread_setspecific(rwl->key, NULL);
    }
    else
        fprintf(stderr, "RWL %p %s:%d read unlock with no lock!\n", rwl, f, l);
#endif
}

/*
 * rwlock_WriteLock()
 *
 * Obtain a write lock
 */
#ifdef RWLOCK_DEBUG
void rwlock_WriteLockD(rwlock_t rwl, char *f, int l)
{
    struct LockPos *d;

    if( (d = (struct LockPos *)pthread_getspecific(rwl->key)) )
    {
        fprintf(stderr, "RWL %p %s:%d already has %s lock from %s:%d\n",
                rwl, f, l, d->type ? "write" : "read", d->file, d->line);
        /* but we'll carry on anyway, and muck everything up... */
    }
    if( (d = malloc(sizeof(struct LockPos))) )
    {
        /* init the TSD */
        d->type = 1; /* write */
        d->file = f;
        d->line = l;
        d->key = rwl->key;
        /* and set it */
        pthread_setspecific(rwl->key, d);
#if RWLOCK_DEBUG == 2
        fprintf(stderr, "RWL %p %s:%d write lock pre\n", rwl, f, l);
#endif
    }
    else
        fprintf(stderr, "RWL %p %s:%d cannot alloc memory!\n", rwl, f, l);
#else
void rwlock_WriteLock(rwlock_t rwl)
{
#endif
    pthread_mutex_lock(&rwl->lock);
    rwl->waiting_writers++;
    pthread_cleanup_push(rwlocki_WaitingWriterCleanup, rwl);
    while( rwl->lock_count )
        pthread_cond_wait(&rwl->wcond, &rwl->lock);
    rwl->lock_count = -1;
    /* Note that the pthread_cleanup_pop subroutine will
     * execute the rwlocki_WaitingWriterCleanup routine */
    pthread_cleanup_pop(1);
#ifdef RWLOCK_DEBUG
    fprintf(stderr, "RWL %p %s:%d write lock\n", rwl, f, l);
#endif
}

/*
 * rwlock_WriteUnlock()
 *
 * Release a write lock
 */
#ifdef RWLOCK_DEBUG
void rwlock_WriteUnlockD(rwlock_t rwl, char *f, int l)
{
    struct LockPos *d;
#else
void rwlock_WriteUnlock(rwlock_t rwl)
{
#endif
    pthread_mutex_lock(&rwl->lock);
    rwl->lock_count = 0;
    if( !rwl->waiting_writers )
        pthread_cond_broadcast(&rwl->rcond);
    else
        pthread_cond_signal(&rwl->wcond);
    pthread_mutex_unlock(&rwl->lock);
#ifdef RWLOCK_DEBUG
    if( (d = pthread_getspecific(rwl->key)) )
    {
        if( d->type == 1 )
            fprintf(stderr, "RWL %p %s:%d write unlock at %s:%d\n",
                    rwl, d->file, d->line, f, l);
        else
            fprintf(stderr, "RWL %p %s:%d mismatch unlock %s:%d\n",
                    rwl, d->file, d->line, f, l);
        free(d);
        pthread_setspecific(rwl->key, NULL);
    }
    else
        fprintf(stderr, "RWL %p %s:%d write unlock with no lock!\n", rwl, f, l);
#endif
}

static void rwlocki_WaitingReaderCleanup(void *arg)
{
    rwlock_t rwl;

    rwl = (rwlock_t)arg;
    pthread_mutex_unlock(&rwl->lock);
}

static void rwlocki_WaitingWriterCleanup(void *arg)
{
    rwlock_t rwl;

    rwl = (rwlock_t)arg;
    rwl->waiting_writers--;
    if( (!rwl->waiting_writers) && (rwl->lock_count >= 0) )
        /*
         * This only happens if we have been cancelled: wake the
         * readers that were held back by this waiting writer.
         */
        pthread_cond_broadcast(&rwl->rcond);
    pthread_mutex_unlock(&rwl->lock);
}

#ifdef RWLOCK_DEBUG
static void rwlocki_WarnNoFree(void *arg)
{
    struct LockPos *d = (struct LockPos *)arg;

    fprintf(stderr, "RWL 0 %s:%d exit during lock-unlock pair\n",
            d->file, d->line);
    /* clear the thread specific value before freeing what it points to */
    pthread_setspecific(d->key, NULL);
    free(d);
}
#endif
/* END rwlock.c */

=================================TOP=============================== Q245: Signal handlers in threads In article <[email protected]>, Thank you for posting an answer ! This is pretty tricky ... I'll give it a try .... Jeff Denham wrote: > Yes -- as I said recently in a post regarding a similar > question about sigwait() -- only the faulting thread can > catch its own synchronous signals/exceptions. > > You don't have to do the work strictly in a signal > handler, though. If you have a stack-based exception > handling package available to you, such as the > try/catch model in C++, you can handle the > synchronous exceptions in the exception handler. > This model essentially unwinds the > stack at the point the signal is caught > by a special handler and delivers it back > (close to) the original context and outside > of the signal-handler state. At this point, > you're at "thread level" and can > pretend you just returned from > a call to sigwait() ;^) > > (If I'm being overoptimistic about > actually being at thread level in > the catch() clause, someone please > correct me.) > > Here's a little example that catches > a SIGILL instruction on Solaris, > built using their V4.2 C++ compiler > and runtime: >
> #include <iostream.h>
> #include <signal.h>
>
> int junk = -1;
>
> class SigIll
> {
> public:
>     SigIll(void) {};
> };
>
> void ill(int sig)
> {
>     throw SigIll();
> }
>
> main()
> {
>     typedef void (*func)(int);
>     func f, savedisp;
>
>     savedisp = signal(SIGILL, ill);
>     try {
>         cout << "Issue illegal instruction...\n" << endl;
>         f = (func)&junk;
>         (*f)(1);
>     }
>     catch (SigIll &si) {
>         cout << "Exception!!!" << endl;
>     }
>     cout << "Survived!\n" << endl;
>     (void) signal(SIGILL, savedisp);
> }
>
> I'm hardly an expert with the exception stuff, so hopefully > Kaz and the gang will correct/fill-in for me. > > -- Jeff > __________________________________________________ > Jeff Denham ([email protected]) > > Bright Tiger Technologies: Resource-management software > for building and managing fast, reliable web sites > See us at http://www.brighttiger.com > > 125 Nagog Park > Acton, MA 01720 > Phone: (978) 263-5455 x177 > Fax: (978) 263-5547 =================================TOP=============================== Q246: Can a non-volatile C++ object be safely shared amongst POSIX threads? In message <[email protected]>, "David Holmes" wrote: >I tend to agree with Kaz - I'm unconvinced that there is some global law of >compilation that takes care of this. Whilst simple compilers would not >optimise across function calls because of the unknown effects of the >function, smarter compilers employing data flow analysis techniques, >whole-program optimisation etc may indeed make such optimisations - after >all pthread_mutex_lock() does not access the shared data and the compiler >(without thinking about threads) may assume that the shared data is thus >unaffected by the call and can be cached. Please see ISO/IEC 9899-1990, section 5.1.2.3, example 1. >Now maybe all that means is that smart compilers have to be thread-aware and >somehow identify the use of locks and thereby imply that data protected by >locks is shared and mustn't be optimised. There's no way a C/C++ compiler can know what data are "protected by locks" - there's no such thing as "locks" in either language. > But do the compiler writers know >this? I think perhaps the use of simple compilers allows us to currently get >away with this. Would you care to name a couple of "simple" compilers? 
Anyway, you can take my word for it - compiler writers are usually smart enough to know they are doing compilers for potentially multi-threaded environment. At least I know that gcc/egcs and SunSoft folks are. >David =================================TOP=============================== Q247: Single UNIX Specification V2 A web reference you may find useful is http://www.unix-systems.org/single_unix_specification_v2/xsh/threads.html This contains an overview of POSIX Threads (as contained in the Single UNIX Specification V2) and links to all the pthreads functions. You can even download a copy of the specification from that site (see http://www.unix-systems.org/go/unix ) =================================TOP=============================== Q248: Semantics of cancelled I/O (cf: Java) David Holmes wrote: > In Java there is currently a problem with what is termed interruptible I/O. > The idea is that all potentially blocking operations should be interruptible > so that the thread does not block forever if something goes wrong. The idea > is sound enough (though timeouts would allow an alternative solution). > However Java VM's do not actually implement interruptible I/O except in a > very few cases. Discussion on the Javasoft BugParade indicates that whilst > unblocking the thread is doable on most systems, actually cancelling the I/O > request is not - consequently the state of the I/O stream is left in > indeterminate as far as the application is concerned > > This leads me to wonder how POSIX defines the semantics of cancellation when > the cancellation point is an I/O operation. Does POSIX actually specify what > the affects of cancellation are on the underlying I/O stream (device, > handle, whatever) or does it simply dictate that such operations must at > some stage check the cancellation status of the current thread? > > Thanks. POSIX hasn't a lot to say about the details of cancelled I/O. It has required and optional cancellation points. 
Most, if not all, of the required points are traditional blocking system calls. Most of the optional ones are library I/O routines. From my kernel and library experience, it's a lot easier to cancel the system calls than the library calls, because the library calls can hold internal I/O mutexes (yikes) across system calls. If that system call is canceled, the locks must be released. That means the library has to have cleanup handlers in stdio and elsewhere -- doable but potentially costly in implementation and performance. At Digital, last I knew, we were planning to support the optional points in the future (past V5.0?). Don't know the current status. In practice, the semantics of syscall cancellation are pretty much those of signals (and in a number of implementations I know of, pthread_cancel() involves some kind of specialized signal). In other words, if you're blocked in a read() system call, and a SIGXXX arrives, you'll be broken out of the sleep, and, if a signal handler is present, take the signal. If SA_RESTART is not on for the signal, the read() call returns with status -1/errno EINTR. The outcome of the I/O operation is undefined. In the case of cancellation, the error return path from the system call is redirected to a special cancellation handler in the threads library, which starts the process of propagating the cancel down the calling thread's stack. When I implemented system call cancellation on Digital UNIX, I followed this signal model, which applies only to *interruptible* sleeps in the kernel. If there's actual physical I/O in progress, the blocking in the kernel will be *uninterruptible*. This is the case when a physio() operation is in progress, meaning that the I/O buffer is temporarily wired into memory and that the thread calling read() cannot leave the kernel until the I/O completes and the memory is unwired. 
In these cases, the cancellation pends, just like a blocked signal, until the read() thread is about to exit the kernel, at which point the pending cancel is noticed and raised in the usual way. So, in the case of both an EINTR return and a cancel, the calling thread never has a chance to examine the outcome of the I/O operation. For a cancellation, the I/O may be complete, but the canceled thread will never see that fact directly, because its stack will be unwound by the cancellation handler past the point where the read() was called. I'm not sure whether this ramble is at all on point for you... There's probably nothing here you don't already know, but maybe there are a few useful hints. The bottom line is that most OSs offer very little in the way of canceling I/O that has already been launched. If you look at the AIO section of POSIX.1b, specifically at aio_cancel(), you'll notice that the implementation is not required to do anything in response to the cancellation request. The only real requirement that I recall is to return an AIO_CANCELED status on successful cancellation. But you may never get that back. (On Digital UNIX, you can cancel AIO that's still queued to libaio, but for kernel based AIO, you'll never successfully cancel -- the request is gone into the bowels of the I/O subsystem.) So, FWIW, sounds to me like you should map this Java I/O cancel thing right onto pthread cancellation... -- Jeff __________________________________________________ Jeff Denham ([email protected]) Bright Tiger Technologies: Resource-management software for building and managing fast, reliable web sites See us at http://www.brighttiger.com 125 Nagog Park Acton, MA 01720 Phone: (978) 263-5455 x177 Fax: (978) 263-5547 Jeff Denham wrote in message <[email protected]>... >So, FWIW, sounds to me like you should map this >Java I/O cancel thing right onto pthread cancellation... Thanks Jeff. You seemed to confirm basically what I thought. 
With the java situation there are problems both with implementing interruptions on different platforms and establishing what the semantics of interruptions are and how they can be used. Perhaps part of the problem is that in Java they have to both deal with the semantics at the lowest level of the API's (similar to the level POSIX works at) and at a higher level too. I was just curious how POSIX dealt with the issue - maybe the Java folk are worrying too much. FYI here's a snip from the relevant bug parade entry (4154947): Besides the above implementation issues, we also need to consider the usage of interruptable semantics. Considering when one user (Java) thread need to wake up another thread, (let me name it "Foo") which is blocked on the DataInputStream, which wraps SocketInputStream which wraps recv(). When the interrupt exception is thrown, the exception will be propagated all the way up to the user level. However the state of DataInputStream, SocketInputStream, recv() are possibly in unknown state. If the user ever want to resume the io operation later, he may get unknown data from stream, and get totally lost. So Foo has to remember to close the stream if he get interrupted. But in this way, the usability of interruptable is largely lost. It is much like the close() semantics of windows. When I use grep to search the entire build tree, the IOException appear at about 1600 places. There are 67 places catch IOException, but only 9 places catch InterruptedIOException in PrintStream and PrintWriter class. Generally, the InterruptedIOException is considered as IOException, treated as fatal error. Making InterruptedIOException to have resumption semantics will be extremely difficult on any platform, and will be against the semantics of Java language exception. But if we choose termination semantics, the interruptable io is very similar to the close() semantics. 
Thanks again, David =================================TOP=============================== Q249: Advice on using multithreading in C++? On Tue, 30 Mar 1999 09:55:30 +0100, Ian Collins wrote: >Paul Black wrote: >> >> Does anyone have any advice on using multithreading in C++? Searching around, >> I've noticed a book "OO multithreading in C++". The book seemed to get a >> mixed reaction on the online bookstores, is it a recommended read? Are there >> any other books or resources to be recommended? >> >A few guides: > >Use a static member function as the thread run function, pass it 'this' >in pthread_create and cast the thread void* back to the class to use it. > >Make sure you understand the relationship between key data and class >data. > >Take care with class destruction and thread termination. I tend to use >joinable threads, so the class destructor can join with the thread. This is not good. By the time you are in the destructor, the object should no longer be shared; the threads should be already joined. When the destructor is executing, the object is no longer considered to be a complete object. It's not that calling the join operations is bad, what's bad is that there are still threads running. A particularly bad thing is to be in the destructor of the base class sub-object, with threads still roaming over the derived object! >Make sure your thread does not run before the containing class is >constructed! This can cause wierd problems on MP systems. Actually it can cause weird problems in non-MP systems too. It is simply verboten to use an object before it is constructed. Therefore, it's a bad idea to launch the internal threads of active objects from within the constructors of those objects. Such threads may be scheduled to run before construction completes, which may happen in non-MP systems too. 
The best practice with respect to destruction and construction is this: an object may be shared by multiple threads only after construction completes and before destruction commences. One way to do this is to write your active objects such that they have a Start() method that is not called from the constructor, and a Join() method that is separate from the destructor. The caller who created the object and called its constructor calls Start() immediately after construction, or perhaps after synchronizing. The Join() method simply joins on all of the threads associated with the active object. Usually, I also implement a method called Cancel() which triggers shutdown of all the threads. Having a separate Start() method is useful not only from a safety point of view, but it has practical uses as well. Here is an example. In one project I'm working on, I have a protocol driver object which has two sub-objects: a protocol object, and a device driver object. Both of these invoke callbacks in the driver, which sometimes passes control back---for example, the device driver object may hit a callback that passes received data, which is then shunted to the protocol object, which then may invoke a callback again to pass up processed data. The protocol object doesn't have any threads, but it does register a timer, which is practically as good as having a thread. The driver has two threads, for handling input and output. If I registered the timer immediately after constructing the protocol object, and started the I/O threads immediately after constructing the driver, it would be a very bad thing! Because either object might start hitting callbacks, which would end up inside the other object that is not yet constructed. Because I have separate start-up methods, I can implement a construction phase that builds both objects, and then a start phase which starts their threads or timers. 
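The construct/start/cancel/join split described above can be sketched in plain C with Pthreads. This is only an illustration under assumptions: worker_t and the worker_*() functions are hypothetical names, not code from the project mentioned in the post, and the "work" is reduced to waiting for a stop request.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical active object; every name below is illustrative. */
typedef struct {
    pthread_t       thread;
    pthread_mutex_t lock;
    pthread_cond_t  wake;
    bool            started;        /* a thread exists and must be joined */
    bool            stop_requested; /* set by worker_cancel() */
    bool            ran;            /* set by the internal thread */
} worker_t;

/* Internal thread: runs until worker_cancel() asks it to stop. */
static void *worker_main(void *arg)
{
    worker_t *w = arg;

    pthread_mutex_lock(&w->lock);
    w->ran = true;
    while (!w->stop_requested)          /* real work would go here */
        pthread_cond_wait(&w->wake, &w->lock);
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Construction phase: set up state only, start no threads. */
int worker_init(worker_t *w)
{
    int err;

    w->started = w->stop_requested = w->ran = false;
    if ((err = pthread_mutex_init(&w->lock, NULL)) != 0)
        return err;
    if ((err = pthread_cond_init(&w->wake, NULL)) != 0)
        pthread_mutex_destroy(&w->lock);
    return err;
}

/* Start phase: called by the owner once all related objects exist. */
int worker_start(worker_t *w)
{
    int err = pthread_create(&w->thread, NULL, worker_main, w);

    if (err == 0)
        w->started = true;
    return err;
}

/* Trigger shutdown without waiting for it. */
void worker_cancel(worker_t *w)
{
    pthread_mutex_lock(&w->lock);
    w->stop_requested = true;
    pthread_cond_signal(&w->wake);
    pthread_mutex_unlock(&w->lock);
}

/* Wait for the internal thread; after this the object is unshared. */
int worker_join(worker_t *w)
{
    int err = 0;

    if (w->started) {
        err = pthread_join(w->thread, NULL);
        w->started = false;
    }
    return err;
}

/* Destruction phase: only legal after worker_join(). */
void worker_destroy(worker_t *w)
{
    pthread_cond_destroy(&w->wake);
    pthread_mutex_destroy(&w->lock);
}
```

With two such objects, the owner would call worker_init() on both, then worker_start() on both, and at shutdown worker_cancel() and worker_join() on both before either worker_destroy().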
Similarly, when I'm shutting down the arrangement, it would be terrible to stop the threads of the driver and destroy the driver, because the protocol timer is still running! Having Cancel() and Join() separate from destruction lets me kill all of the timer and thread activities for both objects, and then release the memory.

=================================TOP===============================
Q250: Semaphores on Solaris 7 with GCC 2.8.1

I am writing a multiprocess application that will utilize a circular buffer in a shared memory segment. I am using two semaphores to represent the # of available slots, and the # of slots to be consumed by the server (consumer).

The apps follow this simple model.

Producer:
    decrement the available_slots semaphore
    do something...
    increment the to_consume semaphore.

Consumer:
    decrement the to_consume semaphore.
    do something...
    increment the available_slots semaphore.

The problem is that when I run my test programs and watch the semaphore values, I see the available_slots semaphore continually increase. The program will run for a while if I remove the last increment in the consumer program, but will eventually fail with errno 34, Result too large.

Studying the output, it does not appear to me that the value of the two semaphores ever reaches a critical point.

This simple example has been almost copied line for line from two different books on this subject, both yielding the same results.

I have included the source to both of my test apps. If anyone can see, or knows of something that I am just overlooking, I would very much like to hear from you.
Thanks

Nicholas Twerdochlib

Platform info:
    Sun Sparc 20 dual ROSS 125Mhz CPUs
    64MB RAM
    Solaris 7/2.7
    GCC v2.8.1

Server/consumer source:
*****************************************************************
/* The header names were lost in this archive; these are the
 * headers the code below needs. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun {
    int val;
    struct semid_ds *buf;
    ushort *array;
};

static ushort start_val[2] = {6,0};
union semun arg;

struct sembuf acquire = {0, -1, SEM_UNDO};
struct sembuf release = {0, 1, SEM_UNDO};

int main( void )
{
    int semid;
    key_t SemKey = ftok( "/tmp/loggerd.sem", 'S' );

    if( (semid = semget( SemKey, 2, IPC_CREAT|0666 )) != -1 ) {
        arg.array = start_val;
        if( semctl( semid, 0, SETALL, arg ) < 0 ) {
            printf( "Failed to set semaphore initial states.\n" );
            perror( "SEMCTL: " );
            return -1;
        }
    }

    while( 1 ) {
        printf( "A Ready to consume: SEM %d Value: %d\n", 0,
                semctl(semid, 0, GETVAL, 0) );
        printf( "A Ready to consume: SEM %d Value: %d\n", 1,
                semctl(semid, 1, GETVAL, 0) );

        acquire.sem_num = 1;
        if( semop( semid, &acquire, 1 ) == -1 ) {
            perror( "server:main: acquire: " );
            exit( 2 );
        }

        printf( "B Ready to consume: SEM %d Value: %d\n", 0,
                semctl(semid, 0, GETVAL, 0) );
        printf( "B Ready to consume: SEM %d Value: %d\n", 1,
                semctl(semid, 1, GETVAL, 0) );

        /*
        release.sem_num = 0;
        if( semop( semid, &release, 1 ) == -1 ) {
            perror( "server:main: release: " );
            exit( 2 );
        }
        */
    }
}
**************************************************************************
Client/producer source
**************************************************************************
/* Same headers as the server (names lost in this archive). */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun {
    int val;
    struct semid_ds *buf;
    ushort *array;
};

static ushort start_val[2] = {6,0};
union semun arg;

struct sembuf acquire = {0, -1, SEM_UNDO};
struct sembuf release = {0, 1, SEM_UNDO};

int main( void )
{
    int semid;
    key_t SemKey = ftok( "/tmp/loggerd.sem", 'S' );

    if( (semid = semget( SemKey, 2, 0)) == -1 ) {
        perror( "client:main: semget: " );
        exit( 2 );
    }

    printf( "A Ready to consume: SEM %d Value: %d\n", 0,
            semctl(semid, 0, GETVAL, 0) );
    printf( "A Ready to consume: SEM %d Value: %d\n", 1,
            semctl(semid, 1, GETVAL, 0) );

    acquire.sem_num = 0;
    if( semop( semid, &acquire, 1 ) == -1 ) {
        perror( "client:main: release: " );
        exit( 2 );
    }

    printf( "B Ready to consume: SEM %d Value: %d\n", 0,
            semctl(semid, 0, GETVAL, 0) );
    printf( "B Ready to consume: SEM %d Value: %d\n", 1,
            semctl(semid, 1, GETVAL, 0) );

    release.sem_num = 1;
    if( semop( semid, &release, 1 ) == -1 ) {
        perror( "client:main: acquire: " );
        exit( 2 );
    }
}

>buffer in a shared memory segment. I am using two semaphores to
>represent the # of available slots, and the # of slots to consumer by
>the server (consumer).

>The apps follow this simple model.

>Producer:
>  decrement the available_slots semaphore
>  do something...
>  increment the to_consume semaphore.

>Consumer:
>  decrement the to_consume semaphore.
>  do something...
>  increment the available_slots semaphore.

>struct sembuf acquire = {0, -1, SEM_UNDO};
>struct sembuf release = {0, 1, SEM_UNDO};

The error is quite simple; you shouldn't specify SEM_UNDO for semaphores that are not incremented and decremented by the same process.

SEM_UNDO should be used for a single process that increments and decrements the semaphore. When the process is killed, the net effect of the process on the semaphore will be NIL because of the adjust value.

With SEM_UNDO, each decrement in the producer will cause the "semadj" value associated with the "available_slots" semaphore to be increased by one. When the producer exits, the semaphore will be incremented by N, which is not what you want in this case.

Solaris also puts a bound on the semadj value; there is no good reason for this bound, except that it catches programming errors like yours.

Casper
--
Expressed in this posting are my opinions. They are in no way related to opinions held by my employer, Sun Microsystems. Statements on Sun products included here are not gospel and may be fiction rather than truth.
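A minimal sketch of the fix, assuming the same two-semaphore layout as the question: the operations pass 0 for sem_flg instead of SEM_UNDO, so the kernel keeps no per-process adjust value. The names ring_create(), produce_one(), consume_one() and friends are hypothetical, and the sketch uses IPC_PRIVATE rather than the ftok() key of the original so that it stands alone.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Index of each semaphore in the set (illustrative names). */
enum { SLOTS_FREE = 0, SLOTS_FULL = 1 };

/* Argument union for semctl(); named differently from "semun" to avoid
 * clashing with systems whose headers already define that union. */
union semun_arg {
    int              val;
    struct semid_ds *buf;
    unsigned short  *array;
};

/* Apply one semaphore operation with sem_flg = 0, i.e. no SEM_UNDO. */
static int sem_change(int semid, unsigned short which, short delta)
{
    struct sembuf op;

    op.sem_num = which;
    op.sem_op  = delta;
    op.sem_flg = 0;
    return semop(semid, &op, 1);
}

/* Create a private set of two semaphores: nslots free, 0 full. */
int ring_create(unsigned short nslots)
{
    unsigned short  init[2];
    union semun_arg arg;
    int semid = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);

    if (semid == -1)
        return -1;
    init[SLOTS_FREE] = nslots;
    init[SLOTS_FULL] = 0;
    arg.array = init;
    if (semctl(semid, 0, SETALL, arg) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    return semid;
}

/* Producer step: claim a free slot, fill it, publish it. */
int produce_one(int semid)
{
    if (sem_change(semid, SLOTS_FREE, -1) == -1)
        return -1;
    /* ... write into the shared buffer here ... */
    return sem_change(semid, SLOTS_FULL, +1);
}

/* Consumer step: claim a full slot, drain it, release it. */
int consume_one(int semid)
{
    if (sem_change(semid, SLOTS_FULL, -1) == -1)
        return -1;
    /* ... read from the shared buffer here ... */
    return sem_change(semid, SLOTS_FREE, +1);
}

/* Current value of one semaphore in the set. */
int ring_value(int semid, int which)
{
    return semctl(semid, which, GETVAL);
}

/* Remove the semaphore set from the system. */
int ring_remove(int semid)
{
    return semctl(semid, 0, IPC_RMID);
}
```

Because nothing is undone at exit, a producer that terminates leaves the counts exactly as its completed operations set them, which is what the circular-buffer model expects.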
=================================TOP===============================
Q251: Draft-4 condition variables (HELP)

"D. Emilio Grimaldo Tunon" wrote:

> Could anybody comment on the condition variable differences
> between the latest Posix standard (draft 10?) and the old
> draft 4 (DCE threads?) found in HP-UX 10.20?

There's no "draft 10". There was, once, a draft of the document that would become the basis of the POSIX 1003.1-1996 standard, that was labelled draft 10. That document is not the same as POSIX 1003.1-1996, and the differences are more than a matter of "formalese". Some problems were found during the editing to integrate the draft 10 text into 1003.1b-1993. In addition, the 1003.1i-1995 (corrections to the realtime specification, some of which overlapped 1003.1c text) were integrated at the same time. The standard is POSIX 1003.1-1996. There is no draft 10.

Also, the implementation of "draft 4" that you'll find in HP-UX 10.20 isn't really draft 4. It was a very loose adaptation of most of the text of the draft, plus a number of extensions and other changes. I prefer to call it "DCE threads" to make it clear that it's a distinct entity.

Now. There are no differences in condition variables from DCE threads to POSIX threads. However, many of the names were changed to "clean up" the namespace and better reflect various persons' opinions regarding exactly what the interfaces ought to be assumed to do. One of the differences, stemming from the draft 5 addition of static initialization of synchronization objects, is that they are now "initialized" (i.e., assumed to be pre-existing storage of unspecified value) rather than "created" (where the pthread_cond_t type, et al, were assumed to be pointers or "handles" to dynamically created storage).
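As a small illustration of the "initialized" model in POSIX 1003.1-1996 threads, the sketch below uses the static initializers for a mutex and a condition variable, plus the dynamic pthread_mutex_init()/pthread_cond_init() route on caller-provided storage. The function names (mark_ready(), wait_until_ready(), make_pair()) are made up for the example, and it shows only the POSIX side, not the DCE threads calls.

```c
#include <pthread.h>

/* Statically initialized objects: ordinary storage, no "create" call. */
static pthread_mutex_t ready_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready_cond = PTHREAD_COND_INITIALIZER;
static int             ready      = 0;

/* Set the predicate and wake any waiters. */
void mark_ready(void)
{
    pthread_mutex_lock(&ready_lock);
    ready = 1;
    pthread_cond_broadcast(&ready_cond);
    pthread_mutex_unlock(&ready_lock);
}

/* Block until the predicate is set; returns its value (1). */
int wait_until_ready(void)
{
    int seen;

    pthread_mutex_lock(&ready_lock);
    while (!ready)
        pthread_cond_wait(&ready_cond, &ready_lock);
    seen = ready;
    pthread_mutex_unlock(&ready_lock);
    return seen;
}

/* Dynamic initialization of storage the caller already owns.
 * POSIX functions return the error number directly (0 on success). */
int make_pair(pthread_mutex_t *m, pthread_cond_t *c)
{
    int err = pthread_mutex_init(m, NULL);

    if (err != 0)
        return err;
    err = pthread_cond_init(c, NULL);
    if (err != 0)
        pthread_mutex_destroy(m);
    return err;
}
```

Passing NULL as the attributes argument asks for the default attributes, so no attribute object needs to be set up for the common case.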
> In particular I have run into the 'problem' that neither > pthread_condattr_init() nor pthread_mutexattr_init() seem > to be present, I did find: If you're moving between POSIX threads and DCE threads, you've got many worse problems. While much of the interface appears similar, every function (except pthread_self()) has changed in at least one incompatible way. Be very, very careful about such a move! Do not consider it "a weird variety of POSIX threads". It's not. It's "DCE threads", as different a beast from POSIX threads as is UI threads. Many of the names are similar, and they do something that's often even more similar -- but porting requires a lot more thought than you might infer from those similarities. (For example, DCE threads report errors by setting errno and returning -1, while POSIX threads takes the much more reasonable and efficient approach of returning an error code directly.) HP-UX 10.30 (or, more realistically, 11.0) has POSIX threads. Your best option is to ignore HP-UX 10.20 entirely and require 11.0. But, if you can't or won't do that, be really careful, and assume nothing. /---------------------------[ Dave Butenhof ]--------------------------\ | Compaq Computer Corporation [email protected] | | 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof | | Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----------------[ Better Living Through Concurrency ]----------------/ Try this document. Porting DCE Threads Programs to HP-UX 11.0 POSIX Threads http://info.fc.hp.com/hpux/content/d8.html You will also find the following book useful. Threadtime by Scott Norton and Mark Dipasquale. HP Press, Prentice Hall. ISBN 0-13-190067-6 Discusses about programming using POSIX threads in general and also about HP-UX specific features. 
Vijay =================================TOP=============================== Q252: gdb + linuxthreads + kernel 2.2.x = fixed :) After two solid days of differential testing, I found the problem that was preventing me from debugging of threads under gdb. It isn't kernel version related, but it is rather strange so I thought I would share it for the common curiosity... It appears that if you are trying to debug a program that links to libpthread.so, and the symbols for that library are not loaded, the debugging doesn't work. In my case, I was doing a "set auto-solib-add 0", to avoid wading through all the libc and other system library stuff, and/or getting messages from ddd about no source files, ending up in "space" etc. Apparently, because the symbols for libpthread weren't loaded, the debugging was not working properly. Doing a manual load on the library using "sharedlibrary libpthread" solves the problem. Threads are then detected and debuggable (?!). Does anyone know if this behavior is "by design" or "by accident" ? Thank you very much to the people who responded to my original post. regards, Paul Archard ------------- parch // get rid of the @ // comments to workfire // reveal the .com // email address! On Thu, 08 Apr 1999 19:55:24 GMT, Paul Archard wrote: >Doing a manual load on the library using "sharedlibrary libpthread" solves >the problem. Threads are then detected and debuggable (?!). > >Does anyone know if this behavior is "by design" or "by accident" ? It's probably by design. The GDB patch adds LinuxThreads debugging ability by making GDB peek at internal LinuxThreads data structures. Indeed, LinuxThreads itself had to be modified to allow the hack to work by providing some extra debugging info. Presumably, without the symbols, GDB can't find the addresses of LinuxThreads objects that it needs to access. 
=================================TOP===============================
Q253: Real-time input thread question

On Mon, 12 Apr 1999 13:54:28 GMT, JFCyr wrote:
>We want our input thread to read some device values at an exact frequency.
>What is the best way?

Depending on the frequency, you may need a hard real-time kernel which can schedule your thread to execute periodically with great accuracy. In such operating systems, the kernel is typically preemptible and takes care not to disable interrupts for long periods of time.

>- A loop (within the input thread) working on the exact frequency with an
>RDTSC test.

In POSIX threads, you could use pthread_cond_timedwait to suspend the thread until a specified time. However, the accuracy of this will be restricted by the degree to which your OS supports real-time processing.

>- A WM_TIMER message to the primary window thread

Under Windows? You can't get anything better than 10 ms resolution, if that, and it's not real time by any measure. Too many ``guru meditations''. If the frequency is faster than, say, 20-30 Hz, forget it.

On Intel machines, the Windows clock tick is 100Hz; even though interfaces like WaitForSingleObject() and Sleep() have parameters expressed in milliseconds, the granularity of any timed wait is ten milliseconds.

The Win32 API sucks for programming timers, too. The various sleep and wait functions express their timeout parameter as a displacement from the current time rather than as an absolute wake time. Also, there is no signal handling; you can't install a periodic interrupt timer. What you can do is sleep for shorter periods and poll the current tick count in between sleeps.

Another thing you could do, in principle, is write a device driver that performs the data acquisition at the specified time intervals. Inside the driver, chances are that you have access to more accurate timing, since you are in the kernel; also, faster access to the device.
Thus you can approximate real-time processing. The driver can queue the results for collection by the application, which can take its sweet time.

=================================TOP===============================
Q254: How does Solaris implement nice()?

> Kelvin,
>
> Thanks! That helps. One related question: How does NICE work?
> I mean if it just raises/lowers the LWP's priority level once, then
> after a couple of quantum it would become meaningless.
>
> -Bil

Nice is there to maintain the compatibility of Solaris with older Unix, and it works in a funny way. First, when a user sets a nice value, a priority value is calculated from this nice value using some formula. This priority value is then passed to the kernel using priocntl, and becomes the user portion (ts_upri) of the LWP priority. The priority table that I talked about in my message contributes the system portion (ts_cpupri) of the LWP priority. Therefore, we have

    pri = ts_cpupri + ts_upri + ts_boost

where ts_boost is the boosting value for the IA class. The CPU picks the LWP with the highest pri to execute next. When a user sets a nice -19 on a LWP, ts_upri is -59. Since the largest ts_cpupri in the table is 59, pri is always 0, unless it is in IA and has a boost value.

If a user wants finer control of the priority, instead of using nice, he/she can use priocntl to set ts_upri directly.

Hope this helps,

Kelvin

=================================TOP===============================
Q255: Re: destructors and pthread cancelation...

Hi Bil,

I noticed that you responded to a fellow indicating that the Sun C++ compiler will execute local object destructors upon pthread_exit() and also if canceled.

Do you know what version of the compiler does this?

As you may recall, I sent you a very long winded email last year complaining about how UNIX signal handling, C++ exception handling, and pthread cancellation don't work together.
This bit of information about compiler support on pthread_exit and cancellation would help solve most of my problems. ie) if a SIGSEGV occurs, or some fatal FPE, my signal handler could simply call pthread_exit and I'd get stack based object destructors invoked for free (yay!). Do you know if these semantics of pthread_exit and cancellation will be adopted by the POSIX committee at some point???? I've also heard rumblings that there is a new PThreads standard draft... I haven't seen anything though... word of mouth... Cheers, John. John Bossom [email protected] =================================TOP=============================== Q256: A slight inaccuracy WRT OS/2 in Threads Primer From: Julien Pierre Thanks for a most excellent book. I have been doing multithreaded programming under OS/2 for about 5 years; yet I would never have thought I could learn so much from a threads book. How wrong I was! Now, that said, there is a slight inaccuracy WRT OS/2 on page 102 : there is SMP support in OS/2 version 2.11 SMP ; and OS/2 Warp Server Advanced SMP. These versions of OS/2 have special kernels modified for SMP, and not all device drivers work properly with it ; but all 32-bit OS/2 apps that I have ever tried on it worked. I have found problems with some older 16-bit OS/2 apps that didn't behave, because they were relying on an old Microsoft C runtime library which used "fast RAM" semaphores that aren't SMP safe. The problem was fixed by marking the executable as uniprocessor with a utility provided with OS/2 SMP - so that its threads would always run on the same CPU. These problems with device drivers and many 16-bit apps are probably part of the reason why IBM hasn't been shipping the SMP kernel in the regular version of OS/2 (Warp 4). Warp Server SMP does make a very nice operating system though (I run it at home on one system - see http://strange.thetaband.com/). 
--
--------------------------------------------------------------------
Julien Pierre                              http://www.madbrain.com
Theta Band Software LLC                    http://www.thetaband.com
--------------------------------------------------------------------

=================================TOP===============================
Q257: Searching for an idea

Eloy,

Sure... Let's see what we can think up here... How about one of these:

o The majority of client/server applications are limited more by disk I/O than by CPU performance, thus Java's slower computing power is less of an issue than in other areas. (A) Is this really true? (B) What configuration of threads do you need to match the performance of C programs for a simple, well-defined problem? (C) What do you need to do with Java to obtain this level of performance?

o One problem with Java's wait/notify architecture is that, for problems like consumer/producer, many redundant wakeups may be required in order to ensure that the RIGHT threads get woken up (by use of notifyAll()). For an optimally configured program, what is the reality of this problem? (See my article in Aug. Java Report for a lengthy description of this.)

o Java native threads (on Solaris 2.6, JDK 1.2) use "unbound" threads. These are *supposed* to provide adequate LWPs for any I/O bound problem so that the programmer doesn't need to call the native methods for thr_setconcurrency(). How well does this work for "realistic" programs? (Can you find any commercial applications that face this issue?)

-Bil

>
> I'm a Spanish computer science student, searching for an idea
> for my final project, before getting my degree; but as of today, I
> haven't found it.
>
> Can you give me any ideas? I'm interested in JAVA, especially in
> multithread programming.
>
> If you would like to help me, please send an e-mail to:
> [email protected]
> Thank you very much.
> Eloy Salamanca. Tlf: 954 360 392 (Spain).
E-mail:
> [email protected]

--
===============
Bil LambdaCS.com

=================================TOP===============================
Q258: Benchmark timings from "Multithreaded Programming with Pthreads"

I ran some of the benchmark timings from Bil Lewis's book "Multithreaded Programming with Pthreads" to get a rough idea how LinuxThreads compares with PMPthreads. I only have a uniprocessor (Intel 200 MHz MMX w/Red Hat Linux 5.1) to test with, but the results are interesting anyway.

In case you have an interest in running the benchmarks yourself, I have attached the performance programs distribution that compiles on LinuxThreads and PMPthreads. You need to recompile the tools for each library. Use "make -f Makefile.linuxthreads" to build the LinuxThreads version, and "make -f Makefile.pmpthreads" to build the PMPthreads version. Use "make -f Makefile.pmpthreads clean" between recompiles. The "test_timings.sh" script runs the tests. I'd be interested in the results others get.

Here are the results I got:

(seconds to complete)
                            PMPthreads   LinuxThreads
                            ----------   ------------
lock                              8.09          10.15
try lock                          3.77           8.69
reader                            8.91           6.24
writer                            9.15           6.52
context switch (bound)           10.82          49.18
context switch (unbound)         10.82          49.17
sigmask                          19.19           6.15
cancel enable                     9.65           4.54
test cancel                       2.06           3.94
create (bound)                    1.61          44.84
create (unbound)                  1.61          45.64
create process                   13.25          15.08
global                            4.23           4.20
getspecific                      10.31           2.53

Looks like LinuxThreads pays a big price for thread creation and context switching. The raw data for these results is included in the attached file. With some verification of the results and some commentary, this might be worth a page on the Programming Pthreads website.

$cott

=================================TOP===============================
Q259: Standard designs for a multithreaded applications?

> Hi All,
> I want to know whether there are any standard design techniques for
> developing a multithreaded application.
> Are there any
> books/documents/websites which discuss multithreaded design issues? I
> am mainly interested in design issues which help different programmers
> working on the same project to coordinate in developing a multithreaded
> application.
> Any suggestion or experience in this regard is welcome.
> I am interested in designing or reverse engineering multithreaded
> server applications using C and not C++ or Java.

There are a great many books that cover parallel programming: algorithms, programming models, library APIs, etc. However, few cover the design and construction of parallel software. The following texts may be more relevant than most:

Multithreading Programming Techniques
By Prasad, Shashi
Online Price: $39.95
Softcover; 410 Pages
Published by McGraw-Hill Companies
Date Published: 01/1997
ISBN: 0079122507
http://www.amazon.com/exec/obidos/ASIN/0079122507/qid%3D916076814/002-8883088-6545834
--------
Structured Development of Parallel Programs
By Pelagatti, Susanna
Online Price: $44.95
Softcover; 600 Pages
Published by Taylor and Francis
Date Published: 11/1997
ISBN: 0748407596
http://www.amazon.com/exec/obidos/ASIN/0748407596/qid=916076864/sr=1-1/002-8883088-6545834
--------
Designing and Building Parallel Programs : Concepts and Tools for Parallel Engineering
By Foster, Ian T.
Online Price: $50.95
Hardcover; 600 Pages
Published by Addison-Wesley Publishing Company
Date Published: 12/1994
ISBN: 0201575949
http://www.amazon.com/exec/obidos/ASIN/0201575949/o/qid=916076740/sr=2-1/002-8883088-6545834
--------
Foundations of Parallel Programming
By Skillicorn, David
Online Price: $39.95
Hardcover;
Published by Cambridge University Press
Date Published: 12/1994
ISBN: 0521455111
http://www.amazon.com/exec/obidos/ASIN/0521455111/qid=916076568/sr=1-3/002-8883088-6545834

--
Randy Crawford
[email protected]
[email protected]

=================================TOP===============================
Q260: Threads and sockets: Stopping asynchronously

Neil--

I've found that the cleanest way to do this (with regards to portability) is to set up a unique pipe for every thread that you might want to interrupt. Instead of doing a read() in your reader thread you'd use select():

    FD_ZERO(&fds);
    FD_SET(reader_fd, &fds);
    FD_SET(msgpipe_fd, &fds);
    ready = select(highestfd+1, &fds, NULL, NULL, NULL);
    if (FD_ISSET(msgpipe_fd, &fds)) {
        /* We've been interrupted */
        .. drain the pipe ..
        .. handle the event gracefully ..
    }
    if (FD_ISSET(reader_fd, &fds)) {
        /* We've received data */
        .. grok the data ..
    }

Now, from your controlling thread (the one which is interrupting the blocked thread), you could write 1 byte to the 'msgpipe_fd' file descriptor, which would wake that thread up from its select().

This seems like a lot of work, but it's probably the only portable way of accomplishing this task. Trying to do this with signals is ugly and potentially unreliable.
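The pipe-plus-select() idea above might be fleshed out roughly as follows. This is a sketch, not the poster's code: wait_for_data_or_wake() and wake_reader() are hypothetical helper names, one wake-up byte is drained per call, and an EINTR from select() is simply retried.

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Result codes for the hypothetical helper below. */
enum wake_reason { WAKE_ERROR = -1, WAKE_INTERRUPTED = 0, WAKE_DATA = 1 };

/* Block until data_fd is readable or a byte arrives on wake_fd
 * (the read end of the per-thread message pipe). */
int wait_for_data_or_wake(int data_fd, int wake_fd)
{
    fd_set fds;
    int    maxfd = data_fd > wake_fd ? data_fd : wake_fd;
    int    ready;

    do {
        FD_ZERO(&fds);
        FD_SET(data_fd, &fds);
        FD_SET(wake_fd, &fds);
        ready = select(maxfd + 1, &fds, NULL, NULL, NULL);
    } while (ready == -1 && errno == EINTR);

    if (ready == -1)
        return WAKE_ERROR;

    if (FD_ISSET(wake_fd, &fds)) {
        char byte;

        /* Drain the wake-up byte so the next call blocks again. */
        if (read(wake_fd, &byte, 1) < 0)
            return WAKE_ERROR;
        return WAKE_INTERRUPTED;
    }
    return WAKE_DATA;   /* caller may now read() data_fd */
}

/* Called by the controlling thread with the write end of the pipe. */
int wake_reader(int wake_write_fd)
{
    char byte = 'x';

    return write(wake_write_fd, &byte, 1) == 1 ? 0 : -1;
}
```

The reader thread would call wait_for_data_or_wake() in its loop in place of a blocking read(), and treat WAKE_INTERRUPTED as the signal to shut down or re-check its instructions.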
Hope this helps,
-S

=================================TOP===============================
Q261: Casting integers to pointers, etc.

> Oh Lord! Is that true? "casting integers to pointers..." ? Who the
> !@$!@$ came up with this idea that casting is allowed to change bits?
> If *I* were King...

'Tis true. A cast doesn't mean "pretend these bits are of type X," it is an operator meaning "convert this type Y value to the type X representation of (approximately) the same value." For example:

    double trouble = 3.14;
    double stubble = (int)trouble;

Surely the `(int)' operator is allowed to "change bits," is it not?

> I do not know of any machines where this does not work, however. DEC,
> Sun, HP, SGI, IBM all cast back and forth as expected. Are there any?

There are certainly machines where `int' and `void*' are not even the same size, which means convertibility between `int' and `void*' cannot possibly work for all values. I believe DEC's Alpha (depending on compiler options) uses a 32-bit `int' and a 64-bit `void*'; there are also ugly rumors about various "memory models" on 80x86 machines.

In any case, it's not a crippling restriction. You want to pass an `int' (or a `double' or a `struct foobar' or ...) to a thread? No problem, just a slight clumsiness:

    struct foobar x;
    pthread_create (..., func, &x);   /* or `(void*)&x' if there are no
                                       * prototypes in scope */
    ...
    void *func(void *arg) {
        struct foobar *xptr = arg;
        struct foobar xcpy = *(struct foobar*)arg;   /* alternative */
        ...
    }

----
[email protected]

Eric,

> 'Tis true. A cast doesn't mean "pretend these bits are of type X,"
> it is an operator meaning "convert this type Y value to the type X
> representation of (approximately) the same value."

Grumble. Oh well.

> > I do not know of any machines where this does not work, however. DEC,
> > Sun, HP, SGI, IBM all cast back and forth as expected. Are there any?
> > There are certainly machines where `int' and `void*' are not even > the same size, which means convertibility between `int' and `void*' cannot Are there? I don't doubt that there WERE, but any today? > In any case, it's not a crippling restriction. You want to pass an > `int' (or a `double' or a `struct foobar' or ...) to a thread? No problem, > just a slight clumsiness: BIG clumsiness. It is also true that everyone of us who have written on the subject have completely ignored this little detail. Thanks for the insight. =================================TOP=============================== Q262: Thread models, scalability and performance In reference to your comment below on mutex context switching you are wrong. When thread A releases the lock, time slicing may not switch in quick enough for thread B and C to grab the lock. And thread A's execution may be quick enough to grab the lock even before the OS allows B or C to attempt to grab the lock. This scenario usually occurs on high threaded applications, where some threads seem to be starved. You can actually see this if you were using a thread profiler (real-time) such as Rational Quantify-Purify or Failsafe on AIX. In reference to data sinks and streams. Remember, large scale string objects do not perform well, and byte arrays is the choice mechanism. But streams also provide the ability to be buffered, so that the mechanism for writing to the stream performs better when sending large amounts of data from one thread to another using a stream. In my case I have a MQ channel which does messaging to the mainframe on AIX. I have a socket server which can take in requests to place them on the queue to be processed on the other side of the recieving channel. What I do is divide the socket itself into two streams on two separate threads, one talking to the client while the other is sending data onto the channel and getting data from the channel. 
The data (or message) is huge in size and the queue manager usually breaks it up. But after I get all the segments back, I need to reformat it slightly and send it back out on the socket. Using a stream to talk to the threads provides the fastest way to send raw string data. Remember this is an async operation. An async operation is faster than waiting for the queue manager to reply and then sending out the message to the user. Streams are a better design. But if you want to send simple messages then objects are easier: just pass the object reference into each thread and have the threads talk to the object, which bridges the communication between them. I prefer the streams mechanism overall.

I think the statement you made is missing the application example I gave, so I will recap it. I have a socket based server, where I create the master worker object first - then pass the socket into it when I get the accept(). I then spawn two smaller worker threads which communicate with the socket (one reading, the other writing). I have the two threads communicating via streams. The one that is reading from the socket is getting the message that needs to be relayed to the mq channel, while the other thread is writing to the socket and getting the data from the mq channel. I use this same mechanism for another server that does database work on DB2 on AIX as well. Currently I am revamping the whole server and implementing the JGL containers into the model.

Thanks
Sean

Bil Lewis wrote:
> Sean,
>
> Thanks 10^6! That helps a lot. A couple of thoughts:
>
> > Thread A hits the method, gets a lock on the
> > method and begins to write to the file system. Threads B and C are attempting to write
> > to the logs too, but can not get a lock since thread A has it and is not finished. So
> > threads B and C wait. Now the operating system has time slicing and places those threads
> > in a suspend state. Thread A completes writing to the logs, and releases the lock.
> > Now thread A does some processing. Thread B and C are still in suspend state. Meanwhile
> > thread A attempts to write to the logs again. It gets a lock and does so. Meanwhile
> > thread B and C come out of suspend (due to the operating system time slicing the
> > threads) and they try to write to the logs but can not again. They suspend, and the
> > cycle repeats over and over again.
>
> It better not repeat! The very next time A unlocks the mutex, A will be
> context switched (it now has lower priority than B & C), and the lock
> will be available. Certainly this is the behavior I (think!) I see on
> Solaris. ??
>
> > > This surprises me. Using a stream to communicate between threads? This
> > > would certainly be 10x slower than using simple wait/notify. (?!)
>
> > Streams are the basis for talking on the
> > socket which is interprocess communication, right? You have a client on a process who is
> > communicating to a remote process, the socket is the communication, but the streams off
> > the socket provide the fine grain communication.
>
> I can certainly see that you can do this, and in IPC it makes some sense.
> But I don't see it in threads. It would be both limiting and slow. (Let's
> see... I have a Java program where I pump ~100 strings/sec across a socket.
> Versus 10,000+ communications via synchronized methods.) ?
>
> Best Regards,
>
> -Bil

=================================TOP===============================
Q263: Write threaded programs while studying Japanese!

Yes, indeed! You too can now learn the subtle beauty of the Japanese language while advancing your programming skills in Pthreads! Hurry! Hurry! Hurry!

I just got a copy of the Japanese translation of both Dave Butenhof's book and my own. It's great to see all this Kanji and then "MUTEX" in English. I used to live in Kenya, where this happened all the time. It was pretty funny. "Mimi, sijui engine block iko wapa."

=================================TOP===============================
Q264: Catching SIGTERM - Linux v Solaris

Lee,

I didn't notice you declaring the signal handler. You need to have a signal handler (even tho it'll never get called!) set up for many signals in order to use sigwait(). The handler turns off the default signal action.

-Bil

> I wonder if anyone could shed light on the following problem I
> am having. Early in my server's execution I create a thread
> to wait on SIGTERM (and SIGINT) and shutdown the server cleanly.
> The shutdown thread works as expected when compiled on Linux
> (libc5 + LinuxThreads, SuSE 5.2) but it doesn't seem to catch
> the signals on Solaris (only tried 2.6). The shutdown thread
> is as follows: [...]

=================================TOP===============================
Q265: pthread_kill() used to direct async signals to thread?

Darryl,

Yes, you can. But no, you don't want to. What were you using your signal handlers for? To provide some version of asynchronous processing. But now you can do that processing synchronously in another thread! (This is a *good* thing.)

For backwards-compatibility you may wish to retain the basic model of sending signals to the process, but you can do that with sigwait() in one thread, blocking the signal in all other threads.

So look at your app carefully. It is very likely that you can make it simpler and more robust with less effort.

-Bil

> I'm porting a multi-process based application into a thread
> environment. So I've read about the traditional signal model using
> sigaction() and sigprocmask() etc, and the "new" signal model using
> sigthreadmask() and sigwait() etc .... But, can't I just redirect my
> old SIGABRT, SIGUSR signals (previously between processes) now to a
> specific thread with pthread_kill()?
> Sure if someone issues a command
> line kill() with a SIGUSR then that delivery will be indeterminate since
> it is enabled for all the threads but with enough global state data the
> handler can probably manage to ignore that one. Have I missed something
> here?

=================================TOP===============================
Q266: Don't create a thread per client

David Snearline wrote:
>
> Bil Lewis wrote:
>
> > Nilesh,
> >
> > While it's certainly interesting to experiment with lots of threads, I
> > don't think you want to do anything serious like this. I think you'll be much
> > happier doing a select() call and creating only a few dozen threads.
> >
> > -Bil
> >
> > > My application runs on a Sun Sparc station with Solaris 2.6 and I am using the
> > > POSIX library.
> > >
> > > The application is a server and creates a thread for each connection accepted
> > > from a client,
> > > potentially the server is expected to handle up to 1000 connections. Therefore
> > > the server is expected
> > > to create up to 1000 threads.
> > >
> > > Nilesh
>
> Greetings,
>
> I was somewhat intrigued by your comment here, and was wondering what the rationale
> was behind it. I've done many servers under Solaris using an accepting thread plus a
> thread per connection, and so far, I've been pretty happy with results. Then again,
> this usually involves a hundred threads or so max -- not a thousand.
>
> Since most of the threads end up being blocked in the kernel on I/O, the only
> drawback I can see are the per-thread resources of the (mostly) idle threads.
> Provided that these resources are kept small, running a thousand threads shouldn't be
> a problem. Is there more here that I'm missing?

Oh, it's just that you're using up all this memory for the threads and you don't need to. Might as well have one thread block on 1000 fds as have 1000 threads each blocking on one. For large numbers of fds, I'd expect to see some real performance gains.
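
(Illustrative sketch, not taken from the thread.) The "one thread blocks on many fds" idea could look roughly like this in C. The helper name wait_readable() and its interface are hypothetical; only select() and the FD_* macros are real. It assumes every descriptor is below FD_SETSIZE, which is a hard limit of select(); a server that may exceed that would likely use poll() instead.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: block until at least one descriptor in
 * fds[0..n-1] is readable or timeout_ms elapses.  The indices of the
 * readable descriptors are stored in ready[] (which must have room
 * for n entries).  Returns the number of ready descriptors, 0 on
 * timeout, or -1 on error. */
int wait_readable(const int *fds, int n, int *ready, long timeout_ms)
{
    fd_set rset;
    struct timeval tv;
    int i, maxfd = -1, count = 0;

    if (timeout_ms < 0)
        return -1;

    FD_ZERO(&rset);
    for (i = 0; i < n; i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            return -1;                 /* select() cannot watch this fd */
        FD_SET(fds[i], &rset);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* One thread sleeps here on behalf of all n connections. */
    if (select(maxfd + 1, &rset, NULL, NULL, &tv) < 0)
        return -1;

    for (i = 0; i < n; i++)
        if (FD_ISSET(fds[i], &rset))
            ready[count++] = i;
    return count;
}
```

A dispatcher thread would call wait_readable() in a loop and hand each ready connection to one of a small number of worker threads, rather than parking one thread per connection.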

-Bil

=================================TOP===============================
Q267: More thoughts on RWlocks

> As many of you know the first and second readers-writers problems
> always starve either the writers (first) or the readers (second) .. I
> learned this in my operating systems textbook..... I was reading along
> anticipating the solution which would not starve either the readers or
> the writers, but he then referred me to the bibliography page..... And
> it wasn't much help.. Does anyone know of a solution which does not
> starve either one...

i'll stick my neck out on this one...

If starvation is a problem then RWlocks are not the answer. RWlocks are only useful when writers are rare. That's why writer-preference makes sense. If writers are so common that they can occupy the lock for significant amounts of time, then you shouldn't be using RWlocks.

Actually, if they are so common, what you should be doing is buying faster hardware! Or lowering the number of incoming requests.

Sure, you can always find an exceptional case where a special version of RWlocks will give good results, but we're not trying to solve imaginary problems here. For real problems the correct answer is "Don't do that!"

-Bil
--
===============
Bil LambdaCS.com

=================================TOP===============================
Q268: Is there a way to 'store' a reference to a Java thread?

> Is there a way to 'store' a reference to a thread at a certain point and
> run a command in that thread at a later point in time?

Of course there is! (Just depends on what you really mean.)

    RunnableTask task = new RunnableTask();
    Thread t = new Thread(task);        // t is the stored reference
    t.start();
    ...
    task.addCommandToTaskQueue(new Command());

(this puts the command on the task's queue and wakes up the thread if sleeping.)

This may not be what you were THINKING of, but it's probably what you REALLY want.

-Bil

=================================TOP===============================
Q269: Java's pthread_exit() equivalent?

[Simple question, I thought. LOTS of answers! For my own use and for my Java Threads book I implemented InterruptibleThread.exit(), but there is a lot of logic to insisting that the run() method be the one to simply return. -Bil]

Bil Lewis writes:
> Doug,
>
> A question for you.
>
> In POSIX, we have pthread_exit() to exit a thread. In Java we
> *had* Thread.stop(), but now that's gone. Q: What's the best way
> to accomplish this?
>
> I can (a) arrange for all the functions on the call stack to
> return, all the way up to the top, finally returning from the
> top-level function. I can (b) throw some special exception I
> build for the purpose, TimeForThreadToExitException, up to the
> top-level function. I can throw ThreadDeath.
>
> But what I really want is thread.exit().
>
> Thoughts?
>
> -Bil
> --
> ================
> Bil LambdaCS.com
>
> http://www.LambdaCS.com
> Lambda Computer Science
> 555 Bryant St. #194
> Palo Alto, CA,
> 94301
>
> Phone/FAX: (650) 328-8952
>

Here's a real quick reply (from a slow connection from Sydney AU (yes, visiting David among other things)). I'll send something more thorough later....

Throwing ThreadDeath yourself is a pretty good way to force the current thread to exit if you are sure it is in a state where it makes sense to do this.

But if you mean, how to stop other threads: This is one reason why they are extremely unlikely to actually remove Thread.stop(). The next best thing to do is to take some action that is guaranteed to cause the thread to hit a runtime exception. Possibilities range from the well-reasoned -- write a special SecurityManager that denies all resource-checked actions, to the sleazy -- like nulling out a pointer or closing a stream that you know the thread needs.

See http://gee.cs.oswego.edu/dl/cpj/cancel.html for a discussion of some other alternatives.

-Doug

Hi Bil, Here's the replies I got to your question.

Peter

---------------------------------------------------------------------------

Check out the following url's. They give a good description of the problem and implementation details for better ways to stop a thread gracefully.

http://java.sun.com/products/jdk/1.2/docs/guide/misc/threadPrimitiveDeprecation.html
http://gee.cs.oswego.edu/dl/cpj/cancel.html

Brian

---------------------------------------------------------------------

From: Jeff Kutcher - Sun Houston
Subject: Re: A threadbare question
To: Peter.Vanderlinden@Eng

Here's a suggestion:

    // volatile so that run() reliably sees the write made by stop()
    private volatile Thread thread = null;

    public void start() {
        if (thread == null) {
            thread = new Thread(this);
            thread.start();
        }
    }

    public void stop() {
        thread = null;
    }

    public void run() {
        while (thread != null) {
            try {
                ...
            } catch (InterruptedException e) {
                thread = null;
                // using stop() may cause side effects if the class is extended
            }
        }
    }

---------------------------------------------------------------------------------

From Lee.Worrall@UK Tue Aug 18 09:03:12 1998

I believe the recommended way to exit a thread is to have it drop out of the bottom of its run() method (probably in response to some externally triggered event).

lee

> Date: Tue, 18 Aug 1998 08:53:45 -0700 (PDT)
> From: Peter van der Linden
> Subject: A threadbare question
> To: [email protected]
>
> A thread knowledgeable colleague asks...
>
> ------------------
>
> In POSIX, we have pthread_exit() to exit a thread. In Java we
> *had* Thread.stop(), but now that's gone. Q: What's the best way
> to accomplish this?
>
> I can (a) arrange for all the functions on the call stack to
> return, all the way up to the top, finally returning from the
> top-level function. I can (b) throw some special exception I
> build for the purpose, TimeForThreadToExitException, up to the
> top-level function.
>
> But what I really want is thread.exit().
>
> -----------------
>
> Anyone have any ideas?
>

Seek, and ye shall find.

Peter

------------- Begin Forwarded Message -------------

ThreadDeath is an Error (not an Exception, since app's routinely catch all Exceptions) which has just the semantics you are talking about: it is a Throwable that means "this thread should die". If you catch it (because you have cleanup to do), you are SUPPOSED to rethrow it. 1.2 only, though, I think. Thread.stop() uses it, but although stop() is deprecated, it appears that ThreadDeath is not.

I think. :^)

Nicholas

> I was feeling much more sure of myself before you asked the question.
> Now I need to think. There are certainly situations where you do want
> to exit a thread. The options would seem to be a Thread.exit() method,
> or an explicit throw of an exception. What else? (You sure wouldn't
> want to have "special values" to return from functions to get to the
> run() method.)
>
> If Java had named code blocks, you could do a direct goto. But I
> don't see why that would be good. Not in general.
>
> I don't see any logic for insisting on having an explicit exception.
> (Is there?)

You mean as in throwing an exception in another thread to indicate termination status of the dying thread? If so: no, there isn't, although it is always possible to hand-craft this kind of effect.

> No, I think that a Thread.exit() method defined to throw ThreadDeath
> is the way to go.

In which case, there is no real need for a method; just `throw new ThreadDeath()' would do.

When thread.stop() was in the process of being deprecated I argued that there should be a Thread.cancel() method that is defined as

    setCancelledBit();
    interrupt()

along with a method isCancelled(), and an associated bit in the Thread class. The idea is that interrupts can be cleared, but the cancel bit is sticky, so it reliably indicates that a thread is being asked to shut down.
But apparently some people (I think database folks) really want the freedom to do retries -- in which case they must clear interrupts, catch ThreadDeaths, and so on, and don't want anything standing in the way of this.

> (PH is talking to me about modifying my PThreads book into Java. I'm
> sort of mixed on the idea.)

I think it would be great to have something a lot better than Oaks and Wong as the `lighter', gentler, more traditionally MT-flavored alternative to my CPJ book. I think you could do a lot of good in helping people write better code. (Tom Cargill has been threatening to write such a book for years, but I don't think he will.)

(People would then complain that your book is insufficiently OO, making a perfect complement to complaints that my book is insufficiently MT :-)

BTW, Have you seen my util.concurrent package? (see http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent). I'd be very interested in your reactions. I'm trying to standardize some of the more common utility classes people use in concurrent programming.

Doug,

> > If Java had named code blocks, you could do a direct goto. But I
> > don't see why that would be good. Not in general.
> >
> > I don't see any logic for insisting on having an explicit exception.
> > (Is there?)
>
> You mean as in throwing an exception in another thread to indicate
> termination status of the dying thread? If so: no, there isn't,
> although it is always possible to hand-craft this kind of effect.

No. "Named blocks" isn't a familiar term? It's just a clean way of doing longjmp(). Java *should* have it.

> > No, I think that a Thread.exit() method defined to throw ThreadDeath
> > is the way to go.
>
> In which case, there is no real need for a method; just `throw new
> ThreadDeath()' would do.
>
> When thread.stop() was in the process of being deprecated I argued
> that there should be a Thread.cancel() method that is defined as
>
>     setCancelledBit();
>     interrupt()
>
> along with a method isCancelled(), and an associated bit in the Thread
> class. The idea is that interrupts can be cleared, but the cancel bit
> is sticky, so reliably indicates that a thread is being asked to shut
> down. But apparently some people (I think database folks) really want
> the freedom to do retries -- in which case they must clear interrupts,
> catch ThreadDeaths, and so on, and don't want anything standing in the
> way of this.

I was a bit leery of interrupts until I looked at them more closely. I think now that they're pretty reasonable. So the last remaining question for me is: "Should I do an explicit throw? Or just call stop() anyway?" (I don't want to write my own subclass BilsThread that implements a java_exit() method.)

> > (PH is talking to me about modifying my PThreads book into Java. I'm
> > sort of mixed on the idea.)
>
> I think it would be great to have something a lot better than Oaks and
> Wong as the `lighter', gentler, more traditionally MT-flavored
> alternative to my CPJ book. I think you could do a lot of good in
> helping people write better code. (Tom Cargill has been threatening to
> write such a book for years, but I don't think he will.)
>
> (People would then complain that your book is insufficiently OO, making
> a perfect complement to complaints that my book is insufficiently MT :-)

Touche'!

> BTW, Have you seen my util.concurrent package? (see
> http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent).
> I'd be very interested in your reactions. I'm trying to standardize
> some of the more common utility classes people use in concurrent
> programming.

As soon as I get back from Utah...

Doug Lea wrote:
>
> > > But I would only do this if for some reason using interrupt() had to
> > > be ruled out.
> >
> > ? interrupt() is unrelated.
> > I assume my thread has already gotten the
> > interrupt and has decided to exit. I've got to check with one of the
> > Java guys, just to get their story on it. (I'm surprised this has been
> > asked 6k times already. Something's odd...)
>
> I think interrupt IS related. It seems best to propagate the
> interrupt all the way back the call chain in case you have something
> in the middle of the call chain that also needs to do something
> important upon interruption. Don't you think?

I was thinking in terms of situations where you KNOW your data is consistent and you've determined that it's time to exit and there's nothing else to do. An event which does occur... Often? Sometimes? Only in programs that I write??

But I see your point.

-Bil

Hi Bil,

Just a comment on the stop(), ThreadDeath issue in Java. The comment you include from Nicholas is inaccurate and misleading.

"ThreadDeath is an Error (not an Exception, since app's routinely catch all Exceptions) which has just the semantics you are talking about: it is a Throwable that means "this thread should die". If you catch it (because you have cleanup to do), you are SUPPOSED to rethrow it. 1.2 only, though, I think. Thread.stop() uses it, but although stop() is deprecated, it appears that ThreadDeath is not."

Yes ThreadDeath is derived from Error. The term "exception" means anything that can be thrown. "exceptions" which are derived from Exception (other than RuntimeException and its subclasses) are checked "exceptions" and must be caught by the caller or declared in the throws clause of the caller. But this is not important.

There is *nothing* special about a ThreadDeath object. It does not mean "this thread should die" but rather it indicates that "this thread has been asked to die". The only reason it "should" be rethrown is that if you don't then the thread doesn't actually terminate. This has always been documented as such and is not specific to 1.2.
If a thread decides that for some reason it cannot continue with its work then it can simply throw new ThreadDeath() rather than calling stop() on itself. The only difference is that with stop() the Thread is immediately marked as no longer alive - which is a bug in itself.

Cheers,
David

Doug Lea wrote:
>
> > I see your point. Still seems ugly to me though. (Now, if *I* were
> > king...)
>
> I'd be interested in your thoughts about this, or what you would like
> to see. I used to think I knew what would be better, but I am not so
> sure any more.
>
> -Doug

I was feeling much more sure of myself before you asked the question. Now I need to think. There are certainly situations where you do want to exit a thread. The options would seem to be a Thread.exit() method, or an explicit throw of an exception. What else? (You sure wouldn't want to have "special values" to return from functions to get to the run() method.)

If Java had named code blocks, you could do a direct goto. But I don't see why that would be good. Not in general.

I don't see any logic for insisting on having an explicit exception. (Is there?)

There's plenty to be said about how to ensure consistent data in such situations. But I don't think that has to determine the exit method.

No, I think that a Thread.exit() method defined to throw ThreadDeath is the way to go.

-Bil

(PH is talking to me about modifying my PThreads book into Java. I'm sort of mixed on the idea.)
--
===============
Bil LambdaCS.com

=================================TOP===============================
Q270: What is a "Thread Pool"?

> So I want to allocate a pool of threads and while the program
> is executing I want to use the threads to service all the different
> modules in the program. This means that there are going to be
> times where I want to change the addresses of procedures that
> threads are using.
>
> So I create the thread pool with all threads having NULL function
> pointers and all threads created are put to sleep.
> Some time later different modules want to be serviced so I look and
> see if a thread is available and if one is then I assign the function
> to that thread and make the thread active which starts execution
> of the assigned function. After the thread finishes it would be put to
> sleep...and available for use by another module....
>
> is this possible or have I been smoking too much crack?

Rex asks a question which we've seen here a dozen times. It's a reasonable question and kinda-sorta the right idea for a solution, but the angle, the conceptual approach, the metaphor is wrong.

The term "thread pool" conjures up a temp agency where you wake up typists when you need them and give them a job to do. This is a lousy way to think of programs. You shouldn't be thinking about "giving the threads work to do". You should be thinking about "announcing that there is work to do" and letting threads pick up that work when they are ready.

The Producer/Consumer model is the way to go here. A request comes in off the net, the producer puts it on a queue, and a consumer takes it off that queue and processes it. Consumer threads block when there's nothing to do, and they wake up and work when jobs come along.

Some will argue that "Thread Pool" is the same thing. Yes, but. We've seen SO many questions about "stopping and starting" threads in a pool, "giving a job" to a specific thread etc. People try to implement something along these lines and totally mess up.

Read my book. Read Dave's book. Read pretty much any of the books. We all say (more or less) the same thing.

So, don't think "Thread Pool", think "Producer/Consumer". You'll be happier.

A good example of a Producer/Consumer problem can be found in the code examples on www.LambdaCS.com.

-Bil

=================================TOP===============================
Q271: Where did "Thread" come from?

I just picked up on your post to comp.programming.threads, and I noticed your (?) concerning the term "thread."
I first heard this term used in the late '60s in a commercial IBM S/360 shop here in Dallas, Tx. One of the "heavy weights" (Jim Broyles, still at BC/BS of TX) was writing/developing a general purpose TP monitor: the "host" was a 360-40 running DOS (DOS supported 3 application "partitions": BG, FG1, FG2); the lines/controllers/terminals managed were IBM 2260 (or "look-alikes"). I do not know how many threads Jim's TP monitor used, but this system was used at BC for almost 10 years. The system was written in assembler. All of this was "pre" CICS, TSO, etc.

Jim Broyles went on to become manager of System Programming for BC/BS .. I worked for him for maybe 5-6 years in the mid '70s. Support for application threading in S/360 DOS was likely pretty "limited", but big "OZ" ... S/360 OS-MFT/MVT, SVS, MVS provided good facilities for multi-programming, and IBM was pushing MP and AP (smp) systems. We had a S/370 AP system installed when I left BC/BS (1979).

Net/net, the term has "been around a while."

=================================TOP===============================
Q272: How do I create threads in a Solaris driver?

Kernel space is a different beast from user space threading and I don't deal with that myself. BUT I know that Solaris kernel threads are very similar to POSIX threads. You can run them from device drivers. The DDI should have the interface for threads, but like I said, I've never paid it much attention.

I would think that a call to your support line should tell you where to look.

> Hi, I found your threads FAQ web page and wondered if you'd mind answering
> a question for me. I'm writing a miscellaneous driver for Solaris (that is
> it isn't tied to hardware) and would like to know how to create my own
> threads in kernel space. At first glance, there appears to be no support
> for this through the usual DDI/DDK means. Is this the truth ? Is there
> a way around this ?
> Or is the best way to fake it by doing something like
> using a soft interrupt or timeout to start a function that never returns ?
>
> Darren

=================================TOP===============================
Q273: Synchronous signal behavior inconsistent?

Antonio,

Yes, it *seems* weird, but it's not. (Well, maybe it is still weird, but at least there's some logic to it.)

If a program accesses unmapped memory, it will trap into the kernel, which will say to itself something like "What a stupid programmer!" and then arrange for a SIGSEGV for that program. Basically it will pick up the program counter right then and there and move it to the signal handler (if any). That's how synchronous signals work.

If you send a signal, any signal, to the process yourself, that will be an asynchronous signal. EVEN if it's SIGSEGV or SIGBUS. And the sigwaiter will then be able to receive it.

-Bil

> So, I guess things are not working quite right in that sometimes a
> blocked signal is not delivered to the - only - thread which is waiting
> for it.
> I coded an example in which SIGBUS is blocked and a thread is on
> sigwait. I arranged the code so that SIGBUS is "internally" generated,
> i.e. I coded a thread that is causing it on purpose. The process goes
> into a spin.
> If I kill the process with -10 from another shell, the result is as
> expected (the thread on sigwait catches it).
> I find that a little weird.
>
> Thanks for your suggestions,
> Antonio

=================================TOP===============================
Q274: Making FORTRAN libraries thread-safe?

"James D. Clippard" wrote:
> I have a need to use several libraries originally written in FORTRAN as part
> of a numerically intensive multithreaded application. The libraries are
> currently "wrapped" with a C/C++ interface.
>
> -----
> My question is: How might one safely accomplish such a task, given FORTRAN's
> non-reentrant static memory layout?
> -----

The answer is really "it depends". Firstly, with multi-threading you are going beyond the bounds of what is possible in standard C/C++, so any solution is by definition system dependent. I'm not sure offhand if any version of FORTRAN (eg 90 or 95) has standardised support for threading, but somehow doubt it. F77 never had any standardised support for multi-threading.

Second, "FORTRAN's non-reentrant static memory layout" is not strictly true. It is definitely not true with F90 or F95. With F77 (and before) things are a little ambiguous --- eg lots of vendor specific extensions --- so you will need to look at documentation for your compiler, or try a couple of test cases like

      CALL TEST
      CALL TEST
      END
      SUBROUTINE TEST
      INTEGER I
      WRITE (*,*) I
      I = I+1
      RETURN
      END

to see what happens. I recall (from F77 days) some keywords like AUTO and SAVE that control whether a variable is static or auto. I don't know how widespread they were (or whether or not they were standard), as my coding practice rarely relied on that sort of thing.

If your FORTRAN code uses things like common blocks, then you essentially have a set of static variables that you need to control access to. Much the same as you would need for accessing static variables in C/C++.

In general, you are probably safest using some of the following schemes. None of these are really specific to FORTRAN.

1) Provide a set of wrapper functions in C/C++. Have the wrapper functions use mutexes or similar to prevent multiple threads invoking particular sets of FORTRAN functions. For example, if FORTRAN SUBROUTINE A calls B calls C, and you have a wrapper for each, ensure that a call to C prevents a call to A on another thread UNLESS you know that all variables in C are auto.

2) Control access to common blocks, as hinted above.

>
> BTW, one of libraries I am using is SLATEC. Given that all NRL's base much
> of their research code on SLATEC, I suspect that someone has elegantly
> surmounted this problem.
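
(Illustrative sketch, not taken from the thread.) Scheme 1 above, a C wrapper that serializes every call into a family of routines sharing static state, might look roughly like this. Everything named here is hypothetical: legacy_add() is a C stand-in for a wrapped Fortran subroutine and legacy_total plays the role of a COMMON block; in a real build the routine would be an extern declaration whose symbol name and calling convention depend on the Fortran compiler.

```c
#include <pthread.h>

/* Stand-in for non-reentrant Fortran code: state lives in static
 * storage, as with a SAVEd local or a COMMON block. */
static long legacy_total;

static void legacy_add(const int *n)   /* argument passed by reference */
{
    long t = legacy_total;             /* unprotected read-modify-write */
    legacy_total = t + *n;
}

/* One lock guards the whole group of routines that touch that state. */
static pthread_mutex_t legacy_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread-safe wrapper: callers never reach legacy_add() directly. */
void safe_add(int n)
{
    pthread_mutex_lock(&legacy_lock);
    legacy_add(&n);
    pthread_mutex_unlock(&legacy_lock);
}

long safe_total(void)
{
    long t;
    pthread_mutex_lock(&legacy_lock);
    t = legacy_total;
    pthread_mutex_unlock(&legacy_lock);
    return t;
}

static void *worker(void *arg)
{
    int i, iters = *(int *)arg;
    for (i = 0; i < iters; i++)
        safe_add(1);
    return NULL;
}

/* Demo driver: run up to 16 threads, each calling the wrapper iters
 * times, then return the accumulated total. */
long run_workers(int nthreads, int iters)
{
    pthread_t tid[16];
    int i, started = 0;

    if (nthreads > 16)
        nthreads = 16;
    for (i = 0; i < nthreads; i++)
        if (pthread_create(&tid[started], NULL, worker, &iters) == 0)
            started++;
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return safe_total();
}
```

A single coarse lock is the simplest choice and gives up all parallelism inside the library; separate locks per group of routines are only safe if the groups truly share no static variables or common blocks.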
>

=================================TOP===============================
Q275: What is the wakeup order for sleeping threads?

Raghu Angadi wrote:
> A. Hirche wrote:
> >
> > Is it
> > (a) the first thread in the queue (assuming there is an ordered list of
> > waiting threads)
> > (b) any thread (nondeterministic choice)
> > (c) a thread chosen by some other scheme
>
> Threads are queued in priority order.
>
> So the thread with the maximum priority will get the mutex.
>
> If there is more than one thread with max priority, then it is
> implementation dependent.

Not quite! Actually, POSIX places mutex (and condition variable) wakeup ordering requirements only when:

1. The implementation supports the _POSIX_THREAD_PRIORITY_SCHEDULING option.
2. The threads waiting are scheduled using the SCHED_FIFO or SCHED_RR policies defined by POSIX.

If these conditions are true, then POSIX requires that threads be awakened in priority order. Multiple threads of identical priority must be awakened "first in first out".

For threads that don't use SCHED_FIFO or SCHED_RR (e.g., the default SCHED_OTHER on many UNIX systems, which behaves more like traditional UNIX timeshare scheduling), the wakeup order is implementation defined. Most likely it's still priority ordered, but it need not be, and there's no definition of how they may interact with POSIX policies.

And, in any case, except on a uniprocessor, saying that "the highest priority thread gets awakened" is not the same thing as "the highest priority thread gets the mutex". Some other thread on another processor might lock the mutex first, and the high priority thread will just go back to sleep. This can happen even on a uniprocessor if the thread that unlocked the mutex has a priority equal to that of the highest priority waiter, (it won't be preempted), and it locks the mutex again before the waiter can run.
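
(Illustrative sketch, not taken from the thread.) The two conditions above can be checked and requested from C roughly as follows. The helper name init_fifo_attr() is hypothetical; the pthread_attr_* and sched_* calls are standard POSIX. It assumes a system where the inherit-scheduling default would otherwise override the requested policy, hence the explicit PTHREAD_EXPLICIT_SCHED. Note that actually creating a SCHED_FIFO thread with such an attribute commonly requires extra privileges, so pthread_create() may still fail.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: initialise *attr so that a thread created with
 * it is scheduled SCHED_FIFO at (lowest FIFO priority + prio_offset).
 * Returns 0 on success, otherwise an errno value. */
int init_fifo_attr(pthread_attr_t *attr, int prio_offset)
{
#if defined(_POSIX_THREAD_PRIORITY_SCHEDULING) && _POSIX_THREAD_PRIORITY_SCHEDULING > 0
    struct sched_param sp;
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    int rc;

    if (lo == -1 || hi == -1)
        return errno;
    if (prio_offset < 0 || prio_offset > hi - lo)
        return EINVAL;

    memset(&sp, 0, sizeof sp);
    sp.sched_priority = lo + prio_offset;

    if ((rc = pthread_attr_init(attr)) != 0)
        return rc;
    /* Use the attribute's own policy instead of the creator's. */
    if ((rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED)) != 0 ||
        (rc = pthread_attr_setschedpolicy(attr, SCHED_FIFO)) != 0 ||
        (rc = pthread_attr_setschedparam(attr, &sp)) != 0) {
        pthread_attr_destroy(attr);
        return rc;
    }
    return 0;
#else
    /* Option not supported: no POSIX wakeup-order guarantee applies. */
    (void)attr;
    (void)prio_offset;
    return ENOTSUP;
#endif
}
```

Even with every waiter created this way, the last paragraph above still applies: priority-ordered wakeup does not by itself guarantee which thread ends up holding the mutex.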
/---------------------------[ Dave Butenhof ]--------------------------\
=================================TOP===============================
Q276: Upcalls in VMS?

Eugene Zharkov wrote:

> I am somewhat confused by the OpenVMS Linker documentation, by the part which describes the /threads_enable qualifier. Here is an extract from there:
>
> [...]
>
> What confuses me is the following. A section about two-level scheduling and upcalls in the Guide to DECthreads explicitly states that "this section applies to OpenVMS ALPHA only". The above description of the MULTIPLE_KERNEL_THREADS option states that the option is applicable only to ALPHA systems. The above description of the UPCALLS options does not mention a system it applies to. Does that mean that the upcalls mechanism is actually implemented on OpenVMS VAX?

(Apologies to Dan Sugalski for apparently ignoring his answers, but since he wasn't completely sure, I figured it was best to go back to the beginning for a definitive resolution.)

Despite the cleverly leading hint in the documentation, you should assume that neither upcalls nor kernel threads exist, nor will they ever exist, on OpenVMS VAX. While most of the infrastructure for upcalls has been implemented, there were some "issues" that were never resolved due to lack of resources, and it has been officially deemed low priority.

Nevertheless, it is theoretically possible that, given enough signs of interest, the implementation of upcalls on OpenVMS VAX could be completed. There will never be kernel threads on OpenVMS VAX. (We all know that one should never say "never".)

/---------------------------[ Dave Butenhof ]--------------------------\
=================================TOP===============================
Q277: How to design synchronization variables?

"Kostas Kostiadis" writes:

> Are there any rules or techniques to build and test synchronisation protocols, or is it a "do what you think will work best" thing?
Look up the following paper:

Paper Title: ``Selecting Locking Primitives for Parallel Programming''
Paper Author: Paul E. McKenney
Where Published: Communications of the ACM, Vol 39, No 10, 75--82, October 1996.

It is exactly what you need. In the paper, McKenney describes a pattern language that helps in the selection of synchronization primitives for parallel programs ...

cheers,
Ramanan
=================================TOP===============================
Q278: Thread local storage in DLL?

> I've written some memory allocation routines that may work when my DLL is called from multiple threads. The routines use thread local storage to store a table of the memory objects that have been allocated. I'm concerned that the code will not work properly when the dll is loaded explicitly using LoadLibrary.
>
> Has anyone experienced that problem?
> Is there a simple solution for a DLL(Win32)?
> Can I allocate memory somehow in my main dllentry point routine?
>
> Do I need to put a mutex around the calls to malloc to ensure that the code is thread-safe?
>
> You can email me at [email protected]
> Rob

Microsoft has explicitly stated that what you are doing will not work when your DLL is loaded with LoadLibrary. The correct solution is to explicitly use the Tls* functions. Read up on TlsAlloc, TlsFree, TlsGetValue, and TlsSetValue.

Of course, you can always roll your own by using GetCurrentThreadId() to get an index into a sparse array protected with a CRITICAL_SECTION.
=================================TOP===============================
Q279: How can I tell what version of linux threads I've got?

>>>>> "Phil" == Phil McRevis writes:

Phil> How can I tell what version of linux threads I've got on my system? I browsed through the debian bug database and didn't find anything with "threads" in the list of packages or even what version of pthreads is included in the debian distribution.
Executing glibc gives you useful information, as in:

levanti@meta:~$ /lib/libc-2.1.2.so
GNU C Library stable release version 2.1.2, by Roland McGrath et al.
Copyright (C) 1992, 93, 94, 95, 96, 97, 98, 99 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 2.95.2 19991109 (Debian GNU/Linux).
Compiled on a Linux 2.2.12 system on 1999-12-25.
Available extensions:
        GNU libio by Per Bothner
        crypt add-on version 2.1 by Michael Glad and others
        linuxthreads-0.8 by Xavier Leroy
        BIND-4.9.7-REL
        NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
        NSS V1 modules 2.0.2
        libthread_db work sponsored by Alpha Processor Inc
Report bugs using the `glibcbug' script to .

Phil> What version of linux pthreads is considered to be the most stable and bug-free?

No idea. Hopefully the most recent version.

Phil> If I need to upgrade the pthreads package on debian, what's involved in doing that?

apt-get update ; apt-get upgrade
=================================TOP===============================
Q280: C++ exceptions in a POSIX multithreaded application?

On Sun, 16 Jan 2000 21:38:29 -0800, Bil Lewis wrote:
>Jasper Spit wrote:
>>
>> Hi,
>>
>> Is it possible to use c++ exceptions in a POSIX multithreaded application, without problems ?
>
>No. Using C++ exceptions is always a problem. (ho, ho, ho).
>
>But seriously... the interaction between exceptions & Pthreads is not defined in the spec. Individual C++ compilers do (or don't) implement them correctly in MT code. EG, Sun's 1993 C++ compiler did it wrong, Sun's current C++ compiler does it right.

Could you expand on that? What does it mean for a C++ compiler to do it right? If we can put together a set of requirements to patch POSIX thread cancellation and C++ together, I can hack something up for Linux.
The questions are:

- What exception is the cancellation request turned into in the target thread? What is the exception's type? What header should it be defined in?

- Upon catching the exception, what steps does the target thread take to terminate itself? Just re-enter the threads code by calling pthread_exit()?

- Are the handlers for unhandled and unexpected exceptions global or thread specific?

- Do unhandled cancellation exceptions terminate the entire process?

- By what interface does the process arrange for cancellations to turn into C++ exceptions?

- What is the interaction between POSIX cleanup handlers and exception handling? Do the handlers get executed first and then exception processing takes place? Or are they somehow nested together?

- Does POSIX cleanup handling play any role in converting cancellation to a C++ exception?

In article , [email protected] suggested:
>On Thu, 16 Dec 1999 22:00:37 -0500, John D. Hickin wrote:
>>David Butenhof wrote:
>>
>>> by. It would still be wrong. You need to use 'extern "C"' to ensure that the
>>> C++ compiler will generate a function with C calling linkage.
>>>
>>
>>Also this:
>>
>>extern "C" void* threadFunc( void* arg ) {
>>  try {
>>    ...
>>  }
>>  catch( ... ) {
>>    return reinterpret_cast<void*>(1); // say;
>>  }
>>  return 0;
>>}
>>
>>It is manifestly unsafe to let a C++ exception unwind the stack of a function compiled by the C compiler (in this case, the function that invokes your thread function).
>
>To clarify; what you appear to be saying is that it's a bad idea to allow unhandled exceptions to percolate out of a thread function.

Actually, I think what he's saying is stronger than that, and I'd like to clarify it, since I'm finally updating a lot of my C++/DCE code to use C++ exceptions.

He's saying not to *return* from inside a try or catch block, since it will force the C++-compiled code to unwind the stack past the C++ boundary and back into the C code.
Personally, one of the things that's kept me from using exception handling where I could avoid it was that I couldn't find a definitive answer as to whether it's safe and efficient to return like that. According to Stroustrup, it is, but this points out that it can be tricky in mixed environments.

--------
Scott Cantor                 If houses were built the way software
[email protected]             is built, the first woodpecker would
Univ Tech Services           bring down civilization.
The Ohio State Univ                  - Anon.
=================================TOP===============================
Q281: Problems with Solaris pthread_cond_timedwait()?

In article <[email protected]>, John Garate wrote:

> I can't explain why, but I can say that if you call pthread_cond_timedwait() with a timeout less than 10ms in the future that you'll get return code zero. Since I call it in a loop, the loop spins until finally the timeout time is in the past and pthread_cond_timedwait() returns ETIMEDOUT. This happens for me on Solaris 2.6. If you call it with a timeout greater than 10ms in the future, it'll return ETIMEDOUT after waiting awhile, but it does so slightly BEFORE the requested time, which conflicts with the man-page.

This has nothing to do with spurious wakeups from pthread_cond_wait(). It is just a bug in Solaris 2.6 (and in Solaris 7 and every earlier release):

BugID: 4188573
Synopsis: the lwp_cond_wait system call is broken at small timeout values

This bug was fixed in Solaris 8 and is being patched back to Solaris 7. There are no plans for patching it back to Solaris 2.6 or earlier.

True, the ETIMEDOUT timeout occurs up to a clock tick (10ms) before the requested time. This is also a bug, but has not been fixed. Of course, expecting time granularity better than a clock tick is not reasonable.

Roger Faulkner
=================================TOP===============================
Q282: Benefits of threading on uni-processor PC?
>Can someone please tell me what the benefits of threading are when the potential environment is a main-stream uni-processor PC?

The benefits are that you can interrupt the execution of some low priority task to quickly respond to something more important that needs immediate attention. That is the real time event processing benefit.

Another benefit is that your program can do something while it is waiting for the completion of I/O. For example, if one of your threads hits a page fault, your program can nevertheless continue computing something else using another thread.

Those are the two main benefits: to decrease the overall running time by overlapping input/output operations with computation, and to control the response times to events through scheduling, prioritizing and preemption.

The secondary benefit is that some problems are easy to express using concurrency, which leads to elegant designs.

>Concurrent execution and parallel execution are 2 different things. Adding the overhead that you get by using multiple threads, looks like a decrease in performance...

That depends. Sometimes it is acceptable to eat the overhead. If you have to respond to an event *now*, it may be acceptable to swallow a predictably long context switch in order to begin that processing.

>What is all this business about "better utilisation of resources" even on uni-processor hardware?

Multitasking was invented for this reason. If you run jobs on a machine in a serial fashion, it will squander computing resources, by keeping the processor idle while waiting for input and output to complete. This is how computers were initially used, until it was realized that by running mixes of concurrent jobs, the computer could be better utilized.

>All the above is based on NON-network based applications. How does this change when your application is related with I/O operations on sockets?

In a networked server application, you have requests arriving from multiple users.
This is just a modern variant of programmers lining up at the data centre window to submit punched cards. If the jobs are run one by one, you waste the resources of the machine. Moreover, even if the resources of the machine are not significantly wasted, when some programmer submits a very large processing job, everyone behind has to wait for that big job to complete, even if they just have little jobs. Moreover, they have to wait even if their jobs are more important; there is no way to interrupt the big job to run these more important jobs, and then resume the big job.

The same observations still hold true in a networked server. If you handle all of the requests serially, you don't make good use of the resources. You don't juggle enough concurrent I/O requests to keep the available peripherals busy, and you leave the processor idle. Moreover, if a big request comes in that takes a long time to complete, the processing of additional requests grinds to a halt.
=================================TOP===============================
Q283: What if two threads attempt to join the same thread?

On Fri, 18 Feb 2000 22:45:17 GMT, Jason Nye wrote:
>Hello, all
>
>If a thread, say tX, is running (joinable) and both thread tY and tZ attempt to join it, what is the correct behaviour of pthread_join:

Any behavior is correct, because the behavior is undefined. A thread may be joined by only one other thread. Among acceptable behaviors would be that of your program terminating with a diagnostic message, or behaving unpredictably.
=================================TOP===============================
Q284: Questions with regards to Linux OS?

> I have some basic questions with regards to Linux OS
> 1) What types of threads (kernel/user space) and Bottom-Handler can exist inside a task-list??

Both kernel and user space threads can go into a *wait queue*. A *task queue*, though unfortunately named, is something else. A task queue basically has lists of callbacks that are called at various times.
These are not threads.

> 2) Can I add a user space thread to a task-list?

You cannot add threads to task queues. You can use a wait queue to block a thread. This is done by adding a thread to the wait queue, changing its state to something like TASK_INTERRUPTIBLE (interruptible sleep) and calling schedule() or schedule_timeout().

> 3) I would like to change a thread's priority within a task-list from a bottom handler. How can I do it?

With great difficulty, I suspect. You might be better off just having the thread adjust its priority just before sleeping on that queue.
=================================TOP===============================
Q285: I need to create about 5000 threads?

Efremov Stanislav wrote:
>
> I need to create about 5000 threads simultaneously (it'll be a powerful server on NT machine, each client invokes its own thread)
>
> Did anybody write programs like this? Is it a good idea to invoke a thread for each connection?

No, it's really bad.

> I should write this program in Java. Can you say also, can it be implemented with so many threads?

Not likely.

> I really appreciate your opinion. Thanks in advance.

Use a more rational design approach. For NT, completion ports would be a good idea.

DS
=================================TOP===============================
Q286: Can I catch an exception thrown by a slave thread?

Jan Koehnlein wrote:
>
> Hi,
>
> does anyone know if it's possible to catch an exception thrown by a slave thread in the master thread using C++?
>

Yes. But you need some extra infrastructure.

You can do this with RogueWave Threads.h++ using what is called an IOU; I believe also that the ACE Toolkit may implement something similar that is called a future (but I haven't looked into that aspect of ACE).

Basically an IOU is a placeholder for a result that is computed asynchronously. To get the result you redeem the IOU. Then you may:

1) get the result that was previously computed,
2) block, if the result isn't yet available.
3) see an exception raised, if the async thread threw one.

The implementation catches the exception in the async thread and copies it into a location where the IOU can see it. On redemption it is thrown.

Regards, John.
=================================TOP===============================
Q287: _beginthread() versus CreateThread()?

In article <38be0349$0$18568@proctor>, lee wrote:

% 1. Why should I use _beginthread() instead of CreateThread() when using
% the c runtime libs ?

Because the people who wrote the library said so. The documented effect of not using _beginthread() is that you can have per-thread memory leaks, but I always like to think that the next release will have some catastrophic problem if you don't do things their way.

% 2. What can i use the saved thread_id for ? (as opposed to using the
% handle to the thread)

Some functions take a thread id (PostThreadMessage comes to mind), so sometimes you need that. I like to close handles as soon as possible, so I don't have to keep track of them.

As I recall, you were taking some steps to set up a variable to hold the thread ID, but you weren't setting it up correctly. That would be the only reason I mentioned it. If you want to pass NULL, then just pass NULL.

--
Patrick TJ McPhee
East York Canada
[email protected]

>thanks - still got a few questions though
>... (forgive me if these questions are stupid - still very much a newbie)
>1. Why should I use _beginthread() instead of CreateThread() when using the c runtime libs ?

First of all, you should not use _beginthread() but _beginthreadex(). The _beginthread() function is completely brain-damaged and should not be used.

The answer to your question is somewhat involved. Some standard C functions have interface semantics that are inherently non-reentrant, and require thread local storage in order to work reasonably in a threaded environment.
The _beginthread() function is a wrapper around CreateThread which diverts the newly created thread to a startup function within the C library. This startup function allocates thread local resources before calling your start function. More importantly, the function cleans up these thread local resources when your start function returns.

If you use CreateThread, control is not diverted through the special start function, so that if your thread also uses the standard library, causing it to acquire thread local storage, that storage will not be cleaned up when the thread terminates, resulting in a storage leak.

At the heart of the problem is Microsoft's brain damaged interface for managing thread local storage, which doesn't permit a destructor function to be associated with a thread local object. Thus if a library needs to be notified of a terminating thread so it can clean up its thread local resources, it needs to either provide its own thread creating function that the client must use for all threads that enter that library; or it must provide explicit thread attach and detach functions (like COM's CoInitialize and CoUninitialize); or it must be dynamically linked and use the thread destruction notifications passed through DllMain.

A related problem is that Microsoft does not view the standard C library as being an integral component of the Win32 system interface, but it is rather an add on for Visual C. Thus the Win32 core is not ``aware'' of the C library.

>2. What can i use the saved thread_id for ? (as opposed to using the handle to the thread)

The handle is much more important; for one thing, it lets you wait on the thread termination. The _beginthread function calls CreateThread internally and then immediately closes the handle. The _beginthreadex function casts the handle to an unsigned long and returns it.
=================================TOP===============================
Q288: Is there a select() call in Java??

Not in 1.2. Maybe in the future?

> Yes, I was just looking at that JSR. It isn't clear if that includes select/poll or not.

I submitted a (pretty idiotic) comment on the JSR, and got a very polite reply from Mark Reinhold that, while it wasn't heavy on detail, was enough to convince me, when I thought about it, that they've got a pretty neat design that can efficiently subsume select/poll, asynch I/O, SIGIO et al, and does so much more tidily than anything that I had dreamed up.

> Do you have any pointers to poll() & Java?

The Developers Guide (PostScript document) that comes with JDK 1.2.1_04 (for all I know, possibly other versions too, e.g. 1.2.1_03 or 1.2.2_05) talks about it, and one of the four packages in that release is SUNWj2dem, which contains the source code (Java and C) for the demo poller code. Be warned that it is Solaris-specific and the antithesis of pure Java...

Hi Bil,

Hope you enjoyed the tip, I really enjoyed your book.

After spending a fair amount of time playing with InterruptedException, I view interruption as just another kind of signal. I almost never use it for interruption per se, but I have wondered about using it as a "notify with memory," so that even if the thread isn't waiting right now it can still get the message.

Are you involved with agitating for select() and better-defined InterruptedIOExceptions in a future version of Java? I'll sign the petition. :-)

-Stu
http://staff.develop.com/halloway
=================================TOP===============================
Q289: Comment on use of VOLATILE in the JLS.?

>It is my opinion and my experience that inclusion of VOLATILE in Java has led to nothing but confusion and bad coding practices.

*Personally*, I completely agree.
>ANYWAY, I think it would be of some value to include a statement of where VOLATILE is and isn't useful and examples of same. In particular, that VOLATILE is almost always the WRONG thing to use, and that programmers should avoid it unless they have a VERY clear understanding of the consequences.

We might be able to squeeze something in. However, bear in mind that the primary purpose of the JLS is to specify the language semantics, not to teach people how to use it.

[Some time after this exchange, the issue of the description of the memory model required by Java popped up, led by Lea & Pugh. The gist of this is that a pile of details will be fixed AND VOLATILE will be given more adequate semantics, making it POSSIBLE to use correctly. It will still be *VERY* difficult and should still be avoided.]

*********************************************
Gilad Bracha
Computational Theologist
Sun Java Software
http://java.sun.com/people/gbracha/
=================================TOP===============================
Q290: Should I try to avoid GC by pooling objects myself??

[From a discussion in Java Report]

Dear Dwight, Dr. Kolawa,

In his article, Dr. Kolawa made a number of good points, but he also said one thing that drives me crazy. I've seen different people give different versions of this time and time again and it's just not good.

We'll call this "Fear of Garbage." This is the preoccupation that so many C and C++ programmers have with memory usage. In those languages, this concern is well-founded, but with garbage collectors this concern becomes moot.

Dr. K suggests setting a variable to NULL to allow the GC to collect garbage earlier. While it is certainly true that eliminating one pointer to an object will make it more likely to be collected earlier, it's the wrong thing to do. The whole idea of GCs is that you DON'T spend your time worrying about temporary usage of memory.
Yes, you may indeed increase the footprint of your program a bit by not worrying about every reference, and you can always invent a worst case such as in his article, but normal programming practices will avoid even these. If he had written his example like this:

    displayMomentarily(makeSplashImage());

that monster splash image would have been GC'd as it went out of scope naturally.

Now it is possible to write programs that stuff more and more data onto a list, data that will never be used again. And that IS a memory leak and you do have to avoid doing that. Infinite history lists for example. But that's a different problem. Dr. K is referring to an issue concerning singletons which just isn't a problem.

In brief, "If it ain't broke, don't fix it."

-Bil
--
================
=================================TOP===============================
Q291: Does thr_X return errno values? What's errno set to???

>
> Some of the thr_X() man pages seem to say the functions return the errno value.
>
> Is this really correct (instead of returning, say, -1 and then setting errno)?
>
> If it is correct, is errno also set correctly?
>

Yes, that is correct. They return the error value. And errno is "set correctly" -- meaning that it has no defined value because it isn't involved in this API.

NB: As a side-effect of some functions, errno WILL have a value on some systems. But it ain't defined, so don't use it.

After responding to your message I went to your website. I am worried that if this is the message you are giving your students you are doing them a disservice. They will never be able to write real applications in Java.

For the classes I teach in threading (both POSIX and Java), I found it useful to have some example programs for the students to build upon. One of these programs is a server program which accepts requests on sockets. I have just finished polishing up the robust, select() version of the server (both POSIX and Java) and would love to have folks take a look at it.
-Bil

POSIX

There are four server programs, each accepts clients on a port and handles any number of requests (each request is a byte string of 70 characters).

> There is the simple server, which is single-threaded.
> There is the master/slave server, which is multithreaded and spawns one thread to handle each request.
> There is the producer/consumer server, which is multithreaded, but spawns off a new thread to receive requests from each new client (replies are done by the pool of consumer threads).
> Finally, there is the select server, which is multithreaded and which has only a single thread doing all of the receiving. The producer thread does a select() on all current clients AND the port AND an "interruptor" pipe. When select() returns:
    o Requests from clients go onto a queue for the consumer threads to handle.
    o New connections to the port are accept()'d and new clients are created.
    o Finally, if it is "shutdown" time, a message is sent on the interruptor pipe and everyone finishes up and stops.

This program handles 1k clients, survives client failure, and reliably shuts down. (At least it works when *I* test it!)

All these programs have been tested only on Solaris 2.6 on an SS4, but should run on any UNIX system. The code (along with a pile of other programs) is located at:

http://www.lambdacs.com/code/programs_21_Mar_00.tar.gz

In the directory programs/PThreads, there is a Makefile for Solaris which gets regular use, also Makefiles for DEC, HP, and SGI, which get a lot less use. Hence:

bil@cloudbase[182]: make
bil@cloudbase[183]: setenv DEBUG          (If you want LOTS of output)
bil@cloudbase[184]: server_select 6500 100 0 10 30 &
Server_9206(TCP_PORT 6500 SLEEP 100ms SPIN 0us N_CONSUMERS 10 STOPPER 30s KILLER -1s)
Starting up interation 0.
Server bound to port: 6500
Server up on port 6500. Processed 669 requests. 41 currently connected clients.
Time to stop!
Shutdown successful. 676 replies sent.
...
bil@cloudbase[185]: client 6500 1 1 50    (Better in different window!)
Client_9207(PORT 6500 SLEEP 1ms SPIN 1us N_SOCKETS 50 N_REQUESTS 10000)
Connected to server on port 6500
Connected to server on port 6500
Connected to server on port 6500
Connected to server on port 6500
Client_9207[T@9] Receiving segments on fd#6...
Client_9207[T@8] Sending 10000 requests on fd#6...
Client_9207[T@7] Receiving segments on fd#5...
Client_9207[T@6] Sending 10000 requests on fd#5...
Client_9207[T@5] Receiving segments on fd#4...
Client_9207[T@4] Sending 10000 requests on fd#4...

Java

The Java program is quite similar and even uses much of the same C code for select(). (Java does not have an API for select(), so we have to use the native select() via JNI.) The Java server is happy to run with the C client & vice-versa.

bil@cloudbase[192]: cd programs/Java/ServerSelect
bil@cloudbase[193]: setenv THREADS_FLAG native
bil@cloudbase[194]: setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:.
bil@cloudbase[195]: setenv CLASSPATH ../Extensions/classes:.
bil@cloudbase[198]: java Server 6500 100 0 10 30
Server(port: 6500 delay: 100ms spin: 0us nConsumers: 10 stopperTimeout 30s)
Server now listening on port 6501
Server up on port 6501. Processed 2113 requests. 10 clients.
Stopping...
Everything stopped.
...

bil@cloudbase[303]: java Client 6500 100 0 10
Client(port: 6500 sDelay: 100 (ms) rDelay: 0 (ms) nClients: 10)
Actual Port: 6501
Client[Thread-0] Started new sender.
Client[Thread-1] Started new receiver.
Client[Thread-2] Started new sender.
Client[Thread-3] Started new receiver.
...
Client[Thread-0] Sent: 100 requests.
Client[Thread-6] Sent: 100 requests.
Client[Thread-8] Sent: 100 requests.
Client[Thread-14] Sent: 100 requests.
Client[Thread-4] Sent: 100 requests.
...
=================================TOP===============================
Q292: How I can wait more then one condition variable in one place?

Condition variables are merely devices to delay the execution of a thread.
You don't need more than one of these in order to accomplish that task. Threads that use condition variables actually wait for a predicate to become true; the evaluation of that predicate is done explicitly in the code, while holding a mutex, e.g.

    lock mutex
    ...
    while not predicate()
        wait (condition, mutex )
    ...
    unlock mutex

You can easily wait on an array of predicates, provided that they are protected by the same mutex. Simply loop over all of them, and if they are all false, wait on the condition. If some of them are true, then perform the associated servicing.

>I need pthread analog for WaitForMultipleObjects (WIN32) or DosWaitMuxWaitSem (OS2)

You should not need this, unless you (or someone else) made the design mistake of allowing operating system synchronization objects to be used as an interface mechanism between program components.

If two unrelated objects, say A and B, are both to generate independent events which must give rise to some processing in a third object C, which has its own thread, then there is a need for C to be able to wait for a signal from either A or B.

The programmer who is oblivious to issues of portability might arrange for object C to expose two operating system objects, such as Win32 events; have A and B signal these; and have C do a wait on both objects.

A technique which is easier to port is to use only the programming language itself to make an interface between A, B and C. When A and B want to signal C, they call some appropriate interface methods on C rather than invoking functions in the operating system specific library. These methods can be implemented in a number of ways. For example, on a POSIX thread platform, they can lock a mutex, set one of two flags, unlock the mutex and hit a condition variable. The thread inside C can lock the mutex, check both flags and use a single condition variable to suspend itself if both flags are false.
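As a minimal sketch of the POSIX variant just described (the names struct waiter, waiter_init, waiter_signal, waiter_wait, EVENT_A and EVENT_B are invented for this illustration and do not come from any library), object C's interface might look something like this:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum { EVENT_A = 1, EVENT_B = 2 };

/* Hypothetical object C: two flags, one mutex, one condition variable. */
struct waiter {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool from_a;
    bool from_b;
};

void waiter_init(struct waiter *w)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    w->from_a = w->from_b = false;
}

/* Called by A or B: lock, set a flag, unlock, hit the condition variable. */
void waiter_signal(struct waiter *w, int event)
{
    assert(event == EVENT_A || event == EVENT_B);
    pthread_mutex_lock(&w->lock);
    if (event == EVENT_A)
        w->from_a = true;
    else
        w->from_b = true;
    pthread_mutex_unlock(&w->lock);
    pthread_cond_signal(&w->cond);
}

/* Called by C's thread: sleeps while both flags are false, then returns
 * a bitmask of the events that were pending and clears them. */
int waiter_wait(struct waiter *w)
{
    int events = 0;
    pthread_mutex_lock(&w->lock);
    while (!w->from_a && !w->from_b)      /* loop guards against spurious wakeups */
        pthread_cond_wait(&w->cond, &w->lock);
    if (w->from_a) { events |= EVENT_A; w->from_a = false; }
    if (w->from_b) { events |= EVENT_B; w->from_b = false; }
    pthread_mutex_unlock(&w->lock);
    return events;
}
```

Because the flags remember events that arrive while C is busy, a signal sent before C calls waiter_wait() is not lost, and C learns about every pending source in one call. This sketch assumes a single waiting thread; with several waiters one would probably broadcast instead.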
Even on Windows, you could handle this case without two events: use a critical section to protect the flags, and have the thread wait on a single auto-reset event.

There is really no need for WaitForMultipleObjects in a reasonable design. I've never needed to use it in Win32 programming despite having written plenty of code that must respond to multiple stimuli coming from different sources.

On Tue, 11 Jul 2000 14:05:40 -0700, Emmanuel Mogenet wrote:
>This seems to be a favorite topic (shouldn't it be in the FAQs or something), but could someone please elaborate on the following questions:
>
> 1. If I had to implement WaitForMultipleObjects on top of pthreads conditions, how would I go about it

One easy way would be to write a miniature kernel which manages the objects to be waited on. This kernel would protect itself with a mutex. A condition variable would be used to delay the execution of each incoming thread until the wait condition is satisfied (either all of the desired objects are in the ``signalled'' state, or just one or more of them is in that state, depending on the arguments to the wait):

    lock(mutex)
    while ( none of the objects are signalled )
        wait(mutex, condition)
    unlock(mutex)

To do a more efficient job, you need more condition variables, so you don't wake up too many threads. You could assign one condition variable to each thread, or you could take some intermediate approach: have a pool of condition variables. The wait grabs a condition from the pool and registers it to wait. When an object is signalled, the implementation then hunts down all of the condition variables that are engaged in a wait on that object, and signals them, thereby waking up just the threads that are waiting on that object.

> 2. People seem to consider WaitForMultipleObjects to be an ill-designed API, however it looks to me like the semantics is very close to that of select which is to my knowledge considered pretty useful by most UNIX programmers.
Yes, however select is for multiplexing I/O, not for thread synchronization. Also note that select and poll ensure fairness; they are capable of reporting exactly which objects reported true. Whereas WaitForMultipleObjects returns the identity of just one object, so the app has to interrogate the state of each object with wasteful system calls. The WaitForMultipleObjects function is also restricted to 64 handles, whereas select and poll implementations can take thousands.

% 1. If I had to implement WaitForMultipleObjects on top of pthreads
% conditions, how would I go about it

In general, you can wait for N items by kicking off N threads to do the waiting and signalling from those waiters. If you're waiting for events which will be signaled through CVs, and you control the code for these things, have them all signal the same CV. You can still test for a lot of things:

    pthread_mutex_lock(&mux);
    while (!a && !b && !c && !d)
        pthread_cond_wait(&cv, &mux);
    pthread_mutex_unlock(&mux);

If you're waiting for things which can be treated as file handles, you can use poll() or select().

% 2. People seem to consider that WaitForMultipleObjects to be an
% ill-designed API, however it looks to
% me like the semantics is very close to that of select which is to my
% knowledge considered
% pretty useful by most UNIX programmers.

Like select(), it places an arbitrary limit on the number of things you can wait for, so it can be useful as long as your needs don't go beyond those limits. I think some people don't see the point of this in a multi-threaded program.

=================================TOP===============================
Q293: Details on MT_hot malloc()?

There are a number of malloc() implementations which scale better than the simple, globally locked version used in Solaris 2.5 and earlier.
A good reference is: http://www.ddj.com/articles/2001/0107/0107toc.htm

Some comments by the author:

If you quote them in the FAQ, make sure to make a note that these opinions are my personal ones, not my employer's.

As I tried to describe in my DDJ article, there is no best malloc for all cases. To get the best version, I advise the application developers to try different versions of mt-hot malloc with their specific app and typical usage patterns and then select the version working best for their case.

There are many mt-hot malloc implementations available now. Here are my comments about some of them.

* My mt-hot malloc as described in the DDJ article and the patent.
  It was developed first chronologically (as far as I know). It works well when the malloc-hungry threads mostly use their own memory. It also uses a few other assumptions described in my DDJ paper. The main malloc algorithm in my mt-hot malloc is the same binary search tree algorithm used in the default Solaris libc malloc(3C).

* mtmalloc(3t) introduced in Solaris 7.
  I can't comment on this version, other than to say that it's totally different from my mt-hot malloc implementation.

* Hoard malloc
  It's famous, but my test (described in the DDJ article) did not scale with Hoard malloc at all. It appears that their realloc() implementation is broken; at least it was in the version available at the time of my testing (spring 2001). I've heard reports from some Performance Computing people (who use Fortran 90 and no realloc()) that Hoard malloc has helped their scalability very well. Also, IMHO the Hoard malloc is too complicated, at least for the simple cases using the assumptions described in my DDJ article.

* ptmalloc (a part of GNU libc)
  I have not tested ptmalloc, so I can't comment on it.

* Smart Heap/SMP from MicroQuill
  My tests of Smart Heap/SMP were not successful.

-Greg Nakhimovsky

=================================TOP===============================
Q294: Bug in Bil's condWait()?
In my Java Threads book I talk about how you can create explicit mutexes and condition variables that behave like POSIX. I note that you'll probably never use these, but it's useful to think about how to build them and how they work.

Later on, I talk about the complexities of InterruptedException and how to handle it. It's a tricky little devil. One of the possible approaches I mention is to refuse to handle it at all, but rather catch it, then re-interrupt yourself when leaving your method, hence allowing another method to see it. A fine idea. And most of my code for this is correct.

Richard Carver (George Mason University in Fairfax, VA, (where I grew up!)) pointed out a cute little bug in one bit of my code. Here it is. I wrote condWait like this:

    public void condWait(Mutex mutex) {
        boolean interrupted = false;
        synchronized (this) {
            mutex.unlock();
            while (true) {
                try {
                    wait();
                    break;
                } catch (InterruptedException ie) {
                    interrupted = true;
                }
            }
        }
        mutex.lock();
        if (interrupted) Thread.currentThread().interrupt();
    }

which is BASICALLY correct. If you get interrupted, you set a flag, then go back and wait again. When you get signaled, you call interrupt() on yourself and all is well.

UNLESS... You get interrupted, you wait to get the synchronization lock AND just at that moment, someone calls condSignal() on the CV. Guess what! You're not on the CV's sleep queue anymore and you miss the signal! Bummer!

Of course not only is this unlikely to happen, it WON'T happen at all on JDK 1.1.7. But it could if JDK 1.1.7 had been written differently, and other JVMs are.

ANYWAY, if you followed all that (and if you find this interesting) here's the solution. The first interrupt will be treated as a spurious wakeup, but it won't repeat. (Unless I've missed something else!)
    public void condWait(Mutex mutex) {
        boolean interrupted = false;
        if (Thread.interrupted()) interrupted = true;
        synchronized (this) {
            mutex.unlock();
            try {
                wait();
            } catch (InterruptedException ie) {
                interrupted = true;
            }
        }
        mutex.lock();
        if (interrupted) Thread.currentThread().interrupt();
    }

--

=================================TOP===============================
Q295: Is STL considered thread safe??

This should probably be a FAQ entry. Here's the answer I gave 2 months ago to a similar question:

In general, the desired behavior is that it's up to you to make your explicit operations on containers, iterators, etc thread safe. This is good because you might have several containers all synchronized using the same locking construct, so if the implementation used individual locks underneath it would be both wrong and expensive. On the other hand, implicit operations on the containers should be thread safe since you can't control them. Typically these involve memory allocation. Some versions of the STL follow these guidelines.

Look at the design notes at http://www.sgi.com/Technology/STL/thread_safety.html

Jeff

---------

There is no such thing as *the* STL library. It is an abstract interface defined in the C++ standard. There is no mention of threads in the C++ standard, so the STL is not required to be thread safe. To find out whether your local implementation of the STL is thread safe, consult your compiler documentation.

For an STL implementation to be useful in a multithreaded programming environment, it simply has to ensure that accesses to distinct containers do not interfere. The application can ensure that if two or more threads want to access the same container, they use a lock. I believe that the SGI and recent versions of the Plauger STL (used by VC++) are safe in this way.

Hi Cheng,

I'm going to post yet another answer: the term 'Thread-safe' is usually a very difficult term to understand completely.
There is absolutely no way to guarantee that a given library/software package is 100% thread safe because it all depends on how you use it. An example of what I mean is shown below:

    class Point2D {
    public:
        ...
        void setX(double value) {
            lock.acquire();
            _x = value;
            lock.release();
        }
        void setY(double value) {
            lock.acquire();
            _y = value;
            lock.release();
        }
        double x() const {
            double tmp;
            lock.acquire();
            tmp = _x;
            lock.release();
            return tmp;
        }
        double y() const {
            double tmp;
            lock.acquire();
            tmp = _y;
            lock.release();
            return tmp;
        }
    private:
        mutable Mutex lock;
        double _x, _y;
    };

While the above code can be considered 'thread-safe' to a certain extent, it is possible for it to be used incorrectly. An example is if one thread wants to move the point (we'll call it 'pt' here):

    pt.setX(100.0);
    pt.setY(20.0);

The Point2D code guarantees that if another thread happens to look at pt's values, it will receive well defined values, and if another thread modifies the values, it will be blocked appropriately and the two threads will not clobber the pt object. BUT.... The above two lines do NOT guarantee that the update of the point is atomic, which in the case of the above example is more important than the Point2D being thread-safe.

We can change Point2D to have a set(double x, double y) and a get(double & x, double & y), but these are awkward and they make the Point2D aware of threads when it should not be aware of them at all.

Therefore, in my opinion, the best design to overcome all the above problems is to use a Point2D class that contains no locks, and to use an externally associated lock to guard the Point2D object. This way, Point2D is useful in all types of applications -- including non-threaded applications -- AND we have the ability to lock a set of operations on the object to make them appear atomic (transaction-like, if you will).
That being said, here is an example of how I would use the Point2D object (we'll use the same class declaration as above, minus the lock):

    class Point2D {
    public:
        ...
        void setX(double value) { _x = value; }
        void setY(double value) { _y = value; }
        double x() const { return _x; }
        double y() const { return _y; }
    private:
        double _x, _y;
    };

Now, for the usage:

    // This thread (Thread A) updates the object:
    ....
    ptLock.acquire();
    pt.setX(100.0);
    pt.setY(20.0);
    ptLock.release();
    ....

    // This thread (Thread B) reads the information:
    ...
    ptLock.acquire();
    if (pt.x() > 10.0)
        // Do something rather uninteresting...
    if (pt.y() < 10.0)
        // Do something else rather uninteresting...
    ptLock.release();
    ...

Now, both the lock and the Point2D object are shared between two threads and the above modification of the pt instance is seen as atomic -- there is no chance for a thread to view that x has been updated but y has not.

*PHEW*. All that being said, it may be clear now that when writing an implementation of the STL, it is a good idea to consider threading as little as possible. Usually, the only considerations that should be made are to ensure that all functions are reentrant and that threads working on different instances of containers will not clobber each other since they are not sharing any data -- this is usually achieved by making sure there are no static members for STL containers. Some poor implementations of the commonly used rb_tree use static members and a 'large' mutex that causes no end of problems (anywhere from link errors to much unnecessary overhead).

A good implementation of the STL should use 0 locks. Remember that the STL is a set of containers and algorithms. It was correctly left up to the user of the STL to implement locking so they can do it in the way they see fit for the problem they are solving.

BTW, SGI has an excellent implementation of the STL and they explain their design decisions on their STL page (you can find it on their site).
Hope this provides some insight,

Jason

=================================TOP===============================
Q296: To mutex or not to mutex an int global variable ??

Frank,

Nice *idea*, but no marbles. :-(

Lots of variations of this idea are proposed regularly, starting with Dekker's algorithm and going on and on. The problem is that you are assuming that writes from the CPU are arriving in main memory in order. They may not. On all high performance CPUs today (SPARC, Alpha, PA-RISC, MIPS, x86, etc.) out-of-order writes are allowed. Hence in your example below, it is POSSIBLE that only one of the values will be updated before the pointer is swapped. Bummer, eh?

-Bil

"Use the mutex, Luke!"

> To avoid locks you might try the following trick:
> Replace the two ints by a struct that contains total and fraction. The global
> variable would then be a pointer to the struct. To modify the variables, the writer
> would use a temporary struct, update the value(s) and then swap the pointers of the
> global pointer and the temporary global pointer. This works assuming that a pointer
> write is an atomic operation (which is the case in all architectures I know).

Hi,

I don't know if you are still the maintainer of the comp.programming.threads FAQ but I was reading Q63 trying to find a good way of using threads in C++ and the suggestions were really good, but seemed to be from a windows perspective. It took me a little while to translate what was there to something that I could use and it is pretty much the same thing but here it is just in case you wanted to include it in the FAQ:

    #include <pthread.h>

    class PThread {
    public:
        virtual ~PThread() {}
        // Start the thread after construction, so that the derived
        // class is fully built before entrypoint() can run.
        void start() { pthread_create(&thr, NULL, thread, this); }
        void join() { pthread_join(thr, NULL); }
        virtual void entrypoint() = 0;
    private:
        // pthread_create() expects void *(*)(void *), so take and
        // return void * rather than casting the function pointer.
        static void *thread(void *arg) {
            static_cast<PThread *>(arg)->entrypoint();
            return NULL;
        }
        pthread_t thr;
    };

--------------------------
David F.
Newman [email protected]
--------------------------
If you think C++ is not overly complicated, just what is a protected abstract virtual base pure virtual private destructor, and when was the last time you needed one? -- Tom Cargill, C++ Journal.

=================================TOP===============================
Q297: Stack overflow problem ?

BL> Yes, as far as I know EVERY OS has a guard page. [...]
BL> Be aware that if you have a stack frame which is larger than a
BL> page (typically 8k), it is POSSIBLE to jump right over the guard
BL> page and not see a SEGV right away.

KK> The solution to that is to initialize your locals before calling
KK> lower-level functions. [...]

This is an inadequate solution for the simple reason that one isn't guaranteed that variables with automatic storage duration are in any particular order on the stack. The initialisations themselves could cause out of order references to areas beyond the guard page, depending on how the compiler chooses to order the storage for the variables.

The correct solution is to use a compiler that generates code to perform stack probes in function prologues. For example:

All of the commercial C/C++ compilers for 32-bit OS/2 except for Borland's (i.e. Watcom's, IBM's, and MetaWare's) generate code that does stack probes, since 32-bit OS/2 uses guard pages and a commit-on-demand stack. In the function prologue, the generated code will probe the stack frame area at intervals of 4KiB starting from the top, in order to ensure that the guard pages are faulted in in the correct order. (Such code is only generated for functions where the automatic storage requirements exceed 4KiB, of course.)

Microsoft Visual C/C++ for Win32 generates stack probe code too. This is controlled with the /Ge and /Gs options.
As far as GCC is concerned, I know that EMX C/C++ for OS/2, the original GCC port for OS/2, had the -mprobe option, although I gather that in later ports such as PGCC this has been superseded by the -mstack-arg-probe option. (See http://goof.com/pcg/os2/differences.html .) Whether there is the same or an equivalent option in the GCC ports to other platforms I don't know.

=================================TOP===============================
Q298: How would you allow the other threads to continue using a "forgotten" lock?

Bhavin Shah wrote:
>
> Hi,
>
> Sorry if this is a faq, but I haven't found a solution yet:
> In a multi-threaded app, say you have a thread acquire a lock,
> update global variables, then release the lock. What if for
> some reason (granted that updating a couple variables shouldn't
> take much time), the thread crashes before releasing the lock?
> I don't see a way to set a timeout on acquiring locks. How would
> you allow the other threads to continue using that same lock?

You wouldn't. What's more, you shouldn't! The dead thread holding the lock may have left the lock-protected data structures in an inconsistent or incorrect state, in effect planting a poison pill for any other thread which might attempt to use them later on. "Overriding" a lock is the same as "ignoring" a lock -- you know that the latter is dangerous, so you should also understand that the former is equally dangerous.

There's another peculiar notion in your question, too: that of "a thread crashing." The fact that Thread A takes a bus error or something does *not* mean that Thread A was "at fault," and does *not* mean that Threads B-Z are "healthy." If any thread at all gets into trouble, you should start with the supposition that the entire program is in trouble; you shouldn't think in terms of getting rid of the "offending" thread and trying to carry on as usual.
Poor but possibly helpful analogy: if a single-threaded program crashes in Function F(), would it be appropriate to replace F() with a no-op and keep on going?

=================================TOP===============================
Q299: How unfair are mutexes allowed to be?

> Hi
>
> If several threads are trying to gain access through a mutex,
> how unfair are mutexes allowed to be? Is there any requirement
> for fairness at all (ie that no threads will be left unluckily starving
> while others get access).
>
> The code needs to work on many posix platforms, are any of them
> unfair enough to merit building an explicit queueing construct
> like ACE's Token?
>
> thanks,
>
> Jeff

Assume the worst and you'll be safe (and probably correct). If your program looks like this:

    T1
    while (true) {
        lock()
        i++
        unlock()
    }

    T2
    while (true) {
        do stuff for 10ms()
        lock()
        if (i == N) do other stuff()
        unlock()
    }

You can be fairly certain T2 ain't never gonna get that lock more than once every 1,000+ iterations. But this is pretty fake code and T1 shouldn't look like that. T1 should also "do stuff for Xms" outside of the critical section, in which case you're safe.

Think about it: The only time there's a problem is when a thread keeps the mutex for all but a tiny fraction of the time (like T1). And that would be odd.

IMHO

-Bil

=================================TOP===============================
Q300: Additionally, what is the difference between -lpthread and -pthread?

On 01 Aug 2000 15:01:23 -0500, Aseem Asthana wrote:
>
>
>Hi,
>
>>That is incorrect. With modern gcc, all you need is the -pthread option.
>>If your gcc doesn't recognize that, then you need:
>
>> gcc -o hello -D_REENTRANT hello.c -lpthread
>
>>Without the _REENTRANT, certain library features may work incorrectly
>>under threads.
>
>the -D_REENTRANT option, is it to link your programs with the thread safe
>standard libraries, or something else?
No, the -D_REENTRANT option makes the preprocessor symbol _REENTRANT known to your system headers. Some code in the system headers may behave differently when _REENTRANT is defined. For example, quite typically, the errno macro has a different definition in threaded programs in order to give each thread its own private errno. Without _REENTRANT, accesses to errno might go to one global one, resulting in race conditions.

>Additionally, what is the difference between -lpthread and -pthread?

-lpthread is the directive to link in the library libpthread.a or a shared version thereof.

-pthread is a special command line option supported by modern gcc, and some other compilers as well, which sets up all the correct options for multithreaded building, like -D_REENTRANT and -lpthread.

Subject: Re: problems with usleep()

Kauser Ali Karim wrote:
>
> Hi,
>
> I'm using the usleep() function to delay threads and I get a message:
>
> Alarm clock
>
> after which my program to exits prematurely since the p_thread_join that I
> call at the end is not executed.
>

usleep requests that the process be woken up after a time. On some systems this wakeup is delivered as a SIGALRM, which is causing your program to exit. Use nanosleep; it is preferred for threaded apps.

Ian

=================================TOP===============================
Q301: Handling C++ exceptions in a multithreaded environment?

On Fri, 4 Aug 2000 09:52:50 -0400, Bruce T wrote:
>Hello,
>
>I am writing code in C++ in a multithreaded system. Can anyone point me to
>any
>links or articles on any special strategies/concerns or examples of handling
>C++ exceptions in a multithreaded environment.

Assuming that your compiler supports thread-safe exception handling, you can simply use the language feature as you normally would. There aren't really any special concerns. (You wouldn't worry whether function calls, returns or gotos are a problem under threading, right? So why fuss over exceptions?)
Avoid common misconceptions, like wanting to throw an exception from one thread to another. This simply isn't a valid concept, since an exception is a branching mechanism. Branching occurs within the thread of control, by definition: it's a change in one thread's instruction pointer, so to speak.

(That's not strictly true: function calls *can* take place between threads, processes or machines through remote procedure calling. An analogous mechanism can be devised to pass exceptions; e.g. you catch the exception on the RPC server side, package it up, send a reply message to the client, which unpacks it and rethrows.)

=================================TOP===============================
Q302: Pthreads on IRIX 6.4 question?

[email protected] wrote:
>
> Hello,
>
> I am having problems with Pthreads on IRIX 6.4. I have two
> threads: the initial thread plus one that has been pthread_created.
> The pthread_created pthread does an ioctl and sits in a driver waiting
> for an
> event. While this is happening the "initial" thread should be eligible
> to run, but it is put to sleep, i.e. doesn't run. Why? On IRIX, what
> kind of LWP notion
> is there?

Default scheduling scope is process on IRIX 6.4, and the number of execution vehicles is determined by the pthread library -- typically, you'll start with one execution vehicle unless the library detects all your threads can run in parallel and consume CPU resources. But the latest pthread patches for IRIX 6.4 would, in my experience, create an extra execution vehicle on the fly in the case you describe, so I'd certainly recommend you to get the *LATEST* set of POSIX recommended patches.

You can, in 6.4, use pthread_setconcurrency to give hints as to how many kernel execution vehicles you want.
You can also run with system scope threads using pthread_attr_setscope (giving you one kernel execution vehicle per thread), but on IRIX this requires CAP_SCHED_MGT capabilities, as process scope threads in IRIX can schedule themselves at higher priorities than some kernel threads (see man capabilities).

In 6.5.8, you have PTHREAD_SCOPE_BOUND_NP (incorrectly referred to as PTHREAD_SCOPE_BOUND in the headers) scope, which gives you what Solaris and Linux system scope threads are -- one execution vehicle per thread, but no extra scheduling capabilities (hence no need to do fancy stuff with capabilities to run this as non-root user); blocking in one thread is guaranteed not to interfere with immediate availability of kernel execution vehicles for other threads.

Frank Gerlach wrote:
>
> I also had the problem of pseudo-parallelity on Solaris 2.6. Only after
> calling pthread_attr_setscope() the threads would *really* execute in
> parallel. Maybe that helps with your problem..
>
> pthread_attr_init(&attr);
> pthread_attr_setscope(&attr,PTHREAD_SCOPE_SYSTEM);
> pthread_attr_setschedpolicy(&attr,SCHED_OTHER);
> int retv=pthread_create(&tids[i],&attr,threadfunc,&ta[i]);
>

As I also said, on IRIX, the closest scope to this one is PTHREAD_SCOPE_BOUND (actually, ...BOUND_NP, but the header is wrong in 6.5.8). PTHREAD_SCOPE_SYSTEM threads can do much more wrt scheduling in IRIX, and as a result require CAP_SCHED_MGT capabilities.

*Latest* pthread patch sets, though, usually don't have too many problems with making extra kernel execution vehicles to avoid deadlocks (for normal process scope threads) -- in the words of the man page for pthread_setconcurrency:

Conversely the library will not permit changes to the concurrency level to create starvation. Should the application set the concurrency level to n and then cause n threads to block in the kernel the library will activate additional execution vehicles as needed to enable other threads to run.
In this case the concurrency level is temporarily raised and will eventually return to the requested level.

Earlier flavours of the pthread library may have more problems to actually guess whether extra execution vehicles are needed.

=================================TOP===============================
Q303: Threading library design question ?

Some people say semaphores are the concurrent-processing equivalent of GOTO.... Still, semaphores are very useful and sometimes even indispensable. (IMO goto is sometimes also a good construct, e.g. in state machines.)

A useful construct might be ReaderBlock and WriterBlock classes, which take a ReadWriteMutex as a constructor argument and can be used similarly to the synchronized construct of Java. Those classes lock the mutex in the constructor and unlock it in their destructor, avoiding explicit unlocking AND making exception handling easy. The latter is especially important, as I cannot think of an elegant way to unlock a mutex in case an exception is thrown, which will be handled in a calling method.

In general, one could provide a list of synchronization constructs in ascending order of complexity/danger for novice users. The Reader/WriterBlock is quite harmless, even if you do not think about its consequences. Still, you can easily deadlock your program by using two mutexes and acquiring them in opposite order.

My feeling is that concurrent programming contains inherent complexity, which cannot be eliminated.

As a final input, automatic deadlock detection in debug mode would be a simple, but great feature for both C++ libs and Java VMs (unfortunately SUN does not provide this in their VMs).

Beman Dawes wrote:
> There is discussion on the boost mailing list (www.boost.org) of design
> issues for a possible C++ threading library suitable for eventual
> inclusion in the C++ standard library.
>
> Some suggest starting the design with very low-level primitives, and
> then using these to build higher-level features.
> But like a goto
> statement in programming languages, some low-level features can be
> error-prone and so should not always be exposed to users even if present
> in the underlying implementation.
>
> So here is a question where comp.programming.threads readers probably
> have valuable insights:
>
> What features should be excluded from a threading library because they
> are known to be error-prone or otherwise dangerous? What are the
> threading equivalents of goto statements?
>
> --Beman Dawes

=================================TOP===============================
Q304: Lock Free Queues?

On Thu, 10 Aug 2000 03:55:25 GMT, J Wendel wrote:
>
>
>
>I wonder if any of you smart quys would care to enlighten me
>about "lock free" algorithms. I've found several papers on
>the Web, but to be honest, I'm having a little trouble
>following the logic.

Lock free algorithms do actually rely on atomic instructions provided by the hardware. So they are not exactly lock free. For example, a lock-free queue can be implemented using an atomic compare-swap instruction to do the pointer swizzling.

The idea is that the hardware provides you with a miniature critical region in the form of a special instruction which allows you to examine a memory location, compare it to a value that you supply, and then store a new value if the comparison matches. The instruction produces a result which tells you whether or not the store took place. The instruction cannot be interrupted, and special hardware takes care that the memory can't be accessed by other processors.

Here is an illustration. Suppose you want to push a new node onto the lock-free list. How do you do that? Well, you set your new node's next pointer to point to the current head node. Then you use the compare-swap to switch the head node to point to your new node! If it succeeds, you are done. If it fails, it means that someone else succeeded in pushing or popping before you were able to execute the instruction.
So you must simply loop around and try again. The purpose of the comparison is simply to test whether the head node still has the original value. Pseudo code:

    node *head_copy;

    do {
        head_copy = head;
        newnode->next = head_copy;
    } while (!compare_and_swap(&head, head_copy, newnode));

The compare_and_swap simply behaves like this, except that it's implicitly atomic:

    int compare_and_swap(node **location, node *compare, node *newval)
    {
        /* lock whole system */
        if (*location == compare) {
            *location = newval;
            /* unlock whole system */
            return 1;
        }
        /* unlock whole system */
        return 0;
    }

>Can someone explain why "lock free" algorithms don't seem to
>be in widespread use? I've got a work queue based server
>that would benefit from less locking overhead.

They are probably in more widespread use than you might suspect. However, there is no portable, standard interface for constructing these things. They rely on support from the hardware which is not found on all architectures! These kinds of techniques are more in the domain of the developers of operating systems and system interface libraries who can use them to construct the higher level synchronization primitives.

You might find these algorithms used in the implementation of mutexes and other kinds of objects.

=================================TOP===============================
Q305: Threading library design question ?

[ OK, so I'm reading a little behind... ]

In article <[email protected]>, Beman Dawes wrote:
>What features should be excluded from a threading library because they
>are known to be error-prone or otherwise dangerous? What are the
>threading equivalents of goto statements?

I think the most commonly asked-for feature that shouldn't be in a thread library is suspend/resume. It's amazing how many people believe they want to arbitrarily suspend another thread at some random place in its execution.
Yes, there are things you can do with suspend/resume that are Very Difficult without, but it's one of those places where the bugs that can be introduced are very subtle.

For a quick example, suppose I suspend a thread that's inside a library (say, stdio) and has a mutex locked (say, stdout or stderr). Now nobody can say anything without blocking on the mutex, and the mutex won't come back until the thread is resumed. Permute the above with a few dozen thread-aware libraries, and you get into *serious* trouble.

The other thing that I hear a bunch (this may be unique to the embedded real-time market that I play in) is to disable context switching from user space. It implodes instantly in the face of multi-processor systems, and effectively elevates the calling thread to ultimate priority.

But those are just my favorites.

--
Steve Watt KD6GGD PP-ASEL-IA ICBM: 121W 56' 57.8" / 37N 20' 14.9"
Internet: steve @ Watt.COM Whois: SW32
Free time? There's no such thing. It just comes in varying prices...

=================================TOP===============================
Q306: Stack size/overflow using threads ?

In article <[email protected]>, Jason Jesso wrote:
% -=-=-=-=-=-
% I just began writing a threaded program using pthreads on AIX 4.3 in C.
%
% In a particular thread I create two 60K arrays as local variables.
% My program crashes in irregular places within this thread
% and I do believe in the "Principal of Proximity".
%
% My hunch is stack corruption, since when I place "any" one of these two
% arrays as global my program runs fine.
%
% Could it be possible that I am overflowing the stack space for this
% thread?

Yes. Threads typically have fixed-size stacks. If thread A has a stack that starts at 0x400000 and thread B has a stack that starts at 0x500000, then A's stack can't be any bigger than 0x100000, or else it would overwrite B's. POSIX doesn't specify a default stack size, and it varies from system to system.
You can set the stack size by calling pthread_attr_setstacksize. You can find out the default stack size by calling pthread_attr_getstacksize on a freshly initialised attr structure.

From memory, AIX gives about 90k of stack by default, so you probably need to knock it up a bit. Other systems have different limits. Solaris gives 1M, HP-UX 64k, TRU64 (sic) Unix gives ~20k, Linux gives 1M, and FreeBSD gives 64k (again, this is working from memory, so don't rely on it).

-- Patrick TJ McPhee East York Canada

Patrick TJ McPhee wrote:
> In article <[email protected]>, Jason Jesso wrote:
> % -=-=-=-=-=-
>
> % I just began writing a threaded program using pthreads on AIX 4.3 in C.
> %
> % In a particular thread I create two 60K arrays as local variables.
> % My program crashes in irregular places within this thread
> % and I do believe in the "Principal of Proximity".
> %
> % My hunch is stack corruption, since when I place "any" one of these two
> % arrays as global my program runs fine.
> %
> % Could it be possible that I am overflowing the stack space for this
> % thread?
>
> Yes. Threads typically have fixed-size stacks. If thread A has a stack
> that starts at 0x400000 and thread B has a stack that starts at 0x500000,
> then A's stack can't be any bigger than 0x100000, or else it would
> overwrite B's. POSIX doesn't specify a default stack size, and it varies
> from system to system. You can set the stack size by calling
> pthread_attr_setstacksize. You can find out the default stack size by
> calling pthread_attr_getstacksize on a freshly initialised attr structure.

The pthread_attr_getstacksize() is a nice trick, but it's not portable. Unfortunately, while you're correct that POSIX doesn't specify a default stack size, you're underestimating the true extent of the lack of specification. Far beyond not specifying a default size, it doesn't even specify what the default size MEANS.
That is, POSIX never says that the default value of the stacksize attribute is the number of bytes of stack that will be allocated to a thread created using the default attributes. It says that there IS a default, and that, if you ask, you'll get back a size_t integer. Because you're not allowed to set any value smaller than PTHREAD_STACK_MIN, one can play tricks. Solaris, for example, has a default value for the stacksize attribute of "0". But that doesn't mean 0 bytes, it means "default". Nobody can actually ask for 0 bytes, so there's no ambiguity when pthread_create() is called. One can call this "creative" or "devious", but it's perfectly legal. > From memory, AIX gives about 90k of stack by default, so you probably need > to knock it up a bit. Other systems have different limits. Solaris gives > 1M, HP-UX 64k, TRU64 (sic) Unix gives ~20k, Linux gives 1M, and FreeBSD > gives 64k (again, this is working from memory, so don't rely on it). Tru64 UNIX V5.0 and later gives 5Mb by default. Earlier versions, stuck without kernel support for uncommitted memory, were forced to compromise with far smaller defaults to avoid throwing away bushels of swap space. [It was actually more like 24Kb, I think, (which was actually just fine for the vast majority of threads), but that's hardly relevant.] /------------------[ [email protected] ]------------------\ | Compaq Computer Corporation http://members.aol.com/drbutenhof | | 110 Spit Brook Rd ZKO2-3/Q18, Nashua NH 03062-2698 | \--------[ http://www.awl.com/cseng/titles/0-201-63392-2/ ]-------/ [email protected] wrote: > Try adjusting the stack size for the thread: > > static pthread_attr_t thread_stack_size; > > pthread_attr_init (&thread_stack_size); > pthread_attr_setstacksize (&thread_stack_size, (size_t)81920); I recommend NEVER using an absolute size for the stack. It's not portable, it's not upwards compatible. 
It's just a number that really means practically nothing -- and even less except on the exact software configuration you used to measure. (And even then only as good as the accuracy and thoroughness of your measurements... and measuring runtime stack depth is not easy unless your program has only one straightline code path.) Of course, you may not have much choice... > pthread_create (&thread, &thread_stack_size, thread_func, 0); > > You can check the default size with: > static pthread_attr_t thread_stack_size; > > pthread_attr_init (&thread_stack_size); > pthread_attr_getstacksize (&thread_stack_size, &ssize); > > I ran into this problem on Digital when I had a thread call a deeply nested > function. All auto-variables will be allocated from the thread's stack, and > so it is a good idea to know how much memory your thread function will consume > beforehand. Hope this helps. That works fine on Tru64 UNIX (or the older Digital UNIX and DEC OSF/1 releases), but it's not portable, or "strictly conforming" POSIX. There's no definition of what the default value of stacksize means. (An annoying loophole, but some implementations have exploited it fully.) In fact, on Tru64 UNIX, or on any implementation where you can get the default stack size from pthread_attr_getstacksize, I recommend that you make any adjustments (if you really need to make adjustments) based on that value. Not quite big enough? Double it. Triple it. Square it. Whatever. If the thread library suddenly starts using an extra page at the base of each stack, the default stack size will probably be increased to keep pace -- your arbitrary hardcoded constant won't change, and you'll be in trouble. Furthermore, on Tru64 UNIX 5.0 and later, you'll be doing yourself a disservice by setting a stack size. The default is 5Mb... and if THAT's not enough for you, you need to have your algorithms examined. Solaris (and I believe Linux) use 1Mb, which ought to be sufficient for most needs. 
In general, be really, really careful about adjusting stack size. If you need to increase any thread from the default, you should consider making it "as big as you can stand". Recompilation of ANY code (yours or something in a system library you use) could expand the size of your call stack any time you change the configuration. (Installing a patch, for example.) Runtime timing variations could also affect the depth of your call stack. Cutting it too close is a recipe for disaster... now or, more likely, (because it's more "fun" for the computer that way), sometime later. /------------------[ [email protected] ]------------------\ | Compaq Computer Corporation http://members.aol.com/drbutenhof | | 110 Spit Brook Rd ZKO2-3/Q18, Nashua NH 03062-2698 | \--------[ http://www.awl.com/cseng/titles/0-201-63392-2/ ]-------/ =================================TOP=============================== Q307: correct pthread termination? sr wrote: > I'm writing a multithreaded program under AIX 4.3.2, and noticed that > all threads, whether or not created in detached state, once terminated > correctly via a pthread_exit() call (after having freed any own > resources), are still displayed by the ps -efml command until the main > process terminates. There's no requirement in POSIX or UNIX 98 that resources be freed at any particular time. There's no way to force resources to be freed. On the contrary, POSIX only places a requirement on you, the programmer, to release your references to the resources (by detaching the thread) so that the implementation is ABLE to free the resources (at some unspecified and unbounded future time). > Is this normal, or there are still resources allocated I'm not aware > of? Many implementations cache terminated threads so that it can create new threads more quickly. A more useful test than the program you show would be, after the "first round" of threads have terminated, to create a new round of threads. 
Does AIX create yet more kernel threads, or does it reuse the previously terminated threads? > In the latter case, how do I make sure that a thread, when > (gracefully) terminated, gets completely freed? Why would you care? The answer is, there's no way to do this. More importantly, it should make absolutely no difference to your application (and very little difference to the system) unless AIX is failing to reuse those terminated threads. (I would also expect that unused cached threads would eventually time out, but there's no rule that they must.) In general, my advice would be "don't worry about things you don't need to worry about". If you're really sure you do need to worry, please explain why. What you have described is just "a behavior"; not "a problem". If you're sure that behavior represents a problem for you, you'll need to explain why it's a problem. (And while we're all curious out here, you might keep in mind that it'll do you more good to explain the problem to IBM support channels.) Oh, and just a few comments about your program: While it's probably "OK" for your limited purposes, I can't look at a threaded program where main() does a sleep() [or any kind of a timed wait] and then calls exit() without cringing. If you don't need main() to hang around for some real purpose, then it should terminate with pthread_exit(). If you do need it to hang around for some reason, a timed wait is nearly always the wrong way to make it hang around. Secondly, while I understand the desire to provide thread identification in your printout, you should be aware that this "(unsigned long)pthread_self()" construct is bad practice, and definitely unportable. The pthread_t type is opaque. While many implementations make this either a small integer or a pointer, it could as easily be a structure. Unfortunately, POSIX lacks any mechanism to portably identify individual threads to humans (that's "debugging", which is out of scope). 
I'm not saying "don't do it"; I just want to make sure you know it's a platform-dependent hack, not a legal or portable POSIX construct.

[[ This is another re-post of a response that was lost on the bad news server. ]]

/------------------[ [email protected] ]------------------\
| Compaq Computer Corporation http://members.aol.com/drbutenhof |
| 110 Spit Brook Rd ZKO2-3/Q18, Nashua NH 03062-2698 |
\--------[ http://www.awl.com/cseng/titles/0-201-63392-2/ ]-------/

=================================TOP===============================
Q308: volatile guarantees??

On Wed, 30 Aug 2000 11:28:46 -0700, David Schwartz wrote:
>
>Joerg Faschingbauer wrote:
>>
>> [email protected] (Kaz Kylheku) writes:
>>
>> > Under POSIX threads, you don't need volatile so long as you use the
>> > locking mechanism supplied by the interface.
>>
>> How does pthread_mutex_(un)lock manage to get the registers flushed?
>
> Who cares, it just does. POSIX requires it.

Everyone is saying that, but I've never seen a chapter and verse quote. I'm not saying that I don't believe it or that it's not existing practice; but just that maybe it's not adequately codified in the document.

To answer the question: how can it manage to get the registers flushed? Whether or not the requirement is codified in the standard, it can be met in a number of ways. An easy way to meet the requirement is to spill registers at each external function call. Barring that, the pthread_mutex_lock functions could be specially recognized by the compiler. They could be, for instance, implemented as inline functions that contain special compiler directives which tell the compiler to avoid caching. The GNU compiler has such a directive, for instance:

    __asm__ __volatile__ ("" : : : "memory");

The "memory" part takes care of defeating caching, and the __volatile__ prevents code motion of the inlined code itself.
Of course, GCC doesn't need this in the context we are discussing, because it will do ``the right thing'' with external function calls. I've only used the above as a workaround to GCC optimization bugs. It can also be used as the basis for inserting a memory barrier instruction: #define mb() __asm__ __volatile__ \ (" " : : : "memory"); It's a good idea to do it like this so that the compiler's optimizations do not make the memory barrier useless, by squirreling away data in registers or moving the instruction around in the generated code. This would be typically used in the implementation of a mutex function, not in its interface, to ensure that internal accesses to the mutex object itself are conducted properly. ------ Hi, I'm having trouble with the meaning of C/C++ keyword volatile. I know you declare a variable volatile wherever it may be changed externally to the flow of logic that the compiler is processing and optimising. This makes the compiler read from the ultimate reserved storage when it is accessed (or so I believed). I have seen a discussion in one of the comp.lang.c* groups where it is suggested that the compiler does not always have to avoid optimising away memory accesses. This seems logical - since a thread which alters the value of a variable might not get scheduled, the value of the variable may not change for some time (many times round a busy loop), so the compiler can use a cached value for many loops without changing the guarantees made by the machine abstraction defined in the standards (*good* for performance). That then renders volatile practically undefinable (since a thread may legally *never* be scheduled) and when it is for hardware changing a flag, the hardware doesn't necessarily change memory (a CPU register may be changed). Volatile behaviour seems best implemented with a function call which uses some guaranteed behaviour internally (in assembler or other language). 
Has anyone hashed this out before and come to any conclusion whether to trust volatile or spread to other languages? because it's doing my head in (I ask this here because it concerns concurrent programming and the people here probably have the experience of this problem). -- Tristan Wibberley In article , Kaz Kylheku wrote: >Under preemptive threading, the execution can be suspended *at any point* to >invoke the scheduler; pthread_mutex_lock is not special in that regard. Yet >compilers clearly do not treat each instruction with the same suspicion that >pthread_mutex_lock deserves. pthread_mutex_lock() is special. While threads may be pre-empted at any point, they are not permitted to access shared data, so the order in which the operations are performed is irrelevant. By calling pthread_mutex_lock() a thread gains permission to access shared data, so at that point the thread needs to update any local copies of that data. Similarly, by calling pthread_mutex_unlock() a thread relinquishes this permission, so it must have updated the shared data from any local copies. Between these two calls, it is the only thread which is permitted to access the shared data, so it can safely cache as it likes. >How about the following paragraph? > > The values of all objects shall be made stable immediately > prior to the call to pthread_mutex_lock, pthread_mutex_unlock, > pthread_mutex_trylock and pthread_mutex_timedlock. The > first abstract access to any object after a call to one of the > locking functions shall be an actual access; any cached copy of > an object that is accessed shall be invalidated. That is unnecessarily restrictive. 
Suppose we have a buffering scheme which uses this code:

    for (;;) {
        lock
        while (!items_ready)
            wait(data)
        if (!--items_ready)
            signal(space)
        unlock
        use buffer[tail]
        tail = (tail + 1) % BUFLEN
    }

Analysis of the other uses of 'items_ready' may indicate that it can be optimised into:

    local_ready = 0
    lastbatch = 0
    for (;;) {
        if (!local_ready) {
            lock
            items_ready -= lastbatch
            if (!items_ready)
                signal(space)
            while (!items_ready)
                wait(data)
            local_ready = lastbatch = items_ready
            unlock
        } else
            local_ready--
        use buffer[tail]
        tail = (tail + 1) % BUFLEN
    }

Of course the analysis required to determine that this is a valid optimisation is not simple, and I would not expect to find it in current compilers, but I don't think the standard should prohibit it.

Kaz Kylheku wrote:
> The standard is flawed because it doesn't mention that calls to
> pthread_mutex_lock and pthread_mutex_unlock must be treated specially.
> We all know how we want POSIX mutexes to work, and how they do work in
> practice, but it should also be codified in the standard, even though
> it may be painfully obvious.

The standard requires memory coherency between threads based on the POSIX synchronization operations. It does NOT specifically dictate the compiler or system behavior necessary to achieve that coherency, because it has no power over the C language nor over the hardware. Besides, it really doesn't matter how the requirements are achieved, nor by whom.

An implementation (thread library, compiler, linker, OS, hardware, etc.) that doesn't make memory behave correctly with respect to POSIX synchronization operations simply does not conform to POSIX. This means, in particular, (because POSIX does not require use of volatile), that any system that doesn't work without volatile is not POSIX. Can such a system be built? Certainly; but it's not POSIX. (It's also not particularly usable, which may be even more important to some people.)

OK, you want chapter and verse? Sure, here we go.
POSIX 1003.1-1996, page 32:

2.3.8 memory synchronization: Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads. The following functions synchronize memory with respect to other threads:

    fork()                   pthread_mutex_unlock()     sem_post()
    pthread_create()         pthread_cond_wait()        sem_trywait()
    pthread_join()           pthread_cond_timedwait()   sem_wait()
    pthread_mutex_lock()     pthread_cond_signal()      wait()
    pthread_mutex_trylock()  pthread_cond_broadcast()   waitpid()

In other words, the application is responsible for relying only on explicit memory synchronization based on the listed POSIX functions. The implementation is responsible for ensuring that correct code will see synchronized memory. "Whatever it takes."

Normally, the compiler doesn't need to do anything it wouldn't normally do for a routine call to achieve this. A particularly aggressive global optimizer, or an implementation that "inlines" mutex operations, might need additional compiler support to meet the requirements, but that's all beyond the scope of the standard. The requirements must be met, and if they are, application and library developers who use threads just don't need to worry.

Unless of course you choose to try to create your own memory synchronization without using the POSIX functions, in which case no current standard will help you and you're entirely on your own on each platform.
/------------------[ [email protected] ]------------------\ | Compaq Computer Corporation http://members.aol.com/drbutenhof | | 110 Spit Brook Rd ZKO2-3/Q18, Nashua NH 03062-2698 | \--------[ http://www.awl.com/cseng/titles/0-201-63392-2/ ]-------/ =================================TOP=============================== Q309: passing messages, newbie? [email protected] wrote: > Hi, > > Thank you all for the info. i think i am > on my way, things are working now. > > One more thing, what is the correct way to > make a thread sleep/delay. > The "sleep()" call i think causes the whole > process to sleep. No, it cannot. At least, not in any legal implementation of POSIX threads. (Nor, in my opinion, in any rational implementation of any usable thread interface.) In practice, this happens in some "cheap" (by which I do not mean "inexpensive") pure user-mode threading libraries. These used to be common and widely used. There are now real thread packages available "just about everywhere", and you should run from any implementation with this sort of "quirk". In any implementation that uses "multiple kernel execution entities", the buggy behavior would actually be difficult to achieve, and nearly impossible to get by accident. Under Linux, for example, threads are really independent Linux processes. Just try "accidentally" getting another process to block when you sleep (or read from a file). > I saw in a paper a call "pthread_delay_np()" > call to delay a thread, but i couldnt find the > call in man pages on my Linux 2.2.12 with glibc-2.1.12. You won't find it on any "pure" implementation of POSIX threads, because that function doesn't exist. It's from the ancient and obsolete DCE threads package (which was a cheap user-mode implementation of a long since defunct draft of the document that eventually became POSIX threads). 
Because we couldn't count on having sleep() work, and instead of using somewhat less portable means to supersede the real sleep() with something that would work, we introduced pthread_delay_np(). (Which is modelled after nanosleep().) The function is retained in our current implementation of POSIX threads (on Tru64 UNIX and OpenVMS) as an extension, partly for "cultural compatibility" to help people upgrading from DCE threads and partly because, on OpenVMS, there are still compilation/link modes where we can't count on sleep() working correctly.

> So we have to delay a thread using sleep() only.

This is correct. Or usleep(), nanosleep(), select(), or whatever is appropriate.

/------------------[ [email protected] ]------------------\
| Compaq Computer Corporation http://members.aol.com/drbutenhof |
| 110 Spit Brook Rd ZKO2-3/Q18, Nashua NH 03062-2698 |
\--------[ http://www.awl.com/cseng/titles/0-201-63392-2/ ]-------/

=================================TOP===============================
Q310: solaris mutexes?

Roy Gordon wrote:
> Is the following true: Solaris mutexes only exist in the user address
> space (including any shared memory space); they have no associated
> kernel data structure.

True. For non-shared-memory mutexes. Ditto CVs & unnamed semaphores.

>
> If true, this would be opposed to system V semaphores.

Yup.

>
>
> Also, if true, then a given mutex could be moved to a different
> address (suitably aligned) and as long as all threads (or processes, as
> the case may be) reference it at that address, then it would continue
> functioning as if it hadn't been moved.
>
> Is this correct too (if the initial assumption is correct, that is)?

You mean like a compacting garbage collector? Yeah. 'Matter of fact, I believe that's what Java does on some platforms.

-Bil

=================================TOP===============================
Q311: Spin locks?
I think it worth noting that spin locks are an efficiency hack for SMP machines which are useful under a small number of situations. Moreover, there is nothing that prevents you from using spin locks all the time (other than a slight loss of efficiency).

In particular, in some libraries ALL locks are actually spin locks. Solaris 2.6 (or is that 7?) and above for example. If you call pthread_mutex_lock() on an MP machine & the lock is held by a thread currently running on another CPU, you WILL spin for a little while.

It is very unlikely you would EVER want to build a spin lock yourself. (I mean, it would be kinda FUN and interesting, but not practical.) If you *really* want to, go ahead, just time your program carefully. $10 says you'll find home-spun spin locks won't help.

>
> BTW, SMP is a bad design that scales poorly. I wish someone could come
> up with a better design with some local memory & some shared memory
> instead.

Like democracy. It sucks, but we haven't come up with anything better :-)

-Bil

> > BTW, SMP is a bad design that scales poorly. I wish someone could come
> > up with a better design with some local memory & some shared memory
> > instead.
>
> Like democracy. It sucks, but we haven't come up with anything better :-)

You would like SGI's high-end monster-machines... Ours consists of eight "node boards," each of which has two CPUs and a local memory pool (512MB or so). All the memory in the machine is visible to all processors, but IRIX intelligently migrates individual pages towards the processors that are hitting them most. It's like another level of cache... As long as each thread/process stays within a modest, unshared working set, the system scales very well.

Dan
-- Eppur si muove

=================================TOP===============================
Q312: AIX pthread pool problems?
Kaz Kylheku wrote:
> On Thu, 31 Aug 2000 17:52:08 GMT, sr wrote:
> >/* lck.c */
> >/* AIX: xlc_r7 lck.c -qalign=packed -o lck */
> >/* LINUX: cc lck.c -lpthread -fpack-struct -o lck */
>
> Doh, have you read the GNU info page for gcc? Here is what it
> says about -fpack-struct:
>
> `-fpack-struct'
> Pack all structure members together without holes. Usually you
> would not want to use this option, since it makes the code
> suboptimal, and the offsets of structure members won't agree with
> system libraries.

Kaz is quite correct, but maybe not quite firm enough...

NEVER, ever, under any circumstances, "pack" any structure that you didn't define. If a header wants its structures packed, it'll do it itself. If it doesn't ask, don't presume to tell it what it should do.

You asked the compiler to break your mutexes, and it let you, because it didn't know any better. Now you do know better, so stop asking it. ;-)

/------------------[ [email protected] ]------------------\
| Compaq Computer Corporation http://members.aol.com/drbutenhof |
| 110 Spit Brook Rd ZKO2-3/Q18, Nashua NH 03062-2698 |
\--------[ http://www.awl.com/cseng/titles/0-201-63392-2/ ]-------/

=================================TOP===============================
Q313: iostream library and multithreaded programs ?

Bil,

I took a class that you taught at Xilinx. I have written some small multithreaded programs. These programs work fine if I don't use the iostream library. If I use iostream, these programs hang with the following message:

    libc internal error: _rmutex_unlock: rmutex not held.

I have attached two files with this message: problem.txt and good.txt. These files show the compile options. The only difference is that the broken version uses an extra option, "-library=iostream,no%Cstd", in both compile and link lines. Unfortunately, this is the standard build option at Xilinx. No one at Xilinx understands why this option will break my program. Could you help me with this problem? Thank you very much.
Meiwei

good.txt:

1) make output:

CC -O -c -I../ -DSOL -DDLLIMPORT="" -DTEMP -DDEBUG ../Port_ThrTest.c
CC -O -c -I../ -DSOL -DDLLIMPORT="" -DTEMP -DDEBUG ../Port_ThrMutex.c
CC -O -c -I../ -DSOL -DDLLIMPORT="" -DTEMP -DDEBUG ../Port_ThrCondition.c
CC -O -c -I../ -DSOL -DDLLIMPORT="" -DTEMP -DDEBUG ../Port_ThrBarrier.c
CC -O -c -I../ -DSOL -DDLLIMPORT="" -DTEMP -DDEBUG ../Port_ThrThread.c
"../Port_ThrThread.c", line 49: Warning (Anachronism): Formal argument start_routine of type extern "C" void*(*)(void*) in call to pthread_create(unsigned*, const _pthread_attr*, extern "C" void*(*)(void*), void*) is being passed void*(*)(void*).
1 Warning(s) detected.
CC -L. -o Port_ThrTest Port_ThrMutex.o Port_ThrCondition.o Port_ThrBarrier.o Port_ThrThread.o Port_ThrTest.o -lpthread -lposix4

2) ldd output:

libpthread.so.1 => /usr/lib/libpthread.so.1
libposix4.so.1 => /usr/lib/libposix4.so.1
libCrun.so.1 => /usr/lib/libCrun.so.1
libm.so.1 => /tools/sparcworks5.0/SUNWspro/lib/libm.so.1
libw.so.1 => /usr/lib/libw.so.1
libc.so.1 => /usr/lib/libc.so.1
libaio.so.1 => /usr/lib/libaio.so.1
libdl.so.1 => /usr/lib/libdl.so.1
libthread.so.1 => /usr/lib/libthread.so.1

=================================TOP===============================
Q314: Design document for MT appli?

> > If you describe what it is you need to achieve, someone in this forum
> > can advise against using threads, or if they think threads will be good,
> > how to use them for best effect.

Here is something I recently posted to the Linux kernel list:

------

Let's go back to basics. Take a look inside your computer. What do you see?
1) one (or more) CPUs
2) some RAM
3) a PCI bus, containing:
4) -- a SCSI/IDE controller
5) -- a network card
6) -- a graphics card

These are all the parts of your computer that are smart enough to accomplish some amount of work on their own. The SCSI or IDE controller can read data from disk without bothering any other components. The network card can send and receive packets fairly autonomously. Each CPU in an SMP system operates nearly independently. An ideal application could have all of these devices doing useful work at the same time.

When people think of "multithreading," often they are just looking for a way to extract more concurrency from their machine. You want all these independent parts to be working on your task simultaneously. There are many different mechanisms for achieving this. Here we go...

A naively-written "server" program (e.g. a web server) might be coded like so:

* Read configuration file
  - all other work stops while data is fetched from disk
* Parse configuration file
  - all other work stops while CPU/RAM work on parsing the file
* Wait for a network connection
  - all other work stops while waiting for incoming packets
* Read request from client
  - all other work stops while waiting for incoming packets
* Process request
  - all other work stops while CPU/RAM figure out what to do
  - all other work stops while disk fetches requested file
* Write reply to client
  - all other work stops until final buffer transmitted

I've phrased the descriptions to emphasize that only one resource is being used at once - the rest of the system sits twiddling its thumbs until the one device in question finishes its task.

Can we do better? Yes, thanks to various programming techniques that allow us to keep more of the system busy. The most important bottleneck is probably the network - it makes no sense for our server to wait while a slow client takes its time acknowledging our packets.
By using standard UNIX multiplexed I/O (select()/poll()), we can send buffers of data to the kernel just when space becomes available in the outgoing queue; we can also accept client requests piecemeal, as the individual packets flow in. And while we're waiting for packets from one client, we can be processing another client's request. The improved program performs better since it keeps the CPU and network busy at the same time. However, it will be more difficult to write, since we have to maintain the connection state manually, rather than implicitly on the call stack. So now the server handles many clients at once, and it gracefully handles slow clients. Can we do even better? Yes, let's look at the next bottleneck - disk I/O. If a client asks for a file that's not in memory, the whole server will come to a halt while it read()s the data in. But the SCSI/IDE controller is smart enough to handle this alone; why not let the CPU and network take care of other clients while the disk does its work? How do we go about doing this? Well, it's UNIX, right? We talk to disk files the same way we talk to network sockets, so let's just select()/poll() on the disk files too, and everything will be dandy... (Unfortunately we can't do that - the designers of UNIX made a huge mistake and decided against implementing non-blocking disk I/O as they had with network I/O. Big booboo. For that reason, it was impossible to do concurrent disk I/O until the POSIX Asynchronous I/O standard came along. So we go learn this whole bloated API, in the process finding out that we can no longer use select()/poll(), and must switch to POSIX RT signals - sigwaitinfo() - to control our server***). After the dust has settled, we can now keep the CPU, network card, and the disk busy all the time -- so our server is even faster. Notice that our program has been made heavily concurrent, and I haven't even used the word "thread" yet! Let's take it one step further. 
Packets and buffers are now coming in and out so quickly that the CPU is sweating just handling all the I/O. But say we have one or three more CPUs sitting there idle - how can we get them going, too? We need to run multiple request handlers at once.

Conventional multithreading is *one* possible way to accomplish this; it's rather brute-force, since the threads share all their memory, sockets, etc. (and full VM sharing doesn't scale optimally, since interrupts must be sent to all the CPUs when the memory layout changes).

Lots of UNIX servers run multiple *processes* - the "sub-servers" might not share anything, or they might share a file cache or request queue. If we were brave, we'd think carefully about what resources really should be shared between the sub-servers, and then implement it manually using Linux's awesome clone() API. But we're not, so let's retreat to the brightly-lit neighborhood that is pthreads.

We break out the POSIX pthread standard, and find it's quite a bit more usable than AIO. We set up one server thread for each CPU; the threads now share a common queue of requests****. We add locking primitives around the shared data structures in our file cache. Now as soon as a new packet or disk buffer arrives, any one of the CPUs can grab it and perform the associated processing, while the other CPUs handle their own work. The server gets even faster.

That's basically the state-of-the-art in concurrent servers as it stands today. All of the independent devices in the computer are being used simultaneously; the server plows through its workload, never waiting for network packets or disk I/O. There are still bottlenecks - for instance, RAM and PCI bandwidth are limited resources. We can't just keep adding more CPUs to make it faster, since they all contend for access to the same pool of RAM and the same bus. If the server still isn't fast enough, we need a better machine architecture that separates RAM and I/O busses into concurrently-accessible pools (e.g.
a high-end SGI server). There are various other tricks that can be done to speed up network servers, like passing files directly from the buffer cache to the network card. This one is currently frowned upon by the Linux community, since the time spent copying data around the system is small compared to the overhead imposed by fiddling with virtual memory. Lots of work does go into reducing system call and context switch overhead; that's one of the reasons TUX was developed. Let's drop the "web server" example and talk about another application that benefits from concurrency - number crunching. This is a much simpler case, since the only resources you're worried about are the CPUs and RAM. To get all the CPU's going at once, you'll need to run multiple threads or processes. To get truly optimal throughput, you might choose to go the process route, so that shared memory is kept to an absolute minimum. (Not that pthreads is a terrible choice; it can work very well for this purpose) In summary, when "multithreading" floats into your mind, think "concurrency." Think very carefully about how you might simultaneously exploit all of the independent resources in your computer. Due to the long and complex history of OS development, a different API is usually required to communicate with each device. (e.g. old-school UNIX has always handled non-blocking network I/O with select(), but non-blocking disk I/O is rather new and must be done with AIO or threads; and don't even ask about asynchronous access to the graphics card =). Don't let these differences obscure your goal: just figure out how to use the machine to its fullest potential. That's the Linux way of doing things: think, then act. -- Dan The ideas here mostly come from informative pages like Dan Kegel's "C10K" http://www.kegel.com/c10k.html, and from reading various newsgroup postings and UNIX books. *** POSIX AIO is so ugly, in fact, that it's not unheard-of to simply spawn a pool of threads that handle disk I/O. 
You can send requests and replies via a pipe or socket, which fits right in with the old select()/poll() event loop.

**** If we're servicing many, many clients at once, then running a huge select()/poll() in each thread will have outrageous overhead. In that case, we'd have to use a shared POSIX I/O signal queue, which can be done with clone(), but not pthreads... See Zach Brown's phhttpd http://www.zabbo.net/phhttpd/

=================================TOP===============================
Q315: SCHED_OTHER, and priorities?

Dale Stanbrough wrote:
> Patrick TJ McPhee wrote:
>
> > % thinking about them. I simply wanted to know if, given two threads that
> > % are available to run with different priorities, will SCHED_OTHER
> > % -always- choose the higher priority thread. Also will SCHED_OTHER
> > % -never- preempt a higher priority thread simply to run one of lower
> > % priority?
> >
> > SCHED_OTHER does not specify any particular scheduling policy. The
> > behaviour will vary from system to system.
>
> No it doesn't have to. There could be a part of the POSIX reference that
> says something like
>
> "Under no circumstances should any scheduling policy preempt a
> higher priority thread to run a lower priority thread".

There IS, for the realtime priorities that are defined by POSIX. But the whole point of SCHED_OTHER (and for many good reasons) is to provide a "standard name" for a policy that doesn't necessarily follow any of the POSIX rules.

> However it seems from other people's posting that there is no such
> restriction made, or in some cases, possible to be made. I suppose
> the next logical question to ask is...
>
> Are there any SCHED_OTHER or other named policies other than
> SCHED_RR and SCHED_FIFO that -do- such preemption?

That depends on your definitions and point of view. In the simplest terms, from an external user view, the answer is a resounding "yes".
That's because many SCHED_OTHER implementations are based on standard UNIX timeshare scheduling, for very good reasons. (That is, it has a long history, it behaves in reasonable ways, and, perhaps most importantly, it behaves in generally and widely understood ways.)

Reduced to general and implementation-independent terms, it works roughly like this: each entity (thread or process) has TWO separate priorities, a "base" priority and a "current" priority. You set (and see) only the base priority, but the scheduler operates entirely on the current priority. This priority may be adjusted to ensure that, over time, all entities get a "fair" share of the processor resources available. Either the current priority of compute-bound entities may be gradually reduced from the base (where higher priorities are "better", which is the POSIX model but not the traditional UNIX model), and/or the current priority of entities that block may be gradually increased from the base. The net result is that entities that have used a lot of CPU won't be given as much in the future, while entities that haven't gotten much will be given more. In the end, it all more or less evens out.

From the scheduler's point of view, higher priority entities are always preferred. From your point of view, though, your high priority threads may behave as if they were low priority, or vice versa. Truth is sometimes not absolute. ;-)

Of course, the POSIX standard doesn't even require that level of "truth". It's perfectly reasonable for SCHED_OTHER to completely ignore the specified priority. (There's no requirement that SCHED_OTHER even have a scheduling parameter.) Would such an implementation be useful? Not for some people certainly... but they probably ought to be using realtime policies, which are fully defined by POSIX.
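[For readers who do need strict priority behavior, the usual route is to request one of the realtime policies explicitly instead of relying on SCHED_OTHER. What follows is a minimal sketch in C, not part of the original post; the function name make_fifo_attr is made up for illustration, and the choice of the lowest SCHED_FIFO priority is arbitrary.]

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Initialize *attr so that a thread created with it requests SCHED_FIFO
 * at the lowest priority that policy allows.
 * Returns 0 on success or an errno-style error number. */
int make_fifo_attr(pthread_attr_t *attr)
{
    struct sched_param sp;
    int rc;

    assert(attr != NULL);
    rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;

    /* Without PTHREAD_EXPLICIT_SCHED the new thread may simply inherit
     * the creator's policy and ignore the settings below. */
    rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0)
        rc = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
    if (rc == 0) {
        memset(&sp, 0, sizeof sp);
        sp.sched_priority = sched_get_priority_min(SCHED_FIFO);
        rc = pthread_attr_setschedparam(attr, &sp);
    }
    if (rc != 0)
        pthread_attr_destroy(attr);
    return rc;
}
```

Passing the resulting attribute to pthread_create() usually needs privilege; on many systems an ordinary user gets EPERM, so check the return value rather than assuming the policy took effect.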
/------------------[ [email protected] ]------------------\ | Compaq Computer Corporation http://members.aol.com/drbutenhof | | 110 Spit Brook Rd ZKO2-3/Q18, Nashua NH 03062-2698 | \--------[ http://www.awl.com/cseng/titles/0-201-63392-2/ ]-------/ =================================TOP=============================== Q316: problem with iostream on Solaris 2.6, Sparcworks 5.0? I found the cause of my problem. In my company, we have build tools that generate makfiles with list of options for compiling and linking. The link option list ends with -Bstatic. -mt implicitly appends -lthread and other libraries to the link command. This causes ld to look for libthread.a instead of libthread.so. The following link error went away once I removed -Bstatic. Meiwei "Webster, Paul [CAR:5E24:EXCH]" wrote: > Meiwei Wu wrote: > > > > My test program is linked with a shared library. This shared library was > > compiled and linked with -mt option. > > If I compiled and linked with -mt option, I would get the following link > > error: > > > > ld: fatal: library -lthread: not found > > It sounds like your compiler isn't installed properly. The WS5.0 > documentation says that to be multithreaded, your files must be compiled with > the -mt option (which defines _REENTRANT for you) and linked with the -mt > option (which links the thread library and libC_mtstubs in the correct order > for you). > > Also, 5.0 on the sun has broken MT capabilities, especially when it comes to > iostreams. There are 3 patches available which help to fix this (and a bunch > of other things): > 107357-09 > 107311-10 > 107390-10 > > -- > Paul Webster 5E24 [email protected] - My opinions are my own :-) - > Fifth Law of Applied Terror: If you are given an open-book exam, you will > forget your book. Corollary: If you are given a take-home exam, you > will forget where you live. =================================TOP=============================== Q317: pthread_mutex_lock() bug ??? 
[email protected] writes:

>Thanks for pointing it out. I had made a mistake. I was working at a
>Solaris 2.6 machine while looking at an older Solaris 2.5 Answerbook.

>To my surprise 2.6 man pages do not specify an EPERM return value
>though they make comments like "only the thread that locked a mutex can
>unlock it" in man pthread_mutex_unlock.

>My guess is that 2.5 and 2.6 had a bug but rather than fixing it in
>2.6, they just deleted the EPERM part from the RETURN part of the
>manual.

No, not a bug. This is completely intentional. Mutexes require some expensive bus operations; if you do error checking on unlock you suddenly require more of those expensive operations, so mutex_unlock becomes a *lot* slower.

>Not really POSIX ensures that implementation will let only the locked
>thread to unlock it. This is acknowledged by solaris as well in
>their "only the thread that locked a mutex can unlock it" phrase. I
>also read this in Kleiman et al.'s "Programming with threads".

You should really read this as "you're not allowed to do so but if you do all bets are off".

>Andrew>If you want mutex locking to be error-checked, you need to
>Andrew>create the mutex with the PTHREAD_MUTEX_ERRORCHECK type
>attribute.

>PTHREAD_MUTEX_ERRORCHECK is a type of the mutex. Though I'm not sure I
>suspect this was not there in the initial standard ( Both my books on
>pthreads do not make any mention of it ). Solaris did not have a
>pthread_mutexattr_settype() interface till 2.7 ( or 2.8 ??. It
>definitely wasn't there till 2.6 ). Instead this was the default
>behavior as per the man pages.

All comes clear when you read the unlock page in S7:

If the mutex type is PTHREAD_MUTEX_NORMAL, deadlock detection is not provided. Attempting to relock the mutex causes deadlock. If a thread attempts to unlock a mutex that it has not locked or a mutex which is unlocked, undefined behavior results.

If the mutex type is PTHREAD_MUTEX_ERRORCHECK, then error checking is provided.
If a thread attempts to relock a mutex that it has already locked, an error will be returned. If a thread attempts to unlock a mutex that it has not locked or a mutex which is unlocked, an error will be returned. Casper =================================TOP=============================== Q318: mix using thread library? "Christina, Hello. Yes, you can use both pthreads & solaris threads at the same time. As a matter of fact, you do almost all the time! Most of the Solaris libraries are written using Solaris threads. I doubt that your hang has anything to do with mixing the two libraries. But... how do you build your program? What does your compile line look like? If it hangs at malloc_unlock, then I would be suspicious that somewhere your code is corrupting the library. I would use purify (or the Sun debugger's bounds checker) to be certain that I wasn't writing past the end of an array or to an invalid pointer. -Bil > Bil: > How are you! > This is Christina Li at Lucent Technologies. I have a specific question for > MT programming, hopefully not too bother you. > In the man page of thr_create, it shows like we can use together both > pthread and Solaris thread library in an application( on Unix Solaris 2.5 ), and > most of the books don't talk anything about mixed using the two libraries. But I > heard from some people , that it is not safe to use both pthread and Solaris > thread at the same time in an application. > I have a large application, over 150K line code. Sometimes it just > mysteriously hang at some very low level, like malloc_unlock or pthread_unlock > or other places. > It seems like if I build the application with POSIX_PTHREAD_SEMANTICS and linked > with pthread helps to bypass some hang. > But I am really not sure whether it is a true fix or not, I am quite > confused of this, would you please help to share some of your ideas? > > Thanks very much! > > Christina Li. 
=================================TOP=============================== Q319: Re: My agony continues (thread safe gethostbyaddr() on FreeBSD4.0) ? Stephen Waits writes: >Well, hoping to avoid serial FQDN resolution, I trashed gethostbyaddr() >and attempted to write my own "thread-safe" equivalent. >After much research and plodding through nameser.h, this is what I ended >up with. The problem is that it STILL doesn't seem to be thread-safe. >If I mutex wrap the call to res_query() my proggie works great, just >that DNS lookups remain serial :( I'm assuming res_query() uses some >static data somewhere along the line (going to read the source right >after this post). >ANY suggestions on where to go next (besides "use ADNS") much >appreciated! Run Solaris (I think from 7 onwards we have a multithreaded resolver library). I believe future bind versions will have a threaded library. Perhaps bind 9 has it? Of course, in Solaris the issue of concurrent lookups was somewhat more pressing with one daemon doing all the lookups. Casper -- Expressed in this posting are my opinions. They are in no way related to opinions held by my employer, Sun Microsystems. =================================TOP=============================== Q320: OOP and Pthreads? "Mark M. Young" wrote in message news:[email protected]... > [...] > After having said this, are you still > suggesting that I use a function or macro to compare the addresses > and lock the objects accordingly? Is this common industry practice > or something? I can only speak for myself, but I have used it. > So, the burden of locking an extra mutex, having an extra mutex in > existence, and hurting serialization outways the burden of the > complicated comparisons of addresses that might arise (e.g. 4 > objects)? Once you go beyond 3 objects, the code to perform the > comparisons would be rediculous and I would like to have a clean > technique used universally. It is actually not that ridiculous if you can use the C++ STL library. 
Here's an example:

    #include <cstddef>
    #include <iostream>
    #include <queue>
    using namespace std;

    class ADT
    {
    public:
        ADT& operator += (const ADT& b);
    private:
        friend class LockObjects;
        // Stand-ins for real mutex operations.
        void lock()   { cout << "Locked  : " << this << endl; }
        void unlock() { cout << "Unlocked: " << this << endl; }
    };

    // Locks two objects in decreasing address order and unlocks them in
    // the opposite order on destruction.
    class LockObjects : private priority_queue<ADT*>
    {
    public:
        LockObjects(ADT* pArg1, ADT* pArg2)
        {
            push(pArg1);
            push(pArg2);
            // 'c' is the protected underlying container. With exactly two
            // elements its heap order is the sorted order; for more
            // objects, sort the pointers explicitly instead.
            for (size_t i = 0; i < c.size(); ++i)
                c[i]->lock();
        }
        ~LockObjects()
        {
            for (size_t i = c.size(); i > 0; --i)
                c[i - 1]->unlock();
        }
    };

    ADT& ADT::operator += (const ADT& rhs)
    {
        LockObjects lock(this, const_cast<ADT*>(&rhs));
        // Add the two.
        return *this;
    }

    int main()
    {
        ADT a, b;
        a += b;
        cout << endl;
        b += a;
        return 0;
    }

Pushing into the priority_queue sorts the objects in decreasing address order. In practice you probably want to separate the sorting of the objects and the actual locking of them. I have used this scheme on several occasions and it has worked quite nicely. I don't know how expensive the use of priority_queue is, as it in my case has not been of particular importance.

-- [email protected]

All the other responders (Kylhelku, Wikman, Butenhof) had excellent suggestions. I prefer the key to the address for sorting the locks, for the reason KK mentioned, but if you have operator< defined for your ADTs, you could potentially use that for ordering, taking care in the case of equality. Wikman's priority queue (you could also use an STL set) is a nice way to get the sorting cleanly.

I'm not sure whether I was clear in warning you off the class lock. If you have thousands of ADTs that are supposed to participate in binary operations in multiple threads, by introducing the class lock you force the thousands of operations to be carried out sequentially -- no parallelism is possible. This will matter a lot on an SMP box, or if any of the operations involve i/o or something else that blocks the processor.
Comparing a few addresses (usually a pair of addresses) is trivial in comparison. The suggestion to provide locking primitives and let the higher-level code decide how to use them is very valuable. Typically you provide, e.g., operator+ which grabs the locks on its operands, and then calls plusInternal which does the addition. Then operator+= grabs the locks and calls plusInternal and assignInternal, thus avoiding the need for recursive mutexes and a lock on the temporary. If you are multiplying two matrices of these ADTs, matrixMultiply grabs locks on all the elements of all the matrices, then calls all the usual multiplyInternal and plusInternal members avoiding a horrendous number of recursive lock/unlock calls. If you had a parallelMatrixMultiply operation, it would lock all the elements of both matrices (and possibly the result matrix), hand off groups of rows from one matrix and groups of columns from the other to the participating threads, which would use the multiplyInternal,plusInternal and assignInternal operations and proceed without taking any additional locks. This is much more in the spirit of how the STL is used in a threadsafe fashion (see the discussion in http://www.sgi.com/Technology/STL/thread_safety.html). By the way, you can implement a recursive mutex on top of the normal pthread_mutex by wrapping pthread_mutex in a class which holds the owner thread id and a reference counter. When you are trying to lock, you see if you are already the owner. If so, you increment the reference count. If you are not, you call the pthread_mutex_lock function. When you unlock, you decrement the reference counter and only if it's zero do you call pthread_mutex_unlock. A full implementation can be found in the ACE (Adaptive Communications Environment) library, http://www.cs.wustl.edu/~schmidt/ACE.html Jeff =================================TOP=============================== Q321: query on threading standards? 
Bill, I don't know if you recall or not, but we talked at Usenix in San Diego about threading under Linux. I was curious about your opinion on a couple of things. (1) Have you been following the threading discussion on linux kernel (summarized at http://kt.linuxcare.com/kernel-traffic/kt20000911_84.epl#1for those of us that have lives)? I wondered if you had any opinions on the Posix thread discussion and Linus's evaluation of pthreads. (2) Is there a current POSIX standard for Pthreads? Where might one find or obtain this. I couldn't find a reference to where the standard is in "Multithreaded Programming with pthreads". (3) There also has been a lot of discussion on the mailing list about some changes that Linux has put into 2.4.0-test8 to support "thread groups". This is a way to provide a container for Linux threads (the process provides this container on most other operating systems). Apparently this breaks the current Linux implementation of pthreads. But other than that it is a good thing should allow a better implementation of pthreads under Linux, but no details are forthcoming at the moment..... Best Regards, Ray Bryant IBM Linux Technology Center [email protected] 512-838-8538 http://oss.software.ibm.com/developerworks/opensource/linux We are Linux. Resistance is an indication that you missed the point "...the Right Thing is more important than the amount of flamage you need to go through to get there" --Eric S. Raymond Ray, Yeah, I remember. It was right after the BOF, right? > Bill, > > I don't know if you recall or not, but we talked at Usenix in San Diego > about threading under Linux. > > I was curious about your opinion on a couple of things. > > (1) Have you been following the threading discussion on linux kernel > (summarized at > http://kt.linuxcare.com/kernel-traffic/kt20000911_84.epl#1for those > of us that have lives)? > I wondered if you had any opinions on the Posix thread discussion > and Linus's evaluation > of pthreads. 
I just took a gander at it. Now I won't say I got it all figured out in 15 minutes of scanning, but Linus is, as is often the case, full of himself. He talks about "Right" (Linux) vs. "Engineered" (POSIX) as a moral battle. I probably agree with him on many points, but at the end of the day I want something that works & is used. PThreads fits the bill. (I notice that Linus is not about using other engineered solutions...) > > (2) Is there a current POSIX standard for Pthreads? Where might one find > or obtain this. > I couldn't find a reference to where the standard is in > "Multithreaded Programming with > pthreads". It's in there. For $130 (no free on-line access :-( ) you can buy it from IEEE. NB: POSIX (1995) vs. UNIX98 (the follow on with a few slight changes.) > > (3) There also has been a lot of discussion on the mailing list about some > changes > that Linux has put into 2.4.0-test8 to support "thread groups". > This is a way to > provide a container for Linux threads (the process provides this > container on most > other operating systems). Apparently this breaks the current Linux > implementation > of pthreads. But other than that it is a good thing should allow a > better implementation > of pthreads under Linux, but no details are forthcoming at the > moment..... No idea about that. It sounds very odd, considering Linus' railing about LWPs being broken. I'd want to hear a *very* good reason for threads groups, esp. as they are pretty much useless in Java. -Bil Bil, Yes, we talked right after the BOF at Usenix. I think the thing that upsets me about the Linus/Linux discussion on threading is the utter contempt the Linux core team has for the rest of the world. I mean I am willing to accept that POSIX threads were not designed to fit into the Linux threading model. 
But it seems to me that the people who worked on the POSIX thread standard were trying to pull together a consensus among a wide variety of expectations and requirements, and that they likely did this in a conscientious, diligent, and competent way. To call that work "shit or crap" is disrespectful to the people who tried (quite hard) to make it a good standard. Oh well.

Yes, I eventually found in your book where to go order the POSIX specification. But I figured not only would I not get my management to cough up $143; I really didn't want to read a 784 page document. I think I am going to go with the Butenhof book instead.

The thread group changes that Linus has put in are an attempt to provide a "container" for a program's threads. Using this, one can send a signal to the container and have one of the threads in the thread group that is enabled for the signal be the one that gets the signal, as per POSIX semantics. At the moment, however, the changes break pthreads. Oh well, again.

Best Regards,
Ray Bryant
IBM Linux Technology Center
[email protected]
512-838-8538
http://oss.software.ibm.com/developerworks/opensource/linux
We are Linux. Resistance is an indication that you missed the point
"...the Right Thing is more important than the amount of flamage you need to go through to get there" --Eric S. Raymond

=================================TOP===============================
Q322: multiprocesses vs multithreaded..??

>
>Patrick TJ McPhee wrote:
>>
>> In article <[email protected]>,
>> David Schwartz wrote:
>>
>> % Threads are inherently faster than multiple processes.
>>
>> Bullshit.
>
> Refute it then.

For Linux users, Question/Answer 6 of this interview: http://slashdot.org/interviews/00/07/20/1440204.shtml by kernel developer Ingo Molnar on the TUX webserver is quite illuminating -- he notes the context switch time for two threads and for two processes under Linux is identical, around 2 microseconds on a 500 MHz PIII.
Ingo makes the case using threads (instead of just fork()'ing off processes) under Linux should be reserved for: "where there is massive and complex interaction between threads. 98% of the programming tasks are not such. Additionally, on SMP systems threads are *fundamentally slower*, because there has to be (inevitable, hardware- mandated) synchronization between CPUs if shared VM is used." Just passing along an interesting interview, not necessarily my personal opinion -- actually, on the project I'm working on these days, using non-blocking I/O of multiple streams in a single process turns out to be the best way of doing things, so I vote for "none of the above" :-). ------------------------------------------------------------------------- John Lazzaro -- Research Specialist -- CS Division -- EECS -- UC Berkeley lazzaro [at] cs [dot] berkeley [dot] edu www.cs.berkeley.edu/~lazzaro ------------------------------------------------------------------------- -- =================================TOP=============================== ? Check http://sources.redhat.com/pthreads-win32/ Regards, Jani Kajala "Mark M. Young" wrote in message news:[email protected]... > I've read the FAQ, I've searched the net. Could someone help me to a > Win32 Pthreads implementation (I don't use Windows by choice)? =================================TOP=============================== Q323: CGI & Threads? Do a web search for a standard called "Fast CGI". It basically uses a LWP with a thread pool to service CGI requests. The nice thing about Fast CGI is that any program written to be a Fast CGI executable is still compatible as a standard CGI executable. Regards, Shelby Cain "Terrance Teoh" wrote in message news:[email protected]... > Hi, > > Has anyone seen any CGI done in either C / C++ using threads ? > Basically I am thinking of reducing resources taken up when there are > too many > access being done at the same time ? > > Thoughts ? Pointers ? Comments ? > > Thanks ! 
> Terrance =================================TOP=============================== Q324: Cancelling detached threads (posix threads)? Jason Nye wrote: > I'm trying to find out whether the posix specification allows the > cancellation of detached threads. I have a copy of Butenhof's book and in > the posix mini-reference at the end of the book, he says that pthread_cancel > should fail if the target thread is detached. This makes sense to me, but is > this the correct behaviour? For example LinuxThreads does allow cancellation > of a detached thread -- who is correct? This is actually an ambiguity in the standard. An implementation that allows cancellation of a detached thread doesn't violate the standard. HOWEVER, other provisions of the standard make such an allowance of questionable value, at best. For example, when a detached thread terminates the state of that thread is immediately invalidated, and may be reused immediately for a new thread. At that point (which you cannot determine) you would be cancelling some new thread that you probably didn't create, with possibly disastrous consequences to your application. There's no excuse for ever cancelling a detached thread. If you do, you may be breaking your application. If it works today, it might not work tomorrow, for reasons you cannot easily determine. In other words, regardless of what the standard says, this is an application error that individual implementations may or may not detect. (And in the general case, implementations that reuse pthread_t values cannot detect the cases where it really matters, because when you cancel a reused pthread_t, the value is valid at the time.) So, if you believe my interpretation, you're less likely to get yourself into trouble by taking advantage of dangerous (and in the final analysis, unusable) loopholes provided by other implementors. 
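[One way to stay clear of the problem described above is to leave any thread you might want to cancel joinable, and to join it after cancelling it, so that its pthread_t stays valid until you are done with it. A minimal sketch in C, not from the original post; the names idle_worker and cancel_and_join are made up for illustration.]

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Worker that does nothing but hit a cancellation point repeatedly. */
static void *idle_worker(void *arg)
{
    (void)arg;
    for (;;)
        sleep(1);           /* sleep() is a cancellation point */
    return NULL;            /* not reached */
}

/* Create a joinable thread, cancel it, and join it.
 * Returns 1 if the join reports that the thread was cancelled,
 * 0 if it ended some other way, -1 if cancel or join failed. */
int cancel_and_join(void)
{
    pthread_t t;
    void *result = NULL;
    int rc;

    /* Default attributes create a joinable (not detached) thread. */
    rc = pthread_create(&t, NULL, idle_worker, NULL);
    assert(rc == 0);

    if (pthread_cancel(t) != 0)
        return -1;
    /* The thread ID stays valid until this join completes, so the
     * cancel above cannot hit a recycled ID. */
    if (pthread_join(t, &result) != 0)
        return -1;
    return result == PTHREAD_CANCELED;
}
```

If nobody will ever join the thread, a safer alternative to cancelling it is to have it check a shared flag and exit on its own.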
;-) /------------------[ [email protected] ]------------------\ | Compaq Computer Corporation POSIX Thread Architect | | My book: http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/ =================================TOP=============================== Q325: Solaris 8 recursive mutexes broken? "Bill Klein" wrote in message news:[email protected]... > Hey, > > I came on to this newsgroup prepared to ask exactly this question - I'm trying to get recursive mutexes working under Solaris 7 but > am having no luck at all. > > "Spam Me Not" wrote... > > Confirmed from sun's web page: > > http://sunsolve.sun.com/pub-cgi/retrieve.pl?doc=fpatches%2F106980&zone_32=42 88299 > > Have you tried the patch? Does it actually solve all problems? > > Thanks! Yes, I tried the patch. Look a few messages back on this message thread, where I posted a program to test this, and the 4 resulting cases. Case #2 was fixed by the patch, which was simple pthread_mutex_lock and pthread_mutex_unlock of a recursive mutex by multiple pthreads. The patch did not fix case #1, where I used a mutex that happens to be recursive (but where the recursion is never used AFAIK) in a pthread_cond_wait. In this case, the cond_wait never returns, so there's still a problem with recursive mutexes even with this patch. Note that sun's documentation suggests not using recursive mutexes in cond_waits, because a recursive mutex with count > 1 won't be fully released by cond_wait, but I don't think that note applies to my program, which doesn't take recursive locks of the mutex. Anyone know how to submit a bug report to sun? :) Do you have to register on sunsolve first? [email protected] writes: >The problem first showed up in Solaris 7 and there was a >patch (106980-13) that fixed the problem. > >Now I've upgraded my system to Solaris 8 and the problem is >back. 
Obviously the Solaris 7 patch was not propagated
>into the Solaris 8 code and there does not appear to be a
>patch for the Solaris 8 system. I've searched the SunSolve
>web site with no luck so far.

The Solaris 8 sparc equivalent patch for 106980-13 is 108827-05.

>Does anyone know about this problem and if so, is there a

I don't know what the problem is, because you haven't said.

>workaround for it?

A quick mod to the program to dump out the error gives:

.
.
.
lock: 251, thread 4
lock: 252, thread 4
lock: 253, thread 4
lock: 254, thread 4
lock failed: Resource temporarily unavailable, 255, thread 4
lock failed: Resource temporarily unavailable, 256, thread 4
lock failed: Resource temporarily unavailable, 257, thread 4
lock failed: Resource temporarily unavailable, 258, thread 4
.
.
.

"Resource temporarily unavailable" = EAGAIN

man pthread_mutex_lock...

ERRORS
     EAGAIN    The mutex could not be acquired because the maximum number of recursive locks for mutex has been exceeded.

So what's the fault? It seems to be behaving exactly as described.

-- Andrew Gabriel Consultant Software Engineer

=================================TOP===============================
Q326: sem_wait bug in Linuxthreads (version included with glibc 2.1.3)?

Jason Andrew Nye wrote:
> The problem is that POSIX-style cancellation is very dangerous in C++ code
> because objects allocated on the stack will never have their destructors
> called when a thread is cancelled (leads to memory leaks and other nasty
> problems).

This statement is not strictly true. Only an implementation of POSIX thread cancellation that completely ignores C++, combined with an implementation of C++ that completely ignores POSIX thread cancellation, results in a dangerous environment for applications that use both in combination.
Because POSIX cancellation was designed to work with exceptions (it was in fact designed to be implemented as an exception), the combination is obvious and natural, and there's simply no good excuse for it to not work. Personally, I think it's very near criminal to release an implementation where C++ and cancellation don't work together.

Developers who do this may have the convenient excuse that "nobody made them" do it right. The C++ standard doesn't recognize threads, and POSIX has never dealt with creating a standard for the behavior of POSIX interfaces under C++. (Technically, none of the POSIX interfaces are required to work under C++, though you rarely see a UNIX where C++ can't call write(), or even printf().) Excuses are convenient, but this is still shallow and limited thinking. I don't understand why anyone would be happy with releasing such a system.

I spent a lot of time and energy educating the committee that devised the ABI specification for UNIX 98 on IA64 to ensure that the ABI didn't allow a broken implementation. Part of this was simply in self defense because a broken ABI would prohibit a correct implementation. I'd also had some hope that the reasonable requirements of the ABI would eventually percolate up to the source standard. More realistically, though, I hoped that by forcing a couple of C++ and threads groups to get together and do the obviously right (and mandatory) thing for IA64, they might do the same obviously right (though not mandatory) thing on their other platforms. Maybe someday it'll even get to Linux.

Please don't settle for this being broken. And especially, don't believe that it has to be that way. Anyone who can implement C++ with exceptions can create a language-independent exception facility that can equally well be used by the thread library -- and, with a few trivial source extensions, by C language code (e.g., through the POSIX cleanup handler macros).
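[For reference, this is roughly what the POSIX cleanup handler macros mentioned above look like from C. It is a minimal sketch, not code from the original post; the names demo_lock, unlock_on_cancel, locked_worker and run_cancel_demo are invented for illustration.]

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int cleanup_ran = 0;

/* Cleanup handler: runs if the thread is cancelled (or exits) while
 * the handler is still pushed. */
static void unlock_on_cancel(void *arg)
{
    cleanup_ran = 1;
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *locked_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    /* push and pop must pair up in the same lexical scope */
    pthread_cleanup_push(unlock_on_cancel, &demo_lock);
    for (;;)
        sleep(1);               /* cancellation point */
    pthread_cleanup_pop(1);     /* not reached; required to close the scope */
    return NULL;
}

/* Cancel a thread that holds demo_lock and check that the handler
 * released it. Returns 1 if the handler ran and the mutex is free. */
int run_cancel_demo(void)
{
    pthread_t t;
    int rc = pthread_create(&t, NULL, locked_worker, NULL);
    assert(rc == 0);

    pthread_cancel(t);
    pthread_join(t, NULL);      /* join orders the handler's writes before our reads */

    if (pthread_mutex_trylock(&demo_lock) != 0)
        return 0;               /* still locked: the handler did not run */
    pthread_mutex_unlock(&demo_lock);
    return cleanup_ran;
}
```

The handler plays the role a destructor would play in C++: it releases the mutex as the cancelled thread's frame is unwound.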
/------------------[ [email protected] ]------------------\
| Compaq Computer Corporation              POSIX Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/         |
\-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/

On Mon, 06 Nov 2000 09:16:42 -0500, Dave Butenhof wrote:
>environment for applications that use both in combination. Because POSIX
>cancellation was designed to work with exceptions (it was in fact designed to
>be implemented as an exception), the combination is obvious and natural, and
>there's simply no good excuse for it to not work.

What should the semantics be, in your opinion? POSIX cleanup handlers first, then C++ unwinding? Or C++ unwinding first, then POSIX cleanup handlers? Or should the proper nesting of cleanup handlers and C++ statement blocks be observed?

I understand that in the Solaris implementation, the POSIX handlers are done first and then the C++ cleanup. How about Digital UNIX? My concern is what GNU libc should do; where there isn't a standard, imitating what some other popular implementations do would make sense.

My opinion is that they should be executed in the only possible correct or useful order. ( ;-) -- but only for the phrasing, not the message.)

Each active "unwind scope" on the thread must be handled in order. (The opposite order from that in which they were entered, of course.) The obvious implementation of this is that both C++ destructors (and catch clauses) and POSIX cleanup handlers are implemented as stack frame scoped exception handlers, and that each handler is executed, in order, as the frame is unwound by a single common unwind handler.

Any other order will break one or the other, or both.

> I understand that in the Solaris implementation, the POSIX handlers are done
> first and then the C++ cleanup. How about Digital UNIX? My concern is what GNU
> libc should do; where there isn't a standard, imitating what some other popular
> implementations do would make sense.
I don't know the details of the Solaris implementation, but what you describe is clearly broken and useless except in trivial and contrived examples. We, of course, do it "correctly", though it could be cleaner. For example, right now C++ code can't catch a cancel or thread exit except with the overly general "catch(...)", because C++ isn't allowed to use pthread_cleanup_push/pop, (and shouldn't want to since C++ syntax is more powerful), and C++ doesn't have a name for those "foreign" exceptions. (Of course destructors work fine.) We've worked with the compiler group to add some builtin exception subclasses to deal with that, but we never found the time to finish hooking up all the bits. Our UNIX was architected from the beginning with a universal calling standard that supports call-frame based exceptions. All conforming language processors must provide unwind information (procedure descriptors) for all procedures, and a common set of procedures (in libc and libexc) support finding and interpreting the descriptors and in unwinding the stack. Our C compiler provides extensions to allow handling these native/common exceptions from C language code. Our uses these extensions to implement POSIX cleanup handlers. (For other C compilers, we use a setjmp/longjmp package built on native exceptions "under the covers", though with some loss of integration when interleaved call frames switch between the two models. Support for our extensions, or something sufficiently similar, would allow me to make gcc work properly.) Both cancel delivery and pthread_exit are implemented as native exceptions. The native unwind mechanism will unwind all call frames (of whatever origin) and call each frame's handler (if any) in the proper order. (Another minor glitch is that our exception system has a single "last chance" handler, on which both we and C++ rely. 
We set it once at initialization, but C++ sets it at each "throw" statement, which will break cancellation or thread exit of the initial thread since we can't put a frame handler at or below main(). This is also fixed by our not-quite-done integration effort with C++.)

This is all covered by the IA64 ABI. Of course it specifies API names, and data structure sizes and contents. It's also somewhat more biased than our implementation towards C++, since it was a generalization and cleanup of the C++ ABI section on exceptions rather than something designed independently. (The ABI, and any C++ implementation, had to do this anyway. Making it general was only a little more work than making it exclusive to C++, and of fairly obvious value.)

/------------------[ [email protected] ]------------------\
| Compaq Computer Corporation              POSIX Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/         |
\-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/

=================================TOP===============================
Q327: pthread_atfork??

Steve Watt wrote:

> > [ ... ] Because the thread that calls fork() owns all of your locks in the
> >parent, and because that thread (and only that thread) exists in the child, the
> >child thread owns all of the locks when the CHILD handler is called, and it can
> >be sure that all of your data (protected by the locks) is clean and consistent.
>
> I mostly agree with this, except for the statement that the child thread
> owns all the locks. Specifically, what about error checking mutexes
> a la UNIX98? Can the child process really unlock them? Does an
> implementation have to keep some manner of PID information around so that
> new threads in the child would correctly EPERM?
You're of course welcome to agree or disagree, but be warned that when it comes to matters of POSIX threads interpretations, suggesting disagreement with me can lead to long and complicated replies containing detailed analyses and interpretations of the relevant sections of the standard. You've been warned. ;-) Yes, I deliberately and carefully said that the child owns the locks. That's what I meant, which is why I said it. That is, in the child process, the recorded owner (of any mutex for which owner is recorded) of a mutex locked by the thread that called fork() in the parent IS the single thread in the child process. POSIX does not specify that the thread ID of the single thread in the child is identical to the ID of the forking thread in the parent, but it does require that any necessary "transfer of ownership" be made transparent to the application. If you locked it in a PREPARE handler, you can unlock it in the CHILD handler, no matter what type of mutex it was. At least... this is the INTENT of the POSIX working group. Unfortunately, the text of the standard is somewhat less clear than one might like. Little is said about what pthread_atfork() does, or how or why you might use it, except in the RATIONALE section, which is explicitly NOT a binding part of the standard. (It's commentary and explanations, but can place no requirements on either implementation or application.) The description of fork() also is not particularly useful because, despite the clear implication (by having pthread_atfork()), the standard says that applications may call only "async-signal safe" functions between the return from fork() (in the child) and a call to one of the async-signal safe exec*() functions, and mutex operations are not async-signal safe. 
(But then, technically, the atfork CHILD handlers, which are called implicitly by the user-mode wrapper of the _fork() syscall, are not actually called "by the application" after return from fork(); thereby adding yet another level of fuzzy haze to the dilemma.) What this all means is that we (the working group) didn't spend nearly enough time reviewing the vast body of the POSIX standard to find words and implications that should have been changed. Originally, the thread standard was a completely separate document, though it modified certain sections of 1003.1. That made a thorough review awkward. Eventually, the 1003.1c amendment text was integrated with the standard. We found many of the resulting inconsistencies and holes -- but not all of them. Unfortunately, some areas, like this one, are not mere editorial changes; fixing the standard to say what we meant could break some implementations that currently conform to the letter (while violating the spirit). What this really means is that use of pthread_atfork() may be broken (and unusable) on some implementations; and those implementations may not be technically "nonconforming". We were always aware this would occur in some cases, because we knew we couldn't make the standard perfect. Many such issues that came up were dismissed as simple matters of "quality of implementation". Nobody, obviously, would buy a broken implementation. (The flip side, to which we didn't pay sufficient heed, is that people DO buy broken implementations all the time, or are forced to use such systems bought by others, learn to accept the limitations, and even expect them of other systems.) "Life's not fair." > What about thread-specific data? Should the thread in the child get the > same thread-specific data as the thread that called fork()? What if > the result of pthread_self() is different, such that pthread_equal won't > say they're equal? 
The standard doesn't require that the thread ID will be the same, though we assumed it usually would be. This wasn't an omission. While we said that thread IDs are private to the process, there was some interest in "not precluding" an implementation where thread IDs are global. If thread IDs are global, the thread in the child must have a unique ID. This silly and irrelevant intent, however, has certain implications, adding to the general "fuzz" around fork(), because it implies that the ownership information of mutexes would need to be fixed up; but that's not actually required anywhere. (In fact, this could be considered a technical violation of the requirement that the child has a copy of the full address space, "including synchronization objects".) Nevertheless, in any implementation crafted by "well intentioned, fully informed, and competent" developers, it must be possible to use pthread_atfork() (with sufficient care) such that normal threaded operation may continue in the child. On any such implementation, the thread ID of the child will be the same as in the parent, all mutexes properly locked in PREPARE handlers will be unlockable in CHILD handlers, all thread-specific data attached to the forking thread in the parent will be accessible to the single thread in the child, and so forth. This may not apply to Linuxthreads, if, (as I have always assumed, but never verified), the "thread ID" is really just the pid. At least, pthread_self() in the child would not be the same as in the parent. This is just one of the reasons that building "threads" on top of processes is wrong; though the nonconformance of the consequences here are, as I've detailed, somewhat less clear and absolute than in other places. (Nevertheless, this substantially and clearly violates the INTENT of the working group, and may render pthread_atfork(), an important feature of the standard, essentially useless.) 
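To make the PREPARE/PARENT/CHILD pattern under discussion concrete, here is a small C sketch (not from the original posts). It assumes an implementation that follows the intent described above, i.e. one where a mutex locked in the PREPARE handler can be unlocked in the CHILD handler; as noted, the letter of the standard does not guarantee that. The names table_lock, table_entries, and fork_with_handlers are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int table_entries = 0;   /* hypothetical data guarded by table_lock */

/* PREPARE: take the lock so no other thread holds it across fork(). */
static void prepare(void)
{
    int rc = pthread_mutex_lock(&table_lock);
    assert(rc == 0);
    (void)rc;
}

/* PARENT and CHILD: release the lock taken in prepare(). */
static void release(void)
{
    int rc = pthread_mutex_unlock(&table_lock);
    assert(rc == 0);
    (void)rc;
}

/* Fork once; the child checks that it can lock the mutex and that it
   sees the data as the parent left it. Returns 0 on success, -1 otherwise.
   Not itself thread-safe: the registration flag is unprotected. */
int fork_with_handlers(void)
{
    static int registered = 0;
    int expected, status;
    pid_t pid;

    if (!registered) {
        if (pthread_atfork(prepare, release, release) != 0)
            return -1;
        registered = 1;
    }

    pthread_mutex_lock(&table_lock);
    expected = ++table_entries;
    pthread_mutex_unlock(&table_lock);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int ok;
        if (pthread_mutex_lock(&table_lock) != 0)
            _exit(1);
        ok = (table_entries == expected);
        pthread_mutex_unlock(&table_lock);
        _exit(ok ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A real CHILD handler would usually do more than unlock (reset counters, record the new pid, and so on); this sketch keeps only the locking skeleton.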
The thread library could and should be smart enough to fix up any recorded mutex (and read-write lock) ownership information, at least; and TSD should be carried over because it's "just memory" and there's no reason to do anything else. > >The CHILD handler may also just unlock and continue, though more commonly it > >will do some cleanup or reinitialization. For example, it might save the current > >process pid somewhere, or reset counters to 0. > > I generally think that about the only good thing to do in the child > handler is re-initialize the IPCs. "The only good thing" to do in CHILD handlers is whatever is necessary to clean up and get ready for business. If you don't do that, there's no point to even bothering... in which case you just can't expect to fork() a threaded process at all. /------------------[ [email protected] ]------------------\ | Compaq Computer Corporation POSIX Thread Architect | | My book: http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/ > Does the std address a forkall() concept vs. fork1()? There's some rationale (commentary that's not part of the standard) explaining that forkall was proposed and rejected. It doesn't bother to explain the problems with forkall. > > the requirement that the child has a copy of the full address space > > Implying _all_ threads too - a "forkall()" concept. Doesn't POSIX > replace fork() with fork1(), thus the above requirement is > not violated since the "true" fork() is not called? "Full address space" doesn't imply "all threads" at all, except perhaps to a "pure user mode" thread library. Kernel threads don't live in the process address space. POSIX doesn't "replace fork" with anything. POSIX **defines** fork. Rather, Solaris "replaces fork" with their proprietary fork1 interface. (Though only, of course, in nonstandard compilation environments.) The concept of "forkall" is foolish. 
You can't arbitrarily replicate execution streams unless you can tell them what happened, and there's simply no way to do that. (Solaris allows that threads in blocking syscalls "might" return EINTR, but that's all, and it's not nearly enough.) With a single execution context for each process, fork was just fine, because the execution stream asked for the copy, and knows what happened on both sides of the fork. When you have multiple independent execution contexts, you have to deal with the fact that you don't know what any other context is doing, and it doesn't know what you're doing. A lot of new mechanism would have to be invented, and many complicated constraints added, to make "forkall" a useful interface. Each cloned execution context would need to be immediately notified, and it would need to be able to "clean up" in whatever way necessary, including terminating itself. This might be done by delivering a signal, but much of the cleanup likely to be necessary (and thread termination) cannot be done in a signal handler. Forkall was proposed. We discussed it a lot. We dismissed it as far too complicated, and way beyond any rational interpretation of the working group's scope and charter. To some people who don't look deeply, forkall seems "simpler" than pthread_atfork; but it is actually vastly more complicated. Unless you don't care about correctness or usability. /------------------[ [email protected] ]------------------\ | Compaq Computer Corporation POSIX Thread Architect | | My book: http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/ =================================TOP=============================== Q328: Does GNU Pth library support process shared mutexes? CoreLinux++ WILL support process shared mutexes through a combination of shared memory and semxxx. 
This will take a few weeks to implement, and applications that need it will have to use the libcorelinux++ libraries. It is also C++.

Frank V. Castellucci
http://corelinux.sourceforge.net

=================================TOP===============================
Q329: I am trying to make a thread in Solaris to get timer signals.

I am trying to make a thread in Solaris to get timer signals every second. I am using setitimer() and sigwait() to set up and catch the signals respectively.

I am sorry to tell you that setitimer()/sigwait() does not work with the threads/pthreads library on Solaris. I won't go into the details, but it is a sorry tale.

To make a thread do a periodic action, use cond_timedwait() on a dummy cond_t/mutex_t pair that has no other function than to be used in the call to cond_timedwait(). The thread will wake up at the time you specify. It can then do the periodic thing and reissue the cond_timedwait() to wait another interval.

Roger Faulkner
[email protected]

=================================TOP===============================
Q330: How do I time individual threads?

I am getting very puzzling behavior using 2 threads on a 2 processor Solaris computer. I am new to Solaris, so I am probably just doing something stupid.

I am writing code to parallelize some numeric computations. It was originally written on NT, and I am porting it to Solaris. I am using a dual processor Dell NT and a dual processor Sun Solaris for development. The threads are very simple, and can operate completely independently. The code ports easily from the point of view of compiling, linking, and executing. However, on NT, I get over 90% speedup in using two threads, but on Solaris I get almost none (at most about 15%).

Simplified example code is shown below.

    double GetTime()
    {
        return ((double)clock())/CLOCKS_PER_SEC;
    }

Not stupid, just a misinterpretation of clock(3C).
This is from the clock(3C) manual page: DESCRIPTION The clock() function returns the amount of CPU time (in microseconds) used since the first call to clock() in the calling process. What you get from clock() is the CPU time used by all threads in the process since the last call to clock(). To do your timing, you want to get the elapsed time. This is my modification to testth.cpp (times() returns the number of ticks (HZ) since some time in the past): #include #include #include ... double GetTime() { struct tms dummy; return (times(&dummy))/(double)(HZ); } Roger Faulkner [email protected] =================================TOP=============================== Q331: I'm running out of IPC semaphores under Linux! >>>>> "Doug" == Doug Hodson writes: Doug> Now that I have IPC semaphores working under Linux, I am Doug> running into another problem. I'm running out of them!!! Do Doug> I have to recompile the kernel to make more available? echo 500 > /proc/sys/kernel/sem Replace 500 with whatever you want. (untested) =================================TOP=============================== Q332: Do I have to abandon the class structure when using threads in C++? > in C++ much easier. Currently to use threads in C++ you have to > virtually abandon the class structure and the type checking and > resort to low level hacking when using threads. No, that's incorrect. This problem only occurs if you do not understand the appropriate patterns and idioms for effective multi-threaded programming in C++. We've been developing and deploying high-performance and real-time OO applications in C++ for the past decade, and there's now plenty of "collective wisdom" on how to do this properly and abstractly using lots of nice high-level C++ features. 
I recommend that you check out the following resources for more information:

http://www.cs.wustl.edu/~schmidt/Concurrency.ps.gz
http://www.cs.wustl.edu/~schmidt/ACE-papers.html
http://www.cs.wustl.edu/~schmidt/patterns-ace.html
http://www.cs.wustl.edu/~schmidt/patterns/patterns.html

All of these resources are based on the threading abstractions provided with ACE, which is a freely-available, open-source framework that defines a rich source of components for concurrent and network programming. You can learn more about it and download it from

http://www.cs.wustl.edu/~schmidt/ACE.html

BTW, I'm teaching a short-course at UCLA in a couple weeks that'll cover all this material in depth. You can download the course notes and learn more about the course itself at

http://www.cs.wustl.edu/~schmidt/UCLA.html

Take care,

Doug

=================================TOP===============================
Q333: Questions about pthread_cond_timedwait in linux.

>i've been programming threads for years and am just moving to linux
>(redhat) and have questions about pthread_cond_timedwait:
>
>pthread_cond_timedwait takes a struct timespec * and looking at the
>example in the doc, the fields are initialized from the fields
>in gettimeofday plus an offset. that raises the following questions:
> what happens if the offset puts the tv_sec over the maximum value
> for that day?

The tv_sec field is the number of seconds since the epoch. If this overflows, then the year must be 2038, and you aren't using the latest 256 bit hardware. :) :) :)

> What happens if the clock is changed? (like a dst adjustment)

The Linux implementation of pthread_cond_timedwait converts the absolute time to a relative wait, which is then the subject of a nanosleep() call (with the delivery of a signal cutting that sleep short when the condition is signaled). The nanosleep system call in Linux is based on the system clock tick and not on calendar time.
So the answer is that changing the system time will have no effect on when the call wakes up. Namely, moving the date forward will not cause an immediate wakeup.

However, if an unrelated signal (not due to the condition wakeup) interrupts the pthread_cond_timedwait, it will call gettimeofday() again and recompute the relative wait. At that time it may wake up due to the date change.

=================================TOP===============================
Q334: Questions about using pthread_cond_timedwait.

I need your help to clarify something... Consider the following code:

    void foo()
    {
        struct timeval tv;
        struct timespec ts;

        if (gettimeofday(&tv, NULL) < 0) {
            // error handling stuff here
        }

        // Convert and store to structure that pthread_cond_timedwait wants
        ts.tv_sec = tv.tv_sec;
        ts.tv_nsec = tv.tv_usec * 1000;

        // Add 10 milli-sec (this is how long I want to wait)
        ts.tv_nsec += 10 * 1000 * 1000;

        while (MyPredicate == false)
        {
            status = pthread_cond_timedwait(&condvar, &mutex, &ts);
            // do stuff depending on status
        }

        // Other stuff goes here
    }

The problem is that I get lots of ETIMEDOUTs... Here come the questions:

1) On a normal PC (single processor) running linux, what is the minimum time I can wait??? I assume 10 milli-sec is ok...

2) On the other end of the scale, what is the max time I can wait ??? e.g. can I put 300 milli-sec (i.e. ts.tv_nsec += 300 * 1000 * 1000)??? I am asking because gettimeofday will return the time in a timeval. If I just increase the usec and not the seconds, are there overflow problems ??? If tv.tv_sec is X and tv.tv_usec is 999.999, and I increase by 100.000, is that going to keep the seconds the same and go to 099.999, or is it clever enough to either increase X to X+1 OR make usec equal to 1.099.999 ???

What I am thinking is that the ETIMEDOUTs might be because the new time ends up being EARLIER than the current time.

pthreads conditional waits use an absolute time to specify the timeout, not a relative time.
In general you will get the _time now_ and add some delta to determine the absolute time corresponding to the relative timeout delta that you wish. That's the theory. In practice system operators can totally screw you up by adjusting the clock, which changes the machine's notion of the current absolute time. There isn't an awful lot that you can do about this problem except...

Certain versions of Unix provide clock_gettime; among those versions of Unix some will support CLOCK_MONOTONIC, a type of clock that always advances at the same rate regardless of changes to the machine's absolute clock. A monotonic clock would be very useful in conjunction with relative timeouts.

The trouble with this is that while the monotonicity of the clock used for conditional waits is the default, it seems to be associated with the condition variable attribute. How then, are you supposed to compute the timeout value? My guess is clock_gettime with CLOCK_MONOTONIC + delta should be used but I can't be sure. Also, what happens if the condition variable attribute is initialized to specify a non-monotonic clock and we use a monotonic clock to compute the timeout? If anybody has up to date information on this I'd like to hear about it.

>> >// Add 10 milli-sec (this is how long I want to wait)
>> >ts.tv_nsec += 10 * 1000 * 1000
>>
>> The problem with this statement is that it may potentially increase
>> the value of tv_nsec beyond one billion less one, thus giving
>> rise to an invalid struct timespec.
>>
>
>Just to clarify, the tv_usec field (although a long) will only go up to
>999.999 (max value gettimeofday will return for usec).

Or, 999,999 for those of us whose locale calls for , as a digit separator symbol. ;)

>Since a long goes up to 2.xxx.xxx.xxx, if I go above 999.999
>this is considered an illegal value...

Yes.
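The arithmetic at issue here can be sketched in C as follows. This is an illustration, not code from the thread: it assumes a system with clock_gettime() and, for the monotonic variant, pthread_condattr_setclock(), which is in current POSIX but may be missing on older systems. In current POSIX the default clock for a condition variable is CLOCK_REALTIME, so the deadline has to be computed from the same clock the condition variable was initialized with. The helper names deadline_after_ms, cond_init_monotonic, and wait_for_flag are made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Compute "now + ms" on the given clock, carrying any overflow of
   tv_nsec into tv_sec so that tv_nsec stays in [0, 999999999]. */
int deadline_after_ms(clockid_t clock_id, long ms, struct timespec *out)
{
    assert(ms >= 0);
    if (clock_gettime(clock_id, out) != 0)
        return -1;
    out->tv_sec  += ms / 1000;
    out->tv_nsec += (ms % 1000) * 1000000L;
    if (out->tv_nsec >= NSEC_PER_SEC) {
        out->tv_sec  += 1;
        out->tv_nsec -= NSEC_PER_SEC;
    }
    return 0;
}

/* Initialize a condition variable whose timeouts are measured against
   CLOCK_MONOTONIC, so clock adjustments do not move the deadline. */
int cond_init_monotonic(pthread_cond_t *cond)
{
    pthread_condattr_t attr;
    int rc = pthread_condattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    if (rc == 0)
        rc = pthread_cond_init(cond, &attr);
    pthread_condattr_destroy(&attr);
    return rc;
}

/* Wait at most ms milliseconds for *flag to become nonzero. The condition
   variable must have been set up with cond_init_monotonic().
   Returns 0 if the flag was seen set, ETIMEDOUT or another error otherwise. */
int wait_for_flag(pthread_cond_t *cond, pthread_mutex_t *mutex,
                  const int *flag, long ms)
{
    struct timespec deadline;
    int rc = 0;

    if (deadline_after_ms(CLOCK_MONOTONIC, ms, &deadline) != 0)
        return errno;
    pthread_mutex_lock(mutex);
    while (!*flag && rc == 0)
        rc = pthread_cond_timedwait(cond, mutex, &deadline);
    if (*flag)
        rc = 0;
    pthread_mutex_unlock(mutex);
    return rc;
}
```

The deadline is computed once, outside the loop, so spurious wakeups do not extend the total wait. With a default-initialized condition variable, the same helper can be used with CLOCK_REALTIME instead.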
>And also when I convert to a timespec to use with pthread_cond_timedwait,
>although again the tv_nsec field is a long, I am only allowed to
>go up to 999.999.000 (or 999.999.999 ???)

Yes, up to 999999999.

>If yes, how come the conditional variable returns with an
>ETIMEDOUT and NOT with a EINVAL ???

Because the behavior is simply undefined when you pass a bogus timespec; undefined means that any response is possible, including ETIMEDOUT.

The Single UNIX Specification does not require the pthread_cond_timedwait function to detect bad timespec structures. If the programmer has taken care that the structures have valid contents, checking them is just a waste of cycles; and the programmer who gets them wrong will likely also ignore the return value of pthread_cond_timedwait().

See http://www.opengroup.org/onlinepubs/007908799/xsh/pthread_cond_timedwait.html

=================================TOP===============================
Q335: What is the relationship between C++ and the POSIX cleanup handlers?

>Ian Collins wrote:
>>
>> Stefan Seefeld wrote:
>> >
>> > Ian Collins wrote:
>> >
>> > > Cleanup handlers are not a good place to destroy objects, use some
>> > > form of container object that points to the object and delete it in its
>> > > destructor
>> > > (a smart pointer) to do this.
>> >
>> > that would indeed be the right thing, if....if thread cancelation
>> > would do proper stack unwinding.
>> >
>> > Stefan
>>
>> I would hope it does - the Solaris one does.
>
>Only if you use Sun's own compiler and even then - not always.

From what I understand, in this compiler, when a thread is canceled, it acts as if some special exception was thrown which is never caught and which does not call unhandled()---in other words, just the unwinding is performed.

What is the relationship between this unwinding and the POSIX cleanup handlers? Are these properly interleaved with the unwinding, or are they done first?
In other words, what I'm asking is: are the POSIX cleanup handlers somehow hooked into the destructor mechanism so that if I have this construct:

    {
        Object foo;
        pthread_cleanup_push(A, ...)
        {
            Object bar;
            pthread_cleanup_push(B, ...)
            pthread_exit(PTHREAD_CANCELED);
            pthread_cleanup_pop(1);
        }
        pthread_cleanup_pop(1);
    }

what happens? Ideally, handler B() would get called, then the destructor of object bar, followed by the handler A, and then the destructor of object foo.

For that to work, the cleanup handlers would have to be hooked into the unwinding mechanism of the C++ implementation, rather than on a separate stack. E.g. pthread_cleanup_push(X, Y) would be a macro which expands to something like:

    { __cleanup_obj __co(X, Y);

where __cleanup_obj is a class object with a destructor which calls the handler---except that some compiler extensions are used to make this work in C as well as C++.

I know there are ways to add these kinds of hooks into GCC. I'm thinking that it would be nice to add this support to LinuxThreads, and it would also be nice if it was compatible with some existing good or popular scheme. LinuxThreads currently keeps its own chain of cleanup nodes, but there is no reason not to use some GCC extensions to get it to use the destructor mechanism instead (other than breaking binary compatibility; i.e. this would have to wait until glibc 2.2.x, and support for 2.1.x and 2.0.x cleanup handling would have to be retained.)

=================================TOP===============================
Q336: Does select() work on calls recvfrom() and sendto()?

>>> I hope that some one can help me see the light.
>>>
>>> Assume:
>>> 1. A socket based server.
>>> 2. On a client connection server creates a child-server thread to
>>> take care of this client.
>>> 3. Child-server implements a retransmission of packet on negative
>>> ACK (uses alarm signal for time out)
>>
>>Why not use select() with a timeout to block each child-server thread
>>instead of alarm?
>
>Does select() work on calls recvfrom() and sendto()? I am under the
>impression it only works on connection oriented sockets accept(),
>read(), write(), recv(), send() etc. Please give a simple sketch of
>the usage?

No, select works on datagram sockets as well. On UNIX-like systems, select also works on other kinds of objects: regular files (not really useful there), terminal devices, printer ports, etc.

It works pretty much the same way on datagram sockets as it does on stream sockets. Read availability means there are one or more datagrams waiting. Write availability means there is buffer space for datagrams.

=================================TOP===============================
Q337: libc internal error: _rmutex_unlock: rmutex not held.

>Hi,
>
>I'm writing a distributed application using Iona's Orbix ORB and RogueWaves
>ToolsPro class libraries. Since the application is multi-threaded, I'm using
>RogueWave's Threads.h++ also. The application is built over POSIX threads.
>When I run the application I get the following error:
>
>libc internal error: _rmutex_unlock: rmutex not held.
>
>...and the application just hangs there. I have tried moving to Solaris
>threads instead, but of no good use. I tried some sample thread programs but
>they all worked fine.
>
>Is there something I'm missing? A quick reply or advice will be greatly
>appreciated as the deadlines are short and customer not in a good mood
>:-)

The message you are getting indicates an internal inconsistency in libc. The standard I/O implementation (in libc) uses mutex locks to protect its internal data structures (the stuff behind the FILE struct). The message is saying that some thread that is doing standard I/O attempted to unlock a lock that it does not own. This, of course, "cannot happen".
It could be caused by the application (overwriting memory) or it could be an inconsistency between libc and libthread (caused by linking libc statically but libthread dynamically [you can do static linking with libc but there is no static libthread]) or the application could be defining its own _thr_main() function that subverts the ones in libthread and libc.

To make any progress on the problem, I'd need a test case that exhibits the problem. I have to admit that I know nothing about the other software you are using (Iona's Orbix ORB and RogueWave's ToolsPro class libraries and RogueWave's Threads.h++) but they might interfere with the proper working of libthread and libc (I'm just speculating here, not accusing). And, of course, there could be a bug somewhere in libthread/libc.

One thing you could do to discover more about the problem would be to apply a debugger (adb or dbx or gdb) to the hung process. Also you can get stack traces of all the threads in the process by applying the pstack command to the hung process:

    $ /usr/proc/bin/pstack

What release of Solaris are you running? Would it be possible for you to send me your program? Maybe we should just continue this privately via e-mail rather than on the newsgroup. Feel free to send me mail.

Roger Faulkner
[email protected]

=====

From: Boris Goldberg

It may happen due to incorrect order of linking with libc and libthread. You must link with libthread before libc. That can be ensured by specifying the -mt flag on the link line.

Run ldd on your program: if you see libthread after libc, that's your problem.

=================================TOP===============================
Q338: So how can I check whether the mutex is already owned by the calling thread?

On Mon, 27 Mar 2000 12:08:54 GMT, [email protected] wrote:
>Thanks for all your qualified contributions.
>This is what I've learned:
>If I want to abide by the POSIX standard on UNIX platforms I'd better
>drop the habit of using recursively lockable mutexes. OK, so be it.
But >I'd really love to port a lot of existing C++ code and use it on Linux. Obviously. Any sort of proscription against recursive mutexes must be weighed against the pressing need to port a whole lot of code that needs them. >So how can I implement my Mutex-Lock-class in a way that it checks >whether the mutex is already owned by the calling thread? Very easily. The class simply has to store a counter and the ID of the owning thread. These can be protected by an additional internal mutex. >It looks to me that if I just put in an additional boolean flag, no >thread can safely check this flag because it may be changed >simultanously by another thread. >Given a mutex class "NThreads::Mutex" (that used to be recursivly >lockable), the class NThreads::MutexLock has been implemented as you >can see below (abbreviated). How can I change it to make it work with a >non-recursive Mutex class? > >namespace NThreads >{ >class Mutex >{ > friend class MutexLock; >public: > Mutex(); > ~Mutex(); > bool lock( int timeout ) > { > //return true if not timed out > } These kinds of strategies aren't all that useful, except for debugging assertions. About all you can do in the case of such a timeout is to log an error that a deadlock has probably occured and then abort the application. It is an internal programming error that is not much different from a bad pointer dereference, or divide by zero, etc. > void unlock() > { > // unlock system mutex > } >private: > void lock() > { > // lock with infinite timeout, no need for return value, but >dangerous. > } >}; > >class MutexLock >{ > public: >MutexLock( Mutex& mtx ) : rMtx_( mtx ) >{ > rMtx_.lock(); >} I see, this is just one of those safe lock classes whose destructor cleans up. 
It is the Mutex class that should be made recursive, not the safe lock wrapper, as in:

    #ifdef USE_POSIX_THREADS
    class Mutex {
        pthread_mutex_t actual_mutex_;   // the lock callers really contend for
        pthread_mutex_t local_mutex_;    // protects owner_ and recursion_count_
        pthread_t owner_;
        int recursion_count_;
    public:
        Mutex();
        ~Mutex();
        void Lock();
        void Unlock();
        void Wait(Condition &);
    };
    #endif

    #ifdef USE_OS2_THREADS
    // definition of Mutex class for OS/2
    #endif

The method definitions for the POSIX variant would look something like this:

    Mutex::Mutex() : recursion_count_(0)
    {
        pthread_mutex_init(&actual_mutex_, NULL);
        pthread_mutex_init(&local_mutex_, NULL);
        // leave owner_ uninitialized
    }

    Mutex::~Mutex()
    {
        assert (recursion_count_ == 0);
        int result = pthread_mutex_destroy(&actual_mutex_);
        assert (result == 0);
        result = pthread_mutex_destroy(&local_mutex_);
        assert (result == 0);
    }

    void Mutex::Lock()
    {
        pthread_mutex_lock(&local_mutex_);
        if (recursion_count_ > 0 && pthread_equal(pthread_self(), owner_)) {
            // already ours: just count the extra acquisition
            assert (recursion_count_ < INT_MAX);   // from <limits.h>
            recursion_count_++;
        } else {
            // never block on actual_mutex_ while holding local_mutex_
            pthread_mutex_unlock(&local_mutex_);
            pthread_mutex_lock(&actual_mutex_);
            pthread_mutex_lock(&local_mutex_);
            assert (recursion_count_ == 0);
            recursion_count_ = 1;
            owner_ = pthread_self();
        }
        pthread_mutex_unlock(&local_mutex_);
    }

    void Mutex::Unlock()
    {
        pthread_mutex_lock(&local_mutex_);
        assert (pthread_equal(pthread_self(), owner_));
        assert (recursion_count_ > 0);
        if (--recursion_count_ == 0)
            pthread_mutex_unlock(&actual_mutex_);
        pthread_mutex_unlock(&local_mutex_);
    }

Or something along these lines. I haven't tested this code. I did make sure that wherever both locks are held, they were acquired in the same order to prevent the possibility of deadlock. It's more or less obvious that you must never try to acquire the actual mutex while holding the local one.
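
For comparison with the hand-built class above: systems that follow the Single UNIX Specification, Version 2 or later let you pick a mutex type with pthread_mutexattr_settype(). PTHREAD_MUTEX_RECURSIVE gives a recursive lock directly, and PTHREAD_MUTEX_ERRORCHECK makes a second lock attempt by the owner fail with EDEADLK, which is one way of asking "do I already own this?". The sketch below is only an illustration. The helper names (init_typed_mutex and friends) are made up for this example, and whether your platform's pthread.h offers these types is an assumption you should check.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: initialise *m with the given mutex type.
   Returns 0 on success, otherwise an errno-style code. */
static int init_typed_mutex(pthread_mutex_t *m, int type)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_settype(&attr, type);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Returns 1 if the same thread can lock a recursive mutex twice. */
static int recursive_relock_works(void)
{
    pthread_mutex_t m;
    int first, second;
    if (init_typed_mutex(&m, PTHREAD_MUTEX_RECURSIVE) != 0) return 0;
    first = pthread_mutex_lock(&m);
    second = (first == 0) ? pthread_mutex_lock(&m) : first;
    if (second == 0) pthread_mutex_unlock(&m);   /* undo inner lock */
    if (first == 0) pthread_mutex_unlock(&m);    /* undo outer lock */
    pthread_mutex_destroy(&m);
    return first == 0 && second == 0;
}

/* Returns the code produced by relocking an error-checking mutex
   that the calling thread already owns, or -1 on setup failure. */
static int errorcheck_relock_code(void)
{
    pthread_mutex_t m;
    int rc;
    if (init_typed_mutex(&m, PTHREAD_MUTEX_ERRORCHECK) != 0) return -1;
    if (pthread_mutex_lock(&m) != 0) { pthread_mutex_destroy(&m); return -1; }
    rc = pthread_mutex_lock(&m);   /* owner relocks: error, not deadlock */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

Calling recursive_relock_works() and errorcheck_relock_code() from a small main() is enough to see what a given library does. Where the recursive type is available it removes the need for the second, internal mutex, but condition waits on a recursively held mutex remain a trap, so the wrapper approach in this thread can still be the safer port.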
A condition wait requires special trickery:

    void Mutex::Wait(Condition &cond)
    {
        pthread_mutex_lock(&local_mutex_);

        assert (pthread_equal(pthread_self(), owner_));
        assert (recursion_count_ > 0);
        int saved_count = recursion_count_;
        recursion_count_ = 0;

        pthread_mutex_unlock(&local_mutex_);

        pthread_cond_wait(&cond.cond_, &actual_mutex_);

        pthread_mutex_lock(&local_mutex_);

        assert (recursion_count_ == 0);
        recursion_count_ = saved_count;
        owner_ = pthread_self();

        pthread_mutex_unlock(&local_mutex_);
    }

I hope you can massage this into something that works. If I messed up, flames will ensue.

--------------------------

As a followup to my own posting, I want to make a remark about this:

>A condition wait requires special trickery:
>
>    void Mutex::Wait(Condition &cond)
>    {
>        pthread_mutex_lock(&local_mutex_);
>
>        assert (pthread_equal(pthread_self(), owner_));
>        assert (recursion_count_ > 0);
>        int saved_count = recursion_count_;
>        recursion_count_ = 0;
>
>        pthread_mutex_unlock(&local_mutex_);
>
>        pthread_cond_wait(&cond.cond_, &actual_mutex_);
>
>        pthread_mutex_lock(&local_mutex_);
>
>        assert (recursion_count_ == 0);
>        recursion_count_ = saved_count;
>        owner_ = pthread_self();
>
>        pthread_mutex_unlock(&local_mutex_);
>    }

Firstly, there is no condition-checking while loop around the pthread_cond_wait because it is assumed that the caller of Mutex::Wait() will implement the re-test. The intent here is only to wrap the call. Thanks to John Hickin for raising this in an e-mail.

Secondly, because pthread_cond_wait is a cancellation point, it is necessary to deal with the possibility that the waiting thread may be canceled. If that happens, the actual_mutex_ will be locked by the canceled thread, but the state of owner_ and recursion_count_ will not be properly recovered. Thus the user of the class has no means of recovery. This requires a messy change, involving an extern "C" redirection function which calls a method that does the mutex reacquire wrapup.
There is a need to communicate the saved recursion count to the cleanup handler, as well as the identity of the mutex object, using a single void * parameter, so a context structure is introduced:

    struct MutexContext {
        Mutex *mtx_;
        int saved_count_;
        MutexContext(Mutex *m, int c) : mtx_(m), saved_count_(c) { }
    };

The cleanup handler is then written, which takes the context and calls the object, passing it the saved count:

    extern "C" void Mutex_Cancel_Handler(void *arg)
    {
        MutexContext *ctx = (MutexContext *) arg;
        ctx->mtx_->CancelHandler(ctx->saved_count_);
    }

The code that is executed at the end of the old version of Mutex::Wait is moved into a separate method. This assumes that actual_mutex_ is locked on entry, which is the case if the pthread_cond_wait is canceled. (The method has to be callable from the handler, so declare it public in the class.)

    void Mutex::CancelHandler(int saved_count)
    {
        // actual_mutex_ is locked at this point
        pthread_mutex_lock(&local_mutex_);

        assert (recursion_count_ == 0);
        recursion_count_ = saved_count;
        owner_ = pthread_self();

        pthread_mutex_unlock(&local_mutex_);
    }

Finally, Wait() is revised to look like this:

    void Mutex::Wait(Condition &cond)
    {
        pthread_mutex_lock(&local_mutex_);

        assert (pthread_equal(pthread_self(), owner_));
        assert (recursion_count_ > 0);
        MutexContext context(this, recursion_count_);
        recursion_count_ = 0;

        pthread_mutex_unlock(&local_mutex_);

        // Ensure cleanup takes place if pthread_cond_wait is canceled
        // as well as if it returns normally.
        pthread_cleanup_push(Mutex_Cancel_Handler, &context);
        pthread_cond_wait(&cond.cond_, &actual_mutex_);
        pthread_cleanup_pop(1);
    }

=================================TOP===============================
Q339: I expected SIGPIPE to be a synchronous signal.

> >Using Solaris threads under Solaris 5.7.
> >
> >I would have expected SIGPIPE to be a synchronous signal when it
> >occurs as a result of a failed write or send on a socket that has
> >been disconnected. Looking through past articles in Deja seemed to
> >confirm this.
> > > >However, I thought I would undertake the radical idea of actually > >testing it. In my tests it looks as if it's an asynchronous signal. > > Yes, it is an asynchronous signal in Solaris. > This is not a bug in Solaris; it is intentional. > > The purpose of SIGPIPE is to kill a process that is writing > to a pipe but that has made no provision for the pipe being > closed at the other end. > On HP-UX, SIGPIPE is a synchronous signal and one shouldn't even try 'sigwait'-ing for it. Sounds logical too. Any reason why it's different on Solaris7? The above paragraph didn't seem like a very convincing reason. Thanks, -- Rajiv Shukla > If you want to deal with a pipe or socket being closed, then either > mask SIGPIPE or catch it with a do-nothing signal handler and test > the errno that comes with a failed write() or a send() operation. > If it is EPIPE, then that corresponds to SIGPIPE, and you have > gotten the answer synchronously. In Digital (Tru64) Unix, we made SIGPIPE a synchronous signal, and I still believe that's the right disposition for it. Uncaught, I will terminate the process. Caught, it allows corrective action to occur in the thread that cares about the broken connection. Useful? Barely. More accurate? Much. That aside, the best thing to do with SIGPIPE is to set it to SIG_IGN and pick up the EPIPE error return on the write() call. Masking/catching the signal isn't the right thing to do if you don't care about the signal, and you most likely don't. It's cheapest to ignore it and move on. Jeff =================================TOP=============================== Q340: I have a problem between select() and pthread... >Hi! everyone.. > >I have a problem that is the syncronization between select() and pthread... > >That is as follows... > >the main thread is blocking in select() func. >and at the same time, the other thread is closed a socket descriptor in >fd_set.. this work causes a EBADF error in select(). 
>so, I wrote in main thread: > >SELECT_LABEL: > if ((nready = select(nfds, readfds, writefds, exeptionfds)) == -1) { > if (errno == EBADF) goto SELECT_LABEL; > perror("select()"); > } > >But that is not solved... >after goto syntax, I got infinitely EBADF error in select(). > >How do I for solving that??? >after select(), close a socket descriptor?? >or only *ONE* thread controls socket descriptors?? > >I use the POSIX thread on Solaris 7.. You have to figure out in the main thread which file descriptor was closed by the other thread and delete its bit from the fdset's before reissuing the select(). The select interface() itself will not help you to determine this. In Solaris, select() is implemented on top of poll(2). If you use the poll() interface directly, then a closed file descriptor will show up in the array of pollfs's with revents containing the POLLNVAL bit. Thus the poll() interface will tell you which file descriptor has been closed and you can stop polling on it. Roger Faulkner [email protected] =================================TOP=============================== Q341: Mac has Posix threading support. > I'm looking at a cross-platform strategy for our application. > Threads was one issue which came up, and Posix threads seems like a good > prospect. > It is supported under Windows (http://sourceware.cygnus.com/pthreads-win32/) > and Unix, but I don't think Mac has Posix threading support. I'm maintaining a free (nonpreemptive) pthreads library, available at ftp://sunsite.cnlab-switch.ch/platform/macos/src/mw_c/GUSI* Matthias -- Matthias Neeracher http://www.iis.ee.ethz.ch/~neeri "I really don't want the SNMP agent controlling my toilet to tell someone when/where I'm using it." -- Sean Graham =================================TOP=============================== Q342: Just a few questions on Read/Write for linux. >Just a few questions on Read/Write >lock stuff since man pages don't exist (yet) >for linux. 
> >1) Where can I find documentation, sample code, >or anything else that will help (eg. URLs etc.) These locks are based on The Single Unix Specification. http://www.opengroup.org/onlinepubs/007908799/ >2) Can I treat the rwlock stuff same as a mutex >in terms of init/destroy/lock/unlock/trylock ??? >I had a look at pthread.h and all the calls look >the same... (Is it basically a mutex that allows >multiple locks for readers?) Something like that. >3) What's the story with overhead if you start using >r/w locks? In Linux, there is somewhat more overhead compared to mutexes because the locks are more complex. The structures and the operations on them are larger. Also, as of glibc-2.1.3, each thread maintains a linked list of nodes which point to the read locks that it owns. These nodes are malloced the first time they are needed and then kept in a thread-specific free list for faster recycling. The nodes of these lists are destroyed when the thread terminates. Each time a read lock is acquired, a linear search of this list is made to see whether the thread already owns the read lock. In that case, a reference count field is bumped up in the linked list field and the thread can proceed. (The lists are actually stacks, so that a recently acquired lock is at the front of the list.) This algorithm is in place in order to implement writer-preference for locks having the default attribute, while meeting the subtleties of the spec with respect to recursive read locks. The prior versions of the library purported to implement writer preference, but due to a bug it was actually reader preference. >4) If you have many readers could that mean that the >writer will never get a chance to lock, or are the >locks first-come-first-serve ??? I'm thinking Writer preference, subject to the requirements of The Single UNIX Specification which says that a thread may recursively acquire a read lock unconditionally, even if writers are waiting. 
In glibc-2.1.3, LinuxThreads supports the non-portable attribute PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP which gives you more efficient writer preference locks, at the cost of not supporting recursive read locks. These kinds of locks do not participate in the aforementioned linked lists. If a writer is waiting on a lock, and a thread which already has a read lock tries to acquire another one, it simply deadlocks. >(I know it's probably dim but...) if a reader can >always lock, there might be a case where there is >always at least one reader on the mutex. What >happnes if a writer comes along and rwlocks ??? If you read the spec, you will note that this is implementation defined. An implementation may, but is not required to, support writer preference. The Linux one does (now). =================================TOP=============================== Q343: The man pages for ioctl(), read(), etc. do not mention MT-safety. > >But so far I do have an implementation in mind, and >I have learned enough to check if any library >functions I will call are MT-safe. And so I >started checking man pages, and to my horror >found that the man pages for such indespensable >familiars as ioctl(), read(), and write() do >not mention this issue. > >(messy complication: I'm looking at man pages on >SunOS, but the project will be on Linux. I don't >have a Linux account yet. Bother, said Pooh) On Solaris, everything in section 2 of the manual pages (that is, system calls, not general C library functions) is thread-safe unless explicitly stated otherwise. Sorry that the man pages are not more clear on this point. I can't speak for Linux. Roger Faulkner [email protected] =================================TOP=============================== Q344: Status of TSD after fork()? >OK, here's an ugly scenario: > >Imagine that you're some thread running along, you've got some reasonable >amount of stuff stashed away in pthread_{get,set}specific[1]. > >Now you call fork(). 
> >Those who have read the POSIX standard know that "If a multithreaded >process calls fork(), the new process shall contain a replica of the >calling thread and its address space... Consequently ... the child >process may only execute async-signal safe operations until ... one of >the exec functions is called." > >So, the process is using pthread_*, but it hasn't called pthread_create(), >so it doesn't really count as a multithreaded process, right? (Well, I'm >using that as an assumption at the instant.) I can't speak for other implementations, but with Solaris pthreads, the child of fork() is a fully-fledged multithreaded process that contains only one thread, the one that performed the fork(). It can continue doing multithreaded stuff like create more threads. Of course, there are the standard caveats that apply to fork(), like the process must have dealt with its own locks by appropriate use of pthread_atfork(3THR) or some other mechanism. >Now for the hard part: Does pthread_self() return the same value for the >thread in the child process as it did in the parent for the thread that >called fork()? This has implications on thread-specific data, in that >the definition of "the life of the calling thread" (POSIX 1003.1:1996 >section 17.1.1.2, lines 15-16) would be assoicated (in my mind) to the >result of pthread_self(). On Solaris, in the child process, pthread_self() returns 1 (the thread-ID of the main thread) regardless of the value of the thread-ID of the corresponding thread in the parent process. >So what I'm looking for is opinions on: > A) Should thread-specific data be replicated, or > B) Should all pthread_getspecific keys now return NULL because it's a > new thread in a different process? >Ugh. Implementor opinions welcome, as well as users. On Solaris, the thread-specific data of the forking thread in the child process is replicated. Should it? I think so, but you must ask the standards bodies. 
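
What a particular implementation does with the forking thread's data can be probed with a small experiment. The sketch below sets one thread-specific value, forks, and has the child report through its exit status whether pthread_getspecific() still returns the parent's pointer. The function name tsd_survives_fork is invented for this example. Note the caveat quoted above: POSIX only promises async-signal-safe calls in the child of a multithreaded process, and pthread_getspecific() is not on that list, so treat the result as an observation about one system rather than something the standard guarantees.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if the child of fork() still sees the
   forking thread's thread-specific value, 0 if it does not, and -1 if
   the experiment itself could not be set up. */
static int tsd_survives_fork(void)
{
    static int marker = 42;      /* any object with a stable address */
    pthread_key_t key;
    pid_t pid;
    int status;

    if (pthread_key_create(&key, NULL) != 0) return -1;
    if (pthread_setspecific(key, &marker) != 0) return -1;

    pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        /* Child: compare the pointer and report via the exit status. */
        _exit(pthread_getspecific(key) == &marker ? 0 : 1);
    }

    /* Parent: collect the child's verdict. */
    if (waitpid(pid, &status, 0) != pid) return -1;
    pthread_key_delete(key);
    if (!WIFEXITED(status)) return -1;
    return WEXITSTATUS(status) == 0;
}
```

A return of 1 matches the Solaris behaviour described in the answer (the data is replicated); a return of 0 would correspond to option B in the question.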
>[1]I like to think of pthread_{get,set}specific as (conceptually) indexing
>   a two-dimensional array that is addressed on column by the result of
>   pthread_self(), and the row by pthread_key_create()'s return.

You should stop thinking this way. The thread-ID is an opaque object; it is not to be interpreted as an index into anything. You should think of pthread_{get,set}specific as being indexed by the thread (its register set if you wish), not by its thread-ID.

Roger Faulkner [email protected]

=================================TOP===============================
Q345: Static member function vs. extern "C" global functions?

Do I have to? Oh well here goes....

This still uses a nasty cast. It is also not a good idea to start a thread in a constructor, for the simple reason that the thread may run _before_ the object is constructed. This is even more likely if this is a base class. I know, I've been there and done that.

Use an extern "C" friend as in the following complete example:

    #include <pthread.h>
    #include <iostream>

    extern "C" void* startIt( void* );

    class Fred
    {
        pthread_t tid;

        friend void* startIt( void* );

        void* runMe() throw() { std::cout << "Done" << std::endl; return NULL; }

    public:
        int start() throw() { return pthread_create( &tid, NULL, startIt, this ); }
        pthread_t id() const throw() { return tid; }
    };

    // C-linkage trampoline: recover the object and call the real member
    void* startIt( void* p )
    {
        Fred* pF = static_cast<Fred*>(p);
        return pF->runMe();
    }

    int main()
    {
        Fred f;
        int s;

        if( (s = f.start()) ) return s;

        std::cout << "Started" << std::endl;

        void* status;
        pthread_join( f.id(), &status );
        pthread_exit( 0 );
    }

Warwick Molloy wrote:
> Hi,
>
> What's the difference between a static member function and extern "C" global
> functions?
>
> name mangling
>
> All C++ code is linked with a regular C linker. That's why you need name
> mangling to allow such things as overloading etc.
> > If you want to get an extern "C" pointer to a static member function, do this > > extern "C" { > typedef void* (*extern_c_thrd_ptr)( void *); > } > > class floppybunny { > > void worker_func( void ); > > static void* foo_func( void *p) > { > floppybunny* ptr =(floppybunny*)p; > > ptr -> worker_func(); // convert explic this pointer to implied this > pointer. > } > > floppybunny( void ) > { > pthread_create( &tid, (extern_c_thrd_ptr)foo_func, (void*)this); > } > }; > > QED > > That makes the thread function nicely associated with your class and best of > all... > > IT WORKS. > > Regards > Warwick. (remove the spam to reply) > > Ian Collins wrote: > > > Timmy Whelan wrote: > > > > > You can also make the member function static: > > > > > > > For the Nth time, static members are _NOT_ the same as extern "C" > > functions. > > Thier linkage may be different. Use a friend defined as extern "C" or make > > the > > real start member public. > > > > Ian > > > > > > > > class foo > > > { > > > public: > > > static void *startThread(void *param); > > > > > > void *actualThreadFunc( ); > > > }; > > > > > > void * > > > foo::startThread( void *param ) > > > { > > > foo *f = (foo *)param; > > > return f->actualThreadFunc( ); > > > } > > > > > > If you need to pass in parameters, use member variables. > > > > > > "Mr. Oogie Boogie" wrote: > > > > > > > > Howdy, > > > > > > > > How does one make a C++ class member function as the starting function > > > > for a thread? > > > > > > > > I keep getting the following warning and have been unable to find any > > > > documentation/source to get rid of it. > > > > > > > > slm_th.cc: In method `slm_th::slm_th(char * = "/dev/tap0")': > > > > slm_th.cc:98: warning: converting from `void * (slm_th::*)(void *)' to > > > > `void * ( > > > > *)(void *)' > > > > > > > > This is the class: > > > > > > > > class slm_th { > > > > public: > > > > void *Read(void *arg); > > > > } > > > > > > > > void *slm_th::Read(void *arg) { > > > > ... 
> > > > } > > > > > > > > Thanks, > > > > > > > > -Ralph > > > One minor point: Calling convention is, in the general case, a compiler-specific thing and not an operating-system-specific thing. Different compilers for the same operating system can easily have calling conventions for functions with "C" or "C++" linkages that are incompatible. (Some platforms/operating systems have an ABI standard that defines the C language calling conventions for the platform and operating system. This is not universally the case, however. It is especially not the case for x86 platforms running non-Unix operating systems.) =================================TOP=============================== Q346: Can i kill a thread from the main thread that created it? >can i kill a thread from the main thread that created it? >under Windows, i only found the CWinThread::ExitInstance () method, You can kill a thread with TerminateThread(). Using TerminateThread is really, really, really, really, not recommended. If thread owns a critical section the critical section is not released and it will forever be unaccessable. If other threads then try to enter it they will hang forever. Also, the stack allocated to the thread is not released and various other bad things can happen. If you think you need to use TerminateThread it's a good sign that your threading design is broken. You should be telling the thread to exit itself. Figuring out how to call TerminateThread using MFC'isms such as CWinThread is left as an exercise to the reader. -Mike > Also, the stack allocated to the thread is not released > and various other bad things can happen. Yes, it's that bad... A while ago I started writing an app that used TerminateThread() - it leaked about a megabyte per second under load =). > If you think you need to use TerminateThread it's a > good sign that your threading design is broken. > You should be telling the thread to exit itself. 
I don't agree 100%; I've encountered several situations where it would be very handy to kill a thread (think about a long-running computation whose results you aren't interested in anymore). Pthreads has a nice design: a thread can explicitly say when it may be cancelled... (I often end up coding a solution like yours, a message-passing mechanism to tell threads to die, but that always seems to add more complexity than it's worth...)

Dan

=================================TOP===============================
Q347: What does /proc expose vis-a-vis LWPs?

>
>> Thanks for the answer! I would really like to know how to see which
>> thread is running on which processor to see if my multithreaded
>> app (which uses the pipeline model) is really using the 6 available CPUs on my
>> platform. Is there such a beast?
>
>/proc on Solaris doesn't expose this information, so I doubt that any
>non-Sun utility can show it. I don't know if Sun has something (bundled or
>unbundled). As for migrating LWPs from one processor to another - it's
>perfectly normal on Solaris.

You are wrong. /proc does provide this information, in the lwpsinfo struct contained in /proc/<pid>/lwp/<lwpid>/lwpsinfo for each lwp in the process:

    processorid_t pr_onpro; /* processor which last ran this lwp */

It is displayed with the prstat utility. Use the command 'prstat -L' to see each lwp in each process.

Roger Faulkner [email protected]

=================================TOP===============================
Q348: What mechanism can be used to take a record lock on a file?

> What mechanism can be used to take a record lock on a file (using the
> fcntl() call), in a posix multi threaded application. Seems to me that
> these locks are process based, and therefore multiple threads within the same
> process are treated as the same thing.
>
> Any pointer would be appreciated

This has been discussed several times before. Yes, fcntl() locks are process-based, for a number of reasons historical and pragmatic.
Some people have successfully built a two-level file locking strategy that uses mutexes between threads within a process and fcntl() between processes. Essentially, you reference count the fcntl() lock(s) so that the process holds an fcntl() lock whenever any thread within the process has an area locked; if more than one thread within the process is interested in the same file area, they synchronize among themselves using a mutex.

I believe that sample code may have been posted. Search the newsgroup archives, if you can find a good server. (I don't know what the state of Deja is now; it was always a good one, and may be again if the transfer of control has been straightened out.)

/------------------[ [email protected] ]------------------\

=================================TOP===============================
Q349: Implementation of a Timed Mutex in C++

Thanks to everybody who spent brain power on my program. It works (stable as a rock)! Here it is. If it is useful to somebody, use it:

    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>
    #include <sched.h>
    #include <sys/time.h>
    #include "pthread.h"

    typedef struct {
        pthread_mutex_t mutex;
        pthread_cond_t  cond;
        pthread_t       owner;
        int             value;     /* 1 = free, 0 = held */
    } Mutex_t;

    void* main_mutex;              /* Pointer to my main_Mutex */
    int mutexTestCnt = 0;          /* Counter */
    pthread_t thread;
    pthread_cond_t startcond;      /* Cond to start threads */
    pthread_mutex_t startmutex;
    int started = 0;               /* Predicate for startcond */

    int MutexCreate(void* *id)
    {
        Mutex_t *Mutexvar = (Mutex_t *)malloc(sizeof(Mutex_t));
        if (Mutexvar == NULL) { return -1; }
        pthread_mutex_init(&Mutexvar->mutex, NULL);
        pthread_cond_init(&Mutexvar->cond, NULL);
        Mutexvar->value = 1;
        *id = (void*)Mutexvar;
        return 0;
    }

    int MutexDelete(void* id)
    {
        Mutex_t *mutex = (Mutex_t *)id;
        if (mutex->value != 1) { return -1; }   /* still held */
        pthread_cond_destroy(&mutex->cond);
        pthread_mutex_destroy(&mutex->mutex);
        free(mutex);
        return 0;
    }

    /* timeoutrel is a relative timeout in seconds */
    int MutexObtain(void* id, int timeoutrel)
    {
        Mutex_t *mutex = (Mutex_t *)id;
        int status = 0;
        struct timeval now;
        struct timespec timeout;

        if (mutex == NULL) return -1;
        pthread_mutex_lock(&mutex->mutex);
        if ((mutex->value < 0) || (mutex->value > 1)) {
            pthread_mutex_unlock(&mutex->mutex);
            return -2;
        }
        if (mutex->value == 0) {
            /* pthread_cond_timedwait wants an absolute deadline */
            gettimeofday(&now, NULL);
            timeout.tv_sec  = now.tv_sec + timeoutrel;
            timeout.tv_nsec = now.tv_usec * 1000;
            do {
                status = pthread_cond_timedwait(&mutex->cond, &mutex->mutex, &timeout);
                if (status == ETIMEDOUT) {
                    pthread_mutex_unlock(&mutex->mutex);
                    return -3;
                }
            } while ((status != 0) || (mutex->value != 1));
        }
        mutex->value = 0;
        mutex->owner = pthread_self();
        pthread_mutex_unlock(&mutex->mutex);
        return 0;
    }

    int MutexRelease(void* id)
    {
        Mutex_t *mutex = (Mutex_t *)id;

        pthread_mutex_lock(&mutex->mutex);
        if ((mutex->value < 0) || (mutex->value > 1)) {
            pthread_mutex_unlock(&mutex->mutex);
            return -1;
        }
        /* Only the current holder may release (pthread_t is opaque,
           so the held flag is tested rather than zeroing owner) */
        if (mutex->value != 0 || pthread_equal(mutex->owner, pthread_self()) == 0) {
            pthread_mutex_unlock(&mutex->mutex);
            return -2;
        }
        mutex->value = 1;
        pthread_cond_signal(&mutex->cond);
        pthread_mutex_unlock(&mutex->mutex);
        return 0;
    }

    void *testfunc(void * arg)
    {
        int i;
        unsigned long self = (unsigned long)pthread_self();

        /* Start all threads at the same time */
        pthread_mutex_lock(&startmutex);
        while (!started)
            pthread_cond_wait(&startcond, &startmutex);
        pthread_mutex_unlock(&startmutex);
        printf("Thread %s started as %lu.\n", (char *)arg, self);

        for (i = 0; i < 100000;) {
            if (MutexObtain(main_mutex, 1000) != 0) {
                printf("Thread %lu: MutexObtain() FAILED\n", self);
            }
            /* Modify protected variables */
            i = ++mutexTestCnt;
            thread = pthread_self();
            /* Release CPU */
            sched_yield();
            /* And check if somebody else could get into the critical section */
            if (i != mutexTestCnt) {
                printf("Thread %lu: Mutex violated by %lu\n", self, (unsigned long)thread);
            }
            /* Leave critical section */
            if (MutexRelease(main_mutex) != 0) {
                printf("Thread %lu: MutexRelease() FAILED\n", self);
            }
            /* Allow rescheduling (another thread can enter the critical section) */
            sched_yield();
        }
        printf("Thread %s ready\n", (char *)arg);
        return NULL;
    }

    int main(void)
    {
        pthread_t t_a, t_b, t_c;
        int ret;

        pthread_cond_init(&startcond, NULL);
        pthread_mutex_init(&startmutex, NULL);
        if (MutexCreate(&main_mutex) != 0) return -1;

        ret = pthread_create(&t_a, NULL, testfunc, (void *)"a");
        if (ret != 0) fprintf(stderr, "Can't create thread a\n");
        ret = pthread_create(&t_b, NULL, testfunc, (void *)"b");
        if (ret != 0) fprintf(stderr, "Can't create thread b\n");
        ret = pthread_create(&t_c, NULL, testfunc, (void *)"c");
        if (ret != 0) fprintf(stderr, "Can't create thread c\n");

        printf("Press key to start\n");
        getc(stdin);
        pthread_mutex_lock(&startmutex);
        started = 1;
        pthread_cond_broadcast(&startcond);
        pthread_mutex_unlock(&startmutex);

        ret = pthread_join(t_a, NULL);
        ret = pthread_join(t_b, NULL);
        ret = pthread_join(t_c, NULL);
        MutexDelete(main_mutex);
        printf("All done\n");
        return 0;
    }

=================================TOP===============================
Q350: Effects that gradual underflow traps have on scaling.

Dave Butenhof writes:
> Martin Shepherd wrote:
> > By the way, neither in your book, nor in the other POSIX threads books
> > that I have, is there any mention of the devastating effects that
> > gradual underflow traps can have on scaling. I'm not even sure why
> > this occurs, and would like to understand it better. My guess is
> > that if multiple threads are suffering underflows at the same time, as
> > was the case in my program, there is contention for a single underflow
> > handler in the kernel. Is this correct?
>
> Perhaps, in the HP-UX kernel. I don't know. It would depend on whether the
> underflow is handled by hardware or software; and, if in software, precisely
> how and where. If you're reporting underflow traps to the APPLICATION, that's
> certainly a performance sink if you're underflowing much; signal delivery is
> expensive, and certainly doesn't help your application's scaling.

My experience on a number of systems is that gradual underflow is usually performed in software, not in hardware, and this includes expensive workstations and super-computers traditionally used for number crunching. For example, Sun SPARCs, HPs, DEC Alphas, etc. all do this. If this weren't bad enough, there is no standard way to disable it.
In Solaris one calls nonstandard_arithmetic(), on HP one calls fpsetflushtozero(), and I don't know what one does on other systems. Whether gradual-underflow traps are delivered as signals all the way to the application, or whether the kernel handles them I don't know, but regardless, they can increase the run time of any program by large factors, and seriously suppress scaling in parallel programs, so in general it is really important to either avoid them or disable them. In particular, the ability to reliably disable them process-wide, just as a diagnostic aid, is indispensable, because vendors rarely provide tools to monitor them. > This is getting extremely machine-dependent, and therefore it's hard to say > much about it in a general book. Furthermore, even on platforms where it's a > problem, it's only going to affect the (relatively, and maybe absolutely) > small number of FP-intensive applications that do a lot of > underflowing. While it is true that most FP-intensive applications shouldn't underflow, and that good programmers will do their utmost to avoid performing any calculations that might underflow, everybody makes mistakes. In my case, once I worked out how to globally enable sudden underflow across all of my threads, my program speeded up by a factor of 4. This then led me to a bug in the test that was supposed to have prevented the underflowing calculations in the first place, and the end result was a factor of 15 speedup. I agree that this is somewhat specialized and very machine specific, but so are the discussions of memory barriers, and memory caching models that one finds in good books on parallel programming with threads... Martin =================================TOP=============================== Q351: LinuxThreads woes on SIGSEGV and no core dump. > > is there something inherently wrong with my system or is this all > "normal" behaviour? 
i'm using the pthreads shipped with glibc 2.1.2 -
> they might be a bit old, but i don't want to get into a big fight with
> my sysadmin.

I have experienced all sorts of strange errors similar to yours. The workaround is to include this in your program:

    /* pthread_kill_other_threads_np() is a LinuxThreads extension: it
       terminates all other threads before abort() dumps core. */
    void sig_panic(int signo)
    {
        pthread_kill_other_threads_np();
        abort();
    }

    ..

    /* Install the handler for the signals that normally dump core. */
    struct sigaction act;
    memset(&act, 0, sizeof(act));
    act.sa_handler = sig_panic;
    sigaction(SIGSEGV, &act, NULL);
    sigaction(SIGBUS, &act, NULL);
    sigaction(SIGTRAP, &act, NULL);
    sigaction(SIGFPE, &act, NULL);

This produces reliable core dumps, so you can do a post-mortem analysis.

Regards,
Ivan

=================================TOP===============================
Q352: On timer resolution in UNIX.

Under most Unix flavors, user processes (usually) get 10ms timer resolution. This is the interval at which the kernel dispatcher is interrupted by the timer to handle ``asynchronous'' events. When timer-related events (firing, handling, etc.) are bound to the dispatcher `tick', it is not possible to get finer resolution than that.

But there are several exceptions to the above, especially on machines equipped with ``cycle counters''.

IRIX 6.5.x allows privileged processes to call nanosleep() with sub-millisecond resolution. The actual resolution is restricted only by the overhead of dispatching a kernel thread to handle the event. I have seen reaction times in the range of 300-400 microseconds on 200 MHz 2-CPU systems. The same is true for timers (see timer_create()) based on the CLOCK_SGI_FAST timer, which is IRIX specific and thus not portable.

Solaris 8 finally decoupled the handling of timer events from the scheduler tick. One can use the high-resolution cycle counter by specifying CLOCK_HIGHRES as the clock ID in the timer_create(3RT) call. I have seen sub-millisecond resolutions under Solaris 8. Unfortunately nanosleep(3RT) is still bound to the 10ms dispatcher tick.
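A rough way to check what your own system delivers is to ask a clock for its advertised resolution and then time a short nanosleep(). The following is only a sketch, assuming a POSIX system that provides CLOCK_MONOTONIC (an optional clock); the helper names clock_resolution_ns() and measure_sleep_ns() are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical helper: advertised resolution of CLOCK_MONOTONIC in
   nanoseconds, or -1 if the clock cannot be queried. */
static long long clock_resolution_ns(void)
{
    struct timespec res;

    if (clock_getres(CLOCK_MONOTONIC, &res) != 0)
        return -1;
    return (long long)res.tv_sec * 1000000000LL + res.tv_nsec;
}

/* Hypothetical helper: ask nanosleep() for req_ns nanoseconds and return
   how long the call really took (in nanoseconds), or -1 on error. */
static long long measure_sleep_ns(long req_ns)
{
    struct timespec req, rem, before, after;

    assert(req_ns >= 0);
    req.tv_sec = req_ns / 1000000000L;
    req.tv_nsec = req_ns % 1000000000L;

    if (clock_gettime(CLOCK_MONOTONIC, &before) != 0)
        return -1;
    /* Restart the sleep with the remaining time if a signal interrupts it. */
    while (nanosleep(&req, &rem) != 0) {
        if (errno != EINTR)
            return -1;
        req = rem;
    }
    if (clock_gettime(CLOCK_MONOTONIC, &after) != 0)
        return -1;

    return (long long)(after.tv_sec - before.tv_sec) * 1000000000LL
         + (after.tv_nsec - before.tv_nsec);
}
```

Calling measure_sleep_ns(1000000L) a few times and comparing the result with the 1 ms you asked for shows whether sleeps are being rounded up to a dispatcher tick. The advertised resolution and the observed wakeup latency are often quite different numbers.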
On earlier Solaris releases one could change the HZ (or something like that) variable to, say, 1000, in order to obtain a 1 millisecond dispatcher tick. Some people claimed that this can be tuned to 10000, but then the system could spend most of its time servicing timer interrupts.

HP-UX 11.00 supports the 10ms resolution with nanosleep() and timer_create(). One needs a special real-time version of the kernel in order to have access to higher resolution timers.

From a casual perusal of BSD4.4 derivatives (and I think also Linux systems), the best one can get is the 10ms resolution.

In POSIX systems the portable way to request ``high-resolution'' timers is via the CLOCK_REALTIME clock ID in timer_create(), whose resolution is guaranteed to be as small as 10ms. I have not seen any system giving finer resolution than 10ms with timer_create() and CLOCK_REALTIME.

I don't have access to AIX or Tru64 UNIX.

poll(), select(), and sigtimedwait() offer the usual 10ms resolution.

Michael Thomadakis
Computer Science Department
Texas A&M University

Joe Seigh wrote:
> bill davidsen wrote:
>
> > I believe that the resolution of select() is actually 100ms, even
> > though you can set it in us.
>
> I think what you are seeing is probably an artifact of the scheduler. It
> looks like what the system is doing is when the timer pops, the system just
> marks the thread ready and the thread has to wait until the next available
> time slice. On Solaris for programs in the time sharing class this appears
> to be about 10 ms or so. Try timing nanosleep with various settings to
> see this effect.
>
> You might try running the program at real time priority to put it in the
> real time scheduling class and playing with the scheduler's real time
> parameters. However setting up the kernel for real time stuff probably
> increases the kernel overhead significantly, so if you are looking for
> overall system throughput, this is not the way to do it.
>
> For non-timed waits, I've seen a lot less latency. This is probably because
> the pthread implementation chose to pre-empt a running thread. The implication
> of this is that they are rewarding the cooperative processing model though
> possibly at the expense of extra context switching unless you do something
> to alleviate that.
>
> Joe Seigh

=================================TOP===============================
Q353: Starting a thread before main through dynamic initialization.

> c) As my program starts I might start a thread before main because of
> some other file static object's dynamic initialization. This thread
> might acquire my lock xyzzy before that lock is dynamically initialized
> setting xyzzy.locked_ to 1.

My coding policies do not permit this. I recommend that you don't allow it either. Threads should not be started by the initialization or creation of static objects. This just causes too many problems.

For many of my classes, since we know that there is only one thread running before all initialization is complete, we don't bother to mess with any locks, we just bypass them. Fortunately, any thread created after a change to a memory location is guaranteed to see that change.

DS

=================================TOP===============================
Q354: Using POSIX threads on mac X and solaris?

Does anyone know of any advantages or disadvantages of using POSIX threads (pthreads) on Mac OS X and Solaris compared to the native implementations? Do pthreads make calls to the native implementation in both these cases, and is the mapping between pthreads and kernel objects 1:1?

Thanks
Sujeet

I don't know anything about the thread implementation on the Mac. On Solaris, pthreads are roughly equivalent to the so-called Solaris threads implementation. I believe that both APIs sit on top of lower-level calls. The main advantage of using POSIX threads is portability. The other is simplicity.
% Do pthread make call to native implementation in both these cases and is the % maping between pthread and kernel object 1:1 . The mapping between pthreads and the kernel scheduling entity in Solaris depends on what you ask for. Note that you must be careful if you try to use the m:n model, because the Solaris two-level thread scheduler is crap. (this is not related to the API -- it's crap for both pthreads and UI threads). -- On Mac OS X, POSIX threads is the lowest-level threading interface anyone should be calling, at least outside the kernel. The POSIX interface uses Mach threads, and there is an API to create Mach threads -- but it's not very convenient. (You need to create the thread, load the registers with intimate knowledge of the calling standard, including creating a stack and setting it to "bootstrap" the thread.) Also, the Mach API has limited (and inefficient) synchronization mechanisms -- IPC. On Solaris, while libpthread depends on libthread, UI threads isn't really so much a "native implementation"; they're more or less parallel, and happen to share a common infrastructure, which happens (for mostly historical reasons) to reside in libthread. You could consider the LWP layer to be "native threads", but, somewhat like Mach threads, they're really not intended for general use. The POSIX thread API is far more general, efficient, and portable than Mach, UI, or LWP interfaces. Unfortunately, the POSIX thread implementation on Mac OS X is incomplete, (it can't even build half of my book's example programs), and I wouldn't want to count on it for much. (Though I have no evidence that what's there doesn't work.) Still, you wouldn't be any better off working directly with Mach threads. Solaris, by the way, supports both "N to M" and "1 to 1" thread mappings. Solaris 8 has a special library that's always 1 to 1. 
The normal libpthread provides both N to M (Process Contention Scope, or PCS) and 1 to 1 (System Contention Scope, or SCS); though the default is PCS and you can't change the scope of the initial thread. Mac OS X supports only 1 to 1 scheduling. /------------------[ [email protected] ]------------------\ | Compaq Computer Corporation POSIX Thread Architect | | My book: http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/ =================================TOP=============================== Q355: Comments on ccNUMA on SGI, etc. > I have a big problem with my simulation. I am trying to implement a > parallel neural network simulator using SNNS simulator, C language and > POSIX threads. I am using a SGI machine with 8 processors and an IRIX > Origin 2000 system. For more than 2 weeks I am trying to make my code to > run faster on 8 processors than on 2 - but I still can't get any > progress ! [...] The Origin 2000 systems are ccNUMA; that means, unlike traditional SMP multiprocessors, all systems do not have equal access to all memory. Any memory you use will be on one "node" or another. Threads running on that node (each Origin 2000 node has 2 processors, so you're potentially running threads on 4 different nodes) have fast local access. Threads running on other nodes have to go through the network interconnects between nodes. Those interconnects are slower (typically much slower) and also have limited bandwidth. That is, it's probably not possible for 3 nodes to simultaneously access memory in the 4th node without severe performance degradation over the "normal" local access. > I have read now on the IRIX documentation, that the cache memory may be > a very important issue - and that each thread should access the same > subset of data all the time - for good performances. This is not the > case in my program. 
And also, the network (which has around 600 units) > and the connections are created by the main thread and - probably - are > stored on one processor ?! This means that all the others processors are > communicating with this one to get the unit's information ? Is this so > bad ? This can be the only reason for the low performances ? Running with cache is always the best strategy for performance. Modern processors are so much faster than memory, that memory access is the only defining characteristic of program performance. We used to count instructions, or CPU cycles; but all that's irrelevant now. You count memory references, as a first indicator; for detailed information, you need to analyze the cache footprint. Most processors have multiple levels of cache, maybe up to 3, before you hit main memory. The first level delays the instruction pipeline by a couple of cycles. The second may be on the order of 10 cycles, the third 20 to 100 cycles. And, relative to normal processor speeds, if you've got to hit main memory you might as well break for lunch. And that's just LOCAL memory, not remote memory on some other node. > Also, the global list of spikes is updated by all threads - and now I am > wondering where is stored, and how I should store it, in order to have > an efficient use of it. In the same documentation it says that you > should store the used data on the same processor but here the spikes are > inserted by different threads and computed by any of the threads. This > is because the entire simulation is driven by 'events' and time issues - > so any available thread compute the next incoming event. Writing closely packed shared data from multiple threads, even on an SMP, is "bad mojo". When the data lives within the same hardware cache line, all other processors that have written or read the data "recently" need to see that their cached copy is now invalid, and refetch from main memory. That, obviously, is expensive. 
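A common way to keep each thread's frequently written data out of the other threads' cache lines is to pad and align each item to a full line. The sketch below is only an illustration, assuming a 64-byte cache line (check your hardware) and a C11 compiler for _Alignas; the names padded_counter, worker and run_padded_counters are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define CACHE_LINE 64      /* assumption: 64-byte cache lines */
#define NTHREADS   4
#define ITERS      1000000L

/* One counter per thread, padded so that no two counters share a line.
   volatile forces every increment to be a real memory write. */
struct padded_counter {
    volatile long value;
    char pad[CACHE_LINE - sizeof(long)];
};

/* Align the array itself so element i starts exactly on a line boundary. */
static _Alignas(CACHE_LINE) struct padded_counter counters[NTHREADS];

static void *worker(void *arg)
{
    struct padded_counter *c = arg;

    for (long i = 0; i < ITERS; i++)
        c->value++;            /* touches only this thread's cache line */
    return NULL;
}

/* Runs NTHREADS workers and returns the sum of their counters,
   or -1 if a thread could not be created. */
static long run_padded_counters(void)
{
    pthread_t tid[NTHREADS];
    int created = 0;
    long total = 0;

    for (int i = 0; i < NTHREADS; i++) {
        counters[i].value = 0;
        if (pthread_create(&tid[i], NULL, worker, &counters[i]) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++) {
        pthread_join(tid[i], NULL);
        total += counters[i].value;
    }
    return created == NTHREADS ? total : -1;
}
```

Removing the pad member (so the four counters sit next to each other) gives the same result but usually runs noticeably slower on a multiprocessor, which is the effect described here. No locking is needed in this sketch because each thread writes only its own counter and the sum is taken after pthread_join().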
When all of your threads are writing to the same cache line continuously, the data in that line "ping pongs" between processor caches. This is the number 1 program characteristic that leads to the old "my program runs faster without threads". (The second, and less subtle, is overuse of application synchronization.) And remember, in a ccNUMA system like yours, anything dealing with memory is far worse unless the memory is local to your node. Obviously, memory shared by all your threads cannot possibly be local to all of them unless you're using only a fraction (2) of the available processors. That is very likely why you ran into the magic number "2 threads (processors)". When you're using only 2, the system can keep both of them, and their memory, on the same node. Beyond 2, that's impossible. I'm not sure how IRIX manages your memory in this case. Some systems might automatically "stripe" memory that's not otherwise assigned across all the nodes. (If there's enough data to do that.) That may tend to even out the non-local memory references, and can often perform better than simply putting all the memory into one node. On the other hand, memory that's not explicitly assigned is often allocated on the first node to reference the memory; and if your startup initializes your entire data array (or allocates it all from malloc), then it's likely that the entire data set IS on the node where you started. Which means that the other 3 nodes are beating on its interconnect port continuously, and you're operating in the worst case performance mode. The best strategy (if you can) would be to explicitly target memory to specific nodes along with two specific threads that will be doing all (ideally) or most of the access to that memory. (In your case, this probably isn't possible; but the closer you come, the better your performance will be.) Even making sure that your global arrays are striped might help. 
In fact, even making sure that they're allocated from two nodes instead of just one might double your performance. I'm not familar with the IRIX APIs for assigning memory (or threads) to specific ccNUMA nodes, but such things must exist, and you might consider looking them up and giving it a try. Otherwise, you might consider limiting your application to a single node. Given that your application sounds pretty heavily CPU bound with relatively little I/O, you're unlikely to gain any advantage in that case from more than 2 threads. (The more blocking I/O you do, the more likely it is that additional threads will improve throughput.) If you can split the dataset more or less in half, you might consider doing that across 2 nodes, with 4 threads, and see how that works. Just as optimizing threaded performance has started to go from pure black magic to something that's almost engineering, along comes ccNUMA and breaks all the rules and brings back that element of magic. Welcome to the bleeding edge, and... good luck. ;-) /------------------[ [email protected] ]------------------\ | Compaq Computer Corporation POSIX Thread Architect | | My book: http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/ Origin 2000 is rather old at this point. Origin 3000 is the current system, and its memory system is even less NUMA than the Origin 2000. > [...good stuff snipped...] > I'm not familar > with the IRIX APIs for assigning memory (or threads) to specific > ccNUMA nodes, but such things must exist, and you might consider looking > them up and giving it a try. Indeed, one has complete control over where memory is placed. > [...more snipped...] > Just as optimizing threaded performance has started to go from pure black > magic to something that's almost engineering, along comes ccNUMA and > breaks all the rules and brings back that element of magic. Welcome to the > bleeding edge, and... good luck. 
;-) I'd hardly call NUMA bleeding edge after all these years. One thing Dave didn't bring up is "execution vehicle"-pthread affinity. In an M-on-N pthread implementation the kernel schedules the "execution vehicles" and the library schedules the pthreads onto them. The kernel is cache+memory affinity aware and tries to schedule execution vehicles to maximize affinity, while trying to be fair, schedule real time threads, etc. The library has to avoid deadlock, observe priorities, and schedule what could be far more threads than execution vehicles. What can happen is that IRIX may nicely place say 5 execution vehicles on the 4 CPUs in one C-brick, and 1 CPU in another "nearby" C-brick, and leave them there, maximizing affinity, but the library, for a variety of reasons, may end up moving pthreads around on these execution vehicles in a way that is not affinity friendly. For CPU intensive applications this may be a performance issue, so the library provides a nonportable scope: PTHREAD_SCOPE_BOUND_NP to bind a pthread to an execution vehicle. For realtime and applications which typically run alone on a system the library provides a nonportable call: pthread_setrunon_np() to force a bound (or system scope) thread to run on a particular CPU. I understand that Sun recently released an alternate version of its pthread library which has a N-on-N implementation. I'd guess they did this because of the same affinity issue. Does anyone know different? =================================TOP=============================== Q356: Thread functions are NOT C++ functions! Use extern "C" Patrick TJ McPhee wrote: > > In article , > Doug Farrell wrote: > > % And again you refer to 'the standard C++ thread function', what are you > % talking about? > > There isn't any, but do use it if you don't want to pass a C function. 
>
> [Cry of frustration followed by general elucidation omitted]
>
> If it causes you emotional distress to create a C function,
> then use the standard C++ thread class (keeping in mind that there
> isn't one).

Just so's this doesn't go on and on and on: Patrick, is it fair to assume that you are ladling on the irony here?

Doug, the essence of what has been said so far is this: pthread_create's prototype in C++ is:

    extern "C" int pthread_create(pthread_t *, const pthread_attr_t *,
                                  void *(*start_routine)(void *), void *);

See that `extern "C"'? That covers _all_ function types in the declaration; in particular the start_routine function pointer, whose type is actually

    extern "C" void *(*)(void *);

that is, `pointer to C function taking void * and returning void *'. By passing a function whose C++ prototype is:

    class SomeClass {
        // ...
        static void *threadfn(void *);
    };

or just:

    void *threadfn(void *);

(therefore, &threadfn is `pointer to C++ function taking void * and returning void *'), you are invoking undefined behaviour. Your implementation is now allowed to activate your modem and phone the speaking clock in Ulan Bator, amongst other things. You _know_ you _mustn't_ invoke undefined behaviour, just as you _know_ that unless you feel obliged to by current compilers' handling of implicit template instantiation, you shouldn't put the implementation in the header file...*

In short, using POSIX threads, you cannot make the function you pass to pthread_create a member of the class. Period. Put it in the implementation file inside an anonymous namespace, or use a global static, and pass it a pointer to the class as its argument, like this:

    class SomeClass {
    public:
        void *threadfn();
    };

    // A linkage specification cannot be combined with `static' on one
    // declaration, so the static function goes inside an extern "C" block.
    extern "C" {
        static void *threadfn(void *args)
        {
            SomeClass *pSomeClass = static_cast<SomeClass *>(args);
            return pSomeClass->threadfn();
        }
    }

Guy (not saying anything further just in case it starts another pointless "Standard C++ is broken with respect to threading--oh no it isn't--oh yes it is etc. ad nauseam" thread).
*Ask yourself: given a header file containing the implementation, or a library/object file containing the implementation, what must my users do if I change the implementation?

=================================TOP===============================
Q357: How many CPUs do I have?

    /* Linux (needs <unistd.h>) */
    NoOfCpus = sysconf(_SC_NPROCESSORS_CONF);

    /* NT (needs <windows.h>) */
    SYSTEM_INFO SystemInfo;
    GetSystemInfo(&SystemInfo);
    NoOfCpus = SystemInfo.dwNumberOfProcessors;

In my experience, for busy CPU-bound threads the number of threads should not greatly exceed the number of available processors. That delivered the best performance.

[email protected]

Victor Khomenko wrote in message news:[email protected]...
> Hi,
>
> I want to make the number of working threads depend on the number of
> processors in the system. Is there a good way to find out this information
> (run time)? How many threads per processor is a good ratio (all threads are
> going to be pretty busy, but can sometimes wait on mutexes and conditions)?
>
> I need this information for Linux and Win32.
>
> Victor.

=================================TOP===============================
Q358: Can malloc/free allocate from a specified memory range?

> Using mmap to share data between processes leads to the requirement to
> dynamically allocate and free shared memory blocks. I don't want to
> mmap each block separately, but prefer to allocate and free the memory
> from within the mapped region. Is there a way to redirect malloc/free
> library functions to allocate from a specified memory range, instead of
> the heap?
>
> I don't want to mmap each block separately or to use shmget because of
> the cost of so many mappings.
>
> -K

The mmalloc package at:

http://sources.redhat.com/gdb/5/onlinedocs/mmalloc.html

might be a good starting point.

HTH,
--ag

=================================TOP===============================
Q359: Can GNU libpth utilize multiple CPUs on an SMP box?
>> > Are there any existing patches that can make GNU libpth utilize
>> > multiple CPUs on an SMP box?
>> I recall that IBM is doing something much like it. I cannot remember
> Can you give me some clues to find it? I've tried google, but it
> returned either too many or no results.

Here is the URL:

http://oss.software.ibm.com/developerworks/opensource/pthreads/

bye, Christof

=================================TOP===============================
Q360: How does Linux pthreads identify the thread control structure?

R Sharada wrote:
> I have a query related to how the Linux pthreads implementation
> identifies the thread control structure or descriptor for the current thread,
> in the case when the stack is non-standard ( by way of having called a
> setstackaddr /setstacksize ).

First off, don't ever use pthread_attr_setstackaddr(), because it's a completely brain-damaged interface that's inherently broken and totally nonportable. I've explained why it's broken (both in terms of engineering features and political history), and I won't repeat it here. (You can always search the archives.) Just don't use it.

The next version of POSIX and UNIX (2001) contains my corrected version, which ended up being named pthread_attr_setstack(). At some point, this will begin to appear on Linux and other systems.

> Currently the method ( in thread_self
> routine ) just parses through the whole list of threads until one
> matches the current sp and then obtains the descriptor from there. This could
> get quite slow in conditions where there are a lot of threads ( close
> to max ). Isn't there a better way to do this?

No; not for Linux on X86.

> Does anyone know how this is handled in other UNIXes - AIX, Solaris,
> etc.??

The best way to handle it is to define in the hardware processor context a bit of information that's unique for each thread. SPARC and IA-64 define a "thread register" that compilers don't use for anything else, but can be read by assembly code or asm().
Alpha defines a "processor unique" value that can be read by a special instruction. I believe that PowerPC has one or the other of those techniques, as does MIPS. LinuxThreads can and should use these mechanisms when built for the appropriate hardware; but on X86 (which is still the most common Linux platform), none of this is an option. Of course, "the system" could define a universal calling standard that reserved from the compiler code generators "a register" that could be used in this way. However, the X86 register set is pretty small already, and, in any case, trying to make that change NOW would be a major mistake since you couldn't use any existing binary code (or compilers). /------------------[ [email protected] ]------------------\ | Compaq Computer Corporation POSIX Thread Architect | | My book: http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/ =================================TOP=============================== Q361: Using gcc -kthread doesn't work?! > i have a multithreaded program on a dual-pentium machine running freebsd > 4.3. compiling everything with > > gcc -pthread ... > > works fine, but doesn't realy make the second processor worth the money > (i.e. everything runs on one thread). according to 'man gcc' compiling > with > > gcc -kthread > > should fix the problem. unfortunately, gcc tells me it doesn't recognise > the option. in a message on mailing.freebsd.bugs i read that for freebsd > 4.2 one had to recompile gcc with the appropriate arguments set. i did a > make and make install in /usr/src/gnu/usr.bin/cc, but i couldn't add any > options and the compiler turned out just the same as the last... > > anybody know what i should do here? 
As you may have already seen, there's a FreeBSD bug report on this:

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=24843

Here are the comments in the "Audit-Trail" section at the bottom of the page:

"The -kthread link flag was purposely removed, since linuxthreads is not part of the base system. There are explicit instructions that come with the linuxthreads port that explain how to link with linuxthreads."

> p.s. i don't want to start a flame-war on linuxthreads vs. whatever -
> the purpose of compiling under freebsd is to be able to tell for myself
> which os is best for my needs ;)

You may find "Kernel-Scheduled Entities for FreeBSD" interesting reading:

http://people.freebsd.org/~jasone/refs/freebsd_kse/freebsd_kse.html

I once did a test with linuxthreads (available under /usr/ports/devel) on a dual-CPU FreeBSD system and my test program successfully used both processors. However, my test program was trivial, so I'd want to do a lot more testing before I'd put anything more complicated into production. As you may know from reading this newsgroup, there's some criticism of the linuxthreads model. But at least it lets threaded programs use multiple CPUs on FreeBSD :-)

--
Michael Fuhr

=================================TOP===============================
Q362: FAQ or tutorial for multithreading in 'C++'?

For the Win32 API only:

MSDN Library (with samples and function documentation):
http://msdn.microsoft.com/library/devprods/vs6/visualc/vccore/_core_multithreaded_programs.3a_.overview.htm

Thread function documentation:
http://www.clipcode.com/content/win32_3.htm

If you speak French:
http://perso.libertysurf.fr/chez_moe/programmation/index.html

Tomasz Bech wrote in message: [email protected]...
> Hi,
> Does anybody know about a good FAQ or tutorial for multithreading in
> 'C++'?
> Thanks,
> Tomasz

=================================TOP===============================
Q363: WRLocks & starvation.
> "Dave Butenhof" wrote in message
> news:[email protected]...
>
> The UNIX 98 standard (and the forthcoming POSIX 1003.1-2001 standard)
> includes POSIX read-write lock interfaces. You'll find these interfaces implemented
> (at least) on AIX 4.3.x, Solaris 8, Tru64 UNIX 5.0, and any moderately recent
> version of Linux. Earlier versions of Solaris and Tru64 UNIX also provided
> different nonstandard interfaces for read-write locks.
>
> I guess these implementations will take care of classic problems like
> starvation of the writer, don't they?

Sure. If they want to. In whatever manner thought best by the designers. (Or in whatever way the code happened to fall out if they didn't bother to think about it.)

Even the POSIX standard read-write lock doesn't require any particular preference between readers and writers. Which (if any) is "right" depends entirely on the application. Preference for readers often results in improved throughput, and is frequently better when you have rarely updated data where the USE of the data is substantially more important than the updates. (For example, the TIS read-write locks on Tru64 UNIX were developed specifically to replace a completely broken attempt to do it using a single mutex in the libc exception support code. It used the construct to manage access to the code range descriptor list for stack unwinding; or to update it with a newly loaded or generated code range. Read preference was appropriate, and sufficient.)

Write preference can be better when you don't care principally about "throughput", or where multiple readers are really relatively rare; and where operating on stale data is worse than having the readers wait a bit. (Or where you simply cannot tolerate the possibility of a starving writer wandering the streets.)
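To make "write preference" concrete, here is a minimal sketch of a writer-preferring lock built from a mutex and two condition variables. It is not how any particular vendor implements pthread_rwlock_t; the wp_* names and the demo functions are hypothetical, error checking of the mutex and condition calls is omitted, and cancellation is ignored.

```c
#include <assert.h>
#include <pthread.h>

struct wp_rwlock {
    pthread_mutex_t m;
    pthread_cond_t readers_ok, writers_ok;
    int active_readers, active_writer, waiting_writers;
};
#define WP_RWLOCK_INITIALIZER \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, \
      PTHREAD_COND_INITIALIZER, 0, 0, 0 }

static void wp_read_lock(struct wp_rwlock *l)
{
    pthread_mutex_lock(&l->m);
    /* Write preference: new readers also yield to WAITING writers. */
    while (l->active_writer || l->waiting_writers > 0)
        pthread_cond_wait(&l->readers_ok, &l->m);
    l->active_readers++;
    pthread_mutex_unlock(&l->m);
}

static void wp_read_unlock(struct wp_rwlock *l)
{
    pthread_mutex_lock(&l->m);
    if (--l->active_readers == 0 && l->waiting_writers > 0)
        pthread_cond_signal(&l->writers_ok);   /* last reader out */
    pthread_mutex_unlock(&l->m);
}

static void wp_write_lock(struct wp_rwlock *l)
{
    pthread_mutex_lock(&l->m);
    l->waiting_writers++;
    while (l->active_writer || l->active_readers > 0)
        pthread_cond_wait(&l->writers_ok, &l->m);
    l->waiting_writers--;
    l->active_writer = 1;
    pthread_mutex_unlock(&l->m);
}

static void wp_write_unlock(struct wp_rwlock *l)
{
    pthread_mutex_lock(&l->m);
    l->active_writer = 0;
    if (l->waiting_writers > 0)
        pthread_cond_signal(&l->writers_ok);    /* writers go first */
    else
        pthread_cond_broadcast(&l->readers_ok); /* then all readers */
    pthread_mutex_unlock(&l->m);
}

/* Small demo: two writers bump a value, two readers watch it grow. */
static struct wp_rwlock demo_lock = WP_RWLOCK_INITIALIZER;
static long demo_value;

static void *demo_writer(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        wp_write_lock(&demo_lock);
        demo_value++;
        wp_write_unlock(&demo_lock);
    }
    return NULL;
}

static void *demo_reader(void *arg)
{
    long last = 0;
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        wp_read_lock(&demo_lock);
        assert(demo_value >= last);   /* value never goes backwards */
        last = demo_value;
        wp_read_unlock(&demo_lock);
    }
    return NULL;
}

/* Returns the final value (20000 if all went well), or -1 on failure. */
static long wp_demo(void)
{
    pthread_t t[4];
    int n = 0;

    demo_value = 0;
    while (n < 4 && pthread_create(&t[n], NULL,
                                   n < 2 ? demo_writer : demo_reader,
                                   NULL) == 0)
        n++;
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    return n == 4 ? demo_value : -1;
}
```

Dropping the waiting_writers test from wp_read_lock() turns this into a read-preferring lock, which is exactly the variant in which a steady stream of readers can starve a writer. With the test in place the reverse is possible: a steady stream of writers can starve readers.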
A generally good compromise is a modified FIFO where adjacently queued readers are batched into a single wakeup; but that still constrains reader concurrency over read preference and increases data update latency over writer preference. Like all compromises, the intention is more to keep both sides from being angry enough to launch retaliatory nukes, rather than to make anyone "happy". It does avoid total starvation, but at a cost that may well be unacceptable (and unnecessary) to many applications. It wouldn't make sense for the standard to mandate any of those strategies. Partly because none of them is "best" for everyone (or even for ANYone). Partly because there are probably even better ideas out there that haven't been developed yet, and it makes no sense to constrain experimentation until and unless a clear winner is "obvious". (For example, had the standard specified a strategy, it would have been either reader or writer, not "modified FIFO", because the latter wasn't in wide use at the time.) We considered a read-write lock attribute to specify strategy. We decided that this would be premature. While we've advanced a bit in the intervening time, I think it would still be premature. Though of course individual implementations are welcome (and even encouraged) to experiment with such an attribute. If some set of strategies become relatively common practice, the next update of POSIX and UNIX (probably 2006) could consider standardizing it. > I enjoyed the discussion about how to implement condition variables in > Win32. What would the windows implementation of these read-write lock > interfaces look like? Probably already done, somewhere. Go look! I don't even want to THINK about it. (But then, I feel that way about anything Windows-ish. Everyone, except possibly Bill Gates, would be better off without Windows.) 
/------------------[ [email protected] ]------------------\
| Compaq Computer Corporation POSIX Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/

=================================TOP===============================
Q364: Reference for threading on OS/390.

Gee M Wong wrote:
> I've got a new project starting up, and it has been over a decade since
> I last wrote a C/C++ program on the mainframe. Would someone please
> suggest a current library and reference for threading on OS/390
> (preferably Pthread).

http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/CBCPG030/4.3
"4.3 Chapter 23. Using Threads in an OS/390 UNIX Application..."

http://www.ibm.com/software/ad/c390/cmvsdocs.htm
"OS/390 C/C++ Library Start here to access the OS/390 C/C++ publications available on the Web..."

=================================TOP===============================
Q365: Timeouts for POSIX queues (mq_timedreceive())

> > While a thread is waiting for a message to arrive in a message queue,
> > using mq_receive(), I'd like to have a way to unblock the thread when
> > after a certain timeout no message has arrived. In pSOS the timeout is
> > a parameter of the q_receive() call.
> > Is this also possible using POSIX queues?
>
> Well, sort of. The POSIX 1003.1d-1999 amendment to POSIX 1003.1-1996
> includes an mq_timedreceive() function that allows you to specify a
> timeout. However, it's not widely implemented yet, and likely won't be
> available on your platform. (You haven't said what your platform is;
> "POSIX" doesn't help much since there's no such operating system!)

You're right. Actually, the software should work on multiple platforms, Linux and pSOS being the most important. mq_timedreceive() is not implemented in pSOS.

>
> > If not, is there a work around for this problem?
> > You could always create a thread that waited for the specified interval > and then sends a special message to the queue, awakening a waiter. Yes, I tried that one. It works, but I wondered if there is a more elegant way to do this. As you pointed out, this is mq_timedreceive() (maybe implement my own mq_timedreceive for pSOS?) > You could also interrupt it with a signal, causing an EINTR return from > mq_receive(); though that imposes a number of complications, including > deciding what signal number to use, what happens (what you want to > happen) when the thread isn't actually waiting in mq_receive(), and so > forth. > > You can't use alarm(), because the signal it generates isn't directed at > any particular thread but rather to the process as a whole. (Although you > can get away with it if you control the main program, so that you can > ensure SIGALRM is blocked in all threads except the one you want to > interrupt.) Thanx for that one. I have to check if alarm() is supported by pSOS. > > If you can wait for the platforms you care about to implement > mq_timedreceive(), that'd be the best solution. Otherwise... choose your > hack. > > /------------------[ [email protected] ]------------------\ > | Compaq Computer Corporation POSIX Thread Architect | > | My book: http://www.awl.com/cseng/titles/0-201-63392-2/ | > \-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/ =================================TOP=============================== Q366: A subroutine that gives cpu time used for the calling thread? > I would like to write a subroutine that gives cpu time used for the calling > thread. I used times (I'm under Tru64 V5.0 using pthread), and it returns a > cumulative cpu time, not the cpu time for the given thread. Any suggestions ? The 1003.1d-1999 amendment to POSIX added optional per-thread clock functions; but I doubt they're implemented much of anywhere yet. (And definitely not on Tru64 UNIX.) 
Where implemented, you'll find that <unistd.h> defines _POSIX_THREAD_CPUTIME, and you could call clock_gettime() with the clock ID CLOCK_THREAD_CPUTIME_ID (for the calling thread), or retrieve the clock ID for an arbitrary thread (for which you have the pthread_t handle) by calling pthread_getcpuclockid(). (I'd like to support this, and a lot of other new stuff from 1003.1d-1999 and 1003.1j-2000, as well as the upcoming UNIX 2001. But then, there are a lot of other things we'd like to do, too, and I couldn't even speculate on time frames.)

Whether there are any alternatives or "workarounds" depends a lot on what you're trying to accomplish.

In any case, times() MUST return process time, not thread time. That's firmly required by the standard. Otherwise, times() would be broken for any code that wasn't written to know about threads; which is most of the body of UNIX software.

/------------------[ [email protected] ]------------------\
| Compaq Computer Corporation POSIX Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/

=================================TOP===============================
Q367: Documentation for threads on Linux

> "Dan Nguyen" wrote in message
> news:[email protected]...
> > Robert Schweikert wrote:
> > > I am looking for some documentation for threads on Linux. What I am
> > > after is some idea what is implemented, what works, what doesn't. Where
> > > the code is, and what is planned for the future.
> >
> > Linux uses a 1-1 type threading model. LinuxThreads as it is known is
> > a kernel level thread using the clone(2) system call (only available
> > in Linux, and don't use it yourself). It implements the pthread
> > library, so any pthread application should run correctly.
> >
>
> I am anything but an expert on this, but it seems pthread is not fully
> implemented on Linux.

This is correct.
The essential problem is that clone() doesn't, currently, support the creation of multiple THREADS within a single PROCESS. Instead, it creates multiple PROCESSES that share a single ADDRESS SPACE (and other resources). The basic distinction is that each clone()d process has its own pid and signal actions, and that they lack a shared pending signal mask. While these deficiencies can be critical for some code, the LinuxThreads implementation does enough extra work "under the covers" that most threaded applications won't notice. There are people working on solving the problems, so you can expect them to be "short term". > Have a look at: comp.os.linux.development.apps The thread from the 11th June > 2001 called "sharing Pthread mutexes among processes". POSIX provides an OPTION supporting "pshared" synchronization objects, that can be used between processes. Implementations need not provide every option to be "POSIX". If by "full POSIX" you choose to mean "an implementation correctly and completely providing all mandatory and optional features and behaviors", then I doubt any exist. /------------------[ [email protected] ]------------------\ | Compaq Computer Corporation POSIX Thread Architect | | My book: http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/ =================================TOP=============================== Q368: Destroying a mutex that was statically initialized. Ross Smith wrote: > David Schwartz wrote: > > > > > Thanks. Apparently even Mr Butenhof makes the occasional mistake :-) Like R2D2, I have been known to make mistakes, from time to time. Still, this particular example isn't one of them. What I actually said was "You do not need to destroy a mutex that was statically initialized using the PTHREAD_MUTEX_INITIALIZER macro." And you don't. You CAN, if you want to; but you don't need to. Why should you? It's static, so it never goes out of scope. 
You can't have a memory leak, because the little buggers can't reproduce. If you want to destroy one, and even dynamically initialize a new mutex at the same address, have at it.

> > Let me point out one more thing: It really doesn't make sense to
> > attempt to statically initialize something that's dynamically created.
> > So you shouldn't be statically initializing a mutex that isn't global
> > anyway. And if it's global, you should never be done with it.
> >
> > Can you post an example of a case where you are done with a statically
> > initialized mutex where it isn't obvious that dynamic initialization is
> > better?
>
> Any case where the mutex isn't global, I would have thought.
>
> void foo() {
>     pthread_mutex_t mutex(PTHREAD_MUTEX_INITIALIZER);
>     // ...
>     pthread_mutex_destroy(&mutex);
> }

You can't do that. It's illegal. You can ONLY use the POSIX static initializers for STATICALLY ALLOCATED data. Never mind that compilers will happily compile the broken code: that doesn't mean it's not broken any more than being able to compile "x=0;z=y/x;" means you should expect it to work. You're violating POSIX rules. The resulting code MAY work (at least sometimes, or "appear to work" in some situations) on SOME implementations, but it is not legal POSIX code and isn't guaranteed to work anywhere.

Of course, this is legal:

    void foo() {
        static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
        // ...
    }

But that's not the same thing. ALL invocations of foo() share the same global mutex.

Private mutexes aren't much good, anyway. Your example is pointless unless foo() is creating threads and passing the address of "mutex" to them for synchronization; in which case it had better also be sure all threads are DONE with the mutex before returning. It must also use pthread_mutex_init() to initialize "mutex", and pthread_mutex_destroy() to destroy it before returning.
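
As a rough sketch of that last case (not taken from the original post), the function below keeps the mutex in automatic storage, initializes it with pthread_mutex_init(), hands its address to a few threads, joins them all, and only then destroys it. The names "struct shared", "worker", NUM_WORKERS and INCREMENTS are made up for illustration, and foo() is given a return value here purely so the result can be checked.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4     /* hypothetical thread count */
#define INCREMENTS  1000  /* hypothetical work per thread */

/* Hypothetical shared state handed to every worker. */
struct shared {
    pthread_mutex_t *mutex;
    long counter;
};

static void *worker(void *arg)
{
    struct shared *s = arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(s->mutex);
        s->counter++;
        pthread_mutex_unlock(s->mutex);
    }
    return NULL;
}

/* Returns the final counter value, or -1 if setup failed. */
long foo(void)
{
    pthread_mutex_t mutex;            /* automatic storage: no static initializer */
    struct shared s = { &mutex, 0 };
    pthread_t tid[NUM_WORKERS];
    int started = 0;
    long result;

    if (pthread_mutex_init(&mutex, NULL) != 0)
        return -1;

    for (int i = 0; i < NUM_WORKERS; i++) {
        if (pthread_create(&tid[i], NULL, worker, &s) != 0)
            break;
        started++;
    }

    /* Every thread must be DONE with the mutex before it is destroyed. */
    for (int i = 0; i < started; i++) {
        int rc = pthread_join(tid[i], NULL);
        assert(rc == 0);
        (void)rc;
    }

    result = (started == NUM_WORKERS) ? s.counter : -1;
    pthread_mutex_destroy(&mutex);
    return result;
}
```

Joining before pthread_mutex_destroy() is what makes it safe for the mutex to go out of scope when foo() returns.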
/------------------[ [email protected] ]------------------\ | Compaq Computer Corporation POSIX Thread Architect | | My book: http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/ > > I am looking for some documentation for threads on Linux. What I am > after is some idea what is implemented, what works, what doesn't. Where > the code is, and what is planned for the future. > The best resourse for Linuxthreads documentation is in the info pages for libc ('info libc') -- under the headings ADD-ONS -> `POSIX Threads'. HTH, Artie Gold, Austin, TX I found cprof at http://opensource.corel.com/cprof.html very useful. Regards, Erik. On Wed, 13 Jun 2001, stchang wrote: > We are developing muti-thread program code. However, it does not have > good performance. The performance is about 1.5X compare with > non-mutithread code. Sometime, it slower than non-mutithread code. Does > someone give me some idea about how to profile muti-thread code or > analysis thread? > > Thanks! stchang wrote: > We are developing muti-thread program code. However, it does not have > good performance. The performance is about 1.5X compare with > non-mutithread code. Sometime, it slower than non-mutithread code. Does > someone give me some idea about how to profile muti-thread code or > analysis thread? The first, and often the best tool to apply is common sense. You don't say on what hardware (or OS) you're running. Actually, for an untuned application, if you're running on a 2-CPU system, 1.5X speedup isn't at all bad. However, a performance decrease isn't particularly surprising, either. It means you're not letting the threads run in parallel. There are many possible reasons, some of which are due to "quirks" of particular implementations. (For example, on Solaris, you need to use special function calls to convince the system you're on a multiprocessor.) 
The most common reasons are that your application is designed to "wait in parallel". Contention for a common resource is the most common problem. For example, all threads do all (or nearly all) their work holding one particular application mutex. No matter how many threads you have, they can't do anything significant in parallel, and they waste time in synchronization and context switching. Guaranteed to perform worse than single-threaded code, trivial to write. The contention may not even be in your code. If they all do I/O to a common file (stream or descriptor), they will spend time waiting on the same kernel/C/C++ synchronization. If that I/O drives the performance of the application, you lose. The problem might even be in how you're using your hardware. When processors in an SMP or CC-NUMA system repeatedly write the same "block" of memory, they need to synchronize their caches. If all your threads are busily doing nothing else on all available processors, you can reduce those processors to doing little but talking to each other about what cache data they've changed. Adding threads doesn't make an application "go faster". Careful design makes it go faster. Careful and appropriate use of threads is one TOOL a developer can use when designing a faster application. But it affects every aspect of the design (not just implementation) of the code. Sometimes you do need to analyze the behavior of running code, and it's nice to have good tools. (If you can run on Tru64 UNIX or OpenVMS, by the way, Visual Threads does an awesome job of helping you to understand the synchronization behavior of your program.) Regardless of the tools, though, good performance comes from careful design and thorough understanding of what your application does, and the demands it places on the OS and hardware; the sooner in the design cycle you accomplish this, and the more completely you apply the knowledge, the better the results will be. 
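
The "one big mutex" problem described above can be illustrated with a small sketch. It is only an illustration under assumed names (parallel_sum, sum_slice, total_lock and NTHREADS are all hypothetical): each thread does its real work on private data and takes the shared mutex once, briefly, to publish its result, instead of holding the lock for the whole computation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4  /* hypothetical thread count */

/* File-scope mutex, so the static initializer is legal here. */
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;
static long total;

struct slice {
    const int *data;
    size_t len;
};

static void *sum_slice(void *arg)
{
    const struct slice *s = arg;
    long local = 0;

    /* The bulk of the work touches only private data: no lock held. */
    for (size_t i = 0; i < s->len; i++)
        local += s->data[i];

    /* The shared mutex is held only for one short update. */
    pthread_mutex_lock(&total_lock);
    total += local;
    pthread_mutex_unlock(&total_lock);
    return NULL;
}

/* Sums data[0..n-1] using NTHREADS threads.
 * Not meant to be called concurrently: it uses one shared total. */
long parallel_sum(const int *data, size_t n)
{
    pthread_t tid[NTHREADS];
    int created[NTHREADS];
    struct slice part[NTHREADS];
    size_t chunk = n / NTHREADS;
    long result;

    pthread_mutex_lock(&total_lock);
    total = 0;
    pthread_mutex_unlock(&total_lock);

    for (int i = 0; i < NTHREADS; i++) {
        part[i].data = data + (size_t)i * chunk;
        /* The last slice also takes the remainder. */
        part[i].len = (i == NTHREADS - 1) ? n - (size_t)i * chunk : chunk;
        created[i] = (pthread_create(&tid[i], NULL, sum_slice, &part[i]) == 0);
        if (!created[i])
            sum_slice(&part[i]);  /* could not start a thread: do the work here */
    }

    for (int i = 0; i < NTHREADS; i++) {
        if (created[i]) {
            int rc = pthread_join(tid[i], NULL);
            assert(rc == 0);
            (void)rc;
        }
    }

    pthread_mutex_lock(&total_lock);
    result = total;
    pthread_mutex_unlock(&total_lock);
    return result;
}
```

Moving the loop inside the locked region would give the same answer but would serialize the threads, which is the "wait in parallel" design the answer warns about.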
/------------------[ [email protected] ]------------------\
| Compaq Computer Corporation POSIX Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/

=================================TOP===============================
Q369: Tools for debugging overwritten data.

> Oh btw, suppose I've got a local variable in a thread function, and I KNOW
> something overflows or overwrites it (when NOT supposed to happen), is there
> a way to find out who trashes it ?

There are several tools for it:

- purify by Rational Software (www.rational.com)
  Very good tool, but expensive.
- Insure by Parasoft
  Very good, has a few quirks but nothing serious. Can catch illegal parameters to system calls too.

If your budget can't handle the above tools, or if you can limit the trashing to the heap, you can look into:

- electric fence
  Freeware, pretty good debug-heap. I have encountered a few problems with fork()ing multithreaded programs under Solaris, though.
- miscellaneous debug heaps
- idh (www.platypus.adsl.dk/idh/index.html) (disclaimer: I wrote it)

=================================TOP===============================
Q370: POSIX synchronization is limited compared to win32.

On Fri, 20 Apr 2001 23:28:55 -0400, Timur Aydin wrote:
>Hello everybody,
>
>After quite some time doing multithreaded programming under win32, I have
>now started to do development under Linux using LinuxThreads. However, I am
>noticing that the synchronization objects are quite limited compared to the
>ones under win32.

However, the nature and variety of the objects provided by Win32 leaves much to be desired. Events are simply not very suited for solving a wide variety of synchronization problems. It's a lot easier to solve synchronization problems with condition variables because they have no programmer visible state. The logic is based entirely on the state of your own data.
Objects like events or semaphores carry their own state; to solve a synchronization problem, the programmer must bring about some meaningful association between the semaphore's state and the state of the program. In my programming experience, such associations are fragile and difficult to maintain.

>As far as I have learned, it is not possible to do a timed
>wait on a mutex or a semaphore.

Timed waits on mutexes are braindamaged for most kinds of work. They are useful to people working in the real-time domain, so the 200X draft of POSIX has added support for timed mutex waits---it was due to pressure from some real time groups, apparently. In real time applications, the duration of a critical region of code may be determined precisely, so that a timed out mutex wait can serve as a watchdog.

You can find the implementation of pthread_mutex_timedlock in glibc 2.2. For reasons of efficiency, not every mutex type supports this operation, just the default one. Glibc 2.2 also adds barriers, and the POSIX timer functions: timer_create and friends.

Also realize that the Linux pthread_mutex_t is a lot closer to the Windows CRITICAL_SECTION than to the Windows mutex. Note that there is no timed lock function for critical sections!

>Also, while under win32 the synchronization objects can have both
>interprocess and intraprocess scope, under linux the only object that can do
>this is the semaphore.

The traditional UNIX semaphore, that is.

>So you can't have a mutex or a condition object that
>can be accessed by separate processes.

There is a provision in the POSIX interface for process shared mutexes and conditions, but it's not implemented in Linux.

>And, lastly, it is not possible to
>wait on multiple objects simultaneously.

Again, this is a braindamaged concept to begin with, and severely limited in Windows (only 64 handles can be waited on).
Not to mention that the WaitForMultipleObjects function is broken on Windows CE, so it cannot be considered portable across all Win32 platforms. Lastly, it has fairness issues: under the ``wait for any'' semantics, the interface can report the identity of at most one ready object, regardless of how many are actually ready. This can lead to one event being serviced with priority over another one, depending on its position in the array.

With condition variables, your program is waiting for a *predicate* to become true. The condition variable is just a place to put the thread to sleep. If you want to wait for more than one predicate, just form their logical conjunction or disjunction as needed, and ensure that signaling of the condition variable is done in all the right circumstances, e.g.

    /* wait for any of three predicates: keep waiting while NONE is true */
    while (!predicate1() && !predicate2() && !predicate3()) {
        pthread_cond_wait(&cond, &mutex);
    }

This is equivalent to waiting on three events. The thread is parked in some wait function, and can wake up for any of three distinct reasons. A better structure might be this:

    int p1 = 0, p2 = 0, p3 = 0;

    /* mutex assumed locked */
    for (;;) {
        p1 = predicate1();
        p2 = predicate2();
        p3 = predicate3();

        if (p1 || p2 || p3)
            break;

        pthread_cond_wait(&cond, &mutex);
    }

    if (p1) {
        /* action related to predicate1 */
    }

    if (p2) {
        /* action related to predicate2 */
    }

    if (p3) {
        /* action related to predicate3 */
    }

Multiple object waiting is primarily useful for I/O multiplexing; for this you have the poll() or select() function. Both of these functions provide feedback about which file descriptors are ready, thereby avoiding problems of fairness, and can handle much more than 64 descriptors.

=================================TOP===============================
Q371: Anyone recommend us a profiler for threaded programs?

On 2 Jul 2001, Bala wrote:
> Hi, can anyone recommend us a profiler, possible free, that will profile
> multi-threaded programs based on pthread?
> > Our development platform is Linux x86 and Solaris. We've looked at gprof, but > accoding to the docs it says that it won't do MT apps. > Maybe (?) the Linux Trace Toolkit can help you? -- "I decry the current tendency to seek patents on algorithms. There are better ways to earn a living than to prevent other people from making use of one's contributions to computer science." D.E. Knuth, TAoCP 3 =================================TOP=============================== Q372: Coordinating thread timeouts with drifting clocks. > > > Hello all, > > > > > > We have a small problem in our application. It is that our computer > > > (running Solaris 7 on UltraSparc) is synchronised with several other > > > > > I find this surprising. My experience has been that the SPARC systems > > have extremely stable clocks, almost good enough for use as time > > references. Even without NTP the worst drift I ever saw with any of our > > SPARC systems was 2 seconds per month. > > Are you sure the NTP server is stable? > > > > The problem is not so much the stationary situation. Our system must be > synchronised to an external system that may or may not be synchronised with UTC. > As long as everything is running stationary, everything is fine. However, from > time to time it is necessary to change the time reference in the external system > and, hence, also for the Sparcs. This creates the problem. If we cannot find out > when the clock changes or acter for it by using relative times, we will have to > make a manual procedure whereby the spark software is reset so to speak. We > would like to avoid this. > You could look into some of the timer facilities. See setitimer or clock_settime. It may be that one of the clock types will take into account adjustments to the system clock. You should use sigwaitinfo or sigtimedwait to wait for the timer signals rather than signal handlers so that you don't run into the problem that practically nothing is async-safe with respect to threaded code. 
sigtimedwait appears to be a relative time interval but I don't know what clock type it uses then or whether that can be changed.

Secondly, whatever is changing the system clock should be using adjtime() so you don't have problems like this. That's the whole point of adjtime.

Thirdly, I don't know why people think you can have synchronized clocks. This is meaningless and the laws of physics don't support that kind of thing. All you can do is determine approximate synchronicity of two clocks with some amount of accuracy and certainty. And that's making a lot of assumptions and probably ignoring relativistic effects. That's all NTP does. And if you can deal with unsynchronized clocks, then having a clock appear to go backwards once in a while is nothing.

Joe Seigh

=================================TOP===============================
Q373: Which OS has the most conforming POSIX threads implementation?

Mine, of course, on Tru64 UNIX V5.1A. (Oh yeah, but it hasn't actually released yet... ;-) )

Seriously, though, any "fully conforming POSIX threads implementation", right now, is broken and shouldn't be used. There are several serious bugs in POSIX 1003.1-1996 that should not be implemented. (These have been fixed for 1003.1-2001, but that's still in balloting and thus isn't yet really a standard.) So what you really want is an implementation that's "sufficiently conforming" without being "fully conforming".

So do you want one that does everything in the standard that SHOULD be done and nothing that SHOULDN'T be done? Is it a firm requirement that this implementation have no bugs, known or unknown? Yeah, that's a grin, but it's also serious, since an implementation with conformance bugs can't really be said to conform, at least by any abstract and objective definition of "conform". Pragmatically, the best objective use of the term would be to claim the UNIX 98 brand, proof of having passed the VSTH test suite; but that suite isn't perfect.
Once you loosen the bounds of "100% strict conformance", we get to the important issue... which is deciding what meets your actual needs. The current LinuxThreads implementation falls well short of full conformance; but while it fails to implement many features of the standard, it also so far as I know fails to implement any of the standard's bugs. For most applications, that implementation is going to be quite sufficient. IBM is working on NGPT ("Next Generation" POSIX threads), which they claim will relieve most if not all of the conformance bugs in LinuxThreads. However, as far as I can tell (as it appears to require no substantial kernel changes) it will inevitably add a set of bugs that the developers apparently like (or at least accept), and will share many of the "weaknesses" (some of which many consider actual conformance bugs) of the Solaris and AIX two-level scheduler implementations. They appear to be doing this principally because current limitations of Java encourage a "thread per client" design pattern, and "one to one" kernel thread implementations such as LinuxThreads tend to perform poorly with unreasonably large numbers of threads. They will give up a lot to gain support for "thousands of threads" servers that violate many principles of good threaded design and probably won't work well anyway. So... what particular conformance do you want? ;-) Or, to put it another way... choose your own most personally useful definition of "conformance", and look for the implementation that most closely implements it. /------------------[ [email protected] ]------------------\ | Compaq Computer Corporation POSIX Thread Architect | | My book: http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/ =================================TOP=============================== Q374: MT random number generator function. Chris M. 
Moore wrote: > On Tue, 10 Jul 2001 17:30:36 -0700, TC Shen > wrote: > >>Hello: >> Take this as an example, I need a random number generator function >>like either randome() or rand() or rand_r() to be used in a >>multi-threaded application > > Generally, functions ending in _r are re-entrant i.e. MT-safe. It's a little more complicated than that. When there is an _r version of a function, the original version has an interface (involving static data) that cannot trivially be made thread-safe. In such cases, POSIX added the _r version (with no implicit static data) and specified that the original form NEED NOT be made thread-safe. On many implementations, those original functions nevertheless ARE thread-safe. For example, rand() could be written to synchronize access to a shared static seed, or to maintain the seed in thread-specific data. Either would be thread-safe, though the behavior would be quite different. The rand_r() interface, though, provides more flexibility. If you provide your own external synchronization, you can share the explicit context used by rand_r() between threads. You can also use multiple random number sequences in a single thread. And as long as you DON'T share the context, you have no overhead for unnecessary synchronization or thread-specific data indirection. POSIX specifies that ALL ANSI C and POSIX functions are thread-safe, with the exception of a short list. (Mostly, though not exclusively, those replaced with _r variants.) The XSH6 specification (UNIX 98) requires that nearly all UNIX 98 interfaces (with a few more non-POSIX exceptions) must be thread-safe. This is important, as a lot of commonly used interfaces (for example, select) are not part of POSIX, but are in UNIX 98. 
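
The unshared-context case described above might look like the following sketch. The names rng_job, fill_random, rng_thread and sequences_are_reproducible are hypothetical; the only library interfaces used are rand_r() and the basic pthread calls. Each thread owns its seed, so no locking or thread-specific data is involved, and each thread's sequence depends only on the seed it was given.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose rand_r() under strict -std= modes */
#endif
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define NTHREADS 3  /* hypothetical thread count */
#define NVALUES  8  /* hypothetical sequence length */

struct rng_job {
    unsigned int seed;
    int values[NVALUES];
};

/* Fills out[0..n-1] from a private rand_r() context seeded with 'seed'. */
void fill_random(unsigned int seed, int *out, int n)
{
    unsigned int state = seed;  /* explicit context, owned by the caller */
    for (int i = 0; i < n; i++)
        out[i] = rand_r(&state);
}

static void *rng_thread(void *arg)
{
    struct rng_job *job = arg;
    fill_random(job->seed, job->values, NVALUES);
    return NULL;
}

/* Returns 1 if each thread produced the same sequence that the same seed
 * gives when used from the calling thread, 0 if not, -1 on setup failure. */
int sequences_are_reproducible(void)
{
    static const unsigned int seeds[NTHREADS] = { 1u, 42u, 2001u };
    struct rng_job jobs[NTHREADS];
    pthread_t tid[NTHREADS];
    int started = 0;
    int ok = 1;

    for (int i = 0; i < NTHREADS; i++) {
        jobs[i].seed = seeds[i];
        if (pthread_create(&tid[i], NULL, rng_thread, &jobs[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        int rc = pthread_join(tid[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
    if (started != NTHREADS)
        return -1;

    for (int i = 0; i < NTHREADS; i++) {
        int expected[NVALUES];
        fill_random(seeds[i], expected, NVALUES);
        if (memcmp(expected, jobs[i].values, sizeof expected) != 0)
            ok = 0;
    }
    return ok;
}
```

Sharing one context between threads would instead require external synchronization around every rand_r() call, as the answer notes.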
/------------------[ [email protected] ]------------------\ | Compaq Computer Corporation POSIX Thread Architect | | My book: http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/ =================================TOP=============================== Q375: Can the main thread sleep without causing all threads to sleep? > If a main() creates mutiple threads that are off executing their > specified routines, can the "main()" thread then all sleep(3) with_out_ > causing any of the threads to sleep? > > Esentially: > 1) create and start threads > 2) sleep (yourself) while threads do ob > 3) wake up and stop the threads, clean-up, and end execution > > I have looked in /usr/include/unistd.h and at the Single Unix Spec but > am not sure of the behavior You're asking two separate questions. Reference to the Single UNIX Specification (version 2, which includes threads; aka UNIX 98) is really off topic if you're interested in Linux, because Linux bears only vague and incomplete similarities to UNIX 98. (Especially when you get to threads.) In terms of UNIX 98 conformance, LinuxThreads is full of large and serious bugs. (Though this is a pointless and somewhat unfair criticism because no aspect of Linux actually claims "conformance"... rather, the code strives to be compatible where that's practical. This is a good goal, and only those who depend on it can really judge whether what they've achieved is "good enough".) POSIX (and UNIX 98) absolutely require that sleep() function as specified for the calling thread without any effect at all on other threads. However, on implementations that don't/can't claim full POSIX conformance, sleep() is one of the functions voted most likely to be broken because the traditional implementation relies on SIGALRM and won't work correctly with threads. When such implementations support nanosleep(), that's more likely to work correctly, though there are no guarantees. 
(Again, POSIX requires that BOTH work correctly, and once someone's broken one rule, it makes little sense to bet they've followed other related rules... at least, don't bet very much.) However, sleeping for a period of time after creating threads, and then assuming that those threads have done anything at all (much less finished) is extremely bad practice. Instead, set up some explicit synchronization. If you really expect the threads to complete, use pthread_join() to wait for them to finish. If you want them to sit there for some period of time (recognizing they may in fact have done absolutely nothing at the end) and then terminate, you can use pthread_cancel() (for example). If, on the other hand, you want to be sure that the threads have "done something", but still make them quit after some reasonable period of time, set up your own synchronization protocol. Have the threads report on their progress in some shared data that the main thread can monitor. When they've "done enough" and you've waited "long enough", set some flag that tells the threads to terminate at the next convenient time. Or cancel them. /------------------[ [email protected] ]------------------\ | Compaq Computer Corporation POSIX Thread Architect | | My book: http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/ =================================TOP=============================== Q376: Is dynamic loading of the libpthread supported in Redhat? In article <[email protected]>, CKime wrote: >I am wondering if the dynamic loading of the libpthread that >ships with Linux RedHat 6.2 is supported. No. libpthread is integrated into libc. When a program is linked against libpthread, the behavior of libc changes because libpthread overrides a few symbols in libc. 
This provides thread safety to some internal modules within libc (example: malloc and stdio become thread safe), and adds some necessary multithreaded semantics to certain functions (example: fork() calls pthread_atfork handlers, sets up threading environment in child process). Not only can you not dynamically load the threading library, but in general you cannot dynamically load a shared library which uses threads into an executable that was not compiled and linked for multithreading. If some program is to support multithreaded plugins, it should be compiled as a multithreaded application. =================================TOP=============================== Q377: Are reads and writes atomic? > Suppose an integer variable is shared between threads. Is it safe to > assume that reads and writes are atomic (assuming reads and writes are > single instructions)? How big is an int? Does the machine provide instructions to read and write data with that size atomically? Does the compiler always generate the appropriate instructions to read and write 'int' data atomically? (E.g., "load 32-bit" rather than 'load 64-bit' the enclosing 64-bit cell and mask/shift.) Does the compiler/linker always ALIGN data of that size "naturally"? (If not) Does the machine still read and write data of that size atomically when the alignment is not natural? > I suspect the answer is 'no, no standard provide such a guarantee', > but then I'd like to know on what, if any, kind of hardware I can > expect it to fail. You're trying to rely on a bunch of different guarantees from the hardware up through the compiler and linker. You won't find ALL of these guarantees in any single document. The C and C++ languages do NOT require that compilers must generate atomic access sequences even to datatypes to which the hardware may support atomic access, so the standards don't help; you need to find out from the particular compiler you're using, for the particular hardware you're using. 
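
Where none of those guarantees can be pinned down, one portable fallback (a suggestion added here, not something the reply itself prescribes) is to stop relying on atomic loads and stores and guard the integer with a mutex. The sketch below uses hypothetical names (struct shared_int and its helper functions).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical wrapper: an int that is only touched with its lock held. */
struct shared_int {
    pthread_mutex_t lock;
    int value;
};

/* Returns 0 on success, or the error number from pthread_mutex_init(). */
int shared_int_init(struct shared_int *s, int initial)
{
    s->value = initial;
    return pthread_mutex_init(&s->lock, NULL);
}

void shared_int_set(struct shared_int *s, int v)
{
    int rc = pthread_mutex_lock(&s->lock);
    assert(rc == 0);
    (void)rc;
    s->value = v;
    pthread_mutex_unlock(&s->lock);
}

int shared_int_get(struct shared_int *s)
{
    int v;
    int rc = pthread_mutex_lock(&s->lock);
    assert(rc == 0);
    (void)rc;
    v = s->value;
    pthread_mutex_unlock(&s->lock);
    return v;
}

void shared_int_destroy(struct shared_int *s)
{
    pthread_mutex_destroy(&s->lock);
}
```

This costs a lock and unlock per access, but it does not depend on the size of int, its alignment, or the instructions the compiler happens to choose.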
To sum it all up, while this sort of assumption will prove viable on many systems, it is 100% implementation-specific. How much portability do you require? To which implementations? What are the consequences of someone later porting to another implementation (without the guarantees you want), if the people doing the port fail to notice your dependency? Is it really worth the risk? /------------------[ [email protected] ]------------------\ | Compaq Computer Corporation POSIX Thread Architect | | My book: http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/ =================================TOP=============================== Q378: More discussion on fork(). > Kaz Kylheku wrote: > [...] >> Unlocking is problematic for implementors, because the mutexes may >> contain stuff that only makes sense in the parent process, such as a >> queue of threads that are blocked on the mutex, threads which only exist >> in the parent process! An unlock operation could try to wake one of these >> threads. Making it work requires contorted hacks, and overhead. But it needs to be done, so the only real question is where the hackery and overhead lives. Doing it once, in a central place (the thread library) reduces the risk of errors and minimizes the overhead. >> Also mutexes and other objects may contain internal finer-grained >> lock variables, which are not properly taken care of during the fork. >> (and taking care of them would require the interaction between fork and >> every instance of a mutex). The fork could happen exactly as some thread >> in the parent has acquired some spinlock variable within a mutex. The >> child's unlock operation will then deadlock on the spinlock. >> >> It makes much more sense to require the user to reinitialize the locks, >> which will obliterate any parent-specific cruft they may contain, >> and give them a new lease on life, so to speak. 
>> The same goes for any sync objects which may be in arbitrary use in the parent when the fork
>> takes place and whose use is expected in the child.

No, it really doesn't. For one thing, that means that pthread_mutex_init() can't check for EBUSY. (Or, if you require the child cleanup to first destroy mutexes, that pthread_mutex_destroy() couldn't check for locked state, or waiters, etc.) Or, alternatively, that they would need to be somehow aware of the "odd" environment and behave differently... overhead completely wasted in normal calls.

All of this is essentially what a normal POSIX implementation must do internally inside of fork() for the child, and it's easier for the thread library to do it "automagically" than to require each "module" of the application to do it and get it right. (In any case, the thread library, and C runtime, etc., must all do this to their own objects.)

> wouldn't it make much more sense if the standard would
> define a set of "fork-points" (e.g. fork itself, _mutex_lock,
> _mutex_unlock, cond_wait, _once, etc) which would allow impls.
> make consistent *total* replica (including all threads)
> once *all* threads meet each other at fork points; wouldn't
> that approach make it possible to fork a multithreaded process
> without "manual" synchronization in prepare/parent handlers ?
> is it just "too expensive" or am i missing something else ?

What if some threads never reach such a point? How much does it cost the implementation to check in each of those points... whether or not it'll ever be necessary? How do you tell threads they've been "cloned" behind their backs? This is critical if they're working with external resources such as files (you don't usually want two threads writing identical data to the same file), but may be necessary even if they're not.

Solaris foolishly designed UI thread fork() to clone all threads. It even "allowed" that threads currently blocked in syscalls capable of returning EINTR might do so in the child...
an ugly concession to implementation simplicity of no real value to application developers. The only solution would be to run some form of "atfork" handler in each active thread so that it can decide whether to reconfigure or shut down. This would be far more expensive and more complicated than the current fork-to-one-thread and "single stream" atfork handler mechanism. (Not that I'm arguing the POSIX model is "simple"... it's not. But the alternatives are even worse.)

We really should have just left the original POSIX alone. When you fork, the child can't do anything but exec(); possibly after other calls to async-signal safe functions. (But nothing else.)

The POSIX 1003.1d-1999 amendment, by the way, adds posix_spawn() to combine fork() and exec() in a single operation. Not, of course, a "simple" operation, since in order to make this useful to shells (which is essentially a minimum definition of "useful" in this context), posix_spawn() comes with a "veritable plethora" of ancillary operations and data types to specify changes to environment, files, and all the other "tweaks" a shell would commonly make between the child's return from fork() and the call to exec().

/------------------[ [email protected] ]------------------\
| Compaq Computer Corporation POSIX Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/

=================================TOP===============================
Q379: Performance differences: POSIX threads vs. ADA threads?

@res.raytheon.com wrote:

> thanks in advance to any replies to this post, they are much
> appreciated. this is the situation: i am trying to debug a real-time
> system running on a solaris. it was implemented using ada threads. the
> question i have and the issue i am not too clear on is the relationship
> between posix threads and realtime systems. is there an advantage to
> using posix threads besides portability?
> i know that solaris implements
> posix and that there are posix bindings (florist) for ada. i read
> somewhere that posix threads have less overhead than ada threads. i
> guess my ultimate question is... in a realtime system running on
> solaris, written in ada, would there be any significant performance
> difference in implementing posix threads over ada threads, or vice
> versa?

Though I don't know the internals of the Ada runtime you're using, most likely "Ada tasks" and "POSIX threads" are no different at all below the surface layers. Ada likely uses either the POSIX or the older UI thread interfaces to create and manage threads. (It could also use the lwp syscalls, as the Solaris thread libraries do... however that would cause problems, would have no real advantages, and wouldn't really change the fact that the bulk of the code would be common.)

The only way Ada tasks would be "really different" would be if the runtime "imagines" its own pure user-mode threads, multiplexing them itself within a single "kernel entity" (no parallelism on an SMP, and little concurrency). If that's the case, run (don't walk) to a new Ada, because it's probably not even truly thread-safe.

So, presuming a reasonable Ada implementation that uses POSIX or UI thread interfaces, is there any advantage to going "behind the runtime's back" to create and manage your own POSIX threads? Depends a lot on which Ada, and what you're trying to accomplish. The behavior of the threads won't be fundamentally different. Your own POSIX thread won't "run faster" than a POSIX thread (or a UI thread) created by the Ada runtime.

The main disadvantage with the original Ada spec was that rendezvous is a nice simple model for ordinary client/server type interactions, but a lousy form of general synchronization... and that was all it had.
Modern Ada (Ada 95) provides far more general and flexible forms of synchronization through the protected types mechanism, and could be used to write applications that would match native POSIX thread C code. (Often with less effort, presuming you're comfortable with Ada 95, and with perhaps even greater portability, as long as you have an Ada 95 everywhere... ;-) ) /------------------[ [email protected] ]------------------\ | Compaq Computer Corporation POSIX Thread Architect | | My book: http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/ =================================TOP=============================== Q380: Maximum number of threads with RedHat 255? >> With a RedHat 7.1 install (Intel architecture), the stock kernel comes >> with a limit of 65535. When running a simple program which sits in a loop >> creating threads & detaching them, the maximum number of threads I can >> reach is 255 - the same limit as a stock kernel from RedHat 6.1. On a modified > > > Too bad no one has an answer for this... I'm going to *attempt* to > figure this one out since one of the programs I'm writing is coming to > this same blockade. > I imagine you hit the max user processes limit, you can change it with ulimit -u 'num' in bash or limit maxproc 'num' under tcsh. You can also change the default soft/hard limits in /etc/security/limits.conf as well. Kostas Gewrgiou =================================TOP=============================== Q381: Best MT debugger for Windows... ahem people, Pardon this post, by me, a rather beginner to multithreading, both under win32 and nix, but, IMO, it would be IDIOTIC (!!) to use a win32 debugger when debugging win32 threads. You want a debugger that can stop the state of the system exactly where and when you want it, and do many many other wonderful things, without relying on the win32 debugging API, then just use SoftIce! 
I've used it successfully for quite some time, enough to know that it is the best debugger in the world :) It has versions for win9x/NT/2000, and is much more reliable than any other win32 debugger I've seen..

Regards,
E

=================================TOP===============================
Q382: Thread library with source code ?

Yang Ke writes:

> Is there any implementation of user-level thread library
> with source code ?

State Threads: http://oss.sgi.com/projects/state-threads/
Pth: http://www.gnu.org/software/pth/
Quickthreads: http://www.cs.washington.edu/research/compiler/papers.d/quickthreads.html

--
Michael J. Abbott [email protected] www.repbot.org/mike

=================================TOP===============================
Q383: Async cancellation and cleanup handlers.

In article <[email protected]>, [email protected] wrote:

>after a (likely naive) look at pthread_cleanup_push and _pop, i'm
>puzzled by the race condition of:
>
>pthread_cleanup_push(cleanup_mutex,&mylock);
>/* what if we are cancelled here (requires async cancellation,
> * deferred is fine) */
>pthread_mutex_lock(&mylock);

Code which requires cleanup of locks or resources should not be doing async cancelation, because the cleanup code has no way to even safely execute, never mind determine what needs to be cleaned up!

>does posix require an implementation of pthreads to work around this
>condition? or can one simply not safely use cleanup_* in the face of
>async cancellation?

POSIX does not require any functions to be async-cancel-safe, other than the ones that manipulate the thread's cancelation state.

Firstly, Draft 7 defines ``Async-Cancel-Safe'' Function in 3.23 as:

    A function that may be safely invoked by an application while the asynchronous form of cancelation is enabled. No function is async-cancel-safe unless explicitly described as such.
Then, paragraph 2.9.5.4 (Async-Cancel Safety) says:

    The pthread_cancel(), pthread_setcancelstate(), and pthread_setcanceltype() are defined to be async-cancel safe. No other functions in this volume of IEEE Std 1003.1-200x are required to be async-cancel-safe.

So, while you have asynchronous cancelation enabled, these are the only three library functions you may use.

=================================TOP===============================
Q384: How easy is it to use pthreads on win32?

> ok, my question is this, how easy is it to use pthreads on win32? I mean,
> does anyone here use them or even recommend them above win32 threads? just
> thought I would post some questions on things I wouldnt mind some help with

At my company we use the pthread implementation from http://sources.redhat.com/pthreads-win32 extensively because we need a portable thread library between various unix flavours and win32.

>
> 1. are they just as fast as win32 threads?
>
> 2. are they implemented as a wrapper for win32 threads underneath (hence
> on win32, they are a wrapper and not a whole solution)

They are implemented as a wrapper for win32 threads, which is not unusual for a portable thread lib. This causes some overhead, but it depends on your application whether that is a problem or not. In one specific application where we use the above mentioned pthread implementation, we had to resort to win32 primitives for synchronisation because of performance aspects.

>
> 3. Can I code on win32 using these pthreads and then with just a
> recompile on a linux platform with a properly configured pthreads
> installed, my code will compile ok and run fine?

Usually no problem.

Wolf Wolfswinkel

> 1. are they just as fast as win32 threads?

yep, as long as you're not creating/destroying many threads.

> 2. are they implemented as a wrapper for win32 threads underneath (hence
> on win32, they are a wrapper and not a whole solution)

Quite a thin wrapper actually, so not much overhead.
I actually rebuilt the library and compressed it also using UPX (just for laughs :-)) and it was 6K! > 3. Can I code on win32 using these pthreads and then with just a > recompile on a linux platform with a properly configured pthreads > installed, my code will compile ok and run fine? Yep I've been doing this for 2 years. > 4. does anyone have trouble using pthreads on win32? Not me. > well, I think thats about it, of course, you could understand the sorts of > questions I'm trying to ask here and answer back something I havent asked, > but would help me to know, that would be great > > thanks for you help guys ! > > kosh Padraig. =================================TOP=============================== Q385: Does POSIX require two levels of contention scope? > I have a question concernign pthreads and POSIX. Does POSIX require that > there are two levels of contention scope supported > (PTHREAD_SCOPE_SYSTEM, PTHREAD_SCOPE_PROCESS) for the pthreads? I > understand that many platforms support the 1-1 model, where one > user-level thread (ULT) maps to one kernel-level schedulable "entity". POSIX requires an implementation to support AT LEAST ONE of the two contention scopes. While some may support both, that's not required. Also, the only way to determine which scopes are supported is to try creating the threads and see whether one or the other fails. And note that anyone who can do "1 to 1" threads could easily "cheat" and claim to support both. There's nothing a legal/portable program can do to detect whether the implementation has "lied" and given it a system contention scope (SCS) thread when it asked for PCS (process contention scope). The converse isn't true: an implementation that gives a PCS thread when asked for an SCS thread is broken, and an application that really needs SCS can easily tell the difference. (I don't see any point to such "cheating", and I certainly wouldn't endorse it. 
It's just an illustration of the fact that PCS is a way to enable the implementation to make certain optimizations, and places no functional limitations on the system.)

> Is anyone familiar with kernels (except Linux) supporting threads, that
> schedule processes only (as opposed to scheduling kernel threads which
> map to pthreads)? I remember some work in the early 90's about
> continuations where the kernel would employ call back functions for user
> level work within a process. Is anyone familiar with any pthreads
> implementation based on this or other similar mechanisms?

Nathan Williams already pointed out the U. Washington "Scheduler Activations" paper. The term "continuations" comes from a CMU research project applying the scheduler activations theory to the Mach 3.0 kernel. Their model allowed mixing scheduler activations with traditional synchronous kernel blocking states to ease the transition.

Both Solaris and Tru64 UNIX claim "inspiration" from the Scheduler Activations paper, but I know of nobody who's fully implemented scheduler activations (or continuations) in any "real" kernel. The Tru64 implementation, at least in terms of behavior if not detailed implementation, comes the closest to the ideal; we have much more complete, and tighter, integration and communication between the kernel and user mode schedulers than anyone else. Despite long conviction that this ought to be, as the original paper supposed, "the best of both worlds", that goal has proven to be elusive in the real world.

In theory, there are more or less 3 classes of application:

1) Compute-bound, such as highly parallel (HPTC) mathematical algorithms. The fact is that such applications don't care a bit whether their threads are SCS or PCS. They'll have one thread for each available processor, and those threads aren't supposed to block.

2) Synchronization-bound, threads that run a lot and compete for POSIX synchronization objects like mutexes and condition variables.
These applications will generally run faster with 2-level scheduling than with "1 to 1" scheduling, because they "never" need to cross the kernel protection boundary.

3) I/O-bound (communications), threads that do a lot of file or network operations, which always involve kernel boundary crossings. Since they usually block in the kernel, they'll usually run faster if all the scheduling activity occurs in the kernel. Web and mail servers tend to fall into this category. Many database engines do, too.

In theory, the overhead imposed by scheduler activations (or some reasonable approximation, like ours) will be small enough that such applications won't be substantially impacted; and that overhead will be washed out by gains in whatever user-mode synchronization they also do. In practice, this is not necessarily true, because it's really tough to get the overhead down into the noise level. ("In theory, there's no difference between theory and practice; in practice, there's no similarity.")

We have found, in practice, that many of the important applications fall into either category 1 (HPTC) or category 3 (communications), where 2-level scheduling is, on average and in general, either of little benefit or a moderate penalty. We've been working on creative ways to decrease the "upcall penalty", but it takes a lot of work from both the user and kernel sides, and that can be difficult to schedule since everyone's always overworked. We haven't made nearly as much progress as we'd hoped.

People keep talking about "improving" Linux by adding 2-level scheduling, and right now I have to say that's nonsense. IBM wants it apparently just so they can support poorly designed Java applications that push the typical Linux configuration's limits on process creation. While there's something to that, the costs are absurd, and the confusion wrought by having two common thread packages will be severe. I'm sure there are easier ways to remove the configuration restrictions.
I'd much prefer to see people focusing on fixing the kernel limitations that prevent Linuxthreads from fully conforming to POSIX and UNIX 98 so we can improve industry portability. I'm certainly not ready to give up on 2-level scheduling, but it's not a simple or easily applied technology, and we're still working on understanding all of the details. We're pretty much convinced that "pure scheduler activations" isn't workable, but there are compromises and adaptations that we think would probably work. Unfortunately, changing an existing kernel to use scheduler activations is a really big job; maybe bigger than simply building a new kernel from scratch. The Mach continuations work is probably the best course, especially where Mach is already there (e.g., Tru64 and Darwin/Mac OS X); but raw Mach source is a long way from being "product quality", and it's not easy to patch continuations into a much-altered commercial quality "Mach based" kernel. /------------------[ [email protected] ]------------------\ | Compaq Computer Corporation POSIX Thread Architect | | My book: http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/ =================================TOP=============================== Q386: Creating threadsafe containers under C++ > Does anyone have some tips/book/web-pages where I can find some ideas how > to create threadsafe containers under C++ (like lists, trees, etc.)? > I would like to have something like a threadsafe STL or so. The STL > provides a very well structured set of useful containers and algorithms. > But encapsulating them in a kind of threadsafe proxy is sometimes not good > enough. I am familiar with common pattern for concurrent programming, but > they again are too general. > I would be interested in how to clearly provide a useful set of functions > as programming interface. > You can look at our STL thread-safe wrappers in GradSoft Threading. 
http://www.gradsoft.com.ua/eng/Products/ToolBox/toolbox.html For example lookup in a list and insert if the element does not exist is a > simple example; this should be part of the container class because it > must be an atomar operation and thus cannot be solved by sequentially call > lookup() and insert(). > > Thanks in advance for any hint, > > Alex. =================================TOP=============================== Q387: Cancelling pthread_join() DOESN'T detach target thread? Alexander Terekhov wrote: > the standard says: > > "Cancelation Points > > Cancelation points shall occur when a thread is > executing the following functions: > . > . > . > pthread_join()..." > > "If the thread calling pthread_join() is canceled, > then the target thread shall not be detached." > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Q) WHY ??? This is, as you guessed below, a requirement for the IMPLEMENTATION. What it means is perhaps a little complicated, and this is one of the reasons that the STANDARD isn't really intended as general user documentation. Part of the confusion is historical, and would be more obvious to someone familiar with DCE threads, which was an implementation of an early draft of the standard document. Originally, "detach" and "join" were separate operations. "Join" merely allowed a thread to wait until the target was complete. There was no reason that both threads 1 and 2 couldn't wait for thread 3 to complete. But in order to ensure that system resources could be reclaimed, ONE (and only one) thread would then have to "detach" thread 3. In fact, nobody could ever construct a case where it made sense for both threads 1 and 2 to wait for thread 3 to complete. While there might be cases where thread 3 was doing something on which two threads depended, joining with it is never the right (or best) way to implement that dependency. (It really depends on some DATA or STATE, not on the completion of the thread, per se.) 
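To make the point about depending on DATA or STATE concrete: when two threads both need something a third thread produces, they can wait on a predicate protected by a mutex and condition variable instead of both trying to join the producer. The following is a minimal sketch of that pattern; result_lock, result_ready, result_done, result_value, wait_for_result() and run_result_demo() are names made up for this illustration.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared state published by the producer ("thread 3"). */
static pthread_mutex_t result_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t result_ready = PTHREAD_COND_INITIALIZER;
static int result_done = 0;   /* the predicate the waiters depend on */
static int result_value = 0;

/* Publish the result and wake every waiter. */
static void *producer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&result_lock);
    result_value = 42;
    result_done = 1;
    pthread_cond_broadcast(&result_ready);
    pthread_mutex_unlock(&result_lock);
    return NULL;
}

/* Block until the state says the result exists; any number of threads
 * may call this, which is not true of pthread_join(). */
static int wait_for_result(void)
{
    int v;

    pthread_mutex_lock(&result_lock);
    while (!result_done)
        pthread_cond_wait(&result_ready, &result_lock);
    v = result_value;
    pthread_mutex_unlock(&result_lock);
    return v;
}

/* "Threads 1 and 2": each stores what it saw through its argument. */
static void *consumer(void *arg)
{
    *(int *)arg = wait_for_result();
    return NULL;
}

/* Start two consumers and one producer; return the sum of what the
 * consumers observed. Each thread is joined exactly once, by the caller. */
int run_result_demo(void)
{
    pthread_t p, c1, c2;
    int seen1 = 0, seen2 = 0, rc;

    result_done = 0;
    rc = pthread_create(&c1, NULL, consumer, &seen1);
    assert(rc == 0);
    rc = pthread_create(&c2, NULL, consumer, &seen2);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);

    pthread_join(c1, NULL);
    pthread_join(c2, NULL);
    pthread_join(p, NULL);
    return seen1 + seen2;
}
```

Here the join calls are used only by the creating thread to reclaim resources, once per thread; the actual dependency between the consumers and the producer is expressed entirely through result_done.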
And it's silly to require two calls to "finalize" thread 3. Furthermore, one of the most common DCE thread programming errors was failure to detach after joining, resulting in a memory leak. So it was decided that join should implicitly detach the target thread once it was complete. In fact, the detach operation, and all reference to it, was removed from the standard. (Over the objections of some of us who knew better.) But there were two problems. There was no way to alter the initial thread of a process so that its resources would be automatically reclaimed on termination. And there was this nasty problem in join; where, if the join is cancelled, which implies it didn't complete, the target thread of the join would languish forever -- another memory leak about which the program could do nothing. The solution, of course, was to restore the detach operation; even though it was now less useful (and more rarely needed) than before. However, although the standard could now talk about detach, the description of join wasn't edited to make as clear as we might wish that, effectively, the join operation is "wait for termination and then detach". The standard says and implies in several places that one EITHER detaches OR joins, and only this discussion of the cancellation behavior of join suggests a connection or dependency. > note that the standard also says: > > "It has been suggested that a ''detach'' function is not > necessary; the detachstate thread creation attribute is > sufficient, since a thread need never be dynamically > detached. However, need arises in at least two cases: > > 1. In a cancelation handler for a pthread_join () it is > nearly essential to have a pthread_detach() function in > order to detach the thread on which pthread_join() was > waiting. 
Without it, it would be necessary to have the > handler do another pthread_join() to attempt to detach > the thread, which would both delay the cancelation > processing for an unbounded period and introduce a new > call to pthread_join(), which might itself need a > cancelation" > > and > > "The interaction between pthread_join() and cancelation > is well-defined for the following reasons: > > - The pthread_join() function, like all other > non-async-cancel-safe functions, can only be called > with deferred cancelability type. > > - Cancelation cannot occur in the disabled cancelability > state. Thus, only the default cancelability state need > be considered. As specified, either the pthread_join() > call is canceled, or it succeeds, but not both. The > difference is obvious to the application, since either > a cancelation handler is run or pthread_join () returns. > There are no race conditions since pthread_join() was > called in the deferred cancelability state." > > so i am really puzzled by the restriction which does not > allow me to detach the target thread. IMHO it should > be declared that it is really safe (and even required) > to _detach (or _join) the target thread "If the thread > calling pthread_join() is canceled" !! > > or am i missing something? Yes, though you've guessed it below. The IMPLEMENTATION isn't allowed to detach the target thread when join is cancelled. But YOU can call pthread_detach() at any time, for any pthread_t value you hold. (You just can't join, or re-detach, a detached thread.) > perhaps "shall not be detached" is meant with > respect to implementations only; not with > respect to applications which can request > cancellation of joining thread(s)? (but then > why implementation would ever want to "DETACH" > when application have requested "JOIN" ?! i > just do not see any connections between > "detach" and "join" with respect to > cancelation even from impl. 
point of view) Again, the confusion is because "join" is really "wait for termination and detach". So the implementation MUST detach when the application requests "join"... except when the join operation is cancelled before the target thread terminates. Then YOU are responsible for detaching the target thread explicitly, in the cleanup handlers... or allow for the fact that it will continue to consume some process resources (including at least the thread stack, and possibly more). /------------------[ [email protected] ]------------------\ | Compaq Computer Corporation POSIX Thread Architect | | My book: http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/ =================================TOP=============================== Q388: Scheduling policies can have different ranges of priorities? Dale Stanbrough wrote: > I read somewhere that different scheduling policies can have > different ranges of priorities, and that symbolic values should > be used to represent min/max priorities. > > Is this correct, and if so where can i find the definition of > these priorities? Presuming that you're talking about POSIX threads, the only way to find the legal range of POSIX priority values for a given policy is by calling sched_get_priority_min() and sched_get_priority_max(). (Each takes a single int argument, the symbol for the policy, such as SCHED_FIFO, and returns an int priority value.) One minor complication is that these interfaces are under the _POSIX_PRIORITY_SCHEDULING option, and are not required for the _POSIX_THREADS option or the _POSIX_THREAD_PRIORITY_SCHEDULING options; thus, you could possibly find realtime threads scheduling support WITHOUT any supported/portable way to determine the priority ranges. (I actually tried to get that fixed, but nobody else seemed to care. 
In practice, they're probably right since you're unlikely to find "thread priority scheduling" without process "priority scheduling"; though it would have been nice to lose this inconsistency.) On most systems, there are other ways to find out, either by symbols in the header files or documentation; but those methods aren't portable. /------------------[ [email protected] ]------------------\ | Compaq Computer Corporation POSIX Thread Architect | | My book: http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/ =================================TOP=============================== Q389: The entity life modeling approach to multi-threading. I certainly don't pretend to be anywhere close to Don Knuth, but my web site www.coloradotech.edu/~bsanden has a lot of material on the entity life modeling approach to multi-threading. It is what you're asking for since it is at the design level and not concerned with the details of any particular kind of threads. www.coloradotech.edu/~bsanden/elm.html is an overview. There are examples of multi-threading at www.coloradotech.edu/~bsanden/rtexamples.html and various papers at www.coloradotech.edu/~bsanden/pubs.html. Bo Sanden Colorado Tech. University Michael Podolsky wrote: > Hi > > Does anybody know really good books about multithreading? > Not about using pthreads or win32 APIs, not for beginners. > Smth. near the level of Knuth multivolume book or GOF Design Patterns. > Something full of wisdom :-) that really teaches multithreading programming. > > Or maybe you know about good web sites about multithreading? > > Thanks, > Michael =================================TOP=============================== Q390: Is there any (free) documentation? >> Buy Dr. Butenhof's book. >Is there any ***FREE*** documentation? 
a) http://www.lambdacs.com/cpt/FAQ.htm
   http://devresource.hp.com/devresource/Docs/TechPapers/PortThreads.html
   http://sources.redhat.com/pthreads-win32/
   http://www.humanfactor.com/pthreads/
   http://www.cs.msstate.edu/~cr90/doc/sun/POSIXMultithreadProgrammingPrimer.pdf
   http://twistedmatrix.com/users/jh.twistd/cpp/moin.cgi/ThreadingArticles

b) Do your own research in future.

c) Buy the damned book.

[Let me repeat, Buy the damned book, Buy the damned book -Bil]

=================================TOP===============================
Q391: Grafting POSIX APIs on Linux is tough!

>> Grafting POSIX APIs on top of a kernel that refuses any support for POSIX
>> behavior is tough.
>
> This discussion attracted my attention. I'm in the process of
> implementing a two-level pthread system, and deferred cancellation has
> recently popped up on my to-do list. The technique of wrapping system
> calls with a check, while doable, is unpleasant for performance
> reasons. However, I'm at a loss as to what I can make the kernel do
> that will help.
>
> One of my design goals has been to keep the kernel ignorant of the
> upper (application thread) layer, so the kernel can't just check a
> flag in a thread's control structure on entry to read(), for
> example. Also, since there's no fixed binding between application
> threads and kernel execution contexts, it's not reasonable to set a
> per-KE flag saying "hey, something here wants deferred cancellation" -
> in addition, the flag would have to be set or cleared at every thread
> context switch until the target was cancelled.

Sorry, but you want my opinion, here it is.

Any attempt at 2-level scheduling where the user and kernel level schedulers do not have constant and high-bandwidth communication to share information is completely broken and absolutely useless.

The quest to keep the kernel ignorant of the user scheduler is going off completely in the wrong direction. You want more and tighter coordination, not less.
If you're not willing to take on that challenge, then stick with LinuxThread-style "1-1" kernel threads. It's simple, it works. There's no such thing as a "simple" 2-level scheduler, and anything that comes even close to working is not only extraordinarily complicated, but extends tendrils throughout the kernel and development environment (debugging, analysis tools, etc.). Solaris has backed off on 2-level scheduling with the alternate "bound thread" library in Solaris 8 because, in my experience and opinion, they lacked the will to increase the integration between kernel and user schedulers to the point where it could approach "correct". > Digital Unix is the only system I'm aware of that has similar > constraints. What does it do? We have the best 2-level scheduler -- arguably the only REAL 2-level scheduler in production UNIX -- precisely because we have tightly coupled integration between user and kernel. We're adding more, because we still suffer from disconnects in scheduling behavior. (For example, we don't have any way to know whether a kernel thread is currently running, or the actual instantaneous load of a processor.) We have the best architecture for debugging and analyzing applications using 2-level threads, but we continue to find subtle places where it just doesn't work, and it's not uncommon for me to go off and spend a month or two hacking around to isolate, identify, and fix some subtle bug. Like everyone else around here, I started out as a big fan of 2-level scheduling. Heck, it's a wonderful engineering challenge, and a lot of fun. But I have become greatly unconvinced that the quest for perfect 2-level scheduling can be worth the effort. I can write any number of programs that will "prove" it's better... but I can just as easily write an equal number of programs that will prove the opposite. 
Given that it doesn't benefit HPTC, and may hurt (but at least doesn't help) most I/O bound server type applications, it's hard to rationally justify the substantial investment.

I think, in particular, that the notion of adding 2-level scheduling to Linux is counterproductive. (I'm trying to restrain myself and be polite, though I'm tempted to use stronger words.) The basic motivation for some of this work appears to be the ability to run "wild" Java apps with bazillions of threads without having to increase the Linux process configuration limits. Cute, but this is much more a "fun engineering experiment" than a practical solution to the problem. For one thing, Java or not, apps with that many threads will spend most of their time tripping over their own heels. It's largely a symptom of the lack of select() or poll() in Java, and I've seen some indication that this has finally been addressed.

/------------------[ [email protected] ]------------------\
| Compaq Computer Corporation POSIX Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/

=================================TOP===============================
Q392: Any companies using pthread-win32?

>Hello
> I am trying to get a cross-platform thread package.
>
> QThread released by TrollTech is a pretty stable implementation
>although it does not provide enough interfaces to interact with the
>bottom layer pthread package or Win32 APIs but I am satisfied with its
>performance; besides, I can always use pthread or WIN32 APIs to do what I
>want. The only butterfly in the ointment is that my program has to be
>linked with the whole qt library even if I only use a portion of it.
>
> I am considering the pthread-win32 port but I am not sure whether it is
>stable enough for a commercial product; plus I am afraid a wrong decision
>will have group-wide effects.
>I just wonder whether there are any companies
>implementing their commercial product using pthread-win32? How does
>pthread-win32 implement pthread_atfork()?
>
>Thanks
>
>TCS

PocketTV (http://www.pockettv.com)

It is an MPEG Movie Player for Pocket PC devices (e.g. iPaq, Cassiopeia, Jornada). It is built on pthread-win32 (ported to Windows-CE). Works flawlessly.

Thank you all!

-t

=================================TOP===============================
Q393: Async-cancel safe function: guidelines?

> 1.When a function claims to be async cancel safe, what are the
> guidelines by which it can be classified as async-cancel safe.

A function is async-cancel safe if asynchronous cancellation of such code will not corrupt data or otherwise prevent continued execution of threads within the process address space.

> Butenhof's book on pg 151 states that no function that acquires
> resources while async cancel is enabled should be called.

I like to keep my sweeping statements relatively simple. Otherwise, I'd've had to promise an infinite series of thread books delving into minutiae in awe-inspiring (and deadly boring) detail. (Come to think of it, I've already written a lot of that "infinite series" in articles to this newsgroup.)

All things are possible, including that some types of applications (particularly in a monolithic embedded system environment) might be able to be cleverly coded so that they could recover from asynchronous cancellation of a thread that owned resources. YOU, however, cannot do it; not on any general purpose UNIX system, anyway, because your application depends on invariants and resources that you can't even see, much less control. (For example, C stdio and malloc, C++ iostreams, POSIX mutexes, etc.)

> But if I am implementing pthread_cancel which should be async cancel
> safe by POSIX, there are critical sections present in pthread_cancel
> implementation. But I should not use a mutex to lock, but some
> mechanism to disable a context switch. Is this correct??
Absolutely not. "Disabling context switch" is grotesque and almost always improper.

There's absolutely no justification for making pthread_cancel() async-cancel safe. It was stupid and pointless. An influential member or two of the working group insisted, and some of us who knew better failed to notice that it had been slipped in until it was too late to change. One member of the group has since hypothesized (from vague memories) that it was done so that the "master" of a compute-bound parallel loop team (e.g., explicit OpenMP type parallelism) could decide to cancel the other members of the group without needing to disable async cancelability. Stupid idea. It's done, so it's going to need to disable async cancel anyway.

Furthermore, the only practical way to make pthread_cancel() async-cancel safe (the only way to make any useful routine async-cancel safe) is to DISABLE async cancellation on entry and restore the previous setting on exit. That's what I did, and that's what I recommend you do.

Actually, I would prefer to recommend that you ignore the entire issue. Anyone who calls pthread_cancel() with async cancelability enabled is a fool, and their code probably won't work anyway. In fact, async cancellation is dead useless. It was an idea POSIX inherited from our CMA architecture, which had gone through years of theoretical evolution but no practical application. Async cancelability had seemed theoretically useful, but in retrospect it is not of any particular use, and far more trouble than it's worth. It shouldn't have been in the standard at all. Even granting async cancellation, pthread_cancel() has no business being async-cancel safe.
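The "disable on entry, restore on exit" recommendation can be sketched roughly as follows. This is only an illustration, not the code of any real thread library: update_shared_state(), shared_counter and current_cancel_type() are made-up names, and the "work" is a trivial increment. The one thing it leans on is that POSIX lists pthread_setcanceltype() itself among the async-cancel-safe functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared state that must never be left half-updated. */
static int shared_counter = 0;

/* Hypothetical library routine: safe to call with async cancel enabled,
 * because it switches to deferred cancellation while it works. */
int update_shared_state(void)
{
    int old_type;
    int ignored;
    int rc;

    /* On entry: force deferred cancellation, remember the caller's type. */
    rc = pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &old_type);
    if (rc != 0)
        return rc;

    shared_counter++;   /* the work; contains no cancellation points */

    /* On exit: restore whatever cancelability type the caller had. */
    return pthread_setcanceltype(old_type, &ignored);
}

/* Demonstration helper: report the calling thread's cancelability type
 * without changing it (set, read the old value, put it back). */
int current_cancel_type(void)
{
    int type;
    int ignored;

    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &type);
    pthread_setcanceltype(type, &ignored);
    return type;
}
```

A caller that has enabled PTHREAD_CANCEL_ASYNCHRONOUS gets the same type back when update_shared_state() returns; a caller in deferred mode sees no change at all.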
Your implementation is quite unlikely to be 100% POSIX conformant anyway, no matter what you do or test (even the UNIX 98 VSTH test suite, for which you'd have to pay The Open Group, isn't perfect or complete), and this is such a trivial and stupid issue that you should worry about it only if you're an "obsessive compulsive completion freak" and you've got nothing else in your life to worry about. ;-) ;-) > 2.How can asynchronous cancellation be implemented? POSIX standard says > that when cancelability type is PTHREAD_CANCEL_ASYNCHRONOUS, new or > pending cancellation requests may be acted upon at "any" time. In > letter not spirit of the standard, this can also be implemented as > PTHREAD_CANCEL_DEFERRED,that is only when it reaches a cancellation > point. But in spirit, would it have to be checked at a timer > interrupt/scheduler reschedule? Yes, that's the idea of the "at any time" loophole. Some implementations might choose to defer delivery until a timeslice interrupt or at least the next clock tick. You could also do it using a special signal (though it more or less has to be a special OS-reserved signal to avoid polluting the standard application signal number space). (There's no precise guarantee when a signal will be delivered, either.) /------------------[ [email protected] ]------------------\ | Compaq Computer Corporation POSIX Thread Architect | | My book: http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/ >Hello, >I am implementing a POSIX interface for a proprietary RTOS. >The basic schedulable entity is thread, and the OS is completely written >in C++.I am extending this by adding POSIX relevant functionality and >providing a C interface,by internally instantiating the thread class. >In this context, I have a few clarifications. 
>
>1.When I do a pthread_cancel, I have to invoke the cancellation handlers
>if it has not been explicitly called by a pthread_cleanup_pop(nonzero
>value) and at pthread_exit.

Yes. Normally, the pthread_cleanup_pop() would do whatever it takes to remove the cleanup from consideration.

>I have a plan of handling this using exceptions.On an ASYNC cancel, an
>exception object is thrown, which would contain the pointers to the
>cleanup handler function.

It's probably better to set up pthread_cleanup_push() and pthread_cleanup_pop() as macros that open and close a statement block. In this statement block, they can set up exception handling. E.g. if you are using purely C++ everywhere, and do not plan to support C, then you could use C++ exception handling for these. E.g. set up a class like this:

    class __pthread_cleaner {
        void (*__cleanup_func)(void *);
        void *__context;
        bool __do_execute;
    public:
        __pthread_cleaner(void (*__f)(void *), void *__c)
            : __cleanup_func(__f), __context(__c), __do_execute(true) { }
        // Runs the handler on normal block exit and during stack unwinding.
        ~__pthread_cleaner() { if (__do_execute) __cleanup_func(__context); }
        // Called by pthread_cleanup_pop() to say whether the handler should run.
        void __set_execute(bool state) { __do_execute = state; }
    };

Now:

    #define pthread_cleanup_push(F, C) { \
        __pthread_cleaner __cleaner(F, C);

    #define pthread_cleanup_pop(E) \
        __cleaner.__set_execute((E) != 0); \
        }

>At the exit point I have a catch clause for this exception object,which
>would then call the handler.
>Is this design proper, or are there any subtle issues, which should be
>handled to preserve POSIX semantics.
>
>1.One issue I am aware of is during the stack unwinding, only destructor
>of local objects would be invoked. Since the cancellation handlers are
>invoked at the end, and if the handlers makes a reference to some object
>which would have been destroyed,there would be a problem.

The right thing to do is to do everything in the proper nesting order.

>I am using a customized version of gcc.
>Is there any interface I have to add to gcc to handle this?
>
>2.The cleanup handler has to be invoked in user mode.As the exception is
>non resumptive, when I am in the catch clause which is placed in the
>kernel,the mode would have to be changed, else the handler would execute
>in kernel mode.

You need some operating system mechanism for passing the low level operating system exception to the user space program. Then in the user space program you need to handle this and invoke the user space mechanism.

E.g. over a POSIX-like kernel, operating system exceptions are represented as signals. So a signal handler can catch such signals and turn them into, say, C++ exceptions in some platform-specific way. The notification delivery semantics of POSIX cancelation are similar to signals, and are in fact designed to be implementable over signals.

>On a general note(probably OT) are there any guidelines for handling
>stack unwinding,in a mixed C/C++ call stack scenario.

They are compiler specific. Using gcc, there are ways to insert exception handling hooks into C compiled code, because gcc and g++ have a common back end. For example, the Bounds Checking GCC (GCC patched to do pointer arithmetic checking) inserts construction and destruction hooks into functions which call into a special library for tracking the existence of objects and their boundaries. Ironically, this patch is currently only for the C front end. :)

That patch does it by actually modifying the compiler to insert the code; I don't know whether it's possible to hack up some macros in GNU C itself to hook into exception handling, without mucking with the compiler. Research item!

=================================TOP===============================
Q394: Some detailed discussion of implementations.

> >> Nevertheless, in any implementation crafted by "well intentioned,
> >> fully informed, and competent" developers, it must be possible to
> >> use pthread_atfork() (with sufficient care) such that normal
> >> threaded operation may continue in the child.
> >> On any such
> >> implementation, the thread ID of the child will be the same as in
> >> the parent,
> >...
> >> This may not apply to Linuxthreads, if, (as I have always assumed,
> >> but never verified), the "thread ID" is really just the pid. At
> >> least, pthread_self() in the child would not be the same as in the
> >> parent.
> >
> >In LinuxThreads, the `thread ID' is not just the pid, at least in this
> >context. pthread_self() actually yields a pointer to an opaque
>
> In LinuxThreads, pthread_t is typedef'ed as an unsigned long. But you are
> right; it is converted to a pointer internally. LinuxThreads is handle based;
> there is an array of small structs, referred to as handles, which contain
> pointers to the real thread descriptor objects. The thread ID is simply an
> index into this array.
>
> >> The thread library could and should be smart enough to fix
> >> up any recorded mutex (and read-write lock) ownership information,
> >
> >AFAICS, for the above reasons, such a fix-up is unnecessary in
> >LinuxThreads.
>
> Such a fix up simply is necessary, but is unsupported in LinuxThreads. The
> child should reinitialize the locks rather than unlock them. This is stated in
> the glibc info documentation (but I wrote it, so I can't appeal to it as an
> additional authority beyond myself :). That is to say, your child handler of
> pthread_atfork() should use pthread_mutex_init() to reset locks rather than
> pthread_mutex_unlock(). All thread-related objects should be treated this way:
> condition variables, semaphores, you name it.
>
> The child process could inherit objects such as mutexes in an arbitrary state.
> For example, a locked mutex could have a queue of waiting threads, and a
> condition variable can have a similar queue. An unlock operation would then try
> to pass the mutex to one of these threads, with strange results.
> That is not to mention that some objects have internal locks which are not
> taken care of across fork. E.g.
some thread could be inserting itself on > a condition variable queue while the fork() happens. The child will then > inherit a condition variable with a locked internal lock, and a half-baked > wait queue. > > POSIX basically says that multithreading in the child process is not useful, > because only async safe functions may be used, so the way I see it, > LinuxThreads is off the hook for this one. But it does provide *useful* > multithreading behaviors across fork, above and beyond POSIX, provided you > follow its recommendations. You *can* use the pthread functions in the > child. Yes, there is a "loophole" (a big one) in the standard. It retains the 1003.1-1990 wording insisting that one cannot call any function that is not async signal safe in the child. This completely contradicts all of the intent and implications of the pthread_atfork() mechanism, giving you a clear and undeniable excuse for avoiding the issue of allowing threaded programs to "keep on threadin'" in the child. But then, you're not doing that, are you? You clearly intend to allow continued threading, but in a manner that clearly and broadly violates several aspects of the standard. The loophole doesn't apply here, Kaz! (You're not alone here, though. Even someone in the working group made this mistake in writing the pthread_atfork rationale [B.3.1.3] by suggesting that some libraries might use only the CHILD handler to reinitialize resources. That would be OK only if one could know that state of the resources in the parent allowed reinitialization, implying there were no threads or the resources hadn't been used, in either of which cases "reinitialization" might be legal, but would be useless and redundant.) POSIX has strict rules about mutex usage. One cannot reinitialize a mutex without destroying it. One cannot destroy a mutex that's locked, or "referenced" (e.g., by a condition wait, or by threads waiting to lock). One cannot unlock a mutex one doesn't own. 
Even if one could, the fact that the mutex is locked implies that application data is inconsistent; "reinitializing" the mutex without repairing all of that data (even if it was legal and portable) would just mean that incorrect data is now properly protected. Most applications cannot reasonably be expected to analyze and repair their data under such (essentially asynchronous) interruption of synchronized activities.

An implementor is faced with a choice here. One can go with the strict POSIX and say that pthread_atfork() handlers are the "POSIX appendix" -- theoretically promising, but in practice useless. One must provide the interface, but admit the implementation does not allow you to make use of it. The loophole indeed allows this, though it seems a bit cowardly to me. (On the other hand, "getting it right", even inside the thread and C runtimes, is a bloody nightmare; and often cowards live longer and happier lives than heroes. ;-) )

OR, one can implement the full (though poorly and incompletely stated) intent of the working group. This means that one must be able to lock a mutex in the PREPARE handler, and unlock it in both the PARENT and CHILD handler. That implies that fork() must do whatever is necessary to make the mutex consistent and usable in the child, including any steps required to "pass ownership" from the parent thread to the child thread. (If the mutex was owned by any other thread in the parent process, all bets are off.) It also means that any waiting threads, and any recorded association with a condition variable wait, be "cleaned up" (removed) by the thread library. Similarly, any condition variable wait queues must be cleaned up, because the condition variable cannot be reinitialized if there are waiters.
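For readers who have not seen the PREPARE/PARENT/CHILD pattern written down, here is a minimal sketch of the usage the working group intended: lock in the prepare handler, unlock in both the parent and the child handler. Everything named here (table_lock, table_value, install_fork_handlers(), fork_and_use_lock()) is invented for illustration, and whether the unlock in the child really works is exactly the implementation question discussed above, so treat this as a sketch under that assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical library data protected by a mutex. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int table_value = 0;

/* PREPARE: runs in the forking thread before fork(); taking the lock here
 * guarantees the data is consistent at the instant of the fork. */
static void prepare(void)   { pthread_mutex_lock(&table_lock); }

/* PARENT and CHILD: run after fork() in the respective process and release
 * the lock that the forking thread acquired in prepare(). */
static void in_parent(void) { pthread_mutex_unlock(&table_lock); }
static void in_child(void)  { pthread_mutex_unlock(&table_lock); }

int install_fork_handlers(void)
{
    return pthread_atfork(prepare, in_parent, in_child);
}

/* Fork; the child takes and releases the lock again to show it is usable.
 * Returns the child's exit status (0 on success), or -1 on error. */
int fork_and_use_lock(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (pthread_mutex_lock(&table_lock) != 0)
            _exit(1);
        table_value = 42;
        if (pthread_mutex_unlock(&table_lock) != 0)
            _exit(2);
        _exit(0);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Call install_fork_handlers() once during library initialization, before any fork. Note that this only covers a mutex taken by the forking thread in the prepare handler; as the text says, if some other thread owned it, all bets are off.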
/------------------[ [email protected] ]------------------\ | Compaq Computer Corporation POSIX Thread Architect | | My book: http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/ =================================TOP=============================== Q395: Cancelling a single thread in a signal handler? > Kaz Kylheku wrote: > > : Again, like pthread_create, pthread_cancel is not an async-safe function that > : can be called from a signal handler. > > Hmm, I've written code in a signal handler to cancel a single thread on > shutdown. What's the possible side effects of doing this since a signal > handler is not thread safe? The consequences are undefined, unspecified, and unportable. It may work. It may fail. It may cause a SIGSEGV. It may silently terminate the process with no core file. It may crash the system. It may cause the programmer to break out in a rash. It may cause the end of the universe, though that's perhaps slightly less likely. You can't do it, so don't worry about what might happen if you do it anyway. The actual answer will vary between platforms, between releases on the same platform, and perhaps even from day to day due to subtle interactions you haven't anticipated. Therefore, "I tried it and it worked" (which actually means "I tried it and I didn't observe any overt evidence that it didn't work") isn't good enough. Sometimes "pushing the limits" just doesn't make any sense. Dangerous coding practices like this will often work fine during development, and even during controlled testing, and will fail on the site of a critical customer on their production system under heavy load, which they cannot allow you to take down or closely analyze. 
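One commonly used way to stay inside the rules, offered here as a sketch rather than as anything from the original thread: let the signal handler do nothing but set a flag of type volatile sig_atomic_t, and call pthread_cancel() later from ordinary thread context. The names on_shutdown_signal(), worker() and run_until_signal() are made up, and raise() stands in for a signal arriving from outside.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* The only thing the handler touches: a flag that is safe to set
 * from a signal handler. */
static volatile sig_atomic_t shutdown_requested = 0;

static void on_shutdown_signal(int signo)
{
    (void)signo;
    shutdown_requested = 1;    /* no pthread calls in here */
}

/* Hypothetical worker: loops forever; sleep() is a cancellation point. */
static void *worker(void *arg)
{
    (void)arg;
    for (;;)
        sleep(1);
    return NULL;
}

/* Returns 0 if the worker was cancelled cleanly, -1 otherwise. */
int run_until_signal(int signo)
{
    struct sigaction sa;
    struct timespec tick = { 0, 1000000 };   /* 1 ms polling interval */
    pthread_t tid;
    void *result;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_shutdown_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) != 0)
        return -1;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;

    raise(signo);                 /* stand-in for an external signal */

    while (!shutdown_requested)   /* ordinary code polls the flag... */
        nanosleep(&tick, NULL);

    if (pthread_cancel(tid) != 0) /* ...and cancels from thread context */
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;
    return result == PTHREAD_CANCELED ? 0 : -1;
}
```

Polling is crude; a thread dedicated to sigwait() avoids both the handler and the polling loop, at the cost of blocking the signal in every thread.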
/------------------[ [email protected] ]------------------\
| Compaq Computer Corporation POSIX Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/

=================================TOP===============================
Q396: Trouble debugging under gdb on Linux.

>I'm having a lot of trouble debugging a multi-threaded
>application (a CORBA app) under gdb on Linux (Mandrake 7.1).
>When I set a breakpoint and a thread hits it, gdb shows me a
>different thread's stack, one that isn't anywhere near the
>breakpoint.

Try: http://www.kernelnotes.org/lnxlists/linux-kernel/lk_0007_03/msg00881.html and related messages.

Bye
Thomas

=================================TOP===============================
Q397: Global signal handler dispatching to threads.

> I am fairly new to threads. I have heard it mentioned that you can have
>a global signal handler which will dispatch signals to other threads in an
>app. I would like to know how this is achieved. I have tried adding a
>signal handler for each thread when it starts and they do not seem to
>receive the signals when I send them (via kill). I was thinking if you kept
>track of all pids which got started you could then send them signals but as I
>stated earlier the threads don't seem to receive the signals. If someone
>could explain this with a simple code snippet I would appreciate it.

First off, since you didn't say what specific pthreads implementation you're using, I'll be answering for POSIX; there are signal weirdnesses in most (all? probably not) implementations because it's hard to get right, and some believe that the POSIX spec itself did not get some things right.

Anyhow: Signals and threads are an ugly mix. You've got the asynchronicity of threads multiplied by the asynchronicity and complex rules (though a lot of people don't seem to know them) of signal handlers.
Most high-performance thread implementations start doing ugly things if you call threading functions inside signal handlers.

In POSIX threads, the signal disposition (i.e. handler) table is a process resource. Thus it is not possible to have different actions for the same signal in different threads without a lot of hard, probably non-portable, work. The signal *mask* on the other hand (blocked signals) is per-thread, so that each thread may indicate what signals may be delivered in its context.

The easiest way to deal with signals in multithreaded systems is to avoid them if at all possible. Barring that, have the initialization code block all signals before creating any threads, and have one thread dedicated to doing sigwait() for interesting signals. This removes the (obnoxious?) multiplicative asynchronicity.

Signal delivery in a POSIX environment is quite simple to describe, but has lots of subtle consequences. When a signal is generated asynchronously to a process (i.e. sent with kill(), sigqueue(), or one of the POSIX.1b functions that can cause a signal to happen), the following steps occur:

1. If any thread is in a sigwait(), sigtimedwait() or sigwaitinfo(), that thread will synchronously "accept"[1] the signal.

2. The system looks at the signal masks of all threads in the process, in unspecified (i.e. random) order, looking for a thread that has that signal unblocked. The signal will be "delivered"[1] in the context of one of the threads that have the signal unblocked.

3. If none of the threads are accepting the signal or have it unmasked, the signal will remain pending against the process, and the first thread that unmasks or accepts the signal will get it.

Now, for synchronous signals (i.e. those that are attributable to a specific thread[2]) and signals sent via pthread_kill to a specific thread, it's a slightly different story. The signal is delivered (or made pending) to that thread, and will only be deliverable (or acceptable) from that thread.
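Since the poster asked for a snippet, here is a minimal sketch of the "block first, then dedicate one thread to sigwait()" arrangement. It is an illustration only: signal_thread(), wait_for_dispatched_signal() and the last_signal variable are invented names, the dispatcher handles a single signal and then exits, and the "prearranged inter-thread mechanism" is simply a mutex and condition variable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Prearranged inter-thread mechanism: a flag guarded by a mutex and
 * announced with a condition variable. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int last_signal = 0;

/* Dedicated signal thread: accepts one signal synchronously with
 * sigwait() and hands it on using ordinary pthread calls. */
static void *signal_thread(void *arg)
{
    sigset_t *set = arg;
    int signo;

    if (sigwait(set, &signo) == 0) {
        pthread_mutex_lock(&lock);
        last_signal = signo;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Blocks signo, starts the signal thread, sends signo to the process,
 * and returns the signal number the dispatcher reported (or -1). */
int wait_for_dispatched_signal(int signo)
{
    sigset_t set;
    pthread_t tid;
    int seen;

    sigemptyset(&set);
    sigaddset(&set, signo);

    /* Block BEFORE creating threads: new threads inherit this mask. */
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;
    if (pthread_create(&tid, NULL, signal_thread, &set) != 0)
        return -1;

    kill(getpid(), signo);        /* process-directed, as with kill(1) */

    pthread_mutex_lock(&lock);
    while (last_signal == 0)
        pthread_cond_wait(&cond, &lock);
    seen = last_signal;
    pthread_mutex_unlock(&lock);

    pthread_join(tid, NULL);      /* 'set' stays valid until here */
    return seen;
}
```

Because the signal is blocked in every thread, it either is accepted at once by the thread already sitting in sigwait(), or stays pending against the process until that thread gets there; no handler ever runs.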
Note also that blocking or SIG_IGNing SIGFPE, SIGILL, SIGSEGV, or SIGBUS causes undefined behavior unless the signal in question is generated by sigqueue(), kill(), or raise().

And finally, you asked about a global signal handler that dispatches to other threads. The way that one would implement that would be to have a single thread sigwait()ing on everything, and when it gets one, does some amount of magic to figure out what thread needs to be kicked, and use some prearranged inter-thread mechanism (probably *not* pthread_kill()!) to let the victim know.

There, did that make your head hurt?

[1] These are POSIX-defined terms. The phases of a signal's lifetime are reasonably called out in POSIX.

[2] Whether alarm() expiration is attributable to a specific thread is up for some debate. POSIX gives illegal instructions and touching invalid memory as examples of signals attributable to a specific thread.

=================================TOP===============================
Q398: Difference between the Posix and the Solaris Threads?

> >Hi,
> >
> >What is the difference between the Posix and the Solaris Threads?
>
> Solaris Threads is really called UI (Unix International) Threads.
> It predates POSIX threads. The API was developed by AT&T (maybe also
> Sun, I'm not sure). The header file (/usr/include/threads.h) could
> be found on many SVR4 Unixes, although rather confusingly, the actual
> library (/usr/lib/libthread.so) was shipped with far fewer. I
> rather presumed from this that it was up to each vendor to actually
> implement their own libthread.so, and more significantly the kernel
> support underneath, but most SVR4 vendors didn't bother; Sun Solaris
> and Unixware [Novell at the time] were the only two that did AFAIK.
Sun put together a "real thread package" for the follow-on to their 4.x series systems, which had contained a purely dreadful attempt at a user-mode thread package (confusingly called "LWP", the same as the name given to the "kernel entities" supported later). When they started working with USL (the semi-autonomous arm of AT&T tasked with controlling System V UNIX) and later UI (UNIX International, the even more semi-autonomous group put together to try to separate yet more from AT&T control in answer to the apparently perceived threat of OSF) on what became "Solaris 2" (the new SVR5-based Solaris kernel, internally designated "5.0"), the new thread package became part of SVR5, in an update designated SVR5 4.2MP.

The interface developed along with the early drafts of POSIX, which is why there's so much similarity. Whereas POSIX put all the "core" thread functions into <pthread.h>, UI threads has several headers, including the (mis-)referenced <thread.h> and <synch.h> for mutex and condition operations.

By the end of the process, the avowed goal of UI threads was to have a short-term "standard" they could use while the long process of getting industry-wide consensus on POSIX slogged onward. This was, in many ways, a better strategy than OSF's, which was to implement an early draft (draft 4, aka "DCE threads") of the POSIX spec and ship that. Having a different name space avoided a lot of confusion, and made it easier to support both once the standard came out. (That's a major reason why Solaris was the first major UNIX OS to ship with POSIX support; most of the others had been using OSF's "DCE threads", and had a lot more compatibility issues to resolve.)

> The two APIs are very similar - in some areas pretty much just different
> function call names. However, there are some underlying differences
> in semantics when you get into the details, such as the behaviour of
> fork() / fork1() and SIGALRM in the two models, so you can't simply
> convert one to the other with a few global exchanges.
UI threads insisted on the utterly silly, complicated, and ultimately pointless "forkall" model, where your child process is transparently fired up with a whole set of asynchronous cloned threads in unknown states, that have no way of knowing they're running in a different address space. This was intended to "relieve the programmer" of dealing with POSIX "atfork" handlers, but (though atfork handlers are hardly a perfect solution, or necessarily easy to use), "forkall" is far worse. UI threads had "per-LWP" delivery of SIGALRM. Rather pointless, except perhaps when a UI/POSIX thread is "bound" (POSIX system contention scope) to the LWP. POSIX refused to accept this, as it makes for a more complicated signal model at the program level. (We didn't like different asynchronous signals behaving differently.) Solaris has been announcing for quite a while now that the per-LWP timer signals are an obsolete feature that will be going away (for UI threads as well as POSIX threads). As with many other obsolete features, whether it actually ever WILL go away is anyone's guess. > Anyway, POSIX threads has now taken over; no one should be writing > code using UI threads any more. Well, if there is still some release, somewhere, that has UI threads but not POSIX threads, and if you were writing code for that system that didn't need to be portable, or needed to port only to other SVR5 4.2MP systems (e.g., Solaris and/or UnixWare), you might have a convincing argument for using UI threads. ;-) Other than that, yeah; use POSIX threads. /------------------[ [email protected] ]------------------\ | Compaq Computer Corporation POSIX Thread Architect | | My book: http://www.awl.com/cseng/titles/0-201-63392-2/ | \-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/ =================================TOP=============================== Q399: Recursive mutexes are broken in Solaris? 
Do a google/deja search on this newsgroup for: "solaris recursive mutex broken" and you'll see a discussion of the details of this, I believe with links to sun's patches for this problem. Beware that even with this patch, you may not use recursive mutexes in the pthread_condition* calls at all in solaris, even I believe in solaris 8, even if the lock count of the mutex never goes above 1 lock. "Roger A. Faulkner" wrote in message news:[email protected]... [snip] > You are almost certainly a victim of this bug: > > Bugid: 4288299 > Synopsis: Recursive mutexes are not properly released > > This has been patched, at least in Solaris 8. > Sorry, I don't know the patch-id. > > Roger Faulkner > [email protected] > > =================================TOP=============================== Q400: pthreads and floating point attributes? > Does the pthreads standard have anything to say about whether threads > inherit the floating point attributes of the main program? > > In particular, I am working on a system that implements the > fesetflushtozero() function to change the underflow behavior of IEEE > floating point from gradual to sudden. The man pages on this system > say that this function has been adopted for inclusion in the C9X > standard. What I need to know is whether the effect of calling this > function in the main program should be inherited by threads that are > subsequently created by the main program? On the system that I am > working on, this turned out not to be the case, and before reporting > this as a bug, I need to know whether this constitutes a bug, from the > standpoint of the pthreads standard, or whether this is just a quality > of implementation issue. POSIX 1003.1c-1995 (and UNIX 98) doesn't say anything about floating point state, because that's not really a portable concept. (Some machines might not even HAVE floating point. Hey, POSIX doesn't even insist that a machine have a STACK.) 
On an implementation for a machine that does have software-controllable floating point state, it might make sense for the state to be inherited by a created thread. On the other hand, from a quick reading of the C99 description of the floating-point environment functions, I can't really infer what the designers of the C99 features might have wished when they're extended into multiple threads.

(By the way, I don't see any mention in C99 of fesetflushtozero(), nor any way to control the behavior of IEEE underflow except to enable or disable reporting of the exception. I don't know whether this is a proprietary EXTENSION to ANSI C99 on your platform, [which they might have hoped would be added to the standard], or a preliminary feature in "C9X" that was removed before the final version.)

I can't say whether the behavior you infer is "a bug", but it's definitely NOT in violation of POSIX or UNIX 98. You don't say what platform you're using, so it's not possible for anyone to say authoritatively whether that platform is expected to behave the way you'd like (which would make the behavior a bug), or whether it's supposed to behave the way you observe (in which case it's a feature you don't happen to find convenient).

We're in the process of finishing up the new versions of POSIX and UNIX. They are now based on C99, but there's no specification of how (or if) the C99 floating-point environment is inherited, by fork, exec, or pthread_create. While both exec and fork have "catch all" phrases that end up requiring the state to be inherited, there's no such statement in pthread_create, which makes the behavior undefined. (And up to the whims, or oversights, of the implementor.)

As I said, I'm not really sure what the designers of the C99 features might have wanted in each of these cases. I think that what seems to make sense to me would be for fork and pthread_create to inherit, but for exec to reset to the system default.
/------------------[ [email protected] ]------------------\
| Compaq Computer Corporation POSIX Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/

=================================TOP===============================
Q401: Must SIGSEGV be sent to the thread which generated the signal?

> In comp.programming.threads Florian Weimer wrote:
>
> : SIGSEGV is sent to the thread which generated the signal. Does the
> : standard mandate this?
>
> The POSIX standard mandates that a thread that misbehaves and causes a
> SIGSEGV to be generated will have that signal delivered to the thread in
> question.

More generally, any signal caused by the direct and immediate action of a thread must be delivered to the thread. These are "thread directed" signals, and include SIGSEGV, SIGBUS, SIGFPE, and anything sent by pthread_kill() or raise(). Signals that are directed at the process, though, cannot be restricted to a particular thread. This includes kill(), and the normal "asynchronous" external signals such as SIGCHLD, SIGIO, SIGINT, etc.

> Most other signals will be delivered to a random thread that does not have
> the signal masked.

This is the crux of the problem with the Linux implementation. If the PROCESS receives a signal, and it happens to come into a thread (Linux process) that has the signal blocked, the PROCESS cannot receive the signal until the THREAD unblocks it. This is bad. POSIX requires that a PROCESS signal go to any thread that does not have the signal blocked; and if all have it blocked, it pends against the PROCESS until some thread unblocks it. (Linuxthreads can, to some extent, fake this by never really blocking signals, and having the handlers echo them to the manager thread for redirection to some child that doesn't have the signal blocked... but that's complicated, error prone, [e.g., the signal will still EINTR a blocking call, whereas a blocked signal wouldn't], and all that echoing is relatively slow.)

Lots of people may disagree with the POSIX model. It was the hardest part of the standard. We fought for years before Nawaf Bitar fearlessly led (and cattle-prodded) the various camps into "the grand signal compromise" that ended up forming the basis of the standard. Still, the major alternative to what we have would have been a model based on "full per-thread signal state", and that would have made a real mess out of job control because, to stop a process, you'd need to nail each thread with a separate SIGSTOP... problematic if it's dynamically creating threads while you work. (And, sure, there are infinite shadings between the extremes; but that's where people really started knocking each other over the head with chairs, and it just was not a pretty scene.)

/------------------[ [email protected] ]------------------\
| Compaq Computer Corporation POSIX Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/

=================================TOP===============================
Q402: Windows and C++: How?

% I am writing a program for windows 98 with visual c++. I want to create
% a thread : what is the best solution : beginthread or createthread and what
% is the difference between them ? When I want to use beginthread the
% compiler says he doesn't know this.

It's _beginthread. You should use _beginthread or _beginthreadex because it's the C library interface to CreateThread. There's no guarantee that C library functions will work correctly if you use CreateThread. It doesn't really matter in what way they will fail to work correctly, you might as well just use _beginthread.

% So I've used CreateThread. I wanted to add my thread-routine to a class but
% this didn't work. Has anyone a solution for this ?.

This is a quite commonly asked question, which to me suggests that a lot of people ought to think more carefully about what they're doing. The thread function has to have a particular prototype, and if you try to use a function with a different prototype, chances are that it won't work. C++ is particularly anal about ensuring that prototypes match (sometimes to the point where they render the prototypes useless -- the other day, I had a C++ compiler reject a char * argument to a function because the function expected const char *), so it won't compile if you use a function with the wrong prototype, and class member functions never qualify as having the right prototype.

% Because I couldn't add
% my thread routine to a class I declared it as a friend. But I have to give

This is the right thing to do.

% an argument by reference (my class) to my thread-routine and I don't know

But you can't do this. You must pass a void pointer. You can make this a pointer to an object and cast it on the call and in the function itself. Oh, looking at your code, you're not doing a C++ pass by reference, so what you're doing is OK.

% Cserver :: Cserver()
% {
% HANDLE hThread;
% DWORD dwThreadId, dwThrdParam = 1;
% hThread = CreateThread(NULL , 0 , &initialisatie , this , 0 , &dwThreadId);

Strictly speaking, this is not legal. You can't start using the object until the constructor returns.

% DWORD WINAPI initialisatie(void * LpParam )
% {
% Cserver * temp = (Cserver *)(LpParam);
% }

And this isn't doing anything. You really ought to have a temp->something() in here to actually use the class. If you post an example showing the problem, it'll be possible to pick nits with it.

--
Patrick TJ McPhee

You should always use _beginthread or _beginthreadex. (Notice the "_" before the function name; that is why your compiler didn't 'know' beginthread, it 'knows' _beginthread.) Don't forget to include <process.h>. They call CreateThread internally.
The difference is that _beginthread and _beginthreadex perform cleanup of the standard C libraries on termination.

You can not have the threaded function be a regular member of the class because member functions are invisibly passed the this pointer as part of the argument list. You CAN make it a static member function and pass it the this pointer as the argument to the thread. If there is other data you need passed to the thread, create a structure and pass a pointer to the structure (create it using 'new' and 'delete' it in the thread after you extract the values you need).

What I normally do to keep the code clean and easy to read is create a static start function to be executed as the thread; then I call my member function from there as the only instruction. I use _beginthreadex because it's more like CreateThread and it allows for more control under NT. Like this:

    #include <windows.h>
    #include <process.h>    // _beginthreadex

    class CThreadedClass;

    // Arguments for the second thread. Allocated with new in Begin()
    // and deleted by the thread once it has copied the values out.
    typedef struct {
        int Number;
        int Value;
        CThreadedClass *pMe;
    } MYDATA, *LPMYDATA;

    class CThreadedClass
    {
    public:
        CThreadedClass() : m_nNum(0), m_hThreadHandle(0), m_hThreadHandle2(0) {}
        ~CThreadedClass()
        {
            if (m_hThreadHandle)  CloseHandle(m_hThreadHandle);
            if (m_hThreadHandle2) CloseHandle(m_hThreadHandle2);
        }

        // Static entry points with the signature _beginthreadex expects.
        static unsigned __stdcall StartThread(void *pVoid);
        static unsigned __stdcall StartThread2(void *pVoid);

        unsigned WorkerThread();
        unsigned WorkerThread2(int x, int y);
        void Begin();

    private:
        // Note: both workers add to m_nNum with no locking; protect it
        // (e.g. with InterlockedExchangeAdd) if the total matters.
        long m_nNum;
        HANDLE m_hThreadHandle;
        HANDLE m_hThreadHandle2;
    };

    void CThreadedClass::Begin()
    {
        unsigned lnThreadID;

        m_hThreadHandle = (HANDLE) _beginthreadex(NULL, 0, StartThread,
                                                  (void *) this, 0, &lnThreadID);

        // Allocate memory for the struct dynamically. If you just create
        // it locally it will be destroyed when the function exits and the
        // thread has a pointer to invalid data.
        LPMYDATA lpData = new MYDATA;
        lpData->pMe = this;
        lpData->Number = 20;
        lpData->Value = 6345;

        m_hThreadHandle2 = (HANDLE) _beginthreadex(NULL, 0, StartThread2,
                                                   (void *) lpData, 0, &lnThreadID);
    }

    unsigned __stdcall CThreadedClass::StartThread(void *pVoid)
    {
        CThreadedClass *lpMe = (CThreadedClass *) pVoid;
        return lpMe->WorkerThread();
    }

    unsigned __stdcall CThreadedClass::StartThread2(void *pVoid)
    {
        LPMYDATA lpData = (LPMYDATA) pVoid;
        int a = lpData->Number;
        int b = lpData->Value;
        CThreadedClass *lpMe = lpData->pMe;

        // cleanup
        delete lpData;

        return lpMe->WorkerThread2(a, b);
    }

    unsigned CThreadedClass::WorkerThread()
    {
        for (int i = 0; i < 20000; i++)
            m_nNum += i;
        return 0;
    }

    unsigned CThreadedClass::WorkerThread2(int x, int y)
    {
        for (int i = x; i < y; i++)
            m_nNum += i;
        return 0;
    }

One call to Begin() and your thread is off and running. You can easily access public and private members without referencing a pointer to your object. Much cleaner to debug and read (especially when going back to your code in a few months).

Hope that helps,
Jim

Microsoft has bungled this in a bad way. If you use CreateThread, and also use some functions within the C library, they will allocate thread-local storage which is not destroyed when the thread terminates. If you use _beginthreadex, the C library will call CreateThread for you, and direct that thread to its own thread startup function, which then calls your thread function. When your thread function returns, it will pass control back to the internal startup function, which will clean up the thread-local storage. This is due to the brain-damaged Win32 TLS design, which does not allow thread local storage keys to be associated with destructor functions that are called when the thread terminates.

What's even more braindamaged is that _beginthreadex returns the thread handle cast to an integer. You can cast that back to a HANDLE. Avoid _beginthread; it closes the thread handle internally after making the thread, making it impossible for you to wait on the handle to synchronize on the thread's termination.

>So I've used CreateThread. I wanted to add my thread-routine to a class but
>this didn't work. Has anyone a solution for this ?

This question is asked about twice weekly. Search the newsgroup archives.
It's probably in the FAQ by now.

=================================TOP===============================
Q403: I have blocked all signals and don't get SEGV!

>A thread starts executing code and encounters a bad pointer (non-null but
>rubbish anyway)
>
>NO SEGV or BUSERROR occurs, the thread simply stops executing.
>
>I have blocked all signals and work with a signal thread (with sigwait)
>
>Why doesn't the application crash when it accesses the bad pointer and
>rather stops executing the offending thread.

The application doesn't crash because you've blocked all signals. I suspect what's really happening (and this is an almost-bug in many OSes) is that the trap gets taken, the kernel code sends a SIGSEGV or SIGBUS to the intended target, the target has that signal blocked, so it gets marked as pending, and then the target thread is resumed, only to reexecute the instruction that causes the fault. Lather, rinse, repeat. I suspect you'll see that thread getting a fair amount of CPU time.

The reason your sigwait()ing thread is not seeing the signal is that synchronous signals are *always* sent directly to the thread that generates the fault. Thus, if SIGSEGV was unblocked and handled in the thread that caused the problem, the "normal" control flow *of that thread* would be interrupted to execute the signal handler.

Some OSes have code that says "hey, wait, I just sent that very same synchronous signal for that very same trap" and forcibly unmask (and, usually SIG_DFL) the signal to get the offender out of the process table. It is possible to get caught on that code, but quite rare.

--
Steve Watt KD6GGD PP-ASEL-IA ICBM: 121W 56' 57.8" / 37N 20' 14.9"

=================================TOP===============================
Q404: AsynchronousInterruptedException (AIE) and POSIX cancellation

BTW. I noticed a discussion thread from a couple of months ago regarding AsynchronousInterruptedException (AIE) and POSIX cancellation.
FYI you can not use pthread_cancel to implement the async interrupt. AIE is just an exception (with some special propagation features) and can be caught. As always with Java, interrupt implies cancellation of the work not the worker. It *may* lead to the thread terminating but it doesn't have to.

In contrast POSIX cancellation is termination of the thread. You can still subvert a POSIX thread by going off and doing arbitrary stuff in one of the cancellation handlers, but it would be very difficult (I'd almost say impossible) to map the exception based mechanism over that - you still have to deal with finally blocks, lock releasing etc as the call stack unwinds.

Cheers,
David

=================================TOP===============================
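[Following up on Q404: a minimal C sketch, not from the original post, of the POSIX side of that comparison. It shows cancellation ending the thread itself, with a cleanup handler running on the way out, rather than delivering an exception that the thread can catch. The names note_cleanup, worker and cancel_worker_and_report are hypothetical, and the sketch assumes the default (deferred) cancellation type.]

```c
#include <pthread.h>
#include <unistd.h>

/* Cleanup handler: runs while the cancelled thread is being torn down. */
static void note_cleanup(void *arg)
{
    *(int *)arg = 1;
}

/* Worker: parks in sleep(), which is a cancellation point. */
static void *worker(void *arg)
{
    pthread_cleanup_push(note_cleanup, arg);
    for (;;)
        sleep(1);
    pthread_cleanup_pop(0);   /* never reached; must pair with the push */
    return NULL;
}

/*
 * Hypothetical helper: cancel a worker and report what happened.
 *   bit 0 set -> the cleanup handler ran
 *   bit 1 set -> the thread's exit status was PTHREAD_CANCELED
 * Returns -1 if the thread could not be created.
 */
int cancel_worker_and_report(void)
{
    pthread_t tid;
    void *status = NULL;
    int cleaned = 0;
    int result = 0;

    if (pthread_create(&tid, NULL, worker, &cleaned) != 0)
        return -1;

    pthread_cancel(tid);          /* request is acted on at sleep() */
    pthread_join(tid, &status);   /* also makes `cleaned` visible here */

    if (cleaned)
        result |= 1;
    if (status == PTHREAD_CANCELED)
        result |= 2;
    return result;
}
```

On a conforming implementation cancel_worker_and_report() should return 3: the handler ran and pthread_join() reported PTHREAD_CANCELED. Short of never returning from a cleanup handler, the worker has no way to "catch" the request and carry on with other work, which is the mismatch with AIE described above.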