Dylan Stark edited this page Feb 11, 2015 · 8 revisions

The qthreads API is designed to make using large numbers of threads convenient and easy, and to allow portable access to threading constructs used in massively parallel shared-memory environments. The API maps well to both Cray MTA-style threading and processing-in-memory (PIM) style threading, and we provide implementations of this interface both in a standard SMP context and in the SST (Structural Simulation Toolkit) context. The qthreads API provides access to full/empty-bit (FEB) semantics: every word of memory can be marked either full or empty, and a thread can wait for any word to attain either state.
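To make the full/empty-bit semantics concrete, here is a minimal model of a single FEB-tagged word built from POSIX threads primitives. It is a sketch under stated assumptions, not the qthreads implementation: the type `feb_word` and the functions `feb_init`, `feb_writeEF`, `feb_readFF`, `feb_readFE` and `feb_demo_sum` are hypothetical names chosen for illustration. Only the EF/FF/FE suffixes follow the qthreads naming convention (for example `qthread_readFF` and `qthread_writeEF`), where the suffix reads as "wait for this state, then leave the word in that state".

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one FEB-tagged word; not how qthreads stores FEB state. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t changed;   /* broadcast whenever the full/empty state flips */
    bool full;
    uintptr_t value;
} feb_word;

static void feb_init(feb_word *w, bool full, uintptr_t value)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->changed, NULL);
    w->full = full;
    w->value = value;
}

static void feb_destroy(feb_word *w)
{
    pthread_cond_destroy(&w->changed);
    pthread_mutex_destroy(&w->lock);
}

/* Block until the word reaches the wanted state; caller holds w->lock. */
static void feb_wait_for(feb_word *w, bool want_full)
{
    while (w->full != want_full)
        pthread_cond_wait(&w->changed, &w->lock);
}

/* writeEF: wait until empty, store the value, mark the word full. */
static void feb_writeEF(feb_word *w, uintptr_t v)
{
    pthread_mutex_lock(&w->lock);
    feb_wait_for(w, false);
    w->value = v;
    w->full = true;
    pthread_cond_broadcast(&w->changed);
    pthread_mutex_unlock(&w->lock);
}

/* readFF: wait until full, read the value, leave the word full. */
static uintptr_t feb_readFF(feb_word *w)
{
    pthread_mutex_lock(&w->lock);
    feb_wait_for(w, true);
    uintptr_t v = w->value;
    pthread_mutex_unlock(&w->lock);
    return v;
}

/* readFE: wait until full, read the value, mark the word empty. */
static uintptr_t feb_readFE(feb_word *w)
{
    pthread_mutex_lock(&w->lock);
    feb_wait_for(w, true);
    uintptr_t v = w->value;
    w->full = false;
    pthread_cond_broadcast(&w->changed);
    pthread_mutex_unlock(&w->lock);
    return v;
}

typedef struct { feb_word *w; uintptr_t n; } producer_args;

/* Producer: sends 1..n through the word, blocking while it is still full. */
static void *producer(void *p)
{
    producer_args *a = p;
    for (uintptr_t i = 1; i <= a->n; i++)
        feb_writeEF(a->w, i);
    return NULL;
}

/* Passes n values from a producer thread to the caller through one FEB word. */
static uintptr_t feb_demo_sum(uintptr_t n)
{
    feb_word w;
    feb_init(&w, false, 0);              /* start empty: nothing to read yet */
    producer_args a = { &w, n };
    pthread_t t;
    int rc = pthread_create(&t, NULL, producer, &a);
    assert(rc == 0);
    (void)rc;
    uintptr_t sum = 0;
    for (uintptr_t i = 0; i < n; i++)
        sum += feb_readFE(&w);           /* empties the word so the producer can continue */
    pthread_join(t, NULL);
    feb_destroy(&w);
    return sum;
}
```

Because each `feb_writeEF` blocks until the consumer has emptied the word, producer and consumer alternate through one word of storage, so `feb_demo_sum(100)` sums 1 through 100. In qthreads the waiting party would be a lightweight qthread parked by the runtime rather than a kernel thread blocked on a condition variable.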

The qthreads library on an SMP (i.e. the POSIX implementation) is essentially a library for spawning and controlling tasks: user-level (non-kernel) threads with small (4 KB) stacks. The threads run entirely in user space, and their blocked/unblocked status is part of how they are scheduled. The library's metaphor is that there are many qthreads and several "shepherds". Shepherds can be thought of as thread mobility domains: they map to specific processors or memory regions, and define where a qthread can, must, or would prefer to execute. A qthread can be assigned to a specific shepherd, and qthreads do not migrate unless they are directed to migrate or their shepherd is disabled; a qthread that has not been assigned to a shepherd may also be stolen by another shepherd in search of work. This implementation serves as a runtime for both OpenMP (via the ROSE compiler) and Chapel, and its API can also be used directly.
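The migration rule for shepherds can be illustrated with a deliberately simplified toy model. Everything below is hypothetical: `toy_task`, `toy_shepherd`, `toy_push`, `toy_pop_local` and `toy_steal` are not part of the qthreads API, and the real schedulers are more elaborate. Each shepherd keeps a fixed-size queue, a task flagged as pinned stands in for a qthread assigned to that shepherd, and an idle shepherd may steal only unpinned tasks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the qthreads API or scheduler. */
#define TOY_MAX_TASKS 64

typedef struct {
    int id;
    bool pinned;   /* true: assigned to this shepherd, must not be stolen */
} toy_task;

typedef struct {
    toy_task queue[TOY_MAX_TASKS];
    size_t head, tail;   /* pending tasks occupy queue[head..tail) */
} toy_shepherd;

/* Enqueue a task on a shepherd; fails when the fixed-size queue is full. */
static bool toy_push(toy_shepherd *s, int id, bool pinned)
{
    if (s->tail == TOY_MAX_TASKS)
        return false;
    s->queue[s->tail].id = id;
    s->queue[s->tail].pinned = pinned;
    s->tail++;
    return true;
}

/* A shepherd runs its own tasks in FIFO order, pinned or not. */
static bool toy_pop_local(toy_shepherd *s, int *id)
{
    assert(s->head <= s->tail);
    if (s->head == s->tail)
        return false;
    *id = s->queue[s->head++].id;
    return true;
}

/* An idle thief scans the victim's queue from the back and takes the first
 * unpinned task; pinned tasks never leave their shepherd. */
static bool toy_steal(toy_shepherd *victim, toy_shepherd *thief)
{
    if (thief->tail == TOY_MAX_TASKS)
        return false;                    /* no room: do not remove anything */
    for (size_t i = victim->tail; i > victim->head; i--) {
        toy_task t = victim->queue[i - 1];
        if (t.pinned)
            continue;
        /* Close the gap left by the stolen task. */
        for (size_t j = i - 1; j + 1 < victim->tail; j++)
            victim->queue[j] = victim->queue[j + 1];
        victim->tail--;
        return toy_push(thief, t.id, false);
    }
    return false;                        /* only pinned work (or none) left */
}
```

For example, if shepherd A holds tasks 1 (pinned), 2 (unpinned) and 3 (pinned), a steal by shepherd B takes task 2, and a second steal attempt fails, leaving tasks 1 and 3 on A. Stealing from the opposite end of the queue to the one the owner pops from is a common work-stealing convention; it is an assumption of this sketch, not a description of any particular qthreads scheduler.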

The Qthreads OpenMP implementation has been shown to load-balance and scale better than both the GCC and Intel compiler OpenMP implementations for single-address-space computation (see the paper "Scheduling Task Parallelism on Multi-Socket Multicore Systems" below).

More Information

  • Data Structures - information about C-based lock-free data structures provided by the Qthreads library.
  • qtCnC - usage information for the Qthreads implementation of the Concurrent Collections model.

Mailing lists

Add mailing list references too:

Projects That Use and/or Contribute to Qthreads

Platforms & Requirements

Architectures

POSIX Qthreads supports most POSIX-style machines, including Linux, Solaris, and MacOS X, running on a variety of architectures. It has been tested on:

Architecture Linux Solaris MacOS X SST Cygwin
PPC32 + + +
PPC64 + +
IA32 + + +
IA64 +
ARM +
AMD64/x86_64 + +
SparcV9+ +
TilePro (MIPS) +
TileGX (MIPS-like) +

Compilers

Qthreads has been tested with:

Compiler Status
GCC 3.x Works (not on PPC)
GCC 4.x Works (PPC requires 4.2+)
Apple Clang 3.0 Works with pthread spinlocks, not built-in spinlocks; C++ support does not work
Clang 2.9 Works with pthread spinlocks, not built-in spinlocks; C++ support does not work
Clang 3.0+ Works; C++ support does not work
PGI 9.0 Works
PGI 10.0 Works
PGI 11.x Works
Intel ICC 11.1.x Works; does not support inline assembly on IA64
Intel ICC 12.x Works
Intel ICC 13.x Works
TileraMDE 2.0.0.77314 Works; requires -O0
TileraMDE 4.0.alpha11.134874 Works
SunStudio 12 Causes internal compiler errors ("Wasted space")

Build Requirements

To compile and run the POSIX Qthreads you will require:

  • A UNIX-like shell (Qthreads uses the GNU Autotools)
  • A C compiler (Qthreads versions earlier than 1.5 also require either a C++ compiler or the Cprops library, http://cprops.sf.net/)

To compile and run SST Qthreads you will also require:

Installation

Detailed installation directions are included in the INSTALL file in the distribution. In general, Qthreads follows the standard GNU Autotools configuration and installation procedure (configure, make, make install).

Papers & Publications

To cite qthreads, please use:

  • Qthreads: An API for Programming with Millions of Lightweight Threads
    Kyle Wheeler, Richard Murphy, Douglas Thain
    In the Proceedings of the 22nd IEEE International Parallel & Distributed Processing Symposium (IPDPS '08, in the MTAAP '08 workshop), IEEE Press, 2008.

To cite sherwood, please use:

Additional related publications:

  • Early Experiences Co-Scheduling Work and Communication Tasks for Hybrid MPI+X Applications
    Dylan Stark, Richard Barrett, Ryan Grant, Stephen Olivier, Kevin Pedretti, Courtenay Vaughan
    In the Proceedings of the 2014 Workshop on Exascale MPI (ExaMPI), IEEE Press, 2014.
  • Adaptive Scheduling Using Performance Introspection
    Allan Porterfield, Rob Fowler, Anirban Mandal, David O’Brien, Stephen L. Olivier, Michael Spiegel
    RENCI Technical Report TR-12-02, December 2012.
  • The Chapel Tasking Layer Over Qthreads
    Kyle B. Wheeler, Richard C. Murphy, Dylan Stark, Bradford L. Chamberlain
    In the Proceedings of the Cray User Group 2011, June 2011.
  • Scheduling Task Parallelism on Multi-Socket Multicore Systems
    Stephen Olivier, Allan Porterfield, Kyle Wheeler, Jan Prins
    In the Proceedings of the 25th International Conference on Supercomputing (ICS '11, in the ROSS '11 workshop), ACM Press, 2011.
  • Implementing a Portable Multi-threaded Graph Library: the MTGL on Qthreads
    Brian Barrett, Jonathan Berry, Richard Murphy, Kyle Wheeler
    In the Proceedings of the 23rd IEEE International Parallel & Distributed Processing Symposium (IPDPS '09, in the MTAAP '09 workshop), IEEE Press, 2009.
  • Portable Performance from Workstation to Supercomputer: Distributing Data Structures with Qthreads
    Kyle Wheeler, Douglas Thain, Richard Murphy
    In the Proceedings of the First Workshop on Programming Models for Emerging Architectures (PMEA), IEEE Press, 2009.