D. Kalinsky Associates

Very often, an embedded systems or software engineer will need to figure out the maximum number of messages

that will queue up at a given message queue in a system that's being designed. Many RTOSs need this number in

order to create the queue. Other designers may be concerned with queueing delays, especially in "hard" real-time

systems. If I'm asked about these issues, I tend to answer, *"Oh, that's just queuing theory."* And invariably I'll hear an uncomfortable, nervous sort of laughter from the embedded engineers. They're thinking, *"Right! I'm gonna stop my project in order to take a course on queueing theory, just for this program's message queue. What's this guy been smoking?"*

Similar questions come up about the capacity of linked lists and ring buffers, or the number of buffers in a buffer pool,

or even the assignment of dual-port RAM to buffer descriptors inside of modern communications processors. These

questions directly impact the amount of memory that must be allocated to the resource at issue, and the capacity of an

embedded system to handle the varying ebb and flow of data.

Sometimes engineers use empirical methods to try to estimate the required capacity. In one method, they divvy up

all available RAM on their development target board, assigning it to their stacks, their queues, their linked lists, and

buffer pools. Hopefully the hardware team has supplied sufficient RAM that there’s more than enough for all of these

structures. Before running the application software, test software "paints" these structures with standard but improbable content, for example patterns of 0xAAAAAAAA, 0x55555555, or 0xDEADDEAD. Then the application is run for a while, and afterward the structures are examined to see how much of the standard "paint" is left. The area where "paint" remains presumably has not been used by the application, and so most of it can probably be eliminated.
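As a concrete sketch of this "painting" technique, here is what the test code might look like in C. The pattern value, region size, and function names are illustrative assumptions, not taken from any particular RTOS or project:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PAINT_PATTERN 0xDEADDEADu   /* improbable "paint" value (illustrative) */

/* Fill a memory region with the paint pattern before the application runs. */
static void paint_region(uint32_t *region, size_t words)
{
    for (size_t i = 0; i < words; i++)
        region[i] = PAINT_PATTERN;
}

/* After a test run, count how many words still hold the paint pattern.
 * Untouched paint presumably marks capacity the application never used. */
static size_t unused_words(const uint32_t *region, size_t words)
{
    size_t untouched = 0;
    for (size_t i = 0; i < words; i++)
        if (region[i] == PAINT_PATTERN)
            untouched++;
    return untouched;
}

int main(void)
{
    static uint32_t queue_storage[1024];   /* hypothetical queue region */
    paint_region(queue_storage, 1024);
    /* ... run the application for a while ... */
    printf("unused: %zu of 1024 words\n", unused_words(queue_storage, 1024));
    return 0;
}
```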

In another empirical technique, many RTOS-aware debuggers, profilers, and 'object browsers' can measure the maximum number of messages in a queue at any time during an experimental run of application software. Some will even show you graphs of queue filling as a function of time. Similar graphs can be obtained for stacks and buffer pools. The high point of the graph will tell you how much of the capacity your software actually used. But is this number a good measure of the required capacity for a given message queue or stack or buffer pool? Perhaps not: had the software been allowed to run on for seconds or minutes beyond the end of the experiment, its memory needs might have increased dramatically. And that information would not be captured by any empirical method.

So empirical methods cannot always be relied upon, even if 'fudge factors', such as adding 30% extra capacity, are

included.

A third approach, based on mathematics rather than empirical tests, may give us a better handle on our capacity questions. This paper presents a simplified peek into queuing calculations: perhaps just enough to answer some of our software design questions, and to tell us whether deeper queuing theory is the way to get more questions answered.

Software message queues will be used as our example here, although similar considerations could apply to linked

lists, buffer pools, ring buffers and stacks.

**MODEL OF A MESSAGE QUEUE**

A message queue is a set of memory locations that provide temporary storage for data that are being passed from

task to task (or to/from interrupt service software). Each set of data is called a ‘message’. When a new message is

stored but there are already messages in the queue, the old messages remain and the new message is added to

them in the queue. When a message is taken from the queue, no memory of that message remains in the queue.

Computer science gurus call this **"Non-Destructive Write / Destructive Read"**.
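To make the model concrete, here is a minimal C sketch of such a message queue as a ring buffer with non-destructive write and destructive read semantics. The type names and API are my own illustration, not any particular RTOS's:

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 16                      /* illustrative capacity */

typedef struct { int payload; } message_t;     /* hypothetical message type */

typedef struct {
    message_t slots[QUEUE_CAPACITY];
    size_t head, tail, count;
} msg_queue_t;

/* Non-destructive write: fails (returns false) when the queue is full,
 * leaving the messages already stored untouched. */
static bool queue_send(msg_queue_t *q, message_t m)
{
    if (q->count == QUEUE_CAPACITY)
        return false;                          /* queue overflow: message lost */
    q->slots[q->tail] = m;
    q->tail = (q->tail + 1) % QUEUE_CAPACITY;
    q->count++;
    return true;
}

/* Destructive read: the message is removed; no memory of it remains. */
static bool queue_receive(msg_queue_t *q, message_t *out)
{
    if (q->count == 0)
        return false;                          /* queue empty */
    *out = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return true;
}
```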


Figure 1: Message Queue Analogy - Olive Oil Bottling

Let’s use a simplified model of a queue that uses the analogy of an olive oil bottling plant. A funnel is used to pour

olive oil into bottles. The funnel has a wide opening at its top, so that it can hold lots of olive oil (or software

messages by analogy). The funnel has a narrow opening at its bottom, so that the rate of flow of olive oil out of the

funnel is limited (by analogy to a software process that can receive and process messages slowly). See Figure 1.

The rate at which olive oil is poured into the funnel (or the rate at which messages are sent into the message queue)

is named **Pr(t)**, the Production Rate. The rate at which olive oil comes out of the bottom of the funnel (or the rate at which messages are consumed from the message queue) is named **Cr(t)**, the Consumption Rate.

Both **Pr(t)** and **Cr(t)** are functions of the time **t**; in other words, the flows in and out can vary over time.

**WHEN DOES A MESSAGE QUEUE HELP?**

If the Production Rate is always less than the Consumption Rate, **Pr(t) < Cr(t)**, then everything put into the

funnel or queue always flows out immediately. No backup of olive oil (or queue of messages) ever develops. Thus,

an olive oil funnel or message queue is unnecessary in this case.

On the other hand, if the Production Rate is always larger than the Consumption Rate, **Pr(t) > Cr(t)**, then there's always more being put in than can be taken out. If you wait a while, olive oil will fill up the entire funnel and begin to

spill out of the top. By analogy, messages will fill up the entire message queue and will have no additional place to be

stored in the message queue. No matter how big the capacity of the funnel (or the queue), eventually all of its capacity

will be filled up. In this situation, a message queue will be of no use since it will eventually overflow and messages

will be lost, even if it has huge capacity. See Figure 2.


Figure 2: Funnel overflows if consistently filled at a greater rate than it can empty.

So in fact it sounds as if olive oil funnels and message queues are not awfully useful. And indeed, both are useful only in one limited situation:

Only if the Production Rate is greater than the Consumption Rate for a short time **T**, that is **Pr(t) > Cr(t)** for times **t** within **T**, does an olive oil funnel or a message queue make sense. This short time **T** may be called a 'burst'. The

idea is that the funnel or queue can absorb only a temporary excess of production over consumption during an

occasional burst. And so a funnel or a queue is useful only if the Production Rate is greater than the Consumption Rate for a limited burst time. These bursts should be spread out in time.

Many software designers forget this limitation of message queues and other data buffering schemes, and so they

sometimes tend to over-use them. Please ask yourself whether your queues or ring buffers or buffer pools will be used to store information that arrives in bursts; otherwise they will fail (by overflowing) when asked to store information that relentlessly arrives at a faster pace than it can be handled.

**CONSTANT PACE BURST CALCULATIONS**

We can begin with the simplification of making the Consumption Rate '**Cr**' and the Production Rate '**Pr**' constants. Of course, the constant Production Rate must be limited to the burst time **T**. For now, let's keep it at zero at

other times.

At the beginning of a burst, the message queue should be empty. During the burst, messages are produced and put

into the queue; and at the same time some messages are taken out of the queue and consumed. Since everything is

linear in this simple situation, the number of messages in the queue gradually grows as the burst proceeds. So, by

the end of the burst:

**Pr*T** messages have been produced and put in the queue.

And … **Cr*T** messages have been taken and consumed from the queue.

So the number of messages remaining in the queue at the end of the burst is: **L = (Pr-Cr)*T**.

And since the queue gets longer and longer as time advances through the burst, the length of the queue at the end of

the burst is the maximum length of the queue during the burst. And so **L = (Pr-Cr)*T** is the maximum length of

the queue. In other words, it is the required size of the queue.

But there's a *"gotcha"* here. We're not done with our constant-pace situation yet! If another burst begins shortly after

the end of the first burst, the funnel will overflow, or the message queue will not have capacity for the second burst. In

fact the formula above is only valid if a new burst begins after the funnel/queue has been totally emptied of content

from the previous burst. This will take some time. We'll call that the Emptying Time **D**, or required **D**ead time. We can

calculate it as follows:

At the end of the burst, the Production Rate '**Pr**' goes to zero, but the Consumption Rate '**Cr**' continues at its constant value. There are **L = (Pr-Cr)*T** messages in the queue at that time.

So the emptying time is **D = L/Cr**, or **D = (Pr/Cr - 1)*T**.

The emptying time **D** could actually be much longer than the burst time, if the **Pr/Cr** ratio is big. It is a 'quiet time'

that should be given to the message queue to allow it to recover and get ready for a new burst.

If you think about it for a moment, you'll quickly realize that this emptying time **D** is also the longest time that any

message will sit in the queue, waiting to be taken from the queue. It's the time it will take for the very last message

that was put into the queue at the very end of the burst, to wait for all the other messages that are in the queue ahead

of it to be fetched and processed. Hence **D** is also the maximum queueing delay in our simple "linear" queue. [The variable name "**D**" can also be thought of as signifying the maximum **D**elay time.] This is a very important value for designers of time-

critical systems.
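Both constant-pace formulas are trivial to mechanize; here is a minimal C sketch (the function and parameter names are mine, chosen for this article):

```c
/* Required queue length at the end of a constant-pace burst:
 * L = (Pr - Cr) * T, valid when Pr > Cr during the burst. */
static double burst_queue_length(double pr, double cr, double t_burst)
{
    return (pr - cr) * t_burst;
}

/* Emptying ("dead") time needed after the burst, which is also the
 * maximum queueing delay: D = (Pr/Cr - 1) * T. */
static double burst_dead_time(double pr, double cr, double t_burst)
{
    return (pr / cr - 1.0) * t_burst;
}
```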

**EXAMPLE: CONSTANT PACE BURST CALCULATIONS IN A PSYCHOLOGY EXPERIMENT**

The application in this example is a device that will detect changes in a person’s “brain waves” in response to colors

and images that are flashed before the subject's eyes. "Brain waves", also called EEG (electroencephalographic)

signals are weak electrical signals that can be detected by an array of electrodes placed in a “halo” around a person’s

head.

Whenever a new color or image is flashed before the subject's eyes, hardware begins to sample the EEG signals at a rate of one sample every 2 milliseconds. Sampling then continues for 1.5 seconds. In software, each sample consists of an

array of bytes, one byte per electrode. The entire array for a sample is put into a single message that is then put into a

message queue. See Figure 3.

Further processing of the EEG signals takes 5 milliseconds per sample.

When the EEG signal messages are stored in a message queue, with one message per data sample, the required

length of the message queue is calculated as follows:

**Pr = 500** samples per second during a burst (one sample every 2 milliseconds)

**Cr = 200** samples per second, during and after a burst (5 milliseconds of processing per sample)

**T = 1.5** seconds, the length of a burst

Resulting in … **L = (500-200)*1.5 = 450** messages that need to be queued up.

The required length of the message queue is then precisely **450** messages.


Figure 3: EEG signal data samples are stored as 1 message per 2 millisecond sample.

Recall that the queue must be given its emptying time **D** to drain completely before another burst begins. In the brain wave device example, this means that:

**D = (Pr/Cr - 1)*T = (500/200 - 1)*1.5 = 2.25** seconds.

In other words, a new burst must not begin until at least 2.25 seconds after the end of the previous burst.

This sort of limitation can impose significant constraints on a system. In our brain wave device, it means that a new color or image must not be flashed before the subject's eyes more frequently than once every 3.75 seconds [Burst Time + Recovery Time]. Let's hope that the subjects of the brain wave experiments have the patience for this!

What is the longest delay that any one of the messages will encounter in the queue of this example? It is also the value we calculated for the emptying time **D**: 2.25 seconds. The last message in each burst will be held in the queue for this amount of time before it gets processed.
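As a quick check, a few lines of C reproduce both numbers of this example:

```c
#include <stdio.h>

int main(void)
{
    const double pr = 500.0, cr = 200.0, t_burst = 1.5;   /* the EEG example */
    printf("queue length: %.0f messages\n", (pr - cr) * t_burst);      /* 450  */
    printf("dead time:    %.2f seconds\n", (pr / cr - 1.0) * t_burst); /* 2.25 */
    return 0;
}
```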

**VARIABLE PACE BURST CALCULATIONS**

Now let's consider the more general case, in which the Production Rate and Consumption Rate are no longer constant. In preparation for working with a much more complex equation, some new symbols need to be defined: The burst period is still called **T**. The rate of production of messages for the queue is now changeable over time, and will be called **Pr(t)**. The rate of consumption of messages is now also a function of time, called **Cr(t)**.

Since this more general case is non-linear, we can no longer rely on the assumption that the queue will be longest at

the end of a burst. So we need to proceed in two steps: In the first step, we need to figure out the length of the queue

at any time during a burst. And then in the second step, we need to see at what point during the burst was the queue

longest. This would then be the length of the queue we need to prepare in our software. In the general case, the point

at which the queue is longest might well be in the middle of the burst or 2/3 of the way through a burst.

For the first step, here is an equation that tells us **L(Tx)**, the actual length of the queue of messages at any time **Tx** during the burst:

**L(Tx) = ∫ [ Pr(t) - Cr(t) ] dt , integrated from t = 0 to t = Tx**

By looking at **L(Tx)**, we can see how the length of the message queue varies over time during the burst of messages. [**Caution**: This formula assumes that the length of the queue **L(Tx)** does not go negative at any time

during the burst. If yours does, then you may need to sign up for a full Queuing Theory course right now!]

Now for the second step: The issue of greatest interest is the worst-case length of the message queue. We can’t

assume that this worst-case length occurs at the end of the burst. It might occur at any time during the burst,

if **Pr(t)** and **Cr(t)** fluctuate in wild enough ways. So in fact all that can be said is that the worst-case length of

the message queue is:

**Lqueue = MAXIMUM [ L(Tx) ] , for any Tx in T**

This is the overall queue length that is needed for the message queue.

**EXAMPLE: VARIABLE PACE QUEUE**

The formulas above are pretty frightening for those of us who have forgotten our days in calculus class (or who’d like

to forget them). So here’s a way to avoid the calculus: Work with graphs of the functions, and calculate the areas

under the curves instead of integrating the equation.

In Figure 4 there are graphs of message Production Rate and Consumption Rate as functions of time. To figure out **L(Tx)**, the length of the queue at time **Tx**, you just need to get the area under the **Pr(t)** curve up to time **Tx** and the area under the **Cr(t)** curve up to time **Tx**. The difference between these two areas is **L(Tx)**, the length of the queue at time **Tx**. [Kids can do this by putting a ruler vertically on the 2 graphs at time **Tx**, and then figuring out the areas to the left of the ruler and under each of the 2 curves.]

By doing this for a few different values of the time **Tx**, you can create a plot of **L(Tx)** as a function of time **Tx**. That's the bottom curve in Figure 4. This curve shows you how the number of messages in the queue varies as a function of time. You can continue the curve after the end of the message production burst, and thus see the queue empty out.

[In fact, in the variable-pace general case, the limitation to well-defined “bursts” can be relaxed quite a bit.]

Then all that remains is to find the high point of the curve of **L(Tx)**. That high point represents the longest length of

the message queue at any time in the burst. That’s the overall queue length that’s needed for your message queue.
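When **Pr(t)** and **Cr(t)** are available only as sampled curves rather than neat formulas, the same area-under-the-curves idea can be mechanized numerically. Here is a minimal sketch; the sample data, time step, and names are illustrative assumptions:

```c
#include <stdio.h>

#define N_SAMPLES 9
#define DT 0.5   /* time step between samples, in seconds (illustrative) */

int main(void)
{
    /* Hypothetical sampled rate curves, in messages per second. */
    const double pr[N_SAMPLES] = { 0, 80, 100, 100, 60,  0,  0,  0, 0 };
    const double cr[N_SAMPLES] = { 0, 40,  50,  50, 50, 50, 50, 20, 0 };

    double len = 0.0, worst = 0.0;
    for (int i = 0; i < N_SAMPLES; i++) {
        /* Accumulate the area between the curves: L(Tx) = integral of Pr - Cr. */
        len += (pr[i] - cr[i]) * DT;
        if (len < 0.0) len = 0.0;    /* a real queue cannot go negative */
        if (len > worst) worst = len;
        printf("t = %.1f s: L = %.0f messages\n", i * DT, len);
    }
    printf("required queue length: %.0f messages\n", worst);
    return 0;
}
```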


**QUEUING IN A RANDOM STATISTICAL WORLD**

So far we've relied on simple algebraic formulas that rest on simplifying assumptions such as constant or known message arrival and departure rates. But in many cases, such assumptions are gross over-simplifications, and embedded designers may object to what we've seen, saying, *"I don't have equations describing the rate at which my messages will be sent, or the precise rate at which they'll be processed. So I can't use the formulas we've got so far. Is there anything else I can do to figure out my queue lengths and my queuing delays in the real world?"*

To answer that, we'll have to wade into the statistical side of queuing theory. From now on we'll focus on systems with timing behavior that can be described in a statistical way. Some people refer to them as "soft" real-time systems.

We'll be working with something called an **M/M/1 queue**, the simplest statistical queuing model. The first 'M' means 'Memoryless arrival time probability function'. In other words, the arrival

times of subsequent messages into the queue don’t depend on when previous messages arrived. The second ‘M’

means ‘Memoryless service time probability function’. In other words, the processing times of subsequent

messages don’t depend on the processing times of previous messages. And the ‘1’ means we’re focusing on a

single queue feeding its messages to a single processor of messages.

**POISSON DISTRIBUTION OF MESSAGE ARRIVAL TIMES**

Some people think that 'memoryless' arrivals mean we can say nothing at all about when the next message will arrive. They think that if the last 25 messages arrived at a queue around 10 milliseconds apart, then the next message could just as easily arrive 10,000 years from now as it could arrive 10 milliseconds from now. But that'd be total chaos and unrealistic, rather than just independent arrival times. So we'll be talking about a much more orderly kind of random behavior, in which inter-arrival times are randomly distributed but there's still a known average number of arrivals per unit of time.

This is known as a Poisson distribution (named in honor of the French mathematician Siméon-Denis Poisson). The

Poisson distribution is widely used to describe discrete but rare events, for which the time between successive

arrivals is randomly distributed. For instance, it can describe traffic accident rates, cancer rates in large populations,

radioactive decay, or embedded system interrupt arrival rates. Things will become quite orderly if you look at them in a

very specific way: You need to ask how many of these random events you expect will occur in a given time interval, if

you know the average number of such events per time interval. Sometimes there will be fewer than the average

number of events; and sometimes there will be more than the average. The Poisson distribution expresses this

mathematically. If '**E**' is the average (expected) number of events per time interval, then the probability **P(N)** of seeing exactly **N** events in a time interval is:

**P(N) = e^(-E) * E^N / N! , for N = 0, 1, 2, 3, …**

Don't worry … we won't be deriving this formula here. We just need to understand it, and we can most easily do that by studying a graphic example in Figure 5.

In this example **E=1**. In other words, on average we’re having one ‘event’ (or message arrival at our queue) per time

interval. As we can see in the graph, this actually means that there’s a 1/3 probability that there will be zero ‘events’ in

any given time interval. There’s also a 1/3 probability that there will be exactly one ‘event’ in any given time interval.

And there will be smaller probabilities of larger numbers of ‘events’ in a given time interval. Similar graphs can be

obtained for other values of **E**; they typically fall off on either side of a peak near the value **E**. The falloff is gradual for **N > E** as **N**, the number of 'events' per time interval, gets larger; but it is steep for **N < E**, since **N** cannot go negative.

We will assume that messages arrive into our queues according to such Poisson distributions. Going forward in this

paper, we'll work with the inverse of '**E**', which we'll call '**M**', the Mean Inter-Arrival Time: **M = 1/E**.
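A small illustrative program can reproduce the Figure 5 numbers by evaluating the Poisson formula for **E = 1**:

```c
#include <math.h>
#include <stdio.h>

/* Poisson probability of exactly n events when e_avg events are
 * expected per time interval: P(N) = e^(-E) * E^N / N! */
static double poisson_pmf(int n, double e_avg)
{
    double p = exp(-e_avg);
    for (int i = 1; i <= n; i++)
        p *= e_avg / (double)i;   /* build up E^N / N! term by term */
    return p;
}

int main(void)
{
    for (int n = 0; n <= 5; n++)
        printf("P(%d) = %.3f\n", n, poisson_pmf(n, 1.0));
    /* Prints P(0) = P(1) = 0.368 (about 1/3 each), then smaller values. */
    return 0;
}
```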

**EXPONENTIAL DISTRIBUTION OF MESSAGE PROCESSING TIMES**

Once a message has arrived in a queue, it stays in the queue until all messages ahead of it have been taken off the

queue and individually processed. It’s not hard to model message processing time as also randomly distributed, but

with a known average processing time. So that’s what we’ll do.

In fact, these are the same assumptions as for a Poisson distribution, so the math that was developed by Mr. Poisson

holds for message processing times as well. But when we talk about message processing times, it’s more

convenient to express the formulas in a different way: If **A** is the average message processing time, then the probability **P(T)** that any given message will consume actual processing time **T** is:

**P(T) = e^(-T/A) / A , for processing times T ranging from 0 to A and higher.**

This view of random timing is called an Exponential distribution. Don’t worry … we won’t be deriving this formula

either. We just need to understand this one too. Its graph can be seen in Figure 6.


For above-average message processing times (to the right of **A**), the curve extends out to infinity, saying that some of our messages might have excruciatingly long processing times, although there's a very low probability that this will happen to any given message. But for below-average message processing times (to the left of **A**), the curve is limited to positive times, so that a large number of messages will cluster in this small range, with the number of messages increasing as the processing load they impose goes down toward zero.

This pattern of lots of easy-to-process messages, with a few occasional hard-to-process messages, very often

describes the reality within our embedded systems. So we will use Exponential distributions to describe message

processing times. The average message processing time **A** will appear again and again as we continue.
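Here is a small illustrative program that evaluates the Exponential distribution, together with the cumulative fraction of messages finished by time **T** (which is **1 - e^(-T/A)**); the average time **A = 5** ms is an arbitrary choice for the demonstration:

```c
#include <math.h>
#include <stdio.h>

/* Exponential probability density for processing time t,
 * given average processing time a: P(T) = e^(-T/A) / A */
static double exp_pdf(double t, double a) { return exp(-t / a) / a; }

/* Fraction of messages whose processing time is <= t. */
static double exp_cdf(double t, double a) { return 1.0 - exp(-t / a); }

int main(void)
{
    const double a = 5.0;   /* average processing time, e.g. 5 ms (illustrative) */
    for (double t = 0.0; t <= 20.0; t += 5.0)
        printf("T = %4.1f ms: density %.3f, fraction done %.2f\n",
               t, exp_pdf(t, a), exp_cdf(t, a));
    /* About 63% of messages finish within the average time A,
       yet a few take far longer (the long right-hand tail). */
    return 0;
}
```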

**WHEN WILL A QUEUE DEVELOP?**

If message arrival times and message processing times are both random, then how could a queue of significant

length ever develop? Well, if the arrival times were correlated with the processing times, then a queue might not

develop; or it might be very short in worst case. But in fact in queuing theory as well as in embedded systems and

software, message arrival times and message processing times are not normally correlated with one another in any way. In other words, a cluster of messages with very short inter-arrival times might well contain a lot of messages

with very long processing times. And that’s when a queue of significant length would develop.

In fact, if the mean inter-arrival time '**M**' is shorter than the average processing time '**A**', **M < A**, then the queue will grow without bounds, overflow and become useless. Hence, in this random statistical world of ours, we must have:

**M > A** , for a queue to be useful.

Another way of expressing this is to introduce a new parameter **R**, defined as:

**R = A / M** , or equivalently **R = A*E**.

**R** is the ratio of average processing time to mean inter-arrival time. **R** must be less than 1 for a queue not to

overflow uselessly. In other words, for a queue to be useful, messages must arrive on average more slowly than they

can be processed on average.

**SOME USEFUL FORMULAS**

It can be shown (although we won’t go through mathematical proofs here) that the average length of a queue will be:

**Lavg = R / (1-R)** , or **Lavg = A / (M-A)**.

And the average time that a message spends in the queue will be:

**Tavg = A / (1-R)** , or **Tavg = A*M / (M-A)**.

And if you divide the **Tavg** formula by the **Lavg** formula, you'll get **Tavg / Lavg = M**, or:

**Lavg = Tavg / M** , or **Lavg = E * Tavg**.

This is called "**Little's Theorem**". It's very much analogous to the famous "**Rate * Time = Distance**" formula of high

school physics, but applied to queues. It says that when averages are used for all values, the length of a queue is the

product of the rate of arrival of messages multiplied by the time they stay in the queue.

Why doesn't Little's Theorem involve **A**, the amount of time it takes to process a message on average? Wouldn't that be more intuitive? The answer is: in fact **A** is very much involved in Little's Theorem, because **A** is a major ingredient in determining **Tavg**, the average amount of time a message spends in the queue.

A final useful formula: The probability **P** that there will be at least **K** messages in the queue is:

**P[ at least K enqueued ] = R^K** .

In other words, the probability of a queue becoming longer goes down geometrically with its length (since **R** is

smaller than 1).

The remainder of this paper will show how these 4 formulas can be used to obtain useful results for designing your

embedded systems and software.
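Here are the four formulas wrapped as C helpers (the function names are mine, for this article); the main() previews the numbers of Example #1 below:

```c
#include <math.h>
#include <stdio.h>

/* The four "useful formulas" for an M/M/1 queue.
 * a = average processing time, m = mean inter-arrival time (same units). */

static double ratio_r(double a, double m) { return a / m; }                     /* R = A/M  */

static double lavg(double a, double m)    { return (a / m) / (1.0 - a / m); }   /* R/(1-R)  */

static double tavg(double a, double m)    { return a / (1.0 - a / m); }         /* A/(1-R)  */

static double p_at_least(int k, double a, double m) { return pow(a / m, k); }   /* R^K      */

int main(void)
{
    const double m = 25.0, a = 20.0;   /* milliseconds; the numbers of Example #1 */
    printf("R    = %.2f\n", ratio_r(a, m));           /* 0.80   */
    printf("Lavg = %.1f messages\n", lavg(a, m));     /* 4.0    */
    printf("Tavg = %.0f ms\n", tavg(a, m));           /* 100 ms */
    printf("P[at least 10 enqueued] = %.3f\n", p_at_least(10, a, m));  /* about 0.107 */
    return 0;
}
```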

**EXAMPLE #1: QUEUE DELAY CALCULATION**

Say we've got a task that produces messages and puts them into a queue with a mean inter-arrival time of **M = 25** milliseconds. And we've got another task that fetches the messages from the queue and processes them with a mean processing time of **A = 20** milliseconds per message. See Figure 7. What will be the average delay of

a message in this queue?


We can use the second “useful formula” for this. The average time that a message spends in a queue will be:

**Tavg = A / (1-R)**.

Crunching the numbers results in: **Tavg = 20 / (1-20/25) = 100** milliseconds.

That is to say, a message will on average be delayed a tenth of a second while it is in this queue.

We can also try out our first “useful formula” on this example.

The average length of a queue will be: **Lavg = R / (1-R)**.

Crunching the numbers results in: **Lavg = (20/25) / (1-20/25) = 20/5 = 4** messages.

So the average number of messages in this queue will be 4.

And we can also check Little’s Theorem on this example,

by recalculating the average queue length using the formula: **Lavg = Tavg / M**.

Crunching the numbers results in: **Lavg = 100 / 25 = 4** messages. And so we're consistent with our

previous results. Little’s Theorem checks out.
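For readers who'd like to see these averages emerge from genuinely random behavior, here is a small Monte Carlo sketch of this example's M/M/1 queue. It uses the C library's rand(), so it is only illustrative, not a substitute for the formulas:

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Draw an exponentially distributed random time with the given mean. */
static double exp_sample(double mean)
{
    double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0);  /* u in (0,1) */
    return -mean * log(u);
}

int main(void)
{
    const double m = 25.0, a = 20.0;   /* ms, the numbers of this example */
    const long n_msgs = 1000000;
    double arrival = 0.0, server_free = 0.0, total_time = 0.0;

    srand(12345);
    for (long i = 0; i < n_msgs; i++) {
        arrival += exp_sample(m);                   /* memoryless arrivals    */
        double start = (arrival > server_free) ? arrival : server_free;
        double finish = start + exp_sample(a);      /* exponential processing */
        total_time += finish - arrival;             /* time queued + processed */
        server_free = finish;
    }
    printf("measured Tavg = %.1f ms (the formula says 100)\n",
           total_time / n_msgs);
    return 0;
}
```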

**EXAMPLE #2: LIMITING QUEUE OVERFLOW**

Say we've got a software task that's producing messages and depositing them into a queue at random times, but with a mean message inter-arrival time of **M = 30** milliseconds. Another software task is responsible for fetching and

processing messages from this queue. The queue is limited to a maximum of 5 messages; otherwise it will

overflow. See Figure 8. What is the highest average message processing time that will limit queue overflow to less

than one ten-thousandth of a percent?


We'll use the formula for the probability that there will be at least **K** messages in the queue:

**P[ at least K enqueued ] = R^K**

and we'll put **P** at less than 0.0001% when **K** is at 6 or more messages, for queue overflow. That gives us:

**P[ at least 6 enqueued ] = R^6 < 0.000001**

And since **R = A / M** and **M = 0.030** seconds, we get … **R^6 = A^6 / (0.030)^6 < 10^(-6)**.

Resulting in **A < (1/10)*0.030 = 0.003** seconds.

Thus, the average processing time for the task that processes messages must be less than 3 milliseconds if we are

to limit overflow losses in our 5-message capacity queue to less than 0.0001%.

Now, what would be the average queue length under these conditions?

We'll use our trusty formula: **Lavg = R / (1-R)**.

Crunching the numbers, we get **R = 0.003 / 0.030 = 0.1**.

And then **Lavg = 0.1 / 0.9 = 0.11** messages.

So there’s a lot less than one message in the queue, on average! Most of the 5-message capacity of the queue is

hardly ever used. But it’s got to be there in order to limit queue overflow to less than 0.0001%.
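This inverse use of the **R^K** formula, solving for the largest acceptable **A**, is easy to wrap in a helper. A small illustrative sketch (it also previews Example #3 below):

```c
#include <math.h>
#include <stdio.h>

/* Largest average processing time A that keeps the probability of
 * "at least k messages enqueued" below p_max, given mean inter-arrival
 * time m: from R^K < p, we need A < M * p^(1/K). */
static double max_processing_time(double m, int k, double p_max)
{
    return m * pow(p_max, 1.0 / (double)k);
}

int main(void)
{
    /* Example #2: M = 30 ms, overflow above 5 messages (K = 6), p < 1e-6. */
    printf("Example #2: A < %.1f ms\n", max_processing_time(30.0, 6, 1e-6));
    /* Example #3 below: M = 25 ms, "queue" of 1 (K = 2), p < 0.01. */
    printf("Example #3: A < %.1f ms\n", max_processing_time(25.0, 2, 0.01));
    return 0;
}
```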

**EXAMPLE #3: LIMITING INTERRUPT LOADING**

Say we've got a hardware device that delivers interrupt requests to a processor at random times, but with a mean time between interrupts of **M = 25** milliseconds. The interrupt service software needs average processing time **A** to handle each interrupt. Interrupts cannot be queued up before they're processed; hence they're lost if the processor is overloaded with interrupts. What is the value of **A** that will limit the interrupt overloading losses to less than 1%?

Here we really don’t have a queue. Instead, interrupt-related information is stored in hardware until either interrupt

service software fetches it, or another interrupt arrives. This is like a hardware-based queue with capacity for exactly 1

message, and no more. In other words, the arrival of a second message while an earlier one is still in this queue,

would be considered an overloading loss.

We'll once again use the formula for the probability that there will be at least **K** messages in the queue:

**P[ at least K enqueued ] = R^K**

and we'll put **P** at less than 1% when **K** is at 2 or more messages, for overflow. That gives us:

**P[ at least 2 enqueued ] = R^2 < 0.01**

And since **R = A / M** and **M = 0.025** seconds, we get **R^2 = A^2 / (0.025)^2 < 0.01**.

Resulting in **A < sqrt(0.01) * 0.025 = 0.0025** seconds.

Thus, the interrupt service software average processing time must be less than 2.5 milliseconds if we are to limit

interrupt overloading losses to less than 1%.

**EXAMPLE #4: RESPONSE TO HIGH-PRIORITY INTERRUPT**

Say that interrupts from a hardware device are arriving sporadically, with a mean inter-arrival time of **M = 400** microseconds. For these interrupts, the mean data processing time is **A = 300** microseconds. What will be the

mean response time for the processing of these interrupts?

The average time that a message spends in a queue is given by our second "useful formula": **Tavg = A / (1-R)**.

Hence, the average time will be: **Tavg = 300 / (1-300/400) = 1200** microseconds.

This is the average time until software begins to process its responses to these interrupts.

A result such as this brings up a number of system and software design questions:

* Can our application live with a response time that's so much longer than the average rate of interrupts, and so much longer than the interrupt's data processing time?

* Where are the interrupt-related data going to be stored while they're waiting to be processed?

* Will this storage be implemented in software or hardware?

* …

Let's try to answer the second question above by using the very first "useful formula" again. If interrupts were to queue up, then the average length of the queue would be given by our trusty formula: **Lavg = R / (1-R)**.

In this example, the result is: **Lavg = (300/400) / (1-300/400) = 3** messages. (You can also get this result using Little's Theorem.) So, … *aha!!* …, we're going to need some queuing of interrupts, or interrupt-related

data, for handling this hardware device.

And so the results of our average queue length calculation bring us to the third question above: Will this queue be

implemented in software or in hardware? Some processors nowadays, such as network communication processors,

have message queuing capability for on-chip hardware interfaces built right into their silicon.

But if we choose to implement this queue in software, we’re going to have to redesign the software that handles this

particular hardware device, to avoid the kind of interrupt overloading losses we saw earlier in **Example #3**. Rather than doing all of the data processing (**A = 300** microseconds) right inside the interrupt service software, we're going

to need to break apart the software into a very brief interrupt service routine and a separate software task that will

handle the bulk of the interrupt-related data processing. The interrupt service routine will pass messages (one per

interrupt) to the software task through a message queue, as seen in Figure 9.
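As a sketch of that restructuring, here is what the split might look like using FreeRTOS-style queue calls; any RTOS with message queues would be similar, and the device-specific handler names and message type are hypothetical:

```c
#include <stdint.h>
#include "FreeRTOS.h"   /* assumes a FreeRTOS-based project (illustrative) */
#include "queue.h"
#include "task.h"

typedef struct { uint32_t device_data; } dev_msg_t;   /* hypothetical message */

extern uint32_t read_device_register(void);           /* hypothetical driver call */
extern void process_device_data(const dev_msg_t *m);  /* hypothetical, ~300 us   */

static QueueHandle_t dev_queue;

/* Very brief interrupt service routine: capture the data, enqueue one
 * message per interrupt, and return. No lengthy processing here. */
void device_isr(void)
{
    BaseType_t woken = pdFALSE;
    dev_msg_t msg = { .device_data = read_device_register() };
    xQueueSendFromISR(dev_queue, &msg, &woken);   /* message dropped if queue is full */
    portYIELD_FROM_ISR(woken);
}

/* Separate task: handles the bulk of the interrupt-related data processing. */
void device_task(void *arg)
{
    (void)arg;
    dev_msg_t msg;
    for (;;) {
        if (xQueueReceive(dev_queue, &msg, portMAX_DELAY) == pdTRUE)
            process_device_data(&msg);
    }
}

void device_init(void)
{
    /* Lavg came out at 3 messages; size the queue with headroom above that. */
    dev_queue = xQueueCreate(16, sizeof(dev_msg_t));
}
```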


So here we've seen how a quick *"back of a postage stamp"* queueing calculation can bring us to a major rethinking

of an embedded system software design decision.

**CONCLUSION: WHERE CAN WE GO FROM HERE?**

Despite its name, Queuing "Theory", as we've seen, has lots of non-theoretical uses in practical, down-to-earth embedded system and software design. There are lots of ways to delve even deeper into Queuing Theory. After all, Queuing Theory is a well-developed branch of classical mathematics.

**END.**


Earlier versions of this material were published as a sequence of 2 articles in the April 2001 and May 2003 issues of Embedded Systems Programming magazine. © Copyright 2016, D. Kalinsky Associates, All Rights Reserved.

Figure 5: Poisson Distribution of Message Arrival Rates for E=1

[Figure residue: Message Producer Task, Message Queue, Message Consumer Task, with Pr(t) Production Rate in and Cr(t) Consumption Rate out.]

[Figure residue: Pr(t) Message Production Rate (msgs./sec.), Cr(t) Message Consumption Rate (msgs./sec.), and L(t) Queue Length Over Time (in Messages), each plotted against Time (sec.); Overall Queue Length Limit Needed is 35 Messages.]