Summer Institute on Advanced Computation

August 20-23, 2000

College of Engineering & CS
Wright State University

Dayton,
Ohio 45435-0001

Overview of the Program  and
Introduction to Networks of Workstations

Dr. Prabhaker Mateti

Department of Computer Science and Engineering
Wright State University

 
August 21, 2000 8:30 a.m. This site contains the "web-based proceedings" of Summer Institute on Advanced Computation that focused on high- performance high- throughput  Cluster Computing.
 
Overview of the Program  Slides
Introduction to Networks of Workstations slides
           

DR. OSCAR GARCIA: Thank you very much, Jay. Now we will get into substance, and to start giving you substance, we have Dr. Prabhaker Mateti. Dr. Mateti received his Ph.D in Computer Science from the University of Illinois, and his master's from the Indian Institute of Technology in Kanpur. He has been very active in student guidance and directed nineteen master's theses and three doctoral dissertations. He is interested in distributed computing, operating systems, programming language design, formal aspects of software engineering, and Internet security. Actually, it is that last topic that is really supporting some technical aspects of the summer institute.

The laboratory that you are going to be visiting on the fourth floor of this building [OSIS Lab] was obtained through a grant from the National Science Foundation, of which Dr. Mateti was the principle investigator. Let's welcome Prabhaker Mateti.

           

 

DR. PRABHAKER MATETI: Thank you very much, and welcome to you all once again.

Let me summarize for you the program that we have planned for you. I would like to go over our schedule and the academic materials that we have provided for you and also get us to introduce ourselves to each other.

The program at a glance is this. You all had the pleasure of listening to the marvelous talk by Miron already, and I hope you are ready and willing to forgive me, but I will not be able to match him in my talk today. We are going to be talking about networks of workstations in a very general sense this entire morning. We have a luncheon talk by a Sun engineer that is then going to be followed by a Globus expert here from Argonne National Labs. And then, again, we have a dinner talk talking about clusters, this time by an SGI expert.

Tomorrow, depending on how this clusters talk goes, I will also talk about clusters, but I'm attempting to focus on parallel programming techniques using PVM and MPI. And that lecture, again, will be in this room, but the second half of the morning we'll go upstairs to the lab so you will have some hands-on feeling of that. So the lecture is in this room, but the lab is going to be on the fourth floor.

Then we have in the afternoon Kumaran Kalyanasundaram, again, from SGI talking about clusters, and the dinner talk is from Mario Lauria from Ohio State on HPVM. The next morning once again you'll have me talking about Condor, this time in the very practical setting. We would like to give you some details about how to set up Condor and give you some hands-on exposure in the lab, which is followed in the afternoon by a continuation of the topic by Mario from Ohio State.

Let me go through the materials that we have provided you for you. We have given you three books. The first two are the usual printed books.  The first one TBD is by David Spector entitled "Building Linux Clusters," which is a very detailed account of how you would go about building Linux clusters. It does not assume that you have previously installed Linux machines. In fact, it comes with a CD. There is another more classical book by Sterling, but unfortunately it received quite a few negative reviews, so I prefer this one. If you begin installing a Linux cluster with the CD there, it has all the necessary software to set up an entire cluster, not just a stand-alone Linux system. The second book TBD is a very academic kind of book. It's a collection of papers written by experts in various fields, and it has three parts. We are going to be focusing in this program here mostly on Part I. We are going to go into some detail in some of the papers there than the papers do, and it is an excellent idea to read the rest of the parts also.  In case you also want a companion volume, this one happens to be Volume II, Volume I, edited by the same editor and is a different collection of papers, is on systems. And only one of those systems happens to be Linux clusters, and there are other system in there. And that is the second book we have given you.

The third one is a collection of things that we selected, which includes some more papers, easier to read articles on Linux clusters, and the standard Beowulf. Unfortunately, it is slightly outdated, but I think it still has lots of practical information there, and, once again, some pointers to where you can get PVM details and a tutorial on MPI. I also included various frequently asked questions in this

Of course, we have handouts from the various speakers also included.

So that's the material that we have provided, and before I go to the next part of my presentation here, let me take a break and ask all of us to introduce ourselves. Why don't we start from the back.

(Introductions inaudible.)

DR. PRABHAKER MATETI: Let me give you a little overview of what my talk is about. I'm trying to illustrate what we typical mean by this phrase "High Performance" Computing. I think you heard yesterday Miron's well-made comment that we should also be talking about "High Throughput" Computing. I would like to talk about how networks of workstations can provide both of these, and all of this depends on parallel algorithms and corresponding software technologies. So with that, let me get into the details of this.

As you all know, of course, that the CPU clock frequencies are all the time increasing. And this morning Intel announced  P4. So, this is always going to continue to happen, and, of course, there is going to be a limit reached, and you want high performance beyond that. It can only be accomplished by having computers work in parallel.  Of course, perhaps not quite around the corner, but there are non-electronic technologies that are on the horizon there, and we don't know what their structure is going to be.

And this word parallel is such a difficult one that later on in the morning session today, I'm going to try to give you a real definition of this word. But for now, let's just go with our intuitive understanding of we what mean by parallel. By parallel computing itself we have typically refer to machines that are SIMD, MIMD, pipelines and so on, that were typically part of traditional supercomputers. SIMD standing for one instruction, but being executed on multiple items of data; and MIMD standing for multiple instructions and multiple data. These traditional supercomputers, like the ordinary computers have memory.  The memory is, kind of, divided into pieces. And the pieces are associated with the CPU's, but the whole thing is shared, and it is very tightly coupled. And it is coupled at the level of bus-level connections. So even though one may say supercomputers do message passing, the message passing of the kind that happens within a traditional computer is of a radically different nature. Of course, supercomputers are expensive to buy and maintain, and this is one of the reasons why one is getting into cooperating networks of computers.

I put this acronym "NOW" in quotes because it has come to mean a specific thing to specific communities. For now, I'm using it in a very lose sense of the phrase that it is simply a network, and it is consisting of several workstations connected by this network. I have a slide here coming up in a minute on what do we mean even by this word "workstation."

So now computing is essentially a collection of these components, and the workstation is obviously the hardware and the network is part hardware, part software, and the rest of it is essentially software. We'll also see that we need cooperation from the owners of these machines.

So to recap. As to why traditional supercomputers are becoming a problem to many, many, scientists, here are some of the reasons. They are, of course, very expensive both in terms of hardware and in terms of software because the software is made especially for these machines, and it is not something that will have wide numbers in terms of salability and so on. And, of course, once you buy these machines they have a very high maintenance cost. They need to be housed in a special way, cooled in a special way, and if anything breaks down, the components to replace them with are going to be expensive. Of course, even these machines will become faster by the year, and when it is time to upgrade, they are equally expensive to upgrade. So, nobody is predicting that these are going to go away any time soon, but the overall future is, what are now known as, computational grids. Let me just give you a quote definition from the Globus Group. "Grids are persistent environments that enable software applications to integrate instructions, displays, computational and information resources that are managed by diverse organizations in widespread locations."  That's a mouth full of a definition but, effectively, if you could think of a grid as on a grid graph paper, and if you could imagine that at each intersection there is a workstation, then you are in good shape in understanding this.  Individual nodes in a grid can be, of course, a big supercomputer or a simple workstation, or, for that matter, an entire network, and the advantages of these are, of course, they are high availability, and it can accommodate peak usage more better than even a supercomputer can. And let me leave you with a simple analogy here. What was local area networks of the Internet, the networks of workstations are going to be with respect to grid. So this is the future, and we have a talk by Lee Liming from Argonne later this afternoon.

Going back to what we said the networks of workstations were. Well, what do we refer to as a workstation? A few years ago you would have called a lowly Sun III, lowly now, a workstation, and we never had a problem with that label. The Sun III a few years ago is worse than the 500MHz PC we have today. So one wonders exactly what do we mean by this word workstation, and my answer to you is that it is not even a question of CPU, so much CPU power, so much memory, so much hard disk, et cetera.  Whether we should call it a workstation or not, in the cultural sense of the word, is very much determined by this so-called workstation operating system. And these are the minimal features (see slide TBD) that we expect it to have. And the moment these features are there on any particular computer system, no matter how physically small it is, I think we can call it a workstation, and it is then entitled to be part of our networking.

We first want authenticated users. We want protection of resources. We want multiple processes that run in their own address space in such a way that if one crashes, it has practically no impact on the others. And we also want preemptive scheduling; that is, whenever we wish, for whatever reason, it should be possible for us to take control away from a running process and give control to another process that is ready to run but currently not running. And virtual memory has become such a de facto thing, that we hardly need to mention. And we also need hierarchal file systems. This has to do with the naming of files and how you make use of it. And, of course, the bottom most requirement, all of this has to be so network centric, that even when such a machine is being run in an isolated room, physically disconnected from the rest of the world, the internal services are all essentially networking in nature. So any time you have a computer system with an operating system of that kind, we can call it a workstation.

Now talking about network, the 10 megabits per second version of Ethernet is most definitely obsolete. The 100 megabits per second is the most common now, and the gigabit networks are already happening, and you will see one such small network in our lab upstairs (OSIS Lab TBD). These gigabit cards are now of the order of $200 per network card, and the cables are still a little expensive, some $30 or so versus $2 or so for the regular 100 Mb cables. So, apart from a little bit of expense still, the new networks, if we are planning now, we should be planning for the gigabit networks.

There are many other protocols, but it does not seem to be the case that we will switch from TCP/IP. Those of you who are following this, you know we are currently using IP version 4, and IP version 6 is already available. But we are legacy driven and slow to switch, but as and when it is possible for you to switch, you should be switching to IP 6. So that is the network, the component as in the network of workstations.

Where does this cooperation come in? Last night you heard Miron talk about the sociological aspect of this. Workstations are very personal. When you are working on it, you want to feel you own it and you do not want to slow down because somebody else's processes are running in the background on your workstation. But more importantly, you call it -- at the pure technical level or sociological level or what have you, these are the two requirements. You should be willing to share the resources that you have. The sharing of resources could be CPU time or the files that you have on hard disk or the hard disk empty space that some other user can store things in.

Of course, there also needs to be a willingness to trust. No matter how secure we make these systems, it is still not going to be so guaranteed that nothing will go wrong, no malicious activity will take place that we can so we guarantee you that, therefore let me run on your workstation my program. No, it will ultimately come down to a willingness to trust these programs that are arriving from the other side.

Then we come to distributed programs. There are at least two different kinds of distributions that we can talk about. The more common ones are the spatially distributed programs, where, in the sense of this networks of workstations, we have workstations geographically distributed, and a part of this program is here, a part is on another workstation. And, of course, they need to somehow talk together to accomplish some common task, and that's where this rather nebulous word synergy comes in. Otherwise, we end up calling highly disjointed computations as distributed systems. That is not the way we define this term.

They can be distributed, they can be divided into pieces and parts all over the place, but they need to have a common goal and they need to be working together. The word for that is synergy. And, unfortunately, we won't be able to give you a formal or rigorous definition for it.

And then comes this word parallel again. A lot of distributed programs that you have seen are the so-called client server programs. Now, by definition, client server programs are often not parallel in the following sense: When a client makes a request to a serve, now the server is working and the client is waiting for the response from that server.

And this is where the well-known remote procedure call comes into may. But the assumption there is that while the client is waiting for this response from the server, there are other things that could be happening on the client machine. But obviously those other things do not become part of this particular program, because those other things are highly disjointed. So because of lack of synergy, we would then say that a client server version of distributed programs doesn't fit into this scheme that I'm talking about, even though it certainly is a distributed program.

Then they have temporally distributed programs, programs where you have divided the computation up in such a way that you do a little bit of computation today and postpone the rest of the computation for another time. There are languages that facilitate this that I'll briefly mention later on.

Then we have migratory programs. These are programs that may themselves fit in the top two categories, but I'm emphasizing here that they are migratory in the following sense: A part or the whole of it may be running currently on a given workstation, but for whatever reason it decides to go elsewhere. It decides to pack up, vacate this machine, go find a new residence and resume from where it left off.

So let me go into some of the details of what is needed to do these kinds of programs. Later on we'll talk about these two words again, but for now, for emphasis, let me include them both. Distributed and parallel.

The spatially distributed programs, by their very nature of their distribution, if they need to communicate -- remember, communication is essential, otherwise that word synergy does not let us say that this is a combined program working towards a common goal. Because they are spatially distributed, the only way they can communicate and the only way they can do this, is by message passing. One process sends, another process receives, and there is a significant amount of detail and a significant amount of trouble that can happen if these are not done right.

Temporally distributed programs, again, the only way they can do this is shared memory. Because if part of the computation is done now and part of the computation is done later, in the sense of time, they have to save the intermediate results and they have to do that in the broader sense of the word shared memory. It may be a disk file that they put this information in which they'll read from, but technically it qualifies to be shared memory.

And, of course, migratory programs. If a program is going to travel, we need a very systematic way of packing the program and its data together in such a way that after it is packed and sent to the new location, it can be unpacked in an equally systematic way to an executable state. And the technical word we use for that is serialization. A slightly older term for the same thing is marshalling. Now, marshalling was a term that was applied to only data, but we could apply that term to programs also.

Talking about distributed shared memory. Later on I'm going to focus on this last bullet item, that semantics is, of course, not going to be clean at all. But first of all, what do we mean by distributed shared memory? There are all kinds of practical explanations that one can give. In the so-called traditional supercomputers also we have shared memory, but that is so tightly coupled that I don't think we would apply this phrase distributed shared memory to them.

Here we're trying to apply this phrase distributed shared memory for computer systems that are as wide spread as the following; for example, that there is one node in Dayton and another one in Cleveland and a third one in Detroit. If we could have these three machines claim to have memory, they can read and write in the old sense of the word shared memory, then we have distributed shared memory.

And this is where we need to understand exactly what it is they promise by way of semantics. At a minimum, whatever this world simultaneous may mean, we want to be able to simultaneously read and write just like you would write ordinary memory. Of course, because of the geographical distribution, that is obviously impossible to do.

So by this phrase distributed shared memory, we are always referring to an abstraction layer provided by some mechanism. And, of course, what will that mechanism use to give this? And the answer is, it has to be based on message passing. So whenever you are talking about distributed computation, whether ultimately you say you have distributed shared memory or not, you must understand message passing reasonably well. And because this is distributed in such a geographical way, the semantics is not going to be clean. And I'll say more about this in the next half of the talk.

Well, let's spend a minute or so on this. What is the technological basis for migration of programs? You will readily see that when a program is migrating, you have somehow packed it up over here and you have packed up the data. You can kind of see that the data perhaps could be packed and unpacked in such a way that we can make multiple architectures deal with the data.

As you all know, we have the immediate problem of little Indian and big Indian integers, where if an integer takes four bytes and is depending on the machine, the least significant byte is in the least address or the highest address. And already that itself is a problem and it can be a big overhead in converting to and from. But if you're talking about programs now moving across, the immediate and the simple answer is that the CPU architecture is the same. So if a program is traveling from node A to node B, we expect the nodes to have the same CPU architecture. And the newest player in there, and that kind of cheats this idea, is, of course, virtual machines of the kind Java proposes.

When you say Java code is migratory or mobile, well, JVM itself can be thought of as a CPU, so, of course, that is the same instruction set, whether it is running on your Intel, whether it is running on your PC architecture or McIntosh's, or what whatever. So this is a requirement of today and almost assumed.

Now, it is possible to pack up your programs if they are written in very highly interpreted languages, like, say, List for example. List itself could be packed up and sent anywhere and reinterpreted, but that, again, is just a slight abstraction on the first line there, same CPU architecture. Replace the word CPU with virtual machine, and you are back to saying the same thing, that it is the same architecture.

Of course, we also want the same operating system environment. When you have packed up and sent it to another node, unless this other node has near exactly the same system and environment, it is not going to run. As you may have heard, Java, for example, has failed on this in a simple-minded way, and yet in the end, to the user, in a miserable way. If you try to move a Java program from Windows to Unix, it might not look right for the simple reason that even the fonts may not be available. If you made use of a particular font and it is not there in the other node, it is not going to look okay. It may not even look readable.

So there are all kinds of things that are assumed here, operating system and environment. And of course, if we are going to pack up and send this program to another location, we wouldn't want to lose all the computation that we have performed so far. And the term that we have for this is checkpointing.

And checkpointing assumes that your operating system and environment, et cetera, are such that is possible to preemptively suspend a process and record its computational state and then send that state across to a different node and resume the computation from the point of suspension. Unfortunately, this is not as simple as it sounds.

Now, going on to the development of these programs. We have essentially three approaches that we could do. By developing brand new code, brand new software, and, of course, algorithms, there is a well-accepted rule in software engineering that applies to nearly all large programs, and that is the so called 90%/10%, or if you wish 80%/20% rule, that says 80% of the time of execution by a large program is often spent in 20% of the source code. So in order to improve anything, you should first go into identifying which 20% or 10% is where most of the computing time is being spent, and that is where you would focus on redeveloping new algorithms, taking into account the parallel architectures that are in place.

on the other hand, you could take old programs, observe its design, reverse engineer it in some way, and then rewrite them in new programming languages or at least use new libraries. Of course, there is a third one which is somehow parallelize legacy code --

(end of this side of the tape.)

-- are happening out there, and, unfortunately, I must say right at this stage none of these are widely being used. But it is worth learning what they are and what they are contributing.

There are languages that are based on shared memory models of various kinds. There are languages that are OP languages and parallel, and there are, of course, parallel functional and logic programming languages. Talking about libraries, instead of programming them in the new programming languages, which I'll go into in some more detail in the later half of this talk, but libraries, these two have had phenomenal success.

Nearly all computational science in the physics and in the computational fluid dynamics, chemistry, et cetera, seems to be being done using these libraries. Both of them happen to be message passing primitives, and they can be embedded in existing programming languages. There are near-term, short-term problems with MPI not having C++ interfaces, et cetera, but these are very, very near term. They are almost solved in MPI II.

Of course, they're architecturally portable. By that we mean you could get this thing working, in fact, on a single machine by having multiple processes using PVM and MPI, and then make it work on a network of workstations. And when that works, you could take it to a real supercomputer and have it done. And, of course, the very attractive thing about both these libraries are they are both open-sourced implementations. Later tomorrow I'll be talking more about this and probably also say a few more things about clusters. So that's the talk tomorrow morning.

There is one other thing here that I would like to mention, and we unfortunately do not have any talks on this part, but it is an important development and you should be aware of it, the implementations are about to become available, and that is distributed shared memory being implemented through an API, similar to message passing is done in MPI, which has now become essentially a standard. And this is a developing standard, but the implementations are not yet available.

An acronym that we should become familiar with, it is based on SIMD, but in this case the P there, the second letter, SPMD, they're talking about single program and multiple data. And this is something that is typically the case in a number of cluster-based computing paradigms. The same program runs on multiple nodes. They may or may not be lock-step in computation, and, of course, the nodes may not even be of the same speed. It could be that one CPU is rather slow, another CPU is very fast, and in the end, when the results are all available, they go into what is known as a barrier synchronization. Later on today I'll explain more about this barrier synchronization.

Last night you heard about Condor, and this is a system that assumes that we have operating workstations and it does do this migration of programs by doing checkpointing and remote IO. We need to explain this is a little more so let me postpone that, and as you heard last night, it does resource matching based on your requirements, and I'll be going into the practical details of this on Wednesday morning.

Well, that brings us to clusters of workstations. Our overall view is that inexpensive sets of clusters of machines can be a near, equal power alternative to traditional supercomputers. The high availability is something that we need to take with a grain of salt, but overall, surprisingly, overall, compared to traditional supercomputer downtime, we find that networks of workstations downtime is lower.

Of course, the access, the physical access, the bureaucratic way of accessing these, of course, is much, much easier. It can be used as a development platform. You could develop your programs on these, and because MPI, PVM, et cetera, are so portable, they will run on the traditional computer and traditional supercomputer.

We have at least two talks scheduled on Linux clusters. I'm assuming that they will be SGI specific, but we'll see. They are by Dr. Kumaran Kalyanasundaram. There is a dinner talk at about 6:30, and there is also an afternoon session tomorrow on Linux clusters. And depending on how fast we are going, I will also be talking about this tomorrow morning.

We also have a talk scheduled on HPVM clusters, these are NT based, I think. And we have this talk on Wednesday the last day of the institute in the afternoon, note that it ends at four o'clock. And probably this is a reasonable time to take an early break, and I think we need to release you guys earlier than twelve for lunch because we have a twelve o'clock lunch talk. So let's take a 15-minute break.

I want to spend a couple of minutes at least on this word granularity. Even though the next few slides are going to focus more or less on hardware, it is this word that we need to appreciate. And unfortunately, it's a word that is difficult to communicate, difficult to get across. One of the reasons I am trying to do that is to place the networks of workstations at a certain level, and you will see that it is going to place at a rather very course-grained parallelism.

Let me also take this moment to mention one other clarification, the acronym NOW, of course, stands for networks of workstations, and that is a generic acronym. There is also a project from the University of Berkeley with that as an acronym. We are not talking about that specific project. We are talking about networks of workstations in the very broad, very, very broad sense.

Later on when you go home, you will read "A Beowulf Cluster," a book in preparation that we included, not the Linux clusters one, and where the author is so religious about what exactly NOW is and there are a few other acronyms you will come across. For now, let me point out that NOW for now, we are using it in the networks of workstations in the general context.

As I said, one of the reasons for having these slides that are more or less hardware specific and more or less talking about traditional supercomputers, is to give you a sense of how parallelism can be and where everything fits. And even then these words fine-grained, medium-grained, course-grained, and so on, those are not very well defined. They are only meaningful in relation to some other similar phrase.

For example, under the heading of fine-grained we may talk about tens of thousands of processors, each one of them rather tiny CPU, each one can only do minimal computation, and, of course, they have distributed memory, but they have, very, very fast interconnection networks. And the typical machines of this kind do single instruction and multiple data.

During the break somebody asked me about this, so let me send a few seconds on this. When we classify a machine as a single instruction multiple data, we are implying the following: That the many CPU's that are executing, they are all executing the same instruction, but, of course, they are executing that instruction with different operands. And by and large because of the tiny nature of that instruction, by and large the instruction will begin and end almost at the same instant. They may be off a level of nanoseconds or what have you, but they are by and large therefore in lock-step.

Later when we come back to networks of workstations, we are going to use the comparative acronym SPMD. And sometimes people use SCMD, second letter replaced with either C or P. But there we are talking about single program and multiple data, and I'll say more about it.

And here is an example of these kinds of machines. These thinking machines are one of the connection machines, one of the most well-known ones, and then we also occasionally use MPP, again, as a general acronym for these kinds of things.

Under the heading of medium-grained, we may have thousands of processors, processors that have obviously power between coarse- and fine-grained, and these are more or less like our ordinary CPU's, and then they can have either shared or distributed memory. And by and large, these have been traditionally research machines, and now we changed this I to a C there, single code multiple data. And this is one typical example of a medium-grained machine.

Other than that, we have nothing much to talk about this. The coarse-grained machines are hundreds of processors. These are by and large regular CPU's, and they may have large amounts of cache memory. They can do vector operations, and they, again, have shared memory or not. And in this case, we don't expect each of these CPU's to be working on the same instruction. They may occasionally do that, but by and large these very many CPU's are invoking their own instructions and their own data on them.

So we have multiple instruction multiple data kind of machines. The SGI Origin that some of us have, we have one here and I heard from the audience that there are other people who have these machines, they can be classified as this. And in this case, the processing element happens to be a rather well-known, common CPU, and these are the various numbers that go with it. And once again, between these many CPU's, the connection network is a very, very fast crossbar switch. So we're not talking about slow, relatively speaking slow, Ethernet, even Gigabit networks.

Now, going back to our networks of workstations, we, of course, want to exploit inexpensive workstations. And as I mentioned, it is difficult to classify what exactly is a workstation, and from now on any time we say workstation, you might as well replace it with PC's.

For what ever reason, probably a cultural one, other personal workstations of the kind Macintoshes, et cetera, are not playing a role in this. Typically, when I hear of clusters, they tend to be these days PC clusters or the more traditional workstation clusters like SPARC's, like SGI and so on.

We want to also use commodity network. And you have heard this word commodity used many, many times, and you are going to hear this again and again. By this we mean basically that it should not be a network that is so special that it is either very, very specifically made for one particular small group or so on, even if it is a common enough network like, for example, Myrinet, it is a gigabit speed network, but it is not common enough. We wouldn't be calling that commodity network.

But, today, the standard Ethernet which is running at gigabit speed, that one we could almost call it that. But under this heading, we can certainly include, as of now, the standard 100 megabit per second Ethernet. And the network then becomes a truly distributed multiprocessor system.

It is so truly distributed that there's absolutely no question that there is no shared memory, in the old sense of the word, that you could do a memory assignment operation. You set a certain location to a certain value, this is no longer what you would refer to as an assignment statement. That kind of an assignment, when done on a so-called distributed memory or processor, has to happen through some way of doing message passing equivalent.

And, of course, these workstations work together by sending and receiving messages, and we will spend quite a bit of time on this. And the standard way of dealing with these programs is you may have C and Fortran programs that you will revise to accommodate these newer libraries such as PVM and MPI, et cetera.

Now, as I promised, we are going to debate a little bit about this word parallel. Parallel to us could mean any one of these four words, and depending on the communities that you talk to, these four words mean very, very special things to them. But in this group here, I'm going to take a union of all these meanings. So from now on if I say parallel, do not therefore assume it is not distributed or it is not networked, et cetera.

So these four words, on the other hand, can be very confusing, and let me give you the computer science community's view of this word parallel. And some hardware people don't quite agree with this, but it is worth really understanding this definition. If you take two statements that are guaranteed to be sequential, so that there is no dispute about the definition, S1 and S2 themselves are whatever that word may mean, really sequential. Now we are trying to put them together in the parallel way.

And the standard definition for what does it mean to say S1 is being executed in parallel S2 is the following. And I'm right now focusing on the time element. There are a few other items I would like to say.

Let's say independently of S2, that you have observed that S1 begins at time b1 and it ends at time e1. Similarly, S2 begins at time b2 and ends at e2. Now these are the time instants that you have observed in the context of S1 combined with S2 in this parallel way.

And then we would like to answer the same question about when did this parallel combination begin and when the did this parallel combination end? And the answer is the parallel combination began at the earliest of these two instants, b1 and b2, and it ended it is latest at these two, e1 and e2.

As a result, you will notice that if you view this parallel bar as an operator, that the parallel bar is all of a sudden symmetric. You could commute the two operands. Instead of writing S1 || S2, you could have written it S2 || S1. What is an even more surprising corollary of that definition is, it permits totally sequential execution of S1 followed by S2. And furthermore, because of the commutativity. You could have executed S2 and then S1. So you view the sequential execution of this parallel combination as a very degenerate case of the parallel operation.

Now what becomes a nuisance, because of the parallelism, indeed is the following. So far we have said nothing about shared memory between these two statements S1 and S1, and depending on how the shared memory is being manipulated within S1 and S12, we have considerable amount of extra care that we must take before the parallel combination produces a well-defined result.

Now, we are not going to worry so much at this level of definition in the next few minutes, but on the other hand, we also need to understand this dependency business. And I have cooked up a little example that illustrates that dependency in a hopefully a very, very clear way.

I have three lines of code up there, and the second one uses the parallel bar and the first and the third one has a semicolon. As you know, the semantics of a semicolon, apart from it being a syntactic delimiter that tells you when a statement has ended, is that the statement y := c + d in line one, must begin its execution. If you were to time that again as b2, that b2 must be greater than or equal to the time when x := a + b has ended.

So again, if you think of these two as S1 and S2, the ending of x := a + b, must precede the beginning in line one of y := c + d. And if you look at the second line, they are in parallel. And if you assume, indeed, that A, B, C, and D are not alias's to each other, they are not different names to same memory location, you can see that line one is actually an over specification by the programmer.

The programmer shouldn't have told the compiler or in the language that indeed x should be assigned the value prior to y being assigned a value. He should have been willing to say, in this particular case, x and y, of course, to receive their values, but it didn't matter to his computation the order in which these two variables, these two targeted variables x and y, receive their values.

And so would prefer, for the purpose of our computation, that a programmer specifies the sequentially of things only when it is strictly needed, and that is the situation, for example, in the following line. So if you then say that A, B, C, D are not independent, then as you can see well the order in which these two statements get executed matters. Whereas, as it is, those three lines would in the end produce the same end effect in the two variables x and y. So this is the data dependency issue that one should understand.

Then we come to trying to classify different kinds of parallel computations. And this is a naming scheme that Linda people prefer, and I'm going to relate it to others. There are essentially three kinds of parallel computations. The first one is called result, and it has alternate names which maybe we'll just move on and then we'll see.

The result parallelism is also called perfect parallelism. And last night Miron was suggesting that we should call it natural parallelism. It is also called in literature embarrassingly parallel. But perfect parallelism strictly, strictly speaking is a computation that can be divided into sets of independent tasks that require, cross out the word little, no communication.

For example, if you have a matrix and we are interested in summing up the columns, the individual summing of the columns are perfectly independent of each other, and it could be done in this sense of the word perfectly parallel. At the end of summing up each of these columns, there is no combination of the results that needs to be done, but, in general, that is not the case. And the reason it is called the result of parallelism, is the following:

If you observe the shape in some geometric sense the shape of the computation, the computation is subdivided, as we say in that bullet two, in a way where the shape matches the original computation, and that is why it is often called result parallelism. And the standard examples of that are simulations and computing some arbitrary function such as F with some number of variables that are all independent.

The MW model, and this is kind of politically correctly speaking a little bit. The M and the W in the historical sense refers to all kinds of things. The earliest similar expansion is master worker I think we now prefer the manager worker model as a name.

In this case, the one process is called a manager, and it is the manager who receives the initial request more the computation. And it is the manager who decides how to divide the computation up, and it is the manager who initiates the worker process. So he starts the worker processes, he gives them the various computational tasks, and of course he is tracking progress. And whatever requests the workers have, he handles them. And any interface that a user wishes to have is not with the workers at all, but with this manager process.

And the workers, well, they do the real work, the grudge work of the actual computation, and every once in a while they will send results back to the manager, and it is the manager's job to combine the results into the final answer. And there is a word that we should use here so that it is helpful in our later discussions about MPI, and that is reduction. The way the manager typically gives workers, is kind of a broadcast, and the MPI term that you will see later on is a scatter.

But reduction basically does the following: We have independent subresults computed by the workers, such as r1 r2, et cetera, up to some rn, and then we have a certain operation with certain properties. And the reduction effectively puts this operation in between the subresults, so that the result is then obtained by the normal computation by the master process as r1 operated on r2 operated on r3 and so on. So in the master worker, or manager worker paradigm, it is perfect parallelism. And in the end there is some way of combining the results back into a common answer, and that is typically done with reduction.

The data parallelism is also called domain decomposition by other groups, and I prefer the term specialist. In this case we have some processes that are capable of doing one thing very well, and we have many such specialists. So, for example, you may have a little process that can do only fast Fourier transform, and some other process that can only do matrix multiplication and so on. In which case, you use them in such a way that each one of these processes does its own thing, and they are being done in parallel. And the term specialist parallelism explains this better than the term data parallelism.

Occasionally there is control parallelism, and this can be compared with the term I had mentioned before, agenda parallelism. In this case we have things to do. Like, for example, if you are a handyman and your house needs repairing, today you may do some painting job, tomorrow you might do some house carpentry, and so on and so forth. And this is the explanation of that. Different operations perform simultaneously, of course, on different processors. And the next bullet explains this in the context of a chemical plant.

Then we come to this. So we have parallel processes of whatever kind, and how do they communicate? The bottom line answer is, well, there are essentially only two ways of doing this: Shared memory and message passing. And I'm going to spend several minutes on each one of these. And please feel free to ask me questions, because these are, at least in our normal course of teaching, these are difficult concepts.

By shared memory, first of all, we mean the following: The idea does not make sense unless you are talking about at least two processes. So let's say that we have processes A and B. Process A writes to a memory location; B reads from that memory location. Now the question often we need to ask is: What would B get? And the answer depends on when did B read.

And that brings us to, well, does that mean that we have a clearer way of saying when, as in before or after A has done its writing. This is why synchronization is crucial. And, of course, the main reason for this being a way of doing things, is the speed is very, very good.

Shared memory. Remember, we still haven't attached the word distributed to this yet. So this is shared memory. If you want to attach a word, call it true shared memory. True shared memory needs hardware support. It needs multi-ported memory. And in order to do the synchronization correctly, the computer science literature over the years has come up with things like these: Test-and-Set. It is an atomic operation in that it cannot be separated into testing and then setting.

If you invoke an instruction like this, a typical test set instruction would do the following to a memory location: It tests whether the memory location contained a 0 or not. So it's a true or false based on that test. And then at the same time it stores a value that you asked to set. So you may say test-and-set 10. It may turn out to be true, but provided the old value was or was not a 0.

Whereas at the end of this test set, the new value that the location has is definitely 10. So if you were to test again, it will be non-0, and there is no intermediate state in between. And this is not to be confused with turning of interrupts and so on and so forth.

Typically, test and set is not an instruction that turns interrupt offs. It is something that is done at the level of hardware as a cooperating thing between the CPU and memory that it should be guaranteed at the lowest level of hardware that it is an atomic instruction.

A software artifact that can be constructed out of test and set, which has much cleaner semantics, is semaphores.

AUDIENCE MEMBER: When you said distributed shared memory, are --

DR. PRABHAKER MATETI: I'm coming to that. Right now it is still true, in parentheses, true shared memory.

AUDIENCE MEMBER: My question was going to be how far away are they? Are we doing distributed shared memory? And I guess --

DR. PRABHAKER MATETI: In the context of networks we are only doing distributed shared memory. But within the network, suppose you have an SMP machine, that SMP machine has, let's say, two processors. Those two processors are using true shared memory. Whereas between two nodes, we are doing distributed shared memory.

So let's continue this discussion about shared memory. And we need to understand the semantics of it. Here are some assumptions. We assume that there is such a notion of time that we could somehow say is global, and it is almost something that is so obvious in the context of true supercomputers and so on, that it is almost too difficult to justify talking about this.

But let's say that there is a global time, and we assume that this global time increments in discrete steps. So shared variable, let's call it s, and let's give this notation to the value that this variable will have at various times. So it will have the value vi at time instant ti. And because this is a discrete kind of time, i is going to go from, say, 0 to some upper limit. And process A sets this variable s to a value of, let's say, v1, and let's call the corresponding time instant t1.

Now, assume no other assignment occurred after t1. Now the question is B reads s at time t and gets a value, and that value is v, and it read this at time t. And we would like to answer the question about what was this value v based on knowing what was t. If t is less than t1, we better get v0. If t is equal to t2, we better get v1. And, of course, there is that middle ground, what if it is t1?

Well, this is where, depending on how carefully the hardware is designed, you may get an answer that is reasonable when, indeed, time equals t1. Here we are talking about, indeed, exactly simultaneous read/write access to a location.

So far in what I have explained, I had essentially made a hidden assumption that setting the value v1 to a shared available s, happens instantaneously, happens in zero time. Of course, that is not the case. Even sitting in memory location takes a finite amount of time. But within the assumptions we should therefore say that the reading of the value of that shared variable should not occur at all at time t1. And this is what we expect, this so-called shared memory to deliver to us as semantics.

This is where distributed shared memory, if you are going to use it, better promise us something close to this. And we are still going to take a minute or two before we get to the distributed version. So we now have, again, true shared memory, and we have multiple processes.

And awhile ago we talked about statements S1 and S2 being executed in parallel, and we said, well, semaphore is a good way to protect how the variables get updated if the dependencies are such that the order of execution matters. And I said semaphores are a good way of doing that. Once again, the definition of semaphores is so subtle that it is extremely confusing to people.

The angular brackets there is a standard notation to suggest that the enclosed statement is atomic. Again, atomic meaning no intermediate action is possible on the computer system. So S is assigned S + 1, and this is the definition of the so-called V operation. Some textbooks call this V operation by the name Signal, but I would like the avoid that.

So V operation on semaphores atomically increments it. Some textbooks will say things about process gets removed from the queue, gets placed in the queue and so on, and I think we should avoid that. All that happens in V operation -- and this is the real definition that one ought to use. All others are particular implementations that make particular assumptions and therefore become invalid when you don't check these assumptions are still valid.

So V operation simply increments it and that's that. Whereas the P operation is this. And it is the word "when" that is the most confusing one in there. First of all, let me observe that the entire thing is put in the angular brackets so that whole thing is atomic. On the other hand, we have this "word" when. I will explain that world in just a second.

For now, let's replace that word "when" with "if," so that you get a first-level understanding of what is going on. If S is greater than 0, obviously it is true or false. So if it is true, well then we are decrementing it by one. And the whole thing is in angular brackets, so therefore this happens atomically.

So there is no question of somebody else also noticing that S is greater than 0 and also trying to decrement it by one. So there are two competing processes who are trying to do this P operation. Because of the angular brackets, it is guaranteed that exactly one of the two processes will succeed in doing the decrement operation. It will not be the case that both will wait on each other. It will not be the case that both will be able to complete the operation.

Now the crucial part. It is indeed not if, it is when. Because the if, when you replace that with the word if, we are not describing what happens when indeed the value of this semaphore is 0. If we are using the word if there, it would suggest that the entire thing is like a no op, like a do-nothing statement. That is not the effect of this. Indeed, when the value of this semaphore is 0, the process that is attempting to do this operation should be waiting, should somehow be stopped from proceeding further.

So, when it tests S greater than 0 and it fails -- now, remember this testing is happening within these angular brackets, meaning it is happening in an exclusive way. While this testing is taking place, no other process is able to test or change the value of this S. And low and behold, we just discovered that it is 0.

What should this process do having obtained exclusive access because of the angular brackets? And the answer is this process, conceptually speaking, steps out of the angular brackets and is now in front of the angular brackets in competition with other processes who may be trying to do the same P operation or, for that matter, the V operation somewhere else. Note that while this process has just discovered that the value is 0, no other process is able to do the V operation because of the atomicity that we assumed. So that is what we mean by when.

Now, again, quite literally, how does this process know that some other process has changed the value of the semaphore to a 1 or 2 or whatever so that it could go do this decrement operation. And quite conceptually, literally, imagine the following: That this process is blocked here, but it is constantly keeping its eye on the semaphore. And the moment it becomes greater than 0, imagine that it grabs access to it and decrements it.

And now the question is what if multiple processes are doing that? And the answer is, one of them will succeed. We will not guarantee which one. And a word that comes up again and again in this context is nondeterminism, and it becomes very important when you enlarge the scope of this kind of parallelism to networks. The nondeterminism means not even that it is probabilistic governed. It may not obey any particular probabilistic distribution. It may change from time to time. It may not change at all. It may always prefer one over the other.

So the P operation semantics is difficult to explain, but this can be formalized. If we had the math tools we could do that.

AUDIENCE MEMBER: (Inaudible.)

DR. PRABHAKER MATETI: That is where the implementation of the semaphore should take advantage of that. That given that this process has tested S, discovered that its value is 0, therefore it should wait. While this process is waiting, is it possible to do other useful computation? And that is why a practical implementation of this will usually get you into an operating system very low level scheduler, typically called the dispatcher. So a real-life implementation of the semaphore will get you into a multiplexed situation.

A process switch would in all likelihood happen. But remember, that is an easy thing to do on a single CPU situation, and this is the reason we are talking about this here. We want to scale this up to multiple CPU's and, in fact, scale it up to distributed systems. And this is where we will have a tough time detecting when condition, detecting who has arrived first.

And so far I mentioned to you the word nondeterminism. A word that goes with that is fairness. If two processes are competing for this, should we in the long run, if they had completed a thousand times, is it reasonable that half the time one succeeds and the other half of the time the other one succeeds? That notion of fairness we're not at all guaranteeing in this definition here.

Now, a derived bunch of things that are also in use now are so-called condition variables. Note that both of these are what we call in the computer science literature abstract data types. Meaning, after you have created them with certain initial values, they are entirely operated only by these operations and by nothing else. So if it is a semaphore, the only thing you can do on them later on are V and P operations. Similarly, if C is a condition variable, the only thing you can do later on are the wait and signal operations.

As a result of this, you will note that if you had created a semaphore with a non-negative value, it will never ever become negative. And in the case of condition variable, and we don't have time to get into more detail than this, they do not even have an initial value. So a condition variable is simply a variable that you create, and it doesn't make sense to talk about what is its value. All that makes sense is you can do a wait operation on it, and you can do a signal operation on it.

The reason I didn't wish to use these two words for the semaphores is because the semantics of the condition variable operations are similar to the P and the V operations. The C.wait makes the process that is invoking that go into a wait state. Remember, this is not testing anything. It's simply unilaterally self-imposing it into a wait state. So, again, as a result, a good scheduler would switch the CPU to another process. Similarly, signal will signal quote/unquote to the processes that may be waiting.

Now, both of these have made their way into real-life languages. These had been initially very theoretical proposals, but they are now real and there are lots of programming languages that have this. The latest of these that have this is Java.

Now we'll talk about distributed shared memory. Of course, by this we mean that it is shared memory in the same old sense, but now it is geographically distributed, and now the question is: How do we describe the semantics of it? First of all, remember the very first assumption we have made is that there is global time, so I could tell you about what the value ought to be at time instant t1 and time instant t2, and so on.

Given that these nodes are distributed geographically, we do not have global time. And this itself is an extremely hard theoretical problem to try to synchronize time across nodes. It is a near impossible problem to solve. So distributed shared memory does not assume this. It can only give certain minimal guarantees about the semantics.

So programming with distributed shared memory is even more hard. Even more bug prone than the old true shared memory programming was. True shared memory programming you could do right, provided you understood semaphores and condition variables and you programmed them right. Distributed shared memory is an extremely difficult thing to do.

So we move on to the other communication primitive, namely, message passing. By the word message, we mean the following: It is simply a sequence of bytes moving between processes, and, of course, the sender and the receiver both agree on what kind of data is going in that message. It should not be the case that the sender thinks he's sending one integer and the receiver thinks he's receiving four characters.

This is an assumption that we make, but this is an assumption that can easily be guaranteed by how you send this. This is a technical term. Previously I used the word serialization, the so-called Marshalling of data. It does the packing of the data in such a way that the sender and the receiver view the sequence of bytes in exactly the same way, and it takes care of byte order as implied in these little Indian, big Indian kinds of things across different CPU's.

So that's what we mean by message. So process A sends a data buffer as a message to process B. I'm trying to give you the semantics of this. Process B, of course, waits for a message from A; and when it arrives, it copies it into its own local memory. There is absolutely no memory that is shared between A and B when we say A has sent a message to the process named B.

Now there are some obvious things. But you'll be surprised how many times in real-life networks these will be these will be the obvious ones. Of course, a message should not be received before it is sent, but it happens sometimes. A receiver, of course, should wait until there is a message. This is also obvious.

The so-called asynchronous message passing is the following: The sender never blocks, even if infinitely many messages are waiting to be received. So there is no such thing as quote/unquote there was a channel between the sender and receiver, and the channel is full or the network is full, therefore the sender must wait. In the so-called asynchronous message passing, the sender never blocks.

On the other hand, the typical use of this word, unfortunately, asynchronous message typically means this semi-asynchronous message where there is a limit on the buffer space. It could be a large one and therefore there can be a situation where a sender, even though he's doing this so-called asynchronous send, would eventually block.

So these are some of the obvious ones. Now, what I just described to you is point-to-point message passing, where the two processes, the sender and the receiver, were clearly identified. In the process named, say, Q, there was an invocation of an operation, such as, send, and it gave the message M -- sorry for the mix up of lower and uppercase there -- and named the to process, it is being sent to the process P, and, similarly, the process P says receive this from the particular sender.

Later on we'll see that there are some generalizations of these two. Q can say, Well, I'm sending this message. Whoever is listening let him get it. Similarly, P can say, Whoever sends it, I'm receiving it. But in this particular example because we are still focusing on the meaning of this, P has identified Q, Q has identified as P, the receiver.

The message data. The semantics is that after the send-receive operation completed, the semantics is exactly the same as having done the assignment statement. X is assigned m. So the type of m and the type of variable x that was declared must have matched in the usual sense of the assignment statement. So that is the overall semantics of a send-receive.

Broadcast is simply this: It is one sender. Instead of naming the receiver, you are saying whoever is listening. And not all receivers may receive this at the same time of course. And it is a more derived kind of question to ask: Are you even sending them at the same time, even though you did a broadcast?

But unless other statements are made, that is the assumption. That the sending is occurring at the same time, but, of course, depending on where the receivers are, they may or may not receive them at the same time. And this has, as you know from real-life network, nothing to do with physical proximity, that the closer ones gets sooner than the nodes that are far away.

In computer science circles we view this synchronous message passing as more fundamental than asynchronous message passing, and this can be debated on a theoretical level. One ought to think of asynchronous message passing as more fundamental, but this is often viewed as fundamental because, effectively, this is what happens in real-life hardware.

Whereas, if you remember in asynchronous message passing I made the statement that the sender never blocks, and that is not a doable thing in practical terms and therefore it has power. Whereas you could build synchronous message passing out of that, out of asynchronous, vice versa would be hard because how would you build an infinite buffer?

So back to this. The sender blocks until the receiver is ready to receive. That's what we mean by synchronous message passing. Obviously the corollary is you cannot send a message to yourself, because how would you be ready at the same time to both send and receive? Whereas, in the asynchronous message passing, that's perfectly all right, and, of course, there is no buffering at all in this because the sender and receiver are ready to do it.

Now in comparison to shared memory, message passing speed, of course, is never going to be similar no matter how carefully you would implement. Here are some of the reasons why the speed will not be good. Sender copies the message into system buffers. Talking about it in a practical term now. The message, of course, travels the network and receiver copies the message from system buffers to local memory.

And there are all kinds of -- even when this is happening on the same machine, when this so-called network really is within the same memory, even then this copying happens unless you use clever virtual memory techniques, where you make the other process own the page where the message is. But even then speed for message passing will never match shared memory. But on the other hand, and this is an experience-based kind of comment, cannot be proven that it is easier, less error-prone, to do.

Let me explain this because it will show up a few more times. For example, in that master worker paradigm we mentioned that various processes would work on things and they have subresults, and now they eventually want to give the results back to the master. That master, if he's there, will do effectively this synchronization.

Alternately, we could program the workers in such a way that they are all waiting at a certain point, and that certain point is often referred to as the barrier. And barrier synchronization is a primitive that a library can provide so that a process, if it arrives at that control point sooner than the others, that process waits. So until all the processes show up at a certain specified control point, this library routine will keep them waiting, and this is known as barrier synchronization.

Let me move on to how do we do software development for parallel computation. Well, the obvious one is have a mechanical way of converting your code into parallel code, and indeed there are some compilers that will do that -- out of order. The other one is to take your source code, assuming you have it, you reverse engineer a design from that and you recode using new languages, or new libraries at least.

I would like to mention this at this point because we will have almost no further discussion of it. There is this OpenMP I think I mentioned to you in the earlier talk also. This is for shared memory architectures. But distributed or not, the same thing should work.

But this is where the user writes his code in the usual way, in FORTRAN or C or what have you, but there are special hints that he will code as specialized comments. Based on those specialized comments, the compiler can do appropriate things. This software is yet to become widely available.

Now the other way of doing this is, of course, to do libraries. And one library we just mentioned, OpenMP for shared memory. And here are a bunch of libraries that we will spend some time on called PVM and MPI. And it looks like from talking to you, quite a few of you have used one or the other before.

So here the programmer is responsible for distributing the data, for synchronizing, and, of course, sending and receiving information. And BSP is something, again, that we are not going to talk further about, so let me just say a few words now. BSP stands for Bulk Synchronous Parallel model. And there is a library available based on this.

This divides computation into so-called supersteps. In each superstep, a processor can work on local data and send messages. So while so each superstep is happening on a particular node and while it is executing the superstep, there is no further communication. It can only send. There is no receive during the superstep.

At the end of the superstep, this barrier synchronization that I mentioned happens. So if there are some superstep executions that are taking longer, they will have to wait -- sorry, the faster ones will have to wait until all supersteps are finished, so the faster ones will have to wait for the slower ones.

So at the end of the superstep by all these processors, the barrier synchronization takes place, and then all the processes receive the messages that were sent during the current superstep and then they repeat this. So this is a standard that we have at least one library for it, and like PVM and MPI, this can be linked to C and Fortran and so on. Because this won't come up again, this library has process creation primitives in it, and it has remote data access, and it does this bulk synchronization.

Let me say a few things about parallel languages. So we are moving from library to languages. There are surprisingly a large number of these languages. Nearly all of them what we would have to call as experimental or research prototypes. This is one of the reasons why I think none of them have taken a foothold. But here are the various models that they are based on.

There are languages that are essentially shared memory based, distributed or not. There are languages that are, of course, OOP, and there are any number of them. Nearly all of them will have somewhere in there C++ as part of the name. And parallel functional languages and so on.

I'm going to throw at you some of the names of these languages. The most well-known among the shared memory languages is a language called Linda. And this one happens to be by David Gelernter. Some of you may know about the Unabomber, he was the one of the unfortunate victims, so a side comment there.

The Tuple space, as proposed by Linda, is essentially a shared memory model abstracted at a very high level. A Tuple is simply a sequence of values such as in this example here, values v1 through vk. These values may or may not be alike; v1 may be an integer, v2 may be a string, v3 may be a Boolean value and so on.

So here we have an ordered sequence of values. In this case the Tuple is of length k. So that Tuple is placed into a so-called Tuple space. I found the best way to explain that is to make you imagine that the Tuple space is kind of like this carpet. So you just drop this Tuple on the carpet. But while it is like the carpet, there is no notion of proximity as it is there in the real example.

So here we have these atomic primitives which operate on this Tuple space. The In operation takes one Tuple, exactly one Tuple out of this Tuple space, and it does this atomically. Now remember, so far we said nothing about how big the T is. The Tuple can be one megabyte long or ten megabytes or maybe it's only two bytes. So the size of the Tuple can be whatever. Now there is a notion of matching, which I'll explain in a minute.

So the In operation simply grabs one Tuple from the Tuple space and removes it from the Tuple space, so that if you are keeping a count of how many tuples there are in the Tuple space, now there is one less, and this has happened atomically. Meaning that if there are two processes ...

[INCOMPLETE  transcription; tape ran out ... Sorry]

last revised:11/21/00 05:48:59 PM
editor: pmateti@cs.wright.edu