ObjectSpace: An Intuitive Model for Concurrent Programming

Tuan Tran

May 10th, 2018

Abstract

We introduce ObjectSpace, a new concurrent programming model that aims towards flexibility and simplicity. The goal of ObjectSpace is to be intuitive to programmers while having good expressiveness and scalability. It is designed to fulfill many roles in a concurrent setting, including data passing and communication between threads.

ObjectSpace could be considered a natural evolution of Linda, introduced by Gelernter [10]. It centers around a concurrent data structure that can store objects of arbitrary type. It provides atomic addition and removal of objects of arbitrary type and lets users retrieve objects based on the value of any of their fields. The addition of complex data structures gives ObjectSpace better expressiveness and type safety than Linda.

We also provide a reference implementation of the model in Rust. The API of the model proves to be simple and intuitive, and could be generalized to many programming languages.

1 Introduction

The rise of the age of Big Data and planetary-scale systems has put concurrent computing front and center in modern research. The most successful corporations such as Google, Microsoft, and Facebook have spent huge effort providing simple, extensible, and fault-tolerant concurrent programming models and have enjoyed considerable success [5], [7], [14], [17]. On the consumer's side of computing, for the last ten years chip producers have been researching ways to put more processors into their systems to make up for the end of Moore's Law. Having a simple and efficient concurrent model to take advantage of this new increase in power is more crucial than ever.

However, the most common paradigm for concurrency, multi-threading, requires dealing with locks, which have poor usability and scalability and are error-prone, as each new thread presents new synchronization issues (coupled with the fact that pthread is not an intuitive interface). The Message Passing Interface (MPI) requires machines to be tightly coupled in both time and space, and is therefore also difficult to use [8]. Even the low-level infrastructure provided by the Unix environment is limited in its capability. This issue has led to various studies on concurrent programming, starting from the early 1970s, including original and impactful ideas in all aspects: hardware support [4], [15], compiler support [2], performance analysis [6], and programming models.

Recently, the distributed community seems to favor ad hoc systems such as MapReduce [7] or Spark [21], which focus on 'big data' processing and manipulation. While they have proven to be successful in this particular field of parallel processing, these models achieve their concurrency by limiting their API, which hampers their ability to integrate with other workflows and makes them unsuitable for a wide array of concurrent programming tasks [16].

A simple and elegant approach to distributed computing is Linda, first proposed by Gelernter [10]. Although Linda has not received much attention from the community, we believe it offers a great balance between simplicity and flexibility. Linda centers around a shared memory space storing tuples from client nodes in the system. Through Linda's atomic read and write operations, agents in a concurrent system can achieve time and space decoupling. Its biggest strength, though, lies in its ability to allow wildcards in its operations, which enables nodes in the system to control the scope of their operations and opens many possibilities for communication.

However, as our need to communicate within a concurrent setting gets more complex, Linda's design of passing only tuples between threads becomes a big limitation. Not only are tuples too simplistic to represent the structures that occur in real production code, they also fail to convey the meaning of the elements in a message, both individually and as a whole.

ObjectSpace is an extension of Linda aiming to fix these problems. By storing complex objects instead of simple tuples, it enhances the system's power, improves its flexibility, and enforces type safety, fixing the biggest problems of Linda. We also retain Linda's unique ability to specify a value of an object's field as a "filter" condition, producing a powerful but intuitive framework for concurrent programming.

In the next sections of this paper, we describe the API of an ObjectSpace and introduce a proof-of-concept implementation in Rust, a modern language for systems programming geared towards safety and concurrency. We then introduce an example that uses ObjectSpace to achieve concurrency and analyze the framework's performance. Finally, we propose a few directions which future work could focus on.

2 The ObjectSpace API

2.1 Basic Operations

The basic operations of an ObjectSpace include write, read, and take. Each of these operations is atomic. The write operation is simple: given an object obj, we can add it to the ObjectSpace by calling space.write(obj). Notice that obj could be of any type: an int, a string, a boolean, a tuple, or a complex object.

The read operation has three variations: space.try_read() is a non-blocking read of one object of type T from the space, space.read() is a blocking read, and space.read_all() is a non-blocking read of all elements of type T. Notice that since an ObjectSpace can store objects of any type, a generic type parameter is necessary for the operation.

A take operation is similar to read, except that after an object is read, it is removed from the ObjectSpace. It has the same three variations as read.
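To make these operations concrete, here is a minimal sketch of their use, assuming the TreeObjectSpace type from the reference implementation described in Section 3 and the semantics above; the exact return types (for instance, whether try_read returns an Option) are assumptions rather than part of the specification.

extern crate object_space;

use object_space::{ObjectSpace, TreeObjectSpace};

fn main() {
    let space = TreeObjectSpace::new();

    // write: atomically add objects of arbitrary (serializable) types.
    space.write::<i64>(42);
    space.write::<String>("hello".to_string());

    // read: blocking read of one i64; the object stays in the space.
    let n = space.read::<i64>();

    // try_read: non-blocking; assumed to yield None when no String is stored.
    let greeting: Option<String> = space.try_read::<String>();

    // read_all: non-blocking read of every i64 currently in the space.
    let all: Vec<i64> = space.read_all::<i64>().collect();

    // take: like read, but the returned object is removed from the space.
    let taken = space.take::<i64>();

    println!("{} {:?} {:?} {}", n, greeting, all, taken);
}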

2.2 Conditional Operations

Besides reading objects of a given type, ObjectSpace also allows filtering the returned objects based on the value of a field of that type. We call these conditional operations. There are two main types of conditional operations:

• Reading by exact value: given a field name and a value, we return an object whose specified field has the given value. For example, space.read_by_value("age", 8) reads an object of type T whose field 'age' has value 8.

• Reading by range: given a field name and a range of values, we return an object whose specified field has a value within the given range. For example, space.read_by_range("age", range(6, 9)) reads an object of type T whose field 'age' has a value between 6 and 9.

Each of these conditional operations has all of the variations found in the normal read and take operations.
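As an illustration, the sketch below applies both kinds of conditional operation to a user-defined type. It mirrors the calls used in Appendix A; the exact signatures are assumptions, in particular that lookup values are passed by reference (as take_by_value does in the appendix) and that the range form accepts a standard Rust range.

extern crate object_space;
extern crate serde;
#[macro_use]
extern crate serde_derive;

use object_space::{ObjectSpace, RangeLookupObjectSpace, TreeObjectSpace, ValueLookupObjectSpace};

#[derive(Serialize, Deserialize)]
struct Student {
    name: String,
    age: i64,
}

fn main() {
    let space = TreeObjectSpace::new();
    space.write(Student { name: "Ada".to_string(), age: 8 });
    space.write(Student { name: "Alan".to_string(), age: 12 });

    // Reading by exact value: one Student whose 'age' field equals 8.
    let eight: Student = space.read_by_value::<Student>("age", &8i64);

    // Reading by range: one Student whose 'age' field lies in 6..9.
    let young: Student = space.read_by_range::<Student>("age", 6i64..9);

    // The take_* variants behave the same way but remove the match.
    let removed: Student = space.take_by_value::<Student>("age", &12i64);

    println!("{} {} {}", eight.name, young.name, removed.name);
}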

In general, the surface API of ObjectSpace is small and very easy to understand. However, it provides a robust base for a wide range of data passing and communication jobs in concurrent programming.

3 Reference Implementation

We provide a reference implementation of the ObjectSpace model. The implementation is MIT and Apache dual-licensed (as is standard in the Rust community) and can be found on GitHub [19]. It is written in Rust, a language for systems programming focusing on safety and concurrency. Since Rust forbids many possible concurrency errors through its concepts of lifetime and ownership and a very thorough compiler that limits how variables can be created and passed between functions, it has a high initial learning curve. However, the benefits of native performance combined with high-level features and straightforward concurrent programming prove worth these limitations.

The main data structure of our implementation is a HashMap between a type's ID and an Entry storing all objects of the corresponding type. Before being written into the ObjectSpace, objects are serialized into a flattened, JSON-like structure and assigned a unique ID. The Entry for each type consists of two data structures: a HashMap between an object's ID and the object itself, and a reverse index which maps each possible value of a field to the list of IDs of objects whose field has that value.
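To make the layout just described concrete, the following is an illustrative sketch of the storage structure; the identifiers here (Entry, LeafValue, and so on) are chosen for this paper and need not match the names used in the actual crate [19].

use std::any::TypeId;
use std::collections::{BTreeMap, HashMap};

// A serialized leaf field of a flattened, JSON-like object.
enum LeafValue {
    Number(i64),
    Boolean(bool),
    Text(String),
}

type ObjectId = u64;

// All stored objects of one concrete type.
struct Entry {
    // object ID -> flattened object (field path -> leaf value)
    objects: HashMap<ObjectId, HashMap<String, LeafValue>>,
    // reverse index: field path -> (encoded field value -> IDs of objects with that value)
    reverse_index: HashMap<String, BTreeMap<String, Vec<ObjectId>>>,
}

// The ObjectSpace itself: one Entry per stored type.
struct Space {
    entries: HashMap<TypeId, Entry>,
}

fn main() {
    // An empty space: no types registered yet.
    let _space = Space { entries: HashMap::new() };
}

An ordered map for the per-field index (a BTreeMap in this sketch) is what makes range lookups cheap, which is presumably why the reference implementation's main type is called TreeObjectSpace.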

This structure enables a straightforward implementation of all of ObjectSpace's features, especially conditional operations. A downside of this implementation, though, is that it can only store objects that are JSON-serializable. Moreover, conditional operations can only operate on "leaf fields" of an object, that is, fields whose values are numbers, booleans, or strings, and conditional operations on more than one field at a time are complicated. Despite these limitations, we find that our implementation is still very robust and flexible, and it proves to be suitable for a wide array of concurrent programming jobs.

4 Example: Calculating Primes

An example of ObjectSpace usage can be found in Appendix A. This example calculates and prints out all prime numbers smaller than ten million. The code requires a few changes compared to a program written for a single-threaded setting, but in general it is still clear and simple to follow.

This example introduces two common usages of ObjectSpace. The more obvious usage is data passing: after a worker thread finds a prime number, it writes the number into the ObjectSpace. Then, at the end of the program, the master thread reads all calculated prime numbers from the ObjectSpace and prints them out.

The second usage of ObjectSpace is communication, achieved through the Task object, which represents a range of numbers. In each round of iteration, the master thread writes new Tasks to the ObjectSpace. Each worker thread takes one Task and calculates the prime numbers within the range of the Task, before writing them to the ObjectSpace and using the Task to communicate back to the master that a job has been finished. Notice that the master thread does not need to know which task is taken by which thread; it just needs to know that such a task has been completed (this information, however, could easily be added to a Task by the programmer if necessary). This helps us achieve space decoupling between threads and reduces the complexity of the program.
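The handshake at the heart of this scheme can be seen in isolation below; it is a trimmed-down view of the Appendix A program, keeping only the Task type and the take_by_value calls used there, with the actual prime checking elided.

extern crate object_space;
extern crate serde;
#[macro_use]
extern crate serde_derive;

use object_space::{ObjectSpace, TreeObjectSpace, ValueLookupObjectSpace};
use std::sync::Arc;
use std::thread;

#[derive(Serialize, Deserialize)]
struct Task {
    finished: bool,
    start: i64,
    end: i64,
}

fn main() {
    let space = Arc::new(TreeObjectSpace::new());

    // Worker: claim any unfinished task, do the work, then report it done.
    let worker_space = space.clone();
    thread::spawn(move || {
        let task = worker_space.take_by_value::<Task>("finished", &false);
        // ... check task.start..task.end for primes and write them here ...
        worker_space.write(Task { finished: true, start: task.start, end: task.end });
    });

    // Master: hand out one range, then block until some worker reports it done.
    space.write(Task { finished: false, start: 4, end: 16 });
    space.take_by_value::<Task>("finished", &true);
}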

We also provide a few other examples of ObjectSpace, including a reminder program in the same vein as the one introduced in Gelernter's original paper [10], and a program for drawing the Mandelbrot fractal. All of them can be found in our GitHub repository [19].

4.1 Performance

Here we measure the performance of the aforementioned prime numbers example to test the scalability of the framework. All experiments are done on a 15-inch MacBook Pro (late 2016) running macOS 10.13, with an Intel i7-6820HQ, 16 GB of RAM, and a 512 GB SSD.

Setup                    Time (s)
Normal single-threaded   24.193
ObjectSpace 1 thread     32.248
ObjectSpace 2 threads    19.507
ObjectSpace 4 threads    13.931
ObjectSpace 8 threads    14.741
ObjectSpace 16 threads   16.084
ObjectSpace 32 threads   16.731

Notice that since this machine has only 4 cores and 8 hardware threads (several of which are already used to run the OS), we expect a slowdown as the number of threads exceeds four.

The experiment shows that our implementation of ObjectSpace introduces a non-trivial overhead to the program, most likely due to serialization and to the synchronization mechanism using the Task structure, which is not strictly necessary for this case. However, it proves to scale well up to the number of available threads in the system. We believe that given more optimization in the example program, we could achieve even better scalability.

5 Insights and Future Research

The design of the ObjectSpace API is based on three principles: simplicity, flexibility, and good language integration. The API surface of ObjectSpace is small and consistent, lowering the learning curve while allowing a high degree of flexibility for programmers to express their intention through conditional operations.

In practice, we find that ObjectSpace is suitable for a wide array of concurrent work. The integration of the reference implementation with the Rust language makes it very natural to learn and use. We expect any future implementation of the paradigm to integrate similarly closely with its respective language.

Compared to MapReduce, another popular concurrent programming framework [7], ObjectSpace does not force programmers into any particular paradigm, instead merely serving as a facilitator for parallel computing. Its flexible nature means that it could serve as data storage, a data passer, or a communication intermediary. As a result, programmers have more freedom in choosing the best design for their program; yet it also requires more thought and effort on their part to get their model right. However, a MapReduce-like paradigm could be achieved quite easily through ObjectSpace using an interface similar to the Task in our example.
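As a sketch of what such an interface could look like, the following word-count example layers a map phase and a reduce step over an ObjectSpace in the spirit of the Task handshake above. The MapTask and WordCount types are invented for this illustration and are not part of the library, and we assume read_all yields owned deserialized objects, as the Appendix A program suggests.

extern crate object_space;
extern crate serde;
#[macro_use]
extern crate serde_derive;

use object_space::{ObjectSpace, TreeObjectSpace, ValueLookupObjectSpace};
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

#[derive(Serialize, Deserialize)]
struct MapTask {
    finished: bool,
    text: String,
}

#[derive(Serialize, Deserialize)]
struct WordCount {
    word: String,
    count: i64,
}

fn main() {
    let space = Arc::new(TreeObjectSpace::new());
    let chunks = vec!["a b a", "b c"];

    // Map workers: take an unfinished chunk, emit per-word counts, mark it done.
    for _ in 0..2 {
        let space = space.clone();
        thread::spawn(move || loop {
            let task = space.take_by_value::<MapTask>("finished", &false);
            let mut counts: HashMap<String, i64> = HashMap::new();
            for word in task.text.split_whitespace() {
                *counts.entry(word.to_string()).or_insert(0) += 1;
            }
            for (word, count) in counts {
                space.write(WordCount { word, count });
            }
            space.write(MapTask { finished: true, text: task.text });
        });
    }

    // Master: hand out the chunks, then wait until every one is reported done.
    for chunk in &chunks {
        space.write(MapTask { finished: false, text: chunk.to_string() });
    }
    for _ in 0..chunks.len() {
        space.take_by_value::<MapTask>("finished", &true);
    }

    // Reduce step: fold the intermediate WordCount objects into final totals.
    let mut totals: HashMap<String, i64> = HashMap::new();
    for wc in space.read_all::<WordCount>() {
        *totals.entry(wc.word).or_insert(0) += wc.count;
    }
    println!("{:?}", totals);
}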

When using ObjectSpace, programmers need to think carefully about the flow of objects passing through the structure to maximize performance. We do not, however, think this is a fault of the paradigm: parallel programming is too complex to hide perfectly, and thus it is best to require programmers to consider it explicitly [14].

Due to the experimental nature of the framework, there is still a lot of room for improvement. Most obvious of all is performance enhancement: the proof-of-concept implementation still relies heavily on locks for the sake of ease of implementation, which brings a lot of overhead to the program. Serialization of objects, which in some cases is unnecessary, also contributes to the overall overhead. We hope to improve this in future iterations.

We would also like to investigate additional APIs that could benefit the framework. An example of such an API is the ability to declare conditional operations on multiple fields at once, mirroring that ability in Linda. Extra work still needs to be done to figure out the easiest way to implement such an API.

A big goal of ObjectSpace is to generalize to the distributed setting, as Linda does. Since in our implementation objects are already serialized before being added to the ObjectSpace, adding distributed capability should be possible. The distributed setting will bring new and unique challenges to ObjectSpace, for example whether objects in the framework should be stored distributively or centrally.

6 Acknowledgement

This project could not have been completed without help and support from Professor Duane Bailey and Professor Jeannie Albretch, who have directly supervised this project.

Special thanks to my friend Daishiro Nishida, who worked with me on the first prototype, implemented in C#, which can be found at https://github.com/tmt96/dotSpace-objectSpace.

7 References

[1] G. Agha and C. J. Callsen, "ActorSpace: An open distributed programming paradigm," vol. 28, no. 7, 1993.

[2] D. F. Bacon, S. L. Graham, and O. J. Sharp, "Compiler transformations for high-performance computing," ACM Computing Surveys (CSUR), vol. 26, no. 4, pp. 345–420, 1994.

[3] H. Baker and C. Hewitt, "Laws for communicating parallel processes," 1977.

[4] G. E. Blelloch, "Scans as primitive parallel operations," IEEE Transactions on Computers, vol. 38, no. 11, pp. 1526–1538, 1989.

[5] F. Chang et al., "Bigtable: A distributed storage system for structured data," ACM Transactions on Computer Systems (TOCS), vol. 26, no. 2, p. 4, 2008.

[6] D. Culler et al., "LogP: Towards a realistic model of parallel computation," in ACM SIGPLAN Notices, 1993, vol. 28, pp. 1–12.

[7] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.

[8] P. T. Eugster, P. A. Felber, R. Guerraoui, and A.-M. Kermarrec, "The many faces of publish/subscribe," ACM Computing Surveys (CSUR), vol. 35, no. 2, pp. 114–131, 2003.

[9] A. S. Foundation, "JavaSpaces service specification," 2016.

[10] D. Gelernter, "Generative communication in Linda," ACM Transactions on Programming Languages and Systems (TOPLAS), vol. 7, no. 1, pp. 80–112, 1985.

[11] S. Ghemawat, H. Gobioff, and S.-T. Leung, "The Google file system," SIGOPS Oper. Syst. Rev., vol. 37, no. 5, pp. 29–43, 2003.

[12] B. Hindman et al., "Mesos: A platform for fine-grained resource sharing in the data center," in NSDI, 2011, vol. 11, pp. 22–22.

[13] C. A. Hoare, "Communicating sequential processes," Communications of the ACM, vol. 26, no. 1, pp. 100–106, 1983.

[14] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, "Dryad: Distributed data-parallel programs from sequential building blocks," in ACM SIGOPS Operating Systems Review, 2007, vol. 41, pp. 59–72.

[15] R. E. Ladner and M. J. Fischer, "Parallel prefix computation," Journal of the ACM (JACM), vol. 27, no. 4, pp. 831–838, 1980.

[16] C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis, "Evaluating MapReduce for multi-core and multiprocessor systems," in High Performance Computer Architecture (HPCA 2007), IEEE 13th International Symposium on, 2007, pp. 13–24.

[17] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, "The Hadoop distributed file system," in Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on, 2010, pp. 1–10.

[18] A. Thusoo et al., "Hive: A warehousing solution over a map-reduce framework," Proceedings of the VLDB Endowment, vol. 2, no. 2, pp. 1626–1629, 2009.

[19] T. Tran, "Rust object space," GitHub repository, 2018.

[20] Y. Yu et al., "DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language," in OSDI, 2008, vol. 8, pp. 1–14.

[21] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, "Spark: Cluster computing with working sets," HotCloud, vol. 10, nos. 10-10, p. 95, 2010.

A Calculating Prime Numbers with ObjectSpace

extern crate object_space;
extern crate serde;
#[macro_use]
extern crate serde_derive;

use std::thread;
use std::env;
use std::sync::Arc;

use object_space::{ObjectSpace, ValueLookupObjectSpace, RangeLookupObjectSpace, TreeObjectSpace};

fn main() {
    let mut args = env::args();
    let upper_lim = 1000000;
    let thread_count = 4;

    // setup. add 2 & 3 just because we can
    let mut n = 4;
    let space = Arc::new(TreeObjectSpace::new());
    space.write::<i64>(2);
    space.write::<i64>(3);

    // create 4 worker threads
    for _ in 0..thread_count {
        let space_clone = space.clone();
        thread::spawn(move || {
            check_numbers(space_clone);
        });
    }

    // continue until we hit limit
    while n < upper_lim {
        let max = if n * n < upper_lim { n * n } else { upper_lim };

        for i in 0..thread_count {
            // divide work evenly between threads
            let start = n + (((max - n) as f64) / (thread_count as f64) * (i as f64)).round() as i64;
            let end = n + (((max - n) as f64) / (thread_count as f64) * ((i + 1) as f64)).round() as i64;

            let clone = space.clone();
            clone.write(Task {
                finished: false,
                start: start,
                end: end,
            });
        }

        // "joining" threads: wait until every Task of this round is marked finished
        for _ in 0..thread_count {
            let clone = space.clone();
            clone.take_by_value::<Task>("finished", &true);
        }
        n = max;
    }
}

// Worker loop: claim an unfinished Task, test its range against the primes
// already in the space, write any new primes back, then mark the Task done.
fn check_numbers(space: Arc<TreeObjectSpace>) {
    loop {
        let task = space.take_by_value::<Task>("finished", &false);
        let max = task.end;
        let min = task.start;
        let primes: Vec<i64> = space.read_all::<i64>().filter(|i| i * i < max).collect();
        for i in min..max {
            if primes.iter().all(|prime| i % prime != 0) {
                space.write(i);
            }
        }
        space.write(Task {
            finished: true,
            start: min,
            end: max,
        });
    }
}

#[derive(Serialize, Deserialize)]
struct Task {
    finished: bool,
    start: i64,
    end: i64,
}