Applying Clustering Techniques to Retrieve Housing Units from a Repository

Álvaro Sicilia, Leandro Madrazo, and Mar González
ARC Enginyeria i Arquitectura La Salle, Spain

The purpose of BARCODE HOUSING SYSTEM, a research project developed over the last four years, has been to create an Internet-based system which facilitates the interaction of the different actors involved in the design, construction and use of affordable housing built with industrialized methods. One of the components of the system is an environment which enables different users (architects, clients, developers) to retrieve the housing units generated by a rule-based engine and stored in a repository. Currently, the repository contains over 10,000 housing units. In order to access this information, we have developed clustering techniques based on self-organizing maps and k-means methods.

Introduction

Nowadays, the possibility of carrying out design processes collaboratively on the Internet offers different users the opportunity to participate in the design of mass-customized housing [1], [2], [3]. This requires the design of environments which support user interaction by means of appropriate interfaces while at the same time taking advantage of the capacity of computer programs to generate and evaluate design solutions. BARCODE HOUSING SYSTEM creates housing blocks as aggregations of housing units which have been automatically generated using a rule-based system. The overall process of designing a housing block, from the housing unit to the overall building, is open to the participation of the different stakeholders (architects, builders, manufacturers, occupants, facilities managers), who lead the design process by providing inputs at different stages using the interfaces accessible online. A housing system has been specifically created which is based on a combination of horizontal and vertical space bars. The housing system is represented in a graph which contains all possible floor plans [4].

J.S. Gero (ed.): Design Computing and Cognition'10, pp. 387–401. © Springer Science + Business Media B.V. 2011

At the outset, one possible way to generate a housing unit is by specifying its individual characteristics. A second way is by specifying the generic characteristics of a type of housing unit so that the system returns a series of floor plans. In the first case, there is the risk that the system is unable to return a valid solution, or that it takes a long time to find one. With the second procedure, both problems can be avoided. The trade-off in this case is that the housing units must be generated in the background and stored in a repository. To search the housing units in the repository, an information retrieval system is necessary. To improve the performance of the retrieval, we have applied clustering techniques which facilitate the interaction between users and the design system. Clustering techniques have been used to solve classification problems in information systems in different domains, such as financial [5] and traffic data analysis [6], to classify massive document collections [7], and to analyze protein sequences in the field of bioinformatics [8]. Some applications can also be found in the domain of architecture, such as data mining techniques to analyze design information stored in a case library [9], or techniques which are integrated into an information system to retrieve floor plans [10]. Also, clustering techniques have been used along with shape grammar systems to categorize shape rules and designs in order to facilitate design exploration [11]. In this paper we describe the search tools which users employ to search the repository of previously generated housing units. Two clustering techniques, Self-Organizing Maps and k-means, have been implemented and their performance has been compared.

BARCODE HOUSING SYSTEM: A Generative System for Housing Design

BARCODE HOUSING SYSTEM consists of interwoven working spaces in which the different actors (architects, developers, manufacturers, occupants) participate synchronously and asynchronously throughout the entire process of design and construction of housing units and of the buildings resulting from their aggregation. The work spaces and their functionalities are the following:

• PROJECT DEVELOPMENT. In this working space, developers, architects and building managers specify site properties (area, size), the number and type of housing units, building and planning regulations (building volumes and height) and environmental conditions (climate, orientation). Alternative building solutions (massing, location) can then be explored for the given site conditions and brief.

• HOUSING LAYOUTS. In this space, architects select a set of units that will later be used to generate a building. The units have been generated by the system in batch processing and are stored in the system database. The selection becomes a “discovery” process as the architect finds the housing layouts while navigating, with the help of clustering techniques, through the space of solutions which the system has generated in previous project developments. Should an adequate layout not be found in the pool of existing solutions, the architect can request the generative system to create alternative layouts that conform to the desired criteria (surface, number of rooms, number of bathrooms, open or closed kitchen). The new solutions are stored in the database, thus enlarging the existing pool of solutions.

• HOUSING CONFIGURATION. Occupants describe their housing program (number of family members, usage of spaces, lifestyles) working with user-friendly interfaces that represent housing units and layouts in a graphic language that can be understood by lay people (schematic plans, photographs depicting activities in spaces, bubble diagrams). The system returns the housing units which most closely correspond to the criteria defined by the users, and they select those that best meet their needs. Then, the selected units are used in the generative process that creates the housing block. Once the housing units have been assembled, occupants and architects collaborate using a 3-D environment to define the arrangement of a living unit (finishing, partitions and furniture).

• HOUSING ASSEMBLY. In this environment, the architect defines the design criteria for the assembly of housing units, including the degree of compactness of the housing block, the degree of optimization of building services, the minimum distances to access cores (staircases, elevators), the material of the structural skeleton and so on. Once the design values are set, a generative process creates the solutions that satisfy these criteria.

• BUILDING COMPONENTS CATALOGUE. An XML-based product modeling catalogue enables manufacturers to enter descriptions of their products, which will then be selected by the team in charge of the project development. Based on this selection, the future occupant chooses the components (doors, windows, partitions) and inserts them in the 3-D depiction of a dwelling.

In the following sections, we will introduce the retrieval processes in the HOUSING LAYOUTS work space, where the clustering has been applied.

Housing Layout Workspace

As Steadman observed, there are two basic approaches to automatic floor plan generation: to generate one or a few plans that satisfy a set of specified constraints, or to produce all the possible plans which cover all the requirements [12]. In the second case, we avoid the high cost of generating a possible solution by shifting the computational power to the search process. In this way, it is possible to facilitate the process of finding a solution by guiding the search for designs that suit the specific set of criteria. We have opted for separating the rule-based generation of design solutions from the search in the design space. In the Housing Layouts work space, architects select and/or generate a set of housing units according to the project specifications: floor plan dimensions and area, and access type. These specifications have been previously set in the Project Development workspace by the different agents (developers, architects and building managers) involved in the project.

Housing Layout encompasses two environments (Figure 1): Housing Generation and Housing Selection. In the former, the floor plan layouts are created through a generative process. The spatial structure of a housing unit is represented by a graph. The nodes of the graph are cell spaces and the edges represent the connections between them. To minimize the computational cost of generating a layout by searching through all connections in the graph, a constraint list has been implemented. For example, the constraint list contains the information about the sizes and proportions of the spaces or the type of entrance to the dwelling (by staircase or walkway). The constraint list holds information about the graph but is kept separate from it. In this way, the information contained in the list can easily be edited so that the user can guide the generative process of the layout.

At the end of the generative process, the designs are stored in the Housing Repository (Figure 1). The search on the previously generated designs takes place in the Housing Selection environment. At the start, a clustering process is launched to classify the designs. The floor plans generated subsequently are then classified using the current cluster configuration. These clusters help the architect user to identify the housing design he or she needs for a particular housing project. Also, the groups previously created by other users can provide insights that facilitate the search. Furthermore, by adding tags to the discovered designs, users participate in creating a metadata layer, thus bringing their individual knowledge into the system. These tags can also be used in future searches by other users.
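The graph representation and the separate constraint list described in this section can be illustrated with a small sketch. It is not the project's implementation: the class names, the fields of `Constraints` (`min_area`, `entrance_type`, `entrance_cells`) and the function `candidate_cells` are hypothetical, chosen only to show how an editable constraint list can prune the connections that a generative search has to explore.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LayoutGraph:
    """Spatial structure of a housing unit: nodes are cell spaces, edges are connections."""
    cells: set[str] = field(default_factory=set)
    connections: set[frozenset[str]] = field(default_factory=set)

    def connect(self, a: str, b: str) -> None:
        self.cells.update((a, b))
        self.connections.add(frozenset((a, b)))

    def neighbors(self, cell: str) -> set[str]:
        return {other for edge in self.connections if cell in edge
                for other in edge if other != cell}


@dataclass
class Constraints:
    """Editable constraint list, stored apart from the graph (all fields are hypothetical)."""
    min_area: dict[str, float]      # minimum area per cell, in square meters
    entrance_type: str              # "staircase" or "walkway"
    entrance_cells: dict[str, str]  # entrance type -> cell where the dwelling is entered


def entry_cell(constraints: Constraints) -> str:
    """Cell from which the generation starts, given the chosen entrance type."""
    return constraints.entrance_cells[constraints.entrance_type]


def candidate_cells(graph: LayoutGraph, constraints: Constraints,
                    selected: set[str], areas: dict[str, float]) -> set[str]:
    """Cells adjacent to the current selection that satisfy the area constraints."""
    frontier: set[str] = set()
    for cell in selected:
        frontier |= graph.neighbors(cell)
    frontier -= selected
    return {c for c in frontier if areas[c] >= constraints.min_area.get(c, 0.0)}


# Illustrative data only; these values do not come from the paper.
graph = LayoutGraph()
graph.connect("hall", "living")
graph.connect("hall", "bedroom")
graph.connect("living", "kitchen")
constraints = Constraints(
    min_area={"bedroom": 8.0, "living": 14.0},
    entrance_type="staircase",
    entrance_cells={"staircase": "hall", "walkway": "living"},
)
areas = {"hall": 4.0, "living": 16.0, "bedroom": 7.0, "kitchen": 6.0}
start = entry_cell(constraints)
print(sorted(candidate_cells(graph, constraints, {start}, areas)))  # ['living']
```

Because the constraints live outside the graph, changing `entrance_type` or a minimum area alters which connections are explored without touching the graph itself.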

Fig. 1. Structure of the system and workspaces relations

The Housing Configuration workspace is enabled when the architect has created a collection of floor plans. Later on, the future resident contributes to the design process by describing the characteristics of the dwelling through three interactive interfaces. Tenants choose their living units from among the collection previously selected by the architect. Afterwards, the tenants can customize their dwelling with the help of the architect interacting with a three-dimensional model. Later on, once the building has been constructed, residents can still have access to the 3-D model and add tags to describe their experience after living in the unit. These tags can inform the search process of architect users in later projects.

Housing Selection Workspace

In the Housing Selection workspace, the architect user can search and collect designs through a discovery process. This environment provides a variety of tools (attribute search, cluster navigation, similarity search, social search and group navigation) to assist the user in this process (Figure 2).

In the attribute search, the user creates a query using the attributes of the housing designs, giving a specific value for each attribute (e.g., area, room type). The user can also specify the weight of the attributes. The output of the query is a list of floor plans ordered by their degree of relevance. In the cluster navigation, the user selects a cluster and the system returns the floor plans that fall within it. Additionally, the user can use the attribute search to delimit the scope of the exploration [13]. The clusters are created by the clustering system, which will be described later. With the similarity search, a collection of designs is retrieved which are similar to a floor plan provided by the user. This type of search makes use of the clusters. In the social search, the user can add tags to the previously described search types. These tags are part of a metadata layer between the users and the repository and are created by the users themselves. In group navigation, the groups created by the user are stored in the system database and can be accessed later by other users, who can use them in their own searches. These groups can be considered ‘custom’ clusters.

Fig. 2. Selection Housing Units structure

The Housing Selection workspace is composed of the Information Retrieval System and the Clustering System (Figure 2). The Information Retrieval System (IRS) is a search engine based on an enhanced vector model adopted from information retrieval science [14]. The IRS parses the user queries, retrieves the elements from the Housing Repository and sorts the outputs according to their relevance. The query is defined as a list of floor plan attributes. These attributes can be architectural (Table 1), tags, or associations of clusters and user groups. In this way, the IRS can meet the requirements of the different search tools described above. To perform a query, the IRS needs a list with the weights of each attribute.

Then, the elements having at least one attribute of the query are retrieved from the repository and the IRS calculates their degree of relevance (Figure 3). Finally, the outputs are ordered, placing the elements with a high relevance value at the top of the list. Moreover, the IRS makes use of a cache memory to speed up the response time.

Fig. 3. Equation for calculating the degree of relevance
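As an illustration of how a weighted attribute query can be ranked, the following sketch scores each layout by the share of the query weight carried by the attributes it matches. This is only an assumed, simplified stand-in for the relevance equation of Figure 3, which is not reproduced here; the function names `relevance` and `search`, the exact-match rule and the sample data are all hypothetical.

```python
def relevance(layout: dict, query: dict, weights: dict) -> float:
    """Assumed score: weight of the matched attributes divided by the total query weight."""
    total = sum(weights[name] for name in query)
    matched = sum(weights[name] for name, value in query.items()
                  if layout.get(name) == value)
    return matched / total if total else 0.0


def search(repository: list, query: dict, weights: dict) -> list:
    """Keep layouts with a non-zero score and sort them by decreasing relevance."""
    scored = [(relevance(layout, query, weights), layout) for layout in repository]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits


# Illustrative repository; the attribute values are invented for the example.
repository = [
    {"id": "A", "rooms": 2, "toilets": 1, "balcony": False},
    {"id": "B", "rooms": 3, "toilets": 1, "balcony": True},
    {"id": "C", "rooms": 4, "toilets": 2, "balcony": False},
    {"id": "D", "rooms": 2, "toilets": 1, "balcony": True},
]
query = {"rooms": 2, "toilets": 1, "balcony": True}
weights = {"rooms": 2.0, "toilets": 1.0, "balcony": 1.0}

results = search(repository, query, weights)
print([(layout["id"], score) for score, layout in results])
# [('D', 1.0), ('A', 0.75), ('B', 0.5)]
```

Layout C matches none of the queried attributes and is therefore not retrieved, which mirrors the rule that only elements sharing at least one attribute with the query are returned.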

The Clustering System (CS) is responsible for clustering the Housing Repository. As the architect user generates new floor plan layouts, the CS assigns a cluster to them. Should the new layouts not fit the existing cluster configuration, the CS will cluster the entire repository content from scratch. The CS implements two clustering methods – Self-Organizing Maps and k-means – and it can run them with different configurations. In the case of Self-Organizing Maps, it sets up the grid dimension and the learning coefficient. The configuration parameters of the k-means method are the number of clusters, the distance function and the cluster initialization method. The Housing Repository is organized as a multilayer structure, Figure 4. At the bottom layer are the housing floor plans. On top of them are the architectural attributes that describe the floor plans which are extracted by the generative process. The cluster data is created by the Clustering System accessing the architectural attribute layer. The cluster layer is formed by links to the housing floor plan layer. The tag layer is composed of metadata generated by users and includes links to the other layers.

Fig. 4. Housing Repository data structure
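The behavior of the Clustering System when new layouts arrive can be sketched as follows. The text does not specify how the system decides that a layout does not fit the existing cluster configuration, so the distance threshold `max_distance` used here is purely an assumption, as are the function names and the sample attribute vectors.

```python
import math


def nearest_cluster(sample, centroids):
    """Index of the closest centroid and the Euclidean distance to it."""
    distances = [math.dist(sample, c) for c in centroids]
    best = min(range(len(centroids)), key=distances.__getitem__)
    return best, distances[best]


def assign_or_flag(new_samples, centroids, max_distance):
    """Assign new samples to existing clusters; flag a full re-clustering if any is too far.

    The threshold test is an assumed criterion, not the one used in the project.
    """
    labels = []
    needs_recluster = False
    for sample in new_samples:
        label, distance = nearest_cluster(sample, centroids)
        labels.append(label)
        if distance > max_distance:
            needs_recluster = True
    return labels, needs_recluster


# Invented, normalized two-attribute vectors for illustration.
centroids = [(0.2, 0.2), (0.8, 0.8)]
labels, rebuild = assign_or_flag([(0.25, 0.1), (0.9, 0.7)], centroids, max_distance=0.3)
print(labels, rebuild)  # [0, 1] False

labels_far, rebuild_far = assign_or_flag([(0.5, 0.0)], centroids, max_distance=0.3)
print(labels_far, rebuild_far)  # [0] True
```

When the flag is raised, the caller would run the chosen clustering method again over the whole repository, as the text describes.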

The interface for architect users is depicted in Figure 5. Using the previously described search tools, the user can search for housing layouts with, for example, an area of 70 square meters, two bedrooms and one bathroom. These attribute values and tags are entered in the lower gray window. The output of the query is shown in the large window. The most relevant layouts are shown in the top left corner, with the less relevant ones towards the opposite corner.

Fig. 5. Architect user interface to select housing units

When the relevance is far below the maximum value, the figure is dimmed. In this way, the users can see at a glance the housing layouts which best meet their requirements. On the top right, the collections that are being created are shown. Directly underneath, the clusters are displayed. At the very bottom, the collections previously created by other users are listed; these can also be used for navigation.

Clustering Housing Layouts

Automatic classification systems can be implemented using either supervised or unsupervised methods. In our project, we applied unsupervised clustering methods because the number and types of samples to be clustered increase over time, as more layouts are generated and stored in the repository. However, there is no guarantee that the optimum results will be achieved with such methods [15]. Therefore, the project aims to apply known clustering methods to our data and compare their outputs. At the time of writing, the housing generation processes have created over 10,000 layouts. There are sixteen characteristics that describe a layout in terms of space, circulation and services (Table 1).

Table 1 Attributes that characterize a housing layout

Attribute name                    Description
Surface                           Surface of the housing unit in square meters
Rooms                             Number of rooms
Private rooms                     Number of private rooms
Public rooms                      Number of public rooms
Water-closet                      Number of water-closets
Toilets                           Number of toilets
Water-closet at the center        Yes/No
Balcony                           Yes/No
Room extensions                   Yes/No
Building depth                    Depth of the building
Entrance type                     Staircase, walkway
Wet spaces segregation            Distance value from toilets to other rooms
Rooms with water-closet           Yes/No
Circulation space                 Surface of the circulation space of the apartment
Exterior spaces                   Surface of exterior spaces (galleries, balconies)
Kitchen integrated in living room Yes/No

The attributes are either numeric or Boolean; most of them are numeric with a known range. The attributes listed in Table 1 are used in different ways. For instance, the clustering algorithms use them as dimensions of the input elements, and the architect user uses them to perform queries. We have opted for Self-Organizing Maps and k-means because they are two well-established techniques that are unsupervised, easy to understand and implement, have a linear time complexity, and are successfully used to cluster large amounts of data [16]. To compare them, we have used the following quality measures:

• Q: This takes into account the intra-cluster and inter-cluster distances [15]. The weight of a cluster (e.g., its size) is also used in the measurement. In Figure 6, the inter-cluster distance is the minimum distance between the samples in the cluster and all the other samples in the remaining clusters. The intra-cluster distance is computed from the value of the attribute i of each sample k, the value of the attribute i of the centroid, and the number of elements of the cluster. The weight indicator is proportional to the cluster size.

Fig. 6. Equations of the Q quality measurement

• Qn: This measurement indicates how close a sample is to its cluster with regard to the other clusters. In other words, it indicates the cohesion of the clusters. The equation in Figure 7 uses the distance of the element i to its cluster and the distance to the closest remaining cluster.

Fig. 7. Equation of the Qn quality measurement

• Qnn: This measurement indicates the cohesion of the samples. The equation is similar to the previous one, but it uses the distance between samples instead of distances between clusters. The variable d is the distance of the sample i to the closest sample from the same cluster, and r is the distance to the closest sample from a different cluster. To make it less sensitive to noise, a variant of this measure uses the mean of several samples instead of only one.

Fig. 8. Equation of the Qnn quality measure
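To make the three quality measures more concrete, the sketch below computes one plausible version of each from the verbal definitions above. The exact equations of Figures 6 to 8 are not reproduced here, so every formula in the code is an assumption: Q is taken as the size-weighted sum of the ratio between intra-cluster and inter-cluster distance (lower is better), while Qn and Qnn are taken as the mean of (r - d) / max(d, r), which approaches 1 when samples are much closer to their own cluster than to any other. Euclidean distance and the function names are also assumptions. The code expects at least two non-empty clusters that share no sample.

```python
import math


def centroid(samples):
    """Attribute-wise mean of a list of samples."""
    n = len(samples)
    return tuple(sum(column) / n for column in zip(*samples))


def contrast(d, r):
    """Assumed form (r - d) / max(d, r); returns 0.0 when both distances are zero."""
    largest = max(d, r)
    return (r - d) / largest if largest > 0 else 0.0


def q_measure(clusters):
    """Assumed Q: sum over clusters of weight * intra-cluster / inter-cluster distance."""
    total = sum(len(cluster) for cluster in clusters)
    score = 0.0
    for j, cluster in enumerate(clusters):
        center = centroid(cluster)
        # Intra-cluster distance: mean distance of the samples to their centroid.
        intra = sum(math.dist(x, center) for x in cluster) / len(cluster)
        # Inter-cluster distance: minimum distance to any sample of another cluster.
        inter = min(
            math.dist(x, y)
            for x in cluster
            for k, other in enumerate(clusters) if k != j
            for y in other
        )
        score += (len(cluster) / total) * intra / inter
    return score


def qn_measure(clusters):
    """Assumed Qn: d and r are distances to the own and the closest other centroid."""
    centers = [centroid(cluster) for cluster in clusters]
    values = []
    for j, cluster in enumerate(clusters):
        for x in cluster:
            d = math.dist(x, centers[j])
            r = min(math.dist(x, c) for k, c in enumerate(centers) if k != j)
            values.append(contrast(d, r))
    return sum(values) / len(values)


def qnn_measure(clusters):
    """Assumed Qnn: d and r are nearest-sample distances inside and outside the cluster."""
    values = []
    for j, cluster in enumerate(clusters):
        for i, x in enumerate(cluster):
            same = [math.dist(x, y) for m, y in enumerate(cluster) if m != i]
            if not same:
                continue  # a single-sample cluster has no same-cluster neighbor
            d = min(same)
            r = min(math.dist(x, y)
                    for k, other in enumerate(clusters) if k != j
                    for y in other)
            values.append(contrast(d, r))
    return sum(values) / len(values) if values else 0.0


# Two invented, well-separated clusters in a two-attribute space.
clusters = [[(0.0, 0.0), (0.0, 1.0)], [(4.0, 0.0), (4.0, 1.0)]]
print(q_measure(clusters))    # 0.125
print(qnn_measure(clusters))  # 0.75
print(round(qn_measure(clusters), 3))
```

The noise-tolerant variant of Qnn mentioned in the text would replace the two `min` calls with the mean of the few smallest distances.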

Clustering Algorithms

Self-Organizing Maps (SOM) is a classification technique based on a type of unsupervised, competitive neural network with a regular distribution grid that can discover the underlying relationships between data. This technique aims to reduce the number of data dimensions using neural networks [17]. One of its most remarkable advantages is that it can show the quality of the results graphically. Another significant characteristic is that it can show data similarities. We have implemented the original algorithm, which has a lower computational load [7]. We have used a Gaussian neighborhood function in a rectangular grid with variable dimensions. The number of executions is set at twice the number of input samples. The learning rate has been set to be sensitive enough to get a maximum number of clusters with the largest cohesion possible. The visualizing power of this technique is not relevant to our case. We have implemented an automatic process to label the nodes and merge the similar ones based on the U-Matrix [18], without information on the input classes. This iterative process follows these steps:

1. First, the process selects a neuron with the minimum similarity value.
2. If there are input samples for which this neuron is the winner, then it is labeled.
3. Afterwards, it searches for other neurons with the same similarity value within a small threshold; these neurons are labeled like the first one.
4. Go to step 1 if there are unlabeled neurons.

The k-means technique is a type of cluster analysis whose goal is to minimize the quadratic error function [19]. The inputs of the algorithm are the samples to be clustered and the number of partitions. It is important to choose the number of partitions, the centroid initialization method and the distance function carefully. We have taken the number of clusters from the SOM results as input, and we have tested four centroid initialization methods (random values, random domain values, random sample and D² weighting) and three distance functions: Euclidean, Manhattan and Hamming.

The D² weighting method [20] uses probabilities to improve the original k-means. It is an iterative process, where the first centroid is chosen at random from the list of input samples. Then, each subsequent centroid is selected from the samples with a probability proportional to the squared distance between the sample and its closest previously chosen centroid, so that samples far from the existing centroids are favored.
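The D² weighting initialization followed by the standard k-means iteration can be sketched as below. This is a minimal illustration with the Euclidean distance only, not the implementation evaluated in the paper; the function names, the `max_iter` limit and the sample data are assumptions made for the example.

```python
import math
import random


def d2_seeding(samples, k, rng):
    """Choose k initial centroids; each new one is drawn with probability proportional to D(x)^2."""
    centroids = [rng.choice(samples)]
    while len(centroids) < k:
        # Squared distance of every sample to its closest already chosen centroid.
        weights = [min(math.dist(x, c) for c in centroids) ** 2 for x in samples]
        if sum(weights) == 0:
            # All samples coincide with a centroid: fall back to a uniform draw.
            centroids.append(rng.choice(samples))
        else:
            centroids.append(rng.choices(samples, weights=weights, k=1)[0])
    return centroids


def kmeans(samples, k, rng, max_iter=100):
    """Alternate assignment and centroid update until the labels stop changing."""
    centroids = d2_seeding(samples, k, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(x, centroids[j])) for x in samples
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [x for x, label in zip(samples, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Two invented groups of points, far apart from each other.
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(samples, k=2, rng=random.Random(0))
print(labels)
```

Swapping `math.dist` for a Manhattan or Hamming distance function would give the other two variants mentioned in the text, although the mean-based centroid update is strictly matched only to the squared Euclidean error.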

Results

We have executed the algorithms 200 times with different configurations and compared the results. Both techniques use the architectural attributes of the floor plan layouts. We have tested three different k-means configurations and one SOM configuration. The statistics in Table 2 show that the different methods behave similarly. Taking into account the standard deviation, we can see that the SOM method is more stable across executions than the others.

Table 2 Statistics generated after 200 executions

Method    Configuration       Q               Qn              Qnn
K-MEANS   Existing elements   1.112 ± 0.387   0.530 ± 0.020   0.997 ± 0.0006
K-MEANS   Domain values       1.454 ± 1.076   0.534 ± 0.038   0.998 ± 0.0004
K-MEANS   D² weighting        1.230 ± 0.914   0.532 ± 0.031   0.997 ± 0.0006
SOM       20x20               1.114 ± 0.114   0.520 ± 0.013   0.996 ± 0.0004

In order to test the significance of the measurements, we have plotted the quartiles of the results of the different methods. The quality measurement Q relates the cohesion of the samples and the spread between clusters: the lower the value, the better the results. Figure 9a shows that there are no significant differences between the results obtained by each method. Therefore, this measurement cannot be used to rank the methods. The Qn measurement determines the unity of the input samples with their clusters; in this case, the closer the value is to 1, the better. The k-means method with domain values has the best result, but Figure 9b shows there are no significant differences between methods. Finally, for the measurement Qnn, which evaluates the cohesion of the samples, the behavior of all methods is significantly different, so we can conclude that for this measurement the k-means with domain values is the best option.

(a) Quartile plot for the Q measure

(b) Quartile plot for the Qn measure

(c) Quartile plot for the Qnn measure

Fig. 9. Quartile plot for the different quality measurements
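The kind of summary reported in Table 2 and plotted in Figure 9 can be obtained from the scores of repeated executions as sketched below. The function names are hypothetical, the input lists are placeholders rather than results from the experiments, and the overlap check on the interquartile ranges is only an informal heuristic assumed for illustration, not the significance criterion used by the authors.

```python
import statistics


def summarize(scores):
    """Mean, sample standard deviation and quartiles of the scores of repeated runs."""
    q1, median, q3 = statistics.quantiles(scores, n=4)
    return {
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores),
        "q1": q1,
        "median": median,
        "q3": q3,
    }


def iqr_overlap(a, b):
    """True when the interquartile ranges of two summaries overlap (informal heuristic)."""
    return a["q1"] <= b["q3"] and b["q1"] <= a["q3"]


# Placeholder scores, not measurements from the paper.
method_a = summarize([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
method_b = summarize([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0])

print(method_a["q1"], method_a["median"], method_a["q3"])  # 2.0 4.0 6.0
print(iqr_overlap(method_a, method_b))                     # False
```

When the boxes of two methods overlap for a given measure, as in Figures 9a and 9b, that measure gives little basis for preferring one method over the other.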

Having assessed the results, it is difficult to choose the best method from this data because they all perform similarly, and some quality measurements cannot be used to compare them. Because of this, we have turned to the best execution of all the methods to cluster the floor plans. The number of clusters is high enough to express all of the variations. With the help of the Housing Selection interface (Figure 5), we have manually checked the quality of the clusters. As seen in Figure 10, the final clusters have a high degree of cohesion and are homogeneous.

Fig. 10. Clusters generated with a SOM algorithm

Conclusions

We have successfully integrated a rule-based generative process which generates housing floor plans with an information retrieval system which returns a series of plans with the help of clustering techniques. The application of the search tools has demonstrated that they provide proper housing units, thus facilitating the navigation through the repository. With the data we have used, there have been no substantial differences in the performance of the two clustering techniques. Further developments of the visualization capacities inherent to SOM techniques and the interfaces that support them would enhance the cognitive potential of clustering techniques.

Acknowledgements

This research project was carried out with the support of grant BIA200508707-C02-01 from the Spanish National RDI Programme, 2005-2008. We would like to thank Francesc Teixidó, professor in the Computer Science department at Enginyeria La Salle, for his advice and support in interpreting the results.

References

1. Chien, S.F., Shih, S.G.: A Web Environment to Support User Participation in the Development of Apartment Buildings. In: Special Focus Symposium on WWW as the Framework for Collaboration, InterSymp., Baden-Baden, Germany, pp. 225–231 (2000)
2. Gerzso, J.M.: Automatic generation of layouts of an Utzon housing system via the Internet. Reinventing the Discourse - How Digital Tools Help Bridge and Transform Research, Education and Practice in Architecture. In: 21st Annual Conference of the ACADIA, Buffalo, New York, pp. 202–211 (2001)
3. Huang, J.C., Krawczyk, R.: A Choice Model of Consumer Participatory Design for Modular Houses. In: 25th International Conference Aided Architectural Design in Europe, Germany, pp. 679–686 (2007)
4. Madrazo, L., Sicilia, A., González, M., Martin, A.: Integrating floor plan layout generation processes within an open and collaborative system to design and build customized housing. In: Tidafi, T., Dorta, T. (eds.) Joining Languages, Cultures and Visions: CAADFutures, pp. 656–670 (2009)
5. Deng, Q.: Combining Self-Organizing Map and K-Means Clustering for Detecting Fraudulent Financial Statements. In: IEEE International Conference on Granular Computing, GRC 2009, pp. 126–131 (2009)
6. Chen, Y., Zhang, Y., Hu, J., Yao, D.: Pattern Discovering of Regional Traffic Status with Self-Organizing Maps. In: Intelligent Transportation Systems Conference, ITSC 2006, pp. 647–652. IEEE, Los Alamitos (2006)
7. Kohonen, T.: Self organization of a massive document collection. IEEE Transactions on Neural Networks 11(3), 574–585 (2000)
8. Zhong, W.: Improved K-Means Clustering Algorithm for Exploring Local Protein Sequence Motifs Representing Common Structural Property. IEEE Transactions on NanoBioscience 4(3), 255–265 (2005)
9. Lin, C., Chiu, M.: Smart Semantic Query of Design Information in a Case Library. Digital Design: Research and Practice. In: 10th International Conference on CAADFutures, pp. 125–135 (2003)
10. Inanc, B.S.: Casebook. An Information Retrieval System for Housing Floor Plans. In: CAADRIA 2000, 5th Conference on Computer Aided Architectural Design Research in Asia, Singapore, pp. 389–398 (2000)
11. Lim, S., Prats, M., Chase, S., Garner, S.: Categorisation of Designs According to Preference Values for Shape Rules. In: Gero, J.S., Goel, A.K. (eds.) Design Computing and Cognition, pp. 41–60. Springer, Heidelberg (2008)
12. Steadman, J.P.: Architectural Morphology. Pion Limited, London (1983)
13. Quintarelli, E.: Facetag: Integrating Bottom-up and Top-down Classification in a Social Tagging System. Las Vegas IA Summit (2007)
14. Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. ACM Press, Addison-Wesley, New York (1999)
15. Savaresi, S.: Cluster selection in divisive clustering algorithms. In: 2nd SIAM ICDM, Arlington, VA, USA, pp. 299–314 (2002)
16. Jain, A.K.: Data clustering: A Review. ACM Computing Surveys 31(3) (1999)
17. Kohonen, T.: Self-Organizing Maps. Springer, New York (1995)
18. Ong, J.: Data Mining Using Self-Organizing Kohonen maps: A Technique for Effective Data Clustering & Visualization. In: International Conference on Artificial Intelligence (IC-AI), Las Vegas (1999)
19. MacQueen, J.B.: Some Methods for classification and Analysis of Multivariate Observations. In: 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297. University of California Press, Berkeley (1967)
20. Arthur, D., Vassilvitskii, S.: K-Means++: The advantages of careful seeding. In: Bansal, N., Pruhs, K., Stein, C. (eds.) 18th Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, Louisiana, pp. 1027–1035 (2007)
21. Singhal, A.: Modern Information Retrieval: A Brief Overview. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering 24(4), 35–43 (2001)
22. Aghagolzadeh, M.: Finding the number of clusters in a dataset using information theoretic hierarchical algorithm. Electronics, Circuits and Systems. In: ICECS. 13th IEEE International Conference, Nice, France, pp. 1336–1339 (2006)
23. Michalski, R., Stepp, R.: Learning from observation: Conceptual clustering. Machine Learning: An Artificial Intelligence Approach, pp. 471–498. Morgan Kaufmann, Los Altos (1986)
24. Baçao, F., Lobo, V., Painho, M.: Self-organizing Maps as Substitutes for K-Means Clustering. In: 5th International Conference Computational Science ICCS, Atlanta, GA, USA (2005)
25. Nguyen, Q.H., Rayward-Smith, V.J.: Internal quality measures for clustering in metric spaces. Int. J. Business Intelligence and Data Mining 3(1), 4–29 (2008)