The Dawn of the Exascale Age: Using Integrated HPC and Connected Data Dr Ben Evans
nci.org.au @NCInews
What is Exascale – more than an ExaFLOP (10^18 operations per second)
Exascale practically means addressing multiscale science problems at a scale 1000 times greater than is achievable on current petaflop systems.
The US National Strategic Computing Initiative (NSCI) (July 29, 2015) aims to maximise the benefits of HPC for US economic competitiveness and scientific discovery. https://www.whitehouse.gov/the-press-office/2015/07/29/executive-order-creating-national-strategic-computing-initiative
The NSCI is a whole-of-government effort designed to create a cohesive, multiagency strategic vision and Federal investment strategy, executed in collaboration with industry and academia, to maximize the benefits of HPC for the United States. There are three lead agencies for the NSCI: the Department of Energy (DOE), the Department of Defense (DOD), and the National Science Foundation (NSF).
– The DOE Office of Science and DOE National Nuclear Security Administration will execute a joint program focused on advanced simulation through a capable exascale computing program emphasizing sustained performance on relevant applications and analytic computing to support their missions.
– NSF will play a central role in scientific discovery advances, the broader HPC ecosystem for scientific discovery, and workforce development.
– DOD will focus on data analytic computing to support its mission.
© National Computational Infrastructure 2016
Ben Evans, eResearch Conference, Oct 2016
nci.org.au
Principles of NSCI
Coordinated Federal strategy guided by four principles:
1. The United States must deploy and apply new HPC technologies broadly for economic competitiveness and scientific discovery.
2. The United States must foster public-private collaboration, relying on the respective strengths of government, industry, and academia to maximize the benefits of HPC.
3. The United States must adopt a whole-of-government approach that draws upon the strengths of and seeks cooperation among all executive departments and agencies with significant expertise or equities in HPC, while also collaborating with industry and academia.
4. The United States must develop a comprehensive technical and scientific approach to transition HPC research on hardware, system software, development tools, and applications efficiently into development and, ultimately, operations.
Directed to capable scientific computing – science and mission applications, scalable software stack and data software, integrated engineering for supercomputer systems.
Engagement with Societal Application Agencies
Deployment Agencies. There are five deployment agencies for the NSCI:
• the National Aeronautics and Space Administration (NASA),
• the Federal Bureau of Investigation (FBI),
• the National Institutes of Health (NIH),
• the Department of Homeland Security (DHS), and
• the National Oceanic and Atmospheric Administration (NOAA).
– Agencies participate in the co-design process to integrate the special requirements of their respective missions and influence the early stages of design of new HPC systems, software, and applications. – Agencies will also have the opportunity to participate in testing, supporting workforce development activities, and ensuring effective deployment within their mission contexts.
NCI High Performance Scaling activities 2014–16, supported by Fujitsu
Objectives:
• Upscale and increase performance of high-profile community codes – especially for the Government and research communities
• Year 1
  • Characterise and scale critical BoM weather and climate operational applications for higher resolution
  • Best-practice configuration for improved throughput
  • Establish analysis toolsets and methodology
• Year 2
  • Characterise, optimise and tune next-generation high-priority applications
  • Select high-priority earth systems and geophysics codes for scalability
  • Parallel algorithm review and I/O optimisation methods for next-gen scaling
• Year 3
  • Assess codes for scalability; encourage adoption from “best in discipline”
  • New software and hardware technologies to better assess performance & energy efficiency
Exascale Earth Systems Science Research and Societal Impacts
Examples:
• Modelling Extreme and High Impact events – BoM
• NWP, Climate Coupled Systems and Data Assimilation – BoM, CSIRO, universities
• Hazards – Geoscience Australia, BoM, States
• Geophysics – Geoscience Australia, universities
• Monitoring the Environment and Ocean – ANU, BoM, CSIRO, GA, IMOS, TERN, States
• International research – universities

Tropical Cyclones
Cyclone Winston 20-21 Feb, 2016
Volcanic Ash
Manam Eruption 31 July, 2015
Bush Fires
Wye Valley and Lorne Fires 25-31 Dec, 2015
Flooding
St George, QLD February, 2011
ACCESS – Numerical Weather Prediction (NWP)
A prime requirement for the Bureau of Meteorology is to provide vastly improved weather prediction, including better resolution and handling of severe events:
• High resolution: 1–1.5 km
• Data assimilation
• Rapid update cycles
• High resolution ensembles

Model      | APS-2 (Op: 2016) | APS-3 (Op: 2017–2018) | APS-4 (Op: 2019–2020)
ACCESS-G   | 25 km {4dV}      | 12 km {4dVH}          | 12 km {4dVH/En}
ACCESS-R   | 12 km {4dV}      | 8 km {4dVH}           | 4.5 km {4dVH/En}
ACCESS-TC  | 12 km {4dV}      | 4.5 km {4dVH}         | 4.5 km {4dVH}
ACCESS-GE  | 60 km (lim)      | 30 km                 | 30 km
ACCESS-C   | 1.5 km {FC}      | 1.5 km {4dVH}         | 1.5 km {4dVH/En}
ACCESS-CE  | –                | 2.2 km (lim)          | 1.5 km
ACCESS-X   | –                | 1.5 km {4dVH}         | 1.5 km {4dVH/En}
ACCESS-XE  | –                | –                     | 1.5 km
International UK Unified Model Development Partnership
Shared science, model evaluation and technical development: • Joint process evaluation groups • Technical infrastructure teams • User workshops & tutorials
A foundation for relationships with other organisations: • Science & model development • Weather & climate services • Jointly growing with businesses
Operational users complemented by: • research partners in national / international universities & organisations • capacity building consultancy projects with other partners
Fully coupled Earth System Model
Core Model
• Atmosphere – UM 10.4
• Ocean – MOM 5.1
• Sea-Ice – CICE5
• Coupler – OASIS-MCT

Carbon cycle (ACCESS-ESM1)
• Terrestrial – CABLE
• Bio-geochemical
• Couple to modified ACCESS1.3

Aerosols
• UKCA
• Couple to ACCESS-CM2

[Schematic: Atmosphere, Atmospheric chemistry, Terrestrial, Carbon, and Ocean and sea-ice components linked via the Coupler]
IPCC Climate Reports: CMIP1 through to CMIP5 Data Volumes
Global infrastructure supporting reproducible scientific analysis Earth System Grid Federation: Exemplar of an International Collaboratory for large scientific data and analysis
What is the difference between 0.25 and 0.1 degree?
We now need to move to 0.03 degree and more coupled systems
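As a rough back-of-envelope illustration of why the move from 0.25 to 0.1 to 0.03 degree is so demanding (a sketch, not the actual model-grid arithmetic; real ocean models use tripolar, irregular grids and many vertical levels):

```python
# Rough cost scaling for a global lat-lon grid (illustrative only; the
# horizontal cell count grows with the square of the refinement factor,
# and a CFL-limited timestep adds roughly another factor of resolution).
def horizontal_cells(res_deg):
    """Number of horizontal grid cells for a global lat-lon grid."""
    return round(360 / res_deg) * round(180 / res_deg)

for res in (0.25, 0.1, 0.03):
    cells = horizontal_cells(res)
    relative_cost = (0.25 / res) ** 3   # crude cost ~ resolution^-3
    print(f"{res:>5} deg: {cells:>12,d} cells, ~{relative_cost:,.0f}x the 0.25-deg cost")
```

Even this crude estimate shows the jump to 0.03 degree costing hundreds of times more than today's 0.25 degree runs, before any extra coupling is added.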
The Queensland storm surge – sea surface height modelled using ROMS.
Himawari-8 Observations, Data Assimilation and Analysis
Captured at JMA, processed after acquisition at BoM, and made available at NCI. Data products are still to be generated; the first stage was to make the image data available.
Earth Observation Time Series Analysis
• Over 300,000 Landsat scenes (spatial/temporal) allowing flexible, efficient, large-scale in-situ analysis
• Spatially-regular, time-stamped, band-aggregated tiles presented as temporal stacks

Continental-Scale Water Observations from Space (WOfS water detection)
• 27 years of data from LS5 & LS7 (1987–2014)
• 25 m nominal pixel resolution
• Approx. 300,000 individual source ARG-25 scenes in approx. 20,000 passes
• Entire 27 years of 1,312,087 ARG25 tiles => 93×10^12 pixels visited
• 0.75 PB of data
• 3 hrs at NCI (elapsed time) to compute
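The per-pixel temporal analysis described above can be sketched in a few lines of numpy. This is a toy illustration of the pattern only: the random "water mask" stack and the 0.5 threshold are invented here, not the actual WOfS classifier or data.

```python
import numpy as np

# Hypothetical temporal stack of water masks, shaped (time, y, x):
# one boolean layer per observation, True where water was detected.
rng = np.random.default_rng(0)
stack = rng.random((100, 400, 400)) < 0.1   # 100 time steps, one 400x400 tile

# Per-pixel summary over the whole time series: the fraction of
# observations in which each pixel was classified as water.
water_frequency = stack.mean(axis=0)

# Pixels wet in more than half of all observations - a crude
# "persistent water" map for this tile (threshold is illustrative).
persistent_water = water_frequency > 0.5
print(water_frequency.shape, int(persistent_water.sum()))
```

At continental scale the same reduction is simply mapped over every tile in the stack, which is what makes the spatially-partitioned, temporally-stacked layout on the slide so effective.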
EU Copernicus Sentinel Earth Observation:
• Six families of satellites: Sentinels 1–6, progressively from 2014 – monitoring of land, ocean, vegetation, soil, altimetry, etc.
• Australia to provide the regional data access and analysis hub
• Consortium: GA, CSIRO, State Govt. agencies (WA, NSW, Qld)
Combining HPC & HPD: Prediction of hazards at local scales
• Modelling tropical cyclones to capture peak wind speed near the eye requires 1–2 km resolution calculations
• Impacts of hazard events vary at the 30 metre scale — landscape variations (topography/land cover)
• Risk analysis requires large ensembles (10^6) to be modelled and impacts analysed at landscape scales
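A minimal sketch of the ensemble-based risk estimate the last point describes. The wind-speed distribution, its parameters, and the damage threshold below are all invented for illustration; they are not BoM's or GA's actual hazard model.

```python
import numpy as np

# Hypothetical ensemble of simulated peak wind speeds (m/s) at one site.
rng = np.random.default_rng(42)
n_members = 1_000_000                     # large ensemble, as on the slide
peak_wind = rng.gumbel(loc=35.0, scale=8.0, size=n_members)

# Exceedance probability of a damaging threshold, estimated as the
# fraction of ensemble members above it.
threshold = 70.0                          # m/s, illustrative only
exceedance = float((peak_wind > threshold).mean())
print(f"P(peak wind > {threshold} m/s) ~ {exceedance:.2e}")
```

With 10^6 members even rare tail probabilities are resolved with useful precision, which is why risk analysis at landscape scale needs ensembles of this size.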
Emerging Petascale Geophysics codes

Assess priority Geophysics areas
- 3D/4D Geophysics: Magneto-tellurics, AEM
- Hydrology, Groundwater, Carbon Sequestration
- Forward and Inverse Seismic models and analysis (onshore and offshore)
- Natural Hazard and Risk models: Tsunami, Ash-cloud

Issues
- Data across domains, data resolution (points, lines, grids), data coverage
- Model maturity for running at scale
- Ensemble, Uncertainty analysis and Inferencing
Multi-Year Strategic Science and Services Plan – Total Water Prediction and National Water Centre
c/- David Maidment – CUAHSI and the National Flood Interoperability Experiment

FY 15–20 Core Capability: Centralized Water Forecasting
• National Water Model (NWM) operational May 2016
  – Water forecasts for 2.7 million stream reaches in U.S.
  – 100 million people get a terrestrial water forecast for first time
  – New service delivery model implemented – increased stakeholder engagement and integrated information
  – National Water Center (NWC) begins providing daily situational awareness and guidance to NWS field offices

FY 16–21 Key Enhancement: Flash Flood and Urban Hydrology
• Enhance NWM with nested hyperresolution zoom capability and urban hydrologic processes
  – Heightened focus on regions of interest (e.g. follow storms)
  – Street level flood inundation forecasts for selected urban demonstration areas
  – NWC increases guidance to NWS field offices to improve consistency and services for flash floods

FY 17–22 Major Integration: Coastal Total Water Level
• Couple NWM with marine models to predict combined storm surge, tide, and riverine effects
  – More complete picture of coastal storm impacts
  – Water prediction information linked to geospatial risk and vulnerability
  – NWC operations center opens and provides national decision support services and situational awareness

FY 18–23 Key Enhancement: Dry Side: Drought and Post-Fire
• Couple NWM with groundwater and transport models to predict low flows, drought and fire impacts
  – Add NWM processes that affect subsurface water movement and storage during dry conditions
  – Add NWM ability to track constituents (e.g. sediment, contaminants, nutrients) through stream network
  – New decision support services for water shortage situations and waterborne transport
  – NWC operations center expands to include drought and post-fire decision support services

FY 19–24 Major Integration: Water Quality
• Integrate enhanced NWM with key water quality data sets, models and tools to begin water quality prediction
  – Incorporate water quality data from federal and State partners into NWM
  – Link NWM output to NOAA ecological forecasting operations
  – New decision support services for predicting water quality issues such as Harmful Algal Blooms
  – New decision support services for emergencies such as chemical spills
  – NWC operations center expands to include water quality decision support services
Building Genomics data analysis and sharing platforms
The arrival of the “$1,000” genome
NCI National Reference Earth Systems Datasets
NCI Proposal to NCRIS RDSI (RDS) for a High Performance Data Node to:
• Enable dramatic increases in the scale and reach of Australian research by providing nationwide access to enabling data collections;
• Specialise in nationally significant research collections requiring high-performance computational and data-intensive capabilities for their use in effective research methods;
• Realise synergies with related national research infrastructure programs.

As a result, researchers will be able to:
• share, use and reuse significant collections of data that were previously either unavailable to them or difficult to access;
• access the data in a consistent manner which will support a general interface as well as discipline-specific access;
• use the consistent interface established/funded by this project for access to data collections at participating institutions and other locations, as well as data held at the Nodes.
NCI National Earth Systems Research Data Collections
1. Climate/ESS Model Assets and Data Products
2. Earth and Marine Observations and Data Products
3. Geoscience Collections
4. Terrestrial Ecosystems Collections
5. Water Management and Hydrology Collections
http://geonetwork.nci.org.au

Data Collections                                                  | Approx. Capacity
CMIP5, CORDEX, ACCESS Models                                      | 5 Pbytes
Satellite Earth Obs: LANDSAT, Himawari-8, Sentinel, MODIS, INSAR  | 2 Pbytes
Digital Elevation, Bathymetry, Onshore/Offshore Geophysics        | 1 Pbytes
Seasonal Climate                                                  | 700 Tbytes
Bureau of Meteorology Observations                                | 350 Tbytes
Bureau of Meteorology Ocean-Marine                                | 350 Tbytes
Terrestrial Ecosystem                                             | 290 Tbytes
Reanalysis products                                               | 100 Tbytes
NCI’s Australian Geophysics Data Collection

[Collection hierarchy: the Australian Geophysics Data Collection comprises National Grids (Gravity, Mag, Radiometrics) and Themes (Gravity, Mag, Radiometrics, AEM, Seismic, MT, Seismology); each Theme contains Surveys 1…n, with data held as Grids (at resolutions from 10 m to 80 m), Lines and Points]

See Lesley Wyborn’s talk on Thursday
Data Classified Based On Processing Levels

Level* | Proposed Name               | Description*
0      | Raw Data                    | Instrumental data as received from sensor. Includes any and all artefacts.
1      | Instrument Data             | Instrument data that have been converted to sensor units but are otherwise unprocessed. Data includes appended time and platform georeferencing parameters (e.g., satellite ephemeris).
2      | Calibrated Data             | Data that has undergone corrections or calibrations necessary to convert instrument data into geophysical values. Data includes calculated position.
3      | Gridded Data                | Data that has been gridded and undergone minor processing for completeness and consistency (i.e., replacing missing data).
4      | “Value-added” Data Products | Analytical (modelled) data such as those derived from the application of algorithms to multiple measurements or sensors.
5      | Model-derived Data Products | Data resulting from the simulation of physical processes and/or application of expert knowledge and interpretation.

*The level numbers and descriptions above follow definitions used in satellite data processing, as defined by NASA.
Enable global and continental scale as well as scale-down to local/catchment/plot scale

• NWP and Forecasts: UM, APS3 (Global, Regional, City), ACCESS-TC
• Coupled Seasonal and Decadal Climate: ACCESS-GC2/3 (GloSea5)
• Data Assimilation: 3D-VAR, 4D-VAR (Atmosphere), EnKF (Ocean)
• Ocean Forecasting and Research: OceanMaps, BlueLink, MOM5, CICE/SIS, WW3, ROMS
• Fully-Coupled Earth System Model: ACCESS-CM, ACCESS-ESM, CMIP5/6

• Water availability and usage over time
• Catchment zone
• Vegetation changes
• Data fusion with point-clouds and local or other measurements
• Statistical techniques on key variables
Transform data to become transdisciplinary and born-connected
• A call to action for a transdisciplinary approach starting at the conception of data collections
• Researchers across the science disciplines, the social sciences and those beyond academia need to work together to enable horizontal interoperability for high-end researchers, students and the general public
• Once that interoperability is achieved, information will be accessible to all sectors
National Earth Systems Research Data Interoperability Platform: a simplified view

[Diagram: the NERDIP Data Platform (server-side data functions and services) connects compute-intensive use via Virtual Laboratories, fast/deep data access, portal views, and machine-connected program access]
National Environmental Research Data Interoperability Platform (NERDIP)

Workflow Engines, Virtual Laboratories (VLs), Science Gateways:
Biodiversity & Climate Change VL, Climate & Weather Science Lab, eMAST Speddexes, eReefs, AGDC VL, All Sky Virtual Observatory, VGL, Globe Claritas, VHIRL, Open Nav Surface

Tools: Ferret, NCO, GDL, Fortran, C, C++, Python, R, Models, GDAL, GRASS, QGIS, MPI, OpenMP, MatLab, IDL; Visualisation: Drishti

Data Portals: ANDS/RDA Portal, AODN/IMOS Portal, TERN Portal, AuScope Portal, Data.gov.au, Digital Bathymetry & Elevation Portal

Services Layer: OPeNDAP, OGC W*TS, OGC SWE, OGC W*PS, OGC WCS, OGC WFS, OGC WMS, RDF/LD, CS-W; Vocab Service, PROV Service; fast “whole-of-library” catalogue; direct access
Data Conventions: netCDF-CF

API Layers: ISO 19115, ACDD, RIF-CS, DCAT, etc.; GDAL

HP Data Library Layer: NetCDF4 (Climate, Weather), NetCDF4 (Ocean Bathy), NetCDF4 (EO), ASDF HDF5, PH5 HDF5, HDF-EOS, [Airborne Geophysics], [SEG-Y], [FITS], [LAS LiDAR], HDF5, other legacy formats
Storage: Lustre, Object Storage
Enabling transparency, reproducibility, informatics & deep learning techniques
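The services layer above exposes data through standard OGC interfaces, so program access reduces to constructing well-formed requests. As an illustration (the endpoint and layer name below are hypothetical, not an actual NCI service URL), a standard WMS 1.3.0 GetMap request is just a set of query parameters:

```python
from urllib.parse import urlencode

# Hypothetical NERDIP-style endpoint; real service URLs differ.
base = "https://example.nci.org.au/thredds/wms/dataset"

# Standard OGC WMS 1.3.0 GetMap parameters. In WMS 1.3.0 with
# EPSG:4326, the bbox axis order is lat,lon (south,west,north,east).
params = {
    "service": "WMS",
    "version": "1.3.0",
    "request": "GetMap",
    "layers": "sea_surface_temperature",   # hypothetical layer name
    "crs": "EPSG:4326",
    "bbox": "-45,110,-10,155",             # roughly Australia
    "width": 800,
    "height": 600,
    "format": "image/png",
}
url = f"{base}?{urlencode(params)}"
print(url)
```

Because every portal, tool and VL speaks the same service vocabulary, the same request pattern works across the whole platform, which is what makes the machine-connected access on the previous slide practical.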
Climate and Weather Science Laboratory The CWSlab provides an integrated national facility for research in climate and weather simulation and analysis. • To reduce the technical barriers to using state of the art tools; • To facilitate the sharing of experiments, data and results; • To reduce the time to conduct scientific research studies; and • To elevate the collaboration and contributions to the development of the Australian Community Climate Earth-System Simulator (ACCESS)
[Components: ACCESS Modelling, Data Services, Computational Infrastructure, Climate Analysis]
Working in the era of Exascale

Key messages for raising a Data Centre in a Big Data world:
• Scientific computing at today's scales has to be built across collaborations of national priorities and national institutions, and needs to scale up and scale down
• Data needs to be born-connected and transdisciplinary: interoperable international standards for data collections, applied at birth, are critical for allowing complex interactions in HP environments both within and between HPD collections
• Expertise around usability and performance tuning is needed to ensure we get the most out of the data
• Collaborative efforts across disciplines and collaboration across nations