2005 NSF Cyberinfrastructure User Survey
September 2005

Introduction

To ensure the effective support of their broad user community, SDSC and NCSA have conducted annual user surveys for many years. In August and September 2005, the annual survey was conducted for the first time as a joint activity of the Cyberinfrastructure Partnership (SDSC and NCSA) and the TeraGrid. This year's survey not only showed the continued success of the NSF-supported cyberinfrastructure environment, but also demonstrated the user community's general satisfaction with the resources, software and services from all NSF cyberinfrastructure sites.

The 2005 NSF Cyberinfrastructure User Survey was prepared and administered by David Hart of SDSC and Dave McWilliams of NCSA, with help from Nancy Wilkins-Diehr (SDSC), John Towns (NCSA), Sergiu Sanielevici (PSC), Charlie Catlett (Argonne National Lab) and many staff members within the CIP and TeraGrid. The survey was open for four weeks during August and September 2005. The 39 questions cover the areas of computational environment, grid environment and services, programming environment, software and tools, data and I/O, visualization, allocations, consulting and help desk support, training and documentation, and some optional demographic information.

We advertised the survey by sending e-mail messages to active users at NCSA, SDSC, PSC and TeraGrid. The potential respondent pool included approximately 2,600 e-mail addresses from SDSC, 1,900 from TeraGrid, 2,000 from PSC, and 2,200 from NCSA; there is some overlap among these lists. A total of 412 complete and partial responses were recorded. Based on a high estimate of 7,000 unique addresses in the respondent pool, we achieved a response rate of better than 5%. CIP and TeraGrid sites will be able to make decisions about resources, services, and software based on this survey.
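The response-rate arithmetic above can be checked with a short sketch. The list sizes, the 412 responses and the 7,000-address estimate come from the text; the variable and function names are illustrative assumptions, not part of the survey tooling.

```python
# Figures reported in the paragraph above; names are hypothetical.
list_sizes = {"SDSC": 2600, "TeraGrid": 1900, "PSC": 2000, "NCSA": 2200}
responses = 412                 # complete and partial responses recorded
unique_high_estimate = 7000     # the report's high estimate of unique addresses


def response_rate(num_responses: int, pool_size: int) -> float:
    """Return the response rate as a percentage of the pool size."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return 100.0 * num_responses / pool_size


# Raw sum of the four lists, which double-counts overlapping addresses.
total_with_overlap = sum(list_sizes.values())

rate_vs_unique = response_rate(responses, unique_high_estimate)
rate_vs_raw_total = response_rate(responses, total_with_overlap)

print(f"Raw list total (with overlap): {total_with_overlap}")
print(f"Rate against 7,000 unique addresses: {rate_vs_unique:.1f}%")
print(f"Rate against raw list total: {rate_vs_raw_total:.1f}%")
```

Because 7,000 is described as a high estimate of the unique addresses, the rate computed against it is a lower bound on the rate among unique addresses; a smaller true pool would give a higher rate.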

A detailed report with complete questions as well as the survey data is below. The various components of the survey are reviewed and the responses summarized. Specific questions from the survey are shown with a gray background. Quantitative responses are presented as tables. For questions that permitted lengthy text answers, the individual responses are shown in bulleted lists and are provided without editing other than correcting obvious spelling and grammar errors.

Table of Contents

Contact Information
NSF division or directorate
Scientific Successes
Computational Environment
  NSF-supported systems: satisfaction level
  Mass storage systems: satisfaction level
  NSF-supported sites: satisfaction level
  Suggestions for improving the NSF cyberinfrastructure resource environment
Grid Environment and Services
  Grid tools which are used
  Experience or plan to use grid tools and services
  Web-based portal: features desired
  Interest in pre-production and trial grid services
  Suggestions for improving grid environment
Programming Environment, Software and Tools
  Nature of parallelism in your code
  How the code is parallelized
  Mathematical software/libraries used
  Parallel/performance tools used
  Third-Party scientific software used
  Suggestions for improving the software offerings
Data and I/O
  Data software, libraries or services used
  Interest in persistent online storage area
  Ideal access method for accessing persistent storage area
  Use or planned use of public data collections
  Suggestions for improving data storage and I/O environments
Visualization
  NSF visualization systems: satisfaction level
  Scientific visualization applications used
  Suggestions for improving NSF visualization software offerings
Allocations
  Allocations process: satisfaction level
  Suggestion for improving allocations process
Consulting and Help Desk Support
  Phone consulting services: satisfaction level
  Email consulting services: satisfaction level
  Preferred method to contact help desk/consultants
  Suggestions for improving direct user support
Training and Documentation
  Training activities in which you participated
  Documentation: satisfaction level
  Suggestions for improving training and documentation
Demographic Details
  Your institution affiliation
  Your position
  Gender
  Race

 

1. Your contact information (optional, but required for iPod drawing):

Remarks/Plans: Almost all respondents included their contact information. CIP and TeraGrid sites will use this information to follow up with users.

 

2. Please select the NSF division or directorate (or corresponding area of science) that most closely represents your research:

Remarks/Plans: Nearly 40% of the respondents were in the top three areas: Chemistry, Molecular and Cellular Biology and Physics. (The table is sorted by number of responses in descending order.)

 
  Number of Responses Response Ratio
Chemistry 57 17%
Molecular and Cellular Biology 37 11%
Physics 33 11%
Astronomical Sciences 20 10%
Other 18 6%
Computer and Network Systems 18 5%
Materials Research 17 5%
Atmospheric Sciences 17 5%
Chemical and Transport Systems 12 5%
Civil and Mechanical Systems 9 4%
Earth Sciences 7 3%
Bioengineering and Environmental Systems 6 2%
Computer and Communications Fundamentals 6 2%
Ocean Sciences 5 2%
Mathematical Sciences 3 1%
Information and Intelligent Systems 3 1%
Social, Behavioral and Economic Sciences 1 0%
Education and Human Resources 1 0%
Electrical and Communications Systems 1 0%
Environmental Biology 1 0%
Polar Sciences 0 0%
TOTAL 335

 

3. Please describe any scientific successes from your computational research as enabled by NSF cyberinfrastructure resources. Please provide URLs for further details, if available.

Remarks/Plans: Nearly 100 URLs are listed below. See the Survey Summary for a list of representative successes.


Computational Environment

 

4. Please rate your satisfaction level with any NSF-supported systems that you have used in the last year. (If you haven't used a system, please select 'N/A' or skip that item.)

Remarks/Plans: The top seven systems had an average rating of "somewhat satisfied" or better. The lowest rated systems were rated between "neutral" and "somewhat satisfied". The table is sorted by the satisfaction level in descending order. (5=extremely satisfied, 4=somewhat satisfied, 3=neutral, 2=somewhat dissatisfied, 1=extremely dissatisfied).

  Satisfaction Level (1-5) Number of responses
SDSC DataStar IBM p655 4.2 67
NCSA TeraGrid IA64 Cluster 4.2 95
SDSC DataStar IBM p690 4.1 64
NCSA Xeon Cluster (tungsten) 4.1 100
NCSA IBM p690 (copper) 4.0 106
SDSC TeraGrid IA64 Cluster 4.0 70
PSC TCS1 (Lemieux) 4.0 98
NCSA SGI Altix (cobalt) 3.9 64
PSC HP Marvel (Rachel) 3.9 50
TACC Cray-Dell Cluster (Lonestar) 3.8 20
NCSA Condor Pool (radium) 3.7 15
ANL TeraGrid IA64 Cluster 3.7 32
Caltech TeraGrid IA64 Cluster 3.5 35
Purdue IBM SP 3.5 14
Purdue Linux Clusters 3.5 14
TACC Sun Fire E25K (Maverick) 3.4 10
ORNL Xeon Cluster 3.4 11
Indiana AVIDD-I64 Cluster 3.4 17
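The per-system figures in this table are averages on the five-point scale given above, with 'N/A' and skipped items left out. The survey's raw responses are not reproduced here, so the following is only a minimal sketch of that kind of calculation: the function name and the sample ratings are hypothetical and are not survey data.

```python
from typing import Iterable, Optional, Tuple

# The five-point scale as defined in the remarks above.
SCALE = {
    5: "extremely satisfied",
    4: "somewhat satisfied",
    3: "neutral",
    2: "somewhat dissatisfied",
    1: "extremely dissatisfied",
}


def mean_satisfaction(ratings: Iterable[Optional[int]]) -> Tuple[Optional[float], int]:
    """Average the ratings, skipping None (used here for 'N/A' or skipped items).

    Returns (mean rounded to one decimal, number of ratings counted),
    or (None, 0) when nobody rated the system.
    """
    valid = [r for r in ratings if r is not None]
    for r in valid:
        if r not in SCALE:
            raise ValueError(f"rating {r!r} is outside the 1-5 scale")
    if not valid:
        return None, 0
    return round(sum(valid) / len(valid), 1), len(valid)


# Hypothetical responses for one system; None marks 'N/A' or a skipped item.
hypothetical_ratings = [5, 4, 4, None, 3, 5, None, 4]
mean, count = mean_satisfaction(hypothetical_ratings)
print(f"Satisfaction level: {mean} from {count} responses")
```

Excluding 'N/A' answers from both the sum and the count is what lets the "Number of responses" column differ from system to system.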

 

5. Please rate your satisfaction level with the mass storage systems you have used in the past year. (If you haven't used a system, please select 'N/A' or skip that item.)

Remarks/Plans: The mass storage systems had similar ratings to the computing systems above, with NCSA UniTree and SDSC HPSS rated slightly better than "somewhat satisfied." The table is sorted by the satisfaction level in descending order. (5=extremely satisfied, 4=somewhat satisfied, 3=neutral, 2=somewhat dissatisfied, 1=extremely dissatisfied).

Satisfaction Level (1-5) Number of responses
NCSA UniTree 4.1 132
SDSC HPSS 4.1 80
PSC File ARchiver (FAR) 4.0 70
Storage Resource Broker (SRB) 3.8 32
GridFTP/uberftp to UniTree 3.7 33
TACC Data Migration Facility (DMF) 3.6 10

 

6. Considering all your interactions with the resources and services at NSF-supported sites, please rate your satisfaction with the overall quality of the computing environment (computing systems, networks, consulting support, tools and software, etc.) at the following sites.

Remarks/Plans: The quality of the computing environment was generally rated higher than the individual computing systems at the sites. This may be at least partly due to the high ratings for user support. The table is sorted by the satisfaction level in descending order. (5=extremely satisfied, 4=somewhat satisfied, 3=neutral, 2=somewhat dissatisfied, 1=extremely dissatisfied).

Satisfaction Level (1-5) Number of responses
NCSA 4.3 210
TACC 4.2 18
SDSC 4.2 131
PSC 4.1 116
Oak Ridge National Lab 4.1 18
TeraGrid (overall) 4.0 148
Argonne National Lab 3.8 33
Purdue University 3.5 15
Caltech 3.4 29
Indiana University 3.4 19

 

7. Please offer any specific comments or suggestions for improving the NSF cyberinfrastructure resource environment. If you have comments for a particular site, please mention the site:

Remarks/Plans: Some of the common complaints are lack of reliability and availability, long queue wait times, lack of storage space, poor integration, lack of response and the difficulty of learning to use a complex computational environment. These comments will be passed on to the appropriate groups within the CIP and TeraGrid to follow up with users.


Grid Environment and Services

 

8. Which of the following grid tools are you using or would you be interested in using?

Remarks/Plans: Users are expanding their use of grid capabilities. Although 'Don't know' tops the list of grid tools that users are using or would be interested in using, 32% of respondents are using or interested in using MPICH/MPICH-G2 and 22% are using or interested in GridFTP. (The table is sorted by number of responses in descending order.)

 
Number of Responses Response Ratio
Don't know 101 33%
MPICH/MPICH-G2 97 32%
GridFTP 67 22%
None 52 17%
Globus Toolkit command-line programs 43 14%
Condor/Condor-G 41 13%
Globus Toolkit API 36 12%
Uberftp 31 10%
GSI-enabled ssh 28 9%
GridShell 26 8%
Custom portal/workbench (or developing one) 21 7%
Other, please specify 20 7%
Existing portal/workbench 16 5%
MyProxy 14 5%
MDS 2 1%
TOTAL 595

 

9. What is your experience or plan to use grid tools and services?

Remarks/Plans: 27% of users are starting to experiment with grid tools, and another 28% may use them within two years; 18% have conducted actual research with grid tools and services. Only one user reported having tried them and planning to discontinue use. There is not enough information to determine why 26% say they have no plans to use them. (The table is sorted by number of responses in descending order.)

 
Number of Responses Response Ratio
May consider using them in the next 2 years 88 28%
Just starting to experiment with them 84 27%
No plan to use them 80 26%
Experienced, have conducted actual research using them 57 18%
Have tried them, but plan to discontinue use 1 0%
TOTAL 310
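The response ratios in this table follow directly from the counts. As a minimal sketch, assuming each ratio is the option's count divided by the table total and rounded to the nearest whole percent (the function name is illustrative), the figures can be reproduced as follows:

```python
from typing import Dict

# Counts copied from the table above.
counts = {
    "May consider using them in the next 2 years": 88,
    "Just starting to experiment with them": 84,
    "No plan to use them": 80,
    "Experienced, have conducted actual research using them": 57,
    "Have tried them, but plan to discontinue use": 1,
}


def response_ratios(option_counts: Dict[str, int]) -> Dict[str, int]:
    """Return each option's share of all responses as a whole-number percentage."""
    total = sum(option_counts.values())
    if total == 0:
        raise ValueError("no responses to summarize")
    return {option: round(100 * n / total) for option, n in option_counts.items()}


ratios = response_ratios(counts)

# Print in descending order of responses, as the table is sorted.
for option, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{option}: {n} ({ratios[option]}%)")
print(f"TOTAL {sum(counts.values())}")
```

This applies only to single-choice questions such as this one. In tables for questions that allowed several selections (question 8, for example), the listed ratios do not equal count divided by the table total, so a different denominator would be needed there.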

 

10. We are developing a web-based user portal that will allow users to manage allocations and accounts, inquire about system status, and perform other tasks. Please rate the following functions in terms of your interest:

Remarks/Plans: The strongest interest is in a metaqueue that will allow users to submit a batch job to the next available computing resource. The table is sorted by the level of interest in descending order. (3=extremely interested, 2=somewhat interested, 1=not at all interested)

Level of Interest (1-3) Number of responses
Submit jobs to a "metaqueue" for next available resource 2.6 301
Reserve compute and visualization resources 2.4 294
Manage staging of data between sites 2.2 296
Discuss issues with other users 2.1 297

 

11. In response to user input, we have put in place a set of pre-production and trial services. Please indicate your interest in each.

Remarks/Plans: The greatest level of interest is in the wide-area parallel filesystem.

                                     Extremely    Somewhat     Not          Tried, but   Tried, was
                                     interested   interested   interested   need help    not useful
GPFS Wide-area parallel file system  25%          39%          33%          1%           2%
Science Gateways                     21%          38%          38%          1%           2%
Striped GridFTP service (tgcp)       21%          35%          41%          2%           1%
GridShell                            17%          39%          41%          1%           1%

 

12. Please offer any specific comments or suggestions for improving our grid environment and services. If you have comments for a particular site, please mention the site.

Remarks/Plans: Some of the common complaints are the "learning curve", difficulty in using certificates, performance of transferring data from one system to another, and lack of understanding (or skepticism) of the benefits in using grid tools and services.


Programming Environment, Software and Tools

 

13. What is the nature of the parallelism in your code?

Remarks/Plans: Nearly half of the respondents indicated that the nature of the parallelism in their code was domain decomposition or task/functional parallelism. (The table is sorted by number of responses in descending order.)

Number of Responses Response Ratio
Domain decomposition 90 29%
Task/functional parallelism 60 19%
Commercial software implementation 39 12%
Don't know 35 11%
None 28 9%
Embarrassingly parallel 21 7%
Matrix decomposition 21 7%
Other, please specify 20 6%
TOTAL 314

Other:

 

14. If you develop your own codes, how do you parallelize them?

Remarks/Plans: Most users use MPI to parallelize their codes. (The table is sorted by number of responses in descending order.)

Number of Responses Response Ratio
MPI 176 61%
None 46 16%
OpenMP and MPI mixed 39 13%
OpenMP 30 10%
Don't know 24 8%
Other, please specify 17 6%
Charm++ 14 5%
POSIX Threads 15 5%
Automatic Parallelization 10 3%
Condor 10 3%
HPF 2 1%
PVM 2 1%
Linda 1 0%
POOMA 0 0%

 

Other:

 

15. Which of the following mathematical software/libraries have you used in the last year on NSF-supported machines?

Remarks/Plans: FFTW, LAPACK, MatLab and IBM ESSL/PESSL are the most popular mathematical software/libraries. (The table is sorted by number of responses in descending order.)

Number of Responses Response Ratio
None 98 34%
FFTW 67 23%
LAPACK 66 23%
MatLab 53 18%
IBM ESSL/PESSL 40 14%
Intel MKL 35 12%
GSL (GNU Scientific Library) 31 11%
ATLAS 25 9%
Mathematica 24 8%
ScaLAPACK 22 8%
SGI SCSL (BLAS, LAPACK, FFTs) 19 7%
Don't know 18 6%
Other, Please Specify 18 6%
IMSL (VNI) 13 4%
Metis/ParMETIS 11 4%
NAG 10 3%
PETSc 9 3%
SuperLU 8 3%
(P)ARPACK 3 1%
SPRNG 4 1%
AZTEC 1 0%
DAGH 1 0%

 

Other:

 

16. Which of the following parallel / performance tools have you used in the last year on NSF-supported machines?

Remarks/Plans: MPI_wtime and TotalView are the most popular tools. Most users do not use parallel/performance tools, perhaps due to the lack of documentation and training. (The table is sorted by number of responses in descending order.)

Number of Responses Response Ratio
None 127 47%
MPI_wtime() 46 17%
TotalView 47 17%
Don't know 44 16%
gdbx 27 10%
MPICH MPE Jumpshot/Upshot 20 7%
PAPI 17 6%
dtime()/etime() 14 5%
HPM Toolkit 13 5%
pdbx 7 3%
PerfSuite 7 3%
Xprofiler 9 3%
Other, please specify 7 3%
Vampir 5 2%
DPCL/Dyninst 2 1%
HPCView 3 1%
PMAPI 3 1%
svPABLO 3 1%
Tau 2 1%
VProf 4 1%
KAI Guide/Guideview/Assure 0 0%
MPIP (LLNL) 1 0%
Paradyn 0 0%
PCT/PVT 1 0%

 

Other:

 

17. Please list the third-party (commercial or public domain) scientific software you use regularly on NSF computational resources.

Remarks/Plans: Other than compilers, the most popular software is in the areas of Chemistry and Molecular and Cellular Biology. (The table is sorted by number of responses in descending order.)

Software Number of Responses Response Ratio
Gaussian 30 12%
Charmm 12 5%
Compilers 12 5%
NAMD 12 5%
Amber 10 4%
IDL 10 4%
Abaqus 6 2%
HDF 6 2%
BLAST 4 2%
GADGET 4 2%
NCAR graphics 4 2%
Flash 3 1%
Fluent 3 1%
Gromacs 3 1%
NWChem 3 1%
python 3