CIP User Survey 2006

To ensure the effective support of their broad user community, NSF-supported centers have conducted annual user surveys for many years. In October 2006, the annual survey was conducted as an activity of the Cyberinfrastructure Partnership (CIP) sites, SDSC and NCSA. This year's survey not only showed the continued success of the NSF-supported CIP environment, but also demonstrated the user community's general satisfaction with CIP resources, software and services. The CIP sites will make decisions about resources, services, and software based on this survey. The survey responses are summarized below.

Survey Methodology

For the 2006 survey, the CIP sites undertook a new survey methodology. Rather than canvassing the entire user community, a random sampling method was employed. We constructed a total user population that included all users from non-staff projects who had charged time on any NCSA or SDSC resource as well as all PIs who had allocations on NCSA or SDSC resources (even if they had no time charged). Each user in this population of approximately 1,500 individuals was categorized according to the size and discipline of the largest aggregate allocation award with which they were associated. ("DAC" awards include awards up to 30,000 SUs, "MRAC" between 30,001 and 200,000, and "LRAC" greater than 200,000.) Table 1 shows the overall population characteristics by discipline and award size.
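The award-size thresholds above can be written as a small classification function. This is only an illustrative sketch: the function name `award_class` is hypothetical, and it assumes the aggregate allocation is available as a whole number of SUs.

```python
def award_class(sus: int) -> str:
    """Map an aggregate allocation (in SUs) to its award class.

    Thresholds follow the text: DAC up to 30,000 SUs,
    MRAC from 30,001 to 200,000, LRAC above 200,000.
    """
    if sus <= 30_000:
        return "DAC"
    if sus <= 200_000:
        return "MRAC"
    return "LRAC"
```

For example, an award of exactly 200,000 SUs falls in MRAC, while 200,001 SUs falls in LRAC.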

From this population a survey sample was selected. The sample was weighted toward LRAC and MRAC users since most of the time consumed on CIP resources is used by such projects. Targeting a survey sample of about 300 users, we randomly selected 150 LRAC users/PIs (~50% of the sample), approximately 100 MRAC users (~30%), and approximately 75 DAC users (~25%). Table 2 shows the survey sample characteristics; note the close correspondence to percentage of users in each discipline for the total population.
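The weighted selection described above is a form of stratified random sampling: the population is split by award class and a fixed number of users is drawn at random from each class. The sketch below is one way to express that idea, not the procedure the CIP sites actually ran. The function name, the synthetic user IDs, and the seed are hypothetical; the stratum sizes and quotas are taken from the Grand Total rows of Tables 1 and 2.

```python
import random
from collections import Counter

def stratified_sample(population, quotas, seed=None):
    """Draw quotas[cls] users at random from each award class.

    population: list of (user_id, award_class) pairs
    quotas:     dict mapping award_class -> number of users to draw
    """
    rng = random.Random(seed)
    sample = []
    for cls, n in quotas.items():
        stratum = [user for user in population if user[1] == cls]
        # Never ask for more users than the stratum contains.
        sample.extend(rng.sample(stratum, min(n, len(stratum))))
    return sample

# Synthetic population sized like Table 1 (user IDs are made up).
sizes = {"DAC": 707, "LRAC": 452, "MRAC": 289}
population = [(f"{cls}-{i}", cls) for cls, n in sizes.items() for i in range(n)]

# Per-class quotas as in Table 2.
quotas = {"DAC": 78, "LRAC": 150, "MRAC": 96}

sample = stratified_sample(population, quotas, seed=2006)
counts = Counter(cls for _, cls in sample)
```

Drawing within each class separately is what lets the sample over-represent LRAC and MRAC users relative to their share of the population.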

The survey was open for four weeks during October 2006. The questions covered the areas of computational environment, grid environment and services, programming environment, software and tools, data and I/O, visualization, allocations, consulting and help desk support, training and documentation, and some optional demographic information. All questions were optional.

We contacted the users in the sample via e-mail. A first message was sent jointly from the directors of NCSA and SDSC, encouraging participation. This was followed by an e-mail sent via the online survey hosting service we employed. Three reminder e-mails were also sent at approximately weekly intervals. As a further incentive, the targeted users were told that two randomly chosen respondents would each receive an iPod nano.

Table 1. CIP User Population

Discipline Users
DAC LRAC MRAC Total % Total
Chemistry 121 62 76 259 17.9%
Chemical, Thermal Systems 101 30 36 167 11.5%
Astronomical Sciences 31 97 32 160 11.0%
Molecular Biosciences 56 84 17 157 10.8%
Materials Research 52 52 42 146 10.1%
Physics 30 46 20 96 6.6%
Advanced Scientific Computing 61 12 9 82 5.7%
Atmospheric Sciences 33 35 13 81 5.6%
Mechanical and Structural Systems 47 5 52 3.6%
Computer and Computation Research 32 6 38 2.6%
Earth Sciences 14 15 8 37 2.6%
Mathematical Sciences 32 32 2.2%
Design and Manufacturing Systems 25 25 1.7%
Biological and Critical Systems 10 5 7 22 1.5%
Electrical and Communication Systems 13 4 2 19 1.3%
Ocean Sciences 5 3 10 18 1.2%
Environmental Biology 12 2 14 1.0%
Integrative Biology and Neuroscience 6 5 11 0.8%
Information, Robotics, and Intelligent Systems 8 1 9 0.6%
Social and Economic Science 5 2 7 0.5%
Microelectronic Information Processing Systems 4 4 0.3%
Training 3 1 4 0.3%
Cross-Disciplinary Activities 2 1 3 0.2%
Behavioral and Neural Sciences 2 2 0.1%
Biological Instrumentation and Resources 1 1 0.1%
Industrial Science and Technological Innovation 1 1 0.1%
Networking and Communications Research 1 1 0.1%
Grand Total 707 452 289 1,448

Table 2. CIP User Survey Sample.

Discipline Users
DAC LRAC MRAC Total % Total
Chemistry 13 21 26 60 18.5%
Astronomical Sciences 4 32 10 46 14.2%
Molecular Biosciences 6 28 6 40 12.3%
Materials Research 6 17 14 37 11.4%
Chemical, Thermal Systems 11 10 12 33 10.2%
Physics 3 15 7 25 7.7%
Atmospheric Sciences 3 12 5 20 6.2%
Advanced Scientific Computing 6 4 3 13 4.0%
Earth Sciences 2 5 2 9 2.8%
Mechanical and Structural Systems 6 1 7 2.2%
Computer and Computation Research 4 2 6 1.9%
Ocean Sciences 1 1 3 5 1.5%
Biological and Critical Systems 1 1 2 4 1.2%
Electrical and Communication Systems 1 2 1 4 1.2%
Design and Manufacturing Systems 3 3 0.9%
Integrative Biology and Neuroscience 1 2 3 0.9%
Mathematical Sciences 3 3 0.9%
Environmental Biology 1 1 2 0.6%
Social and Economic Science 1 1 2 0.6%
Behavioral and Neural Sciences 1 1 0.3%
Information, Robotics, and Intelligent Systems 1 1 0.3%
Grand Total 78 150 96 324

Summary of User Survey Results

We received 121 complete and 7 partial responses from the initial 319 invitations, a response rate of 40%. (Five e-mail addresses were invalid.) This response rate is considerably higher than that of the 2005 survey (which did not use random sampling). In addition, the responses correspond well to the targeted sample population characteristics, with 41% LRAC, 33% MRAC, and 15% DAC respondents (8% selected "Don't know or Other"). The disciplinary areas with the most respondents also generally corresponded to the disciplines with the most users in the overall population, though the ranking (in user/respondent counts) did not exactly align.
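As a quick check of the arithmetic, the sketch below reproduces the reported response rate. It assumes that complete and partial responses are both counted and that the 319 invitations are the 324 sampled users less the five invalid addresses; the variable names are illustrative only.

```python
sample_size = 324          # Grand Total in Table 2
invalid_addresses = 5
invitations = sample_size - invalid_addresses   # 319

complete, partial = 121, 7
responses = complete + partial                  # 128

rate = responses / invitations
print(f"{responses} of {invitations} invitations: {rate:.0%}")
# prints "128 of 319 invitations: 40%"
```

Counting only the 121 complete responses would give about 38%, so the 40% figure is consistent with partial responses being included.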

Overall, we believe using the random sampling methodology proved valuable. The response rate improved, the responses and respondents map closely to the overall user population, and the survey impacted a minimal number of users (300 out of the total TeraGrid user population of approximately 4,000).

The survey responses are summarized in Table 3 below. Satisfaction scores are given on a scale of 1-5, with 5 being "extremely satisfied." For application and software use, the table shows all provided responses for questions along with the number of responses and the percentage of respondents selecting that answer.

The survey results indicate that the CIP sites are highly successful in meeting the overall needs of a very diverse user community, but there are opportunities for improving and increasing awareness of services offered.

Table 3. Summary of User Survey Results

Satisfaction Levels
COMPUTE RESOURCES Responses Avg. Rating
SDSC DataStar p690 21 4.5
SDSC DataStar p655 32 4.4
NCSA IBM p690 (copper) 36 4.3
NCSA TeraGrid IA64 Cluster (mercury) 29 4.1
NCSA Xeon Cluster (tungsten) 47 3.9
NCSA SGI Altix (cobalt) 35 3.9
SDSC TeraGrid IA64 Cluster 17 3.8
SDSC IBM Blue Gene 17 3.4
NCSA Condor Pool (radium) 4 3.0
DATA STORAGE SYSTEMS Responses Avg. Rating
SDSC HPSS 38 4.5
SDSC Collections Disk Space (SAM-QFS) 7 4.4
NCSA UniTree 63 4.3
GridFTP-UniTree 12 3.8
TeraGrid GPFS-WAN 17 3.8
SRB 4 3.5
RESOURCES USED FOR REMOTE VISUALIZATION Responses Avg. Rating
NCSA 26 4.1
SDSC 19 3.9
OVERALL SATISFACTION, RESOURCES AND SERVICES Responses Avg. Rating
NCSA 88 4.4
SDSC 58 4.3
TeraGrid (overall) 60 4.1
Phone Consulting Responses Avg. Rating
NCSA Consulting 40 4.7
NCSA Help Desk 41 4.6
SDSC Consulting 34 4.4
TeraGrid Help Desk 19 4.4
E-mail Consulting Responses Avg. Rating
consult@ncsa.uiuc.edu 63 4.6
help@ncsa.uiuc.edu 55 4.5
consult@sdsc.edu 41 4.4
help@teragrid.org 32 4.2
ALLOCATIONS Responses Avg. Rating
Allocations Staff 81 4.6
Overall Process 99 4.3
Time to get award 91 4.1
POPS 76 4.1
Proposal format/length 85 4.0
Online help/documentation 90 4.0
Reviewer Comments 75 3.9
TRAINING Responses Avg. Rating
On-site Courses 13 4.2
Web-based Courses 8 3.6
Access Grid Courses 6 3.5
Video Archive 4 3.3
DOCUMENTATION Responses Avg. Rating
SDSC User Guides 49 4.2
NCSA Web sites 88 4.2
NCSA User Guides 83 4.1
SDSC Web sites 51 4.1
NCSA User News 55 4.0
SDSC User News 39 3.9
TeraGrid Web sites 52 3.8
TeraGrid User Guides 45 3.8
TeraGrid User News 35 3.7
Application and Software Use Responses %
Primary limitation to completing target runs Responses %
Long queue wait times 73 64%
Other, please specify 13 11%
No limitations encountered 10 9%
Size of allocation award 5 4%
System downtime 5 4%
Number of processors on a system 3 3%
Disk or tape storage 3 3%
Memory on a system or node 2 2%
Network bandwidth 0 0%
Maximum processors used, personally, in batch job Responses %
None 10 9%
1-16 32 28%
17-64 26 23%
65-128 11 10%
129-256 12 10%
257-512 13 11%
513-1024 6 5%
1025-2048 4 3%
2049 or more 1 1%
Maximum processors used, by your group, in batch job Responses %
None 2 2%
1-16 18 16%
17-64 19 17%
65-128 16 14%
129-256 13 11%
257-512 20 17%
513-1024 8 7%
1025-2048 9 8%
2049 or more 1 1%
Don't know 9 8%
Codes used for production runs Responses %
Codes developed by your group, augmented with third-party libraries or routines 41 35%
Codes developed entirely by your group 34 29%
Third-party (commercial or otherwise) software 25 22%
Third-party software, augmented with routines or libraries written by your group 13 11%
Other, please specify 2 2%
Don't know 1 1%
Use of math software or libraries Responses %
Did not use mathematical libraries or software 42 36%
Used math libraries in codes developed by your group 39 33%
Used math software (e.g., Matlab) for post-processing or analysis 26 22%
Used math libraries called from within third-party codes (if known) 23 20%
Other, please specify 5 4%
Don't know 4 3%
Use of parallel performance or debugging tools Responses %
Did not use parallel performance or debugging tools 45 39%
Used parallel performance tools (e.g., PAPI, Xprofiler, MPI_wtime) to analyze and improve performance of codes developed by your group 27 23%
Used third-party software, so did not need performance or debugging tools 23 20%
Used parallel debugging tools (e.g., TotalView) to help write, modify, port and/or debug codes 19 16%
Don't know 8 7%
Other, please specify 5 4%
Use of data management software, libraries, or services Responses %
Managed data movement and storage primarily using standard file movement tools or utilities (e.g., scp, tgcp, archival storage commands) 86 78%
Managed data within codes using data format libraries (e.g., HDF4/5, netCDF, MPI-IO) 16 14%
Don't know 16 14%
Managed data using higher-level data management tools (e.g., Storage Resource Broker, databases, Globus RFT) 6 5%
Other, please specify 2 1%
Use of visualization software and hardware Responses %
Copied output data from NCSA or SDSC resources to in-house computers for plotting or visualization 74 63%
Did not perform any visualization 30 25%
Used visualization software on NCSA or SDSC compute resources to visualize data remotely during post-processing 16 13%
Used visualization software on NCSA or SDSC compute resources to visualize data as part of workflow during compute runs 6 5%
Other, please specify 6 5%
Don't know 2 1%
Use of grid services or software Responses %
Have not used or investigated use of grid tools 81 70%
Have heard about and investigated capabilities offered by grid tools 13 11%
Have used grid tools in production runs 12 10%
Have started to experiment with grid tools in test runs 6 5%
Other, please specify 3 3%
Useful size for persistent online storage area Responses %
1 TB or less 47 42%
1 TB - 10 TB 54 48%
10 TB - 50 TB 9 8%
50 TB or more 3 3%
Plan to use public data collections Responses %
Yes 16 14%
No 99 86%