CIP User Survey 2006
To ensure effective support of their broad user community, NSF-supported centers have conducted annual user surveys for many years. In October 2006, the annual survey was conducted as an activity of the Cyberinfrastructure Partnership (CIP) sites, SDSC and NCSA. This year's survey not only showed the continued success of the NSF-supported CIP environment, but also demonstrated the user community's general satisfaction with CIP resources, software, and services. The CIP sites will make decisions about resources, services, and software based on this survey. The survey responses are summarized below.
For the 2006 survey, the CIP sites undertook a new survey methodology. Rather than canvassing the entire user community, a random sampling method was employed. We constructed a total user population that included all users from non-staff projects who had charged time on any NCSA or SDSC resource as well as all PIs who had allocations on NCSA or SDSC resources (even if they had no time charged). Each user in this population of approximately 1,500 individuals was categorized according to the size and discipline of the largest aggregate allocation award with which they were associated. ("DAC" awards include awards up to 30,000 SUs, "MRAC" between 30,001 and 200,000, and "LRAC" greater than 200,000.) Table 1 shows the overall population characteristics by discipline and award size.
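The award-size categories above amount to a simple threshold rule. The following is a minimal sketch of that rule in Python; the function name `award_category` and the assumption that the award size arrives as an integer number of SUs are illustrative, not part of the survey procedure itself.

```python
def award_category(allocation_sus: int) -> str:
    """Map an aggregate allocation size (in SUs) to its award category.

    Thresholds follow the text: DAC up to 30,000 SUs,
    MRAC from 30,001 to 200,000, LRAC above 200,000.
    """
    if allocation_sus <= 30_000:
        return "DAC"
    if allocation_sus <= 200_000:
        return "MRAC"
    return "LRAC"
```

For example, `award_category(30_001)` falls in the MRAC band, while `award_category(200_001)` is classed as LRAC.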
From this population a survey sample was selected. The sample was weighted toward LRAC and MRAC users since most of the time consumed on CIP resources is used by such projects. Targeting a survey sample of about 300 users, we randomly selected 150 LRAC users/PIs (~50% of the sample), approximately 100 MRAC users (~30%), and approximately 75 DAC users (~25%). Table 2 shows the survey sample characteristics; note the close correspondence to the percentage of users in each discipline in the total population.
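The weighted selection described above is a form of stratified random sampling: the population is split by award category and a fixed number of users is drawn at random from each group. The sketch below illustrates the idea under stated assumptions; the `(user_id, category)` data layout, the function name `stratified_sample`, and the `TARGETS` dictionary are hypothetical, and the actual tooling used for the survey is not described in the source.

```python
import random
from collections import defaultdict

# Approximate per-category targets taken from the text above.
TARGETS = {"LRAC": 150, "MRAC": 100, "DAC": 75}

def stratified_sample(users, targets=TARGETS, seed=None):
    """Draw a random sample from each award-size stratum.

    `users` is an iterable of (user_id, category) pairs (assumed layout).
    Returns a dict mapping each category to its list of sampled user ids.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for user_id, category in users:
        strata[category].append(user_id)

    sample = {}
    for category, target in targets.items():
        pool = strata.get(category, [])
        # Never request more users than the stratum actually contains.
        sample[category] = rng.sample(pool, min(target, len(pool)))
    return sample

# Synthetic population sized like the Table 1 grand totals.
population = (
    [(f"dac{i}", "DAC") for i in range(707)]
    + [(f"lrac{i}", "LRAC") for i in range(452)]
    + [(f"mrac{i}", "MRAC") for i in range(289)]
)
sample = stratified_sample(population, seed=2006)
```

Because each stratum has its own target, LRAC users make up roughly a third of the population but close to half of the sample, which is the weighting the text describes.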
The survey was open for four weeks during October 2006. The questions covered the areas of computational environment, grid environment and services, programming environment, software and tools, data and I/O, visualization, allocations, consulting and help desk support, training and documentation, and some optional demographic information. All questions were optional.
We contacted the users in the sample via e-mail. A first message was sent jointly from the directors of NCSA and SDSC, encouraging participation. This was followed by an e-mail sent via the online survey hosting service we employed. Three reminder e-mails were also sent at approximately weekly intervals. As a further incentive, an iPod nano was offered to two randomly chosen respondents.
Table 1. Total user population by discipline and award size.

| Discipline | DAC | LRAC | MRAC | Total | % Total |
|---|---|---|---|---|---|
| Chemistry | 121 | 62 | 76 | 259 | 17.9% |
| Chemical, Thermal Systems | 101 | 30 | 36 | 167 | 11.5% |
| Astronomical Sciences | 31 | 97 | 32 | 160 | 11.0% |
| Molecular Biosciences | 56 | 84 | 17 | 157 | 10.8% |
| Materials Research | 52 | 52 | 42 | 146 | 10.1% |
| Physics | 30 | 46 | 20 | 96 | 6.6% |
| Advanced Scientific Computing | 61 | 12 | 9 | 82 | 5.7% |
| Atmospheric Sciences | 33 | 35 | 13 | 81 | 5.6% |
| Mechanical and Structural Systems | 47 | 5 | 52 | 3.6% | |
| Computer and Computation Research | 32 | 6 | 38 | 2.6% | |
| Earth Sciences | 14 | 15 | 8 | 37 | 2.6% |
| Mathematical Sciences | 32 | 32 | 2.2% | ||
| Design and Manufacturing Systems | 25 | 25 | 1.7% | ||
| Biological and Critical Systems | 10 | 5 | 7 | 22 | 1.5% |
| Electrical and Communication Systems | 13 | 4 | 2 | 19 | 1.3% |
| Ocean Sciences | 5 | 3 | 10 | 18 | 1.2% |
| Environmental Biology | 12 | 2 | 14 | 1.0% | |
| Integrative Biology and Neuroscience | 6 | 5 | 11 | 0.8% | |
| Information, Robotics, and Intelligent Systems | 8 | 1 | 9 | 0.6% | |
| Social and Economic Science | 5 | 2 | 7 | 0.5% | |
| Microelectronic Information Processing Systems | 4 | 4 | 0.3% | ||
| Training | 3 | 1 | 4 | 0.3% | |
| Cross-Disciplinary Activities | 2 | 1 | 3 | 0.2% | |
| Behavioral and Neural Sciences | 2 | 2 | 0.1% | ||
| Biological Instrumentation and Resources | 1 | 1 | 0.1% | ||
| Industrial Science and Technological Innovation | 1 | 1 | 0.1% | ||
| Networking and Communications Research | 1 | 1 | 0.1% | ||
| Grand Total | 707 | 452 | 289 | 1,448 | |

Table 2. Survey sample by discipline and award size.

| Discipline | DAC | LRAC | MRAC | Total | % Total |
|---|---|---|---|---|---|
| Chemistry | 13 | 21 | 26 | 60 | 18.5% |
| Astronomical Sciences | 4 | 32 | 10 | 46 | 14.2% |
| Molecular Biosciences | 6 | 28 | 6 | 40 | 12.3% |
| Materials Research | 6 | 17 | 14 | 37 | 11.4% |
| Chemical, Thermal Systems | 11 | 10 | 12 | 33 | 10.2% |
| Physics | 3 | 15 | 7 | 25 | 7.7% |
| Atmospheric Sciences | 3 | 12 | 5 | 20 | 6.2% |
| Advanced Scientific Computing | 6 | 4 | 3 | 13 | 4.0% |
| Earth Sciences | 2 | 5 | 2 | 9 | 2.8% |
| Mechanical and Structural Systems | 6 | 1 | 7 | 2.2% | |
| Computer and Computation Research | 4 | 2 | 6 | 1.9% | |
| Ocean Sciences | 1 | 1 | 3 | 5 | 1.5% |
| Biological and Critical Systems | 1 | 1 | 2 | 4 | 1.2% |
| Electrical and Communication Systems | 1 | 2 | 1 | 4 | 1.2% |
| Design and Manufacturing Systems | 3 | 3 | 0.9% | ||
| Integrative Biology and Neuroscience | 1 | 2 | 3 | 0.9% | |
| Mathematical Sciences | 3 | 3 | 0.9% | ||
| Environmental Biology | 1 | 1 | 2 | 0.6% | |
| Social and Economic Science | 1 | 1 | 2 | 0.6% | |
| Behavioral and Neural Sciences | 1 | 1 | 0.3% | ||
| Information, Robotics, and Intelligent Systems | 1 | 1 | 0.3% | ||
| Grand Total | 78 | 150 | 96 | 324 | |
We received 121 complete and 7 partial responses from the initial 319 invitations, a response rate of 40%. (Five e-mail addresses were invalid.) This response rate is considerably higher than that of the 2005 survey (which did not use random sampling). In addition, the responses correspond well to the targeted sample population characteristics, with 41% LRAC, 33% MRAC, and 15% DAC respondents (8% selected "Don't know or Other"). The disciplinary areas with the most respondents also generally corresponded to the disciplines with the most users in the overall population, though the ranking (in user/respondent counts) did not exactly align.
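The response-rate figure can be checked with a short calculation. This sketch assumes that both complete and partial responses count toward the rate, and that the 319 invitations are the 324 sampled users less the five invalid e-mail addresses; the function name `response_rate` is illustrative.

```python
def response_rate(complete: int, partial: int, invited: int) -> float:
    """Fraction of invited users who returned a complete or partial response."""
    return (complete + partial) / invited

sampled = 324          # grand total of the survey sample
invalid_addresses = 5  # e-mail addresses that were invalid
invited = sampled - invalid_addresses  # 319 invitations

rate = response_rate(complete=121, partial=7, invited=invited)
print(f"{rate:.1%}")  # prints 40.1%
```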
Overall, we believe the random sampling methodology proved valuable. The response rate improved, the responses and respondents map closely to the overall user population, and the survey involved only a small share of users (about 300 out of the total TeraGrid user population of approximately 4,000).
The survey responses are summarized in Table 3 below. Satisfaction scores are given on a scale of 1-5, with 5 being "extremely satisfied." For application and software use, the table lists every answer given for each question, along with the number of responses and the percentage of respondents selecting that answer.
The survey results indicate that the CIP sites are highly successful in meeting the overall needs of a very diverse user community, but there are opportunities for improving and increasing awareness of services offered.
Table 3. Summary of survey responses.

| Satisfaction Levels | | |
|---|---|---|
| COMPUTE RESOURCES | Responses | Avg. Rating |
| SDSC DataStar p690 | 21 | 4.5 |
| SDSC DataStar p655 | 32 | 4.4 |
| NCSA IBM p690 (copper) | 36 | 4.3 |
| NCSA TeraGrid IA64 Cluster (mercury) | 29 | 4.1 |
| NCSA Xeon Cluster (tungsten) | 47 | 3.9 |
| NCSA SGI Altix (cobalt) | 35 | 3.9 |
| SDSC TeraGrid IA64 Cluster | 17 | 3.8 |
| SDSC IBM Blue Gene | 17 | 3.4 |
| NCSA Condor Pool (radium) | 4 | 3.0 |
| DATA STORAGE SYSTEMS | Responses | Avg. Rating |
| SDSC HPSS | 38 | 4.5 |
| SDSC Collections Disk Space (SAM-QFS) | 7 | 4.4 |
| NCSA UniTree | 63 | 4.3 |
| GridFTP-UniTree | 12 | 3.8 |
| TeraGrid GPFS-WAN | 17 | 3.8 |
| SRB | 4 | 3.5 |
| RESOURCES USED FOR REMOTE VISUALIZATION | Responses | Avg. Rating |
| NCSA | 26 | 4.1 |
| SDSC | 19 | 3.9 |
| OVERALL SATISFACTION, RESOURCES AND SERVICES | Responses | Avg. Rating |
| NCSA | 88 | 4.4 |
| SDSC | 58 | 4.3 |
| TeraGrid (overall) | 60 | 4.1 |
| Phone Consulting | Responses | Avg. Rating |
| NCSA Consulting | 40 | 4.7 |
| NCSA Help Desk | 41 | 4.6 |
| SDSC Consulting | 34 | 4.4 |
| TeraGrid Help Desk | 19 | 4.4 |
| E-mail Consulting | Responses | Avg. Rating |
| consult@ncsa.uiuc.edu | 63 | 4.6 |
| help@ncsa.uiuc.edu | 55 | 4.5 |
| consult@sdsc.edu | 41 | 4.4 |
| help@teragrid.org | 32 | 4.2 |
| ALLOCATIONS | Responses | Avg. Rating |
| Allocations Staff | 81 | 4.6 |
| Overall Process | 99 | 4.3 |
| Time to get award | 91 | 4.1 |
| POPS | 76 | 4.1 |
| Proposal format/length | 85 | 4.0 |
| Online help/documentation | 90 | 4.0 |
| Reviewer Comments | 75 | 3.9 |
| TRAINING | Responses | Avg. Rating |
| On-site Courses | 13 | 4.2 |
| Web-based Courses | 8 | 3.6 |
| Access Grid Courses | 6 | 3.5 |
| Video Archive | 4 | 3.3 |
| DOCUMENTATION | Responses | Avg. Rating |
| SDSC User Guides | 49 | 4.2 |
| NCSA Web sites | 88 | 4.2 |
| NCSA User Guides | 83 | 4.1 |
| SDSC Web sites | 51 | 4.1 |
| NCSA User News | 55 | 4.0 |
| SDSC User News | 39 | 3.9 |
| TeraGrid Web sites | 52 | 3.8 |
| TeraGrid User Guides | 45 | 3.8 |
| TeraGrid User News | 35 | 3.7 |
| Application and Software Use | Responses | % |
| Primary limitation to completing target runs | Responses | % |
| Long queue wait times | 73 | 64% |
| Other, please specify | 13 | 11% |
| No limitations encountered | 10 | 9% |
| Size of allocation award | 5 | 4% |
| System downtime | 5 | 4% |
| Number of processors on a system | 3 | 3% |
| Disk or tape storage | 3 | 3% |
| Memory on a system or node | 2 | 2% |
| Network bandwidth | 0 | 0% |
| Maximum processors used, personally, in batch job | Responses | % |
| None | 10 | 9% |
| 1-16 | 32 | 28% |
| 17-64 | 26 | 23% |
| 65-128 | 11 | 10% |
| 129-256 | 12 | 10% |
| 257-512 | 13 | 11% |
| 513-1024 | 6 | 5% |
| 1025-2048 | 4 | 3% |
| 2049 or more | 1 | 1% |
| Maximum processors used, by your group, in batch job | Responses | % |
| None | 2 | 2% |
| 1-16 | 18 | 16% |
| 17-64 | 19 | 17% |
| 65-128 | 16 | 14% |
| 129-256 | 13 | 11% |
| 257-512 | 20 | 17% |
| 513-1024 | 8 | 7% |
| 1025-2048 | 9 | 8% |
| 2049 or more | 1 | 1% |
| Don't know | 9 | 8% |
| Codes used for production runs | Responses | % |
| Codes developed by your group, augmented with third-party libraries or routines | 41 | 35% |
| Codes developed entirely by your group | 34 | 29% |
| Third-party (commercial or otherwise) software | 25 | 22% |
| Third-party software, augmented with routines or libraries written by your group | 13 | 11% |
| Other, please specify | 2 | 2% |
| Don't know | 1 | 1% |
| Use of math software or libraries | Responses | % |
| Did not use mathematical libraries or software | 42 | 36% |
| Used math libraries in codes developed by your group | 39 | 33% |
| Used math software (e.g., Matlab) for post-processing or analysis | 26 | 22% |
| Used math libraries called from within third-party codes (if known) | 23 | 20% |
| Other, please specify | 5 | 4% |
| Don't know | 4 | 3% |
| Use of parallel performance or debugging tools | Responses | % |
| Did not use parallel performance or debugging tools | 45 | 39% |
| Used parallel performance tools (e.g., PAPI, Xprofiler, MPI_wtime) to analyze and improve performance of codes developed by your group | 27 | 23% |
| Used third-party software, so did not need performance or debugging tools | 23 | 20% |
| Used parallel debugging tools (e.g., TotalView) to help write, modify, port and/or debug codes | 19 | 16% |
| Don't know | 8 | 7% |
| Other, please specify | 5 | 4% |
| Use of data management software, libraries, or services | Responses | % |
| Managed data movement and storage primarily using standard file movement tools or utilities (e.g., scp, tgcp, archival storage commands) | 86 | 78% |
| Managed data within codes using data format libraries (e.g., HDF4/5, netCDF, MPI-IO) | 16 | 14% |
| Don't know | 16 | 14% |
| Managed data using higher-level data management tools (e.g., Storage Resource Broker, databases, Globus RFT) | 6 | 5% |
| Other, please specify | 2 | 1% |
| Use of visualization software and hardware | Responses | % |
| Copied output data from NCSA or SDSC resources to in-house computers for plotting or visualization | 74 | 63% |
| Did not perform any visualization | 30 | 25% |
| Used visualization software on NCSA or SDSC compute resources to visualize data remotely during post-processing | 16 | 13% |
| Used visualization software on NCSA or SDSC compute resources to visualize data as part of workflow during compute runs | 6 | 5% |
| Other, please specify | 6 | 5% |
| Don't know | 2 | 1% |
| Use of grid services or software | Responses | % |
| Have not used or investigated use of grid tools | 81 | 70% |
| Have heard about and investigated capabilities offered by grid tools | 13 | 11% |
| Have used grid tools in production runs | 12 | 10% |
| Have started to experiment with grid tools in test runs | 6 | 5% |
| Other, please specify | 3 | 3% |
| Useful size for persistent online storage area | Responses | % |
| 1 TB or less | 47 | 42% |
| 1 TB - 10 TB | 54 | 48% |
| 10 TB - 50 TB | 9 | 8% |
| 50 TB or more | 3 | 3% |
| Plan to use public data collections | Responses | % |
| Yes | 16 | 14% |
| No | 99 | 86% |