On Friday, June 8th, 2012, I had the opportunity to discuss Open Data and Open Government with Tony Clement, the President of the Treasury Board Secretariat (TBS), as well as other MPs and members of the Laurier Institute of Public Opinion and Policy (LISPOP). Our time with the Minister was short and the agenda was tight, but we managed to have a great discussion on data.gc.ca, the Treasury Board’s pilot programme to build a single, unified government data and statistics portal for Canadians.
When you get a chance to sit with industry, government and academic leaders, you had better come prepared. I knew data.gc.ca from recent experience and had already compared it to other Government of Canada (GC) data portals, as well as some other national portals. I also printed recommendations to distribute to the room, which I believe helped focus the meeting on a few key points. Given the people we were speaking to, i.e., high-level ministers, aides and MPs, I focused my talking points on managerial and strategic concerns they can address in their own planning and visioning exercises. What’s important here is understanding what your audience wants to learn, and then communicating it in a manner that will help them picture the idea for themselves. The Minister’s chief concerns probably lie not so much in which research community uses which programme for statistical analysis as in developing the means for people from all backgrounds to access government-collected data regardless of platform. In moments like this, it’s best to keep your focus on broad, easily communicated strategies that can produce short-term and long-term results. No conversation is without a persuasive element, and if I had a chance to speak, I wanted to be sure I was understood.
data.gc.ca is a pilot project, so we shouldn’t be quick to judge its merits and deficiencies. The Minister and his aides emphasized that the site is a work in progress and that they were looking for advice on how to improve its design and the end-user’s ability to access government data and statistics through it. In short, the Minister wanted to meet us so that we could share our opinions and make recommendations to improve the service now and in the future.
The site itself represents the Treasury Board’s attempt to build a “union catalogue” of Government of Canada datasets and statistical tables. This is a great start on an issue that the LIS community has been grappling with for hundreds of years, i.e., developing one catalogue for all sorts of stuff from all sorts of places, and I think we should commend them on their efforts. Given the government’s current fiscal restraint, I’m impressed that this has even started.
information dump: data.gc.ca search results are wordy
In my mind, the main issue with data.gc.ca has to do with convincing the managers at the very top of the project (i.e., the Minister and his aides, who give direction but don’t necessarily have day-to-day input on implementation) that data management and access issues must be resolved if this site is to be a long-term success. The site has content, but it lacks organization, has access issues, and may suffer from a lack of long-term data management planning with other government departments. This is not so much a data issue as it is a data management issue. And thankfully, there is a solution (more on that later).
Here’s a short list of data.gc.ca’s benefits and challenges that were addressed in the meeting, benefits first and challenges after:
- Offers a single portal to access government statistics and data
- Offers easy-to-use search and browse functions
- Maintains the regular Government of Canada (GC) website look-and-feel
- Contains geospatial data
- Offers data in the common .CSV format for Excel
- The portal does not always conform to “plain language principles.” Some pages are wordy, and others are littered with government or data jargon that is not always explained and therefore may irritate the user.
- Search and Browse functions are easy to use, but their results are not useful. There are few search refinement opportunities, inconsistent keyword indexing, and few search result and survey descriptions. This creates an information dump for novice and expert users alike.
- Records are inconsistent. Sometimes, there are duplicate records for the same French and English data; other times, there are two unmarked links in one record.
- Records are incomplete. Most records contain more empty fields than complete fields. The novice user will be confused; the expert user will be annoyed.
- Retrieved records are difficult to browse. GC look-and-feel web standards are designed for text pages and not for database search results; this makes the website cumbersome.
- Data is offered only in CSV and XML. In some ways, this is a “good thing” since it reduces the number of files and file versions (or links) that the DB maintainers must worry about. On the other hand, Treasury Board and GC departments may want to consider encoding their datasets into other formats for their users in order to reduce the chance of file corruption, programming error, or user misinterpretation.
This list isn’t exhaustive, and in the portal’s current form, there is a greater number of challenges than benefits. But I would argue that it’s better to consider the quality or weight of all these items as opposed to their number since the portal is still a pilot programme. Many things must be improved (namely data organization, management, access, and data usefulness for both novice and expert users), but I think that Treasury Board may be on the right track here. I argued that TBS must take three actions to accelerate their forward momentum:
- Standardize data management principles across departments. This is a case of herding cats, I know, but if TBS truly wants data.gc.ca to become a premier stats/data engine, then it must coordinate with other departments to ensure that the records in its database are consistent and complete. In its current form, too many records have too many incomplete fields, so expert users will likely turn directly to the departmental websites, and novice users will either learn to do the same or turn away completely.
- Consider acquiring a license for Nesstar, which will improve data presentation, manipulation, and access. Nesstar can facilitate cross-tabulations for novice users or people with immediate statistical needs, and if the government chooses to do so, it can also be used to develop a repository for large datasets in a variety of formats (e.g., .CSV, SPSS, SAS, STATA). This will give TBS and other government departments the assurance that the data they are releasing “into the wild” is accurate and cannot be corrupted or misinterpreted when exporting from one format to another.
- (2a:) I also noted that several GC departments, provincial governments, other national governments, and the Canadian research community are already experienced users of Nesstar. I suggested that TBS discuss implementation and management plans with OCUL, which has been using Nesstar for several years now for <odesi> through ScholarsPortal.
- Consult with Statistics Canada regarding license development and outreach. StatCan is a world leader in data collection and dissemination; not involving them fully would be a lost opportunity for TBS and a disservice to this agency’s expertise.
A screen capture of <odesi>, the OCUL data repository
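To make the first recommendation (consistent, complete records) a little more concrete, here is a minimal sketch in Python of how record completeness might be audited. The field names and the two sample records are my own inventions for illustration; they are not data.gc.ca’s actual schema or data, and a real audit would need the portal’s real metadata export.

```python
# Hypothetical metadata fields; the portal's real schema is not described in this post.
FIELDS = ["title", "description", "keywords", "department", "format", "language"]


def completeness(record):
    """Return the fraction of FIELDS that hold a non-empty value."""
    filled = sum(1 for f in FIELDS if str(record.get(f) or "").strip())
    return filled / len(FIELDS)


def mostly_empty(records):
    """Return the records that have more empty fields than completed ones."""
    return [r for r in records if completeness(r) < 0.5]


# Made-up sample records, for illustration only.
records = [
    {
        "title": "Labour force survey",
        "description": "Monthly estimates",
        "keywords": "employment",
        "department": "Statistics Canada",
        "format": "CSV",
        "language": "en",
    },
    {
        "title": "Road network",
        "description": "",
        "keywords": None,
        "department": "",
        "format": "XML",
        "language": "",
    },
]

for r in mostly_empty(records):
    print(f"{r['title']}: {completeness(r):.0%} of fields completed")
```

A report like this, run per department, is one way TBS could see where records fall short before asking departments to fill the gaps.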
I can’t say exactly how well the Minister and his aides appreciated the information we gave, but it seemed to have been well received. For my part, I intentionally used plain language to focus on resource management and policy development since these areas are most directly under their control and are things they can work on today, if they’re so willing. Hopefully, they walked away with a better understanding of the need for data management and access, and can now contextualize what areas they must work on to transform their big-picture thinking exercises into a successful public data portal.
On a personal note, this was one of the most productive hours I’ve had in a long time. My knowledge drove the discussion at times, and I walked back to my office feeling as though I successfully informed people who can implement programmes and make change in Canada. I’m pretty sure I raised the profile of data librarianship, of repositories like <odesi>, and of the need to pay more attention to data management today and in the future. I’m certainly no expert when it comes to data librarianship and data management, but for a moment, I was able to “speak truth to power,” which was a good reminder that yes, we know things in our field, and yes, I may know a few things, too. And that felt good. (I told you this paragraph was a personal note!)