Since I'm a member of Opendata.ch and worked on the platform, of course I went along. My partner David Stark came too. (Actually, since I'd been away at DjangoCon for the previous week, this was a sneaky way to spend some quality time with him.) The ETH had allocated us several rooms in their main building, including a big open space where several groups could organise themselves within reach of the generous food and drink offerings. First of all, researchers introduced some of the datasets that were available for us to work with: proteomics data, environmental data, the openSNP genetics project and more.
When the participants split up into different groups, David and I went to discuss the possibilities in archived digital data, led by Ana Sesartic and her colleagues from the ETH Bibliothek. Some of this group went on to analyse and visualise data from CORDIS, records of scientific projects that have been funded by the EU. A lot of their time went into cleaning and normalising the data, but they did find some interesting results. For example, depending on the questions you ask and the type of analysis you perform, you can show either that Switzerland receives the greatest amount of this funding, or that the country is somewhere in the middle.
We formed a team with guy called Wolfram and Claudia Lienhard from the ETH Bibliothek's Innovation department. The problem we wanted to solve is this: with more and more portals being created, there is more open data than ever before, but it's getting less straightforward to search for it. We want to streamline the process with a unified search across different open data websites.
Our work is documented on the project's wiki page. Claudia's experience was very useful, and she and Wolfram outlined a proposed workflow for a user of our site. We took some inspiration from the ETH Bibliothek's Knowledge Portal. David and I, meanwhile, programmed a demo site. The current version runs a search on both the Swiss Open Government Data Pilot Portal and the Open Research Data Platform Switzerland. You can set a date range for publication, and the site will suggest possible search terms to narrow your search further.
This is only a start, but David and I both intend to keep working on the project. Our team came up with a list of features we'd like to implement, and we have added more as issues on the GitHub repo. Our progress so far was quite easy, because the portals we worked with are both built on CKAN and share its well-developed API. In future we would like to extend the multisearch to portals built on different frameworks as well. We also plan to develop extensions to improve metadata extraction when datasets are uploaded to a CKAN instance.
As open data becomes more well-known and prevalent, the difficulties of efficient searching will only increase, so I think this is a topic worth working on. Thanks to Opendata.ch, FORS and ETH Zürich for letting us develop our ideas with a great group!