Searching for datasets
Datasets described in Etsin may be searched from the home page, the Datasets page and through an API.
On the Datasets page you can limit your search to open data or to datasets from a certain year onwards. After running a query you can narrow the results further: the filters available in each category are shown above the result list. Click a filter to select it (you can select several), and click an active filter again to remove it.
You can run more complex searches with query syntax. Etsin uses the widely used Solr search engine in the background, so Solr's query syntax works in this service.
To search within a specific field, use the syntax field:searchcriteria. Because of this, the colon is a reserved character in Etsin's search.
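If you build field searches in a script, a reserved character such as the colon inside the search criteria has to be escaped so it is not read as a field separator. The following is a minimal sketch, assuming that Etsin accepts the standard Solr backslash escaping; the field name "title" and the helper names are hypothetical and only serve the illustration.

```python
import re

# Characters with a special meaning in Solr query syntax.
# Assumption: Etsin treats them the same way as plain Solr does.
_RESERVED = re.compile(r'([+\-!(){}\[\]^"~*?:\\/])')


def escape_term(term: str) -> str:
    """Prefix every reserved character with a backslash."""
    return _RESERVED.sub(r"\\\1", term)


def field_query(field: str, criteria: str) -> str:
    """Build a field:searchcriteria query with the criteria escaped."""
    return f"{field}:{escape_term(criteria)}"


# "title" is a hypothetical field name used only for this example.
print(field_query("title", "climate"))
print(field_query("title", "10:30"))
```

The second call escapes the colon inside the criteria, so only the first colon separates the field from the value.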
The search supports two wildcard characters. The symbol "?" matches exactly one character. The following search would match both "text" and "test":
te?t
To match multiple characters, use the symbol "*". It matches zero or more characters and can be placed in the middle of a term as well as at the end:
test*
te*t
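To get a feel for how the two wildcards behave, the matching can be imitated locally with Python's standard fnmatch module, which gives "?" and "*" the same meaning. This is only an illustration on a made-up word list, not how Etsin or Solr evaluates a query.

```python
from fnmatch import fnmatchcase

# Made-up sample words, used only to show which ones a pattern matches.
WORDS = ["text", "test", "tet", "tests", "tenant"]


def matching(pattern: str, candidates: list[str]) -> list[str]:
    """Return the candidates matched by a "?" / "*" wildcard pattern."""
    return [word for word in candidates if fnmatchcase(word, pattern)]


print(matching("te?t", WORDS))   # "?" stands for exactly one character
print(matching("te*t", WORDS))   # "*" in the middle of a term
print(matching("test*", WORDS))  # "*" at the end of a term
```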
Logical operators and grouping
The search supports the boolean operators AND, "+", OR, NOT and "-". Note that the operators AND, OR and NOT must be written in upper case.
To search datasets containing both "climate" and "rain" you would type:
climate AND rain
To search for datasets that must contain "climate" and may also contain "rain", the search would be:
+climate rain
NOT and "-" work similarly. These can be used to rule out words. For example, to search for datasets containing "climate" but not the word "rain" you would search for:
climate NOT rain
If needed, you can group search criteria with parentheses to form sub-queries. For example, to search for either "sun" or "rain" together with "climate", use the query:
(sun OR rain) AND climate
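When queries are assembled in a script, small helpers keep the operators in upper case and the parentheses balanced. The sketch below only builds query strings; the helper names are hypothetical and the terms are assumed to be single words without reserved characters.

```python
def all_of(*parts: str) -> str:
    """Join parts with AND: every part must match."""
    return " AND ".join(parts)


def any_of(*parts: str) -> str:
    """Join parts with OR and wrap them in parentheses as a sub-query."""
    return "(" + " OR ".join(parts) + ")"


def without(query: str, term: str) -> str:
    """Rule out a word with NOT."""
    return f"{query} NOT {term}"


print(all_of("climate", "rain"))
print(without("climate", "rain"))
print(all_of(any_of("sun", "rain"), "climate"))
```

The last call reproduces the grouped query from the text, with the OR part kept inside its own parentheses.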
With fuzzy search you can match words that are similar to the given one. A fuzzy search is written with the symbol "~" at the end of the word. To search for spellings similar to "roam", you could use the search:
roam~
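As a rough illustration of what "similar" means, fuzzy matching in Solr is based on edit distance: the number of single-character insertions, deletions and substitutions needed to turn one word into another. The sketch below computes the plain Levenshtein distance on a made-up word list. The threshold of 1 is an assumption chosen for the example, and the search engine's own matching may differ in detail.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two words (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similar_to(word: str, candidates: list[str], max_distance: int = 1) -> list[str]:
    """Keep candidates within max_distance edits of word (threshold is assumed)."""
    return [c for c in candidates if edit_distance(word, c) <= max_distance]


# Made-up candidates, used only for illustration.
print(similar_to("roam", ["foam", "roams", "room", "climate"]))
```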
Range searches work for both date fields and non-date fields. To find datasets added between a lower and an upper date bound, use:
version:[20131212 TO 20141212]
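A range query like the one above can be generated from date values rather than typed by hand. This is a minimal sketch, assuming the field expects dates in the YYYYMMDD form shown in the example; the helper name is hypothetical.

```python
from datetime import date


def date_range_query(field: str, start: date, end: date) -> str:
    """Build a field:[START TO END] range query with YYYYMMDD bounds."""
    if start > end:
        raise ValueError("start must not be later than end")
    return f"{field}:[{start:%Y%m%d} TO {end:%Y%m%d}]"


print(date_range_query("version", date(2013, 12, 12), date(2014, 12, 12)))
```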
Boosting a term
You can raise the importance of any word or phrase by boosting it with the symbol "^" followed by a number. The number is the so-called boost factor: the higher it is, the more relevant the term will be. For example, to search for the phrases "northern hemisphere" and "southern hemisphere" so that "northern hemisphere" carries more weight, you could construct a search like:
"northern hemisphere"^5 "southern hemisphere"
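The same boosted query can be put together in a script. The sketch below quotes multi-word phrases and appends the boost factor where one is given; the helper name is hypothetical and the code only builds the query string.

```python
from typing import Optional


def boosted(term: str, factor: Optional[int] = None) -> str:
    """Quote multi-word phrases and append ^factor when a boost is given."""
    quoted = f'"{term}"' if " " in term else term
    return quoted if factor is None else f"{quoted}^{factor}"


query = " ".join([
    boosted("northern hemisphere", 5),
    boosted("southern hemisphere"),
])
print(query)
```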