Support #3605

CSW Access to GeoPortal

Added by Angelo Quaglia about 1 year ago. Updated about 1 year ago.

Status: Feedback
Start date: 28 May 2019
Priority: Normal
Due date:
Assignee: Angelo Quaglia
% Done: 0%
Category: Geoportal Services
Target version: -
Submitting Organisation:
Knowledge-Base relevant?: No
Proactive: No
Keyword #1:
Country:
Keyword #2:
Originating UI:
Keyword #3:

Description

From: Kathi Schleidt <kathi@datacove.eu>
Sent: 28 May 2019 14:59
Subject: CSW Access to GeoPortal
 
Hi Robert,

I'm currently chewing on how to gain an overview of EF/O&M related
datasets on the GeoPortal, and tried to access the underlying CSW to save me
from manually clicking through over 1000 records. But I fear I'm running
into issues on the CSW: when I send the following request, I get the answer
"Not Implemented, GET to /GeoportalProxyWebServices/resources/OGCCSW202
not supported."

CSW Request:

http://inspire-geoportal.ec.europa.eu/GeoportalProxyWebServices/resources/OGCCSW202?service=CSW&version=2.0.2&request=GetRecords&constraintLanguage=CQL_TEXT&constraint_language_version=1.1.0&constraint=csw:AnyText=%E2%80%99Environmental%20monitoring%E2%80%98

:?

Kathi

Related issues

Related to Geoportal Helpdesk - Support #3606: Error in INSPIRE CSW GetRecords Feedback 30 May 2019

History

#1 Updated by Angelo Quaglia about 1 year ago

  • Category set to Geoportal Services
  • Status changed from New to Feedback
  • Assignee set to Katharina Schleidt
From: QUAGLIA Angelo (JRC-ISPRA-EXT)
Sent: 28 May 2019 17:00
To: kathi@datacove.eu
Cc: TOMAS Robert (JRC-ISPRA)
Subject: Re: CSW Access to GeoPortal
 

Dear Kathi,

 

the INSPIRE Geoportal supports the minimum requirements for INSPIRE Discovery Services, i.e.:

1) Only the HTTP POST binding is supported for GetRecords; HTTP GET, which is optional in CSW 2.0.2, is not.

2) Only ogc:Filter is supported as a constraint language; CQL is not.

So, your request translates to:

 

 

<?xml version='1.0' encoding='UTF-8' ?>
<GetRecords service="CSW" version="2.0.2" maxRecords="10" startPosition="1" resultType="results"
            outputFormat="application/xml" outputSchema="http://www.opengis.net/cat/csw/2.0.2"
            xmlns="http://www.opengis.net/cat/csw/2.0.2" xmlns:ogc="http://www.opengis.net/ogc"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Query typeNames="gmd:MD_Metadata">
        <ElementSetName>full</ElementSetName>
        <Constraint version="1.1.0">
            <ogc:Filter>
                <ogc:PropertyIsLike wildCard="%" singleChar="_" escapeChar="\">
                    <ogc:PropertyName>csw:AnyText</ogc:PropertyName>
                    <ogc:Literal>Environmental monitoring</ogc:Literal>
                </ogc:PropertyIsLike>
            </ogc:Filter>
        </Constraint>
    </Query>
</GetRecords>
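
For anyone scripting this, the request above can be turned into a small template so that the search text and paging values vary per call. This is only a sketch: the function name `build_getrecords` and its parameters are illustrative, not part of the Geoportal or of any CSW library, and the XML is copied from the request above with three placeholders added.

```python
from xml.sax.saxutils import escape

# Template copied from the GetRecords request above; only maxRecords,
# startPosition and the literal search text are made variable.
GETRECORDS_TEMPLATE = """<?xml version='1.0' encoding='UTF-8' ?>
<GetRecords service="CSW" version="2.0.2" maxRecords="{max_records}" startPosition="{start}" resultType="results"
            outputFormat="application/xml" outputSchema="http://www.opengis.net/cat/csw/2.0.2"
            xmlns="http://www.opengis.net/cat/csw/2.0.2" xmlns:ogc="http://www.opengis.net/ogc"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Query typeNames="gmd:MD_Metadata">
        <ElementSetName>full</ElementSetName>
        <Constraint version="1.1.0">
            <ogc:Filter>
                <ogc:PropertyIsLike wildCard="%" singleChar="_" escapeChar="\\">
                    <ogc:PropertyName>csw:AnyText</ogc:PropertyName>
                    <ogc:Literal>{text}</ogc:Literal>
                </ogc:PropertyIsLike>
            </ogc:Filter>
        </Constraint>
    </Query>
</GetRecords>
"""


def build_getrecords(text, start=1, max_records=10):
    """Return the GetRecords XML body for a free-text search (hypothetical helper)."""
    return GETRECORDS_TEMPLATE.format(
        text=escape(text),            # escape &, < and > so the body stays well-formed
        start=int(start),             # CSW record positions are 1-based
        max_records=int(max_records),
    )
```

Calling `build_getrecords("Environmental monitoring")` reproduces the request shown above; raising `start` in steps of `max_records` would walk through the result set.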

However, if you need to visually browse the records, I recommend using the Resource Browser, because you get the results in one click and already translated into English:

 

http://inspire-geoportal.ec.europa.eu/proxybrowser/#fq=text:"Environmental monitoring"&fq=sourceMetadataResourceLocator:\/*&q=*:*


Best regards,
Angelo

 

#2 Updated by Katharina Schleidt about 1 year ago

Also here:

thanks for the info on how to access the CSW via POST; now I just need to know what you've added to the header to avoid the 415 Unsupported Media Type error.

As to the reason I'd like to access the CSW directly: as Facilitator I've been asked to scan all datasets provided by MS, and as a sane human being I'd prefer not to manually click through the more than 1000 records contained! Thus I'm looking to automate this process.

#3 Updated by Katharina Schleidt about 1 year ago

OK, found the content type online, the following does the trick

Content-type: application/xml
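
A minimal sketch of what that looks like from a script, using only the Python standard library. The endpoint URL is the one from the original request in this ticket; the function names are illustrative. The only thing that matters for the 415 error is that the POST carries the Content-Type header.

```python
import urllib.request

# Endpoint taken from the original request in this ticket.
CSW_URL = "http://inspire-geoportal.ec.europa.eu/GeoportalProxyWebServices/resources/OGCCSW202"


def make_csw_request(xml_body, url=CSW_URL):
    """Build an HTTP POST request carrying a GetRecords XML body."""
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        # Without this header the service answered 415 Unsupported Media Type.
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def post_getrecords(xml_body, url=CSW_URL, timeout=60):
    """Send the request and return the raw response bytes."""
    with urllib.request.urlopen(make_csw_request(xml_body, url), timeout=timeout) as resp:
        return resp.read()
```

`post_getrecords(body)` would then return the csw:GetRecordsResponse document as bytes, ready for an XML parser.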

#4 Updated by Angelo Quaglia about 1 year ago

From: QUAGLIA Angelo (JRC-ISPRA-EXT)
Sent: 28 May 2019 17:32
To: Kathi Schleidt
Cc: TOMAS Robert (JRC-ISPRA)
Subject: Re: CSW Access to GeoPortal
 

 

Dear Kathi,

that is hardly a secret as it is actually a requirement coming from the HTTP specifications and also described in the OGC CSW 2.0.2 specifications:

 

10.2.2 Message headers
The standard headers are defined in Section 14 of RFC 2616. Some of these are of
particular significance to catalogue operations.
Any HTTP/1.1 message containing an entity-body shall include a Content-Type header
field defining the media type of that body (RFC 2616, 7.2.1); the charset parameter shall
also be specified for text.
EXAMPLES 1
Content-Type: application/xml; charset=utf-8
Content-Type: application/octet-stream
Content-Type: multipart/related; boundary="part-boundary";

 

 

 

Best regards,

Angelo


#5 Updated by Angelo Quaglia about 1 year ago

Please note that should you need to query by country you can do that using the virtual endpoints:

OGCCSW202/AT

OGCCSW202/BE

...
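
In a script, the per-country endpoint could be derived as sketched here. This assumes the country suffix is appended to the same base URL used in the original request and that the codes are two uppercase letters, as in the AT and BE examples; the helper name is illustrative.

```python
# Base URL from the original request in this ticket (assumed to be the
# prefix for the virtual per-country endpoints as well).
CSW_BASE = "http://inspire-geoportal.ec.europa.eu/GeoportalProxyWebServices/resources/OGCCSW202"


def country_endpoint(country_code, base=CSW_BASE):
    """Return the virtual CSW endpoint for one country, e.g. .../OGCCSW202/AT."""
    code = country_code.strip().upper()
    if len(code) != 2 or not code.isalpha():
        raise ValueError(f"expected a two-letter country code, got {country_code!r}")
    return f"{base}/{code}"
```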

 

 

#6 Updated by Angelo Quaglia about 1 year ago

  • Assignee changed from Katharina Schleidt to Angelo Quaglia

#7 Updated by Katharina Schleidt about 1 year ago

Next issue - the schema references at the start of the csw:GetRecordsResponse have issues (missing xmlns:xsi, as well as the relevant entries in xsi:schemaLocation).

Once I added these missing bits (it seems to need all of the following as schemaLocation: "http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/csw.xsd http://www.isotc211.org/2005/gmd http://schemas.opengis.net/iso/19139/20070417/gmd/gmd.xsd http://www.isotc211.org/2005/gmx http://schemas.opengis.net/iso/19139/20070417/gmx/gmx.xsd"), the output is valid according to XMLSpy, but it would be far easier to access and process if one didn't first have to fix this!

 

#8 Updated by Angelo Quaglia about 1 year ago

Dear Katharina,

thank you for your comments.

In an XML document the schemaLocation attribute is always optional. If that is missing, the declaration of xmlns:xsi is also optional.

In addition, the schemas you point to are not all those that are needed for INSPIRE metadata.

The correct schema for INSPIRE is this one:

http://schemas.opengis.net/csw/2.0.2/profiles/apiso/1.0.0/apiso.xsd

(For Metadata Technical Guidance versions < 2.0, the schema is now only available as a zipped file here http://schemas.opengis.net/csw/2.0.2/profiles/apiso/apiso-1_0_0.zip)

In any case, before you spend more time on these minor issues, please take a look at what I wrote here:

 

Issue #3606

Dear Jonathan,

oh, absolutely: there are also other things that I would like to change but, as I wrote, the OGC CSW 2.0.2 adapter is currently not a priority and it is not up to me to change this.

The current OGC CSW 2.0.2 adapter of the INSPIRE Geoportal has never been officially released and it is made available on an as-is basis.

In 2013 I implemented the minimum needed to make harvesting possible from GeoNetwork and other clients (including the INSPIRE Geoportal itself).

OGC CSW 2.0.2 clients know that they need to read the capabilities and only use the bindings announced in the capabilities, as I advise you to do.

 

Back in 2013, I was in the process of passing the OGC compliance tests but I had to stop because their tool was not compliant:

https://portal.opengeospatial.org/?m=projects&a=view&project_id=85&tab=5&act=details&issue_id=864

later moved to GitHub:

https://github.com/opengeospatial/ets-csw202/issues/2

It took them a year and a half to fix the issue.

 

In any case, the purpose of the INSPIRE Geoportal has never been to serve metadata records of Member States. They are responsible for maintaining their National Discovery Services.

For this reason, the OGC CSW 2.0.2 adapter of INSPIRE Geoportal MODIFIES the fileIdentifiers upon serving the records. Please pay attention to this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<csw:GetRecordsResponse >
    <csw:SearchStatus timestamp="2019-05-30T15:24:13.720+02:00"/>
    <csw:SearchResults numberOfRecordsMatched="2127" numberOfRecordsReturned="10" nextRecord="11">
        <gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco"
                 xmlns:gmx="http://www.isotc211.org/2005/gmx"
                 xmlns:srv="http://www.isotc211.org/2005/srv"
                 xmlns:gml="http://www.opengis.net/gml/3.2/3.2/3.2"
                 xmlns:geonet="http://www.fao.org/geonetwork"
                 xmlns:xlink="http://www.w3.org/1999/xlink">
            <gmd:fileIdentifier>
                <gco:CharacterString xmlns:xs="http://www.w3.org/2001/XMLSchema"
                           xmlns:fn="http://www.w3.org/2005/xpath-functions">/INSPIRE-93ee1068-1dc3-11e7-a02d-52540023a883_20190506-202548/services/1/PullResults/1-20/datasets/3_ID_4f0c2330-491f-462f-a83f-66cc3324f699</gco:CharacterString>
            </gmd:fileIdentifier>
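
Apart from the rewritten identifiers, the csw:SearchResults attributes in the response above are what a harvesting loop needs for paging. The sketch below reads them with the standard library. It assumes the full response declares the csw namespace (the excerpt above omits the declarations) and, as I understand CSW 2.0.2, that nextRecord is 0 once no records remain; the extra bounds check is a defensive assumption, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"


def read_paging(response_xml):
    """Return (matched, returned, next_record) from a GetRecordsResponse."""
    root = ET.fromstring(response_xml)
    results = root.find(f"{{{CSW_NS}}}SearchResults")
    if results is None:
        raise ValueError("no csw:SearchResults element in response")
    matched = int(results.get("numberOfRecordsMatched", "0"))
    returned = int(results.get("numberOfRecordsReturned", "0"))
    next_record = int(results.get("nextRecord", "0"))
    return matched, returned, next_record


def next_start(matched, next_record):
    """Return the startPosition for the next request, or None when done."""
    if next_record == 0 or next_record > matched:
        return None
    return next_record
```

With the values shown above (2127 matched, nextRecord 11), the next request would use startPosition="11".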

Best regards,

Angelo

#9 Updated by Katharina Schleidt about 1 year ago

Hi Angelo,

sorry for being such a bother, and yes, I am quite aware of the GeoPortal as a tool for viewing the individual records.

One of our tasks as Cluster Facilitators is getting an overview of the types of datasets reported under our Themes. I have tried the process of accessing the individual MD records via the GeoPortal, clicking my way through the information provided. A very nice user interface, but not that well suited towards this type of processing (simple math, if I'm really fast and can analyze a MD record in 1 minute, with the 2000+ MD records for the Theme EF, I should be done in just over 33 hours. Not thinking about the content, just viewing!). As the fields I'm really looking for are minimal, it would be far easier to make an extract containing title, abstract, service linkage info (what formats are available), dump this in an Excel or DB, and then be able to just think and not just click!

I've now gotten the program logic sorted to the point where I can pull this information off of the CSW, but still have a few questions:

  • Are the English translations automatically generated, or provided in the original MD document? Is there any way of automatically accessing the English translations, or are these ONLY available via the GeoPortal interface?
  • Is there a way of aligning the fileIdentifier with the static URIs used for providing the 19115 MD via the GeoPortal?

Also - many thanks for the info on the country specific end points. I'd been trying to glean country information from the MD record, but this is missing in at least half the cases, this way is far easier

:)

Kathi

#10 Updated by Angelo Quaglia about 1 year ago

Hi Kathi,

you are welcome.

Many thanks for the explanation.

Now that your aim is clearer to me, I should be able to assist you in a better way.

If you formulate your query visually with the Resource Browser:

http://inspire-geoportal.ec.europa.eu/proxybrowser/#fq=text%3A%22Environmental%20monitoring%22&fq=sourceMetadataResourceLocator%3A%5C%2F*&q=*%3A*

and you inspect the requests it is firing through the browser, you can come up with something like this:

http://inspire-geoportal.ec.europa.eu/solr/select?q=*:*&rows=100&start=10&wt=json&fl=id,resourceType,resourceTitle,resourceTitle_provided_eng,resourceTitle_automated_eng,resourceAbstract,resourceAbstract_provided_eng,resourceAbstract_automated_eng,inspireTheme

If you omit wt=json, you get back xml instead of json.
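
Building that URL in code avoids stray spaces or unescaped characters in the field list. A sketch with the standard library follows; the endpoint and field names are those from the URL above, while the function name and its defaults are illustrative.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://inspire-geoportal.ec.europa.eu/solr/select"

# Field list taken from the example URL above.
FIELDS = [
    "id", "resourceType",
    "resourceTitle", "resourceTitle_provided_eng", "resourceTitle_automated_eng",
    "resourceAbstract", "resourceAbstract_provided_eng", "resourceAbstract_automated_eng",
    "inspireTheme",
]


def solr_select_url(rows=100, start=0, as_json=True, fields=FIELDS):
    """Return a select URL; urlencode percent-escapes '*', ':' and ','."""
    params = [("q", "*:*"), ("rows", rows), ("start", start)]
    if as_json:
        params.append(("wt", "json"))   # omit to get XML back instead
    params.append(("fl", ",".join(fields)))
    return SOLR_SELECT + "?" + urlencode(params)
```

The resulting URL is percent-encoded, so it looks different from the hand-written one but decodes to the same parameters.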

 

Translations

We have recently rolled out automatic support for translations of the Resource Title and Resource Abstract of metadata documents.

Not all countries have published the updated harvesting results, so translations are not available for all Member States, as of now.

The Resource Browser, however, automatically translates all non-English content on-the-fly.

Translations are both automated and (partially) provided.

Some Member States provide English translations for some of the metadata fields but this is not mandatory in INSPIRE.

Published translations are available via specific fields, for example:

resourceTitle is in the language of the metadata.

resourceTitle_provided_eng is the English translation, present if provided.

resourceTitle_automated_eng is the English translation, present if the metadata language is not English and new harvesting results have been published by the Member State.

Using the above URL that would become:

http://inspire-geoportal.ec.europa.eu/solr/select?q=*:*&rows=100&start=10&wt=json&fl=id,resourceType,resourceTitle,resourceTitle_provided_eng,resourceTitle_automated_eng,resourceAbstract,resourceAbstract_provided_eng,resourceAbstract_automated_eng,inspireTheme&fq=memberStateCountryCode:cz
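
Given the three field variants described above, a client could pick the best available text per record as sketched below. The preference order (provided translation, then automated translation, then the original language) is an assumption on my part, as is treating each record as a plain dict of strings; the helper name is illustrative.

```python
def english_text(doc, field):
    """Return (text, source_key) for a field such as 'resourceTitle'.

    Tries <field>_provided_eng, then <field>_automated_eng, then the
    untranslated <field>; returns (None, None) if none is present.
    """
    for key in (f"{field}_provided_eng", f"{field}_automated_eng", field):
        value = doc.get(key)
        if value:
            return value, key
    return None, None
```

Returning the key alongside the text lets a later step flag records that still need translating because only the original-language field was found.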

Best regards.

Angelo

 

#11 Updated by Angelo Quaglia about 1 year ago

  • Is there a way of aligning the fileIdentifier with the static URIs used for providing the 19115 MD via the GeoPortal?

I am not sure I understand. Concerning the identifiers provided by the GetRecords operation, the part in yellow is the INSPIRE URI, the part in green is the original fileIdentifier:

            <gmd:fileIdentifier>
                <gco:CharacterString xmlns:xs="http://www.w3.org/2001/XMLSchema"
                           xmlns:fn="http://www.w3.org/2005/xpath-functions">/INSPIRE-93ee1068-1dc3-11e7-a02d-52540023a883_20190506-202548/services/1/PullResults/1-20/datasets/3_ID_4f0c2330-491f-462f-a83f-66cc3324f699</gco:CharacterString>
            </gmd:fileIdentifier>

Please note that, currently, some National Discovery Services still provide duplicate fileIdentifiers.

The old geoportal was able to resolve the fileIdentifier in the User Interface providing HTTP URIs but this has not yet been ported to the new User Interface.

 

 

 

#12 Updated by Katharina Schleidt about 1 year ago

Thanks Angelo,

while the solr links you provided didn't work for me (only returned original language), analysing the GeoPortal calls showed me that you're also running a translation service :)

I'd added Google translations to my crawler over the weekend, but once I started bulk conversions, they blocked me ;)

Thus, instead of accessing the solr URL as recommended, I'm continuing with pulling data off the CSW, and very much hope JRC doesn't block me on the translation service!!!

#13 Updated by Angelo Quaglia about 1 year ago

Dear Kathi,

the actual Translation Service of the European Commission is here:

https://ec.europa.eu/cefdigital/wiki/display/CEFDIGITAL/Machine+translation

Please consult the conditions of use.

 

As I wrote above and as you noticed, the Resource Browser automatically translates all non-English content on-the-fly using the proxy made available by the INSPIRE Geoportal.

The proxy exposes an easy HTTP GET client and caches the translation results so that the Commission Service is not flooded with requests for translating the same text over and over again.

Please ensure you use it only for translating the metadata of the INSPIRE Geoportal.

As soon as all Member States have published translated harvestings, the proxy will only serve translations present in its cache.

 

 

Either you did not copy and paste the URL correctly, or you did not look hard enough:

http://inspire-geoportal.ec.europa.eu/solr/select?q=*:*&rows=1000&start=1000&fl=id,resourceType,resourceTitle,resourceTitle_provided_eng,resourceTitle_automated_eng,resourceAbstract,resourceAbstract_provided_eng,resourceAbstract_automated_eng,inspireTheme&fq=memberStateCountryCode:cz

 

 

Best regards,

Angelo

 

#14 Updated by Katharina Schleidt about 1 year ago

Hi Angelo,

maybe you get a different response from within JRC?

I reduced the count to 10, tried the following:

http://inspire-geoportal.ec.europa.eu/solr/select?q=*:*&rows=10&start=1000&fl=id,resourceType,resourceTitle,resourceTitle_provided_eng,resourceTitle_automated_eng,resourceAbstract,resourceAbstract_provided_eng,resourceAbstract_automated_eng,inspireTheme&fq=memberStateCountryCode:cz

Get the following response, but don't really know what to do with it:

<?xml version="1.0" encoding="UTF-8"?>
<response>
    <result name="response" numFound="2790" start="1000">
        <doc>
            <str name="id">/INSPIRE-16542303-763e-11e4-8b38-52540004b857_20190521-115116/services/1/PullResults/91-100/services/5/resourceLocator1/download/services/1/downloadDatasets/59/entry1/resourceLocator1/spatialObjects/2</str>
        </doc>
        <doc>
            <str name="id">/INSPIRE-16542303-763e-11e4-8b38-52540004b857_20190521-115116/services/1/PullResults/91-100/services/5/resourceLocator1/download/services/1/downloadDatasets/59/entry2/resourceLocator1/spatialObjects/1</str>
        </doc>
        <doc>
            <str name="id">/INSPIRE-16542303-763e-11e4-8b38-52540004b857_20190521-115116/services/1/PullResults/91-100/services/5/resourceLocator1/download/services/1/downloadDatasets/59/entry2/resourceLocator1/spatialObjects/2</str>
        </doc>
        <doc>
            <str name="resourceTitle">INSPIRE - adresní místa - obec - Tršice [505366]</str>
            <str name="id">/INSPIRE-16542303-763e-11e4-8b38-52540004b857_20190521-115116/services/1/PullResults/91-100/services/5/resourceLocator1/download/services/1/downloadDatasets/59</str>
        </doc>
        <doc>
            <str name="id">/INSPIRE-16542303-763e-11e4-8b38-52540004b857_20190521-115116/services/1/PullResults/91-100/services/5/resourceLocator1/download/services/1/downloadDatasets/94/entry1/resourceLocator1/spatialObjects/1</str>
        </doc>
        <doc>
            <str name="id">/INSPIRE-16542303-763e-11e4-8b38-52540004b857_20190521-115116/services/1/PullResults/91-100/services/5/resourceLocator1/download/services/1/downloadDatasets/94/entry1/resourceLocator1/spatialObjects/2</str>
        </doc>
        <doc>
            <str name="id">/INSPIRE-16542303-763e-11e4-8b38-52540004b857_20190521-115116/services/1/PullResults/91-100/services/5/resourceLocator1/download/services/1/downloadDatasets/94/entry2/resourceLocator1/spatialObjects/1</str>
        </doc>
        <doc>
            <str name="id">/INSPIRE-16542303-763e-11e4-8b38-52540004b857_20190521-115116/services/1/PullResults/91-100/services/5/resourceLocator1/download/services/1/downloadDatasets/94/entry2/resourceLocator1/spatialObjects/2</str>
        </doc>
        <doc>
            <str name="resourceTitle">INSPIRE - adresní místa - obec - Hradec nad Moravicí [507270]</str>
            <str name="id">/INSPIRE-16542303-763e-11e4-8b38-52540004b857_20190521-115116/services/1/PullResults/91-100/services/5/resourceLocator1/download/services/1/downloadDatasets/94</str>
        </doc>
        <doc>
            <str name="id">/INSPIRE-16542303-763e-11e4-8b38-52540004b857_20190521-115116/services/1/PullResults/91-100/services/5/resourceLocator1/download/services/1/downloadDatasets/77/entry1/resourceLocator1/spatialObjects/1</str>
        </doc>
    </result>
</response>

#15 Updated by Angelo Quaglia about 1 year ago

Hi Kathi,

if you reduce the number of records, you are less likely to stumble upon a record that contains the desired fields.

The 10 records you selected represent spatialObjects, not metadata documents.

If you do not ask to return only the representations of metadata documents, you will get all kinds of representations (Network Services, Layers, etc.).

You can use the Resource Browser to refine your query.

 

http://inspire-geoportal.ec.europa.eu/solr/select?q=*:*&rows=1000&start=1000&fl=id,resourceType,resourceTitle,resourceTitle_provided_eng,resourceTitle_automated_eng,resourceAbstract,resourceAbstract_provided_eng,resourceAbstract_automated_eng,inspireTheme&fq=memberStateCountryCode:cz
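
To separate the two kinds of entries in such an XML response, a sketch along these lines may help. It assumes the response layout shown in the previous comment (a `result` element holding `doc` elements with `str` children) and treats a record as a metadata document when it carries a resourceTitle, which is a heuristic drawn from that sample rather than a documented rule; the function names are illustrative.

```python
import xml.etree.ElementTree as ET


def docs_from_solr_xml(response_xml):
    """Return (numFound, docs), each doc being a dict of its <str> fields."""
    root = ET.fromstring(response_xml)
    result = root.find("result")
    if result is None:
        raise ValueError("no <result> element in response")
    docs = []
    for doc in result.findall("doc"):
        # Only <str> children are read; other Solr value types are ignored here.
        docs.append({el.get("name"): el.text for el in doc.findall("str")})
    return int(result.get("numFound")), docs


def with_title(docs):
    """Keep only records that carry a resourceTitle."""
    return [d for d in docs if "resourceTitle" in d]
```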

 

 

 

#16 Updated by Katharina Schleidt about 1 year ago

True, if one queries 1000 records, there are a few translations. 5 of the records contain resourceTitle_automated_eng and resourceAbstract_automated_eng, but these are missing for the other 995.

Thus will continue to hope that JRC doesn't turn off the automated translation I'm running, don't wanna do the work of setting up other access points to keep down the number of requests from one IP! ;)

#17 Updated by Angelo Quaglia about 1 year ago

That is normal.

The INSPIRE Geoportal only stores automated translations for metadata records of datasets and services.

For the Czech Republic, the INSPIRE Geoportal creates 2790 records.

However, the actual metadata records are just 126 datasets, 23 series and 163 services.

Of the 126 datasets, 10 are already provided in English, so the INSPIRE Geoportal only translates 116 into English.

The series are 23, all in Czech, so all are translated.

So, you will find translations in only about 5% of the records returned for the Czech Republic (116 + 23 = 139 out of 2790).

If you add the clause:

"fq=sourceMetadataResourceLocator:\/*" and request 314 rows, you will improve your chances.

If you then also add:

"fq=resourceType:(dataset or series)"

most of the results will contain translations.
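
Several fq clauses can be sent as repeated parameters. The sketch below builds such a URL with the standard library, reusing the select endpoint and the clauses from this thread. One deliberate difference: the Boolean operator is written as uppercase `OR`, since Solr's standard query parser only recognises operators in uppercase; the function name and defaults are illustrative.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://inspire-geoportal.ec.europa.eu/solr/select"


def metadata_only_url(country, rows=314, start=0):
    """Return a select URL restricted to dataset and series metadata records."""
    params = [
        ("q", "*:*"),
        ("rows", rows),
        ("start", start),
        ("fq", f"memberStateCountryCode:{country}"),
        # The backslash escapes the slash for Solr, as in the clause quoted above.
        ("fq", r"sourceMetadataResourceLocator:\/*"),
        ("fq", "resourceType:(dataset OR series)"),
    ]
    return SOLR_SELECT + "?" + urlencode(params)
```

`metadata_only_url("cz")` gives the Czech query with 314 rows; an `fl` parameter listing the wanted fields can be appended in the same way.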

 

 

From my side, as long as you only translate the text coming from the metadata contained in the INSPIRE Geoportal, I do not see a reason to block you.

Concerning what the JRC network administrators might think about your requests, it is a nice test. However, the INSPIRE Geoportal receives tens of thousands of requests per month for other services, so you should be fine. Let me know.

If you want wider access, please contact the actual Translation Service of the European Commission:

https://ec.europa.eu/cefdigital/wiki/display/CEFDIGITAL/Machine+translation

 

 

 
