Support #2867

BE - FEDERAL: Duplicate records received from Belgium Federal Discovery Service

Added by Angelo Quaglia almost 4 years ago. Updated over 3 years ago.

Status: Feedback
Start date: 18 Oct 2016
Priority: Normal
Due date:
Assignee: Angelo Quaglia
% Done: 0%
Category: Harvesting results
Target version: -
Submitting Organisation: BE - FEDERAL
Knowledge-Base relevant?:
Proactive: No
Keyword #1:
Country: BE - Belgium
Keyword #2:
Originating UI:
Keyword #3:

Description

From: Alain CAMUS [mailto:Alain.Camus@ngi.be]
Sent: 18 October 2016 09:36
To: angelo.quaglia@ext.jrc.ec.europa.eu
Cc: Danny Vandenbroucke <danny.vandenbroucke@kuleuven.be>; Nathalie DELATTRE <Nathalie.Delattre@ngi.be>; Dominique FLANDROIT <Dominique.Flandroit@ngi.be>
Subject: RE: [Geoportal Helpdesk - Support #2865] BE-CIRB-CIBG-BRIC: Double entries in CSW response

 

Hi Angelo,

 

I understand that Belgium's federal discovery service (http://csw.geo.be/eng/csw?request=GetCapabilities&service=CSW&version=2.0.2) is returning duplicates. I can't find any evidence of this: the links in your e-mail below from 17 October at 16:23 all return 404 errors for me.

 

Could you show me what the problem is?

 

Regards,

 

Alain Camus

Web developer

National Geographic Institute of Belgium

 

From: Nathalie DELATTRE
Sent: Tuesday 18 October 2016 09:14
To: Alain CAMUS <Alain.Camus@ngi.be>; Dominique FLANDROIT <Dominique.Flandroit@ngi.be>
Cc: Danny Vandenbroucke <danny.vandenbroucke@kuleuven.be>
Subject: FW: [Geoportal Helpdesk - Support #2865] BE-CIRB-CIBG-BRIC: Double entries in CSW response

 

Alain, Dominique

 

Hello, the cadastre and others have noticed that duplicates are being generated in the metadata harvested from our discovery service.

Could you check with Angelo Quaglia where the problem with our service comes from?

 

Kind regards

 

Nathalie

 

From: DU MORTIER François [mailto:fdumortier@cirb.brussels]
Sent: Monday 17 October 2016 18:21
To: Nathalie DELATTRE <Nathalie.Delattre@ngi.be>
Cc: VANDEBOEL Gustaaf <gvandeboel@cibg.brussels>; STREIGNARD Vincent <vstreignard@cirb.brussels>
Subject: FW: [Geoportal Helpdesk - Support #2865] BE-CIRB-CIBG-BRIC: Double entries in CSW response

 

Hello Nathalie,

 

According to the European Commission, and more specifically Angelo Quaglia, your discovery service returns duplicates when it is queried in blocks of 10 metadata records. We solved this problem on our side; it was caused by the load balancers: the batches of 10 metadata records did not all come from the same instance, and each instance worked with its own pointers. Perhaps you are facing the same problem. Just in case, I am copying the colleagues who worked on solving it; they may be able to point you towards the solution.

 

Best regards,

 

François

 

François DU MORTIER
Service Head
Customer Solutions - Consultancy
Avenue des Arts 21, 1000 Bruxelles - cirb.brussels

From: sdi-notify@jrc.ec.europa.eu [mailto:sdi-notify@jrc.ec.europa.eu]
Sent: Monday 17 October 2016 16:23
Subject: [Geoportal Helpdesk - Support #2865] BE-CIRB-CIBG-BRIC: Double entries in CSW response

 

Issue #2865 has been updated by Angelo Quaglia.

Dear Francois,

it is very weird, indeed.

I am sorry to ask again, but just to rule that possibility out: I know that earlier versions of GeoNetwork had a default setting about re-indexing. I am sure you have already checked it, but just in case.

In any case, as you can see from the GetRecordsResponse envelopes below, the nextRecord values clearly confirm that the INSPIRE Geoportal asked each time for the correct startPosition with maxRecords=10.

The content of the stored GetRecordsResponse envelopes confirms that it was GeoNetwork that sent back duplicate records, despite the request arguments.

This suggests that something may be wrong with GeoNetwork and its windowing algorithm:

http://inspire-geoportal.ec.europa.eu/resources/INSPIRE-56ef5d74-17be-11e4-ae76-52540004b857_20161016-185012/services/1/PullResults/1-10/downloaded.xml

<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd">

<csw:SearchStatus timestamp="2016-10-16T18:50:54"/>

<csw:SearchResults numberOfRecordsMatched="90" numberOfRecordsReturned="10" elementSet="full" nextRecord="11">

 

http://inspire-geoportal.ec.europa.eu/resources/INSPIRE-56ef5d74-17be-11e4-ae76-52540004b857_20161016-185012/services/1/PullResults/11-20/downloaded.xml

<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd">

<csw:SearchStatus timestamp="2016-10-16T18:51:24"/>

<csw:SearchResults numberOfRecordsMatched="90" numberOfRecordsReturned="10" elementSet="full" nextRecord="21">

http://inspire-geoportal.ec.europa.eu/resources/INSPIRE-56ef5d74-17be-11e4-ae76-52540004b857_20161016-185012/services/1/PullResults/21-30/downloaded.xml

<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd">

<csw:SearchStatus timestamp="2016-10-16T18:51:47"/>

<csw:SearchResults numberOfRecordsMatched="90" numberOfRecordsReturned="10" elementSet="full" nextRecord="31">

http://inspire-geoportal.ec.europa.eu/resources/INSPIRE-56ef5d74-17be-11e4-ae76-52540004b857_20161016-185012/services/1/PullResults/31-40/downloaded.xml 

<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd">

<csw:SearchStatus timestamp="2016-10-16T18:56:09"/>

<csw:SearchResults numberOfRecordsMatched="90" numberOfRecordsReturned="10" elementSet="full" nextRecord="41">
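The nextRecord check described above can be automated. The following is a minimal Python sketch (the helper names are hypothetical, not part of the INSPIRE Geoportal) that reads the csw:SearchResults attributes from a stored downloaded.xml excerpt and checks that nextRecord equals startPosition + numberOfRecordsReturned. A regular expression is used instead of a full XML parser because the excerpts above are truncated and not well-formed.

```python
import re

# Opening csw:SearchResults tag and its attributes. A regex is used because the
# stored excerpts are truncated and therefore not well-formed XML.
_TAG = re.compile(r"<csw:SearchResults\b([^>]*)>")
_ATTR = re.compile(r'(\w+)="([^"]*)"')


def search_results_attrs(fragment: str) -> dict:
    """Return the attributes of the first csw:SearchResults tag as a dict."""
    match = _TAG.search(fragment)
    if match is None:
        raise ValueError("no csw:SearchResults tag found")
    return dict(_ATTR.findall(match.group(1)))


def window_is_consistent(start_position: int, fragment: str) -> bool:
    """Check that nextRecord == startPosition + numberOfRecordsReturned."""
    attrs = search_results_attrs(fragment)
    returned = int(attrs["numberOfRecordsReturned"])
    return int(attrs["nextRecord"]) == start_position + returned


# Header of the 11-20 page quoted above.
page_11_20 = ('<csw:SearchResults numberOfRecordsMatched="90" '
              'numberOfRecordsReturned="10" elementSet="full" nextRecord="21">')
print(window_is_consistent(11, page_11_20))  # True
```

A consistent header, as in all four pages above, only shows that each window was requested correctly; it says nothing about whether the records inside the page are the right ones.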

 


Support #2865: BE-CIRB-CIBG-BRIC: Double entries in CSW response

  • Author: François Du Mortier
  • Status: Feedback
  • Priority: Normal
  • Assignee: Angelo Quaglia
  • Category: Harvesting results
  • Target version:
  • Submitting Organisation: CIRB-CIBG-BRIC
  • Proactive: No
  • Country: BE - Belgium

Dear all,

I have observed a strange behaviour with your CSW (http://inspire-geoportal.ec.europa.eu/GeoportalProxyWebServices/resources/OGCCSW202/BE): when reading the metadata in packets of 10, some records appear twice. For instance, today we got duplicate records for the following identifiers:

"c74e45c9-f51e-4639-89d3-ba49009fda1c","Vlaamse Hydrografische Atlas - Zones, 1 december 2014"
"E5274E60-D896-42D0-A8E7-5FF83D9AC7EC","Erosiegevoelige gebieden (Watertoets)"
"F9DDA633-1F45-483B-8227-91A466646329","Digitale boswijzer Vlaanderen 2010"
"c7aef6a2-7ffa-418f-91e9-ea1f24d1249f","G3Dv2_ 0315, voorkomen van basis Fm van Hannut"
"c7ceb6e5-ebcc-4f7a-856c-6c8f4962ccb3","G3Dv2_0802, breuken in basis Namuriaan"
"f99353d7-bb34-45cc-8304-ddf5873b5e0e","G3Dv2_0307_PA_Ma, basis Fm van Maldegem"
"a1644822-1a3e-4975-a32b-312718ecdff5","Belgian regions"
"d2a2b095-d0d1-4bb8-ac5a-7988444464dd","1GE GSB 1:250.000 surface Geologic Unit"
"d5f503fe-c228-48a6-9f00-927c95bbd450","Statistical districts of Belgium, 2011"
"d68c5d6b-13df-4fe1-8b4a-4a7a8219ea45","Core Cities in Belgium, Urban audit"
"dbbf39e3-9a2f-4a9c-ad6f-20b85e5717d3","Sand and gravel exploitation area 1a (MB/BS 20140328)"
"bf431c9a-84a7-4de6-8b08-f7cbffb0ac39","The CADGIS cadastral parcel plan"
"c6c6d674-e3c5-47a9-8398-f4e1082d3e8e","Top10Vector-High-Tension Network"
"dbbf39e3-9a2f-4a9c-ad6f-20b85e5717d3","Sand and gravel exploitation area 1a (MB/BS 20140328)"

The results fluctuate. On October 11th, we got:

"f288e3d2-7e38-40f5-a664-f0e42a499167","Positions des lignes d'arrêt STIB"
"c74e45c9-f51e-4639-89d3-ba49009fda1c","Vlaamse Hydrografische Atlas - Zones, 1 december 2014"
"E5274E60-D896-42D0-A8E7-5FF83D9AC7EC","Erosiegevoelige gebieden (Watertoets)"
"F9DDA633-1F45-483B-8227-91A466646329","Digitale boswijzer Vlaanderen 2010"
"c7aef6a2-7ffa-418f-91e9-ea1f24d1249f","G3Dv2_ 0315, voorkomen van basis Fm van Hannut"
"c7ceb6e5-ebcc-4f7a-856c-6c8f4962ccb3","G3Dv2_0802, breuken in basis Namuriaan"
"f99353d7-bb34-45cc-8304-ddf5873b5e0e","G3Dv2_0307_PA_Ma, basis Fm van Maldegem"
"a85be844-16eb-4ef8-bbad-b3fdc1e35117","Special Protection Area for birds zone 1 (MB/BS 20140328)"
"d2a2b095-d0d1-4bb8-ac5a-7988444464dd","1GE GSB 1:250.000 surface Geologic Unit"
"d5f503fe-c228-48a6-9f00-927c95bbd450","Statistical districts of Belgium, 2011"
"d68c5d6b-13df-4fe1-8b4a-4a7a8219ea45","Core Cities in Belgium, Urban audit"
"fa117329-f180-4ae6-a08a-99392fc07095","Functional Urban Areas in Belgium, Urban Audit"
"RMI_DATASET_CLIMATE_STATISTICS_57f649a854cb25.13340646","Climate statistics"
"RMI_DATASET_EXTREME_PRECIPITATION_57f64acaaa52d8.67252786","Extreme precipitation"
"RMI_DATASET_LIDAR_57fc18c3ab3bd4.64133419","Lidar"
"ea430cb8-2805-4c67-8392-9e2a47c1ef55","Population distribution on a grid, Belgium, 2011"
"fa117329-f180-4ae6-a08a-99392fc07095","Functional Urban Areas in Belgium, Urban Audit"
"ea430cb8-2805-4c67-8392-9e2a47c1ef55","Population distribution on a grid, Belgium, 2011"
"RMI_DATASET_ALARO_57fc17a3a13643.03731392","Weather Model Alaro"
"RMI_DATASET_ALARO_57fc17a3a13643.03731392","Weather Model Alaro"

We experienced the same issue with our own GeoNetwork-based CSW, and it was due to the load balancer: during the harvest, each request for the next batch of 10 records was not always routed to the same machine, and the machines did not always return the metadata in the same order. As a consequence, successive responses each contain 10 records, but taken from differently ordered lists because they come from different machines, so some records appear twice.
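François's explanation can be illustrated with a small simulation. This is a hypothetical Python model, not GeoNetwork code: each instance behind the load balancer holds the same records in a different order, and each page request may be routed to a different instance. Slicing consecutive pages out of differently ordered lists yields duplicates (and, symmetrically, gaps).

```python
from typing import Callable, List


def paged_harvest(replicas: List[List[str]], page_size: int,
                  route: Callable[[int], int]) -> List[str]:
    """Harvest all records page by page.

    replicas: the same record set, as ordered by each backend instance.
    route(page_number) returns the index of the instance answering that page.
    """
    total = len(replicas[0])
    harvested: List[str] = []
    for page_number, start in enumerate(range(0, total, page_size)):
        ordering = replicas[route(page_number)]
        harvested.extend(ordering[start:start + page_size])
    return harvested


records = [f"r{i:02d}" for i in range(20)]
# Two instances with different orderings (here simply reversed).
replicas = [records, list(reversed(records))]

# Alternate between instances, as a round-robin load balancer might.
result = paged_harvest(replicas, page_size=10, route=lambda page: page % 2)
print(len(result), len(set(result)))  # 20 10
```

When every page is answered by the same instance (route=lambda page: 0), each record appears exactly once, which is consistent with the load balancer being the cause in François's setup.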

Regards,

François Du Mortier


Related issues

Related to Geoportal Helpdesk - Support #2894: EU: Summary and status of duplicate metadata fileIdentifiers Assigned 23 Dec 2016

History

#1 Updated by Angelo Quaglia almost 4 years ago

From: Angelo Quaglia [mailto:angelo.quaglia@ext.jrc.ec.europa.eu]
Sent: 18 October 2016 12:50
To: 'Alain CAMUS' <Alain.Camus@ngi.be>
Cc: 'Danny Vandenbroucke' <danny.vandenbroucke@kuleuven.be>; 'Nathalie DELATTRE' <Nathalie.Delattre@ngi.be>; 'Dominique FLANDROIT' <Dominique.Flandroit@ngi.be>
Subject: RE: [Geoportal Helpdesk - Support #2865] BE-CIRB-CIBG-BRIC: Double entries in CSW response
Importance: High

 

Dear Alain,

Your CSW service was harvested again last night, so the old results have been moved to the history area and the links have changed. You can find the updated links below (*).

 

However, last night’s harvesting returned different duplicates.

 

I will now walk you through the steps to determine today’s duplicates.

 

At any time, you can list the fileIdentifiers that occur more than once in the INSPIRE Geoportal by opening the following URL in your browser:

 

http://inspire-geoportal.ec.europa.eu/solr/select?facet=true&q=(id:\/INSPIRE-56ef5d74-17be-11e4-ae76-52540004b857*%20AND%20sourceMetadataResourceLocator:\/*)&facet.field=remoteMetadataIdentifier&facet.limit=-1&facet.mincount=2&rows=0

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">23</int>
<lst name="params">
<str name="q">
(id:\/INSPIRE-56ef5d74-17be-11e4-ae76-52540004b857* AND sourceMetadataResourceLocator:\/*)
</str>
<str name="facet.limit">-1</str>
<str name="facet.field">remoteMetadataIdentifier</str>
<str name="facet.mincount">2</str>
<str name="rows">0</str>
<str name="facet">true</str>
</lst>
</lst>
<result name="response" numFound="89" start="0"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="remoteMetadataIdentifier">
<int name="6e741b67-21cb-4dfb-acc8-1cf78c342982">2</int>
<int name="706cd0654ec6d2b12e6279a907ba03ccb72586a">2</int>
<int name="76949313-923f-4d87-a4d9-a6ad977ba501">2</int>
<int name="7b467d8e-d72a-4285-8dd7-edb2a1d9132c">2</int>
<int name="810521fd-5822-4546-b73e-e3201be224e0">2</int>
<int name="91862c56-0c4c-42ae-a3ee-1b32f582a431">2</int>
<int name="9516809a-4240-4bd7-b686-aebb8286975b">2</int>
<int name="a85be844-16eb-4ef8-bbad-b3fdc1e35117">2</int>
<int name="aeea5743-57a2-47cb-a884-272ea6c09944">2</int>
<int name="b47f2ffd-ebc9-413c-903f-d83af520fcdb">2</int>
<int name="d2a2b095-d0d1-4bb8-ac5a-7988444464dd">2</int>
<int name="d3e4b2b2-de99-44fe-81e1-e2979e336559">2</int>
<int name="dbbf39e3-9a2f-4a9c-ad6f-20b85e5717d3">2</int>
<int name="dd9e42f7-9094-4c69-bf8f-1d2015d28d5c">2</int>
<int name="e14a8d0f-745e-4db3-931e-b0a789579207">2</int>
<int name="e37b7aed-e147-494a-b25e-5bcb83ab862f">2</int>
</lst>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
</response>
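The Solr query above can also be run from a script. The following Python sketch is an illustration, not an official Geoportal client: it rebuilds the same select URL from the parameters echoed in the responseHeader, and extracts the identifiers listed under the remoteMetadataIdentifier facet, each of which occurs at least twice because of facet.mincount=2. The endpoint may no longer be online, so the example parses a shortened copy of the response.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Dict

SOLR_SELECT = "http://inspire-geoportal.ec.europa.eu/solr/select"


def duplicate_query_url(harvest_id: str) -> str:
    """Build the facet query shown above for a given harvest id prefix."""
    params = {
        "facet": "true",
        "q": f"(id:\\/{harvest_id}* AND sourceMetadataResourceLocator:\\/*)",
        "facet.field": "remoteMetadataIdentifier",
        "facet.limit": "-1",     # return every facet value
        "facet.mincount": "2",   # only identifiers indexed at least twice
        "rows": "0",             # facet counts only, no documents
    }
    return SOLR_SELECT + "?" + urllib.parse.urlencode(params)


def duplicated_identifiers(solr_xml: str) -> Dict[str, int]:
    """Map each fileIdentifier in the facet to its number of occurrences."""
    root = ET.fromstring(solr_xml)
    facet = root.find(".//lst[@name='remoteMetadataIdentifier']")
    if facet is None:
        return {}
    return {item.get("name"): int(item.text) for item in facet.findall("int")}


sample = (
    '<response><lst name="facet_counts"><lst name="facet_fields">'
    '<lst name="remoteMetadataIdentifier">'
    '<int name="6e741b67-21cb-4dfb-acc8-1cf78c342982">2</int>'
    '<int name="a85be844-16eb-4ef8-bbad-b3fdc1e35117">2</int>'
    '</lst></lst></lst></response>'
)
print(duplicated_identifiers(sample))
```

An empty remoteMetadataIdentifier element yields an empty dict, which corresponds to a harvest without duplicates.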

 

 

Then, you can search for those in the Resource Browser:

http://inspire-geoportal.ec.europa.eu/proxybrowser/

 

For example, the first in the list is:

6e741b67-21cb-4dfb-acc8-1cf78c342982

 

This is what you get:

[Screenshot: Resource Browser search result]

Click on the “Inspire Metadata” link and a dialog opens:

[Screenshot: Inspire Metadata dialog]

Scroll down in the dialog and click on the symbol boxed in red in the previous figure.

 

A new browser window opens on this URL:

http://inspire-geoportal.ec.europa.eu/resources/INSPIRE-56ef5d74-17be-11e4-ae76-52540004b857_20161017-185121/services/1/PullResults/31-40/services/1/

The GetRecords response that was received is here: http://inspire-geoportal.ec.europa.eu/resources/INSPIRE-56ef5d74-17be-11e4-ae76-52540004b857_20161017-185121/services/1/PullResults/31-40/downloaded.xml

We do find “6e741b67-21cb-4dfb-acc8-1cf78c342982”:

<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd">

<csw:SearchStatus timestamp="2016-10-17T18:55:54"/>

<csw:SearchResults numberOfRecordsMatched="90" numberOfRecordsReturned="10" elementSet="full" nextRecord="41">

<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:srv="http://www.isotc211.org/2005/srv" xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:gml="http://www.opengis.net/gml" xmlns:geonet="http://www.fao.org/geonetwork" xsi:schemaLocation="http://www.isotc211.org/2005/srv http://schemas.opengis.net/iso/19139/20060504/srv/srv.xsd">

<gmd:fileIdentifier>

<gco:CharacterString>6e741b67-21cb-4dfb-acc8-1cf78c342982</gco:CharacterString>

</gmd:fileIdentifier>

Click on “Inspire Metadata” for the second record and repeat the same steps:

http://inspire-geoportal.ec.europa.eu/resources/INSPIRE-56ef5d74-17be-11e4-ae76-52540004b857_20161017-185121/services/1/PullResults/21-30/services/6/

The GetRecords response that was received is here: http://inspire-geoportal.ec.europa.eu/resources/INSPIRE-56ef5d74-17be-11e4-ae76-52540004b857_20161017-185121/services/1/PullResults/21-30/downloaded.xml

We do find “6e741b67-21cb-4dfb-acc8-1cf78c342982”:

<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd">

<csw:SearchStatus timestamp="2016-10-17T18:52:23"/>

<csw:SearchResults numberOfRecordsMatched="90" numberOfRecordsReturned="10" elementSet="full" nextRecord="31">

<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:srv="http://www.isotc211.org/2005/srv" xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:gml="http://www.opengis.net/gml" xmlns:geonet="http://www.fao.org/geonetwork" xsi:schemaLocation="http://www.isotc211.org/2005/srv http://schemas.opengis.net/iso/19139/20060504/srv/srv.xsd">

<gmd:fileIdentifier>

<gco:CharacterString>6e741b67-21cb-4dfb-acc8-1cf78c342982</gco:CharacterString>

</gmd:fileIdentifier>

As you can see, the duplicate records were actually received by the INSPIRE Geoportal.

Please note that the INSPIRE Geoportal adds a sort clause to each GetRecords request:

<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:ogc="http://www.opengis.net/ogc" xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0" service="CSW" version="2.0.2" maxRecords="1" startPosition="31" resultType="results" outputSchema="http://www.isotc211.org/2005/gmd" outputFormat="application/xml">
  <csw:Query typeNames="gmd:MD_Metadata">
    <csw:ElementSetName>full</csw:ElementSetName>
    <ogc:SortBy>
      <ogc:SortProperty>
        <ogc:PropertyName>apiso:Identifier</ogc:PropertyName>
        <ogc:SortOrder>ASC</ogc:SortOrder>
      </ogc:SortProperty>
    </ogc:SortBy>
  </csw:Query>
</csw:GetRecords>
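The request shown above can be generated for each page of a harvest. This Python sketch is hypothetical (it is not the Geoportal's CSWRequestGetRecords class): it builds the GetRecords POST body with the apiso:Identifier sort clause for a given startPosition and maxRecords, and lists the start positions a client would request for numberOfRecordsMatched=90 in pages of 10.

```python
import xml.etree.ElementTree as ET
from typing import List

# Root element attributes as in the request above; only maxRecords and
# startPosition change from page to page.
_ROOT = (
    '<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" '
    'xmlns:ogc="http://www.opengis.net/ogc" '
    'xmlns:gmd="http://www.isotc211.org/2005/gmd" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0" '
    'service="CSW" version="2.0.2" maxRecords="{max_records}" '
    'startPosition="{start}" resultType="results" '
    'outputSchema="http://www.isotc211.org/2005/gmd" '
    'outputFormat="application/xml">'
)

# Query with the ascending sort on apiso:Identifier.
_QUERY = """
  <csw:Query typeNames="gmd:MD_Metadata">
    <csw:ElementSetName>full</csw:ElementSetName>
    <ogc:SortBy>
      <ogc:SortProperty>
        <ogc:PropertyName>apiso:Identifier</ogc:PropertyName>
        <ogc:SortOrder>ASC</ogc:SortOrder>
      </ogc:SortProperty>
    </ogc:SortBy>
  </csw:Query>
</csw:GetRecords>"""


def get_records_body(start: int, max_records: int) -> str:
    """POST body for one page; startPosition is 1-based in CSW 2.0.2."""
    return _ROOT.format(start=start, max_records=max_records) + _QUERY


def start_positions(matched: int, page_size: int) -> List[int]:
    """startPosition values a client requests to cover `matched` records."""
    return list(range(1, matched + 1, page_size))


body = get_records_body(31, 10)
ET.fromstring(body)  # raises ParseError if the body is not well-formed
print(start_positions(90, 10))  # [1, 11, 21, 31, 41, 51, 61, 71, 81]
```

Sorting on a unique property such as the identifier is what should make consecutive pages line up; the thread below shows that this guarantee depends on the server applying the sort consistently.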

 

 

 

Updated links (*)

http://inspire-geoportal.ec.europa.eu/resources/history/INSPIRE-56ef5d74-17be-11e4-ae76-52540004b857_20161016-185012/services/1/PullResults/1-10/downloaded.xml

<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd">

<csw:SearchStatus timestamp="2016-10-16T18:50:54"/>

<csw:SearchResults numberOfRecordsMatched="90" numberOfRecordsReturned="10" elementSet="full" nextRecord="11">

 

http://inspire-geoportal.ec.europa.eu/resources/history/INSPIRE-56ef5d74-17be-11e4-ae76-52540004b857_20161016-185012/services/1/PullResults/11-20/downloaded.xml

<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd">

<csw:SearchStatus timestamp="2016-10-16T18:51:24"/>

<csw:SearchResults numberOfRecordsMatched="90" numberOfRecordsReturned="10" elementSet="full" nextRecord="21">

http://inspire-geoportal.ec.europa.eu/resources/history/INSPIRE-56ef5d74-17be-11e4-ae76-52540004b857_20161016-185012/services/1/PullResults/21-30/downloaded.xml

<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd">

<csw:SearchStatus timestamp="2016-10-16T18:51:47"/>

<csw:SearchResults numberOfRecordsMatched="90" numberOfRecordsReturned="10" elementSet="full" nextRecord="31">

http://inspire-geoportal.ec.europa.eu/resources/history/INSPIRE-56ef5d74-17be-11e4-ae76-52540004b857_20161016-185012/services/1/PullResults/31-40/downloaded.xml 

<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd">

<csw:SearchStatus timestamp="2016-10-16T18:56:09"/>

<csw:SearchResults numberOfRecordsMatched="90" numberOfRecordsReturned="10" elementSet="full" nextRecord="41">

Best regards,

Angelo

 

#2 Updated by Angelo Quaglia almost 4 years ago

From: Alain CAMUS [mailto:Alain.Camus@ngi.be]
Sent: 18 October 2016 13:39
To: Angelo Quaglia <angelo.quaglia@ext.jrc.ec.europa.eu>
Cc: 'Danny Vandenbroucke' <danny.vandenbroucke@kuleuven.be>; Nathalie DELATTRE <Nathalie.Delattre@ngi.be>; Dominique FLANDROIT <Dominique.Flandroit@ngi.be>
Subject: RE: [Geoportal Helpdesk - Support #2865] BE-CIRB-CIBG-BRIC: Double entries in CSW response

 

Angelo,

 

Thanks for your reply, I see that you received duplicates indeed.

 

To solve this problem, I need to reproduce it, so I need to send the same requests as you did. In the pages you sent me, I can see the responses our server sent, but not the requests you sent to our server. Or did I miss them?

 

There is a POST example at the end of your mail, but it contains maxRecords="1", so I guess it is not a request that was actually sent. Can you show me the requests that were sent by your server?

 

Also, could you tell me at what time the harvesting occurs? It might coincide with our own harvesting of our source servers, which would cause problems.

 

Regards,

Alain

#3 Updated by Angelo Quaglia almost 4 years ago

  • Status changed from Assigned to Feedback

From: Angelo Quaglia [mailto:angelo.quaglia@ext.jrc.ec.europa.eu]
Sent: 18 October 2016 14:18
To: 'Alain CAMUS' <Alain.Camus@ngi.be>
Cc: 'Danny Vandenbroucke' <danny.vandenbroucke@kuleuven.be>; 'Nathalie DELATTRE' <Nathalie.Delattre@ngi.be>; 'Dominique FLANDROIT' <Dominique.Flandroit@ngi.be>
Subject: RE: [Geoportal Helpdesk - Support #2865] BE-CIRB-CIBG-BRIC: Double entries in CSW response

 

Dear Alain,

 

The harvesting took place between 17 Oct 2016, 16:51:31 GMT and 17 Oct 2016, 17:03:11 GMT (the creation and last-modification times of the report page).

 

The full report is here:

http://inspire-geoportal.ec.europa.eu/resources/INSPIRE-56ef5d74-17be-11e4-ae76-52540004b857_20161017-185121/services/1/PullResults/

Here is an excerpt from the log file, where you can see one example of the actual requests that were sent to your service:

 

17 Oct 2016 18:51:30,825 (OGCSRequest.java:543) -  INFO http-nio-8080-exec-418 eu.europa.ec.inspire.geoportal.ogc.OGCSRequest - built request URL: http://csw.geo.be/eng/csw

17 Oct 2016 18:51:31,032 (PullProcessor.java:2579) -  INFO pool-1970182-thread-1 eu.europa.ec.inspire.resource.discovery.Collect - Entering PullProcessor.Collect.call

17 Oct 2016 18:51:31,961 (CSW.java:356) -  INFO pool-1970182-thread-1 eu.europa.ec.inspire.geoportal.ogc.csw.CSW - Attempt n. 1

17 Oct 2016 18:51:31,961 (CSW.java:367) -  INFO pool-1970182-thread-1 eu.europa.ec.inspire.geoportal.ogc.csw.CSW - Timeout set to  ms: 11000

17 Oct 2016 18:51:31,961 (CSWRequestGetRecords.java:218) -  INFO pool-1970182-thread-1 eu.europa.ec.inspire.geoportal.ogc.csw.CSWRequestGetRecords - <csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:ogc="http://www.opengis.net/ogc" xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0" service="CSW" version="2.0.2"  maxRecords="10"  startPosition="1"  resultType="results" outputSchema="http://www.isotc211.org/2005/gmd" outputFormat="application/xml"> <csw:Query typeNames="gmd:MD_Metadata"> <csw:ElementSetName>full</csw:ElementSetName>    <ogc:SortBy>

      <ogc:SortProperty>

        <ogc:PropertyName>apiso:Identifier</ogc:PropertyName>

        <ogc:SortOrder>ASC</ogc:SortOrder>

      </ogc:SortProperty>

    </ogc:SortBy>  </csw:Query></csw:GetRecords>

 

 

Properly formatted:

<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:ogc="http://www.opengis.net/ogc" xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0" service="CSW" version="2.0.2" maxRecords="10" startPosition="1" resultType="results" outputSchema="http://www.isotc211.org/2005/gmd" outputFormat="application/xml">
  <csw:Query typeNames="gmd:MD_Metadata">
    <csw:ElementSetName>full</csw:ElementSetName>
    <ogc:SortBy>
      <ogc:SortProperty>
        <ogc:PropertyName>apiso:Identifier</ogc:PropertyName>
        <ogc:SortOrder>ASC</ogc:SortOrder>
      </ogc:SortProperty>
    </ogc:SortBy>
  </csw:Query>
</csw:GetRecords>

May I kindly ask you to register for an ECAS account? You can do so here: https://webgate.ec.europa.eu/cas/eim/external/register.cgi

ECAS Registration

You will now be directed to the European Commission Authentication Service website (ECAS)

  1. Choose a login name. You are strongly advised to choose your individual professional email address (or alternatively, your personal email address), which will be easy to remember next time you try to log in to PADOR.
  2. Set your password. Once you fill in the personal details requested, an email will automatically be sent to you with a link to set your password. You have 90 minutes to click on the link before it expires. (If you do not receive this automatic email, please notify the ECAS Helpdesk.)

If you send me your ECAS account name, I will enable it to access the INSPIRE MIG Collaboration Environment.

That will spare me from having to copy and paste from emails.

 

Best regards,

Angelo

#4 Updated by Angelo Quaglia almost 4 years ago

From: Alain CAMUS [mailto:Alain.Camus@ngi.be]
Sent: 18 October 2016 16:40
To: Angelo Quaglia <angelo.quaglia@ext.jrc.ec.europa.eu>
Cc: 'Danny Vandenbroucke' <danny.vandenbroucke@kuleuven.be>; Nathalie DELATTRE <Nathalie.Delattre@ngi.be>; Dominique FLANDROIT <Dominique.Flandroit@ngi.be>; fdumortier@cirb.brussels
Subject: RE: [Geoportal Helpdesk - Support #2865] BE-CIRB-CIBG-BRIC: Double entries in CSW response

 

Angelo,

 

Thank you for the details, I could reproduce the problem here.

 

It seems related to the fact that some fileIdentifiers are enclosed in curly braces and others are not, and also to the fact that GeoNetwork does not always sort them alphabetically in the same way.
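Why inconsistent sorting matters can be seen with Python's default string ordering. This is an illustration under assumptions, not GeoNetwork's actual sort logic: a plain code-point sort places "{" after all digits and letters (and uppercase before lowercase), whereas a hypothetical sort key that ignores braces and case orders the same identifiers differently. If two page requests in one harvest were answered with different orderings, their windows would not line up. The braced identifier below is made up for the example; the others come from the lists earlier in this issue.

```python
from typing import List


def normalised_key(identifier: str) -> str:
    """Hypothetical alternative sort key: ignore braces and letter case."""
    return identifier.strip("{}").lower()


ids: List[str] = [
    "c74e45c9-f51e-4639-89d3-ba49009fda1c",
    "E5274E60-D896-42D0-A8E7-5FF83D9AC7EC",
    "{0a1b2c3d-0000-0000-0000-000000000000}",  # hypothetical braced id
    "d2a2b095-d0d1-4bb8-ac5a-7988444464dd",
]

by_code_point = sorted(ids)  # code points: 'E' < 'c' < 'd' < '{'
by_normalised = sorted(ids, key=normalised_key)

print(by_code_point[0], by_normalised[0])
```

The braced identifier moves from last place to first, which shifts every later record by one position between the two orderings.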

 

When I download all metadata in packets of 10, as you do, the records whose fileIdentifier is in curly braces are never returned. They seem to be skipped and replaced by the following ones, yet they are still counted in the startPosition, so the records that were pulled forward into one response appear again in the next, and the skipped ones never appear.
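The behaviour Alain describes can be reproduced with a toy model. This Python sketch is hypothetical and only mirrors his description, not GeoNetwork's code: a record with a braced fileIdentifier still occupies a position in the sorted result set, but the server drops it from the page and fills the gap with records that follow the window. Those fill-in records are returned again on the next page, so they appear twice, while the braced records never appear.

```python
from typing import List


def serve_page(ordered: List[str], start: int, max_records: int) -> List[str]:
    """Model of the faulty windowing: braced ids are skipped but still counted.

    start is 1-based, as in CSW. Records are read from position `start`
    onwards, skipping braced ids, until the page is full, so a page can
    spill over into the window of the next request.
    """
    page: List[str] = []
    for identifier in ordered[start - 1:]:
        if identifier.startswith("{"):
            continue  # never emitted, but it still occupied a position
        page.append(identifier)
        if len(page) == max_records:
            break
    return page


ordered = ["a1", "{b2}", "c3", "d4", "e5", "f6"]
harvest = serve_page(ordered, 1, 3) + serve_page(ordered, 4, 3)
print(harvest)  # ['a1', 'c3', 'd4', 'd4', 'e5', 'f6']
```

Fetching everything in a single page (serve_page(ordered, 1, 10)) avoids the duplicates in this model, since there is no second window to overlap with; the braced record is still missing, though.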

 

I'll report this to GeoCat, with whom we have a support contract, but it will take some time to get solved.

 

I guess you're not interested in a workaround, since you have to apply the same procedure to all countries, but if you are, you could increase maxRecords to 100 and get all our metadata records at once.

About the ECAS account: I have already tried several times, but the registration is too complex. The first problem is the "Where are you from?" screen: I have no idea which kind of account I should choose. What is an "executive agency staff" or a "european institution and/or body"? Or am I just a citizen using EC information systems? I picked one of them without being sure it was the right choice, but more unexplained questions followed.

 

With kind regards,

Alain

#5 Updated by Angelo Quaglia almost 4 years ago

From: Angelo Quaglia [mailto:angelo.quaglia@ext.jrc.ec.europa.eu]
Sent: 18 October 2016 17:12
To: 'Alain CAMUS' <Alain.Camus@ngi.be>
Cc: 'Danny Vandenbroucke' <danny.vandenbroucke@kuleuven.be>; 'Nathalie DELATTRE' <Nathalie.Delattre@ngi.be>; 'Dominique FLANDROIT' <Dominique.Flandroit@ngi.be>; 'fdumortier@cirb.brussels' <fdumortier@cirb.brussels>
Subject: RE: [Geoportal Helpdesk - Support #2865] BE-CIRB-CIBG-BRIC: Double entries in CSW response

 

Dear Alain,

 

Your explanation of what is causing it is very interesting. I am having similar trouble with other Member States, and nobody has ever been able to pin it down clearly.

Which version of GeoNetwork are you running?

 

What you suggest is a very reasonable workaround, but your GeoNetwork seems to have trouble returning some records:

 

If I send to:

http://csw.geo.be/eng/csw?SERVICE=CSW&VERSION=2.0.2&REQUEST=GetCapabilities

 

The following request:

<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:ogc="http://www.opengis.net/ogc" xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0" service="CSW" version="2.0.2" maxRecords="200" startPosition="31" resultType="results" outputSchema="http://www.isotc211.org/2005/gmd" outputFormat="application/xml">
  <csw:Query typeNames="gmd:MD_Metadata">
    <csw:ElementSetName>full</csw:ElementSetName>
    <ogc:SortBy>
      <ogc:SortProperty>
        <ogc:PropertyName>apiso:Identifier</ogc:PropertyName>
        <ogc:SortOrder>ASC</ogc:SortOrder>
      </ogc:SortProperty>
    </ogc:SortBy>
  </csw:Query>
</csw:GetRecords>

 

This is what I am getting back:

<?xml version="1.0" encoding="UTF-8"?>

<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd">

    <csw:SearchStatus timestamp="2016-10-18T17:07:24" />

    <csw:SearchResults numberOfRecordsMatched="90" numberOfRecordsReturned="60" elementSet="full" nextRecord="91">

I am sorry you had trouble registering an ECAS account.

You belong to the “External Domain” (external with respect to the European Commission).

 

Please go to https://webgate.ec.europa.eu/cas/eim/external/register.cgi

If it asks you about the domain, please select External:

[Screenshot: ECAS domain selection]

Then choose a username and fill in the other fields:

[Screenshot: ECAS registration form]

Best regards,

Angelo

#6 Updated by Angelo Quaglia almost 4 years ago

From: Alain CAMUS [mailto:Alain.Camus@ngi.be]
Sent: 19 October 2016 08:11
To: Angelo Quaglia <angelo.quaglia@ext.jrc.ec.europa.eu>
Cc: 'Danny Vandenbroucke' <danny.vandenbroucke@kuleuven.be>; Nathalie DELATTRE <Nathalie.Delattre@ngi.be>; Dominique FLANDROIT <Dominique.Flandroit@ngi.be>
Subject: RE: [Geoportal Helpdesk - Support #2865] BE-CIRB-CIBG-BRIC: Double entries in CSW response

 

Hi Angelo,

 

Our server is GeoNetwork 3.0.5.

 

About the workaround: do you mean that the response stops at the csw:SearchResults tag, with no gmd:MD_Metadata included? That would be strange, because I sent the same request and got all 60 results as expected. Maybe try removing "REQUEST=GetCapabilities" from the URL?

 

If the problem is about the numbers you highlighted in yellow, it may be because your request has startPosition="31". We have 90 metadata records; if you start at 31, you get 60 of them.

If I try the same request with startPosition="1", the response contains:

<csw:SearchResults numberOfRecordsMatched="90" numberOfRecordsReturned="90" elementSet="full" nextRecord="0">

I agree that in your case nextRecord should not be 91, but since this is meant as a workaround, I'd go with startPosition="1".
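The numbers in this exchange follow from simple paging arithmetic. A minimal Python sketch, assuming the CSW 2.0.2 convention that nextRecord is 0 once all matched records have been returned (as in the startPosition="1" response above): with 90 matched records, startPosition=31 and maxRecords=200 should return 60 records, and startPosition=1 should return all 90, in both cases with nextRecord=0.

```python
from typing import Tuple


def expected_window(matched: int, start: int, max_records: int) -> Tuple[int, int]:
    """Return (numberOfRecordsReturned, nextRecord) for a 1-based startPosition."""
    returned = max(0, min(max_records, matched - start + 1))
    next_record = start + returned
    if next_record > matched:
        next_record = 0  # CSW 2.0.2: 0 means all records have been returned
    return returned, next_record


print(expected_window(90, 31, 200))  # (60, 0)
print(expected_window(90, 1, 200))   # (90, 0)
print(expected_window(90, 1, 10))    # (10, 11)
```

So numberOfRecordsReturned="60" is exactly what startPosition="31" implies, while nextRecord="91" departs from the convention, as Alain acknowledges.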

 

 

Thanks for the help about ECAS, my account is acangi.

 

Regards,

Alain

#7 Updated by Angelo Quaglia almost 4 years ago

Dear Alain,

yes, sorry, you are right: I did not notice the 31 in startPosition. Many thanks for spotting that.

Indeed, it works fine:

<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:ogc="http://www.opengis.net/ogc" xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0" service="CSW" version="2.0.2" maxRecords="200" startPosition="1" resultType="results" outputSchema="http://www.isotc211.org/2005/gmd" outputFormat="application/xml">
  <csw:Query typeNames="gmd:MD_Metadata">
    <csw:ElementSetName>full</csw:ElementSetName>
    <ogc:SortBy>
      <ogc:SortProperty>
        <ogc:PropertyName>apiso:Identifier</ogc:PropertyName>
        <ogc:SortOrder>ASC</ogc:SortOrder>
      </ogc:SortProperty>
    </ogc:SortBy>
  </csw:Query>
</csw:GetRecords>

 

<?xml version="1.0" encoding="UTF-8"?>
<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd">
    <csw:SearchStatus timestamp="2016-10-19T11:57:53" />
    <csw:SearchResults numberOfRecordsMatched="90" numberOfRecordsReturned="90" elementSet="full" nextRecord="0">

I understand this is a temporary workaround, so I modified the settings on our server instead of asking you to change yours. I set MaxRecords to 200 to allow for growth of your catalogue:

        <ns9:MaxRecords>200</ns9:MaxRecords>
        <ns9:MaxParallelRequests>1</ns9:MaxParallelRequests>
        <ns9:LinkScenario>centralised</ns9:LinkScenario>
        <ns9:OGCFilter></ns9:OGCFilter>

 

I ran a new harvest. The report is available here for one day, after which it will expire:

http://inspire-geoportal.ec.europa.eu/resources/INSPIRE-56ef5d74-17be-11e4-ae76-52540004b857_20161019-120107/services/1/PullResults/

 

There are no duplicates:

http://inspire-geoportal.ec.europa.eu/solr/select?facet=true&q=(id:\/INSPIRE-56ef5d74-17be-11e4-ae76-52540004b857*%20AND%20sourceMetadataResourceLocator:\/*)&facet.field=remoteMetadataIdentifier&facet.limit=-1&facet.mincount=2&rows=0

 
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">22</int>
<lst name="params">
<str name="q">
(id:\/INSPIRE-56ef5d74-17be-11e4-ae76-52540004b857* AND sourceMetadataResourceLocator:\/*)
</str>
<str name="facet.limit">-1</str>
<str name="facet.field">remoteMetadataIdentifier</str>
<str name="facet.mincount">2</str>
<str name="rows">0</str>
<str name="facet">true</str>
</lst>
</lst>
<result name="response" numFound="89" start="0"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="remoteMetadataIdentifier"/>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
</response>

 

Again, I understand this is a temporary workaround, so I kindly ask you to update this issue once the problem has been fixed.

Many thanks for your fast responses.

Best regards,

Angelo

#8 Updated by Alain Camus almost 4 years ago

Yes, I confirm this is a temporary workaround and will notify you when it's no longer needed.

Regards,

Alain

#9 Updated by Angelo Quaglia over 3 years ago

Dear Alain,

do you know if the problem has been fixed?

Best regards,

Angelo

#10 Updated by Alain Camus over 3 years ago

Hi Angelo,

It's not fixed yet, but GeoCat confirmed it's related to the curly braces. We have another problem with these curly braces (related to a Tomcat update), and we will remove them from our IDs by the end of this month, so the workaround should no longer be necessary in a few weeks. I'll let you know when this is solved on our side.

With kind regards,

Alain

#11 Updated by Alain Camus over 3 years ago

Hi Angelo,

We removed the curly braces from our fileIdentifiers, but this did not resolve the problem. I ran a test harvest of our metadata 10 by 10 and we still get duplicates. Since none of our records has curly braces in its fileIdentifier anymore, the windowing problem is not (only?) about them.
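A test like the one Alain describes can be scored automatically. The Python sketch below is a hypothetical helper, not part of GeoNetwork or the Geoportal: it takes the fileIdentifiers returned page by page and reports which ones occur more than once, and how many distinct records were harvested compared with numberOfRecordsMatched.

```python
from collections import Counter
from typing import Dict, Iterable, List


def harvest_report(pages: Iterable[List[str]], matched: int) -> Dict[str, object]:
    """Summarise a paged harvest: duplicates and shortfall against `matched`."""
    counts = Counter(identifier for page in pages for identifier in page)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    return {
        "returned": sum(counts.values()),   # records received, with repeats
        "distinct": len(counts),            # unique fileIdentifiers
        "duplicates": duplicates,           # identifiers seen more than once
        "missing": matched - len(counts),   # matched but never received
    }


pages = [["a", "b", "c"], ["c", "d", "e"]]
print(harvest_report(pages, matched=6))
```

Here one record is returned twice and one of the six matched records is never seen, the same pattern as in the lists François reported.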

I need to investigate this further, but I don't have time right now; hopefully you can keep the workaround a little longer.

With kind regards,

Alain

#12 Updated by Angelo Quaglia over 3 years ago

Dear Alain,

I understand and I thank you for keeping me posted.

It is not a problem for the INSPIRE Geoportal to keep these settings.

It is important to solve the issue for the sake of interoperability inside the INSPIRE infrastructure.

Best regards,

Angelo
