Task #2220

MIWP-8 (L) Unique Resource Identifier:

Added by Michael Östling about 6 years ago. Updated over 5 years ago.

Status: Closed
Start date: 17 Sep 2014
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Proposed change or action:

Description

This section has now been locked for editing, and a draft version has been moved into the draft document.
See page:
https://ies-svn.jrc.ec.europa.eu/projects/metadata/wiki/DraftVersionsMIWP-8
Further discussion and comments should be made on the draft.

*WIKI:* https://ies-svn.jrc.ec.europa.eu/projects/metadata/wiki/MIWP-8_(L)_Unique_Resource_Identifier

TG Requirement 6 suggests using RS_Identifier to encode the resource identifier. This is a poor choice because (a) it goes against the semantics foreseen in the ISO standard, which defines RS_Identifier as an “identifier used for reference systems”, and (b) it is not CSW-queryable. As foreseen by the ISO standard, MD_Identifier (a “value uniquely identifying an object within a namespace”) should be suggested instead, which is also CSW-queryable. The task will be to recommend the use of MD_Identifier.

UniqueResourceIdentifier_CzechRep.JPG (14.5 KB) Lucie Kondrova, 04 Dec 2014 11:28 am

MD_IR_and_ISO_20131029_Copy_for_editing_in_MIWP8_L_UniqueResourceIdentifier.doc (1.39 MB) Martin Seiler, 09 Feb 2015 03:56 pm


History

#1 Updated by Michael Östling about 6 years ago

TG Requirement 6 suggests using RS_Identifier to encode the resource identifier. This is a poor choice because (a) it goes against the semantics foreseen in the ISO standard, which defines RS_Identifier as an “identifier used for reference systems”, and (b) it is not CSW-queryable. As foreseen by the ISO standard, MD_Identifier (a “value uniquely identifying an object within a namespace”) should be suggested instead, which is also CSW-queryable. The task will be to recommend the use of MD_Identifier. The work could also include recommendations on best practices for specifying it, in order to ensure the consistent use of global identifiers, possibly by using HTTP URIs. This would help solve a number of practical issues (e.g., related to harvesting) that are preventing the effective exploitation of the INSPIRE infrastructure. This activity is linked to MIWP-4 (see #2126).

#2 Updated by Michael Östling almost 6 years ago

  • Description updated (diff)

#3 Updated by Pawel Soczewski almost 6 years ago

  • Description updated (diff)

#2: In my opinion, there is no logical inconsistency between the definitions in the IR and the TG. The definition in the IR says: "[...] a character string namespace uniquely identifying the context of the identifier code[..]". This clearly indicates that the unique resource identifier must contain a namespace and an identifier that is unique within it. The current definition in the TG specifies the general provision of the IR, taking into account the requirements of the element domain (from the IR). If the plan is to change the implementation to MD_Identifier (code element only), the TG needs to state explicitly that the resource identifier consists of a namespace and an identifier within it. This can be done either by keeping the current definition or by writing some implementing rules in the "Comment" field.

#4 Updated by Pawel Soczewski almost 6 years ago

#3: I agree that the entity MD_Identifier is more suitable than the entity RS_Identifier. Using only the code element to encode both the namespace and the id of the unique resource identifier makes it CSW-queryable.

The issue is the syntax of the identifier. I think resource identifiers should be URIs in the http scheme. Ultimately, the identifiers should be dereferenceable URIs, in line with the Linked Data idea. The TG should include an explanation of how to build identifiers, something like Annex H in D2.5v3.4.

The question remains whether the resource identifier should be part of the spatial object identifier (inspire id). Finnish example from presentation in Aalborg:
http://{register}.fi/so/{namespace}/{localId}[/{versionId}]
  - URI service by the individual data provider, especially regarding large municipalities
• {namespace} = data source, i.e. dataset identifier in national spatial data metadata
• {register} = URI management body, i.e. Helsinki for municipality, or authority
  - e.g. to avoid minting URIs to different thematic domains within a single dataset

The part "http://{register}.fi/so/{namespace}" of the object id is the resource identifier.
In Poland, spatial data are often divided into datasets according to the responsible party or the territorial/administrative units. A problem occurs if, for example, a spatial object is moved to a different dataset as a result of changes in the boundaries of administrative units: the ID would have to change. In this case, including the resource identifier in the identifier of the spatial object appears to be incorrect.
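
To make the split concrete, the Finnish pattern can be taken apart mechanically. The following is only a minimal Python sketch and not part of the Aalborg presentation: the function name `split_object_uri` and the host `example.fi` are hypothetical, and it assumes a URI follows the template exactly.

```python
from urllib.parse import urlparse


def split_object_uri(uri):
    """Split a spatial object URI of the assumed form
    http://{register}.fi/so/{namespace}/{localId}[/{versionId}]
    into its dataset-level prefix and its object-level parts."""
    parsed = urlparse(uri)
    segments = [s for s in parsed.path.split("/") if s]
    # Expected segments: "so", namespace, localId and optionally versionId.
    if len(segments) not in (3, 4) or segments[0] != "so":
        raise ValueError("URI does not follow the assumed template: " + uri)
    namespace, local_id = segments[1], segments[2]
    version_id = segments[3] if len(segments) == 4 else None
    # The prefix up to and including the namespace plays the role of
    # the resource (dataset) identifier in this scheme.
    resource_id = f"{parsed.scheme}://{parsed.netloc}/so/{namespace}"
    return {
        "resource_id": resource_id,
        "local_id": local_id,
        "version_id": version_id,
    }


# Hypothetical sample URI, for illustration only.
parts = split_object_uri("http://example.fi/so/buildings/12345/2")
print(parts["resource_id"])
```

Because the dataset prefix is embedded in every object URI, moving an object to another dataset necessarily yields a different URI, which is the concern raised for the Polish case.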

#5 Updated by Lucie Kondrova almost 6 years ago

Regarding the syntax of the Unique Resource Identifier, I'm attaching an example of how we deal with it at the national level in the Czech Republic. The code is defined in our national metadata profile as {country_code}-{national_tax_identification_number}-{internal_identifier}, for example CZ-00025798-CGS-GEOCR500-SDE. The codespace is still under discussion, but we tend to agree on using the link to the organization's webpage, e.g. www.geology.cz, which should also be unique at the European scale (unlike acronyms of organizations).
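
The code pattern can be sketched in a few lines of Python. This is an illustration only, not part of the Czech national profile: the function names are hypothetical, and the split of the example into "CZ", "00025798" and "CGS-GEOCR500-SDE" is an assumption based on the stated pattern.

```python
def build_resource_code(country_code, tax_id, internal_id):
    """Compose a code of the form
    {country_code}-{national_tax_identification_number}-{internal_identifier}."""
    fields = (
        ("country_code", country_code),
        ("tax_id", tax_id),
        ("internal_id", internal_id),
    )
    for name, value in fields:
        if not value:
            raise ValueError(name + " must not be empty")
    return "-".join([country_code.upper(), tax_id, internal_id])


def parse_resource_code(code):
    """Split a code back into its three parts.

    The internal identifier may itself contain hyphens, so only the
    first two hyphens are treated as separators. This assumes that the
    country code and the tax number never contain a hyphen.
    """
    country_code, tax_id, internal_id = code.split("-", 2)
    return country_code, tax_id, internal_id


code = build_resource_code("CZ", "00025798", "CGS-GEOCR500-SDE")
print(code)
print(parse_resource_code(code))
```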

#6 Updated by Michael Östling almost 6 years ago

I think the use of MD_Identifier is a good option, since RS_Identifier is not intended for this.

We would then also get a single string connecting namespace and code.

Regarding the use of HTTP URIs/URLs as identifiers, I think we need to discuss this with MIG-t, since that task is handled in a separate subgroup of MIG-t
https://ies-svn.jrc.ec.europa.eu/issues/2288 and it may be outside the scope of this specific task of TG Metadata.

Since many of our resources will be open and linked, I think it's good to check the work done in this area (not directly related to INSPIRE).
I add some links below to documents that could be relevant.

Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC
https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf

Designing URI Sets for the UK Public Sector
https://www.gov.uk/government/publications/designing-uri-sets-for-the-uk-public-sector

Cool URIs for the Semantic Web
http://www.w3.org/TR/cooluris/

#7 Updated by Martin Seiler almost 6 years ago

Pawel Soczewski wrote:

[...] In Poland, spatial data are often divided into datasets according to the responsible party or the territorial/administrative units. A problem occurs if, for example, a spatial object is moved to a different dataset as a result of changes in the boundaries of administrative units: the ID would have to change. In this case, including the resource identifier in the identifier of the spatial object appears to be incorrect.

I think in SDIs it's quite essential that identifiers are persistent and don't change. However, ISO 19115 allows multiple identifiers for datasets.

#8 Updated by Martin Seiler almost 6 years ago

For me, an essential argument for using MD_Identifier.code here is the relation to the issue MIWP-8_(M)_Coupled_resources. The OperatesOn element should hold the resource identifier. When looking for services that provide a specific resource, I'd query the CSW for services with OperatesOn==resource identifier. If codespace and code are separated, as in RS_Identifier, this does not work.

#9 Updated by Martin Seiler almost 6 years ago

Michael Östling wrote:

Regarding the use of HTTP URIs/URLs as identifiers, I think we need to discuss this with MIG-t, since that task is handled in a separate subgroup of MIG-t https://ies-svn.jrc.ec.europa.eu/issues/2288 and it may be outside the scope of this specific task of TG Metadata.

Yes and no. This is essentially a question about the overall architecture of the infrastructure and the way resources are made discoverable. So yes, this needs to be discussed and decided elsewhere; but no, because we can't really propose a solid and beneficial solution here while this issue remains unsolved.

#10 Updated by Pawel Soczewski almost 6 years ago

Martin Seiler wrote:

Pawel Soczewski wrote: [...] In Poland, spatial data are often divided into datasets according to the responsible party or the territorial/administrative units. A problem occurs if, for example, a spatial object is moved to a different dataset as a result of changes in the boundaries of administrative units: the ID would have to change. In this case, including the resource identifier in the identifier of the spatial object appears to be incorrect.
I think in SDIs it's quite essential that identifiers are persistent and don't change. However, ISO 19115 allows multiple identifiers for datasets.

Martin, generally you are right that identifiers shouldn't change :)

You write that ISO 19115 allows multiple identifiers for a dataset, but if the dataset identifier is part of the spatial object identifier (which is also persistent and unchanging), there can be only one instance. If we change the dataset id, we also have to change the spatial object id, and that is not permitted because the spatial object identifier must not change. For this reason, in practice a dataset can have only one identifier instance.

#11 Updated by Pawel Soczewski almost 6 years ago

Martin Seiler wrote:

Michael Östling wrote: Regarding the use of HTTP URIs/URLs as identifiers, I think we need to discuss this with MIG-t, since that task is handled in a separate subgroup of MIG-t https://ies-svn.jrc.ec.europa.eu/issues/2288 and it may be outside the scope of this specific task of TG Metadata.
Yes and no. This is essentially a question about the overall architecture of the infrastructure and the way resources are made discoverable. So yes, this needs to be discussed and decided elsewhere; but no, because we can't really propose a solid and beneficial solution here while this issue remains unsolved.

Maybe Michael's proposal is a good solution. The TG Metadata should only require that the value of the dataset identifier be in accordance with the rules for building identifiers in the INSPIRE SDI, and should refer to the relevant document prepared by the separate subgroup.
Of course, this subgroup should also take dataset identifiers and their relationship with spatial object identifiers into account in its work.

#12 Updated by Pawel Soczewski almost 6 years ago

Martin Seiler wrote:

For me, an essential argument for using MD_Identifier.code here is the relation to the issue MIWP-8_(M)_Coupled_resources. The OperatesOn element should hold the resource identifier. When looking for services that provide a specific resource, I'd query the CSW for services with OperatesOn==resource identifier. If codespace and code are separated, as in RS_Identifier, this does not work.

In my opinion it's a good idea, but it can be achieved only if identifiers are consistent with the idea of Linked Data and are resolvable. Until then, an alternative value of the OperatesOn element should be allowed: OperatesOn==GetRecordById of the CSW service.

#13 Updated by Martin Seiler almost 6 years ago

Pawel Soczewski wrote:

Martin Seiler wrote: For me, an essential argument for using MD_Identifier.code here is the relation to the issue MIWP-8_(M)_Coupled_resources. The OperatesOn element should hold the resource identifier. When looking for services that provide a specific resource, I'd query the CSW for services with OperatesOn==resource identifier. If codespace and code are separated, as in RS_Identifier, this does not work.
In my opinion it's a good idea, but it can be achieved only if identifiers are consistent with the idea of Linked Data and are resolvable. Until then, an alternative value of the OperatesOn element should be allowed: OperatesOn==GetRecordById of the CSW service.

While that would be desirable from an LD perspective, they don't have to be resolvable themselves, but they do have to be queryable via CSW (ISO AP), hence the importance of using MD_Identifier:

If you have a dataset's resource identifier and are looking for a service that offers this dataset, you can do a CSW query with a filter for services and operatesOn=resource identifier of the dataset to receive the services.

If you take the metadata record of a service as a starting point and the identifier of the dataset is resolvable, you can directly retrieve the dataset's metadata record. That's a different use case.
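
The first use case, finding services by a dataset's resource identifier, could be expressed as an OGC Filter fragment roughly as follows. This is a hedged Python sketch rather than a tested catalogue request: the queryable names `apiso:Type` and `apiso:OperatesOn` are assumptions about what a given CSW exposes, the function names are hypothetical, and the identifier `http://example.org/dataset/abc` is made up.

```python
import xml.etree.ElementTree as ET

OGC_NS = "http://www.opengis.net/ogc"
ET.register_namespace("ogc", OGC_NS)


def _equals(property_name, literal):
    """Build an ogc:PropertyIsEqualTo comparison element."""
    node = ET.Element(f"{{{OGC_NS}}}PropertyIsEqualTo")
    ET.SubElement(node, f"{{{OGC_NS}}}PropertyName").text = property_name
    ET.SubElement(node, f"{{{OGC_NS}}}Literal").text = literal
    return node


def services_for_dataset_filter(resource_identifier):
    """Return a filter that selects service records operating on the
    dataset with the given resource identifier.

    The queryable names are assumptions. The enclosing GetRecords
    request would also have to declare the 'apiso' prefix.
    """
    root = ET.Element(f"{{{OGC_NS}}}Filter")
    both = ET.SubElement(root, f"{{{OGC_NS}}}And")
    both.append(_equals("apiso:Type", "service"))
    both.append(_equals("apiso:OperatesOn", resource_identifier))
    return ET.tostring(root, encoding="unicode")


# Hypothetical identifier, held as one string (namespace + code).
filter_xml = services_for_dataset_filter("http://example.org/dataset/abc")
print(filter_xml)
```

The comparison is a plain string equality on a single value, which is why the argument above favours an identifier carried in one code element rather than in separate codespace and code elements.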

#14 Updated by Pawel Soczewski almost 6 years ago

Martin, you are right :) Really, it is sufficient that the resource identifier is encoded only as MD_Identifier.code (namespace + code) and that this is the value of the OperatesOn element. I'm withdrawing my proposal of GetRecordById as the value of the operatesOn element ... You've convinced me ;D

#15 Updated by Christian Ansorge over 5 years ago

Sorry for my absence so far; I had to finish a range of other things towards the end of the year.

Anyway, I had a look at the wiki regarding this issue and was wondering whether anything has been said about the actual role and function of URIs as an element of INSPIRE metadata. Is there a defined business case which needs to be supported?

For me, this would actually be the starting point of the discussion in the wiki, and I would propose adding a section there that addresses the following issues.

What role/function do we expect from the URI?

It might be advisable to discuss the actual use cases of URIs within INSPIRE metadata as one of several preconditions for a discussion of problems.

  • What is the specific use case of the URI as an element of the INSPIRE metadata profile?
  • Is traceability a use case? Would this need a query functionality?
  • Is there a need to ensure the uniqueness of a URI?

#16 Updated by Martin Seiler over 5 years ago

Attached a first draft for section 2.2.5 in the TGs.

#17 Updated by Michael Östling over 5 years ago

  • Description updated (diff)
  • Status changed from Submitted to Closed

#18 Updated by Angelo Quaglia over 5 years ago

The MD Guidelines have always indicated the use of MD_Identifier if the Unique Resource Identifier of the dataset contains the code but no namespace.

The MD Guidelines indicate the use of RS_Identifier if and only if the dataset Unique Resource Identifier contains a namespace.

The Metadata Regulation clearly describes that the Unique resource identifier is composed of two distinct parts: a mandatory code and a namespace.

The namespace is not qualified as being mandatory nor unique but if present, the code must be unique inside that namespace:

1.5. Unique resource identifier

A value uniquely identifying the resource. The value domain of this metadata element is a mandatory character string code, generally assigned by the data owner, and a character string namespace uniquely identifying the context of the identifier code (for example, the data owner).

ISO 19115

An alternative to using RS_Identifier is to encode the namespace inside the optional authority element of MD_Identifier.

More precisely, the namespace can be accommodated at the following position:

MD_Identifier/authority/CI_Citation/identifier/MD_Identifier/gmd:code/gco:CharacterString 

For example:

                    <gmd:identifier>
                        <gmd:MD_Identifier>
                            <gmd:authority>
                                <gmd:CI_Citation>
                                    <gmd:title>
                                        <gco:CharacterString>National Center of Remote Sensing and Geoinformatics "GIS-Centras"</gco:CharacterString>
                                    </gmd:title>
                                    <gmd:date>
                                        <gmd:CI_Date>
                                            <gmd:date>
                                                <gco:Date>2003</gco:Date>
                                            </gmd:date>
                                            <gmd:dateType>
                                                <gmd:CI_DateTypeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode" codeListValue="creation">creation</gmd:CI_DateTypeCode>
                                            </gmd:dateType>
                                        </gmd:CI_Date>
                                    </gmd:date>
                                    <gmd:identifier>
                                        <gmd:MD_Identifier>
                                            <gmd:code>
                                                <gco:CharacterString>http://www.gis-centras.lt</gco:CharacterString>
                                            </gmd:code>
                                        </gmd:MD_Identifier>
                                    </gmd:identifier>
                                </gmd:CI_Citation>
                            </gmd:authority>
                            <gmd:code>
                                <gco:CharacterString>PS.ProtectedSite0</gco:CharacterString>
                            </gmd:code>
                        </gmd:MD_Identifier>
                    </gmd:identifier>
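
A client reading this encoding has to look in two places to recover the namespace and the code. The following is a minimal Python sketch, assuming the standard ISO 19139 `gmd` and `gco` namespace URIs; the function name `read_identifier` is hypothetical, and the embedded sample is an abbreviated copy of the example above (title and date omitted, so it is not schema-valid).

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Abbreviated copy of the example: CI_Citation title and date left out.
SAMPLE = """
<gmd:identifier xmlns:gmd="http://www.isotc211.org/2005/gmd"
                xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:MD_Identifier>
    <gmd:authority>
      <gmd:CI_Citation>
        <gmd:identifier>
          <gmd:MD_Identifier>
            <gmd:code>
              <gco:CharacterString>http://www.gis-centras.lt</gco:CharacterString>
            </gmd:code>
          </gmd:MD_Identifier>
        </gmd:identifier>
      </gmd:CI_Citation>
    </gmd:authority>
    <gmd:code>
      <gco:CharacterString>PS.ProtectedSite0</gco:CharacterString>
    </gmd:code>
  </gmd:MD_Identifier>
</gmd:identifier>
"""

NAMESPACE_PATH = (
    "gmd:authority/gmd:CI_Citation/gmd:identifier/"
    "gmd:MD_Identifier/gmd:code/gco:CharacterString"
)


def _text(element):
    """Return stripped element text, or None if absent or empty."""
    if element is None or not element.text:
        return None
    return element.text.strip() or None


def read_identifier(identifier_element):
    """Return (namespace, code) from a gmd:identifier element."""
    md = identifier_element.find("gmd:MD_Identifier", NS)
    if md is None:
        raise ValueError("no gmd:MD_Identifier child found")
    # 'gmd:code' matches direct children only, so the nested
    # authority code is not picked up here.
    code = _text(md.find("gmd:code/gco:CharacterString", NS))
    namespace = _text(md.find(NAMESPACE_PATH, NS))
    return namespace, code


result = read_identifier(ET.fromstring(SAMPLE.strip()))
print(result)
```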

 
