Discussion #2448

Draft use case of common validation

Added by Giacomo Martirano about 5 years ago. Updated almost 5 years ago.

Status: New
Priority: Normal
Assignee: -

Description

Attached a document containing a draft use case of common validation, prepared in view of the face-to-face meeting in Lisbon.

I do hope it's a useful starting point for a fruitful discussion.

Use_Case_Common_Validation_v02.docx (38.6 KB) Giacomo Martirano, 09 Jun 2015 01:10 am

History

#1 Updated by Michael Lutz about 5 years ago

Dear MIWP-5 members, please provide feedback to Giacomo's use case proposal (attached to this issue).

Thanks.

#2 Updated by Francisco J Lopez-Pellicer about 5 years ago

We should take into account that the validation process may be painfully slow, and it may even fail. In addition, results will be referenced by third parties, so the validation result should be persistent and uniquely identifiable. Therefore, the flow of events should be designed as a process that triggers a background job whose running status and final status can be queried at any time, even long after it has finished. A subscription/notification subsystem should also be considered.

 

  1. The INSPIRE Data provider creates a validation request by providing the dataset metadata to the validation system
  2. The validation system returns a unique ID identifying this validation request to the INSPIRE Data provider
  3. [Optional] The INSPIRE Data provider subscribes to a notification service for events related to the ID provided by the validation system
  4. The validation system schedules the validation request
  5. A scheduler in the validation system triggers the validation process
  6. The validation system executes the “metadata validation” and updates the status of the validation request
  7. The validation system executes the “services validation” against third-party services and updates the status of the validation request
  8. The validation system executes the “dataset validation” and updates the status of the validation request
  9. [Optional] The validation system notifies the INSPIRE Data provider of the availability of the final report
  10. The INSPIRE Data provider requests from the validation system the validation report associated with the ID

Step 10 can be invoked by the user at any time, so the user can follow the progress of the validation.
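The flow above can be sketched as a toy in-process service. Everything here (the class names, stage names, and placeholder checks) is illustrative only; a real validator would persist requests and run workers in a separate process or job queue:

```python
import threading
import uuid
from enum import Enum


class Status(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"


class ValidationSystem:
    """Toy in-memory stand-in for the validation system described above."""

    def __init__(self):
        self._requests = {}          # request ID -> {"status": ..., "report": ...}
        self._lock = threading.Lock()

    def submit(self, resource):
        """Steps 1-2: register the request and return a unique ID immediately."""
        request_id = str(uuid.uuid4())
        with self._lock:
            self._requests[request_id] = {"status": Status.QUEUED, "report": None}
        # Steps 4-5: a background worker performs the actual validation.
        worker = threading.Thread(target=self._run, args=(request_id, resource))
        worker.start()
        worker.join()  # joined here only to make this small example deterministic
        return request_id

    def _run(self, request_id, resource):
        """Steps 6-8: run each stage, updating the stored status as we go."""
        with self._lock:
            self._requests[request_id]["status"] = Status.RUNNING
        report = {stage: bool(resource.get(stage))  # placeholder "checks"
                  for stage in ("metadata", "services", "dataset")}
        with self._lock:
            self._requests[request_id].update(status=Status.FINISHED, report=report)

    def get_report(self, request_id):
        """Step 10: the provider can poll the status/report at any time."""
        with self._lock:
            return dict(self._requests[request_id])


system = ValidationSystem()
req_id = system.submit({"metadata": "md.xml", "services": ["view"], "dataset": "data.gml"})
result = system.get_report(req_id)
print(req_id, result["status"].value)
```

The key design point Francisco raises is visible in `submit`: the ID is handed back before any validation runs, so step 10 works even while (or long after) the background stages execute.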

Some of these ideas (except notification) were implemented in the Spanish IDEE validator.

Regards.

 

#3 Updated by Freddy Fierens about 5 years ago

Giacomo Martirano wrote:

Attached a document containing a draft use case of common validation, prepared in view of the face-to-face meeting in Lisbon. I do hope it's a useful starting point for a fruitful discussion.

Good start to identify use cases. I added some comments for discussion to the review version.

#4 Updated by Michael Lutz about 5 years ago

I support both Francisco's and Freddy's comments:

  • We should support asynchronous execution of the validation request, perhaps in addition to synchronous execution
  • It is indeed interesting to consider the complete chain (MD, DS, NS), but we also need to support simpler validation requests (only MD, only NS, only DS)
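One way to express these two points is a request structure with an explicit scope selection and execution mode. The payload fields and helper below are purely hypothetical, not part of any agreed interface:

```python
# Hypothetical scope codes for the three validation stages:
# "md" = metadata, "ns" = network services, "ds" = dataset.
VALID_SCOPES = {"md", "ns", "ds"}


def build_validation_request(resource_url, scopes=("md", "ns", "ds"), asynchronous=True):
    """Build a validation request covering the full chain or any subset of it."""
    scopes = set(scopes)
    unknown = scopes - VALID_SCOPES
    if unknown:
        raise ValueError(f"unknown validation scopes: {sorted(unknown)}")
    return {
        "resource": resource_url,
        "scopes": sorted(scopes),
        "mode": "async" if asynchronous else "sync",
    }


# Full chain, asynchronous (the default):
full = build_validation_request("https://example.org/md.xml")
# Metadata only, synchronous:
md_only = build_validation_request("https://example.org/md.xml",
                                   scopes=("md",), asynchronous=False)
print(full["scopes"], md_only["mode"])
```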

 

#5 Updated by Daniela Hogrebe about 5 years ago

Thanks to Giacomo for initiating the use case discussion. I very much like the approach of what I would call interoperability testing: checking whether the components of the infrastructure are not only compliant with a specific specification but also working together properly. Actually, this very broad approach should consider all implementation scenarios that are included in the TG documents.

Some more thoughts...

  • It's of course only one of many use cases we should consider.
  • The user should decide whether to validate the whole chain or just one of the components.
  • Maybe we should also consider different user groups (data providers, service providers, national contact points/member states, EC, ...).

  

#6 Updated by Giacomo Martirano about 5 years ago

Thank you all for your positive feedback.

I fully agree that additional use cases taking into account several scenarios (synchronous vs. asynchronous, integrated vs. by-single-component, etc.) will better represent the whole "validation ecosystem" and therefore support different types of users.

In particular, regarding Freddy's comment FF4 in the reviewed version of the Word file: in our "ideal-fully-integrated" scenario, we envisaged the case in which the value of the Resource locator metadata element "points users to the location (URL) where the data can be downloaded".

#7 Updated by Christian Ansorge about 5 years ago

Dear Giacomo and dear colleagues,


Sorry for being silent for a longer period and not replying earlier.

Thank you Giacomo for taking the needed initiative to start the use case discussion. Reading the comments and the proposal, I basically agree with what was said, especially with the point that data validation might need to be done asynchronously for practical reasons.

I would like to ask for more information on the idea of "interoperability testing" Daniela mentioned. This is an interesting idea, but how could it possibly work, given that it depends very much on the client (software) side? Do you have a specific idea of how this could be done, or even experience with it?

Daniela Hogrebe wrote:

Thanks to Giacomo for initiating the use case discussion. I very much like the approach of what I would call interoperability testing: checking whether the components of the infrastructure are not only compliant with a specific specification but also working together properly. Actually, this very broad approach should consider all implementation scenarios that are included in the TG documents. ...

Thank you very much

Best regards

Chris

#8 Updated by Luis Bermudez about 5 years ago

Giacomo,

I also like the use case. As Michael said, there is a need to also be able to validate services and data without the catalog. I also envision a system where countries can register their own profiles/extensions, so the validator can properly report against the rules of a country.

Responding to Christian and Daniela,

We just released the first OGC client test: WMS 1.3 client test, which can help advance your idea of "interoperability testing".

http://www.opengeospatial.org/blog/2244

#9 Updated by Daniela Hogrebe almost 5 years ago

Christian Ansorge wrote:

I would like to ask for more information on the idea of "interoperability testing" Daniela mentioned. This is an interesting idea, but how could it possibly work, given that it depends very much on the client (software) side? Do you have a specific idea of how this could be done, or even experience with it?

Dear Christian,

sorry for answering late. In Germany, we have implemented some kind of "interoperability testing" in our Registry for checking INSPIRE monitoring. The workflow is the following (currently without data testing):

  1. start with the fileIdentifier of the metadata for the spatial data set
  2. check conformity of metadata with the validator (DE: GDI-DE Testsuite)
  3. request the full metadata from the discovery service (DE: Geodatenkatalog.de as central end point) and get the coupled services (view service and download service)
  4. check conformity of services with the validator (DE: GDI-DE Testsuite)

Thus, the only thing you need to know is the fileIdentifier of the metadata.
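Steps 3-4 hinge on reading the coupled-service links out of the metadata returned by the discovery service. A minimal sketch of that extraction follows, using a heavily trimmed, hypothetical service record (in real ISO 19139 records, `srv:operatesOn` is nested inside the service identification section, but the lookup below finds it at any depth):

```python
import xml.etree.ElementTree as ET

# Namespaces used in ISO 19139 metadata.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "srv": "http://www.isotc211.org/2005/srv",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical, heavily trimmed service record: srv:operatesOn carries the
# link from the service back to the dataset metadata (the coupling direction).
SERVICE_RECORD = """<gmd:MD_Metadata
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:srv="http://www.isotc211.org/2005/srv"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <srv:operatesOn xlink:href="https://catalog.example/csw?request=GetRecordById&amp;id=ds-123"/>
</gmd:MD_Metadata>"""


def coupled_datasets(service_xml):
    """Return the dataset-metadata URLs a service record declares via operatesOn."""
    root = ET.fromstring(service_xml)
    return [el.get(f"{{{NS['xlink']}}}href")
            for el in root.iter(f"{{{NS['srv']}}}operatesOn")]


print(coupled_datasets(SERVICE_RECORD))
```

The real workflow would first resolve the fileIdentifier against the discovery service (step 3) before applying an extraction like this, and would then feed each service into the validator (step 4).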

Cheers, Daniela

#10 Updated by Paul van Genuchten almost 5 years ago

I recently wrote a small blog post related to INSPIRE usability which, in my opinion, describes a typical scenario for an interoperability test.

https://www.geocat.net/bike-thefts-part-2

- query the catalog on a (required) keyword

- for each dataset result, get the connected service record

- for each service record, get the wfs/atom/sos/wcs endpoint

- for each endpoint determine the appropriate featuretype/atom-feed

- fire a relevant request to the appropriate featuretype/atom-feed

- verify the result
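The chain above can be modelled as an ordered list of steps where the first failure is recorded, which is exactly why ranking becomes hard when an early step fails. The step functions here are illustrative stubs, not a real CSW/WFS client:

```python
def run_chain(steps, context):
    """Run the ordered test steps; stop at, and report, the first failure."""
    for name, step in steps:
        try:
            context = step(context)
        except Exception as exc:
            return {"passed": False, "failed_step": name, "reason": str(exc)}
    return {"passed": True, "result": context}


def query_catalog(keyword):
    """Stub: pretend the catalog returned one dataset hit with no service link."""
    return [{"dataset": f"{keyword}-dataset", "service_record": None}]


def resolve_service_record(hits):
    """Stub: fail the way a real client would when the service link is missing."""
    record = hits[0]["service_record"]
    if record is None:
        raise ValueError("no service record linked to dataset")
    return record


steps = [
    ("query catalog by keyword", query_catalog),
    ("resolve service record", resolve_service_record),
    # a real run would continue: get endpoint, pick featuretype/feed, request, verify
]
print(run_chain(steps, "bike-thefts"))
```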

 

As stated in the blog post, there are many reasons why this may fail:

- those member states without a search hit either have no metadata or misconfigured metadata

- those dataset records where you cannot determine the service metadata and the service endpoint have broken metadata (or provide a website link instead of a direct connection)

- some endpoints may be inaccessible or may not properly link the featuretype in the GetCapabilities with the dataset metadata

- retrieving the data may cause issues (WFS version, performance, as-is data)

- the data itself may be invalid (not according to the model, GML version, ring order, etc.)

 

A challenge may be how to rank interoperability issues: a service may be hard to discover but otherwise of good quality, yet because the interoperability test fails at the first step, it could be flagged as non-existing.

#11 Updated by Giacomo Martirano almost 5 years ago

Ciao Paul.

By chance I read a few days ago your post and I found it extremely relevant and interesting!

With reference to the "interoperability issues ranking" question that you highlighted: based on my (still limited) experience, poor data harmonization is one of the biggest barriers, sitting one step before any accessibility issue.

Just to give another example: yesterday I discovered a dataset whose conformity metadata element was declared conformant with respect to 1089, but then, after "diving" down to a downloadable GML test subset, I saw that the reference schema is not an INSPIRE one!!

In general, I see a quite dangerous tendency: many data providers prefer shortcuts in their harmonization processes instead of pursuing truly interoperable usability of their resources.

#12 Updated by Paul van Genuchten almost 5 years ago

Michael Ostling pointed out a challenge in my test setup, and he's right; there are actually two challenges.

- How will a test client be aware of a service record being attached to a dataset record, since the relation is one-way, from service to dataset? Sure, most of us provide a data endpoint in the dataset metadata itself, but strictly speaking, the endpoint should be taken from the service metadata. Catalog systems usually create a link both ways as soon as they discover a link between a service and a dataset, but a testing client (or regular CSW client) will not be able to reverse that link. I have no suggestion for how to solve this, besides recommending that data providers also add a data link in their dataset metadata.

- How will a test client (or any client) know the protocol used at a specific endpoint? If it is a download service, it can be either WFS or Atom or a landing page. Some countries (Sweden, the Netherlands, ...) added a requirement to their metadata profiles, so data providers are required to provide a protocol value in the online resource. An alternative, as far as I know used by the JRC validator, is to probe the endpoint URL and deduce from the returned content (capabilities or Atom) what type of service it is.
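Probing as described might look like the content sniffing below. The markers checked are plausible heuristics for this sketch only, not what the JRC validator actually does:

```python
def sniff_download_service(body):
    """Guess a download-service type from a fetched endpoint body (sketch)."""
    head = body.lstrip()[:500].lower()
    if "wfs_capabilities" in head:
        return "wfs"
    if "<feed" in head and "http://www.w3.org/2005/atom" in head:
        return "atom"
    if "<html" in head or "<!doctype html" in head:
        return "landing page"
    return "unknown"


print(sniff_download_service('<wfs:WFS_Capabilities version="2.0.0"/>'))
print(sniff_download_service('<feed xmlns="http://www.w3.org/2005/atom"/>'))
```

A real probe would of course fetch the URL first (with a timeout) and would also need to handle exception reports, redirects, and servers that ignore unknown request parameters.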

Another issue mentioned by Michael is the following:

How do you actually find the Atom entry or WMS layer for the actual dataset you are interested in?
In a WMS or Atom feed you only have a link to the metadata record describing each dataset.
Do you have these MetadataURLs as queryables in your CSW, or how do you manage this?

As far as I know, this is where namespace:identifier should be used: the namespace:identifier mentioned in the capabilities/Atom feed should match the namespace:identifier in the dataset metadata. As an alternative, a client could get the metadata URL from the capabilities and check whether it is the same document as the original CSW metadata.
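A sketch of that matching, assuming identifiers are serialized as a single "namespace:code" string with the code after the last colon (the exact serialization varies in practice):

```python
def parse_identifier(raw):
    """Split a 'namespace:code' string on its last colon (format assumed)."""
    namespace, sep, code = raw.rpartition(":")
    if not sep:
        raise ValueError(f"no namespace in {raw!r}")
    return namespace, code


def entry_matches_dataset(entry_id, dataset_md_id):
    """A WMS layer / Atom entry matches a dataset when both halves agree."""
    return parse_identifier(entry_id) == parse_identifier(dataset_md_id)


print(entry_matches_dataset("https://ids.example/ns:ds-123",
                            "https://ids.example/ns:ds-123"))
```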
