Data Discovery (Exchange Network Discovery Service - ENDS)
From Exchange Network Wiki
Discovery Defined
Discovery refers to the ability to use a defined set of protocols to find out what data is available, who has it, how to ask for it, and how to interpret it once it is retrieved. For web services in general, the Universal Description, Discovery, and Integration (UDDI) services enable data discovery. For the Exchange Network, discovery is accomplished through the Exchange Network Discovery Services (ENDS). ENDS is implemented as a set of services that comply with the Exchange Network protocols and are accessed through "Query".
ENDS Background
The initial concept for Exchange Network Discovery Services was imagined and developed by Glen Carr of Oregon DEQ. Glen was asked to consider the question, "How can we build a web site that can be used to 'browse' the Exchange Network?" After considering the Exchange Network specifications and protocols, along with existing discovery mechanisms, Glen designed a set of services to extend and enhance existing capabilities. He built a prototype Network Browser in parallel with the services, and used that work to inform the requirements for ENDS. In May 2005, the first functional prototype of ENDS and an accompanying browser were released. By the summer of 2006, the ENDS schema and Flow Configuration Document (FCD) had been formally adopted by the Network Operations Board. The Board announced its intent to provide ENDS as a central service of the Exchange Network.
Web Service Discovery and the Exchange Network
Web standards bodies, including the World Wide Web Consortium (W3C) and OASIS, have defined a set of discovery standards for web services. This powerful set of standards makes it possible to simply "ask the Internet" what is there. UDDI is the top-level registry standard that allows one to find things like which servers offer web services, where they are located, their native language, and how to retrieve their Web Services Description Language (WSDL) file. The WSDL in turn describes what services are available, and how to call them.
In the context of the Exchange Network, all nodes use the same WSDL file to promote interoperability. This creates some very powerful capabilities (such as Solicit: request data and come back for it later), but it disables most of the power of the WSDL in the process. Using UDDI and the WSDL, one can find out (for example) that any node will support a service called "Query", which needs a service name, a NAAS token, and some parameters. Nothing in that specification will tell you whether a node has water monitoring data, which specific parameters to use (including the critical service name), or what form the returned data will take. Even when implemented as expected, WSDL and UDDI offer little help in deciding what to put in a parameter field to return valid data, let alone in forming a meaningful filter query.
Discovery on the Network
The Network standard WSDL defeats much of standard data discovery, but at the same time adds new power. The "Exchange Network Browser" (a web site/tool) was imagined as a site that could look up:
- What nodes exist on the Exchange Network
- What services each node offers
- What parameters are required for a service: e.g. GetWaterMeasurements (Agency, County, analyte, date range)
- What values are allowed for each parameter: e.g. What are the valid county entries for the Washington node?
- What stylesheets exist to view the data in a meaningful form, and where they can be downloaded
- Additional information relevant to a specific node.
The target was a web application that is both dumb (it "knows" nothing of the data it is accessing) and remarkable (it can construct an onscreen form on the fly to input parameters and retrieve data). What makes the browser special is that it can find and access any new service that is registered in ENDS without modification.
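As a rough illustration of how such a "dumb" browser could build a form from discovered metadata alone, the sketch below turns a list of parameter descriptions into generic form fields. The ParameterInfo class, the build_form_fields function, and the sample county values are hypothetical and are not taken from the ENDS schemas.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParameterInfo:
    """Hypothetical description of one service parameter (not an ENDS type)."""
    name: str
    required: bool = False
    allowed_values: List[str] = field(default_factory=list)


def build_form_fields(parameters: List[ParameterInfo]) -> List[dict]:
    """Turn parameter metadata into generic form-field descriptions."""
    fields = []
    for p in parameters:
        fields.append({
            "label": p.name,
            "required": p.required,
            # A parameter with a known value list becomes a drop-down;
            # anything else falls back to a free-text input.
            "widget": "select" if p.allowed_values else "text",
            "options": list(p.allowed_values),
        })
    return fields


# Sample metadata for illustration only.
params = [
    ParameterInfo("County", required=True, allowed_values=["King", "Pierce"]),
    ParameterInfo("Analyte"),
]
form = build_form_fields(params)
```

Because the function only inspects metadata, a newly registered service would need no change to the browser code, which is the property the paragraph above describes.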
ENDS Implementation
Version 1.3
The first production version of ENDS (still operating as of December 2008) was ENDS 1.3. The service/registry was hosted by Oregon DEQ, then briefly transferred to EPA. Both implementations suffered from a lack of registry entries. The only way to register services was via a set of Exchange Network services that deliver a well-formed XML description of the services to the ENDS server. There was no requirement or incentive to register services, so entries were limited to those from persons willing to invest effort in contacting node owners and getting their service details to the ENDS manager (Glen Carr, Oregon) and from those who were actively using the service (EnfoTech, under contract to New Jersey DEP). In early 2007, Oregon DEQ resumed hosting the official ENDS implementation, in an attempt to get a cleaner and more useful set of entries.
ENDS 1.3 Architecture
The ENDS v1.3 exchange defines several Query services to retrieve a list of nodes and services:
- GetDataServices - returns ENDS_DataServices_v.1.3.xsd
- GetSchemaList - returns ENDS_NetworkSchemaList_v.1.3.xsd
- GetRequestList - returns ENDS_RequestList_v.1.3.xsd
- GetStyleSheetList - returns ENDS_NetworkStyleSheetList_v.1.3.xsd
- GetParametersList - returns ENDS_ParameterList_v.1.3.xsd
- GetExampleList - returns ENDS_NodeExampleList_v.1.3.xsd
The six common data service payloads can be used to update data within the exchange just as they are used to consume data about the exchange. To update ENDS, replace the word Get with Set in the list above and perform a submit instead of a query. The FCD states that the ENDS database would merge in the changes (add or update) as needed. There was no mechanism defined for deleting nodes, services, or parameters.
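The Get-to-Set naming rule described above can be sketched in a few lines. The Get names are those listed above; the to_set_service helper is hypothetical, and the resulting Set names are derived purely from the stated rule rather than copied from the FCD.

```python
# Query service names as listed for ENDS v1.3.
GET_SERVICES = [
    "GetDataServices",
    "GetSchemaList",
    "GetRequestList",
    "GetStyleSheetList",
    "GetParametersList",
    "GetExampleList",
]


def to_set_service(get_name: str) -> str:
    """Derive the Submit-side name by replacing the leading 'Get' with 'Set'."""
    if not get_name.startswith("Get"):
        raise ValueError(f"not a Get service: {get_name!r}")
    return "Set" + get_name[len("Get"):]


SET_SERVICES = [to_set_service(name) for name in GET_SERVICES]
```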
This version of ENDS defined each Service and Parameter as global. This means that there can only be one definition of a "PermitName" parameter for a data service across the entire Network. The FCD states that the ENDS administrator would need to reconcile conflicting definitions. Nodes then link to Services in a many-to-many relationship in the ENDS database. In the same manner, two Services can share the same Parameter.
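A toy model may make the consequences of global definitions easier to see. This is only a sketch under assumed names: the GlobalRegistry class, its methods, and the sample node and service names are hypothetical, and the real ENDS database design is not described here.

```python
class GlobalRegistry:
    """Toy model of a registry where parameter definitions are global."""

    def __init__(self):
        self.parameters = {}        # parameter name -> definition text
        self.service_params = {}    # service name -> set of parameter names
        self.node_services = set()  # (node name, service name) pairs

    def define_parameter(self, name, definition):
        # Only one definition per parameter name is allowed network-wide.
        existing = self.parameters.get(name)
        if existing is not None and existing != definition:
            raise ValueError(f"conflicting definition for parameter {name!r}")
        self.parameters[name] = definition

    def add_parameter_to_service(self, service, parameter):
        # Two services may share the same globally defined parameter.
        if parameter not in self.parameters:
            raise KeyError(parameter)
        self.service_params.setdefault(service, set()).add(parameter)

    def link_node(self, node, service):
        # Many-to-many: a node offers many services, a service lives on many nodes.
        self.node_services.add((node, service))

    def nodes_offering(self, service):
        return sorted(n for n, s in self.node_services if s == service)


registry = GlobalRegistry()
registry.define_parameter("PermitName", "Name of the permit")
registry.add_parameter_to_service("GetPermits", "PermitName")
registry.add_parameter_to_service("GetFacilities", "PermitName")
registry.link_node("NodeA", "GetPermits")
registry.link_node("NodeB", "GetPermits")

try:
    # A second group proposing a different meaning triggers a conflict,
    # which in ENDS 1.3 an administrator would have had to reconcile.
    registry.define_parameter("PermitName", "Name of the permit holder")
    conflict = False
except ValueError:
    conflict = True
```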
Issues
The global definitions for Services and Parameters immediately started causing problems, because groups that piloted the use of ENDS had conflicting definitions for FRS v2.3 exchange services. One group objected heavily to the requirement that Service and Parameter definitions be unique. Some of the limitations behind those objections have since been overcome, which greatly weakens the arguments that were made. For example, with the addition of the dataflow parameter in Node 2.0, service names no longer need to be unique across the entire network.
The primary underlying problem with ENDS 1.3 continues to be the very limited scope of data publishing on the network. Data that is only available when the owner initiates a "Submit" action is inherently not discoverable.
Version 2.0
ENDS version 2.0 was developed primarily because, with the advent of Node 2.0, it became necessary to specify the node version along with the other ENDS information. The Node 2.0 specification also offered an opportunity to automate some of the effort involved in registering services. Every Node 2.0 must honor a "GetServices" primitive request that returns a file listing the services provided. ENDS 2.0 will use this service to populate ENDS automatically. Recognizing that many node owners would not take the time to provide valid values lists for services, the ENDS 2.0 FCD provides an optional "GetServiceDetails" method defining node specifics, including the allowable values for parameter queries. The NTG approved the ENDS 2.0 FCD in November 2008. It is being implemented at EPA to replace ENDS 1.3.
ENDS 2.0 Technical Architecture
ENDS 2.0 consists of three components:
- ENDS Node - A central node that offers services that allow network partners to query the central repository. It also allows node administrators to update the central list of nodes, exchanges, and services. The GetServices_v2.0.xsd schema is the format in which the ENDS central directory returns data.
- ENDS Web Site - A web site to manually view and edit nodes, exchanges, and services. It is the human interface to the ENDS Node listed above.
- Node Service Description Specification - A specification for how nodes publish a list of the services they implement.
One major difference between ENDS v2.0 and ENDS v1.3 is that there is no longer a single standardized list of network services. Each node supplies a list of its own services, which may or may not align with those of other nodes that implement the same exchange. See the ENDS Service section below for more information.
Each component is described in the following sections.
ENDS Node
The ENDS v2.0 node is hosted by EPA. It can be configured to intermittently call the GetServices method of each node on the network to retrieve an updated list of services from each. The information would be used to update the central database with the services offered by each node. This function has yet to be implemented.
The ENDS v2.0 Specification states that Schematron will be used to validate that services returned by a node are described consistently with the specification for the service in the exchange's FCD. This function has not yet been implemented.
The ENDS Node implements two services:
- GetNodeServiceList (Query) - Allows for querying for a list of services, filterable by node name, dataflow, service name, and node version. Returns an XML file conforming to the NSDL XML schema (see below).
- Service Refresh (Submit) - Allows for updating the ENDS repository with details for a given node.
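The four filters named for GetNodeServiceList can be pictured as optional criteria applied to a list of registry entries. The sketch below is an in-memory illustration only: the ServiceEntry class, the filter_services function, and the sample entries are hypothetical, and it does not reproduce how the ENDS Node actually receives or evaluates these filters.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ServiceEntry:
    """Hypothetical registry row; field names mirror the four filters."""
    node_name: str
    dataflow: str
    service_name: str
    node_version: str


def filter_services(
    entries: List[ServiceEntry],
    node_name: Optional[str] = None,
    dataflow: Optional[str] = None,
    service_name: Optional[str] = None,
    node_version: Optional[str] = None,
) -> List[ServiceEntry]:
    """Return entries matching every filter that was supplied (None = no filter)."""
    criteria = {
        "node_name": node_name,
        "dataflow": dataflow,
        "service_name": service_name,
        "node_version": node_version,
    }
    return [
        e for e in entries
        if all(v is None or getattr(e, k) == v for k, v in criteria.items())
    ]


# Sample entries for illustration only.
entries = [
    ServiceEntry("NodeA", "FRS", "GetFacility", "2.0"),
    ServiceEntry("NodeA", "WQX", "GetWaterMeasurements", "2.0"),
    ServiceEntry("NodeB", "FRS", "GetFacility", "1.1"),
]
frs_v2 = filter_services(entries, dataflow="FRS", node_version="2.0")
```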
ENDS Web Site
The ENDS v2.0 web site is located at https://ends2.epacdxnode.net/.
The ENDS web site uses NAAS authentication for users to log in. The current ENDS implementation uses Test NAAS credentials. A production version of ENDS is not yet available.
Node Service Description Specification
The ENDS v2.0 Specification implements the GetServices_v2.0.xsd XML schema, referred to in the specification as Node Service Description Language (NSDL). The schema has one root element, NetworkNodes, containing a list of zero or more nodes. Each node can include one or more Services. In turn, each Service can contain zero or more parameters, stylesheets, and properties. An instance document conforming to this schema should be returned when the GetServices method defined in the Node v2.0 WSDL is invoked by a remote partner.
The ENDS v2.0 Specification also describes a second service, GetServiceDetails, which is optional for nodes to implement. The results are returned in a schema format called Data Element Description Language (DEDL). This schema describes node-specific details for a given service. The DEDL schema has not been formally published, and it is believed that no nodes implement this service yet.
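To make the nesting concrete, the sketch below walks a small document shaped like the hierarchy described above. Only the root element name NetworkNodes comes from the text; the child element names (Node, Service, Parameter, StyleSheet), the name and href attributes, and the absence of an XML namespace are all assumptions made for illustration and should be checked against GetServices_v2.0.xsd.

```python
import xml.etree.ElementTree as ET

# Illustrative instance only: element and attribute names below the root
# are assumed, not taken from the published schema.
SAMPLE = """<NetworkNodes>
  <Node name="ExampleNode">
    <Service name="GetWaterMeasurements">
      <Parameter name="County"/>
      <Parameter name="Analyte"/>
      <StyleSheet href="example.xsl"/>
    </Service>
  </Node>
</NetworkNodes>"""


def summarize(xml_text: str) -> dict:
    """Map each node name to its services and their parameter names."""
    root = ET.fromstring(xml_text)
    if root.tag != "NetworkNodes":
        raise ValueError(f"unexpected root element: {root.tag}")
    summary = {}
    for node in root.findall("Node"):
        services = {}
        for service in node.findall("Service"):
            services[service.get("name")] = [
                p.get("name") for p in service.findall("Parameter")
            ]
        summary[node.get("name")] = services
    return summary


summary = summarize(SAMPLE)
```

A client such as the Network Browser could use a structure like this to list each node's services and the parameters each one expects.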
