Session E - Table 1

Title: Tools and Workflows #LinkedPipes

Challenges

Versioning, when all you have is RDF
Graph partitioning (named graphs, VoID, reification, etc.)
Aligning ontologies and vocabularies: what are the best tools (e.g. Protégé, OntoME by the Data for History consortium)? Historical vs. non-historical, event-based vs. resource-based, etc. Interoperability of different ontologies: what inconsistencies may you introduce when merging ontologies and data? How do you find the right ontology or ontologies for aligning existing datasets?
How do you decide which ontology is best for your project, for example when choosing an ontology to describe time?
Combining archaeological data from different excavations: provenance, idiosyncratic data, different levels of granularity
Building a community of users, including on the technical side (e.g. around CWRC-Writer)
How to express different levels of scholarly interpretation about objects, e.g. 3D objects
How people can embed LOD in the research process (especially with Recogito); are there workflows going from Recogito to EpiDoc, GIS, etc.?
Tool replicability: existing tools are often close to, but not quite right for, our purpose, so we build something new; or we build a database AND all the tools on top of it rather than just working with the data
Need intuitive tools for teaching people who cannot code; workflows that use open software (e.g. FromThePage to Voyant to Recogito)
Rather than promoting a closed virtual work environment (one single tool to do everything), maintain a tool inventory and keep the tools pipeline-able, i.e. modular
Pipelines are long and have many segments, and problems replicate at different stages. As an aggregator, WHG (World Historical Gazetteer) allows contributors to enhance their data (e.g. through reconciliation); how should the update process be managed?
In the context of aggregation, provide tools for reconciliation and enrichment of data
Modelling decisions have an impact on future work, but it is a challenge to make time for this work and for consulting
API design (for exposing text, or whatever is specific to the kind of data at hand)
Sometimes the models and standards needed to enable the connections do not exist yet, so it is up to practitioners to find their way around this, possibly by connecting different tools

Clusters of challenges

Ontologies:
- building (deciding which to use; where to extend; whether to import an ontology whole or cherry-pick parts)
- mapping/aligning (keep free from inconsistencies)
- inferencing (moving from RDFS to OWL 2)
- tools for validation, quality control (e.g. Protégé)
- how to support complex queries, e.g. by building them up from SPARQL snippets
Data complexity:
- complex provenance information that needs to stay with object
- choosing ontologies to bring together heterogeneous legacy databases
- partitioning
- versioning, persistence, long-term preservation, also relevant to moving objects from one tool environment to another
- replicability of the chain of actions performed on an object
Tools/flows:
- input/output data and serialization formats (JSON-LD vs. RDF/XML) and their conversions; limitations of tools (information loss due to transformations performed by tools)
- pipelines for building LOD into researcher workflow
- overlaps/complementarity
- which fields tools absolutely need in order to be interoperable and documented (e.g. which bits of provenance information, serialization format, ontologies used, location of the respective documentation, etc.)
- roundtripping/snakepit of data enhancement
- how and where to track provenance, when and how to refer to the original
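The Ontologies cluster mentions supporting complex queries by building them up from SPARQL snippets. A minimal Python sketch of that idea keeps named graph-pattern fragments and joins them into one SELECT query. The snippet names, the ex: namespace and the properties ex:foundAt and ex:date are hypothetical placeholders, not taken from any real vocabulary.

```python
"""Compose a SPARQL SELECT query from reusable snippets (illustrative sketch)."""

# Hypothetical prefix map; ex: is a placeholder namespace, not a real vocabulary.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/ns#",
}

# Hypothetical reusable graph-pattern snippets, keyed by name.
SNIPPETS = {
    "labelled": "?item rdfs:label ?label .",
    "found_at": "?item ex:foundAt ?site .",
    "dated": "?item ex:date ?date .",
}


def build_query(variables, snippet_names, filters=()):
    """Join the named snippets (plus optional FILTERs) into one SELECT query."""
    prefix_lines = [f"PREFIX {p}: <{iri}>" for p, iri in PREFIXES.items()]
    patterns = [SNIPPETS[name] for name in snippet_names]  # KeyError if unknown
    patterns += [f"FILTER ({expr})" for expr in filters]
    body = "\n  ".join(patterns)
    select = " ".join(f"?{v}" for v in variables)
    return "\n".join(prefix_lines) + f"\nSELECT {select}\nWHERE {{\n  {body}\n}}"


if __name__ == "__main__":
    print(build_query(["item", "label"], ["labelled", "found_at"],
                      filters=['LANG(?label) = "en"']))
```

Keeping snippets as named fragments lets a tested pattern be reused across queries; a fuller version would also need to handle variable-name clashes between snippets, which this sketch does not.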

Strategies

Overall Goal

Strategy on “Tools and Flows” aka Linked Pipes

-> build up a LinkedPipes working group

About Linked Pipes

Produce a feature matrix specific to LOD tools that promotes pipelines.
Fields: name; link to source code; date of entry; entry-level tool?; consumes LOD?; produces LOD?; input formats (a section with individual columns: xml, xml+tei, xml+rdf, csv, json-ld (iiif, …), djvu+xml, geoJSON, plaintext, html, ttl, n3, graphml, jpg, audio, video, shapefiles, lpif, sql, sparql, shacl); output formats.
Encourage projects to enter their tools in a software registry (TaPoR or teresah.dariah.eu) and link to that general profile; also advise them to link to use cases for the tool and to flows that include it; link from the main page to other pages (e.g. a wiki with a list of tools). Focus this on working towards pipelines: categories such as consumption/production/bridging; data formats consumed/produced.
More general information about the tools, which we suggest be entered in those other places: license; how-tos; free-text comments on individual tools; project contacts (project director); technical contacts (programmers); institution(s).
Later (?): categorize by type, e.g. production, visualization, bridging.
Also link to the pipelines, wherever they are: another page on GitHub, a programming notebook, a blog post, etc.
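If each entry records its input and output formats, checking whether two tools can be chained reduces to a set intersection. A minimal Python sketch, assuming format labels are spelled consistently across entries; the class ToolEntry, the functions shared_formats and find_bridges, and the tools ToolA, ToolB, ToolC and Converter are all hypothetical.

```python
"""Feature-matrix entries and a check for whether two tools can be chained."""
from dataclasses import dataclass


@dataclass
class ToolEntry:
    # A subset of the feature-matrix fields (hypothetical Python names).
    name: str
    source_code: str = ""
    entry_level: bool = False
    consumes_lod: bool = False
    produces_lod: bool = False
    input_formats: frozenset = frozenset()
    output_formats: frozenset = frozenset()


def shared_formats(upstream, downstream):
    """Formats that `upstream` can write and `downstream` can read."""
    return set(upstream.output_formats) & set(downstream.input_formats)


def find_bridges(upstream, downstream, inventory):
    """Names of inventory tools that read from `upstream` and write for `downstream`."""
    return [t.name for t in inventory
            if shared_formats(upstream, t) and shared_formats(t, downstream)]


if __name__ == "__main__":
    # Hypothetical tools; the format sets are illustrative, not real tool profiles.
    tool_a = ToolEntry("ToolA", output_formats={"XML-TEI", "PLAIN-TEXT"})
    tool_b = ToolEntry("ToolB", input_formats={"PLAIN-TEXT"},
                       output_formats={"JSON-LD"}, produces_lod=True)
    tool_c = ToolEntry("ToolC", input_formats={"GEOJSON"})
    converter = ToolEntry("Converter", input_formats={"JSON-LD"},
                          output_formats={"GEOJSON"}, consumes_lod=True)
    print(sorted(shared_formats(tool_a, tool_b)))  # ['PLAIN-TEXT']
    print(find_bridges(tool_b, tool_c, [tool_a, converter]))  # ['Converter']
```

A shared format is necessary but not sufficient: two tools may both read JSON-LD yet expect different ontologies, so this check is only a first filter, and ontologies used and provenance handling still need to be documented per tool.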

List of participants in the Linked Pipes working group

(just those whose github ids are not listed below)

name
Guenther Goerz

github ids etc.

name	github-id	twitter-id	short
Ben Brumfield	@benwbrum	@benwbrum
Gimena del Rio	@Gimena	@gimenadelr
Andreas Wagner	@awagner-mainz	@anwagnerdreas
Frank Grieshaber	@wenamun	@wenamun
Florian Thiery	@florianthiery	@fthierygeo	FT
Rainer Simon	@rsimon	@aboutgeo
Valeria Vitale	@valeriavitale	@nottinauta	VV
Susan Brown	@susanbrown	@susanirenebrown

Linked Pipes WG

“Project Manager”: Florian
“Committee”: Susan, Valeria, Rainer, Florian, Ben, Andreas, Frank, Gimena, Guenther

Notes, tools and other links to (maybe) integrate later into the inventory

Commitments

With regard to ontologies, we will try to resume the discussion on ADHO's LOD SIG and see whether the Data for History consortium would be a good community to approach. Andreas will do both (others are very welcome to chime in).
We nominate Karl :-) to create a repo for us in the Linked Pasts GitHub organization, called Linked Pipes (short: Linked||). It will have a searchable page listing the tools, which each of us commits to documenting; Florian will set it up. NOTE (FT): I would recommend against a Markdown structure; we should use JSON templates (one document per tool, contributed via pull requests) so that we can use frameworks like filter.js.
NOTE (FT): The domain http://linkedpipes.xyz has been registered and will host the filter.js framework.
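With one JSON document per tool, the searchable page can filter entries by facet. filter.js does this in the browser; the Python sketch below does not use its API, it only illustrates the same faceted-filter idea, e.g. for checking contributions offline. It assumes a hypothetical directory of per-tool *.json files and that filled-in entries store consumesLOD as a JSON boolean rather than the template's "true/false" placeholder string. The functions load_entries and filter_entries are hypothetical.

```python
"""Sketch of faceted filtering over one-JSON-file-per-tool entries."""
import json
import tempfile
from pathlib import Path


def load_entries(directory):
    """Read each *.json file in `directory` (assumed layout) as one tool entry."""
    return [json.loads(p.read_text(encoding="utf-8"))
            for p in sorted(Path(directory).glob("*.json"))]


def filter_entries(entries, consumes_lod=None, input_format=None):
    """Keep entries matching every facet given; None means 'ignore this facet'."""
    kept = []
    for entry in entries:
        if consumes_lod is not None and entry.get("consumesLOD") != consumes_lod:
            continue
        if input_format is not None and input_format not in entry.get("inputFormats", []):
            continue
        kept.append(entry)
    return kept


if __name__ == "__main__":
    # Two hypothetical entries written to a temporary directory for the demo.
    with tempfile.TemporaryDirectory() as tmp:
        samples = {
            "tool-a.json": {"name": "ToolA", "consumesLOD": True, "inputFormats": ["TTL"]},
            "tool-b.json": {"name": "ToolB", "consumesLOD": False, "inputFormats": ["CSV"]},
        }
        for filename, entry in samples.items():
            Path(tmp, filename).write_text(json.dumps(entry), encoding="utf-8")
        entries = load_entries(tmp)
        print([e["name"] for e in filter_entries(entries, consumes_lod=True)])  # ['ToolA']
```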
First logo proposal by FT; VV will do a digital version.

CC BY 4.0 Linked Pipes WG

Google Group for communication within the Linked Pipes Working Group (Ben will create it; Valeria will collect the email addresses of the people in the group)
Florian will work with Karl to set up the pages and will be project manager of Linked||.
The following folks will be admins on the repo and collaborate to approve new contributors:
- Will document tool(s):
  - Susan
  - Valeria and Gimena (Recogito)
  - Andreas
  - Loïc
  - Florian
  - Frank
  - Guenther (Pointers to WissKi doc)
- Will document at least one workflow:
  - Ben
  - Valeria and Gimena (Recogito-related workflows)
  - Andreas
  - Florian (Alligator to AMT)
- Will prepare some kind of summary (blog post/white paper) for reporting at the next Linked Pasts meeting
  - Susan will lead
  - Florian will help ;-)
- Will be technical admin(s) of the LinkedPipes repo
  - Florian
- Will be content admins of the LinkedPipes repo
  - Susan
  - Valeria
  - Gimena
  - Florian
  - Andreas
  - Ben
  - Rainer
  - Frank
The basic structure of the template:
CC BY 4.0 Linked Pipes WG

{
	"name": "",
	"links": [],
	"dateOfEntry": "",
	"entryLevel": "{beginner:yes/no}",
	"consumesLOD": "true/false",
	"producesLOD": "true/false",
	"inputFormats": ["JPG", "TIFF", "PNG", "N3", "RDF/XML", "XML-TEI", "CSV", "JSON-LD", "GEOJSON", "IIIF-JSON", "PLAIN-TEXT", "HTML", "TTL", "SHP", "X3D", "any 3D format", "SQL", "SPARQL", "SHACL", "CYPHER", "audio/video"],
	"outputFormats": ["JPG", "TIFF", "PNG", "N3", "RDF/XML", "XML-TEI", "CSV", "JSON-LD", "GEOJSON", "IIIF-JSON", "PLAIN-TEXT", "HTML", "TTL", "SHP", "X3D", "any 3D format", "SQL", "SPARQL", "SHACL", "CYPHER", "audio/video"]
}
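Because contributions arrive as individual JSON documents, each pull request could be checked against the template before merging. A minimal Python sketch; it assumes that filled-in entries use JSON booleans for consumesLOD/producesLOD, "yes"/"no" for entryLevel, and YYYY-MM-DD dates, none of which the template fixes yet. The function validate_entry and the sample entry ExampleTool are hypothetical; format labels are copied from the template, with SHACL spelled as in the feature-matrix list.

```python
"""Check a filled-in Linked Pipes entry against the template (illustrative sketch)."""
import json
from datetime import date

# Format labels copied from the template's lists (SHACL spelled as in the feature matrix).
ALLOWED_FORMATS = {
    "JPG", "TIFF", "PNG", "N3", "RDF/XML", "XML-TEI", "CSV", "JSON-LD",
    "GEOJSON", "IIIF-JSON", "PLAIN-TEXT", "HTML", "TTL", "SHP", "X3D",
    "any 3D format", "SQL", "SPARQL", "SHACL", "CYPHER", "audio/video",
}
REQUIRED_KEYS = ("name", "links", "dateOfEntry", "entryLevel",
                 "consumesLOD", "producesLOD", "inputFormats", "outputFormats")


def validate_entry(entry):
    """Return a list of problems found in `entry`; an empty list means it passed."""
    missing = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]
    if missing:
        return missing
    problems = []
    if not isinstance(entry["name"], str) or not entry["name"].strip():
        problems.append("name must be a non-empty string")
    if not isinstance(entry["links"], list):
        problems.append("links must be a list")
    try:
        date.fromisoformat(entry["dateOfEntry"])  # assumed YYYY-MM-DD convention
    except (TypeError, ValueError):
        problems.append("dateOfEntry must be an ISO 8601 date (YYYY-MM-DD)")
    if entry["entryLevel"] not in ("yes", "no"):  # assumed reading of "{beginner:yes/no}"
        problems.append("entryLevel must be 'yes' or 'no'")
    for key in ("consumesLOD", "producesLOD"):
        if not isinstance(entry[key], bool):  # real JSON true/false, not a string
            problems.append(f"{key} must be a JSON boolean")
    for key in ("inputFormats", "outputFormats"):
        if not isinstance(entry[key], list):
            problems.append(f"{key} must be a list")
            continue
        unknown = set(entry[key]) - ALLOWED_FORMATS
        if unknown:
            problems.append(f"{key} has unknown formats: {sorted(unknown)}")
    return problems


if __name__ == "__main__":
    # Hypothetical entry with a misspelled format label.
    sample = json.loads("""{
      "name": "ExampleTool", "links": ["https://example.org/exampletool"],
      "dateOfEntry": "2020-01-31", "entryLevel": "yes",
      "consumesLOD": false, "producesLOD": true,
      "inputFormats": ["CSV"], "outputFormats": ["JSON-LD", "SHAQL"]
    }""")
    print(validate_entry(sample))  # ["outputFormats has unknown formats: ['SHAQL']"]
```

Returning every problem rather than stopping at the first lets a contributor fix an entry in one round; the same check could run on each pull request, though wiring that up is outside this sketch.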