Session E - Table 1
Challenges
- versioning – if you have only RDF
- graph partitioning (named graphs, VoID, reification etc)
- Aligning ontologies and vocabularies: what are the best tools (for example Protégé, or OntoME by the Data for History consortium)? Historical vs non-historical, event-based vs resource-based, etc. Interoperability of different ontologies: which inconsistencies might you introduce when merging ontologies and data? How to find the right ontology or ontologies for aligning existing datasets?
- How do you decide what is the best ontology for your project? For example when trying to choose the best ontology to describe time
- Combining archaeological data from different excavations: provenance, idiosyncratic data, different levels of granularity
- Building a community of users, also on the technical side. For example with the CWRC-Writer
- How to express different levels of academic interpretation around objects, for example around 3D objects
- How can people embed LOD in the research process (esp. with Recogito)? Are there workflows going from Recogito to EpiDoc, GIS, etc.?
- Tool replicability; often there are things that are quite close but not quite right for purpose so we build something new; or building database AND all the tools on top of it rather than just working with the data
- Need intuitive tools for teaching people who cannot code; workflows that use open software (e.g. FromThePage to Voyant to Recogito)
- Not promoting the idea of a closed virtual work environment (one single tool to do everything); instead, have a tool inventory and keep the tools pipeline-able, i.e. modular
- pipelines are long and there are many segments; problems replicate at different stages; as aggregator WHG allows contributors to enhance their data (e.g. reconciliation); how to manage an update process
- In the context of aggregation, provide tools for reconciliation and enrichment of data
- Modelling decisions have an impact on future work, but it is a challenge to make time for this work/consulting
- API design (for exposition of text or whatever is specific about the kind of data at hand)
- Sometimes the models and standards to enable the connections are not there yet, so it is up to the practitioners to find their way around it, possibly by connecting different tools
Clusters of challenges
Ontologies:
- building (decide which to use, where to extend? where to import whole or just cherry-pick parts)
- mapping/aligning (keep free from inconsistencies)
- inferencing (move from RDFS to OWL 2)
- tools for validation, quality control (e.g. Protégé)
- how to support complex queries, for example by building them up from snippets of SPARQL
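As a rough illustration of "building up from snippets", the sketch below joins small, individually testable triple patterns into one SPARQL query string. The `ex:` namespace, the property names and the `build_query` helper are hypothetical and only serve the example; nothing here was specified at the table.

```python
# Sketch: compose a SPARQL SELECT query from small reusable snippets.
# The prefixes and triple patterns are illustrative assumptions.

PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/vocab/",  # hypothetical project vocabulary
}

# Each snippet is a named triple pattern that can be tried out on its own.
SNIPPETS = {
    "label": "?item rdfs:label ?label .",
    "place": "?item ex:foundAt ?place .",    # hypothetical property
    "period": "?item ex:datedTo ?period .",  # hypothetical property
}


def build_query(snippet_names, variables, limit=None):
    """Join the chosen snippets into one SELECT query string."""
    unknown = [n for n in snippet_names if n not in SNIPPETS]
    if unknown:
        raise KeyError(f"unknown snippet(s): {unknown}")
    prefix_lines = [f"PREFIX {p}: <{iri}>" for p, iri in PREFIXES.items()]
    patterns = ["  " + SNIPPETS[n] for n in snippet_names]
    select = "SELECT " + " ".join("?" + v for v in variables)
    lines = prefix_lines + [select, "WHERE {"] + patterns + ["}"]
    if limit is not None:
        lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Start with one snippet, then grow the query step by step.
    print(build_query(["label"], ["item", "label"], limit=10))
    print(build_query(["label", "place"], ["item", "label", "place"]))
```

A learner can run the one-snippet query first and add further patterns once each step returns what they expect.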
Data complexity
- complex provenance information that needs to stay with object
- choosing ontologies to bring together heterogeneous legacy databases
- partitioning
- versioning, persistence, long-term preservation, also relevant to moving objects from one tool environment to another
- replicability of chain of actions performed on an object
Tools/flows
- input/output data/serialization formats (json-ld vs rdf+xml) and their conversions; limitations in tools (loss due to transformations performed by tools)
- pipelines for building LOD into researcher workflow
- overlaps/complementarity
- which fields the tools absolutely need in order to be interoperable and documented (e.g. which bits of provenance information, serialization format, ontologies used, location of the respective documentation, etc.)
- roundtripping/snakepit of data enhancement
- how and where to track provenance, when and how to refer to the original
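One possible way to keep a replicable chain of actions and a pointer back to the original is to log a checksum of the input and output of every pipeline step. The sketch below is a minimal illustration of that idea; the record fields and the `run_step` helper are assumptions made for the example, not an agreed format.

```python
# Sketch: record provenance for each step of a small processing pipeline.
# Field names in the log records are illustrative assumptions.
import hashlib
import json
from datetime import datetime, timezone


def digest(data: bytes) -> str:
    """Return the SHA-256 checksum of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def run_step(name, func, data: bytes, log: list) -> bytes:
    """Apply one pipeline step and append a provenance record to the log."""
    result = func(data)
    log.append({
        "step": name,
        "input_sha256": digest(data),
        "output_sha256": digest(result),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return result


if __name__ == "__main__":
    log = []
    original = b"Roma ;  Athenae ; Roma\n"
    # Two toy steps standing in for real tools in a pipeline.
    cleaned = run_step("normalise-whitespace", lambda b: b" ".join(b.split()), original, log)
    shouted = run_step("uppercase", bytes.upper, cleaned, log)
    print(json.dumps({"original_sha256": digest(original), "steps": log}, indent=2))
```

Because each record's input checksum equals the previous record's output checksum, a gap or an undocumented transformation in the chain becomes visible.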
Strategies
Overall Goal
Strategy on “Tools and Flows” aka Linked Pipes
-> build up a LinkedPipes working group
About Linked Pipes
- produce a feature matrix specifically related to LOD and promoting pipelines
- fields: name; link to source code; date of entry; entry-level tool?; consumes LOD?; produces LOD?; input formats (a section with individual columns: xml, xml+tei, xml+rdf, csv, json-ld (iiif, …), djvu+xml, geoJSON, plaintext, html, ttl, n3, graphml, jpg, audio, video, shapefiles, lpif, sql, sparql, shacl); output formats
- Encourage projects to enter their tools in a software registry (TaPoR or teresah.dariah.eu) and link to that general profile; also advise providing links to tool use cases and flows that include the tool; provide links from the main page to other pages (e.g. a wiki with a list of tools). Focus this on working towards pipelines: categories like consumption/production/bridging; data formats consumed/produced.
- More general information about the tools that we suggest should be entered at those other places: license; howtos; also free comments for things to point out about tools individually; project contacts (project director); technical contacts (programmers); institution(s)
- later (?): categorize by type, e.g. production, visualization; also bridging tools
- also produce links to the pipelines, wherever they are: another page on GitHub, a programming notebook, blog post, or whatever.
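To show how such a feature matrix could help with finding pipelines, the sketch below looks for pairs of tools where one tool's output formats overlap with another tool's input formats. The three tool entries are invented placeholders, and `candidate_links` is a hypothetical helper; only the `inputFormats`/`outputFormats` field names follow the template idea in these notes.

```python
# Sketch: find candidate pipeline links in a tool inventory by matching
# output formats of one tool against input formats of another.
# The tool entries below are invented placeholders, not real tool profiles.

def shared_formats(producer, consumer):
    """Formats that the producer can write and the consumer can read."""
    return sorted(set(producer["outputFormats"]) & set(consumer["inputFormats"]))


def candidate_links(tools):
    """Return (producer, consumer, formats) for every compatible pair."""
    links = []
    for producer in tools:
        for consumer in tools:
            if producer is consumer:
                continue
            common = shared_formats(producer, consumer)
            if common:
                links.append((producer["name"], consumer["name"], common))
    return links


if __name__ == "__main__":
    tools = [
        {"name": "Tool A", "inputFormats": ["PLAIN-TEXT"], "outputFormats": ["CSV"]},
        {"name": "Tool B", "inputFormats": ["CSV"], "outputFormats": ["JSON-LD"]},
        {"name": "Tool C", "inputFormats": ["JSON-LD"], "outputFormats": ["HTML"]},
    ]
    for producer, consumer, formats in candidate_links(tools):
        print(f"{producer} -> {consumer} via {', '.join(formats)}")
```

A shared format only suggests that two tools might be chained; whether data survives the hand-over without loss still has to be documented per workflow.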
List of participants in the Linked Pipes working group
(just those whose github ids are not listed below)
github ids etc.
name | github-id | twitter-id | short
Ben Brumfield | @benwbrum | @benwbrum |
Gimena del Rio | @Gimena | @gimenadelr |
Andreas Wagner | @awagner-mainz | @anwagnerdreas |
Frank Grieshaber | @wenamun | @wenamun |
Florian Thiery | @florianthiery | @fthierygeo | FT
Rainer Simon | @rsimon | @aboutgeo |
Valeria Vitale | @valeriavitale | @nottinauta | VV
Susan Brown | @susanbrown | @susanirenebrown |
Linked Pipes WG
- “Project Manager”: Florian
- “Committee”: Susan, Valeria, Rainer, Florian, Ben, Andreas, Frank, Gimena, Guenther
Commitments
- With regard to ontologies, we will try to resume the discussion in ADHO’s LOD SIG and see whether the Data for History consortium would be a good community to approach. Andreas will do the first, and Andreas will do the second (others are very welcome to chime in).
- We nominate Karl :-) to create a Linked Pipes repo for us in the Linked Pasts GitHub organization and call it Linked Pipes (short: Linked||). It should have a searchable page with the list of tools that we each commit to documenting; Florian will set it up. NOTE (FT): would recommend not using an md structure; we should use JSON templates (single documents submitted as pull-request files) in order to use nice frameworks like filter.js.
- NOTE (FT): A domain is registered, http://linkedpipes.xyz, which will contain the filter.js framework.
- First logo proposal by FT; VV will do the digital version.
CC BY 4.0 Linked Pipes WG
- Google group for communication inside the Linked Pipes Working Group (Ben will create it; Valeria will get the email addresses of the people in the group)
- Florian will work with Karl to set up the pages and be project manager of Linked||.
- The following folks will be admins on the repo and collaborate to approve new contributors:
- Will document tool(s):
- Susan
- Valeria and Gimena (Recogito)
- Andreas
- Loïc
- Florian
- Frank
- Guenther (Pointers to WissKi doc)
- Will document at least one workflow:
- Ben
- Valeria and Gimena (Recogito-related workflows)
- Andreas
- Florian (Alligator to AMT)
- Will prepare some kind of summary (blog post/white paper) for reporting at the next Linked Pasts meeting
- Susan will lead
- Florian will help ;-)
- will be technical admin(s) of the LinkedPipes Repo
- will be content admins of the LinkedPipes Repo
- Susan
- Valeria
- Gimena
- Florian
- Andreas
- Ben
- Rainer
- Frank
- The basic structure of the template:
{
"name": "",
"links": [],
"dateOfEntry": "",
"entryLevel": "{beginner:yes/no}",
"consumesLOD": "true/false",
"producesLOD": "true/false",
"inputFormats": ["JPG", "TIFF", "PNG", "N3", "RDF/XML", "XML-TEI", "CSV", "JSON-LD", "GEOJSON", "IIIF-JSON", "PLAIN-TEXT", "HTML", "TTL", "SHP", "X3D", "any 3D format", "SQL", "SPARQL", "SHACL", "CYPHER", "audio/video"],
"outputFormats": ["JPG", "TIFF", "PNG", "N3", "RDF/XML", "XML-TEI", "CSV", "JSON-LD", "GEOJSON", "IIIF-JSON", "PLAIN-TEXT", "HTML", "TTL", "SHP", "X3D", "any 3D format", "SQL", "SPARQL", "SHACL", "CYPHER", "audio/video"]
}
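If entries arrive as single JSON documents via pull requests, a small check could catch incomplete entries before they are merged. The sketch below is one possible approach, assuming that `consumesLOD` and `producesLOD` are stored as real JSON booleans rather than strings; the `check_entry` helper and the sample entry are hypothetical.

```python
# Sketch: check a tool entry against the fields of the template.
# Assumption: consumesLOD/producesLOD are JSON booleans, the other
# fields are strings or lists as shown in the template.
import json

REQUIRED = {
    "name": str,
    "links": list,
    "dateOfEntry": str,
    "entryLevel": str,
    "consumesLOD": bool,
    "producesLOD": bool,
    "inputFormats": list,
    "outputFormats": list,
}


def check_entry(entry: dict) -> list:
    """Return a list of human-readable problems; empty if the entry is fine."""
    problems = []
    for key, expected in REQUIRED.items():
        if key not in entry:
            problems.append(f"missing field: {key}")
        elif not isinstance(entry[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    for key in sorted(set(entry) - set(REQUIRED)):
        problems.append(f"unexpected field: {key}")
    return problems


if __name__ == "__main__":
    # Invented sample entry, for illustration only.
    raw = """{
      "name": "Tool A",
      "links": ["https://example.org/tool-a"],
      "dateOfEntry": "2020-01-01",
      "entryLevel": "beginner",
      "consumesLOD": true,
      "producesLOD": false,
      "inputFormats": ["CSV"],
      "outputFormats": ["JSON-LD"]
    }"""
    print(check_entry(json.loads(raw)) or "entry looks complete")
```

Such a check could run on each pull request so that reviewers only need to look at the content of an entry, not its shape.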
Title: Tools and Workflows #LinkedPipes
Notes, tools and other links to (maybe) integrate later into the inventory
In terms of pipelines: Notebooks (Jupyter, R)
curl - command line tool and library for transferring data with URLs
xTriples - Web services for extracting RDF from XML
X3ML Toolkit - Extracting RDF from other data formats
jq - a command-line JSON processor (lesson in Programming Historian)
SAMOD: an agile methodology for the development of ontologies
Labeling System - web app for creating and publishing terms with contextual validity as LOD
Academic Meta Tool - webapp for modelling vagueness in graphs including reasoning
Alligator - web app transforming a correspondence analysis into a relative chronology as RDF
WissKI Virtual Research Environment, see also
ResearchSpace environment
Pipeline RDF+XML to JSON-LD
Protégé
OntoME