This page describes the setup of our 2020 NLIWoD workshop, which takes place as an online event.

Requirements for authors

The authors need to prepare a video recording of their talk according to their submission type:

  • Full article submissions: max. 15 minutes + 5 minutes discussion

  • Short article submissions: max. 10 minutes + 5 minutes discussion

  • Notes of all Q&A sessions will be made public after the workshop on the website. The Zoom stream will not be recorded.

Time Table

Half-day Workshop at the 2nd 2020 13.30 - 17.00 CET (Berlin time zone)

Program and Notes


  • NLIWOD - Keynote: Bhaskar Mitra - Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond

    • If you use proprietary datasets, no one can reproduce your results

    • Neural Models are now in 79% of SIGIR papers

    • We still lack public IR benchmarks with large scale training data

    • Even industrial teams now use BERT on a day-to-day basis

    • Challenges:

      • Single-Deadline/Single-Submission Challenges (such as TREC)

      • Leaderboard benchmarking leads to overfitting

      • Approaches have to work on more than one dataset

      • Bender Rule: English is not the only language

      • Cross-flow between communities needed!

    • Questions:

      • Are we hitting a glass ceiling with current ML models? A: More general-purpose models

      • What advice would you give to researchers working on other languages, where building benchmarks is even harder? A: Start building benchmarks and gather a community around them

  • NLIWOD - Chatbot For Interacting with SDMX Databases - Guillaume Thiry, Ioana Manolescu and Leo Liberti

    • Ranking of queries/datasets can be supported by metadata

    • Usage of DataCubes still relevant

    • Real-world use on the horizon - OECD

    • Questions:

      • What is important to your approach to generalize?

  • NLIWOD - Verbalizing the Evolution of Knowledge Graphs with Formal Concept Analysis - Martin Arispe, Mayesha Tasnim, Damien Graux, Fabrizio Orlandi and Diego Collarana

    • Formal Concept Analysis is used to find hierarchies in real-world KGs

    • Questions:

      • Which verbalisation functions did you use? A: We are currently in the phase of trying out different ones.

      • What is the performance? A: FCA can already deal with big data.

  • PROFILES - Keynote: Prof. Dr. Felix Naumann - Data Profiling in the Relational World

    • Commercial tools are still not there yet

    • How to efficiently find good dependencies? Algorithms!

    • Questions:

      • Have you considered how users can be involved to quickly reduce the search space? A: Show results to users as they are created, as early as possible.

      • Databases usually follow the closed-world assumption. What to consider for your proposed algorithms if that is not given?

      • What happens in the presence of NULLs? A: Algorithms can deal with them, but there are challenges!

  • PROFILES - An Architecture for Cell-Centric Indexing of Datasets - Lixuan Qiu, Haiyan Jia, Brian Davison and Jeff Heflin

    • Table indexes are typically created on the table-level or column-level

    • Usage of cell-centric index that involves metadata, cell values and other values (context) in the respective row

    • Question:

      • How flexible is your cell indexing approach towards enriching the set of indexed fields (title, context,…), in particular w.r.t. dataset profiles? A: ElasticSearch easily allows addition of further search fields.

  • PROFILES - A Template-Based Approach for Annotating Long-Tail Datasets - Daniel Garijo, Ke-Thia Yao, Amandeep Singh and Pedro Szekely

    • Table annotation typically requires expertise in semantic technologies

    • Users add metadata to the table to support the transformation of the table into a KG

    • Question:

      • Which Wikifier do you use? How do you understand columns? A: An external service based on Wikidata, but that is not the bottleneck; property linking, for example, is.

  • NLIWOD - Generating Knowledge Graphs from Unstructured Texts: Experiences in the eCommerce Field for Question Answering - Diogo Sant’Anna, Rodrigo Caus, Lucas Ramos, Victor Hochgreb and Julio Cesar Dos Reis

    • Question Answering in GoBots can increase sales by 120%

    • Entity and Intent based QA systems

    • Questions:

      • What do you use for training? A: Proprietary data and the Rasa framework.

      • Your precision is really high, how about the recall? A: We did measure it; please see the paper.

  • NLIWOD - Generating Grammars from lemon lexica for Questions Answering over Linked Data: a Preliminary Analysis - Philipp Cimiano, Basil Ell, Viktoria Benz and Mohammad Fazleh Elahi

    • Question Grammar generation from a lemon lexicon based on LTAG grammars and LexInfo ideas

    • Advantage: portability between domains without training data and auto-completion

    • Question:

      • Have you thought of combining a word embedding model with the lemon lexicon? A: We will look into it, since it can also add synonyms in high-dimensional space.

Community Discussion


  • More people can participate due to lower entrance barrier

  • Split it into two events so that people from all time zones can participate more easily

  • How can we open the mic better so that more people ask more questions?

  • Live conference is preferred over pre-recorded videos in the workshops

