ENTAILab is the core infrastructural service and research centre of the New Data Spaces programme.

It is dedicated to the use of existing research infrastructures, their advancement and the demand-oriented generation of a new research infrastructure for the needs of the InfPP projects and the development of new data spaces. ENTAILab aims to create a unique infrastructure for research-based innovations in the field of survey data and beyond. ENTAILab consists of a set of four infrastructure measures that provide a successful and supportive environment for research within and across the projects of InfPP. Together, they will systematically feed results back into different kinds of panel applications and studies and social science research in general.   

Measure 1: Build on and Develop Existing Panel Studies

Principal Investigators: Stefan Liebig and Sabine Zinn

The InfPP initiative aims to expand data access for social science research by utilizing existing large-scale surveys and infrastructures such as the SOEP Innovation Panel, the GESIS Panel, the GESIS Access Panel for digital behavioural data, and the NEPS Next cohorts. These platforms enable innovative surveys and experiments, including the selection of special-interest groups, linking different data forms, and incorporating technical innovations such as app-based surveys and API data collection. Additionally, joint surveys and pilot studies can be developed based on research needs and target populations.

Two central tasks are at the core of the Build on and Develop Existing Panel Studies work package. First, novel modules, such as new survey content or technical innovations, must be integrated into ongoing panel studies. This requires close coordination between the projects and infrastructures to ensure adherence to established standards. Second, the impact of these innovations on the overall quality of the data must be evaluated, considering factors such as participant engagement and non-response rates. This assessment is particularly important for long-running panels, as design changes can affect participant retention and data quality across different survey waves. Addressing these tasks ensures that the innovations contribute meaningfully to empirical social research. 

Measure 2: Research-Driven Infrastructure for Advanced Survey-Related Data (CIRCLET)

Principal Investigator: Alexander Mehler

ENTAILab involves the implementation, testing and provision of a strong research-oriented tool in the form of a research-driven infrastructure for advanced survey-related data (CIRCLET). CIRCLET will ensure the reproducibility and interoperability of methods working with survey data. This is done through a multi-phase strategy that drives, scales and evaluates the development of methods based on new survey data over the course of InfPP. CIRCLET develops, tests and provides generic services to open up new data and methodological horizons according to the evolving needs of InfPP.

Ideally, all InfPP projects will use CIRCLET to share data and methods, test their reproducibility and interoperability, and enrich their methods. Using the Docker Unified UIMA Interface (DUUI), CIRCLET provides a distributed multi-server infrastructure that allows InfPP to containerize methods and run them in server clusters so that they become reusable. This contributes to the coherence of all InfPP projects and makes innovations available for reuse outside the innovating project as quickly and extensively as possible. Using CIRCLET as a common platform will substantially strengthen collaboration between projects.

CIRCLET is research-driven: it focuses on the needs of the InfPP for which there is currently no or insufficient provision, and it goes beyond what is offered by the NFDIs, with which the InfPP collaborates in order to maximize synergies. CIRCLET includes several means to model and enhance the survey data research cycle: a multimodal data acquisition system, a machine learning system that leverages large language models and related technologies, and a hub technology for securing reproducibility.

Measure 3: Data Protection and Ethics

Principal Investigator: Reinhard Pollak

Data protection and data ethics set crucial boundaries for research projects that use survey data, Big Data, AI, and register data, and they cut across all four research areas of the InfPP. Projects often face the challenge of balancing the requirements of data protection regulation with the demands of specific research designs, while meeting ethical standards of research. Projects must clarify which data should be analysed and which data processes are planned without jeopardizing information privacy.

The application of AI methods raises questions about the explainability and reconstructability of the results, which in turn may serve as inputs to further data analysis, blurring the algorithmic origins of the original results. Moreover, the original training data (e.g., texts or images) may be partially reconstructable from the trained models, which touches on issues of data privacy. InfPP needs to deal with questions of anonymization of data and of algorithms. At the same time, we need to adhere to rather unspecified ethical standards, which leaves room to explore, discuss and develop standards that match both the valid and valuable concerns regarding privacy, information sovereignty, deception, and integrity of study subjects and the research interests of the InfPP project members.

The work package has three goals: 1) providing support for the individual InfPP projects through workshops, moderated discussion groups, and individual counselling; 2) networking with international and national researchers, organizations, and associations on data protection issues and becoming a research-based, stimulating voice in the national and international debate; 3) exploring ethical challenges of current and future InfPP projects and developing guidelines for future research in the research areas of InfPP.

Measure 4: Results for Future Data Spaces and Open Science

Principal Investigators: Corinna Kleinert & Cordula Artlet

The New Data Spaces programme aims to produce new data, to generate knowledge and results on data, designs and instruments, and to provide tools, methods and guidelines that will benefit and systematically feed into large-scale panel studies, linked data, as well as smaller-scale data generation projects and applications. Thus, the programme is not limited to large-scale survey and panel studies, but will address future social science research as a whole.

The New Data Spaces projects will contribute to these goals, but will need support to integrate their findings and results with those of other projects, disciplines, and international programmes, to reflect systematically on the reproducibility of their results, or to scale up their methods and procedures to large-scale data and surveys. This support is needed because the previous research that forms the basis of the New Data Spaces programme’s four research areas has often been siloed or fragmented by discipline, methodological approach, or topic and is, therefore, not easy to synthesize and evaluate. To address these challenges, ENTAILab Measure 4 is devoted to collecting, transferring and diffusing knowledge within and beyond the New Data Spaces programme. We will do so by compiling existing research findings and programmes, systematizing findings from the InfPP projects, supporting the reproducibility and reusability of the projects’ outcomes, presenting and explaining tools and outcomes to non-expert and non-technical audiences, identifying training needs within the project, and assisting in conceptualizing the second phase of the programme.