GermaParl Corpus of Plenary Protocols (Q6318)

From MaRDI portal
Dataset published at Zenodo repository.

    Statements

    The GermaParl Corpus of Parliamentary Protocols has been prepared in the PolMine Project and covers debates in the German Bundestag since the first meeting on September 7, 1949. GermaParl v2.2.0-rc2 covers debates until June 28, 2024. It prepares a forthcoming public release. The most important new feature of GermaParl v2.2.0 is the inclusion of an annotation layer with DBpedia URIs.

    GermaParl is a quality-tested resource on German parliamentarism. Newly added material for recent parliamentary sessions has not yet been tested as comprehensively as the data in previous public releases. This is why we call on beta users to be aware of remaining errors in the data and to contribute to the quality of the resource by giving feedback. Beta users of GermaParl v2.2.0-rc1 need to request access to the corpus and are kindly asked to give feedback on any issues they encounter, so that GermaParl v2.2.0 will be a trustworthy, high-quality resource for research on parliamentary proceedings. The beta release includes a linguistically annotated, indexed version (Corpus Workbench / CWB data format) of GermaParl.

    Beta users are requested to proceed as follows:

    Request access: Click the respective button on this page. A personal invitation to serve as a beta user is not required to be eligible. A short and telling note on the research interest you associate with GermaParl Beta will help us to make a quick decision.

    Confirm Email: You will then receive an email from Zenodo to verify your email address. It may take a while (up to an hour) until you receive this message. If you still do not find it in your inbox, check the spam folder of your mail account. Confirm your email address.

    Confirmation of access: We need to confirm data access manually. We will consider incoming requests on a continuous basis, but please allow 2-3 days for a response.

    Join GitHub issue tracker: To process feedback systematically, we use the issue tracker of a private GitHub repository.
    We will invite you to be a collaborator of this repository. To be able to invite you to the GitHub repository, we will ask you to provide us with your GitHub account. Please consider creating a GitHub account if you do not yet have one.

    Download and install corpus: Once we have confirmed data access, Zenodo will send you an email with a download link. Please retain this download link. If you work with the CWB variant of GermaParl, we suggest installing the corpus using functionality included in the R package cwbtools, using the following code. Insert the download link where indicated. A stable internet connection is advisable: the size of the corpus tarball is ~2.6 GB.

        # insert download link
        zenodo_url <- "INSERT-ZENODO-LINK-HERE"

        # install cwbtools
        install.packages("cwbtools")

        # install corpus
        library(cwbtools)
        tmp_tarball <- zenodo_get_tarball(url = zenodo_url)
        corpus_install(tarball = tmp_tarball)

        # install polmineR
        install.packages("polmineR")

        # check installation
        library(polmineR)
        corpus("GERMAPARL2")

    If you have not used CWB indexed corpora before, the installation process will suggest and create directories for data storage. This involves defining the environment variable CORPUS_REGISTRY permanently for future R sessions.

    Explore GermaParl and give feedback: Given the size of the data, it is impossible to check the data manually throughout. Remaining errors are to be expected. Your feedback will help us to prepare a consolidated official release of the updated version of GermaParl!

    Acknowledgements: We gratefully acknowledge funding from the German National Research Data Infrastructure (Nationale Forschungsdateninfrastruktur / NFDI). Funding from KonsortSWD (project number 442494171) has advanced the data preparation tool set to facilitate the robust addition of further annotation layers (such as named entities) to large corpora.
    This is instrumental for linking parliamentary data with other data. KonsortSWD is funded by the German Research Foundation (DFG) as part of the National Research Data Infrastructure Germany (Nationale Forschungsdateninfrastruktur, NFDI) under project number 442494171. Funding from the Text+ consortium is instrumental for updates of the corpus, quality control and keeping data formats up to date with current and future developments. Text+ is funded by the German Research Foundation (DFG) as part of the NFDI under project number 460033370.

    The data quality of GermaParl we are able to offer at this stage has benefitted significantly from a cooperation with the SOLDISK project at the University of Hildesheim, and from comprehensive manual quality control of the data carried out by the SOLDISK team. Very special thanks go to Hannes Schammann, Max Kisselew, Franziska Ziegler, Carina Bker, Jennifer Elsner and Carolin McCrea. We would also like to thank our beta users, who provided us with invaluable feedback and greatly enhanced the quality of the data over the course of multiple release candidates.
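The installation step described above relies on the environment variable CORPUS_REGISTRY pointing to a directory of CWB registry files. As a minimal sketch for checking this from R, the following base R snippet tests whether a registry file for the corpus exists. The helper `registry_has_corpus()` is hypothetical (it is not part of cwbtools or polmineR). It assumes the usual CWB convention that a registry file is named after the corpus ID in lower case, and that CORPUS_REGISTRY holds a single directory.

```r
# Hypothetical helper, not part of cwbtools or polmineR.
# Assumption: CWB registry files are named after the corpus ID in lower case,
# and the registry location is a single directory.
registry_has_corpus <- function(corpus_id,
                                registry_dir = Sys.getenv("CORPUS_REGISTRY")) {
  # An unset variable or a missing directory means nothing is registered.
  if (!nzchar(registry_dir) || !dir.exists(registry_dir)) {
    return(FALSE)
  }
  file.exists(file.path(registry_dir, tolower(corpus_id)))
}

# Report the result for the corpus ID used in the installation code.
if (registry_has_corpus("GERMAPARL2")) {
  message("GERMAPARL2 is registered in ", Sys.getenv("CORPUS_REGISTRY"))
} else {
  message("GERMAPARL2 not found; check CORPUS_REGISTRY or re-run the installation.")
}
```

If the check fails in a fresh R session although the installation succeeded, the environment variable has probably not been set permanently yet; restarting R after the installation is a reasonable first thing to try.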
    22 July 2024
    v2.2.0-rc1
