{"id":2698,"date":"2025-06-09T14:35:57","date_gmt":"2025-06-09T14:35:57","guid":{"rendered":"https:\/\/www.pacores.eu\/?p=2698"},"modified":"2025-06-11T15:22:51","modified_gmt":"2025-06-11T15:22:51","slug":"irene-doval-y-michael-lang-han-presentado-en-el-v-congreso-pacor-parallel-corpus-based-approaches-to-language-and-data-celebrado-en-valladolid-del-28-al-30-de-mayo-la-ponencia-evaluation-of-automatic","status":"publish","type":"post","link":"https:\/\/www.pacores.eu\/en\/irene-doval-y-michael-lang-han-presentado-en-el-v-congreso-pacor-parallel-corpus-based-approaches-to-language-and-data-celebrado-en-valladolid-del-28-al-30-de-mayo-la-ponencia-evaluation-of-automatic\/","title":{"rendered":"Irene Doval and Michael Lang presented at the 5th PaCor Congress, Parallel-Corpus-based Approaches to Language and Data, held in Valladolid from 28 to 30 May, the paper: Evaluation of Automatic Sentence Alignment Methods for Spanish-English, Spanish-German, and Spanish-Chinese Literary Texts"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">The <a href=\"http:\/\/www.pacores.eu\/\">PaCorES<\/a> collection comprises three bilingual, bidirectional parallel corpora: Spanish\/German, Spanish\/English, and Spanish\/Chinese (1). The core corpora of the collection consist of literary texts from the late 20th and early 21st centuries, which were manually selected and sentence-aligned with their corresponding translations.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The fundamental step in creating parallel corpora is alignment. Sentence alignment is the task of finding correspondences between the sentences of a source text and their equivalent translations in the target text.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Recent advances in automatic alignment tools, including neural-network-based methods, have achieved accuracy levels of 90% to 95% for closely related languages such as German or English. 
However, these methods are primarily optimized for non-literary texts, and their accuracy declines significantly on literary texts, which therefore still require manual revision.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The challenges of sentence alignment are especially pronounced for the Spanish\/Chinese language pair because of significant structural and linguistic differences between the two languages.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This paper describes methods aimed at minimizing the need for subsequent manual revision. To address frequent misalignments caused by improper sentence segmentation, we developed a Python script (2) tailored to the specific linguistic characteristics of each language. We evaluated three well-known sentence alignment tools: LF-Aligner (Hunalign), Vecalign, and Bertalign (3). Aligning bilingual literary texts poses unique challenges, since literary translation is largely interpretative rather than based on 1-to-1 mappings between source and target sentences. Existing alignment methods have difficulty coping with the 1-to-many and many-to-many alignments that are common in literary texts.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We evaluated the performance of each aligner using standard metrics: precision, recall, and F1 score (the harmonic mean of precision and recall). 
These metrics were calculated for each of the three language pairs.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>References<\/strong><br>(1) Spanish\/German: <a href=\"http:\/\/www.corpuspages.eu\">www.corpuspages.eu<\/a><br>Spanish\/English: <a href=\"http:\/\/www.corpuspaens.eu\">www.corpuspaens.eu<\/a><br>Spanish\/Chinese: <a href=\"http:\/\/www.corpuspaches.eu\">www.corpuspaches.eu<\/a><br>(2) PaCorEs-Splitter: <a href=\"https:\/\/github.com\/michaeljlang\/PaCorEs-Splitter\">https:\/\/github.com\/michaeljlang\/PaCorEs-Splitter<\/a><br>(3) LF-Aligner: <a href=\"https:\/\/sourceforge.net\/projects\/aligner\/\">https:\/\/sourceforge.net\/projects\/aligner\/<\/a><br>Vecalign: <a href=\"https:\/\/github.com\/thompsonb\/vecalign\/\">https:\/\/github.com\/thompsonb\/vecalign\/<\/a><br>Bertalign: <a href=\"https:\/\/github.com\/bfsujason\/bertalign\">https:\/\/github.com\/bfsujason\/bertalign<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The PaCorES collection comprises three parallel bilingual bidirectional corpora: Spanish\/German, Spanish\/English, and Spanish\/Chinese (1). The core corpora of the collection consist of literary texts from the late 20th and early 21st centuries, which were manually selected and sentence-aligned with their corresponding translations. The fundamental step in creating parallel corpora is alignment. 
Sentence alignment is [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":2699,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"set","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[54],"tags":[55],"class_list":["post-2698","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-mayo-2025-2","tag-congresos-2"],"_links":{"self":[{"href":"https:\/\/www.pacores.eu\/en\/wp-json\/wp\/v2\/posts\/2698","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pacores.eu\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pacores.eu\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pacores.eu\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pacores.eu\/en\/wp-json\/wp\/v2\/comments?post=2698"}],"version-history":[{"count":2,"href":"https:\/\/www.pacores.eu\/en\/wp-json\/wp\/v2\/posts\/2698\/revisions"}],"predecessor-version":[{"id":2732,"href":"https:\/\/www.pacores.eu\/en\/wp-json\/wp\/v2\/posts\/2698\/revisions\/2732"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.pacores.eu\/en\/wp-json\/wp\/v2\/media\/2699"}],"wp:attachment":[{"href":"https:\/\/www.pacores.eu\/en\/wp-json\/wp\/v2\/media?parent=2698"}],"wp:term":[{"taxonomy":"category","em
beddable":true,"href":"https:\/\/www.pacores.eu\/en\/wp-json\/wp\/v2\/categories?post=2698"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pacores.eu\/en\/wp-json\/wp\/v2\/tags?post=2698"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}