Automatic bilingual term extraction with OneClickTerms by SketchEngine

When I wrote about the terminology extraction tool OneClickTerms back in 2017, I was already quite enthusiastic about how useful it was for last-minute conference preparation. The one thing I didn’t mention back then (or maybe it wasn’t available yet) is that OneClickTerms not only extracts terminology from monolingual documents but also does bilingual extraction. I happened to use it the other day and noticed once again how much I like it.

In my experience, the bilingual extraction function works best with a pre-aligned file format, i.e. a file where it is clear which segment or sentence from the source and target texts correspond to each other. This can be a TMX file from a translation memory system, or a bilingual Excel sheet.

I like to create bilingual Excel sheets from any relevant text I find in two or more languages. I use them for parallel reading, and also as bilingual text corpora to search for relevant terminology in the booth. Creating such parallel text corpora works very smoothly, for example, with EU legal acts from EUR-Lex. You simply copy and paste the bilingual display (make sure you paste “values only” and not the original format), and the sentences usually align quite neatly and require very few corrections. This usually also works quite nicely with websites in different languages.

For OneClickTerms, you then need to make sure that the name of each language is written in English in the first row of its column.
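If your aligned sentence pairs already exist somewhere other than a spreadsheet, the layout described above (English language names in the first row, one aligned segment pair per row) can be produced with a few lines of Python. This is only a minimal sketch: the function names and the two sample sentences are invented for illustration, and it writes CSV, which you would then open in Excel and save as a workbook. Whether OneClickTerms accepts CSV directly is not confirmed here.

```python
import csv
import io


def build_bilingual_rows(pairs, source_lang="English", target_lang="German"):
    """Return rows for a two-column sheet: language names first, then segments."""
    # First row: the language names, written in English.
    rows = [[source_lang, target_lang]]
    for source, target in pairs:
        # Collapse line breaks and repeated spaces so each segment fits one cell.
        rows.append([" ".join(source.split()), " ".join(target.split())])
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Invented sample sentences, only here to show the layout.
pairs = [
    ("The technical assessment body\nissues the assessment.",
     "Die technische Bewertungsstelle stellt die Bewertung aus."),
    ("The assessment is   published.",
     "Die Bewertung wird veröffentlicht."),
]

rows = build_bilingual_rows(pairs)
csv_text = rows_to_csv(rows)

# Write the file as UTF-8 with a byte order mark so Excel shows umlauts correctly.
with open("bilingual_corpus.csv", "w", encoding="utf-8-sig", newline="") as handle:
    handle.write(csv_text)
```

Normalising whitespace before writing keeps copied text with stray line breaks from spilling into extra rows, which would break the alignment.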

This is what such an Excel file can look like:

And here comes the extraction result:

What I really appreciate, and what saves a lot of frustration, is that OneClickTerms lets you check the extraction results and choose from different options manually. Knowing that bilingual term extraction works far from perfectly, I prefer to look at the results quickly, correct or discard some terms here and there, and then have a reliable bilingual term list, which I can download as an XLSX, CSV, or TBX file.

I was impressed that the tool even matched a three-word German term (technische Bewertungsstelle) correctly with its English acronym (tab), while, interestingly, it failed to identify the English full form (technical assessment body). But then, the English acronym was used much more often in the discussion anyway.

OneClickTerms also aligns unaligned parallel texts, but, depending on the text structure, the matching will be less reliable.

For conference interpreters, especially for last-minute preparation when you have no time to go through a complete document but need to know the most essential technical terms, I find this tool extremely useful. The monolingual extraction even provides context for each extracted term (see blog article on monolingual extraction).

In the example above, I checked after the meeting how many relevant terms the tool had extracted, and I can confirm that most of the terms in the extraction results also came up in the meeting. The extracted list is of course far from complete, so if you have time and the text you have is highly relevant to the meeting you are going to work in, I would always recommend reading both texts thoroughly and extracting terms manually using InterpretersHelp or InterpretBank.
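If you want to repeat this kind of after-the-meeting check with your own material, a rough version can be scripted. The following is only a sketch under assumptions: it presumes you have the extracted terms as a plain list and some text from the meeting (notes or a transcript), and the function names, the sample terms, and the sample sentence are all made up for illustration. It matches exact word forms only, so inflected forms (for example German case endings) will be counted as missed.

```python
import re


def term_occurs(term, text):
    """True if the term appears in the text as a whole word or phrase."""
    # Lookarounds keep "tab" from matching inside "stable" or "table".
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def check_terms(terms, text):
    """Split the term list into terms found in the text and terms not found."""
    found = [term for term in terms if term_occurs(term, text)]
    missed = [term for term in terms if not term_occurs(term, text)]
    return found, missed


# Hypothetical extracted terms and meeting notes, for illustration only.
extracted_terms = ["technical assessment body", "TAB", "declaration of performance"]
meeting_notes = "The TAB issues the assessment within a stable timetable."

found, missed = check_terms(extracted_terms, meeting_notes)
print(f"{len(found)} of {len(extracted_terms)} extracted terms came up")
print("Not heard in the meeting:", ", ".join(missed))
```

The share of terms found tells you how useful the extracted list was; it says nothing about important terms the tool never extracted in the first place.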

Knowing that term extraction varies considerably in quality depending on source texts and language combination, I would be really interested to know what your experience is. OneClickTerms offers a free trial period and has a monthly subscription plan for 17.84€ (automatic renewal can be deactivated!) – so feel free to play around with it and share your impressions!

About the author

Anja Rütten is a freelance conference interpreter for German (A), Spanish (B), English (C), and French (C) based in Düsseldorf, Germany. She has specialised in knowledge management since the mid-1990s.
