How to tell ChatGPT to extract terminology from parallel texts in different languages

My dear colleague (and former student) Florian Pfaffelhuber just drew my attention to the fact that ChatGPT is great at multilingual terminology extraction. It can also handle more than two languages and will create very nice multilingual glossary tables for you.

What worked best when we tested it today was to put the prompt and the texts into a single message for ChatGPT, rather than submitting the prompt first and then sending the text corpora in separate follow-up messages. I assembled the different bits in a Word document and then copied everything into the message field in one go. The texts we used for testing were sets of claims from European patents, which we simply copied out of the PDF patent specifications. This is why we told ChatGPT to include the reference signs, to make sure it picked up the essential terms.
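If you would rather assemble the message with a small script than in Word, the idea can be sketched roughly like this. This is only an illustration: the function name `build_message` and the `[German]`-style labels in front of each text are my own assumptions, not something ChatGPT requires.

```python
def build_message(prompt: str, texts: dict[str, str]) -> str:
    """Join the prompt and all language versions into ONE message.

    The "[Language]" labels are an assumption made for readability;
    the original test simply pasted the texts one after the other.
    """
    parts = [prompt.strip()]
    for language, text in texts.items():
        parts.append(f"[{language}]\n{text.strip()}")
    # Blank lines keep the prompt and the individual texts visually apart.
    return "\n\n".join(parts)


if __name__ == "__main__":
    message = build_message(
        "Please extract the technical terms from the following text ...",
        {
            "German": "Fräsmaschine (3) mit einem Diamantfräsblatt (4)",
            "English": "milling machine (3) with a diamond milling blade (4)",
        },
    )
    print(message)  # paste the result into the message field in one go
```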

These are the prompts we used:

In English:

Please extract the technical terms from the following text and list them in a table in German, English and French. Please also include all terms that stand in front of a reference sign (reference signs are the numbers in brackets). Please also include the reference signs in the table.

In German:

Bitte extrahiere aus folgendem Text die Fachtermini und liste sie mir in einer Tabelle auf Deutsch, Englisch und Französisch. Bitte schließe auch alle Begriffe mit ein, die vor einem Bezugszeichen stehen (Bezugszeichen sind die Zahlen in Klammern). Bitte übernimm auch die Bezugszeichen in die Tabelle.
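To check afterwards whether every reference sign from the claims has made it into the table, you could list the signs with a few lines of Python. This is a rough sketch under my own assumptions: the pattern only covers shapes such as (12), (12a), (L1) or (F), and the name `reference_signs` is made up for this example.

```python
import re

# Assumed shapes of reference signs: "(12)", "(12a)", "(L1)", "(F)".
# Real patent claims may use other forms; adjust the pattern if needed.
REFERENCE_SIGN = re.compile(r"\(([A-Za-z]?\d+[a-z]?|[A-Z])\)")


def reference_signs(text: str) -> list[str]:
    """Return the distinct reference signs in order of first appearance."""
    found = REFERENCE_SIGN.findall(text)
    # dict.fromkeys removes duplicates while keeping the original order.
    return list(dict.fromkeys(found))


if __name__ == "__main__":
    claim = "a milling machine (3) cuts the layer (L1) up to the splice point (F)"
    print(reference_signs(claim))
```

Any sign in this list that is missing from ChatGPT's table points to a term it has skipped.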

***

Addendum [April 20, 2023]: After some further experimenting, it turned out that ChatGPT can be very “creative” from time to time and occasionally adds terms to the glossary that do not occur in the respective text at all. Here is another prompt that seemed to overcome this problem:

I have a text in two languages. Please create a bilingual glossary containing exclusively the terms from this text. This is the text:

All in all, it is important to bear in mind that ChatGPT, like any AI, does not work like a conventional software tool that is based on rules and programmed to perform predefined tasks. ChatGPT may respond to the same prompt differently each time you use it.
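Because of this, it can be worth double-checking the glossary against the source texts. A very simple check could look like the sketch below. It is my own illustration, not part of our test: `find_missing_terms` is a hypothetical helper, and it only does a plain case-insensitive substring search, so inflected forms (plural, case endings) will be flagged even though the term is legitimately in the text.

```python
def find_missing_terms(
    glossary: dict[str, list[str]], texts: dict[str, str]
) -> list[tuple[str, str]]:
    """Return (language, term) pairs whose term does not occur in the text.

    Assumption: a plain case-insensitive substring match is good enough
    for a first pass; it does not handle inflection or hyphenation.
    """
    missing = []
    for language, terms in glossary.items():
        haystack = texts[language].casefold()
        for term in terms:
            if term.casefold() not in haystack:
                missing.append((language, term))
    return missing


if __name__ == "__main__":
    texts = {
        "German": "Die Fräsmaschine (3) fräst den Haupt-Mikrograben.",
        "English": "The milling machine (3) cuts the main micro-trench.",
    }
    glossary = {
        "German": ["Fräsmaschine", "Schotter"],
        "English": ["milling machine", "gravel"],
    }
    for language, term in find_missing_terms(glossary, texts):
        print(f"{language}: '{term}' not found in the text")
```

Terms flagged this way are candidates for a manual look, not proof that ChatGPT invented them.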

***

This is what the result looks like using the first prompt:

Deutsch | Englisch | Französisch
Kanäle/Kabel | channels/cables | canaux/câbles
Verteilungsnetz | distribution network | réseau de distribution
Zugangsnetz | access network | réseau d’accès
Schicht (L1) | layer (L1) | couche (L1)
Schicht (L2) | layer (L2) | couche (L2)
Tragschicht | support layer | couche de support
Sand | sand | sable
Schotter | gravel | gravier
Steine | stones | pierres
Schmutz | dirt | saleté
Netzlayout | network layout | disposition du réseau
Haupt-Mikrograben | main micro-trench | micro-tranchée principale
Spleißstelle (F) | splice point (F) | point de fusion (F)
Verzweigung | branch | branche
Anschluss | connection | connexion
Endziel | end target | cible finale
Fräsen | milling | fraisage
Fräsmaschine | milling machine | fraiseuse
Diamantfräsblatt | diamond milling blade | lame de fraisage au diamant
Gegenlauffräsen | counter-rotational milling | fraisage contre-rotatif

And how does ChatGPT make your life easier? Feel free to share your experience, tips, and tricks in the comments 🙂


About the author

Anja Rütten is a freelance conference interpreter for German (A), Spanish (B), English (C), and French (C) based in Düsseldorf, Germany. She has specialised in knowledge management since the mid-1990s.

She is a full member of AIIC, an accredited freelance interpreter with the EU institutions and the European Patent Office, and has vast experience as a university lecturer.

 


Comments

One response to “How to tell ChatGPT to extract terminology from parallel texts in different languages”

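  1. Anja Rütten

    Also handy if you want a German list of the reference signs to stick next to the figure: “Bitte extrahiere aus folgendem Text die Fachtermini mit Bezugszeichen (Bezugszeichen sind die Zahlen in Klammern) und liste sie mir in einer Tabelle auf. Stelle die Bezugszeichen vor den dazugehörigen Fachterminus.” (In English: “Please extract the technical terms with reference signs from the following text (reference signs are the numbers in brackets) and list them in a table. Put each reference sign in front of its term.”)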
