Splitting segments after paragraph marks
Thread poster: Sophie Borel
Sophie Borel
France
English to French
Oct 7, 2019

Hello,

Could someone help me with the segmentation of XML files?
I'd like to segment the file after paragraph marks. I tried to change the TM segmentation rules, but I don't know how to tell Studio to split segments after paragraph marks (¶). Can somebody help?

Thanks,
Sophie


[Edited at 2019-10-08 07:52 GMT]


 
Anthony Rudd
German to English
Segmentation Oct 8, 2019

Project Settings → Language Pairs → Translation Memory … → Settings → Language Resources → Segmentation Rules

Add Segmentation Rule


 
Sophie Borel
France
English to French
TOPIC STARTER
Details Oct 8, 2019

Thanks Anthony,

Do you know if I have to add a segmentation rule for source language only or for target and source languages?

Thanks


 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
@Sophie Oct 8, 2019

Sophie Borel wrote:
I'd like to segment the file after paragraph marks (¶).


Firstly, let's just make sure that you really mean segmentation by paragraph marks, i.e. "¶", and not simply segmentation by paragraph. Do you have these characters (i.e. "¶") in the XML file and you want to split the text by those marks, or are you actually just viewing the XML file in a viewer that shows the paragraph breaks as "¶" symbols?

Let's assume that your XML file actually contains "¶" characters, i.e. that if the text is "The cat¶sat on¶the mat", then you want there to be three segments, namely "The cat¶", "sat on¶" and "the mat". To accomplish this, you have to add a segmentation rule that has "¶" as the "Before break" text and nothing as the "After break" text.
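
To make the "Before break" / "After break" idea concrete, here is a minimal Python sketch of how such a rule behaves. It is only an illustration of the principle, not Studio's actual segmentation engine; the function name `split_segments` and its parameters are made up for this example.

```python
import re


def split_segments(text, before_break, after_break=""):
    """Split `text` after every match of `before_break` that is
    immediately followed by `after_break` (both are regex patterns).

    Hypothetical helper for illustration; an empty `after_break`
    means "break no matter what follows".
    """
    pattern = f"(?:{before_break})"
    if after_break:
        # The lookahead checks what follows without consuming it.
        pattern += f"(?={after_break})"
    rule = re.compile(pattern)

    segments = []
    start = 0
    for match in rule.finditer(text):
        end = match.end()
        if end > start:
            # The "before break" text stays at the end of its segment.
            segments.append(text[start:end])
            start = end
    if start < len(text):
        segments.append(text[start:])
    return segments


print(split_segments("The cat¶sat on¶the mat", "¶"))
```

With "¶" as the "Before break" pattern and nothing as the "After break" pattern, the sample text splits into "The cat¶", "sat on¶" and "the mat", which is the behaviour described above.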

I'm no expert, but AFAIK, in Trados 2019, you can set segmentation rules in two places, namely as part of a translation memory's settings and as a language resources file. Then, when you create a project, you can specify either a translation memory or a specific language resources file.

If you want to set the segmentation rule as part of a translation memory's settings, you can get there in several ways. One is to go to the "Translation Memories" pane and right-click the translation memory > Settings > Language Resources > Segmentation Rules > Edit > Add. Then name the rule something like "Paragraph mark break", click Advanced View, put ¶ in the "Before break" field, and leave the "After break" field empty.

You can also reach this TM editing dialog from the "Projects" pane: right-click the relevant project > Project Settings > Language Pairs > (choose the correct language pair) > Translation Memory and Automated Translation > (click the relevant TM) > Settings (if not greyed out) > Language Resources > Segmentation Rules > Edit > Add.

If you want to set the segmentation rule as part of a language resource file, go to File > New > New Language Resource Template > Segmentation Rules > Edit > Add. Add the segmentation rule, and then save the language resource file somewhere. The language resource file will then show up in the "Translation Memories" pane, where you can edit it if you want to. When you create a new project, you can choose the language resource file at step #3: All Language Pairs > Language Resources, then select the file from the drop-down or browse for it. You can also add the language resource file to an existing project by right-clicking the project > Project Settings > Language Pairs > All Language Pairs > Translation Memory and Automated Translation > Language Resources.

Sophie Borel wrote:
Do you know if I have to add a segmentation rule for source language only or for target and source languages?


As far as I know, adding the rule for the target language as well is only necessary if you want to do alignment.

Also, the option to select a language resources file is only available under "All Language Pairs".


 
Sophie Borel
France
English to French
TOPIC STARTER
paragraph Marks Oct 8, 2019

Hi Samuel,

Yes, I mean segmentation by paragraph marks.
Thanks for your reply. I tried to add the TM segmentation rule, but I still obtain the same segmentation...

For example, I have

(h3)The product Benefits(/h3)
(ul)
(li)Complete product solutions(/li)
(li)quality at high speeds(li)
(li)User-friendly(/li)
(li)Flexible and productive(/li)
(li)products for every application(/li)
(li)Small-or large-formats, on-demand (/li)
(li)Low cost of ownership
(li)Solutions that achieves a competitive edge(/li)
(/ul)
Above all, our brand manages..[...].

In ONE segment...
It would really be helpful to split that properly to use the TM efficiently, but I can't find a solution here...

[Edited at 2019-10-08 13:29 GMT]

[Edited at 2019-10-08 13:31 GMT]


 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
@Sophie Oct 8, 2019

Sophie Borel wrote:
I mean segmentation by paragraph marks.


But the example that you show below has no paragraph marks in it. (-:

For example, I have
(h3)The product Benefits(/h3)


Did you use round brackets here to avoid using pointy brackets (i.e. < and > ) which break the ProZ.com forum software, or does your text actually use round brackets?

What is the file type?


 
Sophie Borel
France
English to French
TOPIC STARTER
@Samuel Oct 8, 2019

Well, they disappeared in the forum, but there is a paragraph mark at every line break, i.e. at the end of each line...
And yes, I replaced the angle brackets (< and >) with round brackets so that the tags would show up in the post...


 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
@Sophie Oct 8, 2019

Sophie Borel wrote:
...


Well, I guess it may be a file filter issue, but I have reached the limit of what I can do without seeing the actual file. Sorry.


 
NeoAtlas
Spain
English to Spanish
Exclude Oct 9, 2019

Sophie Borel wrote:
(h3)The product Benefits(/h3)
(ul)
(li)Complete product solutions(/li)

(li)Solutions that achieves a competitive edge(/li)
(/ul)

Are the tags (h3) (/h3) (ul) (/ul) (li) (/li) listed under Project Settings > File Types > [your XML] > Advanced > Embedded content?
If so, click on each tag and choose Edit > Advanced > Exclude.
☛ The tags mentioned above can be excluded, but inline tags such as (b) (/b) should not be, so be careful when you change the settings.
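
To show why it matters whether these tags are handled as inline tags or as boundaries, here is a rough Python sketch. It is not how Studio's file type filter works internally, just an illustration of the idea. It assumes the real file uses angle brackets, and `BLOCK_TAGS` and `split_on_block_tags` are names made up for this example.

```python
import re

# Assumption: these are the block-level tags from the sample in this thread.
BLOCK_TAGS = ("h3", "ul", "li")


def split_on_block_tags(markup, tags=BLOCK_TAGS):
    """Treat the given tags as segment boundaries and return the text
    between them. Other tags (e.g. <b>) are left inside the segments.

    Simplification: tags with attributes are not matched.
    """
    names = "|".join(re.escape(tag) for tag in tags)
    boundary = re.compile(rf"</?(?:{names})\s*>", re.IGNORECASE)
    parts = boundary.split(markup)
    # Drop the empty or whitespace-only pieces left between adjacent tags.
    return [part.strip() for part in parts if part.strip()]


sample = (
    "<h3>The product Benefits</h3>\n"
    "<ul>\n"
    "<li>Complete product solutions</li>\n"
    "<li>Fast and <b>flexible</b></li>\n"
    "</ul>\n"
    "Above all, our brand manages"
)

for segment in split_on_block_tags(sample):
    print(segment)
```

When the block-level tags act as boundaries, each heading and list item becomes its own segment, while the inline (b) tag stays inside its segment. If the same tags are treated as inline, nothing separates the items and everything lands in one segment.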

[Edited at 2019-10-10 07:16 GMT]


 
Sophie Borel
France
English to French
TOPIC STARTER
Solution found! Oct 10, 2019

Thanks for your help!

I finally found out how to resolve this issue thanks to a link given by Paul Filkin from SDL Client Services:
multifarious.filkin.com/.../

Hope it will help others with the same issue!


 