Moses-support Digest, Vol 110, Issue 13

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: continue partial translation (Ondrej Bojar)
2. Re: continue partial translation (He He)
3. Re: decoder question (Ulrich Germann)

----------------------------------------------------------------------

Message: 1
Date: Fri, 04 Dec 2015 18:32:27 +0100
From: Ondrej Bojar <bojar@ufal.mff.cuni.cz>
Subject: Re: [Moses-support] continue partial translation
To: Philipp Koehn <phi@jhu.edu>, He He <hhe.xiy@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <4795f22e-32e5-4c83-b32e-8585eea72902@email.android.com>
Content-Type: text/plain; charset=UTF-8

Hi,

I'm also curious about max-phrase-length.

On a separate note: what would go wrong if you cut the partial input you are providing down to just the last N-1 words, where N is the maximum order of your language models?

Which sparse features look this far? What would one need to supply to get their scores even without this full prefix?

How far back would OSM look?
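
To make the N-1 question concrete, here is a minimal Python sketch of the truncation, on the target side only. The function name lm_context and its parameters are hypothetical and not part of Moses. It illustrates only the n-gram LM argument; it says nothing about sparse features or OSM, which may need more history, and it does not address which source words would have to be kept alongside.

```python
def lm_context(target_prefix, lm_order):
    """Return the last (lm_order - 1) tokens of an already translated prefix.

    An n-gram LM of order N conditions each word on at most N-1
    preceding words, so this is the longest history it can look at.
    """
    if lm_order < 1:
        raise ValueError("lm_order must be at least 1")
    tokens = target_prefix.split()
    keep = lm_order - 1
    # tokens[-0:] would return the whole list, so handle keep == 0 explicitly.
    return tokens[-keep:] if keep > 0 else []


if __name__ == "__main__":
    prefix = "sugar beet output both Ukraine and Russia in"
    # With a 5-gram LM, only the last four target words are kept.
    print(" ".join(lm_context(prefix, 5)))
```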

Cheers, O.

On December 4, 2015 5:28:00 PM CET, Philipp Koehn <phi@jhu.edu> wrote:
>Hi,
>
>interesting... this may be due to the maximum phrase length (the XML
>specified translation is treated as a phrase translation), which is 20
>by default.
>
>You can tell the decoder otherwise with the switch -max-phrase-length.
>
>I'd be interested to know if this fixes the problem.
>
>-phi
>
>On Wed, Dec 2, 2015 at 9:30 PM, He He <hhe.xiy@gmail.com> wrote:
>> Hi,
>>
>> Yes. The input to the decoder is "-v 0 -threads 4 -n-best-list - 10
>> --print-alignment-info-in-n-best -xml-input exclusive".
>>
>> If I break the long translation into parts it works though.
>>
>> He
>>
>> On Wed, Dec 2, 2015 at 6:14 PM, Philipp Koehn <phi@jhu.edu> wrote:
>>>
>>> Hi,
>>>
>>> it's not clear to me what exactly you are specifying to the decoder,
>>> but what you intend to do should work.
>>>
>>> Did you use the switch "-xml-input exclusive"?
>>> What exactly do you specify as input?
>>>
>>> -phi
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Nov 24, 2015 at 10:19 PM, He He <hhe.xiy@gmail.com> wrote:
>>> > Hi there,
>>> >
>>> > I'm trying to do translation conditioned on an already translated
>>> > prefix (essentially what -continue-partial-translation was supposed
>>> > to do). I'm using -xml-input exclusive to pass in the prefix source
>>> > and its translation.
>>> >
>>> > However, when the prefix becomes long, this doesn't work, e.g.
>>> > <p translation="Britain 's trade house E D & F Man said on the amount of
>>> > money in eastern europe , sugar beet output both Ukraine and Russia in">
>>> > ?? ? ED & F ?? ? ? ?? ? , 96 / 97 ?? ? ?? ? ??? ?? ? , ????? ???? ? ?? ??
>>> > ???</p> ?? ? ?? ? ?? ? ? , ??? ?? ? ??> 0 ||| ?? ? ED & F?? ? ED & F ??
>>> > ? ? ?? ? / 96 , 97 ?? ? ?? ? ??? ?? ? ????? , ? ??? ? ?? ?? ??? substantial
>>> > decline was expeted to be tough ||| LexicalReordering0= -4.48185 -7.01678
>>> > -1.48808 -4.3759 -6.89465 -0.942918 Distortion0= -12 LM0= -227.918
>>> > WordPenalty0= -40 PhrasePenalty0= 36 TranslationModel0= -7.25771 -34.5474
>>> > -2.80336 -22.3651 ||| -3322.64"
>>> >
>>> > It just copies the source prefix. I suspect this is because many
>>> > words now become UNK, since phrase-table entries that overlap the
>>> > prefix are ignored.
>>> >
>>> > Is there a way around this? Thanks a lot in advance!
>>> >
>>> > Best,
>>> > He
>>> >
>>> > _______________________________________________
>>> > Moses-support mailing list
>>> > Moses-support@mit.edu
>>> > http://mailman.mit.edu/mailman/listinfo/moses-support
>>> >
>>
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>
>_______________________________________________
>Moses-support mailing list
>Moses-support@mit.edu
>http://mailman.mit.edu/mailman/listinfo/moses-support

--
Ondrej Bojar (mailto:obo@cuni.cz / bojar@ufal.mff.cuni.cz)
http://www.cuni.cz/~obo

------------------------------

Message: 2
Date: Fri, 4 Dec 2015 13:56:44 -0500
From: He He <hhe.xiy@gmail.com>
Subject: Re: [Moses-support] continue partial translation
To: Ondrej Bojar <bojar@ufal.mff.cuni.cz>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>, Philipp Koehn
<phi@jhu.edu>
Message-ID:
<CAMdMQUM6qUba-AprP_mWXC71zpOt7JF3LgHXVPOndqyGJbXZkA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Setting -max-phrase-length 100 fixes the problem. Thanks!
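
For reference, a minimal Python sketch of how such an input line and the matching options could be assembled. The helper names (wrap_prefix, decoder_options) are made up for illustration and are not Moses APIs. The <p translation="..."> markup and the switches -xml-input exclusive and -max-phrase-length are the ones discussed in this thread, and the default of 20 is the value Philipp quotes. Raising the limit to exactly the number of source tokens in the prefix is an assumption; the thread only confirms that 100 worked.

```python
def wrap_prefix(src_prefix, tgt_prefix, src_rest):
    """Build one decoder input line: the translated prefix wrapped in
    <p translation="...">...</p>, followed by the untranslated remainder."""
    if '"' in tgt_prefix:
        # A double quote would end the attribute value early.
        raise ValueError("translation must not contain a double quote")
    return '<p translation="{}">{}</p> {}'.format(tgt_prefix, src_prefix, src_rest)


def decoder_options(src_prefix, default_limit=20):
    """Return the option list, raising -max-phrase-length when the prefix
    has more source tokens than the default limit (assumed to be 20)."""
    n_tokens = len(src_prefix.split())
    opts = ["-xml-input", "exclusive"]
    if n_tokens > default_limit:
        opts += ["-max-phrase-length", str(n_tokens)]
    return opts


if __name__ == "__main__":
    # Placeholder tokens stand in for a real source prefix.
    src_prefix = " ".join("s{}".format(i) for i in range(25))
    line = wrap_prefix(src_prefix, "already translated part", "rest of source")
    print(line)
    print(" ".join(decoder_options(src_prefix)))
```

The line would be written to the decoder's standard input, with the options appended to the usual command line.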


------------------------------

Message: 3
Date: Fri, 4 Dec 2015 23:13:10 +0000
From: Ulrich Germann <ulrich.germann@gmail.com>
Subject: Re: [Moses-support] decoder question
To: Vincent Nguyen <vnguyen@neuf.fr>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAHQSRUrv9tX7X2LHwEFoU=1m6Kb1yA=8tYy4huGOeUkm7MrRbQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi Vincent,

as far as Moses is concerned, the end of a sentence is marked by whatever
the end-of-line marker is on the respective OS (Win: CRLF, Linux: LF, Mac:
CR, apparently). A period is treated as a plain old token. The purpose of
the sentence splitter that Kenneth mentioned is to tell Moses what the
"sentence" boundaries are.

The language model has a concept of sentences beginning and ending and
usually doesn't like periods anywhere except at the end of a sentence, so
it'll down-vote translation hypotheses containing isolated periods.
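
As a rough illustration of putting one sentence per line before tokenization, here is a deliberately naive Python sketch. It is not the sentence splitter mentioned in this thread; the function name and the regular expression are assumptions, and the rule (split after ., ! or ? when followed by whitespace and an uppercase letter) will wrongly split after abbreviations such as "Mr.".

```python
import re

# Split at whitespace that follows ., ! or ? and precedes an uppercase letter.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def one_sentence_per_line(text):
    """Return the text with a line feed between detected sentences,
    so that each line is translated as a separate sentence."""
    parts = (part.strip() for part in _BOUNDARY.split(text))
    return "\n".join(part for part in parts if part)


if __name__ == "__main__":
    raw = "This is the first sentence. This is the second one."
    # Each output line can then go through tokenizer, truecaser and decoder.
    print(one_sentence_per_line(raw))
```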

- Uli

On Fri, Dec 4, 2015 at 1:18 PM, Vincent Nguyen <vnguyen@neuf.fr> wrote:

>
> well not exactly my question. I know Moses translates one "line" at a
> time, meaning a string ending with a line feed.
>
> My question is more: if the string contains a PERIOD (tokenized as
> such), separating the line into 2 "sentences", how does it behave?
>
> Given my observation, I have the feeling that we really need to
> "sentence-tokenize" first before word-tokenizing.
>
>
>
> On 04/12/2015 13:52, John D Burger wrote:
> > I think you're asking if Moses translates one sentence at a time. The
> > answer is yes.
> >
> > - John Burger
> > MITRE
> >
> >> On Dec 4, 2015, at 04:43, Vincent Nguyen <vnguyen@neuf.fr> wrote:
> >>
> >> Actually I don't know if this is a decoder question or such.
> >>
> >> Here is my issue
> >>
> >> Let's say I have a text string with 2 sentences, with a period ending
> >> the first sentence, but no CR+LF, just a space before the second
> >> sentence.
> >>
> >> When I pass the full string to the pipe :
> >> tokenizer + truecaser + moses + detruecase + detokenizer
> >> the output is only one sentence, the period at the end of the first
> >> sentence has been eliminated, the sentence is nonsense (well not good at
> >> all)
> >>
> >> If I insert a CRLF just after the period of the first sentence and send
> >> the whole thing to the pipe, the output is correct.
> >>
> >> Am I missing something?
> >>
> >> Should we only send strings to Moses segment by segment?
> >>
> >> thanks,
> >> Vincent
> >> _______________________________________________
> >> Moses-support mailing list
> >> Moses-support@mit.edu
> >> http://mailman.mit.edu/mailman/listinfo/moses-support
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>

--
Ulrich Germann
Senior Researcher
School of Informatics
University of Edinburgh

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 110, Issue 13
**********************************************
