Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: tokenizer.perl weirdness with some patterns (Philipp Koehn)
2. Re: tokenizer.perl weirdness with some patterns (Ozan Çağlayan)
3. Re: nplm / Bilingual LM (Marwa Refaie)
4. Re: nplm / Bilingual LM (Marwa Refaie)
----------------------------------------------------------------------
Message: 1
Date: Fri, 3 Jul 2015 13:05:50 -0400
From: Philipp Koehn <phi@jhu.edu>
Subject: Re: [Moses-support] tokenizer.perl weirdness with some
patterns
To: Ozan Çağlayan <ozancag@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAFADDB6e6dH-qnaUi7HQwE=jR9B7hJTw61RA2DKcb24EpmY1Q@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Hi,
the proper handling of "." in tokenization is a hard problem.
The heuristics that the script uses to determine if a period
is an end-of-sentence period and hence should be separated
(and not an abbrev. period that should stay attached) include
checking if the next word is uppercase. In your example,
the next word is lowercase, so the script concludes that
it is an abbreviation period and hence does not split it off.
You may change the script in any way you want for your
own purposes. It is hard to predict what the effect of that
will be for machine translation quality in your case.
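
As a rough illustration of the heuristic described above, here is a minimal Python sketch. It is not the code of tokenizer.perl, which also consults per-language nonbreaking-prefix lists and handles many more cases. The function name `split_periods` is hypothetical, and treating the last word of the input as sentence-final is an assumption made to match the output shown in the original report.

```python
import re

def split_periods(text):
    """Split off a trailing period only when it looks sentence-final.

    Simplified assumption: a period is sentence-final if the word is the
    last one in the input or the next word starts with an uppercase letter.
    Otherwise it is treated as an abbreviation period and stays attached.
    """
    words = text.split()
    out = []
    for i, word in enumerate(words):
        # Match a word ending in exactly one period, e.g. "voir."
        m = re.fullmatch(r"(.*[^.])\.", word)
        if m is None:
            out.append(word)
            continue
        is_last = i == len(words) - 1
        next_is_upper = not is_last and words[i + 1][0].isupper()
        if is_last or next_is_upper:
            out.extend([m.group(1), "."])  # separate the period
        else:
            out.append(word)               # keep the period attached
    return " ".join(out)

print(split_periods("tu ne peux pas me voir. blabla"))
print(split_periods("He left. She stayed."))
```

Under these assumptions, the period in "voir." stays attached because "blabla" is lowercase, which mirrors the behaviour reported in the quoted message below.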
-phi
On Thu, Jul 2, 2015 at 3:48 PM, Ozan Çağlayan <ozancag@gmail.com> wrote:
> Hello,
>
> $ echo "tu ne peux pas me voir. blabla" | tokenizer.perl -l fr
> tu ne peux pas me voir. blabla
>
> $ echo -n "I don't understand your reactions. sorry." | tokenizer.perl -l en
> I don 't understand your reactions. sorry .
>
> So the problem is that if a dot is followed by a space and then a
> lowercase letter, it is not tokenized. This happens in at least
> the French tasks of IWSLT. Is this expected? The line responsible for
> this is tokenizer.perl:330. What would I lose if I commented
> out that part for large-scale processing?
>
> Thanks.
>
> PS: I also filed an issue for this:
> https://github.com/moses-smt/mosesdecoder/issues/118
>
>
>
> --
> Ozan Çağlayan
> Research Assistant
> Galatasaray University - Computer Engineering Dept.
> http://www.ozancaglayan.com
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
------------------------------
Message: 2
Date: Fri, 3 Jul 2015 20:45:34 +0200
From: Ozan Çağlayan <ozancag@gmail.com>
Subject: Re: [Moses-support] tokenizer.perl weirdness with some
patterns
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAFub=KTy2-v7KaFXWjh0AdRKNm9pGdUbYVrR3SMqvrUV9xjJBA@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Hello,
Thanks for your answers which helped clarify things for me.
Thanks!
------------------------------
Message: 3
Date: Fri, 3 Jul 2015 23:25:45 +0200
From: Marwa Refaie <marwa.refaie@gmail.com>
Subject: Re: [Moses-support] nplm / Bilingual LM
To: Raj Dabre <prajdabre@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CADJOO16oBy981Sw2AK7KCRtXzEL8nKhkv38AycrEq5ADEuNHUQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
./bjam --with-nplm=....
is working now: Moses recognizes the BilingualNPLM and NeuralLM features in moses.ini.
The problem is that the machine crashes while the decoder is running:
Loading LM0
Loading LM1
......................
Is there any method to binarize the bilingual LM or make it smaller? Maybe
that is the problem, although a month ago everything was working fine with
the same files.
Marwa N. Refaie
On 3 Jul 2015 17:45, "Raj Dabre" <prajdabre@gmail.com> wrote:
> On that note.
> I tried to use KENLM and NeuralLM together but for some reason the decoder
> simply exits.
> I am attaching my moses-nplm-kenlm.ini file.
>
> By the way:
> I compiled my mosesdecoder (latest pull) with the command: ./bjam
> --with-boost=/share/usr-x86_64
> --with-nplm=/home/raj/softwares-and-scripts/nplm-master --max-factors=10
> -j10
> Compilation with nplm integration was a success.
>
> Here is where it gets weird. I ran 2 tuning experiments: 1 with no
> Language model and 1 with NeuralLM (not the bilingual version). The
> moses-nplm.ini is also attached.
> Tuning proceeds for both, but the tuning-set BLEU scores are identical
> for the two experiments (for each run from run 2 to run 25).
> I made sure that the LM was trained on the target side training data.
> Am I missing something?
>
>
> On Fri, Jul 3, 2015 at 11:03 PM, Nikolay Bogoychev <nheart@gmail.com>
> wrote:
>
>> Hey Marwa,
>>
>> I can't reproduce the problem. Using latest moses git and nplm from
>> https://github.com/rsennrich/nplm it compiles just fine and I get both
>> BilingualNPLM and NeuralLM FFs
>> I can suggest that you do a ./bjam clean and try recompiling again.
>>
>> Cheers,
>>
>> Nick
>>
>> On Fri, Jul 3, 2015 at 12:19 PM, Marwa Refaie <marwa.refaie@gmail.com>
>> wrote:
>>
>>> Hi
>>> A month ago both NPLM and Bilingual NPLM were working great; now they
>>> suddenly seem not to be linked to Moses. I tried to recompile, but
>>> recompiling never adds them.
>>>
>>> ./bjam --with-nplm=~/nplm-master/src
>>>
>>> Tip: install tcmalloc for faster threading. See BUILD-INSTRUCTIONS.txt
>>> for more information.
>>> mkdir: cannot create directory ‘bin’: File exists
>>> warning: No toolsets are configured.
>>> warning: Configuring default toolset "gcc".
>>> warning: If the default is wrong, your build may not work correctly.
>>> warning: Use the "toolset=xxxxx" option to override our guess.
>>> warning: For more configuration options, please consult
>>> warning:
>>> http://boost.org/boost-build2/doc/html/bbv2/advanced/configuration.html
>>> ...patience...
>>> ...patience...
>>> ...found 4625 targets...
>>> SUCCESS
>>>
>>> *Then, when I try the feature "NeuralLM" or "BilingualNPLM":*
>>>
>>> Feature name NeuralLM is not registered.
>>> Feature name BilingualNPLM is not registered
>>>
>>> Any suggestions, please?
>>>
>>> Marwa N. Refaie
>>>
>
>
> --
> Raj Dabre.
> Doctoral Student,
> Graduate School of Informatics,
> Kyoto University.
> CSE MTech, IITB., 2011-2014
>
>
------------------------------
Message: 4
Date: Sat, 4 Jul 2015 00:14:18 +0200
From: Marwa Refaie <marwa.refaie@gmail.com>
Subject: Re: [Moses-support] nplm / Bilingual LM
To: Nikolay Bogoychev <nheart@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CADJOO148ftV80hM6UuthEh71pMfydFEWGipHx02ZgKh5-my6jQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Thanks
The memory issue was solved by adding the following to the BilingualNPLM
line in moses.ini: premultiply=false
--
Marwa N. Refaie
------------------------------
End of Moses-support Digest, Vol 105, Issue 10
**********************************************