User talk:Jmath666/latex2wiki

LaTeX to Wiki translator wanted

(copied from User talk:Oleg Alexandrov)

Is there a LaTeX to Wiki translator, please? I know I can do with some amount of hand editing because the math formulas are largely compatible, but having the structure (paragraphs, sections, crossreferences, citations,...) and tables translated automagically would be great. Jmath666 19:09, 17 February 2007 (UTC)Reply

I don't know of any. Besides, the output of any such program would not look so good on Wikipedia I think. I suggest you use a text editor to replace dollar signs with <math> </math> tags, and then do everything else by hand. Not pretty, but works. :) Oleg Alexandrov (talk) 19:11, 17 February 2007 (UTC)Reply
Thanks. It should not be hard to write a Perl script to do that and also replace \section{X} by ==X== and do citations and so on. I could rework some introductions from my papers and proposals into useful articles and such translation as the first step would save a lot of work. (I am aware of the need to steer clear of copyright issues.) But my Perl is rusty. Maybe one day if I can find the time. Is there a place to post such codes on Wikipedia? Do you think it would be useful to the community? I am not sure I would bother just for my own use. Jmath666 19:31, 17 February 2007 (UTC)Reply
I wrote a small perl code which attempts something like that, see here. For now all it does is converting dollars to math tags and sectioning. Doing the bibliography is more complicated, I may work on that sometime but not today. Oleg Alexandrov (talk) 21:15, 17 February 2007 (UTC)Reply
Cool. Where is the script itself? I ran a few paragraphs through it, and it showed some simple things I might want to add. Jmath666 21:35, 17 February 2007 (UTC)
The code is now linked from the web form, so also here. I guess you can download it to disk and modify the parse_latex routine to your heart's desire (you can easily build a stand-alone perl script around that routine not depending on the web form interface). Then sometime later I could merge it in. Oleg Alexandrov (talk) 21:49, 17 February 2007 (UTC)Reply
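For readers following along, here is a minimal sketch of the two substitutions discussed above (dollar signs to math tags, and sectioning). It is not the actual parse_latex routine; the name dollars_and_sections is made up for illustration, and escaped dollar signs (\$) are not handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT the real parse_latex: it only shows the idea.
sub dollars_and_sections {
    my ($text) = @_;

    # Display math first, so $$...$$ is not mistaken for two inline formulas.
    $text =~ s/\$\$(.+?)\$\$/<math>$1<\/math>/gs;

    # Inline math: $...$ becomes <math>...</math>.
    $text =~ s/\$(.+?)\$/<math>$1<\/math>/gs;

    # Sectioning: \subsection{X} becomes ===X===, \section{X} becomes ==X==.
    $text =~ s/\\subsection\*?\{([^{}]*)\}/===$1===/g;
    $text =~ s/\\section\*?\{([^{}]*)\}/==$1==/g;

    return $text;
}

print dollars_and_sections('\section{Intro} Let $x$ be real.'), "\n";
```

Anything beyond these patterns (cross-references, tables, citations) would still need hand editing or further rules.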

I implemented conversion from TeX to wiki for references per Wikipedia:Footnote3. In case you use BibTeX, you first need to paste the generated .bbl file at the end of the article for this to work well. The code basically replaces every \cite with a Wikipedia {ref}, and every bibitem with a Wikipedia {note}. Oleg Alexandrov (talk) 20:58, 18 February 2007 (UTC)Reply
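A rough sketch of what such a replacement could look like follows. The template forms {{ref|label}} and {{note|label}} are an assumption based on Wikipedia:Footnote3, and cite_to_footnote3 is a made-up name; the real script may do this differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map \cite and \bibitem to Footnote3-style templates.
# The exact template syntax is an assumption, not taken from the script.
sub cite_to_footnote3 {
    my ($text) = @_;

    # \cite{label} in the running text becomes {{ref|label}}.
    $text =~ s/\\cite\{([^{},]+)\}/{{ref|$1}}/g;

    # \bibitem{label} in the pasted .bbl becomes a numbered {{note|label}}.
    $text =~ s/\\bibitem\{([^{}]+)\}/# {{note|$1}}/g;

    return $text;
}

print cite_to_footnote3('See \cite{kalman}.'), "\n";
print cite_to_footnote3('\bibitem{kalman} R. E. Kalman, ...'), "\n";
```

Citations with several labels, such as \cite{a,b}, are deliberately left alone here and would need to be split first.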

Re: LaTeX to Wiki

I wrote a small script to do some conversion work. I replied in full on my talk page. Oleg Alexandrov (talk) 21:17, 17 February 2007 (UTC)Reply

I replied about how to get the edit summary too, in the appropriate section on my talk page. Oleg Alexandrov (talk) 21:19, 17 February 2007 (UTC)Reply
I replied on my talk again. Note that if my talk page is on your watchlist, it will be easy for you to see if anything changed there. Oleg Alexandrov (talk) 22:11, 17 February 2007 (UTC)Reply
Thanks again. You do not have to alert me on this page separately unless this is an automated process. How do you know if I have your talk on my watch list or not? Jmath666 22:50, 17 February 2007 (UTC)Reply
You can visit my talk page, you will see a tab at the top which says either "watch" or "unwatch". If it says "unwatch" it means that it is currently on your watchlist.
It is actually simpler to click on "my watchlist" on the very top of the page. You will see which pages on your watchlist changed in the last few days. Oleg Alexandrov (talk) 01:10, 18 February 2007 (UTC)Reply
OK, but if you asked how do I know what's on your watchlist, then the answer is that I don't. I know what's on my watchlist, and you know what's on your watchlist. :) Oleg Alexandrov (talk) 01:11, 18 February 2007 (UTC)Reply
Indeed, I meant to ask how do you know what I have on my watchlist and I understand your answer is you don't. Sorry about the confusion. Jmath666 02:49, 18 February 2007 (UTC)Reply

Running LaTeX to Wiki

I have downloaded your cgi-lib.pl and pm.pl, and grabbed latex2wiki by cut and paste from the browser. (Amazingly, I could not right-click and save; I kept getting something else in the saved file, not just colorizing, even from the directory listing. Is your web server substituting something?) I installed a bunch of Perl modules from CPAN to get latex2wiki.cgi to run at all, but I still cannot get anything reasonable from the command line. I am trying

./latex2wiki.cgi < file.tex > file.wiki

The error I am getting is

Use of uninitialized value in substitution (s///) at ./latex2wiki.cgi line 38 (#1)

and nothing like wiki source comes out. Eventually I get a seg fault or some boilerplate HTML, depending on the machine (I tried two). This is probably bogus and your code is fine. It may be missing some environment or expecting input in some other way than standard input. I know some Perl (I wrote an OpenMP to threads translator) but not Perl with the web stuff. Or something happened to the code in the cut and paste. Jmath666 04:29, 19 February 2007 (UTC)

I mentioned something about that sometime earlier on my talk page, but I was not very explicit. Here's the thing. You don't need any of that web stuff, that's necessary just to have Perl play nice with the web browser. You need to copy only the parse_latex routine, paste it in a .pl file, and at the top of that file write the following
#!/usr/bin/perl

use strict;                   # 'strict' insists that all variables be declared
use diagnostics;              # 'diagnostics' expands the cryptic warnings

undef $/; # undefines the separator. Can read one whole file in one scalar.

MAIN: {

  my ($file, $text);

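  $file = $ARGV[0]; # the command line argument
  die "Usage: $0 file.tex\n" unless defined $file;

  # read the data from $file into $text (three-argument open, with error check)
  open (my $fh, '<', $file) or die "Cannot open $file: $!\n";
  $text = <$fh>;
  close ($fh);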

  # process the text
  $text = &parse_latex ($text);
 
  print "$text\n";

}

# the parse_latex subroutine goes here
sub parse_latex {

....

}

Oleg Alexandrov (talk) 16:11, 19 February 2007 (UTC)Reply

Yes, you did say that but the code looked to me like the few lines you now replaced were doing some parsing using those web libraries. My Perl is indeed rusty. OK, I'll give it a try. Thanks! Jmath666 20:04, 19 February 2007 (UTC)Reply
This goes through fine, I can take it from there. Thanks. Jmath666 23:54, 19 February 2007 (UTC)Reply
Glad to hear that. You are right, that ParseInput function looks as if it could be doing more than what it does (it just reads data from the webform). Oleg Alexandrov (talk) 03:16, 20 February 2007 (UTC)Reply
I have added a few rules but it still has a bit to go. For now I am just deleting cross-references to sections and equations. I have been putting the code at User:Jmath666/latex2wiki.pl and a sample output at User:Jmath666/latex2wiki. Jmath666 04:58, 20 February 2007 (UTC)
Formatting OK on my sample article, but the numbering of citations is wrong - wiki ignores the labels and numbers the citations sequentially. Wikipedia:Footnote3 does not seem to be quite compatible with the \cite \bibitem semantics. The treatment of multiple references to the same footnote is particularly different. Wikipedia:Footnote is not it either. This might require a more complicated Perl code than the simple substitutions. Jmath666 06:47, 20 February 2007 (UTC)Reply
Yes, it may require more complicated Perl code than the simple substitutions. :) I think the way to go is as illustrated at Help:Footnotes, where the first time a citation occurs you actually embed the full bibliographical entry of that citation within the article. The next time the same citation occurs you don't need to embed the bibliographical entry again, just refer to it by a label. Then, at the end, Wikipedia generates a list of all bibliographical entries for you. That's different from what LaTeX does, where you put all the bibliographical entries at the end and only use citations to them in the text.
This is not trivial to implement, but not too hard either. We'd need a hash which maps each citation label to its bibliographical entry (that can be done by parsing the LaTeX bibliography section). Then, for each citation label we substitute its first occurrence by the bibliographical entry. Give it a try. :) I may think about this too, over the weekend. Oleg Alexandrov (talk) 16:17, 20 February 2007 (UTC)
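The hash-based approach described above could be sketched roughly as follows. This is an illustration only, not the code that ended up in latex2wiki.pl; the name cites_to_refs, the two-argument interface (body and bibliography passed separately), and the trailing ==References== block are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: first \cite{label} embeds the full entry in <ref>,
# later ones reuse it by name, as described at Help:Footnotes.
sub cites_to_refs {
    my ($body, $bibliography) = @_;

    # Build the hash: citation label => bibliographical entry text.
    my %entry;
    while ($bibliography =~
        /\\bibitem\{([^{}]+)\}\s*(.*?)\s*(?=\\bibitem\{|\\end\{thebibliography\}|\z)/gs) {
        $entry{$1} = $2;
    }

    my %seen;
    my $make_ref = sub {
        my ($label) = @_;
        return "\\cite{$label}" unless exists $entry{$label};  # unknown label: leave as is
        return qq{<ref name="$label"/>} if $seen{$label}++;     # repeat: refer by label
        return qq{<ref name="$label">$entry{$label}</ref>};     # first use: embed entry
    };
    $body =~ s/\\cite\{([^{},]+)\}/$make_ref->($1)/ge;

    # Wikipedia generates the list of entries where <references/> is placed.
    return $body . "\n==References==\n<references/>\n";
}

print cites_to_refs('A \cite{a} B \cite{a}.', '\bibitem{a} Author, Title.');
```

Citations with several labels in one \cite would have to be split into single-label citations before this step.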
So you prefer Wikipedia:Footnote to Wikipedia:Footnote3 then. That's good to know.
Yeah, I know. I did similar things in Perl before. This is starting to look much like real work. Maybe in a few days if I cannot resist. Jmath666 20:27, 20 February 2007 (UTC)Reply
Well I could not resist, the references are done. It works on the sample paper, but I'll have to add conversion for more math commands - the subset Wiki supports is pretty small. I won't bother with tables (yet). Jmath666 04:47, 21 February 2007 (UTC)Reply
Enjoy. :) You may want to make an announcement at Wikipedia talk:WikiProject Mathematics when you are ready. Do you have a place from where you can host it in the form of a web form, the way it was originally on my web page? That way other people could use it too. I mean, I can always merge your changes to my script, but I think you are into this more than me so you could as well maintain the whole thing yourself. Cheers, Oleg Alexandrov (talk) 05:12, 21 February 2007 (UTC)Reply
Note that the line
 # Get rid of {} around a single letter, common in BibTeX
 $text =~ s/\{([A-Z])\}/$1/g;
may have the unintended consequences of messing up \frac{A}{B} or \sqrt{A}. I think you should create a function which would isolate the bibliography part from the article to be translated and apply such changes only to it. Perhaps. Oleg Alexandrov (talk) 16:40, 21 February 2007 (UTC)Reply
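A small sketch of the suggested fix: apply the brace-stripping rule only inside the bibliography. It assumes the bibliography is delimited by a thebibliography environment; the function names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The rule from the script, applied to whatever text it is given.
sub strip_single_letter_braces {
    my ($bib) = @_;
    $bib =~ s/\{([A-Z])\}/$1/g;
    return $bib;
}

# Hypothetical wrapper: restrict the rule to the bibliography part, assumed
# here to be a \begin{thebibliography}...\end{thebibliography} block.
sub clean_bibliography_only {
    my ($text) = @_;
    $text =~ s/(\\begin\{thebibliography\}.*?\\end\{thebibliography\})/strip_single_letter_braces($1)/gse;
    return $text;
}

my $sample = '\frac{A}{B} \begin{thebibliography}{9} \bibitem{k} {K}alman \end{thebibliography}';
print clean_bibliography_only($sample), "\n";
```

With this, \frac{A}{B} in the article body keeps its braces while {K}alman in the bibliography becomes Kalman.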
The bibliography part does get isolated so this will be easy. Thanks for noticing this.
What I really need is brace matching: something instead of \{(.*?)\} that would match only paired { }, e.g. in {ab{c}} it would match ab{c}.
What is the place to ask about simple extensions of tex emulation, e.g. missing \widetilde (\widehat is implemented). Jmath666 18:17, 21 February 2007 (UTC)Reply

I don't know how to do matching braces in Perl, maybe there's a module for that. I don't know either about where to ask about TeX extensions, try maybe the Wikipedia:Village pump. Cheers, Oleg Alexandrov (talk) 04:45, 22 February 2007 (UTC)Reply
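One possible way to match paired braces is a recursive regular expression. The sketch below assumes Perl 5.10 or later, which postdates this discussion; the names $balanced and first_braced_group are made up for illustration, and escaped braces such as \{ are not treated specially.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Matches one balanced {...} group. (?&braced) recurses into the named group,
# so nested braces are consumed as a unit. Requires Perl 5.10 or later.
my $balanced = qr/(?<braced>\{(?:[^{}]++|(?&braced))*\})/;

# Hypothetical helper: contents of the first balanced group, without the
# outer braces, or undef if there is none.
sub first_braced_group {
    my ($text) = @_;
    if ($text =~ $balanced) {
        return substr($+{braced}, 1, -1);
    }
    return undef;
}

print first_braced_group('{ab{c}}'), "\n";
```

The core module Text::Balanced is another option that may be worth a look for older Perl versions.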

Yes, I can host the script in the web form. Is it OK when I copy the other files supporting the web form from your site? Jmath666 07:30, 23 February 2007 (UTC)Reply
Sure. Notice that the code pm.pl contains a lot of stuff which you won't need (I started latex2wiki as a hack based on pm2wp). To make the code more elegant you may want to move the functions &print_head() and &print_foot() from pm.pl to the bottom of your new latex2wiki.pl and not use pm.pl at all, but that is of course not strictly necessary (and my code is in the public domain anyway). And again, you could make an announcement at the math wikiproject, some people may be interested in this converter. Oleg Alexandrov (talk) 16:18, 23 February 2007 (UTC)Reply
Thanks. I'll figure out the web forms in a week or so, hopefully. I still have to fix a few things in latex2wiki.pl before it is ready for an announcement. I have used it to translate Ensemble Kalman filter and there were relatively few edits to make by hand, but some things did not work that I thought I had fixed (esp. those matching {} in the bibliography), so it needs another look. Jmath666 20:45, 23 February 2007 (UTC)

Latex2wiki

Please put comments about the LaTeX to wiki conversion below. Jmath666 06:20, 25 February 2007 (UTC)Reply

My first comment is that I wonder if it is indeed necessary to output the text


 print '<!-- The original version of this page was converted from LaTeX by [[User:Jmath666/latex2wiki.pl]] -->'."\n";
 print '<!-- first version written by [[User:Oleg Alexandrov]], now developed and maintained by [[User:Jmath666]] -->'."\n";
 print '<!-- This page is using references by [[Wikipedia:Footnote]] -->'."\n";

each time at the top of the converted LaTeX code. If I were a user of latex2wiki I'd cut off that text anyway before saving the created wikicode.

Also, in the code you give me more credit than what I deserve. The code was your idea, I did the first hack indeed, but all the real work was done by you. Perhaps you could rephrase the

# written by  User:Oleg Alexandrov
# modified and maintained by User:Jmath666

to something like

# written by User:Jmath666
# with code contributions from User:Oleg Alexandrov

What do you think? Cheers, Oleg Alexandrov (talk) 07:24, 25 February 2007 (UTC)Reply

Oh, and I tried to use the web form but it just hangs on me. Is there a problem with the server? Oleg Alexandrov (talk) 07:38, 25 February 2007 (UTC)Reply
OK will do. Thanks. Does it hang always? Sometimes the conversion may take a while. You can download the Perl code and run it from the command line which will give you more info, or post the code you tried to convert and I'll look into it.
Please try again and let me know. I tried a few more documents and the conversion server worked, but the preview sometimes hangs and eventually shows "The Wikimedia Foundation servers are currently experiencing technical difficulties." I think the Wikimedia server caches the rendering, which makes it look fast in the edit/preview cycle, but it may choke if it gets a big chunk at once. Jmath666 08:02, 25 February 2007 (UTC)
I tried again. I inserted the very simple text
$x$
and hit "Submit", and nothing happened. Could you see if you can reproduce that? Oleg Alexandrov (talk) 03:04, 26 February 2007 (UTC)Reply
The Perl script goes into an infinite loop on that. Please try a more realistic example. I do not understand why; your input gets translated correctly, then on line 142 there is a match where there should not be one. Watch:
main::parse_latex(standalone.pl:142):
142:        $text =~ s/\\cite\{([^\}]*?)\,/\\cite\{$1\}\\cite\{/;
  DB<4> print $text
<math>x\,\!</math>

  DB<5> s
main::parse_latex(standalone.pl:143):
143:        $e=$1;
  DB<5> s
main::parse_latex(standalone.pl:142):
142:        $text =~ s/\\cite\{([^\}]*?)\,/\\cite\{$1\}\\cite\{/;
  DB<5> print $e
x
To the best of my understanding, it should not match, and $1 and thus $e should be undefined. When I copy this into a separate test code it behaves correctly. Perl bug? Jmath666 06:00, 26 February 2007 (UTC)
Well, contrary to the documentation, $1 is unpredictable if no match was made. I fixed that and some other things. Please try again. Thanks again for pointing it out. Jmath666 07:21, 26 February 2007 (UTC)
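For what it is worth, the behaviour in the debugger trace is consistent with how Perl treats capture variables: after a failed match, $1 keeps the value from the last successful match instead of being reset. A sketch of a loop that only reads $1 when the substitution succeeded is given below; it is not the actual fix in the script, and split_multiple_cites is a made-up name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split \cite{a,b,c} into \cite{a}\cite{b}\cite{c}.
# $1 is read only inside the loop body, i.e. only after s/// succeeded,
# so a stale value from an earlier, unrelated match is never used.
sub split_multiple_cites {
    my ($text) = @_;
    my @labels;
    while ($text =~ s/\\cite\{([^{},]*),\s*/\\cite\{$1\}\\cite\{/) {
        push @labels, $1;    # label that was just split off
    }
    return ($text, @labels);
}

my ($out, @split) = split_multiple_cites('\cite{a,b,c}');
print "$out\n";

# Input without any \cite: the loop body never runs and the text is unchanged.
my ($same) = split_multiple_cites('<math>x\,\!</math>');
print "$same\n";
```

The loop terminates because each successful substitution removes one comma from inside a \cite{...}.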

Thanks, now it works. By the way, per the math style manual it is good that formulas are kept in html (non-PNG) form if inline, so   rather than   (the latter looks odd on my screen, pushed half-way down). One more note, it is an unspoken standard here that one should write

<math>x^2\,</math>

rather than

<math>x^2\,\!</math>

if you want to force the PNG conversion. Perhaps you could adapt your script to not add "\,\!" for inline formulas, and if it adds anything, that it be of the form "\,". But these are small things. Oleg Alexandrov (talk) 16:01, 26 February 2007 (UTC)

I am aware of the issue. Having the same symbol rendered differently in different places looks just as odd. In an article with an occasional symbol, I agree. In an article with lots of repeated formulas, inconsistent rendering hinders understanding, and consistency is important. I have assumed that latex2wiki would tend to be used for the latter. See Help:Formula#Forced PNG rendering, where they give consistent rendering in a proof as an example. So I followed the markup there to force PNG always, for consistency. Maybe there ought to be a button on the web form to choose. Hopefully in time Wikimedia will mature enough so that this will not be an issue, or at least they will fix the positioning. Jmath666 16:55, 26 February 2007 (UTC) Not so small: it would be painful to edit every single formula by hand. I'll see how this interacts with user preferences. The article I was working on does look bad with rendering piecemeal this and piecemeal that. Jmath666 19:24, 26 February 2007 (UTC)
There are now some variables set in the source up front that control adding "\'" "\," or "\,\!". The way it is set now is not to add anything, because this can be controlled by user preferences. But the PNG rendering ought to be fixed or replaced; it is much like latex2html, which never worked right, with the added visual confusion when something is rendered in PNG and something not (with default user settings at least). Jmath666 01:13, 27 February 2007 (UTC)
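To illustrate the kind of switch being described, here is a minimal sketch. The variable name $inline_suffix and the function wrap_inline_math are made up and need not match the variables actually used in latex2wiki.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical setting, meant to sit near the top of the script:
#   ''     add nothing (rendering left to user preferences)
#   '\,'   force PNG with a thin space
#   '\,\!' force PNG with a thin space cancelled by a negative thin space
our $inline_suffix = '';

# Wrap one formula in math tags, appending the chosen suffix.
sub wrap_inline_math {
    my ($tex, $suffix) = @_;
    $suffix = $inline_suffix unless defined $suffix;
    return "<math>$tex$suffix</math>";
}

print wrap_inline_math('x^2'), "\n";
print wrap_inline_math('x^2', '\,'), "\n";
print wrap_inline_math('x^2', '\,\!'), "\n";
```

A web form option, as suggested earlier in the thread, could simply set this one variable.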
Yeah, math display on the web sucks. MathML is supposed to fix things, but who knows how long it would take until it is widely accepted and supported. Oleg Alexandrov (talk) 03:19, 27 February 2007 (UTC)Reply
Math display on the web has been solved long ago: render the whole page in PDF. That's what we all use to put papers on the web. All online journals are in PDF. It supports links, too. And it looks the same regardless of the browser. Why not have PDF display of everything in user preferences? Jmath666 16:42, 27 February 2007 (UTC)Reply
P.S. HTML is snappier for the browser, but with faster CPUs and more bandwidth this is not important any more. On the server side, I am sure Wikimedia caches the rendered pages anyway. (The preceding unsigned comment was added by Jmath666 (talk • contribs).)
It is a pain to read stuff in PDF on the screen (if screens were not only wide but also high, to fit an entire page at a reasonable level of detail, that might change). Also, Adobe Acrobat is a hog. A faster CPU does not help; it takes ages to load the thing from the hard drive. There are other reasons not to consider PDF an option, I would think. Oleg Alexandrov (talk) 03:30, 28 February 2007 (UTC)
True. There are reasons either way. An option to render PDF would not hurt though, people have different preferences. It could be patched into Wikimedia from a wiki2latex Perl script and some free TeX software. But I had enough fun hacking for now and this would take more than a few days esp. when counting the learning curve. Jmath666 06:21, 28 February 2007 (UTC)Reply


LaTeX to Wikicode translation

(copied from WikiProject Mathematics)

A raw version of a translator is available, by joint effort of User:Oleg Alexandrov and myself. Jmath666 06:57, 25 February 2007 (UTC)Reply

"Joint work" here means that I did the original hack of several lines and then Jmath666 took the effort to make this actually output something usable. This is an interesting way to create articles, surely much faster and more efficient than using the textbox and the "Preview" button. Oleg Alexandrov (talk) 07:08, 25 February 2007 (UTC)Reply

If you insist on getting inline TeX out of this thing, can you at least use \scriptstyle when it's inline? Michael Hardy 03:15, 28 February 2007 (UTC)Reply

Please explain. Jmath666 22:18, 8 March 2007 (UTC)Reply
How about the reverse - Wikicode to LaTeX? Tompw (talk) 16:25, 1 March 2007 (UTC)Reply
The basic stuff (sections, equations, <ref></ref> to \bibitem, but no pictures or links) would not be so hard either. I wanted LaTeX to Wikicode translator for myself, because over time I wrote some introductory material in LaTeX that may be useful. And citations are so much easier if I can just pull them from existing BibTeX databases. Jmath666 22:18, 8 March 2007 (UTC)Reply
Not even speaking of the convenience of a wysiwyg editor instead of hacking the source. Jmath666 01:20, 9 March 2007 (UTC)Reply
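As a rough idea of what the "basic stuff" in the reverse direction might look like, here is a sketch covering only headings and math tags. It is not an existing tool; wiki_to_latex_basics is a made-up name, and references, links, and pictures are ignored.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper for the reverse direction (Wikicode to LaTeX).
sub wiki_to_latex_basics {
    my ($text) = @_;

    # Headings on their own line: ===X=== first, then ==X==.
    $text =~ s/^===[ \t]*(.+?)[ \t]*===[ \t]*$/\\subsection{$1}/mg;
    $text =~ s/^==[ \t]*(.+?)[ \t]*==[ \t]*$/\\section{$1}/mg;

    # <math>...</math> becomes $...$ (all formulas treated as inline).
    $text =~ s/<math>(.*?)<\/math>/\$$1\$/gs;

    return $text;
}

print wiki_to_latex_basics("==Intro==\nLet <math>x</math> be real.\n");
```

Turning <ref></ref> into \cite and \bibitem would need the label bookkeeping discussed earlier, in the opposite direction.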

By the way, is there some permanent place to make a link on Wikipedia to such tools? Jmath666 22:18, 8 March 2007 (UTC)Reply

There is now a separate user page for the translation. Jmath666 00:09, 15 March 2007 (UTC)Reply

Found it. Now listed in Wikipedia:Tools/Editing_tools#From_LaTeX and in Category:Wikipedia tools. Jmath666 06:07, 29 March 2007 (UTC)Reply

align bug

I found a failed parse for the LaTeX "align" environment in Explicit formulae (L-function), and when checking how to format the text correctly, I discovered that the Help page also fails to parse it; see Help:Displaying a formula#Fractions.2C_matrices.2C_multilines and go to where the "align" environment is used.

The Parse error tells us that "\begin" is not recognised, but then says "\begin{aligned}" and the same at "\end{aligned}". The correct word {align} is being used in the source, but this is somehow becoming {aligned} in preformatting, and the parser then returns a syntax error. Hope this helps. Silas Maxfield (talk) 15:10, 7 February 2014 (UTC)Reply

Resolved, thanks. Silas Maxfield (talk) 11:07, 19 February 2014 (UTC)Reply