Wikipedia:Reference desk/Archives/Computing/2024 July 25

Computing desk
Welcome to the Wikipedia Computing Reference Desk Archives


July 25

Are there apps or any software that can identify one's accent?

For example, software that can identify a person's native language while they are speaking a non-native language (e.g. English), rather than from speech in the native language we want to identify.

Yesterday, I posted this question at the language reference desk. However, no one has given me a positive answer yet, apart from a possible direction via AI, and even that without a definite answer. 2A06:C701:7B31:C100:7D63:C50F:C3A5:9744 (talk) 10:18, 25 July 2024 (UTC)

You received a comprehensive answer at the language desk. The answer is no. Shantavira|feed me 11:48, 25 July 2024 (UTC)
AI is used for pattern matching and classification. If there is some pattern that classifies speech as having a specific accent, AI can identify the pattern and classify the accent. AI is not magic. It won't do any more than identify a pattern using one of the many methods of pattern matching and then classify using one of the many ways of clustering and classification. 75.136.148.8 (talk) 12:12, 25 July 2024 (UTC)
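To make the "find a pattern, then classify" point concrete, here is a toy nearest-centroid classifier, one of the simplest classification methods. It is only a sketch: the two-number feature vectors and the labels accent_A and accent_B are made up for illustration, whereas a real system would extract acoustic features from recordings and would need far more data and a stronger model.

```python
from __future__ import annotations

from math import dist


def centroids(samples: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """Average the training feature vectors recorded for each accent label."""
    model = {}
    for label, vectors in samples.items():
        # zip(*vectors) groups the i-th feature of every vector together.
        model[label] = [sum(column) / len(vectors) for column in zip(*vectors)]
    return model


def classify(features: list[float], model: dict[str, list[float]]) -> str:
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(model, key=lambda label: dist(features, model[label]))


# Hypothetical two-number "features"; real ones would come from audio analysis.
training = {
    "accent_A": [[1.0, 0.0], [1.2, 0.2]],
    "accent_B": [[0.0, 1.0], [0.2, 1.2]],
}
model = centroids(training)
print(classify([1.1, 0.1], model))  # accent_A
```

The classifier can only ever answer with one of the labels it was trained on, which is the sense in which it does nothing beyond matching patterns it has already seen.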
A more precise answer is that no respondent here is aware of the existence of such an app. Perhaps the NSA has developed one but is keeping it under wraps. If so, how would we know?  --Lambiam 13:33, 25 July 2024 (UTC)
It is likely that, for the moment, it is far easier and cheaper to employ non-Artificial Intelligences, i.e. people, with linguistic expertise that enables them to make such identifications. This of course would only apply to specific instances – an AI-like application would be needed for automatic surveillance on a mass scale. {The poster formerly known as 87.81.230.195} 94.2.67.235 (talk) 00:18, 28 July 2024 (UTC)
People on Hugging Face have created some accent-related models, but turning one into a piece of software you can use will be very much a do-it-yourself task. Models found there also vary considerably in quality; most of them are research projects not intended for wider consumption. With enough data, classifying existing accents in order to infer other accents should be possible. But, speaking anecdotally as someone who grew up in world cities, the way someone learns a language hugely influences their accent, possibly about as much as the languages they spoke before that, and the potential for error is huge.
Again speaking anecdotally: you should think of accents as individual but similar to each other. An accent is usually a product of how that specific person has used and learned their languages, but it is sometimes deliberately learned, reflecting how that person wishes to speak. You should approach whatever problem you are trying to solve with this in mind; an accent is not just a symptom of a person's previous languages. Komonzia (talk) 21:13, 3 August 2024 (UTC)

How to automatically search and replace multi-line text in Linux CLUI?

We can automatically search and replace single-line text in Linux CLUI with awk and sed, but I need a way to do it across multiple lines.

  • File A has several HTML structures.
  • File B has this HTML structure:
    <footer class="site-footer">
      <div class="site-footer__inner container">
        {{ page.footer_top }}
        {{ page.footer_bottom }}
      </div>
    </footer>
  • File C has this HTML structure:
    <footer class="site-footer">
      <div class="site-footer__inner container">
        {{ page.footer_top }}
        {{ page.footer_bottom }}
      </div>
      <span class="globalrs_dynamic_year">{{ 'now' | date('Y') }}</span>
    </footer>

How can I automatically search file A and, if it contains the text of file B, replace that text with the text of file C?

How would you do this with C/Perl/Python/PHP/Node.js or something else? 103.199.70.159 (talk) 19:06, 25 July 2024 (UTC)

While it would be trivial to do this in Python or any other reasonable programming language, if I wanted to do this with a script, my approach would be:
  1. Convert all three files to a version without newlines, using sed. For convenience, I would replace the newline character with some character or string that does not occur in the HTML; call it '~' (tilde).
  2. In each of the three converted files, change the tilde in every occurrence of "</footer>~" back to a newline, and do the same for "~<footer class="site-footer">". You end up with files where the HTML of interest is on a single line.
  3. Use sed to replace the single-line file B text with the single-line file C text in the single-line version of file A.
  4. Use sed to convert the single-line file A back to its original formatting by replacing '~' with newline.
This won't work if files B and C are not marked with the exact footer head and tail as you have shown. -Gadfium (talk) 20:46, 25 July 2024 (UTC)
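The four tilde steps can be mirrored in Python to show what each stage does to the text. This is a sketch under stated assumptions: the function names flatten, isolate_footer and replace_footer are illustrative, '~' is assumed never to occur in any of the files, and the footer in file A is assumed to start at the beginning of a line. The text is split into lines before substituting because sed's s command works on one line at a time.

```python
SENTINEL = "~"  # assumed never to occur in the HTML
FOOTER_OPEN = '<footer class="site-footer">'


def flatten(text: str) -> str:
    """Step 1: turn every newline into the sentinel so the text is one line."""
    if SENTINEL in text:
        raise ValueError(f"{SENTINEL!r} already occurs in the text")
    return text.replace("\n", SENTINEL)


def isolate_footer(flat: str) -> str:
    """Step 2: turn the sentinels around each footer back into newlines."""
    flat = flat.replace("</footer>" + SENTINEL, "</footer>\n")
    return flat.replace(SENTINEL + FOOTER_OPEN, "\n" + FOOTER_OPEN)


def replace_footer(a: str, b: str, c: str) -> str:
    """Replace the footer from B with the one from C inside A (steps 1-4)."""
    # Drop trailing newlines so each footer flattens to exactly one line.
    b_line = isolate_footer(flatten(b.rstrip("\n")))
    c_line = isolate_footer(flatten(c.rstrip("\n")))
    # Step 3: substitute within each line, as sed's s command would.
    lines = isolate_footer(flatten(a)).split("\n")
    lines = [line.replace(b_line, c_line) for line in lines]
    # Step 4: rejoin the lines and turn the remaining sentinels into newlines.
    return "\n".join(lines).replace(SENTINEL, "\n")
```

If the footer in file A is indented, "~<footer" never appears after step 1 and nothing is replaced, which is the limitation noted above.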
In any programming language, A, B, and C are just text, so you use a string replace function. In C++, std::string has no single call that replaces one substring with another: you find the position with pos = A.find(B), call A.replace(pos, B.size(), C), and repeat, searching again from pos + C.size(), until find returns std::string::npos. In Perl, it is $A =~ s/\Q$B\E/$C/g; (\Q...\E stops characters in B from being treated as regular-expression syntax, and /g replaces every occurrence). In PHP, it is $A = str_replace($B, $C, $A);. In Python, it is A = A.replace(B, C). In Node.js, A = A.replace(B, C) replaces only the first occurrence when B is a string; use A = A.replaceAll(B, C) (Node.js 15 or later) to replace them all. Note that in a programming language, a string is just a sequence of characters. It doesn't care whether there are newline or carriage-return characters in it, so replacing a substring replaces all of it, including the carriage-return and newline characters. But the text has to match exactly. For example, if A uses two spaces for indentation and B uses tab characters, it won't match. Similarly, if one uses all-lowercase tag names and the other uses all-uppercase tag names, it won't match. In that case, you need to reformat the text so it is all the same, or use regular expressions. 75.136.148.8 (talk) 13:31, 26 July 2024 (UTC)
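A short Python sketch of both options mentioned above: an exact replacement, and a regular-expression version that tolerates differences in whitespace (and optionally letter case). The function names replace_exact and replace_ignoring_whitespace are hypothetical; the sketch assumes the text of file B is meaningful once split on whitespace, which holds for HTML like the footer shown in the question.

```python
import re


def replace_exact(a: str, b: str, c: str) -> str:
    """Replace every exact occurrence of b in a with c.

    Newlines are ordinary characters, so a multi-line b works, but spaces,
    tabs and letter case must match exactly."""
    return a.replace(b, c)


def replace_ignoring_whitespace(a: str, b: str, c: str,
                                ignore_case: bool = False) -> str:
    """Replace occurrences of b in a, letting any run of whitespace in b
    match any run of whitespace (spaces, tabs, newlines) in a."""
    tokens = b.split()
    if not tokens:
        raise ValueError("search text is empty or only whitespace")
    # re.escape makes characters such as '.', '{' and '(' match literally.
    pattern = r"\s+".join(re.escape(token) for token in tokens)
    flags = re.IGNORECASE if ignore_case else 0
    # A function replacement stops re.sub from interpreting backslashes in c.
    return re.sub(pattern, lambda _match: c, a, flags=flags)
```

To apply either function to real files, read A, B and C with pathlib.Path(...).read_text() and write the result back with Path.write_text(). Note that ignore_case=True also ignores case in attribute values and text content, not just in tag names.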
The Unix/Linux utility sed can do this; see sed § Multiline processing example. The search ["sed" multi-line replace] gives some more examples.  --Lambiam 23:01, 26 July 2024 (UTC)
Since the OP asked for a Perl solution, here is a simple one.
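use strict;
use warnings;
# Usage: perl script.pl fileB fileC fileA  (prints file A with B's text replaced by C's)
# Read a whole file, newlines included, into one string.
sub slurp { my ($f) = @_; local $/; open my $fh, '<', $f or die "$f: $!"; return scalar <$fh>; }
my ($orig, $repl, $text) = map { slurp($_) } @ARGV[0 .. 2];
# \Q...\E makes characters such as { } . ( ) in the HTML match literally.
$text =~ s/\Q$orig\E/$repl/g;
print $text;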
CodeTalker (talk) 18:57, 27 July 2024 (UTC)