Wikipedia:Reference desk/Archives/Mathematics/2020 April 4

Mathematics desk
Welcome to the Wikipedia Mathematics Reference Desk Archives
The page you are currently viewing is a transcluded archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


April 4


John Marlowe and the other John Marlowe


Being stuck indoors (that's my excuse; I probably would have done this anyway), I chose to watch a couple of old b/w movies on TV today, ones I'd never seen before.

What a weird coincidence, I said to myself. So, naturally, I got to wondering how likely this would have been, assuming the films weren't chosen for broadcast deliberately because of the names of the characters, which I think would be extremely unlikely.

I don't know what sorts of assumptions one would need to make to have a stab at this, but let me phrase it thus:

Our article Coincidence notes that "[f]rom a statistical perspective, coincidences are inevitable". I dare to state that they are even more inevitable when you are getting bored. In case you are fed up watching old movies, here are three not too long reads about the odds of coincidences:
 --Lambiam 13:56, 4 April 2020 (UTC)
There's no way of knowing for sure, but the writers of both movies might have been influenced by the Raymond Chandler character Philip Marlowe, especially since the radio program The Adventures of Philip Marlowe was airing at about the same time. There's also the Joseph Conrad character Charles Marlow; perhaps not as well known at the time, but a professional writer worthy of the title would be familiar with him. Put that together with the fact that John is a very common first name in English and it's not too surprising that there would be two movies from the early 1950s with characters named John Marlowe. The real coincidence is that you happened to pick those same two movies as a double feature, but as pointed out above such coincidences aren't always as unlikely as they may seem. You might be interested in the Stanisław Lem novel The Chain of Chance, which explores the nature of coincidence in the guise of a futuristic detective story. --RDBury (talk) 19:12, 4 April 2020 (UTC)
A database with film character names would not be of help in getting a precise value unless we also know the likelihood of a pair of films being chosen in succession. It is not very likely that a channel will programme The Texas Chain Saw Massacre to follow a broadcasting of The Sound of Music. But Earth vs. the Flying Saucers, although rarely shown, is more likely to be shown right after The Day the Earth Stood Still than after most other flicks. Below I follow an entirely different "armchair statistics" approach. I would not dream of submitting this to a peer-reviewed journal – in real life I have a reputation to uphold.
OK, here we go. Assume that the screenwriter (or book author if the film is adapted from a book) creates a character's name by picking the given name of someone reminiscent of the character and the surname of someone else also reminiscent of the character. So for a serial killer they might combine Leonard Fraser with Alexander Pearce to name a character "Leonard Pearce". There is a non-zero chance that this procedure results in a name that must be rejected for obvious reasons, such as "Tony Abbott", but I think this can be disregarded, as the chance is still fairly small. I believe that any given name is as likely to be chosen for a serial killer as for a bookkeeper, so we can disregard the character of the character. I'll confine myself, though, to English-language male names. Not all names have an equal prevalence. Let us assume that both given names and surnames independently follow (the simplest case of) Zipf's law. While this assumption is not founded on evidence, it is not unreasonable as an approximation.
Before moving on to applying this model to the question, let us first examine a more general question. Given is a discrete probability distribution over a set of $n$ items, numbered $1$ through $n$, where the probability (relative frequency) of the $i$-th item is denoted by $p_i$. Consider a pair of random draws (with replacement) according to the given distribution from these $n$ items. If the first one drawn is item $i$, the probability that the second draw yields the same item equals $p_i$. To find the overall probability $P$ of a matching pair, we need to take the weighted sum, where the weights are the probabilities of the first item. This results in
 $$P = \sum_{i=1}^{n} p_i^2.$$
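As a minimal sketch of this weighted sum (the function name `match_probability` and the two example distributions are made up for illustration, not taken from the thread), the probability of a matching pair can be computed like this:

```python
from fractions import Fraction

def match_probability(probs):
    """Probability that two independent draws from `probs` yield the same item."""
    # Item i is drawn first with probability p_i and again with probability p_i.
    return sum(p * p for p in probs)

# Hypothetical example: a uniform distribution over 4 items.
uniform = [Fraction(1, 4)] * 4
print(match_probability(uniform))  # 1/4

# Hypothetical skewed distribution: a match becomes more likely.
skewed = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
print(match_probability(skewed))  # 11/32
```

The skewed case shows why unequal name prevalence matters: concentrating probability on a few items raises the chance of a coincidence.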
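Zipf's law corresponds to the distribution given by
 $$p_i = \frac{1}{i\,H_n},$$
in which the notation $H_n$ denotes the $n$-th harmonic number, so that the probabilities sum up to $1$ as they should. Let $M$ be the number of given names and $N$ the number of surnames, so that there are $MN$ given-name–surname combinations in total. Each name can be indexed by a pair $(i, j)$ and then has probability $p_{i,j} = \frac{1}{i\,H_M} \cdot \frac{1}{j\,H_N}$. Now we find
 $$P = \sum_{i=1}^{M} \sum_{j=1}^{N} p_{i,j}^2 = \frac{1}{H_M^2 H_N^2} \left(\sum_{i=1}^{M} \frac{1}{i^2}\right) \left(\sum_{j=1}^{N} \frac{1}{j^2}\right).$$
The two sums are partial sums of a convergent series with limit $\pi^2/6$ (for which see the Basel problem). Since the series converge quickly, we can approximate both sums for large values of $M$ and $N$ by the limit $\pi^2/6$. The harmonic numbers can be approximated by the leading term of their well-known asymptotic expansions: $H_M \approx \ln M$, $H_N \approx \ln N$. Combining all this gives us the approximation
 $$P \approx \frac{\pi^4}{36\,(\ln M)^2 (\ln N)^2}.$$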
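A rough floating-point sketch of the two formulas above may help to see how close the approximation is. The function names are invented here, and the sizes `M = 1000` and `N = 2000` are hypothetical placeholders, not the category counts used in the thread:

```python
import math

def harmonic(n):
    """n-th harmonic number H_n, in floating point."""
    return sum(1.0 / i for i in range(1, n + 1))

def zipf_probs(n):
    """Zipf probabilities p_i = 1 / (i * H_n) for i = 1..n."""
    h = harmonic(n)
    return [1.0 / (i * h) for i in range(1, n + 1)]

def match_probability_zipf(m, n):
    """P = (sum 1/i^2) * (sum 1/j^2) / (H_m^2 * H_n^2), without approximations."""
    s_m = sum(1.0 / (i * i) for i in range(1, m + 1))
    s_n = sum(1.0 / (j * j) for j in range(1, n + 1))
    return s_m * s_n / (harmonic(m) ** 2 * harmonic(n) ** 2)

def match_probability_approx(m, n):
    """Approximation pi^4 / (36 * (ln m)^2 * (ln n)^2)."""
    return (math.pi ** 2 / 6) ** 2 / (math.log(m) ** 2 * math.log(n) ** 2)

# Hypothetical sizes, chosen only for illustration.
M, N = 1000, 2000
print(match_probability_zipf(M, N))
print(match_probability_approx(M, N))
```

Because $H_n > \ln n$ and the partial sums stay below $\pi^2/6$, the approximation always comes out larger than the value computed from the unapproximated formula.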
It remains to supply numbers for $M$ (given names) and $N$ (surnames). For this we use the numbers of entries (as of 19:58, 4 April 2020 (UTC)) in the Wikipedia categories English-language masculine given names and English-language surnames. This gives us   and  . Plugging this in and taking numeric values results in
  .
This approximate estimate is for a name match between one character from the first and one from the second film, say the two main characters. If more characters from each cast are considered, say $k$ from movie number one and $\ell$ from movie number two, where both numbers are fairly limited, the chance of a match increases by almost a factor of $k\ell$. If both equal $10$, we get   I agree that this seems implausibly high.
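To illustrate the "almost a factor of $k\ell$" remark, here is a small sketch. It assumes, as a simplification not stated in the thread, that the $k\ell$ character pairs behave as independent trials; the per-pair probability `p = 0.001` is a hypothetical placeholder.

```python
def any_match_probability(p, k, l):
    """Chance of at least one match among k*l pairs.

    Assumption: the k*l pairs are treated as independent trials,
    each matching with probability p.
    """
    return 1.0 - (1.0 - p) ** (k * l)

p = 0.001  # hypothetical per-pair match probability
k = l = 10
print(any_match_probability(p, k, l))  # slightly below k * l * p
print(k * l * p)                       # simple upper bound
```

For small `p` the two numbers are close, which is why multiplying by $k\ell$ is a reasonable shortcut as long as both cast sizes are limited.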
Concluding thought. If character names were really distributed as in real life, occasionally two characters should happen to coincidentally have the same name without this being relevant to the plot. Why do we never see this? So many questions remain.  --Lambiam 19:58, 4 April 2020 (UTC)
Double wow! Thanks for all that. I'm very surprised that N is as low as 1769. -- Jack of Oz [pleasantries] 00:23, 5 April 2020 (UTC)
I have computed the match probability $P$ using exact rational arithmetic instead of approximations. It makes a considerable difference for the harmonic numbers. With the values of $M$ and $N$ as before, I then find   . For a match in 100 possible pairs, the probability goes up to  
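One way such an exact computation might look, as a sketch only: it uses Python's `fractions.Fraction`, and the function names and the sizes `M = 200` and `N = 300` are hypothetical, chosen to keep the run short rather than to reproduce the figures in this thread.

```python
from fractions import Fraction
import math

def harmonic_exact(n):
    """n-th harmonic number H_n as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, n + 1))

def match_probability_exact(m, n):
    """P = (sum 1/i^2) * (sum 1/j^2) / (H_m^2 * H_n^2), in exact rational arithmetic."""
    s_m = sum(Fraction(1, i * i) for i in range(1, m + 1))
    s_n = sum(Fraction(1, j * j) for j in range(1, n + 1))
    return s_m * s_n / (harmonic_exact(m) ** 2 * harmonic_exact(n) ** 2)

# Hypothetical sizes, kept small so the exact sums stay fast.
M, N = 200, 300
exact = match_probability_exact(M, N)
approx = (math.pi ** 2 / 6) ** 2 / (math.log(M) ** 2 * math.log(N) ** 2)
print(float(exact))
print(approx)
# The gap comes mostly from H_n exceeding ln(n) by roughly 0.577 at these sizes.
print(float(harmonic_exact(M)), math.log(M))
```

The result is converted to a float only for printing; all sums and the final quotient are exact rationals.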
Thanks for that. Most intriguing. -- Jack of Oz [pleasantries] 07:55, 8 April 2020 (UTC)