Wikipedia:Reference desk/Archives/Mathematics/2024 July 27



July 27


Data estimation with excessive log functions


In health care, I noticed that many estimation algorithms make extensive use of log functions. For example, the ASCVD 10-year risk estimation from the "2013 ACC/AHA Guideline on the Assessment of Cardiovascular Risk" sums up a coefficient times the log of age, a coefficient times the log of total cholesterol, a coefficient times the log of HDL, and so on. It is a set of coefficients, each multiplied by the log of an attribute. Is this type of function or algorithm the result of a specific type of data modeling? It looks to me like they took a sample data set and correlated the log of each attribute, one at a time, to the outcome, producing a coefficient that represents how correlated the log of that attribute is in the sample set. But I'm just guessing, and I'd prefer to know how this type of function is actually produced. 75.136.148.8 (talk) 10:54, 27 July 2024 (UTC)[reply]

I'm not familiar with how this estimator was devised, but model building is an art, especially in cases where the data is noisy and the causal processes are poorly understood. Social scientists routinely use purely linear regression models because that is what they were taught as students, it is the default model in R, which many of them use, and everyone else in their field does the same. When a variable (independent or dependent) can only assume positive values, it cannot have a normal distribution, which is an indication that pure linear regression may not be the best approach when devising an estimator. It is then good practice to use a data transformation that makes the observed distribution more normal. I don't know if this is why they did what they did. Another possibility is that they simply computed the correlation coefficients and saw that they were higher on a logarithmic scale.  --Lambiam 11:52, 27 July 2024 (UTC)[reply]
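The normalization point is easy to see numerically. A minimal Python sketch (the lognormal sample is made up for illustration; this is not how the guideline's authors worked):

    import numpy as np

    def skewness(a):
        # standardized third central moment; near 0 for a symmetric distribution
        z = (a - a.mean()) / a.std()
        return (z ** 3).mean()

    rng = np.random.default_rng(0)
    x = rng.lognormal(mean=3.0, sigma=0.5, size=10_000)  # positive-only, right-skewed
    print(skewness(x))           # around 1.8: strongly skewed
    print(skewness(np.log(x)))   # near 0: the log transform restores symmetry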
It is pretty common, and somewhat sensibly motivated, to use the log transformation when the variables of interest are all strictly positive (e.g. weight, height, waist size). If you do linear regression of the log of the positive result variable in terms of the logs of the input variables, the coefficients are interpretable as the exponents in a multivariate power-law model, which is nice, because the coefficients are then interpretable the same way independently of the measurement units. On the other hand, for any specific problem there are likely better transformations than the log, and even the most suitable and well-motivated data transformation might be seen as an attempt to "fudge the data" compared to just fitting a linear model. Dicklyon (talk) 04:24, 8 August 2024 (UTC)[reply]
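A minimal sketch of such a log-log fit in Python (synthetic data; the variable names and the ground-truth exponents are invented for illustration, not taken from any guideline):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 500
    height = rng.uniform(1.5, 2.0, n)   # metres
    waist = rng.uniform(0.6, 1.2, n)    # metres
    # synthetic power law: weight = 13 * height^2 * waist^1, with multiplicative noise
    weight = 13.0 * height ** 2 * waist * rng.lognormal(0.0, 0.05, n)

    # ordinary least squares on the logs: log(weight) ~ b0 + b1*log(height) + b2*log(waist)
    X = np.column_stack([np.ones(n), np.log(height), np.log(waist)])
    coef, *_ = np.linalg.lstsq(X, np.log(weight), rcond=None)
    print(coef)  # about [log 13, 2.0, 1.0]; the slopes are the power-law exponents

Rescaling an input (say, height to centimetres) changes only the intercept; the fitted exponents are unchanged, which is the unit-independence noted above.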

Are there other triangular numbers with all digits 6?


6, 66 and 666 are all triangular numbers; are there other triangular numbers whose digits are all 6? 218.187.67.217 (talk) 16:42, 27 July 2024 (UTC)[reply]

These correspond to solutions of the Diophantine equation
$$\frac{n(n+1)}{2} = \frac{2}{3}(10^p - 1),$$
where $p \ge 1$ counts the sixes, or, after clearing denominators,
$$4 \cdot 10^p = 3n^2 + 3n + 4.$$
For each solution, the number $\frac{n(n+1)}{2} = \underbrace{66\cdots6}_{p}$ is an all-6 triangular number.
I don't expect any further solutions, but neither do I see an argument exhibiting that they cannot exist. The weaker requirement $\frac{n(n+1)}{2} \equiv \frac{2}{3}(10^p - 1) \pmod{10^p}$ has four solutions for $n$ modulo $2 \cdot 10^p$ for each given value of $p,$ corresponding to four possibilities for the final digits of $n.$ For example, for $p = 1$ they are $n \equiv 3, 8, 11, 16 \pmod{20}.$ The polynomial in $n$ on the rhs of the Diophantine equation is irreducible, having negative discriminant. It seems that considerations based on modular arithmetic are not going to give further help.  --Lambiam 19:59, 27 July 2024 (UTC)[reply]
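The four-residue count is easy to confirm by brute force; a minimal Python check (illustrative only, not part of the argument above):

    # solutions n (mod 2*10^p) of n(n+1)/2 ≡ 66...6 (p sixes) (mod 10^p)
    for p in range(1, 5):
        m = 10 ** p
        target = 2 * (m - 1) // 3           # the p-digit repdigit 66...6
        sols = [n for n in range(2 * m) if (n * (n + 1) // 2) % m == target]
        print(p, sols)  # four residues for each p, e.g. [3, 8, 11, 16] for p = 1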
The discriminant of the quadratic is $48 \cdot 10^p - 39$. This needs to be a perfect square for there to be a solution, so we need $48 \cdot 10^p - 39 = k^2$ for some integer $k$. Since the gaps between consecutive perfect squares near $48 \cdot 10^p$ grow like $10^{p/2}$, the "chance" that the discriminant lands exactly on a square decays geometrically with $p$, so I heuristically wouldn't expect more than a finite number of solutions to exist.--Jasper Deng (talk) 03:34, 28 July 2024 (UTC)[reply]
This gives yet another way of phrasing the problem. Define the recurrent sequence $(a_p)$ by:
$$a_0 = 1, \qquad a_{p+1} = 10\,a_p + 39.$$
It goes like this:
$$1,\ 49,\ 529,\ 5329,\ 53329,\ 533329,\ 5333329,\ \ldots$$
Note that $9\,a_p = 48 \cdot 10^p - 39,$ so the discriminant above is a perfect square exactly when $a_p$ is. The first four values, $1^2, 7^2, 23^2, 73^2,$ are squares. Will the sequence ever hit another square?  --Lambiam 10:05, 28 July 2024 (UTC)[reply]
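The first few thousand terms can be tested directly with exact integer arithmetic; a minimal Python sketch (an illustration, using math.isqrt for exact integer square roots):

    from math import isqrt

    a = 1  # a_0
    for p in range(1, 5001):
        a = 10 * a + 39          # a_{p+1} = 10*a_p + 39, kept as an exact integer
        r = isqrt(a)
        if r * r == a:
            print(p, r)          # only p = 1, 2, 3 (r = 7, 23, 73) show up in this range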
It turns out that because the square root $k$ of the discriminant is added to or subtracted from $-3$ and the result divided by $2a = 6$ in the quadratic formula, there are even more stringent restrictions: the numerator $k - 3$ has to be divisible by 6, so we must have $k \equiv 3 \pmod 6.$ Writing $k = 3j$ with $a_p = j^2,$ this says that $j$ must be odd, and thus $a_p \equiv 1 \pmod 8.$ That restriction alone would seem to greatly reduce the number of candidates (only every other perfect square, the square of an odd number, satisfies it).--Jasper Deng (talk) 04:49, 29 July 2024 (UTC)[reply]
If the sequence $(a_p)$ ever hits another square $a_p = j^2,$ its square root $j$ will satisfy this requirement. This can be seen as follows. For $p \ge 1,$ since $a_1 = 49 \equiv 1 \pmod 8$ and $a_p \equiv 1 \pmod 8$ implies $a_{p+1} = 10\,a_p + 39 \equiv 49 \equiv 1 \pmod 8,$ every $a_p$ with $p \ge 1$ is $\equiv 1 \pmod 8.$ The only residue classes for $j$ modulo $8$ that have $j^2 \equiv 1 \pmod 8$ are $1, 3, 5$ and $7;$ in all four cases, $j$ is odd.  --Lambiam 10:13, 29 July 2024 (UTC)[reply]
Right. For any modulus $m$ you can use the recursion to easily compute $a_p \bmod m$. It's a bit harder, but still possible, to then determine whether $a_p$ is a quadratic residue mod $m$. If it isn't, then you can eliminate that $a_p$ as a non-square. Do this for a few thousand prime (or prime power) values of $m$ and you have a sieve which only lets through those $a_p$ that are squares, plus a vanishingly small number of "false positives". (There are going to be some $m$ for which all the values of $a_p$ are quadratic residues, but this won't happen if 10 is a primitive root mod $m$, and such $m$ occur at a relatively constant rate.) This could be implemented in Python (or whatever) fairly easily to eliminate all the non-square $a_p$ up to some value, say $p \le 10000$. Keep in mind that $a_{10000}$ would have around 10000 digits, but there's no need for multiprecision arithmetic to carry this out. However, all you would be doing is establishing a lower bound on the next square $a_p$; you wouldn't actually be proving there are none. (That's assuming the sieve didn't produce an actual square $a_p$ with $p \le 10000$.) It shouldn't be hard to use a probabilistic argument to show that the "expected" number of squares is finite, but this wouldn't be a proof, rather an indication that it's unlikely that there will be additional squares above a given bound. In any case, I couldn't think of anything that would answer the original question better than a somewhat wishy-washy "probably not". --RDBury (talk) 13:10, 29 July 2024 (UTC)[reply]
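A rough Python sketch of such a sieve (illustrative code, not RDBury's actual implementation; Euler's criterion supplies the quadratic-residue test, and prime-power moduli as well as the primitive-root refinement are left out):

    from math import isqrt

    P_MAX = 10000  # test a_p for p up to this bound

    def primes_up_to(limit):
        # simple sieve of Eratosthenes
        s = bytearray([1]) * (limit + 1)
        s[0:2] = b"\x00\x00"
        for i in range(2, isqrt(limit) + 1):
            if s[i]:
                s[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
        return [i for i in range(limit + 1) if s[i]]

    def is_qr(a, m):
        # Euler's criterion: for odd prime m, a is a QR mod m iff a^((m-1)/2) ≡ 1
        return a == 0 or pow(a, (m - 1) // 2, m) == 1

    candidates = set(range(P_MAX + 1))
    for m in primes_up_to(5000):
        if m == 2:
            continue
        vals, a = [], 1 % m           # a_0 mod m
        for p in range(P_MAX + 1):
            vals.append(a)
            a = (10 * a + 39) % m     # recurrence carried out mod m only
        candidates = {p for p in candidates if is_qr(vals[p], m)}

    print(sorted(candidates))  # should leave just 0, 1, 2, 3 (the known squares)

As described above, this only establishes that there is no further square $a_p$ with $p \le 10000$ (barring false positives, which a direct isqrt check on the survivors would catch); it proves nothing beyond that bound.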