Wikipedia:Reference desk/Archives/Mathematics/2024 July 27



July 27


Data estimation with excessive log functions


In health care, I noticed that many estimation algorithms make extensive use of log functions. For example, the ASCVD 10-year risk estimation from the "2013 ACC/AHA Guideline on the Assessment of Cardiovascular Risk" sums up a coefficient times the log of age, a coefficient times the log of total cholesterol, a coefficient times the log of HDL, and so on. It is a set of coefficients, each multiplied by the log of an attribute. Is this type of function or algorithm the result of a specific type of data modeling? It looks to me like they took a sample data set and correlated the log of each attribute, one at a time, to the outcome, producing a coefficient that represents how correlated the log of that attribute is in the sample set. But I'm just guessing, and I'd prefer to know how this type of function is actually produced. 75.136.148.8 (talk) 10:54, 27 July 2024 (UTC)[reply]

I'm not familiar with how this estimator was devised, but model building is an art, especially in cases where the data is noisy and the causal processes are poorly understood. Social scientists routinely use purely linear regression models because that is what they were taught as students, it is the default model in R, which many of them use, and everyone else in their field does the same. When a variable (independent or dependent) can only assume positive values, it cannot have a normal distribution, which is an indication that pure linear regression may not be the best approach when devising an estimator. It is then good practice to use a data transformation that makes the observed distribution more normal. I don't know if this is why they did what they did. Another possibility is that they simply computed the correlation coefficients and saw that they were higher on a logarithmic scale.  --Lambiam 11:52, 27 July 2024 (UTC)[reply]
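The normalization point is easy to see numerically. A minimal Python sketch (the lognormal sample is made up for illustration; this is not how the guideline's authors worked):

    import numpy as np

    def skewness(a):
        # standardized third central moment; near 0 for a symmetric distribution
        z = (a - a.mean()) / a.std()
        return (z ** 3).mean()

    rng = np.random.default_rng(0)
    x = rng.lognormal(mean=3.0, sigma=0.5, size=10_000)  # positive-only, right-skewed
    print(skewness(x))           # around 1.8: strongly skewed
    print(skewness(np.log(x)))   # near 0: the log transform restores symmetry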
It is pretty common, and somewhat sensibly motivated, to use the log transformation when the variables of interest are all strictly positive (e.g. weight, height, waist size). If you do linear regression of the log of the positive result variable in terms of the logs of the input variables, the coefficients are interpretable as the exponents in a multivariate power-law model, which is nice, because the coefficients are then interpretable the same way independently of the measurement units. On the other hand, for any specific problem there are likely better transformations than the log, and even the most suitable and well-motivated data transformation might be seen as an attempt to "fudge the data" compared to just fitting a linear model. Dicklyon (talk) 04:24, 8 August 2024 (UTC)[reply]
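A minimal sketch of such a log-log fit in Python (synthetic data; the variable names and the ground-truth exponents are invented for illustration, not taken from any guideline):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 500
    height = rng.uniform(1.5, 2.0, n)   # metres
    waist = rng.uniform(0.6, 1.2, n)    # metres
    # synthetic power law: weight = 13 * height^2 * waist^1, with multiplicative noise
    weight = 13.0 * height ** 2 * waist * rng.lognormal(0.0, 0.05, n)

    # ordinary least squares on the logs: log(weight) ~ b0 + b1*log(height) + b2*log(waist)
    X = np.column_stack([np.ones(n), np.log(height), np.log(waist)])
    coef, *_ = np.linalg.lstsq(X, np.log(weight), rcond=None)
    print(coef)  # about [log 13, 2.0, 1.0]; the slopes are the power-law exponents

Rescaling an input (say, height to centimetres) changes only the intercept; the fitted exponents are unchanged, which is the unit-independence noted above.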

Are there other triangular numbers with all digits 6?


6, 66 and 666 are all triangular numbers; are there other triangular numbers whose digits are all 6? 218.187.67.217 (talk) 16:42, 27 July 2024 (UTC)[reply]

These correspond to solutions of the Diophantine equation
$$\frac{n(n+1)}{2} = \frac{2}{3}(10^p - 1),$$
where $p \ge 1$ counts the sixes, or, after clearing denominators,
$$4 \cdot 10^p = 3n^2 + 3n + 4.$$
For each solution, the number $\frac{n(n+1)}{2} = \underbrace{66\cdots6}_{p}$ is an all-6 triangular number.
I don't expect any further solutions, but neither do I see an argument exhibiting that they cannot exist. The weaker requirement $\frac{n(n+1)}{2} \equiv \frac{2}{3}(10^p - 1) \pmod{10^p}$ has four solutions for $n$ modulo $2 \cdot 10^p$ for each given value of $p,$ corresponding to four possibilities for the final digits of $n.$ For example, for $p = 1$ they are $n \equiv 3, 8, 11, 16 \pmod{20}.$ The polynomial in $n$ on the rhs of the Diophantine equation is irreducible, having negative discriminant. It seems that considerations based on modular arithmetic are not going to give further help.  --Lambiam 19:59, 27 July 2024 (UTC)[reply]
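The four-residue count is easy to confirm by brute force; a minimal Python check (illustrative only, not part of the argument above):

    # solutions n (mod 2*10^p) of n(n+1)/2 ≡ 66...6 (p sixes) (mod 10^p)
    for p in range(1, 5):
        m = 10 ** p
        target = 2 * (m - 1) // 3           # the p-digit repdigit 66...6
        sols = [n for n in range(2 * m) if (n * (n + 1) // 2) % m == target]
        print(p, sols)  # four residues for each p, e.g. [3, 8, 11, 16] for p = 1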
The discriminant of the quadratic is $48 \cdot 10^p - 39$. This needs to be a perfect square for there to be a solution, so we need $48 \cdot 10^p - 39 = k^2$ for some integer $k$. Since the gaps between consecutive perfect squares near $48 \cdot 10^p$ grow like $10^{p/2}$, the "chance" that the discriminant lands exactly on a square decays geometrically with $p$, so I heuristically wouldn't expect more than a finite number of solutions to exist.--Jasper Deng (talk) 03:34, 28 July 2024 (UTC)[reply]
This gives yet another way of phrasing the problem. Define the recurrent sequence $(a_p)$ by:
$$a_0 = 1, \qquad a_{p+1} = 10\,a_p + 39.$$
It goes like this:
$$1,\ 49,\ 529,\ 5329,\ 53329,\ 533329,\ 5333329,\ \ldots$$
Note that $9\,a_p = 48 \cdot 10^p - 39,$ so the discriminant above is a perfect square exactly when $a_p$ is. The first four values, $1^2, 7^2, 23^2, 73^2,$ are squares. Will the sequence ever hit another square?  --Lambiam 10:05, 28 July 2024 (UTC)[reply]
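The first few thousand terms can be tested directly with exact integer arithmetic; a minimal Python sketch (an illustration, using math.isqrt for exact integer square roots):

    from math import isqrt

    a = 1  # a_0
    for p in range(1, 5001):
        a = 10 * a + 39          # a_{p+1} = 10*a_p + 39, kept as an exact integer
        r = isqrt(a)
        if r * r == a:
            print(p, r)          # only p = 1, 2, 3 (r = 7, 23, 73) show up in this range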
It turns out that because the square root $k$ of the discriminant is added to or subtracted from $-3$ and the result divided by $2a = 6$ in the quadratic formula, there are even more stringent restrictions: the numerator $k - 3$ has to be divisible by 6, so we must have $k \equiv 3 \pmod 6.$ Writing $k = 3j$ with $a_p = j^2,$ this says that $j$ must be odd, and thus $a_p \equiv 1 \pmod 8.$ That restriction alone would seem to greatly reduce the number of candidates (only every other perfect square, the square of an odd number, satisfies it).--Jasper Deng (talk) 04:49, 29 July 2024 (UTC)[reply]
If the sequence $(a_p)$ ever hits another square $a_p = j^2,$ its square root $j$ will satisfy this requirement. This can be seen as follows. For $p \ge 1,$ since $a_1 = 49 \equiv 1 \pmod 8$ and $a_p \equiv 1 \pmod 8$ implies $a_{p+1} = 10\,a_p + 39 \equiv 49 \equiv 1 \pmod 8,$ every $a_p$ with $p \ge 1$ is $\equiv 1 \pmod 8.$ The only residue classes for $j$ modulo $8$ that have $j^2 \equiv 1 \pmod 8$ are $1, 3, 5$ and $7;$ in all four cases, $j$ is odd.  --Lambiam 10:13, 29 July 2024 (UTC)[reply]
Right. For any modulus $m$ you can use the recursion to easily compute $a_p \bmod m$. It's a bit harder, but still possible, to then determine whether $a_p$ is a quadratic residue mod $m$. If it isn't, then you can eliminate that $a_p$ as a non-square. Do this for a few thousand prime (or prime power) values of $m$ and you have a sieve which only lets through those $a_p$ that are squares, plus a vanishingly small number of "false positives". (There are going to be some $m$ for which all the values of $a_p$ are quadratic residues, but this won't happen if 10 is a primitive root mod $m$, and such $m$ occur at a relatively constant rate.) This could be implemented in Python (or whatever) fairly easily to eliminate all the non-square $a_p$ up to some value, say $p \le 10000$. Keep in mind that $a_{10000}$ would have around 10000 digits, but there's no need for multiprecision arithmetic to carry this out. However, all you would be doing is establishing a lower bound on the next square $a_p$; you wouldn't actually be proving there are none. (That's assuming the sieve didn't produce an actual square $a_p$ with $p \le 10000$.) It shouldn't be hard to use a probabilistic argument to show that the "expected" number of squares is finite, but this wouldn't be a proof, rather an indication that it's unlikely that there will be additional squares above a given bound. In any case, I couldn't think of anything that would answer the original question better than a somewhat wishy-washy "probably not". --RDBury (talk) 13:10, 29 July 2024 (UTC)[reply]
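A rough Python sketch of such a sieve (illustrative code, not RDBury's actual implementation; Euler's criterion supplies the quadratic-residue test, and prime-power moduli as well as the primitive-root refinement are left out):

    from math import isqrt

    P_MAX = 10000  # test a_p for p up to this bound

    def primes_up_to(limit):
        # simple sieve of Eratosthenes
        s = bytearray([1]) * (limit + 1)
        s[0:2] = b"\x00\x00"
        for i in range(2, isqrt(limit) + 1):
            if s[i]:
                s[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
        return [i for i in range(limit + 1) if s[i]]

    def is_qr(a, m):
        # Euler's criterion: for odd prime m, a is a QR mod m iff a^((m-1)/2) ≡ 1
        return a == 0 or pow(a, (m - 1) // 2, m) == 1

    candidates = set(range(P_MAX + 1))
    for m in primes_up_to(5000):
        if m == 2:
            continue
        vals, a = [], 1 % m           # a_0 mod m
        for p in range(P_MAX + 1):
            vals.append(a)
            a = (10 * a + 39) % m     # recurrence carried out mod m only
        candidates = {p for p in candidates if is_qr(vals[p], m)}

    print(sorted(candidates))  # should leave just 0, 1, 2, 3 (the known squares)

As described above, this only establishes that there is no further square $a_p$ with $p \le 10000$ (barring false positives, which a direct isqrt check on the survivors would catch); it proves nothing beyond that bound.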