Talk:Clearview AI/GA1
GA Review
Nominator: Czarking0 (talk · contribs) 04:50, 30 March 2024 (UTC)
Reviewer: Mike Christie (talk · contribs) 22:17, 6 June 2024 (UTC)
I'll review this. Mike Christie (talk - contribs - library) 22:17, 6 June 2024 (UTC)
Running Earwig finds the following:
Source: "Clearview has created more than 200 accounts for users at five Ukrainian government agencies, which have conducted more than 5,000 searches. Clearview has also translated its app into Ukrainian ... from three agencies in Ukraine, confirming that they had used the tool. It has identified dead soldiers and prisoners of war, as well as travelers in the country,..." Article: "Clearview had created over 200 accounts for users at five Ukrainian government agencies, which have conducted more than 5,000 searches, and that Clearview has also translated its app into Ukrainian. Ton-That provided emails from officials of three agencies in Ukraine, confirming that they had used the tool to identify dead soldiers and prisoners of war, as well as travelers in the country." See WP:CLOP; this needs to be rewritten in your own words.
- The new wording is better, but I think it is still identifiably a version of the original. How about "Ukrainian government agencies have used Clearview over 5,000 times, to identify dead soldiers, prisoners of war, and travelers"? The fact that Clearview created the accounts for them seems trivially obvious, and the "200 accounts" isn't as important in this context as the number of searches. Mike Christie (talk - contribs - library) 14:44, 9 June 2024 (UTC)
Will look at the sources next. Mike Christie (talk - contribs - library) 21:13, 7 June 2024 (UTC)
Sources:
Can we avoid the use of The Daily Dot? Per WP:RS/PS it's not a very good source. Here you're using it as one of three citations covering the same information; if the other sources cover the same ground I'd drop this one.
- Done
Similarly, the use of The Next Web is discouraged. This article seems to be just an opinion piece rehashing other sources, so not a great source regardless.
- Done. Kept the claim but have a much better source.
What makes cpomagazine.com a reliable source? The about page just says it is corporately owned, which is a good start, but does it have editorial control over what it publishes or is it a one-person operation? If we can't find that out, does it have a good reputation or get cited by other reliable sources?
- Done. Could not find material to establish RS. Kept the text for now. Will rework in another bullet point.
- What makes publicola.com a reliable source? Per the about page it seems to be a one-person operation.
- WIP I do not believe it is a one-person operation as the contact for the publication is not the author of the source. Working on establishing RS.
- I don't know if I should consider a reddit thread but this makes me think not RS
The New York Post is not a reliable source.
- Done
You cite Fight for the Future for a comment they made; it's a reliable source for that, but is the "shady surveillance vendor" comment notable enough to include in this article if nobody else mentions it?
- Done. I think this is semi-notable as some other sources mention them. However, I grouped them under "other commentators" since the remarks from the senator are much more notable.
What makes Biometricupdate.com a reliable source? You cite them as one of four sources saying some information "was not received positively", but I think rather than using the passive we need to say who did not receive it positively, and for that we need reliable sources. Biometricupdate.com's own reaction to the news is not noteworthy but if they're reliable then their report of others' reactions might be.
- Done. The sourced articles showed some of the sources they used for their reporting. I did not make any determination on RS, but I did rework the article a bit so they are no longer a source.
How confident can we be that the document in documentcloud.org is authentic? What guarantees that? Is the claimed uploader authenticated?
- Done. The document was contributed by BuzzFeed and is linked to in this BuzzFeed News article.
- FYI, the mississauga.com link is dead. This is not a problem for GA, but you may want to find an archived link for it.
- Done
- Looks like this is still an issue? I'm referring to FN 86. Mike Christie (talk - contribs - library) 15:23, 9 June 2024 (UTC)
- Done
FN 91 is described in the citation as 980 CFPL which is a name I can't find at the linked page; it seems to be a Global News page.
- 980 CFPL is in the top tagline next to the author's name.
techdirt.com appears to be a group blog, and hence not reliable.
- Done
What makes noyb.eu a reliable source?
- It is a source for a POV claim about the views of that organization. I think this is a question of notability of the POV claim, not reliability? For notability, this is difficult for me to say. As far as privacy groups go, are there really any truly notable ones? On the other hand, I think it would be a disservice to the reader to not include any remarks from self-described privacy advocates, as some of them could be interpreted as notable to a reader in the more narrow privacy context. This group has gotten some attention and has their own WP article, though I do not think it is very good. At least some think they are notable.
- I think that's good enough. Mike Christie (talk - contribs - library) 15:23, 9 June 2024 (UTC)
Once these are resolved I'll do a spotcheck. Mike Christie (talk - contribs - library) 21:52, 7 June 2024 (UTC)
Thanks Mike. Tracking progress in line, I hope you are ok with that Czarking0 (talk) 23:29, 7 June 2024 (UTC)
- Sure; I have this watchlisted and will keep an eye. Will be intermittently busy the next few days but should be able to get back here whenever you're ready for me to look at the article again. Mike Christie (talk - contribs - library) 02:06, 8 June 2024 (UTC)
- Ok Mike, I appreciate your insight. Your comments have made me understand several flaws with this article. I have responded to all your comments. If there are any changes that are unsatisfactory just let me know.
- If you think it should just be failed here I would not be offended. However, if you want to keep the review going I will continue to work on it. Czarking0 (talk) 18:53, 8 June 2024 (UTC)
- No need to think about failing it; questions about sources are very common in GA reviews. It'll be some time tomorrow before I can go through your replies but I'm sure the reliability issues can be sorted out, if there are any left over after the changes you've made. Mike Christie (talk - contribs - library) 21:47, 8 June 2024 (UTC)
- I've struck most points; a couple of items left. Will read through and leave further comments next. Mike Christie (talk - contribs - library) 15:23, 9 June 2024 (UTC)
More comments
The first thing that strikes me about the article is the lists under the "Use" section. Compliance with the guidelines for list incorporation is one of the GA criteria; I think these lists are a problem. I see some of the entries on the list have no sources, which is an issue in itself, but overall I think it would be better to identify high profile examples of the different categories and either present the lists in prose, or make the lists much shorter -- no more than three or four in each of the three categories. E.g. "Clearview has been trialed by many law enforcement agencies, including the Royal Canadian Mounted Police and the New Zealand police, and was purchased by others, including the Swedish Police and the Metropolitan Police in London". That's assuming "many" can be sourced directly. I don't think the reader needs the full list unless we have evidence that the list is itself notable -- that is, that other sources find the list itself, rather than just some of the organizations on the list, to be notable.
- I think you have a great comment here. I did some research into this and I do think the list is sufficiently notable that it should be on WP. However, I think we should consider making a separate list-class article and then giving a few examples here. Another troubling point on this is that I believe BuzzFeed got exclusive rights to publish the list as obtained through "hacking", and I do not think they have actually published the list as a whole. This would make it more difficult to verify an article on the list itself since it seems that only the notable elements are published. I do think readers would like to check if institutions they care about are on the list. I also think readers who just care about what kind of customers the company has had could use a summary like "Their customer list includes X American police departments, Y federal law enforcement organizations, Z universities, and W international police departments."
Czarking0 (talk) 19:41, 9 June 2024 (UTC)
- The lead is a little short for an article of this length. I think it could be about twice as long as it is. The relevant guideline is WP:LEADLENGTH. A related point is that the history section starts "Clearview operated in near secrecy until ...": this assumes the reader already knows what Clearview is -- it's written as if the lead paragraph is the first paragraph of the body of the article. See WP:LEAD for general guidelines around the lead, but the basic idea is that the body of the article should be complete without the lead, and the lead should be a summary of the body. Here that's not the case.
- will get to this after the other stuff. I like working on the lead at the end.
I'm going to hold off reading through and making more detailed comments until that's addressed; in the meantime I'll do the spotchecks:
FN 26 cites "Clearview's attorney, Tor Ekeland stated the flaw was corrected". Verified. I would suggest changing "the flaw" to something like "the flaw in their security", to be clearer.
- Done
FN 49 cites "the company has demonstrated its search can identify people while they wear a protective mask". I think this could be rephrased -- this source only says Ton-That successfully identified one person who had their nose and mouth covered, not necessarily with a mask. It's Gross who generalizes this. If there's another source for the more general statement I would use that, otherwise perhaps "Hill found that Clearview's search could identify him even when his nose and mouth were covered, as they would be with a COVID mask".
- Done
FN 38 cites "In October 2021 Clearview submitted its algorithm to one of two facial recognition accuracy tests conducted by the National Institute of Standards and Technology (NIST) every few months. Clearview ranked amongst the top 10 of 300 facial recognition algorithms in a test to determine accuracy in matching two different photos of the same person, instead of the test for matching an unknown face to a 10 billion image database, which more-closely matches the algorithm's intended purpose. This was the sole third-party test of the software at the time." Verified; some of the wording is pretty close to the source but the relevant phrases are hard to reword so I think it's OK. I initially misread the last sentence as saying there were no other ways to test the software, but I see it means that Clearview had not previously been tested by a third-party. Perhaps rephrase a little? And it might be worth mentioning that NIST had another more suitable test which Clearview did not submit to.
- Done
- FN 20 cites "According to the BBC in 2023, few cases of mistaken identity using Clearview facial recognition have been documented, but "the lack of data and transparency around police use means the true figure is likely far higher." Ton-That claims the technology has approximately 100% accuracy, and attributes mistakes to potential poor policing practices. Ton-That's claimed accuracy level is based on mugshots and would be affected by the quality of the image uploaded." Verified.
- FN 97 cites "In another Florida case, Clearview's technology was used by defense attorneys to successfully locate a witness, resulting in the dismissal of vehicular homicide charges against the defendant." Verified.
One minor rewording needed out of five checks; this is a pass for the spotcheck once that issue is fixed. Mike Christie (talk - contribs - library) 16:15, 9 June 2024 (UTC)
More comments:
Are there any images that could be used -- of any of the people named, perhaps? GA doesn't require images: the criterion is that the article be "illustrated, if possible", but I don't see any justification for fair use claims, so there may be nothing usable.
- The material in the infobox is unsourced. This is OK if it's sourced in the article, but the founding date is unsourced, for example.
- "the company maintained a low profile until late 2019, until its usage by law enforcement was first reported": it didn't become well-known till then, certainly, but is it accurate to say that it "maintained" a low profile? That would imply they deliberately avoided publicity, which might be true, of course.
- Why is it relevant where Ton-That and Schwartz met?
- 'Noted far-right "troll king"' is not a neutral description.
- I think the history section could be organized a little more. Currently it's a series of paragraphs that range across several topics: the corporate history of the company; use by clients; lawsuits/cease-and-desist orders; and a couple of other things such as security. I would suggest pulling the purely "corporate history" sentences together under history, and grouping the other material under one or two more appropriate headings. You already have a "legal challenges" section; I don't think we need to repeat that material here, so perhaps just moving the non-history material would work. And the material in this section jumps around: for example, "The settlement with the American Civil Liberties Union" is mentioned as if we already know what lawsuit this is, but it hasn't been mentioned before.
I'm going to pause the review here, because I think addressing the structure will change the article quite a bit, and I'd like to wait till that's done before doing a full pass through. Mike Christie (talk - contribs - library) 17:28, 10 June 2024 (UTC)
- I looked and was unable to find images of the founders that can be used. I am not sure what else would make a good image. I think potentially a graph from the NIST study?
- Working on the other structural stuff. Czarking0 (talk) 21:02, 10 June 2024 (UTC)
- It is accurate to say the company maintained a low profile as there are many documented cases of how they avoided journalism. Here is a quote from FN1
Clearview has shrouded itself in secrecy, avoiding debate about its boundary-pushing technology. When I began looking into the company in November, its website was a bare page showing a nonexistent Manhattan address as its place of business. The company’s one employee listed on LinkedIn, a sales manager named “John Good,” turned out to be Mr. Ton-That, using a fake name. For a month, people affiliated with the company would not return my emails or phone calls. While the company was dodging me, it was also monitoring me. At my request, a number of police officers had run my photo through the Clearview app. They soon received phone calls from company representatives asking if they were talking to the media — a sign that Clearview has the ability and, in this case, the appetite to monitor whom law enforcement is searching for.
- Where they met is relevant because it helps explain the connection between one of the best facial recognition companies and the right wing. If they had met at a restaurant that would be less notable. The notability is further established by publication of this fact in NYT per FN1.
- Agreed, is "right wing troll" more neutral? I had to read his WP page to know who he was so I am open to other interpretations.
Czarking0 (talk) 21:13, 10 June 2024 (UTC)
I've struck some points as your answers above address them; feel free to post after each point if you like (I think it's easier to follow the individual answers that way). FYI, the Wikipedia indenting syntax is pretty opaque, but there's a simple rule that helps: copy whatever the last indent was (e.g. "*" or "*:" or whatever) and then add a ":" for indent and a "*" for an indented bullet. It's worth getting right per WP:INDENTMIX because otherwise it becomes a mess for non-sighted readers who use screen readers, which don't handle mixed-up indents very well. So if you want to reply to a bullet point of mine, with a "*", you'd put "*:" to reply with an indent but no bullet, and "**" to reply with an indented bullet. Then I might reply to that with "**:".
Re your last point, I think we can source "right-wing" easily enough, but "troll" is a POV term that we can't use per WP:NPOV (for which a good summary is that it should be impossible for a reader of the article to tell where the sympathies of the writers of the article lie). We need to be accurate, though we don't have to be complimentary if the facts aren't complimentary. I don't know this person so I don't know what the right description is, but something like "Right-wing blogger" would be fine. If we need to emphasize that they deliberately post things with the intention of causing trouble, we need to find a source that states that factually and cite that in support of the description. Mike Christie (talk - contribs - library) 22:37, 10 June 2024 (UTC)
- From his page: "Johnson is often described as an internet troll and has been repeatedly involved in the proliferation and spread of multiple fake news stories." This has three sources which I believe are reliable. So maybe this is a matter of including those in this article? Czarking0 (talk) 23:29, 10 June 2024 (UTC)
- I think there's a difference between "has been described as a troll" and "is a troll"; one is a factual description, and the other is an opinion. I think it would be better to say he is known for spreading fake news stories, and leave the word "troll" out of it. But why do we need to even mention Johnson as a customer? What does it tell us about Clearview? Is there some evidence of collusion or political leanings on Clearview's side, that they gave him an account? At the moment the sentence just says "Hey, look, this troll had an account", which feels like tainting by association. Mike Christie (talk - contribs - library) 00:35, 11 June 2024 (UTC)
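The indenting rule described earlier in this thread (copy the previous comment's prefix, then append ":" for a plain indent or "*" for an indented bullet) can be sketched in a few lines of Python. This is only an illustration of the rule as stated above; the function name `reply_prefix` and its `bullet` parameter are hypothetical and not part of any Wikipedia tool.

```python
def reply_prefix(previous: str, bullet: bool = False) -> str:
    """Return the wikitext prefix for a reply to a comment.

    `previous` is the indent prefix of the comment being answered,
    e.g. "*" or "*:". The rule is to copy it unchanged and append
    one character: "*" for an indented bullet, ":" for a plain indent.
    """
    return previous + ("*" if bullet else ":")


# Replying to a "*" bullet with an indent but no bullet.
print(reply_prefix("*"))
# Replying to the same bullet with an indented bullet.
print(reply_prefix("*", bullet=True))
# A further plain reply to that indented bullet.
print(reply_prefix("**"))
```

Copying the earlier prefix unchanged is the part that matters for screen readers, since it keeps the list nesting consistent from one reply to the next.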
Ok I believe that the history, marketing, and legal challenges sections are now significantly improved. I am most curious if you have other concerns with those sections?
If not I will move on to the list. I really do think it is notable; however, as you pointed out there are quite a few lines that are not sourced. I can go confirm/remove all those and then we will see where we stand? Czarking0 (talk) 06:02, 12 June 2024 (UTC)
- I hope to find time to read through again this evening and will comment again then. Re the list, I would recommend making a separate List of Clearview AI users if you think it's notable. I'm doubtful: I can see why it *might* be notable, but the fact that so many entries are unsourced and will have to be sourced individually implies that the list as a whole is not treated as a single reportable entry by most sources. I do think it should be trimmed to just the sort of prose paragraph I gave as an example above. You can save it on the talk page if you want to keep it around while deciding whether to make a separate list article of it. Mike Christie (talk - contribs - library) 11:29, 12 June 2024 (UTC)
- Did you see my comment that the list was exclusively obtained by BuzzFeed and they have decided not to publish it in its entirety? To me this seems like a journalistic strategy rather than anything about the notability of the list as a whole. I could be wrong though. Czarking0 (talk) 22:21, 12 June 2024 (UTC)
- I did. I also saw your comment about readers wanting to check if a particular organization uses the software. I think reasonable people can disagree on this one, but at the moment I think the article would be better without the list. I'd be OK with a shorter list of maybe half a dozen of the most prominent users, perhaps in addition to the short paragraph approach I suggested above. Mike Christie (talk - contribs - library) 23:06, 12 June 2024 (UTC)
- That seems like a good middle ground. I will work towards that. Czarking0 (talk) 03:31, 13 June 2024 (UTC)
- Ok everything in the list is sourced and it is now much shorter. Czarking0 (talk) 17:30, 13 June 2024 (UTC)
Another read through
- "It maintained this secrecy by exerting significant influence on what information can reported on. For example, they have called police officers to ask them why they were communicating with journalists and the founders tried to erase all their social media presence." The support for this seems to be the quote given in FN 14: "I see you have a lot of photos on the internet you should be in the app but you're not here... A couple of minutes later he said he got a call from someone who worked for Clearview AI and they wanted to know why he'd been running my photo." I don't think this works: this doesn't say they successfully exerted influence -- it just gives a single example of a call they made. The BuzzFeed News article describes them as unresponsive to some press enquiries and with some deleted history, but that's not the same thing.
- Edited the claim to better reflect sources.
- What's the source for "discouraging users from talking to the press"? The NYT article says they called the police departments who ran the journalist's photo, and the quote from the audio is similar; those certainly indicate that Clearview wanted to know about media interest but there doesn't seem to be anything saying they told users to avoid talking to the media. Am I missing something elsewhere in the sources? Mike Christie (talk - contribs - library) 11:12, 15 June 2024 (UTC)
- No, you have it right. I agree I went too far here in saying they discouraged users from talking to the press. The fundamental point I am trying to get across is that the company was secretive. This is mentioned in nearly every source when they introduce the company. On the other hand, it is hard for me to point to a fact that demonstrates how they are secretive. I am hesitant to ignore the number of sources that call them secretive since these reliable sources are probably better able to judge that than I am, even if they don't publish an analysis of what makes them secretive. Maybe the middle ground here is to leave it at "publishing fake information about the company's location and employees", which is verifiable? Czarking0 (talk) 16:13, 16 June 2024 (UTC)
I suggest moving the definition of what the software does to the first paragraph of the body.
- Done
- I don't think this is done -- it's in the lead, but the first para of the body needs to say it since the lead is supposed to be a summary of the body. Mike Christie (talk - contribs - library) 11:12, 15 June 2024 (UTC)
- By the first paragraph of the body, do you mean the history section? I feel like it does not really fit there. I moved it to the beginning of the usage section. Is there potentially value in putting usage above history? I don't know that the average reader cares so much about the corporate history so I could go either way on that. Czarking0 (talk) 15:50, 15 June 2024 (UTC)
- After thinking about it some more I think this is a stylistic choice so I'm going to strike this point. I do think it's good to give the reader the key information early in the body. Mike Christie (talk - contribs - library) 18:39, 15 June 2024 (UTC)
I've put some of the corporate history sentences together in a shortened "History" section, and put the remaining material in a "Usage" section -- let me know if you think that works. It seemed easier to do that than to try to explain which paras I thought went together.
- Done
"demonstrated Clearview's expansive, multi-year collaboration with the NYPD. These records demonstrated, contrary to past NYPD denials, that Clearview provided accounts ...": suggest "demonstrated that Clearview had collaborated with the NYPD for years, contrary to past NYPD denials. Clearview provided accounts ...".
- Done
The first paragraph of "Marketing efforts and pushback" covers the NYPD; this was already discussed a couple of paragraphs earlier. Can we combine the discussions into one paragraph?
- Done. Technically not one paragraph, but I think I hit the spirit of your comment.
"The company markets directly to police officers by encouraging them to "run wild" by searching for family, celebrities, and suspects": this was an email directly to an officer in Green Bay in 2019 -- they probably did send it to multiple recipients, but we don't know that, so we can't phrase this as a general statement. We can give the quote as an example of how they marketed themselves, but I think it should be clear in the body of the article (rather than by following the citation) that this is taken from an email to one of their clients (rather than an exhortation in a brochure or posted on their website, for example).
- Done. Changed this; I think it is better now?
"Clearview had claimed that its app played a role in a New Jersey police sting, which Grewal confirmed had been used to identify one of the child predators": why "one of the child predators"? We haven't mentioned any child predators before. And I'm not clear from this whether Grewal was confirming they did use Clearview in the sting, or just confirming that the sting identified a child predator.
- Done
Why do we mention Jessica Medeiros Garrison? It seems from the citation that Clearview is owned or part-owned by MDM27 Holdings; if we have a good citation for that it can go in the history section. But Garrison's name doesn't help the reader at all unless we have more information about her or MDM27 or there's a link we can add.
- Done
"Documents from Clearview have claimed 98.6% or 100% accuracy using a 99.6% confidence interval." This doesn't make sense. A 99.6% confidence interval means that Clearview assert that the image will be correctly identified 99.6% of the time. "Interval" is not the right word anyway, since this is not across a range of a parameter -- when one says "x lies between 10 and 15, with 99.6% confidence", that's a confidence interval, because 10-15 is an interval. In this case we're just talking about a confidence level. But that's just a claim that Clearview make about accuracy, so it doesn't make sense to say they claim either 98.6 or 100% accuracy when claiming 99.6% confidence.
- This could get really nuanced, which makes me want to back up for a moment. Do you have a background in this? My master's is in statistical experiment design. FN61 shows that the 99.6% confidence interval is not the alpha for the accuracy; it is the alpha for the match of the input face to the result. The reported accuracy depends on the alpha for the match. Obviously the WP article as it currently stands is not communicating this. Overall I am somewhat opposed to quantitatively stating the accuracy in WP as a single number. That is really oversimplifying the system. However, getting detailed with the performance seems quite technical for WP and probably not what the typical reader of this article is looking for. I would prefer to report more of a summary along the lines of "it is one of the best in the world", given appropriate sourcing.
- As a side note I think you have some misunderstanding of CI. "x lies between 10 and 15, with 99.6% confidence" is not strictly true (depends on what "confidence" really means). The better summary is: x either lies in the interval or not with probability 1. Intervals constructed at a given alpha (99.6% in this case) contain the target parameter with alpha frequency. This is not the same as a specific CI containing a specific parameter.
- My degree is in pure mathematics, with a bit of post-graduate study; no, I'm certainly no expert in statistics and am happy to concede your points about the imprecision of my comments. What I was trying to get at was that the lay interpretation of a single % confidence number is "the odds are this % that this is correct". I may be wrong, but I doubt the sources are precise in the way you discuss. The article says "Documents from Clearview have claimed 98.6% or 100% accuracy using a 99.6% confidence interval." The source for the 99.6% figure appears to be this statement in the test document: "Unlike Amazon’s Rekognition, Clearview does not allow the user to set the confidence level, but instead is fixed at 99.6%." The 98.6% figure comes from the BuzzFeed News article which says "In marketing materials to Atlanta police, Clearview claimed that it could accurately find a match 98.6% of the time in a test of 1 million faces." I don't think we can combine these two statements as the article does: these seem to me like layperson statements about accuracy, not precise statistical assertions. Mike Christie (talk - contribs - library) 11:30, 15 June 2024 (UTC)
- I think you are right here. What do you think of my suggestion to shift to more qualitative claims? We could go with the claim in their marketing material, but I don't love using their own marketing for the reported accuracy. I could dig into the test document a bit more to see if there is a more reliable summary statistic that is useful to the layperson.
- I don't think it's worth the trouble; the numbers are probably all nonsense anyway. Can we say something like "At various times, Clearview have claimed 98.6%, 99.6%, and 100% accuracy"? The word "claim" avoids the implication this is anything more than marketing. Mike Christie (talk - contribs - library) 18:28, 15 June 2024 (UTC)
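The coverage reading of a confidence level given in the side note above (intervals built by the same procedure contain the true parameter at roughly the stated frequency) can be checked with a small simulation. The following is a minimal sketch, assuming normally distributed data with a known standard deviation. The function name `coverage` and all of its parameters are illustrative, and nothing here is taken from Clearview's documents or the test report.

```python
import random
from statistics import NormalDist, fmean


def coverage(level=0.95, true_mean=10.0, sigma=2.0, n=30, trials=2000, seed=0):
    """Estimate how often a z-interval for the mean contains the true mean.

    Each trial draws `n` normal observations, builds the interval
    sample_mean +/- z * sigma / sqrt(n), and records whether the
    true mean falls inside it. Assumes sigma is known.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample_mean = fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / trials


# The long-run fraction of intervals that cover the true mean should
# sit close to the nominal level, for any single level chosen.
print(coverage(level=0.95))
print(coverage(level=0.996))
```

Any one interval from a single trial either contains the true mean or it does not; the stated level describes the procedure over many repetitions, which is the distinction drawn in the side note.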
"Ton-That claims the technology has approximately 100% accuracy": another, different, accuracy claim. We should probably put dates on these claims, so the reader doesn't think they're simultaneous and hence inconsistent.
"The Android version contains references to": I think it should be clearer to the reader that the functionality described in the next couple of sentences was found by reading the code of the app, and the reporters weren't able to demonstrate working functionality.
- What is the implication here? That this may not actually be the code for the app? All those claims would still be relevant if it was an outdated version. Though I am not 100% convinced of notability.
- I think it's notable enough to include -- it shows intention on Clearview's part to do these things, and a reader would certainly be interested in that. It doesn't prove they successfully implemented these functions to the point that they worked, or that any user ever used them -- it's common to include software in a released product that is draft or inaccessible, and can't be accessed -- for example in a library of functions. How about making it "... an examination of the code for the Android version revealed references to ..."? That would make it clear to the reader that the source never saw it working in practice. Mike Christie (talk - contribs - library) 11:40, 15 June 2024 (UTC)
- Done
"Clearview also operates a secondary business, Insight Camera": we say "operates", but should it be "operated"? The website is no longer up.
- Some additional googling points to the claim that Insight Camera's website was taken down after the press started asking about their connection. That seems speculative to me, but also points to the fact that taking down the website does not necessarily imply ceasing to operate the venture. Maybe this is notable in itself? For example, we say "operated" but note that the website was taken down and the company has not published anything since being contacted?
- Hmm, not sure what the best option is here. If you have a reliable source for the site being taken down after the publicity started, maybe say that? Mike Christie (talk - contribs - library) 11:40, 15 June 2024 (UTC)
- Done
- How about combining the "cases" section under the lists with the "usage" section I created? In either location -- maybe moving the "usage" stuff down to under the lists would make the most sense. That way the article organization would be corporate history, then the technology itself, then uses and marketing, and finally the legal challenges. That seems a logical sequence to me -- what do you think?
- agreed, will do Done
- The legal challenges section is very fragmented; I think this is because it sticks strictly to chronological order. See WP:PROSELINE for an essay giving advice about this sort of prose. Can we make it a bit more thematic? E.g. start with a para saying multiple states and organizations have sued Clearview, and give examples; then give details for any that seem important enough; then cover any other information such as fines and rulings (e.g. the EU's decision that their photo database was illegal). The mention of the particular lawyers they hire might go in the corporate history section, but if not then I'd put those separate mentions together -- e.g. "Clearview's lawyers have included Tor Ekeland, Paul Clement, and Floyd Abrams", and then give dates if available and relevant, and any quotes.
That's it for this pass. The lists look fine now. I think the main problem with the article initially was organization, which is why it's taking multiple passes for me to give you this feedback. It's getting there, though. Mike Christie (talk - contribs - library) 22:05, 13 June 2024 (UTC)
- This sounds good. On my todo list. Made some progress still WIP. Czarking0 (talk) 17:00, 15 June 2024 (UTC)
Czarking0 (talk) 02:46, 15 June 2024 (UTC)
- Looks like you're still working on a couple of points; I've gone through and struck or replied to the points you've dealt with. Mike Christie (talk - contribs - library) 11:43, 15 June 2024 (UTC)
- Replied to your first point about discouraging users. I worked on the legal history a bit more and I think it is better now. I am not sure that it is sufficient. Can I get some more feedback there? I believe that covers all the points made here.
- FYI there is some breaking news about another settlement that is notable; however, I believe the story is not sufficiently settled to include it at this time. Czarking0 (talk) 16:16, 16 June 2024 (UTC)
Final pass
You've done so much to improve the article that I'm not going to go through and strike the remaining points above; I'll just read through again and note any outstanding issues here. I did read your comments above and will include responses below.
Re secrecy, I think you have pretty good citations that say how secretive they are -- the NYT article is titled "The Secretive Company That ..." after all, and we quote that title in this article. I think we can drop "discouraging users from talking to the press" without changing the message to the reader that this was a company that did not want media scrutiny.
"Clearview came under renewed scrutiny for enabling officers to conduct large numbers of searches without formal oversight or approval." This is now uncited; I suspect it got detached from its citation when you were moving text around.
"What Clearview does is mass surveillance and it is illegal. It is completely unacceptable for millions of people who will never be implicated in any crime to find themselves continually in a police lineup." This seems to be a quote but it's uncited. Is it from Therrien? If so I'd tack it on to the previous sentence, in quotes, rather than indenting it as you have here: "... hundreds of illegal searches using Clearview AI, and said "What Clearview does is mass surveillance ..." and then add whatever the relevant citation is.
That's it for this pass. I read through the legal section again; a couple of bits of info have been moved elsewhere and I think this is OK now -- it's still a bit fragmented but that's just the nature of the information that has to be conveyed. Mike Christie (talk - contribs - library) 18:58, 16 June 2024 (UTC)
- Ok I addressed these points. I appreciate the attention. Czarking0 (talk) 03:53, 17 June 2024 (UTC)
- Fixes look good. This is GA quality now, so I'm passing it. Congratulations, and thank you for being patient with my nitpicking. I also want to say that the reason I picked this article to review was that I saw you'd done quite a few reviews yourself -- I like to prioritize reviewing nominations by editors who are also contributing to the reviewing side of GA, so thank you for those reviews. Mike Christie (talk - contribs - library) 09:33, 17 June 2024 (UTC)
- Thanks Mike, this is my first GA so I am very happy this morning. I'll certainly be doing more reviews in the future! Czarking0 (talk) 14:29, 17 June 2024 (UTC)