Talk:Internet Archive/Archive 3
This is an archive of past discussions about Internet Archive. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Moving image collection
Computer Chronicles https://archive.org/details/MainFram1984 Mainframes to Minis to Micros (2/12/1984) HiRes MPEG4 https://archive.org/download/MainFram1984/MainFram1984_edit.mp4
What a horrible encode you have made of this; I guess the rest of the moving image archive will be the same. You need to look again at the encoder settings and redo every transcode you have done so far.
Better picture quality, matching the source video, is needed. Here the source is the MPEG-2 video, and the MP4 transcode is 30-50% lower in quality than it. Think of using an intermediate AVI file between the decode and the encode:
MPEG-2 > lossless AVI > MP4 means less loss of picture quality, provided you have good settings for the final encode to H.264.
The audio is very poor, so bad that it is scratchy or maybe slurred: overall poor audio quality (AQ). AAC is part of the MPEG audio family and comes from the same stable as AC3, so the original MP2 should not suffer from being encoded to AAC; it should remain the same. You need to increase the bitrate by a large amount to equal the source audio quality.
mp2 > wav > aac or mp2 > aac
Remember, you need to set higher bitrates to maintain the original MP2 audio quality. The same goes for the video, along with many other obvious tweaks needed to the encoder settings. Seek help; you obviously need it. Better to fix all the MP4 re-encodes now than later.
At the moment, this MP4's video quality is 40% of the original MPEG-2 video; it should be at least 90-95%.
At the moment, this MP4's audio quality is 10% of the original MPEG-2 file's audio; it should be at least 90-99%.
You need to use better encoder settings than you use now to help preserve the video. Preservation is what the Internet Archive is about, right? So why are you making such bad mistakes in doing so? If you made this obvious, easily spotted mistake, how many others are you making?
Suggestion: if you really want to preserve the re-encoded video, forget FFmpeg's x264; it is poorly maintained or hopeless, and may be the reason your video conversion is horrible. Look at x264vfw or the x264 codec to encode with.
Suggestion for converting the original MPEG-2 to an H.264/AAC MP4: deinterlace by keeping the bottom field and deleting the top field, then resize the video to its current (MPEG-2) resolution of 720x480. This gets rid of the interlacing and makes the picture cleaner and sharper. It works for these videos from the original MPEG-2 file; I don't know whether the same applies to all Computer Chronicles episode videos. Use something like x264vfw Slow-none-high-4.1 with single-pass rate-factor-based (CRF) encoding at 18 (at minimum). Also preserve the audio quality: listen and judge whether it is the same and not worse (it is now far worse). Nothing is worse than bad picture quality (PQ) and audio quality (AQ).
Do keep encoding video to hi-res MP4, with better settings and bitrates for video and audio. Sure, we can download the original, but why should we need to when the re-encode could be as good as the original? I would download the originals if I had a better connection; until then, I won't bother with any of your video collections. Better to download quality than rubbish, which is what your MP4 videos are now.
Do update your main site to say, for a good while, that the videos have been re-encoded and are 90% or more better than they were originally, to let us know you have done this.
Further to this, it would be better if you listed the episodes for each year in a way that matches how we understand episode listings:
Series name - Season ## - Episode ## - Title
Examples:
- Computer Chronicles - S02 - E01 - Computers Run Amok
- Computer Chronicles - S02 - E02 - Computers Break Out
- Computer Chronicles - S02 - E03 - Computers Take Over
- Computer Chronicles - S03 - E01 - Computers Fightback Terminating Everyone
- Computer Chronicles - S03 - E02 - First Terminator Voices 'I Will Be Back'
Here is a list of all Computer Chronicles episodes, though I'm unsure how accurate it is or what other resources are there: http://stquantum.xtreemhost.com/cc/content/episodelisting.htm
Listing each episode like this, with the episode name included, makes it easier for people to navigate to what they want. As new videos are added, it is also easy for people to spot whether they want to see an episode or have already seen it. This naming scheme works with any programme and episode; do this for the file names too and it makes life as simple as ABC for everyone. — Preceding unsigned comment added by 78.150.253.167 (talk) 19:25, 22 March 2014 (UTC)
Banned in Russia
Since 24 October 2014, the Internet Archive (web.archive.org) has been banned by the Russian authorities. This should be added to the article. — Preceding unsigned comment added by 5.167.173.119 (talk) 09:56, 24 October 2014 (UTC)
Non-controversial sub-section removed from Controversies section
Removed sub-section:
Removal of Citizenfour Documentary
The Internet Archive removed the listing of a documentary about Edward Snowden, called CitizenFour "due to issues with the item's content." <ref>https://archive.org/details/LauraPoitrasCitizenfour</ref>
Reason:
1) The reference provided doesn't support the assertion that the removal is or was controversial.
2) The removal was not and is not controversial, except in one editor's head.
3) The Archive also doesn't host Hollywood movies or other copyrighted non-public content. What's your point here?
4) Anyone can have their content removed from the publicly facing archive by simply putting up a robots.txt, at any time, without warning or notification to anyone at all. That's how it works.
5) In this case, we really don't know whether the item removed was the documentary or some other content keyed to "LauraPoitrasCitizenfour" (the trailers are still up at IA). It doesn't matter: IA's TOU are posted and state when they remove stuff.
I think that trying to create a controversy where none exists takes the encyclopedia substantially backwards, not forwards. —Aladdin Sane (talk) 02:08, 26 March 2015 (UTC)
- Aladdin Sane, thanks for removing the content. It should have been removed because it is self-published. Wikipedia does not allow articles on organizations to cite publications by that organization, and in this case the source cited was only something self-published in a public place on that website. If third-party journalists write about something, then it can go here; otherwise it stays out. Blue Rasberry (talk) 15:07, 26 March 2015 (UTC)
Stub children
@Imaginatorium: If anyone wants to lend a hand, I've redirected the following stubs here: RECAP US Federal Court Documents (collection), Microfilm (collection), Universal access to all knowledge, and NASA Images; there's more at Internet Archive's Children's Library, American Libraries (collection), Canadian Libraries (collection), and US Government Documents. Thanks. fgnievinski (talk) 06:47, 15 September 2015 (UTC)
Internet Archive & Wayback Machine servers are s-l-o-w
I have 100/100 Mbps fiber optic service. The Internet Archive and the Wayback Machine are among the most frustratingly slow of all the web connections I make on a daily basis. Sometimes I also get a message from the WM that a webpage is not available, while I'm looking at it in another browser. It's as if IA servers (particularly on weekends) are operating on dial-up time. 100.32.106.189 (talk) 13:28, 30 January 2016 (UTC)
- The Archive operates on a shoestring budget, with a chronic deficit of manpower, and its charter prioritizes the preservation of information foremost, not so much making that information convenient to access (though it's slowly getting better at that). Its data clusters are designed for highly economical storage and the ability to retain data despite hardware failures. Expect access to remain slow. Performance is just not a priority for its extremely limited funds and engineer-hours. TTK (talk) 21:51, 15 April 2016 (UTC)
Please Update this article with these new figures
Hi, I'm the Director of Partnerships at IA. I noticed there are a lot of old facts and figures in this article. Here's a source with up-to-date information: https://archive.org/about/
For instance, as of 2017 we have 30 petabytes of data.
Some good secondary sources (that were requested) include: Medium: "Never Trust a Corporation to do a Library's Job": https://medium.com/message/never-trust-a-corporation-to-do-a-librarys-job-f58db4673351
The New Yorker--Jill Lepore's "The Cobweb: Can the Web be Archived?" http://www.newyorker.com/magazine/2015/01/26/cobweb
Thanks for helping to make this more accurate.
best, Wendy Hanamura — Preceding unsigned comment added by Whanamura (talk • contribs) 21:39, 15 April 2017 (UTC)
Robots.txt to be ignored
It's unclear at this time exactly when this will apply to sites other than government ones, but archive.org have announced in their blog that they are "looking to do this more broadly" https://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/
It may be worth mentioning this where the article currently gives a false sense of privacy in saying that robots.txt is obeyed. 51.6.114.17 (talk) 19:09, 24 April 2017 (UTC)
- Just read this; I'm not sure. Neilc314 (talk) 05:34, 23 May 2018 (UTC)
Gifcities
Is Gifcities notable enough to make a section about it? --Nutshinou (talk) 12:13, 3 September 2018 (UTC)
- As much as I love it, probably not. Isn't it enough to describe it as part of the Geocities archival (which is perhaps most relevant for ArchiveTeam in a way)? --Nemo 18:38, 3 September 2018 (UTC)
Welp, that's it.
After two days of it not being up and running on PC, it's safe to say that the Internet Archive is done for. We've officially lost the world's largest internet archive site. F. CappyKid64 (talk) 17:20, 2 January 2020 (UTC)
- I've been using it all day. The Archive is up. lethargilistic (talk) 17:24, 2 January 2020 (UTC)
- I've just learned that Microsoft Edge is the problem here. It works fine on Chrome. CappyKid64 (talk) 17:26, 2 January 2020 (UTC)
Links to in-copyright books hosted on archive.org
This discussion is off-topic for this talk page. This should be discussed on the talk page of the relevant policy/guideline or the village pump.
I'm starting to run across bibliographies on Wikipedia that contain links to archive.org to download a copy of a book. Nearly always, the books are new enough that they are still covered by copyright. Should these seemingly WP:COPYVIOEL links be allowed on Wikipedia? For example, seven of the books on Seth Godin#Bibliography include links where a copy can be downloaded from archive.org. All of this author's books are copyrighted. If there is an exception to Wikipedia's copyright policy that allows for links to in-copyright books hosted on archive.org, then I believe there should be a Wikipedia help page containing the supporting rationale for this exception, and that page would be linked to from articles such as Internet Archive, WP:C, WP:CP, etc. --Marc Kupper|talk 22:01, 16 January 2020 (UTC) Also see Open Library#Copyright violation accusations. --Marc Kupper|talk 22:01, 16 January 2020 (UTC)
National Emergency Library
On March 25, 2020, the Internet Archive launched the National Emergency Library, which is defined as "a collection of books that supports emergency remote teaching, research activities, independent scholarship, and intellectual stimulation while universities, schools, training centers, and libraries are closed."
This information is also reported by insider.com, a secondary source.
I think it is related to some previous topics opened on this talk page. It is the first initiative of its kind in the history of the Internet Archive. Its duration is also relevant, since it will remain in operation at least until 30 June, if the state of emergency is not extended a second time. Micheledisaveriosp (talk) 12:29, 26 March 2020 (UTC)
- Yes, I think we can wait a few days until the sources settle, but there are already sufficient sources, for instance NPR, Vice and various others [1] [2] [3] [4] [5] [6] [7]; also in other languages/countries (it, fr, ph, pl, de, my). Nemo 08:28, 29 March 2020 (UTC)
- There was a post on the Internet Archive's blog with many of those sources. Publishers' associations are accusing the National Emergency Library of piracy (sourced here), and this prevents the extension of similar initiatives to other websites of public interest such as DOAJ, which has the largest collection of high-quality, open access scientific papers in the world. While libraries, the only subscribers with paid access to the whole database, are closed because of the coronavirus, researchers and physicians, especially in Third World countries, are deprived of an important source for their studies and experimental therapies. A fully open access approach would suit their limited economic resources. But that would be another chapter of the saga. I think we can add the concerns about the National Emergency Library to the WP article only once the first likely legal claim is definitively resolved. Best regards. Micheledisaveriosp (talk) 08:50, 31 March 2020 (UTC)
- This talk page is not a forum to express your personal opinion on the topic. I'm not sure what your point was, but you seem to be saying that the information should not be added until there is a lawsuit; did I get that right? There is no need to report in the article every adjective anyone has used about this or another initiative of the Internet Archive (Smith said "gorgeous", Doe said "criminal", blablah); the article needs to stick to reliable sources and avoid fringe views. Nemo 12:47, 31 March 2020 (UTC)
- DOAJ may be affected by the same issue in the medium term, but we don't have a crystal ball. You got it right. As you said, there are a lot of reliable sources for the WP article. You are a more experienced WP user, and if you agree that we can add a concern now, then I think we can proceed. Micheledisaveriosp (talk) 20:21, 31 March 2020 (UTC)
- If there isn't significant reporting of said concerns, there's no need to mention them. The NPR article doesn't count because they were bullied into "balancing" their previous article. Nemo 20:24, 31 March 2020 (UTC)
- Masem, it's not a good-faith edit. The anonymous edit was a test of mine. The article's discussion page remained for at least a day without the latest edit visible, even though nothing in the page history showed that it had been put under review or rejected. It was vandalism, but in my experience this can also happen to edits by autoconfirmed users that add new portions of sourced text. Wikipedia has no censorship, but practice is far different. In the Italian Wikiquote, anyone can delete the discussions you have created, even those raising doubts about the reliability of a new source or asking whether a single quotation should be added to the article. Discussions involve only the site administrators. But this is not the matter of the current topic. I hope this thread can be deleted, since it produced neither an improvement to the WP article nor a broad discussion. WP is not a forum between me and a couple of other editors. Have a good journey on Wikipedia. Best regards. Micheledisaveriosp (talk) 00:07, 14 April 2020 (UTC)
Edit Request - National Emergency Library
I noticed two fact errors and a possible neutrality issue in the National Emergency Library section. Please excuse any formatting issues in the below requests.
This edit request by an editor with a conflict of interest was declined. The request was not specific enough.
Edit request part 1
Surf314 (talk) 20:05, 23 April 2020 (UTC)
- Just want to acknowledge that citing to the Archive's sources may be frowned upon, but the Wired article was indeed very factually inaccurate on this point. It wasn't just the misattributed reason for the opt-out; the same section went on to say that "If the Archive can’t, by default, treat its scan of your book as its own copy to loan, its collection will dwindle to almost nothing," which is, frankly, nonsense because the Archive serves public domain books independently of this. It's not a quality source on the matter. I would suggest either correcting the timeline by putting the opt-out mention with the initial rollout (and perhaps mentioning confusion about the issue) or at least changing the concluding sentence to say "the Archive provided an opt-out system for authors to use when they released the NEL." It's probably also relevant that the Internet Archive never required DMCA requests. They set up an email account. lethargilistic (talk) 21:40, 23 April 2020 (UTC)
Edit request part 2
Surf314 (talk) 20:05, 23 April 2020 (UTC)
Reply 02-MAY-2020
- The proposed text to be added to the article is missing. To expedite your request, it would help if you could provide the following items of information:
- Please state each specific desired change and accompanying reference in the form of verbatim statements which can then be added to the article (if approved) by the reviewer.
- The exact location where the desired claims are to be placed should be given.
- Exact, verbatim descriptions of any text and/or references to be removed should also be given.[1]
- Reasons should be provided for each change.[2]
- In the section of text below titled Sample edit request, the four required items are shown as an example:
Sample edit request
- Kindly open a new edit request at your earliest convenience when ready to proceed with all four items from your request. Thank you!
Regards, Spintendo 16:56, 2 May 2020 (UTC)
References
- ^ "Template:Request edit". Wikipedia. 30 December 2019.
Instructions for Submitters: Describe the requested changes in detail. This includes the exact proposed wording of the new material, the exact proposed location for it, and an explicit description of any wording to be removed, including removal for any substitution.
- ^ "Template:Request edit". Wikipedia. 30 December 2019.
Instructions for Submitters: If the rationale for a change is not obvious (particularly for proposed deletions), explain.
University presses copying from the Internet Archive?
I recently discovered that an e-book being sold by Cornell University Press, namely Induction and Hypothesis: A Study of the Logic of Confirmation by Stephen F. Barker (1957), was copied from one of the Internet Archive's (IA) digitized copies, namely Induction and Hypothesis: A Study of the Logic of Confirmation at the Internet Archive. The ebook that is being sold by Cornell University Press has the IA watermark and URL in it, and is exactly the same as the IA copy in all other respects, so it's obvious that it was copied from the IA. I found this to be quite curious and ironic: While some publishers are suing the IA for allegedly violating copyrights, at least one publisher copied one of the IA's PDFs and is selling it! This suggests to me that the relationship between university presses and the Internet Archive should be mentioned in this Wikipedia article.
There are a few posts on the IA blog that mention the relationship between it and university presses, including Cornell University Press, but unfortunately I haven't been able to find a source that mentions that IA-digitized books are being sold as e-books by the original publishers:
- Freeland, Chris (May 21, 2018). "Internet Archive awarded grant from Arcadia Fund to digitize university press collections". blog.archive.org. Retrieved 2020-06-27.
- Bailey, Lila (September 4, 2019). "MIT Press Embraces New Access Models to Fulfill Mission". blog.archive.org. Retrieved 2020-06-27.
- Freeland, Chris (April 27, 2020). "Forging a Cooperative Path Forward: University Presses & the National Emergency Library". blog.archive.org. Retrieved 2020-06-27.
That last blog post above even mentions Wikipedia as a justification for the IA's digitization of university press books: "University press books are evergreen, well-cited in Wikipedia, and are the foundations of much scholarship." Biogeographist (talk) 15:35, 27 June 2020 (UTC)
- For all purposes, that claim is original research - we cannot call out what may seem to be illegal or questionable activities like this. --Masem (t) 15:53, 27 June 2020 (UTC)
- I'm not saying it's illegal or questionable that Cornell University Press is copying their own book from the IA and selling it as an e-book; the blog posts cited above indicate that there is an explicit agreement between university presses and the IA (but the details are not fully explained in the blog posts). It's that relationship that I'm thinking about adding to the article. Here's a secondary source that mentions the general relationship between some university presses and the IA, for example, from the reputable Publishers Weekly: Green, Alex (December 1, 2019). "New Takes on Academic Publishing: Three university presses find new ways to keep up with a changing market". publishersweekly.com. Retrieved 2020-06-27.
Since she became director in 2015, there's little that Brand hasn't reenvisioned at the press. In 2017, the press partnered with the Internet Archive to make its deep backlist available for free at libraries, resurrecting books that had not seen the light of day in generations.
Biogeographist (talk) 16:18, 27 June 2020 (UTC)
- Oh, that part is completely reasonable, yes. That's fair to add, reuse of the IA by others. --Masem (t) 16:23, 27 June 2020 (UTC)
I have added information to the article about the IA and university presses. If anyone finds a reliable secondary source that has further information about what is going on with Cornell University Press independently selling/distributing an IA-digitized book, please mention it here, as I would like to point out in the article that two-way relationship with university presses (i.e., that university presses are benefiting from IA's digitization efforts independently of IA's book lending program). Biogeographist (talk) 17:40, 27 June 2020 (UTC)
- I think the Internet Archive has written in various places that they have programs where the university presses provide materials (in physical form) and copyright licenses (if necessary) in return for the ability to use the scans. Nemo 22:53, 27 June 2020 (UTC)
Unverifiable list of digitizing sponsors for books
The table of book-digitization sponsors that formerly appeared in the article has been pasted below because it is based on an apparently unverifiable source. If the table is created again, it should be created using a verifiable source and more up-to-date numbers. The text below is from the article. Biogeographist (talk) 18:21, 4 December 2020 (UTC)
As of December 2018, over 50 sponsors helped the Internet Archive provide over 5 million scanned books (text items). Of these, over 2 million were scanned by Internet Archive itself, funded either by itself or by MSN, the University of Toronto or the Internet Archive's founder's Kahle/Austin Foundation.[1]
The collections for scanning centers often include also digitisations sponsored by their partners, for instance the University of Toronto performed scans supported by other Canadian libraries.
References
- ^ a b "Internet Archive meta manager". Archived from the original on January 27, 2019. Retrieved December 20, 2018.[failed verification]
archive.org redirect
There's a redirect-confused that says "archive.org redirects here". But currently archive.org redirects to Wayback Machine. But again, it should redirect here because the Internet Archive is much more than just the websites archive. Wikipedians figure it out.--95.208.211.114 (talk) 12:49, 29 April 2021 (UTC)
- A drive-by IP edit changed the redirect about a week ago. I've restored it. Mindmatrix 15:25, 29 April 2021 (UTC)
clarify lede
The main subject of this article seems to be the 501(c)(3) nonprofit organization that runs the Internet Archive digital library and other products/services. The first two paragraphs of the article focus on both at once. Would it make sense to slightly rephrase the lede so that the organization is clearly distinguished from the products/services it produces? -- Oa01 (talk) 11:37, 16 August 2023 (UTC)
Userbox
I have created a userbox to help spread the word that Internet Archive is a useful website archiving service. {{User Internet Archive}} — Preceding unsigned comment added by Blargh29 (talk • contribs) 18:02, 20 July 2009 (UTC)
Status
Does anybody have some news about the current status since it lost the lawsuit? Mr.Lovecraft (talk) 09:31, 5 September 2023 (UTC)
- Thank you for that question.
- As I covered at the Internet Archive's annual update, at this year's Wikimania gathering, here is a blog post we have shared https://blog.archive.org/2023/08/17/what-the-hachette-v-internet-archive-decision-means-for-our-library/ Markjgraham hmb (talk) 18:33, 5 September 2023 (UTC)
- Yeah, I've read that already... But when I tried to borrow a book last Monday, it worked for at least one hour. So I was wondering whether it was just a technical error or indeed a consequence of that lawsuit... Mr.Lovecraft (talk) 09:29, 6 September 2023 (UTC)
- I believe that the lawsuit was just about the controlled digital lending of books and not everything that the Archive does. The Wayback Machine doesn't seem affected. Many of its collections don't seem affected. I have seen some speculation online that defeat could bankrupt the Archive, but the Internet Archive is not very open about its finances and it's hard to find a robust source on that. I don't see a reason to think that the lawsuit is any more a threat to its existence than the bad reputation that it has gained for hosting terrorist and neo-Nazi videos/books. Epa101 (talk) 21:30, 4 November 2023 (UTC)
I was "borrowing heavily" today for several bibliographies I'm working on. As of the past hour (approx 1:30 pm EDT) all searches are resulting in the "Borrow Unavailable" message. Meanwhile, no related news could be found on the web. Has the Archive's lending library shut down? Allreet (talk) 18:02, 10 September 2023 (UTC)
- Hi,
- The Internet Archive's library has not been shut down.
- Please read https://blog.archive.org/2023/08/17/what-the-hachette-v-internet-archive-decision-means-for-our-library/
- You say you were "borrowing heavily". You may have hit a borrowing limit. We have always had limits, just like nearly every library.
- Please feel free to email info@archive.org (or me directly) with specifics if you want to explore this further.
- - Mark Graham mark@archive.org Markjgraham hmb (talk) 18:50, 10 September 2023 (UTC)
- @Allreet – Please note: Talk pages are for improving articles, and not for general discussions about the article subject. Please use WP:Reference desk for generalized inquiries and discussion. Thank you. -- dsprc [talk] 07:34, 12 September 2023 (UTC)
- Understood. Thanks Allreet (talk) 12:48, 12 September 2023 (UTC)
Original research?
Some of the referencing here seems unusual for Wikipedia. The sections on the number of pages archived by year and the languages of the books archived are based on searches. We wouldn't reference YouTube with a search of YouTube on a certain date. With websites that constantly change, the numbers get outdated immediately. If a book said that X million articles on the Archive are in French, that would be fine to cite; using a number found through a search of archive.org on a certain date seems like original research to me. Epa101 (talk) 21:17, 4 November 2023 (UTC)
- Having had no response, I'm going to remove the sections that I believe constitute original research. Epa101 (talk) 13:24, 12 November 2023 (UTC)
- Responding late, but I agree that's appropriate removal. Masem (t) 14:18, 12 November 2023 (UTC)
- @Epa101: Those numbers were not based on searches. They were based on the text on archive.org's front page as it appeared on previous dates. It was no different than citing any other archived source from the past. In particular, the YouTube comparison was inapt to this application of the Wayback Machine because you cannot use YouTube's search to find out how many videos YouTube had on a previous day. Neither could you use the Internet Archive's item search to determine this information. This is more like citing old issues of a newspaper that announced its circulation to establish a timeline. I think presenting this information in tables was a bit garish, but they were not OR and removing them on that basis was inappropriate. lethargilistic (talk) 01:20, 16 November 2023 (UTC)
- @Lethargilistic Hello. I've had another look. I see your point on the first table, with the archived pages in billions. This took me a while to find the right number in each case. However, I also deleted the tables by language and century scanned, and those figures are not on the front-page: they only come up through searches and the numbers presented are outdated now. Epa101 (talk) 08:42, 16 November 2023 (UTC)
- @Epa101 Oh, somehow I didn't see the removal of the language/century ones. That's definitely OR. But the first table ought to come back, IMO. lethargilistic (talk) 11:29, 16 November 2023 (UTC)
- @Lethargilistic Yes, I'm happy to agree with you on that. I'll reinstate them after work. Epa101 (talk) 15:53, 16 November 2023 (UTC)