A purported leak of 2,500 pages of internal documentation from Google sheds light on how Search, the most powerful arbiter of the internet, operates.
The leaked documents touch on topics like what kind of data Google collects and uses, which sites Google elevates for sensitive topics like elections, how Google handles small websites, and more. Some information in the documents appears to be in conflict with public statements by Google representatives, according to Fishkin and King.
Can’t wait for selfhosted web search to become better.
You mean hosting your own crawler/indexer? That doesn’t really sound like a thing you could do cost-effectively.
No problem, we crowdsource the crawling, torrent-style.
We outsourced that to Google for reasonable performance reasons. But they shit the bed, so now there’s no choice but to do it ourselves.
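For what it's worth, here is a rough sketch of how "torrent-style" crawling could split the work with no central coordinator. It uses rendezvous (highest-random-weight) hashing so that every peer independently agrees on who crawls which host. This is only an illustration under assumptions: the peer names and the helper names `assign_peer` and `partition` are made up, and it is not how YaCy or any other project actually does it.

```python
import hashlib
from urllib.parse import urlsplit


def _score(peer: str, host: str) -> int:
    """Deterministic pseudo-random weight for a (peer, host) pair."""
    digest = hashlib.sha256(f"{peer}|{host}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign_peer(url: str, peers: list[str]) -> str:
    """Pick the peer responsible for a URL's host via rendezvous hashing.

    Every participant can compute this locally from the same peer list,
    so no coordinator is needed. All pages of one host go to one peer,
    which also keeps per-site politeness limits in one place.
    """
    if not peers:
        raise ValueError("need at least one peer")
    host = (urlsplit(url).hostname or "").lower()
    return max(peers, key=lambda peer: _score(peer, host))


def partition(urls: list[str], peers: list[str]) -> dict[str, list[str]]:
    """Group a list of URLs into per-peer crawl queues."""
    work: dict[str, list[str]] = {peer: [] for peer in peers}
    for url in urls:
        work[assign_peer(url, peers)].append(url)
    return work


if __name__ == "__main__":
    # Hypothetical peer names, purely for illustration.
    peers = ["peer-a", "peer-b", "peer-c"]
    urls = ["https://example.org/a", "https://example.org/b", "https://neocities.org/"]
    for peer, queue in partition(urls, peers).items():
        print(peer, queue)
```

The nice property of this scheme is that when a peer drops out, only the hosts that peer owned get reassigned; everyone else's queue stays the same. It does nothing about the hard parts, like trusting what other peers report or merging the resulting index.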
ooh that might be an interesting app to run on veilid
What is that and how does it apply?
Source: https://en.wikipedia.org/wiki/Veilid
Federated bookmarks?
Federated directories. We’re going back to Yahoo like it’s 1995
Webrings!!!
Uh…I know we’re all just having fun here, but I need to be part of a webring again. If anyone is more than joking, I kinda need to know about it. Thanks.
there are tons of webrings still going these days!
Seriously? Cool. I’m going to go do some research then. And maybe entirely change the purpose of my blog, just to fit into one…
can you share a link to it if you’re comfortable with that
I loved Geocities!
Neocities is trying to be a modern reincarnation https://neocities.org/
I mistook that as neopets
Yahoo patiently plotting its return from Japan.
I’m so ready for something like this. I’ve cleaned up my bookmarks and been waiting for alternatives to search engines.
SearxNG
Right!
Ars
You could use Common Crawl, it’s run by a non profit
https://en.wikipedia.org/wiki/Common_Crawl
Look up the YaCy repo on GitHub
How is that even supposed to work? These search engines by definition need massive databanks to search through. Either you need your own crawler and indexer, which is more than just inefficient, or you are limited to a relatively short list of curated static results.
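To make the "indexer" half concrete, here is a toy inverted index: a map from each word to the set of pages containing it, which is the basic structure a search index is built on. It is a minimal sketch, assuming the pages have already been fetched as plain text; the function names and sample URLs are made up, and a real engine adds ranking, stemming, compression, and storage that would not fit in memory.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return TOKEN.findall(text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: term -> set of URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in docs.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain every term of the query (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


if __name__ == "__main__":
    # Hypothetical crawled pages, purely for illustration.
    docs = {
        "https://example.org/webrings": "A list of webrings that are still active",
        "https://example.org/search": "Self-hosted search engines and crawlers",
    }
    index = build_index(docs)
    print(search(index, "active webrings"))
```

The structure itself is simple. The cost the comment is pointing at comes from scale: the posting sets grow with every page crawled, and the crawl has to be repeated to stay fresh.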
If they’re taking tips from Google, why would they get better?
Google actually was good, so there’s probably some good information in this documentation. If nothing else we can perhaps figure out what “went wrong.”
Edit: I’ve been reading the blog post by what appears to be the main person the leak was shared with, and there’s a lot of in-depth analysis being done there, but I’m not seeing a link to the actual documents. This is a huge article, though, so I might be overlooking it.
That was an interesting read. Thanks for linking to it.
What are the current contenders?
Ars Technica this week: Bing outage shows just how little competition Google search really has
The referenced search engine comparison by Rohan “Seirdy” Kumar
can’t emphasise too much that this piece is a very necessary read for anyone who wants to know about search; not just because it says good things about us, but because of the depth of research which has been put in here. Most times you encounter an article about indexes they are just taking whatever a (meta)search engine says about themselves, not even looking at privacy policies for “relationships with microsoft” etc. or doing any comparative work.
I’ve been using Kagi and really like it so far. It’s not good for local stuff, but afaik only Google and Bing have the resources and userbase for things like maps and reviews. It’s designed to be an ad-free ‘premium’ search engine and only earns revenue from users paying for membership.
OpenStreetMap’s platform is the only real way to compete against Google and Apple, and it’s why Microsoft, even though it has Bing Maps, has licensed resources like satellite imagery to them for mapping. It’s awesome in bigger population areas but there’s still a lot to map in rural places outside the EU.
Reviews are harder. Right now the leading open platform afaik is Open Reviews (aka Mangrove Reviews), which has tie-ins to OSM projects like MapComplete. OsmAnd and OrganicMaps have open tickets to hook into that ecosystem. You’re right about the userbase problem though; I think it (or a successor) needs AP federation to really take off. That being said, there are several active non-Google nonfree alternatives like Yelp and TripAdvisor, as well as niche sites for things like camping, parks, and schools.
the only one I know that isn’t a proxy search is yacy
I was looking at it the other day; unfortunately it’s got quite poor results
That tracks
YaCy, Mwmbl, Alexandria, Stract, Marginalia to name a few.