Is that a thing? Like Google, except 'here we're only searching THESE
places and only the places that are explicitly OK to be searched.'
It might be cool. It's something I'm interested in.
Tilde search engine?
On Sat, 22 Feb 2025, Andrew Singleton wrote:
[...]
Here are two close matches off the top of my head:
- https://wiby.me/
^ Coverage stems from individual URLs nominated by users.
- https://searchmysite.net/
^ Coverage stems from sites nominated by users, as well as sites
that pay to use it as their site-internal search engine.
[...]
- I remember ~deepend has an AltaVista clone running at:
https://notaltavista.com/
^ WWW/Gopher, coverage stems from tildes, a bit jank
but might fit your use.
- As ~yeti mentioned, there is also Marginalia.
Marginalia has a tilde filter, but it seems to be based on
just having `~` as the first character in the path section of the URL.
I'll link to the classic non JavaS'creep-infected version
of Marginalia here:
https://old-search.marginalia.nu/
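To make that tilde-filter guess concrete, here is a minimal sketch of
such a check. This is only my reading of the observed behaviour, not
Marginalia's actual code; the function name and the example hosts are
made up for illustration.

```python
from urllib.parse import unquote, urlsplit


def looks_like_tilde_page(url: str) -> bool:
    """Guess whether a URL is a tilde page from its path alone.

    Assumption: the only signal is a '~' right after the leading '/'
    of the path. Sites that use per-user subdomains, or that have no
    '~' in the path at all, would be missed by this heuristic.
    """
    # Decode percent-escapes so that '/%7Ealice/' is treated like '/~alice/'.
    path = unquote(urlsplit(url).path)
    return path.startswith("/~")


print(looks_like_tilde_page("https://tilde.example/~alice/"))
print(looks_like_tilde_page("https://alice.tilde.example/"))
```

The second call shows the blind spot: a tilde hosted on a per-user
subdomain has no `~` in its path, so a filter like this would skip it.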
And as a tangent: for a general mid-sized search engine which has
its own index [1], is free for public use [2], does not play shady tricks
with `robots.txt`, is purely algorithm-based and untainted
by machine-learning black-box techniques, and is not involved
in the prose-laundering cartel, there is:
https://www.mojeek.com/
^ English only.
Which I currently use as my main search engine.
(And you cannot search Reddit with it of course [3];
but that wasn't the point of the original question)
Regards
~xwindows
[1] DuckDuckGo doesn't qualify, because it uses results from Micro$oft
Bing. Some people ask how I know for sure: my WWW
sites used to be searchable from DuckDuckGo several years ago.
Then Micro$oft went all-in with their Open(washing)AI partnership
and started using Bing crawl data to feed their prose-laundering
business, so I banned Bing from my sites (in both `robots.txt` and
CIDR blacklists) while explicitly allowing DuckDuckGo (in both
`robots.txt` and an IP whitelist, which takes precedence over the
blacklist). And sure enough, within a month, my sites had all but
disappeared from DuckDuckGo's search coverage.
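For anyone wanting to try a similar experiment, the two layers described
in [1] can be sketched roughly as below. This is an illustration only,
not the actual configuration used on those sites: the crawler tokens
`bingbot` and `DuckDuckBot` are assumed, the `robots.txt` body is a
made-up minimal one, and the address ranges are placeholders from the
reserved documentation block.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: refuse one crawler, welcome another.
ROBOTS_TXT = """\
User-agent: bingbot
Disallow: /

User-agent: DuckDuckBot
Allow: /

User-agent: *
Allow: /
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

# Placeholder ranges (203.0.113.0/24 is reserved for documentation).
# The allowed range sits inside the blocked one on purpose.
BLACKLIST = [ipaddress.ip_network("203.0.113.0/24")]
WHITELIST = [ipaddress.ip_network("203.0.113.64/26")]


def ip_is_allowed(addr: str) -> bool:
    """Whitelist is checked first, so it takes precedence over the blacklist."""
    ip = ipaddress.ip_address(addr)
    if any(ip in net for net in WHITELIST):
        return True
    if any(ip in net for net in BLACKLIST):
        return False
    return True  # not listed anywhere: allowed by default


print(robots.can_fetch("bingbot", "/index.html"))
print(robots.can_fetch("DuckDuckBot", "/index.html"))
print(ip_is_allowed("203.0.113.70"))
print(ip_is_allowed("203.0.113.10"))
```

The `robots.txt` layer is only a request that a well-behaved crawler may
honour; the address check is what actually enforces the ban.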
[2] Meaning Kagi doesn't qualify for my use: as a registration-required
service, prolonged usage can result in a bubble effect one can't
verify; and as a paid service, there is also a huge risk of data association
between search terms and the user's real-life identity.
[3] https://archive.ph/GS2I0